International Journal of Electrical and Computer Engineering (IJECE)
Vol. 7, No. 5, October 2017, pp. 2565 – 2573
ISSN: 2088-8708
Institute of Advanced Engineering and Science
www.iaesjournal.com
Video Shot Boundary Detection Using the Scale Invariant Feature Transform and RGB Color Channels
Zaynab El khattabi (1), Youness Tabii (2), and Abdelhamid Benkaddour (3)
(1,3) LIROSA Laboratory, Faculty of Sciences, Abdelmalek Essaadi University, Tetuan, Morocco
(2) LIROSA Laboratory, National School of Applied Sciences, Abdelmalek Essaadi University, Tetuan, Morocco
Article Info

Article history:
Received: May 5, 2017
Revised: Jun 12, 2017
Accepted: Jun 29, 2017

Keywords:
Video Segmentation
Shot Boundary Detection
Gradual Transition
Abrupt Change
SIFT
ABSTRACT

Segmentation of the video sequence by detecting shot changes is essential for video analysis, indexing and retrieval. In this context, a shot boundary detection algorithm based on the scale invariant feature transform (SIFT) is proposed in this paper. The first step of our method consists of a top-down search scheme to detect the locations of transitions by comparing the ratio of matched features extracted via SIFT for every RGB channel of the video frames. This overview step provides the locations of boundaries. Secondly, a moving average calculation is performed to determine the type of transition. The proposed method can detect both gradual transitions and abrupt changes without requiring any prior training on the video content. Experiments have been conducted on a multi-type video database and show that this algorithm achieves good performance.

Copyright © 2017 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Zaynab El khattabi
Faculty of Sciences, Abdelmalek Essaadi University
Tetuan, Morocco
zaynabelkhattabi@gmail.com
1. INTRODUCTION
The rapidly increasing volume of video content on the Web has created profound challenges for developing efficient indexing and search techniques to manage video data. While managing multimedia data requires more than collecting the data into storage archives and delivering it via networks to homes or offices, content-based video retrieval is becoming a highly recommended trend in many video retrieval systems. However, conventional techniques such as video compression and summarization strive for the two commonly conflicting goals of low storage and high visual and semantic fidelity [1].

Video segmentation is the fundamental process for a number of applications related to automatic video indexing, browsing and video analysis. The basic requirement of video segmentation is to partition a video into shots; the shot is often used as the basic meaningful unit of a video. In [2], Thompson et al. defined a video shot as the smallest unit of visual information captured at one time by a camera that shows a certain action or event. Therefore, segmenting a video into separate shots requires detecting the join between two shots and locating its position. There are a number of different types of transitions or boundaries between shots. A cut is an abrupt shot change that occurs in a single frame. A fade is a slow change in brightness, usually resulting in or starting with a solid black frame. A dissolve occurs when the images of the first shot get dimmer and the images of the second shot get brighter, with frames within the transition showing one image superimposed on the other. A wipe occurs when pixels from the second shot replace those of the first shot in a regular pattern, such as in a line from the left edge of the frames [3]. Other types of shot transitions include computer-generated effects such as morphing. The effects of this kind of transition are obtained with the help of cross-dissolve or fading techniques, which achieve a smooth change of image content (i.e. texture and/or color) from source to target frames. While there is a wealth of research on shot boundary detection (SBD), some methods aim at detecting
Journal Homepage: http://iaesjournal.com/online/index.php/IJECE
DOI: 10.11591/ijece.v7i5.pp2565-2573
abrupt boundaries, while others focus on gradual boundaries. In addition, certain kinds of transitions can easily be confused with camera motion or object motion. In this paper, a shot boundary detection scheme based on SIFT is proposed. Section 2 presents the various methods that have been proposed in this field, and section 3 presents the proposed method. Finally, sections 4 and 5 give the experiments and a conclusion.
2. RELATED WORKS
In the literature, algorithms for shot boundary detection can broadly be classified into many groups; the techniques include comparison of pixel values, statistical differences, histogram comparisons, edge differences, compression differences, and motion vectors to quantify the variation between continuous video frames. The easiest way to detect whether two frames are significantly different is to count the number of pixels whose value changes by more than some threshold. This total is compared against a second threshold to determine if a shot boundary has been found. Only the luminance channel of the considered videos is used in this case. If the number of pixels which change from one image to another exceeds a certain threshold, a shot transition is declared [4].
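The pixel-comparison baseline described above can be sketched in a few lines. This is an illustrative sketch, not code from [4]; frames are represented as flat lists of luminance values, and the two thresholds `t1` and `t2` are hypothetical.

```python
# Pixel-comparison shot detection baseline: count pixels whose luminance
# changes by more than t1, and declare a boundary when that count exceeds
# a second threshold t2. Both thresholds are illustrative values.

def pixel_change_boundary(prev, curr, t1=30, t2=1000):
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > t1)
    return changed > t2
```

As the surrounding text notes, a sketch like this is cheap but sensitive to camera and object motion, which is what the later methods try to fix.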
A technique introduced and validated during the TRECVID 2004 campaign is presented in [5]. First, small images are created from the original frames by taking one pixel every eight pixels, and they are converted to the HSV color space; only the V component is kept for luminance processing. With every new frame, the absolute difference between pixel intensities is computed and compared with the average values to detect cut transitions. Regarding gradual transitions, the method can detect only dissolves and fades.
The idea proposed in [6] is to divide the images into 12 regions and to find the best match for each region in a neighborhood around the region in the other image. Gradual transitions were detected by generating a cumulative difference measure from consecutive values of the image differences. The drawback of methods based on comparison of pixel values is their sensitivity to camera motion.
To avoid this problem of camera motion and object movements, some techniques compare the histograms of successive images. The idea behind histogram-based approaches ([7], [8]) is that two frames with an unchanging background and unchanging (although moving) objects will have little difference in their histograms.
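The histogram-comparison idea can be sketched as follows. This is an illustrative sketch rather than the exact procedure of [7] or [8]; the bin count and the flat-list frame representation are assumptions.

```python
# Histogram-based frame comparison: build a grayscale histogram per frame
# and sum the absolute bin-by-bin differences; a large distance suggests
# a shot change. The bin count (16) is an illustrative choice.

def histogram(frame, bins=16, max_val=256):
    h = [0] * bins
    step = max_val // bins
    for v in frame:
        h[v // step] += 1
    return h

def hist_distance(f1, f2, bins=16):
    h1, h2 = histogram(f1, bins), histogram(f2, bins)
    return sum(abs(a - b) for a, b in zip(h1, h2))
```

Because the histogram discards pixel positions, moving objects over a fixed background barely change the distance, which is exactly the robustness the paragraph above describes.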
Color histograms are used in [9] to detect shot boundaries by representing each frame of the video by its color histogram features. The video frames are then treated as a sequence of feature vectors which are fed to a split and merge framework. After completion of the recursive split and merge process, the shot boundaries are identified easily.
Another approach to detect shot boundaries is edge/contour-based methods that exploit the contour information present in the individual frames, under the assumption that the amount and location of edges between consecutive frames should not change drastically. In [10], the feature of edge pixel count is proposed for shot detection, where the Sobel edge detector is used.
Besides, color, edge or texture information can be combined to make use of the advantages of all these features and increase the accuracy of the technique used. An example of this combination is proposed in [11], using global color features combined with local edge characteristics.
Some temporal filtering mechanism is used to eliminate camera motion noise when it is present while detecting shot changes. The analysis in [12] resides in the discrimination between camera-work-induced apparent motion and object-motion-induced apparent motion, followed by analysis of the camera-work-induced motion in order to identify the camera work.
In [13], a block-based motion estimation approach is used, in which the whole frame is divided into blocks of 3x3 pixels. All pixels within the same block are assumed to belong to the same object, which undergoes translational motion. Each block is compared with all such blocks within the corresponding search window with the same center pixel location in the current frame.
On another side, a camera motion characterization technique is introduced in [14], using a camera motion histogram descriptor to represent the overall motion activity of a shot.
Various features can be combined to make use of the advantages of various popular techniques such as color, texture, shape and motion vectors, in the spatial as well as in transformed domains such as Fourier, cosine, wavelets, Eigenvalues, etc. Examples of such combinations are presented in [15], where a color feature is used, and in [16], where a texture feature is used.
Texture methods like Local Binary Patterns (LBP) are used in various recent computer vision and pattern recognition applications. In [16], an extension of the LBP histogram called Midrange LBP (MRLBP) is used to represent the frame texture. The authors justify their proposition by comparing the gray center pixel value, the average gray value and the midrange gray value, the last of which is more robust to noise and illumination variants. LBP histogram values are extracted based on midrange statistics on each frame and stored as feature vectors for the video sequence. Then, a dissimilarity metric is applied on the feature vectors of adjacent frames and used for the shot detection process with an adaptive threshold approach.
Shot boundary detection approaches can also be categorized based on machine learning techniques such as support vector machines, neural networks, fuzzy logic, clustering techniques and Eigen analysis [17].
In this context, the problem of shot detection in endoscopic surgery videos is addressed in [18] to manage the video content of surgical procedures. The proposed method relies on the application of a variational Bayesian (VB) framework for computing the posterior distribution of spatiotemporal Gaussian mixture models (GMMs). The video is first decomposed into a series of consecutive clips of fixed duration. Then, the VBGMM algorithm is applied on feature vectors extracted from each clip to automatically handle the number of components, which are matched along the video sequence. These components denote clusters of pixels in the video clip with similar feature values, and the labels are the tags of these components. Hence, the process of label tracking defines shot borders when component tracking fails, signifying a different visual appearance of the surgical scene.
Genetic Algorithms and Fuzzy Logic have also been used for shot boundary detection. The authors of [19] proposed a system based on computing the Normalized Color Histogram Difference between each two consecutive frames in a video. Then, a fuzzy system is used to classify the frames into abrupt and gradual changes. In order to optimize the fuzzy system, a genetic algorithm (GA) is used. The results show the benefits of the GA optimization process in achieving a low computational time.
Many recent approaches reported in the literature related to shot boundary detection rely on SIFT ([20], [21]). The method proposed in [20] is based on SIFT-point distribution histogram extraction. Each video frame is represented by a histogram, named the SIFT-point distribution histogram (SIFT-PDH), which describes the distribution of the extracted stable keypoints within the frame under polar coordinates. The difference between each two consecutive frames of the video is calculated by comparing their SIFT-PDHs, and an adaptive threshold is used to identify the shot boundaries. Some other surveys of existing SBD techniques in the literature are provided and discussed in [22].
3. PROPOSED METHOD
Selection of an appropriate feature for segmenting a video sequence into shots is one of the most critical issues. Several such features have been suggested in the literature (histogram difference, optical flow, ...), but none of them is general enough to handle all the kinds of changes in video data. The proposed method is based on feature extraction using the scale invariant feature transform introduced by David G. Lowe [23]. The reason for this choice is that SIFT image features are invariant to image rotation and scale, and robust across a substantial range of affine distortion, addition of noise, and change in illumination.
Firstly, the video is overviewed, zooming in wherever a shot boundary exists, using the top-down search scheme presented in [24]. The search is carried out by comparing the ratio of matched keypoints extracted via SIFT for every RGB channel of two video frames separated by a temporal sampling period N. SIFT descriptors are computed over all three channels of the RGB color space; hence, three feature descriptor matrices associated with the R, G and B color channels are obtained for each Nth frame. Instead of comparing the number of SIFT feature keypoints, we calculate and compare the ratio of the matched number to the total number between every two sampled frames, to avoid false detections caused by too few keypoints being generated. In order to zoom into the location of boundaries, peaks are detected and filtered so that only sufficiently deep peaks are regarded as boundaries.
3.1. Feature Extraction
Scale Invariant Feature Transform (SIFT) is an approach for detecting and extracting local feature descriptors that are reasonably invariant to changes in illumination, image noise, rotation, scaling, and small changes in viewpoint. There are four major steps: detection of scale-space extrema, accurate keypoint localization, orientation assignment, and descriptor representation.
- Scale-space peak selection: The first stage of computation searches over all scales and image locations. It is implemented efficiently by using a difference-of-Gaussian (DoG) function to identify keypoint candidates for SIFT features that are invariant to scale and orientation. The DoG scale space can be obtained from equation (1):

D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) ∗ I(x, y)   (1)

where ∗ is the convolution operation, I(x, y) is the gray value of the pixel at (x, y) and G(x, y, σ) is a variable-scale Gaussian kernel defined as:

G(x, y, σ) = (1 / (2πσ²)) e^(−(x² + y²) / (2σ²))   (2)
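As a small numerical illustration of equations (1) and (2), the DoG kernel can be sampled directly; convolving it with the image I(x, y) yields the response D. The kernel radius and the factor k below are illustrative choices, not values fixed by the paper.

```python
# Sample the variable-scale Gaussian of equation (2) on a discrete grid,
# and form the DoG kernel G(x, y, k*sigma) - G(x, y, sigma) of equation (1).
import numpy as np

def gaussian_kernel(sigma, radius=8):
    # grid of offsets (x, y) in [-radius, radius]
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    return np.exp(-(xx**2 + yy**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def dog_kernel(sigma, k=1.6, radius=8):
    # convolving this kernel with I(x, y) gives the DoG response D(x, y, sigma)
    return gaussian_kernel(k * sigma, radius) - gaussian_kernel(sigma, radius)
```

A sanity check on the sketch: each sampled Gaussian sums to roughly 1, so the DoG kernel sums to roughly 0, which is why the DoG response is insensitive to uniform image brightness.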
- Keypoint localization: At each candidate location, a detailed model is fit to determine location and scale. Keypoints are selected based on measures of their stability. Low-contrast keypoints introduced by noise and edge responses are removed.
- Orientation assignment: An orientation is assigned to each keypoint to achieve invariance to image rotation. A neighbourhood is taken around the keypoint location depending on the scale, and the gradient magnitude and direction are calculated in that region. An orientation histogram with 36 bins covering 360 degrees is created.
- Keypoint descriptor: A 16x16 neighborhood around the keypoint is taken and divided into 16 sub-blocks of size 4x4. For each sub-block, an 8-bin orientation histogram is created, so a total of 128 bin values are available. This leads to a SIFT feature vector of 128 dimensions.
Color provides more discriminatory information than simple intensities, and the RGB color space is simple and very common. Hence, in our work, SIFT descriptors are computed for every RGB channel independently, and the information available in the three different color channels is combined, unlike the original SIFT model, which is designed only for grayscale information and misses important visual information regarding color.
3.2. Shot boundary detection
SIFT keypoints are extracted from the video frames, and the ratios of the number of matched keypoints to the total number between frame i and frame i+N are used to detect shot boundaries. The advantage of feature matching is that it is invariant to affine transformations; thus, we can even match objects after they have moved. Figure 1 shows local feature matching between two frames.

(a) Frames within the same shot. (b) Frames from different shots.
Figure 1. Feature keypoint matching between two frames.
The similarity matching between two frames in the same shot is usually high, due to similar image features, objects and colors. However, frames from different shots have visual discontinuity; as a result, they have no matched similarities or only a low number of them.
3.2.1. The top-down search scheme
To avoid unnecessary processing of video frames within a shot, a search is first carried out by performing similarity matching for every Nth frame in the video. This is a good solution for decreasing the computational cost. Let us denote the ith frame of a video as F(i). Then, the algorithm is conducted as follows (Figure 2):

Figure 2. The top-down search process.
Each color channel obtained for each Nth frame of the video is subjected to the feature extraction process (SIFT-RGB), the output of which is fed to the similarity matching process among the successive frames; this results in three similarity values for each frame i: ratioR, ratioG and ratioB. This similarity information is fused to obtain one ratio representing the matched similarities between F(i) and F(i+N). We use the ratio of matched features to the total number of features, instead of comparing the number of feature keypoints with a prefixed threshold, because of the false detections caused by the small number of keypoints in frames with few objects and colors, which generates fewer matched similarities even when the frames are similar. The ratio for each color channel of the frame F_i is defined as:
ratioR(i) = 2·M_r / (K_r(F_i) + K_r(F_{i+N}))   (3)

ratioG(i) = 2·M_g / (K_g(F_i) + K_g(F_{i+N}))   (4)

ratioB(i) = 2·M_b / (K_b(F_i) + K_b(F_{i+N}))   (5)
where M_r, M_g and M_b are the numbers of matches found respectively for the red, green and blue color planes between F_i and F_{i+N}, and K_r, K_g and K_b are the total numbers of feature keypoints extracted from each color plane of the frame. The final ratio obtained from the three ratios is defined as:

RatioRGB(i) = (ratioR(i) + ratioG(i) + ratioB(i)) / 3   (6)
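The ratio computation of equations (3)–(6) can be sketched as follows. The matched and total keypoint counts are hypothetical inputs here; in the full pipeline they would come from SIFT extraction and matching on the R, G and B planes of F(i) and F(i+N).

```python
# Per-channel matched-keypoint ratio (equations (3)-(5)) and the fused
# RatioRGB (equation (6)).

def channel_ratio(matches, kp_a, kp_b):
    """2*M / (K(F_i) + K(F_{i+N})); returns 0 when neither frame has keypoints."""
    total = kp_a + kp_b
    return 2.0 * matches / total if total else 0.0

def ratio_rgb(m_r, m_g, m_b, k_r, k_g, k_b):
    """Average of the three channel ratios; k_* are (K(F_i), K(F_{i+N})) pairs."""
    r = channel_ratio(m_r, *k_r)
    g = channel_ratio(m_g, *k_g)
    b = channel_ratio(m_b, *k_b)
    return (r + g + b) / 3.0
```

Normalizing by the total keypoint count, rather than thresholding the raw match count, is what keeps sparse frames (few objects and colors) from triggering false detections, as the paragraph above explains.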
The determination of the temporal sampling period N depends on the type of video content and the duration of the shots. If a sequence of successive frames is captured by many cameras, as in action movies, the action can be discontinuous and the shots very short. Consequently, an entire shot may start and end between the sampled frames and be missed. For that reason, the choice of N must take the nature of the video content into consideration. The temporal sampling period is chosen as N=25 (1 sec) in the example illustrated in Figure 3.
Figure 3. The overview of a video with N=25.
In order to zoom into the locations of shot boundaries, extrema peaks are detected and filtered so that only the sufficiently deep peaks are taken as boundaries. The peak detection function used in [24] finds boundaries by comparing each minimum peak with the previous and successive extrema peaks, using a threshold T=0.5 to compare the depth of the peak with the others. The boundary detection function is described in Algorithm 1, where P_i is a peak and P_t and P_r are the left and right ends of the peak. Dashed lines in Figure 3 indicate the peaks detected with this function.
Algorithm 1: Boundaries detection
1: for i = 1, 2, 3, ... do
2:   if (P_i < P_{i-1} and P_i < P_{i+1})
3:   then t = i-1; r = i+1;
4:     while (P_t < P_{t-1}) t = t-1;
5:     while (P_r < P_{r+1}) r = r+1;
6:     if (P_i < P_t * T or P_i < P_r * T)
7:     then zoom in to [F_{(i-1)·N}, F_{i·N}]
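Algorithm 1 can be sketched over a list of RatioRGB values for the sampled frames. The handling of the list ends is an implementation assumption not spelled out in the pseudocode; T=0.5 is the threshold from the paper.

```python
# Algorithm 1 sketch: find local minima of the matched-similarity curve,
# walk outwards to the surrounding local maxima, and keep only valleys
# deep enough relative to those maxima. Returns sampled-frame index pairs
# (i-1, i), i.e. the intervals [F_{(i-1)N}, F_{iN}] to zoom into.

def detect_boundaries(ratios, T=0.5):
    boundaries = []
    for i in range(1, len(ratios) - 1):
        if ratios[i] < ratios[i - 1] and ratios[i] < ratios[i + 1]:
            t, r = i - 1, i + 1
            while t > 0 and ratios[t] < ratios[t - 1]:
                t -= 1
            while r < len(ratios) - 1 and ratios[r] < ratios[r + 1]:
                r += 1
            if ratios[i] < ratios[t] * T or ratios[i] < ratios[r] * T:
                boundaries.append((i - 1, i))
    return boundaries
```

The depth test against both neighboring maxima is what rejects the shallow dips caused by motion, while a genuine boundary drives the ratio far below half of the surrounding peaks.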
3.2.2. Determination of transition type
To determine whether a shot boundary is a hard cut or a gradual transition, the moving average value of the frames at the boundary is calculated. The moving average of frame t is defined as:
AverageRatio(t) = (1/N) · Σ_{i=t−N}^{t−1} RatioRGB(i)   (7)
where RatioRGB(t) is the ratio of matching feature keypoints obtained in equation (6) by fusing the three ratios ratioR, ratioG and ratioB of a frame t that has been detected as a boundary by Algorithm 1. The period N is used as the number of previous frames considered together with the current frame t when calculating the moving average. We can distinguish transitions by measuring the difference between AverageRatio(t) and RatioRGB(t), as described in Algorithm 2.
Algorithm 2: Type of transition
1: for t = t_1, t_2, ..., t_n do (t_i is a shot boundary)
2:   if (AverageRatio(t) − RatioRGB(t) >= ε)
3:   then
4:     type of transition = cut boundary
5:   else
6:     type of transition = gradual transition
A threshold ε is used to detect the transition type. In our experiments, the choice of an appropriate threshold ε has a high impact on the accuracy of the results.
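Algorithm 2, together with the moving average of equation (7), can be sketched as follows. `ratio_series` (RatioRGB per frame index) and the value of `epsilon` are hypothetical inputs; as noted above, ε is data-dependent.

```python
# Algorithm 2 sketch: a boundary frame whose RatioRGB drops far below the
# moving average of the N preceding frames is a hard cut; a smaller drop
# indicates a gradual transition.

def transition_type(ratio_series, t, N, epsilon):
    # AverageRatio(t): mean RatioRGB over frames t-N .. t-1 (equation (7))
    avg = sum(ratio_series[t - N:t]) / N
    return "cut" if avg - ratio_series[t] >= epsilon else "gradual"
```

Intuitively, a cut produces a single-frame collapse of the ratio relative to its recent average, while a dissolve or fade erodes the ratio slowly, so the average tracks it and the difference stays below ε.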
4. EXPERIMENTS AND RESULTS
In order to evaluate the performance of the proposed method and reveal its advantages over other methods in the literature, we have designed an experimental video dataset containing four types of videos (sport, news, cartoon, movie). The video sequences used are MPEG-4 compressed videos with various dimensions, containing several types of transitions. The experimental dataset used for evaluation is listed in Table 1.
Table 1. Information of experimental videos

Type    | Number of frames | Size     | Duration | Number of shots
Sport   | 83525            | 640x360  | 3341 sec | 411
News    | 45100            | 640x360  | 1804 sec | 223
Cartoon | 31855            | 1280x720 | 1385 sec | 204
Movie   | 72749            | 1280x720 | 3163 sec | 530
The performance results of the proposed method are shown as precision and recall values in Table 2. Precision and recall are defined as:
Precision = N_c / (N_c + N_f)   (8)
Recall = N_c / (N_m + N_c)   (9)
where N_c, N_f and N_m are the numbers of correct, false and missed shot boundary detections, respectively.
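Equations (8) and (9) translate directly into code; the counts below are hypothetical example values.

```python
# Precision/recall as defined in equations (8)-(9), from counts of
# correct (Nc), false (Nf) and missed (Nm) boundary detections.

def precision(nc, nf):
    return nc / (nc + nf)

def recall(nc, nm):
    return nc / (nm + nc)
```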
Table 2. Evaluation of the proposed method

        | Abrupt Changes      | Gradual Transition
        | Precision | Recall  | Precision | Recall
Sport   | 0.92      | 0.85    | 0.93      | 0.77
News    | 0.95      | 0.94    | 0.89      | 0.86
Cartoon | 0.88      | 0.91    | 0.75      | 0.81
Movie   | 0.94      | 0.87    | 0.79      | 0.88
Figure 4 shows some shot boundaries detected from the experimental dataset. The transitions presented in Figure 4 are cut transitions, where there is a complete dissimilarity between two successive frames and the ratio of matched keypoints is very small or null.
(a) Example 1 of cut transition (frames 99 and 100). (b) Example 2 of cut transition (frames 230 and 231).
Figure 4. Examples of two cut transitions detected in cartoon video.
We tested our method on some videos from the Open Video Project [25]. Figure 5 shows the frames in the first gradual transition detected by our method on a video provided by the Open Video repository (NASA 25th Anniversary Show, segment 1). We can see clearly that the changes and dissimilarities occur gradually between the successive frames. These variations are reflected in the RGB ratio of matched similarities, which decreases gradually between frames 128 and 142.
Figure 5. Example of a gradual transition detected.
The low recall rate on the sports video may be due to short shots that are missed between the sampled frames. In contrast, the precision rates on this kind of video are more than 90%, which shows that the method is effective in detecting abrupt and gradual transitions. On the other hand, recall rates are generally low. This reveals that some frames belonging to different shots were regarded as similar; as a result, several shot boundaries are missed. On the news video, the precision rate and the recall rate are high (more than 90%), because of the long shots and the existence of many cut transitions, which are distinguished by the great changes between frames. Accordingly, shot boundaries are well detected. Also, the choice of the temporal sampling period N as 1 second means that all shots shorter than this value will be missed. Adapting the parameter N to the video sequence can increase the performance results by reducing missed or false shot boundary detections. The comparison of this method with the experimental results reported in other SIFT-based works shows that integrating the three color channels R, G and B of the video frames gives more precision in detecting shot boundaries than using only the grayscale channel.
5. CONCLUSION
In this work, a new algorithm is presented based on the scale invariant feature transform adapted to the RGB color space. First, a top-down search process is performed by comparing the ratio of matched keypoints extracted via SIFT for every R, G and B channel of two video frames separated by a temporal sampling period N. Then, an algorithm is used to detect the shot boundaries. Finally, the moving average of the frames at the boundaries is calculated to determine the type of the transition by using a threshold. Our method is applied to different types of video and shows satisfactory performance in detecting abrupt changes and gradual transitions, but it can be improved by using weighting coefficients to calculate RatioRGB from the three ratios (R, G and B), depending on the type of the video. In future work, we aim to include performance improvements and to minimize the computational cost without decreasing the accuracy.
REFERENCES
[1] J. T. T. Mei, L.-X. Tang and X.-S. Hua, "Near-lossless semantic video summarization and its applications to video analysis," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 9, no. 3, June 2013.
[2] R. Thompson, Grammar of the Shot, F. Press, Ed., 1998.
[3] J. S. Boreczky and L. A. Rowe, "Comparison of video shot boundary detection techniques," Journal of Electronic Imaging, vol. 5, no. 2, pp. 122–128, April 1996.
[4] R. G. Tapu, "Segmentation and structuring of video documents for indexing applications," December 2012.
[5] S. H. G. Jaffre, Ph. Joly, "The samova shot boundary detection for trecvid evaluation 2004," in Proceedings of the TRECVID 2004 Workshop, Gaithersburg, MD, USA, NIST, 2004.
[6] B. Shahraray, "Scene change detection and content-based sampling of video sequences," in Proc. SPIE Digital Video Compression: Algorithms and Technologies, vol. 2419, 1995, pp. 2–13.
[7] C.-L. Huang and B.-Y. Liao, "A robust scene-change detection method for video segmentation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 12, pp. 1281–1288, December 2001.
[8] D. S. Guru and M. Suhil, "Histogram based split and merge framework for shot boundary detection," Mining Intelligence and Knowledge Exploration, Lecture Notes in Computer Science, vol. 8284, pp. 180–191, December 2013.
[9] D. Guru and M. Suhil, "Histogram based split and merge framework for shot boundary detection," Mining Intelligence and Knowledge Exploration, Lecture Notes in Computer Science, vol. 8284, pp. 180–191, December 2013.
[10] S. C. R. S. Jadon and K. K. Biswas, "A fuzzy theoretic approach for video segmentation using syntactic features," Pattern Recognition Letters, vol. 22, no. 13, pp. 1359–1369, November 2001.
[11] L. Y. R. L. C. Y. Z. R. Qu, Z., "A method of shot detection based on color and edge features," in 1st IEEE Symposium on Web Society, SWS'09, August 2009, pp. 1–4.
[12] H. Z. P. Aigrain and D. Petkovic, "Content-based representation and retrieval of visual media: A state-of-the-art review," Multimedia Tools and Applications, vol. 3, no. 3, pp. 179–202, November 1996.
[13] S. M. P. Panchal and N. Patel, "Scene detection and retrieval of video using motion vector and occurrence rate of shot boundaries," in 2012 Nirma University International Conference on Engineering (NUiCONE), December 2012, pp. 1–6.
[14] X. H. Y. W. Muhammad Abul Hasan, Min Xu, "A camera motion histogram descriptor for video shot classification," Multimedia Tools and Applications, vol. 24, no. 74, pp. 11073–11098, December 2015.
[15] F. B. F. Bayat, M. Shahram Moin, "Goal detection in soccer video: Role-based events detection approach," International Journal of Electrical and Computer Engineering (IJECE), vol. 4, no. 6, pp. 979–988, 2014.
[16] N. H. S. Rashmi, B. S., "Video shot boundary detection using midrange local binary pattern," in International Conference on Advances in Computing, Communications and Informatics (ICACCI), IEEE, September 2016, pp. 201–206.
[17] A. M. E. M. M. Pournazari, F. Mahmoudi, "Video summarization based on a fuzzy based incremental clustering," International Journal of Electrical and Computer Engineering (IJECE), vol. 4, no. 4, pp. 593–602, 2014.
[18] N. N. S. D. G. E. Loukas, C., "Shot boundary detection in endoscopic surgery videos using a variational bayesian framework," International Journal of Computer Assisted Radiology and Surgery, vol. 11, no. 11, pp. 1937–1949, 2016.
[19] K. T. S. K. M. R. S. Thounaojam, D. M., "A genetic algorithm and fuzzy logic approach for video shot boundary detection," Computational Intelligence and Neuroscience, no. 14, 2016.
[20] E. A. A. K. N. P. J. M. Hannane, R., "An efficient method for video shot boundary detection and keyframe extraction using sift-point distribution histogram," International Journal of Multimedia Information Retrieval, vol. 2, no. 5, pp. 89–104, 2016.
[21] W. X. Z. W. H. P. Liu, G., "Shot boundary detection and keyframe extraction based on scale invariant feature transform," in Eighth IEEE/ACIS International Conference on Computer and Information Science, ICIS 2009, June 2009, pp. 1126–1130.
[22] W. X. Z. A. W. J. Chi, A., "Review of research on shot boundary detection algorithm of the compressed video domain in content-based video retrieval technique," in DEStech Transactions on Engineering and Technology Research, (iceta), 2016.
[23] D. Lowe, "Distinctive image features from scale invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[24] M. Birinci and S. Kiranyaz, "A perceptual scheme for fully automatic video shot boundary detection," Signal Processing: Image Communication, vol. 29, no. 3, pp. 410–423, March 2014.
[25] The Open Video Project. [Online]. Available: https://open-video.org/index.php
BIOGRAPHIES OF AUTHORS

Zaynab El khattabi is a Ph.D. student in the Faculty of Sciences, Abdelmalek Essaadi University, Morocco. She is a Computer Sciences engineer, graduated in 2012 from the National School of Applied Sciences, Abdelmalek Essaadi University. She got a DEUG in Mathematics and Computer Sciences in 2009 from the Faculty of Sciences, Abdelmalek Essaadi University. Her current research interests include image and video processing, with a focus on video-content analysis and retrieval.

Youness Tabii received his PhD in July 2010 from the National School of Computer Sciences and Systems Analysis, Mohammed V University, Rabat. He is a Professor at the National School of Applied Sciences of Tetuan (ENSAT). He is a member of the New Technology Trends Team (NTT Team) and the Head of the Master's program in Embedded and Mobile Systems. His research interests include video processing and analysis; he is also interested in cloud security. He is the Founder and Chair of the International Conference on Big Data, Cloud and Applications (BDCA). He was a Guest Editor of the International Journal of Cloud Computing in 2016.

Abdelhamid Benkaddour got a MAS and a PhD in Applied Mathematics and Mechanics from Pierre et Marie Curie (Paris VI) University in June 1986 and 1990, respectively, and a PhD in Mathematics from Abdelmalek Essaadi University in 1994. His research focuses on numerical analysis, scientific computing and computer science.