International Journal of Electrical and Computer Engineering (IJECE)
Vol. 10, No. 2, April 2020, pp. 1337-1345
ISSN: 2088-8708, DOI: 10.11591/ijece.v10i2.pp1337-1345
Robust foreground modelling to segment and detect multiple moving objects in videos

Rahul M Patil, Chethan K P, Azra Nasreen, Shobha G
Department of Computer Science and Engineering, Rashtreeya Vidyalaya College of Engineering, Bangalore, Karnataka, India
Article Info

Article history:
Received Dec 9, 2018
Revised May 31, 2019
Accepted Oct 5, 2019

Keywords:
Background subtraction
Foreground modelling
Mean averaging
Moving object detection
Video analysis

ABSTRACT
The last decade has witnessed an ever-increasing number of video surveillance installations due to the rise of security concerns worldwide. With this comes the need for video analysis for fraud detection, crime investigation and traffic monitoring, to name a few. For any kind of video analysis application, the detection of moving objects in videos is a fundamental step. In this paper, an efficient foreground modelling method to segment multiple moving objects is implemented. The proposed method significantly reduces noise, thereby accurately segmenting the region of interest under dynamic conditions while handling occlusion to a large extent. Extensive performance analysis shows that the proposed method gives far better results when compared to the de facto standard as well as relatively new approaches used for moving object detection.

Copyright © 2020 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Rahul M Patil,
Department of Computer Science and Engineering,
Rashtreeya Vidyalaya College of Engineering,
Bangalore-560069, Karnataka, India.
Email: patilmrahul06@gmail.com
1. INTRODUCTION
The first step in any video analytics solution is the segmentation of moving objects. Though this has been studied for several years, there are still many concerns when accurately detecting moving objects, such as background noise, illumination changes, variable frame rates in recorded videos resulting in lag, shadows and occlusion, to name a few. In this paper, we propose an efficient object detection method that addresses issues such as background noise, illumination changes and reflections causing false positives, and overlapping or occlusion to a large extent, extracting the exact bounding box or region of interest (ROI) using morphological operations and the convex hull algorithm in a post-processing phase. Various methods have been proposed for background subtraction [1, 2], each having its own limitations due to many challenges such as sudden changes in the scene, non-static background objects, lag introduced due to variable frame rates, changes in the appearance of objects with viewpoint, and dynamic backgrounds such as gusts of wind, movement of tree leaves, shadows, etc. A review of the most relevant methods in background subtraction is provided in [3], giving a good understanding of the optimal method to be used for any background subtraction task.

Segmentation methods using techniques such as background subtraction, deep learning, etc., play highly pivotal roles in several applications, ranging from the visual observation of animals [4, 5] to video surveillance systems [6, 7]. They are also extremely popular in content-based video coding, as in [8, 9]. Much of the past and on-going research in this field aims at resolving these issues in order to improve the accuracy of results [10]. Gaurav Takhar et al. [11] discuss various methods of background subtraction, such as basic, statistical as well as machine learning techniques, with the average, best and worst cases of several other different methods.

Journal homepage: http://ijece.iaescore.com/index.php/IJECE
The proposed system is compared with the statistical technique of adaptive Gaussian mixtures using popular datasets. The non-max suppression technique is discussed in [12]; a faster version of this method helps in the process of merging bounding boxes when multiple bounding boxes in close proximity and with similar areas are obtained for a single object. A sound understanding of the several morphological transformations used in the proposed method is provided in [13]. The most popular background modelling technique is the Gaussian mixture model [14]. The state of the art in background subtraction has been proposed by [15], where an adaptive Gaussian mixture model is used to automatically find the number of Gaussian components for each pixel. A subsequent method is described in [16], where the efficiency of the adaptive Gaussian mixture model is improved. Arun Varghese et al. [1] discuss background subtraction done at the pixel level and performance analysis using the popular Highway dataset from changedetection.net. Performance analysis at the pixel level is also discussed in [17]. We used the Pedestrians and Highway datasets from the baseline category and Turnpike from the low frame rate category of the 2014 CDW datasets. Frame-based performance metrics such as True Positives, False Positives, False Negatives and True Negatives are discussed in [18, 19] for different datasets and models respectively.

The system proposed in this paper uses techniques such as the fast non-maximum suppression method to increase the accuracy of detection, the convex hull method to get better-defined blobs for each foreground object, and morphological transformations with circular kernels to get a much smoother outline of the detected foreground blobs. The model is extremely lightweight, very fast and requires no initial training. The proposed model also accounts for a changing background by updating the background using weighted averages of each input frame. All in all, the model is computationally efficient and accurate for the majority of cases, with a small number of limitations that will be discussed later.
2. GAUSSIAN MIXTURE MODEL
Pixels in the background are modelled with a mixture of K Gaussian distributions, with the value of K typically being three to five. The time that a pixel stays in the scene is determined by the weights of the distributions in the mixture; the most likely background colours are the ones that stay longer, as determined by the weights. The improved Gaussian mixture model is more adaptive than the original Gaussian mixture model [15, 16]: the number of distributions K used for modelling is appropriately determined for each pixel in the image. The probability of a pixel having value $X_N$ at time $N$ is indicated in equation (1):

$$p(X_N) = \sum_{j=1}^{K} w_j \,\eta(X_N; \theta_j) \qquad (1)$$
wherein $w_k$ is the weight of the $k$-th Gaussian component and $\eta(x; \theta_k)$ is the normal distribution of the $k$-th component, as indicated in equation (2):

$$\eta(x; \theta_k) = \eta(x; \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{D/2}\,|\Sigma_k|^{1/2}}\; e^{-\frac{1}{2}(x-\mu_k)^T \Sigma_k^{-1} (x-\mu_k)} \qquad (2)$$
in which the mean is $\mu_k$ and the covariance is $\Sigma_k = \sigma_k^2 I$. The $K$ distributions are sorted based on the value of $w_k/\sigma_k$, and the first $B$ distributions are used to create a model of the background of the scene. $B$ is computed as in equation (3):

$$B = \underset{b}{\arg\min} \left( \sum_{j=1}^{b} w_j > T \right) \qquad (3)$$
where $T$ is the minimum fraction of the background model; in other words, it is the minimum prior probability that the background is in the scene.
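To make the selection in equation (3) concrete, here is a minimal NumPy sketch (the function name and the example weights are illustrative, not from the paper): the components of one pixel's mixture are ranked by $w_k/\sigma_k$, and the smallest prefix whose cumulative weight exceeds $T$ is taken as the background model.

```python
import numpy as np

def select_background_components(w, sigma, T=0.7):
    """Pick the first B mixture components, sorted by w/sigma, whose
    cumulative weight first exceeds the threshold T, per equation (3)."""
    order = np.argsort(w / sigma)[::-1]      # most "background-like" first
    cumulative = np.cumsum(w[order])
    B = int(np.argmax(cumulative > T)) + 1   # smallest b with sum > T
    return order[:B]

# Example: a 4-component mixture for one pixel; the two heavy, low-variance
# components are selected as background.
w = np.array([0.50, 0.25, 0.15, 0.10])
sigma = np.array([5.0, 6.0, 20.0, 30.0])
bg = select_background_components(w, sigma, T=0.7)
```

With these example values the cumulative weights along the ranking are 0.50, 0.75, ..., so the first two components form the background model.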
(a) GMM adaptive to variable lighting conditions: This method incorporates per-pixel Bayesian segmentation into the Gaussian mixture model in order to account for videos recorded in variable lighting conditions [20].
(b) Adaptive variable frame rate coding: This method adjusts the frame rate of the video dynamically and adaptively, making use of information from already existing video encoders [21].
(c) Intermittent motion coding: This method involves disabling motion coding during periods of inactivity in the video. Thus it records, for further processing, only the parts of the video where active foreground movement is involved [22].
All of the methods explained above incur considerable overhead with regard to time or CPU usage. The Gaussian mixture model based methods cannot efficiently deal with variable frame rates in videos. The variable frame rate coding techniques make use of video encoder information, the compilation of which involves CPU overhead. Also, recording only during periods of activity means that the definition of activity in the scene has to be determined in advance, using extensive statistical analysis. Non-static background objects must also be included in the modelled background.
3. PROPOSED SYSTEM
The background is modelled by obtaining the background scene without the occurrence of any foreground objects, so that foreground objects can then be obtained from it by background subtraction. Though this looks simple, it is a very difficult and tedious task, as the model should not contain any foreground objects; that is, any movement such as a gust of wind, the movement of tree leaves, etc. should be part of the background itself. The background of the scene should be updated as and when the scene changes, must be free from any kind of noise, and must adapt to any kind of illumination change.
3.1. Running average method
A background model has to be constructed initially in order to perform the background subtraction task. The running average is found to be a good method of approximating the background: it is faster than the Gaussian mixture model and more consistent than direct frame differencing [23]. The proposed system uses the fast running average method for background modelling, as illustrated in equation (4):

$$dst(x, y) = (1 - r) \cdot dst(x, y) + r \cdot src(x, y) \qquad (4)$$
where $dst(x, y)$ is the accumulator image with the same number of channels as the input image, $src(x, y)$ is the input image, which can have 1 or 3 channels, and $r$ is the weight of the input image. Using continuous frames in a video stream, the weighted average background model can be calculated by choosing an appropriate value of $r$ for that particular sequence. By using a higher value of $r$, we are able to eliminate the foreground objects that are not persistent in the scene. A suitable value of $r$ can also be chosen by taking into consideration the amount of data available for modelling. The process of learning the background is illustrated in Figure 1.
Figure 1. Running average to learn background
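The update in equation (4) can be sketched in a few lines of NumPy (the accumulator/src terminology above matches what OpenCV's `cv2.accumulateWeighted` performs; this standalone version is a direct restatement of the formula, with the value of `r` chosen arbitrarily for illustration):

```python
import numpy as np

def update_background(dst, src, r=0.02):
    """One step of the running-average update from equation (4):
    dst(x, y) = (1 - r) * dst(x, y) + r * src(x, y)."""
    return (1.0 - r) * dst + r * src

# A static 2x2 "scene": feeding the same frame repeatedly makes the
# accumulator converge to the persistent background value.
frame = np.full((2, 2), 100.0)
background = np.zeros((2, 2))
for _ in range(300):
    background = update_background(background, frame, r=0.05)
```

After enough frames the accumulator converges to the persistent scene value, while short-lived foreground pixels contribute only transiently; a larger `r` speeds up adaptation at the cost of absorbing slow-moving objects into the background.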
3.2. Background subtraction
The V channel of the HSV image is fed as input to the differencing method, where the absolute difference between the V channel of the current frame and the modelled background is obtained. This is done by finding the absolute difference between each pixel element of the modelled background and the V channel of the current frame, which are fed as parameters to the method. The HSV colour space is used because it works well against shadows [24].
The final absolute-differenced image is processed to find and draw the most prominent contours for the detected foreground objects. Then thresholding is performed, where pixels below a certain threshold value are assigned the value 0, and pixels with a greater value are assigned the maximum value of 255. This method is known as binary thresholding, as shown in equation (5):

$$dst(x, y) = \begin{cases} maxVal & \text{if } src(x, y) > thresh \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$
Here $src(x, y)$ is a source image pixel, $thresh$ is the threshold value used in binary thresholding and $dst(x, y)$ is the resulting image pixel. $maxVal$ is the value that the particular $src(x, y)$ pixel will obtain if its value exceeds the pre-assigned $thresh$ value.
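The differencing and thresholding steps above can be sketched together in NumPy (the function name and the threshold value of 30 are illustrative; OpenCV's `cv2.absdiff` and `cv2.threshold` would be the usual drop-in equivalents):

```python
import numpy as np

def binary_threshold_diff(frame_v, background_v, thresh=30, max_val=255):
    """Absolute difference of the V channel against the modelled
    background, followed by the binary thresholding of equation (5)."""
    # Work in a signed type so the subtraction of uint8 images cannot wrap.
    diff = np.abs(frame_v.astype(np.int16) - background_v.astype(np.int16))
    return np.where(diff > thresh, max_val, 0).astype(np.uint8)

# A flat background with a single bright "foreground" pixel
background_v = np.full((3, 3), 50, dtype=np.uint8)
frame_v = background_v.copy()
frame_v[1, 1] = 200
mask = binary_threshold_diff(frame_v, background_v, thresh=30)
```

Only the pixel whose difference from the modelled background exceeds the threshold survives in the binary mask; everything else is mapped to 0.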
The entirety of the steps performed in the proposed method is expressed in a flow diagram, as seen in Figure 2. The sequence of operations is shown in Figure 3.
Figure 2. Proposed method to segment moving objects
Figure 3. Sequence of operations: (a) difference image, (b) Gaussian blur applied, (c) thresholding, removal of noisy contours and opening operation, (d) final contours obtained after convex hull
3.3. Foreground modelling
After the thresholded frame is determined, we have a binary frame with blobs representing foreground objects. Morphological transformations such as dilation, erosion and opening are applied to reduce the merging of contours of different foreground objects.
The opening operation is used to eliminate portions of the foreground object that may extend out into the background. It is achieved by using the dilation and erosion operations, which augment and shrink a region respectively. We use a structuring element $S$, otherwise known as a kernel, to perform these operations. Dilation is used to expand the foreground object's obtained contours. The dilation of an image $B$ with $S$ is given by equation (6):

$$B \oplus S = \bigcup_{b \in B} S_b \qquad (6)$$
Erosion reduces the size of the foreground object's contour and is used to remove unwanted excess contour elements that may have extended into the background. Similar to dilation, the erosion of an image $B$ with structuring element $S$ is given below:

$$B \ominus S = \{\, b \mid b + s \in B \;\; \forall s \in S \,\} \qquad (7)$$
The opening operation is an erosion operation followed by a dilation operation, which is used to prevent the merging of contours of different objects and ultimately gives much better final bounding boxes for the foreground objects. It can be represented mathematically as in equation (8):

$$B \circ S = (B \ominus S) \oplus S \qquad (8)$$
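Equations (6)-(8) can be sketched directly on a binary mask. The version below uses a square k x k structuring element for brevity (the proposed method uses circular kernels, and in practice OpenCV's `cv2.erode`, `cv2.dilate` and `cv2.morphologyEx` would be used); it is a plain-NumPy illustration of the set definitions:

```python
import numpy as np

def dilate(mask, k=3):
    """Binary dilation with a k x k square structuring element: a pixel is
    set if ANY neighbour under the element is foreground (equation (6))."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="constant")
    out = np.zeros_like(mask)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def erode(mask, k=3):
    """Binary erosion: a pixel survives only if ALL neighbours under the
    structuring element are foreground (equation (7))."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="constant")
    out = np.ones_like(mask)
    for dy in range(k):
        for dx in range(k):
            out &= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def opening(mask, k=3):
    """Opening: erosion followed by dilation (equation (8))."""
    return dilate(erode(mask, k), k)

# Opening removes a lone noise pixel but restores a solid 3x3 blob intact.
mask = np.zeros((7, 7), dtype=np.uint8)
mask[0, 0] = 1            # isolated noise pixel
mask[2:5, 2:5] = 1        # solid foreground blob
cleaned = opening(mask)
```

The isolated pixel is eroded away and never comes back, while the solid blob shrinks to its centre under erosion and is grown back to its original footprint by the subsequent dilation, which is exactly the noise-removal behaviour the opening step relies on.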
These blobs are extracted as contours. Smaller blobs and contours that represent noise and other unwanted detail are eliminated, and properties like the edges, centres and areas of the final set of resulting contours are calculated. A convex hull of the contours is found to give a definitive shape to any incomplete contours that might have resulted from similarity of intensity values or illumination defects.
In order to get whole bounding boxes for foreground objects, there was a need to make the contours of the foreground objects more complete. To accomplish this, the convex hull operation is performed on the contours. The contours finally obtained after performing this operation are used to draw the bounding boxes for the detected foreground objects.
The convex hull of a finite set of points $S$ is the set of all convex combinations of its points. Each point $x_i$ in this set is attributed a weight $\alpha_i$. Every weight must be non-negative, and their sum must equal unity. These weights are used to obtain a weighted average of all the points in the set $S$. For each choice of coefficients, the resulting convex combination is a point in the convex hull; the entire convex hull may therefore be obtained by considering all the various combinations of weights. It can be expressed in a single equation, as shown in equation (9):

$$Conv(S) = \left\{ \sum_{i=1}^{|S|} \alpha_i x_i \;\middle|\; (\forall i: \alpha_i \geq 0) \wedge \sum_{i=1}^{|S|} \alpha_i = 1 \right\} \qquad (9)$$
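In practice the hull of equation (9) is computed not from the weight definition but with a combinatorial algorithm; the sketch below uses Andrew's monotone chain, one standard O(n log n) choice (the paper does not specify which algorithm is used; OpenCV's `cv2.convexHull` would be the drop-in equivalent on contour points):

```python
def convex_hull(points):
    """Andrew's monotone-chain convex hull. `points` is a list of (x, y)
    tuples; returns the hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:                       # build lower hull left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):             # build upper hull right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each half (it repeats the other half's start)
    return lower[:-1] + upper[:-1]

# The interior point (1, 1) is dropped; only the square's corners remain.
hull = convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])
```

This is what gives incomplete foreground contours a definitive shape: any point that is already a convex combination of the others, such as (1, 1) above, is discarded, and only the outer boundary is kept.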
The final blobs are returned as contours, and the bounding boxes for all these contours are obtained and stored in an array structure. Then redundant bounding boxes that occur inside other larger bounding boxes are eliminated. Finally, an iteration of fast non-max suppression is employed to merge multiple detections of the same object for improved final results; it uses the areas of the obtained boxes in addition to the overlapping percentage of neighbouring boxes. The final boxes in the array are then drawn onto the frames. The areas of these bounding boxes, along with their pixels, are compared with the bounding boxes and pixels of the ground-truth frames in order to estimate and analyse the performance.
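The fast non-max suppression step is described above only at a high level (box areas plus the overlap percentage of neighbouring boxes); the following is one plausible greedy sketch under those assumptions, with the overlap threshold chosen purely for illustration:

```python
import numpy as np

def non_max_suppression(boxes, overlap_thresh=0.3):
    """Greedy suppression of overlapping boxes, in the spirit of the fast
    NMS used in post-processing. Boxes are (x1, y1, x2, y2); overlap is
    measured against each remaining candidate box's own area."""
    if len(boxes) == 0:
        return []
    boxes = np.asarray(boxes, dtype=float)
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = np.argsort(y2)              # process bottom-most boxes first
    keep = []
    while order.size > 0:
        i = order[-1]
        keep.append(int(i))
        rest = order[:-1]
        # Intersection of box i with every remaining box
        xx1 = np.maximum(x1[i], x1[rest])
        yy1 = np.maximum(y1[i], y1[rest])
        xx2 = np.minimum(x2[i], x2[rest])
        yy2 = np.minimum(y2[i], y2[rest])
        w = np.maximum(0, xx2 - xx1 + 1)
        h = np.maximum(0, yy2 - yy1 + 1)
        overlap = (w * h) / area[rest]
        order = rest[overlap <= overlap_thresh]
    return keep

# Two near-duplicate detections collapse to one; the distant box survives.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (200, 200, 240, 240)]
kept = non_max_suppression(boxes, overlap_thresh=0.3)
```

The key property is that the whole inner overlap computation is vectorised over the remaining boxes, which is what makes this formulation "fast" compared with a pairwise loop.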
4. EXPERIMENTAL SETUP AND RESULT ANALYSIS
The dataset used for performance evaluation is CDnet (Change Detection), which consists of 31 videos depicting indoor and outdoor scenes with boats, cars, trucks and pedestrians, captured in different scenarios and containing a range of challenges. Pedestrians and Highway from the baseline category and Turnpike from the low frame rate category of the 2014 CDW datasets have been used. The validation metrics used to compare the segmented result with the corresponding ground truth for each frame in the video sequence are:
(a) True Negative (TN): pixels correctly classified as background
(b) True Positive (TP): pixels correctly classified as foreground
(c) False Positive (FP): pixels wrongly classified as foreground
(d) False Negative (FN): pixels wrongly classified as background
The various performance metrics used are shown in equations (10) to (17) below:
$$Precision\;(P) = \frac{TP}{FP + TP} \qquad (10)$$

$$Recall\;(R) = \frac{TP}{FN + TP} \qquad (11)$$

$$Specificity = \frac{TN}{FP + TN} \qquad (12)$$

$$False\;Negative\;Rate = \frac{FN}{FN + TP} \qquad (13)$$

$$False\;Positive\;Rate = \frac{FP}{FP + TN} \qquad (14)$$

$$PWC = \frac{FP + FN}{TN + TP + FP + FN} \cdot 100 \qquad (15)$$

$$F\text{-}Measure = \frac{2 \cdot R \cdot P}{R + P} \qquad (16)$$

$$Accuracy = \frac{TN + TP}{TN + TP + FN + FP} \qquad (17)$$
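Equations (10)-(17) reduce to a few lines of arithmetic on the four pixel counts; the helper below (its name and the example counts are illustrative) computes all eight metrics at once:

```python
def performance_metrics(tp, fp, fn, tn):
    """Frame-based metrics of equations (10)-(17), computed from the
    TP/FP/FN/TN pixel counts of one segmented frame."""
    return {
        "precision":   tp / (fp + tp),                      # (10)
        "recall":      tp / (fn + tp),                      # (11)
        "specificity": tn / (fp + tn),                      # (12)
        "fnr":         fn / (fn + tp),                      # (13)
        "fpr":         fp / (fp + tn),                      # (14)
        "pwc":         100.0 * (fp + fn) / (tn + tp + fp + fn),  # (15)
        "f_measure":   2 * tp / (2 * tp + fp + fn),         # (16), = 2RP/(R+P)
        "accuracy":    (tn + tp) / (tn + tp + fn + fp),     # (17)
    }

# Example counts for a hypothetical 1000-pixel frame
m = performance_metrics(tp=80, fp=20, fn=10, tn=890)
```

Note that the F-measure is written in its equivalent count form 2TP/(2TP + FP + FN), which is algebraically identical to 2RP/(R + P) in equation (16).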
Table 1 shows the performance comparison of the proposed system against the improved adaptive Gaussian mixture model [15] on three datasets, namely Highway, Turnpike and Pedestrians.
Table 1. Performance evaluation of the proposed system against the improved adaptive Gaussian mixture model and the hybrid model

| Metric | Highway Proposed | Highway Zivkovic [15] | Highway Hybrid | Pedestrians Proposed | Pedestrians Zivkovic | Pedestrians Hybrid | Turnpike Proposed | Turnpike Zivkovic |
|---|---|---|---|---|---|---|---|---|
| Recall | 0.7387 | 0.9619 | 0.9152 | 0.6594 | 0.9860 | 0.7290 | 0.9259 | 0.9649 |
| Specificity | 0.9982 | 0.9272 | 0.9314 | 0.9988 | 0.9613 | 0.9921 | 0.9868 | 0.9695 |
| FPR | 0.0137 | 0.5682 | 0.5391 | 0.0216 | 0.6804 | 0.1384 | 0.0724 | 0.1678 |
| FNR | 0.0334 | 0.0049 | 0.0118 | 0.0194 | 0.0008 | 0.0154 | 0.0134 | 0.0064 |
| PWC | 3.1237 | 6.8897 | 7.0895 | 1.9496 | 3.7379 | 2.2092 | 2.2525 | 3.1197 |
| Precision | 0.9817 | 0.6286 | 0.6293 | 0.9682 | 0.5917 | 0.8404 | 0.9275 | 0.8519 |
| F-Measure | 0.8430 | 0.7603 | 0.7453 | 0.7845 | 0.7396 | 0.781 | 0.9267 | 0.9049 |
| Accuracy | 0.9688 | 0.9311 | 0.9288 | 0.9792 | 0.6258 | 0.9767 | 0.9775 | 0.9688 |
As indicated in Table 1, the proposed method was found to be effective, yielding a better accuracy of 96.88% and a precision of 98.17%. It also has a very low false positive rate and false negative rate for detecting moving objects in videos when compared to the de facto standard, the improved adaptive Gaussian mixture model, on the Highway dataset from changedetection.net. Snapshots obtained with the proposed system and the adaptive Gaussian mixture model for the three datasets are shown in Figures 4, 5 and 6.
Table 1 also shows a comparison of the proposed system with another existing method, namely the multi-modal hybrid approach of the adaptive Gaussian mixture model and mean averaging. The hybrid model used for comparison can model and track moving objects in a video, and it works as follows. In order to smoothen the extracted frames, a sequence of smoothing filters is applied, these being Gaussian blur and median blur, respectively. The approach taken to reduce noise uses the morphological operations of erosion and dilation. Mean averaging is used for background modelling, and frame differencing along with the adaptive Gaussian mixture model is used to obtain foreground masks. Contours are found from the foreground masks, on which the convex hull is applied to get the final object blobs. This hybrid model is able to detect and track moving objects in videos in real time and has been tested on many outdoor scenes, and snapshots of the obtained results follow the conclusion section. No comparison has been made on the Turnpike dataset for the hybrid model, as it was not designed for low frame rate videos, and it has therefore not been included in that part of the table.
As evident from Table 1, the proposed system also performs well when compared to the hybrid method. It effectively reduces noise and is able to segment the exact ROI of moving objects. This is achieved by Gaussian blur and the removal of small contours that lead to noise, and by applying the opening morphological operation. This isolates the contours of different bounding boxes even if the distance between the objects is small, thereby handling occlusion to an extent.
Figure 4. Results for Turnpike dataset: (a) original input, (b) ground truth, (c) contours obtained by MoG2, (d) contours obtained by proposed model, (e) bounding boxes from ground truth, (f) bounding boxes from MoG2, (g) bounding boxes from proposed model
Figure 5. Results for Turnpike dataset: (a) original input, (b) ground truth, (c) contours obtained by MoG2, (d) contours obtained by proposed model, (e) bounding boxes from ground truth, (f) bounding boxes from MoG2, (g) bounding boxes from proposed model
Figure 6. Results for Turnpike dataset: (a) original input, (b) ground truth, (c) contours obtained by MoG2, (d) contours obtained by proposed model, (e) bounding boxes from ground truth, (f) bounding boxes from MoG2, (g) bounding boxes from proposed model
Figures 4, 5 and 6 show a comparison of the working of our proposed model against the improved adaptive Gaussian mixture model and the hybrid model. Each figure consists of a set of 7 sub-figures, which summarize the results obtained on the different datasets that have been used. The first sub-figure is the input frame from the original dataset, as is the following ground truth sub-figure. The next two sub-figures are the blobs obtained by the improved adaptive Gaussian mixture model and by our own method, respectively. The final three sub-figures are as their captions suggest: essentially, they are the bounding boxes that have been obtained for the corresponding blobs, drawn onto the original input frame.
5. CONCLUSION
The proposed system was found to be an effective approach for capturing small and large movements of moving objects, and it extracts well-defined foreground objects. Exact regions of interest were extracted, and the system yielded better accuracy when compared to the state-of-the-art mixture of Gaussians method and the relatively new hybrid approach of mean averaging and mixture of Gaussians when it comes to issues such as noise, and much better contours when considering individual and multiple objects. Any noise due to flickering of frames, or noise added to the camera feed, is effectively prevented from being included in the foreground. The merging of foreground objects that might take place due to occlusion of multiple foreground objects has been avoided to a large extent using morphological transformations. The proposed model is a lightweight model which can perform background subtraction in real time on machines with very basic processing power. Future enhancements could include shadow detection and better splitting of the contours of objects that are totally occluded.
REFERENCES
[1] Arun Varghese, Sreelekha G, "Background Subtraction for Vehicle Detection," Proceedings of Global Conference on Communication Technologies 2015 (GCCT 2015), pp. 380-382, 2015.
[2] Azra Nasreen, Kaushik Roy, Kunal Roy, Shobha G, "Key Frame Extraction and Foreground Modelling Using K-Means Clustering," 7th International Conference on Computational Intelligence, Communication Systems and Networks (CICSyN), pp. 141-145, 2015.
[3] Massimo Piccardi, "Background Subtraction Techniques: A Review," IEEE International Journal on Systems, Man and Cybernetics, Vol. 2, (5), pp. 05-25, 2004.
[4] T. Ko, S. Soatto, D. Estrin, "Background Subtraction on Distributions," European Conference on Computer Vision (ECCV 2008), pp. 222-230, October 2008.
[5] M. Himmelsbach, U. Knauer, F. Winkler, F. Zautke, K. Bienefeld, B. Meffert, "Application of an Adaptive Background Model for Monitoring Honeybees," VIIP 2005, 2005.
[6] Q. Ling, J. Yan, F. Li, Y. Zhang, "A Background Modelling and Foreground Segmentation Approach Based on the Feedback of Moving Objects in Traffic Surveillance Systems," Neurocomputing, 2014.
[7] Rahul M Patil, N R Vinay, Rohith Y, Ram Srinivas, Pratiba D, "IoT Enabled Video Surveillance System using Raspberry Pi," 2nd Conference on Computational Systems and Information Technology for Sustainable Solutions (CSITSS 2017), December 2017.
[8] S. Chakraborty, M. Paul, M. Murshed, M. Ali, "An Efficient Video Coding Technique Using a Novel Non-parametric Background Model," IEEE International Conference on Multimedia and Expo Workshops (ICMEW 2014), pp. 1-6, July 2014.
[9] X. Zhang, Y. Tian, T. Huang, W. Gao, "Low-complexity and High-efficiency Background Modelling for Surveillance Video Coding," IEEE International Conference on Visual Communication and Image Processing (VCIP 2012), San Jose, USA, November 2012.
[10] T. Bouwmans, "Traditional and Recent Approaches in Background Modelling for Foreground Detection: An Overview," Computer Science Review, 2014.
[11] Gourav Takhar, Chandra Prakash, Namita Mittal, Rajesh Kumar, "Comparative Analysis of Background Subtraction Techniques and Applications," IEEE International Conference on Recent Advances and Innovations in Engineering (ICRAIE-2016), pp. 1-8, 2016.
[12] Pedro F. Felzenszwalb, Ross B. Girshick, David McAllester, Deva Ramanan, "Object Detection with Discriminatively Trained Part Based Models," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, (9), pp. 1627-1645, 2010.
[13] Linda Shapiro et al., editors, "Computer Vision," Illustrated, Oxford UP, Prentice Hall, 2001.
[14] Zezhi Chen, Tim Ellis, "A Self-Adaptive Gaussian Mixture Model," International Journal of Elsevier Computer Vision and Image Understanding, Vol. 122, (3), pp. 35-46, 2014.
[15] Zivkovic Z, "Improved Adaptive Gaussian Mixture Model for Background Subtraction," Proceedings of International Conference on Pattern Recognition (ICPR), Moscow, pp. 28-31, 2004.
[16] Zivkovic Z, "Efficient Adaptive Density Estimation per Image Pixel for the Task of Background Subtraction," Pattern Recognition Letters, Vol. 27, (7), pp. 773-780, 2006.
[17] N. Goyette, P.-M. Jodoin, F. Porikli, J. Konrad, and P. Ishwar, http://changedetection.net, Proc. IEEE Workshop on Change Detection (CDW-2012) at CVPR-2012, Providence, RI, June 2012.
[18] Faisal Bashir, Fatih Porikli, "Performance Evaluation of Object Detection and Tracking Systems," TR2006-041, Mitsubishi Electric Research Laboratories, 2016.
[19] Haixia Wang, Li Shi, "Foreground Model for Background Subtraction with Blind Updating," IEEE International Conference on Signal and Image Processing, pp. 74-78, 2016.
[20] A B Godbehere, Matsukawa A, Goldberg K, "Visual Tracking of Human Visitors Under Variable Lighting Conditions for a Responsive Audio Art Installation," American Control Conference (ACC), pp. 4305-4312, June 2012.
[21] Yu Yuan, Feng D, Yuzhuo Zhong, "Fast Adaptive Variable Frame Rate Coding," IEEE Vehicular Technology Conference, Vol. 5, pp. 2734-2738, May 2004.
[22] Guarangnella C, Di Sciasco E, "Variable Frame Rate for Very Low Bit Rate Video Coding," 10th Mediterranean Electrotechnical Conference, Vol. 2, pp. 503-506, 2000.
[23] Zheng Yi, Fan Liangzhong, "Moving Object Detection Based on Running Average Background and Temporal Difference," Proceedings of International Conference on Intelligent Systems and Knowledge Engineering (ISKE), Taiwan, pp. 270-272, 2010.
[24] Vinod M, Sravanthi T, Brahma Reddy, "An Adaptive Algorithm for Object Tracking and Counting," International Journal of Engineering and Innovative Technology (IJEIT), Vol. 2, (4), pp. 560-585, 2012.