International Journal of Electrical and Computer Engineering (IJECE)
Vol. 10, No. 4, August 2020, pp. 3576~3587
ISSN: 2088-8708, DOI: 10.11591/ijece.v10i4.pp3576-3587
Journal homepage: http://ijece.iaescore.com/index.php/IJECE
Deep-learning based single object tracker for night surveillance
Zulaikha Kadim1, Mohd Asyraf Zulkifley2, Nabilah Hamzah3

1,2Department of Electrical, Electronic and Systems Engineering, Faculty of Engineering and Built Environment, Universiti Kebangsaan Malaysia, Malaysia
1MIMOS Berhad, Technology Park Malaysia, Malaysia
3Faculty of Electrical Engineering, Universiti Teknologi Mara (UiTM), Malaysia
Article Info

Article history:
Received Aug 23, 2019
Revised Jan 29, 2020
Accepted Feb 7, 2020

ABSTRACT
Tracking an object in night surveillance video is a challenging task, as the quality of the captured image is normally poor, with low brightness and contrast. The task becomes harder for a small object, as fewer features are apparent. The traditional approach is based on improving the image quality before tracking is performed. In this paper, a single object tracking algorithm based on a deep-learning approach is proposed to exploit its outstanding capability of modelling the object's appearance even during the night. The algorithm uses pre-trained convolutional neural networks coupled with fully connected layers, which are trained online during the tracking so that it is able to cater for appearance changes as the object moves around. Various learning hyperparameters for the optimization function, learning rate and ratio of training samples are tested to find the optimal setup for tracking in night scenarios. Fourteen night surveillance videos, captured from three viewing angles, are collected for validation purposes. The results show that the best accuracy is obtained by using the Adam optimizer with a learning rate of 0.00075 and a sampling ratio of 2:1 for positive and negative training data. This algorithm is suitable to be implemented in higher-level surveillance applications such as abnormal behavioral recognition.
Keywords:
Deep-learning object tracker
Night surveillance video
Visual object tracking
Copyright © 2020 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Zulaikha Kadim,
Department of Electrical, Electronic and Systems Engineering,
Faculty of Engineering and Built Environment,
Universiti Kebangsaan Malaysia, Bangi 43650, Malaysia.
Email: zulaikha.kadim@mimos.my
1. INTRODUCTION
The role of video surveillance is to provide a protective means through monitoring and analyzing any abnormality in the scene. Nowadays, it is becoming more important with the ever-increasing number of crimes. Crime can take place anytime throughout the day, but it is more prevalent during the night time, especially after midnight. With the application of an automated video surveillance system, continuous monitoring service can be provided 24/7 with minimal dependency on the security officer.
In the past decades, research in automated video surveillance applications has evolved tremendously, and many significant progresses can be observed through the availability of many commercial products in the market. Thanks to the new breakthroughs in software technology, it has become more effective and affordable. The key technology in the effectiveness of these systems is the ability to detect and track the moving object even in dark environments, especially during the night. Both object detection and object tracking are fundamental components in an automated video surveillance application. The object detection task is to detect the presence of the object of interest in the video frame, while object tracking connects and analyses the object movements across the successive video frames. The information derived from the tracker can be used to further analyze and deduce object activities
in the video. Many researches have been done on these two topics; however, most of them focus on the bright environment, with little emphasis on the dark environment. Object tracking for night surveillance is a very challenging task, mainly due to the low information captured by normal RGB cameras. The captured images have low brightness, low contrast and nearly no distinguishable color information. It is worse if the object is small in size, caused by the far distance from the camera [1, 2].
Although most of the recent cameras are equipped with night vision technology to improve image quality in low-light conditions, the image quality is still no match compared to the daytime image. In some cases, a thermal infrared camera is used for night surveillance [3, 4], but this type of camera is relatively costly. Hence, night surveillance is normally performed using a day/night CCTV camera with the addition of an IR filter and IR illuminator for better night vision.
These days, deep-learning study has become a center of attention among researchers in diverse fields that include object detection, classification, facial and speech recognition, rehabilitation, machine translation, etc. [5-11]. Deep-learning is a subfield of machine learning that was inspired by the human brain's structure called the neuron, which can be adapted to learn complex relationships [5] and can be extended to multi-layer networks for non-linear problems.
There are many types of deep-learning architecture, i.e. Convolutional Neural Network (CNN), Generative Adversarial Network (GAN), Recurrent Neural Networks (RNN), etc. Among all of them, CNN is the most widely used architecture, especially in the computer vision field for object detection, recognition and tracking.
The CNN architecture was devised by Yann LeCun in 1998 [7], where the feature extractor is also trained instead of hand-crafted. Figure 1 shows an example of a basic CNN structure [12] that consists of two convolutional layers, two pooling layers, one fully connected layer and one output layer that defines the final classification according to the number of classes. The convolutional layers in a CNN act as detection filters to extract specific features or patterns that are present in the image. The addition of a new layer will increase the complexity, thus allowing it to capture more abstract features.
Figure 1. An example of basic CNN network [12]
Due to the CNN capability, this paper proposes a method of online tracking of an object of interest for night surveillance applications through a deep-learning approach. A network with 3 convolutional layers and 3 fully connected layers is used to model the object appearance, as proposed in [13]. The fully-connected layers will be updated online to cater for the changes in the target object appearance as it moves around the scene under different lighting conditions. Various hyperparameters for online learning are experimented with, which include the selection of optimization algorithms, online learning rates and training sample ratio, to find the optimal tracker setup. The main contributions of this work are:
- Online target tracking framework for night surveillance video that utilizes a deep-learning approach to dynamically represent the target appearance model.
- Research on the impact of optimal online learning hyperparameters for the best overall tracking accuracy.
The remainder of this paper is organized as follows: Section 2 discusses some related works on visual object tracking. Section 3 describes the proposed method, followed by experimental results and discussion in Section 4. Finally, Section 5 concludes all the research findings.
2. RELATED WORKS
This section will discuss the general approaches to visual object tracking, followed by specialized trackers for night surveillance applications and the evolution of object tracking algorithms towards the deep-learning approach. A good object tracker is defined as an algorithm that is capable of providing accurate object localization with a consistent object tracking label across successive frames. Object tracking studies have been an active research field for the past several decades, and have demonstrated good progress in different
scenarios and applications. Most of the tracking algorithms are based on the tracking-by-detection paradigm, whereby the object of interest is detected in every frame, which will be used to update the tracking states of the object. This approach is heavily dependent on the detection accuracy. Thus, an improvement in the detection algorithm will lead to better tracking accuracy accordingly. Among others, some good tracking-by-detection algorithms are presented in [14-20]. Some of these tracking approaches are able to function well under good lighting conditions; however, their performance deteriorates as the environment becomes darker, such as in night surveillance applications.
Previously, one of the common approaches to improve tracking performance for night surveillance was to introduce a preprocessing module to enhance image quality for the case of underexposed and low contrast environments. Some examples of the preprocessing step are histogram equalization, histogram specification and intensity mapping.
Another approach is through analyzing the contrast level so that object detection will be improved before tracking is performed. This is based on the assumption that the human visual system is dependent on the neighbourhood spatial relation to its background.
Huang et al. [1] used contrast change information between successive frames to improve object detection accuracy in the night video application. Local contrast is computed by dividing the local standard deviation of image intensity with the local mean intensity. Then, the object is detected by thresholding the contrast change between the successive frames. The computation is quite fast, but the local contrast information used to indicate the presence of the object of interest might be misleading, as the background information itself may contain high local contrast. On the other hand, the object might have an almost similar appearance that produces low local contrast.
Later in [21], Huang et al. proposed motion prediction and spatial nearest neighbour data association to further suppress the false detections. In [2], Wang et al. improved Huang's CC model by introducing salient contrast change (SCC), which involves two more steps: online learning and analyzing the detected object trajectories. By applying a threshold on the contrast change output, it is more sensitive to slight changes in the lighting level. Thus, Nazib et al. [22] multiplied Shannon's entropy estimation with their own contrast estimation to produce an illumination invariant representation.
In [23], vehicles in night surveillance videos are detected by computing HOG features as input to a support vector machine (SVM) that classifies the detected object either as a vehicle or not, before a Kalman filter is applied to track the vehicles.
Apart from the previously mentioned approaches, there are also a few researches that have exploited camera technology to increase the detection and tracking accuracy in the night environment. In [24], the researchers used far-infrared cameras to obtain the foreground information through a background subtraction technique. In [25], the researchers used a near infrared camera to detect pedestrians using an adaptive preprocessing technique for the night environment. Another research in [26] used a fusion of two different types of camera, a light visible camera and a FIR camera, mounted on a car to detect pedestrians during the day and night times. Even with the help from improved camera technology, the total cost of the systems has risen because of more complex sensing hardware.
Deep learning has been popularized by the introduction of AlexNet in 2012, when it won the ImageNet competition for the image classification task [27]. Ever since, deep learning has been widely applied in many applications, overshadowing the other traditional machine learning approaches such as SVM and artificial neural network (ANN).
In [28], a CNN is used to detect human presence in night surveillance videos as an input to an object tracker. Their proposed network consists of five convolutional layers and 3 fully connected layers. The input image is first resized to 183x119, before histogram equalization is applied for the human detection task. The proposed method is closely related to human/background classification in night scenes rather than the tracking problem.
Another early effort in applying CNN to object tracking is proposed in [29], where an online tracking framework based on multi-domain representations is proposed. Its architecture consists of multiple shared layers that they refer to as domain independent layers, where only the classification layer is defined as the domain-specific one. The shared layers are trained offline using multiple annotated video sequences, while the classification layer is trained separately based on each domain. When a new sequence or domain is given, a new classification layer will be constructed to compute the target score based on the new input. Then, the fully-connected layers within the shared layers and the new classification layer will be updated periodically so that they are adapted to the new domain.
In [30], multiple CNNs in TCNN are maintained in a tree structure to represent the multi-modal target appearance. It will update the CNN models in the branches which have the most similar appearance to the current target estimation.
In [3, 13], a general tracking framework for thermal infrared videos has been proposed. Thermal images exhibit similar properties to night surveillance images, where the target object usually consists of low contrast information and negligible textures. In [3], multiple models are maintained to represent the target appearance in different cases, such as the case of temporary occlusion. During network updates, parent nodes will be replaced by the new node so that there is no redundancy in the pool of target object appearance models. In [13], a Siamese approach is utilized in which pairs of patches are compared to find the most likely location of the target object in the current frame.
3. METHODOLOGY
3.1. Tracker workflow
Figure 2 illustrates the overall workflow of the proposed tracking methodology. In the first frame, the tracker is initialized using a single ground truth bounding box that encloses the object. Positive and negative candidates are then generated using the given bounding box. Positive samples correspond to the patches or subimages that represent the object of interest, while negative samples correspond to subimages that belong to the background.
Let nt and mt be the number of positive and negative training samples, respectively. Positive training data are generated by randomly shifting the initial bounding box within a small distance (the shifted patch should have at least 80% overlap area with respect to the original bounding box), and negative samples are generated by randomly shifting the initial bounding box such that they have minimal overlap area (at most 10% overlap with respect to the initial bounding box).
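The shifting rule above can be sketched with simple rejection sampling. This is only an illustration, not the authors' code: the helper names are hypothetical, and the 80%/10% overlap thresholds are approximated here with intersection-over-union on randomly shifted boxes.

```python
import random

random.seed(0)

def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)

def shifted_samples(box, n, min_iou=0.0, max_iou=1.0, max_shift=5, tries=5000):
    """Randomly shift `box`, keeping shifts whose overlap lies in [min_iou, max_iou]."""
    x, y, w, h = box
    out = []
    for _ in range(tries):
        if len(out) == n:
            break
        dx = random.randint(-max_shift, max_shift)
        dy = random.randint(-max_shift, max_shift)
        cand = (x + dx, y + dy, w, h)
        if min_iou <= iou(box, cand) <= max_iou:
            out.append(cand)
    return out

init_box = (100, 100, 30, 70)   # a ~30x70 target, as in the test videos
positives = shifted_samples(init_box, 50, min_iou=0.8, max_shift=3)
negatives = shifted_samples(init_box, 100, max_iou=0.1, max_shift=100)
```

Positive shifts stay within a few pixels so the overlap stays high, while negative shifts wander far enough that at most a small sliver of the target remains in the patch.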
After generating all the training samples, appearance features will be extracted using the CNN networks to produce a feature vector of length 512. Both sets of positive and negative feature vectors are then used to train the rest of the fully connected layers, which will result in the trained model.
During online tracking, the process starts by generating the possible candidate sample locations pivoted on the last known location of the object. The total number of samples extracted is smaller compared to the training samples to speed up the tracking process. The features are then extracted and tested using the trained network. The network outputs are the probabilities that the patch belongs to the foreground object and to the background. The locations of the n samples with the highest foreground probabilities will then be used to update the estimated location of the tracked object in the current input frame.
Finally, the network is retrained or updated periodically to capture the changes in the object's appearance as it moves around the scenes under different lighting exposure and background.
Figure 2. Overall tracking flow
3.2. Network architecture
The network architecture consists of three convolutional layers and three fully connected (FC) layers. The weights and biases of the first three convolutional layers are obtained from VGG-M [31], which has been trained on the ImageNet dataset [32]. VGG-M is an eight-layer network where the first five layers are convolutional layers, which function as the feature extractor, and the last three layers are the dense FC layers. The original input size of VGG-M is 224x224. However, the proposed network uses only the first three convolutional layers with an input size of 75x75. Thus, all training and testing samples are resized to match the corresponding input size. The full network architecture used in this work is illustrated in Figure 3.
The first CNN layer consists of 96 filters with a 7x7 kernel. The stride step is 2 in the x and y directions, followed by a ReLU activation function, local response normalization and 3x3 maximum pooling to produce feature maps of size 17x17x96. The second convolutional layer consists of 256 different filters of kernel size 5x5, which is then followed by a ReLU activation function, local response normalization and 3x3 maximum pooling to produce 3x3x256 feature maps. Finally, the third layer consists of 512 filters of kernel size 3x3, which will produce feature maps of 1x1x512.
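The quoted feature-map sizes follow from simple valid-convolution arithmetic. The sketch below is only a shape check, assuming (as in VGG-M) that conv2 also uses stride 2 and that all pooling steps use stride 2; none of this code is from the paper.

```python
def out_size(size, kernel, stride):
    """Spatial output size of a valid (no-padding) convolution or pooling step."""
    return (size - kernel) // stride + 1

s = 75                   # 75x75 network input
s = out_size(s, 7, 2)    # conv1: 96 filters, 7x7, stride 2             -> 35
s = out_size(s, 3, 2)    # 3x3 max pooling                              -> 17 (17x17x96)
s = out_size(s, 5, 2)    # conv2: 256 filters, 5x5 (stride 2 assumed)   -> 7
s = out_size(s, 3, 2)    # 3x3 max pooling                              -> 3  (3x3x256)
s = out_size(s, 3, 1)    # conv3: 512 filters, 3x3                      -> 1  (1x1x512)
```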
Both positive and negative extracted feature vectors are then used to train the three FC layers. The final outputs from the last softmax layer are the two probabilities that represent the likelihood that the input image patch belongs to the tracked object and the likelihood that the input image patch belongs to the background.
Initially, all FC parameters are randomly initialized. In this work, three different optimization algorithms are experimented with to train the FC layers: Gradient Descent [33], Adam [34] and Adagrad [35], with four different learning rates: 0.00125, 0.001, 0.00075 and 0.0005.
Figure 3. Network architecture of the proposed tracking algorithm
3.3. Network learning parameters
In this work, only the last three FC layers will undergo retraining so that the network is adapted to the changes in the object appearance. In the first frame, the weights of these layers are randomly initialized, while the biases are fixed to 0.05. Learning parameter values for positive samples, negative samples, initial learning rate and number of epochs are set to 500, 1000, 0.0005 and 150, respectively.
The cross entropy (1) loss function is used to train the network, where p is the true label, q is the predicted probability and x indexes the output classes. Since the network outputs are a set of two probabilities, the probability that the sample is foreground and the probability that it is background, the value of x is two and the probabilities of each sample sum to 1.
Now, let the true label be p(x=0) = y and p(x=1) = 1 - y, and the predicted probability be q(x=0) = \hat{y} and q(x=1) = 1 - \hat{y}. The loss function is then computed by taking the average cross entropy of all N input samples (3).

Cross entropy,

H(p, q) = -\sum_{x} p(x) \log q(x)   (1)

H(p, q) = -y \log \hat{y} - (1 - y) \log (1 - \hat{y})   (2)

Loss function,

J = \frac{1}{N} \sum_{i=1}^{N} H(p_i, q_i) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]   (3)
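As a concrete check of (3), the average binary cross entropy can be computed directly; this snippet is illustrative only and uses hypothetical names.

```python
import math

def average_cross_entropy(y_true, y_pred):
    """Loss (3): mean of the per-sample cross entropies (2) over N samples."""
    return -sum(y * math.log(q) + (1 - y) * math.log(1 - q)
                for y, q in zip(y_true, y_pred)) / len(y_true)

# Two foreground patches and one background patch, with predicted
# foreground probabilities 0.9, 0.8 and 0.2:
loss = average_cross_entropy([1, 1, 0], [0.9, 0.8, 0.2])   # ~0.1839
```

A confident correct prediction contributes almost nothing to the loss, while a confident wrong one dominates it, which is what drives the FC layers toward separating foreground from background.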
During online learning, the number of training epochs is reduced to 75, while the other two parameters, the learning rate and the number of positive and negative samples, vary according to the best setup. Three different optimizers, stochastic gradient descent, Adagrad and Adam (adaptive moment estimation), are compared to find the optimal values of the model parameters (weights and biases) by minimizing the loss function.
3.3.1. Optimizer #1: Stochastic gradient descent (SGD)
Gradient descent [33] is a popular optimization technique and it has been widely used in network learning [28, 29, 36]. At a time step t, the gradient descent algorithm computes the gradient of the loss function with respect to the model parameters, where the resultant value is used to update the network. The gradient is a vector of partial derivatives of the loss function with respect to every weight and bias over the training samples. Then, each of the weights and biases is updated by subtracting from the previous value the multiplication of the learning rate with the calculated gradient (5), (6). The process will be repeated until the loss function is minimized (converges) or the maximum number of epochs is reached. One iteration of gradient descent on
one parameter is summarized as follows. The gradient of the loss function with respect to parameter i at time step t is calculated as:

g_{t,i} = \frac{1}{N} \sum_{n=1}^{N} \nabla_{\theta_i} J_n(\theta_{t-1,i})   (4)
where N is the number of training samples. Then the weight and bias of parameter i for time step t are calculated as in the gradient step below:

w_{t,i} = w_{t-1,i} - \eta \, g_{t,i}   (5)

b_{t,i} = b_{t-1,i} - \eta \, g_{t,i}   (6)

where \eta is the learning rate. Note that the same learning rate is applied to all parameter updates.
One gradient descent operation consists of one iteration over all training samples. This is different from stochastic gradient descent, whereby instead of taking the whole set of training samples, it randomly selects a few training samples in each iteration to optimize the model parameters. This makes SGD computationally effective and popular for online network training. Nevertheless, since SGD uses only a few training samples, the path to convergence will be noisy.
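The mini-batch variant described above can be sketched on a toy one-parameter problem; the function and the problem are illustrative, not the tracker's actual training loop.

```python
import random

random.seed(42)

def sgd(theta, samples, grad_fn, lr=0.01, batch_size=8, steps=500):
    """Stochastic gradient descent: each step averages grad_fn over a random
    batch, then applies theta_t = theta_{t-1} - lr * gradient, as in (5)."""
    for _ in range(steps):
        batch = random.sample(samples, batch_size)
        g = sum(grad_fn(theta, s) for s in batch) / batch_size
        theta -= lr * g
    return theta

# Toy loss per sample: (theta - s)^2, so the gradient is 2*(theta - s);
# the minimizer of the average loss is the sample mean (here, around 5).
data = [random.gauss(5.0, 1.0) for _ in range(200)]
theta = sgd(0.0, data, lambda t, s: 2 * (t - s))
```

The estimate lands near 5 but wobbles from step to step, which is exactly the noisy convergence path noted above.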
3.3.2. Optimizer #2: Adam (adaptive moment estimation)
The Adam [34] optimizer stands for adaptive moment estimation. It computes a different learning rate for different parameters by using the estimates of the first and second order moments of the gradient. The first and second order moments are the moving average and the uncentered moving variance, as shown in (7) and (8). It introduces three more hyperparameters compared to the gradient step in SGD, which are β1, β2 and ε; these correspond to the exponential decay rate for the first order moment, the exponential decay rate for the second order moment and a very small constant to prevent division by zero, respectively.
1st order moment (moving average) of parameter i for time step t,

m_{t,i} = \beta_1 m_{t-1,i} + (1 - \beta_1) g_{t,i}   (7)

2nd order moment (uncentered variance) of parameter i for time step t,

v_{t,i} = \beta_2 v_{t-1,i} + (1 - \beta_2) g_{t,i}^2   (8)
Estimates of these moments will be bias-corrected before they are used to update the model parameters. This step is important to ensure that the first and second order moments are not biased towards zero, as the initial values m_0 and v_0 are set to zero. The bias-corrected first and second order moments are calculated as below.

Bias-corrected 1st order moment of parameter i for time step t,

\hat{m}_{t,i} = \frac{m_{t,i}}{1 - \beta_1^t}   (9)

Bias-corrected 2nd order moment of parameter i for time step t,

\hat{v}_{t,i} = \frac{v_{t,i}}{1 - \beta_2^t}   (10)
After estimating the moments, the model parameter is updated as in (11) and (12). Note that the learning rate is now multiplied by the ratio of the first and second order moments of the gradients, where \eta is the learning rate and \epsilon is a very small number to prevent division by zero. Updated weight and bias of parameter i for time step t,

w_{t,i} = w_{t-1,i} - \eta \frac{\hat{m}_{t,i}}{\sqrt{\hat{v}_{t,i}} + \epsilon}   (11)

b_{t,i} = b_{t-1,i} - \eta \frac{\hat{m}_{t,i}}{\sqrt{\hat{v}_{t,i}} + \epsilon}   (12)
Since its first introduction in 2015, the Adam optimizer has been widely used in network learning [37]. It has a fast convergence rate and is thus practical for training a large model with large training samples.
3.3.3. Optimizer #3: Adagrad
The AdaGrad [35] optimizer is a gradient-based learning algorithm, but it computes different learning rates for different parameters. AdaGrad performs small updates on the parameters that are associated with frequently occurring features, while it performs big updates on the parameters that are associated with infrequently occurring features. This is achieved by AdaGrad through modifying the general learning rate in (5), based on the past gradients of the parameter i. The gradient step in AdaGrad becomes:
w_{t,i} = w_{t-1,i} - \frac{\eta}{\sqrt{G_{t,i} + \epsilon}} \, g_{t,i}   (13)

where G_{t,i} is the accumulated sum of the squares of the previous gradients with respect to the parameter i up to time step t.

G_{t,i} = \sum_{k=1}^{t} g_{k,i}^2   (14)
Note that since the squared gradient values are all positive, the accumulated sum G_{t,i} will keep increasing during the training process, which will cause the learning rate in (13) to shrink and eventually become infinitesimally small. At this point, the optimizer is not able to learn any new knowledge.
Despite this weakness, AdaGrad still performs better compared to SGD, as the learning rate is not manually fine-tuned. AdaGrad has been used at Google [38] to train large neural networks to recognize cats in YouTube videos. It is also used in [39] to train GloVe word embeddings, as infrequent words require much larger updates compared to the frequent ones.
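The accumulation in (14) and the shrinking step in (13) can be seen directly on a single parameter; the sketch is illustrative and the symbols follow the equations above.

```python
import math

def adagrad_step(theta, G, g, lr=0.01, eps=1e-8):
    """One AdaGrad update for a single parameter, following (13) and (14)."""
    G += g * g                                # accumulated squared gradients, (14)
    theta -= lr * g / math.sqrt(G + eps)      # per-parameter scaled step, (13)
    return theta, G

# On a fixed quadratic problem, the effective step size only ever shrinks:
theta, G = 0.0, 0.0
step_sizes = []
for _ in range(3):
    g = 2 * (theta - 3)                       # gradient of (theta - 3)^2
    new_theta, G = adagrad_step(theta, G, g)
    step_sizes.append(abs(new_theta - theta))
    theta = new_theta
```

Because G only grows, each update is smaller than the last; this is exactly the vanishing-learning-rate weakness noted above.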
3.3.4. Learning rate
Choosing a learning rate can be a difficult task. A too small learning rate leads to slow convergence, while a too large learning rate can hinder convergence and cause the loss function to fluctuate, or even cause training divergence. In this work, learning rates of 0.00125, 0.001, 0.00075 and 0.0005 are experimented with to find an optimal setup.
3.4. Object location estimation
Given an input frame during online tracking, the system will estimate the object location by analyzing the output probabilities from the network. The network outputs two probabilities: (1) the probability that the input sample belongs to the foreground object and (2) the probability that the input sample belongs to the background. The final object location is estimated by computing the weighted average of the top five samples with the highest foreground probabilities, whereby the weight is based on their probability values.
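The location estimate described above reduces to a probability-weighted mean of the top-scoring candidate boxes. A minimal sketch, with hypothetical names and made-up candidate scores:

```python
def estimate_location(candidates, k=5):
    """Weighted average of the k candidates with the highest foreground probability.

    `candidates` is a list of (box, prob) pairs with box = (x, y, w, h).
    """
    top = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return tuple(sum(p * box[i] for box, p in top) / total for i in range(4))

candidates = [((100, 50, 30, 70), 0.95), ((104, 52, 30, 70), 0.90),
              ((98, 49, 30, 70), 0.80), ((110, 60, 30, 70), 0.40),
              ((120, 70, 30, 70), 0.10), ((90, 40, 30, 70), 0.05)]
x, y, w, h = estimate_location(candidates)
```

High-probability candidates pull the estimate toward themselves, so one low-scoring outlier barely moves the final box.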
4. RESULTS AND DISCUSSION
For validation purposes, 14 night scene videos of size 352x288 have been collected. In each video, the tracked object size is about 30x70 pixels, and the total number of accumulated frames of all videos is 3646. The chosen videos contain the challenges of various lighting conditions, occlusion and the move-stop-move problem. Snapshots of the three camera views of the videos are shown in Figure 4. The groundtruth is generated manually by drawing the object bounding box in each frame by an expert in computer vision.
(a) (b) (c)
Figure 4. Three camera views for the fourteen testing videos: (a) Cam01, (b) Cam02, (c) Cam03
4.1. Implementation details
The tracking code is implemented in Python with the TensorFlow library. The original location of the tracked object is given in the form of a bounding box ([x0, y0, width0, height0]). In the first frame, the hyperparameters for the learning rate, number of epochs, number of positive samples and number of negative samples are initialized to 0.0005, 150, 500 and 1000, respectively.
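The first-frame setup above can be summarized as a configuration fragment (the key names are illustrative, not the authors' actual variables):

```python
# Hypothetical first-frame configuration for the online tracker.
init_config = {
    "learning_rate": 0.0005,   # initial learning rate
    "epochs": 150,             # training iterations on the first frame
    "n_positive": 500,         # positive samples drawn near the box
    "n_negative": 1000,        # negative samples drawn elsewhere
}
bbox0 = [120, 80, 30, 70]      # example [x0, y0, width0, height0]
```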
Sample extraction from the first frame is the most important step, as it provides the only groundtruth known to the tracker. Figure 5 shows examples of positive and negative samples extracted in the first frame of three different test sequences. The tracker is then updated online periodically through weak supervision, as groundtruth data for the subsequent frames is not known.
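One plausible way to draw such positive and negative samples around the first-frame box; the shift and jitter thresholds here are illustrative assumptions, not the paper's exact sampler:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_boxes(bbox, n, max_shift, size_jitter=0.1):
    # Draw n boxes around bbox = [x, y, w, h]: small shifts yield
    # positives (overlapping the target), large shifts negatives.
    x, y, w, h = bbox
    shifts = rng.uniform(-max_shift, max_shift, size=(n, 2))
    scales = 1 + rng.uniform(-size_jitter, size_jitter, size=(n, 1))
    return np.hstack([np.array([x, y]) + shifts,
                      np.array([w, h]) * scales])

bbox0 = [120, 80, 30, 70]                  # first-frame groundtruth
positives = sample_boxes(bbox0, 500, max_shift=3)    # near the target
negatives = sample_boxes(bbox0, 1000, max_shift=60)  # mostly background
```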
Figure 5. Examples of positive and negative samples that have been extracted from the current frame (first 20 samples), represented by blue and red boxes, respectively
4.2. Performance metric
To evaluate the performance of our night tracker algorithm, we use one of the VOT evaluation metrics, accuracy (Ac), as defined in (4). Accuracy measures how well the tracked bounding box matches the ground-truth box by computing the intersection-over-union (IoU) area. A higher overlap area represents better tracking accuracy. The tracker is not re-initialized in the event of track failure (where the IoU is zero).
$$A_c = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|B_{T,i} \cap B_{G,i}\right|}{\left|B_{T,i} \cup B_{G,i}\right|} \qquad (4)$$
where $N$ denotes the number of frames in the test video, while $B_{T,i}$ and $B_{G,i}$ are the bounding boxes of the object in frame $i$ from the tracker output and the ground truth, respectively.
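The accuracy in (4) can be sketched directly from the IoU definition (boxes as [x, y, w, h]; function names are illustrative):

```python
import numpy as np

def iou(a, b):
    # Intersection-over-union of two boxes given as [x, y, w, h].
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def accuracy(tracker_boxes, gt_boxes):
    # Mean per-frame IoU over a sequence, as in (4).
    return float(np.mean([iou(t, g) for t, g in zip(tracker_boxes, gt_boxes)]))

# Identical boxes give IoU 1; disjoint boxes give 0.
acc = accuracy([[0, 0, 10, 10], [0, 0, 10, 10]],
               [[0, 0, 10, 10], [100, 100, 10, 10]])
```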
Table 1 shows the accuracy comparison between the three optimizer algorithms: SGD, Adam and Adagrad. For a fair comparison, the learning rate, number of positive samples and number of negative samples are fixed to 0.001, 50 and 100, respectively. Default values for Adam's hyperparameters β1, β2 and ε are set to 0.9, 0.999 and 1e-08, respectively.
On average, the Adam optimizer produces the best accuracy compared to the other two optimizers, followed by AdaGrad. AdaGrad performs significantly better in Cam01-video08 than the other two optimizers, while SGD performs the worst in most of the test videos. This indicates that adaptive learning-rate methods perform better than a fixed value. As the number of iterations for each training step is set to the minimum, SGD may not be able to converge, which contributes to its poor performance.
Some sample frames with overlaid tracking output for Cam01-video08, Cam02-video02 and Cam03-video02 are shown in Figure 6. Green, blue and magenta bounding boxes correspond to the outputs of the SGD, Adam and AdaGrad optimizers, respectively. In Figure 6, the first-row images correspond to Cam01-video08, in which the AdaGrad optimizer gives the highest accuracy. Initially, all three optimizers produce good results, as shown in frame #2; eventually the SGD optimizer model drifts and mixes with the background (frame #27), followed by the Adam optimizer (frame #71).
The second row shows the images for the Cam02-video02 sequence, in which Adam gives the best accuracy while the others give almost 0% accuracy (their bounding boxes get stuck on the background area, as it contains more texture than the tracked object). The third-row images correspond to the output for Cam03-video02, in which all three optimizers produce poor accuracy results. This might be caused by the similarity between the foreground appearance and the background.
Table 2 shows the accuracy comparison between four different values of the learning rate. In this experiment, the Adam optimizer has been chosen as the base optimizer, while the numbers of positive and negative samples are set to 50 and 100, respectively. On average, a learning rate of 0.00075 gives the best accuracy performance, followed by the 0.0005 learning rate. The results also indicate that as the learning rate increases, the average tracker accuracy becomes lower.
Table 3 shows the accuracy comparison between four different combinations of the total numbers of positive and negative training samples used during the online update. The Adam optimizer with a learning rate of 0.00075 is the base setup for the training-samples comparison. On average, a combination of 50 positive and 100 negative training samples returns the best accuracy compared to the other combinations. The total number of negative samples is twice that of the positive samples, so that it caters for the larger background area compared to the concentrated foreground samples.
Table 1. Accuracy comparison between three optimizer algorithms (SGD, Adam and Adagrad), with online learning parameters (learning rate, number of positive and negative samples) fixed to 0.001, 50 and 100, respectively

No.  Datasets         Number of frames   Accuracy (learning rate = 0.001, #positive samples = 50, #negative samples = 100)
                                         Adam    SGD     Adagrad
1    Cam01–video01    146                85.02   15.71   69.81
2    Cam01–video02    184                64.82   44.92   45.81
3    Cam01–video03    71                 96.89    0.44   57.35
4    Cam01–video04    22                 91.46   14.85   74.28
5    Cam01–video05    34                 89.51   73.42   25.17
6    Cam01–video06    150                88.96   74.95   81.64
7    Cam01–video07    86                 55.79    6.88   20.96
8    Cam01–video08    125                59.26   21.53   94.89
9    Cam02–video01    257                67.56    0.86   35.71
10   Cam02–video02    1083               62.27    0       3.8
11   Cam03–video01    344                95.83   89.66   79.93
12   Cam03–video02    227                13.58   12.15    2.97
13   Cam03–video03    317                62.96   55.78   74.03
14   Cam03–video04    600                36.97   48.32   45.54
     Average accuracy                    69.35   32.82   50.85
[Figure 6 panels: frames #2, #27, #71 (row 1); frames #3, #105, #246 (row 2); frames #2, #100, #154 (row 3)]
Figure 6. Sample frames with overlaid tracking output for (a) Cam01-video08, (b) Cam03-video02 and (c) Cam02-video02. Box colors: green (SGD), blue (Adam) and magenta (AdaGrad)
Table 2. Accuracy comparison between four online learning rates (0.00125, 0.001, 0.00075 and 0.0005), with online learning parameters (optimizer algorithm, number of positive and negative samples) fixed to Adam, 50 and 100, respectively

No.  Datasets         Number of frames   Accuracy (optimizer: Adam, #positive samples = 50, #negative samples = 100)
                                         lr=0.00125  lr=0.001  lr=0.00075  lr=0.0005
1    Cam01–video01    270                68.23       85.02     78.91       83.28
2    Cam01–video02    448                62.82       64.82     65.59       70.56
3    Cam01–video03    71                 98.23       96.89     89.20       94.10
4    Cam01–video04    128                87.66       91.46     94.06       88.76
5    Cam01–video05    34                 89.24       89.51     76.20       89.02
6    Cam01–video06    224                95.19       88.96     93.76       97.77
7    Cam01–video07    460                48.22       55.79     59.81       54.06
8    Cam01–video08    125                27.53       59.26     91.29       95.54
9    Cam02–video01    1137               45.26       67.56     76.11       60.43
10   Cam02–video02    1083               66.10       62.27     88.16       76.16
11   Cam03–video01    344                88.21       95.83     91.63       80.73
12   Cam03–video02    700                11.76       13.58      7.18       35.29
13   Cam03–video03    317                64.93       62.96     64.32       76.41
14   Cam03–video04    689                36.28       36.97     43.04       16.91
     Average accuracy                    63.55       69.35     72.80       72.79
Table 3. Accuracy comparison between four different combinations of positive and negative samples ((50,100), (50,50), (100,100) and (150,150)), with online learning parameters (optimizer algorithm and learning rate) fixed to Adam and 0.00075, respectively

No.  Datasets         Number of frames   Accuracy (optimizer: Adam, learning rate = 0.00075)
                                         p=50,n=100  p=50,n=50  p=100,n=100  p=150,n=150
1    Cam01–video01    270                78.91       12.84      74.49        83.84
2    Cam01–video02    448                65.59       55.00      54.78        54.58
3    Cam01–video03    71                 89.20       96.19      93.97        96.40
4    Cam01–video04    128                94.06       91.05      95.27        92.85
5    Cam01–video05    34                 76.20       77.65      92.40        75.07
6    Cam01–video06    224                93.76       92.22      92.99        97.57
7    Cam01–video07    460                59.81       64.52      49.81        10.37
8    Cam01–video08    125                91.29       91.01      90.41        34.12
9    Cam02–video01    1137               76.11       65.09       6.43         5.27
10   Cam02–video02    1083               88.16       73.88      55.29        80.56
11   Cam03–video01    344                91.63       93.92      85.35        93.74
12   Cam03–video02    700                 7.18       11.99      10.82        10.24
13   Cam03–video03    317                64.32       67.63      69.17        75.61
14   Cam03–video04    689                43.04       34.35      14.22        43.14
     Average accuracy                    72.80       66.24      63.24        60.95
5. CONCLUSION
In conclusion, the proposed tracking scheme is able to track the object of interest in night surveillance videos. The Adam optimizer shows superior accuracy compared to SGD and AdaGrad in most of the testing videos. The best learning rate is found to be 0.00075, achieved by using a training-sample ratio of 2:1 between negative and positive samples. Hence, this tracker can be implemented in higher-level applications of a night surveillance system.
ACKNOWLEDGEMENTS
This work was supported by the Nvidia Corporation through the Titan V Grant (KK-2019-005) and by the Ministry of Education through FRGS/1/2019/ICT02/UKM/02/1.
REFERENCES
[1] K. Huang, L. Wang, and T. Tan, "Detecting and tracking distant objects at night based on human visual system," Asian Conference on Computer Vision, pp. 822–831, 2006.