International Journal of Electrical and Computer Engineering (IJECE)
Vol. 11, No. 4, August 2021, pp. 3374~3380
ISSN: 2088-8708, DOI: 10.11591/ijece.v11i4.pp3374-3380
Journal homepage: http://ijece.iaescore.com
A fully integrated violence detection system using CNN and LSTM
Sarthak Sharma, B Sudharsan, Saamaja Naraharisetti, Vimarsh Trehan, Kayalvizhi Jayavel
Department of Information Technology, SRM Institute of Science and Technology, Kattankulathur, Chennai, India
Article Info
Article history:
Received Sep 1, 2020
Revised Dec 20, 2020
Accepted Jan 13, 2021
ABSTRACT
Recently, the number of violence-related cases in places such as remote roads, pathways, shopping malls, elevators, sports stadiums, and liquor shops has increased drastically, and these cases are unfortunately discovered only after it is too late. The aim is to create a complete system that can perform real-time video analysis, which will help recognize the presence of any violent activities and notify the concerned authority, such as the police department of the corresponding area. Using the deep learning networks CNN and LSTM along with a well-defined system architecture, we have achieved an efficient solution that can be used for real-time analysis of video footage, so that the concerned authority can monitor the situation through a mobile application that notifies them immediately about the occurrence of a violent event.
Keywords: Deep learning, LSTM, Mobile application, Smart cities, Transfer learning, Violence detection
This is an open access article under the CC BY-SA license.
Corresponding Author:
Kayalvizhi Jayavel
Department of Information Technology
SRM Institute of Science and Technology
SRM Nagar, Kattankulathur, Chengalpattu District, Tamil Nadu-603203, India
Email: kayalvij@srmist.edu.in
1. INTRODUCTION
In this day and age, violence has been increasing drastically and is posing serious threats to humans, systems, and buildings. The situation becomes worse when violence takes place in public, where most people are not accountable and cannot be held responsible without proof. Most of the heinous crimes take place in public due to their anomalous nature. When we talk about violent activities, we usually refer to an unusual physical interaction that happens between two or more people [1]. Monitoring surveillance footage has become very difficult for security personnel, as they now have to painstakingly go through the footage to find the culprit and track their movements from one camera to another, or view it in real time to detect violent activities and behaviour before or as they occur. A major constraint is also the quality of the surveillance videos that they are provided with. Most snapshots from surveillance footage do not hold good in court, as the defendant will deny their presence in the photo. Since violence in a city can occur at any time, relying on a human to monitor and detect violent events is not an efficient way to handle such cases. Such activities usually lead to very unpleasant scenarios, which makes automatic detection of such events from real-time video footage crucial, so that the required decision can be made by the concerned authority. As a result, the idea of implementing systems and equipment to detect such incidents using video retrieval and real-time monitoring has been introduced. The prime focus is to get rid of the above-mentioned real-world constraints and to decrease crime rates significantly and efficiently.
Technology has revolutionized our world and daily life. We are now able to combine various technologies that are used to detect objects and various movements and create a system that will help in detecting such potential threats. In recent years, there has been a massive amount of work established on human action recognition [2, 3]. Specifically, employing deep learning for classifying video sequences has been achieving better results than the existing handcrafted methods [4, 5]. The research related to violence detection usually involves the use of two popular datasets, Movies and Hockey [6].
Transfer learning is a very popular technique in machine learning, in which the features that a model learns from a source domain are used to make predictions in a target domain. To deal with problems arising from huge volumes of data, many research works choose the strategy of transfer learning. In our case, we re-train the last 4 layers of the pre-trained Xception network and then use it as a feature extractor for our custom classifier. With this technique, such models can be repurposed for any related task we require, from object detection for self-driving vehicles and action recognition to classifying video clips [7, 8].
Long short-term memory (LSTM) [9] network is a type of recurrent neural network used widely in sequence prediction problems where it can learn the order dependence. LSTM, as the name suggests, can retain huge amounts of information by default for a long period of time. CNN followed by LSTM has been proven to be the best architecture when the available data are small and the computing power resources are not very high for the task. Our proposed CNN+LSTM model can process videos at a speed of 126 frames per second in our test environment.
2. RELATED WORK
The initial work on violence classification mainly revolved around audio-video correlation [10], detecting the presence of blood and vigorous degrees of motion, and identifying sound features such as screams [11, 12]. Carneiro et al. [13] focused on implementing a hand-drawn high-level description and a multi-stream-based learning model to solve the conflict detection problem in videos. One of the recent works in this field [14] proposed a system that works on the HOG features of video frames. The authors extracted HOG features from binary images and used a random forest classifier to identify the existence of violence in each frame. Finally, they employed the majority voting technique to classify the video clip as violence or non-violence. Although this system does not require a GPU for computations and establishes improved results compared to previous works, it suffers from low accuracy.
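The clip-level majority vote described for [14] can be illustrated with a minimal sketch. It assumes that per-frame labels are already available from some frame classifier; the function name `classify_clip` and the example labels are hypothetical and are not taken from the cited work.

```python
from collections import Counter

def classify_clip(frame_labels):
    """Return the label predicted for most frames of a clip."""
    if not frame_labels:
        raise ValueError("clip has no frame predictions")
    counts = Counter(frame_labels)
    # most_common(1) gives the label with the highest count; on a tie the
    # label seen first wins, which is an arbitrary choice in this sketch.
    label, _ = counts.most_common(1)[0]
    return label

# Hypothetical per-frame predictions for one short clip.
frame_predictions = ["violence", "non-violence", "violence",
                     "violence", "non-violence"]
print(classify_clip(frame_predictions))
```

A vote of this kind smooths out isolated per-frame errors, but it ignores the order of the frames, which is one motivation for the sequence models discussed next.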
Recent research work on fight and violence detection shows the extensive implementation of deep learning architectures such as convolutional neural networks (CNNs) [15], long short-term memory (LSTMs), and two stream CNNs [16]. These automatic methods perform much better than the hand-crafted algorithms used for spatio-temporal feature extraction. Mumtaz et al. [17] used transfer learning with GoogLeNet (Inception) [18], which consisted of 22 layers, over the two popular datasets, Hockey and Movies. Both of the datasets have their own complexity. The annotated videos were converted into labelled image frames, and the 1000 class layer is replaced with two categories (violence and non-violence). The result shows the accuracy of 99.28% and 99.97% in the Hockey and Movies dataset respectively.
In [19], Perez et al. proposed a methodology of feature extraction with the two-stream based solution, consisting of two different 2D-CNN models. One is trained with the RGB frames of the video and the other one is trained with a stack of optical flows from the video frames. Violence detection using CNN and LSTM has been employed in [20], where the author combines the three benchmark datasets. The methodology involves the use of a pre-trained CNN architecture VGG-19 [21] followed by LSTM, which works with an input of 30 frames at a time. The results from CNN of each frame are grouped and then fed to the LSTM as a sequence. The model performs well, with an accuracy of 94.765% on their combined dataset. The methodology proposed in [22] involves the use of CNN along with ConvLSTM for characterizing the videos. They have made use of the AlexNet model pre-trained on the ImageNet database as the CNN model for feature extraction. Their results have been proved with the three benchmark datasets, achieving the maximum 100% accuracy with the Movies dataset.
The methodology from [23] focuses on creating a new localized guided fight action detection framework for realistic surveillance videos. A new dataset with 1520 videos was proposed and state-of-the-art models were trained over the dataset to achieve high levels of accuracies. They have used the pre-trained SSD-VGG16 for human detection and the pre-trained FlowNet 2.0 [24] model for estimating optical flow. A two-stream C3D network [25] is trained on the active regions extracted from the localization phase, which are later combined for predictions.
Interestingly, Serrano et al. [26] proposed a method where a video sequence is summarized into a representable image, which can be used to classify the scene as violent or non-violent. Additionally, the methodology leverages those zones that could be important for the classification. So, the most important parts of the sequence are given more weight, giving less importance to noise and static background. Finally, 2D CNN takes the representative image to classify it as needed. The result provided an accuracy of 99±0.5% for Movies and 94.6±0.6% for the Hockey dataset.
Seymanur et al. [27] address the problem of detecting fights from surveillance cameras and explore an LSTM-based approach to fight detection in video with the assistance of an attention layer. The proposed method combines the Xception [28] model with BiLSTM. In [29], the violence detection system has been implemented using a 3D ConvNet and a key frame extraction algorithm to produce an effective approach. They achieved 93.5% with the Crowd violence dataset. Keçeli et al. [30] again use transfer learning with AlexNet, but utilize the Lucas-Kanade method for finding the optical flows of the sequence of frames. From the optical flow values, templates are made, which are then given as the input to the pre-trained CNN AlexNet for feature extraction. Finally, two classifiers are employed: support vector machines (SVM) and subspace k-nearest neighbors (SkNN). The best results have been obtained from the SVM classifier.
3. PROPOSED METHOD
The proposed architecture uses a convolutional neural network as the spatial feature extractor, followed by an LSTM network to perform sequence prediction on the feature vectors. For the spatial feature extraction, we have employed a transfer learning approach with CNN. The architecture of the Xception [28] network has been considered, with the model pre-trained on the ImageNet dataset [31]. Instead of training from scratch, we used a pre-trained Xception model as a feature extractor, as it performed better than other pre-trained CNN models like VGG [21], LeNet [32] or ResNet [33]. We fine-tuned it by keeping the initial layers intact and retraining the last 4 layers on the respective datasets.
For the datasets, we have considered Hockey, Movies, and the UCF Crime dataset. Hockey and Movies are the well-known standard benchmark datasets. UCF Crime dataset [34] consists of CCTV footage of various categories of violence. We have used 4 categories, namely fighting, assault, abuse, and arrest, forming the violence category, and the normal category as the non-violence category. The violence videos were manually trimmed to contain only the scenes having violence in them.
In the architecture, we are dealing with sequential input of shape (15x200x200x3), which corresponds to (frames x H x W x channels). In real time, considering a 30 fps feed, for every 90 frames, every 6th frame is taken to form a sequence of 15 frames. Each frame in the sequence is in RGB format with a size of 200x200.
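The frame sampling rule above can be sketched in a few lines. This is only an illustration under one assumption: "every 6th frame" is read as frames 6, 12, ..., 90 of each 90-frame window (0-based indices 5, 11, ..., 89). The helper names `sample_frame_indices` and `split_into_sequences` are hypothetical.

```python
def sample_frame_indices(window=90, step=6):
    """Indices of the frames kept from one window of consecutive frames."""
    # Assumed reading: keep 0-based indices 5, 11, ..., 89. Starting at
    # index 0 instead would be an equally valid reading of the text.
    return list(range(step - 1, window, step))

def split_into_sequences(frames, window=90, step=6):
    """Yield one subsampled sequence per complete window of frames."""
    for start in range(0, len(frames) - window + 1, window):
        chunk = frames[start:start + window]
        yield [chunk[i] for i in sample_frame_indices(window, step)]

indices = sample_frame_indices()
print(len(indices), indices[:3], indices[-1])
```

With a 30 fps feed, one 90-frame window covers 3 seconds of video, so each 15-frame sequence summarizes a 3-second clip.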
However, since the Xception network can only accept 3D inputs, we used the time distributed layer. The time distributed layer wrapper applies the convolution model to every temporal slice of the input. TimeDistributed is a special type of layer wrapper present in the Keras library. It applies the same layer to a list of chronological inputs. In our model, the input consists of 15 chronologically ordered frames, so the time distribution operation applies the same Xception model to every frame. These layers share the same weights. For 15 images, the weights are not tweaked 15 times, but only once, and distributed to every block defined in the current time distributed layer.
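The weight sharing described above can be mimicked without any deep learning library. The following toy sketch is not the Keras implementation; `ToyLayer` and `time_distributed` are hypothetical names, and the "layer" is a single multiplication standing in for a CNN.

```python
class ToyLayer:
    """A stand-in for a CNN: one shared weight applied to a frame."""
    def __init__(self, weight):
        self.weight = weight

    def __call__(self, frame):
        # A real feature extractor would run convolutions here.
        return [self.weight * pixel for pixel in frame]

def time_distributed(layer, sequence):
    """Apply the same layer object, and so the same weights, to every time step."""
    return [layer(frame) for frame in sequence]

layer = ToyLayer(weight=2.0)
sequence = [[float(t), float(t + 1)] for t in range(15)]  # 15 tiny "frames"
features = time_distributed(layer, sequence)
print(len(features), features[0], features[14])
```

Because only one `ToyLayer` object exists, there is a single set of parameters to update, no matter how many frames the sequence contains.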
We use this technique to apply the Xception network to all 15 inputs to get feature maps with 2048 channels. These feature maps are then flattened to a 2D tensor of shape (15, 100352), which is fed into an LSTM with 512 cells that tries to learn the temporal relations between the 15 time steps. Finally, we take the 1D predictions from the LSTM and feed them to a series of dense layers to get the output predictions. Softmax activation is used in the output layer. The basic architecture is shown in Figure 1 (see Appendix).
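The flattened size of 100352 per frame can be checked with simple shape arithmetic. The sketch below assumes the downsampling layout of the Keras Xception implementation (two unpadded 3x3 convolutions, the first with stride 2, followed by four stride-2 pooling stages with "same" padding); this layout is an assumption about the backbone, not something stated in the text, and the helper names are hypothetical.

```python
import math

def conv_valid(size, kernel, stride):
    """Output size of a convolution with 'valid' (no) padding."""
    return (size - kernel) // stride + 1

def pool_same(size, stride):
    """Output size of a pooling layer with 'same' padding."""
    return math.ceil(size / stride)

def xception_feature_shape(height, width, channels=2048):
    """Feature map shape after the assumed stride-2 stages."""
    dims = []
    for size in (height, width):
        size = conv_valid(size, kernel=3, stride=2)  # first 3x3 conv, stride 2
        size = conv_valid(size, kernel=3, stride=1)  # second 3x3 conv, stride 1
        for _ in range(4):                           # four stride-2 pooling stages
            size = pool_same(size, stride=2)
        dims.append(size)
    return dims[0], dims[1], channels

h, w, c = xception_feature_shape(200, 200)
print((h, w, c), h * w * c)
```

Under these assumptions a 200x200 frame yields a 7x7x2048 feature map, and 7 x 7 x 2048 = 100352, which agrees with the (15, 100352) tensor fed to the LSTM.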
For training, we used cross entropy loss as the loss function. A batch normalization layer added before the dense layers helped to reduce the loss and slightly speed up the training process. We also used dropout [35] layers to reduce the overfitting of the model on the training data.
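For the two-class output used here, the softmax activation and the cross entropy loss reduce to a few lines. The sketch below is a plain-Python illustration for a single sample with made-up scores; it is not the training code, and the function names are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(one_hot_target, probabilities, eps=1e-12):
    """Loss for one sample: -sum(target * log(probability))."""
    return -sum(t * math.log(max(p, eps))
                for t, p in zip(one_hot_target, probabilities))

# Hypothetical scores for (violence, non-violence).
probs = softmax([2.0, 0.0])
loss_correct = categorical_cross_entropy([1, 0], probs)
loss_wrong = categorical_cross_entropy([0, 1], probs)
print(probs, loss_correct, loss_wrong)
```

The loss is small when the probability assigned to the true class is high and grows quickly when the model is confidently wrong, which is the behaviour the training procedure relies on.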
4. SYSTEM ARCHITECTURE
Our system aims to provide real-time violence detection that can decrease labor and help the concerned police authorities to efficiently monitor their locality and quickly tend to any site where violence has been detected. The architecture is integrated with various cloud services that are highly scalable and robust, as shown in Figure 2. As more localities and police departments are added, scalability plays a crucial role. On the detection of any violence, the details of the occurrence, such as the camera ID, location, timestamp, and a few snapshots of the occurrence, are saved in the database, followed by an alert that is issued by the cloud message service to all the concerned authorities of that particular locality. We have used PostgreSQL for designing our database schema.
The mobile application allows the police authorities to continuously monitor their locality all the time. The app provides the live feed of the surveillance that the authority can choose to supervise. Every alert issued can be marked as attended by the police when they start taking any kind of action. This makes sure that everyone else in that locality's department is updated about the status of the alert and who's taking care of it.
Figure 2. System architecture
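The alert flow described in this section can be sketched as a small data model. Everything in the example is hypothetical: the field names, the `Alert` class, the in-memory dictionary of officers, and the sample values are illustrations only and do not reflect the actual database schema or the cloud messaging service.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

@dataclass
class Alert:
    camera_id: str
    location: str
    locality: str
    timestamp: datetime
    snapshots: List[str] = field(default_factory=list)  # e.g. image paths
    attended_by: Optional[str] = None

    def mark_attended(self, officer: str) -> None:
        """Record who is handling the alert; the first responder is kept."""
        if self.attended_by is None:
            self.attended_by = officer

def recipients_for(alert: Alert, officers_by_locality: dict) -> List[str]:
    """Everyone registered for the alert's locality should be notified."""
    return list(officers_by_locality.get(alert.locality, []))

# Placeholder data for illustration only.
officers = {"locality-a": ["officer-1", "officer-2"], "locality-b": ["officer-3"]}
alert = Alert("cam-17", "Main St / 2nd Ave", "locality-a",
              datetime(2021, 1, 13, 18, 30, tzinfo=timezone.utc),
              snapshots=["snap_0.jpg", "snap_1.jpg"])
print(recipients_for(alert, officers))
alert.mark_attended("officer-2")
print(alert.attended_by)
```

Keeping the attending officer on the alert record is one simple way to let everyone in the same locality see the status of an alert and who is handling it.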
5. RESULT AND ANALYSIS
The mobile application has been successfully developed and integrated with all the essential services to provide a robust violence detection system, as shown in Figure 3. It covers all the security cameras that are installed in the locality under the authority's control. During any occurrence of violence, the officials governing that locality are immediately notified. The alert received provides all the crucial information required to take any necessary action immediately. It also provides a few snapshots of the area that can help them make better decisions on how to bring the situation under control. Every functionality has been tested in real time and connected to work with our proposed deep learning model.
Figure 3. Mobile application (VioMo)
Our model based on Xception and LSTM was first trained on the two benchmark datasets, namely Hockey and Movies. The Movies dataset consists of 200 videos, and the Hockey dataset consists of 1000 videos. The datasets have an equal number of Violence and Non-Violence videos. The code has been written in Python using the Keras library. The training was done with a 5-fold cross validation technique, with 35 epochs for each fold. The resulting test accuracies are displayed and compared in Table 1.
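A 5-fold split over a balanced dataset can be sketched as follows. This is a plain-Python illustration that assumes the folds are stratified by class, which the text does not state; the function name `stratified_k_fold` and the fixed seed are hypothetical choices.

```python
import random

def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs, balanced per class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)   # deal indices round-robin
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test

# 200 clips with equal class counts, matching the Movies dataset size in the text.
labels = ["violence"] * 100 + ["non-violence"] * 100
for train, test in stratified_k_fold(labels):
    print(len(train), len(test))
```

Each clip appears in exactly one test fold, so every video contributes to the reported test accuracy once while still being used for training in the other four folds.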
Table 1. Accuracies reported on the benchmark datasets - Movies and Hockey

| Algorithm | Year | Movies dataset accuracy | Hockey dataset accuracy |
|---|---|---|---|
| MoSIFT+HIK [6] | 2011 | 89.50% | 90.90% |
| Optical flows+CNN+SVM and SkNN [23] | 2017 | 96.50% | 94.50% |
| Inception+Transfer learning [17] | 2018 | 99.97% | 99.28% |
| FAST Detector+Hough forests with CNN classifier [26] | 2018 | 99% | 94.60% |
| Multi-stream VGG-16 [13] | 2019 | 100% | 89.10% |
| VGG19+LSTM [20] | 2019 | 100% | 96.33% |
| HOG+Random Forest [27] | 2019 | -- | 86.00% |
| Proposed: Xception+LSTM | 2020 | 98.32% | 96.55% |
We observed that the model trained on the benchmark datasets does not work accurately with real-time CCTV footage. This is mainly due to the fact that the benchmark videos are unrealistic and do not aptly depict real-world scenarios. These videos also differ a lot from actual CCTV footage in terms of the camera angle. To overcome this and validate our architecture for real-time analysis, the UCF Crime dataset was used, which makes our model perform better in real time.
Our modified UCF Crime dataset consists of an equal number of 160 trimmed violence and non-violence videos. We trained the model on this for 50 epochs, with a train/test split scheme with 80% for training and 20% for testing. The results from the same are shown in Figure 4.
On the testing set, we achieved an accuracy of 98.87%, which is better than the results we obtained using the benchmark datasets and the other existing systems. The loss on both the train and test datasets is calculated using categorical cross-entropy. The trained model takes approximately 7.89 milliseconds per frame for the classification (without including the time for preprocessing). Overall, the proposed method takes approximately 0.28 seconds to process a 3-second video clip at 30 fps. The use of the UCF Crime dataset and the fast processing time make it suitable for violence detection in real-time video processing applications. The entire experiment was performed on a 12 GB NVIDIA Tesla K80 GPU.
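The timing figures above can be related to each other with a little arithmetic. The sketch uses only the numbers reported in the text; treating the 7.89 ms as applying to each of the 15 sampled frames of a clip is an assumption, so the per-clip classification time below is one possible reading rather than a reported result.

```python
MS_PER_FRAME = 7.89          # classification time per frame, from the text
FEED_FPS = 30                # camera frame rate, from the text
CLIP_SECONDS = 3             # clip length, from the text
SAMPLING_STEP = 6            # every 6th frame is kept
TOTAL_SECONDS_PER_CLIP = 0.28  # reported end-to-end time per clip

frames_per_second = 1000.0 / MS_PER_FRAME
frames_in_clip = FEED_FPS * CLIP_SECONDS
frames_classified = frames_in_clip // SAMPLING_STEP
# Assumption: only the sampled frames are classified.
classification_seconds = frames_classified * MS_PER_FRAME / 1000.0
real_time_factor = CLIP_SECONDS / TOTAL_SECONDS_PER_CLIP

print(round(frames_per_second, 1))        # 126.7
print(frames_in_clip, frames_classified)  # 90 15
print(round(classification_seconds, 3))   # 0.118
print(round(real_time_factor, 1))         # 10.7
```

The 7.89 ms per frame corresponds to roughly 126 frames per second, and a 3-second clip processed in 0.28 seconds is handled about ten times faster than real time, which leaves headroom for preprocessing and for several camera feeds.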
Figure 4. Accuracy and loss graphs of our model for the modified UCF crime dataset
6. CONCLUSION
Nowadays, the rate of violence around us is increasing drastically, acting as a threat to humans, buildings, and systems. There has always been a need for a better system that can aid the police in monitoring violence, which is usually hard to handle as it is a group activity and the process of elimination to find the culprit is time-consuming. The results from our experiment demonstrate the effective use of the CNN+LSTM architecture for training the model on the two popular datasets, Hockey and Movies, and on our modified UCF Crime dataset. Our work provides a fully integrated system that can help the police authorities monitor the area under their control. It efficiently utilizes existing cloud services to deliver a robust mobile application that can enhance the current functions of the police department.
Our model, trained on the benchmark datasets, has shown strong results. The high accuracy on the UCF Crime dataset, together with a good speed compared to other approaches, makes our model function accurately in real-life scenarios. Apart from producing a novel model for violence detection, we have constructed a fully functioning system that supports this model in the real world, making our approach unique, detailed, and advanced compared to the existing solutions. The system alerts the police through the app during any occurrence of violence and allows the police to take action accordingly. The authority can use our system to manage the crimes around them in a much more efficient manner.
Our proposed system can be improved to perform better in many aspects. The system can be improved by specifying how severe the detected violence is. This can help the authorities make better decisions. The proposed Deep Learning architecture can be altered by modifying the hyperparameters, to improve the performance. Also, the system can be extended to serve the purpose of other types of detection such as fire accidents and burglary.
APPENDIX
Figure 1. Model architecture
REFERENCES
[1] M. Marszałek, I. Laptev, and C. Schmid, "Actions in context," IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2009.
[2] S. R. Ke, H. Thuc, Y. J. Lee, J. N. Hwang, J. H. Yoo, and K. H. Choi, "A Review on Video-Based Human Activity Recognition," Computers, 2013.
[3] H. Kim, S. Lee, and H. Jung, "Human activity recognition by using convolutional neural network," International Journal of Electronics and Communication Engineering (IJECE), vol. 9, no. 6, pp. 5270-5276, 2019.
[4] D. Wu, N. Sharma, and M. Blumenstein, "Recent Advances in Video Based Human Action Recognition using Deep Learning: A Review," International Joint Conference on Neural Networks (IJCNN), 2017, pp. 2865-2872.
[5
]
A
.
S
a
r
g
a
n
o
,
P
.
A
n
g
e
lo
v
,
a
n
d
Z.
Ha
b
ib
,
“
A
Co
m
p
re
h
e
n
siv
e
Re
v
i
e
w
o
n
Ha
n
d
c
ra
f
ted
a
n
d
L
e
a
rn
in
g
-
Ba
se
d
Ac
ti
o
n
Re
p
re
se
n
tatio
n
A
p
p
r
o
a
c
h
e
s f
o
r
Hu
m
a
n
A
c
ti
v
it
y
Re
c
o
g
n
it
io
n
,
”
Ap
p
li
e
d
S
c
ien
c
e
s
,
v
o
l.
7
,
n
o
.
1
,
2
0
1
7
,
A
rt.
n
o
.
1
1
0
.
[6
]
E
.
N
.
Be
rm
e
jo
,
O
.
D
.
S
u
a
re
z
,
G
.
B
.
G
a
rc
ıa,
a
n
d
R
.
S
u
k
th
a
n
k
a
r,
“
Vio
len
c
e
d
e
tec
ti
o
n
i
n
v
i
d
e
o
u
sin
g
c
o
m
p
u
ter
v
isio
n
tec
h
n
iq
u
e
s
,”
Co
mp
u
ter
An
a
lys
is o
f
Ima
g
e
s
a
n
d
Pa
tt
e
rn
s (
CAIP
),
v
o
l.
6
8
5
5
,
p
p
.
3
3
2
–
3
3
9
,
2
0
1
1
.
[7] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, “Large-scale video classification with convolutional neural networks,” 2014 IEEE Conf. of Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1725–1732.
[8] A. B. Sargano, X. Wang, P. Angelov, and Z. Habib, “Human action recognition using transfer learning with deep representations,” 2017 International Joint Conference on Neural Networks (IJCNN), 2017, pp. 463–469.
[9] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[10] Y. Gong, W. Wang, S. Jiang, Q. Huang, and W. Gao, “Detecting violent scenes in movies by auditory and visual cues,” Pacific-Rim Conference on Multimedia, 2008, pp. 317–326.
[11] T. Giannakopoulos, D. Kosmopoulos, A. Aristidou, and S. Theodoridis, “Violence content classification using audio features,” Hellenic Conference on Artificial Intelligence, 2006, pp. 502–507.
[12] J. Nam, M. Alghoniemy, and A. H. Tewfik, “Audio-visual content-based violent scene characterization,” Proceedings 1998 International Conference on Image Processing (ICIP98), 1998, pp. 353–357.
[13] S. A. Carneiro, G. P. da Silva, S. J. F. Guimarães, and H. Pedrini, “Fight Detection in Video Sequences Based on Multi-Stream Convolutional Neural Networks,” 32nd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), 2019, pp. 8–15.
[14] S. Das, A. Sarker, and T. Mahmud, “Violence Detection from Videos using HOG Features,” 2019 4th International Conference on Electrical Information and Communication Technology (EICT), Khulna, Bangladesh, 2019, pp. 1–5.
[15] G. Sakthivinayagam, R. Easawarakumar, A. Arunachalam, and M. Pandi, “Violence Detection System using Convolution Neural Network,” SSRG International Journal of Electronics and Communication Engineering (IJECE), vol. 6, no. 2, 2019, Art. no. 102.
[16] K. Simonyan and A. Zisserman, “Two-stream convolutional networks for action recognition in videos,” Advances in Neural Information Processing Systems (NIPS), 2014, pp. 568–576.
[17] A. Mumtaz, A. Bux, and Z. Habib, “Violence Detection in Surveillance Videos with Deep Network using Transfer Learning,” Electrical Engineering and Computer Science (EECS), pp. 558–563, 2018.
[18] C. Szegedy et al., “Going Deeper with Convolutions,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1–9.
[19] M. Perez and A. C. Kot, “Detection of real-world fights in surveillance videos,” International Conference on Acoustics, Speech, and Signal Processing, 2019, pp. 2662–2666.
[20] Al-Maamoon R. A. and R. F. Al-Tuma, “Robust Real-Time Violence Detection in Video Using CNN And LSTM,” Student Conference on Conservation Science, 2019, pp. 104–108.
[21] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv:1409.1556, pp. 1–14, 2014.
[22] S. Sudhakaran and O. Lanz, “Learning to detect violent videos using convolutional long short-term memory,” 2017 14th IEEE International Conf. on Advanced Video and Signal Based Surveillance (AVSS), Lecce, 2017, pp. 1–6.
[23] Q. Xu, J. See, and W. Lin, “Localization Guided Fight Action Detection In Surveillance Videos,” IEEE International Conference on Multimedia and Expo, 2019, pp. 568–573.
[24] E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox, “Flownet 2.0: Evolution of optical flow estimation with deep networks,” Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 1–16, 2017.
[25] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning spatiotemporal features with 3D convolutional networks,” Proceedings of the IEEE Int. Conference on Computer Vision, 2015, pp. 4489–4497.
[26] I. Serrano, O. Deniz, J. L. Espinosa-Aranda, and G. Bueno, “Fight Recognition in video using Hough Forests and 2D Convolutional Neural Network,” IEEE Trans. on Image Processing, vol. 27, no. 10, pp. 4787–4797, 2018.
[27] Ş. Aktı, G. A. Tataroğlu, and H. K. Ekenel, “Vision-based Fight Detection from Surveillance Cameras,” 2019 Ninth International Conference on Image Processing Theory, Tools and Applications (IPTA), Istanbul, Turkey, 2019.
[28] F. Chollet, “Xception: Deep Learning with Depthwise Separable Convolutions,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017, pp. 1251–1258.
[29] W. Song, D. Zhang, X. Zhao, J. Yu, R. Zheng, and A. Wang, “A Novel Violent Video Detection Scheme Based on Modified 3D Convolutional Neural Networks,” IEEE Access, vol. 7, pp. 39172–39179, 2019.
[30] A. S. Keçeli and A. Kaya, “Violent activity detection with transfer learning method,” Electronics Letters, vol. 53, no. 15, pp. 1047–1048, 2017.
[31] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Advances in Neural Information Processing Systems, pp. 1–9, 2012.
[32] Y. Lecun, L. Bottou, Y. Bengio, et al., “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[33] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
[34] W. Sultani, C. Chen, and M. Shah, “Real-world anomaly detection in surveillance videos,” arXiv:1801.04264, 2018.
[35] S. Wang and C. Manning, “Fast dropout training,” Proceedings of the 30th International Conference on Machine Learning, vol. 28, no. 2, pp. 118–126, 2013.