Indonesian Journal of Electrical Engineering and Computer Science
Vol. 24, No. 1, October 2021, pp. 178~188
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v24.i1.pp178-188
Journal homepage: http://ijeecs.iaescore.com

Static hand gesture recognition of Arabic sign language by using deep CNNs

Mohammad H. Ismail, Shefa A. Dawwd, Fakhradeen H. Ali
Department of Computer Engineering, College of Engineering, University of Mosul, Mosul, Iraq

Article Info

Article history:
Received Jun 17, 2021
Revised Jul 31, 2021
Accepted Aug 4, 2021

ABSTRACT
An Arabic sign language recognition system based on two concatenated deep convolutional neural network models, DenseNet121 and VGG16, is presented. The pre-trained models are fed with images so that the system can automatically recognize Arabic sign language. To evaluate the performance of the two concatenated models, red-green-blue (RGB) images of various static signs were collected in a dataset. The dataset comprises 220,000 images in 44 categories: 32 letters, 11 numbers (0 to 10), and 1 for none. For each static sign, 5000 images were collected from different volunteers. The pre-trained models were modified and then trained on the prepared Arabic sign language data. In addition, pairs of the pre-trained models were adopted as parallel deep feature extractors whose outputs are combined and passed to the classification stage. The results compare the performance of the single models and the multi-models. In most cases the multi-model is better at feature extraction and classification than the single models. Based on the total number of incorrectly recognized sign images in the training, validation, and testing datasets, the best convolutional neural network (CNN) model for feature extraction and classification of Arabic sign language is DenseNet121 among the single models and DenseNet121 & VGG16 among the multi-models.

Keywords:
Arabic sign language
Convolutional neural network
Deep learning
Multi-model
Static hand gesture

This is an open access article under the CC BY-SA license.

Corresponding Author:
Mohammad H. Ismail
Department of Computer Engineering
College of Engineering, University of Mosul, Mosul, Iraq
Email: mohammad.haqqi@gmail.com

1. INTRODUCTION
Sign language is regarded as the only means of communication between hearing people and people who are hearing-impaired or deaf. People use non-verbal speech in the form of sign language signals to express their thoughts and feelings. In sign language, there are two types of gestures: static and dynamic [1]. Arabic Sign Language (ArSL) has many national varieties and dialects. It varies from one nation to another, and often even inside the same country. Despite this, the alphabet and numbers of the Arabic language are standardized in sign language [2]. Sign languages need an intelligent device that can convert them from one sign language to another using natural language. Without an interpreter, it is difficult for most people who are not familiar with sign language to communicate. These problems necessitate the use of automatic sign language translation programs. Non-manual and manual signs are the two major components that make up sign languages. Non-manual signs are body motion and facial expressions. Manual signs are hand location, orientation, shape, and trajectory. Most works, however, concentrate on manual signs because they provide
the most important information; non-manual signs, on the other hand, assist signers in explaining and emphasizing the value of manual signs [3], [4]. In this work, manual signs are investigated.
Sign language detection is achieved with two approaches: the first is vision-based, while the second is sensor-based [5], [6]. The vision-based approach captures the hand gesture with a camera in the form of static or sequential images, without the use of gloves or sensors. This approach is the most appropriate for the daily life of deaf and mute people, although it faces many obstacles such as lighting conditions, skin colour, and background differences, in addition to the properties and settings of the camera [5]. The sensor-based approach involves wearing gloves that contain sensors intended to capture sign language. These gloves are unaffected by the obstacles that the vision-based approach faces. However, wearing them most of the time is not practical [5].
The main goal of the presented research is to prepare a dataset for Arabic sign language amounting to 220 thousand colour images and then, using pre-trained models fed with these images, to set up a system that can automatically recognize Arabic sign language across 44 categories: 32 letters, the numbers from 0 to 10, and one category for none. An evaluation of the performance of concatenating two models for Arabic sign language recognition is also presented.

2. RELATED WORKS
Deep learning is widely used in many areas. Convolutional neural networks (CNNs) are a form of deep neural network that is widely used for image analysis, and various CNN architectures are available. CNNs give highly accurate results on many real-world problems. One of their applications is image classification, which is the process of taking an image as input and producing the image's class. Good predictions can be obtained because a CNN reduces images to a form that is easy to process without losing important features. Many researchers have used different methods to recognize sign language in general, or Arabic sign language in particular, and some of them are presented here.
In [7], a CNN-based method for recognizing ArSL numbers and letters is suggested. It uses a real dataset of 5839 images of 28 characters and 2030 images of numbers (from 0 to 10). The proposed system has a recognition rate of 90.02%.
Using a fine-tuned VGG19 model, Crepso et al. [8] propose a red-green-blue (RGB) and RGB-D static gesture recognition system. The fine-tuned VGG19 model uses a feature concatenation layer of RGB and RGB-D images to increase the neural network's accuracy. Tested on an American sign language (ASL) recognition dataset, the proposed model achieved a 94.8% recognition rate.
Dadashzadeh et al. [9] suggested a two-stage fusion network based on a CNN architecture for hand gesture recognition. In the first stage of the network, they proposed a hand segmentation architecture. According to their data, the hand segmentation model performed well in difficult conditions, such as when the skin colour is similar to the background colour. For the second stage, they designed a two-stream CNN that learns to merge feature representations from both the RGB image and its segmentation map before classification. Their system runs at 23 ms per frame.
A deep learning-based method for ArSL recognition was suggested in [10]. Deep features are extracted by processing the input images through various layers. Finally, the SoftMax function is used to compute a normalized probability score for each target class. The suggested system based on the residual network ResNet101 obtained the highest accuracy, 99.52%.
Elsayed and Fathy [11] trained and tested a deep CNN architecture on a collected Arabic sign language dataset. Their experimental results show a classification accuracy of 98.6% on the training set and 94.31% on the testing set.
Althagafi et al. [12] used a CNN model that takes grayscale images as input to automatically recognize 28 letters of Arabic Sign Language. They achieved a recognition accuracy of 92.9% on 10810 tested images.
Latif et al. [13] suggested a system that recognizes the signs of the Arabic alphabet in real time. A database of more than 50,000 images was used to train and test the deep CNN architectures. Several trials were carried out to determine the highest recognition rates by changing the CNN architectural design parameters. The proposed deep CNN architecture is made up of three convolutional layers, three pooling layers, and a fully connected layer. The accuracy of the experimental results is 97.6%.
In [14], the accuracy of recognizing 32 hand gestures from Arabic sign language is improved by transfer learning and fine-tuning of deep convolutional neural networks (VGG16, ResNet152). The presented model reduced the size of the training dataset while increasing accuracy. The networks were fed with images from various Arabic Sign Language data and achieved an accuracy of approximately 99%.
A convolutional neural network (CNN) and a dataset of 20,000 sign images of 10 static digits were used in [15] to build a BSL digit recognition system. The proposed CNN model was compared with a number of other sign language models. The proposed CNN
model's architecture was close to that of VGGNet, but it had only six convolutional layers instead of VGGNet's minimum of 13. The training accuracy of the proposed system is 97.62%.
The system in [16] retrains a VGG network for real-time ASL fingerspelling recognition, classifying the 26 letters of the alphabet as well as two classes for space and delete. The system had a training set accuracy of 98.53% and a validation set accuracy of 98.84%.
The system proposed in [17] uses a CNN to recognize Arabic hand sign-based letters and translate them into Arabic speech. This system has a 90% accuracy rate in recognizing Arabic sign letters.
Tasmere et al. [18] introduced a system to recognize hand gestures in real time. Hand segmentation in the YCbCr colour space was used for gesture identification, followed by the suggested CNN model, which consists of three convolution layers, two max-pooling layers, and two fully connected layers. For 11 gestures from depth images, the proposed technique provided an accuracy of 94.61%. A dataset containing 1320 sample images was used.
In the current study, several attempts are made to develop both single models and multi-models to increase the performance and accuracy of Arabic sign language recognition. In addition, this study is distinguished by the following:
- A large colour dataset was prepared for Arabic sign language, because many researchers who work on Arabic sign language recognition are unable to access such data.
- In previous research that used the multi-model method, each model received different input data, such as colour and depth images. In this study, the same input colour images were used for each model.
- The CNN models generate feature maps of different lengths with different ranges of values. When using multiple models, we normalized the values of these feature maps to the same range before merging the features of the two models.

3. EXPERIMENTAL METHODOLOGY
3.1. Dataset
Three-channel RGB images are received from the camera, and the RGB images of the various static signs are collected in this dataset. The dataset comprises 220,000 images in 44 categories: 32 letters, as shown in Figure 1, that express the whole Arabic sign language (ArSL) alphabet, 11 numbers (0 to 10), and 1 for none. For each static sign, 5000 images were collected from 10 different volunteers. The dataset was divided into three groups for training, validation, and testing: 80% of the data (176,000 images) were used for training, 10% (22,000 images) for validation, and 10% (22,000 images) for testing. The dataset also covers diverse lighting conditions and backgrounds, as well as varying distances between the user and the camera, as shown in Figure 2.
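
The split sizes quoted above follow directly from the class count and the per-class image count. The short sketch below reproduces that arithmetic; the function name `split_counts` is hypothetical and is not taken from the original implementation.

```python
def split_counts(total, percents=(80, 10, 10)):
    """Return (train, validation, test) image counts for the given percentages."""
    train = total * percents[0] // 100
    val = total * percents[1] // 100
    test = total - train - val  # the remainder goes to the test set
    return train, val, test

total_images = 44 * 5000  # 44 categories, 5000 images per category
print(total_images, split_counts(total_images))
```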

Figure 1. Arabic alphabet signs are a type of static gesture and are performed using a single hand

Figure 2. A set of images of the letter Ba from the dataset

3.2. Data preprocessing
Sign images were preprocessed by resizing and normalization. Each image is first resized to 100x100 pixels, a size chosen as a tradeoff between accuracy and execution time. The images are then normalized to change the range of pixel intensity values, resulting in a mean of 0 and a variance of 1.
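
A minimal sketch of these two preprocessing steps is given below, using NumPy only. It assumes that the normalization is applied per image (the text does not say whether the statistics are per image or per dataset), and it uses a simple nearest-neighbour resize as a stand-in for whatever image library the authors used; both function names are hypothetical.

```python
import numpy as np

def resize_nearest(image, size=(100, 100)):
    """Nearest-neighbour resize of an H x W x C array to size (rows, cols)."""
    h, w = image.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return image[rows][:, cols]

def standardize(image, eps=1e-7):
    """Shift and scale pixel values to zero mean and unit variance (per image)."""
    image = image.astype(np.float32)
    return (image - image.mean()) / (image.std() + eps)

def preprocess(image):
    return standardize(resize_nearest(image))

# Random stand-in for a camera frame.
frame = np.random.default_rng(0).integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
x = preprocess(frame)
print(x.shape, round(float(x.mean()), 3), round(float(x.std()), 3))
```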

3.3. Data augmentation
Very powerful deep neural networks are usually trained on millions of images. The disadvantage of a limited training set is that the network may memorize the training data: it predicts the training set well, but its validation accuracy is poor. To address this, data augmentation was applied to prevent overfitting and improve the model's generalization ability [19]. The study uses online data augmentation. Several augmentation techniques were used for the static signs to prevent overfitting and enhance learning capability: image normalization, brightness range (0.4-1.2), zoom range (1.0-1.2), height shift range (10%), width shift range (10%), and rotation range (±10°). For the dynamic signs, the data were augmented by applying rotation of ±(5°-10°), translation of ±(4-8%), and brightness changes of ±(8-28%), by sharpening the image, by adding salt-and-pepper noise, by blurring with Gaussian, median, and averaging filters, and by the morphological operations of erosion and dilation. We also flipped the images horizontally to cover both left-handed and right-handed signing. These operations increase the training set about 48 times. All of these operations are applied at random inside each mini-batch fed into the model.
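
The idea of online augmentation can be illustrated with a small NumPy sketch that applies random operations to each image of a mini-batch. Only three of the listed operations are shown (brightness in the 0.4-1.2 range, horizontal flip, and shifts of up to 10%); zoom and rotation are omitted for brevity. The function names and the zero-filling of shifted borders are assumptions, not details from the original implementation.

```python
import numpy as np

def augment(image, rng):
    """Randomly change brightness, flip, and shift one H x W x C uint8 image."""
    out = image.astype(np.float32)
    # Brightness factor drawn from the 0.4-1.2 range quoted in the text.
    out = np.clip(out * rng.uniform(0.4, 1.2), 0, 255)
    # Horizontal flip with probability 0.5 to mimic the other signing hand.
    if rng.random() < 0.5:
        out = out[:, ::-1]
    # Height and width shifts of up to 10% of the image size, zero-filled.
    h, w = out.shape[:2]
    dy = int(rng.integers(-(h // 10), h // 10 + 1))
    dx = int(rng.integers(-(w // 10), w // 10 + 1))
    shifted = np.zeros_like(out)
    src = out[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    shifted[max(0, dy):max(0, dy) + src.shape[0],
            max(0, dx):max(0, dx) + src.shape[1]] = src
    return shifted.astype(np.uint8)

def augment_batch(batch, seed=None):
    """Augment every image of a mini-batch independently (online augmentation)."""
    rng = np.random.default_rng(seed)
    return np.stack([augment(image, rng) for image in batch])

batch = np.full((8, 100, 100, 3), 128, dtype=np.uint8)
print(augment_batch(batch, seed=0).shape)  # (8, 100, 100, 3)
```

Because the random parameters are drawn again for every batch, the model rarely sees exactly the same image twice, which is what makes the augmentation "online".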

3.4. Pre-trained models
Transfer learning is exploited here by using pre-trained models. ImageNet is a research project that aims to create a massive image database. Models such as DenseNet121 [20], VGG16 [21], NASNetMobile [22], Xception [23], MobileNetV2 [24], EfficientB0 [25], InceptionV3 [26], and ResNet50 [27] were trained on various classes of images. These models were built from scratch and trained on millions of images containing thousands of objects using high-performance GPUs. Because each model was trained on such a large dataset, it has learned a good representation of low-level features such as spatial structure, edges, rotation, illumination, and shapes. These features can be shared to enable transfer learning and to extract features from new images across several computer vision problems. According to the concepts of transfer learning, a pre-trained model should also be able to extract specific features from new images even when they belong to entirely different classes than the source dataset. The aim is to benefit from these models in extracting features from, and classifying, images that differ from those they were trained on. This requires replacing the last layers responsible for classification with other layers that match the number of classes to be recognized, and then training on the new image data until the desired recognition accuracy is obtained. This method is considered better for reaching the required recognition accuracy than adopting untrained models.

3.5. Proposed method
3.5.1. Single model
Pre-trained models with weights trained on ImageNet are used. These models (DenseNet121, VGG16, RESNet50, MobileNetV2, Xception, EfficientB0, NASNetMobile, and InceptionV3) were used with some modifications. Each of these models includes two parts, the first for extracting features and the
second for classification. The second part was removed and the feature extraction part was kept. A global average pooling (GAP) layer was then added after the last layer of the feature extraction part. The GAP layer reduces the size of the feature map by converting it into a one-dimensional vector while keeping the vital information: a feature map with dimensions h×w×d is reduced to 1×1×d, because the GAP layer replaces each h×w feature map with the average of all its h×w values. Different additional layers were also tried to optimize classification. The best accuracy was obtained by adding one dropout layer with a rate of 20%, which randomly eliminates connections between the layers during training to prevent overfitting; the dropout layer is disabled in testing and validation mode. It is followed by a fully connected (FC) output layer of size 44, with the number of units equal to the number of classes, and a softmax activation function for classification. The following models were developed in this way: DenseNet121, VGG16, RESNet50, MobileNetV2, Xception, EfficientB0, NASNetMobile, and InceptionV3. Each of them was then trained on the Arabic sign language dataset to recognize Arabic sign language. Figure 3 shows the general layout of the architecture of each of these models.
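
The classification head described above (GAP, 20% dropout, 44-unit softmax layer) can be sketched in plain NumPy to make the shapes explicit. This is an illustration under stated assumptions, not the authors' code: the weights are random and untrained, the 3x3x1024 input is only an example feature map size, and dropout is implemented as the common "inverted" variant.

```python
import numpy as np

def global_average_pool(fmap):
    """Average each of the d channels over its h x w positions: (h, w, d) -> (d,)."""
    return fmap.mean(axis=(0, 1))

def softmax(z):
    z = z - z.max()  # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify(fmap, weights, bias, drop_rate=0.2, training=False, rng=None):
    """GAP -> dropout (training only) -> fully connected layer -> softmax."""
    features = global_average_pool(fmap)
    if training:
        # Inverted dropout: zero about 20% of the features and rescale the rest.
        keep = rng.random(features.shape) >= drop_rate
        features = features * keep / (1.0 - drop_rate)
    return softmax(features @ weights + bias)

rng = np.random.default_rng(0)
fmap = rng.random((3, 3, 1024))             # example h x w x d feature map
weights = rng.normal(0, 0.01, (1024, 44))   # hypothetical untrained FC weights
bias = np.zeros(44)
probs = classify(fmap, weights, bias)       # inference mode: dropout disabled
print(probs.shape, int(probs.argmax()))
```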

3.5.2. Multi-model
An attempt has been made to adopt two of the pre-trained models referred to above as parallel deep feature extractors, whose outputs are then combined and prepared for the classification stage. Figure 4 shows the architecture of a multi-model network, which consists of two branches, each of which is a CNN model. The DenseNet121 and VGG16 models are used in the case shown in Figure 4. In this multi-model, the pre-processed colour images of our dataset, with a size of 100x100 pixels, are the input to both branches. From the input image, DenseNet121 produces a 3x3x1024 feature map at its last feature extraction layer, while VGG16 produces a 3x3x512 feature map at its last feature extraction layer. To reduce the size of the last-layer feature maps and extract the important features, we applied global average pooling, which takes the average of each feature map. Because the two networks generate feature maps with different ranges of values, we normalized the values of these feature maps to the same range using a lambda layer. After normalization, we combined the feature values with a concatenation layer to improve the quality of the resulting semantic features.
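
The fusion step can be sketched as follows. The text does not state which normalization the lambda layer applies, so this example assumes L2 normalization of each pooled feature vector; the function names are hypothetical and the feature maps are random stand-ins with the sizes quoted above.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale a feature vector to unit Euclidean length (assumed normalization)."""
    return v / (np.linalg.norm(v) + eps)

def fuse(fmap_a, fmap_b):
    """GAP each branch, normalize it, then concatenate into one feature vector."""
    a = l2_normalize(fmap_a.mean(axis=(0, 1)))
    b = l2_normalize(fmap_b.mean(axis=(0, 1)))
    return np.concatenate([a, b])

rng = np.random.default_rng(0)
dense_out = rng.random((3, 3, 1024))       # DenseNet121 branch: 3x3x1024
vgg_out = 50.0 * rng.random((3, 3, 512))   # VGG16 branch, deliberately on a larger scale
fused = fuse(dense_out, vgg_out)
print(fused.shape)  # (1536,)
```

The fused length, 1024 + 512 = 1536, matches the feature size listed for DenseNet121 & VGG16 in Table 2. Without the normalization, the branch with the larger value range would dominate the concatenated vector.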
Data augmentation, one of the most popular methods for reducing overfitting, is applied during training for both the single models and the multi-models. While the model is trained on the GPU, the data augmentation is performed in real time on the CPU. Experiments are run on a single computer with an Intel Core i7-9750H hexa-core CPU, 16 GB SDRAM, and an NVIDIA GeForce RTX 2060 GPU with 6 GB of memory. Python modules are used to implement the neural network models.

Figure 3. The architecture of the single model network

Figure 4. The architecture of the concatenated network

3.6. Evaluation of proposed method
The best models proposed in this study were evaluated by comparing them with other studies that work on a standard dataset. Because the proposed models were trained on the Arabic sign language data that we prepared, they were retrained on the standard American sign language (ASL) data from a Kaggle challenge. The ASL data includes 87,000 images divided into 29 categories: 26 letters of the ASL alphabet and three classes for space, delete, and empty. The ASL dataset was divided into three sets for training, validation, and testing: 80% of the data (69,600 images) were used for training, 10% (8700 images) for validation, and 10% (8700 images) for testing. The performance of the models in this study can thus be compared with previous studies that used the same ASL dataset.

3.7. General workflow of the proposed method
The open-source Google MediaPipe technology is used to detect the hands. This platform provides real-time computer vision functions, including hand detection and hand tracking. It was released in 2020. MediaPipe provides detailed real-time finger tracking for multiple hands, and the accuracy of its palm detection is 95%. MediaPipe uses two convolutional neural network models to detect the hand in a picture or video clip: palm detection and finger detection. This was used to define the hand region to be extracted [28]. The sequence of frames captured by the camera is passed through the MediaPipe hand detector to find the hand boundary. The hand region is then extracted and passed to the preprocessing stage, which resizes and normalises the hand region image. The hand region image is then passed into the single or multi-CNN models for sign language recognition by feature extraction and classification. Figure 5 shows the overall architecture of the system for hand detection and sign language recognition.
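
The hand-region extraction step can be illustrated without calling the detector itself. The sketch below assumes that the detector returns hand landmarks as (x, y) pairs normalized to the range 0 to 1 relative to the frame width and height; the function names, the 10% margin, and the example landmark values are hypothetical.

```python
import numpy as np

def hand_box(landmarks, width, height, margin=0.1):
    """Pixel bounding box (x0, y0, x1, y1) around normalized (x, y) landmarks."""
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    pad_x = margin * (max(xs) - min(xs))
    pad_y = margin * (max(ys) - min(ys))
    # Clamp the padded box to the frame so the crop never leaves the image.
    x0 = max(0, int((min(xs) - pad_x) * width))
    y0 = max(0, int((min(ys) - pad_y) * height))
    x1 = min(width, int((max(xs) + pad_x) * width) + 1)
    y1 = min(height, int((max(ys) + pad_y) * height) + 1)
    return x0, y0, x1, y1

def crop_hand(frame, landmarks):
    """Cut the hand region out of an H x W x C frame."""
    h, w = frame.shape[:2]
    x0, y0, x1, y1 = hand_box(landmarks, w, h)
    return frame[y0:y1, x0:x1]

frame = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in for a camera frame
landmarks = [(0.4, 0.4), (0.6, 0.7), (0.5, 0.55)]  # hypothetical detector output
print(crop_hand(frame, landmarks).shape)
```

The cropped region would then go through the resizing and normalization stage before being passed to the CNN.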

Figure 5. The general workflow of the proposed method for hand detection and gesture recognition

4. RESULTS AND DISCUSSION
The performance evaluation of our proposal has been carried out with the following metrics [29], widely used for this kind of task: Accuracy, Precision, Recall and F1-score. They are defined as follows:
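
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

$$\text{F1-score} = \frac{2 \times (\text{Precision} \times \text{Recall})}{(\text{Precision} + \text{Recall})} \tag{4}$$
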
A false negative (FN) is a result in which the model wrongly predicts the negative class. A false positive (FP) is a result in which the model wrongly predicts the positive class. A true negative (TN) is a result in which the model correctly predicts the negative class. A true positive (TP) is a result in which the model correctly predicts the positive class.
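
As an illustration, the four metrics can be computed from the confusion counts as in the sketch below. The function name and the example counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f1) from the four confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * (precision * recall) / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for one class treated as "positive" against all others.
acc, prec, rec, f1 = classification_metrics(tp=8, tn=85, fp=2, fn=5)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```

For a multi-class problem such as the 44 sign categories, precision, recall, and F1-score are typically computed per class in this one-versus-rest fashion and then averaged.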
Table 1 compares the validation accuracy and test accuracy of the single models and multi-models after 5 epochs. Accuracy is the ratio of the number of correct classifications to the total number of classifications. Incorrect Recognize Training, Validation, and Test are the numbers of misclassified images in the training, validation, and testing sets, respectively.
The table shows that both the validation accuracy and the test accuracy were above 97% for all single and multi-models. The accuracy is high despite
several incorrectly recognized sign images in testing or validation because of the large size of the data: 22,000 images were used for each of testing and validation. The table also compares the performance of the single models and the multi-models. In most cases the multi-model is better at feature extraction and classification than the single models.
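
The relationship between a misclassification count and the reported accuracy can be checked with a one-line calculation. The sketch below uses the DenseNet121 validation figures (3 misclassified images out of 22,000); the function name is hypothetical.

```python
def accuracy_percent(incorrect, total):
    """Accuracy in percent given the number of misclassified images."""
    return 100.0 * (total - incorrect) / total

# DenseNet121 validation set: 3 misclassified images out of 22,000.
print(round(accuracy_percent(3, 22_000), 2))  # 99.99
```

With 22,000 images per set, each misclassified image lowers the accuracy by less than 0.005 percentage points, which is why the counts of incorrectly recognized images separate the models more clearly than the rounded accuracies do.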

Table 1. Comparison of the validation accuracy and test accuracy for single and multi-models with Epochs = 5

| Model | Incorrect Recognize Training | Incorrect Recognize Validation | Validation Accuracy % | Incorrect Recognize Test | Test Accuracy % |
|---|---|---|---|---|---|
| DenseNet121 | 12 | 3 | 99.99 | 1 | 100 |
| VGG16 | 9 | 6 | 99.97 | 5 | 99.98 |
| DenseNet121 & VGG16 | 1 | 1 | 100 | 1 | 100 |
| RESNet50 | 34 | 10 | 99.95 | 6 | 99.97 |
| MobileNetV2 | 45 | 3 | 99.99 | 6 | 99.97 |
| RESNet50 & MobileNetV2 | 23 | 5 | 99.98 | 5 | 99.98 |
| Xception | 110 | 15 | 99.93 | 15 | 99.93 |
| Efficient B0 | 267 | 53 | 99.85 | 38 | 99.83 |
| Xception & Efficient B0 | 106 | 16 | 99.93 | 17 | 99.92 |
| NASNetMobile | 2334 | 328 | 98.51 | 320 | 98.55 |
| InceptionV3 | 3883 | 508 | 97.69 | 491 | 97.77 |
| NASNetMobile & InceptionV3 | 3304 | 415 | 98.11 | 417 | 98.10 |
| DenseNet121 & MobileNetV2 | 7 | 2 | 99.99 | 1 | 100 |
| DenseNet121 & RESNet50 | 11 | 2 | 99.99 | 2 | 99.99 |

Table 2 shows the total parameters, FPS, training time, size of feature maps, and total number of incorrectly recognized sign images out of 220 thousand for the different models. The columns are defined as follows:
- Total parameters: the parameters learned by the network during the training process. Their number determines the complexity of the network and its capacity for better learning, but a larger number needs more images to train the network.
- Training time: the time taken to train the network.
- Size of feature maps: the output size of the last feature extraction layer.
- Frames per second (FPS): the most common measure of speed in object detection. It indicates the maximum number of frames that the network can process in one second.
- Total incorrect recognize: the number of misclassified images.
The three best models in feature extraction and classification are the multi-models DenseNet121 & VGG16, DenseNet121 & MobileNetV2, and DenseNet121 & RESNet50. This ranking is based on the total number of incorrectly recognized sign images in the training, validation, and testing datasets. The table shows that the training time of a multi-model is greater than the training time of either of the single models that compose it, but less than the combined training time of both. It also shows that the FPS of a multi-model is lower than that of its single models, ranging between 66% and 96% of the single-model FPS. In addition, the FPS decreases as the total number of parameters increases. This is evident in the multi-models, whose total parameters are greater than those of the single models.
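
The 66-96% range can be reproduced from the FPS column of Table 2 by dividing each multi-model's FPS by the FPS of each of its two branches. The sketch below is only a check of that arithmetic; the dictionary layout is an illustrative choice.

```python
# FPS values for inference as reported in Table 2.
single_fps = {
    "DenseNet121": 24, "VGG16": 32, "RESNet50": 28, "MobileNetV2": 32,
    "Xception": 28, "EfficientB0": 24, "NASNetMobile": 21, "InceptionV3": 23,
}
multi_fps = {
    ("DenseNet121", "VGG16"): 22,
    ("RESNet50", "MobileNetV2"): 24,
    ("Xception", "EfficientB0"): 19,
    ("NASNetMobile", "InceptionV3"): 16,
    ("DenseNet121", "RESNet50"): 23,
    ("DenseNet121", "MobileNetV2"): 21,
}

# Ratio of each multi-model's FPS to the FPS of each of its two branches.
ratios = [fps / single_fps[name] for pair, fps in multi_fps.items() for name in pair]
print(f"{min(ratios):.0%} to {max(ratios):.0%}")  # 66% to 96%
```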

Table 2. Total parameters, FPS and total incorrect recognize for different deep CNN models

| Model | Total Parameters (×10^6) | Training Time (hour) | Size of Feature Maps | FPS For Inference | Total Incorrect Recognize |
|---|---|---|---|---|---|
| DenseNet121 | 7.08 | 2.13 | 3*3*1024 | 24 | 16 |
| VGG16 | 14.73 | 1.52 | 3*3*512 | 32 | 20 |
| DenseNet121 & VGG16 | 21.81 | 3.17 | 1536 | 22 | 3 |
| RESNet50 | 23.67 | 1.58 | 4*4*2048 | 28 | 50 |
| MobileNetV2 | 2.31 | 1.17 | 4*4*1280 | 32 | 54 |
| RESNet50 & MobileNetV2 | 25.99 | 2.28 | 3328 | 24 | 33 |
| Xception | 20.95 | 2.00 | 3*3*2048 | 28 | 140 |
| Efficient B0 | 4.10 | 2.23 | 4*4*1280 | 24 | 358 |
| Xception & Efficient B0 | 25.05 | 3.55 | 3328 | 19 | 139 |
| NASNetMobile | 4.31 | 3.68 | 4*4*1056 | 21 | 2692 |
| InceptionV3 | 21.89 | 2.18 | 1*1*2048 | 23 | 4882 |
| NASNetMobile & InceptionV3 | 26.3 | 4.77 | 3104 | 16 | 4136 |
| DenseNet121 & RESNet50 | 30.76 | 3.22 | 3072 | 23 | 15 |
| DenseNet121 & MobileNetV2 | 9.39 | 3.12 | 2304 | 21 | 10 |

Several single and multi-models have been trained and tested. Based on the method described above, the DenseNet121 and VGG16 networks extract deep features better than the other networks. When the concatenation of DenseNet121 and VGG16 is compared with the other neural networks, the concatenated network obtains the higher accuracy. Concatenating the feature vectors of both networks helps the combined network learn the representations of both, which represent the image more accurately and produce better prediction accuracy.
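The concatenation step can be sketched in plain Python. The feature map shapes (3*3*1024 for DenseNet121 and 3*3*512 for VGG16) and the fused length of 1536 come from Table 2; the use of global average pooling to reduce each map to a vector is an assumption made here for illustration, and the random values are placeholders for real backbone outputs.

```python
import random


def global_average_pool(feature_map):
    """Collapse an H x W x C feature map (nested lists) to a length-C vector."""
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    pooled = [0.0] * channels
    for row in feature_map:
        for cell in row:
            for c, value in enumerate(cell):
                pooled[c] += value
    return [total / (height * width) for total in pooled]


def random_feature_map(height, width, channels, rng):
    # Placeholder for the output of a backbone's last feature extraction layer.
    return [[[rng.random() for _ in range(channels)]
             for _ in range(width)] for _ in range(height)]


rng = random.Random(0)
densenet_map = random_feature_map(3, 3, 1024, rng)  # 3*3*1024 in Table 2
vgg_map = random_feature_map(3, 3, 512, rng)        # 3*3*512 in Table 2

# Concatenate the two pooled vectors into one descriptor for the classifier.
fused = global_average_pool(densenet_map) + global_average_pool(vgg_map)
print(len(fused))
```

The fused vector has one entry per channel of each backbone, which is why the multi-model feature sizes in Table 2 equal the sum of the channel counts of the two single models.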
The single models used in the proposed method are arranged in Tables 1 and 2 according to accuracy, measured by the total number of incorrectly recognized images. The multi-models were then formed from every two consecutive single models in this sequence, giving four multi-models. Further options were added by combining the best single model with other single models outside the sequence.
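The pairing procedure described above can be expressed as a short sketch. The model order follows the total incorrect recognize counts in Table 2 (fewest errors first); the function names `pair_in_sequence` and `pair_best_with_others` are hypothetical.

```python
def pair_in_sequence(ranked_models):
    """Pair models 1-2, 3-4, ... from a list ranked best to worst."""
    return [tuple(ranked_models[i:i + 2])
            for i in range(0, len(ranked_models) - 1, 2)]


def pair_best_with_others(ranked_models, partners):
    """Pair the top-ranked model with selected models outside its own pair."""
    best = ranked_models[0]
    return [(best, partner) for partner in partners]


# Single models ordered by total incorrectly recognized images (fewest first).
ranked = ["DenseNet121", "VGG16", "ResNet50", "MobileNetV2",
          "Xception", "EfficientNetB0", "NASNetMobile", "InceptionV3"]

sequence_pairs = pair_in_sequence(ranked)
extra_pairs = pair_best_with_others(ranked, ["ResNet50", "MobileNetV2"])
print(sequence_pairs)
print(extra_pairs)
```

Eight single models give four sequential pairs, and the two extra pairs built on DenseNet121 account for the six multi-models listed in Table 2.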
Figure 6 shows the training and validation accuracy, together with the training and validation loss, of the multi-model DenseNet121 & VGG16. The accuracy continues to increase and the loss continues to decrease during the training and validation phases.
Figure 6. The training and validation accuracy in addition to the training and validation loss of multi-model DenseNet121 & VGG16
A 44-class confusion matrix is used to evaluate the model, trained with data augmentation techniques, on ArSL image classification. Columns represent the true classes, and rows represent the classifier's predictions. All correct classifications lie on the diagonal of the square matrix.
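A minimal sketch of this layout, with predictions as rows and true classes as columns, is given below. The labels are a made-up three-class example rather than the 44-class ArSL data, and the function names are illustrative.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Build a matrix with predictions as rows and true classes as columns."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[predicted][true] += 1
    return matrix


def count_correct(matrix):
    # Correct classifications lie on the main diagonal.
    return sum(matrix[i][i] for i in range(len(matrix)))


# Tiny made-up example with 3 classes instead of 44.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]
cm = confusion_matrix(y_true, y_pred, 3)
print(cm)
print(count_correct(cm), "of", len(y_true), "correct")
```

Note that some libraries use the transposed convention (true classes as rows), so the orientation should be checked before reading off-diagonal entries.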
The evaluation results of the multi-model DenseNet121 & VGG16 on the training and testing sets are illustrated in the confusion matrices shown in Figures 7 and 8. Figure 9 tabulates the precision, recall, F1-score, and support for each of the 44 Arabic sign language classes, as recognized by the multi-model DenseNet121 & VGG16.
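The per-class quantities tabulated in Figure 9 can be derived from a confusion matrix as sketched below. The matrix orientation (rows are predictions, columns are true classes) follows the description in this section; the three-class matrix is a made-up example and `per_class_report` is a hypothetical helper name.

```python
def per_class_report(matrix):
    """Precision, recall, F1 and support per class.

    Assumes rows are predicted classes and columns are true classes.
    """
    n = len(matrix)
    report = []
    for k in range(n):
        true_positive = matrix[k][k]
        predicted_as_k = sum(matrix[k])                # row total
        support = sum(matrix[r][k] for r in range(n))  # column total
        precision = true_positive / predicted_as_k if predicted_as_k else 0.0
        recall = true_positive / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report.append({"precision": precision, "recall": recall,
                       "f1": f1, "support": support})
    return report


cm = [[1, 0, 1],
      [1, 2, 0],
      [0, 0, 2]]
for k, row in enumerate(per_class_report(cm)):
    print(k, row)
```

Support is the number of true samples of a class, so with a balanced dataset it is the same for every class.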
Figure 7. Training confusion matrix of multi-model DenseNet121 & VGG16
Figure 8. Testing confusion matrix of multi-model DenseNet121 & VGG16
Figure 9. Tabulation of precision, recall, f1-score, and support for each class for testing & training network of multi-model DenseNet121 & VGG16
Table 3 compares the validation accuracy and test accuracy of the single models and the multi-model, trained for 5 epochs on the ASL dataset. The multi-model appears to be better in feature extraction and classification than the single models. In addition, the multi-model reached 100% accuracy in training, validation, and testing when training was extended to 7 epochs.
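The accuracy percentages follow from the incorrect recognize counts once the size of each split is known. The sketch below shows the arithmetic; the split size of 8,700 images is a hypothetical value chosen only for illustration, since the size of the validation and test splits is not stated in this section.

```python
def accuracy_percent(incorrect, total):
    """Accuracy in percent given the number of misclassified images."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (total - incorrect) / total


# ASSUMPTION: 8,700 images per split is a hypothetical size, not a reported one.
ASSUMED_SPLIT_SIZE = 8700

for model, incorrect in [("DenseNet121", 1), ("VGG16", 6),
                         ("DenseNet121 & VGG16", 0)]:
    print(model, round(accuracy_percent(incorrect, ASSUMED_SPLIT_SIZE), 2))
```

Because only a handful of images are misclassified, differences between the models appear in the second decimal place, which is why the raw error counts are reported alongside the percentages.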
Table 4 compares this work with previous works on the ASL dataset. From Table 4, it is clear that the proposed method, whether using a single model or the multi-model, achieves higher accuracy than the models presented in the previous studies listed in the table.
Table 3. Comparison of the validation accuracy and test accuracy for single and multi-models with Epochs=5 for ASL dataset

| Model | Incorrect Recognize Training | Incorrect Recognize Validation | Validation Accuracy % | Incorrect Recognize Test | Test Accuracy % |
|---|---|---|---|---|---|
| DenseNet121 | 10 | 1 | 99.99 | 0 | 100 |
| VGG16 | 38 | 6 | 99.93 | 3 | 99.97 |
| DenseNet121 & VGG16 | 2 | 0 | 100 | 0 | 100 |
Table 4. Comparison of this work and previous works for ASL dataset

| Authors | Description | Accuracy |
|---|---|---|
| Lum et al., 2020 [30] | Transfer learning using MobileNetV2 on 29 classes | 98.67% |
| Sinha et al., 2019 [31] | Custom CNN model with fully connected layer on 29 classes | 96.03% |
| Kadhim et al., 2020 [32] | Transfer learning using VGG1 on 28 classes | 98.65% |
| Paul et al., 2020 [33] | Custom CNN model with fully connected layer on 24 classes | 99.02% |
| Mahmud et al., 2018 [34] | HOG feature extraction & KNN classifier on 26 classes | 94.23% |
| Prasad, 2018 [35] | Image magnitude gradient for feature extraction on 24 classes | 95.40% |
| Phong & Ribeiro, 2019 [36] | Transfer learning on multiple architecture, etc. on 29 classes | 99.00% |
| Ashiquzzaman et al., 2020 [37] | Transfer learning using VGG16 on 29 classes | 94.00% |
| This work, single model | Transfer learning using DenseNet121 on 29 classes | 100.00% |
| This work, single model | Transfer learning using VGG16 on 29 classes | 99.97% |
| This work, multi-model | Transfer learning using multi-model DenseNet121 & VGG16 on 29 classes | 100.00% |
5. CONCLUSION
Through analysis and discussion of the results of the proposed method, and under the limitations adopted by the research, the following was concluded. The research prepared a dataset of about 220 thousand colour images, as there is no public colour dataset for Arabic sign language recognition. When comparing the performance of single models and multi-models, it appears that most multi-models are better in feature extraction than single models. Based on the total number of incorrectly recognized sign images in the training, validation, and testing datasets, DenseNet121 is the best single CNN model for extracting features and classifying Arabic sign language, and the DenseNet121 & VGG16 multi-model CNN is the best overall. With the proposed method, the multi-model is also better than the single models for the feature extraction and classification of ASL. Moreover, the accuracy of the proposed method, whether using a single model or a multi-model, is better than that of the models presented in previous studies for extracting features and classifying ASL.
In future research, the work will be extended to develop a mobile-based application that recognizes Arabic sign language in real time. The system will also be extended to dynamic gesture recognition for Arabic sign language, which requires preparing a video-based dataset.
REFERENCES
[1] A. Thongtawee, O. Pinsanoh, and Y. Kitjaidure, “A Novel Feature Extraction for American Sign Language Recognition Using Webcam,” 11th Biomedical Engineering International Conference (BMEiCON), 2018, pp. 1-5, doi: 10.1109/BMEiCON.2018.8609933.
[2] A. Al-Khalifa, “The Arabic Dictionary of Gesture for the Deaf,” Supreme Council for Family Affairs, 2008. [Online]. Available: https://arab.org/directory/supreme-council-family-affairs/
[3] M. Mukushev, A. Sabyrov, A. Imashev, K. Koishibay, V. Kimmelman, and A. Sandygulova, “Evaluation of Manual and Non-Manual Components for Sign Language Recognition,” Proceedings of the 12th Language Resources and Evaluation Conference, 2020, pp. 6073-6078.
[4] H. Cooper, B. Holt, and R. Bowden, “Sign Language Recognition,” Visual Analysis of Humans. London: Springer, pp. 539-562, 2011, doi: 10.1007/978-0-85729-997-0_27.
[5] A. H. Vo, V. H. Pham, and B. T. Nguyen, “Deep Learning for Vietnamese Sign Language Recognition in Video Sequence,” International Journal of Machine Learning and Computing, vol. 9, no. 4, pp. 440-445, 2019, doi: 10.18178/ijmlc.2019.9.4.823.
[6] S. M. Elatawy, D. M. Hawa, A. A. Ewees, and A. M. Saad, “Recognition System for Alphabet Arabic Sign Language Using Neutrosophic and Fuzzy C-Means,” Education and Information Technologies, vol. 25, pp. 5601-5616, 2020, doi: 10.1007/s10639-020-10184-6.
[7] S. Hayani, M. Benaddy, O. El Meslouhi, and M. Kardouchi, “Arab Sign Language Recognition with Convolutional Neural Networks,” 2019 International Conference of Computer Science and Renewable Energies (ICCSRE), 2019, pp. 1-4, doi: 10.1109/ICCSRE.2019.8807586.
[8] R. G. Crespo, M. Khari, E. Verdú, M. Khari, and A. K. Garg, “Gesture Recognition of RGB and RGB-D Static Images Using Convolutional Neural Networks,” International Journal of Interactive Multimedia & Artificial Intelligence, vol. 5, no. 7, pp. 23-27, 2019, doi: 10.9781/ijimai.2019.09.002.
[9] A. Dadashzadeh, A. T. Targhi, M. Tahmasbi, and M. Mirmehdi, “HGR-Net: A Fusion Network for Hand Gesture Segmentation and Recognition,” IET Computer Vision, vol. 13, no. 8, pp. 700-707, 2019, doi: 10.1049/iet-cvi.2018.5796.
[10] A. I. Shahin and S. Almotairi, “Automated Arabic Sign Language Recognition System Based on Deep Transfer Learning,” IJCSNS Int. J. Comput. Sci. Netw. Secur., vol. 19, no. 10, pp. 144-152, 2019.
[11] E. Elsayed and D. R. Fathy, “Sign Language Semantic Translation System Using Ontology and Deep Learning,” International Journal of Advanced Computer Science and Applications, vol. 11, no. 1, pp. 141-147, 2020, doi: 10.14569/IJACSA.2020.0110118.