International Journal of Electrical and Computer Engineering (IJECE)
Vol. 6, No. 5, October 2016, pp. 2176~2187
ISSN: 2088-8708, DOI: 10.11591/ijece.v6i5.11384
Journal homepage: http://iaesjournal.com/online/index.php/IJECE
Sign Language Recognition System Simulated for Video Captured with Smart Phone Front Camera
G. Ananth Rao, P.V.V. Kishore
Department of Electronics and Communications Engineering, K.L. University, India
Article Info
Article history: Received May 20, 2016; Revised Jul 22, 2016; Accepted Aug 10, 2016
ABSTRACT
This work's objective is to bring sign language closer to real time implementation on mobile platforms. A video database of Indian sign language is created with a mobile front camera in selfie mode. The video is processed on a personal computer by constraining the computing power to that of a smart phone with 2 GB RAM. Pre-filtering, segmentation and feature extraction on video frames create a sign language feature space. Minimum distance classification of the sign feature space converts signs to text or speech. An ASUS smart phone with a 5M pixel front camera captures continuous sign videos containing around 240 frames at a frame rate of 30 fps. The Sobel edge operator's power is enhanced with morphology and adaptive thresholding, giving a near perfect segmentation of the hand and head portions. Word matching score (WMS) estimates the performance of the proposed method, with an average WMS of around 90.58%.
Keyword:
Indian sign language
Mahalanobis distance
Mobile platform
Sobel adaptive threshold
Morphological differencing
Copyright © 2016 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
P.V.V. Kishore,
Department of Electronics and Communications Engineering,
K.L. University, Green Fields, Vaddeswaram, Guntur DT, Andhra Pradesh, India.
Email: pvvkishore@kluniversity.in
1. INTRODUCTION
Computer vision is a field that features strategies to process, analyze and understand images. It aims to replicate the potential of human vision by apprehending an image. One of the major hindrances to the applications of computer vision is motion estimation, shape estimation and analysis. Sign language is an intricate, computer vision based language that engages signs shaped by hand movements in combination with facial expressions and postures. It maps the natural way of communication to human signs and gestures, enabling hearing impaired people to communicate among themselves. Dynamic hand movements are involved in gestures, and they form signs such as numbers, alphabets and sentences.
Classification of gestures can be identified as both static and dynamic. Static gestures involve time invariant finger orientations, whereas dynamic gestures support time varying hand orientations and head positions. The proposed front camera model for sign language recognition is a computer vision based approach and does not employ motion or colored gloves for gesture recognition.
An efficient sign language recognition system requires knowledge of feature tracking and hand orientations. Approaches for identification of sign language can be categorized as glove based and computer vision based. The former requires gloves to be worn by signers during the performance of gestures. It minimizes the complexity of segmentation in the processing stages but complicates the hardware requirements. The vision based approach advances image processing methodologies for hand sign detection and tracking. This approach also captures the signer's facial expressions and is feasible for the signer, as it does not involve the use of gloves. Accuracy hinders the usability of the image processing approach, making it an enterprising research field.
Basically, sign language is used by hearing impaired people for their communication. Using sign language one can communicate letters, words or even sentences of a general spoken language by using different hand signs and different hand gestures. This type of communication helps hearing impaired people to talk or express their views. These kinds of systems bridge the channel between normal people and hearing impaired people.
A basic sign language system is based on five parameters: hand and head recognition, hand and head orientation, hand movement, shape of the hand, and location of the hand and head (which depends upon the background). Among the five parameters, the two most important are hand and head orientation and hand movement in a particular direction. These systems help in recognizing sign languages with better accuracy. Hand shapes and the head are segmented to obtain feature vectors [1]. These feature vectors are classified and given to neural networks for training. By advancing such methods in sign language recognition, a large amount of research can be done for human computer interfaces. The major and foremost difficulty in sign language recognition is to identify the signer, facial expressions and posture of the person, and not only the hand movements but also the head movements. All these attributes combined together can make a good recognition system.
The next major problem observed in a sign language recognition system is to track the signer in the video while avoiding the other variations in the video. This is the major task on which researchers concentrate. Tracking the hand is quite simple compared to tracking all the finger variations, which is the most difficult task to perform. A sign language space can be obtained with different entities such as humans or objects stored in it around a 3-dimensional body centered space of the signer. These entities are placed at certain locations and later referenced by pointing to that space. Defining a model for spatial information containing entities is another challenge faced by researchers [2].
In a selfie sign language recognition system the hand position plays a crucial role, because implementing different positions with a single hand can cause the hand to overlap the face. In selfie capture the background is not constant and varies continuously, which is one of the challenging tasks in this recognition system. Another major challenge faced by researchers in sign language recognition is the background of the signer and the contrast of the light where the signer is present. We have worked on a simple background. Our research is based upon selfie based sign language recognition with real time constraints such as non-uniform background and varied lighting, and aims to make the system independent of the signer.
Object detection (segmentation) is done by applying gradient masking to the image. This is done because the background differs greatly from the object in the image. Thresholds are obtained with different segmentation operators. The system is tuned with threshold values and the edge operator is applied to get a binary mask of the image. The binary gradient mask is usually dilated using vertical structuring elements such as "ball", "disk" and "diamond", followed by horizontal structuring elements.
A sentence in sign language is recorded using a camera and the obtained video is divided into several frames. Each sign in a frame taken out of a set of frames is processed and the features are extracted such that the features apply to nearby preceding and succeeding frames. Our research mainly deals with selfie videos of sign language, which are divided into frames for processing. The proposed system concentrates on segmenting the hand and head from the given set of frames for the various signs in the video, and features are extracted for various hand and head models.
In selfie SLR, people can communicate with the help of their phones by recording their sign video and sending it to the other person. The sent video is decoded according to the system we designed and the signs are then converted to text.
P.V.V. Kishore in [3] projected a skeleton of an isolated video based ISL identification system in which computational intelligence and image processing methodologies are integrated. Wavelet based video segmentation procedures are adopted for detection of hand shapes and head positions. Elliptical Fourier descriptors achieve unique feature vectors for each individual gesture, and a linear output membership function based Sugeno fuzzy inference system recognizes these gestures at a rate of 96% for an 80 word and 10 sentence trained system.
P.V.V. Kishore in [4] recommended an efficient theory of segmentation for SLR in which the background in videos is assumed to be non-static. Profuse attributes of images such as color, texture and boundary are fused for deriving the level set energy function. Colour frames are extracted and utilized as per the background requirement of the video frames. Spatial information is provided by edge mapping, and boundary features are obtained through the edge map of the image. Gesture shapes are enumerated dynamically and are assumed to be adaptive in the entire process of segmentation of video frames.
Tzuu-Hseng in [5] endorsed a system for the realization of gestures using an ABC based Markov Model. Filtered data from AHRS sensors undergo principal component analysis for redundant data elimination and feature vector acquisition. ABC-HMM serves the purpose of the classifier, with the number of hidden states obtained from the k-means algorithm.
This work also provides a comparison of the achieved recognition rate with various classifiers related to Markov models and is found to present encouraging recognition rates compared to other classifiers.
Li Chunwang in [6] embarked on a model which could measure the closeness of videos in Chinese sign language. This model considers sign semantics, which define the hand shape, location and orientation, and a vision component which is a distance based on VLBP. The experimental results of this model proved effective assessment of measuring similarity.
Sutarman in [7] proposed a dynamic Malaysian sign language recognition system in which 3D image data is captured from a Kinect sensor using skeletal data tracking. Feature extraction is done using the x, y, z coordinates relative to head and spine positions. A spherical coordinate conversion process is obtained using segmentation for dimension matching. Backpropagation neural networks are used for classification, using node variations in the hidden layer. The system identifies 15 Malaysian sign languages with 80.54% accuracy. This system offers an advantage in image acquisition by using depth images and by employing infrared light for independence from lighting conditions.
Lubo Geng in [8] came up with a Chinese sign language recognition system in which hand shape features extracted from depth images and spherical coordinate features extracted from 3D hand motion trajectories form the feature vector. An Extreme Learning Machine is employed as the classifier. ELM has an accuracy of 82.79% and provides a 6% improvement in identification compared to SVM. This system incorporates a spatial and temporal representation which depicts spatial connectivity for recognition and is immune to illumination changes and cluttered backgrounds.
Setiawardhana in [9] developed an OpenCV Android application that can elucidate sign language to speech. The Viola-Jones algorithm and a KNN classifier are employed for the system design. Skin colour detection, thresholding and noise removal are carried out on input images in which the signs are finger-spelled alphabets. The system limits its utilization to a distance range of 50 cm at an angle of 0 degrees between the hand and the camera, and six to twenty FPS is required for single sign identification.
Neha Baranwal in [10] advanced an Indian sign language identification system that helps to understand the inherent meaning of a communication system where gestures play a key role. A novel gesture identification mechanism is developed where the compressed data from DWPT has its features extracted using PCA, and an ANN serves as the classifier. DWPT offers efficient processing and is advantageous compared to Haar and Wavelet Transforms.
Pratibha Pandey in [11] put forth a review of various approaches for translation of sign language to the world communication language. Various hand gesture identification methods based on instrumented gloves, vision and color markers are the focus of this work. DCT is employed for feature extraction. KOM, CAMSHIFT and TMM models are some of the mechanisms employed to accomplish the task of translation.
Neelesh Sarawate in [12] developed an American Sign Language (ASL) word recognition system supported by neural networks, and a probabilistic model is conferred. A CyberGlove and a Flock of Birds unit are used to track finger bending, hand position and hand orientation. A probabilistic model based on Markov chains and HMM is employed to model the outputs of the feature vectors. The system has an accuracy of 95.4% over a vocabulary of 40 ASL words.
The extensive literature provides SLR system designs that were never tested for real time application on mobile platforms. In this work we try to simulate this new approach for a sign language recognizer. The selfie camera captures continuous sign language videos under simple backgrounds. The signs are single handed simple signs, to facilitate selfie capture by holding the selfie stick in the other hand of the signer [13],[14].
Pre-filtering for removing video capture noise during image acquisition is done using multiple sets of Gaussian filters with zero mean and variances in the range 0.1 to 0.5. Hand and head contour shape extraction uses the Sobel edge operator enhanced with morphological operations. Shape energy is modelled with the discrete cosine transform (DCT) and this feature is optimized with principal component analysis (PCA). Finally, each frame is represented with a 1×50 feature vector. An average of 220 frames were detected per sign, making the feature matrix 220×50 per video. 20 signs covering English alphabets and sentences in regular use are captured. To cut down on classification time a minimum distance classifier (MDC) with Euclidian distance is employed. Word matching score (WMS) is the performance estimator when testing the proposed SLR system. It is the ratio of correct classifications to the total number of signs. Multiple testing with 10 different signers has resulted in an average WMS of 91.23%.
Further, this is the first time in the literature that this kind of method is proposed for capturing sign videos. However, the processing is done using already proposed methods from the literature to make the system process sign videos in a short period of time. The rest of the paper is organized as follows. Section 2 gives the research methodology, mathematical modelling and derived algorithms for pre-filtering, segmentation, feature extraction and classification. Results and analysis are in Section 3. Section 4 concludes the proposed novel idea of SLR capture with the selfie front camera.
2. PRE-PROCESSING, SEGMENTATION, FEATURE EXTRACTION AND CLASSIFICATION
The flow chart of the proposed SLR is shown in Figure 1. The picture under the first block shows the capture mechanism followed in this work for video capture. The acquired video is in mp4 format with full HD 1920×1080 video recording on a 5M pixel CMOS front camera. Let this 2D video be represented as a 2D frame $f_{2D}(x, y)$. For video, the frame $f_{2D}(x, y)$ changes with time, which is fixed universally at 30 frames per second. These videos form the database of this work. A 3-fold 2D Gaussian filter

$$K_D(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-m)^2}{2\sigma^2}}$$ [2]

with zero mean ($m = 0$) and three variances of 0.01, 0.1 and 0.15 smoothens each frame by removing sharp variations introduced during capture.
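As a rough illustration of this pre-filtering stage, the sketch below smooths a single grayscale frame with the three stated variances. It assumes frames are available as 2D NumPy arrays, uses scipy.ndimage.gaussian_filter as a stand-in for the paper's Gaussian filter bank, and averages the three responses because the text does not state how they are combined.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def prefilter_frame(frame, variances=(0.01, 0.1, 0.15)):
    """Smooth one grayscale frame with a bank of zero-mean Gaussian filters.

    The paper applies a 3-fold Gaussian filter (m = 0) with variances
    0.01, 0.1 and 0.15; averaging the three responses is an assumption.
    """
    frame = frame.astype(np.float64)
    smoothed = [gaussian_filter(frame, sigma=np.sqrt(v)) for v in variances]
    return np.mean(smoothed, axis=0)
```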
Figure 1. Flow chart of sign language recognition system with smart phone front camera video capture
The smoothed frames in real space are treated with a new type of multidimensional Sobel mask [15]. From the literature, the Sobel edge operator is a 2D gradient operator. Gradients provide information related to changes in the data along with the direction of maximum change. For the 2D gradient calculation, two 1D gradients in the x and y directions of the frame matrix are computed as follows.

$$g_x(x, y) = \sum_{k=1}^{N} f_{2D}(x + k,\, y)\, g(k)$$ (1)

$$g_y(x, y) = \sum_{k=1}^{N} f_{2D}(x,\, y + k)\, g^{T}(k)$$ (2)

where $g = [-1 \;\; 1]$ is the discrete gradient operator. The gradient magnitude $G_{xy}$ gives the magnitude of edge strength in the Sobel edge detector, computed as $G_{xy} = \sqrt{g_x^2 + g_y^2}$. For convenience, Sobel represented the function using the 2D convolution masks

$$M_x^S = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} \quad \text{and} \quad M_y^S = \left(M_x^S\right)^{T}.$$

These masks are sensitive to lighting variations, motion blur and camera vibrations, which are commonly a cause of concern for sign video acquisition under selfie mode. A suitable threshold at the end will extract the final binary hand and head portions. Edge adaptive thresholding is considered, with the block variational mean of each 3×3 Sobel mask used as the threshold. The final binary image is

$$B(x, y) = \begin{cases} 1, & \sqrt{S_{M_x}^2(x, y) + S_{M_y}^2(x, y)} \;\geq\; \dfrac{1}{N_b}\displaystyle\sum_{(x, y)\, \in\, b} \sqrt{S_{M_x}^2(x, y) + S_{M_y}^2(x, y)} \\ 0, & \text{otherwise} \end{cases}$$ (3)

where $b$ is the block and $N_b$ the number of pixels in the block. This procedure is fast and reduces background variations automatically, without human intervention in selecting a suitable threshold as in the case of the plain Sobel edge operator. Figure 2 shows the difference between block thresholding and global thresholding (a value of 0.2 was used), where the latter failed to handle motion blur.
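A minimal sketch of the Sobel gradient with block variational mean thresholding of equation (3) is given below. The frame is assumed to be a 2D NumPy array; scipy.ndimage.sobel and uniform_filter stand in for the 3×3 Sobel masks and the block mean, and the 3×3 block size follows the text.

```python
import numpy as np
from scipy.ndimage import sobel, uniform_filter

def block_adaptive_sobel(frame, block=3):
    """Binarize a frame with Sobel gradients thresholded by the local block mean.

    Each pixel is set to 1 when its gradient magnitude is at least the mean
    magnitude of its block x block neighbourhood (cf. eq. 3), so no global
    threshold has to be picked by hand.
    """
    f = frame.astype(np.float64)
    gx = sobel(f, axis=1)                          # horizontal gradient (eq. 1 analogue)
    gy = sobel(f, axis=0)                          # vertical gradient (eq. 2 analogue)
    mag = np.hypot(gx, gy)                         # edge strength G_xy
    local_mean = uniform_filter(mag, size=block)   # block variational mean
    return (mag >= local_mean).astype(np.uint8)
```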
Figure 2. (a) Block variational mean thresholded frame. (b) Global threshold of 0.2 for Sobel mask
Sign language defines hand shapes. Hand shapes are defined by precise contours that form around the edges of the hand in the video frame. A hand contour $H_C(x)$ is a subset of the binary image $B$ in the spatial domain. A simple differential morphological gradient on the binary image with connected component analysis separates the head and hand contours. The morphological gradient is defined by line masks in the horizontal $M_H^3$ and vertical $M_V^3$ directions, of length 3. Contour extraction is represented as

$$H_C^x(x) = \left\{ z \;\middle|\; (\hat{M}_H^3)_z \cap B \neq \varnothing \right\} - \left\{ z \;\middle|\; (M_H^3)_z \subseteq B \right\}$$ (4)

$$H_C^y(x) = \left\{ z \;\middle|\; (\hat{M}_V^3)_z \cap B \neq \varnothing \right\} - \left\{ z \;\middle|\; (M_V^3)_z \subseteq B \right\}$$ (5)

$$H_C(x, y) = H_C^x(x) \cup H_C^y(y)$$ (6)

Hand and head contours are separated by finding the connected components with the maximum number of pixels, using a 4-neighbourhood operation on the contour image $H_C(x, y)$. Figure 3 provides visual confirmation of the discussion.
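The sketch below approximates equations (4) to (6) and the 4-neighbourhood connected component separation on a binary frame. The length-3 line masks and 4-connectivity follow the text; taking the two largest components and assigning the larger one to the head is an assumption made purely for illustration.

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion, label

def hand_head_contours(binary):
    """Extract contours with line structuring elements and split hand/head.

    Morphological gradient (dilation minus erosion) with length-3 horizontal
    and vertical line masks approximates eqs. (4)-(6); the two largest
    4-connected components are then taken as head and hand contours
    (assignment by size is an assumption).
    """
    b = binary.astype(bool)
    mh = np.ones((1, 3), dtype=bool)          # horizontal line mask M_H^3
    mv = np.ones((3, 1), dtype=bool)          # vertical line mask M_V^3
    grad_h = binary_dilation(b, mh) & ~binary_erosion(b, mh)
    grad_v = binary_dilation(b, mv) & ~binary_erosion(b, mv)
    contour = grad_h | grad_v                 # eq. (6): combine both directions

    four_conn = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
    labels, n = label(contour, structure=four_conn)
    sizes = np.bincount(labels.ravel())[1:]   # pixel count per component
    order = np.argsort(sizes)[::-1] + 1       # component labels, largest first
    head = labels == order[0] if n > 0 else np.zeros_like(contour)
    hand = labels == order[1] if n > 1 else np.zeros_like(contour)
    return hand, head, contour
```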
Figure 3. (a) Captured 98th frame. (b) Segmented frame. (c) Hand contour. (d) Head contour and (e) both contours
Features are a unique representation of objects in this world. A feature is a set of measured quantities in a 1D space represented as $F(x) = \{\, f(x) \mid x \in V \,\}$, where $f(x)$ can be any transformation or optimization model on the vector $x$. The top priority in this work is speed of execution of the proposed algorithm. Hence $f(x)$ is considered as the Discrete Cosine Transform (DCT) [18] along with Principal Component Analysis (PCA) [16]. The 2D DCT of the hand contour $H_C(x)$ and head contour $\bar{H}_C(x)$ is computed as

$$F_{uv}^{V} = \frac{C_u C_v}{4} \sum_{x=1}^{N} \sum_{y=1}^{N} H_C(x, y) \cos\!\left[\frac{(2x - 1)u\pi}{2N}\right] \cos\!\left[\frac{(2y - 1)v\pi}{2N}\right]$$ (7)

where $C_u, C_v = \frac{1}{\sqrt{2}}$ for $u, v = 0$ and 1 elsewhere. A similar expression with $\bar{H}_C(x)$ gives the 2D DCT of the head contour as $\bar{F}_{uv}^{V}$. Figure 4 shows a color coded representation of hand DCT features for the frames in a video sequence. The head does not change much in any of the frames captured, and hence the head contour DCT remains fairly constant throughout the video sequence.
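A compact sketch of the 2D DCT feature computation of equation (7), assuming the contour image is a 2D NumPy array. scipy.fftpack.dct applied along both axes is used in place of the explicit double summation (its orthonormal scaling differs from eq. (7) only by constant factors), and the top-left 50×50 low-frequency block is kept as in the text.

```python
import numpy as np
from scipy.fftpack import dct

def contour_dct_energy(contour, keep=50):
    """2-D DCT of a contour image, keeping the top-left keep x keep block.

    The low-frequency corner of the DCT matrix carries most of the contour
    energy; the paper retains the first 50 x 50 coefficients per frame.
    """
    c = contour.astype(np.float64)
    # separable 2-D DCT-II: transform columns, then rows
    coeffs = dct(dct(c, type=2, norm='ortho', axis=0), type=2, norm='ortho', axis=1)
    return coeffs[:keep, :keep]
```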
Figure 4. 2D DCT representation of hand contour energy
The first 50×50 matrix of values possesses the maximum amount of energy in a frame. But this DCT matrix for every frame, consisting of 2500 values representing a sign, will cost program execution time. PCA treatment of the matrix $F_{uv}^{V}$ retains only the unique components of the matrix $F_{uv}^{V}$. The final $F_{uv}^{V}$ is represented as $F_{fn}^{V}$, where $fn$ gives the frame number. PCA reduces the feature vector per frame to 50 sample values per frame. Each 50-sample Eigen vector from PCA uniquely represents the DCT energy of the hand shape in each frame. The feature sign matrix $F_{fn}^{V}$ is the input to a classifier.
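The per-frame PCA reduction described above can be sketched roughly as below. The paper states that Eigen vectors are computed and 50 samples are retained per frame; the exact projection is not spelled out, so projecting each frame's 50×50 DCT block onto its leading Eigen vector is only one possible reading, shown here for illustration.

```python
import numpy as np

def pca_reduce(dct_block):
    """Collapse a 50 x 50 DCT energy block to a 50-sample feature vector.

    Eigen vectors of the block's covariance are computed and the block is
    projected onto the leading principal component (an assumed reading of
    the per-frame PCA step in the text).
    """
    x = dct_block - dct_block.mean(axis=0, keepdims=True)
    cov = np.cov(x, rowvar=False)          # 50 x 50 covariance
    vals, vecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    principal = vecs[:, -1]                # leading Eigen vector
    return x @ principal                   # 50 samples per frame

def video_feature_matrix(dct_blocks):
    """Stack per-frame features into the ~220 x 50 sign feature matrix."""
    return np.vstack([pca_reduce(b) for b in dct_blocks])
```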
Since speed is the prime constraint during mobile implementation, it is reasonable to use a minimum distance classifier (MDC). This is one simple classifier that does not require prior training. Mahalanobis distance is the metric that assigns class variables to the different sign classes. Mahalanobis distance [16] is chosen for SLR classification on smart phones over Euclidian distance, as the former includes inter sample covariances in different directions during distance calculation. The Mahalanobis distance equals the Euclidian distance for uncorrelated data with inter class variance of unity. The squared Mahalanobis distance $D_M^2$ is given as

$$D_M^2 = \left(F_{fn}^{V} - S_C\right)^{T} C^{-1} \left(F_{fn}^{V} - S_C\right)$$ (8)

where $S_C$ is the mean vector of each sign class defined by $F_{fn}^{V}$, $C$ is the inter class covariance matrix, $C^{-1}$ is its inverse and $T$ denotes transposition. Faulty distances are measured if the inter class variance is very large. In sign language videos, hand shape variations within a particular sign class are very small, making Mahalanobis distance ideal for sign language classification.
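A minimal sketch of the minimum distance classifier with the squared Mahalanobis distance of equation (8). Here class_means (one mean 50-sample vector per sign class) and the inter class covariance matrix C are assumed to come from the training video; a pseudo-inverse is used in case C is singular.

```python
import numpy as np

def mahalanobis_mdc(feature, class_means, cov):
    """Assign a per-frame feature vector to the nearest sign class (eq. 8).

    D_M^2 = (F - S_C)^T C^{-1} (F - S_C) is evaluated against every class
    mean S_C and the label with the smallest distance is returned.
    """
    cov_inv = np.linalg.pinv(cov)            # pseudo-inverse guards against singular C
    dists = {}
    for sign, mean in class_means.items():
        d = feature - mean
        dists[sign] = float(d @ cov_inv @ d)
    return min(dists, key=dists.get), dists
```

Swapping cov_inv for the identity matrix turns the same sketch into the Euclidian distance classifier used for comparison in Section 3.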
3. RESULTS AND ANALYSIS
The front camera video recording of sign language gestures is done with the smart phones Asus Zenphone II and Samsung Galaxy S4 held at the end of a selfie stick. Both mobiles are equipped with a 5M pixel front camera. Sign video capturing is constrained to a controlled environment with room lighting and a simple background. The first photo in Figure 1 demonstrates the procedure followed by signers for video capture. The results discussion is presented in two sections: quantitative and qualitative. Quantitative analysis provides visual outcomes of the work, and qualitative analysis relates to various constraints on the algorithm and how these constraints are handled.
3.1. Visual Analysis
Each video sequence contains a meaningful sentence. The sentence used is "Hai Good Morning, I am P R I D H U, Have A Nice Day, Bye Thank You". There are 18 words in the sentence. The words in the training video are sequenced in the above order, but the testing video contains the same words in a different order. Classification of the words is tested with Euclidian, Normalized Euclidian and Mahalanobis distance functions. A few frames of the video sequence are shown in Figures 5 to 8.
Figure 5. A few frames of the sequence for the sign 'HAI or HELLO'
Figure 6. Sign frames for 'GOOD' (top) and 'MORNING' (bottom)
Figure 7. Frames of sign 'I' 'AM'
The advantage of using the front camera is pronounced in Figure 7. Here the signer corrects himself during the signing process, which helps to give the correct hand sign to the system.
Figure 8. Single frames per sign. (From top left): 'P', 'R', 'I', 'D', 'H', 'I AM', 'THANK', 'YOU', 'BYE', 'NO SIGN'
Filtering and adaptive thresholding with the Sobel gradient produces regions of the signer's hand and head segments. Morphological differential gradient operations with respect to line structuring elements, as in equations (4) to (6), refine the edges of the hand and head portions. Figure 9 shows the results of the segmentation process on a few frames. Row (a) has the original RGB captured video frames. Row (b) has the Gaussian filtered, Sobel gradiented and region filled outputs of the frames in row (a). The last row contains the morphologically subtracted outputs of the frames in row (b).
Figure 9. (a) A few frames in RGB format. (b) Their region segments with Gaussian filtering and Sobel operation. (c) Contours of hands and head produced with morphological subtraction with line structuring elements
The energy of the hand and head contours gives the features for sign classification. The 2D DCT calculates the energy of the hand and head contours. DCT uses orthogonal basis functions that represent the signal energy with a minimum number of frequency domain samples, which can effectively be used to represent the entire hand and head curvatures. As shown in Figure 4, the first 50×50 samples of the DCT matrix were extracted. These 2500 samples out of 65536 samples are enough to reproduce the original contour using the inverse DCT. This hypothesis was tested for each frame, and a decision was made to consider only 2500 samples for sign representation.
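This 2500-of-65536 energy compaction hypothesis can be checked per frame roughly as follows. The sketch assumes a 256×256 contour image (65536 samples) whose full 2D DCT is available, zeroes everything outside the first 50×50 block and inverts the transform with scipy.fftpack.idct.

```python
import numpy as np
from scipy.fftpack import idct

def reconstruction_check(full_dct, keep=50):
    """Reconstruct a contour from only the first keep x keep DCT samples.

    Zeroing everything outside the low-frequency 50 x 50 block (2500 of the
    65536 coefficients of a 256 x 256 frame) and inverting the DCT gives a
    quick check that the retained samples reproduce the original contour.
    """
    truncated = np.zeros_like(full_dct)
    truncated[:keep, :keep] = full_dct[:keep, :keep]
    recon = idct(idct(truncated, type=2, norm='ortho', axis=1),
                 type=2, norm='ortho', axis=0)
    return recon
```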
With a 50×50 feature matrix per frame and an average of 220 frames per video, the feature matrix for the considered 18 signs is a stack of size 50×50×220. Initiating a multi-dimensional feature matrix of this size takes longer execution periods. Hence PCA treats each frame's 50×50 energy features by computing Eigen vectors and retaining the principal components to form a 50×1 vector per frame. A combination of these features represents a sign in a video sequence. When no hand is detected in a frame, it is considered as 'No Sign'. These particular frames are detected because their feature matrix contains only head contour energy samples. The training vector contains a few head-only sample values for such 'No Sign' detection.
Three classifiers are compared to test execution speeds matching that of smart phone execution. Euclidian distance, Normalized Euclidian distance and Mahalanobis distance classify the feature matrix as individual signs. The next section analyses the classifiers' performance based on the word matching score (WMS).
3.2. Classifiers Performance: Word Matching Score (WMS)
The word matching score gives the ratio of correct classifications to the total number of samples used for classification. The expression for WMS is

$$M_S\% = \frac{\text{Correct Classifications}}{\text{Total Signs in a Video}} \times 100$$
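A one-line sketch of the WMS metric, assuming the predicted and ground-truth sign labels for a test video are available as equal-length lists.

```python
def word_matching_score(predicted, ground_truth):
    """WMS% = correct classifications / total signs in a video * 100."""
    correct = sum(p == g for p, g in zip(predicted, ground_truth))
    return 100.0 * correct / len(ground_truth)
```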
The feature matrix has a size of 50×220, each row representing a frame in the video sequence. To test the uniqueness of the feature matrix for a particular sign or no sign, the energy density variations of the 50 samples for the first 150 frames are computed and plotted during testing in Figure 10.
Figure 10. Sample energy distribution in frames for identification of signs
Exclusive testing with the three distance measures on a sign video having 18 signs and consisting of 220 frames provides insight into the best distance measure for sign features. Table 1 gives details of the metric $M_S\%$ for the three distance measures. The average classification rate with the same training features used for testing individual frames is around 90.58% with Mahalanobis distance. The low scores recorded by Euclidian distance (74.11%) and normalized Euclidian distance (71.76%) compared to Mahalanobis are due to the inter class variance considerations as in equation (8). The test repetition frequency is 10 per sign.
Table 1. The Performance of Three Minimum Distance Classifiers

SIGNS         EUCLIDIAN DISTANCE   NORMALIZED EUCLIDIAN   MAHALANOBIS DISTANCE
              CLASSIFIER           DISTANCE               CLASSIFIER
HAI           80                   70                     90
GOOD          70                   70                     90
MORNING       80                   80                     80
I AM          60                   60                     80
P             90                   90                     100
R             90                   90                     100
I             90                   90                     100
D             90                   90                     100
H             90                   90                     100
U             90                   90                     100
HAVE          50                   50                     80
A             90                   80                     100
NICE          60                   50                     80
DAY           70                   70                     80
BYE           50                   50                     90
THANK         50                   50                     90
YOU           60                   50                     80
Average WMS   74.11                71.76                  90.58
For more exhaustive testing, different signers' videos were used against the previous training samples, and the results are tabulated in Table 2 for all three distance measures.
Table 2. The Performance of Three Minimum Distance Classifiers with a different testing video

Signs         Euclidian distance   Normalized Euclidian   Mahalanobis distance
              classifier           distance               classifier
HAI           70                   60                     80
GOOD          60                   60                     80
MORNING       70                   70                     80
I AM          50                   40                     80
P             80                   80                     90
R             80                   80                     100
I             80                   80                     100
D             80                   80                     100
H             80                   80                     90
U             80                   80                     90
HAVE          40                   40                     80
A             60                   80                     90
NICE          50                   40                     80
DAY           60                   60                     80
BYE           40                   40                     80
THANK         40                   40                     80
YOU           50                   40                     80
Average WMS   62.94                61.76                  85.88
To find the average WMS for all the different signer video samples, and their performance against same-sample train-test data and different-sample train-test data with respect to the three distance metrics, a chart is drawn for the data of 4 signers in Figure 11. From the chart the following observations can be made: (i) for same train-test samples all distance metrics show good WMS; (ii) for different train-test data, the Mahalanobis distance performance is better for all test data. Further, WMS decreases by 4% to 6% if the number of frames in the test and train data does not match. For perfect matching the WMS is 3% above average.
Figure 11. Distance and train-test data comparison chart, showing average WMS for test-train data with respect to the three distance metrics