International Journal of Electrical and Computer Engineering (IJECE)
Vol. 8, No. 1, February 2018, pp. 52-59
ISSN: 2088-8708, DOI: 10.11591/ijece.v8i1.pp52-59
Emotion Recognition from Facial Expression Based on Fiducial Points Detection and using Neural Network

Fatima Zahra Salmam 1, Abdellah Madani 2, and Mohamed Kissi 3
1,2 LAROSERI Laboratory, Faculty of Sciences, University of Chouaib Doukkali, El Jadida, Morocco
3 LIM Laboratory, Faculty of Sciences and Technologies, University Hassan II, Casablanca, Morocco
Article Info

Article history:
Received: Jun 5, 2017
Revised: Nov 30, 2017
Accepted: Dec 16, 2017

Keywords:
Facial expression
Feature selection
Neural network
Supervised Descent Method
Best First

ABSTRACT
The importance of emotion recognition lies in the role that emotions play in our everyday lives. Emotions have a strong relationship with our behavior. Automatic emotion recognition therefore aims to equip the machine with the human ability to analyze and understand the human emotional state, in order to anticipate a person's intentions from facial expression. In this paper, a new approach is proposed to enhance the accuracy of emotion recognition from facial expression, based on input features derived only from fiducial points. The proposed approach consists, firstly, of extracting 1176 dynamic features from image sequences, representing the ratios of the Euclidean distances between facial fiducial points in the first frame and the corresponding distances in the last frame. Secondly, a feature selection method is used to keep only the most relevant features. Finally, the selected features are presented to a Neural Network (NN) classifier to classify the facial expression input into an emotion. The proposed approach has achieved an emotion recognition accuracy of 99% on the CK+ database, 84.7% on the Oulu-CASIA VIS database, and 93.8% on the JAFFE database.

Copyright (c) 2018 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Abdellah Madani
LAROSERI Laboratory, Computer Science Department
Faculty of Sciences, University of Chouaib Doukkali, El Jadida, Morocco
Email: madaniabdellah@gmail.com
1. INTRODUCTION
As emotions play an implicit role in the communication process and reflect human behavior, automatic emotion recognition is a task of growing interest. To recognize human emotions, a wide range of features can be used, such as facial expression [1, 2], body gesture [2, 3], or speech [4, 5]. Giving a computer the capability of emotion recognition (ER) is a scientific challenge around which different communities gather (signal processing, image processing, artificial intelligence, robotics, human-computer interaction). Mehrabian [6] affirms that facial expression represents 55% of the nonverbal communication that allows us to understand the state or the emotion of a person. The objective of this work is to get a computer to detect human emotions from facial expressions.

Facial expression is the most important key to understanding human emotions. In fact, not all facial expressions have a meaning and can be classified into emotions, but there are some basic emotions that are universal [7] and are expressed in the same way, namely: happy, sad, fear, anger, disgust, and surprise.

The main steps of facial expression recognition are generally: face detection, feature extraction, and facial expression classification. In the first step, we have to determine whether an image belongs to the class of faces or not. In the second step, we have to extract features or characteristics from the face that best describe emotions. In the last step, we have to classify the extracted features into basic emotions. However, the issue usually comes from the second step, feature extraction: a set of features that best describes a facial expression movement must be found and used for classification. For this reason, the technique proposed in this paper is based on image sequences and focuses on calculating 1176 Euclidean distances between all detected points, in order to measure all possible deformations of the face, because some distances may be more descriptive than others that appear visually logical.
Figure 1. Steps of an emotion recognition system
Once we have calculated the dynamic features from the fiducial points, an additional feature selection step (see Figure 1) is used to reduce the number of features by keeping only the most relevant ones.

The rest of the paper is organized as follows: an overview of related work is presented in Section 2, our proposed method is presented in Section 3, and the experimental setup is given in Section 4. In Section 5, the results of our proposed approach are presented and discussed, with a comparison of recognition rates against previous works. Section 6 concludes the paper and presents some perspectives for future research.
2. RELATED WORK
In general, facial expression recognition systems can be classified into two categories: geometric features based methods and global features based methods [8]. In geometric features-based methods, only some parts of the face are considered for feature extraction, such as the eyes, nose, and mouth. Such methods consume a lot of computation time to obtain accurate results for facial feature detection and tracking, which is a major disadvantage. Besides, in global features based methods, the whole face is considered for feature extraction, as in [9], where global features are extracted from face images using local Zernike moments. These methods are easy to use because they work directly on facial images to describe facial textures.

There is a plethora of works that aim to facilitate the recognition of emotions from facial expression using static [10, 11, 12] or dynamic images [13, 14, 15, 16, 17, 18, 9]. By measuring dynamic facial motions in image sequences, Bassili [19] confirmed that dynamic images give more accurate results in facial expression recognition than single static ones. Friesen et al. [14] proposed the FACS system that describes movements of the face, where forty-four Action Units (AUs) are defined, each one representing a movement of a particular part of the face (e.g., Brow Lowerer). According to Friesen et al., a facial expression can be characterized by a combination of AUs. To demonstrate that AUs are capable of producing emotion expressions, Basori et al. [20] generated emotion expressions of an avatar using combinations of AUs based on facial muscles. Pantic et al. [15] focused their work on recognizing facial action units (AUs) and their temporal models using profile-view face image sequences. To track 15 facial points in an input face profile sequence, they apply a particle filtering method [21]. Valstar et al. [22] proposed an automatic method to recognize 22 action units (AUs) and their models using image sequences. Firstly, to automatically detect 20 fiducial points, they used a Gabor-feature-based boosted classifier; then, these points were tracked through a sequence of images using a particle filtering method with factorized likelihoods. Pu et al. [16] suggested a new framework for facial expression analysis based on recognizing AUs from image sequences. To detect and track fiducial points, they first applied AAM [23] to model the neutral facial expression in the first frame; after that, they used a pyramidal implementation of Lucas-Kanade [24] to track the feature points in the other frames. They used two levels to classify facial expressions, with random forest as the classification method. The first level classifies AUs, taking as input the displacement vectors between the neutral expression frame and the peak expression frame. The second level uses the detected AUs as input to classify facial expressions.

Most facial expression recognition methods are AU-based [15, 22, 16, 20]. They are often influenced by the FACS system proposed by Friesen et al. [14]. Nevertheless, there are also other techniques that are based only on fiducial points to recognize facial expression, which minimizes computation time. Abdat et al. [17] focused on another geometric method to detect facial expression. They used twenty-one distances to encode facial expressions; these distances describe facial feature deformations compared to the neutral state. Their method relies firstly on the Shi & Tomasi algorithm to extract feature points, and secondly on the Lucas-Kanade algorithm [24] to track and detect the points; after that, the distance vector, calculated from image sequences, is used as a descriptor of the facial expression. This vector is the input of an SVM classifier. Hammal et al. [13] developed a classification system based on belief theory and applied it on the Hammal-Caplier database. They used five distances between different parts of the face (eyebrows, both eyes, and mouth).
In their work, distances were computed on skeletons of expression from image sequences; however, only four emotions (joy, surprise, disgust, and neutral) out of the six basic emotions were considered.
Perveen et al. [10] focused their work on three regions (eyebrows, eyes, mouth) to define an emotion from static images. First, they calculated the characteristic points of the face; then, they evaluated some animation parameters such as: the openness of the eyes, the width of the eyes, the height of the eyebrows, the opening of the mouth, and the width of the mouth. As a classification technique, they used a decision-tree-based method, applied only on thirty images from the JAFFE database [25], and they recognized six emotions (happy, surprise, fear, sad, angry, and neutral), excluding the disgust emotion. Saeed et al. [26] proposed an emotion recognition system based on just eight fiducial points. They represented six geometric features by measuring some distances between the mouth, the eyes, and the eyebrows. These features represent the changes of the face during the occurrence of an emotion. Then, the features were presented to an SVM classifier for emotion recognition. The system was applied on the Cohn-Kanade database (CK+) [27] and the Binghamton University 3D Facial Expression Database [28] to recognize the six basic emotions. Majumder et al. [29] suggested an emotion recognition model based on the Kohonen self-organizing map (KSOM) that uses a 26-dimensional facial geometric feature vector, calculated from three parts of the face (lips, eyes, and eyebrows), that describes the change of the six basic emotions. The experiment was applied on the MMI database [30].

The research studies cited above show that dynamic facial expressions from image sequences are more descriptive for the task of emotion recognition and can increase the accuracy in real-time applications, compared to using static images.
3. PROPOSED METHOD
This section presents and justifies our proposed technique for emotion recognition from facial expression. Our contribution concerns the feature extraction step, in which we propose to calculate all Euclidean distances between fiducial points in the first and in the last frames in order to measure facial motion. Firstly, we detect the face using the Viola-Jones algorithm [31]; then, we detect and track 49 fiducial points using the powerful and recent Supervised Descent Method (SDM) proposed by Xiong et al. [32]. From these points, which cover the four parts of the face (eyebrows, eyes, nose, and mouth), we calculate all possible distances between each pair of points; as a result, we get $C_{49}^{2} = 1176$ Euclidean distances. After that, to measure the dynamic deformation relative to the neutral state, we calculate the distance ratios that constitute the dynamic features; each ratio is calculated between the first and the last frames (Section 3.1). Afterward, we use a feature selection method to reduce the number of features and to select only the most relevant ones. Finally, we present the selected dynamic features to a neural network classifier for facial expression recognition.
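For illustration, the minimal Python sketch below shows one way to implement this detection stage. It relies on stand-ins rather than the authors' exact tools: OpenCV's Haar cascade plays the role of the Viola-Jones detector, and dlib's pretrained 68-point landmark model replaces the 49-point SDM tracker; the model file path is an assumption.

    import cv2
    import dlib
    import numpy as np

    # Haar cascade as a stand-in for Viola-Jones face detection (shipped with OpenCV),
    # dlib's 68-point model as a stand-in for the paper's 49-point SDM tracker.
    face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    landmark_model = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed local file

    def detect_fiducial_points(image_bgr):
        """Return an (n_points, 2) array of fiducial point coordinates for the first detected face."""
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 0:
            return None
        x, y, w, h = faces[0]
        shape = landmark_model(gray, dlib.rectangle(int(x), int(y), int(x + w), int(y + h)))
        return np.array([[p.x, p.y] for p in shape.parts()], dtype=float)

Applied to the first and to the last frame of a sequence, this yields the two sets of coordinates from which the distance ratios of Section 3.1 are computed.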
3.1. Facial expression representation
Once we have detected the face using the Viola-Jones algorithm [31], we apply the SDM method [32] to detect and track the fiducial points in image sequences. To measure the face deformation, we consider only the first and the last frames. Firstly, we calculate all Euclidean distances (1) from the 49 detected points, which are represented by their x and y coordinates (2) (3) and which refer to the following parts of the face: 10 points for the eyebrows, 12 points for the eyes, 9 points for the nose, and 18 points for the mouth. In total, we calculate 1176 distances in the first and in the last frames. Then, we measure the dynamic deformation by calculating the ratio (4) between frames. The ratio is the division of a distance calculated in the peak frame by the same distance calculated in the first frame. The dynamic features (5) form a vector that contains the 1176 ratios calculated relative to the neutral state. An overview of the facial expression representation process is presented in Figure 2.
$D = [D_1; D_2; \ldots; D_i; \ldots; D_t]$   (1)

$V_0 = [x_{10}; y_{10}; x_{20}; y_{20}; \ldots; x_{n0}; y_{n0}]$   (2)

$V_p = [x_{1p}; y_{1p}; x_{2p}; y_{2p}; \ldots; x_{np}; y_{np}]$   (3)

$D_i = \dfrac{D_{ip}}{D_{i0}}$   (4)

$DF = [D_1; D_2; \ldots; D_t]$   (5)
where
n: the total number of fiducial points,
t: the number of Euclidean distances calculated between each pair of points,
$V_0$: the x and y coordinates of the detected points in the first frame,
$V_p$: the x and y coordinates of the detected points in the peak frame,
$D_{ip}$: the Euclidean distance in the peak frame,
$D_{i0}$: the Euclidean distance in the first frame.
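As an illustration of equations (1)-(5), the short Python sketch below computes the 1176 pairwise Euclidean distances for the first and peak frames and the resulting ratio vector DF. It assumes that the fiducial points of the two frames are already available as 49x2 coordinate arrays (for instance from a landmark detector such as the one sketched in Section 3); the random coordinates in the example are placeholders only.

    import numpy as np
    from itertools import combinations

    def pairwise_distances(points):
        """All Euclidean distances between each pair of fiducial points: C(49, 2) = 1176 values."""
        return np.array([np.linalg.norm(points[i] - points[j])
                         for i, j in combinations(range(len(points)), 2)])

    def dynamic_features(points_first, points_peak):
        """Ratio of each peak-frame distance to the corresponding first-frame distance, as in Eq. (4)."""
        d_first = pairwise_distances(points_first)   # D_i0
        d_peak = pairwise_distances(points_peak)     # D_ip
        return d_peak / d_first                      # DF, a 1176-dimensional vector

    # Example with placeholder coordinates for the 49 fiducial points of the two frames:
    rng = np.random.default_rng(0)
    first, peak = rng.uniform(0, 100, (49, 2)), rng.uniform(0, 100, (49, 2))
    print(dynamic_features(first, peak).shape)   # (1176,)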
Figure 2. Dynamic features representation process
3.2. Feature selection
One of the key issues in emotion classification is the set of features used for prediction. For this reason, a feature selection step has been used to choose the most relevant features. Generally, a feature selection step combines an attribute subset evaluator with a search method: the attribute evaluator determines what method is used to assign a worth to each subset of features, and the search method determines what style of search is performed [33]. In this paper, we have chosen CfsSubsetEval as the feature evaluator and Best First as the search method, as implemented in Weka. The CfsSubsetEval evaluator assesses the worth of a subset of features by considering the individual predictive ability of each feature along with the degree of redundancy between them; subsets of features that are highly correlated with the class while having low inter-correlation are preferred. The Best First method searches the space of feature subsets by greedy hill-climbing augmented with a backtracking facility [33].
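The selection itself was performed with Weka. As a rough approximation of the same idea in Python, the sketch below greedily grows a subset using a correlation-based merit (features correlated with the class, weakly correlated with each other). It is only an illustration of the principle, not a reimplementation of Weka's CfsSubsetEval and BestFirst: the merit uses plain Pearson correlations on integer-coded labels, and the search is a simple forward pass without backtracking.

    import numpy as np

    def cfs_merit(X, y, subset):
        """Correlation-based merit of a feature subset (simplified CFS heuristic)."""
        k = len(subset)
        r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
        if k == 1:
            return r_cf
        r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                        for i, a in enumerate(subset) for b in subset[i + 1:]])
        return (k * r_cf) / np.sqrt(k + k * (k - 1) * r_ff)

    def greedy_cfs(X, y, max_features=None):
        """Greedy forward search guided by the CFS merit (a crude stand-in for BestFirst)."""
        selected, remaining, best_merit = [], list(range(X.shape[1])), -np.inf
        while remaining and (max_features is None or len(selected) < max_features):
            merit, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
            if merit <= best_merit:        # stop when no candidate improves the merit
                break
            best_merit = merit
            selected.append(j)
            remaining.remove(j)
        return selected

Applied to the 1176-dimensional DF vectors of the combined databases, such a procedure would return the indices of the retained features (83 in the authors' Weka run); the exact subset naturally depends on the evaluator and search actually used.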
3.3.
Classification
A
neural
netw
ork
(NN)
classifier
has
been
chosen
to
classify
f
acial
e
xpressions
based
on
dynamic
features
that
are
pre
viously
selected.
It
w
as
trained
on
a
multi-class
emotion
recognition
task,
using
the
backpropag
ation
algorithmn,
and
the
Sigmoid
function
as
an
acti
v
ation
function.
Our
NN
is
a
signle
netw
ork
with
one
hidden
layer
.
The
first
layer
re
p
r
esents
the
input
data
which
are
the
DF
.
The
second
one
is
the
hidden
layer
,
and
the
last
one
represents
the
output
classes.
The
number
of
neurons
in
the
hidden
layer
w
as
chosen
e
xperimentally
.
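A minimal sketch of such a classifier is given below, using scikit-learn's MLPClassifier with one hidden layer, a logistic (sigmoid) activation, and backpropagation via stochastic gradient descent. The 20 hidden neurons follow the setting reported in Section 4.1; the randomly generated data and the simple split are placeholders, not the authors' protocol.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    # X: (n_samples, n_selected_features) matrix of selected dynamic features (DF ratios)
    # y: integer-coded emotion labels (six or seven classes depending on the database)
    rng = np.random.default_rng(0)
    X = rng.uniform(0.5, 2.0, size=(300, 83))    # placeholder data with 83 selected features
    y = rng.integers(0, 7, size=300)             # placeholder labels

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    clf = MLPClassifier(hidden_layer_sizes=(20,),   # one hidden layer of 20 neurons (Section 4.1)
                        activation="logistic",      # sigmoid activation
                        solver="sgd",               # backpropagation with stochastic gradient descent
                        max_iter=2000, random_state=0)
    clf.fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))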
4. EXPERIMENTAL SETUP
The experiments of our work were conducted on three well-known facial expression databases: the Extended Cohn-Kanade (CK+) database [34, 27], the Oulu-CASIA VIS database [35], and the JAFFE database [25].

The CK+ database [34, 27] contains 327 labeled image sequences that refer to one of seven expressions, i.e., anger, contempt, disgust, fear, happiness, sadness, and surprise. For each image sequence, only the last frame is provided with an expression label. This database is detailed as follows: 45 images of the angry expression, 59 images of the disgust expression, 25 images of the fear expression, 69 images of the happy expression, 28 images of the sad expression, and 83 images of the surprise expression.

The Oulu-CASIA VIS database [35] contains different light conditions; we have used the strong and good lighting ones, which cover 80 subjects. Facial expressions are made by each subject and refer to the six basic expressions (anger, disgust, fear, happiness, sadness, and surprise). In total, we have 480 expression-labeled image sequences.
The JAFFE database [25] contains 213 images from 10 Japanese female subjects. Each subject has 3 or 4 examples of each of the six basic expressions (anger, disgust, fear, happiness, sadness, surprise) plus the neutral expression. This database is detailed as follows: 30 images of the angry expression, 29 images of the disgust expression, 32 images of the fear expression, 31 images of the happy expression, 31 images of the sad expression, 30 images of the surprise expression, and 30 images of the neutral expression.
4.1. Training process
In our work, we have proceeded with three experiments to observe the influence of each design choice on the emotion recognition accuracy. All experiments were conducted on the three databases (CK+, Oulu-CASIA VIS, and JAFFE), and each one was divided into 60% for training, 10% for validation, and 30% for testing, with a NN of 20 neurons in the hidden layer in all experiments.
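For reference, one plausible way to produce this 60/10/30 split in Python is sketched below; the paper does not specify the exact splitting procedure or random seeds, so the code is illustrative only, with placeholder data standing in for a database of DF vectors.

    import numpy as np
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.uniform(0.5, 2.0, size=(480, 83))   # placeholder features (e.g., 480 Oulu-CASIA VIS sequences)
    y = rng.integers(0, 6, size=480)            # placeholder labels for six emotions

    # 60% train / 10% validation / 30% test: first split off the 30% test set, then take
    # 1/7 of the remaining 70% (i.e., 10% of the whole set) as the validation set.
    X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.30, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval,
                                                      test_size=0.10 / 0.70, random_state=0)
    print(len(X_train), len(X_val), len(X_test))   # approximately 288 / 48 / 144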
The first experiment consists, firstly, of omitting the feature selection step and using the DF (5) directly as input to our classifier. The NN classifier therefore takes 1176 features as input, and six or seven classes in the output, depending on the number of classes present in the used database. Secondly, it uses the feature selection step. Thus, after calculating the DF (5) for each image sequence present in each used database, we applied the feature selection method to the three databases to reduce the number of features. First, we combined the CK+, the Oulu-CASIA VIS, and the JAFFE databases into a single database that contains 1020 instances and refers to eight expressions (anger, contempt, disgust, fear, happiness, sadness, surprise, and neutral). Then, we applied the feature selection method to this new database in order to select only the common and relevant features. As a result, we reduced our features from 1176 to 83. Last, we trained three classifiers, one on each of the three databases. In this case the NN classifier takes 83 features as input, and six or seven classes in the output.
In the second experiment, we tried to observe the ability to classify new image sequences: the classifier trained on the CK+ was tested on the Oulu-CASIA VIS and the JAFFE databases, and vice versa.

The last experiment consists, firstly, of unifying the three databases, which means deleting the emotions that do not appear in the other databases and keeping only the common ones. As a consequence, only 309 and 183 image sequences remain for the CK+ and the JAFFE databases, respectively. Secondly, it consists of testing each classifier trained on one database on the two other databases while varying the size of the training set, and showing how that influences the emotion recognition accuracy.
5. RESULTS & DISCUSSION
Table 1 summarizes a comparison between the results achieved in our first experiment and those achieved by Pu et al. [16] using random forest. The third column presents the emotion recognition accuracy achieved using the calculated DF directly. The last column presents the emotion recognition accuracy obtained with the feature selection step, where only 83 features out of 1176 are used. The obtained results show that our method outperforms the AU-based method proposed by [16] whether the feature selection process is used or not. Moreover, the use of the feature selection process requires a smaller number of features and gives better results than using the DF directly.
Table 1. Comparison of emotion recognition accuracy (%) with and without the use of the feature selection process

Database          Pu et al. [16]   Our approach, without FS   Our approach, with FS
CK+               96.3             98                         99
OULU-CASIA VIS    76.25            81.3                       84.7
JAFFE             -                90.6                       93.8
Table 2 presents the results achieved in the second experiment by the three classifiers trained separately on the CK+, the Oulu-CASIA VIS, and the JAFFE databases. The first classifier, trained on the CK+ database (always using 83 features), gives an emotion recognition accuracy of 67.29% on the Oulu-CASIA VIS database and 43.66% on the JAFFE database. The second classifier, trained on the Oulu-CASIA VIS database, gives an emotion recognition accuracy of 90.52% on the CK+ database and 46.48% on the JAFFE database. The third classifier, trained on the JAFFE database, gives an emotion recognition accuracy of 64.22% on the CK+ database and 47.29% on the Oulu-CASIA VIS database. We obtained a competitive emotion recognition accuracy with the second classifier, trained on the Oulu-CASIA VIS database, unlike the classifiers trained on the CK+ and the JAFFE databases.
Table 2. Testing each trained classifier on the other databases (accuracy, %)

                            Testing
Training            CK+       OULU-CASIA VIS    JAFFE
CK+                 -         67.29             43.66
OULU-CASIA VIS      90.52     -                 46.48
JAFFE               64.22     47.29             -
However, this decrease in results can be justified, firstly, by the training size of the CK+ database (196 image sequences) and the training size of the JAFFE database (127 image sequences) compared to the training size of the Oulu-CASIA VIS database (288 image sequences); secondly, by the expressions that do not exist in all databases: the contempt expression is present in the CK+ with an occurrence of 18 but not in the other two databases, and likewise, in the JAFFE database, 30 neutral expressions are considered as an emotion class. Therefore, the classifiers trained on supplementary expressions that do not exist in all databases suffer a decrease in emotion recognition accuracy. For this reason, we proceeded with the third and last experiment, to unify all used databases and show how that influences our results.
Figure 3. Training our proposed method on one database and testing it on the other databases: (a) training on the CK+, (b) training on the OULU-CASIA VIS, (c) training on the JAFFE
Figure 3 shows how the training size and the unified databases bring an increase in emotion recognition accuracy on all databases. Figure 3(a) shows that the emotion recognition accuracy increases from 67.29% to 72.08% and from 43.66% to 56.83% on the Oulu-CASIA VIS and the JAFFE databases, respectively. Figure 3(b) shows that the emotion recognition accuracy increases from 90.52% to 96.44% and from 46.48% to 53.55% on the CK+ and the JAFFE databases, respectively. Figure 3(c) shows that the emotion recognition accuracy increases from 64.22% to 72.49% and from 47.29% to 51.25% on the CK+ and the Oulu-CASIA VIS databases, respectively.
6. CONCLUSION & FUTURE WORK
In this research work, we have proposed an automatic approach for the facial expression recognition task. Our approach was tested using dynamic features calculated from the first and the last frames, which represent respectively the neutral state and an emotional state. After detecting the face and the fiducial points in the first and the last frames, all possible Euclidean distances were calculated between each pair of points, giving 1176 distances; then, to measure the deformation, each calculated distance of the peak frame was divided by the same calculated distance of the first frame, in accordance with Eq. (4). After that, we used a feature selection process to reduce the number of features by keeping only the most relevant ones. In the last step of our proposed approach, we presented the selected dynamic features to a neural network classifier for facial expression recognition. Evaluating this approach on three well-known databases has given encouraging results, with an emotion recognition accuracy of 99% on the CK+ database, 84.7% on the Oulu-CASIA VIS database, and 93.8% on the JAFFE database.

In our future work we will continue developing our proposed system along several axes. Firstly, we will investigate the possibility of adding other features that represent the pose of the face. Secondly, we also intend to consider another source for recognizing emotions, namely the intonation of the voice, using acoustic parameters. Finally, our ultimate aim is to combine the two sources, facial expression and voice intonation, to automatically recognize emotions from multimodal data using new deep learning classification approaches.
REFERENCES
[1] S. Lee and S.-Y. Shin, "Face song player according to facial expressions," International Journal of Electrical and Computer Engineering (IJECE), vol. 6, no. 6, pp. 2805-2809, 2016.
[2] P. Barros, G. I. Parisi, C. Weber, and S. Wermter, "Emotion-modulated attention improves expression recognition: A deep learning model," Neurocomputing, 2017.
[3] J. Arunnehru and M. K. Geetha, "Automatic human emotion recognition in surveillance video," in Intelligent Techniques in Signal Processing for Multimedia Security. Springer, 2017, pp. 321-342.
[4] H. K. Palo and M. N. Mohanty, "Classification of emotional speech of children using probabilistic neural network," International Journal of Electrical and Computer Engineering, vol. 5, no. 2, p. 311, 2015.
[5] S. Motamed, S. Setayeshi, and A. Rabiee, "Speech emotion recognition based on a modified brain emotional learning model," Biologically Inspired Cognitive Architectures, 2017.
[6] A. Mehrabian, "Communication without words," Communication Theory, pp. 193-200, 2008.
[7] P. Ekman, "An argument for basic emotions," Cognition & Emotion, vol. 6, no. 3-4, pp. 169-200, 1992.
[8] C. Shan, S. Gong, and P. W. McOwan, "Facial expression recognition based on local binary patterns: A comprehensive study," Image and Vision Computing, vol. 27, no. 6, pp. 803-816, 2009.
[9] X. Fan and T. Tjahjadi, "A dynamic framework based on local Zernike moment and motion history image for facial expression recognition," Pattern Recognition, vol. 64, pp. 399-406, 2017.
[10] N. Perveen, S. Gupta, and K. Verma, "Facial expression recognition using facial characteristic points and Gini index," in Engineering and Systems (SCES), 2012 Students Conference on. IEEE, 2012, pp. 1-6.
[11] F. Z. Salmam, A. Madani, and M. Kissi, "Facial expression recognition using decision trees," in 2016 13th International Conference on Computer Graphics, Imaging and Visualization (CGiV). IEEE, 2016, pp. 125-130.
[12] A. T. Lopes, E. de Aguiar, A. F. De Souza, and T. Oliveira-Santos, "Facial expression recognition with convolutional neural networks: Coping with few data and the training sample order," Pattern Recognition, vol. 61, pp. 610-628, 2017.
[13] Z. Hammal, L. Couvreur, A. Caplier, and M. Rombaut, "Facial expression recognition based on the belief theory: comparison with different classifiers," in Image Analysis and Processing - ICIAP 2005. Springer, 2005, pp. 743-752.
[14] E. Friesen and P. Ekman, "Facial action coding system: a technique for the measurement of facial movement," Palo Alto, 1978.
[15] M. Pantic and I. Patras, "Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 36, no. 2, pp. 433-449, 2006.
[16] X. Pu, K. Fan, X. Chen, L. Ji, and Z. Zhou, "Facial expression recognition from image sequences using twofold random forest classifier," Neurocomputing, vol. 168, pp. 1173-1180, 2015.
[17] F. Abdat, C. Maaoui, and A. Pruski, "Human-computer interaction using emotion recognition from facial expression," in Computer Modeling and Simulation (EMS), 2011 Fifth UKSim European Symposium on. IEEE, 2011, pp. 196-201.
[18] A. Sánchez, J. V. Ruiz, A. B. Moreno, A. S. Montemayor, J. Hernández, and J. J. Pantrigo, "Differential optical flow applied to automatic facial expression recognition," Neurocomputing, vol. 74, no. 8, pp. 1272-1282, 2011.
[19] J. N. Bassili, "Emotion recognition: the role of facial movement and the relative importance of upper and lower areas of the face," Journal of Personality and Social Psychology, vol. 37, no. 11, p. 2049, 1979.
[20] A. H. Basori and H. M. A. AlJahdali, "Emotional facial expression based on action units and facial muscle," International Journal of Electrical and Computer Engineering (IJECE), vol. 6, no. 5, pp. 2478-2487, 2016.
[21] N. Shepard and M. Pitt, "Filtering via simulation: auxiliary particle filter," Journal of the American Statistical Association, vol. 94, pp. 590-599, 1999.
[22] M. F. Valstar and M. Pantic, "Fully automatic recognition of the temporal phases of facial actions," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 42, no. 1, pp. 28-43, 2012.
[23] I. Matthews and S. Baker, "Active appearance models revisited," International Journal of Computer Vision, vol. 60, no. 2, pp. 135-164, 2004.
[24] J.-Y. Bouguet, "Pyramidal implementation of the affine Lucas Kanade feature tracker: description of the algorithm," Intel Corporation, vol. 5, no. 1-10, p. 4, 2001.
[25] M. J. Lyons, S. Akamatsu, M. Kamachi, J. Gyoba, and J. Budynek, "The Japanese female facial expression (JAFFE) database," 1998.
[26] A. Saeed, A. Al-Hamadi, R. Niese, and M. Elzobi, "Effective geometric features for human emotion recognition," in Signal Processing (ICSP), 2012 IEEE 11th International Conference on, vol. 1. IEEE, 2012, pp. 623-627.
[27] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, "The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression," in Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference on. IEEE, 2010, pp. 94-101.
[28] L. Yin, X. Wei, Y. Sun, J. Wang, and M. J. Rosato, "A 3D facial expression database for facial behavior research," in 7th International Conference on Automatic Face and Gesture Recognition (FGR06). IEEE, 2006, pp. 211-216.
[29] A. Majumder, L. Behera, and V. K. Subramanian, "Emotion recognition from geometric facial features using self-organizing map," Pattern Recognition, vol. 47, no. 3, pp. 1282-1293, 2014.
[30] M. Valstar and M. Pantic, "Induced disgust, happiness and surprise: an addition to the MMI facial expression database," in Proc. 3rd Intern. Workshop on EMOTION (satellite of LREC): Corpora for Research on Emotion and Affect, 2010, p. 65.
[31] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, vol. 1. IEEE, 2001, pp. I-511.
[32] X. Xiong and F. Torre, "Supervised descent method and its applications to face alignment," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 532-539.
[33] M. I. Devi, R. Rajaram, and K. Selvakuberan, "Generating best features for web page classification," Webology, vol. 5, no. 1, p. 52, 2008.
[34] T. Kanade, J. F. Cohn, and Y. Tian, "Comprehensive database for facial expression analysis," in Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on. IEEE, 2000, pp. 46-53.
[35] G. Zhao, X. Huang, M. Taini, S. Z. Li, and M. Pietikäinen, "Facial expression recognition from near-infrared videos," Image and Vision Computing, vol. 29, no. 9, pp. 607-619, 2011.
BIOGRAPHIES OF AUTHORS

Fatima Zahra Salmam is a Ph.D. student at the LAROSERI Laboratory, Faculty of Sciences, University of Chouaib Doukkali, El Jadida (Morocco). She obtained a Master's degree in computer science, specialty Business Intelligence, from the University of Sultan Moulay Slimane, Morocco, in 2014. Her research lies in the fields of emotion recognition, data mining, data analysis, computer vision, and artificial intelligence. She is preparing a dissertation on emotion recognition from image and speech data using data mining techniques.

Abdellah Madani is currently a Professor and Ph.D. tutor in the Department of Computer Science, Chouaib Doukkali University, Faculty of Sciences, El Jadida, Morocco. His main research interests include optimization algorithms, text mining, traffic flow, and modelling platforms. He is the author of many research papers published in conference proceedings and international journals.

Mohamed Kissi received his Ph.D. degree in Computer Science from UPMC, France, in 2004. He is currently a Professor in the Department of Computer Science, University Hassan II Casablanca, Faculty of Sciences and Technologies, Mohammedia, Morocco. His current research interests include machine learning, data and text mining (Arabic), and Big Data. He is the author of many research papers published in conference proceedings and international journals on Arabic text mining, bioinformatics, genetic algorithms, and fuzzy sets and systems.