TELKOMNIKA, Vol. 13, No. 4, December 2015, pp. 1343~1351
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v13i4.2735
Received September 2, 2015; Revised October 29, 2015; Accepted November 20, 2015
Recognition of Emotions in Video Clips: The Self-Assessment Manikin Validation

Dini Handayani*1, Abdul Wahab2, Hamwira Yaacob3
Computer Science Department, Kulliyyah of Information and Communication Technology,
International Islamic University Malaysia
*Corresponding author, email: dini.handayani@gmail.com1, abdulwahab@iium.edu.my2, hyaacob@iium.edu.my3
Abstract
Many research domains use video content as stimuli for the study of human emotions. Video content within a particular genre or with a specific theme evokes dynamic emotions that are highly useful in many research fields. The present study proposed a set of video-clip stimuli that embody four emotions under specific genres of movies, namely happiness, calmness, sadness and fear. Two experiments (a preliminary and a validation experiment) were conducted in order to validate the video clips. The Self-Assessment Manikin was utilized to rate the videos. All the video clips were rated with respect to valence and arousal judgment. In the preliminary experiment, the video clips were rated in terms of how clearly the expected emotions were evoked. The validation experiment was conducted to confirm the results of the preliminary experiment, and only video clips with high recognition rates were included in the data set.

Keywords: SAM, stimuli, video emotion, valence, arousal

Copyright © 2015 Universitas Ahmad Dahlan. All rights reserved.
1. Introduction
Emotional responses to video content may well be one of the most complex tasks that humans can accomplish. There has been a research trend within the affective computing community to develop stimuli repositories and to recognize human emotions stimulated by watching video clips, as shown in Table 1. When watching a video, a person experiences emotion based on his/her cognitive perception and appraisal of the situation depicted in the video [1]. For this reason, it is necessary to understand a human's cognitive perception of a given situation and its relation to his/her emotions [2].

Although there is increasing interest in the recognition of emotions using video stimuli, many questions remain: how do the videos evoke emotions, and to what extent can they do so? To answer these questions, a set of video stimuli first needs to be established. The aim of this study is to provide such a stimuli set. Here, four categories of emotion are used: 'happy', 'calm', 'sad', and 'fear'. They are defined on the dimensions of valence and arousal. Valence ranges from positive (pleasant) to negative (unpleasant), while arousal ranges from excited (active) to calm (passive). As presented in Figure 1, the corresponding dimensions of valence and arousal are depicted as horizontal and vertical axes, respectively, on a Cartesian coordinate space. The video stimuli set has to consist of 2^2 = 4 videos of expressions corresponding to the combinations of {pleasant, unpleasant} × {active, passive}, one for each of the emotions.

Two experiments were conducted in order to validate the video-clip stimuli. In the preliminary experiment, the participants rated the video clips based on valence and arousal judgement. The aim was to determine which video clips can be clearly identified (in terms of emotional response) within an optimal duration of time. In the validation experiment, these video clips were rated to find the ones with the highest accuracy, which would form the data set. The rest of the paper is organized as follows: related works are reviewed in Section 2. Section 3 presents and describes the materials and method. Section 4 presents the development of stimuli. Current open issues, future work, and conclusions are covered in Section 5.
2. Related Works
Focusing on the development of a stimuli set, seven selected scientific publications were reviewed based on several categories, including the database name, the stimuli set size, and the affect representation, as shown in Table 1.
Table 1. Mood and Emotion Stimuli Repositories

No | Source | Name | Size | Affect Representation
1 | Koelstra et al., 2012 [2] | Database for Emotion Analysis using Physiological Signals (DEAP) | 40 music videos; a minute each. | Valence, arousal, and dominance.
2 | Sandra Carvalho, Jorge Leite, Santiago Galdo-Alvarez, 2012 [3] | Emotional Movie Database (EMDB) | 50 film clips; 40 seconds each. | Valence, arousal, and dominance.
3 | Douglas-Cowie, Cowie, & Sneddon, 2007 [4] | HUMAINE Database | 50 clips; 5 to 180 seconds each. | Intensity, arousal, valence, dominance, and predictability.
4 | Schaefer, Nils, Sanchez, & Philippot, 2010 [5] | FilmStim | 70 film clips; 1 to 7 minutes each. | Six discrete emotions and 15 mixed feeling scores.
5 | M. Soleymani, Lichtenauer, Pun, & Pantic, 2012 [6] | MAHNOB-HCI | 20 film clips; 35 to 117 seconds each. | Arousal, valence, dominance, and predictability.
6 | Schedl et al., 2014 [7] | VIOLENT SCENE DATABASE | 25 full movies. | Not reported.
7 | Baveye, Dellandrea, Chamaret, & Chen, 2015 [8] | LIRIS-ACCEDE | 9,800 film clips; 8 to 12 seconds each. | Valence and arousal.
With regard to the affect representation, one study used a discrete approach to describe emotion, while the others represented emotions in either a 2D valence-arousal space or a 3D valence-arousal-dominance space, as suggested by psychologists. Although there are many sets of video stimuli, as mentioned above, most of them are protected by copyright and thus not freely available. Of the video stimuli that were once freely available online, some no longer are. This prompts the need for a freely available data set that is suitable for research on human emotions.
Figure 1. Circumplex Model of Affect from [9] with the emotional state colors [10], whereby the x-axis is the valence scale and the y-axis is the arousal scale
3. Materials and Method
3.1. Emotional Model
The most straightforward way to express emotions is by using a categorical approach, or discrete labels, such as 'anger', 'contempt', 'disgust', 'fear', 'sad', 'surprise', and 'happy'. On the other hand, psychologists often express emotions in an n-dimensional space. Russell [9] proposed a two-dimensional affective space model for measuring emotions, known as the circumplex model of affect. It is composed of valence and arousal. Bialoskorski et al. labeled emotional states with colours [10], as illustrated in Figure 1. The happy emotional state, indicated in orange, is defined as having positive valence and a high degree of arousal. The calm emotional state, indicated in green, is defined as having positive valence but a low degree of arousal. The sad emotional state, indicated in blue, is defined as having negative valence and a low degree of arousal. Fear, indicated in red, is defined as having negative valence but a high degree of arousal.
3.2. Self-Assessment Manikin (SAM)
3.2.1. Representation
The commonly used technique to validate emotion stimuli is SAM [11]. SAM is a self-reporting affective state measurement that uses cartoon-like manikins (see Figure 2) to plot basic emotions on the affective space. A nine-point pictorial scale was utilized for the purpose of this study, with two sets of manikins. The first set is the scoring for valence, ranging from nine (happy) to one (sad). The second set is the scoring for arousal, ranging from nine (active) to one (passive).
[Figure 2 shows the two rows of SAM manikins with the scoring scale 9 8 7 6 5 4 3 2 1]
Figure 2. SAM
In order to measure the agreement between the emotion labels of the video clips and those of SAM, the SAM arousal and valence scores were translated into the four emotional states 'happy', 'calm', 'sad', and 'fear', as described in Section 3.1. Based on the two axes in Figure 1, each participant had to select one of the SAMs. A SAM is defined as 'happy' when the levels of valence and arousal are both above 5:

(valence ≥ 5) ∧ (arousal ≥ 5)    (1)

'Calm' is when the level of valence is above 5 and arousal is below 5:

(valence ≥ 5) ∧ (arousal < 5)    (2)

'Sad' is when the levels of valence and arousal are both below 5:

(valence < 5) ∧ (arousal < 5)    (3)
'Fear' is when the level of valence is below 5 and arousal is above 5:

(valence < 5) ∧ (arousal ≥ 5)    (4)
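To make the mapping concrete, the following Python sketch (illustrative only, not code from the study) applies equations (1)-(4) to a participant's nine-point SAM ratings; the threshold of 5 follows the reconstruction of the equations above, and the function name is hypothetical.

```python
# Illustrative sketch of equations (1)-(4): translating a nine-point SAM
# (valence, arousal) pair into one of the four emotional states of Section 3.1.

def sam_to_emotion(valence: int, arousal: int) -> str:
    """Map SAM valence and arousal scores (1-9) to an emotion label."""
    if not (1 <= valence <= 9 and 1 <= arousal <= 9):
        raise ValueError("SAM scores must lie on the nine-point scale (1-9).")
    if valence >= 5 and arousal >= 5:   # equation (1)
        return "happy"
    if valence >= 5 and arousal < 5:    # equation (2)
        return "calm"
    if valence < 5 and arousal < 5:     # equation (3)
        return "sad"
    return "fear"                       # equation (4): valence < 5, arousal >= 5


print(sam_to_emotion(8, 7))  # happy
print(sam_to_emotion(7, 3))  # calm
print(sam_to_emotion(2, 2))  # sad
print(sam_to_emotion(3, 8))  # fear
```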
3.3. Experiment Protocol and Setup
The participants were briefed about the experiment through a consent form and a verbal introduction. Participants were also instructed on how to fill in their SAM forms. The approximate time interval between the start of a trial and the end of the self-reporting phase was three minutes and ten seconds. Eight video clips were played from the proposed dataset in random order for each participant. The entire protocol took 30 minutes on average, in addition to five minutes of setup time (see Figure 3).

The proposed video set consisted of video clips selected from Asian movie scenes, commercial advertisements, and online resources. The selection criteria for the video clips were as follows: (i) the video clips should be understandable without explanation, (ii) the duration of the video clips should be relatively short, and (iii) the video clips should evoke only one emotion (rather than multiple emotions) from the participants.
[Figure 3 schematic: Trial 1 … Trial 5 … Trial 8; each trial consists of a video clip (1~3 minutes) followed by SAM self-reporting (10 seconds)]
Figure 3. There were eight trials in each experimental session. Each trial was conducted with a video clip. The self-reporting phase was done at the end of each trial
After watching each video clip, the participants were each given an SAM form and asked to provide the following information: (i) the valence score, (ii) the arousal score, and (iii) confirmation of whether they had watched the clip prior to the experiment. The experiment was performed in a classroom environment with controlled temperature and illumination.
4. Development of Stimuli
4.1. Description of the Video Set
In this proposed set of video-clip stimuli, four categories of emotion ('happy', 'calm', 'sad', and 'fear') were set. The clips were taken from different films and shows of various genres to express these emotions. In a study done by Ekman, happiness, sadness, and fear were considered basic emotions [12]. Calmness was not considered a basic emotion in Ekman's study, but it is in this study, as the opposite of fear, mirroring the fact that happiness is the opposite of sadness.

'Happy', 'calm', 'sad', and 'fear' are defined as regions along the valence and arousal axes, as illustrated in Figure 1 and explained in Section 3.1. 'Happy' videos are considered 'arousing' and 'pleasant'. 'Calm' videos are considered 'low arousing' and 'pleasant'. 'Sad' videos are considered 'low arousing' and 'unpleasant'. 'Fear' videos are considered 'arousing' and 'unpleasant'.
To calculate the participants' perception rate (in percentage) in recognizing the emotion in a happy video, V_h, the following formula is used:

V_h = 100 × (Σ_{i=1}^{n} lh_i) / n    (5)

where n is the group size of the participants and lh_i is the emotional intensity level of each participant when they watch a happy video. Likewise, equation (5) can be re-written for the calmness, sadness, and fear intensity levels as V_c, V_s, and V_f.
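As an illustration, a minimal Python sketch of equation (5) is given below. It assumes the intensity levels lh_i are normalized to the range [0, 1], and the function and variable names are hypothetical, not taken from the paper.

```python
# Hedged sketch of equation (5): the perception rate V_h for the 'happy'
# category as the mean emotional intensity level lh_i over n participants,
# scaled to a percentage. Assumes each lh_i is normalized to [0, 1].

def perception_rate(intensity_levels):
    """Return the perception rate (%) for one emotion category."""
    n = len(intensity_levels)
    if n == 0:
        raise ValueError("At least one participant rating is required.")
    return 100 * sum(intensity_levels) / n


# Example: five participants rating one 'happy' clip.
v_h = perception_rate([1.0, 0.8, 1.0, 0.6, 1.0])
print(f"V_h = {v_h:.1f}%")  # V_h = 88.0%
```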
4.2. Preliminary Study
4.2.1. Participants
Forty-eight young and healthy participants (26 women and 22 men) of different races (Malay and non-Malay) and educational backgrounds volunteered to participate in the preliminary experiment at the International Islamic University Malaysia (IIUM). Their ages varied between 19 and 39 years old, with a mean (M) of 21.56 years and a standard deviation (SD) of 5.11 years. They had different educational backgrounds, ranging from undergraduate to postgraduate students, with English proficiency ranging from intermediate to native speakers.
4.2.2. Material
The set of 29 video clips consisting of four categories of emotion was used for the preliminary study. Some of these clips came with English subtitles. The experiment also examined the role of language in the study of emotions for multiracial participants. The duration of the clips varied between 20 and 177 seconds, with an M of 87.10 seconds and an SD of 37.28 seconds.
4.2.3. Procedure
Each participant was asked to fill in the SAM form after watching a video clip. Eight video clips were rated by each participant. They were also asked to confirm whether they had viewed the clips before; the aim was to obtain their genuine emotional responses.
4.2.4. Results
Table 2 lists the video clips used in the preliminary and validation experiments. They are presented and organized by category of emotion ('H' for 'happy', 'C' for 'calm', 'F' for 'fear', and 'S' for 'sad'). The percentage of the best recognition, the duration, and the result of the preliminary experiment are shown for each video clip. The result of the preliminary experiment is shown in Figure 4. 20 out of 29 video clips are included in the validation experiment, consisting of five happy videos, five calm videos, five sad videos and five fear videos.
Figure 4. (a) Percentage rating for each video clip. Yellow bars indicate the video clips excluded from the validation study. (b) Mean rating on a 9-point scale obtained for each video clip on valence and arousal. Each symbol represents one video clip.
Each video clip was rated 13 times on average. For each video clip, the recognition result was based on the percentage of participants who perceived the expected emotion in the clip.
4.2.5. Discussion
In general, video clips that express sadness and happiness were slightly better recognised than those expressing calmness and fear. Psychologists have suggested that videos should be 1 to 10 minutes long to evoke a particular emotion [13]. For this reason, Video 8 was excluded from the data set even though its accuracy was high, due to its short length. Video 26 was also excluded from the data set, due to its long duration. The results showed that the participants recognized multiple emotions in video clips with durations of more than two minutes. Additionally, some of the video clips that had no subtitles failed to be recognized by participants. This shows that language plays an important role in recognizing emotions. The participants could still correctly recognize the videos' emotions even though it was their first time watching them. A total of 20 video clips were included in the validation experiment.
4.3. Validation Study
4.3.1. Participants
A different group of participants, 113 young and healthy individuals (54 women and 59 men) of different races (Malay and non-Malay) and educational backgrounds, volunteered to participate in the validation experiment at IIUM. They were undergraduate students with English proficiency ranging from intermediate to native speakers. Their ages varied between 19 and 21 years old, with an M of 19.99 years and an SD of 0.81 years.
4.3.2. Material
The set of 20 video clips consisting of four categories of emotion from the preliminary study was used. Due to the language barrier, English subtitles were added to every video clip to avoid recognition failure. The video clips' durations varied between 60 and 103 seconds, with an M of 76.6 seconds and an SD of 16.20 seconds.
4.3.3. Procedure
As in the preliminary study, participants were asked to fill in their SAM forms after watching a video clip. Eight video clips were rated by each participant. The participants were also asked if they had seen the video clips before.
Figure 5. (a) Percentage rating for each video clip. Yellow bars indicate the video clips excluded from the final data set. (b) Mean rating on a 9-point scale obtained for each video clip on valence and arousal. Each symbol represents one video clip.
4.3.4. Results
The percentage of the best recognition, the duration, and the result of the validation experiment are shown for each video clip, as listed in Table 2. The result of the validation experiment is shown in Figure 5. In the validation experiment, each video was rated four times on average. For each video clip, the recognition result was based on the percentage of participants who perceived the expected emotion in the clip. Finally, 16 out of 20 video clips are included in the data set, consisting of four happy videos, four calm videos, four sad videos and four fear videos.
4.3.5. Discussion
Ultimately, 16 videos were chosen from the validation experiment. Only videos with a recognition accuracy of more than 60% were included in the stimuli data set. In addition, with regard to duration, the video clips were kept as short as possible to avoid the recognition of multiple emotions.
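This selection rule can be illustrated with a short Python sketch (not code from the study); the clip codes and accuracy values in the example are taken from Table 2.

```python
# Illustrative sketch of the selection rule above: keep only clips whose
# validation recognition accuracy exceeds 60%. Example records are from Table 2.

ACCURACY_THRESHOLD = 60  # percent

validation_results = [
    {"code": "H01", "accuracy": 77},   # included for dataset
    {"code": "C04", "accuracy": 44},   # excluded
    {"code": "F04", "accuracy": 32},   # excluded
    {"code": "S05", "accuracy": 96},   # included for dataset
]

final_dataset = [clip for clip in validation_results
                 if clip["accuracy"] > ACCURACY_THRESHOLD]
print([clip["code"] for clip in final_dataset])  # ['H01', 'S05']
```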
5. General Discussion and Conclusion
In this study, a set of video stimuli has been proposed. Some important issues were discussed, specifically the duration of the video clips, the authenticity of participants' emotions while watching the video clips, and finally the use of subtitles for multiracial participants. The use of SAM has also been shown to be an effective tool to recognize emotions along the valence and arousal dimensions.
Table 2. The video clips used in the experimental study

No | Code | Source | Duration (minutes) | Best Recognition, Preliminary Experiment (%) | Preliminary Experiment Result | Best Recognition, Validation Experiment (%) | Validation Result
1 | H01 | Maxis Hari Raya 2013 TVC (Eng.) | 1 | 67 | Included for validation. | 77 | Included for dataset.
2 | C01 | Incredible India | 1.58 | 20 | Excluded. | – | –
3 | S01 | Touching Thai Advertisement, Shows How A Single Act of Kindness Could Change Your Life | 2.57 | 50 | Excluded. | – | –
4 | F01 | The Grudge Movie, Scariest Horror Scene | 1.43 | 89 | Included for validation. | 71 | Included for dataset.
5 | H02 | PETRONAS Jahit 60s TVC | 1 | 45 | Excluded. | – | –
6 | C02 | Beach DVD-Wave-With Relaxing Beaches and Sea Sounds | 1.03 | 96 | Included for validation. | 68 | Included for dataset.
7 | S02 | Raya TVC PTS Media Group - 'Ibu, Al-Fatihah Tu Apa' | 2.57 | 34 | Excluded. | – | –
8 | F02 | Missing Our Deals Will Haunt You - Little Girl TV Advert | 0.20 | 77 | Excluded. | – | –
9 | H03 | [Thai TVC] 'Mae Toi' - Thai Life Insurance | 1.57 | 33 | Excluded. | – | –
10 | C03 | 'Havasupai Indian Waterfall Relaxation' The Classic Video by David Huting | 1 | 95 | Included for validation. | 79 | Included for dataset.
11 | S03 | A Blind Father and His Daughter - Short Sad Story | 1 | 61 | Included for validation. | 66 | Included for dataset.
12 | F03 | Proton Advertisement, Seat Belt | 1 | 11 | Excluded. | – | –
13 | H04 | CNY Commercials 2013 - BERNAS - 'Ka Fan' | 1.3 | 12 | Excluded. | – | –
14 | C04 | Cuia de Viagem-Langkawi | 1.43 | 93 | Included for validation. | 44 | Excluded.
15 | S04 | BERNAS-Chinese New Year Commercial-Family Reunion Dinner 'Sek Fan' | 1 | 78 | Included for validation. | 78 | Included for dataset.
16 | F04 | The Grudge 3-scariest scene | 1.20 | 63 | Included for validation. | 32 | Excluded.
17 | H05 | Nido Milk-You're My Number One 2014 TVC Sharon Cuneta & Barbie Almalbis | 1.30 | 96 | Included for validation. | 67 | Included for dataset.
18 | C05 | Relaxing DVD-Mangrove Journey-Tropical Waterfalls with Nature Sounds | 1 | 82 | Included for validation. | 81 | Included for dataset.
19 | S05 | The Saddest Commercial Ever | 1.30 | 100 | Included for validation. | 96 | Included for dataset.
20 | F05 | The Ring-best scene as a horror movie | 1.38 | 96 | Included for validation. | 70 | Included for dataset.
21 | H06 | Dtac TriNet-Happiness | 1.21 | 88 | Included for validation. | 59 | Excluded.
22 | C06 | Robin Bird Chirping and Singing - Song of Robin Red Breast Birds - Robins | 1.12 | 89 | Included for validation. | 89 | Included for dataset.
23 | S06 | 'Crash' Saddest scene | 1.36 | 53 | Included for validation. | 21 | Excluded.
24 | F06 | The scariest scene ever-The Eye-Horror movie | 1.17 | 75 | Included for validation. | 62 | Included for dataset.
25 | H07 | Baby Laughing Hysterically at Ripping Paper | 1 | 92 | Included for validation. | 93 | Included for dataset.
26 | C07 | Heartwarming Thai Commercial - Thai Good Stories | 2.55 | 80 | Excluded. | – | –
27 | S07 | Line TVC-Closer | 1.30 | 75 | Included for validation. | 75 | Included for dataset.
28 | F07 | The most scary scene on roof | 1.09 | 87 | Included for validation. | 87 | Included for dataset.
29 | H08 | Tourism Australia's new ad | 1 | 60 | Included for validation. | 89 | Included for dataset.
With regard to the validation of the stimuli, as future work, additional experiments to measure emotions are needed with an implicit approach, such as the electroencephalogram (EEG), to automatically recognize participants' emotions when they watch these video clips. Although other measurement tools are available, they seem less suitable for the recognition of emotions at first glance.

In conclusion, it is hoped that the creation of this dataset can resolve the lack of availability of previous data sets and that it can be easily shared and used by other researchers in the field of affective computing.
Acknowledgment
This work is supported by the Fundamental Research Grant Scheme (FRGS) funded by the Ministry of Higher Education (Grant code: FRGS14-*137-0378).
References
[1] KR Scherer. "What are emotions? And how can they be measured?". Soc. Sci. Inf. 2005; 44(4): 695–729.
[2] S Koelstra, C Muhl, M Soleymani, JS Lee, A Yazdani, T Ebrahimi, T Pun, A Nijholt and I (Yiannis) Patras. "DEAP: A Database for Emotion Analysis Using Physiological Signals". IEEE Trans. Affect. Comput. 2012; 3(1): 18–31.
[3] S Carvalho, J Leite, S Galdo-Álvarez and ÓF Gonçalves. "The Emotional Movie Database (EMDB): A Self-Report and Psychophysiological Study". Appl. Psychophysiol. Biofeedback. 2012; 37(4): 279–294.
[4] E Douglas-Cowie, R Cowie and I Sneddon. "The HUMAINE database: addressing the collection and annotation of naturalistic and induced emotional data". Affect. Comput. Intell. Interact. 2007: 488–500.
[5] A Schaefer, F Nils, X Sanchez and P Philippot. "Assessing The Effectiveness of a Large Database of Emotion-Eliciting Films: A New Tool for Emotion Researchers". Cogn. Emot. 2010: 1–36.
[6] M Soleymani, J Lichtenauer, T Pun and M Pantic. "A Multimodal Database for Affect Recognition and Implicit Tagging". IEEE Trans. Affect. Comput. 2012; 3(1): 42–55.
[7] M Schedl, M Sjoberg, I Mironica, B Ionescu, VL Quang, YG Jiang and CH Demarty. "VSD2014: A Dataset for Violent Scenes Detection in Hollywood Movies and Web Videos". Content-Based Multimed. Index. (CBMI), 2015 13th Int. Workshop. 2014.
[8] Y Baveye, E Dellandréa, C Chamaret and L Chen. "LIRIS-ACCEDE: A Video Database for Affective Content Analysis". Affect. Comput. IEEE Trans. 2015: 1–14.
[9] J Russell. "A circumplex model of affect". J. Pers. Soc. Psychol. 1980.
[10] LSS Bialoskorski, JHD Westerink and EL van den Broek. "Mood Swings: An Affective Interactive Art System". ICST Inst. Comput. Sci. Soc. Informatics Telecommun. Eng. 2009: 181–186.
[11] M Bradley and PJ Lang. "Measuring Emotion: The Self-Assessment Manikin and The Semantic Differential". J. Behav. Ther. Exp. Psychiat. 1994; 25(1).
[12] P Ekman, D Matsumoto and WV Friesen. "Facial Expression in Affective Disorders". What the Face Reveals: Basic Appl. Stud. Spontaneous Expr. Using Facial Action Coding Syst. 1997.
[13] M Soleymani, M Pantic and T Pun. "Multimodal Emotion Recognition in Response to Videos". IEEE Trans. Affect. Comput. 2012; 3(2): 211–223.