Indonesian Journal of Electrical Engineering and Computer Science
Vol. 1, No. 3, March 2016, pp. 590 ~ 596
ISSN: 2502-4752
DOI: 10.11591/ijeecs.v1.i3.pp590-596
Received December 3, 2015; Revised January 24, 2016; Accepted February 9, 2016
A Comparative Study on Similarity Measurement in Noisy Voice Speaker Identification
Inggih Permana
Department of Information System, Faculty of Science and Technology, State Islamic University of Sultan Syarif Kasim Riau, Pekanbaru 28293, Indonesia
e-mail: inggihpermana@uin-suska.ac.id
Abstract
One important part of speaker identification is the measurement of sound similarity. This study compared two similarity measurement techniques on noisy voice. The first technique uses the smallest sum of vector pairs, and the second technique uses the frequency of occurrence of the smallest vector pairs. Noise in the voice can reduce the accuracy of speaker identification significantly. To overcome this problem, each of the two similarity measurement techniques was combined with least mean square (LMS) to remove noise. The experimental results showed that LMS improves the accuracy of speaker identification for both similarity measurement techniques. The second technique produces better accuracy than the first. The experimental results also showed that increasing the LMS learning rate can improve the accuracy of speaker identification.
Keywords: LMS, noisy voice, sound similarity measurement, speaker identification
1. Introduction
Speaker recognition is the part of sound processing that aims to find out who is speaking. It is divided into two parts: speaker identification and speaker verification. Speaker identification is a way to identify someone from an existing voice, whereas speaker verification is a way to verify a claimed identity through certain words [1]. This study focuses on speaker identification.
One important part of speaker identification is the measurement of sound similarity, which determines the owner of the voice being identified. A previous study [2] modified the sound similarity measurement technique by selecting the codebook that most often has the smallest distance to the input vectors, which produced better identification accuracy. However, that technique is not resistant to noisy sound. This study aims to improve the capability of that technique by adding active noise cancelling (ANC) to data preprocessing. Research conducted by Permana et al. [3] showed that ANC can improve the accuracy of speaker identification. The ANC method used in this research is least mean square (LMS).
This study used mel frequency cepstral coefficients (MFCC) for feature extraction and a self-organizing map (SOM) as the codebook maker. MFCC was chosen because it is based on the frequency differences that the human ear can capture, so it can represent how people receive sound signals [4]. MFCC is often used because it is considered to perform better than other methods, for example in terms of reduced error rates. SOM was chosen because it has been successfully applied to high-dimensional data [5], and the vectors produced by MFCC can be high-dimensional.
2. Research Method
Broadly speaking, this study is divided into three parts. The first part is making the codebook using training voice data to which no noise has been added. The second part is measuring similarity against the codebook that has been made. The voice data used in this part are test voice data to which noise has been added. This study used white noise with various values below 6.5 dB. In this part, LMS is used as data preprocessing to remove noise. Two similarity measurement techniques are used in this study: the smallest sum of vector pairs [6, 7] (the first technique) and the frequency of occurrence of the smallest vector pairs [2] (the second technique). The last part is the comparison and analysis of the similarity measurement techniques used. For more details, see Figure 1.
[Figure 1 flowchart boxes: Training voice (without noise); Feature extraction using MFCC; Making codebook using SOM; Testing voice (with noise); Removing noise using LMS; Feature extraction using MFCC; Similarity measurement using sum of smallest vector pairs; Similarity measurement using frequency of occurrence of smallest vector pairs; Result comparison and analysis]
Figure 1. Illustration of research method
Experiments in this study were performed over several combinations of parameters. The ANC learning rates tried were 0.1, 0.3, 0.5, 0.7 and 0.9. The numbers of MFCC coefficients tried were 13, 15 and 20. The numbers of SOM clusters tried were 9, 16, 25, 36, 49, 64, 81 and 100. The MFCC frame length was 12.5 ms and the MFCC overlap was 0.4. The SOM topology was hexagonal and the number of SOM iterations was 1000.
For each combination of parameters, one voice file from each speaker was used to create the codebook. Testing was then performed using all voice data. Silent segments were removed from all voice data used. This was done 5 times, so that every voice file of each speaker served once as the data for creating the codebook. The accuracy was calculated for each experiment. After all experiments for a particular combination of parameters had been carried out, the average accuracy was computed. This average accuracy is used as the measure of speaker identification ability.
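The experimental protocol above can be sketched as a parameter grid with a 5-fold rotation of the training file. This is only an illustrative outline: the function `run_trial` is a hypothetical placeholder for the whole train-and-test pipeline, and the names used here do not come from the original study.

```python
from itertools import product

# Parameter values listed in the text.
LEARNING_RATES = [0.1, 0.3, 0.5, 0.7, 0.9]
MFCC_COEFFS = [13, 15, 20]
SOM_CLUSTERS = [9, 16, 25, 36, 49, 64, 81, 100]
FILES_PER_SPEAKER = 5  # each file serves once as the codebook data


def run_trial(learning_rate, n_coeffs, n_clusters, train_index):
    """Hypothetical placeholder: build codebooks from file `train_index`
    of every speaker, test on all voice data, return accuracy in [0, 1]."""
    raise NotImplementedError


def average_accuracy(learning_rate, n_coeffs, n_clusters, trial=run_trial):
    """Average the accuracy over the 5 rotations of the training file."""
    scores = [trial(learning_rate, n_coeffs, n_clusters, k)
              for k in range(FILES_PER_SPEAKER)]
    return sum(scores) / len(scores)


def parameter_grid():
    """All combinations of learning rate, MFCC coefficients and SOM clusters."""
    return list(product(LEARNING_RATES, MFCC_COEFFS, SOM_CLUSTERS))
```

With the values listed in the text, the grid holds 5 x 3 x 8 = 120 parameter combinations, each evaluated 5 times.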
2.2. Voice Data
The voice data used are those previously used by Reda et al. [8] in their study. The voice data consist of 83 speakers: 35 female speakers and 48 male speakers. The speakers are Indian citizens of different backgrounds. Each speaker has 5 voice files in WAV format. The voice files are 1 to 39 seconds long. The words spoken by each speaker are a random combination of numbers. The recordings were made over the phone using an interactive voice response (IVR) system. The sampling rate used is 8000 Hz.
2.3. Similarity Measurement
This study uses two similarity measurement techniques. In the first technique [6, 7], the distance from each input vector to every vector in a particular speaker's codebook is measured. For each input vector, the codebook vector with the smallest distance is chosen as its pair. The distances of all these minimal pairs are then summed. These steps are performed for every speaker codebook. Finally, the codebook with the smallest sum is chosen as the speaker of the voice being identified. The first technique is illustrated in Figure 2.
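A minimal sketch of the first technique follows. It assumes Euclidean distance, which the text does not specify, and it assumes each codebook is stored as a list of vectors keyed by a speaker label; the function names are illustrative only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_distance_sum(input_vectors, codebook):
    """Sum, over all input vectors, of the distance to the nearest codeword."""
    return sum(min(euclidean(v, c) for c in codebook) for v in input_vectors)


def identify_by_min_sum(input_vectors, codebooks):
    """Return the speaker whose codebook gives the smallest distance sum.

    `codebooks` maps a speaker label to a list of codeword vectors.
    """
    return min(codebooks,
               key=lambda s: min_distance_sum(input_vectors, codebooks[s]))
```

Because the decision rests on a sum of distances, a few input vectors that lie far from every codeword can dominate the result.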
Figure 2. Previous similarity measurement techniques [2]
Figure 3. Proposed similarity measurement techniques [2]
In the second technique [2], each input vector is measured not only against one particular speaker's codebook but against all vectors in all available speaker codebooks. The smallest distance from the input vector to any of these codebook vectors is selected, and the codebook vector that gives this smallest distance becomes the pair of the input vector. After that, the codebook that most frequently supplies the pair is selected as the speaker of the input voice. The second technique is illustrated in Figure 3.
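The second technique can be read as a nearest-codeword vote. The sketch below is one possible reading, again assuming Euclidean distance and a dictionary of codebooks keyed by speaker label. Tie handling is not described in the text; here ties simply fall to whichever speaker was counted first.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify_by_pair_frequency(input_vectors, codebooks):
    """Vote for the codebook that holds the nearest codeword of each input.

    Returns (winning speaker, Counter of votes per speaker).
    """
    votes = Counter()
    for v in input_vectors:
        # Search every codeword of every speaker for the closest one.
        nearest_speaker = min(
            codebooks,
            key=lambda s: min(euclidean(v, c) for c in codebooks[s]),
        )
        votes[nearest_speaker] += 1
    return votes.most_common(1)[0][0], votes
```

As a small illustration, take codebooks `{"a": [[0, 0]], "b": [[10, 0]]}` and the inputs `[[4, 0], [4, 0], [4, 0], [30, 0]]`. The vote goes to "a" (3 pairs against 1), whereas a summed-distance rule would favour "b" (38 against 42) because of the single distant vector.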
2.4. Mel Frequency Cepstral Coefficient (MFCC)
This research used the MFCC-FB40 type [9] because its equal error rate (EER) and optimal decision cost function (DCFopt) are lower than those of the other types of MFCC [10]. The MFCC stages are illustrated in Figure 4.
Figure 4. Illustration of the MFCC process [2]
The first step in the MFCC process is to divide the incoming signal into multiple frames. The second step is to smooth each frame with a Hamming window to minimize signal discontinuities at the frame edges. The third step is to convert the voice signal from the time domain to the frequency domain using the fast Fourier transform (FFT). The fourth step is to map the frequencies of the FFT result onto the mel scale. The final step is to return the signal from the frequency domain to the time domain using the discrete cosine transform (DCT).
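The five steps can be sketched in plain Python as follows. This is a generic textbook MFCC and not the MFCC-FB40 variant used in the study. Several details are assumptions that the text does not state: the mel formula 2595 log10(1 + f/700), triangular filters, 20 filters, a logarithm before the DCT, reading the overlap of 0.4 as a fraction of the frame length, and a naive DFT standing in for the FFT.

```python
import cmath
import math


def frame_signal(signal, frame_len, overlap):
    """Step 1: split the signal into frames; overlap is a fraction of frame_len."""
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def hamming(frame):
    """Step 2: taper the frame edges with a Hamming window."""
    n = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)))
            for k, x in enumerate(frame)]


def power_spectrum(frame):
    """Step 3: time domain to frequency domain (naive DFT in place of FFT)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank_energies(power, sample_rate, n_filters):
    """Step 4: pool spectrum bins with triangular filters spaced evenly in mel."""
    n_bins = len(power)
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * (sample_rate / 2.0) / (n_bins - 1) for k in range(n_bins)]
    energies = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        total = 0.0
        for f, p in zip(bin_hz, power):
            if lo < f <= mid:
                total += p * (f - lo) / (mid - lo)
            elif mid < f < hi:
                total += p * (hi - f) / (hi - mid)
        energies.append(total)
    return energies


def dct(values, n_coeffs):
    """Step 5: DCT-II of the log energies gives the cepstral coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n_coeffs)]


def mfcc(signal, sample_rate=8000, frame_ms=12.5, overlap=0.4,
         n_filters=20, n_coeffs=13):
    """Return one list of n_coeffs cepstral coefficients per frame."""
    frame_len = int(sample_rate * frame_ms / 1000)
    features = []
    for frame in frame_signal(signal, frame_len, overlap):
        power = power_spectrum(hamming(frame))
        energies = mel_filterbank_energies(power, sample_rate, n_filters)
        log_e = [math.log(e + 1e-10) for e in energies]  # small offset avoids log(0)
        features.append(dct(log_e, n_coeffs))
    return features
```

With the settings reported in this study (8000 Hz sampling, 12.5 ms frames), each frame holds 100 samples, and under the overlap assumption above consecutive frames start 60 samples apart.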
2.5. Self Organizing Map (SOM)
SOM, also known as the Kohonen network, is a type of artificial neural network (ANN) with an unsupervised learning system. SOM is very effective at creating an organized internal representation of the various features of the input signal [11]. SOM assumes a topological structure among its cluster units, a property found in the human brain but absent in some other ANNs [12].
The first step of the SOM training process is to determine the number of clusters to be generated. The next step is to create a vector for each cluster and give each cluster vector an initial weight. The distance between the input vector and each cluster vector is then computed. The cluster vector with the smallest distance is the winner. The weight vector of the winner is updated using Equation 1.
$$w_{ij}(\text{new}) = w_{ij}(\text{old}) + \alpha\,[x_i - w_{ij}(\text{old})] \qquad (1)$$
Where w is the weight of the unit in the output layer, x is the input data and α is the learning rate.
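The winner search and the winner update can be sketched as below. The sketch covers only the step described in the text: a full SOM usually also updates the neighbours of the winner and decays the learning rate over iterations, which is omitted here. Euclidean distance is an assumption, and `math.dist` requires Python 3.8 or later.

```python
import math


def winner_index(x, weights):
    """Index of the cluster vector closest to input x (Euclidean distance)."""
    return min(range(len(weights)), key=lambda j: math.dist(x, weights[j]))


def update_winner(x, weights, alpha):
    """Move the winning cluster vector towards x, in place.

    Each component follows w(new) = w(old) + alpha * (x - w(old)).
    Returns the index of the winner.
    """
    j = winner_index(x, weights)
    weights[j] = [w + alpha * (xi - w) for xi, w in zip(x, weights[j])]
    return j
```

For example, with cluster vectors `[[0.0, 0.0], [10.0, 10.0]]`, input `[2.0, 0.0]` and a learning rate of 0.5, the first vector wins and moves halfway towards the input, to `[1.0, 0.0]`.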
2.6. Least Mean Square (LMS)
The ANC method used in this study is least mean square (LMS). LMS applies gradient descent and was first proposed by Widrow and Hoff [13]. Steepest descent, another method that implements gradient descent, is very good at generating optimal weights, but it requires the true gradient at each step. LMS overcomes this shortcoming because it can estimate the gradient instantaneously at each step.
The first step of LMS is to create a filter and initialize its weights (w). After that, a value for the learning rate (α) is specified. The anti-noise is calculated using Equation 2. Then the residual signal is calculated using Equation 3. The last step is to update the weights using Equation 4.
$$y_i = \sum_{j=1}^{M} w_j u_j \qquad (2)$$
$$e_i = d_i - y_i \qquad (3)$$
$$w_j(\text{new}) = w_j + 2\,\alpha\, e_i\, u_j \qquad (4)$$
Where y is the anti-noise, u is the reference noise, d is the incoming voice signal, e is the residual signal, and M is the number of filter weights.
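The LMS steps can be sketched as a short adaptive filter loop. The sketch assumes that the filter inputs at each step are the most recent reference noise samples (a tapped delay line), which the text does not spell out. The default number of taps and the default learning rate are illustrative values, not settings from the study.

```python
def lms_cancel(d, u, n_taps=4, alpha=0.01):
    """Return (residual signal, final weights) after LMS noise cancelling.

    d: incoming (noisy) voice signal, u: reference noise of the same length.
    """
    w = [0.0] * n_taps  # initialize the filter weights
    residual = []
    for i in range(len(d)):
        # Most recent n_taps reference samples, newest first, zero-padded.
        taps = [u[i - j] if i - j >= 0 else 0.0 for j in range(n_taps)]
        y = sum(wj * uj for wj, uj in zip(w, taps))      # anti-noise
        e = d[i] - y                                      # residual signal
        w = [wj + 2 * alpha * e * uj for wj, uj in zip(w, taps)]  # weight update
        residual.append(e)
    return residual, w
```

If the incoming signal contains only a scaled copy of the reference noise, the weights should settle near that scale factor and the residual should shrink towards zero. With voice present, the residual is the cleaned signal that is passed on to feature extraction.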
4. Results and Analysis
The graph in Figure 5 shows that, when LMS is not used in data preprocessing, the highest speaker identification accuracy on noisy test data is very low: 1.45% for the first similarity measurement technique and 1.63% for the second similarity measurement technique. Both values occurred with 9 SOM clusters.
Figure 5. Comparison of Speaker Identification Accuracy
The graph in Figure 5 also shows that the speaker identification accuracy increases greatly after the LMS algorithm is added to data preprocessing. The highest accuracy of the first technique is 82.35%. The highest accuracy of the second technique is 90.72%. Both occur with 64 SOM clusters. When LMS is used in data preprocessing, the gain of the second technique over the first technique is quite significant: at the highest accuracies the improvement is 8.37 percentage points.
Figure 6. Effect of LMS Learning Rate
Figure 6 shows the effect of the LMS learning rate on the second technique. When the learning rate is 0.1, the resulting accuracy is very low: the highest accuracy is only 47.89%. When the learning rate is increased to 0.3 and above, the accuracy increases very significantly. The highest accuracy, 92.47%, occurs at a learning rate of 0.7.
Figure 7. Difference of Accuracy (Second Technique – First Technique)
The graph in Figure 7 shows the difference in speaker identification accuracy between the two similarity measurement techniques used. The second technique always produces higher identification accuracy. The largest difference in accuracy, 16.57 percentage points, occurs at a learning rate of 0.3 with 36 SOM clusters.
5. Conclusion
Based on the results of the experiments on noisy voice, the use of LMS can improve the accuracy of speaker identification both for the combination of the smallest sum of vector pairs technique [6, 7] with LMS and for the combination of the frequency of occurrence of smallest vector pairs technique [2] with LMS. The second combination produces better accuracy than the first combination. Increasing the LMS learning rate can improve the accuracy of speaker identification for all combinations. The experiments in this study showed that the best learning rate is 0.7.
References
[1] Togneri R, Pullella D. An Overview of Speaker Identification: Accuracy and Robustness Issues. Circuits and Systems Magazine, IEEE. 2011; 11(2): 23-61.
[2] Permana I, Buono A, Silalahi BP. Similarity Measurement for Speaker Identification Using Frequency of Vector Pairs. TELKOMNIKA Indonesian Journal of Electrical Engineering. 2014; 12(8): 6205-6210.
[3] Permana I, Buono A, Silalahi BP. Noise Cancelling for Robust Speaker Identification Using Least Mean Square. Proceedings of the 1st International Conference on Science and Technology for Sustainability. IcosTechs. 2014; 1: 247-252.
[4] Muda L, Begam M, Elamvazuthi I. Voice Recognition Algorithms Using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques. Journal of Computing. 2010; 2(3): 138-143.
[5] Yan J, Zhu Y, He H, Sun Y. Multi-Contingency Cascading Analysis of Smart Grid Based on Self-Organizing Map. IEEE Transactions on Information Forensics and Security. 2013; 8(4): 646-6.
[6] Fruandta A, Buono A. Identifikasi Campuran Nada pada Suara Piano Menggunakan Codebook. Seminar Nasional Aplikasi Teknologi Informasi. Yogyakarta. 2011; 8-13.
[7] Wisnudisastra E, Buono A. Pengenalan Chord pada Alat Musik Gitar Menggunakan CodeBook dengan Teknik Ekstraksi Ciri MFCC. Jurnal Ilmiah Ilmu Komputer. 2010; 14(1): 16-21.
[8] Reda A, Panjwani S, Cutrell E. Hyke: A Low-Cost Remote Attendance Tracking System for Developing Regions. Proceedings of the 5th ACM Workshop on Networked Systems for Developing Regions. ACM. 2011; 15-20.
[9] Slaney M. Auditory Toolbox. Interval Research Corporation, Tech Rep. 1998.
[10] Ganchev T, Fakotakis N, Kokkinakis G. Comparative Evaluation of Various MFCC Implementations on the Speaker Verification Task. Proceedings of the SPECOM. 2005; 1: 191-194.
[11] Kohonen T. The Self-Organizing Map. Proceedings of the IEEE. 1990; 78(9): 1464-1480.
[12] Basheer IA, Hajmeer M. Artificial Neural Networks: Fundamentals, Computing, Design, and Application. Journal of Microbiological Methods. 2000; 43(1): 3-31.
[13] Kinnunen T, Li H. An Overview of Text-Independent Speaker Recognition: from Features to Supervectors. Speech Communication. 2010; 52(1): 12-40.
[14] Furui S. An Overview of Speaker Recognition Technology. Automatic Speech and Speaker Recognition. Springer US. 1996; 31-56.
[15] Alam MJ, Kenny P, Ouellet P, O'Shaughnessy D. Multitaper MFCC and PLP Features for Speaker Verification Using i-Vectors. Speech Communication. 2013; 55: 237–251.
[16] Chen SH, Luo YR. Speaker Verification Using MFCC and Support Vector Machine. Proceedings of the International Multi Conference of Engineers and Computer Scientists. Hong Kong. 2009; 1: 18-20.
[17] Nakagawa S, Wang L, Ohtsuka S. Speaker Identification and Verification by Combining MFCC and Phase Information. IEEE Transactions on Audio, Speech, and Language Processing. 2012; 20(4): 1085-1095.
[18] Davis S, Mermelstein P. Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences. IEEE Transactions on Acoustics, Speech and Signal Processing. 1980; 28(4): 357-366.
[19] Steve Y, Odel J, Ollason D, Valtchev V, Woodland P. The HTK Book, version 2.1. Cambridge University. 1997.
[20] Skowronski MD, Harris JG. Exploiting Independent Filter Bandwidth of Human Factor Cepstral Coefficients in Automatic Speech Recognition. The Journal of the Acoustical Society of America. 2004; 116: 1774–1780.
[21] Kohonen T. Self-Organized Formation of Topologically Correct Feature Maps. Biological Cybernetics. 1982; 43(1): 59-69.