TELKOMNIKA Indonesian Journal of Electrical Engineering
Vol. 12, No. 8, August 2014, pp. 6205 ~ 6210
DOI: 10.11591/telkomnika.v12i8.6194
ISSN: 2302-4046
Received April 30, 2014; Revised June 1, 2014; Accepted June 15, 2014
Similarity Measurement for Speaker Identification Using Frequency of Vector Pairs
Inggih Permana*(1), Agus Buono(2), Bib Paruhum Silalahi(3)
(1,2) Department of Computer Science, Faculty of Mathematics and Natural Sciences, Bogor Agricultural University, Bogor 16680, Indonesia
(3) Department of Mathematics, Faculty of Mathematics and Natural Sciences, Bogor Agricultural University, Bogor 16680, Indonesia
*Corresponding author, e-mail: inggih.permana12p@apps.ipb.ac.id (1), pudesha@yahoo.co.id (2), bibparuhum1@yahoo.com (3)
Abstract
Similarity measurement is an important part of speaker identification. This study modifies the similarity measurement technique used in previous studies. Previous studies used the sum of the smallest distances between the input vectors and the codebook vectors of a particular speaker. In this study, the technique is modified by selecting the speaker codebook that has the highest frequency of vector pairs. A vector pair here consists of an input vector and the codebook vector at the smallest distance from it. This study used Mel Frequency Cepstral Coefficients (MFCC) for feature extraction, a Self Organizing Map (SOM) as the codebook maker, and Euclidean distance as the distance measure. The experimental results showed that the proposed similarity measurement techniques can improve the accuracy of speaker identification. For 13, 15 and 20 MFCC coefficients, the average identification accuracy increased by 0.61%, 0.98% and 1.27%, respectively.

Keywords: frequency of vector pairs, MFCC, similarity measurement, SOM, speaker identification

Copyright © 2014 Institute of Advanced Engineering and Science. All rights reserved.
1. Introduction
Speaker identification is the part of sound processing that aims to find out who is talking. Speaker identification is necessary because the human ability to recognize human speech is very limited, especially given the great diversity among human voices. Speaker identification systems are therefore needed and widely applied in real life. One important application of speaker identification is in forensics [1], for example identifying who is speaking on a recorded phone call that will be used as evidence in a court case. In daily life, speaker identification is also important, for example in access control for telephone banking, shopping, unlocking a personal computer, and so forth.
Speaker identification has two main parts, namely feature extraction and similarity measurement. This study modifies the similarity measurement technique used in previous studies. In previous studies [2, 3], similarity was measured using the sum of the smallest distances between the input vectors and the codebook vectors of a particular speaker. A codebook is a set of voice prints produced through training [3]. The speaker whose codebook gives the smallest sum is taken as the speaker of the input voice. In this study, the technique is modified by selecting, as the speaker of the input voice, the speaker whose codebook has the highest frequency of vector pairs with the input vectors. A vector pair consists of an input vector and the vector at the smallest distance from it among all vectors in all codebooks. The distance measure used in this study is Euclidean distance.
The feature extraction method used in this study is the mel frequency cepstral coefficient (MFCC). MFCC is often used because it is considered to perform better than other methods, for example in terms of error rate reduction. MFCC is based on the frequency differences that the human ear can perceive, so it can represent how people receive sound signals [4].
In this study, the algorithm used as the codebook maker is the self organizing map (SOM). SOM has been successfully applied to high-dimensional data [5], which traditional methods may not be able to handle. This ability to handle high-dimensional data is the reason for choosing SOM to generate the codebook. The data produced by MFCC can be high-dimensional, depending on how many coefficients are chosen.
Based on the words spoken, speaker identification is divided into two types, namely text-dependent and text-independent [6]. Text-dependent identification recognizes a speaker who utters fixed words. Text-independent identification places no restriction on which words must be pronounced. This study focuses on text-independent speaker identification.
2. Research Method
2.1. Proposed Techniques
In the previous techniques [2, 3], the distance from each input vector to every vector in a particular speaker's codebook is measured. For each input vector, the codebook vector with the smallest distance is chosen, and all of these smallest distances are summed. This process is repeated for every speaker codebook. The speaker whose codebook has the smallest sum is then chosen as the speaker of the voice being identified. The previous techniques are illustrated in Figure 1.
[Figure 1 labels: input vectors (generated by the MFCC process from the input sound); collection of codebooks (codebook of speaker 1, codebook of speaker 2); codebook vectors of a particular speaker; the smallest distance between an input vector and the codebook vectors of a particular speaker; sum (∑) per codebook; select the speaker with the smallest sum.]
Figure 1. Previous Similarity Measurement Techniques
Figure 2. Proposed Similarity Measurement Techniques
In the proposed techniques, the distance from an input vector is not measured against a single speaker's codebook only, but against all vectors in all available speaker codebooks. For each input vector, the smallest distance to any vector in the collection of codebooks is selected, and the codebook vector that gives this smallest distance becomes the pair of that input vector. The speaker whose codebook has the highest frequency of pairs is then selected as the speaker of the input voice. The proposed techniques are illustrated in Figure 2.
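The two decision rules described in this section can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary layout of the codebooks and the toy vectors are assumptions made for the example, and ties are broken arbitrarily in favor of the first speaker.

```python
import numpy as np

def pairwise_distances(inputs, codebook):
    """Euclidean distances, shape (n_inputs, n_codebook_vectors)."""
    diff = inputs[:, None, :] - codebook[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def identify_by_min_sum(inputs, codebooks):
    """Previous technique: per speaker, sum the smallest distance of every
    input vector to that speaker's codebook; pick the smallest sum."""
    sums = {speaker: pairwise_distances(inputs, cb).min(axis=1).sum()
            for speaker, cb in codebooks.items()}
    return min(sums, key=sums.get)

def identify_by_pair_frequency(inputs, codebooks):
    """Proposed technique: pair every input vector with its nearest vector
    across ALL codebooks; pick the speaker that owns the most pairs."""
    speakers = list(codebooks)
    all_vectors = np.vstack([codebooks[s] for s in speakers])
    # owner[j] is the index of the speaker that codebook vector j belongs to
    owner = np.concatenate([np.full(len(codebooks[s]), i)
                            for i, s in enumerate(speakers)])
    nearest = pairwise_distances(inputs, all_vectors).argmin(axis=1)
    counts = np.bincount(owner[nearest], minlength=len(speakers))
    return speakers[int(counts.argmax())]

# Toy data (hypothetical): two speakers with two codebook vectors each.
codebooks = {
    "A": np.array([[0.0, 0.0], [10.0, 10.0]]),
    "B": np.array([[1.0, 1.0], [5.0, 5.0]]),
}
inputs = np.array([[1.0, 1.0], [1.2, 1.0], [10.0, 10.0]])
print(identify_by_min_sum(inputs, codebooks))         # A
print(identify_by_pair_frequency(inputs, codebooks))  # B
```

In this toy case the two rules disagree: one input vector that lies far from every vector of speaker B dominates the distance sum, while the pair count only records that two of the three input vectors are closest to B.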
2.2. Sound Data
The sound data used here were previously used by Reda et al. [7] in their study of attendance tracking. The data consist of 83 speakers, 35 female and 48 male. The words uttered by each speaker are combinations of numbers. Each speaker has 5 sound files in WAV format. The recordings were made over the phone using an interactive voice response (IVR) system in March 2011 in India. The participants are Indian citizens from different backgrounds.
2.3. How to Conduct Experiments
The experiments in this study were performed on several combinations of parameters. For each combination of parameters, one voice file from each speaker is used to create the codebook and the other voice files are used as test data. This is done 5 times, so that every voice file of each speaker is used once to make the codebook. For example, in the first experiment the first voice file is used to create the codebook and the other voice files are used as test data; in the second experiment the second voice file is used to create the codebook and the other voice files are used as test data; and so forth. The accuracy is calculated for each experiment. After the five experiments for one combination of parameters, the average accuracy is calculated. This average is used as the measure of how well a parameter combination performs in speaker identification.
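The rotation described above can be sketched as a simple loop over the five files of each speaker. The names `rotation_splits`, `average_accuracy` and the `run_experiment` callback are hypothetical; the callback stands in for whatever code builds the codebooks from the training file and scores the test files.

```python
def rotation_splits(n_files=5):
    """Yield (train_index, test_indices): each file is used once to build
    the codebook while the remaining files serve as test data."""
    for train_index in range(n_files):
        test_indices = [i for i in range(n_files) if i != train_index]
        yield train_index, test_indices

def average_accuracy(run_experiment, n_files=5):
    """Run one experiment per rotation and average the accuracies.

    run_experiment(train_index, test_indices) is assumed to return the
    accuracy (in %) of a single experiment.
    """
    accuracies = [run_experiment(train, tests)
                  for train, tests in rotation_splits(n_files)]
    return sum(accuracies) / len(accuracies)

for train, tests in rotation_splits():
    print(f"codebook from file {train + 1}, test on files {[t + 1 for t in tests]}")
```

With 5 files per speaker this yields five experiments per parameter combination, each training on one file and testing on the other four.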
2.4. Mel Frequency Cepstral Coefficient (MFCC)
MFCC is widely used for feature extraction in various fields of sound signal processing [4], [8-10]. There are several different MFCC implementations [11], namely MFCC-FB20 [12], HTK MFCC-FB24 [13], MFCC-FB40 [14] and HFCC-E FB-29 [15]. This research uses MFCC-FB40 because its equal error rate (EER) and optimal decision cost function (DCFopt) are lower than those of the other three MFCC types [11]. The stages of MFCC are illustrated in Figure 3.
Figure 3. Illustration of the MFCC Process
The first step in the MFCC process is to divide the incoming signal into multiple frames. The second step is to smooth each frame with a Hamming window to minimize discontinuities at the frame edges. The third step is to convert the voice signal from the time domain to the frequency domain using the fast Fourier transform (FFT). The fourth step is to map the frequencies of the FFT result onto the mel scale. The final step is to convert the signal from the frequency domain back to the time domain using the discrete cosine transform (DCT).
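The five steps above can be sketched with NumPy as follows. This is a simplified illustration rather than the MFCC-FB40 implementation used in the study: the 8 kHz sample rate, the evenly spaced triangular mel filters, the reading of the 0.4 overlap as a fraction of the frame length, and the logarithm applied before the DCT (standard practice, but not stated in the text) are all assumptions.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (a simplification)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):
            bank[i, k] = (k - left) / (center - left)
        for k in range(center, right):
            bank[i, k] = (right - k) / (right - center)
    return bank

def mfcc(signal, sample_rate=8000, frame_ms=25, overlap=0.4,
         n_filters=40, n_coeffs=13):
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    step = max(int(round(frame_len * (1.0 - overlap))), 1)
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // step
    # Step 1: split the signal into overlapping frames
    index = np.arange(frame_len)[None, :] + step * np.arange(n_frames)[:, None]
    frames = signal[index]
    # Step 2: smooth each frame with a Hamming window
    frames = frames * np.hamming(frame_len)
    # Step 3: time domain -> frequency domain (power spectrum via FFT)
    n_fft = 1 << (frame_len - 1).bit_length()
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    # Step 4: map the spectrum onto the mel scale, then take the log
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energy = np.log(mel_energy + 1e-10)
    # Step 5: DCT-II of the log mel energies gives the cepstral coefficients
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (n + 0.5) / n_filters)
    return log_energy @ basis.T

# One second of a synthetic 440 Hz tone as stand-in input
t = np.arange(8000) / 8000.0
features = mfcc(np.sin(2 * np.pi * 440.0 * t), n_coeffs=13)
print(features.shape)  # (number of frames, 13)
```

Each row of the result is one input vector for the similarity measurement; the number of columns corresponds to the number of MFCC coefficients chosen.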
2.5. Self Organizing Map (SOM)
SOM was first proposed by Teuvo Kohonen [16]. SOM, also known as the Kohonen network, is a type of artificial neural network (ANN) with unsupervised learning. SOM is very effective at creating an internal representation of space that is organized according to the various features of the input signal [17]. SOM assumes a topological structure among its cluster units, a property that is found in the human brain but is absent in some other ANNs [18].
The first step of the training process using SOM is to determine the number of clusters to be generated. The next step is to create a vector for each cluster and give each cluster vector an initial weight. Then, for each input vector, find the smallest distance between the input vector and the cluster vectors. The cluster vector that gives the smallest distance is the winner vector. Finally, update the weight vector of the winner.
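The training steps listed above can be sketched as a short loop. This is a deliberately reduced illustration: it updates only the winner, exactly as the steps describe, and leaves out the neighborhood update and the hexagonal topology of a full SOM, so it behaves like plain competitive learning. The function name, the initialization from random input samples, the linearly decaying learning rate and the default values are assumptions.

```python
import numpy as np

def train_som_codebook(vectors, n_clusters=16, n_iterations=1000,
                       learning_rate=0.5, seed=0):
    """Return n_clusters weight vectors (the codebook) for one speaker."""
    vectors = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    # Create one vector per cluster; initial weights are random input samples
    picks = rng.choice(len(vectors), size=n_clusters, replace=True)
    weights = vectors[picks].copy()
    for t in range(n_iterations):
        x = vectors[rng.integers(len(vectors))]
        # Find the cluster vector with the smallest Euclidean distance
        winner = int(np.linalg.norm(weights - x, axis=1).argmin())
        # Move the winner toward the input; the step shrinks over time
        alpha = learning_rate * (1.0 - t / n_iterations)
        weights[winner] += alpha * (x - weights[winner])
    return weights

# Toy input: two well-separated groups of 2-D vectors
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
codebook = train_som_codebook(data, n_clusters=4, n_iterations=200)
print(codebook.shape)  # (4, 2)
```

The returned weight vectors play the role of the codebook vectors of one speaker in the similarity measurement.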
3. Results and Analysis
The experiments were conducted by varying some parameter values, in order to measure the effect of these parameters on accuracy and to compare the accuracy obtained with the previous techniques and with the techniques proposed in this study. The parameters varied are the number of MFCC coefficients and the number of SOM clusters. The number of experiments conducted is 24. The numbers of MFCC coefficients tried are 13, 15 and 20. The numbers of SOM clusters tried are 9, 16, 25, 36, 49, 64, 81 and 100. In addition, several parameters were fixed during the experiments: the frame length is 25 ms, the MFCC overlap is 0.4, the SOM topology is hexagonal and the number of SOM iterations is 1000.
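As a quick check of the experiment count, the parameter grid can be enumerated directly: 3 coefficient settings times 8 cluster settings gives 24 combinations. The variable names and the dictionary keys for the fixed parameters are chosen for this sketch only.

```python
from itertools import product

MFCC_COEFFICIENTS = (13, 15, 20)
SOM_CLUSTERS = (9, 16, 25, 36, 49, 64, 81, 100)

# Parameters held fixed in every experiment (key names are illustrative)
FIXED_PARAMETERS = {
    "frame_length_ms": 25,
    "mfcc_overlap": 0.4,
    "som_topology": "hexagonal",
    "som_iterations": 1000,
}

combinations = list(product(MFCC_COEFFICIENTS, SOM_CLUSTERS))
print(len(combinations))  # 24
```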
Figure 4. Identification Accuracy for MFCC Coefficients 13
Figure 4 shows how the accuracy for 13 MFCC coefficients varies with the number of SOM clusters. The graph shows that with 9 SOM clusters, the accuracy of the previous similarity measurement techniques is better than that of the proposed techniques, by 1.51%. When the number of SOM clusters is increased, however, the proposed techniques have better accuracy. The improvement in accuracy is largest when the number of SOM clusters is 81, at 1.14%. The highest accuracy of the proposed techniques, 95.84%, also occurs when the number of SOM clusters is 81. The average increase in accuracy is 0.61%.
Figure 5. Identification Accuracy for MFCC Coefficients 15
Figure 5 shows how the accuracy for 15 MFCC coefficients varies with the number of SOM clusters. The graph shows that with 9 SOM clusters, the accuracy of the previous techniques is better than that of the proposed techniques, by 0.42%. As with 13 MFCC coefficients, however, the proposed techniques have better accuracy when the number of SOM clusters is increased. The improvement in accuracy is largest when the number of SOM clusters is 81, at 1.69%. This is better than with 13 MFCC coefficients, where the largest improvement was only 1.14%. The highest accuracy of the proposed techniques, 96.08%, occurs when the number of SOM clusters is 49 and 81. This is better than with 13 MFCC coefficients, where the highest accuracy was only 95.84%. The average increase in accuracy is 0.98%.
Data plotted in Figure 4: Comparison of Speaker Identification Accuracy, MFCC coefficient = 13. Accuracy (%) by the number of SOM clusters.

The number of SOM clusters    9      16     25     36     49     64     81     100
Previous techniques           92.59  93.55  94.10  94.40  94.34  94.58  94.70  94.58
Proposed techniques           91.08  94.46  94.82  95.30  95.36  95.48  95.84  95.36

Data plotted in Figure 5: Comparison of Speaker Identification Accuracy, MFCC coefficient = 15. Accuracy (%) by the number of SOM clusters.

The number of SOM clusters    9      16     25     36     49     64     81     100
Previous techniques           92.53  93.61  94.22  94.70  94.70  94.70  94.40  94.58
Proposed techniques           92.11  94.34  95.30  95.66  96.08  95.78  96.08  95.90
Figure 6. Identification Accuracy for MFCC Coefficients 20
Figure 6 shows how the accuracy for 20 MFCC coefficients varies with the number of SOM clusters. Unlike with 13 and 15 MFCC coefficients, with 9 SOM clusters the accuracy of the proposed techniques is better than that of the previous techniques, by 0.72%. The improvement in accuracy is largest when the number of SOM clusters is 81, at 1.69%. The highest accuracy of the proposed techniques, 96.51%, occurs when the number of SOM clusters is 49. This is better than with 15 MFCC coefficients, where the highest accuracy was only 96.08%. The average increase in accuracy is 1.27%.
Figure 7. Effect of an Increase in the Coefficient of MFCC
Figure 7 is a graph showing the effect of increasing the number of MFCC coefficients on the accuracy gain of the proposed techniques over the previous techniques. Figure 7 shows that with 13, 15 and 20 MFCC coefficients, the average increases in accuracy are 0.61%, 0.98% and 1.27%, respectively. This indicates that the higher the number of MFCC coefficients, the larger the increase in speaker identification accuracy.
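The average increases quoted above can be reproduced from the accuracy values plotted in Figures 4 to 6. The sketch below hard-codes those values as they appear in the figure data; the variable and function names are chosen for this example.

```python
SOM_CLUSTERS = (9, 16, 25, 36, 49, 64, 81, 100)

# Accuracy (%) per number of SOM clusters, keyed by number of MFCC coefficients
ACCURACY = {
    13: {"previous": (92.59, 93.55, 94.10, 94.40, 94.34, 94.58, 94.70, 94.58),
         "proposed": (91.08, 94.46, 94.82, 95.30, 95.36, 95.48, 95.84, 95.36)},
    15: {"previous": (92.53, 93.61, 94.22, 94.70, 94.70, 94.70, 94.40, 94.58),
         "proposed": (92.11, 94.34, 95.30, 95.66, 96.08, 95.78, 96.08, 95.90)},
    20: {"previous": (93.37, 94.10, 94.94, 94.82, 94.88, 94.88, 94.70, 94.94),
         "proposed": (94.10, 95.00, 95.84, 96.45, 96.51, 96.20, 96.39, 96.33)},
}

def average_increase(previous, proposed):
    """Mean of (proposed - previous) over all SOM cluster settings."""
    gains = [new - old for new, old in zip(proposed, previous)]
    return sum(gains) / len(gains)

for n_coeffs, series in ACCURACY.items():
    gain = average_increase(series["previous"], series["proposed"])
    print(f"{n_coeffs} MFCC coefficients: average increase {gain:.2f}%")
```

Rounded to two decimals, the three averages come out as 0.61, 0.98 and 1.27, matching the values reported in the text. The same data also show that the proposed techniques fall below the previous ones in only two of the 24 settings, both with 9 SOM clusters.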
Data plotted in Figure 6: Comparison of Speaker Identification Accuracy, MFCC coefficient = 20. Accuracy (%) by the number of SOM clusters.

The number of SOM clusters    9      16     25     36     49     64     81     100
Previous techniques           93.37  94.10  94.94  94.82  94.88  94.88  94.70  94.94
Proposed techniques           94.10  95.00  95.84  96.45  96.51  96.20  96.39  96.33

Data plotted in Figure 7: Average Increase in the Accuracy of the Proposed Techniques Compared to the Previous Techniques. Increase (%) by the number of MFCC coefficients.

The number of MFCC coefficients    13     15     20
Average increase of accuracy       0.61   0.98   1.27

4. Conclusion
The experiments show that the proposed similarity measurement techniques can improve the accuracy of speaker identification. Of the 24 experiments carried out, the proposed techniques failed to improve identification accuracy in only 2. Compared with the previous techniques, the accuracy gains of the proposed techniques for 13, 15 and 20 MFCC coefficients are 0.61%, 0.98% and 1.27%, respectively. The results also show that the higher the number of MFCC coefficients, the larger the increase in speaker identification accuracy. The highest speaker identification accuracy is 96.51%, obtained with 49 SOM clusters and 20 MFCC coefficients.
Although the proposed techniques succeeded in increasing the accuracy of speaker identification, the increase was small. Therefore, in further research, the technique in this study needs to be improved to achieve higher accuracy.
References
[1] Kinnunen T, Li H. An Overview of Text-Independent Speaker Recognition: from Features to Supervectors. Speech Communication. 2010; 52(1): 12-40.
[2] Fruandta A, Buono A. Identifikasi Campuran Nada pada Suara Piano Menggunakan Codebook. Seminar Nasional Aplikasi Teknologi Informasi. Yogyakarta. 2011; 8-13.
[3] Wisnudisastra E, Buono A. Pengenalan Chord pada Alat Musik Gitar Menggunakan CodeBook dengan Teknik Ekstraksi Ciri MFCC. Jurnal Ilmiah Ilmu Komputer. 2010; 14(1): 16-21.
[4] Muda L, Begam M, Elamvazuthi I. Voice Recognition Algorithms Using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques. Journal of Computing. 2010; 2(3): 138-143.
[5] Yan J, Zhu Y, He H, Sun Y. Multi-Contingency Cascading Analysis of Smart Grid Based on Self-Organizing Map. Information Forensics and Security, IEEE Transactions on. 2013; 8(4): 646-6.
[6] Furui S. An Overview of Speaker Recognition Technology. Automatic speech and speaker recognition. Springer US. 1996; 31-56.
[7] Reda A, Panjwani S, Cutrell E. Hyke: A Low-Cost Remote Attendance Tracking System for Developing Regions. Proceedings of the 5th ACM workshop on Networked systems for developing regions. ACM. 2011; 15-20.
[8] Alam MJ, Kenny P, Ouellet P, O'Shaughnessy D. Multitaper MFCC and PLP Features for Speaker Verification Using i-Vectors. Speech Communication. 2013; 55: 237-251.
[9] Chen SH, Luo YR. Speaker Verification Using MFCC and Support Vector Machine. Proceedings of the International Multi Conference of Engineers and Computer Scientists. Hong Kong. 2009; 1: 18-20.
[10] Nakagawa S, Wang L, Ohtsuka S. Speaker Identification and Verification by Combining MFCC and Phase Information. Audio, Speech, and Language Processing, IEEE Transactions on. 2012; 20(4): 1085-1095.
[11] Ganchev T, Fakotakis N, Kokkinakis G. Comparative Evaluation of Various MFCC Implementations on the Speaker Verification Task. Proceedings of the SPECOM. 2005; 1: 191-194.
[12] Davis S, Mermelstein P. Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences. Acoustics, Speech and Signal Processing, IEEE Transactions on. 1980; 28(4): 357-366.
[13] Steve Y, Odel J, Ollason D, Valtchev V, Woodland P. The HTK Book, version 2.1. Cambridge University. 1997.
[14] Slaney M. Auditory Toolbox. Interval Research Corporation, Tech Rep. 1998.
[15] Skowronski MD, Harris JG. Exploiting Independent Filter Bandwidth of Human Factor Cepstral Coefficients in Automatic Speech Recognition. The Journal of the Acoustical Society of America. 2004; 116: 1774-1780.
[16] Kohonen T. Self-Organized Formation of Topologically Correct Feature Maps. Biological Cybernetics. 1982; 43(1): 59-69.
[17] Kohonen T. The Self-Organizing Map. Proceedings of the IEEE. 1990; 78(9): 1464-1480.
[18] Basheer IA, Hajmeer M. Artificial Neural Networks: Fundamentals, Computing, Design, and Application. Journal of Microbiological Methods. 2000; 43(1): 3-31.