TELKOMNIKA Indonesian Journal of Electrical Engineering
Vol. 12, No. 6, June 2014, pp. 4299 ~ 4305
DOI: 10.11591/telkomnika.v12i6.3994        ISSN: 2302-4046

Received July 27, 2013; Revised December 23, 2013; Accepted January 22, 2014
Speaker Recognition Based on i-vector and Improved
Local Preserving Projection
Di Wu*, Jie Cao, Jinhua Wang
College of Electrical and Information Engineering, Lanzhou University of Technology,
Lanzhou, 730050, China
*Corresponding author, e-mail: wudi6152007@hotmail.com
Abstract

In this paper, an improved local preserving projection algorithm is proposed in order to enhance the recognition performance of the i-vector speaker recognition system under unpredicted noise environments. First, the zero eigenvalues are rejected when solving the optimal objective function and only the eigenvalues greater than zero are used. A mapping matrix is then obtained by solving a generalized eigenvalue problem, which settles the singular value problem that always occurs in the traditional local preserving projection algorithm. The experiment results show that the recognition performance of the method proposed in this paper is improved under several kinds of noise environments.

Keywords: computer application, i-vector, local preserving projection, manifold learning, speaker recognition
Copyright © 2014 Institute of Advanced Engineering and Science. All rights reserved.
1. Introduction
Over the past decades, speaker recognition has become a very popular area of research in pattern recognition, computer vision and machine learning [1]. A mismatch between training and testing conditions is caused by inevitable factors such as channel distortion, different microphones, transmitting channels or encoders. One of the main causes of the performance degradation is the additive noise that may appear in many practical applications.
There are a large number of different solutions to alleviate this problem. We can identify three main classes of techniques for noise-robust ASR, namely feature enhancement methods [2], model adaptation methods [3] and score normalization methods [4]. The feature enhancement methods attempt to normalize the distorted feature, or estimate the undistorted feature from the distorted speech, and do not require any explicit knowledge about the noise. Some examples are cepstral mean normalization (CMN), cepstral mean and variance normalization (CMVN), relative spectra (RASTA) and feature mapping. In contrast, the model adaptation methods work in the backend to compensate by modifying the acoustic models, and are carried out by using some type of knowledge about the noise. Some typical examples are maximum likelihood linear regression (MLLR), maximum a posteriori (MAP), factor analysis (FA) and vector Taylor series (VTS). The score normalization methods try to normalize the output score using various normalization methods, such as HNorm, TNorm and ZNorm.
In recent years, Gaussian Mixture Models based on a Universal Background Model (GMM-UBM) [5] have become the most popular modeling approach in speaker recognition, and generative models such as Eigenvoices, Eigenchannels and the most powerful one, Joint Factor Analysis, have built on the success of the GMM-UBM approach. Recently, a new method inspired by joint factor analysis was proposed; it consists in finding a low dimensional subspace of the GMM supervector space, named the total variability space, that represents both speaker and channel variability. The vectors in this low dimensional space are called i-vectors [6]. The i-vector method has become the mainstream in speaker recognition systems at home and abroad, by reason of its leading role in the NIST evaluations.
Locality Preserving Projections (LPP) [7] is a manifold learning method widely used in pattern recognition and computer vision; LPP is also well known as a linear graph embedding method. But the traditional LPP method is unsupervised and was proposed only for vector samples, so it cannot be applied directly to image samples; hence several types of improvements to conventional LPP have been proposed [8]. The first type of improvement is supervised LPP, which tries to exploit the class label information of the samples in the training phase. The second type changes LPP into a nonlinear transform method by using the kernel trick. The third type of improvement mainly focuses on directly implementing LPP for two dimensional rather than one dimensional vectors, which has higher computational efficiency. The last improvement seeks to obtain LPP solutions with different solution properties, such as the orthogonal locality preserving method and the uncorrelated LPP feature extraction method.
From the modeling process of the i-vector method, the manifold learning method has achieved good performance in automatic speaker recognition systems. But the LPP algorithm always suffers from the small sample size (SSS) problem. A new solution scheme for LPP is proposed in this paper which can be implemented directly whether or not the SSS problem exists. We use only the eigenvectors corresponding to positive eigenvalues when solving the optimized objective function, removing the zero eigenvalues.
The remainder of the paper is organized as follows: in Section 2 we introduce the conventional LPP and present our new LPP solution. In Section 3 the original i-vector ASR system is given and our new i-vector ASR system based on our new LPP solution is also proposed. In Section 4 we describe the experiment results. Section 5 offers our conclusion.
2. The Improved LPP Method
2.1. Description of LPP
LPP was proposed as a way to transform samples into a new space and to ensure that samples that were in close proximity in the original space remain so in the new space. Consider $l$ training samples $X = \{x_i\}_{i=1}^{l}$; the goal of LPP is to minimize the following function [9-10]:

$$\min_{W} \sum_{i,j} \left( W^{T} x_i - W^{T} x_j \right)^{2} S_{ij} \qquad (1)$$
$S_{ij}$ is a symmetric matrix whose elements are defined as follows:

$$S_{ij} = \begin{cases} \exp\left( -\dfrac{\| x_i - x_j \|^{2}}{t} \right) & \text{if } x_j \text{ is one of the } K \text{ neighbors of } x_i \\ 0 & \text{else} \end{cases} \qquad (2)$$
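As an illustration, the affinity matrix of Equation (2) can be built with a k-nearest-neighbour search and the heat kernel; the Python sketch below (function name and default parameters are ours, not from the paper) constructs a symmetric $S$:

```python
import numpy as np

def lpp_affinity(X, k=5, t=1.0):
    """Heat-kernel affinity S of Equation (2).

    X : (l, d) array, one training sample per row.
    S[i, j] = exp(-||x_i - x_j||^2 / t) if x_j is one of the
    k nearest neighbours of x_i (symmetrised), else 0.
    """
    l = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)      # a sample is not its own neighbour
    S = np.zeros((l, l))
    for i in range(l):
        nbrs = np.argsort(d2[i])[:k]  # k nearest neighbours of x_i
        S[i, nbrs] = np.exp(-d2[i, nbrs] / t)
    return np.maximum(S, S.T)         # symmetrise, as the paper requires
```

The symmetrisation in the last line enforces the property that the paper assumes when forming the graph Laplacian below.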
From the optimized function in Equation (1) we can see that the local structure of the feature space can be preserved after dimension reduction, as in the original high dimensional space; this means close samples in the original space will still be close in the new space, so the projection matrix $W$ can be written as:

$$W = \arg\min_{W} W^{T} X L X^{T} W = \arg\min_{W} W^{T} X (D - S) X^{T} W \qquad (3)$$
In Equation (3), $D$ is a diagonal matrix with $D_{ii} = \sum_{j} S_{ij}$, and $L = D - S$. The solution of Equation (3) can be obtained by finding the generalized eigenvalues of the following function:

$$X L X^{T} W = \lambda X D X^{T} W \qquad (4)$$
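For reference, Equation (4) is a generalized symmetric eigenvalue problem; a minimal sketch of the conventional solution (assuming NumPy/SciPy, with samples stored column-wise in X; the function name is ours) is:

```python
import numpy as np
from scipy.linalg import eigh

def lpp_conventional(X, S, dim):
    """Solve X L X^T w = lambda X D X^T w  (Equation (4)).

    X   : (d, l) matrix, one sample per column.
    S   : (l, l) symmetric affinity matrix from Equation (2).
    dim : number of projection directions to keep.
    Returns W (d, dim), the eigenvectors of the smallest eigenvalues.
    Note: eigh fails when X D X^T is singular -- this is the small
    sample size problem that Section 2.2 addresses.
    """
    D = np.diag(S.sum(axis=1))
    L = D - S
    A = X @ L @ X.T
    B = X @ D @ X.T
    vals, vecs = eigh(A, B)   # generalized symmetric eigenproblem
    return vecs[:, :dim]      # eigenvalues ascend -> keep the smallest
```

This direct solve only works while $X D X^T$ is nonsingular, which motivates the new scheme of Section 2.2.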
2.2. New LPP Solution Scheme
For the conventional LPP method, even if neighbour samples are from different classes, in the transform space obtained using the conventional LPP solution they might still statistically have the same representation, which is disadvantageous for pattern recognition problems. In other words, it is possible for the conventional LPP solution to produce the same representation for samples from different classes, especially for samples located on the border of two classes; all unsupervised LPP methods might suffer from this same drawback.
In this section, we describe our new improvement scheme for the conventional LPP solution. First, we observe that the effective solution of the conventional LPP method should come from a subspace of $X D X^{T}$; for simplicity, we define the matrices $D_1$, $L_1$ and $S_1$:

$$D_1 = X D X^{T}, \quad L_1 = X L X^{T}, \quad S_1 = X S X^{T} \qquad (5)$$
Suppose that $\alpha_1, \alpha_2, \ldots, \alpha_n$ are the eigenvectors corresponding to the positive eigenvalues of $D_1$, while $\alpha_{n+1}, \ldots, \alpha_N$ are the eigenvectors corresponding to the zero eigenvalues; in this paper, we regard eigenvalues that are less than $0.2 \times 10^{-10}$ as zero eigenvalues.
According to the nature of LPP, the ability to preserve the neighbour relationship can be measured by $W^{T} L_1 W / W^{T} D_1 W$; the smaller this value is, the better the local structure of the samples is preserved, so Equation (4) can be rewritten as:

$$L_1 W = \lambda D_1 W \qquad (5)$$
Then we design a matrix $R = [\alpha_1, \alpha_2, \ldots, \alpha_n]$; using $R$, we respectively transform $D_1$, $L_1$, $S_1$ into the following matrices:

$$\tilde{D} = R^{T} D_1 R, \quad \tilde{L} = R^{T} L_1 R, \quad \tilde{S} = R^{T} S_1 R \qquad (6)$$
We then construct the following eigen-equation:

$$\tilde{L} W = \lambda \tilde{D} W \qquad (7)$$
Then we can directly solve Equation (7), since $\tilde{D}$ is of full rank. Let $\beta_1, \beta_2, \ldots, \beta_n$ denote the eigenvectors corresponding to the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of Equation (7) in increasing order. Using the matrix $R$, we produce $W = X^{T} R$; then we transform $W$ into $Y$ by carrying out $Y = WG$, where $G = [\beta_1, \beta_2, \ldots, \beta_n]$, that is:

$$Y = W G = (X^{T} R) G \qquad (8)$$
In this new method, the small sample size problem is solved because the method is implemented directly whether or not the SSS problem exists, and only the eigenvectors whose eigenvalues are greater than zero are used, so the drawback of the conventional LPP algorithm is avoided.
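The whole scheme of Equations (5)-(8) can be sketched as follows (Python with NumPy/SciPy; as a simplification of Equation (8) we return the projection matrix formed as $R G$ so that samples are projected by $(RG)^T x$, and the zero-eigenvalue tolerance is a small constant as in Section 2.2; all names are ours):

```python
import numpy as np
from scipy.linalg import eigh

def lpp_improved(X, S, dim, tol=1e-10):
    """Improved LPP solution of Section 2.2 (a sketch).

    Eigenvectors of D1 = X D X^T whose eigenvalues are <= tol are
    treated as zero eigenvalues and rejected; only the positive
    ones form R, so the reduced problem is always full rank.
    """
    D = np.diag(S.sum(axis=1))
    L = D - S
    D1 = X @ D @ X.T                 # Equation (5)
    L1 = X @ L @ X.T
    vals, vecs = np.linalg.eigh(D1)
    R = vecs[:, vals > tol]          # keep positive-eigenvalue directions
    Dt = R.T @ D1 @ R                # Equation (6): full rank by construction
    Lt = R.T @ L1 @ R
    betas = eigh(Lt, Dt)[1]          # Equation (7), eigenvalues ascending
    G = betas[:, :dim]
    return R @ G                     # project a sample x as (R G)^T x
```

Because $\tilde{D}$ is full rank by construction, this sketch runs even when there are fewer samples than dimensions, i.e. under the SSS condition.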
3. The Improved i-vector System
3.1. Baseline i-Vector System Description
The main idea in traditional JFA is to find two subspaces which represent the speaker and channel variabilities respectively. Experiments show that JFA is only partially successful in separating speaker and channel variabilities, while the i-vector method proposes a single space that models the two variabilities, named the total variability space [11-12]:

$$M = m + T \omega \qquad (9)$$
Where $M$ is the mean supervector which contains speaker and channel information, $m$ is the UBM supervector, and $T$ is a low rank matrix named the total variability matrix, which represents a basis of the reduced total variability space. $\omega$ is a standard normally distributed vector; its components are the factors and they represent the coordinates of the speaker in the reduced total variability space. These feature vectors are referred to as identity vectors, or i-vectors for short.
The crucial step of the i-vector method is to compute the total variability matrix $T$. At first, we train the UBM using the EM algorithm, and extract the Baum-Welch statistics according to the trained UBM:

$$N_m = \sum_{t} \gamma_{m,t} \qquad (10)$$

$$F_m = \sum_{t} \gamma_{m,t} (\xi_t - \mu_m) \qquad (11)$$
$N_m$ and $F_m$ represent the zero order and first order statistics respectively, $t$ is the frame index, $m$ denotes the m-th mixture component of the UBM, and $\gamma_{m,t}$ is the Gaussian occupation probability:

$$\gamma_{m,t} = \frac{N(\xi_t; \mu_m, \Sigma_m)}{\sum_{i=1}^{M} N(\xi_t; \mu_i, \Sigma_i)} \qquad (12)$$
$N(\xi_t; \mu_m, \Sigma_m)$ is the Gaussian component whose mean is $\mu_m$ and variance is $\Sigma_m$, $\xi_t$ is the feature vector of frame $t$, and $M$ is the number of mixture components of the UBM.
After calculating the Baum-Welch statistics, we can train the matrix $T$ using the EM method as follows:

$$L_T = I + T^{T} \Sigma^{-1} N T \qquad (13)$$

$$E(x) = L_T^{-1} T^{T} \Sigma^{-1} F \qquad (14)$$
$F$ is the vector arrangement of the $F_m$, and $N$, $\Sigma$ are the diagonal matrices built from the $N_m$ and $\Sigma_m$ respectively.
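Equations (10)-(14) together give the i-vector extraction pipeline; the sketch below (Python/NumPy, assuming diagonal covariances and including the mixture weights in Equation (12) as in a standard UBM; all names are ours) illustrates both steps:

```python
import numpy as np

def baum_welch_stats(frames, weights, means, covs):
    """Zero/first order statistics of Equations (10)-(12).

    frames : (T, d) features xi_t.
    weights/means/covs : UBM parameters (M,), (M, d), (M, d),
    with diagonal covariances.
    """
    T, d = frames.shape
    M = weights.shape[0]
    # log N(xi_t; mu_m, Sigma_m) for every frame/mixture pair
    log_g = np.empty((T, M))
    for m in range(M):
        diff = frames - means[m]
        log_g[:, m] = (-0.5 * np.sum(diff**2 / covs[m], axis=1)
                       - 0.5 * np.sum(np.log(2 * np.pi * covs[m]))
                       + np.log(weights[m]))
    log_g -= log_g.max(axis=1, keepdims=True)     # numerical stability
    gamma = np.exp(log_g)
    gamma /= gamma.sum(axis=1, keepdims=True)     # Equation (12)
    N = gamma.sum(axis=0)                         # Equation (10)
    F = gamma.T @ frames - N[:, None] * means     # Equation (11), centred
    return N, F

def extract_ivector(N, F, T_mat, covs):
    """Posterior mean i-vector, Equations (13)-(14):
    L = I + T' Sigma^-1 N T,  E[w] = L^-1 T' Sigma^-1 F."""
    M, d = F.shape
    r = T_mat.shape[1]                            # i-vector dimension
    inv_cov = 1.0 / covs.reshape(M * d)           # diagonal Sigma^-1
    Nd = np.repeat(N, d)                          # per-dimension occupancies
    TS = T_mat * inv_cov[:, None]                 # Sigma^-1-weighted T
    L = np.eye(r) + T_mat.T @ (Nd[:, None] * TS)  # Equation (13)
    return np.linalg.solve(L, TS.T @ F.reshape(M * d))  # Equation (14)
```

Training $T$ itself iterates these posteriors inside EM; the sketch only shows the per-utterance statistics and the i-vector posterior mean.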
3.2. The Proposed i-Vector System
After obtaining the initial i-vector features, we apply the improved LPP algorithm proposed in this paper to the i-vector system; the specific procedure is as follows:
(1) Performing the dimension reduction process on the i-vector by the improved LPP method proposed in this paper.
(2) Further dimension reduction processing using the LDA scheme.
(3) Taking the equivalent dimension mapping of the reduced i-vector by the WCCN transform [13]:

$$W_{WCCN} = \frac{1}{R} \sum_{r=1}^{R} \frac{1}{n_r} \sum_{i=1}^{n_r} (\nu_i^r - \bar{\nu}_r)(\nu_i^r - \bar{\nu}_r)^{T} \qquad (15)$$
In this equation, $R$ represents the total number of speakers in the training set, $\bar{\nu}_r$ is the mean of the rth training speaker's samples, $\nu_i^r$ is the ith sample of the rth speaker, and $n_r$ is the number of training samples of the rth speaker.
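A sketch of step (3): the within-class covariance of Equation (15) and, as commonly done with WCCN (the use of the Cholesky factor of its inverse as the actual mapping is standard practice, not stated explicitly in the paper), the matrix applied to the i-vectors:

```python
import numpy as np

def wccn(ivectors, labels):
    """Within-class covariance (Equation (15)) and the WCCN mapping.

    ivectors : (n, dim) array; labels : (n,) speaker ids.
    Returns (W, B) where W is the averaged within-class covariance
    and B satisfies B @ B.T = inv(W); i-vectors are mapped as B.T @ v.
    """
    dim = ivectors.shape[1]
    speakers = np.unique(labels)
    W = np.zeros((dim, dim))
    for s in speakers:
        Xs = ivectors[labels == s]
        mean = Xs.mean(axis=0)                      # nu_bar_r
        W += (Xs - mean).T @ (Xs - mean) / len(Xs)  # per-speaker covariance
    W /= len(speakers)                              # average over R speakers
    B = np.linalg.cholesky(np.linalg.inv(W))
    return W, B
```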
(4) Recognize the test sample using the cosine distance score:

$$Score(\omega_{tar}, \omega_{test}) = \frac{\langle \omega_{tar}, \omega_{test} \rangle}{\| \omega_{tar} \| \, \| \omega_{test} \|} \gtrless \theta \qquad (16)$$
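Step (4) reduces to a dot product between length-normalised i-vectors; a minimal sketch, where the threshold θ is a tunable system parameter and the function names are ours:

```python
import numpy as np

def cosine_score(w_tar, w_test):
    """Cosine distance score of Equation (16)."""
    return float(np.dot(w_tar, w_test) /
                 (np.linalg.norm(w_tar) * np.linalg.norm(w_test)))

def decide(w_tar, w_test, theta):
    """Accept the test sample as the target if the score exceeds theta."""
    return cosine_score(w_tar, w_test) >= theta
```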
4. Experiment
4.1. Experiment Design
To evaluate our improved i-vector system, experiments were conducted on the database from the CLEAR evaluation, which consists of 200 voice segments; each voice segment corresponds to a face image, and the length of each segment is 1 minute.
100 of those segments are used for training the GMM parameters, and the rest are used for testing. The HTK tools were used for the experiments. In the frontend, speech was Hamming windowed every 10 ms with a window width of 20 ms; the features used were 13-D MFCC coefficients appended by their first and second order derivatives. The number of mixture components of the UBM is 512, the number of columns in the total variability matrix $T$ is 400, and the new dimension after dimension reduction using the improved LPP method is 350, while after LDA dimension processing it is 200.
4.2. Evaluation Criterion
In order to test the performance of the new method proposed in this paper, we utilize the Equal Error Rate (EER) and the Minimum Detection Cost Function (MinDCF) as the evaluation criteria; the computation of MinDCF is as follows [14]:

$$MinDCF = \min_{\theta} \left\{ C_{FR} \, P_{Tar} \, F_R(\theta) + C_{FA} \, P_{Imp} \, F_A(\theta) \right\} \qquad (17)$$
Where $C_{FR}$ and $C_{FA}$ are the costs of false rejection and false acceptance respectively; in the NIST evaluation, $C_{FR}$ is set as 10 and $C_{FA}$ is set as 1. $P_{Tar}$ and $P_{Imp}$ are the prior probabilities of the genuine speaker and the imposter speaker in the test set; naturally, $P_{Tar}$ is set as 0.01 and $P_{Imp}$ is set as 0.99. $F_R$ is the false rejection rate, and $F_A$ is the false acceptance rate.
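Equation (17) can be evaluated by sweeping the threshold over the observed scores; a sketch with the NIST costs quoted in the text (the score arrays and function name are ours):

```python
import numpy as np

def min_dcf(target_scores, impostor_scores,
            c_fr=10.0, c_fa=1.0, p_tar=0.01):
    """Minimum detection cost of Equation (17), swept over thresholds.

    C_FR, C_FA and P_Tar follow the NIST settings quoted in the text
    (10, 1 and 0.01; P_Imp = 1 - P_Tar = 0.99).
    """
    p_imp = 1.0 - p_tar
    # Candidate thresholds: every observed score.
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    best = np.inf
    for theta in thresholds:
        fr = np.mean(target_scores < theta)     # false rejection rate F_R
        fa = np.mean(impostor_scores >= theta)  # false acceptance rate F_A
        cost = c_fr * p_tar * fr + c_fa * p_imp * fa
        best = min(best, cost)
    return best
```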
4.3. Experiment Result and Analysis
The simulation experiments in this paper consist of two parts:
(1) On a clean background, we compare the performance of the conventional LPP method and the improved LPP method proposed in this paper, both applied in the i-vector system, against the GMM method; the results are shown in Table 1.
(2) Under different noise environments, we explore the robustness of the new LPP method applied in the i-vector system; the results are shown in Table 2.
Table 1. Experiment Results Compared between the Initial LPP Algorithm and the Improved LPP Algorithm used for the i-vector Speaker Recognition System

Method                      EER(%)   MinDCF
LPP (i-vector)              4.72     0.19
Improved LPP (i-vector)     4.45     0.17
Conventional GMM            7.32     0.53
From the results shown in Table 1, it is clear that the recognition performance of the i-vector system is better than that of the initial GMM recognition system, under both the EER criterion and the MinDCF criterion. We can see that the EER is reduced by nearly 3% and the MinDCF by about 0.35 compared to the initial GMM system, so the experiment results powerfully confirm the superiority of the i-vector system.
Looking further at the results shown in Table 1, the performance given by the improved LPP algorithm is better than that given by the initial LPP algorithm: the EER is reduced by 0.27% and the MinDCF by 0.02.
This improved method can enhance the recognition performance of the i-vector system because it can further discriminate in-class samples from near-distance extra-class samples.
Table 2. Experiment Results Based on the Improved LPP Algorithm under Different Noise Environments used for the i-vector Speaker Recognition System

Voice Environment           SNR     EER(%)   MinDCF
Clean Background            >40dB   4.45     0.17
White Noise Environment     0dB     7.04     0.335
                            5dB     6.72     0.295
                            10dB    5.91     0.276
                            15dB    5.36     0.242
                            20dB    4.93     0.204
Babble Noise Environment    0dB     6.89     0.314
                            5dB     6.49     0.282
                            10dB    5.71     0.255
                            15dB    5.02     0.228
                            20dB    4.76     0.189
From the experiment results shown in Table 2, the performance given by the i-vector system based on the improved LPP scheme is better than that of the initial GMM method. The EER is 4.45% and the MinDCF is 0.17 under the clean background, decreasing by 2.87% and 0.36 respectively compared to the initial GMM method.
The method proposed in this paper reduces the EER and MinDCF to a certain degree at different signal-to-noise ratios (SNR) under the white noise and babble noise environments. When the SNR is 20 dB, the EER is 4.93% and 4.76% under the white noise environment and the babble environment respectively, decreasing by 2.39% and 2.56% respectively compared to the initial GMM method.
5. Conclusion
In this paper, a new method for enhancing speaker recognition performance in the i-vector system, which to our knowledge is the most cutting-edge recognition system, is proposed. The new method is based on the conventional LPP method, and the motivation was that the conventional LPP method always suffers from the SSS problem; in the new scheme, we use only the eigenvectors corresponding to positive eigenvalues when solving the optimized objective function, removing the zero eigenvalues.
Further work will concentrate on the following two areas:
(1) Solving the small sample size (SSS) problem of the LPP method using other mathematical methods.
(2) The computational requirements for training the i-vector systems and estimating the i-vectors, however, are too high for certain types of applications. A simple approach to the original i-vector extraction and training which would dramatically decrease their complexity while retaining the recognition performance is in insistent demand.
Acknowledgements

This work was supported in part by the National Science-technology Support Plan Project of China under contract 1214ZGA008, in part by the Nature Science Foundation of China under contract 61263031, and in part by the Science Foundation of Gansu Province of China under contract 1010RJZA046.
References

[1] Kinnunen T, Li HZ. An overview of text-independent speaker recognition: from features to supervectors. Speech Communication. 2010; 52: 12-40.
[2] Hamid Reza Tohidypour, Seyyed Ali Seyyedsalehi, Hossein Behbood, Hossein Roshandel. A new representation for speech frame recognition based on redundant wavelet filter banks. Speech Communication. 2012; 54: 256-271.
[3] Tyler K Perrachione, Stephanie N Del Tufo, John DE Gabrieli. Human Voice Recognition Depends on Language Ability. Science. 2011; 333: 595.
[4] Parvin Zarei Eskikand, Seyyed Ali Seyyedsalehi. Robust speaker recognition by extracting invariant features. Procedia - Social and Behavioral Sciences. 2012; 32(3): 230-237.
[5] Shao Yang, Jin Zhaozhuang, Wang Deliang. An auditory based feature for robust speaker recognition. ICASSP. Taipei, Taiwan. 2009: 4625-4628.
[6] Di Wu, Jie Cao, Jinhua Wang, Wei Li. Multi-feature fusion face recognition based on Kernel Discriminate Local Preserve Projection Algorithm under smart environment. Journal of Computers. 2012; 7(10): 2479-2487.
[7] Jun Du, Qiang Huo. A Feature Compensation Approach Using High-Order Vector Taylor Series Approximation of an Explicit Distortion Model for Noisy Speaker Recognition. IEEE Transactions on Audio, Speech, and Language Processing. 2011; 19(8): 2285-2293.
[8] Jeong Y. Speaker adaptation based on the multilinear decomposition of training speaker models. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Dallas, USA: IEEE. 2010; 4870-4873.
[9] Yongjun He, Jiqing Han. Gaussian Specific Compensation for Channel Distortion in speaker recognition. IEEE Signal Processing Letters. 2011; 18(10): 599-602.
[10] Omid Dehzangi, Bin Ma, Eng Siong Chng, Haizhou Li. Discriminative feature extraction for speaker recognition using continuous output codes. Pattern Recognition Letters. 2012; 33: 1703-1709.
[11] GU Xiaohua, GONG Weiguo, YANG Liping. Supervised graph-optimized locality preserving projections. Optics and Precision Engineering. 2011; 19(3): 672-680.
[12] N Dehak, P Kenny, R Dehak, P Dumouchel, P Ouellet. Front-end factor analysis for speaker verification. IEEE Transactions on Audio, Speech, and Language Processing. 2010; 99.
[13] Tomas Pfister, Peter Robinson. Real-Time Recognition of Affective States from Nonverbal Features of Speech and Its Application for Public Speaking Skill Analysis. IEEE Transactions on Affective Computing. 2011; 2(2): 66-78.
[14] C Santhosh Kumar, VP Mohandas. Robust features for multilingual acoustic modeling. Int J Speech Technol. 2011; 14: 147-155.