International Journal of Electrical and Computer Engineering (IJECE)
Vol. 10, No. 2, April 2020, pp. 1859~1867
ISSN: 2088-8708, DOI: 10.11591/ijece.v10i2.pp1859-1867
Journal homepage: http://ijece.iaescore.com/index.php/IJECE

High level speaker specific features modeling in automatic speaker recognition system

Satyanand Singh¹, Pragya Singh²
¹School of Electrical and Electronics Engineering, Fiji National University, Fiji Island
²School of Public Health and Primary Care, Fiji National University, Fiji Island

Article Info

Article history:
Received Apr 19, 2019
Revised Oct 29, 2019
Accepted Nov 6, 2019

ABSTRACT
Spoken words convey several levels of information. At the primary level, speech conveys words or spoken messages; at the secondary level, it also reveals information about the speaker. This work is based on high-level speaker-specific features and on statistical speaker modeling techniques that express the characteristic sound of the human voice. Hidden Markov model (HMM), Gaussian mixture model (GMM), and Linear Discriminant Analysis (LDA) models are used to build Automatic Speaker Recognition (ASR) systems that are computationally inexpensive and can recognize speakers regardless of what is said. The performance of the ASR system is evaluated on speech ranging from clear speech to a wide range of speech quality, using the standard TIMIT speech corpus. The ASR efficiency of the HMM, GMM, and LDA based modeling techniques is 98.8%, 99.1%, and 98.6%, and the Equal Error Rate (EER) is 4.5%, 4.4% and 4.55%, respectively. The EER improvement of the GMM based ASR system compared with HMM and LDA is 4.25% and 8.51%, respectively.

Keywords:
Automatic speaker recognition (ASR)
Extreme learning machine (ELM)
Gaussian mixture model (GMM)
Hidden Markov model (HMM)
Linear discriminant analysis (LDA)
Support vector machines (SVM)
Universal background model (UBM)

Copyright © 2020 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author:
Satyanand Singh,
School of Electrical and Electronics Engineering,
Fiji National University, Fiji Island.
Email: satyanand.singh@fnu.ac.fj

1. INTRODUCTION
Most modeling techniques used in ASR applications make various mathematical assumptions about the speaker-specific features. If the voice data do not satisfy these assumptions, imperfections are introduced at the ASR modeling stage. The mathematical model is nevertheless fitted to the features, and recognition scores are derived from these models and the test speech data. In ASR, modeling starts once the audio segments have been converted into feature parameters. Modeling is the process of characterizing every speaker on the basis of these features, and the model should also provide a means of comparison with utterances from unfamiliar speakers. ASR modeling is called robust when its characterization of the speaker-specific features is not significantly affected by unwanted distortions, even though the features themselves are. Ideally, if features could be designed so that inter-speaker discrimination is maximal and no intra-speaker variation exists, simple modeling methods would be sufficient. In short, the non-ideal properties of the speaker-specific feature extraction phase require compensation techniques during the ASR modeling phase, so that the effect of the disturbances present in the speech signal can be reduced during the testing of the speaker
recognition process. Most ASR modeling techniques make different mathematical assumptions about the speaker-specific features. If the assumed properties are not met by the speech data, flaws are introduced already during the ASR modeling phase. Normalization of the speaker-specific features can reduce these problems to some extent, but not completely. As a result, the mathematical models are forced to fit the features, and the speaker recognition scores are obtained from these models and the test speech data. Artifacts are therefore also introduced into the scores, and a family of score normalization techniques has been proposed to compensate for this final-stage mismatch [1]. In essence, degradation of the acoustic signal affects the speaker-specific features, the models, and the scores. It is therefore important to improve the robustness of ASR systems in all three domains. It has recently been noted that, as speaker modeling techniques have improved, score normalization techniques have become less effective [2].

Probabilistic modeling techniques such as GMM and HMM are widely used for speaker, language, emotion, and speech recognition. In a probabilistic model, each speaker/language/emotion is modeled as a probability source with an unknown but fixed probability density function. The training phase estimates the parameters of this probability density function from a sufficient number of training samples. For recognition, the likelihood of the test utterance given the model is calculated. A GMM is a linear combination of multivariate Gaussian distributions that models the class-conditional likelihood $P(X \mid C)$. A GMM can be converted into a posterior classifier using Bayes' rule [3]. It has further advantages, such as the ability to train the model on a large amount of speech data and then adapt it to new data. When a model such as the GMM is used for an ASR application, a speaker-independent Universal Background Model (UBM) is first trained on voice data. The UBM represents the distribution of the feature vectors independently of the speaker. When a new speaker is enrolled in the ASR system, the parameters of the background model are adapted to the feature distribution of the new speaker. The adapted model is then used as the model of that speaker.
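
The adaptation step described above can be illustrated with a minimal sketch. The text does not say which adaptation rule is used; the code below assumes the common relevance-MAP update of the component means of a diagonal-covariance UBM. The function name `map_adapt_means`, the relevance factor of 16 and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def map_adapt_means(ubm_weights, ubm_means, ubm_vars, feats, relevance=16.0):
    """Relevance-MAP adaptation of diagonal-covariance UBM means (assumed rule)."""
    diff = feats[:, None, :] - ubm_means[None, :, :]            # (T, K, D)
    # log N(x_t | mu_k, diag(var_k)) for every frame t and component k
    log_gauss = -0.5 * (np.sum(diff ** 2 / ubm_vars, axis=2)
                        + np.sum(np.log(2.0 * np.pi * ubm_vars), axis=1))
    log_post = np.log(ubm_weights) + log_gauss                   # (T, K)
    log_post -= log_post.max(axis=1, keepdims=True)              # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                      # responsibilities
    n_k = post.sum(axis=0)                                       # soft counts per component
    data_mean = (post.T @ feats) / np.maximum(n_k, 1e-10)[:, None]
    alpha = (n_k / (n_k + relevance))[:, None]                   # data-dependent mixing
    return alpha * data_mean + (1.0 - alpha) * ubm_means

# Toy example: a two-component UBM and enrollment frames close to component 0.
rng = np.random.default_rng(0)
ubm_weights = np.array([0.5, 0.5])
ubm_means = np.array([[0.0, 0.0], [10.0, 10.0]])
ubm_vars = np.ones((2, 2))
feats = rng.normal(loc=[1.0, 1.0], scale=0.5, size=(200, 2))
speaker_means = map_adapt_means(ubm_weights, ubm_means, ubm_vars, feats)
```

In this sketch, components that see many enrollment frames move toward the speaker's data, while components with almost no data stay at the UBM values.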

Statistical language modeling (LM) is the science of building models that estimate the prior probability of word strings. Language models have also been used successfully to model the prosody of speakers and languages: the fundamental frequency F0 and energy contours are labeled as discrete classes and then modeled using bigrams or trigrams [4]. A hidden-event LM contains special words that appear in the model's N-grams but are not overtly present in the observed word sequence. Instead, they correspond to states of the HMM and can be used to model linguistic events such as unmarked sentence boundaries. Alternatively, these events may be associated with non-lexical likelihoods so that the LM can be conditioned on other knowledge sources (e.g., prosody). A special type of hidden-event LM can model disfluent speech by letting hidden events modify the word history [5].

Decision trees have also been used successfully in prosodic modeling for ASR applications [6]. A decision tree is grown by asking automatically generated questions about the features, one at a time. At each node, the feature used in the question and the threshold in the question (e.g., normalized pitch greater than a threshold value) are chosen so as to best separate the classes. In the test phase, the decision tree estimates the posterior probability of each class C for each sample X, i.e., $P(C \mid X)$ [7]. One of the main drawbacks of decision trees is the greedy construction process: at each step the algorithm selects the single best variable and the best split point, whereas a multi-step lookahead that considers combinations of variables might give a better result. Another disadvantage is that continuous variables are implicitly discretized by the splitting process, so information is lost along the way. The advantage of decision trees over other machine learning methods is that they are not black-box models and can easily be expressed as rules. In many applications this advantage outweighs the disadvantages, so these models are widely used in ASR applications.

Discriminative models such as Artificial Neural Networks (ANN) [8] and Support Vector Machines (SVM) are also used for prosodic modeling [9]. Deep Neural Network (DNN) [10], Extreme Learning Machine (ELM), and DNN-ELM have proved useful for prosody-based speaker recognition [11]. The SVM is an algorithmic implementation of ideas from statistical learning theory [12], which is concerned with the problem of constructing consistent estimators from data: how can the performance of a model on an unknown data set be estimated when only the characteristics of the model and its performance on a training set are given? Algorithmically, the support vector machine establishes an optimal separating boundary between data sets by solving a constrained quadratic optimization problem [13]. By using different kernel functions, different degrees of nonlinearity and flexibility can be included in the model. Because support vector machines are derived from advanced statistical ideas and bounds on their generalization error can be calculated, they have attracted considerable research interest over the past few years. Performance equal to or better than that of other machine learning algorithms has been reported in the medical literature. A disadvantage of the support vector machine is that the classification result is purely dichotomous and no probability of class membership is given [14].

2. MODELING BASED ON PROSODY IN AUTOMATIC SPEAKER RECOGNITION SYSTEM
Early prosody-based approaches used global statistics of the speaker's fundamental frequency (F0) for the speaker recognition task. The dynamics of the F0 contour, which reflect a person's speaking style, have also been shown to help speaker recognition. The F0 movement is modeled by fitting a piecewise linear model to the contour to obtain a stylized profile. Each linear segment is represented by its median F0, slope and duration. These features are modeled by a log-normal distribution, a normal distribution, and a shifted exponential distribution, respectively. In order to investigate speaker recognition using prosody and idiolect, NIST introduced the extended-data task, based on the Switchboard corpus of telephone conversations. Unlike traditional speaker recognition tasks, the extended-data task provides multiple complete conversation sides (4/8/16 sides) for training and testing the ASR system.

In [15], the focus is on investigating various prosodic features, including fundamental frequency, segment duration and pause duration. Duration features such as word durations, phone durations and duration sequences have been used to model duration. In [16], duration, pitch, and energy features are calculated for each estimated syllable region, with the syllable boundaries obtained from the ASR system. These features are quantized and used to form N-grams, called N-gram based syllable non-uniform extraction region features. In [17], continuous prosodic features were modeled using Joint Factor Analysis (JFA) for speaker recognition. The prosodic features used are the pitch and energy contours over syllable-like units, represented using a basis of Legendre polynomials. A standard GMM is used for modeling. In addition, the effects of speaker and session variability are modeled in the same way as in conventional JFA. The Legendre polynomial coefficients of pitch and energy, together with the length of the segment, constitute a 13-dimensional prosody feature set for GMM and factor analysis modeling [17].
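
As an illustration of how such a 13-dimensional prosody vector could be assembled, the sketch below fits Legendre polynomials to a pitch contour and an energy contour and appends the segment length. The split into six pitch coefficients, six energy coefficients (degree 5) and one duration value is an assumption made here to reach 13 dimensions; the function names and the synthetic contours are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def contour_coeffs(contour, degree=5):
    """Least-squares Legendre coefficients of a contour mapped onto [-1, 1]."""
    t = np.linspace(-1.0, 1.0, len(contour))
    return legendre.legfit(t, contour, degree)

def prosody_vector(pitch, energy, degree=5):
    """Pitch coefficients, energy coefficients and segment length in frames."""
    return np.concatenate([contour_coeffs(pitch, degree),
                           contour_coeffs(energy, degree),
                           [float(len(pitch))]])

# Synthetic contours for one syllable-like segment of 30 frames.
t = np.linspace(-1.0, 1.0, 30)
pitch = 120.0 + 15.0 * t            # rising pitch, in Hz
energy = 60.0 - 5.0 * t ** 2        # energy peak in the middle of the segment
vec = prosody_vector(pitch, energy)
```

The first coefficient of each contour captures its mean level, the second its overall slope, and the higher-order coefficients its curvature, which is why a low-order fit gives a compact description of the contour shape.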

2.1. Eigenvoice consideration in hidden Markov models
In the standard eigenvoice approach, voice data are collected from a number of speakers covering diverse conditions. When each HMM state is modeled as a mixture of Gaussian distributions, a set of speaker-dependent HMMs is formed, one for each speaker. A speaker's voice is represented by a supervector composed of the concatenation of the mean vectors of all Gaussian distributions of the HMM. The $i$-th speaker supervector is therefore composed of $R$ components, one per Gaussian distribution, and is expressed as $x_i = [x_{i,1}', x_{i,2}', \ldots, x_{i,R}']'$, $x_i \in \mathbb{R}^{d_2}$. The similarity between any two speaker supervectors $x_i$ and $x_j$ is measured by their dot product as follows.

$$x_i' x_j = \sum_{r=1}^{R} x_{i,r}' x_{j,r} \tag{1}$$

Principal component analysis (PCA) is then performed on the training speaker supervectors, and the resulting eigenvectors are referred to as eigenvoices. In order to adapt to a new speaker, his/her supervector $s$ is treated as a linear combination of the top $M$ eigenvoices $\{v_1, v_2, \ldots, v_M\}$, that is, $s = s(w) = \sum_{m=1}^{M} w_m v_m$ with weight vector $w = [w_1, w_2, \ldots, w_M]'$. Usually, fewer than ten eigenvoices are taken into consideration, so that only a few seconds of adaptation speech are required. The eighteen numerically computed eigenvalues of the eigenvoices are: 0.180696, 0.168936, 0.082378, 0.065117, 0.058677, 0.027971, 0.020124, 0.017375, 0.016086, 0.008081, 0.007063, 0.004332, 0.003474, 0.003072, 0.002031, 0.001976, 0.00112, and 0.001062.
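
The PCA step can be sketched as follows. This is only an illustration on random data, not the configuration used in the experiments: the supervectors are made up, the mean supervector is kept as an explicit offset (an assumption of this sketch), and the helper `project` obtains weights by plain projection rather than by maximum likelihood.

```python
import numpy as np

def eigenvoices(supervectors, num_eigenvoices):
    """Return (mean, eigenvoices, eigenvalues) from an (N, D) supervector matrix."""
    mean = supervectors.mean(axis=0)
    centered = supervectors - mean
    # The SVD of the centered data yields the PCA directions directly.
    _, sing, vt = np.linalg.svd(centered, full_matrices=False)
    eigvals = sing ** 2 / (len(supervectors) - 1)
    return mean, vt[:num_eigenvoices], eigvals[:num_eigenvoices]

def project(supervector, mean, voices):
    """Weights w such that mean + w @ voices approximates the supervector."""
    return voices @ (supervector - mean)

# Ten hypothetical training speakers with six-dimensional supervectors.
rng = np.random.default_rng(0)
supervectors = rng.normal(size=(10, 6))
mean, voices, eigvals = eigenvoices(supervectors, 3)
weights = project(supervectors[0], mean, voices)
```

The eigenvalues come out sorted in decreasing order, which is the property that justifies keeping only the leading few eigenvoices.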

Given the adaptation data $o_t$, $t = 1, \ldots, T$, the eigenvoice weights are estimated by maximizing the likelihood of $o_t$. Mathematically, one finds $w$ by maximizing the following function:

$$Q(w) = \sum_{r=1}^{R} \gamma_1(r) \log(\pi_r) + \sum_{p,r=1}^{R} \sum_{t=1}^{T-1} \xi_t(p,r) \log(a_{pr}) + \sum_{r=1}^{R} \sum_{t=1}^{T} \gamma_t(r) \log(b_r(o_t, w)) \tag{2}$$

Here $\pi_r$ is the initial probability of state $r$, and $\gamma_t(r)$ is the posterior probability of the observation sequence being in state $r$ at time $t$. $\xi_t(p,r)$ is the posterior probability of the observation sequence being in state $p$ at time $t$ and in state $r$ at time $t+1$, $a_{pr}$ is the transition probability from state $p$ to state $r$, and $b_r$ is the $r$-th Gaussian probability density function. Furthermore, the term $Q_b(w) = \sum_{r=1}^{R} \sum_{t=1}^{T} \gamma_t(r) \log(b_r(o_t, w))$ is related to the new speaker supervector as follows:

$$Q_b(w) = -0.5 \sum_{r=1}^{R} \sum_{t=1}^{T} \gamma_t(r) \left[ d_1 \log(2\pi) + \log|C_r| + \|o_t - s_r(w)\|^2_{C_r} \right] \tag{3}$$

In (3), $C_r$ is the covariance matrix of the Gaussian at state $r$, $d_1$ is the dimension of the observation vector $o_t$, and $\|o_t - s_r(w)\|^2_{C_r} = (o_t - s_r(w))' C_r^{-1} (o_t - s_r(w))$.
Here the estimation of eigenvoices is generalized by performing kernel PCA in place of linear PCA. Let $k(\cdot,\cdot)$ be a kernel with a corresponding mapping $\varphi$, which maps a pattern $x$ in the speaker supervector space to $\varphi(x)$ in the speaker-specific feature space $\mathcal{F}$. Given a set of $N$ speaker supervectors $\{x_1, x_2, \ldots, x_{N-1}, x_N\}$, denote the mean of the $\varphi$-mapped feature vectors by $\bar{\varphi} = \frac{1}{N} \sum_{i=1}^{N} \varphi(x_i)$ and the centered map by $\tilde{\varphi}(x) = \varphi(x) - \bar{\varphi}$. In the next step, eigendecomposition is performed on $\tilde{K}$, where
$\tilde{K} = [\tilde{k}(x_i, x_j)]_{i,j}$ is the centered kernel matrix. The $m$-th orthogonal eigenvector $v_m$ of the covariance matrix in the feature space is represented as $v_m = \sum_{i=1}^{N} \frac{\alpha_{mi}}{\sqrt{\lambda_m}} \tilde{\varphi}(x_i)$ by considering $\tilde{K} = U \Lambda U'$, where $U = [\alpha_1, \ldots, \alpha_{N-1}, \alpha_N]$ with $\alpha_i = [\alpha_{i1}, \ldots, \alpha_{i(N-1)}, \alpha_{iN}]'$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_{N-1}, \lambda_N)$. A computer generated 8×8 orthogonal eigenvector matrix is presented in Table 1. A two-dimensional representation of utterances from the TIMIT database, evaluated using KPCA with a linear solution and a non-linear SVM, is shown in Figure 1.

Table 1. A computer generated 8×8 orthogonal eigenvector v_m
      C1       C2       C3       C4       C5       C6       C7       C8
R1   -1.0000  -0.8571  -0.7143  -0.5714  -0.4286  -0.2857  -0.1429   0.0000
R2   -1.0000  -0.8571  -0.7143  -0.5714  -0.4286  -0.2857  -0.1429   0.0000
R3   -1.0000  -0.8571  -0.7143  -0.5714  -0.4286  -0.2857  -0.1429   0.0000
R4   -1.0000  -0.8571  -0.7143  -0.5714  -0.4286  -0.2857  -0.1429   0.0000
R5   -1.0000  -0.8571  -0.7143  -0.5714  -0.4286  -0.2857  -0.1429   0.0000
R6   -1.0000  -0.8571  -0.7143  -0.5714  -0.4286  -0.2857  -0.1429   0.0000
R7   -1.0000  -0.8571  -0.7143  -0.5714  -0.4286  -0.2857  -0.1429   0.0000
R8   -1.0000  -0.8571  -0.7143  -0.5714  -0.4286  -0.2857  -0.1429   0.0000

Figure 1. Two-dimensional representation of utterances from the TIMIT database evaluated using KPCA+linear solution and non-linear SVM
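
The kernel PCA step described in this subsection can be sketched in a few lines. The kernel type is not stated in the text; the Gaussian (RBF) kernel, its width `gamma`, the function names and the random supervectors below are assumptions chosen only to make the example runnable.

```python
import numpy as np

def rbf_kernel(X, gamma=0.1):
    """Gaussian kernel matrix k(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_pca(K, num_components):
    """Eigendecompose the centered kernel matrix; return (alphas, lambdas, K_centered)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    Kc = H @ K @ H                                # kernel of the centered map
    lam, U = np.linalg.eigh(Kc)                   # eigenvalues in ascending order
    order = np.argsort(lam)[::-1][:num_components]
    return U[:, order], lam[order], Kc

def project_training(Kc, alphas, lambdas):
    """Coordinates of the training patterns on the leading kernel eigenvectors."""
    return Kc @ alphas / np.sqrt(lambdas)

# Eight hypothetical speaker supervectors of dimension four.
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 4))
alphas, lambdas, Kc = kernel_pca(rbf_kernel(X), 2)
coords = project_training(Kc, alphas, lambdas)
```

Dividing by the square root of each eigenvalue corresponds to the normalization of the feature-space eigenvectors, so the two columns of `coords` give a two-dimensional representation of the training patterns.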

2.2. Gaussian mixture model (GMM) based high level feature modeling
The GMM has become the dominant generative statistical model in state-of-the-art ASR systems. It is an attractive statistical model because it can represent a wide variety of probability density functions when a sufficient number of parameters is estimated. In general, a GMM contains a set of $N$ multivariate Gaussian density functions, denoted by the index $k$. The resulting probability density function of a particular speaker model is a convex combination of all density functions. A GMM is built on standard multivariate Gaussian densities, but introduces the component index $k$ as a latent variable with discrete probability $p(k \mid \Theta)$. The weights $w_k = p(k \mid \Theta)$ characterize the prior contribution of the corresponding component to the GMM density function and fulfil the condition $\sum_{k=1}^{N} w_k = 1$. Each Gaussian density represents a conditional density function $p(x \mid k, \Theta)$. According to Bayes' theorem, the joint probability density function $p(x, k \mid \Theta)$ is given by the product of the two. The sum over all densities results in the multi-modal probability density of the GMM as follows:

$$p(x \mid \Theta) = \sum_{k=1}^{N} p(k \mid \Theta) \cdot p(x \mid k, \Theta) = \sum_{k=1}^{N} w_k \cdot \mathcal{N}\{x \mid \mu_k, \Sigma_k\} \tag{4}$$

where $\mu_k$ is the mean vector and $\Sigma_k$ is the covariance matrix. Each component density is completely determined by $\mu_k$ and $\Sigma_k$. The parameter set $\Theta = \{w_1, w_2, \ldots, w_N, \mu_1, \mu_2, \ldots, \mu_N, \Sigma_1, \Sigma_2, \ldots, \Sigma_N\}$ contains the weighting factors, mean vectors and covariance matrices of a specific speaker model.
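
The mixture density in (4) can be evaluated numerically as in the sketch below, which works in the log domain for stability. The function name and the toy parameters are hypothetical; a real speaker model would use parameters estimated from training data.

```python
import numpy as np

def gmm_log_density(x, weights, means, covs):
    """log p(x | Theta) for one feature vector x under a full-covariance GMM."""
    d = x.shape[0]
    log_terms = []
    for w_k, mu_k, sigma_k in zip(weights, means, covs):
        diff = x - mu_k
        _, logdet = np.linalg.slogdet(sigma_k)
        maha = diff @ np.linalg.solve(sigma_k, diff)     # squared Mahalanobis distance
        log_terms.append(np.log(w_k)
                         - 0.5 * (d * np.log(2.0 * np.pi) + logdet + maha))
    log_terms = np.array(log_terms)
    m = log_terms.max()
    return m + np.log(np.sum(np.exp(log_terms - m)))     # log-sum-exp over components

# Two identical standard-normal components with equal weights.
x = np.zeros(2)
weights = np.array([0.5, 0.5])
means = np.zeros((2, 2))
covs = np.array([np.eye(2), np.eye(2)])
value = gmm_log_density(x, weights, means, covs)
```

With these toy parameters the mixture reduces to a single two-dimensional standard normal, so `value` equals $-\log(2\pi)$. Summing this quantity over the frames of an utterance gives the utterance log-likelihood used for scoring.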

Figure 2 illustrates the likelihood function of a GMM comprising seven Gaussian distributions with their covariance matrices. For convenience, two-dimensional mean and feature vectors are chosen; $x_1$ and $x_2$ denote the elements of the feature vector. The computer generated log-likelihood values obtained while training the speaker 1 model are -6.067379, -4.288333, -4.253459, -4.241043, -4.230592, -4.218451, -4.203952, -4.188224, -4.173566, -4.161955, -4.153866, -4.148612, -4.145268, -4.143124, -4.141712, -4.140738. The computer generated 8×8 training feature vectors of a speaker obtained with Gaussian mixture models are presented in Table 2, and Table 3 presents the testing feature vectors of the same speaker with different text.

Figure 2. A likelihood function for a GMM with seven Gaussian densities (axes: x1, x2 and likelihood)

Table 2. A computer generated 8×8 training feature vectors of a speaker by Gaussian mixture models
      C1      C2      C3      C4      C5      C6      C7      C8
R1   4.0646  2.7960  3.3696  2.5665  1.4115  1.4582  1.3393  0.7637
R2   4.8317  3.5756  3.3678  2.8608  0.9304  0.8075  0.9295  1.1848
R3   3.7562  3.4273  3.8380  2.7522  1.3471  0.9934  1.4731  1.6576
R4   5.0021  3.3969  3.4032  2.2354  0.4914  0.8931  2.0563  1.4244
R5   4.1528  3.3462  3.8148  3.4006  1.8268  1.0450  1.5436  1.1512
R6   3.8352  3.1605  4.3616  2.8652  1.7510  1.0464  1.6336  1.3007
R7   4.1610  3.3430  4.4114  1.7857  1.1003  1.5388  1.3885  1.6549
R8   3.5921  3.7265  4.1634  2.5118  1.8623  1.5231  1.5569  1.4148

Table 3. 8×8 testing feature vectors of a speaker by Gaussian mixture models
      C1      C2      C3      C4      C5      C6      C7      C8
R1   3.2927  2.0086  4.7630  3.1760  1.4675  0.9331  1.7318  1.3194
R2   3.6418  2.6172  5.1925  2.5124  0.5417  1.2929  1.9916  0.9756
R3   2.9897  1.6382  5.2565  4.0006  1.3647  1.8824  1.9576  1.0245
R4   3.4203  2.3760  4.4596  2.5434  1.0803  1.4107  1.8440  1.3208
R5   3.4864  2.9604  3.9410  3.2120  1.5138  1.5098  2.2160  1.2051
R6   4.0004  2.2980  4.2781  3.0504  1.8364  1.0121  1.2600  1.1491
R7   3.0806  2.0417  4.0331  3.6395  1.9743  1.8195  1.3774  1.0800
R8   2.9109  2.3116  4.6019  3.5167  2.3270  1.1858  2.6674  1.3994

2.3. Linear discriminant analysis (LDA) based high level feature modeling
LDA is a commonly employed technique in statistical pattern recognition that aims at finding linear combinations of feature coefficients that facilitate the discrimination of multiple classes. It finds orthogonal directions in the feature space that are most effective for class discrimination. Projecting the original features onto these directions improves the accuracy of classification. Let $D$ denote the set of all development utterances, let $w_{s,i}$ denote the utterance feature obtained from the $i$-th utterance of speaker $s$, let $n_s$ denote the total number of utterances belonging to speaker $s$, and let $S$ denote the total number of speakers in $D$. The between-class covariance matrix $S_b$ and the within-class covariance matrix $S_w$ are given by
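
$$S_b = \frac{1}{S} \sum_{s=1}^{S} (\bar{w}_s - \bar{w})(\bar{w}_s - \bar{w})^{T} \tag{5}$$

$$S_w = \frac{1}{S} \sum_{s=1}^{S} \frac{1}{n_s} \sum_{i=1}^{n_s} (w_{s,i} - \bar{w}_s)(w_{s,i} - \bar{w}_s)^{T} \tag{6}$$

where the speaker-dependent mean vector is given by $\bar{w}_s = \frac{1}{n_s} \sum_{i=1}^{n_s} w_{s,i}$ and the speaker-independent mean vector is given by $\bar{w} = \frac{1}{S} \sum_{s=1}^{S} \frac{1}{n_s} \sum_{i=1}^{n_s} w_{s,i}$.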
The LDA optimization therefore maximizes the between-class variance while minimizing the within-class variance. The exact projection can be obtained from this optimization by solving the generalized eigenvalue problem:

$$S_b v = \Lambda S_w v \tag{7}$$

where $\Lambda$ is the diagonal matrix containing the eigenvalues. If the matrix $S_w$ in eqn. (6) is invertible, the solution can easily be found from the eigendecomposition of $S_w^{-1} S_b$. The LDA matrix of dimension $R \times k$ is as follows:

$$A_{LDA} = [v_1 \ldots v_k] \tag{8}$$

where $v_1, \ldots, v_k$ are the $k$ eigenvectors obtained by solving eqn. (7). Thus, the LDA transformation of the utterance feature $w$ is obtained in this way:

$$\Phi_{LDA}(w) = A_{LDA}^{T} w \tag{9}$$

A computer generated 8×8 $\Phi_{LDA}(w)$ matrix of dimension $R \times k$ obtained with the LDA models is presented in Table 4.

Table 4. A computer generated 8×8 Φ_LDA(w) matrix of dimension R×k
      C1       C2       C3       C4       C5       C6       C7       C8
R1   -0.5302  -0.6328  -0.6402  -0.5861  -0.5306  -0.5137  -0.5403  -0.5678
R2   -0.6601  -0.7932  -0.8189  -0.7774  -0.7347  -0.7332  -0.7773  -0.8138
R3   -0.6949  -0.8420   0.8846  -0.8622  -0.8389  -0.8565  -0.9219  -0.9783
R4   -0.6594  -0.8031  -0.8484  -0.8308  -0.8124  -0.8399  -0.9289  -1.0271
R5   -0.6314  -0.7653  -0.7968  -0.7584  -0.7169  -0.7325  -0.8374  -0.9885
R6   -0.6698  -0.8029  -0.8170  -0.7446  -0.6615  -0.6450  -0.7462  -0.9332
R7   -0.7548  -0.8985  -0.9072  -0.8157  -0.7044  -0.6588  -0.7423  -0.9333
R8   -0.7876  -0.9328  -0.9467  -0.8688  -0.7722  -0.7314  -0.8065  -0.9806
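
The LDA projection defined by the between-class and within-class covariance matrices and the generalized eigenvalue problem of this subsection can be sketched as below. The function names and the synthetic three-speaker data are hypothetical, and the code assumes that the within-class matrix is invertible, as the text does.

```python
import numpy as np

def lda_matrix(features, speakers, k):
    """Return the R x k LDA matrix from features (n, R) and speaker labels (n,)."""
    labels = np.unique(speakers)
    R = features.shape[1]
    spk_means = np.array([features[speakers == s].mean(axis=0) for s in labels])
    w_bar = spk_means.mean(axis=0)                 # speaker-independent mean
    S_b = np.zeros((R, R))
    S_w = np.zeros((R, R))
    for s, mean_s in zip(labels, spk_means):
        d = (mean_s - w_bar)[:, None]
        S_b += d @ d.T                             # between-class scatter
        centered = features[speakers == s] - mean_s
        S_w += centered.T @ centered / len(centered)   # within-class scatter
    S_b /= len(labels)
    S_w /= len(labels)
    # With S_w invertible, use the eigenvectors of S_w^{-1} S_b.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1][:k]
    return eigvecs[:, order].real

def lda_transform(A, w):
    """Project one utterance feature w with the LDA matrix A."""
    return A.T @ w

# Three hypothetical speakers separated along the first feature dimension.
rng = np.random.default_rng(2)
centers = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
speakers = np.repeat(np.arange(3), 50)
features = centers[speakers] + rng.normal(size=(150, 3))
A = lda_matrix(features, speakers, 1)
projected = lda_transform(A, features[0])
```

In this toy setting the leading LDA direction is dominated by the first feature dimension, since that is the only direction along which the speaker means differ.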

LDA assumes normally distributed data, statistically independent features and identical covariance matrices for all classes. However, these assumptions only apply to LDA as a classifier; when LDA is used for dimensionality reduction, it can work reasonably well even if they are violated. Even for classification tasks, LDA appears to be robust enough to the data distributions encountered in ASR applications. The speaker feature modeling histograms, with a normal fit to the eigenvectors obtained from the LDA, are illustrated in Figure 3.

Figure 3. The speaker feature modeling histograms with normal fit eigenvector with LDA (axes: length of feature vectors, counts)

3. ACOUSTIC DATA FEATURE EXTRACTION
The speaker-specific features refer to parameters extracted from short speech segments (frames) of 20-25 ms. The most common short-term acoustic features are Mel Frequency Cepstrum Coefficients (MFCC) and Linear Predictive Coding (LPC) based features [18, 19, 20]. In order to obtain these coefficients from a speech recording, the speech samples are first divided into short overlapping segments. The signals in these segments/frames are then multiplied by a window function (e.g. Hamming or Hanning), and the Fourier power spectrum is obtained. In the next step, the logarithm of the spectrum is calculated and a non-linearly spaced mel filter bank analysis is performed. The logarithmic operation expands the range of the coefficients and decomposes multiplicative components into additive components [21]. In the filter bank analysis, the spectral energy (also called filter bank energy coefficient) is generated for each channel to represent the different frequency bands. Like the human auditory system, the filter bank is designed to be more sensitive to frequency changes at the lower end of the spectrum. Finally, the MFCCs are obtained by performing a discrete cosine transform (DCT) on the filter bank energy parameters and retaining a number of leading coefficients [22, 23]. The DCT has two important properties: (i) it compresses the energy of the signal into a few coefficients, and (ii) its coefficients are highly decorrelated. For these reasons, removing some dimensions using the DCT improves the efficiency of the model and reduces some nuisance components [24]. Furthermore, the decorrelation property of the DCT helps models that assume that the feature coefficients are uncorrelated. In summary, the sequence of operations power spectrum, logarithm, DCT produces the well-known cepstral representation of the signal [25].
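
The MFCC pipeline described above can be sketched in NumPy as follows. This is a generic illustration, not the MATLAB implementation used in the experiments. It follows the conventional ordering in which the mel filter bank is applied to the power spectrum before the logarithm. All parameter values (16 kHz sampling, 25 ms frames, 10 ms hop, 26 filters, 13 coefficients) and function names are assumptions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters equally spaced on the mel scale."""
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                            num_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_edges) / sample_rate).astype(int)
    fbank = np.zeros((num_filters, n_fft // 2 + 1))
    for m in range(1, num_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sample_rate=16000, frame_ms=25, hop_ms=10,
         n_fft=512, num_filters=26, num_ceps=13):
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    num_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(num_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)              # overlapping windowed frames
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft   # power spectrum
    energies = power @ mel_filterbank(num_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))        # log filter bank energies
    n = np.arange(num_filters)
    basis = np.cos(np.pi * np.outer(np.arange(num_ceps), 2 * n + 1)
                   / (2 * num_filters))                       # DCT-II basis
    return log_energies @ basis.T                             # leading cepstral coefficients

# One second of synthetic noise standing in for a speech recording.
rng = np.random.default_rng(3)
signal = rng.normal(size=16000)
coeffs = mfcc(signal)
```

Each row of `coeffs` is the cepstral vector of one frame. Keeping only the first `num_ceps` DCT outputs is the "retain the leading coefficients" step of the description.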

4. EXPERIMENTAL SETUP
The experiments use the TIMIT database. The proposed algorithms were implemented in MATLAB, and the results were compared across the eigenvoice-based HMM, GMM and LDA modeling techniques. A total of 1000 utterances from the TIMIT database, with voice lengths of 6 sec, 4 sec and 2 sec, were used to train and test the ASR system. For these cases, the ASR recognition efficiency was calculated as

Efficiency = Number of utterances correctly identified / Total number of utterances under test.

Table 5 shows the efficiency of the ASR system for HMM, GMM and LDA, respectively. It can be observed from this table that GMM has the highest efficiency compared with the other modeling techniques. Figure 4 shows the equal error rate (EER) of the HMM, GMM, and LDA based modeling techniques. The ASR efficiency of the HMM, GMM, and LDA based modeling techniques is 98.8%, 99.1%, and 98.6%, and the EER is 4.5%, 4.4% and 4.55%, respectively. The EER improvement of the GMM based ASR system compared with HMM and LDA is 4.25% and 8.51%, respectively.
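
The text does not state how the EER improvement is computed. One reading that comes close to the quoted 4.25% and 8.51% is the EER difference expressed relative to the GMM EER, using the EER values listed in Table 5 (4.9%, 4.7% and 5.1%). The sketch below works through that arithmetic; both the formula and the choice of input values are assumptions.

```python
def relative_change(eer_other, eer_ref):
    """Difference between two EER values, in percent of the reference EER."""
    return 100.0 * (eer_other - eer_ref) / eer_ref

# EER values (in %) as listed in Table 5.
table5_eer = {"HMM": 4.9, "GMM": 4.7, "LDA": 5.1}
vs_hmm = relative_change(table5_eer["HMM"], table5_eer["GMM"])
vs_lda = relative_change(table5_eer["LDA"], table5_eer["GMM"])
```

Under these assumptions `vs_hmm` is about 4.26 and `vs_lda` about 8.51.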

Figure 4. Equal Error Rate of ASR system of HMM, GMM and LDA based modeling technique for 2 sec of voice data (axes: false alarm probability in % and miss probability in %; curves: HMM, GMM, LDA)

Table 5. Efficiency of the ASR system for HMM, GMM and LDA respectively
         HMM                          GMM                          LDA
         Efficiency in %  EER in %    Efficiency in %  EER in %    Efficiency in %  EER in %
6 sec    99.6             4.9         99.9             4.7         99.1             5.1
4 sec    98.8             4.9         99.5             4.7         98.2             5.1
2 sec    98.8             4.9         99.1             4.7         98.6             5.1
5.
CONCL
US
I
O
N
This paper presented the research, development and evaluation of an ASR system based on HMM, GMM and LDA modeling techniques. GMM models provide a simple but effective representation that offers computationally inexpensive and highly accurate recognition for a wide range of speaker recognition tasks.
An experimental evaluation of the performance of the speaker recognition system was carried out on the publicly available TIMIT database. For the 1000 voice samples of the TIMIT database, speaker recognition accuracies of 99.1%, 98.8% and 98.6% were obtained for GMM, HMM and LDA, respectively, for 2 sec of voice length. The EER improvement of the GMM based ASR system compared with HMM and LDA is 4.25% and 8.51%, respectively.
The experimental results showed that speaker recognition performance is at practically usable levels for specific applications such as access control authentication. The main limiting factor in less controlled situations is the lack of robustness to transmission impairments such as noise and microphone variability. Efforts are underway to address these limitations, including understanding and modeling the impact of impairments on spectral characteristics, applying more sophisticated channel compensation techniques, and exploring features that are less sensitive to channel degradation.
REFERENCES
[1] S. Singh, “Forensic and Automatic Speaker Recognition System,” International Journal of Applied Engineering Research, Vol. 8, No. 5, pp. 2804-2811, 2018.
[2] S. Singh and Ajeet Singh, “Accuracy Comparison using Different Modeling Techniques under Limited Speech Data of Speaker Recognition Systems,” Global Journal of Science Frontier Research: F Mathematics and Decision Sciences, Vol. 16, No. 2, pp. 1-17, 2016.
[3] S. Singh, “Bayesian distance metric learning and its application in automatic speaker recognition systems,” International Journal of Electrical and Computer Engineering, Vol. 9, No. 4, 2019.
[4] S. Singh, “The Role of Speech Technology in Biometrics, Forensics and Man-Machine Interface,” International Journal of Electrical and Computer Engineering, Vol. 9, No. 1, pp. 281-288, 2019.
[5] S. Singh, “High Level Speaker Specific Features as an Efficiency Enhancing Parameters in Speaker Recognition System,” International Journal of Electrical and Computer Engineering, Vol. 9, No. 4, 2019.
[6] S. Singh, Abhay Kumar, David Raju Kolluri, “Efficient Modelling Technique based Speaker Recognition under Limited Speech Data,” International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol. 8, No. 11, pp. 41-48, 2016.
[7] Shriberg, E., & Stolcke, “Direct modeling of prosody: An overview of applications in automatic speech processing,” In Speech Prosody, Nara, Japan, 2004.
[8] Mary, L., & Yegnanarayana, B., “Prosodic features for speaker verification,” In Proceedings of Interspeech, Pittsburgh, Pennsylvania, pp. 917-920, 2006.
[9] Ferrer, L., Shriberg, E., Kajarekar, S., & Sonmez, K., “Parameterization of prosodic feature distributions for SVM modeling in speaker recognition,” In Proceedings of International Conference on Acoustics, Speech and Signal Processing, Vol. 4, pp. 233-236, 2007.
[10] Han, K., Dong, Y., & Tashev, I., “Speech emotion recognition using deep neural network and extreme learning machine,” In Proceedings of Interspeech, pp. 223-227, 2014.
[11] Wang, Z. Q., & Tashev, I., “Learning utterance-level representations for speech emotion and age/gender recognition using deep neural networks,” In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
[12] Vapnik, V., “An Overview of Statistical Learning Theory,” IEEE Transactions on Neural Networks, Vol. 10, No. 5, pp. 988-999, 1999.
[13] S. Singh, “Support Vector Machine Based Approaches For Real Time Automatic Speaker Recognition System,” International Journal of Applied Engineering Research, Vol. 13, No. 10, pp. 8561-8567, 2018.
[14] Scholkopf, B., & Smola, A., “Learning with kernels: support vector machines, regularization, optimization, and beyond,” Cambridge, MA: MIT Press, 2002.
[15] Peskin, B., Navratil, J., Abramson, J., Jones, D., Klusacek, D., Reynolds, D., et al., “Using prosodic and conversational features for high-performance speaker recognition,” Report from JHU WS’02, In Proceedings of ICASSP, Hong Kong, China, Vol. 4, pp. 792-795, 2003.
[16] S. Singh, Mansour H. Assaf, Sunil R. Das, Emil M. Petriu, and Voicu Groza, “Short Duration Voice Data Speaker Recognition System Using Novel Fuzzy Vector Quantization Algorithms,” 2016 IEEE International Instrumentation and Measurement Technology Conference, May 23-26, Taipei, Taiwan, 2016.
[17] Najim, D., Dumouchel, P., & Kenny, P., “Modeling prosodic features with joint factor analysis for speaker verification,” IEEE Transactions on Audio, Speech, and Language Processing, Vol. 15, No. 7, pp. 2095-2103, 2007.
[18] S. Singh, “Speaker Recognition by Gaussian Filter Based Feature Extraction and Proposed Fuzzy Vector Quantization Modeling Technique,” International Journal of Applied Engineering Research, Vol. 13, No. 16, pp. 12798-12804, 2018.
[19] S. Singh, “High Level Speaker Specific Features Modeling in Automatic Speaker Recognition System,” International Journal of Electrical and Computer Engineering, Vol. 10, No. 2, pp. 2804-2811, 2020.
[20] S. Singh, “Speaker Recognition System for Limited Speech Data Using High-Level Speaker Specific Features and Support Vector Machines,” International Journal of Applied Engineering Research, Vol. 12, No. 9, pp. 8026-8033, 2017.
[21] S. Singh, MH Assaf and Abhay Kumar, “A Novel Algorithm of Sparse Representations for Speech Compression/Enhancement and Its Application in Speaker Recognition System,” International Journal of Computational and Applied Mathematics, Vol. 11, No. 1, pp. 89-104, 2016.
[22] S. Singh, “Evaluation of Sparsification algorithm and Its Application in Speaker Recognition System,” International Journal of Applied Engineering Research, Vol. 13, No. 17, pp. 13015-13021, 2018.
[23] S. Singh and Mansour H. Assaf, “A Perfect Balance of Sparsity and Acoustic hole in Speech Signal and Its Application in Speaker Recognition System,” Middle-East Journal of Scientific Research, Vol. 24, No. 11, pp. 3527-3541, 2016.
[24] S. Singh and Dr. E. G. Rajan, “MFCC VQ based Speaker Recognition and Its Accuracy Affecting Factors,” International Journal of Engineering Research & Technology, International Journal of Computer Applications, Vol. 21, No. 6, pp. 1-6, 2011.
[25] S. Singh and Dr. E. G. Rajan, “Application of Different Filters In Mel Frequency Cepstral Coefficients Feature Extraction And Fuzzy Vector Quantization Approach In Speaker Recognition,” International Journal of Engineering Research & Technology, Vol. 2, Issue 6, pp. 3171-3182, 2013.