International Journal of Electrical and Computer Engineering (IJECE)
Vol. 7, No. 6, December 2017, pp. 3344–3357
ISSN: 2088-8708
Institute of Advanced Engineering and Science
www.iaesjournal.com
Journal Homepage: http://iaesjournal.com/online/index.php/IJECE
DOI: 10.11591/ijece.v7i6.pp3344-3357
Limited Data Speaker Verification: Fusion of Features

T. R. Jayanthi Kumari¹ and H. S. Jayanna²
¹Department of Electronics and Communication Engineering, Siddaganga Institute of Technology, Karnataka, India
²Department of Information Science and Engineering, Siddaganga Institute of Technology, Karnataka, India
Article Info

Article history:
Received: Mar 29, 2017
Revised: Jul 18, 2017
Accepted: Aug 3, 2017

Keywords:
MFCC
LPCC
LPR
LPRP
GMM
GMM-UBM

ABSTRACT
The present work demonstrates an experimental evaluation of speaker verification for different speech feature extraction techniques under the constraint of limited data (less than 15 seconds). The state-of-the-art speaker verification techniques provide good performance for sufficient data (greater than 1 minute). It is a challenging task to develop techniques which perform well for speaker verification under the limited data condition. In this work, different features like Mel Frequency Cepstral Coefficients (MFCC), Linear Prediction Cepstral Coefficients (LPCC), Delta (Δ), Delta-Delta (ΔΔ), Linear Prediction Residual (LPR) and Linear Prediction Residual Phase (LPRP) are considered. The performance of the individual features is studied and, for better verification performance, a combination of these features is attempted. A comparative study is made between the Gaussian mixture model (GMM) and the GMM-universal background model (GMM-UBM) through experimental evaluation. The experiments are conducted using the NIST-2003 database. The experimental results show that the combination of features provides better performance compared to the individual features. Further, GMM-UBM modeling gives a reduced equal error rate (EER) as compared to GMM.

Copyright © 2017 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
T. R. Jayanthi Kumari
Department of Electronics and Communication Engineering
Siddaganga Institute of Technology
Bengaluru-560077, Karnataka, India
Email: trjayanthikumari@gmail.com
1. INTRODUCTION

Speech signals play a main role in communication media to understand the conversation between people [1]. Speaker recognition is a technique to recognize a speaker using his/her original speech voice and can be used for either speaker verification or speaker identification [2]. Over the last decade, speaker verification has been used for many commercial applications, and these applications prefer limited data conditions. Further, limited data indicates speech data of a few seconds (less than 15 sec). Based on the nature of the training and test speech data, text-dependent and text-independent [3] are the two classifications of speaker verification. In the text-dependent mode, the speaker training and testing data remain the same, while in the text-independent case the training and testing speech data are different. Text-independent speaker verification under limited data conditions has always been a challenging task.

The speaker verification system contains four stages, namely analysis of speech data, extraction of features, modeling and testing [4]. The analysis stage analyzes the speaker information using the vocal tract [5], the excitation source [6] and suprasegmental features like duration, accent and modulation [7]. The amount of data available in the limited data condition is very small, which gives poor verification performance. To improve the verification performance in the limited data condition, different levels of information need to be extracted from the speech data, and they have to be combined to obtain good verification performance. The vocal tract and excitation source information are combined in the present study for improving the performance of the speaker verification system under the limited data condition.
The second stage of speaker verification is feature extraction.
Figure 1. Block diagram of the combination of different features for the speaker verification system. The train/test speech is passed through the feature extraction stage (MFCC or LPCC, Delta (Δ), Delta-Delta (ΔΔ), LPR, LPRP and their combinations), modeled with GMM or GMM-UBM, and the testing stage outputs the verified speaker.
The speech production system usually generates a large amount of data, which includes sensor, channel, language, style, etc. [8]. The purpose of feature extraction is to extract feature vectors of reduced dimension. The extracted feature information is emphasized and other redundant factors are suppressed in these feature vectors [3][9].
The vocal tract information can be extracted using the Mel-frequency cepstral coefficients (MFCC) [10] and Linear prediction cepstral coefficients (LPCC) [11] extraction methods. The speech signal contains both static and dynamic characteristics, and the MFCC and LPCC feature sets contain only the static characteristics. The dynamic characteristics, represented by Delta (Δ) and Delta-Delta (ΔΔ), contain some more speaker information, which is useful in speaker verification [4]. Excitation source features are extracted using the Linear prediction residual (LPR) and Linear prediction residual phase (LPRP) [12]. In this work, LPR, LPRP, MFCC, ΔMFCC, ΔΔMFCC, LPCC, ΔLPCC and ΔΔLPCC features are used to evaluate the performance of the system under the limited data condition. Further, each of these features offers different information, and a combination of these may improve the performance of speaker verification. Hence, the performance of speaker verification considering combinations of features is evaluated in the present work. Figure 1 shows the block diagram representation of the combination of different features for the speaker verification system.

The paper is organized into the following sections: Section 2 describes the speaker verification studies using different feature extraction techniques. Different modelling techniques and testing are presented in Section 3. Experimental results are reported in Section 4. Section 5 contains the conclusion and the future scope of the work.
2. SPEAKER VERIFICATION STUDIES USING DIFFERENT FEATURES

The speaker-specific information can be extracted using feature extraction techniques at a reduced data rate [13]. These feature vectors contain vocal tract, excitation source and behavioral traits of the speaker-specific information [4]. A good feature is one which contains all components of the speaker-specific information. To create a good feature set, the different feature extraction techniques need to be understood.
2.1. Vocal tract features for speaker verification

The vocal tract features are extracted using the MFCC and LPCC feature extraction techniques. The features extracted by these techniques are different and therefore their performance varies. The reason for this is as follows.
In the case of MFCC, the spectral distortion is minimized using a Hamming window. The magnitude frequency response is obtained by applying the Fourier transform to the windowed frame of the signal. The resulting spectrum is passed through 22 triangular band-pass filters, and the discrete cosine transform is applied to the output of the mel filters in order to obtain the cepstral coefficients. The obtained MFCC features are used to train and test the speech data.
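The paper gives no implementation code; the following is a minimal sketch, in Python, of the MFCC pipeline just described (framing, Hamming window, magnitude FFT, 22 triangular mel band-pass filters, log, DCT). The 20 ms frame size, 10 ms frame rate and 8 kHz sampling rate are the values reported later in Section 4.2; the use of NumPy, SciPy and librosa (for the mel filterbank only) is an assumption.

```python
# Minimal MFCC sketch: frame -> Hamming window -> magnitude FFT -> 22 mel filters -> log -> DCT.
import numpy as np
import librosa
from scipy.fftpack import dct

def mfcc_features(signal, sr=8000, frame_ms=20, hop_ms=10, n_mels=22, n_ceps=13):
    frame_len = int(sr * frame_ms / 1000)            # 160 samples at 8 kHz
    hop_len = int(sr * hop_ms / 1000)                # 80 samples at 8 kHz
    n_fft = 256                                      # next power of two >= frame length
    window = np.hamming(frame_len)                   # minimizes spectral distortion

    n_frames = 1 + (len(signal) - frame_len) // hop_len
    frames = np.stack([signal[i * hop_len: i * hop_len + frame_len] * window
                       for i in range(n_frames)])

    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))            # magnitude spectrum
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    log_mel = np.log(mag @ mel_fb.T + 1e-10)                      # 22 log band energies
    return dct(log_mel, type=2, norm='ortho', axis=1)[:, :n_ceps] # first 13 cepstra per frame
```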
LPCC reflects the differences in the biological structure of the human vocal tract. LPCC is computed by a recursion from the LPC parameters to the LP cepstrum according to the all-pole model.
The LP coefficients are simply the coefficients of this all-pole filter and are equivalent to the smoothed envelope of the log spectrum of the speech. LPC can be calculated either by the autocorrelation or the covariance method directly from the windowed portion of speech. Durbin's recursive method is used to calculate LPCC without using the Discrete Fourier Transform (DFT) and the inverse DFT, as these two transforms are more complex and time consuming [14].
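As an illustration of this recursion, the following is a minimal sketch (not the authors' code) of 10th-order LP analysis by the autocorrelation method with Durbin's recursion, followed by the standard LPC-to-cepstrum recursion; NumPy is assumed.

```python
# Minimal LPCC sketch: autocorrelation -> Levinson-Durbin -> LPC-to-cepstrum recursion (no DFT).
import numpy as np

def lpc_levinson_durbin(frame, order=10):
    """LP coefficients a_1..a_p of the all-pole model 1 / (1 - sum_k a_k z^-k)."""
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1: len(frame) + order]
    a = np.zeros(order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err      # reflection coefficient
        a_new = a.copy()
        a_new[i] = k
        a_new[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a, err = a_new, err * (1.0 - k * k)
    return a[1:]                                              # a_1 ... a_p

def lpcc_from_lpc(a, n_ceps=13):
    """Cepstrum recursion: c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k}."""
    p = len(a)
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c
```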
The MFCC and LPCC extraction techniques are widely used and have proven to be effective in speaker verification. However, they do not provide satisfactory performance under the limited data condition. Therefore, there is a need to improve the performance of the speaker verification system by obtaining extra information about the speech data. The feature sets of MFCC and LPCC contain only the static properties of the speech signal. In addition, the dynamic characteristics of the speech signal can also be obtained to improve the performance of speaker verification. This will be helpful for the verification of speakers [15].
Two types of dynamics are available in speech processing [16]: the velocity of the features, known as the Δ features, obtained by the average first-order temporal derivative, and the acceleration of the features, known as the ΔΔ features, obtained by the average second-order temporal derivative.
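A minimal sketch of these two dynamic streams is given below: Δ is the regression-based first-order temporal derivative of the cepstral trajectory, and ΔΔ is the same operation applied to the Δ stream. The ±2-frame regression window is an assumed, commonly used value, not one specified in the paper.

```python
# Minimal Δ / ΔΔ sketch over a (n_frames, n_ceps) cepstral matrix.
import numpy as np

def delta(features, N=2):
    """Regression-based temporal derivative of a (n_frames, n_ceps) feature matrix."""
    T = len(features)
    padded = np.pad(features, ((N, N), (0, 0)), mode='edge')
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    return sum(n * (padded[N + n: T + N + n] - padded[N - n: T + N - n])
               for n in range(1, N + 1)) / denom

# mfcc_d  = delta(mfcc)          # velocity (Δ) features
# mfcc_dd = delta(delta(mfcc))   # acceleration (ΔΔ) features
```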
2.2. Excitation source features for speaker verification

The spectral features extracted from the vocal tract are in the range of 10-30 ms. These spectral features ignore some of the speaker-specific excitation information, like the linear prediction (LP) residual and the LP residual phase, that can be used for speaker verification [6]. In order to calculate the LP residual, first the vocal tract information is predicted from the speech data using LP analysis, and then inverse filtering is used to suppress it from the speech data [17][6]. To calculate LPRP, the LP residual is divided by its Hilbert envelope [17]. The LPRP contains speaker-specific information, and the LPR contains information obtained from the excitation source, mainly the glottal closure instants (GCIs) [18]. The LPR and LPRP features contain speaker-specific excitation source information that is dissimilar in its characteristics. These two features can therefore be combined to gain more advantage.
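The sketch below illustrates the LPR and LPRP computation just described (LP analysis, inverse filtering, division by the Hilbert envelope). It reuses the lpc_levinson_durbin helper from the earlier LPCC sketch and assumes SciPy for the filtering and the analytic signal.

```python
# Minimal LPR / LPRP sketch: inverse filter A(z) = 1 - sum_k a_k z^-k, then Hilbert envelope.
import numpy as np
from scipy.signal import hilbert, lfilter

def lp_residual_and_phase(frame, order=10):
    a = lpc_levinson_durbin(frame, order)                          # a_1 ... a_p
    residual = lfilter(np.concatenate(([1.0], -a)), [1.0], frame)  # LP residual (LPR)
    envelope = np.abs(hilbert(residual))                           # Hilbert envelope
    lprp = residual / (envelope + 1e-10)                           # LP residual phase (LPRP)
    return residual, lprp
```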
3. SPEAKER MODELING AND TESTING

Different modelling techniques are available for speaker modelling, including Vector quantization (VQ), the Hidden Markov model (HMM), the Gaussian mixture model (GMM) and the GMM-Universal background model (GMM-UBM). Among these, GMM and GMM-UBM are used as the classifiers in the present work. When the available training data is inadequate, GMM-UBM is widely used for speaker verification [19]. The UBM represents the speaker-independent distribution of features. To construct the UBM, a large amount of speech data is required. The UBM is the core part of the GMM-UBM speaker verification system, and a balance of male and female speakers must be ensured in it. The simplest approach to train a UBM is to pool all the data and use it via the expectation-maximization (EM) algorithm [20]. The coupled target and background speaker model components are integrated effectively while performing speaker recognition when maximum a posteriori (MAP) adaptation is used [13]. The advantage of the UBM model is that a large number of speakers are used to design the speaker-independent model, which is then trained for the required task. Even with minimal speaker data, the UBM-based modeling technique provides good performance. The drawback of the UBM model is that a large gender-balanced speaker set is required for training [20]. The speakers are also modelled using GMM to verify its effectiveness under limited data speaker verification.
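As a concrete illustration of the GMM-UBM recipe described above (pool the background data, fit the UBM with EM, then adapt it to each target speaker with MAP), the following sketch uses scikit-learn's GaussianMixture. The toolkit, the diagonal covariances, the mean-only adaptation and the relevance factor of 16 are assumptions, since the paper does not specify them.

```python
# Sketch of UBM training by EM over pooled data and relevance-MAP adaptation of the means.
import copy
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(background_features, n_components=256):
    # Pool all gender-balanced background data and run EM (diagonal covariances assumed).
    ubm = GaussianMixture(n_components=n_components, covariance_type='diag',
                          max_iter=100, reg_covar=1e-3, random_state=0)
    ubm.fit(background_features)
    return ubm

def map_adapt(ubm, speaker_features, relevance=16.0):
    # Classical mean-only MAP adaptation [13]: mix the speaker statistics with the UBM means.
    post = ubm.predict_proba(speaker_features)        # (T, M) responsibilities
    n_k = post.sum(axis=0)                            # soft counts per component
    f_k = post.T @ speaker_features                   # first-order statistics
    alpha = (n_k / (n_k + relevance))[:, None]
    speaker_gmm = copy.deepcopy(ubm)
    speaker_gmm.means_ = alpha * f_k / (n_k[:, None] + 1e-10) + (1.0 - alpha) * ubm.means_
    return speaker_gmm
```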
In the case of testing, the reference models are compared with the test feature vectors; when the test feature vectors match a reference model, a score is generated. The scores represent how well the test feature vectors match the reference models [4]. In practical applications, there will be a chance of rejecting true speakers and a chance of accepting false speakers. In the present work, the log likelihood ratio test method [21] is adopted.
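A minimal sketch of the log likelihood ratio scoring is shown below: the claim score is the average frame-level log-likelihood of the test features under the claimed speaker's model minus that under the UBM, so positive scores favour the target hypothesis. It builds on the scikit-learn models of the previous sketch, which is an assumption about the implementation.

```python
# Sketch of log likelihood ratio scoring for a verification trial.
import numpy as np

def llr_score(test_features, speaker_gmm, ubm):
    # score_samples returns the per-frame log-likelihood under each model.
    return float(np.mean(speaker_gmm.score_samples(test_features)
                         - ubm.score_samples(test_features)))

# The claim is accepted when the score exceeds a decision threshold; the threshold
# at which false acceptances and false rejections balance defines the EER.
```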
4. RESULTS AND DISCUSSIONS

4.1. Experimental setup

In the current analysis, the NIST-2003 database is used for verifying the speakers [22]. It contains 356 train and 2559 test speakers. The train set contains 149 male and 207 female speakers.
The UBM set contains 251 female and male speakers. The duration of the test, train and UBM speech varies from a few seconds to a few minutes. The present work is for limited data; therefore, from each speaker we have taken 3s-3s (train-test), 4s-4s, 5s-5s, 6s-6s, 9s-9s and 12s-12s of data to create the database for the study.
4.2. Speaker verification results

We conducted text-independent speaker verification experiments. The verification performance of the system is measured using the equal error rate (EER), the operating point at which the false rejection rate (FRR) and the false acceptance rate (FAR) are equal. The extracted features MFCC, LPCC, LPR, LPRP and the transitional characteristics Δ and ΔΔ all have dimension 13. In the case of MFCC, LPCC and their derivatives, the speech data is analyzed with a frame size (FS) of 20 ms and a frame rate (FR) of 10 ms. In the case of LPCC, we considered 10th-order LP analysis because the speech is sampled at 8 kHz; the LP order varies from 8 to 12 [23], and the 10th order has been shown to be appropriate to compute LPCC [23]. An FS of 12 ms and an FR of 6 ms have been fixed for LPR and LPRP. The speaker-specific information obtained from each of these features is different; therefore, the combination of these features may give better performance. The modeling techniques used are GMM and GMM-UBM. The speakers are modelled for Gaussian mixtures of 16, 32, 64, 128 and 256.
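For reference, the EER values reported below can be obtained from the verification scores as in the following sketch: sweep a decision threshold over the target (true-speaker) and impostor scores, trace the FAR and FRR curves, and read off the operating point where the two rates are equal.

```python
# Sketch of EER computation from arrays of target and impostor scores.
import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    far = np.array([(impostor_scores >= t).mean() for t in thresholds])  # false acceptance rate
    frr = np.array([(target_scores < t).mean() for t in thresholds])     # false rejection rate
    i = np.argmin(np.abs(far - frr))                                     # point where FAR ≈ FRR
    return 100.0 * (far[i] + frr[i]) / 2.0                               # EER in percent
```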
4.3. Individual feature performance using GMM and GMM-UBM
Figure 2. Performance of the speaker verification system based on the MFCC individual features using GMM modeling (equal error rate (%) versus training/testing data in seconds for MFCC, ΔMFCC and ΔΔMFCC; Gaussian mixtures of 16, 32 and 64).
Figure 3. Performance of the speaker verification system based on the LPCC individual features using GMM modeling (equal error rate (%) versus training/testing data in seconds for LPCC, ΔLPCC and ΔΔLPCC; Gaussian mixtures of 16, 32 and 64).
The experimental results are shown in Figures 2, 3 and 4 for the individual features (MFCC, ΔMFCC, ΔΔMFCC), (LPCC, ΔLPCC, ΔΔLPCC) and (LPR, LPRP) respectively.
Table 1. Comparison of minimum EER (%) for individual features using different amounts of training and testing data for GMM

Individual features    3s-3s    4s-4s    5s-5s    6s-6s    9s-9s    12s-12s
MFCC                   45.16    44.21    42.36    41.89    38.25    35.68
ΔMFCC                  45.75    43.54    42.09    44.89    38.07    35.63
ΔΔMFCC                 45.27    44.67    42.95    42.14    38.70    37.30
LPCC                   43.08    41.41    39.97    38.70    31.34    28.18
ΔLPCC                  44.89    44.12    41.82    41.05    37.17    35.32
ΔΔLPCC                 44.76    43.13    42.00    41.10    37.48    35.86
LPR                    47.85    48.34    47.34    47.06    46.59    46.09
LPRP                   47.16    47.43    46.08    46.58    46.66    46.62
The experiment is conducted for 3s-3s, 4s-4s, 5s-5s, 6s-6s, 9s-9s and 12s-12s data and for different Gaussian mixtures. Further, the modeling is done using GMM for Gaussian mixtures of 16, 32 and 64; since the data is very small, the Gaussian mixtures are limited to 64. The minimum EER for each speech data size, irrespective of the Gaussian mixtures, is tabulated in Table 1.
Figure 4. Performance of the speaker verification system based on the LPR and LPRP individual features using GMM modeling (equal error rate (%) versus training/testing data in seconds; Gaussian mixtures of 16, 32 and 64).
The performance of the individual features is analysed by considering the 3s-3s data size, as shown in Figure 2. From the experimental results it was observed that the individual feature MFCC provides a reduced EER, which is less by 0.59% and 0.11% than that of ΔMFCC and ΔΔMFCC respectively. The results of the LPCC features for the same data size are shown in Figure 3; the individual feature LPCC provides a reduced EER, which is less by 1.81% and 1.68% than that of ΔLPCC and ΔΔLPCC respectively. Two points can be noticed from these results. First, the static characteristics provide better performance compared with the dynamic characteristics. Second, the individual features of LPCC and its derivatives give better verification performance than MFCC and its derivatives.

The results of the LPR and LPRP features for the same data size are shown in Figure 4. From the experimental results it was observed that the minimum EER of LPR is greater by 2.69% and 4.77% than that of MFCC and LPCC respectively. Further, the minimum EER of LPRP is more by 2% and 4.08% than that of MFCC and LPCC respectively. This clearly shows that the vocal tract features give better EER compared to the excitation source features. The same study is also conducted for the other data sizes of 4s-4s, 5s-5s, 6s-6s, 9s-9s and 12s-12s to verify the performance using the individual features. In all the cases, the results show that the EER decreases as the train and test data are increased.

The GMM modeling works very well in the case of sufficient data [20]. To overcome this problem, we used GMM-UBM modeling. The UBM should be trained in such a way that it has an equal number of male and female speakers; in our experiment, the total duration of the male and female speakers is 1506 sec each.
To study the significance of GMM-UBM modeling, the same set of experiments is conducted. The experimental results are shown in Figure 5, Figure 6 and Figure 7 for the individual features (MFCC, ΔMFCC, ΔΔMFCC), (LPCC, ΔLPCC, ΔΔLPCC) and (LPR, LPRP) respectively. The Gaussian mixtures considered are 16, 32, 64, 128 and 256, as additional UBM speech data is used for training. Table 2 presents the minimum EER of the individual features for the different speech data sizes and different numbers of Gaussian mixtures.
Figure 5. Performance of the speaker verification system based on the MFCC individual features using GMM-UBM modeling (equal error rate (%) versus training/testing data in seconds for MFCC, ΔMFCC and ΔΔMFCC; Gaussian mixtures of 16, 32, 64, 128 and 256).
Figure 6. Performance of the speaker verification system based on the LPCC individual features using GMM-UBM modeling (equal error rate (%) versus training/testing data in seconds for LPCC, ΔLPCC and ΔΔLPCC; Gaussian mixtures of 16, 32, 64, 128 and 256).
Considering the 3s-3s data for the individual features MFCC and its derivatives, MFCC provides a reduced EER, which is less by 1.18% and 0.73% than that of ΔMFCC and ΔΔMFCC respectively. In the case of the LPCC features for the same data size, the individual feature LPCC provides a reduced EER, which is less by 0.9% and 1.49% than that of ΔLPCC and ΔΔLPCC respectively. In this modeling also, the static characteristics provide better performance compared with the dynamic characteristics. Further, the individual features of LPCC and its derivatives give better verification performance than MFCC and its derivatives.

Considering the LPR and LPRP features for the 3s-3s data size, the minimum EER of LPR is more by 1.35% and 2.25% than that of MFCC and LPCC respectively. Further, LPRP is also 1.15% and 2.05% higher in EER as compared with MFCC and LPCC respectively. The same study is also conducted for the other data sizes of 4s-4s, 5s-5s, 6s-6s, 9s-9s and 12s-12s to verify the performance using the individual features. Here also, the results show that the EER decreases as the train and test data are increased.

From these two modeling techniques it is clear that the vocal tract features give better EER compared to the excitation source features. Further, the individual features extracted from the various extraction techniques are different, and hence they may be combined to further improve the speaker verification performance under the limited data condition. From Tables 1 and 2, it was observed that, irrespective of the speech data size and the individual feature, the minimum EER with GMM-UBM is better than that with GMM.
Figure 7. Performance of the speaker verification system based on the LPR and LPRP individual features using GMM-UBM modeling (equal error rate (%) versus training/testing data in seconds; Gaussian mixtures of 16, 32, 64, 128 and 256).
Table 2. Comparison of minimum EER (%) for individual features using different amounts of training and testing data for GMM-UBM

Individual features    3s-3s    4s-4s    5s-5s    6s-6s    9s-9s    12s-12s
MFCC                   40.01    39.02    37.75    36.90    29.35    27.12
ΔMFCC                  41.19    41.14    39.79    39.88    36.31    33.10
ΔΔMFCC                 40.74    40.28    38.79    38.03    30.98    29.14
LPCC                   39.11    37.75    36.35    35.99    29.08    26.91
ΔLPCC                  40.01    39.20    38.12    37.75    32.33    30.57
ΔΔLPCC                 40.60    39.11    38.07    36.54    29.58    27.41
LPR                    41.36    40.32    39.25    37.04    36.06    33.28
LPRP                   41.16    40.43    39.16    37.23    36.16    33.02
4.4. Combination of features performance using GMM and GMM-UBM

The speaker verification system using limited data contains speech data of only a few seconds. Due to this, the available feature vectors are few in number. The performance of the speaker verification system can be increased by combining the feature vectors of the different features. The combination of features is accomplished by a simple concatenation of the feature sets obtained by the different feature extraction techniques.
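A minimal sketch of such a concatenation is shown below. One plausible reading of "simple concatenation of the feature sets" is stacking the per-frame vectors of each stream side by side; since the streams in this work can differ in frame count (LPR and LPRP use a different frame size and rate), the sketch truncates all streams to the shortest one. That alignment choice is an assumption, not a detail given in the paper.

```python
# Sketch of frame-level feature concatenation for the combined systems.
import numpy as np

def combine_features(*streams):
    """Each stream is a (n_frames_i, dim_i) array; returns a (min_frames, sum of dims) array."""
    n = min(s.shape[0] for s in streams)
    return np.hstack([s[:n] for s in streams])

# Example: mfcc, mfcc_d, mfcc_dd, lpr_feat, lprp_feat are per-frame feature arrays
# combined = combine_features(mfcc, mfcc_d, mfcc_dd, lpr_feat, lprp_feat)
```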
The performance of the speaker verification system for the combination of features (MFCC, Δ, ΔΔ, LPR and LPRP) is evaluated for different data sizes, with the modeling done by GMM. The experimental results for the different combinations with MFCC are shown in Figures 8 and 9. The minimum EER across the various Gaussian mixtures for each speech data size is tabulated in Table 3.
Further, consider Figure 8(a) to analyse the performance of the multiple combinations of features with MFCC using the 3s-3s data. From the experimental results it was observed that the combination of features (MFCC+Δ+ΔΔ) provides a minimum EER of 44.35% for a Gaussian mixture of 32, while the individual features MFCC, Δ and ΔΔ provide minimum EERs of 45.16%, 45.75% and 45.27% respectively for a Gaussian mixture of 16. The (MFCC+Δ+ΔΔ) combination provides a reduced EER, which is less by 0.81%, 1.40% and 0.92% than that of MFCC, Δ and ΔΔ respectively. The performance of MFCC and its derivatives (MFCC+Δ+ΔΔ) is better than the individual performance of MFCC, ΔMFCC and ΔΔMFCC. This is due to the combination of both the static and dynamic characteristics of the speech data in training and testing.

The (MFCC+LPR) combination provides a minimum EER of 37.75% for a Gaussian mixture of 16. The individual feature LPR provides a minimum EER of 47.85% for a Gaussian mixture of 16, which is more by 10.1% than that of (MFCC+LPR). The (MFCC+LPR) combination also provides a reduced EER, which is less by 6.6% than that of (MFCC+Δ+ΔΔ); the combination (MFCC+LPR) therefore performs better than (MFCC+Δ+ΔΔ). The (MFCC+LPRP) combination has a minimum EER of 37.62% for a Gaussian mixture of 64.
Figure 8. Performance of the speaker verification system for MFCC and the different combined systems using (a) 3s-3s, (b) 4s-4s and (c) 5s-5s data, with modeling using GMM (equal error rate (%) versus Gaussian mixtures of 16 to 256 for MFCC+Δ+ΔΔ, MFCC+LPR, MFCC+LPRP, MFCC+Δ+ΔΔ+LPR, MFCC+Δ+ΔΔ+LPRP and MFCC+Δ+ΔΔ+LPR+LPRP).
Figure 9. Performance of the speaker verification system for MFCC and the different combined systems using (a) 6s-6s, (b) 9s-9s and (c) 12s-12s data, with modeling using GMM (equal error rate (%) versus Gaussian mixtures of 16 to 256 for the same six combinations as in Figure 8).
The individual feature LPRP provides a minimum EER of 47.16% for a Gaussian mixture of 32, which is more by 9.54% than that of (MFCC+LPRP). The (MFCC+LPRP) combination provides a reduced EER, which is less by 6.73% than that of (MFCC+Δ+ΔΔ). This is due to the combination of both vocal tract and excitation source information: the LPR contains the glottal closure instants (GCIs) related to the excitation source information, whereas the LPRP contains speaker-specific sequence information [24], so the LPR and LPRP features contain different characteristics of the speaker-specific excitation information.

The (MFCC+Δ+ΔΔ+LPR) and (MFCC+Δ+ΔΔ+LPRP) combinations have minimum EERs of 37.63% and 37.03% respectively for a Gaussian mixture of 16, and provide reduced EERs which are less by 0.12% and 0.59% than those of (MFCC+LPR) and (MFCC+LPRP) respectively. The combination (MFCC+Δ+ΔΔ+LPR+LPRP) provides a minimum EER of 34.32% for a Gaussian mixture of 32. Further, this combination provides a reduced EER which is less by 10.03%, 3.43%, 3.3%, 3.31% and 2.71% than that of (MFCC+Δ+ΔΔ), (MFCC+LPR), (MFCC+LPRP), (MFCC+Δ+ΔΔ+LPR) and (MFCC+Δ+ΔΔ+LPRP) respectively. The combined (MFCC+Δ+ΔΔ+LPR+LPRP) system performs better than the other combined systems for all training and testing data. This is because, in the case of (MFCC+Δ+ΔΔ+LPR+LPRP), the speaker-specific information includes the static and transitional characteristics as well as the excitation source. The same trend is observed for the remaining data sizes, as given in Figure 8 and Figure 9. From the above results, we have observed that if the training and testing data are increased, the performance of the combined system shows a significant improvement in EER.

To study the significance of LPCC and its combined systems, the same set of experiments is conducted as in the case of MFCC and its combined systems. The experimental results are shown in Figure 10 and Figure 11 for the combination of the features (LPCC, Δ, ΔΔ, LPR and LPRP), with the modeling done by GMM.
Figure 10. Performance of the speaker verification system for LPCC and the different combined systems using (a) 3s-3s, (b) 4s-4s and (c) 5s-5s data, with modeling using GMM (equal error rate (%) versus Gaussian mixtures of 16 to 256 for LPCC+Δ+ΔΔ, LPCC+LPR, LPCC+LPRP, LPCC+Δ+ΔΔ+LPR, LPCC+Δ+ΔΔ+LPRP and LPCC+Δ+ΔΔ+LPR+LPRP).
Figure 11. Performance of the speaker verification system for LPCC and the different combined systems using (a) 6s-6s, (b) 9s-9s and (c) 12s-12s data, with modeling using GMM (equal error rate (%) versus Gaussian mixtures of 16 to 256 for the same six combinations as in Figure 10).
Considering the 3s-3s data, the following experimental results are observed from Figure 10(a). The combination of features (LPCC+Δ+ΔΔ) provides a minimum EER of 41.37% for a Gaussian mixture of 16, while the individual features LPCC, Δ and ΔΔ provide minimum EERs of 43.08%, 44.89% and 44.76% respectively for a Gaussian mixture of 16. The (LPCC+Δ+ΔΔ) combination provides a reduced EER, which is less by 1.71%, 3.52% and 3.39% than that of LPCC, Δ and ΔΔ respectively. The performance of LPCC and its derivatives (LPCC+Δ+ΔΔ) is better than the individual performance of LPCC, ΔLPCC and ΔΔLPCC. This is due to the combination of both the static and dynamic characteristics of the speech data in training and testing.

The (LPCC+LPR) combination provides a minimum EER of 36.26% for a Gaussian mixture of 16, which is less by 6.48% than that of LPR and less by 5.11% than that of (LPCC+Δ+ΔΔ); the combination (LPCC+LPR) therefore performs better than (LPCC+Δ+ΔΔ). The (LPCC+LPRP) combination has a minimum EER of 37.57% for a Gaussian mixture of 32, while the individual feature LPRP has an EER that is more by 9.59% than this. The EER of (LPCC+LPRP) differs from that of (LPCC+LPR) by 1.31%. The combination (LPCC+LPRP) performs better than (LPCC+LPR) because LPR and LPRP contain different speaker-specific information.

The (LPCC+Δ+ΔΔ+LPR) and (LPCC+Δ+ΔΔ+LPRP) combinations have minimum EERs of 36.12% and 37.54% respectively for a Gaussian mixture of 16, and provide reduced EERs which are less by 0.14% and 0.03% than those of (LPCC+LPR) and (LPCC+LPRP) respectively. The combination (LPCC+Δ+ΔΔ+LPR+LPRP) provides a minimum EER of 33.69% for a Gaussian mixture of 16; its EER is less by 7.68%, 2.57%, 3.88%, 2.43% and 3.85% than that of (LPCC+Δ+ΔΔ), (LPCC+LPR), (LPCC+LPRP), (LPCC+Δ+ΔΔ+LPR) and (LPCC+Δ+ΔΔ+LPRP) respectively.
Table 3. Comparison of minimum EER (%) for different combinations of features using different amounts of training and testing data for GMM

Combined features        3s-3s    4s-4s    5s-5s    6s-6s    9s-9s    12s-12s
MFCC+Δ+ΔΔ                44.35    44.12    41.86    41.41    38.07    32.61
MFCC+LPR                 37.75    37.57    37.48    37.98    33.55    32.06
MFCC+LPRP                37.62    36.99    36.22    36.54    32.07    31.44
MFCC+Δ+ΔΔ+LPR            37.63    37.52    37.34    36.40    31.65    31.03
MFCC+Δ+ΔΔ+LPRP           37.03    36.35    36.12    36.31    31.33    31.23
MFCC+Δ+ΔΔ+LPR+LPRP       34.32    33.96    33.83    33.46    28.95    28.31
LPCC+Δ+ΔΔ                41.37    39.97    38.88    37.98    32.61    30.26
LPCC+LPR                 36.26    37.86    37.48    36.58    32.06    33.42
LPCC+LPRP                37.57    36.94    36.17    36.31    30.44    29.53
LPCC+Δ+ΔΔ+LPR            36.12    36.85    35.64    35.86    31.32    30.89
LPCC+Δ+ΔΔ+LPRP           37.54    36.48    36.12    34.13    30.15    29.12
LPCC+Δ+ΔΔ+LPR+LPRP       33.69    33.78    33.73    33.42    28.31    28.22
The combined (LPCC+Δ+ΔΔ+LPR+LPRP) system performs better compared to the other combined systems for all training and testing data. This is because the combination (LPCC+Δ+ΔΔ+LPR+LPRP) contains the static and transitional characteristics as well as the excitation source information. The same trend is observed for the remaining data sizes, as shown in Figure 10 and Figure 11. Table 3 provides the comparison of the different combined systems for different amounts of training and testing data. The EER of (LPCC+Δ+ΔΔ) is less by 2.98%, 4.15%, 2.98%, 3.43%, 5.46% and 2.35% than that of (MFCC+Δ+ΔΔ) for the 3s-3s, 4s-4s, 5s-5s, 6s-6s, 9s-9s and 12s-12s data respectively. The same trend has been observed for the remaining combinations. In this experimental study, we observed that when both the training and testing data are limited, the (LPCC+Δ+ΔΔ+LPR+LPRP) combination has the minimum EER compared to all the other combinations in the case of GMM modeling. This is because LPCC and its derivatives, along with the excitation source features, are able to capture more speaker-specific information from the speech data, which creates different characteristics between speakers [25].

To study the significance of GMM-UBM for the combination of features, the following experiments are analysed. The performance of the speaker verification system for the combination of features (MFCC, LPCC, Δ, ΔΔ, LPR and LPRP) for different data sizes, using GMM-UBM as the modeling technique, is shown in Figures 12 to 15. Further, the minimum EER across the various Gaussian mixtures for each speech data size is tabulated in Table 4.

Considering the 3s-3s data, the following points are observed in this experimental setup, as shown in Figure 12 and Figure 14. The combinations of features (MFCC+Δ+ΔΔ) and (LPCC+Δ+ΔΔ) have minimum EERs of 38.84% and 36.44% respectively; (LPCC+Δ+ΔΔ) thus provides a reduced EER, which is less by 2.4% than that of (MFCC+Δ+ΔΔ). The combinations (MFCC+LPR) and (MFCC+LPRP) have minimum EERs of 38.3% and 36.54% respectively, while the minimum EERs of (LPCC+LPR) and (LPCC+LPRP) are 36.94% and 34.55% respectively; (LPCC+LPR) and (LPCC+LPRP) thus provide EERs that are 1.36% and 1.99% less than those of (MFCC+LPR) and (MFCC+LPRP) respectively.

The combinations (MFCC+Δ+ΔΔ+LPR) and (MFCC+Δ+ΔΔ+LPRP) have minimum EERs of 34.73% and 34.74% respectively, while the minimum EERs of (LPCC+Δ+ΔΔ+LPR) and (LPCC+Δ+ΔΔ+LPRP) are 34.12% and 33.93% respectively. The (LPCC+Δ+ΔΔ+LPR) and (LPCC+Δ+ΔΔ+LPRP) combinations therefore provide reduced EERs, which are less by 0.61% and 0.81% than those of (MFCC+Δ+ΔΔ+LPR) and (MFCC+Δ+ΔΔ+LPRP) respectively. The combination of features (MFCC+Δ+ΔΔ+LPR+LPRP) provides a reduced EER, which is less by 6.19%, 4.29%, 2.08%, 3.89% and 2.09% than that of (MFCC+Δ+ΔΔ), (MFCC+LPR), (MFCC+LPRP), (MFCC+Δ+ΔΔ+LPR) and (MFCC+Δ+ΔΔ+LPRP) respectively. The combination of features (LPCC+Δ+ΔΔ+LPR+LPRP) provides a reduced EER, which is less by 2.61%, 4.47%, 0.72%, 2.29% and 0.1% than that of (LPCC+Δ+ΔΔ), (LPCC+LPR), (LPCC+LPRP), (LPCC+Δ+ΔΔ+LPR) and (LPCC+Δ+ΔΔ+LPRP) respectively.