TELKOMNIKA, Vol.14, No.2, June 2016, pp. 588~597
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v14i1.2353

Received August 6, 2015; Revised December 10, 2015; Accepted January 2, 2016

An Optimum Database for Isolated Word in Speech
Recognition System

Syifaun Nafisah*1, Oyas Wahyunggoro2, Lukito Edi Nugroho3
1,2,3Department of Electrical Engineering and Information Technology, Universitas Gadjah Mada
1Department of Library and Information Science, Islamic State University Sunan Kalijaga Yogyakarta,
Jalan Grafika No. 2, Yogyakarta 55281, 62 274 552305
*Corresponding author, e-mail: syifaun@yahoo.com1, oyas@ugm.ac.id2, lukito.nugroho@gmail.com3

Abstract
A speech recognition system (ASR) is a technology that allows computers to receive input in the form of spoken words. This technology requires sample words, stored in a database, for the pattern matching process. There is no reference that serves as a fundamental theory for developing a database in ASR, so research on database development to optimize system performance is required. Mel-scale frequency cepstral coefficients (MFCCs) are used to extract the characteristics of the speech signal, and a backpropagation neural network on quantized vectors is used to evaluate the likelihood, that is, the maximum log values with respect to the nearest pattern in the database. The results show that the robustness of ASR is optimum when 140 samples of reference data are used for each word, with an average accuracy of 99.95% and a processing duration of 27.4 msec. The investigation also found that gender does not have a significant influence on accuracy. From these results it is concluded that the performance of ASR can be increased by optimizing the database.

Keywords: Optimum, Database, ASR, Backpropagation, MFCCs

Copyright © 2016 Universitas Ahmad Dahlan. All rights reserved.

1. Introduction
Automatic speech recognition (ASR) uses processes and related technology to convert speech signals into a sequence of words or other linguistic units by means of an algorithm embedded in a computer program. Present speech understanding systems are capable of handling vocabularies of thousands of words in operational environments. ASR can also serve special purposes such as machine translation.
ASR is now armed with vast amounts of example translations and powerful computers, and it has shown significant progress toward that goal. One algorithm for constructing an automatic machine translation system uses statistical analysis of bilingual parallel corpora. It is the best machine translation algorithm for some language pairs up to now [1, 2]. In this algorithm, part of speech (PoS) is used as a feature to improve the quality of the translation [3].
In that study, the experiments were conducted on long sentences (30 words). The results show that the average increase in translation accuracy when using grammatical PoS, compared with not using PoS, is 2.23%. The accuracy increases by about 4.13%, with the accuracy reaching up to 6.45%, when the machine uses PoS computing.
ASR consists of two major steps. First, DSP-style operations convert the signal from analog to digital, and the signal is processed to extract the key feature vectors. Second, the key feature vectors are passed to the pattern matching phase. In this step, ASR requires data samples that are stored in a database. Although this technology has been widely used, it still requires human review and intervention to ensure an accuracy rate of up to 100%. Several studies have documented error rates of 3% for a vocabulary size of 200, 7% for 5000, and 45% for more than 100000 [4]. Rabiner and Juang (2006) also reported word error rates for a range of ASR systems, as described in Table 1 [5].
There are several methods to improve the performance of ASR. The normalized Euclidean distance can be used as a method for the matching process. In this method, recognition is performed using the nearest neighbor and the sum of absolute errors. Overall, the accuracy of the method is 96.36% [5].
The hidden Markov model (HMM) is also a widely used method in speech recognition. However, the accuracy obtained with HMM is strongly influenced by the optimization of the extraction process and the modelling methods. The experiments
on the hybrid HMM-genetic algorithm (GA), which optimizes the Baum-Welch method in the training process, increased the accuracy from 20% to 41%. This shows that the combination gives better results than the HMM method alone [6]. Linear predictive coding (LPC) and dynamic time warping (DTW) have also been used as techniques in the matching process. That study concluded that these techniques are useful for both speaker-dependent and speaker-independent ASR [7].
Another method used to make speech recognition faster, more efficient and more accurate is the combination of mel-frequency cepstral coefficients (MFCCs) and minimum distance techniques. Based on the experiments, this combination was found to give the best performance in most cases, with an overall efficiency of 95%. The study also reveals that the HMM algorithm achieves a system efficiency of up to 98% in identifying the most commonly used isolated words [8].

Table 1. Word error rates for a range of speech recognition systems

Vocabulary size     Word Error Rate
11                  0.3% - 5.0%
1000                2.0%
2500                2.5%
64000               6.6%
210000              ~ 15%
28000 - 45000       ~ 27% - 35%

Will speech-processing systems ever reach the accuracy of a human transcriber? Realistically, this will not happen. Wankhede et al. determined the dimensions that affect the accuracy of a speech recognition system. These dimensions are vocabulary size, confusability of words, speaker dependence or independence, isolated, discontinuous or continuous speech, read or spontaneous speech, and adverse conditions [4]. These dimensions produce varied patterns in the reference data in the database. The varied patterns in the database make the training phase difficult when the system is tested using various sample groups. How to minimize the effects of these dimensions in the database has become a task that is constantly scrutinized by ASR research. Based on this consideration, this paper presents a method that develops an optimum database to produce a robust speech recognition system with a recognition accuracy approaching 100%.

2. Proposed Method
The main goal of this study is to develop a database for isolated words. The investigation covers how much speech data is needed and how gender in the database affects the accuracy of ASR. It includes the algorithms that are used in developing the database of reference data. The block diagram in Figure 1 shows the proposed model.

Figure 1. Block Diagram of System

3. Research Method
3.1. Speech Data Selection
This section details the steps involved in developing a database for ASR. The database was compiled from 30 respondents, consisting of 20 male and 10 female
speakers. First, recording was carried out to acquire data consisting of isolated words in the Indonesian language. The recording must be clean, with minimal background disturbance. Any mistakes made while recording were corrected by re-recording or by making the corresponding changes in the transcription set. To ensure minimal disturbance, the recording was performed in a sound-treated audiometric booth using a PG48-LC vocal microphone and a EuroRack UB1002FX mini mixer connected to a computer. The distance between the mouth and the microphone was carefully maintained at one inch from the left-hand corner of the mouth, and the duration of pronunciation for each word was about two seconds [9]. The test data were grouped into four data sets and tested using the combinations shown in Table 2.

Table 2. The combinations of data set

Combination   Training set   Testing set
1             Set I          Set II
2             Set I          Set III
3             Set I          Set IV
4             Set II         Set I
5             Set II         Set III
6             Set II         Set IV
7             Set III        Set I
8             Set III        Set II
9             Set III        Set IV

Set I consists of the male speech data, Set II consists of the female speech data, and Set III is a combination of male and female speakers. Sets I, II and III contain the speakers whose speech fills the database as reference data. Set IV consists of a combination of male and female speakers whose speech is not in the database as reference data. Each data set is tested against the other data sets to evaluate the accuracy of the system. The speakers range from 15 to 22 years of age. They were asked to utter a set of words in a normal manner, repeating each utterance 12 times in a low-noise environment to reduce acoustic interference, and only the first eight repetitions were used in the training/testing phase [10]. The data were segregated individually and stored as *.wav files using Cool Edit Pro 2.0. The set of words is listed in Table 3.

Table 3. The List of Words

     The variation of words
No   Family Members (1)   Numbers (2)   The name of city (3)   Noun (4)   Human body (5)
1    a-yah                sa-tu         ja-kar-ta              ro-ti      ma-ta
2    i-bu                 du-a          ban-dung               na-si      gi-gi
3    a-dik                ti-ga         se-ma-rang             ke-ju      pi-pi
4    ka-kak               em-pat        yog-ya-kar-ta          bo-la      hi-dung
5    sa-ya                li-ma         su-ra-ba-ya            to-pi      ta-ngan
6    ne-nek               e-nam         den-pa-sar             bu-ku      ka-ki
7    ka-kek               tu-juh        ma-ka-sar              me-ja      le-her
8    bi-bi                de-la-pan     me-dan                 kur-si     pung-gung
9    pa-man               sem-bi-lan    pon-ti-a-nak           pin-tu     ping-gang
10   sa-u-da-ra           se-pu-luh     jem-ber                sen-dok    bi-bir

3.2. Feature Extraction
The first step in the feature extraction process is to segment the signal into specific parts of the utterance. This is done by listening to the wave file of each recording to find the boundaries of the specific parts, then cutting the wave file manually to extract them. For example, the word ‘Pintu’ is cut into the wave files ‘Pin’ and ‘Tu’. Another example is the word ‘Jakarta’, which is segmented into ‘Ja’, ‘Kar’ and ‘Ta’. The recording process yielded 13230 final words, and after the segmentation step these 13230 final words produced 26850 wave files as reference data in the database, as described in Table 4.

Table 4. The final words of data speech

Speech Data   Utterances   Segmented Files
Female        8380         17900
Male          4850         8950
Total         13230        26850

In this process, MFCCs were used as acoustic features because they take into account the sensitivity of human perception with respect to frequency. Figure 2 shows the step-by-step computation of MFCCs in this investigation [11-12].

Figure 2. Block Diagram of Speech Analysis Procedure

3.2.1. Pre-emphasis
The first step in the MFCC algorithm is to send the speech signal through a high-pass filter using the following equation:

$s_2(n) = s(n) - a \cdot s(n-1)$    (1)

where $s(n)$ is the input speech signal, $s_2(n)$ is the output signal, and the value of $a$ used in this study is between 0.9 and 1.0. The z-transform of the filter is:

$H(z) = 1 - a \cdot z^{-1}$    (2)

The goal of pre-emphasis is to compensate for the high-frequency part that was suppressed by the human sound production mechanism. The result of this process is used as the input to the frame blocking process.
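
The pre-emphasis filter can be sketched in a few lines. This is a minimal illustration, assuming the standard first-order form s2(n) = s(n) - a*s(n-1); the coefficient a = 0.97 is an assumed value inside the 0.9 to 1.0 range stated above, and the function name `pre_emphasis` is hypothetical.

```python
def pre_emphasis(signal, a=0.97):
    """Apply s2(n) = s(n) - a * s(n-1).

    The first sample has no predecessor, so it is passed through unchanged.
    The default a = 0.97 is an assumption within the 0.9-1.0 range.
    """
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - a * signal[n - 1])
    return out


# A constant (zero-frequency) signal is almost entirely removed by the filter.
emphasized = pre_emphasis([1.0, 1.0, 1.0, 1.0], a=0.97)
```

Because a constant input is attenuated to 1 - a while rapid sample-to-sample changes pass through, the filter behaves as the high-pass stage described in this section.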
3.2.2. Frame Blocking
After the pre-emphasis process, the signals were segmented into frames. In this study, the sample rate is 44.1 kHz and the frame size is 1024 sample points, so the frame duration is:

$\frac{1024}{44100} \approx 0.02\ \text{sec} \approx 20\ \text{msec}$

Based on this calculation, the speech data are segmented into frames of about 20 msec, with an overlap of 50% between frames. With an overlap of 512 points, the frame rate is:

$\frac{44100}{1024 - 512} \approx 86.13$

In this process, the signals were zero-padded to a length of 50000, taken as the nearest length for power-of-two frames.
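
As a sketch of the frame blocking step, the following splits a signal into 1024-point frames with a 512-point hop (50% overlap) and recomputes the frame duration and frame rate from the figures above. The helper `frame_signal` is a hypothetical name, and dropping trailing samples that do not fill a frame is an assumption; the study instead zero-pads the signal.

```python
def frame_signal(signal, frame_size=1024, hop=512):
    """Split a signal into overlapping frames.

    Trailing samples that do not fill a whole frame are dropped here
    (an assumption; zero padding would keep them).
    """
    last_start = len(signal) - frame_size
    return [signal[start:start + frame_size]
            for start in range(0, last_start + 1, hop)]


sample_rate = 44100
frame_duration_ms = 1000.0 * 1024 / sample_rate   # about 23.2 msec
frame_rate = sample_rate / (1024 - 512)           # about 86.13 frames per second

frames = frame_signal(list(range(4096)))
```

The exact duration of a 1024-point frame at 44.1 kHz is about 23.2 msec, which the text rounds to 20 msec.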
3.2.3. Windowing
The next process is to window all of the frames. In this step, each frame is multiplied by a window function to preserve the continuity of the first and last points of the frame. If the signal in a frame is denoted by $s(n)$, $n = 0, \ldots, N-1$, then the signal after windowing is:

$s(n) \cdot w(n)$    (3)

where $w(n)$ is the window function. In this study, a rectangle window was chosen as the window function because it produces higher accuracy than the other functions, as presented in Figure 3.

Figure 3. The accuracy based on the various functions of window

The rectangle window is defined by:

$w(n) = 1, \quad 0 \le n \le N-1$    (4)

In practice, the value is set to 0.97. MATLAB also provides the command rectwin for generating the curve of a rectangle window.
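
The windowing step multiplies each frame sample by the matching window sample. The sketch below is illustrative only: the function names are hypothetical, and the Hamming window with alpha = 0.46 is included purely as a point of comparison under its usual textbook definition, not as something taken from this study's implementation.

```python
import math


def rectangular_window(n_points):
    """Rectangle window: every sample has weight 1."""
    return [1.0] * n_points


def hamming_window(n_points, alpha=0.46):
    """Textbook Hamming window, shown only for comparison (alpha is assumed)."""
    return [(1.0 - alpha) - alpha * math.cos(2.0 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def apply_window(frame, window):
    """Element-wise product s(n) * w(n)."""
    return [s * w for s, w in zip(frame, window)]


frame = [1.0, 2.0, 3.0, 4.0, 5.0]
rect_out = apply_window(frame, rectangular_window(len(frame)))
hamm_out = apply_window(frame, hamming_window(len(frame)))
```

A rectangle window leaves the frame unchanged, whereas a tapered window such as Hamming attenuates the frame edges.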
3.2.4. Fast Fourier Transform (FFT)
After windowing, spectral analysis shows that different timbres in speech signals correspond to different energy distributions over frequency. To obtain the magnitude frequency response of each frame, the FFT was performed. In this process, the signal is assumed to be periodic and continuous when wrapping around. If this is not the case, the FFT can still be performed, but the discontinuity at the first and last points of the frame is likely to introduce undesirable effects in the frequency response. To deal with this problem, each frame is multiplied by a rectangle window to increase its continuity at the first and last points. If the input frame consists of three identical fundamental periods, then the magnitude frequency response will have two zeros inserted between every two neighboring points of the frequency response of a single fundamental period. In other words, the harmonics of the frequency response are generally caused by the repeating fundamental periods in the frame. However, in this study, to extract envelope-like features, this step uses triangular bandpass filters, as explained in the next step.
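
To illustrate what the magnitude frequency response of a frame looks like, the sketch below computes it with a direct discrete Fourier transform. Note the substitution: this is a plain O(N^2) DFT written with the standard library, not an FFT; it yields the same magnitudes and is used here only to keep the example dependency-free. The test tone and the function name are assumptions.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitude of the DFT for bins 0..N/2 (direct O(N^2) evaluation)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc))
    return spectrum


# A sine that completes exactly 4 periods inside a 32-point frame.
tone = [math.sin(2.0 * math.pi * 4 * t / 32) for t in range(32)]
mags = magnitude_spectrum(tone)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
```

Because the tone repeats a whole number of periods within the frame, all of its energy lands in a single bin, which matches the remark above about harmonics arising from repeated fundamental periods.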
3.2.5. Mel
After the coefficients were kept, this investigation computes the DCT of the log filterbank energies. There are two main reasons for this. First, the filterbanks in this study all overlap; second, the filterbank energies are quite correlated with each other. To compute the DCT of the log filterbank energies, the frequency of the signal is converted into the Mel scale using the following equation:

$\mathrm{mel}(f) = 2595 \cdot \log_{10}\left(1 + f / 700\right)$    (5)

As a result of this step, diagonal covariance matrices can be used to model the features in the classifier. From this step, the matrices are composed of 96 cepstral coefficients per feature, consisting of 47 MFCC coefficients, 47 MFCC delta features that indicate the degree of spectral change, one energy feature, and one delta-energy feature. In the recognizer, cepstral mean subtraction (CMS) of the MFCC coefficients was applied to remove some of the effects of noise.
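
The Mel conversion in equation (5), mel(f) = 2595 * log10(1 + f/700), and its inverse can be sketched as follows. The helper that spaces filter centres evenly on the Mel scale is an illustrative assumption: the count of 20 filters follows the N = 20 used later in the DCT step, while the 0 to 22050 Hz range (the Nyquist frequency at 44.1 kHz) and all function names are assumed.

```python
import math


def hz_to_mel(f_hz):
    """Equation (5): mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of equation (5)."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def filter_center_frequencies(n_filters=20, f_low=0.0, f_high=22050.0):
    """Centre frequencies (Hz) of bands equally spaced on the Mel scale.

    The frequency range is an assumption, not a value from the study.
    """
    mel_low, mel_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (mel_high - mel_low) / (n_filters + 1)
    return [mel_to_hz(mel_low + step * i) for i in range(1, n_filters + 1)]


centers = filter_center_frequencies()
```

Equal spacing on the Mel scale places the centres close together at low frequencies and progressively further apart at high frequencies, which reflects the perceptual motivation given for MFCCs.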
3.2.6. Discrete Cosine Transform (DCT)
The next step is to apply the DCT to the 20 log energies obtained from the triangular bandpass filters to obtain L mel-scale cepstral coefficients. The formula for the DCT is shown next.
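
$C_m = \sum_{k=1}^{N} \cos\left[ m \cdot (k - 0.5) \cdot \pi / N \right] \cdot E_k, \quad m = 1, 2, \ldots, L$    (6)

where $E_k$ is the log energy from the k-th triangular bandpass filter, N is the number of triangular bandpass filters, and L is the number of MFCCs. In this step, the investigation set N=20 and L=12.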
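
The DCT step can be sketched directly from its definition, C_m = sum over k = 1..N of cos[m (k - 0.5) pi / N] * E_k for m = 1..L, with N = 20 log energies and L = 12 coefficients. The function name and the synthetic input are assumptions made for illustration.

```python
import math


def mel_cepstrum(log_energies, n_coeffs=12):
    """DCT of the log filterbank energies, returning C_1 .. C_L."""
    n = len(log_energies)
    return [
        sum(math.cos(m * (k - 0.5) * math.pi / n) * log_energies[k - 1]
            for k in range(1, n + 1))
        for m in range(1, n_coeffs + 1)
    ]


# Synthetic input shaped like the m = 3 cosine basis function (N = 20).
energies = [math.cos(3 * (k - 0.5) * math.pi / 20) for k in range(1, 21)]
coeffs = mel_cepstrum(energies)
flat = mel_cepstrum([1.0] * 20)
```

Because the cosine basis functions are orthogonal, an input shaped like one basis function produces a single non-zero coefficient, and a flat input produces coefficients that are all close to zero. This decorrelating behaviour is the reason the DCT is applied to the correlated filterbank energies.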
3.2.7. Log Energy
After mel-filterbank processing, log processing is applied. The energy within a frame is also an important feature that can be easily obtained. Hence this step adds the log energy as the 13th feature to the MFCCs.
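
A minimal sketch of the log-energy feature follows. The logarithm base (base 10) and the small floor that guards against log(0) on silent frames are assumptions, since the text does not specify them; the function name and the placeholder MFCC vector are hypothetical.

```python
import math


def log_energy(frame, floor=1e-10):
    """Log of the summed squared samples in a frame.

    Base-10 logarithm and the floor value are assumptions.
    """
    energy = sum(s * s for s in frame)
    return math.log10(max(energy, floor))


mfcc = [0.0] * 12                      # placeholder for 12 MFCCs of one frame
frame = [1.0] * 10                     # toy frame with energy 10
features = mfcc + [log_energy(frame)]  # log energy becomes the 13th feature
```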
3.2.8. Delta Cepstrum
It is also advantageous to use the time derivatives of the energy and the MFCCs as new features, which represent the velocity and acceleration of both. The equation to compute these features is:

$\Delta C_m(t) = \frac{\sum_{\tau=-M}^{M} \tau \cdot C_m(t+\tau)}{\sum_{\tau=-M}^{M} \tau^2}$    (7)

The value of M is set to 2. In this study, adding the velocity gives a feature dimension of 26; with the acceleration also added, the study uses 96-dimensional features for recognition.
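
The delta (velocity) computation can be sketched as a regression over a window of 2M + 1 frames with M = 2, following the form delta C_m(t) = [sum of tau * C_m(t + tau)] / [sum of tau^2] for tau from -M to M. Handling the sequence edges by repeating the first and last frame is an assumption, as is the function name.

```python
def delta(features, M=2):
    """Time derivative of each coefficient across frames.

    `features` is a list of frames, each a list of coefficients.
    Edges are handled by repeating the first/last frame (an assumption).
    """
    denom = sum(tau * tau for tau in range(-M, M + 1))  # 10 when M = 2
    n_frames = len(features)
    out = []
    for t in range(n_frames):
        row = []
        for c in range(len(features[0])):
            num = sum(tau * features[min(max(t + tau, 0), n_frames - 1)][c]
                      for tau in range(-M, M + 1))
            row.append(num / denom)
        out.append(row)
    return out


# One coefficient that rises by exactly 1 per frame.
ramp = [[float(t)] for t in range(6)]
velocity = delta(ramp)
acceleration = delta(velocity)  # applying delta twice gives the acceleration
```

For a coefficient that grows linearly over time, the interior delta values equal the slope, which is a quick way to sanity-check the implementation.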
3.3. Classifier
The final step of this study is to send the features to the classifier. In this study, the BPNN has 96 input nodes, 10 hidden nodes, and 50 output nodes, and was trained using 4364 samples from each word. The architecture of the classifier is shown in Figure 4.

Figure 4. 96-10-50 BPNNs Architecture

Figure 5. The MSE of recognizer using various numbers of hidden nodes

The experiments fixed the number of hidden nodes at ten. Figure 5 shows the Mean Square Error (MSE) of the utterance type recognizer with one hidden layer and various numbers of hidden nodes.

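
The 96-10-50 layout can be illustrated with a forward pass through a small fully connected network. This is only a sketch of the architecture: the sigmoid activation, the random weight initialisation, and all names are assumptions, and no training is performed here.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """One weight row per output node; the last entry of each row is the bias."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(layer, inputs):
    """Weighted sum plus bias, followed by a sigmoid (assumed activation)."""
    return [sigmoid(sum(w * x for w, x in zip(weights, inputs)) + weights[-1])
            for weights in layer]


rng = random.Random(0)
hidden_layer = init_layer(96, 10, rng)   # 96 inputs  -> 10 hidden nodes
output_layer = init_layer(10, 50, rng)   # 10 hidden  -> 50 output nodes (words)

features = [0.0] * 96                    # placeholder 96-dimensional feature vector
hidden = forward(hidden_layer, features)
scores = forward(output_layer, hidden)
predicted_word = max(range(len(scores)), key=scores.__getitem__)
```

Each of the 50 output nodes corresponds to one word in the vocabulary, so the index of the largest output score is taken as the recognized word.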
4. Results and Discussion
In the testing phase, 45 respondents were tested to measure the performance of the system. Of that number, 30 respondents are speakers whose speech fills the database as reference data, while 15 respondents are unrecognized speakers. The test procedure uses the
scenario already shown in Table 2. All of the data were trained using the BPNN algorithm. The number of iterations is governed by the epoch, which is set as a variable. The variable was needed because the system iterated until all the errors were below the threshold of 0.5, or until the number of iterations reached 1,000,000. An epoch was trained with a fixed training set until all the errors produced by the data pairs were below the threshold. Every epoch therefore comprised a variable number of backpropagation iterations.
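
The stopping rule described above (iterate until every error is below 0.5, or until 1,000,000 iterations are reached) can be sketched as a simple loop. The callable `run_epoch` is a hypothetical stand-in for one backpropagation pass that returns the per-pattern errors; the fake epoch used below only demonstrates the control flow.

```python
def train_until_converged(run_epoch, error_threshold=0.5,
                          max_iterations=1_000_000):
    """Repeat `run_epoch` until all errors fall below the threshold
    or the iteration cap is reached. Returns the number of iterations."""
    iterations = 0
    while iterations < max_iterations:
        errors = run_epoch()          # hypothetical: one backpropagation pass
        iterations += 1
        if max(errors) < error_threshold:
            break
    return iterations


# Fake epoch whose single error halves on every call: 2.0, 1.0, 0.5, 0.25, ...
state = {"err": 4.0}


def fake_epoch():
    state["err"] /= 2.0
    return [state["err"]]


n_iter = train_until_converged(fake_epoch)
capped = train_until_converged(lambda: [1.0], max_iterations=10)
```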
The system is first tested with the data of one respondent and its accuracy is calculated. The number of respondents is then increased to two, three, and so on, until the number of respondents in the database that produces a robust recognition system is found. The results of the experiments are shown in Table 5 and Table 6.
In Table 5 and Table 6, (1), (2), (3), (4) and (5) are the groups of words previously described in Table 3. Based on the experiments in Table 5 and Table 6, the number of respondents in the database leads to differences in accuracy. For example, with one respondent, the accuracies produced for the five groups are 99.98%, 66.65%, 72.21%, 79.61% and 85.67%. The processing times are 13 msec, 10 msec, 14 msec, 12 msec and 10 msec. Based on these values, the accuracy of the system using one respondent is 80.82% on average and the average processing duration is 11.8 msec.
The investigation noted fluctuations in accuracy as the number of respondents in the database is increased. For example, when the number of respondents is increased to six, the accuracy averaged over all groups increases to 90.48%, with a processing duration of 20 msec on average, but this accuracy is not stable. This can be seen from the decrease in accuracy in some groups when the number of respondents is increased to twelve. With reference data from twelve respondents in the database, the accuracies of the groups are 99.98%, 83.31%, 83.31%, 88.87% and 79.61%. From this result, the average accuracy is 87.02% and the processing duration is 22 msec on average. In other words, there has been a decrease in accuracy of 3.46%.

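
The averages quoted in this discussion can be rechecked with a few lines of arithmetic. The sketch below uses only the per-group values stated in the text; the variable names are hypothetical.

```python
# Per-group accuracies (%) for word groups (1)-(5), as quoted in the text.
one_respondent = [99.98, 66.65, 72.21, 79.61, 85.67]
twelve_respondents = [99.98, 83.31, 83.31, 88.87, 79.61]
six_respondents_mean = 90.48   # average quoted for six respondents


def mean(values):
    return sum(values) / len(values)


mean_one = round(mean(one_respondent), 2)
mean_twelve = round(mean(twelve_respondents), 2)
drop = round(six_respondents_mean - mean_twelve, 2)
```

The computed values agree with the reported 80.82% for one respondent, 87.02% for twelve respondents, and the 3.46% decrease relative to six respondents.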
Table 5. The accuracy of ASR

∑            ∑ data      ∑ segmented   The accuracy
respondent   reference   files         (1)     (2)     (3)     (4)     (5)     Average
1            486         985           99.98   66.65   72.21   79.61   85.67   80.82
2            921         1868          99.98   66.56   99.98   88.84   99      90.87
3            1317        2673          99.98   83.31   83.31   88.87   79.61   87.02
4            1767        3586          99.99   83.82   83.82   89.21   72.21   85.81
5            2214        4493          99.98   83.32   83.32   88.87   89.9    89.08
6            2601        5280          99.99   99.99   86.71   99.99   65.71   90.48
7            3036        6163          99.97   83.32   83.32   88.87   99      90.90
8            3486        7076          99.96   99.96   86.67   99.96   90.71   95.45
9            3930        7977          83.31   99.96   99.96   99.96   72.21   91.08
10           4380        8890          99.97   93.52   99      99      66.67   91.16
11           4830        9803          99      99.97   95.67   99.97   99      98.72
12           5250        10656         99.98   83.31   83.31   88.87   79.61   87.02
13           5697        11563         85.71   99.96   99      99.96   99      96.73
14           6123        12428         99.98   83.32   83.32   88.88   99.98   91.10
15           6573        13341         99      99.98   89.9    99.98   99      97.57
16           7008        14224         99.98   99.89   99.96   99.96   99.98   99.95
17           7443        15107         99.96   99.98   92.86   99.98   99      98.36
18           7890        16014         99.98   92.86   99.98   99.94   90.9    96.73
19           8340        16927         99.98   99      99.98   99.98   99      99.59
20           8790        17840         99.98   99      99.94   99.98   97.64   99.31
21           9204        18681         99      99.96   99.98   99.98   99.94   99.77
22           9642        19570         97.98   99.96   90.91   99.97   99      97.56
23           10092       20483         97.98   99.98   99.98   99.96   99      99.38
24           10542       21396         99      99.98   99      99.96   99.97   99.58
25           10992       22309         93      99      99.98   99.98   99.98   98.39
26           11439       23216         97.64   93.91   99      99.98   99.96   98.10
27           11880       24111         95.78   99.98   99.98   99.98   99.94   99.13
28           12330       25024         99.87   99.97   99.98   99.96   99.98   99.95
29           12780       25937         97.89   89.91   99.92   99.98   99.98   97.54
30           13230       26850         99.89   99.96   99      98.86   99.98   99.54

Average for 1-15 respondents: 90.92
Average for 16-30 respondents: 98.86

Table 6. The duration of process

∑            ∑ data      ∑ segmented   Processing time (msec)
respondent   reference   files         (1)    (2)    (3)    (4)    (5)    Average
1            486         985           13     10     14     12     10     11.8
2            921         1868          13     13     13     13     15     13.4
3            1317        2673          13     13     15     13     15     13.8
4            1767        3586          14     25     25     21     22     21.4
5            2214        4493          15     13     15     15     10     13.6
6            2601        5280          17     23     23     21     16     20
7            3036        6163          17     10     10     12     26     15
8            3486        7076          21     17     21     22     19     20
9            3930        7977          22     19     22     15     21     19.8
10           4380        8890          24     18     18     20     27     21.4
11           4830        9803          26     27     26     20     18     23.4
12           5250        10656         23     18     17     25     27     22
13           5697        11563         27     16     13     22     12     18
14           6123        12428         26     23     42     15     21     25.4
15           6573        13341         13     32     27     40     23     27
16           7008        14224         28     26     28     28     27     27.4
17           7443        15107         42     105    105    84     67     80.6
18           7890        16014         36     42     106    42     156    76.4
19           8340        16927         42     143    143    109    89     105.2
20           8790        17840         42     56     56     51     37     48.4
21           9204        18681         156    142    135    144    138    143
22           9642        19570         143    87     103    167    93     118.6
23           10092       20483         132    156    167    132    89     135.2
24           10542       21396         156    101    108    136    106    121.4
25           10992       22309         143    91     106    196    123    131.8
26           11439       23216         101    167    104    78     88     107.6
27           11880       24111         140    67     197    54     105    112.6
28           12330       25024         142    136    156    67     56     111.4
29           12780       25937         98     104    165    101    83     110.2
30           13230       26850         147    107    145    54     108    112.2

Table 5 and Table 6 show that ASR produces a robust system when at least 7008 utterance samples are used as reference data in the database. In this study, 7008 data samples were collected from 16 respondents, producing an accuracy of up to 99.95% with a processing duration of 27.4 msec. This shows that a database with a minimum of 7008 reference data samples can produce a more reliable system. The accuracy obtained with fewer than 7008 samples is 90.92% on average, with a lowest accuracy of 80.82% and a highest accuracy of 98.86%. The accuracy obtained with at least 7008 data samples is 98.69%; the lowest accuracy is 95.94% and the highest accuracy reaches 99.95%.
This means that if 50 words need 7008 syllable samples to produce a robust ASR, then each word requires about 140 syllable samples, collected from 16 respondents.
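
As a worked check of the figure of about 140 samples per word, using only the 7008 samples, 50 words and 16 respondents stated above (the per-respondent average is a derived illustration, not a figure from the study):

```latex
\frac{7008\ \text{samples}}{50\ \text{words}} = 140.16 \approx 140\ \text{samples per word},
\qquad
\frac{7008\ \text{samples}}{16\ \text{respondents}} = 438\ \text{samples per respondent on average}
```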
The optimum database in this study is not chosen on accuracy alone. The processing duration is also taken into consideration, since it determines whether the composition of the database is reasonable for use in real applications. As can be seen in Table 6, the average time required to process the data in ASR can be divided into two major groups based on the average accuracy. The 1st group uses fewer than 7008 data samples and the 2nd group uses at least 7008 samples. The study noted that the processing duration of the 1st group is 19.1 msec on average, while that of the 2nd group is 103 msec on average. It can be concluded that the accuracy of the 1st group is lower than that of the 2nd group, but the 1st group is faster than the 2nd group. The investigation also noted that the system is most reliable in both accuracy and processing duration when the number of data samples is 7008, for all types of isolated words. The resulting accuracy is 99.95% with a processing duration of 27.4 msec on average.
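
The two group averages for processing time can be recomputed from the Average column of Table 6. The sketch below copies those thirty values and splits them at 16 respondents (7008 samples); the variable names are hypothetical.

```python
# Average processing time (msec) per number of respondents, from Table 6.
avg_time_ms = [
    11.8, 13.4, 13.8, 21.4, 13.6, 20, 15, 20, 19.8, 21.4,
    23.4, 22, 18, 25.4, 27,                      # 1-15 respondents (< 7008 samples)
    27.4, 80.6, 76.4, 105.2, 48.4, 143, 118.6, 135.2, 121.4, 131.8,
    107.6, 112.6, 111.4, 110.2, 112.2,           # 16-30 respondents (>= 7008 samples)
]

fewer_than_7008 = avg_time_ms[:15]
at_least_7008 = avg_time_ms[15:]

group1_ms = sum(fewer_than_7008) / len(fewer_than_7008)
group2_ms = sum(at_least_7008) / len(at_least_7008)
```

The first group averages about 19.1 msec and the second about 102.8 msec, which the text reports rounded as 103 msec.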
Once the number of respondents is known, the next step is to find out the impact of gender in the database. The impact of gender on accuracy observed in the experiments is presented in Table 7.

Table 7. The impact of gender to the accuracy

Gender of Trainer   Accuracy
                    Male     Female   All Data
Male                99.76    99.76    99.76
Female              99.56    99.56    99.56
Male+Female         99.69    99.67    99.66

The experiment shows that male speakers produce a higher accuracy than female speakers, with a difference of 0.2%. This is not a significant value. The combination of male and female speakers in the database also shows only a small difference, around 0.1%. This means that the gender composition of the database does not affect the accuracy of the system. Based on the experiments, the main conclusion is that the accuracy of a speech recognition system reaches an optimum value of almost 100% through the processing of the reference data stored in the database. The processing of the reference data reaches an optimum value by developing an optimum database. The development is not limited to individual parts of the ASR process but covers all of the processes. If we compare the results obtained by other researchers to improve the performance of ASR, it is clearly visible that improvement efforts that emphasize only one process cannot produce the optimal value of ASR, as described in Table 8.

Table 8. Comparison of performance between the proposed method and other methods

Part of Process in ASR   Method                                                  Accuracy
Segmentation             Contour analysis [15]                                   82.63%
Normalization            Fuzzy logic [16]                                        86.36
Windowing                1. Function of window [17]
                            a. Hanning Window                                    67.65
                            b. Hamming Window                                    66.2%
                            c. Blackman Window                                   68.5%
                            d. Gaussian Window                                   68.5%
                         2. Non standard window [18]                             83.75%
Feature Extraction       1. Extreme Learning Machine [14]                        92.1%
                         2. Mel-Frequencies Cepstral Coefficients (MFCC) [8]     95%
Matching Process         1. Support Vector Machine (SVM) [14]                    80.86%
                         2. Euclidean Distance [5]                               96.36%
                         3. Hybrid Hidden Markov-Genetic Algorithm [6]           increasing 20% - 40%
                         4. Hidden Markov Model                                  98%
Optimum Database         Combination between:                                    up to 99.95%
                            a. Windowing (Rectangle function)                    (98.86% on average)
                            b. Feature Extraction (MFCC)
                            c. Pattern Recognition (BPNNs)
                            d. Number of data reference

5. Conclusion
Based on the experimental results, it can be concluded that an optimum database has a significant effect on ASR. An optimum database in this context covers the methods in all main processes of ASR, including windowing, feature extraction and the classifier, as well as the composition of the reference data in the database. Performance in this study is measured by accuracy and processing duration. The results show that developing a database using the rectangle window function (rectwin) in the frame blocking process and MFCCs in the feature extraction process is able to improve the performance of ASR. This performance is further optimized using the BPNN algorithm in the pattern matching process. The difficulties of the matching process can be overcome by providing speech reference data in the database. Based on experiments on 50 types of words, optimum performance is achieved using at least 7008 reference data samples, which produces an accuracy of up to 99.95% with a processing duration of 27.4 msec. This means that ideally about 140 reference data samples should be provided in the database for each word. In future work, the research will be extended to different languages. This is important to confirm the database composition that can optimize the accuracy of ASR. It is hoped that the results can be used as a fundamental theory in the development of ASR.

Acknowledgements
The authors would like to thank the Ministry of Religious Affairs of the Republic of Indonesia for its support through the doctoral degree program scholarship.

References
[1] Koehn P. Statistical Machine Translation. New York: Cambridge University Press. 2010.
[2] Peng L. A Survey of Machine Translation Methods. TELKOMNIKA Indonesian Journal of Electrical Engineering. 2013; 11(12): 7125-7130.
[3] Sujaini H, Kuspriyanto, Arman AA, Purwarianti A. A Novel Part-of-Speech Set Developing Method for Statistical Machine Translation. TELKOMNIKA Indonesian Journal of Electrical Engineering. 2014; 12(3): 581-588.
[4] Wankhede HS, Chhabria SA, Dharas RV. Human Computer Interaction Using Eye and Speech: The Hybrid Approach. International Journal of Emerging Science and Engineering (IJESE). 2013; 1(7): 54-58.
[5] Rabiner LR, Juang BH. Speech Recognition: Statistical Methods. Elsevier. 2006: 1-18.
[6] Emillia NR, Suyanto, Maharani W. Isolated Word Recognition Using Ergodic Hidden Markov Models and Genetic Algorithm. TELKOMNIKA Indonesian Journal of Electrical Engineering. 2012; 10(1): 129-136.
[7] Shinde RB, Pawar VP. Isolated Word Recognition System based on LPC and DTW Technique. International Journal of Computer Applications. 2012; 59(6): 1-4.
[8] Swamy S, Ramakrishnan KV. An Efficient Speech Recognition System. Computer Science & Engineering: An International Journal (CSEIJ). 2013; 3(4).
[9] Goss B. Listening as information processing. Communication Quarterly. 1982; 30(4).
[10] Polur PD, Gerald ME. Effects of high-frequency spectral component in computer recognition of dysarthric speech based on Mel-cepstral stochastic model. Journal of Rehabilitation Research & Development (JRRD). 2005; 42(3): 363-372.
[11] Doman G. What To Do About Your Brain-injured Child. Square One Publishers. 2005.
[12] Tiwari V. MFCC and its applications in speaker recognition. International Journal on Emerging Technologies. 2010: 19-22.
[13] Furui S. Digital Speech Processing: Synthesis and Recognition. Second edition. CRC Press. 2000.
[14] Hardy, Cheah YN. Question Classification Using Extreme Learning Machine on Semantic Features. Journal of ICT Research and Applications. 2013; 7(1): 36-58.
[15] Kurniawan F, Mohd Rahim MS, Sholihah N, Rakhmadi A, Mohamad D. Characters Segmentation of Cursive Handwritten Words based on Contour Analysis and Neural Network Validation. Journal of ICT Research and Applications. 2011; 5(1): 1-16.
[16] Suyanto, Putro AE. Automatic Segmentation of Indonesian Speech into Syllables using Fuzzy Smoothed Energy Contour with Local Normalization, Splitting, and Assimilation. Journal of ICT Research and Applications. 2014; 8(2): 97-112.
[17] Favero RF. Comparison of mother wavelets for speech recognition. International Conference Speech Science and Technology. 1994: 336-341.
[18] Rozman R, Kodek DM. Improving speech recognition robustness using non-standard windows. European Science Fiction Convention. Ljubljana, Slovenia. 2003.