TELKOMNIKA, Vol.12, No.2, June 2014, pp. 389~396
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v12i2.1945
Received December 12, 2013; Revised April 12, 2014; Accepted May 16, 2014
Nearest Neighbour-Based Indonesian G2P Conversion

Suyanto¹, Agus Harjoko²
¹School of Computing, Telkom University, Jalan Telekomunikasi Terusan Buah Batu, Bandung 40257, Indonesia
²Faculty of Mathematics and Natural Sciences, Gadjah Mada University, Sekip Utara, Bulaksumur, Yogyakarta 55281, Indonesia
*Corresponding author, e-mail: suy@ittelkom.ac.id¹, aharjoko@ugm.ac.id²
Abstract

Grapheme-to-phoneme conversion (G2P), also known as letter-to-sound conversion, is an important module in both speech synthesis and speech recognition. G2P methods give varying accuracies for different languages although they are designed to be language independent. This paper discusses a new model based on the pseudo nearest neighbour rule (PNNR) for Indonesian G2P. In this model, a partial orthogonal binary code for graphemes, contextual weighting, and neighbourhood weighting are introduced. Testing on 9,604 unseen words shows that the model parameters are easy to tune to reach high accuracy. Testing on 123 sentences containing homographs shows that the model could disambiguate homographs if it uses a long graphemic context. Compared to an information gain tree, PNNR gives a slightly higher phoneme error rate, but it could disambiguate homographs.

Keywords: grapheme-to-phoneme conversion, Indonesian language, pseudo nearest neighbour rule, partial orthogonal binary code, contextual weighting
1. Introduction
In general, there are three approaches in G2P: the linguistic knowledge-based approach, the data-driven approach, and a combination of them. The first approach is commonly used for a specific language, but it usually has low generalisation for unseen words. Hence, most recent research employs the second approach because of its flexibility and generalisation, such as the information gain tree [1], conditional random fields [2], the Kullback-Leibler divergence-based hidden Markov model [3], joint multigram models [4], instance-based learning [5], table lookup with defaults [5], neural networks [6], finite state methods [7] and [8], morphology and phoneme history [9], the hidden Markov model [10], and self-learning techniques [11]. These methods are generally designed to be language independent, but the data sets they use are commonly for English, Dutch, and French.
The Indonesian language has relatively simple phonemic rules. A grapheme <u> is generally pronounced as /u/. But if <u> is preceded by <a> in some cases, then it should be pronounced as a diphthong /aw/, such as in the word 'kerbau' (buffalo) that is pronounced as /kerbaw/. This is different from English, where a grapheme <u> could be pronounced as /u/, /a/, or /e/ such as in 'put', 'funny', or 'further'. The Indonesian language has thirty-two phonemes: six vowels, four diphthongs, and twenty-two consonants [12]. Those phonemes and the related English ones, written using the ARPAbet symbols, can be seen in [13]. Indonesian has nine affixes: six prefixes and three suffixes [12]. The usage of the suffix '-i' may produce an ambiguity between a vowel series and a diphthong. For example, the prefix 'meng-' followed by the root 'kuasa' (authority) and the suffix '-i' produces the derivative 'menguasai'. The grapheme <ai> in 'menguasai' is a vowel series /a/ and /i/, but <ai> in the root 'belai' (caress) is a diphthong /ay/. There are so many such cases in the Indonesian language that make G2P conversion quite hard.
Nearest neighbour is quite a good method for many problems. This method achieves high accuracy for isolated sign language character recognition [14], as well as for bankruptcy prediction models [15]. There are now many variations of this method, such as fuzzy k-NN, neighbourhood weighted nearest neighbour, PNNR, etc. In [16], the researchers show that PNNR performs better than the traditional k-nearest neighbour classification rule (kNN), the neighbourhood weighted nearest neighbour classification rule (WNN), and the local mean-based learning method (LM) in large training sample and mixture model data situations. But, in
a small training sample and singular model data situation, PNNR performs better than both kNN and WNN, but it does not outperform the LM.
This research focuses on developing a new G2P model based on PNNR for the Indonesian language. In this model, a partial orthogonal binary code for graphemes, contextual weighting, and neighbourhood weighting are proposed. This model will be evaluated using a data set of 47 thousand words and will be compared to the IG-tree method as described in [1].
2. Research Method
Converting a grapheme into a phoneme contextually depends on some other surrounding graphemes. The contextual length varies depending on the language. In [5], the optimum contextual lengths for English, Dutch, and French are five graphemes on the left and five graphemes on the right. In other languages with some homographs, the contextual lengths could be longer.
In [1], the calculation of information gain (IG) for 6,791 Indonesian words shows that the focus grapheme has the highest IG (around 3.9). The first graphemes on the right and on the left of the focus have a lower IG than that of the focus, i.e. around 1.1. The IG sharply decreases until the seventh grapheme (around 0.2). Developing an IG-tree using seven graphemes on the right and on the left, commonly written as 7-1-7, produces a phoneme error rate (PER) of 0.99% and a word error rate (WER) of 7.58% for 679 unseen words [1]. This contextual scheme 7-1-7 is adopted in this research.
2.1. Data Preprocessing
The data sets used here are pairs of a word (graphemic symbols) and its pronunciation (phonemic symbols). First, each word should be aligned to the corresponding phonemic symbols (see Figure 1), where '*' is a symbol for blank (no phoneme). Next, each grapheme occurring in a word is consecutively located as the focus grapheme and the others are located at their appropriate contextual positions, as illustrated by Figure 2. In the figure, the word 'belai' (caress) is transformed into five patterns.
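The pattern extraction above can be sketched as follows. This is a minimal illustration, not the authors' code: every grapheme of an aligned word becomes a focus surrounded by seven padded context positions on each side, and its class is the aligned phoneme.

```python
# Sketch of the 7-1-7 pattern extraction illustrated in Figure 2.
# '*' pads missing context positions, matching the paper's blank symbol.

def make_patterns(graphemes, phonemes, context=7):
    """Return (left, focus, right, class) tuples for each grapheme."""
    assert len(graphemes) == len(phonemes)
    padded = ["*"] * context + list(graphemes) + ["*"] * context
    patterns = []
    for i, cls in enumerate(phonemes):
        j = i + context               # focus index inside the padded list
        left = padded[j - context:j]
        right = padded[j + 1:j + 1 + context]
        patterns.append((left, padded[j], right, cls))
    return patterns

# 'belai' aligned to its phonemic symbols: <ai> is the diphthong /ay/
# (one-character symbol '$') on <a>, and <i> maps to blank '*'.
pats = make_patterns("belai", ["b", "ə", "l", "$", "*"])
```

Running this yields the same five patterns as Figure 2, e.g. the fourth pattern has focus <a>, left context `* * * * b e l`, right context `i * * * * * *`, and class `$`.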
In this research, each phonemic symbol is designed to have one character to simplify the alignment process. Table 1 lists all phonemes and their one-character symbols. The phoneme /ng/ is symbolised as /)/ to distinguish it from the phoneme series /n/ and /g/. For instance, the two graphemes <n> and <g> in 'astringen' should be converted as a phoneme series, not as the single phoneme /)/.
Figure 1. Aligning a word to its phonemic symbols. G = graphemes, P = phonemes
L7 L6 L5 L4 L3 L2 L1  Focus  R1 R2 R3 R4 R5 R6 R7  Class
*  *  *  *  *  *  *     b    e  l  a  i  *  *  *     b
*  *  *  *  *  *  b     e    l  a  i  *  *  *  *     ə
*  *  *  *  *  b  e     l    a  i  *  *  *  *  *     l
*  *  *  *  b  e  l     a    i  *  *  *  *  *  *     $
*  *  *  b  e  l  a     i    *  *  *  *  *  *  *     *
Figure 2. Locating each grapheme occurring in a word as the focus, with its class. L1 is the first grapheme on the left of the focus and R1 is the first grapheme on the right
Table 1. Indonesian phonemes and their one-character symbols

Phoneme(s)     One-character phonemic symbol
/kh/           (
/ng/           )
/ny/           +
/sy/           ~
/ay/           $
/aw/           @
/ey/           %
/oy/           ^
/a/ and /?/    1
/e/ and /?/    2
/ə/ and /?/    3
/i/ and /?/    4
/o/ and /?/    5
/u/ and /?/    6
Homographs should be accompanied by one or more other words on the left or right to disambiguate them, as illustrated by Figure 3. The word 'apel' is a homograph with two different pronunciations, i.e. /apəl/ (apple) and /apel/ (assembly). The sentence 'mereka memakan buah apel sebelum apel pagi dimulai' (they eat an apple before the morning assembly begins) should be included in the data set since some words on the left and right of the word 'apel' are very important to disambiguate that homograph.
Figure 3. Graphemes in a sentence and the aligned phonemic symbols
2.2. Partial orthogonal binary code
The grapheme encoding used for neural network-based G2P is usually a full orthogonal binary code, such as in [6]. Here a partial orthogonal binary code is proposed by considering a categorisation of Indonesian phonemes based on: (1) articulation manner: stop, fricative, nasal, trill, lateral, or semivowel; (2) articulation area: bilabial, labiodental, alveolar, palatal, velar, or glottal; and (3) the condition of the vocal cords: voiced or unvoiced. The partial orthogonal binary codes for Indonesian graphemes are listed in Table 2. Based on the codes, two graphemes in the same category have two different bits and their Euclidean distance is √2, but those in different categories have four different bits and their Euclidean distance is 2.
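The distance property above can be checked directly. The sketch below (not from the paper) uses the leading bits of three Table 2 codes; <a> and <e> share the vowel category and differ in two bits, while <b> belongs to a different category and differs from <a> in four bits.

```python
# Euclidean distances between partial orthogonal grapheme codes:
# 2 differing bits -> sqrt(2); 4 differing bits -> 2.
import math

code = {  # first 13 of the 47 bits are enough for this illustration
    "a": "1100000000000",
    "e": "1010000000000",   # same (vowel) category as <a>
    "b": "0000001100000",   # different category (stop consonant)
}

def euclidean(x, y):
    """Euclidean distance between two equal-length bit strings."""
    return math.sqrt(sum((int(a) - int(b)) ** 2 for a, b in zip(x, y)))

print(euclidean(code["a"], code["e"]))  # sqrt(2): same category
print(euclidean(code["a"], code["b"]))  # 2.0: different categories
```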
Table 2. Partial orthogonal binary code for all Indonesian graphemes and three other symbols: '*' (blank or no grapheme), '-' (dash), and 'space'

Grapheme  Binary code
a      110000000 0000 000000000 0000 000000000 0000 00000000
e      101000000 0000 000000000 0000 000000000 0000 00000000
i      100100000 0000 000000000 0000 000000000 0000 00000000
o      100010000 0000 000000000 0000 000000000 0000 00000000
u      100001000 0000 000000000 0000 000000000 0000 00000000
b      000000110 0000 000000000 0000 000000000 0000 00000000
p      000000101 0000 000000000 0000 000000000 0000 00000000
t      000000000 1100 000000000 0000 000000000 0000 00000000
d      000000000 1010 000000000 0000 000000000 0000 00000000
k      000000000 0001 100000000 0000 000000000 0000 00000000
g      000000000 0001 010000000 0000 000000000 0000 00000000
c      000000000 0000 001100000 0000 000000000 0000 00000000
j      000000000 0000 001010000 0000 000000000 0000 00000000
f      000000000 0000 000001100 0000 000000000 0000 00000000
v      000000000 0000 000001010 0000 000000000 0000 00000000
s      000000000 0000 000000001 1000 000000000 0000 00000000
z      000000000 0000 000000001 0100 000000000 0000 00000000
m      000000000 0000 000000000 0011 000000000 0000 00000000
n      000000000 0000 000000000 0010 100000000 0000 00000000
h      000000000 0000 000000000 0000 011000000 0000 00000000
x      000000000 0000 000000000 0000 000110000 0000 00000000
q      000000000 0000 000000000 0000 000001100 0000 00000000
r      000000000 0000 000000000 0000 000000011 0000 00000000
l      000000000 0000 000000000 0000 000000000 1100 00000000
w      000000000 0000 000000000 0000 000000000 0011 00000000
y      000000000 0000 000000000 0000 000000000 0000 11000000
*      000000000 0000 000000000 0000 000000000 0000 00110000
-      000000000 0000 000000000 0000 000000000 0000 00001100
space  000000000 0000 000000000 0000 000000000 0000 00000011
2.3. Contextual weight
As described in [1] and [5], graphemes that are closer to the focus have a higher IG. Thus, the contextual weighting used here is an exponential function chosen to approach the trend of the IG. The weight of the i-th contextual grapheme on the left or the right is formulated by Equation 1, where L is the contextual length. The focus grapheme is the 0-th context, which has the maximum weight.
w_i = p^(L - i + 1)   (1)
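Equation 1 can be sketched as a few lines of code. This is only an illustration (not the authors' implementation); it reproduces the numbers discussed below, e.g. for p = 2 and L = 7 the first contextual grapheme weighs 128 while the second to seventh together weigh 126.

```python
# Contextual weights w_i = p**(L - i + 1), i = 0 (focus) .. L.

def contextual_weights(p, L=7):
    """Return [w_0, w_1, ..., w_L]; w_0 is the focus weight."""
    return [p ** (L - i + 1) for i in range(L + 1)]

w = contextual_weights(2.0)
print(w[1])         # 128.0: weight of the first contextual grapheme
print(sum(w[2:]))   # 126.0: sum of the second to seventh weights
```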
Figure 4 illustrates the function for varying p. It is clear that the greater the p, the sharper the slope of the function. The p should be chosen precisely to balance the weighting. The trend described in [1] could be approached as shown in Figure 4.
For p = 1.6, the weight of the first contextual grapheme is 26.8435 and the sum of the second to the seventh graphemes is 42.0726. It means that further contextual graphemes have quite high importance in deciding the output class. But if p = 4.0, the weight of the first contextual grapheme is 16,384 and the sum of the others is 5,460. This makes the first contextual graphemes have very high importance while the further ones are almost useless. It is quite easy to reason that p = 2 will be very good since the weight for the first contextual grapheme is 128 and the sum of the others is 126. Thus, a pattern with wrong graphemes on the first left and the first right context has a low distance. But if it has correct graphemes on the second to the seventh context, its distance could be quite similar to the patterns that have a correct grapheme either on the first left or on the first right context. However, this is just a prediction. The optimum value of p could be some value around 2.0.
Figure 4. Four examples of contextual weights with varying p (1.60, 1.80, 2.00, 2.20) and L = 7
2.4. Pseudo nearest neighbour rule
PNNR works simply by calculating the total distance of the k nearest neighbours within each class and then deciding the class with the minimum total distance as the output. The k nearest neighbours are weighted gradually based on their rankings of distance in ascending order.
In [16], the neighbourhood weight for the j-th neighbour is formulated as one divided by j. It is clear that the closest neighbour has a weight of 1, and the weights gradually decrease for the further neighbours. Thus, the furthest neighbour has the lowest weight.
Figure 5. Neighbourhood weights for varying c (0.50, 1.00, 1.50, 2.00)
PNNR as in [16] is adopted in this research because of its high performance for large data sets. But the neighbourhood weight formula u is slightly modified by introducing a constant c as a power of the distance ranking to produce varying gradual neighbourhood weights, as described in Equation 2. Thus, the greater the c, the sharper the decrease of the weights, as illustrated by Figure 5. It is clear that if c = 1.0, then u_j is the same as the formula described in [16].
u_j = 1 / j^c   (2)
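The PNNR decision rule with this modified neighbourhood weight can be sketched as follows. This is a minimal illustration under assumed inputs, not the authors' implementation: the toy 2-D vectors stand in for the weighted binary grapheme codes, and the labels stand in for phoneme classes.

```python
# PNNR: per class, sum the neighbourhood-weighted distances of its k
# nearest training patterns (u_j = 1 / j**c) and output the class with
# the minimum weighted total distance.
import math
from collections import defaultdict

def pnnr_classify(x, train, k=6, c=1.07):
    """train: list of (vector, label) pairs; x: query vector."""
    by_class = defaultdict(list)
    for vec, label in train:
        by_class[label].append(math.dist(x, vec))
    totals = {}
    for label, dists in by_class.items():
        dists.sort()  # rank neighbours by distance, ascending
        totals[label] = sum(d / (j ** c)
                            for j, d in enumerate(dists[:k], start=1))
    return min(totals, key=totals.get)

train = [((0.0, 0.0), "A"), ((0.1, 0.0), "A"),
         ((1.0, 1.0), "B"), ((0.9, 1.0), "B")]
print(pnnr_classify((0.2, 0.1), train))  # "A"
```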
3. Results and Discussion
The data set used here contains 47 thousand pairs of words (and some sentences) and their pronunciation collected from the great dictionary of the Indonesian language (Kamus Besar Bahasa Indonesia Pusat Bahasa, abbreviated as KBBI), fourth edition, released in 2008,
developed by Pusat Bahasa. The data set is divided into three groups: a 60% train set, a 20% validation set, and a 20% test set. First, the PNNR is trained using the train set. Next, the trained PNNR is validated using the validation set to get the optimum values for the three parameters: k (neighbourhood size), p (contextual weight), and c (neighbourhood weight).
3.1. Optimum parameters
First, the optimum value of k should be found since this parameter is quite hard to predict. This is performed by using the partial orthogonal binary code and by assuming the optimum value for p is 2.0 (based on the mathematical calculation described in sub-section 2.3) and the optimum value for c is 1.0 (based on the experimental results in [16]). Analysing the train set gives the minimum number of patterns in the smallest class as 11. Hence, in this experiment, k could be 1 to 11. A computer simulation shows that the PER is high (1.222%) when k = 1 since a new pattern in the validation set should then be similar to only one pattern in the decision class. It means the PNNR is very specific in deciding the output class. The PER is also high (1.089%) when k is 11, which shows the PNNR is too general. The optimum k is 6, which produces the lowest PER (1.065%) and also the lowest WER (7.449%).
Next, the optimum value of p is searched using k = 6 and c = 1.0. The simulation shows that the PER is very high, i.e. 1.153% and 1.132%, when p is low (1.50) and when p is high (4.00), respectively. The optimum p is 1.90, which gives the lowest PER (1.058%).
The optimum value of c is then investigated using k = 6 and p = 1.90 (based on the previous experiments). The simulation shows that the PER is very high, i.e. 1.092% and 1.122%, when c is small (0.50) and when c is big (2.50), respectively. The optimum c is 1.07, which produces PER = 1.053%. The greater the c, the lower the contribution of distant neighbours.
Finally, the PNNR with the optimum values for those parameters, k = 6, p = 1.90, c = 1.07, and the partial orthogonal binary code, is tested on 9,604 unseen words (75,456 graphemes). The PNNR produces PER = 1.07% and WER = 7.65%, which is quite similar to those of the train set (PER = 1.053%). It shows that the PNNR has very good generalisation capability. These results are slightly better than those using the PNNR with a full orthogonal binary code, which produces a PER of 1.08% and a WER of 7.68%.
3.2. Homograph disambiguation
The contextual length L could be set short or long to see whether it could disambiguate homographs. To see the effect of contextual length, 369 sentences containing homographs are added to the train set. Then, 123 unseen homographs are tested on the PNNR using the partial orthogonal code, k = 6, p = 1.90, c = 1.07, and varying L. The results are illustrated by Figure 6. The WER is very high (54.472%) when L = 1. The PNNR reaches its optimum for L = 8, 9, or 10 with WER = 1.63%.
Figure 6. The WER for varying L, from 1 to 15
Based on those experimental results, the optimum values for both p and c are quite easy to tune and they could be predicted mathematically. But the optimum value for k is
quite hard to predict since there is no mathematical tool to predict it. Some values of k, from 5 to 8, produce a similar PER. It means this parameter is not sensitive.
3.3. Comparison of PNNR to IG-tree
The optimum values for all parameters of the PNNR are k = 6, p = 1.90, c = 1.07, and encoding = partial orthogonal binary code. Testing on 9,604 unseen Indonesian words shows that the PNNR has a very good generalisation ability with PER = 1.07% and WER = 7.65%. These results are slightly worse than those of the IG-tree in [1], which produced PER = 0.99% and WER = 7.58%. But it should be noted that the IG-tree was tested on only 679 unseen Indonesian words. The PNNR is capable of disambiguating homographs, but the IG-tree is not. The IG-tree should be helped by a text-categorisation method to disambiguate homographs [1].
Table 3. Comparison of the PNNR to the IG-tree

Comparison                        IG-tree    PNNR-based
Number of words in testing set    679        9,504
PER                               0.99%      1.07%
WER                               7.58%      7.65%
Could disambiguate homographs?    No         Yes
3.4. The disadvantages of PNNR-based G2P
A PNNR with no linguistic knowledge will have a problem. There are some cases where converting graphemes to phonemes should not occur because of the dependency on other graphemes on the left or on the right. For instance, see Figure 7.
Figure 7. Two examples of wrong G2P conversion
A word ‘
m
e
m
belai
’ (
c
ar
es
)
s
h
ou
ld
be
c
o
n
v
er
te
d in
to
/m
ə
mb
ə
l$*/, but the PNNR
conve
r
ted it
to /m
ə
mb
ə
l$i
/. In this ca
se, the
gra
p
heme
<i
> sh
ould n
o
t be
conve
r
ted to
a
phon
eme /i/ sin
c
e the left graph
eme
<
a
>
had be
en converted to /$/. The grap
h
e
me <i
> sh
o
u
ld
be co
nverte
d to /*/. The word ‘
m
encel
ai
’ (to dep
re
cate
) sh
ould b
e
converted i
n
to /m
ə
+c
ə
lai/, but
the PNNR co
nverted it to /m
ə
+c
ə
la
*/. In this ca
se, the grap
hem
e <i>
sho
u
ld no
t be conve
r
te
d to
/*/ since the
l
e
ft grap
hem
e
<
a
>
h
ad b
e
e
n
converte
d t
o
/a/. This P
N
NR
ha
s not
been
de
sign
e
d
to
handl
e su
ch
ca
se
s so it prod
uces
so
me wrong
conversion
s. To
solv
e th
e
probl
ems,
some
lingui
stic kn
o
w
led
ge co
uld
be inco
rpo
r
ated into
the PNNR. Fo
r example
s
: diphtong /$/ co
uld
occur if gra
p
heme
<a
>is f
o
llowed by ei
ther
<i>
or
<y>; diphton
g
/@/ occu
rs if
grap
hem
e <a
>is
f
o
llowe
d by
e
i
t
her
<u
> o
r
<w
>;
di
phton
g /%/ occurs
if grap
heme
<e
>is followe
d by eithe
r
<i
> o
r
<y>; dip
h
tong
/^/ occurs if
grap
hem
e <o
>is follo
we
d
by either
<i>
or <y
>; pho
n
e
me /(/ occu
rs if
<k> o
r
<c>i
s
followe
d by <h>; pho
nem
e
/)/ occu
rs
if <n
>is follo
we
d by <g
> or
<k>; ph
one
me
/+
/
oc
cur
s
if
<n
>
i
s f
o
llo
wed
b
y
<y
>,
<
c
>,
or
<j>;
p
hon
eme /~/
occurs if <s>is f
o
llowed by
<y>;
phon
eme /1/, /2/, /3/,
/4/, /5/, and /6/ occur if g
r
a
phe
me <a
>, <e>,
<e
>, <i
>, <o
>, <u
>i
s follo
we
d
by another
co
nstrai
ned vo
wel re
sp
ectiv
e
ly.
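The linguistic constraints listed above can be encoded as a simple lookup. The sketch below is hypothetical (the paper does not give an implementation); it only collects the digraph rules from the text as a table mapping a (focus, right neighbour) pair to the one-character phonemic symbol it may form.

```python
# Digraph rules from the text: a focus grapheme may form a single
# phoneme with its right neighbour only for these listed pairs.
DIGRAPH_RULES = {
    ("a", "i"): "$", ("a", "y"): "$",                    # diphthong /ay/
    ("a", "u"): "@", ("a", "w"): "@",                    # diphthong /aw/
    ("e", "i"): "%", ("e", "y"): "%",                    # diphthong /ey/
    ("o", "i"): "^", ("o", "y"): "^",                    # diphthong /oy/
    ("k", "h"): "(", ("c", "h"): "(",                    # /kh/
    ("n", "g"): ")", ("n", "k"): ")",                    # /ng/
    ("n", "y"): "+", ("n", "c"): "+", ("n", "j"): "+",   # /ny/
    ("s", "y"): "~",                                     # /sy/
}

def digraph_phoneme(focus, right):
    """One-character symbol if (focus, right) may form one phoneme, else None."""
    return DIGRAPH_RULES.get((focus, right))

print(digraph_phoneme("a", "i"))  # '$': candidate diphthong /ay/
print(digraph_phoneme("a", "e"))  # None: no digraph rule applies
```

Such a table could act as a filter on PNNR outputs, vetoing conversions that contradict the listed constraints.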
The PNNR needs to store as many training patterns as possible as pattern groups. To decide an output class, the PNNR should find the k nearest neighbours. Hence, the processing time of the PNNR is relatively longer than that of neural networks or rule-based methods. But this problem could be solved by an indexing technique.
The contextual weighting used here is an exponential function that is applied equally to both the left and right contexts. This could be modified to follow the trend of the IG, where the right contexts commonly have a slightly higher IG than the left ones, as described in [1]. Hence, the contextual weighting function in Equation 1 could be split into two different functions, for the
right and the left context. Next, p for the right context could be tuned slightly greater than that for the left one.
4. Conclusion
The optimum values for both the contextual weight p and the neighbourhood weight c are easy to tune since they are not very sensitive, and they could be predicted mathematically. But the optimum value for the neighbourhood size k is quite hard to predict because there is no mathematical tool to predict it. The contextual length L could be quite long to disambiguate homographs. Given enough representative training sentences, the PNNR-based G2P could convert words into pronunciation symbols. Compared to the IG-tree, the PNNR gives a slightly higher PER, but it could disambiguate homographs. Some linguistic knowledge could be incorporated to improve the accuracy of the PNNR-based G2P.
Acknowledgment
The first author is now a doctoral student in the Computer Science Program, Faculty of Mathematics and Natural Sciences, Gadjah Mada University. He is an employee of the Telkom Foundation of Education (Yayasan Pendidikan Telkom, YPT) as a lecturer at the School of Computing, Telkom University (former: Telkom Institute of Technology). This work is supported by YPT with grant number: 15/SDM-06/YPT/2013.
References
[1] Hartoyo A, Suyanto. An improved Indonesian grapheme-to-phoneme conversion using statistic and linguistic information. International Journal Research in Computing Science (IJRCS). 2010; 46(1): 179-190.
[2] Dong W, Simon K. Letter-to-sound pronunciation prediction using conditional random fields. IEEE Signal Processing Letters. 2011; 18(2): 122-125.
[3] Ramya R, Mathew MD. Acoustic data-driven grapheme-to-phoneme conversion using KL-HMM. 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2012: 4841-4844.
[4] Maximilian B, Hermann N. Joint-sequence models for grapheme-to-phoneme conversion. Speech Communication. 2008; 50(5): 434-451.
[5] Bosch A, Daelemans W. Data-oriented methods for grapheme-to-phoneme conversion. Proceedings of the sixth conference of the European chapter of the Association for Computational Linguistics, Utrecht, The Netherlands. 1993.
[6] Sejnowski TJ, Rosenberg CR. Parallel networks that learn to pronounce English text. Complex Systems. 1987: 145-168.
[7] Bouma G. A finite state and data-oriented method for grapheme-to-phoneme conversion. Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference, Seattle, Washington. 2000: 303-310.
[8] Caseiro D, Trancoso I, Oliveira L, Viana C. Grapheme-to-phone using finite-state transducers. Proceedings of the IEEE Workshop on Speech Synthesis, Santa Monica, CA, USA. 2002.
[9] Reichel UD, Florian S. Using morphology and phoneme history to improve grapheme-to-phoneme conversion. Proceedings of Interspeech. 2005: 1937-1940.
[10] Taylor P. Hidden Markov Models for grapheme to phoneme conversion. Proceedings of Interspeech. 2005: 1973-1976.
[11] Yvon F. Self-learning techniques for grapheme-to-phoneme conversion. Proceedings of the 2nd Onomastica Research Colloquium. London. 1994.
[12] Hasan A, Soenjono D, Hans L, Anton MM. Tata bahasa baku bahasa Indonesia (The standard Indonesian grammar). Jakarta: Balai Pustaka. 1998.
[13] Sakriani S, Konstantin M, Satoshi N. Rapid development of initial Indonesian phoneme-based speech recognition using the cross-language approach. Proceeding O-COCOSDA. Jakarta, Indonesia. 2005: 38-43.
[14] Santosa PI. Isolated Sign Language Characters Recognition. TELKOMNIKA. 2013; 11(3): 583-590.
[15] Arieshanti I, Purwananto Y, Ramadhani A, Nuha MU. Comparative Study of Bankruptcy Prediction Models. TELKOMNIKA. 2013; 11(3): 591-596.
[16] Yong Z, Yupu Y, Liang Z. Pseudo nearest neighbour rule for pattern classification. Expert Systems with Applications: An International Journal. 2009; 36(2): 3587-3595.