Indonesian Journal of Electrical Engineering and Computer Science
Vol. 13, No. 2, February 2019, pp. 671~676
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v13.i2.pp671-676
Journal homepage: http://iaescore.com/journals/index.php/ijeecs
An investigative design of optimum stochastic language model for Bangla autocomplete
Md. Iftakher Alam Eyamin, Md. Tarek Habib, Muhammad Ifte Khairul Islam, Md. Sadekur Rahman, Md. Abbas Ali Khan
Daffodil International University, 4/2, Sobhanbag, Mirpur Rd, Bangladesh
Article Info
Article history:
Received Aug 6, 2018
Revised Nov 22, 2018
Accepted Dec 3, 2018
ABSTRACT
Word completion and word prediction are two important phenomena in typing that have a strong effect on aiding disabled people and students while they use a keyboard or other similar devices. Such an autocomplete technique also helps students significantly during the learning process by constructing proper keywords during web searching. Many works have been conducted for the English language, but for Bangla the work is still very inadequate, and the metrics used for performance computation are not yet rigorous. Bangla is one of the most widely spoken languages (3.05% of the world population) and is ranked seventh among all the languages in the world. In this paper, word prediction on Bangla sentences using stochastic, i.e. N-gram based, language models is proposed for autocompleting a sentence by predicting a set of words rather than a single word, as was done in previous work. A novel approach is proposed in order to find the optimum language model based on a performance metric. In addition, to obtain better performance, a large Bangla corpus of different word types is used.
Keywords:
Word prediction
Natural language processing
Language model
N-gram
Machine learning
Eager learning
Performance metric
Copyright © 2019 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Md. Iftakher Alam Eyamin,
Daffodil International University,
4/2, Sobhanbag, Mirpur Rd, Dhaka 1207, Bangladesh.
Email: iftakher.eyamin@gmail.com
1. INTRODUCTION
Innovation in the writing and typing of a language is very important, especially for disabled persons and early learners of the language. A person with a disability can live a more comfortable life if he or she has the opportunity to type a note, an email or anything else comfortably with the aid of autocomplete. In addition, for early learners in any field (e.g. students, novice researchers), the autocomplete technique might be beneficial during the learning process by providing the most suitable suggestions while they search for new topics with keywords.
Though Bangla is one of the most widely spoken languages (3.05% of the world population) and is considered the seventh language among all languages in the world [1], no work was found on automated autocomplete for it. In the last couple of years, very few efforts have been made on word prediction focused specifically on the Bangla language. In the research work on Bangla word prediction [2], stochastic, i.e. N-gram based, language models are proposed for completing a sentence by predicting a single word. The next improvement took place in the work of M. T. Habib et al. [3], where word prediction on a sentence is performed by using stochastic, i.e. N-gram based, language models. They used a novel metric to assess the performance of their proposed models. Although they achieved good accuracy, opportunities for improvement still remain.
The use of artificial intelligence for word prediction in Spanish is observed in [4], in which syntactic and semantic analysis is done for word prediction using the bottom-up chart technique. H. Al-Mubaid [5] presented an effective method of word prediction in English using machine learning. In [6], Nagalavi and Hanumanthappa applied an N-gram based word prediction model in order to establish the link between different blocks of a piece of writing in English e-newspapers while retaining the sentence reading order.
Some related works use an N-gram language model for autocomplete in the Urdu language [7] and in the Hindi language [8] for disambiguating Hindi words. Some research works in the Bangla language have also been conducted, e.g. a Bangla grammar checker [9] using an N-gram language model, checking the correctness of Bangla words [10], verification of Bangla sentence structure [11], and validity determination of Bangla sentences [12].
There are several word prediction tools, such as AutoComplete by Microsoft, AutoFill by Google Chrome, TypingAid and LetMeType. In [13], software with improved training and recall algorithms is suggested to solve the sentence completion problem using the cogent confabulation model, which can remember sentences in the training files with 100% accuracy. An N-gram model is constructed in [14], which was used to compute 30 alternative words for a given low-frequency word in a sentence; human judges then picked the best impostor words based on a set of provided guidelines. An index-based retrieval algorithm and a cluster-based approach are proposed in [15] for sentence completion. Bickel et al. [16] learned a linearly interpolated N-gram model for sentence completion. Bhatia et al. [17] extracted frequently occurring phrases and N-grams from text collections and deployed them for generating and ranking auto-completion candidates for partial queries in the absence of search logs. A new approach is proposed in [18] for learning to personalize auto-completion rankings.
Word prediction means guessing the next word in a sentence. Autocomplete works as follows: the user types the first letter or letters of a word, and the program provides one or more highly probable words. If the word the user intends to type is included in the list, the user can select it, for example by using the number keys. If the intended word is not predicted, the user must type the next letter of the word. At this point, the word choices are altered so that the words provided begin with the same letters as those already typed, and the process continues until the word that the user wants appears and is selected. The autocomplete technique completes a word by analyzing the preceding word flow and the first letter of the word, which allows words and sentences to be completed with more accuracy and reduces misspelling. The N-gram language model is an important technique for word prediction.
The problem addressed in this paper is that of stochastically predicting a suitable word to complete an incomplete sentence, which consists of some words and a single character. Let $w_1 w_2 w_3 \ldots w_{m-1} w_m$ be a sentence, i.e. a sequence of words, where $w_m = c_1 c_2 c_3 \ldots c_n$ and $w_1 w_2 w_3 \ldots w_{m-1} c_1$ has already been typed. The problem is to build a language model which takes $w_1 w_2 w_3 \ldots w_{m-1} c_1$ as input and predicts an n-tuple of word fragments $(v_{m1}, v_{m2}, v_{m3}, \ldots, v_{mn})$ as output in order to match the remaining untyped word fragment $c_2 c_3 c_4 \ldots c_n$, as shown in Figure 1.
Figure 1. Language Model
We use a large data corpus to train the N-gram language models so that the correct Bangla word can be completed, and thus a Bangla sentence can be completed, with more accuracy. In this paper, we propose an investigative design of an optimum stochastic language model for Bangla autocomplete using a supervised machine learning technique based on different N-gram language models. In most cases, the probability is based on counting things, here words.
In our previous works [2, 3], we used different types of language models for word prediction. Both of these works differ from the work presented in this paper, because word prediction differs from autocomplete, i.e. word completion. In the earlier works, a word [2] or a word set [3] is predicted based on one or more preceding words, whereas in this work a set of word fragments is predicted based on one or more words and a single character.
The rest of the paper is organized as follows. Section 2 describes our approach to solving the problem. Section 3 describes how we apply our entire methodology and what results are achieved. In Section 4, we investigate the results obtained in order to develop an understanding of the merits of our proposed approach. Finally, in Section 5, we summarize our work along with its limitations and discuss the scope for future work.
2. PROPOSED METHOD
We begin with five language models, namely unigram, bigram, trigram, backoff and linear interpolation. All these language models are based on the N-gram approximation. Bayesian classifiers have been used in [19-21]. A Bayesian classifier assumes no correlation between words in the same text, whereas an N-gram language model assumes relationships between the words and evaluates the probability of a word occurring before or after another word.
The general equation for the N-gram approximation to the conditional probability of the next word in a sequence is:
$$P(w_n \mid w_1^{n-1}) \approx P(w_n \mid w_{n-N+1}^{n-1}) \qquad (1)$$
Equation (1) shows that the probability of a word $w_n$ given all the previous words can be approximated by the probability given only the previous $N-1$ words. If N = 1, 2, 3 in (1), the model becomes the unigram, bigram and trigram language model, respectively, and so on.
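As an illustration of Equation (1), the following is a minimal Python sketch of estimating N-gram probabilities by relative frequency, i.e. by counting. It is not the authors' implementation: the function names, the `<s>` padding symbol and the toy romanized corpus are assumptions made for the example, and no smoothing is applied.

```python
from collections import Counter

START = "<s>"  # hypothetical sentence-start padding symbol


def ngram_counts(sentences, order):
    """Count N-grams of the given order and the (order-1)-word contexts before them."""
    grams, contexts = Counter(), Counter()
    for words in sentences:
        padded = [START] * (order - 1) + list(words)
        for k in range(order - 1, len(padded)):
            context = tuple(padded[k - order + 1:k])
            grams[context + (padded[k],)] += 1
            contexts[context] += 1
    return grams, contexts


def ngram_probability(word, history, grams, contexts, order):
    """Relative-frequency estimate of P(word | last order-1 words of history)."""
    padded = [START] * (order - 1) + list(history)
    context = tuple(padded[len(padded) - (order - 1):])
    if contexts[context] == 0:
        return 0.0  # unseen context: no estimate is possible without smoothing
    return grams[context + (word,)] / contexts[context]


# Toy corpus of romanized tokens, for illustration only.
corpus = [
    ["ami", "bhat", "khai"],
    ["ami", "bhat", "ranna", "kori"],
    ["ami", "boi", "pori"],
]
bi_grams, bi_contexts = ngram_counts(corpus, order=2)
# "ami" is followed by "bhat" in 2 of its 3 occurrences as a context.
p_bigram = ngram_probability("bhat", ["ami"], bi_grams, bi_contexts, order=2)
```

With `order=1` the context is empty and the estimate reduces to the unigram relative frequency; with `order=3` the last two words of the history are used.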
We train our five language models on the full training corpus and calculate the N-gram probability of each word. We predict the words that have the highest N-gram probability. Then we compare the first character of each predicted word with the given (typed) character. If the first character of the predicted word matches the given character, we select the word as the predicted, or result, word.
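The selection step described above can be sketched as follows. This is a simplified illustration that assumes a bigram model only; `train_bigram`, `predict_fragments` and the toy romanized corpus are hypothetical names and data introduced for the example, not taken from the paper.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Map each word (or the start symbol) to a Counter of the words that follow it."""
    followers = defaultdict(Counter)
    for words in sentences:
        for prev, cur in zip(["<s>"] + list(words), words):
            followers[prev][cur] += 1
    return followers


def predict_fragments(followers, typed_words, first_char, n):
    """Return up to n untyped word fragments, most probable candidate first.

    Candidates are ranked by bigram frequency after the last typed word and
    kept only if they start with the character the user has already typed.
    """
    prev = typed_words[-1] if typed_words else "<s>"
    ranked = [w for w, _ in followers[prev].most_common() if w.startswith(first_char)]
    return tuple(w[len(first_char):] for w in ranked[:n])


# Toy corpus of romanized tokens, for illustration only.
corpus = [
    ["ami", "bhat", "khai"],
    ["ami", "bhat", "khai"],
    ["ami", "bhat", "ranna", "kori"],
    ["ami", "boi", "pori"],
    ["ami", "boi", "pori"],
    ["ami", "bari", "jai"],
]
model = train_bigram(corpus)
fragments = predict_fragments(model, ["ami"], "b", n=2)
```

Here the user has typed "ami" followed by the character "b"; the sketch returns the remaining fragments of the two most frequent matching words, in rank order, which is the n-tuple form of output defined in the problem statement.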
The performance of each language model is measured by taking into account both whether a predicted word matches the intended word and the position at which the match occurs. Therefore, accuracy and failure rate are used to address this issue. Note that Equations (2) and (3) are taken from our previous paper [3].
If $w_m$ matches $v_{mi}$ (i.e. $w_m$ equals $v_{mi}$, where $1 \le i \le n+1$), then the accuracy is
$$\text{Accuracy} = \frac{n - i + 1}{n} \times 100\% \qquad (2)$$
Failure occurs when i equals n + 1. A match at the (n + 1)-th position means that no match has taken place, i.e. the accuracy equals 0. If, in an experiment, a language model fails to predict f times out of p predictions, i.e. f failures occur, then the failure rate is
$$\text{Failure rate} = \frac{f}{p} \times 100\% \qquad (3)$$
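The two metrics can be computed directly from the match positions. The sketch below assumes that each prediction is summarized by its match position i, with i = n + 1 standing for "no match" as stated above; the function names are illustrative.

```python
def accuracy(i, n):
    """Accuracy (in percent) of one prediction whose match is at position i of an n-tuple."""
    if not 1 <= i <= n + 1:
        raise ValueError("match position i must satisfy 1 <= i <= n + 1")
    return (n - i + 1) / n * 100.0


def failure_rate(match_positions, n):
    """Failure rate (in percent): share of predictions with no match, i.e. i == n + 1."""
    p = len(match_positions)
    f = sum(1 for i in match_positions if i == n + 1)
    return f / p * 100.0


# With n = 7, a match in the first position scores 100%, a match in the
# third position scores (7 - 3 + 1) / 7, and position 8 means failure.
first_hit = accuracy(1, 7)
third_hit = accuracy(3, 7)
miss = accuracy(8, 7)
rate = failure_rate([1, 3, 8, 8], 7)  # 2 failures out of 4 predictions
```

An earlier match position therefore gives a higher score, which is how the metric takes the order of matching into account.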
Another aspect of the problem is empirical. Given a number of language models, we need to identify the one that outperforms all the other models in terms of accuracy for a value of n that is as small as possible.
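Of the five models named at the start of this section, linear interpolation is the one that combines several N-gram orders in a single estimate. The paper does not state the formula or the weights it uses; the standard textbook form for the trigram case, given here only as a reference point, is:

```latex
\hat{P}(w_n \mid w_{n-2} w_{n-1})
  = \lambda_1 P(w_n \mid w_{n-2} w_{n-1})
  + \lambda_2 P(w_n \mid w_{n-1})
  + \lambda_3 P(w_n),
\qquad \lambda_1 + \lambda_2 + \lambda_3 = 1
```

The weights $\lambda_1, \lambda_2, \lambda_3$ are assumptions of this illustration. A backoff model, by contrast, falls back to a lower-order estimate only when the higher-order N-gram has not been observed.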
3. RESEARCH METHOD
We train a language model on a corpus with the prediction length n set to 1. Then the accuracy of the trained model is tested. The value of n is increased by 1, and the language model is trained and tested again. The process continues until the change in accuracy becomes insignificant and the value of n exceeds the average word length of the corpus, |w|. It should be mentioned that as the value of n increases, so does the accuracy. Although a larger value of n yields better accuracy, it also increases the value of i, the position in the n-tuple at which the prediction matches, and thus the number of keystrokes required. This is why the average word length of the corpus is used in the looping condition. In this way, n*, the considerable optimum value of n, is automatically calculated for every language model stated earlier; the procedure is given as pseudocode in Algorithm 1 of [3]. The best model is chosen by the technique described in our previous paper [3].
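The search for n* can be sketched as a simple loop. Algorithm 1 of [3] is not reproduced in this paper, so the following is only one possible reading of the description above: n grows from 1 up to the average word length, and the loop stops early if the gain in accuracy falls below a threshold. The function name, the `evaluate` callback and the `min_gain` threshold are assumptions.

```python
def find_optimum_n(evaluate, avg_word_length, min_gain=0.1):
    """Return (n_star, history) for one language model.

    evaluate(n) is assumed to train and test the model with prediction
    length n and return its accuracy in percent. min_gain is a hypothetical
    threshold for an "insignificant" change in accuracy.
    """
    n = 1
    prev_acc = evaluate(n)
    best_n = n
    history = [(n, prev_acc)]
    while n < avg_word_length:
        n += 1
        acc = evaluate(n)
        history.append((n, acc))
        if acc - prev_acc < min_gain:
            break  # accuracy no longer improves noticeably
        best_n, prev_acc = n, acc
    return best_n, history


# Accuracy figures of the linear interpolation model for n = 1..7 (Table 1).
table1_row = [73.76, 79.70, 83.33, 86.39, 88.42, 90.10, 91.58]
n_star, history = find_optimum_n(lambda n: table1_row[n - 1], avg_word_length=7)
```

Under these assumptions, the linear interpolation figures reported in Table 1 lead to n* = 7, the value reported there, because accuracy keeps improving up to the average word length of the corpus.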
A set of training modules for word prediction was developed to compute the unigram, bigram, trigram, backoff and linear interpolation models based on N-grams. The implementation differs from the previous work [4] in that, while finding the best language model among these language models, the prediction is built with a set of words and a character instead of a single one.
These models are used to determine the different probabilities by counting the frequencies of words in a very large corpus, which has been constructed from the popular Bangla newspaper the “Daily Prothom Alo”. The corpus contains more than 12 million (12,203,790) words and about 1 million (937,349) sentences; the total number of unique words is 294,371 and the average word length (|w|) is 7.
During this work, we divided the entire corpus into two parts, namely a training part and a testing part. The holdout method [22] is used to split the corpus in the proportion of two-thirds for training and one-third for testing. Therefore, this work starts with a training corpus of more than seven hundred thousand sentences.
In order to avoid the model over-fitting problem (i.e. having a lower training error but a higher generalization error), a validation set is used. In accordance with this approach, the original training data is divided into two smaller subsets. One of the subsets is used for training, while the other one (i.e. the validation set) is used for calculating the generalization error. Two-thirds of the training set is fixed for model building, while the remaining one-third is used for error estimation. The holdout method is repeated five times in order to find the best model. After the best model is found, its accuracy is computed using the test set, through which the considerable optimum prediction length (n*) is determined automatically based on Algorithm 1 of [3]. The entire approach is shown in Figure 2.
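The two-level holdout split described above can be sketched as follows. The paper does not say whether sentences are shuffled before splitting or how the five repetitions differ; the random shuffle, the seeds and the function names below are assumptions made for the example.

```python
import random


def holdout_split(items, train_fraction=2 / 3, seed=0):
    """Shuffle a copy of items and split it into (first part, remaining part)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def repeated_validation_splits(training_part, rounds=5):
    """Repeat the holdout split on the training part, once per round."""
    return [holdout_split(training_part, seed=r) for r in range(rounds)]


# Placeholder sentences; the real corpus has 937,349 sentences.
sentences = [f"sentence-{k}" for k in range(900)]

# Outer split: two-thirds for training, one-third for testing.
training_part, testing_part = holdout_split(sentences)

# Inner splits: two-thirds for model building, one-third for validation,
# repeated five times.
splits = repeated_validation_splits(training_part)
```

Each inner split yields a model-building subset and a validation subset; the testing part is left untouched until the best model has been chosen.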
Figure 2. Proposed approach for specific model
4. RESULTS AND DISCUSSION
The optimum prediction length (n*), along with the accuracy of each model, is shown in Table 1.
Table 1. Optimum prediction length (n*) of all language models
| Language Model | Prediction Length n=1 | n=2 | n=3 | n=4 | n=5 | n=6 | n=7 | Optimum value of n (n*) |
|---|---|---|---|---|---|---|---|---|
| Unigram | 3.4% | 12.65% | 21.32% | 25.37% | 29.48% | 33.75% | 38.62% | 7 |
| Bigram | 59.90% | 65.35% | 69.14% | 72.52% | 74.95% | 77.06% | 79.12% | 7 |
| Trigram | 75.74% | 79.21% | 81.02% | 81.93% | 82.48% | 83.03% | 83.38% | 7 |
| Backoff | 75.74% | 80% | 82.39% | 83.41% | 84.73% | 85.42% | 85.96% | 7 |
| Linear Interpolation | 73.76% | 79.70% | 83.33% | 86.39% | 88.42% | 90.10% | 91.58% | 7 |
In addition, a detailed investigation is conducted (shown in Table 2) to evaluate the performance of all the models, i.e. unigram, bigram, trigram, backoff and linear interpolation, by varying the length of the test sentences. After finding the accuracy of the top three models on test sets consisting of sentences of different lengths, the average accuracy of these models (i.e. trigram, backoff and linear interpolation) is computed (see Figure 3), which might help us find the best language model.
While finding the accuracy of each model, it is noticed that models sometimes show almost the same accuracy during the process of predicting the suitable word. Therefore, keeping track of the failure rate is considered a significant task, as some models might show the same accuracy but different failure rates. Table 3 presents the failure rates of the models. After finding the failure rates of the top three models on test sets consisting of sentences of different lengths, the average failure rate of these models (i.e. trigram, backoff and linear interpolation) is computed (see Figure 4), which might help us find the best language model, since similar accuracy among some models makes the selection process difficult.
Table 2. Accuracy of all models across the availability of words
| Available Words in Test Sentences | Accuracy: Trigram | Accuracy: Backoff | Accuracy: Linear Interpolation |
|---|---|---|---|
| 1 | 15% | 20% | 37.78% |
| 2 | 34.22% | 44% | 61.07% |
| 3 | 59.99% | 74.16% | 63.57% |
| 4 | 60.68% | 76.21% | 64.56% |
| 5 | 61.30% | 79.2% | 73.21% |
| 6 | 61.52% | 79.99% | 73.40% |
| 7 | 67.5% | 82.63% | 74.28% |
| 8 | 68.26% | 83.43% | 74.34% |
| 9 | 68.69% | 85.40% | 75% |
| 10 | 70.12% | 85.75% | 75% |
| 11 | 70.71% | 87.40% | 76.73% |
| 12 | 71.59% | 90% | 82.63% |
| 13 | 83.57% | 91.11% | 83.92% |
| 14 | 83.57% | 91.99% | 92.14% |
| 15 | 87.85% | 93.6% | 93.57% |
Table 3. Failure rate of all models with the availability of the words in the test sentence
| Available Words in Test Sentences | Failure Rate: Trigram | Failure Rate: Backoff | Failure Rate: Linear Interpolation |
|---|---|---|---|
| 1 | 64.44% | 54.28% | 60.00% |
| 2 | 39.28% | 25% | 35.71% |
| 3 | 38.63% | 21.62% | 30.43% |
| 4 | 36.95% | 20% | 28.57% |
| 5 | 36.95% | 19.04% | 25% |
| 6 | 32.14% | 16.34% | 25% |
| 7 | 30.43% | 15.78% | 25% |
| 8 | 30.43% | 13.51% | 23.91% |
| 9 | 28.76% | 12.12% | 22.72% |
| 10 | 28.76% | 12.04% | 21.73% |
| 11 | 28.57% | 10.71% | 19.56% |
| 12 | 27.27% | 9.52% | 14.54% |
| 13 | 14.28% | 7.40% | 14.28% |
| 14 | 14.28% | 5.71% | 7.14% |
| 15 | 10.71% | 4.21% | 3.57% |
During the experiment, as shown in Table 1, it is noticeable that the top three models show good accuracy among all the models, with the linear interpolation model showing slightly better performance in predicting the next possible word at the optimum prediction length of seven, i.e. n* = 7. Therefore, to find the best model, a further in-depth investigation is conducted in the second phase, as shown in Table 2, to find out how the top three models behave on test sets with sentences of different (average) sizes. In the second-phase experiment, all the top models again behave similarly; consequently, a third phase is required, in which the failure rate of the top models is computed (see Table 3). Though in some cases the trigram, backoff and linear interpolation models show almost the same accuracy, the failure rate of the other two models (trigram and backoff) is higher compared to that of linear interpolation. Moreover, from the average accuracy and average failure rate of all models (Figure 3 and Figure 4, respectively), we come to the final decision that the linear interpolation model achieves the highest accuracy among all the models during the word prediction process. The accuracy rate of the linear interpolation model as the prediction length increases is shown in Figure 5.
Figure 3. Average accuracy of language models
Figure 4. Average failure rate of language models
Figure 5. Accuracy and failure along with the prediction length of the linear interpolation model
The linear interpolation model has shown better performance than the other top models (91.98% with n* = 7), and the experimental result is promising.
5. CONCLUSION
The focus of this research was on modeling, training and applying techniques that can assist in automatic Bangla word completion. For the purpose of this research, a large and rich Bangla corpus is used, together with a supervised machine learning technique based on popular N-gram language models. Determining the best language model among five language models is the main contribution of the research. Across the several phases of experiments, the linear interpolation model outperforms the other models in terms of both accuracy and failure rate. As future work, further testing of the present models with a larger corpus is planned. An adaptive software tool for automated Bangla word completion will be developed based on this work.
REFERENCES
[1] List of languages by number of native speakers. Available at: https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers. (Last Accessed: March 10, 2018).
[2] M. M. Haque, M. T. Habib and M. M. Rahman. "Automated Word Prediction in Bangla Language Using Stochastic Language Models". Academy & Industry Research Collaboration Center (AIRCC) International Journal in Foundations of Computer Science & Technology. November 2015, vol. 5, no. 6, pp. 67-75.
[3] M. T. Habib, A. Al-Mamun, M. S. Rahman, S. M. T. Siddiquee and F. Ahmed. "An Exploratory Approach to Find a Novel Metric Based Optimum Language Model for Automatic Bangla Word Prediction". International Journal of Intelligent Systems and Applications (IJISA). February 2018, vol. 10, no. 2, pp. 47-54.
[4] N. Garay-Vitoria and J. Gonzalez-Abascal. "Application of Artificial Intelligence Methods in a Word-Prediction Aid". Laboratory of Human-Computer Interaction for Special Needs. 2005.
[5] H. Al-Mubaid. "A Learning-Classification Based Approach for Word Prediction". The International Arab Journal of Information Technology. 2007, vol. 4, no. 3.
[6] D. Nagalavi and M. Hanumanthappa. "N-gram Word prediction language models to identify the sequence of article blocks in English e-newspapers". In Proceedings of International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS). 2016.
[7] Q. Abbas. "A Stochastic Prediction Interface for Urdu". Intelligent Systems and Applications. 2014, vol. 7, no. 1, pp. 94-100.
[8] U. P. Singh, V. Goyal and A. Rani. "Disambiguating Hindi Words Using N-Gram Smoothing Models". International Journal of Engineering Sciences. 2014, vol. 10, issue June, pp. 26-29.
[9] J. Alam, N. Uzzaman and M. Khan. "N-gram based Statistical Grammar Checker for Bangla and English". In Proceedings of International Conference on Computer and Information Technology. 2006.
[10] N. H. Khan, G. C. Saha, B. Sarker and M. H. Rahman. "Checking the Correctness of Bangla Words using N-Gram". International Journal of Computer Application. 2014, vol. 89, no. 11.
[11] N. H. Khan, M. F. Khan, M. M. Islam, M. H. Rahman and B. Sarker. "Verification of Bangla Sentence Structure using N-Gram". Global Journal of Computer Science and Technology. 2007, vol. 14, issue 1.
[12] M. R. Rahman, M. T. Habib, M. S. Rahman, S. B. Shuvo and M. S. Uddin. "An Investigative Design Based Statistical Approach for Determining Bangla Sentence Validity". International Journal of Computer Science and Network Security. November 2016, vol. 16, no. 11, pp. 30-37.
[13] Q. Qiu et al. "Confabulation based sentence completion for machine reading". 2011 IEEE Symposium on Computational Intelligence, Cognitive Algorithms, Mind, and Brain (CCMB). Paris. 2011, pp. 1-8.
[14] G. Zweig and C. J. C. Burges. "The Microsoft Research Sentence Completion Challenge". Tech report. 2011.
[15] K. Grabski and T. Scheffer. "Sentence completion". In Proc. SIGIR. Sheffield, United Kingdom. 2004, pp. 433-439.
[16] S. Bickel, P. Haider and T. Scheffer. "Learning to complete sentences". In Proceedings. ECML, volume 3720 of Lecture Notes in Computer Science. Springer. 2005, pp. 497-504.
[17] S. Bhatia, D. Majumdar and P. Mitra. "Query suggestions in the absence of query logs". In Proceedings. SIGIR. Beijing, China. 2011, pp. 795-804.
[18] D. Jurafsky and J. H. Martin. Speech and Language Processing. USA: Prentice-Hall, Inc. 2000.
[19] K. C. Rani and Y. Prasanth. "A Decision System for Predicting Diabetes using Neural Networks". IAES International Journal of Artificial Intelligence (IJ-AI). June 2017, vol. 6, no. 2, pp. 56-65.
[20] S. Shah, K. Kumar and Ra. K. Saravanaguru. "Sentimental Analysis of Twitter Data Using Classifier Algorithms". International Journal of Electrical and Computer Engineering (IJECE). February 2016, vol. 6, no. 1, pp. 357-366.
[21] Ashwin V. "Twitter Tweet Classifier". IAES International Journal of Artificial Intelligence (IJ-AI). March 2016, vol. 5, no. 1, pp. 41-44.
[22] P.-N. Tan, M. Steinbach and V. Kumar. Introduction to Data Mining. Addison-Wesley. 2006.