International Journal of Electrical and Computer Engineering (IJECE)
Vol. 10, No. 3, June 2020, pp. 3244~3252
ISSN: 2088-8708, DOI: 10.11591/ijece.v10i3.pp3244-3252
Journal homepage: http://ijece.iaescore.com/index.php/IJECE
An exploratory research on grammar checking of Bangla sentences using statistical language models

Md. Riazur Rahman, Md. Tarek Habib, Md. Sadekur Rahman, Gazi Zahirul Islam, Md. Abbas Ali Khan

Department of Computer Science and Engineering, Daffodil International University, Bangladesh
Article Info

Article history:
Received Jun 30, 2019
Revised Nov 4, 2019
Accepted Dec 7, 2019

ABSTRACT
N-gram based language models are very popular and extensively used statistical methods for solving various natural language processing problems including grammar checking. Smoothing is one of the most effective techniques used in building a language model to deal with the data sparsity problem. Kneser-Ney is one of the most prominently used and successful smoothing techniques for language modelling. In our previous work, we presented a Witten-Bell smoothing based language modelling technique for checking the grammatical correctness of Bangla sentences, which showed promising results outperforming previous methods. In this work, we propose an improved method using a Kneser-Ney smoothing based n-gram language model for grammar checking and perform a comparative performance analysis between the Kneser-Ney and Witten-Bell smoothing techniques for the same purpose. We also provide an improved technique for calculating the optimum threshold, which further enhances the results. Our experimental results show that Kneser-Ney outperforms Witten-Bell as a smoothing technique when used with n-gram LMs for checking the grammatical correctness of Bangla sentences.
Keywords:

Grammar checking
Language models
Natural language processing
N-grams
Smoothing
Copyright © 2020 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:

Md. Riazur Rahman,
Department of Computer Science and Engineering,
Daffodil International University,
4/2, Sobhanbag, Mirpur Road, Dhanmondi, Dhaka-1207, Bangladesh.
Email: riazur_rahman@daffodilvarsity.edu.bd
1. INTRODUCTION
The field of study that deals with processing natural languages is called Natural Language Processing (NLP), which investigates how computers can be used to recognize and operate natural languages [1]. NLP is an important branch of Artificial Intelligence (AI), which has plenty of applications as other branches of AI do, like rice grain classification [2], anomalous sound event detection [3], robotic navigation [4], recommendation systems for buying a house [5], and so on. One such application of NLP is grammar checking [6]. Though a lot of tools and techniques, as described in [7-10], have been developed for grammar checking in recent years, grammar checkers still have quite a lot of limitations. There are mainly two approaches to implement a grammar checker, namely the rule-based approach [11] and the statistical approach [12]. In rule-based grammar checkers, a set of manually developed grammatical rules is used to decide the correctness of the given text, and developing such rules requires time and high-level linguistic expertise in the target language. Whereas, in the statistics-based approach, the grammar rules are built from a text corpus of the target language using statistical methods, where common sequences that occur often can be considered correct and the uncommon ones incorrect. A language model (LM) is a widely used statistical technique that builds a statistical machine from a text corpus of the target language that can estimate the distribution of the language as accurately as possible. A central issue in LM estimation
is data sparseness, in which case LMs fail to approximate accurate probabilities due to limited training data. Smoothing [13] is a technique that resolves this problem by adjusting the maximum likelihood estimator to compensate for data sparseness. In practice, LMs are usually implemented in conjunction with smoothing techniques for better performance. There are many smoothing techniques available, out of which Witten-Bell (WB) [14] and Kneser-Ney (KN) [15] are by far the two most effective and widely used smoothing techniques.

A number of good works have been done in Bangla in different problem domains of NLP, e.g. autocomplete [16], autocorrection of spelling [17], and word prediction [18]. Furthermore, there has been much development in grammar checking research in many different languages. Nevertheless, despite Bangla being one of the top ten spoken languages in the world [19], there has been little development in Bangla language processing, especially in grammar checking. Though some efforts have been made, there is still plenty of room for improvement. In [20] the authors presented an n-gram LM to design a Bangla grammar checker, where the n-gram probability distributions of parts-of-speech (POS) tags of words are used as features. A sentence is detected as grammatically correct if the product of all the n-grams in the sentence is greater than zero, otherwise incorrect. Due to this, their method suffers from the data sparsity problem, which severely degrades the performance of the system. Moreover, they used a very small corpus of only 5000 words to build the n-gram model and tested the model on a test set of simple sentences. The authors in [21] presented another n-gram based statistical technique for grammar checking. Rather than using the probability of POS tags of words, this time the n-gram probability distribution of words is used to train and test the system. To deal with the sparsity problem of n-gram models, they used WB smoothing with their n-gram model. They trained their statistical n-gram model with a small experimental corpus of 1 million words with a test set of 1000 correct and 1000 incorrect sentences. However, their approach did not clarify how the threshold between correct and incorrect sentences is determined, which is not a practical approach. Moreover, in our previous work [22], a statistical method was proposed which used an n-gram based LM combined with WB smoothing and the backoff technique to determine the grammatical correctness of simple Bangla sentences, which presented promising results. Nevertheless, there is still room for improvement, and further analysis is required to find an enhanced, robust and well performing statistical grammar checking system for Bangla.

The issues and facts mentioned above motivated this work, where a comprehensive comparative study on the performance of WB and KN smoothing based LMs for the purpose of grammar checking of Bangla sentences has been performed to find the best possible LM, settings and methods for the development of a more accurate and robust grammar checker for Bangla. The presented technique was trained on a large Bangla corpus of 20 million words collected from various online newspapers. An improved strategy is proposed to determine an appropriate threshold to distinguish between grammatical and ungrammatical sentences. The threshold was finalized by performing cross validation on the training set and testing on a separate validation set in two stages to ensure maximum optimality. The proposed method was tested on an updated, realistic and challenging test set of 15000 correct and 15000 incorrect sentences consisting of all kinds of simple & complex sentences with varying lengths.

The rest of the paper is organized as follows; section 2 presents some theoretical background on n-gram based sentence probability calculation. Whereas section 3 describes the methodology used for developing the system. Section 4 presents the experimental results while section 5 concludes the paper.
2. STATISTICAL LANGUAGE MODELING
N-gram statistical LMs are very popular and widely used statistical methods for solving various NLP problems.

2.1. N-gram language models
A language model (LM) is a probability distribution over all possible sentences or strings in a language. Let us assume that S denotes a sentence consisting of a specified sequence of words such that S = w_1 w_2 w_3 ... w_k. An n-gram LM considers the word sequence or sentence to be a Markov process [23]. Its probability is calculated as,

P(S) = \prod_{i=1}^{k} P(w_i \mid w_{i-n+1} \dots w_{i-1})    (1)
where n refers to the order of the Markov process. When n = 3 we call it a trigram LM, which is estimated using information about the co-occurrence of 3-tuples of words. The probability P(w_i | w_{i-n+1} ... w_{i-1}) can be calculated as,

P(w_i \mid w_{i-n+1} \dots w_{i-1}) = C(w_{i-n+1} \dots w_i) / \sum_{w_i} C(w_{i-n+1} \dots w_{i-1} w_i)    (2)
where C(w_{i-n+1} ... w_i) is the count of occurrences of the word sequence w_{i-n+1} ... w_i and \sum_{w_i} C(w_{i-n+1} ... w_{i-1} w_i) indicates the sum of counts of all the n-grams that start with w_{i-n+1} ... w_{i-1}. For example, let us consider the following Bangla sentence,

কাদের একটি আম খেদেদে [english] Kader ate a mango (Kader ekti aam kheychey)

The probability of this sentence can be calculated using a bigram LM with (1) as,

P(কাদের একটি আম খেদেদে) = P(কাদের | <s>) * P(একটি | কাদের) * P(আম | একটি) * P(খেদেদে | আম) * P(</s> | খেদেদে)

For the same English sentence,

P(Kader ate a mango) = P(Kader | <s>) * P(ate | Kader) * P(a | ate) * P(mango | a) * P(</s> | mango)

In practice, to calculate the probability of a sentence, a start token <s> and an end token </s> are used to indicate the start and end of the sentence respectively.
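The calculation above can be sketched in a few lines of Python. The function and variable names here are ours, not from the paper; this is a minimal maximum-likelihood bigram model per (1) and (2) with <s>/</s> padding:

```python
from collections import Counter

def train_bigram_counts(sentences):
    """Count unigrams and bigrams over tokenized sentences,
    padding each with <s> and </s> boundary tokens."""
    uni, bi = Counter(), Counter()
    for words in sentences:
        toks = ["<s>"] + words + ["</s>"]
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
    return uni, bi

def bigram_sentence_prob(words, uni, bi):
    """Maximum-likelihood sentence probability following (1)-(2):
    P(S) = prod_i P(w_i | w_{i-1}), with P(w | v) = C(v w) / C(v)."""
    toks = ["<s>"] + words + ["</s>"]
    p = 1.0
    for v, w in zip(toks, toks[1:]):
        if uni[v] == 0:          # unseen context: no estimate at all
            return 0.0
        p *= bi[(v, w)] / uni[v]
    return p

uni, bi = train_bigram_counts([["Kader", "ate", "a", "mango"]])
print(bigram_sentence_prob(["Kader", "ate", "a", "mango"], uni, bi))   # 1.0
print(bigram_sentence_prob(["Kader", "ate", "an", "apple"], uni, bi))  # 0.0
```

The second call already previews the sparsity problem discussed next: a single unseen bigram drives the whole sentence probability to zero.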
2.2. Data sparsity problem
For any n-gram that appeared an adequate number of times, we might have a good estimate of its probability. But because any corpus is limited, some perfectly acceptable word sequences are bound to be missing from it. That means there will be many cases in which correct n-gram sequences are assigned zero probability. For example, suppose in the training set the bigram একটি (ekti) আম (aam) occurs 5 times but, although correct, there is zero occurrence of the similar bigram একটি (ekti) আদেল (apple). Now suppose we have the following sentence in the test set,

কাদের একটি আদেল খেদেদে [english] Kader ate an apple (Kader ekti apple kheychey)

Since the bigram একটি (ekti) আদেল (apple) has zero count in the training corpus, in the bigram model its probability will be zero, i.e. P(আদেল (apple) | একটি (ekti)) = 0. Consequently, the probability of the sentence will be P(কাদের একটি আদেল খেদেদে) = 0. This probability is zero because, according to (1), the sentence probability is calculated by multiplying the constituent n-gram probabilities, and if one of them is zero then the total probability will be zero. Therefore, these zero-frequency n-gram sequences that do not occur in the training data but appear in the test set pose a great problem for simple n-gram models in accurate probability estimation of sentences.
2.3. Smoothing
Smoothing techniques are used to keep an LM from assigning zero probability to unseen word sequences, and have become an indispensable part of any LM. In this work, we utilized the two most widely used smoothing algorithms for language modelling, namely Witten-Bell (WB) smoothing and Kneser-Ney (KN) smoothing. Smoothing techniques are often implemented in conjunction with two useful strategies that take advantage of the lower order n-grams for the calculation of higher order n-grams that yield zero or low probabilities. These are the backoff [24] and interpolation [25] strategies.
2.4. Witten-Bell smoothing
Witten-Bell (WB) smoothing compensates the counts of word sequences occurring once to estimate the counts of zero-frequency word sequences. Originally, the WB smoothing algorithm was implemented as a linear interpolation instance taking advantage of lower order n-gram counts.

P_{WB}(w_i \mid w_{i-n+1} \dots w_{i-1}) = \lambda(w_{i-n+1} \dots w_{i-1}) P_{ML}(w_i \mid w_{i-n+1} \dots w_{i-1}) + [1 - \lambda(w_{i-n+1} \dots w_{i-1})] P_{WB}(w_i \mid w_{i-n+2} \dots w_{i-1})    (3)
Here, 1 - \lambda(w_{i-n+1} \dots w_{i-1}) is the total probability mass that is discounted to all the zero-count n-grams and \lambda(w_{i-n+1} \dots w_{i-1}) is the leftover probability mass for all non-zero count n-grams. With a little adjustment, WB smoothing can be implemented as an instance of a backoff language model. The backoff version of WB smoothing can be written as:
P_{WB}(w_i \mid w_{i-n+1} \dots w_{i-1}) =
    \lambda(w_{i-n+1} \dots w_{i-1}) P_{ML}(w_i \mid w_{i-n+1} \dots w_{i-1}),           if C(w_{i-n+1} \dots w_i) > 0
    [1 - \lambda(w_{i-n+1} \dots w_{i-1})] P_{WB}(w_i \mid w_{i-n+2} \dots w_{i-1}),     otherwise    (4)
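A minimal bigram-level instance of the interpolated form (3) can be sketched as follows, with lambda(v) = C(v) / (C(v) + T(v)) where T(v) is the number of distinct word types seen after v, and the recursion stopped at the maximum-likelihood unigram. The names and the toy corpus are ours; this is an illustrative sketch, not the authors' implementation:

```python
from collections import Counter

def wb_bigram_prob(w, v, uni, bi, followers, total):
    """Interpolated Witten-Bell bigram estimate in the spirit of (3):
    P_WB(w|v) = lam * P_ML(w|v) + (1 - lam) * P_ML(w),
    with lam = C(v) / (C(v) + T(v)) and T(v) = distinct followers of v."""
    p_uni = uni[w] / total                 # lower-order (unigram) ML estimate
    c_v = uni[v]
    if c_v == 0:
        return p_uni                       # unseen context: back off entirely
    lam = c_v / (c_v + len(followers.get(v, ())))
    return lam * bi[(v, w)] / c_v + (1 - lam) * p_uni

# Toy counts from two sentences, then score a bigram unseen in training.
uni, bi, followers = Counter(), Counter(), {}
for s in [["Kader", "ate", "a", "mango"], ["Rina", "ate", "an", "apple"]]:
    toks = ["<s>"] + s + ["</s>"]
    uni.update(toks)
    for v, w in zip(toks, toks[1:]):
        bi[(v, w)] += 1
        followers.setdefault(v, set()).add(w)
total = sum(uni.values())
print(wb_bigram_prob("mango", "an", uni, bi, followers, total))  # > 0 despite zero count
```

The unseen bigram now receives a small but non-zero probability taken from the discounted mass, which is exactly the behaviour (3) and (4) are designed to give.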
2.5. Kneser-Ney smoothing
In Kneser-Ney (KN) smoothing, the lower-order distribution that one combines with a higher-order distribution is built on the intuition that, rather than calculating the probability of a word proportional to its number of occurrences, it should be calculated based on the number of different words it follows. In its original definition, Kneser and Ney defined KN smoothing as a backoff language model combining lower order models with the higher order model using the backoff strategy as:

P_{KN}(w_i \mid w_{i-n+1} \dots w_{i-1}) =
    \max\{C(w_{i-n+1} \dots w_i) - D, 0\} / C(w_{i-n+1} \dots w_{i-1}),          if C(w_{i-n+1} \dots w_i) > 0
    \gamma(w_i \mid w_{i-n+1} \dots w_{i-1}) P_{KN}(w_i \mid w_{i-n+2} \dots w_{i-1}),    otherwise    (5)
where \gamma(w_i \mid w_{i-n+1} \dots w_{i-1}) represents the backoff weights assigned to the lower order n-grams, which determine the impact of the lower order value on the result. The discount D represents the amount of counts that are discounted from each higher order n-gram. D can be estimated based on the total number of n-grams occurring exactly once (n_1) and twice (n_2) as D = n_1 / (n_1 + 2 n_2). The probability for the lower order n-grams can be calculated as

P_{KN}(w_i \mid w_{i-n+2} \dots w_{i-1}) = N_{1+}(\bullet\, w_{i-n+2} \dots w_i) / N_{1+}(\bullet\, w_{i-n+2} \dots w_{i-1}\, \bullet)    (6)
where N_{1+}(\bullet\, w_{i-n+2} \dots w_i) = |\{w_{i-n+1} : C(w_{i-n+1} \dots w_i) > 0\}| and N_{1+}(\bullet\, w_{i-n+2} \dots w_{i-1}\, \bullet) = \sum_{w_i} N_{1+}(\bullet\, w_{i-n+2} \dots w_i). With a little modification, the interpolated version of KN can be defined as follows:
P_{KN}(w_i \mid w_{i-n+1} \dots w_{i-1}) = \max\{C(w_{i-n+1} \dots w_i) - D, 0\} / C(w_{i-n+1} \dots w_{i-1}) + \gamma(w_i \mid w_{i-n+1} \dots w_{i-1}) P_{KN}(w_i \mid w_{i-n+2} \dots w_{i-1})    (7)
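A bigram-level sketch of interpolated KN as in (6) and (7) follows. It uses a fixed discount D = 0.75 instead of the n_1/(n_1 + 2 n_2) estimate, and the names are ours; the code is illustrative, not the paper's implementation:

```python
from collections import Counter

def kn_bigram_prob(w, v, uni, bi, D=0.75):
    """Interpolated Kneser-Ney bigram estimate in the spirit of (7).
    The lower-order term is the continuation probability of (6):
    the number of distinct contexts w appears after, normalised by
    the total number of distinct bigram types."""
    p_cont = sum(1 for (_, b) in bi if b == w) / len(bi)  # N_1+(. w) / N_1+(. .)
    c_v = uni[v]
    if c_v == 0:
        return p_cont                     # unseen context: continuation only
    followers = sum(1 for (a, _) in bi if a == v)
    gamma = D * followers / c_v           # leftover weight for the lower order
    return max(bi[(v, w)] - D, 0) / c_v + gamma * p_cont

uni, bi = Counter(), Counter()
for s in [["Kader", "ate", "a", "mango"], ["Rina", "ate", "an", "apple"]]:
    toks = ["<s>"] + s + ["</s>"]
    uni.update(toks)
    bi.update(zip(toks, toks[1:]))
vocab = {b for (_, b) in bi}              # every word that can follow something
print(sum(kn_bigram_prob(w, "ate", uni, bi) for w in vocab))  # ~1.0: a proper distribution
```

Because the discounted mass gamma is redistributed through the continuation probability, the estimates for a seen context still sum to one over the vocabulary, which is what makes the interpolated form a valid distribution.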
3. PROPOSED GRAMMAR CHECKING METHODOLOGY
In this section we present the grammar checking methodology that we used to evaluate and analyse the performances of the smoothing algorithms. It is an updated version of the grammar checker we presented and described in our previous work. The overall framework or workflow of the system is depicted in Figure 1. The working procedure of the grammar checker consists of three main phases: the training phase, the validation phase and the testing phase.

Figure 1. Workflow diagram for the proposed grammar checker
The training process in the proposed system starts by accepting the training corpus and the n value as input. After accepting the input text and the n value, possible n-gram patterns of words are extracted and the frequencies of the n-grams are then calculated. Using these n-gram frequencies, LMs are trained based on the algorithms discussed in the previous sections. In the validation phase, the best possible threshold is calculated for separating the correct and incorrect sentences. The validation process starts by accepting a validation or held-out set consisting of a set of correct and incorrect test sentences. Then the probabilities of these test sentences are calculated and a threshold value is determined that best separates the grammatical and ungrammatical sentences. To do so, first we need to define a method to calculate the sentence probability properly, which is discussed next.
3.1. Calculation of sentence probability
The sentence probability in n-gram LMs is usually calculated using (1) by first finding the constituent n-grams in the sentence as shown in section 2.1. Since probabilities are by definition less than or equal to 1, the more probabilities we multiply together, the smaller the product becomes. Due to that, sentence length (i.e. the number of word tokens in the sentence) has a negative effect on the probability of a sentence. With larger length, a sentence tends to have a smaller probability even though its constituent n-grams have higher probabilities. So, a longer correct sentence might have a smaller probability than a shorter incorrect sentence because of this effect. To deal with this impact of sentence length on sentence probability calculation, a new sentence probability scoring function is introduced in this work, defined in (8) by normalizing the sentence probability in (1).

Score(S) = \sqrt[k]{\prod_{i=1}^{k} P(w_i \mid w_{i-n+1} \dots w_{i-1})}    (8)
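The k-th root in (8) is the geometric mean of the constituent n-gram probabilities. A small sketch, computed in log space for numerical stability (that implementation detail is our assumption, since (8) only specifies the root):

```python
import math

def normalized_score(ngram_probs):
    """Length-normalised sentence score per (8): the k-th root of the
    product of the k constituent n-gram probabilities, i.e. their
    geometric mean, computed in log space to avoid underflow."""
    if not ngram_probs or any(p <= 0 for p in ngram_probs):
        return 0.0
    return math.exp(sum(math.log(p) for p in ngram_probs) / len(ngram_probs))

# A longer sentence made of equally likely n-grams is no longer
# penalised for its length: both score 0.5, while the raw products
# of (1) would be 0.25 and about 0.001.
print(normalized_score([0.5, 0.5]), normalized_score([0.5] * 10))
```

This makes scores of sentences of different lengths directly comparable against a single threshold, which the two-stage threshold selection below relies on.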
3.2. Optimal threshold calculation
In the validation phase, the optimal threshold for the n-gram based classifier is calculated in two stages. In the first stage, we used 10-fold cross validation on the training set, which consists of only grammatically correct sentences. Since a correct sentence typically has a higher probability than an incorrect one, in each fold we selected the lowest probability score among the sentences of the training part as the threshold, used that threshold to classify the test sentences, and found the misclassification error with that threshold. The threshold that has the minimum misclassification error is finally chosen as the final threshold. The process is an improved version of the process we used in our previous work. The process is explained in Algorithm 1.
Algorithm 1. Preliminary threshold selection from training set in stage 1
Input: S = training data set; L = corresponding true labels of positive and negative sentences; LM = language model to be used
1. Divide the data set into 10 equal sized subsets as S = {S1, S2, ..., S10}
2. Set MCRmin = 1 //the minimum misclassification rate, and set T = final threshold
3. For i = 1 to 10 Do,
4.   Set Stest = Si and Strain = S - Si
5.   Train the LM on Strain.
6.   t = Find the minimum probability in Strain and set it as the current threshold
7.   probs = Test the LM on Stest using t as threshold.
8.   mcr = Find the misclassification rate for the current threshold.
9.   If mcr < MCRmin then set MCRmin = mcr and T = t
10. End For
11. return T //T is the final threshold selected
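Algorithm 1 can be sketched as follows, with one simplification that is ours: the sentence scores are precomputed once, rather than re-scored under an LM retrained on each fold's training part as the algorithm specifies:

```python
def stage1_threshold(scores, n_folds=10):
    """Stage-1 preliminary threshold (Algorithm 1, simplified): 10-fold
    cross validation over scores of grammatically correct sentences.
    Each fold's candidate is the minimum score of its training part;
    the candidate misclassifying the fewest held-out correct sentences
    (i.e. scored below the threshold) is returned."""
    folds = [scores[i::n_folds] for i in range(n_folds)]
    best_t, best_mcr = None, float("inf")
    for i in range(n_folds):
        held_out = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        t = min(train)                                  # candidate threshold
        mcr = sum(s < t for s in held_out) / len(held_out)
        if mcr < best_mcr:
            best_mcr, best_t = mcr, t
    return best_t

scores = [0.12, 0.35, 0.4, 0.5, 0.22, 0.3, 0.45, 0.6, 0.18, 0.55,
          0.33, 0.41, 0.27, 0.52, 0.38, 0.47, 0.29, 0.44, 0.36, 0.58]
print(stage1_threshold(scores))  # 0.12: the global minimum survives every other fold
```

Since every fold's threshold is the minimum of correct-sentence scores, this stage guarantees almost all correct sentences score above it, which is precisely why stage 2 is needed to rein in false positives.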
Though the methods in the first stage work well, they introduce a lot of false positives in the final classification. Since we are using the minimum probability score of correct or positive sentences as the threshold, it ensures high true positives, but it adversely overlaps with a substantial number of incorrect sentences in the probability distribution; hence the high false positive rates. To reduce the unwanted high number of false positives and to improve the overall classification performance, in the second stage we used a method that gradually increases the threshold to reduce the number of false positives while also ensuring a balance between false positives and false negatives. This method is applied on a separate validation set consisting of an equal number of positive and negative sentences to finalize the optimal threshold. This process is explained in Algorithm 2.
In the testing phase, the classification LMs are tested on a separate test set consisting of grammatical and ungrammatical sentences using the optimum threshold calculated in the validation phase. If any sentence has a probability less than the optimum threshold then it is classified as ungrammatical, otherwise grammatical.
Algorithm 2. Optimal threshold selection from validation set in stage 2
Input: t0 = preliminary threshold calculated from the training set using Algorithm 1; VS = validation set; L = corresponding true labels of positive and negative sentences in VS; LM = language model to be used
1. Calculate [TP, FP, TN, FN] using t0 as threshold testing on VS, where TP = no. of true positives, FP = no. of false positives, TN = no. of true negatives, FN = no. of false negatives.
   Calculate FPR = FP/(TN+FP), FNR = FN/(TP+FN) and MCR = FPR + FNR, where FPR = false positive rate, FNR = false negative rate and MCR = overall misclassification rate
2. Set th = t0 //th is the final threshold
   Divide the range [t0, 1] into k equal sized thresholds in THS = {t1, t2, ..., tk}
3. For each threshold t in THS Do,
4.   Calculate [TP, FP, TN, FN] using t as threshold on VS and hence calculate the fpr_t and fnr_t for t.
5.   If fpr_t <= fnr_t and MCR >= fpr_t + fnr_t then,
6.     Set th = t, FPR = fpr_t, FNR = fnr_t and MCR = FPR + FNR
7.   End If
8. End For
9. return th //th is the final threshold selected
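Algorithm 2 can be sketched directly over validation-set scores, where a sentence is accepted as grammatical when its score is at or above the threshold. The function name, the toy scores and the concrete k are our choices:

```python
def stage2_threshold(t0, pos_scores, neg_scores, k=100):
    """Stage-2 optimal threshold (Algorithm 2): sweep k equally
    spaced candidates in [t0, 1], keeping a candidate whenever its
    false-positive rate does not exceed its false-negative rate and
    the combined error does not increase."""
    def rates(t):
        fnr = sum(s < t for s in pos_scores) / len(pos_scores)   # FN rate
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)  # FP rate
        return fpr, fnr
    th = t0
    fpr, fnr = rates(t0)
    mcr = fpr + fnr
    for i in range(1, k + 1):
        t = t0 + (1 - t0) * i / k
        fp, fn = rates(t)
        if fp <= fn and mcr >= fp + fn:
            th, mcr = t, fp + fn
    return th

pos = [0.3, 0.5, 0.6, 0.8, 0.9]      # validation scores of correct sentences
neg = [0.1, 0.2, 0.35, 0.4, 0.55]    # validation scores of incorrect sentences
print(stage2_threshold(0.3, pos, neg))  # pushed up from 0.3 to cut false positives
```

Starting from the stage-1 minimum (0.3 here, which accepts three of the five incorrect sentences), the sweep settles on a higher threshold that rejects all the negatives while sacrificing only a small number of positives.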
4. RESULTS AND ANALYSIS
The main focus of this section is to investigate the performance of the grammar checking system based on certain factors such as the smoothing algorithm used, n-gram orders, the length of the target sentences, etc. To train and test the LMs we used a large corpus of 20 million words containing 181820 grammatically correct sentences. Around 80% of the corpus is used for training purposes. The validation set consists of 20000 correct sentences and 20000 incorrect sentences. The grammatically incorrect sentences are artificially created by inserting, deleting or replacing words in the correct sentences in the set. The test set contains 15000 correct and 15000 incorrect sentences. In our previous work, we only tested the methods on a test set containing only simple sentences of length 5-10 words. This time we tested the techniques on a more difficult and practical test set consisting of all kinds of simple, complex and compound sentences with lengths ranging from 5 to 20 words. The experiments were run on a machine with a 2.40GHz Intel Core i3 processor and 12 GB of RAM, running Microsoft Windows 8. The experimental system was developed using the Python programming language. The comparative performances of the LMs were evaluated by precision, recall and f-scores. The overall performances of the different LMs based on the smoothing techniques and n-gram order used are presented in Table 1.
Table 1 presents the results of the different LMs for each metric (precision, recall & f-score) in two columns. The T1 column represents the results obtained using the threshold selection method used in our previous work [22] and the T2 column represents the results attained using the two stage threshold selection procedure explained in Algorithm 1 and Algorithm 2, which is proposed in this work. Our newly proposed two stage optimum threshold selection approach clearly provides significantly improved results for all the LMs compared to the previous approach. It significantly increases the precision and hence the overall f-score for all the LMs at the cost of a small or insignificant reduction in recall values for grammatical sentences. Similarly, for ungrammatical sentences the recall scores are significantly improved, resulting in a much improved f-score with negligible loss in precision values. This improved performance is due to the reduction in false positives and also keeping a balance between false positives and false negatives. These results prove the superiority of our proposed method compared to the previous one. From the newly found results in Table 1 it is evident that KN-interp with its 5-gram model clearly outperforms all the other LMs in terms of precision, recall and f-score for both grammatical and ungrammatical sentences, achieving the highest f-scores of 72.92% and 68.51% respectively. In terms of f-score, as we can see from Table 1, WB-backoff produces the second best results for both grammatical and ungrammatical sentences, with the KN-backoff model providing the third best performance. The models rank similarly in terms of precision and recall, with one or two exceptions, such as for the recall metric KN-backoff performs slightly better than WB-backoff.
Table 1. Performances of different LMs (all values in %)

                           Performances with Grammatical Data            Performances with Ungrammatical Data
Model       Order    Precision       Recall        F-score          Precision       Recall        F-score
                     T1     T2     T1     T2     T1     T2        T1     T2     T1     T2     T1     T2
WB-backoff    2    34.56  42.10  52.67  51.54  41.74  46.35     38.69  37.53  24.89  29.11  30.29  32.79
              3    52.31  61.81  73.36  71.57  61.07  66.33     68.29  66.24  50.76  55.78  58.24  60.56
              4    55.43  66.48  75.76  73.55  64.02  69.84     72.58  70.40  56.32  62.91  63.43  66.45
              5    56.31  66.42  75.29  74.25  64.43  70.12     73.01  70.81  55.89  62.46  63.31  66.37
WB-interp     2    31.87  40.45  51.51  50.55  39.38  44.94     35.15  34.09  20.32  25.58  25.75  29.23
              3    52.12  60.70  69.47  67.91  59.56  64.10     65.55  63.58  47.98  56.02  55.41  59.56
              4    53.11  64.38  73.85  72.26  61.79  68.10     70.52  68.40  50.21  60.03  58.66  63.94
              5    55.02  64.70  75.34  73.36  63.60  68.76     71.39  69.24  53.41  59.97  61.10  64.27
KN-backoff    2    32.12  38.81  50.12  48.61  39.15  43.16     32.22  31.25  19.97  23.36  24.66  26.73
              3    49.18  59.64  69.92  68.02  57.75  63.56     64.75  62.80  47.65  53.97  54.90  58.05
              4    51.55  62.61  76.56  74.84  61.61  68.18     70.86  68.73  50.33  55.31  58.86  61.29
              5    52.44  64.01  78.93  77.00  63.01  69.91     73.35  71.14  50.91  56.71  60.10  63.11
KN-interp     2    35.76  44.79  56.30  55.36  43.74  49.52     42.86  41.57  25.87  31.76  32.26  36.01
              3    52.89  62.61  74.60  72.99  61.90  67.40     69.72  67.62  48.79  56.42  57.41  61.52
              4    57.09  67.18  77.38  76.46  65.70  71.52     73.63  72.69  55.88  62.64  63.54  67.29
              5    58.71  68.15  79.51  78.41  67.54  72.92     75.70  74.58  56.10  63.35  64.44  68.51

*Here, T1 is the threshold calculated using the threshold selection algorithm defined in our previous work [22]. T2 is the threshold calculated using the two-stage threshold selection technique introduced in this work.
Perfo
rm
ances
of
the
LMs
i
npr
ov
e
with
the
gro
wing
orde
r
of
-
gr
am
an
d
th
e
perf
or
m
ance
i
m
pr
ovem
ent
gets
le
sser
with
each
highe
r
order.
T
houg
h
the
perform
ances
of
m
os
t
of
t
he
LMs
te
nd
t
o
increase
f
ro
m
4
-
gr
am
or
de
r
to
5
-
gram
or
de
r
,
the
perform
a
nce
dif
fer
e
nces
are
ver
y
insig
nificant.
Fi
gur
e
2
and
Figure
3
de
pic
t
this
e
ff
ect
w
her
e
the
f
-
sc
ores
of
the
LMs
var
ie
d
by
t
he
-
gr
am
orde
r
a
re
prese
nted
f
or
bo
t
h
gr
am
m
atical
a
nd
ungram
m
a
t
ic
al
sentences.
Th
ough
not
presente
d
he
re,
si
m
il
ar
eff
ect
s
can
be
obse
r
ved
i
n
te
rm
s o
f
preci
s
ion
a
nd
recall
.
Since
we
are
us
in
g
a
data
set
con
sist
ing
of
va
ried
le
ng
th
of
sente
nce
s,
nex
t
we
try
to
find
out
wh
et
her
se
nten
ce
le
ng
th
has
a
ny
eff
ect
on
th
e
per
f
orm
ances
of
LMs.
Figures 4 and 5 present the f-scores of two of our best performing LMs, KN-interp and WB-backoff, varied by the length of the tested sentences, for grammatical and ungrammatical data respectively.
From Figures 4 and 5, we find that the performances of the LMs gradually decrease with increasing sentence length. This is understandable, since sentences with more words tend to be more complex in structure and more difficult to judge. But this degradation in performance is linear, not exponential, and the changes are very small. This shows the robustness of our sentence probability calculation function defined in (8).
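Equation (8) is not reproduced in this excerpt; the observed robustness to sentence length is consistent with a length-normalised score such as the average per-token log-probability under the n-gram model. A minimal sketch under that assumption (the function and padding symbols are this sketch's, not necessarily the paper's):

```python
import math

def sentence_logprob(tokens, ngram_prob, n=3):
    """Average per-token log-probability of a sentence under an n-gram model.

    `ngram_prob(word, history)` must return P(word | history). Dividing by
    the number of scored positions is the assumed length normalisation
    standing in for the paper's equation (8)."""
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
    logp = 0.0
    for i in range(n - 1, len(padded)):
        history = tuple(padded[i - n + 1:i])
        logp += math.log(ngram_prob(padded[i], history))
    return logp / (len(tokens) + 1)  # normalise so length barely matters
```

Under a uniform model the normalised score is identical for sentences of any length, which illustrates why the degradation seen in Figures 4 and 5 stays small.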
Though not presented here, the performances of the other LMs (KN-backoff and WB-interp), and performances on other metrics, show a similar dependency of the method on sentence length.
So, we can conclude that the KN LM in its interpolated version, i.e., KN-interp, outperforms all the other LMs in terms of all performance metrics. The performances of the LMs improve with higher n-gram order, with the 4-gram and 5-gram models showing similar performances with negligible differences, and the length of the sentence does not significantly affect the performance of the LMs.
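As context for why KN-interp performs well: interpolated Kneser-Ney discounts each observed n-gram count and redistributes the reserved mass to a lower-order distribution based on continuation counts (how many distinct histories a word follows). A minimal bigram-level sketch; the discount d = 0.75 is a conventional default, not a value taken from this paper:

```python
from collections import Counter

def kneser_ney_bigram(corpus, d=0.75):
    """Interpolated Kneser-Ney bigram estimate with absolute discount d."""
    bigrams = Counter()
    for sent in corpus:
        bigrams.update(zip(sent, sent[1:]))
    hist_count = Counter()   # c(h): total bigram tokens starting with h
    followers = Counter()    # T(h): distinct continuations of h
    preceders = Counter()    # distinct histories preceding w
    for (h, w), c in bigrams.items():
        hist_count[h] += c
        followers[h] += 1
        preceders[w] += 1
    total_types = len(bigrams)

    def p(w, h):
        p_cont = preceders[w] / total_types     # continuation probability
        c_h = hist_count[h]
        if c_h == 0:
            return p_cont                        # unseen history: back off fully
        discounted = max(bigrams[(h, w)] - d, 0.0) / c_h
        lam = d * followers[h] / c_h             # mass reserved for backoff
        return discounted + lam * p_cont
    return p
```

The discounted term and the interpolation weight balance so that the conditional probabilities over observed continuations still sum to one.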
Figure 2. Effect of n-gram order on the performances of LMs for grammatical data
Figure 3. Effect of n-gram order on the performances of LMs for ungrammatical data
Figure 4. Effect of sentence length on the performances of LMs for grammatical data
Figure 5. Effect of sentence length on the performances of LMs for ungrammatical data
5. CONCLUSION
The goal of this research was to design and develop a robust grammar checking system for the Bangla language which can accurately judge realistic, simple and complex sentences for grammaticality. To that end, a statistical grammar checking system based on n-gram language modelling has been designed and developed. To achieve robust performance with n-gram models, the two most widely used smoothing techniques, namely Kneser-Ney and Witten-Bell, were used and compared to find the best performing system. Furthermore, the LMs' performances were tested on a newly developed challenging test set containing 30000 sentences of all types, simple, complex and compound, to obtain realistic performance results. Our experimental results show that the Kneser-Ney interpolated smoothing based 5-gram LM outperforms the others in terms of all the metrics, achieving f-scores of 72.92% and 68.51% for grammatical and ungrammatical data respectively. To further this research work, more features such as parts-of-speech tags and other linguistic features can be added to improve the performance of the system.
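The Witten-Bell technique compared in this work estimates its interpolation weight from the number of distinct word types observed after each history. A minimal bigram-level sketch; the add-one unigram fallback is this sketch's assumption, not necessarily the paper's choice:

```python
from collections import Counter

def witten_bell_bigram(corpus):
    """Interpolated Witten-Bell bigram model from tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    hist_count, followers = Counter(), {}
    for sent in corpus:
        unigrams.update(sent)
        for h, w in zip(sent, sent[1:]):
            bigrams[(h, w)] += 1
            hist_count[h] += 1
            followers.setdefault(h, set()).add(w)
    n, vocab = sum(unigrams.values()), len(unigrams)

    def p_unigram(w):
        # Add-one fallback at the unigram level (an assumption of this sketch).
        return (unigrams[w] + 1) / (n + vocab)

    def p(w, h):
        c_h, t_h = hist_count[h], len(followers.get(h, ()))
        if c_h == 0:
            return p_unigram(w)            # unseen history: pure fallback
        lam = c_h / (c_h + t_h)            # Witten-Bell interpolation weight
        return lam * bigrams[(h, w)] / c_h + (1 - lam) * p_unigram(w)

    return p
```

The weight lam grows with how often the history was seen and shrinks with how many distinct words followed it, so diverse histories lean more on the fallback distribution.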
REFERENCES
[1] E. D. Liddy, "Natural language processing in Encyclopaedia of Library and Information Science," 2nd Edition, Florida: CRC Press, pp. 1-20, 2001.
[2] Shafaf Ibrahim, Nurul Amirah Zulkifli, Nurbaity Sabri, Anis Amilah Shari, and Mohd Rahmat Mohd Noordin, "Rice grain classification using multi-class support vector machine (SVM)," IAES International Journal of Artificial Intelligence (IJ-AI), vol. 8, no. 3, pp. 215-220, Sep. 2019.
[3] Amirul Sadikin Md Affendi, Marina Yusoff, "Review of anomalous sound event detection approaches," IAES International Journal of Artificial Intelligence (IJ-AI), vol. 8, no. 3, pp. 264-269, Sep. 2019.
[4] Cesar G. Pachon-Suescun, Carlos J. Enciso-Aragon, Robinson Jimenez-Moreno, "Robotic Navigation Algorithm with Machine Vision," International Journal of Electrical and Computer Engineering (IJECE), vol. 10, no. 2, pp. 1308-1316, Apr. 2020.
[5] K.A.F. A. Samah, I. M. Badarudin, E. E. Odzaly, K.N. Ismail, N.I. S. Nasarudin, N.F. Tahar, M.H. Khairuddin, "Optimization of house purchase recommendation system (HPRS) using genetic algorithm," Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 16, no. 3, pp. 1530-1538, Dec. 2019.
[6] Vernon A., "Computerized grammar checkers 2000: Capabilities, limitations, and pedagogical possibilities," Computers and Composition, vol. 17, no. 3, pp. 329-49, Dec. 2000.
[7] Richardson, S., "Microsoft natural language understanding system and grammar checker," In Fifth Conference on Applied Natural Language Processing: Descriptions of System Demonstrations and Videos, pp. 20-20, 1997.
[8] Arppe A., "Developing a grammar checker for Swedish," In Proceedings of the 12th Nordic Conference of Computational Linguistics (NODALIDA 1999), pp. 13-27, 2000.
[9] Shaalan KF., "Arabic GramCheck: A grammar checker for Arabic," Software: Practice and Experience, vol. 35, no. 7, pp. 643-65, Jun. 2005.
[10] Bopche L, Dhopavkar G, Kshirsagar M., "Grammar Checking System Using Rule Based Morphological Process for an Indian Language," In Global Trends in Information Systems and Software Applications, Springer, Berlin, Heidelberg, pp. 524-531, 2012.
[11] Jensen K, Heidorn GE, Richardson SD, "Natural language processing: the PLNLP approach," Springer Science & Business Media, Dec. 2012.
[12] Manning CD, Raghavan P, Schutze H., "Introduction to Information Retrieval," Cambridge University Press, Ch. 20, pp. 405-416, 2008.
[13] Martin JH, Jurafsky D., "Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition," Pearson/Prentice Hall, 2009.
[14] Witten IH, Bell TC., "The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression," IEEE Transactions on Information Theory, vol. 37, no. 4, pp. 1085-94, Jul. 1991.
[15] Kneser R, Ney H., "Improved backing-off for m-gram language modeling," In 1995 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-95), IEEE, vol. 1, pp. 181-184, May 1995.
[16] Md. Iftakher Alam Eyamin, Md. Tarek Habib, M. Ifte Khairul Islam, Md. Sadekur Rahman, Md. Abbas Ali Khan, "An Investigative Design of Optimum Stochastic Language Model for Bangla Autocomplete," Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 13, no. 2, pp. 671-676, 2019.
[17] Muhammad Ifte Khairul Islam, Md. Tarek Habib, Md. Sadekur Rahman and Md. Riazur Rahman, "A Context-Sensitive Approach to Find Optimum Language Model for Automatic Bangla Spelling Correction," International Journal of Advanced Computer Science and Applications, vol. 9, no. 11, pp. 184-191, 2018.
[18] Md. Tarek Habib, Abdullah Al-Mamun, Md. Sadekur Rahman, Shah Md. Tanvir Siddiquee and Farruk Ahmed, "An Exploratory Approach to Find a Novel Metric Based Optimum Language Model for Automatic Bangla Word Prediction," International Journal of Intelligent Systems and Applications (IJISA), vol. 10, no. 2, pp. 47-54, 2018.
[19] J. Lane, "The 10 Most Spoken Languages in the World," Babbel Magazine, 2019. [Online], Available: https://www.babbel.com/en/magazine/the-10-most-spoken-languages-in-the-world, [Accessed: 18 March 2018].
[20] Alam M. Jahangir, Naushad UzZaman, and Mumit Khan, "N-gram based Statistical Grammar Checker for Bangla and English," In Proceeding of ninth International Conference on Computer and Information Technology (ICCIT 2006), 2006.
[21] Nur Hossain Khan M, Khan F, Islam MM, Rahman MH, Sarker B., "Verification of Bangla Sentence Structure using N-Gram," Global Journal of Computer Science and Technology, May 2014.
[22] Rahman MR, Habib MT, Rahman MS, Shuvo SB, Uddin MS., "An Investigative Design Based Statistical Approach for Determining Bangla Sentence Validity," International Journal of Computer Science and Network Security (IJCSNS), vol. 16, no. 11, pp. 30, Nov. 2016.
[23] Charniak E., "Statistical language learning," MIT Press, 1996.
[24] Katz S., "Estimation of probabilities from sparse data for the language model component of a speech recognizer," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 35, no. 3, pp. 400-1, Mar. 1987.
[25] Jelinek F., "Interpolated estimation of Markov source parameters from sparse data," In Proc. Workshop on Pattern Recognition in Practice, 1980.