IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 10, No. 2, June 2021, pp. 306~315
ISSN: 2252-8938, DOI: 10.11591/ijai.v10.i2.pp306-315

Journal homepage: http://ijai.iaescore.com
Effective preprocessing based neural machine translation for English to Telugu cross-language information retrieval

B. N. V. Narasimha Raju 1, M. S. V. S. Bhadri Raju 2, K. V. V. Satyanarayana 3
1,3 Department of Computer Science and Engineering, Koneru Lakshamaiah Education Foundation (KLEF), Green fields, Vaddeswaram-522502, Guntur District, AP, India
2 Department of Computer Science and Engineering, S R K R Engineering College, China amirma, Bhimavaram-534204, West Godavari District, A.P, India
Article Info

Article history:
Received Jul 31, 2020
Revised Feb 20, 2021
Accepted Mar 20, 2021

ABSTRACT

In cross-language information retrieval (CLIR), neural machine translation (NMT) plays a vital role. CLIR retrieves the information written in a language which is different from the user's query language. In CLIR, the main concern is to translate the user query from the source language to the target language. NMT is useful for translating the data from one language to another. NMT has better accuracy for different language pairs like English to German and so on. In this paper, NMT is applied for translating English to Indian languages, especially Telugu. Besides NMT, an effort is also made to improve accuracy by applying an effective preprocessing mechanism. The role of effective preprocessing in improving accuracy will be less but countable. Machine translation (MT) is a data-driven approach where a parallel corpus acts as the input to MT. NMT requires a massive amount of parallel corpus for performing the translation. Building an English-Telugu parallel corpus is costly because they are resource-poor languages. Different mechanisms are available for preparing the parallel corpus. The major issue in preparing a parallel corpus is data replication, which is handled during preprocessing. The other issue in machine translation is the out-of-vocabulary (OOV) problem. Earlier, dictionaries were used to handle OOV problems. To overcome this problem, the rare words are segmented into sequences of subwords during preprocessing. The parameters like accuracy, perplexity, cross-entropy and BLEU scores show better translation quality for NMT with effective preprocessing.

Keywords:
Cross-language IR
Long short-term memory
Machine translation
Neural machine translation
Preprocessing

This is an open access article under the CC BY-SA license.
Corresponding Author:
B. N. V. Narasimha Raju
Department of Computer Science and Engineering
Koneru Lakshamaiah Education Foundation (KLEF)
Green fields, Vaddeswaram-522502, Guntur District, AP, India
Email: buddaraju.narasimharaju@gmail.com
1. INTRODUCTION
CLIR is a subfield of information retrieval (IR). In IR, we retrieve the relevant data by using the user query. CLIR is used to retrieve the data in another language than that of the user's query language. In this, we need to translate the user query language to another language. Machine translation (MT) is useful for the translation of data from one language to another. The need for MT has increased due to the rise of users in native languages. In the 1990s, 80% of the web content was in English. In 2011 it had fallen to 27%. This is because of the rise in the content of other languages like Russian, French, German, and so on. The KPMG analysis in India during 2017 stated that Indian language internet users were 38% in 2011, which was raised to
57% in 2016 and expected to be 72% by 2021. This is a clear sign of an increase in the importance of native languages. There is a need for an appropriate MT mechanism for translating from one language to another, especially for the Indian languages. This helps the users to go through content that is present in other than their native language.

Machine translation consists of different kinds of translation. Earlier, direct translation has to translate the sentence from the source language into a target language. This technique uses a bilingual dictionary for translation. Now corpus-based translation is used, which is categorised into statistical and neural machine translation. Statistical machine translation (SMT) will depend on both a bilingual corpus and statistical models. In NMT, the neural networks will perform the translations. Earlier SMT was used, but NMT has shown improvement in the accuracy of the translation. MT is a data-driven approach and depends on the corpus [1]-[3]. Without an adequate amount of corpus, it cannot achieve better translations. NMT uses the RNN architecture [4] and mainly depends on the parallel corpus. So, there is a need to collect more parallel corpus for better translations.

NMT consists of three phases, viz. preprocessing, encoding and decoding. Preprocessing is performed on the parallel corpus. Obtaining the parallel corpus for the Telugu-English language pair is difficult because they are resource-poor. Collection of the parallel corpus is either done manually or by using tools. The parallel corpus may contain different noises and inconsistencies. These kinds of problems will be more in morphologically rich languages like Telugu. The parallel corpus may contain data replication, like the same source with different translations and vice-versa. This would confuse the machine while performing translations.

NMT shows good results, but the translation faces problems like out-of-vocabulary (OOV) [5], [6]. If NMT uses a limited-size vocabulary with the highest frequency words, then it leads to OOV problems. This results in poor translations [7]-[9]. If the source sentence contains more frequent words, then the translation will be good. If the source sentence contains more unknown words, then the translation will be poor. These kinds of problems will be more in both resource-poor and morphologically rich languages. Earlier, OOV problems were handled using word-level NMT along with back-off dictionaries [10]-[12]. These models are not suitable for unknown words, so they copy the unknown words into the target text. It would be apt for the named entities but not for all the unknown words.

The preprocessing phase will remove all the noises, data replication and OOV problems in the data. Then the encoding and decoding phases of the NMT will use the data. Preprocessing will solve different kinds of problems, which improves the quality of translation. So, preprocessing is an essential step in NMT. The next phases in NMT are encoding and decoding. NMT consists of two neural networks [13]-[16], i.e. an encoder and a decoder. The source sentence is the input for the encoder, and the decoder will generate the translation for the source sentence. The encoder-decoder phase can also handle the problem of processing long sentences.
2. RELATED WORK
Earlier SMT techniques were essential for translating the sentences. B. N. V. Narasimha Raju et al. proposed [17] SMT for translation, which consists of a language model, a translation model and a decoder. In this, the phrase-based translation model has achieved adequate quality of translation. Later, neural networks along with MT have improved the translation when compared to SMT. Mai Oudah et al. proposed [13] the combination of NMT and SMT for English-Arabic languages. In this, NMT has shown good performance but suffers from the translation of short sentences. This was resolved by using both statistical and neural MT, but it lacks good tokenization schemes. These tokenization problems are handled during preprocessing. Preprocessing is one of the main steps in NMT. If preprocessing is not applied to the corpus, then the quality of the translation will be less. Preprocessing can perform tokenizations, remove unimportant words, and so on. Performing these techniques will depend on the situation. Duygu Ataman et al. proposed [18] using a fixed-size vocabulary. The conventional methods will cause semantic and syntactic losses. These problems are solved by unsupervised morphology learning which reduces the vocabulary. Anoop Kunchukuttan et al. proposed [19] text normalization on the Hindi-English parallel corpus and performed tokenization by using the Moses toolkit. Kyunghyun Cho et al. proposed [20] an RNN encoder-decoder along with a gated recursive convolutional neural network. Here the encoder is for extracting representations of fixed length from the input of variable length. From the representation, the decoder will generate the translations. This mechanism has shown good results for short sentences if they have a smaller number of unknown words. In this way, NMT is showing a good performance when compared to the other techniques. The performance of NMT will decrease if there are more unknown words. So, these unknown words are also a kind of OOV problem.
Preprocessing can handle data replication problems in the English to Telugu translation. The data replication will confuse the model while making the translations, which degrades the performance. So, remove the replicated data from the parallel corpus. To overcome the problem of OOV, we have used a tokenization scheme called byte-pair encoding (BPE) during preprocessing. BPE was generally used for data compression [21]. These are a kind of subword models which achieve better accuracy for the translation. It is also possible to translate some words that are not present at training time. Mattia A. Di Gangi et al. proposed [22] byte-pair encoding in preprocessing for English-German languages. In NMT, both the encoder and decoder are phases of MT. In general, the regular RNN will be used for the encoder and decoder phases. The regular RNN will have a problem when handling the longer-range dependencies in the source sentence. The proposed model uses long short-term memory (LSTM) instead of the regular RNN in the encoder and decoder. LSTM will produce better accuracy than the regular RNN in the case of long-range dependencies. NMT needs to use LSTMs in both the encoder and decoder phases. In the preprocessing phase, it handles both OOV and data replication problems.
3. NEURAL MACHINE TRANSLATION
NMT consists of three phases, i.e. preprocessing, encoding and decoding, as shown in Figure 1. NMT is a data-driven approach, so it depends on the parallel corpus. NMT requires a large parallel corpus for generating better translations. The collection of the parallel corpus is also a problem for resource-poor language pairs like Telugu-English. The creation of the parallel corpus can be either manual or by using tools. Usage of tools will generate noises, and handling these noises is difficult. So, manual preparation of the corpus would be better, but still there exist noise and inconsistencies in the data. If the parallel corpus is inconsistent or noisy, then it will reduce the accuracy in translation. The preprocessing phase will handle both noises and inconsistencies in a parallel corpus. NMT also faces a problem with data replication and OOV. Data replication in the parallel corpus will consist of the same source sentences with different translations and vice-versa. NMT will handle both the data replication and OOV problems while preprocessing.
Figure 1. Steps in neural machine translation
In preprocessing, the source text is the input. It removes unwanted characters, symbols and so on from a parallel corpus, and then removes the replicated data in the parallel corpus. First, we load the parallel corpus and convert the characters to lowercase. Now, remove the unwanted symbols in the parallel corpus. After removing them, save the filtered corpus. Now remove the replicated data from the filtered corpus. For this, we have first combined the parallel corpus into a single file by maintaining a delimiter. Identify and remove the replicated sentences from the Telugu-English parallel corpus. In general, the parallel corpus would contain the data of both Telugu and English in two files. Consider a sample parallel corpus as,
This is ravi house    ఇది ర ఇలు
Ravi gone to school    ర బడికి వెళాడు
This is ravi house    ఇది ర గృహము
The parallel corpus consists of data replication. Now convert the English corpus to lowercase. Now the parallel corpus will be

this is ravi house    ఇది ర ఇలు
ravi gone to school    ర బడికి వెళాడు
this is ravi house    ఇది ర గృహము

Remove the unwanted symbols and data replication in the parallel corpus. The parallel corpus consists of data replication in sentences 1 and 3: it has the same source sentence but different translations. To remove the
replication in data, combine the parallel corpus into a single file by using a delimiter. The delimiter is useful for separating the two sides of the English-Telugu parallel corpus. The delimiter used is */*. Now the parallel corpus will be,

this is ravi house */* ఇది ర ఇలు
ravi gone to school */* ర బడికి వెళాడు
this is ravi house */* ఇది ర గృహము

Convert the complete corpus into the Unicode format. Identify whether there is any data replication. The corpus present before the delimiter is English and after the delimiter is Telugu. Identify the replicated sentences in the English or Telugu language corpus. Now, remove the complete sentence, i.e. both the English and the corresponding Telugu language sentence. Lines 1 and 3 consist of replicated data. After removing the data replication, the corpus will be,

this is ravi house */* ఇది ర ఇలు
ravi gone to school */* ర బడికి వెళాడు

Remove the data replication in a parallel corpus. Now, pass the corpus for further preprocessing. A parallel corpus of 20000 lines was passed for testing data replication. It has reduced the corpus to 17589; the parallel corpus has 2411 replicated sentences. Removal of data replication would raise the performance of NMT.
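The replication-removal step described above can be sketched in Python. This is a minimal illustration, not the authors' actual tool; the */* delimiter and the sample sentences follow the example in the text:

```python
# Join an English-Telugu parallel corpus into one merged list using the
# */* delimiter, then drop every pair whose English side or Telugu side
# was already seen (same source with different translations, and
# vice-versa).

def merge_corpus(english_lines, telugu_lines, delimiter="*/*"):
    # Lowercase the English side, as in the preprocessing step above.
    return [f"{en.strip().lower()} {delimiter} {te.strip()}"
            for en, te in zip(english_lines, telugu_lines)]

def remove_replication(merged, delimiter="*/*"):
    seen_src, seen_tgt, kept = set(), set(), []
    for line in merged:
        src, _, tgt = line.partition(f" {delimiter} ")
        if src in seen_src or tgt in seen_tgt:
            continue  # replicated source or replicated translation
        seen_src.add(src)
        seen_tgt.add(tgt)
        kept.append(line)
    return kept

corpus = merge_corpus(
    ["This is ravi house", "Ravi gone to school", "This is ravi house"],
    ["ఇది ర ఇలు", "ర బడికి వెళాడు", "ఇది ర గృహము"],
)
print(remove_replication(corpus))  # only the first pair of each source survives
```

On the 20000-line corpus mentioned above, this idea corresponds to dropping the 2411 replicated pairs, leaving 17589 lines.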
NMT requires a Telugu-English parallel corpus for training. The English-Telugu pair is resource-poor. Due to low resources, the vocabulary of the corpus may contain only high-frequency words, which causes the OOV problem [23]-[25]. If the input of NMT consists of unknown words, then it will reduce performance. To remove this OOV problem, word segmentation techniques are used: divide the unknown word into subword units by using BPE, then try to translate by using subwords.
Byte pair encoding [12], [21] is a data compression mechanism that is used for merging frequent pairs of bytes. This technique is also useful for word segmentation: merge the characters or character sequences. In BPE, initialize the symbol vocabulary with the character vocabulary. Represent each word as a character sequence with a special delimiter at the end. The delimiter is useful after translation to restore the original token. Count all the symbol pairs and replace the most frequent symbol pair with a new symbol which represents an n-gram character. Merge the frequent n-gram characters to form a single symbol. In BPE, the initial vocabulary size and the final symbol vocabulary sizes are equal. The BPE algorithm in Figure 2 is useful for this kind of word segmentation. Apply BPE for both the source and target vocabulary. It is compact in text or vocabulary size, which means having a guarantee that the subword unit is present in the respective language training text.
Algorithm: Byte-pair encoding
Input: a set of strings S and the target vocab size k
procedure BPE(S, k)
    X <- all unique characters in S
    while |X| < k do
        tm, tn <- the most frequent bigram in S
        tl <- tm + tn
        X <- X + [tl]
        Replace each occurrence of tm, tn in S with tl
    end while
    return X
end procedure

Figure 2. Algorithm for byte pair encoding
In the corpus, BPE will count the frequency of each word. Split the word into characters and place a special token called </w> at the end of the word. For example, consider the word "high"; the tokens for the word are ["h", "i", "g", "h", "</w>"]. Count the frequency of all words in the corpus. It will specify the vocabulary for the tokenized words along with their corresponding counts. For example, consider

{'h i g h </w>': 4, 'h i g h e r </w>': 2, 'n e w e s t </w>': 6, 'w i d e s t </w>': 3}
In every iteration, find the most frequent consecutive byte pair and merge the two byte-pair tokens into one. In the first iteration, the byte pair "e" and "s" occurred 6+3=9 times. Merge them into a new token "es". Now the vocabulary is

{'h i g h </w>': 4, 'h i g h e r </w>': 2, 'n e w es t </w>': 6, 'w i d es t </w>': 3}

In the next iteration, the byte pair "es" and "t" occurred 6+3=9 times. Merge them into a new token "est". Now the vocabulary is

{'h i g h </w>': 4, 'h i g h e r </w>': 2, 'n e w est </w>': 6, 'w i d est </w>': 3}

In the next iteration, the byte pair "est" and "</w>" is the most frequent. Merge the byte pairs into a new token "est</w>". Repeat this until the defined subword vocabulary size is reached or the next highest frequency pair is 1. Suppose a word "highest" needs to be encoded; then BPE would split it into two subwords, viz. "high" and "est</w>". By using the subwords, NMT would try to translate them. In the same way, any unknown words are converted to subword units and translated. Apply the same process to the parallel corpus. Translating the unknown words will increase the performance of NMT.
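The merge loop of Figure 2 and the worked example above can be sketched in Python. This is a simplified illustration (the plain string replacement used for merging is adequate for this example; production BPE implementations use boundary-aware matching):

```python
from collections import Counter

def get_pair_counts(vocab):
    # Count adjacent symbol pairs, weighted by word frequency.
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    # Replace every occurrence of the pair with its concatenation.
    old, new = " ".join(pair), "".join(pair)
    return {word.replace(old, new): freq for word, freq in vocab.items()}

def learn_bpe(vocab, num_merges):
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best, count = pairs.most_common(1)[0]
        if count < 2:  # stop when the next highest frequency pair is 1
            break
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return vocab, merges

vocab = {'h i g h </w>': 4, 'h i g h e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}
vocab, merges = learn_bpe(vocab, 3)
print(merges)  # → [('e', 's'), ('es', 't'), ('est', '</w>')]
```

The three learned merges reproduce the "es", "est", and "est</w>" tokens of the worked example.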
After preprocessing, the input will be passed to the encoder of the NMT. In general, the end-to-end approach is used for sequence learning. The sequence to sequence model will generate a fixed-length output from a fixed-length input; here the lengths of the input and output may vary. The encoder consists of a multilayer LSTM, and it will map the input sequence to a fixed dimensionality vector. The decoder is also an LSTM that will generate the target sequence from the dimensionality vector [26], [27]. This architecture is shown in Figure 3. This mechanism will read the input sentence completely, and then it will start generating the output. The model will stop generating the output when it encounters a <eos> token.

Figure 3. Sequence to sequence model
In the NMT, the input sequence is (i_1, ..., i_T) and the RNN generates an output sequence (j_1, ..., j_T) by iteratively performing the following equations:

h_t = f(W_hi i_t + W_hh h_{t-1})
j_t = W_jh h_t
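The two recurrence equations above can be written directly in NumPy. A toy sketch, with tanh assumed as the activation f and illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 4, 8, 5                 # illustrative sizes
W_hi = 0.1 * rng.normal(size=(d_hid, d_in))
W_hh = 0.1 * rng.normal(size=(d_hid, d_hid))
W_jh = 0.1 * rng.normal(size=(d_out, d_hid))

def rnn_step(i_t, h_prev):
    # h_t = f(W_hi i_t + W_hh h_{t-1});  j_t = W_jh h_t
    h_t = np.tanh(W_hi @ i_t + W_hh @ h_prev)
    j_t = W_jh @ h_t
    return h_t, j_t

h = np.zeros(d_hid)
for i_t in rng.normal(size=(3, d_in)):       # a length-3 input sequence
    h, j = rnn_step(i_t, h)
```

The hidden state h_t carries the history forward, which is exactly where long-range dependencies become hard for the regular RNN discussed next.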
Here h represents the hidden layer, f represents the activation function, and W represents the weights. It is a little difficult to apply an RNN when the input and output sequences are of different lengths. A general sequence model can map the input sequence to a fixed-size vector using one RNN, and then generate the target sequence from this vector by using another RNN. The problem arises in the regular RNN when we need to train with long-range dependencies. In such situations, the regular RNN may fail to produce accurate translations. To overcome such problems, we can use LSTMs. LSTMs can handle the long-range dependencies in a better way than the regular RNN. The architecture of LSTM is shown in Figure 4. The behaviour of LSTM is to hold the information for a longer time. In LSTMs, there are four layers which interact in a special way. LSTM consists of a cell state, which is a horizontal line from C_{t-1} to C_t. It runs through the entire chain with some minor linear interactions. LSTM can add or remove the information in the cell state by using gates. A gate consists of a sigmoid layer and pointwise multiplication. The output of the sigmoid function is f_t.

f_t = σ(W_f · [h_{t-1}, i_t] + b_f)
Figure 4. LSTM cell
The next step will decide what new information we are going to add to the cell state. The output of the sigmoid layer is g_t. Next, the tanh will create a new candidate value C̃_t. Now we will update the cell state from C_{t-1} to C_t: multiply the old cell state by f_t and then add g_t * C̃_t.

g_t = σ(W_g · [h_{t-1}, i_t] + b_g)
C̃_t = tanh(W_c · [h_{t-1}, i_t] + b_c)
C_t = f_t * C_{t-1} + g_t * C̃_t

Finally, the output will be a filtered version of the cell state. The output of the sigmoid layer is o_t.

o_t = σ(W_o · [h_{t-1}, i_t] + b_o)
h_t = o_t * tanh(C_t)
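The gate equations above combine into a single cell update. A NumPy sketch (the weight matrices act on the concatenation [h_{t-1}, i_t], matching the formulas; sizes are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(i_t, h_prev, c_prev, W_f, b_f, W_g, b_g, W_c, b_c, W_o, b_o):
    z = np.concatenate([h_prev, i_t])    # [h_{t-1}, i_t]
    f_t = sigmoid(W_f @ z + b_f)         # forget gate
    g_t = sigmoid(W_g @ z + b_g)         # input gate
    c_tilde = np.tanh(W_c @ z + b_c)     # candidate value C~_t
    c_t = f_t * c_prev + g_t * c_tilde   # C_t = f_t * C_{t-1} + g_t * C~_t
    o_t = sigmoid(W_o @ z + b_o)         # output gate
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

d_in, d_hid = 3, 4                       # illustrative sizes
rng = np.random.default_rng(1)
mk_W = lambda: 0.1 * rng.normal(size=(d_hid, d_hid + d_in))
h, c = np.zeros(d_hid), np.zeros(d_hid)
h, c = lstm_step(rng.normal(size=d_in), h, c,
                 mk_W(), np.zeros(d_hid), mk_W(), np.zeros(d_hid),
                 mk_W(), np.zeros(d_hid), mk_W(), np.zeros(d_hid))
```

Because the cell state C_t is updated additively, gradients can flow across many time steps, which is why the LSTM copes with long-range dependencies better than the regular RNN.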
In this way, each LSTM cell functions. The goal of the LSTM is to find the conditional probability p(j_1, ..., j_{T'} | i_1, ..., i_T), where the lengths of the input sequence and the output sequence may vary. The length of the input sequence is T, and the length of the output sequence is T'. It computes the conditional probability by obtaining the fixed dimensional representation v of the input sequence (i_1, ..., i_T), given by the last hidden state of the LSTM. Now compute the probability of j_1, ..., j_{T'} with a standard LSTM-LM formulation whose initial hidden state is the representation v of i_1, ..., i_T.

p(j_1, ..., j_{T'} | i_1, ..., i_T) = ∏_{t=1}^{T'} p(j_t | v, j_1, ..., j_{t-1})

After rigorous training by using many sentence pairs, the decoder will produce the correct translation T of the source sentence S by using the LSTM.

T̂ = argmax_T p(T | S)
The parameters used for the evaluation of the model are accuracy, perplexity, cross-entropy, and the bilingual evaluation understudy (BLEU) score. Accuracy represents the amount of correct classification; the model having high accuracy is a better performer. Perplexity is a measure used for finding how well a probability model predicts a sample. A low perplexity score indicates a good probability distribution for predicting the sample. Cross-entropy is useful for calculating the loss function. It measures the difference between two probability distributions for a given set of events. The model having a lower cross-entropy score will be a better performer. The BLEU score is useful for evaluating the predictions made by machine translation systems. The model having a higher BLEU score will perform well in predicting the translations.
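The perplexity and cross-entropy measures above are directly related: perplexity is the exponential of the average per-token cross-entropy. A short illustration with toy probabilities (not the paper's results):

```python
import math

def cross_entropy(token_probs):
    # Average negative log-probability (in nats) that the model assigns
    # to the reference tokens.
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def perplexity(token_probs):
    # Lower perplexity = the model predicts the sample better.
    return math.exp(cross_entropy(token_probs))

probs = [0.25, 0.5, 0.125, 0.25]  # model probabilities of the reference tokens
print(cross_entropy(probs))  # → ln 4 ≈ 1.3863
print(perplexity(probs))     # → ≈ 4.0
```

A perplexity of 4 means the model is, on average, as uncertain as if it were choosing uniformly among 4 tokens at each step.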
4. RESULTS AND DISCUSSION
In this, the parallel corpus with and without replicated data is used. The replicated data will contain repeated sentences, sentences with the same source but different translations, and vice-versa. Remove all the
r
e
pe
a
te
d
s
e
nt
e
nc
e
s
f
r
om
th
e
da
ta
s
e
t
be
c
a
us
e
it
is
di
f
f
ic
ul
t
to
f
in
d
w
hi
c
h
s
our
c
e
is
c
or
r
e
c
t
f
or
th
e
tr
a
ns
la
te
d
s
e
nt
e
nc
e
a
nd
vi
c
e
-
ve
r
s
a
.
I
f
a
s
e
nt
e
nc
e
is
r
e
p
e
a
te
d
m
or
e
num
be
r
of
ti
m
e
s
in
th
e
da
ta
ba
s
e
,
it
w
il
l
c
onf
us
e
th
e
m
ode
l
in
id
e
nt
if
yi
ng
a
nd
le
a
r
ni
ng
ne
w
f
e
a
tu
r
e
s
.
I
t
c
a
us
e
s
ove
r
f
it
ti
ng
a
nd
w
il
l
ge
ne
r
a
te
w
r
ong
r
e
s
ul
ts
.
I
f
th
e
tr
a
in
a
nd
te
s
t
da
ta
s
e
ts
a
ls
o
c
ont
a
in
th
e
s
a
m
e
s
e
nt
e
nc
e
s
,
th
e
a
c
c
ur
a
c
y
w
il
l
be
m
o
r
e
dur
in
g
tr
a
in
in
g
a
nd
te
s
ti
ng.
I
t
w
il
l
f
a
il
w
hi
le
tr
a
ns
la
ti
ng
th
e
ne
w
s
e
nt
e
nc
e
s
to
pr
oduc
e
th
e
c
or
r
e
c
t
tr
a
ns
la
ti
ons
.
R
e
m
ove
th
e
s
e
pr
obl
e
m
s
f
r
om
t
he
da
ta
s
e
t
th
a
t
he
lp
s
i
n i
nc
r
e
a
s
in
g t
he
e
f
f
ic
ie
nc
y of
t
he
t
r
a
ns
la
ti
on.
Normal preprocessing is applied to generate word vocabularies, sequences of indices, and BPE for the data with and without replication.
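The BPE step can be sketched with the classic merge-learning loop: repeatedly merge the most frequent adjacent symbol pair so rare words decompose into known subword units. This is a minimal illustration on a toy word list, not the toolkit actually used for the experiments.

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Learn byte-pair-encoding merge operations from a word list."""
    # vocabulary: each word as a tuple of symbols plus an end-of-word marker
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # most frequent adjacent pair
        merges.append(best)
        merged = {}
        for word, freq in vocab.items():   # apply the merge to every word
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] = merged.get(tuple(out), 0) + freq
        vocab = merged
    return merges

merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=3)
# the first merges capture the shared stem: ('l', 'o'), then ('lo', 'w')
```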
The performance of NMT using BPE for data with replication and without replication is as shown in Figure 5. The training accuracy for the databases is shown in Figure 5(a), the training perplexity in Figure 5(b), the training cross-entropy in Figure 5(c), the validation accuracy in Figure 5(d), and the validation perplexity in Figure 5(e).
The data is then given as input to the NMT system. Here both the regular RNN and LSTM are used. The NMT model consists of two encoder and decoder layers. The size of the RNN is 500. It uses the Adam optimizer with a learning rate of 0.01, a decay of 0.5, a dropout of 0.3, and 20000 training steps.
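These hyperparameters can be gathered into a single configuration; the sketch below is illustrative only. The key names and the exact decay schedule are our assumptions, since the text states just the initial rate and the decay factor.

```python
# Illustrative training configuration mirroring the hyperparameters in the
# text. Key names are hypothetical, not tied to a specific NMT toolkit.
nmt_config = {
    "enc_layers": 2,             # encoder/decoder layers
    "dec_layers": 2,
    "rnn_size": 500,             # hidden size of the RNN/LSTM
    "optimizer": "adam",
    "learning_rate": 0.01,
    "learning_rate_decay": 0.5,  # rate is halved at each decay step
    "dropout": 0.3,
    "train_steps": 20000,
}

def decayed_lr(step, start_decay_step=10000, decay_every=1000):
    """Stepwise decay schedule (the schedule itself is an assumption;
    the paper states only the initial rate 0.01 and decay factor 0.5)."""
    lr = nmt_config["learning_rate"]
    if step <= start_decay_step:
        return lr
    n = (step - start_decay_step) // decay_every
    return lr * nmt_config["learning_rate_decay"] ** n
```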
Figure 5. Performance of NMT using BPE for data with replication and without replication: (a) training accuracy, (b) training perplexity, (c) training cross-entropy, (d) validation accuracy, (e) validation perplexity
We now compare the performance of NMT using different preprocessing techniques, i.e., data with and without replication, along with BPE. The comparison of parameters such as training accuracy, training perplexity, training cross-entropy, validation accuracy, and validation perplexity is shown in Table 1. In all parameters, the database without replication has achieved better performance.
Table 1. Comparison of parameters for evaluating the performance of database with and without replication

Parameters               Database without replication   Database with replication
Training Accuracy        91.02                          88.32
Training Perplexity      1.394                          1.598
Training Cross-Entropy   0.332                          0.4686
Validation Accuracy      77.99                          77.77
Validation Perplexity    4.086                          4.14
We have also measured the performance of NMT using the BLEU metric, and the scores are as shown in Table 2. The performance comparison of the different NMT techniques using the replicated and non-replicated corpus is as shown in Figure 6. NMT without replication, using BPE and LSTM, has the highest accuracy in translation. The preprocessing model used for removing the replication, along with BPE, is useful for improving the accuracy of NMT.
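For reference, a minimal sentence-level BLEU computation looks as follows. Real evaluations use corpus-level BLEU with smoothing; this sketch only illustrates the metric behind the Table 2 scores.

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of modified n-gram precisions
    (n = 1..max_n) multiplied by a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        c_ngrams = Counter(tuple(cand[i:i+n]) for i in range(len(cand) - n + 1))
        r_ngrams = Counter(tuple(ref[i:i+n]) for i in range(len(ref) - n + 1))
        # clip candidate n-gram counts by their counts in the reference
        overlap = sum(min(c, r_ngrams[g]) for g, c in c_ngrams.items())
        total = max(sum(c_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return 100 * bp * math.exp(log_mean)

perfect = bleu("the cat sat on the mat", "the cat sat on the mat")
# an exact match scores 100
```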
Table 2. Performance of various techniques for corpus with and without replication using BLEU score

Model      BLEU Score (without replication)   BLEU Score (with replication)
RNN        46.87                              46.03
LSTM       47.19                              46.38
BPE+RNN    47.70                              47.01
BPE+LSTM   48.81                              48.27
Figure 6. Comparison of the corpus with and without replication using BLEU score
5. CONCLUSION
In NMT, removing replication in the parallel corpus will improve the translation accuracy. Applying BPE solves the unknown-words, i.e., out-of-vocabulary (OOV), problem.
In this paper, the parallel corpus with and without data replication, along with BPE, is used in the preprocessing phase. The output of the preprocessing phase is given as input to the encoder and decoder phases.
The model was tested on the English-Telugu language pair. The quality of translation is measured using the accuracy, perplexity, cross-entropy, and BLEU scores.
By analyzing the performance of the various techniques, it was shown that the model trained without the replicated corpus achieves better translation accuracy.
So, removing the replicated data in the parallel corpus and solving the OOV problem are the two important steps for improving the accuracy of translation in resource-poor languages.
Preprocessing has shown a slight but measurable improvement in the quality of translation. Preprocessing can therefore be considered one of the essential steps in NMT.
So, NMT with effective preprocessing, i.e., removal of data replication along with BPE, has performed better for the English-Telugu parallel corpus. Thus, using NMT with efficient preprocessing for the English-Telugu parallel corpus improves the translation accuracy of CLIR.
BIOGRAPHIES OF AUTHORS
Mr B N V Narasimha Raju is a Research Scholar in the Department of CSE at K L Deemed to be University. He obtained his master's degree from SRKR Engineering College, affiliated to Andhra University, Visakhapatnam. He has a teaching experience of 6 years. His areas of interest are information retrieval, machine learning, deep learning, and machine translation. He is a Life Member of the Computer Society of India (CSI) and the Institute of Engineers (IE).
Dr. Bhadri Raju M S V S received his Ph.D. from JNTU University Hyderabad (JNTUH) and has a total teaching experience of 24 years and research experience of 10 years. He has published more than 45 papers in international journals and conferences. His areas of interest are information security, machine learning, and information retrieval. He is a Senior Member of IEEE, a Member of ACM, and a Life Member of the Computer Society of India (CSI), the Indian Society for Technical Education (ISTE), the Institute of Engineers (IE), and the Cryptography Research Society of India (CRSI). Currently he is associated with SRKR Engineering College, Bhimavaram, AP, as Professor and Head of the CSE Department.
Prof. K.V.V. Satyanarayana has been working as Professor in the Department of CSE at K L Deemed to be University since 2012, with specialization areas in bioinformatics and cloud computing. He has 32+ years of teaching experience in both UG and PG engineering courses. He obtained his master's degree in engineering from JNTUK, Kakinada, and his doctoral degree from Acharya Nagarjuna University. He worked as Director and Head of the Department at other institutions. He has more than 60 national and international peer-reviewed publications. He has organized and attended many national and international conferences and workshops. He has handled many administrative roles in K L University, and he is now acting as Coordinator for M.Tech courses.