IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 10, No. 3, September 2021, pp. 636~648
ISSN: 2252-8938, DOI: 10.11591/ijai.v10.i3.pp636-648
Journal homepage: http://ijai.iaescore.com

A two-phase plagiarism detection system based on multi-layer long short-term memory networks

Nguyen Van Son1, Le Thanh Huong2, Nguyen Chi Thanh3
1,3 Institute of Information Technology, MIST, Vietnam
2 School of Information and Computer Science Technology, Hanoi University of Science and Technology, Hanoi, Vietnam

Article Info

Article history:
Received Aug 25, 2020
Revised May 15, 2021
Accepted May 26, 2021

Keywords:
Deep learning
Feature extraction
Multi-layer long short-term memory
Plagiarism detection
Two-phase

ABSTRACT

Finding plagiarism strings between two given documents is the main task of the plagiarism detection problem. Traditional approaches based on string matching are not very useful in cases of semantically similar plagiarism. Deep learning approaches solve this problem by measuring the semantic similarity between pairs of sentences. However, these approaches still face the following challenges. First, they cannot handle cases where only part of a sentence belongs to a plagiarism passage. Second, measuring the sentential similarity without considering the context of surrounding sentences decreases accuracy. To solve the above problems, this paper proposes a two-phase plagiarism detection system based on a multi-layer long short-term memory network model and a feature extraction technique: (i) a passage-phase to recognize plagiarism passages, and (ii) a word-phase to determine the exact plagiarism strings. Our experimental results on the PAN 2014 corpus reached 94.26% F-measure, higher than existing research in this field.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Nguyen Van Son
Institute of Information Technology
MIST, Vietnam
Tel: (+84)904236683
Email: sonnv78@gmail.com

1. INTRODUCTION
Plagiarism is defined as the reuse of another person's ideas, processes, results, or words without explicitly acknowledging the source [1]. Plagiarism detection is the task of automatically retrieving strings in a suspicious document that are reused from another document. Based on the plagiarist's behavior, plagiarism is divided into two main types: literal plagiarism and intelligent plagiarism [2]. Literal plagiarism is a common case in which plagiarists do not spend much time hiding the academic crime they committed. For example, they copy and paste text from the internet. Intelligent plagiarism is a severe form of academic dishonesty wherein plagiarists try to deceive readers by changing others' contributions to appear as their own. Intelligent plagiarists try to hide, obfuscate, and change the original work in various intelligent ways, including text manipulation, translation, and idea adoption.
Over the past two decades, automatic plagiarism detection has received significant attention from the research community. The two main tasks of automatic plagiarism detection are source retrieval and text alignment. In the source retrieval task, given a suspicious document and a web search engine, the task is to retrieve all source documents from which text has been reused. In the text alignment subtask, given a pair of documents (a suspicious document and a source one), the task is to identify contiguous maximal-length passages of reused text.
Most existing works on text alignment focus on supervised and unsupervised approaches. Several unsupervised approaches use character-based methods (e.g., [1], [3], [4]) that apply string matching or approximate string matching with measures such as Hamming or Levenshtein distances to compute the similarity between two strings within a sliding window. Instead of comparing strings as in character-based methods, vector-based methods (e.g., [5], [6]) represent input texts as vectors of tokens and measure the distance between these vectors using similarity coefficients such as Jaccard, Cosine, Euclidean, or Manhattan distances. Based on the intuition that similar documents would have similar syntactical structures, some research works (e.g., [7], [8]) used syntactic information at the first stage of measuring sentential similarity. The main limitation of these unsupervised approaches is that they cannot deal with intelligent plagiarism, in which the same content can be expressed by different words and in different orders.
Research on intelligent plagiarism (e.g., [9]-[11]) often concentrates on finding the similarity between pairs of sentences. Gharavi et al. [9] proposed a plagiarism detection method for the Persian language by representing each sentence by a semantic embedding vector and then comparing these vectors using the cosine similarity. Cherroun et al. [10] proposed a two-phase system using a supervised learning approach to detect plagiarism in Arabic. The first phase produced a representation vector for each sentence by combining different features, including word embedding, word alignment, term frequency weighting, and part-of-speech tagging. The second phase used lexical, syntactic, and semantic features in three machine learning models (support vector machine (SVM), decision trees (DT), and random forests (RF)) to improve the accuracy of the first phase results. However, their approach did not deal with obfuscated plagiarism cases in which a passage is inserted in the middle of a sentence. Altheneyan et al. [11] presented two systems (PlagLinSVM and PlagRbfSVM) using the support vector machine (SVM) classifier with lexical, syntactic, and semantic features to detect plagiarized sentences. Their approach applied two levels of plagiarism detection: the paragraph level and the sentence level. The paragraph level detects similar paragraphs in the two input documents based on the number of common unigrams and bigrams of these paragraphs. The sentence level aligns sentences in the resulting paragraph pairs based on the number of common unigrams between the two sentences. If the score of a sentence pair was higher than the pre-defined threshold, the SVM classifier was applied to determine whether the two sentences are similar or not. Finally, plagiarism passages were created by connecting adjacent sentences that were copied from the source documents.
Previous intelligent plagiarism approaches are limited to finding copied paragraphs based on sentence units, assuming that people only copy or rewrite whole sentences. However, real cases of plagiarism are more complicated than that. When comparing the plagiarism strings and the source ones, we found that they can differ in: (i) the number of sentences; (ii) the sentence length; and (iii) the order in which the text appears. These situations are not yet resolved in existing research on plagiarism detection.
Recently, deep learning approaches have proven to be efficient in solving many natural language processing tasks. However, as far as we know, even the largest corpus for the plagiarism detection task is still very small for training such models. Therefore, in this paper, we propose a plagiarism detection system that takes advantage of hand-crafted feature vectors and a long short-term memory (LSTM) network model [12] to deal with the problems mentioned above. The system includes two main phases:
− passage-phase to find plagiarism passages in the suspicious and source documents.
− word-phase to remove redundant parts from the plagiarism passages to obtain the exact plagiarism strings.
The main contributions of this work are:
− We proposed new features at both the passage and word levels to improve the accuracy in detecting similar strings between two documents. These features are: (i) maximize passage similarity, maximize passage intersection, and passage importance at the passage-phase; and (ii) word similarity, average word similarity, and sentence based similarity at the word-phase.
− We proposed a two-phase plagiarism detection system based on a multi-layer LSTM network model using our proposed features to solve both literal and intelligent plagiarism problems.
The rest of the article is organized as follows: our proposed method is introduced in section 2. In section 3, we describe our experiments and analyze the results. Finally, our conclusions and future research directions are presented in section 4.

2. PROPOSED METHOD
The problem of finding similar strings between two documents is stated as [13]:
Definition 1: Given two documents $d$ and $d'$, the goal is to detect a set of passage pairs, $P$, such that:

$$P = \{\langle p_d^i, p_{d'}^j \rangle \mid p_d^i \in d,\ p_{d'}^j \in d',\ |p_d^i \approx p_{d'}^j| > \theta\} \tag{1}$$

in which $p_d^i$ is a string from $d$; $p_{d'}^j$ is a string from $d'$; $|p_d^i \approx p_{d'}^j|$ indicates the similarity between $p_d^i$ and $p_{d'}^j$; $\theta$ is a threshold that is used to determine whether two strings are similar enough to be considered as plagiarism.
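
As a minimal sketch of Definition 1, the selection of passage pairs can be written as a filter over all candidate pairs. The word-level Jaccard measure and the value of `theta` used here are placeholders chosen only for illustration; they are assumptions, not the similarity measure or threshold of the proposed system.

```python
from itertools import product

def jaccard(a: str, b: str) -> float:
    """Placeholder similarity: word-level Jaccard overlap (an assumption for this sketch)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

def detect_pairs(strings_d, strings_d_prime, theta, sim=jaccard):
    """Return every pair (p, q) whose similarity exceeds the threshold theta."""
    return [(p, q) for p, q in product(strings_d, strings_d_prime) if sim(p, q) > theta]

# Toy strings from a suspicious document d and a source document d'
d = ["the probe visited four planets", "budget details follow"]
d_prime = ["the probe visited four planets and moons", "an unrelated sentence"]
pairs = detect_pairs(d, d_prime, theta=0.5)
```

Only the first string of each document ends up in `pairs`, since it is the only pair whose overlap exceeds the threshold.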
The PAN series of shared-task competitions (plagiarism analysis, authorship identification, and near-duplicate detection) has defined four types of plagiarism.
a. None obfuscation: plagiarism cases are created by copying a paragraph from the source document and inserting it into the suspicious one.
b. Random obfuscation: plagiarism cases are created by inserting, deleting, or changing the order of words in a paragraph of the source, and inserting the result into the suspicious document.
c. Translation obfuscation: plagiarism cases are created by translating a paragraph more than once through several languages and back to the original language using different machine translation tools, and then inserting the translated paragraph into the suspicious document.
d. Summary obfuscation: plagiarism cases are created by summarizing the source paragraph and inserting the summary into the suspicious document.
This paper aims at solving plagiarism cases belonging to all four types above.
The workflow of our proposed system is shown in Figure 1. It includes three steps.
− Pre-processing: This step splits input documents into sentences, removes stop words and special characters, and combines short sentences into one.
− Passage-phase: After the pre-processing step, we use a context window sliding over the source and suspicious documents to create candidate passages. We extract features from these passages and generate an input feature matrix corresponding to these features. This matrix is fed into a binary classifier of the candidate selection module to obtain pairs of plagiarism passages.
− Word-phase: The pairs of plagiarism passages are used as the input for the word-phase. The purpose of this phase is to determine the exact plagiarism strings from the input passages. A binary classifier at the word level is used to perform this task.

Figure 1. Overview of the proposed system's workflow for plagiarism detection

2.1. Pre-processing
The input documents are split into sentences using the sentence tokenizer from the NLTK library. Then stop words are removed from these sentences. Some specific cases can affect the accuracy of plagiarism selection. These cases are:
− The input documents contain numbers that are written incorrectly, such as '8. 39' or '7 p.m'. In this case, the sentence splitter incorrectly segments the text into sentences at the dot ('.') character.
− After removing stop words, some short sentences contain no tokens or only one or two tokens. For example, the two sentences "Can you feel the burn?" and "Who we are?" are reduced to two words and to an empty string, respectively, after removing stop words and punctuation characters. Since the similarities of short sentences do not carry much meaning, we combine the short sentences with surrounding sentences and compare the similarity between the combined passages.
Therefore, to deal with the problems mentioned above, we first apply the sentence splitter and then remove stop words, numbers, and special characters from the sentences. After cleaning the text, sentences with fewer than three words are combined with the next sentence to create extended sentences. This combination step allows us to efficiently manage the passage's length after pairing and to avoid creating passages that are too long. We use a window of size w (sentences) sliding on both suspicious and
source documents to generate candidate plagiarism passages, which are used as inputs of the passage-phase. The optimal window size for the PAN datasets is three sentences.
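
The pre-processing steps described above can be sketched as follows. This is a simplified illustration under stated assumptions: the tiny `STOP_WORDS` set, the regular-expression tokenizer, the stride of one sentence for the sliding window, and the handling of a trailing short sentence are all choices made for this example and are not taken from the paper, which uses the NLTK sentence tokenizer and does not specify these details.

```python
import re

# A tiny stop-word list for illustration only (assumption); a real system would use a full list.
STOP_WORDS = {"a", "an", "the", "is", "are", "we", "you", "who", "can",
              "of", "to", "and", "in", "it", "its"}

def clean(sentence):
    """Lower-case, keep alphabetic tokens only (drops numbers and punctuation), remove stop words."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def merge_short(sentences, min_words=3):
    """Combine each cleaned sentence with fewer than min_words tokens with the next sentence."""
    merged, carry = [], []
    for tokens in sentences:
        tokens = carry + tokens
        if len(tokens) < min_words:
            carry = tokens              # too short: hold it and prepend it to the next sentence
        else:
            merged.append(tokens)
            carry = []
    if carry:                           # trailing short sentence: attach to the last one (assumption)
        if merged:
            merged[-1].extend(carry)
        else:
            merged.append(carry)
    return merged

def sliding_passages(sentences, w=3):
    """Candidate passages: every run of w consecutive sentences, moving one sentence at a time."""
    if len(sentences) <= w:
        return [sentences]
    return [sentences[i:i + w] for i in range(len(sentences) - w + 1)]

raw = ["Can you feel the burn?", "Who we are?", "The probe visited four distant planets.",
       "It carried powerful cameras.", "Its mission cost very little money.",
       "Scientists praised the results."]
cleaned = merge_short([clean(s) for s in raw])
passages = sliding_passages(cleaned, w=3)
```

In this toy input the first two sentences shrink to two tokens and zero tokens, so both are folded into the third sentence before the window of three sentences is applied.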

2.2. Passage-phase
The input of this phase is the set of candidate plagiarism passages, each consisting of three consecutive sentences from the suspicious or source documents. In this phase, each passage is encoded as a semantic embedding vector. The semantic similarity between two passages is calculated based on the distance between these vectors. We use SBERT to encode passages, since it is shown in [14] that SBERT performs better than other methods (e.g., Word2Vec [15], GloVe [16], fastText [17], InferSent [18], or Universal Sentence Encoder [19]) in various domains. The features representing each passage are derived from these passage vectors. They are then used as inputs for the binary classification at the passage level to detect whether two passages are similar or not.

2.2.1. Passage-phase feature extraction
Let $U = (u_1, u_2, \ldots, u_n)$ be the set of all candidate passages in the suspicious document and $V = (v_1, v_2, \ldots, v_m)$ the set of all candidate passages in the source document, where each passage $u_i$ and $v_j$ is represented as a passage embedding vector. We propose the following features for this phase:
− Maximize passage similarity
This feature is used to determine the maximum similarity of a passage vector $u_i$ against a set of passage vectors $V$. Let $\mathrm{sim}(u_i, v_j)$ be the similarity between two passage vectors $u_i$ and $v_j$, where $u_i \in U$, $v_j \in V$. Let $\mathrm{psim}_{u_i,V}$ be the maximum passage similarity of the passage vector $u_i$ against the set of passage vectors $V$. It is calculated as:

$$\mathrm{psim}_{u_i,V} = \max_{v_j \in V} \mathrm{sim}(u_i, v_j) \tag{2}$$

The maximize passage similarity feature vector of all passage vectors in the pair of suspicious and source documents is determined by (3):

$$\mathrm{psim}(U, V) = (\mathrm{psim}_{u_1,V}, \mathrm{psim}_{u_2,V}, \ldots, \mathrm{psim}_{u_n,V}, \mathrm{psim}_{v_1,U}, \mathrm{psim}_{v_2,U}, \ldots, \mathrm{psim}_{v_m,U}) \tag{3}$$
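
Equations (2) and (3) can be illustrated with a short sketch on toy two-dimensional vectors. Cosine similarity is used for `sim` here as an assumption (the text only states that similarity is derived from the distance between passage vectors), and the toy vectors stand in for real SBERT embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def max_similarity(vec, others, sim=cosine):
    """Equation (2): the best similarity of one passage vector against a set of vectors."""
    return max(sim(vec, o) for o in others)

def psim(U, V, sim=cosine):
    """Equation (3): values for all u_i against V, followed by all v_j against U."""
    return [max_similarity(u, V, sim) for u in U] + [max_similarity(v, U, sim) for v in V]

# Toy passage embeddings (assumed values, for illustration only)
U = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [1.0, 1.0], [0.0, -1.0]]
features = psim(U, V)
```

The resulting vector has n+m entries, one per passage, with the suspicious passages first.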
− Maximize passage intersection
To determine the maximum intersection value of a passage $u_i$ with a set of passages $V$, we split passages into words, find the intersection words of each passage pair $(u_i, v_j)$, with $u_i \in U$, $v_j \in V$, and take the maximum length of this intersection. This value is calculated as in (4), where $|u_i \cap v_j|$ is the number of words in the intersection:

$$\mathrm{pinter}_{u_i,V} = \max_{v_j \in V} |u_i \cap v_j| \tag{4}$$

The maximize passage intersection feature vector of all passages in the pair of suspicious and source documents is determined by (5):

$$\mathrm{pinter}(U, V) = (\mathrm{pinter}_{u_1,V}, \mathrm{pinter}_{u_2,V}, \ldots, \mathrm{pinter}_{u_n,V}, \mathrm{pinter}_{v_1,U}, \mathrm{pinter}_{v_2,U}, \ldots, \mathrm{pinter}_{v_m,U}) \tag{5}$$
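
A minimal sketch of equations (4) and (5) follows. It assumes that passages are already cleaned, whitespace-separated strings and that the intersection is taken over sets of distinct words; whether repeated words should be counted more than once is not specified in the text.

```python
def max_intersection(passage, others):
    """Equation (4): largest number of shared distinct words with any passage in `others`."""
    words = set(passage.split())
    return max(len(words & set(o.split())) for o in others)

def pinter(U, V):
    """Equation (5): values for all u_i against V, followed by all v_j against U."""
    return [max_intersection(u, V) for u in U] + [max_intersection(v, U) for v in V]

# Toy cleaned passages (assumed, for illustration only)
U = ["probe visited four planets", "cost candy bar"]
V = ["probe visited planets moons", "powerful cameras"]
features = pinter(U, V)
```

Here the first suspicious passage shares three words with the first source passage, while the remaining passages share none.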
− Passage importance
Term frequency-inverse document frequency (TF-IDF) is one of the most widely used term weighting schemes and is considered one of the most appropriate. TF-IDF is employed to discard terms with lower weights from documents, which helps to increase retrieval effectiveness. It is a numerical statistic that tells us how important a word is to a document in a collection or corpus, and it is mostly used as a weighting factor in information retrieval and text mining. To determine similar passages, we put forward the idea of term frequency-inverse sentence frequency (TF-ISF) [20]. We treat each passage as a document and each document as a corpus, then calculate the values of $\mathrm{TF}(w, U)$, $\mathrm{TF}(u_i, U)$, and $\mathrm{ISF}(u_i, U)$, in which $w$ is a term in a passage $u_i$ and $U$ is the document containing $u_i$. Given that $|u_i|$ is the total number of words in the passage $u_i$, $\mathrm{TF}(u_i, U)$ is computed as:

$$\mathrm{TF}(u_i, U) = \frac{\sum_{w \in u_i} \mathrm{TF}(w, U)}{|u_i|} \tag{6}$$

$\mathrm{ISF}(u_i, U)$ is computed by (7):

$$\mathrm{ISF}(u_i, U) = \frac{\sum_{w \in u_i} \mathrm{ISF}(w, U)}{|u_i|} \tag{7}$$

The passage importance of the passage $u_i$ in the document $U$ is determined by (8):

$$\mathrm{pimp}_{u_i,U} = \mathrm{TF}(u_i, U) \times \mathrm{ISF}(u_i, U) \tag{8}$$

The passage importance feature vector of all passages in the pair of suspicious and source documents is determined by (9):

$$\mathrm{pimp}(U, V) = (\mathrm{pimp}_{u_1,U}, \mathrm{pimp}_{u_2,U}, \ldots, \mathrm{pimp}_{u_n,U}, \mathrm{pimp}_{v_1,V}, \mathrm{pimp}_{v_2,V}, \ldots, \mathrm{pimp}_{v_m,V}) \tag{9}$$
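
The passage importance computation can be sketched as follows. The passage-level averaging and the final product follow the definitions of TF(u_i, U), ISF(u_i, U), and passage importance given above, but the word-level definitions are assumptions made only for this example, since the text does not spell them out: `tf_word` is taken as the relative frequency of a word in the document, and `isf_word` as the logarithm of the number of passages divided by the number of passages containing the word.

```python
import math
from collections import Counter

def passage_importance(passages):
    """passages: list of token lists, all from one document. Returns one score per passage."""
    all_tokens = [t for p in passages for t in p]
    counts = Counter(all_tokens)                 # occurrences of each word in the document
    total = len(all_tokens)
    n_passages = len(passages)
    containing = Counter(t for p in passages for t in set(p))  # passages containing each word

    def tf_word(w):
        # Assumed word-level TF: relative frequency in the whole document
        return counts[w] / total

    def isf_word(w):
        # Assumed word-level ISF: log(number of passages / passages containing w)
        return math.log(n_passages / containing[w])

    scores = []
    for p in passages:
        if not p:
            scores.append(0.0)
            continue
        tf = sum(tf_word(w) for w in p) / len(p)     # passage-level TF: mean over words
        isf = sum(isf_word(w) for w in p) / len(p)   # passage-level ISF: mean over words
        scores.append(tf * isf)                      # passage importance: TF x ISF
    return scores

# Toy document with three cleaned passages (assumed, for illustration only)
U = [["probe", "visited", "planets"], ["probe", "cameras"], ["probe", "probe"]]
scores = passage_importance(U)
```

Under these assumptions, a passage made up only of words that occur in every passage gets an importance of zero, while passages with rarer words score higher.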
− The feature matrix for the passage-phase
After extracting and creating the three feature vectors $\mathrm{psim}(U, V)$, $\mathrm{pinter}(U, V)$, and $\mathrm{pimp}(U, V)$, we combine them into a two-dimensional matrix of size $(n+m) \times 3$, where $n+m$ is the total number of passages from the suspicious and source documents. The feature matrix for all passages in the pair of suspicious and source documents is determined as in (10). It is used as the input for the multi-layer LSTM network model, described in section 2.2.2.

$$f_{passage} = \begin{pmatrix} \mathrm{psim}_{u_1,V} & \mathrm{pinter}_{u_1,V} & \mathrm{pimp}_{u_1,U} \\ \mathrm{psim}_{u_2,V} & \mathrm{pinter}_{u_2,V} & \mathrm{pimp}_{u_2,U} \\ \vdots & \vdots & \vdots \\ \mathrm{psim}_{v_m,U} & \mathrm{pinter}_{v_m,U} & \mathrm{pimp}_{v_m,V} \end{pmatrix} \tag{10}$$

2.2.2. Plagiarism passage selection
We build our binary classifier using a multi-layer LSTM network model, which predicts the probability that each passage in the pair of suspicious and source documents is a plagiarism passage. Figure 2 shows the structure of our model at the passage-phase. At this phase, we generate the input vectors by reshaping the feature matrix $f_{passage}$ into a three-dimensional matrix of shape (batch_size, time_steps, seq_len) and feed them into the model. The parameters used in the LSTM model are: (i) batch_size equals the number of passages; (ii) time_steps equals 1; (iii) seq_len equals the number of features (seq_len=3).
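
The reshaping step can be sketched in plain Python as below. The numeric values in `f_passage` are made up for illustration, and the helper names `to_lstm_input` and `shape` are hypothetical.

```python
def to_lstm_input(feature_matrix):
    """Reshape an (n+m) x 3 matrix into (batch_size, time_steps=1, seq_len=3)."""
    return [[list(row)] for row in feature_matrix]

def shape(tensor):
    """Return the (batch_size, time_steps, seq_len) shape of a nested list."""
    return (len(tensor), len(tensor[0]), len(tensor[0][0]))

# Toy feature matrix: one row per passage, columns = (psim, pinter, pimp); values are assumed
f_passage = [[0.91, 5, 0.20], [0.35, 1, 0.05], [0.88, 5, 0.18], [0.12, 0, 0.02]]
x = to_lstm_input(f_passage)
```

With NumPy, the same step would typically be written as `np.asarray(f_passage).reshape(-1, 1, 3)`.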

Figure 2. The architecture of the multi-layer LSTM model at the passage-phase

The output of the sigmoid activation function is always in the range (0,1). This function is applied to the output of all units in the last hidden LSTM layer. Let $Y = (y_1, y_2, \ldots, y_{n+m})$ be the output of the binary classification model ($0 < y_i < 1$), where $n+m$ is the number of passages in the pair of suspicious and source documents. As Figure 3 shows, the output of the model is converted into a vector of 0s and 1s, with value 1 for every $y_i$ higher than a threshold $\theta$ and value 0 otherwise.

[Figure 3 image: rows of 0/1 output values for the Suspicious and Source passages]
Figure 3. The output of the model at the passage-phase

Plagiarism passages are generated by selecting the sentences corresponding to the longest runs of consecutive 1s in the output of the model.
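
A minimal sketch of this selection step is given below: the model outputs are thresholded and the longest run of 1s is located. The threshold value of 0.5 and the tie-breaking rule (the first longest run wins) are assumptions made for this example.

```python
def binarize(probs, theta=0.5):
    """Map each model output y_i to 1 if it exceeds the threshold theta, else 0."""
    return [1 if y > theta else 0 for y in probs]

def longest_run_of_ones(bits):
    """Return (start, end) of the longest run of 1s, end exclusive; None if there is no 1."""
    best = None
    start = None
    for i, b in enumerate(bits + [0]):      # the sentinel 0 closes a trailing run
        if b == 1 and start is None:
            start = i
        elif b != 1 and start is not None:
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best

# Toy model outputs for eight consecutive passages (assumed values)
probs = [0.1, 0.2, 0.9, 0.8, 0.95, 0.3, 0.7, 0.1]
bits = binarize(probs, theta=0.5)
span = longest_run_of_ones(bits)
```

The returned span gives the indices of the consecutive passages that would be merged into one plagiarism passage.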
When observing and analyzing the plagiarism passages obtained, we found that most plagiarism passages contain entire sentences. However, a plagiarism paragraph often contains several redundant words at its two ends, as in the following example from the PAN 2014 corpus. In this example, the underlined text is inside the plagiarism paragraph, whereas the rest is redundant.

The suspicious plagiarism paragraph:
The capsule was designed for entry into the Martian atmosphere, descent to the surface, impact survival, and surface lifetimes of as much as six months and contained the power, guidance, control communications, and data handling systems necessary to complete its mission. is perhaps the most productive space probe yet deployed, visiting four planets and their moons, including two primary visits to previously unexplored planets, with powerful cameras and a multitude of scientific instruments, at a fraction of the money later spent on specialized probes such as the and the probe. Along with , and Voyager 2 is an . Voyager 2 Galileo spacecraft Cassini-Huygens [2] [3] Pioneer 10 Pioneer 11 Voyager 1 New Horizons interstellar probe resident per year, or roughly half the cost of one candy bar each year since project inception.

The source plagiarism paragraph:
Voyager 2 unmanned interplanetary space probe Voyager program Voyager 1 Voyager 2 ecliptic Solar System Uranus Neptune gravity assist Saturn Voyager 2 Titan Planetary Grand Tour [1] is perhaps the most productive space probe yet deployed, visiting four planets and their moons, including two primary visits to previously unexplored planets, with powerful cameras and a multitude of scientific instruments, at a fraction of the money later spent on specialized probes such as the and the probe. Along with , , and Voyager 2 is an . Voyager 2 Galileo spacecraft Cassini-Huygens [2] [3] Pioneer 10 Pioneer 11 Voyager 1 New Horizons interstellar probe Contents Titan 3E Centaur was originally planned to be, part of the.

To solve this problem, we extend the pairs of plagiarism passages from the suspicious and source documents by adding k sentences to the left and right of both passages. The extended passages are used as the input for the word-phase, which finds the exact plagiarism strings by removing redundant text from the extended plagiarism passages. The word-phase is introduced next.
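
The extension step can be sketched as below. The helper name `extend_passage`, the half-open span convention, and the clipping at document boundaries are assumptions for this illustration; the text does not state how passages at the start or end of a document are handled.

```python
def extend_passage(sentences, start, end, k=1):
    """Extend the span [start, end) by k sentences on each side, clipped to the document."""
    new_start = max(0, start - k)
    new_end = min(len(sentences), end + k)
    return new_start, new_end, sentences[new_start:new_end]

# Toy document of six sentences; the detected passage covers sentences 2 and 3
doc = ["s0", "s1", "s2", "s3", "s4", "s5"]
s, e, extended = extend_passage(doc, start=2, end=4, k=1)
```

The same function would be applied to both the suspicious and the source passage of each detected pair.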

2.3. Word-phase
To remove the redundant text at the two ends of the extended plagiarism passages, we need to identify semantically related segments based on consecutive words of high similarity. To capture the meaning of a word, we place that word in a window of size 3, with one word on the left and one word on the right. The text inside this window is used as the input of SBERT to create word feature vectors.
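
A minimal sketch of the window construction is shown below. The treatment of the first and last word, where only one neighbour exists, is an assumption; the text does not specify it.

```python
def word_windows(tokens):
    """For each token, return the text of its size-3 window: left neighbour, word, right neighbour."""
    windows = []
    for i in range(len(tokens)):
        left = max(0, i - 1)                 # clip at the start of the passage (assumption)
        right = min(len(tokens), i + 2)      # clip at the end of the passage (assumption)
        windows.append(" ".join(tokens[left:right]))
    return windows

tokens = ["most", "productive", "space", "probe"]
windows = word_windows(tokens)
```

Each window string would then be passed to the sentence encoder (for example, the `encode` method of a `SentenceTransformer` model in the sentence-transformers library) to obtain one vector per word.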

2.3.1. Word-level feature extraction
In this phase, three features are proposed based on the cosine similarity between words and between the sentences containing them. The word similarity feature is a vector that contains the maximum similarity value of each word. The maximum similarity of a word in the suspicious passage is the maximum similarity of that word with each word in the source passage, and vice versa. The average word similarity and sentence based similarity features are used to handle cases where the similarity value of a word differs greatly from those of the surrounding words. The average word similarity feature is a vector in which each item is the average of the word similarity values within the sentence. The sentence based similarity feature is a vector in which each item is the maximum of the sentence similarities of the sentence containing that word. The word-phase features are explained in detail below.
Let $P = (p_1, p_2, \ldots, p_n)$ be the extended suspicious passage and $Q = (q_1, q_2, \ldots, q_m)$ the extended source passage, where each word $p_i$ and $q_j$ is represented by a word embedding vector.
− Word similarity
Let $\mathrm{sim}(p_i, q_j)$ be the cosine similarity between two word vectors $p_i$ and $q_j$. The word similarity feature between $P$ and $Q$ is a vector computed as in (11).
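
$$\mathrm{wsim}(P, Q) = \left(\max_{q_j \in Q} \mathrm{sim}(p_1, q_j),\ \max_{q_j \in Q} \mathrm{sim}(p_2, q_j),\ \ldots,\ \max_{p_i \in P} \mathrm{sim}(q_m, p_i)\right) \tag{11}$$
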
− Average word similarity
Let wi (i = 1, …, n+m) be the i-th word in the pair of suspicious and source passages, d the sentence containing wi (wi ∈ d), and |d| the total number of words in the sentence d. Let avg(wi) be the average similarity of word wi in the sentence d, and wsim(i) the value of the i-th item in the word similarity feature vector. Then avg(wi) is computed as:
$$
avg(w_i) = \frac{\sum_{w_k \in d} wsim(k)}{|d|} \tag{12}
$$
The average word similarity feature between two passages P and Q is a vector determined by the following formula:
$$
wavg(P,Q) = \left( avg(p_1), avg(p_2), \ldots, avg(p_n), avg(q_1), avg(q_2), \ldots, avg(q_m) \right) \tag{13}
$$
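The averaging in (12) and (13) can be sketched in Python as below. This is a minimal example under stated assumptions: the word similarity values are already available as a flat list, and a hypothetical `sentence_ids` list (not part of the original description) records which sentence each word belongs to.

```python
def average_word_similarity(wsim, sentence_ids):
    """Replace each word similarity by the mean over the sentence containing the word.

    wsim[i] is the word similarity of word i; sentence_ids[i] identifies its sentence.
    """
    totals, counts = {}, {}
    for value, sid in zip(wsim, sentence_ids):
        totals[sid] = totals.get(sid, 0.0) + value
        counts[sid] = counts.get(sid, 0) + 1
    # Every word of a sentence receives the same sentence-level average.
    return [totals[sid] / counts[sid] for sid in sentence_ids]


# Made-up values: two words in sentence 0, three words in sentence 1.
wsim_values = [0.9, 0.7, 0.2, 0.4, 0.6]
sentence_ids = [0, 0, 1, 1, 1]
wavg = average_word_similarity(wsim_values, sentence_ids)
```

Because every word inherits the mean of its sentence, an isolated high or low word similarity is smoothed towards the values of its neighbours.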
− Sentence based similarity
We reuse the maximum passage similarity feature (as described in the passage-phase), with a sentence taking the role of the passage. Let U = (u1, u2, …, uk) and V = (v1, v2, …, vs) be the sets of sentences in the suspicious and source passages, respectively. Let sim_sent(pi) be the sentence based similarity of word pi in the sentence uj. Then sim_sent(pi) is computed as:
$$
sim\_sent(p_i) = \max_{v_l \in V} sim(u_j, v_l) \mid p_i \in u_j \tag{14}
$$
The sentence based similarity feature between two passages P and Q is a vector determined by the following formula:
$$
wsent(P,Q) = \left( sim\_sent(p_1), sim\_sent(p_2), \ldots, sim\_sent(p_n), sim\_sent(q_1), sim\_sent(q_2), \ldots, sim\_sent(q_m) \right) \tag{15}
$$
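A possible reading of (14) and (15) is sketched below in Python. The sentence-to-sentence similarities are assumed to be precomputed in a matrix `sent_sim`, since the sentence similarity function itself belongs to the passage-phase; the names `sent_sim`, `susp_ids`, and `src_ids` are hypothetical. Treating the source words symmetrically (maximum over the suspicious sentences) is also an assumption of this sketch.

```python
def sentence_based_similarity(sent_sim, susp_ids, src_ids):
    """Return wsent(P, Q) from a precomputed sentence similarity matrix.

    sent_sim[j][l]: similarity between suspicious sentence u_j and source sentence v_l.
    susp_ids[i]:    index j of the suspicious sentence containing word p_i.
    src_ids[i]:     index l of the source sentence containing word q_i.
    """
    # Best match of every suspicious sentence among the source sentences.
    best_for_u = [max(row) for row in sent_sim]
    # Assumed symmetric case: best match of every source sentence.
    best_for_v = [max(col) for col in zip(*sent_sim)]
    return [best_for_u[j] for j in susp_ids] + [best_for_v[l] for l in src_ids]


# Made-up example: 2 suspicious and 2 source sentences, 3 words in each passage.
sent_sim = [[0.8, 0.1],
            [0.3, 0.5]]
wsent = sentence_based_similarity(sent_sim, susp_ids=[0, 0, 1], src_ids=[0, 1, 1])
```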
The feature matrix for the word-phase: After computing the three feature vectors wsim(P,Q), wavg(P,Q), and wsent(P,Q), we combine them into a two-dimensional matrix of size (n+m) × 3.
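$$
f_{word} = \begin{pmatrix}
\max_{q_j \in Q} sim(p_1, q_j) & avg(p_1) & sim\_sent(p_1) \\
\max_{q_j \in Q} sim(p_2, q_j) & avg(p_2) & sim\_sent(p_2) \\
\vdots & \vdots & \vdots \\
\max_{p_j \in P} sim(q_m, p_j) & avg(q_m) & sim\_sent(q_m)
\end{pmatrix} \tag{16}
$$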
The feature matrix of all the extended plagiarism passages is determined by (16). This feature matrix is used as the input for the multi-layer LSTM model, described in section 2.3.2.
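For illustration, stacking the three word-phase feature vectors into the (n+m) × 3 matrix of (16) might look like the following Python sketch. The function name `build_feature_matrix` and the sample values are assumptions for this example only.

```python
def build_feature_matrix(wsim, wavg, wsent):
    """Stack three equal-length feature vectors into an (n+m) x 3 matrix."""
    if not (len(wsim) == len(wavg) == len(wsent)):
        raise ValueError("all feature vectors must have n + m entries")
    # Row i holds the three features of the i-th word of the passage pair.
    return [list(row) for row in zip(wsim, wavg, wsent)]


# Made-up feature values for a pair of passages with n + m = 3 words.
f_word = build_feature_matrix([0.9, 0.2, 0.7], [0.55, 0.55, 0.7], [0.8, 0.8, 0.6])
```

Each row of the matrix then describes one word, which matches a model input with one time step and three features per word.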
2.3.2. Plagiarism string selection
In this section, we conduct two processing steps: (i) select plagiarism sentences and (ii) remove redundant text. The details of each step are described as follows:
− Select plagiarism sentences
To select the exact plagiarism sentences from the extended plagiarism passages, we use a multi-layer LSTM model whose input is taken from the feature matrix f_word, as shown in Figure 4. The parameters used in this model are: (i) batch_size equals the number of words; (ii) time_steps equals 1; (iii) seq_len equals the number of features (seq_len = 3).
In Figure 4, pi and qj denote the i-th and j-th words in the pair of extended plagiarism passages, y_pred = (y1, y2, …, yn+m) is the output of the binary classification model (0 < yi < 1), and n+m is the total number of words in the pair of passages. The predicted mean value of a sentence u is computed as in (17):
$$
y\_pred\_sent = mean(y\_pred) = \frac{\sum_{w_i \in u} y_i}{|u|} \tag{17}
$$
where wi is a word in the sentence u. After computing the y_pred_sent values for all sentences, we create a vector whose size equals the total number of sentences in the pair of plagiarism passages. If the y_pred_sent value of a sentence is higher than a threshold β, the vector entry corresponding to that sentence is 1; otherwise, it is 0. We select the longest strings with the value 1 as the plagiarism sentences.
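The sentence selection step can be sketched in Python as follows. This is a simplified illustration for the sentences of a single passage: the helper names, the `sentence_ids` list, and the sample predictions are assumptions, and ties between equally long runs are resolved in favour of the first run, which the text does not specify.

```python
def sentence_means(y_pred, sentence_ids):
    """Mean predicted value per sentence, as in (17); sentences are returned in id order."""
    totals, counts = {}, {}
    for y, sid in zip(y_pred, sentence_ids):
        totals[sid] = totals.get(sid, 0.0) + y
        counts[sid] = counts.get(sid, 0) + 1
    return [totals[sid] / counts[sid] for sid in sorted(totals)]


def longest_run_of_ones(flags):
    """Return (start, end), inclusive, of the longest run of 1s, or None if there is none."""
    best, start = None, None
    for i, flag in enumerate(flags + [0]):  # trailing 0 closes a run at the end
        if flag == 1 and start is None:
            start = i
        elif flag != 1 and start is not None:
            if best is None or i - start > best[1] - best[0] + 1:
                best = (start, i - 1)
            start = None
    return best


beta = 0.1  # threshold value reported for the word-phase
y_pred = [0.9, 0.8, 0.05, 0.02, 0.7, 0.6, 0.9, 0.8]   # made-up word predictions
sentence_ids = [0, 0, 1, 1, 2, 2, 3, 3]                # two words per sentence
means = sentence_means(y_pred, sentence_ids)
flags = [1 if m > beta else 0 for m in means]
selected = longest_run_of_ones(flags)
```

In this toy input, sentences 2 and 3 form the longest run of flagged sentences, so they would be kept as the plagiarism sentences.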
Figure 4. The architecture of the multi-layer LSTM model at the word-phase
− Remove redundant text
To obtain the exact plagiarism strings, we consider the leftmost plagiarism sentence and the rightmost one, in which the difference between the sentence's max_threshold and min_threshold is higher than t1 (t1 = 0.4). The max_threshold and min_threshold of a sentence u are determined by (18) and (19):
$$
max\_threshold = \max_{w_i \in u} y_i \tag{18}
$$

$$
min\_threshold = \min_{w_i \in u} y_i \tag{19}
$$
where wi is a word in the sentence u. Such sentences have one part inside and the remaining part outside the plagiarism passage. The outside part is on the left (orient = 1) if the sentence is on the left of the plagiarism sentences, or on the right (orient = 2) if the sentence is on the right of the plagiarism sentences. If the result of the previous step contains only one sentence, the outside part may lie at both ends (orient = 3) of the sentence.
Analyzing the output vector y_pred of the LSTM model, we observe that the predicted values yi of the inside words are much higher than the predicted values yj of the outside ones. Algorithm 1 is used to cut off the redundant text from these sentences.
The idea of this algorithm is as follows: given a threshold α, find the longest text at the outer end of the leftmost sentence and of the rightmost one in which all words have a predicted value y_pred < α; this text is redundant. We define the left and right positions as the first and last words of the exact plagiarism string, respectively. The algorithm receives the following parameters as inputs:
− y_d: the vector of predicted values of the sentence, y_d = (y_1, y_2, …, y_t), where t is the number of words in the sentence.
− orient: determines whether the intersection position is on the left (orient = 1), on the right (orient = 2), or on both sides (orient = 3) of the boundary sentences.
Algorithm 1: Intersection position determination
Input: y_d, orient
 1: # orient = 1: left; orient = 2: right; orient = 3: both
 2: pos_left = 0; pos_right = length(y_d) - 1
 3: α = min(y_d) + (max(y_d) - min(y_d))/2
 4: if orient = 1 or orient = 3 then
 5:     for i = 0 to length(y_d) - 1 do
 6:         if y_d[i] > α then
 7:             pos_left = i
 8:             break
 9: if orient = 2 or orient = 3 then
10:     for i = length(y_d) - 1 downto 0 do
11:         if y_d[i] > α then
12:             pos_right = i
13:             break
Output: pos_left, pos_right
We initialize the left and right positions with the first and last points, respectively (line 2). The threshold α is the average of the maximum and minimum values of the y_d vector (line 3). We determine the left (line 4) and right (line 9) positions based on the orient value. For each direction, we scan the points (lines 5 and 10) and take the first point whose predicted value is higher than the threshold α (lines 7 and 12). These points are the results of the algorithm.
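A direct Python transcription of Algorithm 1 is given below as a sketch. The function name `intersection_position` and the sample vector are assumptions for illustration; the logic follows the pseudocode line by line and assumes y_d is non-empty.

```python
def intersection_position(y_d, orient):
    """Return (pos_left, pos_right) for a boundary sentence.

    y_d:    predicted values of the words in the sentence.
    orient: 1 = trim on the left, 2 = trim on the right, 3 = trim on both sides.
    """
    pos_left, pos_right = 0, len(y_d) - 1
    # Threshold halfway between the smallest and largest predicted value.
    alpha = min(y_d) + (max(y_d) - min(y_d)) / 2
    if orient in (1, 3):
        # Scan from the left for the first word above the threshold.
        for i in range(len(y_d)):
            if y_d[i] > alpha:
                pos_left = i
                break
    if orient in (2, 3):
        # Scan from the right for the first word above the threshold.
        for i in range(len(y_d) - 1, -1, -1):
            if y_d[i] > alpha:
                pos_right = i
                break
    return pos_left, pos_right


# Made-up predictions: low values at both ends, high values in the middle.
y_d = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1]
both = intersection_position(y_d, orient=3)
```

With this input the threshold is 0.5, so trimming on both sides keeps the words at positions 2 to 4.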
3. EXPERIMENT RESULTS AND DISCUSSION
In our experiments, we use the PAN 2013 text alignment training corpus [21] for training the system. This corpus is also the training corpus used in the PAN 2014 competition. The PAN 2013 corpus consists of 1000 no obfuscation, 1000 random obfuscation, 1000 translation obfuscation, and 1185 summary obfuscation pairs of documents. Normally, this corpus is too small for training a deep learning model. Our experiments aim to show that combining hand-crafted features with the LSTM model is a good solution to this problem. To compare our system's performance with state-of-the-art research on this task, we used the PAN 2014 text alignment test corpus [22] for evaluating the system.
3.1. Evaluation metrics
Our system was evaluated using a tool provided by PAN to measure system performance. The four measures used in PAN are macro-averaged Precision, Recall, Plagdet, and Granularity. The formulas for these values are as follows. Let S be the set of all plagiarism cases, R the set of all plagiarism cases detected by the system, s a plagiarism case, and r a detected case. The macro-averaged precision and recall are defined by:
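$$
prec(S,R) = \frac{1}{|R|} \sum_{r \in R} \frac{\left| \bigcup_{s \in S} (s \cap r) \right|}{|r|} \tag{20}
$$

$$
rec(S,R) = \frac{1}{|S|} \sum_{s \in S} \frac{\left| \bigcup_{r \in R} (s \cap r) \right|}{|s|} \tag{21}
$$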
The detection granularity of R under S indicates whether each plagiarism case s ∈ S is detected as a whole or in several pieces. It is calculated as:
$$
gran(S,R) = \frac{1}{|S_R|} \sum_{s \in S_R} |R_s| \tag{22}
$$
where S_R ⊆ S are the cases detected by detections in R, and R_s ⊆ R are the detections of a given s. Plagdet is the overall score of the system, which is calculated as:
$$
plagdet(S,R) = \frac{2 \times prec \times rec}{prec + rec} \times \frac{1}{\log_2(1 + gran(S,R))} \tag{23}
$$
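To make the four measures concrete, the Python sketch below evaluates (20) to (23) on a toy example. It is a simplification and not the official PAN tool: each case and each detection is modelled as a non-empty set of character positions in a single document, and granularity is taken to be 1 when nothing is detected; both choices are assumptions of this example.

```python
import math


def _covered(target, others):
    """Positions of `target` that overlap at least one set in `others`."""
    covered = set()
    for other in others:
        covered |= target & other
    return covered


def precision(S, R):
    return sum(len(_covered(r, S)) / len(r) for r in R) / len(R) if R else 0.0


def recall(S, R):
    return sum(len(_covered(s, R)) / len(s) for s in S) / len(S) if S else 0.0


def granularity(S, R):
    # S_R: cases hit by at least one detection; R_s: detections overlapping case s.
    detected = [s for s in S if any(s & r for r in R)]
    if not detected:
        return 1.0  # assumed convention when no case is detected
    return sum(sum(1 for r in R if s & r) for s in detected) / len(detected)


def plagdet(S, R):
    p, r = precision(S, R), recall(S, R)
    if p + r == 0:
        return 0.0
    f1 = 2 * p * r / (p + r)
    return f1 / math.log2(1 + granularity(S, R))


# One true case of 100 characters; two partial detections and one false detection.
S = [set(range(0, 100))]
R = [set(range(0, 40)), set(range(50, 90)), set(range(200, 220))]
score = plagdet(S, R)
```

Here the single case is found in two pieces, so the granularity is 2 and the F-measure is divided by log2(3), which shows how fragmented detections lower the overall score.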
3.2. Experimental results and analysis
Several tests were carried out to choose the best configuration for our system. We performed experiments for each phase to optimize the parameters of the system. Feature vectors extracted from pairs of documents in the PAN 2013 training corpus are passed to the multi-layer LSTM model during the training process. We chose binary_crossentropy as the loss function since the model is a binary classification model.
The threshold θ, which is used to select sentences in the passage-phase, is set to 0.1. To choose the value k (mentioned in section 2.2.2) for extending plagiarism passages, we initialize k to 1 and keep increasing it until the system reaches the highest recall value. The experiments showed that the value of k depends on the length of the plagiarism passages, as shown in Table 1.
At the word-phase, instead of using thresholds to identify each word, we apply the threshold β (β = 0.1) to y_pred_sent. The LSTM model generates an array whose size is equal to the number of sentences. The value of an array element is 1 if y_pred_sent is higher than β, and 0 otherwise. Then we select a continuous string with the highest predicted value.
Table 2 shows the accuracy and loss values in the LSTM training phase with the four datasets in PAN 2013.
To evaluate the effectiveness of our proposed features, we carried out experiments using subsets of the features instead of all of them, with pairs of documents from the PAN 2014 test corpus as input. Figure 5 shows the effect of these features at the word-phase on the system output. The three pairs of panels in Figures 5(a) to 5(f) show the y_pred prediction results and the final results using 1, 2, and 3 features, respectively. In these figures, the blue line shows the predicted result; the red line shows the average predicted value by sentence. The green line separates the suspicious and source passages; the black line shows the range of the selected plagiarism passages. The evaluation results indicate that all the proposed features are useful and handle both literal plagiarism and intelligent plagiarism well.
Table 1. The dynamic parameters for extending passage
No.   Plagiarism passage's length   k
1     ≥6 sentences                  1
2     ≥3 sentences                  2
3     ≥2 sentences                  3
4     1 sentence                    4
Table 2. Accuracy and loss values of the training phase
PAN 2013 training corpus   Sentence level          Word level
                           Accuracy    Loss        Accuracy    Loss
None Obfuscation           0.9925      0.0068      0.9808      0.0161
Random Obfuscation         0.9727      0.0814      0.9303      0.1909
Translate Obfuscation      0.9707      0.0748      0.9443      0.1229
Summary Obfuscation        -           -           0.9201      0.2096
Figure 5. Effects of selecting different features at the word-phase on the plagiarism passage: (a) using one feature - wsim(P,Q); (b) output result when using wsim(P,Q); (c) using two features - wsim(P,Q), wavg(P,Q); (d) output result when using wsim(P,Q), wavg(P,Q); (e) using three features - wsim(P,Q), wavg(P,Q), wsent(P,Q); and (f) output result when using wsim(P,Q), wavg(P,Q), wsent(P,Q)