Indonesian Journal of Electrical Engineering and Computer Science
Vol. 25, No. 3, March 2022, pp. 1703~1711
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v25.i3.pp1703-1711
Journal homepage: http://ijeecs.iaescore.com
Analyzing semantic similarity amongst textual documents to suggest near duplicates
Viji Devarajan1,2, Revathy Subramanian3
1Department of Computer Science and Engineering, Faculty of Engineering and Technology, Sathyabama Institute of Science and Technology, Chennai, India
2Department of Computing Technologies, Faculty of Engineering and Technology, SRM Institute of Technology, Kattankulathur, India
3Department of Information Technology, Faculty of Engineering and Technology, Sathyabama Institute of Science and Technology, Chennai, India
Article Info

Article history:
Received Aug 16, 2021
Revised Dec 18, 2021
Accepted Jan 11, 2022

ABSTRACT
Data deduplication techniques remove repeated or redundant data from storage. Recently, more and more data has been generated and stored in storage environments, and much of it is redundant or semantically similar content. Because this content occupies the storage environment, storage efficiency is reduced and the cost of storage is high. To overcome this problem, we propose HBTSG, a hybrid bidirectional encoder representations from transformers (BERT) model for text semantics using a graph convolutional network (GCN). It is a word-embedding-based deep learning model that identifies near duplicates based on the semantic relationship between text documents. In this paper we hybridize the concepts of chunking and semantic analysis. The chunking process splits the documents into blocks. In the next stage, we identify the semantic relationship between documents using word embedding techniques. The method combines the advantages of chunking, feature extraction, and semantic relations to provide better results.
Keywords:
BERT
Deep learning
GCN
Keyword extraction
Semantic similarity
This is an open access article under the CC BY-SA license.
Corresponding Author:
Viji Devarajan
Department of Computer Science and Engineering, Faculty of Engineering and Technology
Sathyabama Institute of Science and Technology
Chennai, India
Email: dviji2k@gmail.com
1. INTRODUCTION
In the digital world, everyone creates and uses digital data every day. As a result, an enormous amount of data is generated and stored in the cloud environment. According to statistics reported by the International Data Corporation (IDC), the data volume will reach 74 zettabytes by the end of 2021. The annual growth rate of digital data is 26 percent, so by the end of 2024 it will reach 149 zettabytes. Maintaining digital data will therefore become a tedious issue in the future. Data deduplication is becoming a predominant methodology for maintaining data in the cloud environment. It removes files with duplicate content and keeps only unique content in the storage environment [1]-[3]. Data may come in different formats: text, images, audio, and video. Deduplication techniques differ based on the data type, which is broadly classified into two types: text and multimedia data. Text data deduplication is carried out by file-level and content-based deduplication. File-level deduplication works on the file name, file type, and size of the file. Content-based deduplication splits the file into blocks, which may be of fixed or variable size. Deduplication has several notable benefits: it greatly improves storage efficiency and network bandwidth usage, and it decreases cost. But to achieve these things, deduplication needs extra resources to
calculate hash indexes and perform segmentation. Deduplication methodologies rely on the following techniques: similarity- and locality-based indexing [4], content-based inline deduplication [5], semantic-aware deduplication [6], hash functions [7], and record-linkage deduplication [8], all of which identify duplicate content and reduce the storage space.
Semantic similarity can be calculated between words, sentences, or documents. Compared with word-level or sentence-level analysis, feature extraction at the document level gives better findings because document-level relationships contain a greater number of entities than sentence-level ones [9]. This also makes it a harder task; to overcome this problem, pre-trained deep learning models are used. At the initial stage, data with similar content is grouped using text clustering. Traditional algorithms work on keyword and pattern matching, grouping similar documents in one cluster and dissimilar documents in another [10]. Semantic similarity is calculated from the distance between words, measured with cosine similarity [11]. Often, a word has different meanings in two different sentences depending on the context, yet both sentences are still treated as similar.
The main contributions of this research work are: i) a fixed-size blocking mechanism is used to divide the documents into blocks so that comparisons are done block by block, ii) keyword extraction is done with the help of a bidirectional encoder representations from transformers (BERT) model and a graph convolutional network (GCN) model, and iii) semantically similar documents are finally grouped together using the K-means clustering algorithm. The proposed model thus blends the advantages of the chunking process, feature extraction, and semantic relations.
Data deduplication generally focuses on eliminating redundant content from the storage environment. Early research focused on calculating fingerprints for blocks in order to identify repeated content. Later work focused on content-based deduplication methods: local maximum chunking (LMC), asymmetric extremum (AE), the rapid asymmetric maximum algorithm (RAM), and parity check of interval (PCI) were proposed to avoid the byte-shifting problem and to address the performance impact of chunk size [12]. Deduplication-assisted cloud-of-clouds (DAC) [13] distributes data across multiple independent cloud storage providers. DAC significantly improves performance and cost-efficiency, but the security of such cloud storage systems is weaker. Fast semantic duplicate detection [14] performs automatic text data deduplication on French and English text in a particular region, but its classification does not select optimal features, which increases search complexity. The performance of record linkage and various indexing methods has been compared in a survey [8], in which deduplication accuracy is calculated only at the record level.
Most existing work focuses on chunking or block size and on the indexing method, and achieves deduplication of text data on that basis. There is a research gap in identifying the semantic relationships between text documents. In this paper, we focus on deduplication based on semantic relationships between documents. Semantic relationships have previously been identified and applied in the following applications: text classification, document clustering, search engine queries, text summarization, and recommendation systems.
Semantic relationships between documents identify the content that is similar between them. This can be done by measuring the distance between terms. In recent years, natural language processing has helped to calculate semantic similarity between words and sentences based on word co-occurrences, a lexical database, or a corpus. Word co-occurrence methods extract keywords from the documents and process them, but they take into account neither the word order of sentences nor the meaning of a word in its context. A lexical database contains a predefined word tree structure that represents each word, its meaning, and its relationships with other words. It identifies only the best-matching pair rather than the exact meaning, and the method depends on the corpus, whose information may differ from one corpus to another [15].
Word embedding techniques are the more popular way to find semantic similarities between documents. Early on, latent semantic analysis (LSA) was used to identify word meaning. Later, Word2vec became more popular than LSA because Word2vec can handle large datasets, whereas LSA is more suitable for a small corpus [16]. Zhou et al. [17] proposed a text similarity measure based on word vector distance decentralization (WVDD). Input sentences are classified based on their labels and preprocessed by removing stop words; a fast semantic duplicate detection step then calculates the distance between word vectors to find similarities among sentences. WVDD performed well in identifying the similarity of Chinese texts, but it lags in processing long sentences and in feature selection, and its training set library is not up to the required level.
Ostendorff et al. [18] proposed a method to classify the semantic relationship between document pairs, for which they implemented six different word-embedding technologies. These include global vectors for word representation (GloVe), Doc2vec, and deep contextual language models such as BERT and XLNet, as well as a Siamese architecture; each method was evaluated and explored on a Wikipedia article-pair dataset. According to the findings of that paper, the Siamese architecture, compared with vanilla transformers, was not able to give good results in identifying semantic relations. Vanilla transformers allow two documents to be processed in parallel. However, transformers can use only 512 tokens from a text document, whereas AvgGloVe can evaluate entire text documents. XLNet can use lengthy sequences, but it needs pretraining with long sequences.
A hybrid BERT model was proposed for multi-label text classification [19]. This model is built from four modules. First, a context-aware representation is developed using a word embedding technique (pre-trained BERT). The second module, the label graph embedding module, focuses on the semantic correlation between labels using a GCN, which represents the semantic relations among labels [20]: each node represents a label, and each edge represents the semantic correlation between two nodes. Third, the adjective attention module calculates a score between each word and each label. The fourth module aggregates the features from the word embedding and the label graph embedding and feeds them into a bidirectional long short-term memory network (Bi-LSTM) for classification.
Feature or relation extraction at the sentence level is easy compared with the document level. Document-level relation identification needs more effort since a document contains many more entities. To resolve this issue, the document-level entity mask method with type information (DEMMT) was introduced by Han and Wang [9]. Each entity is masked by two tokens: the first token represents the type of the entity, and the second identifies the entity it is linked to. A BERT [21] encoder is used to find the relationships among entities. A bilinear layer and a softmax layer are used to identify the relations of all entity pairs. DEMMT improves results with all encoders, such as the convolutional neural network (CNN), long short-term memory network (LSTM), Bi-LSTM, and BERT.
The remaining sections are organized as follows. Section 2 presents the proposed method, a hybrid BERT model for text semantics using GCN (HBTSG). Section 3 presents the results and discussion. Finally, section 4 concludes our research work.
2. METHODOLOGY
This section focuses on identifying near duplicates based on semantic similarity between text documents. The proposed work consists of three sub-modules: in the first stage, segmentation, documents are split into blocks; in the second stage, keyword extraction is done through a GCN and a word scoring method; and the last stage finds the distance relation between clusters. The workflow of the hybrid BERT model for text semantics using the GCN method is shown in Figure 1.
Figure 1. Research flow of HBTSG model
2.1. Segmentation
Files are segmented into several blocks, and block-based deduplication is used to reduce redundancy and improve efficiency. Segmentation can be done in two ways: with fixed-size or variable-size blocks [3]. Fixed-size chunking is very simple to implement but suffers from the byte-shifting problem [12]. Because our proposed method focuses mainly on the semantic relations between documents, we use the fixed-size chunking method for segmentation. The most common block size for fixed-size chunking is 4 KB, which gives the best deduplication results according to an existing survey [12]. The documents are therefore segmented into 4 KB blocks, which are given as input to the preprocessing stage.
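As a minimal sketch of the fixed-size chunking described above, the following Python splits a byte string into 4 KB blocks and fingerprints each block. The SHA-256 fingerprint and the names `fixed_size_chunks`, `fingerprint`, and `document` are illustrative assumptions; the paper does not specify a hash function or an implementation.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4 * 1024  # 4 KB, the block size named in the text


def fixed_size_chunks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Split data into consecutive blocks of block_size bytes (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def fingerprint(block: bytes) -> str:
    """Hex SHA-256 digest, used here only as an illustrative block fingerprint."""
    return hashlib.sha256(block).hexdigest()


document = b"a" * 10_000  # hypothetical 10,000-byte document
blocks = fixed_size_chunks(document)
sizes = [len(block) for block in blocks]
unique = {fingerprint(block) for block in blocks}

print(sizes)        # [4096, 4096, 1808]
print(len(unique))  # 2: the two full blocks are identical
```

Block boundaries depend only on byte offsets, so in a document with varied content a single byte inserted at the start shifts every later boundary; this is the byte-shifting weakness of fixed-size chunking.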
2.2. Preprocessing
After segmentation, preprocessing is needed to improve accuracy and efficiency. In general, preprocessing is the first step of any data mining process. Good data preparation leads to efficient data analysis, fewer errors, and more precise results during processing. Data preparation, also called data preprocessing, is the job of cleaning the raw data before processing and analysis. The main purpose of this process is to make the data ready to explore. Data preparation is quite a lengthy process, but it is essential for further processing. Usually, data preparation includes regularizing the data formats, enriching the raw data, and getting rid of outliers with the help of data frames. We performed stop word removal and tokenization using the Natural Language Toolkit (NLTK). Examples of English stop words are is, I, an, and are. These words do not contribute important features. If we remove stop words, we can focus on the exact
features, and the dataset size is reduced greatly. Table 1 shows a sample before and after stop word removal. Tokenization is the process of splitting sentences into individual words or terms, which are also called tokens.
Table 1. Sample text before and after stop word removal
Sample text | Without stop words
I like food, so I am cooking | Like, food, cooking
He is studying computer science | Studying, computer science
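The two preprocessing steps can be sketched as follows, reproducing the rows of Table 1. To stay self-contained, this sketch does not call NLTK; `STOP_WORDS` is a tiny hypothetical list standing in for NLTK's English stop word list, and the regular-expression tokenizer is an assumption rather than the tokenizer used in the paper.

```python
import re

# Tiny illustrative stop word list; the paper uses NLTK's English list instead.
STOP_WORDS = {"i", "he", "is", "am", "an", "and", "are", "so", "the", "a"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Keep only the tokens that are not in STOP_WORDS."""
    return [token for token in tokens if token not in STOP_WORDS]


row_1 = remove_stop_words(tokenize("I like food, so I am cooking"))
row_2 = remove_stop_words(tokenize("He is studying computer science"))

print(row_1)  # ['like', 'food', 'cooking']
print(row_2)  # ['studying', 'computer', 'science']
```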
2.3. Keyword extraction
In the second stage of the work, keyword extraction is done with a GCN, together with word scoring based on semantic relationships using a BERT encoder. The simplest word embedding converts words into numbers using one-hot encoding. In this method, each word is treated as one feature of the vector, i.e., each word is represented by one column. The limitation of one-hot encoding is that a larger vocabulary requires more columns. As a result, most entries are zero and the encoding produces a sparse matrix, as shown in Table 2. Sparse matrices are difficult to handle in a machine learning algorithm.
Table 2. One-hot representation of words
      | I | am | doing | good
I     | 1 | 0  | 0     | 0
am    | 0 | 1  | 0     | 0
doing | 0 | 0  | 1     | 0
good  | 0 | 0  | 0     | 1
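The one-hot matrix of Table 2 can be rebuilt with a few lines of Python. This is only an illustrative sketch; the helper names `one_hot_table` and `zero_fraction` are hypothetical and do not come from the paper.

```python
def one_hot_table(words):
    """Map each distinct word to a vector with a single 1 in its own column."""
    vocab = list(dict.fromkeys(words))  # distinct words, first-seen order
    table = {}
    for column, word in enumerate(vocab):
        vector = [0] * len(vocab)
        vector[column] = 1
        table[word] = vector
    return vocab, table


def zero_fraction(table):
    """Fraction of entries in the one-hot matrix that are zero."""
    cells = [value for vector in table.values() for value in vector]
    return cells.count(0) / len(cells)


vocab, table = one_hot_table("I am doing good".split())
for word in vocab:
    print(word, table[word])
print(zero_fraction(table))  # 0.75 for this 4-word vocabulary
```

For a vocabulary of n words the zero fraction is (n - 1) / n, so the matrix becomes sparser as the vocabulary grows.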
If a sentence contains similar words, those words are still treated as different words. For example, good and great are more or less similar in meaning, but in the one-hot representation they are two unrelated words. The representation therefore says nothing about the semantic relationship between words. The Word2vec embedding algorithm was introduced to address this [22]. Word2vec is a neural network model whose word vectors can be compared using cosine similarity.
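The difference between the two representations can be illustrated with cosine similarity. In this sketch the one-hot vectors follow the scheme of Table 2, while the dense vectors `dense_good` and `dense_great` are made-up numbers chosen only for illustration; they are not real Word2vec embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# One-hot vectors of two different words never share a non-zero column.
one_hot_good = [0, 0, 0, 1, 0]
one_hot_great = [0, 0, 0, 0, 1]

# Hypothetical dense embeddings in which similar words point the same way.
dense_good = [0.8, 0.1, 0.3]
dense_great = [0.7, 0.2, 0.4]

print(cosine_similarity(one_hot_good, one_hot_great))  # 0.0
print(cosine_similarity(dense_good, dense_great))
```

The one-hot pair always scores 0.0, whatever the words mean, whereas the dense pair scores close to 1 because the two vectors point in nearly the same direction.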
The main goals of word embeddings are to reduce the dimensionality and to capture inter-word semantics. According to [23], neither Word2vec nor LSA fully accounts for co-occurrence statistics. Global vectors for word representation (GloVe) aims to improve on this by capturing context through co-occurrence probabilities. Its embeddings relate to the probabilities that two words appear together. As a count-based model, GloVe learns its vectors by dimensionality reduction on the co-occurrence counts [24], so it does not rely solely on the local context information of words [25]. Word2vec and GloVe are context-independent: they combine all the different senses of a word into one vector.
Embeddings from language models (ELMo) and BERT are context-dependent. These models generate a vector for a word based on its context. BERT generates its embeddings differently from other word embedding methods [21]. Table 3 compares the characteristics of the word embedding models. BERT takes information from both the forward and backward directions and combines it. The BERT model is used in the following applications: neural machine translation, question answering, sentiment analysis, and text summarization.
Table 3. Different word embedding models
Word embedding model | Context sensitive | Learnt representation
Word2vec | No | Words
GloVe | No | Words
ELMo | Yes | Character-based
BERT | Yes | Subwords
2.3.1. Word scoring based on semantic relationships
The word score is calculated from the weights of edges. Initially, the word score was calculated from word co-occurrence, with the edge weights given by co-occurrence counts. However, two words may never appear in the same window even though they have similar meanings. This limitation can be overcome by calculating the word score from semantic relations.
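The co-occurrence limitation can be made concrete with a small sketch. The window size, the token list, and the choice to score a word by the sum of its incident edge weights are all assumptions made for illustration; the paper does not give its exact scoring formula.

```python
from collections import Counter


def cooccurrence_edges(tokens, window=2):
    """Count unordered word pairs that occur within `window` positions of each other."""
    edges = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1:i + 1 + window]:
            if left != right:
                edges[tuple(sorted((left, right)))] += 1
    return edges


def word_scores(edges):
    """Score each word by the total weight of the edges that touch it."""
    scores = Counter()
    for (first, second), weight in edges.items():
        scores[first] += weight
        scores[second] += weight
    return scores


# Hypothetical token stream after stop word removal.
tokens = ["good", "food", "needs", "fresh", "spices",
          "great", "food", "needs", "fresh", "herbs"]
edges = cooccurrence_edges(tokens, window=2)
scores = word_scores(edges)

print(edges[("food", "needs")])      # 2
print(("good", "great") in edges)    # False
print(scores["food"])                # 7
```

Here good and great never fall in the same window, so no edge links them even though their meanings are close; a score based on semantic relations is meant to recover such links.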
There are many different ways to calculate semantic relationships between words. In this paper, we calculate them with a word embedding method that uses the pre-trained BERT model for the input features. BERT consists of three layers: an input layer, the BERT encoder, and an output layer. In the input layer, sentences are split into a set of word pieces, with continuation pieces indicated by ##. The second layer, the BERT encoder, contains transformer blocks
and self-attention heads applied to the input sequences. The structure of the BERT encoder is shown in Figure 2. The top of BERT contains a softmax classifier that calculates the conditional probability over the predefined labels.
Figure 2. Structure of input encoder
Let $A$ be an input sequence containing $n$ words, denoted as $A_{1:n} = A_1, A_2, A_3, \dots, A_n$, where $A_i$ ($1 \leq i \leq n$) refers to the $i$-th word in the sequence. Input sequences are constructed with or without auxiliary sentences, for example “He wrote the exam” with a label from the set {“positive”, “negative”}. Sample input sequence constructions are given in Table 4. The pseudo-sentence is made up of categorical labels and other words, in the form [CLS] $A_1 \dots A_i \dots A_n$ [SEP] $a_1 \dots a_j \dots a_m$ [SEP]. The target labels are represented as {0, 1} in the BERT4TC-AQ and BERT4TC-AA models [26]. In these models, no input sequence may contain more than 512 tokens. If an input sentence pair $(A_{1:n}, a_{1:m})$ satisfies $n + m + 3 > 512$, then at most 509 word tokens can be kept, where the constant 3 accounts for one [CLS] token plus two [SEP] tokens: 512 tokens - 3 tokens ([CLS], [SEP], [SEP]) = 509 tokens. The conditions for this pretreatment are specified in (1).
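$$
(A_{1:n}, a_{1:m}) =
\begin{cases}
[\mathrm{CLS}]\; A_1 \dots A_n\; [\mathrm{SEP}]\; a_1 \dots a_m\; [\mathrm{SEP}], & n + m < 509 \\
[\mathrm{CLS}]\; A_1 \dots A_{509-m}\; [\mathrm{SEP}]\; a_1 \dots a_m\; [\mathrm{SEP}], & n + m \geq 509
\end{cases}
\tag{1}
$$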
Table 4. Construction of input sequence
Input sequence | Label
[CLS] He wrote the exam [SEP] | {positive, negative}
[CLS] He wrote the exam [SEP] What is the result [SEP] | {positive, negative}
[CLS] He wrote the exam [SEP] The result is positive [SEP] | {1, 0}
[CLS] He wrote the exam [SEP] The result is negative [SEP] | {1, 0}
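The construction of the input sequence and its 512-token limit can be sketched as below. Whitespace-separated words stand in for BERT's real subword tokens, and `build_input` is a hypothetical helper; it assumes the auxiliary sentence is short and truncates only the main sentence so that at most 509 word tokens remain.

```python
MAX_TOKENS = 512
SPECIAL_TOKENS = 3                      # one [CLS] plus two [SEP]
BUDGET = MAX_TOKENS - SPECIAL_TOKENS    # 509 word tokens


def build_input(sentence, auxiliary):
    """Return [CLS] A [SEP] a [SEP], truncating A so that n + m <= 509."""
    keep = max(BUDGET - len(auxiliary), 0)
    return ["[CLS]"] + sentence[:keep] + ["[SEP]"] + auxiliary + ["[SEP]"]


short = build_input("He wrote the exam".split(), "What is the result".split())
print(" ".join(short))
# [CLS] He wrote the exam [SEP] What is the result [SEP]

# A hypothetical 600-word sentence is cut down to fit the limit.
long_input = build_input(["w"] * 600, "The result is positive".split())
print(len(long_input))  # 512
```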
Computers do not understand words; they work with numbers and vectors. The input embedding maps each word to a vector so that related words lie close to each other in the vector space. The same word can have different meanings in different sentences depending on its position. To handle this, a positional embedding is added to the input embedding. The positional embedding vector encodes the position of the word in the sentence. For example, Sentence 1: Apple releases a new version. Sentence 2: Daily eat one apple. Here the word apple is the same, but it has a different meaning depending on its position in the sentence. The positional embedding uses sine and cosine functions to generate a vector.
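A sine and cosine positional embedding can be sketched as follows. The paper does not state its formula, so this sketch assumes the sinusoidal scheme of the original transformer, with base 10000 and sine on even dimensions and cosine on odd ones; `positional_encoding` and `d_model` are illustrative names.

```python
import math


def positional_encoding(position, d_model):
    """Sinusoidal encoding: sin on even dimensions, cos on odd dimensions."""
    vector = []
    for k in range(d_model):
        # Dimensions 2i and 2i + 1 share the same frequency.
        angle = position / (10000 ** ((k - k % 2) / d_model))
        vector.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
    return vector


# "apple" is the first word of Sentence 1 and the fourth word of Sentence 2.
first = positional_encoding(0, 4)
fourth = positional_encoding(3, 4)

print(first)   # [0.0, 1.0, 0.0, 1.0]
print(fourth)
```

Because the two positions produce different vectors, adding them to the same word embedding gives the encoder a different input for apple in each sentence.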
[Figure 2 block labels: Input Encoder Block; Multi-Head Attention; Add & Norm; Feed Forward; Add & Norm; Input Embedding; Positional Embedding]
Encoder blocks contain feed-forward and multi-head attention sub-blocks. The feed-forward network transforms each attention vector into a form that the next block can use. How relevant is a word to the other words in the same sentence? This is resolved by generating an attention vector for every word, based on the contextual relationships between the words in the sentence. Additionally, layer normalization is applied in each layer; it normalizes across the features of each sample rather than across samples. BERT uses a stack of these input encoder blocks. BERT feature extraction is explained in Algorithm 1.
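The attention vector and the normalization step can be sketched in plain Python. This is a simplified, single-head illustration that assumes queries, keys, and values are all equal to the input vectors, with no learned projection matrices; the input `x` is a made-up set of three 2-dimensional word vectors, and none of this is the trained BERT encoder.

```python
import math


def softmax(row):
    """Numerically stable softmax of a list of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [value / total for value in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x."""
    d = len(x[0])
    scores = matmul(x, [list(col) for col in zip(*x)])  # x times x transposed
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return weights, matmul(weights, x)


def layer_norm(vector, eps=1e-5):
    """Normalize one sample across its features (zero mean, unit variance)."""
    mean = sum(vector) / len(vector)
    variance = sum((v - mean) ** 2 for v in vector) / len(vector)
    return [(v - mean) / math.sqrt(variance + eps) for v in vector]


x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # hypothetical word vectors
weights, attended = self_attention(x)
normalized = [layer_norm(row) for row in attended]

print(weights[0])  # word 0 attends more to word 1 than to word 2
```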
Algorithm 1: BERT feature extraction
Input   R(f): set consisting of the input sequences of f.
        S(f): set of similarity scores related to the input sentences of f.
Output  J: set of extractions. Each j ∈ J denotes a tuple (feature, similarity score).
pre-process the data;
for s ∈ S(f) do
    remove s;
    tag the features of s with the pre-trained BERT model;
end
reconfigure;
J = ∅;
for s ∈ S(f) do
    for each sentence in s do
        M = nouns of s tagged by BERT;
        for r ∈ R(f) do
            N = r ∪ M;
            RES = original features − BERT features;
            j = (N, RES);
            add j to J;
        end
    end
end
J = feature-extracted, normalized data;
return J;
2.3.2. Graph embedding
ap
h
em
b
ed
d
in
g
tech
n
o
lo
g
y
co
n
v
er
ts
g
r
ap
h
s
in
to
lo
wer
-
d
im
en
s
io
n
al
b
ef
o
r
e
s
en
d
i
n
g
th
em
in
to
a
m
ac
h
in
e
lear
n
i
n
g
alg
o
r
ith
m
.
T
h
at
tr
an
s
f
o
r
m
s
n
o
d
es,
e
d
g
es,
an
d
r
elatio
n
s
(
f
ea
tu
r
es)
i
n
to
v
ec
to
r
s
p
ac
e.
Gr
a
p
h
an
aly
tics
p
r
o
ce
s
s
m
ajo
r
l
y
u
s
e
d
in
th
e
f
o
llo
win
g
ar
ea
:
n
o
d
e
class
if
icatio
n
u
s
ed
to
f
in
d
a
lab
el
o
f
n
o
d
es,
lin
k
p
r
ed
ictio
n
to
p
r
ed
ict
m
is
s
in
g
lin
k
an
d
f
u
tu
r
e
o
cc
u
r
r
e
n
ce
,
clu
s
ter
in
g
u
s
ed
to
id
en
tify
s
am
e
n
o
d
e
ty
p
e
an
d
g
r
o
u
p
th
em
,
v
is
u
aliza
tio
n
im
p
r
o
v
es t
h
e
s
tr
u
ctu
r
e
o
f
th
e
n
etwo
r
k
[
2
7
]
.
2.3.3. Deep learning-based model
Deep neural network-based models such as structural deep network embedding (SDNE) and deep neural networks for graph representation (DNGR) are expensive and inefficient for large sparse graphs. Graph convolutional networks overcome this problem [20]. In recent years, GCNs have been used in most research work to assign labels to all nodes. A GCN captures the overall structural information of the graph, especially the semantic relations among labels from the context. The layer-wise propagation rule used in a multi-layer GCN is shown in (2) [20].
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right) \quad (2)$$

where,
- $\tilde{A} = A + I_N$ is the adjacency matrix of graph G with added self-connections.
- $I_N$ is the identity matrix.
- $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, and $W^{(l)}$ is a layer-specific trainable weight matrix.
- $\sigma(\cdot)$ is an activation function, e.g., $\mathrm{ReLU}(\cdot) = \max(0, \cdot)$.
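As a rough illustration of the propagation rule in (2), the sketch below applies one GCN layer to a toy three-node graph in plain Python. The function names (`normalize_adjacency`, `gcn_layer`), the graph, the feature values, and the weights are made up for this example and are not taken from the experiments in this paper.

```python
import math

def normalize_adjacency(A):
    """Return D~^{-1/2} (A + I) D~^{-1/2} for a square adjacency matrix A."""
    n = len(A)
    # A~ = A + I_N: add a self-connection to every node.
    A_tilde = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # D~_ii = sum_j A~_ij (degree including the self-connection).
    deg = [sum(row) for row in A_tilde]
    return [[A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def relu(M):
    """Element-wise ReLU(x) = max(0, x)."""
    return [[max(0.0, v) for v in row] for row in M]

def gcn_layer(A_hat, H, W):
    """One propagation step: H_next = ReLU(A_hat H W)."""
    return relu(matmul(matmul(A_hat, H), W))

# Hypothetical path graph 0 - 1 - 2 with two features per node.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
H0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # illustrative node features
W0 = [[1.0, -1.0], [0.5, 0.5]]             # illustrative weights

A_hat = normalize_adjacency(A)
H1 = gcn_layer(A_hat, H0, W0)
print(A_hat)
print(H1)
```

In this toy graph, node 0 has degree 2 once its self-connection is counted, so its normalized self-weight is 1/2, while the middle node has degree 3 and a self-weight of 1/3.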
In a two-layer GCN for node classification on a graph with adjacency matrix A, the value of $\hat{A} = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$ is first calculated in a preprocessing stage.

$$Z = f(X, A) = \mathrm{softmax}\left(\hat{A}\, \mathrm{ReLU}\left(\hat{A} X W^{(0)}\right) W^{(1)}\right) \quad (3)$$
Equation (3) [20] gives the simple form of the forward model. Figure 3 illustrates the GCN process: C indicates the input channels, F denotes the features, Yi denotes the labels, and edges are represented by black lines. GCN collects the values of all neighboring nodes to evaluate the current node. The ReLU activation function is used here [19], [27].
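The two-layer forward model in (3) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the four-node graph, the feature matrix `X`, and the weight matrices `W0` and `W1` are hypothetical values chosen only to show the shape of the computation, and no training is performed.

```python
import math

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def normalize_adjacency(A):
    """Preprocessing step: A_hat = D~^{-1/2} (A + I) D~^{-1/2}."""
    n = len(A)
    A_tilde = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_tilde]
    return [[A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def relu(M):
    return [[max(0.0, v) for v in row] for row in M]

def softmax_rows(M):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in M:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def gcn_forward(A, X, W0, W1):
    """Z = softmax(A_hat ReLU(A_hat X W0) W1)."""
    A_hat = normalize_adjacency(A)
    hidden = relu(matmul(matmul(A_hat, X), W0))
    return softmax_rows(matmul(matmul(A_hat, hidden), W1))

# Hypothetical chain graph 0 - 1 - 2 - 3, three input channels, two classes.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
X = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
W0 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.6]]  # input channels -> hidden units
W1 = [[1.0, -1.0], [-0.5, 0.5]]              # hidden units -> classes

Z = gcn_forward(A, X, W0, W1)
for node, probs in enumerate(Z):
    print(node, probs)
```

Each row of `Z` is a probability distribution over the classes for one node; because every layer multiplies by the normalized adjacency, a node's output depends on its neighbors' features as well as its own.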
Figure 3. Graph convolutional network
2.3.4. Clustering
Clustering groups objects based on similar characteristics. Document clustering, or text grouping, focuses on grouping documents that have similar content or belong to the same category of files. Similarity can be identified from the semantic relations between documents using natural language processing techniques. Keyword extraction is done by word scoring using the BERT and GCN models, and the documents are then clustered using the K-means algorithm. K-means is the most popular algorithm for clustering documents based on semantic similarity [28], [29]. The K-means algorithm is simple and faster than other algorithms [30]; some research papers used bisecting K-means, which can work with large datasets and improved accuracy. To fine-tune our results after grouping documents with similar content, we calculated the relations between clusters based on the cosine distance between their vectors.
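The clustering step described above can be illustrated with a small plain-Python sketch: a basic K-means loop followed by the cosine distance between the resulting cluster centroids. The two-dimensional document vectors, the initial centroids, and the helper names are hypothetical; in the proposed method the vectors would come from the BERT and GCN keyword scores, and comparing clusters through their centroids is an assumption made here for illustration.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(points, centroids, iterations=10):
    """Basic K-means with Euclidean assignment and given initial centroids."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_centroids.append(mean(members) if members else centroids[k])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

# Hypothetical document vectors (e.g., scores for two keywords).
docs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
labels, centroids = kmeans(docs, centroids=[docs[0], docs[2]])
between = cosine_distance(centroids[0], centroids[1])
print(labels, between)
```

A small distance between two centroids would suggest that the clusters hold semantically close content and are candidates for closer near-duplicate inspection.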
3. RESULTS AND DISCUSSION
This section discusses the datasets and evaluates the effect of the proposed method. Comparisons are made with existing techniques for identifying semantic relations. The Inspec dataset is a collection of abstracts first introduced by Hulth [31] in 2003. It contains nearly 2000 abstracts, which Hulth divided into three parts: 1000 for training, 500 for validation, and 500 for testing. The corpus has 3829 reference keywords. The DUC01 dataset was released in 2008 by Wan and Xiao [32]. It is a collection of news articles with a total of 308 articles, and the corpus has 2488 reference keywords. The BibTxt dataset [33] contains a large number of articles downloaded from the web. It contains a total of 4387 BibTxt files with more than 6 million article records.
3.1. Evaluation measures
The most common evaluation measures are purity and normalized mutual information (NMI). These measures are used to check the correctness of each cluster. The clusters are represented as C = {C1, C2, C3, ..., Cj} and the partition into classes as P = {P1, P2, P3, ..., Pi}, where j and i represent the number of clusters and the number of classes, respectively. Good clusters achieve high purity, computed as shown in (4), and NMI is calculated between pairs of clusters and individual classes.

$$\mathrm{purity}(C, P) = \frac{1}{N} \sum_{k} \max_{m} |C_k \cap P_m| \quad (4)$$

where N is the total number of documents.
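A short sketch of how purity could be computed is given below. It assumes the standard definition of purity (the fraction of documents that belong to the majority class of their cluster); the function name `purity` and the label lists are illustrative and do not come from the datasets used in this paper.

```python
from collections import Counter

def purity(cluster_labels, class_labels):
    """Sum the size of the majority class in each cluster, then divide by N."""
    per_cluster = {}
    for cluster, cls in zip(cluster_labels, class_labels):
        # Count how many documents of each class fall into each cluster.
        per_cluster.setdefault(cluster, Counter())[cls] += 1
    n = len(cluster_labels)
    return sum(max(counts.values()) for counts in per_cluster.values()) / n

# Hypothetical assignment of six documents to two clusters.
clusters = [0, 0, 0, 1, 1, 1]
classes = ["a", "a", "b", "b", "b", "a"]
score = purity(clusters, classes)
print(score)
```

Here each cluster contains two documents of its majority class and one of the other class, so four of the six documents are counted and the purity is 4/6.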
For keyword extraction, the evaluation measures are precision, recall, and F1 measure, as shown in (5)-(7):

$$P = \frac{|K \cap MK|}{|K|} \quad (5)$$

$$R = \frac{|K \cap MK|}{|MK|} \quad (6)$$

$$F1 = \frac{2PR}{P + R} \quad (7)$$

where P represents precision, R represents recall, K represents the set of extracted keywords, and MK represents the set of manually assigned keywords.
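The measures in (5)-(7) amount to simple set arithmetic, as the following sketch shows. The function name `keyword_scores` and the keyword sets are hypothetical examples, and returning 0 when a denominator is empty is a convention chosen here, not something specified in the paper.

```python
def keyword_scores(extracted, manual):
    """Return (precision, recall, F1) for extracted vs. manually assigned keywords."""
    K, MK = set(extracted), set(manual)
    overlap = len(K & MK)                        # |K ∩ MK|
    precision = overlap / len(K) if K else 0.0   # (5)
    recall = overlap / len(MK) if MK else 0.0    # (6)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0  # (7)
    return precision, recall, f1

# Hypothetical keyword sets for one document.
extracted = ["graph", "bert", "storage", "cluster"]
manual = ["bert", "cluster", "deduplication"]
p, r, f1 = keyword_scores(extracted, manual)
print(p, r, f1)
```

Two of the four extracted keywords appear among the three manual keywords, so precision is 2/4, recall is 2/3, and F1 is 4/7.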
3.2. Analysis
Table 5 gives the precision, recall, and F1 measure values obtained from the experiments. Table 6 compares the accuracy on the three datasets for the semantic relation techniques third combination normalized Google distance (TCNGD), probabilistic feature patterns (PFP), and our method, the hybrid BERT model for text semantics using GCN (HBTSG). Figures 4 and 5 plot the results given in Tables 5 and 6, respectively.
Table 5. Comparison of keyword extraction results

| Method | Inspec Precision | Inspec Recall | Inspec F1 | BibTxt Precision | BibTxt Recall | BibTxt F1 | DUC01 Precision | DUC01 Recall | DUC01 F1 |
|---|---|---|---|---|---|---|---|---|---|
| TCNGD | 0.391 | 0.752 | 0.501 | 0.401 | 0.759 | 0.522 | 0.292 | 0.374 | 0.322 |
| PFP | 0.395 | 0.771 | 0.532 | 0.411 | 0.791 | 0.552 | 0.333 | 0.392 | 0.363 |
| HBTSG (Proposed) | 0.411 | 0.801 | 0.551 | 0.423 | 0.834 | 0.593 | 0.354 | 0.394 | 0.396 |
Table 6. Analysis of accuracy

| Accuracy % | TCNGD | PFP | HBTSG |
|---|---|---|---|
| Inspec | 86 | 87 | 90 |
| BibTxt | 84 | 85 | 87 |
| DUC01 | 82 | 83 | 85 |
Figure 4. Keyword extraction evaluation measure

Figure 5. Analysis of accuracy
4. CONCLUSION
In this paper, we focused on identifying near duplicates based on semantic relations between text documents using NLP techniques. Various methods for identifying semantic relations were discussed. Documents are split into fixed 4 KB blocks and then preprocessed. In the third stage, keyword extraction is done through a combination of the BERT and GCN models; this hybrid method gives better keyword extraction. Similar contents are then grouped using the clustering technique, and finally the distance between clusters is calculated to obtain fine-tuned semantic similarity results between text documents. With this approach, deduplication can be done easily, and it helps identify files with similar content stored in the storage environment.
REFERENCES
[1] R. Kaur, I. Chana, and J. Bhattacharya, “Data deduplication techniques for efficient cloud storage management: a systematic review,” The Journal of Supercomputing, vol. 74, no. 5, pp. 2035–2085, May 2018, doi: 10.1007/s11227-017-2210-8.
[2] J. Paulo and J. Pereira, “A Survey and Classification of Storage Deduplication Systems,” ACM Computing Surveys, vol. 47, no. 1, pp. 1–30, Jul. 2014, doi: 10.1145/2611778.
[3] D. Viji and S. Revathy, “Various Data Deduplication Techniques of Primary Storage,” in 2019 International Conference on Communication and Electronics Systems (ICCES), Jul. 2019, pp. 322–327, doi: 10.1109/ICCES45898.2019.9002185.
[4] W. Xia, H. Jiang, D. Feng, and Y. Hua, “Similarity and Locality Based Indexing for High Performance Data Deduplication,” IEEE Transactions on Computers, vol. 64, no. 4, pp. 1162–1176, Apr. 2015, doi: 10.1109/TC.2014.2308181.
[5] A. Khan, P. Hamandawana, and Y. Kim, “A Content Fingerprint-Based Cluster-Wide Inline Deduplication for Shared-Nothing Storage Systems,” IEEE Access, vol. 8, pp. 209163–209180, 2020, doi: 10.1109/ACCESS.2020.3039056.
[6] Y. Tan, H. Jiang, D. Feng, L. Tian, Z. Yan, and G. Zhou, “SAM: A Semantic-Aware Multi-tiered Source De-duplication Framework for Cloud Backup,” in 2010 39th International Conference on Parallel Processing, Sep. 2010, pp. 614–623, doi: 10.1109/ICPP.2010.69.
[7] A. T. Clements, I. Ahmad, M. Vilayannur, and J. Li, “Decentralized deduplication in SAN cluster file systems,” USENIX Annual Technical Conference, vol. 9, pp. 101–114, 2009.
[8] P. Christen, “A Survey of Indexing Techniques for Scalable Record Linkage and Deduplication,” IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 9, pp. 1537–1555, Sep. 2012, doi: 10.1109/TKDE.2011.127.
[9] X. Han and L. Wang, “A Novel Document-Level Relation Extraction Method Based on BERT and Entity Information,” IEEE Access, vol. 8, pp. 96912–96919, 2020, doi: 10.1109/ACCESS.2020.2996642.
[10] R. K. Ibrahim, S. R. M. Zeebaree, and K. F. S. Jacksi, “Survey on Semantic Similarity Based on Document Clustering,” Advances in Science, Technology and Engineering Systems Journal, vol. 4, no. 5, pp. 115–122, 2019, doi: 10.25046/aj040515.
[11] S. Tongphu, “Toward semantic similarity measure between concepts in an ontology,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 14, no. 3, p. 1356, Jun. 2019, doi: 10.11591/ijeecs.v14.i3.pp1356-1372.
[12] D. Viji and D. S. Revathy, “Comparative Analysis for Content Defined Chunking Algorithms in Data Deduplication,” Webology, vol. 18, no. 2, pp. 255–268, Apr. 2021, doi: 10.14704/WEB/V18SI02/WEB18070.
[13] S. Wu, K.-C. Li, B. Mao, and M. Liao, “DAC: Improving storage availability with Deduplication-Assisted Cloud-of-Clouds,” Future Generation Computer Systems, vol. 74, pp. 190–198, Sep. 2017, doi: 10.1016/j.future.2016.02.001.
[14] I. M. Nguena and A.-M. O. C. Richeline, “Fast Semantic Duplicate Detection Techniques in Databases,” Journal of Software Engineering and Applications, vol. 10, no. 06, pp. 529–545, 2017, doi: 10.4236/jsea.2017.106029.
[15] A. Pawar and V. Mago, “Calculating the similarity between words and sentences using a lexical database and corpus statistics,” Feb. 2018, [Online]. Available: http://arxiv.org/abs/1802.05667.
[16] E. Altszyler, M. Sigman, S. Ribeiro, and D. F. Slezak, “Comparative study of LSA vs Word2vec embeddings in small corpora: a case study in dreams database,” Consciousness and Cognition, vol. 56, pp. 178–187, Oct. 2016, doi: 10.1016/j.concog.2017.09.004.
[17] S. Zhou, X. Xu, Y. Liu, R. Chang, and Y. Xiao, “Text Similarity Measurement of Semantic Cognition Based on Word Vector Distance Decentralization With Clustering Analysis,” IEEE Access, vol. 7, pp. 107247–107258, 2019, doi: 10.1109/ACCESS.2019.2932334.
[18] M. Ostendorff, T. Ruas, M. Schubotz, G. Rehm, and B. Gipp, “Pairwise multi-class document classification for semantic relations between wikipedia articles,” Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, Mar. 2020, pp. 127–136, doi: 10.1145/3383583.3398525.
[19] L. Cai, Y. Song, T. Liu, and K. Zhang, “A Hybrid BERT Model That Incorporates Label Semantics via Adjustive Attention for Multi-Label Text Classification,” IEEE Access, vol. 8, pp. 152183–152192, 2020, doi: 10.1109/ACCESS.2020.3017382.
[20] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” 5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings, Sep. 2017, [Online]. Available: http://arxiv.org/abs/1609.02907.
[21] J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference, vol. 1, pp. 4171–4186, Oct. 2019.
[22] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient Estimation of Word Representations in Vector Space,” Jan. 2013, [Online]. Available: http://arxiv.org/abs/1301.3781.
[23] A. Rozeva and S. Zerkova, “Assessing semantic similarity of texts - Methods and algorithms,” in AIP Conference Proceedings, vol. 1910, 2017, p. 060012, doi: 10.1063/1.5014006.
[24] S. M. Mohammed, K. Jacksi, and S. R. M. Zeebaree, “A state-of-the-art survey on semantic similarity for document clustering using GloVe and density-based algorithms,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 22, no. 1, p. 552, Apr. 2021, doi: 10.11591/ijeecs.v22.i1.pp552-562.
[25] J. Pennington, R. Socher, and C. D. Manning, “GloVe: Global vectors for word representation,” in EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, 2014, pp. 1532–1543, doi: 10.3115/v1/d14-1162.
[26] S. Yu, J. Su, and D. Luo, “Improving BERT-Based Text Classification with Auxiliary Sentence and Domain Knowledge,” IEEE Access, vol. 7, pp. 176600–176612, 2019, doi: 10.1109/ACCESS.2019.2953990.
[27] P. Goyal and E. Ferrara, “Graph embedding techniques, applications, and performance: A survey,” Knowledge-Based Systems, vol. 151, pp. 78–94, May 2018, doi: 10.1016/j.knosys.2018.03.022.
[28] M. Steinbach, G. Karypis, and V. Kumar, “A comparison of document clustering techniques,” Proceedings of the International KDD Workshop on Text Mining, 2000.
[29] S. Sharma and R. K. Gupta, “Improved BSP Clustering Algorithm for Social Network Analysis,” International Journal of Grid and Distributed Computing, vol. 3, no. 3, pp. 67–76, 2010.
[30] M. Fariss, N. El Allali, H. Asaidi, and M. Bellouki, “A semantic web services discovery approach integrating multiple similarity measures and k-means clustering,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 24, no. 2, p. 1228, Nov. 2021, doi: 10.11591/ijeecs.v24.i2.pp1228-1237.
[31] A. Hulth, “Improved automatic keyword extraction given more linguistic knowledge,” in Proceedings of the 2003 conference on Empirical methods in natural language processing, vol. 10, 2003, pp. 216–223, doi: 10.3115/1119355.1119383.
[32] X. Wan and J. Xiao, “Single Document Keyphrase Extraction Using Neighborhood Knowledge,” in AAAI’08: Proceedings of the 23rd national conference on Artificial intelligence, 2008, pp. 855–860, doi: 10.5555/1620163.1620205.
[33] “BibTex dataset,” IESL. https://sites.google.com/a/iesl.cs.umass.edu/home/data/bibtex (accessed Feb. 18, 2020).
BIOGRAPHIES OF AUTHORS

Viji Devarajan received her Bachelor of Engineering degree in Computer Science and Engineering in 2010. She also received her Master of Engineering degree in Computer Science and Engineering from Adhiparasakthi Engineering College, Melmaruvathur, in 2015. She is currently pursuing the Ph.D. degree with the Department of Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai, India. She is currently an Assistant Professor in the Department of Computer Science and Engineering, SRM Institute of Science and Technology, Chennai, India. Her work includes over 19 journal publications and 21 conference publications within her working experience of 5 years so far. She can be contacted at email: dviji2k@gmail.com.

Dr. Revathy Subramanian is presently working as an Associate Professor in the Department of Information Technology, Sathyabama Institute of Science and Technology, Chennai, India. Her research interests include Machine Learning, Data Analytics, and Big Data. She has published over 41 papers in refereed journals. She can be contacted at email: ramesh.revathy@gmail.com.