IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 11, No. 1, March 2022, pp. 34~40
ISSN: 2252-8938, DOI: 10.11591/ijai.v11.i1.pp34-40

Journal homepage: http://ijai.iaescore.com

Text similarity algorithms to determine Indian penal code sections for offence report

Ambrish Srivastav, Shaligram Prajapat
Department of Computer Science, International Institute of Professional Studies (IIPS), Devi Ahilya University (DAVV), Indore, India

Article Info

Article history:
Received Mar 25, 2021
Revised Dec 22, 2021
Accepted Dec 29, 2021

ABSTRACT
Making decisions by comparing two text documents is a novel idea. Text documents contain details, rules, and information related to a domain. The judiciary system is an area where many textual documents are available. Some documents, such as the Indian penal code (IPC) section documents, state rules related to the judiciary, while others, such as the first information report (FIR) and the investigation report, contain details of incidents. Our assumption is that a system can support decision-making by finding the right IPC section from the text similarity between the IPC section document and the FIR or investigation report. In this paper, we introduce a new research problem: suggesting appropriate IPC sections for crime-related information in a user's input by using the vector space model and natural language processing techniques.

Keywords:
Decision support system
Information retrieval system
Law information system
Natural language processing
Text similarity
Vector space model

This is an open access article under the CC BY-SA license.

Corresponding Author:
Ambrish Srivastav
Department of Computer Science, IIPS, DAVV
139, Khandwa Rd, Indrapuri Colony, Indore, Madhya Pradesh (India) 452001
Email: a.srivastav30@gmail.com

1. INTRODUCTION
A decision support system (DSS) is a computerized program used for decision-making activities, traditionally aimed at growing a business. Today, owing to progress in computing, new documents from many areas are being digitized. Documents related to the judicial system, such as first information reports (FIRs), investigation reports, and judgments, are available digitally, and information can be extracted from them by a computerized algorithm. In the past decade, some systems were developed to help with decision-making by using text similarity algorithms. These systems calculate the similarity between two legal documents by using concept-based similarity, multi-dimensional similarity [1], and embedding-based methodologies [2]–[4]. Developing a DSS that analyzes a report and finds the appropriate Indian penal code (IPC) sections accordingly is a new idea.

Whenever a crime occurs in society, information about it is given to the police, and the police investigate based on that information. The police prepare a comprehensive report (charge sheet) for the court, which mentions the IPC sections related to the crime. Knowledge and experience of the IPC sections are required to prepare the charge sheet, on the basis of which a correct and appropriate document is prepared for the court. Apart from the police, other people or organizations can also be users of the system. One example is a lawyer who re-examines the charge sheet and, based on experience, prepares the background of the crime and presents it for the offender's or victim's side in court. Reading and understanding such documents manually is a difficult and time-consuming task for everyone. If a computer program helps by highlighting important information and checking the correctness of the result according to the rules, it will help users understand the document quickly.
A common person or organization against whom any crime, deception, or violation of rights has taken place can also use this system. The person or organization has to enter the details of the incident into the system. To use the system, the user enters the information about the incident as natural language text and, after analyzing the incident, the system decides the IPC section.

Here, we propose a DSS for finding IPC sections (as an appropriate answer) for the user's input. The applicable section of the penal code depends on the situation, the circumstances, other information about the crime, and the definitions given in the IPC document. Therefore, analysis of both the IPC document and the input is necessary. A user may not use the exact wording of the offense from the penal code document in an application, report, or query; our proposed system nevertheless finds penal code sections as an appropriate answer, along with related information for the user. Our idea is to calculate the similarity between every sentence of the user's input and the description of every section of the IPC document. According to the similarity value, the system suggests a list of the most appropriate IPC sections for the user's input.

In earlier days, DSSs were developed for decision-making for business purposes, but today they are evolving for many fields such as healthcare, security, medicine, manufacturing, and engineering. The literature contains a large body of work on a variety of decision support systems. In recent years, various legal/law information systems have been developed. Quaresma and Rodrigues proposed an approach based on computational linguistic theory (syntactic analysis, semantic analysis, and semantic interpretation) to develop a question-answering system for juridical documents in the Portuguese language. Query processing by information retrieval and analysis of documents by information extraction are the two modules of this question answering system (QAS). The system contained a complete set of decisions from several Portuguese juridical institutions [5]. Tirpude and Alvi proposed a keyword-based question answering (QA) system for legal documents of Indian laws. For this, the authors constructed a corpus and knowledge base from legal documents and prepared a question dataset with answer types. The system suggested an answer to a query on the basis of keywords and an indexed term dictionary [6]. Kamdi and Agrawal developed a question answering system for IPC sections and Indian amendment laws. This QAS selects keywords and the question type from the query and responds according to answers stored in the corpus. The authors state that the problem lies at the intersection of two domains: information retrieval (IR) and natural language processing (NLP) [7]. Sangeetha et al. proposed an information retrieval system designed to retrieve relevant answers about laws. The user query was processed using natural language processing techniques. This system was designed to handle dynamic queries from the user end instead of stored question-answer pairs [8].

Text processing is an essential part of every natural language-based system. Various machine learning approaches, such as decision trees, nearest neighbors, support vector machines, sparse network of winnows, naïve Bayes, and log-linear models (maximum entropy models), have been tried for text classification [8]–[10]. For part-of-speech tagging, named entity identification, and morphological analysis, rule-based techniques, the Google directory, and hidden Markov models were developed [11]–[15]. For identifying and removing stop words from text, latent semantic indexing (LSI), an SVM-based approach, and deterministic finite automata (DFA) were developed [16]–[18]. To solve the issue of statement formation for systematic questions, a template-based approach was proposed. This approach worked on domain-specific Wh-type questions and imperative questions [19].

Calculating the text similarity between two different documents is the main task of our research. Various approaches have been proposed by different authors for this work. Mihalcea et al. proposed corpus-based and knowledge-based measures of the semantic similarity of short texts, exploiting the information that can be drawn from the similarity of the component words [20], [21]. The vector space model (VSM) is used for calculating the text similarity of short sentences and paragraphs [22]–[25]. The graph-based text similarity (GBTS) algorithm maps Chinese texts into graphs and then calculates the similarity of two texts by comparing their graphs [26]. Xue et al. presented a method of text similarity computing for a clinical decision support system. The authors improved the TF-IDF algorithm and the cosine similarity algorithm by combining them with an eigenvector-associated model to determine the case feature weights [27]. Duan and Xu presented a short-text similarity algorithm for finding similar police incidents. This algorithm was developed from a semantic similarity algorithm, word mover's distance (WMD) [28]. Jo proposed a version of k-nearest neighbor (KNN) that considers similarity among attributes when computing the similarity between feature vectors [29]. Alnajran et al. proposed a heuristic-driven pre-processing methodology for enhancing the performance of similarity measures in the context of Twitter tweets [30].

2. PROPOSED ARCHITECTURE OF SYSTEM
Based on the rationale in the previous section, Figure 1 presents the architecture of a DSS for finding the most suitable IPC section for a user's input. In the first layer of the system, the user input is analyzed using NLP techniques, and in the second layer a knowledge base for the IPC section document is developed. The system consists of several components, including:
− A component for extracting offence words and crime-related information from the user's input query.
− Components for analyzing the crime-related information and the definitions of the selected IPC sections.
− A relevance matching component that matches the crime against the definitions of particular IPC sections.
− A component that retrieves and shows the most appropriate IPC sections.

Figure 1. Proposed architecture of system

3. METHOD
The IPC document and an offence report are two different types of unstructured text. Developing a system that determines the most appropriate IPC sections for a crime report from the unstructured text of the IPC is a difficult task. We identify the following steps to achieve our goal.
− Step 1: Develop a corpus for the IPC section document. The IPC document distributes 511 sections across 23 chapters. Each chapter describes some kind of crime and its conditions. Each entry in the corpus of IPC sections has four parts: IPC section number, root, offence, and description of the section.
− Step 2: Calculate the text similarity between the input text and the description of each IPC section. Semantic similarity is a measure of conceptual distance between two objects, based on the correspondence of their meanings [31]. The IPC section description text and the user input text are two different types of documents, and there is very little chance that they are lexically similar. Our objective is to calculate the semantic similarity between every sentence of the selected IPC section description text and every sentence of the user's input. To calculate similarity, we follow these steps:
i) Apply pre-processing to the IPC section description text and the user's input text. We used the Natural Language Toolkit (NLTK) to implement pre-processing. The steps are:
− Tokenization: Tokenization is the procedure of splitting a sentence into a list of words.
− Lowercasing: Convert all words to a common case (preferably lowercase), because in NLP the same word in a different case is treated as a different word.
− Stop word removal: A text document contains many words (such as 'is', 'was', 'a', and 'the') that carry no importance for processing. These words must be removed from the document before processing.
− Stemming/lemmatization: Stemming and lemmatization are processes that transform a word to its root form. Lemmatization works better than stemming for converting a word to its root form.
− After cleaning the text documents, we are left with the most important words in the IPC section description and the user's input for further processing.
ii) Use the filtered IPC section description words as terms. Apply feature engineering to represent the user's input text as a vector over these terms: the feature engineering technique calculates the vector values according to the presence of each term, or its synonyms, in the user's input. There are several techniques for deriving relevant features from a text document.
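
The pre-processing steps described for Step 2 can be sketched as follows. The paper uses NLTK; to stay self-contained, this sketch substitutes a regular-expression tokenizer, a tiny hand-written stop-word list, and a small lemma lookup table. All three are assumptions made for illustration and are not NLTK's actual resources.

```python
import re

# Tiny illustrative stop-word list (an assumption; NLTK ships a much larger one).
STOP_WORDS = {"a", "an", "the", "is", "was", "to", "of", "which", "any", "or"}

# Hypothetical lemma lookup standing in for a real lemmatizer.
LEMMAS = {"causes": "cause", "omissions": "omission", "injuries": "injury"}


def preprocess(text):
    """Tokenize, lowercase, drop stop words, and map words to a root form."""
    tokens = re.findall(r"[A-Za-z]+", text)               # tokenization
    tokens = [t.lower() for t in tokens]                  # lowercasing
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop-word removal
    return [LEMMAS.get(t, t) for t in tokens]             # lemmatization stand-in


description = "Public nuisance, illegal omission which causes any common injury, danger"
print(preprocess(description))
```

Applied to the description of section 268, this sketch yields the same filtered word list that the paper uses as document D0 in its vectorization example.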

3.1. Vector space model
The vector space model is a matrix representation of a list of documents and a corpus of words. Every row represents an individual document, and the columns represent the words of the corpus. Each cell stores the value '0' or '1': '0' means that the word is not present in the document, and '1' indicates that the word occurs in the document. In our problem, the vector matrix shows the occurrence of terms (the selected features of a particular IPC section) in a text document (the user's input), and from the cell values we can calculate whether the IPC section appears in a sentence. The user's input may contain many sentences that are not related to the IPC section. If the vector value of all the words in a sentence is '0', the system ignores that sentence for score calculation. We create vectors for the description of each IPC section and for every paragraph of the user's input, and the system uses these vectors for further calculations. There are some tools for converting a text document into a vector.
i) CountVectorizer: CountVectorizer is a tool provided by the scikit-learn library in Python. It transforms a given text into a vector on the basis of the frequency (count) of each word that occurs in the entire text. Consider the following example of some filtered IPC section descriptions:
− D0: public nuisance illegal omission cause common injury danger
− D1: unlawfully negligent act likely spread infection disease dangerous life
− D2: malignant act likely spread infection disease dangerous life
Table 1 shows a sample result of CountVectorizer, giving the frequency of words in each document (D0, D1, and D2). In this example each word appears at most once per document, so the entry is '1' if the word appears in the document and '0' otherwise.
ii) TF-IDF: TF-IDF stands for term frequency-inverse document frequency. In this model, we take term frequency and inverse document frequency as parameters to decrease the weight of terms that appear commonly in all the sentences. The formulas for calculating TF-IDF stepwise are:
− tf(t, d) = count of t in d / number of words in d  // term frequency
− df(t) = number of documents in which t occurs  // document frequency
− idf(t) = log(N / df(t))  // inverse document frequency, where N is the number of documents
− tf-idf(t, d) = tf(t, d) * idf(t)
Table 2 shows a sample result of TF-IDF, giving the weight of words in each document (D0, D1, and D2). The weight of each word is calculated from its appearance in the particular document and in all documents.

Table 1. Sample IPC section vector using CountVectorizer
     act  cause  common  danger  public  spread  unlawfully
0    1    0      0       0       0       1       1
1    1    0      0       0       0       1       0
2    0    1      1       1       1       0       0

Table 2. Sample IPC section vector using TF-IDF
     act       cause     common    danger    public    spread    unlawfully
0    0.309228  0         0         0         0         0.309228  0.406598
1    0.33847   0         0         0         0         0.33847   0
2    0         0.353553  0.353553  0.353553  0.353553  0         0
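
The weights in the TF-IDF table can be reproduced from D0, D1, and D2 with a short worked example. The sketch below is an assumption about how the numbers were produced, not the authors' code: it reimplements in plain Python the weighting that scikit-learn's TF-IDF vectorizer applies by default (raw counts for tf, a smoothed idf of ln((1 + N) / (1 + df)) + 1, and L2 normalization of each row), which differs slightly from the plain formulas listed in the text. Under that assumption, the table rows labeled 0, 1, and 2 appear to correspond to D1, D2, and D0 respectively.

```python
import math

DOCS = {
    "D0": "public nuisance illegal omission cause common injury danger",
    "D1": "unlawfully negligent act likely spread infection disease dangerous life",
    "D2": "malignant act likely spread infection disease dangerous life",
}


def count_vectors(docs):
    """Return (vocabulary, {doc name: list of term counts over the vocabulary})."""
    vocab = sorted({w for text in docs.values() for w in text.split()})
    counts = {name: [text.split().count(w) for w in vocab]
              for name, text in docs.items()}
    return vocab, counts


def tfidf_vectors(docs):
    """TF-IDF with smoothed idf and L2-normalized rows (assumed weighting)."""
    vocab, counts = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts.values() if row[j] > 0)
          for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    weights = {}
    for name, row in counts.items():
        raw = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in raw))
        weights[name] = [x / norm for x in raw]
    return vocab, weights


vocab, weights = tfidf_vectors(DOCS)
for name in DOCS:
    shown = {w: round(weights[name][vocab.index(w)], 6)
             for w in ("act", "public", "unlawfully")}
    print(name, shown)
```

Terms shared by D1 and D2 (such as "act") receive a lower weight than terms unique to one document (such as "unlawfully"), which is the down-weighting effect the TF-IDF description refers to.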
− Step 3: Calculate the cosine similarity between the vector of every paragraph of the user's input and the vector of each IPC section description. Cosine similarity measures the similarity between two vectors of an inner product space, as shown in Figure 2. It is measured by the cosine of the angle between the two vectors and determines whether they point in roughly the same direction. It is often used to measure document similarity in text analysis. Values range between -1 and 1, where -1 is perfectly dissimilar and 1 is perfectly similar. For non-negative term-weight vectors such as those above, the value lies between 0 and 1.

$$\text{Similarity}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$$
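
The cosine similarity formula translates directly into code. The following is a minimal sketch; returning 0.0 when either vector is all zeros is an assumption made here so that unrelated sentences contribute no score, and the sample vectors are the three rows of Table 1.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector is all zeros (an assumed convention).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Rows 0, 1, and 2 of Table 1 (columns: act, cause, common, danger,
# public, spread, unlawfully).
row0 = [1, 0, 0, 0, 0, 1, 1]
row1 = [1, 0, 0, 0, 0, 1, 0]
row2 = [0, 1, 1, 1, 1, 0, 0]

print(cosine_similarity(row0, row1))  # rows share "act" and "spread"
print(cosine_similarity(row0, row2))  # rows share no terms
```

Rows 0 and 1 share two of their terms and score 2 / (sqrt(3) * sqrt(2)), roughly 0.82, while rows 0 and 2 share none and score 0.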
− Step 4: According to this cosine similarity calculation, the system shows a list of the most appropriate IPC sections that are closely related to the user's input. Here, one document is the description of an IPC section and the other document is a paragraph of the user's input.
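
The ranking in Step 4 can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: it uses binary term vectors, the function name `rank_sections` and its `top_k` parameter are hypothetical, the section terms are the filtered descriptions of sections 268 to 270, and the input paragraph tokens are an invented example.

```python
import math

# Filtered description terms for sections 268-270.
SECTION_TERMS = {
    "268": "public nuisance illegal omission cause common injury danger".split(),
    "269": "unlawfully negligent act likely spread infection disease dangerous life".split(),
    "270": "malignant act likely spread infection disease dangerous life".split(),
}


def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_sections(paragraph_tokens, section_terms, top_k=10):
    """Return up to top_k (section, score) pairs with non-zero similarity."""
    vocab = sorted({t for terms in section_terms.values() for t in terms})
    query = [1 if t in paragraph_tokens else 0 for t in vocab]
    scored = []
    for section_no, terms in section_terms.items():
        vec = [1 if t in terms else 0 for t in vocab]
        scored.append((cosine(query, vec), section_no))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(no, score) for score, no in scored[:top_k] if score > 0]


# Hypothetical pre-processed paragraph of a complaint.
paragraph = {"negligent", "act", "spread", "disease"}
for section_no, score in rank_sections(paragraph, SECTION_TERMS):
    print(section_no, round(score, 4))
```

A paragraph that shares no terms with any section description produces an all-zero vector and an empty result, which matches the rule that such sentences are ignored for score calculation.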

Figure 2. Cosine distance similarity

4. RESULTS AND DISCUSSION
4.1. Development of corpus
There are 511 sections in the IPC document, divided into 23 chapters. We selected four chapters of the IPC document (chapters 14, 15, 16, and 22) to test the presumed correctness of our proposed work. We developed a corpus for the sections (around 120) of these chapters, as shown in Table 3.

4.2. Select complaint for input
We selected complaint texts related to these chapters, such as the one shown in Figure 3, as the input query. These complaints are available in the form of FIRs on the official portals of state police in India. An FIR is divided into paragraphs that contain the offense and its related information.

4.3. Similarity calculation
The count vector and TF-IDF models were applied to calculate the text similarity between each paragraph of the complaint and the description of each section, and to find the list of the 10 most appropriate IPC sections related to the complaint, as shown in Table 4. Both models produce a list of IPC sections. The lists and their ordering differ between the two models, but most of the sections related to the complaint are common to both. Based on the output of these models, the system can act as decision support for the user.
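
The agreement between the two models can be checked with a short comparison of the two ranked lists. The sketch below uses the section numbers reported in Table 4; the helper name `compare_rankings` is hypothetical and serves only as an illustration.

```python
# Section numbers in ranked order, as reported in Table 4.
COUNT_VECTOR_TOP10 = ["364", "303", "307", "302", "308",
                      "310", "320F", "290", "366B", "304"]
TFIDF_TOP10 = ["364", "303", "320F", "290", "307",
               "366B", "308", "302", "376C", "310"]


def compare_rankings(first, second):
    """Summarize how two ranked lists of section numbers agree."""
    common = [s for s in first if s in second]
    only_first = [s for s in first if s not in second]
    only_second = [s for s in second if s not in first]
    # Number of ranks at which both lists name the same section.
    same_position = sum(1 for a, b in zip(first, second) if a == b)
    return common, only_first, only_second, same_position


common, only_cv, only_tfidf, same_pos = compare_rankings(
    COUNT_VECTOR_TOP10, TFIDF_TOP10)
print(len(common), only_cv, only_tfidf, same_pos)
```

For these lists, nine of the ten sections are common to both models; section 304 appears only in the count vector result and section 376C only in the TF-IDF result, and the two rankings agree on the first two positions.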

Table 3. Corpus for IPC section document
Section  Root         Offence          Description
268      nuisance     Public nuisance  Public nuisance, illegal omission which causes any common injury, danger
269      negligently  Negligent act    Unlawfully, Negligent act likely to spread infection of disease dangerous to life
270      malignant    Malignant act    Malignant act likely to spread infection of disease dangerous to life
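
The four-part corpus entry (section number, root, offence, description) could be represented as a small record type. The following is a minimal sketch, not the authors' implementation; the class name `IpcSection`, its field names, and the helper `find_section` are assumptions chosen for illustration, and the sample rows are those of the corpus table for sections 268 to 270.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IpcSection:
    """One corpus entry; field names are illustrative assumptions."""
    section_no: str   # kept as text because some sections carry letter suffixes
    root: str         # root word of the offence
    offence: str      # short offence name
    description: str  # description text used for similarity matching


CORPUS: List[IpcSection] = [
    IpcSection("268", "nuisance", "Public nuisance",
               "Public nuisance, illegal omission which causes any common injury, danger"),
    IpcSection("269", "negligently", "Negligent act",
               "Unlawfully, Negligent act likely to spread infection of disease dangerous to life"),
    IpcSection("270", "malignant", "Malignant act",
               "Malignant act likely to spread infection of disease dangerous to life"),
]


def find_section(corpus: List[IpcSection], section_no: str) -> Optional[IpcSection]:
    """Return the entry with the given section number, or None if absent."""
    for entry in corpus:
        if entry.section_no == section_no:
            return entry
    return None


print(find_section(CORPUS, "269").offence)
```

Storing the section number as text rather than as an integer is a design choice that accommodates identifiers with letter suffixes.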

Figure 3. Sample complaint text

Table 4. Comparison of count vector and TF-IDF result

CountVector Result
related_ipcs_index [118 42 48 41 49 51 66 26 123 43]
('IPC', 364, ':', 'Kidnapping or abducting in order to murder')
('IPC', 303, ':', 'Punishment for murder by life-convict')
('IPC', 307, ':', 'Attempt to murder')
('IPC', 302, ':', 'Punishment for murder')
('IPC', 308, ':', 'Attempt to commit culpable homicide')
('IPC', 310, ':', 'Thug')
('IPC', '320F', ':', 'Grievous hurt')
('IPC', 290, ':', 'Punishment for public nuisance in cases not otherwise provided for')
('IPC', '366B', ':', 'Importation of girl from foreign country')
('IPC', 304, ':', 'Punishment for culpable homicide not amounting to murder')

TF-IDF Result
related_ipcs_index [118 42 66 26 48 123 49 41 141]
('IPC', 364, ':', 'Kidnapping or abducting in order to murder')
('IPC', 303, ':', 'Punishment for murder by life-convict')
('IPC', '320F', ':', 'Grievous hurt')
('IPC', 290, ':', 'Punishment for public nuisance in cases not otherwise provided for')
('IPC', 307, ':', 'Attempt to murder')
('IPC', '366B', ':', 'Importation of girl from foreign country')
('IPC', 308, ':', 'Attempt to commit culpable homicide')
('IPC', 302, ':', 'Punishment for murder')
('IPC', '376C', ':', 'Intercourse by superintendent of jail and remand home')
('IPC', 310, ':', 'Thug')

5. CONCLUSION
This paper introduced a problem in the judicial system and addressed it with a decision support system (DSS). A DSS aims to help make the best decision based on existing information. Over the past few decades, a number of information retrieval (IR) systems and question answering systems (QAS) have been developed to find results and answers in a limited, specific area. IR systems and QAS take a single-line question and apply NLP techniques to extract keywords and search for results. Here we proposed the architecture of a DSS for crime incident documents, which suggests a list of the most applicable IPC sections by comparing the user input document and the IPC section document using the vector space model. Our proposed system extends the working of a typical question answering system and helps the user make a decision on the basis of the result. In the future, other text similarity approaches such as word2vec, doc2vec, and BERT (sentence transformers) will be used to check the accuracy of the system.

ACKNOWLEDGEMENT
I want to thank my supervisor, Dr. Shaligram Prajapat, Associate Professor at IIPS, DAVV, Indore, not only for his continued support but also for his motivation and fruitful advice in accomplishing this task.
R
E
F
E
R
E
N
C
E
S
[
1]
R
.
S
.
W
a
gh
a
nd
D
.
A
na
nd,
“
L
e
g
a
l
doc
um
e
nt
s
i
m
i
l
a
r
i
t
y:
a
m
ul
t
i
-
c
r
i
t
e
r
i
a
de
c
i
s
i
on
-
m
a
ki
ng
pe
r
s
pe
c
t
i
ve
,”
P
e
e
r
J
C
om
put
e
r
Sc
i
e
nc
e
,
vol
. 6, A
r
t
. no. e
262, M
a
r
. 2020, doi
:
10.7717/
pe
e
r
j
-
c
s
.262.
[
2]
A
.
M
a
nda
l
,
R
.
C
ha
ki
,
S
.
S
a
ha
,
K
.
G
ho
s
h,
A
.
P
a
l
,
a
nd
S
.
G
ho
s
h,
“
M
e
a
s
ur
i
ng
s
i
m
i
l
a
r
i
t
y
a
m
ong
l
e
g
a
l
c
our
t
c
a
s
e
doc
um
e
nt
s
,
”
i
n
P
r
oc
e
e
di
ngs
of
t
he
10t
h
A
nnual
A
C
M
I
ndi
a
C
om
put
e
C
onf
e
r
e
nc
e
on
Z
Z
Z
-
C
om
put
e
’
17
,
2017,
pp.
1
–
9,
doi
:
10.1145/
3140107.3140119.
[
3]
P
.
B
ha
t
t
a
c
ha
r
ya
,
K
.
G
hos
h,
A
.
P
a
l
,
a
nd
S
.
G
hos
h,
“
M
e
t
hods
f
or
c
om
put
i
ng
l
e
ga
l
doc
um
e
nt
s
i
m
i
l
a
r
i
t
y:
a
c
om
pa
r
a
t
i
ve
s
t
udy,”
C
om
put
e
r
Sc
i
e
n
c
e
, A
pr
. 2020.
[
4]
S
.
R
e
nj
i
t
a
nd
S
.
M
.
I
di
c
ul
a
,
“
S
i
m
i
l
a
r
i
t
y
i
n
l
e
ga
l
t
e
xt
s
us
i
ng
doc
um
e
nt
l
e
ve
l
e
m
b
e
ddi
ngs
,”
C
U
SA
T
N
L
P
@
A
I
L
A
-
F
I
R
E
2019
,
pp.
25
–
30, 2019.
[
5]
P
.
Q
ua
r
e
s
m
a
a
nd
I
.
P
.
R
odr
i
gue
s
,
“
A
que
s
t
i
on
a
n
s
w
e
r
s
y
s
t
e
m
f
or
l
e
ga
l
i
nf
or
m
a
t
i
on
r
e
t
r
i
e
va
l
,”
i
n
P
r
oc
e
e
di
ngs
of
t
he
2005
c
onf
e
r
e
nc
e
on L
e
gal
K
now
l
e
dge
and I
nf
or
m
at
i
on Sy
s
t
e
m
s
:
J
U
R
I
X
2005:
T
he
E
i
ght
e
e
nt
h A
nnual
C
onf
e
r
e
nc
e
, 2005, pp. 91
–
100.
[
6]
S
.
C
.
T
i
r
pude
a
nd
D
.
A
.
S
.
A
l
vi
,
“
C
l
os
e
d
dom
a
i
n
ke
yw
or
d
ba
s
e
d
qu
e
s
t
i
on
a
n
s
w
e
r
i
ng
s
ys
t
e
m
f
or
l
e
ga
l
doc
um
e
nt
s
of
I
P
C
s
e
c
t
i
on
s
I
ndi
a
n l
a
w
s
,”
I
nt
e
r
nat
i
onal
J
our
nal
of
I
nnov
at
i
v
e
R
e
s
e
a
r
c
h i
n C
om
put
e
r
and C
om
m
uni
c
at
i
on E
ngi
ne
e
r
i
ng
, 2015.
[
7]
R
.
P
.
K
a
m
di
a
nd
A
.
J
.
A
gr
a
w
a
l
,
“
K
e
yw
or
ds
b
a
s
e
d
c
l
o
s
e
d
dom
a
i
n
que
s
t
i
on
a
n
s
w
e
r
i
ng
s
ys
t
e
m
f
or
I
ndi
a
n
pe
na
l
c
ode
s
e
c
t
i
ons
a
nd
I
ndi
a
n a
m
e
ndm
e
nt
l
a
w
s
,
”
I
nt
e
r
na
t
i
onal
J
ou
r
nal
of
I
nt
e
l
l
i
ge
nt
S
y
s
t
e
m
s
and A
ppl
i
c
at
i
ons
, vol
. 7, no.
12, pp. 57
–
67,
N
ov. 2015,
doi
:
10.5815/
i
j
i
s
a
.2015.12.06.
[
8]
D
.
S
a
nge
e
t
ha
,
R
.
K
a
vya
s
hr
i
,
S
.
S
w
e
t
ha
,
a
nd
S
.
V
i
gne
s
h,
“
I
nf
or
m
a
t
i
on
r
e
t
r
i
e
va
l
s
ys
t
e
m
f
or
l
a
w
s
,”
i
n
2016
E
i
ght
h
I
nt
e
r
nat
i
onal
C
onf
e
r
e
nc
e
on A
dv
anc
e
d C
om
put
i
ng (
I
C
oA
C
)
, J
a
n. 2017, pp. 212
–
217, doi
:
10.1109/
I
C
oA
C
.2017.7951772.
[9] D. Zhang and W. S. Lee, “Question classification using support vector machines,” in Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR ’03, Aug. 2003, p. 26, doi: 10.1145/860435.860443.
[10] P. Blunsom, K. Kocik, and J. R. Curran, “Question classification with log-linear models,” in Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR ’06, 2006, p. 615, doi: 10.1145/1148170.1148282.
[11] J. Liu and L. Birnbaum, “Measuring semantic similarity between named entities by searching the web directory.”
[12] R. Ageishi and T. Miura, “Named entity recognition based on a Hidden Markov Model in part-of-speech tagging,” in 2008 First International Conference on the Applications of Digital Information and Web Technologies (ICADIWT), Aug. 2008, pp. 397–402, doi: 10.1109/ICADIWT.2008.4664380.
[13] Zhang Youzhi, “Research and implementation of part-of-speech tagging based on Hidden Markov Model,” in 2009 Asia-Pacific Conference on Computational Intelligence and Industrial Applications (PACIIA), Nov. 2009, pp. 26–29, doi: 10.1109/PACIIA.2009.5406648.
[14] R. Cretulescu, A. David, D. Morariu, and L. Vintan, “Part of speech tagging with Naïve Bayes methods,” in 2014 18th International Conference on System Theory, Control and Computing (ICSTCC), Oct. 2014, pp. 446–451, doi: 10.1109/ICSTCC.2014.6982457.
[15] S. P. Singh, A. Kumar, and H. Darbari, “Deep neural based name entity recognizer and classifier for English language,” in 2017 International Conference on Circuits, Controls, and Communications (CCUBE), Dec. 2017, pp. 242–246, doi: 10.1109/CCUBE.2017.8394152.
[16] A. N. K. Zaman, P. Matsakis, and C. Brown, “Evaluation of stop word lists in text retrieval using latent semantic indexing,” in 2011 Sixth International Conference on Digital Information Management, Sep. 2011, pp. 133–136, doi: 10.1109/ICDIM.2011.6093315.
[17] S. Xu, G. Cheng, and F. Kong, “Research on question classification for automatic question answering,” in 2016 International Conference on Asian Language Processing (IALP), Nov. 2016, pp. 218–221, doi: 10.1109/IALP.2016.7875972.
[18] S. Behera, “Implementation of a finite state automaton to recognize and remove stop words in English text on its retrieval,” in 2018 2nd International Conference on Trends in Electronics and Informatics (ICOEI), May 2018, pp. 476–480, doi: 10.1109/ICOEI.2018.8553828.
[19] K. Pawar and U. Shrawankar, “Question systematization using templates,” 3rd International Conference on Computing for Sustainable Global Development, 2016.
[20] R. Mihalcea, C. Corley, and C. Strapparava, “Corpus-based and knowledge-based measures of text semantic similarity,” in AAAI’06: Proceedings of the 21st national conference on Artificial intelligence, Jul. 2006, vol. 1, pp. 775–780.
[21] W. H. Gomaa and A. A. Fahmy, “A survey of text similarity approaches,” International Journal of Computer Applications, vol. 68, no. 13, pp. 13–18, Apr. 2013, doi: 10.5120/11638-7118.
[22] H. Dong, J. Wu, X. Zhao, and Y. Li, “Study on the calculation of text similarity based on key-sentence,” in 2010 International Conference on E-Business and E-Government, May 2010, pp. 1952–1955, doi: 10.1109/ICEE.2010.493.
[23] W. Yih, K. Toutanova, J. C. Platt, and C. Meek, “Learning discriminative projections for text similarity measures,” in Proceedings of the Fifteenth Conference on Computational Natural Language Learning, 2011, pp. 247–256.
[24] P. Shrestha, “Corpus-based methods for short text similarity,” in TALN 2011, 2011, pp. 1–6.
[25] G. Liu and H. Wang, “A recursive descent evaluation algorithm on policy context similarity,” in 2018 International Conference on Artificial Intelligence and Big Data (ICAIBD), May 2018, pp. 21–25, doi: 10.1109/ICAIBD.2018.8396160.
[26] Z. Liu and X. Chen, “Mapping texts into graphs: An improved text similarity algorithm,” in Proceedings of 2012 2nd International Conference on Computer Science and Network Technology, Dec. 2012, pp. 1357–1361, doi: 10.1109/ICCSNT.2012.6526173.
[27] T. Xue, Y. Yuan, Q. Fu, H. Gu, S. Zhang, and C. Wang, “The application of text similarity computing in the clinical decision support system,” Nov. 2014, doi: 10.1109/ccis.2014.7175759.
[28] L. Duan and T. Xu, “A short text similarity algorithm for finding similar police 110 incidents,” in 2016 7th International Conference on Cloud Computing and Big Data (CCBD), Nov. 2016, pp. 260–264, doi: 10.1109/CCBD.2016.058.
[29] T. Jo, “Using k-nearest neighbors for text segmentation with feature similarity,” in 2017 International Conference on Communication, Control, Computing and Electronics Engineering (ICCCCEE), Jan. 2017, pp. 1–5, doi: 10.1109/ICCCCEE.2017.7866706.
[30] N. Alnajran, K. Crockett, D. McLean, and A. Latham, “A heuristic based pre-processing methodology for short text similarity measures in microblogs,” in 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), Jun. 2018, pp. 1627–1633, doi: 10.1109/HPCC/SmartCity/DSS.2018.00265.
[31] D. Lin, “An information-theoretic definition of similarity,” in ICML ’98: Proceedings of the Fifteenth International Conference on Machine Learning, 1998, pp. 296–304.
BIOGRAPHIES OF AUTHORS
Ambrish Srivastav is a research scholar at Devi Ahilya University (DAVV), Indore, with approximately 10 years of teaching experience in the field of Computer Science and Engineering. He graduated in 2009 from I.E.T.E, New Delhi, and received his Master’s degree in 2011 from I.E.T. DAVV. His research interests are Artificial Intelligence, Natural Language Processing and Machine Learning. He can be contacted at email: a.srivastav30@gmail.com.
Dr. Shaligram Prajapat has worked in academia as an educationist, teacher, researcher and learner for the past two decades. He has executed many academic and research projects at Devi Ahilya University, India. He holds a Ph.D. in Computer Applications from Maulana Azad National Institute of Technology (M.A.N.I.T.), Bhopal, India, and a Master of Philosophy (Computer Science) from Devi Ahilya University, Indore, and has many research publications in international journals indexed in Web of Science and Scopus. He can be contacted at email: shaligram.prajapat@gmail.com.