International Journal of Electrical and Computer Engineering (IJECE)
Vol. 10, No. 3, June 2020, pp. 2934~2943
ISSN: 2088-8708, DOI: 10.11591/ijece.v10i3.pp2934-2943
Journal homepage: http://ijece.iaescore.com/index.php/IJECE

Towards optimize-ESA for text semantic similarity: A case study of biomedical text

Khaoula Mrhar¹, Mounia Abik²
¹IPSS Research Team, FSR, Mohammed V University, Morocco
²IPSS Research Team, ENSIAS, Mohammed V University, Morocco

Article Info

Article history:
Received Jun 4, 2019
Revised Jan 3, 2020
Accepted Jan 10, 2020

ABSTRACT
Explicit Semantic Analysis (ESA) is an approach for measuring the semantic relatedness between terms or documents based on their similarity to the documents of a reference corpus, usually Wikipedia. ESA has received tremendous attention in the fields of natural language processing (NLP) and information retrieval. However, ESA relies on a huge Wikipedia index matrix: to interpret a text, it multiplies this large matrix by a term vector to produce a high-dimensional vector. Consequently, the ESA process is too expensive in both the interpretation and the similarity steps, and much of the time is lost in unnecessary operations. This paper proposes an enhancement to ESA, called optimize-ESA, that reduces the dimension at the interpretation stage by computing the semantic similarity in a specific domain. The experimental results show that our method correlates better with human judgement than the full-version ESA approach.

Keywords:
Explicit semantic analysis ESA
Natural language processing NLP
Semantic relatedness
Semantic similarity

Copyright © 2020 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author:
Khaoula Mrhar,
IPSS Research Team, FSR, Mohammed V University,
Ibn Batouta avenue, Rabat, Morocco
Email: Khaoula_mrhar@um5.ac.ma

1. INTRODUCTION
Semantic relatedness measures quantify the degree to which two words or concepts are related in a taxonomy by using all the relations between them, such as synonymy and hyponymy. Semantic similarity is a special case of relatedness that is limited to hyponymy (i.e. is-a) relations. Measures of relatedness or similarity are used in many Natural Language Processing (NLP) applications, such as word sense disambiguation, information retrieval, automatic detection and correction of spelling errors, semantic annotation, text clustering and classification, and topic detection [1, 2].
Measuring the semantic similarity between texts is a challenging task. The traditional lexical approach, based on the Bag of Words (BOW) [3] and the vector space model [4], converts each text into a word vector. It has a notorious disadvantage: it ignores the semantic relationships among words and treats words as independent of each other [3]. One solution to this problem is to enrich the text representation with an external source of knowledge. Some techniques use large corpora, such as the statistical corpus-based similarity approach, which measures the semantic similarity between two texts or words based on the information gained from corpora. A corpus is a large collection of written or spoken texts that is used to study and describe a language. The most relevant techniques of this approach are HAL [4], LSA [4] and ESA [5]. However, corpus-based techniques are unstructured and imprecise. Moreover, other techniques use lexical structures such as taxonomies, especially WordNet [6], but WordNet is limited in scope and coverage, does not include information about named entities and specialized concepts, and does not give good results in text similarity [7].
In contrast, Wikipedia is an outstanding resource for addressing these shortcomings in the text semantic similarity problem. It is a large-scale, collaborative, open encyclopedia that has evolved into a comprehensive resource with very good coverage of diverse topics, important entities and events; it widely covers named entities, domain-specific entities and new entities. The English Wikipedia currently contains over 4 million articles (including redirection articles).
Furthermore, WikiRelate [7] was the first work to compute measures of semantic relatedness using Wikipedia. This approach took the familiar techniques used for semantic relatedness based on WordNet, such as the path-length measure [8], and modified them to be used with Wikipedia, but in general the results are similar. However, Gabrilovich and Markovitch (2007) [5] proposed a new approach, Explicit Semantic Analysis (ESA), that achieves highly accurate results; this method has been extensively studied in many applications [9]. ESA uses Wikipedia as a semantic interpreter: it builds a weighted inverted vector that maps each term to the list of Wikipedia articles in which it appears, and computes the similarity between the vectors generated from two terms or texts. This means that the inverted vector may contain millions of columns, most of them with a value of 0, considering the sheer size of Wikipedia (more than 4M concepts). Accordingly, interpreting text based on all Wikipedia concepts can be expensive, and computing the semantic relatedness between these huge vectors with cosine similarity slows ESA down.
Several related papers address this problem. The authors of [10] propose Economy-ESA, an economic scheme for explicit semantic analysis that reduces the dimension of the ESA index matrix using random selection, k-means and norm-based clustering approaches. The authors of [11] propose a novel graph-based relatedness assessment method that uses Wikipedia features to avoid these drawbacks. They propose the Naive-ESA algorithm, which returns the top most relevant Wikipedia concepts in order to reduce the dimensional space of Explicit Semantic Analysis (ESA). An efficient and effective algorithm was proposed in [12]; it represents the meaning of a text by using the concepts that best match it. This approach first computes the approximate top-k Wikipedia concepts that are most relevant to the given text and then leverages these concepts to represent the meaning of the text.
Following the above-mentioned studies, in this paper we present a new method that optimizes the ESA approach and resolves some of its limitations and drawbacks. Optimize-ESA reduces the dimension at the interpretation stage by computing the semantic similarity in a specific domain. Indeed, according to several works [13], using a domain knowledge base is more beneficial and performs better in the semantic similarity computation process [14]. This result has pushed many researchers to use a domain knowledge base when the domain of the input text is already known. The majority of work on semantic similarity in a specific domain is in the biomedical domain, because of the proliferation of textual resources and the importance of the terminology. In this context, the state-of-the-art methods for calculating semantic relatedness in a specific domain can be roughly divided into two main groups: ontology-based methods [15] and distributional methods that use a domain-specific corpus [16].
Many attempts have been made to use Wikipedia to compute semantic similarity in a specific domain. The work in [17] assesses the suitability of Wikipedia as a potential knowledge resource for semantic relatedness computation in the biomedical domain by comparing it with other methods (ontology-based and distributional methods). Jaiswal [18], in contrast, proposes a method for calculating the semantic relatedness of text related to diseases, conditions and wellness issues that uses ESA with MedlinePlus as its knowledge base instead of Wikipedia.
In this paper, we propose an approach, optimize-ESA, that improves the ESA approach and provides significant gains in execution time and space consumption without causing a significant reduction in precision. In our approach, we limit the K concepts based on the Wikipedia category tree and the input domain. After that, we leverage this concept vector to map a text from the keyword space into the optimized concept space. All evaluations are performed on datasets containing pairs of terms from the biomedical domain and a gold-standard semantic similarity value for each pair. The results are compared with the results of the ESA approach and of other state-of-the-art semantic similarity approaches.
The remainder of this paper is organized as follows. Section 2 presents our method, optimize-ESA, and its architecture. Section 3 details the experiments that evaluate the effectiveness of our method and reports the analysis of the results in the biomedical domain. Finally, Section 4 gives our conclusion and presents some perspectives for future research.

2. PROPOSED APPROACH: OPTIMIZE-ESA FOR SEMANTIC SIMILARITY MEASURES
2.1. The Wikipedia features
Wikipedia is a large online encyclopedia founded in 2001. It is a free, user-editable, web-based, collaborative, multilingual encyclopedia. It has undergone tremendous growth, currently comprises more than 2,382,000 articles in about 250 languages, and has become one of the most important information resources on the web.
Wikipedia content is presented on pages:
- Articles: the normal pages in Wikipedia that contain encyclopedic information. Each article describes a single concept or topic, with a concise title that can be used in ontologies and a brief overview of the topic. There is only one article for each concept or topic.
- Redirects: a redirect is a Wikipedia page that automatically redirects users to another page (it connects articles to articles or to a section of an article). It is possible to redirect to just a specific section of the target page.
- Disambiguation pages: disambiguation is the process of resolving conflicts when an article title is ambiguous. A disambiguation page contains a list of articles corresponding to the different meanings of the same word. For example, the word "Java" can refer to an island of Indonesia, a programming language, a French band, and many other things.
- Categories: categories are nodes for the hierarchical organization of articles. They are intended to group pages on similar subjects, and almost all Wikipedia articles fall within one or more categories. The Wikipedia category system is organized as a network, which we present briefly in Section 2.3.1.

2.2. ESA Approach
Explicit Semantic Analysis was introduced by Gabrilovich and Markovitch [19]. The approach represents a text as a weighted mixture of a set of concepts, using Wikipedia concepts, where each concept is the title of a Wikipedia page. Its main advantage is the use of a vast amount of highly organized human knowledge. The first step of the approach is to construct the semantic interpreter, which maps fragments of natural language text into a weighted sequence of Wikipedia concepts ordered by their relevance to the input.
Given an input text fragment $T = \{w_i\}$ composed of $I$ words, we first represent it as a vector using the TF-IDF scheme, where $v_i$ is the weight of the word $w_i$. Then we use Wikipedia articles as index documents: each Wikipedia concept is represented as a vector of the words that occur in the corresponding article, and the entries of these vectors are assigned weights using the TF-IDF scheme. Hence, these weights quantify the strength of association between words and concepts. We build an inverted index that maps each word to the list of concepts in which it appears. Let $k_j$ be the inverted index entry for word $w_i$, where $k_j$ quantifies the strength of association of word $w_i$ with the Wikipedia concept $c_j$, $c_j \in \{c_1, \ldots, c_N\}$ (where $N$ denotes the total number of Wikipedia concepts). Then the semantic interpretation vector $V$ for text $T$ is a vector of length $N$, in which the weight of each concept $c_j$ is defined as

$$\sum_{w_i \in T} v_i \cdot k_j$$

The entries of this vector reflect the relevance of the corresponding concepts to the text $T$. After that, ESA uses the cosine metric to compute the semantic relatedness of a pair of text fragments by comparing their vectors. Figure 1 presents the whole ESA process.

Figure 1. Explicit semantic analysis (ESA) system
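
The interpretation and comparison steps described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the three toy "concepts", the whitespace tokenizer, the use of raw term counts for the text weights v_i, and all function names are hypothetical choices made for the example.

```python
import math
from collections import Counter, defaultdict

def build_inverted_index(concepts):
    """Map each word to {concept: TF-IDF weight}; these play the role of the k_j entries."""
    n = len(concepts)
    counts = {c: Counter(text.lower().split()) for c, text in concepts.items()}
    # Document frequency: in how many concept articles each word occurs.
    doc_freq = Counter(w for cnt in counts.values() for w in cnt)
    index = defaultdict(dict)
    for concept, cnt in counts.items():
        total = sum(cnt.values())
        for word, tf in cnt.items():
            index[word][concept] = (tf / total) * math.log(n / doc_freq[word])
    return index

def interpret(text, index):
    """Sparse interpretation vector: weight(c_j) = sum over words of v_i * k_j."""
    vector = defaultdict(float)
    # Assumption: v_i is the raw count of the word in the text.
    for word, v_i in Counter(text.lower().split()).items():
        for concept, k_j in index.get(word, {}).items():
            vector[concept] += v_i * k_j
    return dict(vector)

def cosine(a, b):
    """Cosine similarity between two sparse concept vectors stored as dicts."""
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

# Toy concept corpus (hypothetical article texts).
concepts = {
    "Diabetes": "insulin glucose blood sugar pancreas",
    "Hypertension": "blood pressure artery heart",
    "Rocket": "engine fuel orbit launch",
}
index = build_inverted_index(concepts)
a = interpret("insulin lowers blood glucose", index)
b = interpret("high blood pressure", index)
c = interpret("rocket engine launch", index)
print(cosine(a, b), cosine(a, c))
```

Storing the vectors as dictionaries keeps only the non-zero concept weights, which is one common way to avoid materializing a vector of length N for every text.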
The ESA approach is simple and effective; however, the process is too expensive, for several reasons. Firstly, the dimension of the concept vector for a given word is too large, because its length equals the number of concepts in Wikipedia (more than 4M concepts). Secondly, to produce this concept vector, the overall index matrix must be multiplied by a term vector, which requires numerous multiplications. Thirdly, the concept vector of a word is sparse: most of its elements are zero, because the word appears in only a few Wikipedia articles. Reinterpreting a text based on Wikipedia concepts can therefore be very expensive and slow, because a lot of time is lost in unnecessary operations on the zero values of high-dimensional sparse vectors, which affects the efficiency and performance of the ESA approach. Finally, computing the similarity or relatedness between two vectors with numerous dimensions is very costly.
Because of these problems, we propose in this paper an approach that optimizes ESA: instead of returning the vector space for all the concepts in Wikipedia, it returns only the top k most relevant concepts. Given a specific domain, we select the most relevant Wikipedia articles related to the domain Di based on the Wikipedia category network. Furthermore, we create a domain index Ui that stores the inverted index of the Wikipedia articles of each domain, calculated once a domain Di has been entered. For each text T in a specific domain Di, we semantically reinterpret it based on the k concepts saved in the domain index Ui. We update this domain index according to the Wikipedia update frequency. We present the optimize-ESA approach briefly in the section below.

2.3. Optimize-ESA approach
In this paper, we propose an approach for computing semantic similarity in a specific domain, called the Optimize-ESA approach. This approach resolves some of the shortcomings of the ESA approach and optimizes it in terms of space consumption and similarity computation time. The architecture of our approach is presented in Figure 2. It consists of two layers: filtering the k concepts for the domain Di and building a domain inverted index.

Figure 2. Optimize-ESA architecture

2.3.1. First layer: filter K concepts for domain Di
The relationship between a concept (or article) and a category in Wikipedia is expressed by a link called a category link (the English version contained 49.98 million inter-links in September 2006 [20]). The Wikipedia category system is socially created and edited: any user can create an article and classify it into a category. This leads to a tremendous growth of articles and categories in Wikipedia (more than 500,000 categories in the English Wikipedia [20]). Consequently, Wikipedia editors try to better organize the category structure by purifying certain concepts and splitting categories into multiple fine-grained categories (the number of categories in wiki-14 increased by 25% compared with wiki-12).
Furthermore, the category system in Wikipedia is represented as a directed graph where nodes represent pages or categories, and edges represent the oriented relationship "is assigned to". Every category has multiple parent and child categories, and each category is connected to a number of articles (all Wikipedia articles are covered by a category). Besides, the category system in Wikipedia has a taxonomy structure, which is a hierarchy of
topics and subtopics, as shown in Figure 3. It enables us to search for articles by narrowing down from broader categories to their subcategories. Wikipedia offers a category tree system [21] that enables users to browse categories, but not all the concepts belonging to a specific category, because the category system is not a true tree structure. Nevertheless, starting from a category, we can traverse the descendant categories and detect all the connected articles.

Figure 3. Wikipedia category tree
In this part, we use the Wikipedia category system to extract the articles or concepts related to an input domain D. Using this category system, we can consider our input domain as a category in Wikipedia, search for all the categories belonging to it, and extract all the connected articles by traversing the descendant categories. However, as the level increases, the number of articles covered grows more and more, until almost all the articles in Wikipedia are covered. That would mean that all articles belong to all the broad categories, which is incorrect. So our issue is how to define the level at which the breadth-first traversal must stop, in other words, the level of the Wikipedia tree structure up to which the categories are effectively related to the input category. Therefore, we propose to compute the semantic similarity between the input category and all the categories at each level, and to decide after experimentation at which level we need to stop. Table 1 presents the results of our experimentation.
Based on several experiments and observations, we find that the category level that is effectively related to the input domain changes from one domain to another, and that it is not always correct to stop at a fixed category level (level 8 for computer science and level 7 for bioinformatics), because it depends on the number of subcategories of the domain that exist in the Wikipedia category system. Therefore, the extracted categories must be based on a semantic similarity measure between the input domain and the categories at each level. Consequently, after experimentation, we decided to stop the extraction of subcategories related to the input domain at a similarity threshold of 0.4. Figure 4 presents the whole process of detecting the Wikipedia articles related to a specific input domain.
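
The traversal just described can be sketched as a breadth-first walk with a similarity cut-off. This is only an illustration under assumptions: the paper decides where to stop level by level, whereas the sketch below prunes individual categories whose similarity to the domain is under the threshold; the graph, the article titles, the `similarity` callable and its scores are hypothetical toy data.

```python
from collections import deque

def collect_domain_articles(domain, subcategories, articles, similarity, threshold=0.4):
    """Breadth-first walk down the category graph, starting at `domain`.

    subcategories: dict mapping a category to its child categories
    articles:      dict mapping a category to the article titles assigned to it
    similarity:    callable (domain, category) -> float (hypothetical scorer)
    A child category is followed only if its similarity to `domain` is >= threshold.
    """
    seen = {domain}            # the category system is a graph, so guard against revisits
    queue = deque([domain])
    selected = set()
    while queue:
        category = queue.popleft()
        selected.update(articles.get(category, []))
        for child in subcategories.get(category, []):
            if child not in seen and similarity(domain, child) >= threshold:
                seen.add(child)
                queue.append(child)
    return selected

# Toy category graph and similarity scores (all values invented for the example).
subcategories = {
    "Medicine": ["Cardiology", "Medical humor"],
    "Cardiology": ["Heart diseases"],
    "Medical humor": ["Comedy films"],
}
articles = {
    "Medicine": ["Physician"],
    "Cardiology": ["Electrocardiography"],
    "Heart diseases": ["Myocardial infarction"],
    "Medical humor": ["Hospital sitcom"],
    "Comedy films": ["Slapstick film"],
}
scores = {"Cardiology": 0.7, "Heart diseases": 0.55, "Medical humor": 0.2, "Comedy films": 0.05}
domain_articles = collect_domain_articles(
    "Medicine", subcategories, articles, lambda domain, category: scores[category]
)
print(sorted(domain_articles))
```

The `seen` set matters because a category can have several parents, so the same node may be reachable along more than one path.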

Table 1. Semantic similarity correlation between Wikipedia category tree levels

Category input     Correlation semantic similarity, by category tree level
                   1        2        3        4        5        6        7        8        9        10       11       12
Computer science   0.8389   0.6858   0.6127   0.587    0.557    0.517    0.489    0.435    0.387    0.345    0.287    0.198
Bioinformatics     0.629    0.5058   0.502    0.497    0.476    0.456    0.436    0.427    0.357    0.245    0.227    0.175
Biology            0.7205   0.6063   0.598    0.576    0.554    0.518    0.486    0.423    0.397    0.297    0.267    0.109
Medicine           0.7379   0.64498  0.5870   0.5630   0.5563   0.536    0.501    0.456    0.345    0.329    0.234    0.206
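
As a worked check, the values of Table 1 can be scanned for the deepest level whose similarity still reaches a given cut-off. The reading "keep a level while its similarity is at least the threshold" is an assumption about how the 0.4 value is applied, and the function name is hypothetical.

```python
# Similarity per category tree level (levels 1 to 12), transcribed from Table 1.
TABLE_1 = {
    "Computer science": [0.8389, 0.6858, 0.6127, 0.587, 0.557, 0.517,
                         0.489, 0.435, 0.387, 0.345, 0.287, 0.198],
    "Bioinformatics": [0.629, 0.5058, 0.502, 0.497, 0.476, 0.456,
                       0.436, 0.427, 0.357, 0.245, 0.227, 0.175],
    "Biology": [0.7205, 0.6063, 0.598, 0.576, 0.554, 0.518,
                0.486, 0.423, 0.397, 0.297, 0.267, 0.109],
    "Medicine": [0.7379, 0.64498, 0.5870, 0.5630, 0.5563, 0.536,
                 0.501, 0.456, 0.345, 0.329, 0.234, 0.206],
}

def last_level_at_or_above(similarities, threshold):
    """Return the deepest consecutive level (1-based) whose similarity is >= threshold."""
    level = 0
    for depth, value in enumerate(similarities, start=1):
        if value < threshold:
            break
        level = depth
    return level

stop_levels = {domain: last_level_at_or_above(values, 0.4)
               for domain, values in TABLE_1.items()}
print(stop_levels)
```

With the transcribed values, a cut-off of exactly 0.4 keeps eight levels for every row. A slightly stricter cut-off such as 0.43 gives level 8 for computer science and level 7 for bioinformatics, so the exact stopping level is sensitive to how the threshold is rounded and applied.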

Figure 4. The process of detecting Wikipedia articles related to the input domain

2.3.2. Second layer: build domain index Ui
After filtering the Wikipedia articles related to a specific domain Di, we build an inverted index for the domain Di that maps each word to the list of concepts in which it appears, as presented in Section 2.2. Let $k_j$ be the inverted index entry for word $w_i$, where $k_j$ quantifies the strength of association of word $w_i$ with the Wikipedia concept $c_j$, $c_j \in \{c_1, \ldots, c_n\}$, where $n$ denotes the number of Wikipedia concepts filtered for domain Di, as shown in Table 2.

Table 2. Wikipedia articles filtered for domain Di

             WA1       .....     .....     WAj
Term 1       T[0,0]    .....     .....
Term k       .....     .....     .....     T[i,j]

Rows: terms in the Wikipedia articles filtered for domain Di.
After building the weighted inverted index for domain Di, we store it in a database as Ui so that it can be reused for any future interpretation, which optimizes the computation of semantic similarity. Our database must be updated to include new articles added to Wikipedia. The algorithm of our method is presented below:

// The algorithm creates the Wikipedia inverted index for a specific domain, which can be
// used to measure the semantic similarity between texts based on the ESA method
// Input: domain Di
// Output: domain index Ui
Step 1
    // extract the k concepts related to the input domain Di
    for domain Di
        if Di exists in U
            return U[Di]
        else
            // search Di among the nodes of the Wikipedia category tree CT
            Ck = search[Di, CT]
            // extract the k concepts Ck = [C1 ... Ck] that belong to Di
            return Ck
Step 2
    // build the inverted index WDi for domain Di
    for C1 to Ck
        WDi[C1 ... Ck]
    store WDi in Ui
    return Ui
stop
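
One way to read the two steps of the algorithm in code is a small cache that builds the domain index on first use and returns the stored copy afterwards. The sketch below is an assumption-laden illustration: `DomainIndexStore`, the `extract_concepts` callable, the in-memory dictionary standing in for the database U, and the smoothed TF-IDF weighting are all hypothetical choices, not taken from the paper.

```python
import math
from collections import Counter, defaultdict

class DomainIndexStore:
    """In-memory stand-in for the database U: one inverted index per domain."""

    def __init__(self, extract_concepts):
        # extract_concepts: hypothetical callable, domain -> {concept title: article text}
        self._extract_concepts = extract_concepts
        self._indexes = {}

    def get(self, domain):
        """Return U[Di] if it exists; otherwise run step 1 and step 2, then store the result."""
        if domain not in self._indexes:
            concepts = self._extract_concepts(domain)        # step 1: the k domain concepts
            self._indexes[domain] = self._build(concepts)    # step 2: the inverted index
        return self._indexes[domain]

    def refresh(self, domain):
        """Rebuild the index, for example after new articles have been added to Wikipedia."""
        self._indexes.pop(domain, None)
        return self.get(domain)

    @staticmethod
    def _build(concepts):
        n = len(concepts)
        counts = {c: Counter(text.lower().split()) for c, text in concepts.items()}
        doc_freq = Counter(w for cnt in counts.values() for w in cnt)
        index = defaultdict(dict)
        for concept, cnt in counts.items():
            for word, tf in cnt.items():
                # Assumption: smoothed IDF so that weights stay positive in tiny domains.
                index[word][concept] = tf * math.log(1 + n / doc_freq[word])
        return dict(index)

# Toy usage with a fake extractor that records how often it is called.
calls = []

def fake_extract(domain):
    calls.append(domain)
    return {"Insulin": "insulin hormone glucose", "Glucose": "glucose sugar blood"}

store = DomainIndexStore(fake_extract)
first = store.get("Medicine")
second = store.get("Medicine")   # served from the cache, no second extraction
```

Keeping extraction and index construction behind a single `get` call means callers never need to know whether the domain was already processed.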

Furthermore, to compute the semantic similarity between two texts T1 and T2, we consider each of them as a bag of words, e.g. T1 = {t1, t2, ..., tn} with n words. We then semantically reinterpret each text based on the k concepts saved in the domain index Ui. Finally, we compute the semantic similarity between the two text vectors using the cosine similarity metric.

3. RESULTS AND DISCUSSION
3.1. Case study: biomedical domain
In recent years, the amount of information available in textual format, such as patient health records and medical documents, has been increasing rapidly in the biomedical domain. Therefore, measures of semantic relatedness between concepts and texts are widely used in this domain: discovering similar diseases [22], detecting redundancy in clinical records [23], comparing gene products [24], identifying direct and indirect protein interactions within human regulatory pathways using the gene ontology [25], and coding medical diagnoses and adverse drug reactions using semantic distance [26]. Furthermore, the classical semantic similarity measures have been adapted for use in several domains. However, these measures are less effective because of their limited coverage of specialized domains. Hence the need to use a specialized knowledge base, as in the biomedical domain, by exploiting medical ontologies, knowledge repositories and structured biomedical vocabularies. For this reason, we propose in this paper a domain-specialized method that optimizes the ESA semantic similarity approach. We chose to test the performance of our method on three biomedical datasets because of the availability and proliferation of the resources. In the section below, we present the datasets used in our experimentation and the interpretation of our results.

3.2. Experimentation
Humans have an innate ability to judge the semantic relatedness of texts. Accordingly, to evaluate the performance of machine measurements of semantic similarity between texts, we compare them with human ratings in the same setting, by computing the correlation between human judgement and machine calculations. In this work, because suitable datasets of biomedical sentence pairs are scarce, we use the datasets listed in Table 3. We use the BIOSSES dataset [27], which is a benchmark dataset for biomedical sentence similarity estimation. It contains 100 sentence pairs selected from the TAC (Text Analysis Conference) biomedical summarization track training dataset, which contains articles from the biomedical domain. The sentence pairs were evaluated by five different human experts, who gave scores ranging from 0 (no relation) to 4 (equivalent). The scores were averaged for each pair to produce a single relatedness score. We also test our method on two French Web corpora [28]. The first corpus is about "epidemics" and the second one is about "space conquest". Each corpus contains reference sentences, and each of them was associated with six sentences chosen with similarity scores ranging from 0 (the sentences are unrelated) to 4 (the sentences are completely equivalent).

Table 3. The datasets used in the semantic similarity task

Dataset          Pairs   Scale   References
BIOSSES          100     0-4     [27]
Epidemics        60      0-4     [28]
Space conquest   60      0-4     [28]
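
The gold score of a pair, described above as the average of the individual expert ratings, can be computed as follows. The ratings shown are invented for the example and are not taken from BIOSSES; the function name is hypothetical.

```python
from statistics import mean

def gold_score(annotator_scores):
    """Average the ratings given by the human experts for one sentence pair."""
    if not annotator_scores:
        raise ValueError("at least one rating is required")
    return mean(annotator_scores)

# Hypothetical ratings from five experts on the 0 (no relation) to 4 (equivalent) scale.
ratings = {
    "pair_1": [4, 4, 3, 4, 4],
    "pair_2": [0, 1, 0, 0, 1],
}
gold = {pair: gold_score(scores) for pair, scores in ratings.items()}
print(gold)
```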

Following the literature on semantic relatedness, we evaluate performance by measuring the correlation between the scores assigned by the proposed method and the human judgement scores. For each dataset, we report the correlation computed on all pairs using Pearson's correlation coefficient.
Pearson's correlation metric, denoted as P, reflects the linear correlation between the measured results and the human judgments, where 0 means uncorrelated and 1 means perfectly correlated. The corresponding formula is defined as:

$$P = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

where $x_i$ refers to the value of the $i$-th pair in the dataset given by human judgments, $y_i$ to the corresponding value returned by the Optimize-ESA method, $\bar{x}$ and $\bar{y}$ to their respective means, and $n$ to the length of the target dataset.
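
Pearson's coefficient as described above is straightforward to compute directly. The sketch below uses the standard mean-centred form; the function name and the sample scores are assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between human scores x and method scores y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, with at least two values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return covariance / (spread_x * spread_y)

# Hypothetical scores: human ratings on a 0 to 4 scale, method scores in [0, 1].
human = [0, 1, 2, 3, 4]
method = [0.1, 0.3, 0.5, 0.7, 0.9]
print(pearson(human, method))
```

Because the coefficient is invariant to linear rescaling, the human ratings and the cosine scores do not need to be brought onto the same scale first.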
Table 4 shows the Pearson correlation coefficients obtained by the ESA algorithm and by our method, Optimize-ESA, for the three datasets BIOSSES, Epidemics and Space Conquest. On the BIOSSES sentence dataset, Optimize-ESA obtains a correlation of 0.612 compared to 0.595 for the ESA method. On the Epidemics dataset, our method obtains a correlation of 0.544 compared to 0.525 for the full-version ESA. On the Space Conquest dataset, the ESA approach with the Wikipedia knowledge base obtains a correlation of 0.558 compared to 0.571 for our method. Our method therefore correlates better with human judgement than the full-version ESA approach on all three datasets, with gains of 0.013 to 0.019 in Pearson correlation.
A comparison of our method, Optimize-ESA, with some state-of-the-art measures for computing semantic relatedness in the biomedical domain is shown in Table 5. We compare it with Resnik and Lin, which are the most popular information content (IC) measures among knowledge-based methods, and with Levenshtein, which is a string-based measure. We also compare Optimize-ESA with the traditional ESA approach that uses Wikipedia as its knowledge base.

Table 4. Comparison of Pearson's correlation coefficient on the BIOSSES, Epidemics and Space conquest datasets

Dataset          ESA algorithm (Gabrilovich & Markovitch, 2007)   Optimize-ESA
                 Pearson's (P)                                    Pearson's (P)
BIOSSES          0.595                                            0.612
Epidemics        0.525                                            0.544
Space conquest   0.558                                            0.571
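
The per-dataset differences in Table 4 can be checked with a few lines of arithmetic. The dictionary below simply restates the table; the variable names are hypothetical.

```python
# (ESA, Optimize-ESA) Pearson coefficients, restated from Table 4.
TABLE_4 = {
    "BIOSSES": (0.595, 0.612),
    "Epidemics": (0.525, 0.544),
    "Space conquest": (0.558, 0.571),
}

# Absolute gain of Optimize-ESA over ESA, rounded to the table's precision.
gains = {name: round(optimized - esa, 3) for name, (esa, optimized) in TABLE_4.items()}
print(gains)
```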

Table 5. Pearson correlation coefficients (P) of related studies

Related studies               BIOSSES   Epidemics   Space Conquest   References
IC-based measures
  Resnik                      0.473     0.396       0.412            P. Resnik [29]
  Lin                         0.645     0.591       0.611            D. Lin [30]
String similarity measures
  Levenshtein                 0.592     0.601       0.591            Finkelstein et al. [31]
ESA similarity measures
  ESA-wiki                    0.595     0.525       0.558            Gabrilovich and Markovitch [19]
  Optimize-ESA                0.612     0.544       0.571
As the results in Table 5 indicate, Optimize-ESA can obtain competitive results for Pearson correlation, especially for the small dataset. In contrast, on the large dataset, the use of the full-version ESA including all concepts in Wikipedia, or of Optimize-ESA in a specific domain, performs better than the string similarity measure and the IC-based measures.
Furthermore, an experiment presented in Figure 5 shows that our method Optimize-ESA is faster than ESA with the full Wikipedia. We measured the cosine similarity processing cost of six pairs from each test collection and compared the running time of ESA and Optimize-ESA.
Figure 5. ESA and Optimize-ESA running time
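The running-time comparison can be illustrated with a small timing sketch. It assumes concept vectors are stored as sparse dictionaries mapping a concept identifier to a weight, and it uses randomly generated vectors over two hypothetical concept spaces (50,000 and 2,000 concepts) as stand-ins for the full Wikipedia space and a domain-specific subset. The sizes, the density and the helper names are illustrative assumptions, not the setup of the experiment reported here.

```python
import math
import random
import time


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {concept_id: weight}."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector for the dot product
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def random_vector(concept_ids, density, rng):
    """Random sparse vector: each concept is kept with probability `density`."""
    return {c: rng.random() for c in concept_ids if rng.random() < density}


def time_pairs(space, rng, n_pairs=6, density=0.1):
    """Time the cosine similarity of n_pairs random vector pairs over `space`."""
    pairs = [
        (random_vector(space, density, rng), random_vector(space, density, rng))
        for _ in range(n_pairs)
    ]
    start = time.perf_counter()
    sims = [cosine(a, b) for a, b in pairs]
    return time.perf_counter() - start, sims


rng = random.Random(0)
full_time, _ = time_pairs(range(50_000), rng)   # stand-in for the full concept space
domain_time, _ = time_pairs(range(2_000), rng)  # stand-in for a domain-specific subset
print(f"full space: {full_time:.4f} s, domain subset: {domain_time:.4f} s")
```

Because the cost of the dot product and the norms grows with the number of non-zero entries, shrinking the concept space shortens the similarity step; absolute timings depend on the machine and are not comparable with Figure 5.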
4. CONCLUSION AND FUTURE WORK
The study of semantic similarity between words has long been an integral part of information retrieval and natural language processing. Based on the theoretical principles and the way in which ontologies are investigated to compute similarity, different kinds of methods can be identified according to the type, size and domain of the dataset. Among these methods, we can cite the Explicit Semantic Analysis (ESA) approach with the Wikipedia knowledge base, which performs very well on the task of computing the semantic relatedness of words and text fragments. However, the ESA process is too expensive because the concept vector for a given word has as many dimensions as there are Wikipedia concepts (4M). The efficiency of ESA therefore suffers, because a lot of time is lost in unnecessary operations.
In this paper we propose a new method called Optimize-ESA, which reduces the dimension at the interpretation stage by computing the semantic similarity in a specific domain. To evaluate the performance of our method, we give a comparison between different algorithms for semantic relatedness in the biomedical domain. We chose the biomedical domain because of the availability of different ontologies and methods, which is significantly higher than in any other domain. We conclude that our method outperforms the current state-of-the-art methods for calculating the semantic relatedness of biomedical texts, as it correlates much better with human judgements.
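The idea of reducing the dimension at the interpretation stage can be sketched as follows. This is a simplified illustration, not the implementation evaluated in this paper: it assumes a weighted inverted index that maps each term to the concepts it occurs in, and a set of domain concepts used as a filter. The function `interpret`, the toy index and its weights are all hypothetical.

```python
from collections import defaultdict


def interpret(text, inverted_index, allowed_concepts=None):
    """Build an ESA-style concept vector for `text`.

    inverted_index maps term -> {concept: weight} (for example TF-IDF weights).
    When allowed_concepts is given, concepts outside that set are skipped,
    so the resulting vector only spans the chosen domain.
    """
    vector = defaultdict(float)
    for term in text.lower().split():
        for concept, weight in inverted_index.get(term, {}).items():
            if allowed_concepts is None or concept in allowed_concepts:
                vector[concept] += weight
    return dict(vector)


# Toy inverted index with made-up weights (illustration only).
toy_index = {
    "virus": {"Influenza": 0.9, "Computer virus": 0.7, "Epidemic": 0.6},
    "outbreak": {"Epidemic": 0.8, "Influenza": 0.4, "Prison escape": 0.3},
}
biomedical_concepts = {"Influenza", "Epidemic"}

full_vector = interpret("virus outbreak", toy_index)
domain_vector = interpret("virus outbreak", toy_index, biomedical_concepts)
print(len(full_vector), "concepts in the full vector")
print(len(domain_vector), "concepts in the domain vector")
```

In this toy case the full vector has four dimensions and the domain vector two, and the weights of the retained concepts are unchanged. The similarity step then operates on the shorter vector.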
There are several interesting lines of future research related to the method presented in this work. Firstly, we plan to further optimize our method by filtering the Wikipedia concepts using a domain-specific knowledge base leveraged with the Wikipedia category tree. Secondly, we plan to improve the results of ESA by adding a category index to the weighted inverted index. Finally, a wider evaluation would be desirable, considering larger sets of text pairs as benchmark data in other domains.
REFERENCES
[1] S. Tongphu, "Toward Semantic Similarity Measure Between Concepts in An Ontology," Indonesian Journal of Electrical Engineering and Computer Science, vol. 14, no. 3, pp. 1356-1372, 2019.
[2] Bazi and N. Laachfoubi, "Arabic Named Entity Recognition using Deep Learning Approach," International Journal of Electrical and Computer Engineering (IJECE), vol. 9, no. 3, pp. 2025-2032, 2019.
[3] Y. Zhang, R. Jin and Z. Zhou, "Understanding Bag-of-Words Model: A Statistical Framework," International Journal of Machine Learning and Cybernetics, vol. 1, no. 1-4, pp. 43-52, 2010.
[4] T. Landauer, P. Foltz and D. Laham, "An Introduction to Latent Semantic Analysis," Discourse Processes, vol. 25, no. 2-3, pp. 259-284, 1998.
[5] E. Gabrilovich and S. Markovitch, "Overcoming the Brittleness Bottleneck using Wikipedia: Enhancing Text Categorization with Encyclopedic Knowledge," In AAAI'06, pp. 1301-1306, 2006.
[6] A. Budanitsky and G. Hirst, "Evaluating WordNet-based Measures of Lexical Semantic Relatedness," Computational Linguistics, vol. 32, no. 1, pp. 13-47, 2006.
[7] M. Strube and S. P. Ponzetto, "WikiRelate! Computing Semantic Relatedness using Wikipedia," Proceedings of the 21st National Conference on Artificial Intelligence, Boston, Massachusetts, pp. 1419-1424, 2006.
[8] Nurifan, R. Sarno and C. Wahyuni, "Developing Corpora using Wikipedia and Word2vec for Word Sense Disambiguation," Indonesian Journal of Electrical Engineering and Computer Science, vol. 12, no. 3, pp. 1239-1246, 2018.
[9] K. Dramé, F. Mougin and G. Diallo, "Large Scale Biomedical Texts Classification: A kNN and An ESA-Based Approaches," Journal of Biomedical Semantics, vol. 7, no. 1, 2016.
[10] F. Rahutomo, Y. Manabe, T. Kitasuka and M. Aritsugi, "Econo-ESA Reduction Scheme and the Impact of its Index Matrix Density," Procedia Computer Science, vol. 35, pp. 474-483, 2014.
[11] P. Li, B. Xiao, W. Ma, Y. Jiang and Z. Zhang, "A Graph-Based Semantic Relatedness Assessment Method Combining Wikipedia Features," Engineering Applications of Artificial Intelligence, vol. 65, pp. 268-281, 2017.
[12] J. W. Kim, A. Kashyap and S. Bhamidipati, "Wikipedia-Based Semantic Interpreter using Approximate Top-K Processing and Its Application," vol. 18, pp. 650-675, 2012.
[13] X. Song, "Ontology-based Domain-specific Semantic Similarity Analysis and Applications," Computer Science, All Dissertations, 2018.
[14] T. Gottron, M. Anderka and B. Stein, "Insights into Explicit Semantic Analysis," In Proceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM), pp. 1961-1964, 2011.
[15] V. Garla and C. Brandt, "Semantic Similarity in the Biomedical Domain: An Evaluation Across Knowledge Sources," BMC Bioinformatics, vol. 13, no. 1, 2012.
[16] D. Sánchez, M. Batet and A. Valls, "Web-Based Semantic Similarity: An Evaluation in the Biomedical Domain," Int J Softw Inform, vol. 4, pp. 39-52, 2010.
[17] E. Costa, H. Tjandrasa and S. Djanali, "Text Mining for Pest and Disease Identification on Rice Farming with Interactive Text Messaging," International Journal of Electrical and Computer Engineering (IJECE), vol. 8, no. 3, pp. 1671, 2018.
[18] A. Jaiswal and A. Bhargava, "Explicit Semantic Analysis for Computing Semantic Relatedness of Biomedical Text," In Proceedings of Confluence The Next Generation Information Technology Summit (IEEE), 2014.
[19] E. Gabrilovich and S. Markovitch, "Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis," In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), pp. 1606-1611, 2007.
[20] R. B. Bairi, M. Carman and G. Ramakrishnan, "On the Evolution of Wikipedia: Dynamics of Categories and Articles," Ninth International AAAI Conference on Web and Social Media, pp. 6-10, 2015.
[21] "PetScan," Petscan.wmflabs.org, 2019. [Online]. Available: https://petscan.wmflabs.org/. [Accessed: 31-May-2019].
[22] S. Mathur and D. Dinakarpandian, "Finding Disease Similarity Based on Implicit Semantic Similarity," Journal of Biomedical Informatics, vol. 45, no. 2, pp. 363-371, 2012.
[23] R. Zhang, S. Pakhomov, B. T. McInnes and G. B. Melton, "Evaluating Measures of Redundancy in Clinical Texts," AMIA Annu Symp Proc., pp. 1612-1620, 2011.
[24] C. Pesquita, D. Faria, A. Falcão, P. Lord and F. Couto, "Semantic Similarity in Biomedical Ontologies," PLoS Computational Biology, vol. 5, no. 7, 2009.
[25] X. Guo, R. Liu, C. Shriver, H. Hu and M. Liebman, "Assessing Semantic Similarity Measures for the Characterization of Human Regulatory Pathways," Bioinformatics, vol. 22, no. 8, pp. 967-973, 2006.
[26] C. Bousquet, et al., "Appraisal of the MedDRA Conceptual Structure for Describing and Grouping Adverse Drug Reactions," Drug Safety, vol. 28, no. 1, pp. 19-34, 2005.
[27] G. Soğancıoğlu, H. Öztürk and A. Özgür, "BIOSSES: A Semantic Sentence Similarity Estimation System for the Biomedical Domain," Bioinformatics, vol. 33, no. 14, pp. i49-i58, 2017.
[28] H. H. Vu, J. Villaneau, F. Saïd and P. F. Marteau, "Sentence Similarity by Combining Explicit Semantic Analysis and Overlapping N-Grams," In Proceedings of the 17th International Conference on Text, Speech and Dialogue (TSD 2014), Brno, Czech Republic, Springer International Publishing, pp. 201-208, 2014.
[29] P. Resnik, "Using Information Content to Evaluate Semantic Similarity in A Taxonomy," In Proceedings of IJCAI, pp. 448-453, 1995.
[30] D. Lin, "An Information-Theoretic Definition of Word Similarity," In ICML'98, 1998.
[31] L. Finkelstein, et al., "Placing Search in Context: The Concept Revisited," ACM Transactions on Information Systems, vol. 20, pp. 116-131, 2002.
BIOGRAPHIES OF AUTHORS

Khaoula Mrhar is currently a Ph.D. student in the IPSS research team at Mohammed V University, Rabat, Morocco. Her background includes a degree in mathematics and computer science. Her research interests include formal and non-formal learning, artificial intelligence, data integration, text mining and recommender systems.

Mounia Abik received a PhD from the National High School for Computer Science and Systems Analysis (ENSIAS) in 2009 and a Habilitation to Drive Research (HDR) from Mohammed V University of Rabat in 2014. Her main research interests focus on e-Learning, Knowledge Extraction from Social Networks, Semantic Web and Cyber-violence.