International Journal of Electrical and Computer Engineering (IJECE)
Vol. 9, No. 6, December 2019, pp. 5016~5023
ISSN: 2088-8708, DOI: 10.11591/ijece.v9i6.pp5016-5023
Journal homepage: http://iaescore.com/journals/index.php/IJECE
A graph-based approach for text query expansion using pseudo relevance feedback and association rules mining
Siham Jabri, Azzeddine Dahbi, Taoufiq Gadi
Laboratory Informatics, Imaging and Modelling of Complex Systems, Faculty of Science and Technology, Hassan 1st University Settat, Morocco
Article Info
Article history:
Received Nov 17, 2018
Revised May 31, 2019
Accepted Jun 27, 2019

ABSTRACT
Pseudo-relevance feedback is a query expansion approach whose terms are selected from a set of top-ranked documents retrieved in response to the original query. However, the selected terms will not be related to the query if the top retrieved documents are irrelevant. As a result, retrieval performance for the expanded query is not improved compared to the original one. This paper suggests using the documents selected by Pseudo-Relevance Feedback to generate association rules. To this end, an algorithm based on dominance relations is applied. Then the strong correlations between the query and other terms are detected, and an oriented and weighted graph called Pseudo-Graph Feedback is constructed. This graph serves to expand original queries with terms that are semantically related and selected by the user. The results of the experiments on the Text Retrieval Conference (TREC) collection are very significant, and the best results are achieved by the proposed approach compared to both the baseline system and an existing technique.
Keywords:
Association rules
Dominance relations
Information retrieval
Pseudo-graph feedback
Query expansion
TREC
Copyright © 2019 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Siham Jabri,
Laboratory Informatics, Imaging and Modelling of Complex Systems,
Faculty of Science and Technology, Hassan 1st University,
577 Casablanca Road, Settat, Morocco.
Email: si.jabri@uhp.ac.ma
1. INTRODUCTION
The Information Retrieval (IR) domain is as old as computers themselves. IR systems were originally designed to automate document management by storing a collection of documents as an index, then retrieving information by mapping the user's query to a set of associated documents. With the advent of the Internet, the volume of documents and the number of people who manage them have increased exponentially, reaching hundreds of millions. As a result, web search has become a standard source of information finding. This growth of data was, and still is, a big challenge for information retrieval systems.

Most queries are short and too ambiguous to describe the relevant documents that meet the user's information needs. This is the term mismatch problem, in which indexers and users do not use the same words to describe the same idea. One of the successful techniques to handle the term mismatch problem is to reformulate the original query by adding related terms that describe the user's need but have not been mentioned; this process is called Query Expansion (QE). Query expansion may be done in different ways: manual, interactive and automatic. Interactive query expansion, which involves both the system and the user, performs better than the automatic process, but involving the user is not feasible most of the time [1, 2]. The most popular technique in the literature is to represent words in a vector space and assign weights to them. Rocchio et al. [3] proposed a classical relevance feedback model to find text similarity and identify relevant and non-relevant documents. Other methods for relevance feedback and ranking used contextual and word similarity modelled as co-occurrence [4-12].
Pseudo relevance feedback (PRF) is one of the useful techniques to improve retrieval performance. It obtains the expansion terms or phrases from the top-ranked documents retrieved in response to a given query. However, if the documents used for this relevance feedback are irrelevant, the selected expansion terms impact the retrieval performance negatively [13]. Ariannezhad et al. [14] proposed a new approach which considers that the documents containing more informative terms for PRF should have higher relevance scores. Moreover, an iterative algorithm is provided to ensure that the proposed constraint is satisfied for any PRF model. In this regard, the algorithm calculates the feedback weight of terms and the relevance score of feedback documents simultaneously. Singh et al. [15] presented a new fuzzy logic-based QE method for document retrieval based on PRF techniques. This approach combined the Borda, Condorcet and reciprocal weights of candidate expansion terms and produced a single fuzzy weight for every candidate expansion term. Then the degree of importance of a relevant term is calculated: the higher this degree, the higher the chance that the term is selected for query expansion. To filter out irrelevant terms from the candidates, fuzzy logic-based semantic similarity algorithms are used. Colace et al. [16] introduced a new term extraction method for query expansion. The initial query is expanded with a structured representation made of weighted word pairs extracted from a set of training documents (relevance feedback). Bouziri et al. [17] proposed a query expansion approach based on association rules between terms. The expansion is modelled as a supervised classification problem and solved using a supervised learning algorithm. For this purpose, a training set is generated using a genetic algorithm-based approach that explores the association rules space to retrieve the best expansion terms and generates training instances, which are used to build a classifier implementing a decision tree algorithm.
In our previous work [18], a query expansion approach based on an external structured knowledge resource, namely Wikipedia, Explicit Semantic Analysis (ESA) and the association rules technique was proposed. The ESA semantic interpretation was used to build the expansion graph. Then we calculated a new semantic relatedness measure that combines the association rules technique, a semantic measure and the expansion graph, avoiding the inclusion of irrelevant terms.
In this paper, another query expansion technique is introduced, using pseudo relevance feedback and association rules to build a Pseudo-Graph Feedback in order to expand queries with semantically related terms selected by the user.
The contributions of this work are organized as follows:
a. A set of documents retrieved in response to the original query is selected and assumed to be relevant, and is used to generate association rules with a technique based on dominance relations [19].
b. The extracted rules make it possible to discover the strength of the correlations between query terms and the candidate ones, and then to construct an oriented and weighted graph called Pseudo-Graph Feedback.
c. To avoid integrating non-similar terms in the expanded queries, the user is invited to select from the built graph the most related terms describing his information need.
The remainder of this paper is organized as follows: the proposed approach is analysed in Section 2, results and discussion are reported in Section 3, and the conclusion is given in the last part.
2. PROPOSED METHOD ANALYSIS
In this section, the proposed approach for query expansion based on pseudo relevance feedback and association rules is described. The approach consists of building, from the documents retrieved in response to a given query, a semantic graph called Pseudo-Graph Feedback, which represents the candidate expansion terms.

Roughly, three main steps are carried out. The system architecture of the query expansion is illustrated in Figure 1. The first step concerns association rules generation, where the vector space model is used to rank text documents according to the given query [20]. Then an association rules algorithm based on the dominance relation, detailed later, is applied. This phase discovers the strength of the correlations between document terms and the original query. The second phase uses the generated association rules as a data source for building a graph called Pseudo-Graph Feedback. In the third step, the best expansion terms are extracted from the generated graph by the user, avoiding the inclusion of inadequate terms.
2.1. Association rules generation
The idea is to use the TF-IDF weighting of the vector space model to find an initial set of the most relevant documents for a given query, and then to assume that the top k ranked documents are relevant without any user interaction. This process, called Pseudo Relevance Feedback, automates the manual part of relevance feedback.
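To make this step concrete, the following minimal Python sketch ranks tokenized documents by the cosine similarity between TF-IDF vectors and keeps the top k as pseudo-relevant. It assumes a simple raw-tf × log(N/df) weighting; it is an illustration only, not the Lucene scoring actually used in the experiments, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]

def tfidf_vectors(docs: List[List[str]]):
    """Weight each term by raw term frequency times log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf

def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def pseudo_relevant_docs(query: List[str], docs: List[List[str]], k: int) -> List[int]:
    """Rank documents by cosine similarity to the query and return the
    indices of the top k, which PRF simply assumes to be relevant."""
    vectors, idf = tfidf_vectors(docs)
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query).items()}
    # Ties are broken by document index so the ranking is deterministic.
    ranked = sorted(range(len(docs)), key=lambda i: (-cosine(q, vectors[i]), i))
    return ranked[:k]

docs = [["water", "pollution", "river"], ["air", "pollution"],
        ["football", "match"], ["water", "supply"]]
print(pseudo_relevant_docs(["water", "pollution"], docs, k=2))  # [0, 1]
```

In the paper's setting, k corresponds to the 20 top-ranked documents used for rule generation.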
The selected documents are used to generate association rules using an algorithm based on dominance relations. This algorithm ranks association rules according to a real value and finds the most relevant rules among very large datasets. It uses a combination of a set of measures rather than a single one [19].
An illustrative example of the association rules algorithm principle is presented in Table 1. Suppose that Measures = {Support, Confidence, Lift, Jaccard, GI}. The rule R1 strictly dominates the second rule R2 because R1(Support) = 240, R1(Confidence) = 0.84, R1(Lift) = 18.35, R1(Jaccard) = 0.73 and R1(GI) = 10.35, which are all (pair by pair) greater than R2(Support) = 70, R2(Confidence) = 0.72, R2(Lift) = 10.24, R2(Jaccard) = 0.56 and R2(GI) = 2.90. Similarly, R2 outperforms R3 on confidence, lift, Jaccard and GI, although R3 has the higher support (84 versus 70).
Figure 1. The proposed query expansion process
Table 1. Association rules examples
R                          Supp   Conf   Lift    Jaccard   GI
R1: colombia → cocaine     240    0.84   18.35   0.73      10.35
R2: acid → rain            70     0.72   10.24   0.56      2.90
R3: aids → virus           84     0.71   1.52    0.55      0.42
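The pairwise comparison used in this example can be sketched as a plain Pareto-dominance test over the five measures, applied to the Table 1 values. This is only the dominance check, not the complete real-valued ranking procedure of [19], and the helper names are hypothetical. On these values the check confirms that R1 dominates both R2 and R3, while R2 and R3 do not dominate each other because R3 has the higher support.

```python
from typing import Dict, List

Rule = Dict[str, float]
MEASURES = ("support", "confidence", "lift", "jaccard", "gi")

# Values copied from Table 1.
RULES: Dict[str, Rule] = {
    "R1": {"support": 240, "confidence": 0.84, "lift": 18.35, "jaccard": 0.73, "gi": 10.35},
    "R2": {"support": 70, "confidence": 0.72, "lift": 10.24, "jaccard": 0.56, "gi": 2.90},
    "R3": {"support": 84, "confidence": 0.71, "lift": 1.52, "jaccard": 0.55, "gi": 0.42},
}

def dominates(a: Rule, b: Rule) -> bool:
    """Pareto dominance: a is at least as good as b on every measure
    and strictly better on at least one."""
    return (all(a[m] >= b[m] for m in MEASURES)
            and any(a[m] > b[m] for m in MEASURES))

def strictly_dominates(a: Rule, b: Rule) -> bool:
    """Strict dominance, as in the R1/R2 example: better on every measure."""
    return all(a[m] > b[m] for m in MEASURES)

def non_dominated(rules: Dict[str, Rule]) -> List[str]:
    """Names of the rules that no other rule dominates."""
    return [name for name, r in rules.items()
            if not any(dominates(o, r) for other, o in rules.items() if other != name)]

print(non_dominated(RULES))  # ['R1']
```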
The association rules generation process for a given query is executed in the following steps:
a. Step 1: Preprocessing is an essential phase in the text mining process. This step transforms the data source contents into a format that can be processed more effectively by the subsequent steps. The documents' contents are tokenized and only text is kept. After that, stop words such as common words and prepositions, as well as illegal characters, are filtered out, and the sentences are identified. Then Porter's algorithm [21] for English text is used to stem inflected or derived words to their root form.
b. Step 2: To construct the transactional dataset, each keyword is considered as an item, and each sentence, within the document in which it occurs, forms a transaction whose elements are its keywords.
c. Step 3: The transactional dataset is imported and the referenced algorithm [19] is applied. It executes the Apriori algorithm [22] to find the frequency of itemsets and generates all association rules. Finally, significant measures are calculated to evaluate and rank the obtained rules.
d. Step 4: Ranking of the non-redundant association rules.
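As a rough illustration of Steps 2 and 3, the sketch below turns sentences into transactions and scores single-term rules {x} → {y} with support, confidence, lift and Jaccard. It is a simplification under stated assumptions: a hypothetical toy stop-word list, no Porter stemming, only rules between term pairs rather than the full Apriori itemset search [22], no GI measure (it is not defined in this section), and no dominance-based ranking. Support is kept as an absolute count, as in Table 1.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stop-word list (the paper's list is not given).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "is", "are", "to", "by"}

def sentence_transactions(documents):
    """One transaction per sentence: the set of its lower-cased,
    non-stop-word tokens. Porter stemming is omitted here."""
    transactions = []
    for doc in documents:
        for sentence in re.split(r"[.!?]+", doc):
            tokens = re.findall(r"[a-z]+", sentence.lower())
            items = frozenset(t for t in tokens if t not in STOP_WORDS)
            if items:
                transactions.append(items)
    return transactions

def pair_rules(transactions, min_support=1, min_confidence=0.1):
    """Single-term rules {x} -> {y} mined from co-occurring term pairs,
    scored by support (absolute count), confidence, lift and Jaccard."""
    n = len(transactions)
    item_count = Counter(i for t in transactions for i in t)
    pair_count = Counter(p for t in transactions for p in combinations(sorted(t), 2))
    rules = []
    for (a, b), support in pair_count.items():
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = support / item_count[x]
            if confidence < min_confidence:
                continue
            rules.append({
                "premise": frozenset([x]),
                "conclusion": frozenset([y]),
                "support": support,
                "confidence": confidence,
                "lift": confidence * n / item_count[y],
                "jaccard": support / (item_count[x] + item_count[y] - support),
            })
    return rules
```

The default thresholds mirror the minimal minSupport and minConfidence values reported in Section 3.1.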
2.2. Building the pseudo-graph feedback
The proposed graph, called Pseudo-Graph Feedback, is based on the rules generated in the first phase. This graph determines the candidate expansion terms and the relations between them and the original query. The aim of the Pseudo-Graph Feedback is to transform the user query into a structured query that can be mapped to known terms. So, in this second phase, the influence among association rules items is considered to find the adequate terms. The algorithm that builds this oriented and weighted graph $G_{pgf} = (V, E, w)$ takes as input a set of rules $R = \{R_1, R_2, R_3, \dots, R_m\}$, which are selected among the rules generated in the first phase. In these rules, terms are correlated. Logically, when a term $t_i$ is related to the initial query, a term $t_k$ which is correlated to $t_i$ in some rules should also be related to the query. Thus, any association rule $R_j$ from $R$ must contain at least one query term, or a term correlated with a query term, and its confidence must be greater than a certain threshold. The set of nodes $V$ in the Pseudo-Graph Feedback is the set of distinct terms $t$ in $R$. Each term represents a graph node, and the relatedness between two terms represents an edge. Two terms $t, t' \in V$ are connected with a directed edge if there is at least one rule $R_j$ from $R$ in which $t$ and $t'$ are located in the premise and the conclusion, respectively. In other words, the set of edges $E$ is formed as:
$$E = \{(t, t') \mid \exists R_j \in R : t \in R_j^{\text{premise}} \wedge t' \in R_j^{\text{conclusion}}\} \qquad (1)$$
The key aspect of the construction of the Pseudo-Graph Feedback is to define the weighting function $w: E \rightarrow [0, 1]$ as the maximum confidence of any association rule $R_j$ from $R$ that contains the two vertices $t$ and $t'$ in the premise and the conclusion, respectively:
$$w(t, t') = \max_{R_j \in R,\ (t, t') \in E} \text{Confidence}\big(R_j(t, t')\big) \qquad (2)$$
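A minimal sketch of equations (1) and (2) follows. It assumes rules are dictionaries with `premise`, `conclusion` and `confidence` keys (a hypothetical format), and it reads "a term correlated with a query term" as a term that shares a retained rule with a query term. The default threshold of 0.7 mirrors the value reported in Section 3.1; the node set $V$ is simply the set of terms appearing on the returned edges.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]

def build_pseudo_graph(rules: Iterable[dict], query_terms: Set[str],
                       min_confidence: float = 0.7) -> Dict[Edge, float]:
    """Return the edge weights w(t, t') of equations (1) and (2)."""
    query = set(query_terms)
    kept = [r for r in rules if r["confidence"] > min_confidence]
    # Assumption: a "correlated term" shares a kept rule with a query term.
    related = set(query)
    for r in kept:
        terms = set(r["premise"]) | set(r["conclusion"])
        if terms & query:
            related |= terms
    weights: Dict[Edge, float] = defaultdict(float)
    for r in kept:
        if not (set(r["premise"]) | set(r["conclusion"])) & related:
            continue  # rule unrelated to the query: contributes no edge
        for t in r["premise"]:
            for t2 in r["conclusion"]:
                # Equation (1) adds the edge; equation (2) keeps the max confidence.
                weights[(t, t2)] = max(weights[(t, t2)], r["confidence"])
    return dict(weights)

rules = [
    {"premise": {"water"}, "conclusion": {"pollution"}, "confidence": 0.9},
    {"premise": {"water"}, "conclusion": {"pollution"}, "confidence": 0.8},
    {"premise": {"pollution"}, "conclusion": {"sewage"}, "confidence": 0.75},
    {"premise": {"sewage"}, "conclusion": {"plant"}, "confidence": 0.72},
    {"premise": {"football"}, "conclusion": {"match"}, "confidence": 0.95},
    {"premise": {"water"}, "conclusion": {"bottle"}, "confidence": 0.4},
]
graph = build_pseudo_graph(rules, {"water", "pollution"})
```

Here the low-confidence rule (bottle) and the unrelated rule (football → match) produce no edges, and the duplicated water → pollution edge keeps the higher confidence, 0.9.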
For example, Figure 2 illustrates a possible Pseudo-Graph Feedback resulting from the association rules generated for the query “Water pollution”.
Figure 2. A portion of the pseudo-graph feedback using the association rules for the query “water pollution”
2.3. From pseudo graph feedback to the expanded query
Once a Pseudo-Graph Feedback is built, the number of candidate terms generated is still too large for expanding the short user query. The user can influence the expanded query by selecting adequate terms and ignoring bad ones. So, to avoid the inclusion of a large number of terms, which can negatively influence the information retrieval performance, the user is asked to provide feedback information by selecting the terms that better satisfy his information need. It is simple for him to determine which of the available terms better describe his interest. The Pseudo-Graph Feedback is labelled by sets of terms extracted from the association rules generated from the documents in the answer set, which allows the system to manage ambiguities. Once the user has selected the terms related to the query from the graph, these terms are added to the original query and the reformulated query is processed.
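The selection step can be sketched as follows: candidate terms directly linked to a query term are ordered by edge weight for display, and at most five user-selected terms (the upper bound reported in Section 3.1) are appended to the original query. Ranking only direct neighbours is an assumption made for this sketch, since the paper presents the graph itself to the user, and both function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def rank_candidates(weights: Dict[Tuple[str, str], float],
                    query_terms: Sequence[str]) -> List[str]:
    """Order non-query terms by their strongest edge to a query term
    (in either direction), best first, for display to the user."""
    query = set(query_terms)
    best: Dict[str, float] = {}
    for (t, t2), w in weights.items():
        for a, b in ((t, t2), (t2, t)):
            if a in query and b not in query:
                best[b] = max(best.get(b, 0.0), w)
    return sorted(best, key=lambda term: (-best[term], term))

def expand_query(query_terms: Sequence[str], selected: Sequence[str],
                 max_terms: int = 5) -> List[str]:
    """Append at most max_terms user-selected terms to the original query."""
    extra = [t for t in selected if t not in query_terms][:max_terms]
    return list(query_terms) + extra

weights = {("water", "pollution"): 0.9, ("pollution", "sewage"): 0.75,
           ("sewage", "plant"): 0.72, ("oil", "water"): 0.8}
print(rank_candidates(weights, ["water", "pollution"]))  # ['oil', 'sewage']
```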
3. RESULTS AND DISCUSSION
In this section, the experimental studies carried out to test the retrieval effectiveness are presented. The dataset on which the runs are conducted and the evaluation metrics used to test the approach are described first, then the obtained results are discussed.
3.1. Test collection and evaluation metrics
The TREC AP8890 collection chosen to apply the proposed approach is a set of English news articles published by the Associated Press (1988-1990). The collection contains 242,918 documents with 150 topics, which represent the queries, and a relevance judgments file made by domain experts. Only the titles of 50 of the TREC topics are used as queries, to simulate search scenarios where users tend to submit short queries. In this work, the document collection is indexed using Lucene [23], an open-source Java full-text search library. The same library is then used to retrieve the top 1,000 documents for each query using the TF-IDF weighting of the vector space model [24].
The following metrics are used to evaluate the information retrieval performance of the proposed approach by comparing the responses of a system to a query with a relevance judgement [25]:
a. Precision: measures the proportion of relevant documents among all documents retrieved by the system.
b. Recall: measures the proportion of relevant documents retrieved among all relevant documents in the database.
c. MAP: Mean Average Precision, which measures the area under the entire recall-precision curve, averaged over all queries.
d. recip_rank: the reciprocal of the rank of the first relevant document.
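The metrics above follow standard definitions; the sketch below computes them for ranked result lists, together with the P@X measure used later in this section. These are illustrative helpers written for this example, not the evaluation tool used by the authors.

```python
from typing import List, Sequence, Set, Tuple

def precision_at(ranked: Sequence[str], relevant: Set[str], x: int) -> float:
    """P@X: fraction of the top X retrieved documents that are relevant."""
    return sum(1 for d in ranked[:x] if d in relevant) / x

def recall(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of all relevant documents that were retrieved."""
    return sum(1 for d in ranked if d in relevant) / len(relevant) if relevant else 0.0

def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Sum of the precision at the rank of each relevant document retrieved,
    divided by the total number of relevant documents."""
    hits, total = 0, 0.0
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            return 1.0 / i
    return 0.0

def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP over (ranked list, relevant set) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

ranked, relevant = ["d3", "d1", "d7", "d2"], {"d1", "d2", "d9"}
```

For this toy query, the relevant documents appear at ranks 2 and 4, so P@2 = 0.5, recall = 2/3, average precision = (1/2 + 2/4) / 3 = 1/3 and the reciprocal rank is 0.5.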
Each query in the TREC collection is expanded with the expansion terms selected by the user from the Pseudo-Graph Feedback. The expanded queries are answered by the information retrieval system based on Lucene [23]. For the baseline method, the original queries are submitted without any expansion.
The following runs are conducted and the generated responses are evaluated:
a. Baseline: the original queries without any expansion.
b. PGF-approach: query expansion based on the Pseudo-Graph Feedback and user interaction for term selection.
c. 0-Filtering: query expansion based on the Pseudo-Graph Feedback without any user interaction.
d. PRF: the classical Pseudo-Relevance Feedback technique implemented using Lucene.
The parameters for the experiments have been set experimentally as follows:
a. The number of text documents used in association rules generation is fixed to the 20 documents retrieved at the top of the results.
b. The values of the measures used in association rules generation are set to minimal values so as not to exclude any important rule: minSupport = 1; minConfidence = 0.1; minLift = 0.1; minJaccard = 0.1; minGI = 0.1.
c. The confidence threshold for selecting terms from the generated association rules when building the Pseudo-Graph Feedback is empirically set to 0.7.
d. The number of expansion terms selected by the user is set to 3 to 5 terms at most, which gives the best results because the queries in this case are short.
The principal aim of this work is to present a simple form of information to the user in order to select the adequate terms for query expansion. For this reason, we proposed the Pseudo-Graph Feedback based on association rules, which describes the vocabulary terms related to a given query and the relations between them.
To evaluate the performance of the proposed approach, it is advisable to compare it with recent query expansion approaches based on association rules. However, even when the same data collection and the same approaches are used, inconsistencies between reported results are observed, which prevent a fair comparison; they stem from the large variety of configuration parameters such as stemming algorithms, stop-word filtering, ranking models, etc. Therefore, to compare the proposed approach, the same search engine, Lucene, is used to implement the method proposed by the authors of [17] and detailed in the introduction, with the same parameter values and the same dataset, TREC AP8890.
3.2. Results and discussion
Table 2 shows the values of the Mean Average Precision (MAP) and the reciprocal rank of the first relevant document (recip_rank) obtained by the system without and with the proposed expansion techniques. For each run, the MAP, the recip_rank and the rate of improvement compared to the baseline (MAP-Gain) are calculated. The results summarized in Table 2 show that the proposed query expansion technique achieves a substantial improvement in terms of MAP and recip_rank compared to the baseline and the other runs (0-Filtering, PRF and AG-approach).
Table 2. Comparison of the runs with respect to the baseline and an existing algorithm
Run                  MAP      recip_rank   MAP-Gain
PGF-approach         0.2004   0.5109       86%
AG-approach [17]     0.184    0.4454       71%
PRF                  0.134    0.4046       25%
0-Filtering          0.1365   0.4071       27%
Baseline             0.1076   0.3685       -
The increase in MAP means that whenever the query contains more relevant expansion terms, the number of relevant documents retrieved increases. This is clear for the PGF-approach, whose rate of improvement over the baseline is +86%, while the AG-approach achieved a 71% improvement. So, using the Pseudo-Graph Feedback with user interaction to filter expansion terms improves the retrieval effectiveness. On the other hand, the MAP of 0-Filtering is better than that of both the baseline and PRF: 0-Filtering and PRF improve on the baseline by approximately 27% and 25%, respectively. This means that the terms composing the Pseudo-Graph Feedback, even without the filtering process, are more relevant for reformulating the queries than the terms retrieved by the classical Pseudo-Relevance Feedback process.
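The MAP-Gain column is the relative improvement of a run's MAP over the baseline MAP. The snippet below recomputes it from the Table 2 values, assuming the published percentages are rounded to the nearest whole percent; the results agree with the reported 86%, 71%, 25% and 27%.

```python
# MAP values transcribed from Table 2.
MAP = {
    "PGF-approach": 0.2004,
    "AG-approach": 0.184,
    "PRF": 0.134,
    "0-Filtering": 0.1365,
    "Baseline": 0.1076,
}

def map_gain(run: str, baseline: str = "Baseline") -> int:
    """Relative MAP improvement over the baseline, in whole percent."""
    return round(100 * (MAP[run] - MAP[baseline]) / MAP[baseline])

for run in ("PGF-approach", "AG-approach", "PRF", "0-Filtering"):
    print(f"{run}: +{map_gain(run)}%")
```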
Figure 3 presents the precision when X documents are retrieved (P@X). P@X denotes the proportion of relevant documents in the top X documents of the list returned for a given query. X is set to 5, 10, 15, 20 and 30.
It is observed that using the Pseudo-Graph Feedback and user interaction to select the expansion terms improves retrieval effectiveness compared to the baseline and the other approaches. User interaction in this approach ensures that the expanded queries contain adequate terms.
Figure 3. Improvement percentage in P@X
For example, the PGF-approach precisions are 0.342 and 0.293 for the top five and top ten retrieved documents, respectively, while the baseline reaches only 0.192 and 0.172, i.e. improvements of +78% and +70%. The AG-approach obtains 0.267 (+39%) and 0.2265 (+32%), and the 0-Filtering approach 0.225 (+17%) and 0.197 (+15%).
These experiments show that the approach ranks the documents relevant to the queries at the top of the results. This performance could be explained by the use of the multi-criteria association rules technique to build the Pseudo-Graph Feedback. This algorithm is efficient at ranking and keeping only important rules by considering multiple measures and dominance relations. The valued graph is a simple and structured form of information representing the correlations between query terms and the candidate ones, and its edges represent the semantic relations between them.
The number of expansion terms can be too large for short queries, which lowers performance. To ensure that the expanded queries contain adequate terms, the generated graph is presented to the user, who selects the best expansion terms. This phase has a positive impact on expanding the queries with adequate terms and on improving the retrieval effectiveness.
The AG-approach, presented by Bouziri et al. [17], adds terms extracted from association rules to the original queries. The rules generated by the Charm algorithm from the whole document collection are modelled as a classification problem and solved by a decision tree algorithm to detect the best terms for query expansion. Despite the precision of the term selection process, irrelevant terms can still be added to the original query.
4. CONCLUSION
The proposed query expansion approach expands queries with terms selected by the user from the Pseudo-Graph Feedback. This graph is built using the association rules generated by a technique that relies on multiple criteria and dominance relations. The experimental study was conducted on the TREC AP8890 collection. Our method substantially improves retrieval performance over the baseline: in terms of Mean Average Precision (MAP), the proposed approach achieves approximately 86% improvement over the baseline, while the improvement attained by the comparison method does not exceed 71%. This confirms that the association rules technique is an effective way to present information to the user in a simple and structured form, a graph, for selecting adequate expansion terms. As future work, other data sources and text mining algorithms will be used for selecting and ranking query expansion terms.
REFERENCES
[1] D. Pal, et al., “Exploring query categorisation for query expansion: A study,” arXiv preprint arXiv:1509.05567, 2015.
[2] C. Buckley, et al., “The effect of adding relevance information in a relevance feedback environment,” SIGIR’94, Springer, London, pp. 292-300, 1994.
[3] J. Rocchio and G. Salton, “The SMART retrieval system,” Relevance feedback in information retrieval, pp. 313-323, 1971.
[4] A. Kilgarriff, et al., “Itri-04-08 the sketch engine,” Information Technology, vol. 105, pp. 116, 2004.
[5] Y. Matsuo and M. Ishizuka, “Keyword extraction from a single document using word co-occurrence statistical information,” International Journal on Artificial Intelligence Tools, vol. 13, pp. 157-169, 2004.
[6] E. Terra and C. L. Clarke, “Frequency estimates for statistical word similarity measures,” Proc. The 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, Association for Computational Linguistics, vol. 1, pp. 165-172, 2003.
[7] G. Cao, et al., “Selecting good expansion terms for pseudo-relevance feedback,” Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pp. 243-250, 2008.
[8] S. E. Robertson and K. S. Jones, “Relevance weighting of search terms,” Journal of the American Society for Information Science, vol. 27, pp. 129-146, 1976.
[9] Y. Lv and C. Zhai, “Positional relevance model for pseudo-relevance feedback,” Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, pp. 579-586, 2010.
[10] J. Allan, “Relevance feedback with too much data,” SIGIR, vol. 95, pp. 337-343, 1995.
[11] S. Yu, et al., “Improving pseudo-relevance feedback in web information retrieval using web page segmentation,” Proceedings of the 12th international conference on World Wide Web, pp. 11-18, 2003.
[12] S. Jabri, et al., “Ranking of text documents using TF-IDF weighting and association rules mining,” 2018 4th International Conference on Optimization and Applications (ICOA), pp. 1-6, 2018.
[13] C. Macdonald and I. Ounis, “Expertise drift and query expansion in expert search,” The sixteenth ACM conference on Conference on information and knowledge management, ACM, pp. 341-350, 2007.
[14] M. Ariannezhad, et al., “Iterative Estimation of Document Relevance Score for Pseudo-Relevance Feedback,” European Conference on Information Retrieval, Springer, Cham, pp. 676-683, 2017.
[15] J. Singh, et al., “Fuzzy logic hybrid model with semantic filtering approach for pseudo relevance feedback-based query expansion,” Computational Intelligence (SSCI), 2017 IEEE Symposium Series on, IEEE, pp. 1-7, 2017.
[16] F. Colace, et al., “Improving relevance feedback‐based query expansion by the use of a weighted word pairs approach,” Journal of the Association for Information Science and Technology, vol. 66, pp. 2223-2234, 2015.
[17] A. Bouziri, et al., “Learning query expansion from association rules between terms,” Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K), 2015 7th International Joint Conference on, IEEE, pp. 525-530, 2015.
[18] S. Jabri, et al., “Improving Retrieval Performance Based on Query Expansion with Wikipedia and Text Mining Technique,” Int. J. Intell. Eng. Syst, vol. 11, pp. 283-292, 2018.
[19] A. Dahbi, et al., “A new method for ranking association rules with multiple criteria based on dominance relation,” Computer Systems and Applications (AICCSA), 2016 IEEE/ACS 13th International Conference of, IEEE, pp. 1-7, 2016.
[20] G. Salton, et al., “A vector space model for automatic indexing,” Communications of the ACM, vol. 18, pp. 613-620, 1975.
[21] M. Porter, “PorterStemmer (java version) [Software],” 1980. Available: https://tartarus.org/martin/PorterStemmer/index-old.html.
[22] R. Agrawal, “Fast algorithms for mining association rules,” 20th int. conf. very large databases, VLDB, pp. 487-499, 1994.
[23] Lucene. Available: http://lucene.apache.org/core.
[24] G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information processing & management, vol. 24, pp. 513-523, 1988.
[25] A. Baccini, et al., “Analyse des critères d'évaluation des systèmes de recherche d'information [Analysis of evaluation criteria for information retrieval systems],” Technique et Science Informatiques, vol. 29, pp. 289-308, 2010.
BIOGRAPHIES OF AUTHORS
Siham Jabri is a Business Intelligence Engineer who graduated from the Faculty of Science and Technologies (Hassan First University of Settat, Morocco) in 2014. Since 2015, she has been preparing her Ph.D. in the Laboratory of Informatics, Imaging and Modeling of Complex Systems (LIIMCS). She is working on Natural Language Processing and Data Mining.
Azzeddine Dahbi received his Bachelor degree in computer science in 2010 from the Faculty of Science and Techniques, University Hassan 1st Settat, Morocco, followed by a Master degree in mathematics and applications from the same faculty. He is now preparing his Ph.D. degree in the Laboratory of Informatics, Imaging and Modeling of Complex Systems (LIIMCS). His research interests include knowledge discovery from databases.
Taoufiq Gadi is a Professor of computer science at the Faculty of Science and Technologies (Hassan First University of Settat, Morocco). Since 2014, he has been the Director of the Informatics, Imaging and Modeling of Complex Systems Laboratory. He has supervised more than ten PhD theses and written about fifty scientific papers in the domains of 3D model analysis, Model Driven Architecture, Data Mining and Database Analysis, and Modeling of Complex Systems.