TELKOMNIKA Indonesian Journal of Electrical Engineering
Vol. 12, No. 8, August 2014, pp. 6380 ~ 6385
ISSN: 2302-4046
DOI: 10.11591/telkomnika.v12i8.5143

Received November 14, 2013; Revised April 3, 2014; Accepted April 17, 2014

Conceptual Search Based on Semantic Relatedness

Abdoulahi Boubacar¹, Zhendong Niu²
Beijing Institute of Technology, School of Computer Science
*Corresponding author, e-mail: abd_boubacar@yahoo.com¹, zniu@bit.edu.cn²

Abstract
Traditional search engines based on syntactic search are unable to solve key issues like synonymy and polysemy. Solving these issues led to the invention of the semantic web, and semantic search engines do overcome them. Nowadays, however, most data remains in unstructured documents, and annotating such large volumes of data is very time consuming. Concept based retrieval systems are intended to manage unstructured documents directly. Semantic relationships are their main feature for extending syntactic search. In most of the methods implemented so far, concepts are used for both indexing and searching, while words remain the smallest unit for processing semantic relatedness. The methods differ in the way concepts are represented, mapped to each other, and managed for the sake of indexing and/or searching. Our approach is based on Wikipedia concepts. Concepts are represented as an undirected graph. Their semantic relatedness is computed with a distance derived from a semantic similarity measure. The same distance is used to calculate both semantic relatedness and query matching.

Keywords: concept analysis, information retrieval, semantic relatedness

Copyright © 2014 Institute of Advanced Engineering and Science. All rights reserved.

1. Introduction
To implement a concept based retrieval system, the first question is always "what is a concept". There are many answers to this question. A concept may be any idea or thing that has a meaning by itself. Some concepts are single-word while others are multi-word. A concept can be represented by a word, a sentence fragment, a whole sentence or an entire document. Concepts have been defined as WordNet entries [1, 2]. The WordNet approach has certainly solved the synonymy problem, since a query can be expanded using synonyms. To solve the polysemy problem, semantic web search engines use ontologies; this method is highly effective in terms of precision [3-5]. Another approach is based on word frequencies in a given corpus. Latent Semantic Analysis (LSA) [6, 7] is a reduction method that optimizes concept extraction for large corpora. LSA uses matrix factorization instead of human-comprehensible knowledge. Our approach is based on Wikipedia articles. Each selected article represents one concept. Incomplete articles are not selected.

The second issue to deal with is the choice of the tool. Tools can be statistical, probabilistic, etc. We have chosen to use only one tool: the semantic distance between three different kinds of entities, namely queries, concepts and documents. The semantic distance is used to build an undirected graph of concepts. We consider that each concept may have a link to other concepts. We did not group the concepts into partitions. For this reason the graph representation seems to be the most adequate. In contrast to the methods based on Formal Concept Analysis [8, 9], we did not establish a hierarchy between concepts.

2. Related Work
The best choice of indexing unit is still unclear in information retrieval: words or concepts, which is better? Yiming Yang [10] and Hersh et al. [11] have investigated the best way to represent a document. For the sake of performance, indexing with words as lexical units is better than indexing with concepts. For the sake of relevance, indexing with concepts as semantic units is better than indexing with words.

In a concept based retrieval system any idea, person, thing, etc. can be a concept [12]. In such a system users do not need to find a magic word that connects them to the information they seek. William A. Woods [13] is one of the researchers who developed very early (1997) a conceptual indexing method based on a taxonomy, where concepts are presented at sentence level. His method does not use a hierarchy of concepts, in contrast with Wright et al. [14] and Chen et al. [15]. Hierarchical relationships have been used by Hersh et al. to implement SAPHIRE. SAPHIRE [16] combined both semantic and probabilistic methods to develop a heuristic retrieval environment. Concept based systems have been developed as an alternative to syntactic search [17], placing words into a context [18]. Most of the models developed to overcome issues related to syntactic search are not language dependent [19]. Concepts can be extracted from queries [20] or from documents [21]. A comparison has been made by Dobša and Bašić [22] between Latent Semantic Indexing and concept based indexing in information retrieval. Their results have shown that concept indexing is computationally more efficient than Latent Semantic Indexing. Different concept based web applications have been built using concept recognition [23, 24] for query answering. A survey conducted by Haav and Lubi [25] of thirty-six concept based information retrieval tools on the web has shown the need for improvement in different directions.

Our approach is based on semantic relatedness. The question we intend to solve is how to use the semantic relatedness of concepts efficiently in order to improve on the state-of-the-art methods. For that reason we need an appropriate semantic distance and a pertinent concept representation.

3. Semantic Distance
In a previous work [26], not yet published, we presented two semantic similarity measures, the second of which is denoted ∆, and showed that they are accurate for establishing semantic relatedness and query relevance. Both measures are defined from the following quantities [formulas not recoverable]: |A ∩ B| denotes the sum of the numbers of occurrences of all the words common to two given texts A and B; |A ∪ B| denotes the sum of the number of words in A and the number of words in B, repeated occurrences included; and J(A, B) denotes the Jaccard similarity measure for two documents A and B. We are now interested in a distance function that can measure both relevance and relatedness. The choice of the distance is dictated by the graph representation of the concepts. Let us consider the data represented by Table 1.

Table 1. The Semantic Relatedness between the Documents [document symbols and table body not recoverable]

We define a distance based on ∆ for every pair of documents such that:

[equation not recoverable] (7)

Table 2. The Semantic Distances between the Documents [document symbols and table body not recoverable]

We can calculate the distance for every pair of documents whose ∆ similarity is not zero, as represented by Table 2. The ∆ distance is never negative. When two documents are the same, the distance is zero. When two documents have no similarity, the distance is not defined. ∆ is a distance but it is not a metric, because the triangle inequality does not hold. If the triangle inequality held, it would be very useful when calculating a path. From now on we only use the ∆ distance when computing either query to concept relevance or concept to document relatedness.

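The counting quantities described in this section can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, "occurrences of the common words" is read as occurrences counted in both texts, and `reciprocal_distance` is a stand-in that takes the distance to be the total word count divided by the occurrence overlap. The stand-in reproduces the property that the distance is undefined when nothing is shared, but it is not the ∆ distance of Equation (7); in particular it gives 1, not 0, for identical documents.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification: no stemming)."""
    return text.lower().split()


def occurrence_overlap(a_tokens, b_tokens):
    """Sum of the occurrences, in both texts, of every word the texts share."""
    ca, cb = Counter(a_tokens), Counter(b_tokens)
    return sum(ca[w] + cb[w] for w in ca.keys() & cb.keys())


def total_words(a_tokens, b_tokens):
    """Number of words in A plus number of words in B, repeats included."""
    return len(a_tokens) + len(b_tokens)


def jaccard(a_tokens, b_tokens):
    """Classic Jaccard similarity on the sets of distinct words."""
    sa, sb = set(a_tokens), set(b_tokens)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def reciprocal_distance(a_tokens, b_tokens):
    """ASSUMED stand-in distance: total_words / occurrence_overlap.

    Returns None when the texts share no word (distance undefined).
    """
    overlap = occurrence_overlap(a_tokens, b_tokens)
    if overlap == 0:
        return None
    return total_words(a_tokens, b_tokens) / overlap


a = tokenize("semantic search extends syntactic search")
b = tokenize("concept search uses semantic relatedness")
print(occurrence_overlap(a, b), total_words(a, b), jaccard(a, b))
print(reciprocal_distance(a, b))
```

Here the shared words are "semantic" and "search", so the overlap counts every occurrence of these two words in both texts, while the Jaccard measure only looks at distinct words.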
4. Concept Representation
To represent the concepts we retrieved Wikipedia articles and selected those that are complete and well written. The selection is certainly subjective, but the selected articles (almost 2.5 million articles) cover a large range of knowledge if we keep in mind that the current number of English words is represented by 616,500 entries according to the Oxford English Dictionary, 2nd edition. From each selected article we remove the stop words, apply stemming and store the remainder in a repository. Each selected article yields exactly one concept. Concepts are only stored, not indexed. We then calculate their semantic relatedness with the ∆ distance and represent them as an undirected graph. The edges are weighted by the semantic distances between the articles. If we consider the documents of Section 3 as concepts, we can represent them by an undirected graph as illustrated by Figure 1.

When two concepts have no semantic similarity, there is no direct link from one to the other. By that method we have built an undirected graph of concepts from the selected articles. We observed that whenever two articles are compared, the distance lies between zero and 500 as long as the stop words are not removed. For this reason we have to remove the stop words and take 500 as the limit for establishing semantic relatedness. The number 500 corresponds to one occurrence of exactly one common word for two documents whose lengths sum to 1000 words. Indeed, if two entities have a total of one thousand words but at most one word occurs once in both, we can conclude that they are not semantically related. Consequently, from now on, each time the distance is not less than 500 we conclude that there is neither relatedness nor relevance. We do not need to calculate the relatedness beyond this limit. We thus gain performance because the computation cost decreases.

Figure 1. An Undirected Graph of Concepts Constructed from the Documents [document symbols not recoverable]

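The role of the 500 limit in building the graph of concepts can be sketched as follows. The sketch is hedged in two ways: the names `distance` and `build_concept_graph` are hypothetical, and `distance` is an assumed stand-in (total word count divided by the number of occurrences of shared words in both texts) chosen because it reproduces the 500 figure quoted above; it is not the ∆ distance itself.

```python
from collections import Counter
from itertools import combinations

LIMIT = 500  # relatedness limit stated in the text


def distance(a_tokens, b_tokens):
    """ASSUMED stand-in for the semantic distance (not the paper's formula).

    Returns None when the two token lists share no word.
    """
    ca, cb = Counter(a_tokens), Counter(b_tokens)
    overlap = sum(ca[w] + cb[w] for w in ca.keys() & cb.keys())
    if overlap == 0:
        return None
    return (len(a_tokens) + len(b_tokens)) / overlap


def build_concept_graph(concepts):
    """Build an undirected weighted graph as nested dicts.

    concepts: dict mapping a concept name to its list of tokens.
    An edge is created only when the distance is defined and below LIMIT.
    """
    graph = {name: {} for name in concepts}
    for u, v in combinations(concepts, 2):
        d = distance(concepts[u], concepts[v])
        if d is not None and d < LIMIT:
            graph[u][v] = d
            graph[v][u] = d  # undirected: store the edge in both directions
    return graph


# Two texts totalling 1000 words that share exactly one word, once each.
a = ["shared"] + [f"a{i}" for i in range(499)]
b = ["shared"] + [f"b{i}" for i in range(499)]
boundary = distance(a, b)

concepts = {
    "c1": ["graph", "node", "edge"],
    "c2": ["graph", "path", "edge"],
    "c3": ["poem", "verse"],
}
graph = build_concept_graph(concepts)
print(boundary, graph)
```

Under this assumption the boundary pair sits exactly at the limit and is therefore not linked, which matches the rule that a distance not less than 500 means no relatedness.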
5. Indexing Documents
Figure 2. Indexing Representation

The indexing uses Apache Lucene. Thirty-three stop words are removed from each document and stemming is applied using the Porter algorithm. In addition, we have replaced the tf-idf similarity measure that Lucene uses: the similarity measure used to index the documents is ∆. Lucene supports multiple indexes and can easily create and manage them (Figure 2).

The first way is to index the documents directly using the same ∆ measure. This index works exactly like syntactic search. Consequently, even a document that is not related to any concept can still be retrieved. Our approach extends syntactic search. We thus have two indexes to consider.

The second way to index a document is to measure its relatedness to each of the concepts. Once we have established the semantic relatedness for all the concepts and built the undirected graph of concepts, we can index any document to be retrieved. If the semantic distance from a document to a concept is less than 500, we add the document to the concept as a related document with the corresponding distance. The document is consequently added to the graph of concepts. If we have, for example, three documents and the previous concepts (Section 4), as represented by Table 3, we can index the concepts and add the documents to the graph as represented in Figure 3.

Table 3. Indexing Documents [table body not recoverable]

Figure 3. Conceptual Indexing Representation

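The second indexing step can be illustrated with a short Python sketch. It assumes that the document-to-concept distances have already been computed with the semantic distance; the function name `attach_documents`, the identifiers and the numeric values are hypothetical and serve only to show the rule "attach the document when the distance is below 500".

```python
LIMIT = 500  # relatedness limit stated in the text


def attach_documents(doc_to_concept_distance, limit=LIMIT):
    """Attach each document to every concept it is related to.

    doc_to_concept_distance: dict mapping (doc_id, concept_id) to a distance,
    or to None when the distance is undefined (nothing in common).
    Returns a dict concept_id -> {doc_id: distance} with links below limit.
    """
    index = {}
    for (doc, concept), d in doc_to_concept_distance.items():
        if d is not None and d < limit:
            index.setdefault(concept, {})[doc] = d
    return index


# Hypothetical distances, for illustration only.
distances = {
    ("doc1", "c1"): 120.0,
    ("doc1", "c2"): 640.0,  # beyond the limit: not attached
    ("doc2", "c2"): 75.5,
    ("doc3", "c1"): None,   # undefined distance: not attached
    ("doc3", "c3"): 499.9,
}
concept_index = attach_documents(distances)
print(concept_index)
```

Storing the distance with each link keeps the information needed later to compute paths from a query to the documents.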
6. Query Matching
Each query is processed in two different ways according to the index we consider. To search the index created directly, we process the query as in syntactic search. There is nothing to change, and we only call Lucene's IndexSearcher to process the query. To search the index that has been built with the concepts, we have to consider the query as a document and measure its relatedness to the concepts. When we know the relatedness of the query to the concepts, we can calculate the distance from the query to the documents via the matched concepts. We thus consider the paths from the query to the documents. If the path to a document is less than 500, the document is returned with the corresponding distance. Otherwise null is returned.

Index1 is processed first, and the returned documents are collected and sent to a renderer. Index2 is processed second, and each retrieved document is checked against the list of documents retrieved from index1. When a document that has already been returned from index1 with a given distance is returned again from index2 with another distance, we compare the two distances and return the document with the minimum of the two, to avoid any risk of duplication. If a document has not yet been returned from index1, we return the document with its corresponding distance.

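The merging rule for the two result lists can be sketched in Python. The function name `merge_results` and the sample values are hypothetical, and sorting the merged list by increasing distance is an assumption added for readability; the text only specifies that a duplicated document keeps its minimum distance.

```python
def merge_results(index1_hits, index2_hits):
    """Merge the hits of index1 and index2 without duplicates.

    Each argument maps a doc_id to its distance from the query.
    A document found in both indexes keeps the smaller distance.
    Returns a list of (doc_id, distance) sorted by increasing distance
    (the ordering is an assumption of this sketch).
    """
    merged = dict(index1_hits)  # index1 is processed first
    for doc, d in index2_hits.items():
        if doc not in merged or d < merged[doc]:
            merged[doc] = d
    return sorted(merged.items(), key=lambda item: item[1])


# Hypothetical hits, for illustration only.
index1 = {"d1": 40.0, "d2": 310.0}
index2 = {"d2": 95.0, "d3": 220.0}
results = merge_results(index1, index2)
print(results)
```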
If we consider the graph in Figure 4, which contains documents, concepts and a query, we can calculate the paths from the query to each of the documents. We can thus retrieve the related documents from index2. Related documents are those within a distance of less than 500 from the query via their related concepts. To retrieve each relevant document we have to sum the distance from the query to its related concept and the distance from that document to the concept, as indicated by the arrows in Figure 4.

Figure 4. Index2 Query Processing

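The path computation for index2 can be sketched as follows, assuming the query-to-concept and concept-to-document distances are already known. The function name `search_index2` and the sample values are hypothetical. Keeping the shortest path when a document can be reached through several matched concepts is an assumption of this sketch, not something the text specifies.

```python
LIMIT = 500  # relatedness limit stated in the text


def search_index2(query_to_concept, concept_to_docs, limit=LIMIT):
    """Return the documents reachable from the query through one concept.

    query_to_concept: dict concept_id -> distance from the query.
    concept_to_docs: dict concept_id -> {doc_id: distance to the concept}.
    The path length is the sum of the two distances; only paths shorter
    than limit are kept, and the shortest path per document is retained
    (assumption of this sketch).
    """
    results = {}
    for concept, dq in query_to_concept.items():
        if dq >= limit:
            continue  # the concept itself is not matched by the query
        for doc, dd in concept_to_docs.get(concept, {}).items():
            path = dq + dd
            if path < limit and path < results.get(doc, float("inf")):
                results[doc] = path
    return results


# Hypothetical distances, for illustration only.
query_to_concept = {"c1": 100.0, "c2": 30.0}
concept_to_docs = {
    "c1": {"doc1": 150.0, "doc2": 450.0},
    "c2": {"doc2": 200.0, "doc3": 480.0},
}
hits = search_index2(query_to_concept, concept_to_docs)
print(hits)
```

In this example doc2 is too far from the query through c1 but is retrieved through c2, and doc3 is dropped because its only path reaches the limit.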
7. Discussion
The aim of this study, at this stage, is to show that one can retrieve documents related to a given query without knowing the magic word that links the user to the information needed. The approach extends syntactic search. The first contribution of this method is to use the same measure to compute both query to concept and concept to document relevance. This uniformity allows us to express the path and retrieve the relevant documents. The second contribution is that the results are absolute, not corpus dependent, unlike the works mentioned earlier. The last contribution is to consider the concepts as they are: semantically dependent.

The question we expected to answer is how to score the improvement by providing rates for both recall and precision. The limitation is that at this stage we have not been able to use the relatedness between concepts. For example, a document in Figure 4 may be relatively close to the query, but at this stage of the implementation we are unable to retrieve documents that are not directly linked to the concepts matched by the query. Because of this limitation we did not measure the accuracy of this method compared to syntactic search. It seems more important to us to develop a method that can retrieve all the relevant documents. In addition, one may ask why the concepts are represented as a graph if we do not use that information. At this stage the semantic relatedness of concepts has not been used. These issues lead us to investigate query expansion. Query expansion is one of the solutions to the questions raised at this stage of the implementation.

8. Conclusion
We have presented a concept based approach for information retrieval. Our approach is based on Wikipedia articles. It extends syntactic search using semantic relatedness and presents another way to improve syntactic search. All the presented concepts are different, and each one is related to only one subject; therefore our method overcomes both the polysemy and synonymy problems. The semantic measure applied to the graph structure presents an opportunity to better optimize semantic relevance. Nevertheless, the semantic relationships between concepts have not yet been put to use. Our future work is to increase performance by exploiting the concept-to-concept interactions.

References
[1] Julio Gonzalo, Felisa Verdejo, Irina Chugur, Juan Cigarran. Indexing with WordNet synsets can improve text retrieval. Proceedings of the COLING/ACL'98 Workshop on Usage of WordNet for NLP, Montreal. 1998.
[2] Rada Mihalcea, Dan Moldovan. Semantic Indexing using WordNet Senses. Proceedings of the ACL 2000 Workshop on Recent Advances in Natural Language Processing and Information Retrieval. 2000.
[3] Anjo Anjewierden, Suzanne Kabel. Automatic indexing of documents with ontologies. 13th Belgian/Dutch Conference on Artificial Intelligence. 2001.
[4] Jacob Kohler, Stephan Philippi, Michael Specht, Alexander Rueegg. Ontology based text indexing and querying for the semantic web. Knowledge-Based Systems. 2006; 19(8): 744-754.
[5] Rifat Ozcan, Y Alp Aslandogan. Concept Based Information Access Using Ontologies and Latent Semantic Analysis. Information Technology: Coding and Computing (ITCC). 2005; 1: 794-799.
[6] Scott Deerwester, Susan T Dumais, George W Furnas, Thomas K Landauer, Richard Harshman. Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science. 1990; 41(6): 391-407.
[7] Susan T Dumais. Latent Semantic Analysis. Annual Review of Information Science and Technology. 2005; 38: 188
[8] Lawrence W Wright, Holly K Grossetta Nardini, Alan R Aronson, Thomas C Rindflesch. Hierarchical concept indexing of full-text documents in the Unified Medical Language System information sources map. Journal of the American Society for Information Science. 1999; 50(6): 514-523.
[9] Poshyvanyk D, Marcus A. Combining Formal Concept Analysis with Information Retrieval for Concept Location in Source Code. Program Comprehension. ICPC '07. 2007: 37-4
[10] Yang Y, Chute CG. Words or concepts: the features of indexing units and their optimal use in information retrieval. Proc Annu Symp Comput Appl Med Care. 1993; 685-9.
[11] Hersh WR, Hickam DH, Leone TJ. Words, concepts or both: optimal indexing units for automated information retrieval. Annual proceeding of computer applied medical care. 1992; 644-648.
[12] Angi Voss, Keiichi Nakata, Marcus Juhnke. Concept indexing. Proceedings of the international ACM SIGGROUP conference on Supporting group work. 1999; 1-10.
[13] William A Woods. Conceptual Indexing: A Better Way to Organize Knowledge. Technical Report. Sun Microsystems, Inc. Mountain View, CA, USA. 1997.
[14] Lawrence W Wright, Holly K Grossetta Nardini, Alan R Aronson, Thomas C Rindflesch. Hierarchical Concept Indexing of Full-Text Documents in the Unified Medical Language System Information Sources Map. Journal of the American Society for Information Science (JASIS). 1999; 50(6).
[15] Yifan Chen, Gui-Rong Xue, Yong Yu. Advertising keyword suggestion based on concept hierarchy. Proceedings of the International Conference on Web Search and Data Mining. 2008; 251-260.
[16] Hersh WR, Greenes RA. SAPHIRE: an information retrieval system featuring concept matching, automatic indexing, probabilistic retrieval, and hierarchical relationships. Computers and Biomedical Research. Academic Press Professional, Inc. San Diego, CA, USA. 1990; 23(5): 410-425.
[17] F Giunchiglia, U Kharkevich, I Zaihrayeu. Concept search: Semantics enabled syntactic search. Proceeding of the 6th European Semantic Web Conf. ESWC. 2009: 429-444.
[18] Placing search in context: the concept revisited. Proceedings of the 10th international conference on World Wide Web. ACM New York, NY, USA. 2001; 406-414.
[19] Philipp Cimiano, Antje Schulz, Sergej Sizov, Philipp Sorg, Steffen Staab. Explicit vs. Latent Concept Models for Cross-Language Information Retrieval. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Pasadena, CA. 2009: 1513-1518.
[20] Michael Bendersky, W Bruce Croft. Discovering key concepts in verbose queries. Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval. ACM New York, NY, USA. 2008: 491-498
[21] George Karypis, Eui-Hong (Sam) Han. Fast supervised dimensionality reduction algorithm with applications to document categorization & retrieval. Proceedings of the ninth international conference on Information and knowledge management. ACM New York, NY, USA. 2000: 12-19
[22] Jasminka Dobša, Bojana Dalbelo Bašić. Comparison of information retrieval techniques: latent semantic indexing and concept indexing. Journal of Information and Organizational Sciences. 2004; 28.
[23] Damian Borth, Adrian Ulges, Thomas Michael Breuel. Automatic concept to query mapping for web based concept detector training. Proceedings of the 19th ACM international conference on Multimedia. ACM New York, NY, USA. 2011: 1453-1456.
[24] Caporaso J Gregory, William A Baumgartner Jr, Hyun-min Kim, Zhiyong Lu, Helen L Johnson, Olga Medvedeva, Anna Lindemann, Lynne M Fox, Elizabeth K White, K Bretonnel Cohen, Lawrence Hunter. Concept Recognition, Information Retrieval and Machine Learning in Genomics Question-Answering. TREC 2006 Proceedings. 2006.
[25] Hele-Mai Haav, Tanel-Lauri Lubi. A survey of concept-based information retrieval tools on the web. 5th East-European Conference, ADBIS 2001, Vilnius, Lithuania. 2001.
[26] Abdoulahi Boubacar, Niu Zhendong. Valuing Semantic Similarity.