TELKOMNIKA, Vol. 13, No. 4, December 2015, pp. 1495~1504
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v13i4.2736
Received August 13, 2014; Revised November 20, 2015; Accepted November 30, 2015
The Analysis of Rank Fusion Techniques to Improve Query Relevance
Diyah Puspitaningrum*1, Jeri Apriansyah Pagua2, Aan Erlansari3, Fauzi4, Rusdi Efendi5, Desi Andreswari6, I.S.W.B. Prasetya7
1,2,3,4,5,6 Department of Computer Science, Faculty of Engineering, University of Bengkulu, W.R. Supratman Street, Kandang Limun, Bengkulu 38371, Sumatera, Indonesia
7 Department of Information and Computing Sciences, Utrecht University, PO Box 80.089, 3508 TB Utrecht, The Netherlands
*Corresponding author, e-mails: diyahpuspitaningrum@gmail.com1, jeri.apriansyahpagua@unib.ac.id2, aan.erlansari@unib.ac.id3, fauzi.faisal@unib.ac.id4, rusdi.efendi@unib.ac.id5, desi.andreswari@unib.ac.id6, S.W.B.Prasetya@uu.nl7
Abstract
Rank fusion meta-search engine algorithms can be used to merge the web search results of multiple search engines. In this paper we introduce two variants of the Weighted Borda-Fuse algorithm. The first variant retrieves documents based on the popularities of the component engines. The second one is based on k user-defined toplists of the component engines. In this research, experiments were performed on k = {50, 100, 200} toplists with AND/OR combinations, implemented on the ‘UNIB Meta Fusion’ meta-search engine prototype, which employed 3 out of 5 popular search engines. Both of our algorithms outperformed other rank fusion algorithms (relevance score of up to 0.76, compared to 0.27 for Google, at P@10). The pseudo-relevance automatic judgement techniques involved are Reciprocal Rank, Borda Count, and Condorcet. The optimal setting was reached for queries with operator "AND" (degree 1) or "AND ... AND" (degree 2) with k = 200. The ‘UNIB Meta Fusion’ meta-search engine system was built correctly.

Keywords: Weighted Borda-Fuse, rank fusion, meta-search engine, pseudo-relevance automatic judgement, query relevance

Copyright © 2015 Universitas Ahmad Dahlan. All rights reserved.
1. Introduction
There are many proposals for a meta-search engine (MSE). Given a query (a set of keywords), an MSE system typically retrieves web pages that are relevant to the query by exploiting all of its underlying search engines. It sends the query to these engines; the results obtained are then merged and ranked. It returns the final web documents ranked by relevance. In the Helios architecture [1] the MSE system uses standard merger and ranker modules. To achieve high performance it utilizes asynchronous I/O and parallel TCP connections with the remote search engines. In the Tadpole architecture [2, 3], the rank fusion algorithms are based on a variety of parameters, such as the rank order and the number of times a URL appears in the results of each of its component search engines, to compute a weight for each collected result [2-4]. There is also the concern of user-specific needs. For example, an MSE should ideally let the user choose his favourite search engines from an available list, do query modifications, and explore the available rank fusion techniques [2].
In general, rank fusion algorithms offer an improvement of the relevance scores of the documents returned by multiple search engines. Dwork, Kumar, Naor, and Sivakumar propose the use of rank aggregation methods for MSEs, viz. Borda's method, Footrule and Scaled Footrule, and Markov Chain methods [5]. Lam and Leung propose a complete directed graph, viz. the MST algorithm [6]. Supervised rank aggregation methods such as Borda-Fuse and supervised Markov Chain based methods are investigated in [7]. The KE algorithm [8] and its variants [4] exploit the ranking of the results that an MSE receives from its component engines, by considering the number of document appearances in the component engines' lists, under the assumption that those engines are equally reliable.
Another rank fusion MSE algorithm, named Count Function [9], defines the web document ranking as the sum of the ranks at the positions where a URL appears, divided by the count of URL documents. Aslam and Montague introduce a voting fusion method named Borda-Fuse, which is an adaptation of the Borda Count election process [10].
Borda-Fuse, tested in two of the five tests using TREC test data, performed better than the best component IR system in the election results [10, 11]. Borda-Fuse extended to a weighted variant is called the Weighted Borda-Fuse algorithm; it multiplies the points that a retrieval system S_i assigns to a candidate URL by a system weight W_i. By using improved performance weights, Weighted Borda-Fuse has the potential of outperforming CombMNZ [12].
The coverage of each search engine is limited: only about 1% of billions of pages are in the surface web, while the rest are in the deep web. Therefore it is interesting to know how to merge different search engines and how deep we should crawl the web to still retrieve relevant documents. When fewer search engines report ranking scores, we can convert the local ranks into local ranking scores.
The KE algorithm [13] is a score-based method that exploits the rankings of the search results of the component engines, where all those engines are treated as equally reliable. Consider a document x. In KE, the local ranks (r_i) of x as returned from all component engines of an MSE are summed and converted to a single weight score (W_ke) using this formula:
W_{ke} = \frac{\sum_{i=1}^{m} r_i}{n^{m} \times (k/10 + 1)}    (1)
where \sum_{i=1}^{m} r_i is the sum of all the rankings from each search engine in which the document appears, n is the number of component engines where the document appears in their results, m is the total number of component engines exploited, and k is the number of toplist documents crawled from each component engine. The lower the weight, the better the ranking score.
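To make equation (1) concrete, here is a minimal sketch in Python (the language of our prototype, see Section 3); the function name and the data layout (a list of the document's local ranks) are illustrative assumptions, not the original implementation.

def ke_score(local_ranks, m, k):
    """Weight of a document under the KE algorithm, equation (1).

    local_ranks -- local ranks r_i of the document, one per component
                   engine in which it appears
    m           -- total number of component engines exploited
    k           -- number of toplist documents crawled from each engine
    """
    n = len(local_ranks)  # engines in which the document appears
    return sum(local_ranks) / (n ** m * (k / 10.0 + 1))

# A document found near the top by all of m = 3 engines (k = 200) gets a
# much lower, i.e. better, weight than one found by a single engine:
print(ke_score([3, 5, 4], m=3, k=200))   # ~0.0212
print(ke_score([3], m=3, k=200))         # ~0.1429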
Patel and Shah propose to simply use the Count Function to compute the ranking of an MSE document [9]. The ranking of a document x is computed as follows:
Rank(x) = \frac{\sum_{i=1}^{n} P_i(x)}{count(x)}    (2)
where P_i(x) is the local rank of document x returned from a component engine. P_i(x) in (2) is the same as r_i in (1). Unlike the KE algorithm, the documents are here ranked in descending order (the higher the weight, the better the ranking score).
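A matching sketch of equation (2), under the same assumed data layout:

def count_function_score(local_ranks):
    """Count Function score of a document x, equation (2): the sum of its
    local ranks P_i(x) divided by count(x), the number of component
    engines that returned x. Unlike KE, documents are then sorted in
    descending order of this score (higher is better)."""
    return sum(local_ranks) / float(len(local_ranks))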
Borda Count [10] is a voting-based data fusion method that is adopted to a meta-search engine environment in the Weighted Borda-Fuse (WBF) algorithm [14, 15]. Different from the KE and Count Function algorithms, in WBF the component search engines do not have to be treated as equally reliable. The votes for a web document that lies at local rank i of component search engine j are
V(r_{i,j}) = w_j \times (\max_k(r_k) - i + 1)    (3)
where w_j is the weight of engine j, and \max_k(r_k) is the number of toplist documents crawled from component search engine k. Retrieved web documents that appear in more than one search engine receive the sum of their votes. The documents are ranked in descending order of the total votes they receive (the higher the vote, the better the ranking score) [3, 7, 10, 16].
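A sketch of the vote of equation (3), summed over the component engines; the dictionary layouts of the toplists, weights, and per-engine depths are illustrative assumptions (the depths dictionary lets the same helper serve both WBF variants introduced in Section 2):

def borda_vote(i, w_j, k_j):
    """Vote of equation (3) for a document at local rank i (1-based) in
    engine j: w_j is the engine's weight, k_j = max_k(r_k) is the number
    of toplist documents crawled from that engine."""
    return w_j * (k_j - i + 1)

def sum_votes(toplists, weights, depths):
    """Accumulate the votes of every URL over all component engines.
    toplists -- {engine: ordered list of URLs}; weights -- {engine: w_j};
    depths   -- {engine: max_k(r_k)}, the toplist size crawled per engine."""
    votes = {}
    for engine, urls in toplists.items():
        for i, url in enumerate(urls, start=1):
            votes[url] = votes.get(url, 0) + borda_vote(i, weights[engine], depths[engine])
    return votes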
Key evaluation parameters for ranking strategies in an MSE can (optionally) be viewed in terms of algorithmic complexity, rank aggregation time, overlap across search engines, and the precision from the user's perspective [3, 17]. For situations where a search engine joins a meta-search engine on the fly [18], some rank fusion algorithms cannot be implemented, because estimating a search engine score usually requires enough sample data from each component engine. In our work, we use several k-toplist values of the component engines.
Research on the utilization of query operators has reported that about 10% of search engine users use advanced query operators such as Boolean AND/OR (and the rest only use simple queries) [19]. We are interested in examining the effect of the complexity degree of a query, in particular degree one and degree two (using one and two operators, respectively). According to [20], the performance of queries of complexity one outperforms that of complexity two in all cases; but is the decrease in relevance significant enough? All of our query experiments in this research are implemented on searching web documents using 3 search engines: Google, Altavista, and the Fast search engine. It would be interesting to see whether the results are consistent using different component search engines.
This paper introduces two variants of the Weighted Borda-Fuse algorithm that aim to improve the query relevance of web search results in a meta-search engine environment. The rest of this paper is organized as follows: Section 2 describes our prototype of the meta-search engine system and the proposed Weighted Borda-Fuse variants, Section 3 reports our experimental results, and finally Section 4 gives some conclusions.
2. The ‘UNIB Meta Fusion’ Prototype and The Algorithms
To overcome the issues defined in Section 1, we built a prototype of a user-adaptive MSE called ‘UNIB Meta Fusion’ that allows a user to choose his favourite search engines and the k-toplist setup of retrieved web documents (k = 50, 100, 200). In this research we experiment with our two proposed variants of the WBF algorithm [14, 15], the KE algorithm [8], and the Count Function algorithm [9]. Relevance is computed using two IR metrics: precision and MRR. All is measured from the 10-toplist of the MSE against the 10-toplist of the pseudo-relevance technique.
Figure 1. The web interface of ‘UNIB Meta Fusion’. A user can choose his preferred rank fusion algorithm, k-toplist, and combination of component search engines, as well as the query
Our ‘UNIB Meta Fusion’ (Figure 1) is a meta-search engine prototype that supports a choice of 3 out of 5 well-known search engines. Related to Figure 1, SE = {SE_1, ..., SE_5} is the list of component search engines: Google, Bing, Ask.com, Lycos, and Exalead, respectively. Since we intended this project for research purposes, the processes of retrieving, parsing, merging, ranking, and reporting the results of the search engines are done separately in off-line mode. The prototype will show only the list of URL results of the best rank fusion method.
With ‘UNIB Meta Fusion’ we especially want to investigate which rank fusion method outperforms the others under different setups of toplist retrieved documents. We modify the Weighted Borda-Fuse algorithm into 2 variants, called ‘Default’ WBF and ‘MyOwn’ WBF. ‘Default’ sets up the number of retrieved toplist documents based on the popularity of a search engine, whereas in ‘MyOwn’ the user is free to define the combination of toplists of the multiple component search engines.
Figure 2. The architecture of ‘UNIB Meta Fusion’
Figure 2 shows the architecture of the ‘UNIB Meta Fusion’ system. The Web Interface allows a user to submit a query, specify a choice of three search engines, and set a number k for how many toplist documents will be retrieved from each of the specified search engines.
The Query Parser creates an appropriate format for the query and passes the information of the k-toplist URLs and the choices of search engines to Best Algorithm. Best Algorithm employs only the best-performing rank fusion algorithm, viz. the algorithm that has the highest document retrieval relevance score against a set of gold-standard relevant documents generated by either the Reciprocal Rank, Borda Count, or Condorcet technique, as suggested by Nuray and Can [21]. Best Algorithm then returns the merged and ranked list of documents to the Query Parser, which in turn returns sets of [URL, title, snippet] as query search results to the user.
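Conceptually, Best Algorithm is an argmax over the candidate fusion methods; a hypothetical sketch, where the relevance_score function stands in for the pseudo-relevance evaluation described in Section 3:

def best_algorithm(candidates, relevance_score):
    """Pick the rank fusion algorithm whose merged 10-toplist scores
    highest against the pseudo-relevance gold standard.
    candidates      -- {algorithm name: merged-and-ranked URL list}
    relevance_score -- function(url_list) -> score against the gold set
    """
    return max(candidates, key=lambda name: relevance_score(candidates[name]))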
The Offline Query Processor is the most time-consuming part of this research. Given a list of queries, sets of k-toplists of the component engines (k = 50, 100, 200), and sets of combinations of 3 out of 5 search engines, the HTTP Retrievers handle the network communications, and the Search Results Collector stores each k-toplist of documents of the search engines separately. The Results Parser parses the lists into URLs, titles, and snippets. These are then combined in Merger and Ranker and ranked by one of the Rank Fusion Algorithms: KE, ‘Default’ WBF, ‘MyOwn’ WBF, or the Count Function algorithm. We then carefully investigate and decide which algorithm works best to be employed in Best Algorithm.
The ‘UNIB Meta Fusion’ returns documents processed by the off-line query processor using 10 queries of two-term and three-term length (see Table 1). Those queries are extended further using the operators AND/OR. For example, the three-term query "Java applet programming" is combined further to form 4 different new queries: "Java AND applet AND programming", "Java AND applet OR programming", "Java OR applet AND programming", and "Java OR applet OR programming". Two-term queries get a similar treatment. In the end, the 10 queries are extended into 30 different queries.

Table 1. The Multi Domain Queries [20]
Two Terms Queries        Three Terms Queries
database overlap         comparative education methodology
multilingual OPACs       java applet programming
programming algorithm    indexing AND digital libraries
road-map plan            geographical stroke incidence
adolescent alcoholism    culturally responsive teaching
2.1. The User-Defined ‘MyOwn’ WBF Algorithm
The user-defined ‘MyOwn’ WBF algorithm allows a user to define his own search engine weights. Considering (3), we set the r_k of each search engine to the same k-toplist value, but with different weights as suggested by the user. The weights are usually proportionate to the user's trust in the relevancy of the corresponding search engines. In other words, by using the user-defined ‘MyOwn’ WBF algorithm we would like to know whether the MSE system will produce a good relevance score if we treat the system with different weights, given a user-defined value k for the k-toplist of documents crawled from all of the component search engines.
The ‘MyOwn’ WBF's processes for meta-searching 3 of 5 search engines are as follows (a compact sketch of Steps 5-8 in Python appears after the list):
Step 1. The user specifies k; this determines the number of documents that all of the component search engines will later retrieve (the k-toplist), e.g. k = 200.
Step 2. Define the set of search engines SE = {SE_1, SE_2, ..., SE_n} that are available for meta-searching. // In our case n = 5.
Step 3. The user sets the weight w_j (j = 1, ..., n) for each search engine in Step 2. // For example w_j = {50, 30, 20, 25, 15}.
Step 4. The user selects three out of the n search engines to be used.
Step 5. For each of the three engines from Step 4, set \max_k(r_k) to be the value of the constant k from Step 1.
Step 6. For each document found in the k-toplist returned by each component engine chosen in Step 4:
Step 6a. Compute V(r_{i,j}) using equation (3), where i is the ranking of the document in engine j.
Step 6b. Compute the document's WBF ranking score:

WBF_ranking_score = (\sum_{j=1}^{3} V(r_{i,j})) \times total_SE

Consider a document x; total_SE is the number of search engines in which document x is found.
Step 7. Order the found documents descendingly by their WBF ranking scores.
Step 8. Present the top 10 documents obtained from Step 7 to the user.
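The following is a minimal end-to-end sketch of Steps 5-8 in Python; the data layout ({engine: ordered URL list} toplists) and the function name are illustrative assumptions, not the prototype's actual code.

def myown_wbf(toplists, weights, k, top=10):
    """'MyOwn' WBF: equal crawling depth k for all chosen engines,
    user-defined weights. toplists -- {engine: ordered URL list} for the
    three chosen engines; weights -- {engine: w_j}."""
    votes, engines_found = {}, {}
    for engine, urls in toplists.items():
        for i, url in enumerate(urls, start=1):
            # Step 6a: the vote of equation (3)
            votes[url] = votes.get(url, 0) + weights[engine] * (k - i + 1)
            engines_found[url] = engines_found.get(url, 0) + 1
    # Step 6b: multiply the summed votes by total_SE
    scores = {url: v * engines_found[url] for url, v in votes.items()}
    # Steps 7-8: sort descendingly, present the top documents
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [(url, scores[url]) for url in ranked[:top]]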
Example 1: Consider a meta-search engine system built using the ‘MyOwn’ WBF algorithm. Let n = 5, and let the chosen search engines be SE_1, SE_2, and SE_3, with k = 200 for all of them (this determines the k-toplist to be retrieved). Suppose the user specifies {50, 30, 20} as the weights of the component engines, respectively. Assume we have 3 documents, Doc_1, Doc_2, and Doc_3, and several facts:
- Doc_1 is found at ranks 8, 9, and 11, respectively, in the toplists of SE_1, SE_2, and SE_3.
- Doc_2 is found at ranks 9 and 13 in the toplists of SE_1 and SE_3; it is not found by SE_2.
- Doc_3 is found at ranks 3, 5, and 4, respectively.
The WBF scoring of these documents is then shown in Table 2.
Table 2. The Scoring of the WBF Algorithm by Considering (3)
Query  SE_1 (50%)            SE_2 (30%)            SE_3 (20%)             WBF Ranking Score
Doc_1  50*(200-8+1) = 9650   30*(200-9+1) = 5760   20*(200-11+1) = 3800   (9650+5760+3800)*3 = 57630
Doc_2  50*(200-9+1) = 9600   Not found             20*(200-13+1) = 3760   (9600+0+3760)*2 = 26720
Doc_3  50*(200-3+1) = 9900   30*(200-5+1) = 5880   20*(200-4+1) = 3940    (9900+5880+3940)*3 = 59160
Table 2 shows the ordering: Doc_3 > Doc_1 > Doc_2.
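As a quick check of Table 2's arithmetic, a standalone snippet (document ranks written directly rather than derived from full toplists):

ranks = {
    'SE1': {'Doc1': 8, 'Doc2': 9, 'Doc3': 3},
    'SE2': {'Doc1': 9, 'Doc3': 5},              # Doc2 not found by SE2
    'SE3': {'Doc1': 11, 'Doc2': 13, 'Doc3': 4},
}
weights = {'SE1': 50, 'SE2': 30, 'SE3': 20}
k = 200
scores = {}
for engine, docs in ranks.items():
    for doc, i in docs.items():
        votes, found = scores.get(doc, (0, 0))
        scores[doc] = (votes + weights[engine] * (k - i + 1), found + 1)
for doc, (votes, found) in sorted(scores.items()):
    print(doc, votes * found)   # Doc1 57630, Doc2 26720, Doc3 59160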
2.2. The ‘Default’ WBF Algorithm
The ‘Default’ WBF algorithm is a special instance of the ‘MyOwn’ WBF, but with differences in Step 1 and Step 5 of the ‘MyOwn’ algorithm. In ‘Default’ WBF, the k values for the k-toplist of each component engine are influenced by each component engine's weight. In the ‘Default’ WBF, consider 3 component search engines and k an element of {50, 100, 200}; then for equation (3) we set \max_k(r_k) to be 200 for the engine with the highest w_j, 100 for the engine with the second-highest w_j, and 50 for the third engine.
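The only change relative to ‘MyOwn’ is thus how \max_k(r_k) is assigned per engine; a sketch of that assignment (the function name and the depth tuple are illustrative):

def default_wbf_depths(weights, depths=(200, 100, 50)):
    """'Default' WBF: give the largest crawling depth to the engine with
    the highest weight, the next depth to the second highest, and so on.
    weights -- {engine: w_j} for the three chosen engines."""
    ordered = sorted(weights, key=weights.get, reverse=True)
    return dict(zip(ordered, depths))

# {'SE1': 50, 'SE2': 30, 'SE3': 20} -> {'SE1': 200, 'SE2': 100, 'SE3': 50};
# each vote then becomes w_j * (depth_j - i + 1) as in equation (3).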
Example 2: Consider a meta-search engine system built using the ‘Default’ WBF algorithm. Let n = 5, and let the chosen search engines be SE_1, SE_2, and SE_3. Suppose the user specifies {50, 30, 20} as the respective weights of those engines. Assume we have 3 documents, Doc_1, Doc_2, and Doc_3, and several facts:
- Doc_1 is found on SE_1 at rank 8 of the SE_1 toplist, on SE_2 at rank 9 of the SE_2 toplist, and on SE_3 at rank 11 of the SE_3 toplist.
- Doc_2 is found on SE_1 at rank 9 of the SE_1 toplist, not found on SE_2, and found on SE_3 at rank 13 of the SE_3 toplist.
- Doc_3 is found on SE_1 at rank 3 of the SE_1 toplist, on SE_2 at rank 5 of the SE_2 toplist, and on SE_3 at rank 4 of the SE_3 toplist.
The scoring of ‘Default’ WBF is then as shown in Table 3.
Table 3. The Scoring of the WBF Algorithm by Considering (3)
Query  SE_1 (50%)            SE_2 (30%)            SE_3 (20%)           WBF Ranking Score
Doc_1  50*(200-8+1) = 9650   30*(100-9+1) = 2760   20*(50-11+1) = 800   (9650+2760+800)*3 = 39630
Doc_2  50*(200-9+1) = 9600   Not found             20*(50-13+1) = 760   (9600+0+760)*2 = 20720
Doc_3  50*(200-3+1) = 9900   30*(100-5+1) = 2880   20*(50-4+1) = 940    (9900+2880+940)*3 = 41160
From Table 3 we have Doc_3 > Doc_1 > Doc_2 in the rank order of the MSE system. This rank order is influenced by the WBF scores of each document: the more relevant a document, the higher the position WBF will give it in the web search retrieval of the MSE.
3. Results and Analysis
We have described two variants for ranking in WBF meta-search. We would like to compare them with other existing rank fusion algorithms: the KE algorithm [8] and the Count Function algorithm [9]. Queries are sent to each search engine, retrieving toplists until k (k = 50, 100, 200) URLs have been crawled from each component search engine; the results are merged by the four algorithms (‘Default’ WBF, ‘MyOwn’ WBF, KE, and Count Function). For evaluation, as the queries are multi-domain (not limited like the TREC datasets; these multi-domain queries are meant to simulate real-world situations) and since using human judgment is expensive, we evaluate our system using three different gold standards: the Reciprocal Rank (RR), Borda Count (BC), and Condorcet methods. The latter are known as "Pseudo-Relevance" datasets, as suggested in [21].
In this research, all experiments are executed on an Acer 4741 machine with an Intel Core i3 and 5 GB of RAM. All prototyping processes, from retrieval, parsing, merging, and ranking to presenting the query results to the user, are implemented in Python. The language is efficient, and a fast Python module named web.py helps in providing a user-friendly interface for the MSE prototype. For the evaluation of the tasks in all of our experiments, we adopted two metrics that capture relevance in different aspects [22]:
Precision at rank n (P@n): Precision at rank n is defined as the proportion of the retrieved documents that are relevant with respect to the gold standard, averaged over all documents.
Mean Reciprocal Rank (MRR): MRR measures where in the ranking the first relevant document (with respect to the gold standard) is returned by the system, averaged over all the documents. This measure provides insight into the ability of the system to return a relevant document at the top of the ranking.
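Under these definitions (each system list and gold-standard list being a 10-toplist of URLs), a minimal sketch of the two computations is:

def precision_at_n(retrieved, relevant, n=10):
    """P@n: the fraction of the top-n retrieved URLs that also occur in
    the gold-standard (pseudo-relevance) set."""
    return sum(1 for url in retrieved[:n] if url in relevant) / float(n)

def mean_reciprocal_rank(runs):
    """MRR over (retrieved, relevant) pairs: the reciprocal of the rank
    at which the first relevant URL appears (0 if none), averaged over
    all queries."""
    total = 0.0
    for retrieved, relevant in runs:
        for i, url in enumerate(retrieved, start=1):
            if url in relevant:
                total += 1.0 / i
                break
    return total / len(runs)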
Table 4 to Table 6 show the results in terms of both P@n and MRR for the different k-toplists compared to the gold standards (Pseudo-Relevance). From these results, we can see that our proposed methods (‘Default’ and ‘MyOwn’ WBF) achieve the best results in terms of both P@n and MRR. In general, this verifies the effectiveness of our proposed method for rank aggregation.
For both WBF variants, in all of the experiments in this research, we set up the weights = {30, 20, 15, 25, 10} for Google, Bing, AskJeeves, Lycos (powered by Yahoo!), and Exalead, respectively. All relevance scores are obtained using the best P@10 of each rank fusion algorithm compared to the best P@10 of Google. Google was chosen as the benchmark since it shows the best performance of the individual search engines. We order the individual component engines by their weights for convenience (Table 4 to Table 6).
Table 4. Results of Different Methods for MSE, Compared to Pseudo-Relevance Sets at k=50
System          P@10_RR  P@10_BC  P@10_Condorcet  MRR_RR  MRR_BC  MRR_Condorcet
MSE Rank Fusion Performance
MyOwn WBF       0.6563   0.6950   0.5577          0.9853  0.9627  0.4339
Default WBF     0.6530   0.7300   0.6300          0.8877  0.7543  0.4453
KE              0.6650   0.6613   0.5763          0.9132  0.8747  0.4009
Count Function  0.2687   0.2483   0.3050          0.3238  0.1373  0.1417
Individual Component Engines Performance
Google          0.3267   0.3437   0.2933          0.4113  0.4356  0.2949
Lycos           0.2893   0.3073   0.2643          0.4088  0.4432  0.2899
Bing            0.1783   0.1800   0.1630          0.3385  0.3299  0.2015
Ask.com         0.2010   0.2013   0.1907          0.3625  0.3676  0.2053
Exalead         0.0680   0.0627   0.0727          0.1188  0.1227  0.1135
Table 5. Results of Different Methods for MSE, Compared to Pseudo-Relevance Sets at k=100
System          P@10_RR  P@10_BC  P@10_Condorcet  MRR_RR  MRR_BC  MRR_Condorcet
MSE Rank Fusion Performance
MyOwn WBF       0.7103   0.7470   0.6023          0.9764  0.9340  0.4097
Default WBF     0.6683   0.6953   0.6447          0.9294  0.9285  0.4378
KE              0.6970   0.6993   0.5957          0.9214  0.8870  0.3875
Count Function  0.2533   0.2273   0.3000          0.3205  0.1277  0.1409
Individual Component Engines Performance
Google          0.2987   0.3097   0.2863          0.3909  0.4264  0.3015
Lycos           0.2717   0.2853   0.2567          0.3829  0.4091  0.2715
Bing            0.1663   0.1693   0.1583          0.3242  0.3255  0.2006
Ask.com         0.1960   0.1980   0.1827          0.3468  0.3574  0.2036
Exalead         0.0633   0.0597   0.0643          0.1334  0.1406  0.0995
Table 6. Results of Different Methods for MSE, Compared to Pseudo-Relevance Sets at k=200
System          P@10_RR  P@10_BC  P@10_Condorcet  MRR_RR  MRR_BC  MRR_Condorcet
MSE Rank Fusion Performance
MyOwn WBF       0.7253   0.7630   0.5953          0.9683  0.8880  0.3979
Default WBF     0.6200   0.6377   0.6280          0.9153  0.9376  0.4463
KE              0.7050   0.7213   0.5887          0.9131  0.8541  0.3777
Count Function  0.2503   0.2203   0.3003          0.3218  0.1270  0.1450
Individual Component Engines Performance
Google          0.2653   0.2703   0.2743          0.3843  0.4300  0.3116
Lycos           0.2370   0.2437   0.2417          0.3506  0.4027  0.2789
Bing            0.1497   0.1527   0.1497          0.3245  0.3361  0.2179
Ask.com         0.1884   0.1887   0.1747          0.3489  0.3613  0.2234
Exalead         0.0580   0.0563   0.0573          0.1418  0.1404  0.0980
The data in Table 4 are those of the k-toplist with k = 50. Table 4 clearly shows that the highest relevance score (P@10) for ‘MyOwn’ WBF is 0.6950 for Borda Count, two times higher than that of Google (2.02 times); that of ‘Default’ WBF is 0.7300 for Borda Count, also two times higher than that of Google (2.12 times); and that of KE is 0.6650 for Reciprocal Rank, almost two times higher than that of Google (1.93 times). From Table 4, the best P@10 for Count Function is only 0.3050 for Condorcet, which is below the P@10 of Google (lower, 0.89 times). From the MRR perspective, in general at k = 50 the two WBF methods outperform the others (the WBFs do quite well in getting the first correct position).
Table 5 shows the same data, but for k = 100. It also shows an improvement in P@10, with the highest relevance score for ‘MyOwn’ WBF at 0.7470 for Borda Count (2.41 times higher than the P@10 of Google). The highest score of ‘Default’ WBF is now 0.6953 for the Borda Count pseudo-relevance sets (2.25 times); that of KE is now 0.6993 for Borda Count (2.26 times); and Count Function is 0.300 for Condorcet (lower, only 0.97 times). All against the best P@10 of Google. In terms of MRR, the two WBF algorithms still perform better than the others.
Table 6 shows the results for the k-toplist with k = 200. From Table 6, the highest improvement in relevance score (P@10) against the best P@10 of Google for ‘MyOwn’ WBF is 0.7630 for the Borda Count pseudo-relevance sets (2.78 times that of Google); for ‘Default’ WBF the highest is 0.6377 for the Borda Count pseudo-relevance sets (2.32 times); for the KE method, the highest is 0.7213 for the Borda Count pseudo-relevance sets (2.63 times); and for Count Function the highest is 0.3003 for Condorcet (1.09 times). All against the P@10 of Google. In general, as with the other k-toplists, at k = 200 the MRR of the WBFs also stably outperforms the other rank fusion methods.
As a conclusion, both the ‘Default’ WBF and the user-defined ‘MyOwn’ WBF produce the best results. They outperform other algorithms such as KE [13] and Count Function [9]. The relevance of the Count Function algorithm is far below that of the WBFs; this is due to the simplicity of the algorithm, which only computes the sum of the local ranks of document x returned from each component engine divided by the total number of occurrences of x in all meta-search engine components. The Count Function algorithm considers neither the popularities of the component engines nor the number of crawled toplist documents. The KE algorithm is concerned with how many toplist documents are crawled from each component engine, but popularities are missed.
From the experiments we also found that the best k-toplist of each component search engine (in terms of precision) is reached for k = 200 (‘MyOwn’ WBF, with a precision 2.78 times higher than that of Google), followed by 2.41 times higher for ‘MyOwn’ WBF with k = 100, and 2.12 times higher for ‘Default’ WBF with k = 50, all compared to Google. Therefore the best methods found among the rank fusion methods are the Weighted Borda-Fuse algorithms.
The ‘Default’ WBF is suitable for small datasets, while the ‘MyOwn’ WBF is suitable for bigger datasets (e.g. > 100-toplist crawled from each component engine). The best gold standard is achieved by the Borda Count technique.
From Table 4 to Table 6, in almost all cases the weights that have been set up influence the result. For example, a high weight on Google will most probably yield Google as the best search engine. An exception is Ask.com, which is always better than Bing. Ask.com uses the ExpertRank algorithm, which performs better than Bing's best-trail-finding algorithms. The ExpertRank algorithm is based on the HITS algorithm, which uses a scheme in which every web page is assigned two scores: the hub score and the authority score. When compared to Google's PageRank algorithm, the Google search engine had more relevant top results and a higher quantity of relevant results, and its results remained more stable than those of the ExpertRank algorithm [23].
While Google's search algorithms are very dependent on HTML text when it comes to indexing websites, multimedia contents (images, video, audio, Flash, and others) are handled far better by Bing [24]. In all of our experiments we ignore any multimedia contents and focus on text (structured information); this is the rationale for why Ask.com is always better than Bing. The Exalead search engine provides hybrid searching over typed information extracted from structured databases, as well as searching over unstructured text [25]. This semantic search engine is not too successful in our experiments, again because we only focus on structured information.
Furthermore, considering the length of the queries, in Table 7 and Table 8 we focus on the use of two- or three-term queries, since most queries (97%) of all queries on the World Wide Web have fewer than 6 terms [26]. In our experiments, we do not use queries from TREC since their average length is longer than common queries on the internet. Table 7 shows the effect of the length of query terms. From the results we know that using two or three terms is optional, since there is no significant difference in the relevance scores (P@n and MRR). Table 7 supports the result in [20] that the performance of two-term queries always outperforms the performance of three-term queries, but in our case the difference is not significant; thus we leave this as a choice to the user.
Table 7. Performance of Borda Count Pseudo-Relevance based on Length of Terms
Length of Terms  P@1     P@3     P@5     P@10    MRR
Three terms      0.6100  0.5629  0.5053  0.3610  0.4909
Two terms        0.6470  0.5926  0.5404  0.3920  0.5257
Table 8. Performance of Borda Count Pseudo-Relevance based on Operators
Operators    P@1     P@3     P@5     P@10    MRR
AND          0.7074  0.6222  0.5806  0.4424  0.5725
OR           0.5867  0.5630  0.5001  0.3416  0.4789
AND ... AND  0.6378  0.6116  0.5630  0.4244  0.5495
AND ... OR   0.6151  0.5593  0.5053  0.3562  0.4879
OR ... AND   0.5945  0.5503  0.4860  0.3379  0.4696
OR ... OR    0.5926  0.5304  0.4670  0.3255  0.4568
To examine the effect of the complexity degree of query length, we analyzed queries with degree 1 = 1 operator (two terms) and degree 2 = 2 operators (three terms). As operators we use AND/OR combinations. Table 8 shows the relevance scores of the Borda Count pseudo-relevance sets obtained from the ‘UNIB Meta Fusion’ MSE prototype. From Table 8 we suggest the use of the operator "AND" for degree 1 and the operators "AND ... AND" for degree 2, which are stable in producing relevant results.
4. Conclusions
In this paper we briefly described two rank fusion algorithms, ‘MyOwn’ WBF and ‘Default’ WBF, as well as their implementation on the ‘UNIB Meta Fusion’, a meta-search engine prototype. From the experiments we showed that our variants of the Weighted Borda-Fuse algorithm stably outperform other MSE rank fusion methods. We showed that the weights that have been set up influence the result. The ‘Default’ WBF is best for small datasets, while the ‘MyOwn’ WBF is best for larger datasets. The best value of the k-toplist for crawling the web is achieved for k = 200. Furthermore, we suggest the use of the operators "AND" or "AND ... AND" for degree 1 and degree 2 queries, respectively, to increase relevance to user needs. From the experiments there is no significant difference in relevance if a user uses either two- or three-term queries while browsing a search engine. As a general conclusion, our system, the MSE prototype ‘UNIB Meta Fusion’, was built correctly.
Acknowledgment
This work is supported by the University of Bengkulu's Excellent Research Grant #557/UN30.15/LT/2014 from the Research and Community Services Office of the University of Bengkulu (LPPM UNIB), Bengkulu Province, Sumatera, Indonesia.
References
[1] Gulli A, Signorini A. Building an Open Source Meta Search Engine. World Wide Web Conference. Chiba, Japan. 2005: 1004-1005.
[2] Meng W. Metasearch Engines. In: Liu L, Ozsu MT. Editors. Encyclopedia of Database Systems. 2009 edition. New York, USA: Springer; 2009: 1730-1734.
[3] Mahabhashyam MS, Singitham P. Tadpole: A Metasearch Engine Evaluation of Meta Search Ranking Strategies. Stanford University. Report number: CS276A. 2002.
[4] Akritidis L, Katsaros D, Bozanis P. Effective Ranking Fusion Methods for Personalized Metasearch Engines. 12th Pan-Hellenic Conference on Informatics (PCI 2008). Samos Island, Greece. 2008: 39-43.
[5] Dwork C, Kumar R, Naor M, Sivakumar D. Rank Aggregation Methods for the Web. Proceedings of the 10th International World Wide Web Conference. Hong Kong. 2001: 613-622.
[6] Lam KW, Leung CH. Rank Aggregation for Meta-search Engines. Proceedings of the 13th International Conference on World Wide Web. New York. 2004: 384-385.
[7] Liu YT, Liu TY, Qin T, Ma ZM, Li H. Supervised Rank Aggregation. Proceedings of the WWW 2007 Conference. Banff, Alberta, Canada. 2007: 481-489.
[8] Akritidis L, Voutsakelis G, Katsaros D, Bozanis P. QuadSearch: A Novel Metasearch Engine. Proceedings of the 11th Panhellenic Conference on Informatics (PCI 2007). Patras, Greece. 2007: 453-466.
[9] Patel B, Shah D. Ranking Algorithm for Meta Search Engine. IJAERS International Journal of Advanced Engineering Research and Studies. 2012; 2(1): 39-40.
[10] Aslam JA, Montague M. Models for Metasearch. Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 01). New York. 2001: 276-284.
[11] Christensen HU, Ortiz-Arroyo D. Applying Data Fusion Methods to Passage Retrieval in QAS. In: Haindl M, Kittler J, Roli F. Editors. Multiple Classifier Systems: 7th International Workshop, MCS 2007, Prague, Czech Republic, May 23-25, 2007, Proceedings. (Lecture Notes in Computer Science, Vol. 4472). 1st ed. Berlin Heidelberg: Springer-Verlag; 2007: 82-92. Available from: 10.1007/978-3-540-72523-7_9.
[12] Montague M, Aslam J. Condorcet Fusion for Improved Retrieval. Proceedings of the 11th Annual ACM Conference on Information and Knowledge Management (CIKM 02). Tysons Corner, VA. 2002: 538-548.
[13] Renda ME, Straccia U. Web Metasearch: Rank vs. Score based Rank Aggregation Methods. Proceedings of the ACM Symposium on Applied Computing (SAC). Melbourne, FL. 2003: 841-846.
[14] Fagin R, Kumar R, Sivakumar D. Comparing Top-k Lists. SIAM Journal on Discrete Mathematics. 2003; 17(1): 134-160.
[15] Dorn J, Naz T. Structuring Meta-Search Research by Design Patterns. Proceedings of the International Computer Science and Technology Conference (ICSTC). San Diego, California, USA. 2008.
[16] van Erp M, Schomaker L. Variants of the Borda Count Method for Combining Ranked Classifier Hypotheses. Proceedings of the 7th International Workshop on Frontiers in Handwriting Recognition. Amsterdam. 2000: 443-452.
[17] Jadidoleslamy H. Search Result Merging and Ranking Strategies in Meta-Search Engines: A Survey. IJCSI International Journal of Computer Science. 2012; 9(4): 239-251.
[18] Wu Z, Raghavan V, Du C, Sai K, Meng W, He H, Yu C. SE-LEGO: Creating Metasearch Engine on Demand. ACM SIGIR Conference, Demo paper. Toronto, Canada. 2003: 464-464.
[19] Eastman CM, Jansen BJ. Coverage, Relevance, and Ranking: the Impact of Query Operators on Web Search Engine Results. ACM Transactions on Information Systems. 2003; 21(4): 383-411.
[20] Mohamed KA-E-F. Merging Multiple Search Results Approach for Meta-Search Engines. PhD Thesis. Pittsburgh, United States: Postgraduate School of Information Sciences, University of Pittsburgh; 2004.
[21] Nuray R, Can F. Automatic Ranking of Information Retrieval Systems using Data Fusion. Journal of Information Processing and Management. 2006; 42(3): 595-614.
[22] Sigurbjörnsson B, van Zwol R. Flickr Tag Recommendation based on Collective Knowledge. WWW 2008 Conference. Beijing, China. 2008: 327-336.
[23] Zeno G. PageRank vs ExpertRank: Analysis of Their Link Analysis Algorithms for Ranking Search Results. Thesis. Bayamon: Department of Computer Science, University of Puerto Rico; 2010.
[24] Singla A, White RW, Huang J. Studying Trailfinding Algorithms for Enhanced Web Search. Proceedings of SIGIR'10. Geneva, Switzerland. 2010: 443-450.
[25] Exalead. Exalead CloudView Semantics White Paper. Exalead. Report number: EN.140.001.0-V1.2. 2010.
[26] Jansen B, Spink A, Bateman J, Saracevic T. Real Life Information Retrieval: A Study of User Queries on the Web. ACM SIGIR Forum. 1998; 32(1): 5-17.