TELKOM
NIKA
, Vol.11, No
.11, Novemb
er 201
3, pp. 6986
~6
991
e-ISSN: 2087
-278X
6986
Re
cei
v
ed
Jun
e
7, 2013; Re
vised Augu
st
3, 2013; Acce
pted Augu
st 15, 2013
An Efficient Content based Image Retrieval S
c
heme
Zukuan Wei
*
1
, Hong
y
e
o
n
Kim
2
, You
ngk
y
un Kim
3
, Jaehong Ki
m
4
1,2,
3
BigData So
ft
w
a
re
Lab., El
ectronics a
nd
T
e
leco
mmunic
a
tions R
e
se
arch Institute (ET
R
I),
Daej
eo
n, 305-
700, KOREA,
4
Dept. of Comp
uter & Information En
gin
eer
in
g, Y
oung
Do
ng
Univers
i
t
y
, C
h
u
ngb
uk, 370-
70
1, KOREA,
*Corres
p
o
ndi
n
g
author, e-ma
i
l
: anle
x
w
e
e
@
e
t
ri.re.kr
1
, kimhy
@
etri.re.kr
2
, ki
my
ou
ng
@
e
tri.re.kr
3
,
jho
ng@
yo
u
ngd
ong.ac.kr
4
A
b
st
r
a
ct
Due to the re
cent improv
e
m
e
n
ts in di
gital
ph
otogra
p
h
y
and storag
e
c
apacity, storing lar
g
e
amou
nts of i
m
ages
has
be
en
made
p
o
ssib
l
e
.
Conse
q
u
ent
ly
efficient
mea
n
s
to retriev
e
i
m
ages
matchi
ng
a
user
’
s
q
uery ar
e nee
ded. In this pap
er, w
e
propos
e
a frame
w
ork based on
a bipartite gr
a
ph mod
e
l (BGM)
for se
ma
ntic i
m
a
ge r
e
triev
a
l.
BGM is a sc
al
abl
e d
a
ta st
ruc
t
ure
that aids
semantic in
dex
ing in an
effici
ent
ma
nn
er, and it
can als
o
be i
n
cre
m
e
n
tally
u
pdate
d
. F
i
rstly, all the i
m
ag
e
s
are seg
m
ent
ed int
o
sever
a
l
regi
ons w
i
th i
m
age s
e
g
m
entat
ion
al
gorith
m
,
pre-trai
ned
SV
Ms are
used
to
ann
otate
eac
h
regi
on,
and
fin
a
l
lab
e
l is
obt
ain
ed by
merg
in
g
all t
he re
gi
on
lab
e
ls. T
h
e
n
w
e
use
the s
e
t
of i
m
ag
es a
n
d
the set
of reg
i
on
lab
e
ls to bu
ild
a bip
a
rtite gra
p
h
. W
hen a qu
e
r
y is given,
a q
uery no
de, in
iti
a
lly co
ntain
i
n
g
a fixed n
u
m
ber
of
labels, is creat
ed to attach to the bipartite graph.
The node then distributes t
he labels based
on the
edge
w
e
ight betw
e
e
n
the no
de an
d its neig
h
b
o
rs
. Image n
o
d
e
s
receivi
ng the
most la
be
ls re
prese
n
t the most
relev
ant i
m
ag
e
s
. Experimenta
l
results de
mon
s
trat
e that our prop
osed tec
h
niq
ue is pro
m
is
ing.
Ke
y
w
ords
: Image R
e
trieva
l; Image Se
gme
n
tation; Image A
nnotati
o
n
Copy
right
©
2013 Un
ive
r
sita
s Ah
mad
Dah
l
an
. All rig
h
t
s r
ese
rved
.
1. Introduc
tion
In the la
st
decade,
due
to the
imp
r
ovem
ents in
digital
phot
ogra
phy a
n
d
sto
r
ag
e
cap
a
city, there ha
s be
en
rapid g
r
o
w
th i
n
the u
s
e
of
digital medi
a
su
ch ima
g
e
s
, video an
d au
dio.
As the
use of
digital m
edia
increa
se
s, ef
fective ret
r
iev
a
l and
man
a
gement te
ch
nique
s b
e
co
me
more i
m
po
rta
n
t. Such te
ch
nique
s a
r
e
re
quire
d to
fa
cil
i
tate the effective sea
r
chin
g and
browsi
ng
of larg
e m
u
ltimedia
datab
a
s
e
s
. In
ord
e
r to re
sp
ond
to this ne
ed, i
m
age
retri
e
va
l ha
s b
een
a
hot
resea
r
ch topi
c a
n
d
majo
r
techn
o
logy
of many
ma
jo
r re
se
arch
project
s
n
e
a
r
ly. Image
retrie
val
system
s mai
n
ly can
be
cast in t
w
o
ca
tegorie
s:
text based
and
conte
n
t ba
se
d syste
m
s. T
e
xt-
based
retri
e
val is a
metho
d
that m
anu
a
lly annotat
e
s
image
s by
te
xt. Howeve
r,
this m
e
thod
h
a
s
many limitati
ons be
ca
use
it is
hi
ghly la
bor-inten
sive,
time
con
s
um
ing a
nd
unp
ractical
with la
rge
databa
se
s. In conte
n
t-ba
sed ima
ge retrieval (C
BIR), the main
idea is to extract low l
e
vel
feature
s
, su
ch as
colo
r, texture and
sh
ape, whi
c
h
ar
e use
d
to me
asu
r
e
simila
rity. However, i
t
is
difficult to describe hi
gh
-l
evel sema
ntics u
s
in
g low-level feature
s
, thus such
image ret
r
ie
val
system
s have
poor p
e
rfo
r
m
ance for sem
antic qu
erie
s.
In order to i
m
prove the
retrieval accu
racy
of co
ntent-ba
s
ed
imag
e
retrieval
syste
m
s, resea
r
ch
focu
s
ha
s b
een
shifted
from
de
signi
n
g
sop
h
isti
cated
low-l
e
vel feat
ure extractio
n
algorith
m
s to
redu
cin
g
the
“se
m
anti
c
ga
p” bet
wee
n
th
e
visual feature
s
and the
rich
ness of huma
n
sema
ntics.
Automatic im
age semanti
c
annotation h
a
s be
en ap
p
r
oved to be
an effective way for
reali
z
ing
sem
antic im
age
retrievals. In
[1], Wa
n
g
et
al. co
nstructe
d a
we
b ima
ge a
nnotatio
n
system
calle
d ‘Annosea
rch’. The sy
ste
m
first sea
r
ched for sem
antically and
visually simi
lar
image
s on the Web, and t
hen ann
otations were mined
from retrieval results. In [2], Mei et al.
prop
osed a system for an
notating by learnin
g
sem
a
n
t
ic distan
ce from each se
mantic cl
uste
r in
image set. In [3], Li et al.
prop
osed a n
e
w ap
pro
a
ch
of multi-label
image ann
otation for ima
g
e
retrieval b
a
sed on ann
otated key
w
ords, a nov
el annotatio
n re
finement app
roa
c
h ba
se
d
on
PageRan
k is
also p
r
op
ose
d
to further i
m
prove retrie
val perform
an
ce.
Annotation
ca
n be treated
as a
pr
oblem
of cla
ssifi
cati
on from
a poi
nt of view of
machi
n
e
learni
ng. So
me previo
us
annotatio
n work a
r
e b
a
si
n
g
on k-nea
re
st neighbo
rs
(k-NN) [4, 5]. In th
e
field of cla
ssif
i
cation,
k-NN
is a relative l
o
w p
r
e
c
is
i
on
cla
ssifie
r
be
cause it
lacks t
r
ainin
g
proce
ss.
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
e-ISSN:
2087
-278X
An Efficient Content ba
sed
Im
age Retrie
val Sch
e
m
e
(Zuku
an
Wei)
6987
Thus,
cla
s
sification
app
ro
ach
with hi
g
her
pre
c
i
s
ion
s
shoul
d be
use
d
in a
n
n
o
tation p
r
o
c
e
ss.
More
over, fo
r the
abu
nd
ant and
vari
ous vis
ual
content of im
age, multi
-
la
bels sh
ould
be
annotate
d
to
get mo
re p
r
e
c
ise de
scri
ption of im
age
i
n
semanti
c
le
vel. What’
s
m
o
re, m
o
st of t
h
e
prop
osed
ap
proa
ch
es are assu
m
ed t
hat the d
a
ta
base is con
s
tant. But fo
r mo
st p
r
a
c
tical
databa
se
s
(li
k
e i
n
tern
et i
m
age
coll
ecti
ons), ne
w
im
age
s a
r
e
co
nstantly a
d
d
ed, in thi
s
case,
these
app
ro
ach
e
s
wo
n’t perfo
rm
well, prima
r
ily due to it
s re
sou
r
ce i
n
tensive
ma
trix
comp
utation
s
.
Most
im
age
r
e
t
r
iev
a
l
sy
st
e
m
s a
r
e
pr
opo
sed
a
s
sumin
g
that the
dat
aba
se
s a
r
e
consta
nt.
But for most pra
c
tical d
a
ta
bases (li
k
e in
ternet
imag
e colle
ction
s
),
new ima
g
e
s
are co
nsta
ntly
adde
d to th
e
image
coll
ecti
on that
po
se
s a
con
s
id
era
b
l
e challe
nge,
prima
r
ily du
e
to its
re
sou
r
ce
intensive mat
r
ix computati
ons. An incre
m
ental va
rian
t of pLSA proposed by Wu
et al.
[6] tried to
improve of th
e comp
utatio
nal efficien
cy
. Howeve
r th
ey failed to addre
s
s the issue of sto
r
a
g
e
compl
e
xity. In [7], Chan
drika Pulla
et a
l
. propo
se
d a
novel metho
d
ba
sed
on a
bipartite g
r
a
p
h
model can ad
dre
ss the
s
e i
s
sue
s
well, b
u
t has a lo
w pre
c
isi
on.
In this pa
pe
r, we p
r
o
p
o
s
e
a ne
w mo
del
for
content
-based ima
g
e
retrieval. Fi
rstly, the
image
s a
r
e a
u
tomatically
annotate
d
wit
h
seve
ral l
a
b
e
ls, an
d then
the label
s a
r
e used to b
u
il
d a
bipartite
grap
h, after th
at
we
ca
n u
s
e
it to retrie
ve
relevant ima
g
es
at runtime
.
The
re
st of
this
pape
r i
s
stru
cture
d
a
s
foll
ows: Sectio
n
2 di
sc
u
s
ses the ima
ge
a
nnotation
pro
c
e
ss. Se
ction
3
pre
s
ent
s the
bipartite g
r
a
ph mod
e
l we build a
nd Section
4 prese
n
ts
the retrieval
process.
Experiment
s and an
alysis will be pre
s
ented in
Section 5 and concl
u
si
on
s will be dra
w
n
in
Section 6.
2. Image An
nota
t
ion
2.1. Image Annota
t
ion
As it has al
re
ady been m
e
ntioned, auto
m
atic
imag
e sema
ntic a
n
n
o
tation is an
effective
way for reali
z
ing
sem
anti
c
image ret
r
ieval. In
this section we will disc
uss the role of im
age
segm
entation
and ann
otation, and we wi
ll pres
ent the approa
ch u
s
e
d
in this wo
rk.
Given the e
n
tire set of image
s of a giv
en databa
se an
d their extracted lo
w-level
feature
s
, it
may easily b
e
observed t
hat regi
o
n
s t
hat corre
s
po
nd to the sa
me con
c
e
p
t have
similar low-level descriptions.
Also, im
age
s that co
ntain the sa
me high
-leve
l
con
c
ept
s are
typically co
nsisted of
simil
a
r r
egio
n
s. F
o
r exam
ple, region
s that
contain the
co
nce
p
t ‘sky’ are
gene
rally visually simila
r, i.e. t
he color
of most of them sh
ould b
e
som
e
tone
of ‘blue’. On
the
other ha
nd, image
s that cont
ain ‘sky’ often ate con
s
i
s
ted of simila
r regi
on
s.
Figure 1. Fra
m
ewo
r
k of Image Annotati
o
n
An
no
tati
on
Trai
nin
g
s
e
t
Vis
u
al f
eature
s
I
m
ag
e Seg
m
en
t
a
t
ion
U
n
la
be
le
d im
ages
Vis
u
al f
eature
s
Gen
e
ti
c s
e
lect
io
n
Fi
nal
ann
o
t
a
t
ion
Cl
as
si
fi
cat
i
o
n
&
vo
tin
g
Cl
as
si
fi
e
r
s
Trai
nin
g
Evaluation Warning : The document was created with Spire.PDF for Python.
e-ISSN: 2
087-278X
TELKOM
NIKA
Vol. 11, No
. 11, Novemb
er 201
3: 698
6 – 6991
6988
The afo
r
em
e
n
tioned
ob
se
rvations in
dica
te that simil
a
r regi
on
s ofte
n co-exi
st wit
h
some
high-l
e
vel co
nce
p
ts. Thi
s
mean
s that
region
co
-existence
s
sho
u
l
d
be
able to
provid
e visu
al
descri
p
tion
s
whi
c
h
can d
i
scrimin
a
te b
e
twee
n t
he
existen
c
e
s
o
r
not of certain high
-lev
el
con
c
e
p
ts. By
approp
riately
quanti
z
ing th
e re
gion
s
of an
ima
ge dat
aset, we ca
n extract
effici
e
n
t
descri
p
tion
s. Thus,
we
ca
n create a
set of l
abels
o
f
the most common
regi
o
n
types an
d
use
multiple lab
e
l
s
to represe
n
t an ima
ge.
In this pa
pe
r, we
use an
app
roa
c
h p
r
opo
sed i
n
[3] to
annotate th
e
unlab
eled im
age
s. The fra
m
ewo
r
k of an
notation a
pproach is
sh
ow
n in Fig
u
re
1. As
it can be see
n
, the pro
c
ess co
ntain
s
tw
o main stag
e
s
: training a
n
d
annotatio
n.
At training st
age, Supp
ort
Vector M
a
ch
ines
(
SVM) i
s
use
d
as
ba
si
c cl
assifiers,
for th
e
rea
s
on
that
SVM cla
ssifi
ers can b
e
reused in
i
n
cremental
data
base while
measures li
ke K-
Mean
s cl
uste
r ca
n’t. In this pape
r, the cl
assifica
tio
n
problem in
clu
d
i
ng multi-cla
s
ses
can b
e
se
en
as a set of two-cl
ass cl
assi
fica
tion. Thu
s
, a SVM is tra
i
ned for every
couple of cl
a
s
ses in the se
t.
Assu
ming the
r
e are
k cla
s
ses in the train
i
ng set,
k*(k
-
1
)/2
SVMs ne
ed to be train
ed totally. Th
en
visual featu
r
e
s
li
ke
colo
r,
colo
r layo
ut, colo
r stru
cture,
homo
gen
e
ous
texture, edge histo
g
ram
and
regi
on
sh
ape
are
extra
c
ted to
repre
s
ent i
m
age
s.
Ho
wever, if
al
l these featu
r
es
are
involv
ed
in the cla
s
sification, the di
mensi
on
com
p
lexity
will bri
ng high
com
p
utational cost
ing. Mean
whi
l
e,
different feat
ure
s
al
ways
have differe
n
t
impor
tan
c
e,
thus the
weights
of them need to
b
e
automatically adju
s
ted
in
cla
ssifi
cation.
In orde
r to i
m
prove
the
pre
c
isi
on
whi
l
e de
crea
se t
h
e
compl
e
xity of feature
s
, the mech
ani
sm
of genetic
sel
e
ction i
s
intro
duced to sele
ct best featu
r
es
for cla
s
sificati
on between t
w
o cl
as
se
s d
u
ring the train
i
ng pro
c
e
s
s.
Two
kind
s of
chromo
som
e
s a
r
e u
s
ed
here: a
real
chromo
som
e
{
w
i
},
w
i
∈
[0,1
]
,
i=1,2,…,n
re
pre
s
ent
s the
weig
hts
co
rre
spo
ndin
g
to f
eature
de
scri
ptors;
a bin
a
ry chromo
som
e
{
a
i
},
a
i
∈
{0,1}, i=1,2,…,n
rep
r
e
s
ent
s the pre
s
en
ce or ab
sen
c
e
of a feature descri
p
tor in
the
optimal featu
r
e
sub
s
et, n
is the
numb
e
r
of feat
u
r
e
s
. For
every o
ne vs. o
ne S
V
M, the fitness
function
in
G
A
is
de
sign
e
d
a
s
th
e
cla
s
sificatio
n
a
ccura
cy in
traini
ng
set. Mo
re
details ab
out
bi-
cod
ed
GA ca
n be
seen i
n
[6]. Geneti
c
operati
on i
s
t
e
rmin
ated
wh
en the
proce
s
s re
ache
s t
h
e
maximum nu
mber
of gene
ration; the fittest individu
al
is treate
d
a
s
the optimal
sele
ction
re
sult.
The relative o
p
timal wei
ght
ed feature se
t is
V
:
V=
{ v
i
}, v
i
= w
i
× a
i
× f
i
, i=
1,2,…,n,
f
i
is the i-th
descri
p
tor. After trainin
g
, we have optim
al su
b
s
et
s an
d co
rre
sp
ondi
ng wei
ghts fo
r every coupl
e
of classe
s in trainin
g
set.
At annotation
stag
es, all
th
e imag
es in t
he
d
a
taba
se
are
se
gmente
d
ba
sed
on
th
eir lo
w-
level cohe
re
n
c
e of feature
s
, such a
s
gra
y
-level
simila
rity and texture etc. For ea
ch image re
gio
n
,
each o
ne v
s
.
one SVM
cl
a
ssifie
r
i
s
use
d
to d
e
ci
de
a
label
a
c
cording the
optim
al featu
r
e
su
bset
and
optimal
weig
hts train
ed b
e
fore. T
he final
cla
s
s la
bel i
s
d
e
c
ide
d
u
s
ing
a majo
rity voting
approa
ch, e
v
ery SVM vote the u
n
l
abele
d
re
gio
n
with it
s classificatio
n
result, its final
cla
ssif
i
cat
i
on res
u
lt
f(x
)
is t
he label of
g
i
whi
c
h ha
s hig
hest voting score:
f(x
) = ar
g m
ax g
i
(x), i = 1,2,…,k
(1)
Each regio
n
is ann
otated
with sem
antic l
abel voted maximum ba
sed o
n
the an
notation
algorith
m
, an
d then all the regio
n
label
s
ar
e me
rge
d
a
s
the final image an
notatio
n.
2.2. Refinem
e
nt
The pe
rform
a
nce of ann
otation may be
influenc
e
d
b
y
many factors, one of whi
c
h is the
confin
e of se
gmentation
algorit
hm. Be
cause so
me region
s do
n’
t have sp
ecifi
c
meaning
s, a
nd
annotate the
s
e regio
n
s
will bring a
nnotati
on errors
. Co
nse
que
ntly, the image
retri
e
val results will
be influ
e
n
c
e
d
. So a
n
a
nnotation
ref
i
nement
app
roa
c
h i
s
ne
eded
to
ran
k
the
candi
date
annotatio
ns
a
nd g
e
t rid
of
irrel
e
vant an
notation
s
. Althoug
h ma
ng
y algorithm
s [
8
-12]
sati
sfy the
requi
rem
ent [
8
], the algo
rit
h
m u
s
ing
Ra
ndom
Wal
k
with Re
start
s
(RWR) is cho
s
en to
re-ra
n
k
th
e
can
d
idate a
n
notation
s
for its simpli
ci
ty and effectiven
ess in this pa
per.
To fa
cilitate t
he a
nnotatio
n refineme
n
t proces
s, a
confid
en
ce
score fo
r the
can
d
idate
annotatio
ns
should be p
r
ov
ided. The p
r
o
bability that
a region i
s
involved in a lab
e
l cla
ss i
s
often
use
d
as the
confiden
ce
score. Mo
re sp
ec
ifically, the confid
en
ce score of label
w
i
is defined a
s
:
score (
w
i
) =
p ( w
i
| I )
(2)
In ord
e
r to f
u
lly utilize th
e co
nfiden
ce
scores
of the candi
date
annotatio
ns and the
corpu
s
info
rm
ation, we refo
rmulate
the i
m
age
ann
otation refinem
en
t pro
c
e
s
s a
s
a graph
ran
k
i
ng
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
e-ISSN:
2087
-278X
An Efficient Content ba
sed
Im
age Retrie
val Sch
e
m
e
(Zuku
an
Wei)
6989
probl
em a
nd
solve it with
the RWR alg
o
r
ithm. Each candid
a
te an
n
o
tation
v
i
i
s
consi
dered a
s
a
vertex of a
g
r
aph
G. All ve
rtice
s
of G
are fully c
onn
e
c
ted with pro
per
weights.
The weig
ht
of
an
edge i
s
defin
ed ba
sed o
n
the ‘co
-
o
c
currence” si
milari
ty as follows:
(3)
The
RWR alg
o
rithm
pe
rforms
as follo
ws [13]. As
sum
e
that the
r
e i
s
a rand
om
wal
k
er that
start
s
from n
ode
w
i
with certain probability.
At each time-t
ick, the wal
k
er has t
w
o choi
ces. One
is to ran
dom
ly choo
se an
available ed
ge to follow.
The other
choice is to jump to
w
i
with
probability
cx
v(j)
, where
v
is
the re
start vector and
c
is
the probability
of
restarting the
random
walk
[13].
A
ssu
me t
hat
G
is a g
r
aph
with
N
ver
t
ic
es
w
i
const
r
u
c
ted as
aforem
entione
d de
scriptio
n.
Let
A
be the
adjacen
cy matrix of
G
.
A
is colum
n
-norm
a
lized to
ensu
r
e that
the sum of e
a
ch
colum
n
in
A
i
s
o
ne. T
he
original
co
nfide
n
ce
sco
r
e
s
of ca
ndid
a
te a
n
notation
s
a
r
e
co
nsi
dered
a
s
the re
start ve
ctor
v
.
v
is
no
rmali
z
ed
su
ch that the su
m of all elem
ents in
v
is
o
ne. The
aim
is to
estimate the
steady
state pro
bability
of
all vertices, which is denote
d
by
u
. Let
c
b
e
the
probability of restarting th
e random
walk. It is empirically set to be 0.3 in our i
m
plementation.
Then the N-b
y
-1 steady
state prob
abilit
y vector
u
satisfies the fol
l
owin
g equati
on:
u = ( 1 – c )
Au + c
v
(4)
Therefore,
u = c ( I – (1 – c) A)
-1
v
(5)
whe
r
e
I
is
the
NxN
identity matrix.
The
i
th
eleme
n
t
u(i)
of the steady state
vector
u
is the probability that
w
i
can b
e
the final
annotatio
n.
After the ran
k
sco
re of eve
r
y word i
s
obt
ained, the
r
e a
r
e two m
e
tho
d
s to de
cide t
he final
annotatio
ns. One way
is to
ran
k
the score
from
h
i
gh to lo
w, a
nd then
ch
o
o
se
a pa
rticular
numbe
r of them. The othe
r way is to ch
oose all
the annotatio
ns
with pro
babili
ties larg
er th
an a
threshold a
n
d
the others are omitted.
3. Bipartite Graph Mod
e
l
Whe
n
the stage of image
annotation i
s
finishe
d
, we shoul
d mai
n
tain the rela
tionshi
p
betwe
en im
a
ges an
d lab
e
l
s. In
real
a
pplicatio
n
s
, the regio
n
typ
e
s
are
u
s
uall
y
too many,
and
each im
age
h
a
s
only of
a f
e
w
of them,
so the
matrix
o
f
the ima
g
e
s
and l
abel
s i
s
spa
r
se, a
nd it
is
a wa
ste of st
orag
e, we
ca
n use a
bipa
rtite grap
h m
odel to co
nvert the matrix
into a biparti
te
grap
h of ima
ges a
nd la
bel
s, and it can
address t
he
storage
pro
b
le
m efficiently. The st
ru
cture
of
a BGM is sh
o
w
ed in Fig
u
re
2.
Figure 2. The
Structure of a BGM
Evaluation Warning : The document was created with Spire.PDF for Python.
e-ISSN: 2
0
87-278X
TELKOM
NIKA
Vol. 11, No
. 11, Novemb
er 201
3: 698
6 – 6991
6990
The edg
es b
e
twee
n imag
es an
d label
s are wei
ghte
d
with frequ
e
n
cie
s
of label
s in the
image
s an
d
each is al
so
asso
ciated
with an inve
rse ima
ge fre
quen
cy value
.
These val
u
es
determi
ne th
e impo
rtan
ce
of a label to
a parti
cula
r i
m
age. G
= (W, D, E) i
s
the bip
a
rtite g
r
aph
su
ch
t
hat
W =
{
w
1
,
w
2
... ,
w
n
}, I = {
i
1
,
i
2
… ,
i
n
} and
E =
{
…,
…,
}. Here the
weigh
t
asso
ciated wi
th
w
1
= I
D
F
(
w
1
) and that
of
=
TF (w
1
, i
1
)
.
An image m
a
y contain
many label
s
and a lab
e
l
may be present in many
image
s.
Similarity of two imag
es
can be mea
s
ured in b
a
se
d on the nu
mber of lab
e
l
s they sha
r
e
.
If
image
s A a
n
d
B a
s
well
as A
and
C
are
simila
r, t
hen B
and
C a
r
e
also
similar. Thi
s
g
e
ts
reflecte
d in the paths
whi
c
h
traverse the grap
h
from i
m
age to label
s and the
n
ba
ck to ima
ges.
4. Image Retrie
v
a
l
Whe
n
a qu
ery image is giv
en, an ima
ge
segm
enta
tion
algorithm i
s
run to se
gme
n
t
it into
several
regi
o
n
s. Ea
ch
re
gi
on i
s
cla
ssifie
d
by
o
u
r
pre-t
r
aine
d SVM
s, and
a
co
rrespondi
n
g la
bel
is
distrib
u
ted to
it. The final
label
s of th
e qu
ery ima
ge a
r
e
obtai
ned
by merg
ing all th
e re
gion
label
s. Then
a que
ry no
de
, a kind
of im
age n
ode,
i
s
cre
a
ted
attaching itself to the lab
e
l no
d
e
s
that rep
r
e
s
e
n
t
the lab
e
ls it
ha
s. The
no
de the
n
di
stri
butes
the
l
a
b
e
ls ba
sed on the
ed
ge wei
ght
betwe
en the
node an
d its neigh
bors,
such that
the re
ceived
amount of la
bels i
s
directly
prop
ortio
nal to the edge
weight. The qu
ery node i
s
di
sconn
ecte
d from the gra
p
h
.
The neigh
b
o
rs
then p
r
opa
ga
te the label
s t
o
their
neig
h
bors. If t
he n
ode i
s
a im
ag
e nod
e, the d
i
stributio
n of t
h
e
label
s amo
n
g
its edge
s i
s
determi
ned a
c
cordi
ng to
th
e quantity wh
ich is
propo
rtional to the
fl
ow
cap
a
city cal
c
ulated by
the
normali
zed Term Fre
que
ncy (TF) valu
e. If the no
de
is
a word n
o
d
e,
then a
pen
al
ty, which i
s
prop
ortio
nal t
o
the In
ve
rse do
cum
ent
Freq
uen
cy (I
DF) value
of the
word, i
s
ta
ke
n from
the
a
m
ount
of lab
e
l it receives
and th
e
re
st i
s
di
stri
buted
l
i
ke th
e d
o
cu
men
t
node ba
sed on
the
fl
o
w
capa
city of its edge
s. Hen
c
e high
er the
edge
weig
hts the more lab
e
l is
prop
agate
d
t
o
the
releva
n
t
node. At e
a
c
h
nod
e the l
abel i
s
com
p
ared
with
a
cutoff value
which
is the
lea
s
t
amount
of th
e lab
e
l n
eed
ed for a
nod
e to fo
rwa
r
d
the lab
e
l. He
nce
the l
abel
is
prop
agate
d
to releva
nt do
cume
nts a
nd
terms
until
a
cutoff value i
s
re
ached
at whi
c
h poi
nt la
bel
is no lo
nge
r
prop
agate
d
. The no
de
s receivin
g the
most lab
e
ls
are the m
o
st
relevant ima
g
es.
Thus, it divides the no
de
s in the biparti
te graph in
to
relevant an
d non-rel
evant sets
simila
r to a
grap
h cut alg
o
rithm.
Our sy
stem
also
supp
ort
s
keywo
r
d
s
q
uerie
s. Un
de
r this circu
m
stan
ce, the users a
r
e
asked to i
npu
t keywo
r
d
s
a
s
qu
erie
s a
n
d
the issue
of image
retriev
a
l is transf
erred into the i
s
sue
of text-based
retrieval. It is faster tha
n
im
age que
ri
es, be
cau
s
e
the pro
c
e
ss
of segme
n
tin
g
image
s and o
b
taining lab
e
l
s
is not ne
ed
ed. We only
need to prop
agate the lab
e
ls bet
w
ee
n label
node
s an
d image no
de
s, without cre
a
tin
g
a new q
uery node.
5. Experimental Re
sults
on Image Re
triev
a
l
The mo
st co
mmon evalu
a
tion mea
s
u
r
es u
s
ed i
n
IR are preci
s
i
o
n and
re
call
, usually
pre
s
ente
d
a
s
a preci
s
io
n
vs. re
call g
r
a
ph. Re
se
arch
ers
are
famili
ar with
PR g
r
aph
s and ca
n
extrac
t information from t
hem
without interpreta
tion
probl
em
s. Preci
s
ion
an
d
recall
a
r
e
defi
n
ed
as a bell
o
w: x`
Preci
s
io
n an
d recall are
stand
ard m
e
asu
r
e
s
in IR, which give a good indi
cation of
system pe
rformance. Either val
ue alone
contain
s
insufficient information. We can alway
s
make
recall, simply
by retrieving all images. Similarly,
pre
c
isi
o
n can b
e
kept hig
h
by
retrieving
on
ly a
few imag
es.
Thus
pre
c
i
s
io
n and
recall should eith
er
be used tog
e
t
her, or the
n
u
mbe
r
of ima
g
es
retrieve
d sh
o
u
ld be spe
c
ified.
In our expe
riment, we
sele
ct more
than 1400
images fro
m
Pascal d
a
taset a
s
experim
ental
dataset. It co
ntains 15
obj
ect
cla
s
se
s. Due
to
o
b
je
cts contai
ned, every
ima
g
e h
a
s
1 to 5 l
abel
s.
At the traini
n
g
sta
ge, for o
b
ject
s
in
every categ
o
ry, cl
assifi
ers
a
r
e con
s
tru
c
ted. At
annotatio
n st
age, whole i
m
age i
s
firstly segm
ented
usin
g ima
g
e
seg
m
entatio
n algo
rithm,
and
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
e-ISSN:
2087
-278X
An Efficient Content ba
sed
Im
age Retrie
val Sch
e
m
e
(Zuku
an
Wei)
6991
then a
nnotat
ed by p
r
e
-
trai
ned SVM
s. Normali
z
e
d
cut algo
rithm i
s
use
d
for imag
e segme
n
tation.
Con
s
id
erin
g obje
c
ts contai
ned in imag
e
s
, we set N=8
.
Table 1
sho
w
s th
e average p
r
e
c
isi
o
n and
re
call
of 15cate
g
o
r
ies
ba
sed o
n
image
annotatio
n an
d imag
e an
no
tation refin
e
m
ent app
ro
ac
h
prop
osed i
n
this p
ape
r. ML
mean
s im
ag
e
annotatio
n wi
thout annotat
ion refin
e
me
nt, while
ML
-refinem
ent m
ean
s imag
e annotatio
n wi
th
annotatio
n ref
i
nement.
Table 1. Re
sult in Preci
s
io
n and Recall
recall
precision
ML annotation
0.463
0.122
ML-refinemen
t
0.225
0.134
6. Conclusio
n
In this pa
per, we p
r
opo
se a ne
w
sch
e
me for
co
ntent-ba
s
e
d
im
age retrieval.
In the
annotatio
n st
age, a
bi-cod
ed ge
netic al
gorithm i
s
ap
plied to
sel
e
ct optimal feat
ure
su
bset a
nd
corre
s
p
ondin
g
optimal wei
ghts for eve
r
y one vs
one
SVM classifie
r
. For every region of imag
es,
optimal weig
hted feature sub
s
et
an
d a
nnotation
refi
nement imp
r
ove
the pre
c
i
s
ion of a
nnot
ation.
Then th
e
re
served l
abel
s
are
used to
b
u
ild a
bipa
rtite graph
an
d i
t
will be
u
s
ed
in the
retri
e
val
p
r
oc
es
s
.
Ho
wever, im
age a
nnotati
on ba
se
d o
n
im
age se
gmentation algorith
m
s
i
s
often
unsatisfa
ctory, for the rea
s
on that
they
often pro
d
u
c
e
regio
n
s t
hat don’t
have sp
ecific
m
eani
n
g
s,
thus a
nnotate
these
re
gion
s will
bri
ng a
nnotation
e
rror. In future,
contin
uou
s ef
forts in i
n
vent
ing
new a
nnotati
on algo
rithm
s
to obtain ded
icated a
nnota
t
ions will b
e
n
e
ce
ssary.
Ackn
o
w
l
e
dg
ements
This work wa
s su
ppo
rted b
y
the IT R&D prog
ram of M
KE/KEIT, KOREA. [10038
768,
the Develo
p
m
ent of Supe
rco
m
putin
g
System for the Genom
e Anal
ysis]
Referen
ces
[1]
X W
a
ng,
L Z
han
g, et a
l
.
AnnoS
eatch:
I
m
a
ge
auto-
an
notatio
n by
s
earch
. Pr
ocee
din
g
of IEE
E
Comp
uter Visi
on an
d Pattern
Recog
n
itio
n. 2
006; 14
83-
149
0.
[2]
T
Mei, Y W
a
n
g
, XS H
ua, S
Gong, S Li.
C
oher
ent i
m
a
g
e
ann
otatio
n by
l
ear
nin
g
se
ma
ntic dista
n
c
e
.
proce
edi
ngs of
Confere
n
ce o
n
Comp
uter Vi
sion a
nd Patter
n
Reco
gniz
e
. 2
008; 1-8.
[3]
R Li, YF
Z
h
a
ng, Z
Lu, J L
u
, Y
T
i
an.
T
e
chni
que of Image Retri
e
va
l base
d
on
Mult
i-lab
e
l
Imag
e
Annotati
on.
Se
cond Intern
atio
nal C
onfere
n
c
e
on Multim
edi
a and Inform
ation T
e
chnol
og
y. 20
10; 10
-
13.
[4]
X
Li,
L C
h
e
n
, L
Z
han
g, F
Li
n,
W
Y
Ma.
Ima
g
e
an
notati
on by larg
e-scal
e
co
n
t
ent-base
d
i
m
a
ge r
e
trieva
l.
Procee
din
g
of the 14th A
nnu
al
ACM internat
i
ona
l Conf
erenc
e on Multim
edi
a. 2006; 6
07-6
10.
[5]
T
M
Hamdani,
AM Alimi, F
K
a
rra
y
.
Distri
but
ed Ge
netic
Alg
o
rith
m w
i
th B
i
-
C
od
ed
Chro
moso
mes
an
d
a
New
Evaluati
o
n F
unction for
F
eatures Sel
e
ction.
Proce
edi
ng of the IEEE
Congr
ess on
Evoluti
onar
y
Comp
utation,
Can
ada. 2
006:
581-5
88.
[6]
H W
u
, Y W
ang, X C
h
e
ng.
In
crementa
l
pro
b
abil
i
stic lat
ent
se
mantic
ana
ly
sis for auto
m
a
t
ic questi
on
reco
mme
ndati
o
n
. Proc. ACM Conf. Recomm
end
er S
y
stems
,
ACM Press. 2008; 99-
10
6.
[7]
Cha
ndrik
a Pul
l
a
, Suman Kar
t
hik, CV Ja
w
a
har.
Efficient
Semantic In
de
xing for Imag
e Retriev
a
l.
Internatio
na
l C
onfere
n
ce o
n
Pattern Reco
gn
ition. 20
10: 32
7
6
-32
79.
[8]
C W
a
n
g
, F
Ji
ng, L
Z
h
a
ng,
HJ Z
h
a
ng.
I
m
age
An
notatio
n R
e
fine
ment
usin
g R
a
n
d
o
m
W
a
lk w
i
t
h
Research.
Pro
c
eed
ings
of th
e 14th
An
nua
l
ACM inter
nat
ion
a
l C
onfer
en
ce on
Multim
e
d
ia. Sa
nta
Barbar
a, Califo
r
nia, USA. 200
6.
[9]
E Chang, G Kingshy
, G Sy
chay
, G Wu. CBSA:
content-b
a
s
ed soft an
not
ation for
multi
m
o
d
a
l
i
m
a
g
e
retrieval
usin
g Bayes po
int machi
nes
. IEEE T
r
ans.
On CSVT. 2003; 13(1):
26-38.
[10]
Cha
ndrik
a Pul
l
a
, Suman Kar
t
hik, CV Ja
w
a
har.
Efficient
Semantic In
de
xing for Imag
e Retriev
a
l.
Internatio
na
l C
onfere
n
ce o
n
Pattern Reco
gn
ition. 20
10: 32
7
6
-32
79.
[11]
Yoha
n Jin, Ltif
ur
Khan, B Pr
abh
akara
n
.
T
o
be Ann
o
tated
or not? the
Ran
d
o
m
i
z
e
d
A
pprox
imatio
n
Graph Alg
o
rith
m for Inag
e An
notatio
n Refi
ne
me
nt Probl
e
m
.
ICDE20
08. 20
08.
[12]
W
ong RCF
, Leu
ng
CH
C. Automatic
S
e
mant
ic A
n
n
o
tation
of
Rea
l
-W
orld W
e
b I
m
ages.
IEEE
Transactio
n
on
Pattern Analys
is and Mac
h
in
e
Intellig
enc
e. 2008; 30(
11): 19
33-1
944.
[13]
Page
L, Brin
S
,
W
i
nogra
d
T
.
T
he Pa
geR
ank
Citatio
n R
anki
ng: Brin
gi
ng O
r
der to th
e w
e
b
. T
e
chnica
l
Rep
o
rt, Stanford Univ
ersit
y
, St
anford, CA. 19
88.
Evaluation Warning : The document was created with Spire.PDF for Python.