International Journal of Electrical and Computer Engineering (IJECE)
Vol. 6, No. 1, February 2016, pp. 139~150
ISSN: 2088-8708, DOI: 10.11591/ijece.v6i1.8540
Journal homepage: http://iaesjournal.com/online/index.php/IJECE
Incremental Tag Suggestion for Landmark Image Collections
Sutasinee Chimlek, Punpiti Piamsa-nga
Department of Computer Engineering, Faculty of Engineering, Kasetsart University, Chatujak, Bangkok, 10900, Thailand
Article Info
Article history:
Received Jul 6, 2015
Revised Sep 28, 2015
Accepted Oct 15, 2015
ABSTRACT
In recent social media applications, descriptive information is collected through user tagging, such as face recognition, and automatic environment sensing, such as GPS. There are many applications that recognize landmarks using information gathered from GPS data. However, GPS is dependent on the location of the camera, not the landmark. In this research, we propose an automatic landmark tagging scheme using secondary regions to distinguish between similar landmarks. We propose two algorithms: 1) landmark tagging by secondary objects and 2) automatic new landmark recognition. The images of 30 famous landmarks from various public databases were used in our experiment. Results show increments of tagged areas and the improvement of landmark tagging accuracy.
Keyword:
Automatic labeling
Image search
Secondary tag
Tag suggestion
Copyright © 2016 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Punpiti Piamsa-nga,
Department of Computer Engineering, Faculty of Engineering, Kasetsart University,
Chatujak, Bangkok, 10900, Thailand.
Email: pp@ku.ac.th
1. INTRODUCTION
The wide availability and low cost of digital photography means that people who travel generally accumulate a very large number of photographs. Furthermore, these images are often uploaded to various social media, such as Facebook. The images not only relate to the people in the photographs but also to the location of where they were taken. For information on identity, social-network applications, such as Facebook and Google Plus, use manual and semi-automatic tagging on shared photographs. For example, face recognition applications will suggest “name tags” of recognised human faces in new photographs and ask users for confirmation. Tags of human faces are text-based tags for use in a text-based search. Tags for objects in social media must be related to keywords as most queries to the system are still text based [1]-[4]. Keyword tags such as “Tokyo Tower”, “Big Ben”, and “Taj Mahal” are common queries.
The massive number of users on the Internet may help tag and share to improve tag information. However, textual keyword tagging becomes more difficult for non-social applications, such as cataloging home photographs, because it is time consuming and assistance from social networks is rarely available. By observation, most users simply keep a set of photographs in a folder on their hard drive and the folder name is usually based on the associated event, which is not necessarily consistent with the others.
Photo album software, such as Google Picasa and Adobe Lightroom, can also be used to manage these images. This software can suggest tags for faces using the same methods as current social networks; however, landmark images are rarely tagged as GPS data may not be available from the photographs. Users regularly tag only a small number of interesting landmarks and leave most other similar images untagged, affecting the search accuracy. While landmark images can be tagged similarly to faces in social media, the task is more difficult because there are many types of landmarks to be recognized including churches, pagodas, towers, buildings, temples, and houses.
Landmarks in photographs can be tagged automatically [5]-[7]. If the landmark is known and tagged by a user, the features are extracted from these landmark images and used to search for other images by feature index. If a search result is found, an image area at the location of the extracted features is then automatically tagged with the search result. If a landmark in a photograph is new and unavailable in the landmark database, the result should not be found (see Figure 1). However, the classifier must still select the best-ranked answer, which may not always be accurate.
Figure 1. Automatic landmark tagging
One limitation of landmark tagging is that it cannot distinguish between two similar landmarks in different locations. In Figure 2, the Eiffel Tower and the Tokyo Tower have the same shape, but their colors and surroundings are completely different. In most learning machines that generally recognize only the landmark object, both objects would receive the same tag. We believe that some secondary objects in the photograph, such as a Japanese house, may help users to select the correct answer. In other words, the user’s prior knowledge may not be sufficient for landmark recognition and considering the secondary surroundings of the landmark may improve search accuracy.
Figure 2. Example of the secondary surroundings of the landmark
Therefore, three challenging problems must be resolved for improved landmark image tagging. First, search accuracy is too low as there is a dearth of manually tagged landmark images. Second, tag suggestions cannot be accurate since it is difficult to distinguish between two similar landmarks. Third, as new photographs are inserted into the database, new landmarks must be suggested; otherwise, the system outputs the closest result.
In this paper, we propose incremental automatic tagging for landmark images using surrounding regions to distinguish between similar landmarks. This improves landmark tagging accuracy and identifies new tags for new landmark images.
The organization of this paper is as follows: Section 2 reviews related work; Section 3 describes the overview of our proposed framework; Section 4 presents the experimental results. Finally, we conclude this research in Section 5.
2. RELATED WORK
Typically, landmark image tagging uses learning-based approaches with visual features from image regions corresponding to the landmark. Learning-based approaches generate landmark classifiers for a predefined set of landmark regions using knowledge previously learned from labeled landmark regions [8], [9]. A landmark corresponds to a category from which a classifier is trained.
Joshi et al. [10] proposed using image tags and visual content to infer the landmark. Papadopoulos et al. [11] proposed a framework to tag landmarks and events in geo-tagged images. This involves learning from a hybrid image-similarity graph, including visual and tag similarities between images. Raguram et al. [12] proposed an iconic scene graph to build stereo scene models for 3D landmark recognition. In [13], Yunpeng et al. presented a large-scale landmark classification system from geo-tagged Flickr images using multi-class support vector machines.
The accuracy of landmark tagging using learning-based approaches is low because the predefined set of landmark images is limited and new landmarks cannot be detected. For real-world applications, we cannot rely entirely on a predefined set of landmarks and improving the learning algorithm to detect new landmarks is crucial.
Several publications suggested methods to support new landmark images by using incremental tagging. For instance, the personal image tagging approach tagged new images with tags mostly used by the query user or their friends [14], [15]. Xirong et al. proposed an algorithm that learns tag relevance by accumulating votes from visual neighbors of the target image on the assumption that different users label visually similar images using the same tags [16], [17]. The feature indexing strategy further enhances scalability for large-scale applications.
Dong et al. [18] present a scheme that performs incremental tagging until a satisfactory tagging accuracy is reached or the user stops the process. This approach to exemplar selection assumes that exemplar images can represent visually similar image clusters better than the centroids of the obtained clusters, which may not be real samples. The Columbia TAG (Transductive Annotation by Graph) system [19] incorporates graph-based label propagation methods and intuitive graphical user interfaces (GUI) that allow users to quickly browse and annotate a small number of images/videos. The system then refines the labels for the remaining unlabeled data in the collection. The graph-based label propagation methods are constructed from visually similar images and the initial labels of a subset of nodes in the graph are provided by some external filters, classifiers, or ranking systems. This system’s performance depends on classifiers for image labeling and the user’s choice of positive and negative images/videos for the labels. Thanh-Binh Le et al. [20] proposed an incremental selection strategy which can improve the classification accuracy of semi-supervised learning (SSL) algorithms.
As described above, there are several proposed incremental tagging methods that use graphs for visually similar images and user responses to add new tags for new landmark images. However, these existing methods require complex computations and have limited accuracy because they either depend on visual features from the entire image or highlight only the landmark region and user responses (thereby potentially confusing similar landmarks).
3. RESEARCH METHOD
In this paper, we propose an incremental landmark tagging framework to support new landmark tagging and address the low accuracy of existing methods resulting from an inability to distinguish between similar landmarks in different locations. We use a hybrid approach combining the learning-based method and the correlation between landmarks and salient surrounding objects using co-occurrence.
In general, the image has two types of regions: the highlighted landmark region and the secondary region. The highlighted landmark region is the prominent or well-known object within a particular landscape that is a common interest area for photographs. Secondary regions are dominant areas related to the landmark that users neglect to tag or that contain objects that cannot be represented by label text.
Our proposed framework consists of two methods: 1) Landmark tag learning to generate landmark classifiers from labeled landmark images, a method which uses both the highlighted landmark and the surrounding objects to help improve tag accuracy; and 2) Incremental suggestion to tag existing landmarks from a predefined landmark dataset and add new tags for new landmark images or different views of the landmark image from a predefined landmark dataset.
Our framework is shown in Figure 3.
Figure 3. Incremental learning for landmark image tagging
3.1. Landmark Tag Learning
Landmark tag learning consists of two processes: 1) a multiclass classification for tag learning that learns models by distinguishing between the highlighted landmark regions and secondary regions for labeled landmark images. Multiclass learning is required to learn tag models for the landmarks in the dataset; and 2) the construction of a matrix of the highlighted landmark and secondary region co-occurrence to represent their correlation (see Figure 4). We use the landmark classifier model and co-occurrence for the incremental suggestion method.
Highlighted landmark regions correspond to the high probability regions of each landmark image. We determine secondary regions by using significant conditions of probability between the highlighted landmarks and the other regions for each landmark image.
We use a salient detection algorithm to detect the salient regions for feature extraction and K-means grouping. Each cluster is named according to the “region tag” that contains a similar object or visual content. This region tag is used to determine the highlighted landmark region and the secondary region.
Landmark tag learning starts with the highlighted landmark region and secondary regions detected using salient object detection approaches. We assume that salient object detection is able to select the highlighted landmark and secondary regions related to the landmark image while ignoring irrelevant visual information. We use three salient object detection approaches: the method of Itti et al. [21], Graph Based Visual Saliency (GBVS) [22], and HouCVPR [23]. Because each of these three models is efficient for specific salient regions in object tagging, using these complementary models increases salient object detection accuracy. Our salient detection model is shown in Figure 5.
Let $M(\cdot)$ represent the detected image area using a salient detection method; $d$ represents the image; $r_i$ represents a region in the image $d$; and $\mathrm{score}(r_i)$ indicates the score corresponding to $r_i$ being part of the salient region detected by $M(\cdot)$, which is given as follows:
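$$\mathrm{score}(r_i) \equiv \left(r_i,\; M_{\mathrm{Itti}}(d) \cup M_{\mathrm{GBVS}}(d) \cup M_{\mathrm{HouCVPR}}(d)\right) \qquad (1)$$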
If $\mathrm{score}(r_i)$ is greater than a predefined threshold, each area $r_i$ becomes a salient region. Finally, a set of all salient regions $S(d)=\{S_1, S_2, \ldots, S_{|s|}\}$ becomes a representation of an image $d$.
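As a minimal sketch of the thresholding step above, the following Python fragment merges the areas returned by several saliency detectors and keeps the regions whose score exceeds a threshold. The representation of areas as sets of pixel coordinates, the function names, the default threshold of 0.5, and the choice of scoring a region by the fraction of its pixels that fall inside the merged salient area are all assumptions made for illustration; they are not specified in this paper.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]


def union_mask(masks: List[Set[Pixel]]) -> Set[Pixel]:
    """Merge the pixel sets returned by several saliency detectors."""
    combined: Set[Pixel] = set()
    for mask in masks:
        combined |= mask
    return combined


def region_score(region: Set[Pixel], salient: Set[Pixel]) -> float:
    """Assumed score: fraction of the region's pixels that are salient."""
    if not region:
        return 0.0
    return len(region & salient) / len(region)


def salient_regions(regions: List[Set[Pixel]],
                    masks: List[Set[Pixel]],
                    threshold: float = 0.5) -> List[Set[Pixel]]:
    """Keep the regions whose score exceeds the (hypothetical) threshold."""
    salient = union_mask(masks)
    return [r for r in regions if region_score(r, salient) > threshold]


# Toy example: two candidate regions, three detector outputs.
r1 = {(0, 0), (0, 1)}
r2 = {(5, 5), (5, 6)}
detector_outputs = [{(0, 0)}, {(0, 1)}, set()]
print(salient_regions([r1, r2], detector_outputs))
```

In this toy example only the first region is covered by the merged detector output, so it is the only one kept as a salient region.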
The features are extracted from the salient regions and are grouped by Bag of Visual Words (BOVW) [24] and GIST [25] using a K-means clustering algorithm. We assign a specific region tag to each cluster containing similar visual content. Thus, each image is represented by a “Bag of Region tags”, an occurrence vector of salient tags in each image.
Figure 4. Landmark tag learning
Figure 5. Salient detection
Highlighted landmark and secondary tag generation uses an arbitrary threshold that is determined either statistically or by a learning machine. Highlighted landmarks correspond to region tags with a probability greater than the threshold. Secondary regions are determined by region tags with a conditioned probability with highlighted landmarks of greater than zero. Let $L$ be the set of landmark labels; $D$ be the set of training image data and $D=\{D_1, D_2, \ldots, D_{|L|}\}$ where $D_i$ represents the subset of images that have label $l \in L$; $C=\{c_1, c_2, \ldots, c_k\}$ be the set of $k$ feature clusters from all salient regions of all images in training set $D$; region tag $R=\{r_1, r_2, \ldots, r_k\}$ be the set of representatives of $C$, where $r_i$ is the centroid of $c_i$; and $P(r_i, D_j)$ be the probability of region tag $r_i$ occurring in the training subset $D_j$. Highlighted landmark regions and secondary regions with landmark label $j$ are given as follows:
$$H_j = \{\forall r_i \mid P(r_i, D_j) > T_j\} \qquad (2)$$
$$I_j = \{\forall r_i \mid P(r_i, H_j) > 0\} \qquad (3)$$
All highlighted landmark regions and all secondary regions are defined by $H=\{H_j \mid \forall j\}$ and $I=\{I_j \mid \forall j\}$, respectively. The threshold $T_j$ is arbitrary.
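The selection of highlighted and secondary region tags for one landmark label can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure: each training image is modeled as a set of region-tag names, the probability of a tag is estimated as the fraction of images in the subset that contain it, and a tag is treated as secondary when it appears in at least one image together with a highlighted tag. The function names and the toy tags ("tower", "house", "tree") are hypothetical.

```python
from typing import List, Set


def tag_probability(images: List[Set[str]], tag: str) -> float:
    """Fraction of images in the subset that contain the region tag."""
    return sum(tag in img for img in images) / len(images)


def highlighted_tags(images: List[Set[str]], threshold: float) -> Set[str]:
    """Region tags whose probability in the subset exceeds the threshold."""
    all_tags = set().union(*images)
    return {t for t in all_tags if tag_probability(images, t) > threshold}


def secondary_tags(images: List[Set[str]], highlighted: Set[str]) -> Set[str]:
    """Tags co-occurring with a highlighted tag in at least one image.

    Excluding the highlighted tags themselves is an assumption.
    """
    secondary: Set[str] = set()
    for img in images:
        if img & highlighted:
            secondary |= img - highlighted
    return secondary


# Toy training subset for one landmark label.
subset = [{"tower", "house"}, {"tower", "tree"}, {"tower"}, {"tower", "house"}]
h = highlighted_tags(subset, threshold=0.6)
print(h, secondary_tags(subset, h))
```

With a threshold of 0.6, only "tower" (present in every image) is highlighted, while "house" and "tree" become secondary tags.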
Each image can be represented by a binary vector of region tags, referred to as the “Bag of Region tags” (BoRtags). If an image $d_i$ is tagged by region tag $r_j$ then $\mathrm{BoRtags}(d_i, r_j)=1$, otherwise $\mathrm{BoRtags}(d_i, r_j)=0$. We use multiclass SVM to construct tag classifiers for all $|L|$ landmark labels using BoRtags.
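A BoRtags vector can be built by assigning each salient-region feature to its nearest cluster centroid, as in the following sketch. The use of squared Euclidean distance, the two-dimensional toy features, and the function names are assumptions for illustration; the actual descriptors in this paper are GIST and BOVW features, and the classifier training step is not shown.

```python
from typing import List, Sequence


def nearest_centroid(feature: Sequence[float],
                     centroids: List[Sequence[float]]) -> int:
    """Index of the centroid closest to the feature (squared Euclidean)."""
    def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: sqdist(feature, centroids[i]))


def bortags(region_features: List[Sequence[float]],
            centroids: List[Sequence[float]]) -> List[int]:
    """Binary vector: entry j is 1 if some region maps to region tag j."""
    vector = [0] * len(centroids)
    for feature in region_features:
        vector[nearest_centroid(feature, centroids)] = 1
    return vector


# Three hypothetical region tags (centroids) and an image with two regions.
centroids = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
image_regions = [(1.0, 1.0), (9.0, 9.0)]
print(bortags(image_regions, centroids))
```

The resulting vectors, one per image, would then serve as the input rows of a multiclass classifier.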
Highlighted landmark and secondary region correlation is analyzed using the co-occurrence of the two regions for each landmark label, as shown in Eq. (4). We assume that each landmark image has a similar landmark co-occurrence between highlighted landmark and secondary regions. Let $P_{D_l}(H_i, I_j)$ be the co-occurrence between highlighted landmark region $H_i$ and secondary region $I_j$ in the training subset $D_l$, which is given as follows:
$$P_{D_l}(H_i, I_j) = \frac{P(H_i \cap I_j)}{P(D_l)} \qquad (4)$$
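One way to read Eq. (4) is sketched below, assuming that the joint probability is counted over the images of the subset $D_l$, so that the ratio reduces to the fraction of images in $D_l$ that contain both tags. That reading, the representation of images as sets of region-tag names, and the function name are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple


def cooccurrence(images: List[Set[str]],
                 highlighted: Set[str],
                 secondary: Set[str]) -> Dict[Tuple[str, str], float]:
    """Fraction of images in the subset containing both tags of each pair."""
    n = len(images)
    return {
        (h, s): sum(1 for img in images if h in img and s in img) / n
        for h in highlighted
        for s in secondary
    }


# Toy training subset for one landmark label.
subset_l = [{"tower", "house"}, {"tower", "tree"}, {"tower"}, {"tower", "house"}]
table = cooccurrence(subset_l, {"tower"}, {"house", "tree"})
print(table[("tower", "house")], table[("tower", "tree")])
```

In this toy subset, "tower" and "house" appear together in two of four images and "tower" and "tree" in one of four.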
All tag classifiers and co-occurrence values between highlighted landmark and secondary regions are used to assign landmark tags for untagged images of either existing or new landmarks.
3.2. Incremental Suggestion
Typically, automatic tagging defines tags for untagged images using tag classifiers from learning processes. This approach is limited by a restriction on the size of tag models. Hence, the tagging precision and recall measure of automatic tagging are still fairly low. We propose incremental suggestion for improved tagging accuracy for new landmark images or different views of existing landmarks in a dataset, as illustrated in Figure 6.
Our framework combines tag classifiers with the co-occurrence between highlighted landmark regions and secondary regions to suggest tags. For untagged images, tags are suggested from both tag classifiers and the co-occurrence between highlighted landmark regions and secondary regions. We assume that if the untagged image is a new landmark image, each method would suggest a different tag.
Tags for untagged images are suggested using the following steps. First, the salient regions of the untagged image are detected and their features are extracted. Second, a region tag is assigned to each salient region in the untagged image and BoRtags represent the untagged image. Third, region tags and landmark labels are suggested for the untagged image using a tag classifier model with the suggestion referred to as $Tag_{model}$. Fourth, a co-occurrence matrix is constructed between the region tags of the untagged image. Let $P_{d_q}(r_i, r_j)$ be the co-occurrence matrix between region tag $r_i$ and region tag $r_j$ of the untagged image $d_q$, given as follows:
$$P_{d_q}(r_i, r_j) = P(r_i \cap r_j) \qquad (5)$$
Fifth, the co-occurrence of the untagged image is compared with the co-occurrence of each landmark label. Let $\mathrm{Score}(d_q, D_l)$ be the distance measure score between the co-occurrence of the untagged image and the co-occurrence of each landmark label as follows:
$$\mathrm{Score}(d_q, D_l) = \sum_{i,j=1}^{k} \left| P_{D_l}(r_i, r_j) - P_{d_q}(r_i, r_j) \right| \qquad (6)$$
Let $P_{D_l}(r_i, r_j)$ be the co-occurrence between region tag $r_i$ and region tag $r_j$ in landmark training subset $D_l$. The landmark label is suggested for untagged images based on the landmark labels with the lowest distance measure score between the co-occurrence of the untagged image and the co-occurrence of each landmark label. $Tag_{co\text{-}occurrence}$ refers to the landmark label suggested using the co-occurrence of each landmark label. Sixth, the landmark labels suggested using tag classifiers ($Tag_{model}$) are compared to those suggested using co-occurrence ($Tag_{co\text{-}occurrence}$). The untagged image obtains the landmark label suggested by the tag model if $Tag_{model} = Tag_{co\text{-}occurrence}$ or $\mathrm{Score}(d_q, D_l) < T_{new}$; otherwise, a new landmark label is suggested. $T_{new}$ is a threshold which is determined statistically from the maximum distance measure score between the co-occurrence of landmark images and the co-occurrence between highlighted landmark regions and secondary regions in each landmark dataset. Finally, if the untagged image is given a new landmark label, we represent the co-occurrence of the new landmark label using the BoRtags of the untagged image. In addition, if the untagged image is assigned a tag model in the database, we update the co-occurrence of the assigned landmark label.
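The fifth and sixth steps can be sketched as follows, as a hedged illustration rather than a definitive implementation. The sketch assumes that co-occurrence matrices are estimated from binary BoRtags vectors, that the score compared with the threshold is the lowest score over all landmark labels (the text does not state which label's score is used), and that the tag classifier's suggestion is supplied from outside. The function names, the `"NEW_LANDMARK"` marker, and the toy labels are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]


def cooccurrence_matrix(bortag_vectors: List[List[int]]) -> Matrix:
    """Fraction of images in which region tags i and j both occur."""
    n = len(bortag_vectors)
    k = len(bortag_vectors[0])
    return [[sum(v[i] and v[j] for v in bortag_vectors) / n for j in range(k)]
            for i in range(k)]


def distance_score(p_query: Matrix, p_label: Matrix) -> float:
    """Sum of absolute differences between two co-occurrence matrices."""
    return sum(abs(q - l)
               for row_q, row_l in zip(p_query, p_label)
               for q, l in zip(row_q, row_l))


def suggest_label(tag_model: str, query_vector: List[int],
                  label_matrices: Dict[str, Matrix], t_new: float) -> str:
    """Accept the classifier's label or flag the image as a new landmark."""
    p_query = cooccurrence_matrix([query_vector])
    scores = {label: distance_score(p_query, m)
              for label, m in label_matrices.items()}
    tag_cooccurrence = min(scores, key=scores.get)  # lowest distance wins
    if tag_model == tag_cooccurrence or scores[tag_cooccurrence] < t_new:
        return tag_model
    return "NEW_LANDMARK"


# Two known labels, each summarized by the BoRtags of its training images.
known = {
    "tokyo": cooccurrence_matrix([[1, 1, 0], [1, 1, 0]]),
    "eiffel": cooccurrence_matrix([[1, 0, 1], [1, 0, 1]]),
}
print(suggest_label("tokyo", [1, 1, 0], known, t_new=0.5))
print(suggest_label("eiffel", [0, 1, 1], known, t_new=0.5))
```

The first query matches the "tokyo" co-occurrence exactly and keeps the classifier's label; the second is far from both known labels and is flagged as a new landmark, whose own BoRtags could then seed a new co-occurrence entry.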
Figure 6. Incremental tag suggestion
We use the co-occurrence matrix $P_{D_l}(H_i, I_j)$ between the highlighted landmark regions and secondary regions in each landmark label to define the landmark dictionary. If $P_{D_l}(H_i, I_j) > 0$ then the highlighted landmark $H_i$ and the positive secondary region $I_j$ are defined in the landmark dictionary of landmark $L$. The landmark dictionary is shown in Figure 7.
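A landmark dictionary of this kind could be derived from per-label co-occurrence tables as in the sketch below. The data layout (a mapping from label to a table keyed by pairs of highlighted and secondary tag names), the function name, and the toy values are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

CoocTable = Dict[Tuple[str, str], float]


def landmark_dictionary(
        cooc_by_label: Dict[str, CoocTable]) -> Dict[str, Dict[str, List[str]]]:
    """Keep, per label, the tag pairs whose co-occurrence is positive."""
    dictionary: Dict[str, Dict[str, List[str]]] = {}
    for label, table in cooc_by_label.items():
        pairs = [pair for pair, p in table.items() if p > 0]
        dictionary[label] = {
            "highlighted": sorted({h for h, _ in pairs}),
            "secondary": sorted({s for _, s in pairs}),
        }
    return dictionary


# Hypothetical co-occurrence values for one landmark label.
cooc = {"Tokyo Tower": {("tower", "house"): 0.5, ("tower", "tree"): 0.0}}
print(landmark_dictionary(cooc))
```

Only the pair with a positive co-occurrence contributes entries, so "tree" is left out of the dictionary for this label.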
Figure 7. Landmark dictionary
4. RESULTS AND DISCUSSION
We assume that each landmark image contains a highlighted landmark region and other secondary regions and that new landmark images have differing highlighted landmark regions or secondary regions. We aim to evaluate the improvement in the accuracy of landmark tagging when using secondary regions and incremental tag suggestions in comparison to tagging using only the highlighted landmark regions and using only a tag model.
4.1. Data
We conducted experiments using 30 different famous landmarks from around the world. We collected 35,829 images from the Paris Dataset [26], Flickr, and Google by searching each particular landmark name. Each landmark label has about 1100 images and contains around three salient regions.
4.2. Features
We used a 512-dimensional GIST descriptor and a 2500-dimensional BOVW, which are derived from a SIFT descriptor for the feature extraction of salient regions. Each image is represented by 400-dimensional BoRtags.
4.3. Evaluation of Results
We measured the accuracy of the salient selection method and evaluated the performance of landmark tagging in terms of precision, recall, and the F1-measure compared to the ground-truth landmark tags using highlighted landmark regions and positive secondary regions. The accuracy of the salient selection is evaluated using a ground-truth dataset generated from user-selected highlighted landmark regions. Each detected image has fewer than three salient regions in the ground-truth images. Figure 8 shows examples of salient selection results. We grouped landmark images into five categories: temple, tower, church, castle, and art building. The average salient selection accuracy is 82.23% as shown in Table 1.
Figure 8. Salient detection
Table 1. Salient selection accuracy
Landmark Type    True Positive Rate (%)
Temple           85.50
Tower            78.31
Church           87.82
Castle           81.00
Art Building     78.51
Average          82.23
We evaluated tag suggestion using highlighted landmark and secondary regions with multiclass SVM classifiers with 10-fold cross validation (executed in WEKA). This served as a prediction model for landmark tags on 21 landmark images. We experimented with three sets of regions: 1) highlighted landmark only (E1), 2) entire image (E2), 3) highlighted landmark and secondary regions (E3). E1 and E2 use GIST and BOVW features as input. E3 is our proposed method and uses BoRtags and E4 is the ground-truth method. Table 2 shows the performance of the three methods.
Table 2. Performance of landmark tagging using tag classifier model of 21 landmark images
Method    Set of regions                              F1-measure
E1        Highlight landmark only                     0.730
E2        Whole image                                 0.831
E3        Highlight landmark and secondary salient    0.885
The results demonstrate an improvement of 0.155 in the F1-measure (15.5 percentage points, from 0.730 to 0.885) when using our proposed method (E3) over the method using the highlighted landmark region only (E1). Our method (E3) is also superior to using the entire image (E2), which reaches 0.831.
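The gains quoted above follow directly from Table 2. The short check below recomputes them as absolute differences in the F1-measure and, for reference, as relative gains; the helper names are hypothetical and the relative figures are derived here rather than reported in the paper.

```python
# F1-measures copied from Table 2.
f1 = {"E1": 0.730, "E2": 0.831, "E3": 0.885}


def absolute_gain(new: float, old: float) -> float:
    """Difference in F1-measure, rounded to three decimals."""
    return round(new - old, 3)


def relative_gain(new: float, old: float) -> float:
    """Gain relative to the old score, in percent, rounded to one decimal."""
    return round((new - old) / old * 100, 1)


print(absolute_gain(f1["E3"], f1["E1"]), relative_gain(f1["E3"], f1["E1"]))
print(absolute_gain(f1["E3"], f1["E2"]), relative_gain(f1["E3"], f1["E2"]))
```

The absolute differences are 0.155 over E1 and 0.054 over E2, which correspond to the 15.5 and roughly 5 percentage-point improvements discussed in this paper.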
Incremental tag suggestion for new landmark images is evaluated by comparing four methods: 1) tag co-occurrence (P1), 2) tag model with 21 landmarks (P2), 3) tag model with 30 landmarks (P3, baseline method), 4) our proposed method of incremental tag suggestion (P4). We experimented with 21 existing landmarks in the tag codebook and seven new landmarks. Our proposed method (P4) achieves an accuracy of 0.855, outperforming the other methods, as shown in Table 3.
Table 3. Accuracy of incremental tag suggestion for new landmark images
Method                                F1-measure
Tag co-occurrence (P1)                0.803
Tag model with 21 landmarks (P2)      0.641
Tag model with 30 landmarks (P3)      0.804
Incremental tag suggestion (P4)       0.855
Figure 9 shows accuracy of tag suggestion for seven new landmarks. The results of our proposed method (P4) are very similar to those of the baseline method (P3) for tags such as “Louvre” and “Pantheon Rome”. The results of our proposed method (P4) are better than those of the baseline method (P3) for tags such as “Moulin Rouge”, “Potala Palace”, and “St. Marks Basilica”.
5. CONCLUSION
We propose an incremental automatic landmark tagging framework that can tag the highlighted landmark regions and secondary regions of landmark images and videos to achieve improved tag accuracy and search performance. Our proposed method discriminates new landmark tags from existing landmark tags by using a combination of tag models and tag co-occurrence matrices. The tag models are constructed using multiclass SVM classifiers and salient detection extracts highlighted landmark regions and secondary regions for each landmark label. We use SIFT and GIST as features in the salient regions. Our experimental results show that combining the information from highlighted landmark regions and secondary regions can improve F1 performance by 15.5% compared with using only highlighted landmark regions and 5% compared with using salient regions from entire images. Our incremental tag suggestion can tag new landmark images with an accuracy of 0.855. We are currently improving our technique to increase the accuracy of new landmark image tagging to support large landmark image databases.
Figure 9. Accuracy of tag suggestion for seven new landmarks
[Plot: accuracy axis from 0 to 1; landmarks: Potala Palace, St Marks Basilica, Arc de Triomphe, Borobudur, La Defense, Louvre, Mecca saudi, Pantheon Rome, Moulin Rouge; series: P1 tag co-occurrence, P2 tag model with 21 landmarks, P3 tag model with 30 landmarks, P4 incremental tag suggestion]