International Journal of Electrical and Computer Engineering (IJECE)
Vol. 6, No. 1, February 2016, pp. 307~319
ISSN: 2088-8708, DOI: 10.11591/ijece.v6i1.8633
Journal homepage: http://iaesjournal.com/online/index.php/IJECE
The Selection of Useful Visual Words in Class-Imbalanced Image Classification
Sutasinee Chimlek*, Part Pramokchon**, Punpiti Piamsa-nga***
*Department of Computer Science and Information Technology, Faculty of Science, Naresuan University, Thailand
**Department of Computer Science, Faculty of Science, Maejo University, Chiang Mai, Thailand
***Department of Computer Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand
Article Info
Article history:
Received Jul 21, 2015
Revised Oct 31, 2015
Accepted Nov 16, 2015

ABSTRACT
The bag of visual words (BOVW) has recently been used for image classification in large datasets. A major problem of image classification using BOVW is high dimensionality: most features are usually irrelevant, and multi-view images of the same class produce different BOVW. Therefore, selecting significant visual words for the multi-view images in each class is an essential way to reduce the size of the BOVW while retaining high image classification performance. Many ranking-based feature scores yield low classification performance when the class distribution is imbalanced and each class contains multiple views. We propose a feature score based on the statistical t-test, a statistical evaluation of the difference between two sample means, to assess the discriminating power of each individual feature. The multi-class image classification performance of the proposed feature score is compared with that of four established feature scores: Document Frequency (DF), Mutual Information (MI), Pointwise Mutual Information (PMI) and Chi-square statistics (CHI). The results show that the average F1-measure on the Paris dataset and the SUN397 dataset using the proposed feature score is 92% and 94%, respectively, while none of the other feature scores exceeds 80%.
Keyword:
Bag of visual words
Chi-square statistics
Feature selection
Image classification
Mutual information
Copyright © 2016 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Punpiti Piamsa-nga,
Department of Computer Engineering, Faculty of Engineering,
Kasetsart University, Bangkok, Chatujak, Bangkok, 10900, Thailand.
Email: pp@ku.ac.th
1. INTRODUCTION
Owing to its efficiency and effectiveness, the bag of visual words (BOVW), proposed by Sivic and Zisserman [1], has become very well known in the fields of image retrieval and classification, e.g., on PASCAL [2] and SUN [3]. The BOVW represents local features and descriptors, together with geometry verification, and is motivated by an analogy with the "bag-of-words" representation used for text categorization. Several publications [4]-[7] address visual content representation with the BOVW because it is a promising method for visual content classification, annotation, and retrieval. Under the BOVW model, an image can be assigned to a class on the basis of its visual word histogram. Visual words are obtained by clustering in the descriptor space [1], so that all patches covered by one visual word represent the same part of the images. Representing each image with a BOVW, however, becomes unsuitable as the number of images grows large. Furthermore, multi-class image classification is useful for organizing a large number of images, which is increasing significantly. A supervised learning process is used to produce a classifier for a pre-defined number of classes based on the BOVW [9]. A major problem of image classification using BOVW is high dimensionality: most features are usually irrelevant, and the amount of data exceeds what can be stored in available memory. The size of the BOVW has a tremendous impact on the classification
performance [10]. Therefore, selecting significant visual words for each class is an essential way to reduce the size of the BOVW while retaining high image classification performance.
In general, feature selection approaches are used in image classification to reduce the dimension of the feature space and to improve the efficiency and precision of the classifier [10]-[12]. These approaches aim to select useful features from the original feature space according to some evaluation criterion. The feature ranking-based approach [13]-[15] is a well-known filter-based feature selection method for handling a very large number of features. In this approach, each feature is evaluated by a scoring measure, all features are sorted in descending order of score, and a small set of high-scoring features is kept as the optimal feature set while the rest are discarded. The feature ranking-based approach is simple, efficient and independent of the type of classifier; hence, it has been widely used in image classification. Many efficient and effective feature scores measure the relevance of each individual feature to the class, such as Document Frequency (DF) [13], Pointwise Mutual Information (PMI) [10], Mutual Information (MI) [10], and Chi-square statistics (CHI) [13]. Most of these scoring functions are based on the occurrence frequency of visual words in each class of the dataset, and real datasets usually have an imbalanced class distribution. As a result, these feature scores also yield low performance. On a class-imbalanced distribution, ranking-based selection gives excessive weight to visual words that are strongly related to large classes (majority classes) and tends to ignore visual words in small classes (minority classes) [16], [17].
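The ranking-based filter selection described above reduces to three steps: score, sort, truncate. The following is a minimal Python sketch, assuming the scores have already been computed by some criterion (DF, MI, CHI or a t-score) and are supplied as a dictionary keyed by a visual word id; the function name `select_top_k` is hypothetical.

```python
from typing import Dict, List


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Filter-style selection: rank visual words by score (descending), keep the k best.

    `scores` maps a visual word id to its feature score (DF, MI, CHI, t-score, ...).
    Ties are broken by word id so that the result is deterministic.
    """
    ranked = sorted(scores, key=lambda word: (-scores[word], word))
    return ranked[:k]


# Made-up scores for five visual words.
example_scores = {"vw1": 0.2, "vw2": 1.5, "vw3": 0.9, "vw4": 1.5, "vw5": 0.1}
print(select_top_k(example_scores, 3))  # ['vw2', 'vw4', 'vw3']
```

Because the selection never looks at a classifier, the same ranked list can be cut at any k, which is what makes the approach cheap for very large vocabularies.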
It is a basic notion that a visual word whose occurrence frequency in the images of a specific class is higher than in the other classes is desirable, because it carries more information and has more discriminating power than other visual words. Therefore, we apply and extend the t-score technique [18], which uses the statistical t-test to compute feature scores for multi-class imbalanced text classification. The t-score is based on the idea that a feature discriminates particularly well between two classes if its occurrence frequencies in the two classes are significantly different. We use the t-score to estimate the discriminating power of each feature: the higher the score of a feature, the more relevant it is for discriminating a specific class from the others. We combine the class-specific scores of a visual word in three ways, giving three t-scores: Max t-score, Averaged t-score, and Weighted Averaged t-score. Unlike texts, however, the images in a class can differ greatly in appearance, because they can be taken from multiple views, as shown in Figure 1. Therefore, each image class contains subclasses that represent its views, and each subclass has specific visual words. Thus, we propose a t-score for image classification that is effective for imbalanced class distributions and multi-view images in each class.
In this paper, we present a t-score method that further reduces the amount of BOVW data stored for each image subclass while still maintaining strong classification performance.
The rest of the paper is organized as follows. Section 2 presents related work. Section 3 presents the feature score for visual word selection in a subclass. Section 4 describes our experiments and discusses the results, and Section 5 concludes the paper.
Figure 1. Example images of multiple views
2. RELATED WORK
2.1. Bag of Visual Words (BOVW)
The BOVW method is a state-of-the-art approach that dominates image classification and retrieval for large databases [1]. Producing a BOVW involves three steps: feature extraction, feature quantization and BOVW generation. Feature extraction detects several local patches in each image and represents the patches as numerical vectors. Many interest point detectors [19], [20] and descriptors [21], [22] have been proposed for this purpose. The most widely used feature extractor in the bag-of-words model is the Scale-Invariant Feature Transform (SIFT) descriptor [23]. The SIFT descriptor computes a histogram of edge gradients in eight orientations for each tile of a 4 × 4 grid around a keypoint, resulting in a 128-dimensional vector for each keypoint; an image therefore yields one such vector per detected keypoint. The SIFT descriptor is robust to changes in intensity, rotation and scale, and partially to affine variations. Feature quantization produces a "visual word vocabulary" (analogous to a word dictionary), in which each visual word represents a group of similar patches. One simple method is to perform k-means clustering over all the vectors [1]; the visual words are then defined as the centers of the learned clusters, and the number of clusters is the visual word vocabulary size (analogous to the size of the word dictionary). Each patch in an image is mapped to a certain visual word. The final step, BOVW generation, converts the vector-represented patches into visual words (analogous to words in text documents) and represents each image by the histogram of its visual words.
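The quantization and generation steps can be illustrated with a small Python sketch: plain k-means (Lloyd's algorithm) builds the vocabulary, and nearest-centre assignment builds each image's histogram. It assumes descriptors are already extracted; the toy 2-D vectors stand in for 128-dimensional SIFT descriptors, the first k vectors seed the centres (a simplification of the usual random or k-means++ seeding), and all function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vector: Vector, centres: List[List[float]]) -> int:
    """Index of the closest centre, i.e. the visual word the vector maps to."""
    return min(range(len(centres)), key=lambda c: sq_dist(vector, centres[c]))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[List[float]]:
    """Lloyd's k-means; the first k points seed the centres (a simplification)."""
    centres = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centres)].append(p)
        for c, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return centres


def bovw_histogram(descriptors: List[Vector], vocabulary: List[List[float]]) -> List[int]:
    """Count how many of an image's descriptors map to each visual word."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[nearest(d, vocabulary)] += 1
    return hist


# Toy 2-D "descriptors" standing in for 128-D SIFT vectors.
train = [(0.0, 0.0), (10.0, 10.0), (0.2, 0.1), (9.8, 10.1), (0.1, 0.3), (10.2, 9.9)]
vocab = kmeans(train, k=2)
image = [(0.05, 0.0), (0.3, 0.2), (10.0, 9.7)]
print(bovw_histogram(image, vocab))  # [2, 1]
```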
2.2. Image Classification
Among supervised learning techniques, Bayesian classifiers [24] and Support Vector Machines (SVM) [24] are widely used. In image retrieval and classification, current visual word vocabulary sizes range from small, typically 1K words [25], to large, 1M words [26]. Because of their computational and storage requirements, large visual word vocabularies are difficult to manage in real-world scenarios that involve very large databases. Therefore, a method for visual word vocabulary reduction is imperative. Several methods try to reduce the visual word vocabulary significantly while keeping retrieval or classification performance constant. The common methods keep the visual words that appear frequently in the dataset [27], [28]. In contrast, Turcot and Lowe [29] use geometrically selected visual words, which are appropriate for constructing a reduced visual word vocabulary. However, with this technique the reduced vocabulary size depends on the geometric properties of the dataset images, which requires more computation. To avoid this geometric constraint, we propose selecting visual words from a pool of visual words that repeatedly appear in each subclass and are therefore robust to multiple views of the same scene or object.
2.3. Feature Scoring Function
In image classification, feature selection is potentially important because the visual word vocabulary is usually very large, yet it has received little attention in existing work. Feature scoring is therefore an important technique for reducing the visual word vocabulary size. It measures the relevance between each visual word and the class by analyzing general characteristics of the training examples, such as information, dependency, distance and consistency. A visual word with a high score is useful for classification. Several feature selection methods are used in image classification, such as DF, MI, PMI and CHI. We experiment with four feature scoring criteria used in image categorization.
2.3.1. Document Frequency (DF)
DF is the number of images in which a visual word occurs. Visual words with a small DF are usually non-informative for category prediction, so we choose the visual words whose DF is above a predefined threshold.
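A minimal Python sketch of DF-based selection, assuming each image is given as the set of visual word ids it contains and reading "above the threshold" as strictly greater; the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def document_frequency(images: Iterable[Set[str]]) -> Dict[str, int]:
    """DF(vw): number of images in which visual word vw occurs at least once."""
    df: Dict[str, int] = {}
    for words in images:
        for word in words:  # each image is a set, so a word counts once per image
            df[word] = df.get(word, 0) + 1
    return df


def select_by_df(images: List[Set[str]], threshold: int) -> List[str]:
    """Keep visual words whose DF is strictly above the threshold (sorted for readability)."""
    df = document_frequency(images)
    return sorted(word for word, count in df.items() if count > threshold)


images = [{"a", "b"}, {"a", "c"}, {"a", "b", "d"}]
print(sorted(document_frequency(images).items()))  # [('a', 3), ('b', 2), ('c', 1), ('d', 1)]
print(select_by_df(images, threshold=1))           # ['a', 'b']
```

Note that DF ignores class labels entirely, which is why it cannot favour words that are frequent only inside a small class.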
2.3.2. Mutual Information (MI)
MI has been used as a criterion for feature scoring in image classification. It can characterize both the relevance and the redundancy of features and measures the dependence between two random variables. Here it measures how much information the presence or absence of a visual word contributes to making the correct classification decision on image class c. The MI between a visual word vw and a class label c is calculated as

$$\mathrm{MI}(vw, c) = \sum_{vw \in \{0,1\}} \sum_{c \in \{0,1\}} p(vw, c) \log \frac{p(vw, c)}{p(vw)\, p(c)} \qquad (1)$$

where vw ∈ {0,1} indicates whether the visual word occurs in an image and c ∈ {0,1} indicates whether the image belongs to the class.
To score a visual word over all K image classes in the dataset, we use the average MI, $\mathrm{MI}_{avg}(vw)$, as in Eq. (2):

$$\mathrm{MI}_{avg}(vw) = \frac{1}{K} \sum_{i=1}^{K} \mathrm{MI}(vw, c_i) \qquad (2)$$
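Eqs. (1) and (2) can be computed directly from image counts. The following is a minimal Python sketch, assuming probabilities are estimated as relative frequencies over the training images, the natural logarithm is used (the base does not change the ranking), and 0 · log 0 is taken as 0; the function names are hypothetical.

```python
import math
from typing import Sequence, Set


def mutual_information(word: str, cls: str,
                       images: Sequence[Set[str]], labels: Sequence[str]) -> float:
    """Eq. (1): MI between the binary events 'image contains word' and 'image is in cls'."""
    n = len(images)
    # counts[(e_w, e_c)]: e_w = 1 if the word occurs, e_c = 1 if the image is in cls
    counts = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for words, label in zip(images, labels):
        counts[(int(word in words), int(label == cls))] += 1
    total = 0.0
    for (e_w, e_c), n_cell in counts.items():
        if n_cell == 0:
            continue  # 0 * log 0 is taken as 0
        p_joint = n_cell / n
        p_word = (counts[(e_w, 0)] + counts[(e_w, 1)]) / n
        p_class = (counts[(0, e_c)] + counts[(1, e_c)]) / n
        total += p_joint * math.log(p_joint / (p_word * p_class))
    return total


def mi_avg(word: str, images: Sequence[Set[str]], labels: Sequence[str]) -> float:
    """Eq. (2): MI averaged over the K classes present in `labels`."""
    classes = sorted(set(labels))
    return sum(mutual_information(word, c, images, labels) for c in classes) / len(classes)


images = [{"x"}, {"x"}, {"y"}, {"y"}]
labels = ["A", "A", "B", "B"]
print(mi_avg("x", images, labels))  # ln 2 ≈ 0.693: "x" perfectly separates A from B
print(mi_avg("z", images, labels))  # 0.0: "z" never occurs, so it carries no information
```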
2.3.3. Pointwise Mutual Information (PMI)
PMI is related to MI but refers to a single event, whereas MI is the average over all possible events. It measures the association between a visual word vw and a class label c, as in Eq. (3):

$$\mathrm{PMI}(vw, c) = \log \frac{p(vw, c)}{p(vw)\, p(c)} \qquad (3)$$

We use the average PMI over the K image classes in the dataset, $\mathrm{PMI}_{avg}(vw)$, as in Eq. (4):

$$\mathrm{PMI}_{avg}(vw) = \frac{1}{K} \sum_{i=1}^{K} \mathrm{PMI}(vw, c_i) \qquad (4)$$
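A matching sketch for Eqs. (3) and (4), under the same relative-frequency assumption as for MI, with p(vw, c) taken as the probability that an image contains vw and belongs to c. A visual word that never co-occurs with some class gets PMI = −∞ for that class, which also drives its average to −∞; practical implementations often add smoothing, which is not shown. The function names are hypothetical.

```python
import math
from typing import Sequence, Set


def pmi(word: str, cls: str, images: Sequence[Set[str]], labels: Sequence[str]) -> float:
    """Eq. (3): log p(vw, c) / (p(vw) p(c)) for the event 'word occurs and image is in cls'."""
    n = len(images)
    n_word = sum(word in words for words in images)
    n_class = sum(label == cls for label in labels)
    n_both = sum(word in words and label == cls for words, label in zip(images, labels))
    if n_both == 0:
        return float("-inf")  # word and class never co-occur (no smoothing applied)
    return math.log((n_both / n) / ((n_word / n) * (n_class / n)))


def pmi_avg(word: str, images: Sequence[Set[str]], labels: Sequence[str]) -> float:
    """Eq. (4): PMI averaged over the K classes present in `labels`."""
    classes = sorted(set(labels))
    return sum(pmi(word, c, images, labels) for c in classes) / len(classes)


images = [{"x"}, {"x"}, {"x", "y"}, {"y"}]
labels = ["A", "A", "B", "B"]
print(pmi("x", "A", images, labels))  # log(4/3) ≈ 0.288
print(pmi("x", "B", images, labels))  # log(2/3) ≈ -0.405
print(pmi("y", "A", images, labels))  # -inf: "y" never occurs in class A
```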
2.3.4. Chi-square Statistics (CHI)
The χ² test is used in statistics to test the independence of two events. We use it to test whether the occurrence of a specific visual word and the occurrence of a specific class are independent, and we then rank each visual word by its score. Let χ²(vw, c_i) be the CHI between a specific visual word vw and the label of image class c_i. We use the average over the K image classes in the dataset, $\chi^2_{avg}(vw) = \frac{1}{K} \sum_{i=1}^{K} \chi^2(vw, c_i)$.
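For a single class, χ²(vw, c) is usually computed from the 2 × 2 contingency table of "contains vw" versus "belongs to c". The formula is not spelled out here, so the sketch below assumes the standard form N(N11·N00 − N10·N01)² / ((N11+N01)(N11+N10)(N10+N00)(N01+N00)) and scores a degenerate table as 0; the function names are hypothetical.

```python
from typing import Sequence, Set


def chi_square(word: str, cls: str, images: Sequence[Set[str]], labels: Sequence[str]) -> float:
    """chi^2(vw, c) from the 2 x 2 table of 'contains word' vs 'belongs to cls'."""
    n11 = n10 = n01 = n00 = 0
    for words, label in zip(images, labels):
        has_word, in_class = word in words, label == cls
        if has_word and in_class:
            n11 += 1
        elif has_word:
            n10 += 1
        elif in_class:
            n01 += 1
        else:
            n00 += 1
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        return 0.0  # degenerate table: the word or the class covers all images or none
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def chi_square_avg(word: str, images: Sequence[Set[str]], labels: Sequence[str]) -> float:
    """Average of chi^2(vw, c_i) over the K classes present in `labels`."""
    classes = sorted(set(labels))
    return sum(chi_square(word, c, images, labels) for c in classes) / len(classes)


images = [{"x"}, {"x"}, {"y"}, {"y"}]
labels = ["A", "A", "B", "B"]
print(chi_square_avg("x", images, labels))  # 4.0: occurrence of "x" fully determines the class
```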
The results in [10] indicated that CHI significantly outperforms MI, PMI and DF. However, all of these feature scores perform poorly in image classification on class-imbalanced datasets and when the visual words of a class vary in appearance. These scores favor large classes over small classes, so informative visual words in large classes have a higher chance of being selected than visual words in small classes, whose classification performance has also been shown to be low. Furthermore, these scores ignore the variation of visual words within the same scenes and objects, which is important for improving image classification performance. Therefore, we propose a feature score for visual words that accounts for class-imbalanced datasets and for different views of scenes and objects. Our approach aims to take these variations into account in the visual word vocabulary reduction process.
3. RESEARCH METHOD
The proposed feature score is based on the assumption that the difference in the occurrence frequency of visual words between a specific image class and the other image classes can be evaluated and used as the feature score. We handle visual word selection for the different views of scenes and objects in each image class by grouping the images of each class in a preliminary step: cluster analysis groups the images of each class so that images in the same group have more similar views than images in other groups. The clustering is performed with the Expectation Maximization (EM) algorithm [30]. EM clustering estimates the means and standard deviations of each cluster so as to maximize the likelihood of the observed data, approximating the observed distribution of values as a mixture of different distributions, one per cluster.
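The EM grouping step can be sketched as fitting a Gaussian mixture with diagonal covariances by EM and then assigning each image to its most probable component, i.e. its subclass. This is a minimal pure-Python sketch, not the implementation used in the paper. It assumes each image is already a fixed-length feature vector, the number of subclasses k is supplied by the caller, the initial means are evenly spaced input points, and variances are floored at a small constant. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def em_subclasses(points: List[Sequence[float]], k: int, iters: int = 50,
                  min_var: float = 1e-6) -> List[int]:
    """Group one class's image vectors into k subclasses with EM for a diagonal Gaussian mixture.

    Returns, for each point, the index of the component with the highest responsibility.
    """
    n, dim = len(points), len(points[0])
    step = max(1, n // k)
    means = [list(points[min(c * step, n - 1)]) for c in range(k)]  # evenly spaced seeds
    variances = [[1.0] * dim for _ in range(k)]
    weights = [1.0 / k] * k
    resp = [[1.0 / k] * k for _ in range(n)]
    for _ in range(iters):
        # E-step: responsibility of component c for point x, computed in log space.
        for i, x in enumerate(points):
            logs = []
            for c in range(k):
                ll = math.log(weights[c])
                for d in range(dim):
                    var = variances[c][d]
                    ll -= 0.5 * (math.log(2 * math.pi * var) + (x[d] - means[c][d]) ** 2 / var)
                logs.append(ll)
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]  # log-sum-exp trick for stability
            total = sum(exps)
            resp[i] = [e / total for e in exps]
        # M-step: re-estimate mixing weights, means and (floored) variances.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            if nc < 1e-12:
                continue  # an empty component keeps its previous parameters
            weights[c] = nc / n
            means[c] = [sum(r[c] * x[d] for r, x in zip(resp, points)) / nc for d in range(dim)]
            variances[c] = [
                max(min_var, sum(r[c] * (x[d] - means[c][d]) ** 2 for r, x in zip(resp, points)) / nc)
                for d in range(dim)
            ]
    return [max(range(k), key=lambda c: r[c]) for r in resp]


# Made-up 2-D image vectors of one class showing two "views".
views = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
print(em_subclasses(views, k=2))  # [0, 0, 0, 1, 1, 1]
```

Soft responsibilities are kept during fitting and hardened only at the end, which is what distinguishes EM clustering from k-means.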
Let $D = \{D_1, D_2, \ldots, D_{|L|}\}$ be the set of image data, where $l \in L$; let $V = \{vw_1, \ldots, vw_{|V|}\}$ be the visual word vocabulary, i.e., the set of representatives of D; let $C = \{c_1, c_2, \ldots, c_K\}$ be the set of K image classes of all images in dataset D; and let each image class $c_i$ be grouped by the EM algorithm into subclasses $c_{i,j}$, where $c_i = \{c_{i,1}, c_{i,2}, \ldots, c_{i,j}\}$.
The images are converted into visual word occurrences weighted by Term Frequency and Inverse Document Frequency (tf-idf) [10]. The tf-idf weight can represent the semantic content of the images. A high tf-idf weight results from a high frequency of the visual word in the image and a low number of images containing the visual word in the image dataset; the weights hence tend to filter out non-informative visual words.
The tf-idf weight is defined as $\mathrm{tfidf}(vw_i, d_j) = \mathrm{tf}(vw_i, d_j) \cdot \mathrm{idf}(vw_i)$. The term frequency (tf) is the frequency of a visual word in an image. We use the normalized frequency of visual word $vw_i$ in image $d_j$, defined as $\mathrm{tf}(vw_i, d_j) = f_{vw_i,d_j} / \max_i f_{vw_i,d_j}$, where $f_{vw_i,d_j}$ is the number of occurrences of visual word $vw_i$ in image
$d_j$. An inverse document frequency (idf) factor is incorporated, which decreases the weight of visual words that occur very frequently in the image set and increases the weight of visual words that occur rarely. The idf is defined as $\mathrm{idf}(vw_i) = \log(|D| / n_{vw_i})$, where $n_{vw_i}$ is the number of images in which visual word $vw_i$ occurs. The tf-idf weights are normalized by cosine normalization, defined as

$$\frac{\mathrm{tfidf}(vw_i, d_j)}{\sqrt{\sum_{i=1}^{|V|} \left(\mathrm{tfidf}(vw_i, d_j)\right)^2}}$$

Hence, let $w(vw_i, d_j)$ be the normalized tf-idf weight of visual word $vw_i$ in image $d_j$. An image $d_j$ is represented as a feature weight vector $d_j = [w(vw_1, d_j), \ldots, w(vw_{|V|}, d_j)]$.
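The weighting pipeline above (normalized tf, idf, cosine normalization) can be sketched in Python as follows. It assumes each image is given as the list of visual word ids detected in it (with repeats), uses the natural logarithm, and assumes tf is normalized by the largest visual word count in the same image; the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(images: List[List[str]]) -> List[Dict[str, float]]:
    """Cosine-normalised tf-idf weights w(vw_i, d_j) for every image d_j.

    images[j] lists the visual word ids detected in image d_j (repeats allowed).
    Assumption: tf is the raw count divided by the largest count in the same image.
    """
    n_images = len(images)
    counts = [Counter(words) for words in images]
    # n_vw: number of images containing each word (iterating a Counter yields each key once)
    doc_freq = Counter(word for c in counts for word in c)
    vectors = []
    for c in counts:
        max_f = max(c.values()) if c else 1
        raw = {w: (f / max_f) * math.log(n_images / doc_freq[w]) for w, f in c.items()}
        norm = math.sqrt(sum(v * v for v in raw.values()))
        # An image whose words all occur in every image gets an all-zero vector.
        vectors.append({w: (v / norm if norm > 0 else 0.0) for w, v in raw.items()})
    return vectors


images = [["a", "a", "b"], ["b", "c"], ["b"]]
for j, vec in enumerate(tfidf_vectors(images)):
    print(j, vec)
# "b" occurs in every image, so idf(b) = log(3/3) = 0 and its weight is 0.
```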
To address the class imbalance, we use the statistical t-test, a commonly used method for the statistical evaluation of the difference between two sample means [31]. The t-test can be applied even when the two samples differ greatly in size: it analyzes their means and standard deviations under the assumption that both distributions are normal, and the form used here (Welch's form) does not assume that the two variances are equal. The t-test determines whether the mean tf-idf weight of a visual word differs significantly between a specific subclass and the other subclasses. The basic idea is that a visual word whose mean tf-idf weight over the images in a specific subclass is significantly higher than that over the other subclasses is highly discriminative, because it carries more information about that subclass. The proposed t-test score is defined as follows:
$$\mathrm{tscore}_{subclass}(vw_i, c_{k,j}) = \frac{\bar{w}(vw_i, c_{k,j}) - \bar{w}(vw_i, \bar{c}_{k,j})}{S_{subclass}} \qquad (5)$$
where $\bar{w}(vw_i, c_{k,j})$ is the sample mean of the tf-idf weight of visual word $vw_i$ over the images in a specific subclass $c_{k,j}$ of class $c_k$, with $c_{k,j} \in c_k$ and $c_k \in C$, and $\bar{w}(vw_i, \bar{c}_{k,j})$ is the sample mean of the tf-idf weight of $vw_i$ over the images in the other subclasses, $\bar{c}_{k,j} \subset C$. $S_{subclass}$ is the standard error of the difference between the two sample means, calculated as follows:
$$S_{subclass} = \sqrt{\frac{S^2(vw_i, c_{k,j})}{N_{c_{k,j}}} + \frac{S^2(vw_i, \bar{c}_{k,j})}{N_{\bar{c}_{k,j}}}} \qquad (6)$$
where $S^2(vw_i, c_{k,j})$ is the variance of the tf-idf weight of visual word $vw_i$ over the images in a specific subclass $c_{k,j}$ of class $c_k$, and $S^2(vw_i, \bar{c}_{k,j})$ is the variance of the weight of $vw_i$ over the images in the other subclasses. $N_{c_{k,j}}$ is the number of images in subclass $c_{k,j}$, and $N_{\bar{c}_{k,j}}$ is the number of images in the other subclasses.
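Eqs. (5) and (6) amount to a Welch-type two-sample t statistic. The following is a minimal Python sketch using the standard-library `statistics` module. It assumes sample variances (n − 1 denominator) and at least two images on each side, and it adopts a hypothetical convention not stated in the text: the score is ±∞ when both samples are constant but their means differ. The function name is hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence


def tscore_subclass(in_sub: Sequence[float], others: Sequence[float]) -> float:
    """Eqs. (5)-(6): Welch-type t statistic for one visual word.

    in_sub: tf-idf weights w(vw_i, d) for the images d in subclass c_{k,j} (at least 2 values)
    others: tf-idf weights for the images in all other subclasses (at least 2 values)
    """
    diff = mean(in_sub) - mean(others)
    s = math.sqrt(variance(in_sub) / len(in_sub) + variance(others) / len(others))  # Eq. (6)
    if s == 0:
        # Hypothetical convention: both samples constant, so any mean difference is decisive.
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / s  # Eq. (5)


weights_in = [0.8, 0.9, 1.0]                   # word is strong in this subclass
weights_out = [0.1, 0.2, 0.3, 0.1, 0.2, 0.3]   # and weak elsewhere
print(round(tscore_subclass(weights_in, weights_out), 2))  # 10.25
```

Dividing by the standard error rather than comparing raw means is what keeps a small subclass from being drowned out: each side's variance is scaled by its own sample size.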
A high t-test score for visual word $vw_i$ indicates high discriminating power, because the occurrence frequency of $vw_i$ in the specific subclass $c_{k,j}$ is statistically significantly different from that in the other subclasses.
The t-test score is defined locally with respect to a specific subclass $c_{k,j}$. To assess the value of a visual word $vw_i$ globally in each class $c_k$, we compute the following:
$$\mathrm{tscore}(vw_i, c_k) = \frac{1}{|c_k|} \sum_{j=1}^{|c_k|} \mathrm{tscore}_{subclass}(vw_i, c_{k,j}) \qquad (7)$$
We combine the class-specific scores into a single score per visual word in three alternative ways, giving three t-test scores. The Max-tscore, $\mathrm{tscore}_{max}(vw_i)$, rewards the maximum significance of a visual word occurring in one class against the other classes; the Average-tscore, $\mathrm{tscore}_{avg}(vw_i)$, gives all classes equal weight regardless of the number of images they contain; and the Weighted Average-tscore, $\mathrm{tscore}_{wavg}(vw_i)$, weights each class by its probability $P(c_k)$, so that the weight of each class varies with its size. The three t-test scores are defined as follows:
$$\mathrm{tscore}_{max}(vw_i) = \max_{k=1}^{|C|} \mathrm{tscore}(vw_i, c_k) \qquad (8)$$

$$\mathrm{tscore}_{avg}(vw_i) = \frac{1}{|C|} \sum_{k=1}^{|C|} \mathrm{tscore}(vw_i, c_k) \qquad (9)$$

$$\mathrm{tscore}_{wavg}(vw_i) = \sum_{k=1}^{|C|} P(c_k)\, \mathrm{tscore}(vw_i, c_k) \qquad (10)$$
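Eqs. (7)-(10) only aggregate numbers that have already been computed. The sketch below takes the subclass t-scores of each class for one visual word, averages them per class (reading Eq. (7) as an average over subclasses), and then applies the three combinations. It assumes P(c_k) is the fraction of training images in class c_k; the data layout and function names are hypothetical.

```python
from typing import Dict, List


def class_tscore(subclass_scores: List[float]) -> float:
    """Eq. (7): average the subclass t-scores of one class c_k."""
    return sum(subclass_scores) / len(subclass_scores)


def combine_tscores(subclass_scores: Dict[str, List[float]],
                    class_sizes: Dict[str, int]) -> Dict[str, float]:
    """Eqs. (8)-(10) for one visual word.

    subclass_scores[c] holds tscore_subclass(vw_i, c_{k,j}) for every subclass j of class c.
    class_sizes[c] is the number of training images in class c, used for P(c_k) (assumption).
    """
    per_class = {c: class_tscore(s) for c, s in subclass_scores.items()}
    total_images = sum(class_sizes.values())
    return {
        "max": max(per_class.values()),                                  # Eq. (8)
        "avg": sum(per_class.values()) / len(per_class),                 # Eq. (9)
        "wavg": sum(class_sizes[c] / total_images * t                    # Eq. (10)
                    for c, t in per_class.items()),
    }


scores = {"A": [4.0, 2.0], "B": [-1.0], "C": [0.5, 0.5, 0.5]}
sizes = {"A": 100, "B": 300, "C": 100}   # A is a small class, B a large one
print(combine_tscores(scores, sizes))
# max = 3.0, avg ≈ 0.833, wavg ≈ 0.1
```

In this made-up example the word is strongly tied to the small class A: the Max combination keeps that signal (3.0), while the size-weighted average is pulled toward the large class B and drops to about 0.1.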
4. RESULTS AND ANALYSIS
4.1. Dataset
We use two datasets to evaluate the feature scoring of visual words and its use in image classification: the Paris dataset [32] and the SUN397 dataset [33]. The Paris dataset contains 6,300 high-resolution (1024 × 768) images obtained from Flickr by querying the associated text tags for famous Paris landmarks, such as "Eiffel Tower Paris" or "Louvre Paris". It consists of 12 landmark scenes in Paris. The number of images per landmark scene varies, from as few as ~150 images for "Eiffel Tower" to ~1400 images for "General Paris", so the distribution of images over classes is imbalanced. The average, standard deviation, and coefficient of variation (CV) of the number of images per class are 525, 311.68 and 1.51, respectively. Example images from this dataset are shown in Figure 2.
Figure 2. Example images of the Paris dataset
The SUN397 dataset is taken from the extensive Scene UNderstanding (SUN) database and contains 397 categories and 130,519 images (200 × 200). Example categories include "abbey", "grotto", "ossuary", "salt plain", "signal box", "sinkhole", "sunken garden" and "winners circle". The number of images per category varies from as few as ~100 to ~2300, so the distribution of images over classes is imbalanced. The average, standard deviation, and coefficient of variation (CV) of the number of images per class are 271.79, 251.62 and 1.50, respectively. Example images from this dataset are shown in Figure 3.
Figure 3. Example images of the SUN397 dataset
4.2. Vocabulary Size
We first determine the best visual word vocabulary size for each dataset, because the vocabulary size affects classification performance. We use a binary BOVW for each image. Let $BOVW_{vw_i,d_j}$ be the bag-of-visual-words entry of visual word $vw_i$ for image $d_j$: if visual word $vw_i$ occurs in image $d_j$, then $BOVW_{vw_i,d_j} = 1$; otherwise, $BOVW_{vw_i,d_j} = 0$. We vary the vocabulary from 100 to 50,000 visual words and classify the Paris dataset and the SUN397 dataset with Support Vector Machines (SVM). Figure 4 shows the relationship between the classification performance, measured by the F1-measure, and the size of the visual word vocabulary. The optimal vocabulary size is approximately 5,000 visual words for the Paris dataset and approximately 20,000 visual words for the SUN397 dataset. We therefore examine the feature score techniques at the optimal vocabulary size for each dataset.
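The binary BOVW used in this experiment can be sketched in a few lines of Python, assuming each image is given as the list of visual word ids detected in it and the vocabulary as an ordered list of ids; the function name is hypothetical.

```python
from typing import List, Sequence


def binary_bovw(images: Sequence[Sequence[str]], vocabulary: Sequence[str]) -> List[List[int]]:
    """Binary BOVW matrix: row j is image d_j, column i is 1 if vw_i occurs in d_j, else 0."""
    rows = []
    for words in images:
        present = set(words)  # only whether a word occurs matters, not how often
        rows.append([1 if vw in present else 0 for vw in vocabulary])
    return rows


images = [["vw1", "vw1", "vw3"], ["vw2"]]
print(binary_bovw(images, ["vw1", "vw2", "vw3"]))  # [[1, 0, 1], [0, 1, 0]]
```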
Figure 4. Classification performance for different sizes of the visual-word vocabulary on the two datasets
4.3. Classifier
The performance of image classification is used to evaluate the effectiveness of feature selection. We use two classifiers in the experiment: SVM with a linear kernel, and Naïve Bayes. SVM finds the maximum-margin hyperplane between two classes by applying an optimization technique to the training data. The decision boundary is defined by a subset of the training data, the so-called support vectors. SVM with a linear kernel has shown good generalization performance and is robust for high-dimensional image classification. Although the Naïve Bayes classifier has lower accuracy than the SVM classifier, it makes it easy to estimate the probability that a sample belongs to a particular class. The experiments are performed using the LibSVM package and the Naïve Bayes package of WEKA [34], with the default parameter values.
[Figure 4 plot: F1-measure (0 to 100) versus vocabulary size (500 to 50,000) for the Paris dataset and the SUN397 dataset]
4.4. Evaluation
We use the F1 measure to evaluate the performance of the classifiers. It combines the precision and the recall of the test: precision is the number of correct positive results divided by the number of all positive results, and recall is the number of correct positive results divided by the number of positive results that should have been returned. The F1 score is the harmonic mean of precision and recall; it reaches its best value at 1 and its worst at 0. The F1 measure is calculated as:
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (11)$$
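A minimal Python sketch of Eq. (11) for multi-class results computes F1 per class, treating each class as the positive label in turn, and then takes an unweighted (macro) average. Whether the "average F1-measure" reported here is macro- or size-weighted is not stated, so the macro average is an assumption; the function names are hypothetical.

```python
from typing import Dict, Sequence


def f1_per_class(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Eq. (11) for every class, treating that class as the positive label."""
    scores: Dict[str, float] = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)  # all positive results
        actual = sum(t == c for t in y_true)     # positive results that should be returned
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        scores[c] = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
    return scores


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    per_class = f1_per_class(y_true, y_pred)
    return sum(per_class.values()) / len(per_class)


y_true = ["A", "A", "B", "B"]
y_pred = ["A", "B", "B", "B"]
print(f1_per_class(y_true, y_pred))  # A ≈ 0.667 (P = 1, R = 0.5), B ≈ 0.8 (P ≈ 0.667, R = 1)
print(macro_f1(y_true, y_pred))      # ≈ 0.733
```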
4.5. Results
4.5.1. Performance on SVM and Naïve Bayes
The number of subclasses produced by the EM algorithm ranges from two to five for the two datasets. We compare the image classification performance of 10 feature score methods: 1) Document Frequency (DF), 2) Mutual Information (MI), 3) Pointwise Mutual Information (PMI), 4) Chi-square statistics (CHI), 5) Max-tscore, 6) Average-tscore, 7) Weighted Average-tscore, 8) Max-tscore with subclass (Max-tscore-sub), 9) Average-tscore with subclass (Average-tscore-sub), and 10) Weighted Average-tscore with subclass (Waverage-tscore-sub). Figures 5 to 8 show the F1 measure results of classification with Support Vector Machines and Naïve Bayes on the Paris dataset (Figures 5 and 6) and the SUN397 dataset (Figures 7 and 8).
Figure 5. F1 measure results using SVM on the Paris dataset
On the Paris dataset (Figure 5), the F1 measures of SVM using Max-tscore-sub and Max-tscore are higher than those of the other feature scores.
Figure 6 shows the F1 measure of Naïve Bayes. Max-tscore-sub outperforms the other feature scores, and Weighted Average-tscore yields results slightly lower than those of the other feature scores. PMI performs very poorly for image classification.
The Weighted Average-tscore results are the worst. This score is an average of the t-scores in which the weight of each class varies with the class size. The unweighted Average-tscore does not depend on class size in this way.
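A toy calculation shows why weighting by class size penalizes visual words that discriminate small classes. The per-class t-scores and class sizes below are made-up numbers, not results from these experiments.

```python
# Made-up per-class t-scores of one visual word and made-up class sizes.
tscores = {"majority": 2.0, "minority": 9.0}
sizes = {"majority": 900, "minority": 100}

n = sum(sizes.values())
average = sum(tscores.values()) / len(tscores)                # every class counts equally
weighted = sum(sizes[c] / n * t for c, t in tscores.items())  # classes count by size
print(average, weighted)
```

The unweighted average is 5.5. The size-weighted average falls to 2.7, because the majority class carries 90% of the weight even though the word is far more discriminative for the minority class.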
The F1 measure of SVM with Max-tscore-sub is highest (92.00) when 400 features are selected. The F1 measure of Naïve Bayes with Max-tscore-sub reaches its maximum (89.00) when 600 visual words are selected.
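The curves vary the number of selected visual words. A minimal sketch of such a sweep rests on two assumptions: the vocabulary is ranked by a precomputed feature score, and `evaluate` is a hypothetical callback that trains a classifier on the selected words and returns its F1 measure.

```python
def select_top_k(scores, k):
    """Indices of the k visual words with the highest feature score (ties broken by index)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return order[:k]


def sweep(scores, evaluate, ks):
    """Evaluate each vocabulary size k; return (best_k, best_value, all results)."""
    results = {k: evaluate(select_top_k(scores, k)) for k in ks}
    best_k = max(results, key=results.get)
    return best_k, results[best_k], results
```

For example, `ks=range(200, 2001, 200)` would cover the 200 to 2000 range plotted for the Paris dataset. The best k reported above (400 or 600) is simply the argmax of such a sweep.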
[Plot: F1 measure (0 to 100) versus number of visual words (200 to 2000), the Paris dataset; series: DF, MI, PMI, CHI, Max-tscore, Average-tscore, Weighted Average-tscore, Max-tscore-sub, Average-tscore-sub, Waverage-tscore-sub]
Figure 6. F1 measure result using Naïve Bayes on the Paris dataset
For both classifiers on the SUN397 dataset, Max-tscore-sub gives the best F1 measure results. The F1 measure of Max-tscore-sub using SVM is highest (94.00) when 1800 features are selected, as shown in Figure 7. The F1 measure of Max-tscore-sub using Naïve Bayes is highest (91.00) when 2400 visual words are selected, as shown in Figure 8. Max-tscore-sub therefore achieves strong classification performance on both datasets and with both classifiers. This is because the feature score maximizes the significance of a feature's relevance to one subclass of each class against the other classes, rather than averaging the feature score over all classes. The feature score is thus suitable for situations where class sizes differ considerably and multiple views exist within each class.
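The idea behind Max-tscore-sub can be sketched under the same assumption as a Welch t-statistic taken in absolute value. Each subclass found by EM is compared only with the images of the other classes, and the visual word keeps its largest score over all subclasses. The function names and toy data are hypothetical, and the paper's exact formula may differ.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t-statistic; each sample needs at least two values."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def max_tscore_sub(values, labels, subclasses):
    """values[i]: word value in image i; labels[i]: class; subclasses[i]: subclass within that class."""
    best = 0.0
    for c in set(labels):
        # Reference sample: images of all other classes (the rest of class c is excluded).
        others = [v for v, y in zip(values, labels) if y != c]
        for s in set(sub for sub, y in zip(subclasses, labels) if y == c):
            members = [v for v, y, sub in zip(values, labels, subclasses) if y == c and sub == s]
            if len(members) < 2:
                continue  # sample variance is undefined for a single image
            best = max(best, abs(welch_t(members, others)))
    return best


# Toy data: class "a" has two views; the word is frequent only in view 0.
values = [9, 10, 9, 10, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
labels = ["a"] * 8 + ["b"] * 6
subclasses = [0, 0, 0, 0, 1, 1, 1, 1] + [0] * 6
print(max_tscore_sub(values, labels, subclasses))  # subclass-level score
print(abs(welch_t(values[:8], values[8:])))        # class-level score for class "a"
```

In this toy case, the view in which the word is frequent yields a t-score above 20. Pooling both views of class "a" into one sample dilutes the score to about 2.6.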
Figure 7. F1 measure result using SVM on the SUN397 dataset
[Plot: F1 measure (0 to 100) versus number of visual words (200 to 2000), the Paris dataset; series: DF, MI, PMI, CHI, Max-tscore, Average-tscore, Weighted Average-tscore, Max-tscore-sub, Average-tscore-sub, Waverage-tscore-sub]
[Plot: F1 measure (0 to 100) versus number of visual words (200 to 10000), the SUN397 dataset; series: DF, MI, PMI, CHI, Max-tscore, Average-tscore, Weighted Average-tscore, Max-tscore-sub, Average-tscore-sub, Waverage-tscore-sub]
Figure 8. F1 measure result using Naïve Bayes on the SUN397 dataset
4.5.2. Quality of Selected Features
We divide each of the two datasets into three groups of classes based on class size: majority, moderate, and minority. Figure 9 and Figure 10 show the average F1 measures on the Paris dataset and the SUN397 dataset, respectively, for various feature scores. These figures show how each score affects the classification of the majority, moderate, and minority classes.
We compare Max-tscore-sub with the other feature scores using the SVM classifier. Our experiment shows that Max-tscore-sub clearly outperforms all the other feature scores in every class group, especially for the very small (minority) classes. We conclude that our score increases the F1 measure of the minority classes without sacrificing the F1 measure of the majority classes.
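This section does not give the size thresholds used to form the three groups. The sketch below assumes that classes are sorted by image count and split into thirds, and that the per-class F1 measures are then averaged within each group. The helper names are hypothetical.

```python
def group_by_size(class_sizes):
    """Split classes into majority/moderate/minority thirds by image count (assumed rule)."""
    ordered = sorted(class_sizes, key=lambda c: class_sizes[c], reverse=True)
    n = len(ordered)
    cut1, cut2 = n // 3, 2 * n // 3
    return {
        "majority": ordered[:cut1],
        "moderate": ordered[cut1:cut2],
        "minority": ordered[cut2:],
    }


def group_average_f1(per_class_f1, groups):
    """Average the per-class F1 measures inside each non-empty group."""
    return {g: sum(per_class_f1[c] for c in cs) / len(cs) for g, cs in groups.items() if cs}
```

Averaging within each group, rather than over all images, keeps the small minority classes from being swamped by the majority classes. This separation is what allows a gain on the minority group to be seen.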
Figure 9. F1 measure results of the SVM classifier for each class group on the Paris dataset when various feature scores are used
[Plot: F1 measure (0 to 100) versus number of visual words (200 to 10000), the SUN397 dataset; series: DF, MI, PMI, CHI, Max-tscore, Average-tscore, Weighted Average-tscore, Max-tscore-sub, Average-tscore-sub, Waverage-tscore-sub]
[Bar chart: F1 measure (0 to 100) for the majority, moderate and minority class groups, the Paris dataset; series: DF, MI, PMI, CHI, Max-tscore-sub]