TELKOMNIKA, Vol. 14, No. 4, December 2016, pp. 1480~1492
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v14i4.4646
Received August 28, 2016; Revised October 11, 2016; Accepted October 26, 2016
Multi Features Content-Based Image Retrieval Using
Clustering and Decision Tree Algorithm
Kusrini Kusrini*1, M. Dedi Iskandar2, Ferry Wahyu Wibowo3
STMIK AMIKOM Yogyakarta, Jl. Ringroad Utara Condong Catur Depok Sleman Yogyakarta Indonesia,
Telp: +62274884201/fax: +62274884208
*Corresponding author, e-mail: kusrini@amikom.ac.id1, dedi.s.kom@gmail.com2, ferry.w@amikom.ac.id3
Abstract
Classification can be performed using the decision tree approach. Previous research on decision tree classification has mostly been intended to classify text data. This paper introduces a classification application for content-based image retrieval (CBIR) with multiple attributes using a decision tree. The attributes used were the visual features of the image, i.e., color moments (orders 1, 2 and 3), image entropy, energy and homogeneity. The k-means clustering algorithm was used to categorize each attribute. The categorized data were then built into a decision tree using C4.5. To show the concept in application, this research built an application with the main features of case data input, case list, training process and testing process for classification. Tests on 150 rontgen image data showed a training data classification truth value of 75.33% and a testing data classification truth value of 55.7%.
Keywords: Image, Classification, Decision Tree, Clustering
Copyright © 2016 Universitas Ahmad Dahlan. All rights reserved.
1. Introduction
Image databases have been commonly used by many application domains nowadays, such as multimedia search engines, digital libraries, medical databases, geographical databases, e-commerce, online tutoring systems and criminal investigations. An image database can be visualized using image browsing as a way into retrieval systems [1]. Image retrieval can be performed using attributes attached to the image, such as creation date, storage location, size or other predefined attributes. However, the search performance produced in this way depends highly on the expertise of the user in describing the image. In addition, such a search cannot be performed based on the semantics of the image itself [2].
Technology is evolving towards image searching using a method called content-based image retrieval (CBIR), also known as query by image content (QBIC) [3]. Instead of taking an image's information from external resources or metadata, the CBIR approach uses the intrinsic features of the image, such as color, shape, texture, or a combination of these feature elements [4]. Color as an image feature has been successfully applied in image retrieval applications since it has a strong correlation with the objects inside the image [5]. Furthermore, the color feature's robustness has been proven in processing scaled and orientation-changed images [6-10]. Images themselves are classified into three types, i.e., intensity images, indexed images and binary images [11]. To be more specific, CBIR's retrieval technique is based on information extracted from pixels [12, 13].
CBIR has been widely applied in many research projects. One application of CBIR is recognizing pornographic images [14, 15]. CBIR is also used in health research, where it serves as an image administration system that supports physician tasks such as diagnosis, telemedicine, and teaching and learning new medical knowledge [16-18].
This paper introduces the construction of CBIR using a case-based reasoning concept. In previous research, case-based reasoning has been explained as a concept to build rules for clinical service, specifically in the diagnosis problem domain [19]. One algorithm that can be used to build rules as a decision tree is C4.5. It can also be applied to classify images [20-22].
The novelty of this research is the combined use of the clustering method and decision tree development in image classification. Instead of predetermining the discrete data manually for the classification process, the data are clustered using the k-means clustering method. This process is used to categorize each of the image features before the classification process.
2. Research Method
2.1. System Architecture
The architecture of the CBIR system in this paper was formulated as shown in Figure 1. All image documents were preprocessed in order to make all images have the same format and size. After the preprocessing step finished, some visual features were extracted from the images and stored into image databases. In the next step, using the k-means clustering method [23], each of the features was categorized and stored into a categorized image database. These data were then used to build the decision tree using the C4.5 algorithm.
Figure 1. CBIR Architecture
2.2. Image Feature Extraction
Images were preconditioned into the same state before the feature extraction process started, namely a size of 140x140 pixels, bmp format, and conversion into 8-bit grey color mode. The features used in this experiment are quite similar to a previous study, namely color moments and texture. The color moment was divided into 3 low-order moments, while the texture features used were contrast, correlation, energy, homogeneity and entropy [24]; this paper used entropy, energy, contrast and homogeneity [25]. The entropy, energy, contrast and homogeneity features represent image texture. Texture can be defined as a region's characteristic that is wide enough to form pattern repetition. In another definition, texture is defined as a specifically ordered pattern consisting of pixel structures in the image.
One parameter needed to calculate the values of entropy, energy, contrast and homogeneity is obtained from the co-occurrence matrix, which is a matrix that describes the frequency with which a pair of two pixels with certain intensities, within a certain distance and direction, occurs in the image. The co-occurrence intensity matrix p(i1, i2) is defined by the following two simple steps:
1. Calculate the distance between every two dots in the image, expressed in a vector of vertical and horizontal directions (d = (dx, dy)). The values of dx and dy are also in pixels.
2. Count the number of pixel pairs that have intensity values of i1 and i2 and a distance of d pixels. Put the calculation of each pair of intensity values into the matrix according to the coordinates, with the abscissa for intensity value i1 and the ordinate for intensity value i2.
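The two counting steps can be sketched in Python. This is a minimal illustration: the row/column offset convention for the 45° direction is an assumption, and the 3x3 binary matrix is one reconstructed from the worked example later in this section (its order-1 moment and co-occurrence counts match the values reported there).

```python
# Hypothetical 3x3 binary image consistent with the paper's worked example
# (Figure 2): seven ones and two zeros, mean 7/9 ≈ 0.78.
img = [[0, 1, 1],
       [1, 0, 1],
       [1, 1, 1]]

def cooccurrence_counts(img, dx=1, dy=1, levels=2):
    """Count pixel pairs (i1, i2) separated by the offset d = (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    for r in range(len(img) - dy):
        for c in range(len(img[0]) - dx):
            i1, i2 = img[r][c], img[r + dy][c + dx]
            counts[i1][i2] += 1
    return counts

print(cooccurrence_counts(img))  # [[1, 1], [0, 2]], i.e. p = [[0.25, 0.25], [0, 0.5]]
```

Dividing each count by the total number of pairs (here 4) gives the normalized matrix p(i1, i2) used in Equations 1-4.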
Entropy is an image feature used for measuring the disorder of the intensity distribution. The entropy calculation is shown in Equation 1.

$Entropy = -\sum_{i_1}\sum_{i_2} p(i_1, i_2)\log p(i_1, i_2)$  (1)
Energy is a feature for measuring the concentration of intensity pairs in the co-occurrence matrix. Equation 2 is used to calculate this feature's value. The value increases if the pixel pairs matching the co-occurrence intensity matrix requirement are more concentrated at some coordinates of the matrix, and decreases if they are more dispersed.

$Energy = \sum_{i_1}\sum_{i_2} p^2(i_1, i_2)$  (2)
The contrast feature is used to measure the strength of intensity differences in the image, while homogeneity is used to measure the homogeneity of the image's intensity variation. An image's homogeneity value increases as its intensity variation decreases. The formula to measure the image contrast is shown in Equation 3, while that for measuring the homogeneity is shown in Equation 4.

$Contrast = \sum_{i_1}\sum_{i_2} (i_1 - i_2)^2\, p(i_1, i_2)$  (3)

$Homogeneity = \sum_{i_1}\sum_{i_2} \frac{p(i_1, i_2)}{1 + |i_1 - i_2|}$  (4)
The notation p in Equations 2, 3 and 4 denotes the probability, with a value in the range of 0 to 1, which is the element value in the co-occurrence matrix. Meanwhile, i1 and i2 denote a nearby intensity pair in the x and y directions.
Given a sample of 1-bit image data with a size of 3 x 3 pixels, finding the mean of the RGB values of each pixel, the result is represented in a matrix as shown in Figure 2.
Figure 2. Image's RGB mean matrix
Feature calculations of entropy, energy, contrast and homogeneity are based on the co-occurrence intensity matrix. Consequently, prior to finding those feature values, the co-occurrence intensity matrix needs to be built. In this system, the values of distance (d)=1 and direction at angle 45° (dx=1 and dy=1) are predetermined. From the above setting, the co-occurrence intensity matrix from the image matrix in Figure 2 is obtained as shown in Figure 3.
The second value in row dx = 1 and column dy = 1 is obtained from the pair count, with distance 1 and angle 45°, between image intensity value 1 and image intensity value 1, as illustrated in Figure 4.
Figure 3. Co-occurrence Intensity Matrix

Figure 4. Illustration of pair count calculation
The value of p(0,1) is obtained from the element value in row i1=0 and column i2=1 divided by the total element value in the co-occurrence matrix. Thus, the value of p(0,1) is ¼ or 0.25.
The entropy visual feature value is calculated by Equation 1, as follows:
entropy = -(p(0,0)log(p(0,0)) + p(0,1)log(p(0,1)) + p(1,0)log(p(1,0)) + p(1,1)log(p(1,1)))
        = -(0.25 log(0.25) + 0.25 log(0.25) + 0 log(0) + 0.5 log(0.5))
        = -(-0.15 - 0.15 - 0 - 0.15)
        = 0.45
The value of the energy visual feature is calculated by Equation 2, as follows:
energy = p^2(0,0) + p^2(0,1) + p^2(1,0) + p^2(1,1)
       = 0.25^2 + 0.25^2 + 0^2 + 0.5^2
       = 0.375
The value of the contrast visual feature is obtained by calculation using Equation 3. That is:
contrast = (0-0)^2 (0.25) + (0-1)^2 (0.25) + (1-0)^2 (0) + (1-1)^2 (0.5)
         = 0 + 0.25 + 0 + 0
         = 0.25
The homogeneity visual feature is obtained by calculation using Equation 4. That is:
homogeneity = 0.25/(1+0) + 0.25/(1+1) + 0/(1+1) + 0.5/(1+0)
            = 0.25 + 0.125 + 0 + 0.5
            = 0.875
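The four worked calculations above can be reproduced with a short script. This is a minimal sketch using the co-occurrence matrix p from Figure 3; base-10 logarithms are assumed for entropy, as the value 0.45 in the worked example implies.

```python
import math

# Co-occurrence matrix p from the worked example (Figure 3), indexed p[i1][i2].
p = [[0.25, 0.25],
     [0.00, 0.50]]

# Equation 1: entropy (0 log 0 is treated as 0, base-10 log assumed).
entropy = -sum(v * math.log10(v) for row in p for v in row if v > 0)
# Equation 2: energy, the sum of squared probabilities.
energy = sum(v * v for row in p for v in row)
# Equation 3: contrast, weighted by squared intensity difference.
contrast = sum((i1 - i2) ** 2 * p[i1][i2]
               for i1 in range(2) for i2 in range(2))
# Equation 4: homogeneity, damped by absolute intensity difference.
homogeneity = sum(p[i1][i2] / (1 + abs(i1 - i2))
                  for i1 in range(2) for i2 in range(2))

print(round(entropy, 2), energy, contrast, homogeneity)  # 0.45 0.375 0.25 0.875
```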
Color moments can be defined as a simple representation of the color feature in colored images. The three low-order moments of color for capturing the information of an image's color distribution are the mean, standard deviation and skewness [25]. For a color c in the image, the mean of c is symbolized as μc, the standard deviation as σc and the skewness as θc. The values of μc, σc and θc are calculated using Equation 5, Equation 6 and Equation 7, respectively.
$\mu_c = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} p_{ij}^{c}$  (5)

$\sigma_c = \left(\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(p_{ij}^{c} - \mu_c\right)^2\right)^{1/2}$  (6)

$\theta_c = \left(\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(p_{ij}^{c} - \mu_c\right)^3\right)^{1/3}$  (7)
Where M and N are the horizontal and vertical sizes of the image and $p_{ij}^{c}$ is the value of color c in the image's row i and column j.
Using Equation 5, the value of the visual feature color moment order 1 is calculated from the matrix as follows:

$\mu_c = \frac{1}{3 \times 3}(0 + 1 + 1 + 1 + 0 + 1 + 1 + 1 + 1) = 0.78$
The value of color moment order 2 is obtained using the following steps:
1. Transform each value in the RGB mean matrix in Figure 2 by operating on each value in the matrix with Equation 8.

$nb = \left(p_{ij}^{c} - \mu_c\right)^2$  (8)
$p_{ij}^{c}$ is the matrix cell's value in row i and column j and nb is the new value of the cell. The result is then represented in Figure 5.
Figure 5. Color Moment Order 2 Value Matrix

Figure 6. Color Moment Order 3 Matrix
2. The color moment order 2 value σc is calculated from the matrix using Equation 6:

$\sigma_c = \left(\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} nb\right)^{1/2} = \left(\frac{1}{3 \times 3} \times 1.556\right)^{1/2} = 0.416$
The value of color moment order 3 is obtained with the following steps:
1. Transform each value in the RGB mean matrix in Figure 2 by operating on each value in the matrix with Equation 9.

$nb = \left(p_{ij}^{c} - \mu_c\right)^3$  (9)
$p_{ij}^{c}$ is the matrix cell's value in row i and column j and nb is the matrix's new cell value. Using these calculations, the matrix in Figure 2 will be transformed into the color moment order 3 matrix as shown in Figure 6.
2. The color moment order 3 value θc is calculated from the matrix using Equation 7:

$\theta_c = \left(\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} nb\right)^{1/3}$
$\theta_c = \left(\frac{1}{3 \times 3} \times (-0.864)\right)^{1/3} = -0.458$
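The three moments can be checked with a short script. This is a minimal sketch: the 3x3 matrix is a reconstruction consistent with the worked values above (mean 0.78, standard deviation 0.416, skewness -0.458), and a sign-preserving cube root is assumed for the negative skewness in Equation 7.

```python
import math

# RGB mean matrix reconstructed from the paper's worked example (Figure 2):
# two zeros and seven ones, so the order-1 moment is 7/9 ≈ 0.78.
matrix = [[0, 1, 1],
          [1, 0, 1],
          [1, 1, 1]]

def color_moments(m):
    """Mean, standard deviation and skewness of one channel (Equations 5-7),
    with a sign-preserving cube root for the skewness."""
    n = sum(len(row) for row in m)
    mean = sum(v for row in m for v in row) / n
    var = sum((v - mean) ** 2 for row in m for v in row) / n
    skw = sum((v - mean) ** 3 for row in m for v in row) / n
    std = math.sqrt(var)
    skew = math.copysign(abs(skw) ** (1 / 3), skw)
    return mean, std, skew

mean, std, skew = color_moments(matrix)
print(round(mean, 2), round(std, 3), round(skew, 3))  # 0.78 0.416 -0.458
```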
2.3. K-Means Clustering
The clustering machine is intended to map continuous or discrete class data with many variations into discrete classes with a determined number of variations. The input of this process is:
1. Initial data, which can be either continuous or discrete data.
2. Delta, which is the value used to determine the allowed gap between centroid and mean.
The output of this process is a mapping table containing the discrete classes, including their centroid values.
The algorithm used in the data clustering machine is derived and extended from the k-means clustering algorithm. Data clustering using the k-means method is commonly done with the following basic algorithm [26]:
1. Determine the count of classes (clusters).
2. Determine the initial centroid of each class.
3. Put each datum into the class which has the nearest centroid.
4. Calculate the data mean of each class.
5. For all classes, if the difference between the mean value and the centroid goes beyond the tolerable error, replace the centroid value with the class mean, then go to step 3.
In the fundamental k-means clustering algorithm, the initial centroid value of a specific discrete class is defined randomly, while in this research the value is produced from the equation shown in Equation 10.

$c_i = \min + (i - 1) \times \frac{\max - \min}{n} + \frac{\max - \min}{2n}$  (10)
Where:
ci : centroid of class i
min : the lowest value of the source data
max : the highest value of the source data
n : total number of discrete classes
The building process of the discrete classes is as follows: 1) Specify the source data; 2) Specify the desired total number of discrete classes (n); 3) Get the lowest value from the source data (min); 4) Get the highest value from the source data (max); 5) Specify delta (d) to get the acceptable error (e) by Equation 11; 6) For each discrete class, find the initial centroid (c) by Equation 10; 7) For each value in the source data, put it into its appropriate discrete class, which is the one that has the nearest centroid to the value; 8) Calculate the average value of all members for each class (mean); 9) For each discrete class, calculate the difference between its mean and its centroid (s) by Equation 12; 10) For each discrete class, if s > e, then replace its centroid value with its mean, put out all values from their corresponding classes, then go back to step 7.

$e = d \times (\max - \min)$  (11)

$s_i = |mean_i - c_i|, \quad i = 1, \ldots, n$  (12)
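The ten steps above can be sketched as a one-dimensional discretizer. This is a minimal sketch, assuming invented sample data and parameter values; centroids start at the midpoints of n equal-width bins (Equation 10) and iteration stops when every class mean is within e = d*(max-min) of its centroid (Equations 11-12).

```python
def discretize(data, n, d):
    """Map 1-D values into n discrete classes per the paper's extended k-means."""
    lo, hi = min(data), max(data)
    e = d * (hi - lo)                                   # Equation 11
    cents = [lo + (i - 1) * (hi - lo) / n + (hi - lo) / (2 * n)
             for i in range(1, n + 1)]                  # Equation 10
    while True:
        classes = [[] for _ in range(n)]
        for v in data:                                  # step 7: nearest centroid
            k = min(range(n), key=lambda i: abs(v - cents[i]))
            classes[k].append(v)
        means = [sum(c) / len(c) if c else cents[i]     # step 8: class means
                 for i, c in enumerate(classes)]
        if all(abs(m - c) <= e for m, c in zip(means, cents)):  # Equation 12
            return cents, classes
        cents = means                                   # step 10: re-seed and repeat

cents, classes = discretize([0.1, 0.2, 0.25, 0.7, 0.8, 0.9], n=2, d=0.05)
print(classes)  # [[0.1, 0.2, 0.25], [0.7, 0.8, 0.9]]
```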
2.4. C4.5 Algorithm
The decision tree is one of the methods in the classification process. In a decision tree, the classification process involves classification rules generated from a set of training data. The decision tree will split the training data based on some given criteria. The criteria are divided into target criteria and determinator criteria. The classification process produces the target criteria values based on the determinator criteria values [27].
The decision tree building algorithm will search for the best deciding criterion for categorizing data homogeneously. That criterion will be considered as the root node. The next step of the algorithm is creating branches based on the values inside the criterion. The training data are then categorized based on the values at the branch. If the data are considered homogeneous after the previous step, then the process for this branch is immediately stopped. Otherwise, the process will be repeated by choosing another criterion as the next root node, until all the data in the branch are homogeneous or there is no remaining deciding criterion that can be used.
The resulting tree will then be used in the classification of new data. The classification process is conducted by matching the criterion values of the new data with the nodes and branches in the decision tree until finding the leaf node, which is also the target criterion. The value at that leaf node becomes the conclusion of the classification result.
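Once the tree is flattened into rules, this matching step reduces to checking each rule's conditions against the new case. The sketch below is hypothetical: the rule structure, feature names and class labels are invented for illustration, not taken from the paper's rule table.

```python
# Hypothetical rules as produced by flattening a decision tree: each maps
# discretized feature values to a target class.
rules = [
    {"if": {"energy": "low", "entropy": "high"}, "then": "head"},
    {"if": {"energy": "high"}, "then": "chest"},
]

def classify(case, rules):
    """Return the target of the first rule whose conditions all match the
    case, or None when the case is unclassified."""
    for rule in rules:
        if all(case.get(k) == v for k, v in rule["if"].items()):
            return rule["then"]
    return None

print(classify({"energy": "low", "entropy": "high"}, rules))  # head
print(classify({"energy": "low", "entropy": "low"}, rules))   # None
```

Returning None for an unmatched case mirrors the "Unclassified" outcomes reported in the testing tables of section 3.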
Using the C4.5 algorithm, choosing which attribute to be used as the root is based on the highest gain of each attribute; the gain is counted using Equations 13 and 14 [28].

$Gain(S, A) = Entropy(S) - \sum_{i=1}^{n} \frac{|S_i|}{|S|} \times Entropy(S_i)$  (13)

$Entropy(S) = \sum_{i=1}^{n} -p_i \times \log_2 p_i$  (14)
Where:
S : case set
A : attribute
n : partition count of attribute A
|Si| : count of cases in the i-th partition
|S| : count of cases in S
pi : proportion of cases in S that belong to the i-th partition
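Equations 13 and 14 can be sketched directly. The case set below is invented for illustration (the attribute and class names are assumptions, not the paper's data).

```python
import math

def entropy(labels):
    """Equation 14: Entropy(S) = sum_i -p_i * log2(p_i)."""
    total = len(labels)
    return -sum((labels.count(l) / total) * math.log2(labels.count(l) / total)
                for l in set(labels))

def gain(cases, attr, target):
    """Equation 13: Gain(S, A) = Entropy(S) - sum |Si|/|S| * Entropy(Si)."""
    g = entropy([c[target] for c in cases])
    for val in set(c[attr] for c in cases):
        subset = [c[target] for c in cases if c[attr] == val]
        g -= len(subset) / len(cases) * entropy(subset)
    return g

# Hypothetical case set with one discretized attribute and a class label.
cases = [
    {"energy": "low", "class": "head"},
    {"energy": "low", "class": "head"},
    {"energy": "high", "class": "chest"},
    {"energy": "high", "class": "head"},
]
print(round(gain(cases, "energy", "class"), 3))  # 0.311
```

C4.5 evaluates this gain for every candidate attribute and splits on the one with the highest value.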
3. Results and Analysis
3.1. Result
The result of this research is a computer program to classify images. The program's main features are 1) Data Case Input; 2) Case List; 3) Training Process; and 4) Testing Process.
3.1.1. Data Input and Case List
The Data Case Input feature is used to store images that will be used in the training process. In this feature, the program will do preprocessing, take the image visual feature values and then store them in the database. The case data input feature interface is shown in Figure 7. This figure shows that the loaded image is processed into some visual features as explained in section 2.2, and the classification text box is filled manually as the expert judgement.
Figure 7. New Case Input Feature
The Case List feature is used to review the cases stored in the database. This feature also provides the ability to delete particular data that are assumed not to be used in the next training process. The Case List page is shown in Figure 8.
Figure 8. List of Cases Feature
3.1.2. Training Process using K-Means Cluster and C4.5 Algorithm
The training process is used to produce the decision tree from the cases in the case list. This process contains 2 sub-processes: the categorization process and the decision tree building process. The categorization process is performed for each of the image visual features by using the K-Means Cluster algorithm as explained in section 2.3. The input of the categorization process is all the values of the images in a feature, and the result is the cluster and the centroid of each datum. The cluster selected will be the one that has the nearest centroid. The decision tree building process is performed using the C4.5 algorithm. The training process is then visualized in the form of a decision tree as in Figure 9 and in a table of rules as shown in Figure 10.
Figure 9. Tree Feature
Figure 10. List of Rules Feature
3.1.3. Testing Process
The testing feature is used to do classification on newly incoming images. The classification is carried out by comparing the new image's visual feature values with the rules constructed in the prior training process. The testing feature is shown in Figure 11.
Figure 11. Testing Feature
3.2. Analysis
The testing process was carried out with 150 rontgen image data and resulted in some classifications. The data are shown in Table 1.
Table 1. Cases Classification
Classification  Count
Backbone        50
Chest           50
Head            50
The testing was carried out by comparing the classification from the system's output with the expert judgment in the Case Input Data, as explained in section 3.1. Testing on all training data resulted in true classification for 113 data, which means a truth value of 75.3%. The detailed result of testing the training data is shown in Table 2.
Table 2. Testing Result of All Training Data
Classification  Testing       Result        Count
Backbone        Backbone      True          30
Backbone        Chest         False         12
Backbone        Unclassified  Unclassified  8
Chest           Backbone      False         10
Chest           Chest         True          34
Chest           Head          False         1
Chest           Unclassified  Unclassified  5
Head            Chest         False         1
Head            Head          True          49
Besides the testing process on all training data, 10 experiments were also conducted using 120 data. The data consist of 40 backbone, 40 chest and 40 head image data in different combinations. The test result is shown in Table 3.
Table 3. Testing Results of the 120 Training Data
Iteration  True   False  Unclassified
1          88     22     10
2          88     23     9
3          85     25     10
4          85     25     10
5          88     23     9
6          92     19     9
7          97     15     8
8          97     17     6
9          94     19     7
10         90     20     10
Average    90.4   20.8   8.8
%          75.33  17.33  7.33
The testing on the testing data was done by 10-fold cross validation, using 120 training data and 30 testing data for each experiment. The test results with this model are shown in Table 4. From Table 4, it is known that the testing process on the test data produced a correct classification of 55.7%, a wrong classification of 33.7% and unclassified data of about 10.7%.
Table 4. Testing Results of the Test Data
Iteration  True  False  Unclassified
1          14    16     0
2          20    6      4
3          20    9      1
4          18    9      3
5          16    7      7
6          16    9      5
7          16    11     3
8          16    9      5
9          19    7      4
10         12    18     0
Average    16.7  10.1   3.2
%          55.7  33.7   10.7