TELKOMNIKA Indonesian Journal of Electrical Engineering
Vol. 15, No. 2, August 2015, pp. 373 ~ 380
DOI: 10.11591/telkomnika.v15i2.8183
Received May 22, 2015; Revised July 3, 2015; Accepted July 17, 2015
Two Level Clustering for Quality Improvement using Fuzzy Subtractive Clustering and Self-Organizing Map
Erick Alfons Lisangan¹, Aina Musdholifah², Sri Hartati³
¹Universitas Atma Jaya Makassar, Makassar, Indonesia
²,³Universitas Gadjah Mada, Yogyakarta, Indonesia
E-mail: erick_lisangan@lecturer.uajm.ac.id
Abstract
Recently, clustering algorithms have combined conventional methods and artificial intelligence. FSC-SOM is designed to handle the problems of SOM, such as defining the number of clusters and the initial values of the neuron weights. FSC finds the number of clusters and the cluster centers, which become the parameters of SOM. FSC-SOM is expected to improve the quality of FSC, since the determination of the cluster centers is processed twice: searching for data with high density in FSC, then updating the cluster centers in SOM. FSC-SOM was tested using 10 datasets, measured with F-Measure, entropy, Silhouette Index, and Dunn Index. The results showed that FSC-SOM can improve the cluster centers of FSC with SOM in order to obtain better quality clustering results. The clustering result of FSC-SOM is better than or equal to the clustering result of FSC, as proven by the values of the external and internal validity measurements.
Keywords: clustering, fuzzy subtractive clustering, self-organizing map
Copyright © 2015 Institute of Advanced Engineering and Science. All rights reserved.
1. Introduction
Clustering is one of the most important research issues in the domain of data mining and is very useful for many applications, such as marketing, industrial engineering, biology, medicine, and image processing [1]. Clustering divides data into homogeneous groups called clusters. Each cluster consists of data that have a greater similarity to the other data in their own cluster than to the data in other clusters [2].
The efforts to improve cluster models, such as finding the optimal number of clusters and the best clustering results, are still continuing because the methods that have been developed are heuristic [3]. Recently, clustering algorithms have combined conventional methods and artificial intelligence, like neural networks, genetic algorithms, fuzzy set theory, and evolutionary programming. Combining two clustering methods, sometimes called two level clustering, has been shown to be more powerful than the individual methods. Two level clustering is proposed to improve partitional methods, e.g. k-Means or Fuzzy C-Means, which are sensitive to the initial cluster centers and for which it is difficult to determine the number of clusters [4].
Self-Organizing Map (SOM) is a clustering algorithm that applies the concept of neural networks and can be used for data visualization [5]. Generally, clustering algorithms try to group data by maximizing the inter-cluster distance and minimizing the intra-cluster distance [6]. SOM groups data with a different characteristic, namely by maintaining the neighborhood relationships in the data [7]. The advantage of SOM is its resistance to data noise [8]. But the disadvantage of SOM is that the structure of the neural network and the number of neurons in the Kohonen layer must be defined first [8].
SOM is implemented to produce protoclusters in two level clustering [4, 7], [10-11]. Then, the second clustering algorithm groups the protoclusters at the second level. Research on using SOM at the second level has not been found so far.
Fuzzy Subtractive Clustering (FSC) can solve the disadvantage of SOM by using data points as candidates for the cluster centers [12]. A data point with the highest density will be defined as a cluster center [13]. FSC is implemented to initialize the number of clusters and the cluster centers in two level clustering, combined with FCM, also called Hybrid Fuzzy Clustering [14], [15]. FCM cannot ensure a unique clustering result because the number of clusters must be defined first and the initial cluster centers are selected [15].
In this research, a new method is proposed for two level clustering using FSC and SOM. FSC is used to find the number of clusters by searching for the data point with the highest density, which will become a cluster center. The result of FSC is the number of clusters and the cluster centers, which then become the initial weights of SOM. Then, SOM will ameliorate the cluster centers of FSC and is expected to improve the quality of clustering by FSC.
2. The Proposed Algorithm
Fuzzy Subtractive Clustering (FSC) was proposed by Stephen Chiu (1994), where the number of clusters is found based on the density of each data point. The data point with the highest number of neighbors, i.e. the highest density, will be chosen as a cluster center, and the density value of the cluster center will be reduced so that it cannot be chosen again. The algorithm will then find another data point with the highest number of neighbors or highest density to be another cluster center [16].
Self-Organizing Map (SOM) was proposed by Teuvo Kohonen (1982) and is widely used as a method to reduce the dimension of data and for clustering [17]. SOM is a type of neural network that is trained using unsupervised learning to produce a representation of the data in a map, such as a 1D map [18]. In this research, we used a 1D map in the feature map or output layer of SOM. The number of neurons in the input layer is the same as the number of attributes (j) of the dataset. Similarly, the number of neurons in the output layer is the same as the number of clusters (k) that gives the best quality for each dataset.
Figure 1. SOM Architecture (input neurons X_i1, X_i2, X_i3, ..., X_ij fully connected to output neurons w_1, w_2, w_3, w_4, ..., w_k)
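As a concrete reading of Figure 1, the weights of such a 1D SOM can be held in a k × m matrix, one row per output neuron. The following is a minimal sketch; the variable names are ours, not the paper's:

```python
import numpy as np

m = 4                      # input-layer neurons = number of attributes (e.g. Iris)
k = 3                      # output-layer neurons = number of clusters
# w[c, j] connects input neuron j (attribute X_ij) to output neuron c.
w = np.random.rand(k, m)   # a plain SOM starts randomly; FSC-SOM starts from FSC centers
```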
FSC-SOM is proposed to solve the disadvantage of SOM, which needs some parameters, i.e. the number of neurons in the output layer and the initial weights of the neurons, which are otherwise determined randomly. Furthermore, FSC-SOM is expected to improve the quality of the clustering result of FSC. At the first level, FSC is implemented to estimate the number of clusters and find the cluster centers that become the parameters of SOM. At the second level, SOM will ameliorate the cluster centers of FSC with the purpose of improving the quality of clustering by FSC.
We can define the two level clustering algorithm of FSC-SOM in five main processes, as follows (a code sketch of the whole pipeline is given after the algorithm):
1. Initialization.
a. Dataset with data points X_ij, where i indexes the i-th of n data points and j indexes the j-th of m attributes in the dataset.
b. Initialize the parameters, i.e. r (radius), reject ratio, accept ratio, q (squash factor), α (learning rate), maxEpoch (maximum epoch), and ε (threshold).
2. Data normalization using Min-Max Normalization.
3. Cluster Estimation
a. Calculate the density value of each data point (D_i) using Formula (1).
$D_i = \sum_{j=1}^{n} \exp\left(-\frac{\|X_i - X_j\|^2}{(r/2)^2}\right)$   (1)
b. Find the data point with the highest density value and set it to become the candidate cluster center.
c. Calculate the ratio of the candidate cluster center (R) by dividing its density by the density value of the first candidate cluster center.
d. Check the eligibility of the candidate cluster center with the following conditions:
i. If R > accept ratio then the candidate cluster center can be accepted as a cluster center, otherwise check the second condition,
ii. If R > reject ratio then calculate the sum of the ratio and the distance between the candidate cluster center and the predefined cluster centers, otherwise the cluster estimation process is stopped because no data point can become the candidate cluster center (go to step 4). If the sum is greater than or equal to 1 then the candidate cluster center can be accepted as a cluster center, otherwise the data point cannot be accepted as a cluster center and its density value is set to 0.
e. If the candidate cluster center can be accepted as the new cluster center, then increment the number of clusters (k) and reduce the density value of each data point around the new cluster center (c) using Formula (2), then go back to step 3b.
$D_i' = D_i - D_c \cdot \exp\left(-\frac{\|X_i - X_c\|^2}{(q \cdot r/2)^2}\right)$   (2)
4. Usage of FSC
a. After the process of cluster estimation is completed, calculate the membership function of each cluster for each data point using Formula (3).
$\mu_{ik} = \exp\left(-\sum_{j=1}^{m} \frac{(X_{ij} - C_{kj})^2}{2\delta_j^2}\right)$   (3)
Where the sigma value of attribute j (δ_j) can be calculated using Formula (4); XMin_j and XMax_j are the minimum and maximum values of the j-th attribute.
$\delta_j = \frac{r \cdot (XMax_j - XMin_j)}{\sqrt{8}}$   (4)
b. For each data point, find the highest membership function over the clusters. The cluster with the highest membership function for a data point indicates that the data point belongs to that cluster. After that, calculate the quality of the clustering result using F-Measure (F_fsc), Formula (10).
5. Learning
a. Calculate the distance value between each neuron weight (w) and each data point X_i using Formula (5).
$d_k = \sum_{j=1}^{m} (X_{ij} - w_{kj})^2$   (5)
b. Find the winner neuron, i.e. the neuron nearest to the i-th data point.
c. Update the weights of the winner neuron and the neurons around the winner neuron, based on the neighborhood value in the t-th epoch (d(t)), using Formula (6).
$w_k(t+1) = w_k(t) + \alpha(t) \cdot (X_i - w_k(t))$   (6)
Repeat step 5a if there is a data point whose distance to each neuron has not been calculated, otherwise go to step 5d.
d. Modify the values of the learning rate (α) and the neighborhood value (d) using Formulas (7) and (8), then increment the value of the epoch.
$\alpha(t+1) = 0.5 \cdot \alpha(t)$   (7)

$d(t+1) = 0.5 \cdot d(t)$   (8)
e. Convergence condition
i. Find the nearest neuron for each data point; the data point belongs to that neuron's cluster. After that, calculate the quality of the clustering result using F-Measure (F_fsc-som), Formula (10).
ii. Check the convergence condition: if the difference between F_fsc-som and F_fsc is more than ε or the maximum epoch has been reached, then the FSC-SOM process is stopped, otherwise go back to step 5a.
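To make the five processes concrete, the sketch below implements the pipeline in Python with NumPy, following Formulas (1)-(8). It is a minimal illustration, not the authors' original code: the function names are ours, the initial neighborhood radius is an assumption, and the F-Measure-based stopping test of step 5e is reduced to the maximum-epoch check.

```python
import numpy as np

def min_max_normalize(X):
    # Step 2: Min-Max Normalization of every attribute to [0, 1].
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)

def fsc_estimate(X, r=0.5, accept_ratio=0.5, reject_ratio=0.15, q=1.5):
    # Step 3: FSC cluster estimation (Formulas (1) and (2)).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    D = np.exp(-sq / (r / 2) ** 2).sum(axis=1)            # Formula (1)
    centers, D_first = [], None
    while True:
        c = int(np.argmax(D))
        if D_first is None:
            D_first = D[c]                                # density of the first candidate
        R = D[c] / D_first                                # ratio of the candidate (step 3c)
        if R <= reject_ratio:
            break                                         # step 3d.ii: stop estimation
        if R <= accept_ratio:
            # step 3d.ii: ratio + distance to the nearest predefined center
            d_min = min(np.linalg.norm(X[c] - ctr) for ctr in centers)
            if R + d_min < 1:
                D[c] = 0.0                                # rejected: can't be chosen again
                continue
        centers.append(X[c].copy())                       # step 3e: new cluster center
        D = D - D[c] * np.exp(-((X - X[c]) ** 2).sum(-1) / (q * r / 2) ** 2)  # Formula (2)
    return np.array(centers)

def som_refine(X, centers, alpha=0.4, max_epoch=50):
    # Step 5: 1D SOM whose output neurons are initialized with the FSC centers.
    W = centers.copy()                                    # k x m weight matrix
    d = max(1, len(W) // 2)                               # neighborhood radius (assumed)
    for _ in range(max_epoch):
        for x in X:
            dist = ((x - W) ** 2).sum(axis=1)             # Formula (5)
            win = int(np.argmin(dist))                    # winner neuron (step 5b)
            lo, hi = max(0, win - d), min(len(W), win + d + 1)
            W[lo:hi] += alpha * (x - W[lo:hi])            # Formula (6)
        alpha *= 0.5                                      # Formula (7)
        d = max(1, d // 2)                                # Formula (8)
    return W

def assign(X, W):
    # Label each data point with the index of its nearest neuron / center.
    return ((X[:, None, :] - W[None, :, :]) ** 2).sum(-1).argmin(axis=1)
```

For example, labels = assign(X, som_refine(X, fsc_estimate(min_max_normalize(X)))) runs the whole two level pipeline on a data matrix X of shape (n, m).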
3. Research Method
3.1. Dataset
In this research, we use 10 datasets from the UCI Machine Learning Repository (URL: http://archive.ics.uci.edu/ml/) to test our proposed method against one level clustering, i.e. FSC and SOM. Table 1 shows the testing datasets that we used in this research and the details of each dataset, i.e. the number of data points, attributes, and classes. For the glass and statlog datasets, the real number of classes is 7, but 1 class does not have any member, so we define them as having 6 classes.
Table 1. Testing Dataset (UCI Machine Learning Repository)

Dataset        Data Point   Attribute   Class
Iris           150          4           3
Wine           178          13          3
Glass          214          9           6*
WDBC           569          30          2
CMC            1473         9           3
Yeast          1484         8           10
Optical Digit  5620         64          10
Statlog        6435         36          6*
Thyroid        7200         21          3
Magic Gamma    19020        10          2
3.2. Cluster Evaluation
There are 3 approaches to studying the validity of clustering results, based on external, internal, and relative criteria [19]. Validity based on external criteria is assessed by evaluating the clustering results against a predefined structure of the dataset. The measuring instruments for validity based on external criteria are F-Measure and entropy. Validity based on internal criteria is assessed by evaluating the clustering results using the information of the dataset vectors themselves. The measuring instruments for validity based on internal criteria are the Silhouette index and the Dunn index.
3.2.1. F-Measure
F-Measure is used to calculate the precision and recall between the clustering results and the true classes. The F-Measure of each cluster r can be calculated using Formula (9).
$F(r, s) = \frac{2 \cdot n(r, s)}{n_r + n_s}$   (9)
Where n(r,s) is the number of members that are in both cluster r and class s, n_r is the number of members in cluster r, and n_s is the number of members in class s.
The overall F-Measure (F) of the clustering result can be calculated using Formula (10). The greater the value of the F-Measure, the better the clustering results obtained [20].
$F = \sum_{s} \frac{n_s}{n} \max_{r} F(r, s)$   (10)
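A direct transcription of Formulas (9) and (10) in Python, assuming labels holds the cluster index and classes the true class index of every data point:

```python
import numpy as np

def f_measure(labels, classes):
    # Overall F-Measure: class-size-weighted best match between classes and clusters.
    n = len(labels)
    F = 0.0
    for s in np.unique(classes):
        n_s = (classes == s).sum()
        best = max(2.0 * ((labels == r) & (classes == s)).sum()
                   / ((labels == r).sum() + n_s)           # Formula (9)
                   for r in np.unique(labels))
        F += (n_s / n) * best                              # Formula (10)
    return F
```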
3.2.2. Entropy
Entropy is used to measure the homogeneity of a cluster, i.e. the distribution of cluster members in each cluster [21]. The lower the value of the entropy, the more homogeneous the clusters and the better the quality of the clustering results.
$E_r = -\sum_{i} \frac{n_{ir}}{n_r} \log \frac{n_{ir}}{n_r}$   (11)
Where k is the number of clusters and n_ir is the number of data points from class i that are placed in cluster r. The overall entropy (E) can be calculated using Formula (12).
$E = \sum_{r=1}^{k} \frac{n_r}{n} E_r$   (12)
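With the same label encoding as before, Formulas (11) and (12) transcribe as follows; zero class counts are skipped since 0·log 0 is taken as 0:

```python
import numpy as np

def entropy(labels, classes):
    # Overall entropy of the clustering; lower means more homogeneous clusters.
    n = len(labels)
    E = 0.0
    for r in np.unique(labels):
        n_r = (labels == r).sum()
        p = np.array([((labels == r) & (classes == i)).sum()
                      for i in np.unique(classes)]) / n_r
        p = p[p > 0]
        E += (n_r / n) * -(p * np.log(p)).sum()   # Formulas (11) and (12)
    return E
```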
3.2.3. Silhouette Index
The Silhouette Index or Silhouette Coefficient is a normalized summation index [22] that combines both cohesion and separation terms [6, 23].
$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$   (13)
Where cohesion (a(i)) is measured by calculating the average distance from data point i to all other data points in its cluster, and separation (b(i)) is measured by calculating the average distance from data point i to the data points of its nearest cluster. a(i) and b(i) can be calculated using Formulas (14) and (15).
$a(i) = \frac{1}{n_{C_i} - 1} \sum_{j \in C_i, j \neq i} d(i, j)$   (14)
$b(i) = \min_{C_k \neq C_i} \frac{1}{n_{C_k}} \sum_{j \in C_k} d(i, j)$   (15)
Where d(i,j) is the distance between the i-th and j-th data points, and n_Ci and n_Ck are the numbers of data points in the i-th and k-th clusters. The silhouette width (s(i)) of each data point is used to calculate the Silhouette Index (S) using Formula (16), where n is the number of data points. The range of the Silhouette Index is [-1, 1]; the greater its value, the better the quality of the clustering results achieved.
$S = \frac{1}{n} \sum_{i=1}^{n} s(i)$   (16)
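Formulas (13)-(16) can be transcribed as below; the sketch uses Euclidean distances and assumes every cluster has at least two members so that a(i) is defined:

```python
import numpy as np

def silhouette_index(X, labels):
    # Silhouette Index S: mean silhouette width over all data points.
    n = len(X)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    widths = []
    for i in range(n):
        own = (labels == labels[i]) & (np.arange(n) != i)
        a = dist[i, own].mean()                                # Formula (14)
        b = min(dist[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])  # Formula (15)
        widths.append((b - a) / max(a, b))                     # Formula (13)
    return float(np.mean(widths))                              # Formula (16)
```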
3.2.4. Dunn Index
The Dunn Index (D), proposed by Dunn [24], measures the ratio between the smallest intercluster distance and the largest intracluster distance. The Dunn index is used to identify clusters that are compact and well separated [6].
$D = \min_{i} \min_{j \neq i} \left\{ \frac{d(i, j)}{\max_{k} d(k)} \right\}$   (17)
Where i, j, and k are clusters from the clustering result, d(i,j) is the intercluster distance between clusters i and j, and d(k) is the intracluster distance of cluster k. The larger the value of the Dunn Index, the better the clustering results obtained [19].
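Formula (17) leaves the choice of intercluster and intracluster distance open; the sketch below assumes the common reading of single-linkage distance between clusters and cluster diameter within a cluster:

```python
import numpy as np

def dunn_index(X, labels):
    # Dunn Index: smallest intercluster distance over largest intracluster distance.
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    cs = np.unique(labels)
    diam = max(dist[np.ix_(labels == k, labels == k)].max() for k in cs)
    inter = min(dist[np.ix_(labels == i, labels == j)].min()
                for i in cs for j in cs if i < j)
    return inter / diam                                        # Formula (17)
```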
4. Results and Analysis
FSC needs 4 parameters, i.e. radius (r), reject ratio, accept ratio, and squash factor (q). The choice of the accept ratio and reject ratio values can affect the clustering result [16]. If the accept ratio is too large then too few data points can be accepted as cluster centers, whereas if the reject ratio is too small then too many cluster centers can be produced. The recommended values of these parameters are accept ratio = 0.5, reject ratio = 0.15, and q = 1.5 [16].
The value of r differs for each dataset because the resolution of each dataset is different. In this experiment, the optimal value of r is the value of r that produces the highest value of the F-Measure while producing a number of clusters within about 2 clusters of the real number of clusters for each dataset. Table 2 shows the optimal value of r for each dataset and the number of clusters that it produces.
Table 2. The Optimal Value of r

Dataset        r      True Class   Predefined Class
Iris           0.45   3            3
Wine           0.9    3            3
Glass          0.145  6            8
WDBC           0.5    2            2
CMC            1.1    3            2
Yeast          0.16   10           10
Optical Digit  2.2    10           10
Statlog        0.65   6            7
Thyroid        0.5    3            4
Magic Gamma    0.7    2            2
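The selection rule behind Table 2 can be written as a small search loop. In this sketch, fsc_estimate, assign, and f_measure refer to the earlier sketches, and candidate_rs is our placeholder for the grid of radii actually tried:

```python
def best_radius(X, classes, candidate_rs, true_k):
    # Keep the r with the highest F-Measure among runs whose number of
    # clusters stays within about 2 of the real number of classes.
    best_r, best_f = None, -1.0
    for r in candidate_rs:
        centers = fsc_estimate(X, r=r)
        if abs(len(centers) - true_k) > 2:
            continue
        f = f_measure(assign(X, centers), classes)
        if f > best_f:
            best_r, best_f = r, f
    return best_r
```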
The values of the learning rate (α) and the maximum epoch (maxEpoch) are α = 0.4 and maxEpoch = 50, the best combination found in [25]. The threshold value (ε) is 0.7 for FSC-SOM, because the convergence condition compares the difference between F_fsc-som and F_fsc in the learning process at the second level.
Figure 2. Visualization of glass dataset

Figure 3. Visualization of FSC-SOM result for glass dataset
The performance of FSC-SOM can be seen in Table 3, which reports, for each dataset, on how many of the cluster validity measurements the quality of the clustering result of FSC-SOM is greater than or equal to that of the other algorithm, i.e. FSC or SOM. There are 4 cluster validity measurements for comparing the proposed algorithm with the other algorithms, i.e. F-Measure, Entropy, Silhouette Index, and Dunn Index.
The results show that the quality of the clustering result of FSC-SOM is at least equal to the quality of the clustering result of FSC for all datasets and all cluster validity measures, both external and internal.
Meanwhile, the quality of the clustering result of FSC-SOM is greater than or equal to the quality of the clustering result of SOM for some datasets and different cluster validity measures.
The quality of the clustering result of FSC-SOM based on the precision and recall of the true classes using the F-Measure is greater than or equal to SOM in 7 datasets. The quality of the clustering result of FSC-SOM based on the homogeneity of the clusters using entropy is greater than or equal to SOM in 9 datasets. The quality of the clustering result of FSC-SOM based on the ratio between the average intracluster distance and the average intercluster distance using the Silhouette Index is greater than or equal to SOM in 8 datasets. The quality of the clustering result of FSC-SOM based on the ratio between the smallest intercluster distance and the largest intracluster distance using the Dunn Index is greater than or equal to SOM in 8 datasets.
Table 3. The Performance of FSC-SOM (number of validity indices, out of 4, on which FSC-SOM is greater than or equal to the compared method)

Dataset        vs. FSC   vs. SOM
Iris           4/4       4/4
Wine           4/4       4/4
Glass          4/4       3/4
WDBC           4/4       4/4
CMC            4/4       2/4
Yeast          4/4       3/4
Optical Digit  4/4       4/4
Statlog        4/4       2/4
Thyroid        4/4       3/4
Magic Gamma    4/4       3/4
5. Conclusion
FSC can handle the problems of SOM by defining the parameters of SOM, i.e. the number of clusters and the initial values of the neuron weights. SOM can also ameliorate the cluster centers that are defined by FSC, so that a better quality of clustering can be achieved. The clustering result of FSC-SOM is better than or equal to the clustering result of FSC, as proven by the values of the external and internal validity measurements. Furthermore, the clustering result of FSC-SOM is better than the clustering result of SOM for some datasets.
Future work will involve using other methods to update the values of the learning rate and the neighborhood in SOM, e.g. Gaussian or heuristic methods, and using other methods to find the best combination of SOM's parameters, i.e. the values of the learning rate, maximum epoch, and threshold.
References
[1] Yang C, Chi S. An Ant-Based Self-Organizing Feature Maps Algorithm. 5th Workshop on Self-Organizing Maps. Paris. 2005.
[2] Gu L, Lu X. Semi-supervised Subtractive Clustering by Seeding. 9th International Conference on Fuzzy Systems and Knowledge Discovery. Sichuan. 2012; 1: 738-741.
[3] Santosa B. Data Mining: Teknik Pemanfaatan Data untuk Keperluan Bisnis. Yogyakarta: Graha Ilmu. 2007.
[4] Chi S, Yang C. A Two-stage Clustering Method Combining Ant Colony SOM and K-means. Journal of Information Science and Engineering. 2008; 24(1): 1445-1460.
[5] Luo B, Tang X. Using Self-Organizing Map for Ideas Clustering of Group Argumentation. The 11th International Symposium on Knowledge and Systems Sciences. Xi'an. 2010; 1: 1-6.
[6] Musdholifah A, Hashim SZM. Triangular Kernel Nearest Neighbor Based Clustering for Pattern Extraction in Spatio-Temporal Database. 2010 10th International Conference on Intelligent Systems Design and Applications (ISDA). Cairo. 2010; 1: 67-73.
[7] Sarlin P, Eklund T. Fuzzy Clustering of the Self-Organizing Map: Some Applications on Financial Time Series. Advances in Self-Organizing Maps - 8th International Workshop, WSOM 2011. Espoo. 2011; 1: 40-50.
[8] Silva B, Marques N. Feature Clustering with Self-Organizing Maps and an Application to Financial Time-Series for Portfolio Selection. Proceedings of the International Conference on Fuzzy Computation and International Conference on Neural Computation. Valencia. 2010; 1: 301-309.
[9] Mokris I, Forgac R. Decreasing the Feature Space Dimension by Kohonen Self-Organizing Maps. 2nd Slovakian-Hungarian Joint Symposium on Applied Machine Intelligence. Budapest. 2004.
[10] Tarek KM, Farouk B. Kohonen Maps Combined to Fuzzy C-means, a Two Level Clustering Approach. Application to Electricity Load Data. Self Organizing Maps - Applications and Novel Algorithm Design. 2011; 1: 541-558.
[11] Souza JR, Ludermir TB, Almeida LM. A Two Stage Clustering Method Combining Self-Organizing Maps and Ant K-Means. Artificial Neural Networks - ICANN 2009. Limassol. 2009; 5768: 485-494.
[12] Abdullah AG, Feranie S. Development of Short Term Load Forecasting Based On Fuzzy Subtractive Clustering. 2014. https://www.researchgate.net/publication/228933118_DEVELOPMENT_OF_SHORT_TERM_LOAD_FORECASTING_BASED_ON_FUZZY_SUBTRACTIVE_CLUSTERING.
[13] Sastria G, Liong C, Hashim I. Application of Fuzzy Subtractive Clustering for Enzymes Classification. Applied Computing Conference (ACC'08). Istanbul. 2008; 1: 304-309.
[14] Han L, Chen G. HFCT: A Hybrid Fuzzy Clustering Method for Collaborative Tagging. 2007 International Conference on Convergence Information Technology. Gyeongju. 2007; 1: 1389-1394.
[15] Yang Q, Zhang D, Tian F. An Initialization Method for Fuzzy C-Means Algorithm Using Subtractive Clustering. 2010 Third International Conference on Intelligent Networks and Intelligent Systems. Shenyang. 2010; 1: 393-396.
[16] Chiu SL. Fuzzy Model Identification Based On Cluster Estimation. Journal of Intelligent and Fuzzy Systems. 1994; 2(3): 267-278.
[17] Camastra F, Vinciarelli A. Machine Learning for Audio, Image and Video Analysis. London: Springer-Verlag. 2007.
[18] Rojas R. Neural Networks: A Systematic Introduction. Berlin: Springer-Verlag. 1996.
[19] Rendón E, Abundez I, Arizmendi A, Quiroz EM. Internal versus External cluster validation indexes. International Journal of Computers and Communications. 2011; 5(1): 27-34.
[20] Chen Y, Qin B, Liu T, Li S. The Comparison of SOM and K-means for Text Clustering. Computer and Information Science. 2010; 3(2): 268-274.
[21] Zhao Y, Karypis G. Empirical and Theoretical Comparisons of Selected Criterion Functions for Document Clustering. Machine Learning. 2004; 55: 311-331.
[22] Arbelaitz O, Gurrutxaga I, Muguerza J, Perez JM, Perona I. An Extensive Comparative Study of Cluster Validity Indices. Pattern Recognition. 2013; 46: 243-256.
[23] Rousseeuw PJ. Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis. Journal of Computational and Applied Mathematics. 1987; 20: 53-65.
[24] Dunn J. Well Separated Clusters and Optimal Fuzzy Partitions. Journal of Cybernetics. 1974; 4(1): 95-104.
[25] Chaudhary V, Bhatia RS, Ahlawat AK. A Constant Learning Rate Self-Organizing Map (CLRSOM) Learning Algorithm. Journal of Information Science and Engineering. 2015; 31(1): 387-397.