TELKOM
NIKA
, Vol. 13, No. 4, Dece
mb
er 201
5, pp. 1361
~1
367
ISSN: 1693-6
930,
accredited
A
by DIKTI, De
cree No: 58/DIK
T
I/Kep/2013
DOI
:
10.12928/TELKOMNIKA.v13i4.2179
1361
Re
cei
v
ed
De
cem
ber 2
3
, 2014; Re
vi
sed
Jan
uar
y 29, 2
015; Accepte
d
March 12, 2
015
Alternative Technique Reducing Complexity of
Maximum Attribute Relation
I
w
a
n
T
r
i Riy
a
di Yanto*
1
, Imam Az
hari
2
Information S
ystem, Universit
a
s Ahmad D
a
h
l
an (UAD)
3
rd
Camp
us, Jln. Prof. Soepo
mo, Janturan,
Yog
y
ak
arta, Indon
esia
*Corres
p
o
ndi
n
g
author, em
ai
l
:
yanto.itr@is.
u
ad.ac.id
1
, azh
a
r
i@ua
d.ac.id
2
A
b
st
r
a
ct
Clusteri
ng r
e
fe
rs to the meth
od gr
oup
in
g th
e larg
e d
a
ta i
n
to the s
m
a
lle
r grou
ps bas
e
d
on th
e
similar
i
ty
meas
ure. Cl
usteri
ng
techn
i
qu
es
ha
ve b
een
a
ppl
ie
d o
n
n
u
m
eric
al
, categor
ical
a
nd
mix
d
a
ta. On
e
of the
categ
o
ri
cal
data
cluste
ring
tech
niq
u
e
bas
ed
on
the soft set theory is Ma
xi
mu
m Attribute
R
e
l
a
ti
o
n
(MAR). T
he M
A
R tech
niq
u
e
allow
s
c
a
lcu
l
ati
ng
all
of
pair
multi s
o
ft set
ma
de. H
o
w
e
ver, t
he c
o
mput
atio
nal
compl
e
xity is
still
an
issu
e
of the tec
h
n
i
qu
e.
To over
c
o
me t
he
draw
back, t
he
pa
per
pro
p
o
s
es the
a
l
terna
t
iv
e
alg
o
rith
m to
d
e
creas
e th
e c
o
mpl
e
xity so
g
e
t the fast
er
r
e
spo
n
se ti
me.
In this
p
aper,
to g
e
t the
si
mi
lar
results as MAR
w
i
thout calcu
l
ating
all
pair
of soft set is
prov
ed. T
he alt
e
rna
t
ive alg
o
rith
m i
s
imple
m
ente
d
in
MAT
L
AB Software, and the
n
e
x
peri
m
e
n
tal is
run on the
1
0
benc
h
m
ark dat
asets. T
he
results show
that the
altern
ative al
go
rithm i
m
proves
t
he computati
ona
l co
mpl
e
xit
y
in term of res
pons
e time up
to 36.46%.
Ke
y
w
ords
: Soft set,
Multi Soft set, At
tribute Relation
Copy
right
©
2015 Un
ive
r
sita
s Ah
mad
Dah
l
an
. All rig
h
t
s r
ese
rved
.
1. Introduc
tion
Clu
s
terin
g
is
a fundam
ent
al pro
b
lem th
at freque
ntly arises i
n
a broad vari
ety of fields
su
ch as p
a
ttern re
cog
n
ition
,
image pro
c
essing,
ma
ch
ine learning
and statisti
cs [1, 2]. It can be
defined a
s
a pro
c
e
ss of pa
rtitioning a gi
ven data set
of multiple attribute
s
into group
s. Clu
s
teri
ng
has b
een
use
d
in many a
r
eas
su
ch a
s
gene d
a
ta
proce
s
sing [3], transactio
nal
data processi
ng
[4], decisio
n sup
port [5], mobile ad
-h
o
c
netwo
rks (MANETs) [6], study anxiety [7] and radar
sign
als
pro
c
e
ssi
ng [8]. Re
cently, many
attentions
ha
ve been
put
on catego
ri
ca
l data
clu
s
teri
ng,
whe
r
e data o
b
ject
s are m
a
de up of non
-nume
r
ical attributes [9, 10].
The main differen
c
e b
e
tween catego
ri
cal data an
d
numeri
c
al d
a
ta is the multi-valued
attribute that
belong
s to
the categ
o
rical data. Th
ese p
r
o
perti
es lea
d
to difficulties in
the
simila
rities a
nd di
ssimila
ri
ty measure
m
ent in th
e
c
l
us
te
r
i
ng
p
r
oc
es
s
,
s
i
nc
e
th
e
n
o
r
m
a
l
d
i
s
t
an
c
e
measures
ca
nnot be appl
ied dire
ctly
to the catego
rical d
a
ta. Therefo
r
e, the
best simila
ri
ty
measurement
of the categ
o
rical data is
done by
defin
ing the com
m
on obje
c
t for the attribute as
well a
s
the co
mmon value
s
of the attribute,
and the asso
ciation b
e
twee
n the two
[9].
Curre
n
tly, tw
o measure
m
ent approa
ch
es
ba
sed o
n
the theory of rough set h
a
s bee
n
introdu
ce
d in
clu
s
terin
g
attribute sel
e
ctio
n. The
first a
ppro
a
ch is b
a
sed o
n
the ro
ughn
ess of th
e
attribute, i.e. Total Rou
g
hne
ss
(TR)
prop
os
ed by
by Mazla
c
k et al. [11] and Min–
Min
Rou
ghn
ess
(MMR) p
r
op
o
s
ed
by Pa
rm
ar et
al. [12]
. The
se
con
d
ap
pro
a
ch
calle
d Maxim
u
m
Dep
end
ency
of Attribute (MDA) p
r
op
osed by He
ra
w
an et al. [13]. The ap
pro
a
ches of findi
ng
a
clu
s
terin
g
attribute had
su
ccessfu
lly ex
ploited the u
n
ce
rtaintie
s
i
n
the multi-v
a
lued info
rm
ation
system. But, t
here
exist
s
some
unexpe
cted iteration
t
hat lea
d
s to
a
n
in
crement i
n
the
processing
time. the soft set the
o
ry p
r
opo
sed
by M
o
lodtsov
is
a
new
to ma
n
age u
n
certai
n
data. Mam
a
t et
al. propo
se
[14] MAR,
an
alternative te
chni
que
to
select
a
clu
s
te
ring
attribute,
One
of th
e
well
k
n
ow
n te
ch
n
i
q
u
e
s
ba
se
d
on
so
ft s
e
t theo
r
y
[1
5
].
It
is based on a concept
of Ma
ximum
Attrib
ute
Relative wh
e
r
e the com
p
a
r
iso
n
of attributes is
ma
de
by taking into account th
e relative of the
attribute at the categ
o
ry le
vel after the multi-va
lue
d
attribute is d
e
c
omp
o
sed be
a multi soft set.
The propo
se
d techniq
u
e
potentially discovers th
e attributes
sub
s
et
s with
well cove
ra
ge.
However,
the time
compl
e
xity is still i
n
issue
si
nce more
categories
on
every
attribute of
a
categ
o
ri
cal
d
a
ta, the mo
re
multi soft set
made. In
this pap
er, the
al
ternative al
go
rithm to
red
u
c
e
the com
p
lexity of MAR is prop
osed. Th
e differe
n
c
e
with MAR i
s
the propo
se
d algorith
m
usi
n
g
Maximum Attribute Relative con
c
e
p
t wi
thout cal
c
ulat
ing all of pai
r multi soft set made. Th
e
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 16
93-6
930
TELKOM
NIKA
Vol. 13, No
. 4, Decem
b
e
r
2015 : 136
1 – 1367
1362
prop
osed al
g
o
rithm p
o
tenti
a
lly achi
eves
a lower
co
m
p
utational time
com
p
lexity as comp
are
d
to
the origin
al M
A
R.
2. Rese
arch
Metho
d
2.1. Information Sy
stem
An informati
on sy
stem (I
S) refe
rs to
a co
llection of multiple pieces of equipment
involved in
th
e di
sseminati
on of i
n
form
a
t
ion. Form
ally
, as
define
d
i
n
[16] a
n
info
rmation
sy
ste
m
can b
e
re
pre
s
ente
d
as
a 4-tuple
(qu
a
d
r
uple
)
f
V
A
U
S
,
,
,
, where
U
u
u
u
u
U
,
,
,
,
3
2
1
and
A
a
a
a
a
A
,
,
,
,
3
2
1
is a no
n- em
pty finite set of obj
e
c
ts a
nd attrib
utes, re
sp
ecti
vely.
a
a
A
a
V
V
V
,
is the domai
n (value set
)
of attribute
a
,
V
A
U
f
:
is an information
function
such that
a
V
a
u
f
,
, for every
,
,
A
U
a
u
calle
d informatio
n (kno
wle
d
g
e
)
function.
Defini
tion 1
: let
,
,
,
,
be an inf
o
rmatio
n syst
em. If
0,1
,
for every
∈
,
then
,
,
,
,
is call
ed
a Bolean-val
ued informati
on syste
m
.
2.2. Soft Set
Theory
Let
is a
set
of param
ete
r
de
scribin
g
obje
c
ts in
,
is the power
set of
and
⊂
,
s
o
ft s
e
t over
as defin
ed in
[15] is a pair
,
,
whe
r
e
is a function give
n by
:
→
(
1
)
Obv
i
ou
sly
,
a sof
t
set
,
over
can be said a
s
a paramete
r
ized family (sub
set
)
of the
universe. For
∈
,
may be considere
d
as the set of
-eleme
nts of the soft set
or the
set
-ap
p
roximate element
s of the soft set
.
Based
on the
definition of
an informatio
n syst
em
and
a soft set, a
s
explain
ed i
n
[17], a
soft set
can
be inte
rp
rete
d a
s
a
sp
eci
a
l type of
inf
o
rmatio
n syst
ems,
te
rme
d
a
bin
a
ry-val
u
e
d
informatio
n. The pro
p
o
s
itio
n and proof a
r
e given a
s
follows
Propotition 1
: Each soft set can be
con
s
ide
r
ed a
s
a
bolea
n-valu
e
d
informatio
n system.
Proof
: lets
,
be a soft
set over univ
e
rse
,
,
,
,
be an
information
system. Obvi
ously, the universe
in
,
can
be consi
dere
d
as the universe
,
the parameter
set
can be consi
dered as the attribute
.
Then, the informatio
n syst
em function,
is defined
by
1,
∈
0,
∉
(
2
)
That is, when
∈
,
where
∈
and
∈
, then
,
1
,
otherwi
se
,
0
.
To this, we have
,
0,1
.
Therefore, a soft set
,
can be con
s
i
dere
d
as
Bolean
-value
d informatio
n system
where
,
,
,
,
and
a soft set
,
can
b
e
rep
r
e
s
ente
d
in the form of Bolean Ta
ble
.
Defini
tion 2
:
(see [18]
) th
e cla
s
s of all
value
sets
o
f
a soft
set
,
is
called value-
cla
ss of the soft set and is
denote
d
by
,
.
From p
r
op
osi
t
ion 1, the “standard”
soft
set deal
s wi
th a Boolean
-valued info
rmation
system. F
o
r
a cate
go
rical-valued
information
syste
m
,
,
,
,
wh
ere
⋃
,
∈
is
the domai
n (value set
)
of attribute
which ha
s
categ
o
rical (m
ulti) values, a d
e
compo
s
ition
can
be made fro
m
into
|
|
number of Boolea
n-value
d
information syste
m
,
,
,
,
.
The
decompo
sitio
n
of
,
,
,
is ba
sed
on de
co
mpositio
n of
,
,⋯,
|
|
into the
disjoi
nt-si
ngle
t
on attribute
,
,⋯
,
|
|
.
Defini
tion 3:
(see [14]) L
e
ts
,
,
,
be an information sy
stem su
ch that for
every
∈
,
,
is a fi
nite non
-em
p
ty set and fo
r every
∈
,
|
,
|
1
.
For
every
unde
r
-
a
ttr
ibute
c
o
ns
id
er
a
t
io
n
,
∈
and
∈
, the
map
of
is def
ined a
s
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
1693-6
930
Alternative Tech
niqu
e Re
duci
ng Com
p
lexity of MAR
(Iwan T
r
i Ri
yadi Yanto)
1363
:
→
0,1
,
(
3
)
su
ch t
hat
1,
,
0,
(
4
)
Defini
tion 4
: Lets
,
,
,
be a categori
c
al
-valued information syste
m
and
,
,
,
,
1
,
2
,
⋯,
|
|
Boolean
-valu
ed inform
atio
n system, we have
,
,
,
,
,
,
,
⟺
,
,
,
,
,
⟺
,
⋮
,
,
,
,⋯,
,
|
|
||
,
||
,
,
,
⟺
,
||
.
(5)
Then,
,
,
,
,
,⋯,
,
|
|
ca
n be
defined as a
multi soft set over universe
rep
r
e
s
entin
g a categ
o
ri
cal
-
valued inform
ation system
,
,
,
.
2.3. Maximum Attribu
t
e
Relativ
e
(MAR)
The MAR te
chniqu
e app
ro
ach h
a
s b
een
propo
se
d by Mamat et al [14]. There a
r
e three
main ste
p
s t
o
sel
e
ct the
domina
n
t attribute. The
first step i
s
det
ermin
e
the suppo
rt of soft
set
respe
c
t to ea
ch p
a
ra
meters over th
e uni
verse
.
Co
nsi
d
er to the p
a
ir
,
, as
s
i
gn to multi-s
o
ft
set over
, repre
s
e
n
ting
a categ
o
ri
cal
-
valued info
rmation sy
ste
m
,
,
,
, where
,
,⋯,
,
|
|
⊆
,
and
,
,⋯,
,
|
|
⊆
,
.
Suppo
rt of
,
by
,
denote
d
,
,
is d
e
fined a
s
,
,
|
,
∩
,
|
|
,
|
.
(6)
The next ste
p
is cal
c
ul
ating the maximum and mi
nimum supp
ort. The max
i
mum su
ppo
rt is
defined
as a
summ
ation
of all supp
ort
with valu
e e
qual to
1.
Fo
r ea
ch
soft
set
,
,
the
maximum su
pport is d
enot
ed maxsu
p
,
,
is defined a
s
maxsup
,
∑
,
,
1
.
(7)
Mean
while, t
he minim
u
m
sup
port i
s
a
summation
of
all su
ppo
rt wi
th value le
ss
than 1.
For e
a
ch
s
o
ft s
e
t
,
,
the maximum su
pport is d
enot
ed minsup
,
,
is defined a
s
minsup
,
∑
,
,
1
.
(
8
)
The MAR tech
niqu
e uses the hig
h
e
st and mo
st freque
ntly occure
d in
the proba
b
ility
dis
t
ribution.If
max
max
sup
,
,⋯,m
a
x
s
u
p
,
|
|
|
|
1
, then
,
is a
c
l
us
tering attribute. If
max
max
sup
,
,⋯,m
a
x
s
u
p
,
|
|
|
|
1
then
max
minsup
,
,⋯,m
i
n
s
u
p
,
|
|
|
|
is a clusteri
ng attribute, whe
r
e max refers to the
value that th
e high
est i
n
t
he p
r
ob
ability distri
but
ion
and m
ode
ref
e
rs to the val
ue that i
s
mo
st
freque
ntly accored in the
p
r
oba
bility distribution.
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 16
93-6
930
TELKOM
NIKA
Vol. 13, No
. 4, Decem
b
e
r
2015 : 136
1 – 1367
1364
Input : Categorical data se
t
Output : Selected attribute
Begin
Builts the mul
t
i-soft set approximation
Cal
c
ulate
sup
port, MaxSup
and MinSup
for i=all categories
for j=all categories
Calc
ulate inters
ec
tion
s
o
ft s
e
t i res
p
ec
t to s
o
ft s
e
t j
Calc
ulate the
s
u
pport :=
int
e
rsec
tion / s
o
ft s
e
t j
Cal
c
u
k
ate a
s
MaxSup or M
i
nSup
end
end
Select attribut
e based on M
a
xsup a
nd Mi
nSup.
Figure 1. The
pseu
do code
of MAR
2.4. Reducin
g Complexity
Thro
ugh
out this sectio
n, a pair
,
, refers
to multi-soft s
e
ts
over
the univers
e U
descri
b
ing a categ
o
ri
cal valued inform
ation system
,
,
,
. Consi
der to the suppo
rt in
[14], the
value of
,
,
∈
0,1
is clear.
Howeve
r, the value of support is alwa
ys 0 o
r
1 in the certai
n ca
se. The j
u
stific
atio
n is
given in pro
p
o
sition 2.
Proposition 2
: Lets
,
,
,
be an inform
atio
n system, re
pre
s
ente
d
a
s
a pair
,
as multi soft set over
,
where
,
,,
⋯
,
,
|
|
⊆
,
and
,
,
⋯
,
,
|
|
⊆
,
. If
,
,
|
,
∩
,
|
|
,
|
,
.
(
9
)
then
1.
,
,
0
,
2.
,
,
1
, for
Proof.
1. Let
,
be a m
u
lti soft set o
v
er the unive
rse
U, whe
r
e
,
,,⋯,
,
|
|
⊆
,
a
nd
,
,
⋯
,
,
|
|
⊆
,
. Based on the definition 4, the mu
lti s
o
ft
s
e
t is
c
o
ns
truc
ted by
decompo
sin
g
the multi valued of attr
ibute in the
information syste
m
,
,
,
be multi
para
m
eters. In the other
word
s, ea
ch at
tribut
e can b
e
re
con
s
tru
c
t
ed into so
me
pair of soft
set
i.
e:
,
∪⋯
∪
,
|
|
and
,
∩
,
∅
,
for
. Thus
, for
,
,
,
0
,
whe
n
.
2.
Clear. The soft s
e
t inters
ects
with its
s
e
lf
.
Based o
n
the
propo
sition
2, the supp
ort value
of each attribute
s
is
exactly always 1 if
the soft set i
n
tersect
s
wit
h
itself
and
0
if the
so
ft s
e
t
inters
ec
ts
res
p
ec
t
to
other s
o
ft s
e
t
in the
same
attribut
e. For this
rea
s
on, n
o
t all sup
port o
f
pair set
s
in
the multi so
ft set
,
is
cal
c
ulate
d
, b
u
t we
only
ca
lculate th
e
su
pport
of pai
rs soft set re
sp
ect to all
soft set in
the ot
her
attribute. The
r
efore, the co
mplexity of algorithm
will b
e
red
u
ced. S
uppo
se th
at in an info
rmati
on
system, there
are
object
s
,
attributes an
d
is the maximum distin
ct values of ea
ch attribute.
Comp
utation
a
l co
st to determin
e
the
el
ementa
r
y set
of all attributes is
1
. The propo
se
d
techni
que ne
eds
1
times to determi
ne the sup
port for ea
ch cate
gory. Thu
s
, the
comp
utationa
l complexity for the MAR techniq
ue i
s
1
1
1
.
howeve
r
, the modification
algorith
m
nee
ds
1
. Thus, the computatio
n
a
l
compl
e
xity fo
r the pro
p
o
s
ed tech
niqu
e
is
1
1
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
1693-6
930
Alternative Tech
niqu
e Re
duci
ng Com
p
lexity of MAR
(Iwan T
r
i Ri
yadi Yanto)
1365
1
. The pseu
d
o
cod
e
of MAR and alte
rnative
algorit
hm are give
n
in figure 1
and figure 2
,
r
e
spec
tively.
Input : Categorical data se
t
Output : Selected attribute
Begin
Builts the mul
t
i-soft set ap
p
r
oximation
Cal
c
ulate
sup
port, MaxSup
and MinSup
for i=all attribute
for j=all attribute
if i isnot equal
to j
Cal
c
ulate inte
rse
c
tion all
so
ft set in attribut
e i res
p
ec
t t
o
all s
o
ft s
e
t in attribute j
Cal
c
ulate the
sup
port := int
e
rsectio
n
/ eac
h
s
o
ft s
e
t in the attribute j
Cal
c
ulate the
minsu
p
and
maxsup
end
end
Select attribut
e based on M
a
xsup a
nd Mi
nSup.
Figure 2. The
pseu
do code
of propo
sed
algorith
m
3. Experimental Re
sults
and discus
s
ion
In ord
e
r to
co
mpare the
propo
sed
algo
ri
thm
and
MAR app
roa
c
h, th
e both
algo
rit
h
ms
are
impleme
n
ted
in MATLAB
win 32
bit versi
on 7.1
0
.0
(R2
010a
). Th
ey are
executed
seq
uentially o
n
a pro
c
e
s
sors
Intel Atom Qu
ad core
@1.5
0 GHz (4 CP
Us). The total
main mem
o
ry is 2G an
d the
operating
system is Wi
ndo
ws
7 Profe
ssi
onal 32
-b
it.
We el
abo
rate
throug
h the
UCI b
e
n
c
hm
ark
datasets [19] as in table 1:
Table 1. The
UCI be
nchma
r
k data
s
et
s
Datase
ts
Num
b
er o
f
Insta
n
ces
Num
b
er o
f
A
ttribu
t
es
Size of Da
ta
AcuteImflammations
120
6
720
balloon
16
5
80
cilinder_band
520
14
7,280
lenses
24
5
120
lungCancer
32
57
1,824
mushroom
8,124
22
178,728
solar_flare
1,066
13
13,858
soybean
47
35
1,645
soybean_lar
ge
265
36
9,540
suplier
27
7
189
All the sele
cted data
set
s
are differe
nt fr
om on
e
anothe
r in
terms
of si
ze, either
hori
z
ontally o
r
vertically ai
med to analyze
the perfo
rmance of the prop
osed tech
niqu
e wh
en
involving a high numb
e
r o
f
reco
rds a
s
well as
the h
i
gh numb
e
r
of attributes. Some datase
t
s
have be
en m
odified by
re
moving in
sta
n
ce
s h
a
ving i
n
com
p
lete
da
ta and
rem
o
ving an
attribu
t
e
only having o
ne cate
gori
c
a
l
value.
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 16
93-6
930
TELKOM
NIKA
Vol. 13, No
. 4, Decem
b
e
r
2015 : 136
1 – 1367
1366
Table 2. Co
m
putation time of MAR and p
r
opo
se
d algo
rithm
Datase
ts
TR
MMR
MD
A
M
A
R
The
prop
osed
algorit
hm
I
m
p
r
o
v
em
en
t
AcuteImflammations
0.3038
0.3080
0.2896
0.1880
0.1560
17.02%
balloon
0.0171
0.0194
0.0194
0.0160
0.0150
6.25%
cilinder_band
0.2231
0.2409
0.1175
0.1090
0.0620
43.12%
lenses
0.1085
0.1240
0.0310
0.0310
0.0160
48.39%
lungCancer
0.1395
0.1292
0.0775
0.0620
0.0470
24.19%
mushroom
1.0128
1.1517
1.0060
0.0470
0.0200
57.45%
solar_flare
0.4694
0.4687
0.3957
0.0780
0.0470
39.74%
soybean
0.0858
0.0633
0.0349
0.0310
0.0160
48.39%
soybean_lar
ge
0.2306
0.2490
0.1214
0.1090
0.0780
28.44%
suplier
0.0395
0.0550
0.0352
0.0310
0.0150
51.61%
Average
36.46%
The
comp
utation results com
pari
ng
the MA
R
an
d propo
se
d
algorith
m
in
term of
executio
n time are
sh
own in table 2. A decrea
s
in
g relative velocit
y
of the prop
ose
d
algo
rith
m
respec
t to MAR is
calc
ulated by following formula:
%
100%
(10
)
In summa
ry, based on ex
perim
ents on
UCI datas
ets, the propo
sed algo
rithm achi
eves
lowe
r
com
p
u
t
ation time th
an the
p
r
evi
ous alg
o
ri
th
m. An in
crea
se
average
time of
pro
p
o
s
ed
algorith
m
rea
c
h 36.4
6
%.
Balloon d
a
taset contai
ning
16 obje
c
ts, 5
attributes
an
d su
plier
co
ntaining 2
7
obj
ects,
7
attributes are
the faste
s
t
executio
n tim
e
s, st
a
ndin
g
at
0.015 se
cond
at pro
p
o
s
ed algo
rith
m,
0.016 second
and 0,031
seco
nd at MAR, improvin
g up to 6.25 % and 51.6
1
%, resp
ectively. A
part from tha
t, the longest
execution ti
me is at
acu
t
e inflammations d
a
ta set
containi
ng 1
20
obje
c
ts and 6
attributes, the prop
osed a
l
gorithm
ne
ed
s 0.156
se
co
nd, improving
17.02 % from
0.188
se
con
d
nee
ded M
A
R. The hi
g
hest imp
r
ov
e
m
ent achiev
es 5
7
.45 %
on the mu
sh
room
datasets, co
mpri
sing 8
1
2
4
obje
c
ts a
n
d
22 attribut
e
s
. On other h
a
nd, the lower improvem
ent
is
on ballo
on da
tasets. Th
e range of the h
i
ghe
st and th
e lowe
st imp
r
ovement is 5
1
.20% while t
h
e
averag
e improvement re
aches
3
6
.46% o
f
10 dataset
s.
4. Conclusio
n
A numbe
r al
gorithm
used
to sele
cting
attr
ibute
on categori
c
al dat
a
clu
s
teri
ng probl
em
have bee
n propo
sed, on
e of them is MAR. However,
computation
a
l compl
e
xity
of this algorit
h
m
is still an i
s
sue. This paper pr
oposes an alternative al
gorithm
m
odif
i
ed MAR based on multi
soft
set theo
ry selectin
g attri
bute to
clu
s
tering
mu
lti-v
a
lue info
rmat
ion sy
stem
s. The m
odification
allows not all multi soft set composed attributes
cal
c
ul
ated. The experim
ent results illustrate the
prop
osed
alg
o
rithm
achiev
e lo
wer exe
c
ution time.
Th
e mai
n
contri
bution
of this
work is in te
rms
of redu
cin
g
e
x
ecution time
whe
r
e it is sli
ghtly
improve
d
as
comp
are
d
to MAR. In the next stag
e,
the ability of
the techniq
u
e
to cl
assify categor
i
c
al
dat
a will
be
exa
m
ined i
n
terms of
clu
s
te
ri
ng
efficien
cy an
d accu
ra
cy of cluste
rin
g
. It
also
n
eed
s to
be develo
p
e
d
for groupi
n
g
different
kin
d
s
of data types.
Referen
ces
[1]
Z
Huang. “E
xtensi
ons to the
k-Means Alg
o
r
ithm
for Clusterin
g Larg
e
D
a
ta Sets
w
i
t
h
Categ
o
rica
l
Values”.
Data Min. Know
l. Di
scov.
1998; 2(
3): 283–
30
4.
[2]
DW
Kim, KH
Lee
an
d D
L
e
e
. “F
uzz
y
clust
e
rin
g
of c
a
teg
o
rical
d
a
ta us
i
ng fuzz
y
centr
o
ids”.
Pattern
Recognit. Lett.
200
4; 25(1
1
): 1263
–1
271.
[3]
D Jian
g, C T
a
ng a
nd A Z
h
a
ng. “Cluster
an
al
ysis for g
ene
expressi
on d
a
t
a: a surve
y
”.
IEEE Trans.
Know
l. Data Eng.
200
4; 16(1
1
): 1370
–1
386.
[4]
F
Giannotti, C Gozzi and G M
anco. “Cl
u
steri
ng T
r
ansaction
al Data”. in
Pr
i
n
cipl
es of Data
Minin
g
an
d
Know
led
ge
Di
scovery SE
-
15
, T
Eloma
a
, H Ma
nni
la
and
H T
o
ivo
nen, E
d
s. Spr
i
ng
er Ber
l
i
n
Heid
el
berg. 2
0
02; 243
1: 175
–
187.
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
1693-6
930
Alternative Tech
niqu
e Re
duci
ng Com
p
lexity of MAR
(Iwan T
r
i Ri
yadi Yanto)
1367
[5]
RG Mathieu
and JE Gibso
n
. “A methodolo
g
y
for lar
g
e-scal
e
R&D pla
nni
ng b
a
se
d on cluste
r
analy
sis”.
IEEE Trans. Eng. Manag.
199
3; 40
(3): 283–
29
2.
[6]
A Afsharfarn
ia
and
A Kar
i
mi. “A Ne
w
Clusteri
ng A
l
g
o
rithm Us
ing
Links’
W
e
ig
ht to Decr
ease
Cons
umed En
erg
y
in MANE
T
s
”.
T
E
LKOMNIKA (T
eleco
m
mu
nicati
on
Co
mp
ut. Electron. Contro
l.
201
4; 12(2): 41
1.
[7]
IT
R Yanto, P
Vitasari, T
Hera
w
a
n a
nd MM
Deris. “Ap
p
l
y
i
ng var
i
ab
le
pr
ecisio
n ro
ug
h
set mode
l for
clusteri
ng stud
ent suffering st
ud
y’s a
n
x
iet
y
”.
Expert Syst. Appl.
20
12; 39(
1
)
: 452–4
59.
[8]
S Haim
ov, M Micha
l
ev, A
Savche
n
ko
an
d OI Yo
rda
n
o
v
. “Classific
a
ti
on of r
adar
si
gnatur
es b
y
autore
g
ressiv
e
mode
l fitting
and c
l
uster a
n
a
l
y
sis”.
IEEE Trans. Geosci.
Rem
o
te S
ens.
198
9; 27(
5):
606
–6
10.
[9]
S Den
g
, Z
He
and
X
Xu. “G-ANMI: A mutual
in
formati
on b
a
sed
ge
n
e
tic cluster
i
ng
alg
o
rithm fo
r
categor
ical d
a
t
a
”.
Know
led
ge-
Based Syst.
20
10; 23(2): 1
44–
149.
[10]
K Chen a
nd L
Liu. “‘Best K’: critical cl
usteri
ng structures i
n
categor
ical
d
a
tasets”.
Knowl. Inf. Sys
t
.
200
8; 20(1): 1–
33.
[11]
AHYZ
La
w
r
e
n
c
e
J Mazlack. “A Roug
h Set
Appro
a
ch in C
h
o
o
sin
g
Partition
i
ng Attributes”.
[12]
D Parmar, T
Wu, and J
Bla
ckhurst. “MMR
: An al
gorit
hm
for clusteri
ng c
a
tegor
ical
dat
a
usin
g R
o
u
g
h
Set T
heor
y”.
Data Know
l. Eng
.
2007; 63(
3): 879–
89
3.
[13]
T
Hera
w
a
n, M
M
Deris
a
n
d
J
H
Ab
a
w
a
j
y
.
“A
rou
g
h
set a
p
p
roac
h for
sel
e
cting
clust
e
ri
ng
attribute”.
Know
led
ge-B
a
sed Syst.
2010
; 23(3): 220
–23
1.
[14]
R Mamat,
T
H
e
ra
w
a
n and MM Deris. “MAR: Max
i
m
u
m Attribute Rel
a
ti
ve of
soft set
for clustering
attribute sel
e
cti
on”.
Know
le
dg
e-Base
d Syst.
201
3; 52: 11–
2
0
,.
[15]
D Molo
dtsov. “Soft set theor
y—F
i
rst results”.
Comput. Math. with Appl.
1999; 37(4
–5): 19
–31.
[16]
MK Ng. “A fuzz
y
k-mo
des a
l
gorit
hm for cl
u
s
tering c
a
te
go
ri
ca
l
da
ta
”.
IEEE Trans. Fu
z
z
y
Syst.
1999
;
7(4): 446
–4
52,.
[17]
T
Hera
w
a
n. “A
Direct Pr
oof
o
f
Ever
y Ro
ug
h
Set is
a S
o
ft Set”. in
As
ia In
ternatio
nal
C
o
nferenc
e o
n
Mode
lli
ng &a
mp; Simulati
on
. 200
9; 0: 119–
1
24.
[18]
MI Ali, F
F
eng,
X Li
u, W
K
Min and M Sh
ab
i
r
. “On some ne
w
op
erati
ons
in soft set theo
r
y
”.
Com
p
ut.
Math. with Appl.
2009; 5
7
(9): 154
7–
155
3.
[19]
UCI Machine Learning Repository
:, “h
ttps://archive.ics.uci.ed
u/ml/datasets.html”.
Evaluation Warning : The document was created with Spire.PDF for Python.