International Journal of Electrical and Computer Engineering (IJECE)
Vol. 8, No. 3, June 2018, pp. 1805–1813
ISSN: 2088-8708

Institute of Advanced Engineering and Science
www.iaesjournal.com
A Preference Model on Adaptive Affinity Propagation

Rina Refianti¹, Achmad Benny Mutiara², Asep Juarna³, and Adang Suhendra⁴
¹,²,³ Faculty of Computer Science and Information Technology, Gunadarma University, Indonesia
⁴ Department of Informatics, Gunadarma University, Indonesia
Article Info

Article history:
Received Nov 17, 2017
Revised Feb 5, 2018
Accepted Feb 19, 2018

Keywords:
Affinity Propagation
Exemplars
Data Points
Similarity Matrix
Preference
ABSTRACT

In recent years, two new data clustering algorithms have been proposed. One of them is Affinity Propagation (AP). AP is a new data clustering technique that uses iterative message passing and considers all data points as potential exemplars. Two important inputs of AP are a similarity matrix (SM) of the data and the parameter "preference" p. Although the original AP algorithm has shown much success in data clustering, it still suffers from one limitation: it is not easy to determine the value of the parameter "preference" p that yields an optimal clustering solution. To resolve this limitation, we propose a new model of the parameter "preference" p, i.e. it is modeled based on the similarity distribution. Given the SM and p, the Modified Adaptive AP (MAAP) procedure is run. In the MAAP procedure we omit the adaptive p-scanning algorithm of the original Adaptive AP (AAP) procedure. Experimental results on random non-partition and partition data sets show that (i) the proposed algorithm, MAAP-DDP, is slower than the original AP for the random non-partition dataset, and (ii) for the random 4-partition dataset and the real datasets the proposed algorithm has succeeded in identifying clusters according to the number of the datasets' true labels, with execution times comparable to those of the original AP. Besides that, the MAAP-DDP algorithm proves more feasible and effective than the original AAP procedure.
Copyright © 2018 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Achmad Benny Mutiara
Faculty of Computer Science and Information Technology, Gunadarma University
Jl. Margonda Raya No. 100, Depok 16424, Indonesia
+62-815-916055
amutiara@staff.gunadarma.ac.id
1. INTRODUCTION

In the big-data era, the analysis of large amounts of data has become an essential area in Computer Science. Data mining methods, among others clustering methods, classification methods, etc., are needed to extract or mine the knowledge from large amounts of data. Grouping data according to their similarities over multiple characteristics is known as clustering [1].

In recent years, two new data clustering algorithms have been proposed. One of them is Affinity Propagation (AP), proposed by Brendan J. Frey and Delbert Dueck (2007) [2]. Unlike previous clustering methods such as k-means, which take random data points as the first potential exemplars, AP considers all data points as potential cluster centers [3, 4]. AP works by taking as input the similarities between data points and simultaneously considering all data points as potential cluster centers, called exemplars, iteratively calculating the responsibility r and the availability a based on the similarities until convergence. After convergence, AP finds clusters with far less error than k-means, and does so in less than one-hundredth of the time [3].

AP has several advantages over other clustering methods, because it considers all data points as potential exemplars, while most clustering methods find exemplars by recording and tracing a fixed set of data points and iteratively correcting it [3]. Because of this, most clustering methods do not change that set much and just keep tracking the particular sets. Furthermore, AP supports non-symmetric similarities, and it does not depend on the initialization found in other clustering algorithms. Because of these advantages, it has been successfully applied in many disciplines such as image clustering [3] and Chinese calligraphy clustering [5].
Journal Homepage: http://iaesjournal.com/online/index.php/IJECE
DOI: 10.11591/ijece.v8i3.pp1805-1813
This paper is a continuation of our previous works [6, 7]. In [6] we surveyed and investigated the performance of various AP approaches, i.e. Adaptive AP, Partition AP, Soft Constraint AP, and Fuzzy Statistic AP. It was found that (i) Partition AP (PAP) is the fastest among the four approaches, (ii) Adaptive AP (AAP) can remove the oscillations and is more stable than the others, and (iii) Fuzzy Statistic AP (FSAP) can yield a smaller cluster number than the other approaches, because its preferences are generated iteratively using fuzzy-statistical methods. In [7] we investigated the time complexity of various AP approaches, i.e. AAP, PAP, Landmark AP (L-AP), and K-AP. It was found that the approach with the most efficient computational cost and the fastest running time is Landmark AP, although its clustering result differs strongly from AP in the number of clusters.

Although AP itself has been proven to be faster than k-means, and it has also shown much success in data clustering, AP still has a limitation: it is not easy to determine the value of the parameter "preference" p that yields an optimal clustering solution [8]. The goal of this paper is to resolve this limitation by proposing a new model of the parameter "preference" p, i.e. it is modeled based on the distribution of the similarities. The model is explained in subsection 3.1. It is then applied to Adaptive AP (AAP), because AAP has a better level of accuracy than the other approaches.

In the experiments, random non-partition datasets, random partition datasets, and real datasets from the UCI repository [9] are used. For the partition datasets, the k-means algorithm [10] is applied to generate four groups of data points. The results of our experiments are shown in section 4.
2. THEORETICAL BACKGROUND
2.1. Affinity Propagation
2.1.1. Input Preference
Suppose we have a set of data points $X = \{x_1, x_2, x_3, \ldots, x_n\}$. AP takes as input a similarity matrix (SM) between the data points, $s$, where each similarity $s(i, j)$ shows how well data point $x_j$ is fitted to be an exemplar for $x_i$. Similarities of any type can be accepted, e.g. the negative Euclidean distance for real data and the Jaccard coefficient for non-metric data, so AP can be widely applied in different areas [7]. Instead of requiring that the number of clusters be predetermined, AP takes as input a real number $s(j, j)$ for each data point $j$, such that data points with larger values of $s(j, j)$ are more likely to be chosen as exemplars. These values are referred to as "preferences". The preferences affect the number of clusters produced. The preference values can be the median of the similarities or their minimum:

$$p = \mathrm{median}(s(:)) \quad \text{or} \quad p = \min(s(:)) \qquad (1)$$
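As an illustration (ours, not from the paper), a minimal NumPy sketch that builds the SM and sets the preference as in Eq. (1); the negative squared Euclidean similarity anticipates the measure used in section 4, and the helpers defined here are reused in the later sketches:

```python
import numpy as np

def similarity_matrix(X):
    """s(i, j) = -||x_i - x_j||^2, the similarity used in the experiments."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return -d2

X = np.random.rand(100, 2)                     # toy 2-D data
S = similarity_matrix(X)

# Eq. (1): two common choices for the preference p.
off_diag = S[~np.eye(len(S), dtype=bool)]      # exclude s(j, j)
p = np.median(off_diag)                        # or: p = off_diag.min()
np.fill_diagonal(S, p)                         # set s(j, j) = p
```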
2.1.2. Message passing

Given the similarities $s(i, j)$, $(i, j = 1, 2, \ldots, n)$, AP attempts to obtain the exemplars that maximize the net similarity, i.e. the overall sum of similarities between all exemplars and their member data points. The process in AP can be viewed as passing values between data points. Two values are passed between data points: responsibility and availability. The responsibility $r(i, j)$ shows how well-suited point $j$ is to serve as the exemplar for point $i$, taking into account other potential exemplars for point $i$. The availability $a(i, j)$ reflects the accumulated evidence for how appropriate it would be for point $i$ to choose point $j$ as its exemplar, taking into account the support from other points that point $j$ should be an exemplar.

Figure 1 shows how availability and responsibility work in AP. Availabilities $a(i, j)$ are transmitted from candidate exemplars to data points to state how available each candidate exemplar is to serve a data point as its cluster center. Responsibilities $r(i, j)$ are transmitted from data points to candidate exemplars and state how strongly each data point favors the candidate exemplar over other candidate exemplars. This message passing is repeated until convergence is met or the iterations reach a certain number. Initially all $r(i, j)$ and $a(i, j)$ are set to 0, and their values are updated iteratively as follows until convergence is achieved:
$$r(i, j) = \begin{cases} s(i, j) - \max_{k \neq j} \{a(i, k) + s(i, k)\} & (i \neq j) \\ s(i, j) - \max_{k \neq j} \{s(i, k)\} & (i = j) \end{cases} \qquad (2)$$

$$a(i, j) = \begin{cases} \min\{0, r(j, j)\} + \sum_{k \neq i, j} \max\{0, r(k, j)\} & (i \neq j) \\ \sum_{k \neq i} \max\{0, r(k, j)\} & (i = j) \end{cases} \qquad (3)$$
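Continuing the sketch above, a direct loop-based transcription of Eqs. (2)-(3) (ours, for illustration; production implementations vectorize these updates):

```python
def update_messages(S, R, A):
    """One sweep of the responsibility (Eq. 2) and availability (Eq. 3) updates."""
    n = len(S)
    R_new = np.zeros_like(R)
    A_new = np.zeros_like(A)
    idx = np.arange(n)
    for i in range(n):
        for j in range(n):
            mask = idx != j                       # k != j
            if i != j:
                R_new[i, j] = S[i, j] - np.max(A[i, mask] + S[i, mask])
            else:
                R_new[i, j] = S[i, j] - np.max(S[i, mask])
    for i in range(n):
        for j in range(n):
            if i != j:
                mask = (idx != i) & (idx != j)    # k != i, j
                A_new[i, j] = min(0.0, R_new[j, j]) + np.maximum(0.0, R_new[mask, j]).sum()
            else:
                mask = idx != i                   # k != i
                A_new[i, j] = np.maximum(0.0, R_new[mask, j]).sum()
    return R_new, A_new
```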
Figure 1. Message Passing in Affinity Propagation [3]
After calculating both the responsibility and the availability, their values are damped iteratively as follows until convergence is achieved:

$$r(i, j) = (1 - \lambda)\, r(i, j) + \lambda\, r_o(i, j) \qquad (4)$$

$$a(i, j) = (1 - \lambda)\, a(i, j) + \lambda\, a_o(i, j) \qquad (5)$$

where $\lambda$ is a damping factor introduced to reduce numerical oscillations, and $r_o(i, j)$ and $a_o(i, j)$ are the previous values of the responsibility and availability. $\lambda$ should have a value greater than or equal to 0.5 and less than 1, i.e. $0.5 \leq \lambda < 1$. A high value of $\lambda$ may avoid numerical oscillations, but this is not guaranteed, and a low value of $\lambda$ will make AP run slowly [11].
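In code, the damping of Eqs. (4)-(5) is a one-line blend of the new and previous messages (continuing our sketch; the 0.5 default follows the constraint above):

```python
def damp(new, old, lam=0.5):
    """Eqs. (4)-(5): blend freshly computed messages with the previous ones."""
    return (1.0 - lam) * new + lam * old

# Inside the AP loop:
#   R_new, A_new = update_messages(S, R, A)
#   R, A = damp(R_new, R), damp(A_new, A)
```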
2.1.3. Exemplar decision

For a data point $x_i$, if $x_j$ reaches the maximum of $r(i, j) + a(i, j)$, then it can be deduced that either (i) $x_j$ is the most suitable exemplar for $x_i$, or (ii) $x_i$ is itself an exemplar (when $j = i$). The exemplar for $x_i$ is selected by the following formula:

$$c_i \leftarrow \arg\max_{j} \{r(i, j) + a(i, j)\} \qquad (6)$$

where $c_i$ is the exemplar for $x_i$.
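Eq. (6) is a row-wise argmax over the summed messages; in our sketch it is a one-liner:

```python
def exemplars(R, A):
    """Eq. (6): each point i picks c_i = argmax_j (r(i, j) + a(i, j))."""
    return np.argmax(R + A, axis=1)
```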
2.2. Adaptive Affinity Propagation

AP has many extensions. One of these extensions is Adaptive Affinity Propagation (AAP). AAP is designed to solve the following AP limitations: we cannot know which value of the preference yields the best clustering solution, and if oscillations occur, they cannot be eliminated automatically. To solve these problems, AAP adapts to the needs of the data set by adjusting the values of the preference and of the damping factor. As in [11], we assume that $C(i)$ is the number of clusters in iteration $i$. We summarize the AAP algorithm as follows:
Algorithm 1 Adaptive Affinity Propagation (AAP)
Input: data points $x_i$, $i = 1, 2, \ldots, n$
Output: centers of clusters $C(i)$
1: Set the parameter $\lambda = 0.5$;
2: Calculate $s(i, j)$;
3: Set $r(i, j) = 0$ and $a(i, j) = 0$;
4: Run the AP steps (Eq. 2 - Eq. 6);
5: Verify whether oscillations occur or not. If oscillations occur, then $\lambda \leftarrow \lambda + \lambda_{step}$, else keep running the AP steps.
6: If $C(i) \leq C(i+1)$, then $p \leftarrow p + p_{step}$ and $s(i, i) \leftarrow p$. Go to step 4.
As proposed in [11], when the value of $C(i)$ drops below 2, the AAP algorithm stops.
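A schematic of this outer loop, built from the helpers above (our paraphrase of Algorithm 1; the oscillation test and the step sizes are illustrative placeholders, not values from [11]):

```python
def aap(S, p_step=-0.1, lam=0.5, lam_step=0.05):
    """Schematic AAP: scan p and lambda adaptively; stop when C(i) < 2."""
    R, A = np.zeros_like(S), np.zeros_like(S)
    history = []                                  # cluster counts per sweep
    while True:
        R_new, A_new = update_messages(S, R, A)
        R, A = damp(R_new, R, lam), damp(A_new, A, lam)
        labels = exemplars(R, A)
        history.append(len(set(labels.tolist())))
        if history[-1] < 2:                       # stopping rule from [11]
            return labels
        # Step 5 (placeholder test): treat an A-B-A-B count pattern as oscillation.
        if len(history) >= 4 and history[-4:-2] == history[-2:]:
            lam = min(lam + lam_step, 0.99)
        # Step 6: if the cluster count stops decreasing, scan the preference.
        if len(history) >= 2 and history[-2] <= history[-1]:
            np.fill_diagonal(S, S.diagonal() + p_step)
```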
AAP has shown a better quality, or at least the same quality, of clustering results as AP, and finds optimal solutions on different kinds of data sets [11]. AAP has been shown to be able to process several types of data,
such as gene expression [11], travel routes [11], image clustering [11, 12], mixed numeric and categorical datasets [13], text documents [14], and zoogeographical regions [15].
3. PROPOSED WORK
3.1. Proposed Preference Model
From a set of data points $X = \{x_1, x_2, \ldots, x_n\}$, suppose we randomly take two data points $x_i$ and $x_j$. If the distance from $x_i$ to the other points is larger than that from $x_j$, then $x_i$ has a lower possibility than $x_j$ of being the dataset center. On the basis of this assumption, the preference for each data point can be computed as follows. For a given data point $x_i$, the similarities from $x_i$ to the other data points are summed up as:
$$TDS(i) = \sum_{j=1, j \neq i}^{n} s(x_i, x_j) \qquad (7)$$

Then it is normalized as:

$$NTDS(i) = \frac{TDS(i)}{\sum_{j=1}^{n} TDS(j)} \qquad (8)$$
Finally, the preference for each data point can be defined as follows:

$$p(i) = s(i, i) = N \cdot NTDS(i) \cdot Const \qquad (9)$$

where $Const$ can be a real non-zero number or $\min(s(:))$.
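A sketch of Eqs. (7)-(9) (ours; the $Const$ value is passed in directly here, while the paper's procedure searches it adaptively):

```python
def preference_ddp(S, const):
    """Distribution-based preference: Eq. (7) TDS, Eq. (8) NTDS, Eq. (9) p(i)."""
    n = len(S)
    off = ~np.eye(n, dtype=bool)
    tds = np.where(off, S, 0.0).sum(axis=1)   # Eq. (7): sum over j != i
    ntds = tds / tds.sum()                    # Eq. (8): normalize
    return n * ntds * const                   # Eq. (9)

p = preference_ddp(S, const=2.0)              # Const = 2.0 is illustrative
np.fill_diagonal(S, p)                        # s(i, i) = p(i)
```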
The preferences of Equation (9) represent and reflect the distribution of the data set, and we expect that they will tend to give better results, as shown in the results section. This model is then applied to the Modified Adaptive Affinity Propagation (MAAP) algorithm, as explained in subsection 3.2. Our model is simple and easy to apply compared to another model proposed by Ping Li et al. (2017) [16].
3.2. Modified Adaptive Affinity Propagation (MAAP)
We modify adaptive AP in the following manner:
Algorithm 2 Modified Adaptive Affinity Propagation (MAAP)
Input: data points $x_i$, $i = 1, 2, \ldots, n$
Output: centers of clusters $C(i)$
1: Set the parameter $\lambda = 0.5$;
2: Calculate $s(i, j)$, and set $p$ as in Eq. 9;
3: Set $r(i, j) = 0$ and $a(i, j) = 0$;
4: Run the AP steps (Eq. 2 - Eq. 6);
5: Verify whether oscillations occur or not. If oscillations occur, then $\lambda \leftarrow \lambda + \lambda_{step}$, else keep running the AP steps.
Because we set the proposed preferences $p$ in the algorithm, we can omit step 6 of Algorithm 1.
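Putting the helpers together gives an end-to-end sketch of Algorithm 2 (ours; the oscillation handling of step 5 is reduced to a fixed iteration budget for brevity):

```python
def maap(X, const, lam=0.5, max_iter=200):
    """Schematic MAAP: AP with the Eq. (9) preference and no p-scanning."""
    S = similarity_matrix(X)                          # step 2
    np.fill_diagonal(S, preference_ddp(S, const))     # step 2: s(i, i) = p(i)
    R, A = np.zeros_like(S), np.zeros_like(S)         # step 3
    for _ in range(max_iter):                         # steps 4-5, simplified
        R_new, A_new = update_messages(S, R, A)
        R, A = damp(R_new, R, lam), damp(A_new, A, lam)
    return exemplars(R, A)

labels = maap(np.random.rand(100, 2), const=2.0)      # toy run
```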
3.3. Cluster Evaluation

The Silhouette validation index and the Fowlkes-Mallows index [17] are used to evaluate the quality of a clustering process.
For a given dataset $X = \{x_1, x_2, \ldots, x_n\}$, $x_i \in \mathbb{R}^m$, the Silhouette index of $x_i$ can be defined as

$$Sil(x_i) = \frac{b(x_i) - a(x_i)}{\max(a(x_i), b(x_i))} \qquad (10)$$
where $a(x_i)$ is defined as the mean distance from the other data points in the same cluster $K_c$ to $x_i$, and $d(x_i, K_{c'})$ represents the mean distance from all data points in cluster $K_{c'}$ $(c' \neq c)$ to $x_i$. $b(x_i)$ is then defined as

$$b(x_i) = \min_{c'} \left( d(x_i, K_{c'}) \right), \quad c' = 1, 2, \ldots, C, \; c' \neq c \qquad (11)$$

where $C$ represents the number of clusters.
For the given dataset $X = \{x_1, x_2, \ldots, x_n\}$, the Silhouette index of the whole dataset $X$ can then be expressed as follows:

$$Sil(X) = \frac{1}{n} \sum_{i=1}^{n} Sil(x_i) \qquad (12)$$
The Fowlkes-Mallows Index (FMI) is an external criterion [17], defined as

$$FMI = \sqrt{\frac{a}{a+b} \cdot \frac{a}{a+c}} \qquad (13)$$

where $a$ is the number of pairs of data points with the same label and classified in the same cluster, $b$ is the number of pairs with the same label but classified in different clusters, and $c$ is the number of pairs with different labels but classified in the same cluster.
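Both indices are available off the shelf; a hedged scikit-learn example (assuming the `X` and `labels` from the sketches above):

```python
from sklearn.metrics import silhouette_score, fowlkes_mallows_score

sil = silhouette_score(X, labels)                   # Eq. (12), dataset-level
# FMI requires ground-truth labels, e.g. for the UCI datasets below:
# fmi = fowlkes_mallows_score(true_labels, labels)  # Eq. (13)
```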
4. RESULTS AND DISCUSSION

The MAAP algorithm was written and run in MATLAB R2014b. The tests were carried out on a machine with 4 GB RAM and an Intel(R) Core(TM) i5-2670QM 2.20 GHz CPU. We test the MAAP algorithm with: two-dimensional random non-partition data point sets of sizes 100, 500, 1000, 1500, and 2000, respectively, to view the scalability; and two-dimensional random partition data point sets of sizes 100, 300, 500, 800, and 1000, respectively.
The random non-partition data points are generated using a uniform distribution from 0 to 1. The random partition data points are divided into four groups and generated using the k-means algorithm, as sketched below.
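An illustrative generation script matching this description (our sketch; the seed and sizes are arbitrary):

```python
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X_rand = rng.uniform(0.0, 1.0, size=(1000, 2))      # non-partition set

# 4-partition set: label uniform points into four groups with k-means.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_rand)
true_labels = km.labels_
```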
Real datasets are also used, as shown in Table 1. These datasets can be downloaded from the UCI repository [9].
[9].
T
able
1.
UCI
Datasets
Datasets
T
rue
Cluster
Number
of
Sampel
Dimension
4k2
f
ar
4
400
2
Ionosphere
2
351
4
Iris
3
150
4
W
ine
3
178
13
For the similarity, we use the negative squared Euclidean distance between the data points: for points $x_i$ and $x_j$, $s(i, j) = -\lVert x_i - x_j \rVert^2$.
4.1. Experiments on Random Non-Partition Dataset

The clustering results on the random non-partition dataset are presented in Table 2 and Table 3, and Figure 2 shows an example result with $N = 1000$ data points. From those tables and Figure 2, although the numbers of clusters $NC$ resulting from the MAAP-DDP algorithm are almost the same as those from the original AP with $p = \min(s)$, the MAAP-DDP algorithm is slower than the original AP both with $p = \mathrm{median}(s)$ and with $p = \min(s)$. This makes sense, because the MAAP-DDP algorithm adaptively searches the $\lambda$-value in order to eliminate the oscillations. For the various values of $N$, the Silhouette index ($Sil$) for both algorithms is almost the same, ranging from 0.3 to 0.45. Interestingly, the $Sil$ value from MAAP-DDP looks more constant, at around 0.325, which means that the clusters resulting from MAAP-DDP are more stable than those from the original AP. The FMI cannot be calculated because the random non-partition dataset has no true labels.
4.2. Experiments on Random Partition Dataset

The clustering results on the random 4-partition dataset are presented in Table 4 and Table 5, and Figure 3 shows an example result with $N = 1000$ data points. From those tables and Figure 3, MAAP-DDP has succeeded in identifying clusters according to the number of the dataset's true labels. The number of the dataset's true labels is 4, and the number of clusters ($NC$) resulting from MAAP-DDP is also 4 for the various values of $N$. The speed of MAAP-DDP is comparable with that of the original AP, i.e. the execution time of MAAP-DDP is not slower than that of the original AP.
Table 2. AP Clustering Results on Random non-partition Data
        AP (p = median(s))          AP (p = min(s))
N      NC    ET (s)    Sil      NC    ET (s)    Sil
100    10    0.021     0.440    3     0.117     0.420
500    19    0.963     0.374    8     0.531     0.384
1000   27    3.232     0.369    10    2.723     0.371
1500   34    11.019    0.361    12    8.233     0.359
2000   37    16.193    0.358    15    11.022    0.367
N: Number of data; NC: Number of Clusters; ET: Execution Time; Sil: Silhouette Index; FMI: Fowlkes-Mallows Index
Table 3. MAAP-DDP Clustering Results on Random non-partition Data
N      NC    ET (s)    Sil      Const
100    6     0.048     0.325    1.0
500    8     2.183     0.325    2.0
1000   10    8.734     0.318    2.0
1500   13    18.055    0.325    2.0
2000   15    31.620    0.323    2.0
Figure 2. (a) Non-partition random data, $N = 1000$; (b) AP with $p = \min(s)$ for non-partition random data, $N = 1000$; (c) AP with $p = \mathrm{median}(s)$ for non-partition random data, $N = 1000$; (d) MAAP with distributed-data-based $p$ for non-partition random data, $N = 1000$.
Although the $Sil$ values of MAAP-DDP are smaller than those of AP, the $FMI$ values of MAAP-DDP are greater than those of the original AP. This means that MAAP-DDP is better at recovering the true clustering structure. The key parameter of the MAAP-DDP algorithm is $Const$ in Eq. (9). The algorithm is designed to adaptively find the $Const$ value that yields the best clustering solution.
Figure 3. (a) 4-partition random data, $N = 1000$; (b) AP with $p = \min(s)$ for 4-partition random data, $N = 1000$; (c) AP with $p = \mathrm{median}(s)$ for 4-partition random data, $N = 1000$; (d) MAAP with distributed-data-based $p$ for 4-partition random data, $N = 1000$.
Table 4. AP Clustering Results on Random 4-partition Data
        AP (p = median(s))                   AP (p = min(s))
N      NC    ET (s)   Sil     FMI       NC    ET (s)   Sil     FMI
100    11    0.094    0.358   0.58      2     0.090    0.368   0.718
300    18    0.477    0.350   0.38      4     0.502    0.338   0.639
500    27    1.502    0.336   0.35      5     1.444    0.292   0.556
800    35    4.252    0.348   0.31      7     4.872    0.317   0.457
1000   36    6.800    0.343   0.31      9     5.642    0.304   0.474
N: Number of data; NC: Number of Clusters; ET: Execution Time; Sil: Silhouette Index; FMI: Fowlkes-Mallows Index
Table 5. MAAP-DDP Clustering Results on Random 4-partition Data
N      NC    ET (s)   Sil     FMI     Const
100    4     0.164    0.300   0.421   0.029
300    4     0.578    0.224   0.545   6.457
500    4     0.898    0.278   0.438   0.410
800    4     3.641    0.228   0.531   27.554
1000   4     6.312    0.262   0.474   30.850
4.3. Experiments on Real Datasets

Tables 6 and 7 show that MAAP has successfully identified the clusters of all the real datasets, i.e. the number of clusters is equal to the number of true clusters ($TC$), while the original AP did not succeed on all of them.
The original AP with $p(k)$ set to the minimum value of $s(i, k)$ succeeded in identifying the clusters of three of the real datasets, but not of the Ionosphere dataset. The key parameter of the MAAP-DDP algorithm is $Const$ in Eq. (9); the algorithm is designed to adaptively find the $Const$ value that yields the best clustering solution. The success of MAAP-DDP is supported by its $Sil$ values, which are all greater than 0.300, and its $FMI$ values, which are all greater than 0.6. Its $Sil$ values are as follows: for 4k2_far 0.455, for Ionosphere 0.513, for Iris 0.522, and for Wine 0.365. Its $FMI$ values are as follows: for 4k2_far 0.681, for Ionosphere 0.719, for Iris 0.679, and for Wine 0.622.
Table 6. AP Clustering Results on Real Datasets
             AP (p = min(s))                     AP (p = med(s))
             TC   NC   ET (s)   Sil     FMI      NC   ET (s)   Sil     FMI
4k2_far      4    4    0.790    0.761   1.000    6    0.808    0.450   0.743
Ionosphere   2    4    0.633    0.386   0.575    28   1.517    0.444   0.413
Iris         3    3    0.129    0.541   0.809    6    0.103    0.469   0.724
Wine         3    3    0.165    0.491   0.830    11   0.174    0.314   0.468
TC: True Cluster; NC: Number of Cluster; ET: Execution Time; Sil: Silhouette Index; FMI: Fowlkes-Mallows Index
Table 7. MAAP-DDP Clustering Results on Real Datasets
             TC   NC   ET (s)   Sil     FMI     Const
4k2_far      4    4    1.318    0.455   0.681   4.879
Ionosphere   2    2    1.228    0.513   0.719   160.200
Iris         3    3    0.771    0.522   0.679   9.252
Wine         3    3    0.795    0.365   0.622   7.559
5. CONCLUSIONS AND FUTURE RESEARCH

From the above results it can be concluded that: (i) the proposed algorithm, MAAP-DDP, is slower than the original AP for the random non-partition dataset, and (ii) for the random 4-partition dataset and the real datasets the proposed algorithm has succeeded in identifying clusters according to the number of the datasets' true labels, with execution times comparable to those of the original AP. Besides that, the MAAP-DDP algorithm proves more feasible and effective than the original AAP procedure. Whereas the original AAP algorithm stops when the value of $C(i)$ drops below 2, the MAAP-DDP algorithm terminates after the best clusters are found. The key parameter of the MAAP-DDP algorithm is $Const$ in Eq. (9); the algorithm is designed to adaptively find the $Const$ value that yields the best clustering solution. In the future, to verify the algorithm we have to test MAAP-DDP with other kinds of datasets, e.g. synthetic datasets, face-image datasets, and so on.
ACKNOWLEDGMENT

The authors gratefully acknowledge Gunadarma University for providing the research funding and for permission to use the research facilities.
REFERENCES
[1] K. R. Nirmal and K. V. V. Satyanarayana, "Issues of k means clustering while migrating to map reduce paradigm with big data: A survey," International Journal of Electrical and Computer Engineering (IJECE), vol. 6, no. 6, pp. 3047–3051, December 2016.
[2] X. Shi, W. Wang, and C. Zhang, "An empirical comparison of latest data clustering algorithms with state-of-the-art," Indonesian Journal of Electrical Engineering and Computer Science, vol. 5, no. 2, pp. 410–415, February 2017.
[3] B. J. Frey and D. Dueck, "Clustering by passing messages between data points," Science, pp. 972–976, 2007.
[4] J. Macqueen, "Some methods for classification and analysis of multivariate observations," in 5th Berkeley Symposium on Mathematical Statistics and Probability, 1967, pp. 281–297.
[5] D. Xia, F. Wu, X. Zhang, and Y. Zhuang, "Local and global approaches of affinity propagation clustering for large scale data," CoRR, vol. abs/0910.1650, 2009. [Online]. Available: http://arxiv.org/abs/0910.1650
[6] R. Refianti, A. Mutiara, and A. Syamsudduha, "Performance evaluation of affinity propagation approaches on data clustering," International Journal of Advanced Computer Science and Applications (IJACSA), vol. 7, no. 3, 2016.
[7] R. Refianti, A. Mutiara, and S. Gunawan, "Time complexity comparison between affinity propagation algorithms," Journal of Theoretical and Applied Information Technology, vol. 95, no. 7, pp. 1497–1505, April 2017.
[8] V. K. Ramana, "Enhanced fuzzy based affinity propagation for multispectral images," Imperial Journal of Interdisciplinary Research (IJIR), vol. 12, no. 12, pp. 1548–1551, 2016.
[9] A. Asuncion and D. Newman, "UCI machine learning repository," University of California, Irvine, School of Information and Computer Sciences, 2007. [Online]. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html
[10] J. Kogan, Introduction to Clustering Large and High-Dimensional Data. Cambridge University Press, 2007.
[11] K. Wang, J. Zhang, D. Li, X. Zhang, and T. Guo, "Adaptive affinity propagation clustering," CoRR, vol. abs/0805.1096, 2008. [Online]. Available: http://arxiv.org/abs/0805.1096
[12] R. Refianti, A. Mutiara, and A. Syamsudduha, "Performance evaluation of affinity propagation approaches on data clustering," International Journal of Advanced Computer Science and Applications (IJACSA), vol. 7, no. 3, 2016.
[13] K. Zhang and X. Gu, "An affinity propagation clustering algorithm for mixed numeric and categorical datasets," Mathematical Problems in Engineering, September 2014.
[14] Y. He, Q. Chen, X. Wang, R. Xu, X. Bai, and X. Meng, "An adaptive affinity propagation document clustering," in Informatics and Systems (INFOS), 2010 The 7th International Conference on, March 2010, pp. 1–7.
[15] M. Rueda, M. A. Rodriguez, and B. A. Hawkins, "Identifying global zoogeographical regions: Lessons from Wallace," Journal of Biogeography, vol. 40, no. 12, pp. 2215–2225, Dec. 2013.
[16] P. Li, H. Ji, B. Wang, Z. Huang, and H. Li, "Adjustable preference affinity propagation clustering," Pattern Recognition Letters, vol. 85, pp. 72–78, January 2017.
[17] P. Rousseeuw, "Silhouettes: a graphical aid to the interpretation and validation of cluster analysis," J. Comp. App. Math, vol. 20, pp. 53–65, 1987.