International Journal of Electrical and Computer Engineering (IJECE)
Vol. 6, No. 1, February 2016, pp. 275~282
ISSN: 2088-8708, DOI: 10.11591/ijece.v6i1.7381
Journal homepage: http://iaesjournal.com/online/index.php/IJECE
Threshold Computation to Discover Cluster Structure, a New Approach
Preeti Mulay
Department of CS & IT, Symbiosis Institute of Technology, Symbiosis International University, Pune, India
Article Info

Article history:
Received Feb 6, 2015
Revised Nov 18, 2015
Accepted Dec 2, 2015

ABSTRACT
Cluster structure formation is still one of the research areas open for researchers. The "NoSQL" concept opened a new arena for innovations. In this paper a comparative study about forming cluster structure is discussed, based on selecting an optimal threshold value to form a cluster. Threshold selection continues to play an important role in the post-cluster phase as well, to accommodate the influx of new data. This paper consists of a new incremental-clustering approach (ICNBCF), various possibilities of threshold computation, and evaluation measures.

Keywords:
Closeness factor
Cluster
Evaluation measures
Incremental-clustering
Threshold

Copyright © 2016 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Preeti Mulay,
Department of CS & IT, Symbiosis Institute of Technology,
Symbiosis International University,
Lavale, Pune 412115, India
Email: Preeti.mulay@sitpune.edu.in
1. INTRODUCTION
With the spurt of data in (almost) all domains, it is essential to have modernized data exploratory methods such as incremental-clustering, cluster analysis and incremental-learning, to name a few. These methods are useful in varied applications [1] which require handling a consistent influx of new data and performing forecasting, decision making and predictions. These application domains include finance [2], biology [3], feedback analysis [4], sensitivity analysis [5], electricity power consumption, etc.
The purpose of this research paper is to broaden the abilities of the "Incremental clustering using Naïve Bayes and Closeness-Factor" (ICNBCF) [6] algorithm, and to introduce a set of activities at the post-clustering phase. These activities include validating cluster structures. These modifications proved the enhancements in the resulting cluster structures. ICNBCF has already been proved by implementing a parameter-free data-clustering algorithm.
UCI repository's Wine, Wine Quality, Electricity, Software Project and Zenni Optics datasets are used to perform experiments. The resulting clusters are evaluated by four analytical measures: f-measure, Rand index, variance and Dunn index.
2. RESEARCH METHOD
The ICNBCF incremental clustering algorithm is executed using Microsoft Visual Studio 2005 and the Eclipse Java platform on an Intel® Core™ i5 CPU M450 @ 2.40 GHz, 4 GB RAM computer system.
The novel method based on a "cluster-first" approach — an almost parameter-free, error-based statistical incremental clustering method [1] — is extended in this paper. To execute ICNBCF for the first time on a new dataset, a pre-processing or pre-clustering step (e.g., PCA) may be computed. ICNBCF generates basic clusters at first and then either updates clusters or generates new clusters based on the inflow of new data. ICNBCF works in three phases. In the first phase, initial clusters are built. Once the basic clusters are ready, with the
influx of new information, an existing cluster is updated or a new cluster is formed. A new cluster is generated if the behavior of the input data is entirely different from the previously generated clusters and their members, that is, there exists a threshold difference. ICNBCF does not use an (external) distance measure for determining the relative closeness between the series. It makes use of the novel "closeness factor" approach.
The method needs a threshold value to be set, which is used as a limit while forming clusters. The clusters are generated dynamically, and results show that with a good choice of threshold value, ICNBCF has assets that are valuable for the analyst. ICNBCF is an extension of the original CFBA algorithm [1]. The statistical details of the CFBA algorithm are as shown below:
CFBA algorithm: In this algorithm capital C and G are used interchangeably. Read the input csv file using a CSV parser and store the contents in a vector of "Dataset" objects.
1. Read all the closeness parameters:
   a. New or incremental
   b. Number of series to use to calculate the "g" values – default is 2
2. If method is New, then
   a. Method = New
      i. CalculateG()
         For all the series,
         a. S1 = S(i), S2 = S(i+1)
         b. Calculate the sum of each column:
            $T(j) = S_1(j) + S_2(j)$
         c. Calculate the sum of each series – this will be used to calculate the probability ratio:
            $p = \sum_{j=1}^{n} S_1(j) \, / \, \sum_{j=1}^{n} T(j)$
         d. Calculate the error for each series:
            $c_i(j) = \dfrac{S_i(j) - p \, T(j)}{\sqrt{T(j) \, p \, (1-p)}}$
         e. Calculate the weight of each series:
            $w(j) = \sqrt{T(j)}$
         f. Calculate G for these two series:
            $G = \dfrac{\sum_{j=1}^{n} c(j)^2 \, w(j)}{\sum_{j=1}^{n} w(j)}$
         g. Store the p, G values for each series by adding two columns to the end of the series.
         h. Next i
      ii. CreateClusters()
         Create clusters using the closeness method. For all the series "i" to "n",
         a. Get the g value of S(i)
         b. Check the Series_processed_flag (Boolean):
            i. If flag = true, ignore the series, as it is already part of a cluster. Continue the for-loop for the next element.
            ii. Else CreateCluster and add this series as part of the cluster.
         c. For all the series "j = i+1" to "n":
            i. Get the g value of S(j)
            ii. If (S(i) – S(j) < closeness_factor)
               1. Add S(j) to the cluster
               2. Set Series_processed_flag to true for Series(j)
            iii. Continue for the next element of Series(j)
         d. Continue for the next element of Series(i)
         At the end of the for loop, all series should be part of some cluster, and all the series should have the "Series_processed_flag" set to true.
   b. Else, Method = Incremental
      i. Read the Results file, which contains the clusters and their elements, into memory
      ii. CalculateG() – step 2.a.i
      iii. UpdateClusters()
         For each existing cluster,
         a. Get each series in this cluster, S(i)
         b. Get the g value of S(i)
         c. For each newly added series:
            i. Get the series S(j)
            ii. Get the Series_processed_flag. If true, ignore the series and go to step v.
            iii. Get the g value of S(j)
            iv. If (S(i) – S(j) < closeness_factor)
               1. Add S(j) to the cluster
               2. Set the Series_processed_flag to true
            v. Continue to the next series
         d. Continue to the next cluster.
         Check if the Series_processed_flag is set for all the incremental elements. For all the incremental series:
         a. Follow the steps in 2.a.ii
3. Write the output in the output file
4. End.
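Read as code, the CreateClusters()/UpdateClusters() control flow looks roughly as follows. This Python sketch is an illustrative reading of the pseudocode, not the authors' implementation; it assumes the per-series G values are already computed (step 2.a.i) and interprets the closeness test as an absolute difference of G values against the threshold.

```python
def create_clusters(g_values, closeness_factor):
    """One pass over all series (step 2.a.ii): seed a cluster with the
    first unprocessed series, then absorb every later series whose G
    value lies within the closeness threshold."""
    n = len(g_values)
    processed = [False] * n                  # Series_processed_flag
    clusters = []
    for i in range(n):
        if processed[i]:
            continue                         # already part of a cluster
        cluster, processed[i] = [i], True    # CreateCluster seeded by S(i)
        for j in range(i + 1, n):
            if not processed[j] and abs(g_values[i] - g_values[j]) < closeness_factor:
                cluster.append(j)            # Add S(j) to the cluster
                processed[j] = True
        clusters.append(cluster)
    return clusters

def update_clusters(clusters, g_values, new_g_values, closeness_factor):
    """Incremental phase: try to place each new series into an existing
    cluster; series with entirely different behaviour open new clusters."""
    placed = [False] * len(new_g_values)
    for cluster in clusters:
        seed_g = g_values[cluster[0]]        # representative of the cluster
        for j, g in enumerate(new_g_values):
            if not placed[j] and abs(seed_g - g) < closeness_factor:
                cluster.append(("new", j))
                placed[j] = True
    for j, done in enumerate(placed):        # leftovers become new clusters
        if not done:
            clusters.append([("new", j)])
    return clusters
```

For instance, create_clusters([0.12, 0.15, 0.80], 0.05) yields [[0, 1], [2]]: the first two series fall within the threshold of each other, while the third opens its own cluster.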
ICNBCF is an incremental clustering method most suitable for quantitative datasets only. The essential statistical computations involved are as follows:
1. Accept a raw numeric dataset.
2. Apply pre-processing such as removing zeros, PCA, etc., if required.
3. Select two data series, say DS1 and DS2.
4. Compute their row-totals T(j), where j = 1, 2.
5. Compute every column-total / attribute total $T_i(j)$, where i varies from 1 to N, N being the number of attributes.
6. Compute the grand-total = T(DS1) + T(DS2).
7. The probability that DS1 and DS2 will be part of the same cluster is computed using
8. $p_j = \sum_{i=1}^{N} DS_j(i) \, / \, \text{grand-total}$
9. The expected value of a data series is given by $E(DS_j) = p_j \cdot \text{grand-total}$.
10. The error is computed by the following formula:
    $c_i(j) = \dfrac{DS_j(i) - p_j \, T_i}{\sqrt{T_i \, p_j \, (1 - p_j)}}$
11. Weights of individual attributes are computed using
12. $w_i = \sqrt{\text{column total}_i}$
13. The closeness factor [2] is computed using error and weights, as
14. $\text{closeness factor}_j = \dfrac{\sum_{i=1}^{N} w_i \, c_i(j)^2}{\sum_{i=1}^{N} w_i}$
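A minimal NumPy sketch of steps 3–14 for two data series follows, using the formulas as reconstructed above; the function name and toy series are illustrative assumptions, not the paper's code.

```python
import numpy as np

def closeness_factor(ds1, ds2):
    ds1 = np.asarray(ds1, dtype=float)
    ds2 = np.asarray(ds2, dtype=float)
    col_total = ds1 + ds2                    # step 5: attribute totals
    grand_total = col_total.sum()            # step 6: grand total
    p = ds1.sum() / grand_total              # steps 7-8: probability ratio
    expected = p * col_total                 # expected count per attribute
    # step 10: standardized error per attribute
    error = (ds1 - expected) / np.sqrt(col_total * p * (1.0 - p))
    weights = np.sqrt(col_total)             # step 12: attribute weights
    # step 14: weighted mean of squared errors = closeness factor
    return float((weights * error**2).sum() / weights.sum())

# two toy series over four attributes; a small CF suggests similar behaviour
print(closeness_factor([12.0, 7.5, 3.2, 9.1], [11.5, 8.0, 2.9, 9.4]))
```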
This computed closeness-factor is referred to hereafter as CF. Once the algorithm calculates all CF values, it is time to decide the cluster structure. CF values guide users as to which data series are close to each other and can be part of the same cluster. Once the decision of two close data series is made, it is also essential to know whether the selected data series have matching events based on selected features or not. To achieve this, a modified Naïve Bayes method is applied based on CF values. The combination of CF and the Naïve Bayes method proved to be another way to confirm the Principal Components of a given dataset.
The Naïve Bayes classifier is a term in Bayesian statistics dealing with simple probability-based clustering, based on applying Bayes' theorem with naïve independence assumptions. "Independent feature-based model" is a more suitable similar term used for probability-based models.
The Naïve Bayes method assumes that the presence or absence of a particular feature of a class is unrelated to the presence or absence of any other feature. In many practical applications, a maximum likelihood approach is used for parameter-based estimations using Naïve Bayes.
An advantage of using the Naïve Bayes method is that it requires a small amount of training data to estimate the parameters necessary for finding whether the two given data series are close to each other based on a particularly selected feature or not.
In ICNBCF, CF values are used instead of the mean and variance of variables, as in the original method, because the CF value is computed based on the
entire set of attributes. The Naïve Bayes based method is fast and incremental and can deal with discrete and continuous attributes. The comparison of performance in various domains confirms the advantages of successive learning and suggests its application to other learning algorithms.
In the Bayesian approach, the task corresponds to finding the class label y, based on selected impactful features, that maximizes the probability that the two data series are found close to each other and their specific events also match.
Let x = (x1, x2, ..., xd) be the set of attribute values for an unlabeled instance z = (x, y). The posterior/matching probabilities for y given x can be computed using the Bayes theorem as:

$P(y \mid x_1, x_2, \ldots, x_d) = \dfrac{P(x_1, x_2, \ldots, x_d \mid y) \, P(y)}{P(x_1, x_2, \ldots, x_d)}$
Since we are interested in comparing the posterior or matching probabilities for different values of y, we can simply ignore the denominator term. The difficult part is to determine the conditional probabilities P(x1, x2, ..., xd | y) for every possible cluster. A Naïve Bayes method attempts to resolve this by making additional assumptions regarding the nature of relationships among the given attributes. It assumes that attributes are conditionally independent of each other when the class label y is known. In other words,

$P(x_i, x_j \mid y) = P(x_i \mid y) \cdot P(x_j \mid y)$ for all i's and j's; therefore,

$P(x_1, x_2, \ldots, x_d \mid y) = \prod_{i=1}^{d} P(x_i \mid y)$
This equation is more practical because instead of computing the conditional probabilities for every possible combination of x given y, we only have to estimate the conditional probabilities for each pair P(xi | y). The CF values calculation has already taken care of finding data series which are close to each other; the only part left is to concentrate on a specific feature or event for more effectual closeness-based incremental-clustering.
To cluster based on an instance z = (x, y), the Naïve Bayes method computes the posterior probability of y given x using

$P(y \mid x) \propto P(y) \prod_{i=1}^{d} P(x_i \mid y)$

and selects the value of y that maximizes this product.
This way, incremental clustering is obtained using the combination of the closeness factor and the Bayes theorem.
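A minimal sketch of this selection rule follows, assuming Gaussian per-attribute likelihoods; the cluster priors, models and test point are hypothetical, and in ICNBCF the CF values stand in for the per-variable mean/variance statistics of the classical method.

```python
import math

def gaussian(mu, sigma):
    """Return a callable P(x_i | y) under an assumed normal model."""
    return lambda v: math.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def log_posterior(x, prior, likelihoods):
    """log P(y) + sum_i log P(x_i | y); logs avoid underflow from
    multiplying many small probabilities."""
    return math.log(prior) + sum(math.log(f(v)) for f, v in zip(likelihoods, x))

clusters = {                                  # hypothetical cluster models
    "c1": (0.6, [gaussian(1.0, 0.5), gaussian(4.0, 1.0)]),
    "c2": (0.4, [gaussian(3.0, 0.5), gaussian(2.0, 1.0)]),
}
x = (1.2, 3.7)
best = max(clusters, key=lambda y: log_posterior(x, *clusters[y]))
print(best)                                   # the y maximizing the product -> "c1"
```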
The crucial task involved while designing the new incremental clustering algorithm ICNBCF, based on the CF and Naïve Bayes approach [6], is to decide the threshold for forming cluster structures. In the initial first phase, the cluster structure is created from completely raw data and by performing Principal Component Analysis (PCA) as a pre-clustering stage.
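As a sketch of this pre-clustering stage, scikit-learn's PCA can reduce a raw numeric dataset before the first clustering pass; the array shape and the 95% variance threshold below are illustrative assumptions, as the paper does not specify its PCA configuration.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 13)            # e.g., 100 series with 13 attributes
pca = PCA(n_components=0.95)           # keep components covering 95% of variance
X_reduced = pca.fit_transform(X)       # reduced data fed to phase one of clustering
print(X_reduced.shape, pca.explained_variance_ratio_.sum())
```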
In the second phase of forming the cluster structure, analytical evaluation measures such as maximum Dunn index, f-measure, variance, mean (μ) and standard deviation (σ) are computed. Table 7 shows the complete comparison using these evaluation measures on various standard quantitative datasets such as the UCI Machine Learning Repository's Wine, Wine Quality, Zenni Optics, Software Project and Electricity. Table 3 shows the evaluation measures used and their formulas.
Table 3. Evaluation measures
External evaluation measures include the number of clusters having maximum f-measure and Rand index, and internal measures are variance and the Dunn index. In addition, the number of clusters and the cluster error rate are also obtained to measure cluster results.
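For reference, two of the measures named above can be sketched directly from their standard definitions (this is not the paper's Table 3; the toy points and labels are illustrative):

```python
from itertools import combinations
import numpy as np

def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which both labelings agree
    (same-cluster vs. different-cluster)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum((labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
                for i, j in pairs)
    return agree / len(pairs)

def dunn_index(points, labels):
    """Minimum inter-cluster distance over maximum intra-cluster diameter;
    larger values indicate compact, well-separated clusters."""
    points, labels = np.asarray(points, dtype=float), np.asarray(labels)
    groups = [points[labels == c] for c in set(labels.tolist())]
    diameters = [max((np.linalg.norm(a - b) for a, b in combinations(g, 2)), default=0.0)
                 for g in groups]
    separations = [np.linalg.norm(a - b)
                   for g1, g2 in combinations(groups, 2) for a in g1 for b in g2]
    return min(separations) / max(diameters)

pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(rand_index([0, 0, 1, 1], [0, 0, 1, 1]))   # 1.0 -> identical groupings
print(dunn_index(pts, [0, 0, 1, 1]))            # ~6.4 -> well separated
```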
The test experiment of the ICNBCF incremental cluster algorithm is conducted by running the
algorithm fifty times, simulating an influx of data, to validate incremental-clustering for each of the five real datasets: Wine, Wine Quality, Software Projects, Zenni Optics and Electricity.
The reason behind using UCI Machine Learning Repositories is multifold; one is that the classes are already known and all of them are validated datasets. As the classes are already known, it becomes easier to compare the results given by the researched algorithm; for example, the Wine dataset contains three classes and Wine Quality contains six classes, etc.
The evaluation measures witnessed Wine achieving a mean of 2.98 for three clusters and Wine Quality 4.26 for six clusters. ICNBCF is competent to find outliers as well as duplicates and near-duplicates in given data. Hence it was observed that the Wine Quality dataset also contains 10 wines of 9th quality, and four impactful features instead of the published three.
Evaluation results also witnessed the worst results while clustering Zenni Optics, with a mean of 1.06, detecting only one cluster initially. Linear or semi-correlated datasets perform best with ICNBCF, while highly correlated classes show degraded performance.
Table 4. Results showing clusters found, cluster error and total execution time

Datasets           Clusters found   Clustering Error   Total execution Time
Wine               2.98 (0.25)      0.14 (0.07)        0.74 (0.44)
Electricity        3.98 (0.77)      0.32 (0.05)        0.48 (0.38)
Software Project   2.36 (0.48)      0.37 (0.05)        2.32 (0.63)
Zenni Optics       1.06 (0.24)      0.88 (0.03)        1.40 (0.56)
Wine Quality       4.26 (0.44)      0.01 (0.01)        0.52 (0.54)
The total execution time recorded for an individual dataset depends on the size of the dataset given as input for finding an appropriate cluster and for accommodating new data series with the influx of data. As shown in Table 4, the runtime for the small Electricity data is 0.48, and 2.32 for the largest Software Project dataset. It is also proved that this algorithm is stable while processing a single record from the input dataset. The average difference recorded is 0.0052 = ((Σ_time) / Nmax) / (DatasetNumber). The maximum average distribution equals 0.63, recorded for the Software Project dataset, and the minimum mean distribution, recorded for the Electricity dataset, is 0.38.
Table 5. Comparison of Wine and Zenni Optics by external measures

Datasets       F-measure       Rand index
Wine           0.84 (0.0503)   0.84 (0.03)
Zenni Optics   0.70 (0.0127)   0.56 (0.05)
The evaluation measures, namely f-measure and Rand index, using the Wine and Zenni Optics datasets are given in Table 5. The f-measure using the Wine dataset is 0.84, which shows the compactness of cluster members. The Rand index value of 0.84 shows a different perspective, due to dense data distribution related to some data series and varied distribution at the other end of the data file. Zenni Optics is a non-linear dataset and records an f-measure (= 0.7) and Rand index (= 0.56).
Table 6. Comparison of Electricity and Software Project by intra measures

Datasets           Variance      Dunn index
Software Project   0.85 (0.07)   11.94 (2.87)
Electricity        0.38 (0.01)   9.68 (4.81)
Variance and Dunn index computation based on the Software Project and Electricity datasets is shown in Table 6. It is observed that the variance computed for the Software Project data is high, almost equal to that computed for the Wine dataset. By observing the values computed using the Electricity dataset, it is clear that most data is gathered closely around the mean. In addition, the average Dunn index value for the Electricity data is 9.68 with a high separation rate of 4.81. The algorithm attains the optimum run for Electricity, as it achieves a zero error rate for about 80% of fifty independent runs, and the associated variance and Dunn index have values of 0.38 and 12.71 respectively. It is observed that the computations given by the evaluation measures indicate successful combining of similar data members in the form of clusters.
To further evaluate the performance of ICNBCF, in addition to the evaluation measures, other incremental clustering algorithms, namely COBWEB and I k-means, are used [7]. From comparison Table 7 it is visible that ICNBCF achieved the best value of correct clusters as given in published results, plus one additional cluster showing higher-quality wine in the Wine Quality dataset.
The variance value = 0.35 and cluster error = 0.08 given by ICNBCF show inner cluster compactness. The f-measure = 0.96, which is obtained in 50% of total iterations, is again one more check for the proposed incremental clustering.
As shown in Table 7, I k-means, COBWEB and k-means have f-measures of 0.96, 0.85 and 0.96 respectively; the worst error value is that of COBWEB (0.2969). ICNBCF achieves the highest intra-cluster variance, which is 0.77, and a low cluster error value of just 0.22. The incremental version of k-means provides 0.24 intra-cluster variance, which is the lowest, but still does not provide the correct set of cluster members. The cluster error computed for I k-means is very high, 0.89. Hence, it is proved that ICNBCF is the most suitable incremental-clustering method for the Wine dataset. The graphical comparison using f-measure, variance and clustering error, along with a 2-period moving average based on f-measure, is shown in Figure 1 below.
[Figure: bar comparison of MCFBA, I k-means, COBWEB and k-means on f-measure, variance and clustering-error (y-axis 0 to 1.2), with a 2-period moving average of f-measure and a linear trend of clustering-error.]

Figure 1. Comparison of four data-clustering algorithms
The best Dunn index value obtained for the Software Project dataset by ICNBCF clustering was 13.83. The I k-means, COBWEB and k-means, on the other hand, failed to attain that value in any of their runs. For the Software Project dataset, ICNBCF clustering was able to obtain the lowest clustering error value (0.29), corresponding to two clusters, and a value of 0.34 corresponding to three-cluster groups; the other algorithms, such as I k-means, recorded clustering errors near to a value of one, and thus this model generated too many clusters in most of the runs.
Table 7. Comparison of Performance

Datasets           Measures        ICNBCF   I k-means   COBWEB   k-means
Wine Quality       f-Measure       0.96     0.96        0.85     0.96
                   Rand Index      0.95     0.95        0.85     0.94
                   Variance        0.35     0.34        0.39     0.34
                   Dunn Index      9.49     9.4         8.83     9.34
                   Cluster Error   0.08     0.08        0.29     0.1
Software Project   f-Measure       0.91     0.68        0.59     –
                   Rand Index      0.85     0.53        0.52     –
                   Variance        0.86     1.41        1.83     1.46
                   Dunn Index      14.05    3.92        9.19     8.65
                   Cluster Error   0.29     0.93        0.95     0.86
Electricity        f-Measure       1        0.46        1        1
                   Rand Index      1        0.66        1        1
                   Variance        0.37     0.63        0.36     0.36
                   Dunn Index      12.71    1.14        12.48    12.48
                   Cluster Error   0        0.68        0        0
Wine               f-Measure       0.9      0.34        0.89     0.9
                   Rand Index      0.88     0.55        0.87     0.88
                   Variance        0.77     0.24        0.75     0.74
                   Dunn Index      10.16    25.06       10.37    9.87
                   Cluster Error   0.22     0.89        0.25     0.23
Zenni Optics       f-Measure       0.67     0.7         0.66     0.6
                   Rand Index      0.55     0.56        0.5      0.56
                   Variance        1.12     1.4         1.19     1.1
                   Dunn Index      15.42    5.14        19.48    19.97
                   Cluster Error   0.89     0.86        0.98     0.9
The results using Zenni Optics data are as shown in Table 7: f-measure = 0.67, Rand index = 0.55, variance = 1.12, Dunn index = 15.42 and clustering error = 0.89 using the ICNBCF algorithm. I k-means has visibly obtained better results than ICNBCF, with an f-measure of 0.70; but there were poorer results by k-means (f-measure = 0.60) and COBWEB (f-measure = 0.66).
For Electricity, a small data set composed of 75 data objects with four clusters, all algorithms gave optimal and mostly similar results. Their f-measure is one, clustering error is zero and variance is 0.37 (Table 7). The I k-means, however, attains inferior results, since its f-measure = 0.46, with a variance = 0.63 and clustering error = 0.68.
3. CONCLUSION
In this paper, various computations using evaluation measures such as f-measure, Dunn index and Rand index, along with cluster error, are discussed. This discussion was based on various incremental clustering algorithms such as ICNBCF, incremental k-means, COBWEB and k-means, these being the most pioneering data-clustering algorithms. Experimental results showed that ICNBCF is comparable to the other clustering algorithms in terms of validity measures. Moreover, the method has achieved a higher degree of clustering accuracy for some datasets.
In the initial part of this paper, the practice of computing the cluster threshold is discussed based on normalization. The solutions discussed in this paper provide another option to manual "threshold selection" by a user having knowledge of the entire data set.
As a part of future work, ICNBCF is achieving interesting behavioural results using various datasets, including a world-wide ice-cream dataset, a beer dataset, the wine dataset revisited from the perspective of combining wine with food intake, a cosmetics dataset based on skin-type, age, allergies, fat contents etc., and a cheese dataset, to name a few, as primary products which can be consumed directly, along with some secondary datasets like tea and coffee datasets, which need mediums to consume and taste.
REFERENCES
[1] Mooi, E.A. and M. Sarstedt, 2011. A Concise Guide to Market Research: The Process, Data and Methods Using IBM SPSS Statistics. 1st Edn., Springer, Berlin, ISBN-10: 3642125417, pp: 307.
[2] Cai, F., N.A. Le-Khac and M.T. Kechadi, 2012. Clustering approaches for financial data analysis: A survey. Proceedings of the 8th International Conference on Data Mining, (DM'12), Las Vegas, Nevada, USA, pp: 105-111.
[3] Nazeer, K.A., M. Sebastian and S.M. Kumar, 2013. A novel harmony search-K means hybrid algorithm for clustering gene expression data. Bioinformation, 9: 84-88. DOI: 10.6026/97320630009084.
[4] Inkaya, T., 2011. A methodology of swarm intelligence application in clustering based on neighbourhood construction. The Graduate School of Natural and Applied Sciences of Middle East Technical University.
[5] William Claster, "Wine Tasting and a Novel Approach to Cluster Analysis", 2010 Fourth Asia International Conference on Mathematical/Analytical Modelling and Computer Simulation.
[6] "Evolve systems using incremental clustering approach", Preeti Mulay, Dr. Parag A. Kulkarni, Evolving Systems, An Interdisciplinary Journal for Advanced Science and Technology, Journal No. 12530 by Springer, Oct 2012.
[7] Jain, A.K. and S. Maheswari, 2012. Survey of recent clustering techniques in data mining. Int. J. Comput. Sci. Manage. Res., 1: 72-78.
BIOGRAPHY OF AUTHOR
Preeti Mulay did her M.S. from WSU, MI, USA, M.Tech from JNTU, Hyderabad, India, and PhD from BVU, Pune. She has been associated with Symbiosis International University since 2013. Her areas of interest include machine learning, data mining, software engineering and knowledge augmentation.