TELKOM
NIKA
, Vol. 11, No. 12, Decem
ber 20
13, pp.
7302
~73
0
8
e-ISSN: 2087
-278X
7302
Re
cei
v
ed
Jun
e
24, 2013; Revi
sed
Jul
y
1
3
, 2013; Acce
pted Augu
st 13, 2013
Clustering-boundary-detection Algorithm Based on
Center-of-gravity of Neighborhood
Wang G
u
i-Z
h
i*, Zhang Jian-Wei
Dep
a
rtment of Comp
uter Appl
icatio
n, Hen
an
Univers
i
t
y
of Animal H
u
sb
an
d
r
y
an
d Econ
om
y, Z
hon
g zho
u
450
04
4, No.2, Yingc
ai Street, Hui Ji district, Z
hengz
ho
u, He Nan pr
ovinc
e
, China
Phon
e: 037
1-6
351
51
39
*Corres
p
o
ndi
n
g
author, e-ma
i
l
: 5228
29
031
@
qq.com
A
b
st
r
a
ct
T
h
e c
l
u
s
ter bo
un
dary
i
s
a
usefu
l
mo
d
e
l, i
n
order
to
i
dentify t
he
bo
u
ndary
effective
l
y, accord
in
g to
the u
neve
n
d
i
stributio
n of
d
a
ta p
o
ints
in t
he e
p
si
lon
ne
i
ghb
orho
od
of
bou
nd
ary o
b
je
cts, a bou
nd
ar
y
detectio
n
a
l
g
o
r
i
thm-S-BOU
N
D
is
prop
ose
d
. F
i
rstly, a
ll t
he
poi
nts in
the
e
p
silo
n
nei
gh
bo
rhoo
d of th
e
d
a
ta
obj
ects are pro
j
ected o
n
to the
boun
dar
y of the conv
ex hu
ll
of the nei
g
h
b
o
rho
od, an
d th
en calc
ulat
e th
e
center of gr
avi
t
y of t
he nei
g
hbor
hoo
d. F
i
n
a
lly, d
e
tect the
bou
ndary
ob
j
e
ct accord
ing
to the de
gre
e
of
devi
a
tion
of t
h
e ce
nter
of gr
a
v
ity
of th
e n
e
i
g
hbor
hoo
d w
i
th
the o
b
j
e
ct. T
h
e
exp
e
ri
me
ntal
r
e
sults s
how
th
a
t
the S-BOUND
alg
o
rith
m c
an
accurate
ly d
e
tect a var
i
ety
of clusteri
ng
bo
u
ndary
an
d re
move th
e n
o
ises
,
the time of performanc
e is als
o
better.
Ke
y
w
ords
: clu
s
ter, bound
ary
obj
ect, neig
hbo
rhoo
d, pr
oj
ection, nei
gh
borh
o
od
center of gr
avity
Copy
right
©
2013 Un
ive
r
sita
s Ah
mad
Dah
l
an
. All rig
h
t
s r
ese
rved
.
1. Introduc
tion
The so-calle
d clu
s
teri
ng i
s
that acco
rd
ing to som
e
simila
r mea
s
ure, divide th
e data
obje
c
t into a
plurality of
cl
asse
s o
r
clusters,
m
a
ki
ng
the obje
c
ts in
the same
cl
uster ha
s ve
ry
high d
e
g
r
ee
of simila
rity a
nd the
obje
c
t
s
in
differe
nt
clu
s
ter i
s
ve
ry different [1]. As a
n
imp
o
rt
ant
method
of d
a
ta mini
ng,
clu
s
terin
g
a
n
a
lysis
ha
s b
een wid
e
ly use
d
in
patt
e
rn
re
co
gniti
on,
grap
hics an
d image p
r
o
c
e
s
sing, etc. In a
ddition, it has been wi
dely use
d
in pra
c
ti
cal work. The
colle
ction of data obje
c
ts located on
the edge of
the regio
n
of high-d
e
n
s
ity of clusteri
n
g
,
con
s
titutes th
e clu
s
te
r bo
u
ndary. Bou
n
d
a
ry obj
ect
s
of
ten have t
w
o
or mo
re fe
atu
r
es of
cluste
rs,
of whi
c
h the
attribution i
s
not cle
a
r [2].
To e
ffectively
extract
clu
s
ter b
ound
ari
e
s not o
n
ly ca
n
improve th
e
accuracy
of clu
s
terin
g
, b
u
t also
st
u
d
y furthe
r the
cha
r
a
c
teri
stic of the ed
ge
of
clu
s
ter. The
r
efore, Cl
uste
ring bou
nda
ry has be
co
me
one of the e
m
ergi
ng area
s of re
sea
r
ch
of
data mining.
DBSCAN [3]
algorith
m
i
s
t
he e
a
rlie
st to
pro
p
o
s
e th
e
co
ncept of
clusteri
ng
bou
ndary.
DBSCAN alg
o
rithm i
s
clu
s
tering
algo
rith
m ba
se
d o
n
den
sity. The
algorith
m
starts from
a
core
obje
c
t and
ite
r
atively se
eks all obj
ect
s
th
at ar
e
directly
den
sity-re
achable, a
nd
st
ops
until me
et
non-co
re
obj
ect iteratio
n, to form
Clu
s
te
ring. We can
say, if a non
-core obj
ect
p
can be
di
re
ctly
den
sity-re
ach
able th
roug
h
co
re o
b
je
cts
q
, then
p
i
s
the bo
unda
ry point. Ho
wever, DBS
C
AN
algorith
m
just propo
se
s clu
s
terin
g
strategy
, but does n
o
t give the detecti
on method
of
boun
dary poi
nt.
BORDE
R [3]
algo
rithm first propo
se
s t
he u
s
e
of
reverse
k-nea
r n
e
ighb
or
of ob
ject t
o
detect the bo
unda
ry points. In
the algorithm, if an object
p
is in the
k
-n
ea
r neigh
b
o
r of an obje
c
t
o
, we call th
e obje
c
t o is the reverse
k
-nea
r neig
h
b
o
r of obje
c
t p. Because the numbe
r of k-
near n
e
igh
b
o
r
s of internal obje
c
ts of clu
s
terin
g
is mo
re than that of
clu
s
terin
g
bo
unda
ry obje
c
ts,
BORDE
R alg
o
rithm first ca
lculate
s
the n
u
mbe
r
of reverse k-nea
r n
e
ighb
or of ea
ch obje
c
t, then
according to t
he num
be
r re
verse
k-ne
ar
neigh
bors,
in
the ord
e
r fro
m
small to l
a
rge, arran
ge t
he
whol
e data
set, taking n o
b
ject
s a
s
bou
ndary p
o
ints.
Ho
wever,
sin
c
e the
numb
e
r of reverse
k-
near
neig
h
b
o
rs
of noi
se
point is
often sm
alle
r than that of
reverse
k-ne
ar nei
ghb
ors of
boun
dary
poi
nts,
when
the
entire d
a
ta
set is arra
nge
d
,
the n
o
ise p
o
i
nts a
r
e
al
so
among
the
first
n
data obj
ect
s
, so BO
RDE
R
algo
rithm
cannot ide
n
tify
noise, and it
s si
gnifica
nt d
e
ficien
cy lies
in
that it cannot effectively extract the
b
oun
dary data set that contain
s
noise.
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
e-ISSN:
2087
-278X
Clu
s
terin
g
-bo
unda
ry-d
etect
i
on Algorithm
based
on
cen
t
er-of
-
gravit
y of… (WA
NG
Gui-zhi
)
7303
bDBSCA
N
[4
] algo
rithm i
s
to co
ndu
ct p
r
ojectio
n
a
nd
vector uniti
za
tion of o
b
je
cts in
th
e
neigh
borhoo
d
of core obj
ects, discrimin
a
t
e the bal
an
ce of the neig
hborhoo
d of core obje
c
t a
n
d
clu
s
terin
g
th
e eq
uilibri
um
-den
sity-rea
chable
obj
ect
s
of b
a
lan
c
e
core
obje
c
ts.
The
alg
o
rith
m
define
s
imbal
ance obje
c
ts
in neigh
bo
rho
od a
s
bou
nd
ary obje
c
t, effectively elimi
nating the typ
e
of noise of bo
unda
ry spa
r
se object
s
and
improv
ing cl
usteri
ng a
c
cu
racy. Ho
weve
r, the algorith
m
just co
ndu
cts cluste
ring, b
u
t does not condu
ct cl
u
s
te
r boun
dary e
x
traction. In rece
nt years, the
experts
also
prop
ose ma
ny related
cl
uster
bou
nda
ry detectio
n
algorith
m
s, li
ke IBORA [5
]
algorith
m
, BRIM [6] algorit
hm, Gre
en [7
]
algorithm, B
R
INK [8] algo
rithm. The
s
e
algorith
m
s
use
different
strat
egie
s
to i
den
tify the boun
darie
s
of
o
b
jects,
whi
c
h
has validity, but al
so
so
m
e
limitations.
2.
S-BO
UN
D Algorithm
S-BOUND i
s
a n
e
w alg
o
rithm,
with
the hel
p of
proje
c
tion
[9] to ta
ke
bo
unda
ry
detectio
n
. It gives th
e d
e
finition of
proj
ection
ag
ain,
usi
ng th
e
concept "c
ent
er of
gravity" in
physi
cs to di
scrimin
a
te th
e balan
ce
n
e
ighb
orh
ood
core obj
ect,
so a
s
to ide
n
tify bounda
ry
obje
c
ts.
2.1. Theore
t
ical Basis
In the physical mech
ani
cs, each pa
rt of an obj
e
c
t sh
ould be
ar the
role of gravit
y. The
so-call
ed
ce
n
t
er of
gravity is that in
th
e g
r
avit
ation
a
l field, fo
r t
he o
b
je
ct in
any po
sition,
the
resultant force of gravity that con
s
titute
mass
p
o
ints,
all thro
ugh th
e point. Th
e
center
of gravity
of regula
r
obj
ect of uniform density is
its geomet
ric cente
r
and t
he po
sition o
f
the center
of
gravity influence
s
the bal
a
n
ce a
nd sta
b
i
lity of
the object.
Observing
ε
neigh
borhoo
d
of
cluste
ring internal obj
e
c
ts, you can fi
nd its data o
b
ject is
relatively unif
o
rmly di
stri
bu
ted,
the
ce
nte
r
of
gravity is
the
cente
r
of
the nei
ghb
orh
ood
and
the
ε
neigh
borhoo
d
is balan
ce
d; Ho
wever,
th
e
data obje
c
ts in
ε
nei
ghbo
rhood
of cl
ust
e
ring
bo
und
a
r
y
obje
c
t is une
venly distribu
ted, with one
side
the hig
h
-de
n
sity are
a
, the other side the lo
w-
den
sity area
s. Therefore, t
he gravity cente
r
of its neigh
bo
rh
ood is d
e
viated from t
h
e
neigh
borhoo
d
center a
n
d
it is in the high den
sit
y
area. The
ε
neighborh
ood is u
nev
en
.
Therefore, we can u
s
e th
e deviation d
egre
e
of
the cente
r
of gra
v
ity and geometric
cente
r
of
ε
neigh
borhoo
d
of the core o
b
ject to identi
f
y boundar
y o
b
ject
s, thereb
y to extract bound
arie
s.
2.2. Relate
d Defini
tions a
nd Terminolog
y
Defini
tion 1:
ε
-neigh
bo
rho
od [3]. For a
n
y
data obje
c
t
p
in space
D
, the area within the
radiu
s
ε
is cal
l
ed
ε
neighb
o
r
hoo
d of the obje
c
t, reco
rd
ed as
N
ε
(p)
. Namely:
N
ε
(p
)={q|q
D and di
st(p,
q
)
≤ε
}
Whe
r
ein
di
st (p, q)
represe
n
ts the Euclid
ean di
stan
ce
betwe
en
p
an
d
q
.
Defini
tion 2: C
ore obj
ect
[3]. Given
ε
and MinPt
s
, if the numb
e
r of obj
ect
s
of
ε
neigh
borhoo
d
N
ε
(p)
of obj
ect p is
|N
ε
(p)|
≥
MinPts
, then P is called t
he co
re obj
ect.
Defini
tion 3:
Projec
tion. In the
ε
-neig
h
borh
ood
of core o
b
je
ct, for any obje
c
t
p
N
ε
(q)
,
draw a
ray with
q
as
the sta
r
ting
p
o
int to
be th
rough
q
, let the
ray intersec
t
with the
ε
-
neigh
borhoo
d
bou
nda
ry
q
at
point
p
, namely
di
st(p
′
,q
)
=
ε
. T
h
e
p
r
oc
es
s is calle
d
pr
o
j
ec
tion
,
whe
r
ein the p
o
int p 'calle
d the su
bpoi
nt of object p on the
ε
neighb
orhood.
For th
e
d-dim
ensi
onal
data
set
D
, in the
ε
-
n
e
igh
borh
ood
of core o
b
ject
q (x
q1
, x
q2
, ...
,
x
qd
)
, the proj
ection of o
b
j
e
ct
p (x
p1
, x
p2
, ..
., x
pd
)
is the proje
c
tion
of each att
r
ibute o
n
eve
r
y
dimen
s
ion. A
s
sume the p
r
ojectio
n
point
as
p '(x
p1'
, x
p2'
, ...
, x
pd'
)
, then:
x
pi
′
=
ε
·c
os
θ
i
1
≤
i
≤
d
Whe
r
ein
θ
i i
s
the inclu
ded
angle of ray q
p
and unit ve
ctor Ii on the i
-
dimen
s
io
n [7].
And,
co
s
θ
i
=
-
)
dist (p, q
x
x
qi
pi
Evaluation Warning : The document was created with Spire.PDF for Python.
e-ISSN: 2
087-278X
TELKOM
NIKA
Vol. 11, No
. 12, Dece
mb
er 201
3: 730
2 – 7308
7304
In bDBS
CAN algo
rithm, i
n
the
neig
h
b
o
rho
od
of two-dim
e
n
s
iona
l sp
ace, obj
ect i
s
proje
c
ted
ont
o the
unit
circle. In orde
r to
impr
ove the
cal
c
ulatio
n a
c
curacy
of the
neigh
borhoo
d
equilibrium, the articl
e put t
he projection point on the
ε
-circle, rat
her than the
unit circle. As
s
h
ow
n
in
F
i
gu
r
e
1
,
in
ε
-ne
i
ghbo
rho
o
d
o
f
two-dime
nsi
onal spa
c
e o
b
ject q, obje
c
t p is proje
c
te
d
onto point
p'
of
ε
-
circle. Th
e other obj
ect
s
are p
r
oj
ecte
d in the same
way onto the
ε
-
cir
c
le.
Defini
tion 4:
Center of g
r
avity of neighborh
ood. Fo
r the core o
b
j
e
ct
q
, the pro
j
ection
points
of all
obje
c
ts of its
ε
-neigh
bo
rh
ood a
r
e
seq
uentially con
necte
d, thus
forming
epib
o
ly
convex h
u
ll. The cente
r
o
f
gr
avity of convex hull is defined
a
s
t
he center
of gravity of
ε
-
neigh
borhoo
d
of object q, reco
rde
d
as p
o
int m.
For
the
d-dimen
s
ion
a
l
data
set
D
, p
r
oje
c
tion poi
n
t
of object
p
j
in
ε
-neigh
borhood of
obje
c
t
q
is
rec
o
rded as
pi
′
(x
pj1
′
,x
pj2
′
,…,x
pj
d
′
)
.Its
neighborhood c
enter of
gravity
m (
x
q1'
, x
q2'
, ..., x
qd'
)
is re
co
rde
d
a
s
m
, then:
x
qi
′
=
n
j
pji
x
1
′
(1
≤
i
≤
d
n
= |
N
ε
(
q
)|
Obviously,
the
ce
nter
of
gravity
m
is
c
l
os
e
r
to
obje
c
t
q
, indic
a
t
e
s
that the
points
in the
ε
-neigh
bo
rho
od of obje
c
t
q
are
more e
v
enly distrib
u
t
ed, otherwi
se the farthe
r,
more
uneve
n
ly,
namely nei
gh
borh
ood i
s
u
nbala
n
ced. Fi
gure
2 is
sc
h
e
matic di
ag
ra
m of cente
r
o
f
gravity of
ε
-
neigh
borhoo
d
of two-di
me
nsio
nal obj
ect
q
. The proj
ection
point
s (soli
d
dot
s)
of the obje
c
t
s
(ope
n dots) i
n
the
ε
-neigh
borh
ood of o
b
ject
p
in
F
i
gu
r
e
2(
a
)
on
ε
-circle, form a
convex polyg
on.
The cente
r
o
f
gravity
m
1
of the p
o
lygo
n is very
clo
s
e to
obje
c
t
p
, so
its
nei
ghbo
rho
od
are
balan
ce
d; Ho
wever, th
e ce
nter of g
r
avity
m
2
of the convex polygo
n
forme
d
by
sub
point
s of the
obje
c
ts in
ε
-neigh
borhoo
d
of object
q
in Figure 2
(
b), is very far from obj
ect
p
and Its
neigh
borhoo
d
is uneven.
Figure 2. Dat
a
Obje
cts Nei
ghbo
rho
od E
quilibri
um
(A) Balan
c
e
d
neigh
borhoo
d
(B) Imbala
n
ced neig
hbo
rh
ood
Figure 1. Nei
ghbo
rho
od Projectio
n
of
two-di
men
s
io
nal data ob
j
ec
ts
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
e-ISSN:
2087
-278X
Clu
s
terin
g
-bo
unda
ry-d
etect
i
on Algorithm
based
on
cen
t
er-of
-
gravit
y of… (WA
NG
Gui-zhi
)
7305
Defini
tion 5
:
Bound
ary O
b
ject
s.
when
the
di
stan
ce betwe
en obje
c
t
q
with ce
n
t
er
of
gravity m of
ε
-neigh
bo
rho
od is greate
r
than the given value
η
, objec
t
q
is called bo
unda
ry
obje
c
t.
3.3. Algorith
m
Steps De
s
c
ription
S-BOUND
al
gorithm i
s
a
c
cording to the
abov
e ide
a
s
and definitio
n
s
of bou
nda
ry object
to identify bounda
rie
s
. The
spe
c
ific ste
p
s
are d
e
script
i
ons a
r
e a
s
follows:
(1) Input dat
a set
D
, neighbo
rho
od radiu
s
ε
, core object nei
ghbo
rho
od thre
shol
d
MinPts
and b
ound
ary thre
shold
η
;
(2) S
c
an data
set, repe
at steps (2)
- (3
);
(3) Cal
c
ul
ate the
ε
-nei
ghb
o
r
hoo
d of d
a
ta
obje
c
t
q
, if
| N
ε
(q)
|
≥
Mi
nPts
, execute
step
(
4
)-
(6)
;
(4) P
r
oje
c
tion
. Take
obje
c
t
q
as th
e cen
t
er to e
s
tabli
s
h
coo
r
din
a
te
system, p
r
oj
ect all
obje
c
ts in
N
ε
(q)
o
n
the nei
ghbo
rho
od b
ound
ary;
(5) Cal
c
ulate cente
r
of
grav
ity
m
of neighborh
ood a
c
co
rding to defini
t
ion 3;
(6) Calculate the
distan
ce bet
we
en the cente
r
of gra
v
ity
m
and
q
, when
| mq
|>
η
,
q
is
the boun
dary
obje
c
t;
(7) O
u
tput bo
unda
ry obje
c
t.
3. Experiment Situatio
n Analy
s
is
In order to v
e
rify the co
rrectne
s
s and
efficien
cy of algorith
m
, the article a
dopt
s a plu
r
ality
of two-dime
nsio
nal
com
p
reh
e
n
s
ive
data set to
con
d
u
c
t e
x
perime
n
t. For exp
e
rime
ntal
environ
ment,
it u
s
e: PDC E5
800
CP
U, 3G
me
m
o
ry, Wi
ndo
ws 7
Home
E
d
ition o
perating
system. Pre
paratio
n a
n
d
impleme
n
ta
tion of
the
algo
rithm i
s
cond
ucte
d
in VC++
6.0
environ
ment.
3.1. Validit
y
of the
Algori
t
hm
The arti
cle u
s
e
s
multiple
synthetic d
a
ta
sets to test effectiveness of S-BOUND and
comp
are with
the typical bound
ary dete
c
tion
alg
o
rith
m
BORDE
R experim
ental results.
Figure 3 i
s
t
h
ree
data
set
s
with noi
se
and u
n
iform
den
sity. Whe
r
ein i
n
(a) th
ere
are
12,917
data
points,
natura
lly forming
three
clu
s
te
rs; i
n
(b
) th
ere
a
r
e 84
86
data
points,
natura
l
ly
forming
four
clu
s
ters; in
(c) the
r
e
are 1
1
,182
dat
a
p
o
ints, n
a
turall
y forming
five cl
uste
rs.
T
he
three d
a
ta sets a
r
e with
cle
a
r bo
und
arie
s and vari
ou
s
sha
p
e
s
, whi
c
h co
uld te
st the validation
of
algorith
m
.
Figure 4 is t
he co
mputati
onal results
of
the three
data set
s
sh
own in Fi
gure 3 by
BORDE
R al
g
o
rithm,wherei
n the pa
ram
e
ters
used b
y
(a) a
r
e: th
e numb
e
r n
e
i
ghbo
r
k=
8
, t
h
e
numbe
r of b
o
unda
ry point
s
n=1
925
; the
para
m
eters u
s
ed
by (b
) are: the numb
e
r
neig
hbo
r
k=
8,
the numb
e
r
o
f
bound
ary p
o
ints
n=
13
9
0
;
the pa
ramet
e
rs u
s
ed by
(c)
are: the
nu
mber
neig
h
b
o
r
k=
8
, the nu
m
ber
of bou
nd
ary point
s
n
=
1799
;As can
be foun
d fro
m
t
he figure, the "bou
nda
ry
points"
dete
c
ted BO
RDER alg
o
rith
m incl
ude
n
o
ise
and
th
e bou
nda
ry
point cann
ot be
disting
u
ished
from the noi
se area.
c
b
a
Figure 3. Orig
inal Date Set
Evaluation Warning : The document was created with Spire.PDF for Python.
e-ISSN: 2
087-278X
TELKOM
NIKA
Vol. 11, No
. 12, Dece
mb
er 201
3: 730
2 – 7308
7306
Figure 5 i
s
t
he
comp
utational
re
sults
of
the three
data
sets sh
own
in Fi
gure 3 b
y
BORDE
R al
g
o
rithm, whe
r
ein the p
a
ra
meters u
s
ed
by (a) are: n
e
ighb
orh
ood
radiu
s
ε
=3
, core
obje
c
t nei
gh
borh
ood
thre
shol
d
MinPt
s
=12
, th
e
bou
ndary
thre
sh
old
η
=0.052
,
the
numb
e
r of
boun
dary poi
nts dete
c
ted i
s
175
6;the p
a
ram
e
ters used by (b) a
r
e:
neighb
orh
o
o
d
radiu
s
ε
=2.3
,
core obj
ect n
e
ighb
orh
ood
threshold
Mi
nPts=7
, the b
o
unda
ry thre
shold
η
=0.
0
75
,
the numb
e
r
of
boun
dary
poi
nts d
e
tecte
d
i
s
1
043; the
p
a
ram
e
ters u
s
ed by
(c) a
r
e:
neigh
bo
rho
o
d
ra
diu
s
ε
=2.2
,
core obj
ect n
e
ighb
orh
ood
threshold
Mi
nPts=7
, the b
o
unda
ry thre
shold
η
=0.
0
95
,
the numb
e
r
of
boun
dary poi
nts dete
c
ted i
s
154
6; It can be found
from the diag
ram, S-BOUND algo
rithm can
effectively de
tect the b
o
u
ndary
obje
c
t
s
in
noi
se d
a
ta set
and t
he bo
und
arie
s ide
n
tified a
r
e
clea
r, very well reflectin
g
the sh
ape
cha
r
ac
te
ri
stic of the bou
nda
ry of each
clu
s
ter.
In the a
bove t
h
ree
u
s
ed
da
ta set
s
, the
d
ensity of
ea
ch cl
uste
r i
s
v
e
ry hig
h
a
nd t
he d
a
ta
obje
c
ts a
r
e e
v
enly distrib
u
t
ed, while the
noise
obje
c
t
s
are relatively rare. In order to tes
t
the
effectivene
ss of S-BOUND alg
o
rithm,
we h
a
ve
al
so used
com
p
lex data
set
s
[7] shown
in
Figure 6(a
)
to
cond
uct exp
e
rime
nts. Th
e origin
al dat
a set ha
s 23
724 poi
nts an
d inclu
d
e
s
a lot
of noise a
nd
"interferin
g
lin
e". The inte
rn
al obje
c
ts of
each cl
uste
r
are
distri
bute
d
very spa
r
se
ly
and not that evenly. (b) is the
result detected by algo
rithm BORDER, the param
eters u
s
e
d
are:
the numbe
r
of neighb
or
k=140
, the n
u
mbe
r
of bo
unda
ry point
s
n=450
0
; (c
) is the re
su
lt
detecte
d by algorith
m
S-BOUND, the
para
m
eters u
s
ed a
r
e: neig
hborhoo
d rad
i
us
ε
=8.
6
, core
obje
c
t neig
h
borh
ood
thre
shol
d
MinPts=128
, bou
nd
ary threshold
η
=0.006
4
, the nu
mbe
r
of
F
i
gure 6. Bou
n
dar
y
Detecti
on
Case of Com
p
l
e
x D
a
ta Sets
a
b
c
Figure 5. S-BOUND Algo
rit
h
m Re
sults
a
b
c
a
Figure 4. BORDE
R Algorit
hm Re
sults
b
c
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
e-ISSN:
2087
-278X
Clu
s
terin
g
-bo
unda
ry-d
etect
i
on Algorithm
based
on
cen
t
er-of
-
gravit
y of… (WA
NG
Gui-zhi
)
7307
boun
dary
poi
nts d
e
tecte
d
is 38
37. It
can
be
fou
n
d
by
com
p
a
r
ing the
resul
t
s of th
e two
experim
ents t
hat bou
nda
ry points
dete
c
ted by BORDER algo
rithm
contai
n a lot
of noise an
d
coul
d not ve
ry well id
entify bound
ary o
u
tline of e
a
ch clu
s
te
r; Wh
ile S-BO
UND algo
rithm
ca
n
effectively distinguish noi
se, remove th
e "inter
fe
ring
lines"
and d
e
tect the true
boun
dari
e
s
of
each clu
s
ter.
Thus, the effe
ctivene
ss of the algo
rithm i
s
verified.
3.2. Time Analy
s
is of Algorithm
S-BOUND al
gorithm
scan
s throug
h th
e data
set to
cal
c
ulate
th
e
ε
-nei
ghb
orhood
of
data obje
c
t
s
and dete
c
t b
ound
ary obj
e
c
ts by the p
r
ojecto
r, its time co
mplexit
y
is
O (n
2
)
; while
BORDE
R
al
gorithm
s
re
q
u
ire
sca
nnin
g
twi
c
e
dat
a
s
et, to
cal
c
u
l
ate the
rev
e
rse
k-nea
re
st
neigh
bor
of e
a
ch
data o
b
j
e
ct, and
sort
all the
data
o
b
ject a
c
co
rdi
ng to the
rev
e
rse
k-n
e
a
r
e
s
t
neigh
bor,
al
gorithm'
s
tim
e
complexity is
O (kn
2
)
.
The
r
efore BORDE
R al
gorithm’
s
tim
e
compl
e
xity is highe
r than S
-
BOUND algo
rithms.
In ord
e
r to
verify the time efficie
n
cy of the S-BOUND
algo
rithm, we co
ndu
cted
experim
ents
comp
ared to the BORDER algorithm
with
the origi
nal
data set sho
w
n in Fig
u
re
3
(a).
400
0, 60
00, 75
50, 85
50, 95
50, 11
000
and
129
17 (all) data
point
s
were
sel
e
cte
d
, th
e
results sho
w
n in Figure 7.
As can be seen, S-BO
UND alg
o
rithm
'
s time efficie
n
cy is more than
BORDE
R alg
o
rithm
s
.
In above
experim
ents,
the data
poi
nts of e
a
ch
clu
s
ter is
uniform
and
den
sity,
para
m
eter val
ues
use
d
in the two alg
o
rit
h
ms
a
r
e
smal
l; in the data set sh
own in
Figure
6,
dat
a
obje
c
ts di
stri
buted le
ane
r
and le
ss eve
n
ly, param
eter value
s
used in the t
w
o
algorith
m
s
a
r
e
larger.
To
b
e
tter validati
on S-B
O
UND al
gorith
m
’
s
time
pe
rfo
r
man
c
e,
Ex
p
e
rime
nts we
re
execute
d
wit
h
the data
se
t sho
w
n in Fi
gure
6.
5000
, 8000, 10
10
0, 1303
0,163
00, 198
88 an
d
2372
4 (all
) da
ta points were sele
cted, th
e
comp
arative results
sho
w
n in Figu
re
8.
As can b
e
se
en from
Figu
re 8, to S
-
BOU
ND
algo
rit
h
m, althou
gh
the pa
ram
e
ters are
vary greatly
in the t
w
o ty
pes of d
a
ta
sets,
but the
execution ti
me is ba
si
ca
lly the same;
Ho
wever, the
time compl
e
xity of
BORDER al
gorith
m
is
O (kn
2
)
, when u
s
e
d
the data se
ts
sho
w
n i
n
Fig
u
re
6(a
)
, the
experim
ents
need
a la
rge
r
value
of the
paramete
r
(
k=
140
), a
nd th
e
time perform
ance is far be
low the S-BO
UND alg
o
rith
ms.
4. Conclusio
n
In the article, all the data
points a
r
e p
r
ojecte
d in the
ε
-neigh
bo
rho
od of the core obje
c
t of
the data
set o
n
to the conve
x
hull of the e
p
iboly
an
d at
the sa
me tim
e
, identify bo
unda
ry obje
c
t
s
with the con
c
ept of center
of gravity in p
h
ysic
s. The e
x
perime
n
tal result
s sho
w
that the method
can
effectivel
y identify cluster bou
nda
rie
s
of vari
o
u
s
shape
s in the
data set that contai
ns
noi
se,
and its time e
fficiency is
greater tha
n
that of BO
RDE
R algo
rithm.
Espe
cially for the clu
s
ter
with
low d
e
n
s
ity a
nd not th
at evenly dist
ribut
ed, S-
BO
UND alg
o
rithm
h
a
s m
o
re
time
advantag
e th
an
BORDE
R alg
o
rithm. Ho
we
v
e
r,
S-BO
UN
D
al
go
rith
m h
a
s
an
ap
pare
n
t defe
c
t - mo
re
paramete
r
s.
whe
n
the
al
gorithm
is e
x
ecuted, i
n
addition
to t
he n
e
igh
borh
ood
radi
us
ε
and
bo
und
ary
F
i
gure 8. T
i
me efficienc
y com
pari
ng t
w
o
algorithms(
)
0
50
10
0
15
0
20
0
25
0
30
0
35
0
50
00
8000
1
010
0
1
3030
1
6300
1
9888
2
3724
数据规模
执行时间(s
)
BORDER
S-BOUND
F
i
gure 7. T
i
me Efficienc
y Com
pari
ng T
w
o
Algorit
hms(
)
0
2
4
6
8
10
12
14
16
18
3
000
60
00
90
00
12
00
0
1
5
000
数据规
模
执行时间
(s
)
B
ORD
ER
S
-BO
UN
D
Evaluation Warning : The document was created with Spire.PDF for Python.
e-ISSN: 2
087-278X
TELKOM
NIKA
Vol. 11, No
. 12, Dece
mb
er 201
3: 730
2 – 7308
7308
threshold
η
, it
nee
ds nei
gh
borh
ood
thre
shol
d
MinPt
s
of Co
re
obje
c
t as th
e in
put
paramete
r
, i
n
orde
r to rem
o
ve noise.
Referen
ces
[1]
MS Ch
en, JH
Han, PS
Yu.
D
a
ta mi
nin
g
: a
n
overvi
e
w
from
a d
a
tab
a
se
per
spective.
IEEE
Trans
KDE
.
199
6; 8(6): 866
-883.
[2]
Xi
a Ch
en
yi, H
s
u W
,
Lee Mongli, et al. BORDE
R: efficie
n
t
computation
of boun
dar
y p
o
ints.
IEEE
T
r
anson Kn
ow
ledg
e an
d Data
Engin
eeri
n
g
. 2
006; 18(
3): 289
-303.
[3]
Ester M, Kriegel HP
, Sa
nd
er
J, et al.
A de
nsity-bas
ed a
l
gorith
m
for
dis
c
overi
ng cl
usters in
larg
e
spatia
l dat
abas
e w
i
th nois
e
. P
r
ocee
din
g
s of
2nd Inter
nati
o
n
a
l Co
nfere
n
ce
on Kn
o
w
l
e
d
ge
Discover
i
n
g
and D
a
ta Mini
n
g
(KDD-9
6
). Portlan
d
: Orego
n. 1996: 2
26-2
31.
[4]
Wu Jia
w
ei, Li
X
i
ongfei, Sun tao, etc. A
De
nsit
y-B
a
sed
Cl
usteri
ng Al
gor
ithm
Conc
ern
i
ng
Neig
hb
orh
ood Bala
nce.
Jour
n
a
l of Co
mp
uter
Researc
h
an
d
Devel
o
p
m
ent
. 201
0; 47(6): 10
44-1
052.
[5]
WU
SHOUR, Silamu, LI
Fen
g
ju
n, T
A
O Mei. IBORA: an I
m
prove
d
Effici
ent D
e
tection
of Bou
n
d
a
r
y
Points.
Journ
a
l
of Chines
e Co
mp
uter Syste
m
s
. 2008; 29(
10)
: 1845-1
8
4
8
.
[6]
Qiu Ba
ozhi, Y
ue F
e
ng, Sh
e
n
Ju
n
y
i, etc.
B
R
IM: An Effici
ent Bo
und
ary
Points D
e
tecti
ng Al
gor
ith
m
.
Proc.Of Advances in Kno
w
ledge Disc
o
ver
y
and Da
ta Mining. Heidelber
g:
Springer. 2007; 761-
768.
[7]
Qiu Baoz
hi, Y
ueF
en
g. Gravi
t
y
-
bas
ed
Bou
n
dar
y Po
ints D
e
tecting
Alg
o
ri
thm.
Journ
a
l of
Chi
nes
e
Co
mp
uter Systems
. 2
008; 2
9
(
9
): 279-2
82.
[8]
Qiu Baoz
hi, Ya
ng Ya
ng, D
u
Xiao
w
e
i. BRINK:
An
Al
gorithm of
Bou
ndar
y P
o
in
ts of
Cluster
s Detectio
n
Based On
Loc
al Qua
litative
F
a
ctors.
Journ
a
l of Z
h
e
n
g
z
h
o
u Univ
ersity (E
ngi
neer
in
g Sci
ence).
2
012;
33(3): 11
7-1
2
0
.
[9]
Z
hou Jin
g
b
o
, Yin Jun, Jin
Z
hong. Ne
w
Ensemb
le Co
nstructor Base
d on Loc
al
it
y Preservin
g
Projecti
on for High D
i
me
nsio
nal Cl
usteri
ng.
Co
mp
uter Scie
nce
. 201
1; 38(
9): 177-1
81.
Evaluation Warning : The document was created with Spire.PDF for Python.