TELKOM
NIKA
, Vol. 11, No. 8, August 2013, pp. 44
8
4
~4
490
e-ISSN: 2087
-278X
4484
Re
cei
v
ed
No
vem
ber 1
7
, 2012; Re
vi
sed
April 10, 201
3; Acce
pted
May 19, 20
13
Community Detection Algorithm based on Neighbor
Similarity
Jianjun Che
ng, Hong Xu
, Mahmud Ga
y
bullae
v
, M
i
ng
w
e
i L
e
ng,
Xiao
y
un Chen*
Schoo
l of Information Sci
enc
e and En
gi
neer
ing, La
nzh
ou U
n
iversit
y
, La
nz
hou 7
3
0
000, C
h
in
a
*Corres
p
o
ndi
n
g
author, e-ma
i
l
: cheng
jia
nj
un
@lzu
.e
du.cn, h
x
u
1
1
@
lzu.e
du.
cn, chen
xy@
l
z
u
.edu.cn
*
A
b
st
r
a
ct
Many co
mple
x netw
o
rks have d
i
spl
a
ye
d
the
co
mmu
n
i
ty structures, and th
e det
ection
of
community str
u
cture ca
n giv
e
insi
ghts i
n
to
the stru
ctural
and fu
nctio
n
a
l
infor
m
ati
on
o
f
these co
mp
l
e
x
netw
o
rks. In this pap
er, w
e
propos
ed a
nei
ghb
or si
mi
l
a
rit
y
based
new
a
l
gorit
hm for c
o
mmu
n
ity structure
detectio
n
, in w
h
ich w
e
only c
onsi
der the si
mi
lariti
es
betw
een a n
o
d
e
an
d its unclass
ifi
ed ne
igh
bors i
n
the
brea
dth-first traversa
l or
der,
w
i
thout
cons
id
erin
g oth
e
r n
o
des i
n
flu
enc
es
; w
e
take this
nod
e as
a f
a
ther
nod
e an
d its neig
hbors as th
e childr
en n
o
d
e
s, to find
out those ch
ildr
en
nod
es w
h
ich shou
ld be
lo
ng i
n
the
same co
mmuni
ty w
i
th their fat
her n
o
d
e
. T
hen
these c
h
il
dre
n
nod
es ar
e pr
oc
essed
in th
e s
a
me
w
a
y as th
ei
r
father no
de re
cursively, u
n
til
the
terminati
o
n con
d
itio
n is
reach
ed. T
he
most pr
o
m
in
en
t property of o
u
r
alg
o
rith
m is t
h
at it has
ne
ar li
ner ti
me c
o
mpl
e
xity, and
furthermore
it is a
determi
nistic
al
g
o
rith
m. W
e
hav
e
tested our a
l
go
rithm o
n
sever
a
l rea
l
netw
o
rk
s, comp
ar
ed w
i
t
h some other
alg
o
rith
ms, an
d the results h
a
v
e
ma
nifeste
d
tha
t
our algor
ith
m
outperfor
m
s
th
e previ
ous al
go
rithms si
gnific
a
ntly.
Ke
y
w
ords
: co
mmu
n
ity detect
i
on, netw
o
rks, nei
ghb
or
si
mil
a
rity, breadth-fir
s
t traversal
Copy
right
©
2013 Un
ive
r
sita
s Ah
mad
Dah
l
an
. All rig
h
t
s r
ese
rved
.
1. Introduc
tion
In recent years, ma
ny sc
ientific syst
ems can
al
ways
b
e
re
p
r
esented
as compl
e
x
netwo
rks [1-4
], e.g., the In
ternet
and
world
wid
e
we
b is a
network fo
rmed
by
web
pa
ge
s a
n
d
hyperlin
ks, th
e so
cial
network re
present
s the
relatio
n
s
hip
s
a
m
ong
peopl
e, and
the food
we
b
rep
r
e
s
ent
s th
e relatio
n
shi
p
s am
ong
predators a
nd
preys,
et
c
. In these networks, the n
o
des
rep
r
e
s
ent the
objects o
r
e
n
tities in the system
s, and
the edges repre
s
e
n
t the relation
shi
p
s
or
con
n
e
c
tion
s betwe
en the
object
s
or
entities. The
s
e net
wo
rks may have some i
n
tere
sting
cha
r
a
c
teri
stics, on
e of the
most
co
mm
on an
d p
r
om
inent p
r
op
ert
y
is its
com
m
unit
y
st
r
u
ct
ure
.
Although, to t
he
con
c
ept
of com
m
unity, there
is
no
uni
fied definition
at present ye
t [5-7], most
of
the resea
r
che
r
s have
rea
c
h
ed a co
nsen
sus that co
mm
unities in a n
e
twork indi
cat
e
grou
ps of th
e
node
s, su
ch t
hat the node
s within a gro
u
p
are c
onn
ect
ed more often than those across differe
n
t
grou
ps [8
-10]
.
To dete
c
t the
com
m
unity
stru
cture of a
netwo
rk i
s
o
f
great
sig
n
ifican
ce,
be
cau
s
e th
e
comm
unity
st
ructu
r
e
of a netwo
rk can
give
u
s
so
m
e
insi
ghts into
the
stru
ctu
r
a
l
and
fun
c
tio
nal
informatio
n of
the netwo
rk. So the study
of comm
unity detectio
n
ha
s arou
sed m
a
ny resea
r
chers’
intere
sts an
d
attentions, and many alg
o
rithm
s
have
been bro
u
g
h
t out in the
last decade
s [5],
su
ch a
s
edg
e betwe
enne
ss b
a
sed me
thod [8, 11],
modula
r
ity optimization me
thods [12], L
PA
(Lab
el Prop
a
gation Algo
rithm) an
d its variation
s
[13-16],
etc
. M
any of the aforeme
n
tione
d
method
s hav
e a relative hi
gh co
mputati
on dem
and, t
hus
can
not b
e
use
d
to dea
l with very large
netwo
rks; Co
mpared with t
hese algo
rith
ms, LPA
(La
bel Prop
agati
on Algorithm
) has ne
ar lin
ear
time compl
e
xity, but it is not a determini
stic algorith
m
.
To
solve the
s
e
pro
b
lem
s
,
in thi
s
p
a
p
e
r,
we p
r
o
p
o
se
a
neig
h
bor
simil
a
rity ba
sed
algorith
m
(NSA), to identify the community stru
cture in the n
e
twork.
The
most promin
ent
prop
erty of o
u
r alg
o
rithm i
s
that it also
has
nea
r line
r
time compl
e
xity, and furthermo
re it is a
determi
nisti
c
algorith
m
.
The re
st
of
t
h
is pap
er
is orga
nized as
fo
llows
:
s
e
c
t
ion II is
the
in
tr
o
d
u
c
t
io
n of s
o
me
related
work
for
comm
unit
y
detectio
n
; t
he p
r
o
p
o
s
ed
algorith
m
b
a
s
ed
on
n
e
igh
bor simil
a
rity
is
elaborated in s
e
c
t
ion III; s
e
c
t
ion IV is the expe
riments
and
results
analys
is
; c
o
nc
lus
i
ons is
arrang
ed in section V.
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
e-ISSN:
2087
-278X
Com
m
unity Detection Algo
rithm
based o
n
Neig
hbo
r Sim
ilarity (Jia
nj
un Ch
eng
)
4485
2. Related Work
Many algorith
m
s have bee
n prop
osed in
the
past years. Among th
em, the most famous
one i
s
the
G
N
alg
o
rithm
origin
ated
by Girvan
and
Ne
wman
[11]
. It repeate
d
l
y
calculate
s
t
h
e
betwe
enn
ess for
all the
ed
ges in th
e
ne
twork, remov
e
s th
e
edge
with the
hig
h
e
st b
e
twe
enn
ess
from the net
work, until al
l the edge
s are removed.
Along with
the GN alg
o
rithm, [11] also
prop
osed a
con
c
e
p
t nam
ed a
s
“
m
odu
larity
”
Q
, to
measure the
good
ne
ss
o
f
a comm
uni
ty
stru
cture. Althoug
h the G
N
alg
o
rithm
has m
any su
ccessful a
ppl
ication
s
, but
its com
putati
o
n
deman
d is to
o high to be u
s
ed in very la
rge net
wo
rks.
To incre
a
se the computati
on efficie
n
cy,
Ne
wman
ha
s propo
se
d a
fast algo
rith
m based
on the idea of modularity o
p
timization (F
ast
Q
) [12]. In
the algorithm,
each no
de of the network is
con
s
id
ere
d
a
s
a commu
nity initially, and then the alg
o
rithm choo
ses two
com
m
unities to me
rge
into one ite
r
a
t
ively, until all the node
s
are m
e
rg
ed i
n
to the same
comm
unity. In the pro
c
e
s
s,
each
me
rg
e sho
u
ld re
sult in
the greate
s
t
in
cr
ea
se (or small
e
st d
e
crea
se
)
of modula
r
ity
Q
.
The
outputs
of bo
th the GN
an
d FAST
Q
a
r
e
dend
rog
r
am
s to de
pict th
e
com
m
unity
stru
ctures i
n
the
netwo
rks. Ea
ch level of t
he den
drogram re
pr
e
s
e
n
t
s a commu
nity structu
r
e
,
and the b
e
st
comm
unity structu
r
e
can b
e
pursue
d
by se
e
k
in
g the maximal valu
e of modula
r
ity.
LPA (La
bel
Propa
gation
Algorithm
) [1
3] is
a ne
ar l
i
near time
co
mplexity algo
rithm for
comm
unity detection p
r
op
ose
d
by Rag
havan
et al
. The main ide
a
of this algo
rithm is: if a given
node
x
ha
s
k
neighbors
k
x
x
x
,...
,
2
1
and
ea
ch neig
hbo
r of
the nod
e has
a
label
i
ndicating
th
e
comm
unity in
whi
c
h it sho
u
ld bel
ong, t
hen the
nod
e
x
update
s
its l
abel to the
o
ne mo
st of its
neigh
bors ha
ve. The pro
c
e
ss
contin
ue
s iterativel
y until every node h
a
s the lab
e
l carri
ed by most
of its n
e
igh
b
o
r
s. In
this wa
y, the label
s
are
propa
gat
ed in
the
wh
o
l
e net
work,
a
nd at th
e
end
of
prop
agatio
n, a grou
p of node
s have the same la
b
e
l
form a com
m
unity. The disa
dvantag
e
o
f
LPA is that it is sen
s
itive to
the label u
pdating o
r
de
r of the nodes,
i.e.
, it is not a determini
stic
algorith
m
. Th
at mea
n
s ru
n
n
ing th
e al
go
rithm m
any ti
mes agai
nst
a given
net
works, th
e o
u
tputs
of LPA might not be ide
n
tical. But LPA has its
nea
r linear time
compl
e
xity, its computatio
n
deman
d is fa
r lower tha
n
the GN al
go
rithm.
Just b
e
c
au
se of thi
s
, many improvements
and
variants
have
been b
r
o
ugh
t out base
d
o
n
the LPA. Among the
m
,
LPAm
[14] is a rep
r
e
s
entat
ive
that modified
the label
upd
ating rule of
LPA to
pursu
e the maxima
l modula
r
ity o
f
the com
m
un
ity
division.
3. Neighbor
Similarit
y
ba
sed Algori
t
h
m
for Comm
unit
y
Detecti
on
To d
e
tect th
e
co
mmunity
structu
r
e i
n
a
netwo
rk,
we
often can
exp
l
oit so
me i
n
fo
rmation
based on the
topology of the network. For exampl
e,
in soci
al networks, one p
eople mig
h
t know
clea
rly somet
h
ing a
bout
hi
s a
c
q
uaintan
ce
s (th
e
re a
r
e ed
ge
con
n
e
ction
s
between th
em, so
the
peopl
e an
d hi
s a
c
qu
aintan
ce
s a
r
e n
e
igh
bors e
a
ch
oth
e
r), b
u
t he
ca
nnot kno
w
th
e co
unte
r
pa
rt of
a strang
er
(the pe
ople
an
d the
stran
g
e
r
a
r
e n
o
t nei
ghbo
rs). Insp
ired
by this
p
henom
eno
n,
we
prop
osed a n
e
ighb
or si
mil
a
rity based al
gorithm
(N
SA
) to detect th
e comm
unity stru
cture fro
m
a
netwo
rk in thi
s
pap
er.
In NSA, we
only make utilization of
the relatio
n
s
hip b
e
twe
e
n
every no
d
e
and its
uncl
a
ssified
neigh
bors in
the bre
adth-fi
rst trave
r
sal
orde
r, to det
ermin
e
wh
eth
e
r the no
de
and
some of its n
e
ighb
ors sh
o
u
ld belon
g in
the sa
me co
mmunity, without con
s
id
ering influen
ce
s of
any other no
des. The ba
sic idea of NS
A is simple, for ea
ch nod
e
,
we call it a
father nod
e and
take its
un
cla
ssifie
d
nei
gh
bors a
s
its
ch
ild (r
en)
nod
e
s
. If the relati
onship b
e
twe
en the n
ode
NA
and NB i
s
father n
ode a
n
d
child n
ode,
and the
simi
l
a
rity betwe
en
NA and
NB is greate
r
tha
n
a
given thre
sho
l
d
, the child n
ode NB is in
serted into its f
a
ther no
de’
s comm
unity.
The con
c
ept
of similarity b
e
twee
n a no
de and
his fa
ther no
de is
very importa
nt to the
algorith
m
. Any form of si
milarity mea
s
ure
can b
e
e
m
ployed; ma
tched
with th
e basi
c
id
ea
o
f
NSA, in this pape
r, we onl
y utilize som
e
nume
r
ic
al value
s
asso
cia
t
ed with a no
de and its fat
her
node to
com
pute the si
mi
larity betwe
e
n
them; t
hese nume
r
ical
values a
r
e th
e deg
ree of t
h
e
node, the
de
gree
of its fat
her n
ode, a
n
d
the
num
be
r of the co
mm
on nei
ghbo
rs of the nod
e
an
d
its fathe
r
n
o
d
e
, re
sp
ectivel
y
. The p
r
op
o
s
ed
simil
a
rity
mea
s
u
r
e
bet
wee
n
two n
o
des in
a n
e
twork
is formul
ated
in the form of definition follo
wing.
Defini
tion:
(Similarity betwee
n
no
de
s) the simil
a
rit
y
betwee
n
two
con
n
e
c
te
d nod
es
i
and
j
is com
put
ed as the foll
owin
g formul
a:
Evaluation Warning : The document was created with Spire.PDF for Python.
e-ISSN: 2
087-278X
TELKOM
NIKA
Vol. 11, No
. 8, August 2013: 4484 –
4490
4486
2
)
2
)
,
(
(
)
1
(
)
1
(
)
,
(
j
i
n
k
k
k
k
j
i
similarity
j
i
j
i
Whe
r
e,
)
,
(
j
i
n
is the
numbe
r of th
e com
m
on
ne
ighbo
rs
between n
ode
i
and
node
j
;
i
k
and
j
k
are the d
egre
e
s of no
d
e
i
and n
ode
j
, res
p
ec
tively.
NSA is a two
-
stage al
gorith
m
. In the first stage, a
simil
a
rity threshol
d
is em
ployed
, and
the node set of the network is divided in
to some
grou
ps corre
s
pon
d to the mediate comm
unit
i
es
according to
the value of
; The second
stage i
s
a
n
o
p
timization
st
age, the n
o
d
e
s in
som
e
o
f
the medi
ate
com
m
unitie
s
who
s
e
no
de n
u
mbe
r
is le
ss tha
n
the give
n
thre
shol
d
are
redi
strib
u
ted into other com
m
unities. Here, the ma
jorit
y
voting strategy is employ
ed to determi
ne
whi
c
h commu
nity a node should redi
stri
buted into.
Figure 2. The
First Stage o
f
NSA
The
pseudo
-code
of the
first stage
of
NS
A is
depi
cted
as
algo
rithm I
in Fi
gure 1.
It is
an
iterative pro
c
ess, and in e
a
ch
ite
r
ation,
we sele
ct the node
v
with t
he large
s
t de
gree from th
e
uncl
a
ssified node
s set
V
, in
sert it into t
he set
U
, that mean
s a n
e
w
commu
nity has b
een
cre
a
ted; At the sa
me tim
e
, we in
sert t
he sel
e
cte
d
node to the
set
F
, that means it is no
w a
father no
de.
Takin
g
this
setting as the
tipping poi
nt, the newly created
com
m
unity begin
s
to
expand: for e
a
ch n
ode
v
in the set
F
, and each un
cla
s
sified child no
de
u
of
v
, if the similarity
betwe
en
v
and
u
is larger than the
similarity
threshol
d
, the child n
ode
u
and it
s fathe
r
n
o
d
e
v
sho
u
ld
belon
g in th
e sam
e
commu
nity, so
the
child
nod
e
u
is inserted into th
e
comm
unity
U
.
And the relati
onship b
e
twe
e
n
u
and its
chil
dren
sh
ould
b
e
co
nsi
dered,
i.e., the nod
e
u
sh
ould b
e
a father n
ode
now, so we
also in
se
rt it into the set
F
. After it is pro
c
e
s
sed, the no
d
e
v
is del
eted
from the
s
e
t
s
V
and
F
. At the end
of
ea
ch
iteratio
n, all
the
node
s i
n
the
set
U
co
mpri
se a
comm
unity; this process is repeat
e
d
unti
l
every node i
s
assig
ned to
a corre
s
po
nd
ing com
m
unit
y
.
And the set
C
co
ntains the me
diate com
m
u
n
ity structu
r
e.
A
l
gorit
hm
I
I
np
u
t:
A
n
u
ndir
e
cted a
nd u
n
w
ei
g
h
ted
gr
aph
)
,
(
E
V
G
A
similarit
y
t
h
res
hold
Ou
t
p
u
t
:
A
co
mm
unit
y
str
u
ctu
r
e o
f
Gr
aph
)
,
(
E
V
G
1.
;
Ø
C
//
C
is the se
t
of comm
unities
2.
Wh
i
l
e
(
Ø
V
)
3.
))
(
degree
(
max
arg
x
v
V
x
4.
}
{
v
U
; //
U
is the ne
w
l
y c
r
ea
ted c
o
m
m
unit
y
5.
}
{
v
F
; //
F
is the se
t of f
a
th
er
nod
es
6.
Wh
ile
(
Ø
F
)
7.
Fo
r
ea
ch
v
in F
do
8.
For
eac
h
u
in
)
(
edchildren
unclassifi
v
do
9.
If
)
,
(
u
v
simlarity
do
10.
};
{
u
U
U
};
{
u
F
F
};
/{
u
V
V
1
1
.
End i
f
12.
E
nd f
o
r
13.
};
/{
v
F
F
};
/{
v
V
V
14.
End
fo
r
15.
E
nd w
h
ile
16.
};
{
U
C
C
17.
E
n
d w
h
ile
18. Outp
ut
C
;
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
e-ISSN:
2087
-278X
Com
m
unity Detection Algo
rithm
based o
n
Neig
hbo
r Sim
ilarity (Jia
nj
un Ch
eng
)
4487
After the pro
c
e
ss of the fi
rst sta
ge, the
mediate co
mmunity stru
cture
ca
n be
obtaine
d.
Ho
wever, i
n
the expe
rime
nts, we have
found th
at some of th
e
mediate
com
m
unities are
too
small, so tha
t
every one of them can not be
held as a se
parate commu
nity, so the medi
ate
comm
unity st
ructu
r
e
ne
ed
to b
e
o
p
timi
zed;
and
the
optimi
z
ation
process i
s
carri
ed
out i
n
the
se
con
d
st
ag
e
.
Comp
ared
wi
th the first
sta
ge, the
se
con
d
stag
e is si
mple a
nd intu
itive. For ea
ch of the
mediate
com
m
unities
acq
u
ired f
r
om th
e first sta
ge,
if the numbe
r of node
s in t
he communit
y
is
less than o
r
e
qual to the gi
ven thre
shol
d
, then every n
ode
t
in the co
mmunity is re
assign
ed to
the comm
uni
ty which con
t
ains mo
st n
e
ighb
ors of the nod
e
t
. The pse
udo
-co
de is liste
d as
algorith
m
II in Figure 2, an
d it is almost
self-expl
anat
ory.
Figure 2. The
Second Stag
e of NSA
A simpl
e
an
a
l
ysis
ca
n reveal the
co
mp
lexity
of NSA. In the first
stage,
every
node
is
pro
c
e
s
sed a
s
a father nod
e once, so the time
compl
e
xity of
the algorithm
see
m
s to be
)
(
n
O
,
whe
r
e,
n
is the
numb
e
r
of n
ode
s in th
e n
e
twork. But i
n
ea
ch ite
r
ati
on, we ne
ed
to sel
e
ct the
node
v
with the
large
s
t d
egre
e
(lin
e 3 in t
h
e algo
ri
thm I), the co
st of t
h
is o
p
e
r
ation
self is
)
(
n
O
,
so the time
compl
e
xity of the first sta
ge is
)
(
kn
O
, where,
k
is the numb
e
r of co
mmu
nities
extracted fro
m
the netwo
rk. To t
he se
cond sta
ge, it is obviou
s
ly
that its time complexity is
)
(
n
O
.
So, the comp
utation com
p
l
e
xity of NSA
is
)
(
kn
O
. In
reality,
the numbe
r o
f
communitie
s
is far
less tha
n
th
e nu
mbe
r
of
nod
es in
th
e net
wo
rk, i.
e.,
n
k
, so
NSA
has ne
ar lin
ear time
compl
e
x
i
t
y
.
4. Experiments
4.1. Thresho
l
ds and Mod
u
larit
y
In the first st
age of NSA,
the algo
rith
m
need
s to use the simila
rit
y
thresh
old
, and
works a
s
a
para
m
eter. S
o
we n
eed t
o
determine
the value of
. It is
obvious
ly,
]
1
,
0
[
.
Different valu
es of
indicat
e
different co
mmunity stru
cture
s
, that is, corre
s
p
o
n
d
to different
values
of mo
dularity. We
choo
se the val
ue of
which in
dicate
s the m
a
ximal value
of modula
r
ity
as the optima
l
value of
.
A
l
gorit
hm
II
I
n
put
:
Media
t
e c
o
mmunit
y
s
t
r
u
ctu
r
e
C
from the first
st
age;
O
u
tp
ut
:
T
he fin
a
l communit
y
s
t
ruc
t
ure
;
1.
count
=1;
2.
Wh
i
l
e
count
<
3.
Fo
r
each
c
o
mmunit
y
C
C
i
4.
If
count
C
i
do
5.
For
eac
h nod
e
t
in
i
C
do
6.
assig
n
t
to t
he co
mmu
nit
y
which h
a
s mo
r
e
tha
n
count
nodes
and
cont
a
i
ns most n
e
ighb
o
r
s of
t
7.
En
d fo
r
8.
End
i
f
9.
delete
i
C
fro
m
C
;
10.
E
nd f
o
r
11.
count
++;
12.
E
n
d w
h
ile
13.
Ou
tput
C
;
Evaluation Warning : The document was created with Spire.PDF for Python.
e-ISSN: 2
087-278X
TELKOM
NIKA
Vol. 11, No
. 8, August 2013: 4484 –
4490
4488
Figure 3. Curves of Thre
sh
old
and Modul
arity
In the expe
ri
ments, the v
a
lue of
is a
s
sign
ed to
be
0 at the
be
ginnin
g
, then
it is
increa
sed
by 0.01 at ea
ch
time until it equal
s to
1; And we
execute the first
sta
ge 100 time
s on
each netwo
rk. The curve
s
in Figure
3 ill
ustrate the re
lationship bet
wee
n
and the modula
r
ity o
f
comm
unity st
ructu
r
e
for th
e thre
e n
e
two
r
ks afte
r r
unn
ing the
first
st
age, respe
c
tively. It is cle
a
r
ly
that the opti
m
al value of
for Za
cha
r
y'
s Karate Clu
b
network is 0.3, for American
Colle
g
e
Football
network i
s
0.3
6
, a
nd fo
r Santa
Fe In
stit
ute collabo
ration
n
e
twork is 0.2
7
, re
sp
ectivel
y
.
From
ou
r
exp
e
rime
nts, it
seems the
opti
m
al valu
e of
sho
u
ld
be i
n
t
he
ran
ge
of [
0
.25, 0.38],
b
u
t
this ran
ge is
only an empi
rical interval,
maybe nee
d to be verified furthe
r in the future.
Having d
e
termined the va
lue of
, we need to dete
r
mine the val
ue of
used i
n
the
se
con
d
stage
of NSA. Just like the me
thod of determining the o
p
timal value of
in the firs
t
stage,
we
al
so take the
va
lue of
whi
c
h
result
s in
the
maximum
of
modula
r
ity a
s
the o
p
timal
value of thre
shold
. It is easy to see that the value of
sh
ould be
an int
eger,
so
we a
ssi
gn 1 to
at the begi
nn
ing, and i
n
crease it by 1
each time,
u
n
til the value
of modul
arit
y rea
c
he
s to
0
.
Gene
rally, th
e value
of m
odula
r
ity will
increa
se
alon
g with th
e in
cre
a
se of
at the be
ginni
ng,
and then it
wi
ll decrea
s
e af
ter it re
a
c
he
s
its maxima. And the value
of
at the pea
k
point is
what
we n
eed, the
relation
shi
p
betwe
en
and
modula
r
ity is illustrate
d in
Fig.4. We
can see
clea
rly
that the maxi
ma of mo
dula
r
ity in Figu
re
4 are g
r
eate
r
than the
co
un
terpa
r
ts in
Fig
u
re
3 on
all th
e
three net
wo
rks. So, after the optimizatio
n of se
cond
stage, the qual
ity of
community structu
r
e
is
improve
d
.
Figure 4. Sca
tters of Threshold
and Mod
u
larity
4.2. Aanly
s
is
of Experime
ntal Re
sults
We have test
ed NSA on three real net
works;
they are Za
cha
r
y's
Karate Clu
b
netwo
rk,
American
Col
l
ege F
ootball
netwo
rk an
d
Santa Fe In
stitute coll
ab
oration
netwo
rk, respe
c
tively.
The tru
e
com
m
unity stru
ct
ure
s
of the
s
e
three
n
e
two
r
ks
have b
een
kno
w
n
a pri
o
ri, and they a
r
e
illustrate
d in the Figu
re 5, Fi
gure 6 and
Figure 7, respectively.
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
e-ISSN:
2087
-278X
Com
m
unity Detection Algo
rithm
based o
n
Neig
hbo
r Sim
ilarity (Jia
nj
un Ch
eng
)
4489
The commu
n
i
ty structu
r
e i
dentified by
NSA
on Za
chary'
s Karate
Club
netwo
rk is th
e
same
as the
true com
m
u
n
ity structu
r
e
that s
howed
in Figure 5,
and the mo
dularity of o
u
r
comm
unity structu
r
e i
s
gre
a
ter than the
other alg
o
rith
ms.
Figure 5. Tru
e
Comm
unity Structure of
Zach
ary
’
s K
a
rat
e
Clu
b
Net
w
or
k
Figure 6. Tru
e
Comm
unity
Structure of American
Colle
ge Foot
ball Net
w
ork
To the Am
eri
c
an
College
Football
network,
only 4
n
ode
s (‘
59’, ‘6
0’, ‘64’, ‘98’, i
n
Figu
re
6) are mi
scla
ssifie
d
into wrong co
mmuni
ties. In
Figure
6, these 4 nodes
shoul
d b
e
the membe
r
s
of the small
comm
unity which
con
s
i
s
ts of only
these 4 nod
es, o
r
iginally; but the links b
e
tween
these
4 n
o
d
e
s in
this small commu
nity are
not
more
freq
ue
nt than lin
ks out of the
small
comm
unity, so the small
comm
unity is broken do
wn by other la
rge
r
com
m
un
ities. After the
pro
c
e
s
s of
NSA, the nod
e
‘60’
and
no
d
e
‘64’
ar
e m
e
rge
d
into
th
e commu
nity at the to
p
ri
gh
t
marked a
s
o
c
tagon, nod
e ‘
98’ wa
s me
rg
ed into the rh
ombic
com
m
unity at the bottom right, a
n
d
node ‘59’ was merged into the circ
ul
ar
community at the bottom left.
Figure 7. Tru
e
Comm
unity Structure of
Santa Fe Inst
itute Collab
o
ration Net
w
ork
For the
ca
se
of Santa Fe Institute colla
bor
atio
n net
work, o
n
ly one
node i
s
a
ssi
g
ned into
wro
ng
co
mm
unity; this n
o
de i
s
ma
rked
as ‘10
6
’ in
Figure 7, it
should
be
a
membe
r
of t
he
pentag
on
co
mmunity. Since the
num
b
e
rs of ed
ge
s con
n
e
c
ted t
o
the n
ode
'
106' from th
e
rhom
bic
com
m
unity and from the penta
gon commu
ni
ty
are the sa
me, it is hard
to classify the
node, an
d the
n
it is miscla
s
sified by NSA
into the rhom
bic commu
nity incorre
c
tly.
To validate o
u
r propo
se
d algorith
m
, we
have also compa
r
ed the
perfo
rman
ce
with the
cla
ssi
c algo
rit
h
m FA
S
T
Q
and LPAm. The compari
s
on result
s
are illustrated in the Table I. It i
s
obviou
s
ly tha
t
both th
e m
odula
r
ity and
accu
ra
cy of
ou
r al
gorith
m
is si
gnifica
ntly larg
er th
an
those of othe
r algorithm
s.
Evaluation Warning : The document was created with Spire.PDF for Python.
e-ISSN: 2
087-278X
TELKOM
NIKA
Vol. 11, No
. 8, August 2013: 4484 –
4490
4490
Table 1. Co
m
p
rison
s
of Experim
ent Re
sults for Different Algorithm
s
NSA
FAST
Q
LPAm
Karate Club
=0.3
Q=0.731
A
=
0.735
Q=0.363
A=0.706
Q=0.360
A=0.971
=[1, 3]
Q=0.415
A=0.735
=[5, 15]
Q=0.372
A=
1
College Football
=[0.36, 0.37]
Q=0.579
A=0.852
=0.41
Q=0.558
A
=
0
.
93
9
Q=0.562
A=0.383
Q=0.578
A=0.800
=1
Q=0.593
A
=
0
.
87
0
=[2, 7]
Q=0.600
A=0.852
=[1,5]
Q=0.581
A=0.965
Collaboration
=0.27
Q=0.7
06
A
=
0.7
5
4
Q=0.72
2
A=0.83
9
Q=0.656
A=0.603
=5
Q=0.747
A=0.941
=[6, 10]
Q=0.738
A
=
0
.
99
2
Notes: Q
represe
n
ts modularit
y
of
communit
y
struct
ure;
A rep
r
esents
the accurac
y
of
communit
y
struct
ure.
= [1,3] is
equal to
= {1,2,3}. The values in boldface is the best results
5. Conclusio
n
In this
pap
er, we
pro
p
o
s
e
d
a
com
m
uni
ty detection
algorith
m
na
med
NSA ba
sed
on
neigh
bor
simi
larity. Compa
r
ed
with the
FASTQ algo
ri
thm, NSA ha
s nea
r line
a
r
time compl
e
xity;
comp
ared wit
h
LPAm, NSA is a determi
nistic al
gorith
m
.
In ou
r al
gorit
hm, the
use o
f
paramete
r
is significant. Obviously
, the
optimal valu
e
of
is depe
nde
nt on the com
putation met
hod of simila
rity measu
r
e.
Matched
with the similari
ty
comp
utation
method, we have dr
awn from the exp
e
rime
nts an
empiri
cal
ran
ge, in whi
c
h
the
para
m
eter
sh
ould bel
ong i
n
; maybe, further verification and refine
ment of the range
can b
e
done in the fu
ture.
Referen
ces
[1]
SH Strogatz. Exp
l
ori
ng com
p
l
e
x
net
w
o
rks.
Na
tu
re
. 200
1; 26
8-27
6.
[2]
J Park, MEJ N
e
w
m
an. Statistical mechanics
of net
w
o
rks.
Phys. Rev. E.
2004; 70: 1-1
3
.
[3]
SN Doro
govtse
v
, JF
F Mendes
. Evolution of n
e
t
w
o
r
ks.
Advances in Phys
ic
s
. 2002; 51: 10
79-1
187.
[4]
MEJ Ne
w
m
a
n
. T
he structure and functio
n
of compl
e
x net
w
o
r
ks.
SIAM Review.
2003; 45: 167-
256.
[5]
Andre
a
Lanc
ic
hin
e
tti,
Santo
F
o
rtunato.
C
o
mmunit
y
detec
tion
alg
o
rithms
: A comp
arati
v
e a
nal
ys
is.
Phys. Rev. E
. 200
9;
80.
[6]
S F
o
rtunato. Communit
y
d
e
te
ction in gr
ap
hs.
Phys. Rep
. 20
10; 486(
3): 75-
174.
[7]
RD Al
ba. A
gra
ph-the
o
retic
de
finitio
n
of
a s
o
c
i
ometric c
liq
ue.
Mathe
m
atical
Socio
l
ogy
. 19
7
3
;
3(1): 11
3-
126.
[8]
MEJ Ne
w
m
an. Detecting com
m
unit
y
struct
ur
e in net
w
o
rks.
Eur. Phys. J. B
. 2004; 38: 32
1
-
330.
[9]
Ming
w
e
i
Le
ng,
Jin
jin
W
a
n
g
,
Pengfe
i
W
a
ng,
Xia
o
y
un
C
h
e
n
. Hi
erarch
ical
Agg
l
omer
atio
n C
o
mmun
i
t
y
Detectio
n Alg
o
r
ithm via Com
m
unit
y
S
i
mil
a
rit
y
Meas
ures.
TEL
K
OMNIKA
. 201
2; 10(6): 15
10-1
518.
[10]
Xi
n
Xi
a, Sh
u-
xin Z
h
u. A Surv
e
y
o
n
W
e
i
ghte
d
Net
w
o
r
k Me
asurem
ent a
n
d
Mod
e
li
ng.
TE
LKOMNIKA.
201
3; 11(1): 18
1-18
6.
[11]
MEJ Ne
w
m
an,
M Girvan. F
i
ndi
ng
an
d ev
a
l
uati
ng c
o
mmu
nit
y
structure
i
n
n
e
t
w
orks.
P
h
ys. Rev. E
.
200
4; 69.
[12]
MEJ Ne
w
m
a
n
. F
a
st algorithm
for det
ecting c
o
mmunit
y
struc
t
ure in net
w
o
rk
s.
Phys. Rev.
E.
2004; 69.
[13]
UN Ra
ghav
an,
R Albert, S Kumara. Near l
i
n
ear time
al
gor
ithm to detect communit
y
struc
t
ures in lar
g
e
-
scale net
w
orks
.
Phys. Rev. E.
2007; 7
6
.
[14]
MJ Barber, JW Clark. Detecti
ng net
w
o
rk comm
unities b
y
prop
agati
ng l
a
bels u
nder co
n
s
traints.
Phys.
Rev. E.
2009; 80.
[15]
X
Li
u, T
Murata. Adva
nce
d
mo
dul
arit
y-s
pec
i
a
liz
ed
la
b
e
l
prop
ag
atio
n a
l
gor
ithm f
o
r d
e
tectin
g
communiti
es in
net
w
o
rks.
Phy
s
ical A
. 201
0; 149
3-15
00.
[16]
S Pan
g
, C C
h
en, T
W
e
i.
A realti
me co
mmunity d
e
tectio
n
alg
o
rith
m: Inc
r
ementa
l
l
abe
l
prop
ag
ation
.
Procee
din
g
of ICF
I
N conferen
ce. Beiji
ng. 20
09; 313-
31
7.
Evaluation Warning : The document was created with Spire.PDF for Python.