TELKOM
NIKA
, Vol.12, No
.4, Dece
mbe
r
2014, pp. 10
05~101
6
ISSN: 1693-6
930,
accredited
A
by DIKTI, De
cree No: 58/DIK
T
I/Kep/2013
DOI
:
10.12928/TELKOMNIKA.v12i4.438
1005
Re
cei
v
ed Au
gust 27, 20
14
; Revi
sed O
c
t
ober 2
6
, 201
4; Acce
pted
No
vem
ber 1
6
,
2014
A New Algorithm for Detecting Local Community Based
on Random Walk
Yueping Li
1,2
, Weikun Zh
eng
3
1
School of Co
mputer Eng
i
n
e
e
rin
g
, Shenz
he
n
Pol
y
tech
nic, Shenz
he
n 518
055, P.R. Chin
a
2
Shenzh
en gra
duate sch
oo
l, Harbi
n
Institute
of
T
e
chnolo
g
y
, Shenzhe
n 51
805
5, P.R. China
3
Information C
enter, She
n
zhe
n
Institute of Info
rmation T
e
chnol
og
y, She
n
z
hen 5
1
8
172, P.
R. Chin
a
e
-ma
i
l
:
l
eey
ue
pi
n
g
@
g
m
ai
l
.
com
1
, zheng
w
k
@
sziit.edu.cn
3
A
b
st
r
a
ct
T
h
is pa
per
pre
s
ents on
e n
e
w
alg
o
rith
m for
l
o
cal c
o
mmun
ity discov
e
ry. It empl
oys a
ne
w
vertex
selecti
on strate
gy w
h
ich c
ons
i
ders n
o
t o
n
ly t
he
bou
nd
ary structure
of can
d
id
ate
local comm
unity but als
o
the prob
ab
ility
w
h
ich the in
v
e
stigated vert
ex
w
ill return to the can
d
i
date l
o
cal co
mmunit
y
. A local ran
d
o
m
w
a
lk is a
d
o
p
te
d to co
mpute
this retur
n
pr
o
bab
ility w
h
ic
h
does
not r
e
q
u
i
r
e the
gl
oba
l i
n
formatio
n
. W
e
choos
e four
al
gorith
m
s for c
o
mp
ariso
n
w
h
ic
h are th
e
best
ones
existed
b
y
far. F
o
r better eval
uatio
n, th
e
datasets
incl
ud
e n
o
t o
n
ly th
e
computer
ge
ne
rated
grap
hs
i
n
stan
dar
d b
e
n
ch
mark
but
al
so the
rea
l
-w
orl
d
netw
o
rks w
h
ic
h ar
e cl
assic
a
l
on
es i
n
glo
b
a
l c
o
mmun
ity
discov
e
ry. T
h
e
exp
e
ri
me
ntal
results s
how
o
u
r
alg
o
rith
m out
p
e
rforms th
e other on
es on t
he co
mp
uter gen
erate
d
gra
phs.
T
he perf
o
rmanc
e of our
alg
o
rith
m is
ap
proxi
m
at
ely th
e sa
me
w
i
th th
e al
gorit
hm pr
opos
ed
by L
u
o
,
W
ang a
nd Pr
omislow
o
n
re
al-
w
o
rld netw
o
rks.
Ke
y
w
ords
:
loc
a
l co
mmu
n
ity, community d
i
s
c
overy, rand
o
m
w
a
lk
1. Introduc
tion
Extracting community structures
in comp
lex n
e
tworks
ha
s g
a
i
ned m
u
ch a
ttention
r
e
ce
n
t
ly. G
ene
r
a
lly, ne
tw
ork
s
c
a
n b
e
mod
e
lle
d as
gr
ap
h
()
GV
E
, where
V
is a set of vertice
s
rep
r
e
s
entin
g individual
s an
d
E
is a set of edge
s that sh
ow the intera
ction bet
wee
n
the vertice
s
.
There is no
universally a
c
cepted
defi
n
ition of
com
m
unity. Conv
entionally, a
comm
unity o
f
a
netwo
rk is a
gro
up
of ve
rtice
s
that
are de
nsely co
nne
cted
amo
ngst th
em
sel
v
es
while
bei
ng
spa
r
sely con
necte
d to th
e
vertice
s
out
side
t
he
gro
u
p
. Usually, th
e vertices of
one
com
m
uni
ty
exhibit certai
n comm
on ch
ara
c
teri
stics.
Many algo
rit
h
ms
have b
e
en p
r
opo
se
d
to disc
ove
r
comm
unity st
ructu
r
e i
n
re
al wo
rld
netwo
rks. Ho
wever, the
s
e
algorith
m
s a
r
e sup
p
o
s
ed
t
o
find the enti
r
e commu
nity stru
cture
of the
grap
h. Thi
s
constraint m
a
ke
s the
s
e
al
gorithm
s
ca
n
not ha
ndle
th
e dynami
c
n
e
tworks a
nd
the
large
scale n
e
tworks. Unfo
rtunately,
n
e
tworks
i
n
r
eal
worl
d u
s
u
a
lly
are
larger tha
n
the
scale
can
be settled by
the fastest al
gorithm
s [1].
Re
cent works focus o
n
find
ing local com
m
uni
ty stru
cture, whi
c
h d
e
t
ects the com
m
unity
given a
start
vertex. This t
a
sk h
a
s ma
n
y
scena
rio
s
i
n
daily
appli
c
ations.
For ex
ample, th
e p
o
lice
might like to
quantify the
local
com
m
unities
of a
su
spe
c
t give
n his
so
cial
netwo
rk. Sev
e
ral
method
s hav
e been propo
sed to extract local comm
unity. Howev
e
r, they suffer in one or m
o
re
ways
. For instanc
e, proposed algorithms in [2],[3
] are des
igned to proces
s
the graphs
whic
h has
a minim
a
l
co
nne
cted i
n
itia
l topolo
g
y. T
he meth
od
s
[4]-[6] in req
u
ire
so
me
d
egre
e
of
glo
bal
informatio
n o
beys
asce
rtai
n pa
rtit
ions.
T
he al
gorith
m
i
n
[7] is p
r
e
s
e
n
ted fo
r dyn
a
m
ic
netwo
rks.
In
addition, its time com
p
lexity is too high
which is
4
()
OV
larger than m
a
ny global co
mmunity
discovery alg
o
rithm
s
.
Due to ab
ove limitations,
these metho
d
s
are not widely used. Currently, the popul
ar
methods for f
i
nding local
community
are [8]-[10] whi
c
h
will be di
sc
ussed further in Section
3.
Local co
mmu
nity can be fo
rmally define
d
as follo
ws: Given an u
n
d
i
recte
d
graph
()
GV
E
and
a
s
t
ar
t ver
t
ex
s
V
. In the
a
b
se
nce of
th
e glo
bal
kn
o
w
led
ge, a
subgraph
()
s
ss
GV
E
is
extracting fro
m
G
containi
ng
the start vert
ex
s
, where
s
is densely con
nect with the
vertice
s
of
s
V
than the vertice
s
of
s
V\
V
.
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 16
93-6
930
TELKOM
NIKA
Vol. 12, No. 4, Dece
mb
er 201
4: 100
5 – 1016
1006
The algo
rith
ms of [8]-[10] have the
followi
n
g
pro
b
l
ems: Cla
u
se
t’s algorithm
[8] give
hiera
r
chi
c
al community wh
ile
not output
a ce
rt
ain l
o
cal commu
nity. The
algo
rith
m propo
se
d
by
[9] and [10] perfo
rm well on the g
r
aph
s which
co
nt
ains signifi
ca
nt
local com
m
unities but wo
rk
poor o
n
the
grap
hs
witho
u
t strong
co
mmunity st
ru
cture.
In a
ddi
tion, the
co
rrectne
s
s of
th
ese
algorith
m
s
drop d
r
amati
c
a
lly when
pro
c
e
ss th
e
vertices th
at lie
on the
bou
ndary
of local
comm
unity. The
rea
s
o
n
i
s
that th
ey a
r
e d
e
si
gne
d
mainly on
th
e greedy
of l
o
cal
commu
nity
measure. Th
e merging
an
d rem
o
val of
vertex in
the
temporary lo
cal co
mmunity
just inve
stig
ate
the boun
da
ry. Therefo
r
e,
these m
e
tho
d
s
could
not
explain why the output lo
cal com
m
unity is
relevant with
the start vert
ex. The limitation w
ill affect the further appli
c
ation of
the found local
comm
unity.
In
the
foll
owi
ng section, we will pr
esent one measure to t
he relevance of
l
o
cal
comm
unity and the sta
r
t vertex.
This pap
er p
r
opo
se
s
one
ne
w al
gorit
hm for extra
c
ting l
o
cal
community. T
he vertex
sele
ction of
our alg
o
rithm
con
s
ide
r
s o
n
ly the
boun
dary structu
r
e affected by
the inse
rtion
of
vertex but also the probabil
i
ty
which the vertex will return to
the candidate local
comm
unity. This
return proba
bility is com
puted by a local
rand
om
walk
whi
c
h
does n
o
t d
e
mand the
g
l
obal
stru
cture of the gra
ph. We
compa
r
e ou
r algorithm
wit
h
four algo
rithms which are the best on
es
kno
w
n
by fa
r. The
data
s
ets in
clu
d
e
s
not o
n
ly the compute
r
gene
rated
o
nes in
stand
ard
ben
chma
rk b
u
t also the
on
es
mod
e
lled
by re
al-worl
d
netwo
rks whi
c
h
are
cla
ssi
cal on
es in
glo
bal
comm
unity di
scovery. The
latter one
s
repre
s
e
n
ts
dif
f
erent type o
f
commu
nity stru
cture whi
c
h
help
s
to eval
uate the al
go
rithms. T
he e
x
perime
n
tal result
s sho
w
our al
go
rithm
outperfo
rm
s
the
other o
n
e
s
o
n
the gra
p
h
s
in the stand
ard b
e
n
c
hma
r
k, an
d perf
o
rms al
mo
st the sa
me with
the
algorith
m
pro
posed by Luo
, Wang an
d Promi
s
lo
w [9] on real
-worl
d
datasets.
The rest of th
is pa
per is
organi
zed
as fo
llo
ws: Section
2 presents th
e relate
d works. T
h
e
prop
osed al
g
o
rithm is
give
n in Section
3. The eval
u
a
tion of the a
l
gorithm o
n
a
r
tificial an
d re
al-
worl
d datasets is illustrated in Section 4. Se
ction 5 discusses some furtherer improvem
ent.
Finally, Section 6 co
ncl
u
d
e
s the pa
per.
2. Related Works
This se
ction descri
b
e
s
three
st
ate
-
of
-th
e
-a
rt algo
rith
ms fo
r dete
c
ti
ng lo
cal
com
m
unity. In
addition, we provide fo
rma
l
definitions of
related me
asure
s
.
Firstly, Clau
set gave the definition of loca
l modu
alarity [8] as the portion
of the
con
n
e
c
ting e
dge
s in the b
ound
ary ed
g
e
s, wher
e co
nne
cting e
d
g
e
den
otes th
e edg
e co
nn
ects
the vertex in the local com
m
unity to
the vertex outsid
e
the comm
u
n
ity.
Let
C
denote
s
the lo
cal
co
mmunity an
d
B
be
the ve
rtice
s
comp
ri
se
the b
ound
ary in
whi
c
h ea
ch
vertex has a
t
least one
neigh
bor n
o
t in
C
. The bo
unda
ry-a
djacency matrix i
s
defined by Cl
auset [8] as
1
i
f
v
ert
i
ces
an
d
a
re
co
nn
ect
ed
and
e
i
t
h
er
v
e
rt
ex
i
s
i
n
0o
t
h
e
r
w
i
s
e
ij
ij
BB
(1)
Based o
n
this, Clauset pro
posed the me
asu
r
e "lo
c
al
modual
arity" to be
()
ij
ij
ij
ij
B
ij
R
B
(2)
whe
r
e
()
ij
is 1
when eith
er
i
vB
a
nd
j
vC
or vice ve
rsa,
and i
s
0
otherwise. By mean
s of
this mea
s
u
r
e,
one co
mmun
i
ty is expecte
d to have
few conn
ectio
n
s
from its bou
n
dary to the p
a
rt
outsid
e
the community, while having a
greate
r
prop
ortion of con
nectio
n
s fro
m
the boun
d
a
ry
back into the
comm
unity.
Clau
set g
a
ve
a greedy m
e
thod o
n
the
lo
cal
co
m
m
unit
y
measure "lo
c
al m
odu
alari
t
y". But
the output i
s
a hie
r
archi
c
al
comm
unity containin
g
giv
en vertex
wh
i
c
h d
o
e
s
not
show th
e regio
n
of
local comm
unity.
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
1693-6
930
A New Algo
rithm
for Detect
ing Lo
cal Co
mm
unity Based on Rand
o
m
Walk (Yue
ping Li)
1007
Luo, Wa
ng a
nd Promi
s
lo
w [9] prop
osed seve
ral al
gorithm
s ba
sed on the framewo
r
k
maintainin
g t
w
o ve
rtex qu
eue
s: addi
ng
queu
e an
d d
e
leting q
ueu
e
.
The me
rgi
n
g of the ve
rte
x
in
addin
g
que
u
e
and the
re
moval of the vertex in de
leting que
ue
will both in
crea
se the lo
cal
comm
unity measure. Th
e algorith
m
rep
eats
comp
uting the these
two queu
es
and pe
rformin
g
addition an
d removal op
erations u
n
til these two
que
ues a
r
e empt
y; that is, ad
ding any vert
ex
into
C
no
r re
m
o
ving any ve
rtex in
C
will im
prove the measure. At
that
time, the community
C
will be output. The employed measure
M
is defined a
s
follows:
()
[]
[
]
ij
ij
ij
ij
Bi
j
M
B
iC
jC
(3)
whe
r
e
[]
iC
equal
s 1 wh
en
iC
, otherwise 0. Th
e definition of
[]
j
B
is similar.
Luo, Wang
a
nd Pro
m
isl
o
w provid
ed th
ree vers
io
ns
of pro
p
o
s
ed
algorith
m
[11
]: greedy
addition, ad
d
-
all ad
dition
and K-li
ke m
o
ve. Stat
ed by [11], the algorithm
usin
g
add-all addit
i
on
perfo
rms be
st among
the
s
e thre
e ve
rsi
ons.
We
impl
ement the
ve
rsio
ns of the
add-all a
dditi
on
and greedy a
ddition, den
oted by
all
LWP
and
g
reed
y
LW
P
, in our expe
rim
ents.
Re
cently, Ba
gro
w
[10] e
m
ploy
ed the
greedy me
asure "outward
ne
ss"
to sel
e
ct vertice
s
mergi
ng into
local
comm
u
n
ity. The outwardne
ss of
vertex
v
with res
p
ec
t to
c
o
mmunity
C
is
defined a
s
:
()
1
(
)
([
]
[
])
()
v
iv
Ci
C
i
C
v
(4)
whe
r
e
()
v
are the neigh
bors o
f
v
.
Bagro
w
[1
0]
defined
a
p
-st
r
ong
commu
nity
as stop
p
i
ng crite
r
ia. A
comm
unity
C
is
called
p
-st
r
on
g
if a fraction
p
of vertices in
C
satisfy that they have mo
re neig
hbo
rs insid
e
C
than outsi
de. Bagro
w
[10] st
ated multiple
values of
p
ca
n be used si
multaneo
usly
.
From the
ab
ove introd
uct
i
on, it con
c
lu
des
th
at the
s
e three al
g
o
rithm
s
ad
op
t greedy
strategy on resp
ective
m
e
asu
r
e
s
.
Unfo
rtunatel
y, the
s
e m
e
a
s
ures are
define
d
mainly on t
h
e
boun
dary ed
ges. They do
not con
s
id
er
the start
ve
rt
ex dire
ctly. F
o
r
example,
LWP’s alg
o
rit
h
m
has to d
e
termine whethe
r the start vert
ex lies in
the
resulting lo
ca
l comm
unity sin
c
e it co
nta
i
ns
remove
d op
e
r
ation. In ad
d
i
tion, this sho
r
tcomi
ng
will
be mo
re p
r
o
m
inent when
the sta
r
t vertex
lies o
n
the
boun
dary
of local
comm
unity, which
will b
e
di
scussed fu
rthe
r in the
section
pre
s
entin
g the experim
ent
al results.
3. Proposed
method
: Loc
al Community
D
i
sco
v
e
r
y
Algorithm u
s
ing Local Random Walk
Different
from
cu
rrent m
e
th
ods,
ou
r al
go
rithm d
edi
cat
e
s to
eval
uat
e the
rel
a
tion
of lo
cal
comm
unity a
nd the
sta
r
t vertex in
stead
of inve
stigati
ng the
bou
nd
ary of lo
cal
community o
n
l
y
.
To achieve t
h
is, we empl
oy local
random wal
k
st
rat
egy to compute the
visit
probability of the
vertices
whi
c
h will
be used for the vertex
selecti
on.
Next, we introduce th
e random wal
k
strategy
and ou
r local rand
om meth
od.
There are variou
s
rand
o
m
wal
k
st
rat
egie
s
on g
r
a
phs
su
ch a
s
Markov cha
i
ns [12],
quantum
ran
dom walk [13
]
and ra
ndom
walk
ba
sed
on vertex de
gree
distri
buti
on [7], etc. Thi
s
pape
r d
e
velo
ps
one
ne
w l
o
cal
ra
ndom
wal
k
b
a
sed o
n
Ma
rov
chai
ns.
Ne
ce
ssary definition
s
are
give, at firs
t.
Let
()
GV
E
be a
co
n
necte
d g
r
ap
h
with
n
ve
r
t
ices
an
d
m
edg
es.
Co
nsi
der a
random
wal
k
on
G
: s
t
art with vertex
0
v
; at the
t
-th step, we assume the probability that
move to a
neigh
bor of
t
v
is
1(
)
t
dv
, where
()
t
dv
is the d
e
g
r
ee of
t
v
(equi
valently, the numbe
r of
neigh
bors of
t
v
). The
n
, the
seq
uen
ce
of these rand
om
node
s
(0
1
)
t
vt
is a Markov
chain
.
We denote the matrix of trasition pr
obabilities of this Markov chai
n by
()
ij
M
p
where
ij
V
.
We have
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 16
93-6
930
TELKOM
NIKA
Vol. 12, No. 4, Dece
mb
er 201
4: 100
5 – 1016
1008
1(
)
i
f
(
)
0o
t
h
e
r
w
i
s
e
ij
di
i
j
E
p
(5)
We
denote t
he probability that vertex
u
is visite
d
at step
t
-th by
0
()
v
t
u
. T
he Ve
ctor
0
v
t
store
s
the probabilitie
s of all vertice
s
in grap
h
G
. Then,
0
0
1
v
v
T
t
t
M
(6)
This pa
pe
r a
dopts the version of ra
ndo
m walk
with restart, be
cau
s
e we wi
sh to revea
l
the relation of
the start vertex and other
vertice
s
. The
formula i
s
0
0
0
1
(1
)
v
v
T
t
t
M
ce
(7)
whe
r
e
0
e
is one vector of which the valu
e of
0
v
’s positi
on is 1 an
d other value
s
are 0. The
n
,
0
ce
in Formula
(7) indi
cates that
random walk has probability
c
to res
t
art with
0
v
at each step.
It is n
e
cessa
r
y to m
ention
that the
co
mputat
ion
of
Formul
as (6) and
(7) re
q
u
ire
s
th
e
global info
rm
ation of gra
ph. The
r
efore, we
coul
d
not employ
dire
ctly. We pro
p
o
s
e o
ne
approximate comp
utation
of rando
m walk with re
sta
r
t when
sea
r
chin
g the can
d
idate su
bg
ra
ph.
In addition, we use adja
c
ency list to store the tran
sition proba
bi
lity
ij
p
instead
of matrix
M
.
Then, the pro
bability
0
1
v
t
can b
e
comp
uted b
y
the followin
g
formula:
0
0
0
0
()
1
0
()
()
(1
)
()
()
()
(1
)
()
v
t
vu
v
t
v
t
vu
u
uv
dv
u
u
cu
v
dv
(8)
whe
r
e
()
u
den
otes th
e nei
ghb
or
set of ve
rtex
u
. We
ca
n
search th
e a
d
jace
ncy li
st of
vertex
u
to obtain
()
u
. Then, this comp
utation will n
o
t deman
d gl
obal adj
acent
information;
that is, can
be cal
c
ul
ated
locally.
Furthe
rmo
r
e,
we restri
ct th
e pro
bability co
mp
utation within
the se
arching su
bgraph
an
d
the outsid
e
b
ound
ary. Formally, denote
curre
n
t sub
g
r
aph
by
s
ub
G
and
outsid
e
bou
n
dary by
OB
.
Then,
OB
can be
formulated b
y
{(
)
}
s
ub
s
u
b
OB
u
u
v
v
G
u
G
Therefore, Fo
rmula (8) i
s
chang
ed into:
0
0
0
0
()
1
0
()
()
(1
)
()
()
()
(1
)
()
sub
su
b
v
t
vu
v
G
O
B
v
t
v
t
vu
v
G
O
B
u
uv
dv
u
u
cu
v
dv
(9)
We next intro
duce the vertex sele
ction strategy of our algorithm.
We
sele
ct o
n
e
vertex to a
dd into th
e current
comm
unity at ea
ch
step. O
u
r
g
oal is to
sele
ct the ve
rtice
s
whi
c
h l
i
e in
the
sa
me lo
cal
co
mmunity a
s
the
start vert
ex. Therefore
,
the
visiting probability which travels from the start ve
rtex is suitable
for
cons
ideration whil
e
choosing
vertice
s
. Sup
pose
u
is the
vertex whi
c
h
is evalu
a
ting
for
choo
sing.
Beside th
e v
a
lue of visitin
g
probability of
u
at step
t
, we
also m
e
asure the fr
action
of probability whi
c
h
u
will go back into
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
1693-6
930
A New Algo
rithm
for Detect
ing Lo
cal Co
mm
unity Based on Rand
o
m
Walk (Yue
ping Li)
1009
the cu
rrent
candid
a
te co
mmunity at
step
1
t
. We
n
e
xt pre
s
ent t
he me
asure
of sele
ction,
formally.
Let
s
ub
G
be th
e
candid
a
te
co
mmunity at
step
t
and
OB
be
the
outsi
de
boun
dary
of
s
ub
G
. Thus, the vertice
s
of set
OB
are candi
d
a
tes for ch
oo
sing to be m
e
rge
d
into
s
ub
G
. Let
0
()
v
t
u
be the visiting probability of vertex
u
whi
l
e the
start ve
rtex is
0
v
. T
h
en, o
u
r
me
as
ure
fo
r
cho
o
si
ng vert
ex
u
is defined
as:
0
0
0
0
0
0
()
2
()
()
()
()
()
(
)
()
()
()
()
sub
sub
su
b
v
t
vG
v
u
v
t
v
t
v
t
vG
v
u
v
t
v
vG
v
u
t
v
uu
u
v
v
u
(10
)
The item
0
()
0
()
2
()
()
v
t
vG
v
u
sub
v
t
v
u
forces that the vertices has
high “return" probability obtain
high p
r
io
rity to be sele
cted. The
rea
s
on that ex
po
nent nu
mbe
r
is
set to 2
arises from t
h
e
followin
g
two
asp
e
ct
s: (i
) if
the num
ber i
s
set to
1, thi
s
me
asure
al
so
wo
rks
but
not a
s
go
od
as
that equal
s 2
.
Becau
s
e
when expo
nen
t numbe
r is 1
,
the measure equal
s
0
()
()
sub
v
t
vG
v
u
v
,
whi
c
h sho
w
s
the sum of visiting
proba
bi
lity of
the neighbo
rs of
v
in
s
ub
G
. It does not
indicate
the fraction o
f
probability whi
c
h the vertex will
return
to the candi
date local
co
mmunity. Since
one ve
rtex h
a
s
high
er “ret
urn" f
r
a
c
tion,
the vertex
be
ars clo
s
er
co
nne
ction with
the start
ve
rt
ex.
Therefore,
we do
not
set
the expo
nent
num
ber to
1
;
(ii)
We
hav
e teste
d
the
numbe
r i
s
l
a
rger
than 2. The result
s of sele
ction is alm
o
st the same.
In brief,
we
cho
o
se the
vertex ha
s
maximum va
lue of
to b
e
me
rged
in
to the
can
d
idate
co
mmunity
s
ub
G
. Then, reco
mput
e the vi
siting
pro
bability a
nd the
value
s
of
until
satisfie
d the stopping
crite
r
i
a
.
The stop
ping
criteria of ex
isted algo
rith
ms
is too sim
p
le. For exa
m
ple, most al
gorithm
s
su
ch a
s
g
reed
y
LW
P
an
d
all
LWP
algo
rithm
s
[11] halt wh
en the
r
e i
s
n
o
insertion
o
r
deletio
n of
vertex ca
n i
m
prove
d
the
mea
s
ure. O
n
the oth
e
r hand, seve
ra
l
algo
rithms su
ch
a
s
Ba
g
r
ow’
s
algorith
m
[10
]
stop wh
en
their mea
s
u
r
es
rea
c
h th
e desi
r
e thresh
old
s
. Furt
herm
o
re,
so
me
algorith
m
s d
oes not stop
until
se
arch
ing the
whol
e graph. O
n
e
insta
n
ce i
s
the
algo
rithm
prop
osed
by Cla
u
set [8]. We
next p
r
ese
n
t the
st
oppin
g
crite
r
ia
for ou
r se
lect-a
nd
-me
r
ge
pro
c
ed
ure.
Our algo
rithm
employ
s th
e
mea
s
u
r
e
M
de
fined by
Formula 3
to m
a
rk "potential"
local
comm
unitie
s
. Since ou
r a
l
gorithm m
e
rges the ve
rt
i
c
e
s
one
by one, it is straightforwa
r
d
that
measure
for evaluation wi
ll
increa
se or
de
cre
a
se
d
r
amatically if
potent
ial local com
m
unity
is
found an
d this local co
mm
unity is signifi
cant. We
d
e
n
o
te these in
creasi
ng or d
e
crea
sing p
o
int
s
by "jumping"
points. T
hen
, we record the pote
n
tial l
o
cal
co
mmun
i
ties which a
r
e indi
cated
b
y
these in
crea
sing or de
crea
sing p
o
ints. Finally,
we d
e
termin
e out
put whi
c
h local comm
unity as
result.
We no
w di
scuss the jumpi
ng point
s
furt
her by differe
nt cases. Let
t
m
be the value of
measure
M
at step
t
.
Ca
se
1: Incre
a
sin
g
ly jumpi
ng point. We
call
t
m
is one incre
a
si
ngly ju
mping poi
nt if the
followin
g
pro
positio
n hold
s
1
11
1
1
1
1
1
(
)
(
)
.
...
..
(
)
tt
tt
tt
n
m
m
mm
mm
(11
)
whe
r
e
1
and
1
n
are paramete
r
s su
ch
th
at
1
01
and
1
n
is a
po
si
tive integer.
One exa
m
ple
is illu
strated
o
n
the left of Fi
gure
1. Thi
s
type of
jumpin
g point
sho
w
s that one vert
ex whi
c
h i
s
n
o
t
in t
he
same
l
o
cal
co
mmun
i
t
y
as st
a
r
t
v
e
rt
ex
is
merg
ed into th
e current comm
unity. Theref
ore,
we rec
o
rd the c
o
mmunity at s
t
ep
1
t
as one
potential co
mmunity in this ca
se.
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 16
93-6
930
TELKOM
NIKA
Vol. 12, No. 4, Dece
mb
er 201
4: 100
5 – 1016
1010
Figure 1. Different cases o
f
jumping poi
nts
Ca
se
2: Decrea
singly ju
mping p
o
int. When th
e values of
M
drop d
r
am
atically, it
prob
ably find
s o
ne lo
cal
community. Howeve
r, it u
s
ually discove
r
s
one
majo
r part
of the l
o
cal
comm
unity n
o
t the wh
ole
at that step
. The value
s
of
M
usually will still decrease in the
followin
g
st
e
p
s. T
herefore, it req
u
ires to fi
nd th
e
followin
g
ste
p
of
whi
c
h t
he valu
es of
M
cha
nge
sightl
y
or increa
se.
We call
t
m
is on
e decrea
s
in
gl
y jumping poi
nt if the following pro
p
o
s
itio
n hold
s
2
2
2
12
2
2
2
12
2
2
2
12
2
2
2
(
)
(
)
.....
.
(
)
()
(
)
(
)
(
)
...
...
(
)
(
)
(
)
...
...
(
)
kk
k
k
k
k
n
k
tt
t
t
t
m
t
tt
t
t
t
m
t
mt
m
m
m
m
m
m
kq
t
t
t
mm
m
m
m
m
mm
m
m
m
m
(12)
whe
r
e
2
,
2
and
2
n
are paramet
ers
su
ch that
22
01
and
2
n
is a positive intege
r. By
Formul
a
12,
it is supp
o
s
ed
that va
riable
t
must
be the smallest value
of which the
corre
s
p
ondin
g
M
value indi
cates the flat o
r
increa
sin
g
segment. Thi
s
con
s
trai
n gu
a
r
antee
s that
it obtains the
corre
c
t flat or increa
sing
se
gm
ent after o
ne dra
m
atical
ly decre
asi
n
g
step.
An exampl
e o
f
decrea
s
in
gl
y jumping
poi
nt followed by
a flat
segm
e
n
t is
displaye
d on
the
middle of Fig
u
re 1. On the
other han
d, the ca
se
follo
wed by an in
cre
a
si
ng seg
m
ent is sh
own on
the right of Fi
gure
1. Since
the flat seg
m
ent or in
cre
a
sin
g
se
gme
n
t begin
s
at step
t
, we re
co
rd
the comm
unit
y
at step
t
as one pote
n
tial comm
unity in this ca
se.
We next present the pro
c
e
dure of
ou
r al
gorithm a
s
fol
l
ows:
Local Comm
unit
y
Disco
v
e
r
y
Using Lo
cal Rand
om Walk
Input
: gra
ph
G
with adjac
ent lis
t
A
L
, s
t
art vertex
0
v
,
Parameters
:
1
,
1
n
,
2
,
2
,
2
n
,
2
m
and
Outpu
t
: local c
o
mmunity
0
()
loc
a
l
Gv
Set
0
{}
su
b
Gv
and
0
0
1
v
;
Use vertex se
t
OB
to store the
outsid
e
bou
n
dary of
s
ub
G
, s
e
t
OB
;
For
(
1
t
;
3
t
;
t
) {
Upd
a
te the b
ound
ary
OB
of
s
ub
G
;
Comp
ute
0
v
t
for all vertices in
sub
GO
B
by Formula 8
;
Insert the vert
ex
u
, whic
h
()
u
is maximum, into
s
ub
G
;
}
0
0.
2
0.
4
0.
6
0.
8
1
Ca
s
e
1
M
e
r
g
i
ng Sequenc
e
M
0
0.
2
0.
4
0.
6
0.
8
1
C
a
s
e
2 (
f
l
a
t
seg
m
ent
)
M
e
r
g
i
ng Sequenc
e
M
0
0.
2
0.
4
0.
6
0.
8
1
C
a
s
e
2
(
i
nc
r
eas
i
ng
s
egm
ent
)
M
e
r
g
i
n
g Seque
nc
e
M
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
1693-6
930
A New Algo
rithm
for Detect
ing Lo
cal Co
mm
unity Based on Rand
o
m
Walk (Yue
ping Li)
1011
Do
{
Upd
a
te the b
ound
ary
OB
of
s
ub
G
;
Comp
ute
0
v
t
for all vertices in
sub
GO
B
by Formula 9
;
Insert the vert
ex
u
, whic
h
()
u
is maximum, into
s
ub
G
;
Comp
ute
t
m
whi
c
h is the m
e
a
s
ure
M
for
s
ub
G
by Formul
a 3;
Determine whether
t
m
is a jumping poi
nt at current step
t
;
If it
is
, rec
o
rd
the jumping point;
1
tt
; }
While
(
s
ub
G
is sm
all than
G
and
1
t
m
)
Output the su
bgra
ph of the
first jumping
point as
0
()
loc
a
l
Gv
.
The
rea
s
o
n
t
hat we u
s
e
F
o
rmul
a 8
to
compute
0
v
t
when
13
t
is
we
wis
h
to k
n
o
w
visiting probability of
the vertices which
are very closed with the
start vertex
0
v
(that is
, the
vertice
s
of which th
e di
sta
n
ce from
0
v
is
no large
r
tha
n
3). Th
oug
h
this computa
t
ion is n
o
t as
corre
c
t a
s
th
e cla
s
sical random
wal
k
,
it is suffi
ci
e
n
t for the ve
rtex sele
ction
.
Moreove
r
, our
algorith
m
just
outputs the l
o
cal
comm
un
ity found by
the first jum
p
i
ng point. It do
es n
o
t output
the
hiera
r
chi
c
al community according to ou
r proble
m
stat
ement.
We
next pre
s
ent th
e time
com
p
lexity of our alg
o
rit
h
m. It need
s
3
((
)
)
avg
OD
e
g
time to
comp
ute
0
v
t
for
13
t
, where
avg
D
eg
is th
e avera
ge de
gree
of the vertice
s
in g
r
a
ph
G
. For
the re
maine
d
step
3
t
, the ru
nning
time i
s
boun
ded
by
2
0
((
(
)
)
)
lo
c
a
l
OV
G
v
. In the
wors
t
cas
e
, it
is
2
((
)
)
OV
G
. Usually, the re
sult com
m
unity
0
()
loc
a
l
Gv
is much sm
aller th
an the whol
e grap
h
G
Thus, o
u
r alg
o
rithm ru
ns in
32
0
((
)
(
(
)
)
)
avg
l
oca
l
OD
e
g
V
G
v
time in average.
4. Experiments fo
r Ev
aluation
In this sectio
n, we e
m
ploy
the ben
chm
a
rk me
tho
d
p
r
opo
se
d by B
agro
w
[10] to
give an
obje
c
tive co
mpari
s
o
n
wit
h
the exi
s
tin
g
algo
rithm
s
for findi
ng
local
co
mmu
nity stru
cture
s
.
Bagro
w
’
s
met
hod [1
0] uses com
pute
r
ge
nerate
d
n
e
tworks. Be
sid
e
s these a
r
tifici
al net
works,
we
also evalu
a
te
the algorith
m
s on famou
s
real
-worl
d
datasets whi
c
h sho
w
different structu
r
e
s
of
local comm
unity.
Com
puter g
e
nerate
d
net
works
: In Bag
r
ow’
s
ben
ch
m
a
rk, it create
s
a cl
assi
cal
grap
h
()
GV
E
of
(
)
12
8
VG
, which
is then
ran
domly
partition
ed i
n
to four
refe
re
nce
communi
ties to
contai
n equ
al numbe
r of vertices
(32 vert
i
c
es). Each vertex has an
averag
e deg
ree
16
in
o
u
t
zz
z
, whe
r
e
out-degree
ou
t
z
e
q
u
a
l the
num
b
e
r
of ed
ge
s co
nne
ct the
vertices
outsid
e
the
communitie
s
. It is Cle
a
r th
a
t
a small
out
z
sh
ows a
stro
ng
comm
unity structu
r
e. In
orde
r to eval
uate the p
e
rf
orma
nce le
ss affect
ed by
rand
omn
e
ss,
we g
ene
rate
100 g
r
ap
hs
for
each
ou
t
z
from 1 to 7.
Real
-world
d
a
taset
s
: We
cho
o
se some
famou
s
data
s
ets
model
ed
by real
-worl
d
gra
p
h
s
su
ch a
s
ka
rat
e
club n
e
two
r
k [1], US coll
ege foot
ball l
eagu
e [14] and dolp
h
in so
cial net
work [
15].
These graph
s has differe
nt stru
ct
ures of l
o
cal
comm
un
ity.
Karate cl
ub
netwo
rk
cont
ains 3
4
me
mber
s a
s
vertice
s
and 7
8
edge
s re
p
r
esenting
friend
ship b
e
twee
n membe
r
s. Du
e to a disa
gre
e
ment
between the
club’
s admin
istrato
r
and th
e
club’
s in
stru
ctor, the
cl
ub
splits i
n
to two
sm
all
e
r
comm
unit
i
es. In
addit
i
on, the
s
e t
w
o
comm
unitie
s
are p
r
omin
en
t.
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 16
93-6
930
TELKOM
NIKA
Vol. 12, No. 4, Dece
mb
er 201
4: 100
5 – 1016
1012
Figure 2. Karate club n
e
twork
The US
colle
ge football n
e
twork i
s
co
nstr
u
c
ted f
r
o
m
the game
sched
ule of the 200
0
sea
s
o
n
. The
node
s i
n
the
network
rep
r
esent the
1
15 team
s,
while the
ed
ge
s
rep
r
e
s
ent
613
games
played in year. The teams
are divided
into 11 c
o
nferences
of 8-12 teams
eac
h
and
gene
rally g
a
m
es
are mo
re freq
uent
b
e
twee
n t
eam
s of th
e
sa
me confe
r
en
ce th
an b
e
twee
n
teams of di
fferent confe
r
en
ce
s. Therefore, the community
st
ructu
r
e i
s
strong in ea
ch
confe
r
en
ce.
The dol
phin
s
so
cial n
e
two
r
k
studie
d
by
David a
nd L
u
ssea
u [15]
wa
s con
s
tru
c
ted from
observation
s
of a
com
m
uni
ty of 62
bottle
-
no
se
dolp
h
in
s. Thi
s
netwo
rk is divide
d i
n
to two
g
r
ou
p
s
according to
their a
ge. Bu
t the com
m
u
n
ity stru
ct
ure
is not
as
si
gnifica
nt as t
he karate cl
u
b
netwo
rk.
We
next provide the evaluatio
n
measu
r
e
s
.
Normali
z
ed
mutual information (NMI) is an
imp
o
rtant evalu
a
tion crite
r
ia
in local
comm
unity discovery [16][10]. T
he co
rrect partition i
s
den
oted by
{}
RR
R
PC
C
, where
R
C
is
the reference local
comm
unity. Si
milarly, the foun
d i
s
den
oted
by
{}
FF
F
PC
C
, wh
ere
F
C
is
the re
sult local comm
unity. A confusio
n
matrix
N
is em
ployed in NM
I measu
r
e, where th
e ro
ws
corre
s
p
ond
to
the
real
com
m
unities,
and
the
colu
mn
s
corre
s
p
ond
to
the fou
nd
co
mmunitie
s
. T
he
element of
N
,
ij
N
is the
nu
mb
er of
nod
es i
n
the
real
co
mmunity
i
that appe
ar i
n
t
he fou
n
d
comm
unity
j
. Then, then
NMI measure of simil
a
rity betwee
n
the partitions, ba
sed
on
informatio
n theory, is defin
ed belo
w
:
11
11
2
l
og(
)
()
l
og(
)
l
og(
)
RF
RF
CC
ij
i
j
i
j
ij
RF
CC
ii
j
j
ij
NN
N
N
N
IP
P
NN
N
N
N
N
whe
r
e the
su
m over ro
w
i
of matrix
ij
N
is d
enoted
.
i
N
and the su
m over
colum
n
j
is denoted
j
N
.
Thus, a NMI score of 1 sh
ows that both
commu
nities are identi
c
al
and a score
of 0 is
whe
n
these two commu
nities are totally indep
ende
nt.
On
the othe
r
hand, we e
m
ploy
the
F
-m
easure
to reveal the
efficie
n
cy of lo
cal
comm
unity
discovery alg
o
rithm
s
. Preci
s
ion i
s
the fra
c
tion of the
F
C
retrieved that lies in
R
C
.
FR
F
CC
p
r
eci
s
i
on
C
The re
call i
s
the fractio
n
of
R
C
which are
su
ccessf
ully det
ected by
F
C
.
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
1693-6
930
A New Algo
rithm
for Detect
ing Lo
cal Co
mm
unity Based on Rand
o
m
Walk (Yue
ping Li)
1013
FR
R
CC
r
eca
l
l
C
We ado
pt the weighted h
a
rmonic me
an
of prec
i
s
io
n and re
call. Tha
t
is, the traditional
F
-mea
su
re is
2
()
p
r
eci
s
i
o
n
r
ecal
l
F
p
r
eci
s
i
o
n
r
ecal
l
The p
a
ra
met
e
r
setting i
s
given a
s
follo
ws: F
o
rm
ula
11 sho
w
s th
e co
ndition
s
that one
step is
con
s
i
dere
d
as
one
jumping poi
nt:
1
rep
r
e
s
en
ts increa
sing
gap and
1
n
re
quire
s h
o
w
many con
s
e
c
utive points should
re
ach the ga
p. Simil
a
rly,
2
and
2
n
re
pre
s
ent th
e d
e
crea
sing
gap an
d the
numbe
r of con
s
e
c
utive
points o
b
ta
ini
ng the gap,
respe
c
tively. In addition,
2
stand
s fo
r th
e thre
sh
old o
f
one flat
seg
m
ent an
d
2
m
sh
ow the
nu
mb
er of
co
nsecu
t
ive points
of
whi
c
h the val
ues
sho
u
ld n
o
t go beyond
the thre
shold.
In ou
r al
gorit
hm, we
set
1
00
5
,
2
00
3
,
2
0
001
5
an
d
12
2
3
nn
m
for all t
he
datasets.
Since th
e m
easure
M
will diminish to 0
whe
n
the
subgraph
s
ub
G
covers the
whol
e
grap
h
G
, it is
suppo
se
d to
h
a
lt the
sea
r
ch
whil
e
s
ub
G
is too l
a
rge.
In thi
s
case, th
ere i
s
prob
ably
no si
gnifica
nt local
comm
u
n
ity containin
g
the sta
r
t vertex
0
v
. We u
s
e
para
m
eter
to stop th
e
algorith
m
wh
en mea
s
u
r
e
M
is too sm
all. The paramete
r
is
s
e
t to 0.15 in our algorithm.
We
com
p
a
r
e
our alg
o
rith
m, denote
d
by
LR
W
, with th
e
state
-
of-the
-art on
es by
far
whi
c
h a
r
e Ba
gro
w
’s
algo
rithm [10],
n
LW
P
an
d
A
LWP
pro
p
o
s
ed
by Luo, Wa
n
g
and Promi
s
lo
w
[11].
It is note th
at we
ch
oo
se the b
e
st
p
in ra
nge
{0
7
5
0
7
6
1
}
for all the
data
s
et
s
according
to
Bagro
w
’
s
alg
o
rithm.
We f
ound
overall perfo
rman
ce
is
b
e
st wh
en
09
p
which
c
o
heres
to the result in [10].
Table 1. Re
sults on re
al-world data
s
et
s
karate
football
dolphins
LR
W
Bagrow
L
W
P
n
LW
P
A
LR
W
Bagrow
L
W
P
n
LW
P
A
LR
W
Bagrow
L
W
P
n
LW
P
A
F
av
g
0.858
0.7709
0.7646
0.6436
0.7922
0.7027
0.9058
0.1862
0.6695
0.6685
0.5021
0.9367
F
st
d
0.118
0.1535
0.2418
0.0745
0.145
0.2991
0.1591
0.0285
0.1369
0.1473
0.2945
0.1618
I
av
g
0.5372
0.4438
0.5275
0.0332
0.6114
0.5568
0.84
0 0.3301
0.2603
0.2939
0.7597
I
st
d
0.205
0.2413
0.3084
0.0812
0.2095
0.311
0.2307
0
0.1616
0.1791
0.3014
0.2583
We
enum
erate ea
ch
verte
x
as th
e
start
vert
ex a
nd
p
e
rform
the
al
gorithm
to d
e
t
ect the
local
com
m
unity. The experimental resu
lts are p
r
esented by Ta
b
l
e 1, whe
r
e
av
g
F
and
s
td
F
are
the ave
r
age
and th
e
stan
dard
deviatio
n
of
F
-mea
su
r
e
,
avg
I
an
d
s
td
I
are
the ave
r
ag
e
a
nd th
e
stand
ard
devi
a
tion of
NM
I
measure. T
he best values
ar
e illust
rated
by bold font. The
n
LW
P
algorith
m
on the dolp
h
ins
d
a
taset.
It shows that
LR
W
perform
s be
st on
av
g
F
and
avg
I
of karate net
wo
rks. For the o
t
hers,
t
he re
sult
s
o
f
LR
W
stand the
se
con
d
po
sition. It appea
rs
that the ov
erall
re
sults
of
n
LW
P
algorith
m
an
d ou
r al
gorit
hm a
r
e b
e
tter tha
n
th
e
others.
Th
erefore,
we
show the ove
r
all
perfo
rman
ce
whi
c
h is give
n by Figure 3.
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 16
93-6
930
TELKOM
NIKA
Vol. 12, No. 4, Dece
mb
er 201
4: 100
5 – 1016
1014
Figure 3. Overall pe
rform
a
nce of t
he results on re
al-world data
s
et
s
It appears th
at the overall
performan
ce
of our
alg
o
rit
h
m is ap
proximately the same a
s
n
LW
P
algo
rithm
o
n
real-wo
r
ld
datasets.
Ou
r al
gorith
m
works better
on
karate a
nd dolphi
ns
datasets whi
l
e
poo
rer on
foot
ball dat
aset
comp
ared with
n
LW
P
algorithm.
Ho
wever, o
u
r
algorith
m
is more stable
t
han
n
LW
P
. One re
aso
n
is that
our al
gorith
m
perfo
rms
be
st on on
e
dataset an
d
stand
s th
e
se
con
d
o
n
th
e
rema
ine
d
t
w
o
dataset. On
the oth
e
r ha
n
d
, the
s
td
F
a
nd
s
td
I
of our algo
rithm are le
ss than the on
es
of
n
LW
P
on all datasets.
Next, we pre
s
ent the exp
e
rime
ntal re
sult
on comp
uter gen
erat
ed networks
whi
c
h is
illustrated by
Figure 4. It s
hows that our algorithm and
n
LW
P
outpe
rform
the other
s. B
y
the plot,
it appea
rs th
at our alg
o
rit
h
m and
n
LW
P
works a
p
p
r
oxima
t
ely the same
when
out
z
is sm
all. That
is, the
com
m
unity stru
ctu
r
e of te
st grap
hs i
s
signifi
ca
nt. While
out
z
go
es l
a
rg
e, the
perfo
rman
ce
of
n
LW
P
drop
s more dram
aticall
y
than our alg
o
rithm, espe
cially when
67
8
ou
t
z
.
Figure 4.
av
g
F
and
av
g
I
result
s on
128-nod
es n
e
t
work Overall
We
summ
ary
the re
sults
and p
r
e
s
ent t
heover
all
pe
rforman
c
e on comp
uter ge
nerate
d
netwo
rks by
Figure 5. It co
nclu
de
s t
hat our al
gorith
m
perfo
rms
better than
n
LW
P
algo
ri
thm on all
the evaluatio
n measures.
1
2
3
4
5
6
7
8
0.
4
0.
5
0.
6
0.
7
0.
8
0.
9
1
z
out
F
av
g
LR
W
Ba
g
r
o
w
LW
P
n
LW
P
A
1
2
3
4
5
6
7
8
0
0.
1
0.
2
0.
3
0.
4
0.
5
0.
6
0.
7
0.
8
0.
9
z
ou
t
I
av
g
Evaluation Warning : The document was created with Spire.PDF for Python.