TELKOM
NIKA Indonesia
n
Journal of
Electrical En
gineering
Vol.12, No.5, May 2014, pp
. 4063 ~ 40
7
0
DOI: http://dx.doi.org/10.11591/telkomni
ka.v12i5.4389
4063
Re
cei
v
ed O
c
t
ober 1
7
, 201
3; Revi
se
d Decem
b
e
r
26, 2013; Accept
ed Ja
nua
ry 1
8
, 2014
Detecting Community and Topic Co-Evolution in Social
Networks
Juan Bi*
1
, Zh
iguang Qin
1
, Jia Huan
g
2
1
School of Co
mputer Scie
nc
e and En
gi
neer
ing, Un
iversit
y
of Electronic S
c
ienc
e an
d T
e
chno
log
y
of Chi
na,
No.20
06, Xi
yu
an Aven
ue, Ch
eng
du, 61
17
31
, China
2
Chin
a Scienc
e
Publis
hin
g
& Medi
a Ltd (Sci
ence Press),
No.16 Sa
nse R
oad, Ch
en
gdu,
Chin
a, 610
061
, China
Corresp
on
din
g
author, e-mai
l
:biju
an
66@
gma
il.com*
, qi
nzg
@
uestc.ed
u
.cn
,
huang
ji
a@ma
il.scienc
ep.co
m
A
b
st
r
a
ct
In this paper w
e
study how
to discover the c
o
-evo
lutio
n
of topics a
nd co
mmu
n
iti
e
s over time in
dyna
mic socia
l
netw
o
rks. We prese
n
t a topic
mo
del
-
b
a
s
ed ap
proac
h
that autom
ati
c
ally capt
ures
the
dyna
mic fe
atur
es of c
o
mmu
n
i
t
ies a
n
d
topics
evo
l
utio
n. Our
mod
e
l
can
b
e
view
ed
as
a
n
extensi
o
n
of th
e
LDA
mode
l w
i
th the
key
ad
diti
on th
at it c
an
n
o
t on
ly d
e
tect c
o
mmuniti
es
an
d top
i
cs si
mult
ane
ously
b
u
t al
so
w
o
rk in a
n
o
n
li
ne fash
io
n. Ins
t
ead of
mod
e
li
ng co
mmuni
ti
e
s
and
topics
in
statistical
ma
n
ner, the
prop
o
s
ed
mo
de
l ca
n si
mulate
the
user
’s inter
e
sts drift
i
ng
at
d
i
fferent
time epoc
hs by
taki
ng into
consi
derati
o
n
the
temp
ora
l
infor
m
ati
on i
m
pl
ied in
the
data, a
n
d
obs
erve
how
the co
mmu
n
ity
structure cha
n
ges ov
er ti
me
w
i
th
the evo
l
uti
on
of topics. Exp
e
ri
m
ents
on r
eal-w
orl
d
dat
a
set have
pro
v
ed the
ab
ility
of this
mod
e
l
in
discov
e
rin
g
w
e
ll-con
necte
d a
nd topic
a
lly
mean
ingf
ul co
mmu
n
iti
e
s an
d the co-ev
o
l
u
tio
n
pattern of to
pics
and co
mmuniti
es.
Ke
y
w
ords
:
co
mmu
n
ity disc
o
v
ery, LDA, pro
bab
ilisti
c g
e
n
e
r
a
tive mod
e
l, so
cial n
e
tw
orks
Copy
right
©
2014 In
stitu
t
e o
f
Ad
van
ced
En
g
i
n
eerin
g and
Scien
ce. All
rig
h
t
s reser
ve
d
.
1. Introduc
tion
The ra
pid d
e
v
elopment of
online
so
cial
netwo
rks h
a
s tremend
ou
sl
y chang
ed th
e way of
peopl
e to co
mmuni
cate with each othe
r. A lot of
user-gen
erate
d
conte
n
t is available on th
ese
online
soci
al
networks. T
he
rich
sou
r
ce
of te
xt inf
o
rmatio
n
can
be
exploite
d to exten
d
the
traditional
so
cial g
r
ap
h. Specifi
c
ally, in
corpo
r
at
ing
b
o
th linkage
st
ructu
r
e
and t
e
xt informant
can
provide
a uni
que ability of detectin
g
late
nt soci
al
stru
cture
amon
g
grou
p of users. In this pap
er,
we a
ddress t
he proble
m
o
f
automat
icall
y
discoverin
g
latent co
mm
uni
ties
of use
r
s from o
b
served
textual conte
n
t and their relation
ship
s.
The stu
d
y o
f
commu
nity stru
cture in
net
wo
rks i
s
prima
r
ily b
a
se
d on th
e
grap
h
partitionin
g
al
gorithm [1
-6]
and proba
bilistic mo
del. The metho
d
pre
s
ente
d
in
[1] is based
on
agglom
erative
algo
rithm whe
r
e edge
s
are rem
o
ve
d
from
th
e n
e
twork ite
r
ati
v
ely to split i
t
into
comm
unitie
s
.
These
meth
ods are pure
l
y
based on gr
ap
h pa
rtition algo
rithm, and they fail
to
accou
n
t for other no
de attributes a
n
d
commu
nication conte
n
t information.
Mean
while, the
prob
abili
stic
gene
rative m
odel
s [7
-10]
have b
een
g
a
ined
si
gnificant attention
in recent yea
r
s.
SSN-L
DA [9]
define
s
com
m
unity as
a d
i
stributio
n ov
er the
so
cial l
i
nk
spa
c
e. L
D
A-G [1
0] si
mply
adapt
s the original L
D
A model for
com
m
unity discov
er
y in a so
cia
l
grap
h, they merely con
s
i
der
the lin
k
stru
ct
ure i
n
a
g
r
ap
h. Several
m
e
thod
s for
an
alyzing
the
e
v
olution of to
pics in
large
-
scale
corpo
r
a have
been propo
sed [11-1
7
]. These inclu
de
the Dynami
c
Topic M
odel
(DT
M
) [12], the
Contin
uou
s T
i
me Dynami
c
Topic M
odel (CTDM)
[13] a
nd Topi
c over Time (TOT
) [14].
In this pa
pe
r, we p
r
o
pose
a pro
babili
sti
c
topi
c mo
del
to detect lat
ent co
mmunit
i
es in
a
so
cial net
work ba
sed o
n
semantic info
rmation and t
he so
cial rel
a
tionship
s
be
tween u
s
e
r
s.
In
contrast to
the p
r
eviou
s
works, the a
ppro
a
ch n
a
tu
rally allo
ws the topi
c
model to work in an
online
fashio
n. In
su
ch
a
way
the
user’s inte
re
sts drifting
at d
i
fferent time
epo
ch
s
can
be
observed,
an
d the
evolutio
n of to
pics, in
turn,
dete
r
mi
ne
the
chan
g
e
s of comm
u
n
ities’ structu
r
e
and thei
r topi
cal featu
r
e
s
over time. In
our
wo
rk
,
we
con
s
id
er
co
mmunity and
topic a
s
diffe
rent
latent variabl
es. The mo
del can
not o
n
ly di
scove
r commu
nities and topi
cs simultane
ou
sly,
enabl
e
them to
benefit
e
a
c
h other, but also
t
r
a
c
k
the
evolution of discovered communitie
s
a
nd
topics over time, which is
useful in u
n
d
e
rsta
ndin
g
the dynamic fe
ature
s
of so
ci
al netwo
rks.
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 23
02-4
046
TELKOM
NI
KA
Vol. 12, No. 5, May 2014: 4063 – 40
70
4064
2. Proposed
Metho
d
Defini
tion 1
(Topi
cal
co
m
m
unity). A to
pical
co
mmu
nity is a g
r
o
u
p
of u
s
ers
wi
th more
simila
r
comm
unication i
n
te
rest
s
and
stronge
r
relatio
n
shi
p
stren
g
th bet
wee
n
t
hem
within t
h
e
grou
p than b
e
twee
n gro
u
p
s
.
Defini
tion 2
(time-stam
p
ed so
cial graph). Let
G
= (
U, E,X,W
) be the direct
ed and
weig
hted so
ci
al graph at ep
och
t
, where
U
is the user
set in
G
and E is the link set whe
r
e
E
denote
s
a directed lin
k fro
m
use
r
i
to user
j
whi
c
h
corresp
ond
s to a relation
shi
p
b
e
twee
n
i
and
j.X
is
the set
of weig
hts whe
r
e
is th
e weig
ht of the lin
k
.
We d
enote t
h
e wei
ght a
s
t
he st
ren
g
th
of relation
shi
p
from user
i
to us
er
j.W
is t
he set of user-gen
erated te
xts.
Defini
tion 3
(Relatio
nshi
p stren
g
th).
In our wo
rk, the relation
ship
stren
g
th
is the
intensity of in
teractio
ns
su
ch a
s
me
ntio
n,
retwe
e
t be
tween t
w
o
co
nne
cted u
s
e
r
s. We
assum
e
the stronger the relationshi
p
, the more number of
interactions
will take
pl
ace bet
ween two users
Defini
tion 4
(time-stamp
ed do
cum
ent
s). A coll
ecti
on of user-g
enerated
con
t
ent are
assume
d to
b
e
divide
d into
so
calle
d
“e
pochs”. Th
e
conte
n
t ge
ne
rated
by u
s
e
r
u
at the
current
epo
ch
t
is re
pre
s
ente
d
by
w
,
w
,,
,
,
i.e
. the set of word
s in th
e conte
n
t.We
assume that
the epo
ch
t
is a discrete va
riable, a time
epo
ch
can be
a day, a month, or a year.
2.1. Model
The graphi
c
model representati
on of o
u
r mod
e
l wh
ich we prese
n
t in this pa
per i
s
illustrate
d in Figure 1. In this
mod
e
l, the mixture co
mpone
nts,
i.e.
communitie
s
and topi
cs,
are
sha
r
ed
explicitly acro
ss all
time epo
ch
s,
but t
he mixin
g
wei
ghts of
each comp
on
ent evolve ov
er
time, for example, som
e
topics may be
come m
o
re p
opula
r
whil
e others may b
e
com
e
outdat
ed.
Figure 1. Gra
phical Model
Rep
r
e
s
entat
i
on of the Pro
posed Dyn
a
m
ic Mod
e
l
At time epo
ch
t
, the p
r
op
o
s
ed
mod
e
l
co
nsi
s
ts
of tw
o
parts
.
Firs
t, we mo
de
l th
e in
te
r
e
s
t
s
of each u
s
e
r
in the corpu
s
. Specificall
y
, we r
epre
s
ent each use
r
as a multin
omial distri
bu
tion
over topi
cs
, thus e
a
ch word written by t
he user i
s
ge
nerate
d
from
one topi
c sel
e
cted from th
e
distrib
u
tion. In ord
e
r to m
o
del the evolut
ion of t
opic,
we a
s
sume t
hat the curre
n
t topic at ep
och
t
can
be
ge
nerated in
two
ways,
either
depe
nding
o
n
the to
pic di
stributio
n of t
he p
r
eviou
s
t
i
me
epo
ch
s or b
e
i
ng not influen
ced by hi
stori
c
al info
rmatio
n but cu
rre
nt status
. In the
model, we
use
a paramete
r
s
to co
ntrol t
he influen
ce
situation. Th
e
s
is generated from a Bernoulli di
stri
bution
who
s
e para
m
eter
i
s
. When
s
=
0, a
new to
pic
di
stributio
n will
be sample
d
from symm
e
t
ric
Diri
chlet di
stribution
α
. When
s
=
1, it means use
r
’s
cu
rre
nt
in
terest i
s
det
ermin
ed by
his
previou
s
stat
us. In
this ca
se,
we
assu
me topi
c
smo
o
thly ch
ang
e
s
from time
t-1
to
t
. A topic with
a high
er mixt
ure
weig
ht at
the current e
poch is
mo
re lik
e
l
y to
ha
ve
a
h
i
gh
er
we
ig
h
t
in
th
e ne
xt
epo
ch. Motivated by dHDP [15], the priors of a topi
c
z
at epo
ch
t
can be con
s
tru
c
ted a
s
follows:
,
,
1
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
2302-4
046
Dete
cting Co
mm
unity and
Topic
Co
-Evo
lution in Soci
al Networks
(Jua
n Bi)
4065
Whe
r
e
η
i
s
a
smooth
para
m
eter,
,
is the
numbe
r of word
s a
ssi
gne
d to topic
z
at
time epoch
t
.
The se
con
d
stage of
the gene
rative
p
r
oce
s
s
is
de
ri
ved the
com
m
unity mem
bership
for
u
s
er
depe
nd
s on this u
s
er’
s
to
pics. Hen
c
e, we select a
comm
unity assi
gnme
n
t for a spe
c
ific u
s
er
from the topi
c-comm
unity distrib
u
tion
λ
and finally ea
ch lin
k (inte
r
a
c
tion)
of this
use
r
ge
nerated
from the co
m
m
unity-spe
cific dist
ribution.
Thus, the g
e
n
e
rative proce
ss fo
r time ep
och
t
of the propo
sed mo
de
l is given as f
o
llows:
a) For
ea
ch
com
m
unity
C
, dra
w
a multinomi
a
l distrib
u
tion
~
Diric
h
let
(
)
b)
For ea
ch topi
c
K
, draw a m
u
ltinomial di
stribution
~ Di
rchlet(
)
c)
For ea
ch t
opi
c
K
, draw infl
uence probability
~
Beta(
π
)
d)
For ea
ch u
s
e
r
i
U
:
(1)
Dra
w
an influ
ence indi
cato
r
,
~
Bernoulli
(
)
(2)
Dra
w
a multin
omial topic di
stributio
n
,
~
Diri
c
h
let (
)
(3)
For ea
ch topi
c
K
, draw co
m
m
unity distrib
u
tion
,
~ Diri
chl
e
t
(
,
)
(4)
For
ea
ch
wor
d
,
,
,
associ
ated
with use
r
i
:
a.
Dra
w
a topi
c
,
,
~
M
u
l
t
i
n
o
m
ia
l(
,
)
b.
Dra
w
a word
,
,
~
M
u
l
t
i
n
o
m
ia
l(
,
,
)
(5)
For ea
ch lin
k
,
,
,
for us
er
i
:
a.
Dra
w
a topi
c
,
,
~
M
u
l
t
i
n
o
m
ia
l(
,
)
b.
Dra
w
a com
m
unity
,
,
~
Multinomial(
,
,
,
)
c.
Dra
w
a lin
k
,
,
~
M
u
ltinomial
(
,
,
)
The graphi
cal model re
pre
s
entatio
n is sh
ow
n in
Figure 1 where the g
r
a
y
circl
e
s
corre
s
p
ond to
obse
r
ved variable
s
of textual inform
atio
n and link inf
o
rmatio
n re
sp
ectively. Others
denote
the
latent varia
b
l
e
s
and
pa
rameters. T
h
i
s
g
ene
rative
mod
e
l represe
n
ts
co
nte
n
t
informatio
n a
s
a mixture o
f
topics an
d link info
rmatio
n as a mixture of commu
ni
ties. At epoch
t
,
wefirst gene
rate topic for each wo
rd for a spe
c
ific u
s
er
u
from multinomial
,
, and the topic for
each word
g
enerated by
this u
s
e
r
re
p
r
esents
the i
n
tere
sts
of h
i
m. Then
we
gene
rate th
e
comm
unity assi
gnme
n
t for this use
r
d
epen
ding on
the use
r
and
the topics which the u
s
er is
really interested in.
,
repre
s
e
n
ts the com
m
un
ity distribution for top
i
c
z
. The
comm
unitymembe
r
ship of
a u
s
er is de
rived from
t
h
e
t
opic’
s
com
m
unit
y
mix
t
ure. In othe
r word
s,
use
r
s wh
o
sh
are seri
es of
topics
with
e
a
ch
oth
e
r
sh
ould be m
e
m
bers of the same commu
n
i
ty.
The li
nk information
wa
s
a
s
sumed
to
b
e
rand
om mi
xture ove
r
co
mmunitie
s
, a
nd e
a
ch lin
k
of a
use
r
wa
s final
ly generate
d
from the
com
m
unity-spe
cific dist
ribution.
At first epoch
t
= 1, the topic di
strib
u
tion
,
is drawn from a Di
richl
e
t prior
α
, and the
topic-co
mmu
nity distribution
,
is drawn from a Diri
chl
e
t prior
ε
, where
α
and
ε
are initialized to
symmetri
c
co
nstant, as d
o
ne in origi
nal
LDA mod
e
lin
g.
Formally, let
Z
an
d
C
b
e
t
he
set
of late
nt topics an
d
latent commu
nities
re
sp
ect
i
vely,
W
be the set of
words in th
e corpu
s
,
V
be the set of intera
ction
s
that
were ob
serv
ed on the so
cial
grap
h. The joi
n
t proba
bility on the texts, links and the l
a
tent variable
s
at epo
ch
t
is given by:
PW
,
V
,
Z
,
C
,
S
|
α
,
β
,
ε
,
γ
,
α
,
δ
P
W
|Z
;
φ
PV
|C
;
ψ
P
Z
|
θ
P
C
|Z
,
λ
P
λ
|
λ
,
ε
P
φ
|
β
P
ψ
|
γ
P
θ
|
α
,S
i
p
w
,,
|
φ
,
,
p
φ
|
β
d
φ
,
p
v
,,
|
ψ
,
,
p
ψ
|
γ
d
ψ
,
p
z
,
,
θ
,
p
c
,,
λ
,
,
,
,
p
λ
,
λ
,
,
ε
d
λ
p
z
,,
θ
,
p
θ
,
α
,s
,
,
d
θ
p
s
,
|
δ
p
δ
|
π
d
δ
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 23
02-4
046
TELKOM
NI
KA
Vol. 12, No. 5, May 2014: 4063 – 40
70
4066
2.2. Parameter Estimatio
n
We a
dopt t
he collap
s
e
d
Gibb
s
sam
p
ling,
a
stochasti
c ap
pro
a
ch fo
r a
pproximate
inferen
c
e in high-dimen
s
i
onal model
s.
We need to derive
|
,
,
,
,
,
,
,
,
and
|
,
,
,
,
,
,
,
,
, the conditional distrib
u
tio
n
of a
comm
unity and topic based on
all othe
r vari
able
s
. In pa
rticula
r
, the
co
nditional
distribution of the
topic
assign
ment (whe
n
1
) is given whil
e the other
ca
se (whe
n
0
) is
omitted due to the spa
c
e li
mited.
,
,
,
,
,
,
,
,
,
,
,
1
|
,
,
,
,
,
,
,
,
,
,
,
,
,
.
,
,
,
,
,
.
,
,
,
∑
,
,
,
,
,
,
2
Whe
r
e
,
,
is th
e num
ber
of
times of
wo
rd
w
a
s
sign
ed t
o
topic
z
at
epo
ch
t
, excl
uding th
e
c
u
rr
ent w
o
rd
i
.
,
,
.
is the total numbe
r of word
s a
ssi
g
ned to topic
z
at epoch
t
excludi
ng
cur
r
e
n
t
wo
rd
i
. Similarly,
,
,
is the n
u
mbe
r
of times of community
c
sampled
f
r
om topic
z
at
epo
ch
t
, not inclu
d
ing the
curre
n
t comm
unity.
,
,
is num
ber of wo
rd
s gene
rated by
user
u
at
epo
ch
t
assi
g
ned to comm
unity topic
z,
not includi
ng
the curre
n
t one. The last term mea
s
u
r
e
s
the probability of having t
he
influence indicator
vari
able
s
equal
to 1. Fu
rther,
the
con
d
itio
nal
distrib
u
tion of
a commu
nity assi
gnme
n
t is given by:
,
,
,
,
,
,
,
,
,
|
,
,
,
,
,
,
,
,
,
,
,
.
,
,
,
,
,
.
Whe
r
e
,
,
is the numbe
r of times of user
v
assi
gne
d to comm
unity
c
at epoch
t
, not including
the current user.
,
,
.
is the total number
of users assi
gned to com
m
unity
c
at epoch
t
, not
inclu
d
ing the
curre
n
t one.
Finally, the multinomial parameters
,
,
,
,
,
,
,
,
,
,
,
are obtaine
d as follows:
,
,
|
,
,
∑
,
,
,
,
|
,
,
.
,
,
|
,
,
,
.
,
,
|
,
,
.
3. Results a
nd Analy
s
is
3.1. Data
se
t Des
c
ription
Here, we prese
n
t the data collecte
d
from Tw
itte
r. Since ou
r go
al is to explore the
relation
shi
p
betwe
en use
r
’
inte
re
sts a
nd
thei
r
inte
raction
s
i
n
th
e soci
al n
e
twork,
we
ne
ed to
colle
ct info
rm
ation a
bout
use
r
s,
conte
n
t and
lin
k
structu
r
e. T
h
e
co
ntent i
n
Twitter
refe
rs to
tweets. A
nd
we
co
nne
ct t
w
o
users
onl
y if an inte
ra
ction to
ok pla
c
e
between
them via
men
t
ion
action
s (
@u
ser nam
e
) or retweet actio
n
s
(
RT
), e
a
ch
link
wei
ghted
by countin
g
the nu
mbe
r
o
f
times the
s
e
a
c
tion
s have
take
n pla
c
e
b
e
twee
n the
t
w
o u
s
e
r
s. All the data i
s
co
llected via
Twitter
API from Jul
y
1, 2012 to Octob
e
r 3
1
, 2012. We
ap
plied pre-pro
c
e
ssi
ng
to tweets content
by
removin
g
non
-Engli
s
h twe
e
t
s, punctu
atio
ns an
d st
o
p
words.
We al
so excl
ude
d a small n
u
mb
er
of sho
r
t tweet
s, in whi
c
h l
e
ss th
an ten
word
s rema
in
e
d
in the ba
g
of words
after the stop
-words
had b
een
re
moved. Final
ly, our colle
ction co
ntain
s
3054
users,
1836
75 lin
ks an
d 1
3763
3
distin
ct wo
rd
s. For si
mplicit
y, the unit ep
och
wa
s
set to one m
onth,
so the
r
e
we
re 4 ep
ochs
(i.e.
July, August,
Septembe
r a
nd Octo
ber).
3.2. Experiment Results
Our mod
e
l i
s
evaluated
in t
h
ree
p
r
obl
em
dom
ains:
the
evolution
of t
opics, the
ev
olution
of commu
nities, and the
dynamic
rela
tionshi
p between topi
c an
d comm
unity.
Z
and
C
, t
h
e
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
2302-4
046
Dete
cting Co
mm
unity and
Topic
Co
-Evo
lution in Soci
al Networks
(Jua
n Bi)
4067
numbe
r of to
pics and th
e numbe
r of co
mmunitie
s
re
spe
c
tively are fixed and
share
d
a
c
ro
ss all
time epochs.
We set the numbe
r of com
m
unities
C
at 10 and topi
cs
Z
at 20.
3.2.1. Topic Trend Analy
s
is
To an
alyze t
he evolutio
n
of the topi
cs over
time, e
.
g. whethe
r t
hey are em
e
r
ging
or
decli
ning, we
cal
c
ulate th
e topic p
opu
larity
along f
our time e
p
o
ch
s. The
m
o
re u
s
e
r
s who
comm
uni
cate
d on a topi
c,
the more
po
pular th
e topi
c is. Be
cau
s
e ea
ch u
s
e
r
is intereste
d
in
each topic
with a different d
egre
e
, the po
pularity of topic
z
at epo
ch
t
is formally defined a
s
:
,
1
|
|
,
∈
Whe
r
e
,
is the topic distribu
tion for user
u
at epoch
t
,
whi
c
h indicates the level of participation
of each u
s
e
r
in each topic.
In Figure 2,
we p
r
e
s
ent t
he mixing p
r
oportio
n
of topics at e
a
ch epo
ch. Ea
ch topi
c i
s
rep
r
e
s
ente
d
by a stripe. The wi
dth of a stripe
co
rre
sp
ond
s to the popula
r
ity of th
e corre
s
p
ondi
n
g
topic ove
r
tim
e
. The
wide
r
the stri
pe, th
e more p
opul
ar the
topic is. From Fi
gu
re
2, we
find th
at
the pop
ula
r
ity of mo
st topi
cs in e
a
ch e
poch
vari
ed
smoothly. Sin
c
e th
e mixture propo
rtion
for
each topic m
a
y be influen
ced by the
hi
story topi
cs
i
n
formatio
n, the users’ top
i
c (inte
r
e
s
t)
may
not change too much between adj
acent epochs; but after a long time, interest
s m
a
y drift.
Figure 2. Overview of the
E
volution of
T
opics
3.2.2. Communit
y
Ev
olution Analy
s
is
In ou
r mo
del
, a
comm
unit
y
assign
ment
of a
u
s
er i
s
dep
end
ent
on the
u
s
e
r
and
his
inherent interest (late
n
t topics).
The
r
efore, the comm
u
n
ity struct
u
r
e
and its topi
c
distrib
u
tion al
so
cha
nge
a
c
co
rding
to th
e
topic
evolutio
n ove
r
time.
In o
r
de
r to
reveal
the
hi
dden
evoluti
o
n
pattern
s of co
mmunitie
s
, we sele
cted two comm
unitie
s
for the com
parative an
al
ysis.
For
ea
ch
co
mmunity, we
can
get th
e
topic
dist
rib
u
tion at e
a
ch epo
ch.
Th
e topical
intere
st of
ea
ch
co
mmunit
y
can
be
de
scrib
e
d
by
the
mo
st o
c
curri
ng topi
cs in
i
t. Those topi
cs
corre
s
p
ond t
o
the mo
st repre
s
e
n
tative topics for th
e sel
e
cte
d
community. F
o
rmally, give
n a
sele
cted
com
m
unity c, the set of most i
m
porta
nt topics
Re
p
r
,
can be
co
mputed a
s
:
,
∈
:
,
,
Whe
r
e
n
arg
m
ax denote
s
the functio
n
returning th
e n
topi
cs
wi
th the high
est values. On
this
way, we can
have an overview of topics in the comm
unity.
In Table
1
we give top fiv
e
topics
(n
= 5) a
nd th
eir
corre
s
p
ondin
g
key
wo
rd
s
for ea
ch
comm
unity. For exam
ple, the domi
nant t
opic in
com
m
unity
1
is
topi
c7
(re
c
all that
topic7
is
a
bou
t
pre
s
ide
n
tial e
l
ection
) and t
he topmo
s
t topic in the
co
mm
unity 4
is
topic1
3
(s
po
rts).
0
0.15
0.3
0.45
0.6
0.75
0.9
7/2012
8/2012
9/2012
10/2012
Presidential
e
lection
Electronic
p
roducts
the2012
Sports
Social media
Daily
life
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 23
02-4
046
TELKOM
NI
KA
Vol. 12, No. 5, May 2014: 4063 – 40
70
4068
Table 1. Top
Five Topics i
n
Comm
unity 1 and Com
m
unity 4
Communit
y
4
Topic
13
0.188
2
Topic
18
0.1081
Topic
11 0.0674
Topic
7 0.0604
Topic10 0.0412
nfl 0.189
1
fishing
0.0345
Ol
y
m
pics201
2
0.0561
vote
0.0561
8
teaching 0.0248
footba
ll
0.053
0
Frida
y
0.0237
BBC
0.0275
presidenti
al
0.0275
3
educatio
n
0.0214
pla
y
er
0.017
3
w
e
eken
d
0.01774
Soccer 0.0117
cnn 0.0117
3
academi
c
0.0122
shot 0.009
6
night
0.00843
beer
0.0082
election 0.0082
3
math 0.0105
espn 0.093
5
great
0.00504
succeed 0.0076
campaign 0.0076
6
examine
0.0781
0
Communit
y
1
Topic7
0.2318
Topic 9
0.0981
Topic 8
0.0708
Topic 11
0.0678
2
Topic13 0.0532
president 0.0830
5
iPhone 0.0543
2
T
w
itter
0.0288
3
London
0.0223
game
0.0225
Romne
y
0.0729
2
Apple 0.0539
6
Social 0.0257
3
Ol
y
m
pics 0.0117
espn
0.0117
election 0.0252
8
iPhone
5
0.0478
2
Facebook
0.0222
5
USA 0.0084
5
shot 0.0085
speech 0.0206
5
launch 0.0344
0
Y
outu
b
e
0.0163
4
Live 0.0076
2
w
i
nning
0.0076
presidentia
l
0.0145
5
mobile 0.0223
8
Google
0.0123
9
w
i
ning
0.0070
3
dead
0.0574
To have
a
clea
r in
sig
h
t into the
evolut
ion in
each
com
m
unity over time, we
furtherl
e
vera
ged the
JS
(Jen
se
n-Sh
anno
n) divergen
ce to
measure the
simila
rity betwee
n
comm
unitie
s
gene
rated at
different epo
chs.
J
S
(p || q)
rep
r
e
s
ent
s the dissimila
rity between two
probability distributions
p
a
nd
q
, whi
c
h is defined a
s
:
,
1
2
,
,
Whe
r
e
KL (p,
q)
is the Kullback-Leibl
er
diverge
n
ce a
n
d
m
0
.
5
p
q
. To comp
ute the
JS-dive
r
ge
nce between
two
comm
uni
ties, we
r
e
p
r
esent e
a
ch
comm
unity as
a vecto
r
of
probabilities over
topi
cs
,
1
and a vecto
r
of prob
abilitie
s over lin
ks
,
1
. The
topi
c
similarity and link st
ru
cture si
milarity o
f
com
m
unity 1
betwe
en
different
time
epo
ch
s we
re
cal
c
ulate
d
an
d displ
a
yed in Figure 3(a
)
and (b
),
respectively. All of the result
s exhibit ahi
gh
simila
rity betwee
n
two
co
ntiguou
s time
perio
ds. Esp
e
cially, link
st
ructu
r
e
of
co
mm
unity 1
d
oes
not cha
nge
signifi
cantly for the entire
time
epochs. This may be becau
se there a
r
e
stro
ng
evolution
d
e
pend
en
cies betwe
en
the
s
e epo
ch
s. By
con
s
tru
c
ti
ng the
prio
rs a
s
a
weig
hted
combi
nation
of the hi
story information
,
t
he distri
bu
tion of ea
ch
com
pon
ent
at epo
ch
t
is
influen
ced b
y
its past di
stributio
n. Consequ
ent
ly
,
comm
unit
y
st
ru
ct
ur
e an
d topic b
e
tween
adja
c
ent e
p
o
c
h
s
may
sta
y
the sa
me
or
cha
nge
smoothly. Ho
wever, th
e
si
milarity between
epo
ch
1
an
d
e
p
o
c
h
4
i
s
rel
a
tively smalle
r tha
n
oth
e
rs,
whi
c
h
me
an
s the
com
m
uni
ty has chang
ed
after a long time.
Figure 3. Similarity Comp
a
r
iso
n
s of Co
mmunity
1 in (a) T
opics, (b
) Link
stru
ctu
r
e over all
epo
ch
s
July
/Aug
Aug/Sep
July
/O
ct
0
0.2
0.4
0.6
0.8
1
1-JS
(
a
)
July
/Au
g
Aug/Se
p
July
/O
ct
0
0.2
0.4
0.6
0.8
1
1-JS
(
b
)
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
2302-4
046
Dete
cting Co
mm
unity and
Topic
Co
-Evo
lution in Soci
al Networks
(Jua
n Bi)
4069
To better un
derstand the
evolution behaviors
a
s
above illust
rated, the evolutiona
ry
pro
c
e
ss
of topics profile for a
spe
c
ific
com
m
unity 1
over all ep
o
c
h
s
is p
r
e
s
e
n
ted in Figu
re 4.
Topical p
e
a
k
s for a
com
m
unity indicate
the do
mina
n
t
topics for th
at parti
cula
r
comm
unity. We
can see
th
at
topic7
is very
promi
nent in
com
m
unity 1
across all e
p
o
ch
s. Thi
s
ha
s bee
n expe
cted
becau
se topi
c7 is
related
to the “US presid
entia
l ele
c
tion in 20
12
”, whi
c
h is th
e most domi
nant
and
widely
discu
s
sed to
pic in
the
selecte
d
data
s
et. Ho
weve
r, topics
can
rise an
d fal
l
in
promi
nen
ce.
It is not ne
cessary fo
r e
v
ery topic
di
stributio
n to
stay t
he
sa
me at different
evolutiona
ry epo
ch
s.
Topi
c11
(“th
e20
1
2
Olympics”)
is cl
early id
e
n
tified in
com
m
unity 1
by our
model, an
d o
u
r mod
e
l correctly sho
w
s
its rise and f
a
ll in promi
n
ence du
ring t
he four e
p
o
c
hs.
These an
alysis re
sult
s de
monst
r
ate tha
t
our mod
e
l can not only m
odel the temp
oral evolutio
n
of
topics ove
r
time ba
se
d on
histori
c
al
info
rmation,
b
u
t
also
ca
pture the em
ergi
ng
topics du
ring
the
evolutiona
ry pro
c
e
ss, which can b
e
don
e
by samplin
g the influen
ce indicator s.
Figure 4. Top
i
c Evolution o
f
Community 1 Over All Epoch
s
3.2.3. Perplexit
y
Analy
s
is
Perplexity is a comm
on
cri
t
erion for
evaluati
ng the qu
ality of cluste
ring. It measu
r
es th
e
predi
ctive pe
rforman
c
e a
n
d
the ability of a model
to gene
rali
ze to
unse
en data
.
The highe
r the
predi
ctive pe
rforma
nce i
s
, the lower th
e pe
rplexity will be, a
nd
hen
ce, bette
r gene
rali
zati
on
perfo
rman
ce
can be achie
v
ed.
We com
pute
the per
p
l
exity of obse
r
ving b
o
th lin
k
stru
cture a
nd
wor
d
s.
We divid
ed t
he data i
n
to
training
set
D
an
d test
set
rand
omly. Let
N
b
e
t
h
e
s
i
z
e
o
f
training set a
nd
M
the
s
i
z
e
of tes
t
set. Formally, the perple
xity of a
test set
at ep
och t give
n th
e
training set
is:
,
|
|
∑
|
,
|
|
|
|
Figure 5. Perplexity Value for (a
) No. of topics, (b)
No
. of communit
i
es
0
0.05
0.1
0.15
0.2
0.25
12345
6789
1
0
1
1
1
2
1
3
1
4
1
5
1
6
1
7
1
8
1
9
2
0
Pro
b
a
b
ility
I
D
of
Topi
cs
July
Aug
Sep
Oct
0
5000
10000
15000
20000
5
1
0
1
52
02
5
3
04
0
Perplexity
No. of
Topics
July
Aug
(a)
0
2000
4000
6000
8000
10000
12000
6
8
10
12
14
Perplexity
No. of
Communities
July
Aug
(b)
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 23
02-4
046
TELKOM
NI
KA
Vol. 12, No. 5, May 2014: 4063 – 40
70
4070
We exa
m
ine
d
the p
e
rpl
e
xity value for
each
ep
och
on different setting of topi
c nu
mbe
r
and
comm
un
ity number.
Figure 5(a)
p
l
ots the p
e
rpl
e
xities ag
ain
s
t the nu
mb
er of topi
cs, the
numbe
r of co
mmunitie
s
was set
to
1
0
f
o
r
thi
s
ex
pe
riment. In b
o
th
epo
ch
s, th
e
perpl
exities
can
get their mini
mum at
aro
u
nd 2
0
topi
cs. Figu
re
5(b
)
plots th
e p
e
rplexities
agai
nst the
num
b
e
r of
comm
unitie
s
. For both ep
o
c
h
s
, the perpl
exities
get their minimu
m at aroun
d 10
comm
unitie
s
.
4. Conclusio
n
With the
a
d
vent of
online
so
cial
net
worki
ng,vari
ou
s m
ode
s
of
comm
uni
cati
on e
nabl
e
use
r
s not o
n
l
y
to cre
a
te rel
a
tionship
s
wit
h
othe
rs
but also
to
share intere
sts by
g
enerating
text
s
.
In this pape
r, we prese
n
t a unified
proba
bilisti
c generative model that
not only detect
comm
unitie
s
and topics
in so
cial net
works si
m
u
ltaneo
usly, bu
t also ca
pture the dyna
mic
feature
s
of
communitie
s
a
nd topi
cs
evolution.
Thi
s
model exten
d
s p
r
io
r works on
co
mmu
nity
discovery
by inco
rp
oratin
g both th
e tempo
r
al
info
rmation
of relation
ship
s
and the
textura
l
conte
n
t gene
rated by u
s
e
r
s. Commu
nity is
detecte
d
depen
dent
on not only t
he explicit lin
ks
betwe
en indiv
i
dual
s but also the topics they comm
uni
cate ab
out. The model i
s
a
b
le to identify
importa
nt co
nsi
s
tent topi
cs, a
s
well a
s
captu
r
e th
e eme
r
gin
g
topics which
are i
n
ten
s
ively
covered only in a certain ti
me perio
d. The exper
im
en
t results a
r
e demon
strated
that the model
have the cap
ability to detect well-co
nne
cted and
topi
cally meani
n
g
ful commu
ni
ties and the co-
evolution of communitie
s
a
nd topics.
Referen
ces
[1] Girvan,
Michelle
, Mark
EJ
Ne
w
m
an.
C
o
mmunity str
u
cture i
n
soc
i
a
l
an
d
bio
l
o
g
ic
al n
e
tw
orks
.
Procee
din
g
s of
the Nation
al A
c
adem
y of
Sci
ences. 20
02; 9
9
(12): 78
21-
78
26.
[2]
F
o
rtunatoS. Co
mmunit
y
detect
i
on i
n
grap
hs.
Physics Report
s
. 2010; 48
6(3)
: 75-174.
[3]
Ne
w
m
a
n
, Mar
k
E, Michell
e
G. F
i
nding a
n
d
eval
uati
ng c
o
mmunit
y
stru
cture in n
e
t
w
o
r
ks.
Physical
review E
. 2004
; 69(20): 26-3
1
.
[4]
Ne
w
m
a
n
, Mark
EJ. F
a
st algor
ithm for detecti
ng commu
nit
y
structure in n
e
t
w
o
r
ks.
Physical review E
,
200
4; 69(6): 06
613
3.
[5]
Ming
w
e
i L
e
n
g
, Jinji
n
W
ang,
Pengfe
i
W
ang,
Xi
ao
yu
n Ch
e
n
. Hierarc
hica
l
Agglom
erati
o
n Commu
nit
y
Detectio
n Al
g
o
rithm vi
a C
o
mmunit
y
Sim
i
l
a
rit
y
Meas
ure
s
.
T
E
LKOMNIKA Indo
nesi
a
n Jour
na
l of
Electrical E
ngi
neer
ing.
2
012;
10(6): 15
10-
15
18.
[6]
Jian
Li, Hu
i
w
e
n
De
ng. Com
m
unit
y
Structu
r
e De
tecti
on
Algorit
hm Bas
ed o
n
the N
o
de Be
lon
g
i
n
g
Degr
ee.
T
E
LKOMNIKA Indon
esia
n Journ
a
l o
f
Electrical Eng
i
ne
erin
g
. 201
3; 11(12): 76
49-
765
4.
[7]
Blei D
a
vid M,
Andre
w
Y N
g
, Michae
l I Jorda
n
. Latent
dirich
let al
loca
tion.
the Jo
urn
a
l of machi
n
e
Lear
nin
g
rese
a
r
ch 3
. 2003; 9
9
3
-10
22.
[8]
Z
hou, Di
ng,
e
t
al.
Prob
ab
ili
stic mode
ls fo
r discov
e
rin
g
e-co
mmu
n
ities.
Proce
edi
ngs
of the
15t
h
intern
ation
a
l co
nferenc
e on
Wo
rld
Wide
We
b
. ACM. 2006; 173-1
82.
[9]
Z
hang H, et
al.
An LDA-
b
a
sed c
o
mmun
i
ty structure d
i
scovery a
ppr
oach for l
a
rg
e-scal
e
soci
al
netw
o
rks
. IEEE Confere
n
ce
on Intell
ig
ence
and Sec
u
rit
y
I
n
formatics. 200
7; 200-2
07.
[10] Hen
derso
n,
K
e
it
h, T
i
na Eli
a
ssi-Rad.
A
pply
i
ng
late
nt diric
h
let a
l
l
o
catio
n
to grou
p d
i
sco
very in
lar
g
e
grap
hs.
Procee
din
g
s of the 20
09 ACM s
y
m
p
osium o
n
Appl
i
ed Com
putin
g. ACM. 2009; 1
4
56–
14
61.
[11] Hua
ng,
Hsun-
Hu
i, Horn
g-C
h
ang Ya
ng.
Semantic Cl
ust
e
rin
g
-Base
d
C
o
mmunity Det
e
ction i
n
an
Evolvi
ng Soci
al Netw
ork.
Genetic an
d Evoluti
onar
y
Comp
uti
ng (I
CGEC). Sixth
Internation
a
l
Confer
ence
on
. IEEE. 2012; 91-94.
[12]
Blei
Dav
i
d M,
Joh
n
D
Laff
e
rt
y
.
Dy
na
mic
topic
mod
e
ls
. Procee
din
g
s
of the
23r
d
inter
natio
na
l
confere
n
ce o
n
Machi
ne le
arni
ng. ACM. 200
6
;
113-12
0.
[13] W
ang, C
hon
g,
David
Bl
ei, D
a
vid H
e
ckerm
an
.
Conti
nuo
us ti
me
dy
na
mic
to
pic
mo
de
ls.
Pr
ocee
din
g
s
of
the 24th C
onfe
r
ence i
n
Uncert
aint
y in
Artifici
a
l
Intelli
genc
e (UAI). 2008.
[14]
W
ang, Xueru
i
, Andre
w
McC
a
llum.
T
opics o
v
er time: a n
o
n
-Markov co
nti
nuo
us-ti
m
e
mo
del of top
i
ca
l
trends
. Proce
e
d
in
gs of the
12
th ACM SIGKDD inter
nati
o
n
a
l co
nferenc
e
on Kn
o
w
l
e
d
ge
discov
e
r
y
an
d
data min
i
ng. A
C
M. 2006; 42
4
-
433.
[15]
Prutean
u-Mal
i
n
i
ci, Iulia
n, et al.
Dyna
mic
hierarc
h
ica
l
Dirich
l
et proc
ess for mo
de
ling to
pics i
n
timesta
m
pe
d d
o
cu
me
nts. IEEE T
r
ans
. PAMI
201
0; 32(6): 99
6-10
11.
[16]
I
w
at
a,
T
o
moh
a
ru, et al.
Online
multisca
le
dyna
mic top
i
c mod
e
ls
. Pro
c
eed
ings
of the 16th ACM
SIGKDD intern
ation
a
l co
nfere
n
ce on Kn
o
w
l
e
dge d
i
scov
e
r
y
and d
a
ta min
i
n
g
. ACM. 2010; 663-
672.
[17]
T
ang, Xuni
ng, and
C
h
ristop
he
r C. Ya
ng.
T
U
T
:
a statistica
l
mo
de
l for
det
e
c
ting tre
nds, to
pics
and
us
e
r
interests in social
m
e
dia.
Pr
ocee
din
g
s of the 21st ACM i
n
ternati
o
n
a
l co
nferenc
e on In
formation a
n
d
kno
w
l
e
d
ge ma
nag
ement. AC
M. 2012; 97
2-9
81.
Evaluation Warning : The document was created with Spire.PDF for Python.