TELK OMNIKA , V ol. 14, No . 2, J une 2016, pp . 725 733 ISSN: 1693-6930, accredited A b y DIKTI, Decree No: 58/DIKTI/K ep/2013 DOI: 10.12928/telk omnika.v13.i1.1122 725 Resear c h on Comm unity Detection Algorithm Based on the UIR-Q Zilong Jiang 1 , W ei Dai *2 , Xiuf eng Cao 1 , Liangc hen Chen 1 , K e Zheng 1 , and Abdoula y e Sidibe 1 1 School of Computer Science and T echnology , W uhan Univ ersity of T echnology , W uh an, China 2 School of Economics and Management, Hubei P olytechnic Univ ersity , Huangshi, China * Corresponding author , e-mail: dw eisky@163.com Abstract Aiming at the current prob lems of comm unity detection algor ithm in which user’ s proper ties are not used; the comm unity str ucture is not stab le and the efficiency of the algor ithm is lo w , this paper proposes a comm unity detection algor ithm based on the user influence . In ter ms of the concept of user influe nce in the subject comm unicat ion and the P ageRank alg or ithm, this paper uses the proper ties of nodes of users in social netw or ks to f or m the user influence f actor . Then, the user with the biggest influence is set as the initial node of ne w comm unity and the local modular ity method is introduced into detecting the comm unity str ucture . Exper iments sho w that the impro v ed algor ithm can efficiently detect the comm unity str ucture with large scale users and the results are stab le . Theref ore , this algor ithm will ha v e a wide ap plied prospect. K e yw or d: social netw or ks; comm unity detection; user influence; modular ity Cop yright c 2016 Univer sitas Ahmad Dahlan. All rights reser ved. 1. Intr oduction Comm unity detection means the cohesiv e subg roup detected in the social netw or k and as an impor tant prob lem e xisting in the analysis of social netw or k it is beneficial f or people to fur ther understanding and master ing the complicated netw or k subject in the research and making deep applied research, f or instance , individual recommendation [1], compression of netw or k with large scale [2], social netw or k e v olution [3], and so on. Comm unity detection algor ithm based on the netw or k str ucture is a class of popular al- gor ithm which it divides the social netw or k into sub-comm unities with close connection in same comm unity and sparse connection in diff erent comm unities using the relationship betw een users . Comm unity detection algor ithm based on modular ity is a classical kind of comm unity detection algor ithm based on the netw or k str ucture . Ne wman and Gir v an proposed an e v aluation function of netw or k modular ity , that is , mod- ular ity Q. Q function is the diff erence of real connected n umber in a comm unity and e xpected connected n umber in a comm unity with r andom connection and it descr ibes the good an d bad of detected comm unity [4]. Clauset impro v ed the f ast Ne wman algor ithm through utilizing heap str ucture to calculate the netw or k modular ity and named it as CNM algor ithm in which a sparse matr ix is emplo y ed to e xpress the diff erence of modu lar ity betw een nodes and Maxheap is used f or sa ving the maxim um modular ity diff erence . Ev er y time , a maxim um v alue is selected from the Maxheap and combined with the correlativ e nodes into a comm unity , and then the sparse matr ix and heap are updated. The process does not ter minate until the whole netw or k has only a com- m unity [5]. Chen et al. brought f orw ard a kind of local comm unity detection algor ithm called as LCE algor ithm in which the nodes with local maxim um modular ity are selected as the core nodes , and the core nodes are e xpanded in ter ms of the modified local modular ity R. This algor ithm can detect the o v er lap of comm unity and is good f or par allelization without pr ior kno wledge [6]. Hu et al. proposed the LMDMR algor ithm which is a kind of local comm unity detection algor ithm based on e xpanding the set of local maxim um nodes and impro ving the inde x R (modified local modular- ity R). By estimati ng the potential of appear ing in the same comm unity betw een nodes and their neighbor ing nodes with tw o-le v el, this algor ithm can confir m the centr ality and then detect the Receiv ed September 16, 2015; Re vised March 7, 2016; Accepted March 22, 2016 Evaluation Warning : The document was created with Spire.PDF for Python.
726 ISSN: 1693-6930 comm unity through combining the modified local modular ity R with the diffusiv e str ategy of local comm unity defined b y strong comm unity based on the phenomenon that in the special situation the inde x R will miss a lot of nodes belonged to strong comm unity [7]. Compare with the CNM algor ithm, these mentioned comm unity detection algor ithms based on local modular ity has made some impro v ement in the quality of comm unity detection b ut there are also some deficiencies in them, such as instability of comm unity str ucture and the prob lem of o v er lapping comm unity . Moreo v er , in the tr aditional comm unity detection algor ithms based on the netw or k str ucture the social netw or k is regarded as the static topological g r aph without con- sider ing the inter action f actors betw een nodes , which is no longer suitab le f or the moder n social netw or ks such as microb log, W eChat and so on. The inf or mation flo w of nodes (users) in moder n social netw or k is usually the representativ e of user influence . User influence in social netw or k means the capacity that in social netw or k user utiliz e the w a y of spreading the inf or mation or inter acting with other people to influence other people’ thoughts or motiv ate them to inter act with others . Man y e xper ts and scholars ha v e made man y researches on the user influence in social netw or k. Mee y oung C ., et al. made a deep research on the user beha vior and user influence . This research f ocused on users three beha viors: being concer ned, b eing f orw arded and being called and analyz ed their respectiv e types of user influence [8]. Y e et al. divided user influence into three kinds and fiv e sor ting methods . Three kinds of influence ref er to influence of n umber of f ans , f orw arding influence , replying influence . And fiv e sor ting methods means sor ting b y the n umber of f ans , n umber of replying inf or mation, n umber of f orw arding inf or mation, n umber of respondents , or n umber of f orw arders . In this research, user influence with the largest n umber of respondents is regarded as the most stab le and the n umber of respondents is used f or sor ting user influence [9]. Based on the inter acte d relationship betw een user influence and the correlation with the released inf or mation, Ar lei Silv a, et al. used HITS algor ithm to ob tain the score of user influence and the correlation with the released inf or mation [10]. At present, there is lac k of research on user influence in combination with comm unity detection algor ithm; theref ore , this paper proposes a comm unity detection algor ithm integ r ating with user influence . 2. The resear c h on the comm unity detection algorithm based on UIR-Q 2.1. The user influence factor s of social netw ork Because the personal relationship of real lif e is the basis of the social netw or k, the user’ s influence of the social netw or k is similar with their individual influence in real lif e . Through making an analysis of beha viors of users in sina microb log including f orw arding messages , commenting on message s and replying messages , this paper co nstr ucts three f actors to e v aluate user influ- ence , that is , frequency of updating microb log, deg ree of contin uous attention in f ans an d user’ s capacity of spreading inf or mation. 2.1.1. The frequenc y of updating micr ob log In sina microb log netw or k, this pa per considers tw o f actors: the frequency of releasing the messages in microb log and the times of f orw arding, commenting on and replying messages . In gener al, through releasing the messages in microb log to e xpress the attitudes or ideas of e v ents , users think that the more messages the y release in microb log, the more ideas the y can e xpress and the y ha v e more chance to influence other users . The users f orw ard the interesting messages in microb log to share with their f ans , which ma y mak e their f ans f orw ard and comment on the messages . In this case , the f ans of their f ans can see these messages . Theref ore , the user influence can be quic kly increased in the social netw or k. This paper will define the frequency of updating microb log as the total times of f orw arding, releasing, commenting on and replying messages in microb log in the statistical per iod unit such as one w eek f or users . TELK OMNIKA V ol. 14, No . 2, J une 2016 : 725 733 Evaluation Warning : The document was created with Spire.PDF for Python.
TELK OMNIKA ISSN: 1693-6930 727 The calculating f or m ula can be e xpressed as f ollo ws . A i = n i T i (1) In this f or m ula, A i represents the frequency of updating microb log in the statistical per iod unit f or one user . T i represents a statistical per iod. n i represents the total n umber of f orw arding, releas- ing, commenting on and replying messages in microb log in the statistical per iod unit f or one user . 2.1.2. The degree of contin uous attention in fans The deg ree of contin uous attention in f ans denotes the deg ree that f ans are interested in the pre vious released messages in microb log. This f actor is defined as the r atio which the times of the user j f orw arding, commenting on and replying messages in user i’ s microb log accounts f or the total times of releasing messages in microb log in the sta tistical per iod unit. It can be represented as the f ollo wing f or m ula. R ( i; j ) = R t ( i; j ) + 1 S C ( i ) + 1 (2) In this f or m ula, Rt (i, j) represent s the total times which the user j f orw ards and comments on messages in user i’ s microb log and SC (i) represents the total times which the user i releases and f orw ards the messages in microb log in the statistical per iod unit. If the times which user j f orw ards or comments on the messages in user i’ s microb log is high in the statistical per iod unit, it is reasonab le to belie v e that user j will do the same thing in future . R (i, j) represents the deg ree of attention of user j to user i in the f or m of probability . 2.1.3. The user’ s capacity of spreading inf ormation The concept of user’ s capacity of spreading inf or mation can be defined as the frequency of updating microb log in combination with the deg ree of contin uous attention in f ans . S (i, j) represents the m ultiply betw een the user i’ s frequency of updating microb log and the deg ree of contin uous attention of user j to user i. The f or m ula can be descr ibed as f ollo ws . S ( i; j ) = A i R ( i; j ) (3) In this f or m ula, the user’ s capacity of spreading inf or mation reflects the a v er age which user i deliv ers the v olume of inf or mation to f an j in statistical per iod. 2.2. Resear c h on user influence based on P a g eRank 2.2.1. P a g eRank algorithm and the solution of RankSink phenomenon In P ageRank algor ithm [11], PR v alue is used f or representing the author ity of page . The calculating w a y of distr ib ution of PR v alue in P ageRank algor ithm can be descr ibed as f ollo ws . P R ( v ) = c X u 2 U ( v ) P R ( u ) N ( u ) (4) In this f or m ula, PR(v) means the PR v alue of page v , U(v) ref ers to the page set of redirect- in v . u represents the URL redirecting from pa ge u to page v . N (u) means the n umber of URL redirected from page u. In the research on the directed str ucture of w eb page , Larr y P age and other researchers f ound a phenomenon that the redirecting relations of some pages could compose a cycle . According to the f or m ula (4), the PR v alue of page in this compositiv e cycle will be increased all the time in the iter ativ e process while the PR v alue of page outside this compositiv e cycle will be decre ased. Finally , e xcept f or the page of this compositiv e cycle , the Research on Comm unity Detection Algor ithm Based on the UIR-Q (Zilong Jiang) Evaluation Warning : The document was created with Spire.PDF for Python.
728 ISSN: 1693-6930 PR v alues of other pages are equal to 0. Larr y P age called this phenomenon as the RankSink phenomenon. In order to eliminate the prob lem, Larr y P age introduced t he damping f actor into this algor ithm to represent the probability of r andom access to the page and modified the f or m ula (4) with the f ollo wing f or m ula (5). P R ( v ) = (1 d ) + d X u 2 U ( v ) P R ( u ) N ( u ) (5) In the f or m ula (5), d, as the damping f actor , represents that when user clic ks some page the redirected URL will be clic k ed with the probability d and the page will redirect other URL with the probability from 1 to d. In calculating the user influence of social netw or ks , there are tw o reasons wh y this paper uses the P ageRank algor ithm. Firstly , in the microb log, users ha v e their o wn relation of attention and user-f ans , which are re- spectiv ely similar to the relation of redirect-in and redirect-out. Thu s , it is reasonab le to belie v e that microb log and page ha v e similar netw or k str ucture . Secondly , e v aluating a user’ s influence in t he social netw or ks is similar to the e v aluation of author ity which w eb is in the netw or k in the P ageRank algor ithm. Theref ore , using the P ageRank algor ithm to calculate the user influence of microb log is equal to calculate the users PR v alue . Then, the user with big influence can be f ound through r anking the PR v alue so that the inf or mation prediction can be made . 2.2.2. The impr o ved user influence rank algorithm based on P a g eRank In the P ageRank algor ithm, the PR v alue of e v er y page is distr ib uted to the redirect- in URL of page on a v er age . In order to a v oid the disturbance caused b y z ombie f ans and online w ater ar m y and impro v e the accur acy of e v aluating user influence in the analysis of user influence of social netw or ks , this paper introduces the user influence f actors to impro v e the P ageRank algor ithm and proposes the impro v ed user influence r ank algor ithm based on P ageRank. The basic idea of UIR algor ithm is that the deg ree of contin uous attenti on in f ans is regarded as damping f actor ; when th e algor ithm distr ib utes the v alue of influence , user’ s capacity of spreading inf or mation is tak en into consider ation and users with strong capacity of spreading inf or mation can be distr ib uted more UIR v alue in a big probability while those who ha v e w eak capacity can be distr ib uted less UIR v alue in a big probability as w ell. In the impro v ed user influence r ank algor ithm based on P ageRank, the f or m ula of calculating user influence r ank can be e xpressed as f ollo ws . U I R ( u ) = (1 R ( u; v )) + R ( u; v ) X v 2 E ( u ) B ( u; v ) U I R ( v ) (6) In this f or m ula, E(u) is the use u’ s f ans set and R(u,v) means the deg ree of contin uous attention in f ans which is deter mined b y the frequency of f ans f orw arding messages from users’ microb logs . B(u,v) represents the user v’ s capacity of spreading inf or mation to user u and is deter mined b y the r atio of user u’ s capacity of spreading inf or mation accounting f or the sum of all f ans capacity of spreading inf or mation. The calculating f or m ula can be descr ibed as f ollo ws . B ( u; v ) = S ( u; v ) P f 2 E ( v ) S ( f ; v ) (7) All users’ UIR v alue can be initializ ed to 1 and after being repeatedly iter ativ e calculated b y f or m ula (7), the UIR v alue tends to con v ergence . At that time , the UIR algor ithm will be ter minated and all users’ UIR v alues of netw or k can be obtained. The specific algor ithm can be descr ibed as f ollo ws . 2.3. The algorithm idea of comm unity detection algorithm based on UIR-Q Clauset put f orw ard comm unity detection algor ithm CNM based on local modular ity [5,12]. This algor ithm firstly defines local modular ity Q of a local comm unity and finds out all neighbor ing TELK OMNIKA V ol. 14, No . 2, J une 2016 : 725 733 Evaluation Warning : The document was created with Spire.PDF for Python.
TELK OMNIKA ISSN: 1693-6930 729 Algorithm 1 UIR algor ithm Input: List < Node > U//U is all users’ set. Output: List < Node > U//U is a set of users whose influence v alue is calculated. BEGIN F or each u in U Obtaining the f ans set E from user u Calculating the updating frequency A i according to the f or m ula (1). F or each e in E Calculating R(i,j) according to f or m ula (2) Calculating S(i,j) according to f or m ula (3) Endf or Endf or While the UIR v alue cannot reach the stab le situation do f or u in U Obtaining the f ans set E from user u f or e in E Calculating user u’ s influence deliv ered b y user e according to f or m ula (6) and updating Endf or Endf or Endwhile retur n U ENG BEGIN nodes of this local comm unity , and then calculate the ne w local modular ity of n e w local comm unity when e v er y neighbor ing node is ad ded into this local comm unity . In this case , the neighbor ing node with the biggest v alue of ne w local modular ity can be really added into the local comm unity . This p rocess is iter ativ ely added until the v alue of Q of the local comm unity cannot be increased. At th at time , the local comm unity reaches satur ation. The v alue of Q can be defined as the f ollo wing f or m ula. Q = L in L in + L out (8) In this f or m ula, L in represents the n umber of edges whose connection happens in all nodes in the comm unity and L out represents n umber of edges whose connection hap pens betw een the nodes within the comm unity and nodes outside the comm unity . The v alue of Q represents the boundar y char acter istic of comm unity , that is , if a node is added into the comm unity to mak e the v alue of Q increase , more edges are added into the comm unity , which leads to the situation that the edges outside the comm unity become sparse . Based on the idea of abo v e-mentioned CNM algor ithm [12, 13], the paper proposes a kind of comm unity detection algor ithm (UIR-Q) integ r ated with user influence . The basic idea of comm unity detection algor ithm based on UIR-Q is descr ibed as f ollo ws . 1) Initializing the netw or k proper ties . Through the process of read-in the netw or k, e v er y node in the netw or k is assigned to an ID and the proper ties of deg ree of node , influence and the comm unity label. 2) Selecting initial node . The UIR v alue of e v er y node is put in order from big to small and a set queue is obtained. F rom the set queue , the node that the v alue of comm unity label is n ull and the UIR v alue is maxim um is selected as the initial node of ne w comm unity e v er y times . If the queue is n ull, step 8 is carr ied out. Otherwise , the node of queue v i with the maxim um UIR v alue is used f or initializing the comm unity C j . The Q v alue of C j is initializing to 0. 3) Finding the candidate neighbor nodes set. The e xter nal node which has connection with the nodes of the C j is added into the candidate neighbor nodes set B . 4) F or ming the comm unity str ucture . The ne w local modular ity q v is calculated when e v er y node v in the set B are added into the comm unity C j and the maxim um q v is selected. If q v > Q c , the q v Research on Comm unity Detection Algor ithm Based on the UIR-Q (Zilong Jiang) Evaluation Warning : The document was created with Spire.PDF for Python.
730 ISSN: 1693-6930 is used f or updating the Q c , and the corresponding node v of is added into the comm unity C j . The comm unity label of n ode v is added into the j and the node v in the queue is deleted. If q v < Q c , step 6 is carr ied out. 5) Repeating step 3) and step 4). 6) Obtaining the comm unity j. 7) Repeating step 2), step 3), step 4), step 5) and step 6). 8) Obtaining cluster ing result. This algor ithm can be descr ibed as the f ollo wing algor ithm 2. Algorithm 2 Comm unity detection algor ithm based on UIR-Q Input: List < Node > U //U is the set of all users Output: List < Comm unity > CS //CS is the set of detected comm unity BEGIN U=UIR(U)//Calling algor ithm 1 to calculate user influence Putting U in order WHILE there is node in the U , do Initializing the ne w comm unity C Finding the node i with the maxim um UIR v alue from U and adding it to the comm unity C WHILE there is no satur ation in the C Calculating B , set of neighbor ing node of C Calculating the ne w Q v alue ( q v ) when e v er y node in the B is added into the C Finding out the maxim um q v IF q v > Q c Adding the corresponding node of q v into the C Deleting the corresonding node of q v from U Q = q v ELSE C reaches satur ation ENDIF END WHILE Sa ving the comm unity C into the comm unity set Cs Deleting the node i from U END WHILE RR TURN Cs ENG BEGIN 3. Experiment and anal ysis 3.1. Har d ware and software en vir onment The cluster system of e xper imental platf or m consists of se v en personal computers among which a computer is used as master and the other six computers as sla v ers . Ub untu 12.04 is installed into e v er y personal computer and The en vironment of Spar k [14], Hadoop [15] and Y ar n is estab lished in Ub untu. The comm unity detection algor ithm based on UIR-Q will be par alleliz ed under the en vironment. The specific configur ation of hardw are and softw are is sho wn in the tab le 1. T ab le 1. The configur ation of cluster Name Specific configur ation PC Intel core i3 3.2 GHz, 8G RAM, 500G Hard Disk Ether net 100 Mb/s Oper ation system Ub untu 12.04 L TS J a v a JDK 1.8 Hadoop (Including Y ar n) Hadoop 2.2.0 Spar k Spar k 1.0 T ab le 2. The compar ison of diff erent algor ithms Algor ithm NMI Q ov The n umber of comm unities CNM 0.735 0.729 23 LCE 0.827 0.787 37 LMDMR 0.751 0.704 33 UIR-Q 0.859 0.771 28 TELK OMNIKA V ol. 14, No . 2, J une 2016 : 725 733 Evaluation Warning : The document was created with Spire.PDF for Python.
TELK OMNIKA ISSN: 1693-6930 731 3.2. Dataset of e xperiments All data of this e xper iment is a dataset which contains 3458 users’ sina microb logs and includes inf or mation of microb logs , the retr ansmission relation of microb logs , users’ inf or mation, users’ fr iendship and other data. 3.3. Ev aluation metrics 3.3.1. Normaliz ed Mutual Inf ormation (NMI) Nor maliz ed Mutual Inf or mation (NMI), proposed b y Lancichhetti f or detecting the e v al- uation inde x of comm unity , can be used f or eff ectiv e e v aluati ng the accur ateness of detecting o v er lapping comm unity [16]. NMI represents the ”m utual inf or mation” betw een the set of comm u- nity detected b y the algor ithm and the set of real comm unity str ucture and its v alue sho ws the deg ree of consistency betw een these tw o comm unities . The NMI v alue is usually betw een 0 and 1. The bigger the NMI v alue is , the better the result of comm unity detection is . Its definition can be sho wn as the f ollo wing f or m ula. N M I = I ( a ; b ) p H ( a ) H ( b ) (9) in this f or m ula, H ( a ) = P k ( a ) h n a h n log n a h n , H ( b ) = P k ( b ) l n b l n log n b l n , and I ( a ; b ) = P h P l n h;l n log ( n h;l n = ( n a h n n b l n )) . a , b respectiv ely denotes the diff erent comm unity str uctures . k ( b ) means the n umber of comm unities in the comm unity str ucture a . n a h denotes the n umber of nodes of the hth comm unity in th e comm unity str ucture a . n h;l denotes the n umber of nodes which sim ultaneously e xist in the h th comm unity of the comm unity str ucture a and the l th comm unity of the comm unity str ucture b . 3.3.2. Overlapping Modularity ( Q ov ) In order to con v enient ly e v aluate the o v er lapping comm unity str ucture , Nicosia, et al. proposed the Q ov function [17], which is similar to Q function. The v alue of Q ov function r anges from 0 and 1. The bigger the v alue of Q ov function is , the better the o v er lapping comm unity str ucture is . Its definition can be sho wn as the f ollo wing f or m ula. Q ov = 1 m X c 2 C X i;j 2 V ( ( C i ; C j ; C ) A ij out l ( i;j ) ;c k out i in l ( i;j ) ;c k in j m ) (10) In the f or m ula (10), m denotes then umber of edges in the g r aph. ( C i ; C j ; C ) denotes the affilia- tion relationship betw een node i and node j to comm unity C and it can be e xpressedb y the f or m ula (11). ( C i ; C j ; C ) = ( 1 ; i 2 C ; j 2 C 0 ; other w ise (11) When node i and nod e j belong tothe same comm unity , A ij is equal to 1,otherwise , it is 0. k out i denotes the out-deg ree of node i, that is , the n umber of connected edges betw een node i and the nodes be y ond the comm unity C . The f or m ula of calculating out l ( i;j ) ;c is sho wn as the f ollo wing f or m ula (12). out l ( i;j ) ;c = P j 2 V F ( / i;c ; / j ;c ) j V j (12) In the f or m ula (12), / i;c denotes the strength coefficient of node i belonging to comm unity C and its v alue is assigned to 1 =n . n means the n umber of comm unities to which node i belongs . k in j denotes the in-deg reeof node j, that is , the n umber of connected edges betw een node j and the Research on Comm unity Detection Algor ithm Based on the UIR-Q (Zilong Jiang) Evaluation Warning : The document was created with Spire.PDF for Python.
732 ISSN: 1693-6930 Figure 1. The comm unities str ucture Figure 2. The inter nal str ucture of comm unities nodes inthe comm unity C . The f or m ula of calculating in l ( i;j ) ;c is sho wn as the f ollo wing f or m ula (13). in l ( i;j ) ;c = P i 2 V F ( / i;c ; / j ;c ) j V j (13) Function F is sho wn as the f ollo wing f or m ula (14). F ( / i;c ; / j ;c ) = 1 (1 + e f ( / i;c ) )(1 + e f ( / j ;c ) ) (14) Function f is selected as the f ollo wing f or m ula (15) according to the liter ature [27]. f ( x ) = 60 x 30 (15) 3.3.3. The n umber of comm unities The n umber of comm unities denot es the n umber of sub-comm unities of e x ecuting the comm unity detection algor ithms on the dataset of social netw or k. 3.4. The result and anal ysis The result obtained from the comm unity detection algor ithm based on UIR-Q which is e x ecuted on the dataset of sina microb log is sho wn in a visib le w a y as the f ollo wing figure 1, figure 2 [18]. According to these tw o figures , it can be f ound that the prob lem of o v er lapping comm unity is impro v ed and the fr iends are roughly e v enly distr ib uted around the user . The compar ison betw een the comm unity detection algor ithm based on UIR-Q and the abo v e-mentioned comm unity detection algor ithms based on local modular ity can be sho wn as the tab le 2. According to the tab le 2, the NMI v alue is the highest in the comm unity detection algor ithm based on UIR-Q, while its Q ov v alue is just lo w er than LCE algor ithm. In LCE algor ithm, 37 comm unities in cluding unstab le comm unities are detected. These unstab le comm unities should attach to other bigger comm unities . In the UIR-Q algor ithm, there is a relativ ely good balance among NMI v alue , modular ity and the n umber of comm unities . 4. The conc lusion and the future w ork This paper proposes an impro v ed comm un ity detection algor ithm based on user influ- ence and local modular ity . Through utilizing the proper ties of social netw or ks to f or m the user influence f actors , this paper emplo ys the P ageRank algor ithm to calculate the UIR v alues of all users . The node with the maxim um UIR v alue in the netw or k is used f or initializing a comm unity , and the user is selected as the center of comm unity . Then, the node which mak es the v alue of local modular ity Q in ter ms of the CNM algor ithm is added into the comm unity . This process does not ter minate until all nodes belong to one comm unity or more comm unities . Finally , par alleliza- tion of the algor ithm is implemented under Spar k fr ame w or k. The e xper imental result sho ws that compared with the CNM algor ithm, LCE algor ithm and LMDMR algor ithm, the comm unity detec- tion algor ithm based on UIR-Q can obtain good modular ity and small n umber of comm unity in the TELK OMNIKA V ol. 14, No . 2, J une 2016 : 725 733 Evaluation Warning : The document was created with Spire.PDF for Python.
TELK OMNIKA ISSN: 1693-6930 733 social netw or k with uncer tain str ucture and large scale . Ho w e v er , the phenomenon of o v er lapping comm unities cannot be completely eliminated in this algor ithm. Theref ore , the future w or k will be f ocused on fur ther impro ving the quality of comm unity detection. Ac kno wledg ements This w or k is suppor ted in par t b y Humanities and Social Science Y outh Fund Project of Ministr y of Education, P . R. C , No .13YJCZH028. Ref erences [1] Lim K H., Datta A.. F ollo wing the f ollo w er : Detecting comm unities with common interests on T witter . In Proc. of the 23rd A CM Conf . on Hyper te xt and Social Media , Ne w Y or k: A CM Press . 2012; 317-318. [2] ]Dour isboure Y ., Ger aci F ., P elleg r ini M.. Extr action and classification of dense comm unities in the W eb . In: Proc. of the 16th Intl Conf . on W or ld Wide W eb , Ne w Y or k: A CM Press . 2007; 461-470. [3] Asur S ., P ar thasar ath y S ., Ucar D .. An e v ent-based fr ame w or k f or char acter izing the e v o- lutionar y beha vior of inter action g r aphs . A CM T r ans . on Kno wledge Disco v er y from Data (TKDD) . 2009; 3(4):16. [4] Ne wman MEJ , Gir v an M.. Finding and e v aluating comm unity str ucture in netw or ks . Ph ysical Re vie w E . 2004; 69(2):026113. [5] Clauset A.Ne wman M E J , Moore C .. Finding comm unity str ucture in v er y large netw or ks . Ph ysical Re vie w E . 2004; 70(6): 066111. [6] Chen Q., F ang M.. An Efficient Algor ithm f or Comm unity Detection in Comple x Netw or ks . THE 6TH sna-kdd W or kshop . 2012; 733-740. [7] Y u Hu. Research on comm unity dectection algor ithms in comple x netw or k based on local e xpansion. [D].Changchun: Jilin Univ ersity , 2015. [8] Mee y oung C .. Measur ing user influence in twitter : The million f ollo w er f allacy . Proceedings of Inter national Conf erence on W eb logs and Social Media . 2010; 887 -893. [9] Shaozhi Y e , S . F elix W u. Measur ing Mseeage Propagation and Social Influence on T wit- ter .com. SocInf o 2010 . 2010; 223-228. [10] Ar lei Silv a,Sar a Guimares , W agner Meir a,Jr . Mohammed Zaki. ProfileRank: Finding Rele- v ant Content and Influential Users Based on Inf or mation Diffiision. Proceedings of the A CM Inter national Conf erence on 7t h W or kshop on Social Netw or k Mining and Analysis . 2013; 667-773. [11] P age L., Br in S ., Motw ani R, et al.. The pager ank citation r an king: Br inging order to the w eb . Stanf ord Digital Libr ar ies . 1999; 344-349. [12] Aaron Clauset A.. Finding local comm unity str ucture in netw or ks . Ph ys . Rec. E . 2005; 72(2): 26132-26137, [13] L yr ic D ., Jonas K., Stef an N., et al.. Predictng mo vie pr ices through dynamic social netw or k analysis . Procedia Social and Beha vior al Sciences . 2010; 2: 6423-6433. [14] Spar k. [EB/OL]. http://spar k.apache .org/. [15] Dhr uba Bor thaku. The Hadoop Distr ib uted File System: Architecture and Design. Retr ie v ed from. http://hadoop .apache .org/common/ . 2010. [16] A. lancichineet, S . F or tunat, J . K er tesz. Detecting the Ov er lapping and Hier achical Comm u- nity Str ucture in Comple x Netw or ks . Ne w Jour nal of Ph ysics . 2009; 11(3):033015. [17] V Nieosia, G mangionoi, V Carehiolo , M Malger i. Extending the d efinition of modular ity to Directed g r aphs with o v er lapping comm unities . Jour nal of Statistical mechanics: Theor y and Exper iment . 2009; P03024. [18] Shneider man B , Ar is A.. Netw or k Visualization b y Semantic Substr ates . IEEE T r ansactions on Visualization and Computer Gr aphics . 2006; 12(5):733-740. Research on Comm unity Detection Algor ithm Based on the UIR-Q (Zilong Jiang) Evaluation Warning : The document was created with Spire.PDF for Python.