International Journal of Electrical and Computer Engineering (IJECE)
Vol. 5, No. 2, April 2015, pp. 361-370
ISSN: 2088-8708

A Novel Spectral Clustering based on Local Distribution

Parthajit Roy* and J. K. Mandal**
* Department of Computer Science, The University of Burdwan, Burdwan, West Bengal, India-713104
** Dept. of Comp. Sc. & Engg., The University of Kalyani, Nadia, West Bengal, India-741235

Article Info

Article history:
Received Nov 22, 2014
Revised Jan 20, 2015
Accepted Feb 5, 2015

Keywords:
Spectral Clustering
Affinity
Mahalanobis Distance
Outlier Detection
Random Walk Laplacian
Clustering Indices

ABSTRACT

This paper proposes a variation of the spectral clustering model based on a novel affinity metric that considers the distribution of the neighboring points to learn the underlying structures in the data set. The proposed affinity metric is calculated using the Mahalanobis distance, which exploits the concept of outlier detection for identifying the neighborhoods of the data points. The Random Walk Laplacian of the representative graph and its spectrum are used for the clustering purpose, and the first k eigenvectors are taken into the second phase of clustering. The model has been tested with benchmark data, and the quality of its output has been assessed with various cluster validity indexes.

Copyright (c) 2015 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author:
J. K. Mandal
Dept. of Comp. Sc. & Engg., The University of Kalyani
Nadia, West Bengal, India-741235
e-mail: jkm.cse@gmail.com

1. INTRODUCTION

Data clustering is the process of grouping nonuniform data elements by identifying the underlying structure [1]. Data clustering is a difficult task from a computational point of view, because the problem is computationally hard. There are different approaches to data clustering, and a whole branch of computer science has been devoted to it. A general reference on data clustering is due to Everitt et al. [2]. Jain et al. surveyed clustering, the taxonomy of clustering, and recent trends [3]. A more recent survey is due to Xu et al. [4]. Some recent applications of clustering are due to Farshchian et al. [5] and Sharma et al. [6].

There are different approaches towards clustering. Some of the popular approaches are hierarchical, density-based, squared-error based, fuzzy-based, and graph-based clustering [4]. The purpose of the present study is to cluster spatial data sets using a graph-based model. The cluster shape and its orientation are the most important parameters in the present work. The objective of the present research is to handle clustering problems where the cluster shapes are non-convex; moreover, the present model considers the trend of the data sets and not just the proximities.

The graph-based model has gained enormous popularity in recent years. A graph is a very good algebraic structure for defining proximity among data points, and proximity among points is the key to the success of almost every clustering algorithm. There are different areas of graph theory that can be exploited for data clustering. Delaunay graph based models are available due to Yang et al. [7] and Roy et al. [8]. Minimum spanning tree based clustering techniques have been proposed by Foggia et al. [9] and Roy et al. [10].
Some early minimum as well as maximum spanning tree based approaches were proposed by Asano et al. [11]. An outstanding survey on graph clustering is due to Schaefer [1], where the recent developments are outlined explicitly.

Spectral clustering is one of the major branches of graph-based clustering, where the spectrum, i.e. the eigenvalues and eigenvectors of the Laplacian of the graph, is used for clustering. There are several papers on this topic; some of them are due to Higham et al. [12], Toussi et al. [13] and Qiu et al. [14]. A very good survey on spectral clustering is due to Luxburg [15]. A general survey on the graph Laplacian has been done by Chung [16].
Some variation of the traditional spectral clustering is available due to Spielman et al. [17] and Fiedler [18].

Affinity or distance among the data points is the most important input to any clustering model. This paper uses a novel affinity matrix for spectral clustering based on the Mahalanobis distance, proposed by Mahalanobis [19]. A good discussion on the Mahalanobis distance can be found in a book by Marsland [20]. The Mahalanobis distance has been used effectively for outlier detection by Hodge et al. [21].

The quality of the output produced by any clustering algorithm is measured by several validity indices. There are two major types of cluster validity indices: internal and external. Internal indices measure the quality of the clusters produced from the distribution of the data and the density/scatter of the data points. External indices, on the other hand, consider some labeled data, and the quality of the clusters is measured on that basis. A good comparative study on the performance of various clustering indices is done by Saha et al. [22].

The rest of the article is organized as follows. Section 2 discusses the mathematical background necessary for the model. Section 3 presents and discusses the proposed model. Section 4 discusses the results and analysis. The conclusion comes in Section 5, and the references are drawn at the end.

2. MATHEMATICAL BACKGROUND

This section uncovers the necessary mathematics required to develop and explain the proposed model. The proposed model is a spectral clustering model based on a novel affinity metric that calculates the proximity among the data points in a novel way. The rest of this section discusses spectral graphs, similarity measures and other necessary mathematical background in the following manner. Subsection 2.1 discusses clustering as an optimization problem, subsection 2.2 discusses spectral graph theory, and subsections 2.3 and 2.4 discuss the similarity and distance measures.

2.1. Data Clustering

Given a set of data points, the goal of clustering is to identify the grouping by exploring the underlying structures in the data set, if any exist. The main assumption in clustering is that the property of the underlying structure of the data set is not known; only the number of clusters may sometimes be known. Formally, clustering can be defined as follows: given $S = \{p_1, p_2, p_3, \ldots, p_n\}$, the set of $n$ data points in the $m$-dimensional space $\mathbb{R}^m$, clustering is the partition of $S$ into $k$ different groups, $k$ being the number of clusters, $C = \{C_1, C_2, C_3, \ldots, C_k\}$, with respect to a distance metric $d(p_i, p_j)$, in such a way that the inter-cluster distance becomes maximum and the intra-cluster distance becomes minimum over all the partitions [3]. That is, the objective of clustering is to minimize the ratio of intra-cluster distance to inter-cluster distance, as shown in equation (1).

$$\text{Minimize } Z = \sum_{\forall C_i \in C} \frac{\text{IntraCluster Distance}}{\text{InterCluster Distance}} \quad \text{w.r.t. } d(p_i, p_j) \qquad (1)$$

2.2. Spectral Graph Theory

A graph is an algebraic structure $G = \langle V, E \rangle$, where $V$ is called the vertex set and $E$ is called the edge set. The set $E$ can be defined as a relation $\rho$ over the cartesian product $V \times V$, with $x \rho y \Leftrightarrow (x, y) \in E$, which implies that there is an edge between $x$ and $y$ of the vertex set $V$ [23].
Parallel edges imply more than one edge between two vertices; in our case, parallel edges are not allowed. A graph contains a self loop if $x \rho x$ for some $x \in V$. A graph $G = \langle V, E \rangle$ is called an undirected graph if the relation $\rho$ is symmetric. A graph is called a simple undirected graph, or S-graph, if it is an undirected graph without any self loops or parallel edges. A graph is weighted if there is a weight function $d(v_i, v_j)$ associated with every edge $e \in E$. A weight function $d(\cdot,\cdot)$ is a metric if it satisfies the following properties:

Non-negativity: $\forall x, y \in \mathbb{R}^m,\ d(x, y) \ge 0$.
Symmetry: $\forall x, y \in \mathbb{R}^m,\ d(x, y) = d(y, x)$.
Triangle inequality: $\forall x, y, z \in \mathbb{R}^m,\ d(x, y) + d(y, z) \ge d(x, z)$.
Given a weighted graph $G$, the adjacency matrix representation of the graph is a square matrix of order $n = |V|$ whose entry $w_{ij}$ ($w_{ij} \ge 0$) is the weight of the edge $e(i, j)$. That is, the adjacency matrix $W$ can be defined as

$$W_{n \times n} = (w_{ij}),\quad i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, n \qquad (2)$$

The degree $d_i$ of a vertex $v_i \in V$ of the graph represented as per equation (2) can be defined as the sum of the weights of the edges incident on that vertex, as shown in equation (3).

$$d_i = d(v_i) = \sum_{j=1}^{n} w_{ij} \qquad (3)$$

The degree matrix $D$ is a diagonal matrix with the degrees $d_i$ on its leading diagonal.

The tool for spectral clustering is the Laplacian matrix of the graph. A Laplacian of a graph is a symmetric matrix that can be formed from the adjacency matrix. The eigenvalues and eigenvectors of the Laplacian are the most important tools for analyzing the structure of a graph. In particular, cut-related quantities can be analyzed algebraically by computing the eigenvalues and eigenvectors of the graph Laplacian. Several graph Laplacians exist, and every Laplacian has its own strengths and weaknesses in a particular situation [15]. A detailed literature on spectral graph theory has been given by Chung [16].

Unnormalized Laplacian: The unnormalized Laplacian of a graph, denoted $L$, can be defined as per equation (4),

$$L = D - W \qquad (4)$$

where $D$ is the diagonal degree matrix and $W$ is the adjacency matrix. The important thing to note is that the Laplacian defined above is a real symmetric matrix whose diagonal elements are all non-negative and whose off-diagonal elements are all non-positive. Also, the sum of every row is zero. The spectrum consists of the eigenvalues and eigenvectors of this Laplacian, and it has several important properties. Some of the properties of this Laplacian [24][25][15] are given as follows.

1. $f^{T} L f = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (f_i - f_j)^2,\ \forall f \in \mathbb{R}^n$.
2. $L$ is symmetric and positive semi-definite.
3. The smallest eigenvalue of $L$ is $0$.
4. All eigenvalues are real, with $0 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$.
5. If the graph is not connected, then the number of $0$ eigenvalues, i.e. the multiplicity of the eigenvalue $0$, equals the number of connected components.

Normalized Laplacian: The spectrum of the unnormalized Laplacian of a graph does not normalize with respect to the degree matrix $D$. The normalized Laplacians overcome this. Two of the normalized Laplacians are very popular: the Symmetric Laplacian and the Random Walk Laplacian. The Symmetric Laplacian $L_{sym}$ is a symmetric matrix defined as per equation (5).

$$L_{sym} = D^{-1/2} L D^{-1/2} \qquad (5)$$

The Random Walk Laplacian models a random walk on the graph: starting from an arbitrary vertex of the graph, if we repeatedly move to a randomly chosen adjacent vertex, then in a densely connected subgraph of the original graph the walk will, in most cases, stay within that same subgraph. This property is modeled by the Random Walk Laplacian. The random walk Laplacian $L_{rw}$ can be defined as per equation (6).

$$L_{rw} = D^{-1} L = I - D^{-1} W \qquad (6)$$

All of the properties of the unnormalized Laplacian hold for $L_{rw}$ as well. In the present paper, the random walk Laplacian and its spectrum are used for clustering.
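For concreteness, the following short NumPy sketch (not part of the original paper) builds the unnormalized and random walk Laplacians of equations (4) and (6) from a weighted adjacency matrix; the guard for zero-degree vertices is an added assumption.

import numpy as np

def unnormalized_laplacian(W):
    # L = D - W for a symmetric, non-negative weight matrix W (equation 4).
    D = np.diag(W.sum(axis=1))
    return D - W

def random_walk_laplacian(W):
    # L_rw = I - D^{-1} W (equation 6).
    degrees = W.sum(axis=1)
    # Avoid division by zero for isolated vertices (an added safeguard).
    inv_deg = np.where(degrees > 0, 1.0 / degrees, 0.0)
    return np.eye(W.shape[0]) - np.diag(inv_deg) @ W

For a graph with c connected components, the eigenvalue 0 of either Laplacian appears with multiplicity c (property 5 above), which is why the eigenvectors of the smallest eigenvalues reveal the cluster structure.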
2.3. Similarity Measurement

The success of graph-based clustering algorithms lies in the choice of the similarity matrix. The similarity between two data points is the only basis for creating an edge between them in the graph, and the closer the edges are to the actual relationship, the higher the success probability of the algorithm. For this reason, similarity measurement demands adequate research. Several types of similarity measures are available; some of them exploit only the distance between the data points. Two of them are discussed below.

$\epsilon$-neighborhood: In this case, if the distance between two data points is less than a predefined benchmark value $\epsilon$, then the weight is assigned; otherwise 0 is assigned.

Gaussian similarity: In this case, the similarity between the data points is computed as

$$g(\vec{X}, \vec{Y}) = e^{-\frac{\|\vec{X} - \vec{Y}\|^2}{2\sigma^2}} \qquad (7)$$

The value of $\sigma$ in equation (7) is known as the strength of inclusion: the larger the value of $\sigma$, the more weight is given to the edge between the data points, i.e. far points will also come under consideration with heavy weight.

2.4. Distance Metric

Given two data points, the ultimate tool for designing the Laplacian matrix is the similarity among data points, and the similarity is based on the distance among data points, which is the weight of the edge between them. Several distance functions are available, but the most popular is the Euclidean distance, defined as

$$E(i, j) = \left( \sum_{k=1}^{m} |x_i(k) - x_j(k)|^2 \right)^{1/2} \qquad (8)$$

The generalization of the Euclidean distance is called the Minkowski distance, defined as

$$M(i, j) = \left( \sum_{k=1}^{m} |x_i(k) - x_j(k)|^n \right)^{1/n} \qquad (9)$$

There are other distance functions as well, but these two are the most popular.

3. PROPOSED MODEL

This section presents the proposed model. The Euclidean (equation 8) and Minkowski (equation 9) distances of the previous section give the proximity, but their main disadvantage is that they measure the distance from a single point. Even if the distance needs to be measured from a distribution, they first convert the distribution to a single point (like the mean or median) and calculate the distance from it. This mean or median is called the representative of the distribution, and often such representatives do not give good results.

Consider the following situation. Let there be 12 points in two dimensions. Some of the values are shown in Table 1 and the distribution is shown graphically in Figure 1 (a). Suppose the black-filled spots form some distribution; in our case this will be the neighbor set of some point.

Table 1. Sample points in two dimensions

x | 3.0 | 2.9 | 2.8 | 2.7 | ... | 2.0
y | 1.0 | 1.1 | 1.0 | 1.0 | ... | 1.1

We would like to find the distance of any other point, like $P_1$ or $P_2$, from this distribution. Clearly the point $P_2$ is closer to this distribution than $P_1$. But how can that be measured mathematically? The traditional way is to find the mean of the distribution and then the distance of the given point from the mean. That is not good here, because points $P_1$ and $P_2$ are equidistant from the mean, so either both will be rejected or both will be accepted. Figure 1 (a) shows this.
Figure 1. (a) Distribution of the point set of Table 1. (b) Points P1 and P2 are equidistant from the mean of the distribution.

The solution to this problem is the Mahalanobis distance [19]. The Mahalanobis distance is a distance measure that addresses the above problem in a better way. The steps for calculating the Mahalanobis distance are as follows. Given a data set $S = \{p_1, p_2, \ldots, p_n\}$ where each $p_i \in \mathbb{R}^m$, the mean is defined as per equation (10),

$$\mu = \frac{1}{n} \sum_{i=1}^{n} p_i \qquad (10)$$

and the variance is defined as per equation (11).

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (p_i - \mu)^2 \qquad (11)$$

The covariance matrix $M_{m \times m}$, where $m$ is the number of attributes of the data set, can be defined as

$$M = \big( cov(i, j) \big),\quad \forall i = 1, \ldots, m;\ j = 1, \ldots, m \qquad (12)$$

and the Mahalanobis distance between $x_1$ and $x_2$ is defined using equation (12) as

$$d = \sqrt{(x_1 - x_2)^T M^{-1} (x_1 - x_2)} \qquad (13)$$

In Figure 1 (b), the Euclidean distance between $P_1$ and the mean is $0.6$, and the distance of $P_2$ from the mean is also $0.6$. So any algorithm that considers the Euclidean distance will either include both of them or reject both of them, which is unrealistic. The Mahalanobis distance (equation 13), on the other hand, is $34.28$ for $P_1$ from the distribution, whereas the distance of $P_2$ from the distribution is $24.11$. This clearly states that the point $P_2$ is closer to the distribution than the point $P_1$.
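The computation behind equations (10)-(13) is straightforward; the following minimal sketch (an illustration, not the authors' code) returns the Mahalanobis distance of a point from the distribution of a point set. Note that np.cov uses the sample 1/(n-1) normalization, a slight deviation from the 1/n factor of equation (11).

import numpy as np

def mahalanobis_distance(point, points):
    # Mahalanobis distance of `point` from the distribution of `points` (one point per row).
    mu = points.mean(axis=0)                 # mean, equation (10)
    M = np.cov(points, rowvar=False)         # covariance matrix, equation (12)
    M_inv = np.linalg.inv(M)                 # fails if M is singular (see the Conclusion)
    diff = point - mu
    return float(np.sqrt(diff @ M_inv @ diff))

Applied to points such as those of Table 1, this function returns a much smaller value for a test point lying along the trend of the set than for an equidistant test point lying off the trend, which is exactly the behavior the worked example relies on.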
Figure 2. Point P3 falls outside the distribution and point P4 falls inside.

Another illustration is shown in Figure 2. Here, points $P_3$ and $P_4$ are considered. Clearly, $P_4$ falls inside the distribution and $P_3$ falls outside. The Mahalanobis distance states this fact clearly, as the distance of $P_3$ is $28.64$ and that of $P_4$ is $23.59$. If a suitable cutoff distance can be set, then $P_3$ can be excluded and $P_4$ included, which is not possible with the traditional similarity measures.

The present model exploits this property of the Mahalanobis distance in the neighborhood of each point, so that neighbors are determined on the basis of distribution similarity and not just distance similarity. The model assumes the number of clusters, $k$ in the present case, as a prerequisite. The model also assumes, as other clustering algorithms do, some knowledge in the form of parameter values for the creation of the similarity matrix. The algorithms are presented first, and then the working principle of the proposed model is discussed explicitly. The algorithm for the Mahalanobis distance based similarity matrix computation is given in Algorithm 1, and the clustering model is given in Algorithm 2.

Algorithm 1: Similarity Matrix Computation
Input: $S = \{\vec{P}_1, \vec{P}_2, \ldots, \vec{P}_n\}$ -- the $m$-dimensional data points to be clustered
Input: $\epsilon$ -- the benchmark distance value for the Euclidean distance
Input: $\delta$ -- the benchmark distance value for the Mahalanobis distance

1: declare $N_{n \times n}$, $M_{n \times n}$ as matrices
2: $N \leftarrow$ SIMILARITYMATRIX($S$, $\epsilon$)   -- calculates the $\epsilon$-neighbor similarity matrix in the traditional way
3: for all $\vec{P}_i \in S$ do
4:    declare $A$ as a set
5:    $A \leftarrow$ NEIGHBORHOODOF($\vec{P}_i$, $N$)   -- the neighborhood of $\vec{P}_i$ w.r.t. the already computed $N$
6:    $\mu_A \leftarrow$ mean of $A$;  $\Sigma \leftarrow$ COVARIANCEMATRIX($A$)
7:    for all $\vec{P}_j \in S$ do
8:       $d \leftarrow \big( (\vec{P}_j - \mu_A)^T \Sigma^{-1} (\vec{P}_j - \mu_A) \big)^{1/2}$
9:       if $d \le \delta$ then
10:         $M(i)(j) \leftarrow 1/d$
11:         $M(j)(i) \leftarrow 1/d$
12:      else
13:         $M(i)(j) \leftarrow 0$
14:         $M(j)(i) \leftarrow 0$
15:      end if
16:   end for
17: end for
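A hedged Python sketch of Algorithm 1 follows. The helper names, the strict-inequality $\epsilon$ test and the guards for tiny neighborhoods and singular covariance matrices are assumptions added for robustness; they are not spelled out in the paper.

import numpy as np

def epsilon_similarity(X, eps):
    # First pass: traditional eps-neighbor adjacency over the rows of X.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    N = (dist < eps).astype(float)
    np.fill_diagonal(N, 0.0)
    return N

def mahalanobis_similarity_matrix(X, eps, delta):
    # Second pass: edge weight 1/d for points within Mahalanobis distance
    # delta of each point's local (eps-neighborhood) distribution.
    n, m = X.shape
    N = epsilon_similarity(X, eps)
    M = np.zeros((n, n))
    for i in range(n):
        idx = np.flatnonzero(N[i])            # neighborhood A of point P_i
        if idx.size <= m:                     # too few points for an invertible covariance
            continue
        A = X[idx]
        mu = A.mean(axis=0)
        try:
            cov_inv = np.linalg.inv(np.cov(A, rowvar=False))
        except np.linalg.LinAlgError:         # singular covariance: skip this neighborhood
            continue
        for j in range(n):
            diff = X[j] - mu
            d = float(np.sqrt(diff @ cov_inv @ diff))
            if 0.0 < d <= delta:
                M[i, j] = M[j, i] = 1.0 / d   # inverse distance as the edge weight
    return M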
Algorithm 2: Spectral Clustering using Mahalanobis Distance
Input: $S = \{\vec{P}_1, \vec{P}_2, \ldots, \vec{P}_n\}$ -- the $m$-dimensional data points to be clustered
Input: $k$ -- the number of clusters

1: declare $M_{n \times n}$ as a matrix
2: $M \leftarrow$ MAHALANOBISSIMILARITYMATRIX($S$, $\epsilon$, $\delta$)   -- the Mahalanobis distance based similarity matrix, computed using Algorithm 1
3: $L_{rw} \leftarrow$ RANDOMWALKLAPLACIAN($M$)
4: CALCULATEEIGENS($L_{rw}$)   -- calculates the eigenvalues and eigenvectors
5: SORTEIGENVALUES   -- sorts the eigenvalues in ascending order
6: $D \leftarrow$ SMALLESTEIGENVECTORS($k$)   -- $D$ is the set of $k$ eigenvectors for the $k$ smallest eigenvalues
7: KMEANS($D$, $k$)   -- cluster the $k$ eigenvectors using the k-means algorithm

From Algorithm 1 and Algorithm 2 it is clear that the main tricky part is the formation of the similarity, or affinity, matrix. Let us discuss the working principle of this part first. Algorithm 1 calculates the final similarity or adjacency matrix in two passes. The first pass calculates the similarity matrix using any traditional method; in the present paper the $\epsilon$-neighbor method has been adopted, but other similarity measures may also be chosen. Thereafter, every point and its neighborhood are processed in the second pass to produce the final adjacency or similarity matrix. This computation is the tricky part. For a particular row, the matrix elements with nonzero values identify the vertices of the neighbor set; having this vertex set, say $A$, the present paper computes the covariance matrix of the neighborhood set $A$. The proposed model then considers the other points and their Mahalanobis distances from this neighborhood. A second, new similarity matrix is created in this way. Here also, any suitable similarity measure may be used; in the present paper the $\delta$-neighbor rule has been chosen, i.e. given the Mahalanobis distance of a point from a set of points, if the distance is less than $\delta$, then the point is considered a member of the neighborhood; otherwise the point is treated as an outlier of the point set and is not considered a neighbor.

From the above discussion it is clear that the Mahalanobis distance based similarity matrix is more realistic. It may include (or reject) a point in the neighborhood of another point not by measuring the distance alone but by considering the distribution also. This method uses the technique of outlier exclusion at the micro level for creating the affinity or similarity matrix.

Algorithm 2 is the actual clustering algorithm. The present paper assumes that the number of clusters $k$ is already known. It then calculates the random walk Laplacian of the Mahalanobis distance based similarity matrix. The reason for selecting the random walk Laplacian is that it identifies dense subgraphs in a better way and therefore gives better results in practice. The proposed method finds the eigenvalues of the Laplacian, sorts them in ascending order, identifies the smallest $k$ eigenvalues, and takes the corresponding eigenvectors as the input to k-means clustering; the result of this k-means clustering is the final output.
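Putting the pieces together, a hedged end-to-end sketch of Algorithm 2 could look as follows, reusing the helpers sketched earlier and scikit-learn's KMeans for the final phase; this is an assumed reconstruction, not the authors' implementation.

import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering_mahalanobis(X, k, eps, delta):
    # Algorithm 2: cluster the rows of X into k groups.
    M = mahalanobis_similarity_matrix(X, eps, delta)  # Algorithm 1
    L_rw = random_walk_laplacian(M)                   # L_rw = I - D^{-1} M, equation (6)
    eigvals, eigvecs = np.linalg.eig(L_rw)            # L_rw is not symmetric in general
    order = np.argsort(eigvals.real)                  # ascending eigenvalues
    U = np.real(eigvecs[:, order[:k]])                # eigenvectors of the k smallest eigenvalues
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)

A call such as labels = spectral_clustering_mahalanobis(X, k=2, eps=0.5, delta=3.0) illustrates the usage; eps and delta are hypothetical parameter values that must be tuned to the data, as discussed in the Conclusion.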
4. RESULTS AND ANALYSIS

The model has been tested with two data sets. One is a toy data set proposed by the authors, and the second is the spiral data set proposed by Chang and Yeung [26]. The performance of the proposed model has been compared with three other models, namely the K-means algorithm, hierarchical clustering, and one of the standard spectral clustering methods. More about K-means and hierarchical clustering can be found in [2, 3]. The standard spectral model is due to Shi and Malik [27].

The toy example consists of 75 data points in a two-dimensional plane. The data set has been constructed in such a way that the trend of the points is clear. In Figure 3 (a) it is clear that the small portion on the right side of the lower cluster (indicated as C in the figure) is part of the lower cluster and that the small middle cluster (indicated as B in the figure) is part of the upper straight line; the trend clearly suggests that. The important thing to note is that the small clusters are equidistant from the lower spiral cluster, i.e. portions B and C are equidistant from the lower spiral part of Figure 3 (a), and portion B is equidistant from the lower spiral part and the upper straight-line portion. So the traditional distance or the k-nearest neighborhood is not at all a good measure here, because neither considers the trend of the data set: they will either include both of the small clusters or exclude both of them. The proposed Mahalanobis distance based similarity matrix is a very good option in such situations because it will include one and exclude the other based on the point distribution. The result of the proposed model is shown in Figure 3 (b). The result shows that the proposed model clearly considers the distance as well as the trend of the cluster point distribution.
Figure 3. (a) The sample toy data. (b) The black and red colors are the two clusters.

The proposed model's performance is compared with the standard models, and the results are tested with seven internal indices and four external indices. Tables 2 and 3 show the results of the tests. The proposed model shows a better result in four out of the seven internal indices, and all four external indices indicate that the proposed model is the strongest in the proposed situation.

Table 2. Internal indices for the proposed toy data (on synthetic toy data)

Internal Indices   | Rule | K-Means Model | Hierarchical (Average Link) | Shi-Malik [27] Model | Proposed Model | Is the proposed model best?
Silhouette         | Max  | 0.4835682     | 0.4326811                   | 0.2461643            | 0.2506375      | No
Scott-Symons       | Min  | 800.4114      | 798.99                      | 861.9192             | 729.9601       | Yes
Wemmert-Gancarski  | Max  | 0.5940378     | 0.5102386                   | 0.1872991            | 0.2446224      | No
Tau                | Max  | -0.5383263    | -0.4735096                  | -0.2174559           | -0.1618588     | Yes
Gamma              | Max  | -0.7610649    | -0.6694097                  | -0.3078273           | -0.2304259     | Yes
G-Plus             | Min  | 0.4403892     | 0.4174937                   | 0.3262061            | 0.3034445      | Yes
Ray-Turi           | Min  | 0.1788701     | 0.2293373                   | 0.7925694            | 1.289401       | No

Table 3. External indices for the proposed toy data (on synthetic toy data)

External Indices   | Rule | K-Means Model | Hierarchical (Average) | Shi-Malik [27] Model | Proposed Model | Which one is best?
Fowlkes-Mallows    | Max  | 0.558085      | 0.6511637              | 0.9060712            | 1              | Proposed
Jaccard            | Max  | 0.386073      | 0.4815563              | 0.8275653            | 1              | Proposed
Rand               | Max  | 0.532973      | 0.6302703              | 0.8976577            | 1              | Proposed
Czekanowski-Dice   | Max  | 0.5570745     | 0.6500682              | 0.9056478            | 1              | Proposed

The model has also been tested with another standard data set, known as the spiral data, proposed by Chang et al. [26]. The data set has 312 data points in 2 dimensions and 3 clusters, and it is not linearly separable. The proposed model correctly identifies the three clusters of the spiral data with an accurate point distribution, i.e. both the path-based model [26] and the present model give 100% accuracy. This shows the strength of the proposed model in standard situations.
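The external indices in Table 3 compare predicted labels against ground-truth labels. As an aside (not part of the paper), two of these indices are readily available in scikit-learn, so scores like those in Table 3 can be computed directly from label vectors; rand_score requires scikit-learn 0.24 or later, and the labels below are hypothetical.

from sklearn.metrics import fowlkes_mallows_score, rand_score

labels_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # hypothetical ground-truth labels
labels_pred = [0, 0, 1, 1, 1, 1, 2, 2, 2]   # hypothetical predicted labels

print("Fowlkes-Mallows:", fowlkes_mallows_score(labels_true, labels_pred))
print("Rand index:", rand_score(labels_true, labels_pred))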
5. CONCLUSION

The proposed Mahalanobis distance based, local distribution oriented spectral clustering model is a good model in normal situations, and it can also handle situations where the distribution of the points needs to be considered. The results show that the model can handle non-convex data sets successfully, which is a major strength of the proposed model. Nevertheless, there is also scope for improvement. The model is time demanding because of the eigenvector computations, so sparsification of the graph and clustering the sparse graph instead of the original one may be one direction for improvement. Secondly, if the inverse of the covariance matrix does not exist, then the Mahalanobis distance cannot be calculated; handling this would be a major improvement. Finally, the model is sensitive to parameters such as $\epsilon$ and $\delta$ in the similarity matrix calculation; the automatic calculation of such parameters could make the model even more robust.

REFERENCES

[1] G. Schaefer and H. Zhou, "Fuzzy clustering for colour reduction in images," Telecommunication Systems, vol. 40, no. 1-2, pp. 17-25, 2009.
[2] B. S. Everitt, D. Stahl, M. Leese, and S. Landau, Cluster Analysis, 5th ed. John Wiley & Sons, 2011.
[3] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: A review," ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, 1999.
[4] R. Xu and D. Wunsch-II, "Survey of clustering algorithms," IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 645-678, May 2005.
[5] F. Farshchian, E. Parcham, and S. Tofighi, "Retinal identification and connecting this information to electronic health record," International Journal of Electrical and Computer Engineering, vol. 3, no. 3, pp. 359-365, June 2013.
[6] S. Sharma and G. N. Purohit, "Analysis of spectral clustering approach for tracking community formation in social network," International Journal of Social Networking and Virtual Communities, vol. 1, no. 1, pp. 31-37, July 2012.
[7] X. Yang and W. Cui, "A novel spatial clustering algorithm based on Delaunay triangulation," Journal of Software Engineering and Applications, vol. 3, pp. 141-149, 2010.
[8] P. Roy and J. K. Mandal, "A novel spatial fuzzy clustering using Delaunay triangulation for large scale GIS data (NSFCDT)," Procedia Technology, vol. 6, pp. 452-459, 2012.
[9] P. Foggia, G. Percannella, C. Sansone, and M. Vento, "A graph-based clustering method and its applications," Proceedings of Advances in Brain, Vision, and Artificial Intelligence, vol. 4729, pp. 277-287, 2007.
[10] P. Roy and J. K. Mandal, "A Delaunay triangulation preprocessing based fuzzy-encroachment graph clustering for large scale GIS data," Proceedings of the International Symposium on Electronic System Design, 2012, pp. 300-305, 2012.
[11] T. Asano, B. Bhattacharya, M. Keil, and F. Yao, "Clustering algorithms based on minimum and maximum spanning trees," Proceedings of the 4th Annual Symposium on Computational Geometry, pp. 252-257, 1988.
[12] D. J. Higham, G. Kalna, and M. Kibble, "Spectral clustering and its use in bioinformatics," Journal of Computational and Applied Mathematics, vol. 204, pp. 25-37, 2007.
[13] S. A. Toussi and H. S. Yazdi, "Feature selection in spectral clustering," International Journal of Signal Processing, Image Processing and Pattern Recognition, vol. 4, no. 3, pp. 179-194, September 2011.
[14] H. Qiu and E. R. Hancock, "Graph matching and clustering using spectral partitions," Pattern Recognition, vol. 39, pp. 22-34, 2006.
[15] U. von Luxburg, "A tutorial on spectral clustering," Max Planck Institute for Biological Cybernetics, Tech. Rep. TR-149, August 2006.
[16] F. Chung, "Spectral graph theory," American Mathematical Society, USA, Tech. Rep., 1997.
[17] D. A. Spielman and S.-H. Teng, "Spectral partitioning works: Planar graphs and finite element meshes," Linear Algebra and its Applications, vol. 421, pp. 284-305, 2007.
[18] M. Fiedler, "A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory," Czechoslovak Mathematical Journal, vol. 25, no. 4, pp. 619-633, 1975.
[19] P. C. Mahalanobis, "On the generalized distance in statistics," in Proceedings of the National Institute of Sciences of India, no. 2, 1936, pp. 49-55.
[20] S. Marsland, Machine Learning: An Algorithmic Perspective, 1st ed., ser. Machine Learning and Pattern Recognition Series. CRC Press, 2009.
[21] V. J. Hodge and J. Austin, "A survey of outlier detection methodologies," Artificial Intelligence Review, vol. 22, pp. 85-126, 2004.
[22] S. Saha and S. Bandyopadhyay, "Performance evaluation of some symmetry-based cluster validity indexes," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 39, no. 4, pp. 420-425, July 2009.
[23] C. Godsil and G. Royle, Algebraic Graph Theory, ser. Graduate Texts in Mathematics. Springer, 2001, vol. 207.
[24] B. Mohar, Graph Theory, Combinatorics, and Applications. Wiley, 1991, vol. 2, ch. "The Laplacian Spectrum of Graphs," pp. 871-898.
[25] B. Mohar, Graph Symmetry: Algebraic Methods and Applications, ser. NATO ASI Series C, vol. 497. Kluwer, 1997, ch. "Some Applications of Laplace Eigenvalues of Graphs," pp. 225-275.
[26] H. Chang and D.-Y. Yeung, "Robust path-based spectral clustering," Pattern Recognition, vol. 41, no. 1, pp. 191-203, January 2008.
[27] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888-905, August 2000.