Indonesian J our nal of Electrical Engineering and Computer Science V ol. 22, No. 3, June 2021, pp. 1619 1628 ISSN: 2502-4752, DOI: 10.11591/ijeecs.v22i3.pp1619-1628 r 1619 Unsuper vised featur e selection with least-squar es quadratic mutual inf ormation J anya Sainui, Chouv anee Sri visal Di vision of Computational Science, F aculty of Science, Prince of Songkla Uni v ersity , Songkhla, Thailand Article Inf o Article history: Recei v ed Dec 27, 2020 Re vised May 10, 2021 Accepted May 24, 2021 K eyw ords: Least-squares quadratic mutual information Mutual information Unsupervised feature selection ABSTRA CT W e propose the feature selection method based on the dependenc y bet ween features in an unsupervised manner . The underlying assumption is that the most important feature should pro vide high dependenc y between itself and the rest of the features. Therefore, the top m features with maximum dependenc y scores should be selected, b ut the redundant features should be ignored. T o deal with this problem, the objecti v e function that is applied to e v aluate the dependenc y between features plays a crucial role. Ho we v er , pre vious methods mainly used the mutual information (MI), where the MI esti mator based on the k -nearest neighbor graph, resulting in its estimation de- pendent on the selection of paramet er , k , without a systematic w ay to select it. This implies that the MI estimator tends to be less reliable. Here, we introduce the least- squares quadratic mutual information (LSQMI) that is more sensible because its tuning parameters can be selected by cross-v alidation. W e sho w through the e xperiments that the use of LSQMI performed better than that of MI. In addition, we compared the pro- posed method to the three counterpart methods using six UCI benchmark datasets. The results demonstrated that the proposed method is useful for selecting the informati v e features as well as discarding the redundant ones. This is an open access article under the CC BY -SA license . Corresponding A uthor: Jan ya Sainui Di vision of Computational Science, F aculty of Science Prince of Songkla Uni v ersity Songkhla, Thailand Email: jan ya.s@psu.ac.th 1. INTR ODUCTION Feature selection aims to select the most informati v e subset of features to capture the structure of the original data. It usually appears as a pre-processing step for v arious tasks such as classification [1], [2], clustering [3]-[5], data mining [6], resulting in better results. In a supervised scenario, the idea is to select the features t h a t are the most rele v ant to the class labels [7], [8]. Ho we v er , in this paper , we focus on an unsupervised feature selection that is more dif ficult to achie v e than the supervised fea ture selection due to the absence of the class labels [9]-[14]. In unsupervised manner , feature selection is important as it helps impro v e the performance as well as reduce the computational time of clustering [15]-[19]. Moreo v er , unsupervised feature selections may be used for v arious purposes such as visualization and pre-processing step for supervised learning [7], [15], [20]. Unsupervised feature selection c an be di vided into tw o cate gories, namely wrapper method and filter method [10], [15]. Man y e xisting methods of unsupervised feature selection are based on the wrapper method such that the selected features are dependent on the specific clustering algorithm. The main dra wbacks of the wrapper method are its computational comple xity and limited use with a particular clustering J ournal homepage: http://ijeecs.iaescor e .com Evaluation Warning : The document was created with Spire.PDF for Python.
1620 r ISSN: 2502-4752 algorithm. In this paper , we focus on the filter method which e v aluates the important features without using an y clustering algorithm [21]-[25]. Thus, the filter method is more general and useful than the wrapper method. W e re vie w some e xisting unsupervised feature selection methods relating to our w ork as follo ws. Laplacian score (LS) [10] is based on Laplacia n Eigenmap and locality preserving projection. F or each feature, the Laplacian score is estimated using the nearest neighbor graph. If it is a good feature, its LS tends to be small. The LS is based on the observ ation that, tw o data points are probably related to the same cluster if the y are close to each other . The dra wbacks of this method are that there is no systematic w ay to tune the number of nearest neighbor k or Gaussian width, and the redundant features are not observ ed. multi-cluster feature selection (MCFS) [11] selects the features while pres erving the cluster structure of the data. The authors proposed to use multi ple eigen v ectors of graph Laplacian, which are defined on the af finity matrix of data points to capture the multi-cluster structure of data. The algorithm performs especially well when the number of selected features is small (e.g., < = 50 ). Ho we v er , it performs best when the number of used eigen v ectors is equal to the number of clusters, b ut in an unsupervised manner , the number of clusters remains unkno wn. Moreo v er , we observ ed through the e xperiments that this method tends to be sensiti v e to noise features. Unsupervised feature selection based on mutual information (UFSMI) [9] is deri v ed from the observ ation that good features share information in common, and noisy features are less correlated with the other features. Mutual information (MI) is then used as the objecti v e function to capture the shared information between a feature and the rest of features, and the y choose the first m features that achie v e the higher mutual information. The higher le v el of noise leads to a smaller a v erage v alue of the score function because the addition of noise reduces the mutual information between features. The desired property of t he feature selection algorithm aims at remo ving noisy features. Ho we v er , the estimation of MI is less reliable as it is computed using the k -nearest neighbor graph. Thus, its performance is based on the tuning parameter , k , resulting in that a subset of the selected features may not be the best. All methods mentioned abo v e including se v eral e xisting methods such as [12]-[14] are based on the k -nearest neighbor graph or other tuning param eters, and there is no systematic w ay to choose such parameters (i.e., k ). In addi tion, the y did not observ e the redundant features. Therefore, in this paper , we w ould lik e to deal with these problems. Our idea is inspired by [9] as the y pro v ed that a good feature is e xpected to be dependent on the rest of features. Ho we v er , the method did not tak e the redundanc y of features into account, and thus may not produce an optimal feature subset. Moreo v er , in [9], the y used the mutual information (MI) as their dependent measure, and the MI is estimated based on the k -nearest neighbors which lacks of the systematic w ay to choose the appropriate k . So that the estimation of MI tends to be less reliable. In order to solv e these problems, we propose to apply the L 2 -distance v ariant of MI called the quadratic mutual information (QMI) [26], where the least-square method is used to approximate QMI [27]. Therefore, the used parameters c an be obtained automatically by cross-v alidation. The contrib ution of t his paper is a no v el method of unsupervised feature selection based on the least-squares quadratic mutual information (LSQMI), which is po werful for selecting the most important features as well as rejecting the redundant features. In the e xperimental results, we sho w that the LSQMI is more reliable than MI. In addition, we demonstrate that the proposed method is promising through the e xperimental results on six UCI machine learning repositories. The rest of paper is or g anized as follo ws. In section 2, we re vie w the least-square quadratic mutual information and its norm alization. W e describe the proposed method in section 3. The e xperimental setup and results are sho wn in section 4. In section 5, we pro vide the conclusion. 2. THE LEAST -SQ U ARE Q U ADRA TIC MUTU AL INFORMA TION In this section, we first re vie w the least-square quadratic mutual information (LSQMI) [27] that can be used to measure a statistical dependence among features. As LSQMI is ranged from 0 to 1 , we need to normalize LSQMI to [0,1] range. W e then re vie w the normalization of LSQMI [28] that is finally used as the objecti v e function for our proposed unsupervised feature selection method. 2.1. LSQMI estimation Suppose that we are gi v en a set of f ( x i ; y i ) g n i =1 independently dra wn from a joint probabili ty dis- trib ution with density p ( x ; y ) . Here, x 2 R n is a featrue v ector and y 2 R n ( d 1) is the rest of feature v ectors, where n denotes the number of samples and d denotes the number of features. The quadratic mutual information (QMI) [26] between x and y is defined as Indonesian J Elec Eng & Comp Sci, V ol. 22, No. 3, June 2021 : 1619 1628 Evaluation Warning : The document was created with Spire.PDF for Python.
Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752 r 1621 QMI := Z Z f ( x ; y ) 2 d x d y ; where f ( x ; y ) := p ( x ; y ) p ( x ) p ( y ) ; and p ( x ) and p ( y ) denote the mar ginal densities of x and y , respecti v ely . In LSQMI [27], the density dif ference f ( x ; y ) is modeled as g ( x ; y ) = n X ` =1 ` K ( x ; x ` ) L ( y ; y ` ) ; where K ( x ; x ` ) and L ( y ; y ` ) are k ernel functions for x and y , respecti v ely . Then, = ( 1 ; : : : ; n ) > is learned by least-squares as min Z Z g ( x ; y ) f ( x ; y ) 2 d x d y : An empirical and re gularized v ersion of the abo v e optimization problem is gi v en as b := argmin h > H 2 > b h + > i ; where 0 is the re gularization parameter , and H and b h are defined as H `;` 0 := Z K ( x ; x ` ) K ( x ; x ` 0 )d x Z L ( y ; y ` ) L ( y ; y ` 0 )d y ; (1) b h ` := 1 n n X i =1 K ( x i ; x ` ) L ( y i ; y ` ) 1 n 2 n X i;j =1 K ( x i ; x ` ) L ( y j ; y ` ) : In this paper , we use the Gaussian k ernel for both K ( x ; x ` ) and L ( y ; y ` ) as both x and y are continuous: K ( x ; x ` ) := e xp k x x ` k 2 2 2 ; L ( y ; y ` ) := e xp k y y ` k 2 2 2 ; so that the inte gral in H (1) can be computed analytically as H `;` 0 = ( 2 ) d x = 2 exp k x ` x ` k 2 4 2 ( 2 ) d y = 2 exp k y ` y ` 0 k 2 4 2 : Thus, the solution b can be obtained as b = ( H + I ) 1 b h ; where I denotes the identity matrix. Finally , the least-squares quadratic mutual information (LSQMI) is gi v en by [ QMI := m ax (0 ; 2 b > b h b > H b ) : Unsupervised featur e selection with least-squar es quadr atic mutual information (J anya Sainui) Evaluation Warning : The document was created with Spire.PDF for Python.
1622 r ISSN: 2502-4752 2.2. The normalisation of LSQMI As LSQMI is ranged from 0 to 1 , we here normalize it as follo w: \ NQMI := [ QMI( x ; y ) max ( [ QMI( x ; x ) ; [ QMI( y ; y )) : (2) The higher \ NQMI indicates the more information shared between the feature x and the rest of features y , while it is closed to zero if the feature x is a noise feature. W e assume that the v alues of [ QMI( y ; y ) and [ QMI( x ; x ) are lar ger than zero and that of [ QMI( x ; y ) , because the y are the dependenc y of themselv es. No w , the range of \ NQMI is [0 ; 1] . Ho we v er , in practice, \ NQMI may be higher than 1. 3. THE PR OPOSED UNSUPER VISED FEA TURE SELECTION METHOD In an unsupervi sed scenario, the objecti v e function that is used to e v aluate the important feature is v ery important. Ho we v er , the objecti v e function of the e xisting methods tends to be less reliable because its approximation depends on the parameters included in the objecti v e function, and there is no automatic w ay to choose those parameters. In this paper , we propose an unsupervised feature selection by using a more reliable objecti v e function called the least-squares quadratic mutual information (LSQMI). The benefits of LSQMI are that the parameters needed in the objecti v e function can be chosen by cross v alidation, and it is less sensiti v e to noises and outliers that may cause the estimation of the selected criteria unreliable [26], [27]. The proposed method is described as follo ws. Gi v en a set of n samples denoted as f x i g n i =1 , where each sample x i consists of d features (i.e., x i = f f 1 ; : : : ; f d g ), our goal here is to select the most m informati v e features from d features, while the redundant features should be discarded. T o address this problem, we apply the LSQMI [27] as our criteria for both selecting the most m important features as well as ignoring the redundant ones. Let f j 2 R n be the j th feature, and f n j 2 R n ( d 1 ) be all features e xcluding f j . T o select the import ant b ut not redundant features, our proposed algorithm is sho wn in Algorithm 1. Algorithm 1 The unsupervised feature selection with least-squares quadratic mutual information (UFSLSQMI) Input: A set of features, f f 1 ; :::; f d g , f j 2 R n , the number of needed features, m , and a threshold, . Output: A subset of selected features, S = f f 1 ; : : : ; f m g . 1: S   ? 2: f or j = 1 ; : : : ; d do 3: compute \ NQMI (Equation (2)) between f j and f n j 4: end f or 5: sort features f f 1 ; :::; f d g according to their \ NQMI in descending order resulting in f f 1 ; :::; f d g 6: S   f S [ f 1 g 7: k   1 8: f or j = 2 ; : : : ; d do 9: if ( k < m ) then 10: if ( \ NQMI( f j ; f n j ) \ NQMI( f j 1 ; f n j 1 ) < ) then 11: S   f S [ f j g 12: k   k + 1 13: end if 14: else 15: break 16: end if 17: end f or The inputs of the algorithm are a set of features, f f 1 ; :::; f d g , the number of needed features, m , and a threshold, . While the output is a subset of selected features S = f f 1 ; : : : ; f m g . The algorithm starts by Indonesian J Elec Eng & Comp Sci, V ol. 22, No. 3, June 2021 : 1619 1628 Evaluation Warning : The document was created with Spire.PDF for Python.
Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752 r 1623 computing the normalisation of LSQMI as (2) for all features (lines 2 - 4). Then, the features are sorted in descending order according to their score (l ine 5). Last ly , the features are selected according to the ranking; a feature with the highest score is firstly selected and the feature with the second high s core is then selected and so on. Ho we v er , the redundant features are ignored by discarding the feature that has the normalisation of LSQMI score closed to the pre vious one. In other w ords, if the ratio of the current score and the pre vious one is higher than the gi v en threshold , (line 10), the feature according to the smaller score is ignored. The algorithm is repeated until the m features (lines 8 - 17) are obtained. The performance of the proposed method depends on the parameters used for approximating the LSQMI. Ho we v er , thanks to the least-squares method, we can choose the appropriate parameters by cross- v alidation method. Although the s elected features are also depended on the threshold , we sho w you through the e xperiments that our objecti v e function well defines the redundant features and the noise features. In other w ords, if tw o features are redundanc y , their LSQMI scores tend to be closed to each other , and if a feature is the noise feature, the LSQMI score tends to be small. 4. RESUL TS AND DISCUSSION In this section, we demonstrate the ef fecti v eness of the proposed unsupervised feature selection. W e compared the proposed method with three competiti v e methods, including UFSMI [9], LSFS [10], and MCFS [11]. All methods are based on the k -nearest neighbor , and here we set k = 5 for all methods as their def ault setting. F or MCFS, there is another parameter that is the number of eigen v ectors, and we set this parameter at 5 as the same reason. F or the proposed method, the threshold is set at 0.95. The competiti v e methods including the proposed method rank the features according to the criteria to select the top m features. This means that the objecti v e function mainly af fect a subset of selected features. Thanks to the reliability of LSQMI, our method considers not only the importance of features b ut the redun- danc y also. Therefore, in the e xperiment setups, we firstly sho w you that our objecti v e function i s ef fecti v e for selecting the important features as well as discarding the redundant features on the synthetic data. Secondly , we illustrate the clustering performance after feature selection using six UCI benchmark datasets to confirm the usefulness of our proposed method. 4.1. Synthetic data In the first e xperiment, we w ould lik e to demonstrate the reliability of the LSQMI comparing to the competiti v e methods. Specifically , we w ould lik e to sho w you that the LSQMI is more reliable for e v aluating both the informati v e feature and the redundant features. T o do so, we conducted a synthetic data of 3 clusters with 3-dimensionality illustrated in Figure 1. W e generated 100 instances with standard de viation = 1 for each class. Figure 1 (a)-(c) sho ws that the 3 r d feature is the most important for underlying clusters, while the 1 st feature and the 2 nd feature are redundanc y , because the y ha v e the same mean. Therefore, if we w ould lik e to choose tw o features (i.e., m = 2 ), the first selected one should be the 3 r d feature, and the second one may be the 1 st feature or the 2 nd feature. In other w ords, the set of selected features should not be the 1 st feature and the 2 nd feature, because the y can capture only 2 clusters of the original data as sho wn in Figure 1 (a). (a) (b) (c) Figure 1. Relationship between features in the synthetic data: (a) Feature 1 vs. Feature 2, (b) Feature 1 vs. Feature 3, (c) Feature 2 vs. Feature 3 Unsupervised featur e selection with least-squar es quadr atic mutual information (J anya Sainui) Evaluation Warning : The document was created with Spire.PDF for Python.
1624 r ISSN: 2502-4752 W e ran all methods using this to y data, where each method w as repeated 10 times with dif ferent random data. W e found that UFSMI performs the w orst, because the 3 r d feature w as not selected 4 times. LS missed selecting the 3 r d feature 1 time, while the proposed method and MCFS al w ays selected the 3 r d feature. The a v erage scores between each feature and the rest features obtained from each objecti v e function are sho wn in T able 1, sho wing that our objecti v e function, \ NQMI , tends to be more sensible than other ones. As can be seen, the \ NQMI score between the 3 r d feature and the rest features is significantly highest, while the \ NQMI scores according to the 2 nd feature and the 1 st feature are close to each other . In contrast to other objecti v e functions, the scores according to each feature are almost the same for all features. Notice that the scores of LS sho wing in T ables 1 and 2 are the scores of (1 Laplacian score) for each feature. Thus, as same as other objecti v e functions, the feature with lar ger score is more important. T able 1. The a v erage scores o v er 10 runs between the feature j and the rest of the features obtained from dif ferent objecti v e functions using the synthetic data Objecti v e functions feature 1 feature 2 feature 3 LS 0.9935 0.9934 0.9936 MI -9.3682 -9.3771 -9.3730 MCFS 0.7330 0.8010 0.9848 \ NQMI 0.7424 0.7404 1.9034 T able 2. The a v erage scores o v er 10 runs between the feature j and the rest of the features obtained from dif ferent objecti v e functions after adding the noise feature (feature 4) Objecti v e functions feature 1 feature 2 feature 3 feature 4 LS 0.9821 0.9818 0.9801 0.9704 MI -8.1727 -8.1980 -8.2047 -8.8783 MCFS 0.6872 0.7244 0.8226 0.9997 \ NQMI 0.3331 0.3415 0.8952 0.0167 Ne xt, we added a noise feature (the 4 th feature) with mean = 0 for all clusters and standard de viation = 1 into the a b o v e to y data. W e ran the e xperiment 10 times for each method ag ain, and the results sho w that our method outperforms other methods including MCFS. Moreo v er , MCFS performs the w orst in this e xperiment as it al w ays selected the 4 th feature, while the other three methods ne v er selected the 4 th feature. The a v erage scores of each method are sho wn in T able 2, confirmi n g that \ NQMI is more reliable than other objecti v e functions. Notice that MCFS w orks well if the number of used eigen v ectors is equal to the number of clusters; ho we v er , in an unsupervised scenario, we do not kno w the actual number of clusters. 4.2. UCI datasets Here, we e v aluated the clustering performance after feature selection using six UCI benchmark datasets, namely Abalone, Sonar , Glass, Pima, Heart, and Cancer datasets. F or each dataset, the number of clusters ( c ), the number of features ( d ), and the number of samples ( n ) are sho wn in Figure 2. F or each run, we randomly chosen 90% samples from each dataset, then we performed feature selection. Finally , we performed clustering by the k -means algorithm; we ran the k -means algorithm 10 times with random initialization and chose the best solution with the minimum error rate. The clustering accurac y is e v aluated. The results of each dataset are sho wn in Figure 2, sho wing the a v erage accuracies o v er 10 runs at dif ferent number of seleced features, m . The results indicate that the proposed method (i.e., UFSLSQMI) almost performs better than the counterpart approaches. Ho we v er , we cannot say that the proposed method is the best. Because, in practice, there is no method that w orks well for all data. Moreo v er , it is hard to e xplain the reasons wh y it w orks or it does not w ork. What we can say is that the proposed method can be an alternati v e for solving unsupervised feature selection problem that may w ork well for your data. Indonesian J Elec Eng & Comp Sci, V ol. 22, No. 3, June 2021 : 1619 1628 Evaluation Warning : The document was created with Spire.PDF for Python.
Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752 r 1625 (a) (b) (c) (d) (e) (f) Figure 2. The a v erage of clustering accuracies o v er 10 runs on UCI datasets: (a) Abalone (c=10 ,d=8 , n=2493), (b) Sonar (c=2 ,d= 60, n=188), (c) Glass (c=6 ,d=9 , n= 193), (d) Pima (c= 2,d= 8, n= 692), (e) Heart (c=2 ,d=13 , n=244), (f) Cancer (c=2 ,d=9 , n=615) 4.3. Discussion In the e xperime nts, we compar ed the proposed unsupervised feature selection method with the other filter methods. All m ethods including the proposed method select the informati v e features from the rank of features according to the scores of the objecti v e function. This means that the objecti v e function plays an important role for this task. W e discuss here each objecti v e function as fol lo ws. Firstly , Laplacian score (LS) is computed by constructing of a nearest neighbor graph with the specific-defined number of nearest neighbors, k . This is the reason wh y the Laplacian score feature selection does not al w ays perform well. In addit ion, as can be seen in the e xperiments on to y data (T able 1 and T able 2), the Laplacian scores of the most important feature and the noise feature are not significantly dif ferent from each other , this may cause the rank of features relating to the important features is unreliable. Unsupervised featur e selection with least-squar es quadr atic mutual information (J anya Sainui) Evaluation Warning : The document was created with Spire.PDF for Python.
1626 r ISSN: 2502-4752 Secondly , the mutual information (MI) is also computed based on the k -nearest neighbors graph. The results in T ables 1 and 2 demonstrate that this MI score does not des cribe the dif ference between t he informati v e feature and the noise feature well. In other w ords, as same as LS, the MI scores of the important feature and the noise feature are not significantly dif ferent from each other . Thirdly , the MCFS score is also computed based on the k -nearest neighbor graph such that the suitable k ef fects the good score. Not only the parameter k , the MCFS score also needs the number of used eigen v ectors that mak es the MCFS score depend on these tw o parame ters. As can be seen in the e xperiment on to y data with noise feature, the MC FS score of noise feature is unreliable; ho we v er , if we change the number o f used eigen v ectors to be 2 instead of 5, the MCFS score of noise feature becomes more reliable. This confirms that the MCFS score hea vily depends on the number of used eigen v ectors. Lastly , the proposed method utilizes the LSQMI that approximates by least-square method so that the cross-v alidation can be included, resulting in that the estimator of QMI is sensible. As can be seen in T ables 1 and 2, the normalized LSQMI describe the structure of the original data as well; the LSQMI is highest if the feature is the most important as the 3 r d feature of the to y data; in contrast, the LSQMI is smallest if the feature is the noise feature that pro vides no information about data lik e the 4 th feature. The main dra wback of the proposed method is the time comple xity that is higher than other methods because of the cross-v alidation process. T o sum up, the main problem of the filter based unsupervised feature selection methods is lacking of the ability to automatically choose the parameters included in their objecti v e function so that the approximation of the objecti v e function may be unreliable. Thus, we here propose to use the LSQMI for unsupervised feature selection, that is not only reliable for e v aluating the important features b ut also for ignoring the redundant features. 5. CONCLUSION The goal of feature selection is to select a subset of features in order to reduce the dimensionality of the original data, while the structure of data should be preserv ed. There are tw o main points that are important in order to achie v e this goal. Firstly , we need to select the most informati v e features underlying preserving the structure of the original data. Secondly , we ha v e to remo v e the redundant features because the y do not pro vide more information about data. Ho we v er , the e xisting approaches may not select the important features because the use of unreliable objecti v e functions. Moreo v er , the y do not ignore the redundant features. In this paper , we focus on these tw o points by e xploring the reliable objecti v e function that is the most important for e v aluating the informati v e features as well as the redundant features in an unsupervis ed manner . Specifically , we propose to use the least-squares quadratic mutual information (LSQMI) as the objecti v e function to select the v aluable features e xcluding the redundant features. The benefit of LSQMI is that the needed parameters can be chosen by cross v alidation resulting that the estimation of the objecti v e function will be more reliable, while most of the other objecti v e functions lack of the systematic w ay to choose their parameters. Through the e xperiments, we sho w that the LSQMI can be used to determine the important features including the redundant feature as well. REFERENCES [1] S. B. K ot siantis, ”Feature selection for machine learning classification problems: a recent o v ervie w , Artificial Intellig ence Re vie w , v ol. 42, no. 1, pp. 157–176, 2011, doi: 10.1007/s10462-011-9230-1. [2] J. Cai, J. Luo, S. W ang, and S. Y ang, ”Feature selection in machine lear ning: a ne w perspecti v e, Neur o- computing , v ol. 300, pp. 70-79, 2018, doi: 10.1016/j.neucom.2017.11.077. [3] L. T ala v era, C. Nord, and J. Girona, ”Dependenc y-based feat u r e selection for clustering symbolic data, Intellig ent Data Analysis , v ol. 4, no. 1, pp. 19-28, 2000, doi: 10.3233/ID A-2000-4103. [4] V . Roth and T . Lange, ”Feature selection in clustering problems, Advances in Neur al Information Pr o- cessing Systems , v ol. 16, pp. 473–480, 2004. [5] Ma. Dash, K. Choi, P . Scheuermann, and H. Liu, ”Feature selection for clustering—a filter solution, 2002 IEEE International Confer ence on Data Mining , 2002. Pr oceedings , 2002, pp. 115-122, doi: 10.1109/ICDM.2002.1183893. Indonesian J Elec Eng & Comp Sci, V ol. 22, No. 3, June 2021 : 1619 1628 Evaluation Warning : The document was created with Spire.PDF for Python.
Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752 r 1627 [6] S. V isalakshi and V . Radha, ”A lit erature re vie w of feature selection techniques and applications: Re vie w of feature selection in data mining, IEEE International Confer ence on Computational Intellig ence and Computing Resear c h, 2014 , pp. 1-6, 2014, doi: 10.1109/ICCIC.2014.7238499. [7] J. T ang, S. Alelyani, and H. Liu, ”Feature selection for classification: a re vie w , Data classification: Algorithms and applications , CRC Press, pp. 37–64, 2014. [8] J. R. V er g ara and P . A. Est ´ ev ez, ”A re vie w of feature selection methods based on mutual information, Neur al Computing and Application , v ol. 24, no. 1, pp. 175–186, 2014, doi: 10.1007/s00521-013-1368-0. [9] J. Xu, Z. Y uming, C. Lin, and X. Bao wen, ”An unsupervised feature selection approach based on mutual information, J ournal of Computer Resear c h and De velopment , v ol. 49, no. 2, pp. 372-382, 2012. [10] X. He, D. Cai, and P . Niyogi, ”Laplacian score for feature selection, Advances in Neur al Information Pr ocessing Systems , 2005, pp. 507-514. [11] D. Cai, C. Zhang, and X. He, ”Unsupervised feature selection for multi-cluster data, Pr oceedings of the 16th A CM SIGKDD internati onal confer ence on Knowledg e disco very and data mining , pp. 333-342, 2010, doi: 10.1145/1835804.1835848. [12] Y . W ang, ”Unsupervised Representati v e Feature Selection Algorithm Based on Information En- trop y and Rele v ance Analysis, in IEEE Access , v ol. 6, pp. 45317-45324, 2018, doi: 10.1109/A C- CESS.2018.2863752. [13] L. Shi, L. Du, and Y . Shen, ”Rob ust Spectral Learning for Unsupervised Feature Selection, IEEE Inter - national Confer ence on Data Mining , 2014, pp. 977-982, doi: 10.1109/ICDM.2014.58. [14] F . Nie, W . Zhu, and X. Li, ”Unsupervised feature selection with structured graph optimiza- tion, Pr oceedings of the 30th Confer ence on Artificial Intellig ence , 2016, pp. 1302–1308, doi: 10.5555/3015812.3016004 . [15] S. S.-Fern ´ andez, J. A. C.-Ochoa, and J. Fco. M.-T rinidad, ”A re vie w of unsupervised feature selection methods, Artificial Intellig ence Re vie w , v ol. 53, pp. 907–948, 2020, doi: 10.1007/s10462-019-09682-y . [16] S. Alelyani, J. T ang, and H. Liu, ”Feature selection for clustering: a re vie w , Data Cluster Algorithms Applications , pp.29-60, 2018. [17] M. Dash and Y . Ong, ”RELIEF-C: ef ficient feature selection for clustering o v er noisy data, 2011 IEEE 23r d International Confer ence on T ools with Artificial Intellig ence , 2011, pp. 869-872, doi: 10.1109/IC- T AI.2011.135. [18] V . M. Rao and V . N. Sastry , ”Unsupervised feature ranking based on representation entrop y , 2012 1st International Confer ence on Recent Advances in Information T ec hnolo gy (RAIT) , 2012, pp. 421-425, doi: 10.1109/RAIT .2012.6194631. [19] J. G. Dy and C. E. Brodle y , ”Feature selection for unsupervised learning, J ournal of Mac hine Learning Resear c h , v ol. 5, pp. 845–889, 2004. [20] P . Y . Lee, W . P . Loh, and J. F . Chin, ”Feature selection in multimedia: the state-of-the-art re vie w , Ima g e V ision Computing , v ol. 67, pp. 29–42, 2017, doi: 10.1016/j.ima vis.2017.09.004. [21] R. V arsha vsk y , A. Gottlieb, M. Linial, and D. Horn, ”No v el unsupervised feature filtering of biological data, Bioinformatics , v ol.22, no. 14, pp. e507–e513, 2006, doi: 10.1093/bioinformatics/btl214. [22] M.i Banerjee and N. R. P al, ”Feature selection with SVD entrop y: some modification and e xtension, Information Sciences , v ol. 264, pp. 118–134, 2014, doi: 10.1016/j.ins.2013.12.029. [23] S. S.-Fern ´ andez, J. F . Mart ´ ınez-T rinidad, and J. A. Carrasco-Ochoa, ”A ne w unsupervised spectral feature selection method for mi x ed data: a filter approach, P attern Reco gnition , v ol. 72, pp. 314–326, 2017, doi: 10.1016/j.patcog.2017.07.020. [24] X. Zhu, Y . W ang, Y . Li, Y . T an, G. W ang, and Q. Song, ”A ne w unsupervised feature selection algorithm using similarity-based feature clustering, Computational Intellig ence , v ol. 35, no. 1, pp. 2-22, 2018, doi: 10.1111/coin.12192. [25] H. W ang, Y . Zhang, J. Zhang, T . Li, and L. Peng, ”A f actor graph model for unsupervised feature selec- tion, Information Sciences , v ol. 480, pp. 144-159, 2019, doi: 10.1016/j.ins.2018.12.034. [26] K. T orkk ola, ”Feature e xtraction by non-parametric mutual information maximization, The J ournal of Mac hine Learning Resear c h , v ol. 3, pp. 1415-1438, 2003. [27] J. Sainui and M. Sugiyama, ”Direct approximation of quadratic mutual information and its application to dependence maximization clustering, IEICE T r ansactions on Information and Systems , v ol. E96-D, no. 10, pp. 2282-2285, 2013, doi: 10.1587/transinf.E96.D.2282. Unsupervised featur e selection with least-squar es quadr atic mutual information (J anya Sainui) Evaluation Warning : The document was created with Spire.PDF for Python.
1628 r ISSN: 2502-4752 [28] J. Sainui and M. Sugiyama, ”Unsupervised k e y frame selection using information theory and colour histogram dif ference, International J ournal of Business Intellig ence and Data Mining , v ol. 16, no. 3, pp. 324-344, 2020, doi: 10.1504/ijbidm.2020.106137. BIOGRAPHIES OF A UTHORS J anya Sainui recei v ed her BS and MS in Computer Science from Prince of Songkla Uni v ersity , Hat Y ai, Songkhla, Thailand, in 2005 and 2009, respecti v ely . She is currently a Lecturer in Computer Science at Prince of Songkla Uni v ersity , Thailand. She has conduct ed research on topics such as bitmap inde xing, video inde xing and detection, clustering and unsupervised dimension reduction. She is interested in algorithm and application of machine learning, especially , applying machine learning techniques to real w orld applications such as image/video processing, medical imaging, bioinformatics, as well as information retrie v al. Chouv anee Sri visal is currently a Lecturer in Computer Science at F aculty of Science Prince of Songkla Uni v ersity , Thailand. She recei v ed her BS in Computer Science from Prince of Songkla Uni v ersity and MS in Information T echnology from King Mongkut’ s Institute of T echnology Lad- krabang, Thailand in 1997 and 2002, respecti v ely . She has interested in algorithm and application of machine learning that for conducting research on topics such as image processing and information retrie v al. Indonesian J Elec Eng & Comp Sci, V ol. 22, No. 3, June 2021 : 1619 1628 Evaluation Warning : The document was created with Spire.PDF for Python.