International Journal of Electrical and Computer Engineering (IJECE)
Vol. 7, No. 3, June 2017, pp. 1112-1124
ISSN: 2088-8708

Comparative Study in Determining Features Extraction for Islanding Detection using Data Mining Technique: Correlation and Coefficient Analysis

Aziah Khamis*1, Yan Xu2, and Azah Mohamed3
1 Faculty of Electrical Engineering, Universiti Teknikal Malaysia Melaka, Malaysia
1 School of Electrical and Information Engineering, The University of Sydney, NSW, Australia
2 School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
3 Department of Electrical, Electronic and System Engineering, Universiti Kebangsaan Malaysia, Malaysia

Article Info
Article history: Received Oct 24, 2016; Revised Feb 8, 2017; Accepted Feb 22, 2017
Keyword: Islanding Detection, Distributed Generation, Data-mining, Random Forest

ABSTRACT
A comprehensive comparison study on the data mining based approaches for detecting islanding events in a power distribution system with inverter-based distributed generations is presented. The important features for each phase in the island detection scheme are investigated in detail. These features are extracted from the time-varying measurements of voltage, frequency and total harmonic distortion (THD) of current and voltage at the point of common coupling. Numerical studies were conducted on the IEEE 34-bus system considering various scenarios of islanding and non-islanding conditions. The features obtained are then used to train several data mining techniques such as decision tree, support vector machine, neural network, bagging and random forest (RF).
The simulation results showed that the important feature parameters can be evaluated based on the correlation between the extracted features. From the results, the four important features that give accurate islanding detection are the fundamental voltage THD, fundamental current THD, rate of change of voltage magnitude and voltage deviation. Comparison studies demonstrated the effectiveness of the RF method in achieving high accuracy for islanding detection.

Copyright © 2017 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author:
Aziah Khamis
Faculty of Electrical Engineering, Universiti Teknikal Malaysia Melaka, Malaysia.
Universiti Teknikal Malaysia Melaka, Hang Tuah Jaya, 76100 Durian Tunggal, Melaka, Malaysia.
aziah83@gmail.com

1. INTRODUCTION
A small localized power source, known as distributed generation (DG), has become an alternative to bulk electric generation because of yearly demand growth. DGs can take the form of wind farms, micro hydro turbines and photovoltaic (PV) generators. Generally, DGs range from kW up to MW and offer several advantages such as environmental benefits, improved reliability, increased efficiency, improved power quality and reduced transmission and distribution line losses [1–3]. However, one of the major drawbacks of DGs arises when they are subjected to the islanding mode of operation. Islanding refers to disconnection from the main source, and it can be either intentional or unintentional. When disconnection occurs, the active part of the distribution system should sense the disconnection from the main grid and either shut down the DGs, where island operation is prohibited, or activate control action to stabilize the islanded part of the system [4, 5].
Islanding operation has some benefits, but several drawbacks are still observed, especially in unintentional islanding events, which may cause problems related to power quality, safety, voltage and frequency stability, and interference [6, 7].

Journal Homepage: http://iaesjournal.com/online/index.php/IJECE
DOI: 10.11591/ijece.v7i3.pp1112-1124

Various techniques have been developed to detect islanding. Islanding detection techniques can generally be classified into remote and local methods. Remote methods are based on communication between the power utility and the DGs. They are highly reliable, but the practical implementation of these schemes can be inflexible, complex and expensive. For instance, the cost of implementing a remote method can be extremely high, especially in networks that do not initially have any communication infrastructure with the power utility. Therefore, local methods are favourable for detecting islanding conditions. These local methods can be categorized as passive, active and hybrid techniques [8–10]. The passive islanding detection technique monitors system parameters such as voltage, current, frequency and harmonic distortion at the point of common coupling (PCC) with the utility grid to detect events [3, 11–13]. In the active islanding detection technique, disturbances are intentionally injected into the network and the island is detected based on the system responses to the disturbances [6, 14–16]. Meanwhile, the hybrid technique is a combination of the active and passive techniques, in which the active technique is applied only if islanding is not detected by the passive technique [3, 17–20].

Data mining is widely used in numerous areas, including islanding detection [21–24]. For instance, an intelligent islanding detection technique was developed in [25] using a decision tree (DT) classifier to identify and classify islanding operations at specific target locations. However, the DT classifier is not capable of capturing all possible islanding events. To improve the accuracy of the DT classifier, a fuzzy rule-based approach incorporated with DT was utilized in detecting islanding events [26]. In [13], a statistical signal processing algorithm is applied using features from voltage and frequency waveforms. The accuracy of this technique is acceptable, but the delay in statistical processing makes it slower than other islanding detection techniques.
Realizing the potential of data mining techniques for islanding detection, new techniques have been developed by combining the discrete wavelet transform with various classifiers, namely DT, probabilistic neural network (PNN) and support vector machine (SVM) [27]. The test results showed that the best accuracy is achieved by the DT classifier model [27]. In [28], a pattern recognition approach based on the DT classifier was employed for islanding detection. However, the DT classifier has limitations, such as the possibility of spurious relationships, the possibility of duplicating the same sub-tree on different paths, a limit of one output per attribute, and the inability to represent tests that refer to two or more different objects, which calls for an exploration of other intelligent techniques.

On the basis of the comprehensive literature review, data mining using correlation and coefficient analysis has rarely been reported. Therefore, the main objective of this study is to propose a new islanding detection scheme using correlation and coefficient analysis for features extraction together with data mining techniques. Initially, features are extracted using the correlation and coefficient analysis, in which seven parameter indices at the target DG location have been identified as important features for identifying islanding events. Then, five different data mining techniques, namely DT, SVM, neural network (NN), bagging and random forest (RF), are developed as classifiers for islanding detection. The proposed islanding detection scheme is tested on the IEEE 34-bus system with inverter-based DGs.

2. BUILDING THE DATA SET
2.1. Test System
Fig. 1 shows the single-line diagram of the IEEE 34-bus distribution system model in MATLAB/SIMULINK software. The DG and the load are connected to the distribution system by a 100-kVA, 24.9-kV/480-V transformer.
Meanwhile, the PCC is connected to a 100-kW R load. The DG is an inverter-based DG with a current-controlled interface using the same control units as in the previous study [29].

Figure 1. System under test: IEEE 34-bus system.

2.2. Database Generation
Various islanding and non-islanding events should be generated to provide a wide-ranging dataset for training the classifier. The possible situations that may create islanding and non-islanding conditions are given as follows:
i. Load and capacitor switching at different buses,
ii. Several types of fault at different buses, and
iii. Events that can trip breakers and reclosers, and island the DG.
The above situations are simulated under possible variations in operating conditions, which are considered as:
i. Normal DG loading,
ii. Different operating points that cause power mismatch at the local R load connected at bus 848.

2.3. Features Selection
The main idea of features selection is to choose the most significant input variables by eliminating features with non- or less-predictive information. The use of significant features can greatly improve the classifier model performance and thus increase the prediction accuracy as well as the computational speed. In this paper, the combination of various feature parameters has been chosen from previous islanding detection methods focusing on inverter-based DG. The extracted features include $X_a$ frequency deviation ($\Delta f$), $X_b$ voltage deviation ($\Delta V$), $X_c$ rate of change of voltage magnitude ($\Delta V/\Delta t$), $X_d$ current total harmonic distortion ($THD_C$), $X_e$ fundamental current total harmonic distortion ($THD_{Cf}$), $X_f$ voltage total harmonic distortion ($THD_V$) and $X_g$ fundamental voltage total harmonic distortion ($THD_{Vf}$). The features are extracted on a per-phase basis in order to identify the most essential feature parameters for islanding detection. Figs. 2 and 3 show examples of feature signals obtained from an islanding event for phase A at the DG terminal in the distribution system. The signals in Figs. 2a-c and d-f represent the voltage and frequency of phase A during the islanding condition, respectively. The signals in Figs. 2b and c are the voltage deviation ($\Delta V$) and the rate of change of voltage magnitude ($\Delta V/\Delta t$), respectively, obtained from the voltage signal of Fig. 2a. The frequency signal of Fig. 2d is evaluated to obtain the frequency deviation ($\Delta f$) as illustrated in Fig. 2f. Meanwhile, the THD information for voltage and current is shown in Figs. 3a and b.
The entire feature information is then utilized as the input for the classifier. The features are rearranged and expressed as

\begin{equation}
\text{Input} =
\begin{bmatrix}
X_{a1} & X_{b1} & X_{c1} & X_{d1} & X_{e1} & X_{f1} & X_{g1} \\
X_{a2} & X_{b2} & X_{c2} & X_{d2} & X_{e2} & X_{f2} & X_{g2} \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
X_{ay} & X_{by} & X_{cy} & X_{dy} & X_{ey} & X_{fy} & X_{gy}
\end{bmatrix}
\tag{1}
\end{equation}

where $X_a$ is the frequency deviation ($\Delta f$), $X_b$ is the voltage deviation ($\Delta V$), $X_c$ is the rate of change of voltage magnitude ($\Delta V/\Delta t$), $X_d$ is the current total harmonic distortion ($THD_C$), $X_e$ is the fundamental current total harmonic distortion ($THD_{Cf}$), $X_f$ is the voltage total harmonic distortion ($THD_V$), $X_g$ is the fundamental voltage total harmonic distortion ($THD_{Vf}$) and $y$ is the number of points taken after the disturbance is detected.
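As a rough illustration of how the rows of (1) might be assembled, the sketch below derives the frequency deviation, the voltage deviation and the rate of change of voltage from sampled signals and keeps $y$ points after the disturbance. The exact formulas are not given here, so the definitions used (deviation from a nominal value, backward finite difference over the sampling interval), the function names, the sampling interval and the sample values are all assumptions. The four THD columns are omitted for brevity.

```python
from typing import List, Sequence


def deviation(signal: Sequence[float], nominal: float) -> List[float]:
    """Deviation of each sample from a nominal value (assumed definition)."""
    return [s - nominal for s in signal]


def rate_of_change(signal: Sequence[float], dt: float) -> List[float]:
    """Backward finite difference; the first sample has no predecessor, so 0.0."""
    return [0.0] + [(signal[i] - signal[i - 1]) / dt for i in range(1, len(signal))]


def build_input(features: Sequence[Sequence[float]], start: int, y: int) -> List[List[float]]:
    """Stack y samples from index `start` onward into rows [X_a, X_b, ...]."""
    return [[f[i] for f in features] for i in range(start, start + y)]


# Hypothetical per-unit voltage magnitude and frequency samples.
v_mag = [1.00, 1.00, 0.97, 0.93, 0.90, 0.88]
freq = [60.0, 60.0, 60.1, 60.3, 60.4, 60.4]
dt = 0.01  # assumed sampling interval in seconds

x_a = deviation(freq, 60.0)      # frequency deviation
x_b = deviation(v_mag, 1.0)      # voltage deviation
x_c = rate_of_change(v_mag, dt)  # rate of change of voltage magnitude

# Assume the disturbance is detected at sample index 2 and y = 4 points are kept.
rows = build_input([x_a, x_b, x_c], start=2, y=4)
```

Each row of `rows` corresponds to one row of (1), restricted to the first three columns.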
[Figure 2 panels, plotted against sample number: (a) Voltage (V), (b) Normalized deltaV (V), (c) Normalized dV/dt (V/s), (d) Frequency (Hz), (e) Frequency (Hz), (f) Normalized deltaF (Hz).]

Figure 2. Example features extraction for islanding case: (a) Phase A voltage signal, (b) Voltage deviation ($\Delta V$), (c) Rate of change of voltage ($\Delta V/\Delta t$), (d) Phase A frequency signal, (e) Zoom in frequency after disturbance, (f) Frequency deviation ($\Delta f$).

Figure 3. Example features extraction for islanding case: (a) Voltage total harmonic distortion ($THD_V$), (b) Current total harmonic distortion ($THD_C$).
3. FEATURE EXTRACTION USING CORRELATION AND COEFFICIENT ANALYSIS
The inclusion of irrelevant and redundant features in the classifier model may result in poor classification accuracy and increased computation time. To obtain high classification accuracy, high-quality features that describe the islanding events need to be extracted using the correlation and coefficient analysis. Fig. 4 shows the correlation between the 28 feature variables. The colours and shapes of the elements in the figure are used to show the degree of correlation [30]. Each variable has perfect correlation with itself, which appears on the diagonal of the graphic (see Fig. 4). The colour encodes the sign of the correlation: blue for positive values and red for negative values. Likewise, the circle fill encodes the sign: a filled circle denotes a positive value, while an anti-clockwise fill denotes a negative value. In this analysis, the Pearson correlation coefficient is utilized to measure the strength of the relationship between the 28 feature variables. Mathematically, the coefficient is expressed as follows:

\begin{equation}
r = \frac{N\sum kl - \left(\sum k\right)\left(\sum l\right)}{\sqrt{\left[N\sum k^{2} - \left(\sum k\right)^{2}\right]\left[N\sum l^{2} - \left(\sum l\right)^{2}\right]}}
\tag{2}
\end{equation}

where $N$ is the number of pairs of scores, $\sum kl$ is the sum of the products of paired scores, $\sum k$ is the sum of the $k$ scores, $\sum l$ is the sum of the $l$ scores, $\sum k^{2}$ is the sum of the squared $k$ scores, and $\sum l^{2}$ is the sum of the squared $l$ scores.

Figure 4. Visual summary of correlation between the 28 candidate attributes for phase A.

For instance, Fig. 4 shows that the most positively correlated variable is $X_g$, for which most of the relationships with the other variables are positive. The correlations between $X_{b1}$ and $X_{c1}$, $X_{b1}$ and $X_{c4}$, and $X_{b3}$ and $X_{c1}$ are evaluated as -0.6746369, -0.6300237 and -0.3214842, respectively. Accordingly, the red circles in Fig. 4 show the negative correlation between $X_b$ and $X_c$. This finding indicates that $X_b$ is the most negatively correlated of the features, as shown in Fig. 4. The significance of the variables is again highlighted by the importance analysis report from the RF learning, as illustrated in Fig. 5. The figure shows that the top four variables are $[X_g; X_e; X_b; X_c]$. The out-of-bag accuracy-based ranking gives approximately the same top four, even though $X_b$ should be substituted with the variable having lower correlation with $X_g$.

Similar to the islanding detection classifier procedure adopted for phase A, the classifier training and testing procedures are applied to the other two phases, namely phases B and C. Figs. 6(a) and 6(b) show the correlation between the 28 feature variables for phases B and C, respectively. The figures depict the pattern of relations among the 28 feature variables. Meanwhile, Fig. 7 shows the variable importance report from the RF model classifier for phases B and C. Phases B and C show equally important variables. The observation likewise reveals that the top four variables are $[X_g; X_e; X_b; X_c]$.
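The Pearson coefficient in (2) can be computed directly from the raw sums it names. The following is a minimal sketch in plain Python. The function name `pearson_r` and the two sample columns are illustrative assumptions and are not taken from the simulation data.

```python
import math
from typing import Sequence


def pearson_r(k_scores: Sequence[float], l_scores: Sequence[float]) -> float:
    """Pearson correlation using the raw-score form of Eq. (2)."""
    if len(k_scores) != len(l_scores) or len(k_scores) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    n = len(k_scores)
    sum_k, sum_l = sum(k_scores), sum(l_scores)
    sum_kl = sum(a * b for a, b in zip(k_scores, l_scores))
    sum_k2 = sum(a * a for a in k_scores)
    sum_l2 = sum(b * b for b in l_scores)
    denom = math.sqrt((n * sum_k2 - sum_k ** 2) * (n * sum_l2 - sum_l ** 2))
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return (n * sum_kl - sum_k * sum_l) / denom


# Hypothetical feature columns: one rises while the other falls.
x_b = [0.10, 0.20, 0.30, 0.40, 0.50]
x_c = [0.90, 0.70, 0.65, 0.30, 0.20]
r_bc = pearson_r(x_b, x_c)  # negative, since the columns move in opposite directions
```

Applying such a function to every pair of the 28 feature columns would give the matrix of coefficients that a correlation plot such as Fig. 4 visualizes.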
Figure 5. Top-down importance of variables according to accuracy loss or misclassification rate reduction (Gini) for phase A.
Figure 6. Visual summary of correlation between the 28 candidate attributes: (a) phase B, (b) phase C.
Figure 7. Top-down importance of variables according to accuracy loss or misclassification rate reduction (Gini): (a) phase B, (b) phase C.
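The Gini-based ranking in Figs. 5 and 7 rests on how much a split on a variable reduces the Gini impurity of the class labels. The sketch below shows that idea for a single feature and a single threshold split. It is a simplified illustration, not the procedure used by the software in this study, and the feature values and labels are hypothetical.

```python
from typing import Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of binary labels (1 = islanding, 0 = non-islanding)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, impurity decrease) of the best 'value < threshold' split."""
    parent = gini(labels)
    ordered = sorted(set(values))
    best = (ordered[0], 0.0)
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(ordered, ordered[1:]):
        thr = (lo + hi) / 2.0
        left = [c for v, c in zip(values, labels) if v < thr]
        right = [c for v, c in zip(values, labels) if v >= thr]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if parent - weighted > best[1]:
            best = (thr, parent - weighted)
    return best


# Hypothetical normalized feature values and class labels.
x_e = [0.55, 0.58, 0.60, 0.62, 0.64, 0.65, 0.66, 0.70]
cls = [1, 1, 1, 1, 0, 0, 0, 0]
threshold, decrease = best_split(x_e, cls)
```

A variable whose splits yield large impurity decreases across many trees ranks high in a Gini-based importance report.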
4. IMPLEMENTATION OF DECISION TREE AND RANDOM FOREST AS CLASSIFIERS
Fig. 8 illustrates the DT structure of the islanding classification model for inverter-based DG, which consists of 8 nodes. At the top of the tree, the value of $X_e$ is first compared with the threshold value 0.632898, and the data are split into two descendant subsets. These subsets are then split further into leaf nodes, each of which is designated by a class label. There are two class labels in this study, namely islanding and non-islanding. From the figure, all the cases having $X_e$ between 0.63 and 0.65 are predicted as the non-islanding state. However, for cases with $X_e$ less than 0.63, the classification depends on the values of $X_b$ and $X_c$.

Figure 8. DT generated for phase A considering optimal node of inverter-based DG.

Fig. 9 shows the multidimensional scaling (MDS) plot for islanding and non-islanding events utilizing the RF classifier. MDS is used to discover the underlying structure of the distances measured between objects. It assigns the observations to specific locations in a conceptual space (commonly two- or three-dimensional) such that the distances between points in the space match the given dissimilarities as closely as possible.

Figure 9. Multidimensional scaling plot of proximity matrix from random forest.

5. TEST RESULTS
The simulation data were obtained using MATLAB/SIMULINK software, and the data were randomly divided into training and testing data sets as summarized in Table 1. The features are extracted from the information given in (1). The open-source software Rattle is used to implement the conventional DT, bagging and RF classifiers. For easy comparison, all the classifiers use the same training and testing data sets, with two class labels as predictions, namely islanding and non-islanding events.
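The random division into training and testing sets can be sketched as a shuffle followed by a cut. The set sizes follow Table 1, but the shuffling method, the seed and the use of integer case IDs in place of the feature records are assumptions. This simple version also does not control how many islanding and non-islanding cases fall on each side.

```python
import random
from typing import List, Tuple


def random_split(samples: List[int], n_train: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle a copy of the samples and cut it into training and testing parts."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# 371 cases in total: 186 for training and 185 for testing, as in Table 1.
case_ids = list(range(371))
train_ids, test_ids = random_split(case_ids, n_train=186)
```

Fixing the seed keeps the split reproducible, so every classifier can be trained and tested on the same two sets.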
Table 2 shows the classification results on the testing data set of phase A for three different classifiers, namely the DT, bagging and RF classifiers. This result reveals that the highest accuracy is achieved with the RF classifier, with classification accuracies of 98.9% and 100% for the non-islanding and islanding cases, respectively.

Table 1. NUMBER OF SAMPLES
                    Non-islanding  Islanding  Total
Training data set   95             91         186
Testing data set    91             94         185

Table 2. CLASSIFICATION RESULTS ON TESTING DATA SETS FOR PHASE A
                                                Actual class
Classifier model  No. of cases  Class           Non-islanding  Islanding  Classification accuracy (%)
Decision tree     91            Non-islanding   78             4          85.71
                  94            Islanding       13             90         95.74
Bagging           91            Non-islanding   89             1          97.80
                  94            Islanding       2              93         98.94
Random forest     91            Non-islanding   90             0          98.90
                  94            Islanding       1              94         100

Further comparison is then made for islanding detection using the SVM, NN, DT, bagging and RF classifiers considering all three phases. The accuracy of these classifiers is evaluated as shown in Fig. 10 and Table 3. Table 3 shows the accuracy of the five classifiers for islanding detection at each phase, i.e., phases A, B and C. For all the phases, the RF classifier gives the highest accuracy compared to the other classifiers in detecting islanding events, as indicated in bold. This result indicates that the best classifier model for predicting the islanding condition based on per-phase feature extraction is obtained using the RF classifier.

[Figure 10 axes: Phases (A, B, C) versus Accuracies (0.8 to 1); legend: SVM, NN, DT, Bagging, RF.]

Figure 10. Accuracies of various models
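The per-class accuracies in Table 2 can be reproduced from the counts. The sketch below assumes that each count column corresponds to one actual class, an interpretation suggested by the column totals matching the 91 and 94 testing cases, so that the diagonal entries are the correctly classified cases. The overall accuracy is computed only for illustration and is not a figure quoted in this section.

```python
from typing import Dict, Tuple

# (correctly classified, total cases) per class, read from Table 2
# under the column-wise interpretation described above.
results: Dict[str, Dict[str, Tuple[int, int]]] = {
    "Decision tree": {"Non-islanding": (78, 91), "Islanding": (90, 94)},
    "Bagging": {"Non-islanding": (89, 91), "Islanding": (93, 94)},
    "Random forest": {"Non-islanding": (90, 91), "Islanding": (94, 94)},
}


def class_accuracy(correct: int, total: int) -> float:
    """Percentage of cases of one class that were classified correctly."""
    return 100.0 * correct / total


def overall_accuracy(per_class: Dict[str, Tuple[int, int]]) -> float:
    """Percentage of all testing cases that were classified correctly."""
    correct = sum(c for c, _ in per_class.values())
    total = sum(t for _, t in per_class.values())
    return 100.0 * correct / total


for model, per_class in results.items():
    parts = ", ".join(
        f"{name}: {class_accuracy(c, t):.2f}%" for name, (c, t) in per_class.items()
    )
    print(f"{model}: {parts}, overall: {overall_accuracy(per_class):.2f}%")
```

Reporting the two classes separately, as Table 2 does, shows whether a classifier errs more on non-islanding or on islanding cases, which a single overall figure would hide.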