International Journal of Electrical and Computer Engineering (IJECE)
Vol. 9, No. 2, April 2019, pp. 1359-1373
ISSN: 2088-8708, DOI: 10.11591/ijece.v9i2.pp1359-1373

Improved optimization of numerical association rule mining using hybrid particle swarm optimization and Cauchy distribution

Imam Tahyudin¹ and Hidetaka Nambo²
¹,² Artificial Intelligence Laboratory, Graduate School of Natural Science and Technology, Division of Electrical Engineering and Computer Science, Kanazawa University, Japan
¹ Department of Information System, STMIK AMIKOM Purwokerto, Indonesia

Article Info
Article history: Received Sep 7, 2017; Revised Sep 10, 2018; Accepted Sep 16, 2018
Keywords: Numerical data, ARM, PSO, Cauchy distribution, Multi-objective functions, PARCD

ABSTRACT
Particle Swarm Optimization (PSO) has been applied to solve optimization problems in various fields, such as Association Rule Mining (ARM) of numerical data. However, PSO often becomes trapped in local optima. Consequently, the results do not represent the overall optimum solutions. To address this limitation, this study combines PSO with the Cauchy distribution (PARCD), which expands the search space and is expected to improve the global optimum found. Furthermore, this study uses multiple objective functions, i.e., support, confidence, comprehensibility, interestingness and amplitude. The proposed method was evaluated using benchmark datasets, namely the Quake, Basketball, Body fat, Pollution, and Bolt datasets. The evaluation results were compared to the results obtained by previous studies. The results indicate that the overall values of the objective functions obtained using the proposed PARCD approach are satisfactory.

Copyright © 2019 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Imam Tahyudin,
Artificial Intelligence Laboratory, Graduate School of Natural Science and Technology, Electrical Engineering and Computer Science, Kanazawa University,
Kakumamachi, Kanazawa, Ishikawa, Japan.
Tel.: +81-76-234-4835
Fax: +81-76-234-4900
Email: imam@blitz.ec.t.kanazawa-u.ac.jp
Journal Homepage: http://iaescore.com/journals/index.php/IJECE

1. INTRODUCTION
The ARM or association analysis method is used to find associations or relationships between variables that often arise simultaneously in a dataset [1]. In other words, association analysis builds a rule over several variables in a dataset, each of which acts as either an antecedent or a consequent.

The Apriori and Frequent Pattern (FP) growth methods are widely employed in association analysis. These methods are suitable for categorical or binary data, such as gender data, where, e.g., males can be represented by 0 and females by 1 [2]. If the data are numeric, such as age, weight or length, these methods first transform the numerical data into categorical data (i.e., a discretization process). This transformation requires more time and can lose a significant amount of important information because it does not preserve the meaning of the original data [3], [4], [5]. For example, if the age of a 35-year-old person is transformed to 1, the original meaning of the age information is obscured. In addition, both methods require manual intervention to determine the minimum support (attribute coverage) and confidence (accuracy) values. This step is subjective in some cases; thus, the results may not be optimal [6], [7].
To resolve this problem, some researchers have proposed solutions that employ optimization approaches, e.g., particle swarm optimization (PSO) [4], fuzzy logic [8], and the genetic algorithm (GA) [3], [7]. In particular, a PSO approach with multiple objective functions has been used to solve association analysis of numerical data without a discretization process. It produced better results than earlier optimization methods, and it finds optimal values automatically, without the minimum support and minimum confidence having to be specified. However, this method can also become trapped in local optima. As the number of iterations tends toward infinity, the velocity of a particle approaches 0 (the inertia weight in the velocity function is between 0 and 1). The search then effectively terminates, because PSO cannot move toward a better value when the velocity is 0. Thus, PSO often fails to find the overall optimal value [4], [9], [10].

We propose a method that addresses both this premature convergence and the limitations of traditional methods, because it does not use a discretization process. In other words, the original data are processed directly using the concept of the Michigan or Pittsburgh approaches. Furthermore, support and confidence threshold values are determined automatically using the Pareto optimality concept. To avoid local optima, PSO is combined with the Cauchy distribution. This combination increases the size of the search space and is expected to produce a better optimal value. Yao et al. (1999) reported that combining a function with the Cauchy distribution results in a wider coverage area; thus, when the Cauchy distribution is combined with the velocity function of the PSO method, the optimal value is expected to improve [10].
Therefore, the purpose of this study is to find the optimal value for numerical data in association analysis problems by combining PSO with the Cauchy distribution (PARCD). Furthermore, we compute the values of several objective functions, namely support, confidence, comprehensibility, interestingness, and amplitude, as parameters to evaluate the performance of the proposed method.

Association analysis of numerical data is generally approached in three ways: discretization, distribution and optimization. Discretization is performed using partitioning and combining, clustering [11], [12] and fuzzy [8] methods, and the optimization approach is realized using the optimized association rule [13], differential evolution [14], GA [3], [7] and PSO [4], [15], as shown in Figure 1.

Figure 1. Numeric association analysis rule mining

We focus on solving the association analysis of numerical data by optimization. An early optimization approach is the GAR method, which attempts to find the optimal item set with the best support value without using a discretization process [13]. The differential evolution approach comprises the generation of an initial population followed by mutation, crossover and selection operations; its multi-objective functions are optimized using Pareto optimality theory. This method is known as MODENAR [14]. A further study addressed numerical association rule mining using a genetic algorithm approach (ARMGA). It solved association analysis problems for numerical data without the minimum support or minimum confidence values being set manually. In addition, this method can extract the rule that has the best relationship between the support and confidence values [7]. Another GA-based study proposed the MOGAR method.
That study showed that MOGAR was faster than conventional methods such as the Apriori and FP-growth algorithms because its time complexity is lower: the running time of MOGAR grows quadratically, whereas that of the Apriori algorithm grows exponentially and therefore requires more computation time [3].

PSO has also been used to solve ARM problems. In one study, ARM was used to investigate associations among frequent and repeated dysfunctions in a production process; employing PSO resulted in a faster and more effective optimization process than the other optimization methods [16]. In addition, the PSO approach was used to improve the computational efficiency of ARM problems such that appropriate support and confidence values could be determined automatically [17]. In 2012, PSO for ARM problems was extended by weighting the item set. This weighting is very important for very large data because such data often contain important information that appears infrequently. For example, in medical data, the rule {stiff neck, fever, aversion to light} → {meningitis} appears rarely, but it is very important because the condition it describes does occur in practice [18]. In 2013, Sarath and Ravi introduced binary PSO (BPSO) to generate association rules in a transaction database. This method is similar to the Apriori and FP-growth algorithms; however, BPSO can determine optimum rules without the minimum support and confidence values being specified [19]. In 2014, Beiranvand et al. studied numerical data association analysis using the PSO method. They stated that the method could effectively solve numerical association analysis problems without using a discretization process. It employs four objective functions, i.e., support, confidence, comprehensibility and interestingness, and is referred to as MOPAR [4].
In 2014, Indira and Kanmani also used a PSO approach; however, they attempted to improve the results and analysis time by using an adaptive parameter determination process to set various parameters, such as the constants and the weight value in the velocity equation. They developed the Apriori algorithm using a PSO approach (APSO), and the results demonstrated that this approach was faster and better than using only the Apriori method [15].

In addition, a combination of PSO and GSA has been used to solve the optimal reactive power dispatch problem in power systems. The problem was solved with an efficient and reliable technique, and the results were found to be largely satisfactory compared with those reported earlier [20]. Verma and Lakhwani examined ARM problems by combining PSO and a GA. The results showed better accuracy and consistency compared to the individual PSO or GA methods [21].

There have been many further developments of the PSO method, e.g., "the implementation of PSO in distributed generation sizing" [22], "improved canny edges using cellular based PSO technique in digital images" [23], and hybrid methods. One such hybrid method is PSO combined with the Cauchy distribution [24], which provides better results than using only PSO. In 2011, this combined method was retested for SVM parameter selection [25-27]. The combined approach was also used to address performance weaknesses in a process that identifies a watermark image based on the discrete cosine transform (DCT). The results demonstrated that combining PSO with the Cauchy distribution outperforms the compared method [28]. In 2014, an empirical study showed that PSO with the Cauchy distribution gave better results than using only PSO [29].
To the best of our knowledge, combining PSO with the Cauchy distribution has not been applied to ARM problems that involve numerical data. This research therefore contributes to optimization approaches for numerical ARM problems.

The remainder of this paper is organized as follows. The research method is discussed in Section 2, which describes the design of the multiple objective functions and the development of the proposed PARCD method. Section 3 presents the experimental results and a discussion of the proposed method, which was tested using benchmark datasets. This section also compares the results obtained by the proposed PARCD method with those of existing methods. Conclusions and suggestions for future work are provided in Section 4.

2. RESEARCH METHOD
2.1. Objective Design
This study uses multiple objective functions, i.e., support, confidence, comprehensibility, interestingness and amplitude. First, the support criterion determines the ratio of the transactions containing item X to the total number of transactions (D), i.e., support(X) = |X| / |D|. If A is the antecedent of a rule over the transaction dataset (the precondition), then C is the consequence (the conclusion). The support value of the rule "if A then C" (A → C) is computed as follows:

$$\mathrm{Support}(A \cup C) = \frac{|A \cup C|}{|D|} \tag{1}$$
where |A ∪ C| is the number of transactions that contain both A and C.

The minimum support value is closely linked to the number of items covered to determine the referenced rule. If the threshold value is low, the support covers many items, and vice versa. The support measurement is used to determine the confidence criterion, i.e., the criterion used to measure the quality or accuracy of the rule derived from the total transactions. Such rules are often developed for each transaction to better demonstrate quality or accuracy [4]. Confidence can be expressed as follows:

$$\mathrm{Confidence}(A \cup C) = \frac{\mathrm{Support}(A \cup C)}{\mathrm{Support}(A)} \tag{2}$$

However, these criteria are not guaranteed to produce appropriate rules. Thus, for a given rule to be considered reliable and to provide overall coverage, the result must also satisfy the comprehensibility and interestingness criteria. Ghosh and Nath (2004) stated that a rule with fewer attributes in its antecedent is more comprehensible [30]. The comprehensibility criterion can be expressed as follows:

$$\mathrm{Comprehensibility}(A \cup C) = \frac{\log(1 + |C|)}{\log(1 + |A \cup C|)} \tag{3}$$

where |C| is the number of consequence items and |A ∪ C| is the total number of items in the rule "if A then C" (A → C).

Next, the interestingness criterion is used to uncover hidden information by extracting interesting or unique rules. This criterion is based on the support value and is expressed as follows:

$$\mathrm{Interestingness}(A \cup C) = \frac{\mathrm{Supp}(A \cup C)}{\mathrm{Supp}(A)} \times \frac{\mathrm{Supp}(A \cup C)}{\mathrm{Supp}(C)} \times \left(1 - \frac{\mathrm{Supp}(A \cup C)}{|D|}\right) \tag{4}$$

The right side of Eq. (4) consists of three components. The first component gives the generation probability of the rule based on the antecedent attributes. The second is based on the consequence attributes, and the third is based on the total dataset. There is a negative correlation between interestingness and support.
When the support value is high, the interestingness value is low because the number of frequent items covered is small [4].

The last criterion is the amplitude interval. The amplitude interval is a minimization objective and thus differs from the support, confidence and comprehensibility measures, which are maximization objectives. The amplitude interval is expressed as follows:

$$\mathrm{Amplitude}(A \cup C) = 1 - \frac{1}{m} \sum_{i=1}^{m} \frac{u_i - l_i}{\max(A_i) - \min(A_i)} \tag{5}$$

Here, m is the number of attributes in the item set (|A ∪ C|), u_i and l_i are the upper and lower bounds encoded in the item set for attribute i, and max(A_i) and min(A_i) are the allowable limits of the interval for attribute i. Thus, rules with smaller intervals are intended to be generated [14].

2.2. PSO
PSO, which was first introduced by Kennedy and Eberhart (1995), is an evolutionary method inspired by animal behavior, e.g., flocks of birds, schools of fish, or swarms of bees [31]. PSO begins with a set of random particles. Then, a search process attempts to find the optimal value by repeatedly updating the generation. During each iteration, each particle is updated by following two best values. The first is the best solution (fitness) the particle itself has achieved so far. This value is called pBest. The other best value tracked by the particle swarm optimizer is the best value obtained so far by any particle in the population. This value is called gBest. After finding pBest and gBest, each particle's velocity and corresponding position are updated [15].

Each particle p in iteration t has a position x(t) and a displacement speed v(t). The particle best (pBest) and global best (gBest) positions are stored in memory. The speed and position are updated using Eqs. (6) and (7), respectively [15].

$$V_{i,new} = \omega V_{i,old} + C_1\,\mathrm{rand}()\,(pBest - X_i) + C_2\,\mathrm{rand}()\,(gBest - X_i) \tag{6}$$
$$X_{i,new} = X_{i,old} + V_{i,new} \tag{7}$$

Here ω is the inertia weight; V_{i,old} is the velocity of the i-th particle before updating; V_{i,new} is the velocity of the i-th particle after updating; X_i is the position of the i-th (current) particle; i is the particle index; rand() is a random number in the range (0, 1); C_1 is the cognitive component; C_2 is the social component; pBest is the particle best, i.e., the best position found by the particle (a local optimum) over the iterations of a run; gBest is the global best, i.e., the best position found by the swarm over the iterations of a run. Particle velocities in each dimension are restricted to a maximum velocity V_max [32].

2.3. Cauchy Distribution
Yao et al. (1999) used a Cauchy distribution to implement a wider mutation scale [10]. The general formula for its probability density function, with location parameter t and scale parameter s, is expressed as follows.

$$f(x) = \frac{1}{s\,\pi\,\left(1 + \left((x - t)/s\right)^2\right)} \tag{8}$$

A Cauchy random variable is calculated as follows. For any random variable X with distribution function F, the random variable Y = F(X) has a uniform distribution in the range [0,1). Consequently, if F is inverted, a uniform random variable can be used to simulate the random variable X because X = F^{-1}(Y). For the standard case (t = 0, s = 1), the cumulative distribution function of the Cauchy distribution is expressed as follows

$$F(x) = \frac{1}{\pi} \arctan(x) + 0.5 \tag{9}$$

Therefore, if

$$y = \frac{1}{\pi} \arctan(x) + 0.5 \tag{10}$$

then, by inverting this function, the Cauchy random variable can be expressed as follows

$$x = \tan\left(\pi\,(y - 0.5)\right) \tag{11}$$

This function can be expressed by Eq. (12) because y has a uniform distribution in the range (0,1]. Thus, we obtain the following,

$$x = \tan\left(\frac{\pi}{2}\,\mathrm{rand}[0,1)\right) \tag{12}$$

2.4. PSO for Numerical Association Rule Mining with Cauchy Distribution
PARCD is an extension of the MOPAR method that combines PSO and the Cauchy distribution to solve problems that occur in the association analysis of numerical data [33].
The goal is to find the optimal value and avoid being trapped in local optima. Essentially, this method uses the concept of PSO but modifies the velocity equation by including the Cauchy distribution. The velocity function is expressed as follows,

$$V_i(t+1) = \omega(t)\,V_i(t) + C_1\,\mathrm{rand}()\,(pBest - X_i(t)) + C_2\,\mathrm{rand}()\,(gBest - X_i(t)) \tag{13}$$

The next step is to normalize the value V_i(t+1) from Eq. (13) so that the vector has length 1. The variance of the Cauchy distribution is infinite and the objective function scales are 1 [10].

$$U_i(t+1) = \frac{V_i(t+1)}{\sqrt{V_{i1}(t+1)^2 + V_{i2}(t+1)^2 + \dots + V_{iK}(t+1)^2}} \tag{14}$$

The result of the normalization process is multiplied by the Cauchy random variable as follows.

$$S_i(t+1) = U_i(t+1)\,\tan\left(\frac{\pi}{2}\,\mathrm{rand}[0,1)\right) \tag{15}$$
Then, the result of Eq. (15), which combines the velocity value and the Cauchy distribution, is used to determine the new position of a particle.

$$X_i(t+1) = X_i(t) + S_i(t+1) \tag{16}$$

2.5. PARCD Pseudocode and Flowchart
The PARCD pseudocode (Figure 2) and flowchart (Figure 3) show that the algorithm begins by initializing the velocity vector and position randomly. The algorithm calculates the multi-objective functions as the current fitness. Then, it iterates, updating pBest in each iteration, until it finds the gBest value as the optimal solution.

Figure 2. PARCD pseudocode

Figure 3. PSO flowchart
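The update in Eqs. (13) to (16) can be illustrated with a short Python sketch. This is not the authors' MATLAB implementation; the function name `parcd_step`, the array layout, drawing rand() independently per dimension, drawing one Cauchy factor per particle, and the guard for a zero-length velocity are all assumptions made for illustration. The default values of ω, C1 and C2 mirror those reported in the experimental setup. Velocity limits and boundary handling are omitted.

```python
import numpy as np


def parcd_step(x, v, pbest, gbest, w=0.63, c1=2.0, c2=2.0, rng=None):
    """One PARCD-style update for a whole swarm (illustrative sketch).

    x, v, pbest : float arrays of shape (n_particles, K)
    gbest       : float array of shape (K,)
    Returns the new positions and the new (un-normalized) velocities.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, k = x.shape

    # Eq. (13): standard PSO velocity update (rand() per dimension: assumption).
    r1 = rng.random((n, k))
    r2 = rng.random((n, k))
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)

    # Eq. (14): normalize each particle's velocity to unit length.
    # A zero vector is left as zero to avoid division by zero (assumption).
    norm = np.linalg.norm(v_new, axis=1, keepdims=True)
    u = np.divide(v_new, norm, out=np.zeros_like(v_new), where=norm > 0)

    # Eq. (15): scale the unit direction by tan(pi/2 * rand[0, 1)),
    # which is non-negative and heavy-tailed (one draw per particle: assumption).
    cauchy = np.tan(0.5 * np.pi * rng.random((n, 1)))
    s = u * cauchy

    # Eq. (16): move each particle along its scaled direction.
    x_new = x + s
    return x_new, v_new


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x = rng.random((4, 3))
    v = rng.random((4, 3))
    x_new, v_new = parcd_step(x, v, pbest=x.copy(), gbest=x[0], rng=rng)
    print(x_new.shape, v_new.shape)
```

Because the direction is normalized before scaling, the step length depends only on the Cauchy factor, so occasional very long jumps remain possible even when the PSO velocity itself has shrunk toward zero.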
3. RESULT AND DISCUSSION
3.1. Experimental Setup
We conducted an experiment using the Quake, Basketball, Body fat, Pollution, and Bolt benchmark datasets (Table 1) from the Bilkent University Function Approximation Repository. The experiment was performed using a computer with an Intel Core i5 processor and 8 GB of main memory running Windows 7. The algorithms were implemented using MATLAB. For the proposed algorithm, we set the population size, external repository size, number of iterations, C1 and C2, ω, velocity limit and xRank parameters to 40, 100, 2000, 2, 0.63, 3.83, and 13.33, respectively (Table 2).

Table 1. Dataset Properties

| Dataset | No. of Records | No. of Attributes |
|---|---|---|
| Quake | 2178 | 4 |
| Basketball | 96 | 5 |
| Body fat | 252 | 15 |
| Pollution | 60 | 16 |
| Bolt | 40 | 8 |

Table 2. Parameters

| Parameter | Population Size | External Repository Size | Number of Iterations | C1, C2 | ω | Velocity Limit | xRank |
|---|---|---|---|---|---|---|---|
| Average | 40 | 100 | 2000 | 2 | 0.63 | 3.83 | 13.33 |

3.2. Experiments
Association rule analysis comprises two steps. The first step is to determine the frequent itemset that includes the antecedents or consequences of each attribute. The second step is to implement the proposed algorithm.

3.2.1. Output Rules of the PARCD Results
This experiment reports the 20th run, where each run contains 2000 rules. We present the output rules for three datasets, i.e., the Body fat, Bolt, and Pollution datasets.

Table 3 shows the results obtained with the Body fat dataset. For Rule 1, there are eight antecedent attributes and three consequent attributes. For Rule 2, the numbers of antecedent and consequent attributes are the same as for Rule 1. For the last rule, the numbers of antecedent and consequent attributes are six and two, respectively.

The antecedent attributes of Rule 1 are case number, percent body fat (Siri's equation), density, age, adiposity index, chest circumference, abdomen circumference, and thigh circumference.
The consequent attributes are percent body fat (Brozek's equation), height, and hip circumference. For Rule 2, the antecedent and consequent attributes are the same as for Rule 1. Thus, Rules 1 and 2 can be expressed as follows: if (att1, att3, att4, att5, att8, att11, att12, att14) then (att2, att7, att13). For Rule 2000, the antecedent attributes are percent body fat using Brozek's equation, percent body fat using Siri's equation, density, height, neck circumference and knee circumference, and the consequent attributes are case number and weight. Therefore, Rule 2000 is if (att2, att3, att4, att7, att10, att15) then (att1, att6).

Table 4 shows the results obtained with the Bolt dataset, which has eight attributes (run, speed1, total, speed2, number2, sens, time and T20Bolt). As can be seen, the first two rules have the same antecedent and consequent attributes. The antecedent attributes are total and time, and the consequent attributes are run and speed1. Therefore, the rule is if (total, time) then (run, speed1). Rule 2000 shows that the antecedent attributes are run and speed2. However, the consequent attribute is unknown. Thus, this rule cannot be stated clearly because it does not have a conclusion.

Table 5 shows the rule results for the Pollution dataset obtained using the proposed PARCD particle representation. The results for the first and second rules are the same. Here, the antecedent attributes are JANT, EDUC, NONW, and WWDRK, and the consequent attributes are PREC, JULT, OVR65, DENS and HUMID. Thus, the rule is if (JANT, EDUC, NONW, WWDRK) then (PREC, JULT, OVR65, DENS, HUMID). Rule 2000 has an ACN result that differs from those of the first and second rules. The antecedent attributes of Rule 2000 are JANT, OVR65, HOUS, POOR, HC and HUMID, and its consequent attributes are POPN, EDUC, DENS, NOX, and SO@. Thus, the final rule is if (JANT, OVR65, HOUS, POOR, HC, HUMID) then (POPN, EDUC, DENS, NOX, SO@).

Table 3. ACN Rules (the Body fat dataset)

| Rules | ACN | LB < Attribute < UB |
|---|---|---|
| Rule 1 | Antecedent | 1.096724 < Att1 < 1.108900 |
| | | 57.988435 < Att3 < 69.574945 |
| | | 309.987803 < Att4 < 314.218245 |
| | | 55.294719 < Att5 < 66.896106 |
| | | 136.234441 < Att8 < 138.744999 |
| | | 40.927433 < Att11 < 41.562953 |
| | | 20.266071 < Att12 < 20.586850 |
| | | 22.220988 < Att14 < 23.180185 |
| | Consequence | 35.426088 < Att2 < 42.169776 |
| | | 113.825926 < Att7 < 122.261793 |
| | | 32.375620 < Att13 < 33.596051 |
| Rule 2 | Antecedent | 1.096724 < Att1 < 1.108900 |
| | | 57.988435 < Att3 < 69.574945 |
| | | 309.987803 < Att4 < 314.218245 |
| | | 55.294719 < Att5 < 66.896106 |
| | | 136.234441 < Att8 < 138.744999 |
| | | 40.927433 < Att11 < 41.562953 |
| | | 20.266071 < Att12 < 20.586850 |
| | | 22.220988 < Att14 < 23.180185 |
| | Consequence | 35.426088 < Att2 < 42.169776 |
| | | 113.825926 < Att7 < 122.261793 |
| | | 32.375620 < Att13 < 33.596051 |
| ... | ... | ... |
| Rule 2000 | Antecedent | 12.402089 < Att2 < 18.144187 |
| | | 56.221481 < Att3 < 65.667791 |
| | | 139.024098 < Att4 < 289.982951 |
| | | 94.156397 < Att7 < 136.200000 |
| | | 57.669974 < Att10 < 87.300000 |
| | | 18.798957 < Att15 < 19.060978 |
| | Consequence | 1.054478 < Att1 < 1.108900 |
| | | 31.100000 < Att15 < 40.883823 |

Note:
Att1: Case number
Att2: Percentage using Brozek's equation
Att3: Percentage using Siri's equation
Att4: Density
Att5: Age (years)
Att6: Weight (lbs)
Att7: Height (inches) (target)
Att8: Adiposity index
Att9: Fat free weight
Att10: Neck circumference (cm)
Att11: Chest circumference (cm)
Att12: Abdomen circumference (cm)
Att13: Hip circumference (cm)
Att14: Thigh circumference (cm)
Att15: Knee circumference (cm)
Att16: Ankle circumference (cm)
Att17: Extended biceps circumference (cm)
Att18: Forearm circumference (cm)
Att19: Wrist circumference (cm)

Table 4. ACN Rules (the Bolt dataset)

| Rules | ACN | LB < Attribute < UB |
|---|---|---|
| Rule 1 | Antecedent | 11.911616 < Att3 < 16.259242 |
| | | 62.782669 < Att7 < 65.562550 |
| | Consequence | 23.688468 < Att1 < 31.295955 |
| | | 5.928943 < Att2 < 6.000000 |
| Rule 2 | Antecedent | 11.911616 < Att3 < 16.259242 |
| | | 62.782669 < Att7 < 65.562550 |
| | Consequence | 23.688468 < Att1 < 31.295955 |
| | | 5.928943 < Att2 < 6.000000 |
| ... | ... | ... |
| Rule 2000 | Antecedent | 13.621221 < Att1 < 29.817232 |
| | | 1.761097 < Att4 < 2.325029 |
| | Consequence | None |

Note:
Att1: RUN
Att2: SPEED1
Att3: TOTAL
Att4: SPEED2
Att5: NUMBER2
Att6: SENS
Att7: TIME
Att8: T20BOLT
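To make the interval rules in the tables above concrete, the following Python sketch scores one rule of the form "lower < attribute < upper" against a small made-up dataset using the support, confidence, comprehensibility and amplitude definitions of Eqs. (1), (2), (3) and (5). The function names, the dictionary encoding of a rule, the toy data, and the use of the observed column minimum and maximum as the allowable limits in Eq. (5) are assumptions for illustration; interestingness is left out for brevity.

```python
import numpy as np


def covered(data, intervals):
    """Boolean mask of records lying strictly inside every interval.

    intervals maps a column index to a (lower, upper) pair (hypothetical encoding).
    """
    mask = np.ones(data.shape[0], dtype=bool)
    for col, (low, high) in intervals.items():
        mask &= (data[:, col] > low) & (data[:, col] < high)
    return mask


def rule_objectives(data, antecedent, consequent):
    """Score the rule antecedent -> consequent on data (illustrative sketch).

    Assumes the two sides use disjoint columns and together name
    at least one attribute.
    """
    n = data.shape[0]
    in_a = covered(data, antecedent)
    in_rule = in_a & covered(data, consequent)

    support_a = in_a.sum() / n
    support = in_rule.sum() / n                                  # Eq. (1)
    confidence = support / support_a if support_a > 0 else 0.0   # Eq. (2)

    n_attr = len(antecedent) + len(consequent)
    comprehensibility = np.log(1 + len(consequent)) / np.log(1 + n_attr)  # Eq. (3)

    # Eq. (5): one minus the mean interval width relative to the attribute range.
    ratios = []
    for col, (low, high) in {**antecedent, **consequent}.items():
        span = data[:, col].max() - data[:, col].min()
        ratios.append((high - low) / span if span > 0 else 0.0)
    amplitude = 1.0 - sum(ratios) / len(ratios)

    return {
        "support": support,
        "confidence": confidence,
        "comprehensibility": comprehensibility,
        "amplitude": amplitude,
    }


# Toy data: four records, two attributes (made up for illustration).
toy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
scores = rule_objectives(toy, antecedent={0: (1.5, 4.5)}, consequent={1: (25.0, 45.0)})
print(scores)
```

In the toy example three of the four records satisfy the antecedent and two satisfy the whole rule, so the support is 0.5 and the confidence is 2/3.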
Table 5. ACN Rules (the Pollution dataset)

| Rules | ACN | LB < Attribute < UB |
|---|---|---|
| Rule 1 | Antecedent | 42.431841 < Att2 < 46.441110 |
| | | 9.675301 < Att6 < 10.303791 |
| | | 24.171326 < Att9 < 27.345700 |
| | | 42.882070 < Att10 < 44.054696 |
| | Consequence | 21.695266 < Att1 < 22.757671 |
| | | 77.760994 < Att3 < 80.221960 |
| | | 6.698662 < Att4 < 7.071898 |
| | | 7436.549761 < Att8 < 7801.004046 |
| | | 58.816363 < Att15 < 63.240005 |
| Rule 2 | Antecedent | 42.431841 < Att2 < 46.441110 |
| | | 9.675301 < Att6 < 10.303791 |
| | | 24.171326 < Att9 < 27.345700 |
| | | 42.882070 < Att10 < 44.054696 |
| | Consequence | 21.695266 < Att1 < 22.757671 |
| | | 77.760994 < Att3 < 80.221960 |
| | | 6.698662 < Att4 < 7.071898 |
| | | 7436.549761 < Att8 < 7801.004046 |
| | | 58.816363 < Att15 < 63.240005 |
| ... | ... | ... |
| Rule 2000 | Antecedent | 39.363260 < Att2 < 46.455909 |
| | | 8.721294 < Att4 < 9.206407 |
| | | 89.212389 < Att7 < 90.700000 |
| | | 21.796671 < Att11 < 23.231486 |
| | | 606.938956 < Att12 < 648.000000 |
| | | 67.768113 < Att15 < 73.000000 |
| | Consequence | 2.956662 < Att5 < 3.005372 |
| | | 9.450171 < Att6 < 10.068287 |
| | | 9345.537477 < Att8 < 9699.000000 |
| | | 225.061313 < Att13 < 288.274133 |
| | | 242.720468 < Att14 < 250.733264 |

Note:
Att1: PREC, average annual precipitation in inches
Att2: JANT, average January temperature in degrees F
Att3: JULT, average July temperature in degrees F
Att4: OVR65, SMSA population aged 65 or older
Att5: POPN, average household size
Att6: EDUC, median school years completed by those over 22
Att7: HOUS, % of housing units which are sound and with all facilities
Att8: DENS, population per sq. mile in urbanized areas, 1960
Att9: NONW, % non-white population in urbanized areas, 1960
Att10: WWDRK, % employed in white collar occupations
Att11: POOR, % of families with income < USD 3000
Att12: HC, relative hydrocarbon pollution potential
Att13: NOX, same as nitric oxides
Att14: SO@, same as sulphur dioxide
Att15: HUMID, annual average relative humidity at 1 pm
Att16: MORT, total age-adjusted mortality rate per 100,000

3.2.2. Output of multi-objective function and correlation of PARCD methods
The basic concept of association analysis comprises two steps: the first step determines the rules, each of which contains an antecedent and a consequent, and the second step implements the algorithm (i.e., the proposed method). The method begins with an initialization process, which determines the multi-objective function values and calculates the velocity and position of each particle i. Then, an iterative process is performed to search for pBest and gBest as the optimal solution.
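The initialization and pBest/gBest bookkeeping described above can be sketched in Python as follows. To stay short, the sketch deliberately swaps in a simpler setting than the paper's: a single scalar objective that is minimized (a toy sphere function) with the plain PSO update of Eqs. (6) and (7), rather than the multi-objective, Pareto-based PARCD procedure. The function name `pso_minimize`, the bound clipping, and the zero initial velocity are assumptions for illustration; the default swarm size, ω, C1 and C2 follow the experimental setup.

```python
import numpy as np


def pso_minimize(f, lower, upper, n_particles=40, n_iter=200,
                 w=0.63, c1=2.0, c2=2.0, seed=0):
    """Plain single-objective PSO loop (illustrative sketch, not PARCD itself)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    k = lower.size

    # Initialization: random positions, zero velocities (assumption).
    x = rng.uniform(lower, upper, size=(n_particles, k))
    v = np.zeros((n_particles, k))
    fit = np.apply_along_axis(f, 1, x)

    # pBest: best position seen by each particle; gBest: best over the swarm.
    pbest, pbest_fit = x.copy(), fit.copy()
    g = int(np.argmin(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), float(pbest_fit[g])
    history = [gbest_fit]

    for _ in range(n_iter):
        r1 = rng.random((n_particles, k))
        r2 = rng.random((n_particles, k))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # Eq. (6)
        x = np.clip(x + v, lower, upper)                           # Eq. (7) + bounds
        fit = np.apply_along_axis(f, 1, x)

        improved = fit < pbest_fit            # update each particle's pBest
        pbest[improved] = x[improved]
        pbest_fit[improved] = fit[improved]

        g = int(np.argmin(pbest_fit))         # update gBest only on improvement
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), float(pbest_fit[g])
        history.append(gbest_fit)

    return gbest, gbest_fit, history


def sphere(p):
    """Toy objective with its minimum at the origin."""
    return float(np.sum(p ** 2))


best, best_fit, history = pso_minimize(sphere, [-5.0, -5.0], [5.0, 5.0], n_iter=50)
print(best, best_fit)
```

Because gBest is replaced only when a strictly better pBest appears, the recorded best fitness never gets worse from one iteration to the next.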
Table 6 shows the results of the multi-objective functions of the PARCD method. Here, there are four parameters, i.e., support, confidence, comprehensibility and interestingness. The method is examined using five datasets, i.e., Quake, Basketball, Body fat, Bolt, and Pollution. Generally, the Bolt dataset is the dominant dataset and has the highest value for each parameter (except comprehensibility). Conversely, the least dominant dataset is Quake (with the exception of the confidence parameter).

Table 6. The Output of PARCD Method

| Dataset | Support (%) | Confidence (%) | Deviation | Comprehensibility | Deviation | Interestingness (%) | Deviation |
|---|---|---|---|---|---|---|---|
| Quakes | 22.97 | 86.73 | 25.88 | 785.2 | 37.72 | 2.34 | 9.30 |
| Basketball | 61.04 | 92.69 | 17.87 | 545.80 | 167.74 | 6.56 | 21.16 |
| Body fat | 73.94 | 81.26 | 30.67 | 333.49 | 218.95 | 10.61 | 21.03 |
| Pollution | 250.84 | 96.88 | 9.49 | 231.08 | 168.35 | 43.43 | 39.68 |
| Bolt | 60.45 | 34.96 | 43.91 | 110.63 | 165.76 | 9.51 | 18.61 |

The first parameter, i.e., support, showed the highest value with the Bolt dataset (250.84%) and the lowest with the Quake dataset (22.97%). The average was approximately 90%. As with support, the highest confidence value was obtained with the Bolt dataset (96.88%), with a deviation of approximately 10. The lowest confidence value was obtained with the Pollution dataset (34.96%), with a very high deviation of just under 45. The average confidence value was approximately 80%. The highest comprehensibility value was obtained with the Quake dataset (approximately 785). The lowest comprehensibility value was obtained with the Pollution dataset (approximately 110, with a deviation of well over 165). The average comprehensibility value was approximately 400. The final parameter, i.e., interestingness, obtained the highest value with the Bolt dataset (approximately 43%, with a deviation of just under 40).
The lowest interestingness value was obtained with the Quake dataset (2.34%, with a deviation of just under 10). The average interestingness value was approximately 15%. These results indicate that the average support and confidence values, approximately 90% and 80% respectively, were satisfactory. Moreover, the comprehensibility value was four times better; however, the interestingness value was not satisfactory (approximately 15%).

The correlation values between the objective functions are shown in Table 7 and Figure 4. The results show significant associations, either positive or negative, between pairs of objective functions. The correlation of every objective function with amplitude was always close to zero. In other words, the correlation with the amplitude function was low. This supports the observation by Alatas et al. (2008) that the amplitude function differs from the other functions because it is minimized while the other functions are maximized.

Table 7. Correlation of Multi-Objective Function

| Dataset | Objective | Support | Confidence | Comprehensibility | Interestingness | Amplitude |
|---|---|---|---|---|---|---|
| Quake | Support | 1 | 0.8076 | 0.2112 | 0.9999 | 0.0000 |
| | Confidence | 0.8076 | 1 | 0.3971 | 0.8077 | 0.0000 |
| | Comprehensibility | 0.2112 | 0.3971 | 1 | 0.2113 | 0.0000 |
| | Interestingness | 0.9999 | 0.8077 | 0.2113 | 1 | 0.0000 |
| | Amplitude | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 1 |
| Basketball | Support | 1 | 0.4360 | -0.7437 | 0.9750 | 0.0000 |
| | Confidence | 0.4360 | 1 | 0.1646 | 0.5716 | 0.0000 |
| | Comprehensibility | -0.7437 | 0.1646 | 1 | -0.6350 | 0.0000 |
| | Interestingness | 0.9750 | 0.5716 | -0.6350 | 1 | 0.0000 |
| | Amplitude | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 1 |
| Body fat | Support | 1 | 0.8137 | -0.8340 | 0.8555 | 0.0000 |
| | Confidence | 0.8137 | 1 | 0.9917 | 0.9469 | 0.0000 |
| | Comprehensibility | 0.8340 | 0.9917 | 1 | 0.9575 | 0.0000 |
| | Interestingness | 0.8555 | 0.9469 | 0.9575 | 1 | 0.0000 |
| | Amplitude | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 1 |