TELKOMNIKA Telecommunication, Computing, Electronics and Control
Vol. 18, No. 3, June 2020, pp. 1331-1342
ISSN: 1693-6930, accredited First Grade by Kemenristekdikti, No: 21/E/KPT/2018
DOI: 10.12928/TELKOMNIKA.v18i3.14756

Comparison of machine learning performance for earthquake prediction in Indonesia using 30 years historical data

I Made Murwantara 1, Pujianto Yugopuspito 2, Rickhen Hermawan 3
1,2 Informatics Department, Faculty of Computer Science, Universitas Pelita Harapan, Indonesia
3 Undergraduate Program, Informatics Department, Faculty of Computer Science, Universitas Pelita Harapan, Indonesia

Article Info
Article history: Received Aug 15, 2019; Revised Jan 13, 2020; Accepted Feb 24, 2020
Keywords: Big data; Earthquake; Machine learning; Multinomial logistic regression; Naïve Bayes; Prediction; SVM

ABSTRACT
Indonesia lies in one of the most earthquake-prone regions of the world, with more than 100 active volcanoes and a high number of seismic events every year. To reduce casualties, several methods have been developed to estimate seismic movement and predict earthquakes. However, most predictions use only short-term historical data, which limits model performance. This work uses medium- to long-term historical earthquake data collected from 2 Indonesian government bodies and 8 reputable international sources. We make medium- to long-term predictions with three machine learning algorithms, namely multinomial logistic regression, support vector machine and Naïve Bayes, and compare their performance. The support vector machine outperforms the other methods. We compare the root mean square error (RMSE), which indicates how concentrated the data are around the line of best fit: multinomial logistic regression scores 0.777, Naïve Bayes 0.922 and the support vector machine 0.751.
In predicting future earthquakes, the support vector machine outperforms the other two methods, whose predictions lie at a significant distance and magnitude from the actual earthquake reports.
This is an open access article under the CC BY-SA license.

Corresponding Author:
I Made Murwantara, Informatics Department, Faculty of Computer Science, Universitas Pelita Harapan, Tangerang, Banten, 15811, Indonesia. Email: made.murwantara@uph.edu

1. INTRODUCTION
An earthquake is a natural disaster caused by the movement of rock layers or the displacement of the Earth's tectonic plates. This abrupt movement releases a huge amount of energy in the form of seismic waves. As these vibrations pass through the Earth's surface, they damage the communities living in the affected areas. Indonesia, with more than 260 million inhabitants, is located in one of the most earthquake-prone regions, with about 127 active volcanoes [1]; it lies on the so-called Ring of Fire, the most tectonically active zone. Moreover, Indonesia has the Great Sumatran Fault, which spans 1,900 km, and the Banda Sea convergent plate margin, both of which generate even more seismic activity [2, 3].

Journal homepage: http://journal.uad.ac.id/index.php/TELKOMNIKA
Nowadays, earthquake warning systems are installed in many remote and volcanic areas, which may increase the expected number of survivors. Moreover, much research has produced more information about earthquake characteristics and their impact on surrounding areas, and machine learning has been used to advance this information and improve prediction. However, some machine learning results still do not provide accurate predictions and sometimes raise false alarms, because of a limited volume of data or the prediction method used [4]. To our knowledge, earthquake prediction can still be improved to a point that gives more confidence and better results. Furthermore, a good and reasonable prediction provides an opportunity to plan emergency evacuation routes, which may reduce casualties.

To provide data for prediction, we collect data from several earthquake and seismological repositories: the United States Geological Survey (USGS) [5], Incorporated Research Institutions for Seismology (IRIS) [6], National Oceanic and Atmospheric Administration (NOAA) [7], European-Mediterranean Seismological Centre (EMSC) [8], International Seismological Centre (ISC) [9], Istituto Nazionale di Geofisica e Vulcanologia (INGV) [10], GeoForschungsZentrum (GFZ) [11, 12], Indonesia Tsunami Early Warning System (InaTEWS) [13], Global Historical Earthquake Archive (GHEA) [14, 15], and Badan Meteorologi, Klimatologi, dan Geofisika (BMKG) Indonesia [16]. The collected data exceed 1 TB. After cleansing to keep only data within the Indonesia region, around 375 GB remain, which are used as training and testing data. Given this volume, we consider this work Big Data research.
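The paper does not state how records outside the Indonesia region were removed. One simple possibility is a latitude/longitude bounding-box test, sketched below. The box limits (about 11°S to 6°N and 95°E to 141°E) are our own approximation of Indonesia's extent, not values from the paper, and the `lat`/`lon` record layout is hypothetical.

```python
from typing import Iterable, Iterator

# ASSUMPTION: approximate bounding box of Indonesia in decimal degrees.
# These limits are our own approximation, not values from the paper.
LAT_MIN, LAT_MAX = -11.0, 6.0
LON_MIN, LON_MAX = 95.0, 141.0


def in_indonesia_box(lat: float, lon: float) -> bool:
    """True if (lat, lon) lies inside the approximate bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def filter_region(records: Iterable[dict]) -> Iterator[dict]:
    """Yield only the records whose coordinates fall inside the box.

    The 'lat' and 'lon' keys are a hypothetical record layout.
    """
    for rec in records:
        if in_indonesia_box(rec["lat"], rec["lon"]):
            yield rec


if __name__ == "__main__":
    # Made-up events for illustration only.
    events = [
        {"id": "a", "lat": -6.2, "lon": 106.8},  # near Jakarta: kept
        {"id": "b", "lat": 35.7, "lon": 139.7},  # near Tokyo: dropped
        {"id": "c", "lat": -3.7, "lon": 128.2},  # near Ambon: kept
    ]
    print([e["id"] for e in filter_region(events)])  # ['a', 'c']
```

A rectangle is only a coarse filter: it also admits events from neighbouring areas such as Malaysian Borneo, so a polygon test against an official boundary would be stricter.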
In this work, we compare the performance of three machine learning approaches on the earthquake data: multinomial logistic regression [17, 18], Naïve Bayes [2, 19–21] and support vector machine (SVM) [4, 22–25]. Logistic regression describes the relationship between variables, that is, how strongly one or more variables relate to another. Naïve Bayes computes the probability of an outcome as new information is taken into account. SVM performs classification and regression analysis with a separating hyperplane. The contribution of this paper is twofold:
(a) In predicting a disaster such as an earthquake, comparing different machine learning algorithms may shed light on new approaches. We propose a technique that is comparable to other approaches for earthquake prediction in the Indonesia region. Our method supports prediction and visualization over up to 50 years of seismic historical data, which helps show how the performance of different machine learning methods bears on our prediction method. Our approach can also adjust the size of the data for better prediction, which is useful because data size sometimes influences the training and testing process and hence the final prediction. In addition, it gives us flexibility in testing our results.
(b) The data collection and cleansing involve a massive volume of data, which provides a rich resource for prediction. We collect data from reputable organizations all over the world and compare them with the local monitoring by Indonesian government bodies. Data cleansing took most of our time, as it involves not only retrieving raw data but also web scraping and data transformation. Some information must be inspected carefully, as the monitoring data may be irrelevant to our work.
We therefore analyze the data according to the monitoring location and whether its data are relevant; for example, we check whether earthquake data were released by a source that took them from a third party rather than generating them at a specific seismic monitoring station.

2. RESEARCH METHOD
2.1. Relevant works
Historical seismic data have been used to improve earthquake prediction. The most promising techniques use artificial intelligence (AI) and machine learning (ML), which have yielded further knowledge [26]. In [27], Bertrand et al. identify the possibility of an upcoming earthquake by forecasting the laboratory quake cycle, which reveals when the event will probably occur. In general, earthquake prediction is categorized into three terms based on the length of the historical data. Short-term earthquake prediction needs a precursor to strengthen its accuracy [28], while intermediate- and long-term prediction relies on a statistical probability approach. Syifa et al. [29] use SVM to analyze post-earthquake situations and assess the distribution of seismic destruction, which can be useful for evacuation and mitigation plans. Another technique predicts earthquakes from meteorological data [30] with a particle filter and support vector regression. This technique uses environmental information, such as air temperature, gas concentration and wind speed, to estimate earthquake precursors.

2.2. Background
This section discusses the background theory of the work, covering earthquake theory and machine learning approaches. The earthquake background covers earthquake types, seismic waves and earthquake phenomena in Indonesia. The machine learning background covers multinomial logistic regression, Naïve Bayes and support vector machine.

2.2.1. Earthquake
An earthquake is a natural disaster that creates tremors or vibrations in the impacted area as a result of the movement or displacement of the Earth's rock layers caused by tectonic dislocation. These vibrations reach the Earth's surface and can cause massive destruction. There are four types of earthquake: tectonic, volcanic, collapse and explosion. Figure 1 shows three types of plate movement that cause earthquakes, which do not occur everywhere on Earth. In general, surface movement causes an earthquake when (a) two plates move away from each other, (b) two plates move toward each other along the same line and (c) two plates slide past each other in opposite directions.

Figure 1. Earthquake types: (a) divergent, (b) convergent and (c) transform

The layers beneath the Earth's crust are hot and distribute their heat to the surrounding area. This activity is generally known as heat-flow convection. It pushes magma to the surface, which creates volcanoes. Indonesia is an archipelago located on the Circum-Pacific and Mediterranean belts and has a large number of active volcanoes. As a result, Indonesia is one of the countries at high risk of earthquake disasters.
Earthquake prediction falls into three categories according to the time span of the data it uses. The first is long-term prediction, which is rarely implemented because it uses more than 10 years of historical data plus additional information from earthquake sequences associated with fault locations. The second is intermediate-term prediction, which uses information on earthquake location, time and destructive power over several years. The last is short-term prediction, which estimates earthquakes from a data set covering several days.

2.2.2. Machine learning
Machine learning builds insight from one or more data sets through specific algorithms. In this work, we compare the performance of three machine learning algorithms, namely Naïve Bayes, support vector machine (SVM) and multinomial logistic regression.
a. SVM
SVM is generally used to solve classification and regression problems, and it has gained popularity because it performs well on empirical data. SVM is conceptually simple, has a fast learning algorithm and very often produces accurate results, because it is built on the structural risk minimization principle. In SVM, a training data set D is given as

$$D = \{(x_i, y_i) \mid x_i \in \mathbb{R}^p,\; y_i \in \{-1, 1\}\}_{i=1}^{n}$$

where y_i is -1 or 1, indicating the class of the input x_i, a vector of thresholded wavelet coefficients describing low or high magnitude; each x_i is a p-dimensional vector. A hyperplane separates the two classes and is best positioned midway between them. If w · x_1 + b = +1 is the supporting hyperplane of class +1, then w · x_2 + b = -1 is the supporting hyperplane of class -1. The margin between the two classes is the distance between these two supporting hyperplanes. Subtracting the two hyperplane equations gives

$$(w \cdot x_1 + b) - (w \cdot x_2 + b) = w \cdot (x_1 - x_2) = 2,$$

so that

$$\frac{w}{\lVert w \rVert} \cdot (x_1 - x_2) = \frac{2}{\lVert w \rVert}.$$

For linear classification, the SVM therefore solves

$$\min_{w,\,b} \; \tfrac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1, \; i = 1, \dots, n,$$

and for non-linear classification it solves the dual problem

$$\hat{a} = \arg\min_{a} \; \tfrac{1}{2}\sum_{i,j=1}^{n} a_i a_j y_i y_j K(x_i, x_j) - \sum_{i=1}^{n} a_i \quad \text{subject to} \quad a_i \ge 0, \; \sum_{i=1}^{n} a_i y_i = 0,$$

where K(x_i, x_j) is a kernel function.

b. Multinomial logistic regression
This method analyzes the relationship between a dependent variable and independent variables when the dependent variable has more than two categories, generalizing logistic regression to multiclass problems. A multinomial logistic regression model with three categories (0, 1, 2), taking category 0 as the baseline with g_0(x) = 0, has the form

$$P(Y = i \mid x) = \pi_i(x) = \frac{\exp(g_i(x))}{1 + \sum_{h=1}^{2} \exp(g_h(x))}, \quad i = 0, 1, 2 \tag{1}$$

c. Naïve Bayes
Naïve Bayes is a simple classifier that computes the probability of combinations of values in a data set. It assumes that, given the value of the class variable, the attributes are independent of one another.
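The three model formulas above can be checked numerically with a few lines of Python. This is a minimal sketch for illustration only: it evaluates the SVM margin 2/||w|| for a given weight vector, the class probabilities of (1) with category 0 as the baseline, and a Naïve Bayes posterior from a prior and per-attribute likelihoods (an application of Bayes' theorem, stated formally in (2)). It does not train any model, and the weights, predictors and probabilities in the usage lines are made-up values, not results from this paper.

```python
import math
from typing import Dict, List, Sequence


def svm_margin(w: Sequence[float]) -> float:
    """Width of the SVM margin, 2 / ||w||, between the supporting
    hyperplanes w.x + b = +1 and w.x + b = -1."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return 2.0 / norm


def mlr_probs(g: Sequence[float]) -> List[float]:
    """Class probabilities of a three-category multinomial logit, eq. (1).

    g holds the linear predictors (g_1(x), g_2(x)); category 0 is the
    baseline with g_0(x) = 0, so its numerator is exp(0) = 1.
    """
    denom = 1.0 + sum(math.exp(gh) for gh in g)
    return [1.0 / denom] + [math.exp(gi) / denom for gi in g]


def naive_bayes_posterior(priors: Dict[str, float],
                          likelihoods: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Posterior P(class | x) under the naive independence assumption.

    likelihoods[c] lists P(x_k | c) for each observed attribute value x_k;
    their product times the prior P(c) is normalised by the evidence P(x).
    """
    scores = {c: priors[c] * math.prod(likelihoods[c]) for c in priors}
    evidence = sum(scores.values())
    return {c: s / evidence for c, s in scores.items()}


if __name__ == "__main__":
    # Illustrative numbers only; none of these come from the paper.
    print(svm_margin([3.0, 4.0]))  # 0.4
    print(mlr_probs([0.0, 0.0]))   # three equal probabilities
    print(naive_bayes_posterior(
        {"low": 0.7, "high": 0.3},
        {"low": [0.2, 0.5], "high": [0.6, 0.5]},
    ))
```

Working in closed form keeps each formula visible; in practice these quantities come out of a fitted model rather than being supplied by hand.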
Bayes' theorem, shown below, derives the posterior probability from two antecedents: the prior probability and a likelihood function.

$$P(H \mid X) = \frac{P(X \mid H)\, P(H)}{P(X)} \tag{2}$$

where X is a data tuple with unknown class, H is the hypothesis that X belongs to a specified class, P(H | X) is the posterior probability of H given X, P(H) is the prior probability, P(X | H) is the likelihood of observing X given H, and P(X) is the marginal probability (evidence) of X.

d. Evaluation method
To evaluate machine learning performance, we use the confusion matrix, mean absolute error (MAE), mean absolute percentage error (MAPE), mean square error (MSE) and root mean square error (RMSE). A confusion matrix describes the performance of a classification model across classes: correct predictions are counted as true positives (TP) and true negatives (TN), and incorrect ones as false positives (FP) and false negatives (FN). From it we evaluate accuracy (the percentage of correct predictions over all test instances) and precision. In this paper, we measure performance with MAE, MAPE, MSE and RMSE, where

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2} \tag{3}$$

In (3), ŷ_i is the predicted value, y_i is the observed earthquake value from the data sources and n is the number of examples used for testing. MAE measures the average size of the errors, whether they are under- or over-estimates [28]. MSE, the most common way to evaluate predictions, averages the squared differences between the estimates and the data. MAPE expresses the error relative to the original data.
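The four error measures can be written out directly from their definitions. This is a minimal sketch in plain Python (the paper itself computes them in R); the function names are ours, and MAPE is returned as a fraction rather than a percentage.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> int:
    """Validate inputs and return the number of test examples n."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average of |y_hat_i - y_i|."""
    n = _check(y_true, y_pred)
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / n


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean square error: average of (y_hat_i - y_i)^2."""
    n = _check(y_true, y_pred)
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, eq. (3): the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, as a fraction (x100 for percent)."""
    n = _check(y_true, y_pred)
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an observed value is 0")
    return sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / n


if __name__ == "__main__":
    observed = [5.0, 4.0, 6.0]   # made-up magnitudes
    predicted = [5.5, 4.0, 5.0]
    for f in (mae, mape, mse, rmse):
        print(f.__name__, round(f(observed, predicted), 4))
```

Because RMSE is the square root of MSE, the reported tables can be cross-checked: for SVM on the 30-year grouped magnitude data, 0.751008212² ≈ 0.564013, matching the MSE listed in Tables 5 and 8.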
MAPE is useful when the relative size of the error matters for evaluating a prediction. RMSE, meanwhile, emphasizes large errors more. RMSE evaluates how close the observed data points are to the model's predicted values, while MAE weights all errors uniformly. Note that RMSE has the same unit as the outcome; for example, when it measures the depth of an earthquake, the unit is km.

2.3. Data collection
This stage starts our work by collecting data from different locations in various formats. The challenge is that only some data can be retrieved directly from a repository as ready-to-use data. In this work, the data collection activity is categorized into 3 methods, as follows:
(a) Retrieve the data directly from the repository when it is provided in a ready-to-use format, such as comma-separated values (CSV).
(b) Retrieve a web site manually in hypertext markup language (HTML) format, then web-scrape the required information from the HTML text.
Different techniques were applied to different data sources. We retrieved the EMSC data by accessing or downloading each web page over 14 years (2004-2018). Web scraping was applied to the sources from NOAA, EMSC, ISC, INGV, GFZ and BMKG. InaTEWS data were downloaded manually. Other data sets were also downloaded directly, such as GHEA, whose data format is not CSV. USGS data are in CSV format, and we could download almost all of the data from 1 January 1900 until 31 August 2018. The IRIS data set covers 1968 to 2018, INGV 1985 to 2018 and BMKG 2008 to 2018.

2.4. Data pre-processing
This stage prepares the data before we make any prediction. Most of the work in this stage is filtering, for example checking whether the date, time, latitude, longitude, magnitude and depth exist in each record.
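The field check above, together with the zero-magnitude filter and the 10- and 30-year grouping described next, might look like the following minimal sketch. The field names, the ISO `YYYY-MM-DD` date format and the window start year are hypothetical; the paper does not specify them.

```python
from typing import Dict, Iterable, List

# Fields every record must carry (Section 2.4).
REQUIRED = ("date", "time", "latitude", "longitude", "magnitude", "depth")


def is_complete(rec: Dict[str, object]) -> bool:
    """True if every required field is present and non-empty."""
    return all(rec.get(k) not in (None, "") for k in REQUIRED)


def clean(records: Iterable[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep complete records and drop those whose magnitude is 0."""
    return [r for r in records if is_complete(r) and float(r["magnitude"]) != 0.0]


def year_of(rec: Dict[str, object]) -> int:
    """Year of a record, ASSUMING an ISO 'YYYY-MM-DD' date string."""
    return int(str(rec["date"])[:4])


def window_index(year: int, start_year: int, width: int) -> int:
    """Index of the width-year window (e.g. 10 or 30) that contains year.

    ASSUMPTION: windows are aligned to a hypothetical start_year; the paper
    does not say how its 10- and 30-year groups are aligned.
    """
    return (year - start_year) // width


if __name__ == "__main__":
    # Made-up records for illustration only.
    rows = [
        {"date": "2015-06-01", "time": "11:46:38", "latitude": -8.3,
         "longitude": 116.4, "magnitude": 6.0, "depth": 30.0},
        {"date": "2010-02-01", "time": "03:10:00", "latitude": -2.0,
         "longitude": 120.0, "magnitude": 0, "depth": 10.0},   # magnitude 0: dropped
        {"date": "2001-05-05", "time": "", "latitude": -1.0,
         "longitude": 100.0, "magnitude": 5.1, "depth": 20.0},  # missing time: dropped
    ]
    kept = clean(rows)
    print(len(kept))                                 # 1
    print(window_index(year_of(kept[0]), 1989, 10))  # 2
```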
We also remove records with a magnitude value of 0 to avoid misclassification during processing. Data merging is also done in this stage; for example, we group the data by date range into 10-year and 30-year sets. In doing so, we obtain the intersection of the data from the different sources.

2.5. Prediction stage
This stage makes predictions on the data sets grouped into 10 and 30 years. We split the work into two parts. In the first part, we train on the data set grouped by time, date, latitude, longitude, magnitude and depth to find the location and the possible energy of an earthquake. In the next part, we split the data set, already categorized into 4 groups (latitude, longitude, magnitude and depth), into training and test sets with a split ratio of 0.8. We use R [31] as the prediction tool, as its libraries implement the machine learning methods we use. For Naïve Bayes we use the naiveBayes function and for the support vector machine the svm function, both from the e1071 library [32]; multinomial logistic regression uses the multinom function from the nnet library [33].
To predict an earthquake, the prediction target is split into parts. For example, we first predict the location of the earthquake; then the magnitude and depth are predicted based on the location estimated in the previous step. The final prediction combines the results of the first and second steps. To predict earthquake location, we implemented two techniques. First, we use a Geohash library to merge latitude and longitude into a single value. Second, we predict the location using latitude and longitude only. We split our prediction by location as shown in Table 1. Note that latitude and longitude are in decimal degrees.
Table 1.
Prediction factor based on location
Method | Location | Data
Machine learning | GeoHash | Depth
Machine learning | GeoHash | Depth+Magnitude
Machine learning | GeoHash | Magnitude
Machine learning | Latitude, Longitude | Depth
Machine learning | Latitude, Longitude | Depth+Magnitude
Machine learning | Latitude, Longitude | Magnitude

To predict the magnitude of an earthquake, we factorize the prediction into two factors. First, latitude and longitude are used to predict the strength (magnitude) of the earthquake. Second, we predict from the combination of location and depth, as depicted in Table 2. For the depth of an earthquake, we mirror this factorization, with magnitude taking the place of depth, as shown in Table 3. To visualize our results, we use R with the Shiny [34] library, overlaying the results on maps retrieved from Google Maps with the ggmap [35] library. The final application of this work is a web-based system.

Table 2. Prediction factor based on depth
Machine learning | Data
Prediction location based on depth | Longitude + Latitude
Prediction location based on depth | Longitude + Latitude + Depth
Prediction location based on depth and magnitude | Longitude + Latitude
Prediction location based on depth and magnitude | Longitude + Latitude + Depth
Prediction location based on magnitude | Longitude + Latitude
Prediction location based on magnitude | Longitude + Latitude + Depth

Table 3. Prediction factor based on magnitude
Machine learning | Data
Prediction location based on depth | Longitude + Latitude
Prediction location based on depth | Longitude + Latitude + Magnitude
Prediction location based on depth and magnitude | Longitude + Latitude
Prediction location based on depth and magnitude | Longitude + Latitude + Magnitude
Prediction location based on magnitude | Longitude + Latitude
Prediction location based on magnitude | Longitude + Latitude + Magnitude

3. RESULTS AND ANALYSIS
3.1. Analysis
In this work, we make predictions based solely on the earthquake data set. The data are processed under two conditions: first, grouped into 10 years and 30 years; second, without grouping (individual records). Naïve Bayes cannot produce predictions for the 10- and 30-year individual data sets because the data are imbalanced. We split the data into 60% training and 40% testing data, and take a smaller error as indicating a more accurate prediction. To reduce the complexity of our work, we manage the predictions with a catalog describing each method and data set, as shown in Table 4. Table 5 shows the results for the data grouped into 10 years under the different evaluation measures. SVM gives good results for magnitude prediction, while multinomial logistic regression gives better results for depth. Naïve Bayes is not included in the 10-year analysis.
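The catalog in Table 4 appears to follow a regular pattern: every combination of three methods, three location targets and six data variants. Assuming that pattern, the numbering can be regenerated as below. The full grid has 54 entries, of which Table 4 shows the first 47. This is our reconstruction from the excerpt, not code from the paper.

```python
from itertools import product
from typing import List, Tuple

METHODS = ("MultiLogReg", "SVM", "NaiveBayes")
LOCATIONS = ("Depth", "Depth+MAG", "MAG")
VARIANTS = ("Predict", "PredictGeoHash")
DROPPED = ("NonDepth", "NonDepthNonMag", "NonMag")


def build_catalog() -> List[Tuple[int, str, str, str]]:
    """Return (No, Method, Location, Data) rows in the order used by Table 4.

    itertools.product varies the last factor fastest, which reproduces the
    row order of the excerpt.
    """
    rows = []
    combos = product(METHODS, LOCATIONS, VARIANTS, DROPPED)
    for no, (method, loc, variant, dropped) in enumerate(combos, start=1):
        rows.append((no, method, loc, f"{variant}({dropped})"))
    return rows


if __name__ == "__main__":
    catalog = build_catalog()
    print(len(catalog))  # 54
    print(catalog[24])   # (25, 'SVM', 'Depth+MAG', 'Predict(NonDepth)')
```

Generating the catalog rather than typing it keeps method numbers such as "Method (25, 26)" in Tables 5 and 6 traceable to a single rule.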
On the other hand, SVM outperforms the other methods on the 30-year data set grouped by magnitude and depth, as shown in Table 5. Its MAE of 0.598473 indicates that its earthquake predictions are more precise than those of the other methods. When predicting with 10 years of data without grouping, SVM outperforms the other algorithms in predicting earthquake location based on magnitude and depth; in this prediction, SVM predicts only latitude and longitude. The results, shown in Table 6, indicate that the prediction performs well when magnitude and depth information are used to estimate the coordinates. When predicting with the 30-year data set without grouping, multinomial logistic regression (MLR) outperforms the other algorithms: using magnitude and depth data, as shown in Table 6, MLR has a smaller error than SVM. Naïve Bayes is not included in this prediction because of data imbalance.
Next, we want to find out which machine learning method is suitable for earthquake prediction. To this end, we calculate the average error for each data set to see which data set gives a small error rate. As shown in Table 7, the most suitable data sets are the 30-year grouped data and the 10-year ungrouped data, as both show low error rates, so these data sets have a chance of producing good predictions. In more detail, for both the 30-year grouped and the 10-year ungrouped data sets, SVM outperforms the other methods with a small error rate when using magnitude information, which also gives a smaller error than depth information. We therefore conclude that SVM predicts earthquakes better when using magnitude information alone. From Table 7, we conclude that earthquake prediction should be more accurate when magnitude data are used as the reference.
In contrast, when depth data are used as the reference, accuracy may suffer and predicting the earthquake location may be problematic. These data suggest that depth data might instead be useful for predicting the destruction that may occur at the predicted location.
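The choice of data sets above can be checked directly against the magnitude rows of Table 7. The short sketch below uses only values copied from that table and sorts the four data sets by each average error; the dictionary layout and function name are ours.

```python
from typing import Dict, List

# Average errors for magnitude prediction, copied from Table 7.
TABLE7_MAGNITUDE: Dict[str, Dict[str, float]] = {
    "10 years (grouping)":    {"RMSE": 0.963318, "MAPE": 0.21023,  "MSE": 0.94712,  "MAE": 0.777716},
    "30 years (grouping)":    {"RMSE": 0.854072, "MAPE": 0.173682, "MSE": 0.746437, "MAE": 0.676576},
    "10 years (no grouping)": {"RMSE": 0.868458, "MAPE": 0.147251, "MSE": 0.757441, "MAE": 0.672579},
    "30 years (no grouping)": {"RMSE": 5.051307, "MAPE": 0.866291, "MSE": 25.78514, "MAE": 3.706884},
}


def rank_by(table: Dict[str, Dict[str, float]], metric: str) -> List[str]:
    """Data-set names ordered from lowest (best) to highest error on metric."""
    return sorted(table, key=lambda name: table[name][metric])


if __name__ == "__main__":
    for metric in ("RMSE", "MAPE", "MSE", "MAE"):
        print(metric, rank_by(TABLE7_MAGNITUDE, metric)[:2])
```

For every metric, the two lowest-error data sets are the 30-year grouped and 10-year ungrouped sets, which agrees with the choice made in the text.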
To measure which machine learning method is suitable for earthquake prediction in Indonesia, we compare the average error rates for the grouped and ungrouped data sets. Our results show that the 30-year grouped and 10-year ungrouped data sets give reasonable values. As shown in Table 8, SVM outperforms multinomial logistic regression and Naïve Bayes. For the 10-year ungrouped data set, SVM also performs better than multinomial logistic regression, as depicted in Table 9; for this data set we cannot obtain results from the Naïve Bayes method because of data imbalance. Overall, our evaluation shows that, for both grouped and ungrouped data sets, using magnitude as the reference performs better than using depth values. Moreover, SVM performs better than the other algorithms. We therefore believe that earthquake prediction with SVM would provide better accuracy than multinomial logistic regression and Naïve Bayes on a similar data set.
Table 4.
An excerpt of the 10-year group: prediction method and data set
No | Method | Location | Data
1 | MultiLogReg | Depth | Predict(NonDepth)
2 | MultiLogReg | Depth | Predict(NonDepthNonMag)
3 | MultiLogReg | Depth | Predict(NonMag)
4 | MultiLogReg | Depth | PredictGeoHash(NonDepth)
5 | MultiLogReg | Depth | PredictGeoHash(NonDepthNonMag)
6 | MultiLogReg | Depth | PredictGeoHash(NonMag)
7 | MultiLogReg | Depth+MAG | Predict(NonDepth)
8 | MultiLogReg | Depth+MAG | Predict(NonDepthNonMag)
9 | MultiLogReg | Depth+MAG | Predict(NonMag)
10 | MultiLogReg | Depth+MAG | PredictGeoHash(NonDepth)
11 | MultiLogReg | Depth+MAG | PredictGeoHash(NonDepthNonMag)
12 | MultiLogReg | Depth+MAG | PredictGeoHash(NonMag)
13 | MultiLogReg | MAG | Predict(NonDepth)
14 | MultiLogReg | MAG | Predict(NonDepthNonMag)
15 | MultiLogReg | MAG | Predict(NonMag)
16 | MultiLogReg | MAG | PredictGeoHash(NonDepth)
17 | MultiLogReg | MAG | PredictGeoHash(NonDepthNonMag)
18 | MultiLogReg | MAG | PredictGeoHash(NonMag)
19 | SVM | Depth | Predict(NonDepth)
20 | SVM | Depth | Predict(NonDepthNonMag)
21 | SVM | Depth | Predict(NonMag)
22 | SVM | Depth | PredictGeoHash(NonDepth)
23 | SVM | Depth | PredictGeoHash(NonDepthNonMag)
24 | SVM | Depth | PredictGeoHash(NonMag)
25 | SVM | Depth+MAG | Predict(NonDepth)
26 | SVM | Depth+MAG | Predict(NonDepthNonMag)
27 | SVM | Depth+MAG | Predict(NonMag)
28 | SVM | Depth+MAG | PredictGeoHash(NonDepth)
29 | SVM | Depth+MAG | PredictGeoHash(NonDepthNonMag)
30 | SVM | Depth+MAG | PredictGeoHash(NonMag)
31 | SVM | MAG | Predict(NonDepth)
32 | SVM | MAG | Predict(NonDepthNonMag)
33 | SVM | MAG | Predict(NonMag)
34 | SVM | MAG | PredictGeoHash(NonDepth)
35 | SVM | MAG | PredictGeoHash(NonDepthNonMag)
36 | SVM | MAG | PredictGeoHash(NonMag)
37 | NaiveBayes | Depth | Predict(NonDepth)
38 | NaiveBayes | Depth | Predict(NonDepthNonMag)
39 | NaiveBayes | Depth | Predict(NonMag)
40 | NaiveBayes | Depth | PredictGeoHash(NonDepth)
41 | NaiveBayes | Depth | PredictGeoHash(NonDepthNonMag)
42 | NaiveBayes | Depth | PredictGeoHash(NonMag)
43 | NaiveBayes | Depth+MAG | Predict(NonDepth)
44 | NaiveBayes | Depth+MAG | Predict(NonDepthNonMag)
45 | NaiveBayes | Depth+MAG | Predict(NonMag)
46 | NaiveBayes | Depth+MAG | PredictGeoHash(NonDepth)
47 | NaiveBayes | Depth+MAG | PredictGeoHash(NonDepthNonMag)
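The PredictGeoHash entries in Table 4 use a single value obtained by merging latitude and longitude with a Geohash library (Section 2.5). The paper does not name the library, so the sketch below implements the standard Geohash encoding to show the idea: it alternately bisects the longitude and latitude intervals, records one bit per step and packs every 5 bits into a base-32 character. The 5-character default precision is our choice, not the paper's.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard Geohash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 5) -> str:
    """Encode a latitude/longitude pair (decimal degrees) as a Geohash.

    Bits alternate between longitude (first) and latitude; each bit records
    whether the coordinate lies in the upper half of the current interval.
    Every 5 bits become one base-32 character.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value, nbits = 0, 0
    use_lon = True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:
            chars.append(_BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)


if __name__ == "__main__":
    print(geohash_encode(57.64911, 10.40744, 4))  # u4pr
    print(geohash_encode(0.0, 0.0, 4))            # s000
```

Because a longer hash only refines a shorter one, nearby events share a prefix; this is one plausible reason for turning two coordinates into one categorical location target, although the paper does not state its motivation.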
Table 5. Grouped dataset
Period | Metric | Magnitude: method | Magnitude: value | Depth: method | Depth: value
10 years | RMSE | Method (25, 26) | 0.839928006 | Method (34) | 123.7999
10 years | MAPE | Method (30) | 0.186486 | Method (14, 15) | 0.712816
10 years | MSE | Method (25, 27) | 0.705479 | Method (34) | 15326.42
10 years | MAE | Method (30) | 0.681305 | Method (31) | 64.91890744
30 years | RMSE | Method (25, 26) | 0.751008212 | Method (28) | 120.3226
30 years | MAPE | Method (34, 35) | 0.156257 | Method (32, 33) | 0.809354
30 years | MSE | Method (25, 26) | 0.564013 | Method (28) | 14477.52
30 years | MAE | Method (34, 35) | 0.598473 | Method (28) | 64.5761601

Table 6. Ungrouped dataset
Period | Metric | Magnitude: method | Magnitude: value | Depth: method | Depth: value
10 years | RMSE | Method (19, 20) | 0.805136856 | Method (23, 24) | 101.4409
10 years | MAPE | Method (19, 20) | 0.135727 | Method (23, 24) | 1.835921
10 years | MSE | Method (19, 20) | 0.648245 | Method (23, 24) | 10290.26
10 years | MAE | Method (19, 20) | 0.618199 | Method (23, 24) | 76.15196673
30 years | RMSE | Method (15) | 3.663452813 | Method (2) | 107.2547
30 years | MAPE | Method (15) | 0.539494 | Method (1) | 0.701563
30 years | MSE | Method (15) | 13.42089 | Method (1) | 11503.57
30 years | MAE | Method (15) | 2.310839 | Method (1) | 70.64115023

Table 7. Average evaluation result
Reference | Data set | RMSE | MAPE | MSE | MAE
Magnitude | 10 years (grouping) | 0.963318 | 0.21023 | 0.94712 | 0.777716
Magnitude | 30 years (grouping) | 0.854072 | 0.173682 | 0.746437 | 0.676576
Magnitude | 10 years (no grouping) | 0.868458 | 0.147251 | 0.757441 | 0.672579
Magnitude | 30 years (no grouping) | 5.051307 | 0.866291 | 25.78514 | 3.706884
Depth | 10 years (grouping) | 127.0155 | 1.070409 | 16153.99 | 68.82178
Depth | 30 years (grouping) | 125.8881 | 1.162366 | 15885.88 | 70.96083
Depth | 10 years (no grouping) | 109.1246 | 2.463045 | 11940.31 | 80.3022
Depth | 30 years (no grouping) | 109.8351 | 0.765595 | 12066.61 | 72.89245

Table 8.
Machine learning performance for 30 years
Data | Method | RMSE | MAPE | MSE | MAE
Grouped, based on magnitude | Multinomial logistic regression | 0.777235 | 0.160233 | 0.604094 | 0.61487
Grouped, based on magnitude | SVM | 0.751008 | 0.156257 | 0.564013 | 0.598473
Grouped, based on magnitude | Naïve Bayes | 0.922814 | 0.183305 | 0.851585 | 0.716253
Grouped, based on depth | Multinomial logistic regression | 121.9435 | 0.817061 | 14870.22 | 67.01762
Grouped, based on depth | SVM | 120.3226 | 0.809354 | 14477.52 | 64.57616
Grouped, based on depth | Naïve Bayes | 123.5369 | 1.308522 | 15261.35 | 70.61942

Table 9. Machine learning performance for 10 years
Data | Method | RMSE | MAPE | MSE | MAE
Not grouped, based on magnitude | Multinomial logistic regression | 0.884768 | 0.150343 | 0.782815 | 0.687099
Not grouped, based on magnitude | SVM | 0.805137 | 0.135727 | 0.648245 | 0.618199
Not grouped, based on depth | Multinomial logistic regression | 109.8913 | 2.797098 | 12076.09 | 80.97818
Not grouped, based on depth | SVM | 101.4409 | 1.835921 | 10290.26 | 76.15197
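The claim that SVM outperforms the other methods can be checked metric by metric against Tables 8 and 9. The sketch below copies those values, picks the lowest-error method for each metric, and expresses SVM's RMSE as a percentage reduction relative to multinomial logistic regression (MLR). The data layout and function names are ours.

```python
from typing import Dict

Scores = Dict[str, Dict[str, float]]

# Errors copied from Table 8 (30 years, grouped) and Table 9 (10 years, not grouped).
TABLES: Dict[str, Scores] = {
    "30 years grouped, magnitude": {
        "MLR": {"RMSE": 0.777235, "MAPE": 0.160233, "MSE": 0.604094, "MAE": 0.61487},
        "SVM": {"RMSE": 0.751008, "MAPE": 0.156257, "MSE": 0.564013, "MAE": 0.598473},
        "NaiveBayes": {"RMSE": 0.922814, "MAPE": 0.183305, "MSE": 0.851585, "MAE": 0.716253},
    },
    "30 years grouped, depth": {
        "MLR": {"RMSE": 121.9435, "MAPE": 0.817061, "MSE": 14870.22, "MAE": 67.01762},
        "SVM": {"RMSE": 120.3226, "MAPE": 0.809354, "MSE": 14477.52, "MAE": 64.57616},
        "NaiveBayes": {"RMSE": 123.5369, "MAPE": 1.308522, "MSE": 15261.35, "MAE": 70.61942},
    },
    "10 years not grouped, magnitude": {
        "MLR": {"RMSE": 0.884768, "MAPE": 0.150343, "MSE": 0.782815, "MAE": 0.687099},
        "SVM": {"RMSE": 0.805137, "MAPE": 0.135727, "MSE": 0.648245, "MAE": 0.618199},
    },
    "10 years not grouped, depth": {
        "MLR": {"RMSE": 109.8913, "MAPE": 2.797098, "MSE": 12076.09, "MAE": 80.97818},
        "SVM": {"RMSE": 101.4409, "MAPE": 1.835921, "MSE": 10290.26, "MAE": 76.15197},
    },
}


def best_method(scores: Scores, metric: str) -> str:
    """Method with the lowest value of metric (all four are errors: lower is better)."""
    return min(scores, key=lambda m: scores[m][metric])


def relative_reduction(baseline: float, value: float) -> float:
    """Percentage by which value is below baseline."""
    return 100.0 * (baseline - value) / baseline


if __name__ == "__main__":
    for block, scores in TABLES.items():
        winners = {best_method(scores, m) for m in ("RMSE", "MAPE", "MSE", "MAE")}
        cut = relative_reduction(scores["MLR"]["RMSE"], scores["SVM"]["RMSE"])
        print(f"{block}: best={sorted(winners)}, SVM RMSE {cut:.1f}% below MLR")
```

On these values SVM has the lowest error for every metric in all four blocks, and its RMSE is roughly 1.3% to 9% lower than that of multinomial logistic regression, so the advantage is consistent but modest.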
3.2. Results
To present our predictions visually, we show them through a web service built with R Shiny. The original earthquake information is retrieved from BMKG Indonesia, as shown in Figure 2(a), and we compare this report with the predictions we made before the date of the event, depicted in Figures 2(b), 2(c) and 2(d). Our prediction is based on the day number within a year. For example, to predict an earthquake on March 11, 2019, we count the days from the beginning of the year up to that day, which gives 70 days (31 in January, 28 in February and 11 in March). We then select the day value, 70, in the web system. On our map, red shows the predicted result and yellow shows the original data. Comparing the BMKG report with our predictions shows that the Naïve Bayes prediction, shown in Figure 2(b), based on the original learning data is not good enough. Multinomial logistic regression performs better than Naïve Bayes, as shown in Figure 2(c), with the predicted location slightly closer to the BMKG report. The support vector machine (SVM) achieves better results for the eastern Indonesia region, outperforming the other methods. Note that the training data influence the prediction results. Overall, the predictions show that different machine learning methods may perform differently even when trained on similar data sets. In our analysis, SVM may offer better earthquake prediction.
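The day-number selection described above can be reproduced with Python's standard `datetime` module. This sketch only mirrors the arithmetic in the text (31 + 28 + 11 = 70); the function names are ours, and it says nothing about how the web system uses the value internally.

```python
from datetime import date, timedelta


def day_of_year(d: date) -> int:
    """Ordinal day within the year, with 1 January as day 1."""
    return d.timetuple().tm_yday


def date_from_day(year: int, day: int) -> date:
    """Inverse mapping: the calendar date of day number `day` in `year`."""
    return date(year, 1, 1) + timedelta(days=day - 1)


if __name__ == "__main__":
    print(day_of_year(date(2019, 3, 11)))  # 70 (31 + 28 + 11)
    print(date_from_day(2019, 70))         # 2019-03-11
    print(day_of_year(date(2020, 3, 11)))  # 71, because 2020 is a leap year
```

The last line shows why a day number alone is ambiguous across years: in a leap year the same calendar date maps to a different day value.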
Figure 2. Earthquake on March 11, 2019: (a) original information from BMKG Indonesia [16], (b) prediction using Naïve Bayes, (c) prediction using multinomial logistic regression, (d) prediction using SVM

4. CONCLUSION
We have compared machine learning methods for predicting earthquake location, depth and magnitude in the Indonesia region, and demonstrated a web-based application to visualize the prediction results. We draw the following conclusions. The Naïve Bayes method is not good enough to predict from a data set grouped for only one year, but it is applicable to multi-year grouped data. Considering the average error rate, the SVM method outperforms the other algorithms, and using magnitude data as the reference gives better results than using depth data. This suggests that depth could serve as an additional factor for better prediction. We used day, month and year as date properties for prediction, and our observations show that prediction based on day performs better. For the overall data set, as expected, SVM outperforms the other methods, followed by multinomial logistic regression, while Naïve Bayes performed worst in all prediction results.