TELKOMNIKA Telecommunication, Computing, Electronics and Control
Vol. 18, No. 3, June 2020, pp. 1331-1342
ISSN: 1693-6930, accredited First Grade by Kemenristekdikti, No: 21/E/KPT/2018
DOI: 10.12928/TELKOMNIKA.v18i3.14756
Comparison of machine learning performance for earthquake prediction in Indonesia using 30 years historical data
I Made Murwantara (1), Pujianto Yugopuspito (2), Rickhen Hermawan (3)
(1, 2) Informatics Department, Faculty of Computer Science, Universitas Pelita Harapan, Indonesia
(3) Undergraduate Program, Informatics Department, Faculty of Computer Science, Universitas Pelita Harapan, Indonesia
Article Info
Article history:
Received Aug 15, 2019
Revised Jan 13, 2020
Accepted Feb 24, 2020
Keywords:
Big data
Earthquake
Machine learning
Multinomial logistic regression
Naïve Bayes
Prediction
SVM
ABSTRACT
Indonesia lies in one of the most earthquake-prone regions, with more than 100 active volcanoes and a high number of seismic events per year. To reduce casualties, several earthquake prediction methods have been developed to estimate seismic movement. However, most predictions use only short-term historical data to predict the incoming earthquake, which limits model performance. This work uses medium- to long-term historical earthquake data collected from 2 local government bodies and 8 legitimate international sources. We make medium- to long-term predictions with three machine learning algorithms, namely multinomial logistic regression, support vector machine and Naïve Bayes, and compare their performance. This work shows that the support vector machine outperforms the other methods. We compare the root mean square error, which indicates how concentrated the data are around the line of best fit: multinomial logistic regression scores 0.777, Naïve Bayes 0.922 and support vector machine 0.751. In predicting future earthquakes, the support vector machine also outperforms the other two methods, which produce significant deviations in distance and magnitude from the current earthquake report.
This is an open access article under the CC BY-SA license.
Corresponding Author:
I Made Murwantara,
Informatics Department, Faculty of Computer Science,
Universitas Pelita Harapan,
Tangerang, Banten, 15811, Indonesia,
Email: made.murwantara@uph.edu
1. INTRODUCTION
An earthquake is a natural disaster that occurs as a result of the movement or displacement of rock layers in the earth's tectonic plates. This sudden movement releases a huge amount of energy that creates seismic waves. The vibrations that pass through the earth's surface cause damage to the population living in the affected areas. Indonesia, with more than 260 million inhabitants, is located in one of the most frequent earthquake regions. It has about 127 active volcanoes [1] and lies in the area usually called the Ring of Fire, which has the most active tectonic movement. Moreover, Indonesia also has the Great Sumatran Fault, which spans 1900 km, and the Banda Sea convergent flat margin, which create even more seismic activity [2, 3].
Journal homepage: http://journal.uad.ac.id/index.php/TELKOMNIKA
Nowadays, earthquake warning systems are installed in many remote and volcanic areas, which may increase the expected number of survivors. Moreover, many research outcomes have provided more information about earthquake characteristics and their impact on the surrounding area. Machine learning has also been used to improve the information and prediction results. However, some machine learning work still does not provide accurate predictions and sometimes raises false alarms, because of either an insufficient volume of data or the prediction method [4]. To our knowledge, earthquake prediction can still be improved to a point that gives more confidence and better results. Furthermore, a good and reasonable prediction provides opportunities to manage emergency evacuation routes, which may reduce casualties.
To provide data for prediction, we use data collected from several earthquake and seismological repositories. The data sources for our research are as follows: the United States Geological Survey (USGS) [5], Incorporated Research Institutions for Seismology (IRIS) [6], National Oceanic and Atmospheric Administration (NOAA) [7], European-Mediterranean Seismological Centre (EMSC) [8], International Seismological Centre (ISC) [9], Istituto Nazionale di Geofisica e Vulcanologia (INGV) [10], GeoForschungsZentrum (GFZ) [11, 12], Indonesia Tsunami Early Warning System (InaTEWS) [13], Global Historical Earthquake Archive (GHEA) [14, 15], and Badan Meteorologi, Klimatologi, dan Geofisika (BMKG) Indonesia [16]. The collected data exceed 1 TB in volume. After cleansing to keep only data within the Indonesia region, we have around 375 GB of data, which is used as training and testing data. Considering this volume, this work is Big Data research.
In this work, we compare the performance of three machine learning approaches, namely multinomial logistic regression [17, 18], Naïve Bayes [2, 19–21] and support vector machine (SVM) [4, 22–25], on the earthquake data. Logistic regression provides information about the relationship between variables and shows how closely one or more variables are related to another. The Naïve Bayes approach allows us to compute a probability that is updated with new information. SVM is used for classification and regression analysis based on a separating hyperplane.
The contribution of this paper is twofold:
(a) In predicting a disaster such as an earthquake, a comparison between different machine learning algorithms may shed light on a new approach. We propose a technique that is comparable to other approaches for earthquake prediction in the Indonesia region. Our method facilitates prediction and visualization over a range of up to 50 years of historical seismic data, which is particularly helpful in showing how the performance of different machine learning algorithms informs our prediction method. In addition, our approach can adjust the size of the data for better prediction. This is useful because the size of the data sometimes influences the training and testing process and hence the final prediction. It also gives us flexibility in testing our results.
(b) The data collection and cleansing cover a massive volume of data, which creates a rich resource for prediction. We collect the data from legitimate organizations all over the world and compare them with the local monitoring by government bodies in Indonesia. Data cleansing takes most of our time, because it involves not only retrieving raw data but also web scraping and data transformation. Some information needs to be inspected carefully, as the monitoring data may be irrelevant to our work. We therefore analyze the data based on the monitoring location and the relevance of its data. For example, we check whether earthquake data released by a source were taken from a third party rather than generated primarily by a specific seismic monitoring station.
2. RESEARCH METHOD
2.1. Relevant works
Earthquake prediction has been improved by using historical seismic data. The most promising techniques use artificial intelligence (AI) and machine learning (ML), which have yielded further knowledge [26]. In [27], Bertrand et al. identify the possibility of an upcoming earthquake by forecasting the laboratory quake cycle, which reveals the time at which the event will probably occur. In general, earthquake prediction is categorized into three different terms based on the length of the historical data source. Short-term earthquake prediction needs a precursor to strengthen its accuracy [28], while intermediate- and long-term prediction make estimates using a statistical probability approach. Syifa et al. [29] use SVM to analyze the post-earthquake situation and assess the distribution of seismic destruction, which can be useful for evacuation and mitigation planning. Another technique for earthquake prediction uses meteorological data [30]
based on a particle filter and support vector regression. This technique uses natural information, such as air temperature, gas concentration and wind speed, to estimate earthquake precursors.
2.2. Background
This section discusses the background theory of this work, covering earthquake theory and machine learning approaches. The earthquake background covers earthquake types, seismic waves and earthquake phenomena in Indonesia. The machine learning background covers multinomial logistic regression, Naïve Bayes and support vector machine.
2.2.1. Earthquake
An earthquake is a natural disaster that creates tremors or vibrations in the affected area as a result of the movement or displacement of the earth's rock layers caused by tectonic dislocation. These vibrations reach the earth's surface and can cause massive destruction. There are four types of earthquake: tectonic, volcanic, collapse and explosion. Figure 1 shows three types of surface movement that cause earthquakes, which do not occur everywhere on earth. In general, the movement of the earth's surface causes an earthquake when (a) two plates move away from each other in different directions, (b) two plates move towards the same line, or (c) two plates slide side by side in opposite directions.
Figure 1. Earthquake types (a) divergent, (b) convergent and (c) transform
The layer of the earth's crust has a high temperature and distributes its heat into the surrounding area. In general, this volcanic activity is known as heat flow convection. This kind of activity pushes magma to the surface, which creates volcanoes. Indonesia is an archipelago located in the Circum-Pacific and Mediterranean belts, which have a large number of active volcanoes. As a result, Indonesia is one of the countries at high risk of earthquake disasters.
Earthquake prediction is categorized based on how the earthquake occurs. There are three categories of prediction. The first is long-term prediction, which is rarely implemented because it requires more than 10 years of historical data and additional information from sequential earthquakes related to fault location. The second is intermediate prediction, which obtains information on earthquake location, time and destructive power over several years. The last is short-term prediction, which estimates an earthquake using several days of data.
2.2.2. Machine learning
Machine learning builds insight from one or more data sets via specific algorithms. In this work, we compare the performance of three machine learning algorithms, namely Naïve Bayes, support vector machine (SVM) and multinomial logistic regression.
a. SVM
In general, SVM is used to solve classification and regression problems. SVM has gained popularity because it performs well on empirical data. SVM is conceptually simple, has a fast learning algorithm and very often produces accurate results. This is because SVM is a machine learning method developed on the risk minimization principle. In SVM, a training data set D is given as

$$D = \{(x_i, y_i) \mid x_i \in \mathbb{R}^p,\ y_i \in \{-1, 1\}\}_{i=1}^{n},$$

where $y_i$ is -1 or 1 and indicates the class of the input $x_i$, a vector of thresholded wavelet coefficients, describing low or high magnitude. Each $x_i$ is a $p$-dimensional vector. A hyperplane is used to separate the input classes, and it is good when it is positioned between the classes. If $w \cdot x_1 + b = +1$ is the supporting hyperplane of class +1, then $w \cdot x_2 + b = -1$ is the supporting hyperplane of class -1. To compute the margin between the two classes, we find the distance between the two supporting hyperplanes. Subtracting the second equation from the first gives $(w \cdot x_1 + b) - (w \cdot x_2 + b) = w \cdot (x_1 - x_2) = 2$, so that

$$\frac{w \cdot (x_1 - x_2)}{\|w\|} = \frac{2}{\|w\|}.$$

For linear classification, the problem is

$$\min_{w, b} \frac{1}{2}\|w\|^2,$$

and for non-linear classification,

$$\hat{a} = \arg\min_{a} \frac{1}{2}\sum_{i,j=1}^{m} a_i a_j y_i y_j K(x_i, x_j) - \sum_{i=1}^{m} a_i,$$

where $K(x_i, x_j)$ is a kernel function.
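As a small numeric illustration of the margin between the two supporting hyperplanes, the following Python sketch computes $2/\|w\|$ and the sign of $w \cdot x + b$. The weight vector `w`, the bias `b` and the sample points are hypothetical values chosen for readability; they are not taken from the trained models in this work, which were built in R.

```python
import math

def margin_width(w):
    """Distance between the hyperplanes w.x + b = +1 and w.x + b = -1."""
    norm = math.sqrt(sum(c * c for c in w))
    return 2.0 / norm

def decision_value(w, b, x):
    """Signed score w.x + b for one input vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1

# Hypothetical linear model (illustrative values only).
w = [3.0, 4.0]
b = -1.0

print(margin_width(w))             # ||w|| = 5, so the margin is 2/5
print(classify(w, b, [1.0, 1.0]))  # score 6.0, class +1
print(classify(w, b, [0.0, 0.0]))  # score -1.0, class -1
```

A larger $\|w\|$ gives a narrower margin, which is why minimizing $\frac{1}{2}\|w\|^2$ maximizes the separation between the classes.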
b. Multinomial logistic regression
This method analyzes the relationship between a dependent variable and independent variables when the dependent variable has more than two categories, which generalizes logistic regression to multiclass problems. A multinomial logistic regression model with three categories has the following formula:

$$P(Y = i \mid x) = \pi_i(x) = \frac{\exp(g_i(x))}{1 + \sum_{h=1}^{2} \exp(g_h(x))} \quad (1)$$
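Equation (1) can be evaluated directly once the linear scores are known. The sketch below is a minimal Python illustration, assuming that the third category is the reference category with score 0 (which is what the constant 1 in the denominator corresponds to); the score values passed in are hypothetical.

```python
import math

def multinomial_probs(g1, g2):
    """Class probabilities for a three-category model, following (1).

    g1 and g2 are the linear scores of categories 1 and 2; the third
    category is assumed to be the reference with score 0, so exp(0) = 1.
    """
    e1 = math.exp(g1)
    e2 = math.exp(g2)
    denom = 1.0 + e1 + e2
    return [e1 / denom, e2 / denom, 1.0 / denom]

# Hypothetical scores for one observation.
probs = multinomial_probs(0.5, -1.0)
print(probs)       # three probabilities, one per category
print(sum(probs))  # sums to 1 up to floating-point rounding
```

The predicted category is the one with the largest probability, here the first one because it has the largest score.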
c. Naïve Bayes
Naïve Bayes is a simple classifier that computes probabilities by counting the combinations of values in a given data set. This method assumes that, given the class variable, the value of one attribute is independent of the values of the others. Bayes' theorem, shown below, derives the posterior probability from two antecedents, the prior probability and the likelihood function.

$$P(H \mid X) = \frac{P(X \mid H)\, P(H)}{P(X)} \quad (2)$$

where $X$ is the data with unknown class, $H$ is the hypothesis that the data belong to a specific class, $P(H \mid X)$ is the posterior probability of hypothesis $H$ given $X$, $P(H)$ is the prior probability, $P(X \mid H)$ is the probability of observing $X$ given $H$, and $P(X)$ is the marginal probability (evidence) of $X$.
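To make Bayes' theorem in (2) concrete, the following Python sketch computes posteriors for two hypothetical classes. The class names, priors and likelihoods are invented for illustration and are not estimated from the earthquake data.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X).

    priors maps each hypothesis H to P(H); likelihoods maps each H to
    P(X|H) for the observed X. P(X) is the sum over all hypotheses.
    """
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    return {h: priors[h] * likelihoods[h] / evidence for h in priors}

# Hypothetical numbers: most events are low magnitude, but the observed
# feature X is five times more likely under the "high" class.
priors = {"low": 0.8, "high": 0.2}
likelihoods = {"low": 0.1, "high": 0.5}

print(posterior(priors, likelihoods))
```

With these numbers the evidence is 0.08 + 0.10 = 0.18, so the posterior of "high" is 0.10 / 0.18, which is larger than its prior of 0.2 because the observation favours that class.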
d. Evaluation method
To evaluate machine learning performance, we use the confusion matrix, mean absolute error (MAE), mean absolute percentage error (MAPE), mean square error (MSE) and root mean square error (RMSE). The confusion matrix describes the performance of a classification model across different classes. The classifier has worked correctly when it produces true positives (TP) and true negatives (TN), and it has misclassified a value when it produces false positives (FP) and false negatives (FN). In measuring machine learning performance, we evaluate accuracy (the percentage of correct predictions over all test instances) and precision. In this paper, we measure performance using MAE, MAPE, MSE and RMSE, where

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2} \quad (3)$$

In the evaluation formula above, $\hat{y}_i$ is the predicted earthquake value, $y_i$ is the earthquake data from the sources and $n$ is the number of examples used for testing. MAE measures how far our computation deviates through under- and over-estimation [28]. MSE is the most common way to evaluate prediction results, where the error is the difference between the estimate and the data. MAPE indicates the prediction error relative to the original data. MAPE is useful when the size of the variable is important in evaluating the prediction. Meanwhile, RMSE emphasizes large errors more. RMSE
evaluates how close the observed data points are to the model's predicted values, while MAE describes uniformly distributed errors. It is worth noting that RMSE has the same unit as the outcome. For example, when it measures the depth of an earthquake, the unit is km.
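The four error measures can be sketched in a few lines of Python. This is an illustrative implementation of the standard definitions, not the R code used in this work; MAPE is returned as a fraction rather than a percentage (an assumption made here to match the scale of the values reported in the result tables), and the sample magnitudes are hypothetical.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction; y_true must be nonzero."""
    return sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean square error."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error, as in (3); same unit as the outcome."""
    return math.sqrt(mse(y_true, y_pred))

# Hypothetical observed and predicted magnitudes.
observed = [5.0, 4.0, 6.0]
predicted = [5.5, 3.5, 6.0]

print(mae(observed, predicted))
print(mape(observed, predicted))
print(mse(observed, predicted))
print(rmse(observed, predicted))
```

Because RMSE is the square root of MSE, the two columns of a result table can be cross-checked against each other: for example, the SVM entries in Table 8 give 0.751008 squared, which is approximately 0.564013.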
2.3. Data collection
This stage begins our work by collecting data from different locations and in various formats. The challenge in this activity is that only some data can be retrieved directly from a repository as ready-to-use data. In this work, data collection is categorized into 3 methods, as follows:
(a) Retrieve directly from the repository when the data are provided in a ready-to-use format, such as comma-separated values (CSV).
(b) Retrieve a web site manually in hypertext markup language (HTML) format, then apply web scraping to extract the information we need from the HTML text file.
Several techniques are applied to different data sources. We retrieve the EMSC data by accessing or downloading each web page covering 14 years (2004–2018). The web scraping technique is applied to resources from NOAA, EMSC, ISC, INGV, GFZ and BMKG. For InaTEWS, we downloaded the data manually. Other data sets were also downloaded directly, such as GHEA, whose data format is not CSV. USGS data are in CSV format, and we could download almost all the data ranging from 1st January 1900 to 31st August 2018. For the IRIS data set, we obtained data ranging from 1968 to 2018. The INGV data set ranges from 1985 to 2018, and the BMKG data set ranges from 2008 to 2018.
2.4. Data pre-processing
This stage prepares the data before we make any prediction. Most of the work in this stage is filtering the information, such as checking whether the date, time, latitude, longitude, magnitude and depth exist within the data set. We also remove data with a magnitude value of 0 to avoid misclassification during the processing stage. Data merging is also done in this stage. For example, we classify data within the same range of dates into 10 years and 30 years. In doing so, we obtain the intersection of data from different resources.
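The filtering and grouping steps described above can be sketched as follows. This is a minimal Python illustration, not the pipeline used in this work: the field names, the `YYYY-MM-DD` date format, the helper names and the sample records are all assumptions made for the example.

```python
# Fields that must be present for a record to be usable (assumed names).
REQUIRED = ("date", "time", "latitude", "longitude", "magnitude", "depth")

def is_usable(record):
    """Keep a record only if every required field exists and magnitude != 0."""
    if any(record.get(field) in (None, "") for field in REQUIRED):
        return False
    return float(record["magnitude"]) != 0.0

def clean(records):
    """Drop incomplete records and records with magnitude 0."""
    return [r for r in records if is_usable(r)]

def within_last_years(records, end_year, span):
    """Select records whose year lies in the `span` years ending at end_year."""
    start = end_year - span
    return [r for r in records if start < int(r["date"][:4]) <= end_year]

# Hypothetical sample records.
raw = [
    {"date": "2018-08-05", "time": "11:46:38", "latitude": -8.3,
     "longitude": 116.4, "magnitude": 6.9, "depth": 34.0},
    {"date": "2005-03-28", "time": "16:09:36", "latitude": 2.1,
     "longitude": 97.1, "magnitude": 8.6, "depth": 30.0},
    {"date": "2017-01-01", "time": "00:00:00", "latitude": -6.0,
     "longitude": 106.0, "magnitude": 0.0, "depth": 10.0},   # magnitude 0
    {"date": "2016-06-01", "time": "10:00:00", "latitude": -7.0,
     "longitude": 110.0, "magnitude": 4.5, "depth": None},   # missing depth
]

cleaned = clean(raw)
print(len(cleaned))                               # 2 records survive
print(len(within_last_years(cleaned, 2018, 10)))  # 1 record in 2009..2018
print(len(within_last_years(cleaned, 2018, 30)))  # 2 records in 1989..2018
```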
2.5. Prediction stage
This stage makes predictions on the data sets for the specific groups of 10 and 30 years. We split the work into two parts. In the first part, we train on the data grouped by time, date, latitude, longitude, magnitude and depth to find the location and the possible energy of an earthquake. In the next part, we split the data set, already categorized into 4 groups (latitude, longitude, magnitude and depth), into training and testing sets with a split ratio of 0.8 out of 1.0.
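A train/test split of this kind can be sketched in Python as below. The sketch assumes that the ratio 0.8 denotes the fraction of rows used for training and that rows are shuffled before splitting; neither detail is stated in the text, and the function name and seed are illustrative.

```python
import random

def train_test_split(rows, train_ratio=0.8, seed=42):
    """Shuffle rows reproducibly and split them into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Ten placeholder rows stand in for earthquake records.
rows = list(range(10))
train, test = train_test_split(rows, train_ratio=0.8)
print(len(train), len(test))  # 8 2
```

Passing `train_ratio=0.6` to the same function gives a 60%/40% split.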
We use R [31] as the prediction tool, and its libraries implement the machine learning methods that we apply. For Naïve Bayes and support vector machine, we use the functions naiveBayes and svm from the library e1071 [32]. Multinomial logistic regression uses the multinom function from the library nnet [33].
To predict an earthquake, the target is split to obtain specific results. For example, we predict the location of the earthquake as the first step. Then, the magnitude and depth of the earthquake are predicted based on the new location estimated in the previous step. The prediction result is the combination of the first and second steps. In predicting the location of an earthquake, we have implemented two techniques. First, we use the Geohash library to merge the latitude and longitude. Second, we also predict the location using the latitude and longitude separately. We split our prediction based on location as shown in Table 1. It is worth noting that latitude and longitude are in degrees expressed as decimal fractions.
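To show how a geohash merges latitude and longitude into a single value, the following Python sketch implements the standard geohash encoding (interleaved bisection of longitude and latitude, five bits per base-32 character). It is an illustration of the encoding only, not the Geohash library used in this work, and the function name and default precision are assumptions.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=6):
    """Encode decimal-degree latitude and longitude as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits = bits << 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits = bits << 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)

print(geohash_encode(0.0, 0.0, 5))      # s0000
print(geohash_encode(-6.2, 106.8, 6))   # a point near Jakarta
```

Nearby points usually share a common prefix, so a single categorical geohash value can stand in for a latitude/longitude pair when a classifier predicts location.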
Table 1. Prediction factor based on location
Method: Machine learning

| Location | GeoHash | Latitude, Longitude |
|---|---|---|
| Data | Depth | Depth |
| | Depth+Magnitude | Depth+Magnitude |
| | Magnitude | Magnitude |
In predicting the magnitude of an earthquake, we factorize the prediction into two factors. First, the latitude and longitude are used to predict the power of the earthquake. Second, we predict via the combination of location and depth, as depicted in Table 2. For the depth of an
earthquake, we use the opposite factorization to that of the magnitude prediction, as shown in Table 3. To visualize our results, we use R with the Shiny [34] library, overlaid on top of a map retrieved from Google Maps using the ggmap [35] library. The final application of this work is a web-based system.
Table 2. Prediction factor based on depth

| Machine learning | Prediction location based on depth | Prediction location based on depth and magnitude | Prediction location based on magnitude |
|---|---|---|---|
| Data | Longitude + Latitude | Longitude + Latitude | Longitude + Latitude |
| | Longitude + Latitude + Depth | Longitude + Latitude + Depth | Longitude + Latitude + Depth |
Table 3. Prediction factor based on magnitude

| Machine learning | Prediction location based on depth | Prediction location based on depth and magnitude | Prediction location based on magnitude |
|---|---|---|---|
| Data | Longitude + Latitude | Longitude + Latitude | Longitude + Latitude |
| | Longitude + Latitude + Magnitude | Longitude + Latitude + Magnitude | Longitude + Latitude + Magnitude |
3. RESULTS AND ANALYSIS
3.1. Analysis
In this work, we make predictions based solely on the earthquake data set. The data are processed under two conditions: first, grouped into 10 years and 30 years, and second, without grouping, as individual data. Naïve Bayes cannot create predictions for the 10- and 30-year individual data sets because the data are imbalanced. We split the data into 60% for training and 40% for testing. We assume that a smaller error indicates a more accurate prediction. To reduce the complexity of our work, we manage the predictions using a catalog that describes each method and data set, as shown in Table 4.
Table 5 shows the results for the grouped data under the different evaluation techniques. For the 10-year data, SVM shows good results for Magnitude prediction and multinomial logistic regression has better results for Depth data. Naïve Bayes is not included in the 10-year analysis. On the other hand, SVM outperforms the other methods for the 30-year data set grouped on Magnitude and Depth, as shown in Table 5. Its MAE of 0.598473 indicates that its earthquake prediction is more precise than that of the other methods.
In making predictions using 10 years of data without grouping, SVM outperforms the other algorithms in predicting the earthquake location based on Magnitude and Depth. In this prediction, SVM predicts only the latitude and longitude. The result, as depicted in Table 6, shows that the prediction achieves good results when the Magnitude and Depth information is used to estimate the coordinate location. In predicting earthquakes for the 30-year data set without grouping, multinomial logistic regression (MLR) outperforms the other algorithms. Using Magnitude and Depth data, as shown in Table 6, MLR has a smaller error than SVM. Naïve Bayes is not included in this prediction because of imbalanced data.
In the next step, we want to find out which machine learning method is suitable for predicting earthquakes. To this end, we calculate the average error for each data set to see which data set gives a small error rate. As shown in Table 7, the most applicable data sets are the 30-year grouped data and the 10-year ungrouped data, as both show a low error rate. We therefore consider that these data sets have a chance of giving good predictions. In more detail, for both the 30-year grouped and the 10-year ungrouped data sets, SVM outperforms the other methods with a small error rate when using Magnitude information, which also shows a smaller error than Depth information. We therefore conclude that SVM predicts earthquakes much better when using Magnitude information alone.
From the information in Table 7, we conclude that earthquake prediction should be more accurate when we use Magnitude data as the reference. In contrast, when Depth data are used as the reference, accuracy may suffer and predicting the earthquake location may become problematic. These data suggest that Depth data might instead be useful for predicting the destruction that might occur at the predicted location.
To measure which machine learning method is suitable for earthquake prediction in Indonesia, we compare the average error rate for the ungrouped and grouped data sets. Our results show that the 30-year grouped and 10-year ungrouped data sets give reasonable values. As shown in Table 8, for the 30-year grouped data set SVM outperforms multinomial logistic regression and Naïve Bayes. For the 10-year ungrouped data set, SVM also performs better than multinomial logistic regression, as depicted in Table 9. For the 10-year ungrouped data set, we cannot obtain results from the Naïve Bayes method because of imbalanced data.
Overall, our evaluation of machine learning performance shows that, for both grouped and ungrouped data sets, using Magnitude as the reference performs better than using Depth values. Moreover, SVM shows better performance than the other algorithms. We therefore believe that earthquake prediction using SVM would provide better accuracy than multinomial logistic regression and Naïve Bayes on similar data sets.
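The selection rule used in this comparison, picking the method with the smallest error, can be written as a short Python sketch. The RMSE values below are copied from the Magnitude rows of Table 8 (30-year grouped data); the dictionary layout and function name are illustrative assumptions.

```python
# RMSE for the 30-year data set grouped on Magnitude, as reported in Table 8.
rmse_magnitude_30y = {
    "Multinomial Logistic Regression": 0.777235,
    "SVM": 0.751008,
    "Naive Bayes": 0.922814,
}

def rank_by_error(errors):
    """Return method names ordered from smallest to largest error."""
    return sorted(errors, key=errors.get)

ranking = rank_by_error(rmse_magnitude_30y)
print(ranking)  # ['SVM', 'Multinomial Logistic Regression', 'Naive Bayes']
```

The same ordering holds for the other three error measures in the Magnitude rows of Table 8.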
Table 4. An excerpt of 10 years group for prediction method and dataset

| No | Method | Location | Data |
|---|---|---|---|
| 1 | MultiLogReg | Depth | Predict(NonDepth) |
| 2 | MultiLogReg | Depth | Predict(NonDepthNonMag) |
| 3 | MultiLogReg | Depth | Predict(NonMag) |
| 4 | MultiLogReg | Depth | PredictGeoHash(NonDepth) |
| 5 | MultiLogReg | Depth | PredictGeoHash(NonDepthNonMag) |
| 6 | MultiLogReg | Depth | PredictGeoHash(NonMag) |
| 7 | MultiLogReg | Depth+MAG | Predict(NonDepth) |
| 8 | MultiLogReg | Depth+MAG | Predict(NonDepthNonMag) |
| 9 | MultiLogReg | Depth+MAG | Predict(NonMag) |
| 10 | MultiLogReg | Depth+MAG | PredictGeoHash(NonDepth) |
| 11 | MultiLogReg | Depth+MAG | PredictGeoHash(NonDepthNonMag) |
| 12 | MultiLogReg | Depth+MAG | PredictGeoHash(NonMag) |
| 13 | MultiLogReg | MAG | Predict(NonDepth) |
| 14 | MultiLogReg | MAG | Predict(NonDepthNonMag) |
| 15 | MultiLogReg | MAG | Predict(NonMag) |
| 16 | MultiLogReg | MAG | PredictGeoHash(NonDepth) |
| 17 | MultiLogReg | MAG | PredictGeoHash(NonDepthNonMag) |
| 18 | MultiLogReg | MAG | PredictGeoHash(NonMag) |
| 19 | SVM | Depth | Predict(NonDepth) |
| 20 | SVM | Depth | Predict(NonDepthNonMag) |
| 21 | SVM | Depth | Predict(NonMag) |
| 22 | SVM | Depth | PredictGeoHash(NonDepth) |
| 23 | SVM | Depth | PredictGeoHash(NonDepthNonMag) |
| 24 | SVM | Depth | PredictGeoHash(NonMag) |
| 25 | SVM | Depth+MAG | Predict(NonDepth) |
| 26 | SVM | Depth+MAG | Predict(NonDepthNonMag) |
| 27 | SVM | Depth+MAG | Predict(NonMag) |
| 28 | SVM | Depth+MAG | PredictGeoHash(NonDepth) |
| 29 | SVM | Depth+MAG | PredictGeoHash(NonDepthNonMag) |
| 30 | SVM | Depth+MAG | PredictGeoHash(NonMag) |
| 31 | SVM | MAG | Predict(NonDepth) |
| 32 | SVM | MAG | Predict(NonDepthNonMag) |
| 33 | SVM | MAG | Predict(NonMag) |
| 34 | SVM | MAG | PredictGeoHash(NonDepth) |
| 35 | SVM | MAG | PredictGeoHash(NonDepthNonMag) |
| 36 | SVM | MAG | PredictGeoHash(NonMag) |
| 37 | NaiveBayes | Depth | Predict(NonDepth) |
| 38 | NaiveBayes | Depth | Predict(NonDepthNonMag) |
| 39 | NaiveBayes | Depth | Predict(NonMag) |
| 40 | NaiveBayes | Depth | PredictGeoHash(NonDepth) |
| 41 | NaiveBayes | Depth | PredictGeoHash(NonDepthNonMag) |
| 42 | NaiveBayes | Depth | PredictGeoHash(NonMag) |
| 43 | NaiveBayes | Depth+MAG | Predict(NonDepth) |
| 44 | NaiveBayes | Depth+MAG | Predict(NonDepthNonMag) |
| 45 | NaiveBayes | Depth+MAG | Predict(NonMag) |
| 46 | NaiveBayes | Depth+MAG | PredictGeoHash(NonDepth) |
| 47 | NaiveBayes | Depth+MAG | PredictGeoHash(NonDepthNonMag) |
Table 5. Grouping dataset

| Data set | Evaluation | Magnitude | Depth |
|---|---|---|---|
| 10 Years | RMSE | Method (25, 26): 0.839928006 | Method (34): 123.7999 |
| 10 Years | MAPE | Method (30): 0.186486 | Method (14, 15): 0.712816 |
| 10 Years | MSE | Method (25, 27): 0.705479 | Method (34): 15326.42 |
| 10 Years | MAE | Method (30): 0.681305 | Method (31): 64.91890744 |
| 30 Years | RMSE | Method (25, 26): 0.751008212 | Method (28): 120.3226 |
| 30 Years | MAPE | Method (34, 35): 0.156257 | Method (32, 33): 0.809354 |
| 30 Years | MSE | Method (25, 26): 0.564013 | Method (28): 14477.52 |
| 30 Years | MAE | Method (34, 35): 0.598473 | Method (28): 64.5761601 |
Table 6. Ungrouping dataset

| Data set | Evaluation | Magnitude | Depth |
|---|---|---|---|
| 10 Years | RMSE | Method (19, 20): 0.805136856 | Method (23, 24): 101.4409 |
| 10 Years | MAPE | Method (19, 20): 0.135727 | Method (23, 24): 1.835921 |
| 10 Years | MSE | Method (19, 20): 0.648245 | Method (23, 24): 10290.26 |
| 10 Years | MAE | Method (19, 20): 0.618199 | Method (23, 24): 76.15196673 |
| 30 Years | RMSE | Method (15): 3.663452813 | Method (2): 107.2547 |
| 30 Years | MAPE | Method (15): 0.539494 | Method (1): 0.701563 |
| 30 Years | MSE | Method (15): 13.42089 | Method (1): 11503.57 |
| 30 Years | MAE | Method (15): 2.310839 | Method (1): 70.64115023 |
Table 7. Average evaluation result

| Reference | Data set | RMSE | MAPE | MSE | MAE |
|---|---|---|---|---|---|
| Magnitude | Data 10 Years (Grouping) | 0.963318 | 0.21023 | 0.94712 | 0.777716 |
| Magnitude | Data 30 Years (Grouping) | 0.854072 | 0.173682 | 0.746437 | 0.676576 |
| Magnitude | Data 10 Years (No Grouping) | 0.868458 | 0.147251 | 0.757441 | 0.672579 |
| Magnitude | Data 30 Years (No Grouping) | 5.051307 | 0.866291 | 25.78514 | 3.706884 |
| Depth | Data 10 Years (Grouping) | 127.0155 | 1.070409 | 16153.99 | 68.82178 |
| Depth | Data 30 Years (Grouping) | 125.8881 | 1.162366 | 15885.88 | 70.96083 |
| Depth | Data 10 Years (No Grouping) | 109.1246 | 2.463045 | 11940.31 | 80.3022 |
| Depth | Data 30 Years (No Grouping) | 109.8351 | 0.765595 | 12066.61 | 72.89245 |
Table 8. Machine learning performance for 30 years

| Grouping data based on | Method | RMSE | MAPE | MSE | MAE |
|---|---|---|---|---|---|
| Magnitude | Multinomial Logistic Regression | 0.777235 | 0.160233 | 0.604094 | 0.61487 |
| Magnitude | SVM | 0.751008 | 0.156257 | 0.564013 | 0.598473 |
| Magnitude | Naïve Bayes | 0.922814 | 0.183305 | 0.851585 | 0.716253 |
| Depth | Multinomial Logistic Regression | 121.9435 | 0.817061 | 14870.22 | 67.01762 |
| Depth | SVM | 120.3226 | 0.809354 | 14477.52 | 64.57616 |
| Depth | Naïve Bayes | 123.5369 | 1.308522 | 15261.35 | 70.61942 |
Table 9. Machine learning performance for 10 years

| Not grouping data based on | Method | RMSE | MAPE | MSE | MAE |
|---|---|---|---|---|---|
| Magnitude | Multinomial Logistic Regression | 0.884768 | 0.150343 | 0.782815 | 0.687099 |
| Magnitude | SVM | 0.805137 | 0.135727 | 0.648245 | 0.618199 |
| Depth | Multinomial Logistic Regression | 109.8913 | 2.797098 | 12076.09 | 80.97818 |
| Depth | SVM | 101.4409 | 1.835921 | 10290.26 | 76.15197 |
3.2. Results
To present our predictions in a more visual form, a web service is built using the R Shiny system. The original earthquake information is retrieved from the Indonesian geological center, as shown in Figure 2(a). We compare the earthquake report from BMKG Indonesia, shown in Figure 2(a), with the prediction results we made before the date of the event, depicted in Figures 2(b), 2(c) and 2(d). Our prediction is based on the day number within a year. For example, if we want to predict an earthquake on March 11, 2019, we count the number of days from the beginning of the year up to that day, which gives 70 days. Then, we enter the day value, 70, into the web system. In our map, red shows the prediction result and yellow shows the original data.
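The day count used as input to the web system is the ordinal day of the year. A minimal Python sketch of that calculation, using only the standard library (the helper name is an assumption):

```python
from datetime import date

def day_of_year(d):
    """Number of days from the beginning of the year up to and including d."""
    return d.timetuple().tm_yday

print(day_of_year(date(2019, 3, 11)))  # 70 = 31 (Jan) + 28 (Feb) + 11
```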
Comparing the earthquake report from BMKG Indonesia with our prediction results shows that the prediction using Naïve Bayes, shown in Figure 2(b), is not good enough given the original learning data. Multinomial logistic regression performs better than Naïve Bayes: as shown in Figure 2(c), the predicted earthquake location is slightly closer to the BMKG report. Support vector machine (SVM) achieves better results for the eastern Indonesia region and outperforms the other methods. It is worth noting that the training data influence the prediction results. Overall, the prediction results confirm that different machine learning methods may perform differently even when similar data sets are used for training. In our analysis, SVM has a better chance of producing good earthquake predictions.
Figure 2. Earthquake that occurred on March 11, 2019: (a) original information from BMKG Indonesia [16], (b) prediction using Naïve Bayes, (c) prediction using multinomial logistic regression, (d) prediction using SVM.
4. CONCLUSION
We have compared machine learning methods for predicting earthquake location, depth and magnitude in the Indonesia region. To visualize the prediction results, a web-based application has also been demonstrated. The conclusions we obtained from this work are as follows. The Naïve Bayes method is not good enough to predict from a data set grouped over only one year, but it is applicable to data grouped over multiple years. Considering the average error rate, SVM outperforms the other algorithms, and using Magnitude data as the reference provides better results than using Depth data. This suggests that Depth can be used as an additional factor for better prediction. We use day, month and year as the date properties for prediction, and our observations show that prediction based on the day performs better. Over all data sets, as we expected, SVM outperforms the other methods, followed by multinomial logistic regression. Naïve Bayes performed worst across all prediction results.