International Journal of Electrical and Computer Engineering (IJECE)
Vol. 8, No. 6, December 2018, pp. 4763~4771
ISSN: 2088-8708, DOI: 10.11591/ijece.v8i6.pp4763-4771
Journal homepage: http://iaescore.com/journals/index.php/IJECE
Comparative Study of Classification Method on Customer Candidate Data to Predict its Potential Risk
Mujiono Sadikin, Fahri Alfiandi
Faculty of Computer Science, Universitas Mercu Buana, Indonesia
Article Info
Article history:
Received Jan 8, 2018
Revised Jul 19, 2018
Accepted Jul 29, 2018

ABSTRACT
A vehicle leasing company is engaged in the field of vehicle loans. Purchasing by way of credit has become a mainstay because it can attract potential customers and generate more profit. However, if a customer candidate is approved by mistake, stalled credit payments can occur. To minimize this risk, data mining techniques can be applied to predict the future behavior of customers. In this study, two data mining techniques, C4.5 and Naive Bayes, are explored for this purpose. The customer attributes used in this study are salary, age, marital status, other installments, and worthiness. The experiments are performed using the Weka software. Based on the accuracy criterion, the C4.5 algorithm outperforms Naive Bayes. The percentage split experiment scenarios provide a precision value of 89.16% and an accuracy value of 83.33%, whereas in the cross-validation experiment scenarios C4.5 gives higher accuracy values than Naive Bayes for every k-fold used. The C4.5 experiment results also confirm that the most influential data attribute in this research is salary.
Keywords:
C4.5 algorithm
Data mining
Leasing
Naive Bayes algorithm
Copyright © 2018 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Fahri Alfiandi,
Faculty of Computer Science, Universitas Mercu Buana,
Meruya Selatan No. 1, Kembangan, Jakarta Barat 11650, Indonesia.
Email: 41514010101@student.mercubuana.ac.id
1. INTRODUCTION
Data mining is a process that uses a variety of data analysis tools to discover patterns and relationships in data that may be used to make valid predictions [1]–[3]. The process is performed by extracting or recognizing important patterns from the data contained in a database. There are many data mining techniques for doing this, among them C4.5, the Naive Bayes algorithm, Apriori, K-NN, and many others.
Bank credit risk assessment is widely used at banks around the world. Some bank risks include: credit risk, the risk that a loan will not be returned on time or at all; liquidity risk, the risk that too many deposits will be withdrawn too quickly, leaving the bank short on immediate cash; and interest rate risk, the risk that the interest rates priced on bank loans will be too low to earn the bank adequate money [4].
As credit risk evaluation is crucial, a variety of techniques are used to calculate the risk level. In addition, credit risk assessment is one of the main functions of the banking community. Banks classify clients according to their profiles; while classifying, the financial background of the customers and the subjective factors related to them are evaluated [5].
To help the company process large volumes of data, a system is needed that can produce a decision on a potential customer's risk. One option is to use data mining techniques so that the data can be used optimally. By exploiting these data, such a system is expected to help identify customer candidates who are predicted to have payment problems in the future, and thus to assist in deciding on credit for prospective customers.
The study published in the journal article entitled "C4.5 Algorithm to Predict the Impact of the Earthquake" describes that the time at which an earthquake will happen cannot be predicted, but its expected impact can be predicted from the seismic data of earthquakes that have happened before. One of the methods used to dig for information in such historical data is the C4.5 data mining algorithm. The output of the C4.5 algorithm in predicting the impact of a quake is divided into three classes, namely no impact/minor damage, severe damage, and damage and tsunami. By predicting the implications of an earthquake, the study aims to minimize its impact. The attributes used are the epicenter, distance from the beach, depth, scale, duration, and effect. The results of the study show the following prediction patterns for the effects of earthquakes. If the scale is low, the earthquake does not cause any effect. If the scale is medium and the duration is short, there is no effect. If the scale is medium and the duration is long, it will cause damage. If the scale is high and the epicenter is at a certain distance from the coast or on land, it will also cause damage. If the scale is high and the distance from the coast is very far, it will cause damage and a tsunami. If the scale is high, the distance from the coast is far, and the epicenter is in the sea, it will cause damage and a tsunami [6].
Another study that utilizes C4.5 is presented in [7]. The study uses rainfall, soil, and climate datasets to predict crop production. These datasets are preprocessed to remove unwanted and null data. A feature extraction method is then used to extract a subset of new features from the datasets through functional mapping while preserving the information. For feature selection, a genetic algorithm is used to select the optimal features, since it offers the opportunity to discover the optimum solution. An enhanced ANFIS classifier is then applied; it improves on the C4.5 classifier in the hidden layer to generate the rules that predict the yield. With this enhancement of C4.5, the experimental results of the proposed work show a better accuracy of 92.50% than the existing classifier.

A comparative study of the performance of decision tree variants for information mining in a forest burned area was conducted by Putri et al. and published in [8]. The study compared three decision tree variants, i.e. CART, C5.0, and C4.5. Of these three techniques, C5.0 is the most suitable for the spatial data of the forest burned area, as shown by its accuracy of 99.79%.
In [9] the authors present a study that uses the Naive Bayes classifier to predict a patient's hypertension disease. Hypertension is a significant health problem, and patients may not recognize the disease for years. On the other hand, it is still difficult to answer complex queries such as "Given patient records, predict the probability of patients getting hypertension". Most of the time, clinical decisions are made based on doctors' intuition and experience rather than on the knowledge-rich data hidden in the database. In that study, the Naive Bayes algorithm is employed to build a model with predictive capabilities, which provides new ways of exploring and understanding knowledge. The attributes used in the research are sex, chest pain, exam, age, systolic BP, diastolic BP, cholesterol, fasting blood sugar, thalach, old peak, and the risk of hypertension. The Naive Bayes experiments in the study give the following performance: a recall of 83.70%, a precision of 83.60%, and an accuracy of 83.67%.
Another interesting application of naïve Bayes for classification is presented in [10]. In that study, the author presents the results of a Zakah receiver classification experiment that utilizes the naïve Bayes classifier. According to the experiment results, the classifier provides good accuracy, i.e. 85%.
An application of naïve Bayes classifiers in the social media mining domain is discussed in [11]. The study explored the use of the Multinomial Naïve Bayes classifier to mine the sentiment opinion patterns about GSM services from customers' Twitter accounts. Using 1665 features of the dataset, the technique achieves an accuracy of 73.15%.
In this work we perform an experimental study of the Naive Bayes and C4.5 algorithms applied to the historical customer data of a leasing company. The purpose is to evaluate the performance of both algorithms in assisting the leasing company to decide whether to approve customer candidates who apply for leasing.
Such a study is critical in the local Indonesian context, since financial technology is currently growing quickly while the information technology environment, especially software and applications, is still in its initial phase. To the authors' knowledge, there are very few publications on the application of Artificial Intelligence or Machine Learning to this domain for Indonesian cases.
2. MATERIAL AND METHOD
2.1. Classification
Classification is one of the data mining techniques mainly used to analyze a given dataset: it takes each instance of the dataset and assigns it to a particular class such that the classification error is minimized. It is used to extract models that accurately define important data classes within the given dataset. Classification is a two-step process. In the first step, the model is created by applying a classification algorithm to the training dataset; in the second step, the extracted model is tested against a predefined test dataset to measure the trained model's performance and accuracy. Classification is therefore the process of assigning a class label to data whose class label is unknown [9].
2.2. C4.5 Algorithm
The C4.5 algorithm is used to construct a decision tree [12], one of the most powerful and well-known classification and prediction methods. The decision tree method turns a very large set of facts into a decision tree that represents rules. The decision tree is also useful for exploring the data to find the relationship between the input variables and a certain output/target variable. In general, the C4.5 algorithm constructs a decision tree as follows:
a. Select an attribute as the root.
b. Create a branch for each value of the attribute.
c. Divide the cases among the branches.
d. Repeat the process for each branch until all cases in the branch have the same class.
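The construction steps above can be sketched as a recursive Python function. This is a minimal sketch under stated assumptions, not the authors' implementation or Weka's J48: it handles only categorical attributes, picks each root with the gain criterion defined in this section, and stops a branch when all its cases share one class (step d), when no attributes remain, or, as our own simplification, when no split yields positive gain. Full C4.5 also uses the gain ratio, handles numeric attributes and missing values, and prunes the tree. The rows are the categorized samples of Table 4; the function names and dictionary keys are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = sum over classes of -p_i * log2(p_i)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain(rows, attribute, target):
    """Gain(S, A) = Entropy(S) - sum_i |S_i| / |S| * Entropy(S_i)."""
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, attributes, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:  # step d: all cases in this branch share one class
        return labels[0]
    if not attributes:  # no attribute left to split on: use the majority class
        return majority(labels)
    # Step a: the attribute with the highest gain becomes the root of this (sub)tree.
    best = max(attributes, key=lambda a: gain(rows, a, target))
    if gain(rows, best, target) <= 0:  # our simplification: stop if no split helps
        return majority(labels)
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    # Steps b and c: one branch per value, each receiving the cases with that value.
    for value in sorted({r[best] for r in rows}):
        branch_rows = [r for r in rows if r[best] == value]
        # Step d: repeat the process for the branch.
        tree[best][value] = build_tree(branch_rows, remaining, target)
    return tree

# The five categorized sample rows of Table 4 (keys are our own naming).
rows = [
    {"age": "Young", "salary": "Low", "other_installments": "No", "marital_status": "Married", "worthiness": "Not Worth It"},
    {"age": "Young", "salary": "High", "other_installments": "Yes", "marital_status": "Married", "worthiness": "Not Worth It"},
    {"age": "Young", "salary": "High", "other_installments": "Yes", "marital_status": "Married", "worthiness": "Worth It"},
    {"age": "Old", "salary": "High", "other_installments": "No", "marital_status": "Married", "worthiness": "Worth It"},
    {"age": "Young", "salary": "Low", "other_installments": "Yes", "marital_status": "Single", "worthiness": "Not Worth It"},
]
attributes = ["age", "salary", "other_installments", "marital_status"]
tree = build_tree(rows, attributes, "worthiness")
print(tree)
```

On these five rows the sketch chooses salary as the root, which agrees with the observation reported later in the paper, but five records are far too few to support that conclusion on their own.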
To select an attribute as the root, the attribute with the highest gain value among the existing attributes is chosen. The gain is calculated with the following formula:
$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{i=1}^{n} \frac{|S_i|}{|S|} \times \mathrm{Entropy}(S_i)$$

Where:
S: the set of cases
A: the attribute
n: the number of partitions of attribute A
|S_i|: the number of cases in the i-th partition
|S|: the number of cases in S
Meanwhile, the entropy value is calculated as follows:

$$\mathrm{Entropy}(S) = \sum_{i=1}^{n} -p_i \times \log_2 p_i$$

Where:
S: the set of cases
n: the number of partitions (classes) of S
p_i: the proportion of S_i to S
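As a worked check of the two formulas, the Python sketch below computes Entropy(S) and Gain(S, A) for each input attribute on the five categorized sample rows of Table 4. It is illustrative only: the function names and dictionary keys are our own, and five rows are not the study's 560-record dataset, so the numbers say nothing about the reported experiments.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = sum_{i=1}^{n} -p_i * log2(p_i), p_i = share of class i in S."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain(rows, attribute, target="worthiness"):
    """Gain(S, A) = Entropy(S) - sum_i |S_i| / |S| * Entropy(S_i)."""
    labels = [r[target] for r in rows]
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]  # the partition S_i
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

# The five categorized sample rows of Table 4 (keys are our own naming).
rows = [
    {"age": "Young", "salary": "Low", "other_installments": "No", "marital_status": "Married", "worthiness": "Not Worth It"},
    {"age": "Young", "salary": "High", "other_installments": "Yes", "marital_status": "Married", "worthiness": "Not Worth It"},
    {"age": "Young", "salary": "High", "other_installments": "Yes", "marital_status": "Married", "worthiness": "Worth It"},
    {"age": "Old", "salary": "High", "other_installments": "No", "marital_status": "Married", "worthiness": "Worth It"},
    {"age": "Young", "salary": "Low", "other_installments": "Yes", "marital_status": "Single", "worthiness": "Not Worth It"},
]
ATTRIBUTES = ["age", "salary", "other_installments", "marital_status"]

print(f"Entropy(S) = {entropy([r['worthiness'] for r in rows]):.3f}")
for attribute in ATTRIBUTES:
    print(f"Gain(S, {attribute}) = {gain(rows, attribute):.3f}")
```

With three Not Worth It and two Worth It cases, Entropy(S) is about 0.971, and the printed gains are about 0.322 (age), 0.420 (salary), 0.020 (other installments), and 0.171 (marital status), so salary would be selected as the root on this sample.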
2.3. Naive Bayes Algorithm
The Naive Bayes algorithm studies the events in database records by calculating the variables, each of which is analyzed together with the other variables [13]. As a result of this process, we can predict something such as whether or not a person belongs to certain groups based on the variables attached to that person. Additionally, Naive Bayes can analyze which variables most influence the outcome, in the form of probabilities. Naive Bayes is a simple probability-based prediction technique based on the application of Bayes' theorem with a strong independence assumption.
The steps below are the stages of the Naive Bayes process:
a. Counting the number of classes/labels
b. Counting the number of cases per class
c. Multiplying the probabilities of all variables for each class
d. Comparing the results per class
The formula of the Naive Bayes algorithm is as follows:

$$P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)}$$
Where:
x: data with an unknown class
c: the hypothesis that the data belongs to a specific class
P(c | x): the probability of the hypothesis given the condition x (posterior probability)
P(c): the probability of the hypothesis (prior probability)
P(x | c): the probability of x given the hypothetical condition c
P(x): the probability of x
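The four stages and the formula can be illustrated with a small Python sketch for categorical attributes. It estimates P(c) and each P(x_i | c) by counting, multiplies them under the independence assumption, and compares the per-class scores; P(x) is identical for every class, so it is dropped from the comparison. The add-one (Laplace) smoothing controlled by the hypothetical `alpha` parameter is our assumption, added so that an attribute value never seen with a class does not zero out the whole product; the paper does not describe it, and Weka's NaiveBayes uses its own estimators. The rows are the categorized samples of Table 4.

```python
from collections import Counter

def train_naive_bayes(rows, target):
    """Steps a and b: count classes and, per class, how often each attribute value occurs."""
    class_counts = Counter(r[target] for r in rows)
    value_counts = Counter()  # (attribute, value, class) -> count
    domains = {}              # attribute -> set of observed values
    for r in rows:
        for attribute, value in r.items():
            if attribute == target:
                continue
            value_counts[(attribute, value, r[target])] += 1
            domains.setdefault(attribute, set()).add(value)
    return class_counts, value_counts, domains

def predict(model, x, alpha=1.0):
    """Steps c and d: multiply P(c) by every P(x_i | c), then compare the classes.

    P(x) is the same for all classes, so it is left out of the comparison.
    alpha is add-one (Laplace) smoothing, an assumption not taken from the paper.
    """
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total  # prior P(c)
        for attribute, value in x.items():
            k = len(domains[attribute])  # number of distinct values of the attribute
            score *= (value_counts[(attribute, value, cls)] + alpha) / (n_cls + alpha * k)
        scores[cls] = score
    return max(scores, key=scores.get), scores

# The five categorized sample rows of Table 4 (keys are our own naming).
rows = [
    {"age": "Young", "salary": "Low", "other_installments": "No", "marital_status": "Married", "worthiness": "Not Worth It"},
    {"age": "Young", "salary": "High", "other_installments": "Yes", "marital_status": "Married", "worthiness": "Not Worth It"},
    {"age": "Young", "salary": "High", "other_installments": "Yes", "marital_status": "Married", "worthiness": "Worth It"},
    {"age": "Old", "salary": "High", "other_installments": "No", "marital_status": "Married", "worthiness": "Worth It"},
    {"age": "Young", "salary": "Low", "other_installments": "Yes", "marital_status": "Single", "worthiness": "Not Worth It"},
]
model = train_naive_bayes(rows, "worthiness")
applicant = {"age": "Old", "salary": "High", "other_installments": "No", "marital_status": "Married"}
label, scores = predict(model, applicant)
print(label, scores)
```

For this hypothetical applicant (old, high salary, no other installments, married) the unnormalized scores are 0.05625 for Worth It and 0.01152 for Not Worth It, so the sketch predicts Worth It.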
2.4. Weka Tools
Weka is a collection of machine learning algorithms for data mining tasks. Weka stands for Waikato Environment for Knowledge Analysis. It was developed by the University of Waikato, New Zealand. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization [14]. The workflow of Weka is shown in Figure 1.
Figure 1. Weka Flow
2.5. Data Set
The data used in this research were collected from a leasing company located in the Cikupa-Tangerang area, Banten Province. In total, 560 records were collected; each instance contains 5 attributes, namely age, marital status, salary, other installments, and worthiness, as presented in Table 1. The worthiness attribute is the target variable/label. Some sample data instances are shown in Table 2.
Table 1. Data Set Attribute
| No. | Attribute | Attribute Value |
|---|---|---|
| 1 | Age (Years) | 23, 40, 50, and so on |
| 2 | Salary (Rupiah) | 1 Million, 4 Million, and so on |
| 3 | Other Installments | Yes, No |
| 4 | Marital Status | Married, Single |
| 5 | Worthiness | Worth It, Not Worth It |
Table 2. Example of Data Set
| No. | Age | Salary | Other Installments | Marital Status | Worthiness |
|---|---|---|---|---|---|
| 1 | 21 | 4.400.000 IDR | No | Married | Not Worth It |
| 2 | 23 | 10.600.000 IDR | Yes | Married | Not Worth It |
| 3 | 43 | 14.000.000 IDR | Yes | Married | Worth It |
| 4 | 54 | 13.000.000 IDR | No | Married | Worth It |
| 5 | 25 | 4.700.000 IDR | Yes | Single | Not Worth It |
Two of the four input attributes, age and salary, can take values in a wide range, which makes the computation harder. To deal with this problem we apply a categorization mechanism to the values of both attributes, as presented in Table 3. Table 4 shows a data example after categorization.
Table 3. Data Set Attribute Categorization
| No. | Attribute | Attribute Value | Attribute Categorization |
|---|---|---|---|
| 1 | Age (Years) | 23, 40, 50, and so on | Age < 45: Young; Age > 45: Old |
| 2 | Salary (Rupiah) | 1 million, 4 million, and so on | < 5 million: Low; 5–10 million: Middle; > 10 million: High |
| 3 | Other Installments | Yes, No | Yes, No |
| 4 | Marital Status | Married, Single | Married, Single |
| 5 | Worthiness | Worth It, Not Worth It | Worth It, Not Worth It |
Table 4. Example of Data Set Categorization
| No. | Age | Salary | Other Installments | Marital Status | Worthiness |
|---|---|---|---|---|---|
| 1 | Young | Low | No | Married | Not Worth It |
| 2 | Young | High | Yes | Married | Not Worth It |
| 3 | Young | High | Yes | Married | Worth It |
| 4 | Old | High | No | Married | Worth It |
| 5 | Young | Low | Yes | Single | Not Worth It |
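The categorization rules of Table 3 can be expressed as two small Python functions. This is a sketch under assumptions: Table 3 does not say where an age of exactly 45 or a salary of exactly 5 or 10 million falls, so we place age 45 in Old and both salary boundaries in Middle. Applied to the raw values of Table 2, the functions reproduce the Age and Salary columns of Table 4.

```python
def categorize_age(age_years):
    """Table 3: Age < 45 -> Young, Age > 45 -> Old (age 45 -> Old is our assumption)."""
    return "Young" if age_years < 45 else "Old"

def categorize_salary(salary_rupiah):
    """Table 3: < 5 million -> Low, 5-10 million -> Middle, > 10 million -> High.

    Treating exactly 5 and 10 million as Middle is our assumption.
    """
    if salary_rupiah < 5_000_000:
        return "Low"
    if salary_rupiah <= 10_000_000:
        return "Middle"
    return "High"

# (age, salary in IDR) of the five sample records in Table 2.
TABLE2 = [
    (21, 4_400_000),
    (23, 10_600_000),
    (43, 14_000_000),
    (54, 13_000_000),
    (25, 4_700_000),
]
for age, salary in TABLE2:
    print(age, salary, categorize_age(age), categorize_salary(salary))
```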
2.6. Experiment Scenario
The experiment scenario consists of two main steps. The first step is to obtain the best model from each algorithm, and the second is to compare the two best models obtained. The details of the experiment stages and scenario are illustrated in Figure 2.
Figure 2. The Experiment Scenario
The collected data are not yet ready to be processed by the algorithms, since they contain too many biases or ambiguities, so data preprocessing must be performed first. In this step we perform data cleaning by ignoring incomplete data. The next preprocessing step is data transformation, which converts the data into a format compatible with the Weka tools. Data splitting is then applied to divide the data into two parts: training data and testing data. In this case, we use 80% of the data for training and the rest for testing. The same training data is used to train both algorithms, and the resulting models are tested with the same testing data. For both algorithms, we perform twenty experiment runs to obtain the best model of each algorithm. The two best models are then compared to evaluate their performance and to determine the better model between C4.5 and Naive Bayes.
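A percentage split like the one described above can be sketched in Python as a seeded shuffle followed by a cut. This is a generic illustration, not Weka's internal procedure: the function name is hypothetical, and integers stand in for the 560 collected records. The fractions 0.6, 0.7, and 0.8 correspond to the three split scenarios used in the results section. Repeating the split with different seeds is one way to obtain repeated runs, but the paper does not state how its twenty runs differed.

```python
import random

def percentage_split(records, train_fraction, seed=None):
    """Shuffle a copy of the records and cut it into training and testing parts."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded, so a run can be repeated exactly
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

records = list(range(560))  # stand-ins for the 560 collected records
for fraction in (0.6, 0.7, 0.8):
    train_part, test_part = percentage_split(records, fraction, seed=1)
    print(fraction, len(train_part), len(test_part))
```

For 560 records the three scenarios give 336/224, 392/168, and 448/112 training/testing records.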
2.7. Data Preprocessing
Data preprocessing is required to improve the quality of the data by removing unwanted data from the original data [15]. Preprocessing is important because raw data containing missing values, noise, and inconsistencies results in unqualified data. In this study, we perform data preprocessing as follows:
a. Data Cleaning
Data cleaning removes the noise found in the form of missing values, inconsistent data, and redundant data. All of the attributes above are then checked to ensure that they contain relevant values, have no missing values, and are not redundant. These three requirements are prerequisites in data mining for obtaining a clean dataset for use in the data mining stage. One missing value was found in this dataset, and the record containing it is deleted.
b. Data Transformation
At the data transformation stage, the data is converted into the appropriate form for processing in data mining. In this study, the data is converted from Microsoft Excel into a CSV (Comma Separated Values) file, which can be processed with the Weka tools.
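The two preprocessing steps can be sketched with Python's standard `csv` module: drop every record that has a missing value, then write the remaining records as CSV, which Weka can open. The sketch assumes the spreadsheet rows have already been read into dictionaries (the Excel export itself is not shown); the field names and the incomplete second record are hypothetical.

```python
import csv
import io

FIELDS = ["age", "salary", "other_installments", "marital_status", "worthiness"]

def is_missing(value):
    return value is None or str(value).strip() == ""

def drop_incomplete(rows):
    """Data cleaning: delete every record that has a missing value in any field."""
    return [r for r in rows if not any(is_missing(r.get(f)) for f in FIELDS)]

def to_csv(rows):
    """Data transformation: serialize the records as comma-separated values."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

raw = [
    {"age": "21", "salary": "4400000", "other_installments": "No",
     "marital_status": "Married", "worthiness": "Not Worth It"},
    # Hypothetical record with a missing salary, to show the cleaning rule.
    {"age": "23", "salary": "", "other_installments": "Yes",
     "marital_status": "Married", "worthiness": "Not Worth It"},
]
cleaned = drop_incomplete(raw)
csv_text = to_csv(cleaned)
print(csv_text)
```

To save the result, write `csv_text` to a file opened with `newline=""`, as the `csv` module documentation recommends.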
2.8. Evaluation
To evaluate the performance of both algorithms, we use common criteria in data mining, i.e. precision, recall, and accuracy. These parameters are calculated from a confusion matrix. A confusion matrix contains information about the actual and predicted classes produced by a classification system [16]. All correct classifications lie along the diagonal from the north-west corner to the south-east corner and are called True Positives (TP) and True Negatives (TN), while the other cells are called False Positives (FP) and False Negatives (FN) [17]. In this study, the likely cases are considered positive cases, while the unlikely and probable cases are negative cases. The definitions of these parameters are as follows:
a. True positives (TP) are correctly classified yes cases.
b. False positives (FP) are incorrectly classified no cases.
c. True negatives (TN) are correctly classified no cases.
d. False negatives (FN) are incorrectly classified yes cases.
The true positive/negative and false positive/negative values recorded in the confusion matrix can then be used to evaluate the performance of the prediction model. The definitions and expressions of the metrics are as follows [18]:
a. Recall is the average per-class effectiveness of a classifier in identifying class labels.
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
b. Precision is the ability of a classifier to determine the positive labels, using a one-versus-all approach.
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
c. Accuracy is the ratio of correct classifications to the total number of classifications, using a one-versus-all approach.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
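The three formulas can be computed directly from a confusion matrix, as in the following Python sketch. It treats one class as positive in a one-versus-all fashion; using "Worth It" as the positive class is our assumption, since the paper does not state which worthiness value it treats as positive, and the labels below are hypothetical rather than results from the experiments. Weka additionally reports per-class and weighted-average values, which this sketch does not reproduce.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, FP, TN, FN with one class treated as positive (one versus all)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    """Precision, recall, and accuracy as defined above."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, accuracy

# Hypothetical labels, not results from the paper.
actual = ["Worth It", "Worth It", "Not Worth It", "Not Worth It", "Worth It"]
predicted = ["Worth It", "Not Worth It", "Not Worth It", "Worth It", "Worth It"]
tp, fp, tn, fn = confusion_counts(actual, predicted, positive="Worth It")
precision, recall, accuracy = metrics(tp, fp, tn, fn)
print(tp, fp, tn, fn, precision, recall, accuracy)
```

Here TP = 2, FP = 1, TN = 1, and FN = 1, giving a precision and recall of 2/3 and an accuracy of 3/5.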
3. RESULTS AND DISCUSSION
This section presents the experimental results and analysis of this study, which uses two classifiers, C4.5 and Naive Bayes. Three experiment scenarios based on percentage data splitting are performed for each algorithm. The first experiment uses 60% of the data for training and 40% for testing, the second uses 70% for training and 30% for testing, and the third uses 80% for training and 20% for testing. For each method, the experiment that provides the highest performance values is used as the model for finding the best method by re-testing it on the provided testing data.
Table 5 presents the average performance parameter values of each C4.5 experiment scenario at the model testing stage, while Table 6 shows the results for Naive Bayes. Based on the achieved accuracy values, the first experiment scenario is the best for both algorithms. In the first scenario, the C4.5 accuracy is 82.59%, whereas the Naive Bayes accuracy is 80.35%.
Table 5. C4.5 Algorithm Test Performance
| Experiment | Accuracy | Precision | Recall |
|---|---|---|---|
| 1 | 82.59% | 86.77% | 82.03% |
| 2 | 80.37% | 85.10% | 80.80% |
| 3 | 80.37% | 87.50% | 80% |
Table 6. Naive Bayes Algorithm Test Performance
| Experiment | Accuracy | Precision | Recall |
|---|---|---|---|
| 1 | 80.35% | 80.16% | 82.90% |
| 2 | 77.38% | 78.72% | 80.43% |
| 3 | 77.68% | 81.25% | 81.25% |
The next stage of the experiment compares the best models produced by the experiment scenarios run for both algorithms. These two models are applied to the provided testing data to determine which algorithm is suitable for this study case. The results of this comparison stage are presented in Table 7. Table 7 shows that the C4.5 algorithm is superior to the Naive Bayes algorithm, with an accuracy of 83.33%, while the Naive Bayes algorithm achieves 80.67%.
Table 7. Comparison of C4.5 Best Model and Naive Bayes Best Model on Testing Stage
| Criteria | C4.5 Algorithm | Naive Bayes Algorithm |
|---|---|---|
| Accuracy | 83.33% | 80.67% |
| Precision | 89.16% | 80.72% |
| Recall | 82.22% | 83.75% |
To validate the result above, we perform a further experiment based on the cross-validation evaluation scenario. Three different k-folds are used in the scenario, i.e. 5-fold, 10-fold, and 20-fold, and each of these k-folds is applied to both C4.5 and Naïve Bayes. The results are presented in Table 8 and Table 9: Table 8 presents the C4.5 performance, whereas Table 9 presents the Naïve Bayes performance. The cross-validation experiment confirms that, in this case, C4.5 achieves better performance than Naïve Bayes: for all k-folds applied, C4.5 gives better accuracy than Naïve Bayes. The results also reveal different performance patterns. C4.5 gives better accuracy for the smaller k-fold, whereas Naïve Bayes gives better accuracy for the bigger k-fold.
Table 8. C4.5 Cross Validation Scenario Performance
| k-fold | Precision | Recall | Accuracy |
|---|---|---|---|
| 5-fold | 80.48% | 83.07% | 81.58% |
| 10-fold | 80.73% | 83.07% | 81.56% |
| 20-fold | 81.17% | 83.06% | 81.50% |
Table 9. Naive Bayes Cross Validation Scenario Performance
| k-fold | Precision | Recall | Accuracy |
|---|---|---|---|
| 5-fold | 76.47% | 84.86% | 80.39% |
| 10-fold | 76.73% | 84.87% | 80.41% |
| 20-fold | 77.25% | 84.85% | 80.41% |
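The k-fold scenario can be sketched in Python as follows: the records are shuffled once, dealt into k folds of near-equal size, and each fold is used once for testing while a model is trained on the remaining folds; accuracy is pooled over all test predictions. This is a generic, unstratified illustration with hypothetical names; Weka's own cross-validation stratifies the folds by class, which this sketch does not do. The majority-class baseline exists only to exercise the loop and stands in for C4.5 or Naive Bayes.

```python
import random
from collections import Counter

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of records")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k folds of near-equal size
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

def cross_validate(rows, k, fit, predict_one, target, seed=0):
    """Pooled accuracy: each record is tested once, by a model trained without it."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(rows), k, seed):
        model = fit([rows[i] for i in train_idx])
        correct += sum(predict_one(model, rows[i]) == rows[i][target] for i in test_idx)
    return correct / len(rows)

# Hypothetical baseline classifier, used only to exercise the loop.
def fit_majority(train_rows, target="worthiness"):
    return Counter(r[target] for r in train_rows).most_common(1)[0][0]

def predict_majority(model, row):
    return model

for k in (5, 10, 20):
    sizes = [len(test) for _, test in k_fold_indices(560, k)]
    print(k, min(sizes), max(sizes))
```

For 560 records, every 5-, 10-, and 20-fold test fold holds 112, 56, and 28 records respectively.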
The superiority of C4.5 over Naive Bayes can be understood since all of the input variables are independent of each other, so C4.5 is more suitable for this characteristic of the data. On the other hand, the Naive Bayes algorithm is by nature based on the conditional probability of the input variables, so in this case its advantages are less useful. Another implication shown by the results is that the customer leasing application tends to be a recommender application rather than a classification application.
4. CONCLUSION AND FUTURE STUDY
In this study, the C4.5 algorithm and the Naive Bayes algorithm were implemented on a customer credit dataset to predict potential future risk. Based on the results of two types of experiment scenarios, the C4.5 algorithm achieves better performance. The study results show that, for a recommender system, the characteristics of C4.5 are more suitable than those of Naive Bayes, which works on the conditional probability of the input variables. Moreover, in the C4.5 algorithm the salary attribute is the most influential attribute, as shown by its significant gain value compared to the other input variables. The dominant influence of the salary attribute appears in every experiment scenario, where the attribute is always selected as the root node of the tree. In future studies, we will explore opportunities to apply other techniques in this domain. We will also investigate other real applications that are still open to exploitation, such as customer care, sales recommenders, and microfinance, which is growing quickly.
REFERENCES
[1] Karim M, Rahman RM. Decision Tree and Naïve Bayes Algorithm for Classification and Generation of Actionable Knowledge for Direct Marketing. J Softw Eng Appl 2013; 06: 196–206.
[2] Dimitoglou G, Adams JA, et al. Comparison of the C4.5 and a Naive Bayes Classifier for the Prediction of Lung Cancer Survivability.
[3] Arifin MF, Fitrianah D. Penerapan Algoritma Klasifikasi C4.5 Dalam Rekomendasi Penerimaan Mitra Penjualan Studi Kasus: PT Atria Artha Persada. InComTech 2018; 8: 87–102.
[4] Jafar Hamid A, Ahmed TM. Developing Prediction Model of Loan Risk in Banks Using Data Mining. Mach Learn Appl An Int J 2016; 3: 1–9.
[5] Krichene A. Using a naive Bayesian classifier methodology for loan risk assessment. J Econ Financ Adm Sci 2017; 22: 3–24.
[6] Buulolo E, Silalahi N, Fadlina, et al. C4.5 Algorithm To Predict the Impact of the Earthquake. Int J Eng Res Technol 2017; 6: 10–15.
[7] Poongodi S, Babu MR. Prediction of Crop Production using Improved C4.5 with ANFIS Classifier. 10.
[8] Thariqa P, Sitanggang IS, Syaufina L. Comparative Analysis of Spatial Decision Tree Algorithms for Burned Area of Peatland in Rokan Hilir Riau. Telkomnika (Telecommunication Comput Electron Control) 2016; 14: 684–691.
[9] Nikam SS. A Comparative Study of Classification Techniques in Data Mining Algorithms. Orient J Comput Sci Technol 2015; 8: 13–19.
[10] Basri Hasanuddin Z, Syarif S. Zakah Management System using Approach Classification. Telkomnika (Telecommunication Comput Electron Control) 2017; 15: 1852–1857.
[11] Susanti AR, Djatna T, Kusuma WA. Twitter's Sentiment Analysis on Gsm Services using Multinomial Naïve Bayes. Telkomnika (Telecommunication Comput Electron Control) 2017; 15: 1354.
[12] Larose DT. Discovering Knowledge in Data: An Introduction to Data Mining. John Wiley & Sons, Inc., 2015.
[13] Patil TR, Sherekar MS. Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification; 6.
[14] Waikato. Weka 3: Data Mining Software in Java.
[15] Ȧ SR, Sonika Ȧ. Effectiveness of Data Preprocessing for Data Mining. 2014; 4: 3480–3483.
[16] Santra AK, Christy CJ. Genetic Algorithm and Confusion Matrix for Document Clustering. Int J Comput Sci 2012; 9: 322–328.
[17] Sadikin M, Fanany MI, Basaruddin T. A New Data Representation Based on Training Data Characteristics to Extract Drug Name Entity in Medical Text. 2016.
[18] Mehdiyev N, Enke D, Fettke P, et al. Evaluating Forecasting Methods by Considering Different Accuracy Measures. Procedia Comput Sci 2016; 95: 264–271.
BIOGRAPHIES OF AUTHORS
Mujiono Sadikin is a faculty member of the Faculty of Computer Science, Universitas Mercu Buana, Jakarta. He received his doctoral degree from Universitas Indonesia, Jakarta, in 2017. His research areas are Data Mining, Machine Learning, and IT Governance. His experience includes serving as team leader in IT Governance and Procedure preparation for the Directorate of Land Transportation, Ministry of Transportation, and as team leader of IT Audit and Assessment at Universitas Mercu Buana, among others. Since 2012 he has led the Universitas Mercu Buana IT Directorate as its Director.
Fahri Alfiandi is a student in the Faculty of Computer Science, Universitas Mercu Buana, Indonesia. He was born in Jakarta on December 16th, 1995. He is interested in data mining, algorithm analysis, and programming.