International Journal of Electrical and Computer Engineering (IJECE)
Vol. 10, No. 4, August 2020, pp. 4331~4339
ISSN: 2088-8708, DOI: 10.11591/ijece.v10i4.pp4331-4339
Journal homepage: http://ijece.iaescore.com/index.php/IJECE
Feature selection for multiple water quality status: integrated bootstrapping and SMOTE approach in imbalance classes
Shofwatul Uyun1, Eka Sulistyowati2
1Department of Informatics, Universitas Islam Negeri Sunan Kalijaga, Indonesia
2Department of Biology, Universitas Islam Negeri Sunan Kalijaga, Indonesia
Article Info

Article history:
Received Nov 24, 2019
Revised Feb 17, 2020
Accepted Feb 25, 2020

ABSTRACT
STORET is one method to determine river water quality and to classify it into four classes (very good, good, medium and bad) based on the water data for each attribute or feature. The success of building a pattern recognition model depends largely on the quality of the data. Two issues are the concern of this research: data with a disproportionate amount among the classes (imbalance class), and the presence of noise in the attributes. Therefore, this research integrates the SMOTE technique and bootstrapping to handle the imbalance class problem, while an experiment is conducted to eliminate the noise in the attributes by using several feature selection algorithms with a filter approach (information gain, rule, derivation, correlation and chi square). This research has the following stages: data understanding, pre-processing, imbalance class, feature selection, classification and performance evaluation. Based on the result of testing using 10-fold cross validation, the use of the SMOTE-bootstrapping technique is able to increase the accuracy from 83.3% to 98.8%, while the process of noise elimination on the data attributes is able to further increase the accuracy to 99.5% (using the feature subset produced by the information gain algorithm and the decision tree classification algorithm).
Keywords:
Bootstrapping
Feature selection
Imbalance class
SMOTE
Water quality status
Copyright © 2020 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Shofwatul Uyun,
Department of Informatics, Faculty of Science and Technology,
Universitas Islam Negeri Sunan Kalijaga,
Marsda Adisucipto Street, No. 1 Depok Sleman Yogyakarta, 55281, Indonesia.
Email: Shofwatul.uyun@uin-suka.ac.id
1. INTRODUCTION
STORET is a method used by the Minister of Environment to determine the water quality status in a river/water body [1]. The STORET method works by comparing the data resulting from water sampling with the water quality standard in accordance with the classes and based on the attributes used. The more parameters used, the more cost may be incurred for laboratory handling and measurements, because observation and analysis are conducted in the laboratory for each water data sample at each sampling point. The amount of data analyzed requires automation in determining the water quality status, which in turn requires a model improvement in the pattern recognition field that can be used to classify the water quality status.
Generally there are some methods that can be used to measure the water quality status, as follows: (a) a water quality index, as conducted by [2], who suggests the West Java Water Quality Index (WJWQI) to measure the water quality in West Java province, and [3, 4]; (b) community-based assessment, suggested by [5]; (c) the water pollution index [6]; and (d) the STORET index [7].
Water has a lot of parameters that can be measured to determine its quality status. Based on the value of some selected attributes, the quality status can be classified. In pattern recognition, one of the important components determining the success grade of the classification process is the suitable feature
use [8]. There are two processes related to features, those are the feature extraction process [8] and the feature selection process [9]. There are some reasons why the feature selection process becomes very important in pattern recognition, as follows: to improve the performance of a model of the pattern recognition system (a simple model that performs quickly by eliminating the irrelevant data) [10], to visualize the data in the model selection process, and to decrease the dimension of and noise in the data [11]. There are two important issues required to be addressed after the feature extraction process, those are data whose amount is imbalanced among its classes, and noise in the data attributes.
Two approaches are commonly used to handle the imbalance class case, those are oversampling and under-sampling. One technique that can be used to handle both cases is called the SMOTE technique [12]. In the oversampling case, duplication of data is conducted on the minority class. On the other hand, in the under-sampling case, some data samples are eliminated from the majority class; both can also be combined, which is usually called the hybrid technique [10]. The use of these techniques has the same aim, which is to obtain a dataset for the learning process having the same, or almost the same, amount of data among the classes (balance). The SMOTE technique has been used to solve the imbalance class case in several studies, among others on data for attack detection [13], medical data [14-16] and e-commerce data [17].
Besides SMOTE, there is another technique called bootstrapping that can be used for resampling data. A resampling technique can be used to handle the problem of the amount of data in the smaller class by changing the distribution of the minority class underrepresented during the data training process in the machine learning algorithm. The resampling technique is also known as a solution for the imbalance class case in a learning dataset. This method is suitable for data at great scale, where it is conducted to decrease the amount of the training data sample, so that training can use a smaller amount of data that still represents the actual data.
The existence of noise in the attribute data will certainly impact the classification performance. If the data used has a very great amount of attributes/parameters or features, it will certainly impact the computing process [11]. Therefore, the feature selection process is required. Generally there are three approaches that can be conducted to select the attribute or feature, including the filter approach [18], the wrapper approach [19] and the embedded approach [19]. In the filter approach, the feature selection process and the learning process are conducted in series. It is different from the wrapper approach, which is conducted in parallel. In the filter approach, the process of selecting the feature subset is conducted beforehand, based on the weight of each attribute or feature. The weighing is conducted for each attribute or feature to rank the attributes based on a threshold value that has been determined [18].
The classification stage is conducted after obtaining the selected features. There are several algorithms for the learning process which aim at classification, as follows: decision tree (DT) [18], naive bayes [17], K-nearest neighbors (KNN) [20], random forest [21], artificial neural network [22] and support vector machine [23].
Naïve Bayes is a simple classification model and its learning process does not require a long time compared with other classification models. Besides, it is also recognized as having good prediction accuracy performance. The use of the naïve bayes algorithm is easy and comfortable because it does not need to conduct complicated parameter estimation, and it is reliable for use on great data [24].
DT is one of the classification algorithms much implemented in several machine learning cases. The aim of DT is to make a model that can be used to predict the value of a target class on an unseen test instance based on several input features [17, 25]. Some advantages of DT rely on its simplicity: it is easy to understand, easy to implement, requires little knowledge, is able to be used on either numeric or categorical datasets, and is able to handle datasets of great amount [26, 27].
Based on the research conducted previously, there is no model integrating the use of the bootstrapping resampling technique and the SMOTE technique to handle the imbalance class case in a multiclass case. Besides, the feature selection process by filter approach is conducted to handle the noise on data attributes. There are five algorithms (information gain, rule, chi square, correlation and derivation) used based on the weight value produced, and afterwards the performance will be compared among each other. While for classification there are four algorithms (decision tree, K-nearest neighbours, naïve bayes and random forest).
2. RESEARCH METHOD
This research uses primary data for one year in the Brantas River, from the November 2017 to October 2018 period. There are 10 sampling locations whose data is analyzed in the Laboratory of Environment Malang, as follows: Pendem Bridge, Bumiayu Bridge, Sengguruh Reservoir, Lodoyo Reservoir, Mrican Dam, Ploso Bridge, Lengkong Baru Dam, Porong Bridge, Gunungsari and Ngujang Bridge.
There are twenty two parameters being measured, such as temperature, acidity (pH), electrical conductivity (DHL), dissolved oxygen (DO), biochemical oxygen demand (BOD), chemical oxygen demand (COD), total suspended solid (TSS), total dissolved solids (TDS), Nitrate Nitrogen (NO3N), Nitrite (NO2N), PO4P, H2S, Phenol, detergent,
free chlorine, oil and fat, Cd, Zn, Cu, Pb, total coliform and faecal coliform. The total data used is 120 data samples with 22 parameters. This research has six stages, as follows: data understanding, pre-processing, imbalance class, feature selection, classification and performance evaluation, as shown in Figure 1.
The collected dataset has a dimension of 120 rows and 22 columns; a row shows the data taken for each location of taking the river water sample, while a column shows the attribute/feature/parameter of water used to determine the status of river water quality. The STORET method is used to determine the status of river water quality based on [1]. This method compares the data from field measurement with the water quality standard in accordance with the water class. Hence, the Brantas River is included in the class 2 category. Before conducting the process of determining the status of water quality, the following should be conducted beforehand:
Figure 1. Research stage
2.1. Data understanding
2.1.1. Manual feature sorting
Based on the data collected, there are some data that cannot be filled completely for all features. It is due to several causes, one of which is a feature/attribute not being detected by the measuring tool because its value is under or over the threshold of the measuring tool. Based on the 22 features, 13 features are selected, while 8 others are not used to determine the status of river water quality.
2.1.2. Determination of status of water quality of Brantas River using STORET method
The determination of the status class of the river water quality is conducted based on the thirteen selected features. In this case, there are four classes of river water quality, as follows: A (excellent), B (good), C (intermediate) and D (bad). The STORET method works by comparing the data resulting from taking the water sample with the water quality standard in accordance with its class and based on the parameters used. In this case, the Brantas River is included in the second class category for its quality standard. Based on the classification result of the status of river water quality, an unbalanced class case is found, with the details of classes as follows: A=10, B=16, C=80 and D=14.
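To make the imbalance concrete, the reported class counts can be tallied directly. This snippet is illustrative only (the paper provides no code), and all names in it are chosen here:

```python
from collections import Counter

# Class counts reported for the Brantas River dataset: A=10, B=16, C=80, D=14.
labels = ["A"] * 10 + ["B"] * 16 + ["C"] * 80 + ["D"] * 14

counts = Counter(labels)
total = len(labels)                              # 120 samples in this study
share = {cls: n / total for cls, n in counts.items()}

# Class C alone covers two thirds of the data while A holds under 10%:
# this is the imbalance that SMOTE and bootstrapping later correct.
imbalance_ratio = counts["C"] / min(counts.values())
```

The majority-to-minority ratio of 8:1 (C vs. A) is what motivates the oversampling scenarios in Section 2.3.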
2.2. Preprocessing
2.2.1. Missing data elimination
Before conducting the process of selecting the best feature, the process of zero/empty data elimination should be conducted in order not to disturb the performance of the algorithm that will be applied in the next process. There are several ways to fill the empty data: it can be filled by the average/minimal/maximal value of the data on the feature, or it can be filled with a zero value. This research chooses the data average value of the feature.
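The mean-replacement step above can be sketched as follows. This is a minimal sketch assuming missing entries are encoded as None (the encoding is not specified in the paper):

```python
def impute_mean(column):
    """Fill missing entries (None) with the mean of the observed values,
    as chosen in this research over min/max/zero filling."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

# Example: a DO-like column with one missing measurement (values illustrative).
do = [3.8, None, 4.7, 4.2]
filled = impute_mean(do)
```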
2.2.2. Data transformation
The data that will be processed needs to be statistically normal to keep staying in one range of the same value. There are several formulations or ways to normalize the data. This research uses proportion transformation. Normalization aims at getting the value of each attribute proportionally.
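The exact proportion-transformation formula is not given in the paper; one plausible reading, sketched here under that assumption, rescales each attribute value to its share of the column total:

```python
def proportion_transform(column):
    """Scale a feature column so each value becomes its share of the
    column total (one plausible form of proportion transformation;
    the paper does not spell out the formula)."""
    total = sum(column)
    return [v / total for v in column]

# DHL values from Table 1, used here only as sample input.
dhl = [491, 498, 472, 472, 458, 514]
scaled = proportion_transform(dhl)
```

After the transform every attribute lies in the same [0, 1] range, which is the stated goal of this stage.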
2.3. Imbalance class
2.3.1. SMOTE (synthetic minority over-sampling technique)
From the result of determining the status of the Brantas River water quality using the STORET method, a case of unbalanced class is found, so the SMOTE technique needs to be conducted [12]. There are two approaches that can be conducted to take SMOTE: random over-sampling (ROS) and random under-sampling (RUS). Considering that the data used for searching for the best model to determine the status of river water quality is not too big, the ROS approach is selected. The river water data included in categories A, B and D is very minimal, so synthetic data taken randomly from the same feature needs to be added to get the same data amount between the minority and the majority classes.
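The ROS step described above (duplicating randomly chosen minority samples until every class reaches the majority size) can be sketched as follows; the function name, the seed and the toy data are illustrative, not the authors' code:

```python
import random

def random_over_sample(by_class, seed=0):
    """Balance classes by duplicating randomly chosen minority samples
    until every class reaches the majority class size (ROS)."""
    rng = random.Random(seed)
    target = max(len(rows) for rows in by_class.values())
    return {
        cls: rows + [rng.choice(rows) for _ in range(target - len(rows))]
        for cls, rows in by_class.items()
    }

# Toy example mirroring the A/B/C/D imbalance with much smaller counts.
data = {"A": [1], "B": [2, 3], "C": [4, 5, 6, 7], "D": [8]}
balanced = random_over_sample(data)
```

Note that plain ROS duplicates existing rows; SMOTE proper interpolates synthetic rows between neighbours, which this sketch does not attempt.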
2.3.2. Bootstrapping sampling method
After the dataset is obtained from SMOTE, exactly using the random over-sampling technique, afterwards the data samples in the training data need to be selected randomly so that the data used has a smaller measure.
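A minimal sketch of bootstrap sampling, with replacement, with an optional smaller sample size as the paragraph describes (the function name and sizes are illustrative):

```python
import random

def bootstrap_sample(rows, n=None, seed=0):
    """Draw a bootstrap sample: n draws *with replacement* from rows.
    Using n < len(rows) shrinks the training set, as described above."""
    rng = random.Random(seed)
    n = len(rows) if n is None else n
    return [rng.choice(rows) for _ in range(n)]

training = list(range(100))
subset = bootstrap_sample(training, n=60)  # smaller training measure
```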
2.4. Feature selection
The aim of the feature selection process is to eliminate features not having a strong contribution in determining the status of water quality. This certainly impacts the measure of the data dimension, for both training data and testing data. Generally there are four approaches to select the feature subset, among others: filters, which is the process of feature evaluation conducted independently from the learning process; wrappers, which is the process of feature subset selection based on the evaluation result of the learning process; embedded, which is feature selection conducted during the learning process; and simple filters, which assume independent features (this approach is usually used on data with many features, such as in the case of textual classification). In this research, the process of feature selection uses the filters approach, which separates the process of evaluating the best feature subset from the learning process. The determination of the best subset is based on the score or weight produced by each feature subset.
The stage of the filter approach is shown by Figure 2. This research uses five algorithms included in the filters approach category to get the weight value, as follows: derivation, information gain, chi square, rule and correlation.
Figure 2. The stage of the process of feature subset finding using the filters approach
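Among the five weighting algorithms, information gain is the one that ultimately performs best in Section 3. For a discrete-valued feature it can be sketched as below; this is an illustrative implementation, not the tooling the authors used:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(class) of a list of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """IG(F) = H(class) - H(class | F) for a discrete feature;
    a higher weight ranks the feature higher in the filter approach."""
    n = len(labels)
    remainder = 0.0
    for v in set(feature):
        subset = [l for f, l in zip(feature, labels) if f == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

labels  = ["A", "A", "B", "B"]
perfect = [0, 0, 1, 1]   # separates the classes completely -> maximal IG
useless = [0, 0, 0, 0]   # carries no class information      -> zero IG
```

Continuous water parameters would need discretization (or a differential-entropy variant) before this weighting applies; the paper does not state which was used.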
2.5. Classification
The classification stage has the role of finding out how far this classification model is able to determine the status of river water quality in the right way based on the river data for each attribute. There are four algorithms used in this research, as follows: decision tree (DT) [18], naive bayes [17], K-nearest neighbors (KNN) [20] and random forest [21].
2.6. Performance evaluation
The testing process of this research uses k-fold cross validation, distributing the dataset into two parts, those are data for training and data for testing, alternately for ten times. Several parameters are used to compare the performance between one model and other models, those are accuracy, precision, and recall [28].
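The 10-fold splitting scheme can be sketched as follows; the round-robin fold assignment is an illustrative choice (the paper does not specify how folds are formed, and Section 3.2 mentions stratified sampling for one experiment):

```python
def k_fold_splits(n, k=10):
    """Yield (train, test) index lists for k-fold cross validation:
    every sample appears in exactly one test fold, and in the training
    set of all other folds."""
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin folds
    for i, test in enumerate(folds):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

# 120 samples as in this study, split 10 ways.
splits = list(k_fold_splits(120, k=10))
```

Each of the ten rounds trains on 108 samples and tests on the remaining 12, and the reported accuracy is the average over the rounds.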
3. RESULTS OF RESEARCH
3.1. Data understanding
Based on the result of feature selection in a manual way and the classification of the status of the river water quality, the STORET method is used based on the selected features. There are four classes, those are A, B, C and D. An example of the data used in this research is shown in Table 1. There are thirteen selected features (temperature, pH, DHL, DO, BOD, COD, TSS, NO3N, NO2N, PO4P, detergent, total coliform and faecal coliform) based on the unit of the quality standard. For example, the data in the fourth column is the data sample of river water at one point of observation, with values for eleven features, in the category of quality status A. The values of the total coliform and faecal coliform features are not detected.
Table 1. Data understanding: Data sample with classification result of quality status with thirteen features

No  Description/Parameter  Unit of Quality Standard  1      2      3      4      5      6
    Status Mutu Air (Water Quality Status)           A      C      C      B      C      D
1   Temperature            mg/l                      32.1   29.9   30.2   31     28.1   28.5
2   pH                     C                         7.29   6.05   7.16   7.18   6.69   7.14
3   DHL                    mhos/cm                   491    498    472    472    458    514
4   DO                                               3.8    4.7    4.7    4.2    5      4.8
5   BOD                    mg/l                      2.6    2.5    3.77   3.69   6.77   4.93
6   COD                    mg/l                      11.8   19.11  18     23.02  24.58  29.38
7   TSS                    Jml/100 ml                40     40     81.4   64     212    206
8   NO3N                                             2.08   2.263  2.799  2.95   2.4    2.809
9   NO2N                                             0.187  0.059  0.151  0.123  0.133  0.108
10  Po4P                                             0.082  0.117  0.087  0.068  0.063  0.087
11  Detergent                                        0.056  0.012  0.006  0.051  0.098  0.02
12  Total Coliform                                   430    230    -      -      -      -
13  Faecal Coliform                                  230    90     -      -      -      -
("-": not detected)
3.2. Pre-processing
In this stage, several steps are conducted to prepare a dataset that is free from empty data and to normalize the data, to get a good result from the classification process. In this case, some experiments on the pre-processing method are conducted with the decision tree classification method using five-fold cross validation with stratified sampling. The testing experiment results are compared with the t-Test to get the best pre-processing method, which is shown in Table 2. Column B shows the accuracy result of using data that has not previously undergone normalisation, of 79.2%. Column C shows the accuracy result of using data that has previously undergone the replace process for missing values, of 82.5%. Column D shows the accuracy result of using data that has previously undergone normalization, of 79.2%, and column E shows the accuracy result of using data that has previously undergone both the normalization process and the replace process for missing values, of 83.3%. Based on the difference of the testing results under the t-Test, it shows that the pre-processing process (conducting the normalization of data and the replacement of missing values) is able to increase the performance of the classification result.
Table 2. The testing result compared with the t-Test to get the best pre-processing method

A                B                C                D                E
                 0.792 +/- 0.029  0.825 +/- 0.073  0.792 +/- 0.059  0.833 +/- 0.088
0.792 +/- 0.029                   0.351            1.000            0.328
0.825 +/- 0.073                                    0.276            0.820
0.792 +/- 0.059                                                     0.229
0.833 +/- 0.088
3.3. Imbalance class
The dataset obtained in the data understanding stage has imbalanced data in each class. This condition will later affect the data training process. Therefore, three scenarios are conducted at this stage, as follows: SMOTE, bootstrapping and the integration between SMOTE and bootstrapping, in which the training and testing processes are conducted with the decision tree method using 10-fold cross validation. The different testing experiment results under the t-Test for handling the imbalance class case are shown in Table 3. Column B shows the use of the SMOTE method, column C shows the use of the bootstrapping method, and column D is the integration between the SMOTE method and bootstrapping. Based on Table 3, it shows that the use of the SMOTE method is able to increase the training accuracy result to 96.5%, and the training accuracy result keeps increasing, using the integration method between SMOTE and bootstrapping, to 98.8%.
Table 3. Different testing result with the t-Test to get the method to handle the imbalance class case

A                B                C                D
                 0.965 +/- 0.025  0.858 +/- 0.040  0.988 +/- 0.032
0.965 +/- 0.025                   0.000            0.089
0.858 +/- 0.040                                    0.000
0.988 +/- 0.032
3.4. Feature selection
The target of this stage is to get the best features for determining the status of river water quality. There are five algorithms used in conducting the feature selection with the filter approach, as follows: information gain, chi square, derivation, correlation and rule. Coding is conducted beforehand for each feature: F1=Temperature; F2=pH; F3=DHL; F4=DO; F5=BOD; F6=COD; F7=TSS; F8=NO3N; F9=NO2N; F10=Po4P; F11=Detergent; F12=Total Coliform and F13=Faecal Coliform. The result of the selected attributes and features for each feature selection algorithm is shown in Table 4.
Table 4. Selected feature set based on several feature selection algorithms

No  Feature Selection Algorithms  Feature Subset
1   Rule                          {F5, F7, F8, F9, F3, F6, F4, F12}
2   Chi Square                    {F2, F6, F4, F5, F8, F13, F1, F12}
3   Information Gain              {F5, F13, F12, F6, F3, F11, F7, F8, F9, F1}
4   Correlation                   {F2, F13, F12, F6, F11, F5, F3, F8}
5   Derivation                    {F10, F11, F7, F9}
Based on the data from Table 4, it can be seen that the finding of the feature subset with the best score uses five feature selection algorithms with the filter approach. For example, in number 1, the second row, there are eight feature subsets produced by the rule algorithm, those are: BOD, TSS, NO3N, NO2N, DHL, COD, DO and total coliform. Afterwards, the feature selection results using the chi square algorithm (pH, COD, DO, BOD, NO3N, faecal coliform, temperature, total coliform), information gain (BOD, faecal coliform, total coliform, COD, DHL, detergent, TSS, NO3N, NO2N, temperature), correlation (pH, faecal coliform, total coliform, COD, detergent, BOD, DHL, NO3N) and derivation (Po4P, detergent, TSS, NO2N) are shown in Table 4 in the next rows with several selected feature subsets.
Afterwards a learning process is conducted from those several feature subsets using the decision tree method to find out the performance, and the result is shown in Table 5. Column B to column F show the classification results using the selected feature subsets produced by the several feature selection algorithms (chi square, derivation, information gain, correlation and rule). The t-Test result shows that the use of the selected feature subset produced by the information gain algorithm has the highest accuracy value, of 99.5%.
3.5. Classification
After the best feature subset has been obtained, as produced by the several algorithms, classification is conducted using four classification algorithms, then the average value is calculated from the use of the selected feature subset. The classification algorithms used are: decision tree, k-NN, naïve bayes and random forest. Based on the data shown in Table 6 and Figure 3, it can be found that for the classification result using the eight feature subsets produced by the chi square algorithm, the highest accuracy value is produced by the decision tree algorithm, of 98.50%, with an accuracy average for the four classification algorithms of 96.29%. For the classification result using the four feature subsets produced by the derivation algorithm, the highest accuracy value is produced by the random forest algorithm, of 98.49%, with an accuracy average for the four classification algorithms of 91.86%. The use of the feature subset produced by the information gain algorithm, amounting to ten feature subsets, is able to produce the highest accuracy value with the decision tree and random forest classification algorithms, of 99.50%, with an average of 96.92%. A different result is also shown by the rest of the two algorithms, those are correlation and rule.
Both produce the best accuracy value with the same classification algorithm, that is random forest, of 97.99% and 99.50%. Generally, it can be concluded that the feature subsets produced by the information gain and random algorithms are able to produce an accuracy level of more than 96.5%.
Table 5. Result of t-Test towards classification result using decision tree algorithm and selected feature subset

A                B                C                D                E                F
                 0.985 +/- 0.017  0.950 +/- 0.047  0.955 +/- 0.016  0.977 +/- 0.032  0.988 +/- 0.032
0.985 +/- 0.017                   0.041            0.196            0.520            0.830
0.950 +/- 0.047                                    0.010            0.146            0.051
0.995 +/- 0.016                                                     0.138            0.512
0.977 +/- 0.032                                                                      0.488
0.988 +/- 0.032
Table 6. Classification result using feature subsets produced by the feature selection process

Feature Selection  Classification Algorithm  Accuracy  Recall   Precision
Chi Square         Decision Tree             98.50%    98.47%   98.70%
                   k-NN                      96.73%    96.74%   97.15%
                   Naïve Bayes               92.22%    92.37%   92.75%
                   Random Forest             97.74%    97.72%   97.97%
Derivation         Decision Tree             94.99%    95.25%   95.53%
                   k-NN                      95.48%    95.49%   95.99%
                   Naïve Bayes               78.38%    79.18%   81.28%
                   Random Forest             98.49%    98.55%   98.65%
Information Gain   Decision Tree             99.50%    99.50%   99.50%
                   k-NN                      97.22%    97.22%   97.75%
                   Naïve Bayes               91.46%    91.57%   92.16%
                   Random Forest             99.50%    99.50%   99.50%
Correlation        Decision Tree             97.74%    97.77%   98.10%
                   k-NN                      97.24%    97.22%   97.73%
                   Naïve Bayes               91.97%    92.15%   92.92%
                   Random Forest             97.99%    98.02%   98.37%
Rule               Decision Tree             98.75%    98.84%   99.06%
                   k-NN                      98.24%    98.32%   98.29%
                   Naïve Bayes               94.98%    95.23%   95.48%
                   Random Forest             99.50%    99.52%   99.55%
Figure 3. Performance comparison of the use of the selected features based on the average results (accuracy, recall and precision) using four classification algorithms
3.6. Performance evaluation
Generally, the pattern recognition model for classifying the status of river water quality based on several water feature subsets has the following sub-stages: without pre-processing, pre-processing, the SMOTE technique and bootstrapping to handle the class imbalance, and feature selection. In this case, a comparison for each sub-process is conducted using the decision tree algorithm in the classification process. Based on the testing results using 10-fold cross-validation, the average accuracy values are obtained as seen in Figure 4.
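The imbalance-handling sub-stage combines two ideas: SMOTE synthesizes new minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours, while bootstrapping resamples with replacement until a class reaches the desired size. A standard-library sketch of both (feature values, class sizes, and function names are hypothetical; a real experiment would use a library implementation such as imbalanced-learn's SMOTE):

```python
import random
from math import dist

def smote_like(minority, n_new, k=2, rng=random.Random(0)):
    """Create n_new synthetic points, each interpolated between a
    minority sample and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic

def bootstrap_balance(samples, target_size, rng=random.Random(0)):
    """Resample with replacement until the class reaches target_size."""
    return samples + [rng.choice(samples) for _ in range(target_size - len(samples))]

# Hypothetical minority-class water samples, e.g. (pH, TDS).
minority = [(6.8, 120.0), (7.1, 130.0), (6.9, 150.0)]
augmented = minority + smote_like(minority, n_new=5)
balanced = bootstrap_balance(augmented, target_size=12)
```

Because each synthetic point lies on a segment between two real minority samples, the oversampled class stays inside the region the original samples span instead of merely duplicating them.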
Figure 4. Performance comparison for each step using the decision tree algorithm in its classification stage
4. CONCLUSION
The imbalance in the amount of data in each class is proven to affect the learning process of the pattern recognition system. The SMOTE technique and bootstrapping are proven able to handle the imbalanced-class case, with a significant increase in accuracy from 83.3% to 98.8%. Meanwhile, to reduce the noise in the attributes, experiments were conducted using five feature selection algorithms (chi square, correlation, derivation, information gain and rule). On average, the feature subsets produced by the rule and information gain algorithms yield the best accuracy values, 97.87% and 96.92%, respectively. Using the features selected by information gain with the decision tree classification algorithm raises the accuracy level to 99.5%.
BIOGRAPHIES OF AUTHORS
Dr. Shofwatul 'Uyun, S.T., M.Kom is a full-time lecturer at the Department of Informatics and Head of Information Technology and Database, Universitas Islam Negeri (UIN) Sunan Kalijaga in Yogyakarta, Indonesia. She obtained her Bachelor's degree in Informatics from the Islamic University of Indonesia. She received her M.Kom. and her doctorate in Computer Science from Gadjah Mada University. Her research interests are pattern recognition, artificial intelligence and medical image processing.
Eka Sulistiyowati, MA, MIWM is a full-time teacher at the Biology Education Study Programme at State Islamic University (UIN) Sunan Kalijaga Yogyakarta. An environmental manager by training, she obtained her degree in Integrated Water Management from The University of Queensland, Australia. Her research interests range from environmental management, resource management and water to biodiversity conservation.