International Journal of Electrical and Computer Engineering (IJECE)
Vol. 10, No. 3, June 2020, pp. 3227~3234
ISSN: 2088-8708, DOI: 10.11591/ijece.v10i3.pp3227-3234
Journal homepage: http://ijece.iaescore.com/index.php/IJECE
The pertinent single-attribute-based classifier for small datasets classification

Mona Jamjoom
Department of Computer Sciences, Princess Nourah Bint Abdulrahman University, Kingdom of Saudi Arabia

Article Info

Article history:
Received Jul 27, 2019
Revised Dec 5, 2019
Accepted Dec 11, 2019

ABSTRACT
Classifying a dataset using machine learning algorithms can be a big challenge when the target is a small dataset. The OneR classifier can be used in such cases due to its simplicity and efficiency. In this paper, we reveal the power of a single attribute by introducing the pertinent single-attribute-based-heterogeneity-ratio classifier (SAB-HR), which uses a pertinent attribute to classify small datasets. SAB-HR applies a feature selection method that uses the Heterogeneity-Ratio (H-Ratio) measure to identify the most homogeneous attribute among the other attributes in the set. Our empirical results on 12 benchmark datasets from the UCI machine learning repository showed that the SAB-HR classifier significantly outperformed the classical OneR classifier on small datasets. In addition, using the H-Ratio as a feature selection criterion for selecting the single attribute was more effectual than other traditional criteria, such as Information Gain (IG) and Gain Ratio (GR).
Keywords:
Classification
Feature selection
OneR classifier
Single-attribute-based classifier
Small dataset
Copyright © 2020 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Mona Jamjoom,
Department of Computer Sciences,
Princess Nourah Bint Abdulrahman University,
Airport Road, Riyadh 11671, Kingdom of Saudi Arabia.
Email: mmjamjoom@pnu.edu.sa
1. INTRODUCTION
Classification is one of the main tasks of data mining and machine learning [1] and is widely used to predict different real-life situations. High accuracy is a key indicator of a successful prediction model. Building an accurate classifier is one of the important goals, and rich datasets make this task easier and more effective [2]. Classifying small datasets efficiently is essential, as some real situations cannot provide a sufficient number of cases. A limited training set is challenging to learn from and, as a result, to base a decision on. In many multivariable classification or regression problems, such as estimation or forecasting, we have a training set Tp = {(xi, ti)} of p pairs of input/output vectors x ∈ ℜn and scalar targets t. Thus, according to Vapnik's definition, a small dataset for Tp is determined as follows: "For estimating functions with VC dimension h, we consider the size p of data to be small if the ratio p/h is small (say p/h < 20)" [3].
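To make the definition concrete, the check below uses the number of attributes as a stand-in for the VC dimension h, which is how Table 1 later sizes the datasets; the function itself is our illustration, not part of the original method.

def is_small(num_instances, h):
    """Vapnik's rule of thumb: data of size p is small if p / h < 20.
    Here h is approximated by the number of attributes, as in Table 1."""
    return num_instances / h < 20

print(is_small(339, 17))  # primary-tumor: 339/17 = 19.94 -> True
print(is_small(540, 9))   # a hypothetical larger set: 60.0 -> False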
The problem with a small dataset is that, if not elaborately collected, it is not a representative sample. Non-representative instances hinder the process of providing enough information to the learner model because of the gaps existing between instances; thus, the model does not generalize well. Many works have been proposed in the literature to solve the problem of small data size using different methods. One of the common methods is to increase the size of the data by adding artificial instances [4], but this approach lacks data credibility and reflection of real-life use. Some researchers have used feature-selection methods [5-8], whereas a novel technique using multiple runs for model development was proposed by [9] and others. A simple solution is one of the requirements when a problem is becoming increasingly complex. This philosophy is stated by Occam's razor [1]. Literature in the field of classification has shown some successful attempts in which very simple rules achieve high accuracy on many datasets [10]. OneR is one of
the simple and widely used algorithms in machine learning for building a simple classifier. A trade-off between simplicity and high performance [10] makes OneR's performance slightly less accurate than state-of-the-art classification algorithms [11, 12], although it sometimes outperforms them [13, 14]. Its main advantage is that it balances the best accuracy possible with a model that is still simple enough for humans to understand [12]. OneR is a single-attribute-based classifier that involves only one attribute at classification time. The single-attribute concept is powerful if the attribute can directly influence the classification accuracy of the dataset in a positive manner. Yet not all attributes contribute positively to the classification process, which can make a well-chosen single attribute even more powerful. The single-attribute rule can be more effective than complex methods when it is difficult to learn from the dataset due to it being simple, small, noisy, or complex. A study by [15] used the single-attribute concept by creating multiple one-dimensional classifiers from the original dataset in the training phase and combining the results in the prediction phase. That method is unlike OneR because it considers all attributes' contributions at prediction time.
Feature selection is a data-mining pre-processing step widely used to improve classification and reduce processing time. It is effective in reducing the dataset's dimensionality by eliminating non-contributing attributes. It uses different techniques to come up with a single attribute or a subset of attributes [16, 17]. Moreover, it has proven its effectiveness in improving the predictive accuracy of various applications [18-20].
In this paper, we tackle the problem of classifying small datasets by expanding the power of a pertinent single attribute using the SAB-HR classifier. SAB-HR is similar to the OneR classifier in using a single attribute at the classification phase, but differs in that, instead of generating a rule for each attribute, a feature selection method is employed to select the attribute that is least heterogenic among the other attributes. We calculated the H-Ratio [21] for each attribute (att), then identified the attribute with the lowest H-Ratio value (attH-Ratio). We used the pair (attH-Ratio, c), where c is the class value, to learn from and classify the small dataset. The results were encouraging and showed a significant improvement compared to the classical OneR classifier. In addition, we created multiple classifiers in the same manner as SAB-HR, using different criteria to select the pertinent single attribute. We used IG and GR in the feature-selection process and created the SAB-IG and SAB-GR classifiers, correspondingly. We individually compared the new classifier SAB-HR with the others (i.e., SAB-IG and SAB-GR).
The remainder of this paper is organized as follows: Section 2 reviews the background of our work. In Section 3, we propose the research method, the SAB-HR classifier. The experiments and a brief discussion of the findings are in subsections 3.1 and 3.2, respectively. Finally, Section 4 concludes the paper.
2. BACKGROUND
In this section, we review some of the techniques that are used in this study.
2.1. OneR classifier
OneR, short for "One Rule", was introduced by Rob Holte [22, 10]. It is one of the most primitive techniques, based on a 1-level decision tree that creates one rule for each attribute in the dataset, then selects the rule with the minimum classification error as its "one rule". To create a rule for an attribute, it constructs a frequency table for each attribute against the class [22]; Figure 1 shows the pseudocode of the OneR algorithm. It has been shown that OneR works distinctively well in practice with real-world data and can compete with state-of-the-art classification algorithms in some situations [13, 14, 23]. OneR uses one attribute for classification, and many consider it a feature selection method whose feature subset contains a single attribute [24]. Compared with the baseline classifier ZeroR [14], OneR is one step beyond. Both OneR and ZeroR are useful for determining a minimum-standard classifier for other classification algorithms. OneR's accuracy is always higher than, or at least equal to, the baseline classifier's when evaluated on the training data.
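As a hedged sketch of the ZeroR baseline (our illustration, not WEKA's implementation): ZeroR predicts the majority class for every instance and ignores all attributes, so OneR, whose per-value rules partition the data more finely, can never score lower on the training data.

from collections import Counter

def zero_r(labels):
    """ZeroR baseline: always predict the most frequent class,
    ignoring every attribute."""
    majority, count = Counter(labels).most_common(1)[0]
    return majority, count / len(labels)  # (prediction, training accuracy)

prediction, acc = zero_r(["no", "no", "yes", "yes", "no"])
print(prediction, acc)  # no 0.6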
The authors in [25] proposed attempts to enhance the performance of OneR by addressing two issues: the quantization of continuous-valued attributes, and the treatment of missing values.
Figure 1. The pseudocode of OneR algorithm [15]

For each attribute (att),
    For each value of that att, make a rule as follows;
        Count how often each value of class appears
        Find the most frequent class
        Make the rule assign that class to this value of the att
    Calculate the total error of the rules of each att
Choose the att with the smallest total error.
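The pseudocode translates almost line for line into Python. The sketch below is our own minimal rendering of Figure 1 (not Holte's or WEKA's code) on a made-up toy dataset; ties between attributes keep the first one seen.

from collections import Counter, defaultdict

def one_r(rows, attributes, class_key):
    """OneR: build one frequency-based rule per attribute and keep
    the attribute whose rules make the fewest training errors."""
    best = None  # (total_errors, attribute, {value: predicted_class})
    for att in attributes:
        counts = defaultdict(Counter)          # value -> class counts
        for row in rows:
            counts[row[att]][row[class_key]] += 1
        # Each value predicts its most frequent class.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(n for v, c in counts.items()
                     for cls, n in c.items() if cls != rule[v])
        if best is None or errors < best[0]:
            best = (errors, att, rule)
    return best

rows = [
    {"outlook": "sunny",    "windy": "false", "play": "no"},
    {"outlook": "sunny",    "windy": "true",  "play": "no"},
    {"outlook": "overcast", "windy": "false", "play": "yes"},
    {"outlook": "rainy",    "windy": "false", "play": "yes"},
    {"outlook": "rainy",    "windy": "true",  "play": "no"},
]
print(one_r(rows, ["outlook", "windy"], "play"))
# -> (1, 'outlook', {'sunny': 'no', 'overcast': 'yes', 'rainy': 'yes'})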
2.2. Feature selection
Feature selection methods attempt to find the minimal subset of features that does not significantly decrease the classification accuracy. Feature selection methods can be categorized as wrapper methods or filter methods [17]. Surveys done by [17] and [16] showed plenty of such methods. A wrapper method is a model-based approach in which the quality of the selected features is measured by the classification accuracy of the classification algorithm being used. Some use a greedy search to select the subset [16]. Meanwhile, in a filter method, also called a model-free approach, the selection of features is done independently of the classification algorithm. It selects the subset's features based on general measurable characteristics of the feature, such as Information Gain, Gain Ratio, Pearson Correlation, Mutual Information (MI) [16], and Heterogeneity Ratio [21]. In this paper, we used feature selection that utilizes filter methods (i.e., attribute evaluation) and focused on some of the mentioned measures (i.e., IG, GR, and H-Ratio). A brief description of each follows.
- Information gain [21] measures the amount of information given by an attribute about the class. It is defined by formula (1):
$IG(att) = H(Y) - H_{att}(Y)$ (1)
where $H_{att}(Y)$ measures the entropy of the class Y given the attribute att, while $H(Y)$ calculates the entropy of class Y. In fact, entropy is the quantity of information contained or delivered by a source of information. It is also used in measuring relevancy and is defined by formula (2):

$H(Y) = -\sum_{i} P(y_i) \log_2 P(y_i)$ (2)
- Gain ratio [26] is a ratio of information gain to intrinsic information. It determines the relevancy of an attribute. GR is calculated using formula (3):

$GR(att) = \frac{IG(att)}{H(att)}$ (3)
where $H(att) = -\sum_{j} P(v_j) \log_2 P(v_j)$ and $P(v_j)$ represents the probability of having the value $v_j$ among the overall values of the attribute.
- Heterogeneity ratio is a new measure defined by [21] that measures the ratio of heterogeneity of a nominal attribute among the dataset attributes. In other words, it quantifies the homogeneity of a set of instances sharing the same value of attributes. The H-Ratio is defined by formula (4) of [21] as the sum of two ratios: the first ratio credits instances that are homogeneous with respect to attributes and class simultaneously, whereas the second appreciates the homogeneous instances that belong to the same class and share the same value of attributes. A plain-Python sketch of these measures follows.
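The sketch below is our illustration, not WEKA's code: entropy and info_gain implement formulas (1)-(2), gain_ratio implements formula (3), and heterogeneity is only an assumption-laden stand-in conveying the idea behind formula (4), whose exact definition is given in [21].

import math
from collections import Counter, defaultdict

def _groups(values, labels):
    """Group class labels by attribute value."""
    groups = defaultdict(list)
    for v, c in zip(values, labels):
        groups[v].append(c)
    return groups

def entropy(labels):
    """Formula (2): H = -sum p * log2(p) over label frequencies."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    """Formula (1): IG(att) = H(Y) - H_att(Y), where H_att(Y) is the
    expected class entropy after splitting on the attribute's values."""
    n = len(labels)
    h_att_y = sum((len(g) / n) * entropy(g)
                  for g in _groups(values, labels).values())
    return entropy(labels) - h_att_y

def gain_ratio(values, labels):
    """Formula (3): GR(att) = IG(att) / H(att); H(att) is the entropy
    of the attribute's own value distribution."""
    h_att = entropy(values)
    return info_gain(values, labels) / h_att if h_att else 0.0

def heterogeneity(values, labels):
    """Illustrative stand-in only, NOT formula (4) of [21]: the share
    of instances whose class disagrees with the majority class among
    instances sharing their attribute value; lower = more homogeneous."""
    disagree = sum(len(g) - Counter(g).most_common(1)[0][1]
                   for g in _groups(values, labels).values())
    return disagree / len(labels)

outlook = ["sunny", "sunny", "overcast", "rainy", "rainy"]
play    = ["no",    "no",    "yes",      "yes",   "no"]
print(info_gain(outlook, play), gain_ratio(outlook, play),
      heterogeneity(outlook, play))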
3. RESULTS AND ANALYSIS
In this section, we introduce a new single-attribute-based classifier, SAB-HR, to classify small datasets. The new algorithm uses a new criterion to select the powerful pertinent single attribute that will contribute to the classification. SAB-HR, unlike OneR, does not generate a rule for each attribute. Instead, it calculates the H-Ratio for each attribute in the dataset (attH-Ratio) to determine the attribute that is least heterogenic among the other attributes. The attribute with the lowest heterogeneity-ratio value is used, paired with the class c as (attH-Ratio, c), in the classification process, while the remaining attributes are eliminated. The power of the single attribute selected by SAB-HR lies in its homogeneity with the other attributes, through which it provides enough information for the classifier to predict correctly. attH-Ratio is a representative attribute that is sufficient for small datasets. The algorithmic description of SAB-HR is presented in Figure 2.
Figure 2. The pseudocode of SAB-HR algorithm

For each attribute (att),
    Calculate the attH-Ratio;
Choose the att with the smallest attH-Ratio value;
Remove all att in the dataset except the pairs (attH-Ratio, c);
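A minimal sketch of the reduction in Figure 2, assuming some scoring function h_ratio(values, labels) that implements formula (4) of [21]; during experimentation, the illustrative heterogeneity score from the earlier sketch could stand in for it.

def sab_hr_reduce(rows, attributes, class_key, h_ratio):
    """Keep only the attribute with the smallest H-Ratio, paired with
    the class; every other attribute is removed from the dataset."""
    labels = [row[class_key] for row in rows]
    scores = {att: h_ratio([row[att] for row in rows], labels)
              for att in attributes}
    best = min(scores, key=scores.get)  # attH-Ratio
    return [{best: row[best], class_key: row[class_key]} for row in rows]

The reduced (attH-Ratio, c) pairs can then be learned by any simple rule learner, such as the one_r sketch above restricted to a single attribute.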
3.1. Experiments
In the following experiments, we aim to evaluate the performance of the new SAB-HR classifier when dealing with small datasets. In addition, we want to compare the performance of SAB-HR with other single-attribute classifiers that use different criteria, such as IG and GR, when selecting the single attribute during the feature-selection process. We used the well-known open-source software WEKA [27]. The datasets were obtained from the UCI Repository for Machine Learning [28]. We selected 12 small datasets corresponding to Vapnik's definition [3]. Table 1 lists the main characteristics of the datasets collected and used, in terms of number of instances, number of attributes, and Vapnik's ratio for determining the dataset's size. The number beside the dataset name will be its reference in the figures. OneR was used as the base classifier; a 10-fold cross-validation and a paired t-test with a confidence level of 95% were used to determine whether the differences in classification accuracy were statistically significant, and significant differences are underlined in the tables.
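As a rough illustration of this protocol (the authors used WEKA; the per-fold accuracies below are invented numbers), fold-wise accuracies of two classifiers can be compared with a paired t-test at the 95% confidence level using scipy:

from scipy import stats

# Hypothetical per-fold accuracies for two classifiers on one dataset.
oner_folds  = [0.60, 0.55, 0.70, 0.65, 0.58, 0.62, 0.66, 0.59, 0.61, 0.64]
sabhr_folds = [0.72, 0.68, 0.75, 0.70, 0.69, 0.74, 0.71, 0.66, 0.73, 0.70]

# Paired t-test: folds are matched, so compare per-fold differences.
t_stat, p_value = stats.ttest_rel(sabhr_folds, oner_folds)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:  # 95% confidence level
    print("difference is statistically significant")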
We compared the different methods with respect to the average classification accuracy and the number of datasets for which each method achieved better results. Better results are shown in bold font in the tables. In the tables, we named each technique using the abbreviation SAB for single-attribute-based, suffixed with an abbreviation for the measure used for selecting the single attribute in the feature-selection process. The new classifiers, with respect to the different measures, are named as follows: SAB-HR, SAB-IG and SAB-GR. In our experiments, we applied the feature-selection process using the different measures (H-Ratio, IG, and GR) to select the pertinent single attribute; then we eliminated the remaining (i.e., unselected) attributes and classified with a pair of attributes (pertinent single attribute, class).
Table 1. Characteristics of datasets used in the experiments

#  | Dataset                    | # instances | # attributes | # instances/# attributes
1  | Postoperative-patient-data | 90          | 9            | 10
2  | contact-lenses             | 24          | 4            | 6
3  | weather-nominal            | 14          | 4            | 3.5
4  | colic.ORIG                 | 368         | 27           | 13.63
5  | cylinder-bands             | 540         | 39           | 13.85
6  | Dermatology                | 366         | 34           | 10.76
7  | Flags                      | 194         | 29           | 6.69
8  | lung-cancer                | 32          | 56           | 0.57
9  | spect_train                | 80          | 22           | 3.64
10 | Sponge                     | 72          | 45           | 1.6
11 | Zoo                        | 101         | 17           | 5.94
12 | primary-tumor              | 339         | 17           | 19.94
3.2. Results and discussion
The experiments' results are combined in Table 2, which compares the performance of the classical OneR with the newly created classifiers. Noticeably, the performance of the classical OneR is insignificant when compared to the newly applied classifiers. The overall average accuracy for the new classifiers (i.e., SAB-HR, SAB-IG and SAB-GR) is 64.6%, 49.72% and 61.31%, respectively, corresponding to 48.53% for the classical OneR classifier. Furthermore, the difference in average accuracy between SAB-HR and the classical OneR is statistically significant. The average difference between the classical OneR and the applied classifiers (i.e., SAB-HR, SAB-IG and SAB-GR) is 16.07%, 1.19% and 12.78%, respectively, favoring the new classifiers.
Table 2. The performance summary of the applied classifiers compared to the classical OneR classifier

Dataset                    | OneR  | SAB-HR | OneR  | SAB-IG | OneR  | SAB-GR
Postoperative-patient-data | 67.78 | 71.11  | 67.78 | 71.11  | 67.78 | 68.89
contact-lenses             | 70.83 | 70.83  | 70.83 | 70.83  | 70.83 | 70.83
weather-nominal            | 42.86 | 57.14  | 42.86 | 50     | 42.86 | 50
colic.ORIG                 | 67.66 | 65.76  | 67.66 | 67.66  | 67.66 | 63.86
cylinder-bands             | 49.63 | 67.59  | 49.63 | 49.63  | 49.63 | 65
dermatology                | 49.73 | 36.07  | 49.73 | 50.27  | 49.73 | 36.07
flags                      | 4.64  | 33.51  | 4.64  | 4.64   | 4.64  | 42.78
lung-cancer                | 87.5  | 96.88  | 87.5  | 87.5   | 87.5  | 78.13
spect_train                | 67.5  | 92.5   | 67.5  | 75     | 67.5  | 75
sponge                     | 4.17  | 98.61  | 4.17  | 4.17   | 4.17  | 95.83
zoo                        | 42.57 | 60.4   | 42.57 | 42.57  | 42.57 | 60.4
primary-tumor              | 27.43 | 24.78  | 27.43 | 23.3   | 27.43 | 28.9
Average Accuracy           | 48.53 | 64.6   | 48.53 | 49.72  | 48.53 | 61.31
# of better datasets       | 3     | 8      | 1     | 4      | 3     | 8
Figure 3 (a-c) compares the applied classifiers to the classical OneR classifier in terms of average accuracy, with the less heterogenous attribute classifier (SAB-HR) ranking first, followed by SAB-GR with a slight difference (3.29%) from first, and the SAB-IG classifier with a big difference from the other classifiers; SAB-IG behaves much like the classical OneR, the two lines being approximately identical, as shown in Figure 3(b). The (attIG) attribute used in SAB-IG contains the largest amount of information about the class. In the small dataset case, however, it may be more important to be concerned with the consistency of the attribute with the other attributes, due to the limited number of instances in the dataset. This would minimize the gaps existing between the instances in the dataset. The homogeneity of the dataset helps make it more representative and, thus, more accurate to learn from. In addition, the new classifiers achieved better average accuracy on more datasets than OneR, as shown in Table 2. Figure 4 (a-c) shows each new classifier in comparison to OneR. The number of better datasets achieved is 8, 4 and 8 for SAB-HR, SAB-IG and SAB-GR, respectively, corresponding to 3, 1, and 3 for the OneR classifier.
Figure 3. Comparison of applied classifiers versus OneR classifier in terms of average accuracy

Figure 4. Comparison of applied classifiers versus OneR classifier in terms of number of better datasets achieved
From Table 2, it is obvious that selecting the single attribute with the lowest classification error rate, as the OneR classifier does, is not always optimal, especially on small datasets. Using a more deliberate technique to select the single attribute has a positive impact on classification accuracy and on the number of better datasets achieved. Meanwhile, we developed Table 3 to highlight the new classifier SAB-HR, which used homogeneity for the pertinent single-attribute selection. Table 3 shows a comparison between the new classifier SAB-HR and the other classifiers created for the same purpose (i.e., SAB-IG and SAB-GR). The results showed that SAB-HR's average accuracy outperforms SAB-IG's average accuracy by nearly 14.88%, while with SAB-GR the difference is only 1.37%. In general, the performance of the SAB-HR classifier is remarkable when compared to the classical OneR or to the applied classifiers (i.e., SAB-IG and SAB-GR). Figures 5(a) and (b) show the difference in performance on each dataset between SAB-HR and the other applied classifiers in terms of average accuracy.
Table 3. A comparison between the new classifier SAB-HR and the other classifiers

Dataset                    | SAB-HR | SAB-IG | SAB-HR | SAB-GR
Postoperative-patient-data | 71.11  | 71.11  | 71.11  | 68.89
Contact-lenses             | 70.83  | 70.83  | 70.83  | 70.83
Weather-nominal            | 57.14  | 50     | 57.14  | 50
Colic.ORIG                 | 65.76  | 67.66  | 65.76  | 63.86
Cylinder-bands             | 67.59  | 49.63  | 67.59  | 65
Dermatology                | 36.07  | 50.27  | 36.07  | 36.07
Flags                      | 33.51  | 4.64   | 33.51  | 42.78
Lung-cancer                | 96.88  | 87.5   | 96.88  | 78.13
Spect_train                | 92.5   | 75     | 75     | 75
Sponge                     | 98.61  | 4.17   | 93.06  | 95.83
Zoo                        | 60.4   | 42.57  | 60.4   | 60.4
Primary-tumor              | 24.78  | 23.3   | 24.78  | 28.9
Average Accuracy           | 64.6   | 49.72  | 62.68  | 61.31
# of better datasets       | 8      | 2      | 5      | 3
Figure 5. Comparison of applied classifiers versus SAB-HR classifier in terms of average accuracy
In summary, we can conclude that, for small datasets, using a simple classifier, such as OneR, is one of the main options for enhancing classification accuracy. In addition, employing a feature-selection method to select a single attribute using a common measure like H-Ratio, IG or GR will do so with better results. On the other hand, considering the homogeneity of the attribute for pertinent single-attribute selection can positively impact the classification process. It helped to reduce the gap between instances and, accordingly, yielded a more representative dataset. Consequently, it provided enough information for the classifier to learn and achieve a decent average accuracy. From the previous results, a single-attribute-based classifier can be powerful for classifying small datasets when the pertinent attribute is selected. That is the case with the new SAB-HR, which is recommended among the classifiers tested in this work.
4. CONCLUSION
In this work we have explored the power of the single attribute when selected using an effectual feature-selection criterion. We have addressed the small-dataset mining problem, as it is not always easy to gather a large amount of real data. The new algorithm SAB-HR is a pertinent single-attribute-based classifier
that pairs simplicity with effectiveness to contribute positively to classifying small datasets. The single attribute, selected to be the most homogenous with the other attributes in the dataset, gives more consistency between instances. Our empirical results used 12 benchmark datasets of a small size corresponding to Vapnik's definition. The results show that SAB-HR's performance significantly outperforms the classical OneR's performance. In addition, we compared the performance of SAB-HR with other single-attribute classifiers that use different attribute selection criteria (e.g., IG and GR), and all the results confirmed the effectiveness of the SAB-HR classifier. In future work, we intend to investigate algorithms to improve the classification accuracy of small datasets using more progressive classifiers. In addition, we aim to propose more simple methods for classification.
ACKNOWLEDGEMENTS
This research was funded by the Deanship of Scientific Research at Princess Nourah bint Abdulrahman University through the Fast-track Research Funding Program. The author is grateful for all this support in conducting this research and making it successful.
REFERENCES
[1] T. Mitchell, "Machine Learning," McGraw Hill, 1997.
[2] T. Van Gemert, "On the influence of dataset characteristics on classifier performance," Bachelor Thesis, Faculty of Humanities, Utrecht University, pp. 1-13, 2017.
[3] V. Vapnik, "Statistical Learning Theory," Wiley, New York, 2000.
[4] N. H. Ruparel, N. M. Shahane, and D. P. Bhamare, "Learning from small data set to build classification model: A survey," IJCA Proceedings on International Conference on Recent Trends in Engineering and Technology ICRTET, vol. 4, pp. 23-26, 2013.
[5] X. Chen and J. C. Jeong, "Minimum reference set based feature selection for small sample classifications," Proceedings of the 24th International Conference on Machine Learning - ICML '07, pp. 153-160, 2007.
[6] S. L. Happy, R. Mohanty, and A. Routray, "An effective feature selection method based on pair-wise feature proximity for high dimensional low sample size data," 25th European Signal Processing Conference, EUSIPCO, pp. 1574-1578, 2017.
[7] A. Golugula, G. Lee, and A. Madabhushi, "Evaluating feature selection strategies for high dimensional, small sample size datasets," Conference Proceedings Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 949-952, 2011.
[8] I. Soares, J. Dias, H. Rocha, M. do Carmo Lopes, and B. Ferreira, "Feature selection in small databases: A medical-case study," IFMBE Proceedings: XIV Mediterranean Conference on Medical and Biological Engineering and Computing, vol. 57, pp. 808-813, 2016.
[9] T. Shaikhina, D. Lowe, S. Daga, D. Briggs, R. Higgins, and N. Khovanova, "Machine learning for predictive modelling based on small data in biomedical engineering," IFAC-PapersOnLine, vol. 28, pp. 469-474, 2015.
[10] R. C. Holte, "Very simple classification rules perform well on most commonly used datasets," Machine Learning, vol. 11, pp. 63-91, 1993.
[11] A. K. Dogra and T. Wala, "A comparative study of selected classification algorithms of data mining," International Journal of Computer Science and Mobile Computing, vol. 4, no. 6, pp. 220-229, 2015.
[12] F. Alam and S. Pachauri, "Comparative study of J48, Naive Bayes and One-R classification technique for credit card fraud detection using WEKA," Advances in Computational Sciences and Technology, vol. 10, no. 6, pp. 1731-1743, 2017.
[13] V. S. Parsania, N. N. Jani, and N. H. Bhalodiya, "Applying Naïve Bayes, BayesNet, PART, JRip and OneR algorithms on hypothyroid database for comparative analysis," IJDI-ERET, vol. 3, pp. 1-6, 2015.
[14] C. Nasa and Suman, "Evaluation of different classification techniques for WEB data," International Journal of Computer Applications, vol. 52, pp. 34-40, 2012.
[15] L. Du and Q. Song, "A simple classifier based on a single attribute," Proceedings of the 14th IEEE International Conference on High Performance Computing and Communications, HPCC-2012 & 9th IEEE International Conference on Embedded Software and Systems, ICESS-2012, pp. 660-665, 2012.
[16] M. Dash and H. Liu, "Feature selection for classification," Intelligent Data Analysis, vol. 1, pp. 131-156, 1997.
[17] L. Huan and L. Yu, "Toward integrating feature selection algorithms for classification and clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 17, pp. 491-502, 2005.
[18] M. Ramaswami and R. Bhaskaran, "A study on feature selection techniques in educational data mining," Journal of Computing, vol. 1, pp. 7-11, 2009.
[19] Y. Pan, "A proposed frequency-based feature selection method for cancer classification," Master Theses & Specialist Projects, TopSCHOLAR, Faculty of the Department of Computer Science, Western Kentucky University, 2017.
[20] I. Sangaiah, A. V. A. Kumar, and A. Balamurugan, "An empirical study on different ranking methods for effective data classification," Journal of Modern Applied Statistical Methods, vol. 14, pp. 35-52, 2015.
[21] M. Trabelsi, N. Meddouri, and M. Maddouri, "A new feature selection method for nominal classifier based on formal concept analysis," Procedia Computer Science, vol. 112, pp. 186-194, 2017.
[22] R. Holte, "Machine learning," Proceeding of the Tenth International Conference, University of Massachusetts, Amherst, June 1993.
[23] D. I. Morariu, R. G. Cretulescu, and M. Breazu, "Feature selection in document classification," The Fourth International Conference in Romania of Information Science and Information Literacy, Romania, 2013.
[24] J. Novakovic, "Using information gain attribute evaluation to classify sonar targets," 17th Telecommunication Forum, pp. 1351-1354, 2009.
[25] C. G. Nevill-Manning, G. Holmes, and I. H. Witten, "The development of Holte's 1R classifier," Proceedings 1995 Second New Zealand International Two-Stream Conference on Artificial Neural Networks and Expert Systems, 1995.
[26] J. Novaković, P. Strbac, and D. Bulatović, "Toward optimal feature selection using ranking methods and classification algorithms," Yugoslav Journal of Operations Research, vol. 21, pp. 119-135, 2011.
[27] U of Waikato, "WEKA: The Waikato environment for knowledge analysis," 2018. [Online]. Available: http://www.cs.waikato.ac.nz/ml/weka/
[28] UCI, "UCI machine learning repository," 2018. [Online]. Available: http://archive.ics.uci.edu/ml/machine-learningdatabases/