TELKOMNIKA, Vol. 14, No. 4, December 2016, pp. 1502~1509
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v14i4.3956
Received May 9, 2016; Revised November 17, 2016; Accepted November 30, 2016
SVM Parameter Optimization Using Grid Search and
Genetic Algorithm to Improve Classification
Performance
Iwan Syarif*1, Adam Prugel-Bennett*2, Gary Wills3
1Politeknik Elektronika Negeri Surabaya, Indonesia
2,3School of Electronics and Computer Science, University of Southampton, United Kingdom
*Corresponding authors, e-mail: iwanarif@pens.ac.id1, apb@ecs.soton.ac.uk2, gbw@ecs.soton.ac.uk3
Abstract
Machine learning algorithms have been widely used to solve various kinds of data classification problems. The classification problem, especially for high dimensional datasets, has attracted many researchers seeking efficient approaches to address it. However, the classification problem has become very complicated and computationally expensive, especially when the number of possible different combinations of variables is so high. Support Vector Machine (SVM) has been proven to perform much better when dealing with high dimensional datasets and numerical features. Although SVM works well with default values, the performance of SVM can be improved significantly using parameter optimization. We applied two methods, Grid Search and Genetic Algorithm (GA), to optimize the SVM parameters. Our experiments showed that SVM parameter optimization using grid search always finds near optimal parameter combinations within the given ranges. However, grid search was very slow; therefore it was reliable only in low dimensional datasets with few parameters. SVM parameter optimization using GA can be used to solve the problem of grid search. GA has proven to be more stable than grid search. Based on average running time on 9 datasets, GA was almost 16 times faster than grid search. Furthermore, the GA's results were slightly better than the grid search in 8 of 9 datasets.
Keywords: support vector machine, genetic algorithms, parameter optimization
Copyright © 2016 Universitas Ahmad Dahlan. All rights reserved.
1. Support Vector Machine Classifier
Classification is a supervised learning technique which learns a function from a training data set that consists of input features/attributes and a categorical output [1]. This function is then used to predict the class label of any valid input vector. The main goal of classification is to apply machine learning algorithms to achieve the best prediction accuracy [2].
The classification problem can be viewed as an optimization problem where the goal is to find the best model that represents the predictive relationships in the data [3]. Other than well-known classical data mining techniques such as naive Bayes, decision trees, rule induction, etc., the Support Vector Machine (SVM) has gained more attention and has been adopted in data classification problems in order to find a good solution. [4] reported that SVM has been proven to perform much better when dealing with high dimensional datasets.
SVM is an emerging data classification technique, first proposed by [5], which has been widely adopted in various fields of classification. The SVM algorithm has the advantage that it is not affected by local minima; furthermore, it does not suffer from the curse of high dimensionality because of its use of support vectors [6]. Unfortunately, SVM performance depends highly on the parameter settings and the kernel selection. The quality of the selected SVM parameters and kernel functions has an effect on the learning and generalization performance [7]. The SVM algorithm is explained in more detail in [8].
2. Parameter Optimization
Generally, most machine learning algorithms will not achieve optimal results if their parameters are not tuned properly. To build a high accuracy classification model, it is very important to choose a powerful machine learning algorithm as well as to adjust its parameters.
Parameter optimization can be very time consuming if done manually, especially when the learning algorithm has many parameters [9, 10]. The largest problems encountered in setting up the SVM model are how to select the kernel function and its parameter values. Inappropriate parameter settings lead to poor classification results.
In this paper, we used two methods to adjust the SVM parameters: grid search with cross-validation and Genetic Algorithms (GA).
2.1. Parameter Optimization using Grid Search
The grid search is originally an exhaustive search based on a defined subset of the hyper-parameter space. The hyper-parameters are specified using a minimal value (lower bound), a maximal value (upper bound) and a number of steps. There are three different scales that can be used: linear scale, quadratic scale and logarithmic scale. The performance of every combination is evaluated using some performance metrics.
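The three scales differ only in how the candidate values are spaced between the two bounds. A small sketch of the idea (NumPy assumed; the bounds and step count mirror our experimental settings):

```python
# Three ways to space 10 candidate parameter values between 0.001 and 10,000.
import numpy as np

lo, hi, steps = 0.001, 10_000.0, 10
linear = np.linspace(lo, hi, steps)                           # equal spacing
quadratic = lo + (hi - lo) * np.linspace(0, 1, steps) ** 2    # denser near the lower bound
logarithmic = np.logspace(np.log10(lo), np.log10(hi), steps)  # equal ratios between steps
print(logarithmic)
```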
Figure 1. SVM parameter optimization using grid search
Grid search optimizes the SVM parameters (C, γ, degree, etc.) using a cross-validation (CV) technique as a performance metric. The goal is to identify a good hyper-parameter combination so that the classifier can predict unknown data accurately. According to [11], the cross-validation technique can prevent the over-fitting problem.
To choose C and γ using k-fold CV, we first split the available data into k subsets (in most experiments we set k=10). One subset is used as testing data for a classifier trained on the remaining k-1 training subsets. We then calculate the CV error using this split error for the SVM classifier using different values of C, γ and the other parameters. Various combinations of hyper-parameter values are entered, and the one with the best cross-validation accuracy (or the lowest CV error) is selected and used to train an SVM on the whole dataset.
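As a concrete illustration of this procedure, the following is a minimal sketch using scikit-learn; our experiments used RapidMiner, so the library, the stand-in dataset and the F1 scorer here are illustrative assumptions:

```python
# Grid search over C and gamma with 10-fold cross-validation (k=10, as in
# most of our experiments), scored by the F-measure.
import numpy as np
from sklearn.datasets import load_breast_cancer  # stand-in binary dataset
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# 10 logarithmic steps from 0.001 to 10,000, matching the ranges used in our experiments.
param_grid = {"C": np.logspace(-3, 4, 10), "gamma": np.logspace(-3, 4, 10)}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10, scoring="f1")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

By default, GridSearchCV refits the estimator with the best combination on the whole dataset, which matches the procedure described above.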
In the linear kernel there is only one important parameter to optimize, which is C; in the RBF kernel and the sigmoid kernel there are 2 parameters, C and γ; while the polynomial kernel has 3 parameters: C, γ and degree. Actually there are more than three parameters, but selecting more parameters and a large number of steps (or possible values of the parameters) results in a huge number of combinations. For example, if we choose to optimize 5 parameters and 25 steps for
each parameter, then the total number of combinations would be 25^5 or 9,765,625, which requires a huge amount of time. The SVM parameter optimization using grid search is explained in Figure 1.
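The size of the grid is simply the number of steps raised to the number of parameters, which is easy to verify; a short illustrative check (the parameter counts are just examples):

```python
# Number of grid-search combinations grows as steps ** parameters.
for n_params, n_steps in [(1, 10), (2, 10), (3, 10), (5, 25)]:
    print(f"{n_params} parameter(s) x {n_steps} steps -> {n_steps ** n_params:,} combinations")
# 5 parameter(s) x 25 steps -> 9,765,625 combinations
```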
One of the biggest problems of SVM parameter optimization is that there are no exact ranges of C and γ values. We believe that the wider the parameter range is, the more possibilities the grid search method has of finding the best parameter combination. Therefore, in our experiments we decided to make the range of both C and γ from 0.001 to 10,000.
2.2. Parameter Optimization using Genetic Algorithm
The GA, which was first proposed by John Holland in 1975, is a method for solving optimization problems that is based on natural selection, the process that drives biological evolution. GA can also be used for SVM parameter optimization. GA searches for the best parameters, but not naively like a brute-force or grid search. GA is very useful to implement when the best ranges and dependencies of the various SVM parameters are not known at all. GA is more appropriate than grid search, which is very time consuming because it tries too many combinations of parameters. The parameter optimization using GA is explained in Figure 2.
Figure 2. Parameter optimization using Evolutionary Algorithm
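As a rough sketch of this idea (not the exact RapidMiner implementation used in our experiments), the following evolves (C, γ) pairs with tournament selection, a crossover probability of 0.9 and Gaussian mutation, mirroring the defaults listed in Section 3.3.4; the dataset, the mutation width and the number of generations are illustrative assumptions:

```python
# GA-based SVM parameter search. Individuals encode (log10 C, log10 gamma)
# so that Gaussian mutation moves on a logarithmic scale.
import random
from sklearn.datasets import load_breast_cancer  # stand-in binary dataset
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
rng = random.Random(0)

def fitness(ind):
    # Cross-validated F-measure, the same metric as in the grid search runs.
    C, gamma = 10.0 ** ind[0], 10.0 ** ind[1]
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5, scoring="f1").mean()

def tournament(scored, k=2):
    # Return the fitter of k randomly drawn (individual, fitness) pairs.
    return max(rng.sample(scored, k), key=lambda s: s[1])[0]

pop = [[rng.uniform(-3, 4), rng.uniform(-3, 4)] for _ in range(5)]  # population size 5
for generation in range(10):  # kept short for illustration; the default is 50
    scored = [(ind, fitness(ind)) for ind in pop]
    nxt = [max(scored, key=lambda s: s[1])[0]]  # elitism: carry over the best
    while len(nxt) < len(pop):
        a, b = tournament(scored), tournament(scored)
        child = [a[0], b[1]] if rng.random() < 0.9 else list(a)  # crossover prob 0.9
        # Gaussian mutation, clamped to the log10 range [-3, 4] (0.001 to 10,000)
        child = [min(4.0, max(-3.0, v + rng.gauss(0.0, 0.5))) for v in child]
        nxt.append(child)
    pop = nxt

best = max(pop, key=fitness)
print("C =", 10.0 ** best[0], "gamma =", 10.0 ** best[1])
```

Unlike grid search, each generation evaluates only a handful of candidates, so the cost grows with generations times population size rather than with the product of all step counts.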
3. Experimental Settings
3.1. Datasets
We used nine high dimensional datasets in which the number of features ranges from 45 attributes (the smallest) to 20,000 attributes (the highest). The list of datasets is shown in Table 1.
The dexter, internet_ads, madelon, musk, spambase, SPECTF heart and intrusion datasets were downloaded from the UCI Machine Learning Repository, while the leukemia and embryonal tumours datasets were from the BioInformatics Group Seville (BIGS).
Table 1. The datasets

| # | Dataset Name | Missing values | Number of instances | Number of attributes | Attribute type | Classes |
|---|---|---|---|---|---|---|
| 1 | Leukemia | No | 72 | 7,130 | All numeric | all, aml |
| 2 | Embryonal Tumours | No | 60 | 7,130 | All numeric | 0, 1 |
| 3 | Dexter | No | 600 | 20,000 | All numeric | 1, -1 |
| 4 | Internet_ads | Yes | 3,279 | 1,559 | All numeric (5 real, others binary) | ad, nonad |
| 5 | Madelon | No | 2,600 | 501 | All numeric | 1, -1 |
| 6 | Musk | No | 6,589 | 168 | All numeric | 0, 1 |
| 7 | Spambase | Yes | 4,601 | 58 | All numeric (55 real, 3 integer) | 0, 1 |
| 8 | SPECTF Heart | No | 80 | 45 | All numeric | 0, 1 |
| 9 | Intrusion | No | 25,192 | 42 | 34 numeric, 8 nominal | normal, anomaly |
3.2. Performance Metric
The metric used to evaluate the performance of SVM is given in Table 2 [12]:
Table 2. Performance metric

| | Actual True | Actual False |
|---|---|---|
| Predicted True | True Positive (TP) | False Positive (FP) |
| Predicted False | False Negative (FN) | True Negative (TN) |
We use accuracy, precision, recall and F-measure as performance measurements, which are shown in Table 3.
Table 3. Classification performance measurements

| Measure | Formula |
|---|---|
| Precision | TP / (TP + FP) |
| Recall / Sensitivity | TP / (TP + FN) |
| Selectivity | TN / (FP + TN) |
| Accuracy | (TP + TN) / (TP + TN + FP + FN) |
| F-Measure | 2 * Precision * Recall / (Precision + Recall) |
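The Table 3 formulas translate directly into code; a minimal sketch assuming the four counts come from the Table 2 confusion matrix:

```python
# Classification metrics computed from confusion-matrix counts (Table 2).
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def f_measure(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

print(f_measure(tp=40, fp=10, fn=10))  # 0.8 on these illustrative counts
```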
3.3. Experimental Results
In the beginning, we applied feature selection algorithms to all datasets. After that, we ran SVM with three different configurations: SVM with default parameters, SVM with grid search optimization and SVM with GA optimization. The results are explained in the following sections.
3.3.1. Feature Selection Algorithms
Before we applied SVM to the high dimensional datasets, we used feature selection algorithms to reduce the number of attributes. Feature selection is a popular technique used to find the most important and optimal subset of features for building powerful learning models. An efficient feature selection method can eliminate irrelevant and redundant data.
In our previous experiments [13], we applied two feature selection algorithms, Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), and the results are shown in Table 4.
Table 4. Feature selection using GA and PSO

| # | Dataset Name | Number of original attributes | Attributes after reduction by GA | Attributes after reduction by PSO | Fraction of Features (FF): GA | Fraction of Features (FF): PSO |
|---|---|---|---|---|---|---|
| 1 | Leukemia | 7,130 | 2,237 | 109 | 31.37% | 1.53% |
| 2 | Embryonal Tumours | 7,130 | 619 | 202 | 8.68% | 2.83% |
| 3 | Dexter | 20,000 | 6,133 | 279 | 30.67% | 1.40% |
| 4 | Internet_ads | 1,559 | 489 | 302 | 31.37% | 19.37% |
| 5 | Madelon | 501 | 142 | 5 | 28.34% | 1.00% |
| 6 | Musk | 168 | 66 | 16 | 39.29% | 9.52% |
| 7 | Spambase | 58 | 29 | 27 | 50.00% | 46.55% |
| 8 | SPECTF Heart | 45 | 11 | 9 | 24.44% | 20.00% |
| 9 | Intrusion NSL KDD | 42 | 16 | 8 | 38.10% | 19.05% |
| | Average FF | | | | 31.36% | 13.47% |
Table 4 shows that both GA and PSO successfully reduced the number of attributes of all datasets: on average, GA reduced the number of attributes to 31.36% of the original data, while PSO reduced it to 13.47%. Therefore, in all of our experiments below, we used the datasets which had been reduced by PSO.
3.3.2. SVM with Default Parameters
We used the LibSVM function provided by the RapidMiner Machine Learning Tools and applied this algorithm to the 9 datasets with the default parameters, using four different kernels: linear, RBF, sigmoid and polynomial. The results are shown in Table 5.
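A minimal sketch of this baseline, assuming scikit-learn's LibSVM wrapper (SVC) in place of RapidMiner's LibSVM operator and a stand-in dataset:

```python
# Default-parameter SVM over the four kernels, scored by cross-validated F-measure.
from sklearn.datasets import load_breast_cancer  # stand-in for the PSO-reduced datasets
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
for kernel in ["linear", "rbf", "poly", "sigmoid"]:
    f1 = cross_val_score(SVC(kernel=kernel), X, y, cv=10, scoring="f1").mean()
    print(f"{kernel:8s} F-measure: {f1:.4f}")
```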
Table 5. SVM with default parameters (F-measure)

| No | PSO-reduced datasets | Linear | RBF | Polynomial | Sigmoid |
|---|---|---|---|---|---|
| 1 | Leukemia | 74.11% | 84.25% | 80.94% | 66.67% |
| 2 | Embryonal Tumours | 74.50% | 76.67% | 74.50% | 76.67% |
| 3 | Dexter | 74.92% | 68.70% | 63.00% | 53.18% |
| 4 | Internet_ads | 96.81% | 92.19% | 96.81% | 95.08% |
| 5 | Madelon | 61.45% | 65.59% | 60.55% | 66.67% |
| 6 | Musk | 91.29% | 96.58% | 93.03% | 78.74% |
| 7 | Spambase | 79.70% | 84.91% | 73.90% | 64.84% |
| 8 | SPECTF Heart | 74.11% | 84.25% | 80.94% | 66.67% |
| 9 | Intrusion NSL KDD | 26.70% | 94.41% | 40.03% | 84.44% |
SVM with the RBF kernel achieved the best results in 5 of 9 datasets, while the other three kernels (linear, polynomial and sigmoid) each achieved the best results in 2 of 9 datasets. In the embryonal tumours dataset, the RBF kernel and the sigmoid kernel have the same results.
3.3.3. SVM Parameter Optimization Using Grid Search
In the second experiment, we applied parameter optimization of SVM using grid search. Grid search is used to optimize the C parameter (in the linear kernel), the C and gamma parameters (in the RBF and sigmoid kernels) and C, gamma & degree (in the polynomial kernel). The parameter ranges for the experiments are explained in Table 6.
Table 6. Hyper-parameter ranges for the experiments

| Parameter | Kernel | Min | Max | Type | Steps | Scale |
|---|---|---|---|---|---|---|
| C | linear | 0.001 | 10,000 | Real | 10 | logarithmic or logarithmic legacy |
| gamma | linear, RBF, sigmoid | 0.001 | 10,000 | Real | 10 | logarithmic or logarithmic legacy |
| degree | polynomial | 1 | 5 | Integer | 1 | linear (1, 2, 3, 4, 5) |
The SVM parameter optimization using grid search experimental results are shown in Table 7. Compared to the previous results (please see Table 5), the SVM parameter optimization
using grid search gives a significant improvement. In the leukemia dataset, grid search improved the F-measure from 84.25% to 100%. In the embryonal tumours dataset, grid search improved the F-measure from 76.67% to 84.95%, and in the musk dataset grid search improved the F-measure from 96.58% to 100%. Overall, grid search was able to significantly improve the classification performance on 8 of 9 datasets. However, grid search failed to find the best SVM parameters on the intrusion dataset.
This experiment shows that the grid search always finds a near optimal parameter combination within the given ranges; unfortunately, it is very time consuming. If the dimension of the dataset is quite high or the number of parameter combinations is huge, the grid search might never finish, as happened in the intrusion dataset for all kernels and also in the dexter, internet_ads, madelon and spambase datasets for the polynomial kernel. Therefore, even though grid search gives excellent results in almost all datasets, it is reliable only in low dimensional datasets with few parameters.
Table 7. SVM parameter optimization using grid search

[Table 7 reports, for each PSO-reduced dataset and each kernel (linear, RBF, polynomial, sigmoid), the best parameter combination found by grid search (values of C, gamma and, for the polynomial kernel, degree) together with the resulting F-measure. Several runs produced no results: the intrusion dataset failed for all kernels, and the dexter, internet_ads, madelon and spambase runs with the polynomial kernel were forced to stop after 1-2 weeks of running.]
3.3.4. Parameter Optimization Using Genetic Algorithm
We used the GA function provided by the RapidMiner Machine Learning Tools to adjust the SVM parameters, with the default parameters as follows:
1. Max generations: sets the number of generations for process termination; the default value is 50.
2. Population size: specifies the population size, i.e. the number of individuals per generation; the default value is 5.
3. Tournament fraction: specifies the fraction of the current population which should be used as tournament members; the default value is 0.25.
4. Crossover prob: specifies the probability for an individual to be selected for crossover; the default value is 0.9.
5. Mutation type: there are three mutation types, which are Gaussian mutation, switching mutation and sparsity mutation. We used the default value: Gaussian mutation.
6. Selection type: there are eight different selection types, which are union, cut, roulette wheel, stochastic universal sampling, Boltzmann, rank and tournament (the default value).
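For reference, the defaults above can be collected into a plain configuration mapping that plugs into a GA implementation such as the sketch in Section 2.2; the key names are illustrative, not RapidMiner's actual parameter identifiers:

```python
# RapidMiner-style GA defaults for SVM parameter optimization (illustrative keys).
ga_defaults = {
    "max_generations": 50,           # process termination
    "population_size": 5,            # individuals per generation
    "tournament_fraction": 0.25,     # fraction of the population used as tournament members
    "crossover_prob": 0.9,           # probability of being selected for crossover
    "mutation_type": "gaussian",     # alternatives: switching, sparsity
    "selection_type": "tournament",  # alternatives: union, cut, roulette wheel, ...
}
```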
The experimental results of SVM parameter optimization using GA are shown in Table 8.
Table 8. SVM parameter optimization: Grid Search vs GA

[Table 8 compares, for each PSO-reduced dataset (together with its numbers of instances, original attributes and attributes after PSO reduction), SVM with grid search against SVM with evolutionary search, listing the best F-measure, the kernel that achieved it and the execution time (hh:mm:ss). For the intrusion dataset all grid search kernels failed; the program was forced to stop after running for 2 weeks without any results.]
In the previous experiment, grid search had failed because of very long execution times and returned no results when applied to the intrusion (all four kernels), spambase (linear, polynomial and sigmoid kernels), madelon (polynomial kernel), internet_ads (polynomial kernel) and dexter (polynomial kernel) datasets.
In the current experiment, GA proved to be more stable than grid search. In the leukemia and musk datasets, grid search achieved 100% accuracy with 4 kernels while GA achieved 100% accuracy with 2 kernels. From these 2 dataset results, we can see that the linear kernel is much faster than the other kernels (RBF, polynomial and sigmoid). In the embryonal tumours, dexter, internet_ads, SPECTF Heart and intrusion datasets, the evolutionary search had slightly better accuracy and a much faster execution time. Only in 1 dataset (spambase) did grid search achieve better accuracy than the GA.
However, in the madelon and intrusion datasets GA could not guarantee good results for all kernels because the classification performances were not so good (in the madelon dataset the F-measure is only 66.67% and in the intrusion dataset the F-measure is only 61.31%).
4. Conclusion
Although SVM works well with default values, the performance of SVM can be improved significantly using parameter optimization. One of the biggest problems of SVM parameter optimization is that there are no exact ranges of C and γ values. We believe that the wider the parameter range is, the more possibilities the grid search method has of finding the best combination of parameters. Our experiment shows that the grid search always finds a near optimal parameter combination within the given ranges. SVM parameter optimization using grid search is very powerful
and it is able to improve the accuracy significantly. However, grid search has several disadvantages: it is extremely slow and, furthermore, it may lead to very long execution times. For example, grid search failed in finding the optimal SVM parameters for the intrusion dataset, which has a large number of instances; the process was forced to stop after 2 weeks of running. Therefore, grid search is very reliable only in low dimensional datasets with few parameters. To solve this problem, we use the Genetic Algorithm (GA), which is very useful to implement when the best ranges and dependencies of the various SVM parameters are not known at all. GA has proven to be more stable than grid search. When applied to 9 datasets, GA had an average running time of 294 seconds while grid search took around 4,680 seconds (this does not include the intrusion dataset, which failed). This means that SVM parameter optimization using GA is more than 15.9 times faster than using grid search.
References
[1] Gaber M, Zaslavsky A, Krishnaswamy S. A Survey of Classification Methods in Data Streams. In: Aggarwal C. Editor. Data Streams, Advances in Database Systems. US: Springer. 2007: 39-59.
[2] Williams N, Zander S, Armitage G. A Preliminary Performance Comparison of Five Machine Learning Algorithms for Practical IP Traffic Flow Classification. Computer Communication Review. 2006; 30.
[3] Otero FEB, Freitas AA, Johnson CG. Inducing decision trees with an ant colony optimization algorithm. Applied Soft Computing. 2012; 12: 3615-3626.
[4] Ding S, Chen L. Intelligent Optimization Methods for High-Dimensional Data Classification for Support Vector Machine. Intelligent Information Management. 2010; 2(6).
[5] Cortes C, Vapnik V. Support-Vector Networks. Mach. Learn. 1995; 20: 273-297.
[6] Sánchez AVD. Advanced support vector machines and kernel methods. Neurocomputing. 2003; 55: 5-20.
[7] Sudheer C, Maheswaran R, Panigrahi BK, Mathur S. A hybrid SVM-PSO model for forecasting monthly streamflow. Neural Computing and Applications. 2013.
[8] Chang CC, Lin CJ. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2011; 2(27): 1-27.
[9] Friedrichs F, Igel C. Evolutionary tuning of multiple SVM parameters. Neurocomputing. 2005; 64: 107-117.
[10] Rossi ALD, de Carvalho ACP. Bio-inspired Optimization Techniques for SVM Parameter Tuning. 10th Brazilian Symposium on Neural Networks (SBRN '08). 2008: 57-62.
[11] Lin SW, Ying KC, Chen SC, Lee ZJ. Particle swarm optimization for parameter determination and feature selection of support vector machines. Expert Systems with Applications. 2008; 35: 1817-1824.
[12] Davis J, Goadrich M. The relationship between Precision-Recall and ROC curves. Proceedings of the 23rd International Conference on Machine Learning, ICML '06. New York, NY, USA. 2006: 233-240.
[13] Syarif I. Dimensionality Reduction Algorithms on High Dimensional Datasets. EMITTER International Journal of Engineering Technology. 2014; 2(2).