TELKOMNIKA, Vol. 11, No. 4, April 2013, pp. 1725~1730
ISSN: 2302-4046
Received November 19, 2012; Revised January 2, 2013; Accepted January 15, 2013
Hybrid Feature Selection Based on Improved Genetic
Algorithm
Shuxin ZHU^1, Bin HU^2*
^1,2 College of Information Science and Technology, Nanjing Agricultural University, Weigang 1, Nanjing, China, 210095
*Corresponding author, email: hubin@njau.edu.cn
Abstract
High dimensionality is one of the most troublesome difficulties encountered in intrusion detection system analysis and application. For high-dimensional data, feature selection not only improves the accuracy and efficiency of classification, but also discovers an informative subset. Combining Filter-type and Wrapper-type characteristics, this paper proposes a hybrid feature selection method using an improved genetic algorithm that contains a reward and punishment mechanism. The mechanism guarantees that the algorithm converges rapidly on an approximate global optimal solution. According to the experimental results, the algorithm performs well and its time complexity is low.
Keywords: intrusion detection system, genetic algorithm (GA), feature selection, mutual information, hybrid type
Copyright © 2013 Universitas Ahmad Dahlan. All rights reserved.
1. Introduction
Intrusion detection is essentially a classification problem [1], and the first problem to be solved in classification is feature extraction and selection. There is no linear relationship between features and classifier performance, but when the number of features exceeds a certain limit, classifier performance deteriorates. The so-called feature selection [2] is to choose the output-relevant or important feature subset from the original feature set according to a certain evaluation function, reducing the dimensionality of the feature space as far as possible on the premise of not reducing the classification accuracy. Apparently, feature selection has two key problems: the selection of a suitable evaluation function and efficient feature subset search methods.

According to whether or not they depend on a machine learning algorithm, feature selection algorithms can be divided into two categories: wrapper-type algorithms [3] and filter-type algorithms [4]. Filter-based feature selection is independent of the machine learning algorithm and has benefits such as low computational cost and high efficiency, but it performs mediocrely in reducing dimension. On the contrary, wrapper-based feature selection relies on one or several machine learning algorithms and is characterized by large computational complexity and low efficiency, but good dimension-reduction effect. This paper combines the characteristics of the two kinds of algorithms and proposes a hybrid method to perform feature selection on intrusion detection data sets. The method utilizes a mutual information criterion to eliminate redundancy, establishes a reward and punishment mechanism, and uses the improved genetic algorithm to generate an optimal subset suitable for intrusion detection classification.
2. Research Method
In the intrusion detection process, extracting and processing too many features is one of the main causes of slowdown. In fact, some parameters contain only minimal information about the system status and have almost no effect on the test results. These parameters are called redundancy parameters. So feature selection should first adopt methods such as filter-based methods to remove redundancy parameters and retain the important characteristics reflecting the state of the system. In this paper, we put forward a feature
selection method based on mutual information from the perspective of the data's statistical characteristics. The use of mutual information for feature selection is also based on the assumption that the evaluation data and the actual data have the same statistical properties, so we can use the statistical properties of the known evaluation data to express the actual system characteristics.

Mutual information can measure the co-occurrence relations of feature items and categories. A characteristic x_i that appears in a certain category C_i with high probability and in other categories with low probability will acquire a greater mutual information (MI) value. MI can be expressed as formula (1):
$MI(x, C_i) = \log \dfrac{P(x \mid C_i)}{P(x)\,P(C_i)}$    (1)
Because MI favors low-frequency features, it easily leads to over-fitting. On the other hand, removing low-frequency features may affect the detection efficiency [5]. In order to improve it, we add the factor P(x_i), which stands for the feature appearance probability, to the mutual information and form an improved criterion (PMI). PMI can be expressed as formula (2):
$PMI(x, C_i) = P(x_i) \log \dfrac{P(x \mid C_i)}{P(x)\,P(C_i)}$    (2)
The criterion of this method is the conditional mutual information, and the selection algorithm is sequential forward selection. First we let the feature set be empty, then calculate the current PMI(x_i, c) values and add the feature with the maximum mutual information to the feature subset F_i, and finally compute PMI(x_i, c | F_i) one by one. After choosing several features into the feature subset, a new sample space is formed. Compared with the original sample space, the new one has smaller dimension and lower correlation among the feature variables.
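As a concrete illustration of this filter stage, the sketch below scores discrete features with formula (2) and keeps the top-k; the frequency-count probability estimates, the binarization of feature values, and the omission of the conditional term PMI(x_i, c | F_i) are simplifying assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def pmi_score(x_present, y, category):
    """Improved mutual information of formula (2):
    PMI(x, C_i) = P(x_i) * log( P(x | C_i) / (P(x) * P(C_i)) ),
    with all probabilities estimated by frequency counts."""
    x_present = np.asarray(x_present)
    y = np.asarray(y)
    eps = 1e-12                                           # guard against log(0)
    p_x = x_present.mean() + eps                          # P(x): feature appears
    p_c = (y == category).mean() + eps                    # P(C_i)
    p_x_given_c = x_present[y == category].mean() + eps   # P(x | C_i)
    return p_x * np.log(p_x_given_c / (p_x * p_c))

def filter_stage(X, y, category, k):
    """Keep the k features with the highest PMI scores (a simplified forward
    selection that drops the conditional term PMI(x_i, c | F_i))."""
    X = np.asarray(X)
    scores = [pmi_score(X[:, j] != 0, y, category) for j in range(X.shape[1])]
    return sorted(np.argsort(scores)[-k:].tolist())
```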
After achieving reduction of the original feature set by eliminating the redundancy among features, we can adopt the Wrapper-based feature selection method built on the genetic algorithm. Figure 1 shows the improved genetic algorithm as it can be used in an intrusion detection system. The samples are filtered by conditional mutual information and then sent to the improved genetic operator to be optimized according to the classifier. In the end, the optimal feature subset is acquired.
[Figure 1 diagram: sample set → preprocessor → training samples / test samples → feature selection → improved GA → intrusion detection classifier (C4.5) → performance estimation → normal / abnormal]
Figure 1. Feature Selection Flow Diagram in the Intrusion Detection based on Improved GA Algorithm
2.1. Improved Genetic Algorithm Design
Selecting the optimal subspace from the existing feature subset has been proven to be an NP-hard problem. From the optimization point of view, the feature selection problem is a combinatorial optimization problem. Current feature selection methods based on the genetic algorithm [6] usually evaluate the feature subset on the basis of a classifier and assign individual evaluation indexes and fitness (often expressed as f) according to the classification accuracy.
2.1.1. Coding Scheme
In feature selection, the individual coding scheme often utilizes a binary expression, because a binary string is simple and convenient to operate on, and it can also represent a wide range of different information. Assuming the original set has 16 features, then the individual length is L = 16. Each individual has a gene corresponding to the feature sequence. When a gene of the individual is expressed as "1", it means that the corresponding feature item is chosen; conversely, "0" means the feature is not used. For example, the individual 1100100001011000 means that the first, second, fifth, tenth, twelfth and thirteenth features are selected. If the exhaustive search method were used to find the optimal feature subset, there would be 2^m possible subset combinations for a set containing m features. Such a huge search space is not feasible. By using the genetic algorithm, we not only can pursue the global optimum but also avoid the huge search cost.
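A minimal sketch of this binary coding scheme, assuming numpy arrays; decode simply maps the set bits to the selected feature indices.

```python
import numpy as np

def decode(chromosome):
    """Map a binary chromosome to the indices of the selected features:
    bit i set to 1 means feature i is used in the subset."""
    return [i for i, bit in enumerate(chromosome) if bit == 1]

# A 16-bit individual: the 1st, 2nd, 5th, 10th, 12th and 13th bits are set.
individual = np.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0])
print(decode(individual))   # -> [0, 1, 4, 9, 11, 12] (0-based indices)
```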
2.1.2. Initial Population Selection
A genetic algorithm always starts from a population which represents a potential solution set of the problem, namely the initial population. The population consists of several genetic individuals, and each individual is a possible solution. Here we adopt a stochastic method to choose the initial group: every gene of each individual has the same probability of being drawn from {0, 1}. The population size is determined according to the actual situation.
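A stochastic initialization along these lines would match the description; the population size of 30 is an arbitrary illustrative choice, while 33 features corresponds to the filtered feature count used later in Section 3.

```python
import numpy as np

def init_population(pop_size, n_features, rng):
    """Stochastic initialization: every gene is drawn from {0, 1}
    with equal probability."""
    return rng.integers(0, 2, size=(pop_size, n_features))

rng = np.random.default_rng(42)
population = init_population(pop_size=30, n_features=33, rng=rng)
```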
2.1.3. Design of Genetic Operator
Three operators are applied to obtain the optimal feature subset: the proportional selection operator, the single-point crossover operator, and the simple mutation operator.
(1) Selection is the survival-of-the-fittest process based on fitness. The proportional selection operator used here is also called a roulette wheel selection strategy. The basic idea is that the probability of each individual being selected is proportional to the size of its fitness.
(2) The aim of crossover is to generate new units in the next generation. By using the crossover operation, the genetic algorithm is able to greatly improve its search capability. In this algorithm, the crossover operation uses single-point crossover and selects individuals with a certain probability p. For the two parent individuals participating in the crossover, we randomly select one cross point and generate two new individuals by swapping the two individuals' partial structures behind it or in front of it.
(3) Mutation is an assisting operation in the genetic algorithm, and it mainly maintains population diversity. This operation uses simple mutation and selects individuals to perform the mutation with probability pm. It randomly selects the individual gene bit participating in the mutation and performs the reverse (bit-flip) operation.
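The three operators could be sketched as follows; this assumes a larger-is-better, non-negative fitness for the roulette wheel (if the classification error rate of Section 2.2 is used directly as fitness, it should be inverted first), and the function and parameter names are illustrative.

```python
import numpy as np

def roulette_select(population, fitness, rng):
    """Proportional (roulette wheel) selection: each individual is chosen
    with probability proportional to its (non-negative) fitness."""
    p = np.asarray(fitness, dtype=float)
    p = p / p.sum()
    return population[rng.choice(len(population), p=p)].copy()

def single_point_crossover(parent_a, parent_b, p_cross, rng):
    """With probability p_cross, swap the parts of the two parents behind a
    randomly chosen cross point."""
    if rng.random() < p_cross:
        point = rng.integers(1, len(parent_a))
        child_a = np.concatenate([parent_a[:point], parent_b[point:]])
        child_b = np.concatenate([parent_b[:point], parent_a[point:]])
        return child_a, child_b
    return parent_a.copy(), parent_b.copy()

def mutate(individual, p_mut, rng):
    """Simple mutation: flip each gene bit with probability p_mut."""
    individual = individual.copy()
    mask = rng.random(len(individual)) < p_mut
    individual[mask] = 1 - individual[mask]
    return individual
```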
2.1.4. Termination Condition
Because the genetic algorithm does not use the gradient of the objective function or other such information, traditional methods cannot be used to judge its convergence in order to terminate the genetic process. The commonly used approach is to control the parameters to terminate the algorithm, such as reaching a specified maximum number of generations. Another way is to terminate the genetic operation when the average fitness difference between adjacent generations is less than a threshold value ε; the end condition is
$\dfrac{\left| f_{i+1} - f_i \right|}{f_i} < \varepsilon$
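A possible termination check consistent with this description; the relative-difference form of the condition is reconstructed from the surrounding prose, and the default constants follow the settings reported in Section 3.

```python
def should_terminate(avg_fitness_history, epsilon=0.003, max_generations=100):
    """Stop when the relative change of the average fitness between adjacent
    generations drops below epsilon, or when the generation limit is reached
    (epsilon = 0.003 and 100 generations follow the settings in Section 3)."""
    if len(avg_fitness_history) >= max_generations:
        return True
    if len(avg_fitness_history) < 2:
        return False
    f_prev, f_curr = avg_fitness_history[-2], avg_fitness_history[-1]
    return abs(f_curr - f_prev) / abs(f_prev) < epsilon
```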
2.2. Fitness Definition and Reward and Punishment Mechanism
Most wrapper-based genetic-algorithm feature selection methods use a certain classifier model to evaluate the selected feature set, utilizing the classification contribution value or the classification error rate as the fitness function. In this paper, in order to search out
a feature subspace with better linear separability, we adopt the C4.5 algorithm [7] as the classifier model and the classification error rate as the fitness function.
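For illustration only, the fitness of an individual could be evaluated along the following lines; scikit-learn's DecisionTreeClassifier with 5-fold cross-validation is a convenient stand-in for the C4.5 setup, not the authors' implementation.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def fitness(individual, X, y):
    """Fitness = classification error rate of a decision tree trained on the
    feature subset encoded by the individual (lower error = better subset)."""
    selected = np.flatnonzero(individual)
    if selected.size == 0:
        return 1.0                         # empty subset: worst possible error
    clf = DecisionTreeClassifier()         # stand-in for C4.5
    accuracy = cross_val_score(clf, X[:, selected], y, cv=5).mean()
    return 1.0 - accuracy
```

If this error rate is combined with the roulette-wheel selection sketched earlier, it should first be turned into a larger-is-better score, for example 1 − error.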
In order to make up for the inadequacy of the genetic algorithm's hill-climbing ability, which often falls into a local optimal solution rather than the global optimal solution, we adopt a reward and punishment mechanism as shown in Figure 2. We take the genetic algorithm as the main process and integrate the simulated annealing algorithm into it in order to further adjust and optimize the groups. The core idea is: when the new individual's fitness is higher than its parent's fitness, we use the new one to take the place of its parent as a reward for the new individual. Otherwise, as punishment, according to the Metropolis criterion [8] the new one replaces its parent only with the Boltzmann substitution probability P. Following the iterative improvement philosophy, this not only enhances the global convergence but also accelerates the evolution speed and acquires a satisfactory global optimal solution. The algorithm first determines an average fitness value to reflect the environment variable α in the genetic algorithm operation:
$\alpha = \dfrac{1}{k} \sum_{i=1}^{k} \left( f_i - \bar{f} \right)^2$    (3)
When α is larger, namely during the early period of the genetic operator's execution, individuals vary considerably and the phenotype variance α of the genetic environment is larger. The probability of accepting a bad solution should be close to 1, and the population update comes near to the full-update mode. This can accelerate the genetic algorithm's convergence process, climb over local extremum obstacles, and guide the search direction towards the global optimum solution. In the later period, α is smaller. We perform an annealing operation on the crossed and mutated individuals, which is beneficial not only to improving the capacity of searching for the global optimum but also to preserving excellent individuals. The population update is close to the covering-update mode and gradually turns into a greedy search method until the global optimal solution is eventually found. So the replacing probability is set as exp(−θΔf/α), where θ > 0; if α ≥ 1 then θ = α + 1, else θ = 1.
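A sketch of this reward and punishment rule, assuming a larger-is-better fitness and the variance form of α reconstructed in formula (3); the helper names are illustrative.

```python
import numpy as np

def environment_variable(fitness_values):
    """alpha = (1/k) * sum_i (f_i - mean(f))^2, i.e. the fitness variance of
    the current population, as in formula (3) above."""
    f = np.asarray(fitness_values, dtype=float)
    return float(np.mean((f - f.mean()) ** 2))

def accept_offspring(parent_fitness, child_fitness, alpha, rng):
    """Reward and punishment rule: a fitter child always replaces its parent;
    a worse child replaces it only with probability exp(-theta*df/alpha)."""
    if child_fitness >= parent_fitness:              # reward: keep the better child
        return True
    theta = alpha + 1.0 if alpha >= 1.0 else 1.0     # theta rule from the text
    delta_f = parent_fitness - child_fitness         # fitness loss, > 0
    p_accept = np.exp(-theta * delta_f / max(alpha, 1e-12))
    return bool(rng.random() < p_accept)             # punishment: Metropolis-style
```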
3. Results and Analysis
Our experimental data is the KDD Cup 1999 data preprocessed by Columbia University. It provides network connection data lasting 9 weeks, acquired from a simulated local network. Each record in this set contains 41-dimensional features and is marked with its category, such as Normal, DoS, U2R, Probing, R2L and so on. We extract the data of the second and fourth weeks, with the test platform being an AMD 3600+, 2 GB, XP operating system.

In order to make the experiment convenient, we take samples from the data set and reduce the number of instances. We respectively apply random sampling to DOS, PROBE, R2L, U2R and NORMAL in the training set to ensure the distribution consistency between the samples and the originals.
[Figure 2 diagram: features after being filtered → encoding → candidate population → genetic operators (selection, crossover, mutation) → reward and punishment mechanism (annealing calculation, Metropolis rule) → termination condition (Y/N)]
Figure 2. Improved GA Flow Diagram
Then we combine the 5 new samples to form the experimental training data sets, including a mixture of NORMAL and DOS, a mixture of NORMAL and PROBE, a mixture of NORMAL and R2L, a mixture of NORMAL and U2R, and a mixture of DOS, PROBE, R2L and U2R. Each training data set has the same instance number of 11701. On these five training data sets we use the feature selection algorithm proposed above and choose the corresponding feature subsets, which respectively contain 5072, 5280 and 10158 records. In the test, we take the 10-times running average as the test result, and then build the intrusion detection model based on all 41 features and on the selected feature subsets in every training data set. At last, we make comparisons in terms of system building time and test accuracy between the intrusion detection system model based on all 41 features and the model based on the selected feature subsets.
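The class-wise random sampling used to keep the sample distribution consistent with the original set could be done roughly as follows; the pandas-based helper and its parameters are assumptions made for illustration.

```python
import pandas as pd

def stratified_sample(df, label_col, n_total, seed=0):
    """Randomly sample about n_total records while keeping each class's share
    the same as in the original data (distribution consistency)."""
    fractions = df[label_col].value_counts(normalize=True)
    parts = [
        df[df[label_col] == cls].sample(n=int(round(frac * n_total)), random_state=seed)
        for cls, frac in fractions.items()
    ]
    return pd.concat(parts).sample(frac=1.0, random_state=seed)  # shuffle rows
```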
The experiment first selects 33 features by the mutual information indexes and then carries out the improved genetic algorithm. Let the crossover probability pc = 0.6, the mutation probability pm = 0.05 and the iteration number 100; according to the algorithm termination conditions, the threshold value is 0.003. Experimental results show that we obtain the final optimal subset of the algorithm when the finally proposed feature number is 18. The results show that, compared with the original-feature-based system, the modeling time and the detecting time both decline by over half, as shown in Figure 3.
Figure 3. Comparison of Modeling and Detecting Time between Original and Optimal Features
Compared with the original features, the detection efficiency has been greatly improved in terms of detection time and false rate, as shown in Table 1.
Table 1. Detection Efficiency Comparison

                    Original features    Optimal features
Feature number      41                   18
Detection rate      98.504%              99.129%
False rate          9.367%               7.459%
In addition, we compare the improved genetic algorithm and the traditional genetic algorithm in convergence speed and convergence precision: although there is no obvious improvement in convergence speed, we first use the mutual information indexes to screen the features and thus make the convergence accuracy more stable. The method is proved to be superior to the traditional GA in terms of detection rate and false rate.
4. Conclusion
This paper utilizes a hybrid feature selection method to simplify intrusion detection features. It first adopts the mutual information index and a sequential forward construction method to eliminate the redundant feature subset. Then, on this basis, it uses the improved genetic algorithm to construct the optimal subset. In the final simulation experiment, it also obtains satisfactory results in modeling time and detection rate.
References
[1] Zave P. Classification of Research Efforts in Requirements Engineering. ACM Computing Surveys. 1997; 29(4): 315-321.
[2] Yu L, Liu H. Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research. 2004; 5(10): 1205-1224.
[3] Das S. Filters, Wrappers and a Boosting-Based Hybrid for Feature Selection. Proceedings of the 18th Conference on Machine Learning. 2001; 74-81.
[4] Xing E, Jordan M, Karp R. Feature selection for high-dimensional genomic microarray data. Proceedings of the Eighteenth International Conference on Machine Learning. 2001; 601-608.
[5] Lu Xin-guo, Lin Ya-ping, Chen Zhi-ping. An Improved Feature Selection Preprocessing Algorithm Based on Mutual Information. Journal of Hunan University (Natural Science). 2005; (2): 104-107.
[6] Zhu Hong-ping, Gong Qing-ge, Lei Zhan-bo. Feature selection of intrusion detection based on genetic algorithm. Application Research of Computers. 2012; 29(4): 1417-1419.
[7] Ruggieri S. Efficient C4.5. IEEE Transactions on Knowledge and Data Engineering. 2002; 14(2): 438-444.
[8] Liu Yan, Han Cheng-de, Wang Yi-he, Li Xiao-ming. The Background of Simulated Annealing and The Monotonic Temperature Rising SA. Journal of Computer Research and Development. 1996; (01): 4-10.
[9] Wu Jian. Unsupervised intrusion detection based on feature selection. Computer Engineering and Applications. 2011; 47(26): 79-82.
[10] Zhang Yong, Cao Dong-xia. Novel improved fuzzy clustering algorithm applied in network intrusion detection. Computer Engineering and Design. 2012; 33(2): 479-483.
[11] Jing Xiao-pei, Wang Hou-xiang, Nie Kai, Luo Zhi-wei. Feature Selection Algorithm Based on IMGA and MKSVM to Intrusion Detection. Computer Science. 2012; 39(7): 97-99.
[12] Chen Wen, Zhao Yongjiu, Jun Zhouxiao. Compact and wide upper-stopband triple-mode broadband microstrip BPF. Telkomnika. 2012; 10(2): 353-358.
[13] Li Jun-Fang, Zhang Bu-Han. Limitation of small-world network topology for application in non-dominated sorting differential evolution algorithm. Telkomnika. 2012; 10(2): 400-408.