International Journal of Electrical and Computer Engineering (IJECE)
Vol. 5, No. 6, December 2015, pp. 1516~1524
ISSN: 2088-8708
Journal homepage: http://iaesjournal.com/online/index.php/IJECE
Feature Reduction in Clinical Data Classification using Augmented Genetic Algorithm

Srividya Sivasankar, Sruthi Nair, M.V. Judy
Department of Computer Science & I.T, Amrita School of Arts & Sciences, Kochi, Amrita Vishwa Vidyapeetham
Article Info

Article history:
Received Apr 27, 2015
Revised Aug 11, 2015
Accepted Aug 30, 2015

ABSTRACT
In clinical data, we have a large set of diagnostic features and recorded details of patients for certain diseases. In a clinical environment, a doctor reaches a treatment decision based on his theoretical knowledge, information obtained from patients, and the clinical reports of the patient. It is very difficult to work with huge data in machine learning; hence, to reduce the data, feature reduction is applied. Feature reduction has gained interest in many research areas which deal with machine learning and data mining, because it enhances classifiers in terms of faster execution, cost-effectiveness, and accuracy. Using feature reduction, we intend to find the relevant features of the dataset. In this paper, we have analyzed Modified GA (MGA), PCA, and the combination of PCA and the Modified Genetic Algorithm for feature reduction. We have found that the correctly classified rate of the combination of PCA and the Modified Genetic Algorithm is higher compared to the other feature reduction methods.
Keyword:
Classification
Feature reduction
Genetic algorithm
PCA
Copyright © 2015 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
M.V. Judy,
Department of Computer Science & I.T,
Amrita School of Arts & Sciences, Amrita Vishwa Vidyapeetham.
Email: judy.nair@gmail.com
1. INTRODUCTION
In a clinical environment, a doctor makes a medical diagnosis based on his medical expertise, the symptoms of a patient, and the patient's test reports. Medical diagnosis is a critical task which demands high precision and leaves no room for errors. Redundancy in hospital records, negligence of other medical conditions, and ambiguous responses from the patient can result in a delay in diagnosis or perhaps even a wrong diagnosis. To improve the accuracy of diagnosis for effective treatment, we propose a machine learning algorithm to analyze whether a patient tests positive or negative for a certain disease. Since the data we deal with is too large and complex, we first have to reduce the data using feature reduction techniques like the Genetic Algorithm (GA), Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Canonical Correlation Analysis (CCA), etc. The most common techniques among these are GA and PCA.
Feature reduction is the process of removing redundant or irrelevant data from the original dataset. With the help of feature reduction, the execution time of classification is considerably reduced, and an increased accuracy rate is obtained due to the removal of redundant and noisy data. Retaining all the unwanted attributes during the training process consumes a lot of memory, storage space, and CPU resources; to overcome this problem, feature reduction is performed. Clinical data sets define a standard set of information that is generated from care records by an organization or system that captures the data. Clinical data are a primary resource for most health and medical research. Clinical data can be collected either during the course of ongoing patient care or through formal clinical trials.
Clinical data fall into the following categories:
1) Electronic health record
2) Administrative data
3) Claims data
4) Disease data
5) Health survey
6) Clinical trial data
In our paper, we deal with an electronic health record, which consists of medical details of the patient. Clinical data are a combination of different attribute types. While comparing clinical data with normal data, we observe that the latter is mostly composed of a single data type while the former is a combination of attribute types.
In recent decades, considerable research has been done in machine learning and data mining techniques. Among these, GA is found most useful in medical knowledge discovery. GA is a search methodology developed using the principle of natural selection. In these papers, a genetic algorithm was used to identify the key diagnostic features and classify whether the patient is suffering from a particular disease or not [1], [2].
Jin-Xing Hao, Yan Yu, Rob Law and Davis Ka Chio Fong proposed a genetic algorithm based learning approach to understand customer satisfaction [5]. Abdulhamit Subasi and M. Ismail Gursoy proposed a PCA feature reduction system and used this data for EEG classification [6].
In the proposed system, Genetic Algorithm chromosome encoding is done in value format. We also use PCA and GA for feature reduction; PCA is used for identifying the patterns in the data and highlighting the similarities and differences in the data. The advantage of PCA is that once we have found the pattern, we reduce the number of dimensions (the result that we obtain from PCA retains all the properties of the original dataset).
In this study, we have analyzed PCA, Modified GA, and the combination of PCA and Modified GA for feature reduction, which helps to determine the most essential features required for classification. Our aim is to find the key features, determine which is the best method for feature selection, and identify which reduction method has a higher accuracy rate. The Genetic Algorithm has been modified by including a Modified Keep Best Reproduction Strategy. When using PCA and Modified GA for feature reduction, first the dataset is reduced via PCA, followed by Modified GA. The resultant dataset is then classified.
In our research, we took five datasets: dataset1 (Colon Cancer) from the Bioinformatics group research repository, and dataset2 (Breast Cancer Wisconsin (Original) Data Set), dataset3 (Diabetic Retinopathy Debrecen Dataset), dataset4 (Indian Liver Patient Dataset (ILPD)) and dataset5 (Fertility) from the UCI repository. Dataset1 consists of 2001 attributes including class; dataset2, dataset3, dataset4 and dataset5 contain 10, 20, 10, and 10 attributes including class, respectively.
We applied PCA, MGA and a combination of PCA and Modified GA to these datasets. Dataset1 has been reduced to 849 attributes using MGA, 31 attributes using PCA and 31 attributes using the combination of PCA and Modified GA. For dataset2, 10 attributes have been reduced to 8 attributes using both PCA and Modified GA, and to 7 attributes using the combined PCA and Modified GA. For dataset3, attributes have been reduced to 9 using PCA, 6 attributes using the combination of PCA and Modified GA, and 10 attributes using Modified GA. For dataset4, attributes have been reduced to 7 using PCA, 3 using the combination of PCA and Modified GA, and 6 using MGA. For dataset5, attributes have been reduced to 9 and 10 attributes using PCA and MGA respectively, and to 6 using the combination of PCA and Modified GA.
2. DIMENSIONALITY REDUCTION
Dimensionality reduction is the process of reducing the number of attributes using methods such as aggregating, eliminating redundant features, or clustering. Dimensionality can be reduced by redesigning the features, selecting an appropriate subset among the existing features, or combining existing features. Dimensionality reduction can be divided into two types: feature selection and feature extraction.
The feature reduction process removes redundant or irrelevant features from the original dataset. The execution time, classification accuracy, and understandability of the feature-reduced data set improve, and the cost of handling the smaller dataset is comparatively low. The irrelevant features can also include noisy data, which may have a negative impact on classification accuracy.
The feature selection algorithms can be grouped into 3 categories: filter, wrapper, and embedded. The wrapper model depends on a classification or clustering algorithm; examples of these methods are the genetic algorithm and the recursive feature elimination algorithm. Filter feature selection algorithms are those which are independent of the classifiers. Embedded models perform feature selection during the learning process [3]. Feature selection for classification can be achieved using association and correlation mechanisms [4]. Another approach proposed is to use an apriori rule generation algorithm together with correlation of attributes to find closely related attributes [4].
In the feature extraction method, the original set of attributes is transformed into a new set of attributes; an example of feature extraction is PCA.
2.1. Genetic Algorithm
The genetic algorithm comes under evolutionary algorithms. GA can be used for a variety of search and optimization problems. Initially, GA-based learning was used in two different approaches: the Pitt approach and the Michigan approach [7]. GA can be used for pattern recognition.
Two methods for applying GA to pattern recognition are:
1) Use GA as a classifier directly in computation
2) Use GA to compute the results.
Another area where GA can be used is for selecting the prototypes in case-based classification [8].
In GA, solutions to the problem are encoded as chromosomes, and a group of chromosomes is known as a population. Chromosomes are sets of genes, and the possible values of the genes are known as alleles. The first generation of chromosomes is called the parent generation. The fitness function is applied to the chromosomes to measure their closeness to the solution. Chromosomes with the highest fitness value are used to generate offspring. Offspring can be of different types: the parent with the best fitness value will automatically survive to the next generation, or two randomly selected parents are taken to generate offspring via different techniques like one-point crossover, two-point crossover, uniform and non-uniform crossover, etc., or by mutating a single parent chromosome. The algorithm stops when it reaches some threshold. At the end of the last evolution of the algorithm, the best chromosome in the population will be the output.
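The generational loop described above can be sketched in a few lines of Python. This is a generic illustration on a toy "one-max" fitness function (maximize the number of 1-genes), not the authors' implementation:

```python
import random

random.seed(0)

def genetic_algorithm(fitness, n_genes, pop_size=20, generations=40,
                      crossover_rate=0.8, mutation_rate=0.05):
    # Initial (parent) generation: random binary chromosomes.
    pop = [[random.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        next_pop = [ranked[0]]           # the fittest parent survives automatically
        while len(next_pop) < pop_size:
            p1, p2 = random.sample(ranked[:pop_size // 2], 2)  # select two parents
            if random.random() < crossover_rate:               # one-point crossover
                cut = random.randint(1, n_genes - 1)
                child = p1[:cut] + p2[cut:]
            else:                                              # or copy one parent
                child = p1[:]
            # Mutation: flip each gene with a small probability.
            child = [g ^ 1 if random.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)         # best chromosome of the last generation

# Toy fitness: count of 1-genes ("one-max"); the optimum is all ones.
best = genetic_algorithm(fitness=sum, n_genes=12)
```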
GA uses 3 main types of rules at each step to create the next generation from the current population:
1) Selection rules
2) Crossover rules
3) Mutation rules
Mostly the values of genes will be binary, but it is not necessary that they always be binary values. The usage of binary values varies with the problems and the techniques used in representing chromosomes. For example, binary values in a chromosome can be used to represent the absence or presence of features; 1 represents the presence of the attribute and 0 represents its absence, e.g., 1000111. This binary coded value represents a total of 7 attributes, of which the first and the last three attributes are considered to find the solutions.
Darwinian evolution and natural selection led to a number of models for solution optimization. GA is one subset of these evolution-based optimization techniques, focusing on the application of selection, mutation, and recombination in computing problem solutions. Since GA is parallel and iterative, it is most successfully used in fields like optimization problems, including many pattern recognition and classification tasks.
Feature reduction using GA is well-matched to the optimization problem. We have five clinical datasets, where dataset1 consists of 2001 attributes including class, and dataset2, dataset3, dataset4 and dataset5 contain 10, 20, 10, and 10 attributes including class.
Our intention is to reduce the dataset by eliminating unwanted and replicated fields, i.e., given a dataset of n-dimensional input patterns, our task is to use GA to transform the data into m dimensions, where m is less than n (m<n), while maximizing the set of optimization criteria. The transformed data reduced using GA are evaluated based on the dimensionality and either the class or the correctly classified rate.
2.1.1. GA Based Feature Reduction
In order to compute the feature transform matrix, GA maintains populations to evaluate this matrix. The input patterns are multiplied by the matrix to produce a set of transformed data, which are then sent to the classifier. The samples obtained are divided into a training set and a testing set, in which the training set is used for training the classifier and the testing set is used to calculate the accuracy. The accuracy rate is passed to the GA in order to measure the quality of the transformation used. GA searches for minimizing the dimensionality of the transformed data while maximizing the accuracy of the classifier.
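The evaluation step above can be illustrated with a small sketch. A nearest-centroid classifier stands in for the real classifier, and the fitness rewards testing-set accuracy while penalizing each kept attribute; all names and the penalty weight here are illustrative assumptions, not the authors' code:

```python
import random

def fitness(mask, train, test):
    """Accuracy of a nearest-centroid classifier on the masked attributes,
    minus a small penalty per kept attribute (illustrative weight 0.01)."""
    keep = [i for i, bit in enumerate(mask) if bit]
    if not keep:
        return 0.0
    # "Transform" the training patterns: here simply project onto kept attributes.
    groups = {}
    for x, label in train:
        groups.setdefault(label, []).append([x[i] for i in keep])
    centroids = {lab: [sum(col) / len(rows) for col in zip(*rows)]
                 for lab, rows in groups.items()}
    def predict(x):
        p = [x[i] for i in keep]
        return min(centroids,
                   key=lambda lab: sum((a - b) ** 2
                                       for a, b in zip(p, centroids[lab])))
    correct = sum(predict(x) == label for x, label in test)
    return correct / len(test) - 0.01 * len(keep)

# Toy data: attribute 0 equals the class label, attribute 1 is pure noise.
random.seed(1)
train = [([i % 2, random.random()], i % 2) for i in range(20)]
test = [([i % 2, random.random()], i % 2) for i in range(10)]
score = fitness([1, 0], train, test)   # perfect accuracy, one attribute kept
```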
A direct approach to feature selection was introduced by Siedlecki and Sklansky [12]. In this work, GA is used to find the optimal binary vector, where each bit refers to an attribute; in this binary vector, 1 refers to the participation of the attribute in the classifier and 0 refers to non-participation, and the resultant feature subset is evaluated using the accuracy of the classifier. The GA feature selection has been extended to include a binary masking vector along with the feature weight vector in the chromosome. A mask value of 0 defines non-participation of the attribute in classification; if the value is 1, the field is measured according to the weight
value and included in the classifier. The incorporation of the mask vector allows the GA to rapidly sample features while simultaneously optimizing the scale factor for feature inclusion.
For each feature, a weight value and one or more masking values are assigned. The majority masking value is taken to decide whether the feature is selected or not. The weight vector is used while calculating the fitness value. These vectors are introduced to smooth the GA. When a k-NN classifier is used, the k value is also encoded while encoding the chromosome [13].
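A toy illustration of the majority-mask rule just described (function name hypothetical): each feature carries several mask bits, and it is selected only when most of them are 1.

```python
def select_features(mask_bits_per_feature):
    """Keep a feature when the majority of its masking values are 1."""
    return [sum(bits) * 2 > len(bits) for bits in mask_bits_per_feature]

# Three mask bits per feature for a 4-feature chromosome.
chromosome_masks = [[1, 1, 0], [0, 0, 1], [1, 1, 1], [0, 1, 0]]
selected = select_features(chromosome_masks)  # [True, False, True, False]
```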
Advantages of the genetic algorithm are:
1) GA can solve optimization problems which can be described with the chromosome encoding
2) GA helps solve problems with numerous solutions
3) GA is not dependent on the error surface; it helps us to solve multi-dimensional, non-differential, non-continuous, and even non-parametrical problems.
4) Like PCA, GA does not demand knowledge of mathematics; GA is a method which is very easy to understand.
5) GA is easily reassigned to existing simulations and models.
Figure 1. Genetic Algorithm process
2.2. Principal Component Analysis
PCA is a method that is commonly used for feature reduction. It is used for identifying the pattern in the data and highlights the similarities and differences. The main advantage of PCA is that once we have found the pattern, we reduce the number of dimensions of the dataset.
Method:
Step 1: Identify the dataset
We have used a clinical cancer dataset which consists of 2000 variables and a class label which decides whether the patient is Normal or has a Tumor. Our dataset consists of 63 instances.
Step 2: Subtract the Mean from each Dimension
In the second step, the mean x̄ of each dimension is calculated. This mean is then subtracted from each x_i; this produces a dataset whose mean is zero.
Step 3: Covariance Matrix Is Calculated
Covariance is measured in multiple dimensions. If we calculate the covariance in one dimension, we obtain the variance. Consider a 3-dimensional dataset (x, y, z), for which we have to find cov(x, y), cov(x, z), and cov(y, z). The equation for covariance is given by:

Cov(x, y) = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / (n − 1)    (1)
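Equation (1) can be checked with a short pure-Python helper:

```python
def cov(x, y):
    # Sample covariance per equation (1): sum((x_i - x̄)(y_i - ȳ)) / (n - 1)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]   # y increases together with x -> positive covariance
print(cov(x, y))            # cov(x, x) would give the variance of x
```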
In a covariance matrix, if all the non-diagonal elements have positive values, it means that the X, Y, Z variables increase together.
Step 4: Eigenvectors and Eigenvalues of the Covariance Matrix are calculated
Here we calculate the eigenvalues and eigenvectors from the covariance matrix that we obtained in the previous step.
Step 5: Deriving the new dataset
In the final step of PCA, the required eigenvectors are chosen from the new dataset obtained from the previous steps. The transpose of the feature vector and the new dataset is taken and multiplication is performed.
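Steps 2-5 can be sketched with NumPy. This is a generic PCA illustration on synthetic data, not the WEKA procedure the authors actually used:

```python
import numpy as np

def pca_reduce(data, k):
    """Steps 2-5 above: center the data, build the covariance matrix,
    take its top-k eigenvectors, and project the data onto them."""
    centered = data - data.mean(axis=0)               # Step 2: subtract the mean
    cov_mat = np.cov(centered, rowvar=False)          # Step 3: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov_mat)        # Step 4: eigen decomposition
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # keep the k largest components
    return centered @ top                             # Step 5: derive the new dataset

# Toy 3-D dataset whose variance lies mostly along a single direction.
rng = np.random.default_rng(0)
t = rng.normal(size=50)
data = np.column_stack([t, 2 * t + 0.01 * rng.normal(size=50),
                        0.01 * rng.normal(size=50)])
reduced = pca_reduce(data, k=1)   # 50 samples, each now a single dimension
```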
We have used WEKA for performing PCA. First, we load the dataset into WEKA, then normalize the data using pre-processing; after that, we perform the feature reduction, and the resultant dataset is taken for classification.
Advantages of PCA are:
1) PCA allows us to decouple the feature space
2) PCA is a robust method for image design and data patterns; with the help of PCA, similarities and differences between them are efficiently identified.
3) Using PCA, the dimension can be reduced by eliminating redundant information without much loss of the original data.
4) Data can be remodeled and mapped from high-dimensional to low-dimensional space. The low-dimensional space can be resolved using eigenvectors of the covariance matrix.
3. DATA TRANSFORMATION AND NORMALIZATION
Measurement units can affect the data analysis. For example, changing the measurement unit of height from inches to meters may lead to very different results. In general, expressing an attribute in smaller units will lead to a larger range for that attribute, and thus tend to give such an attribute greater effect or "weight". To help avoid dependence on the choice of measurement unit, the data should be normalized or standardized.
Different normalization techniques are:
1) Min-Max Normalization
2) Nominal to Binary
3) Z-score
4) Decimal Scaling
3.1. Min-Max Normalization
Min-max normalization performs a linear transformation of the data to a new value which fits in the interval [new_min_A, new_max_A].

v′ = ((v − min_A) / (max_A − min_A)) * (new_max_A − new_min_A) + new_min_A    (2)
3.2. Nominal to Binary
Nominal values are converted into binary values.
3.3. Z-Score
Z-score normalization normalizes the values based on the mean and standard deviation. This method is used when the min and max values are unknown or when there are outliers that influence min-max normalization.

v′ = (v − Ā) / σ_A    (3)
3.4. Decimal Scaling
Decimal scaling normalization normalizes by moving the decimal point of the values of attribute A.

v′ = v / 10^j    (4)

where j is the smallest integer such that max(|v′|) < 1.
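Equations (2)-(4) can be sketched as small Python helpers (the function names are ours):

```python
def min_max(v, vmin, vmax, new_min=0.0, new_max=1.0):
    # Equation (2): linear map of v from [vmin, vmax] into [new_min, new_max]
    return (v - vmin) / (vmax - vmin) * (new_max - new_min) + new_min

def z_score(values):
    # Equation (3): (v - mean) / standard deviation
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

def decimal_scale(values):
    # Equation (4): v / 10^j with the smallest j making every |v'| < 1
    j = 0
    while max(abs(v) for v in values) / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]

heights = [150.0, 160.0, 170.0, 180.0]
print(min_max(160.0, 150.0, 180.0))   # 1/3 of the way through the range
print(decimal_scale(heights))          # j = 3 -> [0.15, 0.16, 0.17, 0.18]
```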
4. PROPOSED SYSTEM
It is known that GA has proven its efficiency in feature reduction. We have augmented two innovations to further improve the efficiency of GA: Modified GA (MGA) and the combination of PCA and MGA, i.e., here we combine the best features of PCA and then combine them with MGA.
4.1. Modified GA (MGA)
In our proposed system, we modified the GA by including a Modified Keep Best Reproduction Strategy (MKBR); a midway selection strategy has been used at the end of each generation [14]. The MKBR strategy is a modified version of the Keep Best Reproduction Strategy (KBR), which overcomes the risk in KBR. In KBR, the best of the two offspring is selected, and the other is replaced by the best parent. Here there is a risk of losing an offspring that is better than the next pair of parents. MKBR uses additional selection degrees for determining the survival of parent and offspring.
The Modified Genetic Algorithm with Modified Keep-Best Reproduction is given below:
2. C
o
mbi
n
at
i
o
n of
PC
A an
d
M
G
A
We propose a system in which the dataset is first preprocessed using a normalization method; to this we then apply PCA. After applying PCA, MGA is applied to the resultant dataset to increase the accuracy.
Figure 2. Combination of PCA and MGA for feature reduction
The dataset is first normalized. This is done to remove noisy data present in the dataset. Then PCA, a dimensionality reduction technique, is applied to the normalized dataset. PCA reduces the number of attributes to a fewer number of attributes. The Modified Genetic Algorithm is applied to the resultant dataset. MGA selects only the attributes that satisfy the fitness function, resulting in a subset of the dataset. These subsets are then classified and the correctly classified rate is computed.
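The whole flow just described can be imitated on toy data: min-max normalization and PCA follow the earlier sections, while an exhaustive mask search over the principal components stands in for the MGA step (the real system evolves these masks with MKBR and runs on the clinical datasets):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labeled data: 40 samples, 5 attributes; the class depends on attribute 0.
X = rng.normal(size=(40, 5))
y = (X[:, 0] > 0).astype(int)

# 1) Normalize the dataset (min-max per attribute) to remove scale effects.
Xn = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# 2) PCA: project onto the top-3 eigenvectors of the covariance matrix.
Xc = Xn - Xn.mean(axis=0)
vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
Xp = Xc @ vecs[:, np.argsort(vals)[::-1][:3]]

# 3) Stand-in for MGA: pick the component subset whose nearest-centroid
#    classifier gives the best correctly classified rate.
def classified_rate(cols):
    Z = Xp[:, cols]
    c0, c1 = Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)
    pred = (((Z - c1) ** 2).sum(axis=1) < ((Z - c0) ** 2).sum(axis=1)).astype(int)
    return float((pred == y).mean())

masks = [[0], [1], [2], [0, 1], [0, 2], [1, 2], [0, 1, 2]]
best_mask = max(masks, key=classified_rate)
rate = classified_rate(best_mask)
```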
5. RESULTS AND ANALYSIS
We have used three types of feature reduction methods, PCA, the combination of PCA and MGA, and the Modified Genetic Algorithm, and the resulting datasets are taken for classification. Feature reduction is done in order to increase the accuracy on the data sets. Before feature reduction, the accuracies of dataset1, dataset2, dataset3, dataset4 and dataset5 were 82.2%, 53.2%, 56%, 55.7% and 85%.
5.1. Result
We have obtained classification accuracies of 72.5%, 94.1%, 58%, 57.5% and 86% for the datasets using PCA. The datasets yielded classification accuracies of 74.1%, 96.5%, 59%, 70% and 85% for MGA. An improved classification accuracy has been obtained using the combination of PCA and MGA for all the datasets. The improved accuracies for the datasets are 83.2%, 97.2%, 66%, 71% and 88%.
Table 1. Accuracy using various feature reduction methods

Classification accuracy using feature reduction:

            PCA      PCA+MGA   MGA
Dataset1    72.5%    83.2%     74.1%
Dataset2    94.1%    97.2%     96.5%
Dataset3    58%      66%       59%
Dataset4    57.5%    71%       70%
Dataset5    86%      88%       85%
Figure 3. Graphical representation of classified datasets without feature reduction

Figure 4. Graphical representation of classified datasets using PCA
Figure 5. Graphical representation of classified datasets using MGA

Figure 6. Graphical representation of classified datasets using combination of PCA and Modified GA
From the experiment, we conclude that the combination of PCA and modified GA has a higher accuracy rate compared to the other feature reduction methods.
6. CONCLUSION
In this paper, we use three different types of feature reduction methods, namely PCA, MGA and the combination of PCA and MGA, to identify the key factors for the diagnosis of diseases. The above-mentioned feature reduction methods are applied to the datasets, and the accuracy has been computed. The correctly classified rates using PCA on the datasets are 72.5%, 94.1%, 58%, 57.5% and 86%. The datasets yielded classification accuracies of 74.1%, 96.5%, 59%, 70% and 85% for MGA. An improved classification accuracy has been obtained using the combination of PCA and MGA for all datasets. The datasets showed increased accuracies of 83.2%, 97.2%, 66%, 71% and 88%. From the results, we conclude that the combination of PCA and MGA has a higher accuracy rate compared to the others.
ACKNOWLEDGEMENTS
This work is supported by the DST Funded Project (SR/CSI/81/2011) under the Cognitive Science Research Initiative in the Department of Computer Science, Amrita School of Arts and Sciences, Amrita Vishwa Vidyapeetham University, Kochi.
REFERENCES
[1] Yohannes Kassahun, Roberta Perrone, Elena De Momi, Eimar Berghofer, Laura Tassi, Maria Paola Canevini, Roberto Spreafico, Giancarlo Ferrigno, and Frank Kirchner. "Automatic classification of epilepsy types using ontology-based and genetic-based machine learning", Artificial Intelligence in Medicine, Vol. 61, No. 2, pp. 79-88, 2014.
[2] Hongmei Yan, Jun Zheng, Yingtao Jiang, Chenglin Peng, and Shozhoug Xia. "Selecting critical clinical features for heart diseases diagnosis with a real-coded genetic algorithm", Applied Soft Computing, Vol. 8, No. 2, pp. 1105-1111, 2008.
[3] Esra Mahsereci Karabulut, Selma Ayse Ozel, and Turgay Ibrikci. "A Comparative Study on the effect of feature selection on classification accuracy", Procedia Technology, Vol. 1, pp. 323-327, 2012.
[4] Prof. K. Rajeswari, Dr. V. Vaithiyanathan, and Shailaja V. Pede. "Feature Selection for Classification in Medical Data Mining", International Journal of Emerging Trends and Technology in Computer Science, Vol. 2, No. 2, pp. 492-497, 2013.
[5] Jin-Xing Hao, Yan Yu, Rob Law, and Davis Ka Chio Fong. "A genetic algorithm-based learning approach to understand customer satisfaction with OTA websites", Tourism Management, Vol. 46, pp. 231-241, 2015.
[6] Abdulhamit Subasi, and M. Ismail Gursoy. "EEG signal classification using PCA, ICA, LDA and support vector machines", Expert Systems with Applications, Vol. 37, No. 12, pp. 8659-8666, 2010.
[7] Ian W. Flockhart, and Nicholas J. Radcliffe. "A genetic algorithm-based approach to data mining", KDD-96 Proceedings, pp. 299-302, 1996.
[8] Gunjan Verma, and Vineeta Verma. "Role and application of genetic algorithm in data mining", International Journal of Computer Applications, Vol. 48, No. 17, pp. 5-8, 2012.
[9] Xuechuan Wang, and Kuldip K. Paliwal. "Feature Extraction and Dimensionality Reduction Algorithms and their Application in Vowel Recognition", Pattern Recognition, Vol. 36, No. 10, pp. 2429-2439, 2003.
[10] Rashedur M. Rahman, and Fazle Rabbi Md. Hasan. "Using and Comparing different decision tree classification techniques for data mining ICDDR,B Hospital Surveillance data", Expert Systems with Applications, Vol. 38, No. 9, pp. 11421-11436, 2011.
[11] D. Nithya, V. Suganya, and R. Saranya Irudayan Mary. "Feature Selection using Integer and Binary Coded Genetic Algorithm to improve the performance of SVM classifier", Journal of Computer Applications, Vol. 6, No. 3, pp. 57-61, 2013.
[12] W. Siedlecki, and J. Sklansky. "A note on genetic algorithms for large scale feature selection", Pattern Recognition Letters, Vol. 10, No. 5, pp. 335-347, 1989.
[13] Michael L. Raymer, William F. Punch, Erik D. Goodman, Leslie A. Kuhn, and Anil K. Jain. "Dimensionality Reduction using Genetic algorithms", IEEE Transactions on Evolutionary Computation, Vol. 4, No. 2, pp. 164-171, 2000.
[14] M. V. Judy, and K. S. Ravichandran. "A Solution to protein folding problem using a Genetic Algorithm with modified keep best reproduction strategy", Evolutionary Computation, pp. 4776-4780, 2007.
BIOGRAPHIES OF AUTHORS

Srividya Sivasankar post-graduated in Master of Computer Applications from Amrita Vishwa Vidyapeetham in 2015. Her areas of interest include data mining.

Sruthi Nair post-graduated in Master of Computer Applications from Amrita Vishwa Vidyapeetham in 2015. Her areas of interest include programming.

Dr. M. V. Judy, PhD in Computer Science, is an Associate Professor and Head of the Department of CS and IT at Amrita School of Arts and Sciences, Amrita Vishwa Vidyapeetham, Kochi. She is the Principal Investigator of a project under the Department of Science and Technology (DST), Government of India. Her research interests include computational biology, machine learning and data analytics.