TELKOMNIKA, Vol. 11, No. 10, October 2013, pp. 6173 ~ 6178
ISSN: 2302-4046
Received March 1, 2013; Revised July 14, 2013; Accepted July 24, 2013
Weighted K-Nearest Neighbor Classification Algorithm
Based on Genetic Algorithm
Xuesong Yan^1, Wei Li^1, Wei Chen^1, Wenjing Luo^1, Can Zhang^1, Qinghua Wu^2,3*, Hammin Liu^4
1 School of Computer Science, China University of Geosciences, Wuhan, China
2 Hubei Provincial Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan, China
3 School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan, China
4 Wuhan Institute of Shipbuilding Technology, Wuhan, China
*Corresponding author, e-mail: wuqinghua@sina.com
Abstract
K-Nearest Neighbor (KNN) is one of the most popular algorithms for data classification. Many researchers have found that the KNN algorithm achieves very good performance in their experiments on different datasets. The traditional KNN text classification algorithm has limitations: high calculation complexity, performance that depends solely on the training set, and so on. To overcome these limitations, an improved version of KNN is proposed in this paper: we combine a genetic algorithm with weighted KNN to improve its classification performance. The experimental results show that our proposed algorithm outperforms KNN with greater accuracy.
Keywords: data mining, k-nearest neighbor, weighted k-nearest neighbor, genetic algorithm
Copyright © 2013 Universitas Ahmad Dahlan. All rights reserved.
1. Introduction
Data mining (DM) is a rapidly evolving art and science of discovering and exploiting new, useful, and profitable relationships in data, and it is awakening great interest in topics such as decision making, information management, medical diagnosis, performance prediction, and many other applications [1]. Classification is a well-recognized DM task and it has been studied extensively in the fields of statistics, pattern recognition, decision theory, machine learning literature, neural networks, etc. The classification operation usually uses supervised learning methods that induce a classification model from a database. The objective is to predict the class that an example belongs to.

Nearest neighbor (NN) search is one of the most popular learning and classification techniques, introduced by Fix and Hodges [2], and it has been proved to be a simple and powerful recognition algorithm. Cover and Hart [3] showed that the decision rule performs well considering that no explicit knowledge of the data is available. A simple generalization of this method is called the K-NN rule, in which a new pattern is classified into the class with the most members present among its K nearest neighbors. It can be used to obtain good estimates of the Bayes error, and its probability of error asymptotically approaches the Bayes error [4].

Genetic Algorithms (GAs) are stochastic search methods that have been inspired by the process of biological evolution [5]. Because of GAs' robustness and their uniform approach to a large number of different classes of problems, they have been used in many applications. Data mining is also one of the important application fields of GAs. In data mining, a GA can be used either to optimize parameters for other kinds of data mining algorithms or to discover knowledge by itself. In the latter task, the rules that a GA finds are usually more general because of its global search nature. In contrast, most other data mining methods are based on the rule induction paradigm, where the algorithm usually performs a kind of local search. The advantages of GAs become more obvious when the search space of a task is large.
2. K-Nearest Neighbor Algorithm
In the pattern recognition field, KNN is one of the most important non-parametric algorithms [6] and it is a supervised learning algorithm. The classification rules are generated by the training samples themselves without any additional data. The KNN classification algorithm predicts the test sample's category according to the K training samples which are the nearest neighbors to the test sample, and assigns it to the category which has the largest category probability. The process of the KNN algorithm to classify a sample X is [7]:

1. Suppose there are j training categories $C_1, C_2, \ldots, C_j$ and the total number of training samples is N. After feature reduction, the samples become m-dimensional feature vectors;

2. Make sample X take the same feature-vector form $(X_1, X_2, \ldots, X_m)$ as all training samples;

3. Calculate the similarities between all training samples and X. Taking the i-th sample $d_i = (d_{i1}, d_{i2}, \ldots, d_{im})$ as an example, the similarity $SIM(X, d_i)$ is as follows:

$SIM(X, d_i) = \dfrac{\sum_{j=1}^{m} X_j \cdot d_{ij}}{\sqrt{\sum_{j=1}^{m} X_j^2} \cdot \sqrt{\sum_{j=1}^{m} d_{ij}^2}}$;

4. Choose the k samples with the largest similarities among the N similarities $SIM(X, d_i)$, $i = 1, 2, \ldots, N$, and treat them as the KNN collection of X. Then, calculate the probability of X belonging to each category respectively with the following formula:

$P(X, C_j) = \sum_{d_i \in \mathrm{KNN}} SIM(X, d_i) \cdot y(d_i, C_j)$,

where $y(d_i, C_j)$ is a category attribute function, which satisfies

$y(d_i, C_j) = \begin{cases} 1, & d_i \in C_j \\ 0, & d_i \notin C_j \end{cases}$;

5. Judge sample X to be the category which has the largest $P(X, C_j)$.
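The five steps above can be sketched in Python (a minimal illustrative sketch, not the authors' implementation; the tiny two-feature training set is invented for the example):

```python
import math
from collections import defaultdict

def cosine_sim(x, d):
    """Step 3: SIM(X, d_i) -- cosine similarity between vectors x and d."""
    dot = sum(xj * dj for xj, dj in zip(x, d))
    norm = math.sqrt(sum(xj * xj for xj in x)) * math.sqrt(sum(dj * dj for dj in d))
    return dot / norm if norm else 0.0

def knn_classify(x, training, k=3):
    """training: list of (feature_vector, category) pairs."""
    # Steps 3-4: rank all N training samples by similarity and keep the k largest.
    sims = sorted(((cosine_sim(x, d), c) for d, c in training), reverse=True)
    # Step 4: P(X, C_j) = sum of similarities of the k neighbors in category C_j.
    p = defaultdict(float)
    for sim, c in sims[:k]:
        p[c] += sim
    # Step 5: assign x to the category with the largest P(X, C_j).
    return max(p, key=p.get)

# invented toy data: two 2-dimensional categories
training = [([1.0, 0.0], "A"), ([0.9, 0.1], "A"),
            ([0.0, 1.0], "B"), ([0.1, 0.9], "B")]
print(knn_classify([0.95, 0.05], training, k=3))  # prints A
```
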
The traditional KNN text classification has three limitations [8]:

1. High calculation complexity: to find the k nearest neighbor samples, all the similarities between the training samples and the test sample must be calculated. When the number of training samples is small, the KNN classifier is no longer optimal, but if the training set contains a huge number of samples, the KNN classifier needs more time to calculate the similarities. This problem can be solved in three ways: reducing the dimensions of the feature space; using smaller datasets; or using an improved algorithm which can accelerate the calculation [9];

2. Dependency on the training set: the classifier is generated only from the training samples and does not use any additional data. This makes the algorithm depend on the training set excessively; it needs recalculation even if there is a small change in the training set;

3. No weight difference between samples: all the training samples are treated equally; there is no difference between samples with a small amount of data and those with a huge amount of data. So it does not match the actual phenomenon that samples commonly have an uneven distribution.
3. Weighted K-Nearest Neighbor Algorithm
Conceptually, each feature x, called an instance, is represented as a vector of length $|F|$, the size of the record: $\langle w_1(x), w_2(x), \ldots, w_{|F|}(x) \rangle$, where $w_j(x)$ is the weight of the j-th term. That weight may be set according to different criteria, such as frequency, TF-IDF, or a score assigned to the feature for its capability to divide the examples into some set of classes. The simplest feature weighting function is to assign the same value to each term that occurs in the instance (say, 1) and 0 to those that do not, which amounts to a non-weighted features approach.

The traditional weighted KNN algorithm [10] uses distance as a basis to weight the contribution of each of the k neighbors in the class assignment process, and defines the confidence of data d belonging to class c as:

$\mathrm{Confidence}(c, d) = \dfrac{\sum_{k_i \in K \,\mid\, \mathrm{Class}(k_i) = c} \mathrm{Sim}(k_i, d)}{\sum_{k_i \in K} \mathrm{Sim}(k_i, d)}$,

where $\mathrm{Sim}$ is the value returned by the similarity function used to compare the instance with its neighbors. That is, for each neighbor in the neighbor set K (of size k) belonging to a particular class c, sum up its similarities to data d and divide by the summation of all similarities of the k neighbors with regard to d.

To compare data d with instance i, one can choose the CosSim function, which is particularly simple using the binary term weight approach:

$\mathrm{CosSim}(i, d) = \dfrac{C}{\sqrt{A \cdot B}}$,

where C is the number of terms that i and d share, A is the number of terms in i, and B is the number of terms in d. The neighbor set K of d thus comprises the k instances that rank the highest according to that measure.
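The confidence formula above can be sketched as follows (an illustrative implementation; representing instances as sets of terms for the binary-weight CosSim is our assumption for the example, and the toy term sets are invented):

```python
import math

def cos_sim(i, d):
    """Binary-term cosine similarity CosSim(i, d) = C / sqrt(A * B):
    C shared terms, A = number of terms in i, B = number of terms in d."""
    if not i or not d:
        return 0.0
    return len(i & d) / math.sqrt(len(i) * len(d))

def confidence(c, d, neighbors):
    """Confidence(c, d): similarity mass of the neighbors of class c,
    normalised by the similarity mass of all k neighbors."""
    total = sum(cos_sim(k, d) for k, _ in neighbors)
    in_class = sum(cos_sim(k, d) for k, cls in neighbors if cls == c)
    return in_class / total if total else 0.0

# invented toy instances represented as sets of terms
d = {"w1", "w2", "w3"}
K = [({"w1", "w2"}, "sports"), ({"w2", "w3"}, "sports"), ({"w3", "w4"}, "politics")]
print(round(confidence("sports", d, K), 3))  # prints 0.8
```
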
4. Weighted KNN Algorithm Based on Genetic Algorithm
4.1. Algorithm
The most popular evolutionary model used in current research is Genetic Algorithms, originally developed by John Holland [11]. The GA reproduction operators, such as recombination and mutation, are considered analogous to the biological processes of mutation and crossover respectively in population genetics. The recombination operator is traditionally used as the primary search operator in GA, while the mutation operator is considered to be a background operator, which is applied with a small probability.

Traditionally, GA uses a binary string representation of chromosomes with concentration on the notion of 'schemata'. A schema is a template that allows exploring the similarity among chromosomes. Genetic Algorithms model evolution as a search for structures or building blocks that perform well in a given environment. Therefore, the recombination and mutation operators focus on an individual's structure, not the structure's interpretation. The results of applying reproduction operations in GA generate solutions that share structural similarities with their parents but may have significantly different interpretations. However, many recent applications of GA have used other representations such as graphs, Lisp expressions, ordered lists, and real-valued vectors.

Genetic algorithm pseudo-code:
t = 0;
initialize P(t);
evaluate structures in P(t);
repeat
    t = t + 1;
    select-reproduction C(t) from P(t-1);
    combine and mutate structures in C(t), forming C'(t);
    evaluate structures in C'(t);
    select-replace P(t) from C'(t) and P(t-1);
until ( termination condition satisfied ).
The above code gives the basic algorithmic steps of a GA. After the initial population of individuals is generated (usually randomly) and the individuals' structures are evaluated, the loop is entered. Then a selection buffer C(t) is created to accommodate the selected copies from P(t-1) ("select-reproduction"). In Holland's original GA, individuals are selected probabilistically by assigning each individual a probability proportional to its structural fitness. Thus, better individuals are given more opportunity to produce offspring. Next, the variation operators (mutation and crossover) are applied to the individuals in the C(t) buffer, producing offspring C'(t). After evaluating the structural fitness of C'(t), the selection method is applied to select replacements for P(t) from C'(t) and P(t-1).
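The loop described above can be sketched as a generic routine in Python (an illustrative skeleton, not the paper's configuration: the one-max toy problem, tournament selection, the fixed crossover point, and the elitist replacement variant are all assumptions made for the sketch):

```python
import random

def genetic_algorithm(init_pop, fitness, select, combine, mutate, generations=200):
    """Generic GA loop mirroring the pseudo-code: select-reproduction fills a
    buffer C(t), variation produces C'(t), then select-replacement forms P(t)."""
    pop = list(init_pop)
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        # select-reproduction: fill the buffer C(t) from P(t-1)
        buf = [select(pop, scores) for _ in pop]
        # combine and mutate structures in C(t), forming C'(t)
        children = [mutate(combine(a, b)) for a, b in zip(buf, buf[1:] + buf[:1])]
        child_scores = [fitness(ind) for ind in children]
        # select-replacement: keep the fittest of P(t-1) and C'(t) (elitist variant)
        merged = sorted(zip(scores + child_scores, pop + children), reverse=True)
        scores = [s for s, _ in merged[:len(pop)]]
        pop = [ind for _, ind in merged[:len(pop)]]
    return pop[0], scores[0]

# toy usage: maximize the number of ones in a bit string (one-max)
random.seed(0)
pop = [[random.randint(0, 1) for _ in range(10)] for _ in range(20)]
best, score = genetic_algorithm(
    pop,
    fitness=sum,
    select=lambda p, s: max(random.sample(list(zip(s, p)), 3))[1],  # tournament pick
    combine=lambda a, b: a[:5] + b[5:],     # single-point crossover, fixed midpoint
    mutate=lambda c: [g ^ (random.random() < 0.05) for g in c],     # bit-flip
    generations=50,
)
print(score)
```
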
In general, genetic algorithms are usually used to solve problems with little or no domain knowledge, NP-complete problems, and problems for which a near-optimum solution is sufficient [12, 13]. The GA methods can be applied only if there exists a reasonable time and space for evolution to take place.
String Representation [14] - Here the chromosomes are encoded with real numbers; the number of genes in each chromosome represents the number of features in the training set. Each gene has 5 digits for the vector index. For example, if each record in the data has 5 features, a sample chromosome may look as follows:

00100 10010 00256 01875 00098

Once the initial population is generated, we are ready to apply the genetic operators. With these neighbors, the distance to each sample in the testing set is calculated, and the resulting accuracy is stored as the fitness value of the chromosome.
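The encoding and fitness evaluation can be sketched as follows (the paper specifies only the 5-digit gene layout and accuracy-as-fitness; decoding the genes as per-feature distance weights, and the toy train/test split, are assumptions made for this illustration):

```python
import random

def random_chromosome(n_genes, digits=5):
    """One 5-digit gene per feature, e.g. 00100 10010 00256 01875 00098."""
    return [random.randint(0, 10**digits - 1) for _ in range(n_genes)]

def fitness(chrom, train, test, k=3):
    """Accuracy of a weighted KNN on the test split; the 5-digit genes are
    decoded here as per-feature distance weights (an assumed decoding)."""
    w = [g / 10**5 for g in chrom]
    def dist(a, b):
        return sum(wi * (ai - bi) ** 2 for wi, ai, bi in zip(w, a, b))
    correct = 0
    for x, label in test:
        neighbors = sorted(train, key=lambda s: dist(s[0], x))[:k]
        votes = {}
        for _, c in neighbors:
            votes[c] = votes.get(c, 0) + 1
        correct += max(votes, key=votes.get) == label
    return correct / len(test)

random.seed(1)
chrom = random_chromosome(n_genes=2)
train = [([0, 0], "A"), ([0, 1], "A"), ([5, 5], "B"), ([5, 4], "B")]
test = [([0.2, 0.1], "A"), ([4.8, 4.9], "B")]
print(fitness(chrom, train, test))
```
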
Reproduction (selection) - The selection process selects chromosomes from the mating pool directed by the survival-of-the-fittest concept of natural genetic systems. In the proportional selection strategy adopted in this article, a chromosome is assigned a number of copies, proportional to its fitness in the population, that go into the mating pool for further genetic operations. Roulette wheel selection is one common technique that implements the proportional selection strategy.
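Roulette wheel selection can be sketched as follows (a standard textbook implementation; the toy population and fitness values are invented for the example):

```python
import random

def roulette_select(population, fitnesses):
    """Proportional (roulette-wheel) selection: each chromosome owns a slice
    of the wheel proportional to its fitness; a random spin picks one."""
    r = random.uniform(0, sum(fitnesses))
    cum = 0.0
    for ind, f in zip(population, fitnesses):
        cum += f
        if r <= cum:
            return ind
    return population[-1]  # guard against floating-point round-off

random.seed(42)
pop = ["c1", "c2", "c3"]
fit = [0.1, 0.3, 0.6]
picks = [roulette_select(pop, fit) for _ in range(1000)]
# the fittest chromosome should be drawn most often
print(picks.count("c1"), picks.count("c2"), picks.count("c3"))
```
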
For the genetic operators, crossover and mutation, we use the traditional methods: single-point crossover and random mutation. After the genetic operators are applied, the local maximum fitness value is calculated and compared with the global maximum. If the local maximum is greater than the global maximum, then the global maximum is assigned the local maximum, and the next iteration is continued with the new population. The cluster points will be repositioned corresponding to the chromosome having the global maximum. Otherwise, the next iteration is continued with the same old population. This process is repeated for N iterations.
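The single-point crossover and random mutation operators can be sketched as follows (the gene value range 0–99999 matches the 5-digit encoding above; the mutation rate is an assumed value for illustration):

```python
import random

def single_point_crossover(a, b):
    """Swap gene tails at a random cut point (traditional single-point crossover)."""
    point = random.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def random_mutation(chrom, rate=0.01, low=0, high=99999):
    """With probability `rate`, replace a gene by a fresh random 5-digit value."""
    return [random.randint(low, high) if random.random() < rate else g
            for g in chrom]

random.seed(7)
p1, p2 = [1, 1, 1, 1, 1], [2, 2, 2, 2, 2]
c1, c2 = single_point_crossover(p1, p2)
print(c1, c2)  # offspring exchange tails; no genes are lost
```
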
The steps of the algorithm are given in Figure 1.

Figure 1. Algorithm framework
4.2. Experiment Results
The performance of the approaches discussed in this paper has been tested on 5 different datasets, downloaded from the UCI machine learning repository. All experiments were performed on an Intel Core(TM)2 Duo CPU 2.26 GHz / 4 GB RAM laptop. Each dataset was run 10 times with k = 3 and different population sizes. Table 1 shows the details of the datasets used in this paper.
Table 1. Experiment datasets

Dataset Name    Total No. of Instances    Total No. of Features
Balance         624                       5
Breast          350                       9
Diabetes        768                       8
Glass           214                       10
Waveform        5000                      21
Table 2 depicts the classification accuracy of our proposed classifier compared with traditional KNN. The results show that our proposed method outperforms the traditional KNN method with higher accuracy.
Table 2. Experiment Results Comparison

Dataset     Genetic Algorithm                                        K-Nearest Neighbor
            Population Size   Generation Size   Accuracy Rate (%)    Accuracy Rate (%)
Balance     10                200               88.63                84.47
            20                200               87.57
            30                200               87.91
Breast      10                200               98.83                96.94
            20                200               97.57
            30                200               97.76
Diabetes    10                200               72.46                69.70
            20                200               73.73
            30                200               74.84
Glass       10                200               67.48                64.57
            20                200               66.13
            30                200               67.11
Waveform    10                200               80.69                79.89
            20                200               81.58
            30                200               81.37
5. Conclusion
The KNN classification algorithm is one of the most popular neighborhood classifiers in data classification. However, it has limitations such as great calculation complexity, full dependence on the training set, and no weight difference between samples. To combat this, a novel method to improve the classification performance of KNN using a Genetic Algorithm combined with the weighted KNN algorithm is proposed in this paper. We use this new algorithm for data classification, and the experimental results show that our proposed algorithm outperforms KNN with greater accuracy.
Acknowledgments
This paper is supported by the Natural Science Foundation of China (No. 61272470 and No. 61203307), the National Civil Aerospace Pre-research Project of China, the Provincial Natural Science Foundation of Hubei (No. 2012FFB04101), and the Fundamental Research Funds for National University, China University of Geosciences (Wuhan).
References
[1] J Han, M Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers Academic Press. 2001.
[2] E Fix and J Hodges. Discriminatory analysis. Nonparametric discrimination: Consistency properties. Technical Report 4. USAF School of Aviation Medicine. Randolph Field, Texas. 1951.
[3] TM Cover and PE Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory. 1967; 13: 21-27.
[4] RO Duda and PE Hart. Pattern classification and scene analysis. New York: Wiley. 1973.
[5] Z Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs. 3rd Edition, Springer-Verlag. 1999.
[6] Belur V Dasarathy. Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. McGraw-Hill, Computer Science Series. IEEE Computer Society Press. Los Alamitos, California, 1991: 217-224.
[7] Y Lihua, D Qi, and G Yanjun. Study on KNN Text Categorization Algorithm. Micro Computer Information. 2006; 21: 269-271.
[8] W Yu and W Zhengguo. A fast KNN algorithm for text categorization. Proceedings of the Sixth International Conference on Machine Learning and Cybernetics. Hong Kong. 2007: 3436-3441.
[9] W Yi, B Shi and W Zhang'ou. A Fast KNN Algorithm Applied to Web Text Categorization. Journal of the China Society for Scientific and Technical Information. 2007; 26(1): 60-64.
[10] Pascal Soucy, Guy W Mineau. A Simple KNN Algorithm for Text Categorization. Proceedings of International Conference on Data Mining. 2001: 647-648.
[11] J Holland. Adaptation in natural and artificial systems. University of Michigan press. 1975.
[12] Qinghua Wu, Hanmin Liu, Yuxin Sun, Fang Xie, Jin Zhang, Xuesong Yan. Research of Function Optimization Algorithm. TELKOMNIKA Indonesian Journal of Electrical Engineering. 2012; 10(4): 858-863.
[13] Xuesong Yan, Qinghua Wu, Can Zhang, Wei Li, Wei Chen, Wenjing Luo. An Improved Genetic Algorithm and Its Application. TELKOMNIKA Indonesian Journal of Electrical Engineering. 2012; 10(5): 1081-1086.
[14] N Suguna and K Thanushkodi. An Improved k-Nearest Neighbor Classification Using Genetic Algorithm. International Journal of Computer Science Issues. 2010; 7(4): 18-21.