Inter
national
J
our
nal
of
Electrical
and
Computer
Engineering
(IJECE)
V
ol.
6,
No.
6,
December
2016,
pp.
3222
–
3228
ISSN:
2088-8708
3222
A
New
P
aradigm
f
or
De
v
elopment
of
Data
Imputation
A
ppr
oach
f
or
Missing
V
alue
Estimation
Madhu
G
and
Nagachandrika
G
Dept.
of
Information
T
echnology
VNR
V
ignana
Jyothi
Institute
of
Engineering
and
T
echnology
Hyderabad-90,
T
elang
ana,
India
Article
Inf
o
Article
history:
Recei
v
ed
Mar
26,
2016
Re
vised
No
v
8,
2016
Accepted
No
v
21,
2016
K
eyw
ord:
Centroids
Distance
Imputation
Missing
v
alue
Nearest
Neighbour
ABSTRA
CT
Man
y
re
al-w
orld
applications
encountered
a
common
issue
in
data
analysis
is
the
pres-
ence
of
missing
data
v
alue
and
challenging
task
in
man
y
applications
such
as
wireless
sensor
netw
orks,
medical
applications
and
psychological
domain
and
others.
Learning
and
prediction
in
the
presence
of
missing
v
alue
can
be
treacherous
in
machine
learning,
data
mining
and
statistical
analysis.
A
missing
v
alue
can
signify
important
information
about
dataset
in
the
mining
process.
Handling
missing
data
v
alue
is
a
challenging
task
for
the
data
mining
process.
In
this
paper
,
we
propose
ne
w
paradigm
for
the
de
v
elop-
ment
of
data
imputation
method
for
missing
data
v
alue
estimation
based
on
centroids
and
the
nearest
neighbours.
Firstly
,
identify
clusters
based
on
the
k-means
algorithm
and
calculate
c
entroids
and
the
nearest
neighbour
data
records.
Secondly
,
the
nearest
distances
from
complete
dataset
as
well
as
incomplete
dataset
from
the
centroids
and
estimated
the
nearest
data
record
which
tends
to
be
curse
dimensionality
.
Finally
,
im-
pute
the
missing
v
alue
based
nearest
neighbour
record
using
stat
istical
measure
called
z-score.
The
e
xperimental
study
demonstrates
strengthen
of
the
proposed
paradigm
for
the
imputation
of
the
missing
data
v
alue
estimation
in
dataset.
T
ests
ha
v
e
been
run
us-
ing
dif
ferent
types
of
datasets
in
order
to
v
alidate
our
approach
and
compare
the
results
with
other
imputation
methods
such
as
KNNI,
SVM
I,
WKNNI,
KMI
and
FKNNI.
The
proposed
approach
is
geared
to
w
ards
maximizing
the
utility
of
imputation
with
respect
to
missing
data
v
alue
estimation.
Copyright
c
2016
Institute
of
Advanced
Engineering
and
Science
.
All
rights
r
eserved.
Corresponding
A
uthor:
Madhu.G
Department
of
Information
T
echnology
VNR
V
ignana
Jyothi
Institute
of
Engineering
and
T
echnology
T
elang
ana,
Hyderabad-90,
INDIA.
E-mail:
madhu
g@vnrvjiet.in
1.
INTR
ODUCTION
Man
y
real-w
orld
applications
encountered
a
common
issue
in
data
analysis
is
the
presence
of
miss-
ing
or
incomplet
e
data
v
alues.
Ho
we
v
er
,
data
mining
applications
are
associated
with
industry
applications,
wireless
sensor
netw
orks,
medical
applications,
psychological
applications
and
others.
Modern
computational
techniques
of
data
cleaning
require
complete
dataset.These
databases
are
highly
susceptible
to
missing
and
inconsistent
data
due
to
their
huge
amount
of
data
sizes
[1-2].
Missing
v
alue
has
dif
ferent
causes
lik
e
the
data
v
alue
might
not
be
recorded,
equipment
malfunctions,
improper
measurements
and
den
y
answers
to
certain
questions
[3-4].
Missing
data
v
alue
in
the
training
dataset
can
reduce
the
performance
of
the
model
or
that
can
lead
biased
results
to
the
model
and
it
leads
to
misleading
inferences
in
data
analysts.
Recently
,
man
y
researchers
ha
v
e
been
proposed
se
v
eral
methods
to
treat
m
issing
v
alue
problems
for
real-w
orld
applications
[5-10].
Generally
,
missing
v
alue
treatment
methods
classified
into
:
i)
ignoring
and
discarding
data
v
alue
or
case/pairwise
deletion
[11];
ii)
parameter
estimation/Expectation-Maximization
algorithm
[12];
iii)
Imputa-
J
ournal
Homepage:
http://iaesjournal.com/online/inde
x.php/IJECE
,
DOI:
10.11591/ijece.v6i6.10632
Evaluation Warning : The document was created with Spire.PDF for Python.
IJECE
ISSN:
2088-8708
3223
tion,
represents
the
process
of
filling
the
missing
data
v
alues
in
datasets
by
some
plausible
v
alues
based
on
information
a
v
ailable
in
the
dataset
[13].
First,
tw
o
methods
are
suf
fering
with
se
v
eral
limitations
such
as:
i)
high
sensiti
vity
to
the
outliers
and
slo
w
computations;
ii)
Imputation
methods
a
re
suf
fering
with
ho
w
to
select
the
conditioning
v
ariables
and
bias
those
results
from
a
bad
choice.
T
o
address
these
issues,
we
de
v
elop
a
ne
w
paradigm
for
data
imputation
of
the
missing
data
v
alue
estimation
using
cluster
cent
roids
and
the
nearest
neigh-
bours.
Also,
our
approach
uses
the
centroids
and
the
nearest
neighbours
for
the
f
amily
of
k-means
clustering
algorithm.
This
study
demonstrates
that
the
proposed
imputation
method
can
significantly
estimate
the
missing
v
alue
in
datasets.
The
rest
of
the
paper
is
or
g
anized
as
follo
ws:
related
w
ork
presented
in
section-2,
while
in
sect
ion-
3,
proposed
a
ne
w
paradigm
for
data
imputation
method
for
the
missing
data
v
alue
estimation
st
ep
by
step
approach.
In
section-4
presents
the
results
and
analysis.
Conclusions
are
deferred
to
section-5.
2.
RELA
TED
W
ORKS
This
section
represents
mechanisms
of
dif
ferent
data
imputation
methods
and
treatments
for
the
pur
-
pose
of
data
cleaning
in
the
data
mining
process.
Description
of
the
data
imputation
mechanisms
and
treatments
are
as
follo
ws:
2.1.
Mechanism
of
missing
data
imputation
Little,
R.J.,
Rubin,
D.B
[14]
suggested
three
types
of
mechanisms
for
imputation
of
missi
ng
data
such
as:
i)
Missing
Completely
at
Random
(MCAR):
Suppose
fe
w
data
records
are
missing
on
X,
then
these
data
records
are
said
to
be
MCAR
if
the
probability
of
X
is
missing
with
unrelated
X
or
other
v
ariable
Y
then
the
pr
ob
(
X
ismissing
j
Y
;
X
)
=
pr
ob
(
X
ismissing
)
:
ii)
Missing
at
random
(MAR):
Suppose
fe
w
data
records
X
are
m
issing
at
random,
i
f
the
probability
of
that
X
is
missing
does
not
depend
on
the
v
alue
X,
after
controlling
the
o
bs
erv
ed
data
b
ut
not
on
missing
data
then
the
pr
ob
(
X
ismissing
j
Y
;
X
)
=
pr
ob
(
X
ismissing
j
X
);
iii)
Not
missing
at
random
(NMAR):
):
In
this
approach
assumption
is
violated
and
sharing
of
a
sample
ha
ving
a
missing
data
v
alue
for
an
attrib
ute
depends
on
the
missing
v
alues.
2.2.
Missing
data
tr
eatment
methods
In
literature
v
arious
approaches
ha
v
e
been
suggested
to
treat
missing
v
alue
problems.
Little
and
Rubin
[14]
classified
these
approaches
into
three
cate
gories:
i)
Ignoring
and
deleting
data
records:
this
approach
to
discard
data
with
the
missing
v
alue
into
a
complete
case
analysis
and
delete
instances
and/or
attrib
utes.
These
methods
are
applicable
only
missing
data
are
MCAR
otherwise
it
can
lead
bias
results.
ii)
P
arameter
estimation
approach:
this
approach
is
based
on
estimation
of
the
parameters
in
gi
v
en
model
then
the
presence
of
missing
v
alue
based
on
Expectation-Maximization
algorithm
[12].
iii)
Imputation
Methods:
to
impute
the
missing
v
alue
with
probable
v
alue
based
on
information
obtainable
in
the
dataset.
The
main
idea
of
this
approach
is
to
emplo
y
kno
wn
v
alue
that
can
be
identified
in
the
v
alid
data
v
alues
of
the
gi
v
en
dataset
to
assist
in
assessing
the
missing
v
alues.
T
o
imput
e
the
missing
v
alue
based
dif
ferent
type
of
methods
such
as
single
imputation
and
multiple
imputation
methods
[15].
But
multiple
imputation
approaches
are
computationally
more
e
xpensi
v
e
than
the
single
imputation
methods
[15].
Ho
we
v
er
,
the
y
accommodate
for
sample
v
ariability
of
the
estim
ated
v
alue
and
uncertainty
associated
with
a
model
used
for
computation
[16].
Also,
these
tw
o
methods
can
be
classified
into
three
cate
gories
such
as
i)
data
dri
v
en
approach;
ii)
model
based
approach;
and
iii)
machine
learning
approach
[14-17].
The
abo
v
e
discussion
leads
us
to
propose
this
research
w
ork
presents
a
no
v
el
frame
w
ork
for
data
imputation
which
addresses
the
problem
of
missing
data
v
alues.
3.
A
NEW
P
ARADIGM
FOR
D
A
T
A
IMPUT
A
TION:
PR
OPOSED
APPR
O
A
CH
This
section,
presents
a
ne
w
paradigm
for
data
imputation
to
address
the
problem
of
missing
v
alue
which
is
based
on
tw
o
distance
measures
that
are
used
to
define
a
ne
w
feature
between
its
cluster
center
and
the
nearest
neighbour
respecti
v
ely
as
sho
wn
in
Fig.1.
Let
M
be
the
gi
v
en
dataset
containing
of
ro
ws
and
columns
and
each
ro
w
is
represented
by
a
ro
w
v
ector
(
R
1
;
R
2
;
:R
m
)
and
each
column
is
represented
by
a
column
v
ector
(
C
1
;
C
2
;
::C
n
)
and
the
dataset
M
is
represented
as
m
x
n
matrix:
A
Ne
w
P
ar
adigm
for
De
velopment
of
Data
Imputation
Appr
oac
h
for
Missing
V
alue
Estimation
(Madhu
G)
Evaluation Warning : The document was created with Spire.PDF for Python.
3224
ISSN:
2088-8708
Figure
1.
A
Ne
w
P
aradigm
for
Data
Imputation
(NPDI)
M
mn
=
0
B
B
B
@
a
11
a
12
a
1
n
a
21
a
22
a
2
n
.
.
.
.
.
.
.
.
.
.
.
.
a
m
1
a
m
2
a
mn
1
C
C
C
A
Here
matrix
M
represents
the
samples
of
0
m
0
observ
ations
and
0
n
0
v
ariables
and
each
cell
is
denoted
as
ordered
n-tuples
of
data
records
such
as
(
a
i
1
;
a
i
2
;
;
a
i
(
n
1)
;
a
in
)
for
each
i
=
1
;
2
;
3
;
m
where
the
last
column
data
attrib
ute
(
a
in
)
for
each
i
characterizes
the
tar
get
class
or
decision
data
attrib
ute
of
the
dataset
M.
Clearly
,
we
say
that
all
datasets
are
finite
set.
An
object
0
a
0
i
is
kno
wn
as
the
complete
dataset,
if
f
a
ij
6
=
j
;
8
1
j
R
g
and
the
object
0
a
0
i
is
called
the
incomplete
dataset
f
a
ij
=
j
;
9
1
j
R
g
.
No
w
split
the
gi
v
en
dataset
into
tw
o
forms
that
is:
i)
first
form
is
a
complete
dataset
which
contains
records
without
missing
v
alues.
ii)
Second
form
is
an
incomplete
dataset
which
contains
missing
records
with
one
or
more
attrib
utes
which
is
kno
wn
as
the
missing
v
alue
dataset.
Then
consi
der
complete
dataset
as
the
training
set
and
missing
data
set
as
the
testing
set.
This
w
ork
is
moti
v
ated
by
researchers
W
ei-Chao
Lin
[18],
Congnan
Luo
[19].
The
k-means
clustering
algorithm
[20-21]
is
one
of
simplest
and
common
clustering
algorithms
used
to
classify
the
number
of
clusters
based
on
k-v
alue.
The
v
alue
k
is
the
cluster
centroid
of
each
cluster
.
This
study
step-1
we
use
a
k-means
clustering
algorithm
to
form
clusters
based
its
decision
classes
(here
k=2)
and
to
e
xtract
the
cluster
centroids
using
the
centroid
v
ector
c
k
is
defined
as
follo
ws:
c
k
=
1
j
N
j
N
X
m
=1
a
im
(1)
Then,
calculate
its
nearest
neighbours
based
on
a
Euclidian
distance
measure
on
without
missing
v
al
ues
of
dataset.
The
nearest
neighbour
function
defined
as
follo
ws:
Let
S
is
a
finite
set
of
elements
and
a
and
b
belongs
to
S.
B
is
said
to
be
a
the
nearest
neighbour
of
A
if
B
is
closest
to
A
am
o
ng
all
the
points
in
S
A
.
Then
B
is
said
to
be
the
nearest
neighbour
of
A
if
and
only
if
4.
EXPERIMENTS
AND
RESUL
TS
In
order
to
e
v
aluate
NPDI
model,
we
carried
out
e
xperimentions
on
the
benchmark
datasets
from
Kno
wledge
Extraction
based
on
Ev
olutionary
Learning
(KEEL)
repository
[22],
includes
Bands,
Cle
v
eland,
IJECE
V
ol.
6,
No.
6,
December
2016:
3222
–
3228
Evaluation Warning : The document was created with Spire.PDF for Python.
IJECE
ISSN:
2088-8708
3225
Hepatitis,
Horse-Colic,
Iris,
Magic,
Pima,
W
ine.
The
completely
summary
of
datasets
are
used
in
the
e
xperi-
ments
are
sho
wn
in
T
able.1.
T
able
1.
Summary
of
Datasets
used
in
e
xperiments
[22].
Datasets
Instances
Classes
MVs
Databases(KB)
Bands
539
2
32.28
608
Cle
v
eland
303
5
1.98
233
Hepatitis
155
2
48.39
129
Horse-Colic
368
2
98.1
444
Magic
1902
2
58.20
1423
Pima
768
2
50.65
440
W
ine
178
3
70.22
136
First,
we
di
vided
the
gi
v
en
dataset
into
10
folds
of
the
equal
size
based
on
the
stratified
cross
v
al-
idation
test
and
each
step
of
the
e
xperiments
1-fold
is
used
for
the
test
dataset
and
remaining
9-
folds
are
used
for
the
training
dataset.Also,
this
study
presents
t
he
results
obtained
from
the
e
xperimental
e
v
aluation
of
our
proposed
approach
discuss
ed
in
Section
.3
and
we
took
results
for
the
comparison
with
other
impu-
tation
methods
such
as
Imputation
with
K-Nearest
Neighbour
(KNNI),
Support
V
ector
Machine
Imputation
(SVMI),
W
eighted
Imputation
with
K-Nearest
Neighbour
(WKNNI),
K-Means
clustering
Imputation
(KMI)
and
Fuzzy-Means
clustering
Imputation
(FKNNI)
on
a
popular
decision
tree
C4.5
classifier
[23].
T
able
2.
T
est
classifier
accurac
y
using
C4.5
deceson
tree
Methods
NPDI
KNNI
SVMI
W
-KNNI
KMI
F-KNNI
Bands
71.15
70.32
69.18
69.57
70.11
68.25
Cle
v
eland
57.32
56.09
55.41
56.09
55.43
56.09
Hepatitis
89.95
84.10
84.82
85.09
83.20
83.47
Horse-Colic
90.15
83.10
83.67
83.40
82.04
83.94
Magic
82.92
79.96
78.23
79.75
79.86
80.07
Pima
75.56
71.09
73.32
73.69
72.78
72.91
W
ine
88.55
87.64
86.56
88.75
86.54
87.45
In
T
able
2
sho
ws
that
e
v
aluation
results,
from
these
results
we
clearly
state
that
the
proposed
no
v
el
frame
w
ork
for
data
Imputation
is
higher
than
KNNI,
SVMI,
WKNNI,
KMI
and
FKNNI.
Then
classifier
accu-
rac
y
is
impro
v
ed
significantly
compared
with
other
imputation
methods
such
as
KNNI,
SVMI,
WKNNI,
KMI
and
FKNNI
on
benchmark
datasets
using
C4.5
decision
tree,
which
pro
v
ed
the
v
alidity
of
no
v
el
frame
w
ork
for
data
Imputation.
The
predictions
of
classifier
for
the
proposed
approach
is
compared
with
other
imputation
algorithms
are
sho
wn
separately
in
Figure
2-6.
From
the
bar
graphs,
we
can
clearly
state
that
the
proposed
NPDI
has
attai
ned
the
best
accurac
y
among
all
other
se
v
en
benchmark
datasets.
Then
we
compared
with
other
imputation
methods
such
as
KNNI,
SVMI,
WKNNI,
KMI
and
FKNNI
based
on
C4.5
decision
tree
classifie
rs
ha
v
e
scored
approximately
6.75
percentage
impro
v
ement
in
terms
of
accurac
y
.
.
5.
CONCLUSION
This
paper
proposed
a
ne
w
paradigm
for
data
imputation
called
NPDI
for
estimating
the
missing
v
alue
in
datasets.
Firstly
,
split
the
gi
v
en
dataset
into
training
and
testing
datasets
that
is
complete
dataset
and
incom-
plete
dataset.
Secondly
,
applied
k-means
algorithm
on
training
dataset
to
genera
te
clusters
and
its
centroids,
then
calculated
the
nearest
neighbours
distance
between
centroids
and
other
complete
data
records
and
miss-
ing
data
records.
Finally
,
applied
a
popular
statistical
measure
called
z-score
on
mapping
distances
and
then
imputed
plausible
v
alue
for
imputation.
Further
,
we
applied
C4.5
decision
tree
classifier
for
test
classifier
accu-
rac
y
.
The
classifier
accurac
y
is
impro
v
ed
significantly
compared
with
other
imputation
methods
such
as
KNNI,
SVMI,
WKNNI,
KMI
and
FKNNI
on
benchmark
datasets
using
C4.5
decision
tree,
which
pro
v
ed
the
v
alidity
of
the
no
v
el
frame
w
ork
for
data
Imputation.
Approximately
6.75
percentage
impro
v
ement
in
terms
of
classifier
A
Ne
w
P
ar
adigm
for
De
velopment
of
Data
Imputation
Appr
oac
h
for
Missing
V
alue
Estimation
(Madhu
G)
Evaluation Warning : The document was created with Spire.PDF for Python.
3226
ISSN:
2088-8708
Figure
2.
NPDI
vs
KNNI
Figure
3.
NPDI
vs
SVMI
accurac
y
with
compared
to
KNNI
algorithm,
WKNNI
algorithm
for
Hepatitis
dataset
and
KMI
algorithm
for
Horse-Colic
dataset.
Major
strength
of
this
approach
is
not
lost
the
an
y
attrib
ute
information
during
the
dimen-
sionality
reduction
process.
Limitation
of
this
research
w
ork
is
before
imputations
need
suitable
normalization
technique
to
smooth
dataset.
REFERENCES
[1]
Jia
wei
Han,
Micheline
Kamber
.
”Data
Mining:
Concepts
and
T
echniques”,
2nd
edition.
Mor
g
an
Kaufmann,
San
Francisco,
CA,
2006.
[2]
Mehmed
Kantardzic
et
al.,
Data
Mining:
Concepts,
Models,
Methods,
and
Algorithms,
2nd
edition,
IEEE
Press,
2011.
[3]
G.Madhu
et
al.,
A
No
v
el
Inde
x
Measure
Mimputation
Algorithm
for
Missing
Data
V
alues:
A
Machine
Learning
Approach,
IEEE
International
Conference
on
Computational
Intelligence
and
Computing
Re-
search,
pp.1-7,
2012.
[4]
Dan
Li
et
al.,
T
o
w
ards
Missing
Data
Imputation:
A
Study
of
Fuzzy
k-Means
Clustering
Method,
RSCTC
2004,
LN
AI
3066,
pp.
573579,
2004.
[5]
Rahman,
M.M
and
Da
vis,
D.N.
”Machine
Learning-based
Missing
V
alue
Imputation
Methods
for
Clinical
datasets”,
IAENG
T
ransactions
on
Engineering
T
echnologies,
Springer
Netherlands,
pp.245-257,
2013.
[6]
Rupam
Deb
et
al.,
”Missing
V
alue
Imputation
for
the
Analysis
of
Incomplete
T
raf
fic
Accident
Data”,
Information
Sciences,
v
ol.339,
pp.274-289,
2016.
[7]
Bashir
.F
,
et
al.,
”P
arametric
and
Non-P
arametric
Methods
to
Enhance
Prediction
Performance
in
the
Pres-
ence
of
Missing
Data”,
19th
International
conference
on
system
theory
,
control
and
computing
(ICSTCC),
pp.337-342,
2015.
IJECE
V
ol.
6,
No.
6,
December
2016:
3222
–
3228
Evaluation Warning : The document was created with Spire.PDF for Python.
IJECE
ISSN:
2088-8708
3227
Figure
4.
NPDI
vs
WKKNI
Figure
5.
NPDI
vs
KMI
[8]
F
abio
Lobato,
et
al.,
”Multi-Objecti
v
e
Genetic
Algorithm
for
Missing
Data
Imputation”,
pattern
recognition
letters,
v
ol.68,
part-1,
pp.126-131,
2015.
[9]
G.
Madhu,
et
al.,
”A
Non-P
arametric
Discretization
Based
Imputation
Algorithm
for
Continuous
Attrib
utes
with
Missing
Data
V
alues”,
International
Journal
of
Information
Processing,
v
ol.8,
no.1,
pp.64-
72,
2014.
[10]
Chandan
Gautam
and
V
adlamani
Ra
vi,
”Data
Imputation
via
Ev
olutionary
Computation,
Clustering
and
a
Neural
Netw
ork”,
Neurocomput,
156,
134-142,
2015.
[11]
Gary
,
K.,
et
al.,
”Listwise
Deletion
is
Evil:
What
to
do
about
Missing
Data
in
Pol
itical
Science”,
(2000)
(a
v
aible
http://GKing.Harv
ard.edu)
[12]
Dempster
,
A.P
.,
Laird,
N.M.,
Rubin,
D.B.,
”Maximum
lik
elihood
from
incomplete
data
via
the
EM
algo-
rithm”,
J.
of
Ro
yal
Statistical
Society
Series,
v
ol.
39,pp,
138,
1977.
[13]
Myrtv
eit,
I.,
et
al.,
”Analyzing
data
sets
with
missing
data:
an
empirical
e
v
aluation
of
imputation
methods
and
lik
elihood-based
methods”,
IEEE
T
ransactions
on
Softw
are
Engineering,
v
ol.27,
pp.
9991013,
2001.
[14]
Little,
R.J.,
Rubin,
D.B.,
”Statistical
Analysis
with
Missing
Data”,
W
ile
y
,
Ne
w
Y
ork,
1987.
[15]
Alireza
F
arhangf
ar
et
al.,
”A
No
v
el
Frame
w
ork
for
Imputation
of
Missing
v
alues
in
Databases”,
IEEE
T
ransaction
on
systems,
man,
and
c
ybernetics-part-a:
systems
and
humans,
v
ol.37,
no.5,
sept
2007.
[16]
Lakshminarayan
K,
et
al.,
”Imputation
of
Missing
data
in
industrial
databases”,
Appl,
Intell,
v
ol.11,
no.3,
pp.
259-275,
1999.
[17]
H.L.Oh
and
F
.L
Scheuren,
”W
eighting
adjustments
for
unit
nonresponse,
incomplete
data
in
sample
sur
-
v
e
y”,
Theory
and
Bibliographies,
v
ol.2,
pp.143-183,
1983.
[18]
W
ei-Chao
Lin,
Shih-W
en
K
e,
and
Chih-F
ong
Tsai
”CANN.
An
Intrusi
on
Detection
System
based
on
Combining
Cluster
Centers
and
Nearest
Neighbors,
Kno
w
.-Based
Syst”,
v
ol.
78,
pp.
13-21,
2015..
[19]
Congnan
Luo,
Y
anjun
Li,
and
Soon
M.
Chung
”T
e
xt
document
clustering
based
on
neighbors”,
Data
A
Ne
w
P
ar
adigm
for
De
velopment
of
Data
Imputation
Appr
oac
h
for
Missing
V
alue
Estimation
(Madhu
G)
Evaluation Warning : The document was created with Spire.PDF for Python.
3228
ISSN:
2088-8708
Figure
6.
NPDI
vs
FKNNI
Kno
wl.
Eng,
v
ol.68,
issue.
11,
pp.
1271-1288,
2009.
[20]
J.A.
Hartig
an,
”Clustering
Algorithms”,
John
W
ile
y
and
Sons,
1975.
[21]
A.K.
Jain,
M.N.
Murty
,
P
.J.
Flynn,
”Data
clustering:
a
re
vie
w”,
A
CM
Comput.Surv
e
y
,
v
ol.31
(3),
pp.
264323,
1999.
[22]
J.
Alcal-Fdez,
A.
Fernandez,
J.
Luengo,
J.
Derrac,
S.
Garca,
L.
Snchez,
F
.
Herrera.
”KEEL
Data-Mining
Softw
are
T
ool:
Data
Set
Repository
,
Inte
gration
of
Algorithms
and
Experim
ental
Analysis
Frame
w
ork”,
Journal
of
Multiple-V
alued
Logic
and
Soft
Computing
17:2-3
(2011)
255-287.
[23]
J.R.
Quinlan,
”C4.5:
Programs
for
Machine
Learning”.
Mor
g
an
Kaufmann
Publishers,
Inc.,
1993.
BIOGRAPHIES
OF
A
UTHORS
Madhu.G
recei
v
ed
Ph.D
De
gree
in
Compter
Science
and
Engineering
from
Ja
w
aharlal
Nehru
T
ech-
nological
Uni
v
ersity
,
Hyderabad
in
2015.
He
is
currently
w
orking
as
Assocaite
Professor
,
Dept
of
Information
T
echnology
,
VNR
VJIET
,
JNTU
Hyderabad,
T
.S,
INDIA.
He
has
27
research
publi-
cations
at
International/National
Journals
and
Conferences.
His
research
interest
includes
Data
Mining,
Machine
Learning
and
Methematical
Modeling
and
he
is
also
a
re
vie
wer
of
research
pa-
pers
of
v
arious
Journals
lik
e
JCIT
,JET
AI
and
Journal
of
Soft
Computing.
He
is
a
Member
of
IEEE,
IEEE
Computational
Intelligence
Society
,
Life
Member
of
CSI,Member
of
IRSS,
and
Member
of
IAENG.
Recently
,
machine
learning
based
algorithms
on
malaria
daignosis
has
been
tackled.
G.Naga
Chandrika
recei
v
ed
M.T
ech
De
gree
in
Compter
Science
and
Information
T
echnology
from
Ja
w
aharlal
Nehru
T
echnological
Uni
v
ersity
,
Hyderabad.
He
is
c
urrently
w
orking
as
Assis-
tant
Professor
,
Dept
of
Information
T
echnology
,
VNR
VJIET
,
JNTU
Hyderabad,
T
.S,
INDIA.
Her
research
interest
includes
Data
Mining,
T
e
xt
Mining
and
Theory
of
Computation.
IJECE
V
ol.
6,
No.
6,
December
2016:
3222
–
3228
Evaluation Warning : The document was created with Spire.PDF for Python.