Indonesian Journal of Electrical Engineering and Computer Science
Vol. 24, No. 1, October 2021, pp. 564~569
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v24.i1.pp564-569
Journal homepage: http://ijeecs.iaescore.com
Machine learning based outlier detection for medical data
R. Vijaya Kumar Reddy1, Shaik Subhani2, B. Srinivasa Rao3, N. Lakshmipathi Anantha4
1,3 Department of IT, Lakireddy Bali Reddy College of Engineering (Autonomous), Mylavaram, India
2 Department of IT, Sreenidhi Institute of Science and Technology (Autonomous), Telangana, India
4 Department of CSE, Malla Reddy Engineering College (Autonomous), Telangana, India
Article Info
Article history:
Received Jan 10, 2021
Revised Aug 8, 2021
Accepted Aug 23, 2021

ABSTRACT
Machine learning produces strong results on health care data and reduces the workload of the health care industry. The proposed algorithm can overcome existing issues and uncover novel knowledge for the development of medical data in the health care industry. In this paper, we propose a new algorithm for finding outliers using different datasets, considering that medical data are indicative of both health problems and activity. The proposed algorithm works on the basis of supervised and unsupervised learning and detects outliers in medical data. We examine the effectiveness of local and global data factors for outlier detection on medical data in real time. The model used in this scenario is built from training and testing on medical data. The cleaning process is based on similarity operations over the complete set of dataset attributes. Experiments are conducted on various built-in medical datasets. The statistical results show that the machine learning based outlier detection algorithm gives the best accuracy.
Keywords:
Machine learning algorithm
Medical data analysis
Outlier detection
Similarity operations
This is an open access article under the CC BY-SA license.
Corresponding Author:
R. Vijaya Kumar Reddy
Department of IT
Lakireddy Bali Reddy College of Engineering (Autonomous)
Mylavaram, India
E-mail: Vijayakumarr285@gmail.com
1. INTRODUCTION
Outlier recognition is a significant theme in data mining. Unlike other data mining methods, its aim is to discover patterns that occur rarely [1]. An outlier is an observation that deviates significantly from, and is inconsistent with, the rest of a dataset [2]. Outlier detection matters because outliers can reveal raw patterns and valuable information about a dataset. Present research on outlier discovery covers fields such as credit card fraud detection, network intrusion detection, crime detection, medical analysis, and the detection of unusual parts in image processing [3].
Unsupervised outlier detection is normally classified into distance-based [4], [5], density-based [6], [7], and distribution-based [3] procedures. Distribution-based approaches assume that every data point is generated by a definite statistical model and that outliers do not conform to this model. The distance-based method was first investigated by Knox and Ng [5]. In many datasets, however, the local information differs from the global parameters. The density-based method was first discussed by Breunig et al. [6]: a local outlier factor (LOF) is assigned to every data point on the basis of its local point density, and a data point with a high LOF value is described as an outlier. Clustering-based methods are unsupervised and do not require any labeled training data, but their performance in outlier discovery is limited. In many real-world applications, a small set of objects is labeled as outliers or as belonging to a certain class, while the greater part of the data is unlabeled. Even a small amount of such prior knowledge can significantly improve the efficiency of outlier detection [8]-[10].
Semi-supervised methods for outlier recognition have therefore been developed to handle this type of scenario, and they have become a popular direction in outlier discovery.
Over the last three decades, different types of machine learning approaches have been proposed, such as ant colony optimization, artificial neural networks, particle swarm optimization, evolutionary computation, and the support vector machine (SVM). In this review of the literature, we concentrate on unsupervised methods that have been proposed for outlier recognition. Density-based spatial clustering with noise has been used to generate numerous data segments by dividing the original feature space X into N dissimilar parts.
Amoli et al. [11] define an unsupervised intrusion detection system (IDS) for high-speed networks that recognizes zero-day attacks. Its first engine uses density-based spatial clustering of applications with noise (DBSCAN) to identify attacks, and the second engine targets botnets operating under different protocols. To evaluate the proposed model, two available datasets were used for testing, and the model was compared with multiple approaches such as k-means and DBSCAN outlier detection techniques. The approach in [12] also recognizes uncharacteristic behaviors in network and system log data. Buczak and Guven survey a variety of machine learning methods for intrusion detection [13] and explain the significance of the datasets used for training and testing an IDS. Nisioti et al. [14] survey unsupervised methods for anomaly-based IDS; they present and compare feature selection methods for intrusion detection.
The study is organized as follows. Section 2 describes the health care system architecture with its training and testing phases. Section 3 deals with outlier detection using a machine learning algorithm. Section 4 discusses the real-world outlier detection results. The conclusion of the study is presented in the last section.
2. HEALTH CARE SYSTEM ARCHITECTURE
Medical data profiling draws on different sources in real-time settings. Preprocessing is essential for all real-time data in order to remove noise. In the health care industry, predictive analytics is one of the significant issues. This paper focuses on the proper analysis of data using a machine learning algorithm [15].
Figure 1 describes the training and testing phases of the health care system. At the first level, the medical history of patients and their medical check-up outcomes are collected. The data are pre-processed before the machine learning algorithms are applied. The attributes related to the knowledge are applied in the training phase. The testing phase checks the similarity of a new patient's features and validates the data. The model is trained and validated with a predefined dataset using different metrics and a likelihood ratio setup [16].
Figure 1. Training and testing phase for health care system
3. OUTLIER DETECTION USING MACHINE LEARNING ALGORITHM
3.1. Feature engineering
Feature engineering produced three features: first, the difference between the initial and final values of the patient data; second, the difference between the current medical data value and the average of the last five values; and finally, a machine learning clustering feature based on one hundred groups computed over the aforesaid features. If the data are not normalized, the performance of the models decreases.
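The first two features and the normalization step can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the sample series, and the choice of z-score normalization are assumptions, and the first feature is read here as the change of each reading relative to the first reading of the series, which is only one possible interpretation. The clustering feature is omitted.

```python
from statistics import mean, pstdev

def engineer_features(values, window=5):
    """Return one (diff_from_first, diff_from_trailing_mean) row per reading."""
    first = values[0]
    rows = []
    for i, v in enumerate(values):
        history = values[max(0, i - window):i]  # up to `window` previous readings
        baseline = mean(history) if history else v
        rows.append((v - first, v - baseline))
    return rows

def zscore_columns(rows):
    """Scale each column to zero mean and unit variance (constant columns become 0)."""
    scaled_cols = []
    for col in zip(*rows):
        mu, sigma = mean(col), pstdev(col)
        scaled_cols.append([(x - mu) / sigma if sigma > 0 else 0.0 for x in col])
    return [tuple(r) for r in zip(*scaled_cols)]

# Hypothetical series of readings with one abnormal value.
readings = [72, 74, 73, 75, 74, 73, 120, 74]
features = engineer_features(readings)
normalized = zscore_columns(features)
```

In this sketch the abnormal reading stands out in both columns, which is the property the later distance- and density-based steps rely on.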
Figure 2 shows the outliers discovered (marked as red points) by the local outlier factor model trained on the 0.5% outlier dataset [17]. Figure 3 shows the outliers detected by the random forest algorithm trained on the 0.5% outlier dataset, and Figure 4 shows the outliers detected by the isolation forest model trained on the 0.5% outlier dataset.
Figure 2. Discovered outlier (x) for the local outlier factor (LOF) form trained on the 0.5% outlier dataset
Figure 3. Discovered outlier (x) for the random forest (RF) feature form trained on the 0.5% outlier dataset
Figure 4. Discovered outlier (x) for the isolation forest (IF) feature form trained on the 0.5% outlier dataset
3.2. Simulated data performance
Every algorithm was trained on the 0.5% outlier dataset to measure training and testing performance. The data were split into training and testing sets in a ratio of 80:20. Sample parameters were explored with a series of loops that modified every relevant parameter specified for the model. These changes allowed us to assess the importance of features to the model and to drop any unrelated features in the process.
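The 80:20 split and the parameter loops described above might look like the following sketch. The grid values, the seed, and the placeholder rows are hypothetical; the source does not list the parameters that were swept.

```python
import random
from itertools import product

def split_80_20(rows, seed=0):
    """Shuffle a copy of the rows and cut it into 80% train and 20% test."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))  # placeholder for 100 feature rows
train, test = split_80_20(rows)

# Hypothetical parameter grid: every combination is one model configuration.
grid = {"eps": [0.1, 0.2, 0.5], "min_samples": [3, 5, 10]}
settings = [dict(zip(grid, combo)) for combo in product(*grid.values())]
```

Each entry of `settings` would then be used to fit one model on `train` and score it on `test`, so that parameters and features with no effect on the score can be dropped.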
The proposed algorithm is a simple and popular clustering algorithm. It uses a distance and a minimum number of points per cluster to classify a point as an outlier. The algorithm thus makes a binary prediction: it checks whether or not a point is an outlier. To refine the predictions, all clusters other than the outlier cluster are checked. The Euclidean distance function is used by default for the calculations.
The Euclidean distance based method for outlier detection consults the neighborhood of an object, which is defined by a given radius. The set of objects within a distance threshold can be treated as the neighborhood of an object, so for each object o we can find its neighbors. Formally, let r (r > 0) be a distance threshold and φ (0 < φ < 1) be a fraction threshold. An object o is a distance-based outlier if

$$\frac{\left\| \{\, o' \mid \mathrm{dist}(o, o') \le r \,\} \right\|}{\| D \|} \le \phi \tag{1}$$

where D is the dataset.
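A direct reading of (1) can be sketched in a few lines. The sample points, the threshold values, and the function name are illustrative assumptions; the object itself is counted among its own neighbors, as the set in (1) does when read literally.

```python
import math

def is_distance_outlier(o, data, r, phi):
    """True if at most a fraction phi of the dataset lies within distance r of o."""
    close = sum(1 for other in data if math.dist(o, other) <= r)  # includes o itself
    return close / len(data) <= phi

# Hypothetical 2-D data: a tight group of five points and one isolated point.
data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2), (1.2, 1.0), (5.0, 5.0)]
outliers = [p for p in data if is_distance_outlier(p, data, r=0.5, phi=0.2)]
```

Because every object is compared with every other object, this nested scan needs on the order of n² distance computations.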
This second, distance-based approach takes O(n²) time. Density-based outlier detection techniques examine the density of an object and that of its neighbors. Many real-world datasets have a complex structure, and local neighborhoods are better than global data for measuring outliers among objects. Density-based outlier detection methods therefore concentrate on the density values of neighboring points.
Let dist_k(o) be the distance between object o and its k-th nearest neighbor. The k-distance neighborhood of o contains all objects whose distance to o is not greater than dist_k(o):

$$N_k(o) = \{\, o' \mid o' \in D,\ \mathrm{dist}(o, o') \le \mathrm{dist}_k(o) \,\} \tag{2}$$
The average distance from the objects in N_k(o) to o can serve as a measure of the local density of o. However, if o has very close neighbors o' such that dist(o, o') is tiny, the numerical fluctuations of the distance measure can be very high. A smoothed reachability distance is therefore used to overcome this issue:

$$\mathrm{reachdist}_k(o, o') = \max\{\, \mathrm{dist}_k(o),\ \mathrm{dist}(o, o') \,\} \tag{3}$$
Here k is a user-specified parameter that sets the minimum neighborhood to be examined when determining the local density of an object. The local reachability density of an object o is

$$\mathrm{lrd}_k(o) = \frac{\| N_k(o) \|}{\sum_{o' \in N_k(o)} \mathrm{reachdist}_k(o, o')} \tag{4}$$
We calculate the local reachability density of an object and compare it with those of its neighbors to quantify the degree to which the object is considered an outlier:

$$\mathrm{LOF}_k(o) = \frac{\sum_{o' \in N_k(o)} \frac{\mathrm{lrd}_k(o')}{\mathrm{lrd}_k(o)}}{\| N_k(o) \|} \tag{5}$$

The working procedure of the above algorithm is similar to that of existing ones; the difference lies in its efficiency and its suitability for datasets of different sizes.
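The chain from k-distance to the local outlier factor can be sketched as below. Read literally, (3) and (4) make every term of the sum equal dist_k(o), so this sketch instead follows the original LOF formulation of Breunig et al. [6], where the reachability distance to a neighbor o' uses that neighbor's k-distance, max(dist_k(o'), dist(o, o')). That choice, the function names, and the sample data are assumptions.

```python
import math

def k_distance(i, data, k):
    """dist_k: distance from point i to its k-th nearest neighbor."""
    dists = sorted(math.dist(data[i], data[j]) for j in range(len(data)) if j != i)
    return dists[k - 1]

def neighborhood(i, data, k):
    """N_k: indices of all other points within dist_k of point i."""
    dk = k_distance(i, data, k)
    return [j for j in range(len(data)) if j != i and math.dist(data[i], data[j]) <= dk]

def reach_dist(i, j, data, k):
    """Reachability distance from i to neighbor j (uses the neighbor's k-distance)."""
    return max(k_distance(j, data, k), math.dist(data[i], data[j]))

def lrd(i, data, k):
    """Local reachability density: neighborhood size over summed reachability."""
    nk = neighborhood(i, data, k)
    return len(nk) / sum(reach_dist(i, j, data, k) for j in nk)

def lof(i, data, k):
    """Average ratio of the neighbors' density to the point's own density."""
    nk = neighborhood(i, data, k)
    return sum(lrd(j, data, k) for j in nk) / (len(nk) * lrd(i, data, k))

# Hypothetical data: five clustered points and one isolated point (index 5).
data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2), (1.2, 1.0), (5.0, 5.0)]
scores = [lof(i, data, k=2) for i in range(len(data))]
```

Scores near 1 indicate a point whose density matches that of its neighbors; the isolated point receives a much larger score. The sketch assumes no duplicate points, since duplicates would make a reachability sum zero.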
3.3. Proposed algorithm
begin
Step 1: from cluster1 import ML-A
Step 2: outlier_detection = ML-A(eps = 0.2,
Step 3:                          measurement = "euclidean",
Step 4:                          min_samples = 5, n_jobs = -1)
Step 5: clusters = outlier_detection.fit_predict(num2)
end
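The parameter names in the steps above (eps, min_samples, n_jobs, fit_predict) match those of scikit-learn's DBSCAN estimator, but the source does not say what ML-A does internally. The following standard-library sketch therefore assumes DBSCAN-style behavior: points that are not density-reachable from any core point receive the label -1 and are treated as outliers. The function name and sample data are hypothetical.

```python
import math

def dbscan_labels(points, eps=0.2, min_samples=5):
    """Return a cluster id per point, or -1 for noise (treated here as outliers)."""
    n = len(points)
    # Each neighborhood includes the point itself, following the usual convention.
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels

# Hypothetical data: six tightly grouped points and one isolated point.
num2 = [(0.0, 0.0), (0.05, 0.0), (0.0, 0.05), (0.05, 0.05),
        (0.1, 0.1), (0.1, 0.0), (3.0, 3.0)]
clusters = dbscan_labels(num2, eps=0.2, min_samples=5)
```

With eps = 0.2 and min_samples = 5, the six grouped points form one cluster and the isolated point is labeled -1. The n_jobs = -1 setting in the steps above only controls parallelism and has no counterpart in this single-threaded sketch.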
4. RESULTS WITH REAL WORLD OUTLIER DATA DETECTION
In real-world scenarios, medical data analysis is essential for generating accurate results. In this research, we predict results in real time in order to detect outliers. Before the results are predicted, the previous status of the existing data is checked. The algorithm shows low sensitivity in the primary task and generates good results afterwards. The outlier detection model is implemented to suit real-time medical data. The training and testing process is similar to a data mining approach: preprocessing of the data, training, testing, and finally validation of the data to generate accurate results.
The proposed models are also suitable for the 2.5% outlier dataset. LOF finds outliers at the top end of the medical data range. In this research, random forest (RF) and isolation forest (IF) models are used to find outliers in the given data [18]. The model trained on the 2.5% outlier dataset performed in the same way as its counterpart trained on the 0.5% outlier dataset. It is important to mention that the ideal balance between hit rate and false alarm rate depends on the task-related penalties of outlier recognition. Using a different training dataset did not change the pattern classification. The objective of this work is to assess the potential performance of the algorithm on outliers. We visualized the predictions of the models on novel, real-world medical data samples, noted the differences across the multiple models, and compared performance according to the data on which each model was trained.
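The hit rate and false alarm rate mentioned above can be computed from true and predicted outlier flags as in this sketch. The labels are made up for illustration and are not results from the paper.

```python
def hit_and_false_alarm_rate(y_true, y_pred):
    """Return (hit rate, false alarm rate) for binary outlier flags (1 = outlier)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # outliers correctly flagged
    fn = sum(1 for t, p in pairs if t and not p)      # outliers missed
    fp = sum(1 for t, p in pairs if not t and p)      # normal points flagged
    tn = sum(1 for t, p in pairs if not t and not p)  # normal points left alone
    hit_rate = tp / (tp + fn) if tp + fn else 0.0
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    return hit_rate, false_alarm_rate

# Hypothetical ground truth and predictions for ten records.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
hit, false_alarm = hit_and_false_alarm_rate(y_true, y_pred)
```

A detector tuned to raise the hit rate will usually raise the false alarm rate as well, which is why the acceptable balance depends on the cost of each kind of error in the medical task at hand.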
Under real-world considerations, the utilization of this type of approach is high, and it also supports parallel processing. The throughput of the algorithm supports the processing and distribution of the data across computing resources [19], [20]. The current research is able to use a classification rule for the detection of outliers, followed by a validation of the medical data.
Table 1 compares different methods for detecting outliers, and Figure 5 presents the same comparison as a graph. Among these methods, the proposed approach performs better than the traditional approaches. It generates better results than most of the other methods in terms of feature space and efficiency, and the computational complexity of the proposed machine learning (ML) approach is very low. Practical applicability is not discussed here because it depends on the model of the data and the size of the data. The table lists the results of the different techniques, with the results of the proposed approach stated separately. The dataset for this research was taken from the Kaggle website.
Table 1. Comparison of outlier detection methods

| Methods | Computational complexity | Efficiency | Feature space | Practical applicability |
|---|---|---|---|---|
| Statistical methods [21] | High complex | Low | Single variable | Statistical data |
| Parametric methods [22] | Low complex | Moderate | Single variable | Prior knowledge of data sets |
| Non-parametric methods [22] | Low complex | Moderate | Single/multi variable | Profile of the data required |
| Distance methods [23] | Nil | High | Multi variable | Relation of individual points |
| Density methods [24] | Complex | High | Multi variable | Relation of points and nearest neighbor too |
| Cluster approach [25] | Low complex | High | Single/multi variable | Clustering of similar data |
| Neural networks [26] | Nil | Very high | Multi variable | Simple training data |
| Proposed ML approach | Nil | Very high | Single/multi variable | Based on data size |
Figure 5. Graph representation of outlier detection methods
5. CONCLUSION
Machine learning produces strong results on health care data and reduces the workload of the health care industry. The proposed algorithm can overcome existing issues and uncover novel knowledge for the development of medical data in the health care industry. This paper recommends a novel method for finding outliers using different medical datasets, considering that medical data are indicative of health complications. The proposed approach works on the basis of supervised and unsupervised learning and detects outliers in medical data. We examined the effectiveness of local and global data factors for outlier detection on medical data in real time. The model used in this scenario is built from training and testing on medical data. The cleaning process is based on similarity operations over the complete dimensional attributes of the dataset. Experiments were conducted on various built-in medical datasets. The statistical results show that the machine learning based outlier recognition algorithm provided the best accuracy.
REFERENCES
[1] J. Han, M. Kamber, and J. Pei, "Data Mining: Concepts and Techniques," Elsevier, 2012, doi: 10.1016/C2009-0-61819-5.
[2] D. M. Hawkins, "Identification of Outliers," Chapman & Hall, London, UK, 1980, doi: 10.1007/978-94-015-3994-4.
[3] O. Alan and C. Catal, "Thresholds based outlier detection approach for mining class outliers: an empirical case study on software measurement datasets," Expert Systems with Applications, vol. 38, no. 4, pp. 3440-3445, 2011, doi: 10.1016/j.eswa.2010.08.130.
[4] S. Ramaswamy, R. Rastogi, and K. Shim, "Efficient algorithms for mining outliers from large data set," SIGMOD Record (ACM Special Interest Group on Management of Data), vol. 29, no. 2, pp. 427-438, May 2000, doi: 10.1145/335191.335437.
[5] E. M. Knox and R. T. Ng, "Algorithms for mining distance based outliers in large dataset," in Proceedings of the International Conference on Very Large Data Bases, 1998, pp. 392-403.
[6] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, "LOF: identifying density-based local outliers," ACM SIGMOD Record, vol. 29, no. 2, pp. 93-104, June 2000, doi: 10.1145/335191.335388.
[7] J. Ha, S. Seok, and J.-S. Lee, "A precise ranking method for outlier detection," Information Sciences, vol. 324, pp. 88-107, Dec. 2015, doi: 10.1016/j.ins.2015.06.030.
[8] A. Daneshpazhouh and A. Sami, "Entropy-based outlier detection using semi-supervised approach with few positive examples," Pattern Recognition Letters, vol. 49, pp. 77-84, Nov. 2014, doi: 10.1016/j.patrec.2014.06.012.
[9] J. Gao, H. Cheng, and P.-N. Tan, "Semi-supervised outlier detection," in Proceedings of the ACM Symposium on Applied Computing, ACM, Dijon, France, April 2006, pp. 635-636, doi: 10.1145/1141277.1141421.
[10] Z. Xue, Y. Shang, and A. Feng, "Semi-supervised outlier detection based on fuzzy rough C-means clustering," Mathematics and Computers in Simulation, vol. 80, no. 9, pp. 1911-1921, May 2010, doi: 10.1016/j.matcom.2010.02.007.
[11] P. V. Amoli, T. Hamalainen, G. David, M. Zolotukhin, and M. Mirzamohammad, "Unsupervised network intrusion detection systems for zero-day fast-spreading attacks and botnets," International Journal of Digital Content Technology and Its Applications, vol. 10, no. 2, pp. 1-13, 2016.
[12] A. Bohara, U. Thakore, and W. H. Sanders, "Intrusion detection in enterprise systems by combining and clustering diverse monitor data," in Proceedings of the Symposium and Bootcamp on the Science of Security, Pittsburgh, PA, USA, 2016, pp. 7-16, doi: 10.1145/2898375.2898400.
[13] A. L. Buczak and E. Guven, "A survey of data mining and machine learning methods for cyber security intrusion detection," IEEE Communications Surveys Tutorials, vol. 18, no. 2, pp. 1153-1176, 2016, doi: 10.1109/COMST.2015.2494502.
[14] A. Nisioti, A. Mylonas, P. D. Yoo, and V. Katos, "From intrusion detection to attacker attribution: A comprehensive survey of unsupervised methods," IEEE Communications Surveys Tutorials, vol. 20, no. 4, pp. 3369-3388, 2018, doi: 10.1109/COMST.2018.2854724.
[15] J. S. Keertan, Y. Nagasai, and S. Shaik, "Machine Learning Algorithms for Oil Price Prediction," International Journal of Innovative Technology and Exploring Engineering, vol. 8, no. 8, pp. 958-963, June 2019.
[16] https://digital.ahrq.gov/key-topics/architecture-health-it
[17] C. Isaksson and M. H. Dunham, "A Comparative Study of Outlier Detection Algorithms," Machine Learning and Data Mining in Pattern Recognition, vol. 5632, pp. 440-453, 2009, doi: 10.1007/978-3-642-03070-3_33.
[18] S. Shaik and U. Ravibabu, "Classification of EMG Signal Analysis based on Curvelet Transform and Random Forest tree Method," Journal of Theoretical and Applied Information Technology (JATIT), vol. 95, no. 24, pp. 6856-6866, Dec. 2017.
[19] R. Bekkerman, M. Bilenko, and J. Langford, "Scaling up machine learning: Parallel and distributed approaches," Cambridge University Press, Cambridge, 2012.
[20] R. Vijaya Kumar Reddy and U. Ravi Babu, "A Review on Classification Techniques in Machine Learning," International Journal of Advance Research in Science And Engineering, vol. 7, no. 3, March 2018.
[21] M. Alam, "Statistical techniques for anomaly detection," September 2020. [Online]. Available: https://towardsdatascience.com/statistical-techniques-for-anomaly-detection-6ac89e32d17a
[22] J. Joseph, "How to detect outliers using parametric and non-parametric methods: Part I," 2019. [Online]. Available: https://clevertap.com/blog/how-to-detect-outliers-using-parametric-methods-and-non-parametric-methods/
[23] J. Ranjan Sethi, "Study of Distance-Based Outlier Detection Methods," June 2013. [Online]. Available: https://core.ac.uk/download/pdf/53189702.pdf
[24] G. Kumar Jha, N. Kumar, P. Ranjan, and K. G. Sharma, "Density Based Outlier Detection (DBOD) in Data Mining: A Novel Approach," Recent Advances in Mathematics, Statistics and Computer Science, pp. 403-412, 2016, doi: 10.1142/9789814704830_0037.
[25] A. Christy, G. Meera Gandhi, and S. Vaithyasubramanian, "Cluster Based Outlier Detection Algorithm For Healthcare Data," ISBCC-15, Procedia Computer Science, vol. 50, pp. 209-215, 2015, doi: 10.1016/j.procs.2015.04.058.
[26] S. Hawkins, H. He, G. Williams, and R. Baxter, "Outlier Detection Using Replicator Neural Networks," Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery, September 2002, pp. 170-180, doi: 10.1007/3-540-46145-0_17.