TELKOMNIKA, Vol.11, No.4, December 2013, pp. 797~802
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v11i4.1645
Received September 2, 2013; Revised October 10, 2013; Accepted October 20, 2013
Ovarian Cancer Identification using One-Pass
Clustering and k-Nearest Neighbors
Isye Arieshanti*, Yudhi Purwananto, Handayani Tjandrasa
Teknik Informatika, FTIf, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
Gedung Teknik Informatika, Kampus ITS Sukolilo
*Corresponding author, e-mail: i.arieshanti@if.its.ac.id
Abstrak
The patient's chance of recovery can be improved if ovarian cancer is detected earlier. Early identification of ovarian cancer can use protein expression profiles (SELDI-TOF MS). However, the analysis of protein expression profiles is not easy because of their high-dimensional and noisy characteristics. To handle these characteristics of SELDI-TOF MS data, this study proposes an ovarian cancer identification model that consists of One-Pass Clustering and a k-Nearest Neighbors classifier. With only simple and efficient computation, the performance of this classification model reaches an accuracy of 97%. This result shows that the proposed model can be a promising alternative for ovarian cancer identification.

Kata kunci: ovarian cancer, one-pass clustering, k-nearest neighbors
Abstract
The identification of ovarian cancer using protein expression profiles (SELDI-TOF-MS) is important to assist early detection of ovarian cancer. The chance to save a patient's life is greater when ovarian cancer is detected at an early stage. However, the analysis of protein expression profiles is challenging because the data have very high-dimensional features and noisy characteristics. In order to tackle those difficulties, a novel ovarian cancer identification model is proposed in this study. The model comprises One-Pass Clustering and a k-Nearest Neighbors classifier. With simple and efficient computation, the model achieves an accuracy of about 97%. This result shows that the model is promising for ovarian cancer identification.

Keywords: ovarian cancer, one-pass clustering, k-nearest neighbors
1. Introduction
Ovarian cancer is one of the most common cancers in Indonesia. The success of the disease treatment depends on the stage of the disease. The chance to save a patient's life is greater when ovarian cancer is detected at an early stage. In most cases, the disease is detected at an advanced stage because cancer detection at an early stage is quite challenging. Indeed, the protein abnormality that is a sign of initial cancer development could be inspected in protein profiles. One of the protein profiling techniques is Surface Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (SELDI-TOF MS). The SELDI-TOF MS data could be analyzed for ovarian cancer identification because a cancerous protein expression profile is different from the non-cancer one [1].
However, the comparison process is not trivial because the protein profile has very high-dimensional features and noisy characteristics. In order to tackle the problem of very high-dimensional features and noisy data, a computational model is required to assist the discrimination between cancerous protein and normal protein. Several studies have reported the success of computational models in ovarian cancer identification using SELDI-TOF-MS data.
The study of [1] employed statistical analysis and SELDI-TOF-MS to determine ovarian cancer. They reported that their model achieves a sensitivity of 98% and a specificity of 93.5%. They used data from Qilu Hospital, China. Another prediction model was presented by [2] with a sensitivity of 84% and a specificity of 89%. They [2] used SELDI-TOF MS data and an artificial intelligence approach. Another prediction model, developed by [3], also used SELDI-TOF MS data. They [3] attempted to reduce the feature dimension of the data using statistical moments and subsequently used the data to train the Kernel Partial Least Square
(KPLS). The achieved accuracy is about 98% for data without reduction and 99% for data with feature reduction. The superior performance of kernel-based methods is also shown in several case studies such as bankruptcy prediction [4] and image processing [5]. Although the performance of the model developed by [3] is superior compared to the previous studies [1, 2], there is still room for improvement.
In this study, we propose a novel model for predicting ovarian cancer using a combination of One-Pass Clustering and k-Nearest Neighbors. The model is driven by SELDI-TOF MS data. The main contribution is an improvement in time complexity that is more efficient compared to the previous model [3]. Furthermore, the performance in terms of accuracy is comparable to the KPLS model [3] for data without feature reduction.
This paper is organized as follows: Section II explains the dataset and the algorithm used in this study. Next, Section III illustrates the results and discussion. Finally, Section IV describes the conclusion.
2. Research Method
2.1. Dataset
Ovarian cancer could be identified from the analysis of protein expression profiles. The profiles can be obtained from several techniques. One of the techniques is Surface Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (SELDI-TOF MS). In this study, the model of ovarian cancer identification is developed using SELDI-TOF MS data. The data is publicly available at [6]. The dataset consists of 121 cancer samples and 95 non-cancer samples. Each sample contains more than 370000 features. A feature represents the intensity of an m/z ratio. Because the values of several features are missing, a preprocessing step is performed to remove the features with missing values. After the preprocessing step, the number of retained features is 39905. Subsequently, the dataset with these features will be used for training and testing. The model consists of a one-pass clustering model and a k-Nearest Neighbor classifier.
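The preprocessing step above (dropping every feature that has a missing value in any sample) can be sketched as follows. This is a minimal illustration, not the paper's implementation (which is stated later to be in Java); the array contents here are made up for demonstration.

```python
import numpy as np

def drop_missing_features(X):
    """Keep only the columns (features) that contain no missing values.

    X: 2-D array of shape (n_samples, n_features), where a missing
    intensity is encoded as NaN.
    Returns the filtered array and the indices of the retained features.
    """
    keep = ~np.isnan(X).any(axis=0)          # True for fully observed columns
    return X[:, keep], np.flatnonzero(keep)

# Tiny illustration: 3 samples x 4 features, feature 2 has a gap.
X = np.array([[1.0, 2.0, np.nan, 4.0],
              [1.5, 2.5, 3.0,    4.5],
              [0.5, 1.5, 2.5,    3.5]])
X_clean, kept = drop_missing_features(X)
print(X_clean.shape)   # (3, 3)
print(kept)            # [0 1 3]
```

Applied to the paper's dataset, the same column filter would reduce the raw feature set to the 39905 fully observed features used for training and testing.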
2.2. One-pass Clustering
One-pass clustering is an incremental clustering algorithm. As a non-iterative clustering algorithm, it generates clusters of the SELDI-TOF MS vector dataset in a single iteration. A data point is placed in a cluster when the similarity between the data point and the cluster centroid is the highest compared to the other centroids. The similarity metric is the Cosine similarity as defined in Equation 1. The algorithm of one-pass clustering [7] is described in Table 1.
Table 1. One-pass clustering algorithm
One-Pass(data_training) return set_of_clusters
Input: data training; Output: set of clusters
1. Initialization: D = a set of data_training; C = empty set of clusters;
   t = threshold; i = 0 (index data training); j = 0 (index cluster)
2. Create a new cluster c_j and set d_i as its member. Set d_i's label as c_j's label. Add to C.
3. If D is empty go to step 4
   else
     i = i+1
     calculate similarity of d_i and each centroid c_j.
     If the similarity > t and label d_i = label c_j
       add d_i to c_j and update centroid of c_j
     else
       j = j+1
       go to step 2
4. Stop clustering. Return C as a set of clusters
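The procedure in Table 1 can be sketched in Python as below. This is an illustrative reading of the algorithm, not the paper's implementation: each point joins the most similar existing centroid only when the cosine similarity exceeds the threshold t and the labels match, otherwise it seeds a new cluster. The threshold value and the toy data are assumptions made for the example.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors (Equation 1)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def one_pass_clustering(data, labels, t=0.9):
    """Label-aware one-pass clustering (after Table 1).

    Returns the cluster centroids and their class labels, built in a
    single pass over the training data.
    """
    centroids, members, cluster_labels = [], [], []
    for x, y in zip(data, labels):
        # Find the most similar existing centroid.
        best_j, best_sim = -1, -1.0
        for j, c in enumerate(centroids):
            s = cosine(x, c)
            if s > best_sim:
                best_j, best_sim = j, s
        if best_j >= 0 and best_sim > t and cluster_labels[best_j] == y:
            members[best_j].append(x)                             # join cluster
            centroids[best_j] = np.mean(members[best_j], axis=0)  # update centroid
        else:
            members.append([x])                                   # seed new cluster
            centroids.append(np.asarray(x, dtype=float))
            cluster_labels.append(y)
    return centroids, cluster_labels

# Two directions in 2-D, one per class: expect exactly two clusters.
data = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]])
labels = [0, 0, 1, 1]
cents, labs = one_pass_clustering(data, labels, t=0.9)
print(len(cents), labs)   # 2 [0, 1]
```

Because every point is compared only against the current centroids and placed once, the training cost stays linear in the number of samples, which is the efficiency the paper claims for this step.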
When a set of clusters has been generated, the training process is accomplished. In the next step, the centroids from each cluster are employed as the prediction model using the k-Nearest Neighbors (k-NN) algorithm. While a general k-NN uses all of the training data to predict the query data, this model will only use the centroids from the set of clusters. This approach reduces the computational time significantly.
2.3. k-Nearest Neighbors
The k-NN is a type of instance-based classification method. In order to classify a query data point, k-NN will vote the majority class of the k neighbours that are most similar to the query data. Subsequently, the voted class will be transferred to the query data as its class label. The neighbours are chosen from the training dataset. In this model, the training dataset is represented by the centroids of the clusters from the previous one-pass clustering process. The distance metric to compute the similarity between query data and centroids of clusters is the Cosine similarity. The equation of Cosine similarity is defined in Equation 1. The detail of the k-NN algorithm [8] is described in Table 2.
Table 2. k-nearest neighbour algorithm
k-Nearest-Neighbour(training_dataset, query_data, number_of_neighbours) return label_of_data_query
Input: training dataset; a query data; number of neighbours
Output: class label of the query data
1. Initialization: D = training dataset; k = number of neighbours
2. For each training data in D
   compute cosine similarity between query data and training data
3. Sort the training dataset according to similarity value.
4. Choose the k training data with largest similarity value
5. Vote majority class of the k training data from step 4
6. Return the majority class as the class label of query data
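The classification step in Table 2 can be sketched as below, with the cluster centroids standing in for the full training set as the paper describes. The centroid values, class names, and query vector are hypothetical, chosen only to make the example self-contained.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors (Equation 1)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def knn_predict(centroids, centroid_labels, query, k=3):
    """Classify `query` by majority vote among the k centroids that are
    most cosine-similar to it (Table 2)."""
    sims = [cosine(query, c) for c in centroids]
    top_k = np.argsort(sims)[::-1][:k]            # indices of the k largest similarities
    votes = [centroid_labels[i] for i in top_k]
    return max(set(votes), key=votes.count)       # majority class wins

# Three hypothetical centroids, two classes; the query leans toward "cancer".
cents = [np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([0.0, 1.0])]
labs = ["cancer", "cancer", "normal"]
print(knn_predict(cents, labs, np.array([0.8, 0.2]), k=3))   # cancer
```

Since the vote runs over a handful of centroids instead of all 216 training samples, prediction is cheap regardless of the training-set size, which is the speedup this model targets.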
The performance of the prediction model is measured using the accuracy, sensitivity and specificity metrics. Accuracy evaluates the ability of the model in predicting both the negative and the positive data. Sensitivity measures the model's ability in predicting the positive results, while specificity evaluates the model's ability in predicting the negative results. The formulas of accuracy, sensitivity and specificity are described in Equation 2, Equation 3 and Equation 4 respectively.
cos(x_i, x_j) = (x_i . x_j) / (||x_i|| ||x_j||)      (1)

Accuracy = (TP + TN) / (TP + TN + FP + FN)           (2)

Sensitivity = TP / (TP + FN)                         (3)

Specificity = TN / (TN + FP)                         (4)
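The three metrics can be computed directly from the confusion counts. In this sketch, the TP/TN/FP/FN values are hypothetical, chosen only so that the totals match the 216-sample dataset; they are not the paper's actual confusion matrix.

```python
def accuracy(tp, tn, fp, fn):
    """Equation 2: fraction of all samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    """Equation 3: fraction of positive samples detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Equation 4: fraction of negative samples detected."""
    return tn / (tn + fp)

# Hypothetical confusion counts over 121 cancer and 95 non-cancer samples.
tp, tn, fp, fn = 118, 93, 2, 3
print(round(accuracy(tp, tn, fp, fn), 3))   # 0.977
print(round(sensitivity(tp, fn), 3))        # 0.975
print(round(specificity(tn, fp), 3))        # 0.979
```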
In Equation 1, x_i and x_j denote vector data of SELDI-TOF MS or the centroid of a cluster. In Equations 2, 3, and 4, TP, TN, FP and FN are True Positive, True Negative, False Positive and False Negative respectively.
3. Results and Analysis
The proposed model is implemented using the Java language in the Mac OS environment with a memory setting of 2 Gigabytes. The validation result of the prediction model is listed in Table 3. The result shows the performance of the model in terms of accuracy, sensitivity and specificity. They are assessed using 10-fold cross-validation. Each 10-fold cross-validation is run 10 times randomly.
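The evaluation protocol above (10-fold cross-validation repeated 10 times with random shuffles) can be sketched as follows. The fold construction and seeding scheme here are illustrative assumptions; the paper does not specify its implementation.

```python
import random

def ten_fold_indices(n, seed):
    """Shuffle the sample indices and split them into 10 roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::10] for i in range(10)]

def repeated_cross_validation(n, evaluate, repeats=10):
    """Run 10-fold cross-validation `repeats` times with different shuffles
    and average the per-fold scores returned by evaluate(train_idx, test_idx)."""
    scores = []
    for seed in range(repeats):
        folds = ten_fold_indices(n, seed)
        for f in range(10):
            test = folds[f]
            train = [i for g in range(10) if g != f for i in folds[g]]
            scores.append(evaluate(train, test))
    return sum(scores) / len(scores)
```

With n = 216 samples, each repeat holds out about 21-22 samples per fold, and the reported figure is the mean over the 100 resulting fold scores.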
According to the evaluation result shown in Figure 1, the accuracy of the proposed model ranges from 96.3% to 99.1% with an average accuracy of about 97.8%. The averages for the sensitivity and specificity are 97.9% and 97.7% respectively. In addition, the range for sensitivity is 95.8%-100% and the range for specificity is 95%-99.2%. These results illustrate the good ability of the model in discriminating cancer protein profiles from non-cancer protein profiles. The performance in identification of positive results (sensitivity) is as high as the performance in classifying negative results (specificity), even though the number of positive training data is greater than the number of negative training data.
Figure 1. The performance of the proposed model in terms of accuracy, sensitivity and specificity (%). The means of accuracy, sensitivity and specificity are 97.8%, 97.9% and 97.7% respectively, and the standard deviations of accuracy, sensitivity and specificity are 0.009, 0.012 and 0.015 respectively.
In order to compare the model performance with that of other classification models, the prediction process is also performed with ordinary k-Nearest Neighbour [8], Complement Naïve Bayes [9], Naïve Bayes Multinomial [10], Support Vector Machine (SMO algorithm) [11], Radial Basis Function Network (RBF Network) [12], Decision Tree (J48) [13] and Random Forest [14]. All of those compared classifiers [9-14] are in the WEKA [12] implementation with the same Mac OS environment as the proposed model. The memory setting is also the same as the proposed model, 2 Gigabytes. The comparison results are listed in Table 3.
Table 3. Comparison of accuracy between proposed model and other classifiers
Classifier                                        Accuracy (%)
Complement Naïve Bayes                            76.8
Naïve Bayes                                       74.6
Support Vector Machine (SMO)                      96.8
RBF Network                                       86
Decision Tree (J48)                               83
Random Forest                                     84.7
1-Nearest Neighbour                               88.5
One-Pass Clustering + k-NN (the proposed model)   97.8
Indeed, we also performed classification using Bayes Network, Multilayer Perceptron, Voted Perceptron and Logistic Regression. However, the classification was not successful because of a memory problem. The possible explanation is the high-dimensional characteristic of the SELDI-TOF MS data. Thus, Bayes Network, Multilayer Perceptron, Voted Perceptron and Logistic Regression require much more memory compared to the other classifiers.
Table 3 illustrates the performance of the other classifiers compared to the performance of the proposed model. For the Naïve Bayes-based classifiers, the results are about 74%-76%. For the other classifiers except SVM (SMO), the performance ranges from about 80% to 88%. Exclusively for SVM (SMO), its performance is quite high, about 96%. The performance of SVM is similar to the performance of the proposed model. However, the SVM training process requires an expensive computational cost. The time complexity for standard SVM is O(n^3) and at least O(Nn) for SMO, where N denotes the number of support vectors and n denotes the number of training samples.
Furthermore, when comparing with the performance of the KPLS model of [3], their accuracy is about 98%. However, the time complexity of KPLS is still more expensive than that of the proposed model. The time complexity of KPLS is at least O(n^2). On the contrary, the time complexity for the training process in the proposed model is only linear, O(n), where n represents the number of training samples. The time complexity of the training process is required in generating the clusters. Because the clusters are formed in a single pass over the training data, the process only needs n basic operations.
4. Conclusion
According to the classification accuracy of the various classifiers (Table 3), the SELDI-TOF-MS data could be used for ovarian cancer identification. The classification accuracies of the various classifiers are between 74% and 97%. These results confirm that the cancerous SELDI-TOF MS data is different from the normal SELDI-TOF MS data. Furthermore, in terms of classification accuracy, the proposed model in this study has higher performance than the other classifiers listed in Table 3. The performance of the proposed model is comparable to the performance of SVM (SMO). However, the time complexity of the proposed model is much more efficient than the time complexity of the SVM (SMO). With the characteristics of the ovarian cancer SELDI-TOF MS data, which has 39905 features, the proposed model also exhibits a superior ability in processing very high-dimensional features. When other classifiers such as Bayes Network, Multilayer Perceptron, Voted Perceptron and Logistic Regression fail to process those data (with a memory setting of 2 Gigabytes), the proposed model is able to compute and perform well in the same memory setting and environment.
Acknowledgement
This work is supported by the grant "Bantuan Operasional Perguruan Tinggi Negeri" (BOPTN) in the scheme "Penelitian Dosen Muda", Institut Teknologi Sepuluh Nopember 2013.
References
[1] Zhang H, Kong B, Qu X, Jia L, Deng B, Yang Q. Biomarker discovery for ovarian cancer using SELDI-TOF-MS. Gynecol Oncol. 2006; 102(1): 61-6.
[2] Wu SP, Lin YW, Lai HC, Chu TY, Kuo YL, Liu HS. SELDI-TOF MS profiling of plasma proteins in ovarian cancer. Taiwan J Obstet Gynecol. 2006; 45(1): 26-32.
[3] Tang KL, Li TH, Xiong WW, Chen K. Ovarian cancer classification based on dimensionality reduction for SELDI-TOF data. BMC Bioinformatics. 2010; 11: 109.
[4] Arieshanti I, Purwananto Y, Ramadhani A, Ulinnuha M, Ulinnuha N. Comparative Study of Bankruptcy Prediction Models. TELKOMNIKA Telecommunication Computing Electronics and Control. 2013; 11(3): 591-596.
[5] Cui E. Wide Baseline Matching Using Support Vector Regression. TELKOMNIKA Telecommunication Computing Electronics and Control. 2013; 11(3).
[6] Gordon Whiteley. Biomarker Profiling, Discovery and Identification. Center for Cancer Research, National Cancer Institute. Available online at: http://home.ccr.cancer.gov/ncifdaproteomics/ppatterns.asp
[7] S Rieber, VP Marathe. The Single Pass Clustering Method.
[8] D Aha, D Kibler. Instance-based learning algorithms. Machine Learning. 1991; (6): 37-66.
[9] JD Rennie, Lawrence Shih, J Teevan, DR Karger. Tackling the Poor Assumptions of Naive Bayes Text Classifiers. ICML 2003: 616-623.
[10] Andrew McCallum, Kamal Nigam. A Comparison of Event Models for Naive Bayes Text Classification. AAAI-98 Workshop on 'Learning for Text Categorization'. 1998.
[11] J Platt. Fast training of support vector machines using sequential minimal optimization. In: B Schoelkopf, C Burges, A Smola. Kernel Methods - Support Vector Learning. MIT Press. 1998.
[12] M Hall, E Frank, G Holmes, B Pfahringer, P Reutemann, IH Witten. The WEKA data mining software: an update. ACM SIGKDD Explorations. 2009; 11(1): 10-18.
[13] R Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers. San Mateo, CA. 1993.
[14] L Breiman. Random Forests. Machine Learning. 2001; (45): 5-32.