TELKOMNIKA Indonesian Journal of Electrical Engineering
Vol. 13, No. 3, March 2015, pp. 537 ~ 545
DOI: 10.11591/telkomnika.v13i3.7115
Received December 1, 2014; Revised January 18, 2015; Accepted February 2, 2015
Clustering Fragments Metagenome Using Self-Organizing Map
Yunita Fauzia Achmad*1, Wisnu Ananta Kusuma2, Heru Sukoco3
1,2,3Department of Computer Science, Faculty of Mathematics and Natural Sciences, Bogor Agricultural University, Bogor 16680, Indonesia
*Corresponding author, email: yunita.achmad12p@apps.ipb.ac.id1, w.ananta.kusuma@gmail.com2, hsrkom@ipb.ac.id3
Abstract
A metagenome is the combined genetic material of several microorganisms collected from the environment. Metagenome analysis requires binning, i.e., grouping the metagenome fragments yielded by a sequencer. This research used the composition approach for conducting metagenome fragment binning. In this approach, binning can be implemented using unsupervised or supervised learning. We used the Self-Organizing Map (SOM), an unsupervised learning method, to conduct binning. We compared two training techniques in SOM, namely sequential training and batch training, to find the better technique. The results showed that batch training obtained a 3.8% error value on the [10 15] map. This error value is smaller than that of sequential training.
Keywords: batch training, binning, metagenome, self-organizing map, sequential
Copyright © 2015 Institute of Advanced Engineering and Science. All rights reserved.
1. Introduction
Bioinformatics is a discipline which originally arose for introducing order into the massive data sets produced by the new technologies of molecular biology, such as large-scale DNA sequencing, the measurement of multiple gene expression, and proteomics techniques. Bioinformatics integrates a number of related disciplines, such as computer science, cybernetics, molecular evolution, genomics and proteomics, biometry and biostatistics, and mathematical and computational biology [10]. Bioinformatics introduced the metagenome, which involves sampling microbial DNA from natural environments rather than relying on traditional, single-species cultivation techniques [8]. Research on metagenomes has been conducted; one study [14], using about 1 Gb of DNA sequences successfully sequenced from Sargasso Sea samples, showed that the microbial communities present are far more diverse than previously thought.
Binning has two approaches, namely the homology approach and the composition approach. The homology approach aligns DNA sequences by comparing metagenome fragments with database sequences, and the results are assigned at each taxonomic level [2]. This causes the homology approach to require a lot of time in the grouping process [8]. Examples of homology methods are BLAST and MEGAN. The composition approach has the advantage of bypassing alignment. It uses the base-pair features obtained from feature extraction as input for unsupervised or supervised learning. Methods that use unsupervised learning are SOM and TETRA, while a supervised learning method is PhyloPythia.
In this study, the algorithm used for training and testing in metagenome fragment clustering is the self-organizing map (SOM). SOM has been successfully applied to high-dimensional data [6]. This study also uses k-mers for feature extraction; the k-mer frequencies used are tetra-nucleotides and penta-nucleotides [2].
2. Research Method
2.1. National Center for Biotechnology Information (NCBI)
The first step is data collection. The data used were taken from the National Center for Biotechnology Information (NCBI). NCBI is a server that contains databases of health information and biotechnology. The databases are continuously updated according to the latest discoveries concerning DNA, proteins, active compounds, and taxonomy [5].
2.2. MetaSim
Next, the collected data are run through a sequencer simulation to obtain fragments. The sequencer software used is MetaSim. MetaSim is simulation software used to generate metagenome data [11].
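MetaSim is an external tool; as a rough illustration of what a read simulator produces, the sketch below simply samples fixed-length fragments from random positions of a genome string. This is our simplification, not MetaSim's algorithm, which also models sequencing error profiles.

```python
import random

def sample_fragments(genome, n_reads, length=5000, seed=0):
    """Toy stand-in for a sequencing simulator: draw n_reads fragments of a
    fixed length from random positions of a genome string."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(0, max(1, len(genome) - length))
        reads.append(genome[start:start + length])
    return reads
```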
2.3. Bacteria Data
The data used are bacteria data divided into two sets, namely training data and testing data. The training data consist of 10 organisms with 10,000 reads, whereas the testing data consist of 9 organisms with 5,000 reads. A fragment length of 5 kbp was used, at the genus level of taxonomy [7].
2.4. Feature Extraction
The next step is feature extraction. The feature extraction method used is the k-mer method, a characteristic extraction method that counts the occurrence patterns of subsequences of length k within a metagenome fragment, one position at a time along the sequence [3]. The occurrence patterns are built from the four main bases (A, C, T, and G), so the number of possible patterns is 4^k [1]. An illustration of k-mer feature extraction can be seen in Figure 1.
Figure 1. Feature extraction with k-mer
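To make the k-mer step concrete, here is a minimal Python sketch that turns a fragment into a tetra-nucleotide frequency vector. The function name and the choice to use relative frequencies are ours, not taken from the paper's implementation.

```python
from itertools import product

def kmer_frequencies(sequence, k=4):
    """Count overlapping k-mer occurrences in a DNA sequence and return a
    fixed-length frequency vector over all 4**k possible patterns."""
    bases = "ACGT"
    patterns = ["".join(p) for p in product(bases, repeat=k)]   # 256 patterns for k = 4
    index = {p: i for i, p in enumerate(patterns)}
    counts = [0] * len(patterns)
    sequence = sequence.upper()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in index:               # skip k-mers containing N or other symbols
            counts[index[kmer]] += 1
    total = sum(counts)
    # Relative frequencies, so fragments of different lengths stay comparable.
    return [c / total if total else 0.0 for c in counts]

# Example with a toy fragment instead of a real 5 kbp metagenome read.
vector = kmer_frequencies("ACGTACGTAGGCTTACGT", k=4)
print(len(vector))   # 256 features for tetra-nucleotides
```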
2.5. Normalization
Normalization is the process of scaling the attribute values of the data so that they fall within a specified range. The purpose of normalization is to remove data redundancy, reduce complexity, and simplify data modification [4]. In this study, the normalization method used is min-max normalization. The min-max normalization method standardizes the data by putting the values in the range of 0 to 1, with the smallest value mapped to 0 and the greatest value mapped to 1.
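A minimal sketch of the min-max scaling described above, applied column-wise with NumPy; the function name is ours.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each feature (column) of X into [0, 1]:
    the smallest value maps to 0 and the largest to 1."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # avoid division by zero
    return (X - col_min) / span

# Example with three fragments and four features.
X = np.array([[2.0, 0.1, 5.0, 7.0],
              [1.0, 0.3, 5.0, 3.0],
              [4.0, 0.2, 5.0, 1.0]])
print(min_max_normalize(X))
```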
2.6. Self-Organizing Map (SOM)
SOM was first introduced by Teuvo Kohonen of Finland in 1982 [6]. SOM is a neural network method based on competitive and unsupervised learning, because SOM does not have a target. SOM divides the data into several groups of defined clusters [6]. SOM has two training methods, namely sequential training and batch training [12].
2.6.1. Sequential Training
Sequential training, also referred to as on-line or stochastic training, is training in which the update is made for each sample of the training set [12]. The sequential training algorithm is as follows (a code sketch of these steps is given after the list):
1. Initialize the weights randomly; determine the maximum value for the neighborhood radius and the learning rate α.
2. Check the stop condition; while it is false, go through steps 3 to 9.
3. For each input vector x, proceed with steps 4 to 6.
4. Calculate the Euclidean distance for every unit j with the formula D(j) = Σ_i (x_i − w_ij)².
5. Determine the index j for which D(j) is smallest.
6. Update the weights of unit j and its whole neighborhood for all inputs, with the formula w_ij(new) = w_ij(old) + α [x_i − w_ij(old)].
7. Update the learning rate α by multiplying it by a learning-rate reduction function.
8. Reduce the neighborhood radius (N) at a specified time.
9. The stop condition is met when the desired values close to zero are reached. If the value of α becomes very small, the weight updates will also be very small, so the training process can be stopped.
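The steps above can be condensed into the following NumPy sketch of sequential training. The Gaussian neighborhood and the exponential decay of the learning rate and radius are common choices assumed here; the paper does not state its exact decay functions.

```python
import numpy as np

def train_som_sequential(X, rows=10, cols=15, epochs=10,
                         alpha0=0.1, radius0=None, seed=0):
    """Sequential (on-line) SOM training: weights are updated after every sample."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n_units, n_features = rows * cols, X.shape[1]
    W = rng.random((n_units, n_features))                 # step 1: random weight initialization
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    radius0 = radius0 if radius0 is not None else max(rows, cols) / 2.0
    total_steps, t = epochs * len(X), 0
    for _ in range(epochs):                               # step 2: loop until the stop condition
        for x in X[rng.permutation(len(X))]:              # step 3: present each input vector x
            alpha = alpha0 * np.exp(-t / total_steps)     # step 7: decaying learning rate (assumed form)
            radius = radius0 * np.exp(-t / total_steps)   # step 8: shrinking radius (assumed form)
            d = ((W - x) ** 2).sum(axis=1)                # step 4: D(j) = sum_i (x_i - w_ij)^2
            bmu = int(np.argmin(d))                       # step 5: index j with the smallest D(j)
            g = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-g / (2.0 * radius ** 2))          # Gaussian neighborhood (assumption)
            W += alpha * h[:, None] * (x - W)             # step 6: w_new = w_old + alpha*h*(x - w_old)
            t += 1
    return W.reshape(rows, cols, n_features)
```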
2.6.2. Batch Training
In batch training, the update is performed only after the presentation of all samples of the training set. The batch training algorithm is as follows (a code sketch is given after the list):
1. Initialize the weights randomly.
2. For each input vector x, proceed with steps 3 to 7.
3. Calculate the Euclidean distance with the same formula as in sequential training.
4. Determine the best matching unit (BMU) for each vector x, i.e., assign each vector to the most similar model vector; the set of vectors assigned to a unit is called its "Voronoi set".
5. Update each model vector and its neighborhood, replacing it by the neighborhood-weighted mean of the input vectors in its own and adjacent Voronoi sets.
6. Renew (decrease) the radius of the neighborhood.
7. The stop condition is met when the specified number of epochs has been reached.
The learning rate does not appear explicitly in the batch training equations and is therefore not included among the parameters.
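A corresponding sketch of batch training. The update of each unit as the neighborhood-weighted mean over the Voronoi sets is the standard batch SOM formulation and is assumed here, as is the Gaussian neighborhood; note that no learning rate appears.

```python
import numpy as np

def train_som_batch(X, rows=10, cols=15, epochs=50, radius0=None, seed=0):
    """Batch SOM training: the codebook is recomputed once per epoch."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    W = rng.random((rows * cols, X.shape[1]))                  # step 1: random weight initialization
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    radius0 = radius0 if radius0 is not None else max(rows, cols) / 2.0
    unit_dist = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)  # grid distances between units
    for epoch in range(epochs):
        radius = radius0 * np.exp(-epoch / epochs)             # step 6: shrinking radius (assumed form)
        d = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2) # step 3: distances to all units
        bmu = np.argmin(d, axis=1)                             # step 4: BMU of each sample (Voronoi sets)
        H = np.exp(-unit_dist / (2.0 * radius ** 2))           # Gaussian neighborhood between units (assumption)
        h = H[:, bmu]                                          # h[j, i]: weight of sample i for unit j
        numer = h @ X                                          # weighted sum of samples per unit
        denom = h.sum(axis=1, keepdims=True)
        # Step 5: each unit becomes the neighborhood-weighted mean of the samples
        # in its own and adjacent Voronoi sets; units with zero weight stay unchanged.
        W = np.where(denom > 0, numer / np.where(denom > 0, denom, 1.0), W)
    return W.reshape(rows, cols, X.shape[1])
```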
2.7. Evaluation
At the evaluation stage, the evaluation calculations performed are quantization error, topographic error, and percentage error [13]. Quantization error is a measure commonly used with the SOM method; it measures the average distance between each input vector and its Best Matching Unit (BMU). Topographic error is a measure that uses the input samples to determine how well the mapping from the input space to the map grid is preserved. Percentage error is used to calculate the mapping error in each grouping.
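The two map-quality measures can be computed as follows. This is a generic sketch based on the definitions above (quantization error as the mean distance from each sample to its BMU; topographic error as the share of samples whose two closest units are not adjacent on the grid); it is not the authors' code, and percentage error is omitted because it depends on the labels of each grouping.

```python
import numpy as np

def quantization_error(X, W_grid):
    """Average Euclidean distance between each sample and its BMU."""
    rows, cols, f = W_grid.shape
    W = W_grid.reshape(rows * cols, f)
    d = np.sqrt(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2))
    return float(d.min(axis=1).mean())

def topographic_error(X, W_grid):
    """Fraction of samples whose first and second BMUs are not neighbors on the map grid."""
    rows, cols, f = W_grid.shape
    W = W_grid.reshape(rows * cols, f)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)])
    d = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    order = np.argsort(d, axis=1)
    first, second = order[:, 0], order[:, 1]
    # Two units are neighbors if they are adjacent (including diagonals) on the grid.
    adjacent = np.abs(grid[first] - grid[second]).max(axis=1) <= 1
    return float(np.mean(~adjacent))
```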
3. Result and Analysis
3.1. Sequential Training
The sequential training experimental results are divided into two groups, namely results based on learning rate and results based on neighborhood. The experiments based on learning rate can be seen in Figures 2 and 3, while the experiments based on neighborhood can be seen in Figures 4 and 5.
a) Based on Learning Rate (LR)
Figure 2. Comparison of the map sizes for tetra-nucleotides based on learning rate with sequential training
Figure 2 shows the results of calculations using the sequential training method for tetra-nucleotide frequencies based on learning rate, comparing the evaluation measures across the map sizes used. The smallest quantization error, 0.3211, is obtained on the [15 15] map with a learning rate of 0.1. The smallest topographic error, 0.0476, is obtained on the [10 15] map with a learning rate of 0.1. The smallest percentage error, 5.68%, is obtained on the [15 15] map with a learning rate of 0.25.
Figure 3 shows the results of calculations using the sequential training method for penta-nucleotide frequencies based on learning rate, comparing the evaluation measures across the map sizes used. The smallest quantization error, 0.3575, is obtained on the [15 15] map with a learning rate of 0.1. The smallest topographic error, 0.0469, is obtained on the [10 10] map with a learning rate of 0.1. The smallest percentage error, 4.68%, is obtained on the [15 15] map with a learning rate of 0.25.
Figure 3. Comparison of the map sizes for penta-nucleotides based on learning rate with sequential training
b) Based on Neighborhood
Figure 4. Comparison of the map sizes for tetra-nucleotides based on neighborhood with sequential training
Figure 4 shows the results of calculations using the sequential training method for tetra-nucleotide frequencies based on neighborhood, comparing the evaluation measures across the map sizes used. The smallest quantization error, 0.307, is obtained on the [15 15] map with a neighborhood of 1. The smallest topographic error, 0.0566, is obtained on the [10 15] map with a neighborhood of 4. The smallest percentage error, 5.66%, is obtained on the [15 15] map with a neighborhood of 1.
Figure 5. Comparison of the map sizes for penta-nucleotides based on neighborhood with sequential training
Figure 5 shows the results of calculations using the sequential training method for penta-nucleotide frequencies based on neighborhood, comparing the evaluation measures across the map sizes used. The smallest quantization error, 0.3572, is obtained on the [15 15] map with a neighborhood of 2. The smallest topographic error, 0.0553, is obtained on the [10 10] map with a neighborhood of 2. The smallest percentage error, 4.67%, is obtained on the [15 15] map with a neighborhood of 3.
3.2. Batch Training
Different from sequential training, the batch training experiments were tested only by neighborhood, without a learning rate. The experimental batch training results can be seen in Figures 6 and 7.
Figure 6. Comparison of the map sizes for tetra-nucleotides with the batch training method
Figure 6 shows the results of calculations using the batch training method for tetra-nucleotide frequencies, comparing the evaluation measures across the map sizes used. The smallest quantization error, 0.3203, is obtained on the [15 15] map with a neighborhood of 4. The smallest topographic error, 0.0444, is obtained on the [15 15] map with a neighborhood of 4. The smallest percentage error, 5.08%, is obtained on the [15 15] map with a neighborhood of 2.
Figure 7 shows the results of calculations using the batch training method for penta-nucleotide frequencies, comparing the evaluation measures across the map sizes used. The smallest quantization error, 0.3563, is obtained on the [15 15] map with a neighborhood of 1. The smallest topographic error, 0.0408, is obtained on the [10 15] map with a neighborhood of 2. The smallest percentage error, 3.8%, is obtained on the [10 15] map with a neighborhood of 2.
Figure 7. Comparison of the map sizes for penta-nucleotides with the batch training method
3.3. Training Time Comparison between the Sequential Training and Batch Training Methods
The duration of training in this study is determined by the amount of data and the value of the epoch. The training method is another factor influencing the length of the training time. Table 1 shows a comparison of the training time between the sequential training and batch training methods.
Table 1. Comparison of the training time
Map size   Sequential training   Batch training
[10 10]    30 minutes            5 minutes
[10 15]    1 hour 10 minutes     25 minutes
[15 15]    2 hours               50 minutes
4. Conclusion
The conclusion that can be drawn from this study is that the batch training method produces a better grouping and is faster than the sequential training method. The smallest percentage error, 3.8%, was obtained for penta-nucleotide frequencies on the [10 15] map with a neighborhood of 2.
References
[1] Abe T, Kanaya S, Kinouchi M, Ichiba Y, Kozuki T, Ikemura T. Informatics for unveiling hidden genome signatures. Genome Research. 2003; 13(4): 693-701. doi: 10.1101/gr.634603.
[2] Chan CK, Hsu AL, Tang SL, Halgamuge SK. Using Growing Self-Organizing Maps to Improve the Binning Process in Environmental Whole-Genome Shotgun Sequencing. Journal of Biomedicine and Biotechnology. 2008. doi: 10.1155/2008/513701.
[3] Choi JH, Cho HG. Analysis of Common k-mers for Whole Genome Sequences Using SSB-Tree. Genome Informatics. 2002; 13: 30-41.
[4] Han J, Kamber M, Pei J. Data Mining: Concepts and Techniques. 3rd Edition. Waltham, USA: Morgan Kaufmann Publishers. 2012.
[5] Federhen S. The NCBI Taxonomy Database. Nucleic Acids Research. 2012; 40: 136-143. doi: 10.1093/nar/gkr1178.
[6] Kohonen T. Self-Organized Formation of Topologically Correct Feature Maps. Biological Cybernetics. 1982; 43(1): 59-69.
[7] Kusuma WA. Combined approaches for improving the performance of de novo DNA sequence assembly and metagenomic classification of short fragments from next generation sequencer. Tokyo (JP): Tokyo Institute of Technology. 2012.
[8] Naser S, Breland A, Harris FC, Nicolescu M. A Fuzzy Classifier to Taxonomically Group DNA Fragments within a Metagenome. University of Nevada, Reno. 2002.
[9] Overbeek MV, Kusuma WA, Buono A. Clustering Metagenome Fragments Using Growing Self-Organizing Map. In Proc. ICACSIS. 2013.
[10] Polanski A, Kimmel M. Bioinformatics. Berlin (DE): Springer. 2007.
[11] Richter DC, Ott F, Auch AF, Schmid R, Huson DH. MetaSim: a sequencing simulator for genomics and metagenomics. PLoS ONE. 2008; 3(10): e3373.
[12] Silva B. A study of a hybrid parallel SOM algorithm for large maps in data mining. Master Thesis. FCT-UNL. 2008.
[13] Uriarte EA, Martin FD. Topology Preservation in SOM. International Journal of Applied Mathematics and Computer Sciences. 2005; 1(1): 19-22.
[14] Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, Fouts DE, Levy S, Knap AH, Lomas MW, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Tillson HB, Pfannkoch C, Rogers YH, Smith HO. Environmental Genome Shotgun Sequencing of the Sargasso Sea. Science. 2004; 304. doi: 10.1126/science.1093857.