TELKOMNIKA, Vol.14, No.3, September 2016, pp. 1059~1066
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v14i3.1886
Received April 16, 2016; Revised July 15, 2016; Accepted July 29, 2016
Hadoop Performance Analysis on Raspberry Pi for DNA
Sequence Alignment
Jaya Sena Turana*, Heru Sukoco, Wisnu Ananta Kusuma
Bogor Agricultural University, Jl. Raya Darmaga Kampus IPB Darmaga Bogor 16680. Phone. +62 251 8622642
*Corresponding author, e-mail: sena.turana@gmail.com
Abstract
The rapid growth of electronic data has brought two major challenges: how to store big data and how to process it. The two main problems in processing big data are the high cost and the computational power required. Hadoop, one of the open source frameworks for processing big data, uses a distributed computational model designed to run on commodity hardware. The aim of this research is to analyze a Hadoop cluster on Raspberry Pi as commodity hardware for DNA sequence alignment. Six Raspberry Pi Model B boards and the Biodoop library were used for the alignment. The DNA sequences used in this research are between 5,639 bp and 13,271 bp long. The results showed that the Hadoop cluster ran on the Raspberry Pi with an average processor usage of 73.08%, 334.69 MB of memory and a job completion time of 19.89 minutes. Distributing the Hadoop data file blocks was found to reduce processor usage by as much as 24.14% and memory usage by as much as 8.49%. However, it increased job processing time by as much as 31.53%.

Keywords: big data, Hadoop, Raspberry Pi, DNA sequence alignment

Copyright © 2016 Universitas Ahmad Dahlan. All rights reserved.
1. Introduction
IBM defines big data as having three characteristics: volume, variety and velocity. Volume refers to the size of the data, variety refers to the type of the data (text, sensory data, audio, video, etc.) and velocity refers to the frequency at which an application produces data or the speed at which the produced data is analyzed [1]. Two major challenges with big data are how to store it and how to process it, and the most important thing is how to understand the data and transform it into meaningful information. The main problems in processing big data are the high cost, both for the hardware and the software, and the computational power [2]. Another problem is that powering the hardware requires a lot of electricity, which in turn has an adverse effect on the environment [3].

Hadoop [4] is one of the open source software frameworks developed to manage big data. Hadoop is designed to run on commodity hardware, so it can cut the cost of managing big data. Hadoop has two main components: the Hadoop Distributed File System (HDFS) and MapReduce. These components are inspired by the Google GFS and MapReduce projects [5]. HDFS is a distributed file system and MapReduce is a framework for analysing and transforming large data sets. HDFS stores metadata and application data separately. Metadata is stored in the NameNode while application data is stored in the DataNodes [6].
Raspberry Pi is a credit-card-sized piece of commodity hardware produced by the Raspberry Pi Foundation. As a mini computer, it can do everything a desktop computer does, such as browsing, playing video, making spreadsheets and playing games [7]. Raspberry Pi has two major advantages over other mini computers. The first is simple installation. The Raspberry Pi operating system is installed on an SD card by default, which speeds up the creation of a cluster because it only requires duplicating the SD card and making a few configuration changes. Second, Raspberry Pi is much cheaper than the others. For example, in June 2015, prices were $125 to $149 for BeagleBoard models, $49 to $89 for BeagleBone models, $174 to $182 for PandaBoard models, and $25 to $35 for Raspberry Pi models. The cost of Raspberry Pi is low and it requires little electricity [8]. Raspberry Pi therefore enables the construction of low-cost and energy-efficient clusters. However, it has several limitations. One of them is the slow performance of the SD card. The lifetime of the SD card is also significantly shortened by applications that frequently write to it [9].

There is increasing interest in using Hadoop to manage big data in various research fields, one of which is bioinformatics [10]. This field uses many massive data sets and applies mathematical, statistical and informatics methods to solve biological problems, mainly related to DNA sequences and amino acids. Next-generation DNA sequencing generates billions of sequences, so sequence alignment can no longer be performed on standalone machines. One solution to this problem is to run the algorithm on a cloud or cluster. Running the algorithm in parallel with Hadoop [16] has many advantages, including scalability, redundancy, automatic monitoring and high performance. The aim of this research is to implement and analyze a Hadoop cluster on Raspberry Pi for DNA sequence alignment [11] using the Biodoop library [12]. Through this research, it is expected that an easy and cost-effective way to manage big data will be found. It is also expected that this research can assist in developing environmentally friendly technology, for which energy conservation is one of the indicators [13].
2. Related Works
The Iridis-Pi cluster was built with 64 Raspberry Pi Model B nodes. This cluster was designed for educational applications, where it enables students to understand and apply high-performance computing and data handling for complex engineering and scientific challenges [17]. The Glasgow Cloud Data Center was built with 54 Raspberry Pi Model B nodes. This cluster emulated every layer of the cloud stack, ranging from resource virtualization to network behaviour, providing a full-featured cloud computing research and educational environment [8]. The Bolzano Cloud cluster was built with 300 Raspberry Pi Model B nodes. This cluster was designed to create an affordable and energy-efficient cluster [9]. A High Performance Computing (HPC) cluster with 14 Raspberry Pi Model B boards was tested on a 1000x1000 matrix workload, and the test results show that the cluster can process the data to completion [18].
3. Research Method
Six Raspberry Pi Model B boards running the Raspbian operating system were used in this research. One Raspberry Pi served as the Hadoop NameNode and the other five as Hadoop DataNodes. The Wordcount application, which is available in the Hadoop installation, was used for the initial testing of the Hadoop cluster architecture.
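The Wordcount job used for the initial test follows the usual MapReduce pattern. The following is a minimal in-memory sketch of that pattern in Python, not the Java example that ships with Hadoop; the function names `map_words`, `reduce_counts` and `word_count` are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(pairs):
    """Reduce phase: sum the counts emitted for each distinct word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines):
    """Run the map phase over all lines, then reduce the combined output."""
    pairs = []
    for line in lines:
        pairs.extend(map_words(line))
    return reduce_counts(pairs)
```

On a real cluster the map calls run on the DataNodes that hold the input blocks and the framework groups the pairs by key before the reduce phase; the sketch collapses both into a single process.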
Ganglia software was used to monitor the Hadoop cluster resources [14]. Ganglia was divided into two data sources: hadoop-masters and hadoop-slaves. The hadoop-masters source consisted of the Hadoop NameNode and the hadoop-slaves source consisted of all the Hadoop DataNodes. The Ganglia meta daemon and the Ganglia web frontend were installed on a virtual machine on a notebook. The virtualization software used for this research was VirtualBox. The DNA sequence alignment process was performed using the Biodoop software. The steps carried out by Biodoop for DNA sequence alignment are as follows:
1. DNA data in FASTA format is uploaded to HDFS through the web monitoring application.
2. Biodoop runs the fasta2tab application, which converts FASTA-format sequences to a TAB-delimited format.
3. Biodoop runs the biodoop_blast application, a wrapper-based MapReduce implementation of BLAST for Hadoop.
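The FASTA-to-tab conversion in step 2 can be pictured with the short Python sketch below. It is an illustration under stated assumptions rather than Biodoop's own fasta2tab code: the function name `fasta_to_tab` and the two-column output (sequence identifier, then sequence) are hypothetical choices.

```python
def fasta_to_tab(fasta_text):
    """Convert FASTA text into lines of the form id<TAB>sequence.

    Assumptions: the record id is the first word of the header line, and
    a sequence wrapped over several lines is joined into one string.
    """
    records = []
    header = None
    chunks = []
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            parts = line[1:].split()
            header = parts[0] if parts else ""
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return ["{}\t{}".format(name, seq) for name, seq in records]
```

Putting each record on a single line matters for MapReduce because line-oriented input splitting can then hand whole sequences to the map tasks.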
To compare sequence alignment performance, this research also used six IBM Lenovo MT-M 8800-5CJ PCs, each with a 1.86 GHz dual-core processor, one 160 GB hard disk and 1 GB of RAM. The DNA data for this research was obtained from the National Center for Biotechnology Information (NCBI). The DNA data is shown in Table 1.
Table 1. DNA data
DNA Name                                             Length (bp)   Size (KB)
Ancylostoma duodenale mitochondrion (NC_003415.1)    13,271        14.00
Necator americanus mitochondrion (NC_003416.2)       13,605        13.88
Chaetoceros tenuissimus DNA virus (NC_014748.1)      5,639         5.81
Chaetoceros lorenzianus DNA Virus (NC_015211.1)      5,813         5.98
Human papillomavirus type 132 (NC_014955.1)          7,125         7.31
Human papillomavirus type 134 (NC_014956.1)          7,309         7.49
For the purpose of this research, a web-based monitoring application was built to analyze the Hadoop Job Tracker on the Hadoop cluster. The data for the analysis was obtained from Ganglia using a web service protocol provided by the Ganglia API [15]. A daemon application was built in the Java programming language to monitor resources that cannot be monitored by Ganglia, such as temperature and disk input/output. This application communicated with the web monitoring application through a socket protocol.
Figure 1. Web monitoring communication process
The process of data collection from the Ganglia API and the monitoring daemon can be seen in Figure 1. The steps carried out by the web monitoring application are as follows:
1. The web monitoring application makes a request to the Ganglia API and the monitoring daemon.
2. The Ganglia API and the monitoring daemon send a text response in JavaScript Object Notation (JSON) format to the web monitoring application.
3. The web monitoring application parses the response data and stores it in the database.
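The parse-and-store step (step 3) can be sketched in Python as follows. The paper does not give the response layout or the database schema, so the JSON fields (`host`, `metric`, `value`, `timestamp`), the `metrics` table and the use of SQLite are all assumptions made for this illustration.

```python
import json
import sqlite3


def open_database(path=":memory:"):
    """Open a SQLite database and create the (assumed) metrics table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics ("
        "id INTEGER PRIMARY KEY, host TEXT, metric TEXT, value REAL, ts INTEGER)"
    )
    return conn


def store_response(conn, response_text):
    """Parse one JSON response and insert it as a row; return the row id."""
    data = json.loads(response_text)
    cursor = conn.execute(
        "INSERT INTO metrics (host, metric, value, ts) VALUES (?, ?, ?, ?)",
        (data["host"], data["metric"], float(data["value"]), int(data["timestamp"])),
    )
    conn.commit()
    return cursor.lastrowid
```

Storing one row per sample keeps later averaging over a job's run time a simple query over the `ts` range.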
Figure 2. Sequence diagram of daemon monitoring application
The process by which the web monitoring application calls the temperature and disk I/O daemon application is shown in Figure 2. The steps to obtain temperature and disk I/O values are as follows:
1. The web monitoring application makes a request through a socket protocol to the monitoring daemon.
2. The daemon application calls the temp.sh script for a temperature request or iostat for a disk I/O request.
3. The daemon application sends a text response in JSON format to the web monitoring application.
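The request handling in these three steps can be sketched as below. The daemon in this research was written in Java; this Python version only illustrates the control flow. The request keywords, the JSON reply fields, the `iostat -d` invocation and the way temp.sh is called are assumptions, not details taken from the paper.

```python
import json
import socketserver
import subprocess

# Assumed mapping from a request keyword to the command the daemon runs.
COMMANDS = {
    "temperature": ["sh", "temp.sh"],
    "disk": ["iostat", "-d"],
}


def run_command(argv):
    """Run a command and return its standard output as text."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


def handle_request(request, runner=run_command):
    """Build the JSON text reply for one request line read from the socket."""
    kind = request.strip()
    if kind not in COMMANDS:
        return json.dumps({"error": "unknown request"})
    output = runner(COMMANDS[kind])
    return json.dumps({"request": kind, "output": output.strip()})


class MonitorHandler(socketserver.StreamRequestHandler):
    """Socket handler: read one request line, write one JSON reply line."""

    def handle(self):
        request = self.rfile.readline().decode("utf-8")
        self.wfile.write(handle_request(request).encode("utf-8") + b"\n")
```

Passing the command runner as a parameter keeps `handle_request` testable on a machine that has neither temp.sh nor iostat; `MonitorHandler` could be served with `socketserver.TCPServer` on a port of the operator's choosing.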
Three DNA sequence alignments were performed in this research and each alignment was tested twice. In the first trial the default block size was used, while in the second the block size was modified. The metrics measured were processing time, processor usage, memory usage, disk input/output, network input/output and temperature. The results of the two trials were then compared to see the effect of the modifications.
The percentage change was calculated using the following formula:
[Formula not recoverable from the source; only the constant 100 is legible.]
A positive value means the block modification improves performance, and a negative one means it lowers performance.
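Because the formula itself is not legible in this copy, the sketch below assumes the usual relative-change definition, taken against the Trial I value and signed so that a lower Trial II value is positive. The function names `percent_change` and `mean_percent_change` are hypothetical, and the authors' exact formula may differ.

```python
def percent_change(trial_one, trial_two):
    """Relative change from Trial I to Trial II, in percent.

    Assumed sign convention: positive when the Trial II value is lower.
    """
    if trial_one == 0:
        raise ValueError("trial_one must be non-zero")
    return (trial_one - trial_two) / trial_one * 100.0


def mean_percent_change(pairs):
    """Average the percent change over several (trial_one, trial_two) pairs."""
    changes = [percent_change(a, b) for a, b in pairs]
    return sum(changes) / len(changes)


# Raspberry Pi memory usage (MB) for the three alignments, Trial I vs Trial II.
MEMORY_PAIRS = [(344.76, 335.91), (355.49, 312.36), (348.62, 311.02)]
memory_reduction = mean_percent_change(MEMORY_PAIRS)
```

Applied to the Raspberry Pi memory values reported in Tables 3 to 5, this definition gives roughly 8.49%, in line with the memory figure the paper reports, which is some evidence for the assumption but not a confirmation of it.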
4. Results and Analysis
Some Hadoop variable values had to be adjusted so that Hadoop could run on Raspberry Pi, because the specification of the Raspberry Pi is below the minimum requirements for Hadoop. The adjusted values were as follows:
1. HADOOP_HEAPSIZE. Hadoop is an application written in the Java language. The maximum memory allocation for the Java Virtual Machine (JVM), commonly called the heap size, is important for running a Java application. A few experiments were conducted to find a workable heap size, and the value was set to 384 MB. The heap size experiments are shown in Table 2.
Table 2. Heap size trial
Heap (MB)   Error   Error Message
64          Yes     Java heap space
128         Yes     Java heap space
192         Yes     Java heap space
256         Yes     Java heap space
320         Yes     Java heap space
384         No
448         Yes     Could not reserve enough space for object heap
512         Yes     Could not reserve enough space for object heap
"Java heap space" is the message of a java.lang.OutOfMemoryError, which is thrown when the Java Virtual Machine cannot allocate an object because it is out of memory and no more memory could be made available by the garbage collector.
2. Timeout. This research used small files. Splitting a file into smaller blocks makes MapReduce jobs take longer than usual because of the overhead of the splitting process and of creating Hadoop tasks. The value of dfs.client.file-block-storage-locations.timeout (default value of 1 second) was changed to 1200 seconds.
3. Block file size. Hadoop has a minimum block size of 1 MB and a default block size of 128 MB. Because this research used small files (7-14 KB), the minimum block size (dfs.namenode.fs-limits.min-block-size) was changed to 512 bytes and the block size (dfs.blocksize) was modified for each DNA sequence alignment trial.
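The effect of the block size setting on such small files can be seen with a little arithmetic. The sketch below is illustrative only: `block_count` is a hypothetical helper that counts how many blocks a file of a given size occupies, and it says nothing about where HDFS actually places those blocks.

```python
import math

KB = 1024
MB = 1024 * KB


def block_count(file_size_bytes, block_size_bytes):
    """Number of blocks needed to hold a file of the given size."""
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    if file_size_bytes <= 0:
        return 0
    return math.ceil(file_size_bytes / block_size_bytes)


# A 14 KB file (the largest entry in Table 1) under two block sizes.
default_blocks = block_count(14 * KB, 128 * MB)  # one block
small_blocks = block_count(14 * KB, 3 * KB)      # five blocks
```

With the 128 MB default the whole file sits in a single block, so only one map task has local data, whereas a 3 KB block size yields five blocks that can be spread over the five DataNodes.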
Figure 3. Hadoop cluster test using wordcount application
The result of the Hadoop cluster test using the Wordcount application is shown in Figure 3. Four trials were performed with different file sizes. The application ran on the cluster with an average job completion time of 19.89 minutes. The test shows that the relation between file size and completion time is not linear, and that Hadoop works optimally when the file is larger.

For the temperature data collection, the daemon application called a shell script in the Raspbian operating system, while for disk I/O the daemon application obtained the data from the iostat application.
4.1. Hadoop Cluster Performance Analysis for DNA Sequence Alignment
DNA sequence alignment was performed three times and each alignment was tested twice. The first trial used the default Hadoop block size and the second used a modified block size. The modification was made to distribute the file blocks across all the Hadoop DataNodes.
1. Ancylostoma duodenale mitochondrion (NC_003415.1) and Necator americanus mitochondrion (NC_003416.2)
Reference sequence: Ancylostoma duodenale mitochondrion
Query sequence: Necator americanus mitochondrion
On the first trial, a 128 MB block size was used for both sequences. On the second trial, a 3 KB block size was used for the NC_003415.1 sequence and a 10 KB block size was used for the NC_003416.2 sequence. A DNA similarity of more than 89% was found in the sequence alignment, with a bit score of 1225. The Raspberry Pi resource usage and processing time for the first DNA sequence alignment are shown in Table 3.
Table 3. Test result for NC_003415.1 and NC_003416.2
                Processor   Memory   Disk (Kbps)      Network (Kbps)    Temp.   Time
                (%)         (MB)     Read    Write    Input    Output   (C)     (m)
Trial I (Default block size)
  Raspberry     82.69       344.76   22.35   3.82     0.51     4.21     38.26   17.66
  PC            24.07       502.07   79.65   3.30     0.32     0.60     50.12   1.13
Trial II (Block size: 10 KB)
  Raspberry     62.50       335.91   8.28    6.45     7.28     9.51     49.04   25.86
  PC            22.10       415.24   15.97   9.82     2.17     12.91    49.10   1.16
It can be gathered from Table 3 that the DNA sequence alignment used almost the entire capacity of the processor and memory on the Raspberry Pi. On the PC, block size changes did not significantly affect processor usage or execution time. File block distribution had the following effects:
a. A decrease in average processor and memory use.
b. A decrease in disk read speed and an increase in disk write speed.
c. An increase in network input/output speed.
d. An increase in temperature and job completion time.
2. Human papillomavirus type 132 (NC_014955.1) and Human papillomavirus type 134 (NC_014956.1)
Reference sequence: Human papillomavirus type 132
Query sequence: Human papillomavirus type 134
On the first trial, a 128 MB block size was used for both sequences. On the second trial, a 3 KB block size was used for the NC_014955.1 sequence and a 5 KB block size was used for the NC_014956.1 sequence. A DNA similarity of more than 93% was found in the sequence alignment, with a bit score of 43.5. The Raspberry Pi resource usage and processing time for the second DNA sequence alignment are shown in Table 4.
Table 4. Test result for NC_014955.1 and NC_014956.1
                Processor   Memory   Disk (Kbps)      Network (Kbps)    Temp.   Time
                (%)         (MB)     Read    Write    Input    Output   (C)     (m)
Trial I (Default block size)
  Raspberry     92.14       355.49   16.34   4.73     4.91     7.69     48.04   15.45
  PC            20.93       719.73   46.62   22.81    7.28     32.72    44.75   1.10
Trial II (Block size: 5 KB)
  Raspberry     67.01       312.36   14.02   7.91     3.08     6.52     47.29   19.84
  PC            17.48       679.31   37.88   8.62     4.03     30.59    51.25   1.19
It can be gathered from Table 4 that the DNA sequence alignment used almost the entire capacity of the processor and memory on the Raspberry Pi. On the PC, block size changes did not significantly affect processor usage or execution time. File block distribution had the following effects:
a. A decrease in average processor and memory use.
b. A decrease in disk read speed and an increase in disk write speed on Raspberry Pi.
c. A decrease in disk read/write speed on PC.
d. A decrease in network input/output speed.
e. A decrease in temperature on Raspberry Pi and an increase in temperature on PC.
f. An increase in job completion time.
3. Chaetoceros tenuissimus DNA virus (NC_014748.1) and Chaetoceros lorenzianus DNA Virus (NC_015211.1)
Reference sequence: Chaetoceros tenuissimus DNA virus
Query sequence: Chaetoceros lorenzianus DNA Virus
On the first trial, a 128 MB block size was used for both sequences. On the second trial, a 3 KB block size was used for both the NC_014748.1 and NC_015211.1 sequences. A DNA similarity of more than 83% was found in the sequence alignment, with a bit score of 63. The Raspberry Pi resource usage and processing time for the third DNA sequence alignment are shown in Table 5.
Table 5. Test result for NC_014748.1 and NC_015211.1
                Processor   Memory   Disk (Kbps)      Network (Kbps)    Temp.   Time
                (%)         (MB)     Read    Write    Input    Output   (C)     (m)
Trial I (Default block size)
  Raspberry     71.24       348.62   13.97   5.03     4.74     7.43     47.61   15.07
  PC            34.91       798.56   43.33   11.58    5.69     3.93     43.62   1.29
Trial II (Block size: 3 KB)
  Raspberry     62.88       311.02   16.46   10.32    7.24     9.89     48.36   25.44
  PC            30.19       628.63   46.80   12.28    8.17     9.13     46.99   1.57
It can be gathered from Table 5 that the DNA sequence alignment used almost the entire capacity of the processor and memory on the Raspberry Pi. On the PC, block size changes did not significantly affect processor usage or execution time. File block distribution had the following effects:
a. A decrease in average processor and memory use.
b. An increase in disk read speed and disk write speed.
c. An increase in network input/output speed.
d. An increase in temperature and job completion time.
From all of the DNA sequence alignment trials on Raspberry Pi above, it can be concluded that file block distribution across the DataNodes can lower average processor use by as much as 24.14% and average memory use by as much as 8.49%. However, it increases job processing time by as much as 31.53%. On the PC, block size changes do not significantly affect processor use, memory use or processing time: processor use was lowered by as much as 12.73% and memory use by as much as 14.73%, while processing time increased by as much as 10.85%. Splitting a file into smaller blocks makes MapReduce jobs take longer than usual because of the overhead of the splitting process and of creating Hadoop tasks. File block distribution does not directly affect disk I/O, network I/O or temperature.
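The average Raspberry Pi figures reported in this paper (73.08% processor usage, 334.69 MB of memory and 19.89 minutes) can be reproduced from the Raspberry Pi rows of Tables 3 to 5, assuming they are plain means over all six runs. The sketch below shows that calculation; the names `TRIAL_ONE`, `TRIAL_TWO` and `column_means` are hypothetical.

```python
from statistics import mean

# Raspberry Pi rows of Tables 3, 4 and 5: (processor %, memory MB, time min).
TRIAL_ONE = [(82.69, 344.76, 17.66), (92.14, 355.49, 15.45), (71.24, 348.62, 15.07)]
TRIAL_TWO = [(62.50, 335.91, 25.86), (67.01, 312.36, 19.84), (62.88, 311.02, 25.44)]


def column_means(rows):
    """Return the mean of each column, rounded to two decimals."""
    return tuple(round(mean(column), 2) for column in zip(*rows))


# Means over all six runs (both trials of all three alignments).
overall = column_means(TRIAL_ONE + TRIAL_TWO)
```

Calling `column_means(TRIAL_ONE)` and `column_means(TRIAL_TWO)` separately gives the per-trial averages, which makes the trade-off between lower resource use and longer job time easy to see.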
5. Conclusion
As low-cost commodity hardware, Raspberry Pi can be used as an alternative platform for implementing a Hadoop cluster. A Hadoop cluster can work well on Raspberry Pi. The only disadvantage is the increase in job completion time, even for a simple job. A big data application such as DNA sequence alignment ran on the Raspberry Pi with an average processor usage of 73.08%, 334.69 MB of memory and a job completion time of 19.89 minutes. Distributing the Hadoop data file blocks was found to reduce processor usage by as much as 24.14% and memory usage by as much as 8.49%. However, it increased job processing time by as much as 31.53%.

Apache Hadoop has already released some tools to compile the Hadoop source into a native library, including for ARM processors. A native library is a library that is compiled to native code for the platform on which it runs. Future research can use the Hadoop native library to evaluate the performance of a Hadoop cluster on Raspberry Pi or other low-cost commodity hardware.
Acknowledgements
The authors would like to thank Simone Leo (simone.leo@crs4.it) for his valuable comments and feedback regarding this research study.
References
[1] Paul C, Chris E, Dirk D, Thomas D, George L. Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. New York: McGraw-Hill Companies. 2012: 5-9.
[2] Aisling O, Jurate D, Roy DS. 'Big data', Hadoop and Cloud Computing in Genomics. Science Direct Journal of Biomedical Informatics. 46(5): 774-781.
[3] Jacob L, Christos K. On the energy (in)efficiency of Hadoop clusters. ACM SIGOPS Operating Systems Review. 2010; 44(1): 61-65.
[4] Apache Hadoop. http://hadoop.apache.org/.
[5] Dhruba B, et al. Apache Hadoop Goes Realtime at Facebook. International Conference on Management of Data. New York. 2011: 1071-1080.
[6] Konstantin S, Hairong K, Sanjay R, Robert C. The Hadoop Distributed File System. Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on. Incline Village, NV. 2010: 1-10.
[7] Raspberry Pi. http://www.raspberrypi.org/.
[8] Fung PT, David RW, Simon J, Jeremy S, Dimitrios PP. The Glasgow Raspberry Pi Cloud: A Scale Model for Cloud Computing Infrastructures. Distributed Computing Systems Workshops (ICDCSW), 2013 IEEE 33rd International Conference on. Philadelphia. 2013: 108-112.
[9] Pekka A, et al. Affordable and Energy-Efficient Cloud Computing Clusters: The Bolzano Raspberry Pi Cloud Cluster Experiment. Cloud Computing Technology and Science (CloudCom), 2013 IEEE 5th International Conference on. Bristol. 2013: 170-175.
[10] Ronald CT. An Overview of The Hadoop/MapReduce/HBase Framework and Its Current Applications in Bioinformatics. Proceedings of the 11th Annual Bioinformatics Open Source Conference (BOSC) 2010. Boston. 2010.
[11] Stephen FA, Warren G, Webb M, Eugene WM, David JL. Basic Local Alignment Search Tool. Science Direct Journal of Biomedical Informatics. 1990; 215(3): 403-410.
[12] Simone L, Federico S, Gianluigi Z. Biodoop: Bioinformatics on Hadoop. Parallel Processing Workshops, 2009. ICPPW '09, International Conference on. Vienna. 2009: 415-422.
[13] San M, Gangadharan. Harnessing Green IT. New York, West Sussex: A John Wiley & Sons, Ltd. 2012: 7-10.
[14] Matthew L, Brent N, David E. The Ganglia Distributed Monitoring System: Design, Implementation, and Experience. Science Direct Journal of Biomedical Informatics. 2004; 30(7): 817-840.
[15] Matt M, Bernard L, Brad N, Vladimir V. Monitoring With Ganglia. California: O'Reilly Media, Inc. 2013: 66-68.
[16] Yan X, Wang Z, Zeng D, Hu C, Yao H. Design and Analysis of Parallel MapReduce based KNN-join Algorithm for Big Data Classification. TELKOMNIKA Indonesian Journal of Electrical Engineering. 2014; 12(11): 7927-7934.
[17] Cox SJ, Cox JT, Boardman RP, Johnston SJ, Scott M, O'Brien NS. Iridis-pi: a low-cost, compact demonstration cluster. 2014; 17(2): 349-358.
[18] Ashari A, Riasetiawan M. High Performance Computing on Cluster and Multicore Architecture. TELKOMNIKA Telecommunication Computing Electronics and Control. 2015; 13(4): 1408-1413.