TELKOM
NIKA Indonesia
n
Journal of
Electrical En
gineering
Vol. 12, No. 12, Decembe
r
2014, pp. 82
7
8
~ 828
5
DOI: 10.115
9
1
/telkomni
ka.
v
12i12.65
72
8278
Re
cei
v
ed
Jul
y
4, 2014; Re
vised Septem
ber
21, 20
14;
Accept
ed O
c
tober 12, 20
1
4
Accelerating Computation of DNA Multiple
Sequence
Alignment in Distributed Environment
Ramdan Sa
tra*
1,2
, Wisnu Anan
ta
Kus
u
m
a
1
, Heru Sukoco
1
1
Departme
n
t of Computer Sci
ence, Bog
o
r A
g
ricult
ural U
n
iv
ersit
y
Kampus IPB D
e
rmag
a
, Jl. Meranti, W
i
ng
20
Leve
l
5-6, Bog
o
r 166
80, Indo
nesi
a
T
e
lp./F
ax.: +
62-251-
862
55
84
2
Departme
n
t of Informatics Engin
eeri
ng, Un
iv
ersi
t
y
of Musl
i
m
Indones
ia, Makassar, Ind
ones
ia
*Corres
p
o
ndi
n
g
author, e-ma
i
l
: ramdanstr@
g
mail.c
o
m
1
, ananta@
ipb.ac.i
d
2
, hsrkom@ipb
.ac.id
2
A
b
st
r
a
ct
Multipl
e
s
equ
e
n
ce
ali
g
n
m
e
n
t
(MSA) is a
tec
hni
que
for fi
ndi
ng s
i
mil
a
rity i
n
ma
ny s
equ
enc
es. T
h
i
s
techni
qu
e is v
e
ry i
m
porta
nt to sup
port man
y
Bioi
nfor
matic
s
task such a
s
identifyi
ng S
i
ngl
e N
u
cle
o
tid
e
Poly
mor
phis
m
(SNP), cre
a
ti
ng
phyl
o
g
e
n
e
ti
c tree, a
n
d
metage
no
me fra
g
ments
bin
n
i
n
g. T
he s
i
mpl
e
st
alg
o
rith
m in M
SA is Star Algorith
m
. T
h
is al
gorith
m
co
nsis
ts of alignin
g
all poss
i
bl
e pa
irs of seque
nc
es,
findi
ng
a s
e
q
u
ence
Star c
h
o
s
en fro
m
seq
u
ence
that
has
maxi
mu
m a
l
i
g
n
m
e
n
t score,
an
d
ali
gni
ng
all
sequ
enc
es refered to the s
equ
enc
e Star. Each of
pair
w
ise alig
nmen
ts is conducte
d usin
g dyna
mi
c
progr
a
m
min
g
techn
i
qu
e. T
he
compl
e
xity of
DNA
multi
p
l
e
s
equ
enc
e al
ign
m
e
n
t usi
ng dy
n
a
mic pro
g
ra
mmi
n
g
techni
qu
e is v
e
ry hi
gh. T
he
computati
on ti
me
is
incr
eas
ed ex
po
nenti
a
l
l
y du
e to the
incre
a
sin
g
of t
h
e
nu
mb
er a
n
d
th
e l
ength
of
DN
A seq
u
e
n
ces.
T
h
is res
earch
ai
ms to
acce
le
rate co
mputati
on
of Star M
u
ti
ple
Sequ
enc
e Al
ig
nment
usin
g M
e
ssag
e
P
a
ssin
g
Interfaces
(M
PI). T
he perf
o
rma
n
ce
of th
e
p
r
opos
ed
metho
d
w
a
s evaluate
d
by calcu
l
atin
g spee
du
p. Expe
riment w
a
s con
ducted us
in
g 6
4
sequ
enc
es of 800 bp Glyci
n
e
-
max-c
h
ro
mos
o
me-
9
-BBI frag
me
nts yie
l
d
e
d
by ra
ndo
mly
cut from r
e
fer
ence s
e
q
u
e
n
c
e
of Glyci
ne-
max-
chro
mos
o
me-9
-BBI taken fro
m
NCBI (N
atio
nal
Ce
nter
for
Biotech
nol
ogy
Information). T
he r
e
sults s
h
o
w
e
d
that the pro
p
o
s
ed tech
niq
ue
coul
d obta
i
n s
pee
d u
p
three
times
usin
g fi
ve co
mp
uters
w
hen al
ign
i
n
g
6
4
sequ
enc
es of Glycine-
max-c
h
ro
mos
o
me-9-
BBI fragme
n
ts
. Moreover, th
e incre
a
sin
g
o
f
the nu
mber
o
f
computers would significantly increas
e spe
e
d
up of the prop
osed
meth
od.
Ke
y
w
ords
:
DNA mu
ltipl
e
sequ
enc
e
al
ig
nment,
d
i
stribu
ted
co
mputin
g, star al
gorith
m
, mess
ag
e p
a
s
s
i
n
g
interface
Copy
right
©
2014 In
stitu
t
e o
f
Ad
van
ced
En
g
i
n
eerin
g and
Scien
ce. All
rig
h
t
s reser
ve
d
.
1. Introduc
tion
Multiple Seq
uen
ce Alig
n
m
ent (MSA
) is a
techni
que fo
r alig
n
i
ng multipl
e
biologi
cal
seq
uen
ce
s [1] to find similaritie
s
an
d differen
c
e
s
in the seq
uen
ce [2]. MSA is use
d
in
phyloge
netic
analysi
s
to a
s
sessin
g the
origin
of
spe
c
ies. MSA is
a
l
so u
s
e
d
to
search
a si
ngl
e
nucl
eotide po
lymorphi
sm (SNP)
fo
r
mol
e
cul
a
r-ba
se
d
plant b
r
ee
din
g
gen
etics [3]
.
The g
r
o
w
th
of
biologi
cal se
quen
ce data
bases ma
ke
s the se
q
u
ence alignm
ent comp
uta
t
ion increa
se
s
i
gnific
antly [1].
The co
mplex
i
ty of sequence alig
nme
n
t is O (LN), where L i
s
the length of each
seq
uen
ce
an
d N i
s
the
nu
mber
of se
qu
ences [4]. Th
e long
er a
nd
the more seq
uen
ce
s to b
e
aligne
d the longe
r it’s computation
a
l
time. The
optimal time of sequen
ce alignme
n
t is
increa
sed ex
pone
ntially with incre
a
sein
g of
the numb
e
r and le
ngth
of sequ
en
ce
s [5].
Method
s o
r
t
ools
are ne
e
ded to o
b
tain
optim
al com
putational proc
e
s
s. Previo
us
study
to optimi
z
e t
he
comp
utational
se
que
nce alig
nm
ent
use
d
a
dyna
mic
pro
g
ra
m
m
ing al
go
rith
m
w
i
th complexity of
O (m
n)
by Zhou a
nd
Che
n
[6]. Other stu
d
ie
s u
s
ed a line
a
r
space algo
rith
m
for p
a
irwise
seq
uen
ce
ali
gnment
with
the compl
e
xity of
O (n
)
[
7
, 8]. Othe
rs used
pa
rall
el
comp
uting wi
th MPI [9] and GPU-CUDA [1, 3]. Parallel
comp
uting used to prop
ose n
e
w
clssification
algorith
m
[12
]
and
simula
te pro
perti
e
s
of polyme
r
chai
n [13] in
other case
of
comp
utationa
l proble
m
.
In this research,
parallel
computing i
s
utilized on clusters
of computers known as
Distri
buted
M
e
mory
. Thi
s
res
e
a
r
c
h
trie
s to
imp
r
ove
a previou
s
work
co
ndu
ct
ed by Sun
a
rt
o
w
h
ic
h
us
ed
sta
r
a
l
go
r
i
th
ms
in
p
a
r
a
lle
l
e
n
v
ir
o
n
me
n
t
u
s
ing
MPI [9
]. T
h
e
pr
e
v
io
us
re
se
a
r
c
h
had
limitation in scalability. In this research,
we
proposed
a scallabl
e di
stributed com
puting of DNA
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
2302-4
046
Accel
e
rating Com
putation of
DNA
Multi
p
le
Seque
nce
Alignm
ent in… (Ram
da
n Satra)
8279
Multiple Seq
uen
ce Align
m
ent. The n
e
w a
pproa
ch
coul
d han
dl
e more num
ber of
seq
u
e
n
ce
s
and
wa
s al
so
easily
exten
ded by i
n
crea
sing th
e n
u
m
ber
of re
so
urce
s u
s
ed
to
compute m
u
ltiple
seq
uen
ce ali
gnment.
2. Rese
arch
Metho
d
2.1. Rese
arc
h
Data
In this research, we u
s
ed
the seq
uen
ce of
Glycin
e
-m
ax-chrom
o
s
om
e-9-BBI
obtained
from NCBI (
Nation
al
Cen
t
er for Biote
c
hn
olog
y Inf
o
rm
ation
) [10]. Detailed
research data is
sho
w
n in Ta
b
l
e 1.
Table 1. Re
search data
Gene N
a
me
Length (b
p)
Number
G
l
ycine-
m
a
x-
chrom
o
som
e
-
9
-
BBI
811-850
64
2.2. Multiple Sequenc
e Al
ignment (MS
A
)
w
i
th Star
Algorithm
Algorithm
s for MSA are
actually de
veloped b
a
sed on p
r
ob
abilisti
c or
heuri
s
ti
c
approa
che
s
such
a
s
th
e St
ar m
e
thod
an
d the
Pro
g
re
ssive
meth
od
[3]. This research
u
s
ed
the
Star
metho
d
by
improving and extendin
g
t
hep
reviou
s re
sea
r
ch
co
ndu
cted
by Sunarto
AA
et al
.
In the Star ali
gnment
algo
ri
thm,
there are
3 stage
s,
i
n
clu
d
ing (i
) calcul
ating
p
a
i
r
wi
se alignm
ent
score
s
from
all possibl
e
seq
uen
ce p
a
i
rs, (ii
)
sele
cting a Star
seque
nce whi
c
h h
a
s the
b
e
st
alignme
n
t score, an
d (iii
)
aligning
the S
t
ar seque
nce
with othe
r se
quen
ce
s [3, 9
]. The illustration
of multiple se
quen
ce alig
n
m
ent usin
g the Star al
gorit
hm can b
e
de
scribe
d as foll
ows:
For exampl
e, sup
p
o
s
ed
we
have
DNA seque
nce data
as follows:
Seq 1 = ATG
G
Seq 2 = ATG
C
Seq 3 = ATCC
Seq 4 = AGCG
The first
sta
ge is
calcula
t
ing pairwise
si
milarity score
s from all
possible
se
quen
ce
s
pairs. This
stage ha
s
O
(
) compl
e
xity
where
n
i
s
nu
mber of sequ
ences. Thi
s
stage begi
ns
by
cal
c
ulatin
g the numbe
r of seque
nce pairs usi
ng Equat
ion (1
):
(n-
1
) +
(n
-2)
+ …. + 1
or
n
(
n-1)/2
(
1
)
Whe
r
e
n
is n
u
mbe
r
of
seq
uen
ce
s.
The next ste
p
is creating
identity matrix c
ontainin
g
the se
que
nce
s
, symboli
z
e
d
by Kn.
Figure 1 sh
o
w
s the id
entity matrix.
Seq 1
Seq 2
Seq 3
Seq 4
Seq 1
K1 K2
K3
Seq 2
K1
K4 K5
Seq 3
K2
K4
K6
Seq 4
K3
K5
K6
Figure 1. Pairwise se
que
nce identity matrix
The re
sult
s of the determin
a
tion
of DNA
pairwise se
qu
ence are:
K1 = Seq 1 (ATGG) a
nd Seq 2 (ATG
C)
K2 = Seq 1 (ATGG) a
nd Seq 3 (AT
C
C)
K3 = Seq 1 (ATGG) a
nd Seq 4 (AG
C
G
)
K4 = Seq 2 (ATGC) and S
eq 3 (AT
C
C)
K5 = Seq 2 (ATGC) and S
eq 4 (AG
C
G
)
K6 = Seq 3 (ATCC) and S
eq 4 (AG
C
G
)
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 23
02-4
046
TELKOM
NI
KA
Vol. 12, No. 12, Decem
ber 20
14 : 8278 – 82
85
8280
Each pai
r of sequ
en
ce
s above is alig
ned
u
s
ing Needlem
an Wunsch al
gorit
hm. Thi
s
algorith
m
u
s
es dyn
a
mic prog
ram
m
in
g app
roa
c
h
to find glo
bal simil
a
rity of seq
uen
ces.
Needleman Wunsch algorithm
consist
s
of matrix i
n
itialization, fillin
g the value of
each cell matrix
and tra
c
ing b
a
ck. Similarity sco
re of pai
rwi
s
e sequ
en
ce is
cal
c
ulat
ed usi
ng Equ
a
tion (1
) belo
w
:
,
,
,
,
0
(
1
)
Whe
r
e:
m
is the row and
n
is the
collumn.
A
is a matrix of values
for each cell i
s
e
x
presse
d as
(
A
[m,n]
)
S
is the score
of each cell is expre
s
sed
as (
S
[m
,n]
), match sco
r
e = 5
and unm
atch
score =
-3
W
expressed
as ga
p align
m
ent with value = - 4
The cal
c
ul
at
ion re
sult
s of
simila
rit
y
sco
r
e
s f
o
r
all pai
rs of seq
uen
ces are sh
own
in Figure 2.
0
Seq 1
Seq 2
Seq 3
Seq 4
Seq 1
0
12
4
7
Seq 2
12
0
12
7
Seq 3
4
12
0
4
Seq 4
7
7
4
0
Figure 2. Sequen
ce align
m
ent simila
rity score
s
The second
stage of sta
r
a
l
gorithm i
s
se
lect
ing a Sta
r
seq
uen
ce. A
Star seq
uen
ce
wa
s
sele
cted
by compa
r
ing th
e
se
quen
ce
ali
gnment
sim
ili
artity scores
of all
sequ
en
ce
s. A sequ
e
n
ce
with the be
st
simila
rity score
wa
s cho
s
en
as
a Sta
r
se
que
nce. Next, the Star se
que
nce
wa
s
update
d
by a
ligning it to th
e othe
r sequ
enc
es. T
he
complexity of this
stage
is
O(n
)
, wh
ere
n
is
the numbe
r o
f
seque
nces.
Illustration of
this stag
e is shown in Figu
re 3.
0
Seq 1
Seq 2
Seq 3
Seq 4
A
ccu
mulat
e
d
Similarity Sc
ores
Seq 1
0
12
4
7
23
* Seq 2
12
0
12
7
31
Seq 3
4
12
0
4 20
Seq 4
7
7
4
0
18
Star Sequen
ce = Seque
nce 2 ( A T G C )
Seq 2
Seq 1
Seq 3
Seq 4
Ne
w Star Sequen
ce
= Sequen
ce 2 ( A
T G C – )
Figure 3. Selecting a Star
seq
uen
ce
Reali
gnm
ent
for the updat
ed Star se
qu
ence
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
2302-4
046
Accel
e
rating Com
putation of
DNA
Multi
p
le
Seque
nce
Alignm
ent in… (Ram
da
n Satra)
8281
The third
sta
ge of star al
gorithm i
s
re
a
lignme
n
t. This sta
ge is realigni
ng the Star
seq
uen
ce to others se
que
nce
s
by usin
g dynam
ic p
r
ogra
mming t
e
ch
niqu
e. Co
mplexity is
O(n),
whe
r
e
n
is th
e amount of seque
nces. IIlustratio
n
for realignm
ent is sho
w
n in Fig
u
re 4.
Seq 2 ( A T G C - )
Seq 1 ( A T G G )
Seq 3 ( A T C C )
Seq 4 ( A G C G )
A T G C –
A T G – G
A T – C C
A – G C G
Figure 4. Rea
lignment
2.3 Parallelization MSA
w
i
th Mess
age
Passsing Interfa
ces (MPI)
In this research, we u
s
ed
Foste
r
’s Meth
odolo
g
y for impleme
n
ting parall
e
l pro
g
ramming
.
This m
e
thod
ology con
s
ist
s
of p
a
rtitioni
ng,
co
mmuni
cation, a
gglo
m
eratio
n an
d
mappi
ng
sta
g
e
[8]. Parallelization was
co
ndu
cted for
comp
uting p
a
irwi
se
simil
a
rity scores.
The calcula
t
ion
bega
n by dividing the se
q
uen
ce pai
rs i
n
to mu
ltiple processe
s (th
r
ead
s) symb
ol
ized by
K1, K2
... Kn
. In the
ca
se
wh
en th
e am
ount
of
comp
uters i
s
equ
al to th
e
amo
unt of
p
r
ocesse
s,
ea
ch
process
will
be allocated
to each com
puter. If
Pk
i
s
a
comp
uter for
k
=
1,2,...n this
alloc
a
t
i
on
can b
e
define
d
as
P1 =
K1,
P2 =
K2 ... P
n
=
Kn
. Unfo
rtunately the numbe
r of seq
uen
ce pai
rs i
s
often large
r
than the num
ber of com
p
u
t
ers. S
equ
en
ce pai
rs
whi
c
h has not be
en allocated
yet
must wait un
til the previo
us p
r
o
c
e
ss
h
a
s b
een
com
p
leted. Allocation of
processe
s (th
r
ea
ds)
whe
n
the
nu
mber of
seq
u
ence p
a
irs a
r
e mo
re th
an t
he n
u
mbe
r
of co
mpute
r
s o
r
when
Kn > Pn
can b
e
de
scri
bed in Figu
re
5.
K1
P1
K2
K3
P3
P2
K4
P1
Wa
it
..
K5
P2
K6
P3
Kn
Pn
T
1
D
o
ne
T
2
S
t
ar
t
M
e
r
g
e
r
Pr
oc
ess
(
T
1)
M
e
r
g
e
r
Pr
oc
ess
(
T
2)
(
T
n
)
Figure 5. The
distributio
n o
f
proce
s
se
s when t
he num
b
e
r of se
quen
ce pairs are more than the
numbe
r of co
mputers or
Kn
>
P
n
Illustration
of
data di
stri
bu
tion of DNA
seq
uen
ce
wit
h
MPI ca
n b
e
seen i
n
Fig
u
re
6. In
this re
sea
r
ch,
comuni
catio
n
of MPI used point
to poi
nt commu
nication with a b
l
ocking
sen
d
and
receives
ope
rations. Pa
rall
elizatio
n sch
e
m
e for M
SA i
s
to a
ssi
gn a
comp
uter
as t
he data
divid
e
r
and
oth
e
r co
mputers as d
a
ta
proc
esso
rs. A d
a
ta di
vider
calle
d
rank 0
distri
buted seq
u
e
n
ce
pairs usi
ng
MPI_Send()
to other co
mputer a
s
d
a
ta pro
c
e
s
so
rs (Fi
g
u
r
e 7).
Data pro
c
e
s
sors
whi
c
h we
re
symbolize
d
by
the
rank 1...n
received
se
quen
ce
pai
rs
usin
g
MPI_Recv()
(
F
ig
ur
e 8
)
.
Next, each
data processors
co
ndu
cte
d
pairwis
e seque
nce alig
nment to co
mpute pai
rwi
s
e
simila
rity sco
r
es in
pa
rallel
.
The
cal
c
ulat
ion re
sults
of ea
ch
data
proce
s
sor were
tran
smitted t
o
data divider
(
ra
nk 0
)
by usin
g
MPI_Send()
. Data
divider with
rank 0
received the simila
rity
s
c
o
r
es
us
ing
MPI_R
e
cv(
)
comma
nd a
nd co
mpletin
g
the MSA pro
c
e
ss
by sele
cting a
Star
seq
uen
ce a
n
d
realig
ning a
ll sequ
en
ce to the Star se
quen
ce
The re
sult
s of the sequ
en
ce alignme
n
t
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 23
02-4
046
TELKOM
NI
KA
Vol. 12, No. 12, Decem
ber 20
14 : 8278 – 82
85
8282
Figure 6. Illustration the distribut
ion of se
quen
ce d
a
ta with MPI
/
/MPI init
ializa
t
ion
MPI_Init(&argc, &arg
v
)
;
MPI_Comm_
r
a
nk (MPI_COMM_WORLD,
&r
ank);//Computer init
ializa
t
ion
MPI_Comm_size (MPI_COMM_WORLD, &p_size);
//Init
ialize the n
u
mb
er of computers
If rank == 0 {
c =
((js*js)-js)/2; // Calcu
l
ate th
e man
y
p
a
ir
w
i
se seq
u
e
n
c
es
p_
bts
=
p_
s
i
ze
-1
; // Limit
Numbe
r
of Compute
r
s
p
=
1;
//
L
i
mit
Nu
mb
er o
f
co
mp
u
t
ers fo
r p
r
o
ces
sin
g
d
a
ta (o
th
er th
an
r
a
n
k
0)
for
(i
=1
; i
<=js
-
1
;
i++){
f
o
r (j=
i
+1; j
<
=js; j+
+){
us
1
=
i-1
;
/
/
In
itia
liza
t
i
on
the
orde
r of s
e
nding
da
ta
s
e
que
nc
e
us
2
=
j-1
;
/
/
In
itia
liza
t
i
on
the
orde
r of s
e
nding
da
ta
s
e
que
nc
e
MPI_Send
(&c, 1, MP
I_INT, p
,
1, MP
I_COMM_WORLD
);
MPI_Send
(&SIZE, 1, MP
I_INT, p, 3, MPI_COMM_WORLD);
MPI_Send
(&p_bts, 1,
M
P
I_INT, p
,
5, M
P
I_COMM_WORLD);
MPI_Send
(&us1, 1, MP
I_INT, p
,
6, MP
I_COMM_WORL
D);
MPI_Send
(&us2, 1, MP
I_INT, p
,
7, MP
I_COMM_WORL
D);
MPI_Send
(s[us1], SIZE+
1
,
MPI_CH
A
R
,
p, 8, MPI_COMM_WORLD);
MPI_Send
(s[us2], SIZE+
1
,
MPI_CH
A
R
,
p, 9, MPI_COMM_WORLD);
p
++;
if (p
>p_
b
ts
){p
=
1
;
} // L
oop
in
g s
e
que
nc
e
da
ta
s
e
nding b
e
s
i
de
s
to ra
nk 0
}
}
} /
/
Tu
tu
p ran
k
0
Figure 7. Source
cod
e
data
divider
If rank <> 0 {
MPI_Rec
v
(&c, 1, MPI_INT,
0,
1, MPI_COMM_WORLD,
&status);
MPI_Rec
v
(&SIZE, 1,
MPI_I
N
T,
0, 3, MP
I_COMM_WORLD, &status);
MPI_Rec
v
(&p_bts, 1, MPI_
INT,
0, 5, MPI_COMM_WORLD, &status);
p=1;
for (h=0; h<=c-1; h++){
if (rank==
p
){
MP
I_Rec
v
(&us1, 1, MPI_INT,
0, 6, MP
I_COMM_WORLD, &status);
MP
I_Rec
v
(&us2, 1, MPI_INT,
0, 7, MP
I_COMM_WORLD, &status);
MP
I_Rec
v
(s[us1], SIZE+1,
MPI_CH
A
R
, 0, 8, MPI_COM
M
_WORLD, &status);
MP
I_Rec
v
(s[us2], SIZE+1,
MPI_CH
A
R
, 0, 9, MPI_COM
M
_WORLD, &status);
}
} //e
nd ra
nk
<
>
0
Figure 8. Source
cod
e
data
processo
rs
Dividers
data
Processin
g
of data
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
2302-4
046
Accel
e
rating Com
putation of
DNA
Multi
p
le
Seque
nce
Alignm
ent in… (Ram
da
n Satra)
8283
3. Results a
nd Analy
s
is
This re
se
arch
u
s
ed 5 co
mputers with
sp
es
ifi
c
ation
Intel Co
re
CPU i3
-322
0
@
3.30G
Hz
with 4 CPU
core
s, 8 GB of RAM memo
ry.
A whole genome data
wi
th FASTA format of
Glyci
n
e
-
ma
x
-
ch
r
o
moso
me
-
9
-
BBI
was divid
ed ra
ndomly into
some nu
mbe
r
s of sequ
en
ces
su
ch a
s
3,
4
,
8, 12, 14,
16
, 20, 24, 2
8
, 32, 36, 4
0
, 4
4
, 48,
52,
56
, 60 an
d 64
seq
uen
ce. T
e
sting
evalua
tion
wa
s co
ndu
cte
d
in 3, 4 and 5 comp
uters.
3.1.
Results
The evaluatio
n results wa
s describ
ed in Fi
gure 9-1
1
. Figure 9 sho
w
ed the pe
rforma
nce
of our propo
sed method in
term of execu
t
ion time.
Figure 9. the executio
n time in variou
s n
u
mbe
r
of seq
uen
ce
s
The result of seq
uen
ce al
ignment
com
put
ation
sho
w
ed th
at the parall
e
l co
m
putation
time was fa
ster than th
e sequ
ential
computat
io
n
time. Figure 9 sho
w
ed
the difference in
comp
utation
time of data FASTA Glycine-m
a
x-
chro
moso
me-9-B
BI with the length of 811
-850
basepai
r. Usi
ng 3, 4 an
d
8 seq
uen
ce
s the
com
p
u
t
ational time
wa
s only sl
ightly differe
nt.
Ho
wever,
wi
th the incre
s
ing
of the
numb
e
r
of
seq
uen
ce
s the co
mput
ational time
wa
s
s
i
gnific
antly different.
3.2. Analy
s
is
The pe
rform
ance an
alysi
s
o
r
evaluati
on of
this
propo
sed m
e
th
od was
co
n
ducte
d by
cal
c
ulatin
g sp
eedu
p [11]. Speed
up can b
e
obtaine
d by using Equ
a
tion (2
).
Speedup
(
2
)
De
scription:
Ts = Seq
uent
ial executio
n time
Tp = Parallel executio
n time
The results o
f
the parall
e
l comp
uting
wi
th
3, 4 and 5
PC (Perso
n
a
l Com
puter) sho
w
ed
that increa
sin
g
of the n
u
m
ber
of co
mpu
t
ers
(P
Cs) affected th
e pa
rallel exe
c
utio
n time. Parall
el
comp
utationa
l spe
edup i
n
crea
sed
as th
e numb
e
r of
comp
uters
was in
crea
sed.
Plot of paral
lel
comp
uting sp
eedu
p for 3, 4 and 5 PCs were sh
own in Figure 10.
0.
00
5.
00
10.
00
15.
00
20.
00
3
4
8
12
16
20
24
28
32
36
40
44
48
52
56
60
64
tim
e
(se
c
ond)
num
be
r
of
S
e
que
nce
Sequential
time
(1
PC
)
/
sec
(T
s)
Parallel
time
(3
PC
)
/sec
(T
p
1
)
Parallel
time
(4
PC
)
/sec
(T
p
2
)
Parallel
time
(5
PC
)
/sec
(T
p
3
)
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 23
02-4
046
TELKOM
NI
KA
Vol. 12, No. 12, Decem
ber 20
14 : 8278 – 82
85
8284
Figure 10. Th
e spe
edu
p of parall
e
l co
mp
uting in vario
u
s num
be
r of seq
uen
ce
s
The othe
r metric for
sho
w
ing the pa
rallel perfo
rm
ance is effici
ency. Increa
sing the
numbe
r of proce
s
sors de
crea
sed the va
lue of e
fficien
cy, and co
nversely
an increase in the si
ze
of data
will
increase the efficiency [
14]. The
constant efficiency
va
lue indicates that
the
perfo
rman
ce
of a pa
rallel
system i
s
scalable to
the
size of the
d
a
ta. Scala
b
le
parallel
syst
ems
mean
s the
sy
stem i
s
a
b
le t
o
maintai
n
p
e
r
forma
n
ce
with in
cre
a
si
ng
numbe
r
of proce
s
sors [1
4] to
pro
c
e
s
s the
data in
a
ce
rtain si
ze. Efficien
cy i
s
cal
c
ulate
d
u
s
in
g
Equation
(4). The
re
sults of
efficien
cy cal
c
u
c
altion in th
is study can b
e
see
n
in Fig
u
re 11.
Eficiency
(
4
)
Limitation efficien
cy value i
s
1 / p < efficiency <1
De
scription:
S (n) = Spe
e
dup of paralle
l computin
g
p = Many pro
c
e
s
sors
Figure 11. Efficien
cy of parallel com
put
in
g in variou
s n
u
mbe
r
of seq
uen
ce
s
Figure 11
sho
w
ed that the
use of 3, 4 a
n
d
5
pro
c
e
s
so
r were scala
b
l
e
in pro
c
e
s
si
ng data
with the l
engt
h of 81
1-850
ba
sepai
r
an
d the n
u
mb
er
of seque
nce
of 32
-64. T
h
is effici
en
cy will
increa
se
wh
e
n
the
size a
n
d
the le
ngth
of se
quen
ce
i
n
crea
sed
wit
h
incre
a
si
ng
of the nu
mbe
r
of
p
r
oc
es
so
r
.
0.
00
0.
50
1.
00
1.
50
2.
00
2.
50
3.
00
3.
50
3
4
8
12
16
20
24
28
32
36
40
44
48
52
56
60
64
Rat
i
o
Ts/Tp
num
be
r
of
se
que
nce
Speedup
3
PC
Speedup
4
PC
Speedup
5
PC
0.
00
0.
10
0.
20
0.
30
0.
40
0.
50
0.
60
0.
70
3
4
8
12
16
20
24
28
32
36
40
44
48
52
56
60
64
Val
u
e
Efficiency
num
be
r
of
se
que
nce
s
Ef
f
i
cie
n
cy
3
PC
Ef
f
i
cie
n
cy
4
PC
Ef
f
i
cie
n
cy
5
PC
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
2302-4
046
Accel
e
rating Com
putation of
DNA
Multi
p
le
Seque
nce
Alignm
ent in… (Ram
da
n Satra)
8285
4. Conclusio
n
Parallel
comp
uting usi
ng M
P
I is reco
mm
ende
d for MS
A. This study
sho
w
that the
spee
d
up of u
s
ing
MPI in distri
b
u
ted env
iron
ment for
con
ductin
g
MSA
wa
s in
cre
s
e
d
by increa
sin
g
the
size
a
n
d
the numbe
r of DNA seq
uen
ce
s
ali
gne
d.
Th
e
tre
nd of
eff
i
cien
cy wa
s almost
con
s
tant
whe
n
usin
g 56 of seque
nces or mo
re. T
h
is tende
ncy
indicated that this method wa
s scalabl
e to
the size of da
ta.
Future
re
se
a
r
ch
can
be
con
d
u
c
ted b
y
using
the l
a
rge
r
size of
data
seq
u
e
n
ce
and
increa
se th
e
numbe
r of
proce
s
sors to
i
m
prove
the speed
up
a
nd efficien
cy
of parall
e
l comp
uting
for MSA. In
this rese
arch, pairwi
s
e
seque
nce alignme
n
t was co
ndu
cte
d
in sequ
en
tial
comp
utation.
In future, we
coul
d condu
ct hybrid
comp
uting by cond
ucting th
e pa
iwise seque
n
c
e
alignme
n
t in parall
e
l usi
n
g
MPI and CUDA GPU to in
cre
a
se sp
eed
up signifi
catl
y.
.
Ackn
o
w
l
e
dg
ements
The auth
o
rs would li
ke
to thank Ind
one
sia
Mini
stry of Agricu
lture for fun
d
ing this
research through the KKP3N 2014 program.
Referen
ces
[1]
Liu Y, Schmidt
B, Maskell DL. MSA-CUDA
. Mult
iple Seq
uenc
e Alig
nme
n
t on Graphics
Processin
g
Units w
i
th
CU
DA.
IEEE International
Conf
erence
on Application-s
pecific
S
ystem
s, Architectures and
Processors
. 20
09: 121-
12
8.
[2]
Juni
or SAC. Sequ
enc
e Alig
n
m
ent Algor
ith
m
s.
Departme
n
t of Compute
r
Science Sch
ool of Physic
a
l
Scienc
es & Engin
eeri
ng Ki
ngí
s Colle
ge L
o
n
d
o
n
. 200
3.
[3]
Suji
w
o
MAP, Kusuma W
A
. Multipl
e
Se
qu
e
n
ce
Ali
gnm
ent
w
i
t
h
Star Met
hod
in Grap
hic
a
l Proc
essin
g
Unit usi
ng CU
DA.
Internatio
n
a
l Se
mi
nar on
Scienc
e (ISS)
.
Bogor. 20
13: 3
59-3
63.
[4]
Llo
y
d GS. 20
1
0
. Parall
el Mult
iple S
equ
enc
e Alig
nment: An Overvie
w
.
[5]
Edgar
RC, Bat
z
ogl
ou S. Mult
iple s
e
q
u
e
n
ce
alig
nme
n
t.
Cur
r
ent op
ini
on i
n
structural bi
ol
ogy
. 20
06;
16(3): 36
8-3
7
3
.
[6]
Z
hou Z
m
, Che
n
Z
-
w
.
D
y
n
a
m
i
c Programmi
ng
for Protein
Se
que
nce A
lig
nm
ent.
Internati
o
n
a
l Jo
urna
l of
Bio-Sci
ence a
n
d
Bio-T
e
ch
nol
o
g
y
. 2013; 5(
2): 141-
150.
[7]
M
y
ers EW
, Mill
er W
.
Optimal
alig
nme
n
ts in li
near sp
ace.
Oxford Univ Pres
.
1988: 1-1
3
.
[8]
Sand
es EF
O,
de Mel
o
ACM
A
. Smith-W
a
terman Alig
nme
n
t of Huge Se
que
nces
w
i
th
GPU in Lin
ear
Space.
IEEE International P
a
r
a
llel & Dis
tributed Process
i
ng
Sym
p
osium
. 2011: 11
99-
121
1.
[9]
Sunarto AA, K
u
suma W
A
, Sukoco
H. Paral
e
lisme Of Star Alig
nment.
IEEE International
Conference
on Instru
menta
t
ion, C
o
mun
i
ca
tions, Infor
m
ati
on T
e
c
h
n
o
lo
gi,
an
d B
i
o
m
e
d
ic
al E
ngi
ne
erin
g
. 20
13:
16
7
-
171.
[10]
Z
hou
L, W
a
n
g
H, W
a
n
g
W
.
Parall
el
Imple
m
ent
atio
n of Classific
a
tio
n
Algorit
hms
Ba
sed on
Cl
ou
d
Comp
uting En
vironme
n
t.
T
E
LKOMNIKA Indon
esia
n Jo
ur
nal
of Electric
al En
gin
eer
ing
. 2012;
10(5)
:
108
7-10
92.
[11]
Li H, Gong
B, Gao HB,
Qian CJ. Par
a
lle
l Com
puti
n
g Prop
erties
of T
a
il Copol
ymer
Cha
i
n.
TEL
K
OMNIKA
. 2013; 1
1
(8): 4
344-
435
0.
[12]
“NCBI Home P
age,”[Onli
ne]. Av
ail
abl
e: http://
w
w
w
.
nc
bi.n
lm
.nih.gov/
[13]
Quinn MJ. Par
a
lle
l Progr
amin
g in C
w
i
th MPI
and Ope
n
MP.
McGraw
-Hill C
o
mpa
n
ies, Inc
. 200
3.
[14]
Maria A
Karta
w
i
d
j
a
j
a
. An
alisi
s
Kin
e
rja
Perk
a
lia
n M
a
triks
Paral
e
l M
eng
g
unak
an M
e
trik
Isoefisi
ensi
.
TESLA
. 2008;
10(2): 51-
54.
Evaluation Warning : The document was created with Spire.PDF for Python.