TELKOMNIKA, Vol. 13, No. 4, December 2015, pp. 1408~1413
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v13i4.2156
Received June 6, 2015; Revised September 5, 2015; Accepted October 2, 2015
High Performance Computing on Cluster and Multicore Architecture
Ahmad Ashari*1, Mardhani Riasetiawan2
Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences,
Universitas Gadjah Mada
*Corresponding author, email: ashari@ugm.ac.id1, mardhani@ugm.ac.id2
Abstract
Computing needs are growing rapidly, and extensive computing resources must grow commensurately. High computing needs can be met by using cluster and high-speed processor technology. This study analyzes and compares the performance of cluster and processor technology to determine which high performance computer architecture can best support data computation. The research uses Raspberry Pi devices running in a cluster model, which are then tested to obtain performance values: FLOPS, CPU time, and score. The FLOPS values obtained are then made equivalent to the load carried by the Raspberry Pi computing cluster. The research does the same thing on the i5 and i7 processor architectures, using himeno98 and himeno16 Large to analyze the processor and memory allocation. The test runs on a 1000x1000 matrix, benchmarked with OpenMP. The analysis focuses on CPU time, FLOPS, and the score of every architecture. The results show the Raspberry Pi cluster architecture has 2576.07 sec CPU time, 86.96 MFLOPS, and a 2.69 score. The Core i5 architecture has 55.57 sec CPU time, 76.30 MFLOPS, and a 0.92 score. The Core i7 architecture has 59.56 sec CPU time, 1427.61 MFLOPS, and a 17.23 score. The cluster and multicore results show that the architecture model affects the computing process. The comparison shows that computing performance is strongly influenced by the architecture of the processor, as indicated by the better performance of the i5 and i7. The research also shows that both models, the cluster and the Core i5 and i7 alike, can process the data to completion.

Keywords: high performance computing, cluster, multicore, processor, memory

Copyright © 2015 Universitas Ahmad Dahlan. All rights reserved.
1. Introduction
High performance computing is needed to process large data, data sets, and processes. The need increases in line with business, science, education and other demands. The sciences, especially astronomy, physics, chemistry, biology and mechanics, are just a few examples of areas that benefit most from computer technology [1, 2]. However, it is undeniable that the computational loads involved are not light; they often require very large resources. Various methods have been developed to overcome this problem, one of which is using a supercomputer or a mainframe computer. Technologies that govern computing resources, such as cluster, grid and cloud, give rise to varied data channels. A cluster provides dedicated resources and facilitates the sharing of data, generating it in a faster time. A grid dedicates resources connected to centralized settings and can produce distributed data [3]. A cluster, often known as clustering, is a group of nodes that operate independently and work closely with each other, governed by a master computer (master node) and seen by the user as if the computers were connected as a single computer unit [4]. A computer cluster will have more computing power than any single computer. Another advantage of a computer cluster, compared with a single-processor computer, is that the number of processors in the cluster can continue to increase, so it can be ascertained that the cluster environment has a better ability than a single computer.
At the end of 2012, the Raspberry Pi Foundation launched its latest product in the form of a Single Board Computer, a small-sized computer with a low power consumption of 3.5 W (5 V and 0.75 A). The Single Board Computer products are named Raspberry Pi by the Raspberry Pi Foundation. The Raspberry Pi, known as environmentally friendly hardware, can be a supercomputer prototype when built into a cluster to perform a certain computational load. With the above explanation, the authors argue that the Raspberry Pi can be built into a cluster and form a prototype of a supercomputer for the purpose of computing a specific load. This is the background of the research on the design and analysis of Raspberry Pi cluster performance.
Several studies on cluster environments using single-board computing have been conducted by previous researchers. Cox [5] discusses the building of a supercomputer cluster from 64 Raspberry Pi using MPICH2 middleware, with a total memory of 1 TB. The research, conducted at the University of Southampton, UK, computed the value of PI using MPI. Research on high performance computing clusters, by design and analysis in Red Hat Enterprise Linux, has also been conducted to address performance issues [6]. The performance of the cluster was tested using the CPI algorithm and shown to work and operate on cluster models. The cluster-on-cloud approach has been implemented in an elastic data-intensive computing system [7].
That research used local resources and cloud resources in the same periods of time, and it opened opportunities on performance and resources. The high performance computer is close to the supercomputer, especially in its purposes; the issues are in the throughput and the performance itself [8]. Research on GPU passthrough for high performance computing, especially in the cloud, defines the core architecture that enables virtual machines as one of the most important components for this purpose [9]. That research used the Xen Hypervisor to manage the performance of computation and run as HPC machines.
High performance computation can also be established by optimizing the resources, especially the processor and memory. Research on multicore processor optimization has shown that core speed and power consumption relate to overall performance. The research showed that there are idle-speed and constant-speed models that can be introduced to handle the optimization [10].
This research differs substantially from the other research: it focuses more on resource performance analysis and benchmarking. The research uses a cluster to manage several resources into a single cluster environment, and Core i5 and i7 technology as representations of high-speed processors. The purpose of the benchmark is to state the core technology process and reliability, especially in the computation process.
2. Research Method
Research starts by building the cluster architecture designs: 14 Raspberry Pi, and Core i5 and Core i7 architectures. These are then implemented, and the performance is tested by calculating the value of FLOPS (Floating Point Operations Per Second) in units of Mega and by computing 1000x1000 matrix calculations, focused on the ability of the processors in the cluster to handle a computational load. Construction of the system is done by designing and implementing 14 Raspberry Pi so that they can run in a cluster.
The first test on the system is done by calculating the value in units of MegaFLOPS using the benchmark tool Himeno98. The second test is done by performing parallel computing through 1000x1000 matrix calculations. From the results obtained through the first and second tests, an equivalence is carried out between the FLOPS value and the value generated by the calculation of the 1000x1000 matrix, which is then analyzed.
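The MFLOPS figure that both tests rely on is simply the number of floating-point operations divided by the elapsed CPU time, scaled to millions. A minimal sketch of that conversion (the operation count below is illustrative, not a value reported by Himeno98):

```python
def mflops(flop_count, cpu_time_sec):
    """Floating-point operations per second, in millions (MFLOPS)."""
    return flop_count / cpu_time_sec / 1e6

# Illustrative numbers: ~2.24e11 floating-point operations finished
# in 2576.07 s is a rate in the high 80s of MFLOPS, the same order
# as the Raspberry Pi cluster result reported later.
rate = mflops(2.24e11, 2576.07)
print(f"{rate:.2f} MFLOPS")
```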
The system used in this study has the following functional requirements: the Raspbian Wheezy operating system, MPICH2 middleware, and the Himeno98 script. The system can display the percentage of processor and memory usage when running the Himeno98 application, using htop. The system can perform parallel calculations with a 1000x1000 matrix calculation.
The first test is performed on the cluster to test the performance of the 14 Raspberry Pi cluster by running the Himeno98 script. Running the script outputs the FLOPS value of the cluster that has been built, the calculation time of the Himeno98 script, as well as the calculation score. Tests using Himeno98 require the number of cores, or nodes, to be a power of two: 2, 4, 8, 16 and so on. In this study, 14 Raspberry Pi are used, so in the implementation the calculation uses 16 nodes, and there are 2 nodes running double in the calculations. In this test, the runs are not done at the same time or in a parallel way; tests are performed sequentially and in different time durations.
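The node-count arithmetic above can be made concrete: with 14 physical nodes, the next power of two is 16, and the difference gives the number of nodes that must each run a second process (a small helper sketch, not part of the benchmark itself):

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

nodes = 14
processes = next_power_of_two(nodes)   # Himeno98 requires 2, 4, 8, 16, ...
doubled = processes - nodes            # nodes that must run two jobs at once
print(processes, doubled)              # 16 processes, 2 doubled-up nodes
```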
The second test in this study is parallel computing, done by calculating a matrix of dimensions 1000x1000. Parallel computing is done to obtain the cluster's calculation time in calculating the 1000x1000-dimension matrix. To obtain valid results, the testing is done as many as 30 trials, with the test object being a run of the Himeno98 script at LARGE size, using as many as 16 cores. This size was chosen from the several options for the magnitude of the data used by Himeno98 (SMALL, MEDIUM, LARGE); LARGE is the size the cluster of 14 Raspberry Pi is allowed to run. The FLOPS value, CPU time, and scores obtained from the first test, together with the matrix calculation time from the second test, are made into an equivalence between the performance of the 14 Raspberry Pi cluster and the matrix computing load, which is then analyzed so that conclusions can be drawn.
The first test parameters in this study are: the calculation time of the Himeno98 LARGE script on 16 cores, the resulting FLOPS value in units of Mega, and the score obtained after doing the calculations. The second test parameters in this study are: the calculation time of the parallel matrix computation. Each of these parameters is recorded across the 30 test runs and analyzed in order to obtain comparative results between each value, so that a conclusion can be drawn about the differences in the values obtained.
The applications used to perform the testing are MPICH2, Himeno98, htop, as well as the 1000x1000 matrix calculation script. MPICH2 is installed as middleware on the cluster of 14 Raspberry Pi. Himeno98, htop, and the matrix calculation script are run from the terminal. The master node runs the installed MPICH2, executing the Himeno98 command script and the matrix calculation script and involving the 13 other nodes, while the htop application shows the work being done by the CPU, reporting the processor and memory usage for each node that is running.
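With MPICH2's Hydra launcher, a job of this shape is typically started from the master node with mpiexec, pointing at the machine file that lists the slave nodes. The sketch below only builds the command line; the machine-file name is an assumption, and only the /home/mpi_testing path and the himeno16 program name come from this paper.

```python
def mpich2_command(machinefile, nproc, program):
    """Build an mpiexec invocation for MPICH2's Hydra process manager."""
    return ["mpiexec", "-f", machinefile, "-n", str(nproc), program]

# 16 processes spread across the nodes listed in the machine file.
cmd = mpich2_command("machinefile", 16, "/home/mpi_testing/himeno16")
print(" ".join(cmd))
```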
Then, after the cluster is brought up and has been implemented, the first test is carried out, namely the calculation of the FLOPS value of the Raspberry Pi cluster using the Himeno98 application. After the first test is completed, a second test is done by performing parallel computing using a 1000x1000-dimensional matrix calculation; the result issued is the time this second test takes the cluster to calculate the matrix. The first and second tests are each done 30 times so that the data generated is valid, after which the average of the 30 test results is calculated. From the results obtained from the first and second tests, an equivalence is then conducted between the values obtained, with the FLOPS and the matrix calculation times analyzed and conclusions drawn from the results of the equivalence.
3. Results and Analysis
Research on the development of a cluster of 14 Raspberry Pi model B is a prototype of the development of a supercomputer that can perform a certain computational load. The development of the cluster configuration steps adopts prior research [5]. The studies were conducted in the Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, using 14 Raspberry Pi model B. Data on the performance of the cluster is obtained by calculating the Raspberry Pi FLOPS, which is a benchmark of a computer cluster or supercomputer. Then, when the performance value is found, it is compared to the calculation time of a computational load; in this study, the computational load is a 1000x1000-dimension matrix calculation.
3.1. Testing Cluster 14 Raspberry Pi
In this research's tests, the cluster system of 14 Raspberry Pi runs Himeno98 and performs calculations using a 1000x1000 matrix. The system runs the test script by executing a command on the master node, which calls each slave listed in the machine file to run the executable program located at /home/mpi_testing. The name of the program is himeno16 LARGE. The system runs a script that executes the matrix calculation in the MPI runtime, involving the 14 Raspberry Pi nodes as the core processors used in the calculation process. The matrix calculation scripts are run on the master node. As in the test run of the Himeno98 script, the master node orders each slave node to run the matriks1000 program located in the /home/mpi_testing directory on each node.
3.2. Test Results and Discussion
The test results are divided into two parts: testing using Himeno98 and testing using the 1000x1000 matrix calculation script. The test results can then be carried into a discussion of the FLOPS value and performance of the Raspberry Pi cluster in handling a certain computational load by performing matrix calculations. The results of the first test are the FLOPS, CPU time and score calculated using the Himeno98 benchmark tools to determine the performance of the cluster, indicated by the Raspberry Pi FLOPS value and the two other parameters obtained. The FLOPS value is in units of Mega. Tests are carried out 30 times to get the best results and avoid anomalies in the data obtained. The following table presents the test data for the cluster of 14 Raspberry Pi using the Himeno98 script at Large size with 16 nodes.
Table 1. Test Result on Raspberry Pi Cluster

No       CPU Time (sec)   MFLOPS      Score
1        2590.581007      86.267300   2.676396
2        2594.537362      86.235601   2.672315
3        2600.020295      86.053747   2.666679
4        2673.037668      83.703081   2.593836
5        2630.138822      85.068319   2.636143
6        2633.727498      84.952406   2.632551
7        2653.361935      84.323772   2.613070
8        2660.826893      84.087202   2.605739
9        2673.543920      83.687231   2.593345
10       2683.178516      83.386732   2.584033
Average  2576.066240      86.962075   2.695453
Min      2412.475700      82.002858   2.541148
Max      2728.459640      92.743520   2.873986
Experimental data using Himeno98, as shown in Table 1, gives a pattern that can be analyzed. The results obtained in the first phase of the experiment range from test 1 to test 14. In the first phase of testing, high FLOPS results are visible in tests 1 and 2, then performance decreased until test 4. However, in test 5, performance increased, reaching a value of 85.068319 MFLOPS. From test 6 until the end of the first phase of testing (test 14), the FLOPS value declined, assessed as declining computing performance under continuous operation.
The calculation result cannot be judged by FLOPS alone without taking into account the other two parameters, namely CPU time and score. CPU time is the time required for the cluster to calculate the script and obtain the FLOPS value, which produces a certain score. The values in the first phase are decreasing, but in the second phase of testing the FLOPS values obtained tend to be higher and more stable. CPU time is inversely proportional to the FLOPS value: the higher the FLOPS value, the faster the calculation, in the sense of a smaller time.
14 nodes are used, but in testing, the calculation scripts impose 16 nodes, so across the 30 runs of the testing process the master node (192.168.0.201) and node 2 (192.168.0.202) do two jobs at once, while nodes 3 to 14 do one job only. On the master node and node 2, the memory usage looks different from the other nodes, at 264 MB, but the processor usage is the same, allocating all processor capability at 100%. The characteristics of the Raspberry Pi cluster shown in the tests using Himeno98 are generally seen in the FLOPS value generated: the FLOPS value tends to decline for every subsequent test.
Processor and memory allocation decrease after test (n), so test (n + 1) will almost certainly produce a decreased FLOPS value. The second characteristic shown by the Raspberry Pi model B cluster is seen in the first-phase and second-phase testing. After a phase without any task performed on the Raspberry Pi cluster, the second phase of testing was repeated, resulting in a FLOPS value that soared, touching 92 MFLOPS. But, in keeping with the first characteristic of the Raspberry Pi model B cluster, the FLOPS value of test (n + 1) was again smaller than the FLOPS value of test (n); that is, it almost always decreased. Inversely proportional to the FLOPS value, the CPU time increased along with the decline of the FLOPS value obtained in the testing process, demonstrating the same characteristic: almost certainly decreased performance for each test (n + 1).
The average FLOPS value of the cluster of 14 Raspberry Pi model B is 86.9620747 MFLOPS, with the smallest value being 82.002858 MFLOPS and the largest value 92.743520 MFLOPS. The average time needed by the CPU to do the Himeno98 calculations is 2576.06624 seconds, with the fastest time being 2412.4757 seconds and the longest time taken 2728.45964 seconds. The average score possessed by the cluster of 14 Raspberry Pi model B is 2.69545318, with the lowest and highest scores being 2.541148 and 2.873986.
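The Average, Min and Max rows in Table 1 are plain summary statistics over the repeated runs. A small sketch of that reduction (the sample below uses only the first five MFLOPS values from Table 1, not the full 30-run series, so its mean will not match the table's Average row):

```python
from statistics import mean

# First five MFLOPS readings from Table 1 (illustrative subset).
mflops_runs = [86.267300, 86.235601, 86.053747, 83.703081, 85.068319]

summary = {
    "average": mean(mflops_runs),
    "min": min(mflops_runs),
    "max": max(mflops_runs),
}
print(summary)
```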
Table 2. Test Result on Core i5 & i7

         Core i5                               Core i7
No       CPU Time (sec)  MFLOPS     Score      CPU Time (sec)  MFLOPS       Score
1        48.269985       69.528141  0.839306   59.570688       1427.241623  17.228895
2        57.529376       77.783388  0.938959   59.460039       1429.897573  17.260956
3        56.789592       78.796653  0.951191   59.496563       1429.019781  17.250360
4        48.671718       68.954260  0.832379   59.472241       1429.604199  17.257414
5        57.415807       77.937244  0.940817   59.466694       1429.737551  17.259024
6        58.970462       75.882562  0.916014   59.746338       1423.045634  17.178243
7        57.287316       78.112051  0.942927   59.420364       1430.852316  17.272481
8        58.459712       76.545532  0.924017   59.835172       1420.932916  17.152739
9        57.663230       77.602829  0.936780   59.572076       1427.208369  17.228493
10       54.644435       81.889945  0.988531   59.515524       1428.564511  17.244864
Average  55.570163       76.303261  0.921092   59.555570       1427.610447  17.233347
Min      48.269985       68.954260  0.832379   59.420364       1420.932916  17.152739
Max      58.970462       81.889945  0.988531   59.835172       1430.852316  17.272481
Table 2 shows the Himeno98 with OpenMP test on desktop PCs with Core i5 and Core i7 architectures. The Core i5 architecture has an average performance of 55.5701623 seconds CPU time, 76.3032605 MFLOPS, and a 0.9210921 score. The Core i7 architecture has a performance of 59.5555699 seconds CPU time, 1427.610447 MFLOPS, and a 17.2333468 score. The Raspberry Pi cluster, Core i5 and Core i7 show performance based on their resource capacities; processor and memory are shown to be the main resources giving the performance.
3.3. Test Results Using Matrix Calculation
The result of the second test is the result of calculating the matrix with dimensions 1000x1000, performed by the cluster of 14 Raspberry Pi model B. Table 3 shows the results of the tests using the 1000x1000 matrix calculation script.
Table 3. Matrix Test Results

         Raspberry Pi Cluster     Core i5                Core i7
No       Dimension  Time (s)      Dimension  Time (s)    Dimension  Time (s)
1        1000x1000  39.443008     500x500    0.739688    1000x1000  1.062685
2        1000x1000  39.058709     500x500    0.735486    1000x1000  1.060851
3        1000x1000  38.083262     500x500    0.733455    1000x1000  1.064384
4        1000x1000  38.038639     500x500    0.718637    1000x1000  1.069085
5        1000x1000  38.680501     500x500    0.769055    1000x1000  1.062433
6        1000x1000  37.974584     500x500    0.761147    1000x1000  1.062942
7        1000x1000  37.605260     500x500    0.712850    1000x1000  1.065662
8        1000x1000  37.520037     500x500    0.765559    1000x1000  1.062061
9        1000x1000  38.536223     500x500    0.744571    1000x1000  1.063572
10       1000x1000  38.293610     500x500    0.749248    1000x1000  1.063652
Average             38.679255                0.742970               1.063733
Min                 36.934409                0.712850               1.060851
Max                 40.327027                0.769055               1.069085
Calculation of a matrix of dimension 1000x1000 is a real form of parallel computing, where the more nodes are used, the more the solving of the task is distributed across the nodes used to perform the computations. In this case, the calculation uses 14 Raspberry Pi, so the computational load is divided equally to each of the 14 nodes, so that the parallel computing is done quickly. The results indicate heterogeneity in the time required to perform the calculations: the calculation times generated in this second test tend to be more volatile and unstable compared to the first test using Himeno98.
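The "divided equally to each node" step above can be sketched as a block row distribution, where each of the 14 nodes gets an almost equal slice of the 1000 matrix rows (a sketch of the partitioning arithmetic only, not the MPI program used in the tests):

```python
def partition_rows(n_rows, n_nodes):
    """Block-distribute n_rows across n_nodes as evenly as possible."""
    base, extra = divmod(n_rows, n_nodes)
    # The first `extra` nodes take one additional row each.
    return [base + 1 if i < extra else base for i in range(n_nodes)]

counts = partition_rows(1000, 14)
print(counts)          # six nodes get 72 rows, eight nodes get 71
print(sum(counts))     # all 1000 rows are covered
```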
4. Conclusion
Conclusions drawn from the examination and discussion are as follows. From the resulting data and the characteristics of each test, there exists an equivalence: the cluster of 14 Raspberry Pi model B, which has a performance of 86.9620747 MFLOPS, a CPU time of 2576.06624 seconds and a score of 2.69545318, takes 38.6792554 seconds to complete the calculation of the 1000x1000-dimension matrix. The FLOPS calculation is likely to decline, with increased CPU time, and the scores are comparable with FLOPS but tend to be gentler, with more stable results. With such characteristics, the 30 matrix calculation runs have a trend that tends to rise. The average time the cluster of 14 Raspberry Pi model B takes for computing this matrix calculation is 38.6792554 seconds; the fastest time obtained is 36.934409 seconds, and the longest time taken by the cluster is 40.327027 seconds.
The test on the Mac OS X PC with the Core i5 processor uses a 500x500 matrix; this matrix size was implemented because of the limitation on memory. The result shows that on average the Core i5 takes 0.742966 seconds, or, under the assumption of a test with a 1000x1000 matrix, would take 1.484 seconds. The Core i7 with a 1000x1000 matrix takes 1.0637327 seconds. The Raspberry Pi cluster, even though it has limited resources, can with 14 nodes handle and finish the job, with lower performance; this is normal, because limited resources impact the execution times.
The Core i5 architecture has more reliable resources, resulting in better performance than the Raspberry Pi cluster. The Core i7 architecture has the best performance, especially when executing the matrix.
The high performance computing architecture that has been built in this work can provide lessons for the development of HPC architecture models and a baseline performance. In the future it will be used to determine the delivery architecture model on HPC and can be tested with more variations of load.
Acknowledgment
The research was supported by the Postgraduate Program in Computer Science, Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada. The research was conducted in the Computer System and Networks Laboratory, Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada.
References
[1] Allock B, Foster J, Nefedova V, Chervenak A, Deelman A, Kesselman C, Leigh J, Sim A, Shoshani A, Drach B, Williams D. High-Performance Remote Access to Climate Simulation Data: A Challenge Problem for Data Grid Technologies. SC2001. 2001.
[2] Moore R, Rajasekar A. Data and Metadata Collections for Scientific Applications. High Performance Computing and Networking (HPCN 2001), Amsterdam, NL. 2001.
[3] Riasetiawan M, Mahmood AK. DALA Project: Digital archive system for long term access. 2010 International Conference on Distributed Framework and Applications (DFmA), Yogyakarta, Indonesia. 2010: 1-5.
[4] Santoso J, van Albada GD, Nazief BA, Sloot PM. Hierarchical Job Scheduling for Clusters of Workstations. Proceedings of the sixth annual conference of the Advanced School for Computing and Imaging (ASCI 2000). 2000: 99-105.
[5] Cox SJ, Cox JT, Boardman RP, Johnston SJ, Scott M, O'Brien NS. Iridis-pi: a low-cost, compact demonstration cluster. Journal of Cluster Computing. 2012; 17(2): 349-358.
[6] Rahman A. High Performance Computing Clusters Design and Analysis Using Red Hat Enterprise Linux. TELKOMNIKA Indonesian Journal of Electrical Engineering. 2015; 14(3): 534-542.
[7] Duan Z. An Elastic Data Intensive Computing Cluster on Cloud. TELKOMNIKA Indonesian Journal of Electrical Engineering. 2014; 12(10): 7430-7437.
[8] Raicu I, Foster IT, Zhaor Y. Many Task Computing for Grids and Supercomputers. Workshop on Many Task Computing on Grids and Supercomputers. 2008: 1-11.
[9] Younge AJ, Walters JP, Crago S, Fox GC. Evaluating GPU Passthrough in Xen for High Performance Cloud Computing. 2014 IEEE International Parallel & Distributed Processing Symposium Workshops. 2014: 852-859.
[10] Li K. Optimal Partitioning of Multicore Server Processor. The Journal of Supercomputing. Springer US. 2015.