TELKOMNIKA Indonesian Journal of Electrical Engineering
Vol. 12, No. 10, October 2014, pp. 7430 ~ 7437
DOI: 10.11591/telkomnika.v12i8.5433
Received December 23, 2013; Revised July 29, 2014; Accepted August 17, 2014
An Elastic Data Intensive Computing Cluster on Cloud
Zhaolei Duan*, Xueli Wu
College of Computer and Communication Engineering, Zhengzhou University of Light Industry, Dongfeng Road, Zhengzhou, Henan Province, China
*Corresponding author, e-mail: duanzl@zzuli.edu.cn
Abstract
To meet the increasing processing demand of data intensive computing, an elastic data intensive computing cluster, EDICC, is presented. By using local resources and cloud resources at the same time, EDICC maintains high availability and reliability, especially when a flash crowd occurs. EDICC acquires resources from the cloud when the system is overloaded and releases unnecessary cloud resources after the load drops. Experimental results show that, compared with traditional data intensive computing systems, EDICC achieves outstanding performance and higher resource efficiency.
Keywords: data intensive computing, load balancing, cloud computing
Copyright © 2014 Institute of Advanced Engineering and Science. All rights reserved.
1. Introduction
Data intensive computing has become a new direction and a hot spot of high performance computing research in recent years. It enables cross-regional and cross-platform data processing, massive data processing and analysis, and assistance for design and decision-making. Research results show that, in large businesses and research institutes, the amount of data to be processed grows four times as fast as hardware processing capability. Facing this explosive growth in data volume, we must find an effective way to meet the demand for data processing and to extract valid information from massive data.
Currently, the main way to carry out data intensive computing and meet the growing processing demand is to use a computing system with sufficiently high computing capability, such as a mainframe, a supercomputer, or a computer cluster. How to build a computing system for data intensive computing, and how to make its software and hardware highly scalable and reliable, is the focus of current data intensive computing research. In a traditional data intensive computing system, situations in which the system load exceeds the system's processing capability always arise. In such a situation, system performance decreases significantly; even worse, the system may stop responding and the service becomes unavailable.
A typical example is the train ticket booking system during the Spring Festival. Statistics show that, during the Spring Festival of 2012, daily network hits on the China train ticket website "12306" exceeded 1.0 billion, and the website suffered network congestion from 6 o'clock in the morning. During the Spring Festival of 2013, the congestion on the "12306" website became worse, and daily network hits exceeded 10.0 billion. Such a situation violates the principle of high availability and high reliability of a data intensive computing system and must be avoided.
Unfortunately, existing data intensive systems do not have enough flexibility in resource supply. When there is a burst in workload, such as a flash crowd on a website, a traditional data intensive system cannot handle it effectively. Even if as many processors and other hardware resources as possible are used when building a data intensive system, the available resources will eventually be exhausted as the amount of data grows rapidly. Therefore, to handle the continued and rapid growth of resource demand in a data intensive system, we must consider system scalability from the very beginning of system design.
As a highly scalable solution, the cluster is a common scheme for data intensive computing systems. A traditional computer cluster achieves scalability by adding new nodes to the system, which increases system cost. Using a traditional cluster to build a general-purpose data intensive computing system has an obvious shortcoming. For example, to handle a particular application, we increase system investment and enhance the system's computing power; after this
TELKOMNIKA ISSN: 2302-4046
application is done, the system no longer needs all of its processing capability, so part of the system resource sits idle and part of the system cost is wasted.
Ideally, we need a data intensive computing system that has enough processing capability and other related resources for all customers and applications and, at the same time, does not hold too many idle and wasted resources. At any moment, the resources in the system should be utilized effectively, and the service quality of all customers should be guaranteed. In other words, when designing a general-purpose data intensive computing system, we need a scalable and flexible system architecture that gives the system enough resources to meet the dynamic processing demand while improving resource utilization efficiency. Apparently, there is a contradiction between improving resource utilization and ensuring customers' service quality. Traditional data intensive computing systems usually use a load balancing strategy to relieve this contradiction. However, because data processing demand keeps growing, the contradiction eventually becomes unavoidable. Recently, with the development of cloud computing, the concept of "acquiring resource on demand" has been presented, which points out a new direction for resolving this contradiction.
In this paper, we present a new elastic data intensive computing cluster, EDICC. EDICC uses a hybrid resource supply mode: it can use local resources and cloud resources at the same time. EDICC is partly in the cloud: when the system is under heavy load, it uses cloud resources to increase the amount of available resources and guarantee service quality. EDICC is partly outside the cloud: it uses local resources to enhance system reliability and security and to keep the service available even if cloud resources are unavailable. In this way, EDICC can achieve outstanding performance and resource efficiency compared with traditional data intensive computing clusters.
2. Current Research and Related Work
A data intensive computing system with a fixed amount of resources usually uses a load balancing strategy and a resource dispatching strategy to guarantee service quality. As the number of tasks in the system increases, the available resources fall into short supply, the effect of the load balancing and resource dispatching strategies becomes weaker and weaker, and service quality inevitably declines.
In a traditional data intensive computing system, there are several ways to handle the situation in which resource demand exceeds supply: admission control, service downgrade, and dynamically increasing server resources. These methods fall into two kinds. The first kind solves the problem of insufficient resource supply by reducing the service quality of some or all users. The second kind solves the problem by increasing the amount of available resources in the system.
The service downgrade method avoids the problem of insufficient resource supply only to some degree. The method of increasing server resources is limited by the total amount of resources in the server farm: we can move resources from one application to another, but the problem remains when the total resource demand exceeds the total amount of available resources in the server farm. Therefore, a traditional data intensive computing system cannot eliminate the possibility of insufficient resource supply. When the processing demand bursts, for example when a flash crowd occurs, this possibility rises significantly and eventually causes a sharp decline in system availability and reliability. Some data intensive computing systems face the problem of insufficient resource supply only in certain periods, for example, the China train ticket booking system (the "12306" website) during the Spring Festival. In this situation, we could purchase enough software and hardware resources in advance, but this solution is very uneconomical. On the other hand, as the amount of data and computation continues to grow, even a system that currently has enough resources will eventually need more. So, how to give a data intensive computing system enough elasticity in resource supply while controlling system cost is the core problem in designing such a system.
In the 1990s, the concept of grid computing was presented. It describes a technology that allows customers to acquire computing resources on demand, just like using electricity and water in daily life, and to pay according to the amount of resources they use. Although grid computing research has made considerable progress in more than a decade, the original goal of grid computing has not been fully realized. Current grid computing is still limited to sharing resources among research institutes, where users are resource providers and resource consumers at the same time.
Recently, cloud computing has brought a new dawn. The articles [1] and [2] introduce the key technologies of cloud computing. In a word, cloud computing is "from cloud, to cloud". The cloud is the Internet: "from cloud" means that customers can get the required resources through the Internet; "to cloud" means that the computing results are sent back to customers through the Internet. Any customer can get the required resources through cloud computing and run applications on the cloud. In cloud computing, data is stored in the cloud, and applications and services are deployed on the cloud, so that the computing power of the data center is fully utilized to serve customers.
Compared with 10 years ago, the need for massive data processing is more and more urgent and thus requires more and more computing power. Government facilities, research institutes, and the industry and entertainment sectors have all begun to use cloud computing power to meet their continually increasing computing needs.
With the development and popularization of cloud computing, more and more researchers have begun to do research using cloud resources. NIR [3], presented by Zhuo Yang, is an open source, cloud-enabled content based image retrieval system. Content based image retrieval is a computation-heavy task because of the computational complexity of the algorithm and the large amount of data. Because NIR uses cloud resources to build the image retrieval system, it is easy to extend and flexible to deploy.
Tim Dornemann [4] presented a scheme for a BPEL workflow engine based on cloud computing. The basic idea of this scheme is to use virtual machines in Amazon's EC2 (Elastic Compute Cloud) to provide new hosts and handle peak load situations in a BPEL workflow system. The presented system supports on-demand resource provisioning. Experimental results for a computationally intensive video analysis application show that this solution is feasible and efficient.
Existing data intensive computing clusters lack flexibility in resource provision. When the system is overloaded, performance becomes poor or, even worse, the service becomes unavailable. Unfortunately, a load balancing strategy alone cannot solve this problem. Therefore, based on cloud computing, we propose an elastic data intensive computing cluster, EDICC. EDICC addresses this problem at its root, by making the resource supply itself elastic, and thereby guarantees the high availability and reliability of our data intensive computing cluster.
3. System Architecture
In our elastic data intensive computing cluster, the back ends of the cluster are divided into two groups: local nodes and cloud nodes. Local nodes are constructed from local resources and are the essential component of EDICC. A local node is always part of EDICC unless it cannot work normally. Cloud nodes are built from cloud resources and are functionally the same as local nodes. The difference between the two is that the number of cloud nodes changes according to the system load and the amount of available resources. Figure 1 shows the structure of EDICC.
Figure 1. The Architecture of EDICC
In EDICC, the cluster has n+m back ends, including n (n>0) local nodes and m (m>=0) cloud nodes. From the aspect of resource supply, EDICC can use local resources and cloud resources at the same time; that is, EDICC has a hybrid resource supply mode. In a cloud environment, our elastic data intensive computing cluster combines local resources and cloud resources seamlessly. Cloud resources are used to ensure service quality and system performance when the system is overloaded. Local nodes are used to guarantee the reliability of EDICC.
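The n+m back-end structure described above could be modeled as in the following minimal sketch. It is an illustration only, not the authors' implementation: the names Node, Cluster, and make_cluster are hypothetical, and acquiring a cloud node is reduced to appending an object to a list.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str  # "local" or "cloud"


@dataclass
class Cluster:
    """n local back ends (n > 0) plus m cloud back ends (m >= 0)."""
    local_nodes: list
    cloud_nodes: list = field(default_factory=list)

    def __post_init__(self):
        # The paper requires n > 0: local nodes are the essential component.
        if not self.local_nodes:
            raise ValueError("EDICC needs at least one local node (n > 0)")

    @property
    def size(self):
        return len(self.local_nodes) + len(self.cloud_nodes)

    def add_cloud_node(self):
        node = Node(f"cloud-{len(self.cloud_nodes)}", "cloud")
        self.cloud_nodes.append(node)
        return node

    def release_cloud_node(self):
        # Only cloud nodes come and go; local nodes are never released here.
        return self.cloud_nodes.pop() if self.cloud_nodes else None


def make_cluster(n_local):
    """Build a cluster with n_local local nodes and no cloud nodes."""
    return Cluster([Node(f"local-{i}", "local") for i in range(n_local)])
```

With this model, only m varies at run time; n stays fixed, which matches the role of local nodes as the reliable core of the cluster.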
4. Performance Index
For Figure 2, we simulate a data intensive computing cluster, choose web cache service as the application, and use NLANR's log to test the performance of traditional data intensive computing clusters with different fixed node numbers. The result shows that the average response time decreases as the node number increases. The heavier the system load is, the more improvement an increase in node number brings. When the node number exceeds a certain level, the improvement becomes negligible, which means the system has enough resources to handle the current load. Based on this analysis, we have an idea: we can adjust the node number in a data intensive computing cluster. When the system is under heavy load, we increase the number of available nodes; when the system load drops, we decrease the node number.
Figure 2. Response Time of Traditional Data Intensive Computing Cluster
To realize this idea, we first need to choose a performance index. Based on this index, we can decide whether EDICC needs more resources to handle the current system load. When evaluating the performance of a web cache service, response time is usually the first metric considered. Because of the nature of a cache, there are always some requests that are not in the cache and must be sent to the origin server. These so-called miss requests have a response time that is affected by network conditions and by the processing capacity of the origin server. When the network is congested or the origin server is overloaded, the response time of miss requests increases noticeably even if the load in EDICC is low. In other words, we cannot use response time directly as the performance index. We therefore propose a relative performance index R to represent the performance of EDICC.
The relative performance index R is defined as follows: R is the ratio of the average service time of hit requests to the average origin server response time of miss requests in a given period:

$$R = \frac{\mathit{AvgHitTime}}{\mathit{AvgMissTime}}$$

where AvgHitTime is the average service time of all hit requests in EDICC, and AvgMissTime is the average origin server response time of all miss requests in EDICC.
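As a small worked example of this definition, the sketch below computes R for one period from two lists of measured times. The function name and the choice to return None when a period has no hits or no misses are assumptions made for illustration; the paper does not say how such periods are handled.

```python
from statistics import mean


def relative_performance_index(hit_service_times, miss_origin_times):
    """R = AvgHitTime / AvgMissTime for one sample period.

    hit_service_times: service times of hit requests in the period.
    miss_origin_times: origin server response times of miss requests.
    Returns None when either list is empty (assumption: R is undefined then).
    """
    if not hit_service_times or not miss_origin_times:
        return None
    return mean(hit_service_times) / mean(miss_origin_times)
```

For instance, hit service times of 10, 20, and 30 ms (average 20 ms) and origin server response times of 40 and 60 ms (average 50 ms) give R = 20 / 50 = 0.4. Because R is a ratio, a slow origin server or congested network raises the denominator rather than being mistaken for load inside the cluster.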
Before using the performance index R, we need to set a threshold: when R exceeds the threshold, a new node is added to EDICC; otherwise, a node is removed from EDICC.
How to choose the threshold is an important problem. We must maintain system performance at an acceptable level. At the same time, we should reduce the frequency of changes in node number, to avoid
[Plot: average response time (msec) versus period; legend: 8-node cluster, 12-node cluster, 16-node cluster]
performance jitter. To achieve these goals, we set a threshold range [t1, t2]. When R<t1, EDICC releases cloud resources if there are cloud nodes in EDICC. When R>t2, EDICC acquires cloud resources and builds cloud nodes to increase its capacity.
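The threshold-range rule can be sketched as a small decision function plus a loop over sampled R values. This is a hedged illustration, not the authors' controller: the paper does not state how many nodes are added or released per decision, so the sketch assumes one node per sample period, and the names scaling_decision, run_controller, and max_cloud_nodes are hypothetical.

```python
def scaling_decision(r, t1, t2, cloud_nodes):
    """Apply the [t1, t2] rule to one sampled value of R."""
    if r > t2:
        return "acquire"   # performance too poor: add cloud capacity
    if r < t1 and cloud_nodes > 0:
        return "release"   # resources idle: give back a cloud node
    return "hold"          # inside the range: no change, avoiding jitter


def run_controller(r_values, local_nodes, t1, t2, max_cloud_nodes=None):
    """Return the total node count after each sample period.

    Assumption: at most one cloud node is acquired or released per period.
    """
    cloud = 0
    history = []
    for r in r_values:
        action = scaling_decision(r, t1, t2, cloud)
        if action == "acquire":
            if max_cloud_nodes is None or cloud < max_cloud_nodes:
                cloud += 1
        elif action == "release":
            cloud -= 1
        history.append(local_nodes + cloud)
    return history
```

For example, with 8 local nodes and the range [0.3, 0.7], the R sequence 0.5, 0.8, 0.9, 0.5, 0.2, 0.2, 0.2 yields node counts 8, 9, 10, 10, 9, 8, 8. The gap between t1 and t2 is what keeps the node number from oscillating when R hovers near a single cut-off value.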
5. Deployment of Cloud Node
Providers of resources in the cloud, such as Amazon EC2 or the Science Clouds, enable users to acquire resources on demand. Resources in the cloud are usually provided in the form of virtual machines. Virtualization is the basic technology of the cloud, and users can fully control their virtual machines. Acquiring resources from the cloud, constructing a virtual machine, and then installing the user-specified application on the virtual machine together make up the deployment of a user application in the cloud.
VMPlants [5] uses a typical deployment method: an image configured with a basic environment is used to create the virtual machine, and the application is then installed and configured. This method supports fine-grained deployment and can install different applications as the user needs, but because of the time required to install and configure the application, the deployment time may be relatively long.
Another deployment method is to create virtual machines from a fully configured image in which the application is already installed. The shortcoming of this method is its lack of flexibility: it is suitable only for situations in which the application does not need to change. Despite this limitation, the method is simple and its deployment time is short.
After acquiring resources from the cloud, we need to build a cloud node and add it to our elastic data intensive computing cluster. This means that we must deploy the application in the cloud. From the viewpoint of the elastic data intensive computing cluster, if a cloud node were allowed to take hours to deploy the application, EDICC would not be able to handle system overload in time. We therefore need fast deployment, in seconds or less. Obviously, creating a virtual machine from a basic image and then installing the application cannot meet this need. In our elastic data intensive computing cluster, the software environment and configuration on different nodes are almost identical, except for the IP address. Therefore, we use an image with the cluster and application software fully installed to create the virtual machine and construct the cloud node for EDICC.
6. Experimental Verification
To verify the performance of EDICC, we have implemented a trace-driven simulation of a data intensive computing cluster. We select the squid access log of NLANR as the trace for simulating a web cache application and compare EDICC with traditional data intensive computing clusters that have a fixed node number. In the experiment, the threshold range of R is set to [0.3, 0.7], the sample period of R is 60 seconds, and the initial node number (the local node number) of EDICC is 8.
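To illustrate what a 60-second sample period means in a trace-driven setting, the sketch below groups trace records into consecutive periods and computes R for each one. The record layout (timestamp in seconds, hit flag, time in milliseconds) is an assumption made for this example and is not the squid access log format; the function name is hypothetical, and periods without both hits and misses are simply skipped.

```python
from collections import defaultdict
from statistics import mean

SAMPLE_PERIOD_S = 60  # sample period of R used in the experiment


def r_per_period(records, period_s=SAMPLE_PERIOD_S):
    """Map period index -> R for records of (timestamp_s, is_hit, time_ms).

    For a hit, time_ms is the service time; for a miss, it is the origin
    server response time. Periods lacking hits or misses are omitted.
    """
    records = list(records)
    if not records:
        return {}
    start = min(ts for ts, _, _ in records)
    hits, misses = defaultdict(list), defaultdict(list)
    for ts, is_hit, time_ms in records:
        period = int((ts - start) // period_s)
        (hits if is_hit else misses)[period].append(time_ms)
    return {
        p: mean(hits[p]) / mean(misses[p])
        for p in sorted(hits)
        if p in misses
    }
```

Each resulting value of R would then be compared against the range [0.3, 0.7] once per period to decide whether the node number changes.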
Figure 3. Performance of EDICC
Figure 3 compares the performance of EDICC with that of traditional data intensive computing clusters with 8 nodes and 16 nodes. From the beginning of the experiment to the 100th period, the response time curve of EDICC is almost the same as that of the 8-node traditional cluster. When the system load begins to increase, the 8-node traditional cluster cannot provide enough resources, and its response time increases notably. EDICC can get more resources from the cloud and enhance system capacity dynamically, so its response time is kept at a relatively low level. Compared with the 16-node traditional cluster, EDICC's response time is slightly higher in the first 100 periods because EDICC has only 8 nodes. After the 100th period, EDICC adds cloud nodes to the system dynamically, and its response time is almost the same as that of the 16-node traditional cluster.
Figure 4. Node Number in EDICC
Figure 4 illustrates how the node number of EDICC changes in the experiment. The node number curve shows that, from about the 100th period, the node number in EDICC begins to increase, which means EDICC starts to acquire resources from the cloud. The maximum node number in EDICC is 16 in our experiment. From Figure 3 and Figure 4, we can conclude that EDICC has higher resource efficiency than the 16-node traditional data intensive computing cluster while achieving almost the same performance. In other words, compared with a traditional data intensive computing cluster, EDICC can improve service quality and enhance resource efficiency at the same time.
Figure 5. Comparison of R Value
Figure 5 shows how the value of the performance index R changes in the experiment. The change in R is consistent with the change in response time in the data intensive computing cluster. When the system lacks resources, the R value will increase; when the system has idle resources, the R value will
[Plot: node number versus period]
decrease. The R value fluctuates according to the load in the data intensive computing cluster. The range of this fluctuation is affected by the amount of available resources: if there are enough available resources, the R value fluctuates only slightly; otherwise, it varies strongly.
Among the three curves in Figure 5, the curve of the 16-node traditional data intensive computing cluster is the smoothest, and the curve of the 8-node traditional cluster is the roughest. In EDICC, because the resource supply is adjusted dynamically according to system performance, the R curve is very close to that of the 16-node traditional cluster. This shows that R is a suitable performance index for a data intensive computing cluster.
7. Security and Reliability
Because we use cloud resources in the elastic data intensive computing cluster, we inevitably bring cloud security issues into EDICC. Security is a hot research field in cloud computing. To limit the influence of cloud security problems, we present a solution for EDICC. The main idea is to divide clients or requests into groups with different security priorities. For example, we can mark clients in certain IP address ranges as high security priority clients, or mark requests to particular destination addresses or URLs as high security priority requests. When EDICC receives requests with high security priority, it sends them to local nodes, which have a higher security configuration, so that local nodes guarantee security in EDICC.
Reliability in EDICC is handled in a similar way: we can give clients or requests different priorities. When cloud resources are unavailable, local nodes serve high priority requests first.
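The priority scheme for security and reliability might look like the following sketch. Everything specific in it is hypothetical: the address range, the URL prefix, the function names, and the three pool labels are placeholders chosen for illustration, since the paper gives no concrete rules or routing interface.

```python
import ipaddress

# Hypothetical high security priority rules (placeholders, not from the paper).
HIGH_PRIORITY_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]
HIGH_PRIORITY_URL_PREFIXES = ("https://secure.example.org/",)


def is_high_priority(client_ip, url):
    """True if the client address or the requested URL matches a rule."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in HIGH_PRIORITY_NETWORKS):
        return True
    return url.startswith(HIGH_PRIORITY_URL_PREFIXES)


def route(client_ip, url, cloud_available=True):
    """Pick a node pool for one request.

    "local"          : high priority, always served by local nodes.
    "any"            : normal priority, any local or cloud node may serve it.
    "local-deferred" : normal priority while the cloud is unavailable; local
                       nodes serve it only after high priority requests.
    """
    if is_high_priority(client_ip, url):
        return "local"
    return "any" if cloud_available else "local-deferred"
```

The same classification serves both purposes described above: it keeps sensitive requests off cloud nodes, and it decides which requests local nodes serve first when cloud resources cannot be reached.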
8. Conclusion
In this paper, we present an elastic data intensive computing cluster, EDICC. EDICC uses "on demand" cloud resources to construct new cluster nodes when the load exceeds system capacity and releases unnecessary cloud resources when the load drops. At the same time, it uses local resources to guarantee security and reliability in our data intensive computing system. In this way, we build a highly available and highly reliable data intensive computing system. Experimental results show that, compared with traditional data intensive computing clusters that have a fixed node number, our elastic data intensive computing cluster achieves outstanding performance, higher resource efficiency, and lower system cost.
Acknowledgements
This work was supported by the Science and Technology Plan of Zhengzhou (No. 131PPTGG411-5), the Science and Technique Research Program of Henan Educational Committee (No. 14A520022), and the Ph.D. Scientific Research Foundation of Zhengzhou University of Light Industry.
References
[1] Chen Kang, Zheng Wei Min. Cloud Computing: System Instances and Current Research. Journal of Software. 2009; 20(5): 1337-1348.
[2] Luo Jun-zhou, Jin Jia-hui, Song Ai-bo. Cloud Computing: Architecture and Key Technologies. Journal on Communications. 2011; 32(7): 3-21.
[3] Zhuo Yang, Sei-ichiro Kamata, Alireza Ahrary. NIR: Content based image retrieval on cloud computing. Proceedings of IEEE International Conference on Intelligent Computing and Intelligent Systems. IEEE Computer Society, Shanghai, China. 2009: 556-559.
[4] Tim Dornemann, Ernst Juhnke, Bernd Freisleben. On-Demand Resource Provisioning for BPEL Workflows Using Amazon's Elastic Compute Cloud. Proceedings of the 9th IEEE/ACM International Symposium on Cluster Computing and the Grid. IEEE, Shanghai, China. 2009: 140-147.
[5] Ivan Krsul, Arijit Ganguly, Jian Zhang. VMPlants: Providing and Managing Virtual Machine Execution Environments for Grid Computing. Proceedings of the ACM/IEEE SC2004 Conference on Supercomputing. IEEE Computer Society, Pittsburgh, PA, USA. 2004: 1-12.
[6] Hideo Nishimura, Naoya Maruyama, Satoshi Matsuoka. Virtual Clusters on the Fly - Fast, Scalable, and Flexible Installation. Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid. IEEE Computer Society, Rio De Janeiro, Brazil. 2007: 549-556.
[7] Marios D Dikaiakos, George Pallis, Pankaj Mehra. Cloud Computing: Distributed Internet Computing for IT and Scientific Research. IEEE Internet Computing. 2009; 13: 10-13.
[8] Wang Peng, Meng Dan, Zhan Jianfeng. Review of Programming Models for Data-Intensive. Journal of Computer Research and Development. 2010; 47(11): 1993-2002.
[9] Zheng Pai, Cui Li-Zhen, Wang Hai-Yang. A Data Placement Strategy for Data-Intensive Applications in Cloud. Chinese Journal of Computers. 2010; 33(8): 1472-1480.
[10] Bicer Tekin, Chiu David, Agrawal Gagan. A framework for data-intensive computing with cloud bursting. Proceedings of the 2011 IEEE International Conference on Cluster Computing. 2011: 169-177.