TELKOM
NIKA Indonesia
n
Journal of
Electrical En
gineering
Vol.12, No.4, April 201
4, pp. 2860 ~ 2
8
6
7
DOI: http://dx.doi.org/10.11591/telkomni
ka.v12i4.4737
2860
Re
cei
v
ed Se
ptem
ber 7, 2013; Re
vi
sed
No
vem
ber 7,
2013; Accept
ed No
vem
b
e
r
21, 2013
Dynamic Data Grid Replication Algorithm Based on
Weight and Cost of Replica
Zhongping Z
h
ang
1,2,
*, Ch
ao Zhang
1
, Mengfei Zuo
1
, Zhiping Wang
3
1
Coll
eg
e of Informatio
n
Scien
c
e and En
gi
ne
erin
g, Y
ansh
a
n
Universit
y
, Qin
hua
ng
dao, He
bei 0
6
6
004, Ch
ina
2
T
he Ke
y
L
a
b
o
r
ator
y
for C
o
mputer Virtua
l T
e
chn
o
l
o
g
y
an
d S
y
stem Integr
a
t
ion of Heb
e
i P
r
ovinc
e
,
Qinhu
an
gda
o, Heb
e
i 06
60
04, Chin
a
3
F
o
reign L
a
n
g
u
age C
o
ll
eg
e of Dali
an Ji
aoto
n
g
Univ
ersit
y
, D
a
lia
n, Lia
o
n
i
ng,
1160
52, Ch
ina
*Corres
p
o
ndi
n
g
author, e-ma
i
l
: zpzhan
g@
ys
u.edu.cn
A
b
st
ra
ct
Data Grid is c
o
mpos
ed of a
larg
e nu
mber
of di
strib
u
ted c
o
mputati
on
an
d storag
e reso
urces
t
o
facilitate th
e
ma
na
ge
me
nt of t
he hu
ge
distrib
u
ted a
n
d
shari
ng
dat
a reso
urces e
fficiently. Dyn
a
mic
replic
atio
n ca
n
reduc
e the
fi
le stora
ge ti
me an
d us
e th
e gri
d
res
ourc
e
s effectively
in a
Data Gri
d
envir
on
me
nt. T
he Data Gri
d
topo
logy
is divi
ded i
n
to th
re
e layers: Re
gio
n
a
l lev
e
l, LAN l
e
vel, the gr
id s
i
t
e
level. W
e
h
a
v
e
i
m
prov
ed D
HRA al
gorith
m
in three ar
ea
s: replica s
e
le
ction, repl
ica
plac
e
m
ent, re
pli
c
a
repl
ace
m
e
n
t. Consi
der
ing
th
e w
e
ight
and
cost of re
p
lica
,
w
e
propos
e
dyna
mic Data
Grid rep
licati
o
n
alg
o
rith
m
LW
L
C
Bas
e
d
on
W
e
ight
an
d C
o
st of R
e
p
lica
.
At last, w
e
use th
e Optor
S
im to
prov
e
th
e
effectiveness
o
f
the LW
LC.
Ke
y
w
ords
:
dat
a grid, hi
erarch
ical, dyn
a
m
ic r
eplic
atio
n, opto
r
sim
Copy
right
©
2014 In
stitu
t
e o
f
Ad
van
ced
En
g
i
n
eerin
g and
Scien
ce. All
rig
h
t
s reser
ve
d
.
1. Introduc
tion
At pre
s
ent,
wi
th the in
crea
se of d
e
man
d
of
larg
e-scale
com
m
ercial
appli
c
ation
s
and th
e
so
cial
sci
en
ces, La
rg
e-scale data
ge
n
e
rated
all
the
time in the
worl
d [1]. Fo
r exampl
e, hi
gh-
energy physi
cs, bioi
nfor
m
a
tics, ea
rth observation,
global
cli
m
at
e cha
nge, image p
r
o
c
essing,
data minin
g
, data in the
s
e appli
c
ation
s
re
ached T
B
level or e
v
en PB level [2]. These
data
can
not
b
e
sto
r
ed
in som
e
centralized si
tes
a
nd
mu
st be
dispe
r
sed
thro
ugho
ut t
he
wo
rld. In t
h
is
ca
se, h
o
w to
manag
e a
nd
store
hu
ge
a
m
ount of
dat
a
is very diffi
cult a
nd
even
impo
ssi
ble [
3
-4].
The Data Gri
d
can
effectively solve this probl
em; it
tries to sto
r
e t
hese data in
distrib
u
ted
sites,
and provide
s
retrieve
s for e
a
ch a
ppli
c
atio
n.
The
network
band
width
is the m
o
st i
m
po
rtant fa
ctor th
at affect
s the
perfo
rman
ce
of Data
Grid, i.e. slo
w
data acce
ss re
st
ri
ct the perfo
rman
ce
of data-inten
s
ive applicatio
ns in Data G
r
id
.
the topolo
g
ical structu
r
e
o
f
Data G
r
id
wa
s p
r
op
os
e
d
in 3
L
HA
(3
-Level
Hie
r
a
r
chi
c
al Alg
o
rit
h
m)
algorith
m
[5], it has three l
e
vels: regi
on
s, LANs
withi
n
each regi
o
n
, and site
s within ea
ch L
A
N,
the netwo
rk b
and
width of the three level
s
are in a
s
ce
nding o
r
de
r. Regi
onal leve
l communi
cat
e
s
throug
h WA
N, thus
avoid
i
ng re
gion
al level acce
ss can
effectivel
y redu
ce a
ccess laten
c
y a
nd
improve the
perfo
rman
ce
of Data G
r
id.
We u
s
e dat
a repli
c
atio
n mech
ani
sm to add
re
ss thi
s
probl
em. Dat
a
repli
c
atio
n i
s
an
other im
portant o
p
timi
zation m
e
a
s
u
r
e which u
s
e
d
to mana
ge
the
geog
rap
h
icall
y
distributed
data. Effe
ctive
data replication can brin
g lo
wer
band
wi
dth
con
s
um
ption,
and can p
r
o
v
ide more
available d
a
ta. Data repli
c
ati
on is divid
ed
into three m
a
jor
parts
in
thi
s
p
aper: (1
) repli
c
a sel
e
ction,
i
.
e.
t
he process of
sel
e
cting
the
b
e
st
repli
c
a from th
ose
repli
c
a
s
g
eog
raphi
cally
sp
reade
d a
c
ross the Data G
r
i
d
; (2
) repli
c
a
placement, i.
e. the p
r
o
c
e
s
s of
sele
cting the
best SE (sto
rage elem
ent) to store r
epli
c
a; (3
) re
plica repla
c
e
m
e
n
t, i.e. when the
local SE do
e
s
not have
en
ough
storage
spa
c
e to
stor
e repli
c
a, ho
w to delete
some files to
store
the new repli
c
a.
Based
on
DHRA (Dynami
c
Hie
r
archi
c
al
Repli
c
ation
A
l
gorithm
) al
go
rithm
[6], We make
three imp
r
ov
ements: repli
c
a sele
ction,
repli
c
a pla
c
ement, repli
c
a repl
aceme
n
t. Then pro
pose
dynamicdata
grid
repli
c
atio
n algo
rithm b
a
se
d on
we
ig
ht and
co
st of repli
c
a, calle
d LWLC
(Le
a
s
t
weig
ht and
Lea
st Co
st).
Simulation
results
wi
th
OptorSim
sh
ow that L
W
LC give
s be
tter
perfo
rman
ce
comp
ared to other alg
o
rith
ms.
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
2302-4
046
Dynam
ic Dat
a
Grid
Repli
c
ation Algorith
m
Based on Weig
ht and Cost of… (Zh
o
ngpin
g
Zhan
g
)
2861
2. R
e
lated
Works
Curre
n
tly, Data Grid re
sea
r
ch
ers have
a high
intere
st in effective replication strategy.
Here are sev
e
ral commo
n sched
uling al
gorithm
s.
Sang-Mi
n Pa
rk, et
c. propo
sed B
H
R (Ba
ndwi
d
th
Hi
erarchy ba
sed
Repli
c
ation
)
algorith
m
[7] in 2004, th
e algo
rithm redu
ce
s data
acce
ss l
a
ten
cy mainly by incre
a
si
ng the l
o
cal file a
c
ce
ss
and avoidi
ng
netwo
rk
con
g
e
stion
s
. The
Data G
r
id top
o
logy is divid
ed into two le
vels: site leve
l,
regio
n
level. Network ban
dwidth bet
we
en the r
egi
on
s is lower tha
n
the band
wi
dth betwe
en the
sites. Th
e alg
o
rithm repli
c
a
t
es the repli
c
a to t
he site if there a
r
e e
n
ough
spa
c
e,
or a
c
cesse
s
t
h
e
repli
c
a re
mot
e
ly if
the repli
c
a availabl
e in the sa
me region. Othe
rwise it tries to make avail
able
spa
c
e by de
leting files u
s
ing L
RU
(L
east Recent
l
y
Used
), then repli
c
ate the repli
c
a. T
h
is
algorith
m
ha
s two d
r
a
w
ba
cks:
(1) it
con
s
ide
r
s
pop
ula
r
ity of repli
c
a
s
at region
le
vel; (2)
repli
c
as
are pla
c
e
d
at all the requ
ested si
tes, not
only the appropriate
site.
Ruay-shi
ung Cha
ng,
etc. prop
osed
scheduli
ng alg
o
rithm HCS
(Hie
ra
rchical Clu
s
ter-
based S
c
he
d
u
ling) an
d da
ta repli
c
atio
n
algorith
m
HRS (Hierarchical Replic
ation
Strategy) [8]
in
2007, th
e t
w
o alg
o
rithm
i
m
prove
d
the
efficien
cy
of data
sto
r
ag
e in
Data
G
r
id. HCS u
s
e
s
a
hiera
r
chi
c
al sche
duling
an
d decre
ases sea
r
ching
ti
me for the b
e
st computin
g site by usi
n
g
clu
s
ter info
rm
ation. HRS al
gorithm m
a
ke
s two
imp
r
ov
ements
ba
se
d on BHR alg
o
rithm, nam
el
y:
(1) BHR
che
c
ks
all site
s t
o
sel
e
ct th
e
best
repli
c
a,
but HRS alg
o
rithm
sele
ct
s the
be
st wi
thin
c
l
us
ter, if the c
l
us
ter does not exis
t then chec
ks
oth
e
r
clu
s
ters; (2) B
H
R u
s
e
s
the fre
que
ncy of
repli
c
as in
cl
uster level, while
HRS al
g
o
rithm u
s
e
s
t
he fre
que
ncy
of repli
c
a
s
i
n
site l
e
vel. HCS
job scheduli
n
g
alg
o
rithm
s
along
with HRS
repli
c
at
io
n strategy h
a
s
le
ss job
ex
ecutio
n time
in
comp
ari
s
o
n
to other
sch
ed
uling algo
rith
ms and
repli
c
ation strate
gi
es.
A.Horri, etc.
prop
osed 3L
HA (3
-Level Hierarchi
c
al Algorithm
) al
gorithm in 20
08. This
pape
r relate
s to the job
sch
edulin
g and
d
y
namic d
a
ta
replication. Th
ey con
s
id
ere
d
a hie
r
a
r
chi
c
al
netwo
rk
stru
cture th
at ha
s
three
levels:
regio
n
, LA
Ns within
ea
ch
region,
and
sit
e
s
within
ea
ch
LAN, the
net
work
ban
dwi
d
th of the th
re
e level
s
i
s
in
ascen
d
ing
order. F
o
r effici
ent sch
eduli
n
g of
any job, the
algorith
m
det
ermin
e
s
mo
st appropri
a
te
regi
on
(LAN, site) th
at h
o
lds
mo
st of
the
requ
este
d re
plica
s
( from
size p
o
int of vi
ew), i.e.
m
o
st
of the requ
e
s
ted
repli
c
a
s
are
available
in
that re
gion
( L
A
N, site
). Thi
s
can
si
gnifi
cantly red
u
ce t
o
tal tran
sfe
r
t
i
me an
d
network traffic
.
Th
e
main we
akne
ss of 3L
HA
algorith
m
is i
t
place
s
re
pli
c
a
s
at all req
ueste
d site
s but not just the
approp
riate si
tes.
Najme M
a
n
s
ouri, etc. p
r
e
s
ente
d
DHR
(Dyna
m
ic
Hie
r
archi
c
al Rep
lication
)
algo
rithm
[9]
based on th
e 3LHA alg
o
r
ithm in 201
2; it also us
ed three
-
tier grid archite
c
ture. At rep
lica
sele
ction,
DHR
sele
cts the
site
that h
a
s lea
s
t n
u
mbe
r
of
re
que
st;
at re
plica pl
a
c
eme
n
t, DHR
cre
a
tes
a so
rt
ed list (by n
u
m
ber of repli
c
a acce
ss)
of
all SE’s that reque
sted
the
particula
r file in
regio
n
, then the repli
c
a
will
be placed in
the first SE of the above so
rted list.
Najme
Ma
nsouri, et
c. p
r
op
ose
d
MLA
L
W (Mo
d
if
y Late
s
t Acce
ss L
a
rgest
wei
ght S
t
rategy)
[10] algorithm
based o
n
LALW (L
atest Access La
rg
e
s
t weight Strategy) [11] alg
o
rithm in 201
2.
MLALW d
e
letes files by co
nsid
erin
g thre
e important
factors: lea
s
t frequ
ently use
d
repli
c
a
s
, least
recently used
repli
c
a
s
and
the si
ze of th
e repli
c
a. ML
ALW sto
r
e
s
e
a
ch
repli
c
a in
an app
rop
r
ia
te
site, i.e. ap
propriate
site
i
n
the
regi
on t
hat ha
s
th
e h
i
ghe
st num
be
r of
acce
ss i
n
future
for that
particula
r repl
ica.
Najme
Ma
nsouri, et
c. p
r
o
posed
a
sch
edulin
g alg
o
rithm CSS
(Combine
d S
c
h
edulin
g
Strategy) an
d dynamic
Data Grid re
pl
ication
alg
o
ri
thm DHRA algorith
m
in 2013. CSS first
determi
ne
s th
e region
that
has the
mo
st of the
re
que
sted file
s
(fro
m
the
si
ze
po
int of view), i.
e.
most of the reque
sted file
s are av
ail
abl
e in that regio
n
. Then, com
b
ined
cost for each
site within
the be
st re
gio
n
is
com
pute
d
and th
e job
will be
assig
n
ed to the
site
with the Mini
mum Combin
ed
Co
st. DHRA has three pa
rts: (1) R
epli
c
a Selection, it generates a l
i
st
of replica that are availa
ble
in other
regi
o
n
s, and from
this list it sel
e
cts
th
e repli
c
a that ha
s t
he minimu
m Re
spo
n
se Ti
me
whi
c
h is
cal
c
ulated a
s
the
time elapse
d
for transfe
rri
ng a data file
from one sit
e
to another.
(2)
Repli
c
a pla
c
e
m
ent, the rep
lica is pla
c
ed
at the SE
th
at has most
of reque
sts i
n
the region.
(3)
Repli
c
a Ma
n
ageme
n
t, first deletes tho
s
e files
with
minimum tim
e
for tran
sferring, i.e. files tha
t
are both avail
able in the cu
rrent site as
well as in the
local LA
N, second if space still insufficient,
then repeat
s
first ste
p
for
each LA
N in
the cu
rrent region, third
if
spa
c
e
still in
sufficie
n
t, then
delete
s
the re
maining file
s by using L
RU me
thod till space is availa
ble for re
plica
.
These alg
o
rit
h
ms
only co
nsid
er
ce
rtai
n aspe
c
t
s
that affec
t
the performanc
e
of Data
Grid, b
u
t not
con
s
id
er
co
mpre
hen
sivel
y
. Such a
s
: replica repla
c
ement, some
algo
rithms u
s
e
LRU method
to delete files and others u
s
e LF
U meth
od; repli
c
a se
lection, some
algorithm
s o
n
ly
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 23
02-4
046
TELKOM
NI
KA
Vol. 12, No. 4, April 2014: 2860 – 2
867
2862
con
s
id
er the
waiting time o
f
copying a re
plica;
re
plica placement, some algo
rith
ms only take
into
accou
n
t the delay of netwo
rk tra
n
smissi
on
.
3. Proposed
Replication
Al
gorithm
In this
se
ctio
n, first th
e to
pology
of Data G
r
id i
s
de
scrib
ed
and
th
en L
W
L
C
alg
o
rithm i
s
pre
s
ente
d
.
3.1. Topolog
y
of Data Gri
d
Figure 1 sho
w
s the hi
era
r
chi
c
al archite
c
ture
whi
c
h
has three lev
e
ls simil
a
r to
what is
given in
3L
HA algo
rithm.
First l
e
vel i
s
regio
n
, they
are
co
nne
cte
d
via inte
rnet
whi
c
h
ha
s l
o
w
band
width; seco
nd
level contain
s
lo
cal LANs with
in t
he re
gion, th
ey have a m
oderately hig
her
band
width
co
mpari
ng to internet bet
ween them. T
h
ird level co
mpri
se
s the sites
within e
a
ch
LAN, whi
c
h h
a
ve the highe
st band
width.
Figure 1. Dat
a
Grid To
polo
g
y
3.2. LWLC
Algorithm
Whe
n
a jo
b i
s
disp
atch
ed t
o
a
site, it ne
ed c
opy the fi
les that th
e jo
b re
quired b
u
t
the SE
don’t have, this process i
s
data re
plica
t
ion. This pa
per p
r
e
s
ent
s dynamic
Dat
a
Grid repli
c
a
t
ion
algorith
m
ba
sed on weight
and co
st rep
lica, call
ed L
W
L
C
. LWL
C
algorith
m
is p
r
opo
se
d ba
sed
on the DHRA
, it improves
on thre
e asp
e
cts: repli
c
a
sele
ction,
co
nsid
erin
g not
only the wait
in
g
time of co
pying the
repli
c
a
s
but al
so th
e
tran
smissio
n
delay; repli
c
a pla
c
eme
n
t, Whe
n
there a
r
e
several SEs
in the regi
on
have the sa
me numb
e
r
of requ
estin
g
the particula
r repli
c
a,
DHRA
algorith
m
just
rando
mly sel
e
cts a SE, while LWLC
al
gorithm
sele
cts the SE, it h
a
s the maxim
u
m
numbe
r of
different jo
bs th
at req
u
e
s
ted
the pa
rtic
ul
ar
repli
c
a, T
hereby re
du
cing
the waiting ti
me
for multiple j
obs; repli
c
a
repla
c
e
m
ent:
LWL
C
alg
o
r
ithm con
s
id
ers th
e weig
ht of repli
c
a
in
different time interval and t
he co
st of rep
lica
s
.
3.2.1. Replic
a Selection
Select the op
timal sou
r
ce
SE that can redu
c
e
the trans
fer time. Affec
t
ing the trans
fer
spe
ed of
re
pli
c
a mainly rel
a
ted
to
th
re
e asp
e
ct
s: the
netwo
rk ba
nd
wi
dth
betwee
n
source
SE
and
destin
a
tion S
E
, Physical
copy
spee
d
of the sou
r
ce SE and th
e waiting tim
e
in the re
q
ues
t
queu
e. They
can
be me
asured
with d
a
ta tran
smi
ssi
o
n
delay, sto
r
a
ge a
c
cess lat
ency a
nd waiting
time in the reque
st queu
e, resp
ectivel
y
. Then we prop
ose the T
t
(Tra
nsfe
r
Time), it can
be
defined by th
e followin
g
eq
uation:
n
FF
F
t
i=
1
rt
S
S
Sf
Sf
S
i
T=
+
+
WS
S
()
(
)
(
)
(1)
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
2302-4
046
Dynam
ic Dat
a
Grid
Repli
c
ation Algorith
m
Based on Weig
ht and Cost of… (Zh
o
ngpin
g
Zhan
g
)
2863
Whe
r
e S
F
(f) i
s
the
size of desi
r
ed
repli
c
a f, unit is M
B
; W
rt
is the
effective ban
dwidth
betwe
en
sou
r
ce SE and destin
a
tion SE, unit is MB/S; S
F
(i) is the si
ze of rep
lica that wait in the requ
est
e
d
queu
e of sou
r
ce SE, unit is MB/S; n is the numb
e
r of
replica whi
c
h
wait in the reque
sted que
ue
before d
e
si
re
d repli
c
a f; S
s
is the physi
cal copy spee
d of sou
r
ce S
E
, unit is MB/S.
Whe
n
repli
c
a
sel
e
ctio
n, L
W
L
C
g
ene
rat
e
s
a lis
t
of replicas t
hat
are
availabl
e
in Data
Grid, and it select
s the rep
lica that ha
s the minimum
T
t .
3.2.2. Replic
a Placement
There are se
veral SE in th
e regio
n
req
u
e
sting the pa
rticular rpeli
c
a
,
then we nee
d sele
ct
the be
st
site
to pla
c
e thi
s
repli
c
a. T
o
select
th
e b
e
st site, L
W
L
C
create
s
a
sorted
list
by the
numbe
r of re
plica a
c
ce
ss
of all SEs that reque
sted t
he re
plica in
the regio
n
. Now we will pl
ace
the re
plica at
the first SE of
the a
bove
so
rted li
s
t; if mo
re tha
n
o
ne S
E
have the
same n
u
mb
er
of
requ
est
s
, L
W
LC
algo
rithm
sel
e
ct
s SE
whi
c
h h
a
ve t
he maximu
m
numb
e
r
of d
i
fferent job
s
t
h
a
t
requ
este
d th
e parti
cula
r replica. The replica is n
o
t placed at all
the req
u
e
s
te
d site, but on
ly at
best
site in th
e re
que
sted
region.
Hen
c
e
,
SE utiliz
ation is ve
ry high
, and
can
effectively improve
the perfo
rma
n
ce of Data Grid.
3.2.3. Replic
a Replac
eme
n
t
In the Data
Grid
ca
pa
city of ea
ch SE i
s
lim
ited,
and
a copy of th
e data
also h
a
s
co
sts.
Each file in t
he SE has
a
Repli
c
a Val
u
e(RV
), whi
c
h
is calcul
ated
as the
co
st for tra
n
sfe
rri
n
g
a
data file from another site
to local site, whi
c
h is
calculated as the
co
st for transf
e
rri
ng a data file
from anoth
e
r
site to local site next time. RV ma
inly consi
dered two asp
e
ct
s of repli
c
a: Repli
c
a
Weig
ht (RW)
and
Re
plica
Co
st (RC). L
a
rge
r
co
st
wil
l
be
paid
if th
e re
pla
c
ed
file ha
s
gre
a
ter RV,
so al
ways rep
l
ace the file which h
a
ve the
minimum RV
.
RW
exhibits t
he impo
rtan
ce for a
c
cess
histor
y in
different time i
n
tervals. T
he
RW(f) fo
r
file f is repre
s
ented a
s
:
c
c
T
-T
-
t
t
t=
1
R
W
(f
)=
R
N
h
h
f
F
,
(2)
RW(f) indi
cat
e
s weight of
replica f; F is
the set of
files that
have been
requ
ested; T
c
is t
h
e
numbe
r of time interval
s passe
d; t denotes the tim
e
interval t, the time interval is 1s; h is the
base wei
ght of replica; RN
t
indicate
s the
numbe
r of acce
ss fo
r
the file f at time int
e
rval t.
The time
co
st of tra
n
sfe
r
re
plica
from
so
urce SE to
d
e
s
tination
SE i
s
cal
c
ulate
d
b
y
RC,
if
the del
eted fil
e
ha
s l
a
rg
e
RC, th
en th
e
next req
u
e
s
t
for thi
s
file
wi
ll ca
use g
r
eat
time del
ay. On
the other hand, the j
ob
alway
s
requests a
deleted file wh
i
c
h has large RC that will cause
unne
ce
ssary
band
width consumption.
It mainly rela
tes with the
size of repli
c
a and effecti
v
e
netwo
rk b
a
n
d
width. The
r
efore, the larger effe
ctivel
y network ba
ndwi
d
th, the sho
r
ter tra
n
sf
er
time.
F
rt
S
(f)
R
C
(f)=
W
(3)
Whe
r
e S
F
(f
) i
s
the
size of
desi
r
ed
re
pli
c
a f, unit is
MB; W
rt
is th
e effective b
and
width
betwe
en source SE and de
stination SE, unit in MB/S.
nn
i=
1
i
=
1
RW
f
R
C
f
R
V
f
=
1
00%
+
100%
f
F
RW
(
i
)
R
C(
i
)
,
(4)
F is the set of
files that have been requ
e
s
ted;
n re
pre
s
ents the num
ber of files in
the SE.
Whe
n
the ca
pacity of sto
r
age ele
m
ent i
s
sm
a
lle
r tha
n
the re
que
st
ed file, exit algorithm;
otherwise if e
noug
h
stora
g
e
spa
c
e
exist
s
at th
e b
e
st
site, then
re
pl
icate
s
the
sel
e
cted
file. If the
requ
este
d file exists in the local LAN
and t
here is not enoug
h space in the storage el
eme
n
t,
then acce
ss t
he file remot
e
ly. Now, if the req
u
e
s
ted
file doesn’t exist in the same LAN an
d no
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 23
02-4
046
TELKOM
NI
KA
Vol. 12, No. 4, April 2014: 2860 – 2
867
2864
enou
gh
spa
c
e in the
sto
r
a
ge ele
m
ent,
we
need
to d
e
lete on
e o
r
more
files
usi
ng the foll
owi
n
g
st
ep
s:
(1)
Ran
domly
generated li
st datalist1 of
rep
licas that
don’t exist on the acce
ss history
and in the st
orag
e elem
e
n
t. Then dele
t
e files fr
om the above list till space is enoug
h for the
repli
c
a;
(2) If spa
c
e i
s
still not en
ough, then g
enerates
a li
st datalist2 of
replicas in a
s
cendi
ng
orde
r
by RV,
the replicas t
hat a
r
e
both
available
at t
he b
e
st
site
a
s
well
as the
local
LAN.
Now
delete files from top to bottom until there
is enou
gh sp
ace to sto
r
e t
he ne
w repli
c
a;
(3) If space i
s
till in
suffici
e
n
t, then
rep
e
a
t the
step
(2) fo
r e
a
ch L
A
N in
current
re
gion,
rand
omly;
(4) If spa
c
e
is
still not
e
noug
h, then
gene
rate
list
datali
s
t3
of
remai
n
ing
re
plica
s
i
n
ascen
d
ing o
r
der by RV. Then del
ete files from
top
to bottom until there is e
noug
h sp
ace
to
store th
e new replica.
3.2.4. LWLC
Algorithm
Acco
rdi
ng to the ideas of
Section 3.2.1
to
Section 3.2.3, put forward L
W
L
C
algorithm
w
h
ic
h is
improved of DHRA algor
ithm, it is
dynamic
D
a
ta Gr
id
r
e
plic
ation algor
ithm bas
e
d
on
weig
ht and cost of repli
c
a.
Algorithm 1 descri
b
e
s
the
LWL
C
algo
rithm.
Algorithm 1: LWL
C
(L
ea
st weig
ht and L
east Cost
)
INPUT: All files F;
OUTP
UT: Best sou
r
ce SE, best de
stinat
ion SE;
(1) if(th
e
req
u
e
sted file f
i
is not available i
n
the site) {//if1
(2) gen
erate
list datalist1
of f
i
that are available in local LAN;
(3) sel
e
ct f
i
from datali
s
t1
that has mini
mum T
t
;
(4) s
e
lec
t
the bes
t s
i
te S
E
to plac
e this
replic
a;
(5) if(f
i
's si
ze
>= SE storag
e size){
(6) acc
e
s
s
f
i
remotely; exit;}
(7) else{//else1
(8) if(f
i
's
size <= SE available si
ze
) {
(9)
replic
ate fi; exit;}
(10) els
e
{//els
e2
(11)
if(f
i
is availabl
e in the local LA
N)
(12)
ac
c
e
s
s
f
i
remotely;
(13)
els
e
{//els
e3
(14
)
Randomly gen
erated list datali
s
t2 of repli
c
a
s
that not exist on the
acce
ss histo
r
y and in the stora
ge ele
m
ent;
(15)
wh
ile(datali
s
t2 != empty) {
(16)
s
e
lec
t
a file from top of datalis
t2;
(17)
d
e
lete this
file;
(18)
i
f
(f
i
's <= SE available
size
) {
(19)
replic
ate f
i
; exit;}
(20
)
}//end while
(21
)
generate a li
st datalist3
of rep
lica
s
in asce
n
d
ing orde
r by RV, the
repli
c
a
s
that are both avail
able at the
be
st site as
well
as the local LAN;
(22)
wh
ile(datali
s
t3 != empty) {
(23)
s
e
lec
t
a file from top of datalis
t3;
(24)
d
e
lete this
file;
(25)
i
f
(f
i
's <= SE available
size
) {
(26)
replic
ate fi; exit;}
(27
)
}//end while
(28
)
for(ra
ndomly sel
e
cted LA
N in cur
r
e
n
t regio
n
) {
(29
)
generate a li
st datalist4
of rep
lica
s
in asce
n
d
ing orde
r by RV, the
repli
c
a
s
that are both avail
able at the
be
st site as
well
as the local LAN;
(30)
w
h
ile(datalis
t4
!=
empty) {
(31)
s
e
lec
t
a file from top of datalis
t4;
(32)
delete this
file;
(33)
if(f
i
's <= SE available si
ze
) {
(34)
replic
ate f
i
; exit;}
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
2302-4
046
Dynam
ic Dat
a
Grid
Repli
c
ation Algorith
m
Based on Weig
ht and Cost of… (Zh
o
ngpin
g
Zhan
g
)
2865
(35
)
}//end whil
e
(36
)
}//end for
(37
)
generate li
st datalist5 of
rem
a
ining re
plicas
in ascen
d
ing
orde
r by RV;
(38)
while(datalis
t5
!=
empty) {
(39)
s
e
lec
t
a file from top of datalis
t5;
(40)
delete this
file;
(41)
i
f
(f
i
's <= SE available
size
) {
(42)
replicate f
i
; exit;}
(43
)
}//end while
(44
)
}//end else3
(45
)
}//end else
2
(46
)
}//end else
1
(47
)
}//end if1
END
4.
Experimenta
l
Results a
n
d Performan
ce Analy
s
is
4.1. Simulation Tool
We
evaluate
the pe
rforma
nce
of va
riou
s
repl
i
c
ation
algorith
m
s by
implem
entin
g them
in
OptorSim [1
2-13]. It is develop
ed b
y
the EU Data Grid p
r
oj
ect, a Java-based sim
u
l
a
tion
langu
age. O
p
torsim
assum
e
s that a
Dat
a
Grid i
s
co
m
posed of
serv
eral
sites, ev
ery site h
a
s
zero
or mo
re
Com
puting Elem
e
n
ts (CEs) a
n
d
Storag
e
Elements (SEs). CE is u
s
e
d
to execute j
o
bs,
SE s
t
o
r
e file
s. R
e
s
o
ur
ce
Br
o
k
e
r
(R
B)
ge
ts
th
e us
er’
s
jo
b an
d di
spatch
es ea
ch
job to
a p
r
o
per
site b
a
sed o
n
the sch
eduli
ng alg
o
rithm.
Each
site
ha
s a
Re
plica
Manag
er
(RM), it control
s
data
transfe
rring a
nd provide
s
a mechani
sm for a
c
ce
ssing the repli
c
a catal
og. Replica Optimi
zer
(RO) withi
n
RM implement
s the re
plicati
on algo
rithm.
4.2. Configur
ation
The top
o
logy
of ou
r si
mul
a
ted platfo
rm
is
sho
w
n
in
Figure 1
and
this top
o
logy
is from
the simul
a
te
d archite
c
ture of 3LHA.
There are
th
ree regio
n
s: regio
n1 con
s
ists
of
LA
N1
and
LAN2; re
gion
2 con
s
i
s
ts of
LAN3, LAN4 and LAN5; re
gion3
con
s
i
s
ts of LAN6. L
A
N1, LAN3 a
n
d
LAN6
have t
h
ree
site
s; ot
her
LANs all
have thr
ee
si
tes. The
site
s all have
CE
with a
s
soci
ated
SE. Site6 hold all maste
r
files at the be
ginnin
g
of the simulatio
n
. Intra LAN ba
ndwi
d
th is 10
00
Mbps,
Inter L
A
N b
and
widt
h is 10
0Mb
p
s
, an
d Inte
r
region
ba
nd
wi
dth is 10
Mbp
s
. Th
e top
o
lo
g
y
para
m
eters a
r
e sh
own in T
able 1.
Table 1. Top
o
logy Param
e
ters T
able
Top
olo
g
y
par
a
meters
v
a
l
u
e
No. Of
region
3
No. Of LA
N
6
No. Of site
15
Size of SE
30 G
B
Size of main SE
300 GB
Intra LAN b
and
w
i
dth
1000Mbps
Inter LAN b
and
w
i
dth
100Mbps
Inter re
gion band
w
i
dth
10Mbps
Table 2. G
r
id
Job Paramet
e
rs T
able
Grid J
ob Para
m
e
ters
Value
No. of job t
y
pes
6
No. Of file access per job
15
Size of single file
1GB
Total size of files
100GB
Job dela
y
2500ms
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 23
02-4
046
TELKOM
NI
KA
Vol. 12, No. 4, April 2014: 2860 – 2
8
67
2866
There are
6 job types, ea
ch job type re
quire
s 15
file
s, the size of
each file is 1
G
B. We
assume
all th
e files a
r
e
re
ad-o
n
ly in the
experim
ent
al
simul
a
tion of
this pa
pe
r. T
a
ble 2
sp
ecifi
e
s
the Grid job p
a
ram
e
ters used in our
stud
y.
4.3. Simulation Res
u
lts a
nd Discu
ssi
on
We compa
r
e
the perfo
rma
n
ce of L
W
L
C
algorith
m
wit
h
three othe
r
algorith
m
s:
(1)
LRU (L
east
Re
cently use
d
) alg
o
rithm:
If
the space of SE is not enou
gh for th
e new
repli
c
a, then
delete the old
e
st file in the SE;
(2)
LFU (Lea
st F
r
equ
ently use
d
) alg
o
rithm:
If the space o
f
SE is not enough fo
r the
new
repli
c
a, then
delete the lea
s
t acce
ssed file in the SE;
(3)
DHRA (Dyna
m
ic Hi
era
r
chi
c
al Repli
c
ati
o
n Algorith
m
)
algo
rithm:
The data
gri
d
is
divided into t
h
ree l
a
yers.
The time of
waiting i
n
th
e req
u
e
s
t qu
eue i
s
an im
portant fa
ctor for
repli
c
a sele
ction.
In LRU and
LFU
repli
c
ati
on always ta
ke pla
c
e
at the site
whe
r
e the job i
s
executin
g.
The file
s del
e
t
ed by previo
us al
go
rithm
may be
re
q
u
i
r
ed i
n
future
and a
r
e
prob
ably availabl
e
in
other regions, but inter region ba
ndwidth is lowest, so the file
s transferri
ng time will increase,
then the job runtime incre
a
s
e too.
Figure 2
sho
w
s me
an j
o
b
execution
time b
a
sed o
n
chan
ging
numbe
r of
jo
bs
fo
r 4
algorith
m
s
.
With a
n
in
creasi
ng
numb
e
r of
job
s
, t
hese fou
r
d
y
namic
Data
Gri
d
repli
c
a
t
ion
algorith
m
si
mulation
re
sults sho
w
th
at the job
e
x
ecuted tim
e
con
s
tantly i
n
crea
se to
o. The
averag
e job execute
d
time is basi
c
sa
me whe
n
in small num
be
r of jobs; whe
n
the numbe
r o
f
jobs is larger,
LWLC alg
o
ri
thm and
DHRA alg
o
rith
m
are
abl
e to
pro
c
e
s
s the j
obs in th
e lo
we
st
mean tim
e
in
comp
ari
s
o
n
with oth
e
r
me
thods. But
th
e pe
rform
a
n
c
e of L
W
L
C
al
gorithm
is bet
te
r
than the
DHRA algo
rithm,
And mo
re o
b
vious
advan
tages
with th
e increa
se
d n
u
mbe
r
of job
s
. In
1500
job
s
th
e mea
n
jo
b t
i
me of
LWLC algo
rith
m i
s
faster by a
b
o
u
t 25.53%
co
mpared to
th
e
DHRA al
gorit
hm; in 2
100 j
obs the m
e
a
n
job tim
e
of LWL
C
algo
rithm
is
faste
r
by
about 31.
3
6%
comp
ared to
the
DHRA
algorith
m
. It
can
be
seen
in the
ca
se
of la
rge
nu
mber of jo
bs, the
averag
e job
executio
n time of LWLC a
l
gorithm
i
s
sh
orter th
an L
R
U alg
o
rithm,
LFU al
go
rithm,
DH
RA algo
rit
h
m.
Figure 2. Mean Jo
b Time
Based o
n
Varing Num
b
e
r
o
f
Jobs
5.
Conclu
sion and Futu
r
e
Work
In this p
ape
r, we p
r
op
ose novel dyn
a
mic
Data
G
r
id repli
c
atio
n algo
rithm
based o
n
weig
ht an
d
cost of
repli
c
a,
call
ed
LWLC algo
rithm,
it imp
r
o
v
ed
fr
om DH
R
A
a
l
go
r
i
th
m. In
r
e
plic
a
sele
ction, it take
s into a
c
count
the num
ber of re
que
sts that are
waiting in the storage q
ueu
e
of
sou
r
ce SE, the network ba
ndwi
d
th bet
ween
sou
r
ce S
E
and de
stina
t
ion SE also i
s
con
s
ide
r
ed.
In
repli
c
a pla
c
e
m
ent, the rep
lica is sto
r
e
d
at an appropria
te site, i.e. the repli
c
a in this site ha
s the
highe
st numb
e
r of acce
ss
requ
este
d, if
more
than o
n
e
SE have th
e same n
u
m
b
er of req
u
e
s
ts,
LWL
C
al
gorit
hm sel
e
ct
s S
E
which ha
s t
he maximum
numbe
r of dif
f
erent job
s
th
at requ
este
d
the
particula
r rep
lica. In
re
plica repla
c
em
e
n
t, acce
ss
hi
story i
n
different time i
n
terval and
the ti
me
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
2302-4
046
Dynam
ic Dat
a
Grid
Repli
c
ation Algorith
m
Based on Weig
ht and Cost of… (Zh
o
ngpin
g
Zhan
g
)
2867
co
st of tran
sf
er repli
c
a fro
m
so
urce SE
to des
tin
a
tio
n
SE are i
n
trodu
ced. T
he
simulatio
n
re
sult
sho
w
s that LWL
C
algo
rith
m can effe
ctively
improve the averag
e jobs exe
c
utio
n time.
In our futu
re
wo
rk,
we pl
an to stu
d
y the othe
r a
s
pect
s
which
affect the da
ta grid
perfo
rman
ce:
repli
c
a
s
differen
c
e
s
, net
work
con
g
e
s
tion.
Thus more effectively
improve the
perfo
rman
ce
of the Data G
r
id.
Ackn
o
w
l
e
dg
ements
This
wo
rk
wa
s finan
cially
sup
porte
d by
Nation
al Natural S
c
ien
c
e
Found
ation o
f
Chin
a
(612
721
24),
Hebei P
r
ovincial
Nat
u
ral Sci
e
n
c
e Foun
datio
n of Chin
a
(F201
220
3
087
,
S20132
030
0
2
) and
Nation
al Natural Sci
ence Foun
dat
ion of Chin
a (6107
3060
).
Referen
ces
[1]
Lili D
i
n
g
, Xi
ao
l
i
ng W
a
n
g
, W
angli
n
Ka
ng. Multi-A
ttribute A
u
ction
i
ng
Res
ource i
n
Grids
:
Model a
n
d
Protocols.
T
E
L
K
OMNIKA Indones
ian J
ourn
a
l of Electrica
l
Engi
neer
in
g.
2013; 11(
10): 56
09-5
616.
[2]
K Rang
an
atha
n, I F
o
ster. Id
entif
yin
g
D
y
n
a
m
ic
Rep
licati
o
n Strategi
es fo
r a Hig
h perfo
rmance D
a
ta
Grid. Grid Computing-GRID 2001
. Co
mputer
Science
. 20
01
; 2242: 75-
86.
[3] Nukar
apu
DT
,
Bin T
ang, Liq
i
an
g W
ang, S
h
i
y
o
ng L
u
. Da
ta R
epl
icati
on
in Data Inte
nsi
v
e Scientifi
c
Appl
icatio
ns w
i
th
Pe
rformanc
e Guara
n
tee.
I
EEE Transactions on Par
a
ll
el and Distribut
ed System
s
.
201
1; 22(8): 12
99-1
306.
[4]
T
an Cuip
ing, Z
hen
g H
uai
gu
o, Z
han
g Ju
nfen
g, Su
n
Sufe
n,
Li Gu
ang
da. A
g
ricult
ural
Kno
w
l
e
d
g
e
Gr
i
d
Constructi
on.
T
E
LKOMNIKA Indo
nesi
an Jo
u
r
nal of Electric
al Eng
i
ne
eri
ng.
2013; 1
1
(9): 5
224-
522
8.
[5]
A Horri, R Se
pahv
an
d, Gh Dastgh
aib
y
f
a
rd
. A Hierarch
i
c
a
l Sch
edu
lin
g
and R
e
p
licati
on Strateg
y
.
IJCSNS Intern
ation
a
l Jo
urna
l of Computer S
c
ienc
e an
d Net
w
ork Security
. 200
8; 8(8): 30-
35.
[6]
Najme M
anso
u
r
i, Gholam Hos
e
in D
a
stgh
aib
y
fard.
Job sche
duli
ng a
nd d
y
n
a
mic data re
pli
c
ation i
n
da
t
a
grid e
n
viro
nme
n
t.
T
he Journa
l
of Superco
mp
uting
. 20
13; 64
(1): 204-2
25.
[7]
Sang-M
i
n Par
k
, Jai-Ho
on K
i
m, Youn
g-Ba
e Ko,
W
on-Si
k Yoon. D
y
n
a
m
ic Data Gri
d
Rep
licati
o
n
Stra
te
gy
b
a
s
ed
on
In
te
rn
e
t
H
i
e
ra
rchy
.
Gri
d
a
n
d
Co
op
er
ative
Comp
uti
ng.
C
o
mp
uter Scienc
e
. 2
004;
303
3(2): 83
6-8
46.
[8]
Rua
y
-s
hiu
ng C
han
g, Jih-Sh
en
g Cha
ng, Shin-
Y
i Lin.
Job sch
edu
lin
g an
d da
ta replic
ation o
n
data gri
d
s
.
F
u
ture Generat
ion C
o
mputer
Systems
. 20
07
; 23(7): 846-8
6
0
.
[9]
Najme M
anso
u
ri, Gholam H
o
sei
n
Dastg
hai
b
y
f
a
rd. A d
y
na
mic replic
a ma
nag
ement stra
teg
y
in d
a
t
a
grid.
Jour
nal of
Netw
ork and Co
mp
uter Appl
icatio
n
. 201
2; 35(4): 129
7-1
3
0
3
.
[10]
Najme
Ma
nso
u
ri. An
Effective
w
e
ig
ht
Dat
a
Repl
icati
on Str
a
teg
y
for
Dat
a
Grid.
Austra
li
an J
ourn
a
l
of
Basic an
d App
l
ied Sci
ences
. 2
012; 6(1
0
): 336
-346.
[11]
Rua
y
-S
hiu
ng
Cha
ng, Hui-P
i
ng Ch
ang, Yu
n-T
i
ng W
ang. A d
y
n
a
mic d
a
ta replic
atio
n strateg
y
usin
g
access-
w
e
ig
hts in data gri
d
s.
Co
mp
uter Systems a
nd Ap
pli
c
ations.
20
08; 45(3):
27
7-2
9
5
.
[12]
T
he Eruopean Data Grid Project, h
ttp://eu-datagrid.
w
eb.c
e
rn.ch/eu-datagr
id/
[13]
Simulati
ng
dat
a access o
p
ti
mizatio
n
alg
o
ri
thms-Opt
orSim [EB/OL]. http://edg-
w
p
2.
w
e
b.cern.ch/ed
g-
w
p
2/o
p
timizati
on/optors
i
m.html. 2004.
Evaluation Warning : The document was created with Spire.PDF for Python.