TELKOM
NIKA
, Vol. 11, No. 8, August 2013, pp. 45
8
7
~4
593
e-ISSN: 2087
-278X
4587
Re
cei
v
ed
Jan
uary 15, 201
3
;
Revi
sed
Ma
y 17, 201
3; Accepted Ma
y
27, 2013
A Privacy Data-oriented Hierarchical MapReduce
Progra
mming Model
Hai
w
e
n
Han
*
1,2
, Weiping
Zheng
3
1
Colle
ge of ec
o
nomic a
nd ma
nag
ement, Sou
t
h Chin
a Norm
al Univ
ersit
y
, C
h
in
a
2
Researc
h
Institute of Comput
er S
y
stems, So
uth Chi
na U
n
iv
ersit
y
of T
e
chn
o
lo
g
y
, C
h
in
a
3
Colle
ge of co
mputer scie
n
ce
and Eng
i
n
eer, South Ch
in
a N
o
rmal Un
iversit
y
, Chi
n
a
*Corres
p
o
ndi
n
g
author, e-ma
i
l
:
hanh
w
@
scn
u.edu.cn, i
a
mzhen
g
w
e
i
pi
ng
@
126.com
A
b
st
r
a
ct
T
o
rea
l
i
z
e
pr
i
v
acy d
a
ta pr
o
t
ection
efficien
tly in
hybri
d
c
l
ou
d serv
ice,
a hi
erarc
h
ica
l
contro
l
architectur
e
b
a
sed
multi-cl
u
s
ter MapR
ed
u
c
e pro
g
ra
mmi
ng
mo
de
l (the
Hier
a
rchic
a
l
MapR
educ
e M
ode
l,
HMR)
is pres
ented. Und
e
r this
hi
er
arc
h
ic
al co
ntrol
arch
itecture, d
a
ta i
s
olati
on
an
d p
l
ace
m
ent a
m
o
n
g
private c
l
o
ud
a
nd
pub
lic c
l
ou
d
s
accord
in
g to
the d
a
ta
pr
ivac
y charact
e
ristic
is i
m
ple
m
ente
d
by th
e c
ontr
o
l
center i
n
priv
ate clo
ud. An
d then,
to p
e
rfor
m the c
o
rres
p
o
ndi
ng d
i
strib
u
ted p
a
ral
l
el c
o
mp
utatio
n corr
ectl
y
und
er
the multi
-
clusters mo
de
that
is d
i
ffere
nt to the co
nv
entio
nal
si
n
g
le
-cluster mo
de, the
Map-
Re
du
ce-
Globa
lRe
duc
e
three stag
e sc
hed
uli
ng
proce
ss is des
ign
ed.
Li
miti
ng th
e c
o
mputati
on
ab
out priv
acy d
a
ta i
n
private cl
oud
w
h
ile outso
urc
i
ng the co
mpu
t
ation ab
out n
on-priv
acy dat
a to publ
ic clo
uds as much
as
possi
ble, HMR
reach
e
s the pe
rforma
nc
e of both security and low cost.
Ke
y
w
ords
: hie
r
archica
l
arch
itecture, map re
duce
pr
ogr
ammi
ng, hybr
id cl
oud, priv
acy da
ta
Copy
right
©
2013 Un
ive
r
sita
s Ah
mad
Dah
l
an
. All rig
h
t
s r
ese
rved
.
1. Introduc
tion
Even though
the cost-effective sh
are
d
comp
ut
ing
resou
r
ces in
publi
c
clou
d of cloud
comp
uting te
chn
o
logy p
r
o
v
ides p
r
etty good
bus
i
n
e
ss
sol
u
tion to meet the
gro
w
ing
so
ci
al
demand for
massive data processing,
i
t
still leads inevitabl
y to security ri
sk [1-3]. Consequently
the
private cl
oud whi
c
h chara
c
te
rizi
ng
in
spen
ding
expen
sive
self-build
con
s
truction
cost
but
owni
ng the t
o
tal co
ntrol
b
e
com
e
the
o
n
ly choi
ce
of
many ap
plica
t
ion with e
m
p
hasi
s
of
priva
cy
data p
r
ote
c
tion. Grew
hot
in re
ce
nt ye
ars,
hybrid
cl
oud te
chn
o
lo
gy [4] provid
es p
o
ssibility
to
distrib
u
te co
mputation a
m
ong the pu
blic an
d pr
iva
t
e cloud
s to achi
eve both
absol
utely limiting
the priva
c
y data co
mput
ation in priv
ate clou
d an
d fully outso
urci
ng the n
on-p
r
iva
c
y data
comp
utation to public
clou
d. Currently, althoug
h
the publi
c
clou
d computation
re
sou
r
ce co
uld
be
use
d
to su
pp
ort the privat
e clou
d comp
utation
scale
out on-dema
nd [4], the effective tech
nol
ogy
for priva
c
y d
a
ta orie
nted
comp
utation
splitting
a
nd
distrib
u
ted ex
ecutin
g ha
s
still not been
well
develop
ed. Rece
ntly, Kehuan Zha
ng [8] desi
gne
d
a Sedic m
odel b
a
se
d on M
a
p
R
ed
uce [5], th
e
most
wid
e
ly
use
d
clou
d
computing
p
r
o
g
rammi
ng
m
odel, to i
s
ol
a
t
e the p
r
iva
cy data in
p
r
ivate
clou
d by the
data
disting
u
ish
me
cha
n
i
s
m o
n
di
ffe
rent data
cha
r
acte
ri
stic
(p
rivacy an
d no
n-
privacy
)
an
d the task
sche
duling m
e
cha
n
ism o
n
different com
putati
on re
so
urce
(publi
c
clo
ud a
nd
private
clou
d
)
. But the
de
tail modification di
re
ct
ly o
n
the
origi
nal
ex
ecution framework
wo
uld
redu
ce the
executio
n efficien
cy
of the conventio
n
a
l comp
utati
on tasks, while the dire
ctly
sched
uler
of publi
c
clo
ud
resou
r
ce with
out usin
g co
ntrol no
de would b
r
ing th
e risk of cha
o
tic
manag
eme
n
t. The
deepl
y rea
s
on
is that Sedi
c ke
ep the
origin
al Ma
p
R
ed
uce working
architectu
re
o
f
single
cl
uste
r, whi
c
h
is
nei
ther
co
m
patib
le to the
hybri
d
clo
ud
naturally com
p
o
s
e
d
of multiple clusters no
r g
ood at data
and co
rre
s
pondi
ng co
m
putation isol
ation. About the
improvem
ent
of Map
R
ed
uce's archit
e
c
tu
re,
Jin
Jin
g
[6
] develop
ed
a
co
ntrol
archit
ecture
of mult
i-
control n
ode
s inste
ad of th
e ori
g
inal
sin
g
le no
de,
wh
ile Yuan
Luo
[7] pro
p
o
s
ed
a hie
r
a
r
chical
MapRedu
ce
architectu
re t
o
implement
the flexib
le sche
duling f
o
r
mult
i clust
e
r
s
re
sou
r
ce.
A
l
l
these
re
sea
r
che
d
are be
nefit to the developme
n
t of MapRedu
ce's a
r
chite
c
tu
re for d
e
si
gn
ing
disting
u
ish mech
ani
sm an
d achi
evi
ng the priva
c
y da
ta prote
c
tion.
Summari
zin
g
the previo
us resea
r
ch an
d focu
si
n
g
o
n
the hybri
d
clou
d's
multi-clu
s
ters
cha
r
a
c
teri
stic, a hiera
r
chi
c
al cont
rol architec
ture Map
R
ed
uce pro
g
ramming m
o
d
e
l (Hi
e
ra
rchical
MapRedu
ce
Model, HMR) is presented
in this
p
ape
r. The m
a
jor developm
en
t base
d
on t
h
e
Evaluation Warning : The document was created with Spire.PDF for Python.
e-ISSN: 2
087-278X
TELKOM
NIKA
Vol. 11, No
. 8, August 2013: 4587 –
4593
4588
origin
al Map
R
ed
uce pro
g
rammi
ng
model i
s
co
mpo
s
ed
of the central
i
zed hi
erarchical
architectu
re t
o
manag
e m
u
lti cluste
rs' reso
urce
, the
data pro
c
e
s
si
ng mechani
sm to achieve
th
e
privacy
data
ori
ented
au
tomatic i
s
ol
a
t
ion of d
a
ta
and
comp
u
t
ation, and
t
he M
a
p
R
edu
ce-
Global
Re
duce three
stag
es sch
eduli
n
g pro
c
e
s
s to corre
c
tly carry out the p
a
rallel di
stri
b
u
ted
comp
utation t
a
sks in m
u
lti-clu
s
ter e
n
viro
nmen
t. The d
e
tail impleme
n
tation is b
a
sed on
Had
o
o
p
MapRedu
ce system
[9].
2. The Hierar
chical Arc
h
itectur
e
2.1. Architec
ture
Pr
iva
t
e C
l
o
u
d
C
l
us
t
e
r
R
e
s
o
ur
c
e
N
o
de
…
R
e
s
our
c
e
N
o
de
R
e
s
o
ur
c
e
N
o
de
…
…
Ta
s
k
T
r
a
c
k
e
r
Da
t
a
N
o
d
e
Da
t
a
N
o
d
e
Da
t
a
N
o
d
e
Ta
s
k
T
r
a
c
k
e
r
T
a
s
k
T
r
ack
e
r
C
e
nt
r
a
l
C
ont
r
o
l
l
e
r
Jo
b S
c
he
d
u
l
e
r
Da
t
a
M
a
na
ge
r
W
o
r
k
lo
ad
Mo
n
ito
r
S
u
b-C
o
nt
r
o
l
l
e
r
Na
m
e
N
o
d
e
J
o
b
T
r
a
ck
er
W
o
rk
l
o
a
d
R
e
p
o
rt
e
r
P
u
bli
c
Cl
ou
d
Cl
u
s
te
r
P
u
bli
c
C
l
oud
Cl
u
s
te
r
W
o
rk
l
o
a
d
R
e
p
o
rt
e
r
Su
b-
C
o
nt
r
o
l
l
e
r
W
o
rk
l
o
a
d
R
e
po
rt
e
r
Na
m
e
N
o
d
e
Jo
b
T
r
a
ck
er
S
u
b
-
Co
n
t
ro
lle
r
Na
m
e
N
o
d
e
J
obT
r
a
c
k
e
r
W
o
rk
l
o
a
d
R
e
p
o
rt
e
r
Figure 1. The
HMR Archite
c
ture
As sh
owed in
Figure 1, HMR'
s hie
r
a
r
chical a
r
ch
itecture is
com
p
osed
of thre
e
lay
e
rs
of
central cont
ro
ller, sub
-
controller and
reso
urce
node.
Rangin
g
a
bout one private
cl
oud and
multi
publi
c
clo
u
d
s
, the only one cent
ral con
t
roller of HM
R's hierarchi
c
al
archit
e
c
tu
re is lo
cate
d
in
private cl
oud
to lead the
whole ex
e
c
utio
n, while
every cluste
r'
s
onl
y one sub-co
ntrolle
r is l
e
a
ded
by central
controlle
r to manag
e every cluster'
s re
sou
r
ce acco
rdingly. In HMR syste
m
, the
conve
n
tional
singl
e
control
nod
e in
si
ngl
e cl
uste
r i
s
e
x
tended to
two-laye
r
contro
l archite
c
ture
to
unified
mana
ge the
resou
r
ce
fro
m
diff
erent
source
(p
rivate a
n
d
publi
c
clo
u
d
s
) thro
ugh
th
e
central co
ntroller an
d effectively implement t
he isolation of dat
a and comp
utation und
er the
multi-cl
uste
rs mode.
2.2. The Co
mponents
As mention
e
d
above, in HMR
system
, there
are th
ree types of
node
s. The first one,
central co
ntrol node, is a
brand n
e
w t
y
pe to the original Map
R
e
duce model.
Being locate
d in
private
cloud
, central con
t
rol nod
e i
s
comp
os
ed of
job sch
edul
er
which, dat
a mana
ge
r a
nd
worklo
ad m
o
nitor. The j
o
b sche
dule
r
manag
e t
he
whol
e process in
cludi
ng u
s
er's
su
bmi
s
sion
acceptio
n, co
mputation ta
sks
assig
n
to sub-cont
rol no
des, a
nd out
come data
ret
u
rn to u
s
e
r
. T
he
data man
age
r wo
uld be in
voked by the
job sche
dul
e
r
to implement
the data isol
ation acco
rdi
ng
to priva
c
y
ch
ara
c
teri
stic.
T
he
worklo
ad
monitor is m
a
jor i
n
m
onit
o
ring
an
d
re
coding
the
pu
blic
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
e-ISSN:
2087
-278X
A Priva
cy
Dat
a
-o
riente
d
Hi
era
r
chical Ma
pRe
d
u
c
e Pro
g
ram
m
i
ng Model (Haiwen
Han
)
4589
clou
d's
cu
rre
n
t worklo
ad
whi
c
h would
be u
s
ed by
j
ob sche
dule
r
to do the sub-cont
rol no
de
sele
ct
ion.
The se
con
d
one, sub
-
cont
rol nod
e, is compo
s
ed of n
a
meno
de, job
t
racke
r
and workl
oad
repo
rter. A
s
sub
-
control n
ode i
s
almo
st
the sa
m
e
a
s
the co
ntrol n
ode in
Had
o
o
p
, the wo
rkl
o
ad
repo
rter i
s
th
e ne
w
co
mpo
nent
whi
c
h
re
porting
the
cl
uster's curren
t wo
rkl
oad
to
ce
ntral
control
node in fixed time. As working in conve
n
tional way a
s
origi
nal Ma
pRe
d
u
c
e, the nameno
de a
nd
the job tra
cker al
so woul
d wo
rk in
a
new d
a
ta p
r
ivacy ori
ent
ed way if n
eede
d. Besid
e
s,
namen
ode m
u
st mana
ge the data tran
sf
er process be
tween
clu
s
ters and a
c
cept
job from central
control nod
e instea
d of fro
m
use
r
in orig
inal Ha
doop
cluster.
The thi
r
d o
n
e
,
resou
r
ce no
de, is
co
mpo
s
ed
of data
n
ode a
nd ta
sktracke
r. As worki
ng i
n
conve
n
tional
way as o
r
igin
al MapRedu
ce, the datano
de and ta
skt
racker
also
would work in
a
new d
a
ta priv
acy orie
nted
way if neede
d.
3. Programming Model
To a
c
hieve
th
e priva
c
y d
a
ta ori
ented
iso
l
at
ion o
n
d
a
ta
and
computa
t
ion, HM
R d
e
v
elops
the ori
g
inal
MapRedu
ce
in these two
ways:
1) t
he p
r
ivacy
data o
r
iente
d
data p
r
o
c
ess
mech
ani
sm to isolate th
e data in two cate
go
rie
s
of privacy
and non-pri
v
acy unde
r the
hiera
r
chi
c
al a
r
chite
c
tu
re; 2) the Map-Red
u
ce
-Glo
ble
R
e
duce Thre
e Stage Sche
duli
ng pro
c
e
s
s to
carry o
u
t th
e priva
c
y d
a
t
a orie
nted
parall
e
l di
stri
buted
com
p
u
t
ation un
der the hi
era
r
chical
architectu
re.
3.1. Priv
ac
y
Data Orien
t
e
d
Data Proce
ss Mech
anis
m
The priva
c
y d
a
ta-o
riente
d
data pro
c
e
s
s mech
ani
sm for HM
R sy
ste
m
is about u
s
er data
submi
s
sion a
nd data pla
c
e
m
ent.
3.1.1. User Data Submiss
i
on
Without lo
ss of g
ene
ralit
y, it is a
s
su
med
th
at the
user data
submitted i
n
f
ile. And
besi
d
e
s
sub
m
itting appli
c
ation job
by Job
Clie
nt
in
Had
oop’
s u
s
er clie
nt, users should
al
so
submit
s thei
r data in th
re
e types: 1
)
Public,
wh
i
c
h
mean
s that
this file i
s
co
mposed
with
out
privacy d
a
ta at all and cou
l
d be pa
ssed
to public
cl
o
u
d
entirely. 2)
Sensit
ive, wh
ich me
an
s that
this file is co
mposed of p
r
ivacy data an
d must
be
pa
ssed to privat
e clou
d entire
l
y. 3) Semipu
b
,
whi
c
h m
ean
s that the
file
is mixture of
privac
y
data
and
publi
c
d
a
ta, sh
ould
b
e
firstly split
off
according to t
he sp
ecifi
c
privacy data strings p
r
ov
ide
d
by
user
s,
su
ch a
s
so
me s
pecif
i
c
I
D
ca
r
d
numbe
r, cu
st
omer n
a
me,
so
cial securit
y
number
a
n
d
credit ca
rd
numbe
r, an
d so on. Th
e
s
e
spe
c
ific p
r
iva
c
y data stri
ng
s wo
uld be
su
bmitted together.
3.1.2. Data P
r
oces
s and
Placement
The jo
b
sche
duler re
ceive
s
t
he
user su
bmission
in
cl
uding
co
mput
ing job
an
d u
s
er data
firstly, and
th
en p
a
sse
s
it
to clo
ud
or cl
oud
s a
c
cordi
ng to th
e th
ree d
a
ta
sub
m
issi
on type
s a
s
follow.
1)
Sensitive
Evaluation Warning : The document was created with Spire.PDF for Python.
e-ISSN: 2
087-278X
TELKOM
NIKA
Vol. 11, No
. 8, August 2013: 4587 –
4593
4590
If the submi
s
sion
type i
s
sen
s
itive, the
us
er submi
s
sion
in
cludi
n
g
comp
uting
job a
nd
use
r
data
wo
uld be
entirel
y passed to
p
r
ivate clo
ud cl
uster’
s sub
-
control
n
ode b
y
Job
S
c
hed
u
l
er.
Name
No
de i
n
this
nod
e receive
s
the
submi
s
sion
a
nd do
the
da
ta pla
c
eme
n
t with the
sa
me
approa
ch a
s
origin
al MR
system. Specif
ically, as
i
s
shown in Fig
u
re 2, Nam
e
No
de create
s
m
e
ta
data as INo
d
e
stru
cture for this file which reco
rd
s
all the inform
ation about this file, such
as
filename, the
latest modified time, and
so on.
After saving this
INode in
sub
-
co
ntrol n
ode
,
Name
No
de finds o
u
t a fre
e
storage resource no
de i
n
clu
s
ter to save this file by local Data
Node
in fixed size
(64M) d
a
ta bl
ock. Datan
o
d
e
woul
d find
out next free
stora
ge resou
r
ce
nod
e to save
the next d
a
ta
block if
nee
d
ed. In a
dditio
n
, HDFS
di
stributed file
sy
stem
wo
uld
save three
rep
lica
for each dat
a block in three no
de
s re
spe
c
tive
ly (a
ccordi
ng to the default proce
s
s in had
oop
sy
st
em
).
2)
Public
If the sub
m
ission type
is p
ublic, the
u
s
e
r
submi
ssi
on i
n
clu
d
ing
co
m
puting jo
b a
n
d
u
s
er
data would
b
e
entirely pa
ssed to
su
b-control
node
(s
) in (a)
publi
c
clou
d cl
uste
r(s) a
c
cording
to
some
sc
hed
u
ling st
rat
e
gie
s
,
su
ch a
s
t
he clu
s
ter’
s current wo
rkl
o
ad or the ba
ndwith b
e
twe
en
clu
s
ters.
Na
meNo
de
in th
is n
ode.
Na
m
e
No
de i
n
thi
s
nod
e
re
ceiv
es th
e
sub
m
i
ssi
on
and
do
the
data placem
ent with the same a
pproa
ch as b
e
fore.
The only differen
c
e is th
at Name
Nod
e
in
publi
c
clo
ud
clu
s
ter n
eed
to transfe
r d
a
ta form
p
r
ivate clou
d, which m
ean
s the data tran
sfer
betwe
en cl
ou
ds, as i
s
sh
o
w
n in Figu
re
3.
3)
Semipub
If the submi
s
sion type i
s
semip
ubli
c
, Data Ma
nag
er would
be
execute
d
to
get and
pro
c
e
ss d
a
ta in these
step
s.
Firstly, a p
r
ivacy lab
e
l tab
l
e is fou
nde
d
by Data Ma
nage
r. Sca
n
n
ing the
wh
o
l
e file
according to
the spe
c
ific p
r
ivacy data
string
s
provide
d
by use
r
s,
Data Ma
nage
r create
s
a n
e
w
privacy tu
ple
(filenam
e, o
ffset, length) for eve
r
y matche
d st
ring
. As the lo
cation info
rma
t
ion
about all
priv
acy data
st
ri
ngs
bein
g
re
corded
in thi
s
table,
a
sa
nitized
re
plica of this file
is
cre
a
ted by ze
roing o
u
t all the priva
c
y da
ta string
s in o
r
iginal file.
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
e-ISSN:
2087
-278X
A Priva
cy
Dat
a
-o
riente
d
Hi
era
r
chical Ma
pRe
d
u
c
e Pro
g
ram
m
i
ng Model (Haiwen
Han
)
4591
Secon
d
ly, Da
ta Manag
er p
a
sse
s
the su
bmissi
on i
n
cl
uding
com
put
ation job, o
r
ig
inal file
and the
priva
c
y label ta
ble
to private
clo
ud cl
uste
r’s sub-cont
rol n
o
de, and
Na
m
e
No
de in
whi
c
h
woul
d do th
e
data pla
c
em
e
n
t in private
cloud
clu
s
ter.
As is
sh
own i
n
Figu
re 3, th
e pro
c
e
d
u
r
e
o
f
data pla
c
em
e
n
t is the sa
m
e
as p
r
eviou
s
except:
1) In
stead of from
Job
Client, Data
Nod
e
ge
ts
data
from Dat
a
Ma
nag
er. 2
)
Name
No
de save
s
n
o
t
o
n
l
y
the INode
structu
r
e
but
a
l
so th
e
priva
cy
label table in
sub
-
control n
ode.
Finally
,
Data
Mana
ge
r p
a
s
ses the
su
b
m
issi
on i
n
cl
u
d
ing
com
put
ation jo
b, sa
nitized
repli
c
a a
nd th
e priva
c
y lab
e
l table to p
u
b
lic
clou
d cl
u
s
ter’
s
sub
-
co
ntrol n
ode, a
nd Name
Nod
e
in
whi
c
h would
do the data
placem
ent in publi
c
clou
d clu
s
ter. As is sho
w
n in
Figure 4, th
e
pro
c
ed
ure of
data pla
c
e
m
ent is th
e sa
me a
s
prev
io
us ex
cept: 1
)
Instead
of from Job
Clie
n
t,
Data
Nod
e
ge
ts data from
Data Ma
nage
r. 2) Na
me
No
de save
s not
only the INod
e stru
cture b
u
t
also th
e p
r
iva
c
y label ta
ble
in su
b-co
ntrol nod
e.
3)
DataNo
de
sav
e
s in
stea
d of
the ori
g
inal f
ile
,
but the saniti
zed repli
c
a.
3.2. The Map
-
Re
duce
-
Glo
b
leRed
u
ce T
h
ree Sta
g
e Scheduling Pr
ocess
The convent
ional Ma
p-Redu
ce two stages
com
p
u
t
ation task sche
dule
r
p
r
o
c
e
ss i
n
origin
al Map
R
ed
uce must
be cha
nged
becau
se of t
he cha
r
a
c
teri
st
ics of multi-cl
uster a
nd p
r
ivacy
data in HMR
system. Th
e
cha
r
a
c
teri
stic of multi-
clu
s
t
e
r me
an
s tha
t
the outcom
e
data from m
u
lti
map fuctio
ns would
be di
stribute
d
am
ong multi cl
u
s
ters in
stead
of single
cl
uster i
n
ori
g
i
nal
MapRedu
ce.
The
cha
r
a
c
te
ristic of p
r
iva
c
y data
m
e
a
n
s that th
e computation
tasks
on
priv
acy
data mu
st be
execute
d
in
private cl
oud.
The di
re
ct
ly solutio
n
, whi
c
h is to tra
n
sf
er all the i
n
te
r
media out
co
me data ba
ck to private clo
ud clu
s
te
r,
own
s
the defi
c
ienci
e
s of g
r
e
a
t data tran
sfer
between
clusters and gr
eat computation in private cloud.
To utilize the
cloud re
source
effectively, a
Map
-
Redu
ce
-Glob
a
lRedu
ce three
stag
e
co
mputation
tasks
sche
dul
ing p
r
o
c
e
s
s i
s
pre
s
ente
d
in this pap
er. While the Map
stage of sch
e
duling p
r
o
c
e
s
s is the sa
m
e
as in ori
g
in
a
l
MapRedu
ce,
The Redu
ce
stage i
s
bein
g
cha
nge
d
to
the Red
u
ce-Global
Re
duce two sta
g
e
s
, in
whi
c
h the Redu
ce sta
g
e
is to imple
m
ent the
local redu
ce i
n
each
clu
s
ter and tran
sfer the
outcom
e
dat
a back to p
r
i
v
ate cloud
while the Glo
b
a
lRe
d
u
c
e is t
o
being
perf
o
rme
d
in priv
ate
clou
d to fini
shed th
e
red
u
c
e
ope
ration
on tho
s
e
dat
a. The
spe
c
ifically
work is
sho
w
e
d
in
Fi
gure
5.
Figure 5. The
Map-Red
u
ce
-Glob
a
lRedu
ce Thre
e Stage Sched
uling
Process
3.2.1. Map stage
:
List
In Map stag
e, distribute
d
map functio
n
s
are perfo
rmed on di
stri
buted data b
l
ocks in
parall
e
l in
e
v
ery clu
s
te
r. After a j
o
b
is
su
bmitted into
HM
R
system, th
e
input d
a
ta
set is
partitione
d in
to blo
c
ks.
JobTracke
r
first cre
a
tes Task
InPr
ogr
e
ss
objec
t
accor
d
ing to map
function fo
r e
a
ch
block in
resou
r
ce nod
e and a
s
sem
b
les the
s
e
o
b
ject
s into a
JobInP
rog
r
e
s
s
queu
e.
Wh
en
ever
a “he
a
rt
beat” sign
al come
s,
i
ndi
cating that a
TaskT
r
a
c
ker
of a com
puti
n
g
m
m
i
i
v
k
v
k
,
,
Evaluation Warning : The document was created with Spire.PDF for Python.
e-ISSN: 2
087-278X
TELKOM
NIKA
Vol. 11, No
. 8, August 2013: 4587 –
4593
4592
resou
r
ce n
o
d
e
is re
ady to
run
a n
e
w task, JobT
ra
cker loo
k
s up
the
queu
e a
nd a
s
sign
s th
e mo
st
approp
riate task for the no
de,
the one whose data blo
ck i
s
sto
r
ed o
n
the node, fo
r example.
For
t
he data block co
mpo
s
ed
of uniform
data
t
hat t
o
tally publi
c
or
private, T
a
skTra
c
ke
r
read
s
a
re
cord from
the
blo
c
k through
a
Re
cordRead
er o
b
je
ct an
d
perfo
rms the
pro
c
e
s
s defin
ed
in map fun
c
ti
on. For th
e d
a
ta blo
ck mix
t
ure of
pu
blic data an
d pri
v
ate data, there a
r
e
origi
n
al
block i
n
p
r
iva
t
e clo
ud
and
saniti
zed
re
pl
ica i
n
p
ubli
c
clou
d b
u
t wit
h
p
r
ivacy l
a
b
e
l table
both.
So,
TaskT
r
a
c
ker
read
s th
e pri
v
acy data o
n
l
y
accordi
ng t
o
the p
r
ivacy
label tabl
e a
nd pe
rforms t
he
pro
c
e
ss
defi
ned in m
ap
function i
n
p
r
ivate clo
ud
clu
s
ter, while
read
s the
p
ublic
data o
n
ly
according to the priva
c
y label table in p
ublic
clou
d cl
usters.
Each m
ap ta
sk
co
nsume
s
a data
blo
c
k, no
m
a
tter
uniform or mixture,
as an
inp
u
t
key/value
pai
r n
a
med
and p
r
od
uces a
set
of int
e
rme
d
iate
ke
y/value pairs nam
ed
. The
list is stored a
s
the o
u
tput data of map functio
n
s
in local nod
e.
3.2.2.
Re
duc
e Stage
:
In Redu
ce st
age, red
u
ce functio
n
s a
r
e
perfo
rmed o
n
distribute
d
d
a
ta output fro
m
map
function
s in
every clu
s
te
r. Similar to the sc
h
edulin
g pro
c
e
s
s fo
r map fu
ncti
ons,
JobT
ra
cker
cre
a
tes T
a
skInProgress o
b
ject a
c
cordi
ng to red
u
ce
function a
n
d
place it into
JobInProg
r
e
ss
queu
e. Whe
n
a “hea
rtbeat
” signal
come
s and the p
r
e
v
ious map o
p
e
ration b
e
ing
compl
e
ted, this
obje
c
t wo
uld
be a
ssi
gne
d
to a Ta
skTra
c
ker
of a
co
mputing
re
so
urce n
ode. E
a
ch
re
du
ce t
a
sk
con
s
um
es th
e inte
rmedi
ate
list mig
h
t nee
ded
to
be d
r
ag
ged
f
r
om
othe
r re
sou
r
ce
node
s. Ta
kin
g
in sa
me
ke
y like
, a set of corre
s
p
o
n
d
ing value
s
li
ke
would
be cl
assified
and valu
es
of
woul
d be
merg
ed a
ccordin
g to the
spe
c
ific
re
d
u
ce
operation.
When
set
of
key/value
p
a
irs a
s
list
for
different
a
r
e
pro
d
u
c
ed,
Name
No
de i
n
su
b-cont
rol
node’
s
would
sen
d
a
subm
issi
on to
cent
ral control no
de’s
Jo
bTracke
r
for transfe
rri
n
g
the
s
e
outp
u
t data
ba
ck to p
r
ivate
cl
oud. If thi
s
submissio
n
co
mes from
pu
blic
clou
d cl
uste
r, it will be p
a
s
sed to
sub
-
control
no
de
to locate
a free re
so
urce
node i
n
priva
t
e
clou
d cl
uste
r
and
sched
ule
this no
de’
s
Data
Nod
e
to
finish thi
s
tra
n
sfer and
sto
r
e the
s
e
data
.
Al
the local Red
u
ce
re
sults
would b
e
sent
back to
p
r
ivate clo
ud
clu
s
ter to pe
rform
Private Re
d
u
ce
t
a
sk.
3.2.3.
Global
R
edu
ce Sta
g
e
:
In PrivateRe
d
u
ce
stag
e, the final re
du
ce
ope
ration, being
the sa
me
as re
duce
functio
n
in Re
du
ce
stage, a
r
e p
e
rf
orme
d on
da
ta output fro
m
red
u
ce fu
nction
s exe
c
uted in
Red
u
c
e
stage. After
JobTracke
r
re
ceive
s
all the
sub
m
issi
on
s from all
clu
s
ters, it p
a
sse
s
the su
bmi
ssi
on
about
red
u
ce
to sub
-
control no
de.
Na
meNo
de i
n
t
h
is
nod
e
cre
a
tes
Ta
skInP
rog
r
e
s
s obj
e
c
t
according to
redu
ce fu
ncti
on an
d pla
c
e
it into
JobIn
P
rogress q
u
e
ue.
Wh
en
a “heartb
eat” si
gnal
come
s, this
obje
c
t woul
d be assig
ned
to a TaskT
r
a
c
ker of a
co
mputing reso
urce no
de. Each
redu
ce ta
sk
con
s
um
es th
e intermedi
ate
list might
need
ed to be
dragg
ed fro
m
other
resou
r
ce no
d
e
. Taki
ng in
same
key like
, a set of corre
s
po
ndin
g
val
ues li
k
e
woul
d be
cla
ssifie
d
an
d value
s
of
would b
e
me
rged a
c
cordin
g to the spe
c
ific
redu
ce o
p
e
r
ation. Whe
n
the set of key/value pa
irs a
s
list for differe
nt
are
prod
uced, th
is final
outp
u
t data
wou
l
d be
su
bm
i
tted to user by ce
ntral
cont
rol n
o
d
e
’s
JobT
ra
cker.
If all the data are di
strib
u
ted in only on
e cl
u
s
ter,
su
ch as the
data
submi
s
sion t
y
pe is
private and
a
ll the data are limited
in p
r
ivate clo
ud cluster, t
he th
ree-stage
sch
edulin
g pro
c
e
ss
become
s
the same a
s
two
-
stage
sched
u
ling pro
c
e
s
s.
The th
ree
-
stage
Map
-
Rd
uce
-
Glo
bal
Re
duce
sched
u
ling p
r
o
c
e
s
s bri
n
g
s
a
b
o
u
t these
advantag
es t
o
HMR: 1
)
Achieving
the ta
rget of priva
cy data protect
i
on. This sch
edulin
g pro
c
e
ss
provide
s
the
assuran
c
e b
y
isolating
storag
e
of priv
acy data a
n
d
corre
s
po
ndi
ng interm
edi
ate
i
i
v
k
,
m
m
v
k
,
m
m
v
k
,
r
m
m
n
m
m
v
k
v
v
k
,
,
,
,
1
m
m
v
k
,
m
k
m
n
m
m
v
v
k
,
,
,
1
]
,
,
[
1
m
n
m
v
v
r
m
v
k
,
m
k
o
m
r
n
r
m
v
k
v
v
k
,
,
,
,
1
r
m
v
k
,
m
k
r
n
r
m
v
v
k
,
,
,
1
]
,
,
[
1
r
n
r
v
v
o
m
v
k
,
m
k
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
e-ISSN:
2087
-278X
A Priva
cy
Dat
a
-o
riente
d
Hi
era
r
chical Ma
pRe
d
u
c
e Pro
g
ram
m
i
ng Model (Haiwen
Han
)
4593
data in priva
t
e cloud cl
uster and sche
duling onl
y the private cl
oud co
mputi
ng re
sou
r
ce to
perfo
rm the correspon
ding
comp
utation
inclu
de m
ap t
a
s
ks a
nd r
e
d
u
ce t
a
sk
s.
2
)
Minimizin
g
t
he
comm
uni
cati
on
ove
r
he
ad
betwe
en clo
u
d
s. Hug
e
co
mputing
amo
unt
of red
u
ce
ope
ration
o
c
curs
in Red
u
ce st
age in p
ubli
c
clou
d clu
s
te
r
decrea
s
e
s
th
e output data
that is nee
de
d to tran
sferred
from pu
blic
cl
oud to p
r
ivat
e clo
ud. 3
)
M
a
ximi
zing th
e
outso
urce
d
comp
ut
ation.
In both the M
a
p
stage
an
d
Redu
ce
stag
e, all th
e
comp
utation
ta
sks
about
non
-p
ri
vacy data
a
r
e exe
c
uted
o
n
ly
usin
g publi
c
cloud re
so
urce
.
4. Conclusio
n
A privacy
d
a
ta-o
riente
d
hierarchi
c
al
Map
R
edu
ce programm
i
ng mo
del,
HMR, i
s
pre
s
ente
d
in
this pa
pe
r. HMR provide
s
not
o
n
ly the solution f
o
r the
priva
cy data-o
r
ient
ed
appli
c
ation
b
u
t also th
e
scala
b
ility for
scale
out t
h
e
appli
c
atio
n i
n
la
rge
scal
e
hybri
d
clou
d
.
In
addition,
th
e scena
rio
s
co
mposed of
o
ne
p
r
ivate cl
o
ud
clu
s
ter an
d multi
publi
c
clo
ud
clu
s
te
rs i
s
coin
cid
ed
wit
h
the
re
quire
ment of
actu
a
l
appli
c
at
io
ns.
In summ
ari
z
e,
HM
R ha
s great
ap
plicat
ion
pro
s
pe
cts.
Referen
ces
[1]
W
a
lker E. Be
n
c
hmarkin
g
Am
azon
EC2
for h
i
gh-
performa
nc
e scie
n
tific c
o
mputin
g.
Use
n
i
x
Lo
gin
. 200
8;
33(5): 18-
23.
[2]
Su H.
Distribution Grid Reac
tive Po
w
e
r
Optimizati
on B
a
sed
on
Impro
v
ed
Clo
ud
Pa
rticle S
w
a
r
m
Algorit
hm.
T
E
LKOMNIKA Indones
ian J
ourn
a
l
of Electrica
l
Engi
neer
in
g.
2012; 11(
1): 468
-475
[3]
S Jha, L Krug
er, V Shmatikov.
T
o
w
a
rds practical pr
ivacy
for geno
mic
computati
on.
In 200
8 IEE
E
S
y
mp
osi
u
m on
Securit
y
and P
r
ivac
y
.
Ma
dis
o
n. 2008: 2
16-2
30.
[4]
Xu
e SJ, W
u
W
.
Sched
uli
n
g
W
o
rkflo
w
in
Cl
ou
d C
o
mputi
ng
B
a
sed
on
H
y
br
id
Particl
e
S
w
a
r
m Alg
o
rithm.
T
ELKOMNIKA
Indon
esi
an Jou
r
nal of Electric
al Eng
i
ne
eri
n
g
.
2012; 1
0
(7): 1
560-
156
6.
[5]
J Dea
n
, S Gh
e
m
a
w
at. Map
R
e
duce: s
i
mpl
i
fie
d
d
a
ta pr
ocess
i
ng
on
lar
ge c
l
usters.
Co
mmunic
a
tions
of
the ACM
. 200
8
;
51(1): 107
–11
3.
[6]
Jin Jing, W
a
n
g
Yan, Li
Xi
n, et al. A
New
Mu
lti-Maste
r
F
r
ame
w
o
r
k of MapRe
duc
e.
Journa
l of
Beiji
ng
Univ
ersi
ty of Posts and T
e
leco
mmu
n
ic
ations.
20
12; 3
5
(4): 89-9
3
.
[7]
Yuan
Lu
o, Be
th Plal
e.
Hi
er
archic
al Ma
pR
educ
e Pro
g
ra
mmi
ng M
o
d
e
l
an
d Sch
edu
li
ng Al
orith
m
s
.
Procee
din
g
s o
f
the 2012 12t
h IEEE/ACM Internati
o
n
a
l Sy
mp
osi
u
m on Cluster. Washi
ngton. 20
12
:
769-
774.
[8]
Kehu
an Z
h
an
g
,
Xi
ao
yo
ng Z
h
ou, Ya
ng
yi
Ch
en, et a
l
.
Sed
i
c: Privacy-Aw
are Data I
n
tensi
v
e Co
mputin
g
on Hy
brid
Clo
uds.
Proce
e
d
i
ngs of the
18
th ACM conf
e
r
ence
on C
o
m
puter a
nd c
o
mmunicati
on
s
securit
y
. C
h
ic
a
go. 201
1: 515-
526.
[9]
T
a
y
l
or RC. An
overvi
e
w
of th
e Ha
doo
p/Map
R
ed
uc
e/HBas
e
frame
w
ork a
n
d
its current
a
pplic
atio
ns i
n
bioi
nformatics.
BMC bioi
nfor
matics.
2010; 1
1
(
Supp
l 12): S1.
[10]
T
W
hite
T
.
Hadoop: T
he defini
t
ive gui
de. Se
b
a
stopo
l: O'
Reill
y Med
i
a Inc. 20
12: 203-
24
3.
Evaluation Warning : The document was created with Spire.PDF for Python.