Indonesi
an
Journa
l
of El
ect
ri
cal Engineer
ing
an
d
Comp
ut
er
Scie
nce
Vo
l.
12
,
No.
3
,
Decem
ber
201
8
, p
p.
11
32
~
11
42
IS
S
N: 25
02
-
4752, DO
I: 10
.11
591/ijeecs
.v1
2
.i
3
.pp
11
32
-
11
42
1132
Journ
al h
om
e
page
:
http:
//
ia
es
core.c
om/j
ourn
als/i
ndex.
ph
p/ij
eecs
An Accu
rate and
Effici
ent
Schedul
er for H
adoop M
apReduce
Fra
m
ework
D C
Vinutha
1
,
G.T.
R
aj
u
2
1
Depa
rtment of inform
at
ion
sci
en
ce
and engi
ne
ering,
Vid
y
av
ard
ha
ka
co
ll
eg
e
of
en
gine
er
ing, m
y
sur
u,
Indi
a
1,
2
Depa
rtment
of
Com
pute
r
sci
en
ce
and e
ngineeri
ng,
R.
N.S.I
.
T
,
B
e
ngal
uru, Indi
a
Art
ic
le
In
f
o
ABSTR
A
CT
Art
ic
le
history:
Re
cei
ved
Dec
28
, 201
7
Re
vised J
an
9
, 201
8
Accepte
d
Aug
2
1
, 201
8
MapReduc
e
is
th
e
pr
efe
rr
ed
computing
fra
m
ewor
k
used
in
l
arg
e
d
at
a
an
aly
sis
and
pro
ce
ss
ing a
ppli
c
at
ions.
Had
oop
is
a
widely
used
MapRedu
c
e
fr
amework
ac
ross
diffe
r
ent
comm
unity
du
e
to
it
s
open
source
n
at
ure
.
C
loud
servi
c
e
provide
r
such
as
Microsoft
az
ur
e
HD
Insight
o
f
fer
s
resourc
es
to
its
customer
and
onl
y
pa
y
s
fo
r
the
ir
use
.
How
eve
r,
the
cri
t
ical
cha
l
le
nges
of
cloud
service
provide
r
is
to
m
e
et
user
t
ask
Servi
ce
le
v
el
agr
e
eme
nt
(SLA)
r
equi
re
m
ent
(ta
sk
dea
dl
ine
).
Curr
e
ntly
,
th
e
onus
is
on
c
li
en
t
to
compute
the
amount
of
resourc
e
req
uire
d
to
run
a
job on
c
loud.
T
his
work
pre
sen
t
a
novel
m
ake
sp
an m
odel
for
Hadoop
MapR
educ
e
fr
amework
name
l
y
O
HM
R
(Optimiz
ed
H
adoop
MapReduc
e)
to
proc
ess
data
in
real
-
ti
m
e
an
d
uti
l
iz
e
s
y
st
e
m
resourc
e
eff
icientl
y
.
The
OH
MR
pre
sent
a
cc
ura
te
m
odel
to
compute
job
m
a
kespa
n
ti
m
e
and
a
lso
pre
sent
a
m
odel
to
prov
ision
the
amount
of
cl
oud
resour
ce
r
equi
re
d
to
m
eet
t
ask
d
e
adl
in
e.
Th
e
OH
MR
first
bui
ld
a
pro
file
for
each
job
and
computes
m
akes
pan
ti
m
e
of
jo
b
using
gre
ed
y
appr
oa
ch.
Furt
her
m
ore
,
to
provision
amount
of
resourc
e
req
uire
d
to
m
e
et
t
ask
dea
d
li
n
e
La
g
ran
g
e
Multi
pliers
t
ec
h
nique
is
appl
i
ed
.
Expe
rimen
t
ar
e
conduc
t
ed
on
Microsoft
Azure
HD
Insight
cl
oud
pl
at
form
conside
ring
di
ffe
ren
t
app
li
c
at
ion
such
as
t
ex
t
computing
and
b
ioi
nform
at
i
cs
ap
pli
c
at
ion
to
ev
a
l
uat
e
per
form
an
c
e
of
OH
MR
of
over
exi
sting
m
odel
show
s
sig
nifi
c
ant
per
form
anc
e
improvem
e
nt
in
te
rm
s
of
computation
ti
m
e.
Expe
r
i
m
ent
are
cond
uct
ed
on
Micr
osoft
Azure
HD
Insight
cl
ou
d.
Over
al
l
good
cor
re
lation
is
rep
orte
d
be
tween
pra
c
ti
c
al
m
ake
span
values
and
the
or
et
i
ca
l
m
ake
span
values
.
Ke
yw
or
d
s
:
Bi
g
data
Bi
oin
f
or
m
at
ic
s
Cl
oud
c
om
pu
ti
ng
Hado
op
Ma
pRed
uce
Parall
el
co
m
puti
ng
Copyright
©
201
8
Instit
ut
e
o
f Ad
vanc
ed
Engi
n
ee
r
ing
and
S
cienc
e
.
Al
l
rights re
serv
ed
.
Corres
pond
in
g
Aut
h
or
:
D
C
Vinutha
,
Dep
a
rtm
ent o
f
Inform
at
ion
Sc
ie
nce &
en
gin
e
erin
g
,
Vidyava
r
dh
a
ka
colle
ge o
f
e
ngineerin
g, m
ysu
ru, In
dia
,
Em
a
il
:
vin
ud
c
@
gm
ai
l.co
m
1.
INTROD
U
CTION
The
Ma
ny
org
anizat
ion
s
s
uc
h
as
in
dustria
l,
gove
r
nm
ent
and
ed
ucati
on
i
ns
ti
tuti
on
colle
ct
s
m
assive
a
m
ou
nt
of
dat
a
from
var
io
us
source
s
suc
h
as
sen
sor
network,
s
ocial
ne
twork
,
bi
oinf
or
m
at
ic
s
and
World
W
i
de
We
b
et
c.
f
or
var
i
ous
a
pp
li
cat
io
n
us
e
s
.
Per
form
ing
s
c
al
able
an
d
an
al
ysi
s
on
thes
e
unstruct
ur
e
d
data
is
m
os
t
desired
acro
s
s
m
any
or
ga
nizat
ion.
T
he
sta
te
-
of
-
a
rt
m
od
el
fin
ds
diff
ic
ulti
es
in
perform
ing
re
al
-
tim
e
analy
sis
on
c
onti
nuous/st
rea
m
data.
Fo
r
pe
rfor
m
ing
re
al
-
tim
e
analy
sis
for
data
inten
siv
e
ap
plica
ti
on
s
,
Goo
gle
hav
e
c
om
e
up
with
pa
rall
el
pro
gr
am
m
ing
m
od
el
cal
le
d
Ma
pRed
uce
fra
m
ewo
r
k
[
1].
It
is
highly
sc
al
able,
fau
lt
tole
ran
t
and
pa
rall
el
iz
e
exec
ution
in
distrib
uted
nat
ur
e
acr
os
s
cl
ust
er
of
c
om
puti
ng
nodes
.
H
adoo
p
Ma
pRed
uce
f
r
a
m
ewo
r
k
[
2]
ha
s
bee
n
wi
dely
a
dopted
ac
r
oss
var
i
ou
s
orga
ni
zat
ion
w
hen
c
om
par
ed
with
counter
par
ts
Phoe
nix
[
3], Mars
[4] an
d Dr
ya
d
[
5]
due to
op
e
n so
ur
c
e n
at
ure
[6
]
.
The
Ha
doop
M
apRed
uce
m
odel
pr
e
dom
inantly
con
sist
of
f
ol
lowing
ph
a
ses
,
Setu
p,
Ma
p,
S
huff
le
,
S
or
t
and
Re
duce
w
hich
is
s
ho
w
n
in
F
ig
ur
e
1.
T
he
Ha
doop
fr
a
m
ewo
r
ks
c
onsist
s
of
a
m
ast
er
node
a
nd
a
cl
us
te
r
of
com
pu
ti
ng
node
s.
Jobs
s
ubm
i
t
te
d
to
Hado
op
are
f
ur
t
her
distrib
uted
int
o
M
ap
an
d
Re
du
ce
ta
sk
s.
In
set
up
ph
a
se,
Evaluation Warning : The document was created with Spire.PDF for Python.
Ind
on
esi
a
n
J
E
le
c Eng &
Co
m
p
Sci
IS
S
N:
25
02
-
4752
An
Acc
urate
and Ef
fi
ci
ent S
c
heduler f
or
Hadoo
p
M
apRed
uc
e Framew
or
k
(
D.
C.Vi
nuth
a
)
1133
input
data
of
a
j
ob
to
be
proc
essed
(
residi
ng
gen
e
rall
y
on
t
he
Ha
d
oop
Di
stribu
te
d
Fil
e
Syst
e
m
s
(H
DFS))
i
s
log
ic
al
ly
pa
rtit
i
on
e
d
into
ho
m
og
e
nous
vo
l
ume
s
cal
le
d
ch
unks
f
or
the
Ma
p
worker
no
des.
Hado
op
div
ide
s
eac
h
Ma
pRed
uce
jo
b
i
n
t
o
set
of
t
asks
we
re
eac
h
c
hunk
is
pro
cessed
by
Ma
p
w
orke
r.
Ma
p
ph
a
se
ta
kes
i
nput
as
key/
valu
e
pair
as
(
1
,
1
)
an
d
ge
ne
rate
li
st
of
(
2
,
2
)
interm
ediat
e
key/
value
pair
a
s
ou
t
put.
S
huf
fle
phase
beg
i
ns
with
c
om
ple
ti
on
of
Ma
p
phase
that
c
ollec
ts
the
interm
ediat
e
key/
value
pair
from
al
l
the
Ma
p
ta
sk
.
A
so
rt
operati
on
is
perform
ed
on
the
i
nterm
edia
te
key/
value
pair
of
m
ap
ph
ase.
F
or
sim
plici
ty
so
rt
an
d
s
huff
le
ph
a
ses
a
re
cu
m
ula
ti
vely
consi
der
e
d
i
n
t
he
sh
uffle
ph
ase
.
Re
duce
phase
pr
ocesses
s
or
t
ed
i
nterm
ediate
dat
a
base
d on us
e
r defi
ned f
un
ct
i
on. O
utput o
f re
du
ce
phase
is s
tore
d/wr
it
te
n t
o HDFS.
Figure
1
.
Ha
do
op MapRe
du
ce
Com
pu
ta
ti
on
Mod
el
The
Az
ure
H
D
In
si
gh
t
Cl
oud
a
id
i
n
ac
hievi
ng
scal
able
perfor
m
ance
i.e.
us
er
can
set
up
a
nd
r
un
Hado
op
app
li
cat
io
n
on
a
la
rg
e
-
scal
e
cl
us
te
r.
Az
ur
e
H
DInsig
ht
Cl
ou
d
al
lo
w
us
e
r
to
config
ure
the
a
m
ou
nt
of
res
ourc
e
(v
irt
ual
c
om
puti
ng
node
)
re
quire
d
t
o
pe
rform
certai
n
ta
sk
.
H
oweve
r,
at
pr
ese
nt
Ha
doop
job
with
dea
dlin
e
requirem
ent
is
no
t
s
upporte
d
i
n
H
DInsi
gh
t
cl
oud.
The
onus
is
on
t
he
cl
oud
us
er/cl
ie
nt
to
com
pu
te
the
a
m
ou
nt
of
re
source
re
qu
i
rem
ent
to
m
eet
ta
sk
dea
dline
w
hich
is
a
chall
en
ging
ta
sk.
The
refo
re,
Ha
doop
m
akes
pan
m
od
el
li
ng
ha
s b
ecom
e
an
im
p
or
ta
nt
crit
eria
in
c
om
pu
ti
ng
a
m
ou
nt
of r
es
ources r
e
qu
ire
d
to
m
eet
ta
sk
d
e
adlin
e
.
It
s
hould
be
no
te
d
t
hat
m
akes
pan
m
od
el
ing
is
a
c
halle
ng
i
ng
ta
s
k
sin
ce
Ha
doop
j
obs
in
vo
l
ves
m
ulti
ple
processi
ng
sta
ge
w
hich
c
om
posed
of
three
co
re
sta
ge
(i.e.
M
ap,
Shuffle
a
nd
Re
du
c
e
sta
ge
).
More
over,
the
first
wav
e
of
s
huff
l
e
sta
ge
is
ge
ne
rall
y
processe
d
in
par
al
le
l
fas
hion
with
Ma
p
sta
ge
(i.e.
ove
rlap
ping
ph
ase
)
an
d
rest
of
the
wa
ve
s
of
the
S
huff
l
e
sta
ge
are
pro
cessed
post
c
om
ple
ti
on
of
M
ap
sta
ge
(i.e.
non
-
ov
e
rlap
ping
phase
).
To
util
iz
e
the
cl
oud
res
ource
s
ef
fici
ently
,
num
ero
us
m
akesp
a
n
m
od
el
s
f
or
Ha
doop
is
presente
d
[7
]
,
a
nd
[
8].
Howe
ver,
thes
e
appr
oach
e
s
a
re
not
ac
cu
rate
and
i
nc
ur
s
high
com
pu
ti
ng
ov
erh
ea
d/tim
e.
Since
these
a
ppr
oach
e
s
did
not co
ns
i
de
r ov
e
rlap
ping a
nd no
n
-
overlap
ping
ph
a
ses
of
the S
huff
le
sta
ge.
Re
centl
y,
a
num
ber
of
sop
his
ti
cat
ed
Hadoop
pe
rfor
m
ance
m
od
el
s
are
propose
d
[
9
-
14]
.
Starfish
[
9]
colle
ct
s
a
r
unni
ng
Ha
do
op
j
ob
pro
file
at
a
f
ine
gr
a
nula
rity
with
detai
le
d
inf
or
m
at
ion
f
or
job
est
im
ation
a
nd
op
ti
m
iz
ation
.
On
t
he
to
p
of
Starfis
h,
Ela
sti
ci
ser
[
10
]
is
pr
opos
e
d
f
or
res
ource
prov
isi
onin
g
in
te
rm
s
of
virt
ual
m
achines.
H
oweve
r,
c
ollec
tin
g
the
detai
le
d
exec
utio
n
pro
file
of
a
Hado
op
j
ob
inc
ur
s
a
hi
gh
ov
e
r
head
w
hic
h
le
ads
to
an
ove
restim
at
ed
job
execu
ti
on
tim
e
.
I
n
[11],
[12],
and
[
13]
co
ns
i
der
s
both
the
overla
pp
i
ng
an
d
non
-
ov
e
rlap
ping
sta
ges
an
d
us
e
s
si
m
ple
li
near
re
gressi
on
f
or j
ob
estim
ation
.
T
his
m
od
el
al
so
es
tim
a
te
s
the
am
ount
of
resou
rces
f
or
jo
bs
with
dea
dline
re
quirem
ents.
CR
ESP
[
14
]
est
im
a
te
s
j
ob
e
xec
ution
a
nd
s
uppo
rts
res
ourc
e
pro
vision
i
ng
in
te
rm
s
of
m
ap
and
re
duce
slot
s.
H
oweve
r,
bo
th
the H
P
m
od
el
and
CR
ESP
ignore
the
im
pact
of
the
nu
m
ber
of
redu
ce
ta
sk
s
on
job
perform
a
nce.
T
he
HP
m
od
el
is
restri
ct
ed
to
a
c
on
st
ant
nu
m
ber
of
reduc
e
ta
sk
s,
wh
e
reas
CR
ESP
only
consi
ders
a
sin
gle
wa
ve
of
t
he
reduce
phase
.
I
n
CR
ESP
,
the
num
ber
of
reduce
ta
sk
s
ha
s
to b
e equ
al
t
o
num
ber
of
re
duce
sl
ots.
It
is u
nr
eal
ist
ic
to
config
ure
ei
ther
t
he
sa
m
e
nu
m
ber
of r
ed
uce
ta
sk
s
or
the
sin
gle
wa
ve
of
t
he
re
duce
phase
f
or
al
l
the
j
ob
s.
It
ca
n
be
a
r
gued
t
hat
in
pra
ct
ic
e,
the
nu
m
ber
of
reduce
ta
s
ks
va
ries
de
pe
nd
i
ng
on
th
e
siz
e
of
t
he
in
pu
t
da
ta
set
,
the
ty
pe
of
a
Ha
doop
a
pp
li
cat
io
n
(
e.
g.
CPU
intensive
,
or
di
sk
I/O
i
ntensi
ve
)
a
nd
us
e
r
re
quirem
ents.
F
ur
t
her
m
or
e,
f
or
t
he
re
duce
phase,
usi
ng
m
ulti
ple
wav
e
s
Evaluation Warning : The document was created with Spire.PDF for Python.
IS
S
N
:
2502
-
4752
Ind
on
esi
a
n
J
E
le
c Eng &
Co
m
p
Sci,
Vo
l.
12
, N
o.
3
,
Dece
m
ber
2
01
8
:
11
32
–
11
42
1134
gen
e
rates
bette
r
perform
ance
than
us
i
ng
a
si
ng
le
wa
ve
e
sp
e
ci
al
ly
wh
en
Ha
doop
processes
a
la
r
ge
dataset
on
a
sm
a
ll
a
m
ou
nt
of
resou
rces.
Wh
il
e
a
sin
gle
wav
e
reduces
t
he
ta
sk
set
up
over
hea
d,
m
ultip
le
wa
ves
im
pr
ov
e
t
he
util
iz
at
ion
of t
he disk
I
/O
.
To
a
ddress
t
he
researc
h
c
halle
ng
e
s
this
wor
k
prese
nt
an
a
ccur
at
e
a
nd
ef
f
ic
ie
nt
m
akesp
a
n
m
od
el
f
or
Hado
op
Ma
pRedu
ce
fr
am
ework
nam
ely
OHM
R
(
Op
ti
m
iz
e
d
Ha
doop
Ma
pR
edu
ce
)
to
pro
cess
data
i
n
rea
l
-
tim
e
and
util
iz
e
syst
e
m
resour
ce
ef
fici
ently
.
The
OH
MR
pr
e
sent
accu
rate
m
od
el
to
com
pu
te
j
ob
m
akesp
a
n
ti
m
e
an
d
al
so
present
a
m
od
el
to
pro
vi
sion
the
am
ount
of
cl
oud
res
ource
re
quired
to
m
eet
ta
sk
de
adline.
T
he
O
HMR
first
buil
d
a
prof
il
e
f
or
eac
h
job
a
nd
com
pu
t
es
m
akesp
an
ti
m
e
of
jo
b
us
i
ng
gree
dy
ap
pro
ach.
F
urt
her
m
or
e
,
to
pro
vision am
ou
nt
of r
e
source
r
e
qu
ire
d
t
o
m
eet
task
dead
li
ne
Lagran
ge
M
ul
ti
pliers tec
hn
i
qu
e
is a
pp
li
ed
.
The
C
on
t
rib
ution o
f researc
h work i
s
as
f
ollow
s:
1)
This
wor
k pr
es
ent an ac
c
ur
at
e
m
akesp
an
m
od
el
for HMR
ai
ding
perform
a
nce im
pr
ovem
ent.
2)
Ex
per
im
ents co
ns
i
der
i
ng d
i
ve
rse
cl
ou
d
c
onf
igurat
ion
s
and
var
ie
d
a
pp
li
cat
ion
c
onfi
gurati
on.
3)
Correl
at
ion
bet
ween t
he
or
et
ic
al
m
akesp
a
n
m
od
el
a
nd e
xp
e
r
i
m
ental
v
al
u
es.
The
rest
of
the
pap
e
r
is
orga
nized a
s
fo
ll
ow
s
.
E
xtensiv
e
rese
arch
s
urvey
is
carried
ou
t
in
S
ect
ion
2
.
In
S
ect
ion
3
t
he
pro
posed
m
akesp
an
m
od
el
li
ng
for
Hado
op
Ma
pRed
uce
fr
am
ewor
k
is
pr
ese
nted.
I
n
penult
i
m
ate
sect
ion
e
xperi
m
ental
stud
y i
s
carr
ie
d ou
t.
T
he
c
on
cl
us
io
n and f
uture
wor
k
is
desc
ribe
d
i
n
la
st sect
io
n.
2.
RELATE
D
W
ORK
In
t
his
sect
io
n,
a
detai
le
d
li
te
ratur
e
is
pr
es
ented
a
bout
t
he
co
nv
e
ntio
nal
sta
te
-
of
-
a
rt
da
ta
analy
ti
c
te
chn
iq
ues
.
I
n
[9
]
,
a
l
ocali
ty
base
d
Ha
doop
cl
us
te
r
m
od
el
is
ad
op
te
d
wh
i
ch
rely
upon
t
he
distance
betwee
n
input
inf
orm
ati
on
a
nd
pr
ocess
ing
nodes
.
T
his
te
chn
i
qu
e
t
ry
to
overc
om
e
fr
om
var
io
us
is
su
es
of
sta
te
-
of
-
a
rt
te
chn
iq
ues
su
c
h as
high
ove
rhead,
require
d l
arg
e
st
or
a
ge c
a
pacit
y
and
e
xp
ensive i
n real
ti
m
e.
Howev
e
r,
it
al
so
induce
s lar
ge d
el
ay
an
d
c
ause
s p
e
rfor
m
ance
degra
dation.
In
[10],
a
cl
ou
d
base
d
op
ti
m
i
zat
ion
f
ram
ewo
r
k
is
ad
opte
d
to
m
ee
t
dead
li
nes
an
d
acc
om
pl
ish
data
local
it
y.
They
pr
ese
nted
he
uri
sti
c
te
chn
iqu
e
to
pro
visio
n
ta
sk
SL
A
re
quir
e
m
ent
of
cl
ou
d
us
er
.
This
te
c
hn
i
que
p
rese
nted
a
n
optim
iz
at
ion
te
chn
i
qu
e
to
m
eet
ta
sk
dead
li
ne
a
nd
m
ini
m
iz
e
the
num
ber
of
no
des
require
d
f
or
ta
sk
processi
ng.
T
he
y
so
lve
d
sin
gle
no
de
fail
ur
e
a
nd
pr
ese
nted
a
trade
off
betwe
en
m
ini
m
izing
dead
li
ne
a
nd
lo
cal
it
y
const
raint.
Ou
t
com
e
sh
ows
re
du
ct
io
n
o
f
sto
r
age
a
nd
com
puta
ti
on
over
hea
d.
H
oweve
r
th
ey
did
not
c
ons
idere
d
ta
sk
dead
li
ne
a
wr
e
sch
e
duli
ng and
perform
ance ev
al
ua
ti
on c
on
si
der
i
ng com
pu
te
intensiv
e ap
plica
ti
on
.
In
[11],
a
perform
ance
enh
a
nc
e
m
ent
te
chn
iq
ue
is
introd
uced
for Ha
doop
m
od
el
base
d
on
m
et
adata
of
interrelat
ed
ta
sk
s.
This
te
c
hn
i
qu
e
pe
rm
it
s
Na
m
e
Nodes
to
fin
d
bl
oc
k
w
hi
ch
ar
e
preset
i
n
the
cl
us
te
r
to
store
sp
eci
fic
data.
T
heir
m
od
el
at
ta
ined
supe
rior
pe
rfor
m
ance
tha
n
Ha
doop
fr
am
ewor
k.
For
pe
r
form
ance
eval
uation
they
co
ns
ide
re
d
Bi
oi
nfor
m
at
i
c
s
ap
plica
ti
on
.
Ex
per
im
ent
outc
om
e
sh
ow
s
good
pe
rfor
m
ance
in
te
rm
s
of
I
.O
c
os
t
m
ini
m
iz
at
ion
and
m
akesp
a
n
tim
e
red
uctio
n.
H
owe
ver
,
t
hey
did
no
t
c
on
si
der
e
d
perf
or
m
ance
evalu
at
ion
consi
der
i
ng d
i
f
fer
e
nt appli
cat
ion an
d
t
hey co
ns
ide
red pe
rfo
r
m
ance ev
al
uation f
or
sm
al
l gen
om
ic
d
at
a size
.
In
[
12
]
,
a
Ha
doop
m
od
el
is
presente
d
bas
ed
on
Ma
pRe
du
ce
pe
r
form
a
nce
m
odules
t
o
reduce
dela
y
and
con
te
ntion
i
n
the
netw
ork
and
e
nhance
perform
ance
of
the
syst
e
m
.
An
d
it
also
hel
ps
to
decr
eas
e
synch
ronizat
io
n
delay
an
d
sc
hedule
diff
e
re
nt
ta
sk
s
at
a
ti
m
e.
T
hey
al
so
pr
esented
a
the
or
et
ic
al
evaluati
on
of
their
m
akesp
a
n
m
od
el
.
Atta
ined
good
acc
uracy
an
d
pe
rfo
rm
ance
evalua
ti
on
is
car
ried
ou
t
for
word
count
app
li
cat
io
ns
.
H
ow
e
ve
r,
t
hey
di
d
not
c
onsider
ed
perform
ance
evaluati
on
c
on
si
der
i
ng
d
i
ve
rse
a
pp
li
cat
io
n
a
nd
evaluati
on
on c
loud
platfo
rm
.
In
[13],
a
n
A
f
fordHa
doop
a
pp
li
cat
io
n
is
a
dopted
t
o
re
du
ce
cost
in
fini
sh
in
g
var
i
ou
s
ta
sk
s
an
d
t
o
al
locat
e
data
and
sc
he
du
le
ta
sk
s
a
nd
hen
ce
eff
ic
ie
ncy
of
s
yst
e
m
get
enh
anced.
H
oweve
r
,
a
NP
-
ha
rd
prob
le
m
occurs
wh
il
e
sc
hedulin
g
dif
fe
r
ent
ta
sk
s
in
sta
t
e
-
of
-
a
rt
te
chn
i
qu
e
.
To
a
ddres
s
NP
-
ha
rdness
,
they
adopted
i
ntege
r
pro
gr
am
m
ing
te
chn
i
qu
e
s
an
d
heurist
ic
reduc
ti
on
an
d
op
ti
m
iz
at
ion
to
ena
bl
e
an
optim
al
s
olu
ti
on.
E
xperi
m
ent
are
c
onduct
ed
consi
der
i
ng
W
ord
co
unt
a
nd
So
rt
a
pp
li
cat
io
n
at
ta
ine
d
go
od
res
ults
i
n
te
rm
s
of
cost
m
ini
m
iz
at
ion
.
Howe
ver, the
oret
ic
al
accur
ac
y per
form
ance ev
al
uatio
n
is
not p
rese
nted.
In
[
14
]
,
a
Hadoop
m
od
el
is
pro
posed
to
pre
dict
ta
sk
s
r
un
-
t
i
m
e
and
al
loca
te
so
m
e
sp
eci
fi
ed
resou
rces
to
accom
pl
ish
ta
sk
s
in
a
n
assi
gn
e
d
ti
m
e
per
iod.
He
nce,
t
he
dead
li
ne
c
onstr
ai
nts
are
m
et
.
It
us
es
m
ulti
ple
wav
e
s
of
a
s
huf
fle
sta
ge.
Ex
pe
rim
ent
are
c
onduct
e
d
consi
der
i
ng
word
co
unt
a
nd
sort
ap
plica
ti
on.
The
or
et
ic
al
acc
ur
acy
perform
ance
evaluati
on
of
m
akesp
a
n
m
od
el
is
prese
nted
s
hows
good
accur
acy
.
H
oweve
r,
it
in
du
ces
hi
gh
ov
e
r
head
t
o
finish
ta
s
ks
a
nd
data
inten
sive
and
di
ver
se
ap
plica
ti
on
s
uc
h
as
bio
i
nfor
m
at
ic
s
app
li
cat
io
n
is
not
consi
der
e
d f
or
perform
ance ev
al
uatio
n.
In
[
15]
,
A
Hadoop
m
od
el
is
a
dopted
to
opti
m
iz
e
H
ad
oop
par
am
et
ers
wit
h
t
he
help
of
pro
gr
am
m
ing
base
d
PS
O.
T
he
PS
O
te
ch
ni
qu
e
helps
t
o
fi
nd
optim
al
par
a
m
et
ers
in
Ha
doop
net
works
for
a
sp
eci
fied
ta
sk
.
Howe
ver,
perf
or
m
ance
e
valu
at
ion
unde
r
cl
oud
c
om
pu
ti
ng
env
i
ronm
ent
is
not
c
onside
re
d.
I
n
[
16
]
,
a
Bi
gD
at
a
com
pu
ta
ti
on
al
m
od
el
is
ad
opte
d
t
o
re
du
ce
c
os
t
with
the
he
lp
of
geo
-
distrib
uted
datace
nters.
T
his
te
c
hn
i
qu
e
helps
t
o
deci
de
the
pa
ram
et
e
rs
to
sel
ect
the
final
data
ce
nt
er.
He
re,
a
f
r
a
m
ewo
r
k
f
or
eff
ic
ie
nt
in
form
at
ion
Evaluation Warning : The document was created with Spire.PDF for Python.
Ind
on
esi
a
n
J
E
le
c Eng &
Co
m
p
Sci
IS
S
N:
25
02
-
4752
An
Acc
urate
and Ef
fi
ci
ent S
c
heduler f
or
Hadoo
p
M
apRed
uc
e Framew
or
k
(
D.
C.Vi
nuth
a
)
1135
m
ov
e
m
ent
and
to
pr
ov
i
de
re
source
al
locat
ion
and
to
sel
ect
a
require
d
data
c
enter
t
o
decr
ea
se
cost
of
the
s
yst
e
m
is desc
ribe
d.
H
ow
e
ve
r,
tas
k d
eadli
ne req
uire
m
ent o
f
task is
no
t c
onside
red.
Extensi
ve
rese
arch
s
urvey
ca
rr
ie
d
ou
t
sho
w
s
num
ero
us
a
ppr
oac
h
is
prese
nted
to
m
ini
m
i
ze
cost,
ti
m
e
and
am
ou
nt
o
f
resou
rce
re
quir
ed
to
com
pu
te
a
ta
sk
on
H
ad
oop
Ma
pRe
du
ce
fr
am
ework.
T
he
s
urvey
s
hows
nee
d
to
de
vel
op
a
ne
w
m
akesp
a
n
m
od
el
that
m
i
nim
iz
e
a
m
ou
nt
of
resou
rce
re
qu
i
red
to
ta
s
k
dead
li
ne
with
good
accuracy
c
ons
iderin
g
div
e
rs
e
ap
plica
ti
on
.
In
ne
xt
sect
ion
t
he
pr
opose
d
m
akesp
a
n
m
od
el
for
Hado
op
Ma
pRed
uce
fra
m
ewo
r
k
is
pr
esented
.
3.
MAKE
SP
AN
M
ODELL
I
N
G
F
OR
P
ROP
OSED
O
PT
IMIZ
ED
S
C
HEDU
L
A
R
FOR
H
A
DOOP
MAPRE
DUC
E FR
AM
E
W
ORK
This
w
ork
pr
e
sent
an
optim
i
zed
sc
heduler
for
sc
hedulin
g
j
ob
to
m
eet
t
ask
dead
li
ne
t
o
m
eet
Qo
S
require
m
ent
of
ap
plica
ti
on
on
Ha
doop
Ma
pRed
uce
(
HMR)
f
ram
e
work.
Firstl
y,
this
wor
k
present
a
m
at
he
m
at
ic
a
l
m
od
el
to
c
ompu
te
com
pleti
on
ti
m
e
of
M
apRed
uce
jo
b.
Sec
ondly,
t
he
am
ou
nt
of
re
so
urce
require
d
to
m
e
et
task
dead
li
ne
of a
pp
li
cat
io
n
is
pr
e
sente
d.
3.1.
Makesp
an
mo
dell
ing/prop
osi
tion
Firstl
y
we
eval
uate
the
perf
orm
ance
lim
i
ts
fo
r
a
gi
ven
m
ak
espa
n
of
a
s
pe
ci
fied
set
of
ta
sk
s
th
at
is
processe
d
by
slots/
serv
e
rs.
L
et
1
,
2
,
3
,
…
,
be
the
tim
e
pe
rio
d
of
ta
sk
s
of
a
pa
rtic
ular
j
obs
.
T
his
work
co
ns
i
der
slot
al
locat
ion
t
o
a
ta
s
k
ba
sed
on
slot
with
Mi
nim
u
m
Execu
ti
on
Tim
e
(
)
by
a
dopting
G
reedy
al
gorithm
.
Let
be
the m
axim
u
m
t
i
m
e p
erio
d of
ta
sk w
hich
is
r
e
pr
ese
nted
a
s
:
=
ma
x
{
}
(1)
and
be
the
ave
rag
e
tim
e p
erio
d of
ta
sk
whic
h
is
represe
nted
as:
=
(
∑
=
1
)
⁄
.
(2)
The
m
akesp
an
of
a
ta
s
k
t
o
m
eet
is
at
le
ast
∙
an
d
at
m
os
t
(
−
1
)
∙
+
.
We
c
onside
r
t
he
w
or
st
cas
e
scenari
o
f
or
uppe
r
li
m
it
,
that
is,
the
longest
ta
sk
∈
{
1
,
2
,
3
,
…
,
}
with
ti
m
e
per
i
od
is
the
la
st
processe
d
ta
s
k. Co
ns
ide
rin
g
t
his sce
nar
i
o, th
e tim
e taken
b
e
fore c
omm
enc
e
m
ent of last
ta
sk
is sc
he
dule
d
i
s
at
le
ast
(
∑
−
1
=
1
)
⁄
≤
(
−
1
)
∙
⁄
.
T
her
e
fore,
t
otal
exec
ution
ti
m
e
of
al
l
assignm
ent
is
at
le
ast
(
−
1
)
∙
+
⁄
.
T
he
l
ow
e
r
li
m
it
is
sm
a
ll
er,
since
t
he
best
cas
e
is
w
he
n
ta
sk
distrib
uted
eq
ua
ll
y
a
m
on
g
t
he
avail
able
slots.
T
he
refore
,
the
total
e
xe
cution
ti
m
e
of
is
at
le
ast
∙
⁄
.
T
he
t
otal
jo
b
c
om
pl
et
ion
ti
m
e
fo
r
s
cheduli
ng
li
es
betwee
n
the
lo
wer
an
d
uppe
r
lim
it
.
These
li
m
it
are
m
os
tl
y
ben
e
fici
al
in
c
ase
w
he
n
the
ti
m
e
per
io
d
of
l
ongest
ta
sk
is sm
al
l as
co
m
par
ed
to
t
otal exec
utio
n t
i
m
e, i.e.
w
hen
≪
∙
⁄
.
3.2.
Co
m
pu
ting j
ob co
mple
tio
n
t
im
e
Let
co
ns
ide
r
jo
b
with
kn
own
exec
utio
n
ti
m
e
that
is
obta
in
ed
from
pr
e
vious
e
xec
ution.
Let
be
execu
te
d
with
new
set
of
data
that
is
segm
ent
ed
into
ℋ
m
ap
ta
sk
s
a
nd
ℬ
reduce
ta
sk
s.
Let
ℋ
be
the
num
ber
of m
ap
slots as
sign
e
d
t
o job
a
nd
ℬ
be
the
nu
m
ber o
f red
uce
s
lots assi
gne
d
t
o job
. Let
ℋ
→
be
t
he
m
ean
tim
e
per
io
d
of
m
ap
ta
sk
of
a
pa
rtic
ular
jo
b
a
nd
ℋ
↑
be
t
he
m
axi
m
u
m
tim
e
per
i
od
of
m
ap
ta
sks
of
a
pa
rtic
ula
r
j
ob
.
T
hen, u
sin
g
m
akesp
a
n
m
od
el
li
ng (
pr
opos
it
ion)
in
sect
ion
a
,
t
he
lo
we
r
li
m
it
s
ℋ
↓
and
up
per
lim
it
s
ℋ
↑
on tim
e p
erio
d of al
l m
a
p
ph
as
e are c
om
pu
te
d as f
ollo
ws
:
ℋ
↓
=
ℋ
∙
ℋ
→
ℋ
(3)
ℋ
↑
=
(
ℬ
−
1
)
∙
ℋ
→
ℋ
+
ℋ
↑
(4)
Evaluation Warning : The document was created with Spire.PDF for Python.
IS
S
N
:
2502
-
4752
Ind
on
esi
a
n
J
E
le
c Eng &
Co
m
p
Sci,
Vo
l.
12
, N
o.
3
,
Dece
m
ber
2
01
8
:
11
32
–
11
42
1136
The
re
du
c
e
ph
ase
is
c
om
po
se
d
of
s
huf
fle,
sort
a
nd
re
duce
s
ta
ge.
Sim
i
la
r
to
m
ap
ph
a
se,
the
m
akesp
a
n
m
od
el
li
ng
(pr
opos
it
io
n)
c
an
be
app
li
e
d
to
est
i
m
at
e
the
lower
lim
it
(
ℬ
↓
)
and
uppe
r
lim
it
s
(
ℬ
↑
)
of
reduce
sta
ge
com
pleti
on
tim
e.
Since
,
we
possess
m
easur
em
ent
o
f
m
ean
and
m
axi
m
u
m
ta
sk
s
ti
m
e
per
iod
s
in
re
du
ce
s
ta
ge,
al
locat
ed
re
duc
e slots
ℬ
and the
nu
m
ber
of r
e
duce ta
s
k
ℬ
.
The
ref
i
nem
ent
li
es
in
com
puti
ng
the
ti
m
e
per
io
d
of
t
he
s
huf
fle
sta
ge
.
F
or
easi
ne
ss,
t
he
so
rt
sta
ge
is
m
erg
ed
w
it
h s
huff
le
sta
ge.
T
her
e
fore,
the s
huff
le
sta
ge
i
n t
he
rem
ai
nin
g r
edu
ce
phase
is
est
i
m
at
ed
as f
ol
lows
:
↓
=
(
ℬ
ℋ
−
1
)
∙
→
(5)
↑
=
(
ℬ
ℋ
−
1
)
∙
→
+
↑
(6)
Finall
y,
ta
king
Eq
uatio
n
(5)
and
(
6)
to
geth
er,
we
ca
n
f
orm
ula
te
the
lo
w
er
a
nd
up
per
li
m
it
of
the
over
al
l
jo
b
com
pleti
on
tim
e of
, which
is
sh
ow
n
as
foll
ows
:
↓
=
ℋ
↓
+
→
1
+
↓
+
ℬ
↓
(7)
↑
=
ℋ
↑
+
↑
1
+
↓
+
ℬ
↑
(8)
wh
e
re
↓
de
picts
the
opti
m
ist
ic
pr
edict
io
n
of
job
com
pleti
on
ti
m
e
and
↑
de
picts
the
pessim
istic
pr
e
dicti
on
of
jo
b
com
pleti
on
tim
e.
In
sec
ti
on
c
,
we
c
ompare
w
hethe
r
t
he
predict
io
n
t
ha
t
is
base
d
on
m
ean
value
bet
w
een
lowe
r
li
m
it
an
d u
pp
e
r
li
m
i
ts t
end
s
to
b
e
close
r
to
m
easur
ed
t
i
m
e p
erio
d.
Th
eref
or
e
, we sta
te
:
↑
=
(
ℋ
↑
+
↓
)
2
(9)
The
E
q
uati
on
(
7)
ca
n
be
re
-
w
r
it
te
n
fo
r
ℋ
↓
by
repl
aci
ng
pa
rts
with
E
q.
(
3)
an
d
(
5),
an
d
sim
il
ar
equ
at
io
n
for
s
ort
and re
duce sta
ges
as
foll
ows
:
↓
=
ℋ
∙
ℋ
→
ℋ
+
ℬ
∙
(
→
+
ℬ
→
)
ℬ
+
→
1
−
→
(10)
The
E
q
uatio
n
(
8) can
b
e
sim
pl
ifie
d
to
co
m
pu
t
e the m
akesp
a
n
ti
m
e is
as f
ol
lows
:
↓
=
↓
∙
ℋ
ℋ
+
↓
∙
ℬ
ℬ
+
↓
,
(11)
wh
e
re
↓
=
ℋ
→
,
↓
=
(
→
+
ℬ
→
)
,
an
d
↓
=
→
1
−
→
.
The
E
q
uatio
n
(
11)
,
re
pr
ese
nt
a
m
akesp
a
n
tim
e
of
jo
b
as
a
f
unct
ion
/
operati
on
of
m
a
p
a
nd
re
duce
sl
ots
assi
gn
e
d
t
o
j
ob
f
or
pe
rfo
r
m
ing
it
s
m
ap
and
re
duce
ta
sks
,
that
is, as a
f
un
ct
io
n of
(
ℋ
,
ℬ
)
. In
sim
il
ar
w
ay
↑
and
→
is wri
tt
en
as foll
ows
:
↓
=
→
∙
ℋ
ℋ
+
→
∙
ℬ
ℬ
+
→
,
(12)
↓
=
↑
∙
ℋ
ℋ
+
↑
∙
ℬ
ℬ
+
↑
,
(13)
3.3.
Resource
re
q
uire
ment
e
sti
mat
i
on
t
o
mee
t
t
as
k
de
ad
li
n
e
Her
e
we
e
valu
at
e
the
m
ini
m
u
m
nu
m
ber
of
m
ap
an
d
re
duc
e
slots
re
quire
d
t
o
m
eet
ta
sk
dead
li
ne
.
To
assure
gua
ran
ti
es
of
ta
sk
d
ea
dl
ine
of
a
Job
in
ti
m
e
we
nee
d
to
c
om
pu
te
w
hat
is
t
he
m
i
nim
u
m
nu
m
be
r
of
Ma
pRed
uce
sl
ots
need
e
d
t
o
be
al
locat
e
d
to
m
eet
ta
sk
dea
dline
with
in
put
da
ta
siz
e
ℐ
.
F
or
achie
ving
it
the
fo
ll
owin
g q
ues
ti
on
nai
res nee
ds t
o be c
onside
red.
Evaluation Warning : The document was created with Spire.PDF for Python.
Ind
on
esi
a
n
J
E
le
c Eng &
Co
m
p
Sci
IS
S
N:
25
02
-
4752
An
Acc
urate
and Ef
fi
ci
ent S
c
heduler f
or
Hadoo
p
M
apRed
uc
e Framew
or
k
(
D.
C.Vi
nuth
a
)
11
37
is co
ns
i
der
e
d as a lo
we
r
li
m
i
t of the
jo
b
m
a
kes
pan tim
e. Generall
y, this
aid in re
duci
ng a
m
ou
nt
of
resou
rces all
oc
at
ed
f
or jo
b
t
o m
eet
task d
ea
dl
ine
. T
his sett
ing m
igh
t no
t
be
ideali
sti
c in real
en
vi
ronm
e
nt.
is
c
onside
red
as
a
n
uppe
r
li
m
it
of
t
he
job
m
akesp
a
n
ti
m
e.
T
his
will
le
ad
t
o
over
al
lo
cat
ion
of
resou
rces
an
d
m
igh
t
le
ad
to
ver
y
sm
al
le
r
j
ob
c
om
pleti
on
t
i
m
e
than
beca
us
e
w
orst
case
sce
nar
i
o
a
re
ve
r
y
rar
e
phen
om
eno
n i
n p
rod
uctio
n
e
nv
i
ronm
ent.
is
c
on
si
der
e
d
as
a
m
ean
between
lo
wer
a
nd
uppe
r
li
m
i
ts
on
the
j
ob
m
akesp
a
n
ti
m
e.
Th
is
strat
eg
y
m
ay
aid in pro
vid
in
g bala
nce
d resou
rce all
oc
at
ion
/uti
li
zat
i
on that is cl
os
e
r
to
jo
b
m
akesp
an
tim
e
.
The
assi
gnm
ent
of
m
ap
an
d
r
edu
ce
slots
t
o
j
ob
f
or
m
eet
ing
ta
sk
dea
dline
co
ns
ide
rin
g
know
n
j
ob
pro
file
are e
valuate
d usin
g variat
io
n
in
E
q
uation
(
11),
wh
e
re
↓
,
↓
, and
↓
a
re
def
ine
d.
↓
.
ℋ
ℋ
+
ℋ
ℋ
+
↓
=
−
↓
(14)
The
E
q
uatio
n
(
14)
ca
n be sim
plifie
d
as
foll
ows
:
∙
=
ℐ
(15)
wh
e
re
an
d
de
pi
ct
s the num
ber
of
m
ap
an
d re
du
ce
slots all
oc
at
ed
to
jo
b
re
sp
ect
ively
, a
nd
,
and
ℐ
dep
ic
ts t
he
c
orrespo
nd
i
ng exp
ressio
n from
E
q
uatio
n
(
14).
The o
bj
ect
ive
of our m
od
el
is to m
ini
m
iz
e th
e num
ber
of m
ap
a
nd r
e
duce
slot f
or jo
b
. i.e.
, w
e
m
ini
m
iz
e
ℱ
(
,
)
=
+
over
∙
=
ℐ
. W
e
consi
der
Lagr
a
nge m
ultip
li
er a
nd set
ℒ
=
+
+
ℎ
+
−
ℐ
. By dif
fer
e
ntia
ti
ng
ℒ
with
res
pe
ct
to
,
an
d
an
d eq
uating t
o ze
ro, we
ob
ta
in
,
ℒ
=
1
−
ℎ
2
=
0
(16)
ℒ
=
1
−
2
=
0
(17)
ℒ
=
+
−
ℐ
=
0
(18)
So
lvi
ng E
q
uati
on
(
16), (
17)
a
nd (1
8)
sim
ultaneo
us
ly
, we
ob
ta
in
,
=
√
(
√
+
√
)
ℐ
,
=
√
(
√
+
√
)
ℐ
(19)
Using
these
e
quat
ion
t
he
opti
m
al
value
of
m
ap
an
d
reduce
slot
are
obta
in
ed
suc
h
t
hat
th
e
nu
m
ber
of
slots
is
m
ini
m
i
zed
wh
il
e
m
ee
ti
ng
ta
s
k
dead
l
ine
co
ns
trai
nt.
Her
e
we
r
ound
up
the
val
ues
ob
ta
ine
d
from
these
equ
at
io
n f
or
a
ppr
oxim
a
ti
on
. Si
nce th
ese
v
al
ue
s h
a
ve
t
o be i
nteg
ral.
In
ne
xt
sect
ion
the
perform
ance
eval
uatio
n
of
pro
po
se
d
sc
he
du
le
r
over
sta
te
of
art
te
c
hn
i
qu
e
is
s
how
n.
4.
RESU
LT
A
N
D ANALY
SIS
This
sect
ion
pr
ese
nt
pe
rform
ance
e
valua
ti
on
of
pro
po
sed
O
HMR
ov
e
r
sta
te
-
of
-
art
Hado
op
Ma
pRed
uce
F
r
a
m
ewo
r
k
[11].
Hado
op
is
the
m
os
t
widely
use
d/ad
op
te
d
Ma
pRed
uce
platfo
rm
fo
r
c
om
pu
ti
ng
on
cl
oud
e
nviro
nm
ents
[
17
]
,
he
nce
it
is
c
onsi
der
e
d
f
or
com
par
is
ons.
Ha
doop
2.0
i.e.
ver
s
ion
2
.7
is
us
e
d
an
d
is
dep
l
oyed
on
a
zur
e
cl
oud
us
i
ng
H
DInsig
ht.
The
Hado
op
cl
us
te
r
is
c
om
posed
of
one
m
a
ste
r
w
orke
r
no
de
a
nd
four
w
orke
r/sla
ve
nodes
.
Eac
h
work
e
r
node
is
de
plo
ye
d
on
A
3
virtu
al
m
achine
i
ns
ta
nce
s
w
hich
c
om
po
s
ed
of
4
virt
ual
com
pu
ti
ng
c
or
e
s,
7
GB
RAM
an
d
120
GB
of
st
orage
sp
ace
.
U
ni
form
con
fig
ur
a
ti
on
is
co
ns
ide
r
ed
f
or
bo
t
h
O
HMR
and
HMR.
F
or
e
xp
e
rim
ent
analy
sis
diff
ere
nt
a
pp
li
cat
io
n
are
consi
der
e
d
s
uc
h
as
Ge
ne
se
quencin
g
(Bioin
form
at
ics)
, Wo
r
d fr
e
qu
ency sta
ti
sti
cs com
pu
ta
ti
on
a
nd Hot
-
wor
d d
et
ect
ion
.
4.1.
Gene se
quenci
ng
Gen
e
se
qu
e
nce
al
ign
m
ent
is
a
fun
dam
ental
op
erati
on
a
dopte
d
to
ide
ntify
si
m
il
arities
that
exist
bet
wee
n
a
qu
e
ry
protei
n
seq
ue
nce,
D
NA
or
R
NA
a
nd
a
database
of
se
quences
m
ai
ntained.
Se
qu
e
nce
al
ig
nme
nt
is
com
pu
ta
ti
on
al
ly
heav
y
an
d i
ts
com
pu
ta
ti
on
c
om
plexity
is
re
la
ti
ve t
o pro
duct
of
tw
o seq
ue
nces
bei
ng
cu
rrentl
y
Evaluation Warning : The document was created with Spire.PDF for Python.
IS
S
N
:
2502
-
4752
Ind
on
esi
a
n
J
E
le
c Eng &
Co
m
p
Sci,
Vo
l.
12
, N
o.
3
,
Dece
m
ber
2
01
8
:
11
32
–
11
42
1138
analy
zed.
Ma
s
sive
vo
l
um
es
of
seq
ue
nces
m
ai
ntained
in
the
database
to
be
searc
he
d
in
duces
a
dd
it
ion
al
com
pu
ta
ti
on
bur
de
n.
BL
AS
T
is
a
widely
a
dopte
d
bio
i
nform
at
ic
s
too
l
f
or
seq
ue
nce
al
i
gnm
ent
wh
ic
h
pe
rfor
m
faster
al
ig
nm
e
nts,
at
e
xpense
of
acc
ur
acy
(possib
ly
m
issi
ng
s
om
e
po
te
nti
al
hits)
[18].
D
rawbac
ks
of
BLAS
T
and
it
s
im
pr
ov
e
m
ents
is
disc
us
se
d
i
n
[19].
Fo
r
e
valuati
on
he
re
t
he
im
prov
e
d
BLA
ST
al
gorithm
of
[
19
]
is
adopted
.
T
o
i
m
pr
ov
e
c
om
put
at
ion
ti
m
e
a
heu
risti
c
strat
e
gy
is
us
e
d
c
om
pr
om
isi
ng
accu
r
acy
m
ini
m
ally.
In
the
heurist
ic
strate
gy init
ia
l m
a
tch
is
fou
nd and
is l
at
er ex
te
nd
e
d
to
obtai
n
t
he c
om
plete
m
at
c
hing se
quence
.
Ex
per
im
ent
are
cond
ucted
to
evaluate
O
H
MR
and
HMR
perform
ance
fo
r
per
form
ing
gen
e
s
eq
ue
nc
e
al
ign
m
ent.
Th
e
dataset
f
or
exp
e
rim
ent
analy
sis
is
ob
ta
ined
f
ro
m
NCBI
[
20
]
.
For
perform
ing
al
ign
m
ent
Dros
ophila
database
as
a
refe
ren
ce
databas
e
an
d
Q
ue
ry
s
equ
e
nce
of
va
ried
siz
es
of
f
ro
m
Ho
m
o
sa
pien
s
chrom
os
om
al
s
equ
e
nces
a
nd
genom
ic
scaff
old
s
is
c
on
si
de
red
sim
il
ar
to
[1
9]
w
hic
h
are
ta
bu
la
te
d
i
n
Ta
ble
1
.
All
six
e
xp
e
ri
m
ent
are
co
nd
ucted
us
i
ng
B
LAS
T
al
gorith
m
on
HMR
a
nd
O
HMR
f
ra
m
ewo
r
ks
.
The
total
m
akesp
a
n
tim
e
of
both
HMR
and
O
HMR
f
or
al
l
six
e
xpe
rim
ent
is
no
te
d
an
d
gr
a
ph
is
plo
tt
ed
as
s
ho
wn
i
n
Fig
ure
2.
It
m
u
st
be
note
d
t
hat
the
init
ia
li
zat
io
n
tim
e
of
the
V
M
cl
us
te
r
is
not
consi
dered
is
c
om
pu
ti
ng
m
akesp
a
n
as it
is unif
or
m
in bo
t
h OH
M
R an
d HMR
owin
g
to
sim
i
la
r
cl
us
te
r
c
onfi
gurati
ons.
The
t
otal
m
a
kes
pan
of
O
HMR
an
d
H
MR
is
de
penden
t
on
t
as
k
exec
utio
n
ti
m
e
of
virt
ual
com
pu
ti
ng
/w
or
ker
no
des
duri
ng
Ma
p
an
d
R
edu
ce
ph
ase
.
T
he
total
m
akes
pan
obse
rved
in
BL
AS
T
se
quence
al
ign
m
ent ex
pe
rim
ents ex
ecut
ed o
n HMR
a
nd
O
HMR
fr
am
eworks is
sho
w
n
i
n Fi
g
ure
2.
The
outcom
es sh
ows
sign
ific
a
nt
perf
or
m
ance
in
te
r
m
s
of
re
duce
m
akesp
a
n
ti
m
e
s
of
O
HMR
over
HMR.
A
m
akes
pan
re
duct
ion
of
43.44%
,
44.
85%,
56.
9%,
57.
17%,
62.
83%
and
65.01%
is
ob
ta
ine
d
f
or
si
x
e
xperim
ent
by
O
HMR
over
HMR.
An av
e
ra
ge
m
a
kes
pan re
du
ct
i
on of
55.
03
%
is achie
ved b
y
OH
MR
over
H
MR
acro
ss
all
exp
e
rim
ents.
The
or
et
ic
al
m
a
kes
pan
of
O
H
MR
i.e.,
giv
e
n
by
Eq
uatio
n
(
11)
is
com
pu
te
d
a
nd
c
om
par
ed
a
gainst
the
pract
ic
al
va
lues
obse
rv
e
d
i
n
al
l
the
e
xp
e
ri
m
ents.
R
esults
ob
ta
ine
d
is
sho
wn
in
Fi
gure
3.
Mi
nor
va
riat
io
ns
is
ob
s
er
ved
bet
w
een
pr
act
ic
al
and
the
or
et
i
cal
m
akesp
an
c
om
pu
ta
ti
on
s.
O
ver
al
l
good
c
orrelat
ion
is
r
e
ported
betwee
n
pr
act
ic
al
m
akesp
an
va
lues
an
d
the
or
et
ic
al
m
akesp
an
val
ues.
Ba
se
d
on
the
r
esults
presente
d
it
is
ev
ident
that
exec
ution
of
BL
AST
se
qu
e
nce
al
ig
nme
nt
al
gorithm
on
pro
posed
OH
MR
yi
el
ds
su
p
erio
r
r
esult
s
w
hen
com
par
ed
to
s
i
m
i
la
r
exp
e
ri
m
ents
cond
ucted
on
e
xisti
ng
HMR
fr
am
ework.
A
ccu
rac
y
and
c
orrect
ness
of
theo
reti
cal
m
a
kes
pan m
od
el
o
f
OHMR
pres
ented
is
prove
d t
hro
ugh
c
orrel
at
ion
m
easur
es
.
Table
1
.
Sim
ul
at
ion
par
am
et
e
r
c
on
si
der
e
d
Exp
eri
m
en
t
I
d
Qu
ery g
en
o
m
e
Qu
ery g
en
o
m
e
size
Ref
erence gen
o
m
e
Ref
erence gen
o
m
e
size
1
NT_0
0
7
9
1
4
1
4
8
6
6
2
5
7
Dros
o
p
h
ila datab
ase
1
2
2
,653,9
7
7
2
AC_
0
0
0
1
5
6
1
9
3
1
7
0
0
6
Dros
o
p
h
ila datab
ase
1
2
2
,653,9
7
7
3
NT_0
1
1
5
1
2
3
3
7
3
4
1
7
5
Dros
o
p
h
ila datab
ase
1
2
2
,653,9
7
7
4
NT_0
3
3
8
9
9
4
7
0
7
3
7
2
6
Dros
o
p
h
ila datab
ase
1
2
2
,653,9
7
7
5
NT_0
0
8
4
1
3
4
3
2
1
2
1
6
7
Dros
o
p
h
ila datab
ase
1
2
2
,653,9
7
7
6
NT_0
2
2
5
1
7
9
0
7
1
2
4
5
8
Dros
o
p
h
ila datab
ase
1
2
2
,653,9
7
7
Figure
2
.
BLA
ST se
qu
e
nce al
ign
m
ent total
m
akesp
a
n
ti
m
e
observe
d for e
xp
e
rim
ents co
nducte
d on O
H
MR
and H
MR
fr
am
ewor
ks
0
50
1
0
0
1
5
0
2
0
0
2
5
0
3
0
0
3
5
0
4
0
0
4
5
0
1
2
3
4
5
6
Execu
tio
n
T
im
e
(s)
Exp
erim
en
t
n
u
m
b
er
Makes
pan
Tim
e
HM
R
OH
MR
Evaluation Warning : The document was created with Spire.PDF for Python.
Ind
on
esi
a
n
J
E
le
c Eng &
Co
m
p
Sci
IS
S
N:
25
02
-
4752
An
Acc
urate
and Ef
fi
ci
ent S
c
heduler f
or
Hadoo
p
M
apRed
uc
e Framew
or
k
(
D.
C.Vi
nuth
a
)
1139
Figure
3
.
Co
rr
e
la
ti
on
b
et
wee
n t
heoreti
cal
an
d p
racti
cal
m
akesp
a
n
ti
m
es f
or
BLAST
seque
nce ali
gnm
ent
execu
ti
on
on
O
HMR f
ram
ewo
rk
4.2.
Word fre
quen
cy
s
tatistics c
om
putat
i
on
s
The
w
ord
fr
e
qu
ency
sta
ti
sti
c
app
li
cat
io
n
is
de
velo
ped
us
in
g
J
ava
pro
gr
am
ing
la
ngua
ge.
Th
e
W
i
kip
e
dia
dataset
[2
1] is
consi
der
e
d f
or
exp
e
rim
ent an
a
ly
sis. Th
e
W
i
ki
ped
ia
d
at
aset
is hu
ge
i
n
siz
e
(
i.
e. >
100 GB
)
a
nd is
sp
li
t
into
2048
MB
each
an
d
store
d
i
n
Az
ure
c
lo
ud
c
on
ta
i
ner.
F
or
e
xperi
m
ental
analy
sis
this
w
ork
co
ns
ide
r
16GB
of
data.
The
w
ord
fr
e
quency
sta
ti
sti
cs
ap
plica
ti
on
s
w
ere
e
xec
uted
on
t
he
OH
MR
a
nd
HMR
f
ram
ewor
k
and
t
he
res
ults ob
ta
ine
d
a
re
note
d.
The
outc
om
es
sh
ow
s sign
i
ficant
pe
rfo
rm
ance
in
te
rm
s
of
r
ed
uce
m
akes
pan
tim
es
of
O
HM
R
ove
r
HMR.
A
m
akesp
an
re
du
ct
io
n
of
43.7%,
44.
34%,
45.
69%
an
d
51.
57
%
is
obta
ine
d
f
or
data
siz
e
of
20
48
M
B,
40
96
MB
,
8192
MB
a
nd
16384
MB
res
pec
ti
vely
by
O
HM
R
over
HMR.
An
ave
ra
ge
m
a
kes
pan
reducti
on
of
46
.39% is ac
hiev
ed by O
HMR
ov
e
r HMR
acr
os
s all
e
xp
e
rim
ents.
The
or
et
ic
al
m
a
kes
pan
of
O
H
MR
i.e.,
giv
e
n
by
Eq
uatio
n
(
11)
is
com
pu
te
d
a
nd
c
om
par
ed
a
gainst
the
pract
ic
al
va
lues
obse
rv
e
d
i
n
al
l
the
e
xp
e
ri
m
ents.
Re
su
lt
s
ob
ta
ine
d
is
sho
wn
in
Fi
g
ure
5.
Mi
nor
va
riat
io
ns
is
ob
s
er
ved
bet
w
een
pr
act
ic
al
and
the
or
et
ic
al
m
akesp
an
c
om
pu
ta
ti
on
s.
O
ver
al
l
good
c
orrelat
ion
is
r
e
ported
betwee
n
pr
act
ic
al
m
akesp
an
va
lues
an
d
the
or
et
ic
al
m
akesp
an
val
ues.
Ba
se
d
on
the
r
esults
presente
d
it
is
ev
ident
that
execu
ti
on
of
word
fr
e
quency
sta
ti
sti
c
app
li
cat
ion
on
pro
pose
d
O
HMR
yi
el
ds
superi
or
re
su
lt
s
wh
e
n
com
par
ed
to
s
i
m
i
la
r
exp
e
ri
m
ents
cond
ucted
on
e
xisti
ng
HMR
fr
am
ework.
A
ccu
rac
y
and
c
orrect
ness
of
theo
reti
cal
m
a
kes
pan m
od
el
o
f
OHMR
pres
ented
is
prove
d t
hro
ugh
c
orrel
at
ion
m
easur
es
.
Figure
4
.
Wo
r
d fr
e
quency
sta
ti
sti
c app
li
cat
ion
total
m
akespan ti
m
e o
bs
er
ve
d for e
xp
e
rim
ent con
du
ct
e
d on
OH
MR
a
nd
H
MR
f
ram
ewo
r
ks
0
20
40
60
80
1
0
0
1
2
0
1
4
0
1
6
0
1
2
3
4
5
6
Execu
tio
n
T
im
e
(s)
Exp
erim
en
t
n
u
m
b
er
Makes
pan
Tim
e
OH
MR
OH
MR-T
h
eo
r
y
0
50
1
0
0
1
5
0
2
0
0
2
5
0
3
0
0
2
0
4
8
M
B
4
0
9
6
M
B
8
1
9
2
M
B
1
6
3
8
4
MB
Execu
tio
n
tim
e
(s)
W
ik
ip
ed
ia
d
ata size
Makes
pan
t
i
m
e
obser
ved
HM
R
OH
MR
Evaluation Warning : The document was created with Spire.PDF for Python.
IS
S
N
:
2502
-
4752
Ind
on
esi
a
n
J
E
le
c Eng &
Co
m
p
Sci,
Vo
l.
12
, N
o.
3
,
Dece
m
ber
2
01
8
:
11
32
–
11
42
1140
Figure
5
.
Co
rr
e
la
ti
on
b
et
wee
n t
heoreti
cal
an
d p
racti
cal
m
akesp
a
n
ti
m
es f
or
word f
re
qu
e
nc
y st
at
ist
ic
app
li
cat
ion
e
xe
cution o
n O
H
MR
f
ram
ewo
r
k
4.3.
H
ot
-
w
or
d d
e
t
ection
co
m
pu
t
at
i
on
s
The
hot
-
w
ord
detect
io
n
al
gorithm
[22]
is
de
velo
pe
d
us
i
ng
Ja
va
pr
ogram
ing
la
ngua
ge
.
The “
M
ov
ie
tw
eet
ing
s”
datase
t
[23]
is
co
ns
id
ered
f
or
e
xperi
m
ent
analy
sis
and
sto
red
i
n
A
zur
e
cl
oud c
on
ta
ine
r.
Tweets
c
onsist
ing
of
2000
0,
40000,
6000
0
and
80000
m
ov
ie
s
is
co
ns
i
de
red
a
nd
is
repr
esented
as
20
K,
40
K,
60K
an
d
80K.
The
hot
-
w
ord
detect
ion
al
go
r
it
h
m
wer
e
e
xe
cuted
on
the
O
HMR
a
nd
HM
R
f
ram
ewo
r
k
a
nd
the
resu
lt
s
obta
ine
d
a
re
note
d.
T
he
t
otal
m
akesp
an
ti
m
e
of
O
HMR
a
nd
e
xis
ti
ng
m
od
el
is
no
te
d
a
nd
is
s
how
n
i
n
Fig
ure
6.
E
xp
e
rim
ent an
al
yses
shows
a
s nu
m
ber
o
f
tweet
s
incr
eases t
he
c
om
pu
ta
ti
on
ti
m
e o
f b
oth
OHM
R and
HMR
incr
ease
s.
T
he
ou
tc
ome
s
sho
ws
si
gn
i
ficant
perform
ance
in
te
rm
s
of
re
du
ce
m
akesp
a
n
ti
m
es
of
O
HMR
ov
e
r HMR
.
A
m
akesp
a
n red
uc
ti
on
of
54.19
%,
45.13%
, 60.6
8%
a
nd
54.
69% is
obtai
ne
d f
or tweet
siz
e
of 20
K,
40K,
60
K
a
nd
80K
res
pecti
ve
ly
by
O
HMR
over
HMR.
A
n
aver
a
ge
m
akes
pan
re
duct
ion
of
53.
67%
is
a
chieve
d
by OHMR
ov
e
r
HMR a
cr
os
s
al
l exp
e
rim
ents
.
The
or
et
ic
al
m
a
kes
pan
of
O
H
MR
i.e.,
giv
e
n
by
Eq
uatio
n
(
11)
is
com
pu
te
d
a
nd
c
om
par
ed
a
gainst
the
pract
ic
al
va
lues
obse
rv
e
d
i
n
al
l
the
e
xp
e
ri
m
ents.
Re
su
lt
s
ob
ta
ine
d
is
sho
wn
in
Fi
g
ure
7.
Mi
nor
va
riat
io
ns
is
ob
s
er
ved
bet
w
een
pr
act
ic
al
and
the
or
et
ic
al
m
akesp
an
c
om
pu
ta
ti
on
s.
O
ver
al
l
good
c
orrelat
ion
is
r
e
ported
betwee
n
pr
act
ic
al
m
akesp
an
va
lues
an
d
the
or
et
ic
al
m
akesp
an
val
ues.
Ba
se
d
on
the
r
esults
presente
d
it
is
ev
ident
that
exec
utio
n
of
H
ot
-
wor
d
de
te
ct
ion
on
pr
opos
e
d
OH
M
R
yi
el
ds
superi
or
res
ults
w
he
n
c
om
par
ed
t
o
sim
il
ar
exp
e
rim
ents
co
nducted
on
e
xi
sti
ng
HMR
f
ra
m
ewo
r
k.
Accuracy
a
nd
c
orrec
tness
of
t
heoret
ic
al
m
akesp
an
m
od
e
l
of OHMR
pres
ented
is
prove
d t
hro
ugh
c
orrel
at
ion
m
easur
es
.
Fi
gure
6
.
H
ot
-
word de
te
ct
ion t
otal m
akesp
an
ti
m
e o
bs
er
ve
d for e
xp
e
rim
e
nt con
du
ct
e
d o
n OH
MR
a
nd
HMR
fr
am
ewo
r
k
0
20
40
60
80
1
0
0
1
2
0
1
4
0
2
0
4
8
M
B
4
0
9
6
M
B
8
1
9
2
M
B
1
6
3
8
4
MB
Execu
tio
n
tim
e
(s)
W
ik
ip
ed
ia
d
ata size
Makes
pan
t
i
m
e
obser
ved
OH
MR
OH
MR-T
h
eo
r
y
0
20
40
60
80
1
0
0
1
2
0
2
0
K
4
0
K
6
0
K
8
0
K
Execu
tio
n
tim
e
(s)
Nu
m
b
er
o
f
twitter
f
eeds
co
n
sid
ered
Makes
pan
t
i
m
e
obser
ved
HM
R
OH
MR
Evaluation Warning : The document was created with Spire.PDF for Python.
Ind
on
esi
a
n
J
E
le
c Eng &
Co
m
p
Sci
IS
S
N:
25
02
-
4752
An
Acc
urate
and Ef
fi
ci
ent S
c
heduler f
or
Hadoo
p
M
apRed
uc
e Framew
or
k
(
D.
C.Vi
nuth
a
)
1141
Figure
7
.
Co
rr
e
la
ti
on
b
et
wee
n t
heoreti
cal
an
d p
racti
cal
m
akesp
a
n
ti
m
es f
or
BLAST
seque
nce ali
gnm
ent
execu
ti
on
on
O
HMR f
ram
ewo
rk
In
this
sect
ion
t
he
e
xec
ution
of
the
im
pr
eci
se
and
bio
in
f
or
m
a
ti
cs
ap
plica
ti
ons
nam
el
y
word
fr
e
qu
e
ncy
sta
ti
sti
cs,
ho
t
word
de
te
ct
ion,
an
d
ge
ne
se
quenci
ng
(BL
A
ST)
is
prese
nted.
The
res
ults
presente
d
here
pro
ve
that
the
OH
M
R
m
od
el
reduc
es
the
m
akesp
an
obse
rv
e
d
due
to
the
optim
ized
m
akesp
an
m
od
el
incorporate
d
i
n
to
HMR.
An
a
ver
a
ge
re
du
ct
i
on
of
53.
67%
f
or
w
ord
f
reque
ncy
sta
ti
sti
cs
and
46.
39%
f
or
the
ho
t
w
ord
de
te
ct
ion
is
re
ported
a
nd
53.
67%
f
or
th
e
ge
ne
se
qu
e
nc
ing
(BL
AS
T
)
co
ns
id
erin
g
th
e
O
HMR
m
odel
w
hen
c
om
par
ed
to
the
e
xisti
ng
H
MR
m
od
el
[
11]
.
T
he
c
um
ul
at
ive
a
naly
sis
ov
e
r
sta
te
-
of
-
a
rt
te
ch
nique
in
Ta
ble
II
s
hows
t
he
eff
ic
ie
ncy
of
O
HMR
ov
e
r
sta
te
-
of
-
a
rt
te
ch
ni
qu
e
in
te
rm
s
of
r
obus
t
ness
an
d
scal
a
bili
ty
.
S
ince,
O
HMR
s
upport
execu
ti
on
of
diff
e
ren
t
a
ppli
cat
ion
su
c
h
as
Bi
oinfo
rm
atics
and
te
xt
m
ining
over
cl
oud
pla
tfor
m
s.
O
ur
O
HMR
m
akesp
a
n
m
odel
ai
ded
in
bet
te
r
cl
oud
re
sou
rce
util
iz
at
ion
.
The
oret
ic
al
com
par
ison
e
va
luati
on
is
c
on
s
idere
d
and
at
ta
ine
d
be
tt
er
resu
lt
w
he
n
com
par
ed
with
[12]
an
d
[
14]
.
Ado
ption
cl
ou
d
platf
or
m
a
id
in
pr
ov
i
ng
s
cal
abili
t
y
of
processi
ng
of
la
rg
e
am
ount
of
data
of
va
rio
us
ty
pes
on
la
r
ge
com
pu
ti
ng
cl
us
te
rs.
All
these
featu
re
at
tribu
te
d
to the pe
rfor
m
ance im
pr
ove
m
ent o
f
OHM
R o
ver
sta
te
-
of
-
art m
od
el
s.
Table
2
.
C
om
par
iso
n wit
h
sta
t
e of art tec
hniq
ue
[11
]
[12
]
[13
]
[14
]
[15
]
O
H
M
R
MapR
ed
u
ce platf
o
r
m
co
n
sid
ered
Had
o
o
p
Had
o
o
p
Had
o
o
p
Had
o
o
p
Had
o
o
p
Had
o
o
p
Clo
u
d
adopted
Yes
NO
Yes
Yes
No
Yes
Ap
p
licatio
n
con
sider
ed
Bio
in
for
m
atics
W
o
rd co
u
n
t
W
o
rd co
u
n
t
an
d
T
era
so
rt
W
o
rd co
u
n
t
an
d
Sort
W
o
rd co
u
n
t
an
d
Sort
Bio
in
for
m
atics
an
d
text
m
in
in
g
Makes
p
an
acc
u
rac
y
ev
alu
atio
n
con
sid
e
red
No
Yes
No
Yes
No
Yes
Av
erage percentag
e
i
m
p
rov
e
m
en
t ov
er
HMR f
ra
m
ewo
rk
4
0
.28
%
1
3
.33
%
3
4
.83
%
2
7
.7%
4
3
.91
%
5
1
.16
%
5.
CONCL
US
I
O
N
The
sig
nifica
nc
e
of
cl
ou
d
c
om
pu
ti
ng
platf
orm
s
is
disc
us
se
d.
Com
m
on
ly
adopted
Ha
doop
m
ap
re
du
ce
fr
am
ewo
r
k
w
orki
ng
with
it
s
draw
bac
ks
is
pr
esented
.
T
o
lo
wer
m
akesp
an
tim
es
and
e
na
bl
e
eff
ect
ive
util
iz
at
ion
of
cl
oud
re
sour
ces
this
pa
per
pro
poses
a
n
O
H
MR
fr
am
ew
ork.
The
m
ai
n
co
ntributi
on
of
th
is
w
ork
is
pres
entin
g
an
acc
ur
at
e
an
d
e
ff
ic
ie
nt
m
akesp
a
n
m
od
el
f
or H
ad
oop
Ma
pR
edu
ce
f
ram
e
work.
T
he
am
ou
nt r
es
ource r
e
qu
i
re
d
to
m
eet
ta
sk
dead
li
ne
is
done
base
d
m
akesp
a
n
m
od
el
prese
nt
ed
he
re.
To
ev
al
uate
the
perf
or
m
ance
of
pr
opose
d
OH
MR
fr
am
e
work
com
pu
ta
ti
on
al
ly
hea
vy
bio
in
f
or
m
at
ic
s
app
li
cat
io
n
an
d
im
pr
eci
se
ap
plica
ti
on
s
uch
as
w
or
d
fr
e
qu
e
ncy
sta
ti
sti
cs
and
hot
w
ord
detect
ion
is
co
ns
ide
red.
Pe
rfor
m
ance
of
O
HMR
fram
ewo
rk
is
c
om
par
ed
wit
h
HMR
f
ram
ewo
rk
in
te
rm
s
of
m
akesp
a
n
ti
m
e
.
A
ve
rag
e
ove
r
al
l
m
akesp
an
ti
m
es
reducti
on
of
55.
03%,
46.
39,
a
nd
53.67%
is
ac
hi
eved
us
in
g
O
H
MR
fr
am
ewo
r
k
w
hen
com
pared
t
o
HMR
f
ra
m
ewo
r
k
f
or
BL
AS
T,
wor
d
fr
e
qu
e
ncy
sta
ti
sti
cs,
and
hot
w
ord
detect
ion
ap
plica
ti
on
s
.
Ex
pe
rim
ents
pr
ese
nted
pro
ve
rob
us
tnes
s
of
OH
MR
fr
am
ework,
it
s
capa
bili
ty
t
o
ha
nd
le
di
verse
a
pp
li
cat
ions
on
pu
blic
an
d
pr
i
vate
cl
ou
d
pl
at
fo
rm
s.
Re
sul
ts
pr
e
sente
d
t
hro
ugh
exp
e
rim
ents
co
nducted
pro
ve
su
pe
rio
r
pe
rform
ance
of
O
H
MR
again
st
Ha
doop
fr
am
ework.
G
ood
m
at
c
hing
i
s
repor
te
d betwe
en
the
the
or
et
i
cal
m
akesp
a
n of O
HMR p
res
ented
a
nd e
xpe
rim
ental
v
al
ues
obser
ve
d.
0
20
40
60
80
1
0
0
1
2
0
1
4
0
2
0
K
4
0
K
6
0
K
8
0
K
Execu
tio
n
tim
e
(s)
Nu
m
b
er
o
f
twitter
f
eeds
co
n
sid
ered
Makes
pan
t
i
m
e
obser
ved
OH
MR
OH
MR-T
h
eo
r
y
Evaluation Warning : The document was created with Spire.PDF for Python.