Indonesian Journal of Electrical Engineering and Computer Science
Vol. 12, No. 3, December 2018, pp. 995~1002
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v12.i3.pp995-1002
Journal homepage: http://iaescore.com/journals/index.php/ijeecs
Efficient H.264 Decoder Architecture Using External Memory and Pipelining
G.R. Poornima1, S C Prasanna Kumar2
1 Dept. of E&CE, Sri Venkateshwara College of Engineering, Bangalore
2 Dept. of Electronics & Instrumentation Technology, RV College of Engineering, Bangalore
Article Info
Article history:
Received Apr 30, 2018
Revised Jul 14, 2018
Accepted Aug 21, 2018
ABSTRACT
H.264 is one of the most popular video coding standards and brings significant improvements to video broadcasting and streaming applications. It achieves strong compression, but it needs heavy computation and complex algorithms to provide better image quality and compression rate. In H.264 coding, the design of the decoder is a key factor for efficient coding. In this paper, we design a decoder for a complex input stream. We apply several improvements: loop rearrangement, buffer upgradation, buffer supplement, memory reusability and a pipelining architecture. We have also modified the memory structure. The designed decoder achieves better frame decoding efficiency than state-of-the-art methods. The proposed approach also provides good area optimization with a maximum frequency of 355 MHz.
Keywords:
H.264 decoder
Pipelining architecture
Memory reusability
Luma/Chroma
Copyright © 2018 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
G.R. Poornima,
Dept. of E&CE, Sri Venkateshwara College of Engineering, Bangalore.
Email: poornima_g_r@yahoo.com
1. INTRODUCTION
H.264 is the most broadly used standard for video coding and brings significant improvements to video broadcasting, video streaming and optical disc storage. It was established by the Joint Video Team (JVT) of ITU-T and ISO/IEC and is also known as MPEG-4 Part 10 Advanced Video Coding (AVC). Most of the bits transmitted wirelessly in communication networks use MPEG-4 Part 10 AVC. H.264 offers high compression efficiency, which is why it is so frequently used in video coding. It introduces several new features, including inter prediction, intra prediction, variable block size and context-based adaptive entropy coding [1]. All of these features need complex compression algorithms and heavy computation to provide better image quality and compression rate. H.264 coding draws on source code from different domains, such as computational physics, computer science and machine learning, which makes the source code complex and its synthesis challenging. The complexity of the H.264 decoder is much higher than that of an MPEG-4 decoder.
A H.264 decoder usually contains a pipelined architecture that operates on 4x4 sub-blocks. The data processing time and the complexity of each pipeline stage depend on the type of data and the decoding method. The timing of each stage should be known, so that the stages that require more processing time can be balanced by an efficient decoder architecture. An efficient decoder architecture can relieve all the time-consuming stages by reducing their complexity to a certain level. Run-time analysis shows that motion compensation (MC) uses 55% of the decoding time, so it is a crucial factor when designing a decoder architecture for performance [2-3]. Reading pixel data with less complexity can increase the performance of the decoder. H.264 decoders usually have three units: the MC unit discussed above, the deblocking filter unit, and the data-out unit, which transfers image data to the display device.
The size of the code, the complexity of the data structures and the depth of the function hierarchy should be limited as much as possible. Different design methodologies affect the efficiency of the decoder. Some researchers follow bottom-up methodologies, which combine block-level design with system-level design, whereas others follow top-down methodologies, which treat the whole function as a single unit. There is also a difference between HLS-generated hardware and standalone hardware. The different methodologies therefore serve different goals and applications. A complex application may benefit more from approaches that break the code into pieces and optimize each piece, whereas a simple application may require additional methods to become effective. The choice therefore depends entirely on the purpose of the application.
To reduce the memory footprint, researchers have proposed several optimizations. HEVC interpolation hardware is presented in [4-7], where different techniques are compared. These approaches do not use a memory-based implementation. In [4], three different 8-tap FIR filters are implemented using a configurable path that can evaluate a single filter output at a time; for this reason, it can be used only for motion compensation. In [5], the presented hardware design uses a multiplierless constant multiplication (MCM) approach for multiplying by constant factors. In [6-7], adders and shifters are used to generate the FIR filters.
In this paper, we demonstrate H.264 video decoding in a real application and describe the types of complexity that arise in top-down approaches. The design of the decoder core is very important for fast and power-efficient decoding. Many online platforms, such as YouTube and Facebook, use H.264 for video coding. Popular manufacturers such as Apple and Qualcomm (Snapdragon) also use H.264 video coding in their processors. This paper describes how we improve the design process and which difficulties we face while designing a decoder. We synthesized the code, optimized it and achieved a throughput that outperforms state-of-the-art techniques. The paper is organized as follows. Section 2 gives a brief review of related work on decoder design. Section 3 presents an overview of the H.264 decoder. Section 4 describes our proposed optimization techniques for designing an H.264 decoder. The performance and result evaluation is shown in Section 5. The last section concludes the paper.
2. RELATED WORK
High-level synthesis (HLS) tools are widely promoted as a way to achieve prototyping and rapid hardware design at the register transfer level. J. Andrade et al. [8] supported this claim with a complex application design that implements a low-density parity-check (LDPC) decoder. An HLS tool helps the user break the exploration of a large design space into multiple smaller designs, which makes it highly productive. An HLS tool can also explore the microarchitecture of the generated design. The LDPC decoders in [8] were developed using HLS tools only and achieve an average throughput.
S. Baldev et al. [9] developed an efficient 5-stage pipelined architecture of the deblocking filter for an HEVC decoder design. The luma/chroma samples are applied vertically to the edge filters of the design to obtain maximum throughput and to minimize the number of clock cycles. The proposed architecture was developed on FPGA and ASIC platforms using 90-nm technology. The results show that UHD videos are decoded at 200 fps.
F. Leduc-Primeau et al. [10] developed a design approach known as the quasi-synchronous design approach. The LDPC decoder is allowed to experience timing violations, which are handled through proper modeling by HLS. The newly designed circuits provide the same performance under the same area constraint, with an energy reduction of 32%. The design of the Node Processing Units (NPUs) in an LDPC decoder [11] is very important for both hardware resources and processing performance. NPU architectures help the decoder keep hardware utilization low at the maximum operating frequency. The synthesis outcome confirms the hardware efficiency of the proposed architecture.
T. Mallikarachchi et al. [12] proposed a framework to reduce the complexity of decoding. By reducing the decoder complexity, they reduced the energy consumption during media playback. The framework also improved bit rate and video quality.
H. Kim et al. [13] developed an efficient multicore HEVC architecture that supports ultra-high-definition content. Standard HEVC suffers from data dependencies, which make it inefficient for parallel processing. Their novel memory organization resolves these data dependencies and makes the design efficient for parallel processing. They implemented the deblocking filter with skip-mode pipelining to achieve high throughput.
3. H.264 STANDARD VIDEO DECODING
The H.264 standard has a number of profiles, which cover different encoding features, frame rates and resolutions. It is therefore mandatory to define the profile when designing a decoder, so that the decoder supports that specific profile. In this paper, we design a decoder for the Main profile, which is normally used in standard video streaming and broadcasting. The input of the decoder is an encoded YUV video, where Y represents the luminance component and U and V represent the chrominance components. Human eyes are more sensitive to brightness than to color; for this reason, the chrominance data is exploited to improve encoding efficiency.
In the H.264 standard, video is encoded frame by frame. I-frames are encoded without any information from past or future frames. Encoding a P-frame requires information from a previous frame, whereas encoding a B-frame requires information from both past and future frames. The encoded video is stored in the form of a bit stream. The file format of the encoded video provides information about the type of each frame. The H.264 input file format is shown in Figure 1.
Figure 1. Structure of H.264 encoded file
The SPS and PPS units of the input file (frame) contain information about the decoding parameters and the frame size. The IDR unit of the input file is the first slice, in which the frame is further divided into macroblocks. Each slice header carries basic information about the slice, such as the slice identifier, the number of macroblocks, the quantization factor and the frame configuration.
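The units named above correspond to NAL unit types in the H.264 byte stream. As a minimal sketch (not the authors' parser), the following Python assumes an Annex B style stream with start codes and classifies each unit by the low five bits of its header byte. The function name `list_nal_units` and the sample bytes are hypothetical.

```python
# Subset of the NAL unit type codes defined by the H.264 standard.
NAL_TYPES = {1: "non-IDR slice", 5: "IDR slice", 7: "SPS", 8: "PPS"}


def list_nal_units(stream: bytes) -> list:
    """Return the type name of each NAL unit that follows a 00 00 01 start code."""
    names = []
    i = 0
    while i + 3 < len(stream):
        if stream[i:i + 3] == b"\x00\x00\x01":
            header = stream[i + 3]
            nal_type = header & 0x1F  # low five bits hold nal_unit_type
            names.append(NAL_TYPES.get(nal_type, f"type {nal_type}"))
            i += 3
        else:
            i += 1
    return names


# Hypothetical stream: SPS, PPS, then an IDR slice (payload bytes are dummies).
sample = (b"\x00\x00\x00\x01\x67\x42"
          b"\x00\x00\x00\x01\x68\xce"
          b"\x00\x00\x01\x65\x88")
print(list_nal_units(sample))
```

A real parser would also remove emulation prevention bytes and decode the SPS/PPS fields; the sketch only shows how the unit types are told apart.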
The decoder receives the compressed bit stream from the encoded input file, and the entropy decoder decodes it into a set of quantized coefficients. The residual image information is obtained using the inverse quantization and inverse transformation units. The residual data, the previously decoded data and the prediction information are then combined to form the final decoded image.
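The last step of this flow, combining prediction and residual, can be sketched in a few lines. This is an illustrative sketch only: it assumes 8-bit samples and a residual that has already passed through inverse quantization and inverse transformation, and the function name `reconstruct_block` is hypothetical.

```python
def reconstruct_block(prediction, residual):
    """Add the residual to the prediction sample by sample, clipped to 0..255."""
    return [
        [max(0, min(255, p + r)) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


# Hypothetical 2x2 example; real H.264 blocks are 4x4 or larger.
pred = [[250, 10], [100, 0]]
res = [[10, -20], [5, -1]]
print(reconstruct_block(pred, res))
```

The clipping matters in hardware as well, because the sum of an 8-bit prediction and a signed residual needs more than 8 bits before saturation.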
In this paper, we mainly focus on obtaining a higher-resolution decoded image at a high frame rate. For this purpose, we consider a design process in which parameterized buffers are implemented as a temporary storage solution. The functional block diagram of an H.264 video decoder, including the intra prediction unit, is shown in Figure 2.
Figure 2. H.264 Decoder functional block diagram
4. IMPROVED H.264 DECODER WITH SIMULATION ARCHITECTURE
In this section, we discuss the design and methodology used to obtain an improved implementation of the H.264 decoder. We mainly concentrate on the difficulties of designing a module that can decode video frames efficiently. For the overall design process, a use case is considered for the hardware module that includes requirement verification, code writing and iterative source optimization of single or multiple functions, along with system-level optimization. The simulated architecture of the H.264 decoder is shown in Figure 3. The architecture shows the connectivity of the external memory with the shared bus and the interface unit.
Figure 3. Simulated H.264 decoder architecture
4.1. Improved Synthesis Process
In this section, we improve the application iteratively by improving the code. We first improve the design for minimum area by implementing a maximally parallelized and pipelined architecture. However, this architecture cannot be used in a complex application, because applying global parallelization makes the area cost too high. For this reason, we must be more selective when optimizing area and performance in a complex application. We wrote our code in the Verilog hardware description language with the help of the Xilinx ISE tool. This tool provides an environment to develop the design code, synthesize it and simulate the developed design.
4.1.1 Building Individual Function
We start by building every individual function in a fashion that delivers good performance on each call. We divided all single functions into groups and applied several enhancements. We concentrated more on performance optimization than on area optimization.
4.2. Memory Reusability with Pipelining Architecture
Making memory references efficient is a key factor in the reference software (CPU) implementation. A hardware implementation of shared memory can be expensive, even when local BRAM blocks are used, and may cause a performance bottleneck in the system. Splitting data into individual registers or small local data arrays that can be reused is therefore an important factor in system performance. The decision to split into registers or to implement small local data arrays depends on three factors. First, the functions that use the array should each be significant on their own. Second, the functions that use the array should exhibit data parallelism and should not be limited by memory ports. Third, the BRAM implementation matters: reducing BRAM usage can be more effective than avoiding register use.
This strategy is applied in our design implementation. If the size of a local buffer makes complete partitioning infeasible, we create additional local buffers that can be reused efficiently. In our code, each iteration reads 5-6 overlapping data items, of which only one is new. By creating a local buffer, we therefore read only one new data item in each iteration.
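The overlapping reads described above are typical of a sliding-window filter. As a hedged illustration (a software model, not the authors' Verilog), the sketch below keeps a six-entry local buffer and fetches exactly one new sample per iteration. It assumes the standard H.264 six-tap luma half-sample filter (1, -5, 20, 20, -5, 1) as the consumer of the window; the function name `half_pel_row` is hypothetical.

```python
from collections import deque

TAPS = (1, -5, 20, 20, -5, 1)  # H.264 six-tap luma half-sample filter


def half_pel_row(row):
    """Filter one row of samples while reading each input sample only once."""
    window = deque(maxlen=len(TAPS))  # local buffer acting as a shift register
    reads = 0
    out = []
    for sample in row:            # one new memory read per iteration
        window.append(sample)     # the oldest sample drops out automatically
        reads += 1
        if len(window) == len(TAPS):
            acc = sum(t * s for t, s in zip(TAPS, window))
            out.append(max(0, min(255, (acc + 16) >> 5)))  # round and clip
    return out, reads


filtered, reads = half_pel_row([100] * 8)
print(filtered, reads)
```

Without the local buffer, each of the three outputs in this example would need six reads (18 in total); with it, the eight input samples are read once each.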
Up to this point, the improvements target efficiency and memory reusability through low-level function calls. We can now further improve the system with respect to area constraints. A pipelined architecture helps considerably in improving performance and also brings significant area optimization. We apply pipelining based on the number of iterations to further improve the performance of the system along with area optimization.
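The benefit of pipelining a loop grows with its iteration count, which a simple cycle model makes visible. The sketch below is a textbook approximation under stated assumptions, not a measurement from this design: a loop body of `stage_depth` cycles either runs sequentially or is pipelined with a fixed initiation interval. The name `loop_cycles` and the example numbers are hypothetical.

```python
def loop_cycles(iterations, stage_depth, initiation_interval=None):
    """Estimate the cycles needed to run a loop.

    Without an initiation interval the loop is treated as not pipelined, so
    every iteration costs the full stage depth. With one, a new iteration
    starts every `initiation_interval` cycles once the pipeline is filled.
    """
    if iterations <= 0:
        return 0
    if initiation_interval is None:
        return iterations * stage_depth
    return stage_depth + (iterations - 1) * initiation_interval


# Hypothetical loop: 16 iterations, 4-cycle body.
print(loop_cycles(16, 4))      # sequential estimate
print(loop_cycles(16, 4, 1))   # pipelined estimate, one start per cycle
```

For short loops the fill time dominates and the gain is small, which is one reason to choose pipelining candidates by iteration count.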
4.2.1 Improving Through Cross Function Examination
We start by developing function sequences and analyzing their importance in the application. We improve the leaf function sequences in descending order of significance, which is the most effective way to enhance the important call sequences. The profiling data is then updated, and we identify the different sequences of function calls. These calls may involve the improved leaf functions or several other functions on the call stack. The latency of each function is analyzed together with its number of calls on the call stack. This data helps us find the critical path in the call stack.
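Ranking functions by their total contribution is one simple way to use such profiling data. The following sketch assumes the profile is available as per-call latency and call count for each function; the function names and numbers in the example are hypothetical and are not taken from the paper's measurements.

```python
def rank_by_total_latency(profile):
    """Sort functions by latency-per-call times call count, largest first.

    `profile` maps a function name to (latency_cycles_per_call, call_count).
    """
    totals = {name: lat * calls for name, (lat, calls) in profile.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical profile of three decoder functions.
profile = {
    "idct4x4": (20, 1500),
    "mc_luma": (90, 900),
    "deblock_edge": (35, 400),
}
for name, cycles in rank_by_total_latency(profile):
    print(name, cycles)
```

The function at the top of this list is the first candidate for optimization; a full critical-path analysis would also account for which calls can overlap.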
4.2.2 Luma and Chroma Parallel Data Flow Design
Figure 4 shows the luma and chroma parallel data path unit, in which buffers are utilized. The weight predictor estimates the weights for both the luma and chroma data samples, and the data is stored in small buffers. The averaged sum of the predicted weight, the buffer data and the individual predicted weight are applied to a multiplexer that selects the final output.
Figure 4. Luma-Chroma Data flow design
4.3. Looping Rearrangement and Function Inlining
Some factors can completely prevent pipelining of a loop. The control flow of the loop may degrade performance when a conditional check is applied in every loop iteration. For these reasons, we reorder every loop whose latency needs to be optimized, taking its suitability for pipelining into account.
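One common form of such a rearrangement is moving a loop-invariant condition out of the loop body. The sketch below illustrates the idea in software under the assumption that the condition does not change during the loop; both function names are hypothetical, and the actual loops reordered in the design are not listed in the text.

```python
def scale_checked_in_loop(samples, use_weight, weight):
    """Original shape: the condition is tested on every iteration."""
    out = []
    for s in samples:
        if use_weight:
            out.append(s * weight)
        else:
            out.append(s)
    return out


def scale_hoisted(samples, use_weight, weight):
    """Rearranged shape: the decision is made once, the loop body is branch-free."""
    factor = weight if use_weight else 1
    return [s * factor for s in samples]


data = [3, 5, 7]
print(scale_hoisted(data, True, 2), scale_hoisted(data, False, 2))
```

Both versions compute the same result, but the second has a uniform loop body, which is the shape a pipelined hardware loop prefers.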
During single-function optimization, we found in several cases that function inlining is needed for both latency and area optimization. However, function inlining does not always bring a significant benefit from specialization. Because resources are duplicated, there is a tradeoff between function call overhead and increased area. With the inlining controls of the tool, a function can be inlined universally or its inlining can be prevented. In our case, we tried to obtain the maximum benefit at each call site. In some cases, however, it is beneficial to inline only a subset of the frequently used call sites. After identifying the important positions with potential savings, we implemented function inlining. Using the profile data and the call site positions, we assessed the significance of all call sites of each candidate function. We explored multiple alternative inlining directives, which helps in estimating the latency saving and the area cost, but we finalized the inline choices manually for the implemented functions.
Buffer upgradation
As discussed earlier, buffers play a very significant role in executing parallel algorithms when a single function is improved. Initially, we start from local buffer insertion to improve a single leaf function. At this stage, we do not consider where the buffer should be defined in the call stack to obtain the maximum benefit from parallelization; we only evaluate the individual buffer to be defined in the call stack. A buffer can enable parallelism among sub-functions at a higher level of the call stack, but at the same time it adds complexity at the inter-block edges, which incur overhead when data is copied from global edges. In this situation, a conflict arises in sharing the data, because multiple sub-functions reuse the same buffer. We analyze the call stack of each local buffer and try to define it at the earliest place where the overhead is minimal and the sub-functions gain the maximum benefit from parallelism.
4.4. System Level Parallel Processing
Once the call stack improvement is done, every single function has been improved along with the important function sequences in the call stack. Instead of optimizing less significant functions, we now optimize the data buffering and data transport across different portions of the decoder that are not directly connected through the function call stack. After this data optimization, we identify the top-level dependencies in the profiling data, which helps in finding the data dependencies between the buffers of the top-level functions in the call stack.
4.4.1 Buffer Supplement and Task Level Parallelism
As discussed earlier, the input data stream is complex, so after the call stack improvement and local buffer insertion, the handling of external data becomes a critical factor in performance. We therefore want to develop a core system that is independent of the target resolution, in which part of the memory is accessed through an external bus for data transfer rather than through local memories. Moreover, the implemented local buffers are already well optimized for parallelism within their local functions, so there is no need to facilitate their reuse throughout the call stack region.
Using the statistics of the profile information, the critical functions are determined and a clustering of the function call graph is developed. With the help of this clustering, a comparatively large local data array is created at the system level, which is used either directly or indirectly. Direct use of this larger array is possible when substantial parallelism is not needed, whereas indirect use is achieved by copying data into the next level of buffer. Copying data into the next-level buffer incurs overhead, but it avoids the latency that would otherwise be caused by poor locality and reuse. Finally, we identify the sub-functions that should implement a local buffer, based on their buffering needs and processing time.
In our design process, we first optimized individual functions and call sequences and ignored task-level parallelism. We then considered task-level parallelism by implementing two methods: (a) buffer duplication and (b) interface duplication. We use buffer duplication when the same input data is used by multiple functions, and interface duplication when the data is partitioned into two or more groups that are accessed individually by functions.
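The difference between the two methods can be shown with a small software analogy. This is only a sketch of the idea: the function names are hypothetical, and the cyclic partitioning scheme is an assumption, since the text does not state how the data is split into groups.

```python
def duplicate_buffer(data, consumers):
    """Buffer duplication: every consumer gets its own private copy of the data."""
    return [list(data) for _ in range(consumers)]


def partition_interface(data, banks):
    """Interface duplication: the data is split into groups, one per interface.

    A cyclic split is assumed here; element i goes to bank i % banks.
    """
    return [data[b::banks] for b in range(banks)]


copies = duplicate_buffer([1, 2, 3], 2)
groups = partition_interface([0, 1, 2, 3, 4, 5], 2)
print(copies, groups)
```

Duplication costs storage but lets several functions read the same data concurrently, while partitioning keeps storage constant and gives each function its own access port to a disjoint group.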
4.5. Runtime Memory Allocation
To minimize memory allocation problems, H.264 software uses dynamic memory allocation. In our use case, we also used dynamic memory allocation for the internal buffers, which are converted into parameterized static allocations. The amount of dynamically allocated memory depends on the resolution of the input file. It is not good practice to design a decoder that supports only one maximum resolution. To overcome this problem, the model would have to be redesigned for each target resolution, which increases the optimization challenges. Large input buffers are converted into top-level memory interfaces, which are mapped to memory banks, and data may need to be transferred through these memory interfaces. For this reason, we optimize the kernels without considering the video resolution.
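A parameterized static allocation means that buffer sizes are computed from the resolution once, before decoding starts. The sketch below shows one way such parameters could be derived; it assumes 16x16 macroblocks, 8-bit samples and 4:2:0 chroma sampling, and the function name `frame_buffer_params` is hypothetical.

```python
def frame_buffer_params(width, height, mb_size=16):
    """Derive static buffer sizes for one frame from its resolution."""
    mb_cols = -(-width // mb_size)   # ceiling division
    mb_rows = -(-height // mb_size)
    luma_bytes = (mb_cols * mb_size) * (mb_rows * mb_size)
    chroma_bytes = luma_bytes // 2   # 4:2:0 assumed: two quarter-size planes
    return {
        "macroblocks": mb_cols * mb_rows,
        "luma_bytes": luma_bytes,
        "chroma_bytes": chroma_bytes,
    }


print(frame_buffer_params(176, 144))   # QCIF
print(frame_buffer_params(720, 480))   # one common 480p size
```

The kernels themselves only see macroblocks, so the same kernel code serves every resolution and only these parameters change.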
5. RESULT AND ANALYSIS
We used the Xilinx VCU1525 development kit as the synthesis target for optimizing each intermediate design stage. We performed on-board verification using an Omnitek Zynq 7000 board, where the ARM CPU handles data transfer while the Zynq fabric acts as a standalone FPGA. The Vivado 2015.2 tool is used to determine the occupied area and the operating frequency for each design step.
To measure the performance on the input video files and the corresponding board-level application, simulation was performed on an H.264 video file. An H.264 video file follows a repeating pattern: an I-frame, then a P-frame, then B-frames, and then a P-frame again. The latency of a decoded frame depends on the data for almost all frame types. We determine the worst-case (maximum) latency of each frame type in each cycle. A weighted average is then taken in which B-frames count twice as much as I- or P-frames. The average latency of each cycle is then combined with the obtained frequency to derive the average latency per frame. We used an Intel Core i5-2310 CPU (2.9 GHz) to carry out the required modifications.
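The weighted average described above can be written out explicitly. The sketch below is an illustration under assumptions: latencies are taken to be worst-case clock-cycle counts per frame type, the B-frame weight is 2 as stated in the text, and the time per frame is obtained by dividing the average cycle count by the clock frequency. The cycle counts in the example are hypothetical, not measured results.

```python
def average_frame_time_ms(worst_cycles, clock_hz):
    """Weighted average frame latency in milliseconds.

    `worst_cycles` maps frame type ("I", "P", "B") to its worst-case latency
    in clock cycles. B-frames are weighted twice as much as I- or P-frames.
    """
    weights = {"I": 1, "P": 1, "B": 2}
    total = sum(weights[t] * worst_cycles[t] for t in weights)
    avg_cycles = total / sum(weights.values())
    return avg_cycles / clock_hz * 1e3


# Hypothetical worst-case cycle counts at the reported 355 MHz clock.
cycles = {"I": 400_000, "P": 300_000, "B": 360_000}
print(average_frame_time_ms(cycles, 355e6))
```

The reciprocal of this time gives an estimated frame rate, which is a convenient figure for comparing designs that run at different clock rates.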
Because our core design is independent of the resolution, we tested it with input files of multiple resolutions, such as QCIF (144p) and 480p. We estimated the system performance for the different resolutions. We also measured the average latency per macroblock. We compared our performance with several existing H.264 implementations.
5.1. Performance Evaluation and Analysis
We synthesized our design and evaluated its performance with QCIF input videos. At each optimization stage, the occupied area is reported together with the performance of the corresponding synthesized implementation. The single-function improvement optimizes the area only slightly, but it also improves the CPU performance, which indicates that function reuse is well organized. Cross-function optimization with the help of local buffers makes the pipelining more effective and raises the performance considerably. The area depends primarily on the FF, DSP and LUT units; SRAMs are used mainly in the system-level buffers and rarely in the local buffers. Optimization of the system-level configuration lowers the performance at the CPU level, which is a deviation from our desired result. In the final parallelization step, we achieve good throughput. However, our simulation process does not model memory bandwidth at the system level. An H.264 system requires a large amount of computation, so it should not be a memory-bound application. In the experiment, a Xilinx Kintex-7 FPGA is used to obtain good area optimization at the maximum frequency.
Table 1. Comparison of luma and chroma intra prediction units
Design              LUT    FF
luma [14]           545    127
luma [15]           212    37
luma [proposed]     83     284
chroma [14]         216    59
chroma [15]         163    33
chroma [proposed]   284    105
Table 2. Comparison of the proposed H.264 decoder with other H.264 decoders and HEVC decoders
Work                Video format          On-chip SRAM   Logic gates   Technology    Clock rate   DRAM config
Our proposed work   H.264                 102.5 KB       190k          28nm/0.9V     355 MHz      DDR3L
ESSCIRC 14 [16]     HEVC + Multi-format   154 KB         3454k         28nm/0.9V     350 MHz      32b LPDDR3
ISSCC 13 [17]       HEVC WD4              124 KB         715k          40nm/0.9V     200 MHz      32b DDR3
ASSCC 13 [18]       HEVC                  10.2 KB        446k          90nm/1.0V     224 MHz      n/a
ISSCC 12 [19]       H.264                 79.9 KB        1338k         65nm/1.2V     340 MHz      64b DDR2
VLSI 10 [20]        H.264                 59.6 KB        662k          90nm/1.0V     175 MHz      64b DDR1
ISSCC 10 [21]       H.264                 9.0 KB         414k          90nm/1.09V    210 MHz      n/a
Work done [22]      HEVC                  396 KB         2887k         40nm/1.0V     300 MHz      64b DDR3
6. CONCLUSION
This paper presents an efficient design of an H.264 decoder with its intra prediction unit (luma and chroma). For this work, we used the Xilinx VCU1525 development kit with on-board verification using an ARM CPU. By using runtime memory allocation, individual function improvement and cross-function examination, we improved the complete design. This improvement provides good area optimization with a maximum frequency of 355 MHz. We used different block types for the intra prediction unit to reduce the area cost of the decoder. By applying these improvements, we achieved high throughput.
REFERENCES
[1] L. V. Agostini, A. Azevedo, W. Staehler, V. Rosa, B. Zatt, A. C. Pinto, R. E. C. Porto, S. Bampi, A. Susin, "Design and FPGA Prototyping of a H.264/AVC Main Profile Decoder for HDTV", Journal of the Brazilian Computer Society, vol. 12, pp. 25-36, 2007.
[2] D. Indoonundon, T. P. Fowdur, K. M. S Soyjaudah, "A Concealment Aware UEP Scheme for H.264 using RS Codes", Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), Vol. 6, No. 3, June 2017, pp. 671~681. DOI: 10.11591/ijeecs.v6.i3.pp671-681.
[3] Chuan-Yung Tsai, Tung-Chien Chen, To-Wei Chen and Liang-Gee Chen, "Bandwidth optimized motion compensation hardware design for H.264/AVC HDTV decoder," 48th Midwest Symposium on Circuits and Systems, Covington, KY, 2005, pp. 1199-1202, Vol. 2. doi: 10.1109/MWSCAS.2005.1594322
[4] E. Kalali, Y. Adibelli, I. Hamzaoglu, "A Reconfigurable HEVC Sub Pixel Interpolation Hardware", IEEE Int. Conference on Consumer Electronics - Berlin, Sept. 2013.
[5] E. Kalali, I. Hamzaoglu, "A low energy HEVC sub-pixel interpolation hardware," IEEE Int. Conference on Image Processing, pp. 1218-1222, Oct. 2014.
[6] Mengmeng Zhang, Jianfeng Qu, Huihui Bai, "Fast Intra Prediction Mode Decision Algorithm for HEVC", TELKOMNIKA Indonesian Journal of Electrical Engineering, Vol. 11, No. 10, October 2013, pp. 5703~5710. ISSN: 2302-4046.
[7] C. M. Diniz, M. Shafique, S. Bampi, J. Henkel, "A Reconfigurable Hardware Architecture for Fractional Pixel Interpolation in High Efficiency Video Coding," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 34, no. 2, pp. 238-251, Feb. 2015.
[8] J. Andrade et al., "Design Space Exploration of LDPC Decoders using High-Level Synthesis," in IEEE Access, vol. PP, no. 99, pp. 1-1. doi: 10.1109/ACCESS.2017.2727221
[9] S. Baldev, K. Shukla, S. Gogoi, P. Rathore and R. Peesapati, "Design and Implementation of Efficient Streaming Deblocking and SAO Filter for HEVC Decoder," in IEEE Transactions on Consumer Electronics, vol. PP, no. 99, pp. 1-1. doi: 10.1109/TCE.2018.2812518
[10] Zhao Han, M.R. Anjum, "A New Low-Costing QC-LDPC Decoder for FPGA", TELKOMNIKA Indonesian Journal of Electrical Engineering, Vol. 12, No. 11, November 2014, pp. 7721~7727. DOI: 10.11591/telkomnika.v12i11.6512.
[11] P. Hailes, L. Xu, R. G. Maunder, B. M. Al-Hashimi and L. Hanzo, "Hardware-Efficient Node Processing Unit Architectures for Flexible LDPC Decoder Implementations," in IEEE Transactions on Circuits and Systems II: Express Briefs, vol. PP, no. 99, pp. 1-1. doi: 10.1109/TCSII.2018.2807362
[12] T. Mallikarachchi, D. S. Talagala, H. K. Arachchi and A. Fernando, "Decoding-Complexity-Aware HEVC Encoding Using a Complexity-Rate-Distortion Model," in IEEE Transactions on Consumer Electronics, vol. PP, no. 99, pp. 1-1. doi: 10.1109/TCE.2018.2810479
[13] H. Kim, J. Ko and S. Park, "An Efficient Architecture of In-Loop Filters for Multicore Scalable HEVC Hardware Decoders," in IEEE Transactions on Multimedia, vol. PP, no. 99, pp. 1-1. doi: 10.1109/TMM.2017.275950
[14] F. Palumbo et al., "Runtime energy versus quality tuning in motion compensation filters for HEVC," in Proc. of the PDeS Conf., 2016.
[15] C. Sau et al., "Challenging the Best HEVC Fractional Pixel FPGA Interpolators with Reconfigurable and Multi-frequency Approximate Computing," IEEE Embedded Systems Letters, 2017.
[16] C.-C. Ju et al., "A 0.2 nJ/pixel 4K 60 fps Main-10 HEVC decoder with multi-format capabilities for UHD-TV applications," in Proc. Eur. Solid-State Circuits Conf. (ESSCIRC), Sep. 2014, pp. 195–198.
[17] C.-T. Huang, M. Tikekar, C. Juvekar, V. Sze, and A. Chandrakasan, "A 249 Mpixel/s HEVC video-decoder chip for quad full HD applications," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2013, pp. 162–164.
[18] C.-H. Tsai, H.-T. Wang, C.-L. Liu, Y. Li, and C.-Y. Lee, "A 446.6K-gates 0.55–1.2V H.265/HEVC decoder for next generation video applications," in Proc. IEEE Asian Solid-State Circuits Conf. (A-SSCC), Nov. 2013, pp. 305–308.
[19] D. Zhou, J. Zhou, J. Zhu, P. Liu, and S. Goto, "A 2 Gpixel/s H.264/AVC HP/MVC video decoder chip for Super Hi-Vision and 3DTV/FTV applications," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, San Francisco, CA, USA, Feb. 2012, pp. 224–225.
[20] D. Zhou et al., "A 530 Mpixels/s 4096×2160@60 fps H.264/AVC high profile video decoder chip," in Proc. Symp. VLSI Circuits (VLSI), Honolulu, HI, USA, 2010, pp. 171–172.
[21] T.-D. Chuang et al., "A 59.5 mW scalable/multi-view video decoder chip for quad/3D full HDTV and video streaming applications," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2010, pp. 330–331.
[22] D. Zhou et al., "An 8K H.265/HEVC Video Decoder Chip With a New System Pipeline Design," in IEEE Journal of Solid-State Circuits, vol. 52, no. 1, pp. 113-126, Jan. 2017. doi: 10.1109/JSSC.2016.2616362.