TELKOMNIKA, Vol. 15, No. 2, June 2017, pp. 570~577
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/telkomnika.v15.i2.5508
A Customized Reconfiguration Controller with Remote Direct ICAP Access for Dynamically Reconfigurable Platform

Tze Hon Tan*1, Chia Yee Ooi2, and Muhammad Nadzir Marsono*3
1,3Faculty of Electrical Engineering, Universiti Teknologi Malaysia, 81310, Skudai, Johor, Malaysia
2Malaysia-Japan International Institute of Technology (MJIIT), Universiti Teknologi Malaysia Kuala Lumpur, 54100 Kuala Lumpur, Malaysia
*Corresponding authors, e-mail: thtan5@live.utm.my, nadzir@fke.utm.my
Abstract
As FPGA dynamic partial reconfiguration gets into the mainstream, the design of reconfiguration controllers has become an active research area. Most existing reconfiguration controllers only support loading a partial bitstream into configuration memory and do not allow the user to access the ICAP directly, even though direct access would give the user higher controllability over the reconfigurable device. This paper presents the architecture of a customized reconfiguration controller with remote direct ICAP access, which allows the user to configure or read back device internal registers. Additionally, the proposed reconfiguration controller achieves at least 3.19 Gbps of reconfiguration throughput, which reduces the platform service downtime during dynamic partial reconfiguration. In order to reduce the latency and transmission overhead of remote functional updates, the partial bitstream is compressed with run-length encoding before transmission.
Keywords: Dynamic partial reconfiguration, Self-reconfiguration, ICAP
Copyright © 2017 Universitas Ahmad Dahlan. All rights reserved.
1. Introduction
In reconfigurable computing, dynamic partial reconfiguration is a vital feature, which enables updates in the field, improves area utilization and allows defect compensation. Dynamic partial reconfiguration provides a solution to update accelerator sub-circuits. With the dynamic reconfiguration feature, FPGA becomes a viable solution for most existing hardware implementations of real-world applications that demand both processing power and flexibility, as FPGA has both the performance advantages of an ASIC solution and the flexibility advantage of a software solution.
To apply changes and updates to the accelerators in the reconfigurable hardware, a new partial bitstream is loaded into the reconfigurable hardware at run time. This requires an efficient reconfiguration controller to enable the dynamic partial reconfiguration feature. The Internal Configuration Access Port (ICAP) in Xilinx FPGA devices allows the reconfiguration controller to be implemented within the chip. Hence, it provides an opportunity for self-reconfiguration and a single-chip implementation option for designers.
Maintaining distributed systems for Internet-of-Things and cloud computing is a big challenge for administrators if the remote update feature is absent in such systems [1]. Hence, the design and implementation of a customized reconfiguration controller for a remote dynamically reconfigurable platform is the primary focus of this research work.
A number of research works [2–8] have focused on developing controllers to support dynamic partial reconfiguration in FPGA through the ICAP. Since these controllers were customized and hardware based, high reconfiguration throughput was expected. However, the use of a shared bus architecture lowers reconfiguration throughput and increases overhead in internal transmission. Nabina et al. [7] implemented a reconfiguration controller with partial bitstream compression. The compression algorithm is dictionary based and achieves a high compression ratio, so the partial bitstream is highly compressed.
Received December 28, 2016; Revised April 10, 2017; Accepted April 25, 2017
In this work, a customized reconfiguration controller with remote direct ICAP access is proposed to enable dynamic reconfiguration in NetFPGA for remote functional update purposes. The developed Reconfiguration Controller achieves at least 3.19 Gbps of reconfiguration throughput, which reduces the platform service downtime during remote functional updates. In addition, the proposed controller supports remote direct ICAP access, which allows the user to configure or read back device internal registers. With this feature, the user has more observability and controllability over the internal operations of the reconfigurable hardware, including dynamic reconfiguration. In order to reduce transmission latency, the partial bitstream is compressed losslessly before transmission. Although a dictionary-based scheme such as that of [7] achieves a high compression ratio, this work prefers the run-length encoding compression scheme proposed by Liu et al. [6] after considering the hardware implementation efficiency of decompression. Instead of matching the compression symbol width to the ICAP bus width (32 bit) as in [6], the proposed architecture matches the compression symbol width to the packet bus width (64 bit).
2. Architecture Overview
The system in the FPGA consists of a Static Region and a Partial Reconfigurable Region. Figure 1 illustrates the overview of the proposed high-level architecture. The Static Region contains the Communication Manager, which handles Ethernet packet transmission, and the Reconfiguration Controller, which handles the loading of the partial bitstream. Specifically, the Communication Manager is responsible for performing the lower-layer tasks of the OSI model, while the Reconfiguration Controller is responsible for retrieving the bitstream from the Communication Manager and loading it into the Reconfiguration Port. In Xilinx FPGA, the Reconfiguration Port is instantiated with the ICAP primitive, while the Reconfiguration Logic and Configuration Memory are not visible to the designer. The Static Region only undergoes the configuration process at startup, while the Partial Reconfigurable Region may undergo multiple reconfigurations during run time. In the Partial Reconfigurable Region, the Partial Reconfigurable Module is linked to the Communication Manager for internal communication. Consequently, the performance of the overall system relies on the design and implementation of the Reconfiguration Controller and the Communication Manager.
Figure 1. Proposed architecture for dynamically reconfigurable platform
3. Dynamic Partial Reconfiguration
There are several flows provided by Xilinx to support the partial reconfiguration feature: the modular method, the difference-based method, the small bit manipulation method, the early
access method and the partition-based method. This work uses the partition-based method, as it is the most recent consolidated flow and is only available in newer versions of the Xilinx ISE Design Suite or in the Xilinx Vivado Design Suite. The main design flow includes design and synthesis of all functional modules, defining the Partial Reconfigurable Module, defining design constraints, and generating both the full bitstream and the partial bitstream. The area of the Partial Reconfigurable Region (PRR) is defined in a design constraint file and is used mainly during the Place and Route (PAR) process. In order to simplify the overall process, this work uses Xilinx PlanAhead to implement designs with partial reconfiguration, as it provides a user-friendly graphical user interface.
3.1. Reconfiguration Controller
The role of the Reconfiguration Controller is to retrieve the partial bitstream from the Communication Manager, which consists of the Control Plane Packet Handler, the Packet Type Classifiers and the Platform Manager. Figure 2 shows the implementation of the customized Reconfiguration Controller. In the Reconfiguration Controller, the Bitstream Packet Handler extracts the partial bitstream content from each bitstream packet, acknowledges the transmission to the Terminal Client, and stores the partial bitstream into SRAM through the SRAM Interface.
When partial bitstreams are large, storing them in internal BRAM becomes impractical and may unnecessarily use up too many internal logic resources. Additionally, a platform with on-the-fly remote dynamic partial reconfiguration poses a higher risk of system failure, especially when partial bitstream transmission is interrupted. Such a failure is not recoverable: the platform services become unavailable once the first segment of the partial bitstream has been loaded into the configuration memory while the other segments are still in transmission. Therefore, the proposed architecture separates partial bitstream transmission from the dynamic partial reconfiguration process.
Upon arrival of the last segment of the partial bitstream, the Bitstream Packet Handler notifies the Bitstream Loader of the status, and the DPR Flow Controller issues a control signal to switch both the Platform Manager and the Packet Type Classifier into DPR mode. In DPR mode, the DPR Flow Controller asserts the reset signal of the Partial Reconfigurable Module to stop its operation. At this point, the Bitstream Loader retrieves the partial bitstream from SRAM and loads it into the configuration memory through the ICAP Interface and the ICAP. Since the partial bitstream is compressed using the Run-Length Encoding (RLE) algorithm, the ICAP Interface decompresses the partial bitstream inline before loading it into the ICAP.

In order to verify the outcome of dynamic partial reconfiguration, the ICAP Interface performs a readback sequence to retrieve the value of the internal Status Register (STAT) after the last word of the partial bitstream is loaded into the configuration memory. The DPR Flow Controller initializes the Partial Reconfigurable Module after the ICAP Interface flags a reconfiguration success status. The initialization may take several cycles depending on the initialization requirements of the application components. Once initialization is completed, the DPR Flow Controller switches both the Platform Manager and the Packet Type Classifier back to normal mode, and the Partial Reconfigurable Module is activated again. If dynamic reconfiguration is unsuccessful, the DPR Flow Controller reports the outcome to the Terminal Client and waits for another retry.
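The mode switching described above can be summarized as a small state machine. The Python sketch below is a behavioral model only, not the authors' HDL: the `Mode` names and the callback parameters (`load_bitstream`, `read_stat_ok`, `init_prm`, `report_failure`) are hypothetical stand-ins for the Bitstream Loader, the STAT readback check, the Partial Reconfigurable Module initialization and the feedback to the Terminal Client.

```python
from enum import Enum, auto
from typing import Callable, List


class Mode(Enum):
    NORMAL = auto()      # Platform Manager and Packet Type Classifier in normal mode
    DPR_RESET = auto()   # DPR mode entered, Partial Reconfigurable Module held in reset
    DPR_LOAD = auto()    # bitstream streamed from SRAM through the ICAP Interface
    DPR_VERIFY = auto()  # readback of the Status Register (STAT)
    DPR_INIT = auto()    # Partial Reconfigurable Module initialization
    WAIT_RETRY = auto()  # failure reported, waiting for the Terminal Client to retry


def run_dpr_flow(load_bitstream: Callable[[], None],
                 read_stat_ok: Callable[[], bool],
                 init_prm: Callable[[], None],
                 report_failure: Callable[[], None]) -> List[Mode]:
    """Model one DPR attempt, triggered once the last segment has arrived.

    Returns the sequence of modes visited (hypothetical behavioral model).
    """
    trace = [Mode.NORMAL, Mode.DPR_RESET, Mode.DPR_LOAD]
    load_bitstream()                 # includes inline RLE decompression
    trace.append(Mode.DPR_VERIFY)
    if read_stat_ok():               # STAT read back after the last word
        trace.append(Mode.DPR_INIT)
        init_prm()                   # may take several cycles in hardware
        trace.append(Mode.NORMAL)    # module activated again
    else:
        report_failure()             # outcome fed back to the Terminal Client
        trace.append(Mode.WAIT_RETRY)
    return trace
```

With a readback check that succeeds, the returned trace is NORMAL, DPR_RESET, DPR_LOAD, DPR_VERIFY, DPR_INIT and NORMAL again; on failure the module is never initialized and the model ends in WAIT_RETRY.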
Partial bitstream compression is implemented to reduce partial bitstream transmission overhead and temporary storage usage. Compression pays off because the area of the Partial Reconfigurable Region is defined to be as large as possible so that complex applications can fit into it, and the partial bitstream file size depends on the area definition of the Partial Reconfigurable Region regardless of its logic utilization. Based on observation of partial bitstream content, a Partial Reconfigurable Region with low logic utilization has a higher count of repetitive content, which can be compressed losslessly to reduce the original file size.
In order to reduce the logic resources needed to implement decompression, Run-Length Encoding is used instead of Huffman encoding. In addition, the run value size is configured to match the bus size (64 bit), which minimizes design complexity and makes the architecture more efficient. The counter value (7 bit) of each run is stored in the parity field (8 bit), along with its respective run value (64 bit) in the data field (64 bit), in both the SRAM and the Xilinx FIFO36_72 primitive. The remaining 1 bit in the parity field is used to indicate
the last word of a partial bitstream content.
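The storage format above can be illustrated with a short Python sketch that packs each run into a 72-bit entry (8-bit parity field above the 64-bit data field) and then expands the entries back into 32-bit ICAP words, as the ICAP Interface has to do inline. Three details are assumptions, not taken from the paper: the last-word flag sits in the most significant parity bit, the 7-bit counter stores the number of extra repeats (the same convention as the run length in the Compression Header of Section 3.2, where 0 means the word is loaded once), and the upper half of each 64-bit value is sent to the ICAP first.

```python
from typing import Iterable, Iterator, List, Tuple

COUNT_BITS = 7
MAX_EXTRA = (1 << COUNT_BITS) - 1   # 127 extra repeats, i.e. up to 128 loads per entry
LAST_FLAG = 1 << 7                  # assumed position: MSB of the 8-bit parity field
DATA_MASK = (1 << 64) - 1


def pack_entry(value: int, extra_repeats: int, last: bool) -> int:
    """Build a 72-bit entry: 8-bit parity field above the 64-bit data field."""
    assert 0 <= value <= DATA_MASK and 0 <= extra_repeats <= MAX_EXTRA
    parity = extra_repeats | (LAST_FLAG if last else 0)
    return (parity << 64) | value


def encode(words: List[int]) -> List[int]:
    """Run-length encode 64-bit bitstream words into 72-bit SRAM/FIFO entries."""
    runs: List[Tuple[int, int]] = []            # (run value, extra repeats)
    for w in words:
        if runs and runs[-1][0] == w and runs[-1][1] < MAX_EXTRA:
            runs[-1] = (w, runs[-1][1] + 1)
        else:
            runs.append((w, 0))
    return [pack_entry(v, n, i == len(runs) - 1) for i, (v, n) in enumerate(runs)]


def decompress(entries: Iterable[int]) -> Iterator[int]:
    """Inline decompression: expand entries into the 32-bit stream fed to the ICAP."""
    for entry in entries:
        value, parity = entry & DATA_MASK, entry >> 64
        for _ in range((parity & MAX_EXTRA) + 1):
            yield value >> 32                   # assumed: upper 32-bit half first
            yield value & 0xFFFFFFFF
        if parity & LAST_FLAG:                  # last word of the partial bitstream
            return
```

Because the counter is only 7 bits wide, a run longer than 128 words is split across several entries, which the encoder above handles by starting a new entry once the counter saturates.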
Other than handling dynamic partial reconfiguration, the Reconfiguration Controller provides a way for the user to access the ICAP directly. This feature allows the user to remotely configure or read back device internal registers through direct ICAP access. However, any packet used to access the ICAP directly must follow a strict format, because the packet carries the internal signal assertions of the ICAP Interface. These signals are the chip enable (CE) and the read or write request (RW) of the ICAP, as well as the last command (L). CE and RW are asserted in the same way as on the ICAP itself: CE is active low, while logic '1' on RW indicates a read request. Table 1 provides a snippet of a sample packet that reads back the Status Register (STAT). In the middle part of Table 1, the control signal is toggled from write request to read request and back to write request again, which results in the {L,RW,CE} sequence “000”, “011”, “010”, “001” and “000”.
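Table 1 suggests a concrete layout for direct-access packet words: the upper 32 bits carry the ICAP data word, and bits 26 to 24 of the lower half carry {L,RW,CE} (for example, 00000000_03000000 is listed as “011” and 20000000_04000000 as “100”). This layout is an inference from the table rather than a documented format. The Python sketch below decodes words under that assumption and checks the pattern visible in the middle of Table 1, namely that RW only changes value in a word where CE is deasserted (high); `IcapWord`, `decode`, `control_bits` and `rw_toggles_safely` are hypothetical names.

```python
from typing import List, NamedTuple


class IcapWord(NamedTuple):
    data: int    # 32-bit word presented to the ICAP
    last: bool   # L: last command of the packet
    read: bool   # RW: True = read request, False = write request
    ce_n: int    # CE: active low, 0 = ICAP enabled


def decode(word64: int) -> IcapWord:
    """Split a 64-bit direct-access packet word (layout inferred from Table 1)."""
    ctrl = (word64 >> 24) & 0b111        # bits 26..24 of the lower 32-bit half
    return IcapWord(data=word64 >> 32,
                    last=bool(ctrl & 0b100),
                    read=bool(ctrl & 0b010),
                    ce_n=ctrl & 0b001)


def control_bits(w: IcapWord) -> str:
    """Render {L,RW,CE} as printed in Table 1."""
    return f"{int(w.last)}{int(w.read)}{w.ce_n}"


def rw_toggles_safely(words: List[IcapWord]) -> bool:
    """True if RW only changes value in words where CE is deasserted (high)."""
    return all(cur.ce_n == 1
               for prev, cur in zip(words, words[1:])
               if cur.read != prev.read)
```

For instance, `control_bits(decode(0x0000000003000000))` returns "011", and the five middle rows of Table 1 pass the `rw_toggles_safely` check.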
Three FIFOs are used in the implementation of the Reconfiguration Controller. These FIFOs act as intermediate buffers and provide clock domain crossing between functional blocks. By separating the clock domain of the ICAP from that of the platform, the ICAP can be clocked at up to 100 MHz, which is the maximum frequency specified by Xilinx [9]. The FIFOs labeled “A” and “B” are implemented with the Xilinx FIFO18_36 primitive and are used for direct user access to the ICAP, while the FIFO labeled “C” is implemented with the Xilinx FIFO36_72 primitive and is used for dynamic partial reconfiguration.
3.2. Terminal Client
The Terminal Client reads and compresses the partial bitstream generated by Bitgen before transmitting it to the proposed platform through UDP/IP. The partial bitstream is transmitted to the platform in multiple segments, depending on the user-specified packet size, and the Terminal Client retransmits a packet whenever its acknowledgment is not received within a specified timeframe. The bitstream packets are transmitted in bulk, except for the last segment, which is only transmitted to the platform after the acknowledgments of all other segments have been received.
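The transmission policy of the Terminal Client can be modeled in a few lines. The Python sketch below is hypothetical: instead of real UDP sockets and timers, it takes a `send` callback that returns True when the acknowledgment for a segment arrives within the timeout, and it retries in rounds up to an assumed `max_rounds` limit; segment sizes are in bytes here purely for illustration. All segments except the last are sent in bulk and retransmitted until acknowledged, and the last segment is sent only afterwards, so its arrival tells the platform that the whole bitstream is stored.

```python
from typing import Callable, List, Sequence, Set


def split_segments(data: bytes, seg_size: int) -> List[bytes]:
    """Cut the (compressed) partial bitstream into fixed-size segments."""
    return [data[i:i + seg_size] for i in range(0, len(data), seg_size)]


def send_bitstream(segments: Sequence[bytes],
                   send: Callable[[int, bytes], bool],
                   max_rounds: int = 10) -> List[int]:
    """Transmit segments; `send` returns True if the ACK arrived before the timeout.

    Returns the segment indices in the order they were (re)transmitted.
    """
    if not segments:
        return []
    attempts: List[int] = []
    last = len(segments) - 1
    pending: Set[int] = set(range(last))        # all segments except the last
    for _ in range(max_rounds):
        if not pending:
            break
        for idx in sorted(pending):             # bulk transmission of pending segments
            attempts.append(idx)
            if send(idx, segments[idx]):
                pending.discard(idx)            # acknowledged in time
    if pending:
        raise TimeoutError(f"segments {sorted(pending)} never acknowledged")
    for _ in range(max_rounds):                 # last segment only after all other ACKs
        attempts.append(last)
        if send(last, segments[last]):
            return attempts
    raise TimeoutError("last segment never acknowledged")
```

If the first transmission of segment 1 out of four is lost, the attempt order becomes 0, 1, 2, 1, 3: the lost segment is resent in the next round, and segment 3 goes out only once 0 to 2 are acknowledged.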
Figure 3 shows the bitstream packet format for dynamic partial reconfiguration. The Compression Header consists of pairs of run location (location of the repeated content) and run length (repeat count) of the partial bitstream. For instance, the pair {0x02,0x0A} in the Compression Header indicates that the third word of the Bitstream Content is a repeated word that will be loaded 11 times into configuration memory: the location 0x02 counts from zero, and a run length of 0x0A (10) means 10 repetitions in addition to the first load. By default, each partial bitstream word without repetition (run length of 0) is loaded once into configuration memory; this type of word is neither compressed nor tracked in the Compression Header. Based on the packet definition, each word of the Compression Header can store up to 4 pairs of run location and run length. For optimal performance, the segment size is configured to either 64 or 128, which are powers of two and are smaller than the maximum transmission unit (MTU). Figure 4 illustrates the flow chart of the implemented Terminal Client.
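The Compression Header can be illustrated with the Python sketch below, which compresses one segment of 64-bit words into (run location, run length) pairs plus the remaining Bitstream Content, and packs four pairs into each 64-bit header word. The 8-bit width of each location and length field, the slot order inside a header word, and the cap of 127 extra repeats (so that a run also fits the 7-bit SRAM counter) are assumptions chosen so that four pairs fill one 64-bit word and a location can address a 128-word segment. The compression ratio is taken as uncompressed words over compressed words (header plus content), ignoring the rest of the packet.

```python
from typing import List, Tuple

MAX_EXTRA = 127   # assumed cap so each run also fits the 7-bit counter kept in SRAM


def compress_segment(words: List[int]) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Split a segment into (run location, run length) pairs and Bitstream Content.

    Locations index the compressed content; run length = extra repeats (0 = load once).
    """
    content: List[int] = []
    extra: List[int] = []
    for w in words:
        if content and content[-1] == w and extra[-1] < MAX_EXTRA:
            extra[-1] += 1
        else:
            content.append(w)
            extra.append(0)
    pairs = [(i, n) for i, n in enumerate(extra) if n > 0]   # run length 0 is not tracked
    return pairs, content


def pack_header(pairs: List[Tuple[int, int]]) -> List[int]:
    """Pack up to four 8-bit (location, length) pairs into each 64-bit header word."""
    header: List[int] = []
    for k in range(0, len(pairs), 4):
        word = 0
        for slot, (loc, length) in enumerate(pairs[k:k + 4]):
            assert 0 <= loc < 256 and 0 <= length < 256
            word |= ((loc << 8) | length) << (16 * slot)    # slot order is assumed
        header.append(word)
    return header


def expand(pairs: List[Tuple[int, int]], content: List[int]) -> List[int]:
    """Inverse of compress_segment: content word i is emitted (run length + 1) times."""
    runs = dict(pairs)
    return [w for i, w in enumerate(content) for _ in range(runs.get(i, 0) + 1)]


def compression_ratio(n_words: int, pairs: List[Tuple[int, int]],
                      content: List[int]) -> float:
    """Uncompressed words divided by compressed words (header + content)."""
    return n_words / (len(pack_header(pairs)) + len(content))
```

Running `compress_segment` on a 14-word segment whose third distinct word repeats 11 times yields exactly the {0x02,0x0A} pair of the example above, with four content words and one header word.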
4. Platform Evaluation
Table 2 summarizes the logic resources required to implement the proposed platform on the NetFPGA 10G board, which comes with a Xilinx Virtex-5 (XC5VTX240T) FPGA that provides 37440 slices and 324 BRAMs. The Static Region of the proposed platform utilizes 7297 slices (as reported by Xilinx ISE DS) out of the 37440 available slices (less than 20% logic utilization), which leaves more than 80% of the slices for Partial Reconfigurable Module implementation. However, the BRAM utilization of the Static Region Module is about 33.6% (109 of 324 BRAMs) due to the extensive use of FIFOs in the design to hold packets and to buffer the partial bitstream.
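The utilization percentages quoted above follow directly from the Static Region column of Table 2; a quick check in Python (the dictionary simply copies the table's numbers):

```python
# Static Region figures from Table 2: resource -> (used, available)
STATIC_REGION = {
    "Slice registers": (15150, 149760),
    "Slice LUTs": (16185, 149760),
    "Occupied slices": (7297, 37440),
    "BRAM": (109, 324),
}


def percent(used: int, available: int) -> float:
    """Utilization as a percentage rounded to two decimals."""
    return round(100.0 * used / available, 2)


static_util = {name: percent(u, a) for name, (u, a) in STATIC_REGION.items()}
free_slices = percent(37440 - 7297, 37440)   # slices left for the Partial Reconfigurable Module

for name, pct in static_util.items():
    print(f"{name:16s} {pct:6.2f} %")
print(f"Free slices      {free_slices:6.2f} %")
```

Occupied slices come to 19.49% (so 80.51% remain free) and BRAM to 33.64%, matching the "less than 20%", "more than 80%" and "about 33.6%" figures in the text.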
In order to verify the implemented platform experimentally, the Partial Reconfigurable Module is dynamically reconfigured with various types of packet forwarding designs while network packets are injected into the platform. These network packets are captured and analyzed using the Wireshark packet analyzer to verify the behavior of the implemented platform.
Figure 2. Implementation of the Reconfiguration Controller
Table 1. Snippet of a sample packet to read back the Status Register

| Packet content | Control signal {L,RW,CE} |
|---|---|
| FFFFFFFF_00000000 | 000 |
| 000000BB_00000000 | 000 |
| 11220044_00000000 | 000 |
| FFFFFFFF_00000000 | 000 |
| AA995566_00000000 | 000 |
| 20000000_00000000 | 000 |
| 2800E001_00000000 | 000 |
| 20000000_00000000 | 000 |
| 20000000_00000000 | 000 |
| 00000000_03000000 | 011 |
| 00000000_02000000 | 010 |
| 00000000_01000000 | 001 |
| 30008001_00000000 | 000 |
| 0000000D_00000000 | 000 |
| 20000000_00000000 | 000 |
| 20000000_04000000 | 100 |
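The 32-bit ICAP words in Table 1 follow the Virtex-5 configuration packet format: dummy words, the bus width detection pattern (000000BB, 11220044), the sync word (AA995566), Type 1 NOOP headers (20000000), a Type 1 read of one word from the STAT register (2800E001), and a write of the DESYNC command (0000000D) to the CMD register (30008001). The Python sketch below labels such a word stream. The field positions (header type in bits 31 to 29, opcode in bits 28 to 27, register address from bit 13, word count in bits 10 to 0) and the register numbers follow our reading of the Xilinx Virtex-5 configuration user guide and should be checked against it; only the registers used in Table 1 are named, and the all-zero words sent during the read cycles are simply shown as data.

```python
from typing import List

OPCODES = {0: "NOOP", 1: "READ", 2: "WRITE"}
REGISTERS = {4: "CMD", 7: "STAT"}          # only the registers that appear in Table 1
SPECIAL = {
    0xFFFFFFFF: "dummy",
    0x000000BB: "bus width sync",
    0x11220044: "bus width detect",
    0xAA995566: "sync word",
}


def decode_stream(words: List[int]) -> List[str]:
    """Label 32-bit ICAP words, handling Type 1 headers and their write payloads."""
    out: List[str] = []
    payload = 0                              # words still owed to the last write header
    for w in words:
        if payload:
            out.append(f"payload 0x{w:08X}")
            payload -= 1
        elif w in SPECIAL:
            out.append(SPECIAL[w])
        elif (w >> 29) == 0b001:             # Type 1 packet header
            op = OPCODES.get((w >> 27) & 0b11, "RESERVED")
            if op == "NOOP":
                out.append("NOOP")
                continue
            reg = (w >> 13) & 0x3FFF         # register address field
            count = w & 0x7FF                # word count field
            name = REGISTERS.get(reg, f"reg {reg}")
            out.append(f"{op} {name} x{count}")
            if op == "WRITE":
                payload = count
        else:
            out.append(f"data 0x{w:08X}")
    return out
```

Feeding it the upper halves of the sixteen Table 1 rows labels 2800E001 as "READ STAT x1" and 30008001 as "WRITE CMD x1", followed by the payload word 0000000D.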
Figure 3. Bitstream packet format
Figure 4. Flow chart of Terminal Client
Table 2. Logic utilization of proposed architecture

| Resource type | Partial Reconfigurable Module | Static Region Module | Total utilization | Available |
|---|---|---|---|---|
| Slice registers | 4818 | 15150 | 19968 | 149760 |
| Slice LUTs | 3962 | 16185 | 20147 | 149760 |
| Occupied slices | 1830 | 7297 | 9127 | 37440 |
| BRAM | 32 | 109 | 141 | 324 |
Based on experimental evaluation, the Reconfiguration Controller achieves at least 3.19 Gbps of reconfiguration throughput. The partial bitstreams used all have the same file size of 1,524,139 bytes (about 1.45 MiB). In order to capture the time taken by dynamic partial reconfiguration, a timer is implemented in the Reconfiguration Controller. Loading each partial bitstream into configuration memory through the Reconfiguration Controller takes 381,052 clock cycles with the ICAP clocked at 100 MHz, which results in 3.199855 Gbps of reconfiguration throughput. Ideally, the maximum achievable reconfiguration throughput is 3.2 Gbps, when the ICAP is used with a 32-bit wide bus clocked at 100 MHz. Because of minor overhead from the internal state machine design and the SRAM storage handling, the measured throughput falls slightly short of this ideal, but the Reconfiguration Controller still comes close to the ideal maximum performance.
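The throughput and downtime figures can be reproduced from the numbers above. The Python sketch below assumes, as the text implies, that throughput is the uncompressed bitstream size in bits divided by the measured loading time; the "ideal" cycle count treats the whole file as 32-bit ICAP words, one per clock cycle.

```python
import math

BITSTREAM_BYTES = 1_524_139   # partial bitstream size used in the evaluation
CYCLES = 381_052              # measured ICAP clock cycles to load one bitstream
F_ICAP = 100e6                # ICAP clock frequency in Hz
ICAP_WIDTH = 32               # ICAP data bus width in bits

load_time_s = CYCLES / F_ICAP                                # service downtime
throughput_gbps = BITSTREAM_BYTES * 8 / load_time_s / 1e9
ideal_gbps = ICAP_WIDTH * F_ICAP / 1e9                       # one 32-bit word per cycle
ideal_cycles = math.ceil(BITSTREAM_BYTES * 8 / ICAP_WIDTH)   # whole file as 32-bit words
overhead_cycles = CYCLES - ideal_cycles

print(f"size       = {BITSTREAM_BYTES / 2**20:.2f} MiB")
print(f"downtime   = {load_time_s * 1e3:.5f} ms")
print(f"throughput = {throughput_gbps:.6f} Gbps (ideal {ideal_gbps:.1f} Gbps)")
print(f"overhead   = {overhead_cycles} cycles")
```

It prints a size of 1.45 MiB, a downtime of 3.81052 ms and a throughput of 3.199855 Gbps against the ideal 3.2 Gbps; the measured load takes only 17 cycles more than the 381,035-cycle ideal, which is the small overhead the text attributes to the state machine and SRAM handling.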
Since partial bitstream transmission is independent of the dynamic partial reconfiguration process, the platform service only becomes unavailable for 3.81052 milliseconds. Although partial bitstream transmission does not affect platform service availability, it can still affect the latency of remote updates. One approach to reducing this latency is partial bitstream compression.
Figure 5 shows the compression ratio of partial bitstreams with various logic utilization. The compressed partial bitstream is always smaller than the uncompressed partial bitstream. As logic utilization increases, compression performance decreases. This is mainly because a partial bitstream with low logic utilization has a significant amount of repetitive content, where unused logic is filled with zeros. This repetitive content can be effectively compressed using run-length encoding; compressing a partial bitstream with 37% logic utilization results in a compression ratio of roughly 2.6. Once logic resources are utilized, the content is no longer filled with zeros, which increases the entropy of the partial bitstream. Even so, part of the partial bitstream can still be compressed, because this part consists of a series of No Operation (NOOP) commands. Additionally, utilized BRAM resources without initialization are also filled with zeros, which can be effectively compressed.

Figure 5. Compression ratio of partial bitstream with various logic utilization

Table 3. Comparison with previous work

| Publication | Reconf. throughput (Gbps) | Storage | Additional detail |
|---|---|---|---|
| AC_ICAP [2] | 3.04824 | BRAM | - |
| DPR Manager [3] | 3.07432 | SD Flash | - |
| MST_HWICAP [4] | 1.88160 | DDR SDRAM | - |
| BRAM_HWICAP [4] | 2.97120 | BRAM | - |
| ICAP Controller [5] | 3.19840 | DDR SDRAM | UART transmission |
| Intelligent ICAP Controller [6] | 3.19832 | SRAM | 32bit RLE compression |
| FlashCAP [7] | 3.08000 | BRAM | X-MatchPRO compression |
| ZyCAP [8] | 3.05600 | DRAM | Xilinx Zynq FPGA |
| Proposed | 3.19985 | SRAM | 64bit RLE compression, Gigabit Ethernet transmission |
Table 3 summarizes the comparison of the implemented Reconfiguration Controller with several related works. The proposed Reconfiguration Controller has slightly higher reconfiguration throughput than [5] and [6], mainly due to the use of a dedicated bus and SRAM interface for dynamic partial reconfiguration, which results in lower overhead. The dedicated bus used in the proposed architecture offers advantages such as lower processing overhead, higher reliability (because it is independent of other components) and more consistent reconfiguration throughput. Additionally, the partial bitstream used in the evaluation of the proposed work is considerably large, at 1,524,139 bytes (about 1.45 MiB).
5. Conclusion
In this paper, a customized Reconfiguration Controller with remote access to the ICAP is proposed. The customized Reconfiguration Controller achieves at least 3.19 Gbps of reconfiguration throughput, which significantly reduces the platform service downtime during dynamic partial reconfiguration. Besides that, the latency of partial bitstream transmission is reduced with partial bitstream compression. In addition, the customized Reconfiguration Controller allows the user to remotely access the ICAP for readback and configuration of device internal registers. With remote dynamic partial reconfiguration, the accelerator sub-circuits can be updated remotely at run time after deployment.
In general, functional updates are important to patch existing design flaws and bugs, to optimize design performance and to cope with changes in the functional requirements of the execution units. Future work will focus on augmenting platform security through end-to-end packet encryption so that the platform can be deployed on non-secured networks.
Acknowledgment
This work is supported in part by the CREST grant (UTM Vote No. 4B176) and the Universiti Teknologi Malaysia matching grant (UTM Vote No. 00M75).
References
[1] A. Schallenberg, Dynamic partial self-reconfiguration: Quick modeling, simulation, and synthesis. Germany: Suedwestdeutscher Verlag fuer Hochschulschriften, 2010.
[2] L. A. Cardona and C. Ferrer, “AC_ICAP: A flexible high speed ICAP controller,” International Journal of Reconfigurable Computing, vol. 2015, 2015.
[3] J. Tarrillo, F. A. Escobar, F. L. Kastensmidt, and C. Valderrama, “Dynamic partial reconfiguration manager,” in 2014 IEEE 5th Latin American Symposium on Circuits and Systems (LASCAS), Santiago, Chile, Feb 2014, pp. 1–4.
[4] M. Liu, W. Kuehn, Z. Lu, and A. Jantsch, “Run-time partial reconfiguration speed investigation and architectural design space exploration,” in 2009 International Conference on Field Programmable Logic and Applications, Prague, Czech Republic, Sep 2009, pp. 498–502.
[5] K. Vipin and S. A. Fahmy, “A high speed open source controller for FPGA partial reconfiguration,” in 2012 International Conference on Field-Programmable Technology (FPT), Seoul, Korea, Dec 2012, pp. 61–66.
[6] S. Liu, R. N. Pittman, A. Forin, and J.-L. Gaudiot, “Minimizing the runtime partial reconfiguration overheads in reconfigurable systems,” The Journal of Supercomputing, vol. 61, no. 3, pp. 894–911, Sep 2012.
[7] A. Nabina and J. L. Nunez-Yanez, “Dynamic reconfiguration optimisation with streaming data decompression,” in 2010 International Conference on Field Programmable Logic and Applications, Milan, Italy, Sep 2010, pp. 602–607.
[8] K. Vipin and S. A. Fahmy, “ZyCAP: Efficient partial reconfiguration management on the Xilinx Zynq,” IEEE Embedded Systems Letters, vol. 6, no. 3, pp. 41–44, Sep 2014.
[9] Xilinx, “UG702 (v13.4) partial reconfiguration user guide,” 2012.