TELKOMNIKA, Vol. 11, No. 10, October 2013, pp. 5774 ~ 5781
ISSN: 2302-4046
Received April 23, 2013; Revised July 2, 2013; Accepted July 15, 2013
Efficient Implementation of Decimal Floating Point
Adder in FPGA
Yang Huijing*1, Yu Fan2, Han Dandan3
1School of Software, Harbin University of Science and Technology, Harbin 150040, Heilongjiang, China
2Financial Department, Heilongjiang University, Harbin 150080, Heilongjiang, China
3School of Software, Harbin University of Science and Technology, Harbin 150040, Heilongjiang, China
*Corresponding author, e-mail: 49645521@163.com
Abstract
Decimal floating-point addition is one of the most frequent operations in financial, business and user-oriented applications, but current FPGA implementations are very inefficient in terms of both area and latency when compared to binary floating-point adders. This paper presents an efficient implementation of a new parallel decimal floating-point module on a reconfigurable platform, optimized for both area and performance. The decimal floating-point adder is further pipelined into five stages to increase the maximum operating frequency. Synthesis results for a Stratix IV device indicate that our implementation achieves a 25.1% reduction in latency and a 1.1% reduction in area compared to an existing Altera core adder design, with area and delay figures close to those of optimal binary adder trees.
Keywords: Floating Point, Field Programmable Gate Array, Parallel Adder, Pipeline
Copyright © 2013 Universitas Ahmad Dahlan. All rights reserved.
1. Introduction
Floating-point adders are widely used in a large set of scientific and signal processing computations. However, binary floating-point operations are not suitable for financial and commercial computations, where the binary representation has an innate defect [1]. Decimal numbers in these applications usually must be represented exactly, and arithmetic operations often need to mirror manual decimal calculations, which perform decimal rounding. Most decimal floating-point numbers, however, cannot be represented exactly by a binary weighted series of finite precision, and the error can accumulate over successive calculations. Decimal floating-point operations can be performed in software, but they are 100 to 1000 times slower than binary floating-point arithmetic [2]. Addition is the most basic and most important of the decimal arithmetic operations. In sequential multiplication and digit-recurrence division, the partial products of every iteration are accumulated by adders. Moreover, in parallel multiplication and functional division, the partial products are reduced by adders arranged in a Wallace tree structure. Since an improvement in addition benefits many other decimal operations, many methods and algorithms have been applied to boost the performance of the decimal adder [3] [4].
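The representation problem described above can be observed directly in software. The following minimal sketch uses Python's standard decimal module purely as an illustration; it is not part of the proposed hardware, and the variable names are our own.

```python
from decimal import Decimal

# Ten additions of 0.1 in binary floating point: 0.1 has no exact
# binary representation, so the accumulated sum is not exactly 1.0.
binary_total = sum([0.1] * 10)

# The same accumulation in decimal floating point is exact.
decimal_total = sum([Decimal("0.1")] * 10)

print(binary_total == 1.0)               # False
print(decimal_total == Decimal("1.0"))   # True
```

The software decimal type gives the exact answer, at the speed penalty that motivates hardware decimal adders.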
Decimal arithmetic is complex to implement in hardware because of the larger range of decimal digits ([0, 9]) and the inefficiency of binary codes for representing decimal values, so a decimal floating-point adder cannot achieve better performance or smaller area than a comparable binary floating-point adder. Over the years, several designs for decimal floating-point adders have been proposed for ASIC and FPGA platforms. FPGA implementations are generally based on techniques originally developed for VLSI architectures. The special built-in characteristics of FPGA architectures make it difficult to use many well-known methods for speeding up computations (for example, carry-save and signed-digit arithmetic) [5]. Therefore, beyond adapting existing techniques, we explore new decimal floating-point adder algorithms that are more suitable for FPGAs [6].
This paper presents the algorithm, architecture and FPGA implementation of a novel unit that performs fast decimal floating-point addition. We have designed a parallel decimal floating-point adder that yields an area-efficient implementation on an Altera Stratix IV FPGA.
The structure of the paper is as follows. In Section 2 we discuss some related work. In Section
3, a new decimal floating-point adder is proposed and the resulting combinational and pipelined architectures are introduced. Section 4 presents the synthesis results of an implementation on a Stratix IV FPGA and compares them with the Altera core design. Finally, the conclusions are summarized in Section 5.
2. Related Works
Scientific and engineering applications work on real numbers. Real numbers can be represented in computers in fixed-point form, where the fractional point has a fixed position in the number. This allows the same integer units to be used for real-number computations. However, the range of real numbers in fixed-point representation is very small. The alternative is to represent real numbers in floating-point form, which makes floating-point arithmetic very important. Addition is the most basic and most important of the arithmetic operations.
2.1. Formats of Floating-point Numbers
IEEE 754-1985 is the first IEEE standard for binary floating-point computation. The standard was later revised in 2008 (IEEE 754-2008) to incorporate decimal floating-point computation as well. The 754-1985 standard defines formats for representing floating-point numbers and special values (infinities and NaNs), together with a set of floating-point operations that operate on these values. A floating-point number consists of three fields, as shown in Figure 1: the sign bit, the fraction, and the exponent. Since the significand is normalized, its most significant bit (MSB) must be "1"; it is therefore not explicitly stored and is called a "hidden 1". Only the fraction is explicitly represented [7][8].
Figure 1. Significand and Exponent Representation in Single and Double Precision
The IEEE 754-1985 standard was replaced in 2008 by IEEE 754-2008, which includes the entire original IEEE 754-1985 standard in addition to decimal floating-point computation. The standard defines arithmetic formats for binary and decimal floating-point data, as shown in Table 1. It also defines interchange formats (encodings) for floating-point data [9].
Table 1. The IEEE 754-2008 arithmetic formats
Name     Common Name          Base  Digits  Max. exponent  Min. exponent
bin16    Half precision       2     11      15             -14
bin32    Single precision     2     24      127            -126
bin64    Double precision     2     53      1023           -1022
bin128   Quadruple precision  2     113     16383          -16382
dec32    decimal32            10    7       96             -95
dec64    decimal64            10    16      384            -383
dec128   decimal128           10    34      6144           -6143
2.2. Binary Floating-Point Adder
In general, floating-point arithmetic implementations process the sign, exponent and mantissa separately and then combine them after rounding and normalization [9]. Hardware implementation of floating-point arithmetic is complicated by the normalization requirements. An implementation of a double-precision floating-point adder is presented in [10]. The addition of two floating-point numbers proceeds as follows:
1. Compare the exponents and mantissas of both numbers to determine which operand has the larger exponent and mantissa and which has the smaller.
2. Right shift the mantissa associated with the smaller exponent by the difference of the exponents.
3. Add the two mantissas if the signs are the same; otherwise subtract the smaller mantissa from the larger one.
4. Round the result of the mantissa addition.
5. If the subtraction results in the loss of the most significant bit (MSB), the result must be normalized. To do this, the most significant non-zero bit of the result mantissa must be shifted until it reaches the front. This is accomplished by a “Leading One Detector (LOD)” followed by a shift.
6. Normalize the result and adjust the larger exponent accordingly.
7. The final result consists of the sign of the larger number, the normalized exponent and the mantissa.
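The steps above can be sketched in software on a toy format. The sketch below is illustrative only and is not the implementation of [10]: the tuple layout (sign, exponent, mantissa), the function name fp_add, the default width p = 8, and the use of truncation in place of a real rounding step are all assumptions made for brevity. Inputs are assumed to be normalized, with the top mantissa bit set.

```python
def fp_add(a, b, p=8):
    """Add two toy floats given as (sign, exponent, mantissa).

    Value = (-1)**sign * mantissa * 2**exponent, with a p-bit mantissa
    whose top bit is set (normalized). Rounding is plain truncation.
    """
    # Step 1: pick the operand with the larger magnitude.
    big, small = (a, b) if (a[1], a[2]) >= (b[1], b[2]) else (b, a)
    s_big, e_big, m_big = big
    s_small, e_small, m_small = small

    # Step 2: align the smaller mantissa (shifted-out bits are dropped).
    m_small >>= (e_big - e_small)

    # Step 3: add on equal signs, otherwise subtract smaller from larger.
    m = m_big + m_small if s_big == s_small else m_big - m_small
    e = e_big
    if m == 0:
        return (0, 0, 0)

    # Steps 4-6: renormalize. A carry-out needs one right shift; lost
    # leading bits need a left shift (the job of the leading-one detector).
    while m >= (1 << p):
        m >>= 1
        e += 1
    while m < (1 << (p - 1)):
        m <<= 1
        e -= 1

    # Step 7: the result takes the sign of the larger operand.
    return (s_big, e, m)
```

For example, fp_add((0, 0, 192), (0, -1, 128)) adds 192 and 64 and renormalizes the 9-bit sum 256 to mantissa 128 with exponent 1.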
2.3. Decimal Digit Adder
Financial and business applications use decimal arithmetic and require accurate results. As mentioned above, binary floating-point arithmetic introduces roundoff error, because only finitely many numbers are representable in a given floating-point system. The accumulation of these roundoff errors can produce a result totally different from the one expected. An early solution to this problem was to implement decimal floating-point arithmetic in software. To increase performance, however, decimal floating-point units have more recently been implemented in hardware [11].
A conventional decimal adder is shown in Figure 2. For each decimal digit, it has two 4-bit binary adders with carry detection logic between them. The first-level adders produce the binary sum. If the sum is greater than 9, a carry output is produced and the result of the first-level 4-bit adder is corrected by adding 6. The carry output is also used as the carry input of the next digit. The main disadvantage of the conventional decimal adder is its low speed, because each first-level 4-bit adder must wait for a number of preceding 4-bit additions to complete before it receives the correct carry input [12].
Figure 2. Conventional BCD Adder
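The behaviour of the conventional BCD adder can be modelled digit by digit. The following is a minimal software sketch under our own assumptions: the function name bcd_add and the least-significant-digit-first list layout are illustrative and are not taken from Figure 2.

```python
def bcd_add(a_digits, b_digits):
    """Ripple BCD addition; digit lists are least significant digit first."""
    carry = 0
    result = []
    for da, db in zip(a_digits, b_digits):
        s = da + db + carry            # first-level 4-bit binary addition
        carry = 1 if s > 9 else 0      # carry detection logic
        if carry:
            s = (s + 6) & 0xF          # second-level adder: correct by adding 6
        result.append(s)
    return result, carry


# 58 + 67 = 125: digits [5, 2] (LSD first) with a carry out of 1.
digits, carry_out = bcd_add([8, 5], [7, 6])
```

The loop makes the speed problem visible: each digit's carry depends on the previous iteration, so the delay grows linearly with the number of digits.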
3. Design and Implementation
This section presents our decimal floating-point adder, which uses a parallel method for decimal significand alignment and a Kogge-Stone parallel prefix network for significand addition
and subtraction. The decimal floating-point adder supports all the rounding modes and the appropriate exceptions specified in IEEE 754. Figure 3 shows a high-level block diagram of our proposed decimal floating-point adder.
The ‘Forward Format Conversion Unit’ takes the two IEEE-encoded operands, A and B, and the operation, and produces the sign bits, SA1 and SB1, the BCD significands, CA1 and CB1, the biased exponents, EA1 and EB1, and the effective operation, EOP (not shown in the figure). The ‘Operand Alignment Calculation and Swapping Unit’ (OACSU) takes these values and computes the result’s temporary exponent, ER1, the right shift amount, RSA, and the left shift amount, LSA. It also swaps the significands if EB1 > EA1. The two significands after swapping are denoted as CAS and CBS. Next, two ‘Decimal Barrel Shifters’ take these results and perform operand alignment on CAS and CBS. The two shifted significands, CA2 and CB2, are then corrected in the ‘Pre-correction Unit’. Based on the EOP signal and the prevailing rounding mode, the ‘Pre-correction Unit’ prepares the BCD operands for addition or subtraction and inserts a value needed for injection-based rounding. The corrected significands, CA3 and CB3, are then fed into the Kogge-Stone (K-S) network, which produces an uncorrected result, UCR, a digit-carry vector, C1, and flag vectors, F1 and F2. After this, the ‘Post-correction Unit’ converts UCR back into the BCD encoding to produce CR1. If needed, the ‘Shift and Round Unit’ shifts and rounds CR1 to produce the result’s significand, CR2, and adjusts the temporary exponent, ER1, to produce the result’s exponent, ER2. Simultaneously, the ‘Sign Unit’ and the ‘Overflow Unit’ compute the result’s sign bit, SR1, and the overflow signal. The result’s values, CR2, ER2, and SR1, are combined to generate the IEEE-encoded result in the ‘Backward Format Conversion Unit’. This result and the original input operands are examined in the ‘Post-processing Unit’ to determine if a special result is needed, which happens if either one or both of the operands are Not-a-Number (NaN) or infinity. Further details on each of these units are provided below.
Figure 3. Block Diagram of the Proposed Decimal Floating-Point Adder
The core of our decimal floating-point adder operates on BCD significands. Therefore, converters are first employed to extract the DPD-encoded significands, binary exponents, and sign bits from both IEEE-encoded operands. Once unpacked, the two resulting significands are swapped if EB1 > EA1 and the temporary result exponent, ER1, is determined. The two significands after swapping are denoted as CAS and CBS, where the subscript “S” refers to Swapped. The number of leading zeros in the significand with the larger exponent, CAS, is denoted as LAS. In parallel with swapping the operands, the effective operation (EOP) is determined by the Boolean equation EOP = SA1 ⊕ SB1 ⊕ Operation, where EOP and Operation are zero for addition and one for subtraction.
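As a small illustration, the effective operation can be modelled as the exclusive-OR of the two sign bits and the requested operation. This reading of the Boolean equation is an assumption on our part (the operator symbols are not legible in all copies of the text), and the function name is hypothetical.

```python
def effective_operation(sa1, sb1, operation):
    """Return 0 for an effective addition, 1 for an effective subtraction.

    sa1, sb1: operand sign bits (0 = positive, 1 = negative).
    operation: 0 for a requested addition, 1 for a requested subtraction.
    Assumes EOP is the XOR of the three bits.
    """
    return sa1 ^ sb1 ^ operation
```

For instance, subtracting a negative operand from a positive one (sa1 = 0, sb1 = 1, operation = 1) yields 0, that is, the magnitudes are added.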
Decimal operand alignment is more complex than its binary counterpart because decimal numbers are not normalized. This leads to both left and right shifts to ensure that the rounding location is in a fixed digit position. To correctly adjust both operands to have the same exponent, the following computations are performed:
LSA = min(|EA1 − EB1|, LAS)
RSA = min(max(|EA1 − EB1| − LAS, 0), 19)
ER1 = EAS − LSA
The above equations produce a left shift amount, LSA, which indicates by how many digits CAS should be left shifted. LSA is equal to the absolute value of the exponent difference, |EA1 − EB1|, but is limited to LAS digits so that the left-shifted significand, CA2, does not have more than 16 digits. The RSA value indicates by how many digits CBS should be right shifted in order to guarantee that both numbers have the same exponent, ER1, after significand alignment. RSA is limited to 19 digits, since the right-shifted significand, CB2, contains 16 digits plus guard and round digits and a sticky bit. The temporary result exponent, ER1, is simply the larger exponent, EAS, after it has been adjusted to compensate for the left shift amount, LSA.
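The three alignment quantities can be checked with a few lines of software. This is a sketch of the arithmetic only, not of the hardware unit; the function name alignment_amounts and the max_rsa parameter are our own, and EAS is taken to be the larger of the two exponents, as stated in the text.

```python
def alignment_amounts(ea1, eb1, las, max_rsa=19):
    """Return (LSA, RSA, ER1) for exponents ea1, eb1 and LAS leading zeros."""
    diff = abs(ea1 - eb1)                     # |EA1 - EB1|
    eas = max(ea1, eb1)                       # exponent after swapping
    lsa = min(diff, las)                      # left shift, limited by leading zeros
    rsa = min(max(diff - las, 0), max_rsa)    # remaining distance, capped at 19
    er1 = eas - lsa                           # larger exponent, compensated
    return lsa, rsa, er1
```

With exponents 5 and 2 and two leading zeros in CAS, the call alignment_amounts(5, 2, 2) gives LSA = 2, RSA = 1 and ER1 = 3: the left shift covers two of the three digits of exponent difference and the right shift covers the remaining one.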
After computing the left and right shift amounts, two decimal barrel shifters, which shift by multiples of four bits, perform the operand alignment. The significands after alignment are denoted as CA2 = left shift(CAS, LSA) and CB2 = right shift(CBS, RSA). As noted previously, CA2 is 16 digits, and CB2 is 16 digits plus a guard digit, G, a round digit, R, and a sticky bit, S, as shown in Figure 4.
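Because each BCD digit occupies four bits, a decimal shift by n digits is a binary shift by 4n bits. The sketch below models the two shifters on BCD-packed integers; the function names and the way the sticky bit is returned are illustrative assumptions, not the structure of the hardware barrel shifters.

```python
def decimal_left_shift(cas, lsa):
    """Shift a BCD-packed significand left by lsa digits (4 bits each)."""
    return cas << (4 * lsa)


def decimal_right_shift(cbs, rsa):
    """Shift a BCD-packed significand right by rsa digits.

    Two empty digit positions (guard and round) are appended first, so the
    returned value holds the significand digits followed by G and R.
    The sticky bit is 1 if any non-zero digit was shifted out beyond R.
    """
    wide = cbs << 8                              # make room for G and R
    dropped = wide & ((1 << (4 * rsa)) - 1)      # digits shifted out
    shifted = wide >> (4 * rsa)
    sticky = 1 if dropped else 0
    return shifted, sticky
```

For example, right shifting the BCD value 1234 by three digits leaves the digit 1 in the significand, G = 2, R = 3, and a sticky bit of 1 for the discarded 4.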
Once shifted, an injection value based on the sign bit and the prevailing rounding mode is inserted into the Round and Sticky digit positions of CA2 to form CA_2, which is a 19-digit BCD number. The injection value is determined by equations similar to those developed for binary floating-point addition and is used to facilitate correct rounding.
Figure 4. Operand Placement for Decimal Addition
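The idea behind injection-based rounding is that adding a mode-dependent constant before truncation gives the same answer as rounding afterwards. The sketch below shows the principle on non-negative integers with three simple modes; the mode names, the function name, and the injection constants are our own illustration and are not the injection equations used in the adder.

```python
def round_by_injection(value, drop_digits, mode):
    """Round a non-negative integer by injecting a constant, then truncating.

    value: integer whose lowest drop_digits decimal digits are discarded.
    mode: 'truncate', 'half_up', or 'away_from_zero' (illustrative names).
    """
    unit = 10 ** drop_digits
    injections = {
        "truncate": 0,                # nothing injected
        "half_up": unit // 2,         # 5 in the first discarded digit
        "away_from_zero": unit - 1,   # 9s in all discarded digits
    }
    return (value + injections[mode]) // unit
```

A single addition followed by truncation therefore replaces a separate increment step, which is what makes the method attractive in hardware.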
Because both operands are corrected, a binary Kogge-Stone (K-S) network can be used to generate the proper carry into each digit. Figure 5 illustrates how the original K-S network is extended to detect trailing nines.
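For reference, a plain binary Kogge-Stone prefix computation can be modelled as follows. This sketch covers only the original binary network; the extension for trailing-nine detection shown in Figure 5 is not reproduced, and the function name and default width are assumptions.

```python
def kogge_stone_add(a, b, width=16):
    """Add two unsigned integers with a Kogge-Stone parallel prefix network.

    Returns (sum modulo 2**width, carry out).
    """
    # Bitwise generate and propagate signals.
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]

    # Prefix stages with spans 1, 2, 4, ...: after the last stage,
    # G[i] is the carry out of bit position i.
    G, P = g[:], p[:]
    span = 1
    while span < width:
        G_next, P_next = G[:], P[:]
        for i in range(span, width):
            G_next[i] = G[i] | (P[i] & G[i - span])
            P_next[i] = P[i] & P[i - span]
        G, P = G_next, P_next
        span *= 2

    # The carry into bit i is the carry out of bit i - 1.
    carries = [0] + G[:-1]
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total, G[-1]
```

All carries are available after about log2(width) stages, in contrast to the linear carry chain of a ripple adder.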
The traditional injection-based rounding method uses conditional adders to compute the uncorrected sum and the uncorrected sum plus one, and then uses the MSDs of these values and the carry into the LSD of the uncorrected sum to select the proper sum. To reduce area, our adder instead uses the flagged-prefix method to compute the uncorrected sum and the uncorrected sum plus one.
The temporary result generated by the Kogge-Stone network requires a post-correction unit to convert the uncorrected result, UCR, back to BCD to produce CR1. The rules for performing this correction are defined in Figure 6.
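The interplay of pre-correction, binary addition and post-correction can be illustrated for the addition case with the well-known excess-6 technique. This is a sketch under our own assumptions and does not reproduce the rules of Figure 6: one operand has 6 added to every digit, a plain binary addition follows, and every digit that did not produce a carry has the 6 removed again.

```python
def bcd_add_precorrected(a, b, digits=4):
    """Add two BCD-packed integers using one binary addition.

    Returns (BCD-packed sum, carry out of the most significant digit).
    """
    sixes = int("6" * digits, 16)
    a3 = a + sixes                    # pre-correction: +6 on every digit
    ucr = a3 + b                      # uncorrected binary result
    carry_into = a3 ^ b ^ ucr         # bit k set if a carry entered bit k

    result = 0
    for i in range(digits):
        digit = (ucr >> (4 * i)) & 0xF
        carry_out = (carry_into >> (4 * (i + 1))) & 1
        if not carry_out:
            digit = (digit - 6) & 0xF    # post-correction: remove the excess 6
        result |= digit << (4 * i)
    return result, (ucr >> (4 * digits)) & 1
```

Adding 6 makes a decimal carry (digit sum above 9) coincide with a binary carry out of the 4-bit group, which is why an ordinary binary carry network can then be used.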
Overflow occurs when the addition or subtraction of two operands exceeds the maximum representable value in the destination format. Typically, the adder needs to check the carry from the MSD after incrementing the corrected result to see if an overflow occurs. With the injection-based rounding method, however, since the injection correction value does not generate another carry from the MSD, the overflow signal can be generated by examining the final exponent from the operand alignment unit, ER1, and the MSD of the corrected result, CR1.
Figure 5. Conceptual View of the Flag-based Logic and the Kogge-Stone Network.
Figure 6. The rules for performing this correction
The ‘Overflow Unit’ also generates a signal to determine if the final result should be infinity or the maximum representable value of the destination format, based on the rounding mode and the sign of the result. Using this signal and the overflow flag, the final result can be modified, if needed, in the ‘Post Processing Unit’.
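The choice between infinity and the largest finite value follows the overflow rules of IEEE 754 for each rounding direction. The sketch below encodes those rules; the function name, the mode strings and the return labels are hypothetical and do not correspond to signal names in the design.

```python
def overflow_result(rounding_mode, sign):
    """Select the overflowed result: 'infinity' or 'max_finite'.

    sign: 0 for a positive result, 1 for a negative result.
    The returned magnitude carries the sign of the result.
    """
    if rounding_mode in ("ties_to_even", "ties_to_away"):
        return "infinity"
    if rounding_mode == "toward_zero":
        return "max_finite"
    if rounding_mode == "toward_positive":
        # Positive overflow goes to +infinity; negative stays finite.
        return "infinity" if sign == 0 else "max_finite"
    if rounding_mode == "toward_negative":
        # Negative overflow goes to -infinity; positive stays finite.
        return "infinity" if sign == 1 else "max_finite"
    raise ValueError("unknown rounding mode: " + rounding_mode)
```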
We have fully pipelined the combinational architecture as follows: each level of the adder tree is placed in a pipeline stage. In addition, each carry-ripple adder is pipelined in chunks of at most k bits. The total number of pipeline stages is equal to [(4p+l)/k] + [log2(m)]. A significant number of registers is required for input synchronization. To reduce the hardware cost, these synchronization registers are placed in the first pipeline level of the tree and packed together as 16-bit shift register LUTs.
4. Experiments and Results Comparison
The three decimal floating-point adders were described in Verilog HDL and implemented on an EP4SGX230, a Stratix IV FPGA. Stratix FPGAs have HardCopy ASIC equivalent devices. HardCopy ASICs provide a path to low-cost volume production with low risk through FPGA prototyping of the design. Stratix series FPGAs are also
ideal for the prototyping and verification of standard-cell ASICs. Table 2 compares the critical path latency and the total number of LUTs of the three designs. As shown in Figure 7, the proposed DFP adder has about 19.2 percent less delay and 11.53 percent fewer LUTs than the design presented in [2], and about 9.76 percent less delay and 8.85 percent fewer LUTs than the design presented in [3].
Figure 7. The three adders performance comparison
Furthermore, we compare the proposed module with the Altera core adder. Implementation results are shown in Figure 8. The proposed design has been implemented for various latencies, and data for the Altera core adder are shown for the latencies it offers, to put the proposed design in context. The proposed design uses approximately the same hardware (in terms of LUT and FF counts) as the Altera module but runs faster at similar latencies. For a latency of 12, the proposed design achieves 358 MHz compared with 286 MHz for the Altera core, a significant performance improvement.
Figure 8. The decimal floating point adder on Stratix IV
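To relate the two clock rates quoted for a latency of 12, the short calculation below converts them to clock periods and relative differences. It uses only the 358 MHz and 286 MHz figures from the text; the variable names are our own, and the percentages reported elsewhere in the paper are not re-derived here.

```python
f_proposed_mhz = 358.0   # proposed design, latency of 12
f_altera_mhz = 286.0     # Altera core, latency of 12

# Clock period in nanoseconds is 1000 / frequency in MHz.
period_proposed_ns = 1000.0 / f_proposed_mhz
period_altera_ns = 1000.0 / f_altera_mhz

# Relative gain in clock frequency and relative cut in clock period.
freq_gain_pct = 100.0 * (f_proposed_mhz - f_altera_mhz) / f_altera_mhz
period_cut_pct = 100.0 * (1.0 - period_proposed_ns / period_altera_ns)

print(round(period_proposed_ns, 3), round(period_altera_ns, 3))
print(round(freq_gain_pct, 2), round(period_cut_pct, 2))
```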
5. Conclusion
This paper has presented an efficient implementation of a new parallel decimal floating-point module on an FPGA. We described in detail several novel components of the design, and we provided a detailed analysis of our synthesis results and a comparison with an Altera core adder design and two other decimal floating-point adders. Implementation results show that the proposed adder has 25.1% less latency and 1.2% fewer LUTs than the Altera core design. The proposed adder also performs better than the other two decimal floating-point adders.
Acknowledgement
This work was supported by F201232.
References
[1] Asger Munk Nielsen, David W. Matula, Chung Nan Lyu, Guy Even. An IEEE compliant floating-point adder that conforms with the pipeline packet-forwarding paradigm. IEEE Transactions on Computers. 2000; 49: 33-47.
[2] J Moskal, E Oruklu, and J Saniie. Design and Synthesis of a Carry-Free Signed-Digit Decimal Adder. Proceedings of the IEEE Symposium on Circuits and Systems. 2007; 1089-1092.
[3] K Yehia, HAH Fahmy, and M Hassan. A Redundant Decimal Floating-Point Adder. Proceedings of Asilomar Conference on Signals, Systems & Computers. 2010; 1144-1147.
[4] Amir Kaivani and Ghassem Jaberipur. Fully redundant decimal addition and subtraction using stored-unibit encoding. Integration, the VLSI journal. 2010; 43(1): 34-41.
[5] EM Schwarz, JS Kapernick, and MF Cowlishaw. Decimal floating-point support on the IBM z10 processor. IBM Journal of Research and Development. 2009; 53: 231-239.
[6] J Thompson, N Karra, and MJ Schulte. A 64-bit decimal floating-point adder. Proceedings of the IEEE Computer Society Annual Symposium on VLSI. 2004; 297-298.
[7] Liang-Kai Wang, MJ Schulte, JD Thompson, and N Jairam. Hardware Designs for Decimal Floating-Point Addition and Related Operation. IEEE Transactions on Computers. 2009; 58: 322-335.
[8] G Even and PM Seidel. A Comparison of Three Rounding Algorithms for IEEE Floating-Point Multiplication. IEEE Trans. Computers. 2000; 49(7): 638-650.
[9] A Vazquez, E Antelo and P Montuschi. Improved Design of High-Performance Parallel Decimal Multipliers. IEEE Transactions on Computers. 2010; 59(5): 679-693.
[10] Ghassem Jaberipur and Saeid Gorgin. A Nonspeculative Maximally Redundant Signed Digit Adder. Proceedings of The 13th international CSI Computer Conference. 2008; 235-242.
[11] Sonia Gonzalez-Navarro, Javier Hormigo, Michael J. Schulte. A study of decimal left shifters for binary numbers. Information and Computation. 2012; 216: 47-56.
[12] Saeid Gorgin, Ghassem Jaberipur. A fully redundant decimal adder and its application in parallel decimal multipliers. Microelectronics Journal. 2009; 40(10): 1471-1481.