Internati
o
nal
Journal of Ele
c
trical
and Computer
Engineering
(IJE
CE)
V
o
l.
4, N
o
. 4
,
A
ugu
st
2014
, pp
. 54
8
~
55
6
I
S
SN
: 208
8-8
7
0
8
5
48
Jo
urn
a
l
h
o
me
pa
ge
: h
ttp
://iaesjo
u
r
na
l.com/
o
n
lin
e/ind
e
x.ph
p
/
IJECE
Strategies for FPGA Implementa
tion of Non-Restoring Square
Root Algorithm
Tole Su
tikn
o
1
,
Aim
a
n Z
a
kw
an
Jidin
2
,
Auz
a
ni
Jidin
3
,
Nik
Rumz
i Nik
Id
ris
4
1
Universitas Ah
mad Dahlan
(UAD), Yog
y
akarta, Indon
esia
2,3
Uni
v
e
r
si
ti
Tekni
ka
l Ma
l
a
y
s
ia
Me
la
ka (UTe
M),
Me
laka, Malaysia
4
Universiti Tekn
ologi
Malay
s
ia (
U
TM), Johor
Bahru, Malay
s
ia
Article Info
A
B
STRAC
T
Article histo
r
y:
Received J
u
n
2, 2014
Rev
i
sed
Ju
l 10
,
20
14
Accepte
d
J
u
l 26, 2014
This pap
e
r pres
ents thr
ee str
a
tegies
to
implement non r
e
storing
square roo
t
algorithm b
a
sed
on FPGA. A
new basic build
in
g block
is c
a
ll
e
d
control
l
ed
subtract-multiplex (CSM) is introduced in
first strateg
y
which use gate lev
e
l
abstraction. Th
e main prin
ciple
of th
e method
is similar with
convention
a
l
non-restoring algorithm,
but
it
only
uses subtract op
eration
and
append
01,
while add op
eration and app
e
nd
11 is not
used. Second strateg
y
p
r
esents th
e
first strateg
y
in
register tr
ansfer leve
l (RTL) abst
ract
ion. In third
strateg
y
,
a
modification for
the
impl
ementation of conv
entio
nal non-r
e
storin
g algorithm
is presented wh
ich also use R
TL abstra
ction.
The all above s
t
rategies is
implemented in
VHDL programming and a
dopt fully
p
i
pelined
architectur
e.
The strategies h
a
ve condu
cted to implement succe
ssfully
in FPGA hardware,
and e
ach o
f
th
e s
t
ra
tegi
es
is
offer an
eff
i
ci
e
n
t in h
a
rdware
res
ource
. In
generally
,
th
e
third strateg
y
is superior.
Keyword:
FPGA
No
n-
rest
ori
n
g al
go
ri
t
h
m
Pipelined arc
h
i
t
ecture
Sq
uare
r
oot
cal
cul
a
t
i
o
n
Copyright ©
201
4 Institut
e
o
f
Ad
vanced
Engin
eer
ing and S
c
i
e
nce.
All rights re
se
rve
d
.
Co
rresp
ond
i
ng
Autho
r
:
To
le Su
tikn
o
Depa
rtem
ent of Electrical E
n
ginee
r
ing
Uni
v
ersitas A
h
m
a
d
Da
hlan
Kam
pus
3, Jl
n.
Pr
of
. S
o
e
pom
o, J
a
nt
ura
n
,
U
m
bul
Har
j
o
,
Y
ogy
a
k
art
a
5
5
1
6
4
,
In
d
onesi
a
Em
a
il: to
le@ee.u
a
d
.
ac.i
d
1.
INTRODUCTION
Sq
uare r
o
ot
cal
cul
a
t
i
on i
s
o
n
e o
f
t
h
e m
o
st
useful
a
nd
v
i
t
a
l
operat
i
on
i
n
com
put
er g
r
ap
hi
cs an
d
scien
tific calcu
latio
n
ap
p
licatio
n
s
, su
ch
as d
i
g
ital sig
n
a
l
pr
o
cessi
ng (
D
S
P
)
al
go
ri
t
h
m
s
,
m
a
t
h
cop
r
oc
esso
r,
dat
a
pr
ocessi
ng a
n
d
cont
r
o
l
,
an
d e
v
en m
u
l
t
i
m
e
dia appl
i
cat
i
ons
[1
-6]
.
It
i
s
a cl
assi
cal
pro
b
l
e
m
i
n
co
m
put
at
i
ona
l
n
u
m
b
e
r t
h
eory
an
d often en
cou
n
t
ered,
wh
ich is a
h
a
rd
task
t
o
g
e
t an ex
act resu
lt [7
,
8
]
.
Som
e
square
ro
ot
cal
cul
a
t
i
on a
p
p
r
oach
has h
a
ve
be
en st
u
d
i
e
d
,
s
u
ch a
s
R
o
ug
h est
i
m
at
i
o
n
,
Bab
y
lo
n
i
an
meth
od
, expo
n
e
n
tial id
en
tity, Taylo
r
-series exp
a
n
s
ion
al
g
o
rith
m
,
Newt
o
n
-Raph
s
on
meth
od,
Swee
ney
R
o
b
e
rt
so
n T
o
c
h
er
re
du
n
d
ant
a
n
d
no
n
re
du
n
d
a
n
t
m
e
t
hod,
res
t
ori
n
g a
n
d
no
n-
rest
o
r
i
n
g al
g
o
ri
t
h
m
(di
g
i
t
-
by
-
d
i
g
i
t
m
e
t
hod)
[
1
-
9
]
.
Ho
weve
r, t
h
e earl
y
p
r
oce
ssor
s
car
ry
ou
t
t
h
e squa
re r
oot
ope
rat
i
o
n of t
h
e
al
go
ri
t
h
m
s
abo
v
e
by
s
o
ft
wa
re
m
eans,
whi
c
h
have
l
o
ng
del
a
y
s
fo
r i
t
s
c
o
m
p
l
e
t
i
on [
6
]
.
W
i
t
h
th
e
rap
i
d ad
v
a
n
cem
en
t o
f
tech
no
log
y
wh
ich
is po
ssi
b
l
e to
in
teg
r
at
e larg
e circu
its o
n
a
sing
le
chip a
nd incre
a
se in dem
a
nd
for fast
er c
o
mputational
e
x
ec
ution tim
e, hard
wa
re realize
of s
q
uare root becam
e
m
o
re attractive [6].
Unfortunately b
ecause
of the
com
p
lexity of the
s
q
ua
re
root algorithm
s
, the s
qua
re root
calcu
latio
n
is
no
t easy to im
p
l
e
m
en
t o
n
field
p
r
og
ra
m
m
able
array
(FP
G
A
)
t
echn
o
lo
gy
[1
, 3, 5,
1
0
]
.
There a
r
e s
o
m
e
al
go
ri
t
h
m
s
of sq
uare
r
oot
whi
c
h are i
m
pl
em
ent
e
d on
F
P
G
A
. T
h
ey
ar
e
ge
neral
l
y
gr
o
upe
d i
n
t
o
t
w
o
di
st
i
n
ct
cat
ego
r
i
e
s.
In first categ
ory is called
esti
m
a
tio
n
m
e
th
o
d
s, su
ch as Roug
h
estimatio
n
and
Ne
wt
o
n
-
R
a
ph
so
n m
e
t
h
o
d
(an
d
al
s
o
i
t
s
de
ri
vat
i
o
ns:
C
O
R
D
IC
,
De
Lu
gi
sh'
s
an
d
C
h
en'
s
),
an
d i
n
sec
o
nd
cat
ego
r
y
i
s
call
e
d di
gi
t
-
by
-di
g
i
t
m
e
t
hod. T
h
e rest
ori
ng al
g
o
ri
t
h
m
has a bi
g l
i
m
i
t
a
t
i
on at rest
ori
n
g st
ep
i
n
t
h
e
regular
flow. P
r
im
arily for t
h
i
s
reas
on, although i
n
itially having led the
way for all the
othe
r m
e
thods
, it has
Evaluation Warning : The document was created with Spire.PDF for Python.
I
J
ECE
I
S
SN
:
208
8-8
7
0
8
St
rat
e
gi
es f
o
r
FPGA
I
m
pl
e
m
ent
a
t
i
o
n
of
N
o
n-Re
st
ori
n
g
S
q
uar
e R
oot
Al
g
o
r
i
t
h
m (
T
ol
e S
u
t
i
kno)
54
9
decl
i
n
ed
i
n
i
m
po
rt
ance
an
d
n
o
wa
day
s
i
t
i
s
n
o
l
o
n
g
er
use
d
[
11]
.
The
n
o
n
r
e
st
ori
n
g al
go
ri
t
h
m
does
not
re
st
or
e
th
e rem
a
in
d
e
r, wh
ich can b
e
im
p
l
e
m
en
ted
with
fewest
h
a
rd
ware
re
so
u
r
ce.
It is m
o
st suitable
f
o
r
FPG
A
im
ple
m
entatio
n a
n
d allows for IEEE sta
nda
rd
r
oundi
ng to
be rea
d
ily im
ple
m
ented [1-3, 6].
There a
r
e m
a
ny
st
rat
e
gi
es or
archi
t
ect
u
r
es h
a
ve co
n
duct
e
d t
o
im
pl
em
ent
the n
on
rest
o
r
i
n
g di
gi
t
-
by
-
di
gi
t
s
qua
re ro
ot
al
g
o
ri
t
h
m
i
n
FP
GA ha
rd
wa
re. Yam
i
n
and Wanm
i
ng [1
, 2
,
9]
have
i
n
t
r
o
d
u
ced
a no
n rest
ori
n
g
alg
o
rith
m
with
fu
lly p
i
p
e
lin
ed
and
iterativ
e v
e
rsion
th
at req
u
i
res n
e
ith
er
m
u
ltip
liers n
o
r
m
u
ltip
lex
o
r
s. Th
ey
in
tr
odu
ced th
e
car
r
y
sav
e
adder
(
C
SA)
an
d
car
r
y
p
r
op
ag
at
e add
e
r (
C
PA
) as
b
a
sic bu
ildin
g
b
l
o
c
k
s
.
A
lth
ough
t
h
e al
go
ri
t
h
m
s
i
n
[1,
2]
hav
e
a speed
pr
o
cessi
ng
, t
h
ey
con
s
um
es t
oo m
a
ny
hard
wa
r
e
reso
urce
, w
h
i
l
e
t
h
e
alg
o
rith
m
s
in
[9
] alth
ou
gh
it
co
st less resource,
b
u
t
it has
low s
p
ee
d. T
h
e sim
ilar architectures a
b
ove
have
i
n
t
r
o
d
u
ced
by
Xi
aol
i
a
n
g
[
1
0]
, Tha
k
kar
[1
2]
and
Xi
um
i
n
et
al
[1
3]
. I
n
t
h
e
ot
he
r st
u
d
y
,
S
a
m
a
wi
et
al
[6]
have
i
n
t
r
o
d
u
ced
co
nt
r
o
l
l
e
d a
d
d
-
s
u
b
(C
AS)
as
basi
c
bui
l
d
i
n
g
bl
oc
ks.
T
h
e
eff
o
rt
i
s
d
one
t
o
re
d
u
ce
ha
rd
war
e
con
s
um
ed,
wi
t
h
m
oderat
e
del
a
y
.
The
ot
her
a
r
chi
t
ect
u
r
e
also
h
a
s
propo
sed is fu
lly co
m
b
in
atio
n
a
l
arch
itectu
r
e
[4].
Howe
ver, the FPGA is
very s
u
itable for adoption
of t
h
e fully pipe
lined
a
r
c
h
itecture becaus
e
of
t
h
e
ch
aracteristics o
f
its stru
cture. Hen
ce, the v
e
ry litt
le o
r
ev
en
n
e
ed
less ex
tra co
st, if th
e p
i
p
e
lin
e techn
o
l
o
g
y
is
im
pl
em
ent
e
d i
n
F
P
G
A
[1
4]
.
Thi
s
pa
pe
r p
r
esent
s
t
h
ree st
rat
e
gi
es t
o
i
m
pl
em
ent
non
r
e
st
ori
n
g s
q
uar
e
ro
ot
al
g
o
ri
t
h
m
based
o
n
FPGA
wh
ich
ad
op
t
fu
lly p
i
p
e
lin
ed
arch
itectu
r
e. Th
e
first
s
t
rat
e
gy
use
gat
e
l
e
vel
ab
st
ract
i
on
w
h
i
c
h i
n
t
r
od
uce
C
S
M
as a basi
c bui
l
d
i
n
g bl
o
c
k. T
h
e m
a
i
n
pri
n
ci
pl
e o
f
t
h
e fi
rst
m
e
t
hod
i
s
onl
y
us
es s
u
bt
ract
o
p
e
r
at
i
o
n an
d
appe
n
d
0
1
,
w
h
i
l
e
add
o
p
era
t
i
on an
d a
ppe
nd
1
1
i
s
n
o
t
u
s
ed. Sec
o
n
d
st
rat
e
gy
p
r
ese
n
t
s
t
h
e fi
rst
st
ra
t
e
gy
i
n
reg
i
ster tran
sfer lev
e
l
(RTL) ab
straction
,
an
d in th
ird
strateg
y
, a m
o
d
i
ficatio
n
fo
r the im
p
l
e
m
en
tat
i
o
n
of
con
v
e
n
t
i
onal
n
o
n
-re
st
ori
ng al
go
ri
t
h
m
i
s
present
e
d
whi
c
h al
so use RTL abstraction.
In
the th
ree strateg
i
es will
needs fe
wer
pipeline stages
com
p
ar
ed
wi
th
th
e propo
sed
algo
rith
m
in
[12
]
.
Nex
t
, th
e p
e
rfo
rm
a
n
ce of
d
e
v
e
l
o
p
e
d
syste
m
s will b
e
com
p
ared
to
Samawi et al [6
].
2.
D
I
GIT-
BY-
D
IGIT CA
LCULA
T
ION
M
ETHOD
In di
gi
t
-
by
-di
g
i
t
cal
cul
a
t
i
on m
e
t
hod, eac
h
di
gi
t
of t
h
e s
q
u
a
re ro
ot
i
s
fo
u
nd i
n
a se
que
n
ce whe
r
e o
n
l
y
one
digit of the squa
re root is gene
rated at
each iterati
on
[2,
6, 13]. It ha
s several a
dva
ntages
, suc
h
as
: every
d
i
g
it o
f
t
h
e ro
ot fo
und
is kn
own
t
o
b
e
co
rrect an
d
it will
n
o
t
h
a
s to
b
e
chan
g
e
d
later; if
th
e squ
a
re
roo
t
h
a
s to
b
e
exp
a
n
d
e
d
,
i
t
will b
e
termi
n
ated
after th
e
last d
i
g
it
is foun
d
;
an
d
t
h
e algo
rith
m
wo
rk
s
fo
r an
y nu
m
b
er b
a
se
(o
f c
o
u
r
se t
h
e
pr
ocess
de
pe
nd
s o
n
num
ber
ba
se).
In
ge
neral
,
t
h
i
s
m
e
t
hod ca
n b
e
di
vi
de
d i
n
t
w
o cl
asse
s, i
.
e.
rest
o
r
i
n
g an
d
no
n r
e
st
o
r
i
n
g
di
gi
t
-
by
-di
g
i
t
al
go
ri
t
h
m
[6]
.
In
rest
ori
n
g al
go
ri
t
h
m
,
t
h
e p
r
oce
d
ure i
s
co
m
posed
by
t
a
k
i
ng t
h
e s
q
uare
r
oot
o
b
t
a
i
n
ed
so
far
,
appe
n
d
i
n
g 0
1
t
o
i
t
and
su
bt
ra
ct
i
ng i
t
,
p
r
o
p
er
l
y
shi
f
t
e
d,
fr
o
m
t
h
e curre
nt
r
e
m
a
i
nder.
The
0 i
n
01 c
o
rres
p
on
ds t
o
m
u
l
tip
lyin
g
b
y
2
;
t
h
e
1
is a
n
e
w gu
ess
b
it. Th
e
n
e
w roo
t
b
it
d
e
v
e
l
o
p
e
d is 1, i
f
th
e resu
ltin
g rem
a
in
d
e
r i
s
p
o
s
itiv
e, else i
t
is 0
,
wh
ich
th
e rem
a
in
d
e
r m
u
st b
e
resto
r
ed
b
y
add
i
ng
th
e
qu
an
tity ju
st sub
t
racted
.
It is
di
ffe
re
nt
fr
om
t
h
e no
n rest
o
r
i
n
g al
go
ri
t
h
m
where t
h
e s
u
b
t
raction
is n
o
t resto
r
ed
if th
e resu
lt is n
e
g
a
tiv
e.
In
stead
,
it ap
pen
d
s 1
1
to
th
e ro
o
t
d
e
v
e
l
o
p
e
d
so
far and
on
th
e n
e
x
t
iteratio
n
it p
e
rfo
r
m
s
an
ad
d
itio
n. If the
ad
d
ition
cau
s
es an
ov
erflow, t
h
en on
t
h
e
n
e
x
t
iteratio
n
it
h
a
s to
g
o
b
a
ck
t
o
t
h
e su
b
t
ractio
n
m
o
d
e
[15
]
. Fi
gu
re
1
(
a
)
and
(
b
)
g
i
ves an ex
am
p
l
e
o
n
how
tak
e
the b
i
nar
y
squ
a
re ro
o
t
of
0
1011
101
(
e
qu
iv
alen
t w
ith
9
3
d
e
ci
m
a
l)
f
o
r
r
e
stor
ing
an
d non
r
e
stor
ing
algor
ith
m
r
e
sp
ectiv
ely.
The c
o
nve
nt
i
o
nal
m
e
t
hod i
s
s
h
o
w
n i
n
Fi
g
u
r
e
1(a
)
w
h
ereas
t
h
e m
odi
fi
cat
i
o
n
i
s
sh
o
w
n
i
n
Fi
g
u
re
1
(
b
)
.
In t
h
i
s
m
odi
fi
c
a
t
i
on,
onl
y
su
b
t
ract
ope
rat
i
on
wi
t
h
ap
pe
nd
0
1
i
s
use
d
;
add
ope
rat
i
o
n an
d
appe
n
d
1
1
i
s
n
o
t
use
d
.
Thi
s
pape
r a
d
o
p
t
s
t
h
i
s
m
odi
fi
cat
i
on t
o
i
m
pl
em
ent
un
si
g
n
ed
32
an
d
6
4
-
b
i
t
b
i
nary
s
qua
re
ro
ot
ba
sed
o
n
FP
GA
.
3.
THE PROPOSED ST
RATEGIES FOR
FPGA IM
PL
EMENT
A
TION OF NON-RESTORING
SQUAR
E ROOT A
L
GORITHM
2.
1.
First Strate
gy
The fi
rst
st
rat
e
gy
of
fer
s
a si
m
p
l
e
al
t
e
rnat
i
v
e sol
u
t
i
o
n. S
a
m
a
vi
, et
al
[6]
has i
m
prove
d
cl
assi
cal
non
-
restoring
d
i
g
it-b
y
-d
ig
it sq
u
a
re roo
t
circu
it by eli
m
in
at
e red
und
an
t b
l
o
c
ks wh
ich
still b
a
sed
on
con
s
tan
t
d
i
g
i
t
of
0
1
or
1
1
a
n
d a
d
d
-
s
ubt
ract
as t
h
e m
a
i
n
b
u
i
l
d
i
n
g
bl
ock
.
T
h
e
fi
rst
st
rat
e
g
y
of
fers
a si
m
p
l
e
st
rat
e
gy
whi
l
e onl
y
uses
su
bt
ract
o
p
erat
i
o
n a
n
d a
ppe
n
d
s
0
1
.
Th
e st
rat
e
gy
i
s
i
m
pl
em
ent
e
d b
y
VH
DL
p
r
og
r
a
m
m
i
ng i
n
gat
e
l
e
ve
l
abstraction.
A
har
d
ware i
m
pl
em
ent
a
t
i
on o
f
t
h
e
n
o
n
-
r
es
t
o
ri
n
g
di
gi
t
-
by
-di
g
i
t
al
go
ri
t
h
m
for u
n
si
gne
d
6-
bi
t
sq
ua
re
ro
ot
by
a
n
a
r
r
a
y
st
ruct
u
r
e i
s
sh
ow
n i
n
Fi
g
u
re
2.
Th
e r
a
dican
d
is
P (P5,P4
,P3,P2
,
P
1
,
P0
),
U (U
2,
U1
,U
0)
as
qu
ot
i
e
nt
a
n
d R
(R
4,R
3
,R
2,R
1
,
R
0) a
s
rem
a
i
n
d
e
r.
Evaluation Warning : The document was created with Spire.PDF for Python.
I
S
SN
:
2
088
-87
08
I
J
ECE
Vo
l. 4
,
N
o
. 4
,
Au
gu
st 2
014
:
54
8
–
55
6
55
0
(a)
(b
)
Figure 1
.
Th
e ex
am
p
l
e o
f
d
i
g
i
t
-
b
y
-d
ig
it calcu
latio
n
to so
lv
e sq
u
a
re
roo
t: (a)
restoring
algo
ri
th
m; (b
) non
r
e
stor
ing
algo
r
i
th
m
Fi
gu
re
2.
A
si
m
p
l
e
hard
ware
im
pl
em
ent
a
t
i
on
of
t
h
e
n
o
n
-re
st
ori
n
g
di
gi
t
-
b
y
-
di
gi
t
al
g
o
r
i
t
h
m
for u
n
si
gne
d
6
-
bi
t
squ
a
re r
oot
Evaluation Warning : The document was created with Spire.PDF for Python.
I
J
ECE
I
S
SN
:
208
8-8
7
0
8
St
rat
e
gi
es f
o
r
FPGA
I
m
pl
e
m
ent
a
t
i
o
n
of
N
o
n-Re
st
ori
n
g
S
q
uar
e R
oot
Al
g
o
r
i
t
h
m (
T
ol
e S
u
t
i
kno)
55
1
It
can
be s
h
ow
n t
h
at
t
h
e
i
m
plem
ent
a
t
i
on ne
eds
3 st
age
pi
p
e
l
i
n
es. T
h
e
bas
i
c bui
l
d
i
n
g
bl
o
c
ks
of t
h
e ar
ra
y
are
b
l
o
c
k
s
called
as co
n
t
ro
lled
su
b
t
ract-m
u
ltip
l
e
x
(CSM
). Fi
gu
re
3
presen
t th
e d
e
tails
of a CSM. I
npu
t of th
e
bui
l
d
i
n
g
bl
oc
k i
s
x,y
,
b
a
n
d u, and
as
a
n
o
u
t
p
ut
i
s
bo
(b
o
r
r
o
w) an
d d res
u
l
t
)
. If
u=
0,
t
h
e
n
d<=x
-y
-
b
el
se d<=x
.
Fi
gu
re
3.
I
n
t
e
r
n
al
st
r
u
ct
u
r
e
of
a C
S
M
bl
oc
k
In t
h
e fi
r
s
t
st
rat
e
gy
, t
o
o
p
t
i
m
i
ze hardwa
r
e
reso
u
r
ce ut
i
l
i
zat
i
on of t
h
e im
pl
em
ent
a
ti
on a
b
o
v
e
,
sp
ecialized
en
t
ities can
b
e
created
as bu
ild
i
n
g
b
l
o
c
k co
m
p
on
en
ts. It
will el
i
m
in
ate circu
itry th
at is no
t need
ed.
As exam
ple, to optimize the
im
ple
m
ent
a
tion
of
un
si
g
n
e
d
6
-
bi
t
sq
uare
ro
ot
can be
opt
i
m
i
zed becom
e
as
sh
own
i
n
Fi
g
u
re
4
.
Th
e sp
ecialized
en
tities A, B, C, D an
d E are m
i
n
i
mized
CSM wh
en inp
u
t
ybu=100
,
yu=00
,
u=0
,
yu=10
, and
y=
0
resp
ectiv
ely, and
th
e rem
a
in
d
e
r is igno
red
.
Fig
u
re
4
.
Op
timized
si
m
p
le hardware im
p
l
e
m
en
tati
on
of t
h
e n
o
n
-re
st
ori
n
g
di
gi
t
-
by
-
d
i
g
i
t
al
go
ri
t
h
m
for
unsi
g
ned
6
-
bi
t
squ
a
re r
oot
The
gene
ral
i
zat
i
on
of t
h
e
n
o
n
-
r
est
o
ri
n
g
di
g
i
t
-
by
-di
g
i
t
al
g
o
ri
t
h
m
for
u
n
s
i
gne
d
n-
bi
t
sq
uare
ro
ot
i
s
s
h
ow
n i
n
Fi
gu
re 5.
Evaluation Warning : The document was created with Spire.PDF for Python.
I
S
SN
:
2
088
-87
08
I
J
ECE
Vo
l. 4
,
N
o
. 4
,
Au
gu
st 2
014
:
54
8
–
55
6
55
2
Fig
u
re
5
.
Op
timized
si
m
p
le hardware im
p
l
e
m
en
tati
on
of t
h
e n
o
n
-re
st
ori
n
g
di
gi
t
-
by
-
d
i
g
i
t
al
go
ri
t
h
m
for
unsi
g
ned
n
-
bi
t
squ
a
re r
oot
2.
2.
S
eco
nd
St
ra
t
e
g
y
Th
e secon
d
strateg
y
also
o
f
fers a sim
p
le altern
ativ
e so
l
u
tio
n as t
h
e fi
rst strateg
y
th
at
it o
n
l
y
u
s
es
subt
ract
s o
p
er
at
i
on an
d a
ppe
nds
0
1
. B
u
t
,
t
h
e sec
o
n
d
st
ra
t
e
gy
prese
n
t
s
t
h
e fi
rst
st
r
a
t
e
g
y
i
n
regi
st
er t
r
ansf
e
r
l
e
vel
(R
TL
) a
b
st
ract
i
on.
T
h
e
pri
n
ci
pl
e
of t
h
e
seco
n
d
pr
op
os
ed al
g
o
r
i
t
h
m
can
be
descri
be
d as
f
o
l
l
o
w:
Step 0.
Start
Step 1.
Initialization radicand (the n-bit number will be squared root),
quotient (the result of squared root), and remainder. To calculate
square root of a 2n bit number, it needs n stage pipelines to
implement the proposed algorithm.
Step 2.
Beginning at the binary point, divide the radicand into groups of two
digits in both direction.
Step 3.
Beginning on the left (most significant bit), select the first group
of one or two digit (If n is odd then the first groups is one digit,
and vice versa)
Step 4.
Choose 1 squared, and then subtract.
Fist developed root is “1” if the result of subtract is positive, and
vice versa is “0”
Step 5.
Shift two bits, subtract guess squared with append 01.
Nth-bit squared is “1” if the result of subtract is positive, and
Because of subtract operation is done
else
Nth-bit squared is “0”, and not subtract
Step 6.
Go to step 5 until end group of two digits
Step 7.
End
2.
3.
Third Str
a
tegy
Th
e t
h
ird strat
e
g
y
is th
e m
o
d
i
ficatio
n of
Sa
m
a
wi et
al [6
]. Th
e strateg
y
is also im
p
l
e
m
en
ted
in
register transfe
r
level
(RTL)
abstraction as
the sec
o
nd
strateg
y
. Th
e basic m
o
d
i
ficatio
n o
f
th
e t
h
ird
st
rateg
y
can be descri
b
e
d
as
f
o
l
l
o
w:
1
r
o
(n/2 + 2 bit)
0
q
o
(n/2 + 1 bit)
the radicand
0
1
2
n
1
n
D
D
...
D
D
D
Evaluation Warning : The document was created with Spire.PDF for Python.
I
J
ECE
I
S
SN
:
208
8-8
7
0
8
St
rat
e
gi
es f
o
r
FPGA
I
m
pl
e
m
ent
a
t
i
o
n
of
N
o
n-Re
st
ori
n
g
S
q
uar
e R
oot
Al
g
o
r
i
t
h
m (
T
ol
e S
u
t
i
kno)
55
3
For
0
i
to
1
2
n
do:
If
0
r
i
then
1
q
4
D
D
r
4
r
2
)
i
2
n
(
1
)
i
2
n
(
i
1
i
else
3
q
4
D
D
r
4
r
2
)
i
2
n
(
1
)
i
2
n
(
i
1
i
If
0
r
1
i
then
1
q
2
q
i
2
i
else
i
2
i
q
2
q
Th
e
fin
a
l
resu
lt
of th
e squ
a
re ro
o
t
is equ
a
l to
q
n
(
n
/
2
-1
)
do
w
n
t
o
0
,
c
ode
d i
n
n/
2
bi
t
s
.
4.
R
E
SU
LTS AN
D ANA
LY
SIS
In t
h
e p
r
evi
o
us
sect
i
ons, t
h
e t
h
ree ha
r
d
wa
re im
pl
em
ent
a
t
i
o
n st
rat
e
gi
es o
f
t
h
e no
n-
rest
o
r
i
ng
di
gi
t
-
by
-
di
gi
t
al
g
o
ri
t
h
m
fo
r s
qua
re r
o
o
t
were e
xpl
ai
n
e
d.
In t
h
i
s
sect
i
on,
si
m
u
l
a
ti
on
resul
t
s
of
32
-
b
i
t
and
6
4
-
b
i
t
squ
a
r
e
ro
ot
base
d
on
Al
t
e
ra A
P
EX
2
0
K
E
FP
GA
usi
ng t
h
e ab
o
v
e
m
e
t
hod a
r
e p
r
e
s
ent
e
d
,
as sh
o
w
n i
n
Fi
g
u
re
4
.
In t
h
i
s
si
m
u
latio
n
,
P
i
s
radi
can
d an
d
U
is quotient. The results showe
d
that
the im
ple
m
entatio
n has succeede
d
and
w
o
r
k
ed pr
op
er
ly.
B
a
sed o
n
com
p
i
l
a
t
i
on re
p
o
rt
,
t
o
im
pl
em
ent
32
-
b
i
t
and
64
-
b
i
t
squa
re r
o
ot
usi
n
g t
h
ree ab
o
v
e st
rat
e
gi
es
u
s
ing
Altera
FPGA APEX
20
KE are
needed 256 and
1023 lo
gic element (LE)
res
p
ectively, for
the first
st
rat
e
gy
. The c
o
m
p
ari
s
on
of
r
e
sul
t
s
obt
ai
ned
from
di
ffere
nt
im
pl
em
ent
a
ti
on m
e
t
hod i
s
s
h
ow
n i
n
Ta
bl
e 1
.
Thi
s
co
m
p
ar
iso
n
of
LE or
l
o
g
i
c cell (
L
C) u
s
ag
e is listed
b
a
sed on
r
e
f
e
r
e
n
c
es [6] an
d [1
6
]
. I
t
has show
n a
f
a
ntastic
val
u
e f
o
r re
d
u
c
i
ng o
f
har
d
war
e
reso
urce co
n
s
um
ed. Thi
s
is
d
u
e
ad
op
tion
fu
lly p
i
p
e
lin
ed
architecture and also
si
m
p
lif
icatio
n
o
f
CSM as show
n in
Figu
r
e
4.
Table
1. T
h
e
c
o
m
p
arison of l
ogic elem
ent usage
N
o
Meth
od
LEs Usag
e
Ab
st
ractio
n
level
32
-
b
i
t
sq
uare
ro
ot
64
-
b
i
t
sq
uare
ro
ot
1 Classical-
N
R
1
008
4
092
N
/
A
2 Red
u
c
ed
-Ar
ea-N
R
6
32
2
464
N
/
A
3 Mo
du
lar-
N
R
6
24
2
468
N
/
A
4 Si
m
p
le-
X
-
M
odu
le
6
48
2
488
N
/
A
5
Th
e f
i
r
s
t p
r
op
osed
st
r
a
teg
y
2
56 1
023
g
a
t
e
6
The sec
o
nd
p
r
op
ose
d
strategy
3
40 1
365
R
TL
7
Th
e t
h
ir
d pro
p
o
s
ed
str
a
teg
y
2
64 1
061
R
TL
Base
d
on [
1
6
]
,
for Alter
a
AP
EX 20KE &
Xilin
x Vi
rt
ex-E,
1
LC
=
1
LE
,
an
d
1 C
L
B
=
4
L
E
The seco
n
d
an
d t
h
i
r
d st
rat
e
gi
es cons
um
e LE bi
gge
r t
h
a
n
t
h
e fi
rst
st
rat
e
g
y
.
Nevert
hel
e
s
s
, t
h
e secon
d
and third strat
e
gies are m
o
re
flexible
beca
use the strate
gies use t
h
e RTL
ab
straction
level. If it is
n
eeded
t
o
set the num
ber bits of the ra
dicand which
will be determ
ined its squa
re
root, it can be done easily without
har
d
m
odi
fi
cat
i
on
o
f
V
H
D
L
so
urce
w
h
i
l
e
i
n
t
h
e
fi
r
s
t strateg
y
, it m
u
st rebu
ild
t
h
e VHDL so
urce. By
consideri
ng the abstraction
and the am
ount of LE co
ns
um
e, t
h
e st
udy
recom
m
e
nds
t
h
e use of t
h
e t
h
i
r
d
strategy.
Sim
u
l
a
t
i
on res
u
l
t
s
of
3
2
-
b
i
t
and
6
4
-
b
i
t
sq
u
a
re r
oot
base
d
on
Al
t
e
ra
AP
EX
2
0
KE
FP
G
A
usi
n
g t
h
e
pr
o
pose
d
m
e
t
hod i
s
p
r
esent
e
d
,
as sh
o
w
n
i
n
Fi
gu
re
6.
In t
h
i
s
sim
u
l
a
t
i
on,
P
i
s
radi
ca
nd a
nd
U
is qu
o
tien
t
. Th
e
resul
t
s
s
h
owe
d
t
h
at
t
h
e i
m
pl
em
ent
a
t
i
on i
s
s
u
ccessf
ul
an
d
w
o
r
k
e
d
pr
ope
rl
y
.
Evaluation Warning : The document was created with Spire.PDF for Python.
I
S
SN
:
2
088
-87
08
I
J
ECE
Vo
l. 4
,
N
o
. 4
,
Au
gu
st 2
014
:
54
8
–
55
6
55
4
(a)
(b
)
(c)
(d
)
Fi
gu
re
6.
Si
m
u
l
a
t
i
on res
u
l
t
of
n-
bi
t
sq
ua
re r
o
ot
u
s
i
n
g
opt
i
m
ized si
m
p
l
e
har
d
wa
re i
m
pl
em
ent
a
t
i
on m
e
t
h
o
d
of
th
e non
-restorin
g
d
i
g
it-b
y
-d
igit alg
o
r
ith
m
:
(a)
3
2
-b
it in d
e
ci
m
a
l
di
spl
a
y
,
(
b
)
32
-bi
t
i
n
bi
na
ry
di
s
p
l
a
y
,
(c
)
64
-
b
it
in
d
ecim
a
l
d
i
sp
lay, (d) 64
-b
it
in
b
i
n
a
ry
d
i
sp
lay
Evaluation Warning : The document was created with Spire.PDF for Python.
I
J
ECE
I
S
SN
:
208
8-8
7
0
8
St
rat
e
gi
es f
o
r
FPGA
I
m
pl
e
m
ent
a
t
i
o
n
of
N
o
n-Re
st
ori
n
g
S
q
uar
e R
oot
Al
g
o
r
i
t
h
m (
T
ol
e S
u
t
i
kno)
55
5
B
a
sed o
n
Fi
g
u
re
5, act
ual
l
y
t
h
e fi
rst
st
r
a
t
e
gy
can be
expa
n
d
ed
f
o
r l
a
rge
r
num
ber t
o
s
o
l
v
e
com
p
l
i
cat
ed squa
re
ro
ot
pr
o
b
l
e
m
i
n
FPG
A i
m
pl
em
ent
a
t
i
on.
U
n
f
o
rt
un
at
el
y
,
t
h
e p
r
o
pos
ed
m
e
t
hod
i
s
o
n
l
y
app
r
op
ri
at
e fo
r
gat
e
l
e
vel
abst
ract
i
on a
n
d i
s
not
po
wer
f
u
l
f
o
r R
TL
or
be
h
a
vi
o
u
r l
e
vel
ab
st
ract
i
on.
The
seco
nd
and t
h
i
r
d m
e
t
hods a
r
e bet
t
e
r
choi
ce f
o
r sol
v
i
n
g sq
uare
ro
ot
pr
o
b
l
e
m
for
l
a
rger
num
ber
.
W
e
d
o
n
’
t
ne
ed re-
written the
VHDL s
o
urce c
ode
s for di
ffe
rent num
b
ers.
The m
e
thods
will have m
a
ny adva
ntages
ove
r the
m
e
t
hod p
r
o
p
o
s
e
d by
Sam
a
wi
et al
[
6
]
,
a
n
d
t
h
e t
h
i
r
d m
e
t
hod
i
s
best
c
hoi
ce
f
o
r
ha
rd
wa
re re
sou
r
ce
savi
ng
.
5.
CO
NCL
USI
O
N
Thi
s
pa
per
has
prese
n
t
e
d t
h
re
e st
rat
e
gi
es o
f
t
h
e FP
GA i
m
pl
em
ent
a
t
i
on of
no
n rest
ori
ng
squ
a
re r
o
ot
alg
o
rith
m
.
In
first strateg
y
, a CSM as
b
a
si
c bu
ild
ing
bl
o
c
k
use
gat
e
l
e
vel
a
b
st
ract
i
o
n
has
i
n
t
r
o
duce
d
.
The
pri
n
ci
pl
e of t
h
e st
rat
e
gy
i
s
sim
i
l
a
r wi
t
h
co
nve
nt
i
o
nal
no
n-
rest
o
r
i
n
g al
g
o
ri
t
h
m
,
but
i
t
onl
y
uses su
bt
ract
ope
rat
i
o
n a
nd
appe
n
d
01
, w
h
i
l
e
add
o
p
erat
i
o
n
an
d a
ppe
n
d
1
1
i
s
n
o
t
used
. Sec
o
n
d
st
rat
e
gy
has
p
r
ese
n
t
e
d t
h
e
first strateg
y
fo
rm
in
reg
i
ster tran
sfer lev
e
l (RTL) ab
st
ract
i
on, a
nd i
n
t
h
i
r
d st
rat
e
gy
, a m
odi
fi
cat
i
on f
o
r t
h
e
im
pl
em
ent
a
t
i
o
n o
f
co
n
v
ent
i
o
nal
n
o
n
-rest
ori
ng al
go
ri
t
h
m
has prese
n
t
e
d
w
h
i
c
h al
so
use
R
TL abst
ract
i
o
n. T
h
e
al
l
above st
rat
e
gi
es hav
e
im
pl
em
ent
e
d i
n
VH
DL p
r
og
ra
m
m
i
ng and ad
opt
f
u
l
l
y
pi
pel
i
n
ed arc
h
i
t
ect
u
r
e. The
strateg
i
es h
a
v
e
co
nd
u
c
ted
to
im
p
l
e
m
en
t su
ccessfu
lly in
FP
GA
har
d
wa
re,
and eac
h o
f
t
h
e
st
rat
e
gi
es i
s
of
fer an
effi
ci
ent
i
n
ha
rd
ware
res
o
u
r
ce. In
ge
neral
l
y
,
t
h
e t
h
i
r
d s
t
rat
e
gy
i
s
sup
e
ri
or
beca
use
i
t
do n
o
t
nee
d
ha
rd
m
o
d
i
ficatio
n
to set th
e
nu
m
b
er b
its
o
f
t
h
e
rad
i
can
d.
Refere
nces
[1]
Yam
i
n, L. an
d
C
.
W
a
nm
i
ng. I
m
pl
em
ent
a
t
i
on of Si
n
g
l
e
Prec
i
s
i
on Fl
oat
i
ng
Poi
n
t
S
q
uare R
oot
on
FP
GAs
.
i
n
IE
EE Sy
m
posi
u
m
on
FP
G
A
f
o
r C
u
som
C
o
m
put
i
ng M
a
c
h
i
n
es
.
19
9
7
.
N
a
pa, C
a
l
i
f
or
ni
a
,
U
S
A
.
[2]
Yam
i
n, L. and
C
.
W
a
nm
i
ng. Paral
l
e
l
-
ar
ray
i
m
pl
em
ent
a
t
i
ons of a n
o
n
-re
st
ori
ng s
qua
re r
oot
al
g
o
ri
t
h
m
.
i
n
C
o
m
put
er Des
i
gn:
V
L
SI i
n
C
o
m
put
ers an
d Pr
oces
so
rs,
19
9
7
.
IC
C
D
'
9
7
.
P
r
ocee
di
n
g
s.,
1
9
97
IEE
E
In
tern
ation
a
l
C
o
nferen
ce on
. 1
997
.
[3]
Piro
m
s
o
p
a
, K., C.
Aporn
t
ewan
, and
P. Cho
n
g
s
titv
at
an
a,
An FPGA Im
p
l
em
en
tatio
n
o
f
a fi
x
e
d-p
o
i
n
t
squ
a
re
ro
ot
o
p
erat
i
o
n,
i
n
I
n
t
.
Sy
m
p
. o
n
C
o
m
m
uni
cat
ions
an
d
I
n
f
o
rm
ati
on Tec
h
nol
ogy
(I
SC
I
T
20
0
1
)
2
00
1:
C
h
i
a
ngM
ai
, T
h
ai
l
a
nd
.
[4]
Lla
m
occa-Obregon,
D.R.,
A Core Desi
gn t
o
Obtain
Squa
re Root Based
on a No
n-Rest
ori
ng Al
gorithm
,
i
n
IB
ER
C
H
IP
S
Wor
k
hso
p
2
0
0
5
:
Sal
v
a
d
or
B
a
hi
a, B
r
azi
l
.
p
.
1-
5.
[5]
Xi
ao
ju
n
W
an
g,
Va
ri
abl
e
Pr
eci
si
on Fl
oat
i
ng
-P
oi
nt
Di
vi
de an
d Sq
ua
re R
oot
f
o
r
Effi
ci
ent
FP
G
A
Im
ple
m
entation
of
Im
age and Signal
Processi
ng Algo
rith
m
s
, in
Electrical an
d Co
mp
u
t
er
Eng
i
n
eer
i
n
g200
7,
N
o
r
t
h
easter
n
Un
iv
er
sity: Bo
ston
, Massach
u
setts. p.
1
1
9
.
[6]
Sam
a
vi, S., A. Sadraba
d
i, and A. Fania
n
,
Mod
u
l
a
r
a
r
ra
y stru
ctu
r
e fo
r
n
on-resto
r
ing
square ro
o
t
circu
it.
Jo
urn
a
l of
Syste
m
s A
r
ch
itectur
e,
20
08
.
54
(10)
: p.
9
5
7
-
96
6.
[7]
Do
n
g
-
G
u
k
,
H
.
, C
.
D
o
oh
o,
a
nd
K.
H
o
wo
n,
Imp
r
o
ved Compu
t
a
tion
o
f
Sq
ua
re
Roo
t
s i
n
Sp
ecific Fin
i
te
Fields.
C
o
m
puters, IEEE Tra
n
sactions on, 2009.
58
(
2
)
:
p.
1
88-
196
.
[8]
Lach
owi
cz,
S.
and
H.J
.
P
f
l
e
i
d
erer.
Fast
Eval
uat
i
on
o
f
t
h
e S
qua
re R
o
ot
an
d Ot
her
No
nl
i
n
ear F
unct
i
o
ns i
n
FPG
A. i
n
El
e
c
t
r
o
n
i
c
Desi
g
n
, Test
an
d Ap
pl
i
cat
i
ons
, 20
0
8
. DE
LTA
20
08
. 4t
h I
E
EE Int
e
r
n
at
i
o
nal
Sym
posi
u
m
on
. 2
0
0
8
.
[9]
C
hu;
,
W. an
d
Y. Li
;
.
C
o
st
/
P
erf
o
rm
ance
Trade
o
ff o
f
n
-
Sel
ect
Squa
re
R
oot
Im
pl
em
ent
a
t
i
ons
. i
n
5
t
h
Aust
ralasian Com
puter
Arc
h
itecture
C
o
nf
er
en
ce
(A
CA
C
20
00)
. 20
00
. Can
b
e
r
r
a
, A
C
T
[10]
Xiao
liang
, J.,
Im
pl
eme
n
t
a
t
i
on
of
Sq
u
a
re
Root
Ari
t
h
m
e
t
i
c
Based o
n
FPG
A.
Modern Electroni
cs
Techn
i
qu
e, 200
7.
30
(1
4)
.
[11]
M
ont
uschi
,
P.
an
d M
.
M
e
z
zal
am
a. Sur
v
e
y
of
sq
ua
re
r
oot
i
n
g
al
g
o
ri
t
h
m
s
. i
n
C
o
m
put
ers a
n
d
Di
gi
t
a
l
Techniques, IE
E
Procee
dings E
1990. Italy.
[12]
Tha
kka
r, A
.
J.
and
A. Ej
ni
o
u
i
.
Desi
g
n
an
d i
m
pl
em
ent
a
t
i
on of d
o
u
b
l
e
p
r
ec
i
s
i
on fl
oat
i
n
g p
o
i
n
t
di
vi
si
on a
n
d
square root on FPGAs.
in Ae
rosp
ace Conference,
20
06 IEE
E
. 2006.
[13]
Xi
um
i
n
,
W
., e
t
al
. A New
Al
g
o
ri
t
h
m
for
Desi
g
n
i
n
g Sq
uare R
oot
C
a
l
c
ul
at
ors B
a
se
d
on F
P
G
A
wi
t
h
Pip
e
lin
e Techno
log
y
. in
Hybrid
In
tellig
en
t Syste
m
s,
2
0
0
9
.
HIS '0
9. Nin
t
h In
tern
atio
n
a
l
Co
nferen
ce on.
2
009
.
[14]
Ren
x
i
,
G., et al. Hardware Im
p
l
em
en
tatio
n
o
f
a High
Sp
eed Flo
a
tin
g
Po
in
t
Mu
ltip
lier Based
on
FPGA. in
4t
h I
n
t
e
r
n
at
i
o
n
a
l
C
onfere
n
ce
on C
o
m
put
er
Sci
e
nce &
Edu
catio
n. 20
09
. N
a
nn
ing
,
Gu
an
gx
i, P.R.C
h
ina:
19
0
2
-
1
90
6.
Evaluation Warning : The document was created with Spire.PDF for Python.
I
S
SN
:
2
088
-87
08
I
J
ECE
Vo
l. 4
,
N
o
. 4
,
Au
gu
st 2
014
:
54
8
–
55
6
55
6
[15]
Dattalo
, S.
Squa
re Ro
o
t
Th
eo
ry
. Tech
ni
cal
St
uf
f 2
0
00 M
a
rc
h 1
0
,
2
0
1
0
M
a
rch
1
7
,
20
1
0
]
;
Avai
l
a
bl
e
fr
om
:
h
ttp
://www.d
attalo
.co
m
/tech
n
i
cal/th
eo
ry/sq
r
t
.
h
t
m
l
.
[16]
Comparing Altera APEX
20KE
&
Xilinx Virtex-E Logi
c Densities
.
[ci
t
e
d 2
0
10 M
a
rch
30
, 2
0
1
0
]
;
Av
ailab
l
e fro
m
:
h
ttp
://w
ww
.al
t
era.co
m
/
p
r
oducts/d
ev
ices
/ap
e
x
/
features/apx-co
m
p
d
e
n
s
ity.h
t
m
l
.
Evaluation Warning : The document was created with Spire.PDF for Python.