TELKOM
NIKA
, Vol.13, No
.3, Septembe
r 2015, pp. 1
089
~10
9
6
ISSN: 1693-6
930,
accredited
A
by DIKTI, De
cree No: 58/DIK
T
I/Kep/2013
DOI
:
10.12928/TELKOMNIKA.v13i3.1776
1089
Re
cei
v
ed Ma
rch 2
3
, 2015;
Re
vised July
8, 2015; Acce
pted Jul
y
25,
2015
A Fast Fractal Image Compression Algorithm
Combined with Graphic Processor Unit
Hui Guo*, Ji
e He
Schoo
l of Information a
nd El
e
c
tronic Eng
i
ne
erin
g,
W
u
zhou
Univers
i
t
y
, W
u
zhou
54
300
2, Guang
xi, Chin
a
*Corres
p
o
ndi
n
g
author, e-ma
i
l
: guoh
ui
928
@
qq.com
A
b
st
r
a
ct
Directe
d ag
ain
s
t the charact
e
rist
ics of co
mp
utatio
nal
int
ensity
of fract
a
l i
m
a
ge c
o
mpressi
on
enco
d
in
g, a s
e
rial-p
aral
le
l tra
n
sfer
mech
an
i
s
m
is b
u
ilt
for
enco
d
in
g pr
oc
edur
es. By uti
l
i
z
i
n
g th
e
prop
er
ties
of singl
e instr
u
ction a
nd
mu
ltithrea
din
g
ex
ecutio
n
of co
mp
ute un
ifie
d devic
e
archit
e
c
ture (CUDA),
the
para
lle
l co
mpu
t
ationa
l mod
e
l of fractal enco
d
in
g is bui
lt
on
the graph
ic pr
ocessor u
n
it (GPU) in order
to
para
lle
li
z
e
t
h
e
consi
dera
b
ly ti
me-c
ons
u
m
in
g
seria
l
ex
ecuti
on process
of search
in
g
for t
he
block
of
be
st
match. T
he ex
peri
m
e
n
tal res
u
lt ind
i
cates, the al
gorit
h
m
i
n
this pap
er shortens the e
n
c
odi
ng ti
me to
the
mi
llis
econ
d sca
le a
nd si
gnific
a
ntly bo
osts the
executi
on effici
ency of fractal
i
m
a
ge
enco
d
i
n
g al
gorith
m
w
h
i
l
e
keep
ing th
e de
code
d i
m
ag
e in
good q
u
a
lity.
Ke
y
w
ords
:
fractal
i
m
a
ge compressi
on, grap
hic
pr
oce
sso
r un
it, comp
ute u
n
ifi
e
d
devic
e arch
itecture
,
para
lle
l co
mput
ing
Copy
right
©
2015 Un
ive
r
sita
s Ah
mad
Dah
l
an
. All rig
h
t
s r
ese
rved
.
1. Introduc
tion
Fra
c
tal imag
e en
codi
ng i
s
a
com
p
re
ssion m
e
thod
within the
sp
atial domai
n
with hig
h
comp
re
ssion
ratio and hi
g
h
decoding q
uality. Ho
wev
e
r, the unne
cessarily lon
g
encodin
g
time
has limited it
s pop
ulari
z
ati
on and ap
pli
c
ation. In order to red
u
ce the enco
d
i
ng time, a lot o
f
schola
r
s h
a
ve raised the
feature
cl
assi
fication meth
od or metho
d
of cluste
rin
g
to redu
ce the
sea
r
ch time for matching
blocks [1-4]. Jian
g Zhen
g
et al.
[5] propo
sed a K-mean cl
uste
ri
ng
optimizatio
n
based fractal
en
codi
ng. A
gain
s
t the
problem th
at the K-m
ean
clusteri
ng f
r
a
c
tal
encodin
g
alg
o
rithm dep
en
ds on data d
i
stributio
n, Wu Yiquan an
d Sun Ziyi [6] propo
se
d an
immune
pa
rticle
swarm
o
p
timization
(PSO) an
d
ke
rnel fu
zzy cl
usteri
ng
ba
sed meth
od,
whi
c
h
can
achieve
an a
c
cele
rati
on of 6
times as
co
mpa
r
e
d
with th
e ba
sic fra
c
tal en
codi
ng al
gorit
hm.
Hui Gu
o
et al
.
[7] have modified the qu
adtree f
r
a
c
tal enco
d
ing m
e
thod in
com
b
ination
with
the
human vi
sual
system, in
orde
r to
cont
rol the
di
sto
r
tion within th
e ran
ge b
e
yond h
u
man
eye
recognitio
n
when de
co
din
g
, but the accele
ration
eff
e
ct is no m
o
re than 27 ti
mes. Since the
fractal
en
codi
ng rend
ers th
e typica
l ch
aracteri
stic
of serial exec
utio
n, the mat
c
hi
ng p
r
o
c
ed
ure
is
to impleme
n
t global
or
cl
assified lo
cal
sea
r
ch
of
D-blo
c
k pool
for e
a
ch R
bl
ock on
e by
one,
whi
c
h can b
e
viewed a
s
serial repetitive executio
n o
v
er the same
pro
c
ed
ure
s
,
so the e
n
codi
ng
is rath
er tim
e
-con
sumi
ng.
Paralleli
zed
execut
ion
o
f
these procedures
wo
uld be a fea
s
i
b
le
optimizatio
n
method. E
s
p
e
cially, given
the availa
bi
lity of a lot
of ha
rd
wa
re with
pa
rall
el
comp
utation stru
cture
no
wad
a
ys
, the
encodin
g
sp
eed
will pro
m
ote
sig
n
ificantly if the fractal
encodin
g
can
be i
n
tegrate
d
with
a
ce
rta
i
n pa
ralle
l
ha
rdwa
re with hi
gh
p
opula
r
ity and
l
o
w co
st to
establi
s
h a
co
rre
sp
ondi
ng impleme
n
tatio
n
mech
ani
sm
.
The research
of the above
schola
r
s i
s
to sh
orten th
e
encoding tim
e
in the dime
nsio
n of
optimizatio
n
of encodin
g
a
l
gorithm. T
h
e
fractal
en
co
d
i
ng re
nde
rs a
typical cha
r
a
c
teri
stic
of se
rial
executio
n. Th
e matching
p
r
ocedu
re i
s
t
o
implem
ent
global
or cl
assified lo
cal
search
of D-bl
ock
pool fo
r
ea
ch
R
blo
c
k on
e
by on
e,
which can
be vi
e
w
ed
a
s
se
rial
re
petitive ex
ecutio
n ove
r
the
same
proced
ure
s
. Thu
s
,
to parall
e
lize
these
pro
c
e
dure
s
woul
d
be a fea
s
ib
le optimization
method. Esp
e
cially, given
the availabilit
y of a lot
of h
a
rd
wa
re
with
parall
e
l comp
utation st
ru
cture
nowaday
s, the e
n
coding
sp
eed
will
prom
ote
si
g
n
ificantly if t
he fra
c
tal
e
n
co
ding
can
be
integrate
d
wi
th
a ce
rtain parall
e
l
h
a
rd
ware wi
th hi
gh p
opul
arity and
lo
w
co
st to e
s
tabli
s
h
a
corre
s
p
ondin
g
imple
m
enta
t
ion me
cha
n
i
s
m. Th
e im
a
ge p
r
o
c
e
s
sor graphi
c
pro
c
essor unit
(G
PU)
has l
a
rge q
uantities
of
parall
e
l ha
rd
ware a
r
it
hme
t
ic units which a
r
e
appli
c
abl
e to pa
rallel
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 16
93-6
930
TELKOM
NIKA
Vol. 13, No. 3, September 20
15 : 1089 – 10
96
1090
comp
utation
of multiple d
a
ta obje
c
ts.
The compute
unified device archite
c
ture
compute
unified
device a
r
chitecture
(CUDA) is a new type of software ar
chitecture and programming model
for
handli
ng a
n
d
mana
ging
GPU
comp
utation. With
single in
stru
cti
on an
d multi
data exe
c
uti
on
mode
s, it can
utilize CP
U to pro
c
e
s
s the
sequ
entia
l p
o
rtion of a
ppli
c
ation
s
, and
at the sam
e
time
perfo
rm pa
ral
l
el executio
n
of the com
p
u
t
e-inten
s
ive p
o
rtion o
n
GP
U via API with threa
d
as t
h
e
bas
ic
unit [8].
This pap
er of
fers a fa
st fra
c
tal ima
ge
co
mp
re
ssion
al
gorithm
built
on the
ba
se
of GPU
and
utilizin
g
CUDA fo
r p
a
r
allel
en
codi
n
g
. Thi
s
p
a
rall
el en
co
ding
method
com
p
rises of th
ree
comp
one
nts:
the 4-nei
gh
borh
ood
average m
e
thod
adopted fo
r spa
c
e
com
p
re
ssi
on of the
domain bl
ock, preprocessi
ng of ran
ge
and dom
ai
n
blocks, and computation o
f
minimum mean
squ
a
re
erro
r. The p
r
o
c
e
ss of sp
ace co
mpre
ssion
of
the
dom
ain block
b
egin
s
with
the use of
a
parall
e
l exe
c
ution sch
e
me
, namely ea
ch threa
d
of G
P
U pe
rform
s
the avera
ge
sampli
ng jo
b
of
one d
o
main
block. On th
e
prep
ro
ce
ssin
g stag
e, ea
ch threa
d
of G
P
U will figu
re
out the sum
of
pixels an
d sum of squ
a
res of pixels,
resp
e
c
tively, for each ra
nge blo
c
k a
nd the se
arched
domain bl
ock. In the comp
utation of min
i
mum
mea
n
square erro
r, eac
h ra
nge b
l
ock will have
a
corre
s
p
ondin
g
standal
one
thread for
affine transfo
rmation an
d
solution to minimum me
an
squ
a
re e
r
ror.
The experim
ental re
sult indicate
s the
algorith
m
in this pap
er
ca
n spe
ed up
120
and mo
re tim
e
s a
s
compa
r
ed with the traditional fract
a
l com
p
re
ssi
on metho
d
while ke
epin
g
the
decode
d ima
ge in goo
d qu
ality.
2. Traditiona
l Fractal Enc
oding Meth
o
d
Mandel
brot first raised th
e fractal im
a
ge wa
s a
n
i
t
erated fun
c
ti
on sy
stem [9]. He
believed th
at many matters in the n
a
tural wo
rld
h
ad
si
milar p
a
rt
s, a
nd poi
nted o
u
t
a fract
a
l cl
o
ud
coul
d be d
e
scrib
ed by a
simple mathe
m
atical fun
c
ti
on. In 1988,
Barn
sly and
Sloan rai
s
e
d
the
fractal im
age
com
p
re
ssio
n metho
d
, utilizing th
e
im
age’
s lo
cal
self-simil
arity for
comp
re
ssion
[10]. The pra
c
tical f
r
a
c
tal
blocke
d en
co
ding p
u
t fo
rward by Ba
rn
sl
ey’s do
ctoral stude
nt
Ja
cq
uin
wa
s exactly develop
ed o
n
this ba
se [11]. The
fractal enco
d
ing
method shall
first partition
an
image into n
o
n
-ove
rlap
ping
R
×
R
blo
c
ks
and po
ssibly
overlap
p
ing
D
×
D
blo
c
ks,
whi
c
h a
r
e
cal
l
ed
rang
e blo
c
ks (
R
blo
c
ks) and
d
o
main
blocks
(
D
bl
ocks). The
si
ze of
domai
n
blocks mu
st
be
greate
r
tha
n
that of ra
nge
blocks. Th
e
followi
n
g
step is to pe
rf
orm ave
r
a
g
e
sam
p
ling
of the
domain
blo
cks
so
as to
a
c
cord
the
si
ze of d
o
mai
n
blocks
with t
hat of
ran
g
e
blo
c
ks. All t
h
e
domain bl
ocks ca
n be saved in the dom
ain pool
S
D
.
A
N
×
N
sized
image
ca
n b
e
pa
rtitioned
into
i
dom
ain
blocks a
nd
j
r
a
nge blocks, w
h
er
e
i
=
0
,1,2,…(
N
-2
R
+1)
2
,
j
=
0
,1,2,…,(
N
/
R
)
2
.
Search fo
r t
he d
o
main
b
l
ock of
be
st
match
from
the
domain pool
S
D
by the norm of minim
u
m sq
ua
re e
rro
r. The affi
ne tran
sfo
r
m
a
tion form
ula
is
sho
w
n a
s
Fo
rmula 1.
i
xy
i
xy
o
D
y
x
s
d
c
b
a
R
y
x
0
0
0
0
0
0
(
1
)
Whe
r
e
x
=
0 ,1 ,2 ,…,
R
,
−
1
y
=
0 ,1 ,2 ,…,
R
−
1,
xy
R
an
d
xy
D
are the val
ues of pixels
within ra
nge
block
j
R
and do
main blo
ck
i
D
. Parameters
a
,
b
,
c
and
d
are used for 8
isomet
ric
transfo
rmatio
ns of pixel:
4
rotation
s a
n
d
4 reversal
s, as
sho
w
n i
n
Figu
re 1.
i
S
a
nd
i
O
are the
contrast an
d
brightne
ss adju
s
tment coefficient
s, resp
ectively, in the pro
c
e
s
s wh
ere d
o
m
ain
block
i
D
is matched with
ran
g
e
block
j
R
. The comp
utation
a
l formula
s
fo
r
i
S
and
i
O
are sho
w
n
as Fo
rmula 2
and Fo
rmula
3, resp
ectivel
y
.
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
1693-6
9
30
A Fast Fra
c
ta
l Im
age Compre
ssi
on Algo
rithm
Com
b
ined with GPU
(Hui G
u
o
)
1091
Figure 1. 8 Isometri
c Tra
n
sformation
s
]
'
[
)
'
(
]
'
[
'
2
2
2
R
x
R
y
xy
R
x
R
y
xy
R
x
R
y
xy
R
x
R
y
xy
R
x
R
y
xy
xy
i
D
D
R
R
D
R
D
R
S
(2)
2
'
R
D
S
R
O
R
x
R
y
xy
ij
R
x
R
y
xy
i
(3)
D
'
xy
is the pi
xel value in
corre
s
p
ond
en
ce to th
e d
o
m
ain bl
ocks
via the 8 i
s
o
m
etric
transfo
rmatio
ns. The root
-mea
n-squ
a
re,
RMS
i-j
, in the pro
c
e
s
s whe
r
e dom
ain blo
ck
i
D
is
matche
d with
range bl
ock
j
R
can b
e
figure
d
out as Formula 4. The minimum root
-mea
n-squ
a
re
min
RMS
is sh
own as F
o
rmul
a 5.
2
/
1
2
2
1
R
x
R
y
xy
j
xy
j
j
i
R
o
D
s
R
RMS
(
4
)
)
)
/
(
,...,
2
,
1
,
0
,
)
1
2
,...(
2
,
1
,
0
,
min(
2
2
mi
n
R
N
j
R
N
i
RMS
RMS
(
5
)
Suppo
se
I
is
the o
r
iginal
i
m
age,
R
i
s
th
e si
ze
of
ra
ng
e blo
c
k,
D
i
s
t
h
e
si
ze
of
do
main blo
ck.
T
h
e
con
c
rete algo
rithmic
step
s are sho
w
n a
s
the following:
Step 1: Partition the origin
a
l
image
I
into
non-overl
appi
ng ran
g
e blo
c
ks
j
R
with the s
i
z
e
of
R
×
R.
Step 2: Partition the ori
g
in
al image
I
into possibly ov
erlap
p
ing d
o
m
ain blo
c
ks
i
D
with
the size of
D
×
D
.
Step 3: Pe
rfo
r
m ave
r
ag
e
sampling
of th
e do
main
blo
c
ks
so
a
s
to
accord th
eir
size
with
that of the range blo
c
ks.
Step 4: For
each ra
nge
b
l
ock
j
R
, find the co
rrespon
d
i
ng dom
ain b
l
ock
i
D
from the
domain
po
ol
S
D
. Make
su
re if the difference of me
a
n
sq
ua
re e
r
ro
rs
between
j
R
and
i
D
is the
minimum afte
r the affine transfo
rmatio
n over
i
D
, then this
i
D
is
the bloc
k of bes
t
match for
j
R
.
Step 5: For each ran
ge
block
j
R
, rec
o
rd the frac
tal
c
o
de (IFS c
o
de)
c
o
ns
tituted by
transfo
rmatio
n libra
ry
w
i
(
i
,
n
,
s
i
,
o
i
):
(1) T
he num
b
e
r
i
of block
i
D
of best match;
(2) T
u
rn
j
R
and
i
D
into a numbe
r
n
(
n
ran
g
e
s
from 0 to 7) of
isometri
c tra
n
sformation
s;
(3)
Contrast a
d
justme
nt co
efficient
i
S
and
brightn
e
ss ad
justment
coef
ficient
i
O
.
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 16
93-6
930
TELKOM
NIKA
Vol. 13, No. 3, September 20
15 : 1089 – 10
96
1092
3. GPU Com
b
ined Parallel
Fractal En
coding Algo
rithm
CUDA is a n
e
w ha
rd
wa
re
and software archite
c
ture for han
dlin
g and ma
nag
ing GP
U
comp
utation.
Applicatio
ns
can
ha
ndle
th
e sectio
n of
seque
ntial exe
c
ution
via
CPU a
nd
perfo
rm
parall
e
l exe
c
ution of the
compute
-
inten
s
ive sect
io
n
on GPU via
relevant API to CUDA, the
r
eb
y
give more ful
l
play to the
large
-
scale concur
rent co
mputing po
wer of displ
a
y card. While t
h
e
prog
ram i
s
runnin
g
, the concurr
ent p
r
o
c
e
ssi
ng
se
ction in
CUDA
prog
ram i
s
p
e
rform
ed by
the
kernel fun
c
tio
n
. The b
a
si
c unit of the kernel
fu
nctio
n
run
n
ing
on
GPU i
s
thre
ad. CUDA m
a
y
prod
uce a
lot
of con
c
urre
n
t
thread
s at
different
a
d
d
r
esse
s. Th
ese
threa
d
s exe
c
ute th
e
kern
el
function a
nd i
m
pleme
n
t pa
rallel p
r
o
c
e
s
sing of dat
a. F
i
gure
2 demo
n
strate
s the
basi
c
exe
c
uti
on
prin
ciple
of CUDA.
The prog
ram co
d
e
develop
ed
by CUDA p
r
og
rammi
ng
comp
ri
se
s of two
se
ction
s
: Ho
st cod
e
and
the device
code. The
h
o
st code is th
e
seri
al pro
c
e
ssi
ng runnin
g
on
CPU, wh
erea
s the device cod
e
is
the p
a
rallel p
r
o
c
e
s
sing runni
ng
on the displa
y chip GPU. The
host
cod
e
take
s the typical and p
r
in
ci
pal charge
of
sched
uling t
he overall an
d stro
ngly lo
gical
seri
al o
perations,
su
ch
as i
n
itialization
of
GP
U
and
dat
a exchang
e,
etc., whil
e the
device
code
i
s
mainly resp
o
n
sibl
e for pa
rallel
data
p
r
ocessin
g
with hig
h
de
gree of
parall
e
lizatio
n in t
he
prog
ram.
Figure 2. Execution Pri
n
ci
p
l
e of CUDA
The tra
d
ition
a
l fractal i
m
age
comp
re
ssion
m
e
thod
is con
s
ide
r
ably time-co
n
sumi
ng
becau
se of the gre
a
t amo
unt of calcula
t
ion in
the proce
s
s of sea
r
chi
ng the bl
ock of match
for
the ra
nge
blo
c
ks. In o
r
d
e
r t
o
speed
up
th
e search
i
ng, t
h
is p
ape
r
rai
s
es
a GP
U ba
sed
fast fract
a
l
image comp
ression alg
o
rit
h
m usin
g CUDA for parall
e
l encoding.
In the following su
ch pa
ra
llel
executio
n me
cha
n
ism
is d
e
mon
s
trate
d
via Figure 3,
in which
T
1
, T
2
,
... are threads
on
GPU,
sum
i
D
_
and
powersum
i
D
_
a
r
e
th
e sum
of pixels and su
m
of squares
of pixels of
do
main
blo
c
ks
i
D
,
sum
j
R
_
and
powersu
m
j
R
_
are
the
sum
of pixel
s
an
d
sum
of squ
a
res of pi
xels of
rang
e
blocks
j
R
.
The fractal
i
m
age
co
mpression
meth
od of
su
ch
parall
e
l p
r
o
c
essing
falls i
n
to three
st
eps:
Average
sam
p
ling of dom
ain blocks, prep
ro
ce
ssi
ng
of range blo
c
ks an
d dom
ain blocks, and
comp
utation
of minimum mean squa
re
error.
The Averag
e
-
Sample in
Figure 3 is
averag
e sa
m
p
ling. Prior t
o
sea
r
ching
for the
matche
d, a
4
-
neig
hbo
rho
o
d
pixel m
ean
s p
r
o
c
e
ssi
ng
shall
be
pe
rfo
r
med
over th
e
D
blo
c
ks
wi
thin
the
D
block p
ool.
The co
m
putation wo
rk
of this p
a
rt i
s
eq
ually di
st
ributed
over
all thre
ad
s. T
he
pro
c
e
s
sed d
a
t
a are
saved
into the texture mem
o
ry. The con
c
rete
pro
c
e
ss fo
r
4-nei
ghb
orh
o
od
averag
e is sh
own a
s
Figu
re 4.
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
1693-6
930
A Fast Fra
c
ta
l Im
age Compre
ssi
on Algo
rithm
Com
b
ined with GPU
(Hui G
u
o
)
1093
Figure 3. Parallel Pro
c
e
ssi
ng on GPU
Figure 4. 4-Neighb
orh
ood
Average Met
hod
Precompute i
s
the prepro
c
e
ssi
ng. After co
ndu
cting
sub
-
sampli
ng
over the D
blocks,
each thre
ad
on GPU is all
o
cate
d to pre
c
omp
u
te the
sum
j
R
_
value an
d
powersum
j
R
_
value of ea
ch
rang
e blo
c
k, as
well a
s
th
e
sum
i
D
_
value an
d
powersum
i
D
_
value of ea
ch
domain
blo
c
k, with the
respe
c
tive co
mputational f
o
rmul
a sh
own as follo
ws:
R
x
R
y
xy
sum
j
R
R
_
(6)
R
x
R
y
xy
powersum
j
R
R
2
_
(7)
R
x
R
y
xy
sum
i
D
D
_
(8)
R
x
R
y
xy
powersum
i
D
D
2
_
(9)
Comp
utation
of the mi
nim
u
m r
oot-m
ea
n-squa
re
e
r
ror. Ea
ch
R
b
l
ock
shall
be
equ
ally
distrib
u
ted to each threa
d
for se
arch a
n
d
match of
D
blocks in the
D
blo
ck p
ool, for com
putati
on
of affine tran
sform
a
tion, a
nd fo
r g
ene
ra
tion of
RMS
. Finally,
RMS
mi
n
is fou
nd
o
u
t by comp
aring
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 16
93-6
930
TELKOM
NIKA
Vol. 13, No. 3, September 20
15 : 1089 – 10
96
1094
the gene
rate
d
RMS
. The
D
blo
ck i
n
co
rrespon
den
ce
to this
RMS
mi
n
is just the
searche
d
blo
c
k of
match.
Re
co
rd its co
rrespo
nding
IFS co
de
w
i
(
i
,
n
,
s
i
,
o
i
) a
n
d
pe
rform qu
antizatio
n codin
g
to
g
e
t
the
fra
c
tal co
de
of ea
ch ra
nge block.
T
he con
c
rete
matchin
g
p
r
o
c
e
s
s is sho
w
n in Fi
g 5,
where
T
0
, T
1
, T
2
,
.
..
are threads
on GPU.
The followi
ng
is the con
c
rete step
s of fr
actal ima
ge
encodin
g
thro
ugh whi
c
h th
e CUDA
structure is ut
ilized fo
r parallel encoding:
Input: For an
N×
N
s
i
z
e
d grays
c
ale image, the gr
ayscale
of pixels
is
quantiz
ed by 8 bits
,
N
is ge
nerally a powe
r
of
2
.
Output: Frac
tal c
o
de, namely
w
i
(
i
,
n
,
s
i
,
o
i
).
Step 1: Read
in the i
m
ag
e
data
at the
CPU
sid
e
. Pa
rtition the
ori
g
inal im
age
I
into
no
n-
overlap
p
ing range blo
c
ks
j
R
with the size
of
R
×
R
; partition the original image
I
into possibly
overlap
p
ing d
o
main blo
c
ks
i
D
with the size of
D
×
D
. And tran
smit these d
a
ta into
the device’
s
memory.
Step 2: Perform average
samplin
g to con
s
ti
tute a codeb
oo
k. Allocate the
averag
e
sampli
ng
work of ea
ch d
o
m
ain blo
c
k to each th
rea
d
, namely assi
gn one th
rea
d
to perfo
rm
the
sub
-
sampli
ng
work of one
domain bl
ock.
Step 3: Processing
of the
domain
blo
c
ks
and
ran
g
e
blocks, nam
ely comp
ute
sum
j
R
_
,
powersum
j
R
_
,
sum
i
D
_
and
powersum
i
D
_
. Allocate the preproce
s
sing of
each dom
ain
block an
d ra
nge
block to ea
ch
thread to pe
rform the co
m
putation work.
Step 4: Com
putation
of m
ean
sq
uare e
rro
r
RMS
. In
the kern
el fu
nction,
distri
b
u
te the
rang
e blo
c
ks equally into
each thre
ad for the mat
c
h
with all the domain bl
ocks in the dom
ain
pool, an
d for affine tran
sf
ormatio
n
an
d
RMS
comp
u
t
ation. Finally, find out
RM
S
mi
n
and re
co
rd
the fractal
co
de
w
i
(
i
,
n
,
s
i
,
o
i
) in co
rresp
onde
nce to this
RMS
mi
n
.
Step 5: Tran
smit the fractal
code fr
om the device e
nd
to the CPU e
nd.
Step 6: Output the fractal
cod
e
w
i
(
i
,
n
,
s
i
,
o
i
).
Figure 5. Matchin
g
Pro
c
e
s
s between
Ra
nge Blocks a
nd Dom
a
in Bl
ocks
4. Experimental Re
sult Analy
s
is
To verify th
e
validity of th
e alg
o
rithm,
this p
ape
r
a
dopts fou
r
standa
rd
test i
m
age
s
-
256
×25
6
×8 Lena, Peppe
r,
Cat and Ce
ll - for a test. These 4 i
m
age
s are o
f
repre
s
e
n
tative
signifi
cance i
n
term
s
of equilibri
um and change
of
texture and
margi
nal details, and are wel
l
cap
able of te
sting vari
ou
s image p
r
o
c
e
ssi
ng alg
o
rith
ms. Fo
r all the imag
es to
be teste
d
, the
sizes of ra
ng
e blocks are
set as 4
×
4,
the si
ze
s of
domain blo
c
ks a
s
8
×
8. T
he develo
p
m
ent
environ
ment
of pro
g
ram i
s
Micro
s
oft Visual
St
udio 2
010
+ CUDA6
.
0+Op
en
cv
2.3, the
syste
m
is
64-bit
Win
d
o
w
s
7, the m
e
mory is 4GB,
the di
sp
lay
card i
s
Nvidi
a
Gefo
rce G
T
630M,
and
the
CPU i
s
Co
re I3. In the
followin
g
th
e traditio
nal
fractal
en
cod
i
ng alg
o
rithm
and th
e G
P
U
combi
ned pa
rallel
fractal encodin
g
alg
o
rithm
a
r
e
a
dopted
se
parately to encode the
s
e fo
u
r
image
s. While de
codi
ng,
a blan
k m
a
tri
x
is create
d
. Via 9 iteratio
ns, the o
r
igi
n
al image
s
ca
n be
approximated
. The experim
ental re
sult is sho
w
n a
s
follows:
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
1693-6
9
30
A Fast Fra
c
ta
l Im
age Compre
ssi
on Algo
rithm
Com
b
ined with GPU
(Hui G
u
o
)
1095
(a)
De
cod
ed Image
s by Tra
d
itional Algori
thm
(b)
De
cod
ed Image
s by Parallel Algo
rith
m
Figure 6. Effects of De
co
de
d Images by
Two Algo
rith
ms
Figure 6
sho
w
s, f
r
om
na
ked eye
s
, the
decode
d
ima
ges by the
traditional
fra
c
tal imag
e
encodin
g
alg
o
rithm a
nd b
y
the GPU
combine
d
pa
rallel fra
c
tal i
m
age
en
codi
ng alg
o
rithm
have
achi
eved the
coincident eff
e
cts,
whi
c
h demonstrates
t
he feasibility
of the algorithm in thi
s
paper.
The followi
ng
Table 1 p
r
o
v
ides the tim
e
(unit: s) of enco
d
ing, th
e PSNR valu
e (unit: dB)
of
decode
d ima
ges a
nd the speed
-up
ratio
of both algori
t
hms.
Table 1. Experime
n
tal Dat
a
by Usin
g Both Algorithm
s
Images
Traditional Algorithm
Parall
el Algorithm
Speed-up Ratio
T PSNR
T P
S
NR
Lena
29.13 31.4
6
0.243
31.46
120
Pepper
29.11 32.0
0
0.237
32.00
123
Cat
29.00 36.9
8
0.252
36.96
115
Cell
29.06 34.9
2
0.245
34.94
119
From the dat
a in Table 1, the encoding
time
by using
the GPU co
mbined p
a
rall
el fractal
encodin
g
met
hod i
s
sig
n
ificantly
sho
r
ter than th
at
by
the tradition
al fra
c
tal
en
coding
alg
o
rith
m;
the maximum
sp
eed
-up
rat
i
o ca
n a
c
hi
eve 123
times,
and the
de
co
ded im
age
s
can be
ret
a
ine
d
in good q
ual
ity. Therefore, it is feasi
b
le to
utilize
GPU for pa
ralleli
zation e
x
ecution of t
he
encodin
g
p
r
o
c
e
s
s of fracta
l image
comp
ressio
n in
co
mbination
wit
h
CUDA, whi
c
h
attach
es vital
signifi
can
c
e t
o
popul
ari
z
ati
on and a
ppli
c
ation of fracta
l image co
mp
ressio
n en
co
ding.
5. Conclusio
n
This p
ape
r h
a
s raised a f
a
st fra
c
tal im
age comp
re
ssion al
gorith
m
utilizing
CUDA o
n
GPU.
Su
ch p
a
rallel
fra
c
tal image co
mpression
meth
o
d
falls into th
ree
step
s: Averag
e
sam
p
ling
of domain bl
ocks, p
r
ep
ro
cessing of ra
n
ge blo
c
ks an
d domain bl
o
c
ks, co
mputa
t
ion of minimum
mean
squ
a
re error. Th
e
experime
n
t evince
s the
algorithm i
n
this pap
er
can a
c
hi
eve
an
accele
ration
of 123 time
s
as
com
pared
with the t
r
ad
itional fra
c
tal
image
com
p
ression
metho
d
,
and
kee
p
the
de
code
d im
age
s in
goo
d
quality. Furt
her
develo
p
m
ent is
antici
p
ated in th
e fu
ture
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 16
93-6
930
TELKOM
NIKA
Vol. 13, No. 3, September 20
15 : 1089 – 10
96
1096
wor
k
to u
s
e
GPU to
opti
m
ize
CP
U
co
des
s
o
as
to
make
fra
c
tal
image
comp
ression
real-ti
m
e.
Furthe
rmo
r
e,
these meth
od
s will be u
s
e
d
in other
field
s
like dyn
a
mi
c image e
n
co
ding, etc.
Ackn
o
w
l
e
dg
ements
This
work wa
s sup
p
o
r
ted by
Gua
ngxi Natural
S
c
ien
c
e Fou
ndatio
n
Pro
g
ram (Grant
No.
2013
GXNSF
BA01927
5, G
r
ant
No. 201
3GXNSFBA0
1927
6)
a
nd
Guan
gxi Univ
ersity of Sci
e
nce
and Te
chn
o
lo
gy Research
Program (Gra
nt
No.201
3YB227, Gra
n
t No.20
13YB2
28).
Referen
ces
[1]
Bo W
ang, Y
ubi
n Gao. A
n
Image
Co
mpressio
n
Sc
heme B
a
se
d
on F
u
zz
y
Neur
al N
e
t
w
o
r
k.
T
E
LKOMNIKA T
e
leco
mmunic
a
tion C
o
mputi
n
g Electron
ics a
nd Co
ntrol
. 20
15; 13(1): 1
37-
145
[2]
Mohse
n
Nasr
i
,
Abdel
ham
id
Hel
a
li, H
a
lim
Sgha
ier, H
a
ssen Ma
aref. Efficient JPE
G
2000 Im
ag
e
Compress
io
n Scheme for Multih
op W
i
rel
e
s
s
Net
w
o
r
ks.
TELKOMNIKA Telec
o
mmun
icat
ion Co
mputi
n
g
Electron
ics an
d Contro
l
. 201
1; 9(2): 311-3
1
8
.
[3]
Cui
Xi
n-
Xia, L
u
o
Che
n
-
X
u. F
r
actal an
d Ch
a
o
s Char
acterist
ics in Rock Mi
ll
ed Process.
T
E
LKOMNIKA
Indon
esi
an Jou
r
nal of Electric
al Eng
i
ne
eri
ng.
2014; 1
2
(1): 5
30-5
38.
[4]
Yi Li, Ho
ngc
h
an Z
h
e
ng, Gu
ohu
a Pen
g
, Min Z
hou. N
o
rm
al Vector Bas
ed Su
bdiv
i
sio
n
Scheme to
Generate
F
r
ac
tal C
u
rves.
T
E
LKOMNIKA In
don
esia
n J
our
nal
of El
ectric
al E
ngi
ne
erin
g
. 201
3; 1
1
(8):
427
3-42
81.
[5]
Jian
g Z
h
e
ng,
Jian
g Mi
ng
ya
n
.
A F
a
st F
r
acta
l Imag
e C
o
mpressio
n
A
l
g
o
rithm B
a
sed on K-mea
n
Clustering Optimization.
Journ
a
l of Electrica
l
& Electronic E
ducati
on.
20
06
; 36(03): 22-2
5
.
[6]
W
u
Yiqu
an, S
un Z
i
yi. F
a
st F
r
actal Imag
e C
odi
ng B
a
se
d Immunit
y
Partic
e S
w
arm
Opti
mizatio
n
a
n
d
F
u
zz
y
K
e
rn
el
Clusteri
ng.
J
o
urna
l of B
e
ij
in
g Un
iversity
o
f
Posts an
d
T
e
leco
mmunic
a
tions.
201
1;
34(0
1
): 69-7
4
.
[7]
Hui Gu
o, Yu
np
ing Z
h
en
g, Jie
He. A N
e
w
HV
S-Based
F
r
actal Imag
e C
o
m
p
ressi
on A
l
gor
i
t
hm.
Lecture
Notes in El
ectri
c
al Eng
i
ne
eri
n
g
. 2012; 1
38(2)
: 753-75
9.
[8]
Ma W
e
i-
w
e
i, S
UN Don
g
, WU
Xi
an-l
i
a
ng. Res
ear
ch on h
i
g
h
-order SF
DT
D
para
lle
l comput
ing bas
ed o
n
GPU.
Journal
of Hefei Un
iver
sity of
T
e
chnol
ogy(N
atural Sc
icenc
e).
201
2; 35(7): 92
6-9
2
9
.
[9]
BB Mande
lbrot
.
T
he F
r
actal Geometr
y
of Nat
u
re. Secon
d
ed
ition. W
H
F
r
eedman, Ne
w
Y
o
rk. 1982.
[10]
M. Barnsle
y
an
d A. Sloan, A b
e
tter
w
a
y
to co
mpress ima
ges
, BYT
E
, 1988, no.1,21
5-2
23
[11]
AE Jacqui
n. Image co
din
g
base
d
on a fr
actal t
heor
y
of iterated co
ntract
ive ima
ge transformati
ons.
IEEE Trans. Image Proc
essin
g
. 1992; 1(
1): 18-30.
Evaluation Warning : The document was created with Spire.PDF for Python.