International Journal of Electrical and Computer Engineering (IJECE)
Vol. 9, No. 5, October 2019, pp. 4382~4395
ISSN: 2088-8708, DOI: 10.11591/ijece.v9i5.pp4382-4395

Journal homepage: http://iaescore.com/journals/index.php/IJECE

Combining convolutional neural networks and slantlet transform for an effective image retrieval scheme

Mohammed Sabbih Hamoud Al-Tamimi
Department of Computer Science, College of Science, University of Baghdad, Iraq

Article Info

Article history:
Received Dec 26, 2018
Revised Apr 10, 2019
Accepted Apr 25, 2019

ABSTRACT
Recent years have seen profound advances in computer science and technology across many fields. Within image processing, Content-Based Image Retrieval (CBIR) is one area that has benefited. As retrieval methods have progressed, several of them can now extract features with ease, and finding effective image retrieval tools has become a major research concern. An image retrieval technique is a system used to search for and retrieve images from a large database of digital images. This paper proposes a new method for image retrieval. The Convolutional Neural Network - Slantlet Transform (CNN-SLT) model uses the Slantlet Transform (SLT) to provide multiple representations of an image to a Convolutional Neural Network (CNN). The CBIR system was then evaluated and the outcomes benchmarked. The results show that, overall, the proposed technique outperformed the others, with an accuracy of 89 percent across the three datasets used in our experiments. This performance indicates that the CNN-SLT method worked well on all three datasets, with the earlier phase (CNN) and the subsequent phase (CNN-SLT) working together harmoniously.

Keywords:
Content-based image retrieval
Wavelet transforms
Convolutional neural networks
Deep learning
Information retrieval
Slantlet transform

Copyright © 2019 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author:
Mohammed Sabbih Hamoud Al-Tamimi,
Department of Computer Science, College of Science,
University of Baghdad, Baghdad, Iraq.
Email: m_altamimi75@yahoo.com

1. INTRODUCTION
Digital image consumption has surged recently as the internet and computers have become more accessible and as the creation of digital image media has grown rapidly. Inexpensive storage devices, user demand and high-quality printers allow consumers to collect and print digital images from the Internet with ease. In addition, the rapid development of network technologies has made digital images one of the most essential communication media in everyday life.

CBIR refers to the retrieval of relevant images from an image database according to automatically derived features, such as shape, color and texture, that represent the information content of the image. The need for efficient content-based image retrieval has increased immensely in several application areas, for instance entertainment, biomedicine, education, crime prevention, commerce, the military and culture. In general, retrieving documents or images by textual description is easy, but it requires the images to be tagged manually, which is time-consuming, laborious and highly susceptible to error. The manual process relies on human knowledge, which introduces uncertainty because different individuals understand the same image differently.

CBIR has several benefits over traditional text-based retrieval. It overcomes the challenges noted above by handling them automatically by machine, which is efficient, precise and free of human intervention because CBIR uses the visual contents of the query image [1]. CBIR has been implemented with several methods, and research on improving it is still under way. Moreover, given current digital technology and the enormous number of images worldwide, an automated image retrieval system is essential. It is a more effective and efficient means of finding relevant images than searching by text annotations, and it avoids the time spent on the manual annotation that text-based methods require. These benefits motivated this study to design an image retrieval technique, and the same challenge has driven the growth of research and development in the CBIR field. In CBIR, images are retrieved according to features that are extracted automatically from the images.

Among low-level image features such as shape, color, spatial location and texture, texture has proved objective and effective for content-based image retrieval. Diverse methods have been developed for extracting texture features, broadly categorized into spectral (also called frequency) techniques and spatial techniques. The spatial methods generally rely on statistical computations on the image. These statistical methods, however, are sensitive to image noise and yield too few features [2]. Spectral techniques of texture analysis for image retrieval, on the other hand, are robust to noise. They include multi-resolution texture extraction approaches such as the Wavelet Transform (WT) [3, 4] for texture representation, the discrete cosine transform [5] and Multi-Resolution (MR) techniques such as Gabor filters [6, 7]. The disadvantage of these spectral techniques is that they do not capture the edge information of the image effectively. This motivates the search for a better solution that combines the best features of the spectral and spatial approaches and is robust to noise while needing only simple statistical computation.

The problem background discussed above indicates that the problems of the CBIR process need further investigation. One major challenge of CBIR is that images must be represented using effective and accurate feature extraction techniques. The high-dimensional colour feature vector, along with the other extracted features, carries no spatial information and suffers from semantic gaps. These issues must be resolved to improve retrieval precision [3, 4]. In addition to developing techniques that combine texture-, shape- or colour-based similarities, the drawbacks noted in earlier studies were examined to determine the questions that still need answers. Here, the researchers propose and develop a novel automatic feature extraction process based on the SLT and the CNN, which is considered one of the best deep learning methods in the field of machine learning.

2. RELATED WORK AND DEFINITION
Finger veins are not visible to the naked human eye under normal illumination. They can, however, be viewed using Near-InfraRed (NIR) light at wavelengths between 700 and 1000 nm. Human tissue is seen to absorb NIR light waves; however, these waves are blocked by the deoxidised haemoglobin (HbO) molecule, which is present in higher concentrations in the veins and therefore makes the veins appear darker in the acquired images [1]. Vein scanners generally rely on NIR light waves generated by Light Emitting Diodes (LEDs), together with Charge Coupled Device (CCD) or Complementary Metal-Oxide Semiconductor (CMOS) cameras. These devices capture images of the veins in a particular region. They also include several optical filters, which scatter the emitted NIR beams and increase the contrast of the captured raw images, as described in Figure 1.

A number of factors determine the performance of CBIR: the feature extraction technique, the use of suitable features, the similarity measurement technique, the mathematical transform selected to compute effective characteristics, and the feedback procedure. All of these aspects are important in CBIR, and an efficient retrieval mechanism can be achieved by improving some of them. For this purpose, we first give a short review of the aspects that can influence CBIR.

From the discussion of CBIR in the introduction, low-level image features such as color, shape and texture are assumed to gather the information from an image that is needed for retrieval. A variety of spectral techniques for extracting texture features, as well as the existing techniques for validating image texture characteristics in the contemporary literature, are discussed with the aim of attaining the objective of this research, which is to identify the most suitable features for CBIR. In this discussion we consider the weaknesses of spectral methods, how a different method could offer solutions, and which method is the most effective at representing texture characteristics. The vital concerns of a content-based image retrieval system include the following: (1) similarity measurement, (2) low-level image features, (3) feature extraction, (4) selection of the image database, and (5) performance evaluation of the retrieval process.
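
The first concern, similarity measurement, can be illustrated with a minimal sketch. It assumes that every image has already been reduced to a feature vector and, as an illustrative choice that this paper does not specify, ranks database images by cosine similarity to the query. The function names, image names and toy feature vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def retrieve(query, database, top_k=3):
    """Return the names of the top_k database images most similar to the query.

    `database` maps a (hypothetical) image name to its feature vector.
    """
    scored = [(cosine_similarity(query, feats), name)
              for name, feats in database.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


# Toy feature vectors, invented purely for illustration.
database = {
    "beach_01": [0.9, 0.1, 0.0],
    "bus_07": [0.1, 0.8, 0.3],
    "beach_02": [0.8, 0.2, 0.1],
}
print(retrieve([1.0, 0.1, 0.0], database, top_k=2))
```

Any other distance, such as the Euclidean distance, could be substituted for `cosine_similarity` without changing the ranking logic.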

2.1. Deep learning
Deep learning has radically advanced the state of the art in numerous artificial intelligence tasks, for instance machine translation, object detection and speech recognition [8]. Its deep architecture gives deep learning the potential to solve many difficult artificial intelligence tasks [9]. Researchers are therefore extending deep learning to diverse contemporary domains and tasks in addition to traditional ones such as face recognition, language modelling and object detection. For example, [8] employs a recurrent neural network to denoise speech signals, [10] applies stacked autoencoders to discover clustering patterns in gene expression, [11] applies a neural model to create images in diverse styles, [12] uses deep learning to enable simultaneous sentiment analysis from multiple modalities, and [13] uses deep learning to classify biological images. Deep learning research is currently flourishing.

Empirical results suggest that deep learning performs better than other machine learning algorithms. Some have suggested that this is because it loosely imitates the functions of the brain, with the many layers of a neural network stacked one after another as in the classical brain model. To date, however, there is no strong theoretical foundation for deep learning [14]. Deep learning machines nevertheless regularly outperform traditional machine learning (ML) tools because they also learn the feature extraction step. The aim of deep learning techniques is to learn feature hierarchies in which features at higher levels of the hierarchy are formed by composing features from lower levels. Automatically learning features at multiple levels of abstraction enables a system to learn complex functions that map the input directly to the output from data, without depending completely on human-crafted features [14]. A case in point is image recognition, where the traditional approach is to extract handcrafted features and then feed them to a Support Vector Machine (SVM). In contrast, deep learning schemes optimize the extracted features themselves, which largely explains their better performance.

The salient difference between traditional machine learning and deep learning is how performance changes as the scale of the data increases. Deep learning algorithms do not perform well when the data set is small, because they need a huge amount of data to learn the problem well [12].

2.2. Convolutional neural network (CNN)
One specific kind of deep feedforward network proved much easier to train and generalized much better than networks with full connectivity between adjacent layers. This was the CNN [15, 16], which achieved several practical successes at a time when neural networks were out of favor, and it has since been widely adopted by the computer vision community. CNNs are designed to process data that come in the form of multiple arrays, for example a color image composed of three Two-Dimensional (2D) arrays containing the pixel intensities of the three color channels. Many data modalities take the form of multiple arrays: One Dimension (1D) for signals and sequences, including language; 2D for images or audio spectrograms; and Three Dimensions (3D) for video or volumetric images. Four key ideas behind CNNs take advantage of the properties of natural signals: local connections, shared weights, pooling and the use of many layers [15-18].

The architecture of a typical CNN, shown in Figure 1, is structured as a sequence of stages. The first few stages are composed of two kinds of layers: convolutional layers and pooling layers. In a convolutional layer, units are organized in feature maps, and every unit is connected to local patches in the feature maps of the preceding layer through a set of weights known as a filter bank. The result of this local weighted sum is then passed through a non-linearity such as a Rectified Linear Unit (ReLU) [19]. All units in a feature map share the same filter bank, while different feature maps in a layer use different filter banks. The rationale behind this architecture is twofold. First, in array data such as images, local groups of values are often highly correlated and form distinctive local motifs that are easily detected. Second, the local statistics of images and other signals are invariant to position. In other words, if a motif can appear in one part of the image, it can appear anywhere, hence the idea of units at different locations sharing the same weights and detecting the same pattern in different parts of the array. Mathematically, the filtering operation performed by a feature map is a discrete convolution, hence the name.

Figure 1. Effects of selecting different switching under dynamic condition

While the role of the convolutional layer is to detect local conjunctions of features from the preceding layer, the role of the pooling layer is to merge semantically similar features into one. Because the relative positions of the features that form a motif can vary somewhat, the motif can be detected reliably by coarse-graining the position of each feature. A typical pooling unit computes the maximum of a local patch of units in one feature map. For instance, Figure 2 shows that, in image classification, a CNN can learn to detect edges from raw pixels in the first layer, use the edges to detect simple shapes in the second layer, and then use these shapes to detect higher-level features, such as the shapes of faces, in higher layers. The final layer is then a classifier that uses these high-level features.

Figure 2. Eyeris' deep learning based facial feature extraction using convolutional neural network [20]

2.3. Convolutional neural network architecture
As shown in Figure 1 [15, 21, 22], a CNN comprises an input layer and an output layer, together with numerous hidden layers. The hidden layers are convolutional, pooling or fully connected.

2.3.1. Convolutional layer
Convolutional layers apply a convolution operation to the input and pass the result to the next layer. The convolution imitates the response of an individual neuron to visual stimuli [23]. A convolutional layer comprises multiple maps of neurons, referred to as feature maps or filters, whose size equals the dimensions of the input image. Two concepts reduce the number of model parameters: local connectivity and parameter sharing. First, unlike in a fully connected network, every neuron in a feature map is connected only to a local patch of neurons in the preceding layer, known as its receptive field. Second, all neurons in a given feature map share the same parameters. Every neuron in a feature map therefore scans for the same feature in the preceding layer, but at a different location. Different feature maps could, for instance, detect edges of different orientations in an image, or sequence motifs in a genomic sequence. The activity of a neuron is obtained by computing a discrete convolution over its receptive field, that is, by computing the weighted sum of the input neurons and applying an activation function. Figure 3 illustrates the discrete convolution.

Figure 3. The CNN's first layer is discrete convolution
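
The weighted sum over a receptive field followed by an activation function can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in this paper: it assumes a single-channel image, stride 1 and no padding, and, like most CNN libraries, it applies the filter without flipping it (strictly a cross-correlation). The function names and the toy image are hypothetical.

```python
def relu(x):
    """Rectified linear activation."""
    return max(0.0, x)


def conv2d_valid(image, kernel, bias=0.0, activation=relu):
    """Slide one shared filter over a 2D image (stride 1, no padding).

    Each output unit is the weighted sum of its receptive field plus a bias,
    passed through the activation function. The same kernel weights are
    reused at every position, which is the parameter sharing described above.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = bias
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(activation(total))
        feature_map.append(row)
    return feature_map


# A toy image with a vertical edge between the second and third columns.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A filter that responds to a dark-to-bright transition from left to right.
vertical_edge = [[-1, 1],
                 [-1, 1]]
print(conv2d_valid(image, vertical_edge))
```

The feature map is non-zero only at the positions whose receptive field straddles the edge, which shows how one filter detects the same motif wherever it occurs.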

2.3.2. Rectified linear units (ReLU) layers
It is conventional to apply an activation (nonlinear) layer immediately after every convolutional layer [19]. The purpose of this layer is to introduce nonlinearity into a system that has so far computed only linear operations in the convolutional layers. In the past, nonlinear functions such as sigmoid and tanh were used, but researchers found that ReLU layers work better because the network trains much faster (owing to their computational efficiency) without a substantial difference in accuracy. ReLU also helps to alleviate the vanishing gradient problem, in which the lower layers of the network train very slowly because the gradient decreases exponentially through the layers. The ReLU layer applies the function f(x) = max(0, x) to all of the values in the input volume. In simple terms, this layer changes all negative activations to zero [16]. It increases the nonlinear properties of the model and of the overall network without affecting the receptive fields of the convolutional layer. Figure 4 illustrates the ReLU activation function.

Figure 4. The ReLU activation function
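
The function f(x) = max(0, x) and its effect on the gradient can be sketched as follows. This is an illustrative sketch only; the helper names are hypothetical, and the gradient comparison ignores the layer weights so as to isolate the contribution of the activation function.

```python
def relu(x):
    """f(x) = max(0, x): negative activations become zero."""
    return max(0.0, x)


def relu_layer(volume):
    """Apply ReLU element-wise to a nested list (an input volume of any depth)."""
    if isinstance(volume, list):
        return [relu_layer(item) for item in volume]
    return relu(volume)


def relu_gradient(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


activations = [[-2.0, 0.5], [3.0, -0.1]]
print(relu_layer(activations))

# Vanishing gradient, weights ignored: the sigmoid derivative never exceeds
# 0.25, so ten stacked sigmoid layers scale the gradient by at most 0.25 ** 10,
# whereas ten active ReLU units scale it by exactly 1.
sigmoid_bound = 0.25 ** 10
relu_path = relu_gradient(3.0) ** 10
print(sigmoid_bound, relu_path)
```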

2.3.3. Pooling layer
A pooling layer reduces the size of its input and allows multi-scale analysis. The most popular pooling operators are max-pooling and average-pooling, which compute the maximum or the average value within a small spatial block. A max-pooling operation with (2×2) filters is illustrated in Figure 5. In a number of applications, such as recognizing objects in an image, the exact position and frequency of features is not relevant to the final prediction [24]. Under this assumption, the pooling layer summarizes adjacent neurons by computing, for instance, the maximum or the average of their activity, which yields a smoother representation of the feature activities. By applying the same pooling operation to small image patches that are shifted by more than one pixel, the input image is effectively down-sampled, which further reduces the number of model parameters. The size of the output can be controlled by three hyperparameters: stride, depth and zero-padding.
a. Stride: the number of pixels by which the filter jumps as it slides over the image.
b. Depth: the number of filters applied to the input image. These filters can detect structures such as blobs, edges and corners.
c. Zero-padding: padding zeros around the borders of the input so that its size is preserved.
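
How the three hyperparameters above determine the output size can be sketched with the standard output-size formula, floor((W - F + 2P) / S) + 1 per spatial dimension, where W is the input size, F the filter size, P the zero-padding and S the stride. The formula is the commonly used one rather than something stated in this paper, and the function name and the 256×384 input are illustrative assumptions.

```python
def conv_output_shape(in_h, in_w, filter_size, num_filters, stride=1, padding=0):
    """Return (height, width, depth) of a convolutional layer's output.

    Height and width follow floor((W - F + 2P) / S) + 1; the depth equals the
    number of filters, because each filter produces one feature map.
    """
    if stride < 1:
        raise ValueError("stride must be at least 1")
    out_h = (in_h + 2 * padding - filter_size) // stride + 1
    out_w = (in_w + 2 * padding - filter_size) // stride + 1
    return out_h, out_w, num_filters


# 3x3 filters, stride 1, one ring of zero-padding: the spatial size is preserved.
print(conv_output_shape(256, 384, filter_size=3, num_filters=8, stride=1, padding=1))
# No padding and stride 2: the feature maps shrink to roughly half the size.
print(conv_output_shape(256, 384, filter_size=3, num_filters=8, stride=2, padding=0))
```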

Figure 5. Max pooling operation with (2×2) filters
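
The max-pooling and average-pooling operators can be sketched as below. This is a minimal illustration that assumes a single feature map stored as a list of rows, a 2×2 window and a stride of 2; the function name and the toy feature map are hypothetical.

```python
def pool2d(feature_map, size=2, stride=2, reducer=max):
    """Summarize each size x size window of a 2D feature map with `reducer`."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(reducer(window))
        pooled.append(row)
    return pooled


def mean(values):
    """Arithmetic mean, used as the reducer for average-pooling."""
    return sum(values) / len(values)


feature_map = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
print(pool2d(feature_map))                # max-pooling
print(pool2d(feature_map, reducer=mean))  # average-pooling
```

With a stride of 2, the 4×4 map is down-sampled to 2×2, so each output value summarizes one non-overlapping block.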

2.3.4. Fully-connected layer
A CNN usually comprises numerous convolutional and pooling layers, which allow it to learn increasingly abstract features at increasing scales, from small edges to object parts to entire objects. One or more fully connected layers may follow the final pooling layer. Model hyperparameters such as the number of convolutional layers, the number of feature maps and the size of the receptive fields are application-dependent and should be chosen rigorously on a validation data set [23]. A fully connected layer connects to every neuron of the preceding layer. Fully connected layers are typically used as the final layers of the network and perform the classification. Figure 6 depicts a sample CNN that illustrates all three of the layer types described above.

Figure 6. A sample of CNN architecture
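
A fully connected layer used as a classifier can be sketched as follows. The sketch assumes, as an illustrative choice that this paper does not state, that the class scores are converted to probabilities with a softmax; the function names, weights and feature values are hypothetical.

```python
import math


def fully_connected(inputs, weights, biases):
    """Each output neuron is a weighted sum over every input, plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two pooled features feeding three output classes (toy values).
features = [1.0, 2.0]
weights = [[1.0, 0.0],
           [0.0, 1.0],
           [1.0, 1.0]]
biases = [0.0, 0.0, 0.0]

scores = fully_connected(features, weights, biases)
probabilities = softmax(scores)
predicted_class = probabilities.index(max(probabilities))
print(scores, predicted_class)
```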

2.4. Slantlet transform (SLT)
The SLT is an orthogonal Discrete Wavelet Transform (DWT) with two zero moments and improved time localization. The SLT retains the features of a typical filter bank implementation with a scale dilation factor of two. Unlike the DWT, it is not based on an iterated filter bank; instead, different filters are used for each scale. This paper proposes a new way of applying the SLT in image retrieval: the image is converted from the spatial domain to the transform domain so that its coefficients can be ordered and the most relevant and informative part of the image selected, thereby improving the retrieval model. The CNN-SLT model, a novel image retrieval technique, merges multiple representations of the image as input to the CNN. The researchers therefore use the transform domain as an image representation. We also compare this novel technique with existing image retrieval techniques.

In a 2D SLT decomposition, an image is usually divided into four parts, Low-Low (LL), Low-High (LH), High-Low (HL) and High-High (HH), as Figure 7 illustrates, where L and H denote the low- and high-frequency bands, respectively. Each part carries different image information. The low-frequency component of the image, marked LL, retains the original image information. In contrast, the medium- and high-frequency bands LH, HL and HH carry information about the contours, edges and other details of the image. The important information in the image is represented by large coefficients, whereas small (insignificant) coefficients are regarded as worthless information or noise. These small coefficients should therefore be ignored to obtain the best results in subsequent operations.

Figure 7. The conventional 2D SLT decomposition schemes for dividing an image (panel labels: Original Image; L, H; LL, LH, HL, HH)
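
The four-sub-band layout and the discarding of small coefficients can be sketched with a one-level 2D Haar transform. The Haar filters are a stand-in chosen only because they are short enough to write out; they are not the Slantlet filters, whose coefficients come from the SLT equations in [17]. The function names, the sub-band naming convention and the toy image are assumptions made for illustration.

```python
def haar_subbands(image):
    """One-level 2D Haar split of an image with even height and width.

    Each 2x2 block [[a, b], [c, d]] contributes one coefficient per sub-band.
    Naming conventions for LH and HL differ between sources.
    """
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, len(image), 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for j in range(0, len(image[0]), 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            ll_row.append((a + b + c + d) / 2)  # scaled local sum (approximation)
            lh_row.append((a + b - c - d) / 2)  # change between the two rows
            hl_row.append((a - b + c - d) / 2)  # change between the two columns
            hh_row.append((a - b - c + d) / 2)  # diagonal change
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}


def hard_threshold(band, threshold):
    """Zero out coefficients whose magnitude is below the threshold."""
    return [[v if abs(v) >= threshold else 0.0 for v in row] for row in band]


image = [
    [4, 4, 0, 0],
    [4, 4, 0, 0],
    [1, 3, 5, 5],
    [1, 3, 5, 5],
]
bands = haar_subbands(image)
print(bands["LL"])
print(hard_threshold(bands["HL"], 1.0))
```

Flat regions produce zero detail coefficients, so only the block containing a column-to-column change survives in the HL band.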

As a multi-resolution method, the SLT [17] is well suited to piecewise linear data. It is an orthogonal DWT with two zero moments and improved time localization characteristics. It is based on the principle of designing different filters for different scales, unlike the iterated-filter approach of the DWT. The SLT has previously been used in various applications, for instance compression, de-noising of various input images, estimation and fast algorithms. It is implemented as a filter bank with a parallel structure, suitable for parallel processing, in which different filters are configured for each scale rather than iterating one filter at each level. The filter coefficients are computed with the SLT equations given by Selesnick [17].

3. METHODS AND MATERIALS
First, the datasets used to evaluate the proposed framework are outlined. Next, the construction of the network architecture is described, together with the design of the deep convolutional network. The merged input representation supplied to the system is then defined. Finally, various experimental benchmarks are applied to test the system.

3.1. Data sets
Three standard datasets are used to validate the proposed image retrieval technique. The first two are Wang V1.0 and Wang V2.0; the Wang database is a subset of the Corel database [18]. The third is Caltech 101, which contains pictures of objects belonging to 101 categories, with approximately 40 to 800 images per category, although most categories have approximately 50 images. Fei-Fei Li, Marco Andreetto and Marc'Aurelio Ranzato established the database in September 2003. Each image is approximately 300×200 pixels.

The WANG V1.0 database is the most popular and widely used database in recent studies [25]. It comprises one thousand images in ten classes. Each class consists of 100 images that closely resemble each other. The classes are beach, buses, elephants, horse, dinosaur, mountain, monuments, roses, food and Africa, as illustrated in Figure 8.

Figure 8. An example image from each of the 10 classes of the WANG database, together with their class labels

WANG_10000 Database V2.0
This dataset is ten times larger than its predecessor, Wang V1.0. Most of the images have low resolution, which adversely affects the performance of any image retrieval system. Figure 9 illustrates the images according to the established categories. Database V2.0 is thus more comprehensive and more challenging than database V1.0.

Figure 9. An example image from each of the 10 classes of the WANG V2.0 database together with their class labels

Caltech 101 is a dataset of digital images created in September 2003 [26]. It is intended to facilitate research in computer vision, image recognition and classification. The dataset contains a total of 9,146 images divided among 101 distinct object classes (pianos, faces, ants, watches and others) plus a background category. Each category contains approximately 40 to 800 images, and most categories have approximately 50 images. Each image is about 300×200 pixels. Figure 10 illustrates sample images. The datasets used are described in Table 1.
Figure 10. An example image from each of 10 classes of the Caltech 101 database, together with their class labels
Table 1. Description of used datasets
Dataset Name    Number of Images    Classes    Number of Images in Each Class    Image Size
WANG 1          1000                10         100                               256×384
WANG 2          10000               100        100                               256×128
Caltech 101     9146                101        40-800                            300×200
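The figures in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the `DatasetInfo` record and the `total_is_consistent` helper are hypothetical names introduced here, and the values are copied from Table 1 as printed.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DatasetInfo:
    """One row of Table 1 (hypothetical container, not from the paper)."""
    name: str
    num_images: int
    num_classes: int
    images_per_class: Tuple[int, int]  # (minimum, maximum) images per class
    image_size: Tuple[int, int]        # size as listed in Table 1


DATASETS = [
    DatasetInfo("WANG 1", 1000, 10, (100, 100), (256, 384)),
    DatasetInfo("WANG 2", 10000, 100, (100, 100), (256, 128)),
    DatasetInfo("Caltech 101", 9146, 101, (40, 800), (300, 200)),
]


def total_is_consistent(d: DatasetInfo) -> bool:
    """Check that the total image count fits the per-class range."""
    lo, hi = d.images_per_class
    return d.num_classes * lo <= d.num_images <= d.num_classes * hi


for d in DATASETS:
    print(d.name, "consistent" if total_is_consistent(d) else "inconsistent")
```

For the two WANG datasets the per-class count is fixed, so the check reduces to classes × 100 = total images.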
3.2. Benchmarking
Benchmarking is one of the most important steps in image processing research, since it determines the reliability and effectiveness of a developed technique compared with existing ones. Typically, benchmarking is carried out either by using the same dataset or by using algorithms applied in the same problem domain. Benchmarking is also performed against the best-known methods in the image retrieval literature. The benchmark methods that use the same standard datasets are listed in Table 2.
Table 2. Existing methods for benchmarking
Dataset Name    Benchmarking
WANG 1          ElAlami [28], Lin [29], Wang [25]
WANG 2          Shrivastava & Tyagi [30]
Caltech 101     Bosch [31]
3.3. CNN network architecture
Following the collection of all the data, a convolutional architecture with fully connected layers was adopted as the default architecture. The recommended model configuration was designed according to the principles of Krizhevsky [24]; the source code can be found in [27]. The generic design followed the configuration in [24]. Figure 11 illustrates the suggested configuration, in which the images were passed through a stack of four convolutional (conv.) layers with a 3×3 filter size (a suitable size for capturing the notions of center, left/right, and up/down). The number of conv. and pooling layers varied. The first two conv. layers applied 32 kernel filters, whereas the last two conv. layers applied 64 kernel filters. Two max-pooling layers separated the convolution stages. This combination helped improve the performance of the proposed configuration for image retrieval. The conv. layers were followed by a flatten layer, which transformed the 2D matrix data into a vector (architectures of different depths were used). This allowed the output to be processed by the fully connected layers, also referred to as dense layers. The first fully connected layer contained 256 nodes, whereas the second fully connected layer contained 128 nodes. The regularization layer applied dropout and was configured to randomly omit 50 percent of the neurons in order to reduce overfitting. The final layer was a softmax layer [14, 23, 24].
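To make the layer stack concrete, the following sketch traces output shapes and parameter counts through the configuration described above. It is an illustration under stated assumptions, not the author's implementation: the 64×64×3 input, 'same' padding with stride 1, 2×2 pooling, the placement of dropout just before the softmax layer, and the 10 output classes are all assumed here, and `trace_cnn` is a hypothetical helper.

```python
def trace_cnn(input_shape=(64, 64, 3), num_classes=10):
    """Return (layer name, output shape, parameter count) for each layer.

    Assumptions (not stated in the paper): 3x3 convolutions use stride 1
    and 'same' padding, pooling is 2x2 with stride 2, and dropout sits
    between the second dense layer and the softmax layer.
    """
    rows = []
    h, w, c = input_shape
    for block, filters in ((1, 32), (2, 64)):
        for i in (1, 2):
            # 3*3*c weights plus one bias for each of the `filters` kernels
            params = (3 * 3 * c + 1) * filters
            c = filters
            rows.append((f"conv{block}_{i} 3x3x{filters}", (h, w, c), params))
        h, w = h // 2, w // 2  # 2x2 max pooling halves height and width
        rows.append((f"maxpool{block} 2x2", (h, w, c), 0))
    units = h * w * c
    rows.append(("flatten", (units,), 0))
    for out in (256, 128):
        rows.append((f"dense {out}", (out,), (units + 1) * out))
        units = out
    rows.append(("dropout 0.5", (units,), 0))  # no trainable parameters
    rows.append((f"softmax {num_classes}", (num_classes,),
                 (units + 1) * num_classes))
    return rows


if __name__ == "__main__":
    for name, shape, params in trace_cnn():
        print(f"{name:<18} {str(shape):<16} {params:>10,}")
```

Under these assumptions most parameters sit in the first dense layer, which is one reason the flattened feature map size matters when choosing the input resolution.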
Figure 11. The recommended CNN configuration
a. CBIR based on merging and combining multiple CNNs:
CBIR is a powerful tool for image retrieval. When an image is translated into mathematical variables, the representations applied are the major components of the technique. Many image representations and descriptors are complementary to one another; therefore, combining them could produce better results. This view implies that different descriptors could yield different retrieval outcomes, and that integrating several image representations by combining and merging multiple convolutional neural network models could enhance the CBIR model’s performance. The fundamental idea is to accumulate a variety of knowledge sources and features in one CNN.
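The idea of combining complementary descriptors can be sketched outside a network as a simple late fusion. This is only an illustration, assuming each descriptor is a plain list of floats; the L2 normalization and concatenation used here are assumptions chosen for clarity, whereas the proposed model performs the merge inside the CNN. The names `l2_normalize` and `fuse_descriptors` are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_descriptors(*descriptors):
    """Concatenate descriptors after normalizing each one.

    Normalizing first keeps a descriptor with large raw values from
    dominating the fused vector.
    """
    fused = []
    for d in descriptors:
        fused.extend(l2_normalize(d))
    return fused


# Example with two made-up descriptors of different lengths and scales.
pixel_features = [3.0, 4.0]
transform_features = [0.0, 2.0, 0.0]
print(fuse_descriptors(pixel_features, transform_features))
```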
Empirical results suggest that deep learning performs better than other machine learning algorithms. Some have suggested that this is because it loosely mimics brain function, with multiple layers of neural networks stacked one after another as in the classical brain model. However, there is as yet no robust theoretical foundation for deep learning [8, 9, 11, 13]. Even so, deep learning machines usually work better than traditional ML tools because they also learn the feature extraction part. Deep learning methods aim at learning feature hierarchies in which features at higher levels of the hierarchy are formed by composing lower-level features. Automatically learning features at multiple levels of abstraction allows a system to learn complex functions that map the input to the output directly from data, without depending completely on human-crafted features [13, 14].
In image recognition, for example, the traditional setup is to extract handcrafted features and then feed them to an SVM. In contrast, deep learning CNN schemes also optimize the extracted features, which largely explains why they perform better. The most important difference between deep learning and traditional machine learning is how performance changes as the scale of the data increases. When the data is small, deep learning algorithms do not perform well, because they need a large amount of data to learn the underlying patterns. In this scenario, traditional machine learning algorithms with their handcrafted rules prevail.
Based on the problem statement discussed earlier, CBIR attempts to measure the similarity between images. Since traditional CBIR systems still suffer from poor retrieval accuracy and sensitivity, more work is required to develop new approaches to image similarity measurement. This research therefore addresses several challenges, such as improving retrieval accuracy and enhancing the image descriptors and the feature extraction step.
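Similarity-based retrieval can be illustrated with a small ranking routine. The sketch below is a minimal example under stated assumptions: descriptors are plain lists of floats, cosine similarity is used as the measure (the paper does not specify one in this section), and `cosine_similarity`, `retrieve`, and `precision_at_k` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, database, top_k=3):
    """Return the names of the top_k database images most similar to the query."""
    scored = [(cosine_similarity(query, vec), name)
              for name, vec in database.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


def precision_at_k(retrieved, labels, query_label):
    """Fraction of retrieved images whose label matches the query's label."""
    if not retrieved:
        return 0.0
    hits = sum(1 for name in retrieved if labels[name] == query_label)
    return hits / len(retrieved)


# Toy database with made-up 2-D descriptors and class labels.
database = {"img_a": [1.0, 0.0], "img_b": [0.0, 1.0], "img_c": [1.0, 1.0]}
labels = {"img_a": "bus", "img_b": "rose", "img_c": "bus"}
top = retrieve([1.0, 0.1], database, top_k=2)
print(top, precision_at_k(top, labels, "bus"))
```

Precision over the top-ranked results is one common way to score retrieval accuracy; other measures would plug into the same ranking step.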
The concepts of combining and merging multiple convolutional neural network models were employed to develop a novel CNN-SLT-CBIR model, in which the SLT representation is combined with the CNN model with the intention of improving the performance of the recommended CBIR model. The general framework of the CNN-SLT-CBIR model, based on the combination of SLT with CNN, is illustrated in Figure 12.
Figure 12. The general framework of the CNN-SLT-CBIR model
The CNN-SLT-CBIR model was developed by combining the SLT image representation with the CNN model, with the intention of producing a novel CBIR model. The researchers introduce a new CNN architecture that joins information from two representations of an image into a single compact image descriptor that offers even better image retrieval.