TELKOMNIKA, Vol. 13, No. 3, September 2015, pp. 963~975
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v13i3.1474

Received January 26, 2015; Revised April 9, 2015; Accepted May 1, 2015
ATLAS: Adaptive Text Localization Algorithm in High Color Similarity Background
Lih Fong Wong*1, Mohd. Yazid Idris2
Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
*Corresponding author, e-mail: lfwong2@live.utm.my1, yazid@utm.my2
Abstract
One of the major problems that occurs in the text localization process is the issue of color similarity between text and background image. The limitation of localization algorithms due to high color similarity is highlighted in several research papers. Hence, this research focuses on improving text localization capability in images with high text-background color similarity by introducing an adaptive text localization algorithm (ATLAS). ATLAS is an edge-based text localization algorithm that consists of two parts. The Text-Background Similarity Index (TBSI), the first part of ATLAS, measures the similarity index of every text region, while the second part, Multi Adaptive Threshold (MAT), performs multiple adaptive threshold calculations using size filtration and degree deviation to locate the possible text regions. In this research, ATLAS is verified and compared with other localization techniques based on two parameters, localizing strength and precision. The experiment has been implemented and verified using two types of datasets: a generated text color spectrum dataset and the Document Analysis and Recognition dataset (ICDAR). The results show that ATLAS achieves significant improvement in localizing strength and slight improvement in precision compared with other localization algorithms on high color similarity text-background images.
Keywords: text localization, color similarity, adaptive threshold
Copyright © 2015 Universitas Ahmad Dahlan. All rights reserved.
1. Introduction
Text in an image exists in two forms, either caption text or scene text [1]. Both text forms are important sources for describing the semantic content of images [2], such as in geo-location applications, obtaining object information, and indexing, categorizing and searching processes [3]. Text extraction is an important research area [4], which comprises three stages [5]: text localization, text segmentation and text recognition. Text localization locates the positions of the various texts in the image, while text segmentation involves separating text pixels from background pixels. The text pixels are further converted to soft text in the final stage, text recognition.
Figure 1. Samples of Caption Text (Top) and Scene Text (Bottom)
Text localization, as the main element in the text extraction framework, is the first process that affects the overall accuracy of the text extraction result. It has been taken seriously by researchers in order to produce a high accuracy text localization algorithm which can localize text in various image conditions, whether caption or scene text. Caption text (also known as graphic text, superimposed text or artificial text) is text that is post-added or created through image editing tools. This type of text is commonly seen on advertising and informative images like blog headers, brochures, and logos. On the other hand, scene text is the original text in the image. It can be seen in most natural images captured by digital devices. Figure 1 shows examples of caption text and scene text.

Both caption text and scene text have three important properties that can affect the text localization result [6]: text geometry, text color and text effect. Text geometry refers to the relative shape and position of the text, which includes sizes, fonts, alignments, directions and distances between characters. Text color simply refers to the integer value in each color channel for each text pixel, and finally, text effect refers to additional ornament or decoration on the text, for example shadowed, sharpened and blurred effects. The combination of these properties creates unpredictable text models and uncertain image backgrounds, which produce a very challenging environment for text localization.
Among these three properties, text color has a simple form of root cause but requires complex techniques to solve. The most common situation is the existence of text with a color almost similar to its background. Text localization algorithms generate high false positive errors when locating text with high color similarity. The main reason for this localization error is the small difference between the color values of text and background, which prevents most algorithms from distinguishing the color spaces. Hence, it is very challenging to localize text, especially text with a similar background color. Yet, these types of images are quite common in the real environment. Hence, there is a need to produce an algorithm that overcomes the color similarity issue.
Section 2 describes the related works on text localization algorithms. The details of ATLAS are explained in Section 3, followed by the experiment process, the experimental results and concluding remarks in Sections 4 and 5.
2. Related Works
Text localization algorithms can be categorized into three categories: connected-component based (CC), texture based and edge based algorithms. CC based algorithms analyze the color value of every image pixel and group nearby pixels which have similar color to form a region that is then used to differentiate between text regions and background regions [7-9]. On the other hand, texture based algorithms employ machine learning techniques to analyze unique patterns that appear in the text regions of the image. The algorithm examines a specified color distribution, either in the spatial domain or in the frequency domain, that matches the features of ground truth text regions [10-12]. Edge based algorithms implement a different strategy: instead of looking for regions of similar color, an edge based algorithm detects sudden changes of color value in an image region and defines a region of sharp change as an edge. The edge acts as a barrier that separates text regions from background regions. A pre-determined threshold is used as a minimum value to evaluate the sharp changes in pixel color. Any sharp change above the threshold is identified as an edge [13-14].
In order to look for the edges, an edge detector algorithm is required. Several edge detector algorithms have been introduced, including Sobel [15], Roberts [16], Laplacian [17], the Genetic-Ant Colony edge detector [18] and the Canny edge detector [19]. It has been recognized [20-22] that the Canny edge detector produces higher accuracy and better edge images, granted by its edge thinning algorithm and heuristic thresholds, compared with the others.
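For reference, the dual hysteresis thresholds that characterize the Canny detector are exposed directly in common image-processing libraries. The following minimal sketch uses OpenCV; the file name and the fixed threshold values are illustrative only:

```python
import cv2

# Fixed dual-threshold Canny edge detection (file name and thresholds are illustrative)
gray = cv2.imread("sample.png", cv2.IMREAD_GRAYSCALE)
assert gray is not None, "image not found"
edges = cv2.Canny(gray, 100, 200)   # low and high hysteresis thresholds
cv2.imwrite("edges.png", edges)
```

Such fixed thresholds work when the text-background contrast is strong, which is exactly the assumption that breaks down under high color similarity.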
However, the original purpose of the Canny edge detector is to extract object features from images. To implement it in a text localization algorithm, some additional filtration and enhancement processes are required to locate the correct text edges and eliminate unnecessary edges. Enhancements of Canny edge based algorithms for text localization, as highlighted in [23-25], increase the accuracy of localizing text in images.
Liu and Wang [23] proposed stroke-like edge detection based on contours to remove noise edges from complex images. They then locate the text regions based on the distribution of edges and corners.
Finally, they performed segmentation on the text regions and identified the text pixels by looking for the largest frequency bin in the color histogram. Yi and Tian [24] introduced an edge-based solution for text localization with three steps: first, cluster the edge boundaries based on bigram color uniformity; second, segment strokes by assigning the mean color-pair in each boundary layer; and third, use Gabor-based text features to determine the correct text regions from the candidates. Lee and Kim [25] proposed an efficient edge based text localization algorithm by proposing a two-stage conditional random field, which utilizes both the edge map and a saliency map to look for the optimal configuration of text regions.
The limitations of the current algorithms [23-25] are mainly due to low image resolution, multiple colors in the text and complex backgrounds, which can be generalized as the problem of color similarity between text pixels and background pixels. For the mentioned edge-based algorithms, the core function is the Canny edge detector algorithm, which is used to discover the edges inside the image. The detected edges are then filtered and enhanced, so that the leftovers are the edges of the text. However, this requires a specific amount of difference in color value between text pixels and background pixels before the text can be identified and located by the algorithm. Hence, the smaller the difference between color values (or the higher the color similarity), the harder it is for the localization algorithm to locate the position of the text. Localization is likely to fail when it attempts to deal with texts that have high color similarity with their background, such as engraved text, text with a complex background and text under light exposure. Some example images are shown in Figure 2.
Figure 2. Images with High Color Similarity: engraved text (left), exposure of light (center) and complex background (right)
Implementation of an adaptive threshold can solve this problem. Rong et al. [26] introduced an adaptive threshold algorithm based on the mean and standard deviation of the image gradient. Another example is by Li et al. [27], who applied the Mean Shift algorithm on the Canny edge detector to extract weak objects. However, both algorithms focus on enhancing the Canny edge detector for general purposes rather than specifically for text localization. There are also other enhancement works on the detection of weak edges in other fields, for example medical images [28] and radar images [29]. Existing text localizing algorithms that use an adaptive threshold on the edge detector are relatively few. The most related work was done by Hsia and Ho [30], but they focus on localizing text in video scenes using the Roberts edge detector instead of the Canny edge detector.
Given that text images contain texts with different positions and different color similarity, a single adaptive threshold is limited. Thus, a multi-region adaptive threshold is needed to ensure all texts with different color similarity in an image are localized. The advantage of using a multi-region adaptive threshold is that it allows a low adaptive threshold value to be applied on the region that has high color similarity instead of on the entire image. In some cases the exposure of light affects only a small region of the image; if the adaptive threshold takes the entire image into consideration (finds the mean of the entire image), the small region with very high color similarity will be neutralized by the mean value, and the algorithm will finally omit localizing the region with light exposure. In order to form the regions with different color similarity, candidate text regions are first formed, and then possible regions of omitted characters are estimated. A suitable threshold value is calculated based on the similarity index of the region to further extract the missing edges and complete the text localization process.
In summary, this paper focuses on the enhancement of the Canny edge detector for solving the color similarity issue in text localization. A new adaptive threshold with multiple regions is proposed, which focuses on the Canny edge detector with the purpose of localizing text that has high color similarity with its background.
3. Proposed Algorithm
The proposed algorithm intends to solve the aforementioned challenges by implementing a multi-region adaptive threshold on various types of text similarity. In order to deal with the uncertainty of differences in color between text pixels and background pixels, a measurement of the similarity index is needed. Based on previous research, there is so far no standard measurement available to define color similarity. Hence, this paper first proposes a measurement method for a color similarity index, and then proposes a multi-region adaptive threshold for the Canny edge detector that refers to this similarity index. We name it the Adaptive Text Localizing Algorithm, or ATLAS. The summary of the algorithm is shown in Table 1, Table 2 and Figure 3.
Figure 3. Component Diagram of ATLAS
3.1. Text-Background Similarity Index
To show the color similarity between text pixels and background pixels, a measurement index called the Text-Background Similarity Index (TBSI) is introduced, which is defined as the degree of likeness of pixel values between text and background in an image region. To calculate the TBSI of an image, it is first converted into a grayscale image $I$. The conversion to a grayscale image is intended to simplify the calculation. Using a rectangle box, each of the ground truth regions of text is manually marked up. Let $G = \{G_i \mid i = 1, 2, \ldots, N\}$, where $G_i$ is the marked ground truth region and $N$ is the total number of text regions in $I$.
The TBSI estimation depends on the ground truth regions. However, if the ground truth region is unknown, it can be replaced with a region of interest. Next, for each region $G_i$, Otsu's binarization algorithm [31] is applied to get the approximated segmentation between text pixels $Gt_{ij}$ and background pixels $Gb_{ij}$. The average gray value for the text pixels and background pixels in each region $G_i$ is obtained. Let $\overline{Gt_i}$ be the average value for the text pixels and $\overline{Gb_i}$ be the average value for the background pixels in the region $G_i$. The absolute color difference between text pixels and background pixels $DG_i$ can be calculated using the formula:

$DG_i = \left| \overline{Gt_i} - \overline{Gb_i} \right|$  (1)
The average value of the color difference between text and background, $\overline{DG}$, for the image $I$ can be determined by averaging the differences over all regions:

$\overline{DG} = \frac{1}{N} \sum_{i=1}^{N} DG_i$  (2)
The purpose of calculating the average color difference between text and background is to obtain its percentage of the maximum color difference. Finally, the TBSI of the image can be computed using the formula:

$TBSI = 1 - \frac{\overline{DG}}{D_{max}}$  (3)
where $D_{max}$ refers to the maximum possible difference in value between text and background pixels. In this research a grayscale image is used, hence $D_{max}$ is 255 (the maximum difference scenario is when two pixels have the values 0 as black and 255 as white). TBSI ranges between zero and one, where a higher value of TBSI represents a higher similarity between text pixels and background pixels. It is considered invalid when $TBSI = 1$ or $\overline{DG} = 0$, as these represent that there is no difference between text pixels and background pixels. This situation takes place when there is only one color in the ground truth text region.
Table 1. Summary of TBSI
Algorithm 1. Text-Background Similarity Index
1. Convert the image into a grayscale image.
2. Mark each of the potential ground truth regions of text.
3. For each potential ground truth region:
   3.1 Apply the Otsu algorithm.
   3.2 Obtain the approximated values for text pixels and background pixels.
   3.3 Calculate the absolute difference between text pixels and background pixels.
4. Average the absolute differences.
5. Calculate TBSI from the ratio of the average absolute difference to the maximum possible difference (Equation 3).
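As an illustration, the following is a minimal sketch of Equations (1)-(3) and Algorithm 1, assuming OpenCV; the function name, the region format and the usage line are illustrative and not part of the original paper:

```python
import cv2

def tbsi(gray_image, regions, d_max=255.0):
    """Sketch of Equations (1)-(3): Text-Background Similarity Index.

    gray_image : 2-D uint8 array (grayscale image I)
    regions    : list of (x_min, y_min, x_max, y_max) ground truth text boxes
    """
    diffs = []
    for (x0, y0, x1, y1) in regions:
        roi = gray_image[y0:y1, x0:x1]
        # Otsu's binarization approximates the text / background segmentation
        _, mask = cv2.threshold(roi, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        text_pixels = roi[mask == 0]     # darker class, assumed to be text
        back_pixels = roi[mask == 255]   # brighter class, assumed to be background
        if text_pixels.size == 0 or back_pixels.size == 0:
            continue                     # single-color region: invalid by definition
        diffs.append(abs(float(text_pixels.mean()) - float(back_pixels.mean())))  # Eq. (1)
    if not diffs:
        return None
    avg_diff = sum(diffs) / len(diffs)   # Eq. (2)
    return 1.0 - avg_diff / d_max        # Eq. (3)

# Usage (illustrative):
# print(tbsi(cv2.imread("sample.png", cv2.IMREAD_GRAYSCALE), [(10, 20, 120, 60)]))
```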
3.2. Multi Adaptive Threshold
TBSI measures the degree of likeness between text pixels and background pixels. Hence, it is suitable to utilize it as the adaptive threshold for the Canny edge detector algorithm, by applying a low threshold value on high TBSI images and a high threshold value on low TBSI images. Different from other approaches, this research implements a multi adaptive threshold on each of the possible text regions in an image to ensure that the Canny edge detector does not omit any possible text region with high color similarity. Before calculating the threshold values, a simple analysis needs to be done to find the possible text regions in the image.
The proposed algorithm begins with an image, and the Canny edge detector is applied to obtain an initial binary edge image by using the overall TBSI of $I$. Since the ground truth regions and text candidates are unknown at the initial stage, the overall TBSI is simply calculated by taking the full input image as the region of interest. Next, the edge pixels are divided into several groups. Let $E = \{e_i \mid i = 1, 2, \ldots, N\}$, where $e_i$ refers to a set of continuous edge pixels in which the eight-connected neighborhood of each edge pixel contains at least one other edge pixel of the same group, and $N$ denotes the total number of edge pixel groups in $E$. Then, for each edge pixel group $e_i$, a region $R_i = (x_{min}, y_{min}, x_{max}, y_{max})$ is set up, where the region is enclosed by the minimum $x$ and $y$ coordinates $(x_{min_i}, y_{min_i})$ and the maximum $x$ and $y$ coordinates $(x_{max_i}, y_{max_i})$ resulting from the overall edge pixels in $e_i$. This simply indicates that for each $e_i$, the region covers the minimum surface area. Any region which is too small or too big, or which has an imbalanced width and height ratio, is eliminated from $E$. Figure 4 represents the step-by-step workflow of ATLAS up to obtaining the initial edge pixel groups.
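A minimal sketch of this initial stage is given below, assuming OpenCV. The mapping from the overall TBSI to the Canny thresholds and the size and aspect filtration limits are assumptions for illustration; the paper only states that lower thresholds are used for higher similarity and that extreme sizes and imbalanced ratios are filtered out:

```python
import cv2

def initial_edge_groups(gray_image, overall_tbsi,
                        min_area=20, max_area_ratio=0.5, max_aspect=10.0):
    """Sketch: Canny edges -> eight-connected edge groups e_i -> bounding regions R_i."""
    # Assumed mapping: higher similarity -> lower hysteresis thresholds
    low = max(1, int(255 * (1.0 - overall_tbsi) * 0.5))
    edges = cv2.Canny(gray_image, low, 2 * low)
    # Label eight-connected edge pixel groups (the sets e_i)
    n_labels, _, stats, _ = cv2.connectedComponentsWithStats(edges, connectivity=8)
    h, w = gray_image.shape
    regions = []
    for i in range(1, n_labels):                      # label 0 is the background
        x, y, bw, bh, _ = stats[i]
        if bw * bh < min_area or bw * bh > max_area_ratio * w * h:
            continue                                  # size filtration
        if max(bw, bh) / max(1, min(bw, bh)) > max_aspect:
            continue                                  # imbalanced width/height ratio
        regions.append((x, y, x + bw, y + bh))        # R_i = (x_min, y_min, x_max, y_max)
    return regions
```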
Figure 4. (a) Original image; (b) Image after applying the Canny edge detector; (c) Image after edge group filtering; (d) Initial edge pixel groups
As shown in Figure 4, the exposure of light in the original image increases the color similarity of the text with its background. Hence, after applying the Canny edge detector, the edges inside the region with high color similarity are missing once filtering is done. In order to solve this problem, ATLAS utilizes the remaining edge information in the image and forms the edge pixel groups to predict the position of the missing region. Since broken edges often exist in the result of the Canny edge detector, they create a lot of small regions that increase the complexity.
Hence, any regions that overlap with each other are merged to form a new region by re-adjusting the minimum and maximum coordinates. Figure 5(b) shows the result after the first merging. After the merging steps, the leftover regions are each assumed to contain either a single character or a word. Therefore, the regions which are close to each other are assumed to be a group with the same feature (either text or noise). For each region, every nearby region is located by searching the area around it within a distance of $J$ in the horizontal direction and $K$ in the vertical direction. Two regions are then merged if they are both inside each other's searching zone. Figure 5(c) shows the result of the second merging. $J$ and $K$ denote the average width of a character and the average height of a character respectively, and they can be obtained by:
$J = \frac{1}{N} \sum_{i=0}^{N} \left( x_{max_i} - x_{min_i} \right)$  (4)

$K = \frac{1}{N} \sum_{i=0}^{N} \left( y_{max_i} - y_{min_i} \right)$  (5)
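As a small illustration, with regions stored as (x_min, y_min, x_max, y_max) tuples as in the sketch above, Equations (4) and (5) reduce to simple averages over the region list (the function name is illustrative):

```python
def average_char_size(regions):
    """Equations (4)-(5): average character width J and height K over all regions."""
    n = len(regions)
    j = sum(x1 - x0 for (x0, _, x1, _) in regions) / n   # Eq. (4)
    k = sum(y1 - y0 for (_, y0, _, y1) in regions) / n   # Eq. (5)
    return j, k
```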
Figure 5. (a) Small regions from the Canny edge detector; (b) Result of merging the overlapped regions; (c) Result of merging the regions within searching range
In the failure cases under high color similarity, the mis-localized characters are mostly in the middle of the text, as shown in Figure 4(d). Hence, any two regions which have the same alignment and the same direction are assumed to belong to the same region but contain missing text in the middle. Degree deviation is proposed and calculated for such regions before merging them.
Degree deviation is used to estimate the alignment and direction of any two regions. Consider that if two regions are parallel to each other, they most likely belong to the same region. However, in a real environment the text region in the image does not always align with the image horizontal; it may be offset by a certain angle from the horizontal line. Hence, by looking for regions with similar angle deviation to each other, the possible missing text can be found. For instance, assume the two regions are $e_1$ and $e_2$. The degree of deviation between the regions is calculated and depicted with $\tan\theta_{min}$ and $\tan\theta_{max}$, where $\tan\theta_{min}$ refers to the degree of deviation for the minimum $x$ and $y$ coordinates while $\tan\theta_{max}$ refers to the degree of deviation for the maximum $x$ and $y$ coordinates. Figure 6 depicts an instance, where $\tan\theta_{min}$ and $\tan\theta_{max}$ can be obtained by:
$\tan\theta_{min} = \frac{y_{min_1} - y_{min_2}}{x_{min_1} - x_{min_2}}$  (6)

$\tan\theta_{max} = \frac{y_{max_1} - y_{max_2}}{x_{max_1} - x_{max_2}}$  (7)
Figure 6. (a) Degree deviation between two regions, where $\theta_{min}$ and $\theta_{max}$ are the degrees between the yellow and red lines; (b) Result after merging regions with similar degree deviation
Furthermore, the degree deviation $dev$ between $\theta_{min}$ and $\theta_{max}$ can be calculated using the equation:

$dev = \left| \theta_{max} - \theta_{min} \right|$  (8)
If $dev$ is small, i.e. $dev \le T$, both regions are merged into one by re-adjusting the maximum and minimum $x, y$ coordinates. $T$ is the maximum limit of deviation allowed. In the proposed system, a loose strategy of $T = 9$ is taken, which is equivalent to 10% of the maximum possible deviation of $90^{\circ}$. After the merging process, some regions might contain unwanted features (noise); hence region filtration is done to filter out the regions which are more likely to contain noise. After filtration, the result, a set of new edge pixel groups $E'$, is produced.
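The alignment test of Equations (6)-(8) and the merge decision can be sketched as follows; the handling of vertically aligned corners (zero horizontal difference) is an assumption made for illustration:

```python
import math

def degree_deviation(r1, r2):
    """Equations (6)-(8): deviation between the angles formed by the regions'
    minimum corners and maximum corners. Regions are (x_min, y_min, x_max, y_max)."""
    def corner_angle(dy, dx):
        return 90.0 if dx == 0 else math.degrees(math.atan(dy / dx))
    theta_min = corner_angle(r1[1] - r2[1], r1[0] - r2[0])   # Eq. (6)
    theta_max = corner_angle(r1[3] - r2[3], r1[2] - r2[2])   # Eq. (7)
    return abs(theta_max - theta_min)                        # Eq. (8)

def merge_if_aligned(r1, r2, t=9.0):
    """Merge two regions when their degree deviation is within the loose limit T = 9."""
    if degree_deviation(r1, r2) <= t:
        return (min(r1[0], r2[0]), min(r1[1], r2[1]),
                max(r1[2], r2[2]), max(r1[3], r2[3]))
    return None
```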
Assume $E' = \{e'_i \mid i = 1, 2, \ldots, M\}$, where $e'_i$ refers to a missing text region (a region that was added during the previous merging process) and $M$ is the total number of such regions. Each $e'_i$ is assumed to contain a mixture of text and noise. Hence, the TBSI calculation is performed for each $e'_i$, giving the similarity index $TBSI_i$ of that particular region. The value $TBSI_i$ is used as a weight for the threshold of the edge detector. Canny edge detection is re-applied on each region $e'_i$ of the original image $I$ by using thresholds calculated by:
$T'_{low} = T_{low}\left(1 - \lambda \, TBSI_i\right)$  (9)

$T'_{high} = 2\,T'_{low}$  (10)
$T_{low}$ refers to the original lower threshold used at the first stage and $\lambda$ refers to a markup factor for the threshold that limits how far it may move from the original threshold. In this research, the threshold markup is 50%, or $\lambda = 0.5$. Other regions which do not fall in $E'$ are not processed and retain the original edge result. Figure 7 gives an illustration of such an instance. The final edge image reveals the localized text in the region, where the filtrations of Equations (4)-(8) are re-implemented and the edge pixels are clustered.
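A sketch of this per-region re-application step is given below. It assumes the threshold update of Equations (9)-(10) as reconstructed above (a lower Canny threshold for regions with higher similarity), so it should be read as illustrative rather than as the authors' exact implementation:

```python
import cv2

def reapply_canny(gray_image, region, t_low, tbsi_i, markup=0.5):
    """Re-apply Canny on a missing-text region e'_i with a TBSI-weighted threshold."""
    x0, y0, x1, y1 = region
    t_low_new = t_low * (1.0 - markup * tbsi_i)   # Eq. (9), as reconstructed
    t_high_new = 2.0 * t_low_new                  # Eq. (10)
    roi = gray_image[y0:y1, x0:x1]
    return cv2.Canny(roi, int(t_low_new), int(t_high_new))
```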
Figure 7. (a) Second Canny edge detector application on area $e'_1$ using $TBSI_1$ as the threshold; (b) Final localized result
Table 2. Summary of ATLAS
Algorithm 2. Adaptive Text Localizing Algorithm
1. Apply the Canny Edge Detector to obtain an initial edge image by using the overall image TBSI.
2. Group the edges which are continuously connected to their eight-connected neighbors.
3. For each edge group:
   3.1 Search the nearby range for other edge groups.
   3.2 Merge both groups if a similar group property is found.
   3.3 A new edge group is formed.
4. For each new edge group:
   4.1 Calculate the TBSI value for the extra region of the edge group.
   4.2 Reapply the Canny Edge Detector to the extra region with the calculated TBSI as the threshold.
5. Regions with edge pixels are marked as text regions.
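Tying the sketches above together, an end-to-end pass over one image could look roughly like the following driver; it reuses the illustrative functions tbsi, initial_edge_groups, merge_if_aligned and reapply_canny defined earlier and is only a rough sketch of Algorithm 2, not the authors' code:

```python
import cv2

def atlas_sketch(path, t_low=50):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    overall = tbsi(gray, [(0, 0, gray.shape[1], gray.shape[0])])   # step 1: overall TBSI
    if overall is None:
        overall = 0.5                                              # fallback for flat images
    regions = initial_edge_groups(gray, overall)                   # steps 1-2: edge groups
    merged = []
    for r in regions:                                              # step 3: merge nearby groups
        for i, m in enumerate(merged):
            joined = merge_if_aligned(m, r)
            if joined is not None:
                merged[i] = joined
                break
        else:
            merged.append(r)
    results = []
    for r in merged:                                               # steps 4-5: per-region Canny
        t_i = tbsi(gray, [r])
        if t_i is not None:
            results.append((r, reapply_canny(gray, r, t_low, t_i)))
    return results
```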
4. Experiment and Discussion
This section details the experimental process carried out to evaluate the efficiency of ATLAS in terms of its strength and accuracy. To show the robustness of the result, ATLAS was tested with two different image datasets.
The first dataset was used to evaluate the ideal localizing strength of the algorithms; it consists of images with different text-background color similarity. The second dataset was used to evaluate the precision as well as the actual localizing strength, which indicates the usability of ATLAS, using a common image dataset.
The first dataset (the self-generated dataset) of the experiment process requires images with different ranges of TBSI value. In order to achieve a comprehensive result, the full design of the dataset is self-generated. The text to be localized in the generated images is positioned at the center of the image, which is considered to be the easiest position for localization. The localizing algorithm is limited to grayscale images, hence a total of $65{,}280 = 256 \times 255$ grayscale images were generated, which comprises all possible combinations between text pixel and background pixel values, except for the combinations where both text pixels and background pixels are the same (invalid by the TBSI definition). Figure 8 shows some examples of the self-generated image dataset and their corresponding TBSI.
Figure 8. Images with various TBSI; top left: $TBSI = 0.2157$, top right: $TBSI = 0.4118$, bottom left: $TBSI = 0.6078$, bottom right: $TBSI = 0.8039$
The second dataset evaluates the actual localizing strength, which is obtained by calculating the average localizing strength on each image. Hence, the experimental process employed the public image dataset from ICDAR 2011 [32], which comprises commonly seen text images. Moreover, this dataset is commonly used for text localization and text recognition analysis in this research field. Figure 9 shows samples of text images from the ICDAR 2011 dataset. The related information of the datasets for the first experiment and the second experiment is summarized in Table 3.
Figure 9. Samples of ICDAR 2011 Images
Table 3. Dataset Description
Dataset        Self-Generated Dataset   ICDAR 2011 Dataset
Total Images   65,280                   420
TBSI Range     0.00 – 0.99              0.140 – 0.844
Resolution     1360*1024                101*130, 109*140, 110*110, 110*152, …, 980*152, 990*145, 992*592, 1625*313
The first experiment, on the self-generated dataset, evaluates the ideal localizing strength of ATLAS. With reference to the other algorithms, it can also be used as a reference to calculate the average similarity of the public dataset. Ideal localizing strength reviews the capability of an algorithm to localize text regardless of any difference in color similarity.
The strength of an algorithm can be calculated from the surface area covered by the function: precision versus similarity index, $p$. Precision $P$ is taken as $P = N_c / N_d$, where $N_c$ is the total number of correct text regions and $N_d$ is the total number of detected text regions. A text region is considered localized correctly only if the outcome of the localized text region covers at least 80% of the text region from the ground truth region. Let the similarity index be $TBSI$, where the values of $TBSI$ range from zero to one; the strength of the algorithm can then be expressed by the following equation:

$Strength = \int_{0}^{1} p \, d(TBSI)$  (11)
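In practice the integral in Equation (11) has to be approximated from the per-subgroup precision values; a minimal numerical sketch (the function name is illustrative):

```python
import numpy as np

def localizing_strength(tbsi_values, precisions):
    """Equation (11): area under the precision-versus-TBSI curve, trapezoidal rule.
    tbsi_values and precisions are matched 1-D sequences over [0, 1]."""
    x = np.asarray(tbsi_values, dtype=float)
    y = np.asarray(precisions, dtype=float)
    order = np.argsort(x)
    return float(np.trapz(y[order], x[order]))
```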
The self-generated dataset uses images with a clear and unique color for the text pixels and the background pixels; this eliminates all other uncertainties that can affect the localizing result except for the color similarity on which this paper focuses. All the generated images have the same texts and positions but different colors for the text pixels and background pixels. To show the feasibility of TBSI, all probable differences in color between text pixels and background pixels are generated, which consist of 65,280 images. These images are further divided into groups categorized by their TBSI value, which consist of 255 subgroups. In this experiment, ATLAS is compared to three other algorithms: Liu and Wang's Stroke-Like Edge based algorithm [23], Yi and Tian's Boundary Clustering based algorithm [24] and Lee and Kim's Two-Stage Random Field algorithm [25], in terms of ideal localizing strength. All four algorithms were implemented on the self-generated dataset on the same computer with an Intel Core i7 2.00 GHz processor and 16 GB of memory. The experimental results are summarized in Table 4 and Figure 10.
Figure 10. Ideal Localizing Strength Comparison between [23], [24], [25] and ATLAS
Table 4. Experiment Result from the Self-Generated Dataset
Algorithm            Ideal Localizing Strength
ATLAS                0.760
SH. Lee [25]         0.696
Yi and Tian [24]     0.560
Liu and Wang [23]    0.545
Text localization is relatively easier for the machine when the color difference between text pixels and background pixels is big (or the TBSI is small). The precision versus similarity index function $p$ shows an ideal chart with a sharp decrease at a certain level of TBSI (see Figure 11). This simplifies the calculation of localizing strength, where it can be obtained by directly taking the value at the sharp decrease point. Localizing strength is normalized to range between zero and one, and is used for judging the ability of an algorithm to localize images with different TBSI (more precisely, images with high TBSI). Observing Figure 11 shows that algorithm [23]