IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 14, No. 3, June 2025, pp. 1743~1751
ISSN: 2252-8938, DOI: 10.11591/ijai.v14.i3.pp1743-1751

Journal homepage: http://ijai.iaescore.com

Artificial intelligence multilingual image-to-speech for accessibility and text recognition

Rosalina¹, Hasanul Fahmi², Genta Sahuri³
¹Informatics Study Program, Faculty of Computer Science, President University, Bekasi, Indonesia
²School of Information Technology, UNITAR International University, Petaling Jaya, Malaysia
³Information Systems Study Program, Faculty of Computer Science, President University, Bekasi, Indonesia

Article Info

Article history:
Received Sep 15, 2024
Revised Nov 18, 2024
Accepted Nov 24, 2024

ABSTRACT

The primary challenge for visually impaired and illiterate individuals is accessing and understanding visual content, which hinders their ability to navigate environments and engage with text-based information. This research addresses this problem by implementing an artificial intelligence (AI)-powered multilingual image-to-speech technology that converts text from images into audio descriptions. The system combines optical character recognition (OCR) and text-to-speech (TTS) synthesis, using natural language processing (NLP) and digital signal processing (DSP) to generate spoken outputs in various languages. Tested for accuracy, the system demonstrated high precision, recall, and an average accuracy rate of 0.976, proving its effectiveness in real-world applications. This technology enhances accessibility, significantly improving the quality of life for visually impaired individuals and offering scalable solutions for illiterate populations. The results also provide insights for refining OCR accuracy and expanding multilingual support.

Keywords:
Image-to-speech
Multilingual audio descriptions
Natural language processing
Optical character recognition
Text-to-speech

This is an open access article under the CC BY-SA license.

Corresponding Author:
Hasanul Fahmi
School of Information Technology, UNITAR International University
Kelana Jaya, 47301-Petaling Jaya, Selangor, Malaysia
Email: fahmi.zuhri@unitar.my

1. INTRODUCTION
In an increasingly digital world, access to visual content is essential for daily communication, education, and navigation [1], [2]. However, visually impaired and illiterate individuals face significant challenges in interpreting such content, limiting their ability to fully engage with their surroundings and access critical information [3]−[5]. Assistive technologies have made progress in enhancing accessibility, but gaps still exist in providing accurate and real-time solutions that effectively convert visual information into a format accessible to these individuals. Recent advancements in artificial intelligence (AI) [6]−[10] offer promising solutions by combining optical character recognition (OCR), text-to-speech (TTS), and natural language processing (NLP) to transform images into spoken descriptions. This research focuses on implementing and evaluating a multilingual image-to-speech system that applies AI to address these accessibility challenges.

A key problem for the visually impaired is the inability to access printed or digital text in images [11]−[13], which is exacerbated when navigating diverse environments or consuming visual content in multiple languages [14], [15]. While existing OCR and TTS technologies provide basic text-to-audio conversion [16], [17], their accuracy often declines in real-world scenarios where text may be distorted, partially visible, or displayed in multiple languages [18]. Furthermore, many current solutions are monolingual, limiting their utility for users in multilingual environments [19]−[21]. Illiterate individuals also face similar barriers in accessing text-based information [22]−[27]. Thus, there is a clear need for a system that can accurately recognize and describe text from images across different languages in real time.

Several studies have explored OCR and TTS technologies for accessibility, but few have integrated these technologies into a robust, AI-powered solution capable of providing accurate multilingual descriptions. Previous work by [28] demonstrated that advanced OCR models could improve text recognition rates, especially in noisy or complex image environments. However, combining these technologies with AI-driven NLP and digital signal processing (DSP) techniques remains underexplored. AI advancements now enable more sophisticated and context-aware systems that can enhance both the accuracy and the scope of accessibility solutions for the visually impaired.

The proposed solution integrates state-of-the-art OCR and TTS systems with AI-powered NLP and DSP techniques to create a multilingual image-to-speech technology. By leveraging these technologies, the system is designed to accurately recognize text in multiple languages, even under challenging image conditions, and provide real-time audio descriptions for users. The system not only addresses the challenge of visual text accessibility for visually impaired users but also broadens its scope to support illiterate individuals, enabling them to “hear” text content that they cannot read. This multilingual capability distinguishes it from other existing solutions.

The innovative value of this research lies in its approach to combining multiple AI technologies into a cohesive, user-friendly system that enhances real-time accessibility. The system achieves high precision and recall, as demonstrated by an average accuracy rate of 0.976 in tests, making it a reliable tool for real-world applications. Additionally, its multilingual functionality extends the technology’s usefulness to diverse populations. The findings contribute to the body of knowledge in assistive technologies by providing a framework for further improvements in OCR accuracy and TTS integration, offering a scalable solution to accessibility challenges faced by visually impaired and illiterate individuals.

2. METHOD
This research aims to develop, implement, and evaluate a multilingual image-to-speech system specifically designed to support visually impaired and illiterate individuals in accessing written information. The system integrates OCR technology to accurately extract text from images, ensuring high-quality text recognition across multiple languages. Once the text is extracted, a TTS synthesis module converts it into natural-sounding speech, allowing users to listen to the content in their preferred language. By combining OCR and TTS, the system enhances accessibility, enabling individuals with visual or reading impairments to interact with textual information more independently. The overall architecture, which outlines the key components and their interactions, is depicted in Figure 1, providing a comprehensive overview of the system’s functionality.

Figure 1. Multilingual image-to-speech system architecture

2.1. Data collection
To develop an AI-powered multilingual image-to-speech technology aimed at enhancing accessibility for visually impaired users, a comprehensive data collection process is essential. The first step involves gathering a diverse image dataset containing text in various languages, including signs, printed documents, and labels. The goal is to ensure that the dataset reflects different fonts, sizes, orientations, and backgrounds to enhance the robustness of the model. For this purpose, we can utilize the international conference on document analysis and recognition (ICDAR) datasets, which provide benchmark images of printed text in multiple languages and formats. These datasets are widely recognized in the field for their comprehensive range of text types and conditions. Additionally, crowdsourcing can be employed to allow users to upload images containing text, further enriching the dataset with real-world examples. This approach not only broadens the dataset but also captures varied conditions under which text appears, enhancing the model’s applicability.

The second step involves collecting a high-quality audio dataset that consists of recordings corresponding to the text extracted from images, ensuring accurate pronunciation and intonation for each language. For this purpose, Mozilla’s Common Voice project serves as an excellent source. It is an open-source initiative that collects diverse voice samples in multiple languages, allowing contributors to record themselves reading sentences. This dataset provides the variability and richness needed to create effective TTS capabilities. In addition, contracting native speakers or voice actors can enhance the dataset’s quality by ensuring accurate pronunciation and emotional expression.

The annotation process is crucial for preparing the image dataset for training. It involves manually annotating images with bounding boxes around text areas and providing corresponding text transcriptions in multiple languages. Annotation tools such as LabelImg or RectLabel are used to create bounding boxes around the text, ensuring that various orientations and layouts are captured.

2.2. Optical character recognition
The system employs the Tesseract OCR engine with Python’s pytesseract library for extracting text from images. The workflow begins with image acquisition from various sources, including scans and photos. Pre-processing steps such as resizing, noise reduction, and contrast enhancement are used to improve text visibility. Text regions are identified using techniques like edge detection and machine learning, and the detected text is converted into machine-readable format. Post-processing further refines this text, correcting errors and improving formatting.

In the pre-processing phase, we standardize image dimensions to ensure consistent processing across various images. This involves resizing all images to a fixed width and height, maintaining aspect ratio when necessary. The resizing formula as in (1) adjusts the image size while preserving proportions. This standardization simplifies subsequent image analysis by ensuring uniform input dimensions, which helps in maintaining consistency in feature extraction and processing, ultimately improving the accuracy and efficiency of the OCR system.
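
$$\text{new width} = \text{target width}, \qquad \text{new height} = \text{target width} \times \left(\frac{\text{original height}}{\text{original width}}\right) \qquad (1)$$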
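
As a minimal sketch of aspect-preserving resizing, the helper below computes output dimensions for a fixed target width. The function name `fit_to_width` and the choice to fix the width rather than the height are assumptions made for illustration, not details given in the paper.

```python
def fit_to_width(orig_w: int, orig_h: int, target_w: int) -> tuple[int, int]:
    """Return (new_w, new_h) for a fixed target width, keeping the aspect ratio."""
    if orig_w <= 0 or orig_h <= 0 or target_w <= 0:
        raise ValueError("all dimensions must be positive")
    # Scale the height by the same factor applied to the width.
    new_h = max(1, round(target_w * orig_h / orig_w))
    return target_w, new_h


if __name__ == "__main__":
    # A 1000x500 image scaled to width 800 keeps its 2:1 proportions.
    print(fit_to_width(1000, 500, 800))
```

The resulting pair can be passed to any image library's resize call before the later pre-processing steps.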

In addition, we applied filters to remove unwanted artifacts and noise from the image. This involves using Gaussian blur, which smooths the image by averaging the intensities of surrounding pixels. With the Gaussian function given in (2), this filter reduces noise and enhances image quality by minimizing high-frequency variations, ensuring that the OCR system processes cleaner and more accurate data.

$$I'(x, y) = \frac{1}{2\pi\sigma^2} \iint_{-\infty}^{\infty} I(x', y') \exp\left(-\frac{(x - x')^2 + (y - y')^2}{2\sigma^2}\right) dx'\, dy' \qquad (2)$$
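
In practice the continuous Gaussian smoothing integral is replaced by a small discrete kernel. The sketch below builds such a kernel in plain Python; the function name, kernel size, and sigma value are illustrative assumptions, since the paper does not state the parameters it used.

```python
import math


def gaussian_kernel(size: int, sigma: float) -> list[list[float]]:
    """Build a normalized size x size Gaussian kernel (size must be odd)."""
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd integer")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    half = size // 2
    # Unnormalized weights exp(-(dx^2 + dy^2) / (2 sigma^2)) around the centre.
    kernel = [
        [math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
         for dx in range(-half, half + 1)]
        for dy in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in kernel)
    # Normalize so that the weights sum to 1 and overall brightness is preserved.
    return [[w / total for w in row] for row in kernel]


if __name__ == "__main__":
    for row in gaussian_kernel(3, 1.0):
        print([round(w, 4) for w in row])
```

Convolving the image with this kernel gives each output pixel a weighted average of its neighbours, with weights that fall off with distance.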

Contrast enhancement was applied to improve text visibility through contrast stretching. This technique adjusts pixel intensities using the formula as in (3). In this formula, $I$ represents the original pixel intensity, $I_{\min}$ and $I_{\max}$ are the minimum and maximum intensities in the image, and $L$ is the number of intensity levels. This adjustment enhances contrast, making text and details more distinct.

$$I' = \frac{I - I_{\min}}{I_{\max} - I_{\min}} (L - 1) \qquad (3)$$
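
A minimal sketch of contrast stretching on a grayscale image stored as a list of rows follows. The function name `stretch_contrast` and the default of 256 intensity levels are assumptions for illustration.

```python
def stretch_contrast(image: list[list[int]], levels: int = 256) -> list[list[int]]:
    """Linearly map pixel intensities from [min, max] onto [0, levels - 1]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A flat image has no contrast to stretch; return an unchanged copy.
        return [row[:] for row in image]
    scale = (levels - 1) / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in image]


if __name__ == "__main__":
    # Intensities 50..200 are spread over the full 0..255 range.
    print(stretch_contrast([[50, 100], [150, 200]]))
```

The darkest pixel maps to 0 and the brightest to `levels - 1`, which widens the gap between text strokes and background.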

We then identified regions in the image that likely contain text using edge detection. This technique calculates edge strength to locate text boundaries. The edge strength is determined by the formula as in (4). Here, $G_x$ and $G_y$ represent the gradients of pixel intensity in the horizontal and vertical directions, respectively. This calculation highlights areas with high changes in intensity, indicating potential text regions.

$$\text{Edge strength} = \sqrt{(G_x)^2 + (G_y)^2} \qquad (4)$$
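
The sketch below computes the gradient magnitude at a single interior pixel. It assumes 3x3 Sobel operators for the horizontal and vertical gradients; the paper does not name the gradient operator, so this choice and the function name are illustrative only.

```python
import math

# Assumed Sobel kernels for the horizontal (x) and vertical (y) gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def edge_strength(image: list[list[int]], row: int, col: int) -> float:
    """Gradient magnitude sqrt(gx^2 + gy^2) at an interior pixel."""
    gx = gy = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            pixel = image[row + dr][col + dc]
            gx += SOBEL_X[dr + 1][dc + 1] * pixel
            gy += SOBEL_Y[dr + 1][dc + 1] * pixel
    return math.hypot(gx, gy)


if __name__ == "__main__":
    # A vertical dark-to-bright step produces a strong horizontal gradient.
    step = [[0, 0, 255], [0, 0, 255], [0, 0, 255]]
    print(edge_strength(step, 1, 1))
```

Pixels whose edge strength exceeds a chosen threshold can then be grouped into candidate text regions.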

After identifying potential text regions, we performed text recognition to convert these regions into machine-readable text using the OCR engine. The Tesseract OCR engine employs pattern recognition algorithms to analyze the detected text areas, matching them to known character patterns. This process involves comparing image segments with a database of character templates to accurately identify and transcribe the text.
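
The recognition step can be sketched with pytesseract as below. This is a hedged outline, not the authors' code: the helper names are hypothetical, the language codes shown are examples, and the sketch assumes that the Tesseract binary, the matching language data, Pillow, and pytesseract are installed.

```python
def tesseract_lang_arg(codes) -> str:
    """Join Tesseract language codes, e.g. ["eng", "ind"] -> "eng+ind"."""
    cleaned = [c.strip() for c in codes if c.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def image_to_text(path: str, codes=("eng",)) -> str:
    """Run Tesseract on an image file and return the recognized text."""
    # Imported lazily so the language helper works without OCR dependencies.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        text = pytesseract.image_to_string(img, lang=tesseract_lang_arg(codes))
    return text.strip()


if __name__ == "__main__":
    # Tesseract accepts several languages at once, joined with "+".
    print(tesseract_lang_arg(["eng", "ind"]))
```

Calling `image_to_text("sign.png", ["eng", "ind"])` on a hypothetical file would return the recognized string that is later passed to speech synthesis.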

2.3. Speech synthesis
The recognized text is converted into speech using the Python Google text-to-speech (gTTS) library. This process begins with NLP, which includes tokenization to break down the text into smaller units, such as words or sentences, and language detection to apply accurate pronunciation rules and phonetic adjustments. In the subsequent phase, DSP techniques are employed. The first step is word-to-phoneme conversion, where text is translated into its phonetic representation. For example, “hello” is converted to phonemes /h/, /ə/, /l/, /oʊ/. This is expressed as in (5).

$$\text{Phonemes} = \text{Word} \rightarrow \text{Phonetic representation} \qquad (5)$$

Phoneme synthesis converts phonetic representations into speech sounds, where sound waves are generated to represent each phoneme. We implemented WaveNet models to enhance this process. WaveNet uses deep neural networks to produce more natural and human-like sound waves by accurately capturing the complexities of human speech. This is expressed as in (6).

$$\text{Speech sound} = \text{Phonemes} \rightarrow \text{Sound waves} \qquad (6)$$

Finally, waveform generation produces a high-quality audio waveform from the synthesized phonemes. This step involves converting the synthesized phoneme sounds into an audio signal that can be played back. The waveform is generated using advanced techniques to ensure clarity and naturalness in the speech output. The waveform is represented as in (7).

$$\text{Waveform} = \text{Synthesized phonemes} \rightarrow \text{Audio signal} \qquad (7)$$
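
The text-to-audio step can be outlined with gTTS as below. This is a sketch under stated assumptions: the sentence splitter is a simple regular expression standing in for the tokenization stage, the helper names and the output path are hypothetical, and gTTS needs network access to produce audio.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence tokenizer: split after '.', '!' or '?' followed by spaces."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def synthesize(text: str, lang_code: str, out_path: str) -> str:
    """Save speech for `text` as an MP3 file using gTTS and return the path."""
    # Imported lazily so the tokenizer can be used without gTTS installed.
    from gtts import gTTS

    gTTS(text=text, lang=lang_code).save(out_path)
    return out_path


if __name__ == "__main__":
    print(split_sentences("Exit on the left. Mind the step!"))
```

A call such as `synthesize("Exit on the left.", "en", "exit.mp3")` would write a playable audio file; the language code would come from the language detection step.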

2.4. Evaluation parameters for text-to-speech conversion metrics
In developing an effective TTS system, it is essential to establish clear evaluation parameters that measure the quality and intelligibility of synthesized speech. These parameters help determine how well the TTS system can produce audio that is both clear and natural-sounding, particularly for applications aimed at assisting visually impaired users. The following metrics provide a comprehensive framework for assessing the performance of TTS systems:
‒ Phoneme synthesis quality: this parameter measures the accuracy and precision with which the TTS system generates phonemes, the distinct units of sound in speech. A higher value (on a scale of 0 to 1) indicates better performance, suggesting that the system effectively captures the nuances of different languages and dialects. It is typically assessed through subjective listening tests and objective evaluations, such as comparing synthesized phonemes against a reference set.
‒ Waveform clarity: this qualitative parameter evaluates the overall audio quality of the synthesized speech. A “high” rating signifies that the audio output has minimal distortion, noise, and artifacts, leading to clear and intelligible speech. Waveform clarity is essential for ensuring that listeners can easily understand the generated audio, which is particularly important for applications aimed at aiding visually impaired users.
‒ Speech naturalness: this parameter assesses the degree to which synthesized speech resembles natural human speech. It evaluates factors such as intonation, rhythm, and expressiveness. An “improved” rating indicates that the TTS system has enhanced its ability to produce engaging and relatable audio output. This is often determined through user feedback and expert evaluations, as well as comparison with natural speech samples.

2.5. Evaluation metrics for optical character recognition performance and text recognition accuracy
In assessing the effectiveness of OCR systems and text recognition across different languages, several key evaluation metrics are employed. These metrics offer insights into the accuracy and reliability of the systems, ensuring that they meet the needs of various applications.
‒ Precision is a critical metric that quantifies the accuracy of the OCR system in identifying relevant text. It is defined as the ratio of true positive predictions (correctly identified text) to the total predicted positives (both correct and incorrect identifications). A high precision score indicates that when the system predicts text, it is likely to be accurate, minimizing false positives.
‒ Recall, also known as sensitivity or true positive rate, measures the OCR system’s ability to identify all relevant instances of text in a given dataset. It is calculated as the ratio of true positive predictions to the total actual positives (all instances of text present in the images). A high recall score indicates that the system successfully captures most of the actual text, thereby reducing the likelihood of missing characters or words.
‒ F1 score serves as a balanced measure that combines both precision and recall into a single metric. It is calculated as the harmonic mean of precision and recall, providing a comprehensive view of the system’s overall performance. A high F1 score signifies that the OCR system performs well in both accurately identifying relevant text and capturing as much text as possible from the images.
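
The three metrics above can be computed from true positive, false positive, and false negative counts as sketched below. The counts used in the example are hypothetical and chosen only to illustrate the arithmetic; they are not taken from the experiments.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_from(precision, recall)


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: 98 correct detections, 2 spurious, 3 missed.
    p, r, f1 = precision_recall_f1(tp=98, fp=2, fn=3)
    print(round(p, 3), round(r, 3), round(f1, 3))
```

Because the harmonic mean is pulled toward the smaller of the two inputs, a system cannot reach a high F1 score by excelling at only one of precision or recall.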

3. RESULTS AND DISCUSSION
The implementation of the multilingual image-to-speech system, which integrates OCR and TTS synthesis, has produced promising results. The OCR component of the system exhibited high performance in text extraction, achieving an average precision rate of 0.976. This high precision indicates the system’s robustness in accurately identifying and extracting text from various types of images, including scanned documents, digital photos, screenshots, and handwritten notes. The precision metric reflects the proportion of correctly identified text regions out of the total identified text regions. A precision rate of 0.976 signifies that the OCR system is highly effective at minimizing false positives, where non-text areas are incorrectly recognized as text. This high accuracy is crucial for applications where the correct interpretation of text is essential, such as converting printed or handwritten documents into machine-readable formats. The successful implementation of this OCR component underscores the effectiveness of the chosen algorithms and techniques in processing diverse image inputs. By accurately extracting text, the system lays a solid foundation for the subsequent TTS synthesis phase, which relies on the quality of the extracted text to generate clear and coherent spoken output. This integrated approach enhances the accessibility and usability of the image-to-speech system for visually impaired and illiterate individuals across multiple languages.

Table 1 provides a detailed evaluation of OCR accuracy across different image types, showcasing the system’s performance in various scenarios. The table includes metrics such as precision, recall, and F1 score, which are critical for assessing the effectiveness of text extraction. For scanned documents, the OCR system achieves the highest accuracy with a precision of 0.98, recall of 0.97, and an F1 score of 0.975. This indicates that the system reliably extracts text from scanned documents with minimal errors and high completeness. Digital photos follow with a precision of 0.95 and a recall of 0.94, resulting in an F1 score of 0.945. This reflects strong performance, though slightly less accurate than scanned documents due to potential image quality variations. Screenshots show a precision of 0.96 and recall of 0.95, with an F1 score of 0.955. The system performs well in extracting text from screenshots, demonstrating its versatility across different image types. Meanwhile, handwritten notes exhibit the lowest accuracy, with a precision of 0.90, recall of 0.88, and an F1 score of 0.890. While still effective, the system faces more challenges with handwritten text due to its inherent variability and complexity.

Table 1. OCR accuracy across different image types

Image type          Precision   Recall   F1 score
Scanned documents   0.98        0.97     0.975
Digital photos      0.95        0.94     0.945
Screenshots         0.96        0.95     0.955
Handwritten notes   0.90        0.88     0.890

Table 2 provides a comparative analysis of text recognition accuracy across various languages using the OCR system. The metrics presented include precision, recall, and F1 score, which reflect the system’s performance in recognizing text from different linguistic contexts. For English, the OCR system demonstrates exceptional accuracy with a precision of 0.97 and a recall of 0.96, resulting in a high F1 score of 0.965. This indicates that the system reliably identifies and extracts English text with minimal errors. Spanish shows strong performance with a precision of 0.95 and a recall of 0.94, leading to an F1 score of 0.945. This reflects the system’s capability to handle Spanish text effectively. Mandarin has lower accuracy compared to European languages, with a precision of 0.92, a recall of 0.91, and an F1 score of 0.915. This is attributed to the complexity of Mandarin characters and script. French and Italian also show high performance, with F1 scores of 0.935 for both languages, indicating robust recognition capabilities. Indonesian has a precision of 0.93 and a recall of 0.92, resulting in an F1 score of 0.925, demonstrating effective text recognition. German achieves a precision of 0.96 and a recall of 0.95, with an F1 score of 0.955, highlighting its high accuracy in text recognition. Japanese and Korean have lower scores, with F1 scores of 0.895 and 0.905, respectively, reflecting challenges in recognizing these scripts. Arabic shows the lowest accuracy with a precision of 0.88, a recall of 0.87, and an F1 score of 0.875, due to the intricacies of the Arabic script.

Table 3 presents the performance metrics of the TTS conversion system used in this research. The phoneme synthesis quality is measured at 0.95, indicating a high level of accuracy in generating phoneme sounds from text, which is crucial for producing intelligible speech. Waveform clarity is rated as high, reflecting the effectiveness of the waveform generation process in creating clear and distortion-free audio signals. Lastly, speech naturalness is noted as improved, highlighting that the synthesized speech sounds more natural and human-like, thanks to the implementation of advanced techniques such as WaveNet models. Together, these metrics demonstrate that the TTS system delivers high-quality and realistic speech output, enhancing overall user experience and accessibility.

Meanwhile, Table 4 presents execution times for components of an AI-powered multilingual image-to-speech system, measured in milliseconds (ms). Image preprocessing takes 50 ms, efficiently preparing images for OCR. The OCR phase requires 120 ms to extract text, reflecting the complexity of recognizing various text formats. Following this, text cleaning occurs in 30 ms, where the extracted text is organized for speech conversion. The TTS conversion is the most time-intensive component, taking 200 ms to synthesize natural-sounding speech from the cleaned text, highlighting the computational demands of multilingual synthesis. Lastly, audio playback requires just 15 ms, demonstrating the system’s efficiency in delivering audio outputs. The total execution time of 415 ms indicates the cumulative duration from image input to audio output, suggesting the system’s responsiveness for real-time applications aimed at assisting visually impaired users.
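
As a quick check of the arithmetic above, the sketch below sums the per-stage times from Table 4 and reports each stage's share of the end-to-end latency. The variable names are illustrative; only the millisecond values come from the paper.

```python
# Per-stage execution times in milliseconds, copied from Table 4.
STAGE_TIMES_MS = {
    "Image preprocessing": 50,
    "OCR": 120,
    "Text cleaning": 30,
    "TTS conversion": 200,
    "Audio playback": 15,
}

# End-to-end latency is the sum of the sequential stages.
total_ms = sum(STAGE_TIMES_MS.values())

# The stage that dominates the pipeline.
slowest_stage = max(STAGE_TIMES_MS, key=STAGE_TIMES_MS.get)

# Fraction of the total latency spent in each stage.
shares = {stage: ms / total_ms for stage, ms in STAGE_TIMES_MS.items()}

for stage, ms in STAGE_TIMES_MS.items():
    print(f"{stage:<20}{ms:>4} ms  {shares[stage]:6.1%}")
print(f"{'Total':<20}{total_ms:>4} ms")
```

The breakdown makes the bottleneck explicit: TTS conversion and OCR together account for 320 of the 415 ms, so they are the natural targets for any latency optimization.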

Table 2. Text recognition accuracy with different languages
Language      Precision  Recall  F1 score
English       0.97       0.96    0.965
Spanish       0.95       0.94    0.945
Mandarin      0.92       0.91    0.915
French        0.94       0.93    0.935
Indonesian    0.93       0.92    0.925
German        0.96       0.95    0.955
Italian       0.94       0.93    0.935
Japanese      0.90       0.89    0.895
Korean        0.91       0.90    0.905
Arabic        0.88       0.87    0.875

Table 3. Performance metrics of TTS conversion
Metric                     Value
Phoneme synthesis quality  0.95
Waveform clarity           High
Speech naturalness         Improved

Table 4. Execution times of different components in the AI-powered multilingual image-to-speech system
Component             Execution time (ms)  Notes
Image preprocessing   50                   Time taken to load and preprocess the image
OCR                   120                  Time taken for the OCR model to extract text
Text cleaning         30                   Time taken for cleaning and formatting the extracted text
TTS conversion        200                  Time taken to convert the cleaned text to speech
Audio playback        15                   Time taken to play the generated audio
Total execution time  415                  Sum of all execution times for the complete process

The high OCR accuracy achieved by the system underscores its effectiveness in accurately extracting text from images. The rigorous pre-processing techniques, such as contrast stretching and noise reduction, played a crucial role in enhancing text visibility and thus improving the performance of the OCR. Contrast stretching adjusted pixel intensity levels to make text stand out more distinctly against its background, while noise reduction minimized artifacts that could hinder text recognition. Together, these techniques facilitated more accurate text extraction by the OCR engine.
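
The two pre-processing steps described above can be sketched as follows. The paper does not give the exact formulas or filter it used, so this example assumes the most common choices: linear min-max contrast stretching to the 0-255 range and a 3x3 median filter for noise reduction. Images are represented as nested lists of grayscale values purely to keep the sketch dependency-free; both function names are hypothetical.

```python
from statistics import median


def contrast_stretch(img):
    """Linearly rescale intensities so the darkest pixel maps to 0 and the brightest to 255."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in img]


def median_filter_3x3(img):
    """Replace each pixel by the median of its 3x3 neighbourhood (edge pixels are replicated)."""
    h, w = len(img), len(img[0])

    def px(r, c):
        # Clamp coordinates so border pixels reuse their nearest neighbour.
        return img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    return [
        [
            median(px(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1))
            for c in range(w)
        ]
        for r in range(h)
    ]


# A low-contrast patch: intensities only span 100..150 before stretching.
patch = [[100, 110], [120, 150]]
print(contrast_stretch(patch))

# A uniform region with one "salt" noise pixel in the middle.
noisy = [[50] * 5 for _ in range(5)]
noisy[2][2] = 255
print(median_filter_3x3(noisy))
```

Stretching widens the gap between text and background intensities, while the median filter removes isolated outlier pixels without blurring edges as strongly as a mean filter would, which is why this pairing is a common choice ahead of OCR.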

Further enhancement in the system’s performance is attributed to the use of advanced models like WaveNet for phoneme synthesis. WaveNet, a deep generative model for creating raw audio waveforms, significantly improved the naturalness and quality of the synthesized speech. Unlike traditional speech synthesis methods, WaveNet models generate more natural and human-like speech by modeling the audio waveform at a finer level of detail. This advancement is reflected in the improved phoneme synthesis quality, where the generated speech closely resembles natural human speech in terms of fluidity and expressiveness.
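
The paper does not describe the WaveNet configuration it used. As a hedged illustration of what modeling the waveform sample by sample involves, WaveNet-style models stack dilated causal convolutions, so each output sample depends only on current and past input samples. The toy, pure-Python sketch below shows that building block; the function names, weights, and dilation schedule are illustrative assumptions, not the authors' implementation.

```python
def causal_dilated_conv(x, weights, dilation):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Samples before t = 0 are treated as zero, so y[t] never depends on
    future inputs (the "causal" property).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence one output of the stacked layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


signal = [1, 2, 3, 4]
print(causal_dilated_conv(signal, weights=[1, 1], dilation=2))

# Doubling the dilation at each layer grows the receptive field exponentially.
print(receptive_field(kernel_size=2, dilations=[1, 2, 4, 8]))
```

Doubling the dilation per layer lets a small stack of layers cover a long stretch of audio history, which is what allows such models to capture fine waveform detail at a manageable depth.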

The system’s ability to perform well across different languages and text types demonstrates its robustness and reliability in TTS conversion. It effectively handles multiple languages, including those with complex scripts and phonetic structures, providing accurate and clear spoken output. The integration of WaveNet for waveform generation further enhances the realism of the synthesized speech, making it more
engaging and easier to understand for users. Overall, the combination of high-accuracy OCR and advanced TTS technologies offers a powerful solution for enhancing accessibility for visually impaired and illiterate individuals. By bridging the gap between visual and auditory information, the system makes text-based content more accessible through spoken output. This integration represents a significant advancement in assistive technology, enabling users to access information that was previously less accessible due to visual impairments.

Future work could focus on expanding multilingual support to include a broader range of languages and dialects, as well as improving real-time TTS conversion capabilities. Enhancements in these areas would further increase the system’s versatility and applicability in various contexts, making it a more inclusive tool for users with diverse needs. Improving real-time TTS conversion is particularly important for the system’s usability. Real-time conversion would allow users to receive spoken output instantly as text is recognized, making the system more responsive and practical for dynamic environments. This enhancement would be especially beneficial in applications such as live events or real-time document reading, where immediate feedback is essential. By addressing these areas, the system’s versatility would be significantly increased. Users with varying linguistic and accessibility needs would benefit from a more adaptable and efficient tool. This progress would ensure that the system remains relevant and useful in diverse contexts, making it a more inclusive solution for individuals with visual impairments or literacy challenges worldwide.
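
One possible way to approach the real-time behavior described above is to pass text to the synthesizer sentence by sentence instead of waiting for the full OCR result. The sketch below is only an illustration of that idea under simple assumptions: sentences end with `.`, `!`, or `?` followed by whitespace, and `sentence_chunks` is a hypothetical helper, not part of the authors' system.

```python
import re
from typing import Iterable, Iterator

# A sentence boundary: whitespace that directly follows ., ! or ?
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentence_chunks(fragments: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of recognized text."""
    buffer = ""
    for fragment in fragments:
        buffer += fragment
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence; the last may still grow.
        for sentence in parts[:-1]:
            yield sentence
        buffer = parts[-1]
    # Flush whatever remains once the stream ends.
    if buffer.strip():
        yield buffer.strip()


# Simulated OCR output arriving in arbitrary fragments.
recognized = ["Exit on the le", "ft. Mind the ", "gap. Thank you"]
for chunk in sentence_chunks(recognized):
    print(chunk)  # each chunk could be handed to the TTS stage immediately
```

With this kind of chunking, playback of the first sentence can begin while later text is still being recognized, which reduces perceived latency even if the total processing time is unchanged.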

4. CONCLUSION

The implementation of the multilingual image-to-speech system effectively aligns with the expectations outlined in the introduction. The integration of OCR and TTS synthesis demonstrated high accuracy in text extraction and improved speech generation quality, validating the system’s capability to bridge visual and auditory information for enhanced accessibility. The successful use of advanced models like WaveNet for phoneme synthesis and waveform generation has resulted in natural and high-quality speech output, meeting the initial goal of providing a reliable and inclusive tool for visually impaired and illiterate individuals. The results and discussion highlight the system’s strong performance across various languages and text types, affirming its versatility and practical utility. Looking ahead, future research could focus on expanding multilingual support to include a broader range of languages and dialects, thereby increasing the system’s global applicability. Additionally, enhancing real-time TTS capabilities could further improve the system’s responsiveness and user experience. The prospect of further development includes refining these features to make the system more adaptable and useful in diverse contexts. By addressing these areas, future studies can build on the current research to enhance accessibility technologies, ensuring they meet the evolving needs of users worldwide.

ACKNOWLEDGEMENTS

The authors thank UNITAR International University and President University for their support and resources throughout this research. Special appreciation is extended to the faculty and staff for their guidance and encouragement. Their financial and institutional backing was essential to the success of this study.

FUNDING INFORMATION

This research was supported by funding from UNITAR and President University.

AUTHOR CONTRIBUTIONS STATEMENT

This journal uses the Contributor Roles Taxonomy (CRediT) to recognize individual author contributions, reduce authorship disputes, and facilitate collaboration.

Name of Author  C  M  So  Va  Fo  I  R  D  O  E  Vi  Su  P  Fu
Hasanul Fahmi ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Rosalina ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Genta Sahuri ✓ ✓ ✓ ✓ ✓ ✓

C: Conceptualization
M: Methodology
So: Software
Va: Validation
Fo: Formal analysis
I: Investigation
R: Resources
D: Data Curation
O: Writing - Original Draft
E: Writing - Review & Editing
Vi: Visualization
Su: Supervision
P: Project administration
Fu: Funding acquisition

CONFLICT OF INTEREST STATEMENT

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

INFORMED CONSENT

We have obtained informed consent from all individuals included in this study.

ETHICAL APPROVAL

The research related to human use has complied with all relevant national regulations and institutional policies in accordance with the tenets of the Helsinki Declaration and has been approved by the authors' institutional review board or equivalent committee.

DATA AVAILABILITY

Data availability is not applicable to this paper as no new data were created or analyzed in this study.

BIOGRAPHIES OF AUTHORS

Rosalina is a lecturer at President University who specializes in artificial intelligence, applied computing, and computing methodologies. She excels at translating theoretical research into practical applications, fostering innovation and technological advancement. Her work integrates cutting-edge research with real-world solutions, enhancing both academic understanding and industry practices. She can be contacted at email: rosalina@president.ac.id.

Hasanul Fahmi is a lecturer at UNITAR International University who specializes in information systems, data analytics, and IT project management. His research focuses on leveraging data-driven approaches to enhance system efficiency and innovation. His contributions bridge academic research with industry practice, advancing technology solutions. He can be contacted at email: fahmi.zuhri@unitar.my.

Genta Sahuri, a dedicated lecturer at President University’s Information Systems Study Program, brings a wealth of knowledge and expertise to his role. His Master’s degree in Informatics from the same institution underscores his commitment to academic excellence. With a strong educational background and a passion for his field, Genta plays a crucial role in shaping the academic journey of his students. His adeptness in conveying complex concepts fosters a dynamic and enriching learning environment. His dedication to his field and his ability to inspire students make him a valuable asset to President University. He can be contacted at email: genta.sahuri@president.ac.id.