IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 11, No. 1, March 2022, pp. 161~172
ISSN: 2252-8938, DOI: 10.11591/ijai.v11.i1.pp161-172
Journal homepage: http://ijai.iaescore.com
Artificial speech detection using image-based features and random forest classifier
Choon Beng Tan¹, Mohd Hanafi Ahmad Hijazi¹, Frazier Kok², Mohd Saberi Mohamad³, Puteri Nor Ellyza Nohuddin⁴
¹Faculty of Computing and Informatics, Universiti Malaysia Sabah, Kinabalu, Malaysia
²Bayurini Sdn Bhd, Penampang, Kinabalu, Malaysia
³College of Medicine and Health Sciences, United Arab Emirates University, Abu Dhabi, United Arab Emirates
⁴Institute of IR4.0, Universiti Kebangsaan Malaysia, Bangi, Malaysia
Article Info

Article history:
Received Jul 15, 2021
Revised Nov 26, 2021
Accepted Dec 11, 2021

Keywords:
Anti-spoofing voice recognition
Artificial speech detection
Speaker recognition
Speaker verification
Voice presentation attack detection

ABSTRACT

The ASVspoof 2015 Challenge was one of the efforts of the research community in the field of speech processing to foster the development of generalized countermeasures against spoofing attacks. However, most countermeasures submitted to the ASVspoof 2015 Challenge failed to detect the S10 attack effectively, the only attack that was generated using the waveform concatenation approach. Hence, more informative features are needed to detect previously unseen spoofing attacks. This paper presents an approach that uses data transformation techniques to engineer image-based features together with random forest classifier to detect artificial speech. The objectives are two-fold: (i) to extract image-based features from the mel-frequency cepstral coefficients representation of the speech signal and (ii) to compare the performance of using the extracted features and Random Forest to determine the authenticity of voices with the existing approaches. An audio-to-image transformation technique was used to engineer new features in classifying genuine and spoof voices. An experiment was conducted to find the appropriate combination of the engineered features and classifier. Experimental results showed that the proposed approach was able to detect speech synthesis and voice conversion attacks effectively, with an equal error rate of 0.10% and accuracy of 99.93%.

This is an open access article under the CC BY-SA license.
Corresponding Author:
Mohd Hanafi Ahmad Hijazi
Faculty of Computing and Informatics, Universiti Malaysia Sabah
Jalan UMS, Sabah, Malaysia
Email: hanafi@ums.edu.my
1. INTRODUCTION
Voice recognition, often known as speaker recognition, is the act of identifying and verifying a speaking human. It is divided into two categories: speaker identification and speaker verification. Speaker identification is the process of determining a speaking individual's identity, whereas speaker verification is the act of verifying that individual's claimed identity. Figure 1 depicts the distinction between speaker identification and speaker verification. In recognizing and validating the identity of a person from voice, speaker recognition employs both physiological and behavioral components. Automatic speaker verification (ASV) is the process of verifying the claimed identity of a speaking individual automatically. In most ASV systems, the speaker enrolment phase and the speaker verification phase are the two key phases. During speaker enrolment, the ASV system captures the speaker's voice and extracts attributes that are utilized to create a speaker model of the speaking individual. The speaker model is
then registered with the ASV system. During speaker verification, the speaking individual's voice is recorded to create a speaker model for verification. After that, the speaker model is compared to the claimed identity's speaker model in the ASV system. Finally, the matching will generate a score, with the claim being accepted if the score is equal to or greater than the ASV system's threshold. Otherwise, the claim will be turned down.
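The accept-or-reject rule described above can be sketched in a few lines. This is only an illustration: the function name `accept_claim`, the example scores, and the threshold of 0.5 are hypothetical and do not come from any particular ASV system.

```python
def accept_claim(score: float, threshold: float) -> bool:
    """Accept the claimed identity when the match score reaches the threshold."""
    return score >= threshold

# Hypothetical match scores checked against a hypothetical threshold of 0.5
decisions = [accept_claim(s, 0.5) for s in (0.2, 0.5, 0.9)]
```

A score exactly equal to the threshold is accepted, matching the "equal to or greater than" rule in the text.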
Numerous types of features have been deployed for ASV systems. Gaussian mixture models (GMM) were extensively used in the past for feature extraction to produce robust ASV systems. In speaker verification, the universal background model (UBM) is a speaker model that represents broad attributes and characteristics that can be compared to the specific person being verified [1]. Later, i-vector and x-vector [2] based ASV systems were introduced to replace the Gaussian mixture model-universal background model (GMM-UBM) based ASV systems. Deep learning approaches [3], such as a recurrent neural network (RNN) [4] used as a backend classifier, have shown their capability in speaker verification with a low equal error rate (EER).
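Since the EER is the metric used throughout this paper, a minimal sketch of how it can be estimated from two lists of scores may help. The EER is the operating point at which the false acceptance rate equals the false rejection rate. The function name, the toy scores, and the simple threshold sweep below are assumptions made for illustration; evaluation toolkits typically interpolate between thresholds rather than picking the closest one.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Return (eer, threshold) at the threshold where FAR and FRR are closest."""
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection: a genuine trial scores below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: a spoof (or impostor) trial scores at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]

# Toy scores, purely illustrative
genuine = [0.9, 0.8, 0.7, 0.4]
spoof = [0.1, 0.2, 0.3, 0.6]
eer, threshold = equal_error_rate(genuine, spoof)
```

With these toy scores one genuine trial and one spoof trial are misclassified at the chosen threshold, so both error rates are 25%.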
Figure 1. The illustration of speaker identification versus speaker verification
To mitigate the issue of security threats to the ASV systems, voice presentation attack detection (PAD) was introduced. Voice PAD can be categorized into two major types, namely artificial and replayed speech detection. The artificial speech was generated by speech synthesis and voice conversion, whereas replayed speech was generated by replaying the recordings of human speech.
Several efforts have been made to foster the development of countermeasures against spoofing attacks on ASV systems. The first is the building of public datasets such as the ReMASC dataset, which is made up of a collection of genuine and replayed speech [5]. In the ReMASC dataset, the human speech captured by the microphone array was labeled as genuine speech, whereas the playback of the replay source recordings generated in different replay settings was labeled as replayed speech. In particular, the ReMASC dataset is made up of 9,240 genuine and 45,472 replayed recordings. The speech corpus was collected from a total of 50 speakers, 22 female and 28 male, with ages ranging from 18 to 36. Among the 50 speakers, there were 36 English native speakers, 12 Chinese native speakers, and 2 Indian native speakers. About 132 voice commands made up of 273 unique words were used as recording materials to provide reasonable phonetic diversity. Four different recording environments with different noise levels were used, namely one outdoor environment, two indoor environments (quiet and noisy), and one vehicle environment. The building of a public dataset allowed the community of spoofing and anti-spoofing for ASV to develop robust PAD for ASV systems.
Second, the ASVspoof challenges were held to encourage the development of voice spoofing countermeasures. Standard datasets, techniques, and evaluation criteria were utilized in the ASVspoof Challenge series. The first ASVspoof Challenge, which covered speech synthesis and voice conversion attacks, was held in 2015. In the ASVspoof 2015 Challenge, the rating was based on 16 primary submissions. The best system in the ASVspoof 2015 Challenge had an EER of 1.21% on average [6]. Then, to emphasize replay attacks, the ASVspoof 2017 Challenge was launched. The ASVspoof 2017 Challenge received a much higher number of submissions than the previous challenge, with 49 submissions recorded. The best performing system in the ASVspoof 2017 Challenge achieved 6.73% EER [7]. The ASVspoof 2019 Challenge was later organized to include speech synthesis, voice conversion, and replay attacks. The ASVspoof 2019 dataset can be divided into two types of attacks: logical attacks (LA) and
physical attacks (PA). Genuine speech, speech synthesis, and voice conversion attacks were included in the LA dataset, whereas genuine speech and replay attacks were included in the PA dataset. For the LA and PA scenarios, the best submissions achieved 0.22% and 0.39% EER, respectively [8]. However, only 56.25% and 64% of the submissions for the LA and PA scenarios, respectively, outperformed the baseline system. Moreover, in both the LA and PA scenarios, the EER of the majority of the submissions was not less than 5%.
Due to the ease of obtaining biometric data, especially through social media, the security threat to the ASV systems is significant. Publicly available biometric data in social media can be used by security adversaries to launch presentation attacks such as speech synthesis and voice conversion to spoof the ASV systems. Furthermore, a large amount of artificial speech can be generated using state-of-the-art speech synthesis and voice conversion algorithms to spoof ASV systems. This paper therefore focuses on artificial speech, because it is a common spoofing attack that can be generated in a short time to spoof the ASV system.
Several voice PAD systems were introduced to detect artificial speech. As most of the artificial speech was produced using parametric vocoders, phase information was an effective feature to detect speech synthesis attacks. As a result, phase-based voice PAD for detecting artificial speech has become state-of-the-art [9]. However, ASV systems are still prone to attacks from artificial speech as most of the phase-based voice PAD introduced were only effective against artificial speech generated using minimum-phase filters based parametric vocoders [10].
Numerous works in the literature have shown the application of image classification in the signal domain to be effective. To apply an image classification approach, audio data were pre-processed and transformed into image data. For example, features extracted from the spectrogram image were shown to improve the performance of acoustic event classification [11]. Spectrogram images were also used for rapid speaker recognition and artificial speech detection [12]. The recent work in [12], which used the raw spectrogram image as input for an end-to-end Light-ResNet-34 model, outperformed the conventional approach of using constant Q cepstral coefficients (CQCC) and GMM in artificial speech detection. Another recent work that applied a deep neural network (DNN) architecture as a backend classifier with constant-Q equal subband transform (CQ-EST) features [13] was shown to outperform most of the state-of-the-art approaches with an EER of 0.06%. Besides serving as backend classifiers, deep learning architectures such as the convolutional neural network (CNN) were also used as feature extractors in recent works. In another work, a light gated CNN was used as a feature extractor to extract features from the spectrogram image, with probabilistic linear discriminant analysis (PLDA) as the backend classifier, to achieve an EER of 0.16% in artificial speech detection [14]. A fused system using short-time Fourier transform (STFT) and modified group delay (MGD) features was introduced recently and produced a 0.02% EER in detecting artificial speech. The advantage of this kind of fused system [15] is that both magnitude and phase spectral features were used together. This method yielded better performance than a fusion of independent systems with one feature for each system.
Although most of the recent works achieved a good EER, most of them did not perform well in detecting the S10 attack, one of the attack scenarios of the ASVspoof 2015 dataset. This indicates that more generalized models of artificial speech attacks are needed.
In the context of artificial speech detection, the most recent works used a CNN as a feature extractor to extract image-based features from the spectrogram. Nonetheless, a CNN usually requires a large number of training samples, computing time, and resources for better performance and generalization. Similar performance can, however, be achieved by utilizing handcrafted features for image classification, which avoids the above-mentioned drawbacks. Moreover, work that utilized handcrafted image-based features in detecting artificial speech was limited in the literature. It is conjectured that using similar approaches to extract image-based features (color, texture, or edges) could be useful to generate more generalized features for artificial speech detection.
Despite the advancements made possible by speech identification technology, spoofing attacks by security adversaries to evade ASV systems remain a persistent problem. To spoof ASV systems, state-of-the-art speech synthesis and voice conversion algorithms could easily generate artificial speech in massive quantities. Furthermore, because it is so easy to get biometric data via social media, spoofing attacks on ASV systems are becoming more common. As a result, robust spoofing countermeasures are required. These countermeasures are commonly known as voice PAD. Voice PAD has been the subject of various research studies, which may be found in the literature. Recent voice PADs, however, were vulnerable to unknown spoofing techniques [6]. The voice PADs submitted in the ASVspoof 2015 competition demonstrate this. System A, the best system in the ASVspoof 2015 Challenge, proposed using two features for artificial speech detection with a GMM classifier: mel-frequency cepstral coefficients (MFCC) and cochlear filter cepstral coefficients plus instantaneous frequency (CFCCIF). Although system A achieved an average EER of 1.21%, its average EERs for known and unknown attacks were 0.41% and 2.01%, respectively. Similarly, most systems submitted to the ASVspoof 2015 Challenge encountered a similar circumstance in which they were unable to identify the S10 attack effectively, which was the sole attack produced using the waveform concatenation method. This pattern can be interpreted as possible overfitting in the proposed voice
PADs. Hence, more informative features are needed to generalize voice PAD against unseen spoofing attacks [16]. One solution is to produce new features using feature engineering, in which new descriptive features are constructed to be used to train a predictive model.
This paper proposes a new feature engineering approach using data transformation techniques for artificial speech detection. In this work, rather than using conventional signal processing to extract features from speech, we propose to use data transformation so that techniques from other domains, such as image processing, can be applied to artificial speech detection. The proposed approach is motivated by the success of deploying image classification techniques for sound classification and speaker recognition [11]. An ensemble classifier in the form of random forest (RF) was used to generate the artificial speech model. The performance of the proposed approach is detailed, along with the results and discussion. Then, issues and future work to mitigate the limitations of the proposed approach are described.
The key contributions of this paper are:
− Application of data transformation techniques to engineer image-based features to detect artificial speech
− Application of RF to be used with the new features engineered to detect artificial speech
− Empirical evaluation of the proposed approach with the existing work found in the literature
2. THE PROPOSED METHOD
In this paper, data transformation is considered to generate potentially generalized features for voice PAD. In the conventional approach, features such as MFCC and CQCC are extracted directly from the speech signal to determine the genuineness of the speech. Recently, deep learning approaches, including CNN, have frequently been used to automatically extract features from the image representation of speech signals. Unlike conventional and deep learning feature extraction approaches, the work presented in this paper proposes to use handcrafted features extracted from the image and hexadecimal frequency representation of the speech signal. Audio recordings are first transformed into images. Then, the image-based features are extracted from the transformed data to form the feature vectors. Figure 2 shows the differences between the conventional approach and the proposed feature engineering for artificial speech detection. The proposed feature engineering allows new features to be extracted from the speech data. Subsection 2.1 describes the generation of image-based features considered in this paper. Subsection 2.2 presents the RF classifier used for artificial speech detection in this work.
Figure 2. The comparison of the conventional approach and the proposed data transformation approach for artificial speech detection
2.1. Transformation of audio data into image representation and extraction of image-based features
Spectrograms and MFCCs are two common forms used to represent audio data [17], [18]. However, little attention has been paid to using both as images, from which image-based features could be extracted for voice PAD. In the work presented in this paper, the audio signals are represented as spectrogram and MFCC images. Different features were then extracted from each of the generated images. Figure 3 shows the process of feature extraction from spectrogram and MFCC images proposed in this paper. The speech signal is first transformed into spectrogram and MFCC images. Then, the color layout filter (CLF) and local binary patterns (LBP) features are extracted from the spectrogram to form the spectrogram-based features. Concerning MFCC images, the CLF features are extracted to form the MFCC-based features.
A spectrogram is a visual representation of a signal that shows the signal's spectral information as frequency over time. Figure 4 shows how spatial differences between genuine and spoof voices can be observed using the spectrogram image representation. In this example, a genuine voice contains
less background noise in a certain region of the spectrogram, while the spoof voice contains more background noise. To detect more differences between genuine and spoof voices in the voice-transformed images such as spectrogram, image classification techniques could be used as suggested in [17].
MFCC is an audio feature commonly used for signal processing, especially speech recognition [18], [19]. Figure 5 shows the generated MFCC images of genuine and spoof voices. From Figure 5, a slightly different color intensity in the region of a non-speech segment can be observed when comparing the MFCC images of genuine and spoof speech.
Figure 3. The feature extraction process
Figure 4. The observable spatial differences between genuine and spoof voices using spectrogram generated from the Audacity tool
Figure 5. An example of MFCC images generated for genuine and spoof voices of a speaker
CLF was selected for feature extraction as it describes the spatial distribution of colors in an image and it works well in image classification when applied on color spectrogram [20]. In the CLF algorithm, the input image was divided into 64 blocks. Then, the values of all pixels within each block were averaged to obtain a representative color, resulting in three 8×8 arrays, collectively representing YCbCr color space.
Then, the discrete cosine transform (DCT) was applied to the three 8×8 arrays, resulting in three DCT matrices, one for each YCbCr component. The CLF descriptor was formed by reading the coefficients from the matrices in zigzag order. The resulting CLF descriptor contains a total of 33 features. As shown in Figures 4 and 5, genuine and artificial voices may be distinguished by the differences in the spatial distribution of color in certain regions of the generated spectrogram and MFCC images. Figure 6 shows the process of CLF feature extraction.
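The CLF steps described above (block averaging, YCbCr conversion, DCT, zigzag scan) can be sketched as follows. This is an illustrative sketch only, not the Weka implementation used in this work. It assumes an image given as rows of (R, G, B) tuples at least 8 pixels high and wide, the JPEG-style YCbCr conversion, and an orthonormal DCT. The text states only that the descriptor has 33 features, so the split of 21 Y, 6 Cb, and 6 Cr coefficients is an assumption chosen to reach that total. Coefficient quantization is omitted.

```python
import math

def block_means(img, n=8):
    """Average RGB over an n x n grid of blocks; img is rows of (R, G, B) tuples."""
    h, w = len(img), len(img[0])
    grid = [[None] * n for _ in range(n)]
    for by in range(n):
        for bx in range(n):
            ys = range(by * h // n, (by + 1) * h // n)
            xs = range(bx * w // n, (bx + 1) * w // n)
            px = [img[y][x] for y in ys for x in xs]
            grid[by][bx] = tuple(sum(p[c] for p in px) / len(px) for c in range(3))
    return grid

def rgb_to_ycbcr(r, g, b):
    """JPEG-style full-range conversion (an assumption; other variants exist)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def dct2(block):
    """Orthonormal 2D DCT-II of a square block."""
    n = len(block)
    scale = lambda k: math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
    return [[scale(u) * scale(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]

def zigzag_indices(n=8):
    """(row, col) pairs in zigzag order, starting at the DC coefficient."""
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag[::-1] if s % 2 == 0 else diag)
    return order

def color_layout(img, n_y=21, n_c=6):
    """Concatenate the first zigzag DCT coefficients of Y, Cb, and Cr."""
    ycc = [[rgb_to_ycbcr(*cell) for cell in row] for row in block_means(img)]
    zz = zigzag_indices(8)
    desc = []
    for ch, keep in ((0, n_y), (1, n_c), (2, n_c)):
        coeffs = dct2([[cell[ch] for cell in row] for row in ycc])
        desc.extend(coeffs[u][v] for u, v in zz[:keep])
    return desc

# A uniform gray 16 x 16 test image: only the DC terms should be non-zero
gray = [[(128, 128, 128)] * 16 for _ in range(16)]
descriptor = color_layout(gray)
```

For a uniform image all the energy falls into the DC coefficient of each channel, which is a convenient sanity check before applying the sketch to real spectrogram or MFCC images.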
LBP is chosen in this paper as it is commonly used and produces good descriptors of texture in image classification. Figure 7 shows the process of LBP feature extraction. To extract LBP features from a spectrogram image, the 3D color pixels were converted into 2D grayscale values. For each pixel in the converted grayscale image, a neighborhood of radius r surrounding the center pixel was selected. The LBP value was then calculated for this center pixel and stored in a 2D array with the same height and width as the converted grayscale image. To calculate it, the center pixel was compared with each of the surrounding neighborhood pixels: if the neighbor pixel was greater than or equal to the center pixel, the corresponding value was set to 1; otherwise, it was set to 0. The number of possible LBP codes was 2^p, where p is the number of neighborhood pixels. In the original LBP, with neighborhood radius r = 1 and p = 8, there were 2^8 = 256 possible LBP codes, ranging from 0 to 255. A frequency histogram of the LBP codes was computed as the LBP features. Details of LBP can be found in [21].
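A minimal sketch of the basic LBP computation with r = 1 and p = 8 is given below. It is an illustration under stated assumptions rather than the implementation used in this work: the function name is hypothetical, the bit order (clockwise from the top-left neighbor) is one of several equivalent conventions, and border pixels are skipped instead of being padded.

```python
def lbp_histogram(gray):
    """256-bin histogram of basic LBP codes (r = 1, p = 8) over interior pixels."""
    h, w = len(gray), len(gray[0])
    # Eight neighbours of the centre pixel, visited clockwise from the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            center = gray[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # A neighbour greater than or equal to the centre sets its bit to 1
                if gray[y + dy][x + dx] >= center:
                    code |= 1 << bit
            hist[code] += 1
    return hist

# A single 3 x 3 grayscale patch, so exactly one code is produced
patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 6, 3]]
hist = lbp_histogram(patch)
```

In the patch, the neighbors 9, 7, and 6 are greater than or equal to the center value 6, which sets bits 1, 3, and 5 and gives the code 2 + 8 + 32 = 42.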
Figure 6. The process of CLF features extraction
Figure 7. The process of LBP features extraction
In this paper, the spectrogram and MFCC images were generated from audio recordings using Python. Spectrogram and MFCC images were plotted using the pyplot and librosa libraries, respectively. The generated images were then saved in PNG format with a size of 640×480 pixels. The CLF implementation in Weka [22] was used to extract the CLF features. The cvtColor() function of the OpenCV library (cv2) was employed for grayscale conversion in Python. Concerning LBP, a neighborhood radius r = 1 was chosen as it was the setting used in the original LBP [21] and is the most used in the literature; with this setting, there are eight neighboring pixels in a 3×3 pixel window. The frequency histogram computed from LBP was used as the LBP features in this work. In total, 322 features were extracted from the spectrogram and MFCC representations of the audio data. A total of 289 features were generated from the spectrogram: 33 were CLF and 256 were LBP features. Concerning MFCC, a total of 33 CLF features were generated.
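The feature counts above add up as 33 + 256 + 33 = 322. A small sketch of how the three groups could be assembled into one feature vector per recording is shown below. The function name and the ordering of the groups are assumptions made for illustration; the paper does not state how the groups are ordered.

```python
def build_feature_vector(spec_clf, spec_lbp, mfcc_clf):
    """Concatenate spectrogram CLF, spectrogram LBP, and MFCC CLF features."""
    if (len(spec_clf), len(spec_lbp), len(mfcc_clf)) != (33, 256, 33):
        raise ValueError("expected 33 CLF, 256 LBP, and 33 CLF features")
    # Group order is a hypothetical choice, not taken from the paper
    return list(spec_clf) + list(spec_lbp) + list(mfcc_clf)

# Placeholder values standing in for features extracted from one recording
vector = build_feature_vector([0.0] * 33, [0] * 256, [0.0] * 33)
```

Checking the group sizes at assembly time catches a mismatched extractor setting before the vectors reach the classifier.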
2.2. Random forest (RF) classifier for artificial speech detection
In end-to-end learning, features are extracted from data samples and learned automatically by a deep learning process, which is then used to predict the data samples' class labels. Unlike the end-to-end approach, a backend classifier is needed to differentiate between genuine and spoof speech using the proposed handcrafted features. In this work, an ensemble classifier is selected as it shows good classification results when applied with handcrafted features [23]–[25].
R
F
is
a
s
upe
r
vi
s
e
d,
e
ns
e
m
bl
e
le
a
r
ni
ng
m
ode
l
w
he
r
e
de
c
is
io
n
tr
e
e
s
a
r
e
ba
gge
d
f
or
c
la
s
s
if
ic
a
ti
on
a
nd
r
e
gr
e
s
s
io
n.
I
n
a
n
R
F
m
ode
l,
m
ul
ti
pl
e
de
c
is
io
n
tr
e
e
s
ba
s
e
d
on
r
a
ndoml
y
s
e
le
c
te
d
tr
a
in
in
g
s
ubs
e
ts
w
e
r
e
tr
a
in
e
d
a
nd me
r
ge
d t
o ge
t
a
m
or
e
a
c
c
ur
a
te
a
nd
s
ta
bl
e
pr
e
di
c
ti
on via
vot
e
s
a
ggr
e
ga
ti
on.
T
he
us
e
of
th
e
gr
e
e
dy a
lg
or
it
hm
to
s
e
le
c
t
th
e
be
s
t
s
pl
it
poi
nt
a
t
e
a
c
h
s
t
e
p
in
th
e
tr
e
e
bui
ld
in
g
pr
oc
e
s
s
w
il
l
le
a
d
to
s
im
il
a
r
r
e
s
ul
ti
ng
tr
e
e
s
f
or
ba
gge
d
de
c
is
io
n
tr
e
e
s
.
T
h
is
r
e
s
ul
ti
ng
in
th
e
r
e
duc
ti
on
in
th
e
va
r
ia
nc
e
of
th
e
pr
e
di
c
ti
ons
of
th
e
ba
gge
d
de
c
is
io
n
tr
e
e
s
.
T
o
m
it
ig
a
te
th
is
is
s
ue
,
R
F
is
a
n
im
pr
ove
d
ve
r
s
io
n
of
ba
gge
d
de
c
is
io
n
tr
e
e
s
th
a
t
di
s
r
upt
th
e
gr
e
e
dy
s
pl
it
ti
ng
a
lg
or
it
hm
dur
in
g
tr
e
e
c
r
e
a
ti
on.
W
he
n
th
e
gr
e
e
dy
s
pl
it
ti
ng
a
lg
or
it
hm
is
di
s
r
upt
e
d
dur
in
g
tr
e
e
c
r
e
a
ti
on
in
R
F
,
th
e
s
pl
it
poi
n
ts
of
de
c
is
io
n
tr
e
e
s
c
a
n
onl
y
be
c
hos
e
n
f
r
om
a
s
ubs
e
t
of
t
he
i
nput
f
e
a
tu
r
e
s
a
t
r
a
ndom. As
a
r
e
s
ul
t,
th
e
s
im
il
a
r
i
t
y be
twe
e
n t
he
ba
gge
d de
c
is
io
n t
r
e
e
s
de
c
r
e
a
s
e
d
a
nd l
e
d t
o l
ow
e
r
bi
a
s
a
nd highe
r
va
r
ia
nc
e
of
t
he
pr
e
di
c
ti
ons
. D
u
e
t
o i
ts
s
im
pl
ic
it
y a
nd pr
e
di
c
ti
ve
pe
r
f
or
m
a
nc
e
,
R
F
w
a
s
c
ho
s
e
n a
s
a
ba
c
ke
nd c
la
s
s
if
ie
r
f
or
a
r
ti
f
ic
ia
l
s
pe
e
c
h
de
te
c
ti
on
in
th
is
w
or
k.
D
e
ta
il
s
of
R
F
c
a
n
be
f
ound
in
[
26]
. T
he
W
e
ka
i
m
pl
e
m
e
nt
a
ti
on of
t
he
i
de
nt
if
ie
d
c
la
s
s
if
ie
r
s
w
a
s
us
e
d i
n w
or
k pr
e
s
e
nt
e
d i
n t
hi
s
pa
pe
r
.
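The two ingredients of RF described above, bootstrap sampling and random feature subsets, can be sketched in a few lines of pure Python. This is only a toy illustration under stated assumptions, not the Weka implementation used in this work: each tree is reduced to a single split (a decision stump), and the names `train_stump`, `train_forest`, and `predict`, as well as the tiny two-feature dataset, are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label in a list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, labels, feature_ids):
    """Greedy search for the best single split, restricted to feature_ids."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = (sum(y != l_lab for y in left)
                      + sum(y != r_lab for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:
        # No usable split (all values identical): always predict the majority.
        m = majority(labels)
        return (feature_ids[0], float("inf"), m, m)
    return best[1:]


def train_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bagging: draw a bootstrap sample of the training rows.
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Random feature subset. A full RF redraws it at every split; with one
        # split per tree, drawing it once per tree is equivalent.
        feats = rng.sample(range(len(rows[0])), n_features)
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Aggregate the votes of all trees and return the majority label."""
    votes = [l if row[f] <= t else r for f, t, l, r in forest]
    return majority(votes)


# Hypothetical two-feature data: low values are genuine, high values are spoof.
rows = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25],
        [0.90, 0.80], [0.80, 0.90], [0.85, 0.95]]
labels = ["genuine"] * 3 + ["spoof"] * 3
forest = train_forest(rows, labels)
print(predict(forest, [0.0, 0.0]), predict(forest, [1.0, 1.0]))
```

Restricting each split to a random feature subset is what keeps the trees from all choosing the same dominant feature, which is the decorrelation effect the paragraph describes.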
3. RESEARCH METHOD
An experiment was carried out to test the generalization capability of the proposed approach in identifying artificial speech as an independent system rather than as part of an integrated ASV system. The ASVspoof 2015 dataset, the largest and most used public dataset for artificial speech detection, was used to measure the performance of the proposed approach. The more recent ASVspoof 2019 dataset was not included because it was designed to evaluate the impact of countermeasures on the reliability of an ASV system subjected to spoofing attacks [27], which is outside the scope of the work presented in this paper.
The ASVspoof 2015 dataset used in the experiment consists of speech synthesis and voice conversion attacks in addition to genuine speech. The dataset was collected and generated from a total of 106 speakers: 45 male and 61 female. The genuine speech was recorded in a semi-anechoic chamber with a solid floor, whereas the spoof speech was generated using ten different common speech synthesis and voice conversion algorithms. These algorithms produced ten categories of attacks (S1-S10). The known attacks in the ASVspoof 2015 dataset are S1-S5, which used common voice conversion and speech synthesis algorithms. The unknown attacks are S6-S10. The S1, S2, and S5-S9 attacks were generated using voice conversion algorithms, whereas the S3, S4, and S10 attacks were generated using speech synthesis algorithms. Details on each of the ten spoofing algorithms used to produce the spoof speech in the ASVspoof 2015 dataset are available in [6].
Four types of features were extracted from the audio recordings of the ASVspoof 2015 dataset, and the classification was conducted using the Weka tool. The default parameters suggested by Weka have been found empirically to work well in most cases; hence, this work uses them. The ASVspoof 2015 dataset is divided into training, development, and evaluation sets. As described in [6], the training set is used to train and build a PAD model, the development set is used for model tuning and refinement, and the evaluation set is used for model evaluation.
An experiment was conducted to evaluate the performance of the different combinations of extracted features and classifiers. For each combination, both the training and development sets were used to train the model, whereas the evaluation set was used for validation.

The experiment was conducted on a machine with the following specifications: Intel i5-3210M processor at 2.50 GHz, 8 GB of RAM, and Windows 10 (64-bit) OS.
4. RESULTS AND DISCUSSION
In this work, we report the accuracy and F1-score of each model as supporting metrics in addition to the EER, to allow a better comparison of model performance. This is because a lower EER does not necessarily indicate that a model predicted more instances correctly. The F1-score is often used in binary classification to evaluate how good a classifier is at detecting positive cases.
Table 1 shows the results of experiment 1, in which detection was performed to identify genuine or artificial speech, for the combinations of the four proposed features with the classifier. Several recent works that used both the training and development sets for model training, namely Model 1-Model 3 [28]–[30], were used for comparison to show the competitiveness of the proposed approach against the state of the art. In the remainder of this section, each combination of features and classifier is referred to by its model number, as shown in Table 1. Figure 8 shows the detection error tradeoff (DET) curves for Model 4-Model 7 in experiment 1.
Table 1. The performances of the models trained with both the training set and the development set and tested with the evaluation set of the ASVspoof 2015 dataset

Model                                                                              Experiment 1 (Eval Set)
                                                                                   EER (%)   Accuracy (%)   F1-Score (%)
Model 1: Scattering cepstral coefficients (SCC)+GMM-UBM [28]                       0.18      -              -
Model 2: Compressed sensing for high dimensional features (CS-HD)+i-vector [29]    0.24      -              -
Model 3: CQCC+SCC+GMM-UBM [30]                                                     0.10      -              -
Model 4: Spectrogram image CLF+RF                                                  17.61     93.02          96.42
Model 5: Spectrogram image LBP+RF                                                  17.09     94.35          97.11
Model 6: MFCC image CLF+RF                                                         0.10      99.93          99.96
Model 7: MFCC image LBP+RF                                                         30.01     95.01          97.45
Figure 8. DET curves of Model 4-Model 7 in experiment 1
Because EER was the only primary metric used in ASVspoof 2015, the accuracy and F1-score of Model 1-Model 3 are not shown in Table 1. According to Table 1, three of the four proposed models (Model 4, Model 5, and Model 7) had an EER above 17% in experiment 1. All the proposed models (Model 4-Model 7) achieved an accuracy above 90%. Model 6 achieved the lowest EER and the highest accuracy among the proposed models. Figure 8 shows clearly that the performance of Model 6 was far better than that of the other models. In addition, all the proposed models (Model 4-Model 7) achieved an F1-score above 96%. This indicates that the proposed models were very good at detecting spoof voices while also misclassifying few genuine instances as spoof.
The combination of the MFCC image with the CLF feature extractor produced a robust feature that enabled Model 6 to perform best in experiment 1. MFCC uses Mel scaling, which produces a series of coefficients resembling the resolution of the human auditory system, unlike the spectrogram, which uses linear frequency scaling. In addition, the differences in the spatial distribution of color between genuine and spoof MFCC images could be detected by the CLF feature extractor. Therefore, the MFCC-based features performed best when the CLF feature extractor was used for artificial speech detection.
Apart from the robustness of the MFCC CLF feature, the use of RF may be one of the reasons Model 6 achieved a lower EER in experiment 1. Owing to the nature of RF, the decision trees become more varied as the number of training instances available for random selection increases. This eventually produces a more generalized predictive model, because the similarity between the bagged trees decreases. Therefore, having more data for model training may improve the detection rate, though this may not always be the case.
In terms of the detection error tradeoff, the DET curves in Figure 8 were plotted from the results obtained in experiment 1. A DET curve shows the tradeoff between the false-negative rate (miss probability) and the false-positive rate (false alarm probability) of a binary classification model. In Figure 8, Model 6 has significantly lower detection errors than the other models in experiment 1. This indicates that Model 6 performed significantly better than the other models in experiment 1.
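To make the construction of a DET curve concrete, the sketch below sweeps a decision threshold over detector scores and records the false-alarm and miss rates at each threshold. It is an illustration under assumptions rather than the procedure used to produce Figure 8: higher scores are taken to mean "more likely spoof", spoof is treated as the positive class, and the function names `det_points` and `probit` and the four example scores are hypothetical.

```python
from statistics import NormalDist


def det_points(scores, labels):
    """Return (false_alarm_rate, miss_rate) pairs, one per threshold.

    labels: 1 = spoof (positive class), 0 = genuine.
    """
    n_spoof = sum(labels)
    n_genuine = len(labels) - n_spoof
    points = []
    for t in sorted(set(scores)):
        # An utterance is flagged as spoof when its score reaches the threshold.
        false_alarms = sum(1 for s, y in zip(scores, labels)
                           if s >= t and y == 0)
        misses = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        points.append((false_alarms / n_genuine, misses / n_spoof))
    return points


def probit(p, eps=1e-6):
    """Normal-deviate scale of DET axes; clipped to avoid infinities."""
    p = min(max(p, eps), 1 - eps)
    return NormalDist().inv_cdf(p)


# Hypothetical scores for two genuine and two spoof utterances.
scores = [0.10, 0.40, 0.35, 0.80]
labels = [0, 0, 1, 1]
curve = det_points(scores, labels)
# Each point is mapped to the normal-deviate scale before plotting.
scaled = [(probit(fa), probit(miss)) for fa, miss in curve]
print(curve)
```

A curve that lies closer to the lower-left corner corresponds to fewer errors of both kinds at every operating point.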
The robustness of Model 6 can also be seen in the ISO/IEC standard metrics, namely the attack presentation classification error rate (APCER) and the bona fide presentation classification error rate (BPCER). In total, only 130 out of 193,404 instances (0.07%) in the ASVspoof 2015 evaluation set were misclassified by Model 6. The APCER of Model 6 was 0.02%, as 29 out of 184,000 spoof instances were misclassified as genuine. The BPCER of Model 6 was 1.07%, as 101 out of 9,404 genuine instances were misclassified as spoof. The difference between the APCER and the BPCER was therefore about 1%.
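The APCER and BPCER figures above follow directly from the reported counts, as the short calculation below shows. The helper name `error_rate_percent` is hypothetical; the counts are the ones stated in the text.

```python
def error_rate_percent(errors, total):
    """Share of misclassified presentations, as a percentage."""
    return 100.0 * errors / total


n_genuine = 9_404
n_spoof = 10 * 18_400  # ten attack types (S1-S10), 18,400 instances each

apcer = error_rate_percent(29, n_spoof)      # spoof misclassified as genuine
bpcer = error_rate_percent(101, n_genuine)   # genuine misclassified as spoof
overall = error_rate_percent(29 + 101, n_spoof + n_genuine)

print(f"APCER   = {apcer:.2f}%")
print(f"BPCER   = {bpcer:.2f}%")
print(f"Overall = {overall:.2f}%")
```

Rounded to two decimals, the three values are 0.02%, 1.07%, and 0.07%, matching the figures reported for Model 6.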
To further compare our best model, Model 6, with recent works, Table 2 presents the artificial speech detection performance by category of attack (S1-S10).
Table 2. The comparison of the performance of our best model with recent works on the evaluation set of the ASVspoof 2015 dataset by category of attacks (S1-S10)

          EER (%)
Model     Known attack                          Unknown attack
          S1     S2     S3     S4     S5        S6     S7     S8     S9     S10
Model 1   0.02 (all known attacks)              0.33 (all unknown attacks)
Model 2   0.02   0.03   0.01   0.01   0.02      0.01   0.00   0.01   0.00   26.28
Model 3   0.00   0.01   0.00   0.00   0.02      0.01   0.01   0.00   0.00   0.95
Model 6   0.14   0.10   0.02   0.02   0.21      0.21   0.14   0.02   0.12   0.03
Table 2 shows that our model, Model 6, significantly outperformed Model 1-Model 3 on the S10 attack, the most difficult spoofing attack. Model 6 also produced comparable performance on the other categories of attack, recording an EER between 0.02% and 0.21% across the S1-S10 attacks. In contrast, Model 3, which had the same overall EER of 0.10%, recorded a significantly higher EER of 0.95% on the S10 attack even though it achieved an EER of at most 0.02% on the other attacks (S1-S9). This indicates that Model 6 generalizes better than the other models.
The ASVspoof 2015 evaluation set contains 9,404 genuine instances and 18,400 spoof instances of each attack type (S1-S10). Model 6 misclassified 101 genuine instances, 11 instances of known attacks, and 18 instances of unknown attacks. An interesting observation is that no instance of the S10 attack was misclassified as genuine by Model 6; the 0.03% EER for the S10 attack was incurred entirely by false alarms. Unlike the known attacks (S1-S5), which were generated using a vocoder, the S10 attacks were generated without a vocoder. This is the factor that has caused most state-of-the-art voice PAD systems to suffer from a significantly higher EER on S10 attacks. By comparison, Model 6 successfully identified all S10 attacks as spoof; hence, it was more generalized and effective in detecting artificial speech regardless of whether a vocoder was used.
To prevent unreliable performance evaluation, EER, accuracy, and F1-score were all used in this work. The results in Table 1 show that the performance of Model 6 is reliable, as its EER, accuracy, and F1-score were all good. A very high accuracy and F1-score can coincide with a high EER if one class dominates the test set and the model is biased toward that class. For example, consider a test set of 100 instances with 90 spoof instances and ten genuine instances. A biased, overfitted model might predict every instance of the test set as spoof and still achieve a high accuracy and F1-score. In this case, the accuracy of the model would be 90%, whereas the EER would be 50%. This was not the case for Model 6. As shown in Tables 1 and 2, the low EER achieved by Model 6 indicates that its high accuracy was due to neither bias nor overfitting. The combination of RF and MFCC image-based CLF features was shown to be effective in detecting artificial speech, as Model 6 produced a low EER of 0.10% while achieving a high accuracy and F1-score of 99.93% and 99.96%, respectively. It was also shown to produce similar detection performance on all categories of attacks (S1-S10).
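The 100-instance example above can be checked numerically. The sketch below assumes that spoof is the positive class and that a model predicting everything as spoof assigns the same score to every instance; the `eer` function, which interpolates the point where the false-alarm and miss rates cross, is a hypothetical helper written for this illustration and is not the scoring tool used in this work.

```python
def eer(scores, labels):
    """Equal error rate from scores; labels use 1 = spoof, 0 = genuine."""
    n_spoof = sum(labels)
    n_genuine = len(labels) - n_spoof
    thresholds = sorted(set(scores))
    thresholds.append(thresholds[-1] + 1.0)  # last threshold flags nothing
    points = []
    for t in thresholds:
        fa = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        miss = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        points.append((fa / n_genuine, miss / n_spoof))
    # Interpolate linearly where the false-alarm and miss rates cross.
    for (fa0, miss0), (fa1, miss1) in zip(points, points[1:]):
        d0, d1 = fa0 - miss0, fa1 - miss1
        if d0 >= 0 >= d1:
            if d0 == d1:
                return fa0
            w = d0 / (d0 - d1)
            return fa0 + w * (fa1 - fa0)
    raise ValueError("no crossing found")


# 90 spoof and 10 genuine instances; the model calls everything spoof.
labels = [1] * 90 + [0] * 10
predictions = [1] * 100
scores = [1.0] * 100  # a constant score carries no discriminative information

tp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
fp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 0)
fn = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 1)

accuracy = sum(1 for p, y in zip(predictions, labels) if p == y) / len(labels)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
equal_error_rate = eer(scores, labels)

print(f"accuracy={accuracy:.2%} f1={f1:.2%} eer={equal_error_rate:.2%}")
```

Under these assumptions, the accuracy is 90% and the F1-score is close to 95%, yet the EER is 50%, which is why EER is reported alongside the other two metrics.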
5. CONCLUSION
In this paper, a feature engineering approach that produces handcrafted features for artificial speech detection was proposed. The contribution of this paper lies in the proposed combination of features engineered using data transformation approaches with an RF classifier for artificial speech detection. Four types of image-based spectrogram and MFCC features were extracted to classify genuine and spoof speech. The ASVspoof 2015 dataset was used in the experiment to determine the effectiveness of the proposed approach against artificial speech. An experiment was run to compare the performance of the new features with the RF classifier for artificial speech detection. The results showed that the proposed approach could produce a model (Model 6) that uses an image filter called CLF to extract features from MFCC images and a random forest as the classifier. The combination of the MFCC CLF feature and the RF classifier generated a well-performing model, which yielded an EER, accuracy, and F1-score of 0.10%, 99.93%, and 99.96%, respectively, in detecting artificial speech.

However, in a real-world scenario, speech data are always exposed to various noises that degrade the audio quality. As the ASVspoof 2015 dataset contains only clean audio recordings, the proposed approach may not achieve similar performance when tested on a noise-added dataset. Hence, future work is directed toward testing the proposed approach on a noise-added dataset. Next, feature fusion and ensemble classifiers will be investigated to improve the performance further, and the detection will be expanded to other types of presentation attacks, such as replay attacks. In addition, more datasets will be used to evaluate the generalization capability of features engineered using a data transformation approach against previously unseen spoofing attacks. Lastly, the proposed approach will be integrated with ASV systems and tested on the ASVspoof 2019 dataset.
REFERENCES
[1] A. A. Mallouh, Z. Qawaqneh, and B. D. Barkana, “New transformed features generated by deep bottleneck extractor and a GMM–UBM classifier for speaker age and gender classification,” Neural Computing and Applications, vol. 30, no. 8, pp. 2581–2593, Oct. 2018, doi: 10.1007/s00521-017-2848-4.
[2] A. I. Abdurrahman and A. Zahra, “Spoken language identification using i-vectors, x-vectors, PLDA and logistic regression,” Bulletin of Electrical Engineering and Informatics, vol. 10, no. 4, pp. 2237–2244, Aug. 2021, doi: 10.11591/eei.v10i4.2893.
[3] G. Heigold, I. Moreno, S. Bengio, and N. Shazeer, “End-to-end text-dependent speaker verification,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Mar. 2016, pp. 5115–5119, doi: 10.1109/ICASSP.2016.7472652.
[4] S.-H. Yoon and H.-J. Yu, “A simple distortion-free method to handle variable length sequences for recurrent neural networks in text dependent speaker verification,” Applied Sciences, vol. 10, no. 12, Jun. 2020, doi: 10.3390/app10124092.
[5] Y. Gong, J. Yang, J. Huber, M. MacKnight, and C. Poellabauer, “REMASC: realistic replay attack corpus for voice controlled systems,” in Interspeech 2019, Sep. 2019, pp. 2355–2359, doi: 10.21437/Interspeech.2019-1541.
[6] Z. Wu et al., “ASVspoof: the automatic speaker verification spoofing and countermeasures challenge,” IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 4, pp. 588–604, Jun. 2017, doi: 10.1109/JSTSP.2017.2671435.
[7] T. Kinnunen et al., “The ASVspoof 2017 challenge: assessing the limits of replay spoofing attack detection,” in Interspeech 2017, Aug. 2017, pp. 2–6, doi: 10.21437/Interspeech.2017-1111.
[8] M. Todisco et al., “ASVSpoof 2019: future horizons in spoofed and fake audio detection,” in Interspeech 2019, Sep. 2019, vol. 2019-Septe, pp. 1008–1012, doi: 10.21437/Interspeech.2019-2249.
[9] M. Pal, D. Paul, and G. Saha, “Synthetic speech detection using fundamental frequency variation and spectral features,” Computer Speech and Language, vol. 48, pp. 31–50, Mar. 2018, doi: 10.1016/j.csl.2017.10.001.
[10] C. Demiroglu, O. Buyuk, A. Khodabakhsh, and R. Maia, “Postprocessing synthetic speech with a complex cepstrum vocoder for spoofing phase-based synthetic speech detectors,” IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 4, pp. 671–683, Jun. 2017, doi: 10.1109/JSTSP.2017.2673807.
[11] I. Ozer, Z. Ozer, and O. Findik, “Lanczos kernel based spectrogram image features for sound classification,” Procedia Computer Science, vol. 111, no. 2015, pp. 137–144, 2017, doi: 10.1016/j.procs.2017.06.020.
[12] P. Parasu, J. Epps, K. Sriskandaraja, and G. Suthokumar, “Investigating light-ResNet architecture for spoofing detection under mismatched conditions,” in Interspeech 2020, Oct. 2020, pp. 1111–1115, doi: 10.21437/Interspeech.2020-2039.
[13] J. Yang, R. K. Das, and H. Li, “Significance of subband features for synthetic speech detection,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 2160–2170, 2020, doi: 10.1109/TIFS.2019.2956589.
[14] A. Gomez-Alanis, A. M. Peinado, J. A. Gonzalez, and A. M. Gomez, “A light convolutional GRU-RNN deep feature extractor for ASV spoofing detection,” in Interspeech 2019, Sep. 2019, vol. 2019-Septe, pp. 1068–1072, doi: 10.21437/Interspeech.2019-2212.
[15] A. Gomez-Alanis, A. M. Peinado, J. A. Gonzalez, and A. M. Gomez, “A gated recurrent convolutional neural network for robust spoofing detection,” IEEE/ACM Transactions on Audio Speech and Language Processing, vol. 27, no. 12, pp. 1985–1999, Dec. 2019, doi: 10.1109/TASLP.2019.2937413.
[16] C. B. Tan et al., “A survey on presentation attack detection for automatic speaker verification systems: state-of-the-art, taxonomy, issues and future direction,” Multimedia Tools and Applications, vol. 80, no. 21–23, pp. 32725–32762, Sep. 2021, doi: 10.1007/s11042-021-11235-x.
[17] A. M. Badshah, J. Ahmad, N. Rahim, and S. W. Baik, “Speech emotion recognition from spectrograms with deep convolutional neural network,” in 2017 International Conference on Platform Technology and Service, PlatCon 2017 - Proceedings, Feb. 2017, vol. 24, no. 6, pp. 1–5, doi: 10.1109/PlatCon.2017.7883728.