TELKOMNIKA Indonesian Journal of Electrical Engineering
Vol. 14, No. 1, April 2015, pp. 116 ~ 122
DOI: 10.11591/telkomnika.v14i1.7216

Received December 20, 2014; Revised February 26, 2015; Accepted February 15, 2015
Robust Pitch Detection Based on Recurrence Analysis
and Empirical Mode Decomposition
Jingfang Wang
School of Information Science & Engineering, Hunan International Economics University,
Changsha, China, postcode: 410205
E-mail: matlab_bysj@126.com
Abstract
This paper presents a new pitch detection method based on recurrence analysis, combined with Empirical Mode Decomposition (EMD) and an Elliptic Filter (EF). The noisy voice is first filtered by an elliptic band-pass filter, and the EMD of the Hilbert-Huang Transform (HHT) is then applied to the filtered signal. The two Intrinsic Mode Functions (IMFs) produced by EMD that have the maximum correlation with the voice are summed into a synthetic signal, from which the pitch is easily separated. The results show that the new method performs better than the conventional autocorrelation algorithm and the cepstrum method, especially in segments where the distinction between unvoiced and voiced sound is not evident, and that it is highly robust in noisy environments.
Keywords: empirical mode decomposition, recurrence analysis, elliptic filter, intrinsic mode function, pitch detection
Copyright © 2015 Institute of Advanced Engineering and Science. All rights reserved.
1. Introduction
Pitch refers to the periodicity caused by vocal fold vibration during voiced speech; the pitch period is the reciprocal of the frequency of vocal fold vibration [1]. Pitch is one of the important parameters describing a speech signal, and it has a wide range of applications in tone recognition, emotion recognition, speech recognition, speaker recognition, speech synthesis and coding, music retrieval, diagnosis of vocal system disorders, language instruction for the hearing impaired, and many other areas [2]. Because speech is a dynamic, non-stationary random process, its waveform changes in extremely complex ways. The pitch period depends not only on the length, thickness and toughness of the individual's vocal folds and on pronunciation habits, but also on the speaker's age and gender, the intensity of articulation, emotion, and many other factors.
At present it is still hard to find a general approach that extracts the pitch period accurately and reliably in every case, so pitch period estimation has long been a hot and difficult topic in speech processing.
We use an elliptic band-pass filter (EF) [3] to preprocess the signal, eliminating high-order harmonic distortion, noise and singular points, so that the physically meaningful components of the signal stand out as a linear superposition. We then apply EMD and select, by correlation, the physically meaningful mode signals we need. Finally, the pitch is detected from the dynamic characteristics of the voice by recurrence analysis. In tests on a variety of voices with additive broadband noise, the method detects the pitch period accurately, further reduces the detection error, and shows good robustness.
2. Elliptic Filter with the Pitch Detection Process
2.1. Elliptic filter
The elliptic filter [3], also known as the Cauer filter, is equiripple in both the passband and the stopband. Compared with other filter types of the same order, the elliptic filter has the smallest passband and stopband ripple, and its gain falls off rapidly in the transition band, so the transition band is very narrow. It ripples in both the passband and the stopband, which distinguishes it from the Butterworth filter, whose passband and stopband are both flat, and from the Chebyshev filters, which have either a flat passband with an equiripple stopband or an equiripple passband with a flat stopband.
TELKOMNIKA ISSN: 2302-4046
This paper uses a 4th-order elliptic band-pass filter with a maximum passband ripple of 0.05 dB, a minimum stopband attenuation of 80 dB, and a normalized passband of 2 * [75, 500] / fs, where fs is the sampling frequency (Hz). For fs = 19.98 kHz this gives filter (1) (coefficients omitted).
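Since the coefficients of filter (1) are omitted, the following minimal sketch shows one way a filter with the stated specification could be designed using SciPy. It is an illustration rather than the author's implementation: the use of `scipy.signal.ellip`, the second-order-section output, and the helper name `bandpass` are assumptions. Note that SciPy's order argument 4 yields an 8th-order band-pass filter; whether the paper's "4th-order" refers to the prototype or to the final filter is not stated.

```python
import numpy as np
from scipy import signal

FS = 19980.0  # sampling frequency in Hz, as stated in the text

# Order 4, 0.05 dB passband ripple, 80 dB stopband attenuation, 75-500 Hz.
# Second-order sections are an assumption made here for numerical stability.
sos = signal.ellip(4, 0.05, 80, [75.0, 500.0], btype="bandpass",
                   output="sos", fs=FS)

def bandpass(x):
    """Filter a 1-D signal with the band-pass sections designed above."""
    return signal.sosfilt(sos, np.asarray(x, dtype=float))

# Gain in dB at one in-band (200 Hz) and one out-of-band (5000 Hz) frequency.
_, h = signal.sosfreqz(sos, worN=[200.0, 5000.0], fs=FS)
gain_db = 20 * np.log10(np.abs(h))
```

The in-band gain should stay within the 0.05 dB ripple, while the out-of-band gain should lie at or below -80 dB.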
2.2. Pitch Detection Process
The noisy speech is passed through the 4th-order elliptic band-pass filter, which removes the high-frequency content and the low-frequency content below 60 Hz. The standard deviation Q0 of the first N0 filtered samples is taken as the standard deviation of the initial noise segment; it serves as the criterion for invoking EMD. The signal is then split into frames 20-30 ms long, and a double-threshold voicing decision based on the recurrence degree is made for each frame. The pitch of an unvoiced frame is set to zero. For a voiced frame, the size of Q0 is examined. If Q0 < α (e.g. α = 0.15), the pitch is obtained directly by recurrence analysis. Otherwise the quasi-variance (standard deviation) Q of the frame is calculated. When Q < kQ0 (k is a constant), the frame is treated as unvoiced and its pitch is set to zero. Otherwise the frame is decomposed by EMD into IMF components on different scales, the correlation of each IMF with the signal before decomposition is calculated, the two IMFs with the maximum correlation are summed into a synthetic pitch signal, and the pitch is computed a second time from this synthetic signal.
Recurrence analysis of a voiced frame proceeds as follows. For each line of the recurrence plot parallel to the main diagonal, count the recurrence points DG(k), k = 1, 2, ..., N-1, where N is the frame length. Let

$$n_0 = \left[ \frac{f_s}{f_0} \right]$$

where $f_s$ is the sampling frequency, $f_0$ is the upper frequency limit for the pitch, and [x] denotes the greatest integer not exceeding x. Find the lag n at which max(DG(k), k > $n_0$) is attained. The pitch frequency is then:

$$f_J = \frac{f_s}{n}$$
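The per-frame decision flow described above can be sketched as follows. This is a minimal illustration under stated assumptions: the three callables `is_voiced`, `pitch_by_recurrence` and `emd_synthesize` are hypothetical placeholders for the voicing decision, the recurrence-based pitch estimate and the EMD synthesis step, and the default `k = 2.0` is an arbitrary stand-in because the text only says that k is a constant.

```python
import numpy as np

def frame_pitch(frame, q0, is_voiced, pitch_by_recurrence, emd_synthesize,
                alpha=0.15, k=2.0):
    """Return the pitch (Hz) of one frame, or 0.0 for an unvoiced frame.

    q0 is the standard deviation of the initial noise segment. The three
    callables are hypothetical stand-ins; k = 2.0 is a placeholder value.
    """
    frame = np.asarray(frame, dtype=float)
    if not is_voiced(frame):          # double-threshold voicing decision
        return 0.0
    if q0 < alpha:                    # weak noise: analyse the frame directly
        return pitch_by_recurrence(frame)
    q = np.std(frame)                 # spread Q of the current frame
    if q < k * q0:                    # frame not clearly above the noise floor
        return 0.0
    # Strong noise: synthesize a cleaner signal by EMD, then estimate again.
    return pitch_by_recurrence(emd_synthesize(frame))
```

Passing the processing steps as arguments keeps the branch structure visible without committing to any particular implementation of the individual steps.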
3. Empirical Mode Decomposition (EMD) and Automatic Pitch Synthesis
3.1. Empirical Mode Decomposition (EMD)
Given a signal x(t), EMD extracts its IMF components through the following steps.
First, find all local maxima and all local minima of the signal and fit each set by cubic spline interpolation to obtain the upper and lower envelopes, such that all data points lie between the two envelopes. Averaging the upper and lower envelopes point by point gives a mean curve $m_1(t)$. Subtracting it from the signal gives a new data sequence $h_1^{(1)}(t)$:

$$x(t) - m_1(t) = h_1^{(1)}(t) \qquad (2)$$

If $h_1^{(1)}(t)$ meets the conditions of an IMF component, it is the first-order IMF component. Otherwise the process is repeated on $h_1^{(1)}(t)$, n times in total, until $h_1^{(n)}(t)$ meets the convergence criterion. The first-order IMF component of x(t) is then:

$$C_1(t) = h_1^{(n)}(t) \qquad (3)$$

$C_1(t)$ contains the highest-frequency components. Subtracting $C_1(t)$ from the original signal gives the first-order residual term $r_1(t)$:

$$x(t) - C_1(t) = r_1(t) \qquad (4)$$

The process is then repeated on $r_1(t)$ to obtain the second-order IMF component $C_2(t)$. Continuing in this way, EMD yields a number of IMF components and a residual component $r_n$, at which point the decomposition is complete. After the decomposition, the original signal x(t) can be expressed as:

$$x(t) = \sum_{i=1}^{n} C_i(t) + r_n(t) \qquad (5)$$

The IMF components $C_i(t)$ of each order reflect the characteristics of the signal at different time scales and represent the inherent modes of the nonlinear signal from high-frequency to low-frequency vibration. The signal characteristics can therefore be displayed at different resolutions, which gives a multiresolution capability. The residual $r_n(t)$ is the trend term or mean of x(t). EMD avoids the energy loss and energy leakage caused by the wavelet transform, and the original signal can be reconstructed from (5).
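The sifting procedure of equations (2) to (5) can be sketched as below. This is a minimal illustration, not the author's code: the boundary treatment (both envelopes pinned to the end samples), the stopping tolerance `tol`, and the limits `max_imfs` and `max_sift` are assumptions, since the text does not specify the convergence criterion.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def _mean_envelope(h):
    """Mean of the cubic-spline envelopes through the maxima and minima."""
    t = np.arange(len(h))
    maxima = argrelextrema(h, np.greater)[0]
    minima = argrelextrema(h, np.less)[0]
    if len(maxima) < 2 or len(minima) < 2:
        return None                       # too few extrema: h is a residual
    # Pin both envelopes to the end samples (a simple boundary assumption).
    up = np.r_[0, maxima, len(h) - 1]
    lo = np.r_[0, minima, len(h) - 1]
    upper = CubicSpline(up, h[up])(t)
    lower = CubicSpline(lo, h[lo])(t)
    return 0.5 * (upper + lower)

def emd(x, max_imfs=8, max_sift=50, tol=0.05):
    """Return (imfs, residual) such that x = imfs.sum(axis=0) + residual."""
    residual = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        h = residual.copy()
        if _mean_envelope(h) is None:
            break                              # nothing left to extract
        for _ in range(max_sift):
            m = _mean_envelope(h)
            if m is None:
                break
            h_new = h - m                      # equation (2)
            # Assumed criterion: stop when successive iterates barely change.
            done = np.sum((h - h_new) ** 2) <= tol * np.sum(h ** 2)
            h = h_new
            if done:
                break
        imfs.append(h)                         # equation (3)
        residual = residual - h                # equation (4)
    return np.array(imfs), residual
```

Because each IMF is subtracted from the running residual, the sum of the returned IMFs plus the residual reproduces the input as in equation (5), up to rounding.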
3.2. Automatic Pitch Synthesis
After the noisy speech has passed through the elliptic filter (1), the filtered signal x(t) consists mainly of the pitch component. When the in-band noise is still strong (Q0 is large), x(t) is decomposed by EMD as in (5), and the correlation coefficient between x(t) and each IMF is calculated:

$$R(i) = \frac{\mathrm{cov}(x, C_i)}{\mathrm{STD}(x) \cdot \mathrm{STD}(C_i)}, \quad i = 1, 2, \ldots, n \qquad (6)$$

where cov is the covariance and STD is the standard deviation. Sort R(i) in descending order and let i(1) and i(2) be the indices of the two largest values. The synthetic pitch signal is then:

$$x_J(t) = C_{i(1)}(t) + C_{i(2)}(t) \qquad (7)$$
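Equations (6) and (7) amount to ranking the IMFs by their Pearson correlation with the filtered signal and summing the top two. A minimal sketch, assuming the IMFs are available as the rows of an array; the function name `synthesize_pitch_signal` is hypothetical, and mapping undefined correlations to -1 is an added safeguard that the text does not mention.

```python
import numpy as np

def synthesize_pitch_signal(x, imfs):
    """Sum the two IMFs most correlated with x (equations (6) and (7))."""
    x = np.asarray(x, dtype=float)
    imfs = np.asarray(imfs, dtype=float)
    # R(i) = cov(x, C_i) / (STD(x) * STD(C_i)) is the Pearson coefficient.
    r = np.array([np.corrcoef(x, c)[0, 1] for c in imfs])
    # A constant IMF gives an undefined coefficient; rank it last (assumption).
    r = np.nan_to_num(r, nan=-1.0)
    best = np.argsort(r)[::-1][:2]        # indices i(1), i(2)
    return imfs[best].sum(axis=0), r
```

The function returns the coefficients as well, so the ranking can be inspected.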
4. Recurrence Analysis
Recurrence analysis is a nonlinear dynamic analysis method. It is based on phase space reconstruction and reflects the regularities of the reconstructed chaotic attractor. Signals in states of different nature follow different trajectories, and the structure of their recurrence plots (RP) differs accordingly [4, 5]. The algorithm is as follows:
(1) Select an appropriate time delay τ and embedding dimension m, and reconstruct the one-dimensional nonlinear time series x(i). The resulting dynamic system is:

$$X_i = \big(x(i),\; x(i+\tau),\; \ldots,\; x(i+(m-1)\tau)\big) \qquad (8)$$

The one-dimensional time series is thus mapped to a trajectory in an m-dimensional phase space, which, from the perspective of dynamical systems, recovers the attractor in the high-dimensional space.
(2) Take the phase-space vectors $X_i$ (rows) and $X_j$ (columns) and calculate the distance between them:

$$S_{ij} = \| X_i - X_j \| \qquad (9)$$

where $\|\cdot\|$ is the Euclidean norm.
(3) Calculate the recurrence values:

$$R_{ij} = \Theta(\varepsilon - S_{ij}) \qquad (10)$$
where ε is the critical distance and $\Theta(x)$ is the step (Heaviside) function:

$$\Theta(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
Plotting the matrix $R_{ij}$ over the phase-space points gives a two-dimensional picture of the internal dynamics of the nonlinear time series, the recurrence plot (RP). A position with $R_{ij} = 1$ is drawn as a black point and a position with $R_{ij} = 0$ as a white point, so the RP describes the time series through its pattern of black and white points.
To analyze the signal quantitatively from a statistical point of view, the recurrence degree of the signal is defined on the recurrence plot [5]:

$$R = \frac{1}{N^2} \sum_{i,j=1}^{N} R_{ij} \qquad (11)$$
where N is the number of points (row and column vectors) in the RP. Clearly, R is a cumulative distribution function: it is the probability that the distance between two points on the phase-space attractor is less than ε, and it describes how densely the phase points gather within distance ε of a reference point $X_i$. R is therefore the correlation integral of the attractor [6].
If ε is chosen too small, almost every distance $\|X_i - X_j\|$ is larger than ε, so $\Theta(x) = 0$ for those pairs and the sum gives R close to 0 (only the main diagonal recurs), which indicates that the phase points all lie outside the ε-neighborhood. If ε is chosen too large, no pair of points is farther apart than ε, and R = 1. Therefore an ε that is too large or too small cannot reflect the internal nature of the system. In general, ε should be chosen so that R takes a meaningful value well inside the range 0 ≤ R ≤ 1. The recurrence degree of a signal thus provides a theoretical tool for analyzing the dynamic complexity of signals.
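The construction in equations (8) to (11) can be sketched with NumPy as follows. This is a minimal illustration under assumptions: the function names are hypothetical, the Heaviside step is taken as 1 at zero (so a pair recurs when its distance is at most ε), and the full N-by-N distance matrix is formed in memory, which is only practical for short frames.

```python
import numpy as np

def embed(x, m, tau):
    """Rows are the delay vectors X_i of equation (8)."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau            # number of complete delay vectors
    return np.stack([x[j * tau : j * tau + n] for j in range(m)], axis=1)

def recurrence_matrix(x, eps, m=1, tau=1):
    """R_ij = Theta(eps - ||X_i - X_j||), equations (9) and (10)."""
    X = embed(x, m, tau)
    S = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return (S <= eps).astype(int)

def recurrence_degree(R):
    """Fraction of recurrent pairs, equation (11)."""
    return R.sum() / R.shape[0] ** 2
```

For the ramp `[0, 1, 2, 3]` with m = 1, a very small ε leaves only the main diagonal (degree 1/4), while an ε larger than every distance gives a degree of 1, matching the two limiting cases discussed above.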
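Points with $R_{ij} = 1$ on the lines parallel to the main diagonal are called recurrence points. Their number on the k-th such line is:

$$D(k) = \sum_{i=1}^{N-k} R_{i,i+k}, \quad k = 1, 2, \ldots, N-1 \qquad (12)$$

Its size reflects how strongly periodic the system is. In speech signal processing we take the embedding dimension m = 1 and the time delay τ = 0. With sampling frequency fs (Hz) and an upper pitch frequency limit of 500 Hz, the smallest admissible lag is $k_s = [f_s / 500]$, and the lag $k_0$ of the pitch period satisfies:

$$D(k_0) = \max\{ D(k),\; k = k_s, k_s + 1, \ldots, N-1 \} \qquad (13)$$

The pitch frequency then follows as $f_s / k_0$, as in Section 2.2.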
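Equations (12) and (13) can be illustrated with a short sketch for the case m = 1 used in the paper. The text does not say how ε is chosen, so setting it to a fixed fraction of the frame's standard deviation (`eps_ratio`) is an assumption made only for this example, as is the function name.

```python
import numpy as np

def pitch_from_recurrence(frame, fs, f_max=500.0, eps_ratio=0.1):
    """Estimate the pitch (Hz) of one frame from diagonal recurrence counts."""
    x = np.asarray(frame, dtype=float)
    n = len(x)
    eps = eps_ratio * np.std(x)             # assumed rule for the threshold
    # With m = 1 the distance S_ij is simply |x(i) - x(j)|.
    R = (np.abs(x[:, None] - x[None, :]) <= eps).astype(int)
    # D(k): recurrence points on the k-th diagonal, equation (12).
    D = np.array([np.trace(R, offset=k) for k in range(n)])
    ks = int(fs // f_max)                   # smallest admissible lag
    k0 = ks + int(np.argmax(D[ks:]))        # equation (13)
    return fs / k0, D
```

For example, a 25 ms frame of a pure 200 Hz sinusoid sampled at 8000 Hz repeats exactly every 40 samples, so every point on the 40th diagonal recurs and the estimate is 8000 / 40 = 200 Hz.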
5. Experimental Evaluation
Background noise is taken from the Noisex-92 database [9], whose sampling frequency is fs = 19.98 kHz. Using the same sampling frequency fs, we recorded the utterance "language, tone, endpoint" on a computer in an indoor noise environment. It is shown in Figure 1(a) together with the frame-by-frame voicing decision line produced by the method. The speech is processed in frames of 25 ms, which gives a frame length of N = [0.025 fs] points and a frame shift of N/2.
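As a worked example of the framing rule, fs = 19980 Hz gives N = [0.025 × 19980] = [499.5] = 499 points. The sketch below splits a signal accordingly. The function name is hypothetical, and rounding the frame shift N/2 down to an integer and dropping the incomplete final frame are assumptions that the text does not address.

```python
import numpy as np

def split_frames(x, fs, frame_ms=25.0):
    """Split x into frames of N = [frame_ms * fs / 1000] points, shift N // 2."""
    x = np.asarray(x, dtype=float)
    n = int(fs * frame_ms // 1000)    # N = [0.025 fs] for 25 ms frames
    hop = n // 2                      # frame shift N/2, rounded down
    starts = range(0, len(x) - n + 1, hop)
    return np.array([x[s:s + n] for s in starts])
```

One second of signal at 19980 Hz yields 79 overlapping frames of 499 samples each.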
Experiment 1: The method was applied to the original voice and to the original voice mixed with white noise (white) from the Noisex-92 library at signal-to-noise ratios of 10 dB, 5 dB, 0 dB and -5 dB. The pitch detection results are shown in Figure 1. In the left panels the horizontal axis is time (seconds) and the vertical axis is amplitude. In the middle and right panels the horizontal axis is the frame number, and the vertical axes are the pitch frequency (Hz) and the recurrence degree of the elliptic-filtered signal, respectively. The left panels show the speech mixed with the different noise levels (blue), the elliptic-filtered signal (black) and the voicing decision. The middle panels show the pitch frequency detected by the algorithm. The right panels show the recurrence degree of the elliptic-filtered signal and the double-threshold lines used for the voicing decision.
Figure 1. Fundamental frequency detection by the algorithm for the original voice mixed with white noise (white) at different SNRs
Experiment 2: Non-stationary noise. The method was applied to the original voice and to the original voice mixed with car noise (volvo), destroyer engine noise (destroyerengine), factory noise (factory) and babble noise (babble) from the Noisex-92 library. The pitch detection results at a signal-to-noise ratio (SNR) of 0 dB are shown in Figure 2, with the same legend as above.
Figure 2. Fundamental frequency detection by the algorithm for the original speech mixed with different noises (SNR = 0 dB)
(a) Original speech and the voicing decision, (a1) pitch frequency detected for the original speech, (a2) recurrence degree of the original speech signal;
(b) Speech mixed with car noise (volvo) (SNR = 0 dB) and the voicing decision, (b1) pitch frequency detected for the speech mixed with car noise, (b2) recurrence degree of the low-frequency signal for the speech mixed with car noise;
(c) Speech mixed with destroyer engine noise (destroyerengine) (SNR = 0 dB) and the voicing decision, (c1) pitch frequency detected for the speech mixed with engine noise, (c2) recurrence degree of the low-frequency signal for the speech mixed with engine noise;
(d) Speech mixed with factory noise (factory) (SNR = 0 dB) and the voicing decision, (d1) pitch frequency detected for the speech mixed with factory noise, (d2) recurrence degree of the low-frequency signal for the speech mixed with factory noise;
(e) Speech mixed with babble noise (babble) (SNR = 0 dB) and the voicing decision, (e1) pitch frequency detected for the speech mixed with babble noise, (e2) recurrence degree of the low-frequency signal for the speech mixed with babble noise.
Experiment 3: The TIMIT speech database. Here the performance of the new method is compared with the traditional center-clipping autocorrelation method [10] and the cepstrum method [11]. The performance indicators used are as follows:
1) Voicing accuracy (ASR, Acoup Sur Ratio): the number of frames in which the presence or absence of a fundamental frequency is determined correctly, as a percentage of the total number of frames. The higher this index, the better the method decides whether the speech is periodic.
2) Valid pitch relative error (VPRE, Valid Pitch Relative Error): over the frames in which the reference fundamental frequency is non-zero and the computed fundamental frequency is also non-zero, the root mean square error between the computed and the reference fundamental frequency, divided by the mean reference fundamental frequency. The lower this index, the more accurate the algorithm.
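The two indicators can be written down as a short sketch. The original definitions are terse, so this encodes one reading of them and should be treated as an assumption: ASR counts frames whose voiced/unvoiced decision agrees with the reference, and VPRE is the RMS pitch error over frames that are voiced in both tracks, divided by the mean reference pitch over those frames. The function names are hypothetical.

```python
import numpy as np

def voicing_accuracy(ref_f0, est_f0):
    """Percentage of frames whose voiced/unvoiced decision matches (ASR)."""
    ref = np.asarray(ref_f0, dtype=float)
    est = np.asarray(est_f0, dtype=float)
    return 100.0 * np.mean((ref > 0) == (est > 0))

def valid_pitch_relative_error(ref_f0, est_f0):
    """RMS pitch error over jointly voiced frames / mean reference pitch."""
    ref = np.asarray(ref_f0, dtype=float)
    est = np.asarray(est_f0, dtype=float)
    both = (ref > 0) & (est > 0)          # frames voiced in both tracks
    rmse = np.sqrt(np.mean((est[both] - ref[both]) ** 2))
    return rmse / np.mean(ref[both])
```

With reference pitches `[0, 100, 200, 0]` and estimates `[0, 110, 190, 150]`, three of four voicing decisions agree (75 percent), and the two jointly voiced frames give an RMS error of 10 Hz against a mean reference of 150 Hz.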
As can be seen from Table 1, the new method has a lower voicing error rate than the traditional autocorrelation and cepstrum methods, of which the cepstrum method is the worst. The main reason is that the cepstrum (or complex cepstrum) method distinguishes voiced from unvoiced sound, and estimates the pitch period, only by whether a peak is present at the pitch lag. In some voiced frames the peak is not particularly prominent, and in unvoiced frames occasional peaks still appear, which leads to more voicing misjudgments and a larger valid pitch error. The autocorrelation method uses a relatively fixed clipping threshold and suffers from half-frequency and doubled-frequency (octave) errors, which also increase the valid pitch error. In the new method, the elliptic filter removes the high frequencies, and empirical mode decomposition (EMD) further filters the signal and removes the half-frequency and second-harmonic generation (SHG) components. Information that is not needed for pitch detection is thus effectively filtered out, and signals of different amplitudes are simplified, which improves the voicing classification rate and reduces the valid pitch error.
6. Conclusions and Outlook
As the signal-to-noise ratio decreases, the attractor recovered by phase space reconstruction becomes increasingly blurred, and the ability of recurrence analysis to characterize the complexity of the signal sequence drops. The empirical mode decomposition (EMD) of the Hilbert-Huang transform yields a finite number of intrinsic mode functions (IMFs). Each IMF component typically has a real physical meaning and carries the information of the signal in one frequency band at its characteristic scale, so the mode functions reflect the frequency characteristics of the signal at any time. This paper therefore first uses empirical mode decomposition to reduce the noise in the noisy speech signal, and then uses the attractor obtained by high-dimensional phase space reconstruction to describe the characteristics of the signal, in order to extract the pitch accurately and to locate the start and end points of speech.
The voice signal is a one-dimensional time-domain signal. It is processed with a combination of the elliptic filter (EF), empirical mode decomposition (EMD) with correlation-based selection, and recurrence analysis. The results show that the method effectively suppresses noise, highlights the periodic structure of the signal, and weakens the half-frequency phenomenon caused by formant (resonance frequency) peaks. It accurately distinguishes voiced from unvoiced sound for low-pitched voices, discriminates the pitch more accurately in voicing transition segments, and the algorithm is simple and fast. Experiments show that the method resists noise interference and is robust: it accurately extracts the pitch period while both preserving signal detail and suppressing noise.
References
[1] Chun J, Sying J, Zhang R. TRUES: Tone recognition using extended segments. ACM Transactions on Asian Language Information Processing. 2008; 7(3).
[2] Ferrer C, Torres D, Hernandez Diaz ME. Contours in the Evaluation of Cycle-to-Cycle Pitch Detection Algorithms. Proceedings of the 13th Iberoamerican Congress on Pattern Recognition. 2008.
[3] Gao Quan, Ding Yume, Wide Yonghong. Digital signal processing theory, implementation, and application. Beijing: Electronic Industry Press. 2007: 144-151.
[4] Marwan N, Thiel M, Nowaczyk NR. Cross recurrence plot based synchronization of time series. Nonlinear Proc. Geophys. 2002; 9: 325-331.
[5] Yan Yun-Qiang, Zhu Yi-Sheng. The voice signal endpoint detection method is based on a recursive analysis. Communications. 2007; 28(1): 35-39.
[6] Spib Noise data. http://spib.rice.edu/spib/select_noise.html.
[7] Rabiner LR. On the use of autocorrelation analysis for pitch detection. IEEE Trans ASSP. 1977; 25(1): 24-33.
[8] Kobayashi H, Shimamura T. A modified cepstrum method for pitch extraction. IEEE APCCAS. 1998: 299-302.