TELKOMNIKA Indonesian Journal of Electrical Engineering
Vol. 12, No. 12, December 2014, pp. 8205 ~ 8211
ISSN: 2302-4046
DOI: 10.11591/telkomnika.v12i12.6482
Received July 14, 2014; Revised October 17, 2014; Accepted November 4, 2014
Pitch Detection Based on EMD and the Second Spectrum
Jingfang Wang
School of Information Science and Engineering, Hunan International Economics University, Changsha, China, postcode: 410205
email: matlab_bysj@126.com
Abstract
This paper designs a new pitch detection method based on the secondary spectrum. The noisy speech is first passed through an elliptic (Elliptic Filter, EF) band-pass filter. The empirical mode decomposition (EMD) of the Hilbert-Huang transform (HHT) is then used to decompose the signal into a finite number of intrinsic mode functions (IMFs). The correlation between each IMF component at a different scale and the signal before decomposition is calculated, and the two IMFs with the largest correlation are combined into a synthetic pitch signal for detection. Experimental results show that the method performs better than the traditional autocorrelation and cepstrum methods, especially in segments with clear voicing features; it gives better pitch detection in noisy speech and remains robust at low signal-to-noise ratio (SNR).
Keywords: empirical mode decomposition (EMD), elliptic filter (EF), intrinsic mode function (IMF), secondary spectrum, pitch detection
Copyright © 2014 Institute of Advanced Engineering and Science. All rights reserved.
1. Introduction
Pitch refers to the periodicity caused by vocal fold vibration during voiced speech; the pitch period is the reciprocal of the vocal fold vibration frequency [1]. Pitch is one of the most important parameters describing a speech signal and has a wide range of applications in tone recognition, emotion recognition, speech recognition, speaker recognition, speech synthesis and coding, music retrieval, voice diagnosis, hearing impairment, language instruction, and many other areas [2]. Because speech is a dynamic, non-stationary random process, its waveform varies in extremely complex ways: the pitch period depends not only on the size, thickness, and toughness of an individual's vocal folds and on pronunciation habits, but also on age, gender, pronunciation intensity, emotional articulation, and many other factors. At present it remains hard to find a general approach that extracts the pitch period accurately and reliably in every case, so pitch period estimation has long been both a hot topic and a difficult problem in speech processing.
Pitch estimation is usually called pitch detection. Current pitch detection methods are mainly based on the traditional speech model and can be divided into time-domain, frequency-domain, and hybrid time-frequency methods. The most representative are the time-domain autocorrelation function (ACF) method and the average magnitude difference function (AMDF) method [3]. However, the ACF method is prone to "frequency-doubling" and "half-frequency" errors, and the AMDF method cannot effectively track rapid changes in the voice, so when the frequency and amplitude change rapidly, pitch detection accuracy drops significantly. The commonly used frequency-domain cepstrum method [4] introduces a logarithm operation that significantly increases the computational load, and it is also vulnerable to noise, which lowers detection accuracy. Time-frequency methods based on the wavelet transform are likewise highly sensitive to noise: as the speech signal-to-noise ratio decreases, detection errors increase.
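For reference, the two classic time-domain baselines can be sketched on a single frame as follows. This is a minimal illustration, not the implementation of [3]: the 75-500 Hz search range (borrowed from the band-pass filter in Section 2.1) and all function names are assumptions of the sketch, and the frame must be longer than the largest lag searched.

```python
import numpy as np

def acf(x, max_lag):
    """Short-time autocorrelation r(k) = sum_n x(n) x(n+k) for k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(max_lag + 1)])

def amdf(x, max_lag):
    """Average magnitude difference d(k) = mean_n |x(n) - x(n+k)| for k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.mean(np.abs(x[: n - k] - x[k:])) for k in range(max_lag + 1)])

def lag_range(fs, fmin=75.0, fmax=500.0):
    """Admissible pitch lags in samples for an assumed 75-500 Hz search range."""
    return int(fs // fmax), int(fs // fmin)

def pitch_acf(x, fs):
    lo, hi = lag_range(fs)
    r = acf(x, hi)
    return fs / (lo + int(np.argmax(r[lo : hi + 1])))  # highest ACF peak -> period

def pitch_amdf(x, fs):
    lo, hi = lag_range(fs)
    d = amdf(x, hi)
    return fs / (lo + int(np.argmin(d[lo : hi + 1])))  # deepest AMDF valley -> period
```

Every integer multiple of the true period also produces an ACF peak or an AMDF valley, so selecting the wrong candidate yields exactly the half-frequency and frequency-doubling errors described above.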
In 1998, Norden E. Huang proposed an adaptive decomposition scheme for non-stationary signals [5]. The signal is first decomposed by empirical mode decomposition (EMD) into a series of single-frequency, narrow-band mode components whose linear superposition reconstructs the original signal; the Hilbert transform is then applied to each single-band component to obtain the instantaneous frequency and the time-frequency distribution of the signal. This scheme is called the Hilbert-Huang Transform (HHT). The method has a prominent problem: if the signal contains singularities, the first mode component extracted by EMD will contain the singular points, while the normal signal component is pushed into the next mode, and so on level by level, so that a physically meaningful, complete signal component is split across different mode components. There is no unified solution to this problem; it has to be handled according to the specific characteristics of the signal being analyzed. To address it, we use an elliptic band-pass filter (Elliptic Filter) [6] to preprocess the signal and remove high-order harmonic distortion, noise, and singular points, so that the physically meaningful signal components, whose complete linear superposition forms the signal, stand out. We then use EMD to select the mode signals correlated with the physically meaningful component we need, and combine them with the secondary spectrum for pitch detection. Tests on a variety of speech signals with additive broadband noise show that the method detects the pitch period accurately, further reduces the detection error, and is robust.
2. Elliptic Filter and the Pitch Detection Process
2.1. Elliptic Filter
The elliptic filter [6], also known as the Cauer filter, is equiripple in both the passband and the stopband. Compared with other filter types of the same order and with the same passband and stopband ripple, its response falls off fastest in the transition band, so the transition band is very narrow. Both its passband and its stopband ripple, which distinguishes it from the Butterworth filter, whose passband and stopband are both flat, and from the Chebyshev filters, which have either an equiripple passband with a monotonic stopband or a flat passband with an equiripple stopband.
This paper uses a 4th-order elliptic band-pass filter with a maximum passband attenuation of 0.05 dB, a minimum stopband attenuation of 80 dB, and a normalized passband of 2 × [75, 500]/fs, where fs is the sampling frequency (Hz). With fs = 19.98 kHz, this specification yields the filter used here (its coefficients are omitted).
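The normalization 2 × [75, 500]/fs follows the MATLAB convention. A minimal SciPy sketch of the same specification is shown below; it assumes scipy.signal is available and uses second-order sections instead of transfer-function coefficients for numerical stability, which is a choice of this sketch rather than of the paper. As with MATLAB's ellip, an order-4 band-pass design in SciPy yields an 8th-order filter.

```python
import numpy as np
from scipy import signal

FS = 19980.0  # sampling frequency used in the paper (19.98 kHz)

# Order-4 elliptic band-pass design: 0.05 dB max passband ripple, 80 dB min
# stopband attenuation, passband 75-500 Hz. SciPy (like MATLAB) doubles the
# order for band-pass designs, so this is an 8th-order filter in 4 sections.
SOS = signal.ellip(4, 0.05, 80, [75.0, 500.0], btype="bandpass", output="sos", fs=FS)

def ef_prefilter(x):
    """Apply the elliptic band-pass filter causally to a speech signal."""
    return signal.sosfilt(SOS, np.asarray(x, dtype=float))
```

signal.sosfreqz(SOS, worN=[200.0, 5000.0], fs=FS) can confirm that the gain stays within 0.05 dB of unity in the passband and below -80 dB in the stopband; zero-phase filtering with signal.sosfiltfilt is an alternative if the filter's group delay matters.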
2.2. Pitch Detection Process
The noisy speech is passed through the 4th-order elliptic band-pass filter, which removes high-frequency components and low-frequency components below 60 Hz. The first N0 filtered samples are taken as the initial noise segment, and their standard deviation Q0 is computed; Q0 later determines whether EMD is applied. The signal is then divided into frames 20-30 ms long. For each frame signal x(i), i = 1, 2, ..., L, the average energy

$$M = \frac{1}{L}\sum_{i=1}^{L} x^{2}(i) \qquad (1)$$

is computed, and a double-threshold voicing decision is made on it. The pitch of an unvoiced frame is set to zero. For a voiced frame, the initial noise standard deviation Q0 is examined: if Q0 < α (e.g., α = 0.15), the pitch frequency is computed directly from the secondary spectrum; otherwise, the standard deviation Q of the voiced frame is computed. When Q < kQ0 (k is a constant), the frame is treated as unvoiced and its pitch is set to zero. Otherwise, the frame is decomposed by EMD, the correlation between each IMF component at a different scale and the signal before decomposition is computed, the two IMFs with the largest correlation are added to form a synthetic pitch signal, and the secondary spectrum of this synthesized signal is used to compute the pitch.
Figure 1. Pitch detection process
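The per-frame decision logic of Figure 1 can be sketched as below. The paper does not give the exact double-threshold rule or the value of k, so the hysteresis voicing rule, its thresholds, and k = 1.5 are hypothetical placeholders; only α = 0.15 and the three outcomes (unvoiced, direct secondary spectrum, EMD path) come from the text.

```python
import numpy as np

def frame_energy(x):
    """Eq. (1): average energy M = (1/L) * sum x(i)^2 of one frame."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(x ** 2))

def double_threshold_voicing(energies, high, low):
    """Hypothetical hysteresis rule: a voiced run starts when the energy exceeds
    `high` and continues while it stays above `low`."""
    voiced, state = [], False
    for e in energies:
        state = bool(e > low) if state else bool(e > high)
        voiced.append(state)
    return voiced

def route_frame(frame, is_voiced, q0, alpha=0.15, k=1.5):
    """Return the branch of Figure 1 taken by one frame (k = 1.5 is a placeholder)."""
    if not is_voiced:
        return "unvoiced"            # pitch set to zero
    if q0 < alpha:
        return "direct"              # secondary spectrum of the filtered frame
    q = float(np.std(frame))         # standard deviation Q of the voiced frame
    if q < k * q0:
        return "unvoiced"            # close to the noise level: pitch set to zero
    return "emd"                     # EMD, IMF selection, then secondary spectrum
```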
For a voiced frame x, the secondary spectrum is computed as

$$f_x = \mathrm{IFFT}\left(\left|\mathrm{FFT}(x)\right|^{2}\right).$$

Let

$$n_0 = \left[\frac{f_s}{f_0}\right],$$

where $f_s$ is the sampling frequency, $f_0$ is the upper frequency limit of the pitch, and $[z]$ denotes the greatest integer not exceeding $z$. Find the maximum of $f_x(i)$ over $i > n_0$ and let $n$ be the index at which it occurs; the pitch frequency is then

$$f = \frac{f_s}{n}.$$
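By the Wiener-Khinchin relation, IFFT(|FFT(x)|²) is the autocorrelation of the frame, so the secondary-spectrum rule picks the strongest autocorrelation lag beyond n0. A minimal NumPy sketch follows. Zero-padding to 2N, so that the result is the linear rather than circular autocorrelation, is an assumption of this sketch (the paper applies the FFT to the frame directly), and the default f0_max = 500 Hz simply mirrors the filter's upper passband edge.

```python
import numpy as np

def secondary_spectrum_pitch(x, fs, f0_max=500.0):
    """Pitch (Hz) of one voiced frame from fx = IFFT(|FFT(x)|^2)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    # Zero-pad to 2N so the result is the linear (not circular) autocorrelation.
    X = np.fft.fft(x, n=2 * N)
    fx = np.real(np.fft.ifft(np.abs(X) ** 2))[:N]
    n0 = int(np.floor(fs / f0_max))                  # n0 = [fs / f0]
    n = n0 + 1 + int(np.argmax(fx[n0 + 1:]))         # index of max fx(i), i > n0
    return fs / n                                    # pitch frequency f = fs / n
```

For example, a 400-sample frame at fs = 8000 Hz containing a 200 Hz fundamental and its first two harmonics peaks at lag n = 40, giving 8000 / 40 = 200 Hz.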
3. Empirical Mode Decomposition (EMD) and Automatic Pitch Synthesis
3.1. Empirical Mode Decomposition (EMD)
EMD takes the time delay between adjacent extrema as the time scale and sifts a nonlinear, non-stationary signal into a finite number of intrinsic mode function (Intrinsic Mode Function, IMF) components at different time scales; each IMF component obtained by the decomposition is a stationary narrow-band signal. An IMF component must satisfy two conditions: (1) over the whole data series, the number of extrema and the number of zero crossings are equal or differ by at most one; (2) at any point, the mean of the envelope defined by the local maxima and the envelope defined by the local minima is zero, i.e., the upper and lower envelopes are locally symmetric about the time axis. These two conditions serve as the convergence criterion of EMD, and each component satisfying them can be regarded as an intrinsic mode function of the signal.
For a signal $x(t)$, EMD extracts the IMF components in the following steps. First, find all local maxima and local minima of the signal and fit each set by cubic spline interpolation, obtaining the upper and lower envelopes of the signal so that all data points lie between them. Next, average the upper and lower envelopes point by point to obtain the mean curve $m_1(t)$, and subtract it from the signal to obtain the new sequence $h_1^{(1)}(t)$:

$$h_1^{(1)}(t) = x(t) - m_1(t) \qquad (2)$$
If $h_1^{(1)}(t)$ satisfies the conditions of an IMF component, it is the first-order IMF component. Otherwise, the process is repeated with $h_1^{(1)}(t)$ as the new signal until, after n iterations, $h_1^{(n)}(t)$ satisfies the convergence criterion; the first-order IMF component of $x(t)$ is then

$$C_1(t) = h_1^{(n)}(t) \qquad (3)$$

$C_1(t)$ is the highest-frequency component. Subtracting $C_1(t)$ from the original signal gives the first-order residual $r_1(t)$:

$$r_1(t) = x(t) - C_1(t) \qquad (4)$$
The same process is then applied to $r_1(t)$ to obtain the second-order IMF component $C_2(t)$. Continuing in this way, EMD sifts the signal round after round into n IMF components and a residual component $r_n(t)$; the decomposition is complete when no further IMF can be extracted from the residual (typically when $r_n(t)$ becomes monotonic). After the decomposition, the original signal $x(t)$ can be expressed as

$$x(t) = \sum_{i=1}^{n} C_i(t) + r_n(t) \qquad (5)$$
The IMF components $C_i(t)$ of each order obtained by EMD reflect the characteristics of the signal at different time scales, representing the inherent oscillation modes of the nonlinear signal from high frequency to low frequency. The signal characteristics can therefore be displayed at different resolutions, which gives EMD a multiresolution capability; $r_n(t)$ is the trend term, or mean, of $x(t)$. EMD avoids the energy loss caused by the wavelet transform and overcomes energy leakage, and the original signal can be reconstructed exactly using (5).
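The sifting procedure of Eqs. (2) to (5) can be sketched with SciPy's CubicSpline as below. The paper does not specify end-point handling or the exact stopping rule, so pinning both envelopes to the first and last samples, the Cauchy-type SD criterion (a commonly used variant of Huang et al.'s rule, threshold 0.25), and the limits max_imfs and max_sift are assumptions of this sketch, not the author's settings.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def _extrema(x):
    """Indices of strict interior local maxima and minima."""
    d = np.diff(x)
    left = np.concatenate(([0.0], d))    # x[i] - x[i-1]
    right = np.concatenate((d, [0.0]))   # x[i+1] - x[i]
    maxima = np.where((left > 0) & (right < 0))[0]
    minima = np.where((left < 0) & (right > 0))[0]
    return maxima, minima

def _mean_envelope(x, t):
    """Mean of the cubic-spline upper and lower envelopes, or None if too few extrema."""
    maxima, minima = _extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    last = len(x) - 1
    up = np.concatenate(([0], maxima, [last]))   # assumption: pin envelopes to end samples
    lo = np.concatenate(([0], minima, [last]))
    upper = CubicSpline(t[up], x[up])(t)
    lower = CubicSpline(t[lo], x[lo])(t)
    return 0.5 * (upper + lower)

def emd(x, max_imfs=8, max_sift=50, sd_tol=0.25):
    """Return (imfs, residual) with x = imfs.sum(axis=0) + residual, as in Eq. (5)."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x), dtype=float)
    imfs, r = [], x.copy()
    for _ in range(max_imfs):
        if _mean_envelope(r, t) is None:      # residual has too few extrema: stop
            break
        h = r.copy()
        for _ in range(max_sift):
            m = _mean_envelope(h, t)
            if m is None:
                break
            h_new = h - m                      # Eq. (2): h = x - m
            sd = np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12)
            h = h_new
            if sd < sd_tol:                    # assumed SD stopping rule
                break
        imfs.append(h)                         # Eq. (3): C_i(t)
        r = r - h                              # Eq. (4): r_i(t)
    return np.array(imfs), r
```

Because each residual is formed by subtraction, the IMFs plus the final residual always sum back to the input, which is the exact reconstruction property of Eq. (5).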
3.2. Automatic Pitch Synthesis
After the noisy speech is filtered by the elliptic filter, the main component of the resulting signal $x(t)$ is the pitch. When the in-band noise is still strong (Q0 is large), $x(t)$ is decomposed by EMD as in (5).
The correlation coefficient between each IMF and $x(t)$ is then calculated:

$$R(i) = \frac{\mathrm{cov}\left(C_i, x\right)}{\mathrm{STD}(C_i)\,\mathrm{STD}(x)}, \quad i = 1, 2, \ldots, n \qquad (6)$$
where cov denotes the covariance and STD the standard deviation. Sort R(i) in descending order and let $i(1)$ and $i(2)$ be the indices of the two largest values; the synthetic pitch signal is

$$\hat{x}(t) = C_{i(1)}(t) + C_{i(2)}(t) \qquad (7)$$
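Equations (6) and (7) amount to ranking the IMFs by their Pearson correlation with the filtered frame and adding the top two. A minimal sketch, assuming the IMFs are stacked row-wise as an EMD routine would return them; the function name is hypothetical.

```python
import numpy as np

def synthesize_pitch_signal(imfs, x):
    """Eqs. (6)-(7): rank IMFs by correlation with x and add the top two."""
    imfs = np.asarray(imfs, dtype=float)
    x = np.asarray(x, dtype=float)
    # R(i) = cov(C_i, x) / (STD(C_i) * STD(x)); bias=True matches np.std's 1/N.
    R = np.array([np.cov(c, x, bias=True)[0, 1] / (np.std(c) * np.std(x)) for c in imfs])
    i1, i2 = np.argsort(R)[::-1][:2]              # indices i(1), i(2) of the two largest R
    return imfs[i1] + imfs[i2], (int(i1), int(i2)), R
```

The ranking uses the signed R(i), following the text's "maximum correlation"; ranking by |R(i)| would be an alternative that the paper does not discuss.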
Figure 2 shows a speech signal after elliptic filtering and EMD decomposition (n = 7): the first four IMF components and the synthetic pitch signal (IMF2 + IMF3). As Figure 2 shows, the signal is separated layer by layer from high to low frequency; each IMF represents a component at one modal scale, with no mode mixing. The synthetic pitch signal obtained from EMD suppresses the half-frequency harmonics, and the IMFs are selected and combined dynamically and automatically.
Figure 2. EMD decomposition of a speech signal and the synthesized pitch signal
4. Experimental Evaluation
The background noise is taken from the Noisex-92 database [7], whose sampling frequency is fs = 19.98 kHz. At the same sampling frequency fs, the utterance "language, tone, end point" was recorded on a computer in an indoor noisy environment, as shown in Figure 1(a), where the frame voicing decisions of the method are marked with lines. The speech is divided into frames of 25 ms each, so the frame length is M = [0.025fs] samples and the frame shift is M/2.
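The framing rule can be sketched as follows. With fs = 19.98 kHz it gives M = 499 samples; because M is odd, this sketch assumes the shift M/2 is rounded down to 249 samples, and it drops any incomplete final frame. Both choices, and the function name, are assumptions.

```python
import numpy as np

def frame_signal(x, fs, frame_ms=25.0):
    """Split x into frames of M = [0.025 fs] samples with a shift of M/2 (rounded down)."""
    x = np.asarray(x, dtype=float)
    M = int(np.floor(frame_ms / 1000.0 * fs))   # M = [0.025 fs]
    hop = M // 2                                # assumption: M/2 rounded down when M is odd
    starts = range(0, len(x) - M + 1, hop)      # incomplete last frame is dropped
    frames = np.array([x[s : s + M] for s in starts]).reshape(-1, M)
    return frames, M, hop
```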
Experiment 1: The original speech, and the original speech mixed with white noise (white) from the Noisex-92 library at signal-to-noise ratios of 10 dB, 5 dB, 0 dB, and -5 dB, were processed by this method; the pitch detection results are shown in Figure 3. In the left panels of the figure, the horizontal axis is time (seconds) and the vertical axis is amplitude; in the other panels, the horizontal axis is the frame number and the vertical axes are, respectively, the pitch frequency (Hz) and the average energy of the elliptic-filtered signal.
The left panels show the speech mixed with noise (blue), the elliptic-filtered signal (black), and the resulting voicing decisions; the center panels show the pitch frequency detected by the algorithm; the right panels show the average energy of the elliptic-filtered signal together with the double-threshold voicing decision lines.
Figure 3. Comparison of fundamental frequency detection on the original speech mixed with white noise (white) at different SNRs
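The paper does not state how the noise was scaled to each SNR. A common approach, assumed in this sketch, is to scale the noise so that the ratio of mean speech power to mean noise power over the whole utterance equals the target SNR in dB; the function name is hypothetical, and the noise recording is assumed to be at least as long as the speech.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) equals snr_db, then add it."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]   # assumes noise is long enough
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise
```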
Experiment 2: Non-stationary noise. The original speech, and the original speech mixed with car noise (volvo), destroyer engine noise (destroyerengine), factory noise (factory), and babble noise (babble) from the Noisex-92 library, were processed by the method at a signal-to-noise ratio (SNR) of 0 dB; the pitch detection results are shown in Figure 4, with the same legend as above.
Figure 4. Comparison of fundamental frequency detection by the algorithm on the original speech mixed with different noises (SNR = 0 dB)
(a) Original speech and its voicing decision; (a1) pitch frequency detected from the original speech; (a2) average energy of the original speech signal.
(b) Speech mixed with car noise (volvo) (SNR = 0 dB) and its voicing decision; (b1) pitch frequency detected from the car-noise mixture; (b2) average energy of the low-frequency signal of the car-noise mixture.
(c) Speech mixed with destroyer engine noise (destroyerengine) (SNR = 0 dB) and its voicing decision; (c1) pitch frequency detected from the engine-noise mixture; (c2) average energy of the low-frequency signal of the engine-noise mixture.
(d) Speech mixed with factory noise (factory) (SNR = 0 dB) and its voicing decision; (d1) pitch frequency detected from the factory-noise mixture; (d2) average energy of the low-frequency signal of the factory-noise mixture.
(e) Speech mixed with babble noise (babble) (SNR = 0 dB) and its voicing decision; (e1) pitch frequency detected from the babble-noise mixture; (e2) average energy of the low-frequency signal of the babble-noise mixture.
Experiment 3: The TIMIT speech database. Here the performance of the new method is compared with that of the traditional center-clipping autocorrelation method [8] and the cepstrum method [9]. The test performance indicators are as follows:
1) Voicing accuracy (ASR, Acoup Sur Ratio): the number of frames for which the presence or absence of a fundamental frequency is correctly determined, as a percentage of the total number of frames. The higher this index, the better the algorithm determines whether the speech is periodic.
2) Valid pitch relative error (VPRE, Valid Pitch Relative Error): over the frames in which both the reference fundamental frequency and the computed fundamental frequency are non-zero, the root-mean-square error between the computed and reference fundamental frequencies divided by the mean reference fundamental frequency. The lower this index, the more accurate the algorithm.
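Both indicators can be computed from per-frame pitch tracks in which 0 marks an unvoiced frame. The paper's definitions are terse, so this sketch assumes voicing accuracy counts frames whose voiced/unvoiced status matches the reference, and that VPRE is the RMS pitch error over frames voiced in both tracks divided by the mean reference pitch of those frames, reported in percent; the function names are hypothetical.

```python
import numpy as np

def voicing_accuracy(f_est, f_ref):
    """Percentage of frames whose voiced/unvoiced status matches the reference."""
    f_est, f_ref = np.asarray(f_est, dtype=float), np.asarray(f_ref, dtype=float)
    return 100.0 * float(np.mean((f_est > 0) == (f_ref > 0)))

def vpre(f_est, f_ref):
    """Assumed VPRE: RMS pitch error over frames voiced in both tracks,
    divided by the mean reference pitch of those frames, in percent."""
    f_est, f_ref = np.asarray(f_est, dtype=float), np.asarray(f_ref, dtype=float)
    both = (f_est > 0) & (f_ref > 0)
    err = f_est[both] - f_ref[both]
    return 100.0 * float(np.sqrt(np.mean(err ** 2)) / np.mean(f_ref[both]))
```

For instance, with reference [0, 100, 200, 0] Hz and estimate [0, 110, 190, 150] Hz, three of four frames agree on voicing (75%), and the two frames voiced in both tracks give an RMS error of 10 Hz against a mean reference of 150 Hz (about 6.7%).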
As can be seen from Table 1, the new method has a lower voicing error rate than the traditional autocorrelation and cepstrum methods, and the cepstrum method performs worst. This is mainly because the cepstrum method, whether it uses the real or the complex cepstrum, distinguishes voiced from unvoiced sound and estimates the pitch period according to whether a corresponding cepstral peak is present. In voiced segments this peak is sometimes not particularly prominent, while in unvoiced segments occasional peaks appear, which leads to more voicing misjudgments and a larger valid pitch error. The autocorrelation method uses a relatively fixed clipping threshold and is prone to half-frequency and frequency-doubling errors, which also increases the valid pitch error. In the proposed method, the elliptic filter removes high-frequency components and EMD filtering of the signal suppresses the half-frequency harmonic generation (SHG); together they remove information that is unnecessary for pitch detection and simplify signals of different amplitudes, which improves the voicing classification rate and reduces the valid pitch error.
5. Conclusions and Outlook
A speech signal is a one-dimensional time-domain signal. This paper combines an EMD-based correlation method with an elliptic filter (EF) to process the signal. The results show that the method effectively suppresses noise, highlights the periodic structure of the signal, and weakens the frequency-doubling phenomenon caused by formants. Voiced/unvoiced speech and low-pitched speech are distinguished accurately, pitch decisions in voicing transition segments are more accurate, and the algorithm is simple and fast. Experimental results show that the method resists noise interference and is robust: the pitch period is extracted more accurately, signal detail is preserved, and noise is suppressed.
References
[1] CHUN J, SYING J, ZHANG R TRUES. Tone recognition using extended segments. ACM Transactions on Asian Language Information Processing. 2008; 7(3).
[2] FERRER C, TORRES D, HERNANDEZ-DIAZ M E. Contours in the Evaluation of Cycle-to-Cycle Pitch Detection Algorithms. Proceedings of the 13th Iberoamerican Congress on Pattern Recognition. 2008.
[3] LI H, DAI B-GI, LU W. A Pitch Detection Algorithm Based on AMDF and ACF. Digital Object Identifier. 2006; 1: 14-19.
[4] KOBAYASHI H, SHIMAMURA T. A modified cepstrum method for pitch extraction. IEEE APCCAS. 1998; 299-302.
[5] Huang N E, Shen Z, Long S R, et al. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society of London, 454(A). 1998; 903-995.
[6] Gao Xiquan, Ding Yumei, Kou Yonghong, et al. Digital signal processing – principles, implementation and application. Publishing House of Electronics Industry; Beijing, China. 2007; 144-151.
[7] Spib Noise data [EB/OL], http://spib.rice.edu/spib/select_noise.html
[8] RABINER L R. On the use of autocorrelation analysis for pitch detection. IEEE Trans. ASSP. 1977; 25(1): 24-33.
[9] KOBAYASHI H, SHIMAMURA T. A modified cepstrum method for pitch extraction. IEEE APCCAS. 1998; 299-302.
[10] Mohd R Jamaludin, Sheikh HS Salleh, Tan T Swee, Kartini Ahmad, Ahmad KA Ibrahim, Kamarulafizam Ismail. An Improved Time Domain Pitch Detection Algorithm for Pathological Voice. American Journal of Applied Sciences. 2012; 9(1): 93-102.
[11] He Ba, Na Yang, Ilker Demirkol, Wendi Heinzelman. BaNa: A Hybrid Approach for Noise Resilient Pitch Detection. IEEE Statistical Signal Processing (SSP), Ann Arbor, Michigan. 2012.
[12] J Bartošek. A Pitch Detection Algorithm for Continuous Speech Signals Using Viterbi Traceback with Temporal Forgetting. Acta Polytechnica. 2011; 51(5): 8-13.