TELKOMNIKA Indonesian Journal of Electrical Engineering
Vol. 12, No. 12, December 2014, pp. 8238 ~ 8245
DOI: 10.11591/telkomnika.v12i12.6694
ISSN: 2302-4046

Received May 19, 2014; Revised October 22, 2014; Accepted November 16, 2014
Speech Enhancement Research Based on FRFT

Jingfang Wang
School of Information Science and Engineering, Hunan International Economics University, Changsha, China, postcode: 410205
email: matlab_bysj@126.com
Abstract
Because many traditional de-noising methods fail in intensely noisy environments and adapt poorly to varying noise conditions, a speech enhancement method based on dynamic Fractional Fourier Transform (FRFT) filtering is proposed. The acoustic signal is split into frames, the FRFT order that gives the optimal degree of dispersion is updated for each frame of noisy speech, and the implementation is described in detail. Experiments on standard TIMIT speech and Noisex-92 noise show that the algorithm filters noise from speech effectively and can significantly improve the performance of automatic speech recognition systems. It is shown to be robust under various noisy environments and Signal-to-Noise Ratio (SNR) conditions. The algorithm has low computational complexity and is simple to implement.
Keywords: acoustic signal, fractional Fourier transform (FRFT), speech enhancement, filtering de-noising, auto-adaptive processing
Copyright © 2014 Institute of Advanced Engineering and Science. All rights reserved.
1. Introduction
With the development of communication technology, speech has become a major medium through which people exchange information conveniently. However, noise is widespread in natural environments and degrades the quality of speech communication. To reduce the effect of noise on speech communication and improve its quality, S. Boll et al. [1] proposed the classical spectral subtraction (SS) algorithm in 1979. Under the assumptions that the noise is additive, that the signals are short-term stationary, and that speech and noise are independent, the estimated noise spectrum is subtracted from the spectrum of the noisy speech to obtain the spectrum of the de-noised speech. However, the local stationarity assumption does not match real conditions, so the result is not ideal and leaves considerable residual musical noise, among other problems.
Building on traditional spectral subtraction, Berouti [2] added an adjustment coefficient for the noise power spectrum and a lower limit on the speech power spectrum, which improved the performance of spectral subtraction. However, the correction factor and the lower limit are chosen empirically, so the method adapts poorly.
In 1984, Y. Ephraim et al. [3] introduced the minimum mean square error (MMSE) criterion into spectral subtraction, which removes part of the musical noise and improves the de-noising effect. However, the algorithm requires the distribution of the speech spectrum to be estimated in advance, and its computational cost is relatively high. Also on the basis of spectral subtraction, P. Lockwood et al. [4] made the speech enhancement gain function adapt to the signal-to-noise ratio of the speech and proposed the nonlinear spectral subtraction (NSS) algorithm. Although this algorithm improves the SNR of the speech, the audio quality is not improved. To further reduce musical noise and improve speech clarity, a variety of improvements to the traditional spectral subtraction algorithm have been made [5-7], and speech quality has improved further. However, when the SNR is low or the noise is non-stationary, the performance of conventional spectral subtraction tends to become poor. In 2002, S. Kamath et al. [8] proposed a multi-band spectral subtraction based on iterative methods. This method takes into account the non-uniform effect of colored noise on the speech spectrum and introduces a tweaking factor and band-wise processing; while maintaining high speech quality, it effectively removes background noise and musical noise under colored noise. There are also maximum a posteriori estimation methods based on a speech generation model [9] and Kalman filters [10-12], in which the speech generation process is modeled as a linear time-varying filter and different excitation sources are used for different types of speech.
In 1995, Y. Ephraim et al. [13] opened a new avenue for speech enhancement in the time domain by proposing, for the first time, a speech enhancement algorithm based on the signal subspace. Ephraim's initial work mainly addressed white noise. To handle non-white noise, Mittal et al. [14] proposed a signal/noise KL transform in 2000; although the enhanced signal has little residual noise in each frame, the non-stationarity of the residual noise between frames is disturbing. In 2001, A. Rezayee and S. Gazor [15] proposed an adaptive KLT method for non-stationary noise; they assume that the covariance matrix of non-stationary colored noise is approximately diagonalized by the eigenvectors of the speech signal. In 2003, to address the shortcomings of the Rezayee method, Y. Hu et al. [16] proposed a speech enhancement algorithm for colored noise, in both the time domain and the frequency domain, based on signal subspace decomposition. In the same year, H. Lev-Ari and Y. Ephraim [17] also proposed an approach for colored noise. However, the above methods require the noise covariance matrix to be full rank, so they are not applicable to narrowband noise.
We study an FRFT (Fractional Fourier Transform) filtering method. Tests in various noise environments show that it works well; the proposed algorithm has a small computational cost and is simple and easy to implement.
2. Fractional Fourier Transform (FRFT)
The Fractional Fourier Transform (FRFT) is a recently developed time-frequency analysis tool and a generalized form of the Fourier transform. Essentially, it represents a signal in the fractional Fourier domain, which combines the information of the signal in the time domain and in the frequency domain. This mathematical tool is closely linked with the Fourier transform and also has meaningful connections with other time-frequency analysis tools. It has been widely used in optical system analysis, filter design, signal analysis, solving differential equations, phase recovery, pattern recognition, and other fields [18-20]. In recent years, applications of the fractional Fourier transform have mostly concentrated on the estimation, detection, and filtering of linear FM signals.
The FRFT can be interpreted as the representation of a signal in the fractional Fourier domain obtained by rotating the time-frequency plane counterclockwise by an arbitrary angle. The FRFT of a signal $x(t)$ is defined as [20]:

$$X_\alpha(u) = F^{\alpha}[x(t)](u) = \int_{-\infty}^{+\infty} x(t)\, K_\alpha(t,u)\, dt \tag{1}$$
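where the FRFT transform kernel $K_\alpha(t,u)$ is:

$$K_\alpha(t,u) = \begin{cases} \sqrt{\dfrac{1 - j\cot\alpha}{2\pi}}\, \exp\!\left( j\,\dfrac{t^2+u^2}{2}\cot\alpha - j\,t u \csc\alpha \right), & \alpha \neq n\pi \\ \delta(t-u), & \alpha = 2n\pi \\ \delta(t+u), & \alpha = (2n \pm 1)\pi \end{cases} \tag{2}$$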
where $\alpha = p\pi/2$ is the FRFT rotation angle. The signal $x(t)$ is recovered by:

$$x(t) = \{F^{-\alpha}[X_\alpha(u)]\}(t) = \int_{-\infty}^{+\infty} X_\alpha(u)\, K_{-\alpha}(t,u)\, du \tag{3}$$
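The definitions in Eqs. (1) and (2) can be evaluated numerically by sampling the kernel. The sketch below is a direct O(N^2) Riemann-sum approximation, not a fast algorithm and not taken from the paper. The function names `frft_matrix` and `frft`, the dimensionless symmetric sampling grid with step sqrt(2*pi/N), and the parameter `order` (the p in alpha = p*pi/2) are assumptions made for illustration.

```python
import numpy as np


def frft_matrix(num_samples: int, order: float) -> np.ndarray:
    """Sampled kernel of Eq. (2) for alpha = order * pi / 2.

    Assumption: t and u share the dimensionless symmetric grid
    (n - (N - 1) / 2) * sqrt(2 * pi / N), so order = 1 is a centred unitary DFT.
    """
    alpha = order * np.pi / 2.0
    step = np.sqrt(2.0 * np.pi / num_samples)
    grid = (np.arange(num_samples) - (num_samples - 1) / 2.0) * step
    sin_a = np.sin(alpha)
    if abs(sin_a) < 1e-12:
        # alpha = n * pi: the kernel is delta(t - u) or delta(t + u).
        identity = np.eye(num_samples, dtype=complex)
        return identity if np.cos(alpha) > 0 else identity[::-1]
    cot_a = np.cos(alpha) / sin_a
    csc_a = 1.0 / sin_a
    u, t = np.meshgrid(grid, grid, indexing="ij")  # rows index u, columns index t
    amplitude = np.sqrt((1.0 - 1j * cot_a) / (2.0 * np.pi))
    phase = 0.5 * (t**2 + u**2) * cot_a - t * u * csc_a
    return amplitude * np.exp(1j * phase) * step  # `step` plays the role of dt


def frft(x: np.ndarray, order: float) -> np.ndarray:
    """Approximate Eq. (1) by a Riemann sum over the sampled kernel."""
    x = np.asarray(x, dtype=complex)
    return frft_matrix(x.size, order) @ x
```

With this grid, `order = 1` gives a centred unitary DFT, and a sampled Gaussian exp(-t^2/2) comes back essentially unchanged for any order, which is a convenient sanity check. For other orders the sampled matrix is only approximately unitary, so `frft(frft(x, p), -p)` should be expected to match `x` approximately rather than exactly.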
3. FRFT Dynamic Filtering
The energy concentration of the fractional Fourier transform depends on the transform order α, and how strongly the energy aggregates depends on how close the order is to that of the ordinary Fourier transform. The fractional Fourier transform of a speech signal shows energy concentration for both voiced and unvoiced sounds, but in different regions: the energy of voiced sounds concentrates in the central region of the waveform in the fractional transform domain, whereas the energy of unvoiced sounds concentrates at both ends. The fractional Fourier transform of white noise has no such energy-focusing property; its energy concentration is poor. This difference can be used to remove noise from speech signals.
3.1. The Best Fractional Order α of the FRFT
For different frames and different noise-contaminated signals, the FRFT is computed at different fractional orders α so that the filtering is effective. What criterion should be used to choose the order? The minimum mean square error (MMSE) is commonly used; in this paper, the degree of energy concentration is measured by the weighted variance.
The α-order fractional Fourier transform of a 2N-point signal $x(t)$ is $X_\alpha(k)$, $k = 1, 2, \ldots, 2N$. Because it is centrally symmetric, only half of it is used. Its normalized probability is:

$$p(k) = \frac{|X_\alpha(k)|}{\sum_{i=1}^{N} |X_\alpha(i)|}, \quad k = 1, 2, \ldots, N \tag{4}$$

$$EX = \sum_{k=1}^{N} k\, p(k), \qquad Var(X,\alpha) = \sum_{k=1}^{N} (k - EX)^2\, p(k) \tag{5}$$
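Eqs. (4) and (5) translate almost directly into array operations. The following is a minimal sketch, assuming the transform coefficients of one frame are already available as an array of 2N values; the function name `weighted_variance` and the choice of keeping the first half of the coefficients are assumptions, since the text does not say which half is retained.

```python
import numpy as np


def weighted_variance(coeffs: np.ndarray) -> float:
    """Weighted variance of Eq. (5) for one frame of transform coefficients."""
    magnitude = np.abs(np.asarray(coeffs))
    half = magnitude[: magnitude.size // 2]   # assumption: keep the first N of 2N points
    total = half.sum()
    if total == 0:
        raise ValueError("an all-zero frame has no defined distribution")
    p = half / total                          # Eq. (4): normalized magnitudes
    k = np.arange(1, half.size + 1)           # index k = 1..N
    mean_index = np.sum(k * p)                # EX in Eq. (5)
    return float(np.sum((k - mean_index) ** 2 * p))
```

A spectrum concentrated in a single bin gives a variance of zero, while a flat spectrum over N bins gives (N^2 - 1)/12, so smaller values indicate stronger energy concentration.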
The weighted variance $Var(X,\alpha_i)$ is calculated for a set of orders $\alpha_i$, $0 \le \alpha_i \le 1$. These values are then fitted with a cubic spline, and the minimum $Var(X,\alpha_0)$ of the fitted curve $Var(X,\alpha)$ is located; $\alpha_0$ is the corresponding best fractional order.
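The search for the best order can be sketched as a grid evaluation followed by a local refinement. Note that the sketch below swaps in a different technique: instead of the cubic spline used in the paper, it fits a parabola through the three samples around the smallest grid value, purely to stay dependency-free (a spline such as `scipy.interpolate.CubicSpline` could be substituted). The function name `refine_minimum` and its arguments are hypothetical.

```python
import numpy as np


def refine_minimum(orders, values) -> float:
    """Locate the order that minimises `values` sampled at `orders`.

    Assumption: a local parabola replaces the cubic spline fit of the paper.
    """
    orders = np.asarray(orders, dtype=float)
    values = np.asarray(values, dtype=float)
    i = int(np.argmin(values))
    if i == 0 or i == orders.size - 1:
        return float(orders[i])               # minimum on the boundary: no refinement
    # Exact quadratic through the three samples around the grid minimum.
    a, b, _ = np.polyfit(orders[i - 1 : i + 2], values[i - 1 : i + 2], 2)
    if a <= 0:
        return float(orders[i])               # flat neighbourhood: keep the grid point
    vertex = -b / (2.0 * a)
    return float(np.clip(vertex, orders[i - 1], orders[i + 1]))
```

In use, `values` would hold the weighted variance of one frame evaluated at each candidate order, and the returned order would then be used to filter that frame.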
Figure 1. Best FRFT fractional order α according to the weighted variance: contrast between speech and noise
In Figure 1, (a1) is a speech signal and (b1) is its α-order fractional Fourier transform, which shows the energy gradually accumulating as the order moves from the time domain to the frequency domain; (c1) shows the trend of Var(X, α) against α and the cubic spline fit obtained from Formula (5). (a2) is factory noise and (b2) is the corresponding α-order fractional Fourier transform, again showing the change of energy from the time domain to the frequency domain; (c2) shows its Var(X, α) against α and the cubic spline fit. Speech energy concentrates well in the FRFT domain, whereas noise energy concentrates poorly.
3.2. FRFT Domain Filter Design
Because speech energy concentrates well in the FRFT domain while noise energy concentrates poorly, the magnitude of speech in the FRFT domain is high and that of noise is low. Thresholding the magnitude can therefore separate the two effectively. How should the cutting threshold be selected?
The first $n_0$ frames are taken as noise frames. Let the mean FRFT-domain magnitudes of these $n_0$ frames be $MV_i$, $i = 1, 2, \ldots, n_0$; their average is:

$$MV = \frac{1}{n_0} \sum_{i=1}^{n_0} MV_i \tag{6}$$

The FRFT magnitude of the current frame is:

$$p(k) = |X_\alpha(k)|, \quad k = 1, 2, \ldots, N$$

Threshold:

$$T = \max\{\mathrm{median}(p(k),\ k = 1, 2, \ldots, N),\ a \cdot MV\}, \quad a \ge 1 \tag{7}$$

Filter:

$$H(k) = \mathrm{sign}(\max\{p(k) - T,\ 0\}) \tag{8}$$

where sign is the sign function.

Filtered signal recovery:

$$\hat{x}(n) = \{F^{-\alpha}[H(k)\, F^{\alpha}[x'(n)](k)]\}(n) \tag{9}$$
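The threshold and mask of Eqs. (6) to (8) can be illustrated on magnitude arrays alone. This is a minimal sketch under stated assumptions: the function names `noise_level` and `binary_mask` are hypothetical, and the default `scale=1.5` for the constant a is an arbitrary illustrative value, since the paper does not give one. Applying Eq. (9) would additionally require multiplying the FRFT coefficients by the mask and transforming back with order -α.

```python
import numpy as np


def noise_level(noise_frames) -> float:
    """Eq. (6): average of the per-frame mean magnitudes MV_i."""
    frame_means = [np.mean(np.abs(frame)) for frame in noise_frames]
    return float(np.mean(frame_means))


def binary_mask(magnitude, noise_mean: float, scale: float = 1.5):
    """Eqs. (7)-(8): threshold T and 0/1 mask H(k) for one frame.

    `scale` stands for the constant a; its value here is an assumption.
    """
    p = np.abs(np.asarray(magnitude, dtype=float))
    threshold = max(float(np.median(p)), scale * noise_mean)   # Eq. (7)
    mask = np.sign(np.maximum(p - threshold, 0.0))             # Eq. (8)
    return mask, threshold
```

Because the mask is binary, coefficients at or below the threshold are removed entirely and the rest pass unchanged; taking the maximum with the frame median guarantees that at least half of the coefficients are removed in every frame.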
4. Experimental Evaluation
Background noise is taken from the Noisex-92 database [21], and the FRFT filter is tested on speech from the standard TIMIT database. The sampling frequency is fs = 16 kHz. The speech file KDT_003.WAV is taken from the library; its waveform is shown in Figure 2(a). The speech is divided into frames of 32 ms, i.e., a frame length of M = [0.032 fs] = 512 points. The performance of the algorithm is analyzed objectively from several aspects: speech waveform, spectrogram, and SNR improvement. The de-noising effect is quantified using the SNR:

$$SNR = 10 \log_{10}\left( \frac{\sum_{t=1}^{N} signal_t^2}{\sum_{t=1}^{N} noise_t^2} \right) \tag{10}$$
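The framing step and the SNR measure of Eq. (10) can be sketched as two small helpers. The function names `split_frames` and `snr_db` are hypothetical, and the frames are assumed to be non-overlapping with any incomplete final frame discarded; the paper does not state whether frames overlap.

```python
import numpy as np


def split_frames(signal, fs: int = 16000, frame_ms: float = 32.0) -> np.ndarray:
    """Cut a signal into non-overlapping frames (assumption: no overlap)."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(fs * frame_ms / 1000.0))   # 512 points at 16 kHz
    n_frames = signal.size // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)


def snr_db(signal, noise) -> float:
    """Eq. (10): ratio of signal energy to noise energy in decibels."""
    signal = np.asarray(signal, dtype=float)
    noise = np.asarray(noise, dtype=float)
    return float(10.0 * np.log10(np.sum(signal**2) / np.sum(noise**2)))
```

For an enhanced signal, the noise term would be the difference between the filtered output and the clean reference, which is only available in experiments where the clean speech is known.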
Experiment 1: The original speech is shown in Figure 2(a). The original speech was mixed with white noise (white), pink noise (pink), fighter cockpit noise (f16_cockpit), factory noise (factory), and babble noise (babble) from the Noisex-92 library, and the signals before and after filtering with the proposed FRFT method were compared. The speech waveforms are shown in Figure 2: the clean and noisy speech are on the left, and the filtered speech is on the right. In each panel the horizontal axis is time (seconds) and the vertical axis is amplitude. Panels (a), (a1) compare the original speech before and after filtering; (b), (b1) compare speech mixed with white noise before and after filtering; (c), (c1) speech mixed with pink noise; (d), (d1) speech mixed with aircraft noise (f16_cockpit); (e), (e1) speech mixed with a time-varying noise source, factory noise (factory); and (f), (f1) speech mixed with a time-varying noise source, babble (babble). White noise, pink noise, and fighter cockpit noise are stationary noise sources, whereas factory noise and babble are non-stationary noise sources. The right part of Figure 2 also shows the corresponding changes of the dynamic fractional order α.
Figure 2. Speech waveform comparison before and after FRFT filtering
The spectrograms before and after filtering with the proposed algorithm are compared in Figure 3. In each panel the horizontal axis is time (seconds) and the vertical axis is frequency (kHz). Panel (a) is the spectrogram of the original speech; (b), (b1) compare the spectrograms of speech mixed with white noise (white) before and after filtering; (c), (c1) speech mixed with pink noise (pink); (d), (d1) speech mixed with aircraft noise (f16_cockpit); (e), (e1) speech mixed with a time-varying noise source, factory noise (factory); and (f), (f1) speech mixed with a time-varying noise source, babble (babble). Babble is a time-varying noise source that overlaps the speech frequency band, which conventional approaches find hard to handle; the proposed algorithm achieves good results in this case as well.
Figure 3. Spectrogram comparison before and after FRFT filtering
The signal-to-noise ratio before filtering is denoted SNRin, and the signal-to-noise ratio after filtering is denoted SNRout. The speech is mixed in turn with white noise (white), pink noise (pink), fighter cockpit noise (f16_cockpit), factory noise (factory), and babble noise (babble). After the five noisy speech signals are filtered by the algorithm, the relative SNR improvement

$$\frac{SNR_{out} - SNR_{in}}{SNR_{in}} \times 100\%$$

is 8.36%, 11.50%, 40.06%, 29.37%, and 40.49%, respectively (see Table 1).
Experiment 2: Four levels of Gaussian noise of increasing intensity were added to the above speech signal, giving mixed signals with input SNRs (SNRin) of -4.556, -9.019, -18.41, and -28.63 dB, respectively. The four noisy signals were de-noised by mean filtering (n = 3, 5), wavelet filtering (db2 wavelet, decomposition level n = 3), and FRFT-domain filtering, and the output SNR was measured. The results are shown in Table 2. The table shows that, under strong background noise, the de-noising method based on FRFT-domain filtering is superior to the conventional de-noising methods.
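The noise mixing and the mean-filter baseline of Experiment 2 can be sketched as follows. The function names `add_gaussian_noise` and `mean_filter`, the fixed random seed, and the zero padding at the signal edges are assumptions for illustration; the paper does not describe how its noise was generated or how edges were handled.

```python
import numpy as np


def add_gaussian_noise(clean, snr_db: float, seed: int = 0):
    """Add white Gaussian noise scaled to a requested input SNR in dB."""
    clean = np.asarray(clean, dtype=float)
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(clean.size)
    # Scale so that 10*log10(sum(clean^2) / sum(noise^2)) equals snr_db.
    gain = np.sqrt(np.sum(clean**2) / (np.sum(noise**2) * 10.0 ** (snr_db / 10.0)))
    noise = noise * gain
    return clean + noise, noise


def mean_filter(x, n: int = 3) -> np.ndarray:
    """Moving-average baseline of length n (zero padded at the edges)."""
    kernel = np.ones(n) / n
    return np.convolve(np.asarray(x, dtype=float), kernel, mode="same")
```

For example, `add_gaussian_noise(speech, -18.41)` followed by `mean_filter(noisy, 5)` would reproduce the shape of one baseline run, with the output SNR then computed against the clean reference.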
To make visual comparison easier, the de-noising results in Table 2 are plotted in Figure 4, which compares SNRin and SNRout for the three de-noising methods after strong Gaussian noise is added to the speech signal.
Figure 4. SNRout comparison of the speech signal for three de-noising methods
Figure 4 shows that, when the background noise is strong, FRFT-domain filtering de-noises better than mean filtering and wavelet de-noising. Its de-noising effect changes little as the noise grows, whereas the effect of mean filtering and wavelet de-noising decreases rapidly as the noise grows.
5. Conclusions and Outlook
In this paper, the fractional Fourier transform is applied to noisy signals, and a measure of signal energy concentration and a de-noising method based on the fractional Fourier transform are proposed. Compared with traditional Fourier-transform de-noising, the proposed method transforms the signal and the noise into a fractional Fourier domain in which they overlap as little as possible, which gives a better de-noising effect. The results show that, for noisy signals with different SNRs, there is a best fractional order at which the de-noising effect of the proposed method is best.
Because practical problems often involve non-Gaussian noise and strong background noise, which make it difficult to extract the sound signal, an FRFT-domain filtering method is proposed in this paper. Experiments were carried out with the standard TIMIT speech database and the Noisex-92 noise library. The results show that the method retains good local features in both the time domain and the frequency domain and is superior to traditional methods of extracting sound signal features. De-noising simulations were also run on speech contaminated with white noise (white), pink noise (pink), fighter cockpit noise (f16_cockpit), factory noise (factory), and babble noise (babble), as well as with strong Gaussian background noise. The simulation results show that the method significantly improves the signal-to-noise ratio and is significantly better than traditional mean filtering and wavelet de-noising.
For non-stationary noise environments, the speech de-noising algorithm is formulated from the perspective of filtering noise in the FRFT domain. The non-stationary noise frames are smoothed and updated by a fast noise-tracking algorithm, which gives a better estimate of the ambient noise. Experiments show that the proposed algorithm suppresses background noise more effectively and improves speech quality after de-noising. The method is computationally simple, suitable for real-time use, and strongly immune to noise, and it provides a new way to detect and de-noise weak signals under strong background noise.
For the FRFT-domain filter in this article, further study will address narrowband filtering around the major energy aggregation points.
References
[1] SF Boll. Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. ASSP. 1979; 27(2): 113-120.
[2] M Berouti, R Schwartz, J Makhoul. Enhancement of Speech Corrupted by Acoustic Noise. Proceeding of 1979 IEEE ICASSP. 1979; 208-211.
[3] Y Ephraim, D Malah. Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Trans. Acoustic, Speech, Signal Processing. 1984; 32(6): 1109-1121.
[4] P Lockwood, J Boudy. Experiments with a Nonlinear Spectral Subtractor (NSS), Hidden Markov Models and Projection, for Robust Recognition in Cars. Speech Commun. 1992; 11(6): 215-228.
[5] Y Ephraim. A minimum mean square error approach for speech enhancement. Acoustics, Speech, and Signal Processing. 1990; 2: 829-832.
[6] Liu Zhibin, Xu Naiping. Speech enhancement based on minimum mean-square error short-time spectral estimation and its realization. IEEE International conference on intelligent processing system. 1997; 1794-1797.
[7] R Martin. Speech enhancement using MMSE short time spectral estimation with Gamma distributed speech priors. Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing. 2002; 1: 253-256.
[8] S Kamath, P Loizou. A multi-band Spectral Subtraction Method for Enhancing Speech Corrupted by Colored Noise. Proceedings of ICASSP. Orlando, USA. 2002; IV-4164.
[9] JS Lim, AV Oppenheim. Enhancement and Bandwidth Compression of Noisy Speech. Proc. of the IEEE. 1979; 67(12): 1586-1604.
[10] JD Gibson, B Koo, SD Gray. Filtering of Colored Noise for Speech Enhancement and Coding. IEEE Trans. Signal Processing. 1991; 39: 1732-1742.
[11] WR Wu, PC Chen. Subband Kalman Filtering for Speech Enhancement. IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing. 1998; 45: 1072-1083.
[12] S Gannot, D Burshtein, E Weinstein. Iterative and sequential Kalman filter-based speech enhancement algorithms. IEEE Trans. Speech and Audio Process. 1998; 6(4): 373-38.
[13] Y Ephraim, HL Van Trees. A signal subspace approach for speech enhancement. IEEE Transactions on Speech and Audio Processing. 1995; 3(4): 251-266.
[14] U Mittal, N Phamdo. Signal/noise KLT based approach for enhancing speech degraded by colored noise. IEEE Trans. on Speech and Audio Processing. 2000; 8(3): 159-167.
[15] A Rezayee, S Gazor. An adaptive KLT approach for speech enhancement. IEEE Trans. Speech Audio Processing. 2001; 9(2): 87-95.
[16] Y Hu, P Loizou. A generalized subspace approach for enhancing speech corrupted by colored noise. IEEE Trans. on Speech and Audio Processing. 2003; 11(4): 334-341.
[17] H Lev-Ari, Y Ephraim. Extension of the signal subspace speech enhancement approach to colored noise. IEEE Signal Processing Letters. 2003; 10(4): 104-106.
[18] Soo-Chang Pei, Jian-Jiun Ding. Relations between Fractional Operations and Time-Frequency Distributions, and Their Applications. IEEE Transactions on Signal Processing. 2001; 49(8): 1638-1655.
[19] Tao Ran, Bing Deng, et al. Fractional Fourier transform in signal processing research. Science in China Series F. 2006; 49(1): 1-25.
[20] Tao Ran, Bing Deng, Wang Yue. Fractional Fourier transform and its application. Beijing: Tsinghua University Press. China. 2009.
[21] Spib Noise data [EB/OL]. http://spib.rice.edu/spib/select_noise.html