Indonesian Journal of Electrical Engineering and Computer Science
Vol. 2, No. 2, May 2016, pp. 367 ~ 379
ISSN: 2502-4752
DOI: 10.11591/ijeecs.v2.i2.pp367-379
Received January 1, 2016; Revised April 18, 2016; Accepted May 1, 2016
Speech Enhancement based on Wiener Filter and Compressive Sensing
Amart Sulong*1, Teddy Surya Gunawan*2, Othman O. Khalifa3, Mira Kartiwi4, Eliathamby Ambikairajah5
1,2,3 Department of Electrical and Computer Engineering, International Islamic University Malaysia (IIUM), Malaysia
2 Visiting Fellow, School of Electrical Engineering and Telecommunications, University of New South Wales (UNSW), Australia
4 Department of Information Systems, International Islamic University Malaysia (IIUM), Malaysia
5 School of Electrical Engineering and Telecommunications, University of New South Wales (UNSW), Australia
*Corresponding authors, e-mail: amartuia@gmail.com, tsgunawan@iium.edu.my
Abstract
Over the last few decades, many advanced technologies have been proposed in which communications and telecommunications applications play a major role. Noise elimination in various environments has become a primary concern because noise greatly hinders speech communication applications. The aim is to improve noisy speech in terms of quality and intelligibility without introducing additional noise, and many speech enhancement algorithms have been proposed for this purpose. The Wiener filter is one of the classical algorithms; it improves noisy speech by reducing its noise components through a selectively chosen Wiener gain. In this paper, a compressive sensing method with a randomized measurement matrix is combined with the Wiener filter to reduce the noise in the speech signal and produce a high signal-to-noise ratio. The perceptual evaluation of speech quality (PESQ) is used to measure the quality of the proposed algorithm. Experimental results show the effectiveness of the proposed algorithm in enhancing noisy signals corrupted by various noises compared with other traditional algorithms, with high PESQ scores achieved across various noises and different SNRs.
Keywords: speech enhancement, Wiener filter, compressive sensing (CS), perceptual evaluation of speech quality (PESQ)
Copyright © 2016 Institute of Advanced Engineering and Science. All rights reserved.
1. Introduction
1.1. Speech Enhancement Algorithms
Today's advanced technologies enable direct communication over large distances, with broader audiences, and in more challenging circumstances. These developments have made speech enhancement increasingly important and have attracted considerable interest from researchers in the field [1]. For example, an initial motivation in this area was to develop noise reduction algorithms that help hearing-impaired listeners (e.g., cochlear implant users) communicate better in noisy environments. More generally, speech enhancement aims to improve perceptual aspects of speech that have been degraded by additive noise [2]. However, there is always a tradeoff between noise reduction and signal distortion: better noise reduction is always accompanied by larger signal distortion [3]. Hence, the main challenge in speech enhancement is to design an effective algorithm that suppresses the noise without introducing perceptible distortion in the signal. Speech enhancement algorithms generally introduce two types of distortion: distortion that affects the speech signal itself, called speech distortion, and distortion that affects the background noise, called noise distortion [4-7]. The speech enhancement algorithms used to date can be grouped into three classes [2, 6-8], as explained in the following sections.
1.1.1. Spectral-Subtractive Algorithms
Spectral-subtractive algorithms were proposed by Weiss et al. [2, 9] in the correlation domain and later by Boll [2, 10] in the Fourier transform domain. In these algorithms, the noise spectrum is estimated during the speech pauses that normally occur in conversation. Spectral subtraction is widely known to suffer from perceptible artifacts because it introduces musical noise; however, it is the simplest enhancement algorithm to implement. Its basic principle assumes additive noise: an estimate of the noise spectrum is obtained from the noisy spectrum when speech is not present and is subtracted from the noisy signal. The short-term spectral amplitude (STSA) has been exploited successfully in the development of these methods. Subtractive-type algorithms compute the STSA of the noisy speech input and recover an estimate of the clean STSA by removing the part contributed by the additive noise. The unprocessed phase of the noisy input signal is used to synthesize the enhanced speech signal, under the assumption that the human ear cannot perceive distortions in the phase of the speech signal [11]. The enhanced signal is obtained by computing the inverse discrete Fourier transform of the estimated signal spectrum using the phase of the noisy signal. In other words, the noise is assumed to be uncorrelated with, and additive to, the speech signal, and the noise estimate is measured during silence or non-speech activity in the signal.
While the spectral subtraction method [11] is easily implemented and effectively reduces the noise present in the corrupted signal, it has some glaring shortcomings. The most obvious is residual, or musical, noise: the effectiveness of the noise removal process depends on obtaining an accurate spectral estimate of the noise signal, and the better the noise estimate, the less residual noise remains in the modified spectrum. However, since the noise spectrum cannot be obtained directly, the noise removal process is forced to use an average estimate of the noise. Hence, there are significant variations between the estimated noise spectrum and the actual noise content present in the instantaneous speech spectrum. Subtracting these quantities leaves isolated residual noise levels of large variance. These residual spectral components manifest themselves in the reconstructed time signal as varying tonal sounds, resulting in a musical disturbance of unnatural quality. This musical noise can be even more disturbing and annoying to the listener than the original noise. Several residual noise reduction algorithms have been proposed to overcome this problem. However, owing to the limitations of single-channel enhancement methods, it is not possible to remove this noise completely without compromising the quality of the enhanced speech. Hence, there is a tradeoff between the amount of noise reduction and the speech distortion introduced by the underlying processing.
In addition, distortion is caused by half- or full-wave rectification of the modified speech spectrum, which may contain negative values due to errors in the estimated noise spectrum. These values are rectified using half-wave rectification (set to zero) or full-wave rectification (set to their absolute value), which can lead to further distortion in the resulting time signal. Besides that, the noisy phase roughens the speech signal: the phase of the noise-corrupted signal is not enhanced before being combined with the modified spectrum to generate the enhanced time signal [12]. This is because the noise in the phase information does not contribute greatly to the degradation of speech quality, especially at high SNRs (>5 dB). However, at lower SNRs (<0 dB), the noisy phase can lead to perceivable roughness in the speech signal, reducing speech quality. Estimating the phase of the clean speech is rather difficult and would greatly increase the complexity of the method. Moreover, the distortion due to noisy phase information is not very significant compared with that of the magnitude spectrum, especially at high SNRs. Hence, using the noisy phase information is considered an acceptable practice in reconstructing the enhanced speech signal.
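The subtraction, rectification, and noisy-phase resynthesis steps described above can be sketched for a single frame as follows. This is a minimal illustration in Python with NumPy, not the implementation of [10] or [11]: it assumes power-spectrum subtraction, a noise estimate averaged over frames known to contain no speech, and no windowing or overlap-add. The function names and test signal are hypothetical.

```python
import numpy as np


def estimate_noise_psd(pause_frames):
    """Average noisy power spectrum |Y(k)|^2 over frames assumed to be speech pauses."""
    return np.mean([np.abs(np.fft.rfft(f)) ** 2 for f in pause_frames], axis=0)


def spectral_subtraction(noisy_frame, noise_psd, rectify="half"):
    """Power spectral subtraction on one frame, resynthesized with the noisy phase."""
    spectrum = np.fft.rfft(noisy_frame)
    diff = np.abs(spectrum) ** 2 - noise_psd  # may contain negative values
    if rectify == "half":
        clean_power = np.maximum(diff, 0.0)  # half-wave: negatives set to zero
    elif rectify == "full":
        clean_power = np.abs(diff)  # full-wave: negatives replaced by |value|
    else:
        raise ValueError("rectify must be 'half' or 'full'")
    # Combine the estimated magnitude with the unprocessed (noisy) phase.
    enhanced = np.sqrt(clean_power) * np.exp(1j * np.angle(spectrum))
    return np.fft.irfft(enhanced, n=len(noisy_frame))


rng = np.random.default_rng(0)
noise = 0.1 * rng.standard_normal((20, 256))  # hypothetical speech-pause frames
t = np.arange(256)
frame = np.sin(2 * np.pi * 8 * t / 256) + 0.1 * rng.standard_normal(256)
enhanced = spectral_subtraction(frame, estimate_noise_psd(noise))
```

Passing `rectify="full"` gives the full-wave variant. A practical system would also window the frames and overlap-add the outputs, which is omitted here for brevity.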
1.1.2. Statistical-Model-Based Methods and Wiener Filtering
A newer speech enhancement approach, known as speech boosting, increases the relative power of the speech, thus acting as a speech booster instead of focusing on suppressing the noise. Statistical-model-based speech enhancement algorithms [2, 6, 7] are posed in a statistical estimation framework: given a set of measurements corresponding to the Fourier transform coefficients of the noisy signal, the aim is to find a linear (or nonlinear) estimator of the parameter of interest, namely the transform coefficients of the clean signal. The Wiener filter and minimum mean-square error (MMSE) algorithms, among others, fall into this category. Work in this area was initiated by McAulay and Malpass [13], who proposed a maximum-likelihood approach for estimating the Fourier transform coefficients (spectrum) of the clean signal, and was followed by Ephraim and Malah [14], who proposed an MMSE estimator of the magnitude spectrum.
In addition, much of the work on the Wiener filter in the speech enhancement field was initiated by Lim and Oppenheim [15, 16]. Loizou [2] notes that statistical-model-based methods focus on nonlinear estimators of the magnitude (i.e., the modulus of the DFT coefficients) rather than of the complex spectrum of the signal, as done by the Wiener filter, using various statistical models and optimization criteria. These nonlinear estimators take the probability density functions (PDFs) of the noise and speech DFT coefficients explicitly into account and, in some cases, use non-Gaussian prior distributions. They are often combined with a soft-decision gain modification that takes the probability of speech presence into account.
In the statistical estimation framework of these nonlinear estimators, the measurements are the set of DFT coefficients of the noisy signal (i.e., the noisy spectrum), and the parameters of interest are the set of DFT coefficients of the clean signal (i.e., the clean signal spectrum). Various techniques exist in the estimation theory literature for deriving these nonlinear estimators, including maximum-likelihood estimation. The estimators differ primarily in the assumptions made about the parameter of interest (e.g., deterministic but unknown, or random) and in the form of the optimization criterion used. In [2], Loizou describes the following algorithms: the maximum-likelihood estimator, an MMSE magnitude estimator, and a log-MMSE estimator. Bayesian estimators of the magnitude spectrum based on perceptually motivated distortion measures are also described, as are MAP estimators of the magnitude and phase spectra. Several methods of incorporating speech-presence uncertainty into the preceding estimators are also discussed; when combined with the statistical estimators, these methods substantially reduce the residual noise.
Furthermore, Yang [16] notes that Wiener filter speech enhancement is also based on the short-time Fourier transform (STFT) and uses the same basic estimation principle as the spectral subtraction methods. The Wiener filter method can effectively reduce Gaussian noise. The STFT is also used in the minimum mean-square error short-time spectral amplitude (MMSE-STSA) method, which assumes that the noisy speech STFT coefficients of successive frames are independent Gaussian variables that can be statistically modeled to estimate the clean speech spectrum.
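A frequency-domain Wiener gain of the kind discussed above can be sketched as follows. The gain $H(k) = \xi(k)/(1+\xi(k))$ uses the a priori SNR $\xi(k)$. Here $\xi(k)$ is estimated by simple power subtraction with a floor, which is one common choice assumed for illustration. It is not necessarily the estimator used in [15, 16] or in the proposed algorithm, and the function names and floor value are hypothetical.

```python
import numpy as np


def wiener_gain(noisy_power, noise_psd, xi_floor=1e-3):
    """Wiener gain H = xi / (1 + xi) from an a priori SNR estimate xi."""
    gamma = noisy_power / np.maximum(noise_psd, 1e-12)  # a posteriori SNR
    xi = np.maximum(gamma - 1.0, xi_floor)  # a priori SNR by power subtraction
    return xi / (1.0 + xi)


def wiener_enhance_frame(noisy_frame, noise_psd):
    """Apply the real-valued Wiener gain to one frame's spectrum (noisy phase kept)."""
    spectrum = np.fft.rfft(noisy_frame)
    gain = wiener_gain(np.abs(spectrum) ** 2, noise_psd)
    return np.fft.irfft(gain * spectrum, n=len(noisy_frame))


gains = wiener_gain(np.array([1.0, 2.0, 11.0]), np.ones(3))
# gamma = [1, 2, 11] -> xi = [0.001, 1, 10] -> H approx [0.000999, 0.5, 0.909]
```

Because $\xi(k) > 0$, the gain always lies strictly between 0 and 1. Unlike spectral subtraction, this method therefore needs no rectification step.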
1.1.3. Subspace Algorithms
Unlike the preceding algorithms, subspace algorithms are rooted primarily in linear algebra theory. In addition to STFT-based techniques, vector subspace techniques have been used for speech enhancement [17, 18]. A vector subspace technique usually improves the speech signal in the following steps. First, the noisy speech is decomposed into a vector space. Then, the noisy speech vector space is divided into a signal subspace and a noise subspace. Finally, the noise subspace is removed and the speech signal is reconstructed from the signal subspace. Several transformation techniques can be used to apply vector subspaces to speech enhancement.
Most studies use the Karhunen-Loeve transform (KLT) or the discrete cosine transform (DCT) to decompose the noisy speech. The KLT is the optimal eigendecomposition technique, whereas the DCT is more computationally efficient. In general, vector subspace methods [17, 18] use a Laplacian or Gaussian model to describe the signal subspace and a Gaussian model to describe the noise subspace. For speech degraded by uncorrelated additive noise [17], the vector space of the noisy signal can be decomposed into a signal-plus-noise subspace and an orthogonal noise subspace. This decomposition is performed by applying an eigenvalue or singular value decomposition, or by applying the KLT. The noise subspace is removed first, and processing is then performed only on the vectors in the signal subspace. In this approach, noisy speech frames are classified into speech-dominated and noise-dominated frames: in speech-dominated frames the signal KLT matrix is used, and in noise-dominated frames the noise KLT matrix is used.
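The subspace steps above can be sketched for the simplest case of additive white noise with known variance. Frames are stacked as vectors and the noisy covariance is eigendecomposed (the KLT). Directions whose eigenvalue does not exceed the noise level are treated as the noise subspace and removed. The remaining signal-subspace directions are scaled by a Wiener-like gain. Non-overlapping frames, the frame length `dim`, and the trade-off parameter `mu` are assumptions of this sketch, not details taken from [17, 18]. The sketch also skips the speech-/noise-dominated frame classification.

```python
import numpy as np


def subspace_enhance(noisy, noise_var, dim=16, mu=1.0):
    """KLT/eigendecomposition subspace enhancement for additive white noise."""
    n_frames = len(noisy) // dim
    frames = noisy[: n_frames * dim].reshape(n_frames, dim)  # one vector per row
    cov = frames.T @ frames / n_frames  # noisy covariance estimate R_y
    eigvals, eigvecs = np.linalg.eigh(cov)  # columns of eigvecs: KLT basis
    clean_eig = np.maximum(eigvals - noise_var, 0.0)  # estimated clean eigenvalues
    denom = clean_eig + mu * noise_var
    # Wiener-like gain per KLT direction; directions with clean_eig == 0 form the
    # noise subspace and receive gain 0, i.e. they are removed.
    gains = np.divide(clean_eig, denom, out=np.zeros_like(clean_eig), where=denom > 0)
    h = eigvecs @ np.diag(gains) @ eigvecs.T  # symmetric estimator matrix
    return (frames @ h).reshape(-1)  # h is symmetric, so x^T h = (h x)^T


rng = np.random.default_rng(2)
t = np.arange(4096)
clean = np.sin(2 * np.pi * 0.05 * t)  # lies in a 2-D subspace of each frame
noisy = clean + np.sqrt(0.1) * rng.standard_normal(t.size)
enhanced = subspace_enhance(noisy, noise_var=0.1)
```

The test sinusoid spans only two of the sixteen KLT directions. Most of the white noise therefore falls in the removed or strongly attenuated noise subspace.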
1.2. Compressive Sensing
Compressive sensing (CS) is a fundamentally new approach to data acquisition and a new type of sampling theory. It predicts that sparse signals can be reconstructed from what was previously believed to be incomplete information [19]. The theory asserts that certain signals can be recovered from far fewer samples or measurements than traditional methods use [20]. CS theory relies on the empirical observation that many types of signals can be well approximated by a sparse expansion in terms of a suitable basis.
The traditional approach of reconstructing signals from measured data follows the well-known Shannon sampling theorem [21]. Many solutions to the sparse approximation problem have been proposed, such as matching pursuit (MP), the least absolute shrinkage and selection operator (LASSO), basis pursuit (BP), and gradient pursuit (GP). Their performance shows an interdependence between the number of measurements, the measurement noise, the signal sparsity, and the reconstruction algorithm [23].
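CS can be explained by considering a real-valued, finite-length, one-dimensional, discrete-time signal $\mathbf{x}$. This signal can be viewed as an $N \times 1$ column vector in $\mathbb{R}^N$ with elements $x[n]$, $n = 1, 2, \ldots, N$; higher-dimensional data can be treated in the same way by vectorizing them into a long one-dimensional vector. Any signal in $\mathbb{R}^N$ can be represented in terms of a basis of $N \times 1$ vectors $\{\boldsymbol{\psi}_i\}_{i=1}^{N}$. For simplicity, assume that the basis is orthonormal, and let $\boldsymbol{\Psi} = [\boldsymbol{\psi}_1, \ldots, \boldsymbol{\psi}_N]$ be the $N \times N$ transform matrix with the vectors $\boldsymbol{\psi}_i$ as columns. $\boldsymbol{\Psi}$ is generally viewed as a transform domain, such as the wavelet transform (WT), the discrete cosine transform (DCT), or the discrete Fourier transform (DFT). A signal $\mathbf{x}$ can then be expressed as

$$\mathbf{x} = \sum_{i=1}^{N} s_i \boldsymbol{\psi}_i = \boldsymbol{\Psi}\mathbf{s} \qquad (1)$$

where $\mathbf{s}$ is the $N \times 1$ column vector of weighting coefficients $s_i = \langle \mathbf{x}, \boldsymbol{\psi}_i \rangle = \boldsymbol{\psi}_i^T \mathbf{x}$, and $(\cdot)^T$ denotes transposition. Clearly, $\mathbf{x}$ and $\mathbf{s}$ are equivalent representations of the signal, with $\mathbf{x}$ in the time domain and $\mathbf{s}$ in the $\boldsymbol{\Psi}$ domain.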
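In the CS method [22], this signal representation is the foundation of transform coding, which can compress signals that are well approximated by a few transform coefficients in data acquisition systems. Transform coding thus plays a central role in sampling the data signal. The CS approach addresses the inefficiencies of the classical approach based on the Shannon-Nyquist theorem. It does so by directly acquiring a compressed signal representation, without going through the intermediate stage of acquiring all $N$ samples. Consider a general linear measurement process that computes $M \ll N$ inner products between $\mathbf{x}$ and a collection of vectors $\{\boldsymbol{\phi}_j\}_{j=1}^{M}$, as in $y_j = \langle \mathbf{x}, \boldsymbol{\phi}_j \rangle$. Arrange the measurements $y_j$ in an $M \times 1$ vector $\mathbf{y}$ and the measurement vectors $\boldsymbol{\phi}_j^T$ as rows in an $M \times N$ matrix $\boldsymbol{\Phi}$. Then, by substituting $\mathbf{x}$ from (1), $\mathbf{y}$ can be written as

$$\mathbf{y} = \boldsymbol{\Phi}\mathbf{x} = \boldsymbol{\Phi}\boldsymbol{\Psi}\mathbf{s} = \boldsymbol{\Theta}\mathbf{s} \qquad (2)$$

where $\boldsymbol{\Theta} = \boldsymbol{\Phi}\boldsymbol{\Psi}$ is an $M \times N$ matrix. With a random $\boldsymbol{\Phi}$, $\boldsymbol{\Theta}$ represents a random linear measurement process. Typically, $M \geq cK\log(N/K)$ measurements suffice for a $K$-sparse $\mathbf{s}$ (one with only $K$ nonzero coefficients), where $c$ is a small constant. The measurement process is not adaptive, meaning that $\boldsymbol{\Phi}$ is fixed and does not depend on the signal $\mathbf{x}$.

The problem consists of designing two things. The first is a stable measurement matrix $\boldsymbol{\Phi}$ such that the salient information in any $K$-sparse or compressible signal is not damaged by the dimensionality reduction from $\mathbf{x} \in \mathbb{R}^N$ to $\mathbf{y} \in \mathbb{R}^M$. The second is a reconstruction algorithm that recovers $\mathbf{x}$ from only $M$ measurements $\mathbf{y}$, about as many measurements as the number of coefficients recorded by a traditional transform coder (see Figure 1).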
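Equations (1) and (2) can be checked numerically with the sketch below. It assumes an orthonormal DCT-II basis for $\boldsymbol{\Psi}$, a Gaussian random $\boldsymbol{\Phi}$ scaled by $1/\sqrt{M}$, and the constant $c = 2$ in $M \geq cK\log(N/K)$. All of these choices, and the function names, are illustrative rather than the settings used in this paper.

```python
import numpy as np


def dct_basis(n):
    """Orthonormal DCT-II basis matrix Psi whose columns are the vectors psi_i."""
    k = np.arange(n)[:, None]  # frequency index (rows of the analysis matrix)
    t = np.arange(n)[None, :]  # time index
    analysis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    analysis[0, :] /= np.sqrt(2.0)  # scale the DC row for orthonormality
    return analysis.T


def measurements_needed(k, n, c=2.0):
    """Rule of thumb M >= c K log(N/K); the constant c is an assumption."""
    return int(np.ceil(c * k * np.log(n / k)))


rng = np.random.default_rng(3)
N, K = 256, 8
M = measurements_needed(K, N)  # 56 for these values
psi = dct_basis(N)
s = np.zeros(N)
s[rng.choice(N, K, replace=False)] = rng.standard_normal(K)  # K-sparse coefficients
x = psi @ s  # Eq. (1): x = Psi s
phi = rng.standard_normal((M, N)) / np.sqrt(M)  # random Gaussian measurement matrix
theta = phi @ psi
y = phi @ x  # Eq. (2): y = Phi x = Theta s
```

Here 256 samples are summarized by 56 nonadaptive measurements. The matrix `phi` is drawn once and does not depend on `x`, which matches the nonadaptive measurement process described above.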
CS exploits the sparsity of the desired signal, i.e., the existence of a sparse representation in a known transform domain: the number of significant (strictly speaking, nonzero) components is small compared with the signal length. The sparse representation takes the form $\mathbf{x} = \boldsymbol{\Psi}\mathbf{s}$, where

$$\|\mathbf{s}\|_0 = \sum_{i=1}^{N} |s_i|^0$$

counts the number of nonzero components of $\mathbf{s}$ (with the convention $0^0 = 0$); a signal with $\|\mathbf{s}\|_0 = K$ is called $K$-sparse. Using an appropriate observation matrix, CS can compress such a signal into a much smaller observation space. The nonlinear reconstruction techniques developed for building sparse representations can then be used to decode the signal. Furthermore, it has been shown that compressive sensing can be almost as efficient as traditional sampling with a sparse transform-domain representation. This holds in terms of both the number of samples and the number of bits required to encode them, with a low reconstruction error.
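The $\ell_0$ count and the idea of a sparse ($K$-term) transform-domain approximation can be illustrated as follows. The sketch assumes an orthonormal DCT-II basis. It counts coefficients above a tolerance, because in floating point "nonzero" is a threshold choice. The helper names are hypothetical.

```python
import numpy as np


def dct_basis(n):
    """Orthonormal DCT-II basis matrix Psi (columns are basis vectors)."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    analysis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    analysis[0, :] /= np.sqrt(2.0)
    return analysis.T


def l0_norm(s, tol=0.0):
    """||s||_0: number of coefficients whose magnitude exceeds tol."""
    return int(np.count_nonzero(np.abs(s) > tol))


def best_k_term(x, psi, k):
    """Keep the k largest-magnitude transform coefficients of x and resynthesize."""
    s = psi.T @ x  # analysis: s_i = <x, psi_i>
    keep = np.argsort(np.abs(s))[-k:]  # indices of the k largest coefficients
    s_k = np.zeros_like(s)
    s_k[keep] = s[keep]
    return psi @ s_k, s_k


psi = dct_basis(64)
s_true = np.zeros(64)
s_true[[3, 10, 40]] = [2.0, -1.0, 0.5]  # a 3-sparse coefficient vector
x = psi @ s_true
x_3, s_3 = best_k_term(x, psi, 3)  # exact, since x is 3-sparse in this basis
```

Because the basis is orthonormal, the error of a $K$-term approximation equals the energy of the discarded coefficients. For example, keeping only one term here leaves an error energy of $1^2 + 0.5^2 = 1.25$.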
2. Proposed Speech Enhancement Algorithm
Various speech enhancement algorithms have been proposed to improve the performance of modern communication devices in noisy environments. In a real environment, the background noise level and its characteristics change constantly. A reliable and fair comparison between algorithms in the presence of noisy signals has therefore proved elusive. Several studies show that this difficulty stems from the lack of a common speech database for evaluating new algorithms, differences in the types of noise used, and differences in testing methodology. Furthermore, understanding the speech characteristics and having a common speech database help in designing speech enhancement algorithms. They also make it possible for researchers to compare, at the very least, the objective performance of their algorithms with that of others.
Figure 1. The compressive sensing approach for sensing the measurement matrix. [Diagram labels: N, x, x̂, K-sparse, M, y]
Figure 2. The proposed speech enhancement based on Wiener filter and compressive sensing
Figure 1 shows the CS modification applied to the speech signal to eliminate the noise. Figure 2 shows the proposed algorithm, which applies the Wiener filter and compressive sensing to the speech enhancement process. As shown in Figure 2, the proposed algorithm starts by acquiring the noisy speech. The noisy signal is then processed by the Wiener filter, which estimates the noise and the speech and calculates the gain parameters of the speech and noise. Compressive sensing (CS), using the gradient projection for sparse reconstruction (GPSR) algorithm [24], then determines the amount of noise reduction and produces the estimate of the speech signal. Finally, the synthesis block produces the enhanced speech signal, whose quality is later evaluated using PESQ.
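In Figure 1, the CS processing uses GPSR to estimate the clean speech signal from the measurements by solving

$$\min_{\mathbf{s}} \; \frac{1}{2}\|\mathbf{y} - \boldsymbol{\Theta}\mathbf{s}\|_2^2 + \tau\|\mathbf{s}\|_1 \qquad (3)$$

where $\mathbf{s} \in \mathbb{R}^N$, $\mathbf{y} \in \mathbb{R}^M$, and $\boldsymbol{\Theta}$ is an $M \times N$ matrix. Here $\tau$ is a nonnegative parameter, $\|\mathbf{s}\|_1 = \sum_i |s_i|$ is the $\ell_1$ norm of $\mathbf{s}$, and $\|\mathbf{v}\|_2$ denotes the Euclidean norm of a vector $\mathbf{v}$. Equation (3) is related to the following convex constrained optimization problems:

$$\min_{\mathbf{s}} \; \|\mathbf{s}\|_1 \quad \text{subject to} \quad \|\mathbf{y} - \boldsymbol{\Theta}\mathbf{s}\|_2^2 \leq \varepsilon \qquad (4)$$

and

$$\min_{\mathbf{s}} \; \|\mathbf{y} - \boldsymbol{\Theta}\mathbf{s}\|_2^2 \quad \text{subject to} \quad \|\mathbf{s}\|_1 \leq t \qquad (5)$$

where $\varepsilon$ and $t$ are nonnegative real parameters. GPSR was used because its reconstruction quality can be traded off against the available processing power. The estimate is then transformed back through the inverse transform and synthesized to obtain the enhanced speech signal. At the end of the process, the quality of the speech signal is measured using the perceptual evaluation of speech quality (PESQ) score [2].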
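Problem (3) can be solved in the spirit of GPSR with the compact sketch below. As in GPSR [24], $\mathbf{s}$ is split into nonnegative parts $\mathbf{s} = \mathbf{u} - \mathbf{v}$. This turns (3) into a bound-constrained quadratic program that is solved by projected gradient steps. Unlike the published GPSR, this sketch uses a fixed step $1/L$ with $L = 2\|\boldsymbol{\Theta}\|_2^2$ instead of Barzilai-Borwein or backtracking steps. It also runs a fixed number of iterations, so it is slower and serves only as an illustration. The function name and defaults are hypothetical.

```python
import numpy as np


def gpsr_fixed_step(theta, y, tau, n_iter=500):
    """Minimize 0.5*||y - theta @ s||_2^2 + tau*||s||_1 via the split s = u - v."""
    n = theta.shape[1]
    u = np.zeros(n)
    v = np.zeros(n)
    gram = theta.T @ theta
    b = theta.T @ y
    step = 1.0 / (2.0 * np.linalg.norm(theta, 2) ** 2)  # 1/L for the split QP
    for _ in range(n_iter):
        r = gram @ (u - v) - b  # gradient of the quadratic term with respect to s
        u = np.maximum(u - step * (tau + r), 0.0)  # gradient step, project onto u >= 0
        v = np.maximum(v - step * (tau - r), 0.0)  # gradient step, project onto v >= 0
    return u - v


# With theta = I, the minimizer of (3) is soft-thresholding of y by tau.
s_hat = gpsr_fixed_step(np.eye(5), np.array([3.0, -2.0, 0.5, 0.0, -0.2]), tau=1.0)
# s_hat is approximately [2, -1, 0, 0, 0]
```

The identity case reproduces the known closed-form solution (soft-thresholding), which makes a convenient sanity check. For real use, the published GPSR step-size rules converge much faster.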
3. Results and Discussion
The performance of the proposed algorithm and the other algorithms was evaluated using the objective PESQ measure of ITU-T P.862 [25]. PESQ correlates with subjective test results at 93.5%, which compares favorably with other objective measures [2]. The PESQ objective assessment was carried out for four types of noise (babble, car, exhibition, and restaurant) at 0, 5, 10, and 15 dB SNR.
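Noisy test signals at a prescribed SNR, such as the 0, 5, 10, and 15 dB conditions above, are typically produced by scaling a noise recording before adding it to the clean speech. The sketch below uses plain mean signal power. Standard corpora often measure the active speech level instead (e.g., with ITU-T P.56), so this is a simplification. The function names and stand-in signals are hypothetical.

```python
import numpy as np


def mix_at_snr(clean, noise, target_db):
    """Add noise to clean speech so that 10*log10(P_clean / P_noise) == target_db."""
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (target_db / 10.0)))
    return clean + scale * noise


def snr_db(clean, noisy):
    """Measured SNR of a noisy signal relative to its clean reference."""
    residual = noisy - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))


rng = np.random.default_rng(5)
speech = np.sin(2 * np.pi * 0.01 * np.arange(8000))  # stand-in for clean speech
noise_recording = rng.standard_normal(10000)  # stand-in for a noise recording
conditions = {level: mix_at_snr(speech, noise_recording, level) for level in (0, 5, 10, 15)}
```

Each entry of `conditions` has a measured SNR equal to its key, so the same clean utterance can be evaluated at all four noise levels.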
A further speech quality assessment, introduced in [7], is expressed as the percentage PESQ improvement $\Delta\mathrm{PESQ}\%$:

$$\Delta\mathrm{PESQ}\% = \frac{\mathrm{PESQ}_{proc} - \mathrm{PESQ}_{ref}}{\mathrm{PESQ}_{ref}} \times 100\% \qquad (6)$$

where $\mathrm{PESQ}_{proc}$ is the PESQ score of the enhanced speech and $\mathrm{PESQ}_{ref}$ is the PESQ score of the clean speech used as the reference. The improvement is also evaluated for speech signals corrupted by noise in various environments and at various SNRs. The objective measures used the noisy speech corpus (NOIZEUS), based on the IEEE subcommittee 1996 standard [2].
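Equation (6) is simple arithmetic once the two PESQ scores are available. Computing the scores themselves requires an ITU-T P.862 implementation, which is not shown here. The helper name is hypothetical.

```python
def pesq_improvement(pesq_proc, pesq_ref):
    """Percentage PESQ improvement, Eq. (6): 100 * (PESQ_proc - PESQ_ref) / PESQ_ref."""
    if pesq_ref == 0:
        raise ValueError("PESQ_ref must be non-zero")
    return 100.0 * (pesq_proc - pesq_ref) / pesq_ref


print(pesq_improvement(2.5, 2.0))  # 25.0
```

A negative result means that the processed speech scored below the reference.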
The traditional algorithms used for comparison are the original Wiener filter algorithm [26], spectral subtraction (specsub) [27], ss_rdc [28], logmmse_SPU [29], and klt [30].
Figure 3 compares the enhanced speech waveforms of the proposed algorithm and the other traditional methods. Across the various noise environments, the proposed algorithm produced better speech waveforms than the traditional methods, whereas the klt and ss_rdc algorithms heavily distorted the speech signal. Figure 4 clearly shows the worst-case scenario for klt, which suppressed most of the identity of the speech signal; the same can be seen in Figure 5. Overall, the proposed algorithm showed the best improvement among the algorithms in Figures 3, 4, and 5, while logmmse_SPU produced acceptable results.
Figure 3. Comparison of the enhanced speech waveform of the proposed algorithm with other algorithms for babble noise “sp9.wav” at 0 dB SNR. [Panels: clean speech, noisy speech, proposed algorithm, klt, logmmse_SPU, and ss_rdc waveforms]
Figure 6 compares the PESQ scores of the proposed algorithm with those of the traditional methods under various noise conditions (restaurant, exhibition, car, and babble noise) at 0, 5, 10, and 15 dB SNR. For restaurant and exhibition noise, the proposed algorithm outperforms the traditional methods in PESQ score. Notably, for restaurant noise at 0 dB SNR, the enhanced speech had a lower PESQ score than the noisy speech. However, when the SNR was increased to 5, 10, and 15 dB, the PESQ scores indicated better speech quality, especially for the proposed algorithm. At 5 and 10 dB SNR, most of the traditional methods, except klt and Wiener, show PESQ scores close to that of the noisy speech.
Figure 4. The spectrograms of the proposed algorithm compared with other algorithms for babble noise “sp9.wav” at 0 dB SNR. [Panels: clean speech, noisy speech, proposed algorithm, klt, logmmse_SPU, and ss_rdc]
In Figure 6, the proposed algorithm gives the best results for car and babble noise compared with the other traditional methods. The PESQ scores of the traditional methods are only marginally competitive with that of the noisy speech. Only at 15 dB SNR did specsub, logmmse, and klt produce better PESQ scores than the proposed algorithm. For babble noise at 0 dB SNR, most of the traditional methods scored lower than the noisy speech, whereas the proposed algorithm outperformed the other methods. In particular, for babble noise the proposed algorithm clearly achieves the best performance compared with both the other methods and the noisy speech.
Figure 5. The spectral density of the proposed algorithm compared with other algorithms for babble noise “sp9.wav” at 0 dB SNR. [Panels: clean speech, noisy speech, proposed algorithm, klt, logmmse_SPU, and ss_rdc]
Figure 6. PESQ score comparison of the proposed algorithm with the other algorithms. [Panels: Restaurant Noise, Exhibition Noise, Car Noise, Babble Noise; PESQ scores (axis from 0 to 3.5) at 0, 5, 10, and 15 dB SNR for Noisy, Proposed method, Wiener, Specsub, SS_rdc, logmmse SPU, and KLT]