TELKOMNIKA, Vol.14, No.1, March 2016, pp. 181~186
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v14i1.2021
Received May 13, 2014; Revised November 27, 2015; Accepted December 17, 2015
A Novel Scheme of Speech Enhancement using Power Spectral Subtraction - Multi-Layer Perceptron Network
Budiman P.A. Rohman*, Ken Paramayudha, Asep Yudi Hercuadi
Research Center for Electronics and Telecommunications, Indonesian Institute of Sciences
Kampus LIPI Gd.20 Lt. 4 Jl. Sangkuriang Bandung 40135, Indonesia
*Corresponding author, email: budiman.par@gmail.com
Abstract
This paper presents a novel method that removes noise from a noisy speech signal, in order to improve its quality, by combining power spectral subtraction with a multi-layer perceptron network. First, spectral subtraction processes the contaminated speech signal to obtain an initial estimate of the clean speech signal. A neural network then processes this signal, using the spectral subtraction parameters and the estimated speech signal, to improve its quality and intelligibility. The artificial neural network is a multi-layer perceptron network with three hidden layers, six inputs and one output. It was trained on three speech signals contaminated by white Gaussian noise at two SNR levels, 0 dB and 30 dB. The designed speech enhancement was then tested on ten noisy speech signals. In the experiments, the SNR improvement reached 7 dB when the input SNR was 0 dB. The proposed method also raised the PESQ score by up to 0.4 over the original power spectral subtraction. These results show that the proposed method improves both signal quality and intelligibility more than the original power spectral subtraction does, particularly at low input SNRs.
Keywords: speech enhancement, spectral subtraction, artificial neural network, multi-layer perceptron
Copyright © 2016 Universitas Ahmad Dahlan. All rights reserved.
1. Introduction
Speech enhancement is an important tool that supports many applications, especially in telecommunications such as mobile communication. It also directly affects the performance of human-machine interface applications such as speech recognition and speaker recognition, which are currently very popular. In many situations, high-level noise that degrades the speech signal can reduce the performance of these applications. Noise also degrades speech quality and intelligibility and harms the listener's perception, especially in mobile communication [1]. This is the main problem that speech enhancement addresses, and almost all speech enhancement approaches rely on estimating a short-time spectral gain [2].
Researchers around the world have investigated several types of speech enhancement algorithm. These fall into two groups: single-channel and multi-channel methods. Several studies report that multi-channel speech enhancement performs better than single-channel methods. However, single-channel methods are simple and cheap to implement, so they are still worth exploring and improving. The most popular single-channel speech enhancement method is spectral subtraction.
Spectral subtraction is a well-known noise reduction method and one of the first speech enhancement algorithms, although it suffers from musical noise. Boll first proposed it in 1979 [3] for suppressing acoustic noise in speech. In the same year, Berouti improved spectral subtraction [4]. He introduced an over-subtraction factor that overestimates the noise spectrum subtracted from the noisy speech signal. Although spectral subtraction dates back to 1979, it is still used in many applications today because it is computationally inexpensive [5]. However, while removing noise, spectral subtraction introduces noise-like artifacts that degrade both the quality and the intelligibility of the estimated speech signal. Current research on spectral subtraction techniques therefore concentrates on reducing or removing this noise [6].
Over the years, many researchers around the world have modified and improved spectral subtraction. In 2002, Kamath and Loizou [7] investigated multi-band spectral subtraction for removing colored noise from corrupted speech signals. Their work gave a definite improvement over conventional power spectral subtraction. Udrea et al. [8] investigated an improved spectral subtraction that uses a perceptual weighting filter, and it improved the quality of speech signals. In 2011, Verteletskaya and Simak [9] proposed a modified spectral subtraction for removing residual noise. Earlier, in 1998, Nishimura et al. [10] proposed speech enhancement using spectral subtraction in the wavelet domain.
In this paper, we propose a novel speech enhancement approach that combines spectral subtraction and an artificial neural network to optimize the enhancement. The neural network is a multi-layer perceptron network, which is among the most successful network structures [11]. In this scheme, spectral subtraction is the main process. The neural network acts as an optimizer: it recalculates the output of spectral subtraction and improves its quality and intelligibility. The aim of this research is to enhance the speech signal better than the original power spectral subtraction method while keeping its computational simplicity.
2. Spectral Subtraction and Artificial Neural Network
2.1. Spectral Subtraction (SS)
Spectral subtraction is a speech enhancement method that operates in the frequency domain. The algorithm has two steps: voice activity detection (VAD) and spectral subtraction. In the VAD step, each frame of the speech signal is labeled as voiced, unvoiced or silent. These labels are passed to the spectral subtraction step, which is the primary step of this speech enhancement method.
Assume that y(n) = x(n) + d(n) is the sampled noisy speech signal, consisting of the clean signal x(n) and the noise signal d(n).
In the frequency domain, the signal in the k-th frame can be represented as follows:
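$$Y(\omega,k) = \mathrm{DFT}\{w(n)\,y(n,k)\} \tag{1}$$
$$Y(\omega,k) = X(\omega,k) + D(\omega,k) \tag{2}$$
where y(n,k) is the k-th frame of y(n), w(n) is the analysis window, and X(ω,k) and D(ω,k) are the spectra of the clean speech and the noise.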
After the speech signal y(n) is transformed into the frequency domain, the spectrum, mean and standard deviation of the first frame (k = 1) are taken as the noise estimate.
$$N(\omega) = |Y(\omega,k)|, \quad k = 1 \tag{3}$$
$$\delta(\omega,k) = 20\left[\log_{10}|Y(\omega,k)| - \log_{10} N(\omega)\right] \tag{4}$$
$$\delta'(k) = \frac{1}{L}\sum_{\omega=1}^{L}\delta(\omega,k) \tag{5}$$
Here δ(ω,k) is the magnitude spectral distance between the signal and the noise, and L is the frame length. δ'(k) is the mean magnitude spectral distance in the frame. This value is compared with two predetermined thresholds, the noise margin (N_m) and the hangover (h). These are commonly set to 3 and 8, respectively. If the mean magnitude spectral distance is lower than the noise margin, the frame is labeled as noise. If the mean is higher than the hangover constant, the frame is labeled as speech.
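The VAD rule above can be sketched in Python with NumPy. This is a minimal sketch, not the authors' implementation. The text compares the mean distance with both thresholds but does not say how to label frames whose distance falls between them. The sketch therefore assumes the reading used in common implementations of this magnitude-spectral-distance VAD: the hangover h counts consecutive frames below the noise margin, and a frame is declared noise only after that count exceeds h. The function name and the small `eps` guard against log(0) are illustrative assumptions.

```python
import numpy as np


def spectral_distance_vad(frames_mag, noise_mag, noise_margin=3.0, hangover=8):
    """Label each frame as speech (1) or noise (0).

    frames_mag: (num_frames, L) magnitude spectra |Y(w, k)|
    noise_mag:  (L,) noise magnitude N(w), e.g. the first frame as in Eq. (3)
    Hypothetical helper; the hangover-counter rule is an assumption.
    """
    eps = 1e-12  # avoids log10(0)
    flags = np.zeros(len(frames_mag), dtype=int)
    noise_count = 0  # consecutive frames whose mean distance is below the noise margin
    for k, mag in enumerate(frames_mag):
        # Eq. (4): per-bin magnitude spectral distance in dB
        dist = 20.0 * (np.log10(mag + eps) - np.log10(noise_mag + eps))
        # Eq. (5): mean distance over the L bins of the frame
        mean_dist = dist.mean()
        noise_count = noise_count + 1 if mean_dist < noise_margin else 0
        # Declare noise only after more than `hangover` low-distance frames in a row
        flags[k] = 0 if noise_count > hangover else 1
    return flags


# Usage: 12 noise-only frames followed by 3 loud (speech-like) frames
frames = np.vstack([np.ones((12, 4)), 100.0 * np.ones((3, 4))])
print(spectral_distance_vad(frames, noise_mag=frames[0]))  # prints [1 1 1 1 1 1 1 1 0 0 0 0 1 1 1]
```

The first eight noise frames are still labeled speech because of the hangover. This delay is the usual purpose of a hangover: it stops weak word endings from being cut off.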
In the spectral subtraction step, this research uses an over-subtraction factor and a spectral floor based on the posterior SNR:
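$$|\hat{S}(\omega)|^2 = \begin{cases} |Y(\omega)|^2 - \alpha|\hat{D}(\omega)|^2, & \text{if } |Y(\omega)|^2 - \alpha|\hat{D}(\omega)|^2 > \beta|\hat{D}(\omega)|^2 \\ \beta|\hat{D}(\omega)|^2, & \text{otherwise} \end{cases} \tag{6}$$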
α is the over-subtraction factor, which overestimates the noise spectrum. β is the spectral floor factor, which prevents the speech signal from being removed entirely at the lowest levels. The optimal range of β is between 0.001 and 0.1. The over-subtraction factor is calculated with the following equation:
$$\alpha = \alpha_0 - \frac{3}{20}\,\mathrm{SNR}_{\text{posterior}}, \qquad -5\ \text{dB} \le \mathrm{SNR}_{\text{posterior}} \le 20\ \text{dB} \tag{7}$$
The posterior SNR (signal-to-noise ratio) is calculated with the following equation:
$$\mathrm{SNR}_{\text{posterior}} = \frac{|Y(\omega)|^2}{|\hat{D}(\omega)|^2} \tag{8}$$
where α0 is the target value of α for a signal at 0 dB SNR. For the power spectral subtraction step, the optimal range of α0 is between 3 and 6.
2.2. Artificial Neural Network (ANN)
An artificial neural network is modeled on the neurons of the biological human brain. As a representation of the human brain, a neural network generally consists of neurons, weights, activation functions and layers. A neuron is a simple processing unit that combines its weighted inputs and passes the result through an activation function. A weight is the value applied to an input of the neural network, and it is adapted during training. The activation function acts as a threshold applied after the weighted inputs are summed. A layer is a set of neurons in the neural network [11].
Figure 1. Common structure of an artificial neural network
In Figure 1, x_j is an input of the neural network. The weighted sum of the inputs, v_k, is computed as:
$$v_k = \sum_{j=1}^{p} w_{kj}\,x_j \tag{9}$$
The output of the neuron, y_k, is the result of applying the chosen activation function to v_k.
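For a single neuron, Eq. (9) followed by an activation function takes only a few lines of NumPy. The sketch assumes the logarithmic sigmoid used later in this paper, logsig(v) = 1/(1 + e^(-v)). It also adds the per-neuron bias mentioned in Section 3, which Eq. (9) leaves out. The function names are illustrative.

```python
import numpy as np


def logsig(v):
    """Logarithmic sigmoid activation: 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + np.exp(-v))


def neuron_output(x, w_k, b_k=0.0):
    """Output y_k of neuron k for an input vector x of length p."""
    v_k = np.dot(w_k, x) + b_k  # Eq. (9), plus an optional bias term
    return logsig(v_k)


print(neuron_output(np.array([1.0, 2.0]), np.array([0.5, -0.25])))  # prints 0.5
```

Here the weighted inputs cancel (0.5·1 − 0.25·2 = 0), so the sigmoid returns its midpoint value, 0.5.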
In the multi-layer perceptron structure, the neural network has several hidden layers between the input and output layers [11]. Weights generally connect the neurons in adjacent layers. This network can learn complex patterns. However, it is more complex to train and compute than a single-layer network.
3. Research Method
In this research, we propose a multi-layer perceptron network (MLPN) with three hidden layers and a logarithmic sigmoid activation function. The neural network consisted of
six inputs, one output and three hidden layers with 8, 4 and 2 neurons, respectively. Each neuron had a bias value. The six inputs were the enhanced speech signal, the estimated noise, the mean of the estimated noise, the estimated SNR, the gradient of the estimated SNR, and the VAD flag. The output of the neural network was the estimated clean speech signal.
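The 6-8-4-2-1 topology described above can be sketched as a forward pass in NumPy. This is an illustrative sketch. The paper specifies logarithmic sigmoid activations and random initial weights, but not the output-layer activation or the initialization scale. This sketch assumes a linear output neuron, so that negative speech samples can be produced, and Gaussian initialization with standard deviation 0.5. The helper names are hypothetical.

```python
import numpy as np

LAYER_SIZES = [6, 8, 4, 2, 1]  # 6 inputs, hidden layers of 8, 4 and 2 neurons, 1 output


def logsig(v):
    """Logarithmic sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-v))


def init_params(rng, sizes=LAYER_SIZES, scale=0.5):
    """Random weight matrix and bias vector for each layer (scale is an assumption)."""
    return [(rng.standard_normal((n_out, n_in)) * scale, rng.standard_normal(n_out) * scale)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def mlp_forward(params, x):
    """Map inputs x of shape (num_samples, 6) to outputs of shape (num_samples,)."""
    h = x
    for i, (W, b) in enumerate(params):
        v = h @ W.T + b
        is_output = i == len(params) - 1
        h = v if is_output else logsig(v)  # linear output neuron is an assumption
    return h[:, 0]


params = init_params(np.random.default_rng(0))
print(sum(W.size + b.size for W, b in params))      # prints 105
print(mlp_forward(params, np.zeros((5, 6))).shape)  # prints (5,)
```

With the biases included, the network has 105 trainable parameters (56 + 36 + 10 + 3 across the four weight layers). This is small enough for the Levenberg-Marquardt training mentioned below.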
Figure 2. General design of the proposed speech enhancement
After sampling, frame blocking and Hamming windowing (25 ms frames with 40% overlap), the noisy speech signal was transformed into the frequency domain with the Discrete Fourier Transform. The frequency-domain signal was then processed by spectral subtraction, which has two steps: VAD and spectral subtraction. The VAD used the magnitude spectral distance with a noise margin of 3 and a hangover constant of 8. The spectral subtraction step used β = 0.03. After spectral subtraction, the neural network processes the signal to improve its quality further.
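The framing and transform stage can be sketched as follows. The paper does not state the sampling rate, so the sketch assumes 8 kHz (narrow-band speech, which matches the PESQ use case). At that rate a 25 ms frame is 200 samples, and 40% overlap gives a hop of 120 samples. `np.fft.rfft` is used because the input is real, so only the non-negative frequency bins are kept. The function name is hypothetical.

```python
import numpy as np


def frame_and_transform(y, fs=8000, frame_ms=25.0, overlap=0.4):
    """Split y into Hamming-windowed frames and return their DFTs.

    Returns Y with shape (num_frames, frame_len // 2 + 1), one row per frame k.
    fs = 8000 Hz is an assumption; the paper does not state the sampling rate.
    """
    frame_len = int(round(fs * frame_ms / 1000.0))  # 25 ms -> 200 samples at 8 kHz
    hop = int(round(frame_len * (1.0 - overlap)))   # 40% overlap -> 120-sample hop
    if len(y) < frame_len:
        raise ValueError("signal is shorter than one frame")
    num_frames = 1 + (len(y) - frame_len) // hop
    # Row k holds the sample indices of frame k
    idx = np.arange(frame_len)[None, :] + hop * np.arange(num_frames)[:, None]
    frames = y[idx] * np.hamming(frame_len)
    return np.fft.rfft(frames, axis=1), frame_len, hop


Y, frame_len, hop = frame_and_transform(np.random.default_rng(0).standard_normal(8000))
print(Y.shape, frame_len, hop)  # prints (66, 101) 200 120
```

Trailing samples that do not fill a whole frame are dropped. A full system would typically zero-pad them and use overlap-add after the inverse DFT to rebuild the time-domain signal.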
The neural network in this filter was trained with a learning rate of 0.98, a maximum of 1000 epochs and a target mean squared error (MSE) of 1×10^-1. The training algorithm was Levenberg-Marquardt. The initial weights of the neural network were chosen randomly, and the activation function was the logarithmic sigmoid.
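Levenberg-Marquardt training can be illustrated on the same 6-8-4-2-1 network. The sketch below is a textbook version for illustration, not the toolbox routine the authors may have used. For brevity, the Jacobian is computed by central finite differences, whereas practical implementations usually obtain it by backpropagation. The damping factor mu starts at 1e-3 and is divided by 10 after each accepted step and multiplied by 10 after each rejected one. Training stops at the MSE goal of 0.1 or at the 1000-epoch limit given above. This form of LM has no explicit learning-rate term, so the reported value of 0.98 does not appear; mu controls the step size instead. The linear output neuron is an assumption, and all names are hypothetical.

```python
import numpy as np

SIZES = [6, 8, 4, 2, 1]  # inputs, hidden layers of 8, 4, 2 neurons, one output


def forward(theta, X):
    """Evaluate the network for a flat parameter vector theta on inputs X (N, 6)."""
    h, i = X, 0
    for layer, (n_in, n_out) in enumerate(zip(SIZES[:-1], SIZES[1:])):
        W = theta[i:i + n_in * n_out].reshape(n_out, n_in)
        i += n_in * n_out
        b = theta[i:i + n_out]
        i += n_out
        v = h @ W.T + b
        # logsig hidden layers (clipped to avoid exp overflow); linear output (assumption)
        h = v if layer == len(SIZES) - 2 else 1.0 / (1.0 + np.exp(-np.clip(v, -500.0, 500.0)))
    return h[:, 0]


def jacobian(theta, X, step=1e-6):
    """Central finite-difference Jacobian of the outputs with respect to theta."""
    J = np.empty((X.shape[0], theta.size))
    for p in range(theta.size):
        d = np.zeros_like(theta)
        d[p] = step
        J[:, p] = (forward(theta + d, X) - forward(theta - d, X)) / (2.0 * step)
    return J


def train_lm(X, t, rng, max_epochs=1000, mse_goal=0.1, mu=1e-3):
    n = sum(a * b + b for a, b in zip(SIZES[:-1], SIZES[1:]))
    theta = rng.standard_normal(n) * 0.5  # random initial weights
    mse = np.mean((forward(theta, X) - t) ** 2)
    epoch = 0
    while epoch < max_epochs and mse > mse_goal:
        epoch += 1
        e = forward(theta, X) - t  # residuals
        J = jacobian(theta, X)
        g, H = J.T @ e, J.T @ J
        while True:
            # LM step: solve (J^T J + mu I) delta = -J^T e
            delta = np.linalg.solve(H + mu * np.eye(n), -g)
            new_mse = np.mean((forward(theta + delta, X) - t) ** 2)
            if new_mse < mse:  # accept: move toward Gauss-Newton
                theta, mse, mu = theta + delta, new_mse, mu / 10.0
                break
            mu *= 10.0  # reject: move toward gradient descent
            if mu > 1e10:
                return theta, mse, epoch
    return theta, mse, epoch


rng = np.random.default_rng(0)
X = rng.standard_normal((40, 6))
t = 0.5 * np.tanh(X.sum(axis=1))  # toy target, for illustration only
theta, mse, epochs = train_lm(X, t, np.random.default_rng(1), max_epochs=20)
print(epochs, mse)
```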
For training, the filter used 3 different noisy speech signals at 2 noise levels, with SNRs of 0 dB and 30 dB. The noise used in this research was white noise. The training target of the neural network was the clean speech signal.
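Training pairs of this kind can be generated by scaling white Gaussian noise to a target SNR. In the sketch below, SNR is defined as the ratio of mean signal power to mean noise power, in dB. Each noise realization is rescaled so that the target SNR is met exactly. The function name, the 8 kHz rate and the synthetic test tone are illustrative only, since the paper does not describe how the noise was added.

```python
import numpy as np


def add_white_noise(x, snr_db, rng):
    """Return (x + d, d) where d is white Gaussian noise at the given SNR in dB."""
    x = np.asarray(x, dtype=float)
    d = rng.standard_normal(x.shape)
    target_noise_power = np.mean(x ** 2) / 10.0 ** (snr_db / 10.0)
    d *= np.sqrt(target_noise_power / np.mean(d ** 2))  # exact SNR for this draw
    return x + d, d


rng = np.random.default_rng(0)
fs = 8000                                         # assumed sampling rate
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # stand-in for a clean utterance
pairs = [add_white_noise(x, snr, rng) for snr in (0.0, 30.0)]
for (y, d), snr in zip(pairs, (0.0, 30.0)):
    print(snr, 10 * np.log10(np.mean(x ** 2) / np.mean(d ** 2)))
```

Each noisy signal y is an input during training, and the clean x is the target. Repeating this for three utterances at the two SNR levels gives the six training pairs described above.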
The enhancement runs sequentially, with spectral subtraction first and the neural network second. Spectral subtraction processes the contaminated speech signal to produce a first estimate of the clean speech. The neural network then processes this estimate further. Its role in this method is to improve the quality and intelligibility of the speech signal estimated by spectral subtraction. The network was trained to reconstruct the clean speech signal from the spectral subtraction parameters and the estimated noise (see Figure 2).
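The paper lists the six network inputs but does not say how they are aligned. The sketch below shows one possible arrangement, purely as a hypothetical illustration: one input row per (frame, frequency bin) pair. The spectral subtraction output and the noise estimate vary per bin. The mean noise, the posterior SNR, its frame-to-frame gradient (computed with `np.gradient`) and the VAD flag are repeated across all bins of a frame. All names and the layout are assumptions.

```python
import numpy as np


def assemble_features(enhanced, noise_est, snr_db, vad_flags):
    """Stack the six NN inputs into an array of shape (F * B, 6).

    enhanced:  (F, B) spectral subtraction output for F frames and B bins
    noise_est: (F, B) noise estimate used for each frame
    snr_db:    (F,)   posterior SNR of each frame in dB
    vad_flags: (F,)   VAD decision per frame (1 = speech, 0 = noise)
    The layout is a hypothetical choice; the paper does not specify it.
    """
    enhanced = np.asarray(enhanced, dtype=float)
    noise_est = np.asarray(noise_est, dtype=float)
    snr_db = np.asarray(snr_db, dtype=float)
    vad_flags = np.asarray(vad_flags, dtype=float)
    F, B = enhanced.shape
    noise_mean = noise_est.mean(axis=1)                       # mean estimated noise per frame
    snr_grad = np.gradient(snr_db) if F > 1 else np.zeros(F)  # frame-to-frame SNR slope
    per_frame = [noise_mean, snr_db, snr_grad, vad_flags]
    columns = [enhanced, noise_est] + [np.repeat(v[:, None], B, axis=1) for v in per_frame]
    return np.stack([c.reshape(-1) for c in columns], axis=1)


X = assemble_features(np.ones((3, 4)), 0.5 * np.ones((3, 4)), [0.0, 5.0, 10.0], [0, 1, 1])
print(X.shape)  # prints (12, 6)
```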
The analysis of this method (denoted NN-SS) has three parts: time-domain analysis, frequency-domain analysis, and a comparison of signal quality level (SNR) and PESQ (Perceptual Evaluation of Speech Quality) score. PESQ is the most complex of these measures to compute. It is the measure recommended by ITU-T for speech quality assessment of 3.2 kHz handset telephony and narrow-band speech codecs [12]. PESQ performed modestly well in predicting both quality and intelligibility [13]. The PESQ score ranges from -0.5 to 4.5, and a higher score means higher quality and intelligibility. All of these analyses are compared with the original power spectral subtraction method (denoted SS).
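The paper does not give the formula used for the output SNR in Table 1. This sketch assumes a common definition: the global SNR of an estimate ŝ(n) against the clean reference x(n), SNR = 10 log10(Σ x²(n) / Σ (x(n) − ŝ(n))²). PESQ is not sketched here, because it requires the ITU-T P.862 reference algorithm. The function name is hypothetical.

```python
import numpy as np


def output_snr_db(clean, estimate):
    """Global SNR (dB) of an enhanced signal relative to the clean reference."""
    clean = np.asarray(clean, dtype=float)
    error = clean - np.asarray(estimate, dtype=float)
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(error ** 2))


clean = np.ones(4)
print(output_snr_db(clean, clean + 0.1))  # error is 10% of the amplitude: about 20 dB
```

Segmental SNR, which averages this quantity over short frames, is a frequently used alternative. It weights quiet passages more heavily.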
4. Simulation Results and Analysis
Training of the designed neural network converged before the MSE goal was reached: it stopped at 87 epochs with an MSE of about 0.8755. The trained network was then tested on degraded noisy speech signals for further analysis.
4.1. Time Domain Analysis
This section compares speech signals enhanced by spectral subtraction with those enhanced by the improved spectral subtraction that uses a neural network.
Figure 3. Comparison of time-domain results of NN-SS and SS with input SNR 0 dB
Figure 3 compares four time-domain signals: the clean speech, the noisy speech at 0 dB SNR, the clean speech estimated by SS, and the speech estimated by NN-SS. The signal after NN-SS is clearly of better quality than the signal after SS alone. NN-SS also reconstructed and refined the clean speech signal. For example, samples 1 to 4000 contained only background noise, and NN-SS suppressed this noise to a minimal level. Around samples 5000 to 7000, SS had distorted the waveform, and NN-SS reconstructed a waveform resembling the original. NN-SS could also restore speech components that SS had removed, as seen around samples 9000 to 13000.
4.2. Frequency Domain Analysis
Figure 4 shows spectrograms of the signals filtered by NN-SS and by SS. After SS, residual noise remained, spread around the original signal. NN-SS removed this noise without destroying the shape of the speech signal produced by the spectral subtraction filter. In several segments, NN-SS also reconstructed speech components that SS had removed. Overall, the frequency-domain analysis also shows better signal quality for NN-SS than for SS.
Figure 4. Comparison of spectrograms of the results of NN-SS and SS with input SNR 0 dB
4.3. Comparison of SNR and PESQ Score
NN-SS was also evaluated by how much it improved signal quality and intelligibility. Table 1 lists the mean results of the proposed speech enhancement on 10 degraded speech signals with input SNRs from 0 to 15 dB.
Table 1. Comparison of SNR and PESQ results of NN-SS and SS
No   SNR (dB)                   PESQ
     Input   SS      NN-SS      Input   SS     NN-SS
1    0       3.12    7.01       1.56    1.56   1.93
2    5       5.16    11.03      1.92    2.02   2.42
3    10      9.31    12.84      2.31    2.48   2.63
4    15      14.53   13.77      2.67    2.94   2.84
Table 1 shows that NN-SS improved signal quality by up to 7 dB of SNR when the input SNR was 0 dB. The improvement fell steeply as the input quality rose. For example, at an input SNR of 5 dB the improvement was about 6 dB. These improvements were still larger than those of SS, especially when the input quality was very low (0-10 dB). In terms of PESQ, the output of NN-SS scored up to 0.4 higher than that of SS. NN-SS therefore performs better than SS in both quality and intelligibility for input SNRs from 0 to 10 dB. At 15 dB, SS scored slightly higher on both measures.
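The improvements quoted in this section can be recomputed directly from Table 1. The short Python snippet below uses only values copied from the table, with no new measurements. It lists the SNR gain of NN-SS over its noisy input, and the SNR and PESQ differences between NN-SS and SS at each input level.

```python
# Values copied from Table 1 (mean over the 10 test signals)
input_snr = [0, 5, 10, 15]
snr_ss = [3.12, 5.16, 9.31, 14.53]
snr_nnss = [7.01, 11.03, 12.84, 13.77]
pesq_ss = [1.56, 2.02, 2.48, 2.94]
pesq_nnss = [1.93, 2.42, 2.63, 2.84]

# SNR gain of NN-SS relative to its noisy input (dB)
gain_vs_input = [round(o - i, 2) for o, i in zip(snr_nnss, input_snr)]
# Differences between NN-SS and SS at the same input SNR
snr_vs_ss = [round(n - s, 2) for n, s in zip(snr_nnss, snr_ss)]
pesq_vs_ss = [round(n - s, 2) for n, s in zip(pesq_nnss, pesq_ss)]

print(gain_vs_input)  # [7.01, 6.03, 2.84, -1.23]
print(snr_vs_ss)      # [3.89, 5.87, 3.53, -0.76]
print(pesq_vs_ss)     # [0.37, 0.4, 0.15, -0.1]
```

The last column of each list shows that SS is slightly ahead of NN-SS at an input SNR of 15 dB.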
5. Conclusion
This paper has presented a speech enhancement method that combines power spectral subtraction with a multi-layer perceptron network, called Neural Network-Spectral Subtraction (NN-SS), in the presence of white Gaussian noise. Overall, NN-SS enhances the speech signal better than the original SS. For low-quality input signals, the method gives a significant improvement. Future research could address non-stationary noise. Further work could also reduce the computational complexity, especially for real-world implementation.
References
[1] Yan, Wang Guang, Geng Yan Xiang, Zhao Xiao Qun. A Signal Subspace Speech Enhancement Method for Various Noises. TELKOMNIKA Indonesian Journal of Electrical Engineering. 2013; 11(2): 726-735.
[2] Ou Shifeng, Chao Geng, Ying Gao. Improved a Priori SNR Estimation for Speech Enhancement Incorporating Speech Distortion Component. TELKOMNIKA Indonesian Journal of Electrical Engineering. 2013; 11(9): 5359-5364.
[3] Boll Steven F. A spectral subtraction algorithm for suppression of acoustic noise in speech. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'79). 1979; 4.
[4] Berouti M, R Schwartz, John Makhoul. Enhancement of speech corrupted by acoustic noise. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'79). 1979; 4.
[5] Vaseghi Saeed V. Advanced digital signal processing and noise reduction. John Wiley & Sons. 2008.
[6] Goel Paurav, Anil Garg. Review of Spectral Subtraction Techniques for Speech Enhancement 1. 2011.
[7] Kamath Sunil, Philipos Loizou. A multi-band spectral subtraction method for enhancing speech corrupted by colored noise. IEEE International Conference on Acoustics, Speech, and Signal Processing. 2002; 4.
[8] Udrea Radu Mihnea, Nicolae D Vizireanu, Silviu Ciochina. An improved spectral subtraction method for speech enhancement using a perceptual weighting filter. Digital Signal Processing. 2008; 18(4): 581-587.
[9] Verteletskaya Ekaterina, Boris Simak. Noise reduction based on modified spectral subtraction method. IAENG International Journal of Computer Science. 2011; 38(1): 82-88.
[10] Nishimura Ryouichi, et al. Speech enhancement using spectral subtraction with wavelet transform. Electronics and Communications in Japan (Part III: Fundamental Electronic Science). 1998; 81(1): 24-31.
[11] Hu Yu Hen, Jenq-Neng Hwang. Handbook of neural network signal processing. CRC Press. 2001.
[12] Hu Yi, Philipos C Loizou. Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing. 2008; 16(1): 229-238.
[13] Ma Jianfen, Yi Hu, Philipos C Loizou. Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions. The Journal of the Acoustical Society of America. 2009; 125(5): 3387-3405.