TELKOMNIKA, Vol.13, No.1, March 2015, pp. 103~117
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v13i1.295
Received August 26, 2014; Revised October 9, 2014; Accepted December 10, 2014
Saron Music Transcription Based on Rhythmic Information Using HMM on Gamelan Orchestra
Yoyon K. Suprapto1, Yosefine Triwidyastuti2
1Department of Electrical Engineering, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
2Sekolah Tinggi Manajemen Informatika dan Teknik Komputer (STIKOM), Surabaya, Indonesia
email: yoyonsuprapto@ee.its.ac.id1, yosefine_its@yahoo.com2
Abstract
Nowadays, exploration of eastern music is needed to revive its popularity, which has been neglected by the public, especially the younger generation. Onset detection in gamelan music signals is needed to help beginners follow the beats and the notation. We propose a Hidden Markov Model (HMM) method for detecting the onset of each event in the saron sound. The average F-measure of the onset detection was analyzed to generate notations. The experiment demonstrates a 97.83% F-measure for music transcription.
Keywords: gamelan music, music tempo, onset detection, hidden markov model, music transcription
1. Introduction
Gamelan is a traditional Indonesian musical instrument that comes from Java. To preserve gamelan as a national heritage and to bring back the greatness this music enjoyed in the 17th-18th centuries, efforts must be made to make people more familiar with gamelan and to help them play the instrument more easily.
Gamelan consists of about fifteen groups of different instruments, such as Saron, Kenong, Kempul, Kendang, and Bonang. Some of the instruments, such as saron and bonang, have the same fundamental frequency [1] (see Figure 1). In gamelan music, saron and bonang are not sounded at the same time: the bonang is struck half a beat before the saron, where a beat is defined here as the interval between two consecutive saron sounds [2]-[3] (see Figure 2).
Figure 1. Spectra of bonang and saron
Figure 2. How saron and bonang are played
One commonly used approach to onset detection is feature-based onset detection [4]-[8]. The disadvantage of this conventional method is that it is susceptible to weak onset features and to spurious peaks that do not correspond to an onset event (see Figure 3).
Figure 3. Spurious peaks in the saron magnitude extraction result from a real recorded gamelan orchestra signal
Another difficulty is that gamelan instruments are handmade and tuned according to the ear of the craftsmen. Thus, gamelan music signals often fluctuate in amplitude, frequency and phase [1]. These fluctuations may lead to differently shaped signal envelopes.
We propose a Hidden Markov Model (HMM) approach to predict the likely timing of onsets in gamelan music signals. The HMM method allows information such as tempo to be incorporated into onset detection [9]-[10]. Tempo information provides a prediction for early detection of onsets and helps discriminate them from false peaks corresponding to another instrument's sound. HMM offers efficient computation and does not need a large amount of training data.
Transcription is essentially done by detecting the onsets of a specific type of instrument. Many studies have been carried out to detect the onsets of musical events, but they focus mainly on western music. Common onset detection methods need to be improved to detect the onsets of eastern musical instruments such as the gamelan.
Standard onset detection methods first locate peaks by measuring abrupt changes in the energy content, magnitude, frequency, or phase of the music signal, and then apply a threshold on peak height to decide whether each peak is an onset [5]-[8]. If a peak's height is above the threshold, the peak is considered an onset event; otherwise it is not. The accuracy of this onset detection therefore depends only on the single peak analyzed at the current time. Consequently, this method cannot distinguish spurious peaks caused by the bonang signal from real onset events, nor recognize weak onset peaks, given the many fluctuations in the gamelan music signal, as in Figure 3.
In a gamelan ensemble, the sound of one instrument is always interfered with by those of other instruments. For example, the extracted saron sound may still contain bonang sound, since both instruments have the same fundamental frequency. However, the presence of bonang sound can be distinguished from saron sound by comparing the spectral envelopes of the two sounds, since the bonang sound (60 ms) has a shorter envelope than the saron sound (300 ms).
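The duration contrast quoted above (about 60 ms for bonang against about 300 ms for saron) can be turned into a simple test. The sketch below is not the authors' method; it assumes an amplitude envelope sampled at a known rate, measures how long the envelope stays above 10% of its peak, and uses a hypothetical 180 ms boundary between the two instruments. The names `envelope_duration`, `classify_by_envelope` and `synthetic_hit` are illustrative.

```python
import math

def envelope_duration(env, rate_hz, ratio=0.1):
    """Seconds from the envelope peak until it first falls below ratio * peak."""
    peak_idx = max(range(len(env)), key=env.__getitem__)
    floor = ratio * env[peak_idx]
    for i in range(peak_idx, len(env)):
        if env[i] < floor:
            return (i - peak_idx) / rate_hz
    return (len(env) - peak_idx) / rate_hz  # did not decay inside the buffer

def classify_by_envelope(env, rate_hz, boundary_s=0.18):
    """Hypothetical rule: short decay -> bonang, long decay -> saron."""
    return "bonang" if envelope_duration(env, rate_hz) < boundary_s else "saron"

def synthetic_hit(decay_s, rate_hz=1000, length_s=0.5, ratio=0.1):
    """Exponential envelope that reaches ratio * peak after decay_s seconds."""
    tau = decay_s / math.log(1 / ratio)
    return [math.exp(-(n / rate_hz) / tau) for n in range(int(length_s * rate_hz))]

bonang = synthetic_hit(0.060)
saron = synthetic_hit(0.300)
print(classify_by_envelope(bonang, 1000), classify_by_envelope(saron, 1000))  # bonang saron
```

Real envelopes are noisier than these synthetic decays, so in practice the envelope would be smoothed before the duration is measured.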
In conventional onset detection methods, each peak is evaluated individually, without considering its temporal relationship with other peaks. Yet once an onset has appeared, the next onset will not appear in the near future until a certain time interval has passed. A clear example is the interval between beats, which the audience can often follow in a piece of music. This is important information that is difficult to incorporate into feature-based onset detection.
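The temporal constraint described above can be illustrated with a deliberately simple rule that is not part of the proposed method: accept candidate onsets in time order and reject any candidate that arrives sooner than a minimum gap after the last accepted onset. The function name and the gap value are hypothetical; the HMM introduced in Section 2.3 replaces this hard rule with probabilities.

```python
def enforce_min_gap(candidates, min_gap):
    """Keep candidate onset frames that are at least min_gap frames after the
    previously accepted onset (candidates are processed in time order)."""
    accepted = []
    for frame in sorted(candidates):
        if not accepted or frame - accepted[-1] >= min_gap:
            accepted.append(frame)
    return accepted

# Frames 14 and 62 fall too soon after 10 and 60, so they are rejected.
print(enforce_min_gap([10, 14, 60, 62, 110], min_gap=30))  # [10, 60, 110]
```

A weakness of such a hard rule is that an early spurious peak is accepted and then blocks the true onset that follows it, which is one motivation for weighing observations and timing jointly.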
Other recent onset detection methods use machine learning. An artificial neural network can be trained to detect the onsets of events [11]-[12]. An important prerequisite for machine learning methods is that the training data must be large enough to represent the actual data. In some cases, the training data should amount to as much as 70% of all actual data [2]. Because signal fluctuations occur in much gamelan music, all the variations of the signals must also be included in the training process. To detect the onsets of events from many different gamelan instruments, the network cannot be trained on just one type of gamelan instrument. Thus, a large database is required to train the network.
The three stages of onset detection methods can be seen in Figure 4.
1. Preprocessing: an optional initial process that accentuates or attenuates aspects of the original signal relevant to onset detection. At this stage, the signal's spectrogram is divided into multiple frequency bands.
2. Reduction: the most important part of onset detection, in which the original signal is converted into a sampled detection function. In general, the original signal is transformed into the detection function using standard features such as the explicit signal amplitude, frequency or phase.
3. Peak-picking: the final process, performed after the detection function has been formed and onset peaks have begun to appear. At this stage, smoothing or normalization can be applied first to facilitate the peak-picking process, which identifies the locations of local maxima that lie above the threshold.
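The third stage can be sketched in a few lines. This is a generic illustration, not the implementation used in [8]: the detection function is min-max normalized, smoothed with a three-point moving average, and a frame is kept when it is a local maximum above a fixed threshold. The function names and the default threshold are assumptions.

```python
def normalize(x):
    """Min-max normalize a detection function to [0, 1]."""
    lo, hi = min(x), max(x)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in x]

def smooth(x, radius=1):
    """Moving average over 2*radius + 1 frames (shorter at the edges)."""
    out = []
    for i in range(len(x)):
        window = x[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

def pick_peaks(detection, threshold=0.5, radius=1):
    """Frames that are local maxima of the smoothed, normalized function above threshold."""
    d = smooth(normalize(detection), radius)
    # The first and last frames are skipped because they lack a neighbour on one side.
    return [i for i in range(1, len(d) - 1)
            if d[i] > threshold and d[i] >= d[i - 1] and d[i] > d[i + 1]]

detection = [0, 2, 4, 2, 0, 0, 0, 4, 8, 4, 0, 0, 1, 0]
print(pick_peaks(detection, threshold=0.3))  # [2, 8]
print(pick_peaks(detection, threshold=0.5))  # [8]
```

The example shows how the choice of a single fixed threshold decides whether the weaker peak at frame 2 survives.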
Figure 4. Flowchart of Standard Onset Detection [8]
1.1. Previous Onset Detection Methods
There are several onset detection methods:
a. Spectral Flux
The Spectral Flux (SF) method is based on detecting sudden positive changes in signal energy, which indicate the beginning of a new event. Spectral flux measures the change in the amount of energy in each frequency bin; these changes are summed to give the onset detection function. The formulas are written in eq. (1) and (2) [7].
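$$SF(n) = \sum_{k=-N/2}^{N/2-1} H\big(|X(n,k)| - |X(n-1,k)|\big) \qquad (1)$$

$$SF(n) = \sum_{k=-N/2}^{N/2-1} \Big[H\big(|X(n,k)| - |X(n-1,k)|\big)\Big]^2 \qquad (2)$$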
where $H(x) = (x + |x|)/2$ is the half-wave rectifier function and $X(n, k)$ is the STFT of the input signal x at the nth frame and the kth frequency bin. Based on empirical experiments [7], the L1-norm function in eq. (1) is known to be superior to the L2-norm function in eq. (2).
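Eqs. (1) and (2) can be checked on a toy magnitude spectrogram. The sketch below assumes the magnitudes |X(n, k)| are already available as a list of frames (one list of bin magnitudes per frame) and sets the flux of the first frame to zero because it has no predecessor; both are conventions of this example, not details from [7].

```python
def half_wave(x):
    """H(x) = (x + |x|) / 2: keeps increases, zeroes decreases."""
    return (x + abs(x)) / 2

def spectral_flux(mag, squared=False):
    """Eq. (1) (squared=False) or eq. (2) (squared=True) for every frame."""
    flux = [0.0]  # frame 0 has no previous frame
    for prev, cur in zip(mag, mag[1:]):
        rectified = [half_wave(c - p) for c, p in zip(cur, prev)]
        flux.append(sum(r * r for r in rectified) if squared else sum(rectified))
    return flux

mag = [[1, 1], [3, 0], [3, 0], [1, 2]]
print(spectral_flux(mag))                # [0.0, 2.0, 0.0, 2.0]
print(spectral_flux(mag, squared=True))  # [0.0, 4.0, 0.0, 4.0]
```

Falling bins (for example the drop from 3 to 1 in the last frame) contribute nothing, which is the purpose of the half-wave rectifier.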
Spectral Flux was selected as the comparison method because of the characteristics of gamelan, a musical instrument played by striking (percussion). The percussive nature of gamelan instruments makes changes in magnitude more prominent than changes in the other features [7].
$$SF(n) = \sum_{k=-N/2}^{N/2-1} H\big(|X(n,k)| - |X(n-1,k)|\big) \qquad (3)$$
In this method, we look for large changes in the STFT output magnitude $|X(n, k)|$. Peak detection in the Spectral Flux method is based only on the size of the change in magnitude relative to the previous frame. In eq. (3), the magnitude difference at each frame n is rectified, and the results are then summed over the frequency bins k across the whole window length N. As the process continues, the peak-picking step analyzes the change in magnitude over multiple frames at once. The threshold of the detection function at frame n is the average of the detection function over an analysis window centered at n:
$$\delta(n) = \frac{1}{mw + w + 1} \sum_{k=n-mw}^{n+w} SF(k) \qquad (4)$$
A peak at the nth frame is then selected as a beat if it is a local maximum:
$$SF(n) \ge SF(k), \quad \forall k : n - w \le k \le n + w \qquad (5)$$
The value of w is selected based on the average period of the music, which indicates the average distance between strikes in the gamelan music signal. The variable m is a multiplier set to 1, so that a peak is determined solely by comparing its height with those of the other frames within one average tempo period.
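A sketch of eqs. (4) and (5) follows, under two assumptions of this example: the averaging window is simply truncated at the signal edges, and a peak must strictly exceed the threshold δ(n) so that flat runs of zeros are not reported. With m = 1 the averaging window is centred on n, as described above. In practice w would be derived from the average beat period in frames; the values below are toy numbers.

```python
def adaptive_peaks(sf, w, m=1):
    """Frames n that satisfy eq. (5) and exceed the adaptive threshold of eq. (4)."""
    peaks = []
    for n in range(len(sf)):
        lo, hi = max(0, n - m * w), min(len(sf), n + w + 1)
        threshold = sum(sf[lo:hi]) / (hi - lo)          # eq. (4), truncated at the edges
        neighbourhood = sf[max(0, n - w): n + w + 1]    # eq. (5) window
        if sf[n] >= max(neighbourhood) and sf[n] > threshold:
            peaks.append(n)
    return peaks

sf = [0, 0, 0, 1, 5, 1, 0, 0, 0, 4, 0, 2, 0]
print(adaptive_peaks(sf, w=2))  # [4, 9]
```

The value 2 at frame 11 is rejected because the larger peak at frame 9 lies within w frames of it.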
b. Phase Deviation
One phase component observed in this detection method is the change in instantaneous frequency, which indicates possible onsets. Let $\varphi(n, k)$ be the phase component of $X(n, k)$, the STFT of the input signal at the nth frame and frequency bin k:
$$X(n,k) = |X(n,k)|\, e^{j\varphi(n,k)} \qquad (6)$$
$\varphi(n, k)$ takes values from $-\pi$ to $\pi$. The instantaneous frequency $\varphi'(n, k)$ is then calculated from the first time-difference of the phase spectrum:
$$\varphi'(n,k) = \varphi(n,k) - \varphi(n-1,k) \qquad (7)$$
The change in instantaneous frequency can then be derived from the second-order time-difference of the phase spectrum:
$$\varphi''(n,k) = \varphi'(n,k) - \varphi'(n-1,k) \qquad (8)$$
As a final step, the phase-deviation onset detection function is obtained as the absolute value of the instantaneous frequency change, averaged over all frequency bins [7].
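Eqs. (7) and (8) and the final averaging step can be sketched as follows. The sketch assumes the phases φ(n, k) are given frame by frame and wraps the second difference into [-π, π) before taking its absolute value; this principal-argument wrapping is a common convention but is an assumption here, since the text does not state it. The first two frames get a value of zero because eq. (8) needs two previous frames.

```python
import math

def princarg(phi):
    """Wrap a phase value into [-pi, pi)."""
    return phi - 2 * math.pi * math.floor((phi + math.pi) / (2 * math.pi))

def phase_deviation(phase):
    """Mean absolute second phase difference per frame (eqs. (7)-(8))."""
    pd = [0.0, 0.0]  # eq. (8) needs frames n-1 and n-2
    for n in range(2, len(phase)):
        total = 0.0
        for k in range(len(phase[n])):
            # phi''(n,k) = phi'(n,k) - phi'(n-1,k) = phi(n) - 2 phi(n-1) + phi(n-2)
            d2 = phase[n][k] - 2 * phase[n - 1][k] + phase[n - 2][k]
            total += abs(princarg(d2))
        pd.append(total / len(phase[n]))
    return pd

# Steady partials advance their phase linearly; a 1-rad jump starts at frame 5.
frames = [[princarg(0.5 * n * (k + 1) + (1.0 if n >= 5 else 0.0)) for k in range(4)]
          for n in range(10)]
print([round(v, 3) for v in phase_deviation(frames)])
# [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
```

Steady sinusoids give a near-zero deviation, while the phase discontinuity shows up in the two frames whose second difference spans it.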
1.2. Onset Detection Based on Spectral Features
In feature-based onset detection, the input signal is converted into a detection function through a reduction process that observes sudden changes in standard features of the audio signal, such as explicit information on energy (or amplitude), frequency content, or phase. Subsection 1.1 briefly reviews the existing approaches to onset detection using spectral flux and phase deviation [13].
The rest of the paper is organized as follows. Section 2 describes the proposed method, including the HMM method used for our onset detection, while Section 3 describes our performance measurement method, presents our experimental results, and discusses them. The last section concludes this work.
2. Proposed Method
The method proposed in this research uses a Hidden Markov Model (HMM) for saron beat tracking. To analyze the performance of saron beat tracking with the HMM method, we compare it with a conventional Spectral Flux detector that applies the principle of adaptive thresholding. The flowchart of the methods used in this study can be seen in Figure 5.
Figure 5. Flowchart of Research Methodology
2.1. Frequency Filtering
In a gamelan orchestra, the tactus level corresponds to the speed of the beats sounded by the saron instrument, which receives only a single strike for each notation. Therefore, a system for assessing gamelan music tempo needs a frequency filtering process tuned to the saron instrument (500-1000 Hz), especially when the observed audio signal is a gamelan orchestra recording.
The Kaiser window is one way to design such a filter. Its adjustable shape parameter trades transition width against stopband attenuation, so a Kaiser-windowed filter can confine the passband to a narrow frequency region.
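A Kaiser-windowed band-pass FIR filter for the 500-1000 Hz saron band can be designed with the window method. The sketch below uses only the Python standard library; the 48 kHz sampling rate comes from Section 2.2, while the filter length of 1001 taps and the shape parameter β = 6 are assumptions of this example, not values reported by the authors.

```python
import cmath
import math

def bessel_i0(x):
    """Zeroth-order modified Bessel function via its power series."""
    total, term, k = 1.0, 1.0, 1
    while term > 1e-12 * total:
        term *= (x / (2 * k)) ** 2
        total += term
        k += 1
    return total

def kaiser_bandpass(f_lo, f_hi, fs, numtaps=1001, beta=6.0):
    """Windowed-sinc band-pass FIR: ideal band-pass response times a Kaiser window."""
    centre = (numtaps - 1) / 2

    def sinc(x):
        return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

    taps = []
    for n in range(numtaps):
        m = n - centre
        # Ideal band-pass = low-pass at f_hi minus low-pass at f_lo.
        ideal = (2 * f_hi / fs) * sinc(2 * f_hi * m / fs) - (2 * f_lo / fs) * sinc(2 * f_lo * m / fs)
        window = bessel_i0(beta * math.sqrt(1 - (m / centre) ** 2)) / bessel_i0(beta)
        taps.append(ideal * window)
    return taps

def gain(taps, f, fs):
    """Magnitude of the filter's frequency response at f Hz."""
    return abs(sum(h * cmath.exp(-2j * math.pi * f * n / fs) for n, h in enumerate(taps)))

taps = kaiser_bandpass(500, 1000, 48_000)
for f in (100, 750, 3000):
    print(f, round(gain(taps, f, 48_000), 3))  # near 0, near 1, near 0
```

Increasing `numtaps` narrows the transition bands, and increasing `beta` lowers the side lobes at the cost of wider transitions.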
2.2. Preprocessing
The input to the modeling system is an audio signal that has been processed with an overlapping Short Time Fourier Transform (STFT). In the STFT, the audio signal is represented in two domains: the time domain and the frequency domain. The STFT can be used to emphasize the magnitude feature of an audio signal. Moreover, the frequency-domain representation allows a screening process when saron onset detection is applied to a gamelan orchestra signal.
This study used a window length N = 2048, equivalent to 43 ms at a 48 kHz sampling frequency. The window length and the hop length of the overlapped STFT were chosen to maintain frequency resolution and time resolution, respectively. To obtain a finer time index, a hop size h = 10 ms (an overlap of about 76%) is used. A hop size of this order is commonly used in studies of onset detection and beat tracking [13].
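The STFT settings quoted above can be checked with a few lines of arithmetic. The sketch below derives the hop in samples, the window duration, the overlap, the number of full frames in one second of audio, and the average number of frames per beat that the HMM later relies on; the helper names are illustrative.

```python
FS = 48_000        # sampling frequency (Hz)
N = 2048           # STFT window length (samples)
HOP_S = 0.010      # hop size (s)

hop = round(HOP_S * FS)            # samples between frame starts
window_ms = 1000 * N / FS          # window duration in ms
overlap = 1 - hop / N              # fraction of each window shared with the next

def frame_count(num_samples, n=N, hop=hop):
    """Number of full analysis frames that fit in num_samples."""
    return 0 if num_samples < n else 1 + (num_samples - n) // hop

def frames_per_beat(bpm, hop_s=HOP_S):
    """Average number of STFT frames between beats at a given tempo."""
    return round(60 / (bpm * hop_s))

print(hop, round(window_ms, 2), round(100 * overlap, 1))           # 480 42.67 76.6
print(frame_count(FS), frames_per_beat(60), frames_per_beat(120))  # 96 100 50
```

With a 10 ms hop, one frame index corresponds to 10 ms, so beat periods convert directly into frame counts.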
2.3. Hidden Markov Models
The flowchart of the Hidden Markov Model (HMM) proposed in this research can be seen in Figure 6. In general, the method combines the HMM observation probabilities, transition probabilities, and initial probabilities to predict the value of a state that has a sufficiently ordered or regular structure, so that many variations of the observational data can be approximated by the information structure of the state transitions. In this study, HMM is used to extract tempo information from musical pieces, which is useful for eliminating false peaks in transcription.
Figure 6. Flowchart of the HMM
The hidden variable $\tau_t$ is defined as the number of frames since the last onset; it equals one if the frame is an onset frame (state event = 1). Conversely, $\tau_t = s$ means that the tth frame is the sth frame counted from the last onset. When an onset is detected in a subsequent frame, the state returns to 1 and counts up again until the next onset is encountered. The total number of states S that may arise is calculated from the maximum frame spacing between two successive beats.
In each frame, the system also receives the observational data $o_t$, the peak value of the input audio signal. The desired decision is to find the optimal state sequence $\tau^*$ given the observational data, which are formulated in eq. (9) [10].
$$o_t = \sum_{k} |X(t,k)| \qquad (9)$$
Because the onset detection process does not require frequency information, the STFT output magnitudes at all frequency bins are summed to obtain the total magnitude of each time frame. If the audio signal under study is a complex signal in which many instruments play at once, the magnitudes are summed only over the frequency range of the instrument whose beat is analyzed.
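Eq. (9) restricted to the saron band can be sketched as below. The sketch assumes one-sided magnitude spectra with N/2 + 1 bins per frame, keeps only the bins whose centre frequency k·fs/N lies in 500-1000 Hz, and then divides by the largest frame total so that o_t lies in [0, 1]; that final normalization is an assumption of this example, chosen to match the normalized peak heights used for the observation probabilities.

```python
def band_bins(fs, n_fft, f_lo, f_hi):
    """Indices k of one-sided STFT bins with f_lo <= k * fs / n_fft <= f_hi."""
    return [k for k in range(n_fft // 2 + 1) if f_lo <= k * fs / n_fft <= f_hi]

def observations(mag_frames, bins):
    """Per-frame sum of |X(t, k)| over the selected bins, scaled to a maximum of 1."""
    totals = [sum(frame[k] for k in bins) for frame in mag_frames]
    peak = max(totals) or 1.0          # avoid dividing by zero on silence
    return [t / peak for t in totals]

bins = band_bins(48_000, 2048, 500, 1000)
print(bins[0], bins[-1], len(bins))    # 22 42 21

quiet, loud = [0.0] * 1025, [0.0] * 1025
quiet[30], quiet[100] = 2.0, 50.0       # bin 100 (about 2.3 kHz) lies outside the band
loud[30] = 4.0
print(observations([quiet, loud], bins))  # [0.5, 1.0]
```

The large out-of-band component in the first frame is ignored, which is exactly the screening effect described above.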
The probability of the observation given the state of the tth frame, $P(o_t \mid \tau_t)$, is divided into two values: the probability of the observation if frame t is an onset frame, $P(o_t \mid \tau_t = 1)$, and the probability of the observation if frame t is not an onset frame, $P(o_t \mid \tau_t \neq 1)$; the two values sum to 1. The probability that the observed frame is an onset frame, $P(o_t \mid \tau_t = 1)$, is determined from the normalized peak height of the STFT magnitude extraction output. The higher the peak value of the magnitude observed at frame t, the more likely the frame is an onset frame. Conversely, the lower the magnitude, the more likely the tth frame is not an onset frame.
$$P(o_t \mid \tau_t = 1) = o_t \qquad (10)$$

$$P(o_t \mid \tau_t \neq 1) = 1 - o_t \qquad (11)$$
Eqs. (10) and (11) show that this calculation of the observation probabilities can be regarded as fixed thresholding at 0.5, as is often used in conventional onset detection methods: if the peak value is greater than 0.5, the peak is an onset, and vice versa.
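Eqs. (10) and (11) and their equivalence to a 0.5 threshold can be written directly. With equal weight on both hypotheses, choosing the larger of the two likelihoods declares an onset exactly when o_t > 0.5. The function names are illustrative, and the observation is assumed to be normalized to [0, 1].

```python
def observation_probability(o_t, state):
    """P(o_t | tau_t): eq. (10) for the onset state 1, eq. (11) otherwise."""
    return o_t if state == 1 else 1.0 - o_t

def onset_by_likelihood(o_t):
    """Pick whichever hypothesis explains o_t better (equal priors assumed)."""
    return observation_probability(o_t, 1) > observation_probability(o_t, 2)

for o in (0.3, 0.5, 0.7):
    print(o, onset_by_likelihood(o))   # only 0.7 is declared an onset
```

The HMM improves on this frame-by-frame decision by also weighing the transition probabilities, so a high observation that arrives at an implausible time can still be rejected.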
The state transition probability is denoted by $P_{s,u}$, the probability that state s changes to state u, i.e. $P(\tau_{t+1} = u \mid \tau_t = s)$, where $s, u \in \{1, 2, \ldots, S\}$. Because the hidden state variable represents the frame index counted from the previous onset, the only possible state changes are from s to s+1 or to 1. Thus $P_{s,u}$ is nonzero only for $P_{s,s+1}$ and $P_{s,1}$, the latter meaning that the next frame is an onset frame. The state transitions of the HMM method are illustrated in Figure 6.
Figure 6. Illustration of the Relationship between State Transition Probabilities
The values of $P_{s,1}$ are modeled as a Gaussian probability distribution that peaks at the average frame distance between two successive onsets. This average can be obtained from the music tempo commonly used in gamelan compositions; for example, a tempo of 60 bpm means that on average there are 60 beats in one minute.
A simple example of modeling the state transition distribution can be seen in Fig. 7, with 20 states and a Gaussian distribution with a mean of 10, which means that after the 10th frame the state most likely transitions to state 0 ($a_{10,0} = 0.99$).
Figure 7. Simple Transition Distribution Model
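The counter structure of the transitions can be written as a matrix. The sketch below is an interpretation under stated assumptions: states are numbered 1..S as in the text (row and column 0 are unused, whereas Fig. 7 labels the onset state 0), P_{s,1} is a single Gaussian in s capped at 0.99 (the value quoted for the Fig. 7 example), P_{s,s+1} = 1 - P_{s,1}, and the last state S must return to state 1. The standard deviation of 1.5 frames is a hypothetical choice.

```python
import math

def transition_matrix(S, mu, sigma, cap=0.99):
    """Counter-HMM transitions: from state s go to s + 1 or reset to the onset state 1."""
    A = [[0.0] * (S + 1) for _ in range(S + 1)]      # index 0 unused; states are 1..S
    for s in range(1, S + 1):
        if s == S:
            A[s][1] = 1.0                            # the longest gap must end in an onset
            continue
        p_reset = min(cap, math.exp(-(s - mu) ** 2 / (2 * sigma ** 2)))  # P_{s,1}
        A[s][1] = p_reset
        A[s][s + 1] = 1.0 - p_reset                  # P_{s,s+1}
    return A

A = transition_matrix(S=20, mu=10, sigma=1.5)
print(A[10][1], round(A[8][1], 3))  # 0.99 0.411
```

Every row has at most two nonzero entries, which keeps decoding cheap: each state has one forward successor plus the reset to the onset state.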
The observational data $o_t$ are defined as the onset peak values detected by the feature extraction process; given the tth state $\tau_t$, they follow the probability distribution $P(o_t \mid \tau_t)$. The relationship between the observations and the hidden state variables is illustrated in Figure 8.
Figure 8. Directed Acyclic Graph of the HMM
Another simple example illustrating the correspondence between the hidden states and the observational data can be seen in Figure 9. When the observational data at a frame show a peak or a high value, the hidden state in that frame should correspond to state 0.
Figure 9. Illustration of Data Observations and Hidden States; (a) Hidden States, (b) Data Observations
The decision to be achieved by the HMM beat detection method is to find the optimal state sequence $\tau^*$ given the observational data $o_t$. The integration of the transition probabilities with the observational data is illustrated in Figure 10, which shows a simple example of merging the transition distribution model of Figure 7 with a variety of observational data.
Figure 10. Integration of a Simple HMM for Onset Detection
Figure 11. Distribution of Observations, Integration and Transition
In Figure 11, after an onset frame is detected, the state of the transition distribution model starts again from state 0, and so on until the last frame. Figure 11 also shows a false peak at the 25th frame. However, because the state transition probability into state 0 is low at that frame, the counting of the frame state continues.
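The false-peak behaviour just described can be reproduced with a small Viterbi decoder. This is a minimal sketch, not the authors' code: it assumes a uniform initial distribution over states, the emission model of eqs. (10)-(11) with observations clipped away from 0 and 1, and a single-Gaussian P_{s,1} capped at 0.99 (σ = 1.5 frames is hypothetical). The toy input has true onsets every 10 frames and a spurious peak of 0.8 at frame 25, which a fixed 0.5 threshold would accept.

```python
import math

def viterbi_onsets(obs, S, mu, sigma, eps=1e-6):
    """Decode the most likely counter-state path and return the onset frames (state 1)."""
    def p_reset(s):  # P_{s,1}; the last state S always resets
        return 1.0 if s == S else min(0.99, math.exp(-(s - mu) ** 2 / (2 * sigma ** 2)))

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    def log_emit(o, s):  # eqs. (10)-(11)
        o = min(max(o, eps), 1 - eps)
        return math.log(o if s == 1 else 1 - o)

    states = range(1, S + 1)
    score = {s: -math.log(S) + log_emit(obs[0], s) for s in states}  # uniform start
    backpointers = []
    for o in obs[1:]:
        new_score, pointer = {}, {}
        for u in states:
            if u == 1:   # any state may reset to the onset state
                best, prev = max((score[s] + log(p_reset(s)), s) for s in states)
            else:        # otherwise the counter can only come from u - 1
                best, prev = score[u - 1] + log(1 - p_reset(u - 1)), u - 1
            new_score[u] = best + log_emit(o, u)
            pointer[u] = prev
        score = new_score
        backpointers.append(pointer)
    state = max(states, key=score.get)   # best final state, then trace back
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    path.reverse()
    return [t for t, s in enumerate(path) if s == 1]

obs = [0.1] * 50
for t in (0, 10, 20, 30, 40):
    obs[t] = 0.9            # true onsets, one every 10 frames
obs[25] = 0.8               # spurious peak
print([t for t, o in enumerate(obs) if o > 0.5])     # threshold: [0, 10, 20, 25, 30, 40]
print(viterbi_onsets(obs, S=20, mu=10, sigma=1.5))   # HMM: [0, 10, 20, 30, 40]
```

Accepting frame 25 would force two resets after only 5 frames, each with a very small P_{s,1}, which outweighs the higher emission probability of that single frame.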
If there are many tempo variations, all of them can be included in the calculation of $P_{s,1}$, as written in eq. (12).
$$P_{s,1} = \sum_{k=1}^{K} \exp\left(-\frac{(s - \mu_k)^2}{2\sigma_k^2}\right) \qquad (12)$$
where K is the number of possible tempo variations, $\mu_k$ is the average beat period (in frames) of the kth tempo variation, and $\sigma_k$ is the standard deviation of the period of the kth tempo variation.
An illustration of $P_{s,1}$ with two tempo variations can be seen in Figure 12, which shows the probability that the subsequent frame is an onset given that the current frame is in state s relative to the previous onset. The two tempo variations expected to occur are 60 bpm and 120 bpm, which means that onset frames appear on average every 100 frames and every 50 frames, respectively, given that the distance between frames equals the 10 ms hop size.
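Eq. (12) for the two tempi above can be evaluated directly. The sketch converts each tempo to its average beat period in frames using the 10 ms hop, then sums one Gaussian per tempo. The standard deviation of 5 frames is a hypothetical value, and the sum is not capped, so the tempo components should be far enough apart that P_{s,1} stays below 1.

```python
import math

HOP_S = 0.010  # 10 ms hop size, as in Section 2.2

def beat_period_frames(bpm, hop_s=HOP_S):
    """Average number of frames between onsets at a given tempo."""
    return 60.0 / (bpm * hop_s)

def p_reset(s, tempi):
    """Eq. (12): P_{s,1} as a sum of Gaussians; tempi is a list of (bpm, sigma_frames)."""
    return sum(math.exp(-(s - beat_period_frames(bpm)) ** 2 / (2 * sigma ** 2))
               for bpm, sigma in tempi)

tempi = [(60, 5.0), (120, 5.0)]
for s in (50, 75, 100):
    print(s, round(p_reset(s, tempi), 4))  # peaks near 1.0 at s = 50 and s = 100
```

Between the two tempo components (for example at s = 75) the reset probability is close to zero, so the decoder keeps counting instead of declaring an onset.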