TELKOMNIKA Indonesian Journal of Electrical Engineering
Vol.12, No.6, June 2014, pp. 4717 ~ 4723
DOI: 10.11591/telkomnika.v12i6.5491
Received December 29, 2013; Revised March 8, 2014; Accepted March 21, 2014
Impact of Missing Data on EM Algorithm under Rayleigh Distribution
Zhendong Li¹, Mengmeng Li²
¹School of Information Engineering, n, Lanzhou, 730020, China
²School of Statistics, Lanzhou University of Finance and Economics, Lanzhou, 730020, China
Corresponding author, email: lizd@lzcc.edu.com¹, limm2012@yeah.net²
Abstract
Is parameter estimation by the EM algorithm under the Rayleigh distribution sensitive to missing data, and if so, to what extent? Using computer simulation, we compare maximum likelihood estimation on complete data with EM estimation under different missing rates in small samples. The two sets of results are almost identical when the missing rate is below 0.30, but the efficiency of EM parameter estimation gradually deteriorates as the missing rate increases. The results also show that the EM algorithm is sensitive to the sample size and to the choice of initial value.
Keywords: missing data, Rayleigh distribution, EM, parameter estimation
Copyright © 2014 Institute of Advanced Engineering and Science. All rights reserved.
1. Introduction
In reliability studies, the exponential, Weibull, and Rayleigh distributions, among others, are important life distributions. Researchers have therefore investigated good parameter estimators from various points of view, for both complete and censored samples. Based on the Type-II censoring life test, Wei et al. [1] discuss the exponential distribution, and Liu et al. [2] discuss empirical Bayes parameter estimation for the Weibull distribution. However, how missing data affect parameter estimation under common missing-data conditions still lacks in-depth study.
The EM (Expectation-Maximization) algorithm plays an important role in parameter estimation and performs especially well in small samples with missing data. It is a numerically stable iterative algorithm with small storage requirements. It guarantees that the likelihood function of the observed data does not decrease from one iteration to the next, and its accuracy is relatively high. However, how the accuracy of EM parameter estimation changes under different missing rates needs further study. Building on the work above, this paper uses numerical examples to study the accuracy of EM parameter estimation for the Rayleigh distribution under different missing rates, and analyzes and evaluates the impact of the missing rate on estimation accuracy.
2. Research Method
Let the density function of the Rayleigh distribution be:

$$ f(x;\sigma) = \frac{x}{\sigma^2}\exp\left(-\frac{x^2}{2\sigma^2}\right), \quad x > 0, \ \sigma > 0 \tag{1} $$
The distribution function is:

$$ F(x) = 1 - e^{-x^2/(2\sigma^2)}, \quad x > 0 \tag{2} $$
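To make (1) and (2) concrete, here is a minimal Python sketch of the density, the distribution function, and inverse-transform sampling. Solving $F(x) = u$ for $u \sim U(0,1)$ gives $x = \sigma\sqrt{-2\ln(1-u)}$. The function names are illustrative and not from the paper, which does not say which random number generator its simulations used.

```python
import math
import random


def rayleigh_pdf(x: float, sigma: float) -> float:
    """Density (1): f(x; sigma) = x / sigma^2 * exp(-x^2 / (2 sigma^2)) for x >= 0."""
    if x < 0:
        return 0.0
    return x / sigma**2 * math.exp(-x * x / (2 * sigma**2))


def rayleigh_cdf(x: float, sigma: float) -> float:
    """Distribution function (2): F(x) = 1 - exp(-x^2 / (2 sigma^2)) for x >= 0."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-x * x / (2 * sigma**2))


def rayleigh_sample(sigma: float, rng: random.Random) -> float:
    """Inverse-transform draw: solve F(x) = u with u ~ Uniform[0, 1)."""
    u = rng.random()  # 1 - u lies in (0, 1], so the logarithm is finite
    return sigma * math.sqrt(-2.0 * math.log(1.0 - u))


sigma = 0.5
median = sigma * math.sqrt(2 * math.log(2))  # solves F(x) = 1/2
print(rayleigh_cdf(median, sigma))  # approximately 0.5
```

Two easy sanity checks with these functions are the median $\sigma\sqrt{2\ln 2}$ and the second moment $E(X^2) = 2\sigma^2$.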
If $x_1, x_2, \ldots, x_n$ is a sample, its likelihood function is:
ISSN: 2302-4046 TELKOMNIKA Vol. 12, No. 6, June 2014: 4717 – 4723
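$$ L(\sigma) = \prod_{i=1}^{n} \frac{x_i}{\sigma^2}\exp\left(-\frac{x_i^2}{2\sigma^2}\right) = \frac{\prod_{i=1}^{n} x_i}{\sigma^{2n}}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n} x_i^2\right) \tag{3} $$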
Taking logarithms:

$$ \ln L(\sigma) = \sum_{i=1}^{n} \ln x_i - 2n\ln\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n} x_i^2 \tag{4} $$
The maximum likelihood estimator of the parameter $\sigma$ is:

$$ \hat{\sigma} = \left(\frac{1}{2n}\sum_{i=1}^{n} x_i^2\right)^{1/2} \tag{5} $$
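Setting the derivative of (4) to zero, $-2n/\sigma + \sum_{i=1}^{n} x_i^2/\sigma^3 = 0$, gives $\sigma^2 = \sum_{i=1}^{n} x_i^2/(2n)$, which is (5). The following minimal Python sketch codes (4) and (5) directly. The function names `rayleigh_loglik` and `rayleigh_mle` are hypothetical.

```python
import math
from typing import Sequence


def rayleigh_loglik(sigma: float, xs: Sequence[float]) -> float:
    """Log-likelihood (4) of a complete Rayleigh sample xs (all values > 0)."""
    n = len(xs)
    return (sum(math.log(x) for x in xs)
            - 2 * n * math.log(sigma)
            - sum(x * x for x in xs) / (2 * sigma**2))


def rayleigh_mle(xs: Sequence[float]) -> float:
    """Closed-form maximum likelihood estimate (5): sqrt(sum x_i^2 / (2n))."""
    return math.sqrt(sum(x * x for x in xs) / (2 * len(xs)))


print(rayleigh_mle([1.0, 1.0]))  # prints 0.7071067811865476
```

Because (4) has a single stationary point in $\sigma > 0$, `rayleigh_loglik` evaluated at the value returned by `rayleigh_mle` is at least as large as at any nearby $\sigma$.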
Each iteration of the EM algorithm for estimating the parameter of the Rayleigh distribution consists of two steps: the E (Expectation) step and the M (Maximization) step.
Let $X = (x_1, x_2, \ldots, x_n)$ be the complete data. Because of conditional limits, we only obtain $Z = (z_1, \ldots, z_k, z_{k+1}, \ldots, z_n)$, where $z_{k+1}, \ldots, z_n$ correspond to data that are not observed. $X$ and $Z$ have the following relations:
$$ x_j = z_j, \ j = 1, 2, \ldots, k; \qquad x_j > z_j, \ j = k+1, \ldots, n \tag{6} $$
To compute the EM estimate of $\sigma$, start from an initial value $\sigma^{(0)}$ and generate iterates $\sigma^{(i)}$ $(i = 0, 1, 2, \ldots)$. The $(i+1)$-th iteration consists of two steps:
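1) E step: Calculate the conditional expectation:

$$ Q(\sigma \mid \sigma^{(i)}) = E\left[\ln L(\sigma; X) \mid Z, \sigma^{(i)}\right] = \sum_{j=1}^{n} E\left(\ln x_j \mid Z, \sigma^{(i)}\right) - 2n\ln\sigma - \frac{1}{2\sigma^2}\sum_{j=1}^{n} E\left(x_j^2 \mid Z, \sigma^{(i)}\right) \tag{7} $$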
2) M step: Maximize $Q(\sigma \mid \sigma^{(i)})$ to obtain the updated value $\sigma^{(i+1)}$, that is:

$$ Q(\sigma^{(i+1)} \mid \sigma^{(i)}) = \max_{\sigma} Q(\sigma \mid \sigma^{(i)}) \tag{8} $$
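Setting the derivative of (7) to zero:

$$ \frac{dQ}{d\sigma} = -\frac{2n}{\sigma} + \frac{1}{\sigma^3}\sum_{j=1}^{n} E\left(x_j^2 \mid Z, \sigma^{(i)}\right) = 0 \tag{9} $$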
By (6), the first $k$ values are observed exactly, while for $j > k$ only $x_j > z_j$ is known. The sum in (9) therefore becomes:

$$
\begin{aligned}
\sum_{j=1}^{n} E\left(x_j^2 \mid Z, \sigma^{(i)}\right)
&= \sum_{j=1}^{k} z_j^2 + \sum_{j=k+1}^{n} E\left(x_j^2 \mid x_j > z_j, \sigma^{(i)}\right) \\
&= \sum_{j=1}^{k} z_j^2 + \sum_{j=k+1}^{n} \frac{\int_{z_j}^{\infty} x^2 f(x;\sigma^{(i)})\,dx}{\int_{z_j}^{\infty} f(x;\sigma^{(i)})\,dx} \\
&= \sum_{j=1}^{k} z_j^2 + \sum_{j=k+1}^{n} \frac{\left(z_j^2 + 2(\sigma^{(i)})^2\right) e^{-z_j^2/(2(\sigma^{(i)})^2)}}{e^{-z_j^2/(2(\sigma^{(i)})^2)}} \\
&= \sum_{j=1}^{n} z_j^2 + 2(n-k)(\sigma^{(i)})^2
\end{aligned}
\tag{10}
$$
Because

$$ Q(\sigma^{(i+1)} \mid \sigma^{(i)}) = \max_{\sigma} Q(\sigma \mid \sigma^{(i)}), \tag{11} $$
by (9),

$$ -\frac{2n}{\sigma^{(i+1)}} + \frac{1}{(\sigma^{(i+1)})^3}\left[\sum_{j=1}^{n} z_j^2 + 2(n-k)(\sigma^{(i)})^2\right] = 0 \tag{12} $$
This gives the iterative formula:

$$ \sigma^{(i+1)} = \left(\frac{1}{2n}\left[\sum_{j=1}^{n} z_j^2 + 2(n-k)(\sigma^{(i)})^2\right]\right)^{1/2} \tag{13} $$
Stop iterating when $|\sigma^{(i+1)} - \sigma^{(i)}|$ is sufficiently small.
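The iteration (13) and its stopping rule can be sketched in Python as follows. This is a minimal illustration, not the authors' code, and the names `em_rayleigh`, `observed_loglik`, and `cond_second_moment` are hypothetical. It assumes, as in (6), that an unobserved item only tells us $x_j > z_j$. The key E-step fact behind (10) is $E(x_j^2 \mid x_j > z_j, \sigma) = z_j^2 + 2\sigma^2$: integrate $x^2 f(x;\sigma)$ over $(z_j, \infty)$ and divide by $1 - F(z_j) = e^{-z_j^2/(2\sigma^2)}$. `cond_second_moment` checks that identity with a midpoint-rule integral.

```python
import math
from typing import List, Sequence, Tuple


def cond_second_moment(z: float, sigma: float, steps: int = 20_000) -> float:
    """Midpoint-rule estimate of E(x^2 | x > z) for x ~ Rayleigh(sigma)."""
    hi = z + 12.0 * sigma  # the density is negligible beyond this point
    h = (hi - z) / steps
    num = 0.0
    for i in range(steps):
        x = z + (i + 0.5) * h
        num += x * x * (x / sigma**2) * math.exp(-x * x / (2 * sigma**2))
    tail = math.exp(-z * z / (2 * sigma**2))  # 1 - F(z) from (2)
    return num * h / tail


def em_rayleigh(z: Sequence[float], censored: Sequence[bool], sigma0: float,
                eps: float = 1e-8, max_iter: int = 10_000) -> Tuple[float, List[float]]:
    """Apply (13) until |sigma_{i+1} - sigma_i| < eps; return estimate and iterates."""
    n = len(z)
    n_missing = sum(1 for c in censored if c)  # n - k items known only as x_j > z_j
    s = sum(v * v for v in z)                  # sum of z_j^2 over all n items
    sigma, path = sigma0, [sigma0]
    for _ in range(max_iter):
        new = math.sqrt((s + 2 * n_missing * sigma**2) / (2 * n))
        path.append(new)
        if abs(new - sigma) < eps:
            return new, path
        sigma = new
    return sigma, path


def observed_loglik(sigma: float, z: Sequence[float], censored: Sequence[bool]) -> float:
    """Log of prod f(z_j) over observed items times prod (1 - F(z_j)) over the rest."""
    ll = 0.0
    for v, c in zip(z, censored):
        ll -= v * v / (2 * sigma**2)  # exponent shared by f(z_j) and 1 - F(z_j)
        if not c:
            ll += math.log(v) - 2 * math.log(sigma)  # extra factor z_j / sigma^2 in f
    return ll


z = [0.31, 0.52, 0.47, 0.80, 0.25, 0.66]
censored = [False, False, True, False, True, False]  # k = 4 observed items
estimate, iterates = em_rayleigh(z, censored, sigma0=1.0)
print(round(estimate, 4))  # prints 0.4644
```

Writing $t = \sigma^2$, (13) reads $t_{i+1} - t^* = \frac{n-k}{n}(t_i - t^*)$ with $t^* = \sum_{j=1}^{n} z_j^2/(2k)$. So, for $k \ge 1$, the iterates move monotonically toward $t^*$, and the observed-data log-likelihood computed by `observed_loglik` does not decrease along the iterates.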
3. Analysis of Computer Simulation
3.1. Computer Simulation Design [3]
Step 1: Use the computer to generate, under the Rayleigh distribution, observations $Y = (y_1, y_2, \ldots, y_n)$ with parameter $\sigma$ and restrictive random numbers $X = (x_1, x_2, \ldots, x_n)$;
Step 2: Use the computer to generate the observed data by randomly deleting some of the original data under different missing rates $p$ (such as 0.05, 0.10, 0.15, etc.):

$$ z_j = y_j I(x_j \ge y_j) + x_j I(x_j < y_j), \quad j = 1, 2, \ldots, n \tag{14} $$

where $I(\cdot)$ is the indicator function. Substitute $z_1, z_2, \ldots, z_n$ into the iterative formula (13);
Step 3: Take the initial value $\sigma^{(0)}$ and, for a given $\varepsilon > 0$, test whether $|\sigma^{(i+1)} - \sigma^{(i)}| < \varepsilon$. If the condition is met, set $\tilde{\sigma} = \sigma^{(i+1)}$; otherwise use (13) to continue calculating $\sigma^{(i+1)}$;
Step 4: Calculate $\hat{\sigma} = \left(\frac{1}{2n}\sum_{i=1}^{n} y_i^2\right)^{1/2}$, the maximum likelihood estimate of $\sigma$ from the complete data $Y = (y_1, y_2, \ldots, y_n)$;
Step 5: Repeat Step 1 to Step 4 to obtain the sequences $\hat{\sigma}_1, \hat{\sigma}_2, \ldots, \hat{\sigma}_n$ and $\tilde{\sigma}_1, \tilde{\sigma}_2, \ldots, \tilde{\sigma}_n$. Under different missing rates $p$, calculate the sequences $a_i = |\hat{\sigma}_i - \sigma|$ and $b_i = |\tilde{\sigma}_i - \sigma|$ and their MSE, and analyze the differences between the likelihood estimates and the EM estimates with incomplete data.
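Steps 1 to 5 can be sketched as a small Monte Carlo driver. This is a minimal illustration under stated assumptions, not the authors' program. The paper does not say how the restrictive numbers $X$ are distributed or exactly how $p$ controls the deletion, so the sketch assumes the $x_j$ are Rayleigh with scale $\sigma_c = \sigma\sqrt{(1-p)/p}$. For independent Rayleigh variables this gives $P(y_j > x_j) = \sigma^2/(\sigma^2 + \sigma_c^2) = p$, so about a fraction $p$ of the $y_j$ are replaced by $x_j$ in (14). The names `simulate`, `sigma_c`, and `reps` are hypothetical.

```python
import math
import random
from typing import List, Tuple


def rayleigh_draw(sigma: float, rng: random.Random) -> float:
    """Inverse-transform draw from the Rayleigh law (1)-(2)."""
    return sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))


def em_estimate(z: List[float], censored: List[bool], sigma0: float,
                eps: float = 1e-6, max_iter: int = 1000) -> float:
    """Iterative formula (13) with the Step 3 stopping rule."""
    n, n_missing = len(z), sum(censored)  # n_missing = n - k
    s = sum(v * v for v in z)
    sigma = sigma0
    for _ in range(max_iter):
        new = math.sqrt((s + 2 * n_missing * sigma**2) / (2 * n))
        if abs(new - sigma) < eps:
            return new
        sigma = new
    return sigma


def simulate(sigma: float, p: float, n: int, reps: int, sigma0: float,
             seed: int = 0) -> Tuple[float, float]:
    """Hypothetical driver for Steps 1-5; returns (MSE of MLE, MSE of EM).

    ASSUMPTION (not from the paper): restrictive numbers x_j are Rayleigh with
    scale sigma_c = sigma * sqrt((1 - p) / p), so P(y_j > x_j) = p. Needs 0 < p < 1.
    """
    rng = random.Random(seed)
    sigma_c = sigma * math.sqrt((1 - p) / p)
    se_mle = se_em = 0.0
    used = 0
    for _ in range(reps):
        y = [rayleigh_draw(sigma, rng) for _ in range(n)]    # Step 1: complete data Y
        x = [rayleigh_draw(sigma_c, rng) for _ in range(n)]  # Step 1: restrictive X
        z = [min(yj, xj) for yj, xj in zip(y, x)]            # Step 2: eq. (14)
        cens = [yj > xj for yj, xj in zip(y, x)]             # y_j lost; only y_j > z_j known
        if all(cens):
            continue  # k = 0: (13) has no finite limit, so skip this replicate
        sigma_tilde = em_estimate(z, cens, sigma0)            # Step 3: EM estimate
        sigma_hat = math.sqrt(sum(v * v for v in y) / (2 * n))  # Step 4: eq. (5)
        se_mle += (sigma_hat - sigma) ** 2                    # Step 5: squared errors
        se_em += (sigma_tilde - sigma) ** 2
        used += 1
    return se_mle / used, se_em / used


for p in (0.05, 0.15, 0.30, 0.45):
    mse_mle, mse_em = simulate(sigma=0.5, p=p, n=30, reps=500, sigma0=1.0)
    print(f"p={p:.2f}  MSE(MLE)={mse_mle:.4f}  MSE(EM)={mse_em:.4f}")
```

The printed MSE values depend on the seed and on the censoring assumption above, so they should not be expected to reproduce the tables below.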
3.2. Comparison and Analysis of Simulation Results
1) The impact of missing rate on parameter estimation
When $\sigma = 0.5$ and the initial value of the EM algorithm is $\sigma^{(0)} = 1$, the likelihood estimates and EM estimates under different missing rates, together with their MSE, are shown in Table 1 and Table 2; the corresponding plots are shown in Figure 1.
Table 1. Likelihood Estimates and EM Estimates ($\sigma = 0.5$)

Missing rate ($p$)   Likelihood estimate   EM estimate
0.05                 0.4978                0.5316
0.10                 0.4955                0.5520
0.15                 0.4984                0.5250
0.20                 0.4986                0.5392
0.25                 0.5005                0.4797
0.30                 0.4969                0.4389
0.35                 0.4946                0.4553
0.40                 0.4974                0.4255
0.45                 0.5012                0.4206
Table 2. MSE of Likelihood Estimation and EM Estimation ($\sigma = 0.5$)

Missing rate ($p$)   Likelihood estimation   EM estimation
0.05                 0.0004                  0.0004
0.10                 0.0005                  0.0007
0.15                 0.0005                  0.0012
0.20                 0.0004                  0.0013
0.25                 0.0004                  0.0017
0.30                 0.0004                  0.0014
0.35                 0.0005                  0.0022
0.40                 0.0005                  0.0017
0.45                 0.0005                  0.0015
Figure 1. Likelihood Estimates and EM Estimates and their MSE ($\sigma = 0.5$, $\sigma^{(0)} = 1$)
It can be seen from Figure 1 that when the missing rate $p \ge 0.30$, the difference between the likelihood estimates and the EM estimates becomes significant, and the MSE of the EM estimates also increases.
When $\sigma = 1$ and the initial value of the EM algorithm is $\sigma^{(0)} = 0.5$, the likelihood estimates and EM estimates under different missing rates, together with their mean square errors, are shown in Table 3 and Table 4; the corresponding plots are shown in Figure 2.
[Figure 1 panels: estimate value ($\sigma = 0.5$) and MSE ($\sigma = 0.5$, $\times 10^{-3}$) against missing rate 0.05 to 0.45; curves: MLE, EM]
Table 3. Likelihood Estimates and EM Estimates ($\sigma = 1$)

Missing rate ($p$)   Likelihood estimate   EM estimate
0.05                 0.9942                0.8874
0.10                 1.0004                0.8857
0.15                 0.9902                0.8982
0.20                 1.0077                0.8372
0.25                 0.9819                0.8078
0.30                 0.9924                0.7597
0.35                 1.0013                0.7420
0.40                 0.9954                0.5785
0.45                 0.9983                0.5889
Table 4. MSE of Likelihood Estimation and EM Estimation ($\sigma = 1$)

Missing rate ($p$)   Likelihood estimation   EM estimation
0.05                 0.0020                  0.0025
0.10                 0.0017                  0.0025
0.15                 0.0014                  0.0039
0.20                 0.0019                  0.0051
0.25                 0.0015                  0.0054
0.30                 0.0022                  0.0059
0.35                 0.0016                  0.0070
0.40                 0.0026                  0.0090
0.45                 0.0015                  0.0060
Figure 2. Likelihood Estimates and EM Estimates and their MSE ($\sigma = 1$, $\sigma^{(0)} = 0.5$)
It can be seen from Figure 1 and Figure 2 that, although $\sigma$ and the initial value $\sigma^{(0)}$ of the EM algorithm take different values in the two cases, the difference in MSE increases once the missing rate exceeds 0.30. That is, when the missing rate is greater than 0.30, the EM parameter estimates differ significantly from the maximum likelihood estimates, and the difference grows significantly as the missing rate increases.
2) The impact of sample size on parameter estimation
The EM algorithm performs well in small samples with missing data. When $\sigma = 1$, the initial value of the EM algorithm is $\sigma^{(0)} = 0.5$, and the sample size is $n = 20, 30, 50$, the MSE of the likelihood estimates and of the EM estimates with missing data are shown in Figure 3 to Figure 5.
Figure 3. MSE of Different Missing Rate ($n = 50$)
Figure 4. MSE of Different Missing Rate ($n = 30$)
[Figure 3 and Figure 4 panels: MSE ($\sigma = 1$, $n = 50$) and MSE ($\sigma = 1$, $n = 30$) against missing rate 0.05 to 0.45; curves: MLE, EM]
Figure 5. MSE of Different Missing Rate ($n = 20$)
Figure 3 shows that when the sample size is $n = 50$ and the missing rate is $p \ge 0.30$, the difference between the likelihood estimates and the EM estimates becomes clear. Figure 4 and Figure 5 show that when $n = 20, 30$, the difference is already clear at $p \ge 0.15$. This indicates that sample size has a large impact on the accuracy of EM estimation.
3) The impact of initial value on parameter estimation
Figure 6 shows that when $\sigma = 1$, the EM algorithm did not perform well with the initial value $\sigma^{(0)} = 0.5$; when the missing rate is fairly low, the result improves greatly if 0.8 is selected as the initial value. This indicates that the EM algorithm is sensitive to the choice of initial value, and choosing a reasonable initial value $\sigma^{(0)}$ will increase the accuracy of the EM algorithm.
Figure 6. Likelihood Estimates and EM Estimates and their MSE ($\sigma = 1$, $\sigma^{(0)} = 0.8, 0.5$)
4. Conclusion
The EM algorithm performs well in parameter estimation with missing data. The computer simulations show that when the missing rate is less than 0.30, the EM estimates are almost identical to the likelihood estimates; when the missing rate is greater than 0.30, the difference between the two estimation methods increases. The simulations also indicate that the sample size and the choice of initial value affect the accuracy of the EM algorithm, and how to choose a reasonable initial value for the EM algorithm in different problems needs further study.
Acknowledgements
The research is supported by the Science and Technology Support Project of Gansu Province (Project No. 1204GKCA010).
References
[1] Ling Wei, Jianjun Qi, Yimin Shi. The EB Estimation of Scale-parameter for the Two-Parameter Exponential Distribution Under the Type-II Censoring Life Test. Mathematica Applicata. 2001; 14(4): 66-701.
[2] Yushuang Liu, Lixin Song. The EB Estimation of Scale-parameter Under Type-II Censoring Life Test for Two-parameter Weibull Distribution. Journal of Jilin Normal University. 2004; 5(2): 16-18.
[3] Zhendong Li, Yingshu Liao. Impact of Missing Data on Parameter Estimation of EM Algorithm Under Exponential Distribution. The International Conference on Automatic Control and Artificial Intelligence (ACAI 2012). Xiamen. 2012; 3597-3599.
[4] Ng HKT, Chan PS, Balakrishnan N. Estimation of Parameters From Progressively Censored Data Using EM Algorithm. Computational Statistics & Data Analysis. 2002; 39: 371-386.
[5] Wu CFJ. On the Convergence Properties of the EM Algorithm. The Annals of Statistics. 1983; 11: 95-103.