International Journal of Robotics and Automation (IJRA)
Vol. 7, No. 1, March 2018, pp. 39~47
ISSN: 2089-4856, DOI: 10.11591/ijra.v7i1.pp39-47
Journal homepage: http://iaescore.com/journals/index.php/IJRA/index
An Actor-critic Algorithm Using Cross Evaluation of Value Functions
Hui Wang, Peng Zhang, Quan Liu
School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu, China
Article Info
Article history: Received Sep 4, 2017; Revised Dec 5, 2017; Accepted Jan 11, 2018
ABSTRACT
To overcome the difficulty of learning a globally optimal policy caused by maximization bias in continuous spaces, an actor-critic algorithm based on the cross evaluation of two value functions is proposed. Two independent value functions bring the critic's estimate closer to the true value function, and the actor is guided by a cross-evaluated value function when choosing its optimal actions. Cross evaluation of value functions avoids the policy jitter exhibited by greedy optimization methods in continuous spaces. The algorithm is more robust than the CACLA learning algorithm, and the experimental results show that it learns more smoothly and that the stability of the policy is clearly improved, while the amount of computation remains almost unchanged.
Keywords: Actor-critic, Continuous spaces, Cross evaluation, Reinforcement learning
Copyright © 2018 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Quan Liu, School of Computer Science and Technology, Soochow University, No. 1 Shizi Street, Suzhou, Jiangsu 215000, China
Email: quanliu@suda.edu.cn
1. INTRODUCTION
Temporal difference algorithms in reinforcement learning (RL) usually use a maximizing operation to solve for the optimal policy. Whether in off-policy Q-learning or on-policy SARSA, maximization is required to find the optimal action in the current state, and the algorithms continually update the state-action value function based on this action. The value functions computed in this way always contain some deviation, which is usually referred to as the maximization bias. In some circumstances this deviation may seriously affect the learning efficiency of the agent. The maximization bias may shift the learning goal, so that the policy computation falls into a local optimum. The policy learned at that point is optimal within the adjacent policy space, but it is not optimal within the entire policy space. A locally optimal policy occurs because the agent does not adequately visit the state and action spaces and cannot accumulate enough sample data related to the optimal policy.
In a learning environment with a large-scale continuous state space or action space, a reinforcement learning algorithm needs to avoid improper optimization guidance, which may point in an incorrect direction with respect to the optimal policy. In such situations the maximization bias has an obvious side effect. At the same time, learning algorithms in continuous spaces are more likely to suffer from the curse of dimensionality, which makes the exploration time in both the policy space and the state space grow exponentially.
Optimizing the function approximator is a common way to break the curse of dimensionality in continuous spaces. A linear approximation was first used by Samuel to implement an artificial checkers player [1]. Sutton combined the temporal difference learning method, enhanced by eligibility traces, with a linear function approximator, and used gradient descent to solve for the approximate value functions [2]. Engel used a Gaussian process to model the value function and proposed a temporal difference reinforcement learning method with Gaussian processes [3]. Their work demonstrates that optimized function approximators can achieve excellent experimental results in continuous state and action spaces.
The actor-critic approach [4] combines the advantages of the value function method and the policy search method, and stores both value functions and policies. When the agent selects an action, it only needs to consult the stored policy, without any knowledge of the value function. When an immediate reward is obtained from the environment, the agent updates the value function and adjusts the currently stored policy according to the change in the value function. In continuous state and action spaces, the actor-critic method can avoid the convergence to a local optimum that arises when only the value function is used, and it also addresses the large estimation bias found in most policy search methods [5, 6].
In recent years, the use of actor-critic methods to solve reinforcement learning problems in continuous spaces has become a research hotspot. Sutton proposed a method of applying function approximation to policy gradients, which depends on a dedicated action representation [7]. The method first introduces a policy value function and defines the objective of continuous-space reinforcement learning as the maximization of this policy value function. Peters applied the natural gradient method to function approximation. He combined temporal difference learning with the least squares algorithm and designed the natural actor-critic (NAC) algorithm [8, 9]. Hasselt used the difference between actions to improve the policy parameters, evaluating the merits of each action, and proposed the continuous actor-critic learning automaton (CACLA) [10]. Wierstra applied the natural gradient method and the evolutionary policy method to the policy update, and proposed natural evolution strategies (NES) [11, 12].
Busoniu used cross entropy to optimize the parameters of the basis functions and proposed a cross-entropy optimization method [13, 14]. Martin used k-nearest neighbor classification to discretize the space and proposed a temporal difference algorithm based on k-nearest neighbor classification [15]. Lillicrap used a policy gradient to study the deep reinforcement learning problem and proposed a deep deterministic policy gradient algorithm [16]. Gu used a model learning method to improve the convergence rate in continuous spaces and proposed a continuous-space deep Q-learning algorithm based on model acceleration [17]. Khamassi applied meta-learning to the exploration of parameters in continuous spaces and proposed an actor-critic learning method based on meta-learning [18].
2. ACTOR-CRITIC METHOD
According to how actions are selected, reinforcement learning methods can be divided into three categories: actor-only, critic-only and actor-critic. The actor-only method does not estimate the value function. The agent follows its current policy to interact with the environment, and the immediate reward acquired from the environment is used directly to optimize the current policy. The critic-only approach does not need to maintain a policy. The policy is calculated from the current value function while interacting with the environment, and the current value function is continually optimized using the reward obtained.
Unlike the actor-only and critic-only approaches, the actor-critic approach consists of both an actor and a critic, and therefore needs to maintain value functions and a policy at the same time. The actor is used to select actions, and the critic judges whether the selected action is good or bad. The actor chooses actions based on its own policy, not on the current value function. The critic's judgment generally takes the form of a temporal difference error, which is calculated from the current value function. The temporal difference error is the only output of the critic and drives all the learning in both the actor and the critic [19, 20]. The structure of the actor-critic method is shown in Figure 1.
Figure 1. The framework of actor-critic method
[Figure 1 diagram: blocks Environment, Actor (Policy) and Critic (Value Function); signals state, reward, action and Temporal Difference Error.]
The traditional actor-critic approach is used in discrete state and action spaces. The actor selects the action to perform in the current state based on a lookup table. The lookup table stores the corresponding preference $p(x, u)$ for each state-action pair, where $x$ and $u$ denote the state and the action. From the preferences $p$, the selection probability of each action in the current state can be computed. The action selection method is the Gibbs softmax, as shown in equation (1).
$$h(x, u) = \frac{e^{p(x, u)}}{\sum_{u' \in U} e^{p(x, u')}} \tag{1}$$
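As an illustration of Gibbs softmax selection, the following is a minimal Python sketch that turns the preferences of one state into selection probabilities. The function name `gibbs_softmax`, the dictionary representation of the lookup table and the example preferences are assumptions made for this sketch, not part of the original method description.

```python
import math

def gibbs_softmax(preferences):
    """Return the selection probability h(x, u) of every action in one state x.

    `preferences` maps each action u to its preference p(x, u)
    (a hypothetical dictionary standing in for one row of the lookup table).
    """
    # Subtracting the largest preference leaves the probabilities unchanged
    # but avoids overflow when preferences are large.
    top = max(preferences.values())
    weights = {u: math.exp(p - top) for u, p in preferences.items()}
    total = sum(weights.values())
    return {u: w / total for u, w in weights.items()}

# Two actions with preferences 0 and 1: the second is selected more often.
probs = gibbs_softmax({"left": 0.0, "right": 1.0})
```

Actions with a higher preference receive a higher probability, but every action keeps a non-zero chance of being selected.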
The critic uses the temporal difference error to judge the quality of actions. Once the critic receives the action, the temporal difference error is calculated from the value function with equation (2), where $V$ is the state value function, $\gamma \in [0, 1)$ is the discount factor and $r$ is the immediate reward.
$$\delta_t = r_{t+1} + \gamma V(x_{t+1}) - V(x_t) \tag{2}$$
The preference $p$ can be updated iteratively with equation (3), where $\beta$ is a step factor.
$$p(x_t, u_t) = p(x_t, u_t) + \beta \delta_t \tag{3}$$
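The temporal difference error and the preference update can be sketched together as one tabular learning step. This is a sketch under assumptions: the function name `actor_critic_step`, the use of `defaultdict` tables, the parameter values, and the critic update `V[x] += alpha * delta` (the standard TD(0) rule, which the text above does not spell out) are all illustrative choices.

```python
from collections import defaultdict

def actor_critic_step(V, p, x, u, r, x_next, gamma=0.95, alpha=0.1, beta=0.02):
    """One tabular actor-critic update for the transition (x, u, r, x_next)."""
    # Critic: temporal difference error computed from the state value function.
    delta = r + gamma * V[x_next] - V[x]
    # Assumed TD(0) critic update; alpha is a hypothetical critic step factor.
    V[x] += alpha * delta
    # Actor: move the preference of the chosen action along the TD error.
    p[(x, u)] += beta * delta
    return delta

V = defaultdict(float)  # state values, initialised to 0
p = defaultdict(float)  # preferences p(x, u), initialised to 0
delta = actor_critic_step(V, p, "s0", "a", 1.0, "s1")
```

A positive error raises the preference of the action just taken, and a negative error lowers it.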
3. CROSS EVALUATION OF VALUE FUNCTIONS
In temporal difference algorithms, the greedy method, which depends on the maximizing operation, is used to select actions. Because of the maximization bias, part of the value function is overestimated and is easily affected by insufficient exploration of the environment. This often traps the learned policy in a local optimum. In particular, the evaluation policy of the off-policy Q-learning algorithm never explores, so it falls into a locally optimal policy more easily [21]. Exploration can reduce the maximization bias, but its overall efficiency is low. Using multiple cross-evaluated value functions for temporal difference learning can decrease the maximization bias [22].
In short, cross evaluation of value functions usually uses two sets of value functions for temporal difference learning. Unlike the usual temporal difference algorithm, there are two performers in the learning process, each with its own independent set of evaluation functions. When the action-state value function is used, the evaluation functions are $Q_1(u)$ and $Q_2(u)$, where $u \in U$ and $U$ is the set of actions. For the evaluation function $Q_1(u)$, the optimal action is calculated with equation (4).
$$u^{*} = \arg\max_{u} Q_1(u) \tag{4}$$
The other evaluation function, $Q_2(u)$, is then used to check the optimal action obtained from $Q_1(u)$. The evaluation follows equation (5).
$$Q_2(u^{*}) = Q_2\left(\arg\max_{u} Q_1(u)\right) \tag{5}$$
When the condition $E[Q_2(u^{*})] = q(u^{*})$ is satisfied, the estimate is unbiased, where $q(u)$ denotes the true value function. The same method is used to cross-examine the first set of action-state value functions. The calculation follows equation (6).
$$Q_1(u^{*}) = Q_1\left(\arg\max_{u} Q_2(u)\right) \tag{6}$$
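The effect of cross evaluation on the maximization bias can be seen in a small simulation. The sketch below assumes a hypothetical setting in which the true value of every action is 0 and both estimators carry independent standard normal noise. The function name `compare_estimators` and all parameter values are assumptions made for the illustration.

```python
import random

def compare_estimators(n_actions=10, n_trials=5000, seed=0):
    """Average the single-estimator and cross-evaluated value of the greedy action.

    The true value q(u) is 0 for every action, so any average far from 0 is bias.
    """
    rng = random.Random(seed)
    single_total = 0.0
    cross_total = 0.0
    for _ in range(n_trials):
        q1 = [rng.gauss(0.0, 1.0) for _ in range(n_actions)]
        q2 = [rng.gauss(0.0, 1.0) for _ in range(n_actions)]
        # Greedy action according to the first estimator.
        u_star = max(range(n_actions), key=q1.__getitem__)
        single_total += q1[u_star]  # Q1 evaluates its own choice: biased upward
        cross_total += q2[u_star]   # Q2 evaluates Q1's choice: independent noise
    return single_total / n_trials, cross_total / n_trials

single, cross = compare_estimators()
```

In this setting the single-estimator average is clearly positive although every true value is 0, while the cross-evaluated average stays close to 0.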
The agent selects actions based on the two intertwined evaluation functions $Q_1$ and $Q_2$. In most cases, $Q_1(u) + Q_2(u)$ is used in place of the current value function. A temporal difference learning method of this kind is called the cross evaluation function approach. When this method is used to evaluate the optimal action, only one of the evaluation functions is updated at each time step, and the action evaluated by the other evaluation function is an important learning parameter. The method adds one more evaluation function and therefore requires more storage, but its computational complexity remains almost the same as that of the approach without cross evaluation. The new learning algorithm still satisfies the greedy principle, and the selection of actions is related to the optimal action of the evaluation policy. The update of one value function is constrained by the other value function. If one of the value functions produces an erroneous estimate because of the maximization bias, the next state-action pair in the other set of evaluations corrects the deviation. In this way, the other set of evaluation values also pushes the current value function out of a locally optimal state, reducing the maximization bias while the optimal action continues to be selected. This method is particularly effective in environments where locally optimal policies are prone to occur.
4. AN ACTOR-CRITIC ALGORITHM BASED ON CROSS EVALUATION OF VALUE FUNCTIONS (DVCAC)
The traditional cross evaluation of value functions constructs two sets of evaluation functions. By adjusting the update equation, these two evaluation functions restrain each other, so that the computed policy is balanced. This method can reduce the maximization bias, help the policy escape from a local optimum faster, and speed up the convergence of the algorithm [2]. Cross evaluation of value functions can also be used in actor-critic algorithms in continuous spaces to reduce the likelihood of ending up in a local optimum.
In solving reinforcement learning problems, either the state value function or the state-action value function is used to evaluate policies. Using the state value function reduces the amount of storage and computation required for parameter updates, but significantly increases the cost of calculating the current optimal action. In the actor-critic algorithm, the optimal action is obtained directly from the currently stored policy, and no value function is needed to compute it. Therefore, in an actor-critic algorithm in a continuous space, the current policy is evaluated with the state value function, which requires less storage. In continuous spaces, state value functions are approximated by linear functions. The representation is shown in equation (7).
$$V(x) = \theta^{T} \phi(x) \tag{7}$$
The parameter vector is updated as shown in equation (8).
$$\theta = \theta + \alpha \delta \phi(x) \tag{8}$$
The relationship between the value function of the current state and that of the next state is carried by the temporal difference error. The error in the continuous space is shown in equation (9).
$$\delta = r + \gamma \theta^{T} \phi(x') - \theta^{T} \phi(x) \tag{9}$$
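The linear approximation, its temporal difference error and the parameter update can be sketched in a few lines of Python. The names `value` and `td_update`, the plain-list representation of the parameter and feature vectors, and the default parameter values are assumptions made for this sketch.

```python
def value(theta, phi):
    """Linear state value: the inner product of parameters and features."""
    return sum(t * f for t, f in zip(theta, phi))

def td_update(theta, phi_x, phi_next, r, gamma=0.95, alpha=0.1):
    """Return the updated parameter vector and the temporal difference error.

    phi_x and phi_next are the feature vectors of the current and next state.
    """
    delta = r + gamma * value(theta, phi_next) - value(theta, phi_x)
    new_theta = [t + alpha * delta * f for t, f in zip(theta, phi_x)]
    return new_theta, delta

# With all parameters at 0 and a reward of -1, the error is -1 and only the
# parameter of the active feature of the current state moves.
new_theta, delta = td_update([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], r=-1.0)
```

Only the components whose features are active in the current state are changed, which keeps the cost of each update proportional to the number of features.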
When one set of value functions is updated, the other value function is used as a standard against which the updated value function is evaluated, to prevent it from falling into a local optimum because of the maximization bias. The temporal difference error used to update the first set of value functions is shown in equation (10).
$$\delta = r + \gamma \theta_2^{T} \phi(x') - \theta_1^{T} \phi(x) \tag{10}$$
Here $\theta_1$ denotes the parameter vector corresponding to the first set of value functions, and $\theta_2$ denotes that of the second set. In equation (10), the value of the next state is computed with the second set of evaluation values, and the value of the current state is computed with the first set, so that the first set of value functions is influenced by the second one when it is updated. Similarly, the temporal difference error for the second set of value functions is computed with equation (11).
$$\delta = r + \gamma \theta_1^{T} \phi(x') - \theta_2^{T} \phi(x) \tag{11}$$
In discrete spaces, double Q-learning selects actions based on two sets of evaluation value functions. The common approach is to use $Q_1(u) + Q_2(u)$ instead of the original value function for action selection. However, this is not practical for the actor-critic algorithm. In a discrete space, the policy can be obtained from the state value function, but in a continuous space it is difficult to solve for the optimal action using the state value function alone. The policy of the actor-critic algorithm is stored directly, without any calculation involving the value function. Typically, one of the policies is selected at random, and the agent then updates the corresponding set of value functions according to the selected policy. This not only ensures that the algorithm learns within the actor-critic framework, but also reduces the degree of maximization bias. The actor-critic algorithm based on cross evaluation of value functions is shown in Algorithm 1.
Algorithm 1. An actor-critic algorithm based on cross evaluation of value functions
1. Initialization: value function parameters $\theta_1$, $\theta_2$; parameters of the policies $Ac_1$, $Ac_2$;
2. REPEAT (for each episode):
3.   $x \leftarrow x_0$, where $x_0$ is an initial state;
4.   REPEAT (for each time step):
5.     With 50% probability, perform the following steps:
6.       Select policy 1, obtain its optimal action $Ac_1(x)$, and choose the action $u$ to perform;
7.       In state $x$, perform action $u$, and obtain the reward $r$ and the next state $x'$;
8.       $\delta = r + \gamma \theta_2^{T} \phi(x') - \theta_1^{T} \phi(x)$;
9.       $\theta_1 = \theta_1 + \alpha \delta \phi(x)$;
10.      If $\delta > 0$: $Ac_1(x) = Ac_1(x) + \beta (u - Ac_1(x))$;
11.    With the remaining 50% probability, perform the following steps:
12.      Select policy 2, obtain its optimal action $Ac_2(x)$, and choose the action $u$ to perform;
13.      In state $x$, perform action $u$, and obtain the reward $r$ and the next state $x'$;
14.      $\delta = r + \gamma \theta_1^{T} \phi(x') - \theta_2^{T} \phi(x)$;
15.      $\theta_2 = \theta_2 + \alpha \delta \phi(x)$;
16.      If $\delta > 0$: $Ac_2(x) = Ac_2(x) + \beta (u - Ac_2(x))$;
17.    $x \leftarrow x'$;
18.  UNTIL $x$ is a final state.
19. UNTIL the maximum number of episodes is reached.
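One time step of the algorithm can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation. It assumes that each actor is linear in the same features as the critics, that the action is a scalar explored with Gaussian noise of standard deviation `sigma`, and that a terminal state has value 0. The names `dvcac_step`, `features` and `env_step` are hypothetical.

```python
import random

def dot(w, phi):
    return sum(a * b for a, b in zip(w, phi))

def dvcac_step(x, critics, actors, features, env_step, rng,
               gamma=0.95, alpha=0.1, beta=0.02, sigma=0.1):
    """One cross-evaluated actor-critic step, with in-place updates.

    critics / actors: lists holding the two parameter vectors of each kind.
    features(x): returns the feature vector of state x (assumed helper).
    env_step(x, u): returns (reward, next_state, done) (assumed helper).
    """
    phi = features(x)
    i = rng.randrange(2)          # choose set 1 or set 2 with 50% probability
    j = 1 - i                     # the other set performs the cross evaluation
    greedy = dot(actors[i], phi)  # action proposed by the selected actor
    u = rng.gauss(greedy, sigma)  # Gaussian exploration around that action
    r, x_next, done = env_step(x, u)
    # The next state is valued by the OTHER critic, the current state by this one.
    v_next = 0.0 if done else dot(critics[j], features(x_next))
    delta = r + gamma * v_next - dot(critics[i], phi)
    critics[i] = [w + alpha * delta * f for w, f in zip(critics[i], phi)]
    if delta > 0:
        # CACLA-style actor update: move the actor's output toward the explored action.
        actors[i] = [w + beta * (u - greedy) * f for w, f in zip(actors[i], phi)]
    return x_next, done, i, delta
```

Only the selected critic and actor change in a given step. The other pair is read but not written, which is what lets each set act as an independent check on the other.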
5. ANALYSIS OF EXPERIMENT RESULTS
Two different puddle worlds, in which learning easily falls into a local optimum, are used to test the performance of the DVCAC algorithm. Figure 2 shows a continuous-space puddle world problem. The state space is a square with a side length of 1, and two segments represent the location of the puddle. The end points of the two segments are (0.1, 0.75), (0.45, 0.75) and (0.45, 0.4), (0.45, 0.8). After the agent performs an action, if the minimum distance $d$ from the puddle to the agent is less than 0.1, the reward is $r = -1 - 400 \cdot (0.1 - d)$; otherwise the reward is $r = -1$. In each state the agent can move in any direction, and the length of each move is fixed at 0.05. The moves along the x axis and the y axis are subject to noise, which follows a normal distribution with mean 0 and standard deviation 0.01. If the agent would leave the bounds after a move, it stays on the boundary.
The goal of the puddle world experiment is to find the shortest path from the start to the terminal region, and the path should bypass the puddles. The start position is (0, 0), and a terminal position $(x, y)$ satisfies $x + y \ge 1.9$. In this puddle world there are many suboptimal solutions, so the convergence performance of the DVCAC algorithm can be examined thoroughly.
Figure 2. Puddle world 1
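The reward of puddle world 1 can be sketched in Python from the description above. The sketch assumes that the penalty has the form $-1 - 400 \cdot (0.1 - d)$, as in the commonly used puddle world benchmark, and that the terminal region is $x + y \ge 1.9$. The names `PUDDLES`, `distance_to_segment`, `reward` and `is_terminal` are hypothetical.

```python
import math

# End points of the two puddle segments, as given in the text.
PUDDLES = [((0.1, 0.75), (0.45, 0.75)), ((0.45, 0.4), (0.45, 0.8))]

def distance_to_segment(point, segment):
    """Euclidean distance from a point to a line segment."""
    (px, py), ((ax, ay), (bx, by)) = point, segment
    dx, dy = bx - ax, by - ay
    # Project the point onto the segment's line and clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = min(1.0, max(0.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def reward(point, puddles=PUDDLES):
    """-1 per step, with an extra penalty within 0.1 of a puddle (assumed form)."""
    d = min(distance_to_segment(point, s) for s in puddles)
    return -1.0 - 400.0 * (0.1 - d) if d < 0.1 else -1.0

def is_terminal(point):
    """Terminal region in the upper-right corner (assumed to be x + y >= 1.9)."""
    return point[0] + point[1] >= 1.9
```

Under these assumptions a step taken exactly on a puddle segment costs 41, so a path that cuts through the puddle is far more expensive than a short detour.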
The DVCAC algorithm is compared with the classical CACLA algorithm [11]. In the experiment, both DVCAC and CACLA use Gaussian radial basis functions. The whole state space is divided into a 10x10 grid, and each grid cell is the center of one basis function. The radius of each basis function is 0.05, the discount is 0.95, the step factor for the state value function is 0.1, and the step factor for the policy is 0.02. The maximum number of episodes is set to 1000. Actions are selected with the Gaussian exploration method.
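The feature construction described above can be sketched as follows. The text does not give the exact form of the basis functions, so the sketch assumes the common Gaussian form $\exp(-\|s - c\|^2 / (2\sigma^2))$ with the radius 0.05 used as $\sigma$ and centers at the middle of each grid cell. The name `rbf_features` is hypothetical.

```python
import math

def rbf_features(x, y, n=10, radius=0.05):
    """Gaussian radial basis features on an n-by-n grid over the unit square."""
    feats = []
    for i in range(n):
        for j in range(n):
            # Center of grid cell (i, j); placing centers at cell middles is an assumption.
            cx, cy = (i + 0.5) / n, (j + 0.5) / n
            sq_dist = (x - cx) ** 2 + (y - cy) ** 2
            feats.append(math.exp(-sq_dist / (2.0 * radius ** 2)))
    return feats

phi = rbf_features(0.05, 0.05)  # state located at the center of the first cell
```

With this choice a state activates mainly the basis functions of nearby cells, so each update affects only a small neighborhood of the state space.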
In the puddle world, the cumulative reward of each episode is used to measure the quality of the policy: the greater the return, the better the policy. Figure 3 compares the rewards of the two algorithms. The reward values are averaged over 20 experiments. To prevent an episode from running for a very large number of steps, the maximum number of steps is 100 in all episodes. Since this limit is small, it is difficult to reach the goal state within an episode, so the reward value is used as the criterion for evaluating the learning algorithms.
Figure 3. The comparison of accumulated rewards in puddle world 1
Since the DVCAC algorithm needs to solve and evaluate two sets of function parameters, its convergence rate is slower than that of CACLA. However, the reward of the DVCAC algorithm increases steadily as the number of episodes grows. At about 700 episodes the reward values of the two algorithms coincide, and DVCAC continues to improve beyond 700 episodes. Although the CACLA algorithm learns very quickly, its reward hardly changes once the number of episodes exceeds 400: the policy is essentially no longer being optimized, and it is still not stable after 1000 episodes. The experiments show that, although its convergence is not fast, the DVCAC algorithm has good learning performance and converges well, and the learned policy is relatively stable.
[Figure 3 plot: reward curves for CACLA and DVCAC; horizontal axis Episodes (0 to 1200), vertical axis Rewards (-100 to -30).]
In the second experiment, the puddle is defined as a single segment with end points (0.1, 0.5) and (0.8, 0.5). If the agent comes near the puddle and the distance $d$ is less than 0.1, the reward is defined as $r = -1 - 400 \cdot (0.1 - d)$; otherwise the reward is set to -1. To make the environment more random, when the agent goes above the middle position, i.e. $y > 0.5$, it may receive a random reward with a mean of 0.1 and a standard deviation of 1. The task of the agent is to reach the destination quickly. In each state the agent can move in any direction, and the length of each move is fixed at 0.05. Each movement along both the x axis and the y axis is affected by noise, which follows a normal distribution with a mean of 0 and a standard deviation of 0.01. If a movement would cross the boundary, the agent remains on the boundary. The starting point is (0, 0), and the end point satisfies $x + y \ge 1.9$. The other parameters of puddle world 2 are exactly the same as those of the original puddle world. Puddle world 2 is shown in Figure 4. The state space is a square with a side length of 1, and in the middle of this square there is a puddle that acts as an obstacle, blocking the movement of the agent.
[Figure 4 diagram: unit-square state space with both axes running from 0 to 1; the goal region is marked.]
Figure 4. Puddle world 2
In puddle world 2 the environment is simple, but the reward contains random noise that follows a normal distribution. To prevent an excessive number of steps in an episode, which would eventually lead to excessive learning, the maximum number of steps for each episode is set to 100. The results of the comparison with the continuous actor-critic learning automaton (CACLA) are shown in Figure 5.
Figure 5. The comparison of accumulated rewards in puddle world 2
The reward in Figure 5 is the average of 10 experiments. Since the DVCAC algorithm is a CACLA algorithm with an additional value function to learn, its convergence rate is not as fast as that of CACLA. However, Figure 5 shows that the reward of DVCAC is much more stable than that of the CACLA algorithm. The CACLA algorithm is efficient, but its reward curve fluctuates greatly.
[Figure 5 plot: reward curves for CACLA and DVCAC; horizontal axis Episodes (0 to 1200), vertical axis Rewards (-110 to -30).]
When the number of episodes is small (about 50 episodes), the fluctuation range is larger. In some circumstances, the stability of convergence is the most important property. The figure shows that when the number of episodes exceeds 500, the stability of the CACLA algorithm begins to decline: the reward starts to decrease and fluctuates considerably, while the DVCAC algorithm still maintains a good upward trend. The DVCAC algorithm can therefore obtain a more accurate and stable policy. The algorithm is robust, converges better, and is suitable for applications that seek a globally optimal policy.
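As a rough illustration of how reward curves like those in Figure 5 might be summarized, the sketch below averages per-episode rewards over several runs and computes a sliding-window standard deviation as one possible measure of fluctuation. The function names, the window size, and the choice of standard deviation as a stability measure are assumptions made here; the paper only states that the plotted reward is the average of 10 experiments.

```python
from statistics import mean, pstdev


def average_runs(runs):
    """Average the per-episode reward across runs of equal length."""
    return [mean(episode_rewards) for episode_rewards in zip(*runs)]


def rolling_std(curve, window):
    """Population standard deviation over each full sliding window of the curve."""
    return [pstdev(curve[i:i + window])
            for i in range(len(curve) - window + 1)]


# Hypothetical usage with two short made-up runs (not data from the paper).
runs = [[-100, -80, -60], [-90, -70, -50]]
averaged = average_runs(runs)
fluctuation = rolling_std(averaged, window=2)
```

Under this measure, a smoother learning curve yields smaller values of `rolling_std` over the same window.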
6. CONCLUSION
In order to improve the stability of continuous-space algorithms and to prevent the policy from falling into a local optimum because of a lack of exploration, an actor-critic algorithm based on the cross evaluation of value functions is proposed. The algorithm constructs two sets of evaluation functions and randomly uses one of them in action selection. When one set of evaluation functions is updated, the other set is used to evaluate the value of the next state or the next state-action pair. This method makes it easier for the policy to escape a local optimum under the same amount of exploration. The advantages are more noticeable in environments that are prone to local optima and in which a stable policy is required. To check the performance, the DVCAC and CACLA algorithms are tested in two different puddle world environments. The experimental results show that the policy learnt by the DVCAC algorithm is more stable than that of the CACLA algorithm, while the computational complexity is at the same level.
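The cross-evaluation idea summarized above can be sketched as a single update step. This is one possible reading under stated assumptions, not the authors' implementation: the linear value functions over a feature vector, the function name `cross_eval_update`, the step sizes `alpha` and `beta`, the discount `gamma`, and the random choice of which critic to update are all hypothetical. The actor rule (move toward the taken action only when the temporal-difference error is positive) follows the standard CACLA rule, which DVCAC is described as extending.

```python
import random


def dot(weights, features):
    """Inner product of a weight vector and a feature vector."""
    return sum(w * f for w, f in zip(weights, features))


def cross_eval_update(critics, actor_w, phi_s, phi_next, action, reward, done,
                      rng, alpha=0.1, beta=0.1, gamma=0.95):
    """One cross-evaluation step; mutates critics and actor_w in place.

    critics: list of two weight vectors (the two value functions).
    Returns (index of the updated critic, TD error).
    """
    k = rng.randrange(2)   # assumed: the critic to update is picked at random
    other = 1 - k          # the other critic evaluates the next state
    v_next = 0.0 if done else dot(critics[other], phi_next)
    delta = reward + gamma * v_next - dot(critics[k], phi_s)
    # Update only the selected critic toward the cross-evaluated target.
    for i, f in enumerate(phi_s):
        critics[k][i] += alpha * delta * f
    # CACLA-style actor update: only when the action did better than expected.
    if delta > 0:
        error = action - dot(actor_w, phi_s)
        for i, f in enumerate(phi_s):
            actor_w[i] += beta * error * f
    return k, delta
```

Because the bootstrap target comes from the critic that is not being updated, an overestimate in one value function is not fed directly back into itself, which is the intuition behind double-estimator methods in general.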
BIOGRAPHIES OF AUTHORS

Hui Wang, born in 1968, Ph.D. candidate. His main research interests include reinforcement learning, computer vision, and human-computer interaction.

Peng Zhang, born in 1992, Master student. His main research interests include reinforcement learning in continuous spaces.

Quan Liu, born in 1969, Ph.D., professor, Ph.D. supervisor. His main research interests include reinforcement learning, intelligence information processing, and automated reasoning.