Indonesian Journal of Electrical Engineering and Computer Science
Vol. 38, No. 1, April 2025, pp. 357–366
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v38.i1.pp357-366
Simulation of ray behavior in biconvex converging lenses using machine learning algorithms

Juan Deyby Carlos-Chullo, Marielena Vilca-Quispe, Whinders Joel Fernandez-Granda, Eveling Castro-Gutierrez
Universidad Nacional de San Agustin de Arequipa, Arequipa, Peru
Article Info

Article history:
Received May 20, 2024
Revised Oct 21, 2024
Accepted Oct 30, 2024

Keywords:
Converging biconvex lenses
Machine learning
Proximal policy optimization
Reinforcement learning
Soft actor-critic

ABSTRACT
This study used machine learning (ML) algorithms to investigate the simulation of light ray behavior in biconvex converging lenses. While earlier studies have focused on lens image formation and ray tracing, they have not applied reinforcement learning (RL) algorithms like proximal policy optimization (PPO) and soft actor-critic (SAC) to model light refraction through 3D lens models. This study addresses that gap by assessing and contrasting the performance of these two algorithms in an optical simulation context. The findings of this study suggest that the PPO algorithm achieves superior ray convergence, surpassing SAC in terms of stability and accuracy in optical simulation. Consequently, PPO offers a promising avenue for optimizing optical ray simulators. It allows for a representation that closely aligns with the behavior of light rays in biconvex converging lenses, which holds significant potential for application in more complex optical scenarios.

This is an open access article under the CC BY-SA license.
Corresponding Author:
Juan Deyby Carlos-Chullo
Universidad Nacional de San Agustin de Arequipa
Arequipa, Peru
Email: jcarlosc@unsa.edu.pe
1. INTRODUCTION
Converging lenses, such as biconvex lenses, are designed to form both real and virtual images [1]. These lenses are essential for improving the precision with which we observe and study objects [2]. Through the refraction of light, converging lenses enable illuminated objects to project onto a screen, creating images that can be examined for various scientific purposes [3]. While several applications simulate image formation through these lenses, many do not fully capture the complex behavior of light rays. One such application, AR-GiOs, as analyzed in [4], has shown promising results in the academic field, particularly for learning about the formation of real and virtual images. However, despite its success in educational settings, AR-GiOs still struggles to accurately simulate the behavior of rays passing through optical systems. This gap highlights the limitations of current simulation tools in capturing the subtle details of ray behavior, which are fundamental to the study of physical optics.
Several applications attempt to simulate image formation through lenses, but they often fail to accurately model the light rays involved in the process [4], [5]. These rays, referred to as principal, central, and focal rays, are essential for understanding key optical behaviors when light passes through lenses or mirrors. Accurate simulation of these rays is crucial because they dictate how images are formed and how optical systems function, yet many existing tools lack the necessary fidelity to simulate them effectively.
No studies have been identified that apply reinforcement learning (RL) algorithms to model light refraction through lenses. RL methods like proximal policy optimization (PPO) [6], [7] and soft actor-critic (SAC) [8], [9] are extensively used in artificial intelligence (AI) and machine learning (ML) for decision-making tasks [10], [11]. These algorithms operate based on learning from interactions with their environment, where an agent makes decisions and is given feedback through rewards or penalties [12]. Due to the dynamic nature of light refraction, RL algorithms have the potential to improve the precision of ray simulations.
This article proposes the use of PPO and SAC algorithms to control the trajectory of rays as they pass through a lens, guiding them to converge at points where virtual or real images are formed. By applying the thin lens equation and magnification formulas, the deviation and trajectory of the rays are calculated as they interact with the lens [13]. A simulator created in Unity utilizes these RL algorithms to simulate the passage of the three critical rays (principal, central, and focal) through a converging lens, aiming to achieve accurate ray convergence and image formation.
While it is possible to simulate converging rays through a lens, achieving an accurate simulation of rays passing through a 3D lens model requires highly complex and computationally demanding models. Given the detailed geometry of converging lenses, it is not feasible to approximate their shape using multiple primitive models in Unity. Therefore, moderately complex 3D models and RL algorithms are employed to enhance the accuracy of the simulation.
The remainder of this article is structured as follows: section 2 covers related works, section 3 details the proposed simulation of ray behavior in biconvex converging lenses, and section 4 provides the results and discussion. Lastly, section 5 outlines our conclusions and suggests directions for future work.
2. RELATED WORKS
2.1. Converging biconvex lenses
Biconvex lenses, with their two curved surfaces facing outward, serve as an example of converging lenses. It is crucial to note that, despite their appearance, these lenses are positive (with thickness decreasing from the center towards the edges) and have the ability to focus light rays [14]. Commonly used in optics courses in schools or universities, these lenses are employed to illustrate the principles of refraction and the formation of both real and virtual images [1].
2.2. Machine learning
ML is a crucial branch of AI, enabling computers to process information and learn from it [12]. Through the use of algorithms, ML addresses complex data problems and automates processes, with applications in various fields such as data mining, image analysis, and predictive modeling [15]. Its broad applicability extends into numerous scientific areas, particularly within the physical sciences, where it applies algorithms and modeling techniques for data analysis in disciplines like statistical mechanics, high-energy physics, cosmology, quantum many-body systems, quantum computing, chemistry, and materials research [16].
2.3. Reinforcement learning
RL is an ML method where an agent engages with its environment and discovers an optimal strategy through trial and error [10], [17]. It is recognized as one of the three primary types of ML, alongside supervised and unsupervised learning. Unlike other approaches, its objective is to acquire different actions based on the conditions in the environment, with the agent serving as the principal decision-maker [17]. RL has demonstrated significant potential for advancing AI [18]. In this framework, the agent receives feedback from the environment but lacks access to labeled data or explicit guidance. It is employed in sequential decision-making tasks across various domains, including natural and social sciences, engineering, and AI [19].
2.3.1. Proximal policy optimization
PPO is an RL technique that has demonstrated cutting-edge performance across a range of challenging tasks [20]. PPO has been utilized in multiple areas, including robotics, gaming, and autonomous systems, to improve agent performance in complex environments. For example, in [21], PPO was employed to automate simulated autonomous driving, leading to enhanced outcomes. Similarly, in [22], PPO was effectively used to predict stock market trends, highlighting its versatility and efficiency in financial applications.
2.3.2. Soft actor-critic
SAC operates within the maximum entropy RL framework, aiming to maximize both expected performance and entropy simultaneously, thereby enabling actors to act with maximum randomness while achieving task success [8]. Its efficacy has been extensively evaluated in various experiments, including tests on Atari games and a large-scale MOBA game, as demonstrated in [23]. In comparative studies, PPO has consistently emerged as a top performer, as evidenced by comparisons with SAC across different test conditions [24]. Specifically, PPO showcased superior performance, especially in scenarios involving a high number of units and layers.
In research comparing RL algorithms, PPO consistently demonstrates remarkable performance, surpassing SAC in various conditions, particularly when dealing with complex architectures [25]. Furthermore, in comparative studies of deep RL algorithms, PPO consistently outperforms alternatives like DDPG, SAC, and TD3, as demonstrated in [26]. To implement RL in simulation environments, practitioners often leverage tools such as Unity and ML-Agents, as highlighted in previous research [27], [28].
3. METHOD
3.1. Proposed simulation of ray behavior in converging biconvex lenses
This work aims to develop a simulation of ray behavior in converging biconvex lenses using 3D models, applying RL techniques like PPO and SAC to modify the refraction angles of light rays passing through a lens. The goal is to compare the feasibility and stability of these algorithms in achieving more accurate simulation results. The proposal includes the following steps, as outlined in Table 1.
Table 1. Steps for simulating light through a biconvex lens

Step  Description
1     Define the objective
2     Model the converging biconvex lens using Blender
3     Develop a simulation environment using Unity
4     Identify constraints and critical properties
5     Explanation of PPO and SAC algorithms
6     RL environment configuration
3.2. Simulation of converging biconvex lens behavior
To validate the proposal, a simulator based on RL algorithms, specifically PPO and SAC, has been developed. These algorithms were employed to accurately model and simulate the behavior of light rays in converging biconvex lenses under simulated conditions.
3.2.1. Define the objective
The objective is to compare the feasibility and stability of the PPO and SAC algorithms within the context of simulating converging biconvex lenses. The purpose is to determine which of these algorithms is more effective in achieving accurate and reliable simulation results that precisely describe the behavior of light rays interacting with converging biconvex lenses.
PPO and SAC were selected because of their extensive use in the literature, being recognized for effectively managing both single-agent settings and multi-agent cooperative and competitive scenarios. Additionally, other RL algorithms, such as MA-POCA [29], also support these types of environments. However, this study focuses on PPO and SAC due to their popularity and demonstrated success in complex physical simulations.
3.2.2. Modeling the converging biconvex lens using Blender
In the simulation, two distinct models of converging biconvex lenses were implemented to ensure high precision in ray tracing. Each model, depicted in Figure 1, was constructed using Blender 3D version 3.6.1 and has a size of 6.5 MB. These models consist of 141,604 vertices and 269,316 triangles, with one model having its surface normals oriented inward and the other outward. This difference in normal orientation was essential to enable precise collision detection by the rays emitted using Unity's Raycast function, both when entering and exiting the lens.
The models were generated by intersecting two spheres, which were created in Blender with the highest possible level of detail, constrained by the computational capabilities and the limits of the Blender software. Each sphere has a radius of 10 meters, and their centers are separated by a distance of 19.9 meters, resulting in an intersection of 0.1 meters. This intersection was chosen to produce a lens thin enough to avoid the optical aberration that occurs in thicker lenses.
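As a rough illustration of this construction, the Blender Python (bpy) sketch below creates two high-resolution UV spheres with the dimensions quoted above and intersects them with a boolean modifier. It is a minimal sketch under stated assumptions, not the authors' actual build script: the object names, the use of the boolean INTERSECT modifier, and applying the full 2048-segment/256-ring resolution to each sphere are assumptions based on the text and Figure 1.

```python
# Rough sketch (assumed, not the authors' script): rebuilding the lens geometry in Blender.
# The 10 m sphere radius, 19.9 m center separation (0.1 m overlap), and the
# 2048-segment / 256-ring resolution follow the text and Figure 1.
import bpy

def make_sphere(name, x):
    bpy.ops.mesh.primitive_uv_sphere_add(
        segments=2048, ring_count=256, radius=10.0, location=(x, 0.0, 0.0))
    sphere = bpy.context.active_object
    sphere.name = name
    return sphere

sphere_a = make_sphere("LensSphereA", -19.9 / 2.0)
sphere_b = make_sphere("LensSphereB", 19.9 / 2.0)

# A boolean INTERSECT keeps only the overlapping volume: a thin biconvex lens
# roughly 0.1 m thick at its center.
mod = sphere_a.modifiers.new(name="LensIntersect", type='BOOLEAN')
mod.operation = 'INTERSECT'
mod.object = sphere_b
bpy.context.view_layer.objects.active = sphere_a
bpy.ops.object.modifier_apply(modifier=mod.name)

# The second sphere is no longer needed once the intersection has been applied.
bpy.data.objects.remove(sphere_b, do_unlink=True)
```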
Figure 1. Characteristics of the spheres, with 256 rings and 2048 segments each, dimensions of 10x10x10 meters, and an intersection of 0.10 meters, resulting in a converging biconvex lens of approximately 67 rings, 2048 segments, and dimensions of 1.977 meters x 1.977 meters x 0.1 meters
3.2.3. Develop a simulation environment using Unity
In the context of optical ray simulation with a converging lens, a simulation environment was developed using Unity version 2021.3.11f1 and the RL agents library, a popular tool in RL environments [7]. This environment includes elements such as a converging biconvex lens, focal points, and a ray launch point to determine the initial direction and trajectory of the rays passing through the lens. The development aimed to apply the PPO and SAC algorithms to simulate and optimize the behavior of light rays. Figure 2 presents a screenshot of the simulation environment within Unity. It illustrates the converging lens, focal points, and the trajectory of three types of rays projected from a designated origin point. These rays include the principal, central, and focal rays, moving from left to right from the viewpoint, showcasing the simulated optical phenomena.
The simulator designed for this study integrates physical optics principles into Unity's framework, providing a platform for comparing the performance of the PPO and SAC algorithms in simulating light ray trajectories. Central to the simulator is the algorithm responsible for tracing the path of light rays as they interact with the lens surfaces. By recording collision points, the behavior of rays can be analyzed, informing the adjustments required for accurate simulation. The use of the thin lens formula and lens magnification aids in calculating the optimal points for ray passage or approach, enhancing the realism of the simulation.
Figure 2. Simulation environment developed in Unity. The environment contains a converging biconvex lens, with focal points located on both sides of the lens. Three rays – principal ray, central ray, and focal ray – are projected from the origin point, all moving from left to right from the viewpoint
3.2.4. Identifying constraints and critical properties
A key constraint is to limit the rays to three specific types: principal, central, and focal, due to their relevance in the field of optics. Additionally, the launch point of the rays was restricted to a distance of 10 meters from the lens, and along the Y and Z axes within a range that ensures collision with the lens, preventing the rays from escaping into empty space or striking the lens edges. Given that the lens has a radius of 1 meter, a maximum radius of 0.975 meters was chosen to avoid reaching the edge. Furthermore, the refractive index of the lens was set to 2, despite glass typically having a value of 1.45, in order to achieve tighter convergence of the rays. Finally, the number of interactions during the training of the PPO and SAC algorithms was limited to 500k due to the time required for training, with 48 simulated ray instances, taking approximately 20 to 30 minutes to complete the training for each ray type.
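To make the launch-point constraint concrete, the sketch below samples a ray origin on the allowed disc in the Y-Z plane at the 10-meter launch distance and 0.975-meter maximum radius quoted above. It is a minimal illustration, not the simulator's Unity code; the uniform-by-area sampling strategy and the coordinate convention (lens centered at the origin, optical axis along X) are assumptions.

```python
# Minimal sketch of the launch-point constraint described above (not the Unity code).
# Origins lie 10 m from the lens along the optical (X) axis and within a 0.975 m
# radius in the Y-Z plane, so every emitted ray is guaranteed to strike the lens face.
import math
import random

LAUNCH_DISTANCE_M = 10.0   # distance from the lens along the optical axis
MAX_RADIUS_M = 0.975       # stays inside the 1 m lens radius, away from the edge

def sample_ray_origin():
    # sqrt of a uniform sample gives points distributed uniformly over the disc area
    r = MAX_RADIUS_M * math.sqrt(random.random())
    theta = random.uniform(0.0, 2.0 * math.pi)
    return (-LAUNCH_DISTANCE_M, r * math.cos(theta), r * math.sin(theta))

if __name__ == "__main__":
    for _ in range(3):
        print(sample_ray_origin())
```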
3.2.5. Explanation of PPO and SAC algorithms
The PPO method is a widely used on-policy algorithm in RL, based on combining value and policy gradients to optimize agent performance [21], [30]. Its key objective is to make sure that, after updating the policy, it remains relatively close to the previous one. To avoid drastic shifts, PPO incorporates a clipping mechanism. This algorithm samples data from its environment and uses stochastic gradient descent to optimize a clipped loss function [31]. In contrast, SAC is an off-policy algorithm in RL that follows an actor-critic approach and does not rely on predefined models or rules [30]. SAC employs a revised RL objective function and emphasizes maximizing rewards over the agent's lifespan along with policy entropy [31].
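For reference, the two objectives described above can be written compactly. These are the standard formulations from [6] and [8] rather than equations given in this paper: PPO maximizes a clipped surrogate objective over the probability ratio $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_\text{old}}(a_t \mid s_t)$, with advantage estimate $\hat{A}_t$ and clip range $\epsilon$, while SAC maximizes expected return plus an entropy bonus weighted by a temperature $\alpha$:

$$L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\; \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right]$$

$$J(\pi) = \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\!\left[\, r(s_t, a_t) + \alpha\, \mathcal{H}\!\left(\pi(\cdot \mid s_t)\right) \right]$$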
Therefore, in this study, visual analyses were conducted to assess whether the AI agent responsible for adjusting the angle in the optical ray simulator was properly trained using ML, ensuring its stability and applicability. To achieve this, Unity ML-Agents was utilized, and a comparison between the PPO and SAC algorithms was performed.
3.2.6. RL environment configuration
In this study, the ray's origin point is randomly selected within a distance of 2F from the X-axis, allowing its location at any point within the area of a circle defined by the Y and Z axes, as shown in Figure 3. In this context, the variable observed by the agent is the radius, representing the distance from the center of the circle to the origin point of the ray. The agent makes decisions based on this variable while interacting with the environment and the converging biconvex lens.
Figure 3. The starting points of the rays originate at a distance of 2F from the right side, knowing that F is at a distance of 2.5 meters and 2F at 5 meters
To ensure proper behavior and prevent unnecessary collision repetitions, a small displacement of 10^-8 meters in the direction of the ray was applied each time a collision occurred with the lens model. This displacement was necessary because the ray could collide with both the internal and external colliders generated from the normals of the lens model; it ensures that the ray continues its trajectory without additional interference in subsequent collisions (Figure 4).
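The following sketch shows this offset in a simplified collision loop: after each recorded hit, the next cast starts 10^-8 m farther along the ray so the same collider is not hit twice. It is a schematic Python outline, not the Unity C# Raycast code used by the simulator; the `cast_ray` helper and the limit of four recorded points (origin, entry, exit, endpoint) are stand-ins for the scene queries described in the text.

```python
# Schematic outline of the collision loop described above (not the simulator's C# code).
# `cast_ray(origin, direction)` is a hypothetical stand-in for Unity's Physics.Raycast:
# it returns the next hit point along the ray, or None if nothing is hit.
EPSILON_M = 1e-8   # small advance past each hit so the same collider is not hit again
MAX_POINTS = 4     # origin, external (entry) hit, internal (exit) hit, endpoint

def trace_ray(origin, direction, cast_ray):
    points = [origin]
    while len(points) < MAX_POINTS:
        hit = cast_ray(origin, direction)
        if hit is None:
            break  # the ray left the scene without another collision
        points.append(hit)
        # Nudge the next cast slightly beyond the hit point along the current direction;
        # in the simulator the RL agent adjusts `direction` here after each lens hit.
        origin = tuple(p + EPSILON_M * d for p, d in zip(hit, direction))
    return points
```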
Throughout the training process, the agent received observations from the environment, aiding in the decision-making process. Rewards were utilized to motivate the agent to adjust the angle of the rays after colliding with the lens.
Figure 4. The guidelines, for instance, in the case of the principal ray at the top of the lens, are as follows: the green lines represent the ray's trajectory, the red lines correspond to the normals at the points where the ray collided, and the yellow line is used to project the resulting ray
RL agent: the agent's task is to adjust the ray's direction each time it passes through the lens, aiming to minimize the distance between the resulting ray's path and the target point. This target point is calculated using the thin lens formula in (1) and the magnification in (2). The formula applied for converging thin lenses is known as the thin lens equation, which relates the image distance (d_i), object distance (d_o), and focal length (f) of the lens, as shown in (1). Additionally, lateral magnification is used, linking the image height (h_i), object height (h_o), image distance (d_i), and object distance (d_o), as presented in (2).
The availability of these actions for the agent depends on the state of the environment at that moment. In this particular case, the decision has been made to apply the RL algorithms PPO and SAC, taking only one action in each training cycle. Unlike many approaches described in the literature, where multiple actions are taken in each training cycle, in this environment the agent takes only one action out of two possible actions in each training cycle:
- The refraction angle will only be modified when the emitted light reaches the four key points (the point of origin, the external collision point where the light enters the lens, the internal collision point where it exits the lens, and the endpoint).
- Rays that have three points or fewer will be discarded and will not be considered in the simulation, because some rays do not collide with the lens due to defects in the 3D model.
$$\frac{1}{f} = \frac{1}{d_o} + \frac{1}{d_i} \qquad (1)$$

$$m = \frac{h_i}{h_o} = -\frac{d_i}{d_o} \qquad (2)$$
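The sketch below shows how (1) and (2) can be rearranged to compute the target image point used as the convergence goal. The focal length of 2.5 m and the object distance of 2F = 5 m come from Figure 3; the function names, the object height, and the sign convention are illustrative assumptions rather than code from the paper.

```python
# Illustrative sketch (not the paper's code) of solving (1) and (2) for the target point.
def image_distance(f_m, d_o_m):
    """Solve the thin lens equation (1), 1/f = 1/d_o + 1/d_i, for d_i."""
    return 1.0 / (1.0 / f_m - 1.0 / d_o_m)

def image_height(h_o_m, d_o_m, d_i_m):
    """Lateral magnification (2): m = h_i/h_o = -d_i/d_o."""
    return h_o_m * (-d_i_m / d_o_m)

# Distances from Figure 3: f = 2.5 m, object placed at 2F = 5 m (object height assumed).
f, d_o, h_o = 2.5, 5.0, 0.5
d_i = image_distance(f, d_o)       # 5.0 m: the image forms at 2F on the opposite side
h_i = image_height(h_o, d_o, d_i)  # -0.5 m: inverted and the same size as the object
print(d_i, h_i)
```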
Hyperparameters: as with other RL algorithms, both PPO and SAC have multiple hyperparameters that influence the agent's performance in the converging lens environment. In this case, the objective is to adjust the angle of refraction to produce an outgoing ray trajectory that falls within a precision threshold of 0.01 meters from the point previously calculated using the lens and magnification formulas. The reward is determined by the distance between the resulting ray and the target point. Table 2 lists the hyperparameters of PPO and SAC used in this study, with both configurations based on examples from Unity's ML-Agents toolkit.
Table 2. Hyperparameters for the PPO and SAC algorithms in the experiment. Parameters include the policy deviation penalty, learning rate, batch size, iterations, samples, and entropy settings. These values affect the agents' performance in simulating light ray trajectories through a converging lens

Hyperparameter                        PPO value   SAC value
Policy deviation penalty coefficient  0.2         -
Learning rate                         0.0001      0.0003
Batch size                            64          -
Number of iterations                  10          -
Number of collected samples           1000        -
Replay buffer size                    -           1000
Target entropy                        -           0.2
Entropy regularization factor         -           0.01
Minimum entropy                       -           0.5
Reward function: Figure 5 provides a visual representation of how this reward function operates within the agent-environment interaction. The reward mechanism plays a crucial role in the agent's learning process [32]. To achieve the intended behavior, a specific goal needs to be defined for optimization. The agent's task is to adjust the angle of refraction so that the resulting ray passes within at most 0.01 meters of the target point. As the ray passes through the 3D model, the reward is determined by:
- If the distance to the target is less than or equal to 0.01 m, the reward is 1.0.
- If the distance to the target is greater than 0.01 m, the change in distance to the target is calculated, indicating whether the ray is approaching or moving away from the target. This is achieved by subtracting the current distance to the target from the distance in the previous step.
  - If the change in distance is positive, the agent is rewarded for approaching the target.
  - If the change in distance is negative, the agent is penalized for moving away from the target.
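A minimal sketch of this reward rule is shown below. The 0.01 m success threshold and the approach/recede sign follow the description above; the magnitude of the intermediate reward and penalty (here simply the change in distance) is an assumption, since the paper does not state the exact shaping scale.

```python
# Minimal sketch of the reward rule above; the 0.01 m threshold and the approach/recede
# sign come from the text, while the shaping scale for intermediate steps is assumed.
SUCCESS_THRESHOLD_M = 0.01

def step_reward(distance_to_target_m, previous_distance_m, shaping_scale=1.0):
    if distance_to_target_m <= SUCCESS_THRESHOLD_M:
        return 1.0                      # the ray reached the computed target point
    progress = previous_distance_m - distance_to_target_m
    # Positive progress (the ray moved closer) is rewarded; negative progress is penalized.
    return shaping_scale * progress
```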
Figure 5. Visual depiction of the reward function governing agent-environment interaction. Rewards are determined by ray proximity to the target, encouraging convergence and discouraging divergence. The figure clarifies reward dynamics in collision scenarios and distance relationships
Experiment environment: the setup includes an AMD Ryzen 7 3800XT processor, an NVIDIA GeForce RTX 3070 GPU, and 32 GB of RAM. Additionally, the following software versions were utilized: Unity Engine 2021.3.11f1, TensorFlow 2.13.0, and Unity ML-Agents Toolkit Release 20. These specifications were chosen to ensure a robust and efficient system capable of handling complex simulations and the computationally demanding tasks required for training RL algorithms in optical ray simulations.
4. RESULTS AND DISCUSSION
We found that the PPO algorithm achieved superior ray convergence with higher stability and accuracy than SAC. PPO reached a reward above 0.99 in fewer steps for the principal, central, and focal rays, while SAC showed improvement only after 500k steps, making PPO better suited for optimizing optical ray simulators.
4.1. Evaluate the behavior of the PPO and SAC algorithms
The learning results were visualized using TensorBoard, a tool from TensorFlow, with data generated for the principal, central, and focal rays. As shown in Figures 6 and 7, the PPO algorithm achieved rewards of 0.9932, 0.9943, and 0.9938 for the principal, central, and focal rays, respectively, within 200k steps, successfully meeting the target reward of 0.99 with a precision threshold of 0.01 m. In contrast, the SAC algorithm obtained rewards of 0.8951, 0.8829, and 0.8715 for the same rays over the same steps, falling short of the target. While SAC showed improvement between 200k and 500k steps, it still lagged behind PPO in accuracy and stability. PPO's results closely aligned with the predictions from the thin lens formula, demonstrating its superior performance, consistent with previous studies highlighting PPO's effectiveness in achieving reliable outcomes in similar simulation tasks.
[Figure 6 plots: cumulative reward (0.7–1.0) versus training steps (0–500k) for PPO and SAC, panels (a), (b), and (c)]
Figure 6. Comparison of the PPO and SAC algorithms for different rays: (a) principal ray, (b) central ray, and (c) focal ray. The graphs show environment/cumulative reward; PPO is represented in blue, and SAC in pink. The PPO algorithm shows stable performance throughout, while SAC experiences early signs of overfitting but eventually stabilizes
Figure 7. Results for all rays (principal, central, and focal) for the PPO and SAC algorithms at the 200k step
4.2. Principal findings
Optical physics simulators are a critical area of study, as they require agents capable of executing behaviors in real time under various circumstances. In this project, a simulator for converging rays and lenses was designed to evaluate the performance of algorithms like PPO and SAC. The results contribute to improving realistic representations of optical phenomena, showing that PPO effectively emulates the behavior of objective optical systems and accurately reproduces the predicted outcomes.
4.3. Comparison to prior work
In comparison with previous applications of PPO and SAC in other domains, such as game simulations and general physics environments [12], [30], this project stands out by developing a simulator focused on the behavior of rays in converging biconvex lenses. Unlike other applications that use rays as guides but do not simulate them correctly [4], this work allows for the visualization of the optical phenomenon using only rays trained with RL. The results demonstrate how a ray is refracted when entering and exiting the lens. The choice of PPO, based on its stability and adaptability, has proven effective for this complex task, distinguishing this research from previous works focused on broader or less specific areas.
4.4. Strengths and limitations
This study explored the application of RL algorithms to simulate ray behavior in a single type of converging biconvex lens with three specific ray types (principal, central, and focal). However, further and more comprehensive studies are needed to explore the behavior of additional ray types and more complex optical systems, such as multi-lens setups. Using a dense 3D polygonal mesh also introduced computational challenges, such as memory limitations and occasional missed collisions, which may have impacted the accuracy of the simulations.
5. CONCLUSION
This research successfully applied RL algorithms to simulate the behavior of light rays passing through biconvex converging lenses, demonstrating the viability of RL in modeling optical phenomena. The results, particularly with PPO achieving a reward exceeding 0.99 with high accuracy, indicate its superior stability and efficiency in this context. In contrast, SAC, while known for its general applicability in various domains, underperformed in this specific scenario. This finding underscores the need to tailor RL algorithms to problem-specific dynamics, as the versatility SAC has shown in other studies did not carry over to this setting.
Our observations suggest that RL algorithms can significantly improve the accuracy of optical ray simulations. Our findings provide conclusive evidence that PPO, in particular, enhances the precision of ray convergence through biconvex lenses, making it a promising tool for future optical system modeling. Our study demonstrates that PPO is more reliable for simulating ray behavior in optical systems. Future studies may explore the application of these algorithms to multi-lens systems, where ray tracing becomes more intricate.
ACKNOWLEDGMENTS
Thanks to the "Research Center, Transfer of Technologies and Software Development R+D+i" - CiTeSoft EC-0003-2017-UNSA, for their collaboration in the use of their equipment and facilities for the development of this research work.
REFERENCES
[1] H. Isik, "Comparing the images formed by uses of lens surfaces," Physics Education, vol. 58, no. 3, p. 035002, May 2023, doi: 10.1088/1361-6552/acb87d.
[2] K. Fliegauf, J. Sebald, J. M. Veith, H. Spiecker, and P. Bitzenbauer, "Improving early optics instruction using a phenomenological approach: a field study," Optics, vol. 3, no. 4, pp. 409–429, Nov. 2022, doi: 10.3390/opt3040035.
[3] S. Wörner, S. Becker, S. Küchemann, K. Scheiter, and J. Kuhn, "Development and validation of the ray optics in converging lenses concept inventory," Physical Review Physics Education Research, vol. 18, no. 2, p. 020131, Nov. 2022, doi: 10.1103/PhysRevPhysEducRes.18.020131.
[4] H. P. Kencana, B. H. Iswanto, and F. C. Wibowo, "Augmented reality geometrical optics (AR-GiOs) for physics learning in high schools," Journal of Physics: Conference Series, vol. 2019, no. 1, p. 012004, Oct. 2021, doi: 10.1088/1742-6596/2019/1/012004.
[5] Y.-J. Liao, W. Tarng, and T.-L. Wang, "The effects of an augmented reality lens imaging learning system on students' science achievement, learning motivation, and inquiry skills in physics inquiry activities," Education and Information Technologies, Sep. 2024, doi: 10.1007/s10639-024-12973-9.
[6] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv, 2017, [Online]. Available: http://arxiv.org/abs/1707.06347.
[7] A. Raza, M. A. Shah, H. A. Khattak, C. Maple, F. Al-Turjman, and H. T. Rauf, "Collaborative multi-agents in dynamic industrial internet of things using deep reinforcement learning," Environment, Development and Sustainability, vol. 24, no. 7, pp. 9481–9499, Jul. 2022, doi: 10.1007/s10668-021-01836-9.
[8] T. Haarnoja et al., "Soft actor-critic algorithms and applications," arXiv, 2018, [Online]. Available: http://arxiv.org/abs/1812.05905.
[9] B. Peng, Y. Xie, G. Seco-Granados, H. Wymeersch, and E. A. Jorswieck, "Communication scheduling by deep reinforcement learning for remote traffic state estimation with Bayesian inference," IEEE Transactions on Vehicular Technology, vol. 71, no. 4, pp. 4287–4300, Apr. 2022, doi: 10.1109/TVT.2022.3145105.
[10] M. Kim, J.-S. Kim, M.-S. Choi, and J.-H. Park, "Adaptive discount factor for deep reinforcement learning in continuing tasks with uncertainty," Sensors, vol. 22, no. 19, p. 7266, Sep. 2022, doi: 10.3390/s22197266.
[11] V. K. R. Radha, A. N. Lakshmipathi, R. K. Tirandasu, and P. R. Prakash, "The general design of the automation for multiple fields using reinforcement learning algorithm," Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 25, no. 1, p. 481, Jan. 2022, doi: 10.11591/ijeecs.v25.i1.pp481-487.
[12] H. An and J. Kim, "Design of a hyper-casual futsal mobile game using a machine-learned AI agent-player," Applied Sciences, vol. 13, no. 4, p. 2071, Feb. 2023, doi: 10.3390/app13042071.
[13] T. Goncharenko, N. Yermakova-Cherchenko, and Y. Anedchenko, "Experience in the use of mobile technologies as a physics learning method," CEUR Workshop Proceedings, vol. 2732, pp. 1298–1313, 2020.
[14] Y. B. Bhakti, I. A. D. Astuti, and R. Prasetya, "Four-tier optics diagnostic test (4T-ODT) to identify student misconceptions," in Advances in Social Science, Education and Humanities Research, 2023, pp. 308–314.
[15] B. Mahesh, "Machine learning algorithms - a review," International Journal of Science and Research (IJSR), vol. 9, no. 1, pp. 381–386, Jan. 2020, doi: 10.21275/ART20203995.
[16] G. Carleo et al., "Machine learning and the physical sciences," Reviews of Modern Physics, vol. 91, no. 4, p. 045002, Dec. 2019, doi: 10.1103/RevModPhys.91.045002.
[17] A. T. Huynh, B. T. Nguyen, H. T. Nguyen, S. Vu, and H. D. Nguyen, "A method of deep reinforcement learning for simulation of autonomous vehicle control," in International Conference on Evaluation of Novel Approaches to Software Engineering, ENASE - Proceedings, 2021, vol. 2021-April, pp. 372–379, doi: 10.5220/0010478903720379.
[18] C. Strannegård et al., "The ecosystem path to AGI," in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13154 LNAI, 2022, pp. 269–278.
[19] S. S. Mousavi, M. Schukat, and E. Howley, "Deep reinforcement learning: an overview," Lecture Notes in Networks and Systems, vol. 16, pp. 426–440, 2018, doi: 10.1007/978-3-319-56991-8_32.
[20] Y. Wang, H. He, and X. Tan, "Truly proximal policy optimization," Proceedings of Machine Learning Research, vol. 115, pp. 113–122, 2019.
[21] Y. Savid, R. Mahmoudi, R. Maskeliūnas, and R. Damaševičius, "Simulated autonomous driving using reinforcement learning: a comparative study on Unity's ML-agents framework," Information, vol. 14, no. 5, p. 290, May 2023, doi: 10.3390/info14050290.
[22] H. K. Sagiraju and S. Mogalla, "Application of multilayer perceptron to deep reinforcement learning for stock market trading and analysis," Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 24, no. 3, pp. 1759–1771, 2021, doi: 10.11591/ijeecs.v24.i3.pp1759-1771.
[23] H. Zhou, Z. Lin, J. Li, Q. Fu, W. Yang, and D. Ye, "Revisiting discrete soft actor-critic," arXiv, 2022, [Online]. Available: http://arxiv.org/abs/2209.10081.
[24] I. Vohra, S. Uttrani, A. K. Rao, and V. Dutt, "Evaluating the efficacy of different neural network deep reinforcement algorithms in complex search-and-retrieve virtual simulations," Communications in Computer and Information Science, vol. 1528 CCIS, pp. 348–361, 2022, doi: 10.1007/978-3-030-95502-1_27.
[25] D. S. Alarcon and J. H. Bidinotto, "Deep reinforcement learning for eVTOL hovering control," 33rd Congress of the International Council of the Aeronautical Sciences, ICAS 2022, vol. 7, pp. 5130–5142, 2022.
[26] H. Shengren, E. M. Salazar, P. P. Vergara, and P. Palensky, "Performance comparison of deep RL algorithms for energy systems optimal scheduling," in 2022 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT-Europe), Oct. 2022, vol. 2022-October, pp. 1–6, doi: 10.1109/ISGT-Europe54678.2022.9960642.
[27] J. Possik et al., "A distributed simulation approach to integrate AnyLogic and Unity for virtual reality applications: case of COVID-19 modelling and training in a dialysis unit," in 2021 IEEE/ACM 25th International Symposium on Distributed Simulation and Real Time Applications (DS-RT), Sep. 2021, pp. 1–7, doi: 10.1109/DS-RT52167.2021.9576149.
[28] A. Juliani et al., "Unity: a general platform for intelligent agents," arXiv, 2018, [Online]. Available: http://arxiv.org/abs/1809.02627.
[29] A. Cohen et al., "On the use and misuse of absorbing states in multi-agent reinforcement learning," arXiv, 2021, [Online]. Available: http://arxiv.org/abs/2111.05992.
[30] C. Yu et al., "The surprising effectiveness of PPO in cooperative multi-agent games," Advances in Neural Information Processing Systems, vol. 35, 2022.
[31] A. P. Kalidas, C. J. Joshua, A. Q. Md, S. Basheer, S. Mohan, and S. Sakri, "Deep reinforcement learning for vision-based navigation of UAVs in avoiding stationary and mobile obstacles," Drones, vol. 7, no. 4, p. 245, Apr. 2023, doi: 10.3390/drones7040245.
[32] M. Hildebrand, R. S. Andersen, and S. Bøgh, "Deep reinforcement learning for robot batching optimization and flow control," Procedia Manufacturing, vol. 51, pp. 1462–1468, 2020, doi: 10.1016/j.promfg.2020.10.203.
BIOGRAPHIES OF AUTHORS

Juan Deyby Carlos-Chullo was born in Cusco, Peru. He obtained his Bachelor's degree in Systems Engineering from the National University of San Agustin de Arequipa in 2021. His research interests encompass video games, simulators, artificial intelligence, and usability. Additionally, he has contributed to a project involving augmented reality called ZOODEX, which was affiliated with CiTeSoft (Research Center for Technology Transfer and Software Development R+D+i). He can be contacted at email: jcarlosc@unsa.edu.pe.

Marielena Vilca-Quispe was born in Arequipa, Peru. She is a graduate of Systems Engineering from the National University of San Agustin de Arequipa in 2020. Her research interests encompass video games, augmented reality, artificial intelligence, and usability. Additionally, she has contributed to a project involving augmented reality called ZOODEX, which was affiliated with CiTeSoft (Research Center for Technology Transfer and Software Development R+D+i). She can be contacted at email: mvilcaquispe@unsa.edu.pe.

Whinders Joel Fernandez-Granda has a degree in Physics from the National University of San Agustin de Arequipa, Peru. He has a master's degree in Higher Education and is a teacher in the Academic Department of Physics at UNSA. His research area is the teaching of physics, having published various articles on the subject. He is also the author of books on data processing in experimental physics and the application of physics to various areas of knowledge. He can be contacted at email: wfernandezgr@unsa.edu.pe.

Eveling Castro-Gutierrez holds a Ph.D. in Computer Science and is the Coordinator of CiTeSoft at UNSA. She is a faculty member at the National University of San Agustin de Arequipa and a member of IEEE. Additionally, she serves as the Coordinator of Women in Engineering (WIE). She holds a Master's degree in Software Engineering and has been the principal investigator of projects at CONCYTEC and UnsaInvestiga since 2010. Moreover, she has published research articles in Scopus and Web of Science (WoS) in Computer Vision and Computational Thinking. She has been granted copyright, industrial design rights, utility model patents, and invention patents, including the first international patent (PCT), on behalf of UNSA in 2022. She can be contacted at email: ecastro@unsa.edu.pe.