Indonesian Journal of Electrical Engineering and Computer Science
Vol. 38, No. 1, April 2025, pp. 357-366
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v38.i1.pp357-366

Simulation of ray behavior in biconvex converging lenses using machine learning algorithms

Juan Deyby Carlos-Chullo, Marielena Vilca-Quispe, Whinders Joel Fernandez-Granda, Eveling Castro-Gutierrez
Universidad Nacional de San Agustin de Arequipa, Arequipa, Peru

Article Info
Article history:
Received May 20, 2024
Revised Oct 21, 2024
Accepted Oct 30, 2024

Keywords:
Converging biconvex lenses
Machine learning
Proximal policy optimization
Reinforcement learning
Soft actor-critic

ABSTRACT
This study used machine learning (ML) algorithms to investigate the simulation of light ray behavior in biconvex converging lenses. While earlier studies have focused on lens image formation and ray tracing, they have not applied reinforcement learning (RL) algorithms such as proximal policy optimization (PPO) and soft actor-critic (SAC) to model light refraction through 3D lens models. This study addresses that gap by assessing and contrasting the performance of these two algorithms in an optical simulation context. The findings suggest that the PPO algorithm achieves superior ray convergence, surpassing SAC in stability and accuracy in optical simulation. Consequently, PPO offers a promising avenue for optimizing optical ray simulators. It allows for a representation that closely aligns with the behavior of biconvex converging lenses, which holds significant potential for application in more complex optical scenarios.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Juan Deyby Carlos-Chullo
Universidad Nacional de San Agustin de Arequipa
Arequipa, Peru
Email: jcarlosc@unsa.edu.pe

1. INTRODUCTION
Converging lenses, such as biconvex lenses, are designed to form both real and virtual images [1]. These lenses are essential for improving the precision with which we observe and study objects [2]. Through the refraction of light, converging lenses enable illuminated objects to be projected onto a screen, creating images that can be examined for various scientific purposes [3]. While several applications simulate image formation through these lenses, many do not fully capture the complex behavior of light rays. One such application, AR-GiOs, as analyzed in [4], has shown promising results in the academic field, particularly for learning about the formation of real and virtual images. However, despite its success in educational settings, AR-GiOs still struggles to accurately simulate the behavior of rays passing through optical systems. This gap highlights the limitations of current simulation tools in capturing the subtle details of ray behavior, which are fundamental to the study of physical optics.
Several applications attempt to simulate image formation through lenses, but they often fail to accurately model the light rays involved in the process [4], [5]. These rays, referred to as principal, central, and focal rays, are essential for understanding key optical behaviors when light passes through lenses or mirrors. Accurate simulation of these rays is crucial because they dictate how images are formed and how optical systems function, yet many existing tools lack the necessary fidelity to simulate them effectively.
No studies have been identified that apply reinforcement learning (RL) algorithms to model light refraction through lenses. RL methods such as proximal policy optimization (PPO) [6], [7] and soft actor-critic (SAC) [8], [9] are extensively used in artificial intelligence (AI) and machine learning (ML) for decision-making tasks [10], [11]. These algorithms learn from interactions with their environment, where an agent makes decisions and is given feedback through rewards or penalties [12]. Due to the dynamic nature of light refraction, RL algorithms have the potential to improve the precision of ray simulations.
This article proposes the use of the PPO and SAC algorithms to control the trajectory of rays as they pass through a lens, guiding them to converge at the points where virtual or real images are formed. By applying the thin lens equation and magnification formulas, the deviation and trajectory of the rays are calculated as they interact with the lens [13]. A simulator created in Unity uses these RL algorithms to simulate the passage of the three critical rays (principal, central, and focal) through a converging lens, aiming to achieve accurate ray convergence and image formation.
While it is possible to simulate converging rays through a lens, achieving an accurate simulation of rays passing through a 3D lens model requires highly complex and computationally demanding models. Given the detailed geometry of converging lenses, it is not feasible to approximate their shape using multiple primitive models in Unity. Therefore, moderately complex 3D models and RL algorithms are employed to enhance the accuracy of the simulation.
The remainder of this article is structured as follows: section 2 covers related works, section 3 details the proposed simulation of ray behavior in biconvex converging lenses, and section 4 provides the results and discussion. Lastly, section 5 outlines our conclusions and suggests directions for future work.

2. RELATED WORKS
2.1. Converging biconvex lenses
Biconvex lenses, with their two curved surfaces facing outward, serve as an example of converging lenses. It is crucial to note that, despite their appearance, these lenses are positive (with thickness decreasing from the center towards the edges) and have the ability to focus light rays [14]. Commonly used in optics courses at schools and universities, these lenses are employed to illustrate the principles of refraction and the formation of both real and virtual images [1].

2.2. Machine learning
ML is a crucial branch of AI, enabling computers to process information and learn from it [12]. Through the use of algorithms, ML addresses complex data problems and automates processes, with applications in various fields such as data mining, image analysis, and predictive modeling [15]. Its broad applicability extends into numerous scientific areas, particularly within the physical sciences, where it applies algorithms and modeling techniques for data analysis in disciplines like statistical mechanics, high-energy physics, cosmology, quantum many-body systems, quantum computing, chemistry, and materials research [16].

2.3. Reinforcement learning
RL is an ML method in which an agent engages with its environment and discovers an optimal strategy through trial and error [10], [17].
It is recognized as one of the three primary types of ML, alongside supervised and unsupervised learning. Unlike other approaches, its objective is to acquire different actions based on the conditions in the environment, with the agent serving as the principal decision-maker [17]. RL has demonstrated significant potential for advancing AI [18]. In this framework, the agent receives feedback from the environment but lacks access to labeled data or explicit guidance. It is employed in sequential decision-making tasks across various domains, including the natural and social sciences, engineering, and AI [19].

2.3.1. Proximal policy optimization
PPO is an RL technique that has demonstrated cutting-edge performance across a range of challenging tasks [20]. PPO has been utilized in multiple areas, including robotics, gaming, and autonomous systems, to improve agent performance in complex environments. For example, in [21], PPO was employed to automate simulated autonomous driving, leading to enhanced outcomes. Similarly, in [22], PPO was effectively used to predict stock market trends, highlighting its versatility and efficiency in financial applications.
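For background, the stability that makes PPO attractive in these applications comes from its clipped surrogate objective. The formulation below is the standard one from [6], reproduced here only as a reference point; it is not an equation from the original study, and the clipping coefficient ε appears to correspond to what Table 2 later calls the policy deviation penalty coefficient.

$$L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\; \operatorname{clip}\!\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\right)\right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}$$

where $\hat{A}_t$ is the advantage estimate and $r_t(\theta)$ the probability ratio between the new and the previous policy; keeping $r_t$ clipped is what prevents drastic policy shifts between updates.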
2.3.2. Soft actor-critic
SAC operates within the maximum entropy RL framework, aiming to maximize both expected performance and entropy simultaneously, thereby enabling actors to act with maximum randomness while still achieving task success [8]. Its efficacy has been extensively evaluated in various experiments, including tests on Atari games and a large-scale MOBA game, as demonstrated in [23]. In comparative studies, PPO has consistently emerged as a top performer, as evidenced by comparisons with SAC across different test conditions [24]. Specifically, PPO showcased superior performance, especially in scenarios involving a high number of units and layers.
In research comparing RL algorithms, PPO consistently demonstrates remarkable performance, surpassing SAC in various conditions, particularly when dealing with complex architectures [25]. Furthermore, in comparative studies of deep RL algorithms, PPO consistently outperforms alternatives like DDPG, SAC, and TD3, as demonstrated in [26]. To implement RL in simulation environments, practitioners often leverage tools such as Unity and ML-Agents, as highlighted in previous research [27], [28].

3. METHOD
3.1. Proposed simulation of ray behavior in converging biconvex lenses
This work aims to develop a simulation of ray behavior in converging biconvex lenses using 3D models, applying RL techniques such as PPO and SAC to modify the refraction angles of light rays passing through a lens. The goal is to compare the feasibility and stability of these algorithms in achieving more accurate simulation results. The proposal includes the steps outlined in Table 1.

Table 1. Steps for simulating light through a biconvex lens
  Step  Description
  1     Define the objective
  2     Model the converging biconvex lens using Blender
  3     Develop a simulation environment using Unity
  4     Identify constraints and critical properties
  5     Explanation of the PPO and SAC algorithms
  6     RL environment configuration

3.2. Simulation of converging biconvex lens behavior
To validate the proposal, a simulator based on RL algorithms, specifically PPO and SAC, was developed. These algorithms were employed to accurately model and simulate the behavior of light rays in converging biconvex lenses under simulated conditions.

3.2.1. Define the objective
The objective is to compare the feasibility and stability of the PPO and SAC algorithms in the context of simulating converging biconvex lenses. The purpose is to determine which of these algorithms is more effective in achieving accurate and reliable simulation results that precisely describe the behavior of light rays interacting with converging biconvex lenses.
PPO and SAC were selected because of their extensive use in the literature, being recognized for effectively managing single-agent settings as well as multi-agent cooperative and competitive scenarios. Other RL algorithms, such as MA-POCA [29], also support these types of environments. However, this study focuses on PPO and SAC due to their popularity and demonstrated success in complex physical simulations.

3.2.2. Modeling the converging biconvex lens using Blender
In the simulation, two distinct models of converging biconvex lenses were implemented to ensure high precision in ray tracing.
Each model, depicted in Figure 1, was constructed using Blender 3D version 3.6.1 and has a file size of 6.5 MB. These models consist of 141,604 vertices and 269,316 triangles, with one model having its surface normals oriented inward and the other outward. This difference in normal orientation was essential to enable precise collision detection by the rays emitted using Unity's Raycast function, both when entering and exiting the lens. The models were generated by intersecting two spheres, which were created in Blender with the highest possible level of detail, constrained by the computational capabilities and the limits of the
Blender software. Each sphere has a radius of 10 meters, and their centers are separated by a distance of 19.9 units, resulting in an intersection of 0.1 meters. This intersection was chosen to produce a lens thin enough to avoid the optical aberration that occurs in thicker lenses.

Figure 1. Characteristics of the spheres, with 256 rings and 2048 segments each and dimensions of 10 x 10 x 10 meters, intersecting over 0.10 meters and resulting in a converging biconvex lens of approximately 67 rings, 2048 segments, and dimensions of 1.977 m x 1.977 m x 0.1 m

3.2.3. Develop a simulation environment using Unity
For optical ray simulation with a converging lens, a simulation environment was developed using Unity version 2021.3.11f1 and the ML-Agents library, a popular tool for RL environments [7]. This environment includes elements such as a converging biconvex lens, focal points, and a ray launch point that determines the initial direction and trajectory of the rays passing through the lens. The development aimed to apply the PPO and SAC algorithms to simulate and optimize the behavior of light rays. Figure 2 presents a screenshot of the simulation environment within Unity. It illustrates the converging lens, the focal points, and the trajectories of three types of rays projected from a designated origin point. These rays include the principal, central, and focal rays, moving from left to right from the viewpoint, showcasing the simulated optical phenomena.
The simulator designed for this study integrates physical optics principles into Unity's framework, providing a platform for comparing the performance of the PPO and SAC algorithms in simulating light ray trajectories. Central to the simulator is the algorithm responsible for tracing the path of light rays as they interact with the lens surfaces. By recording collision points, the behavior of the rays can be analyzed, informing the adjustments required for accurate simulation. The thin lens formula and the lens magnification are used to calculate the optimal points for ray passage or approach, enhancing the realism of the simulation.

Figure 2. Simulation environment developed in Unity. The environment contains a converging biconvex lens, with focal points located on both sides of the lens. Three rays (principal, central, and focal) are projected from the origin point, from left to right from the viewpoint

3.2.4. Identifying constraints and critical properties
A key constraint is limiting the rays to three specific types: principal, central, and focal, due to their relevance in the field of optics. Additionally, the launch point of the rays was restricted to a distance of 10 meters from the lens, and along the Y and Z axes to a range that ensures collision with the lens, preventing the rays from escaping into empty space or striking the lens edges. Given that the lens has a radius of 1 meter,
a maximum radius of 0.975 meters was chosen to avoid reaching the edge. Furthermore, the refractive index of the lens was set to 2, despite glass typically having a value of 1.45, in order to achieve tighter convergence of the rays. Finally, the number of interactions during the training of the PPO and SAC algorithms was limited to 500k due to the time required for training; with 48 simulated ray instances, training took approximately 20 to 30 minutes for each ray type.

3.2.5. Explanation of the PPO and SAC algorithms
PPO is a widely used on-policy RL algorithm that combines value and policy gradients to optimize agent performance [21], [30]. Its key objective is to ensure that, after updating the policy, it remains relatively close to the previous one. To avoid drastic shifts, PPO incorporates a clipping mechanism. The algorithm samples data from its environment and uses stochastic gradient descent to optimize a clipped loss function [31]. In contrast, SAC is an off-policy RL algorithm that follows an actor-critic approach and does not rely on predefined models or rules [30]. SAC employs a revised RL objective function and emphasizes maximizing rewards over the agent's lifespan along with policy entropy [31].
Therefore, in this study, visual analyses were conducted to assess whether the AI agent responsible for adjusting the angle in the optical ray simulator was trained using ML, ensuring its stability and applicability. To achieve this, the Unity ML-Agents toolkit was used, and a comparison between the PPO and SAC algorithms was performed.

3.2.6. RL environment configuration
In this study, the ray's origin point is randomly selected at a distance of 2F along the X-axis, allowing its location at any point within the area of a circle defined by the Y and Z axes, as shown in Figure 3. In this context, the variable observed by the agent is the radius, representing the distance from the center of the circle to the origin point of the ray. The agent makes decisions based on this variable while interacting with the environment and the converging biconvex lens.

Figure 3. The starting points of the rays originate at a distance of 2F from the right side, where F is at a distance of 2.5 meters and 2F at 5 meters

To ensure proper behavior and prevent unnecessary repeated collisions, a small displacement of 10⁻⁸ meters in the direction of the ray was applied each time a collision occurred with the lens model. This displacement was necessary because the ray could collide with both the internal and external colliders generated from the normals of the lens model; it ensures that the ray continues its trajectory without additional interference in subsequent collisions (Figure 4). Throughout the training process, the agent received observations from the environment, aiding in the decision-making process. Rewards were used to motivate the agent to adjust the angle of the rays after colliding with the lens.

Figure 4. Guidelines for the case of the principal ray at the top of the lens: the green lines represent the ray's trajectory, the red lines correspond to the normals at the points where the ray collided, and the yellow line is used to project the resulting ray
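To make the environment logic described in sections 3.2.4 and 3.2.6 concrete, the following is a minimal Python sketch of the two geometric rules stated above: sampling a ray origin inside the safe 0.975 m radius on the launch plane, and nudging the ray 10⁻⁸ m along its direction after each collision so the next raycast does not hit the same collider again. The actual simulator is implemented in Unity with ML-Agents, so the function names, the launch-plane coordinate, and the loop structure here are illustrative assumptions rather than the authors' code.

```python
import math
import random

# Values quoted in the text; everything else in this sketch is an assumption.
MAX_RADIUS = 0.975   # m, keeps launch points away from the lens edge (section 3.2.4)
NUDGE = 1e-8         # m, post-collision displacement along the ray (section 3.2.6)
LAUNCH_X = -5.0      # m, assumed launch plane at 2F on the object side (Figure 3)

def sample_ray_origin():
    """Pick a launch point uniformly over the disc of radius MAX_RADIUS in the Y-Z plane."""
    r = MAX_RADIUS * math.sqrt(random.random())   # sqrt gives uniform density over the disc area
    theta = random.uniform(0.0, 2.0 * math.pi)
    return (LAUNCH_X, r * math.cos(theta), r * math.sin(theta))

def nudge_past_collision(hit_point, direction):
    """Offset the next raycast origin slightly beyond the hit point so the ray does not
    re-collide with the same inward- or outward-facing collider of the lens model."""
    return tuple(p + NUDGE * d for p, d in zip(hit_point, direction))

# Example: the agent observes the launch radius before adjusting refraction angles.
origin = sample_ray_origin()
observed_radius = math.hypot(origin[1], origin[2])   # distance from the X-axis, as in section 3.2.6
print(origin, observed_radius)
```

In the Unity implementation described above, the same small offset would be applied to the origin passed to each successive Raycast call between hits.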
RL agent: the RL agent's task is to adjust the ray's direction each time it passes through the lens, aiming to minimize the distance between the resulting ray's path and the target point. This target point is calculated using the thin lens equation in (1) and the magnification in (2). The formula applied to converging thin lenses is known as the thin lens equation, which relates the image distance (d_i), object distance (d_o), and focal length (f) of the lens, as shown in (1). Additionally, lateral magnification is used, linking the image height (h_i), object height (h_o), image distance (d_i), and object distance (d_o), as presented in (2).
The availability of these actions for the agent depends on the state of the environment at that moment. In this particular case, the decision was made to apply the RL algorithms PPO and SAC taking only one action in each training cycle. Unlike many approaches described in the literature, where multiple actions are taken per training cycle, in this environment the agent takes only one of two possible actions in each cycle:
- The refraction angle will only be modified when the emitted light reaches the four key points (the point of origin, the external collision point where the light enters the lens, the internal collision point where it exits the lens, and the endpoint).
- Rays that have three points or fewer will be discarded and not considered in the simulation, because some rays do not collide with the lens, due to defects in the 3D model.

$$\frac{1}{f} = \frac{1}{d_o} + \frac{1}{d_i} \qquad (1)$$

$$m = \frac{h_i}{h_o} = \frac{d_i}{d_o} \qquad (2)$$

Hyperparameters: as with other RL algorithms, both PPO and SAC have multiple hyperparameters that influence the agent's performance in the converging lens environment. In this case, the objective is to adjust the angle of refraction so as to produce an outgoing ray trajectory that falls within a precision threshold of 0.01 meters of the point previously calculated using the lens and magnification formulas. The reward is determined by the distance between the resulting ray and the target point. Table 2 lists the hyperparameters of PPO and SAC used in this study, with both configurations based on examples from Unity's ML-Agents toolkit.

Table 2. Hyperparameters for the PPO and SAC algorithms in the experiment. Parameters include the policy deviation penalty, learning rate, batch size, iterations, samples, and entropy settings. These values affect the agents' performance in simulating light ray trajectories through a converging lens.
  Hyperparameter                          PPO value   SAC value
  Policy deviation penalty coefficient    0.2         -
  Learning rate                           0.0001      0.0003
  Batch size                              64          -
  Number of iterations                    10          -
  Number of collected samples             1000        -
  Replay buffer size                      -           1000
  Target entropy                          -           0.2
  Entropy regularization factor           -           0.01
  Minimum entropy                         -           0.5

Reward function: Figure 5 provides a visual representation of how the reward function operates within the agent-environment interaction. The reward mechanism plays a crucial role in the agent's learning process [32]. To achieve the intended behavior, a specific goal needs to be defined for optimization. The agent's task is to adjust the angle of refraction so that the resulting ray passes within 0.01 meters of the target point. As the ray passes through the 3D model, the reward is determined as follows:
- If the distance to the target is less than or equal to 0.01 m:
  - Reward = 1.0
- If the distance to the target is greater than 0.01 m:
  - The change in distance to the target is calculated, indicating whether the ray is approaching or moving away from the target. This is obtained by subtracting the current distance to the target from the distance in the previous step.
  - If the change in distance is positive, the agent is rewarded for approaching the target.
  - If the change in distance is negative, the agent is penalized for moving away from the target.

Figure 5. Visual depiction of the reward function governing the agent-environment interaction. Rewards are determined by the ray's proximity to the target, encouraging convergence and discouraging divergence. The figure clarifies the reward dynamics in collision scenarios and distance relationships

Experiment environment: the setup includes an AMD Ryzen 7 3800XT processor, an NVIDIA GeForce RTX 3070 GPU, and 32 GB of RAM. Additionally, the following software versions were used: Unity Engine 2021.3.11f1, TensorFlow 2.13.0, and Unity ML-Agents Toolkit Release 20. These specifications were chosen to ensure a robust and efficient system capable of handling complex simulations and the computationally demanding tasks required for training RL algorithms in optical ray simulations.

4. RESULTS AND DISCUSSION
We found that the PPO algorithm achieved superior ray convergence, with higher stability and accuracy than SAC. PPO reached a reward above 0.99 in fewer steps for the principal, central, and focal rays, while SAC showed improvement only after 500k steps, making PPO better suited for optimizing optical ray simulators.

4.1. Evaluate the behavior of the PPO and SAC algorithms
The learning results were visualized using TensorBoard, a tool from TensorFlow, with data generated for the principal, central, and focal rays. As shown in Figures 6 and 7, the PPO algorithm achieved rewards of 0.9932, 0.9943, and 0.9938 for the principal, central, and focal rays, respectively, within 200k steps, successfully meeting the target reward of 0.99 with a precision threshold of 0.01 m. In contrast, the SAC algorithm obtained rewards of 0.8951, 0.8829, and 0.8715 for the same rays over the same steps, falling short of the target. While SAC showed improvement between 200k and 500k steps, it still lagged behind PPO in accuracy and stability. PPO's results closely aligned with the predictions from the thin lens formula, demonstrating its superior performance, consistent with previous studies highlighting PPO's effectiveness in achieving reliable outcomes in similar simulation tasks.

Figure 6. Comparison of the PPO and SAC algorithms for different rays: (a) principal ray, (b) central ray, and (c) focal ray. The graphs plot the environment/cumulative reward (roughly 0.7 to 1.0) against training steps (0 to 500k), with PPO in blue and SAC in pink. The PPO algorithm shows stable performance throughout, while SAC experiences early signs of overfitting but eventually stabilizes
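The curves in Figures 6 and 7 are the environment/cumulative reward scalars that ML-Agents writes to TensorBoard. As a hedged illustration of how such curves can be extracted for a side-by-side PPO versus SAC comparison (the run directories below are placeholders, not the authors' paths), the event files can be read directly with TensorBoard's Python API:

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# ML-Agents logs the per-episode return under this scalar tag.
TAG = "Environment/Cumulative Reward"

# Placeholder run directories; substitute the actual results folders for each algorithm/ray.
RUNS = {
    "PPO principal ray": "results/ppo_principal",
    "SAC principal ray": "results/sac_principal",
}

for name, logdir in RUNS.items():
    acc = EventAccumulator(logdir)
    acc.Reload()                  # parse the TensorBoard event files in the run directory
    events = acc.Scalars(TAG)     # list of records with .step and .value fields
    last = events[-1]
    print(f"{name}: step {last.step}, cumulative reward {last.value:.4f}")
```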
Figure 7. Results for all rays (principal, central, and focal) for the PPO and SAC algorithms at the 200k step

4.2. Principal findings
Optical physics simulators are a critical area of study, as they require agents capable of executing behaviors in real time under various circumstances. In this project, a simulator for converging rays and lenses was designed to evaluate the performance of algorithms such as PPO and SAC. The results contribute to improving realistic representations of optical phenomena, showing that PPO effectively emulates the behavior of the target optical system and accurately reproduces the predicted outcomes.

4.3. Comparison to prior work
In comparison with previous applications of PPO and SAC in other domains, such as game simulations and general physics environments [12], [30], this project stands out by developing a simulator focused on the behavior of rays in converging biconvex lenses. Unlike other applications that use rays as guides but do not simulate them correctly [4], this work allows the optical phenomenon to be visualized using only rays trained with RL. The results demonstrate how a ray is refracted when entering and exiting the lens. The choice of PPO, based on its stability and adaptability, has proven effective for this complex task, distinguishing this research from previous works focused on broader or less specific areas.

4.4. Strengths and limitations
This study explored the application of RL algorithms to simulate ray behavior in a single type of converging biconvex lens with three specific ray types (principal, central, and focal). Further and more comprehensive studies are needed to explore the behavior of additional ray types and more complex optical systems, such as multi-lens setups. Using a dense 3D polygonal mesh also introduced computational challenges, such as memory limitations and occasional missed collisions, which may have affected the accuracy of the simulations.

5. CONCLUSION
This research successfully applied RL algorithms to simulate the behavior of light rays passing through biconvex converging lenses, demonstrating the viability of RL in modeling optical phenomena. The results, particularly PPO achieving a reward exceeding 0.99 with high accuracy, indicate its superior stability and efficiency in this context. In contrast, SAC, while known for its general applicability across various domains, underperformed in this specific scenario. This finding underscores the need to tailor RL algorithms to problem-specific dynamics, as SAC's versatility in other studies did not carry over to this setting. Our results further suggest that RL algorithms can significantly improve the accuracy of optical ray simulations: in particular, PPO enhances the precision of ray convergence through biconvex lenses, making it a promising tool for future optical system modeling and, in our study, the more reliable choice for simulating ray behavior in optical systems. Future studies may explore the application of these algorithms to multi-lens systems, where ray tracing becomes more intricate.

ACKNOWLEDGMENTS
Thanks to the "Research Center, Transfer of Technologies and Software Development R+D+i" (CiTeSoft EC-0003-2017-UNSA) for their collaboration and for the use of their equipment and facilities in the development of this research work.
REFERENCES
[1] H. Isik, "Comparing the images formed by uses of lens surfaces," Physics Education, vol. 58, no. 3, p. 035002, May 2023, doi: 10.1088/1361-6552/acb87d.
[2] K. Fliegauf, J. Sebald, J. M. Veith, H. Spiecker, and P. Bitzenbauer, "Improving early optics instruction using a phenomenological approach: a field study," Optics, vol. 3, no. 4, pp. 409–429, Nov. 2022, doi: 10.3390/opt3040035.
[3] S. Wörner, S. Becker, S. Küchemann, K. Scheiter, and J. Kuhn, "Development and validation of the ray optics in converging lenses concept inventory," Physical Review Physics Education Research, vol. 18, no. 2, p. 020131, Nov. 2022, doi: 10.1103/PhysRevPhysEducRes.18.020131.
[4] H. P. Kencana, B. H. Iswanto, and F. C. Wibowo, "Augmented reality geometrical optics (AR-GiOs) for physics learning in high schools," Journal of Physics: Conference Series, vol. 2019, no. 1, p. 012004, Oct. 2021, doi: 10.1088/1742-6596/2019/1/012004.
[5] Y.-J. Liao, W. Tarng, and T.-L. Wang, "The effects of an augmented reality lens imaging learning system on students' science achievement, learning motivation, and inquiry skills in physics inquiry activities," Education and Information Technologies, Sep. 2024, doi: 10.1007/s10639-024-12973-9.
[6] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv, 2017. [Online]. Available: http://arxiv.org/abs/1707.06347.
[7] A. Raza, M. A. Shah, H. A. Khattak, C. Maple, F. Al-Turjman, and H. T. Rauf, "Collaborative multi-agents in dynamic industrial internet of things using deep reinforcement learning," Environment, Development and Sustainability, vol. 24, no. 7, pp. 9481–9499, Jul. 2022, doi: 10.1007/s10668-021-01836-9.
[8] T. Haarnoja et al., "Soft actor-critic algorithms and applications," arXiv, 2018. [Online]. Available: http://arxiv.org/abs/1812.05905.
[9] B. Peng, Y. Xie, G. Seco-Granados, H. Wymeersch, and E. A. Jorswieck, "Communication scheduling by deep reinforcement learning for remote traffic state estimation with Bayesian inference," IEEE Transactions on Vehicular Technology, vol. 71, no. 4, pp. 4287–4300, Apr. 2022, doi: 10.1109/TVT.2022.3145105.
[10] M. Kim, J.-S. Kim, M.-S. Choi, and J.-H. Park, "Adaptive discount factor for deep reinforcement learning in continuing tasks with uncertainty," Sensors, vol. 22, no. 19, p. 7266, Sep. 2022, doi: 10.3390/s22197266.
[11] V. K. R. Radha, A. N. Lakshmipathi, R. K. Tirandasu, and P. R. Prakash, "The general design of the automation for multiple fields using reinforcement learning algorithm," Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 25, no. 1, p. 481, Jan. 2022, doi: 10.11591/ijeecs.v25.i1.pp481-487.
[12] H. An and J. Kim, "Design of a hyper-casual futsal mobile game using a machine-learned AI agent-player," Applied Sciences, vol. 13, no. 4, p. 2071, Feb. 2023, doi: 10.3390/app13042071.
[13] T. Goncharenko, N. Yermakova-Cherchenko, and Y. Anedchenko, "Experience in the use of mobile technologies as a physics learning method," CEUR Workshop Proceedings, vol. 2732, pp. 1298–1313, 2020.
[14] Y. B. Bhakti, I. A. D. Astuti, and R. Prasetya, "Four-tier optics diagnostic test (4T-ODT) to identify student misconceptions," in Advances in Social Science, Education and Humanities Research, 2023, pp. 308–314.
[15] B. Mahesh, "Machine learning algorithms - a review," International Journal of Science and Research (IJSR), vol. 9, no. 1, pp. 381–386, Jan. 2020, doi: 10.21275/ART20203995.
[16] G. Carleo et al., "Machine learning and the physical sciences," Reviews of Modern Physics, vol. 91, no. 4, p. 045002, Dec. 2019, doi: 10.1103/RevModPhys.91.045002.
[17] A. T. Huynh, B. T. Nguyen, H. T. Nguyen, S. Vu, and H. D. Nguyen, "A method of deep reinforcement learning for simulation of autonomous vehicle control," in International Conference on Evaluation of Novel Approaches to Software Engineering, ENASE - Proceedings, 2021, vol. 2021-April, pp. 372–379, doi: 10.5220/0010478903720379.
[18] C. Strannegård et al., "The ecosystem path to AGI," in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13154 LNAI, 2022, pp. 269–278.
[19] S. S. Mousavi, M. Schukat, and E. Howley, "Deep reinforcement learning: an overview," Lecture Notes in Networks and Systems, vol. 16, pp. 426–440, 2018, doi: 10.1007/978-3-319-56991-8_32.
[20] Y. Wang, H. He, and X. Tan, "Truly proximal policy optimization," Proceedings of Machine Learning Research, vol. 115, pp. 113–122, 2019.
[21] Y. Savid, R. Mahmoudi, R. Maskeliūnas, and R. Damaševičius, "Simulated autonomous driving using reinforcement learning: a comparative study on Unity's ML-agents framework," Information, vol. 14, no. 5, p. 290, May 2023, doi: 10.3390/info14050290.
[22] H. K. Sagiraju and S. Mogalla, "Application of multilayer perceptron to deep reinforcement learning for stock market trading and analysis," Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 24, no. 3, pp. 1759–1771, 2021, doi: 10.11591/ijeecs.v24.i3.pp1759-1771.
[23] H. Zhou, Z. Lin, J. Li, Q. Fu, W. Yang, and D. Ye, "Revisiting discrete soft actor-critic," arXiv, 2022. [Online]. Available: http://arxiv.org/abs/2209.10081.
[24] I. Vohra, S. Uttrani, A. K. Rao, and V. Dutt, "Evaluating the efficacy of different neural network deep reinforcement algorithms in complex search-and-retrieve virtual simulations," Communications in Computer and Information Science, vol. 1528 CCIS, pp. 348–361, 2022, doi: 10.1007/978-3-030-95502-1_27.
[25] D. S. Alarcon and J. H. Bidinotto, "Deep reinforcement learning for eVTOL hovering control," 33rd Congress of the International Council of the Aeronautical Sciences, ICAS 2022, vol. 7, pp. 5130–5142, 2022.
[26] H. Shengren, E. M. Salazar, P. P. Vergara, and P. Palensky, "Performance comparison of deep RL algorithms for energy systems optimal scheduling," in 2022 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT-Europe), Oct. 2022, pp. 1–6, doi: 10.1109/ISGT-Europe54678.2022.9960642.
[27] J. Possik et al., "A distributed simulation approach to integrate AnyLogic and Unity for virtual reality applications: case of COVID-19 modelling and training in a dialysis unit," in 2021 IEEE/ACM 25th International Symposium on Distributed Simulation and Real Time Applications (DS-RT), Sep. 2021, pp. 1–7, doi: 10.1109/DS-RT52167.2021.9576149.
[28] A. Juliani et al., "Unity: a general platform for intelligent agents," arXiv, 2018. [Online]. Available: http://arxiv.org/abs/1809.02627.
[29] A. Cohen et al., "On the use and misuse of absorbing states in multi-agent reinforcement learning," arXiv, 2021. [Online]. Available: http://arxiv.org/abs/2111.05992.
[30] C. Yu et al., "The surprising effectiveness of PPO in cooperative multi-agent games," Advances in Neural Information Processing Systems, vol. 35, 2022.
[31] A. P. Kalidas, C. J. Joshua, A. Q. Md, S. Basheer, S. Mohan, and S. Sakri, "Deep reinforcement learning for vision-based navigation of UAVs in avoiding stationary and mobile obstacles," Drones, vol. 7, no. 4, p. 245, Apr. 2023, doi: 10.3390/drones7040245.
[32] M. Hildebrand, R. S. Andersen, and S. Bøgh, "Deep reinforcement learning for robot batching optimization and flow control," Procedia Manufacturing, vol. 51, pp. 1462–1468, 2020, doi: 10.1016/j.promfg.2020.10.203.

BIOGRAPHIES OF AUTHORS

Juan Deyby Carlos-Chullo was born in Cusco, Peru. He obtained his Bachelor's degree in Systems Engineering from the National University of San Agustin de Arequipa in 2021. His research interests encompass video games, simulators, artificial intelligence, and usability. Additionally, he has contributed to a project involving augmented reality called ZOODEX, which was affiliated with CiTeSoft (Research Center for Technology Transfer and Software Development R+D+i). He can be contacted at email: jcarlosc@unsa.edu.pe.

Marielena Vilca-Quispe was born in Arequipa, Peru. She graduated in Systems Engineering from the National University of San Agustin de Arequipa in 2020. Her research interests encompass video games, augmented reality, artificial intelligence, and usability. Additionally, she has contributed to a project involving augmented reality called ZOODEX, which was affiliated with CiTeSoft (Research Center for Technology Transfer and Software Development R+D+i). She can be contacted at email: mvilcaquispe@unsa.edu.pe.

Whinders Joel Fernandez-Granda has a degree in Physics from the National University of San Agustin de Arequipa, Peru. He holds a master's degree in Higher Education and is a teacher in the Academic Department of Physics at UNSA. His research area is the teaching of physics, and he has published various articles on the subject. He is also the author of books on data processing in experimental physics and the application of physics to various areas of knowledge. He can be contacted at email: wfernandezgr@unsa.edu.pe.

Eveling Castro-Gutierrez holds a Ph.D. in Computer Science and is the Coordinator of CiTeSoft at UNSA. She is a faculty member at the National University of San Agustin de Arequipa and a member of IEEE. Additionally, she serves as the Coordinator of Women in Engineering (WIE). She holds a Master's degree in Software Engineering and has been the principal investigator of projects at CONCYTEC and UnsaInvestiga since 2010. Moreover, she has published research articles in Scopus and Web of Science (WoS) on Computer Vision and Computational Thinking. She has been granted copyright, industrial design rights, utility model patents, and invention patents, including the first international patent (PCT) on behalf of UNSA, in 2022. She can be contacted at email: ecastro@unsa.edu.pe.