IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 14, No. 5, October 2025, pp. 3528-3541
ISSN: 2252-8938, DOI: 10.11591/ijai.v14.i5.pp3528-3541

Efficiency search: application of nature-inspired algorithms in artificial intelligence forecasting models

José Rolando Neira Villar, Miguel Ángel Cano Lengua
Facultad de Ingeniería de Sistemas e Informática, Universidad Tecnológica del Perú, Lima, Perú

Article Info
Article history: Received Jan 13, 2025; Revised Jul 26, 2025; Accepted Aug 6, 2025
Keywords: Artificial intelligence; Demand forecasting; Multi-space optimization; Nature-inspired optimization algorithm; Quantization

ABSTRACT
This study reviews how nature-inspired optimization algorithms (NIOAs) have been applied to artificial intelligence-based demand forecasting, using preferred reporting items for systematic reviews and meta-analyses (PRISMA) and clustering analysis to examine 36 selected articles. The findings reveal that NIOAs, particularly genetic algorithms and swarm intelligence methods, including their hybrids, have been frequently applied to long short-term memory (LSTM) and other backpropagation neural network (BPNN) models. A key insight is the differentiated application of NIOAs depending on network depth: in shallow networks, they have been effectively used to optimize trainable parameters, whereas in deep networks, their role has focused primarily on hyperparameter optimization due to the prohibitive dimensionality of trainable weights. In all studies, NIOA-optimized models consistently outperform conventional baselines based on backpropagation. However, persistent challenges such as excessive execution times and slow convergence have led to the development of more efficient hybrid strategies and adaptive mechanisms for automated exploration-exploitation control.
By mapping explored and unexplored pathways, summarizing key outcomes and techniques, and identifying promising methodologies, this review offers a practical foundation to guide future experiments and implementations involving NIOA-based optimization strategies in neural network models. As a conceptual contribution, it also proposes an innovative use of multispace optimization to address one of the most critical challenges identified: the optimization of trainable parameters in deep neural networks.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Miguel Ángel Cano Lengua
Facultad de Ingeniería de Sistemas e Informática, Universidad Tecnológica del Perú
Jr. Natalio Sanchez 125, Lima, Perú
Email: mcanol@unmsm.edu.pe

Journal homepage: http://ijai.iaescore.com

1. INTRODUCTION
Accurate demand forecasting is crucial for managing business operations and supply chains, enabling effective resource planning while avoiding costly issues such as stockouts and the bullwhip effect. It also supports financial, human resource, and marketing planning, thereby significantly enhancing competitiveness [1]-[4]. In light of this importance, recent years have witnessed the emergence of advanced machine learning approaches, particularly deep learning-based models, which have consistently demonstrated superior predictive performance [4], [5]. However, a major limitation of these models lies in the highly complex optimization problems they generate: specifically, the need to optimize both the network architecture and hyperparameters [6]-[8], as well as trainable parameters such as synaptic weights and biases. These problems typically involve
vast search spaces that are often explored manually through trial and error, as exhaustive search methods are computationally prohibitive [9], [10].

These optimization challenges manifest at both the parametric and hyperparameter levels. Parametric optimization is traditionally performed using backpropagation gradient descent, which faces notable challenges such as the vanishing gradient (a loss of effectiveness as the depth of the network increases) [11] and the difficulty of navigating multiple local optima, often failing to reach the global optimum [7]. In contrast, hyperparameter optimization cannot rely on gradient-based methods such as backpropagation, as the objective functions are unknown. These black-box optimization problems are typically noisy, lack analytical expressions, and are computationally expensive to solve [6], [8]. Thus, it is evident that although sophisticated machine learning models significantly improve forecast accuracy, they require new optimization techniques capable of overcoming their optimization drawbacks [7].

In this context, nature-inspired optimization algorithms (NIOAs) have gained significant popularity. These algorithms mimic natural processes to efficiently solve complex problems, providing good approximate solutions within reasonable time limits. Their key advantage is that they require no detailed knowledge of the problem, making them ideal for black-box optimization. Furthermore, they perform well in non-convex, noisy, and stochastic search spaces, further driving their widespread adoption [12], [13]. Notable successes include their scalable application in the search for high-performance neural architectures and hyperparameter configurations [13], [14].
However, in parametric optimization, NIOAs have yet to match the computational efficiency of gradient-based algorithms, presenting a promising avenue for future research [7].

In general, the range of NIOA applications is expanding and diverse, although some domains remain underexplored. Some emerging areas within NIOAs include neuroevolution, multi-objective optimization, multitask optimization, and multispace optimization. Neuroevolution applies NIOAs to evolve deep neural network architectures, enabling the identification of efficient configurations tailored to specific tasks. This approach often achieves better results compared to manually tuned models, including those adjusted by experts [6]. Multi-objective evolutionary optimization, on the other hand, focuses on simultaneously optimizing typically conflicting goals, such as maximizing model accuracy while minimizing computational cost, which is particularly valuable in hyperparameter tuning [6], [8]. Multitask evolutionary optimization deserves special attention, as it aims to create synergies between different optimization tasks by transferring knowledge across search spaces, avoiding unproductive regions, and sharing promising solutions. This method has shown strong potential for significantly improving the efficiency of NIOAs [14]. Expanding on this idea, recently proposed multispace optimization algorithms introduce simplified auxiliary search spaces to support the optimization of large, complex domains, with the knowledge gained being transferred back to the original space [15], [16].

Amid the promising convergence between machine learning and NIOAs, this study explores the application of these advanced techniques in the design of machine learning models for demand forecasting.
It analyzes the outcomes achieved, uncovers recent NIOA approaches that remain untapped in this context, and highlights key research gaps, inviting further exploration of their potential in addressing complex neural network optimization problems and advancing some of the most promising lines of investigation in the field.

2. METHOD
This study applies the preferred reporting items for systematic reviews and meta-analyses (PRISMA) methodology to systematically review the state of the art on the use of NIOAs in artificial intelligence-based demand forecasting models, with the aim of identifying research gaps. Within the PRISMA framework, NIOAs are treated as interventions, while their impact on forecasting model performance is considered as the outcome. PRISMA ensures the trustworthiness of the review by providing a transparent process for article selection and synthesis of findings [17]. To enhance the objectivity of the latter, an automatic agglomerative hierarchical clustering technique was employed to classify the reviewed studies.

2.1. Research questions
As recommended by the PRISMA methodology [18], the research questions were explicitly and concisely posed to help evaluate the coherence of the study in all its parts. To do so, after the main question was posed, the population, intervention, comparator, outcome and context (PICOC) framework was used to make the secondary questions explicit. Table 1 shows the results of this process.
Table 1. Research questions
Code | Question
Main | How have NIOAs been used in recent years in the development of AI-based demand forecasting models?
P | What are the characteristics of the AI models in which NIOAs have been involved?
I | What type of NIOAs have been used to intervene in AI-based forecasting models?
C | What metrics and models have been used to measure and compare the performance of models built with NIOAs?
O | What is the performance of the models built with NIOAs in relation to the established models?
C | In which economic sectors have they been applied and what main problems have been attempted to be solved with the models built with NIOAs?

2.2. Eligibility criteria
To define the scope of the article, the eligibility criteria [18] outlined in Tables 2 and 3 were established. These criteria were also used to verify the inclusion decisions of the review. The focus was on selecting recent, reliable empirical studies that propose demand forecasting models using AI and NIOAs.

Table 2. Inclusion criteria
Code | Description
I1 | Studies that use nature-inspired algorithms as part of the proposed AI-based demand forecasting models
I2 | Studies containing a detailed and comprehensive methodology related to the nature-inspired algorithms used
I3 | Empirical studies with models validated with real data from companies
I4 | Studies whose main objective is the development and validation of a demand forecasting model

Table 3. Exclusion criteria
Code | Description
E1 | Articles published after 2018
E2 | Documents other than scientific articles and conference papers
E3 | Articles published in languages other than English or Spanish, or with full text not available
E4 | Documents not related to the overall demand of a specific business market

2.3. Sources of information
In July 2024, the Scopus, Web of Science, and IEEE databases were consulted, as they are recognized for their reliability within the academic community.
The queries were conducted through their respective platforms using the same search method for all three. At this stage, the temporal coverage of the search was not limited.

2.4. Search strategy
During the development of the search strategy, the population, intervention, comparison, outcome (PICO) framework guided the identification of relevant terms and their synonyms. These were linked using OR operators within each category, while the PICO components themselves were combined using AND operators to create the following search string, applied uniformly across all data sources:

("demand forecasting" OR "demand prediction" OR "demand prognostic" OR "demand prognosis" OR "demand estimation") AND ("evolutionary computation" OR "genetic algorithm" OR "genetic programming" OR "evolutionary programming" OR "evolution strategies" OR "neuro evolution" OR "swarm intelligence") AND ("artificial intelligence" OR "machine learning" OR "deep learning" OR "reinforcement learning" OR "neural networks") AND ("error" OR "performance" OR "efficiency" OR "robustness" OR "accuracy" OR "precision").

2.5. Article selection process
The researchers independently assessed the search results for consistency and relevance to the inclusion and exclusion criteria. After resolving inconsistencies and making adjustments, the exclusion criteria were applied at the title and abstract level, and the inclusion criteria at the full text level. Only studies agreed upon by both researchers were included.

2.6. Data items and data collection
The authors identified the data required to answer the research questions and collaboratively developed an extraction matrix, with columns for data items and rows for included studies. Each article was independently reviewed and discrepancies were resolved through discussion. Extracted data encompassed: i) economic
sector; ii) problem addressed and limitations of prior solutions; iii) NIOAs and their classification by [19]; iv) the role of NIOAs in the model; v) type of optimization performed; vi) machine learning methods employed; vii) optimization strategy (e.g., single-objective or multi-objective); viii) forecast model outline; ix) data description; x) performance metrics; xi) benchmarking models; xii) performance of the proposed model.

2.7. Synthesis method
The qualitative synthesis involved classifying articles based on similarities using the criteria described in subsection 2.6, specifically items iii), v), vi), vii), and ix). To reduce bias, hierarchical agglomerative clustering was applied using a feature table to compute Euclidean distances. The silhouette method was used to determine the optimal number of clusters. The implementation was carried out in Python, using scipy.cluster.hierarchy for linkage construction, sklearn.cluster for clustering execution, and sklearn.metrics for silhouette evaluation, adopting Ward's method to ensure clear separation between clusters. Subsequently, the clusters and sub-clusters were analyzed and merged when the differences between them were minimal. This classification informed the synthesis by identifying contributions to the research questions and led to the development of a new conceptual model that integrates these insights and addresses key challenges using state-of-the-art tools.

3. RESULTS AND DISCUSSION
This section presents the results of the study selection process and the subsequent qualitative synthesis. The selection results are detailed, tracing the progression from the initial records identified in the search to the final number of studies included in the review.
For the qualitative synthesis, this section reports the classification of the selected articles and provides answers to the research questions, offering insights into the key findings derived from the analysis.

3.1. Result of the studies selection
The search process initially retrieved a total of 282 records. After eliminating duplicates and systematically applying the exclusion and inclusion criteria, 36 studies were selected for the final analysis, as shown in Figure 1. Most of these studies were published in 2019, 2023, and 2024. In terms of application domains, the predominant sectors represented in the selected articles are electricity, water distribution, and retail.

Figure 1. Results of article selection

3.2. Result of the qualitative synthesis
The collected information was subjected to automatic grouping, comparison, and categorization of articles, and the resulting evidence was then confronted with the respective research questions. These operations yielded significant findings, which are reported in the subsections below.
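The clustering workflow described in subsection 2.7 can be sketched roughly as follows. This is a minimal illustration, not the authors' script: the feature table is random placeholder data standing in for the dummy-coded extraction matrix, and the names `features`, `best_k`, `k_main`, and `main_labels` are hypothetical. It only assumes the libraries named in the text (SciPy and scikit-learn) plus NumPy.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

# Placeholder feature table (assumption): 36 articles x 12 binary dummy variables.
rng = np.random.default_rng(0)
features = rng.integers(0, 2, size=(36, 12)).astype(float)

# Ward linkage on Euclidean distances; the linkage matrix can feed a dendrogram.
Z = linkage(features, method="ward")


def best_k(X, k_values):
    """Return the cluster count with the highest silhouette score, plus all scores."""
    scores = {}
    for k in k_values:
        labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
        scores[k] = silhouette_score(X, labels, metric="euclidean")
    return max(scores, key=scores.get), scores


# Restrict the scan to two to five clusters, mirroring the range used for main clusters.
k_main, scores = best_k(features, range(2, 6))
main_labels = AgglomerativeClustering(n_clusters=k_main, linkage="ward").fit_predict(features)

print("silhouette by k:", {k: round(s, 3) for k, s in scores.items()})
print("selected number of main clusters:", k_main)
```

Scanning a wider range of cluster counts with the same helper would reproduce the kind of unconstrained silhouette recommendation that the authors then treated as sub-clusters.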
3.2.1. Result of the classification of articles
After converting the relevant data items from the extraction matrix into dummy variables and eliminating the redundant variables, the feature table for the calculation of the Euclidean distances between the articles was obtained. In relation to the optimal number of clusters, the silhouette method initially recommended 24, a high number relative to the 36 articles analyzed. Based on the best silhouette value in the range of two to five clusters, the authors selected four main clusters and used the 24 clusters as subclusters within these main clusters. After analyzing similarities and differences within and between them, subclusters with negligible differences were merged to align with the research objectives. Each class and subclass was then descriptively named. The final classification, along with the numbering of the automatic clusters, is presented in Table 4.

Table 4. Classification of articles
Classification of articles | Cluster | Sub-cluster | Articles
a) Shallow learning optimizers | 1 | | 13
i) Evolutionary-swarm optimizers | 1 | 1 | 3
ii) Shallow parameter optimizers | 1 | 2, 3, 4, 5 | 8
iii) Shallow multi-objective optimizers | 1 | 6 | 1
iv) Shallow ensemble optimizers | 1 | 7 | 1
b) Evolutionary optimizers | 2 | | 16
i) Genetic programming models | 2 | 13, 14 | 3
ii) Shallow hyperparameter optimizers | 2 | 15 | 1
iii) Support vector machine (SVM) evolutionary optimizers | 2 | 18 | 1
iv) Deep learning optimizers | 2 | | 11
- Deep structural optimizers | 2 | 16, 17, 19 | 4
- Deep multi-objective optimizers | 2 | 8, 9 | 3
- Deep parameter optimizers | 2 | 10, 11 | 3
- Deep ensemble optimizers | 2 | 12 | 1
c) Deep swarm optimizers | 3 | 20, 21 | 3
d) SVM swarm optimizers | 4 | 22, 23, 24 | 4

The following describes the classes and subclasses presented in Table 4.
a.
Shallow learning optimizers: this class groups models based on shallow neural networks (only one hidden layer), where NIOAs primarily optimize trainable parameters, with differences across subclasses.
i) Evolutionary-swarm optimizers: this subclass combines genetic algorithms (GA) with swarm intelligence to optimize models. A key example is the study in [20], where GA explores new weights and biases, and particle swarm optimization (PSO) exploits GA's best findings through continuous transfer learning. Similarly, the studies in [21] and [22] use GA to pre-train initial weights and biases, improving the efficiency of gradient descent fine-tuning. They also employ Northern Goshawk optimization (NGO) and Gray Wolf optimization (GWO), respectively, to optimize other model hyperparameters.
ii) Shallow parameter optimizers: this subclass focuses on optimizing only the trainable parameters of shallow neural networks, mainly through evolutionary algorithms. Studies in [23]-[26] use the mind evolutionary algorithm (MEA), GA, and PSO, respectively, for pre-training backpropagation neural networks (BPNN), while studies in [27]-[30] apply differential evolution (DE), GA, and artificial immune system (AIS) algorithms, respectively, for full parameter optimization. These approaches reduce prediction errors compared to backpropagation, though at the cost of longer training. Notably, MEA improves both accuracy and execution time over GA.
iii) Shallow multi-objective optimizers: this subclass includes a single study proposing a multi-objective optimization to reduce lag inputs while minimizing error for a multilayer perceptron (MLP) model. An adaptive neuro-fuzzy inference system (ANFIS) further refines predictions, with both input selection and ANFIS parameters optimized by GA. The model outperforms standalone MLP and ANFIS in accuracy.
Additionally, the authors claim that by reducing inputs, computational cost decreases, enabling real-time use, though no quantitative evidence is provided in [31].
iv) Shallow ensemble optimizers: this subclass includes a single study [32] where PSO combines predictors, including an extreme learning machine (ELM), outperforming standalone components.
b. Evolutionary optimizers: this class is characterized by the exclusive use of evolutionary algorithms, leaving aside swarm intelligence. It is made up of four subclasses that are distinguished from
each other by the type of optimization they perform and the AI model that intervenes, where deep learning models are a prominent set.
i) Genetic programming (GP) models: this subclass applies GP to generate explicit mathematical expressions for demand forecasting, employing Canonical GP, Multi-Gene GP, and multi-expression programming in [33]-[35], respectively. These studies benchmark against ARIMA, artificial neural network (ANN), and ANFIS, consistently achieving lower errors. GP stands out for producing interpretable models, unlike the black-box nature of neural networks.
ii) Shallow hyperparameter optimizers: this subclass includes a single study from the water distribution sector [36], where GA optimizes structural and training hyperparameters of a shallow BPNN, including hidden neurons, learning rate, and validation criteria. The optimized model outperforms both standalone BPNN and ARIMA. The study highlights the effectiveness of GA for hyperparameter tuning, even with gradient-based parameter training.
iii) SVM evolutionary optimizers: this subclass includes a single study [37] where GA optimizes the penalty (C) and kernel (gamma) parameters of an SVM for forecasting. GA adaptively tunes crossover and mutation rates, balancing exploration and exploitation as in [20]. The model improves accuracy, and the authors suggest faster convergence, though no quantitative evidence is provided.
iv) Deep learning optimizers: this is the largest subclass, with 11 studies focused on optimizing deep learning models, mainly using evolutionary algorithms. It includes four groups: two optimize hyperparameters, one targets trainable parameters, and one combines models using weighted averaging. Each group is described below.
- Deep structural optimizers: this group includes four studies on hyperparameter optimization of deep neural networks [38]-[41], covering both structural (layers, neurons) and training hyperparameters (dropout rate, batch size, learning rate). All combine one algorithm for exploration and another for refinement, such as Bayesian optimization (BO)-GA [38], GA-DE [39], and GA-scatter search (SS) [41]. All improve accuracy over standalone methods. Notably, GA-SS reduced execution time to 23 minutes compared to 58 minutes for GA alone and 480 minutes for trial-and-error.
- Deep multi-objective optimizers: this group includes three studies by the same author [42]-[44], applying non-dominated sorting genetic algorithm II (NSGA-II) to jointly maximize R² and minimize test error by optimizing structural hyperparameters of ANN, long short-term memory (LSTM), and Transformer models. Trainable parameters are refined via gradient methods. Accuracy and explanatory power improve across studies. To reduce computational cost, training time or epochs are limited during optimization, with final retraining of the best models for full convergence. These studies confirm the effectiveness of multi-objective optimization for neural network design.
- Deep parameter optimizers: this group focuses on parameter optimization in deep learning using neuroevolution, which starts from simple architectures with a single layer and few neurons, progressively evolving both structure and parameters. The studies apply the neural network simultaneous optimization algorithm (NNSOA) [45] and neuroevolution of augmenting topologies (NEAT) [46]. Additionally, [47] uses GA for pre-training in a gray neural network (GNN), reportedly improving convergence, though without quantitative evidence.
- Deep ensemble optimizers: this group consists of a single study [48] in which GA is used to obtain the optimal weights to combine an MLP in charge of forecasting trends with an LSTM in charge of forecasting seasonality and other complex variations. The authors found that the proposed model obtains better error metrics than benchmark models.
c. Deep swarm optimizers: this class includes three studies that improve deep neural network pre-training by combining aggressive exploration with strong exploitation. The study in [49] uses the modified dragonfly algorithm (MDA), merging genetic operators with Dragonfly refinement. The study in [50] combines stochastic fractal search (SFS) for broad exploration with the whale optimization algorithm (WOA) for precise exploitation, improving network accuracy and convergence. The study in [51] applies PSO to a deep network, enhancing temporal memory and prediction accuracy.
d. SVM swarm optimizers: this class shows how swarm intelligence enhances SVM models by optimizing kernel parameters, regularization terms, and epsilon. The boosted multiverse optimizer (BMVO) improves Incremental SVM accuracy [52], while PSO boosts SVM and LSTM-SVM hybrid models [9], [53].
The study in [10] combines GA with swarm methods for SVM optimization in cloud demand forecasting. These studies demonstrate the versatility of swarm intelligence to improve non-neural models across sectors.

3.2.2. Result of the research questions
This section consolidates the insights from the classified articles to evaluate their contributions to the research questions shown in Table 1. It examines the application of NIOAs in AI-based forecasting models, highlighting the characteristics of the models, the specific NIOAs employed, the metrics and benchmarking models used, the performance achieved, and the primary challenges and sectors addressed.

Main question: how have NIOAs been used in recent years in the development of AI-based demand forecasting models?
NIOAs have been predominantly applied to neural network optimization, representing 28 of the 36 studies reviewed. In addition, some applications have focused on the optimization of SVM and genetic programming (GP) models. Notably, no use of NIOAs was identified for other machine learning models beyond these categories. Neural network optimization covers both shallow and deep learning, as shown in Figures 2 and 3. In these figures, the main branches, sub-branches, and leaves represent the optimization focus, applied technique, and specific NIOA with its corresponding study. In shallow networks, the focus is primarily on parametric optimization, achieved through pre-training or full training, mainly using GA. In deep learning, the emphasis shifts to hyperparameter optimization, where advanced techniques such as hybridization, adaptive mechanisms, and multi-objective approaches are applied. Notably, no studies address full parametric optimization of deep networks.

Figure 2. NIOAs applications on shallow learning

Figure 3. NIOAs applications on deep learning
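To make the parametric optimization of shallow networks described above more concrete, the sketch below evolves the flattened weights and biases of a single-hidden-layer network with a basic genetic algorithm. It is an illustrative toy, not a reconstruction of any reviewed study: the synthetic series, the network size, the elitist selection, uniform crossover, and Gaussian mutation settings, and all names (`predict`, `rmse`, `genetic_search`) are assumptions chosen for brevity. In a pre-training setting, the returned vector would serve as the starting point for gradient-based fine-tuning, a step omitted here.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy demand series (assumption): a noisy seasonal signal.
t = np.arange(200)
series = 10 + 3 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.3, t.size)

# Lagged inputs: predict y[t] from the previous N_LAGS values, standardized.
N_LAGS, N_HIDDEN = 4, 6
X = np.stack([series[i:i + N_LAGS] for i in range(series.size - N_LAGS)])
y = series[N_LAGS:]
X = (X - series.mean()) / series.std()
y = (y - series.mean()) / series.std()

# Total trainable parameters: input weights, hidden biases, output weights, output bias.
N_PARAMS = N_LAGS * N_HIDDEN + N_HIDDEN + N_HIDDEN + 1


def predict(w, inputs):
    """Forward pass of a single-hidden-layer network from a flat parameter vector."""
    i = N_LAGS * N_HIDDEN
    w1 = w[:i].reshape(N_LAGS, N_HIDDEN)
    b1 = w[i:i + N_HIDDEN]
    w2 = w[i + N_HIDDEN:i + 2 * N_HIDDEN]
    b2 = w[-1]
    return np.tanh(inputs @ w1 + b1) @ w2 + b2


def rmse(w):
    """Fitness to minimize: root mean squared error on the toy series."""
    return float(np.sqrt(np.mean((predict(w, X) - y) ** 2)))


def genetic_search(pop_size=40, generations=60, mutation_scale=0.1):
    """Elitist GA over flat weight vectors; returns the best vector and the best RMSE per generation."""
    pop = rng.normal(0, 0.5, size=(pop_size, N_PARAMS))
    history = []
    for _ in range(generations):
        fitness = np.array([rmse(ind) for ind in pop])
        order = np.argsort(fitness)
        pop = pop[order]
        history.append(float(fitness[order[0]]))
        elite = pop[: pop_size // 4]  # best quarter survives unchanged
        children = []
        while len(children) < pop_size - len(elite):
            p1, p2 = elite[rng.integers(len(elite), size=2)]
            mask = rng.random(N_PARAMS) < 0.5  # uniform crossover
            child = np.where(mask, p1, p2)
            child = child + rng.normal(0, mutation_scale, N_PARAMS)  # Gaussian mutation
            children.append(child)
        pop = np.vstack([elite, np.array(children)])
    fitness = np.array([rmse(ind) for ind in pop])
    history.append(float(fitness.min()))
    return pop[np.argmin(fitness)], history


best_w, history = genetic_search()
print(f"initial best RMSE: {history[0]:.3f}, final best RMSE: {history[-1]:.3f}")
```

Even this small network has 37 parameters per individual. The same encoding applied to a deep network would produce vectors with many orders of magnitude more dimensions, which is consistent with the review's observation that NIOAs are mostly confined to hyperparameters in deep models.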
P1: What are the characteristics of the AI models in which NIOAs have been involved?
The AI models predominantly optimized by NIOAs are neural network-based. Among these, half (14 studies) involve shallow neural networks, while the other half focus on deep neural networks. The shallow neural networks include fully connected single hidden layer BPNNs [21]-[27], [30], [31], [36], radial basis function neural networks (RBFNNs) [20], [28], and wavelet neural networks [29]. For deep learning models, most studies involve LSTM architectures [38], [40], [41], [43], [48], [50] and fully connected deep BPNNs [39], [42], [45], [46]. Other deep neural networks explored include transformer neural networks [44], generative adversarial networks (GANs) [49], Deep Echo State Networks [51], and GNN [47]. Finally, NIOAs have also been applied to models based on SVM [10], [37], [52], [53], and GP [33]-[35].

P2: What type of NIOAs have been used to intervene in AI-based forecasting models?
In shallow learning, parameter pre-training has predominantly relied on GA [21], [22], [24], [25] and its variants, such as MEA [23], with occasional use of PSO [26]. For complete parameter training, GA [28], DE [27], [30], AIS [29], and the hybrid GA-PSO [20] have been employed. Additionally, GA has been applied to hyperparameter optimization [36] and input selection [31]. In deep learning models, GA has been extensively used for optimizing structural and training hyperparameters, either independently [40], in combination with other algorithms such as SS [41], BO [38], and success-history-based parameter adaptation for differential evolution (SHADE) [39], or within the NSGA-II multi-objective optimization framework [42]-[44]. Neuroevolutionary algorithms like NNSOA [45] and NEAT [46] have been applied to simultaneously optimize hyperparameters and trainable parameters.
For parametric pre-training of deep networks, GA has also been employed [47], along with hybrids such as MDA [49] and SFS-WOA [50]. For optimization of SVM-based models, both GA [37] and swarm intelligence algorithms [52], [53] have been used, as well as hybridization of both types of metaheuristics [10].

P3: What metrics and models have been used to measure and compare the performance of models built with NIOAs?
Root mean squared error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE) are the most used metrics to assess prediction accuracy and error normalization in both shallow and deep models. Correlation coefficients and NSE provide additional performance insights. Shallow models benchmark against regression, ARIMA, and BPNN, while deep models are compared to support vector regression (SVR), ANN, and non-optimized LSTM. In these models, advanced metrics like R² and SEP assess goodness-of-fit and robustness, especially for LSTM, GAN, and Transformers. In SVM models, RMSE and MAPE are the main metrics, with specific measures like the bullwhip effect used in inventory forecasting.

P4: What is the performance of the models built with NIOAs in relation to the established models?
The reviewed studies provide compelling evidence that NIOAs significantly improve forecasting accuracy in shallow and deep neural network-based models, as well as SVM-based models. In neural models, this improvement is consistent across hyperparameter optimization, trainable parameter optimization, or a combination of both, with notable examples from studies [20], [45]. While the primary focus of most studies is on reducing forecast errors, some authors also address computational efficiency concerns, proposing strategies such as hybrid approaches (e.g., BO-GA [38], SS-GA [41], MDA [49]) that enhance convergence speed and efficiency compared to standalone methods.
However, hybrids like NEAT-NCS increase execution time [46], and others, such as SFS-WOA, reduce training time but add pre-training steps, leaving overall efficiency uncertain [50]. Additional efficiency-oriented strategies include adaptive algorithms with automatic parameter tuning [20], [37], [39], transfer learning mechanisms between GA and PSO [20], and input variable selection [31].

P5: In which economic sectors have they been applied and what main problems have been attempted to be solved with the models built with NIOAs?
The primary sectors utilizing NIOAs are electricity, water distribution, manufacturing, retail, and cloud computing. Across sectors, common challenges in forecasting include managing non-linear dynamics, reducing overfitting, and improving accuracy in dynamic systems. Traditional models, such as regression, ARIMA, and MLR, often fail to capture non-linearities, while standalone machine learning models like ANN, SVM, and BPNN struggle with overfitting and limited adaptability. To address these issues, machine learning models and hybrid frameworks have been introduced. LSTM models enhance accuracy but face difficulties with overfitting and hyperparameter tuning, while hybrid approaches, such as LSTM-SVR and GA-DE integrations, improve non-linear modeling but encounter computational efficiency limitations. NIOAs play a critical role in overcoming these limitations by enhancing hybrid frameworks' predictive accuracy, adaptability, and
potentially computational efficiency. Techniques like PSO, GA, WOA, and DE optimize parameters and hyperparameters, addressing the shortcomings of traditional and standalone models. Sector-specific applications highlight these advancements: in electricity, NIOAs support dynamic forecasting for short-term and annual energy demands [33], [35]; in water distribution, they address agricultural and urban needs by managing non-linear patterns in daily and hourly forecasts [23], [44]; in retail and manufacturing, they tackle the bullwhip effect and refine e-commerce demand predictions [9], [34]; and in cloud computing, NIOAs enhance resource demand forecasting and workload optimization in highly dynamic environments [10], [27], [29].

3.3. Proposed conceptual model

This section introduces a new optimization model to address the key issue identified: optimizing trainable parameters in deep learning models. The model incorporates advanced tools like heuristic hybridization and adaptive parameter control, while addressing the main gap: the lack of evolutionary multitasking applications.

3.3.1. Foundational studies

This model adopts the multi-space evolutionary search for large-scale optimization [15], a variant of evolutionary multitasking optimization. It generates an auxiliary search space with simplified versions of the original space to ease the search process. Insights learned from the auxiliary space guide the original space search, enhancing effectiveness and efficiency, while the best results from the original space return to enrich the auxiliary search.

In addition, the model draws inspiration from low-bit quantization optimization [54] for constructing the auxiliary search space.
Quantization, a deep network compression technique, discretizes continuous variables representing neural network weights, reducing possible weight values and bit requirements, thereby simplifying optimization. Recent advances have shown that low-bit-width models can maintain high accuracy by applying quantization to both activations and weights [55]. The model is also influenced by metaheuristic hybridization [49], [50], and adaptive mechanisms for mutation and crossover control in GA [20], [39].

3.3.2. Model components

Model components are the key elements that define the search strategy, adaptive mechanisms, and hybrid techniques of the optimization framework.

Original search space. This search space encompasses all possible values for the weights and biases of a deep neural network. This continuous space has dimensions equal to the total number of weights and biases in the network.

Auxiliary quantized search space. This space discretizes the dimensions of the original space based on the range of the initial population. It allows generating values beyond the initial ranges but within feasible limits. The number of possible values per dimension is governed by the bit width (m-bit); higher m-bit values permit more possibilities, with the binary dimension (2-bit) representing the most extreme case.

GA with adaptive mechanism. This search algorithm explores the discretized auxiliary space using mutation and crossover, guided by an adaptive mechanism that promotes aggressive exploration in the early stages and shifts to exploitation as fitness improves.

SFS-WOA hybridization. This hybrid algorithm searches the original space using insights from the auxiliary space, combining the exploration strength of SFS with the refinement capabilities of WOA.

Automatic granularity adjustment mechanism.
This mechanism adjusts the m-bit in the auxiliary space, starting with low m-bits for efficient exploration of large regions and increasing the value during evolution for finer exploration of promising areas.

3.3.3. Operational dynamics

The model employs multi-task optimization with transfer learning. The auxiliary space, driven by aggressive GA and low m-bit values, rapidly explores large regions and identifies promising areas. This information directs the more precise but less aggressive SFS-WOA algorithms in the original space to exploit these areas and refine the search for optimal solutions. The best candidates from the original space are then transferred back to the auxiliary space to adjust the m-bit and enhance exploration. Figure 4 illustrates the operation of the proposed model.
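The interplay between the quantized auxiliary space, the granularity adjustment, and the transfer between spaces can be illustrated with a minimal Python sketch. It is not the proposed model itself but a toy under stated assumptions: a quadratic function stands in for a network's training error, the m-bit value rises linearly from 2 to 8, a mutation-only search with decaying step size stands in for the adaptive GA (no crossover), and a plain Gaussian perturbation search is a placeholder for SFS-WOA. All function names and default values are hypothetical.

```python
import random


def quantize(value, lo, hi, m_bits):
    """Snap value to the nearest of 2**m_bits evenly spaced levels in [lo, hi] (m_bits >= 1)."""
    levels = 2 ** m_bits
    step = (hi - lo) / (levels - 1)
    clipped = min(max(value, lo), hi)
    return lo + round((clipped - lo) / step) * step


def m_bits_for_generation(gen, n_gens, m_min=2, m_max=8):
    """Assumed schedule: raise the bit width linearly from m_min to m_max."""
    frac = gen / max(n_gens - 1, 1)
    return m_min + round(frac * (m_max - m_min))


def loss(weights):
    """Toy stand-in for a network's training error; minimum at 0.3 in every dimension."""
    return sum((w - 0.3) ** 2 for w in weights)


def two_space_search(dim=5, n_gens=30, pop_size=20, lo=-1.0, hi=1.0, seed=0):
    rng = random.Random(seed)
    best = [rng.uniform(lo, hi) for _ in range(dim)]
    history = [loss(best)]  # best loss before the search and after each generation
    for gen in range(n_gens):
        m = m_bits_for_generation(gen, n_gens)
        # Auxiliary space: mutate around the current best, then quantize to m bits.
        # The mutation strength decays, moving from exploration to exploitation.
        sigma = 0.5 * (1 - gen / n_gens) + 0.01
        aux_pop = [[quantize(w + rng.gauss(0, sigma), lo, hi, m) for w in best]
                   for _ in range(pop_size)]
        aux_best = min(aux_pop, key=loss)
        # Original space: continuous refinement seeded by the auxiliary result
        # (placeholder for SFS-WOA, not an implementation of it).
        refined = [[w + rng.gauss(0, 0.02) for w in aux_best] for _ in range(pop_size)]
        # Transfer back: the best solution found so far seeds the next generation.
        best = min([best, aux_best] + refined, key=loss)
        history.append(loss(best))
    return best, history
```

Calling `best, history = two_space_search()` returns the final weight vector and the best loss per generation. Because the incumbent solution is always kept among the candidates, `history` never increases; how quickly it falls depends on the assumed schedule and step sizes, which would need tuning for a real network.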
Figure 4. Schematic of the proposed model

3.3.4. Expected outcomes

The simultaneous search across two spaces, the use of multiple NIOAs, and the dynamic resizing of the auxiliary space via m-bit adjustments are expected to incur significant computational costs. However, the rapid convergence facilitated by the auxiliary space is anticipated to accelerate optimization in the original space, maintaining precision and avoiding local optima. This efficiency could drastically reduce execution times, a critical factor for practical applications of deep neural networks in demand forecasting across various industries.

3.4. Discussion

NIOAs are primarily applied to neural network optimization. In shallow networks, the focus is on parameter optimization, including pre-training [21]-[26] and full training [20], [27]-[30]. In deep networks, parameter optimization is rare due to high computational costs and is limited to neuroevolution [45], [46] or advanced pre-training methods [47], [49], [50]. Neuroevolution reduces complexity by progressively increasing network size, starting with a single layer, but remains resource-intensive. In contrast, hyperparameter optimization is more common in deep learning, with seven studies [38]-[44] applying techniques such as hybridization, adaptive mechanisms, and multi-objective optimization. In shallow networks, only one study [36] addresses hyperparameter tuning, as parameter optimization offers greater gains. This contrast reflects the higher practicality and impact of hyperparameter optimization in deep learning, given the difficulty of parameter-level optimization.

Despite their differences, shallow and deep networks share similar optimization strategies, especially hybridization.
In shallow models, [20] applies a GA-PSO hybrid for complete RBFNN training, while deep learning studies use BO-GA and SS-GA for LSTM hyperparameter tuning [38], [41], and GA-DA and SFS-WOA for pre-training [10], [49]. Adaptive mechanisms are also common: [20] applies them to shallow parameter optimization, and [39] to deep hyperparameter tuning. Multi-objective optimization with NSGA-II is used in both shallow [31] and deep networks [42]-[44].

While the main goal of NIOAs is to improve model accuracy, several studies acknowledge their high computational cost, primarily due to intensive neural network evaluations. To address this, different efficiency-oriented strategies have been proposed. In shallow networks, studies in [26], [31] highlight that selecting relevant input variables reduces model complexity and improves efficiency. [26] proposes grey relational analysis for input selection, while [31] uses a multi-objective evolutionary algorithm that jointly minimizes forecast error and selects inputs. Although these approaches are expected to improve efficiency, no quantitative validation is provided. Efficiency is also explored through the selection of specific NIOAs, with [23] finding MEA more efficient than GA for shallow network pre-training, and [29] showing water cycle algorithm (WCA) surpassing AIS in training efficiency, though at the cost of accuracy. Adaptive mechanisms in GA are another approach