IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 14, No. 5, October 2025, pp. 3528~3541
ISSN: 2252-8938, DOI: 10.11591/ijai.v14.i5.pp3528-3541
Efficiency search: application of nature-inspired algorithms in artificial intelligence forecasting models
José Rolando Neira Villar, Miguel Ángel Cano Lengua
Facultad de Ingeniería de Sistemas e Informática, Universidad Tecnológica del Perú, Lima, Perú
Article Info
Article history:
Received Jan 13, 2025
Revised Jul 26, 2025
Accepted Aug 6, 2025
Keywords:
Artificial intelligence
Demand forecasting
Multi-space optimization
Nature-inspired optimization algorithm
Quantization
ABSTRACT
This study reviews how nature-inspired optimization algorithms (NIOAs) have been applied to artificial intelligence-based demand forecasting, using preferred reporting items for systematic reviews and meta-analyses (PRISMA) and clustering analysis to examine 36 selected articles. The findings reveal that NIOAs, particularly genetic algorithms and swarm intelligence methods, including their hybrids, have been frequently applied to long short-term memory (LSTM) and other backpropagation neural network (BPNN) models. A key insight is the differentiated application of NIOAs depending on network depth: in shallow networks, they have been effectively used to optimize trainable parameters, whereas in deep networks, their role has focused primarily on hyperparameter optimization due to the prohibitive dimensionality of trainable weights. In all studies, NIOA-optimized models consistently outperform conventional baselines based on backpropagation. However, persistent challenges such as excessive execution times and slow convergence have led to the development of more efficient hybrid strategies and adaptive mechanisms for automated exploration-exploitation control. By mapping explored and unexplored pathways, summarizing key outcomes and techniques, and identifying promising methodologies, this review offers a practical foundation to guide future experiments and implementations involving NIOA-based optimization strategies in neural network models. As a conceptual contribution, it also proposes an innovative use of multispace optimization to address one of the most critical challenges identified: the optimization of trainable parameters in deep neural networks.
This is an open access article under the CC BY-SA license.
Corresponding Author:
Miguel Ángel Cano Lengua
Facultad de Ingeniería de Sistemas e Informática, Universidad Tecnológica del Perú
Jr. Natalio Sanchez 125, Lima, Perú
Email: mcanol@unmsm.edu.pe
1. INTRODUCTION
Accurate demand forecasting is crucial for managing business operations and supply chains, enabling effective resource planning while avoiding costly issues such as stockouts and the bullwhip effect. It also supports financial, human resource, and marketing planning, thereby significantly enhancing competitiveness [1]-[4]. In light of this importance, recent years have witnessed the emergence of advanced machine learning approaches, particularly deep learning-based models, which have consistently demonstrated superior predictive performance [4], [5]. However, a major limitation of these models lies in the highly complex optimization problems they generate: specifically, the need to optimize both the network architecture and hyperparameters [6]-[8], as well as trainable parameters such as synaptic weights and biases. These problems typically involve vast search spaces that are often explored manually through trial and error, as exhaustive search methods are computationally prohibitive [9], [10].
Journal homepage: http://ijai.iaescore.com
These optimization challenges manifest at both the parametric and hyperparameter levels. Parametric optimization is traditionally performed using backpropagation with gradient descent, which faces notable challenges such as the vanishing gradient (that is, a loss of effectiveness as the depth of the network increases [11]) and the difficulty of navigating multiple local optima, often failing to reach the global optimum [7]. In contrast, hyperparameter optimization cannot rely on gradient-based methods such as backpropagation, as the objective functions are unknown. These black-box optimization problems are typically noisy, lack analytical expressions, and are computationally expensive to solve [6], [8]. Thus, although sophisticated machine learning models significantly improve forecast accuracy, they require new optimization techniques capable of overcoming these drawbacks [7].
In this context, nature-inspired optimization algorithms (NIOAs) have gained significant popularity. These algorithms mimic natural processes to efficiently solve complex problems, providing good approximate solutions within reasonable time limits. Their key advantage is that they require no detailed knowledge of the problem, making them ideal for black-box optimization. Furthermore, they perform well in non-convex, noisy, and stochastic search spaces, further driving their widespread adoption [12], [13]. Notable successes include their scalable application in the search for high-performance neural architectures and hyperparameter configurations [13], [14]. However, in parametric optimization, NIOAs have yet to match the computational efficiency of gradient-based algorithms, presenting a promising avenue for future research [7].
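The black-box property described above can be illustrated with a minimal particle swarm optimization (PSO) sketch. It is not taken from any of the reviewed studies: the function names, the inertia and acceleration constants, and the sphere test function are assumptions chosen for illustration. The point is that the optimizer only ever calls `objective(x)` and never needs gradients or an analytical expression.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=20, n_iters=100, seed=0):
    """Minimal global-best PSO; the objective is treated as a black box."""
    rng = random.Random(seed)
    lo, hi = bounds
    w, c1, c2 = 0.7, 1.5, 1.5  # assumed inertia and acceleration constants
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # best position seen by the swarm
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # pull toward own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # pull toward swarm best
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))  # keep inside bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Illustrative objective; any function returning a number could be used."""
    return sum(v * v for v in x)


best_x, best_val = pso_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
```

In a forecasting setting, `objective` would typically wrap the training and validation of a model for a given hyperparameter vector, which is why each evaluation tends to be expensive.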
In general, the range of NIOA applications is expanding and diverse, although some domains remain underexplored. Emerging areas within NIOAs include neuroevolution, multi-objective optimization, multitask optimization, and multispace optimization. Neuroevolution applies NIOAs to evolve deep neural network architectures, enabling the identification of efficient configurations tailored to specific tasks. This approach often achieves better results than manually tuned models, including those adjusted by experts [6]. Multi-objective evolutionary optimization, on the other hand, focuses on simultaneously optimizing typically conflicting goals, such as maximizing model accuracy while minimizing computational cost, which is particularly valuable in hyperparameter tuning [6], [8]. Multitask evolutionary optimization deserves special attention, as it aims to create synergies between different optimization tasks by transferring knowledge across search spaces, avoiding unproductive regions, and sharing promising solutions. This method has shown strong potential for significantly improving the efficiency of NIOAs [14]. Expanding on this idea, recently proposed multispace optimization algorithms introduce simplified auxiliary search spaces to support the optimization of large, complex domains, with the knowledge gained being transferred back to the original space [15], [16].
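The notion of conflicting goals in multi-objective optimization can be made concrete with Pareto dominance. The sketch below is a minimal illustration, assuming both objectives are minimized; the candidate values (forecast error, training cost in seconds) are hypothetical and do not come from the reviewed studies.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical (forecast error, training cost in seconds) pairs for five configurations.
candidates = [(0.12, 40.0), (0.10, 90.0), (0.15, 30.0), (0.11, 95.0), (0.20, 35.0)]
front = pareto_front(candidates)
```

Here the configurations (0.11, 95.0) and (0.20, 35.0) are discarded because another candidate is at least as good on both objectives; the remaining three represent different trade-offs between accuracy and cost.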
Amid the promising convergence between machine learning and NIOAs, this study explores the application of these advanced techniques in the design of machine learning models for demand forecasting. It analyzes the outcomes achieved, uncovers recent NIOA approaches that remain untapped in this context, and highlights key research gaps, inviting further exploration of their potential in addressing complex neural network optimization problems and advancing some of the most promising lines of investigation in the field.
2. METHOD
This study applies the preferred reporting items for systematic reviews and meta-analyses (PRISMA) methodology to systematically review the state of the art on the use of NIOAs in artificial intelligence-based demand forecasting models, with the aim of identifying research gaps. Within the PRISMA framework, NIOAs are treated as interventions, while their impact on forecasting model performance is considered as the outcome. PRISMA ensures the trustworthiness of the review by providing a transparent process for article selection and synthesis of findings [17]. To enhance the objectivity of the latter, an automatic agglomerative hierarchical clustering technique was employed to classify the reviewed studies.
2.1. Research questions
As recommended by the PRISMA methodology [18], the research questions were explicitly and concisely posed to help evaluate the coherence of the study in all its parts. To do so, after the main question was posed, the population, intervention, comparator, outcome and context (PICOC) framework was used to make the secondary questions explicit. Table 1 shows the results of this process.
Table 1. Research questions
Code  Question
Main  How have NIOAs been used in recent years in the development of AI-based demand forecasting models?
P     What are the characteristics of the AI models in which NIOAs have been involved?
I     What type of NIOAs have been used to intervene in AI-based forecasting models?
C     What metrics and models have been used to measure and compare the performance of models built with NIOAs?
O     What is the performance of the models built with NIOAs in relation to the established models?
C     In which economic sectors have they been applied and what main problems have been attempted to be solved with the models built with NIOAs?
2.2. Eligibility criteria
To define the scope of the article, the eligibility criteria [18] outlined in Tables 2 and 3 were established. These criteria were also used to verify the inclusion decisions of the review. The focus was on selecting recent, reliable empirical studies that propose demand forecasting models using AI and NIOAs.
Table 2. Inclusion criteria
Code  Description
I1    Studies that use nature-inspired algorithms as part of the proposed AI-based demand forecasting models
I2    Studies containing a detailed and comprehensive methodology related to the nature-inspired algorithms used
I3    Empirical studies with models validated with real data from companies
I4    Studies whose main objective is the development and validation of a demand forecasting model
Table 3. Exclusion criteria
Code  Description
E1    Articles published in 2018 or earlier
E2    Documents other than scientific articles and conference papers
E3    Articles published in languages other than English or Spanish, or with full text not available
E4    Documents not related to the overall demand of a specific business market
2.3. Sources of information
In July 2024, the Scopus, Web of Science, and IEEE databases were consulted, as they are recognized for their reliability within the academic community. The queries were conducted through their respective platforms using the same search method for all three. At this stage, the temporal coverage of the search was not limited.
2.4. Search strategy
During the development of the search strategy, the population, intervention, comparison, outcome (PICO) framework guided the identification of relevant terms and their synonyms. These were linked using OR operators within each category, while the PICO components themselves were combined using AND operators to create the following search string, applied uniformly across all data sources: ("demand forecasting" OR "demand prediction" OR "demand prognostic" OR "demand prognosis" OR "demand estimation") AND ("evolutionary computation" OR "genetic algorithm" OR "genetic programming" OR "evolutionary programming" OR "evolution strategies" OR "neuro evolution" OR "swarm intelligence") AND ("artificial intelligence" OR "machine learning" OR "deep learning" OR "reinforcement learning" OR "neural networks") AND ("error" OR "performance" OR "efficiency" OR "robustness" OR "accuracy" OR "precision").
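The construction rule for the search string (OR within each category, AND between categories) can be sketched in a few lines. The helper name `build_query` is hypothetical, and the sketch does not account for database-specific field tags or syntax that Scopus, Web of Science, or IEEE may require.

```python
def build_query(groups):
    """Join synonyms with OR inside each group and join the groups with AND."""
    clauses = []
    for terms in groups:
        clauses.append("(" + " OR ".join(f'"{t}"' for t in terms) + ")")
    return " AND ".join(clauses)


# Term groups copied from the search string reported in the text.
groups = [
    ["demand forecasting", "demand prediction", "demand prognostic",
     "demand prognosis", "demand estimation"],
    ["evolutionary computation", "genetic algorithm", "genetic programming",
     "evolutionary programming", "evolution strategies", "neuro evolution",
     "swarm intelligence"],
    ["artificial intelligence", "machine learning", "deep learning",
     "reinforcement learning", "neural networks"],
    ["error", "performance", "efficiency", "robustness", "accuracy", "precision"],
]
query = build_query(groups)
```

Keeping the term groups as data makes it straightforward to apply exactly the same string to every source, as the method requires.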
2.5. Article selection process
The researchers independently assessed the search results for consistency and relevance to the inclusion and exclusion criteria. After resolving inconsistencies and making adjustments, the exclusion criteria were applied at the title and abstract level, and the inclusion criteria at the full text level. Only studies agreed upon by both researchers were included.
2.6. Data items and data collection
The authors identified the data required to answer the research questions and collaboratively developed an extraction matrix, with columns for data items and rows for included studies. Each article was independently reviewed and discrepancies were resolved through discussion. Extracted data encompassed: i) economic sector; ii) problem addressed and limitations of prior solutions; iii) NIOAs and their classification according to [19]; iv) the role of NIOAs in the model; v) type of optimization performed; vi) machine learning methods employed; vii) optimization strategy (e.g., single-objective or multi-objective); viii) forecast model outline; ix) data description; x) performance metrics; xi) benchmarking models; xii) performance of the proposed model.
2.7. Synthesis method
The qualitative synthesis involved classifying articles based on similarities using the criteria described in subsection 2.6, specifically items iii), v), vi), vii), and ix). To reduce bias, hierarchical agglomerative clustering was applied using a feature table to compute Euclidean distances. The silhouette method was used to determine the optimal number of clusters. The implementation was carried out in Python, using scipy.cluster.hierarchy for linkage construction, sklearn.cluster for clustering execution, and sklearn.metrics for silhouette evaluation, adopting Ward's method to ensure clear separation between clusters. Subsequently, the clusters and sub-clusters were analyzed and merged when the differences were minimal. This classification informed the synthesis by identifying contributions to the research questions and led to the development of a new conceptual model that integrates these insights and addresses key challenges using state-of-the-art tools.
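The clustering pipeline described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: the feature table is a small hypothetical binary matrix, and the candidate range of two to five clusters is chosen only for the example. The library modules are the ones named in the text.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

# Hypothetical feature table: one row per article, one column per dummy variable.
features = np.array([
    [1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 0, 1],
    [0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 1, 0],
    [1, 1, 0, 0], [1, 1, 0, 0],
], dtype=float)

# Ward linkage on Euclidean distances; Z can be passed to scipy's dendrogram for inspection.
Z = linkage(features, method="ward")

# Evaluate the silhouette coefficient for each candidate number of clusters.
scores = {}
for k in range(2, 6):
    labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(features)
    scores[k] = silhouette_score(features, labels, metric="euclidean")

best_k = max(scores, key=scores.get)  # number of clusters with the highest silhouette
```

The silhouette score only suggests a number of clusters; as the text notes, the resulting groups still need to be inspected and, where differences are minimal, merged by hand.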
3. RESULTS AND DISCUSSION
This section presents the results of the study selection process and the subsequent qualitative synthesis. The selection results are detailed, tracing the progression from the initial records identified in the search to the final number of studies included in the review. For the qualitative synthesis, this section reports the classification of the selected articles and provides answers to the research questions, offering insights into the key findings derived from the analysis.
3.1. Result of the studies selection
The search process initially retrieved a total of 282 records. After eliminating duplicates and systematically applying the exclusion and inclusion criteria, 36 studies were selected for the final analysis, as shown in Figure 1. Most of these studies were published in 2019, 2023, and 2024. In terms of application domains, the predominant sectors represented in the selected articles are electricity, water distribution, and retail.
Figure 1. Results of article selection
3.2. Result of the qualitative synthesis
The analytical operations conducted on the collected information (automatic grouping, comparison, and categorization of articles, followed by the confrontation of evidence with the respective research questions) yielded significant findings, which are reported as follows:
3.2.1. Result of the classification of articles
After converting the relevant data items from the extraction matrix into dummy variables and eliminating the redundant variables, the features table for the calculation of the Euclidean distances between the articles was obtained. In relation to the optimal number of clusters, the silhouette method initially recommended 24, a high number relative to the 36 articles analyzed. Based on the best silhouette value in the range of two to five clusters, the authors selected four main clusters and used the 24 clusters as subclusters within these main clusters. After analyzing similarities and differences within and between them, subclusters with negligible differences were merged to align with the research objectives. Each class and subclass were then descriptively named. The final classification, along with the numbering of the automatic clusters, is presented in Table 4.
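The conversion of categorical data items into dummy variables and the distance computation can be sketched in plain Python. The article labels and field values below are hypothetical, and treating "redundant" variables as columns that take the same value for every article is an assumption; the source does not specify the exact rule used.

```python
from math import sqrt

# Hypothetical extract of an extraction matrix (values are illustrative only).
articles = {
    "A": {"nioa": "GA", "optimization": "parameters", "model": "BPNN",
          "strategy": "single-objective"},
    "B": {"nioa": "GA", "optimization": "hyperparameters", "model": "LSTM",
          "strategy": "single-objective"},
    "C": {"nioa": "PSO", "optimization": "hyperparameters", "model": "SVM",
          "strategy": "single-objective"},
}


def to_dummies(rows):
    """One 0/1 column per (field, value) pair, dropping columns that never vary."""
    columns = sorted({(k, v) for r in rows.values() for k, v in r.items()})
    table = {name: [1 if r.get(k) == v else 0 for k, v in columns]
             for name, r in rows.items()}
    # Assumed redundancy rule: a column with a single value carries no information.
    keep = [j for j in range(len(columns))
            if len({vec[j] for vec in table.values()}) > 1]
    return [columns[j] for j in keep], {n: [vec[j] for j in keep]
                                        for n, vec in table.items()}


def euclidean(u, v):
    """Euclidean distance between two equally long 0/1 vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


columns, table = to_dummies(articles)
d_ab = euclidean(table["A"], table["B"])
d_ac = euclidean(table["A"], table["C"])
```

With binary features, the squared distance simply counts the dummy variables on which two articles differ, so articles sharing more characteristics end up closer and are merged earlier by the agglomerative procedure.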
Table 4. Classification of articles
Classification of articles                                    Cluster  Sub-cluster  Articles
a) Shallow learning optimizers                                1                     13
   i) Evolutionary-swarm optimizers                           1        1            3
   ii) Shallow parameter optimizers                           1        2, 3, 4, 5   8
   iii) Shallow multi-objective optimizers                    1        6            1
   iv) Shallow ensemble optimizers                            1        7            1
b) Evolutionary optimizers                                    2                     16
   i) Genetic programming models                              2        13, 14       3
   ii) Shallow hyperparameter optimizers                      2        15           1
   iii) Support vector machine (SVM) evolutionary optimizers  2        18           1
   iv) Deep learning optimizers                               2                     11
      - Deep structural optimizers                            2        16, 17, 19   4
      - Deep multi-objective optimizers                       2        8, 9         3
      - Deep parameter optimizers                             2        10, 11       3
      - Deep ensemble optimizers                              2        12           1
c) Deep swarm optimizers                                      3        20, 21       3
d) SVM swarm optimizers                                       4        22, 23, 24   4
The following describes the classes and subclasses presented in Table 4.
a. Shallow learning optimizers: this class groups models based on shallow neural networks (only one hidden layer), where NIOAs primarily optimize trainable parameters, with differences across subclasses.
i) Evolutionary-swarm optimizers: this subclass combines genetic algorithms (GA) with swarm intelligence to optimize models. A key example is the study in [20], where GA explores new weights and biases, and particle swarm optimization (PSO) exploits GA's best findings through continuous transfer learning. Similarly, the studies in [21] and [22] use GA to pre-train initial weights and biases, improving the efficiency of gradient descent fine-tuning. They also employ Northern Goshawk optimization (NGO) and Gray Wolf optimization (GWO), respectively, to optimize other model hyperparameters.
ii) Shallow parameter optimizers: this subclass focuses on optimizing only the trainable parameters of shallow neural networks, mainly through evolutionary algorithms. The studies in [23]-[26] use the mind evolutionary algorithm (MEA) [23], GA [24], [25], and PSO [26] for pre-training backpropagation neural networks (BPNN), while the studies in [27]-[30] apply differential evolution (DE) [27], [30], GA [28], and artificial immune system (AIS) algorithms [29] for full parameter optimization. These approaches reduce prediction errors compared to backpropagation, though at the cost of longer training. Notably, MEA improves both accuracy and execution time over GA.
iii) Shallow multi-objective optimizers: this subclass includes a single study [31] proposing a multi-objective optimization to reduce lag inputs while minimizing error for a multilayer perceptron (MLP) model. An adaptive neuro-fuzzy inference system (ANFIS) further refines predictions, with both input selection and ANFIS parameters optimized by GA. The model outperforms standalone MLP and ANFIS in accuracy. Additionally, the authors claim that reducing inputs decreases computational cost and enables real-time use, though no quantitative evidence is provided.
iv) Shallow ensemble optimizers: this subclass includes a single study in [32] where PSO combines predictors, including an extreme learning machine (ELM), outperforming standalone components.
b. Evolutionary optimizers: this class is characterized by the exclusive use of evolutionary algorithms, leaving aside swarm intelligence. It comprises four subclasses that are distinguished from each other by the type of optimization they perform and the AI model involved, with deep learning models forming a prominent set.
i) Genetic programming (GP) models: this subclass applies GP to generate explicit mathematical expressions for demand forecasting, employing Canonical GP, Multi-Gene GP, and multi-expression programming in [33]-[35], respectively. These studies benchmark against ARIMA, artificial neural network (ANN), and ANFIS, consistently achieving lower errors. GP stands out for producing interpretable models, unlike the black-box nature of neural networks.
ii) Shallow hyperparameter optimizers: this subclass includes a single study from the water distribution sector [36], where GA optimizes structural and training hyperparameters of a shallow BPNN, including hidden neurons, learning rate, and validation criteria. The optimized model outperforms both standalone BPNN and ARIMA. The study highlights the effectiveness of GA for hyperparameter tuning, even with gradient-based parameter training.
iii) SVM evolutionary optimizers: this subclass includes a single study in [37] where GA optimizes the penalty (C) and kernel (gamma) parameters of an SVM for forecasting. GA adaptively tunes crossover and mutation rates, balancing exploration and exploitation as in [20]. The model improves accuracy, and the authors suggest faster convergence, though no quantitative evidence is provided.
iv) Deep learning optimizers: this is the largest subclass, with 11 studies focused on optimizing deep learning models, mainly using evolutionary algorithms. It includes four groups: two optimize hyperparameters, one targets trainable parameters, and one combines models using weighted averaging. Each group is described below.
- Deep structural optimizers: this group includes four studies on hyperparameter optimization of deep neural networks [38]-[41], covering both structural (layers, neurons) and training hyperparameters (dropout rate, batch size, learning rate). All combine one algorithm for exploration and another for refinement, such as Bayesian optimization (BO)-GA [38], GA-DE [39], and GA-scatter search (SS) [41]. All improve accuracy over standalone methods. Notably, GA-SS reduced execution time to 23 minutes compared to 58 minutes for GA alone and 480 minutes for trial-and-error.
- Deep multi-objective optimizers: this group includes three studies by the same author [42]-[44], applying non-dominated sorting genetic algorithm II (NSGA-II) to jointly maximize R² and minimize test error by optimizing structural hyperparameters of ANN, long short-term memory (LSTM), and Transformer models. Trainable parameters are refined via gradient methods. Accuracy and explanatory power improve across studies. To reduce computational cost, training time or epochs are limited during optimization, with final retraining of the best models for full convergence. These studies confirm the effectiveness of multi-objective optimization for neural network design.
- Deep parameter optimizers: this group focuses on parameter optimization in deep learning using neuroevolution, which starts from simple architectures with a single layer and few neurons, progressively evolving both structure and parameters. The studies apply the neural network simultaneous optimization algorithm (NNSOA) [45] and neuroevolution of augmenting topologies (NEAT) [46]. Additionally, [47] uses GA for pre-training in a gray neural network (GNN), reportedly improving convergence, though without quantitative evidence.
- Deep ensemble optimizers: this group consists of a single study [48] in which GA is used to obtain the optimal weights for combining an MLP that forecasts trends with an LSTM that forecasts seasonality and other complex variations. The authors found that the proposed model obtains better error metrics than the benchmark models.
c. Deep swarm optimizers: this class includes three studies that improve deep neural network pre-training by combining aggressive exploration with strong exploitation. The study in [49] uses the modified dragonfly algorithm (MDA), merging genetic operators with Dragonfly refinement. The study in [50] combines stochastic fractal search (SFS) for broad exploration with the whale optimization algorithm (WOA) for precise exploitation, improving network accuracy and convergence. The study in [51] applies PSO to a deep echo state network, enhancing temporal memory and prediction accuracy.
d. SVM swarm optimizers: this class shows how swarm intelligence enhances SVM models by optimizing kernel parameters, regularization terms, and epsilon. Boosted multiverse optimizer (BMVO) improves Incremental SVM accuracy [52], while PSO boosts SVM and LSTM-SVM hybrid models [9], [53]. The study in [10] combines GA with swarm methods for SVM optimization in cloud demand forecasting. These studies demonstrate the versatility of swarm intelligence to improve non-neural models across sectors.
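As an illustration of the parameter optimization described for the shallow classes above, the following is a minimal sketch of a GA that evolves the weights and biases of a single-hidden-layer network. It does not reproduce any reviewed study: the network size, the flat weight encoding, the operator settings, and the toy data series are all assumptions made for the example.

```python
import math
import random


def forward(w, x, n_hidden=3):
    """1 input -> n_hidden tanh units -> 1 linear output; w is a flat weight list."""
    out = w[3 * n_hidden]  # output bias
    for h in range(n_hidden):
        # w[h]: input weight, w[n_hidden + h]: hidden bias, w[2 * n_hidden + h]: output weight
        out += w[2 * n_hidden + h] * math.tanh(w[h] * x + w[n_hidden + h])
    return out


def mse(w, data):
    """Mean squared error of the network on (x, y) pairs; used as GA fitness."""
    return sum((forward(w, x) - y) ** 2 for x, y in data) / len(data)


def ga_train(data, n_weights=10, pop_size=30, generations=60, seed=1):
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_weights)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda w: mse(w, data))
        history.append(mse(pop[0], data))
        elite = pop[: pop_size // 5]  # the best individuals survive unchanged (elitism)
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [g + rng.gauss(0, 0.1) if rng.random() < 0.2 else g
                     for g in child]                          # Gaussian mutation
            children.append(child)
        pop = elite + children
    pop.sort(key=lambda w: mse(w, data))
    return pop[0], history + [mse(pop[0], data)]


# Toy series standing in for a demand signal (illustrative only).
data = [(x / 10, math.sin(3 * x / 10)) for x in range(-10, 11)]
best, history = ga_train(data)
```

In a pre-training setup, `best` would serve as the starting point for gradient descent fine-tuning; in a full-training setup, it would be used directly as the final set of weights.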
3.2.2. Result of the research questions
This section consolidates the insights from the classified articles to evaluate their contributions to the research questions shown in Table 1. It examines the application of NIOAs in AI-based forecasting models, highlighting the characteristics of the models, the specific NIOAs employed, the metrics and benchmarking models used, the performance achieved, and the primary challenges and sectors addressed.
Main question: how have NIOAs been used in recent years in the development of AI-based demand forecasting models?
NIOAs have been predominantly applied to neural network optimization, representing 28 of the 36 studies reviewed. In addition, some applications have focused on the optimization of SVM and genetic programming (GP) models. Notably, no use of NIOAs was identified for other machine learning models beyond these categories. Neural network optimization covers both shallow and deep learning, as shown in Figures 2 and 3. In these figures, the main branches, sub-branches, and leaves represent the optimization focus, applied technique, and specific NIOA with its corresponding study. In shallow networks, the focus is primarily on parametric optimization, achieved through pre-training or full training, mainly using GA. In deep learning, the emphasis shifts to hyperparameter optimization, where advanced techniques such as hybridization, adaptive mechanisms, and multi-objective approaches are applied. Notably, no studies address full parametric optimization of deep networks.
Figure 2. NIOAs applications on shallow learning
Figure 3. NIOAs applications on deep learning
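A quick count of trainable parameters helps explain why parametric optimization with NIOAs concentrates on shallow networks. The sketch below assumes fully connected layers; the two layer configurations are hypothetical and are not taken from the reviewed studies.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected network given [inputs, hidden..., outputs]."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical networks: 12 lagged inputs and a single output in both cases.
shallow = dense_param_count([12, 8, 1])            # one hidden layer of 8 units
deep = dense_param_count([12, 128, 128, 128, 1])   # three hidden layers of 128 units
```

Under these assumptions the shallow network has 113 trainable parameters and the deep one has 34,817, so a population-based search over the deep network's weights must explore a space with roughly 300 times more dimensions, whereas its hyperparameters (layers, neurons, learning rate) still form a search space of only a handful of dimensions.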
P1: What are the characteristics of the AI models in which NIOAs have been involved?
The AI models predominantly optimized by NIOAs are neural network-based. Among these, half (14 studies) involve shallow neural networks, while the other half focus on deep neural networks. The shallow neural networks include fully connected single hidden layer BPNNs [21]-[27], [30], [31], [36], radial basis function neural networks (RBFNNs) [20], [28], and wavelet neural networks [29]. For deep learning models, most studies involve LSTM architectures [38], [40], [41], [43], [48], [50] and fully connected deep BPNNs [39], [42], [45], [46]. Other deep neural networks explored include transformer neural networks [44], generative adversarial networks (GANs) [49], Deep Echo State Networks [51], and GNN [47]. Finally, NIOAs have also been applied to models based on SVM [10], [37], [52], [53], and GP [33]-[35].
P2: What type of NIOAs have been used to intervene in AI-based forecasting models?
In shallow learning, parameter pre-training has predominantly relied on GA [21], [22], [24], [25] and its variants, such as MEA [23], with occasional use of PSO [26]. For complete parameter training, GA [28], DE [27], [30], AIS [29], and the hybrid GA-PSO [20] have been employed. Additionally, GA has been applied to hyperparameter optimization [36] and input selection [31]. In deep learning models, GA has been extensively used for optimizing structural and training hyperparameters, either independently [40], in combination with other algorithms such as SS [41], BO [38], and success-history-based parameter adaptation for differential evolution (SHADE) [39], or within the NSGA-II multi-objective optimization framework [42]-[44]. Neuroevolutionary algorithms like NNSOA [45] and NEAT [46] have been applied to simultaneously optimize hyperparameters and trainable parameters. For parametric pre-training of deep networks, GA has also been employed [47], along with hybrids such as MDA [49] and SFS-WOA [50]. For optimization of SVM-based models, both GA [37] and swarm intelligence algorithms [52], [53] have been used, as well as hybridization of both types of metaheuristics [10].
P3: What metrics and models have been used to measure and compare the performance of models built with NIOAs?
Root mean squared error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE) are the most used metrics to assess prediction accuracy and error normalization in both shallow and deep models. Correlation coefficients and the Nash-Sutcliffe efficiency (NSE) provide additional performance insights. Shallow models are benchmarked against regression, ARIMA, and BPNN, while deep models are compared to support vector regression (SVR), ANN, and non-optimized LSTM. In these models, advanced metrics like R² and SEP assess goodness-of-fit and robustness, especially for LSTM, GAN, and Transformers. In SVM models, RMSE and MAPE are the main metrics, with specific measures like the bullwhip effect used in inventory forecasting.
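For reference, the most common metrics named above can be computed as in the following sketch, which uses their standard textbook definitions; the demand and forecast values are hypothetical. The `r2` function uses the form 1 - SSE/SST, which is also the expression used for the Nash-Sutcliffe efficiency.

```python
from math import sqrt


def rmse(actual, forecast):
    """Root mean squared error."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent; undefined if any actual value is zero."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def r2(actual, forecast):
    """Coefficient of determination computed as 1 - SSE / SST."""
    mean = sum(actual) / len(actual)
    sse = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    sst = sum((a - mean) ** 2 for a in actual)
    return 1.0 - sse / sst


# Hypothetical demand values and forecasts.
actual = [100.0, 120.0, 80.0, 110.0]
forecast = [90.0, 125.0, 85.0, 100.0]
scores = {"RMSE": rmse(actual, forecast), "MAE": mae(actual, forecast),
          "MAPE": mape(actual, forecast), "R2": r2(actual, forecast)}
```

RMSE and MAE are expressed in the units of demand, whereas MAPE and R² are scale-free, which is why studies usually report a combination of them when comparing models across datasets.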
P4: What is the performance of the models built with NIOAs in relation to the established models?
The reviewed studies provide compelling evidence that NIOAs significantly improve forecasting accuracy in shallow and deep neural network-based models, as well as SVM-based models. In neural models, this improvement is consistent across hyperparameter optimization, trainable parameter optimization, or a combination of both, with notable examples from studies [20], [45]. While the primary focus of most studies is on reducing forecast errors, some authors also address computational efficiency concerns, proposing strategies such as hybrid approaches (e.g., BO-GA [38], SS-GA [41], MDA [49]) that enhance convergence speed and efficiency compared to standalone methods. However, hybrids like NEAT-NCS increase execution time [46], and others, such as SFS-WOA, reduce training time but add pre-training steps, leaving overall efficiency uncertain [50]. Additional efficiency-oriented strategies include adaptive algorithms with automatic parameter tuning [20], [37], [39], transfer learning mechanisms between GA and PSO [20], and input variable selection [31].
P5: In which economic sectors have models built with NIOAs been applied, and what main problems have they been used to solve?
The primary sectors utilizing NIOAs are electricity, water distribution, manufacturing, retail, and cloud computing. Across sectors, common challenges in forecasting include managing non-linear dynamics, reducing overfitting, and improving accuracy in dynamic systems. Traditional models, such as regression, ARIMA, and MLR, often fail to capture non-linearities, while standalone machine learning models like ANN, SVM, and BPNN struggle with overfitting and limited adaptability. To address these issues, machine learning models and hybrid frameworks have been introduced. LSTM models enhance accuracy but face difficulties with overfitting and hyperparameter tuning, while hybrid approaches, such as LSTM-SVR and GA-DE integrations, improve non-linear modeling but encounter computational efficiency limitations. NIOAs play a critical role in overcoming these limitations by enhancing hybrid frameworks' predictive accuracy, adaptability, and
potentially computational efficiency. Techniques like PSO, GA, WOA, and DE optimize parameters and hyperparameters, addressing the shortcomings of traditional and standalone models. Sector-specific applications highlight these advancements: in electricity, NIOAs support dynamic forecasting for short-term and annual energy demands [33], [35]; in water distribution, they address agricultural and urban needs by managing non-linear patterns in daily and hourly forecasts [23], [44]; in retail and manufacturing, they tackle the bullwhip effect and refine e-commerce demand predictions [9], [34]; and in cloud computing, NIOAs enhance resource demand forecasting and workload optimization in highly dynamic environments [10], [27], [29].
3.3. Proposed conceptual model
This section introduces a new optimization model to address the key issue identified: optimizing trainable parameters in deep learning models. The model incorporates advanced tools like heuristic hybridization and adaptive parameter control, while addressing the main gap: the lack of evolutionary multitasking applications.
3.3.1. Foundational studies
This model adopts the multi-space evolutionary search for large-scale optimization [15], a variant of evolutionary multitasking optimization. It generates an auxiliary search space with simplified versions of the original space to ease the search process. Insights learned from the auxiliary space guide the original space search, enhancing effectiveness and efficiency, while the best results from the original space return to enrich the auxiliary search.
In addition, the model draws inspiration from low-bit quantization optimization [54] for constructing the auxiliary search space. Quantization, a deep network compression technique, discretizes the continuous variables that represent neural network weights, reducing the number of possible weight values and the bits required to store them, thereby simplifying optimization. Recent advances have shown that low-bit-width models can maintain high accuracy by applying quantization to both activations and weights [55]. The model is also influenced by metaheuristic hybridization [49], [50], and adaptive mechanisms for mutation and crossover control in GA [20], [39].
3.3.2. Model components
Model components are the key elements that define the search strategy, adaptive mechanisms, and hybrid techniques of the optimization framework.
− Original search space. This search space encompasses all possible values for the weights and biases of a deep neural network. This continuous space has dimensions equal to the total number of weights and biases in the network.
− Auxiliary quantized search space. This space discretizes the dimensions of the original space based on the range of the initial population. It allows generating values beyond the initial ranges but within feasible limits. The number of possible values per dimension is governed by the bit width (m-bit); higher m-bit values permit more possibilities, with the binary dimension (2-bit) representing the most extreme case.
− GA with adaptive mechanism. This search algorithm explores the discretized auxiliary space using mutation and crossover, guided by an adaptive mechanism that promotes aggressive exploration in the early stages and shifts to exploitation as fitness improves.
− SFS-WOA hybridization. This hybrid algorithm searches the original space using insights from the auxiliary space, combining the exploration strength of SFS with the refinement capabilities of WOA.
− Automatic granularity adjustment mechanism. This mechanism adjusts the m-bit in the auxiliary space, starting with low m-bits for efficient exploration of large regions and increasing the value during evolution for finer exploration of promising areas.
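The auxiliary quantized search space and its granularity adjustment can be illustrated with a short Python sketch. It is not the authors' implementation: it assumes that an m-bit width means 2^m evenly spaced levels per dimension, and the names `build_grid`, `encode`, and `decode`, the `margin` parameter, and the feasible `bounds` are all hypothetical. The sketch also assumes the initial population already lies within `bounds`.

```python
def build_grid(population, m_bits, margin=0.25, bounds=(-10.0, 10.0)):
    """Return one (low, step) pair per dimension for a uniform 2**m_bits-level grid.

    The grid covers the range of the initial population, widened by `margin`
    (hypothetical) and clipped to the assumed feasible `bounds`.
    """
    levels = 2 ** m_bits
    grid = []
    for d in range(len(population[0])):
        column = [individual[d] for individual in population]
        low, high = min(column), max(column)
        span = (high - low) or 1.0  # avoid a zero-width range
        low = max(bounds[0], low - margin * span)
        high = min(bounds[1], high + margin * span)
        grid.append((low, (high - low) / (levels - 1)))
    return grid


def encode(weights, grid, m_bits):
    """Map continuous weights to the nearest integer level, clamped to the grid."""
    top = 2 ** m_bits - 1
    return [min(top, max(0, round((w - low) / step)))
            for w, (low, step) in zip(weights, grid)]


def decode(codes, grid):
    """Map integer levels back to continuous weight values."""
    return [low + c * step for c, (low, step) in zip(codes, grid)]


# Two individuals with two weights each (illustrative values only).
population = [[0.0, -1.0], [1.0, 1.0]]
coarse = build_grid(population, m_bits=2, margin=0.0)  # 4 levels per dimension
fine = build_grid(population, m_bits=4, margin=0.0)    # 16 levels per dimension

codes = encode([0.4, 0.9], coarse, m_bits=2)
approx = decode(codes, coarse)
print(codes, approx)
```

Under this assumption, a network with D trainable parameters has (2^m)^D candidate points in the auxiliary space, so raising m by one bit doubles the number of levels in every dimension. Starting coarse and refining later is what keeps the early search cheap.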
3.3.3. Operational dynamics
The model employs multi-task optimization with transfer learning. The auxiliary space, driven by aggressive GA and low m-bit values, rapidly explores large regions and identifies promising areas. This information directs the more precise but less aggressive SFS-WOA algorithms in the original space to exploit these areas and refine the search for optimal solutions. The best candidates from the original space are then transferred back to the auxiliary space to adjust the m-bit and enhance exploration. Figure 4 illustrates the operation of the proposed model.
Figure 4. Schematic of the proposed model
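The operational dynamics of Section 3.3.3 can be sketched as a toy two-space loop in Python. This is a simplified illustration under several assumptions, not the proposed model itself: a sphere function stands in for a network's training loss, a Gaussian hill climber stands in for SFS-WOA (which is not implemented here), the mutation rate follows a hypothetical rule tied to relative loss improvement, and the bit width simply grows by one per cycle. All function and parameter names are illustrative.

```python
import random


def sphere(x):
    """Toy objective standing in for a network's training loss (an assumption)."""
    return sum(v * v for v in x)


def decode(codes, lo, hi, m_bits):
    step = (hi - lo) / (2 ** m_bits - 1)
    return [lo + c * step for c in codes]


def encode(x, lo, hi, m_bits):
    top = 2 ** m_bits - 1
    step = (hi - lo) / top
    return [min(top, max(0, round((v - lo) / step))) for v in x]


def adaptive_rate(first_best, current_best, high=0.5, low=0.02):
    """Mutation rate shrinks as the best loss improves on the first generation."""
    if first_best <= 0:
        return low
    return low + (high - low) * min(1.0, max(0.0, current_best / first_best))


def ga_step(pop, fitness, rate, top, rng):
    """One generation: truncation selection, uniform crossover, reset mutation."""
    ranked = sorted(pop, key=fitness)
    parents = ranked[: max(2, len(pop) // 2)]
    children = [ranked[0][:]]  # elitism: the best individual always survives
    while len(children) < len(pop):
        a, b = rng.sample(parents, 2)
        child = [rng.choice(pair) for pair in zip(a, b)]
        child = [rng.randint(0, top) if rng.random() < rate else g for g in child]
        children.append(child)
    return children


def refine(x, objective, rng, steps=50, sigma=0.1):
    """Placeholder continuous local search; this is NOT SFS-WOA."""
    best, best_f = x[:], objective(x)
    for _ in range(steps):
        cand = [v + rng.gauss(0.0, sigma) for v in best]
        f = objective(cand)
        if f < best_f:
            best, best_f = cand, f
    return best, best_f


def two_space_search(objective, dims=5, lo=-5.0, hi=5.0,
                     cycles=4, pop_size=20, gens=15, seed=0):
    rng = random.Random(seed)
    m_bits = 2  # start with a coarse auxiliary grid
    best_x = [rng.uniform(lo, hi) for _ in range(dims)]
    best_f = objective(best_x)
    for _ in range(cycles):
        top = 2 ** m_bits - 1

        def fit(codes, m=m_bits):
            return objective(decode(codes, lo, hi, m))

        # Transfer original -> auxiliary: seed the GA with the encoded incumbent.
        pop = [[rng.randint(0, top) for _ in range(dims)] for _ in range(pop_size - 1)]
        pop.append(encode(best_x, lo, hi, m_bits))
        first_best = min(fit(ind) for ind in pop)
        for _ in range(gens):
            rate = adaptive_rate(first_best, min(fit(ind) for ind in pop))
            pop = ga_step(pop, fit, rate, top, rng)
        # Transfer auxiliary -> original: refine the best quantized candidate.
        start = decode(min(pop, key=fit), lo, hi, m_bits)
        x, f = refine(start, objective, rng)
        if f < best_f:
            best_x, best_f = x, f
        m_bits += 1  # finer grid for the next cycle
    return best_x, best_f


best_x, best_f = two_space_search(sphere)
print(best_f)
```

The loop keeps the two transfers explicit: the incumbent from the continuous space is encoded into each new GA population, and the best quantized candidate is decoded as the starting point for the continuous refinement. A real implementation would replace `sphere` with a forecasting loss and `refine` with the SFS-WOA hybrid.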
3.3.4. Expected outcomes
The simultaneous search across two spaces, the use of multiple NIOAs, and the dynamic resizing of the auxiliary space via m-bit adjustments are expected to incur significant computational costs. However, the rapid convergence facilitated by the auxiliary space is anticipated to accelerate optimization in the original space while maintaining precision and avoiding local optima. If this acceleration outweighs the added overhead, it could substantially reduce execution times, a critical factor for practical applications of deep neural networks in demand forecasting across various industries.
3.4. Discussion
NIOAs are primarily applied to neural network optimization. In shallow networks, the focus is on parameter optimization, including pre-training [21]-[26] and full training [20], [27]-[30]. In deep networks, parameter optimization is rare due to high computational costs and is limited to neuroevolution [45], [46] or advanced pre-training methods [47], [49], [50]. Neuroevolution reduces complexity by progressively increasing network size, starting with a single layer, but remains resource-intensive. In contrast, hyperparameter optimization is more common in deep learning, with seven studies [38]-[44] applying techniques such as hybridization, adaptive mechanisms, and multi-objective optimization. In shallow networks, only one study [36] addresses hyperparameter tuning, as parameter optimization offers greater gains. This contrast reflects the higher practicality and impact of hyperparameter optimization in deep learning, given the difficulty of parameter-level optimization.
Despite their differences, shallow and deep networks share similar optimization strategies, especially hybridization. In shallow models, [20] applies a GA-PSO hybrid for complete RBFNN training, while deep learning studies use BO-GA and SS-GA for LSTM hyperparameter tuning [38], [41], and GA-DA and SFS-WOA for pre-training [10], [49]. Adaptive mechanisms are also common: [20] applies them to shallow parameter optimization, and [39] to deep hyperparameter tuning. Multi-objective optimization with NSGA-II is used in both shallow [31] and deep networks [42]-[44].
While the main goal of NIOAs is to improve model accuracy, several studies acknowledge their high computational cost, primarily due to intensive neural network evaluations. To address this, different efficiency-oriented strategies have been proposed. In shallow networks, studies [26], [31] highlight that selecting relevant input variables reduces model complexity and improves efficiency. [26] proposes grey relational analysis for input selection, while [31] uses a multi-objective evolutionary algorithm that jointly minimizes forecast error and selects inputs. Although these approaches are expected to improve efficiency, no quantitative validation is provided. Efficiency is also explored through the selection of specific NIOAs, with [23] finding MEA more efficient than GA for shallow network pre-training, and [29] showing that the water cycle algorithm (WCA) surpasses AIS in training efficiency, though at the cost of accuracy. Adaptive mechanisms in GA are another approach