Indonesian Journal of Electrical Engineering and Computer Science
Vol. 42, No. 1, April 2026, pp. 215–224
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v42.i1.pp215-224

Multi-model deep ensemble framework for early diagnosis of rare genetic disorders using genomic, phenotypic, and EHR data fusion

Shan Mahmood¹, Sayma Akter Trina¹, Arpita Saha Sukanna¹, Sabrina Zaman Esha¹, Md. Agdam Amin Adib¹, Md. Sanim Ahmed¹, Amirul Islam²
¹ Department of Computer Science and Engineering, American International University-Bangladesh, Dhaka, Bangladesh
² Department of Electrical and Electronic Engineering, BSRM School of Engineering, BRAC University, Dhaka, Bangladesh

Article Info
Article history: Received Aug 9, 2025; Revised Dec 13, 2025; Accepted Mar 4, 2026
Keywords: Deep learning; Genetic disorder; Healthcare; Hybrid model; Machine learning

ABSTRACT
Rare genetic disorders pose significant diagnostic challenges because of their low prevalence, heterogeneous manifestations, and the lack of readily available datasets. This study systematically assesses various supervised and unsupervised machine learning methods for the early diagnosis of rare genetic disorders, based on a multi-center pediatric dataset of 2,434 anonymized records enriched with demographic, clinical, and laboratory variables. Genomic, phenotypic, and EHR variables were integrated into a unified feature matrix, allowing all modalities to be analyzed jointly within each machine learning (ML) model. Following rigorous pre-processing steps, including the removal of non-informative identifiers, imputation and encoding of categorical features, and normalization of numerical predictors, five classification frameworks were implemented: logistic regression (LR), random forest (RF), a one-dimensional convolutional neural network (CNN), a hybrid CNN long short-term memory (LSTM) model, and a stacked ensemble of RF and XGBoost. Model performance was evaluated on an independent test set using accuracy, precision, recall, and F1-score. While LR and the CNN baseline achieved F1-scores of 0.9090 and 0.8572, respectively, the tree-based RF and the hybrid deep learning (DL) model performed substantially better: RF achieved an F1-score of 0.9565, and the CNN+LSTM hybrid achieved 0.9611. The RF+XGB ensemble achieved the highest diagnostic accuracy (98.77%) with balanced precision (0.9879) and recall (0.9877), illustrating its superior capacity for capturing complicated, non-linear feature interactions and coping with data imbalance. The results show that bagging and boosting algorithms in combination provide a strong and interpretable framework for efficient pre-screening of rare genetic disorders. These ensemble techniques have the potential to enhance clinical practice by flagging high-risk cases for verification and facilitating early therapeutic intervention.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Amirul Islam
Department of Electrical and Electronic Engineering, BSRM School of Engineering, BRAC University
Dhaka, Bangladesh
Email: amirul.islam@bracu.ac.bd

Journal homepage: http://ijeecs.iaescore.com

1. INTRODUCTION
Rare genetic diseases typically affect fewer than 4 to 5 individuals in every 10,000. Although there is no uniform international criterion, rare diseases (RDs) are usually defined as those affecting fewer than 4–5 out of 10,000 individuals [1]. Considered as a whole, RDs can be regarded as a common event, with 7,265, with an estimated cumulative prevalence of 3.5–5.9% and affecting more than 400 million people worldwide [2]. Most RDs appear to be caused or modified by genetic factors; up to 80% of them are thought to have a genetic etiology [3]. This points to a significant need for rapid and precise diagnosis, so that early treatment, accurate genetic counseling, and improved patient care can be provided.

In spite of progress in gene testing and medical diagnosis, achieving a firm diagnosis remains very challenging. Diverse symptoms and the infrequent incidence of some syndromes lead to protracted diagnostic odysseys, a high rate of misdiagnosis, and postponed treatment. Conventional methods that rely on sequential biochemical assays, single-gene tests, and specialist opinion generally cannot handle large numbers of cases, recognize problems effectively, or act fast enough to decipher complicated gene-symptom correlations at scale. In addition, the absence of large, well-labeled datasets from multiple centers and the large variation in class sizes make conventional analysis methods difficult to apply.

Even though rare genetic diseases are individually uncommon, together they affect many people around the world. These diseases are hard to diagnose because many doctors have little experience with them and sufficient data are not always available. Machine learning (ML) can spot patterns in a person's genes and symptoms, helping doctors identify the likely illness more quickly and accurately. One good example is DeepGestalt, which applies deep learning (DL) to facial images to find signs of over 215 genetic conditions. The correct answer appears among its top 10 guesses about 91% of the time, and in some cases it has performed better than doctors [4]. Another tool is AlphaMissense, made by DeepMind, which assesses small changes in DNA called missense mutations. With about 90% accuracy, it helps scientists determine which changes might cause disease, so they can focus on the most important ones [5]. There is also SHEPHERD, from the Zitnik Lab, which uses patient data and DL to find genes that might be causing a disease and to match patients with similar cases. This tool has been of considerable help in the Undiagnosed Diseases Network [6].

Since labeled data are often scarce in rare disease research, other learning methods are also used. Sun et al. created a system that combines unsupervised learning with techniques such as self-distillation and pseudo-labels (labels guessed by the model). It is especially useful for diagnosing diseases from images [7]. Li et al. [8] used a generative adversarial network (GAN), which lets a model learn from large amounts of unlabeled data. Their method worked better than standard ones and showed that GANs are well suited to detecting rare diseases.

Recently, researchers have started combining different kinds of data to make models more accurate. For example, Wu et al. developed GestaltMML, which uses a Transformer model to bring together facial pictures, patient information, and doctors' notes. This helps the system notice both visible and hidden symptoms [9]. Another tool is Face2Gene from FDNA. Table 1 summarizes previous research on rare genetic disorders, listing each model with its reported performance.

Despite this significant progress, prior literature on the detection of rare genetic disorders still suffers from several key limitations. Most deep-learning models, including DeepGestalt and GestaltMML, rely on large curated image data and are hard to generalize to settings where facial or phenotypic data are scarce. Other methods, such as AlphaMissense and SHEPHERD, are powerful but focus narrowly on genomic variants and often miss important clinical and laboratory features that inform diagnosis. Semi-supervised and GAN-based approaches bring improvements in low-label environments, but most of them may yield unstable results and need careful tuning. Most importantly, very few studies have combined these developments by integrating genomic, phenotypic, and EHR data within a single fused framework, and cross-center validation is far too often lacking, which limits real-world clinical applicability. These gaps indicate a great need for a unified, multi-modal, and robust diagnostic approach, an issue our study directly addresses.

To address these gaps, we provide a full ML pipeline applied to a uniform set of 2,434 anonymized children's records. The records contain details on the patients' background, health, and laboratory tests.
Recent years have seen a surge of interest in the application of artificial intelligence (AI) and, in particular, ML algorithms because of their potential to reveal intricate patterns in genetic data [10]. The accuracy of RD diagnosis has increased as a result of these algorithms' demonstrated ability to learn from and act upon massive, diverse datasets in order to derive novel biological insights [11], [12]. The role of AI/ML algorithms in the diagnosis and prognosis of RDs using genomic data has also been examined [3].
Genetic disorders result from abnormalities in DNA; each is usually rare, but taken together, they are a common cause of disease throughout the world. Symptoms are varied and often overlapping, and clinical diagnosis is frequently very slow. Early treatment is usually essential for the best outcomes, yet traditional methodologies can be limited and sometimes inconclusive. Therefore, reliable data-driven models are urgently needed to support faster and more accurate identification of genetic disorders.

Our process encompasses thorough data preparation (removal of personal information, imputation of missing data, label encoding, and normalization) and the development of five methods for classifying the data. Among these, the random forest (RF)+XGBoost ensemble emerged as the best-performing model, achieving 98.77% accuracy and an F1-score of 0.9877 by effectively capturing complex, non-linear feature interactions and mitigating class imbalance. Our study integrates genomic, phenotypic, and EHR features into a single fused model and evaluates five traditional, DL, and ensemble approaches to identify the most reliable diagnostic framework.
The main contributions are as follows:
− Computed and analyzed feature importance to provide meaningful insights for clinical practice, enabling early detection of and intervention in rare genetic diseases.
− Designed and implemented our two proposed hybrid architectures (CNN+LSTM and RF+XGBoost) to detect rare genetic disorders. The suitability of the ensemble model was assessed through accuracy, F1-score, precision, and recall.

Table 1. Summary of rare disease detection models and their performance
Ref. | Model and method | Data and task | Reported performance
--- | --- | --- | ---
[4] | DeepGestalt: CNN-based facial phenotype framework quantifying similarities to genetic syndromes | 26,000+ patient cases across 215 syndromes; identify syndrome from unconstrained 2D facial images | 91% Top-10 accuracy; outperformed clinical experts in three experiments
[5] | AlphaMissense: Unsupervised language model fine-tuned with structural context and evolutionary conservation | Proteome-wide missense variant pathogenicity prediction across the human proteome | >90% precision for known clinical impact of variants
[6] | SHEPHERD: Few-shot DL over a biomedical knowledge graph (diseases, phenotypes, genes) | 465 real patients (299 diseases) from the Undiagnosed Diseases Network; tasks: causal gene discovery, “patients-like-me” retrieval, phenotype characterization | Causal genes ranked at 3.52 on average
[7] | Hybrid URL + Pseudo-Label Self-Distillation: Contrastive unsupervised representation learning integrated with pseudo-label supervised self-distillation | Rare skin lesion classification on ISIC 2018 (few-shot setting with base dataset of common diseases and controls) | Substantially outperforms existing few-shot learning methods
[8] | Semi-supervised GAN (feature-matching + pull-away term) for rare disease detection | IQVIA longitudinal claims: 5,923 positives, 17,769 matched negatives, 1.17 M unlabeled (test: 23,246 positives of 1.77 M) | 34.18% PR-AUC (vs. LR 29.04%, NN 28.95%, RF 10.51%)

2. METHOD
In this section, we describe the data and the step-by-step process followed in our study. First, we describe the dataset in terms of its source, nature, and relevant features. We then explain the pre-processing steps that convert the data into an ML model-ready format. Next, we describe the supervised and hybrid ML models used, together with their architectures. Finally, we specify the evaluation metrics used to compare the performance of the implemented models.

2.1. Dataset description
For our project, we used the “Genetic Disorder Dataset” from Kaggle. The dataset is a retrospective, multi-center cohort of 2,434 anonymized pediatric patient records (age range: 0–14 years; mean ± SD: 6.99 ± 4.38 years) from four tertiary care centers. Each record is assigned a unique, de-identified patient code and annotated with minimal demographic metadata (institution name and location) to preserve provenance without violating confidentiality [13]. To supplement these data, the dataset includes quantitative lab tests, red and white blood cell counts expressed in native units, and binary blood-test outcomes (normal or inconclusive, with missing values represented as -99) [14]. Five binary symptom flags record the occurrence or non-occurrence of primary clinical features, and the principal outcome measure “Genetic Disorder” (1 = present risk; 0 = not present) is complemented by a free-text field stating the category of disorder. Figure 1 shows the label distribution of our dataset, where 0 is no disorder and 1 is disorder.

Figure 1. Genetic disorder label

2.2. Dataset pre-processing
In our dataset, non-informative identifiers were removed, missing values were imputed, categorical variables were label-encoded, and numerical features were standardized to prepare the dataset for modeling. These steps ensured a clean, consistent feature space suitable for all machine-learning models without altering the underlying clinical patterns.
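
The three transformations named above (imputation, label encoding, standardization) can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the paper does not say which imputation statistic was used, so the median is an assumption, and the column names and values are hypothetical toy data rather than records from the dataset.

```python
from statistics import mean, median, pstdev

MISSING = -99  # sentinel the dataset description uses for missing values


def impute_numeric(values):
    """Replace the missing sentinel with the median of the observed values.

    Median imputation is an assumption; the paper does not name the statistic.
    """
    observed = [v for v in values if v != MISSING]
    fill = median(observed)
    return [fill if v == MISSING else v for v in values]


def label_encode(values):
    """Map each distinct category to an integer, in sorted order."""
    mapping = {cat: idx for idx, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical toy columns, not taken from the real dataset.
white_cells = [7.2, MISSING, 6.8, 9.1, 8.0]
blood_test = ["normal", "inconclusive", "normal", "normal", "inconclusive"]

white_cells_clean = standardize(impute_numeric(white_cells))
blood_test_codes, blood_test_map = label_encode(blood_test)
```

In a train/test setting, the fill value, the category mapping, and the mean and standard deviation would normally be estimated on the training split only and then reused on the test split, so that no test information leaks into the features.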

2.3. Model
In our paper, we applied three single ML and DL models and two hybrid models to detect rare genetic disorders. Figure 2 shows all the models we applied, including our proposed model.

Figure 2. Applied models overview

2.3.1. Logistic regression
The logistic regression (LR) model serves as a transparent baseline detector of rare genetic disorders. Input features, once preprocessed and encoded, are passed through a single dense layer that computes a weighted sum of all predictors [15]. This linear combination is then passed through a logistic activation function to output the probability that a genetic anomaly is present. The model is trained by maximum-likelihood estimation with gradient-based optimization, using L2 regularization to limit coefficient size and prevent overfitting [16]. Its computational tractability ensures rapid convergence, minimal memory usage, and reproducible performance in a wide variety of computing environments [17]. As a first-line model, it offers a performance benchmark against which more advanced architectures can be rigorously compared.
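
The weighted sum, logistic activation, and L2-penalized maximum-likelihood training described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the learning rate, penalty strength, epoch count, and toy records are all assumed values chosen for the example.

```python
import math


def sigmoid(z):
    """Logistic activation: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Weighted sum of the predictors passed through the logistic function."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(X, y, lr=0.1, l2=0.01, epochs=500):
    """Batch gradient descent on the L2-penalized negative log-likelihood."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The L2 term shrinks coefficients toward zero; the bias is not penalized.
        weights = [w - lr * (g / n + l2 * w) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical, already-standardized toy records (two features each).
X = [[-1.2, -0.8], [-0.9, -1.1], [1.0, 0.7], [1.3, 1.2]]
y = [0, 0, 1, 1]
weights, bias = train_logistic(X, y)
```

Calling `predict_proba(weights, bias, record)` on a new standardized record then returns the estimated probability of a disorder, which can be thresholded (for instance at 0.5) to obtain a label.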

2.3.2. Random forest
An RF classifier is a collection of decision trees. Each tree is trained on a bootstrap sample of the data, and trees grow to a predetermined maximum depth or until leaf-size constraints are met, balancing the trade-off between bias and variance [18]. The final prediction is obtained by majority vote across all trees, and class-probability predictions are calculated by averaging individual tree votes. The model discovers complex, nonlinear interactions between clinical, genetic, and environmental variables without explicit feature engineering. Parallelized training and inference enable the RF to scale to large pediatric cohorts [19]. The RF model takes advantage of the ensemble of decision trees to spot nonlinear interactions among clinical and genetic variables. Trees are grown until either a minimum leaf-size or a maximum depth threshold is reached to ensure diversity of the ensemble [20]. At inference, trees cast votes on the existence of a genetic disorder; these votes are tallied by majority (or averaged for probability estimation) [21]. Its modular structure allows easy scaling to large cohorts through distributed tree construction [22].
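
The two aggregation rules described above, together with bootstrap sampling, can be illustrated without fitting any trees. The sketch below assumes each tree has already produced a 0/1 vote; the vote list, the record identifiers, and the tie-breaking rule are hypothetical choices made for the example.

```python
import random
from collections import Counter


def bootstrap_sample(records, rng):
    """Draw len(records) items with replacement, as done once per tree."""
    return rng.choices(records, k=len(records))


def majority_vote(tree_votes):
    """Return the class predicted by most trees (ties go to the smaller label)."""
    counts = Counter(tree_votes)
    best = max(counts.values())
    return min(label for label, c in counts.items() if c == best)


def vote_fraction(tree_votes, positive=1):
    """Estimate the class probability as the share of trees voting positive."""
    return sum(1 for v in tree_votes if v == positive) / len(tree_votes)


# Hypothetical votes from seven trees for one patient record.
votes = [1, 0, 1, 1, 0, 1, 1]
label = majority_vote(votes)
probability = vote_fraction(votes)

# Hypothetical record identifiers; a fixed seed keeps the draw reproducible.
sample = bootstrap_sample(["p1", "p2", "p3", "p4"], random.Random(0))
```

Because each tree sees a different bootstrap sample, individual trees disagree on borderline records, and it is the averaging over those disagreements that reduces variance.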

2.3.3. CNN
The CNN treats each patient's record as a feature sequence so that local patterns can be extracted from neighboring sets of variables. The architecture is composed of multiple convolutional blocks, each with a one-dimensional convolutional layer of small kernel size, batch normalization, ReLU activation, and max-pooling [23]. A global average-pooling layer then reduces each feature map to a scalar [24]. Weight sharing and local connectivity reduce the total parameter count, facilitating generalization on moderate-sized clinical datasets [25]. As the input tensor passes through this sequence of convolutional blocks, the network learns hierarchical representations step by step [26]. Finally, a dropout-regularized fully connected layer computes the disorder probability with a sigmoid activation. This layering automatically learns complex inter-feature relationships [26].
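
To make the data flow of one convolutional block concrete, the sketch below applies a single one-dimensional filter, ReLU, max-pooling, and global average pooling to one record. It is a simplified illustration: batch normalization and dropout are omitted, only one filter is used, and the record values and kernel weights are hypothetical rather than learned.

```python
def conv1d(seq, kernel, bias=0.0):
    """Valid 1D cross-correlation: slide the kernel over neighboring features."""
    k = len(kernel)
    return [
        sum(kernel[j] * seq[i + j] for j in range(k)) + bias
        for i in range(len(seq) - k + 1)
    ]


def relu(seq):
    """Zero out negative activations."""
    return [max(0.0, v) for v in seq]


def max_pool(seq, size=2):
    """Non-overlapping max-pooling; a trailing partial window is dropped."""
    return [max(seq[i:i + size]) for i in range(0, len(seq) - size + 1, size)]


def global_average_pool(seq):
    """Reduce one feature map to a single scalar."""
    return sum(seq) / len(seq)


record = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5]  # hypothetical standardized features
kernel = [1.0, 0.0, -1.0]                  # hypothetical small kernel of size 3

feature_map = max_pool(relu(conv1d(record, kernel)))
summary = global_average_pool(feature_map)
```

One caveat worth keeping in mind: a convolution assumes that adjacent positions are related, so on tabular records the result depends on the column order, which is arbitrary unless the features are deliberately arranged.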

2.3.4. Hybrid CNN+LSTM
The CNN+LSTM model integrates convolutional feature learning and recurrent sequence modeling to learn local patterns and long-range dependencies across feature windows. The early convolutional blocks (Conv1D → BatchNorm → ReLU → MaxPool) [27] are analogous to an independent CNN and map contiguous subsets of features into a lower-dimensional feature sequence. This sequence is passed through an LSTM layer whose hidden states capture information from all time steps. A dense output layer with sigmoid activation provides the final probability [27]. The LSTM's gating behavior enables selective memory of key features, enhancing sensitivity to atypical event patterns. Empirical experiments demonstrate that this two-stage approach tends to surpass purely convolutional or purely recurrent networks when it comes to extracting both sequence-level and motif-level information [28]. By combining the power of the convolutional filters with the LSTM's ability to retain context, the hybrid is particularly effective at identifying diffuse characteristics of uncommon genetic disorders [29].
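
The gating behavior mentioned above can be illustrated with a single LSTM cell that consumes a short feature sequence such as the output of a Conv1D block. The sketch uses scalar inputs and a scalar hidden state for readability; all weights, the input sequence, and the output-layer parameters are hypothetical, whereas a trained model would learn vector-valued versions of them.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step for a scalar input and a scalar hidden state."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g   # keep part of the old memory, add new content
    h = o * math.tanh(c)     # expose a gated view of the memory
    return h, c


def run_lstm(sequence, p):
    """Feed a feature sequence through the cell; return the last hidden state."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, p)
    return h


# Hypothetical weights; a trained model would learn these values.
params = {k: 0.5 for k in ("wf", "uf", "wi", "ui", "wo", "uo", "wg", "ug")}
params.update({"bf": 1.0, "bi": 0.0, "bo": 0.0, "bg": 0.0})

cnn_features = [0.0, 0.5]  # e.g. the pooled output of a convolutional block
h_last = run_lstm(cnn_features, params)

# Dense output layer with a hypothetical weight of 1.0 and bias of 0.0.
probability = sigmoid(1.0 * h_last + 0.0)
```

The forget gate `f` decides how much of the previous cell state survives, which is the mechanism behind the selective memory of key features described in the text.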

2.3.5. Hybrid random forest and gradient boosting
To enhance the performance and stability of classification on our data, we propose an ensemble model combining RF and gradient boosting (GB) classifiers with a soft voting strategy. Ensemble learning is a widely used approach to strengthen predictive capacity by combining the strengths of several learners [30], [31]. In our method, the RF and GB outputs are combined through their estimated class probabilities, and the final label is decided from the averaged probabilities (soft voting). RF is a collection of decision trees in which each tree casts a vote toward the final prediction. Its strengths are its robustness to overfitting, its ability to learn non-linear relationships, and its ability to handle large datasets. GB, on the other hand, builds learners sequentially, and every new learner focuses on the errors of the existing ones. It is renowned for its good predictive performance but is susceptible to overfitting and sensitive to parameter tuning. By fusing the two models, we seek to leverage the diversity and complementarity of their learning paradigms: RF is known to offer stability and variance reduction, whereas GB is aimed at bias correction and refined learning.
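
Soft voting, as described above, reduces to averaging the two models' class-probability vectors and taking the most probable class. The sketch below assumes equal weights for RF and GB, which the paper does not state, and uses hypothetical probabilities for one record.

```python
def soft_vote(prob_rf, prob_gb, weights=(0.5, 0.5)):
    """Average two models' class-probability vectors and pick the top class."""
    w_rf, w_gb = weights
    total = w_rf + w_gb
    averaged = [(w_rf * a + w_gb * b) / total for a, b in zip(prob_rf, prob_gb)]
    label = max(range(len(averaged)), key=averaged.__getitem__)
    return label, averaged


# Hypothetical probabilities [P(no disorder), P(disorder)] for one record.
rf_probs = [0.40, 0.60]
gb_probs = [0.70, 0.30]
label, averaged = soft_vote(rf_probs, gb_probs)
```

In this toy case the two models disagree on the hard label (RF favors class 1, GB favors class 0), so a hard majority vote between two models would be tied; soft voting resolves it in favor of the more confident model, which is one reason to average probabilities rather than labels.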

2.4. Evaluation metrics
To rigorously quantify and compare the diagnostic accuracy of each classifier, we use the confusion matrix and four summary measures derived from it: accuracy, precision, recall, and F1-score. The confusion matrix underpins our evaluation by tabulating model predictions against ground-truth labels in the binary setting. It distinguishes between true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN), thus showing whether errors stem from false positives or false negatives [32].
Accuracy is the proportion of all instances that are correctly classified and is an intuitive estimate of the overall correctness of the model. Precision is the fraction of correctly predicted disorder cases among all predicted positives. High precision helps limit redundant follow-up tests caused by false alarms. Recall measures how well the model identifies actual disorder cases among all real positives [33]. The F1-score combines recall and precision into a single scalar through their harmonic mean, yielding a balanced measure that is well suited to class imbalance [33].
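
The four measures follow directly from the TP, FP, FN, and TN counts. The sketch below computes them for the positive (disorder) class on a small set of hypothetical labels; whether the paper reports positive-class or averaged scores is not stated, so this should be read as an illustration of the definitions only.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def summary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1; undefined ratios fall back to 0.0."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for ten records (1 = disorder, 0 = no disorder).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = summary_metrics(*confusion_counts(y_true, y_pred))
```

Here one missed case (a false negative) and one false alarm (a false positive) lower precision and recall equally, while accuracy stays higher because the true negatives dominate the count.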

3. RESULTS AND DISCUSSION
An extensive comparative study was carried out to compare the performance of five machine-learning setups in predicting rare genetic diseases from intricate genomic and clinical datasets. The models were a linear LR classifier, a bagged RF ensemble, a convolutional neural network (CNN), a CNN+LSTM network, and a stacked ensemble of RF with XGBoost (RF+XGB). Performance was evaluated on an independent test set, with accuracy, precision, recall, and F1-score as the primary metrics.
LR achieved a baseline accuracy of 90.91% and an F1-score of 0.9090, reflecting the inability of linear decision boundaries to model the intricate, non-linear relationships inherent to rare disease genomics. The CNN model, which was designed for local sequence motif identification, achieved an accuracy of 85.71% (F1 = 0.8572), reflecting its relative lack of effectiveness when transferred to tabular formats of genetic variants without significant domain-specific architectural modification or massive data augmentation. RF showed a marked improvement over LR and CNN, with 95.65% accuracy and an F1-score of 0.9565. This boost is a testament to the efficacy of decision tree ensembles at learning intricate feature interactions and mitigating variance through bootstrap aggregating. The CNN+LSTM hybrid architecture, which combines convolutional filters for motif capture with recurrent layers for modeling sequence dependence, went a step further with 96.10% accuracy and an F1-score of 0.9611. While the gain over RF was modest, it was statistically significant, indicating that the addition of ordered or sequential patterns, such as variant phasing or longitudinal clinical measures, yields additional predictive value.
The best results were obtained by the RF+XGB ensemble, which posted excellent metrics across the board: 98.77% accuracy, 0.9879 precision, 0.9877 recall, and an F1-score of 0.9877. These findings represent a roughly 2.7-percentage-point improvement over CNN+LSTM and a 3.1-point improvement over RF alone, indicating the ensemble's better discriminative power in the rare-disease setting. Table 2 lists each applied algorithm with its performance metrics (accuracy, precision, recall, F1-score). RF+XGB achieved better results than the other algorithms.

Table 2. Performance comparison of different algorithms
Algorithm | Accuracy | Precision | Recall | F1-score
--- | --- | --- | --- | ---
LR | 0.9091 | 0.9093 | 0.9091 | 0.9090
RF | 0.9565 | 0.9578 | 0.9565 | 0.9565
CNN | 0.8571 | 0.8575 | 0.8571 | 0.8572
CNN + LSTM | 0.9610 | 0.9614 | 0.9610 | 0.9611
Random forest + XGBoost (RF+XGB) | 0.9877 | 0.9879 | 0.9877 | 0.9877
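
The percentage-point gains quoted in the text can be checked directly from the accuracy column of Table 2. The short sketch below copies those five values and computes the differences and the ranking; the function and variable names are illustrative only.

```python
# Accuracy values copied from Table 2 (as fractions).
accuracy = {
    "LR": 0.9091,
    "RF": 0.9565,
    "CNN": 0.8571,
    "CNN+LSTM": 0.9610,
    "RF+XGB": 0.9877,
}


def gain_in_points(table, model, baseline):
    """Difference between two accuracies, in percentage points."""
    return round((table[model] - table[baseline]) * 100, 2)


ranking = sorted(accuracy, key=accuracy.get, reverse=True)
gain_over_hybrid = gain_in_points(accuracy, "RF+XGB", "CNN+LSTM")
gain_over_rf = gain_in_points(accuracy, "RF+XGB", "RF")
```

The two gains come out at about 2.67 and 3.12 percentage points, consistent with the rounded figures of 2.7 and 3.1 given in the text. It is also worth noting that recall equals accuracy in every row of Table 2, which is the pattern produced when per-class recall is averaged with weights proportional to class size.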

The findings are important because the RF+XGB model delivers very reliable performance for early detection of rare genetic disorders, reaching an accuracy of 98.77 percent with a strong overall balance between precision and recall. This high performance indicates that ensemble learning can underpin faster and more accurate clinical screening. Future work could extend this study by adding genomic sequencing data, testing the model on larger multi-center datasets, and using explainable AI tools to understand feature importance. Key experiments that should be done include external validation, ablation studies, and robustness testing under class imbalance. The main takeaway is that ensemble-based models provide a practical and powerful basis for improving early-stage diagnosis of rare diseases.
Although the CNN+LSTM model slightly outperformed RF, tree-based methods, particularly the RF+XGB ensemble, still provided the strongest overall performance, indicating that ensemble strategies capture nonlinear interactions more effectively than single deep models. Figure 3 shows a point plot of the performance metrics for all the algorithms, while Figure 4 shows the confusion matrix for RF+XGB.

Figure 3. Point plot of metrics by various algorithms (algorithms: Logistic Regression, Random Forest, CNN, CNN + LSTM, RF + XGBoost; score axis: 0 to 1; series: Accuracy, Precision, Recall, F1-Score)

Figure 4. Confusion matrix of proposed algorithm (RF+XGB)

4. CONCLUSION
This research presents a holistic assessment of various ML strategies for early diagnosis of rare genetic diseases by comparing conventional linear models, deep-learning architectures, and ensemble classifiers on intricate genomic and clinical data. The findings indicate that ensemble tree methods are the best predictors, with an accuracy of 98.77% and an F1-score of 0.9877 achieved by the RF+XGB model. The RF+XGB model outperformed LR (accuracy: 90.91%), CNN (accuracy: 85.71%), and the hybrid CNN+LSTM network (accuracy: 96.10%). The improvement provided by the RF+XGB ensemble is due to its two inherent strengths: the variance-reducing bagging of RF and the bias-reducing, regularized gradient-boosting mechanism of XGBoost. The hybrid model effectively balances the risks of underfitting and overfitting. In addition, the ensemble remains interpretable.
In conclusion, our results demonstrate that the RF+XGB ensemble is a robust and interpretable basis for early diagnosis of rare genetic disorders in complicated genomic and clinical datasets, offering superior predictive reliability and strong potential for integration into modern intelligent healthcare and IoT-supported diagnostic systems. This work also contributes to intelligent computing and healthcare IoT systems by providing a reliable data-driven diagnostic framework. The ensemble model can be incorporated into smart clinical platforms for real-time screening and decision support. In general, the approach strengthens the link between ML, healthcare automation, and modern computational system design.

FUNDING INFORMATION
This research was funded by Shan Mahmood, Sabrina Zaman Esha, and Arpita Saha Sukanna, who acquired the financial support necessary to carry out the study.

AUTHOR CONTRIBUTIONS STATEMENT
This journal uses the Contributor Roles Taxonomy (CRediT) to recognize individual author contributions, reduce authorship disputes, and facilitate collaboration.

Name of Author: C, M, So, Va, Fo, I, R, D, O, E, Vi, Su, P, Fu
Shan Mahmood: ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Sayma Akter Trina: ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Arpita Saha Sukanna: ✓ ✓ ✓ ✓ ✓
Sabrina Zaman Esha: ✓ ✓ ✓ ✓
Md. Agdam Amin Adib: ✓ ✓
Md. Sanim Ahmed: ✓
Amirul Islam: ✓ ✓ ✓ ✓ ✓

C: Conceptualization; M: Methodology; So: Software; Va: Validation; Fo: Formal Analysis; I: Investigation; R: Resources; D: Data Curation; O: Writing - Original Draft; E: Writing - Review and Editing; Vi: Visualization; Su: Supervision; P: Project Administration; Fu: Funding Acquisition

CONFLICT OF INTEREST STATEMENT
The authors state no conflict of interest.

DATA AVAILABILITY
Data supporting this study are available from the corresponding author upon request.

REFERENCES
[1] T. Richter et al., “Rare disease terminology and definitions—a systematic global review: report of the ISPOR rare disease special interest group,” Value in Health, vol. 18, no. 6, pp. 906–914, Sep. 2015, doi: 10.1016/j.jval.2015.05.008.
[2] S. N. Wakap et al., “Estimating cumulative point prevalence of rare diseases: analysis of the Orphanet database,” European Journal of Human Genetics, vol. 28, no. 2, pp. 165–173, 2020, doi: 10.1038/s41431-019-0508-0.
[3] S. Brasil, C. Pascoal, R. Francisco, V. dos Reis Ferreira, P. A. Videira, and G. Valadão, “Artificial intelligence (AI) in rare diseases: is the future brighter?” Genes, vol. 10, no. 12, p. 978, 2019, doi: 10.3390/genes10120978.
[4] Y. Gurovich et al., “DeepGestalt-identifying rare genetic syndromes using deep learning,” arXiv preprint arXiv:1801.07637, Jan. 2018, doi: 10.48550/arXiv.1801.07637.
[5] J. Cheng et al., “Accurate proteome-wide missense variant effect prediction with AlphaMissense,” Science, vol. 381, no. 6664, p. eadg7492, Sep. 2023, doi: 10.1126/science.adg7492.
[6] E. Alsentzer et al., “Deep learning for diagnosing patients with rare genetic diseases,” medRxiv, Dec. 2022, doi: 10.1038/s41746-025-01749-1.
[7] J. Sun, D. Wei, K. Ma, L. Wang, and Y. Zheng, “Unsupervised representation learning meets pseudolabel supervised self-distillation: A new approach to rare disease classification,” in Proc. International Conference on Medical Image Computing and Computer-Assisted Intervention, Strasbourg, France, Sep. 2021, pp. 519–529, doi: 10.48550/arXiv.2110.04558.
[8] W. Li, Y. Wang, Y. Cai, C. Arnold, E. Zhao, and Y. Yuan, “Semi-supervised rare disease detection using generative adversarial network,” arXiv preprint arXiv:1812.00547, Dec. 2018, doi: 10.48550/arXiv.1812.00547.
[9] D. Wu et al., “GestaltMML: Enhancing rare genetic disease diagnosis through multimodal machine learning combining facial images and clinical texts,” arXiv preprint arXiv:2312.15320, Apr. 2024, doi: 10.48550/arXiv.2312.15320.
[10] E. Routhier and J. Mozziconacci, “Genomics enters the deep learning era,” PeerJ, vol. 10, p. e13613, Jun. 2022, doi: 10.7717/peerj.13613.
[11] J. Schaefer, M. Lehne, J. Schepers, F. Prasser, and S. Thun, “The use of machine learning in rare diseases: a scoping review,” Orphanet Journal of Rare Diseases, vol. 15, no. 1, p. 145, Jun. 2020, doi: 10.1186/s13023-020-01424-6.
[12] S. T. Setty, M.-P. Scott-Boyer, T. Cuppens, and A. Droit, “New developments and possibilities in reanalysis and reinterpretation of whole exome sequencing datasets for unsolved rare diseases using machine learning approaches,” International Journal of Molecular Sciences, vol. 23, no. 12, p. 6792, Jun. 2022, doi: 10.3390/ijms23126792.
[13] A. Raza et al., “Predicting genetic disorder and types of disorder using chain classifier approach,” Genes, vol. 14, no. 1, p. 71, Dec. 2022, doi: 10.3390/genes14010071.
[14] N. Chaplot, D. Pandey, Y. Kumar, and P. S. Sisodia, “A comprehensive analysis of artificial intelligence techniques for the prediction and prognosis of genetic disorders using various gene disorders,” Archives of Computational Methods in Engineering, vol. 30, no. 5, pp. 3301–3323, Jun. 2023, doi: 10.1007/s11831-023-09904-1.
[15] S. Domínguez-Almendros, N. Benítez-Parejo, and A. R. Gonzalez-Ramirez, “Logistic regression models,” Allergologia Et Immunopathologia, vol. 39, no. 5, pp. 295–305, Sep. 2011, doi: 10.1016/j.aller.2011.05.002.
[16] J. M. Hilbe, Logistic regression models, Chapman and Hall/CRC, May 2009.
[17] T. G. Nick and K. M. Campbell, Logistic regression. Totowa, NJ: Humana Press, Jan. 2007.
[18] S. J. Rigatti, “Random forest,” Journal of Insurance Medicine, vol. 47, no. 1, pp. 31–39, 2017, doi: 10.17849/insm-47-01-31-39.1.
[19] M. Belgiu and L. Drăguţ, “Random forest in remote sensing: A review of applications and future directions,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 114, pp. 24–31, Apr. 2016, doi: 10.1016/j.isprsjprs.2016.01.011.
[20] A. Parmar, R. Katariya, and V. Patel, “A review on random forest: An ensemble classifier,” in Proc. International Conference on Intelligent Data Communication Technologies and Internet of Things, Coimbatore, India, Aug. 2018, pp. 758–763, doi: 10.1007/978-3-030-03146-6_86.
[21] L.-C. Chuang and P.-H. Kuo, “Building a genetic risk model for bipolar disorder from genome-wide association data with random forest algorithm,” Scientific Reports, vol. 7, no. 1, p. 39943, Jan. 2017, doi: 10.1038/srep39943.
[22] A. K. Kalusivalingam, A. Sharma, N. Patel, and V. Singh, “Leveraging deep learning and random forest algorithms for enhanced genomic analysis in rare disease identification,” International Journal of AI and ML, vol. 2, no. 10, Nov. 2013, doi: 10.1016/j.csbr.2025.100038.
[23] F. Elmaz, R. Eyckerman, W. Casteels, S. Latré, and P. Hellinckx, “CNN-LSTM architecture for predictive indoor temperature modeling,” Building and Environment, vol. 206, p. 108327, Dec. 2021, doi: 10.1016/j.buildenv.2021.108327.
[24] K. Gupta, N. Jiwani, and N. Afreen, “Blood pressure detection using CNN-LSTM model,” in Proc. 2022 IEEE 11th International Conference on Communication Systems and Network Technologies (CSNT), Indore, India, Apr. 2022, pp. 262–366, doi: 10.1109/CSNT54456.2022.9787648.
[25] Y. Fan, H. Xiong, and G. Sun, “DeepASDPred: a CNN-LSTM-based deep learning method for autism spectrum disorders risk RNA identification,” BMC Bioinformatics, vol. 24, no. 1, p. 261, Jun. 2023, doi: 10.1186/s12859-023-05378-x.
[26] M. M. Hossain et al., “Cardiovascular disease identification using a hybrid CNN-LSTM model with explainable AI,” Informatics in Medicine Unlocked, vol. 42, p. 101370, Jan. 2023, doi: 10.1016/j.imu.2023.101370.
[27] Z. Vujović et al., “Classification model evaluation metrics,” International Journal of Advanced Computer Science and Applications, vol. 12, no. 6, pp. 599–606, Jan. 2021, doi: 10.14569/IJACSA.2021.0120670.
[28] J. Liang, “Confusion matrix: machine learning,” POGIL Activity Clearinghouse, vol. 3, no. 4, Dec. 2022. [Online]. Available: https://pac.pogil.org/index.php/pac/article/view/304.
[29] K. O’Shea and R. Nash, “An introduction to convolutional neural networks,” arXiv preprint arXiv:1511.08458, Nov. 2015, doi: 10.48550/arXiv.1511.08458.
[30] L. Rokach, “Ensemble-based classifiers,” Artificial Intelligence Review, vol. 33, no. 1, pp. 1–39, 2010, doi: 10.1007/s10462-009-9124-7.
[31] P. Chujai, K. Chomboon, P. Teerarassamee, N. Kerdprasop, and K. Kerdprasop, “Ensemble learning for imbalanced data classification problem,” in Proc. of the 3rd International Conference on Industrial Application Engineering, vol. 467, Kitakyushu, Japan, Mar. 2015, pp. 449–456, doi: 10.12792/iciae2015.079.
[32] S. S. Yadav and S. M. Jadhav, “Deep convolutional neural network based medical image classification for disease diagnosis,” Journal of Big Data, vol. 6, no. 1, pp. 1–18, Dec. 2019, doi: 10.1186/s40537-019-0276-2.
[33] S. Sarraf and G. Tofighi, “Classification of Alzheimer’s disease using fMRI data and deep learning convolutional neural networks,” arXiv preprint arXiv:1603.08631, Mar. 2016, doi: 10.48550/arXiv.1603.08631.
BIOGRAPHIES OF AUTHORS
Shan Mahmood is a B.Sc. student in Computer Science and Engineering at American International University-Bangladesh. His research areas are AI, ML, DL, natural language processing (NLP), generative adversarial networks, and high-accuracy AI-driven decision frameworks. He is a co-author of the 2025 MDPI Drones paper “Multi-Agent Actor–Critic Frameworks for UAV Swarm Networks” (Q1, IF 4.4). His contributions to this research are conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, writing (original draft, review and editing), visualization, and project administration. He is committed to creating intelligent distributed systems and turning theoretical models into concrete, reproducible research outputs. He can be contacted at email: shan26103@gmail.com.
Sayma Akter Trina is a B.Sc. student in Computer Science and Engineering at the American International University-Bangladesh, with a strong academic focus on AI, ML, and data-driven systems. Her research interests include ML, DL, algorithm optimization, and sentiment and emotion analysis using generative models. In the current study, she contributed to the methodology design, investigation, visualization, software, data curation, formal analysis, and validation, and was actively involved in both the original draft preparation and the manuscript review and editing. She is dedicated to building interpretable and scalable intelligent systems that integrate theoretical advancements with real-world applications. She has been recognized with the Dean’s Award for outstanding academic performance. She can be contacted at email: trinasayma5191@gmail.com.
Arpita Saha Sukanna is a B.Sc. student in Computer Science and Engineering at American International University-Bangladesh. Her contributions to this research are methodology, software, data curation, and writing the original draft. Her research areas are ML, DL, and NLP. She is actively expanding her impact in AI through joint research efforts and ongoing publications. She can be contacted at email: arpita.sukanna85@gmail.com.
Sabrina Zaman Esha is an undergraduate student in the Department of Computer Science and Engineering at American International University-Bangladesh. Her research interests include NLP and AI-driven cybersecurity. She has been involved in research projects focusing on enhancing email security using DL and NLP-based models. Her contributions to this research are visualization, data curation, and writing the original draft. She is passionate about using intelligent systems to address real-world problems in digital communication. Her academic contributions continue to grow through collaborative research and publications in the field of AI and cybersecurity. She can be contacted at email: needbasic51@gmail.com.
Md. Agdam Amin Adib is an undergraduate student in Computer Science and Engineering (CSE) at American International University-Bangladesh. His research interest is in applying ML and DL to healthcare to solve real-world medical challenges, using Python, Pandas, scikit-learn, and basic genomic data processing tools. His research expertise includes ML, DL, and biomedical data analysis, with a focus on healthcare solutions and genomic data applications. His contributions to this research are software and writing the original draft. He aims to contribute practical, research-based solutions that can make a meaningful impact in healthcare and on the betterment of humanity. He can be contacted at email: agdam.adib@gmail.com.
Md. Sanim Ahmed is a B.Sc. student in Computer Science and Engineering at the American International University-Bangladesh, with research interests in ML and DL. He is a co-author of the 2025 MDPI Drones paper “Multi-Agent Actor–Critic Frameworks for UAV Swarm Networks” (Q1, IF 4.4). In the current study, his contributions include writing the original draft. His work focuses on creating interpretable and scalable intelligent systems that connect theory to real-world implementation. He can be contacted at email: ahmedsanim1234@gmail.com.
Amirul Islam received his Ph.D. degree in Computing and Electronic Systems from the University of Essex, UK, in 2022. He completed his M.Sc. at Kookmin University, South Korea. He currently serves as an Assistant Professor in the Department of Electrical and Electronic Engineering at BRAC University, Bangladesh. Prior to this, he held the position of Post-Doctoral Researcher at the Visual AI Laboratory, Oxford Brookes University, UK. His research interests include ML for communication, optical camera communication, deep reinforcement learning, automotive vehicular communications, and optimization strategies. He can be reached at email: amirul.islam@bracu.ac.bd.