Indonesian Journal of Electrical Engineering and Computer Science
Vol. 40, No. 2, November 2025, pp. 745-757
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v40.i2.pp745-757
Evaluating multilingual encoder models for few-shot named entity recognition tasks
Ibrahim Bouabdallaoui¹, Fatima Guerouate¹, Samya Bouhaddour¹, Chaimae Saadi², Mohammed Sbihi¹
¹LASTIMI Laboratory, High School of Technology Salé, Mohammed V University in Rabat, Salé, Morocco
²High School of Technology Kénitra, Ibn Tofail University, Kénitra, Morocco
Article Info
Article history:
Received Sep 17, 2024
Revised Jul 8, 2025
Accepted Oct 14, 2025
Keywords:
Cross-linguistic performance
Encoders
Few-shot learning
Multilingual named entity
Recognition
ABSTRACT
This work provides a thorough analysis of few-shot learning approaches for multilingual named entity recognition (NER). Our research is driven by the need to improve linguistic inclusivity and performance efficiency across diverse languages. We benchmark a selection of prominent encoder models, namely XLM-RoBERTa (XLM-R), multilingual BERT (mBERT), DistilBERT, character architecture with no tokenization in neural encoders (CANINE), and the multilingual text-to-text transfer transformer (mT5). Our aim is to illuminate their capabilities and limitations within few-shot learning paradigms, particularly for underrepresented languages. Results indicate that XLM-R and mT5 show superior adaptability and accuracy and outperform the other models in complex linguistic settings, which suggests their potential for supporting more inclusive artificial intelligence (AI) technologies. The impact of this study extends beyond academic interest, offering insights for the development of more inclusive, adaptable, and efficient NER systems. By advancing our understanding of few-shot learning in multilingual contexts, this work contributes to the broader goal of creating AI applications that are linguistically diverse and more reflective of global communication patterns. These results can inform efforts to improve entity recognition across diverse AI systems and to build more precise, equitable, and sophisticated language processing frameworks.
This is an open access article under the CC BY-SA license.
Corresponding Author:
Ibrahim Bouabdallaoui
LASTIMI Laboratory, High School of Technology Salé, Mohammed V University in Rabat
Avenue Prince Héritier-BP 227 Salé, Morocco
Email: ibrahim bouabdallaoui@um5.ac.ma
1. INTRODUCTION
Named entity recognition (NER) is a fundamental component of computational linguistics. It converts raw text into structured information by identifying people, organizations, geographical locations, and temporal expressions [1]. NER supports essential downstream tasks such as text summarization, machine translation, question answering, and information retrieval [2]. Considerable progress has been made for well-resourced languages such as English, but the difficulty increases dramatically for languages with scarce labeled corpora [3]. Such data disparities create substantial barriers to equitable artificial intelligence development. They are compounded by structural linguistic diversity, differences in writing systems, and sociocultural factors, all of which impede the transfer of effective methods from data-rich to data-poor languages [4], [5]. Few-shot (limited-example) learning is a promising approach that allows models to develop competence from minimal training instances [6]. It is especially beneficial for cross-lingual entity recognition, where data scarcity affects numerous languages. Nevertheless, the effectiveness of few-shot learning varies substantially across architectures, languages, and application domains, which calls for a thorough and systematic investigation [7].

Journal homepage: http://ijeecs.iaescore.com
Recent architectural breakthroughs include the following:
- XLM-RoBERTa (XLM-R) by Conneau et al. [8], which demonstrated strong cross-lingual transfer across 100 languages;
- the adapter-based MAD-X architecture by Pfeiffer et al. [9];
- CANINE (character architecture with no tokenization in neural encoders) by Clark et al. [10], a tokenization-independent encoder that operates at the character level;
- the multilingual text-to-text transfer transformer (mT5) by Xue et al. [11], which casts tasks such as NER as text generation.

Concurrently, few-shot learning research has advanced through the meta-learning investigations of Huang et al. [12], the decomposed MAML-based architecture of Ma et al. [13] building on MAML [14], and the entity-differentiation improvements of FewNER by Li et al. [15]. Evaluation resources have progressed with the MultiCoNER [16], WikiNEuRal [17], and MultiNERD [18] datasets. Despite these advances, fundamental challenges persist:
- difficulty generalizing to novel entity types and domains [19], [20];
- inefficient support set construction [21];
- language-specific complexities, including morphological variation and syntactic differences [22], [23];
- unsuccessful knowledge transfer from resource-rich to resource-poor languages [24]-[26].
This investigation systematically evaluates five multilingual encoder architectures on few-shot NER: XLM-R, multilingual BERT (mBERT), DistilBERT, CANINE, and mT5. The evaluation covers diverse languages and datasets (MultiNERD, MultiCoNER, and WikiNeural) under 1-shot, 3-shot, and 5-shot conditions. Our contributions are:
- a comprehensive comparative analysis of multilingual encoders in few-shot NER;
- an examination of how architectural characteristics relate to few-shot learning effectiveness;
- empirical findings on cross-lingual performance variations that affect inclusivity;
- actionable guidance for model selection based on language support, entity categories, and computational constraints.
2. METHOD
This section describes our methodological framework for assessing few-shot NER performance across diverse multilingual encoder architectures. We outline the model selection criteria, dataset specifications, preprocessing procedures, evaluation metrics, and experimental configurations to ensure reproducible and transparent research.
2.1. Model selection and implementation
We evaluated five prominent multilingual encoder architectures, selected for their architectural heterogeneity, linguistic coverage, and documented performance in related applications.
XLM-R builds on the RoBERTa architecture with multilingual pre-training on 2.5TB of filtered CommonCrawl data spanning 100 languages. Its base configuration is a Transformer encoder with 12 layers, 768 hidden units, and 12 attention heads, and it uses a 250,000-token SentencePiece vocabulary. Pre-training uses masked language modeling (MLM), in which randomly masked input tokens are predicted from their context. For few-shot NER, we added a task-specific classification layer consisting of a linear transformation (from 768 dimensions to the number of entity classes) followed by SoftMax activation. Model parameters were initialized from the pre-trained weights, and the classification layer was randomly initialized using Xavier initialization [27]. We selected XLM-R for its strong cross-lingual transfer and broad language coverage, which fit our focus on multilingual few-shot NER.
mBERT extends the original BERT architecture to 104 languages using a shared subword vocabulary of about 110,000 tokens. It has 12 Transformer encoder layers, 768-dimensional hidden representations, and 12 attention heads. It was pre-trained on Wikipedia text from all supported languages with the MLM and next sentence prediction (NSP) objectives [28]. Following the same setup as for XLM-R, we added a task-specific classification component for entity recognition. We retained the cased configuration because capitalization is an important cue for entity identification. The classification component consists of a linear mapping followed by SoftMax activation, with dropout (rate = 0.1) applied before the linear layer to reduce overfitting. mBERT is a well-established benchmark for cross-lingual applications and allows comparison with more recent architectures such as XLM-R.
DistilBERT is a compressed derivative of BERT. It retains 97% of BERT's language understanding performance while reducing the parameter count by roughly 40% [29]. It has 6 Transformer encoder layers, 768-dimensional hidden representations, and 12 attention heads. It was trained with knowledge distillation, in which a smaller student network learns from a larger teacher model by minimizing the difference between their predicted distributions. The classification module applies dropout (rate = 0.1), followed by a linear mapping and SoftMax activation. For subword handling, the first subword of each word received the entity label during training, while subsequent subwords received continuation markers. Including DistilBERT lets us examine the trade-off between computational cost and accuracy in few-shot learning.
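The soft-target term of distillation can be written as a temperature-scaled KL divergence between the teacher's and the student's output distributions. The sketch below illustrates that term only and is not DistilBERT's training code. The function names and the temperature of 2.0 are assumptions. DistilBERT's actual objective also combines this term with a masked language modeling loss and a cosine embedding loss [29].

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, multiplied by T**2
    so that gradient magnitudes stay comparable across temperatures."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student))
    return kl * temperature ** 2


# Identical logits give zero loss; a student that ignores the teacher does not.
same = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
different = distillation_loss([0.0, 0.0, 0.0], [2.0, 0.5, -1.0])
```

A higher temperature flattens both distributions. This exposes the teacher's relative preferences among wrong classes, which is the extra signal distillation relies on.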
CANINE is a tokenization-free architecture that processes character sequences directly [10]. Its design combines downsampling layers, a deep Transformer encoder, and upsampling layers that restore the original sequence length. CANINE was pre-trained on the same data as mBERT, but it operates on characters rather than subword tokens. For NER, classification layers were placed on top of the upsampled character representations. Because outputs are produced per character, we mapped character-level predictions to word-level entities by taking the majority prediction over all characters within each word. CANINE's character-level processing offers distinctive advantages for multilingual text and may benefit languages with complex morphology or non-standard orthography.
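The word-level aggregation described above can be sketched as a majority vote. The paper does not specify these details, so the sketch rests on three assumptions:
- words are whitespace-delimited;
- character labels are entity types without BIO prefixes;
- ties go to the label that appears first in the word.

```python
from collections import Counter


def word_spans(text):
    """Return (start, end) character offsets of whitespace-delimited words."""
    spans, start = [], None
    for i, ch in enumerate(text):
        if ch.isspace():
            if start is not None:
                spans.append((start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:
        spans.append((start, len(text)))
    return spans


def chars_to_word_labels(text, char_labels):
    """Give each word the most frequent label among its characters.

    Counter.most_common keeps first-encountered order for equal counts,
    so ties go to the label that appears first in the word."""
    if len(char_labels) != len(text):
        raise ValueError("need exactly one label per character")
    return [Counter(char_labels[s:e]).most_common(1)[0][0]
            for s, e in word_spans(text)]


# "is" has one O and one LOC character: the tie resolves to O.
text = "Paris is big"
char_labels = ["LOC"] * 5 + ["O"] + ["O", "LOC"] + ["O"] + ["O"] * 3
print(chars_to_word_labels(text, char_labels))  # ['LOC', 'O', 'O']
```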
mT5 extends the T5 architecture to 101 languages [11]. The base version is an encoder-decoder model with 12 layers in both the encoder and the decoder, 768 hidden dimensions, and 12 attention heads. It was pre-trained on the mC4 corpus with a span-corruption objective: random text spans are replaced by sentinel tokens, and the model must reconstruct the original spans. The other models frame NER as token classification; for mT5, we instead implemented NER as a text generation task. The input is the text to analyze, and the output is the same text with entity tags inserted. Fine-tuning used teacher forcing on the few-shot examples. mT5's generative approach thus offers a paradigmatic contrast to the classification-based methods.
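The text-to-text format can be sketched in two directions. The first function serializes a BIO-labeled sentence into a target string; the second parses generated text back into word-level spans. The marker spelling `<PER> ... </PER>` follows Section 2.4. Several details are our assumptions:
- markers are separated from words by spaces;
- malformed output is dropped rather than repaired;
- the helper names are illustrative.

Nested entities are not handled.

```python
import re

TAG = re.compile(r"<(/?)([A-Z]+)>")


def bio_to_tagged(tokens, labels):
    """Serialize a BIO-labeled sentence into a target string with
    <TYPE> ... </TYPE> markers around each entity."""
    out, open_type = [], None
    for tok, lab in zip(tokens, labels):
        # Close the open entity unless this token continues it.
        if open_type is not None and lab != f"I-{open_type}":
            out.append(f"</{open_type}>")
            open_type = None
        # B- starts an entity; a stray I- is treated leniently as a start.
        if lab.startswith("B-") or (lab.startswith("I-") and open_type is None):
            open_type = lab[2:]
            out.append(f"<{open_type}>")
        out.append(tok)
    if open_type is not None:
        out.append(f"</{open_type}>")
    return " ".join(out)


def tagged_to_spans(text):
    """Parse generated text into plain words and (type, start, end) word spans.
    Unclosed or mismatched markers are dropped rather than guessed."""
    words, spans, open_ent = [], [], None
    for piece in text.split():
        m = TAG.fullmatch(piece)
        if m is None:
            words.append(piece)
            continue
        closing, etype = m.group(1) == "/", m.group(2)
        if not closing:
            open_ent = (etype, len(words))
        elif open_ent is not None and open_ent[0] == etype and len(words) > open_ent[1]:
            spans.append((etype, open_ent[1], len(words)))
            open_ent = None
        else:
            open_ent = None  # mismatched closing marker: discard the entity
    return words, spans


target = bio_to_tagged(["Ada", "Lovelace", "lived", "in", "London"],
                       ["B-PER", "I-PER", "O", "O", "B-LOC"])
print(target)  # <PER> Ada Lovelace </PER> lived in <LOC> London </LOC>
print(tagged_to_spans(target))
```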
2.2. Dataset selection and processing
We selected three comprehensive multilingual NER datasets to ensure evaluation diversity across languages, domains, and annotation schemes:
MultiNERD [18] covers fine-grained multilingual NER in 10 languages (English, Spanish, French, German, Italian, Portuguese, Polish, Dutch, Russian, and Chinese) with 15 entity types. It is sourced from Wikipedia and Wikinews. Its annotations include standard entity categories (person, organization, location) alongside fine-grained classes (politicians, athletes, buildings). The dataset contains 835,291 annotated entities across all languages.
MultiCoNER [16] was designed specifically for complex and ambiguous entity recognition in 11 languages (English, Spanish, French, German, Italian, Portuguese, Russian, Dutch, Chinese, Hindi, and Bangla). It emphasizes challenging scenarios, including uncommon entities, nested entities, and ambiguous mentions. The dataset contains 3,976,170 annotated entities across diverse genres, including news, social media, and queries.
WikiNeural [17] provides silver-standard multilingual NER data in 9 languages (English, German, French, Italian, Spanish, Dutch, Polish, Portuguese, and Russian). It was created by combining neural models with knowledge-based methods, and it focuses on improving cross-lingual annotation consistency. The dataset contains 8,656,614 entities from Wikipedia articles.
Preprocessing pipeline: we applied the same preprocessing procedures to all datasets to ensure a fair comparison. We developed a custom parser for each dataset format to extract sentences, tokens, and entity annotations. MultiNERD and MultiCoNER use the CoNLL format, so we parsed their tab-separated files to extract token sequences and BIO-encoded labels. WikiNeural provides JSON-formatted data, so we extracted the relevant fields and converted the annotations to BIO format.
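The CoNLL-style parsing step can be sketched as a generator that yields one sentence at a time. Column positions differ between releases, so the token column, label column, and separator are parameters. The optional comment prefix for skipping metadata lines is an assumption; some CoNLL-style releases include document or sentence ID lines. Adapt these settings to the actual files.

```python
import io


def read_conll(lines, token_col=0, label_col=-1, sep="\t", comment_prefix=None):
    """Yield (tokens, labels) per sentence; blank lines separate sentences."""
    tokens, labels = [], []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            if tokens:
                yield tokens, labels
                tokens, labels = [], []
            continue
        if comment_prefix and line.startswith(comment_prefix):
            continue  # metadata line, not a token
        cols = line.split(sep)
        tokens.append(cols[token_col])
        labels.append(cols[label_col])
    if tokens:  # the file may not end with a blank line
        yield tokens, labels


# Hypothetical three-column layout: index, token, BIO label.
sample = "0\tAda\tB-PER\n1\tLovelace\tI-PER\n\n0\tLondon\tB-LOC\n"
print(list(read_conll(io.StringIO(sample), token_col=1)))
```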
To ensure consistent evaluation across datasets, we focused on five languages shared by all three datasets: English, French, German, Italian, and Spanish. This selection balances a high-resource language (English) with medium-resource languages while ensuring sufficient data for meaningful evaluation.
For each model, we applied the corresponding tokenizer to convert text into model-compatible inputs. For XLM-R, mBERT, and DistilBERT, we used subword tokenization and kept mappings between original tokens and subwords so that entity labels could be aligned correctly. For CANINE, we used character-level tokenization. For mT5, we applied the SentencePiece tokenizer with special handling for the entity tags in the output.
To handle subword tokenization in the classification-based models, we implemented the following strategy: only the first subword of each token received the entity label, while subsequent subwords were assigned a special “continuation” label. During evaluation, these continuation pieces were ignored, and predictions were made at the original token level.
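The first-subword strategy can be sketched with a `word_ids` list that maps each subword position to its source word, with `None` marking special tokens. This is the form returned by `word_ids()` on Hugging Face fast-tokenizer encodings. Using -100 as the “continuation” label is our assumption. It is the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so the loss skips those positions.

```python
IGNORE = -100  # assumed "continuation" label; CrossEntropyLoss ignores it by default


def align_labels(word_ids, word_label_ids):
    """Label the first subword of each word; mark later subwords of the same
    word, and special tokens (word id None), with IGNORE."""
    aligned, previous = [], None
    for wid in word_ids:
        if wid is None or wid == previous:
            aligned.append(IGNORE)
        else:
            aligned.append(word_label_ids[wid])
        previous = wid
    return aligned


def first_subword_predictions(word_ids, predicted_ids):
    """At evaluation time, keep one prediction per word: the first subword's."""
    per_word, previous = {}, None
    for wid, pred in zip(word_ids, predicted_ids):
        if wid is not None and wid != previous:
            per_word[wid] = pred
        previous = wid
    return [per_word[w] for w in sorted(per_word)]


# Illustrative segmentation of "London is bigger": <s> Lon ##don is big ##ger </s>
word_ids = [None, 0, 0, 1, 2, 2, None]
print(align_labels(word_ids, [1, 0, 0]))  # [-100, 1, -100, 0, 0, -100, -100]
```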
For each language and dataset, we constructed few-shot learning tasks following the N-way K-shot paradigm. Each task comprised: i) a support set containing K examples for each of N entity types, and ii) a query set containing examples for evaluation. We implemented 1-shot, 3-shot, and 5-shot scenarios, randomly selecting K examples per entity type for the support sets. To ensure balanced entity representation, we used stratified sampling based on entity types. For rare entity types with fewer than K examples, we included all available examples.
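The stratified support-set construction can be sketched as a greedy sampler over sentences. This is one plausible reading of the procedure, not the authors' code. It assumes three things:
- sentences are the sampling unit;
- a sentence drawn for one entity type also counts toward any other types it contains;
- the query set is drawn from the sentences left over.

```python
import random
from collections import defaultdict


def entity_types(labels):
    """Entity types present in a BIO label sequence."""
    return {lab[2:] for lab in labels if lab != "O"}


def sample_episode(dataset, k, n_query, rng):
    """Draw one K-shot episode from a list of (tokens, labels) sentences.

    For every entity type, sentences containing it are added to the support
    set until the type appears in k of them (or in all available sentences,
    for rare types). The query set is sampled from the remaining sentences."""
    by_type = defaultdict(list)
    for idx, (_, labels) in enumerate(dataset):
        for etype in entity_types(labels):
            by_type[etype].append(idx)

    support = set()
    for etype in sorted(by_type):  # fixed order, so a seeded rng is reproducible
        covered = sum(1 for i in by_type[etype] if i in support)
        candidates = [i for i in by_type[etype] if i not in support]
        needed = max(0, k - covered)
        support.update(rng.sample(candidates, min(needed, len(candidates))))

    remaining = [i for i in range(len(dataset)) if i not in support]
    query = rng.sample(remaining, min(n_query, len(remaining)))
    return [dataset[i] for i in sorted(support)], [dataset[i] for i in query]


# Toy data: PER occurs three times, LOC only once (a "rare" type for k = 2).
data = [(["Ada"], ["B-PER"]), (["Bob"], ["B-PER"]), (["Cy"], ["B-PER"]),
        (["Rome"], ["B-LOC"]), (["hi"], ["O"]), (["yo"], ["O"])]
support, query = sample_episode(data, k=2, n_query=2, rng=random.Random(0))
```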
To make few-shot learning more robust, we applied simple data augmentation to the support sets. For each support example, we created additional examples by applying one of the following operations, chosen with equal probability: entity-preserving synonym replacement (replacing non-entity words with synonyms), entity-preserving word deletion (randomly removing non-entity words), and entity span expansion (adding context words before and after entity mentions).
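Of the three operations, entity-preserving word deletion needs no external resources, so it is the one sketched here. Synonym replacement would also need a synonym source, and span expansion a pool of context words. The deletion probability `p` is a hypothetical parameter, since the paper does not report one.

```python
import random


def delete_non_entity_words(tokens, labels, p, rng):
    """Entity-preserving word deletion: drop each 'O' token with probability p;
    tokens inside entity spans are never removed."""
    kept = [(t, l) for t, l in zip(tokens, labels) if l != "O" or rng.random() >= p]
    if not kept:  # only possible for all-'O' sentences; keep the original
        return list(tokens), list(labels)
    return [t for t, _ in kept], [l for _, l in kept]


rng = random.Random(13)
augmented = delete_non_entity_words(["Ada", "moved", "to", "sunny", "Rome"],
                                    ["B-PER", "O", "O", "O", "B-LOC"], p=0.3, rng=rng)
print(augmented)
```

Because only `O` tokens are removed, every entity span and its label survive unchanged in the augmented copy.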
For each model, we extracted the following inputs:
- input IDs (numerical representations of tokens, subwords, or characters);
- attention masks (binary masks distinguishing valid tokens from padding);
- token type IDs (for models supporting segment embeddings);
- position IDs (for position-aware encoding);
- label IDs (numerical representations of entity labels).
We implemented dynamic batching: each batch is padded to the maximum sequence length within that batch, rather than to the maximum length across the entire dataset.
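Per-batch padding can be sketched as a small collate function. The pad id of 0 and the label pad of -100 are assumptions, because pad ids are tokenizer-specific; XLM-R, for example, uses 1. Hugging Face's `DataCollatorForTokenClassification` provides equivalent behavior for token classification models.

```python
def pad_batch(batch, pad_id=0, label_pad_id=-100):
    """Pad every example to the longest sequence in *this* batch and build
    the matching attention mask (1 = real token, 0 = padding)."""
    max_len = max(len(ex["input_ids"]) for ex in batch)
    out = {"input_ids": [], "attention_mask": [], "labels": []}
    for ex in batch:
        n = len(ex["input_ids"])
        n_pad = max_len - n
        out["input_ids"].append(ex["input_ids"] + [pad_id] * n_pad)
        out["attention_mask"].append([1] * n + [0] * n_pad)
        out["labels"].append(ex["labels"] + [label_pad_id] * n_pad)
    return out


batch = [{"input_ids": [5, 6, 7], "labels": [1, 2, 3]},
         {"input_ids": [8], "labels": [4]}]
print(pad_batch(batch)["attention_mask"])  # [[1, 1, 1], [1, 0, 0]]
```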
2.3. Evaluation metrics
To assess model performance in few-shot NER tasks comprehensively, we employed several complementary metrics.
Entity-level F1-score: the primary metric for evaluating NER performance is the entity-level F1-score. It counts a predicted entity as correct only if both its boundaries and its type match the ground truth. The F1-score is the harmonic mean of precision and recall:
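$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{1}$$

where:

$$\mathrm{Precision} = \frac{\text{Number of correctly predicted entities}}{\text{Total number of predicted entities}} \tag{2}$$

$$\mathrm{Recall} = \frac{\text{Number of correctly predicted entities}}{\text{Total number of actual entities}} \tag{3}$$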
To account for class imbalance, we calculated macro-averaged F1-scores. These give equal weight to each entity type by computing the F1-score for each type separately and then averaging.
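Entity-level micro and macro F1 can be computed from Eqs. (1) to (3) as sketched below. The sketch makes two assumptions:
- span extraction treats an `I-` tag that does not continue an entity of the same type as the start of a new entity, a lenient convention similar to conlleval;
- the macro average runs over every type that occurs in either the gold or the predicted labels.

Libraries such as seqeval implement the same family of metrics.

```python
from collections import Counter


def bio_spans(labels):
    """Extract (type, start, end) entity spans from a BIO sequence. An I- tag
    that does not continue an entity of the same type starts a new one."""
    spans, start, etype = [], None, None
    for i, lab in enumerate(list(labels) + ["O"]):  # sentinel closes a final span
        if lab.startswith("I-") and lab[2:] == etype:
            continue
        if etype is not None:
            spans.append((etype, start, i))
            etype = None
        if lab != "O":
            start, etype = i, lab[2:]
    return spans


def _f1(tp, n_pred, n_gold):
    precision = tp / n_pred if n_pred else 0.0  # Eq. (2)
    recall = tp / n_gold if n_gold else 0.0     # Eq. (3)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)  # Eq. (1)


def entity_f1(gold_seqs, pred_seqs):
    """Micro and macro entity-level F1: a predicted entity counts only if its
    type, start, and end all match a gold entity exactly."""
    tp, n_pred, n_gold = Counter(), Counter(), Counter()
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = set(bio_spans(gold)), set(bio_spans(pred))
        n_gold.update(s[0] for s in g)
        n_pred.update(s[0] for s in p)
        tp.update(s[0] for s in g & p)
    types = sorted(set(n_gold) | set(n_pred))
    macro = sum(_f1(tp[t], n_pred[t], n_gold[t]) for t in types) / len(types) if types else 0.0
    micro = _f1(sum(tp.values()), sum(n_pred.values()), sum(n_gold.values()))
    return micro, macro


gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-ORG"]]
print(entity_f1(gold, pred))  # micro 0.5; macro averages PER=1.0, LOC=0.0, ORG=0.0
```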
Episode-based accuracy: to evaluate few-shot learning performance specifically, we used episode-based accuracy. It measures a model's ability to generalize from the support set to the query set within each episode. Each episode consists of a support set and a query set; the model adapts to the support set and is then evaluated on the query set. Episode-based accuracy (EP) for a single episode is calculated as:
$$\mathrm{EP} = \frac{\text{Number of correct predictions in the query set}}{\text{Total number of examples in the query set}} \tag{4}$$
To evaluate overall performance across multiple episodes, we averaged the episode-based accuracy over all episodes:
$$\overline{\mathrm{EP}} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{EP}_i \tag{5}$$
where $N$ is the number of episodes and $\mathrm{EP}_i$ is the model's accuracy on the $i$-th episode.
Meta-accuracy: meta-accuracy extends the concept of episode-based accuracy. It measures a model's ability to generalize across multiple tasks rather than its performance on individual tasks. It therefore indicates the model's versatility and its ability to use knowledge gained on one task to improve performance on another.
Given $N$ tasks in the meta-testing set, where the model's accuracy on task $i$ after adaptation is denoted $\mathrm{Acc}_i$, meta-accuracy is calculated as:
$$\mathrm{MetaAcc} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{Acc}_i \tag{6}$$
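Equations (4) to (6) reduce to simple averages. The sketch below computes them together with the standard deviation reported across episodes. The text does not specify what counts as a correct prediction for a query example. Comparing each example's prediction with its target, and using the sample (n - 1) standard deviation, are therefore assumptions.

```python
import statistics


def episode_accuracy(predictions, targets):
    """Eq. (4): share of query-set examples predicted correctly in one episode."""
    if not targets or len(predictions) != len(targets):
        raise ValueError("need one prediction per query example")
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)


def mean_and_std(scores):
    """Eqs. (5)/(6): mean over episodes (or tasks), plus the sample standard
    deviation reported alongside it."""
    return statistics.mean(scores), statistics.stdev(scores)


episode_scores = [episode_accuracy(["PER", "O", "LOC", "O"], ["PER", "O", "ORG", "O"]),
                  episode_accuracy(["O", "O"], ["O", "O"])]
print(episode_scores)  # [0.75, 1.0]
print(mean_and_std(episode_scores))
```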
2.4. Experimental setup
Our experimental framework was designed to ensure a fair comparison between models while thoroughly evaluating few-shot learning capabilities across multiple languages and datasets.
Model configuration and initialization: all encoder-based models (XLM-R, mBERT, DistilBERT, and CANINE) used the following architecture:
i) the base pre-trained encoder (the original pre-trained model without modification);
ii) a task adaptation layer consisting of a dropout layer (dropout rate = 0.1) to prevent overfitting, a linear projection from the hidden dimension to the number of entity classes, and a LogSoftmax activation that produces log-probability distributions over the classes.
For mT5, which follows the generative approach, we used the following components:
- the pre-trained encoder-decoder architecture;
- special tokens for entity type markers (e.g., <PER>, </PER>) added to the vocabulary;
- beam search decoding (beam size = 4) during inference.
All models were initialized with their respective pre-trained weights. Task-specific layers were randomly initialized using Xavier initialization [27] with a gain of 1.0.
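The task adaptation layer can be sketched in plain Python to make the arithmetic explicit. It covers four steps: Xavier (Glorot) uniform initialization with gain 1.0, inverted dropout with rate 0.1 at training time, a linear projection, and LogSoftmax. The uniform variant of Xavier initialization, the zero bias, the layer sizes, and the function names are assumptions. In PyTorch, the same initialization is available as `torch.nn.init.xavier_uniform_(tensor, gain=1.0)`.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, gain=1.0, rng=random):
    """Glorot/Xavier uniform init: weights ~ U(-a, a), a = gain * sqrt(6 / (fan_in + fan_out))."""
    a = gain * math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-a, a) for _ in range(fan_in)] for _ in range(fan_out)]


def log_softmax(z):
    """Numerically stable log-softmax over a list of logits."""
    top = max(z)
    log_norm = top + math.log(sum(math.exp(v - top) for v in z))
    return [v - log_norm for v in z]


def head_forward(hidden, weight, bias, dropout=0.1, training=False, rng=random):
    """Dropout -> linear projection (hidden_dim -> n_classes) -> LogSoftmax."""
    if training:  # inverted dropout: zero units with prob. p, rescale the rest by 1/(1 - p)
        hidden = [0.0 if rng.random() < dropout else h / (1.0 - dropout) for h in hidden]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(weight, bias)]
    return log_softmax(logits)


# Hypothetical sizes: a 768-d encoder output and 9 BIO labels.
rng = random.Random(0)
weight = xavier_uniform(768, 9, rng=rng)
bias = [0.0] * 9  # zero bias is an assumption
log_probs = head_forward([0.1] * 768, weight, bias)
```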
Training protocol: we used episodic training designed specifically for few-shot learning. Each training episode consists of a support set (N-way K-shot examples for adaptation) and a query set (examples for evaluation and gradient computation). Here N is the number of entity classes and K is the number of examples per class (1, 3, or 5 in our experiments).
For each episode, we performed the following steps:
(1) initialize episode-specific model parameters by copying the base model parameters;
(2) compute representations for the support set examples;
(3) adapt the model parameters using the support set (inner-loop optimization);
(4) evaluate the adapted model on the query set;
(5) compute the loss and update the base model parameters (outer-loop optimization).
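The five episode steps can be illustrated on a deliberately tiny problem: a one-parameter regression model whose tasks differ only in their slope. The sketch uses a first-order (FOMAML-style) outer update, which applies the query gradient computed at the adapted parameter directly to the base parameter. The paper does not state whether second-order terms were used, so this choice is an assumption. The learning rates and the toy model are also assumptions, and steps (2) and (3) collapse into one because the toy model has no separate representations.

```python
def mse_and_grad(w, xs, ys):
    """Mean squared error of the toy model y_hat = w * x, and its derivative in w."""
    n = len(xs)
    loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
    return loss, grad


def run_episode(w_base, support, query, inner_lr=0.05, inner_steps=1):
    """Steps (1)-(4) of one episode; returns (query loss, query gradient)."""
    w = w_base                      # (1) episode-specific copy of the base parameter
    for _ in range(inner_steps):    # (2)-(3) adapt on the support set (inner loop)
        _, g = mse_and_grad(w, *support)
        w -= inner_lr * g
    return mse_and_grad(w, *query)  # (4) evaluate the adapted parameter on the query set


def meta_train(tasks, epochs=200, outer_lr=0.01, w_base=0.0):
    """Step (5): first-order outer loop over (support, query) task pairs."""
    for _ in range(epochs):
        for support, query in tasks:
            _, g_query = run_episode(w_base, support, query)
            # First-order approximation: the query gradient taken at the adapted
            # parameter is applied directly to the base parameter.
            w_base -= outer_lr * g_query
    return w_base


# Three toy tasks y = s * x with slopes 1, 2, 3 (support and query share inputs).
xs = [1.0, 2.0]
tasks = [((xs, [s * x for x in xs]), (xs, [s * x for x in xs])) for s in (1.0, 2.0, 3.0)]
w_meta = meta_train(tasks)
```

The meta-learned starting point settles near the middle slope. From there, one inner step reaches every task's optimum faster than from the initial value of 0.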
We used the following optimization settings:
- the AdamW optimizer with a learning rate of 5e-5 for the encoder and 1e-3 for the task-specific layers, weight decay of 0.01, β1 = 0.9, β2 = 0.999, and ε = 1e-8;
- a linear-decay learning rate scheduler with 10% warm-up steps;
- a batch size of 16 for support sets and 32 for query sets;
- 2 gradient accumulation steps, giving an effective batch size of 32 for support sets and 64 for query sets;
- gradient clipping with a maximum gradient norm of 1.0.
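The learning rate schedule and the clipping rule can be sketched without a framework. The multiplier below has the same shape as Hugging Face's `get_linear_schedule_with_warmup`. At every step, each parameter group's base rate (5e-5 for the encoder, 1e-3 for the task-specific layers) is multiplied by it; the group names are hypothetical. Clipping rescales the whole gradient vector whenever its global L2 norm exceeds 1.0, similar in spirit to `torch.nn.utils.clip_grad_norm_`.

```python
import math


def linear_warmup_decay(step, total_steps, warmup_ratio=0.1):
    """LR multiplier: rises linearly from 0 to 1 over the first 10% of steps,
    then decays linearly back to 0 at total_steps."""
    warmup = int(total_steps * warmup_ratio)
    if step < warmup:
        return step / max(1, warmup)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup))


def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a flat gradient list so its global L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    return [g * (max_norm / norm) for g in grads]


BASE_LRS = {"encoder": 5e-5, "task_layers": 1e-3}  # hypothetical group names


def lrs_at(step, total_steps):
    """Effective learning rate of each parameter group at a given step."""
    m = linear_warmup_decay(step, total_steps)
    return {group: lr * m for group, lr in BASE_LRS.items()}


print(lrs_at(100, 1000))                # peak rates at the end of warm-up
print(clip_by_global_norm([3.0, 4.0]))  # norm 5 -> rescaled to norm 1
```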
To address class imbalance in NER tasks, we used focal loss [30]. It is particularly beneficial in few-shot learning scenarios with imbalanced entity distributions:
$$\mathrm{FL}(p_t) = -\alpha_t \, (1 - p_t)^{\gamma} \log(p_t) \tag{7}$$
where $p_t$ is the model's estimated probability for the correct class, $\alpha_t$ is a class weighting factor (set inversely proportional to class frequency), and $\gamma$ is the focusing parameter (we used $\gamma = 2.0$).
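Equation (7) and the inverse-frequency weighting can be sketched as follows. The text states only that the weights are inversely proportional to class frequency. Normalizing the α weights to sum to one is therefore our assumption, as are the function names.

```python
import math
from collections import Counter


def inverse_frequency_alphas(labels):
    """alpha_c proportional to 1 / count(c), normalized to sum to 1 (assumed)."""
    counts = Counter(labels)
    inverse = {c: 1.0 / n for c, n in counts.items()}
    total = sum(inverse.values())
    return {c: v / total for c, v in inverse.items()}


def focal_loss(p_t, alpha_t, gamma=2.0):
    """Eq. (7): FL(p_t) = -alpha_t * (1 - p_t) ** gamma * log(p_t)."""
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


alphas = inverse_frequency_alphas(["O"] * 8 + ["PER"] * 2)
print(alphas)  # {'O': 0.2, 'PER': 0.8}
# A confidently correct token (p_t = 0.9) contributes far less than under
# alpha-weighted cross-entropy, which is recovered with gamma = 0.
print(focal_loss(0.9, alphas["PER"]), focal_loss(0.9, alphas["PER"], gamma=0.0))
```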
We trained all models for a maximum of 30 epochs, with early stopping based on validation performance. Validation ran every 200 episodes, and training stopped after no improvement in validation F1-score for a patience of 5 epochs.
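The early-stopping rule can be sketched as a small tracker. The text gives the patience in epochs but validates every 200 episodes. Counting patience in validation checks, as done here, is therefore an assumption.

```python
class EarlyStopping:
    """Signal a stop after `patience` consecutive validation checks without
    an improvement in validation F1."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("-inf")
        self.bad_checks = 0

    def step(self, val_f1):
        """Record one validation result; return True when training should stop."""
        if val_f1 > self.best:
            self.best, self.bad_checks = val_f1, 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


stopper = EarlyStopping(patience=5)
for check, f1 in enumerate([0.40, 0.45, 0.44, 0.45, 0.43, 0.44, 0.42]):
    if stopper.step(f1):
        print(f"stop at validation check {check}; best F1 = {stopper.best}")
        break
```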
Inference and evaluation: during inference, we followed these steps for each test episode:
(1) load the pre-trained model;
(2) adapt it using the support set examples (for encoder models, the task-specific layers were updated for 10 gradient steps; for mT5, the entire model was fine-tuned for 5 gradient steps);
(3) freeze the adapted model parameters.
Prediction generation followed these steps:
(1) each query example was processed by the adapted model;
(2) for encoder models, token-level predictions were generated, subword or character predictions were converted to word-level predictions, and a constrained decoding algorithm was applied to ensure valid BIO tag sequences;
(3) for mT5, text with entity markers was generated, the generated text was parsed to extract entity predictions, and the predictions were aligned with the original text.
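The constrained decoding algorithm is not specified in detail. A common choice, sketched here as an assumption, is Viterbi decoding over per-token log-probabilities with restricted transitions: `I-X` may only follow `B-X` or `I-X`, and it never starts a sequence.

```python
import math


def allowed(prev, curr):
    """BIO constraint: I-X may only follow B-X or I-X (prev=None marks the start)."""
    if curr.startswith("I-"):
        return prev is not None and prev != "O" and prev[2:] == curr[2:]
    return True


def constrained_viterbi(log_probs, tags):
    """Highest-scoring tag sequence that respects the BIO constraint.
    log_probs holds one list per token, aligned with `tags`."""
    n, neg = len(log_probs), -math.inf
    score = [[neg] * len(tags) for _ in range(n)]
    back = [[0] * len(tags) for _ in range(n)]
    for j, t in enumerate(tags):
        if allowed(None, t):
            score[0][j] = log_probs[0][j]
    for i in range(1, n):
        for j, t in enumerate(tags):
            best, arg = neg, 0
            for k, p in enumerate(tags):
                if score[i - 1][k] > best and allowed(p, t):
                    best, arg = score[i - 1][k], k
            if best > neg:
                score[i][j] = best + log_probs[i][j]
                back[i][j] = arg
    j = max(range(len(tags)), key=lambda c: score[n - 1][c])
    path = [j]
    for i in range(n - 1, 0, -1):  # follow back-pointers to recover the path
        j = back[i][j]
        path.append(j)
    return [tags[j] for j in reversed(path)]


# Greedy argmax would give the invalid sequence O, I-PER; the constrained
# decoder picks the best valid alternative instead.
tags = ["O", "B-PER", "I-PER"]
lp = [[math.log(0.6), math.log(0.3), math.log(0.1)],
      [math.log(0.2), math.log(0.1), math.log(0.7)]]
print(constrained_viterbi(lp, tags))  # ['B-PER', 'I-PER']
```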
For each combination of model, dataset, and language, we performed three steps:
(1) generate 100 random episodes for each shot setting (1, 3, and 5);
(2) compute the F1-score, episode-based accuracy, and meta-accuracy for each episode;
(3) report the mean and standard deviation across all episodes.
Implementation and computational resources: our implementation used the following software and hardware:
- PyTorch (version 1.10.0) as the deep learning framework;
- Hugging Face Transformers (version 4.18.0) for model implementations;
- PyTorch Lightning (version 1.5.9) for training loop management;
- NVIDIA A100 GPUs (40GB VRAM) for training and evaluation.
3. RESULTS AND DISCUSSION
This section presents a comprehensive analysis of our experimental results. It examines five multilingual encoder models (XLM-R, mBERT, DistilBERT, CANINE, and mT5) across three datasets (WikiNeural, MultiNERD, and MultiCoNER) in few-shot learning scenarios. We present key findings, a detailed comparative analysis, and theoretical insights.
3.1. Model performance on multilingual NER tasks
Our evaluation revealed several significant patterns in model performance, with consistent trends across languages, datasets, and shot configurations. Table 1 presents the adjusted F1-scores for all models across the three datasets in 1-shot, 3-shot, and 5-shot settings.
Table 1. Adjusted F1-scores across models and datasets for 1-shot, 3-shot, and 5-shot settings

| Dataset | Model | Shots | EN | FR | DE | IT | ES | Avg. |
|---|---|---|---|---|---|---|---|---|
| WikiNeural | mBERT | 1-shot | 0.48 | 0.45 | 0.46 | 0.41 | 0.40 | 0.44 |
| | | 3-shot | 0.53 | 0.50 | 0.51 | 0.46 | 0.45 | 0.49 |
| | | 5-shot | 0.57 | 0.54 | 0.55 | 0.50 | 0.49 | 0.53 |
| | XLM-R | 1-shot | 0.49 | 0.46 | 0.47 | 0.42 | 0.41 | 0.45 |
| | | 3-shot | 0.54 | 0.51 | 0.52 | 0.47 | 0.46 | 0.50 |
| | | 5-shot | 0.58 | 0.55 | 0.56 | 0.51 | 0.50 | 0.54 |
| | CANINE | 1-shot | 0.47 | 0.44 | 0.45 | 0.40 | 0.39 | 0.43 |
| | | 3-shot | 0.52 | 0.49 | 0.50 | 0.45 | 0.44 | 0.48 |
| | | 5-shot | 0.56 | 0.53 | 0.54 | 0.49 | 0.48 | 0.52 |
| | mT5 | 1-shot | 0.50 | 0.47 | 0.48 | 0.43 | 0.42 | 0.46 |
| | | 3-shot | 0.55 | 0.52 | 0.53 | 0.48 | 0.47 | 0.51 |
| | | 5-shot | 0.59 | 0.56 | 0.57 | 0.52 | 0.51 | 0.55 |
| | DistilBERT | 1-shot | 0.46 | 0.43 | 0.44 | 0.39 | 0.38 | 0.42 |
| | | 3-shot | 0.51 | 0.48 | 0.49 | 0.44 | 0.43 | 0.47 |
| | | 5-shot | 0.55 | 0.52 | 0.53 | 0.48 | 0.47 | 0.51 |
| MultiNERD | mBERT | 1-shot | 0.43 | 0.40 | 0.41 | 0.36 | 0.35 | 0.39 |
| | | 3-shot | 0.48 | 0.45 | 0.46 | 0.41 | 0.40 | 0.44 |
| | | 5-shot | 0.52 | 0.49 | 0.50 | 0.45 | 0.44 | 0.48 |
| | XLM-R | 1-shot | 0.44 | 0.41 | 0.42 | 0.37 | 0.36 | 0.40 |
| | | 3-shot | 0.49 | 0.46 | 0.47 | 0.42 | 0.41 | 0.45 |
| | | 5-shot | 0.53 | 0.50 | 0.51 | 0.46 | 0.45 | 0.49 |
| | CANINE | 1-shot | 0.42 | 0.39 | 0.40 | 0.35 | 0.34 | 0.38 |
| | | 3-shot | 0.47 | 0.44 | 0.45 | 0.40 | 0.39 | 0.43 |
| | | 5-shot | 0.51 | 0.48 | 0.49 | 0.44 | 0.43 | 0.47 |
| | mT5 | 1-shot | 0.45 | 0.42 | 0.43 | 0.38 | 0.37 | 0.41 |
| | | 3-shot | 0.50 | 0.47 | 0.48 | 0.43 | 0.42 | 0.46 |
| | | 5-shot | 0.54 | 0.51 | 0.52 | 0.47 | 0.46 | 0.50 |
| | DistilBERT | 1-shot | 0.41 | 0.38 | 0.39 | 0.34 | 0.33 | 0.37 |
| | | 3-shot | 0.46 | 0.43 | 0.44 | 0.39 | 0.38 | 0.42 |
| | | 5-shot | 0.50 | 0.47 | 0.48 | 0.43 | 0.42 | 0.46 |
| MultiCoNER | mBERT | 1-shot | 0.38 | 0.35 | 0.36 | 0.31 | 0.30 | 0.34 |
| | | 3-shot | 0.43 | 0.40 | 0.41 | 0.36 | 0.35 | 0.39 |
| | | 5-shot | 0.47 | 0.44 | 0.45 | 0.40 | 0.39 | 0.43 |
| | XLM-R | 1-shot | 0.39 | 0.36 | 0.37 | 0.32 | 0.31 | 0.35 |
| | | 3-shot | 0.44 | 0.41 | 0.42 | 0.37 | 0.36 | 0.40 |
| | | 5-shot | 0.48 | 0.45 | 0.46 | 0.41 | 0.40 | 0.44 |
| | CANINE | 1-shot | 0.37 | 0.34 | 0.35 | 0.30 | 0.29 | 0.33 |
| | | 3-shot | 0.42 | 0.39 | 0.40 | 0.35 | 0.34 | 0.38 |
| | | 5-shot | 0.46 | 0.43 | 0.44 | 0.39 | 0.38 | 0.42 |
| | mT5 | 1-shot | 0.40 | 0.37 | 0.38 | 0.33 | 0.32 | 0.36 |
| | | 3-shot | 0.45 | 0.42 | 0.43 | 0.38 | 0.37 | 0.41 |
| | | 5-shot | 0.49 | 0.46 | 0.47 | 0.42 | 0.41 | 0.45 |
| | DistilBERT | 1-shot | 0.36 | 0.33 | 0.34 | 0.29 | 0.28 | 0.32 |
| | | 3-shot | 0.41 | 0.38 | 0.39 | 0.34 | 0.33 | 0.37 |
| | | 5-shot | 0.45 | 0.42 | 0.43 | 0.38 | 0.37 | 0.41 |
Table 1 reveals a consistent performance hierarchy across models: mT5 and XLM-R achieve the highest F1-scores, followed by mBERT, CANINE, and DistilBERT. Four clear patterns emerged:
i) model performance hierarchy: mT5 ≥ XLM-R > mBERT > CANINE > DistilBERT, with generative approaches and robust cross-lingual capabilities yielding superior results;
ii) shot sensitivity: every model improved by about 0.09 F1 (9 percentage points) from the 1-shot to the 5-shot setting, highlighting the value of additional examples;
iii) language-dependent performance: models performed best on English, followed by German and French, then Italian and Spanish, a pattern we attribute to differences in pre-training data volume;
iv) dataset complexity effect: performance ranked WikiNeural > MultiNERD > MultiCoNER, in line with increasing annotation complexity.
Table 2 provides specialized few-shot learning metrics. Meta-accuracy consistently exceeds episode-based accuracy, by about 0.02, across all configurations, suggesting effective knowledge transfer across episodes. This margin stays stable as the number of shots increases, while both metrics rise by about 0.09 from the 1-shot to the 5-shot setting. For example, XLM-R's English meta-accuracy on WikiNeural increases from 0.49 to 0.58, indicating strong adaptation capabilities.
Table 2. Comprehensive performance metrics across models and datasets

| Dataset | Model | Metric | Shots | EN | FR | DE | IT | ES |
|---|---|---|---|---|---|---|---|---|
| WikiNeural | XLM-R | Meta-accuracy | 1-shot | 0.49 | 0.46 | 0.47 | 0.42 | 0.41 |
| | | | 3-shot | 0.54 | 0.51 | 0.52 | 0.47 | 0.46 |
| | | | 5-shot | 0.58 | 0.55 | 0.56 | 0.51 | 0.50 |
| | | Episode-based | 1-shot | 0.47 | 0.44 | 0.45 | 0.40 | 0.39 |
| | | | 3-shot | 0.52 | 0.49 | 0.50 | 0.45 | 0.44 |
| | | | 5-shot | 0.56 | 0.53 | 0.54 | 0.49 | 0.48 |
| MultiNERD | mT5 | Meta-accuracy | 1-shot | 0.45 | 0.42 | 0.43 | 0.38 | 0.37 |
| | | | 3-shot | 0.50 | 0.47 | 0.48 | 0.43 | 0.42 |
| | | | 5-shot | 0.54 | 0.51 | 0.52 | 0.47 | 0.46 |
| | | Episode-based | 1-shot | 0.43 | 0.40 | 0.41 | 0.36 | 0.35 |
| | | | 3-shot | 0.48 | 0.45 | 0.46 | 0.41 | 0.40 |
| | | | 5-shot | 0.52 | 0.49 | 0.50 | 0.45 | 0.44 |
| MultiCoNER | mT5 | Meta-accuracy | 1-shot | 0.40 | 0.37 | 0.38 | 0.33 | 0.32 |
| | | | 3-shot | 0.45 | 0.42 | 0.43 | 0.38 | 0.37 |
| | | | 5-shot | 0.49 | 0.46 | 0.47 | 0.42 | 0.41 |
| | | Episode-based | 1-shot | 0.38 | 0.35 | 0.36 | 0.31 | 0.30 |
| | | | 3-shot | 0.43 | 0.40 | 0.41 | 0.36 | 0.35 |
| | | | 5-shot | 0.47 | 0.44 | 0.45 | 0.40 | 0.39 |
Figure 1 illustrates our few-shot NER architecture, depicting the complete data flow from raw text through model-specific preprocessing to entity predictions. The process has three stages:
- tokenization tailored to each model (subword tokenization for XLM-R, mBERT, and DistilBERT; character-level tokenization for CANINE; SentencePiece for mT5);
- few-shot adaptation on the support set examples;
- entity prediction with model-specific post-processing to produce word-level output.
3.2. Comparative analysis and discussion
Our results illuminate the distinct strengths and limitations of each model architecture in few-shot multilingual NER.
XLM-R consistently performs strongly across all languages and datasets, ranking first or second in most configurations. Its key advantages are:
- extensive cross-lingual pre-training on 2.5TB of data covering 100 languages, which provides robust representations that are valuable in few-shot settings [31];
- deep contextual understanding that enables effective entity boundary detection and classification from minimal examples, which is particularly evident on MultiCoNER's complex entities;
- efficient adaptation, with steady gains from the 1-shot to the 5-shot setting [32].

However, its performance varies across languages, with noticeable drops for Italian and Spanish compared with English, German, and French. This suggests that pre-training data imbalances affect few-shot learning performance.
mT5 delivers competitive and often superior performance, particularly in 5-shot settings. Its generative approach offers several advantages:
- a unified text-to-text framework that leverages strong language modeling capabilities and is particularly effective for complex entity patterns and nested entities [33];
- holistic entity recognition that treats entities as whole units rather than classifying individual tokens, capturing long-range dependencies and entity-context relationships;
- an understanding of label semantics (e.g., “Person”, “Location”), which pure classification approaches lack.

Its main limitation appears in extremely low-resource (1-shot) scenarios, where its lead over XLM-R is narrow (about 0.01 F1 in Table 1). This suggests that generative approaches may require more examples for effective adaptation.
Figure 1. Proposed few-shot NER architecture with preprocessing pipeline, showing the complete flow from raw text input to entity predictions through model-specific tokenization, few-shot adaptation, and prediction generation
mBERT delivers solid middle-tier performance across all configurations and provides a valuable baseline:
- its robust performance across datasets and languages makes it a strong multilingual NER baseline;
- its consistent cross-lingual transfer patterns suggest stable capabilities [34];
- it transfers knowledge effectively, with significant improvements as the number of shots increases.

However, mBERT consistently lags behind XLM-R and mT5, reflecting advances in multilingual representation learning since its introduction. The gap is particularly pronounced on the challenging MultiCoNER dataset.
CANINE shows patterns that highlight both the advantages and the limitations of character-level processing. On the positive side, it offers subword-free processing, which eliminates the tokenization issues that arise for morphologically rich languages and uncommon entities [10]; consistent cross-dataset performance, which suggests robustness to different annotation schemes; and improved entity boundary detection for uncommon entities that are poorly represented in subword vocabularies. Despite these advantages, CANINE generally performs below mBERT and substantially below XLM-R and mT5, suggesting that current implementations may not fully exploit character-level benefits in few-shot scenarios, because character-level patterns are difficult to learn from so few examples.
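The tokenization issue that CANINE sidesteps can be illustrated with a toy comparison. The sketch below applies greedy longest-match-first segmentation, the strategy of BERT's WordPiece tokenizer, with a tiny hypothetical vocabulary, and contrasts it with CANINE-style input, which is simply the sequence of Unicode code points. The vocabulary and example words are assumptions chosen for illustration, not the real mBERT vocabulary, and CANINE's internal hashing and downsampling of code points are not modelled.

```python
from typing import List, Set


def wordpiece(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first segmentation; non-initial pieces carry '##'."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        while end > start:  # try the longest remaining substring first
            candidate = word[start:end] if start == 0 else "##" + word[start:end]
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # as in BERT, an unmatchable word becomes a single [UNK]
        pieces.append(match)
        start = end
    return pieces


def code_points(word: str) -> List[int]:
    """CANINE-style input: one integer per Unicode character, no vocabulary."""
    return [ord(ch) for ch in word]


vocab = {"Rab", "##at", "K", "##en", "##it", "##ra"}  # toy vocabulary
print(wordpiece("Rabat", vocab))      # ['Rab', '##at']
print(wordpiece("Kenitra", vocab))    # ['K', '##en', '##it', '##ra']
print(wordpiece("Sal\u00e9", vocab))  # ['[UNK]']
print(code_points("Sal\u00e9"))       # [83, 97, 108, 233]
```

Every string, however rare, has a lossless code-point encoding, whereas the subword path either fragments it or, when no piece matches, collapses it to [UNK]; with a large multilingual vocabulary fragmentation is the more common failure. The flip side is that a character-level model must learn to compose characters into entity-level patterns, which is harder from only a few examples.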
DistilBERT consistently ranks lowest, highlighting the trade-offs of model distillation: an efficiency-performance trade-off, since having 40% fewer parameters than mBERT [29] comes at some cost in few-shot capability; competitive efficiency, since it achieves 90-95% of mBERT's performance with reduced computational requirements; and limited few-shot adaptation, since it shows the smallest absolute improvement from the 1-shot to the 5-shot setting, suggesting limited adaptation capacity compared to larger models.
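The efficiency side of this trade-off can be checked with a rough parameter count. The sketch below assumes the standard base configuration (hidden size 768, feed-forward size 3,072, 512 positions, 12 layers for BERT-style models and 6 for DistilBERT, which also drops the token-type embeddings and the pooler) and the public vocabulary sizes of 30,522 for English BERT and 119,547 for multilingual BERT, which multilingual DistilBERT shares. These are assumptions about the checkpoints, which are not specified in this section; the function names are illustrative.

```python
def encoder_params(vocab: int, layers: int, hidden: int = 768, ffn: int = 3072,
                   positions: int = 512, token_types: int = 2, pooler: bool = True) -> int:
    """Count weights and biases of a BERT-style encoder (LayerNorm included)."""
    embeddings = (vocab + positions + token_types) * hidden + 2 * hidden
    attention = 4 * (hidden * hidden + hidden) + 2 * hidden  # Q, K, V, output, LayerNorm
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden) + 2 * hidden
    pooler_params = hidden * hidden + hidden if pooler else 0
    return embeddings + layers * (attention + feed_forward) + pooler_params


def reduction(student: int, teacher: int) -> float:
    """Fraction of the teacher's parameters removed in the student."""
    return 1 - student / teacher


bert_en = encoder_params(30_522, layers=12)
distil_en = encoder_params(30_522, layers=6, token_types=0, pooler=False)
mbert = encoder_params(119_547, layers=12)
distil_multi = encoder_params(119_547, layers=6, token_types=0, pooler=False)
print(f"{reduction(distil_en, bert_en):.1%}")    # 39.4%
print(f"{reduction(distil_multi, mbert):.1%}")   # 24.2%
```

Under these assumptions, the roughly 40% reduction reported in [29] corresponds to the English BERT-base/DistilBERT pair. For the multilingual pair the saving is closer to a quarter, because the 119,547-row embedding matrix, about half of mBERT's weights, is untouched when layers are removed, so the exact efficiency gain depends on which checkpoint is compared.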
Our comprehensive evaluation of five multilingual encoder models (XLM-R, mBERT, DistilBERT, CANINE, and mT5) across multiple languages and datasets reveals critical insights into few-shot NER in multilingual contexts. The results establish a clear performance hierarchy, with mT5 and XLM-R consistently achieving superior performance, demonstrating that generative approaches and robust multilingual pre-training provide clear advantages in low-resource scenarios. The substantial performance improvements observed with increased shots (9-12% average F1 gains from 1-shot to 5-shot) validate the critical role of additional examples in few-shot learning. The consistent cross-linguistic performance gradient (English ≥ German > French > Italian > Spanish) correlates directly with pre-training data volumes, highlighting how data imbalance continues to affect linguistic inclusivity even in few-shot settings. Furthermore, the systematic decrease in performance with increasing entity complexity across datasets (WikiNeural ≥ MultiNERD > MultiCoNER) underscores the persistent challenges of handling complex, ambiguous, and fine-grained entities.
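F1 comparisons of this kind are conventionally computed at the entity level, as in CoNLL-style evaluation: a predicted entity counts as correct only if both its boundaries and its type match a gold entity exactly. This section does not name the scorer used, so the following is a minimal sketch under that assumption, with a lenient BIO decoder (an I- tag that does not continue a span opens a new one); the function names are illustrative.

```python
from typing import List, Set, Tuple


def entity_spans(tags: List[str]) -> Set[Tuple[str, int, int]]:
    """Decode BIO tags into (type, start, end) spans; stray I- tags open a span."""
    found: Set[Tuple[str, int, int]] = set()
    label, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes the last span
        prefix, _, kind = tag.partition("-")
        if label is not None and not (prefix == "I" and kind == label):
            found.add((label, start, i))
            label = None
        if prefix == "B" or (prefix == "I" and label is None):
            label, start = kind, i
    return found


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exact (type, boundary) entity matches."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        g_spans, p_spans = entity_spans(g), entity_spans(p)
        tp += len(g_spans & p_spans)
        n_pred += len(p_spans)
        n_gold += len(g_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "O", "O", "B-LOC"]]  # PER boundary cut short
print(entity_f1(gold, pred))  # 0.5
```

A truncated boundary is penalized twice, as a false positive and a false negative, which is one reason fine-grained, multi-token entities lower scores so sharply. Note also that a gain from 0.50 to 0.55 F1 is 5 points in absolute terms but 10% in relative terms, so a reported gain is easiest to interpret when it states which is meant.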
Notably, our specialized few-shot learning metrics reveal effective knowledge transfer across episodes: meta-accuracy consistently exceeds episode-based accuracy, which indicates genuine meta-learning capabilities, particularly in models with extensive multilingual pre-training.
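Episode-based evaluation relies on repeatedly sampling a small labelled support set and a disjoint query set. The sketch below shows one simple way to build a K-shot NER episode from sentence-level data: a greedy sampler adds a sentence only if it supplies an entity type that still has fewer than K mentions and keeps every type at or below a cap, which approximates a balanced entity distribution. This is an illustrative assumption with hypothetical names; this section does not define the authors' sampling procedure or their meta-accuracy and episode-based accuracy metrics, which are not reproduced here.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Sentence = Tuple[List[str], List[str]]  # (tokens, BIO tags)


def entity_counts(tags: Sequence[str]) -> Counter:
    """Count mentions per type, assuming well-formed BIO (each mention starts with B-)."""
    return Counter(tag[2:] for tag in tags if tag.startswith("B-"))


def sample_episode(pool: List[Sentence], types: List[str], k: int, cap: int,
                   n_query: int, seed: int = 0) -> Tuple[List[Sentence], List[Sentence]]:
    """Greedily draw a K-shot support set for `types`, then a disjoint query set.

    Sentences without entities or with types outside `types` are skipped.
    """
    if cap < k:
        raise ValueError("cap must be at least k")
    rng = random.Random(seed)
    order = list(range(len(pool)))
    rng.shuffle(order)
    counts: Counter = Counter()
    support: List[int] = []
    for i in order:
        c = entity_counts(pool[i][1])
        if not c or any(t not in types for t in c):
            continue
        needed = any(counts[t] < k for t in c)           # helps an under-filled type
        fits = all(counts[t] + n <= cap for t, n in c.items())  # no type overshoots
        if needed and fits:
            counts.update(c)
            support.append(i)
        if all(counts[t] >= k for t in types):
            break
    if any(counts[t] < k for t in types):
        raise ValueError("pool cannot supply k mentions for every type")
    chosen = set(support)
    rest = [i for i in order if i not in chosen]
    return [pool[i] for i in support], [pool[i] for i in rest[:n_query]]


pool = [
    (["Rabat"], ["B-LOC"]),
    (["Ali"], ["B-PER"]),
    (["Ali", "visited", "Rabat"], ["B-PER", "O", "B-LOC"]),
    (["Sara"], ["B-PER"]),
    (["Kenitra"], ["B-LOC"]),
    (["nothing", "here"], ["O", "O"]),
]
support, query = sample_episode(pool, ["PER", "LOC"], k=1, cap=1, n_query=2)
```

With `cap` equal to `k` every type receives exactly K mentions, but the sampler may fail on sparse pools; raising `cap` (for example to 2K) trades strict balance for a higher success rate.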
While computational constraints limited our study to five European languages and specific shot configurations with artificially balanced entity distributions, these findings open several research avenues: expanding to low-resource languages with distinct linguistic properties [35]; investigating advanced meta-learning algorithms such as MAML [14] and prototypical networks [19]; implementing language-specific adapters for enhanced cross-lingual transfer [9]; exploring multimodal few-shot learning that incorporates visual and audio information [36]; developing domain adaptation techniques [37]; conducting real-world deployment studies [38]; and creating efficiency-focused approaches for resource-constrained environments [39].
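Of the directions listed, prototypical networks [19] are simple enough to sketch. Each class is represented by the mean embedding (prototype) of its support examples, and a query is assigned class probabilities by a softmax over negative squared Euclidean distances to the prototypes. In NER the embeddings would be contextual token vectors from an encoder such as XLM-R; here they are plain lists of floats, and the two-dimensional toy vectors and function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def prototypes(support: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Average the support embeddings of each class into one prototype."""
    protos: Dict[str, List[float]] = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return protos


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query: Vector, protos: Dict[str, List[float]]) -> Dict[str, float]:
    """Softmax over negative squared distances to each prototype."""
    logits = {label: -sq_dist(query, p) for label, p in protos.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exp = {label: math.exp(score - top) for label, score in logits.items()}
    total = sum(exp.values())
    return {label: e / total for label, e in exp.items()}


protos = prototypes({"PER": [[1.0, 0.0], [0.8, 0.2]], "LOC": [[0.0, 1.0], [0.2, 0.8]]})
probs = classify([0.7, 0.3], protos)
print(max(probs, key=probs.get))  # PER
```

Because the prototypes are recomputed from each episode's support set, no classification head has to be retrained when the entity types change, which is what makes the approach attractive for the few-shot settings studied here.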
4. CONCLUSION
This comprehensive study evaluated five multilingual encoder models on few-shot NER across multiple languages and datasets, advancing our understanding of few-shot learning in multilingual contexts. Our findings demonstrate a clear performance hierarchy, with mT5 and XLM-R consistently outperforming the other models, highlighting the advantages of generative approaches and robust multilingual pre-training in low-resource scenarios. All models exhibited substantial performance improvements as the number of shots increased, confirming the value of additional examples in few-shot learning frameworks. The observed cross-linguistic performance gradient correlated directly with pre-training data volumes, emphasizing how data imbalance affects linguistic inclusivity even in few-shot scenarios. Model performance consistently decreased as entity complexity increased across datasets, underscoring the ongoing challenges of handling complex, ambiguous, and fine-grained entities. Our specialized few-shot learning metrics revealed effective knowledge transfer across episodes, with meta-accuracy consistently exceeding episode-based accuracy, suggesting true meta-learning capabilities, particularly in models with extensive multilingual pre-training.
Future research should address the limitations identified in this study by expanding to genuinely low-resource languages with distinct linguistic properties, investigating advanced meta-learning algorithms, and exploring language-specific adaptation mechanisms. Integrating multimodal information, developing domain adaptation techniques, and designing efficiency-focused approaches for resource-constrained environments are also critical priorities. Additionally, real-world deployment studies and the development of explainability mechanisms remain essential for practical applications.
These findings contribute valuable insights for developing more effective, efficient, and inclusive multilingual NER systems. They advance the state of the art by systematically benchmarking current approaches and by identifying the architectural features and learning strategies that enable effective few-shot learning across diverse linguistic and domain contexts with minimal annotation requirements. Ultimately, this work supports the broader goal of democratizing NLP technology for underserved language communities worldwide.
ACKNOWLEDGMENT
The authors acknowledge the Moroccan National Center for Scientific and Technical Research and the Moroccan Institute for Scientific and Technical Information for granting access to computational resources through their High-Performance Computing facilities.
FUNDING INFORMATION
This investigation was conducted without external monetary assistance.
AUTHOR CONTRIBUTION
This journal uses the Contributor Roles Taxonomy (CRediT) to recognize individual author contributions, reduce authorship disputes, and facilitate collaboration.
Name of Author: C M So Va Fo I R D O E Vi Su P Fu
Ibrahim Bouabdallaoui: √ √ √ √ √ √ √ √ √ √ √
Fatima Guerouate: √ √ √ √ √ √ √
Samya Bouhaddour: √ √ √ √
Chaimae Saadi: √ √ √ √
Mohammed Sbihi: √ √ √ √ √
C: Conceptualization; M: Methodology; So: Software; Va: Validation; Fo: Formal analysis; I: Investigation; R: Resources; D: Data Curation; O: Writing - Original Draft; E: Writing - Review & Editing; Vi: Visualization; Su: Supervision; P: Project administration; Fu: Funding acquisition
CONFLICT OF INTEREST
The authors report no competing interests.
DATA AVAILABILITY STATEMENT
The datasets used in this study are available from the principal investigator upon appropriate request.
REFERENCES
[1] D. Nadeau and S. Sekine, “A survey of named entity recognition and classification,” Lingvisticae Investigationes, vol. 30, no. 1, pp. 3–26, Aug. 2007, doi: 10.1075/li.30.1.03nad.
[2] H. Shan, Y. Wu, and J. Li, “A survey of named entity recognition and classification techniques,” IEEE Access, vol. 10, pp. 117838–117864, 2022.
[3] P. Mulcaire, J. Kasai, and N. A. Smith, “Polyglot contextual representations improve crosslingual transfer,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019, vol. 1, pp. 3912–3918, doi: 10.18653/v1/N19-1392.