IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 11, No. 1, March 2022, pp. 379~387
ISSN: 2252-8938, DOI: 10.11591/ijai.v11.i1.pp379-387
Journal homepage: http://ijai.iaescore.com
AraBERT transformer model for Arabic comments and reviews analysis

Hicham El Moubtahij1, Hajar Abdelali2, El Bachir Tazi3
1 Systems and Technologies of Information Team, High School of Technology, University of Ibn Zohr, Agadir, Morocco
2 LISAC Laboratory, Faculty of Sciences Dhar Mahraz, University of Sidi Mohamed Ben Abdellah, Fez, Morocco
3 Computer Science department, Polydisciplinary Faculty, University of Sidi Mohamed Ben Abdellah, Taza, Morocco
Article Info
Article history: Received Sep 27, 2021; Revised Dec 24, 2021; Accepted Jan 4, 2022

ABSTRACT
The Arabic language is rich and complex in terms of word morphology compared to Latin languages. Recently, the natural language processing (NLP) field has seen a growing body of research targeting Arabic language understanding (ALU). In this context, this work presents our approach based on the Arabic bidirectional encoder representations from transformers (AraBERT) model, where the main required steps are presented in detail. We start with pre-processing of the input text, which is then segmented using the Farasa segmentation technique. In the next step, the AraBERT model is implemented with the pertinent parameters. The performance of our approach has been evaluated using the ARev dataset, which contains more than 40,000 comment and remark records related to the tourism sector, such as hotel reviews, restaurant reviews and others. Moreover, the obtained results are compared in depth with other relevant state-of-the-art methods, and they show the competitiveness of our approach, which gives important results that can serve as a guide for further improvements in this field.
Keywords: AraBERT; Arabic language understanding; Farasa segmentation; Natural language processing
This is an open access article under the CC BY-SA license.
Corresponding Author:
Hicham El Moubtahij
Systems and Technologies of Information Team, High School of Technology, University of Ibn Zohr
Agadir, Morocco
Email: h.elmoubtahij@uiz.ac.ma
1. INTRODUCTION

Arabic is an international language spoken by more than 500 million people. It is considered one of the most important members of the Semitic language family. From the Arabian Gulf to the Atlantic Ocean, Arabic is the administrative and official language of more than 21 countries [1]. Arabic is a rich and complex language in terms of word morphology compared to English, and the presence of various dialects is one of its distinguishing features. Moreover, the large differences between modern standard Arabic (MSA) and dialectal Arabic (DA) increase this complexity. It should be noted that MSA is employed for formal (administrative) writing, whereas DA is employed for informal daily communication, for example on social media [2]. According to the work of Guellil et al. [3], published in 2021, DA is divided into six groups: i) Maghrebi (MAGH), ii) Egyptian (EGY), iii) Iraqi (IRQ), iv) Levantine (LEV), v) Gulf (GLF), and vi) the remaining dialects. On the other hand, the Arabic used in short messaging system (SMS) messages, chat forums and social media is generally called "Arabizi" [4]. Its written text is a mixture of Latin characters, numerals and some punctuation. For example, the sentence "ياه نسافرو", which translates into English as "let's travel", is written in Arabizi as "yallah nsaaferou" [5]. Despite its widespread usage, there is little research in modern computational linguistics devoted to the Arabic language compared to other languages. However, in recent years, several research
efforts have been made and many papers have appeared on various language processing tasks. In practice, named entity recognition (NER) and sentiment analysis (SA) are the most difficult tasks of Arabic natural language processing (ANLP) [6]. To obtain satisfactory results on ANLP tasks, research in recent years has focused on transfer learning, that is, the fine-tuning of large pre-trained language models with a relatively small number of samples. It should be mentioned that this approach relies on self-supervised pre-trained language models. They allow us to represent words as dense vectors in a low-dimensional vector space and to construct continuous distributed representations of texts. Despite its effectiveness, word embedding is unable to take into account the relationship between several words and the meaning of complete sentences in the text. Consider the next two sentences, "هذه نفسها المرأة". On the one hand, their word embedding representations are identical, and on the other hand, their meanings are entirely different. However, the high computational cost is a disadvantage in the training phase of these models (more than 500 TPUs working for weeks). Moreover, a huge corpus is needed for the pre-training phase [7], [8].
In this work, we define and describe the main steps of our approach, based on the Arabic bidirectional encoder representations from transformers (AraBERT) model, for Arabic language understanding (ALU). The approach effectively classifies comments and reviews into positive and negative categories. We evaluated our model on the ARev dataset, which contains more than 40,000 comments and hotel, restaurant, product, attraction and movie reviews written in a mixture of standard Arabic and Algerian dialect. The experiments show that our approach achieves very good results.
The remainder of this paper is structured as follows. In section 2, we present the most important techniques and approaches used in the natural language processing (NLP) field to deal with the ALU problem. Then, in section 3, we describe and clarify our model's architecture, of which BERT is the basic core. In section 4, we describe the ARev dataset on which we perform our experiments, and then compare our results with those of relevant methods. Finally, section 5 concludes the paper and outlines the main points of our future work.
2. RELATED WORKS

There are various techniques and approaches used in NLP to solve the problem of ALU. In this section, we briefly present some work in this field.
Work on learning the meaning of words began in 2013 with the word2vec model developed by Mikolov et al. [9]; researchers then turned to variants of word2vec such as GloVe by Pennington et al. [10] in 2014 and fastText by Mikolov et al. [11] in 2017. With the introduction of the concept of "contextual information" in 2018, results improved noticeably on different tasks [12], and architectures became increasingly large, yielding superior representations of words and sentences. Since then, well-known language understanding models have been developed, for example: i) bidirectional encoder representations from transformers (BERT) [13], ii) universal language model fine-tuning (ULMFiT) [14], iii) text-to-text transfer transformer (T5) [15], and iv) A Lite BERT (ALBERT) [16]. These offered improved performance by exploring different pre-training methods, modified model architectures and larger training corpora.
Concerning the AraBERT model, we note that little work has been done compared to models for other languages. In the following, we cite some of it in chronological order.
In 2020, Nada et al. [17] proposed a new approach for Arabic text summarization founded on a general-purpose architecture for natural language understanding (NLU) and natural language generation (NLG). It summarizes an Arabic text by extracting and evaluating the most important sentences in that text.
Alami, a member of the LISAC FSDM-USMBA team at SemEval-2020 [18], proposed an effective method for dealing with offensive Arabic language on Twitter by using AraBERT embeddings. First, they pre-processed the tweets by handling emojis (including their Arabic meanings). Next, they substituted each detected emoji with the special token [MASK] in both the fine-tuning and inference phases. Then, they applied the AraBERT model to represent the tweet tokens. Finally, to decide whether a tweet is offensive or not, they fed the tweet representation into a sigmoid function. The proposed method achieved the best results, with a score of 90.17% on OffensEval 2020.
The following year, Faraj and Abdullah [19] published their solution for the shared task on sentiment and sarcasm detection in the Arabic language. The overall objective of the task is to identify whether a tweet is sarcastic or not. The proposed solution is based on an ensemble technique with the AraBERT pre-trained model. In their paper, they first define the architecture of the model used in the shared task. Next, the hyperparameters and the experimental tuning that led to this result are presented in detail. Their model ranked 5th out of 27 teams with an F1 score of 0.5985.
In more recent work from 2021, Hussein et al. [20] developed an effective approach for fighting the COVID-19 infodemic in tweets by using the AraBERT model. Their approach is organised as follows: in the first step,
the goal is to transform Twitter jargon, including emoticons and emojis, into plain text through a sequence of pre-processing procedures; in the second step, they exploited a version of AraBERT, which was pre-trained on plain text, to fine-tune and classify the tweets according to their label. Their approach can predict 7 binary properties of an Arabic tweet about COVID-19. Using the dataset provided by NLP4IF 2021, they ranked 5th in the Fighting the COVID-19 Infodemic task with an F1 score of 0.664.
3. METHODOLOGY

The objective of this section is to describe and clarify the architecture of our model based on the AraBERT model, where BERT is the basic core. Subsection 3.1 presents the BERT model. Our model based on AraBERT is described in subsection 3.2.
3.1. BERT model

BERT stands for bidirectional encoder representations from transformers; it came out of Google AI labs in late 2018. We note that: i) it is more powerful than its predecessors in terms of results; ii) it is more powerful than its predecessors in terms of learning speed; iii) once pre-trained in an unsupervised way, it has its own linguistic "representation", and it can then be trained incrementally (in a supervised way this time) to specialize the model quickly and with little data; and iv) finally, it can work in a multi-modal way, taking as input data of different types such as images and/or text, with some manipulations. Its advantage over its competitors, OpenAI's generative pre-trained transformer (GPT) and embeddings from language models (ELMo) [12], is that it is bidirectional: it does not have to look only backwards like OpenAI GPT, or concatenate a "back" view and a "front" view trained independently as ELMo does, as shown in Figure 1.
Figure 1. Differences in pre-training model architectures
Examples of what it can do: i) BERT can perform translation; once pre-trained to translate [French/English-English/French] and then [English/German-German/English], it can even translate from French to German without further training; ii) BERT can compare the meaning of two sentences to see if they are equivalent; iii) BERT can generate text; iv) BERT can describe and categorize an image; and v) BERT can do logical sentence analysis, i.e. determine whether a given element is a subject, a verb, or a direct object complement.
3.1.1. Bidirectional encoder representations from transformers (BERT) architecture

BERT reuses the architecture of transformers (hence the "T" in BERT). Indeed, BERT is nothing more than a stack of encoders that all have the same structure but do not share the same weights. The "Base" version of BERT consists of 12 encoders. There is a larger version called "Large", which has 24 encoders. The large version is more powerful but more demanding on machine resources. The Base model has 512 entries, each corresponding to a token. The first entry corresponds to a special token, "[CLS]" for "classification", which allows BERT to be used for a text classification task. It also has 512 outputs of size 768 each (1024 for the large version). The first vector is the classification vector. The output of each of the 12 encoders can be considered as a vector representation of the input sequence. The relevance of this representation is ensured by the attention mechanism implemented by the encoders.
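As a rough illustration of the sizes quoted above, the following sketch records the two configurations and derives the shape of the encoder output. The class name `BertSize` and its field names are hypothetical, chosen only for this example; the numeric values are the standard Base and Large settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BertSize:
    """Illustrative container; the names are hypothetical, not a library API."""
    encoders: int          # number of stacked encoder blocks
    hidden: int            # size of each output vector
    max_tokens: int = 512  # number of input entries (tokens)

    def output_shape(self):
        # One vector of size `hidden` per input position.
        return (self.max_tokens, self.hidden)


BASE = BertSize(encoders=12, hidden=768)
LARGE = BertSize(encoders=24, hidden=1024)

print(BASE.output_shape())   # (512, 768)
print(LARGE.output_shape())  # (512, 1024)
```

The first row of that output, at position 0, is the vector associated with the [CLS] token.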
3.1.2. Training procedure
BERT differs from its predecessors (pre-trained NLP models) in the way it is pre-trained: on a large dataset consisting of texts from English Wikipedia pages (2,500 million words) as well as a set of books (800 million words). This pre-training is done on two tasks: first, a masked language modelling (MLM) task; second, a next sentence prediction (NSP) task.
a. Task 1: masked language modelling (MLM)
The objective of this task is to predict the hidden word. Because the transformer architecture can simultaneously take into account the right and left contexts of the target word, this task allows the model to learn more contextualised representations than one-way models such as ELMo [12]. In practice, target words are sometimes replaced with a special symbol [MASK], sometimes replaced with another random word, and sometimes kept as they are, as shown in Figure 2.
Figure 2. MLM
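The masking step of the MLM task can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the 80%/10%/10% split between [MASK], random replacement and unchanged tokens used in the original BERT recipe, and the function name `mask_tokens` and the toy vocabulary are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=34):
    """Return (corrupted tokens, labels); labels are None where nothing is predicted."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must recover this token
            r = rng.random()
            if r < 0.8:
                corrupted.append(MASK)               # replaced with the special symbol
            elif r < 0.9:
                corrupted.append(rng.choice(vocab))  # replaced with a random word
            else:
                corrupted.append(tok)                # kept as it is
        else:
            labels.append(None)
            corrupted.append(tok)
    return corrupted, labels


tokens = "the hotel was clean and the staff were friendly".split()
corrupted, labels = mask_tokens(tokens, vocab=sorted(set(tokens)))
print(corrupted)
print(labels)
```

Only the positions with a non-empty label contribute to the prediction loss; all other tokens pass through unchanged.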
b. Task 2: next sentence prediction (NSP)
BERT is also trained on a next-sentence prediction task in which it must decide whether two input sentences are consecutive. The rationale for this task is to improve the performance of the model on tasks where the objective is to qualify the relationship between a pair of sentences. In practice, the representation of the special symbol [CLS] is used to classify each pair of input sentences, as well as for any other classification task once the model has been trained.
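To make the NSP input format concrete, the sketch below builds one sentence pair. It is an illustration under stated assumptions: the 50/50 choice between the true next sentence and a random one follows the original BERT recipe, whitespace splitting stands in for real tokenization, and `make_nsp_example` is a hypothetical helper name.

```python
import random

CLS, SEP = "[CLS]", "[SEP]"


def make_nsp_example(sentences, index, rng):
    """Pair sentences[index] with its true successor or with a random other sentence."""
    first = sentences[index]
    if rng.random() < 0.5:
        second, is_next = sentences[index + 1], True
    else:
        # Exclude the first sentence itself and its true successor.
        candidates = [s for i, s in enumerate(sentences) if i not in (index, index + 1)]
        second, is_next = rng.choice(candidates), False
    tokens = [CLS] + first.split() + [SEP] + second.split() + [SEP]
    return tokens, is_next


sentences = [
    "the room was clean",
    "the staff were friendly",
    "the film was too long",
    "the food arrived cold",
]
rng = random.Random(34)
tokens, is_next = make_nsp_example(sentences, 0, rng)
print(tokens, is_next)
```

The classifier reads the output vector at the [CLS] position to predict `is_next`.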
3.1.3. BERT: fine-tuning

Fine-tuning consists of using a pre-trained version of BERT in a model architecture for a specific NLP task. Adding a basic neural network layer is enough to get very good results. For a text classification task, for example, and more precisely for sentiment analysis of moviegoers' reviews, the architecture of the fine-tuned model may look like the one shown in Figure 3. It is sufficient to add, downstream of BERT, a feed-forward layer followed by a softmax.
3.2. Our model based on AraBERT

In our approach, we used AraBERT, which is based on the BERT model. BERT is a widely used model in various NLP tasks for several languages. AraBERT is a pre-trained model for the Arabic language, based on the Google BERT architecture [6]. There are six versions of the model: AraBERTv0.1-base, AraBERTv0.2-base, AraBERTv0.2-large, AraBERTv1-base, AraBERTv2-base and AraBERTv2-large. Table 1 describes in detail the important information for each version in relation to the pre-training process. The overall view of our model is shown in Figure 4. We worked on customer/user review data for sentiment analysis; our dataset is titled ARev.
Figure 3. Architecture of the fine-tuning
Table 1. Model pre-training parameters
Model | Size | Parameters | Pre-segmentation | Dataset sentences | Dataset size | Dataset words
AraBERTv0.2-base | 543 MB | 136M | No | 200 M | 77 GB | 8.6 B
AraBERTv0.2-large | 1.38 GB | 371M | No | 200 M | 77 GB | 8.6 B
AraBERTv2-base | 543 MB | 136M | Yes | 200 M | 77 GB | 8.6 B
AraBERTv2-large | 1.38 GB | 371M | Yes | 200 M | 77 GB | 8.6 B
AraBERTv0.1-base | 543 MB | 136M | No | 77 M | 23 GB | 2.7 B
AraBERTv1-base | 543 MB | 136M | Yes | 77 M | 23 GB | 2.7 B
Figure 4. AraBERT architecture overview
At the input of our system, the text goes through a pre-processing stage in which we clean it of any content that carries no sentiment, such as usernames, hashtags and URLs, and then segment it using Farasa segmentation [21]. First, we segment the words into stems, prefixes and suffixes. For instance, "الكتاب - Alkittab" becomes "ال+كتا+ب - Al+kitta+b". Then, we trained a SentencePiece [22] model in unigram mode on the segmented pre-training dataset to produce a subword vocabulary of more than 59K tokens. It must be noted that, before the application of Farasa segmentation, the dataset used for pre-training has a size of more than 70 GB, with more than 8.5 billion words and more than 200 million sentences. To create a good pre-training dataset, we used several sources such as: i) the OSIAN Corpus, ii) the Arabic Wikipedia dump, iii) Assafir news articles, iv) the 1.5 billion word Arabic Corpus, and v) OSCAR, unfiltered and sorted.
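The cleaning part of the pre-processing stage can be sketched with regular expressions. This is a minimal illustration rather than the pipeline used in the paper: the patterns, the choice to drop a hashtag entirely (instead of only its `#` sign) and the name `clean_text` are assumptions. Farasa segmentation itself is not reproduced here.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def clean_text(text):
    """Remove URLs, usernames and hashtags, then collapse whitespace."""
    # URLs are removed first so that '@' or '#' inside a link is not matched separately.
    for pattern in (URL_RE, USER_RE, HASHTAG_RE):
        text = pattern.sub(" ", text)
    return " ".join(text.split())


print(clean_text("@bob great hotel #travel https://t.co/abc"))  # great hotel
```

In Python 3, `\w` matches Unicode word characters, so the same patterns also apply to usernames and hashtags written in Arabic script.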
In our model based on AraBERT, we used two special tokens: Tok1, segment separation ("[SEP]"), and Tok2, classification ("[CLS]"). For any classifier, we used the latter as the first input token, which helps us derive an output vector. Then, to obtain the probability distribution over the predicted output classes, we add a simple layer composed of a feed-forward layer and a softmax, see (1):

$P = \mathrm{softmax}(C W^{T})$ (1)

where P is the probability of each category, W is the weight matrix of the classification layer, and C is the output of the transformer.
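Equation (1) can be illustrated numerically with a small pure-Python sketch. The vector and weight values below are made up for the example, the dimension is reduced to 4 for readability, and the helper names `softmax` and `classify` are hypothetical; W is assumed to hold one row per class.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(C, W):
    """Compute P = softmax(C W^T) for one [CLS] vector C and weight rows W."""
    logits = [sum(c * w for c, w in zip(C, row)) for row in W]
    return softmax(logits)


# Toy 4-dimensional output vector and two classes (negative, positive).
C = [0.2, -0.1, 0.4, 0.3]
W = [[0.5, 0.1, -0.3, 0.0],
     [-0.2, 0.3, 0.6, 0.1]]
P = classify(C, W)
print(P)
```

The predicted category is the index of the largest entry of P; here the second class receives the higher probability.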
4. EXPERIMENT AND RESULTS
4.1. ARev dataset

We evaluated our model on the sentiment analysis task. For this purpose, we used the Arabic reviews (ARev) dataset [23]. The ARev dataset was built using the Facebook API from more than 100K comments on the most popular Algerian Facebook pages. Three inputs were needed to collect the ARev dataset: the Facebook page identifier, the identifier of the Facebook page post and the access token, as shown in Figure 5. To enrich the ARev dataset, three open-source datasets of modern standard Arabic and Algerian Arabic comments were used; see Table 2. Finally, after pre-processing and deleting duplicate elements, the dataset was saved in CSV format. The statistics of our dataset are presented in Table 3.
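The final step described above, removing duplicates and saving to CSV, might look like the following sketch. The column names, the rule for deciding that two comments are duplicates (identical text after whitespace normalisation) and the helper names are assumptions made for illustration.

```python
import csv
import io


def deduplicate(rows):
    """Drop rows whose comment text was already seen, keeping first occurrences."""
    seen, unique = set(), []
    for text, label in rows:
        key = " ".join(text.split())  # ignore whitespace-only differences
        if key not in seen:
            seen.add(key)
            unique.append((key, label))
    return unique


def to_csv(rows):
    """Serialise (comment, label) rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["comment", "label"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    ("good hotel", "positive"),
    ("good  hotel", "positive"),  # duplicate up to spacing
    ("bad food", "negative"),
]
unique_rows = deduplicate(rows)
print(to_csv(unique_rows))
```

Writing to an in-memory buffer keeps the example self-contained; replacing it with an opened file gives the on-disk CSV.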
Figure 5. Inputs of dataset collection from Facebook
Table 2. Various datasets used
Datasets | Type of language | Description
LABR [24] | Standard Arabic | Book reviews
The dataset of Elsahar and El-Beltagy [25] | | Hotel reviews, restaurant reviews, product reviews, attraction reviews, movie reviews.
The dataset of Mataoui et al. [26] | Algerian Dialect | Comments
Table 3. Statistics on the ARev dataset
Statistic | Positive | Negative
Total comments | 24932 | 24932
Total words | 1180663 | 1345029
Avg. words in each comment | 47.36 | 53.95
Avg. characters in each comment | 253.15 | 294.47
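The averages in Table 3 follow directly from the totals, which the short check below reproduces. The dictionary layout is ours; the numbers are those of the table.

```python
stats = {
    "positive": {"comments": 24932, "words": 1180663},
    "negative": {"comments": 24932, "words": 1345029},
}

for label, s in stats.items():
    avg_words = s["words"] / s["comments"]
    print(label, round(avg_words, 2))

total_comments = sum(s["comments"] for s in stats.values())
print(total_comments)  # 49864
```

The two classes are balanced, and the total of 49,864 comments is consistent with the "more than 40,000" figure quoted in the text.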
4.2. Experimental setup

We used the Google Colab tool to run our experiments, where we can take good advantage of TensorFlow's performance. Note that we worked with a masking probability of 15%, a random seed of 34, and a duplication factor of 10. In our approach, we worked with the AraBERTv1 version implemented in [6], where the model was pre-trained on a TPUv2-8 pod. Table 4 summarizes the parameters used for fine-tuning our model.
Table 4. Parameter values
Parameter | Value
Learning Rate | 1e-4
Epsilon (Adam optimizer) | 1e-8
Maximum Sequence Length | 256
Epochs | 27
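For reference, the values of Table 4 can be collected in a small configuration object. This is only a sketch: the class `FineTuneConfig`, its field names and the `truncate` helper are hypothetical and do not correspond to a specific library interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Fine-tuning values from Table 4; field names are illustrative."""
    learning_rate: float = 1e-4
    adam_epsilon: float = 1e-8
    max_seq_length: int = 256
    epochs: int = 27


def truncate(token_ids, config):
    """Clip a token id sequence to the maximum sequence length."""
    return token_ids[: config.max_seq_length]


config = FineTuneConfig()
print(config)
print(len(truncate(list(range(300)), config)))  # 256
```

Keeping the values in one immutable object makes it easier to report exactly which settings produced a given result.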
4.3. Results and discussion

To show the value of our model, we compared the result obtained by our approach with existing state-of-the-art results in the domain of sentiment analysis. For this purpose, we used the accuracy metric, as shown in Table 5. These results show that our approach gives a strong result that is comparable to the state of the art. We obtained an accuracy of 92.5% on a dataset containing more than 40,000 comments written in a mixture of standard Arabic and Algerian dialect. However, the approach of Alomari et al. [27] gives an accuracy better than ours by 1.3 percentage points, a slight difference that can be explained by two reasons: firstly, the number of tweets in [27] does not exceed 1,800; secondly, the language mix used in our approach produces more linguistic variation than the Jordanian dialect alone. AraBERTv1, with the best parameters chosen for fine-tuning, gives our approach this competitiveness over other models.
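The accuracy metric used in this comparison is the fraction of comments whose predicted label matches the reference label. A minimal sketch, with made-up labels and a hypothetical function name:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


y_true = ["pos", "neg", "pos", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "neg", "pos"]
print(accuracy(y_true, y_pred))  # 0.8
```

Because the ARev classes are balanced, accuracy is a reasonable summary here; on imbalanced data it would need to be read alongside per-class measures.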
Table 5. Performance of our model implemented on AraBERTv1 compared with previous state-of-the-art systems
Dataset | Description | Language | Accuracy
ASTD [28] | The dataset contains 10,000 tweets. | Egyptian dialect | 92.6
ArsenTD lev [29] | The dataset contains 4,000 tweets. | Levantine dialect | 59.4
AJGT [27] | The Arabic Jordanian General Tweets dataset contains more than 1,800 tweets. | Jordanian dialect | 93.8
ArSarcasm-v2 [30] | Collection of 15,548 sarcasm and sentiment tweets. | Standard Arabic and dialectal Arabic | 67.7
ARev (our dataset) | A mixture of comments and hotel reviews, restaurant reviews, product reviews, attraction reviews, movie reviews. | Standard Arabic and Algerian dialect | 92.5
5. CONCLUSION AND FUTURE WORK

The automatic understanding of Arabic text is still a challenging process and an open issue for researchers in the NLP field. In this work, we have presented our approach based on the AraBERT language model. We have also described and detailed the main steps of the proposed architecture using diagrams and examples. The process starts by feeding our model with pre-processed text from the ARev dataset; then, version 1 of the AraBERT model was implemented using Farasa segmentation. Moreover, our evaluation is based on the ARev dataset, which contains more than 40,000 comments and reviews. With well-tuned parameters of the AraBERT model, we obtained an accuracy of 92.5%, which represents a very competitive result. In future work, we aim to address the problem of Arabic text segmentation and to try to improve Farasa segmentation.
BIOGRAPHIES OF AUTHORS
Prof. Hicham El Moubtahij is currently a Professor of Computer Science at the University of Ibn Zohr, Agadir, Morocco. He received his Ph.D. in Computer Science from the University of Sidi Mohamed Ben Abdellah, Fez, Morocco, in 2017. He is now a member of the Systems and Technologies of Information Team at the High School of Technology at the University of Ibn Zohr, Agadir. His current research interests include machine learning, deep learning, Arabic handwriting recognition, text mining, and medical imagery. Dr. El Moubtahij Hicham has published articles in indexed international journals and conferences, has been a reviewer for scientific journals, and has served on the program committee of several conferences. He can be contacted at email: h.elmoubtahij@uiz.ac.ma.
Dr. Hajar Abdelali holds a bachelor's degree in experimental sciences, a bachelor's degree in mathematics and computer science, and a master's degree in information sciences, networks and multimedia from Sidi Mohammed Ben Abdellah University, Fez, Morocco, in 2013. She joined the XLIM laboratory of the University of Poitiers, France, in collaboration with the LIMS scientific laboratory of the Faculty of Sciences Dhar Mahraz, Sidi Mohammed Ben Abdellah University, Fez, Morocco, where she obtained her Ph.D. degree in computer science in 2019. She can be contacted at email: abdelali.hajar@usmba.ac.ma.
Prof. El Bachir Tazi graduated in Electronic Engineering from ENSET Mohammedia, Morocco, in 1992. He obtained his DEA and DES in Automation and Signal Processing and his Ph.D. in Computer Science from Sidi Mohammed Ben Abdellah University, Faculty of Sciences, Fez, Morocco, in 1995, 1999 and 2012, respectively. He is now a member of the engineering sciences laboratory and an associate professor at Sidi Mohammed Ben Abdellah University, Polydisciplinary Faculty of Taza, Morocco. His areas of interest include all areas of automatic recognition based on artificial intelligence methods and applications related to automatic speaker recognition. He can be contacted at email: elbachirtazi@yahoo.fr.