International Journal of Electrical and Computer Engineering (IJECE)
Vol. 11, No. 2, April 2021, pp. 1627~1633
ISSN: 2088-8708, DOI: 10.11591/ijece.v11i2.pp1627-1633

Journal homepage: http://ijece.iaescore.com

Arabic tweeps dialect prediction based on machine learning approach

Khaled Alrifai, Ghaida Rebdawi, Nada Ghneim
Department of Informatics, Higher Institute for Applied Sciences and Technology, Syria

Article Info

Article history:
Received Nov 22, 2019
Revised Aug 14, 2020
Accepted Nov 4, 2020

ABSTRACT
In this paper, we present our approach for profiling Arabic authors on Twitter, based on their tweets. We consider here the dialect of an Arabic author as an important trait to be predicted. For this purpose, many indicators, feature vectors and machine learning-based classifiers were implemented. The results of these classifiers were compared to find the best dialect prediction model. The best dialect prediction model was obtained using a random forest classifier with full forms and their stems as the feature vector.

Keywords:
Arabic dialects detection
Author profiling
Machine learning
Social media analysis
Text mining

This is an open access article under the CC BY-SA license.

Corresponding Author:
Khaled Alrifai
Department of Informatics
Higher Institute for Applied Sciences and Technology
Barzeh, Damascus, Syria
Email: khaled.alrifai@hiast.edu.sy

1. INTRODUCTION
Author profiling on social media is a method of analysing authors' writings on social media in order to uncover different traits of the author (e.g. gender and age) based on stylistic or content-based features. This method aims at taking advantage of the huge volume of data generated by a huge number of authors, in order to classify them into predefined classes based on their traits. Author profiling has many useful applications in the domain of social media analysis, such as in marketing and advertising, as well as in the forensic and security areas [1].
With the birth and rise of social media, internet users in the Arab world were quick to embrace the new technology and to utilize all that social media has to offer to connect, communicate, and share information with others using the Arabic language [2].
The Arabic language used in social media has two forms. The first is modern standard Arabic (MSA), which is widely used in formal situations such as formal speeches and government and official content. The second is known as dialectal Arabic (DA), which is the informal private language, predominantly found as spoken vernaculars with no written standards. Dialects differ in morphologies, grammatical cases, vocabularies and verb conjugations [3]. These differences call for dialect-specific processing and modeling when building Arabic automatic analysis systems [4]. The natural language processing (NLP) community has aggregated dialectal Arabic into four regional language groups: Egyptian, Maghreban, Gulf, and Levantine dialects, in addition to modern standard Arabic (MSA), the formal Arabic language. An objective comparison of the varieties of Arabic dialects could potentially lead to the conclusion that Arabic dialects are historically related but mutually unintelligible [5].
Author profiling is a classification problem that could be solved using various approaches. These approaches are based on the selected features extracted from the author's writings, and on the classifier used in the development of the prediction model. Much research in the field of author profiling has presented comparison studies between multiple features [6] and classifiers [7] to select the best combination of them for predicting a specific trait.
Features used in dialect identification problems are content-based and style-based features [8]. In content-based features, character n-grams and word n-grams are widely used. Kheng et al. in [9] used word n-grams with values between 1 and 3 for n. Markov et al. in [10] combined character n-grams and word n-grams with values of 3-4 for typed characters, 3-7 for untyped characters and 2-3 for words, respectively. In untyped characters, n-gram types are ignored (e.g. 'the' as a whole word is no different from 'the' in the middle of a word), but in typed characters, n-grams of different types are distinguished (e.g. n-grams may be suffixes, punctuation, words etc.). Similarly, Ciobanu et al. in [11] combined character and word n-grams with values of n of 1-6 and 1-2 respectively. In [12, 13], tf-idf n-grams were combined with word embedding, and with 2-gram characters in the beginning and ending, respectively. Many feature selection criteria have been used: gain ratio [14], bag-of-words [15, 16], the 100 most discriminant words per class from a list of 500 topic words [17], latent semantic analysis (LSA) [9], and specific lists of words for dialect [18].
In style-based features, character flooding (i.e. lengthened words) and emoticons and/or laughter expressions [15, 18] were commonly used. Markov et al. in [10] also combined domain names that are used in links with different kinds of n-grams. Adame-Arcia et al. in [15] combined emotional features such as emotions, appraisal, admiration, positive/negative emoticons, and positive/negative words. Martinc et al. in [18] also used emojis and sentiment words.
Concerning classification algorithms, most researchers used traditional machine learning algorithms such as logistic regression [12, 18, 19], SVMs [9-11, 16, 20-22], and distance-based methods [14, 15, 17]. Some researchers employed deep learning techniques for this purpose. For example, Kodiyan et al. in [23] applied recurrent neural networks (RNN), whereas Schaetti in [13] and Sierra et al. in [24] used convolutional neural networks (CNN). Finally, Franco-Salvador et al. in [25] applied deep averaging networks.
The dialect of Arabic tweeps (Twitter users) is the author profiling trait under study in this paper. Accordingly, the required task is to develop a model that can predict the dialect of a tweep based on his/her Arabic tweets.
In the rest of this paper, section 2 presents our methodology, which includes the characteristics of the training and testing data, the features used for the developed model, and a step-by-step approach to build the prediction model. Section 3 gives a brief discussion of the results. At the end, insights for the future and a short summary are presented.

2. RESEARCH METHOD
In this section, we describe the dataset used in this work and the features developed for the prediction model. The proposed model is explained in detail hereafter, including data pre-processing, features extraction, features filtering and the algorithms with their evaluation criteria.
2.1. Dataset
In our research, we used the training dataset from the PAN 2017 conference [8]. One of the PAN 2017 tasks was about profiling Arabic tweeps according to their dialects. This data consists of 240,000 Arabic tweets written by 2,400 authors (100 tweets for each author). Authors were tagged with their dialects. Dialects were divided into four classes: Levantine, Gulf, Egyptian and Maghreban. Authors were categorized into 4 classes of 600 authors each. As a testing dataset, PAN 2017 also prepared a dataset consisting of 160,000 Arabic tweets written by 1,600 authors (100 tweets for each author), divided equally into the four classes described in the training data.
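The dataset sizes quoted above can be cross-checked with a few lines of arithmetic. The following is only an illustrative Python sketch; the constant and function names are hypothetical and are not part of the PAN 2017 distribution.

```python
# Sizes as reported in the text for the PAN 2017 Arabic dialect data.
TWEETS_PER_AUTHOR = 100
DIALECTS = ["Levantine", "Gulf", "Egyptian", "Maghreban"]

TRAIN_AUTHORS = 2400
TEST_AUTHORS = 1600


def split_summary(n_authors):
    """Return the tweet total and authors per dialect for a balanced split."""
    return {
        "tweets": n_authors * TWEETS_PER_AUTHOR,
        "authors_per_dialect": n_authors // len(DIALECTS),
    }
```

Under these figures the training split holds 240,000 tweets with 600 authors per dialect, and the testing split holds 160,000 tweets with 400 authors per dialect.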
2.2. Studied features
We implemented several experiments using different feature vectors. We categorized these features into:
a. Content-based features:
− Uni-gram, bi-gram and tri-gram of words
− Stems of words
− Lemmas of words
− Words part of speech tags (POS), e.g. NOUN_MS_PRON and V_PRON
− Character n-gram, where n ranges from 2 to 7.
b. Style-based features:
− Links to websites ("http")
− Hashtags to active public trends ("#")
− Mentions to other authors ("@")
− Lengthened words, i.e. the intentional repetition of a character in a word to emphasize and to exaggerate in describing something, like laughing "ههههههههه", magnification "واااااااااااااااااو", indignation "ااااااااا", etc.
− Average tweets length, i.e. the average number of words in an author's tweets
− Tweets punctuation marks, i.e. the total number of punctuation marks used in an author's tweets.
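As an illustration only, the style-based indicators listed above could be counted per author roughly as follows. This Python sketch is not the implementation used in this work. The function name `style_counts`, the whitespace word splitting, the punctuation set, and the choice to skip links, hashtags and mentions when looking for lengthened words and punctuation are all assumptions.

```python
import re
import string

# Characters counted as punctuation: ASCII punctuation plus common Arabic marks.
# This set is an assumption; the paper does not list the marks it counted.
PUNCTUATION = set(string.punctuation) | {"،", "؛", "؟"}

# A word is treated as lengthened when any character occurs three or more
# times in a row (the threshold stated in section 2.3.2).
LENGTHENED = re.compile(r"(.)\1{2,}")


def style_counts(tweets):
    """Return raw style-based counts for one author's list of tweets."""
    words = [w for t in tweets for w in t.split()]
    # Links, hashtags and mentions are excluded from the word-shape checks
    # so that "www" or "#" do not inflate the other counters (assumption).
    plain = [w for w in words if not w.startswith(("http", "#", "@"))]
    return {
        "links": sum(w.startswith("http") for w in words),
        "hashtags": sum(w.startswith("#") for w in words),
        "mentions": sum(w.startswith("@") for w in words),
        "lengthened": sum(1 for w in plain if LENGTHENED.search(w)),
        "avg_tweet_length": len(words) / len(tweets) if tweets else 0.0,
        "punctuation": sum(ch in PUNCTUATION for w in plain for ch in w),
    }
```

For example, `style_counts(["ههههه هذا رائع!!", "@user شوف http://t.co/x #trend"])` finds one link, one hashtag, one mention, one lengthened word, two punctuation marks and an average length of 3.5 words per tweet.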
2.3. Our model
In our attempt to find the best prediction model, we prepared the dataset and extracted the features. These features were then filtered to reduce the size of the feature vectors. Using the reduced feature vectors, we implemented several experiments that differed from each other in the feature vector and the algorithm used for training. The resulting models were compared using specific evaluation criteria to select the best one.
2.3.1. Data pre-processing
Before starting the feature extraction stage, we concatenated all the 100 tweets of each author into one long text. This long text was tokenized using the Farasa tokenizer [26]. All extracted tokens were grouped and weighted with their frequency in the dataset (all the tokens from all authors).
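The pre-processing step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in this work. In particular, the `tokenize` function below is a plain whitespace split standing in for the Farasa tokenizer, and the function names and the input shape (a dict from author id to a list of tweets) are assumptions.

```python
from collections import Counter


def tokenize(text):
    """Whitespace tokenizer used here as a stand-in for the Farasa tokenizer."""
    return text.split()


def author_documents(tweets_by_author):
    """Concatenate each author's tweets into one long text."""
    return {author: " ".join(tweets) for author, tweets in tweets_by_author.items()}


def token_frequencies(documents):
    """Count every token over the whole dataset (all authors together)."""
    freq = Counter()
    for text in documents.values():
        freq.update(tokenize(text))
    return freq
```

Calling `token_frequencies(author_documents(data))` yields the dataset-wide frequency of each token, which is the weight the text refers to.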
2.3.2. Features extraction
After the tokenization step, lemmas and stems were extracted from the calculated tokens using the Farasa toolbox. Tokens were also used to extract character 2-7 grams. For all content-based features, the calculated value of each feature was its frequency of use in the dataset. This step produced a very large feature vector that had to be reduced. Style-based features were also calculated and extracted for each author. We considered a word to be lengthened if it included a character repeated at least three times. In the case of average tweets length and tweets punctuation marks, the value of the feature was the calculated count itself. In the case of hashtags, mentions, links and lengthened words, the value of the feature was the normalized usage ratio: these features were counted and then normalized into the interval [0, 100].
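To make the character n-gram extraction and the ratio normalization concrete, a small Python sketch follows. It is illustrative only. N-grams are taken inside each token, following the statement that tokens were used to extract them. The text does not spell out the normalization formula, so expressing a count as a percentage of the author's words is an assumption, as are the function names.

```python
from collections import Counter


def char_ngrams(token, n_min=2, n_max=7):
    """Yield all character n-grams of a token for n in [n_min, n_max]."""
    for n in range(n_min, n_max + 1):
        for i in range(len(token) - n + 1):
            yield token[i:i + n]


def char_ngram_frequencies(tokens):
    """Count character n-grams over a list of tokens."""
    freq = Counter()
    for tok in tokens:
        freq.update(char_ngrams(tok))
    return freq


def usage_ratio(count, total_words):
    """Scale a raw count to [0, 100] as a share of the author's words.

    Assumed normalization: the paper only states the target interval.
    """
    if total_words == 0:
        return 0.0
    return 100.0 * count / total_words
```

Tokens shorter than n simply contribute no n-grams of that length, so a three-character token yields only its two 2-grams and one 3-gram.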
2.3.3. Features filtering
The number of elements in each content-based feature vector was very large, which made the training process very hard and time-consuming. We applied the following steps to reduce the feature vector size:
− Eliminating features with a value less than five (we have four classes). The probability that these items contribute to the classification is low
− Discarding all elements with an Information Gain (IG) equal to zero.
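The two filtering steps can be illustrated with the following Python sketch. It is a simplified stand-in, not the procedure used in this work: each feature is reduced to a binary present/absent indicator per author before computing information gain, whereas a toolbox such as Weka may treat numeric attributes differently. The function names and the input layout are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(present, labels):
    """IG of a binary feature: H(class) - H(class | feature present/absent)."""
    with_f = [y for p, y in zip(present, labels) if p]
    without_f = [y for p, y in zip(present, labels) if not p]
    n = len(labels)
    conditional = ((len(with_f) / n) * entropy(with_f)
                   + (len(without_f) / n) * entropy(without_f))
    return entropy(labels) - conditional


def filter_features(counts_by_author, labels, min_count=5):
    """Keep features whose dataset total is >= min_count and whose IG > 0.

    counts_by_author: list of dicts (feature -> count), one per author.
    labels: list of dialect labels aligned with counts_by_author.
    """
    totals = Counter()
    for counts in counts_by_author:
        totals.update(counts)
    kept = []
    for feature, total in totals.items():
        if total < min_count:
            continue  # step 1: too rare to help separate four classes
        present = [counts.get(feature, 0) > 0 for counts in counts_by_author]
        if information_gain(present, labels) > 1e-12:  # step 2: drop IG == 0
            kept.append(feature)
    return sorted(kept)
```

A feature that appears equally often in every class has zero information gain under this definition and is discarded even if it is frequent.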
2.3.4. Model training
In our experiments, we trained different models using the Weka toolbox. The features mentioned in section 2.2 were used separately or jointly to create the various feature vectors tested in our experiments. As for the classifiers, we initially used a support vector machine (SVM) in order to find the best feature vector through several experiments. Using the resulting best feature vector, we then trained other classifiers, namely sequential minimal optimization (SMO), random forest (RF) and naïve Bayes (NB), to be compared with the SVM results.
2.3.5. Evaluation of models
For the evaluation, we used both the training and testing datasets to find the best model. In the training phase, we used the F-measure (F1) over 10-fold cross-validation (F1 Train), and in the testing phase, we calculated F1 over the testing dataset (F1 Test).
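For readers less familiar with the measure, the following Python sketch shows how F1 and a 10-fold split could be computed. It is only an illustration. The text does not state how F1 is averaged over the four classes, so the macro average used here is an assumption, and the striding fold assignment is one simple choice among many.

```python
def f1_per_class(y_true, y_pred):
    """Return {label: F1} computed from true and predicted label lists."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def k_fold_indices(n_items, k=10):
    """Split range(n_items) into k (train, test) index pairs by striding."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test
```

F1 Train would then be the mean of the scores obtained on the ten held-out folds, and F1 Test the score on the separate testing dataset.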

3. RESULTS AND DISCUSSIONS
In this section, we present our experiments for dialect prediction. Initially, we used an SVM classifier with a polynomial kernel to uncover the best feature vector, then we tried other classifiers to find the best prediction model. The features used are abbreviated here as: CNGram for character n-gram; UniGram, BiGram and TriGram for word uni-gram, bi-gram and tri-gram respectively; Stem for stems; Lemma for lemmas; POS for part of speech tags; LEN for lengthened words ratio; ATL for average tweets length; TPM for tweets punctuation marks; and LHM for links, hashtags and mentions usage ratios.
3.1. Features vector comparison using SVM
As mentioned above, several feature vectors were used to train a number of models in multiple experiments. We used here an SVM classifier with a polynomial kernel as the training algorithm, and calculated the evaluation criteria for comparison. Figure 1 shows the results.

Figure 1. Comparison between feature vectors using polynomial-SVM

In the first step, we trained our model using UniGram alone (F1 Train=60.2%). When we then added other features, we noticed that adding POS was the most effective (71.7%). Next, we tried the concatenation of UniGram, BiGram and TriGram of words; the result was relatively poor (45.8%). Using Stem instead of UniGram increased the score (74.8%). Moreover, using Stem concatenated with UniGram, LEN and LHM ratios produced better scores (77%, 75% and 75.1% respectively). Using CNGram as the main feature vector led to a good score (75.9%). Using Lemma with other features also led to good scores, especially with ATL and TPM (74.4%). In the end, we noticed that using Stem with UniGram produced the best F1 Train (77%) and F1 Test (74.6%).
From the results above, we can notice that content-based features played an effective role in the dialect prediction model, because different Arabic dialects use different words to express the same meaning. For example, the concept of "much" is represented using "كتير" in Levantine, "أوي" in Egyptian, "بزاف" in Maghreban and "وايد" in Gulf dialects.
We can also notice that the best results were obtained using Stem as a content-based feature (compared to Fullform and Lemma). This can be explained by the fact that several words related to the same origin are represented by one stem. For example, the words "الجامعات", "جامعاتهم" and "الجامعة" have the same stem "جامع". Consequently, by using stems, we make a trade-off between the number of features and the origins of the words.
Regarding style-based features, we can notice that ATL has a good effect on the results. It seems that some dialects allow an idea to be expressed using a reduced number of words. Lengthening words is a commonly used practice in social media. It seems that the use of lengthened words differs from one dialect to another; thus, LEN enhanced the prediction model.
Using character n-grams combines the best features of word uni-grams and stem uni-grams, with all related prefixes and suffixes, in one feature vector. Therefore, using CNGram makes it possible to take advantage of both words and stems at the same time without duplication.
3.2. Classifiers comparison
Here, we used the best feature vector discovered in section 3.1 to train new models using different classifiers. We compared SVM, NB, RF and SMO classifiers. The results are presented in Figure 2. The best classifier was random forest (RF). It produced the best F1 Train (80.6%) and F1 Test (78.2%). It is worth mentioning that the appropriate choice of classifier is considered a major step of any machine learning problem. The configuration of the classifier itself also plays a crucial role. In our research, when we used SVM, we noticed that the kernel of the SVM is a very important parameter which should be selected carefully. We tried the linear, polynomial and exponential kernels. The polynomial kernel gave the best result (F1 Train was 69.7% for linear, 77% for polynomial, 70.6% for exponential).

Figure 2. Classifiers comparison

4. CONCLUSION
In this research, we presented our work on author profiling of Arabic tweeps concerning the dialect trait. We trained several models using various features and classifiers to find the best model for predicting author dialect. We found that using the RF classifier with full forms and their stems as a feature vector led to the best model, with F1 Train (80.6%) and F1 Test (78.2%). It will be worth investigating the use of a lemmatizer for Arabic vernaculars instead of the currently used lemmatizer, which is made for modern standard Arabic (MSA). Moreover, we intend to study the effect of using deep learning algorithms for Arabic dialect classification once a huge dataset collected from Arabic writers is available.

REFERENCES
[1] M. B. op Vollenbroek, et al., "GronUP: Groningen User Profiling. Notebook for PAN at CLEF 2016," Conference and Labs of the Evaluation Forum, Évora, Portugal, CEUR Workshop Proceedings, 2016, pp. 846-857.
[2] TNS, "Arab Social Media Report," Arab Social Media Influencers Summit, 2015.
[3] M. A. Ali, "Artificial intelligence and natural language processing: the Arabic corpora in online translation software," International Journal of Advanced and Applied Sciences, vol. 3, no. 9, pp. 59-66, 2016.
[4] F. Huang, "Improved Arabic Dialect Classification with Social Media Data," Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015, pp. 2118-2126.
[5] A. Ali, et al., "Automatic Dialect Detection in Arabic Broadcast Speech," arXiv: 1509.06928, 2015.
[6] E. Weren, et al., "Examining multiple features for author profiling," Journal of Information and Data Management, vol. 5, no. 3, pp. 266-279, 2014.
[7] K. Alrifai, G. Rebdawi, and N. Ghneim, "Comparison Of Machine Learning Approaches In Arabic Tweeps Gender Prediction," International Journal of Scientific & Technology Research, vol. 8, no. 11, pp. 2892-2895, 2019.
[8] F. Rangel, et al., "Overview of the 5th Author Profiling Task at PAN 2017: Gender and Language Variety Identification in Twitter," Working notes papers of the CLEF, pp. 1613-0073, 2017.
[9] G. Kheng, et al., "INSA LYON and UNI PASSAU's participation at PAN@CLEF'17: Author Profiling task," Conference and Labs of the Evaluation Forum, Dublin, Ireland, CEUR Workshop Proceedings, 2017.
[10] I. Markov, et al., "Language- and Subtask-Dependent Feature Selection and Classifier Parameter Tuning for Author Profiling," Conference and Labs of the Evaluation Forum, Dublin, Ireland, CEUR Workshop Proceedings, 2017.
[11] A. M. Ciobanu, et al., "Including Dialects and Language Varieties in Author Profiling," Conference and Labs of the Evaluation Forum, Dublin, Ireland, CEUR Workshop Proceedings, 2017.
[12] A. Poulston, et al., "Using TF-IDF n-gram and Word Embedding Cluster Ensembles for Author Profiling," Conference and Labs of the Evaluation Forum, Dublin, Ireland, CEUR Workshop Proceedings, 2017.
[13] N. Schaetti, "UniNE at CLEF 2017: TF-IDF and Deep-Learning for Author Profiling," Conference and Labs of the Evaluation Forum, Dublin, Ireland, CEUR Workshop Proceedings, 2017.
[14] M. Kocher and J. Savoy, "UniNE at CLEF 2017: Author Profiling Reasoning," Conference and Labs of the Evaluation Forum, Dublin, Ireland, CEUR Workshop Proceedings, 2017.
[15] Y. Adame-Arcia, et al., "Author Profiling, instance-based Similarity Classification," Conference and Labs of the Evaluation Forum, Dublin, Ireland, CEUR Workshop Proceedings, 2017.
[16] E. S. Tellez, et al., "Gender and language-variety identification with MicroTC," Conference and Labs of the Evaluation Forum, Dublin, Ireland, CEUR Workshop Proceedings, 2017.
[17] J. A. Khan, "Author Profile Prediction Using Trend and Word Frequency Based Analysis in Text," Conference and Labs of the Evaluation Forum, Dublin, Ireland, CEUR Workshop Proceedings, 2017.
[18] M. Martinc, et al., "PAN 2017: Author Profiling - Gender and Language Variety Prediction," Conference and Labs of the Evaluation Forum, Dublin, Ireland, CEUR Workshop Proceedings, 2017.
[19] L. Akhtyamova, et al., "Twitter Author Profiling Using Word Embeddings and Logistic Regression," Conference and Labs of the Evaluation Forum, Dublin, Ireland, CEUR Workshop Proceedings, 2017.
[20] R. R. Oliveira and R. F. de O. Neto, "Using character n-grams and style features for gender and language variety classification," Conference and Labs of the Evaluation Forum, Dublin, Ireland, CEUR Workshop Proceedings, 2017.
[21] A. Ogaltsov and A. Romanov, "Language Variety and Gender Classification for Author Profiling in PAN 2017," Conference and Labs of the Evaluation Forum, Dublin, Ireland, CEUR Workshop Proceedings, 2017.
[22] A. Basile, et al., "N-GrAM: New Groningen Author-profiling Model," arXiv:1707.03764, 2017.
[23] D. Kodiyan, et al., "Author Profiling with Bidirectional RNNs using Attention with GRUs," Conference and Labs of the Evaluation Forum, Dublin, Ireland, CEUR Workshop Proceedings, 2017.
[24] S. Sierra, et al., "Convolutional Neural Networks for Author Profiling," Conference and Labs of the Evaluation Forum, Dublin, Ireland, CEUR Workshop Proceedings, 2017.
[25] M. Franco-Salvador, et al., "Subword-based Deep Averaging Networks for Author Profiling in Social Media," Conference and Labs of the Evaluation Forum, Dublin, Ireland, CEUR Workshop Proceedings, 2017.
[26] A. Abdelali, et al., "Farasa: A Fast and Furious Segmenter for Arabic," Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar. Proceedings of NAACL-HLT 2016 (Demonstrations), San Diego, California, Association for Computational Linguistics, 2016, pp. 11-16.

BIOGRAPHIES OF AUTHORS

Ghaida Rebdawi
Ph.D. in Software Engineering from INSA de Lyon, France. Research Director, Deputy Director for Educational Affairs, and Professor of Software Engineering at HIAST, Damascus, Syria. Current research includes the use of Business Process Modeling to manage change requirements in Agile Software Development, author profiling from Arabic social media using Machine Learning and NLP techniques, and the development of an ontology in Arabic. Co-author of many e-Content courses in Software Engineering for the Syrian Virtual University (SVU). Co-author of many books on Software Engineering in the Arabic language, and contributor to the translation of IT books and to the production of professional dictionaries from English to Arabic.

Nada Ghneim
Ph.D. in Language Sciences (Speech Communication) from the "Institut de la Communication Parlée" - Stendhal (Grenoble III) University, France, 1997, and a Postgraduate Degree (DEA) in Artificial Intelligence (Image, Robotics, Vision) from the National High School of Computer Science and Applied Mathematics in Grenoble (ENSIMAG), France, 1993. Nowadays, I'm an Assistant Professor at the Faculty of Informatics & Communication Engineering at the Arab International University (AIU), Damascus, Syria. I'm also a Researcher/Lecturer at the Higher Institute for Applied Sciences and Technology (HIAST) and at the Information Technology Engineering Faculty (Damascus University). I'm a member of the Syrian Computer Society, and I have many publications in the Speech and Natural Language Processing domain, such as Arabic Text-to-Speech, Sentiment Analysis, Morphological and Syntactic Analysis, Dictionary and Ontology Building.

Khaled Alrifai
Ph.D. candidate at the Higher Institute for Applied Sciences and Technology (HIAST), Damascus, Syria. I hold a master's degree from HIAST entitled "Information and decision support system", and a licence degree in information technology engineering from Damascus University, specialized in artificial intelligence. Currently, I'm interested in Arabic data analysis research and in all related AI techniques. I rely on NLP and data mining applied to the Arabic language to propose beneficial tools for business and academic purposes.