International Journal of Electrical and Computer Engineering (IJECE)
Vol. 6, No. 4, August 2016, pp. 1897~1906
ISSN: 2088-8708, DOI: 10.11591/ijece.v6i4.10403
Journal homepage: http://iaesjournal.com/online/index.php/IJECE
Inferring Student's Chat Topic in Colloquial Arabic Text Using Semantic Representation

Faisal T. Khamayseh
Department of Information Technology and Computer Engineering, Palestine Polytechnic University, Palestine
Article Info
Article history:
Received Mar 6, 2016
Revised May 22, 2016
Accepted Jun 6, 2016

ABSTRACT
Colloquial Arabic is now widespread in written form, which calls for describing the collection and classification of a multi-dialectal Arabic corpus. Colloquial Arabic comes in largely country-based forms such as Egyptian, Iraqi, Levantine, and Tunisian. This paper discusses a new method for analyzing conversations in an educational chat room using the Corpus for Palestinian Arabic and the Stanford Tagger. The method represents the keywords in a semantic net-like representation to obtain the main subjects of the conversation, and it determines the main subject of the chat with high accuracy. Using an Arabic corpus, the Stanford Tagger, and the percentage of keywords together improves accuracy. The study also examines the effect of the distribution of pivot words, based on the occurrences and betweenness values of the pivots throughout the text, and it examines some characteristics of texts written in colloquial Arabic dialect and of free expressive Arabic statements. The results show that the core subject of the chat can be determined by combining the occurrences of a word with its distribution through the conversation.
Keyword:
Arabic chat
Arabic corpora
Colloquial analysis
Palestine Arabic corpus
Semantic net
Copyright © 2016 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Faisal T. Khamayseh,
Department of Computer Science, College of Information Technology and Computer Engineering,
Palestine Polytechnic University,
Hebron, P.O. Box 198, Palestine.
Email: faisal@ppu.edu.ps
1. INTRODUCTION
Social networking and social media platforms are growing rapidly in variety, in number of users, in usage, and in the volume of documents they carry. Examples include Skype, WhatsApp, Twitter, Facebook, Viber, IRC, blogs, and Myspace, to mention a few. Each of these networks provides a chat platform for its large number of users. Some platforms serve a specific scope such as study and research, while others, such as open conversation rooms, are shared with followers across various social media platforms. Some groups use open platforms to form their own closed social network, and others benefit more from specifically configured platforms such as LMS e-classes on Moodle, Illuminate, etc.
Nowadays, conversation on social media ignores the standard grammatical rules in almost all languages. Like most current languages, Arabic has two forms: the standard and the colloquial. The standard form is subject to firm rules that syntactically cover all forms of written and spoken statements. Colloquial Arabic is widely used as a spoken language and has lately also become widely used as a written language, especially in mobile messaging and web social media. Some recent attempts focus on analyzing this rule-free text and building some rules for it (rooting). Considerable work has been done by [1]-[4] in developing an Arabic Ontology to define the formal specification of the concepts of Arabic words related to Palestinian spoken and written conversations.
People may use their colloquial language while chatting in open social environments such as student e-classes and academic student-to-student portals. Search engines do not fully support the Arabic language in an acceptable form: the available search engines are typically limited to keyword searches based on the structures and rules of the language and do not take into account the semantics of the content [5]. The challenge lies in the correct analysis of both forms, the standard Arabic and the colloquial Arabic texts. Despite the demand for correct use of standard language phrases, especially in written text, the colloquial dialects impose themselves on the current internet world.
There are many educational chat rooms that bring together different students and host conversations on different subjects. Some universities use educational websites to support discussions between students and teachers. These conversations are useful for evaluating the students' understanding of a specific subject. In this paper, we suggest a new method to extract the main subjects of the conversations that students engage in. The method analyzes the students' chat by converting the words of the chat to the equivalent gloss in the Corpus for Palestinian Arabic; the system then computes which words account for the highest percentage of occurrences in the given conversation.
2. BACKGROUND
Newly emerging spelling errors and other inaccurate terms are perceived as acceptable in online communication. Text-based chat is now available as one of the most valuable communication tools on the web. As a matter of fact, normal chat conversations do not conform to language rules. Different languages have different levels of colloquial words in spoken or written conversations.
2.1. Chat Text Analysis
Recently, various modified tools and algorithms have been proposed that focus on ontology analysis, such as the use of lexico-syntactic patterns [6]. Recent research attempts to achieve better analysis of chat text and network representation of chat-log data [7]. Contemporary analysis acknowledges the characteristics of chat messages and proposes an indicative term-based categorization approach for chat topic detection [8]. The benefit of these analytical studies is that they enable tracing of people's relay chat and avoid the vast effort of continuous monitoring. Although this analysis may not be 100% accurate in its outcomes, it provides evidence and assists in accomplishing the desired goals.
Arabic instant chats also lack analysis, especially colloquial spoken/written Arabic conversation. As with most languages, it is easier to analyze standard Arabic, since standard statements are governed by fixed structures and rules. Although many studies deal with standard Arabic, much is still lacking in phonology, morphology, and syntax analysis. Arabic is characterized by its complexity, which stems from its rich "root-and-pattern" morphology and from ambiguity caused by the absence of short vowels in most Arabic texts. Moreover, Arabic words carry productive clitics and affixes, and some root letters can be hard to guess if one or two root letters are long vowels or could belong to affixes or clitics [9]. A notable effort by [10] presents a statistical study of clitics in Arabic to show the distribution of clitics and to examine the performance of the tokenizer used. They applied clitic tokenization to a large Arabic corpus and showed that a reduction of 24.54% in the number of unique tokens could be achieved. As in any other language, deep or shallow syntactic analysis of free conversations requires large corresponding corpora [11].
2.2. Corpus for Palestinian Arabic
Some minor differences exist among the Palestinian spoken dialects, although they share the same linguistic assets. The main differences are in phonology and in lexicon preferences, which vary among the major historical areas of Palestine. Many words in the corpus are annotated in context, because they have different annotations in different contexts [12]. The study defined the annotation of a word as a tuple: <Raw (Unicode), Raw (Buckwalter), CODA (Unicode), CODA (Buckwalter), Lemma, Buckwalter POS, Gloss and Analysis>. The gloss of each word in the chat is required to obtain the equivalent English word. To speed up the annotation process, the corpus builders used the MADAMIRA tool for morphological analysis and disambiguation of Modern Standard Arabic (MSA) and Egyptian Arabic (EGY). MADAMIRA was chosen on the assumption that EGY/MSA and Palestinian Arabic (PAL) share many orthographic and morphological features. As a recent work, [12] constructed a corpus consisting of words from different Palestinian resources and presented several pilot studies to select the best tool for speeding up their annotation process.
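The eight-field annotation tuple described above can be pictured as a small record type. The following is only a minimal sketch, assuming hypothetical field names and an illustrative entry whose values are not taken from the corpus; the `gloss_lookup` helper is likewise an assumption about how a gloss might be retrieved, not the corpus's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One annotated word, mirroring the eight fields of the tuple in the text."""
    raw_unicode: str
    raw_buckwalter: str
    coda_unicode: str
    coda_buckwalter: str
    lemma: str
    buckwalter_pos: str
    gloss: str
    analysis: str


def gloss_lookup(entries, word):
    """Return the gloss of the first entry whose raw or CODA form equals `word`."""
    for entry in entries:
        if word in (entry.raw_unicode, entry.coda_unicode):
            return entry.gloss
    return None  # unknown word: the study glosses such words manually


# Hypothetical entry; every field value here is illustrative only.
sample = Annotation("امتحان", "AmtHAn", "امتحان", "AmtHAn",
                    "AimotiHAn", "NOUN", "exam", "AmtHAn/NOUN")

print(gloss_lookup([sample], "امتحان"))
```

Returning `None` for an unknown word keeps the manual-glossing step visible to the caller instead of hiding it behind a default value.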
2.3. Arabic Corpora
In its standard form, the Arabic language has very strict and powerful rules governing words and their morphological and syntactic structure. In its spoken form, Arabic encompasses many dialects scattered all over the Arab world. Different reference corpora are now available to enable computational linguistics and word analysis, including Adjir Corpora, Tashkeela, Arabic Word Corpora, Alwatan, and OSAC, to mention a few [13]. Arabic corpora are currently one of the most active areas of language research. Many recent studies investigate annotated linguistic resources that describe the grammar, syntax, and morphology of Arabic words. Some linguistic studies and analyses serve the standard mother language, and just a few of them serve local dialects [13]-[17]. Various dialects share many morphological and syntactic structures, especially in neighboring countries that share common traditions, such as the Levantine area, the Gulf area, and the Maghreb (Morocco) countries. The similarity among local dialects makes the studies fairly limited to some areas.
2.4. Social Conversations
Factors such as the participants, the topic, the function of the interaction, and the value of the interaction affect the level of dialect in a conversation [18]. Social media and social channels also affect the language and have become major sources of uncontrolled new words. Social media has prompted a powerful yet subtle revolution in conversations, and educational platforms are not far from this accelerating linguistic revolution. However, there is an urgent need for analytical studies to assess the reality, the effect, and the value of social conversations, especially in the context of learning.
2.5. Educational Chat Rooms
An educational chat room is the use of technological tools for learning and for sharing information via text with groups of students, offering real-time transmission of text. Chat rooms enable many students to converse with each other in the same conversation from websites. Students in an educational chat room are generally connected via a shared interest in education. A chat room intended for students' conversations usually has rules and instructions that students are required to follow. Commonly used chat rooms are not moderated, so students may chat freely, which may lead to long, useless conversations [19].
2.6. Stanford Tagger
The Stanford Tagger is a piece of software that reads text in some language, in our case English, and assigns a part of speech to each word, such as noun, verb, adjective, etc. All user input is processed with the Stanford Tagger by writing the statements in the text area provided for that purpose. The Stanford Tagger uses a set of tags to describe the different components of a statement. Improvements in the features, parameters, and learning methods give small incremental gains in POS tagging performance, in addition to splitting certain part-of-speech and phrasal categories and parsing with the resulting split-category Treebank grammar [20].
The Stanford Tagger tokenizes the statements and uses a large number of tags. Since the input text is in colloquial Arabic, which differs considerably from standard Arabic, the text first has to go through a special corpus to produce its standard Arabic or English equivalent. The tags that this study uses in the analysis phase are then produced with the Stanford Tagger. Table 1 shows the tags produced by the Stanford Tagger that are used in the proposed approach.
Table 1. The Stanford Tags
Tag   Description                                   Tag   Description
CC    conjunction, coordinating                     VB    verb, base form
IN    preposition or conjunction, subordinating     VBD   verb, past tense
JJ    adjective or numeral, ordinal                 VBN   verb, past participle
NN    noun, common, singular or mass                VBP   verb, present tense, not 3rd person singular
NNP   noun, proper, singular                        VBZ   verb, present tense, 3rd person singular
3. ANALYSIS OF GLOSS CHAT AND RANK OF PIVOT WORDS
As its applied methodology, the paper proposes an approach to decide the main subjects of a conversation between students based on keywords extracted from the Arabic chat text. The first step requires collecting the free conversations of all students in a selected, controlled learning blog. To narrow the text analysis, a single topic is selected. The students' statements are then converted to the corresponding gloss using one of the available Arabic corpora, mainly the Palestine Arabic Corpus. For this step, a limited temporary corpus was constructed based on [1], [12], since the real corpus was not available at the date of submitting this paper for publication. Then, we analyze the gloss of the words to get the tags using the Stanford Tagger. We accept the words that have noun and adjective tags (NN, DTNN, JJ). After that, the algorithm counts the occurrences of each noun and adjective in the chat text and computes each word's percentage of the total. As a decision, the words ranked with the highest percentages are chosen as the most probable conversation topics. The following algorithm summarizes the main tagging and ranking steps, which are illustrated in Figure 1.
Algorithm pseudo code: Tagging and ranking chat gloss keywords
1. Convert chat text to gloss list using Corpus for Palestinian Arabic (CPA).
2. Obtain tags corresponding to gloss words using Stanford Tagger.
3. Determine the set of candidate words including nouns and adjectives.
4. Count the occurrences of each keyword.
5. Place keywords in descending order of occurrence.
6. Obtain the decision from the high occurrence values.

Figure 1. Methodology block diagram
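The counting and ranking steps of the pseudo code can be sketched in a few lines of Python. This is a minimal illustration, not the system used in the study: it assumes the gloss conversion and tagging have already been done, takes the (gloss, tag) pairs of chat I as given, and treats NN, NNP, DTNN, JJ, and DTJJ as the candidate tags (the function and variable names are hypothetical).

```python
from collections import Counter

# Tags treated as nouns/adjectives; this exact set is an assumption of the sketch.
CANDIDATE_TAGS = {"NN", "NNP", "DTNN", "JJ", "DTJJ"}


def rank_keywords(tagged_gloss):
    """Return (gloss, count, percent) triples for candidate words, most frequent first."""
    counts = Counter(word for word, tag in tagged_gloss if tag in CANDIDATE_TAGS)
    total = sum(counts.values())
    return [(word, n, round(100.0 * n / total, 1))
            for word, n in counts.most_common()]


# (gloss, tag) pairs of chat I as tagged in the worked example.
chat_one = [
    ("Exam", "NN"), ("Hard", "JJ"), ("Very", "PRT"),
    ("Late", "VBN"), ("On", "IN"), ("Exam", "NN"),
    ("Because", "PRT"), ("Transport", "NN"),
    ("No", "PRT"), ("Time", "NN"), ("Enough", "PRT"),
    ("Solve", "VBN"), ("In", "IN"), ("Exam", "NN"),
]

for word, count, percent in rank_keywords(chat_one):
    print(f"{word:10s} {count}  {percent}%")
```

On this input the top-ranked word is Exam with 3 of the 6 candidate occurrences (50%), followed by Hard, Transport, and Time with one occurrence each (16.7%).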
The following example illustrates the proposed method using a conversation among three students.

Students chat I:
Ibrahim: امتحان صعب جدا
Obada: تأخرت على امتحان عشان المواصلات
Ali: فش وقت كفاية احل فيه امتحان
In the first step, we convert the statements of the students' conversation to the corresponding gloss words as in Table 2:
Table 2. Corpus Gloss
Chat Statement                         Gloss
امتحان صعب جدا                          Exam, Hard, Very
تأخرت على امتحان عشان المواصلات           Late, On, Exam, Because, Transport
فش وقت كفاية احل فيه امتحان               No, Time, Enough, Solve, In, Exam
The second step obtains the tags of the gloss words using the Stanford Tagger, as listed in Table 3:
Table 3. Stanford Tags
Chat Gloss   Tag
Exam         NN
Hard         JJ
Very         PRT
Late         VBN
On           IN
Exam         NN
Because      PRT
Transport    NN
No           PRT
Time         NN
Enough       PRT
Solve        VBN
In           IN
Exam         NN
In the last step, the system applies a counting algorithm that takes the list of word strings and counts the occurrences of each noun or adjective across the whole conversation, as in Table 4:
Table 4. Percentage of tagged words of chat I
Gloss       Tag   Occurrences   Percentage
Exam        NN    3             50%
Hard        JJ    1             16.7%
Transport   NN    1             16.7%
Time        NN    1             16.7%
Table 4 shows the candidate keys, which are the ones with high counts. While counting the occurrences of words provides a good hint towards a sound decision about the context of the conversation, it may also mislead the decision. Figure 2 illustrates the main pivot words and their rates based on their occurrences in the conversation.
Figure 2. Occurrences of pivot words of chat I
The counts of key words in a longer conversation lead to a more correct analysis with fewer errors or misleading results. The example in student chat II shows conversation statements on the same topic between other regular students in a different chat group. The conversation, held by students using eclass.ppu.edu, was slightly updated, for instance by making it longer:
Student chat II:
Ahmed: عين امتحان قبل كم يوم مدرس الخوارزميات
Sara: بس ما اختصر من مادة امتحان و اشي
Omar: كمان اسبوع يا ريت لو نعرف نأجلو
Sara: ھاد مدرس امتحاناتو خبط والخوارزميات صعبة جدا
Ahmed: انا خايف ارسب في امتحان زي ما رسبت في امتحان الي قبلو
Omar: بكرة حشوف المدرس اذا بيعينو بعد امتحانات العامة
Ahmed: ھذا اذا برد علينا رح يزيد مادة ورح يجيب لنا خوارزمية ما بتنفھمش
Omar: فھمت
Tokenizing the chat text using the Stanford Tagger, MADAMIRA, or any other tool results in many errors or tagging mistakes due to the high occurrence of unknown words. Non-standard spoken or written Arabic gains many of these unknown words day by day. We update the gloss of unknown colloquial Arabic words manually based on the Palestinian Arabic Corpus [1], [12]. Table 5 shows the total number of occurrences of each pivot word in the text of chat II. When the chat text is small, tagging and word analysis are more error prone. The analysis shows that the key Exam occupies the first place in the conversation due to its high occurrence count.
Table 5. Nouns and adjectives of students chat II
Pivot word/Tags                                      Occurrences   Rank
Instructor: [NN, NN, DTNN]                           3             17.6%
Algorithm: [DTNN, DTNN, JJ]                          3             17.6%
Exam: [DTNN, DTNN, DTNN, DTNN, DTNN, NNP]            6             35.3%
Day: [NN]                                            1             6%
Material: [NN, NN]                                   2             11%
Week: [NN]                                           1             6%
General: [DTJJ]                                      1             6%
Figure 3 shows the occurrences of the pivot words in the chat. Only nouns and adjectives are selected, since these word types have the main influence on the conversation.
Figure 3. Occurrence of pivot words of chat II
The results show that the decision about the subject of the chat is taken according to the highest percentage among all candidate words (nouns and adjectives). In the previous examples, the exam occupies the first rank in the students' chats because it has the highest percentage. In reality, this does not exclude other, lower-ranked topics; rather, the highest rate is the most probable. Since the highlighted gloss words are nouns and adjectives, there is no need to analyze and count the occurrences of other words; tags other than nouns and adjectives are deliberately ignored. However, taking pronouns and abbreviations into account may lead to more accurate results. Many factors have to be considered to ensure more accurate results, such as the scope, the time of the conversation, previous chats, the participants, etc. For example, college students using a university e-class are more likely to talk about course material and exams during the last two weeks of the semester. These factors should be taken into consideration when analyzing and counting the occurrences.
4. CONSECUTIVE REPETITIVE WORDS AND DISTRIBUTION FACTORS
To avoid being misled by consecutive word repetition, the analyst may consider only one of the repeated words. For example, the chat statement "… unfortunately I have been studying math, math, math and math during last days. I had no time to study on compiler exam" does not necessarily mean that the topic is math; the conversation is more likely about the compiler exam. Furthermore, to strengthen the algorithm, researchers may study the relationship graph model among the tokens. The relay of the keywords in the conversational text, as well as the relay of pronouns, helps in determining the conversational context with reference to the keywords. Usually the distribution of the keywords throughout the chat text is a stronger indication than consecutive repetition that occurs in some limited parts. This case is illustrated in Figure 4, which represents a students' conversation talking about a laboratory exam of a biology course.
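One simple way to apply the "consider only one of the repeated words" rule is to collapse each run of identical consecutive keywords before counting. The sketch below assumes the noun keywords of the example statement have already been extracted by hand into a list (the list contents and the function name are illustrative), and it uses `itertools.groupby` from the Python standard library to merge the runs.

```python
from collections import Counter
from itertools import groupby


def collapse_runs(keywords):
    """Keep a single copy of every run of identical consecutive keywords."""
    return [word for word, _run in groupby(keywords)]


# Hand-extracted noun keywords of the example statement, in order of appearance.
keywords = ["math", "math", "math", "math", "day", "time", "compiler", "exam"]

raw_counts = Counter(keywords)
collapsed_counts = Counter(collapse_runs(keywords))

print(raw_counts["math"], collapsed_counts["math"])
```

After collapsing, "math" no longer outweighs "compiler" and "exam" merely because it was repeated in one place; a keyword that recurs in separate parts of the chat would still be counted once per run.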
Students chat III:
Sara: متى امتحان مختبر احياء النصفي؟
Omar: للأسف بكرة امتحان برمجة نظري وبعدو امتحان مختبر برمجة وبعدو تسليم مشروع البرمجة وبعدو الخميس مختبر احياء.
Sara: يا سلام. كل اسبوع برمجة في برمجة في برمجة.
Ahmed: مختبر احياء النصفي سھل وما يحتاج تقلق كتير.
Sara: ھو صباحي؟
Ahmed: بعد محاضرة اساليب بنقدم مختبر احياء النصفي.
It is obvious that the students are talking primarily about biology, even though the word "programming" has a higher count. This can be represented using a semantic net-like structure in which the occurrences of each gloss are considered. Going through the semantic net and looking for higher counts gives an indication of the chat topic. The distribution of keys over the chat text, as illustrated in the semantic net, gives a stronger indication, even when a key has a lower occurrence count than some repetitive words. To achieve this, it is helpful to first scan the text for a small set of pivot words with high occurrences, and then compute the betweenness values of identical words, forming a set of landmark representations. The best representation combines high occurrences with large betweenness values between landmark words. These properties are extremely important and should therefore be considered in defining the chat topic. Figure 4 shows the semantic net of the above example. Pronouns are also considered, with the diamond symbol representing the equivalent words. All other words, such as connectors and verbs, are deliberately dropped due to their weak participation in the analysis phase.
Figure 4. Semantic net-like of pivot words of chat III
General algorithm: Inferring the core of the chat using semantic-net analysis
Input: Chat text.
Process: Computing the weights of keywords using semantic net and occurrence counts.
Output: Chat core.
1. Convert chat text to gloss list using Corpus for Palestinian Arabic.
2. Obtain tags corresponding to gloss words using Stanford Tagger.
3. Determine the set of candidate words including nouns and adjectives.
4. Construct the semantic net representation.
5. Count the occurrences of each pivot word.
6. Determine the set of keywords with the high occurrences.
7. Determine the betweenness value of pivot words.
8. Place keywords in order of these values to obtain the decision.
According to the semantic net graph shown in Figure 4 and Table 6, the candidate landmark words with their distribution counts are:
Lab: مختبر: count = 4 + 1 (Pronoun) = 5,
Programming: برمجة: count = 6 + 1 (Pronoun) = 7,
Biology: احياء: count = 4 + 1 (Pronoun) = 5,
Midterm: النصفي: count = 4 + 1 (Pronoun) = 5
Table 6. Pivot Landmarks Distribution (nouns and adjectives)
It is obvious that the word Biology extends over the whole chat: it combines a high count (though not necessarily the maximum) with a large betweenness value. The word "Programming" has a higher count but a very low betweenness value. The tradeoff between word counts and betweenness values gives a strong indication for determining the core of the text. Table 6 shows the landmark distribution of the pivot words, with their different betweenness values, in the adjacency list of the chat words.
Figure 4 shows the consecutiveness of the pivot words, while Table 6 shows the strength of their distribution. This is an important step: a word that occurs throughout the chat is more likely to be selected than a word that occurs only in some limited parts of the text. Another feature is that words that occur together have a very high probability of being the topic if they co-occur in different parts of the chat text. In the above example, Table 6 shows that the best landmark set of related words is Biology, lab, and midterm. The question that the text starts with is about the Biology lab midterm exam; this should also be taken into consideration.
Several conversations were analyzed, in which different university student groups were asked to take part in chat blogs using the available e-class templates. It is important to mention that the conversations used in this study are relatively small, with a maximum of 18 lines. Table 7 summarizes the analysis of these outcomes of 4 groups ranging from 3 to 7 students chatting several times using the e-learning templates.
Table 7. The percentage of correctness of inferring chat topic
Group/Lines   G1 (2 students)   G2 (3 students)   G3 (4 students)   G4 (5 students)   G5 (6 students)   G6 (7 students)
7 Lines       50                55                70                74                76                84
9 Lines       60                65                72                77                78                86
12 Lines      70                70                75                80                80                87
18 Lines      73                74                80                82                86                91
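The trend in Table 7 can be checked with a short calculation. The sketch below copies the correctness percentages from the table and averages them per conversation length; the averaging itself is an added illustration and is not a statistic reported in the study.

```python
# Correctness percentages copied from Table 7.
# Keys are conversation lengths in lines; values are groups G1..G6 in order.
table7 = {
    7:  [50, 55, 70, 74, 76, 84],
    9:  [60, 65, 72, 77, 78, 86],
    12: [70, 70, 75, 80, 80, 87],
    18: [73, 74, 80, 82, 86, 91],
}


def mean_by_length(table):
    """Average correctness over all groups for each conversation length."""
    return {lines: sum(values) / len(values) for lines, values in table.items()}


means = mean_by_length(table7)
for lines in sorted(means):
    print(f"{lines:2d} lines: {means[lines]:.1f}%")
```

The averages rise from about 68% at 7 lines to 81% at 18 lines, and within each row the percentage does not decrease as the group size grows.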
Figure 5 illustrates the correctness percentage as the number of chat lines in the conversation grows. It is clear that correctness improves as the conversation gets longer. However, very long conversations may cover several different major topics, a case that this study has not yet analyzed.
Figure 5. Correctness of inferring chat topic
5. CONCLUSION
This paper proposes an approach to infer the subject of educational student chat. The study suggests a way to determine the core of colloquial Arabic student chat text. The approach starts by extracting the equivalent English gloss of each word using the Corpus of Palestinian Arabic. The second step obtains the tag of each word using the Stanford Tagger, focusing on the nouns and adjectives that occur in the conversation. After that, the approach counts the occurrences of each word in all conversations and decides the main subject of the written conversation based on the nouns and adjectives with the highest percentage of occurrences. Semantic analysis and consideration of the other words help in achieving higher accuracy.
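The three steps summarized above (gloss lookup, tagging, and counting nouns and adjectives) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the `GLOSS` dictionary is a hypothetical stand-in for the Corpus of Palestinian Arabic, the `TAGS` dictionary is a hypothetical stand-in for Stanford Tagger output (Penn Treebank style tags), and the transliterated tokens are toy data rather than material from the study.

```python
from collections import Counter

# Hypothetical toy glossary: transliterated colloquial token -> English gloss.
# A stand-in for a lookup in the Corpus of Palestinian Arabic.
GLOSS = {
    "imtihan": "exam",
    "mukhtabar": "lab",
    "ahya": "biology",
    "saab": "hard",
    "mata": "when",
    "fi": "in",
}

# Hypothetical tags for the glosses (Penn Treebank style).
# A stand-in for the output of a part-of-speech tagger.
TAGS = {
    "exam": "NN",
    "lab": "NN",
    "biology": "NN",
    "hard": "JJ",
    "when": "WRB",
    "in": "IN",
}


def infer_topic(lines, gloss=GLOSS, tags=TAGS, top_n=3):
    """Return the top_n noun/adjective glosses with their share of occurrences."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            word = gloss.get(token)
            if word is None:
                continue  # token has no gloss in the toy glossary
            # Keep only nouns (NN*) and adjectives (JJ*).
            if tags.get(word, "").startswith(("NN", "JJ")):
                counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    return [(word, count / total) for word, count in counts.most_common(top_n)]


chat = [
    "mata imtihan ahya mukhtabar",
    "imtihan ahya saab",
    "fi mukhtabar imtihan",
]
result = infer_topic(chat)
print(result)
```

In this toy chat the gloss "exam" accounts for the largest share of noun and adjective occurrences, followed by "biology" and "lab", so these three would be reported as the candidate subject words.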
The distribution of the pivot words across the conversation has a very high impact on deciding the core of the text. Many other factors, such as the time of the conversation, the starting statements, and the platform used, may have a direct impact on the accuracy of determining the nature of the content. Although human intervention should be avoided, it also helps in obtaining better outcomes, for example in correcting the glosses and tags that correspond to colloquial words, pronouns, and unexplained terms.
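One way to picture how occurrences and distribution can be combined is sketched below. The measures are assumptions chosen for illustration, not the paper's definitions: `coverage` is the fraction of lines that contain the pivot, `spread` is the fraction of the conversation spanned between its first and last appearance, and the combined `score` is simply frequency multiplied by spread.

```python
def pivot_scores(lines, pivots):
    """Score each pivot word by its frequency and how widely it is distributed.

    All measures here are illustrative assumptions:
      frequency = share of all pivot occurrences belonging to this pivot
      coverage  = fraction of lines containing the pivot
      spread    = fraction of the conversation between first and last hit
      score     = frequency * spread
    """
    tokenized = [line.split() for line in lines]
    n = len(tokenized)
    total = sum(tokens.count(p) for tokens in tokenized for p in pivots)
    scores = {}
    for p in pivots:
        occurrences = sum(tokens.count(p) for tokens in tokenized)
        hits = [i for i, tokens in enumerate(tokenized) if p in tokens]
        frequency = occurrences / total if total else 0.0
        coverage = len(hits) / n if n else 0.0
        spread = (hits[-1] - hits[0] + 1) / n if hits else 0.0
        scores[p] = {
            "frequency": frequency,
            "coverage": coverage,
            "spread": spread,
            "score": frequency * spread,
        }
    return scores


# Toy conversation, already reduced to English glosses (hypothetical data).
conversation = [
    "exam biology lab",
    "exam exam hard",
    "lab time",
    "biology exam",
]
scores = pivot_scores(conversation, ["exam", "lab", "hard"])
best = max(scores, key=lambda p: scores[p]["score"])
print(best, scores[best])
```

Under these assumed measures, a word that is both frequent and present from the beginning to the end of the chat ranks above a word that appears in only one line, which mirrors the observation that occurrences alone are not sufficient.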
Further studies and experiments should rely on a real, comprehensive colloquial corpus. Analyses of the distribution of all types of words and of the semantic relations in longer chats should be taken into account to obtain more accurate results.
BIOGRAPHY OF AUTHOR

Dr. Faisal Khamayseh is an assistant professor of Computer Science. He received his BS in Computer Information – Advanced Computer Careers from Southern Illinois University, USA, in 1992, and his MS in Computer Science from the same university in 1995. He received his PhD in Computers and Information Systems from the College of Computers and Information, Helwan University, Egypt, in 2009. He currently works at Palestine Polytechnic University as an instructor and head of the Department of Information Technology, and as an instructor in the MS in Informatics program. Dr. Khamayseh is a researcher in the Software Engineering Research Group (SERG) at the College of Information Technology and Computer Engineering. He is interested in computer algorithms, software engineering, and e-learning. His LiveDNA is 970.11840.