TELKOMNIKA, Vol.14, No.4, December 2016, pp. 1472~1479
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v14i4.4026
Received May 20, 2016; Revised October 28, 2016; Accepted November 12, 2016
Email Classification Using Adaptive Ontologies Learning
Suma T*1, KumaraSwamy Y. S2
1Department of CSE, JJT University, Rajasthan, India
2Department of Computer Science & Engineering, Nagarjuna College of Engg. & Technology, VTU, Karnataka, India
*Corresponding author, e-mail: tsumamurthy.cs@gmail.com1, yskldswamy@yahoo.co.in2
Abstract
Email is a means of communication in today's internet world: private companies and government or public-sector organizations all use email to communicate with their clients, and they can freely send any number of mails to their clients without disturbing them. Nowadays email is also a channel for advertising; some mail is spam, and many social mails arrive as well. Categorizing and handling large volumes of email is therefore an important task for researchers, who work in this field using natural language processing and ontology extraction. Users get frustrated handling many mails and reading them to find out whether any important mail is there; sometimes a user deletes many mails without reading them, and in that case an important mail containing information about, for example, a meeting or seminar may also be deleted. To avoid these scenarios, an automatic schedule-calendar update procedure is proposed. Concept extraction and concept clustering are done based on fuzzy logic: similar mail patterns are grouped into the same cluster, and if the similarity is less than a threshold value, a new cluster is defined. From the extracted concepts, the authors establish the relationships between them and generate the result. The computation overhead is also calculated for different sets of mails, and it is found that very little time is needed even for a large email data set.
Keywords: Concept vector, Feature SROIQ, Ontology
Copyright © 2016 Universitas Ahmad Dahlan. All rights reserved.
1. Introduction
Use of the internet is a habit for today's generation, and it is a basic need for every organization. All organizations use e-mail to communicate with their clients. Besides primary mail, companies send email for advertisement and promotion, and social messages also arrive. The number of emails received per day is more than 20 on average, so handling these emails is now a hurdle for the user. Reading every mail is not possible, and an important mail may then be left unread, which is not beneficial for the user. Various works on email categorization are ongoing, but they are not up to the mark: they only categorize the mail, and even after categorization the best result cannot be achieved because the number of emails is huge. Inbox handling is an important issue, so a better email management system is needed for handling a huge amount of email in one day. If a person opens their email account after 2 or 3 days and sees a lot of email there, it is human nature not to open each email one by one, and in that case some important meeting details may be missed. Machine learning technologies are used to classify emails based on their details, but before use a classifier must be trained with a set of samples, and preparing training samples requires a lot of labor [1].
Resource Description Framework (RDF), the web language recommended by the World Wide Web Consortium, defines the semantics of resources on the web [4]. Semantic web data carried in RDF can be queried using a language such as SPARQL [5]; another semantic web technology is the Web Ontology Language (OWL). Ontologies are frequently used in semantic web applications [6]. Even if two documents contain different words and have no word in common, a relation can still be established between them when their contexts are similar, by using a similar vector feature space [2]. Ontologies make email validation possible and are very helpful for the reusability of knowledge, since they allow knowledge to be shared and analyzed. They also help to separate common functionality from different knowledge areas. An ontology extracts the knowledge from email contents and defines the concepts that are generally found in email documents [3].
In this paper, the authors propose a method to generate useful information from email and update it in the calendar for any event, meeting, etc. For example, a user may receive many mails and not check them; with the auto-update technique the calendar is still updated, so the user need not worry about reading each mail. This is achieved by using an ontology based on fuzzy logic, together with fuzzy-based feature extraction and concept finding.
A relationship is established between the obtained concepts and features, and clustering of the received mail is also done using fuzzy logic. Based on similarity, self-clustering and concept clustering are defined, which reduce the effort of email classification through concept reduction. The distribution of words in the email set is represented using concept vectors, which are processed one by one; if two words are found to be similar, they are kept in the same cluster. Using the mean and deviation, a membership function is defined for each cluster. A concept vector is defined for each cluster, and if a word is not similar to any existing cluster, a new cluster is defined for that word. The number of concepts to be fetched is not specified in advance.
The rest of the paper is organized as follows: Section 2 covers research work in the field of email classification, Section 3 describes the concepts used for email classification and the proposed method, Section 4 presents the simulation results and analysis, and Section 5 concludes the paper.
2. Related Work
Based on a morphological model, a system has been developed that takes an Arabic word as input; that is, the system utilizes morphological Arabic Natural Language Processing (ANLP) and translates Arabic into English [21]. Word Sense Disambiguation (WSD) is crucial, and its significance is prominent in every application of computational linguistics; WSD is a challenging problem of Natural Language Processing (NLP) [22]. That work uses a hierarchical clustering algorithm with different similarity measures, such as cosine similarity. NLP is a powerful mechanism that can be used in various applications; here we use NLP for email data processing. Semantic email has become very popular, and managing semantic email tasks has gained the focus of researchers. At the time of integration, different meanings are used for the notation, since email is full of metadata [7].
Ontologies have been constructed for email classification: features are calculated from the email data set and used to train the classifier, which uses a feature vector to classify emails into different categories [8]. A logical and decision-theoretic model is defined in [9], where a set of updates is given to the email according to certain utilities and constraints. The inference problem is described there: the logical model takes care of it based on acceptable email responses, and under decision theory it is possible to generate the optimal message-handling rule in polynomial time. A Support Vector Machine (SVM) has been used for the classification and filtering of email based on a text classifier, to find unwanted email [10]; the main motive of this approach is to prevent unusual and harmful email from spreading in our system and damaging it. In [11], the authors focused on the most common tasks performed with email: read, reply, delete, and delete without reading.
These four tasks are predicted using learning, for which both horizontal and vertical learning are used. Clustering techniques are used to group similar kinds of emails, since such emails have similar attributes; here, text mining is used to find emails of a similar type, and these emails are then clustered [12]. Using speech act theory [13], emails are categorized according to the sender's intention in the email system. Fuzzy-based spam filtering for email, which does not require training on a data set in advance, uses a clustering mechanism based on fuzzy logic [14]. In [19], the authors present latent semantic analysis for finding similar words in poetry and analyzing their emotional relatedness. In [20], based on semantic language, the author proposes a co-occurrence search engine for Chinese-Tibetan and a monitor based on semantic language, but no detailed idea about semantic annotation is given.
3. Proposed Work for Email Clustering
3.1. Concept Vector Sensor
A concept vector is associated with an ontology in terms of the entities extracted from the email corpora. A concept vector is a linker between entities in an email content or document. Both semantic and lexical analysis are considered in the formation of the concept vector, and the terminologies used in the email corpora are related to the concept. The concept formation approach is the extraction of the basic terminology used in email that is relevant to most of the emails; these basic terminologies can be called candidates.
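Read as a document-frequency rule, the candidate step above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `candidate_terms` and its `min_fraction` threshold are hypothetical, tokens come from a plain lower-case whitespace split, and "relevant to most of the emails" is taken to mean "occurs in at least `min_fraction` of them".

```python
from collections import Counter


def candidate_terms(emails, min_fraction=0.5):
    """Return terms that occur in at least `min_fraction` of the emails.

    Hypothetical sketch: tokenization is a lower-case whitespace split and
    relevance is measured by document frequency.
    """
    doc_freq = Counter()
    for text in emails:
        # count each term at most once per email (document frequency)
        doc_freq.update(set(text.lower().split()))
    cutoff = min_fraction * len(emails)
    return sorted(term for term, df in doc_freq.items() if df >= cutoff)


emails = [
    "meeting on monday at noon",
    "project meeting moved to tuesday",
    "lunch on friday",
]
print(candidate_terms(emails))  # ['meeting', 'on']
```

Stop words such as "on" pass this test too, so in practice a stop-word filter (or the POS information discussed in Section 3.4) would be applied first.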
3.2. Concept Clustering Algorithm
Let $N = \{e_1, e_2, \ldots, e_n\}$ be a set of emails, together with a concept vector $\langle w_1, w_2, \ldots, w_m \rangle$ of the mail. Each email has its own specific properties, and some emails share properties, so the set of mails is divided into groups $g_1, g_2, \ldots, g_p$. We make one concept pattern for each concept in the concept vector. For concept $w_i$, its concept pattern $x_i$ is defined by:
$$x_i = \langle x_{i1}, x_{i2}, \ldots, x_{ip} \rangle = \langle P(g_1 \mid w_i), P(g_2 \mid w_i), \ldots, P(g_p \mid w_i) \rangle \tag{1}$$
where
$$P(g_j \mid w_i) = \frac{\sum_{q=1}^{n} e_{qi} \times \delta_{qj}}{\sum_{q=1}^{n} e_{qi}} \tag{2}$$
for $1 \le j \le p$. Here $e_{qi}$ indicates the number of occurrences of $w_i$ in e-mail $e_q$, and $\delta_{qj}$ is defined as 1 when email $e_q$ belongs to group $g_j$ and 0 if it does not belong to that group.
As an example of a concept pattern, suppose we have six emails $e_1, e_2, e_3, e_4, e_5, e_6$ belonging to groups $g_1, g_1, g_1, g_2, g_2, g_2$, respectively; that is, $e_1, e_2, e_3$ belong to $g_1$ and $e_4, e_5, e_6$ belong to $g_2$. Let the occurrences of $w_1$ in these emails be 1, 2, 3, 4, 5 and 6, respectively. The pattern $x_1$ of $w_1$ can then be calculated using Equation (2):
$$x_1 = \left\langle \frac{1+2+3}{21}, \frac{4+5+6}{21} \right\rangle = \left\langle \frac{6}{21}, \frac{15}{21} \right\rangle \approx \langle 0.3, 0.7 \rangle \tag{3}$$
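Equations (1) and (2) and the worked example can be checked with a short script. The helper name `concept_pattern` is hypothetical, and exact fractions are used so the result can be compared with (3) before rounding.

```python
from fractions import Fraction


def concept_pattern(counts, groups, num_groups):
    """Concept pattern x_i = <P(g_1|w_i), ..., P(g_p|w_i)> from Eq. (2).

    counts[q] is e_qi, the number of occurrences of concept w_i in email e_q.
    groups[q] is the 0-based group index of email e_q (None if it belongs to
    no group), so delta_qj = 1 exactly when groups[q] == j.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("concept does not occur in any email")
    pattern = []
    for j in range(num_groups):
        in_group = sum(c for c, g in zip(counts, groups) if g == j)
        pattern.append(Fraction(in_group, total))
    return pattern


# Worked example: e1..e3 in g1, e4..e6 in g2; w_1 occurs 1..6 times.
x1 = concept_pattern([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1], 2)
print(x1)                                # [Fraction(2, 7), Fraction(5, 7)]
print([round(float(v), 1) for v in x1])  # [0.3, 0.7]
```

Because 6/21 and 15/21 reduce to 2/7 and 5/7, the unrounded pattern is about ⟨0.29, 0.71⟩, which (3) reports to one decimal place.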
Our goal is to form clusters from these email patterns, that is, to combine the patterns obtained from $N$ into clusters according to their similarity. A cluster contains a certain number of email patterns, and its membership function is the product of one-dimensional Gaussian functions. Let $G$ be a cluster containing email patterns $x_1, x_2, \ldots, x_r$, where $x_j = \langle x_{j1}, x_{j2}, \ldots, x_{jp} \rangle$, $1 \le j \le r$.
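A minimal sketch of the cluster statistics and the Gaussian membership function, assuming (as the introduction says) that each cluster is summarized by the mean and deviation of its member patterns and scored by a product of one-dimensional Gaussians. The function names, the use of the population standard deviation, and the `min_sigma` floor (which keeps a one-member cluster from dividing by zero) are assumptions for illustration.

```python
import math


def cluster_stats(patterns):
    """Per-dimension mean and population standard deviation of a cluster."""
    size = len(patterns)
    dims = len(patterns[0])
    mean = [sum(x[q] for x in patterns) / size for q in range(dims)]
    sigma = [
        math.sqrt(sum((x[q] - mean[q]) ** 2 for x in patterns) / size)
        for q in range(dims)
    ]
    return mean, sigma


def membership(x, mean, sigma, min_sigma=1e-3):
    """Product over q of exp(-((x_q - m_q) / sigma_q) ** 2)."""
    value = 1.0
    for xq, mq, sq in zip(x, mean, sigma):
        sq = max(sq, min_sigma)  # assumed floor; avoids division by zero
        value *= math.exp(-(((xq - mq) / sq) ** 2))
    return value


G = [[0.3, 0.7], [0.5, 0.5]]  # toy cluster of two email patterns
mean, sigma = cluster_stats(G)
print(mean, sigma)
print(membership([0.4, 0.6], mean, sigma), membership([0.9, 0.1], mean, sigma))
```

A pattern at the cluster mean scores 1, and the score falls off quickly as any component moves away from the mean by more than a few deviations.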
3.3. Adaptive Clustering
Suppose the user has no idea about the number of existing clusters, and suppose no cluster exists at the beginning; clusters are created as needed. When a new cluster is defined, a membership function containing the details of that cluster is also generated. When an email pattern is combined into an existing cluster, the membership function of that cluster is updated according to the new entry.
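Initial conditions:
1. $k$: the number of clusters (no cluster exists at the beginning, so initially $k = 0$).
2. $G_1, G_2, \ldots, G_k$: the existing clusters.
3. $m_j = \langle m_{j1}, m_{j2}, \ldots, m_{jp} \rangle$: the mean of cluster $G_j$.
4. $\sigma_j = \langle \sigma_{j1}, \sigma_{j2}, \ldots, \sigma_{jp} \rangle$: the deviation of cluster $G_j$.
5. $|G_1|, |G_2|, \ldots, |G_k|$: the sizes of the clusters.
6. Matching, or similarity, between an email pattern $x_i$ and a cluster $G_j$.
Similarity calculation:
$$\mu_{G_j}(x_i) = \prod_{q=1}^{p} \exp\left[-\left(\frac{x_{iq} - m_{jq}}{\sigma_{jq}}\right)^{2}\right]$$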
We define a threshold value $\rho$ between 0 and 1; if we want to create larger clusters, the threshold is set to a small value. Pattern $x_i$ passes the similarity test on cluster $G_j$ if $\mu_{G_j}(x_i) \ge \rho$. As $\rho$ increases, the similarity test becomes stricter, fewer patterns are merged, and the number of clusters grows. If $x_i$ does not pass the similarity test on any existing cluster, a new cluster is formed for it.
A newly formed cluster $G_{k+1}$ has only one member, the email pattern $x_i$, so its mean is $m_{k+1} = x_i$, its size is $|G_{k+1}| = 1$, and the number of clusters becomes $k + 1$. If $x_i$ passes the similarity test, no new cluster is formed; instead, $x_i$ is added to the existing cluster to which it is most similar. The email patterns are sorted in decreasing order, and for each pattern the similarity is calculated and compared with all the existing clusters.
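The adaptive clustering steps above can be put together in a compact sketch. It follows the text (no cluster at the start, similarity test μ ≥ ρ, join the most similar passing cluster or open a new one) but fills in details the paper does not give, all hypothetical: patterns are visited in decreasing order of their largest component, a new cluster starts with deviation `sigma0`, and after each merge the mean and deviation are recomputed from the stored members with `sigma0` added to the deviation.

```python
import math


def membership(x, mean, sigma):
    """Similarity test value: product of one-dimensional Gaussians."""
    return math.prod(
        math.exp(-(((xq - mq) / sq) ** 2)) for xq, mq, sq in zip(x, mean, sigma)
    )


def adaptive_clustering(patterns, rho, sigma0=0.25):
    """Self-constructing clustering of email (concept) patterns, as a sketch.

    No cluster exists at the start.  A pattern joins the most similar cluster
    among those it passes (membership >= rho); otherwise it opens a new
    one-member cluster.  sigma0 is a hypothetical initial/extra deviation.
    """
    clusters = []
    # hypothetical visiting order: decreasing largest component
    order = sorted(range(len(patterns)), key=lambda i: max(patterns[i]), reverse=True)
    for i in order:
        x = patterns[i]
        scores = [membership(x, c["mean"], c["sigma"]) for c in clusters]
        passed = [j for j, s in enumerate(scores) if s >= rho]
        if not passed:
            # similarity test failed everywhere: new one-member cluster
            clusters.append({"members": [i], "mean": list(x), "sigma": [sigma0] * len(x)})
            continue
        best = clusters[max(passed, key=lambda j: scores[j])]
        best["members"].append(i)
        members = [patterns[k] for k in best["members"]]
        size, dims = len(members), len(x)
        best["mean"] = [sum(m[q] for m in members) / size for q in range(dims)]
        best["sigma"] = [
            math.sqrt(sum((m[q] - best["mean"][q]) ** 2 for m in members) / size) + sigma0
            for q in range(dims)
        ]
    return clusters


patterns = [[0.3, 0.7], [0.25, 0.75], [0.9, 0.1], [0.85, 0.15]]
for rho in (0.2, 0.99):
    print(rho, [c["members"] for c in adaptive_clustering(patterns, rho)])
# 0.2 [[2, 3], [1, 0]]
# 0.99 [[2], [3], [1], [0]]
```

With ρ = 0.2 the two similar pairs merge into two clusters, while ρ = 0.99 keeps every pattern in its own cluster, in line with the statement that a larger threshold yields more clusters.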
Concept Extraction
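Pattern extraction can be expressed in the following form:
$$N' = N S \tag{4}$$
where
$$N = [e_1 \; e_2 \; \cdots \; e_n]^{T} \tag{5}$$
$$N' = [e'_1 \; e'_2 \; \cdots \; e'_n]^{T} \tag{6}$$
$$S = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1c} \\ s_{21} & s_{22} & \cdots & s_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ s_{m1} & s_{m2} & \cdots & s_{mc} \end{bmatrix} \tag{7}$$
with
$$e_i = [e_{i1} \; e_{i2} \; \cdots \; e_{im}] \tag{8}$$
$$e'_i = [e'_{i1} \; e'_{i2} \; \cdots \; e'_{ic}] \tag{9}$$
for $1 \le i \le n$. Clearly, $S$ is a weighting matrix. The aim of pattern reduction is achieved by finding an appropriate $S$ such that $c$ is smaller than $m$.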
Using the clustering algorithm, the concept patterns have been grouped into clusters, and the concepts in the pattern vector $V$ are clustered accordingly. One pattern vector is assigned to each cluster, so different clusters have different pattern vectors; if we have $c$ clusters, we also have $c$ extracted pattern vectors. The elements of $S$ are found based on the obtained clusters, and pattern extraction is then done. We propose three weighting approaches: hard, soft and mixed. In the hard-weighting approach, each word is only allowed to belong to one cluster, and so it only contributes to one new extracted pattern. In this case, the elements of $S$ are defined as follows:
$$s_{ij} = \begin{cases} 1, & \text{if } j = \arg\max_{1 \le \alpha \le c} \mu_{G_\alpha}(x_i) \\ 0, & \text{otherwise} \end{cases} \tag{10}$$
If the maximizing index in (10) is not unique, one of them is chosen randomly. In the soft-weighting approach, each word is allowed to contribute to all new extracted patterns, with degrees depending on the values of the membership functions. The elements of $S$ in (4) are defined as follows:
$$s_{ij} = \mu_{G_j}(x_i) \tag{11}$$
The combination of the hard-weighting and soft-weighting approaches gives a new approach called mixed weighting. For this case, the elements of $S$ in (4) are defined as follows:
$$s_{ij} = \gamma \times s^{H}_{ij} + (1 - \gamma) \times s^{S}_{ij} \tag{12}$$
where $s^{H}_{ij}$ is obtained by (10), $s^{S}_{ij}$ is obtained by (11), and $\gamma$ is a user-defined constant lying between 0 and 1. Note that $\gamma$ is not related to the clustering; it concerns the merging of the component patterns in a cluster into a resulting pattern. If the threshold value $\rho$ is small, the number of clusters is small and each cluster covers more types of patterns. In this case, a smaller $\gamma$ favors soft weighting and gives higher accuracy. $\rho$ can vary between 0 and 1; as the threshold increases, the number of clusters also increases.
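The three weighting schemes (10) to (12) and the reduction N' = N S of (4) can be sketched with plain Python lists. Here `mu[i][j]` stands for μ_{G_j}(x_i); the membership values and concept counts are made-up toy data, and ties in (10) are broken by taking the first maximum rather than the random choice described in the text.

```python
def hard_weights(mu):
    """Eq. (10): 1 for the cluster with the largest membership, else 0.

    mu[i][j] holds mu_{G_j}(x_i).  Ties go to the first maximum here,
    whereas the text picks one of the tied clusters at random.
    """
    S = []
    for row in mu:
        best = row.index(max(row))
        S.append([1.0 if j == best else 0.0 for j in range(len(row))])
    return S


def soft_weights(mu):
    """Eq. (11): s_ij = mu_{G_j}(x_i)."""
    return [list(row) for row in mu]


def mixed_weights(mu, gamma):
    """Eq. (12): gamma * hard + (1 - gamma) * soft, with 0 <= gamma <= 1."""
    hard, soft = hard_weights(mu), soft_weights(mu)
    return [[gamma * h + (1 - gamma) * s for h, s in zip(hr, sr)]
            for hr, sr in zip(hard, soft)]


def matmul(A, B):
    """Plain-list matrix product, used for N' = N S in Eq. (4)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Toy data: 3 concepts x 2 clusters of memberships, 2 emails x 3 concept counts.
mu = [[0.75, 0.25], [0.5, 0.125], [0.25, 0.5]]
N = [[2, 1, 0], [0, 0, 3]]
print(matmul(N, hard_weights(mu)))  # [[3.0, 0.0], [0.0, 3.0]]
print(matmul(N, soft_weights(mu)))  # [[2.0, 0.625], [0.75, 1.5]]
print(mixed_weights(mu, 0.5)[0])    # [0.875, 0.125]
```

Setting γ = 1 in (12) reproduces the hard weights and γ = 0 the soft weights, which is a quick consistency check on the formula. In each case N (2 × 3) times S (3 × 2) gives a reduced 2 × 2 matrix, i.e., c = 2 < m = 3.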
3.4. Extraction of Relation
Relation extraction in an email document can be done based on grammar rules or parts of speech (POS) such as noun, verb and adverb. The relation between these tokens or words is established using their POS, i.e., by determining whether each token is a noun, a verb, an adverb, etc. Syntactic patterns are used to find the parts of relations. In the RDF data model, the subject, predicate and object are considered, and they are extracted together with the semantic relation between them and their domain.
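As a toy illustration of turning POS information into RDF-style (subject, predicate, object) triples, the sketch below assumes the tokens are already tagged with Universal POS tags and uses a single hypothetical syntactic pattern: noun, verb, optional preposition, optional determiner, noun. It is not the authors' grammar and ignores auxiliaries, passives and coordination.

```python
def extract_triples(tagged):
    """Return (subject, predicate, object) triples from (word, tag) pairs.

    Hypothetical pattern: NOUN/PROPN, VERB, optional ADP, optional DET,
    NOUN/PROPN.  The verb and preposition are joined to form the predicate.
    """
    nouns = ("NOUN", "PROPN")
    triples = []
    i = 0
    while i < len(tagged):
        if tagged[i][1] in nouns and i + 1 < len(tagged) and tagged[i + 1][1] == "VERB":
            predicate = [tagged[i + 1][0]]
            k = i + 2
            if k < len(tagged) and tagged[k][1] == "ADP":
                predicate.append(tagged[k][0])
                k += 1
            if k < len(tagged) and tagged[k][1] == "DET":
                k += 1
            if k < len(tagged) and tagged[k][1] in nouns:
                triples.append((tagged[i][0], "_".join(predicate), tagged[k][0]))
                i = k  # the object may be the subject of the next triple
                continue
        i += 1
    return triples


sentence = [("The", "DET"), ("meeting", "NOUN"), ("moved", "VERB"),
            ("to", "ADP"), ("the", "DET"), ("auditorium", "NOUN")]
print(extract_triples(sentence))  # [('meeting', 'moved_to', 'auditorium')]
```

Each triple maps directly onto the RDF subject, predicate and object; a real system would use a parser rather than a fixed tag sequence.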
An Is_A relationship is defined between nouns, and it can be found by checking for hypernyms in WordNet. For example, the obtained output of hypernymically related synsets can be reconstructed by following the trail of hypernymically related synsets:
{robin, redbreast} @→ {bird} @→ {animal, animate_being} @→ {organism, life_form, living_thing}
Here @→ is a transitive semantic relation that can be considered as IS_A or KIND_OF, and the direction of the arrow represents upward pointing [18].
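The hypernym trail and the transitive IS_A test can be sketched as below. The dictionary simply encodes the robin example from the text; a real system would query WordNet [18] instead, and the names `HYPERNYM`, `find_synset`, `hypernym_chain` and `is_a` are hypothetical.

```python
# Toy hypernym links encoding the robin example from the text.
HYPERNYM = {
    frozenset({"robin", "redbreast"}): frozenset({"bird"}),
    frozenset({"bird"}): frozenset({"animal", "animate_being"}),
    frozenset({"animal", "animate_being"}):
        frozenset({"organism", "life_form", "living_thing"}),
}


def find_synset(word):
    """Return the first toy synset that contains `word`, or None."""
    for synset in list(HYPERNYM) + list(HYPERNYM.values()):
        if word in synset:
            return synset
    return None


def hypernym_chain(synset):
    """Follow the @-> links upward and return the trail of synsets."""
    chain = [synset]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def is_a(word, concept):
    """Transitive IS_A: `concept` appears somewhere on `word`'s trail."""
    start = find_synset(word)
    if start is None:
        return False
    return any(concept in synset for synset in hypernym_chain(start))


trail = hypernym_chain(find_synset("robin"))
print(" @-> ".join("{" + ",".join(sorted(s)) + "}" for s in trail))
# {redbreast,robin} @-> {bird} @-> {animal,animate_being} @-> {life_form,living_thing,organism}
```

Transitivity comes for free from walking the whole trail: "robin" IS_A "organism" holds even though no single link connects them. In this sketch the test is also reflexive, since a word's own synset starts the trail.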
3.5. Email Classification Based on Ontology
Let $N$ be the set of training emails. E-mail classification can then be done as follows. The author specifies the match threshold $\rho$. Assume that $c$ clusters are obtained for the words in the concept vector. We then find the weighting matrix $S$ and convert $N$ to $N'$ by using SROIQ ontology tool extraction. The syntax and semantics of SROIQ are summarized in Table 1. The set of SROIQ concepts is recursively defined using the constructors in the upper part of the table, where $A \in N_C$ (the set of concept names), $C$ and $D$ are concepts, $R, R_1, R_2, \ldots$ are roles, $a$ is an individual, and $n$ is a positive integer.
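Table 1. Terminology for ontology tool
Name | Syntax | Semantics
Concepts
atomic concept | A | A^I
nominal | {a} | {a^I}
top concept | ⊤ | Δ^I
negation | ¬C | Δ^I \ C^I
conjunction | C ⊓ D | C^I ∩ D^I
existential restriction | ∃R.C | {x : {y : ⟨x, y⟩ ∈ R^I} ∩ C^I ≠ ∅}
min cardinality | ≥n R.C | {x : ‖{y : ⟨x, y⟩ ∈ R^I} ∩ C^I‖ ≥ n}
exists self | ∃R.Self | {x : ⟨x, x⟩ ∈ R^I}
Axioms
complex role inclusion | R_1 ∘ ... ∘ R_n ⊑ R | R_1^I ∘ ... ∘ R_n^I ⊆ R^I
disjoint roles | Disjoint(R_1, R_2) | R_1^I ∩ R_2^I = ∅
concept inclusion | C ⊑ D | C^I ⊆ D^I
concept assertion | C(a) | a^I ∈ C^I
role assertion | R(a, b) | ⟨a^I, b^I⟩ ∈ R^I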
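To make the semantics column of Table 1 concrete, the sketch below evaluates several constructors over a small, hand-made finite interpretation (a domain Δ^I plus the extensions of two concept names and one role name). The data and function names are hypothetical; this is only set arithmetic, not an OWL reasoner or the SROIQ tool used by the authors.

```python
# Hypothetical finite interpretation: the domain and the extensions of
# two concept names and one role name.
DOMAIN = {"alice", "bob", "m1", "m2"}
CONCEPT = {"Person": {"alice", "bob"}, "Meeting": {"m1", "m2"}}
ROLE = {"attends": {("alice", "m1"), ("alice", "m2"), ("bob", "m1")}}


def successors(role, x):
    """{y : <x, y> in R^I}"""
    return {y for (a, y) in ROLE[role] if a == x}


def negation(c):
    """Negation: Delta^I minus C^I."""
    return DOMAIN - c


def conjunction(c, d):
    """Conjunction: C^I intersected with D^I."""
    return c & d


def exists(role, c):
    """Existential restriction: x has at least one R-successor in C."""
    return {x for x in DOMAIN if successors(role, x) & c}


def min_card(n, role, c):
    """Min cardinality: x has at least n R-successors in C."""
    return {x for x in DOMAIN if len(successors(role, x) & c) >= n}


def exists_self(role):
    """Exists self: <x, x> is in R^I."""
    return {x for x in DOMAIN if (x, x) in ROLE[role]}


def concept_inclusion_holds(c, d):
    """Concept inclusion C ⊑ D is satisfied iff C^I ⊆ D^I."""
    return c <= d


meeting = CONCEPT["Meeting"]
print(sorted(exists("attends", meeting)))       # ['alice', 'bob']
print(sorted(min_card(2, "attends", meeting)))  # ['alice']
print(concept_inclusion_holds(min_card(2, "attends", meeting), CONCEPT["Person"]))  # True
```

Each function mirrors one row of the semantics column, so composite concepts such as Person ⊓ (≥2 attends.Meeting) are evaluated by nesting the calls.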
4. Simulation Result and Analysis
The ontology extraction engine and visualization tool are realized using C#.NET, built on the Visual Studio 2010 platform. The semantic e-mails were pre-processed for spelling and grammar checking using Microsoft Office libraries integrated with the application, and the English dictionary available within the Microsoft Office package was used as a benchmark. The ontology engine used for evaluation is also interfaced to the Outlook mail client, and the activity semantic details were successfully updated to the calendar. Notification reminders could also be enabled for the user. E-mails have been clustered based on the extracted patterns, and the NLP visualization clearly demonstrates the relations between the extracted concepts. The authors use the Enron e-mail data set and try to find the exact data based on details such as date, time, venue and meeting, and find the number of concepts in each mail.
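Extracting date, time and venue details for a calendar entry could look like the following sketch. The regular expressions are hypothetical and deliberately simple (day-first dd/mm/yyyy dates, h:mm times with optional am/pm, and a capitalized phrase after "in" or "at" as the venue); the paper does not describe its extraction rules at this level.

```python
import re

# Hypothetical patterns, deliberately simple.
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")  # day-first dd/mm/yyyy (assumed)
TIME_RE = re.compile(r"\b(\d{1,2}):(\d{2})\s*(am|pm)?\b", re.IGNORECASE)
VENUE_RE = re.compile(r"\b(?:in|at)\s+(?:the\s+)?((?:[A-Z][\w-]*\s?)+)")


def extract_event(text):
    """Collect meeting details that could feed a calendar entry."""
    event = {}
    if "meeting" in text.lower():
        event["type"] = "meeting"
    if m := DATE_RE.search(text):
        day, month, year = (int(g) for g in m.groups())
        event["date"] = (year, month, day)
    if m := TIME_RE.search(text):
        hour, minute = int(m.group(1)), int(m.group(2))
        suffix = (m.group(3) or "").lower()
        if suffix == "pm" and hour < 12:
            hour += 12
        elif suffix == "am" and hour == 12:
            hour = 0
        event["time"] = (hour, minute)
    if m := VENUE_RE.search(text):
        event["venue"] = m.group(1).strip()
    return event


print(extract_event("Project meeting on 14/11/2016 at 3:30 pm in Conference Room B."))
```

The venue pattern skips "at 3:30" because no capitalized word follows "at", and only then matches "in Conference Room B". Real mail (the Enron corpus uses US month-first dates, for example) would need a proper date parser.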
The BC3 corpus [16, 17] consists of about 40 threads embodying 261 e-mails. The BC3 corpus is a part of the W3C corpus. The results are shown in Figures 1-6.
Figure 1. Concept Extracted for Each Mail in BC3 Corpus
Figure 2. Relation Obtained for Each Mail in BC3
Figure 3. Concept Extracted for Each Mail in Custom Corpus
Figure 4. Relation Extracted for Mail in Custom Corpus
Figure 5. Ontology Extracted
Figure 6. Computation Time For Number
The ontology representation for the BC3 corpus is shown in Figure 7, where relationships are established between the obtained concepts. Ontology extraction is also visualized for the custom data set, which consists of 20 emails. The ontology extraction for email 2 is visualized in Figure 8; it consists of concepts such as people and area. Relationships are established among concepts from various emails, and based on this ontology extraction the calendar is updated for every incoming and outgoing email.
Figure 7. Schematic representation of ontology extraction for BC3 corpus dataset
Figure 8. Schematic representation of ontology extraction for Custom Corpus dataset
5. Conclusion
Email is a primary source of communication for business organizations, and we depend on it to communicate with higher authorities. Organizations send various types of email: some transfer information such as calls for meetings, some send promotional email, and some advertise their products. Handling all these kinds of email and reading each one is a hurdle for the user, who may miss an important mail as a result. To avoid such cases and to manage email better, clustering based on concept extraction is performed: when a new mail arrives, the similarity of its mail pattern with each existing cluster is computed. If the similarity value is greater than or equal to the threshold value, the mail is accepted into that cluster; if it does not match any cluster, a new cluster is defined and its membership function is initialized. Relationships among the extracted concepts are established, and the calendar is updated automatically. We also calculate the computation overhead for different numbers of mails.
References
[1] Y Hu, C Guo, X Zhang, Z Guo, J Zhang, X He. An Intelligent Spam Filtering System Based on Fuzzy Clustering. Fuzzy Systems and Knowledge Discovery, 2009. FSKD '09. Sixth International Conference on. Tianjin. 2009: 515-519.
[2] Kolcz A, Chowdhury A, Alspector J. The impact of feature selection on signature detection. CEAS. 2004.
[3] Kazem Taghva, Julie Borsack, Jeffrey S. Coombs, Allen Condit, Steven Lumos, Thomas A. Nartker. Ontology-based Classification of Email. ITCC 2003, IEEE Computer Society. 2003: 194-198.
[4] Lassila O, Swick R. Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. 2004.
[5] Prud'hommeaux E, Seaborne A. SPARQL Query Language for RDF. W3C Candidate Recommendation. 2008.
[6] Gruber R. Towards principles for the design of ontologies used for knowledge sharing. International Journal of Human-Computer Studies. 1995; 43: 907-928.
[7] L McDowell, O Etzioni, A Halevy, H Levy. Semantic email. In WWW '04: Proceedings of the 13th international conference on World Wide Web. New York, NY, USA. 2004: 244-254.
[8] K Taghva, J Borsack, J Coombs, A Condit, S Lumos, T Nartker. Ontology-based classification of email. Information Technology: Coding and Computing [Computers and Communications], 2003. Proceedings. ITCC 2003. International Conference on. 2003: 194-198.
[9] Luke McDowell, Oren Etzioni, Alon Halevy. Semantic Email: Theory and Applications. Department of Computer Science & Engineering, University of Washington. Seattle, WA. 2004.
[10] Qing Yang, Fang-Min Li. Support vector machine for customized email filtering based on improving latent semantic indexing. Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on. Guangzhou, China. 2005; 6: 3787-3791.
[11] Dotan Di Castro, Zohar Karnin. You've got Mail, and Here is What you Could do With It! Analyzing and Predicting Actions on Email Messages. WSDM'16. San Francisco, CA, USA. 2016.
[12] SF Shazmeen, J Gyani. A novel approach for clustering e-mail users using pattern matching. Electronics Computer Technology (ICECT), 2011 3rd International Conference on. Kanyakumari. 2011: 205-209.
[13] VR Carvalho, WW Cohen. On the collective classification of email "speech acts". Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval. Salvador, Brazil. 2005.
[14] Y Hu, C Guo, X Zhang, Z Guo, J Zhang, X He. An Intelligent Spam Filtering System Based on Fuzzy Clustering. Fuzzy Systems and Knowledge Discovery, 2009. FSKD '09. Sixth International Conference on. Tianjin. 2009: 515-519.
[15] Horrocks I, Kutz O, Sattler U. The even more irresistible SROIQ. 2006.
[16] Ulrich J, Murray G, Carenini G. A Publicly Available Annotated Corpus for Supervised Email Summarization. AAAI08 EMAIL Workshop. Chicago, USA. 2008.
[17] http://bailando.sims.berkeley.edu/enron/enron_with_categories.tar.gz
[18] Fellbaum C. WordNet: An electronic lexical database. 1998.
[19] Wujian Yang, Lianyue Lin. Process Improvement of LSA for Semantic Relatedness Computing. TELKOMNIKA Telecommunication Computing Electronics and Control. 2014; 12(4): 1045-1052.
[20] Lirong Qiu. Website Resource Monitoring Platform Supporting Tibetan and Uyghur Language Based on Semantics. TELKOMNIKA Indonesian Journal of Electrical Engineering. 2013; 11(8): 4766-4773.
[21] Abdulrahman Ahmed Alzand, R Ibrahim. Diacritics of Arabic Natural Language Processing (ANLP) and its quality assessment. Industrial Engineering and Operations Management (IEOM), 2015 International Conference on. Dubai. 2015: 1-5.
[22] N Patel, B Patel, R Parikh, B Bhatt. Hierarchical clustering technique for word sense disambiguation using Hindi WordNet. 2015 5th Nirma University International Conference on Engineering (NUiCONE). Ahmedabad. 2015: 1-5.