Indonesian Journal of Electrical Engineering and Computer Science
Vol. 14, No. 1, April 2019, pp. 450~454
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v14.i1.pp450-454
Journal homepage: http://iaescore.com/journals/index.php/ijeecs

Effective XQuery keyword using XML query processing

E. Seshatheri¹, T. Bhuvaneswari²
¹Computer Science and Engineering, Manonmaniam Sundaranar University, India
²Department of Computer Applications, Queen Mary's College (Autonomous), India

Article Info
Article history:
Received Aug 9, 2018
Revised Oct 28, 2018
Accepted Jan 21, 2019

ABSTRACT
XML is the standard used to describe structured data. A large amount of the data consumed over the internet is stored and processed in both structured and semi-structured formats, and XML supports a semi-structured, hierarchical data representation that captures not only individual items drawn from various kinds of databases but also the relationships among those items. The extracted knowledge base provides concise summaries of structured and semi-structured data files and of XML document contents, and gives quick and exact answers to queries whenever they are required.
Users search for their resources with the help of queries. This is not a simple task, because inaccurate results and complexity can occur, so plain querying is not a good way to search for resources. This paper proposes a query-answering system based on linear search with wildcard search for extracting frequent patterns, in order to maximize the search results in a library database held as XML documents and to extract the most relevant feeds directly from a large file. It helps the user find the required resources completely.

Keywords:
Data mining
Linear search algorithm
Tree based association rules (TAR)
Wild card search
XML document

Copyright © 2019 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author:
E. Seshatheri,
Computer Science and Engineering,
Manonmaniam Sundaranar University,
Tirunelveli – 627 012, India.
Email: esesha3@gmail.com

1. INTRODUCTION
XML is a standard for describing how information is structured. It has become a popular format for storing and sharing data across heterogeneous platforms. Its representation is flexible and interoperable, so it is frequently used in applications and can be created on various platforms. To form a query, however, the user needs to know the structure and semantics of the XML file before querying the document.
In this research, a method is proposed for retrieving more efficient and more accurate results for the queries made by users on an XML document [1]. The original XML document is converted into Modified Tree-based Association Rules (TAR) files, which are built from the frequent patterns in the original document. This provides a concise representation of the XML document based on the content and structure of the XML file [2]. An approach that uses TARs as mined rules takes RSS feeds as input and stores the more suitable, standard data in XML format, covering both the content and the structure of the document [3].
The work in [4] examines the frequent-pattern tree (FP-tree) structure; its performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm. The authors of [5] propose a Maximal Frequent Itemset (MFI) algorithm and an improvised frequent pattern tree for association rule mining. This algorithm generates frequent itemsets without using candidate sets and Complexity Parameter (CP) trees. The authors of [6] discuss how Tree-Based Association Rules (TAR) play an important role in reducing query retrieval time. The study in [7] highlights the analysis of large-scale dataset processing and its handling challenges, and gives a comprehensive systematic review.
The authors of [8] illustrate a method for mining tree-based association rules in XML documents, where the rules describe both the content and the structure of the data in the XML document. The work in [9] stores the more suitable, standard data in XML format, covering both the structure and the content of the XML document, based on TAR. The authors of [10] provide a concise representation of the XML document and fast, approximate answers to queries whenever required.
The authors of [11] propose a modified version of the Apriori algorithm, called Apriori graph, to find usage patterns. The resulting rules help a service provider predict the web pages that a user is likely to visit next. The method in [12] uses XQuery, a language for identifying and extracting elements and attributes from an XML document; it performs well on simple XML documents but not on XML documents with complex and irregular structure.
The authors of [13] present the CMTreeMiner algorithm, which is computationally efficient and finds all closed and maximal frequent subtrees in a database of rooted unordered trees. DRYADEPARENT, presented in [14], is a recent fast tree mining algorithm. It extracts subtrees embedded in the trees while maintaining the ancestor relationship among the nodes, for ancestor-descendant pairs as well as for parent-child nodes.
The work in [15] applies the Multinomial Naïve Bayesian (MNB) classifier, Artificial Neural Network (ANN) and Support Vector Machine (SVM) to mining emotion from text; in that setup, SVM outperformed the other classifiers with promising accuracy. The present study examines the disadvantages of the techniques mentioned above and takes them into account in developing the new technique. Attribute Oriented Induction High-level Emerging Pattern (AOI-HEP) has been shown to mine frequent and similar patterns, and confidence has been used to find AOI-HEP patterns [16]. The work in [17] proposed an improved algorithm for stemming Indonesian text; its results show that the proposed algorithm was the best for Indonesian text processing, with a score of 0.648.

2. SEMI STRUCTURED DATA TECHNIQUES USING XML QUERY PROCESSING
2.1. Association Rules
The data mining community has focused on developing techniques for extracting general structure from heterogeneous XML data, a task known as mining semi-structured data. The default approach to association rule mining on XML data maps the XML document onto a relational data model and stores it in a relational database. Standard tools are then applied to the relational database to perform rule mining. This method is time-consuming, and the mapping process requires manual intervention. Therefore, this approach is not appropriate for streaming XML data.
XQuery is the XML query language, designed to meet the requirement for querying XML data sources intelligently. It is highly adaptable and can query a wide spectrum of XML data sources, including both documents and databases. XQuery therefore offers a straightforward approach to association rule mining from XML documents. It provides the usual functions for searching for and extracting both elements and attributes from XML documents, although complex algorithms are often hard to implement in XQuery. Association rule mining is a long-standing problem, and several algorithm implementations have been developed in the database literature. Various XQuery-based methods have been used to extract association rules from simple XML documents. A set of XQuery functions has been developed to implement the Apriori algorithm, but it performs well only on simple XML documents.
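
The kind of element and attribute extraction described above can be illustrated with a minimal sketch. It is written in Python with the standard `xml.etree.ElementTree` module and its limited XPath subset, standing in for an XQuery engine; it is not XQuery itself. The catalogue, its tag and attribute names, and the helper names `titles_for_subject` and `attribute_values` are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical library catalogue; element and attribute names are assumptions.
CATALOGUE = """
<library>
  <book id="b1" year="2015"><title>XML Basics</title><subject>xml</subject></book>
  <book id="b2" year="2017"><title>Data Mining</title><subject>mining</subject></book>
  <book id="b3" year="2017"><title>XQuery in Practice</title><subject>xml</subject></book>
</library>
"""

def titles_for_subject(root, subject):
    """Return the <title> text of every <book> whose <subject> equals `subject`."""
    # ElementTree supports the predicate [tag='text'] on child elements.
    books = root.findall(f"./book[subject='{subject}']")
    return [book.findtext("title") for book in books]

def attribute_values(root, name):
    """Return the value of attribute `name` for every <book> that carries it."""
    return [book.get(name) for book in root.findall(f"./book[@{name}]")]

root = ET.fromstring(CATALOGUE)
xml_titles = titles_for_subject(root, "xml")   # element extraction
years = attribute_values(root, "year")         # attribute extraction
print(xml_titles)
print(years)
```

Simple selections like these are easy to express; the difficulty noted above arises when a full mining algorithm such as Apriori has to be written in the query language.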
2.2. Clustering
In data mining, clustering is an important technique used to discover patterns and data distributions in the original data. Clustering can be applied to categorizing World Wide Web documents, grouping proteins with the same kind of function, grouping genes, and detecting seismic faults by grouping the entries of an earthquake catalog. What these examples have in common is that the better the quality of the clustering algorithm, the greater the benefit obtained.
The researchers in [18] describe this method in terms of two class-description languages for automated clustering of semi-structured data. The first class language yields a lattice of classes, which is created as a model covering the entire dataset. The second class language is the basis for interpreting the part of the lattice that addresses the user's needs. The Document Type Definition (DTD) is a significant XML concept, but its full advantages are not considered in the present application.
The researchers in [19] illustrate a clustering algorithm for extracting semi-structured data from the original data, and present a novel clustering method based on DTDs that can be used for clustering XML documents. This approach has two levels of clustering, namely:
a. Clustering the elements in DTDs: The first level is element clustering, which provides element clusters containing the appropriate elements for initiation.
b. Clustering DTDs separately: This is the second level of the clustering process, in which DTD clustering uses the generalized data.
2.3. Classification
In most cases, the information needed to classify an XML document is concealed in the structure information present in the document. Classifiers from information retrieval are therefore likely to be ineffective for XML documents, which has drawn attention to rule-based classifiers as an efficient tool for data classification. The motivating technique is the rule-based classifier, which integrates classification and association mining. Structural rules and the problems they raise are discussed using XRULE [20], which performs the classification task. In the training phase, the structures that are strongly associated with the respective class variables are identified. Once the training phase is complete, the testing phase uses these rules to predict the classes of unknown XML documents. XML documents can be modeled as rooted trees that are ordered and labeled.
The work in [21] gives a form of XML document that defines patterns of subtrees in the XML document. The authors of [22] introduced XMINERULE to enrich XQuery with knowledge discovery and data mining capabilities. The technique described in [23] is illustrated on a simple XML document; it performs well only on simple XML documents, not on complex XML documents with irregular structure. The limitation of this method is that the rule generation algorithm produces a huge number of rules, and it is very difficult to store the rules, retrieve the related rules, and set the rules. In most cases, XRULE achieves high classification accuracy by using a considerably large number of rules in the classifier, which in turn might cause overfitting, particularly for small training datasets.
2.4. Construction of TAR based XQuery Search
XML documents have a flexible architecture and can be preprocessed; the preprocessing is done by an XML parser. The DOM (Document Object Model) parser is used here to construct the tree from the XML document. DOM builds a tree structure in internal memory that mirrors the XML document, and it loads the entire document into memory as an XML DOM object before processing. It allows users to traverse the XML tree using the wildcard approach, to access, insert and update the content, style and structure of the document, and to delete nodes from the tree. An XML document therefore forms a tree structure. The XML document should also be validated, i.e. every tag should be opened and closed correctly, with no tag left without its pair.
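
The DOM operations listed above (load, traverse, insert, update, delete, and the tag-pairing check) can be sketched with Python's standard `xml.dom.minidom` module. This is an illustrative sketch, not the parser setup used in the paper; the sample document, its tag names, and the helper name `is_well_formed` are assumptions.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

def is_well_formed(text):
    """True if the parser accepts the text, i.e. every tag is opened and closed in pairs."""
    try:
        minidom.parseString(text)
        return True
    except ExpatError:
        return False

# Hypothetical document; the whole tree is loaded into memory as a DOM object.
doc = minidom.parseString(
    "<library><book><title>XML Basics</title></book>"
    "<book><title>Old Notes</title></book></library>"
)
root = doc.documentElement

# Traverse: "*" is the DOM wildcard that matches every element name.
tag_names = [node.tagName for node in root.getElementsByTagName("*")]

# Insert: append a new <book> with a <title> child.
book = doc.createElement("book")
title = doc.createElement("title")
title.appendChild(doc.createTextNode("Data Mining"))
book.appendChild(title)
root.appendChild(book)

# Update: change the text of the first <title>.
root.getElementsByTagName("title")[0].firstChild.data = "XML Basics, 2nd ed."

# Delete: remove the second <book> from the tree.
root.removeChild(root.getElementsByTagName("book")[1])

titles = [t.firstChild.data for t in root.getElementsByTagName("title")]
print(tag_names)
print(titles)
print(is_well_formed("<a><b></a>"))
```

Because DOM keeps the whole document in memory, this style suits documents that fit in memory; the check shown here tests only that tags are correctly paired, not conformance to a DTD or schema.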
2.4.1. Frequent Pattern Extraction
Association rules capture the frequent events in datasets that hold a huge amount of collected data. The two data items considered are X and Y, and the rule is expressed in terms of X ∩ Y. Support and confidence are the factors used to measure an association rule: support is the frequency with which X and Y occur together in the dataset, and confidence is the conditional probability of finding Y given X.
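
As a small worked illustration of these two measures, the sketch below computes support as the fraction of transactions containing both X and Y, and confidence as support(X and Y together) divided by support(X). The transactions are hypothetical, and the function names `support` and `confidence` are chosen here for illustration only.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, x, y):
    """Estimate P(Y | X) as support(X and Y together) / support(X)."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)

# Hypothetical transactions: the set of subject tags found in each record.
transactions = [
    {"xml", "xquery"},
    {"xml", "mining"},
    {"xml", "xquery", "mining"},
    {"mining"},
]

s = support(transactions, {"xml", "xquery"})       # 2 of 4 records
c = confidence(transactions, {"xml"}, {"xquery"})  # 2 of the 3 records with "xml"
print(s, c)
```

A rule is usually kept only when both values exceed chosen thresholds; the sketch assumes X occurs at least once, otherwise the confidence is undefined.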
The interesting patterns among the subtrees of a given XML document can be identified. The subtree patterns define the XML document as a set of TARs, and the whole XML document is accessed in order to compute the support and confidence measures. TAR mining involves two stages:
Stage 1: Mining frequent subtrees
Stage 2: Computing interesting rules
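
The two stages can be sketched on a deliberately simplified pattern language. In the sketch below a "subtree" is only a root-to-node label path extended by one child label; this is an assumption made to keep the example short, and it is neither CMTreeMiner nor the TAR miner used in this paper. The function names, thresholds and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def count_patterns(root):
    """Count elements per label path, and elements per path that have a given child label."""
    nodes = Counter()      # path -> number of elements with that path
    has_child = Counter()  # (path, child_tag) -> elements on that path with such a child
    stack = [(root, (root.tag,))]
    while stack:
        node, path = stack.pop()
        nodes[path] += 1
        for tag in {child.tag for child in node}:
            has_child[(path, tag)] += 1
        for child in node:
            stack.append((child, path + (child.tag,)))
    return nodes, has_child

def mine_rules(root, min_support, min_conf):
    """Return sorted (path, child_tag, support_count, confidence) tuples."""
    nodes, has_child = count_patterns(root)
    rules = []
    for (path, tag), n in has_child.items():
        if n < min_support:       # Stage 1: drop infrequent patterns
            continue
        conf = n / nodes[path]    # Stage 2: share of `path` elements with a `tag` child
        if conf >= min_conf:
            rules.append(("/".join(path), tag, n, conf))
    return sorted(rules)

doc = ET.fromstring(
    "<library>"
    "<book><title>A</title><author>X</author></book>"
    "<book><title>B</title><author>Y</author><year>2017</year></book>"
    "<book><title>C</title></book>"
    "</library>"
)
rules = mine_rules(doc, min_support=2, min_conf=0.6)
for rule in rules:
    print(rule)
```

Read this way, a rule such as "a book usually has an author" summarizes the structure of the document without listing every record, which is the role TARs play in answering queries quickly.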
After the set of files is acquired by the proposed model, the data in all the files is merged into one XML document. The next step is to obtain the TARs of all the files. Once this is done, the CMTreeMiner algorithm in the proposed model gives the most frequent feeds across all the files. When this process is complete, the feed search is performed and returns the filtered result. Searching for resources with the help of queries is not a simple task, because inaccurate results and complexity can occur.
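
The merging step at the start of this pipeline can be sketched as follows. This is a minimal sketch assuming each input file is a well-formed XML string; the wrapper tag `merged`, the function name `merge_documents` and the sample feeds are hypothetical, and the later mining and search steps are not shown.

```python
import xml.etree.ElementTree as ET

def merge_documents(xml_texts, root_tag="merged"):
    """Parse each XML string and attach its root as a child of one new root element."""
    merged = ET.Element(root_tag)
    for text in xml_texts:
        merged.append(ET.fromstring(text))
    return merged

# Hypothetical feed files, given inline as strings.
files = [
    "<feed source='a'><item>XML news</item></feed>",
    "<feed source='b'><item>Mining news</item><item>XQuery tips</item></feed>",
]
merged = merge_documents(files)
sources = [feed.get("source") for feed in merged]
item_count = len(merged.findall("./feed/item"))
print(sources, item_count)
```

Keeping each original root as a child of the new root preserves which file every item came from, which is useful when the frequent feeds are later reported per file.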

3. PROPOSED METHOD
3.1. Effective XQuery Keyword Search
The major challenge of this research is to rank all the queries based on their individual matches. The tree-based association rule approach is mainly a query-based system. Users can search for their resources with the help of queries, but this is not a simple task, because inaccurate results and complexity can occur; it is therefore not a good way to search for resources. Hence, this research focuses on resolving the above limitations, and all the disadvantages listed above are taken into account in determining the novel technique.
Specific association rule mining ideas for providing a summarized representation of XML documents have been investigated in several proposals, using either a language such as XQuery together with advanced techniques on XML content, or an implementation of the linear search algorithm. Wildcard search is therefore used as an advanced search technique for maximizing the search results in the database.
A wildcard is the most effective technique for searching when one or more characters in a word are left open. A single character is represented by a question mark (?), which is essential when a word has several spellings and all the variations have to be searched for at once. It is also more efficient when the user has forgotten the exact name of the resource wanted; in this case too, wildcard search helps the user find the resources completely. For example, searching for Java would return java.
The forms of wildcard syntax specified by this XML document are:
a. A single period, without any qualifiers: Matches a single arbitrary character.
b. A period immediately followed by a single question mark, "?": Matches either no characters or one character.
c. A period immediately followed by a single asterisk, "*": Matches zero or more characters.
d. A period immediately followed by a single plus sign, "+": Matches one or more characters.
e. A period immediately followed by a sequence of characters that matches the regular expression {[0-9]+,[0-9]+}: Matches a number of characters, where the number is no less than the number represented by the series of digits before the comma, and no greater than the number represented by the series of digits following the comma.
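
The five wildcard forms listed above can be tried out with a short sketch that translates a wildcard token into a Python regular expression. It uses Python's `re` module rather than an XQuery engine; the names `wildcard_to_regex` and `matches` are hypothetical, and case-insensitive, whole-word matching is an assumption made here, not something the list specifies.

```python
import re

# One wildcard indicator: a period, optionally followed by ?, *, + or {n,m}.
_WILDCARD = re.compile(r"\.(\?|\*|\+|\{[0-9]+,[0-9]+\})?")

def wildcard_to_regex(pattern):
    """Compile a wildcard query token into a regex; all other characters are literal."""
    parts, pos = [], 0
    for m in _WILDCARD.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # literal text before the wildcard
        parts.append("." + (m.group(1) or ""))           # the wildcard and its qualifier
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))               # literal text after the last wildcard
    return re.compile("".join(parts), re.IGNORECASE)     # assumption: ignore case

def matches(pattern, word):
    """True if the whole word matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None

print(matches("ja.a", "java"))       # form a: exactly one arbitrary character
print(matches("colo.?r", "colour"))  # form b: zero or one character
print(matches("data.*", "database")) # form c: zero or more characters
print(matches("data.+", "data"))     # form d: needs at least one more character
print(matches("x.{2,3}", "xml"))     # form e: between 2 and 3 characters
```

Escaping the literal parts matters: a query token such as `c++` contains no period, so its plus signs are treated as ordinary characters and not as qualifiers.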
In XQuery, the wildcard option of a contains search consists of multiple search keywords, namely *, ?, full stop and +, which are optionally followed by a qualifier. Every wildcard matches zero or more characters within an XQuery token in the text being searched, and the number of characters that can be matched depends on the qualifier. This search is used to improve performance and to retrieve relevant information from the XML document.
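
A linear, wildcard-based scan over an XML document can be sketched as below. This is an illustrative sketch and not the paper's implementation: it uses Python's shell-style matcher `fnmatch.fnmatchcase`, in which `?` stands for one character and `*` for any run of characters, instead of the XQuery wildcard syntax. The feed document, its tag names and the function name `linear_wildcard_search` are assumptions.

```python
import re
import xml.etree.ElementTree as ET
from fnmatch import fnmatchcase

def linear_wildcard_search(root, pattern):
    """Visit every element once, in document order; keep (tag, text) if any word matches."""
    pattern = pattern.lower()
    hits = []
    for elem in root.iter():                      # one sequential pass over the tree
        text = (elem.text or "").strip()
        words = re.findall(r"\w+", text.lower())  # lowercase words of the element text
        if any(fnmatchcase(word, pattern) for word in words):
            hits.append((elem.tag, text))
    return hits

# Hypothetical feed document.
FEEDS = """
<feeds>
  <item><title>Java tutorial</title><category>programming</category></item>
  <item><title>Colour theory</title><category>design</category></item>
  <item><title>Color grading</title><category>video</category></item>
  <item><title>XML databases</title><category>data</category></item>
</feeds>
"""
root = ET.fromstring(FEEDS)
java_hits = linear_wildcard_search(root, "ja?a")
colour_hits = linear_wildcard_search(root, "colo*r")
data_hits = linear_wildcard_search(root, "data*")
print(java_hits)
print(colour_hits)
print(data_hits)
```

The scan touches every element once, so its cost grows linearly with document size; one pattern such as `colo*r` covers both spellings in a single pass.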

4. CONCLUSION
The proposed linear search model provides the most frequent feeds across all the files, and the feed search, performed by wildcard-based search in XQuery, returns the filtered result. The proposed wildcard search is therefore an advanced search technique that can be used to maximize the search results in library databases with less time consumption, so that users can find their resources completely.

REFERENCES
[1] Sashatheri, E., and Dr. Bhuvaneshwari, T., "A Novel Method to Managing Semi Structured Data in Distributed Environment using Modified Tree based Association Rules (TAR)", Australian Journal of Basic and Applied Sciences, vol. 9, no. 35, pp. 277-286, 2015.
[2] Vikhe, P. B., and Gunjal, B. L., "Extracting Tree Based Association Rules from XML Document", International Journal of Emerging Technology and Advanced Engineering, vol. 3, no. 6, June 2013.
[3] AlfiyaIqbal, A. S., and Sanchika, B., "Frequent Pattern Mining for XML Query-Answering Support", International Journal of Innovative Technology and Exploring Engineering (IJITEE), vol. 4, no. 2, July 2014.
[4] Neelesh, S., and Richa, K., "FP-Growth Tree Based Algorithms Analysis: CP-Tree and K Map", Binary Journal of Data Mining & Networking, vol. 5, pp. 26-29, 2015.
[5] Vandit, A., Mandhani, K., and Dr. Preetham, K., "An Improvised Frequent Pattern Tree Based Association Rule Mining Technique with Mining Frequent Item Sets Algorithm and a Modified Header Table", International Journal of Data Mining & Knowledge Management Process (IJDKP), vol. 5, no. 2, 2015.
[6] Shambhu, K. S., et al., "Association Policy for XML Query Answering By Mining Tree", International Journal of Advanced Research in Computer and Communication Engineering, vol. 4, no. 3, 2015.
[7] Seshatheri, E., and Dr. Bhuvaneshwari, T., "An Efficient distributed data processing method for smooth environment", Journal of Engineering and Applied Sciences, vol. 11, no. 8, pp. 1855-1858, 2016.
[8] Swarupa, N. S., "TAR: Algorithm for Mining XML Query Answering", International Journal of Advanced Research in Computer Science and Software Engineering, vol. 6, no. 6, 2016.
[9] Poonam, R. M., "Answering XML Query Using Tree Based Association Rule", IJCSMC, vol. 6, no. 2, pp. 75-80, 2017.
[10] Neha, H. N., and Kapil, H., "Data Mining for Intensional Query Answering Using Tree Based Association Rules", IJEDR, vol. 4, no. 2, 2016.
[11] Pritish, Y., and Suneetha, K. R., "Modified Apriori Graph Algorithm for Frequent Pattern Mining", International Conference on Innovations in Information Embedded and Communication Systems, 2016.
[12] Wan, J. W., and Dobbie, G., "Extracting Association Rules from XML Documents Using XQuery", Proc. Fifth ACM Int'l Workshop Web Information and Data Management, pp. 94-97, 2003.
[13] Chi, Y., Yang, Y., Xia, Y., and Muntz, R. R., "CMTreeMiner: Mining both Closed and Maximal Frequent Subtrees", Proc. Eighth Pacific-Asia Conf. Knowledge Discovery and Data Mining, pp. 63-73, 2004.
[14] Termier, A., Rousset, M., Sebag, M., Ohara, K., Washio, T., and Motoda, H., "DryadeParent, an Efficient and Robust Closed Attribute Tree Mining Algorithm", IEEE Trans. Knowledge and Data Eng., vol. 20, no. 3, pp. 300-320, Mar. 2008.
[15] Muhammad, A. A., and Mahmudul, H. B., "Text to Emotion Extraction Using Supervised Machine Learning Techniques", TELKOMNIKA, vol. 16, no. 3, pp. 1394-1401, 2018.
[16] Harco, L. H. S. W., Agung, T., and Richard, R., "Confidence of AOI-HEP Mining Pattern", TELKOMNIKA, vol. 16, no. 3, pp. 1217-1225, 2018.
[17] Afian, S. R., Aris, T., and Rahmat, T., "Comparison of stemming algorithms and its effect on Indonesian text processing", TELKOMNIKA Telecommunication, Computing, Electronics and Control, vol. 17, 2018.
[18] Nathalie, P., Marie-Christine, R., and Veronique, V., "Automatic Construction and refinement of a class hierarchy over semi-structured data".
[19] Svetlozar, N., Serge, A., and Rajeev, M., "Extracting schema from semi-structured data".
[20] Abiteboul, S., Buneman, P., Suciu, D., "Data on the Web", Morgan Kaufmann, 2000.
[21] Suganya, I., Velmurugan, N., and Ganeshkumar, P., "XML Query-Answering Support System using Association Mining Technique", IEEE conference on ICT, 2013.
[22] Braga, D., Campi, A., Ceri, S., Klemettinen, M., and Lanzi, P., "Discovering Interesting Information in XML Data with Association Rules", Proc. ACM Symp. Applied Computing, pp. 450-454, 2003.
[23] Wan, J. W. W., and Dobbie, G., "Extracting Association Rules from XML Documents Using XQuery", Proc. Fifth ACM Int'l Workshop Web Information and Data Management, pp. 94-97, 2003.