International Journal of Electrical and Computer Engineering (IJECE)
Vol. 5, No. 6, December 2015, pp. 1536~1544
ISSN: 2088-8708
Journal homepage: http://iaesjournal.com/online/index.php/IJECE
Graph-Based Concept Clustering for Web Search Results

Supakpong Jinarat*, Choochart Haruechaiyasak**, Arnon Rungsawang*

* Department of Computer Engineering, Kasetsart University, Chatuchak, Bangkok, Thailand
** National Electronics and Computer Technology Center (NECTEC), Thailand
Article Info

Article history:
Received Jul 20, 2015
Revised Aug 27, 2015
Accepted Sep 14, 2015

ABSTRACT

A search engine usually returns a long list of web search results corresponding to a query from the user. Users must spend a lot of time browsing and navigating the search results to find the relevant ones. Many research works have applied text clustering techniques, called web search results clustering, to handle this problem. Unfortunately, a search result document returned from a search engine is a very short text. It is difficult to cluster related documents into the same group because a short document has low informative content. In this paper, we propose a method to cluster web search results with high clustering quality using graph-based clustering with concepts extracted from an external knowledge source. The main idea is to expand the original search results with some related concept terms. We applied Wikipedia as the external knowledge source for concept extraction. We compared the clustering results of our proposed method with two well-known search results clustering techniques, Suffix Tree Clustering and Lingo. The experimental results showed that our proposed method significantly outperforms these well-known clustering techniques.
Keyword:
Concept extraction
Graph-based clustering
Search results clustering
Wikipedia concept

Copyright © 2015 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Arnon Rungsawang,
Department of Computer Engineering, Kasetsart University,
50 Ngam Wong Wan Road, Ladyaow, Chatuchak, Bangkok, 10900, Thailand.
Email: arnon@mikelab.net
1. INTRODUCTION
A search engine usually returns a large number of web search results corresponding to a query from the user. Consequently, users must spend a lot of time browsing and navigating the search results to find the most relevant information. Many researchers have used text clustering techniques, called web search results clustering, to handle this problem [1]-[6].
Text clustering is a method for grouping related documents into the same cluster. Unfortunately, a search result document returned from a search engine is a very short text. It is difficult to cluster related documents into the same group because a short document has low informative content. We assume that a short document can be extended with some suitable terms so that related documents can be joined into the same cluster.
Many web search result clustering techniques use the "Bag of Words" model to convert a document into a term vector and compute the similarity between two documents [2]. Some work used a singular value decomposition model to generate cluster labels first and then cluster the documents [4]. Other search result clustering techniques applied the suffix tree structure to maintain words or phrases [5]-[6]. The limitation of short documents causes these works to produce unsatisfactory clustering results.
Many researchers have applied external knowledge sources to expand the information of short documents. Some work applied the web taxonomy of the Open Directory Project (ODP) to generate concepts for text enrichment [7]. The work of Di Marco and Navigli [8] applied the ODP to cluster web search results using Word Sense Induction (WSI) techniques. The limitations of using the ODP as an external information source are the small amount of content in each category, its low informative content, and the lack of links to related categories.
Recently, many research works have used Wikipedia as an external knowledge source for many aspects of data mining techniques [9]-[12]. Wikipedia is the largest free online encyclopedia; it covers almost every topic and contains numerous categories, including people, important events, and so on. The articles in Wikipedia are contributed by a large number of volunteers around the world. Some research works applied Wikipedia to generate features for text categorization [11], or to annotate or identify document topics [12]-[13].
In this paper, we try to improve the quality of search results clustering by using concept terms generated from Wikipedia (the largest online encyclopedia), and we also introduce a short-text clustering method that uses graph-based construction to connect related documents. We compared the clustering results of our proposed method with the well-known search result clustering techniques Suffix Tree Clustering [5] and Lingo [4]. Experimental results show that our proposed method significantly outperforms these well-known clustering techniques.
2. RESEARCH METHOD
The proposed method has two primary steps. The first is the Concept Extraction step, the process of extracting concepts from the external knowledge source to produce the input for the next step. The second is the step of search results clustering based on a graph model.
2.1. Concept Extraction
In this section, we explain what Wikipedia is, what information is used as a concept term, and how to extract the concepts from Wikipedia.
2.1.1. Wikipedia
Wikipedia is the largest online encyclopedia in the world. The English version contains over 300 million words in nearly 5 million articles, contributed by over 160,000 volunteer editors. Wikipedia has several advantages. First, its articles are much cleaner than typical web pages. In the XML Wikipedia dump file, we can easily extract the parts of a Wikipedia article by tracking the Wikipedia tags contained in the article file.
A Wikipedia article is a page that contains the knowledge of one entry in the encyclopedia. There are three main sections in each article. First, the Title section is the name or topic of the article. Second, the Content section explains what the article is about and also contains relative terms that link to related articles. The last section is the Category section, which contains the category terms to which the article belongs; for example, the article "Puma" belongs to the categories "Cat stubs" and "Felines".
From our observation, we found that the category terms from the Category section properly represent the concepts of their article. Therefore, we apply these category terms as representatives of the concepts of any document that contains the Title of the article.
2.1.2. Building the Wikipedia Corpus
We download the Wikipedia article pages dumped as an XML file that contains all Wikipedia articles in English. Each article consists of three main components: Title, Content, and Category. We keep only the useful Wikipedia pages by removing the useless pages according to the following criteria:
1) A title that contains only digit characters, such as "1944" or "821"
2) A single-letter title, such as "A", "B", "X", or "Y"
3) An article about years, such as "1000s BC" or "500s BC (decade)"
4) Administrative pages, i.e., pages whose title starts with "Wikipedia:", "File:", "Portal:", "Category:", "Special:", or "Template:"
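As an illustrative sketch, the four filtering criteria above might be implemented as follows. The function name and the exact year-article pattern are our assumptions, not taken from the paper.

```python
import re

# Administrative page prefixes listed in criterion 4
ADMIN_PREFIXES = ("Wikipedia:", "File:", "Portal:", "Category:", "Special:", "Template:")

def is_useless_page(title: str) -> bool:
    if title.isdigit():                                  # 1) digits-only titles, e.g. "1944"
        return True
    if len(title) == 1:                                  # 2) single-letter titles, e.g. "A"
        return True
    if re.match(r"^\d+s BC( \(decade\))?$", title):      # 3) year articles, e.g. "500s BC (decade)"
        return True
    if title.startswith(ADMIN_PREFIXES):                 # 4) administrative pages
        return True
    return False

print(is_useless_page("Puma"))   # → False
```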
The Wikipedia corpus has been created by indexing all the Wikipedia pages, which can be downloaded from the Wikipedia website, using the Apache Lucene API. We index the Title section for searching with an input text, and use the category terms in the Category section as the concepts in the concept extraction module.
2.1.3. Extracting the Concepts
After the Wikipedia corpus has been created, we perform the concept extraction by searching an input text in the Wikipedia corpus to match the titles of Wikipedia articles, and determine the category terms and out-link terms of the matched articles as candidate concepts, following the steps in Figure 1.
1. Search the input document in the Wikipedia corpus by matching it against the Titles of Wikipedia articles.
2. Collect the returned Wikipedia article search results.
3. Get all category terms from the returned articles.
4. Score each term by counting its number of occurrences.
5. Return the high-score concept terms if the score is more than 1; else, return all terms.

Figure 1. Concept extraction steps
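The five steps in Figure 1 can be sketched as follows. The corpus lookup here is a toy stand-in for the Lucene title search, and the data and names are illustrative assumptions only.

```python
from collections import Counter

def extract_concepts(input_text, search_titles):
    articles = search_titles(input_text)            # steps 1-2: match titles, collect results
    counts = Counter()
    for article in articles:                        # step 3: gather all category terms
        counts.update(article["categories"])
    high = [t for t, n in counts.items() if n > 1]  # step 4: score terms by occurrence count
    return high if high else list(counts)           # step 5: high-score terms, else all terms

# toy corpus and lookup, standing in for the indexed Wikipedia corpus
corpus = [
    {"title": "Puma", "categories": ["Felines", "Cat stubs"]},
    {"title": "Jaguar", "categories": ["Felines", "Mammals"]},
]

def toy_search(text):
    return [a for a in corpus if a["title"].lower() in text.lower()]

print(extract_concepts("puma and jaguar habitats", toy_search))  # → ['Felines']
```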
2.2. Graph-Based Concept Clustering
In this paper, we propose a clustering algorithm for short text, Graph-based Concept Clustering, which consists of four primary steps: 1) Feature Extraction, 2) Candidate Cluster Generation, 3) Subgraph Detection, and 4) Cluster Scoring and Labeling, as shown in Figure 2.
Figure 2. The workflow of the Graph-based Concept Clustering algorithm
2.2.1. Feature Extraction
We separate this step into two parts. The first part is "Term Extraction". We define a "Term" as an original word or phrase that appears in the search results. We remove all stop-words and extract the terms with the rules shown in Table 1.
Table 1. The rules of term extraction

Rules                  | Description
Punctuation Mark Term  | The terms that appear in punctuation marks such as double or single quotes, parentheses, brackets, etc.
Noun Phrase Term       | We detect the noun phrase terms by using Part-Of-Speech Tagger tools [14]
Name Entity Term       | The terms that start with a capital letter, such as "Batman Begins"
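Two of the Table 1 rules can be sketched with simple regular expressions; the noun-phrase rule requires a Part-Of-Speech tagger such as [14] and is omitted here. The patterns, stop-word list, and function names below are our own illustrative assumptions.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "in", "and", "or", "to"}  # tiny illustrative list

def punctuation_terms(text):
    # Punctuation Mark Term rule: terms inside double quotes or parentheses
    return re.findall(r'"([^"]+)"', text) + re.findall(r'\(([^)]+)\)', text)

def name_entity_terms(text):
    # Name Entity Term rule: runs of capitalized words, e.g. "Batman Begins"
    return re.findall(r'\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b', text)

def extract_terms(text):
    terms = punctuation_terms(text) + name_entity_terms(text)
    terms = [t for t in terms if t.lower() not in STOPWORDS]   # stop-word removal
    return list(dict.fromkeys(terms))                          # deduplicate, keep order

print(extract_terms('I watched "Batman Begins" (2005 film) in Gotham City'))
# → ['Batman Begins', '2005 film', 'Gotham City']
```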
The second part, "Concept Extraction", was explained in Section 2.1 above.
2.2.2. Candidate Cluster Generation
Let F = {f1, f2, ..., fi} denote the set of features generated by the previous step, D = {d1, d2, ..., dj} denote the set of input documents for clustering, and CC = {cc1, cc2, ..., cci} denote the set of candidate clusters. The algorithm to generate the candidate clusters is shown in Figure 3.
Figure 3. Algorithm of candidate cluster generation
We take all features returned by the feature extraction step as candidate clusters (cc) and assign each document dj that contains a feature to the corresponding candidate cluster.
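A minimal sketch of this assignment, assuming each document is represented by the set of features extracted from it (the data structure and names are our own):

```python
def generate_candidate_clusters(doc_features):
    # doc_features: {doc_id: set of features extracted from that document}
    # returns {feature: set of doc_ids} — one candidate cluster per feature
    clusters = {}
    for doc_id, feats in doc_features.items():
        for f in feats:
            clusters.setdefault(f, set()).add(doc_id)
    return clusters

docs = {
    "d1": {"jaguar", "mammal"},
    "d2": {"jaguar", "video game"},
    "d3": {"mammal"},
}
cc = generate_candidate_clusters(docs)
print(cc["jaguar"])  # → {'d1', 'd2'}
```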
2.2.3. Subgraph Detection
Let G be a weighted undirected acyclic graph, denoted as G = (V, E). We denote V = {v1, v2, ..., vi} as the set of vertices and E = {e12, e13, ..., eij} as the set of weighted edges in G. First, we define each candidate cluster cci as a vertex vi in V. We link vi and vj under a condition of cluster similarity, defined as simCluster(vi, vj). If simCluster(vi, vj) is equal to or greater than a similarity threshold, denoted as sim_th, vi and vj will be linked by eij, weighted by the value of simCluster(vi, vj), which is defined as:

simCluster(vi, vj) = 2|Di ∩ Dj| / (|Di| + |Dj|)    (1)

where Di and Dj are the document sets of vi and vj.
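The graph-construction step can be sketched as below. We read simCluster as the Dice coefficient over the two clusters' document sets; that reading is our reconstruction from the surviving fragments of Eq. (1), not a verbatim copy, and the default threshold 0.4 follows the range the experiments later report as best.

```python
def sim_cluster(docs_i, docs_j):
    # Dice coefficient over the two clusters' document sets (our reading of Eq. (1))
    if not docs_i or not docs_j:
        return 0.0
    return 2 * len(docs_i & docs_j) / (len(docs_i) + len(docs_j))

def build_graph(clusters, sim_th=0.4):
    # clusters: {cluster_name: set of doc_ids}; returns {(vi, vj): edge weight}
    names = list(clusters)
    edges = {}
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            w = sim_cluster(clusters[names[a]], clusters[names[b]])
            if w >= sim_th:                      # link only sufficiently similar clusters
                edges[(names[a], names[b])] = w
    return edges

clusters = {"jaguar": {"d1", "d2"}, "mammal": {"d1", "d3"}, "felines": {"d1", "d2", "d3"}}
edges = build_graph(clusters)
print(edges[("jaguar", "mammal")])  # → 0.5
```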
The goal of this step is to find coherent groups of clusters connected by high weights. We observe that a coherent subgraph is connected through candidate clusters that share related documents. We apply the well-known graph algorithm breadth-first search to detect the subgraphs in the graph G.
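The subgraph detection amounts to finding the connected components of G with breadth-first search; a self-contained sketch (the edge-dictionary shape and function names are our assumptions):

```python
from collections import deque

def detect_subgraphs(vertices, edges):
    # edges: {(u, v): weight}; returns the connected components of the graph
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:                      # plain breadth-first search
            node = queue.popleft()
            comp.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(comp)
    return components

print(detect_subgraphs(["a", "b", "c", "d"], {("a", "b"): 0.5, ("b", "c"): 0.6}))
# → [['a', 'b', 'c'], ['d']]
```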
2.2.4. Cluster Scoring and Labeling
We score all clusters (vertices in the graph G) with the scoring function scoreCluster(ci) shown in (2). Then we sort by score and select the top N clusters as the cluster results.

scoreCluster(ci) = w̄(ci) × r(ci)    (2)

w̄(ci) = (Σe∈Ei w(e)) / |Ei|    (3)

r(ci) = (Σcij |Dcij|) / |D|    (4)
where Ei is the set of edges of ci, |Ei| is the number of edges in Ei, w(e) is the weight of edge e, cij is a cluster that is linked to ci, |Dc| is the number of documents in cluster c, and |D| is the number of all documents.
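A hedged sketch of the scoring, assembled from the term definitions above: the average weight of a cluster's edges combined with the fraction of all documents it covers. The exact combination in the scoring function is not recoverable from the extracted text; a product is one plausible reading, and all names here are our own.

```python
def avg_edge_weight(edge_weights):
    # mean weight of the cluster's incident edges; 0 for an isolated vertex
    return sum(edge_weights) / len(edge_weights) if edge_weights else 0.0

def doc_fraction(cluster_docs, total_docs):
    # fraction of all documents covered by the cluster
    return len(cluster_docs) / total_docs

def score_cluster(edge_weights, cluster_docs, total_docs):
    # one plausible combination: product of the two quantities
    return avg_edge_weight(edge_weights) * doc_fraction(cluster_docs, total_docs)

print(score_cluster([0.5, 0.8], {"d1", "d2"}, 10))
```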
To display a cluster label for the user's exploring, we generate the label of each cluster result by selecting the cluster with the highest number of documents in the subgraph and using its feature term as the cluster label. In case many clusters in the same subgraph have the same number of documents, we choose the concept term as the label first.
3. RESULTS AND ANALYSIS
In our experiments, we compare the results of our proposed clustering framework with well-known traditional web search results clustering algorithms: first, suffix tree clustering (STC) by [5]-[6], and second, Lingo (a singular value decomposition based algorithm).
We set up the experiments to analyze three aspects as follows:
1) An experiment to prove that using knowledge from an external source, i.e., the concept terms generated in the concept extraction step, can improve the quality of clustering.
2) An experiment to compare our proposed clustering algorithm with the traditional ones.
3) An experiment to tune the quality of the proposed algorithm by varying the key parameter, the similarity threshold (sim_th), in the graph construction step.
3.1. Web Search Result Test Set
We use the web search result data set called AMBIENT to evaluate the quality of clustering. The AMBIENT data set consists of 44 ambiguous queries (topics). Each query was used to search the Yahoo search engine, and the top 100 search results of each query were labeled with subtopics of the query. For instance, the query "jaguar" consists of many subtopics such as "Atari Jaguar Video Game", "Mammal", "Jacksonville Jaguar American Football (NFL)", etc. This data set contains about 4,400 search results and is widely used in much research [1], [8] related to document or short-text clustering.
3.2. Evaluation Method
In this paper, we use Precision, Recall, and the average F-score, called F-measure in many research works [3], [7], [15]-[17], to measure the quality of clustering. Precision is the fraction of documents in cluster c that belong to class t over all documents in cluster c. Recall is the fraction of documents in cluster c that belong to class t over all documents in class t. They are defined as:
Precision(c, t) = |c ∩ t| / |c|    (5)

Recall(c, t) = |c ∩ t| / |t|    (6)
The F-score measurement is the harmonic mean of precision and recall, defined as:

F(c, t) = 2 · Precision(c, t) · Recall(c, t) / (Precision(c, t) + Recall(c, t))    (7)

F = (Σt∈T |t| · maxc∈C F(c, t)) / (Σt∈T |t|)    (8)
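The standard cluster-vs-class F-measure described here can be computed as follows; clusters and ground-truth classes are represented as sets of document ids, and the overall score weights the best F per class by class size (names are our own).

```python
def f_measure(cluster, klass):
    # precision/recall of one cluster against one ground-truth class
    overlap = len(cluster & klass)
    if overlap == 0:
        return 0.0
    p = overlap / len(cluster)     # precision
    r = overlap / len(klass)       # recall
    return 2 * p * r / (p + r)     # harmonic mean

def overall_f(clusters, classes):
    # size-weighted best F over all clusters, averaged over the classes
    total = sum(len(t) for t in classes)
    return sum(len(t) / total * max(f_measure(c, t) for c in clusters)
               for t in classes)

clusters = [{"d1", "d2"}, {"d3"}]
classes = [{"d1", "d2"}, {"d3"}]
print(overall_f(clusters, classes))
```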
3.3. Experimental Results
For each of our experiments, we denote STC as suffix tree clustering, Lingo as Lingo clustering, and our proposed algorithm as GC (graph-based clustering).
We used the AMBIENT data set for testing the quality of each clustering algorithm and evaluated them with the average F-score measurement. The evaluation results in Figure 4 show a higher F-score for the GC algorithm compared with STC and Lingo.
Running the clustering algorithm without concept extraction, GC is improved 32.85% compared with STC and 50.25% compared with Lingo. Running with concept extraction, GC is improved 27.63% compared with STC and 44.62% compared with Lingo, and GC with concepts is improved 6.61% compared with GC without concepts.
To analyze the effect of the similarity threshold on the GC algorithm, we varied the similarity threshold over several values. The results of the similarity threshold modification are shown in Figures 5 and 6.
Figure 4. Clustering results comparison for each clustering algorithm
Figure 5. Clustering results of varying Graph-based clustering's similarity threshold, compared with the other algorithms
Figure 6. Clustering results of varying the similarity threshold of Graph-based clustering
In Figure 7, we show the average number of subgraphs generated by the subgraph detection step in the GC algorithm while changing the similarity threshold value.
Figure 7. Average number of subgraphs by varying the similarity threshold value
For search results exploring, users browse the cluster results generated by the clustering algorithms. In Figure 8, we show the cluster result labels generated by STC, Lingo, and our proposed method GC when clustering the input documents of the topic "Jaguar" in the AMBIENT data set.
Figure 8. Cluster labels generated by the clustering algorithms
To improve the quality of the clusters, we used a phrase or term co-occurrence based graph clustering algorithm over both the original terms from the input documents and the extended concept terms generated by the concept extraction step. The concept terms help to group related documents that do not share the same words but share the same concept.
In other words, the concepts from the external knowledge source relieve the limitation of short text by increasing the informative data available to the clustering process.
On the contrary, the concept extraction step consumes a lot of processing time, because its searching step needs to access the hard drive (I/O) many times to read the Wikipedia corpus index files.
Another key variable that can affect the performance of our clustering algorithm is the similarity threshold. This variable is used to determine the relation of two candidate clusters in the subgraph detection step. When the similarity threshold is changed to a higher value, two candidate clusters will be connected if and only if they have almost the same document members; consequently, the number of subgraphs increases, but the performance of clustering decreases (see Figures 5 and 6).
For clustering result exploration, search results clustering algorithms need to provide readable clustering results for users. The Lingo algorithm works on the description-comes-first paradigm: it focuses on generating meaningful cluster labels first and then grouping the input documents that are similar to each label. Likewise, in the STC algorithm, the phrase used for a clustering result's label is maintained in the suffix tree construction step, and the number of words in that phrase is taken into account in the cluster scoring step as well.
For cluster label generation in our proposed algorithm, the label of each candidate cluster is generated in the candidate cluster generation step, using the terms extracted from the original documents or the concepts from the concept extraction step. After the subgraph detection step, we select the cluster label by using the concept term first.
4. CONCLUSION
In this paper we have shown the achievement of web search result clustering by linking the related candidate clusters in a graph-based model and reducing the variance of documents within one cluster by finding the common words (concepts) that join all documents in the same cluster. We tested the document clustering
while varying the value of the similarity threshold, and found that values between 0.4 and 0.5 produced high clustering performance.
The main reason for using Wikipedia as the external knowledge source, instead of other sources, to extract the concepts of documents is that Wikipedia is the largest online encyclopedia: it contains a lot of articles that cover most of the topics in the world, including technical terms, named entities, and new words that cannot be found in an English lexicon database.
In the future, we can improve the efficiency of extracting concepts from the Wikipedia corpus by finding good feature selection techniques, or by combining many external knowledge sources into a single corpus to extend the information of short documents.
ACKNOWLEDGEMENTS
This paper has been supported by the National Electronics and Computer Technology Center (NECTEC) under grant TGIST-01-50-068.
REFERENCES
[1] X. Han and J. Zhao, "Topic-Driven Web Search Result Organization by Leveraging Wikipedia Semantic Knowledge," in International Conference on Information and Knowledge Management, CIKM10, pp. 1749-1752, 2010.
[2] C. L. Ngo and H. S. Nguyen, "A method of web search result clustering based on rough sets," in Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence, pp. 673-679, 2005.
[3] D. Crabtree, X. Gao, and P. Andreae, "Improving Web Clustering by Cluster Selection," in The IEEE/WIC/ACM International Conference on Web Intelligence, pp. 172-178, 2005.
[4] S. Osinski, J. Stefanowski, and D. Weiss, "Lingo: Search Results Clustering Algorithm Based on Singular Value Decomposition," in The Proceedings of Intelligent Information Processing and Web Mining (IIPWM'04), pp. 359-368, 2004.
[5] O. Zamir and O. Etzioni, "Web Document Clustering: A Feasibility Demonstration," in Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1998.
[6] O. Zamir, et al., "Fast and intuitive clustering of Web documents," in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, pp. 287-290, 1997.
[7] S. Jinarat, C. Haruechaiyasak, and A. Rungsawang, "Web Snippet Clustering Based on Text Enrichment with Concept Hierarchy," in Proceedings of the 16th International Conference on Neural Information Processing, pp. 309-317, 2009.
[8] A. Di Marco and R. Navigli, "Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction," Computational Linguistics, Vol. 39, No. 3, pp. 709-754, 2013.
[9] J. Hu, et al., "Enhancing Text Clustering by Leveraging Wikipedia Semantics," in International Conference on Research and Development in Information Retrieval, SIGIR 2008, Thirty-First Annual ACM SIGIR, pp. 179-186, 2008.
[10] D. Milne, O. Medelyan, and I. H. Witten, "Mining Domain-Specific Thesauri from Wikipedia: A Case Study," in Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence, pp. 442-448, 2006.
[11] E. Gabrilovich and S. Markovitch, "Overcoming the Brittleness Bottleneck using Wikipedia: Enhancing Text Categorization with Encyclopedic Knowledge," in AAAI Proceedings of the 21st National Conference on Artificial Intelligence, pp. 1301-1306, 2006.
[12] P. Schönhofen, "Identifying document topics using the Wikipedia category network," in Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI'06), pp. 456-462, 2006.
[13] D. Milne and I. H. Witten, "Learning to Link with Wikipedia," in International Conference on Information and Knowledge Management, CIKM08, pp. 509-516, 2008.
[14] K. Toutanova and C. D. Manning, "Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger," in The Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-2000), pp. 63-70, 2000.
[15] P. Vahdani Amoli and O. Sojoodi Sh., "Scientific Documents Clustering Based on Text Summarization," International Journal of Electrical and Computer Engineering (IJECE), Vol. 5, No. 4, pp. 782-787, 2015.
[16] Ravi Kumar V. and K. Raghuveer, "Legal Documents Clustering and Summarization using Hierarchical Latent Dirichlet Allocation," IAES International Journal of Artificial Intelligence (IJ-AI), Vol. 2, No. 1, pp. 27-35, 2013.
[17] O. Tsur, A. Littman, and A. Rappoport, "Efficient Clustering of Short Messages into General Domains," in Proceedings of the 7th International AAAI Conference on Weblogs and Social Media, 2013.
BIOGRAPHIES OF AUTHORS

Supakpong Jinarat received his B.Sc. in Computer Science from Khon Kaen University, Thailand, in 1999. He received his M.Sc. in Computer Science from Kasetsart University, Thailand, in 2004. He is currently a Ph.D. candidate in Computer Engineering at Kasetsart University. His research interests are Machine Learning, Data and Web Mining, and Big Data Analysis.
Choochart Haruechaiyasak received his Ph.D. degree from the Department of Electrical and Computer Engineering, University of Miami, in 2003. After receiving his degree, he has worked as a researcher at the National Electronics and Computer Technology Center (NECTEC), National Science and Technology Development Agency (NSTDA). His current research interests include data and text mining, natural language processing, information retrieval, and data management. One of his objectives is to promote R&D works through IT applications for government offices and business sectors in Thailand. He is also a visiting lecturer at several universities in Thailand.
Arnon Rungsawang received his B.Eng. in Electrical Engineering from the King Mongkut Institute of Technology (Ladkrabang campus), Thailand, in 1986. He received his PhD in Computer Engineering from the ENST-Paris, France, in 1997. Currently, he is an associate professor in the Department of Computer Engineering at Kasetsart University, Bangkok. His research interests are Information and Knowledge Retrieval, Internet Computing, and Social Network Mining.