TELKOMNIKA, Vol.11, No.1, March 2013, pp. 207~214
ISSN: 1693-6930
accredited by DGHE (DIKTI), Decree No: 51/Dikti/Kep/2010
Received December 29, 2012; Revised January 9, 2013; Accepted February 7, 2013
Combination of Cluster Method for Segmentation of
Web Visitors
Yuhefizar(1), Budi Santosa(2), I Ketut Eddy P(3), Yoyon K. Suprapto(4)
(1) Information Technology Department, Politeknik Negeri Padang, Indonesia
(2) Industrial Engineering Department, Institut Teknologi Sepuluh Nopember (ITS), Surabaya
(3,4) Electrical Engineering Department, Institut Teknologi Sepuluh Nopember (ITS), Surabaya
yuhefizar@polinpdg.ac.id(1), budi_s@ie.its.ac.id(2), {ketut(3), yoyonsuprapto(4)}@ee.its.ac.id
Abstract
Clustering is one of the important parts of web usage mining for the purpose of segmenting visitors. Segmentation is very important for web personalization and web modification. In this paper, we cluster web visitors by applying a combination of hierarchical and non-hierarchical clustering methods to web log data. The hierarchical clustering method is used to determine the number of clusters, and the non-hierarchical clustering method is used to form the clusters. The cluster analysis is preceded by data pre-processing and factor analysis. With this approach, the web owner can find the access patterns of web visitors more effectively and gain new knowledge about visitor segmentation. A test on ITS's web log data produced 6 clusters of web visitors. Among the 6 clusters, cluster 3 has the largest number of members. This information can help web management pay attention to the behavioral patterns of the members of cluster 3, whether for personalization or for modification of the web. The test results show the feasibility and efficiency of applying this method.

Keywords: web usage mining, cluster analysis, web personalization, web modification, web logs
1. Introduction
The Internet has become a huge information source [1] and an important medium for distributing current information. This role is inseparable from one Internet service, the World Wide Web (WWW), which can disseminate information as text, images, video, voice, and multimedia. A Netcraft survey in July 2012 reported 665,916,461 active sites, and according to Internet World Stats there were 2,267,233,742 Internet users worldwide in December 2011. This means that interaction between Internet users and websites is very high, and web servers record every visitor activity in files (web logs).
Web logs have become the most important data source in Web Usage Mining (WUM) for gathering web visitor data, especially for finding visitors' access patterns, predicting visitors' behavior [2],[3], and creating user profiles [4],[5].
WUM, or web log mining [6], is one category in the field of web mining [7]: mining conducted on the web based on web log data. More specifically, [8] states that WUM is the application of data mining techniques to discover the interaction between visitors and a website through web log data. Mining web logs is useful in a variety of fields, including web personalization [9] and web modification [10].
Techniques in WUM include statistical analysis [11], association rules [12],[13], sequential patterns [14],[15], classification [16],[17], and clustering [18-20]. Clustering is one of the important topics in WUM for segmenting visitors based on their access patterns on the web or their frequency of visits. The authors of [21] use a belief function method to cluster web log data. They divide web visitors into different groups and find a common access pattern for the members of each group. However, this approach still requires session identification, which makes the pre-processing stage less efficient. The authors of [22] cluster web visitors with the K-Means method, but they only show that K-Means clustering can be applied to web log data, without validating the cluster results.
According to [23], clustering of web sessions includes three stages: pre-processing, similarity measurement, and application of cluster algorithms. In this research, we cluster visitors based on how frequently they visit the pages of a site in a given period of time, regardless of web sessions, which makes the pre-processing stage more efficient. We then perform clustering using a combination of hierarchical and non-hierarchical cluster methods.
This paper is organized as follows: Section 1 explains the background of the research and the related work, Section 2 discusses the stages of the research and the methods used, Section 3 presents the results and analysis, and Section 4 concludes the research.
2. Research Method
The general stages of this research are shown in Figure 1.

Figure 1. Stages of Research
2.1. Dataset
The dataset used in this research is web log data from the website of the Tenth of November Institute of Technology (Institut Teknologi Sepuluh Nopember, ITS) Surabaya, at www.its.ac.id, collected from 3 to 16 July 2012. The web log file format used in this research is the Common Log Format (CLF) [24], the standard format used by web servers when creating a log. Each CLF line consists of the host/IP address, identification, authuser, date and time, method, request, status, and bytes, as shown in Table 1.
From the first line of Table 1, we learn that the visitor with IP address 66.249.69.xxx accessed the web page index.php on July 15, 2012 at 06:45:13, with a status code of 200 and a file size of 15319 bytes, and so on. This is the kind of information that is analyzed to obtain the web visitor segmentation.
[Figure 1 flowchart stages: Web Logs Data Collection, Pre-Processing, Factor Analysis, Hierarchical Cluster Method, Non-Hierarchical Cluster Method, Results and Analysis]
Table 1. Common log format
Host/IP Address   Ident, authuser   Date & time            Method   Request        Status   Bytes
66.249.69.xxx     - -               15/Jul/2012:06:45:13   GET      /index.php     200      15319
114.79.57.xxx     - -               15/Jul/2012:19:08:48   GET      /info.php      200      15582
206.53.148.xxx    - -               15/Jul/2012:19:08:50   GET      /media.jpg     200      1324
96.47.225.xxx     - -               15/Jul/2012:19:20:20   POST     /berita.php    200      30462
114.79.16.xxx     - -               15/Jul/2012:20:00:01   GET      /favicon.ico   200      3798
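To illustrate the fields in Table 1, the following is a minimal sketch of how one CLF line could be split into those fields. The sample line is hypothetical: it is modelled on the first row of Table 1, and the time zone offset and protocol string are assumptions, since the paper does not show raw log lines.

```python
import re
from datetime import datetime

# Pattern for one Common Log Format line:
# host ident authuser [date] "method request protocol" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<request>\S+)(?: (?P<protocol>[^"]*))?" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)


def parse_clf_line(line):
    """Return a dict of CLF fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A "-" in the bytes field means that no body was sent.
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    # Keep only the date and time part; the zone offset is dropped here.
    record["time"] = datetime.strptime(
        record["time"].split()[0], "%d/%b/%Y:%H:%M:%S"
    )
    return record


# Hypothetical raw line modelled on the first row of Table 1.
sample = ('66.249.69.xxx - - [15/Jul/2012:06:45:13 +0700] '
          '"GET /index.php HTTP/1.1" 200 15319')
record = parse_clf_line(sample)
print(record["host"], record["method"], record["request"],
      record["status"], record["bytes"])
```

Lines that do not follow the format return None, so they can be skipped rather than stopping the run.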
2.1.1. Pre-Processing
At this stage, we clean the web log data by filtering out items that are not needed (irrelevant data). Filtering is based on:
(i) The file extension. The accepted file extensions are .html, .php, .jsp, .asp and other extensions that refer directly to a web page. Data items with file extensions such as .jpg, .gif, .ico, .bmp, .cgi, .swf, .css, and .txt do not describe the behavior of web visitors, so these items are removed [25].
(ii) The access method. Only accesses that use the GET method can indicate the behavior of web visitors. Data items with other access methods, such as HEAD and POST, are also removed [25].
(iii) The response code from the web server. A web server response with code 200 indicates that a request for a web page was granted and the page was served by the web server. Therefore, data items with a code other than 200 are removed [26].
(iv) The frequency of visitor access. Only visitors with access > 10 were used in this research, as it is assumed that visitors with access < 10 cannot properly describe the behaviour of visitors.
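The four filtering rules above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the record layout (dicts with `host`, `method`, `request`, and `status` keys) is hypothetical, the extension whitelist covers only the four extensions named in rule (i), and rule (iv) is applied to the accesses that remain after rules (i) to (iii), which the paper does not specify.

```python
import os
from collections import Counter
from urllib.parse import urlparse

PAGE_EXTENSIONS = {".html", ".php", ".jsp", ".asp"}  # assumed whitelist, rule (i)
MIN_ACCESSES = 10                                    # threshold from rule (iv)


def is_page_request(record):
    """Apply rules (i)-(iii) to one parsed log record."""
    path = urlparse(record["request"]).path
    extension = os.path.splitext(path)[1].lower()
    return (
        extension in PAGE_EXTENSIONS     # (i) page-like extensions only
        and record["method"] == "GET"    # (ii) GET requests only
        and record["status"] == 200      # (iii) successful responses only
    )


def filter_records(records, min_accesses=MIN_ACCESSES):
    """Keep records passing rules (i)-(iii) from visitors passing rule (iv)."""
    kept = [r for r in records if is_page_request(r)]
    per_visitor = Counter(r["host"] for r in kept)
    # (iv) keep only visitors with more than `min_accesses` remaining accesses
    return [r for r in kept if per_visitor[r["host"]] > min_accesses]


# Hypothetical records: visitor-a has 11 valid page views, visitor-b only 3.
records = (
    [{"host": "visitor-a", "method": "GET", "request": "/index.php", "status": 200}] * 11
    + [{"host": "visitor-a", "method": "POST", "request": "/berita.php", "status": 200}]
    + [{"host": "visitor-a", "method": "GET", "request": "/media.jpg", "status": 200}]
    + [{"host": "visitor-b", "method": "GET", "request": "/info.php", "status": 200}] * 3
)
kept = filter_records(records)
print(len(kept), {r["host"] for r in kept})
```

In this example only the 11 GET page views of visitor-a survive; the POST request, the image request, and all of visitor-b's accesses are dropped.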
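The final result of the pre-processing stage, in the form of a matrix, is as follows [22]:
\[ X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} \qquad (1) \]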
where m is the number of web visitors (data), n is the number of web pages (variables), and X holds the observations. An implementation of the matrix in equation (1), containing web visitor behavior data based on the frequency of visits to each web page, is shown in Table 2.
Table 2. Matrix vector
User   Web page
       p1    p2    p3    p4    p5    p6    p7    p8    p9    p10    …    pn
u1     6     9     0     0     0     0     0     0     5     20     …    X1n
u2     0     0     0     35    0     35    0     0     0     0      …    X2n
u3     0     11    0     0     0     0     0     0     14    9      …    X3n
u4     0     1     4     3     2     3     0     4     4     1      …    X4n
u5     0     84    0     0     0     0     0     0     0     0      …    X5n
u6     5     5     5     0     1     5     4     2     6     0      …    X6n
u7     1     37    0     0     3     0     0     1     9     1      …    X7n
u8     2     21    3     0     4     0     2     1     7     0      …    X8n
:      :     :     :     :     :     :     :     :     :     :      :    :
um     Xm1   Xm2   Xm3   Xm4   Xm5   Xm6   Xm7   Xm8   Xm9   Xm10   …    Xmn
Here p1, p2, p3, ..., pn are the variables for web pages; for example, p1 is the web page named index.php. Likewise, u1, u2, u3, ..., um are the variables for web visitors; for example, u1 is the web visitor with IP address 72.233.234.xxx. From Table 2, it can be seen that visitor u1 accessed web page p1 6 times, web page p2 9 times, and so on.
After pre-processing the dataset, data for 165 web visitors with 57 variables (accessed web pages) were obtained. The data, in the form of this matrix, were then processed further.
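As an illustration of how a visitor-by-page frequency matrix like the one in Table 2 could be assembled from filtered log records, consider the sketch below. The record layout (dicts with `host` and `request` keys) and the sample data are hypothetical.

```python
from collections import Counter


def build_visit_matrix(records):
    """Return (visitors, pages, matrix); matrix[i][j] counts visits of visitor i to page j."""
    visitors = sorted({r["host"] for r in records})
    pages = sorted({r["request"] for r in records})
    counts = Counter((r["host"], r["request"]) for r in records)
    # A Counter returns 0 for (visitor, page) pairs that never occur.
    matrix = [[counts[(v, p)] for p in pages] for v in visitors]
    return visitors, pages, matrix


# Hypothetical filtered records.
records = [
    {"host": "u1", "request": "/index.php"},
    {"host": "u1", "request": "/index.php"},
    {"host": "u1", "request": "/info.php"},
    {"host": "u2", "request": "/info.php"},
    {"host": "u2", "request": "/info.php"},
    {"host": "u2", "request": "/info.php"},
]
visitors, pages, matrix = build_visit_matrix(records)
print(visitors, pages, matrix)
```

Each row of `matrix` corresponds to one visitor and each column to one page, matching the m x n layout described for equation (1).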
2.1.2. Factor Analysis
The next stage is to conduct a factor analysis on the data resulting from the pre-processing stage. Factor analysis is a multivariate method used to describe the pattern of relationships between variables in order to find the independent variables, called factors, that affect the objects. In this case, factor analysis aims to reduce the variables into several sets of indicators, called factors, without losing meaningful information from the initial variables.
The first stage of factor analysis is testing the adequacy of the data and identifying correlations between variables, using the Measure of Sampling Adequacy (MSA) in equation (2), the Kaiser-Meyer-Olkin (KMO) measure in equation (3), and Bartlett's Test in equation (4) [27].
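\[ MSA_i = \frac{\sum_{j \neq i} r_{ij}^2}{\sum_{j \neq i} r_{ij}^2 + \sum_{j \neq i} a_{ij}^2} \qquad (2) \]
\[ KMO = \frac{\sum_{i} \sum_{j \neq i} r_{ij}^2}{\sum_{i} \sum_{j \neq i} r_{ij}^2 + \sum_{i} \sum_{j \neq i} a_{ij}^2} \qquad (3) \]
where:
i = 1, 2, 3, ..., p and j = 1, 2, 3, ..., p
r_{ij} = coefficient of correlation between variables i and j
a_{ij} = partial correlation coefficient between variables i and j
\[ \chi^2 = -\left[ (n - 1) - \frac{2p + 5}{6} \right] \ln |R| \qquad (4) \]
where:
|R| = value of the determinant of the correlation matrix
n = number of data
p = number of variables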
Based on these methods, a data set is said to meet the data sufficiency and correlation assumptions when the MSA and KMO values are greater than 0.5 and the significance value of Bartlett's test is < 0.05. Therefore, variables with MSA < 0.5 were excluded from the analysis. The output of the analysis, in the form of factor scores, is used in the cluster analysis. Table 3 shows the test results using the KMO, Bartlett's, and MSA methods.
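The three adequacy measures can be sketched with NumPy as follows. This is a minimal illustration that assumes the standard definitions of MSA, KMO, and Bartlett's test of sphericity, with partial correlations taken from the inverse of the correlation matrix; the synthetic data and the function name are hypothetical and are not the ITS data set.

```python
import numpy as np


def sampling_adequacy(data):
    """Return (msa, kmo, chi2, df) for an (n x p) array of observations."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)           # correlations r_ij
    inv = np.linalg.inv(corr)
    diag = np.diag(inv)
    partial = -inv / np.sqrt(np.outer(diag, diag))   # partial correlations a_ij
    r2 = corr ** 2
    a2 = partial ** 2
    np.fill_diagonal(r2, 0.0)                        # use off-diagonal terms only
    np.fill_diagonal(a2, 0.0)
    msa = r2.sum(axis=1) / (r2.sum(axis=1) + a2.sum(axis=1))   # per variable
    kmo = r2.sum() / (r2.sum() + a2.sum())                     # overall
    _, logdet = np.linalg.slogdet(corr)              # ln|R|
    chi2 = -((n - 1) - (2 * p + 5) / 6.0) * logdet   # Bartlett's statistic
    df = p * (p - 1) // 2                            # degrees of freedom
    return msa, kmo, chi2, df


# Synthetic example: four variables sharing one common factor.
rng = np.random.default_rng(0)
common = rng.normal(size=(200, 1))
data = common + 0.5 * rng.normal(size=(200, 4))
msa, kmo, chi2, df = sampling_adequacy(data)
print(msa.round(3), round(kmo, 3), round(chi2, 1), df)
```

With p = 57 variables, the degrees of freedom p(p - 1)/2 come to 1596, which agrees with the Df value reported in Table 3.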
Table 3. Results of testing with the KMO, Bartlett, and MSA methods
Kaiser-Meyer-Olkin Measure of Sampling Adequacy                0.757
Bartlett's Test of Sphericity      Approx. Chi-Square       9872.112
                                   Df                           1596
                                   Sig.                            0
As shown in Table 3, the KMO value is 0.757 and the significance value of Bartlett's Test is 0.0. This means that the variables and the data are acceptable and can be analyzed further, because the KMO value is > 0.5 and the significance value is < 0.05. Variables with MSA < 0.5 were excluded from this research. Table 4 shows the variables with MSA < 0.5.
After testing the adequacy of the data, a factor analysis was performed, with the results shown in Figure 2. As shown in Figure 2, 14 factors (eigenvalues ≥ 1) were formed from the 57 baseline variables. The distribution of the variables and the percentage of each variable's variability explained by its factor are shown in Table 5 and Table 6.
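The eigenvalue ≥ 1 rule used to decide how many factors to keep can be illustrated with a short NumPy sketch. The data below are synthetic (two independent groups of three correlated variables), and the function name is hypothetical; the example only shows the counting rule, not the full factor extraction.

```python
import numpy as np


def count_factors(data, threshold=1.0):
    """Count eigenvalues of the correlation matrix that are >= threshold."""
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]   # sorted in descending order
    return int((eigenvalues >= threshold).sum()), eigenvalues


# Synthetic data: two independent latent factors, three variables each.
rng = np.random.default_rng(1)
factor_a = rng.normal(size=(300, 1))
factor_b = rng.normal(size=(300, 1))
data = np.hstack([
    factor_a + 0.3 * rng.normal(size=(300, 3)),
    factor_b + 0.3 * rng.normal(size=(300, 3)),
])
n_factors, eigenvalues = count_factors(data)
print(n_factors, eigenvalues.round(3))
```

The eigenvalues of a correlation matrix always sum to the number of variables, so an eigenvalue of at least 1 marks a factor that accounts for at least as much variance as a single original variable.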
The last step in factor analysis is to compute factor scores, that is, a score for each factor formed, which replaces the values of the original variables. The score variables are named f1 for factor 1, f2 for factor 2, and so on. The resulting factor scores are used for the cluster analysis.
Table 4. Variables with MSA < 0.5
Variable   MSA value
p50        0.415
p7         0.394
p36        0.416
p1         0.401
p33        0.471
p30        0.491

Figure 2. Scree plot of the factorization results
Table 5. Distribution of variables and percentage of variability explained by the resulting factors
Factor 1     Factor 2     Factor 3     Factor 4     Factor 5     Factor 6     Factor 7
p10(67.8%)   p40(81%)     p4(89.7%)    p19(79.3%)   p32(69.4%)   p11(72.9%)   p45(82.7%)
p12(88.4%)   p42(76.9%)   p6(96.8%)    p31(83.7%)   p34(82.6%)   p13(79.2%)   p51(80%)
p15(87.8%)   p44(73.6%)   p27(98.4%)   p38(87.7%)   p35(78.3%)   p14(86.1%)   p61(91.1%)
p16(78.2%)   p55(71.8%)   p28(98.6%)   p57(92.2%)   p37(87.8%)   p26(82.6%)   p63(34%)
p17(92.7%)   p59(75%)
p20(82.8%)   p60(81.1%)
p21(87.7%)   p62(66.4%)
p22(90.4%)
p24(92.4%)
p25(94.8%)
p49(81.1%)
Table 6. Distribution of variables and percentage of variability explained by the resulting factors (continued)
Factor 8     Factor 9     Factor 10    Factor 11    Factor 12    Factor 13    Factor 14
p3(67.3%)    p46(84.5%)   p23(74%)     p52(95.4%)   p47(89.9%)   p39(63.8%)   p2(71.8%)
p5(66.5%)    p48(63.5%)   p29(53.7%)   p58(96%)     p56(89.2%)   p9(55.8%)    p8(69%)
p53(70.8%)   p41(71.6%)   p18(76%)     p54(55.2%)   p43(77.1%)
2.1.3. Cluster Analysis
Cluster analysis is the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more similar to each other than to those in other clusters. It is a non-parametric technique that is widely applicable to real-world problems.
Cluster analysis in this study was carried out by combining the hierarchical clustering method and the non-hierarchical clustering method. The results of the factor analysis, in the form of factor scores, were used as input to the cluster analysis.
2.1.3.1. Hierarchical Cluster
The first phase of the hierarchical cluster method is calculating the distance between objects with the Euclidean distance, followed by cluster formation using the single linkage method. Based on the agglomeration schedule resulting from this method, the number of clusters was determined using the elbow rule, as shown in Table 7.
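To make the single linkage procedure concrete, the sketch below builds an agglomeration schedule for a handful of points using Euclidean distance. It is a naive illustration written for readability, not the software used in the paper; the sample points and the convention that a merged cluster keeps the lower identifier are assumptions.

```python
import math


def single_linkage_schedule(points):
    """Return the merge history as a list of (cluster_a, cluster_b, distance)."""
    clusters = {i: [i] for i in range(len(points))}
    schedule = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        clusters[a].extend(clusters.pop(b))   # merged cluster keeps the lower id
        schedule.append((a, b, d))
    return schedule


# Hypothetical 2-D points: two tight pairs and one outlier.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
schedule = single_linkage_schedule(points)
for stage, (a, b, d) in enumerate(schedule, start=1):
    print(stage, a, b, round(d, 3))
```

The distance column plays the role of the coefficients in an agglomeration schedule: it never decreases from one stage to the next, and a sharp increase signals that two well-separated groups are being merged.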
Table 7. Agglomeration schedule
Stage   Cluster Combined         Coefficients   Stage Cluster First Appears   Next Stage
        Cluster 1   Cluster 2                   Cluster 1   Cluster 2
:       :           :            :              :           :                 :
158     3           24           61.638         157         0                 159
159     3           10           100.274        158         0                 160
160     3           6            115.043        159         0                 161
161     3           4            116.709        160         0                 162
162     3           78           133.669        161         0                 163
163     1           3            147.534        0           162               164
164     1           2            175.524        163         0                 0
Table 7 shows the differences between the coefficients; the increase in the coefficient at stage 159 is larger than at the other stages. Thus, based on the elbow rule and with 165 data, 165 - 159 = 6, which gives 6 clusters. This result is used as input for the non-hierarchical cluster analysis.
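The elbow calculation can be reproduced directly from the coefficients listed in Table 7. The sketch below assumes that the elbow is the stage with the largest increase in coefficient over the previous stage, which is one common reading of the rule; the function name is hypothetical.

```python
# Coefficients of the last agglomeration stages, copied from Table 7.
coefficients = {
    158: 61.638,
    159: 100.274,
    160: 115.043,
    161: 116.709,
    162: 133.669,
    163: 147.534,
    164: 175.524,
}
N_OBJECTS = 165  # number of web visitors after pre-processing


def elbow_cluster_count(coefficients, n_objects):
    """Return (elbow_stage, number_of_clusters) from an agglomeration schedule."""
    stages = sorted(coefficients)
    # Increase of each stage's coefficient over the previous stage.
    jumps = {
        stage: coefficients[stage] - coefficients[previous]
        for previous, stage in zip(stages, stages[1:])
    }
    elbow_stage = max(jumps, key=jumps.get)
    return elbow_stage, n_objects - elbow_stage


elbow_stage, n_clusters = elbow_cluster_count(coefficients, N_OBJECTS)
print(elbow_stage, n_clusters)  # prints: 159 6
```

The largest jump (38.636, from stage 158 to stage 159) identifies stage 159 as the elbow, and 165 - 159 gives the 6 clusters used in the next step.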
2.1.3.2. Non-Hierarchical Cluster
The non-hierarchical cluster method is used to determine the web visitor segmentation. In this case, the K-Means method [22] was used, with the following algorithm:
(i) Determine k, the number of clusters to be formed. This also sets the number of starting centroids.
(ii) Allocate each data item to the cluster with the nearest centroid.
(iii) Recalculate the position of each of the k centroids.
(iv) Repeat steps (ii) and (iii) until no object moves between clusters.
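The four steps above can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in the paper: the starting centroids are drawn at random from the data with a fixed seed, which is an assumption, and the sample points are hypothetical.

```python
import math
import random


def k_means(points, k, seed=0, max_iter=100):
    """Cluster `points` (a list of tuples) into k groups; return (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # (i) k starting centroids
    labels = [None] * len(points)
    for _ in range(max_iter):
        # (ii) assign every point to its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:               # (iv) stop when nothing moves
            break
        labels = new_labels
        # (iii) recompute each centroid as the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical 2-D data with two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = k_means(points, k=2)
print(labels, centroids)
```

In the paper's setting, the points would be the 14 factor scores of each visitor and k would be the 6 clusters suggested by the elbow rule.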
3. Results and Analysis
Based on the application of the non-hierarchical cluster method with 6 clusters of web visitors, the membership of each cluster was obtained, as shown in Table 8.
Table 8. Number of members in each cluster
Cluster   Members
1         2
2         1
3         143
4         13
5         3
6         3
Valid data: 165
Table 8 shows the grouping of the 165 web visitors: cluster 1 consists of two members, cluster 2 of one, cluster 3 of one hundred forty-three, cluster 4 of thirteen, and clusters 5 and 6 of three members each. The detailed membership can be seen in Table 9.
It can be concluded from Table 9 that web visitors (u1, u2, u3, ..., u165) within the same cluster have the same access or visiting pattern on the ITS web pages, so this information can be used as input for web personalization and modification, particularly for cluster 3, which has the most members.
The last part of the cluster analysis is producing the final cluster centers. As shown in Table 10, six clusters are produced, and each cluster has its own characteristics that differ from the others. These characteristics can be seen from the value of the final cluster center for each variable, where a positive sign (+) represents a value above average and a negative sign (-) represents a value below average.
Here, f1 has a large positive value in cluster 1 but negative values in the other clusters. This means that the web pages in factor 1 are visited more by the members of cluster 1 than by the members of the other clusters. Based on the cluster centers, it can be concluded that cluster 1 consists of visitors who predominantly access the web pages within f1 and f14, cluster 2 consists of visitors who predominantly access the web pages within f3, and so on.
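The reading of the cluster centers described above can be sketched as a small lookup over the values in Table 10, taking the rows in order as f1 to f14. The cut-off of 1.0 for calling a factor dominant is a hypothetical choice made for this illustration and does not come from the paper.

```python
# Final cluster centers copied from Table 10.
# Rows are factors f1..f14 in order; columns are clusters 1..6.
CENTRES = [
    [7.89195, -0.15046, -0.09949, -0.04319, -0.14271, -0.13891],
    [-0.14383, -0.35827, 0.03727, -0.16429, -0.72473, -0.12447],
    [0.03924, 12.59114, -0.08314, -0.04008, -0.16055, 0.07413],
    [-0.02047, -0.19888, -0.13353, -0.03349, 0.13025, 6.45976],
    [0.02678, -0.39458, -0.04784, 0.66457, -0.42887, -0.05686],
    [0.00977, -1.06264, -0.00937, 0.17408, -0.60885, 0.64886],
    [-0.08475, -0.07753, -0.15246, 1.83025, -0.47691, -0.10470],
    [0.18684, -0.69466, -0.00275, -0.07509, 0.68261, -0.11938],
    [-0.19577, -0.15362, -0.03678, 0.34198, 0.26607, 0.18699],
    [-0.07120, -0.07587, 0.01548, -0.11539, -0.25520, 0.09032],
    [-0.37024, -0.01147, -0.15697, 1.95009, -0.22823, -0.48951],
    [-0.08222, -0.00952, -0.13323, 0.18111, 5.60284, 0.02113],
    [0.20360, 0.04535, -0.04079, 0.40478, 0.14716, -0.10767],
    [1.08553, 0.11134, -0.04146, 0.23921, 0.01811, 0.16078],
]
THRESHOLD = 1.0  # hypothetical cut-off for "dominant"; not taken from the paper


def dominant_factors(centres, threshold=THRESHOLD):
    """Map each cluster number to the factors whose center exceeds the threshold."""
    result = {}
    for cluster in range(len(centres[0])):
        result[cluster + 1] = [
            f"f{row + 1}"
            for row, values in enumerate(centres)
            if values[cluster] > threshold
        ]
    return result


dominant = dominant_factors(CENTRES)
print(dominant)
```

With this cut-off, cluster 1 is associated with f1 and f14 and cluster 2 with f3, in line with the interpretation in the text, while cluster 3 has no center above the cut-off, which suggests that its members stay close to average on every factor.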
Table 9. Cluster membership
Cluster   Members
1         u1, u3
2         u2
3         u5, u7, u9, u10, u12, u13, u14, u15, u17, u18, u19, u20, u21, u22, u23, u26, u27, u29, u32, u33,
          u34, u35, u36, u37, u38, u39, u40, u41, u42, u43, u44, u45, u46, u47, u48, u49, u51, u52, u53, u54,
          u55, u57, u58, u59, u60, u61, u62, u63, u64, u65, u67, u70, u71, u72, u73, u74, u75, u76, u77, u79,
          u81, u82, u83, u84, u85, u86, u87, u88, u89, u90, u91, u92, u93, u94, u95, u96, u97, u98, u99, u100,
          u101, u102, u103, u104, u105, u106, u107, u108, u109, u110, u111, u113, u114, u115, u116, u117,
          u118, u119, u120, u121, u122, u123, u124, u125, u126, u127, u128, u129, u130, u131, u132, u133,
          u134, u135, u136, u137, u138, u139, u140, u141, u142, u143, u145, u146, u147, u148, u149, u150,
          u151, u152, u153, u154, u155, u156, u157, u158, u159, u160, u161, u162, u163, u164, u165
4         u4, u8, u11, u24, u25, u28, u50, u56, u66, u68, u69, u112, u144
5         u30, u78, u80
6         u6, u16, u31
Table 10. The final cluster centers
Var    Cluster
       1          2          3          4          5          6
f1     7.89195    -.15046    -.09949    -.04319    -.14271    -.13891
f2     -.14383    -.35827    .03727     -.16429    -.72473    -.12447
f3     .03924     12.59114   -.08314    -.04008    -.16055    .07413
f4     -.02047    -.19888    -.13353    -.03349    .13025     6.45976
f5     .02678     -.39458    -.04784    .66457     -.42887    -.05686
f6     .00977     -1.06264   -.00937    .17408     -.60885    .64886
f7     -.08475    -.07753    -.15246    1.83025    -.47691    -.10470
f8     .18684     -.69466    -.00275    -.07509    .68261     -.11938
f9     -.19577    -.15362    -.03678    .34198     .26607     .18699
f10    -.07120    -.07587    .01548     -.11539    -.25520    .09032
f11    -.37024    -.01147    -.15697    1.95009    -.22823    -.48951
f12    -.08222    -.00952    -.13323    .18111     5.60284    .02113
f13    .20360     .04535     -.04079    .40478     .14716     -.10767
f14    1.08553    .11134     -.04146    .23921     .01811     .16078
4. Conclusion
Based on the application of the combined hierarchical and non-hierarchical cluster methods to web log data, it can be concluded that this method gives new information about web visitors' patterns or behavior, which can be used for web personalization and web modification. The test on ITS's web log data produced 6 clusters of web visitors. Among the 6 clusters, cluster 3 has the largest number of members (143 members). This information can help web management improve the service on the web pages that are frequently visited or accessed by members of cluster 3, especially if management wants to carry out web personalization and web modification.
References
[1] Yohanes BW, Handoko, Wardana HK. Focused Crawler Optimization Using Genetic Algorithm. TELKOMNIKA Indonesian Journal of Electrical Engineering. 2011; 9(3): 403-410.
[2] Fong ACM, Baoyao Z, Hui SC, Hong GY, Do TA. Web Content Recommender System Based On Consumer Behavior Modelling. IEEE Transactions on Consumer Electronics. 2011; 57(2): 962-969.
[3] Awad MA, Khalil I. Prediction of User's Web-Browsing Behaviour: Application of Markov Model. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics. 2012; 42(4): 1131-1142.
[4] Nasraoui O, Soliman M, Saka E, Badia A, Germain R. A Web Usage Mining Framework for Mining Evolving User Profiles in Dynamic Web Sites. IEEE Transactions on Knowledge and Data Engineering. 2008; 20(2): 202-215.
[5] Godoy D, Amandi A. User Profiling for Web Page Filtering. IEEE Internet Computing. 2005; 9(3): 56-64.
[6] Wang Y-T, Lee AJT. Mining Web Navigation Patterns With a Path Traversal Graph. Expert Systems with Applications. 2011; 38(6): 7112-7122.
[7] Hussain T, Asghar S, Masood N. Web Usage Mining: A Survey on Preprocessing of Web Log File. International Conference on Information and Emerging Technologies (ICIET). Karachi. 2010: 1-6.
[8] Khasawneh N, Chan C-C. Active User-Based and Ontology-Based Web Log Data Preprocessing for Web Usage Mining. International Conference on Web Intelligence, IEEE/WIC/ACM. Washington. 2006: 325-328.
[9] Chang CC, Chen P-L, Chiu F-R, Chen Y-K. Application of Neural Networks and Kano's Method to Content Recommendation in Web Personalization. Expert Systems with Applications. 2009; 36(3): 5310-5316.
[10] Kumar R. Mining Web Logs: Applications and Challenges. KDD '09 Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York. 2009.
[11] Srivastava J, Cooley R, Deshpande M, Tan P-N. Web Usage Mining: Discovery and Applications of Usage Patterns From Web Data. SIGKDD Explorations. 2000; 1(2): 12-23.
[12] Lee CL, Lee S. Interpreting The Web-Mining Results by Cognitive Map and Association Rule Approach. Information Processing & Management. 2011; 47(4): 482-490.
[13] Nagi M, ElSheikh A, Sleiman I, Peng P, Rifaie M, Kianmehr K, Karampelas P, Ridley M, Rokne J, Alhajj R. Association Rules Mining Based Approach for Web Usage Mining. IEEE International Conference on Information Reuse and Integration (IRI). Las Vegas, NV. 2011: 166-171.
[14] Lee Y-S, Yen S-J. Incremental and Interactive Mining of Web Traversal Patterns. Information Sciences. 2008; 178(2): 287-306.
[15] Wu H-Y, Zhu J-J, Zhang X-Y. The Explore of the Web-Based Learning Environment Base in Web Sequential Pattern Mining. International Conference on Computational Intelligence and Software Engineering (CISE). Wuhan. 2009: 1-6.
[16] Chen C-M, Lee H-M, Chang Y-J. Two Novel Feature Selection Approaches For Web Page Classification. Expert Systems with Applications. 2009; 36(1): 260-272.
[17] Yu JX, Yuming O, Zhang C, Zhang S. Identifying Interesting Visitors Through Web Log Classification. IEEE Intelligent Systems. 2005; 20(3): 55-59.
[18] Sudhamathy G, Venkateswaran JC. Web Log Clustering Approaches - A Survey. International Journal on Computer Science and Engineering (IJCSE). 2011; 3(7): 2896-1903.
[19] Shi P. An Efficient Approach for Clustering Web Access Patterns from Web Logs. International Journal of Advanced Science and Technology. 2009; 5: 1-14.
[20] Martiana E, Rosyid N, Agusetia U. Mesin Pencari Dokumen dengan Pengklasteran Secara Otomatis. TELKOMNIKA Indonesian Journal of Electrical Engineering. 2010; 8(1): 41-48.
[21] Xie Y, Phoha VV. Web User Clustering From Access Log Using Belief Function. Proceedings of the ACM K-CAP'01, First International Conference on Knowledge Capture. Victoria. 2001: 202-208.
[22] Xu HJ, Liu H. Web User Clustering Analysis Based on KMeans Algorithm. International Conference on Information, Networking and Automation (ICINA). Kunming. 2010; 2: V2-6 - V2-9.
[23] Chaofeng L. Research on Web Session Clustering. Journal of Software. Academy Publisher. 2009; 4(5): 460-468.
[24] Tanasa D, Trousse B. Advanced Data Preprocessing for Intersites Web Usage Mining. IEEE Intelligent Systems. 2004; 19(2): 59-65.
[25] Lee C-H, Lo Y-L, Fu Y-H. A Novel Prediction Model Based on Hierarchical Characteristic of Web Site. Expert Systems with Applications. 2011; 38: 3422-3430.
[26] Liu B. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Berlin: Springer. 2007.
[27] Niu J, He Y, Li M, Zhang X, Chao C, Zhang B. A Comparative Study on Application of Data Mining Technique in Human Shape Clustering: Principal Component Analysis VS. Factor Analysis. IEEE Conference on ICIEA. 2010; 2014-2018.