International Journal of Electrical and Computer Engineering (IJECE)
Vol. 6, No. 1, February 2016, pp. 330~336
ISSN: 2088-8708, DOI: 10.11591/ijece.v6i1.9334
Journal homepage: http://iaesjournal.com/online/index.php/IJECE
Prediction Data Processing Scheme using an Artificial Neural Network and Data Clustering for Big Data

Se-Hoon Jung*, Jong-Chan Kim**, Chun-Bo Sim***
* Department of Multimedia Engineering, Sunchon National University (Gwangyang Bay SW Convergence Institute), Korea
** Department of Computer Engineering, Sunchon National University, Korea
*** Department of Multimedia Engineering, Sunchon National University, Korea
Article Info

Article history:
Received Jul 24, 2015
Revised Nov 12, 2015
Accepted Nov 28, 2015

ABSTRACT
Various types of derivative information have been increasing exponentially, based on mobile devices and social networking sites (SNSs), and the information technologies utilizing them have also been developing rapidly. Technologies to classify and analyze such information are as important as data generation. This study concentrates on data clustering through principal component analysis and K-means algorithms to analyze and classify user data efficiently. We propose a technique of changing the cluster choice before cluster processing in the existing K-means practice into a variable cluster choice through principal component analysis, and expanding the scope of data clustering. The technique also applies an artificial neural network learning model for user recommendation and prediction from the clustered data. The proposed processing model for predicted data generated results that improved the existing artificial neural network–based data clustering and learning model by approximately 9.25%.
Keyword:
Artificial neural network
Clustering
K-means
Principal component analysis
R programming
Copyright © 2016 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Chun-Bo Sim,
Department of Multimedia Engineering, Sunchon National University,
Maegok-Dong, Suncheon-si, Jeollanam-do 540-742, Republic of Korea.
Email: cbsim@sunchon.ac.kr
1. INTRODUCTION
Today, real-time data and documents are on an exponential rise based on advanced mobile computing and social networking sites (SNSs). Created in real time, big data has atypical data structures, such as film and images, added to the typical data structures created before now. Usually used by large corporations, big data analysis and prediction technologies are also utilized by government agencies, small and medium-sized companies, and today's common research institutions. There have been many studies on big data analysis and prediction technologies. The techniques of predicting typical or atypical big data created in real time are divided into supervised and unsupervised learning. While supervised learning is a type of machine learning that infers certain functions from data comprised of predicted answers, unsupervised learning finds relations among data by making use of unlabeled data. The number of studies is rising in the area of utilizing an artificial neural network and unsupervised learning–based clustering techniques to analyze the big data being created now [1]. Studies on unsupervised learning–based clustering techniques propose big data analysis and processing methods [2-3]. Recommendation services that meet the requirements of users, and technology for the timely analysis of clustering processing, are growing in importance along with big data analysis technologies. There are three major stages of data analysis clustering in the previous studies [4]. First, the pre-processing stage presents a structure in which the raw data reflected for analysis are in a structure comprised of sentences containing words. It eliminates stop words and extracts morphemes from sentences.
The second stage distinguishes the number of clusters to determine the clustering of the sentences pre-processed in the first stage, and repeats clustering by calculating the Euclidean distance of the pre-processed data objects. The last stage is a structure of proposed users' predicted clustering through the clustered data objects and provides fast operation speed for data analysis. Previous studies on data analysis and prediction had fixed and variable problems. First, they have to fix the determination coefficient of a cluster when the sample data for analysis increase, which is a disadvantage. In such a case, unnecessary data objects can be clustered unless the desired clustering happens. Secondly, they lacked accuracy and reliability of prediction clustering by fixing certain labels in advance, such as in supervised learning, and clustering data to the fixed labels in low-level data analysis. The present paper proposes C- and R-based data result prediction performance analysis with a data processing model to analyze and predict connections and rules among data based on users' big data. The main goals of the proposed prediction data processing model are to overcome the problem of determining the number of clusters in the stage before data clustering, and to increase the accuracy and reliability of prediction data to make decisions for various prediction processes. It processes users' prediction data, including inter-data regularity and the main topics pursued by users, through principal component analysis (PCA) and K-means algorithms based on the sentences written on the users' SNSs. The distinguished regular data objects propose user prediction clustering through repetitive learning by applying an artificial neural network. Users' sentences on an SNS can express the words used in the sentences as a vector in a characteristic multiple-dimension vector space.
2. RELATED WORK
The genetic and neural network algorithms are good examples of algorithms trying to translate what humans actually learn into the computer as it is. A neural network consists of nodes with mathematical computation capabilities interconnected with one another, operating by proper learning rules [5-6]. That is, each node performs a mathematical operation through coupling and transfer functions. The signals actually entered into the nodes can be expressed as in Formula (1), based on the addition of weighted values and transfer functions.
$net_j = \sum_{i} w_{ij} x_{i}$    (1)
Here, $net_j$, the actual input signal to a node, obtains output values by going through a non-linear function called a transfer or activation function; $w_{ij}$ represents the connection strength between the input and hidden layers, and $x_{i}$ is the input value of an input layer node. The sigmoid function is the most commonly used non-linear function, and multi-layer artificial neural networks consist of input, hidden, and output layers.
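As a minimal sketch (not the authors' implementation), the weighted sum of Formula (1) followed by a sigmoid activation can be written in R as follows; the weight and input values are illustrative only.

# Formula (1) as code: weighted sum of the inputs to a node, passed
# through a sigmoid transfer (activation) function.
sigmoid <- function(z) 1 / (1 + exp(-z))

neuron_output <- function(x, w) {
  net <- sum(w * x)   # net_j = sum_i w_ij * x_i
  sigmoid(net)        # non-linear transfer function
}

x <- c(0.2, 0.7, 0.1)    # example input values x_i
w <- c(0.5, -0.3, 0.8)   # example connection strengths w_ij
neuron_output(x, w)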
The learning algorithm applied to optimize the connection strengths was the error backpropagation learning algorithm based on the gradient descent method, to which momentum constants and learning rates were applied. In the algorithm, the activation function used a tangent sigmoid function in the hidden layer and a linear function in the output layer. The gradient descent method repeatedly explores the optimization of parameters to improve the value of the objective function by calculating an adjustment amount in proportion to the gradient given by the first derivative of the objective function.
Formula (2) shows an artificial neural network update that used momentum constants and learning rates to provide more efficient training and to produce better results: $\gamma$ is the learning rate; $\alpha$ the momentum constant; $w_{ij}$ the connection strength connecting each layer; and $\delta_{j}$ is the error rate.
$w_{ij}(n+1) = w_{ij}(n) + \gamma \, \delta_{j} \, x_{i} + \alpha \, \Delta w_{ij}(n)$    (2)
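A minimal sketch of this update rule in R, assuming the standard momentum form of backpropagation described above (all numeric values are illustrative):

# Formula (2) as code: gradient-descent weight update with a learning
# rate (gamma) and a momentum constant (alpha).
update_weights <- function(w, delta, x, prev_dw, gamma = 0.1, alpha = 0.9) {
  dw <- gamma * delta * x + alpha * prev_dw   # adjustment for this step
  list(w = w + dw, dw = dw)                   # new weights and the step taken
}

w       <- c(0.5, -0.3, 0.8)   # current connection strengths w_ij
prev_dw <- c(0, 0, 0)          # previous adjustment (momentum term)
x       <- c(0.2, 0.7, 0.1)    # inputs to the layer
delta   <- 0.05                # back-propagated error for the node
update_weights(w, delta, x, prev_dw)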
3. PROPOSED DATA PROCESSING SCHEME
3.1. The Structure of the Data Processing Scheme
As seen in Figure 1, the proposed decision-making clustering consists of extracting clusters through pre-processing, PCA, and a K-means algorithm, clustering the analysis objects, and testing the objects through an artificial neural network.
Figure 1. Overall structure of our scheme
The pre-processing stage produces input data by eliminating stop words and extracting morphemes from raw SNS data. Pre-processed data consist of a document–word matrix. The number of clusters is found by conducting PCA on the characteristic vectors corresponding to the matrix, and users' past acts are clustered by applying the K-means algorithm. The pre-processed data are applied as learning data for data reliability and accuracy, undergo a clustering test process, and the output results of the final clustered data are checked.
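To make this flow concrete, a minimal outline in R is sketched below; it is illustrative rather than the authors' implementation. It assumes the pre-processed document–word matrix from Section 3.2 is already available as a numeric matrix, and the rule used here to derive a variable cluster count from the PCA result is a simplified assumption.

# Outline of the clustering stages: PCA on the pre-processed
# document-word matrix, a variable cluster count derived from the
# components, and K-means clustering. The clustered objects then feed
# the ANN test stage sketched in Section 3.4.
cluster_stage <- function(doc_word) {
  pca <- prcomp(doc_word, scale. = TRUE)
  # Assumed rule: keep components explaining more than 10% of the
  # variance, but at least two, as the variable number of clusters.
  k <- max(2, sum(summary(pca)$importance["Proportion of Variance", ] > 0.10))
  km <- kmeans(pca$x[, 1:k, drop = FALSE], centers = k, nstart = 25)
  list(n_clusters = k, assignment = km$cluster, centers = km$centers)
}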
3.2. Pre-Processing for the Data Object
This is a data pre-processing stage to apply raw SNS data to decision-making clustering, extracting morphemes and eliminating stop words from the SNS sentences that constitute the raw data.
The frequency of words in sentences is used as an important indicator to measure importance. If words have too high a frequency, however, they will decrease in importance and eventually hold no significance. The present paper applied algorithms to analyze morphemes and to eliminate stop words by morpheme in order to extract morphemes from SNS sentences [7].
Figure 2 presents the conditions of stop words according to morphemes in the algorithm.
Minrangesupport and Mindelta refer to the minimum scope support and its allowable range, respectively. If the number of morphemes satisfying Mindelta is larger than the threshold, the words will be extracted. If Minrangesupport is bigger than the mean of rangesupport, they will be processed as stop words.
(1) Minrangesupport ≤ wordsupport
(2) k(|Δrangesup.*| ≤ Mindelta) ≥ threshold
(3) Minrangesupport = Σ rangesup.*

Figure 2. Condition for stop word detection
When, for instance, analyzing two sentences that a user has posted on an SNS ("I went to an amusement park for a date with my boyfriend today" and "I loved cotton candy the most at the amusement park today") with the levels classified by sentence days, morpheme analysis will produce the results {I, went, to, an amusement park, for, a date, with, my boyfriend, today, I, loved, cotton candy, the, most, at, the amusement park, today}.
In morpheme analysis, the parts that are endings are all deleted to tell them apart from stems. When the threshold is fixed at 1 and the data objects produced through morpheme analysis are applied to the stop word processing algorithm, {today, boyfriend, date, today, amusement park, cotton candy, love} are extracted as candidates for Minrangesupport. Since the mean of rangesupport in all the sentences is 1, {today} is extracted as a stop word. There are a total of six SNS sentence words after morpheme analysis and stop word processing, namely {boyfriend (w1), date (w2), today (w3), amusement park (w4), cotton candy (w5), and love (w6)}. They are further divided into a day–word matrix by granting selective weights.
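As an illustration of this pre-processing step, the sketch below builds a day–word count matrix in R; the whitespace tokenization, the frequency-based stop-word rule, and the equal weights are simplified stand-ins for the morpheme analysis and the Figure 2 conditions, not the authors' implementation. The argument days is assumed to be a character vector giving the posting day of each sentence.

# Build a day-word matrix: tokenize sentences, drop words whose overall
# frequency exceeds the mean (simplified stand-in for the stop-word
# conditions of Figure 2), and count the remaining words per day.
build_day_word_matrix <- function(sentences, days) {
  days    <- as.character(days)
  tokens  <- lapply(strsplit(tolower(sentences), "[^a-z]+"), function(x) x[x != ""])
  counts  <- table(unlist(tokens))
  stop_ws <- names(counts)[counts > mean(counts)]   # assumed stop-word rule
  kept    <- lapply(tokens, setdiff, y = stop_ws)
  vocab   <- sort(unique(unlist(kept)))
  m <- matrix(0, nrow = length(unique(days)), ncol = length(vocab),
              dimnames = list(unique(days), vocab))
  for (i in seq_along(kept)) {
    tab <- table(kept[[i]])
    m[days[i], names(tab)] <- m[days[i], names(tab)] + tab
  }
  m
}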
3.3. PCA and K-Means Algorithm for Clustering
Previous studies selected the number of clusters randomly or according to the number of cases wanted by the user in the early stage when making use of the K-means algorithm for data analysis [8-9]. If the number of vectors (the clusters for clustering) is chosen according to flexible and variable situations, it becomes possible to predict data of broader scope. Principal components of new data objects that are not correlated with one another are extracted by linearly transforming the multi-dimensional characteristic-vector data based on the sentence–word matrix from the pre-processing stage.
The extracted principal components are then used as the vectors for clustering, which are processed according to the number of clusters.
As the data analyzed through PCA show, SNS users increase their SNS utilization on Sunday, Monday, Thursday, Saturday, and Friday. These are applied to the input data through the central point of each day vector. The initial values are determined around each cluster for the data objects, according to the vectors selected based on the days analyzed as principal components. The Euclidean distance is obtained between the clusters analyzed as principal components and the data objects of SNS users, for the data objects classified according to the day–word matrix. Once it is determined which data object has the highest similarity to which cluster, it is moved to the cluster of concern. Finally, the central point of the moved-to cluster is re-calculated. Here is an example: for a data object called {Data}, the Euclidean distance is calculated to the central point of each cluster, and the data object is moved to the cluster for which this value is the minimum.
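As a small illustration of this assignment step (a sketch assuming plain Euclidean distance, not the authors' code), each data object is compared with every cluster centroid, assigned to the closest one, and the centroids are then re-calculated:

# One assignment pass: Euclidean distance from each data object to each
# cluster centroid, assignment to the closest cluster, and centroid
# re-calculation.
assign_and_update <- function(objects, centers) {
  d <- as.matrix(dist(rbind(centers, objects)))             # all pairwise distances
  d <- d[-(1:nrow(centers)), 1:nrow(centers), drop = FALSE]  # objects x centroids
  assignment  <- apply(d, 1, which.min)                      # closest centroid
  new_centers <- t(sapply(seq_len(nrow(centers)), function(k)
    colMeans(objects[assignment == k, , drop = FALSE])))
  list(assignment = assignment, centers = new_centers)
}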
3.4. Design of the Prediction Data ANN Model
A model is designed based on the data objects from the clustered sentence data of SNS users. An artificial neural network model is built based on the trial-and-error method to predict users [5]. The artificial neural network takes the data objects classified in the pre-processing stage as input data and compares the results processed in the output layer with the performance of the data objects produced via K-means.
Formula (3) is a model equation to build the artificial neural network. Here, $P$ is the sum of recommended words associated with certain data objects; $C_{k}$ is an object clustered with the K-means algorithm; and $G$ is a preliminary group of data objects classified through PCA, rather than predicted clusters.
$P = \sum_{k} C_{k}, \quad C = \{c_{1}, \ldots, c_{k}, \ldots, c_{n}\}, \quad G = \{g_{1}, \ldots, g_{k}, \ldots, g_{m}\}$    (3)
Figure 3. Artificial neural network learning algorithm for cluster analysis clustering
Figure 3 presents a learning algorithm for artificial neural networks to recommend associated words for the data objects produced through clustering. The learning algorithm generates learning data based on the word objects recommended through the day clusters produced by K-means. The trial-and-error method is used to calculate the sum of weight values and thresholds of the word objects in the input layer. Errors are calculated through weight learning in the output layer, and the weight learning and threshold learning of each connection layer are processed. Finally, the error rates calculated repeatedly are added together to check the error rate of the final prediction object and to judge the fitness of the recommended object.
4. EXPERIMENT AND PERFORMANCE EVALUATION
The proposed prediction data processing process was subjected to experiments and evaluated for performance in the following environment: the CPU was an Intel Core i7-4790 at 3.6 GHz with 16 GB of memory, and the OS was Windows 7. The experiment tools used in the paper were Visual Studio and R Studio. The experiment data were constructed from 100 random SNS sentences posted by certain individuals over 14 days in order to analyze and predict their SNS data. The present paper conducted several experiments to assess the efficiency of the proposed prediction data processing from various perspectives. Nouns were extracted by eliminating stop words and endings in the pre-processing stage based on the raw SNS data. PCA was then carried out, dynamically based on the extracted nouns, to obtain seeds, the cluster classification criterion for clustering.
The SNS data objects were clustered with a cluster as a central point, and objects were selected for recommendation to users. Table 1 presents the day–word analysis matrix for PCA.
Table 1. PCA Result of Day–Word Matrix

Variable    PC1      PC2      PC3      PC4      PC5      PC6
sun.       -4.71     4.77    -4.41    -5.60    -0.89    -1.13
mon.        6.24     7.02     1.32     3.34     1.29    -3.10
tue.       -5.95    -3.72    -0.45     5.87    -1.81    -4.02
wed.        4.17    -2.62    -4.05     2.89    -1.02     5.03
thu.        2.38    -5.74     0.64    -3.80     7.54    -2.13
fri.        2.81    -2.25     4.70    -3.71    -6.04     0.08
sat.       -4.95     2.57     5.24     1.00     2.93     5.27
The analysis results reveal that four vector numbers were delivered: mon., tue., thu., and sat. Figure 4 shows the analysis results of the relations between the 119 input data (words) produced in the pre-processing stage and the principal components. The results seen in Figure 4 cover the PCA results of PC1 and PC2 from a total of six rounds of PCA. Four principal components by day were identified, namely Monday, Tuesday, Thursday, and Saturday, when the users most often used the SNS. According to the evaluation results of the SNS sentences over 14 days, the most influential words were {summer, shopping, telegram, and phone call}. Figure 5 presents an analysis graph by day for prediction data clustering.
{Thursday, Saturday, Tuesday, and Monday} correspond to {black, green, red, and black}, respectively, falling into the centroid of each cluster on the clustering graph. The 119 words, which are the input data objects, were classified into their clusters of concern by calculating the central location of each cluster and the Euclidean distance. The user recommendation results can ultimately be inferred for predictive service factors through pre-processing, PCA, and K-means algorithm analysis from the three following perspectives. First, it is possible to analyze the days on which the SNS is used by the subjects; secondly, it is possible to check the areas of interest by day among the users; and finally, one can check the connections between the 119 words arranged through PCA and the optimized terms in the fields of the users' interests.
Figure 4. Input data analysis by PCA

Figure 5. Input data clustering by K-means
The goal of the artificial neural network stage is to judge whether the classification prediction of a user recommendation service for certain data objects clustered through K-means analysis is accurate. For that goal, the present paper measured the predictions and error rates of artificial neural network–based data objects and checked the reliability of the data prediction. The paper also checked the accuracy of the {shopping} data objects in terms of analysis by day and predicted day. The learning number for the artificial neural network was fixed at 200 iterations in all cases.
The learning results show that accuracy was highest when the optimal number of hidden layers was two. Figure 6 presents the learning model creation results of an artificial neural network (ANN).
$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_{i} - \hat{y}_{i})^{2}}$, where $y_{i}$ is the actual value, $\hat{y}_{i}$ the predicted value, and $n$ the number of data objects.
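For reference, RMSE can be computed in R with a one-line helper (illustrative only):

# Root mean squared error between actual and predicted values.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))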
Figure 6. Results of creation by an ANN learning model

Figure 7. Accuracy and RMSE of the ANN-based prediction data process
Figure 7 shows the accuracy and root mean squared error (RMSE), namely the numeric criteria used to judge the efficiency and precision of the prediction data process. The measurement results are found in Table 2.
Table 2. Accuracy and RMSE of the {Shopping} Data Objects by Day and Number of Hidden Nodes

{Shopping}     Node=2              Node=3              Node=4
               Acc.     RMSE       Acc.     RMSE       Acc.     RMSE
sun.           86.7     0.8647     83.6     0.9451     84.7     0.7942
mon.           75.4     1.0148     64.6     1.3218     83.3     1.2115
tue.           88.8     1.1656     84.8     1.5444     81.2     0.8651
wed.           81.4     0.9947     79.7     0.8562     84.1     0.8451
thu.           85.5     0.7455     84.4     0.9186     77.7     0.7953
fri.           90.4     1.2432     79.9     0.7887     79.2     0.8844
sat.           93.8     0.9464     87.6     0.9499     85.4     0.9319

Table 3. Clustering Error Rate Measurement

             Data Clustering and Prediction (Only ANN)    Proposed Data Clustering and Prediction
Error Rate   14.81%                                       5.56%
Based on the learning measurements of the artificial neural network, the probability of each day for the {shopping} data objects was measured. The results indicate that Saturday held the highest measurements, regardless of the number of hidden nodes. The simple measured probability results agreed with the data clustering through PCA and K-means. In addition, RMSE, which shows the differences between predicted and actual values, showed no big differences from the simple probability results, recording 0.9464 when there were two nodes. Table 3 presents comparison results between the error rates of the old artificial neural network–based data clustering and pattern prediction, and those of the proposed data prediction model.
The present paper had the same characteristics of unsupervised learning as previous studies, but the previous studies had to set classification criteria for clustering in advance. Because of that problem, the error rate of data object clustering or prediction was measured as high. The proposed scheme gained the effect of pre-processing the classification criteria through PCA and reduced the error rate by 9% or more for data clustering and prediction under the same conditions.
5. CONCLUSION
In this paper, we proposed a processing model to cluster data based on user information created on an SNS and to make predictions about, and recommendations to, users in the future. The proposed research model processes the clustering of user sentence data by making use of the regularity among the data pursued by those users, performing PCA on the main topics, and applying a K-means algorithm based on the sentences users post on an SNS. The distinguished regular data objects provide user prediction data through repetitive learning by applying an artificial neural network. The data clustering proposed in the paper overcomes the problem of determining the number of clusters before clustering for diverse prediction decisions and improves the error rate by approximately 9.25% compared to previous studies, in terms of prediction data accuracy and reliability. We will supplement the parts where efficiency drops in the pre-processing stage, because the present study covers SNSs with small amounts of sentence data, and we will continue to investigate the learning phenomenon of data objects toward the central points of clusters during K-means algorithm–based data clustering.
ACKNOWLEDGEMENTS
This work was supported by the Research Foundation of the Engineering College, Sunchon National University. The research was also supported by the 'Software Convergence Technology Development Program' through the Ministry of Science, ICT and Future Planning (S0170-15-1079, S0170-15-1054).
REFERENCES
[1] H.J. Kim, S.Z. Cho, and P.S. Kang, "KR-WordRank: An Unsupervised Korean Word Extraction Method Based on WordRank", Journal of the Korean Institute of Industrial Engineers, vol. 40, no. 1, pp. 18-33, 2014.
[2] C.W. Kim, S. Park, "Enhancing Text Document Clustering Using Non-negative Matrix Factorization and WordNet", Journal of Information and Communication Convergence Engineering, vol. 11, no. 4, pp. 241-246, 2013.
[3] S. Park and S.R. Lee, "Enhancing document clustering using condensing cluster terms and fuzzy association", IEICE Transactions on Information and Systems, vol. 94D, no. 6, pp. 1227-1234, 2011.
[4] T. Zhang, "BIRCH: an efficient data clustering method for very large databases", SIGMOD '96 Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, 1996, pp. 103-114.
[5] G. Barko, J. Hlavay, "Application of an artificial neural network (ANN) and piezoelectric chemical sensor array for identification of volatile organic compounds", Talanta, vol. 44, no. 12, pp. 2237-2245, 1997.
[6] H.K. Palo, M.N. Mohanty, "Classification of Emotional Speech of Children Using Probabilistic Neural Network", International Journal of Electrical and Computer Engineering, vol. 5, no. 2, pp. 311-317, 2015.
[7] K.H. Joo, W.S. Lee, "Document Clustering based on Level-wise Stop-word Removing for an Efficient Document Searching", Journal of the Korean Association of Computer Education, vol. 11, no. 3, pp. 67-80, 2008.
[8] S.S. Kim, "Variable Selection and Outlier Detection for Automated K-means Clustering", Communications for Statistical Applications and Methods, vol. 22, no. 1, pp. 55-67, 2015.
[9] D. Napoleon, S. Pavalakodi, "A new method for dimensionality reduction using K-means clustering algorithm for High Dimensional Data Set", International Journal of Computer Applications, vol. 13, no. 7, pp. 41-46, 2011.
BIOGRAPHIES OF AUTHORS

Se-Hoon Jung received his BSc and MSc in Multimedia Engineering from Sunchon National University in 2010 and 2012, respectively. Currently, he is a senior researcher with the research & development team, Gwangyang Bay SW Convergence Institute, South Korea. His research interests include data analysis and data prediction.
E-mail: iam1710@hanmail.net

Jong-Chan Kim received a BSc, MSc, and PhD in computer engineering from Chonbuk National University, South Korea, in 2000, 2002, and 2007, respectively. He was a senior research professor in the Automation and System Research Institute at Seoul National University in 2013. His current research interests are image processing, computer graphics, and data analysis.
E-mail: seaghost.sunchon.ac.kr

Chun-Bo Sim received a BSc, MSc, and PhD in computer engineering from Chonbuk National University, South Korea, in 1996, 1998, and 2003, respectively. Currently, he is an associate professor with the Department of Multimedia Engineering, Sunchon National University, South Korea. His research interests include multimedia databases, ubiquitous computing systems, and big data processing.
E-mail: cbsim@sunchon.ac.kr