TELKOMNIKA, Vol. 13, No. 4, December 2015, pp. 1368~1375
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v13i4.2811
Received September 7, 2015; Revised November 4, 2015; Accepted November 20, 2015
Design and Implementation of Network Public Opinion
Analysis System
Ma Junhong*¹, Liao Na²
Department of Computer Engineering, Institute of Technology, Xi’an International University
Xi’an, Shaanxi, China, 710077
*Corresponding author, email: maxiaofei913@163.com¹, 2340869@qq.com²
Phone: 029-87866350¹, 029-88751119²
Abstract
Network public opinion analysis is an important form of information analysis and processing. Based on a study of the related technologies, this paper designs and implements a new network public opinion analysis system. The system consists of four parts: network data fetching, processing of the fetched data, analysis of the processed data, and display of the public opinion analysis results. In the document extraction part, web content is collected with the Larbin web crawler. In the public opinion analysis part, new topics are identified with an improved Single-Pass clustering algorithm. The algorithm uses multiple centers and compares documents in two ways, by title vector and by body vector, which better reflects the dynamics of public opinion topics. Finally, the system was tested repeatedly in the network environment of a university. The results show that the new public opinion analysis system runs stably and efficiently. The work has some reference value for the development of other Internet information analysis systems.
Keywords: network public opinion, search engine, cluster analysis, web crawler
Copyright © 2015 Universitas Ahmad Dahlan. All rights reserved.
1. Introduction
Public opinion analysis uses information collection, text crawling and related technologies to quickly find and gather information relevant to public opinion. The collected information is then processed automatically through capture, text filtering, topic analysis, text classification, cluster analysis and statistical judgment [1]. In February 2015, the China Internet Network Information Center (CNNIC) issued the "35th Statistical Report on Internet Development in China". The report shows that, with the rapid rise of the mobile Internet, the number of Chinese netizens reached 649 million in 2014. Of these, 54.5% said they trust information on the Internet, and 43.8% said they like to comment online. In recent years the Chinese government has also actively promoted and guided online participation in politics. Most Internet users comment on current affairs, report on people's livelihood and make suggestions through Internet channels, so the network has become one of the most important platforms for users to express their views, and network public opinion has emerged as a large-scale social force. At the same time, a growing number of Internet public opinion events have come to prominence and have had positive or negative social impacts [2].
Colleges and universities are an important part of society. The network has become university students' primary channel for accessing information and communicating, and campus public opinion resembles public opinion in society at large [3]. University network public opinion helps improve university management and promotes democracy and harmony, but it also provides a breeding ground for false statements, exaggerated speech and malicious speech, which adversely affect scientific research, teaching, and campus stability and harmony. Because students tend to follow others blindly and are highly concentrated, and because university network monitoring is poorly managed, network public opinion breaks out more easily in university cyberspace. The social influence of student public opinion is growing, and so is its influence on the stable development of students and universities. The study of a university network public opinion analysis system is therefore particularly important.
Public opinion research abroad largely means Western public opinion research. Since the beginning of the 19th century, the study of public opinion in the West has attracted wide attention from sociologists, political scientists, social psychologists and other social scientists, and has developed rapidly. Related research focuses mainly on government decision-making, using online polls as a reference for policy [4]. Political parties in many countries use network media as an electronic bridge through which party members and the public participate, directly or indirectly, in important decisions. This helps integrate the interests of all parties and makes decision-making more democratic and scientific.
On university network public opinion management, research abroad focuses more on school crisis management. As crisis management research has deepened, school crisis management has become a new research field. Some European and American countries and Japan began research on school crisis management relatively early and have accumulated a certain amount of data. The American research project TDT (Topic Detection and Tracking) covers five research areas: segmentation of continuous text (for broadcast news), topic tracking, topic detection, new event detection, and link detection [5]. Its aim is to develop algorithms that can discover and summarize important information and content from a data stream.
In recent years, as China has paid more attention to network public opinion management, some laws and regulations on network public opinion management have appeared, and official and private support for research on network public opinion has gradually increased. Public opinion monitoring agencies, academic institutions and research institutions backed by the government and the media have emerged [6], and researchers have published more and more work on network public opinion. The Institute of Public Relations and Public Opinion of the Communication University of China, founded in December 2005, is a public opinion information analysis and academic research institution whose main research interests include social public opinion, crisis warning, brand reputation and public relations activities. Renmin University of China and Founder Group jointly established the "Renmin University of China-Founder public opinion monitoring research base", and some other colleges and universities have set up similar research centers. The establishment of these agencies has drawn attention to network public opinion. Xu Xin proposed an early warning mechanism for network public opinion based on signal analysis and divided it into two modes: longitudinal mining of signals and lateral control. In 2011, Hong Xiaojuan et al., in "Building a university network public opinion assessment system based on the I-Space model", introduced the British economist Boisot's "information space" (I-Space) model, whose three dimensions are codification, abstraction and diffusion. In 2013, Pan Chao, in a study of network public opinion and government regulation based on a game model, discussed the connection between network public opinion and supervision. Yao Chunhua, in a study of network public opinion control technology based on Weibo, analyzed the characteristics of public opinion transmission on Weibo and put forward technical solutions for controlling microblog public opinion [7].
Network public opinion analysis can help us automatically collect the required data, find problems and mine them in depth, and then find the relationships between the different factors of a topic and the full details of an event. This is important for monitoring network public opinion.
2. Design of Internet Public Opinion Analysis System
The basic principle of the network public opinion analysis system is to collect, process and mine the text information of web pages [8]. Data mining is preceded by information preprocessing: web page filtering, text segmentation, word frequency statistics, feature selection, feature extraction, etc. The system ultimately identifies hot issues in recent network information, tracks particular types of news and events according to user requirements, and reports in a timely manner public opinion that is attracting wide attention.
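The preprocessing steps named above can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `preprocess`, the stop-word list and the regular-expression tokenizer are assumptions made for illustration, and a real system for Chinese text would use a proper word segmenter in place of the tokenizer.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is"}

def preprocess(html, top_k=3):
    """Filter a page, segment it, count word frequencies, keep the top-k terms."""
    text = re.sub(r"<[^>]+>", " ", html)            # page filtering: drop markup tags
    tokens = re.findall(r"[a-z]+", text.lower())    # segmentation (stand-in tokenizer)
    counts = Counter(t for t in tokens if t not in STOPWORDS)  # word frequency statistics
    return counts.most_common(top_k)                # feature selection by frequency

page = ("<html><body><h1>Exam schedule</h1>"
        "<p>The exam schedule of the library exam hall</p></body></html>")
features = preprocess(page, top_k=2)
print(features)
```

Selecting features by raw frequency is the simplest possible choice; weighting schemes such as TF-IDF would slot into the same place.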
2.1. The Main Structure
The system consists of a network data fetching part, a fetched-data processing part, a data analysis part and a part that displays the public opinion analysis results. The network data fetching part fetches Internet information of interest, including news, BBS, blogs and microblogs [9]. The fetched-data processing part cleans the captured web pages, discarding useless content and retaining only useful information. The data analysis part processes the cleaned data through steps such as Chinese word segmentation, and then applies methods such as classification and clustering to identify public opinion hot spots. The display part presents the analysis results in different ways. The main structure is shown in Figure 1.
Figure 1. Network public opinion analysis system structure diagram
2.2. The Web Crawler
The first step of public opinion analysis is to collect relevant information, mainly through the search engine system. Its main parts are the document extraction subsystem, the document filtering subsystem, the document processing subsystem, the index retrieval subsystem and the output subsystem, as shown in Figure 2.
Figure 2. Network public opinion analysis system structure diagram
The most important of these is the document extraction subsystem, also called the web crawler, which consists mainly of document adapters and an information spider. Document adapters handle different types of documents; the information spider is responsible for collecting page information. Following the configuration file, the document extraction subsystem first launches spiders that traverse and scan the information nodes distributed across the Internet, and then calls the corresponding document adapter to extract the document information. The document adapters can extract all kinds of page files.
There are many open source web crawlers, such as Wget, HTTrack and Larbin. Their functions are similar and they differ mainly in performance, which depends largely on how the crawler stores pages after crawling. This paper uses the Larbin web crawler, whose structure is shown in Figure 3.
Figure 3. Larbin crawler structure
Larbin contains three parts: the crawler, a web server for checking the state of the crawler, and a socket port for receiving new URLs. The crawler is the core of Larbin; it acquires, analyzes and processes URLs to fetch pages. The web server shows the current download state of the crawler and provides statistics. The socket port receives URLs so that URLs to download can be added dynamically while the crawler is running.
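The crawler core and the dynamic URL input described above can be sketched as a breadth-first crawl over a URL frontier. This is not Larbin's code (Larbin is a separate C++ program); the names `Frontier`, `crawl`, `fetch_links` and `injected` are hypothetical, and a toy in-memory link graph stands in for the network so the sketch runs offline.

```python
from collections import deque

class Frontier:
    """Queue of URLs to fetch that never hands out the same URL twice."""
    def __init__(self, seeds):
        self._queue = deque()
        self._seen = set()
        for url in seeds:
            self.add(url)

    def add(self, url):
        # Used both for links found in pages and for URLs injected at run time.
        if url not in self._seen:
            self._seen.add(url)
            self._queue.append(url)

    def pop(self):
        return self._queue.popleft() if self._queue else None

def crawl(seeds, fetch_links, injected=None, max_pages=100):
    """Breadth-first crawl. `fetch_links(url)` returns the links on a page.
    `injected` maps a step index to URLs added while the crawl is running,
    standing in for URLs that arrive over a socket."""
    frontier = Frontier(seeds)
    injected = injected or {}
    visited = []
    while len(visited) < max_pages:
        for url in injected.get(len(visited), []):
            frontier.add(url)
        url = frontier.pop()
        if url is None:
            break
        visited.append(url)
        for link in fetch_links(url):
            frontier.add(link)
    return visited

# Toy link graph standing in for the web (an assumption for illustration).
WEB = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": [], "x": ["d"]}
order = crawl(["a"], lambda u: WEB.get(u, []), injected={2: ["x"]})
print(order)
```

Page "x" is unreachable from the seed and is visited only because it is injected mid-run, which is the role the socket port plays in the description above.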
2.3. E-R Analysis of Public Opinion System
The public opinion analysis system obtains data through network data capture and protocol restoration as well as the web crawler, and then extracts the data content. After extraction, the content is segmented and classified to obtain hot words, post, reply and view counts, and an analysis of the people taking part in discussions of a sensitive topic. Together with a secondary search tool and a public opinion report output tool, this supports the analysis of campus BBS public opinion. E-R analysis of the whole system yields all the entities in the public opinion system and their attributes. The most important attributes characterizing network public opinion are its propagation heat, the intensity of its content, the orientation of its audience and its growth pattern.
Based on this analysis of the factors of public opinion, an E-R diagram of the system can be drawn, as shown in Figure 4.
3. Improvement of Single-Pass Clustering Algorithm
The public opinion analysis module is the core of the network public opinion analysis system. It mainly includes a topic identification module, a topic tracking module and a topic evaluation module [10]. New topics are identified with an improved Single-Pass clustering algorithm. Its multi-center form can reflect the variation of a public opinion topic, and giving higher weights to double or multiple keywords helps identify the topic accurately. The design idea of the module is as follows: using multiple centers, documents are compared in two ways, by title vector and by body text vector; during comparison, double or multiple keywords are given greater weight; and the topics are organized into a topic clustering tree.
Assignment to a parent-class topic: read the title vector of each document and compare similarities.
Assignment to a subclass topic: read the body text vectors of the documents within a class and compare similarities.
The basic idea is as follows. First, the title feature vector and the body text feature vector of each document are extracted. Clustering by the title vector forms the first level of the hierarchy: a document's title vector is compared with the parent-class topics, and the document is assigned to a parent-class topic if it is similar enough. If the title vector comparison cannot determine the class, the document's body text vector is compared with the parent-class topics to determine its class; this is the second partition of the clustering.
The improved Single-Pass clustering algorithm proceeds as follows:
Start:
(1) Load the initial topic classes;
(2) Read document Di;
(3) Compute the similarity between the title feature vector of Di and the parent-class topics;
(4) Is the similarity greater than threshold DC1?
Yes: go to step (5);
No: compare the body text feature vector of Di with the parent-class topics. Is the similarity greater than threshold DC1?
Yes: go to step (5);
No: mark Di as a new parent-class topic and go to step (6);
(5) Assign Di to the corresponding parent-class topic, then compare its body text feature vector with the subclass topics of that parent. Is the similarity greater than threshold DC2?
Yes: assign Di to the corresponding subclass topic and go to step (6);
No: mark Di as a new subclass topic and go to step (6);
(6) Have all documents been processed?
Yes: update and store the data, and end the algorithm;
No: continue with the next document Di+1 and go to step (2).
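Steps (1) to (6) can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' implementation: vectors are plain term-frequency counts over whitespace tokens, similarity is cosine similarity, "multi-center" is read as comparing a document against every member of a topic and taking the best match, the thresholds `dc1` and `dc2` are arbitrary, and a new parent topic is given a first subclass holding its founding document. Keyword weighting is omitted.

```python
import math
from collections import Counter

def vec(text):
    """Term-frequency vector over whitespace tokens (a simplifying assumption)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best(v, groups, key):
    """Index and score of the group whose closest member (center) matches v best."""
    scores = [max(cosine(v, m[key]) for m in g["docs"]) for g in groups]
    if not scores:
        return None, 0.0
    i = max(range(len(scores)), key=scores.__getitem__)
    return i, scores[i]

def single_pass(documents, dc1=0.5, dc2=0.5):
    parents = []
    for doc_id, (title, body) in enumerate(documents):          # step (2)
        d = {"id": doc_id, "title": vec(title), "body": vec(body)}
        i, s = best(d["title"], parents, "title")               # steps (3)-(4)
        if s <= dc1:                                            # fallback: body vector
            i, s = best(d["body"], parents, "body")
        if s <= dc1:                                            # new parent-class topic
            parents.append({"docs": [d], "subs": [{"docs": [d]}]})
            continue
        parent = parents[i]                                     # step (5)
        parent["docs"].append(d)
        j, s2 = best(d["body"], parent["subs"], "body")
        if s2 > dc2:
            parent["subs"][j]["docs"].append(d)
        else:                                                   # new subclass topic
            parent["subs"].append({"docs": [d]})
    return parents                                              # step (6): all processed

docs = [
    ("campus canteen price rise", "students complain canteen food price rise this term"),
    ("canteen price rise again", "canteen raises food price students complain loudly"),
    ("campus canteen hygiene", "inspection finds hygiene problems in canteen kitchen"),
    ("library opening hours", "library extends opening hours during exam week"),
]
topics = single_pass(docs)
for p in topics:
    print([[d["id"] for d in s["docs"]] for s in p["subs"]])
```

In this toy run the first three documents share a parent topic by title, the third opens its own subclass because its body differs, and the fourth founds a new parent topic. Because each document is read once and never reassigned, the result depends on input order, which is inherent to Single-Pass clustering.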
4. The System Implementation and Experiment
The system was tested in a university network center, using a MySQL database on Red Hat Enterprise Linux 4.0. The test data consists of 5.5 million records of public opinion information; the raw data size is 2 GB.
The experimental steps: records are read from the database and stored in the index, and the index is then optimized. With no cache set, each query is run 20 times in a loop and the average query time is calculated. This process is repeated 10 times and the average time is calculated. Finally, the first 50 records are returned, sorted in descending order of relevance.
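The timing procedure (20 queries per round, 10 rounds, averaged) can be sketched as follows. The function name `average_query_time` and the stand-in query are hypothetical; the paper does not show its measurement code, and a real run would call the index search instead of `fake_query`.

```python
import time

def average_query_time(run_query, loops=20, repeats=10):
    """Mean seconds per query over `repeats` rounds of `loops` queries each."""
    round_means = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(loops):
            run_query()
        round_means.append((time.perf_counter() - start) / loops)
    return sum(round_means) / len(round_means)

calls = []

def fake_query():
    # Stand-in for an index search: return the top 50 of 1000 items, highest first.
    calls.append(1)
    return sorted(range(1000), reverse=True)[:50]

mean_seconds = average_query_time(fake_query)
print(f"{len(calls)} queries, mean {mean_seconds:.6f} s")
```

Averaging round means rather than one long run makes it easy to also inspect the spread between rounds if the measurements look unstable.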
The experimental results: indexing and optimization took 9000 seconds, and the resulting index file is 3.22 GB. Single-word retrieval performance was tested with result sets of 100000, 200000, 300000, 800000, 1 million, 2 million, 4 million and 5 million records. When the result set is smaller than 400000, Lucene.net is fast; retrieval time increases as the result set grows, but only slowly. The results are shown in Table 1:
Table 1. Search results
Word                      Results    Time
Internet public opinion   901241     80.4
Computer network          2593142    73.2
Entertainment             5196317    109.6
Employment rate           897625     99.8
Finally, the system in this paper was compared experimentally with typical general search engines such as Baidu, Sogou and Google, using searches for Internet public opinion information as the test object. The results are as follows:
Table 2. Comparison of this system with independent search engines
                          Our system   Google   Baidu   Sogou
Number of results pages   896          601      641     759
Precision                 0.91         0.78     0.82    0.66
The comparison table above shows that, for randomly chosen public opinion terms, the recall of this system's search results is about a quarter higher than that of general search engines, while its precision is about 21% higher. This reflects the superiority of this system.
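The paper does not state how the "about 21%" precision figure was computed. One reading that reproduces it, offered here only as an assumption, is the relative gain of this system's precision over the mean precision of the three general search engines in Table 2:

```python
# Precision values copied from Table 2.
precision = {"Our system": 0.91, "Google": 0.78, "Baidu": 0.82, "Sogou": 0.66}

others = [v for name, v in precision.items() if name != "Our system"]
baseline = sum(others) / len(others)            # mean precision of the general engines
gain = precision["Our system"] / baseline - 1   # relative improvement over that mean

print(f"baseline = {baseline:.3f}, relative gain = {gain:.1%}")
```

Under this reading the baseline is about 0.753 and the gain about 20.8%, consistent with the stated figure. The same calculation on the result-page counts does not give "about a quarter", so the recall claim presumably rests on data not shown in the table.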
5. Conclusion
Network public opinion is the sum of the attitudes, opinions and emotions, including political beliefs, that people express through the Internet about government and social phenomena and problems. It is characterized by freedom and anonymity, interactivity and timeliness, and rich diversity. University network public opinion shares the features of network public opinion in general but also has its own peculiarities. Quickly identifying and tracking hot topics, emergencies and major issues helps students understand the topics in focus at their school, helps guide campus hot spots, and gives schools an effective channel for quickly grasping and managing public opinion information on the campus network [11]. Combining theory with empirical research, this article proposed an information collection model for network public opinion that combines search engines with web crawlers, and established a network public opinion collection system to meet people's different information collection needs. Starting from the requirements of collecting and analyzing network public opinion information, it studied a network public opinion analysis system that covers the process from crawling pages by URL through reprocessing to analysis of the obtained information, completing the design of the campus network public opinion analysis system.
Acknowledgment
This work is financially supported by the Scientific Research Project of the Shaanxi Provincial Department of Education, No. 15JK2133.
References
[1] Yue Xiang Fen. Cluster Analysis on Internet Public Opinion Literature. Pioneering With Science & Technology Monthly. 2012; 8(6): 96-100.
[2] Sun HaoJun, Shan GuangHui, Gao YuLong. Algorithm for high-dimensional categorical data weighted subspace clustering. Computer Engineering and Applications. 2014; 50(23): 131-135.
[3] Wen Shun, Zhao JieYu, Zhu ShaoJun. Hierarchical Clustering Based on a Bayesian Harmony Measure. Pattern Recognition and Artificial Intelligence (PR&AI). 2013; 26(12): 1161-1168.
[4] LV Gang, Chen Sheng-bing. An Improved Entity Similarity Measurement Method. TELKOMNIKA Telecommunication, Computing, Electronics and Control. 2014; 12(4): 1017-1022.
[5] Chen Liping. Design and Implementation of Data Collection and Extraction System on Public Opinion in Campus BBS. Huazhong University of Science and Technology. Wuhan, Hubei, PR China, 2012; 45(6): 15-19.
[6] Manaev Oleg, Manayeva Natalie, Yuran Dzmitry. The 'spiral of silence' in election campaigns in a post-Communist society. International Journal of Market Research. 2011; 26(53): 319-338.
[7] Yin Yantai. The Research on Development and Guidance of College Students' Network Public Opinion in China. Hunan University. 2013; (4): 11-17.
[8] M Ikonomakis, S Kotsiantis, V Tampakas. Text Classification Using Machine Learning Techniques. WSEAS Transactions on Computers. 2010; 4(8): 966-974.
[9] G Guo hao. Research and Design of Public Opinions Information Search System Base on LUCENE. PLA Information Engineering University. 2012; (12): 21-24.
[10] Wang Qing, Cheng Ying, Chao Naieng. On the Construction of Internet Public Opinion Index System for Its Monitoring and Early Warning. Books intelligence work. 2011; 55(8): 55-58.
[11] S Siti Maimunah, Husni S Sastramihardja. CT-FC: more Comprehensive Traversal Focused Crawler. TELKOMNIKA Telecommunication, Computing, Electronics and Control. 2012; 10(1): 189-191.