International Journal of Electrical and Computer Engineering (IJECE)
Vol. 8, No. 6, December 2018, pp. 5409~5414
ISSN: 2088-8708, DOI: 10.11591/ijece.v8i6.pp5409-5414
Journal homepage: http://iaescore.com/journals/index.php/IJECE

Twitter Sentiment Analysis on 2013 Curriculum Using Ensemble Features and K-Nearest Neighbor

M. Rizzo Irfan, M. Ali Fauzi, Tibyani, Nurul Dyah Mentari
Faculty of Computer Science, Brawijaya University, Indonesia

Article Info
Article history:
Received Jan 19, 2018
Revised May 27, 2018
Accepted Jul 29, 2018

ABSTRACT
The 2013 curriculum is a new curriculum in the Indonesian education system that has been enacted by the government to replace the KTSP curriculum. The implementation of this curriculum in the last few years has sparked various opinions among students, teachers, and the public in general, especially on the social media platform Twitter. In this study, a sentiment analysis on the 2013 curriculum is conducted. An ensemble of several feature sets was used, including textual features, Twitter-specific features, lexicon-based features, Parts of Speech (POS) features, and Bag of Words (BOW) features, for sentiment classification using the K-Nearest Neighbor method. The experiment results showed that the ensemble features give the best sentiment classification performance compared to using individual features only. The best accuracy using ensemble features is 96%, obtained when k=5.

Keywords:
Education
Ensemble Features
K-Nearest Neighbor
Sentiment Analysis
Text Classification
Twitter

Copyright © 2018 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author:
M. Ali Fauzi,
Faculty of Computer Science,
Brawijaya University,
Jl. Veteran, Malang, Indonesia.
Email: moch.ali.fauzi@ub.ac.id

1. INTRODUCTION
According to a survey conducted by IDC (International Data Corporation), a market research agency in the United States, from 2013 to 2020 the amount of digital information is projected to keep growing by a factor of 10, from 4 trillion gigabytes to 44 trillion gigabytes. This is commensurate with the growing number of social media users, who want to be able to exchange information more quickly. However, not all of the information displayed carries a favorable opinion. There are multiple opinions, which can be either positive or negative, on the particular topic being discussed.

One of the most widely circulated topics today is opinion on the 2013 curriculum issued by the Indonesian Ministry of Education and Culture. The 2013 curriculum is a new curriculum that succeeds the old 2006 curriculum (often referred to as KTSP) in the Indonesian education system [1-2]. The application of this new curriculum has drawn a variety of opinions from the public. There are some significant differences between the new curriculum and the old one: students are required to be active, teachers only deliver materials and students must find things out for themselves, some lessons are eliminated, scouting is required, and other changes that have increasingly provoked various opinions about the topic, especially among Twitter users.

Twitter is one of the largest and most dynamic social media platforms based on user-generated content. It is very popular among Indonesian people. On Twitter, users can post a status or message, called a tweet, of no more than 140 characters. It is estimated that about 400 million tweets are posted by 200 million users daily [3].

In this study, a sentiment analysis system is built to identify the positive or negative opinions about the 2013 curriculum that have developed in society through Twitter. An ensemble of several features will be used for classifying the polarity of tweets.
A previous work [4] used several statistical and semantic features, including textual features, Twitter-specific features, lexicon-based features, Parts of Speech (POS) features, and Bag of Words (BOW) features. BOW features alone gave only 73.8% accuracy, while the ensemble of features improved the accuracy to 87.7%. The ensemble also gave better accuracy than other features such as unigram + bigram, label propagation, sentiment topic features, SentiStrength, meta-level features, and Semantria (an online system).

In this study, we explore the use of K-Nearest Neighbor (K-NN) for the classification task. K-NN is an algorithm that classifies an object based on the training data that most closely resemble it [5-6]. In a previous study [7], K-NN yielded the highest accuracy when compared with Naive Bayes and Term Graph. The average accuracy was 98.95% for the K-NN method, 62.66% for Naive Bayes, and 98.72% for Term Graph. Therefore, K-NN is considered suitable for this task.

2. RESEARCH METHOD
This section describes the steps in the sentiment analysis system. The main workflow of the system can be seen in Figure 1. As shown in Figure 1, the first step is taking a tweet entered by the user, after which word standardization is conducted. The purpose of this standardization is to convert non-standard words into standard ones and to correct spelling errors.
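
The standardization step described above can be sketched as a dictionary lookup. This is a minimal illustration only: the `SLANG_TO_STANDARD` entries and the `standardize` function are hypothetical and are not taken from the paper, and spelling correction is not covered.

```python
import re

# Hypothetical slang-to-standard entries for illustration only; the actual
# dictionary of non-standard words used in the study is not reproduced here.
SLANG_TO_STANDARD = {
    "gk": "tidak",
    "bgt": "sangat",
    "yg": "yang",
    "kurtilas": "kurikulum 2013",
}

def standardize(tweet, mapping=None):
    """Lowercase the tweet and replace each known non-standard token."""
    mapping = SLANG_TO_STANDARD if mapping is None else mapping
    tokens = re.findall(r"\S+", tweet.lower())
    return " ".join(mapping.get(tok, tok) for tok in tokens)
```

A call such as `standardize("Kurtilas bagus bgt")` would map the two slang tokens and leave `bagus` unchanged.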

The next step is feature extraction. The features used in this work include textual features, Twitter-specific features, lexicon-based features, Parts of Speech (POS) features, and Bag of Words (BOW) features. The detailed features can be seen in Table 1. For POS features, we utilize the Kateglo API to get the POS tag of each word. We also use data from previous research [8] for the lexicon of positive and negative words, emoticons, and the dictionary of amplifier or intensifier words. We also use the dictionary of non-standard or slang language from [9].
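
As a rough sketch of how lexicon-based counts could be derived from POS-tagged words, the example below assumes the tweet has already been tagged as (word, tag) pairs. The three small word sets, the tag name "adj", and the function name are illustrative assumptions; the real lexicons and the Kateglo tag set are not shown in the paper.

```python
# Hypothetical miniature lexicons; the study uses lexicons from earlier research.
POSITIVE = {"bagus", "baik", "senang"}
NEGATIVE = {"buruk", "sulit", "bingung"}
INTENSIFIERS = {"sangat", "sekali"}

def lexicon_features(tagged):
    """Count lexicon hits in a list of (word, pos_tag) pairs.

    Returns counts that correspond to F23-F26 and F37 in Table 1.
    """
    return {
        "F23": sum(1 for w, _ in tagged if w in POSITIVE),
        "F24": sum(1 for w, _ in tagged if w in NEGATIVE),
        "F25": sum(1 for w, p in tagged if w in POSITIVE and p == "adj"),
        "F26": sum(1 for w, p in tagged if w in NEGATIVE and p == "adj"),
        "F37": sum(1 for w, _ in tagged if w in INTENSIFIERS),
    }
```

The verb and adverb variants (F27 to F30) would follow the same pattern with a different tag. The percentage features are left out because the paper does not state their denominator.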

For the BOW feature extraction specifically, preprocessing generally should be conducted before the extraction begins [10]. This preprocessing step includes tokenization, filtering, and stemming. In the tokenization process, each document is split into smaller units called tokens [11]. In this step, all letters are converted to lowercase, and characters such as punctuation, numbers, and HTML tags are removed [12-13]. In filtering, uninformative words are removed based on the existing stoplist by Tala [14]. The last process in preprocessing is stemming, or restoring every word to its root [15-16]. In this case, we use the Sastrawi Stemmer.
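
The three preprocessing stages can be illustrated with the short sketch below. It is not the authors' implementation: `STOPLIST` is a hypothetical five-word subset standing in for the Tala stoplist, and `naive_stem` is a placeholder suffix stripper standing in for a real Indonesian stemmer such as the Sastrawi Stemmer named above.

```python
import re

STOPLIST = {"yang", "dan", "di", "ini", "itu"}  # hypothetical subset

def tokenize(text):
    """Lowercase, drop HTML tags, and keep alphabetic runs only."""
    text = re.sub(r"<[^>]+>", " ", text.lower())
    return re.findall(r"[a-z]+", text)  # punctuation and numbers are dropped

def naive_stem(token):
    """Placeholder stemmer: strips a few common suffixes (assumption)."""
    for suffix in ("nya", "kan", "lah"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text, stoplist=STOPLIST, stem=naive_stem):
    """Tokenization, then filtering, then stemming."""
    return [stem(tok) for tok in tokenize(text) if tok not in stoplist]
```

Passing the stemmer as a parameter keeps the pipeline unchanged if a dictionary-based stemmer is substituted for the placeholder.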

The last stage is sentiment classification using K-Nearest Neighbor. The output of this stage is the category of the test data, either positive or negative. For the term weighting method, we use TF.IDF since it is a very popular method and generally gives very good performance on classification tasks [17]. The neighbor proximity calculation in this study uses cosine similarity instead of Euclidean distance. Based on previous works [18-19], cosine similarity performs very well on NLP tasks.
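
To make the classification stage concrete, the sketch below combines TF.IDF weighting, cosine similarity, and majority voting among the k most similar training documents. The paper does not give its exact TF.IDF formula, so the common variant tf * log(N/df) is assumed here, and all function names are illustrative.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns (sparse vectors, idf table)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}  # assumed variant: log(N/df)
    return [vectorize(doc, idf) for doc in docs], idf

def vectorize(doc, idf):
    """Weight term counts by idf; terms unseen in training are ignored."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def knn_predict(query, train_vectors, train_labels, k=5):
    """Majority vote among the k training vectors most similar to query."""
    ranked = sorted(zip(train_vectors, train_labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Because cosine similarity ignores vector length, a short tweet and a long tweet with the same term proportions are treated as equally close, which is one reason it is often preferred over Euclidean distance for text.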

Figure 1. Sentiment Analysis System using Ensemble Features

Table 1. The Ensemble Features

Type                     ID       Feature Description
Twitter Specific         F1       Whether the tweet contains a #hashtag or not.
                         F2       Whether the tweet is a retweet or not.
                         F3       Whether the tweet contains a username or not.
                         F4       Whether the tweet contains a URL or not.
Textual Features         F5       TweetLength: Number of words in the tweet.
                         F6       AvgWordLength: Average character length of words.
                         F7       Number of question marks in the tweet.
                         F8       Number of exclamation marks in the tweet.
                         F9       Number of quotes in the tweet.
                         F10      Number of words starting with an uppercase letter in the tweet.
                         F11      Whether the tweet contains a positive emoticon or not.
                         F12      Whether the tweet contains a negative emoticon or not.
Parts of Speech (PoS)    F13      Number of noun PoS in the tweet.
Features                 F14      Number of adjective PoS in the tweet.
                         F15      Number of verb PoS in the tweet.
                         F16      Number of adverb PoS in the tweet.
                         F17      Number of interjection PoS in the tweet.
                         F18      Percentage of noun PoS in the tweet.
                         F19      Percentage of adjective PoS in the tweet.
                         F20      Percentage of verb PoS in the tweet.
                         F21      Percentage of adverb PoS in the tweet.
                         F22      Percentage of interjection PoS in the tweet.
Lexicon Based Features   F23      Number of positive words in the tweet.
                         F24      Number of negative words in the tweet.
                         F25      Number of positive words with adjective PoS.
                         F26      Number of negative words with adjective PoS.
                         F27      Number of positive words with verb PoS.
                         F28      Number of negative words with verb PoS.
                         F29      Number of positive words with adverb PoS.
                         F30      Number of negative words with adverb PoS.
                         F31      Percentage of positive words with adjective PoS.
                         F32      Percentage of negative words with adjective PoS.
                         F33      Percentage of positive words with verb PoS.
                         F34      Percentage of negative words with verb PoS.
                         F35      Percentage of positive words with adverb PoS.
                         F36      Percentage of negative words with adverb PoS.
                         F37      Number of intensifier words in the tweet.
BOW Features             F38      Term 1
                         …        …
                         F38+n    Term n
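
The Twitter-specific and textual rows of Table 1 (F1 to F12) lend themselves to a direct sketch. The detection rules below are assumptions made for illustration: a retweet is taken to start with "RT ", only double quotes are counted for F9, and the emoticon tuples are hypothetical short lists rather than the emoticon lexicon used in the study.

```python
import re

POSITIVE_EMOTICONS = (":)", ":-)", ":D")  # hypothetical short list
NEGATIVE_EMOTICONS = (":(", ":-(")        # hypothetical short list

def twitter_textual_features(tweet):
    """Compute F1-F12 of Table 1 from the raw tweet text."""
    words = tweet.split()
    return {
        "F1": int(bool(re.search(r"#\w+", tweet))),         # hashtag present
        "F2": int(tweet.startswith("RT ")),                 # retweet (assumed rule)
        "F3": int(bool(re.search(r"@\w+", tweet))),         # username present
        "F4": int(bool(re.search(r"https?://\S+", tweet))), # URL present
        "F5": len(words),                                   # tweet length in words
        "F6": sum(len(w) for w in words) / len(words) if words else 0.0,
        "F7": tweet.count("?"),
        "F8": tweet.count("!"),
        "F9": tweet.count('"'),                             # double quotes only
        "F10": sum(1 for w in words if w[0].isupper()),
        "F11": int(any(e in tweet for e in POSITIVE_EMOTICONS)),
        "F12": int(any(e in tweet for e in NEGATIVE_EMOTICONS)),
    }
```

These features are computed on the raw tweet, before lowercasing and punctuation removal, since preprocessing would erase the hashtags, capital letters, and emoticons they rely on.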

3. RESULTS AND ANALYSIS
The dataset used in this study was obtained from Twitter. A total of 200 tweets containing the keyword 'Kurikulum 2013' were collected. Of these 200 tweets, 100 are positive and the other 100 are negative. The category of each tweet was annotated manually by an expert. The dataset was then divided into training data and test data. A total of 150 tweets were used as training data (75 positive and 75 negative) and 50 as test data (25 positive and 25 negative).
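
A class-balanced split like the one described above could be produced as follows. The paper does not say how tweets were assigned to each part, so the random shuffle, the `seed` parameter, and the function name are assumptions.

```python
import random

def stratified_split(samples, labels, test_per_class, seed=0):
    """Hold out test_per_class items of every label; the rest is training data."""
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    train, test = [], []
    for label, items in by_label.items():
        items = items[:]          # copy so the caller's data is not reordered
        rng.shuffle(items)
        test += [(s, label) for s in items[:test_per_class]]
        train += [(s, label) for s in items[test_per_class:]]
    return train, test
```

With 100 tweets per class and `test_per_class=25`, this yields the 150/50 split with 75/75 and 25/25 class counts reported in the text.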

In this study, several experiments are conducted and the results are analyzed. The first experiment determines the effect of the k value of K-NN on the accuracy of the sentiment analysis system. The next experiment explores the use of the BOW features, the ensemble features without BOW (textual features, Twitter-specific features, lexicon-based features, and POS features), and the combination of them all.

3.1. K Value Experiment Result and Analysis
The first experiment analyzes the effect of the k value of K-NN on the accuracy of the sentiment analysis system and determines which k value gives the best accuracy. In this experiment, the features used are the complete ensemble features. The experiment is conducted using several values of k, from 3 to 31. The experiment result is displayed in Figure 2.

Figure 2. K Value Experiment Result (accuracy (%) plotted against the k value, for k = 3 to 31)

The result showed that when the value of k was too small, for example k=3, the classification accuracy could not reach its maximum because some relevant data were not involved in the category voting by K-NN. However, when the value of k was too big, for example more than 13, the accuracy decreased slowly because many irrelevant data were involved in the category voting. The best accuracy, 96%, is obtained when k=5. Therefore, this value of k is used for the next experiment.
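
The k-value experiment amounts to a parameter sweep, sketched below. To stay self-contained the example uses a toy one-dimensional K-NN with absolute distance, which is an assumption for illustration; the study itself uses cosine similarity over the ensemble features. The odd values 3, 5, ..., 31 follow the tick labels of Figure 2.

```python
from collections import Counter

def knn_label(x, train, k):
    """Toy 1-D K-NN: train is a list of (value, label) pairs."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k):
    """Percentage of test items whose predicted label matches the true one."""
    correct = sum(knn_label(x, train, k) == y for x, y in test)
    return 100.0 * correct / len(test)

def sweep_k(train, test, ks=range(3, 32, 2)):
    """Evaluate every k and return the best k together with all scores."""
    scores = {k: accuracy(train, test, k) for k in ks}
    best_k = max(scores, key=scores.get)  # first k reaching the top score
    return best_k, scores
```

With 50 test tweets, an accuracy of 96% would correspond to 48 correct predictions. Note that choosing k on the test set, as sketched here, tends to give an optimistic estimate; a separate validation set would avoid that.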

3.2. Ensemble Features Experiment Result and Analysis
This experiment aims to analyze the use of ensemble features. In this experiment, we compared the use of the BOW features, the ensemble features without BOW (textual features, Twitter-specific features, lexicon-based features, and POS features), and the combination of them all. The experiment result is displayed in Figure 3.
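
Figure 3. Ensemble Features Experiment Result (accuracy (%) for three feature settings: Bag of Words, ensemble without BOW, and complete features ensemble)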

It is clear from Figure 3 that the poorest performance is obtained when only BOW features were used, with an accuracy of 80%. This happened because some short tweets contain only a very few words, which can lead to sparsity and ambiguity. Consequently, most words contained in the test
tweet data never appeared in the training data. This shows that the use of this feature is highly dependent on the word statistics contained in the training data.

The ensemble features without BOW (textual features, Twitter-specific features, lexicon-based features, and POS features) performed slightly better than BOW features alone, with an accuracy of 82%. These features are very dependent on the dictionary or lexicon used. Words in the test tweets that are not well recognized or not contained in the lexicon affect the feature values and thereby the classification result.

The complete combination of all feature sets gives the best accuracy, 96%, an improvement over the previous feature sets. Combining all of the feature sets covers the weaknesses of each feature set and gets the best out of them.
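
One simple way to combine the feature sets is to concatenate the handcrafted block (F1 to F37) with the BOW block (F38 onward) into a single vector, as sketched below. This is an assumed construction, since the paper does not spell out how the blocks are merged; raw term counts stand in for the TF.IDF weights, and both function names are illustrative.

```python
def bow_vector(tokens, vocabulary):
    """One count per vocabulary term, in vocabulary order."""
    return [float(tokens.count(term)) for term in vocabulary]

def ensemble_vector(handcrafted, tokens, vocabulary):
    """Concatenate the handcrafted feature block with the BOW block."""
    return [float(v) for v in handcrafted] + bow_vector(tokens, vocabulary)
```

Features on larger numeric scales can dominate a similarity measure, so some form of per-feature scaling may be worth considering; the paper does not say whether any was applied.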

4. CONCLUSION
In this study, we built a sentiment analysis of the 2013 curriculum using K-NN and ensemble features. Various test scenarios were conducted to determine the effect of the k value and the effect of feature combination on sentiment classification accuracy. The value of k strongly influences the accuracy of the K-NN method; the best value was k=5, with an accuracy of 96%. A k value that is too small prevents the accuracy from reaching its maximum, whereas a k value that is too large causes the accuracy to decrease. Apart from the k value, feature combinations also have a significant influence on the accuracy. Combining BOW features with the other features, including textual features, Twitter-specific features, POS features, and lexicon-based features, improves the accuracy compared to using the feature sets independently. Incorporating these features covers the weaknesses of each feature set and gets the best out of them. The best accuracy, gained by combining all feature sets, reaches 96%.

REFERENCES
[1] Poerwati, Loeloek Endah, and Sofan Amri. "Panduan Memahami Kurikulum 2013." Jakarta: Prestasi Pustaka, 2013.
[2] Ima Nurdiana, Yulia. "Comparative Study of Implementation of 2013 Curriculum in Class X Between State High School 1 Taman Sidoarjo and Madrasah Aliyah Sidoarjo" (in Bahasa), PhD diss., UIN Sunan Ampel, 2015.
[3] Da Silva, Nadia FF, Eduardo R. Hruschka, and Estevam R. Hruschka. "Tweet sentiment analysis with classifier ensembles", Decision Support Systems 66: pp. 170-179; 2014.
[4] Siddiqua, Umme Aymun, Tanveer Ahsan, and Abu Nowshed Chy. "Combining a rule-based classifier with ensemble of feature sets and machine learning techniques for sentiment analysis on microblog." In Computer and Information Technology (ICCIT), 2016 19th International Conference on, IEEE, pp. 304-309; 2016.
[5] Hardiyanto, Erik, and Faisal Rahutomo. "Preliminary Classification Study Indonesian Wikipedia Articles Using the K-Nearest Neighbor Method (in Bahasa)", Sentrino, 2(1): 158-165; 2016.
[6] Suharno, Claudio Fresta, M. Ali Fauzi, and Rizal Setya Perdana. "Indonesian Text Classification On Online Combustion Documents Using K-Nearest Neighbors And Chi-Square Methods." Systemic: Information System and Informatics Journal, Vol 3, no. 1: 25-32; 2017.
[7] Bijalwan, Vishwanath, Vinay Kumar, Pinki Kumari, and Jordan Pascual. "KNN based machine learning approach for text and document mining." International Journal of Database Theory and Application, Vol 7, no. 1: 61-70; 2014.
[8] Wahid, Devid Haryalesmana, and S. N. Azhari. "Summarizing Extractive Sentiments on Twitter Using Hybrid TF-IDF and Cosine Similarity (in Bahasa)", IJCCS (Indonesian Journal of Computing and Cybernetics Systems) 10, no. 2: 207-218.
[9] Antinasari, Prananda, Rizal Setya Perdana, and M. Ali Fauzi. "Sentiment Analysis of Film Opinion on Indonesian Language Twitter Documents Using Naive Bayes with Non-Standard Word Repair (in Bahasa)", Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer, e-ISSN 2548: 964X.
[10] Feldman, Ronen, and James Sanger. "The text mining handbook: advanced approaches in analyzing unstructured data". Cambridge University Press, 2007.
[11] Fauzi, M. Ali, Agus Zainal Arifin, and Sonny Christiano Gosaria. "Indonesian News Classification Using Naïve Bayes and Two-Phase Feature Selection Model", Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), Vol 8, no. 3: 610-615; 2017.
[12] Fauzi, M. Ali, Agus Zainal Arifin, and Anny Yuniarti. "Arabic Book Retrieval using Class and Book Index Based Term Weighting." International Journal of Electrical and Computer Engineering (IJECE), Vol 7, no. 6: 3705-3710; 2017.
[13] Fauzi, M. Ali, Agus Arifin, and Anny Yuniarti. "Term Weighting Based on Book Index and Class for Ranking of Arabic Documents (in Bahasa)", Lontar Komputer: Jurnal Ilmiah Teknologi Informasi 5, no. 2 (2013).
[14] Tala, Fadillah Z. "A study of stemming effects on information retrieval in Bahasa Indonesia." Institute for Logic, Language and Computation, Universiteit van Amsterdam, The Netherlands (2003).
[15] Fauzi, M. Ali, and Anny Yuniarti. "Ensemble Method for Indonesian Twitter Hate Speech Detection." Indonesian Journal of Electrical Engineering and Computer Science 11, no. 1 (2018).
[16] Fauzi, M. Ali, Ro'I. Fahreza Nur Firmansyah, and Tri Afirianto. "Improving Sentiment Analysis of Short Informal Indonesian Product Reviews using Synonym Based Feature Expansion." TELKOMNIKA Telecommunication, Computing, Electronics and Control, Vol 16, no. 3: 1345-1350; 2018.
[17] Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. "Introduction to information retrieval", An Introduction to Information Retrieval, 151: 177; 2008.
[18] Pramukantoro, Eko Sakti, and M. Ali Fauzi. "Comparative analysis of string similarity and corpus-based similarity for automatic essay scoring system on e-learning gamification." In Advanced Computer Science and Information Systems (ICACSIS), 2016 International Conference on, pp. 149-155. IEEE, 2016.
[19] Fauzi, M. Ali, Djoko Cahyo Utomo, Eko Sakti Pramukantoro, and Budi Darma Setiawan. "Automatic Essay Scoring System Using N-Gram and Cosine Similarity for Gamification Based E-Learning." In International Conference on Advances in Image Processing (ICAIP), 2017 International Conference on, pp. 151-155. ACM, 2017.