International Journal of Electrical and Computer Engineering (IJECE)
Vol. 6, No. 1, February 2016, pp. 357~366
ISSN: 2088-8708, DOI: 10.11591/ijece.v6i1.8982
Journal homepage: http://iaesjournal.com/online/index.php/IJECE
Sentimental Analysis of Twitter Data Using Classifier Algorithms

Sharvil Shah*, K Kumar**, Ra. K. Saravanaguru**
* Software Developer, Triforce Solutions, Ahmedabad, India
** School of Computing Science and Engineering, VIT University, Vellore, India
Article Info

Article history:
Received Sep 7, 2015
Revised Oct 26, 2015
Accepted Nov 11, 2015

ABSTRACT
Microblogging has become a daily routine for most people in this world. With the help of microblogging, people get opinions about several things going on, not only around the nation but also worldwide. Twitter is one such online social networking website where people can post their views regarding something. It is a huge platform with over 316 million registered users from all over the world. It enables users to send and read short messages of up to 140 characters, for compatibility with SMS messaging. A good sentimental analysis of the data of this huge platform can lead to many new applications, such as movie reviews, product reviews, spam detection, knowing consumer needs, etc. In this paper, we have devised a new algorithm with which the above needs can be achieved. Our algorithm uses three specific techniques for sentimental analysis and can be called a hybrid algorithm: (1) Hash Tag Classification for topic modeling; (2) Naïve Bayes Classifier Algorithm for polarity classification; (3) Emoticon Analysis for neutral polar data. These techniques individually have some limitations for sentimental analysis.
Keyword:
Microblogging
Naïve Bayes
Sentimental Analysis
Twitter
Copyright © 2016 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Sharvil Shah,
7, Heritage Enclave, Thaltej,
Ahmedabad – 380059, India
Email: sharvil.shah1994@gmail.com
1. INTRODUCTION
With the increasing number of users and tweets, it would be best to analyze the Twitter data to get to know about various relevant things going on around us. Monitoring and reviewing the perspective from social media provides great opportunities for the public and private sectors. For example, a company is able to know if the announcement of a product has a negative or positive impact. A political leader can know if he has any chance of winning the upcoming elections. The area of sentimental analysis is appealing to a lot of researchers and scientists due to the challenges it offers and its potential applicability [1].

Sentimental analysis poses several challenges, such as data sparsity, which arises from the slang language used because of the word limit. Also, this platform is an open domain where users can post about anything, which leads us to build a sentiment classifier. To reach maximum efficiency and accuracy, our algorithm should run in real time [3].

In this paper, we not only give a binary classification of positive and negative data but also give a hashtag classification for topic modeling, an emoticon analysis for determining the polarity of the post, and multilingual support by using tools like Google Language Detector and Langid [1]. We also give a graphical representation of the sentimental analysis by making use of Google Chart Tools.

In this paper, we portray an algorithm which tries to detect the current attitude of the user towards a particular topic. For sentimental analysis, our approach mentioned in this paper is divided into the following parts:
Data Retrieval: The first step is retrieval of data from Twitter by using Twitter APIs
Pre-processing: The retrieved data is pre-processed for further action
Hash Tag Classification [12]: The pre-processed data is then classified into topic-wise data by parsing
Polarity Classifier: After hashtag classification, the data is then classified into subjective and objective data with the help of the Naïve Bayes Classifier and Polarity Shifter [4, 5].
Emoticon Analysis: If the algorithm is unable to classify polarity, then the parser looks for emoticons and classifies the data accordingly

Please refer to Figure 1 for further details.

Figure 1. System Flowchart
2. RELATED WORK
Sentimental analysis is a booming topic in the field of research. It has been studied for years on various text corpora such as newspaper articles, movie reviews and product reviews. In the very beginning, research on this topic was done with the help of Maximum Entropy and Support Vector Machines to detect the sentiments. The best reported result, 83%, was claimed for the Maximum Entropy (MaxEnt) classifier. But in this traditional research, the classifiers were not able to assign data to a neutral sentiment. To overcome this limitation, Pak and Paroubek additionally retrieved neutral tweets and then used a 3-class Naïve Bayes Classifier which was able to detect neutral messages along with the polar ones [9].

Researchers from IIT Bombay, India published a paper on TwiSent [23], a sentimental analysis system for Twitter. It collects tweets pertaining to a searched topic and categorizes them into different polarity classes: positive, negative and objective. However, analyzing micro-blog posts has many inherent challenges compared to other text genres [24]. Barbosa and Feng used two classifiers: Subjective versus Objective classes and Positive versus Negative classes. They present separate evaluations of both models but do not explore combining them or comparing them with a 3-way classification scheme [14]. Jiang et al., 2011 present results on building a 3-way classifier for Objective, Positive and Negative tweets. However, they do not explore the cascaded design and do not detect Neutral tweets [16].
Table 1. Related Work by different authors
Author | Advantages | Limitations
Spencer [4] | Naïve Bayes Classification | No hash tag classification
Apoorva [11] | End to end pipeline for classifying tweets, tree kernel model, 100 senti features model, kernel plus senti features, unigram plus senti features | Topic modeling
Alexander [1] | Classifies data with proper accuracy | Multi lingual support
Theresa [12] | Hash Tag classification | Uses iSieve data set, very narrow data
Sunil [25] | Real time analysis using Hadoop | Does not understand sarcasm
Go et al. | Binary classification | Did not improve classification performance
Pak and Paroubek [1] | Entropy, salience and Naïve Bayes classification for classifying microblogs | They remove URLs, usernames, retweets, emoticons and article stopwords (a, an, the) from all tweets and tokenize on whitespace and punctuation
Barbosa and Feng [13] | Two step Classifier (Subjective and Objective) for better classification | Their approach is only possible with limited data
Bermingham and Smeaton [26] | Collect tweets of ten so-called trending topics for each of the five categories "entertainment, products and services, sport, current affairs and companies" (Bermingham and Smeaton, 2010) to build a manually annotated dataset | More accurate with short tweets
Sumit | Graphical Rep of Twitter Data | No emoticon analysis
3. DATA RETRIEVAL
The data from Twitter can be retrieved in many ways, such as the Twitter Search APIs, NodeXL or the Kimonofy Tool, with which we can generate APIs and import all the required data. We need to do this in real time, so our system faces millions of tweets at once. This data is preprocessed and classified according to the polarity. The emoticon dataset can be retrieved from twittersentiment.appspot.com.
In this section, the results of the research are explained and, at the same time, a comprehensive discussion is given. Results can be presented in figures, graphs, tables and other forms that help the reader understand easily [2], [4]. The discussion can be made in several sub-chapters.
3.1. Preprocessing
The preprocessing of tweets is a very important part of this paper. The data retrieved in JSON format is first converted to a normal text message. Preprocessing contains the following:
All caps identification
Lower casing
URL Removal
Emoticon Analysis
Removal of Punctuations and White spaces
Letter Redundancy / Compression of Words
Since a tweet can be in lower case or upper case, for the convenience of the algorithm the text is first converted to lower case [11]. A tweet may contain URLs, so all URLs are either eliminated from the messages with the help of a regular expression or replaced with the generic word URL. The user names mentioned in the retrieved data are eliminated with the help of a regular expression or replaced by any other word which has a neutral polarity. The words having a hashtag remain unchanged so that they can be used for topic modeling [24]. If a word has many redundant letters, like 'happppyyyyyyy', it is converted to 'happyy' by removing as many repetitions as possible while keeping up to two repetitions of each letter. Punctuation and additional white spaces are removed with the help of parsing, keeping only one white space between words.
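The preprocessing steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `preprocess`, the placeholder words `URL` and `AT_USER`, and the choice to keep `#` while dropping all other punctuation are assumptions made for the example. Note that this sketch also strips emoticons, so a full pipeline would need to extract them before the punctuation step.

```python
import re

# Patterns are illustrative assumptions, modelled on the steps in the text.
URL_RE = re.compile(r"(www\.[^\s]+)|(https?://[^\s]+)")
USER_RE = re.compile(r"@[^\s]+")
REPEAT_RE = re.compile(r"([a-z])\1{2,}")   # a letter repeated three or more times
PUNCT_RE = re.compile(r"[^\w\s#]")         # punctuation, but keep '#' for hashtags


def preprocess(tweet):
    """Return a simplified copy of a raw tweet string."""
    text = tweet.lower()                   # lower casing
    text = URL_RE.sub("URL", text)         # replace URLs with a generic word
    text = USER_RE.sub("AT_USER", text)    # replace user names with a neutral word
    text = REPEAT_RE.sub(r"\1\1", text)    # keep at most two repeated letters
    text = PUNCT_RE.sub(" ", text)         # drop remaining punctuation
    return " ".join(text.split())          # collapse additional white space


print(preprocess("happppyyyyyyy   day!!! http://t.co/x"))
```

The letter-compression rule is restricted to letters so that numbers such as 1000 are left intact; the paper does not say how digits are handled, so that restriction is a design choice of this sketch.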
Figure 2. Preprocessing
3.2. Hashtag Classification
Hashtag classification is very important for topic modeling [6, 10]. While posting a message, the user often uses a hashtag, e.g. #IndvsAus. From this we can know that the post is about the India versus Australia match. This helps in classifying the preprocessed data into various topics. We do not change the hashtag words during preprocessing. With the help of data parsing, the algorithm identifies the hashtagged words and uses them to classify the text message into the corresponding group, so that the data does not get mixed up; because of that, accuracy increases. Our algorithm does not remove the less used hashtags; instead it concentrates on the most used hashtags [13]. A hashtag in the message can prove crucial for classifying the data.

One challenge that hashtag classification might pose is that the text after the hash symbol is concatenated, which can cause a problem for topic modeling [8]. To overcome this, we have proposed a small algorithm as follows:
In general, people write hashtags in a concatenated format. There are no white spaces or special characters in between which the parser could use to split the text. For example, during World Cup 2015, tweets related to all Indian matches had a hashtag '#WeWontGiveItBack' or '#wewontgiveitback' or '#WEWONTGIVEITBACK'. The first kind is used more than the second and the third kind. But we cannot rely on people putting a capital letter at the start of every new word. So, we make a list of prepositions, conjunctions and 'wh' question words. Using that list, the parser searches for each listed word (irrespective of the case used); if it finds the word, it puts a white space in front of and behind that particular word. If the word found is in the first position, i.e. immediately after the hash symbol, then the hash symbol is removed and a white space is inserted only behind the word. So, in our example the parser turns the hashtag text into 'We WontGive It Back' and then the tweets are classified accordingly [20].
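The word-list splitting described above can be sketched as follows. This is a hedged illustration rather than the authors' parser: the function name `split_hashtag` and the two-word list used in the example are hypothetical, chosen only so that the output matches the 'We WontGive It Back' example in the text.

```python
import re


def split_hashtag(hashtag, function_words):
    """Insert spaces around listed words found inside a concatenated hashtag."""
    text = hashtag.lstrip("#")             # the hash symbol itself is removed
    if not function_words:
        return text
    # Try longer words first so that, e.g., "into" is preferred over "in".
    alternatives = sorted((re.escape(w) for w in function_words),
                          key=len, reverse=True)
    pattern = re.compile("(" + "|".join(alternatives) + ")", re.IGNORECASE)
    spaced = pattern.sub(r" \1 ", text)    # white space in front and rear
    return " ".join(spaced.split())        # trim edges, collapse double spaces


# Hypothetical word list, just large enough for the paper's example.
print(split_hashtag("#WeWontGiveItBack", ["we", "it"]))
```

Because the search is a plain substring match, a realistic list will over-split some hashtags (for instance, "on" occurs inside "wont"). The text does not say how such cases are resolved, so this sketch leaves them as they fall.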
4. POLARITY CLASSIFIER
The polarity classifier is the heart of this paper. We use the Naïve Bayes Classifier with Unigram and Bigram models for classification of polar data. We have divided the data into Subjective and Objective data. In subjective data, we include data with positive and negative sentiments. In objective data, we include data having neutral sentiments and emoticons [7].
4.1. Naïve Bayes Classifier
This classifier uses a simple approach based on Bayes' Theorem, which describes how the conditional probability of each of a set of possible causes for a given observed outcome can be computed from knowledge of the probability of each cause and the conditional probability of the outcome given each cause. It is a Bag of Words approach for subjective analysis of content [9, 10].
According to Bayes' Theorem, for a document d and a class c:
\[ P(c \mid d) = \frac{P(d \mid c)\, P(c)}{P(d)} \]
The Naïve Bayes Classifier would be:
\[ c^{*} = \arg\max_{c} P(c \mid d) \]
Using the Naïve Bayes Classifier we can determine the accuracy of classification [5, 22]. Generally, for an efficient algorithm the accuracy turns out to be about 80%. As Figure 3 shows, after the topic modeling is done the data is given sentiments and distributed according to polarity: positive, negative and neutral. The crucial disadvantage of the Naïve Bayes Classifier is that it assumes conditional independence among linguistic features.
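To make the role of the independence assumption concrete, here is a minimal bag-of-words Naïve Bayes sketch. Under that assumption the document likelihood factorizes as a product of per-word probabilities, so the classifier only needs word counts per class. The function names, the toy training data and the use of add-one (Laplace) smoothing are assumptions made for this example; the paper does not describe its training set or smoothing.

```python
import math
from collections import Counter, defaultdict


def train(labelled_docs):
    """labelled_docs: iterable of (list_of_tokens, label) pairs."""
    class_counts = Counter()               # number of documents per class
    word_counts = defaultdict(Counter)     # word frequencies per class
    vocab = set()
    for tokens, label in labelled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(tokens, model):
    """Return the class c maximising log P(c) + sum of log P(w | c)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)                   # log prior
        denom = sum(word_counts[label].values()) + len(vocab)   # add-one smoothing
        for tok in tokens:
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data, invented for illustration only.
model = train([
    (["great", "match"], "positive"),
    (["good", "feeling"], "positive"),
    (["bad", "day"], "negative"),
    (["terrible", "match"], "negative"),
])
print(classify(["great", "feeling"], model))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing keeps an unseen word from driving a class probability to zero.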
Figure 3. Polarity Classifier
4.2. Data Classification
In this paper, we use two strategies for classifying the polarity [14]:
4.2.1. Binary
In binary classification, the data is classified into just two types for the convenience of the algorithm: positive and negative. We consider this classification for the case when we only need positive or negative results and no neutral classification [17]. So, a basic binary classification helps in separating the positive and negative sentiments of the data. Once the data with positive and negative polarity has been classified, the remaining data having no polarity (neutral) is sent for emoticon analysis and is classified according to the emoticons used. Thereafter, any data that is still neutral is ignored, since in this case we need only positive and negative results.
4.2.2. Baseline
In baseline classification, we distribute data into positive, negative and neutral. We consider the topics in which neutral sentiment is important [18]. For example, during elections, some people are neutral towards a particular party. We use a rule-based classifier in which the data are given their sentiments according to a polar lexicon list. The neutral results are sent again for emoticon analysis, and if the sentiment is found to be negative or positive then the data is classified accordingly; the rest of the data is termed neutral data. From the baseline Naïve Bayes Classifier we can achieve an accuracy of about 80% [19].
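A rule-based baseline of this kind can be illustrated with a tiny lexicon lookup. The word sets below are hypothetical stand-ins for the polar lexicon list (the paper does not publish its list), and deciding by the sign of a simple word count is an assumption of this sketch.

```python
# Hypothetical polar lexicon, for illustration only.
POSITIVE = {"good", "great", "amazing", "win"}
NEGATIVE = {"bad", "terrible", "sad", "lose"}


def baseline_polarity(tokens):
    """Three-way label from counts of positive and negative lexicon words."""
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"      # would be passed on to emoticon analysis


print(baseline_polarity("i gave an english test today".split()))
```

Tweets that come back as neutral are the ones the text forwards to emoticon analysis.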
4.3. Polarity Shifter
If a noun, verb or adjective has a positive polarity and the word before it is a negation like 'not', then a word-level classifier will misread the sentence and the accuracy might decrease. To overcome this, we have proposed an algorithm which searches for negation words. When the parser finds a negation word, it looks at the three words beyond the negation. If the three-word window contains a noun, verb or adjective which has positive polarity, then the polarity of that data is reversed. Using this polarity shifter, we can achieve results with higher accuracy.

For example, let us take the text 'The movie was not good'. The text has a positive sentiment word, i.e. 'good', so there is a possibility that the machine classifies this data as positive sentiment. But with the polarity shifter that would not happen, because we also look at the negation. The polarity is reversed and the data is classified as negative sentiment [22].
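The three-word window can be sketched as below. The negation list and lexicon are hypothetical, the part-of-speech check (noun, verb or adjective) is omitted, and flipping negative words as well as positive ones (so that 'not bad' becomes positive) is an assumption that goes beyond what the text states.

```python
# Hypothetical word lists, for illustration only.
NEGATIONS = {"not", "no", "never"}
POSITIVE = {"good", "great", "amazing"}
NEGATIVE = {"bad", "terrible", "sad"}


def polarity_with_shifter(tokens, window=3):
    """Lexicon polarity in which words shortly after a negation are reversed."""
    flipped = set()
    for i, tok in enumerate(tokens):
        if tok in NEGATIONS:
            # Mark the positions of the next `window` words after the negation.
            flipped.update(range(i + 1, min(i + 1 + window, len(tokens))))
    score = 0
    for i, tok in enumerate(tokens):
        value = (tok in POSITIVE) - (tok in NEGATIVE)
        score += -value if i in flipped else value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity_with_shifter("the movie was not good".split()))
```

Here 'good' sits one position after 'not', inside the window, so its contribution is reversed and the sentence comes out negative, matching the worked example in the text.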
4.4. Emoticon Analysis
After the polarity classification phase, the neutral data having emoticons is analyzed. Sentences containing positive or negative emoticons are classified into positive or negative sentiments respectively [15, 21]. For example, let us look at a tweet posted by an athlete: 'I just finished a 2.66 mi run with a pace of 11'14"/mi with Nike+ GPS :D :D. #nikeplus #makeitcount'. The text of this tweet does not carry any positive or negative sentiment, hence it is classified as a neutral tweet. But the tweet has an emoticon, ':D', which shows positive sentiment about nikeplus [21]. Therefore, this text is classified as positive sentiment text. The algorithm of emoticon analysis works as follows:

The data from the neutral section is analyzed and an emoticon is searched for in the sentence [18]. The two characters after the ':' symbol are important in this case. If a white space is present after the ':' symbol then it is ignored, but if a character is present after the symbol then the emoticon is classified accordingly.
Table 2. Emoticon List
Emoticon  Sentiment
:)        Positive
:(        Negative
:D        Positive
:|        Negative
:’(       Negative
;)        Positive
:/        Negative
:O        Negative
Table 2 shows a list of example emoticons with their respective sentiments. In this way, the emoticons are classified according to their sentiments.
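The emoticon lookup can be sketched with the entries of Table 2 held in a dictionary. Splitting on white space, stripping trailing sentence punctuation and taking a majority vote when several emoticons appear are assumptions of this example; the text only states that emoticons are classified according to the table. The function must see the raw tweet, since lower casing would turn ':D' into ':d'.

```python
# Entries follow Table 2; a straight apostrophe is used for the crying face.
EMOTICON_SENTIMENT = {
    ":)": "positive", ":(": "negative", ":D": "positive", ":|": "negative",
    ":'(": "negative", ";)": "positive", ":/": "negative", ":O": "negative",
}


def emoticon_polarity(tweet):
    """Label a (textually neutral) tweet from the emoticons it contains."""
    votes = 0
    for token in tweet.split():
        sentiment = EMOTICON_SENTIMENT.get(token.rstrip(".,!?"))
        if sentiment == "positive":
            votes += 1
        elif sentiment == "negative":
            votes -= 1
    if votes > 0:
        return "positive"
    if votes < 0:
        return "negative"
    return "neutral"      # no emoticon, or positive and negative ones cancel


tweet = ("I just finished a 2.66 mi run with a pace of 11'14\"/mi "
         "with Nike+ GPS :D :D. #nikeplus #makeitcount")
print(emoticon_polarity(tweet))
```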
5. ALGORITHM AND CASES
From the above modules, our overall algorithm works as follows:
The data is extracted from Twitter using Twitter APIs or Hadoop.
The data imported is stored in JSON or any other relevant format.
This data is then sent for pre-processing, where the data is simplified. Here, the following processes take place:
1. Caps Identification
2. Lower Casing
3. URL Removal
4. Removing Username from post
5. Removal of Punctuations and white spaces
6. Taking care of letter redundancy
After the data is pre-processed we obtain refined data. Now, the algorithm does topic modeling of the given data, and it is classified according to the various topics using the hashtag algorithm as stated above.
Later on, the data is classified according to the sentiments using the Naïve Bayes Classification Algorithm. We use the basic binary and baseline patterns to classify the data. To improve the accuracy we also follow a polarity shifter algorithm which can identify the negation used in the text and then act accordingly.
The data which is classified into neutral sentiment is then sent for emoticon analysis. Here, the neutral data is classified into subjective sentiments, and the remaining data, which does not have any emoticons, stays in the neutral section.
5.1. Pseudo Code
Algorithm: Sentimental Analysis of Twitter Data
Input: Set of all the data retrieved, D
Output: Polarised data, P

Initialize data retrieved set D
Initialize selected token set S

// Converting to lowercase
foreach t ∈ D do
    i ← t.tweet;
    if S(i) = NULL then
        S(i) = t;
    else
        S(i) = lowercase(i);

// Remove URL
foreach t ∈ D do
    i ← t.tweet;
    if S(i) = NO URL then
        S(i) = t;
    else
        S(i) = t.sub('((www\.[^\s]+)|(https?://[^\s]+))', 'URL', tweet);

// Removing username
foreach t ∈ D do
    i ← t.tweet;
    S(i) = t.sub('@[^\s]+', 'AT_USER', tweet);

// Remove additional white spaces
foreach t ∈ D do
    i ← t.tweet;
    S(i) = t.sub('[\s]+', ' ', tweet);

// Topic modeling
S = t.sub('#word according to the list', '')
Load the topic-wise separated tweets into different data stores
foreach t ∈ D do
    i ← t.tweet;
    S(i) = t.store();

// Polarity classifier
if (tweet containing positive word) then
    t.positivesentiment();
elseif (tweet containing negative word) then
    t.negativesentiment();
elseif (tweet containing negation) then
    if (next 3 words are polar noun, verb or adj) then
        t.reversepolarity();
elseif (emoticon = TRUE) then
    if (emoticon = positive) then
        t.positivesentiment();
    elseif (emoticon = negative) then
        t.negativesentiment();
else
    t.neutralsentiment();
Figure 4. Overall Flow of Algorithm
5.2. Cases
Table 3 explains different cases for our algorithm.
Table 3. Cases
Case No | Case | Description | Example | Result
1 | Positive Sentiment Data | Tweets with positive sentiment are tested with our algorithm | @XYZ: The movie was aamaaaazzzingg !! :D | Success
2 | Negative Sentiment Data | Tweets with negative sentiment are tested with our algorithm | @ABC: It is such a bad day | Success
3 | Neutral Sentiment Data | Tweets with neutral sentiment are tested with our algorithm | @USR: I gave an English Test today | Success
4 | Neutral Sentiment Data with Emoticon | A tweet which has neutral sentiment but contains an emoticon is tested with our algorithm | @USR: I ran 2.5 kms today ! :D | Success
5 | Positive Sentiment with Negation | A tweet which has positive words preceded by a negation | @USR: Today is not good | Success
6 | Topic modeling using hashtag | A tweet is classified according to topics by using hashtag | @USR: Hoping for narendramodi to win elections #NamoForIndia | Success
6. RESULTS AND DISCUSSION
Compared to other works done up till now, our final algorithm has all the accuracy-maintaining features: Hash Tag Classification for Topic Modeling, Polarity Shifter, Emoticon Analysis and Graph generation for getting accurate results. Because of these features, our algorithm is better than other works done in this field. An average accuracy of 81% is among the highest reported in research of this field. The polarity shifter and topic modeling are two crucial steps in our algorithm which lead it to a higher accuracy.
Let us take seven tweets regarding a specific match, India vs Australia. In this paper, we will show the working of one tweet in the algorithm and then the graph generation.

Let us take an example tweet: “@abc: Having a great feeling while watching the match #IndvsAus”
Step-1
The above tweet first goes for preprocessing. The whole tweet is converted to lower case. Hence the tweet looks like: “@abc: having a great feeling while watching the match #indvsaus”
Step-2
The algorithm searches for URLs, but since there is no URL in this tweet it moves ahead to the next step, in which it removes the user name. After removing the user name and converting it to a generic name, our tweet looks like: “at_user having a great feeling while watching the match #indvsaus”
Step-3
In this step, additional white spaces are removed from the tweet and topic modeling takes place. After applying the hashtag classification algorithm, the data is stored under the India vs Australia match topic and our tweet looks like: “at_user having a great feeling while watching the match”
Step-4
The polarity classification takes place using the Naïve Bayes classification algorithm and the graph is then generated. This example tweet is classified into positive sentiment. Please refer to Figure 5 for the graphical view.
Figure 5. Graphical View
The result shown in Figure 5 is not 100% accurate. On a manual check it was found that one of the tweets was showing the wrong polarity, so the accuracy comes to around 85% for the seven tweets. Similarly, we tested this algorithm on various numbers of tweets, and an average accuracy of 81% was found using our algorithm.
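The figure for the seven-tweet sample can be checked with one line of arithmetic, assuming (as the manual check states) that exactly one of the seven tweets was misclassified:

```latex
\[
\text{accuracy}
  = \frac{\text{correctly classified tweets}}{\text{total tweets}}
  = \frac{7 - 1}{7}
  = \frac{6}{7}
  \approx 0.857
\]
```

That is about 85.7%, consistent with the "around 85%" quoted for this sample.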
7. CONCLUSION
The algorithm used by us is quite accurate for classification of data according to the sentiments, as discussed earlier, and gives us a high average accuracy, as discussed in Section 6 (Results and Discussion). The speciality of this algorithm is that it uses all three techniques, and this mixture results in a good outcome, as seen in this paper. All the steps used in the algorithm are well built and were tested several times, and this custom algorithm designed by us is efficient and better than other works in this field. There are no unnecessary steps in this algorithm which would lead to a time-consuming process.

As we can see in Table 1, all the works have some limitation or other. Our work overcomes these limitations. Spencer’s limitation of hashtag classification, Apoorva’s limitation of topic modeling, Theresa’s limitation of narrow data, Go et al.’s limitation of classification performance and Bermingham’s limitation of accuracy with short tweets are all covered by our algorithm.

In our future work, we would like to implement an algorithm which can detect sarcasm in a better way and can give accurate results. Pattern extraction can be considered for getting recurring information.
REFERENCES
[1] Alexander Pak, Patrick Paroubek. Twitter as a Corpus for Sentiment Analysis and Opinion Mining.
[2] Ethem Alpaydin. 2004. Introduction to Machine Learning (Adaptive Computation and Machine Learning). The MIT Press.
[3] Kun-Lin Liu, Wu-Jun Li, Minyi Guo. Emoticon Smoothed Language Models for Twitter Sentiment Analysis.
[4] James Spencer and Gulden Uchyigit. Sentimentor: Sentiment Analysis of Twitter Data.
[5] Pablo Gamallo and Marcos Garcia. Citius: A Naive-Bayes Strategy for Sentiment Analysis on English Tweets.
[6] Antonio Fernandez Anta, Philippe Morere, and Agustín Santos. 2013. Sentiment Analysis and Topic Detection of Spanish Tweets: A Comparative Study of NLP Techniques. Procesamiento del Lenguaje Natural.
[7] Alec Go, Richa Bhayani, and Lei Huang. 2009. Twitter sentiment classification using distant supervision.
[8] Pak, A., Paroubek, P. Twitter as a corpus for sentiment analysis and opinion mining. In: Calzolari, N. (Conference Chair), Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Rosner, M., Tapias, D. (eds.) Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10).
[9] David Ahn and Balder ten Cate. 2005. Simple language models and spam filtering with Naive Bayes.
[10] S. Baccianella, A. Esuli, and F. Sebastiani. SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining.
[11] Apoorv Agarwal and Jasneet Singh Sabharwal. End-to-End Sentiment Analysis of Twitter Data.
[12] Agarwal, A., Xie, B., Vovsha, I., Rambow, O., and Passonneau, R. 2011. Sentiment analysis of twitter data.
[13] Barbosa, L. and Feng, J. 2010. Robust sentiment detection on twitter from biased and noisy data. Proceedings of the 23rd International Conference on Computational Linguistics.
[14] Go, A., Bhayani, R., and Huang, L. 2009. Twitter sentiment classification using distant supervision. Technical report, Stanford.
[15] Jiang, L., Yu, M., Zhou, M., Liu, X., and Zhao, T. 2011. Target-dependent twitter sentiment.
[16] Kim, S. M. and Hovy, E. 2004. Determining the sentiment of opinions.
[17] Pang, B. and Lee, L. 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts.
[18] Turney, P. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews.
[19] Yu, H. and Hatzivassiloglou, V. 2003. Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. Conference on Empirical Methods in Natural Language Processing.
[20] David Haussler. 1999. Convolution kernels on discrete structures. Technical report, University of California at Santa Cruz.
[21] CM Whissel. 1989. The dictionary of Affect in Language. Emotion: theory research and experience, Acad press London.
[22] T. Wilson, J. Wiebe, and P. Hoffman. 2005. Recognizing contextual polarity in phrase level sentiment analysis.
[23] Subhabrata Mukherjee, Akshat Malu, A.R. Balamurali, Pushpak Bhattacharyya. TwiSent: A Multistage System for Analyzing Sentiment in Twitter.
[24] Apoorv Agarwal, Boyi Xie, Ilia Vovsha, Owen Rambow, Rebecca Passonneau. Sentiment Analysis of Twitter Data. Department of Computer Science, Columbia University, New York, NY 10027 USA.
[25] Sunil B Mane, Yashwant Sawant, Saif Kazi, Vaibhav Shinde. Real Time Sentiment Analysis of Twitter Data Using Hadoop. IJCSIT, ISSN: 09745-9646.
[26] Adam Bermingham and Alan Smeaton. Classifying Sentiment in Microblogs: Is Brevity an Advantage? Dublin City University.