IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 10, No. 4, December 2021, pp. 990~996
ISSN: 2252-8938, DOI: 10.11591/ijai.v10.i4.pp990-996
Journal homepage: http://ijai.iaescore.com
Assessing naive bayes and support vector machine performance in sentiment classification on a big data platform
Redouane Karsi, Mounia Zaim, Jamila El Alami
LASTIMI Laboratory, Higher School of Technology of Sale, Mohammed V University, Morocco
Article Info
Article history:
Received Feb 8, 2021
Revised May 19, 2021
Accepted Jul 28, 2021
ABSTRACT
Mining user reviews has become a very useful means of supporting decision making in several areas. Traditionally, machine learning algorithms have been widely and effectively used to analyze users' opinions on a limited volume of data. In the case of massive data, powerful hardware resources (CPU, memory, and storage) are essential for carrying out all data processing phases, including collection, pre-processing, and learning, in an acceptable time. Several big data technologies have emerged to process massive data efficiently, such as apache spark, a distributed data processing framework that provides libraries implementing several machine learning algorithms. To evaluate the performance of apache spark's machine learning library (MLlib) on a large volume of data, the classification accuracy and processing time of two machine learning algorithms implemented in spark, naive bayes and support vector machine (SVM), are compared with the performance achieved by the standard implementations of these two algorithms on large datasets of different sizes built from movie reviews. The results of our experiment show that the classifiers running under spark perform better than the traditional ones and reach an F-measure greater than 84%. At the same time, we found that the learning time under the spark framework is relatively low.
Keywords:
Apache spark
Big data
Naive bayes
Sentiment analysis
Support vector machine
This is an open access article under the CC BY-SA license.
Corresponding Author:
Redouane Karsi
Laboratory of System Analysis, Information Processing and Integrated Management
Higher School of Technology of Sale
Mohammed V University
Morocco
Email: rdkarsi@yahoo.fr
1. INTRODUCTION
Following the explosion of subjective textual information on social networks, forums, and blogs in the form of opinions freely written by internet users, sentiment analysis has emerged as a data mining discipline that aims to extract opinions from unstructured textual data. It makes it possible, for example, to steer a company's marketing strategy based on the analysis of consumer feedback on a product [1], [2]. Sentiment analysis problems are tackled with several approaches. The lexical approach [3] uses a dictionary to identify a text's sentiment from the polarity of its constituents, whether they are words or sentences. However, this approach is not always the best solution because a word can have different orientations depending on the domain in which it appears. For instance, “a dangerous player” has a positive polarity in the sports domain, but “dangerous animal” has a negative polarity in the animal domain. Besides the lexical approach, there is an approach based on machine learning methods [4], and research comparing the two has shown that machine learning methods are more accurate than lexicon-based methods [5].
Because experiments have shown that machine learning algorithms outperform lexicon-based algorithms, they are the preferred choice for sentiment classification problems. Research works that address sentiment analysis with machine learning algorithms often use small or medium-sized training data that do not require many hardware resources. Under these conditions, the algorithms reach high accuracy with very low latency. When the training data is large, machine learning algorithms face many challenges: their design must accommodate limited memory resources and ensure an adequate execution time [6]. The concept of big data has emerged to bring together all the technologies for the collection, storage, and processing of massive data that traditional tools cannot process [7], [8].
The purpose of this paper is to evaluate the performance of two machine learning methods embedded in a big data framework named apache spark on a large dataset, by comparing it with the performance of the same methods executed in a traditional, single-machine setting. By testing several hardware configurations of the spark cluster, we found that the classification performance of naive bayes and support vector machine (SVM) on the spark platform is better than that achieved on a single machine, with an F-measure above 84%. We also observe that SVM and naive bayes are scalable machine learning algorithms on the spark platform.
The remainder of the paper is organized as follows. In section 2, related work is presented. In section 3, the adopted methodology is detailed. Experimental results are presented and discussed in section 4. Finally, in section 5, the paper is concluded and future research issues are outlined.
2. RELATED WORK
In this section, we discuss different works on sentiment analysis, big data frameworks, and distributed machine learning methods.
2.1. Sentiment analysis
Sentiment analysis is a set of techniques, including text analytics, computational linguistics, and natural language processing, for classifying texts as positive, negative, or neutral. Pang et al. [9] were among the first to study sentiment analysis; they used a machine learning approach to classify movie reviews. Kim et al. [10] tested feature selection with the support vector machine (SVM) algorithm. The authors concluded that SVM outperforms all other machine learning algorithms for sentiment classification tasks. Jeong et al. [11] used sentiment analysis to identify customer preferences and trends. Wu et al. [12] explored tweets to predict stock market prices. They used both lexicon and machine learning approaches and found that machine learning is better than the lexicon approach. Kumari et al. [13] collected tweets in all languages and then translated them online into English. Afterwards, the tweets were classified as positive or negative using a machine learning algorithm to serve as training data for a naive bayes classifier. This approach provides good classification results.
2.2. Distributed machine learning
Distributed machine learning can deal with the computational complexity of algorithms and with memory restrictions on large datasets [14]. To overcome the inability of algorithms to process a large volume of data, they must run on several machines or processors [15]. Besides the prediction efficiency gained from parallel data processing, distributed machine learning algorithms provide fault tolerance by replicating the data on several machines. Moreover, learning from distributed data using different algorithms produces good precision, especially in large domains [16]. Distributed algorithms can also be integrated with other data processing systems [17]. However, designing and implementing distributed algorithms is a hard task [18]. In addition, distributed algorithms are effective when the nodes dedicated to data processing communicate directly, whereas communication across the network between the nodes entails a longer data processing time [19].
2.3. Machine learning tools
Spark MLlib [20] and Mahout [21] are two open-source tools that include several scalable machine learning algorithm implementations. The implemented algorithms perform classification, regression, clustering, collaborative filtering, and dimensionality reduction tasks. They are independent of the big data engine, so they are portable and can easily be deployed on another big data platform. Mahout supports Hadoop, spark, and H2O. Furthermore, although these algorithms are mainly intended for processing large data in a distributed environment, they are also used to process small data on a single machine. There are also frameworks for large-scale data learning, such as SAMOA, but this project is still in its early stages [22].
3. METHODOLOGY
As shown in Figure 1, we constructed a sentiment classification system from a dataset called Amazon Movie Reviews, which contains over 8 million reviews. To test our system's resilience and its ability to scale up, we worked with five datasets extracted from the Amazon Movie Reviews dataset, whose sizes vary between 10,000 and 200,000 reviews. This large data is processed using the apache spark framework, which incorporates machine learning libraries and relies on a distributed processing system to execute preprocessing and learning tasks. We chose two algorithms for our experiment: naive bayes and support vector machines. This choice is motivated by the fact that they are the best-performing algorithms in terms of precision, as shown in various research works [23].
Figure 1. Distributed machine learning architecture in spark platform
3.1. The dataset
The dataset of our experimental study, the Amazon Movie Reviews dataset [24], is part of the Stanford Network Analysis Project. It is a collection of opinions gathered from Amazon over a period of 10 years, up to October 2012, and contains about 8 million reviews. For our experiments, five disjoint subsets with respective sizes of 10k, 50k, 100k, 150k, and 200k reviews were extracted from this dataset. Each review is composed of eight features. In our study, we extracted only two of them: the review text and the score. The review text is transformed into a vector using the bag-of-words model. The scores are converted to 0's and 1's to assign polarities to reviews by applying the following conversion rule: reviews with scores between 1 and 3 are considered negative and are assigned the value 0, while reviews with scores between 4 and 5 are regarded as positive and are assigned the value 1. In our experiment, we used training data balanced between the positive and negative classes.
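The conversion rule described in this subsection can be sketched in Java as follows. This is a minimal illustration only: the class name `PolarityLabel` and the method `toLabel` are hypothetical and are not taken from the authors' program, and scores are assumed to lie between 1 and 5.

```java
/** Illustrative sketch of the score-to-polarity rule (names are hypothetical). */
final class PolarityLabel {
    private PolarityLabel() {}

    /** Maps a review score in [1, 5] to 0 (negative) or 1 (positive). */
    static int toLabel(double score) {
        if (score < 1.0 || score > 5.0) {
            throw new IllegalArgumentException("score must be between 1 and 5: " + score);
        }
        // Scores 1 to 3 are negative (0); scores 4 and 5 are positive (1).
        // Assumption: star ratings are whole numbers, so nothing falls between 3 and 4;
        // if such a value did occur, this sketch would label it negative.
        return score >= 4.0 ? 1 : 0;
    }
}
```

With this sketch, `PolarityLabel.toLabel(3.0)` gives 0 and `PolarityLabel.toLabel(4.0)` gives 1.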
3.2. Feature selection
The extracted datasets underwent the following preprocessing and feature selection stages:
− Tokenisation: Each review is segmented by splitting the text into words separated by spaces and punctuation.
− Stop words removal: Empty words such as articles, as well as punctuation, are removed.
− Stemming: Each word is converted into its stem.
− Feature selection: The selected features correspond to sequences of a single word, called unigrams. Previous studies have argued that classification is more accurate when using unigrams in the movie domain. The selected features are then weighted according to the TF-IDF scheme using the following formula.
$$\mathrm{TFIDF} = \mathrm{TF} \times \log\left(\frac{N}{df}\right) \tag{1}$$
N is the total number of documents.
df is the number of documents in which the term appears.
TF is the number of times a term appears in a document.
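As a worked illustration of formula (1), the following Java sketch computes TF-IDF weights for tokenized documents. It is not the authors' implementation: the class and method names are hypothetical, the natural logarithm is assumed because the text does not state the base, and the code follows formula (1) literally rather than the smoothed variant, log((N + 1) / (df + 1)), that spark's own IDF transformer applies.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Illustrative TF-IDF sketch for formula (1); class and method names are hypothetical. */
final class TfIdfSketch {
    private TfIdfSketch() {}

    /** Counts, for every term, the number of documents that contain it (df). */
    static Map<String, Integer> documentFrequencies(List<List<String>> corpus) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : corpus) {
            // A set ensures each term is counted once per document.
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    /** Computes TF * log(N / df) for every term of one document (natural log assumed). */
    static Map<String, Double> tfIdf(List<String> doc, Map<String, Integer> df, int n) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : doc) {
            tf.merge(term, 1, Integer::sum);
        }
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int docFreq = df.getOrDefault(e.getKey(), 0);
            if (docFreq == 0) {
                continue; // term never seen in the corpus: no weight is assigned
            }
            weights.put(e.getKey(), e.getValue() * Math.log((double) n / docFreq));
        }
        return weights;
    }
}
```

For a toy corpus of three documents in which "good" occurs in two of them and twice in the third document, the weight of "good" in that document is 2 × log(3/2). A term that appears in every document gets the weight 0, which is the intended effect of the IDF factor.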
3.3. Training the classifier
According to the literature, support vector machines and naive bayes are the classifiers that achieve the best performance in the movie domain. Our experimental study focuses on these two algorithms.
− Support vector machines: Support vector machines (SVM) is a supervised learning algorithm used to perform classification and regression tasks [25]. SVM is a classifier based on a statistical approach to learning and prediction, developed by Vapnik [26]. Its operating principle consists of performing a set of computations to determine a hyperplane that separates the data into two different classes so that the distance between the two classes is maximal. SVM thus performs a binary classification by defining a hyperplane that separates the data of the two classes. This can be achieved by expressing the data in a multidimensional space in which a linear separation of the data becomes possible. What makes SVM a complex algorithm is that it uses a kernel function that relies on the projection of the data into a higher-dimensional space in which the problem becomes linear. The algorithm must go through several iterations to select a single hyperplane among all those that separate the training data according to their class. The particularity of this hyperplane is that it is located at a maximum distance from the closest training instances of each class.
− Naive bayes: Naive bayes [27] is a classifier based on the bayes theorem. In this model, the random variables (the features) are assumed to be statistically independent given a class c. This independence assumption reduces the computation time. To predict the class c_i of a random variable X by applying the bayes theorem, we calculate the conditional probability that X belongs to the class c_i using the following formula:
$$P(C = c_i \mid X) = \frac{P(C = c_i) \times P(X \mid C = c_i)}{P(X)} \tag{2}$$
P(C = c_i | X) is the probability of class c_i conditioned on X.
P(C = c_i) is the probability of class c_i.
P(X | C = c_i) is the probability of X conditioned on c_i.
P(X) is the probability of X.
The random variable X is assigned the class c_i that maximizes the conditional probability P(C = c_i | X).
− Apache spark: The first big data platforms, such as Hadoop, were based on the MapReduce framework and were mainly designed for batch data processing, which requires frequent access to the storage space. In the case of iterative computing, however, the performance of the MapReduce framework decreases considerably. With the widespread use of machine learning algorithms for data analysis, and in order to cope with the intensive computations these algorithms perform, several techniques have been developed, especially for the fast processing of massive data. Among these techniques, apache spark is positioned as an efficient solution that provides a higher-level programming interface for developing distributed applications. In this platform, the data and the intermediate results are loaded and stored in the memory of the cluster machines using a data abstraction called the Resilient Distributed Dataset, which provides parallel data processing.
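Returning to the naive bayes classifier described in this section, the decision rule that follows from formula (2) and the independence assumption can be sketched in Java as below. This is a minimal sketch under stated assumptions, not the MLlib implementation: the class and method names are hypothetical, the per-class word probabilities are assumed to have been estimated beforehand, and the `unseen` parameter is an illustrative fallback probability for words missing from a class table.

```java
import java.util.List;
import java.util.Map;

/** Illustrative naive bayes decision rule; class and method names are hypothetical. */
final class NaiveBayesSketch {
    private NaiveBayesSketch() {}

    /**
     * Returns the index of the class c_i that maximizes
     * log P(C = c_i) + sum over words of log P(word | C = c_i).
     * P(X) is omitted because it is identical for every class and does not change the argmax.
     * Words missing from a likelihood table receive the small probability {@code unseen}.
     */
    static int predict(List<String> words, double[] priors,
                       List<Map<String, Double>> likelihoods, double unseen) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < priors.length; c++) {
            // Logarithms turn the product of probabilities into a sum and avoid underflow.
            double score = Math.log(priors[c]);
            for (String w : words) {
                score += Math.log(likelihoods.get(c).getOrDefault(w, unseen));
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }
}
```

Working in log space is a common design choice for this rule: with hundreds of words per review, multiplying raw probabilities would quickly underflow to zero, whereas summing logarithms keeps the comparison between classes numerically stable.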
4. EXPERIMENTS AND RESULTS
Several experiments were conducted to assess the performance of two classifiers, SVM and naive bayes, in a distributed environment such as spark. Our evaluation considered several angles by observing indicators such as the classification F-measure and the time needed to complete the learning job while varying the dataset size and the number of cluster slave nodes.
4.1. Setup spark cluster
To set up the environment for training the classification algorithms, we implemented a multi-node cluster architecture. Our system is composed of a master node and three slave nodes. Each node of the cluster has a 3.4 GHz processor, 8 GB of memory, and a 500 GB hard disk. The nodes are interconnected through a local network with a speed of 100 Mbps. We opted for this configuration to provide the same conditions as those in which the traditional algorithms were tested on a single machine.
4.2. Training and classification algorithm
After extracting five datasets with respective sizes of 10k, 50k, 100k, 150k, and 200k reviews from the Amazon Movie Reviews dataset, we wrote a java program that uses spark's machine learning library, MLlib. Our program receives a dataset of movie reviews as input. Next, it carries out various preprocessing operations, including tokenization, stop word removal, and feature selection (unigrams weighted with TF-IDF), and then performs learning, classification, and the computation of performance indicators. The sentiment classification algorithm using naive bayes is given below.
Algorithm: Sentiment classification using naive bayes on spark platform
import org.apache.spark.ml.classification.*;
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator;
import org.apache.spark.ml.feature.*;
import org.apache.spark.sql.*;

// Load the training data; load(...) is the listing's own placeholder for reading the reviews.
// Assumption: the dataset exposes a "text" column and a numeric 0/1 "label" column.
Dataset<Row> movieData = load("c:/data/movie.txt");
// Split each review into tokens
Dataset<Row> tokens = new Tokenizer().setInputCol("text").setOutputCol("tokens").transform(movieData);
// Stop words removal
StopWordsRemover remover = new StopWordsRemover()
    .setStopWords(StopWordsRemover.loadDefaultStopWords("english"))
    .setInputCol("tokens").setOutputCol("filtered");
Dataset<Row> filtered = remover.transform(tokens);
filtered.show(false);
// Select unigrams
NGram ngramT = new NGram().setN(1).setInputCol("filtered").setOutputCol("unigrams");
Dataset<Row> unigrams = ngramT.transform(filtered);
// Set TF-IDF as the weighting scheme
Dataset<Row> tf = new HashingTF().setInputCol("unigrams").setOutputCol("rawFeatures").transform(unigrams);
Dataset<Row> features = new IDF().setInputCol("rawFeatures").setOutputCol("features").fit(tf).transform(tf);
// Split the data into training (90%) and test (10%) sets
Dataset<Row>[] splits = features.randomSplit(new double[]{0.9, 0.1});
Dataset<Row> nbTrain = splits[0];
Dataset<Row> nbTest = splits[1];
// Create the naive bayes classifier
NaiveBayes nBayes = new NaiveBayes();
// Train the model
NaiveBayesModel nbModel = nBayes.fit(nbTrain);
// Test the model
Dataset<Row> results = nbModel.transform(nbTest);
results.show();
// Compute the F-measure on the test data (the "f1" metric is the F-measure averaged over
// both classes; the original listing named BinaryClassificationEvaluator, whose metrics
// are areas under the ROC and precision-recall curves rather than an F-measure)
MulticlassClassificationEvaluator evaluator = new MulticlassClassificationEvaluator().setMetricName("f1");
double f1 = evaluator.evaluate(results);
4.3. Results and discussion
The designed program provides several statistics measuring the classification F-measure and the processing time as a function of several parameters, such as the dataset size and the number of slave nodes constituting the spark cluster. To evaluate the performance of our algorithms, we use the classification recall, precision, and F-measure, defined in (3)-(5):
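$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{4}$$
$$F\text{-}\mathrm{measure} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{5}$$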
where
− TP (True Positive): correctly classified as positive.
− FP (False Positive): incorrectly classified as positive.
− TN (True Negative): correctly classified as negative.
− FN (False Negative): incorrectly classified as negative.
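The recall, precision, and F-measure defined in (3)-(5) can be computed from the confusion counts as in the following Java sketch. The class and method names are hypothetical, and returning 0 when a denominator is zero is a convention assumed here for illustration; the paper does not specify how that case is handled.

```java
/** Illustrative computation of the metrics in (3)-(5); names are hypothetical. */
final class ClassificationMetrics {
    private ClassificationMetrics() {}

    /** Recall = TP / (TP + FN); returns 0 when there are no actual positives (assumed convention). */
    static double recall(long tp, long fn) {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    /** Precision = TP / (TP + FP); returns 0 when nothing was predicted positive (assumed convention). */
    static double precision(long tp, long fp) {
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    /** F-measure = 2 * Precision * Recall / (Precision + Recall). */
    static double fMeasure(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }
}
```

For example, with TP = 90, FP = 10, and FN = 30, precision is 0.9, recall is 0.75, and the F-measure is 2 × 0.9 × 0.75 / 1.65, roughly 0.818. The F-measure is the harmonic mean of the two, so it stays closer to the weaker of precision and recall than a plain average would.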
Table 1 shows that the classification F-measure of SVM and naive bayes under the spark framework is greater than 84% and consistently exceeds the baseline results obtained on a single machine, regardless of the dataset size. In contrast, when the classification process is performed in a single-machine configuration, performance is lower at the 10k dataset size (84.51% for SVM and 83.39% for naive bayes), and for larger sizes the system fails to complete the learning task and generates an out-of-memory error. Unlike the results achieved by the traditional machine learning implementations, naive bayes is more accurate than SVM when using spark components. We also observe that the classification F-measure increases with the dataset size and stabilizes for dataset sizes greater than 150k. This is because the model gains enough knowledge from many training examples.
To test the scalability of our methods, we measured the time required by the SVM and naive bayes algorithms, executed on three slave nodes, to complete the preprocessing and learning operations, as illustrated in Figure 2. We deduce that the running time of both algorithms rises proportionally to the dataset size while better classification performance is maintained, confirming that SVM and naive bayes are scalable machine learning algorithms on the spark platform. This is due to spark's ability to reduce latency by caching the dataset in memory for fast processing and by sharing data during iterative computations. Furthermore, we note that the running time decreases considerably when nodes are added to the cluster. Indeed, the master node distributes the data processing among the different slave nodes, as illustrated in Figure 3.
Table 1. Sentiment classification F-measure (%) of SVM and naive bayes under spark platform compared to baseline performance on single machine
                10k     50k     100k    150k    200k
SVM spark       84.79   84.99   85.65   86.88   87.13
SVM Baseline    84.51   -       -       -       -
NB spark        85.58   86.31   86.68   87.51   87.82
NB Baseline     83.39   -       -       -       -
Figure 2. Running time of SVM and naive bayes algorithms when dataset size increases using 3 slave nodes
Figure 3. Running time of SVM and naive bayes algorithms when adding nodes to the spark cluster on 150k dataset size
5. CONCLUSION
Experiments have shown that machine learning algorithms are very effective in dealing with different issues of sentiment analysis. However, they have some weaknesses, among them their inability to scale up when the volume of data increases, as in a big data context. In this paper, we conducted a sentiment analysis study that exploits the machine learning components of spark as a big data framework. In our experimental study, we wrote a program based on apache spark's machine learning library (MLlib) to observe the behavior of two machine learning algorithms, SVM and naive bayes, for sentiment classification using large training datasets whose sizes vary between 10k and 200k reviews. The results of our experiments indicate that the classification performance under spark is better than that of traditional approaches, with an F-measure above 84%. Moreover, in terms of scalability, the running time is proportional to the training dataset size. In addition, adding slave nodes to the cluster significantly reduces latency. In our future work, we will investigate ways to train classifiers from various heterogeneous data sources.
REFERENCES
[1]
E.
Park,
J.
Kang,
D.
Choi,
and
J.
Han,
“
Understanding
customer
s’
hot
el
revisiting
behaviour:
A
sentiment
analy
sis
of
online
feedback
reviews,”
Curr.
Issues
Tour
,
vol.
23,
no.
5,
pp.
605
–
611,
Mar.
2020
,
doi:
10.1080/13683500.2018.1549025.
[2]
E.
Park,
Y.
Jang,
J.
Kim,
N.
J.
Jeong
,
K.
Bae,
and
A.
P.
del
Pobil,
“
Determinants
of
customer
satisfaction
with
airlin
e
services: An a
nalysis
of custom
er feedback big
data,”
J. Retai
l.
Consu
m. Serv
, vol. 51, pp.
186
–
190, Nov. 2019
,
doi:
10.1016/j.jretconser.2019.06.009.
[3]
E.
M.
Alshari,
A.
Azman,
S.
Doraisamy
,
N.
Mustapha,
and
M.
Alke
shr,
“
Effective
method
for
sentiment
lexical
dictionary
enrichme
nt
based
on
word2vec
for
sentiment
analysis,”
i
n
2018
Fourth
International
Conference
on
Evaluation Warning : The document was created with Spire.PDF for Python.
I
S
S
N
:
2252
-
8938
I
nt
J
A
r
ti
f
I
nt
e
ll
,
V
ol
.
10
, N
o.
4
,
D
e
c
e
m
be
r
2021:
990
-
996
996
Informati
on
Retrieval
and
Knowledg
e
Management
(CAMP)
,
Mar.
2018,
pp.
1
–
5
,
doi
: 10.1109/INFRKM
.2018.8464775.
[4]
A.
Rahman
and
M.
S.
Hossen,
“
Sentimen
t
analysis
on
movie
review
da
ta
using
machine
learning
approach,”
in
2019
Internati
onal
Conference
on
Bangla
Speech
and
Language
Proce
ssing
(ICBSLP)
,
Sep.
2019,
pp.
1
–
4
,
doi
:
10.1109/ICBSLP47725.2019.201470.
[5]
O.
Kolchyna,
T.
T.
P.
Sou
za,
P.
Treleaven,
and
T.
Aste,
“
Twitter
s
entiment
analysis
:
Lexicon
method,
machine
learning me
thod and the
ir combinatio
n,”
ArXiv150700955 Cs Stat
, Sep. 2015
.
[6]
J.
López
Belmonte,
A.
Segura
-
Robles,
A.
-
J.
Moreno
-
Gu
errero,
and
M.
E.
Parra
-
Gonzál
ez,
“
Machine
learning
and
big
data
in
the
impac
t
literature.
a
bibliometric
review
with
scientific
m
apping
in
web
of
science,”
Symmetry
,
vol.
12,
no. 4, Art. no. 4, Apr. 2020
,
doi:
10.3390/sym12040495.
[7] I. Lee, “Big data: Dimensions, evolution, impacts, and challenges,” Bus. Horiz., vol. 60, no. 3, pp. 293–303, May 2017, doi: 10.1016/j.bushor.2017.01.004.
[8] A. L’Heureux, K. Grolinger, H. F. Elyamany, and M. A. M. Capretz, “Machine learning with big data: Challenges and approaches,” IEEE Access, vol. 5, pp. 7776–7797, 2017, doi: 10.1109/ACCESS.2017.2696365.
[9] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up? Sentiment classification using machine learning techniques,” in Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002), Jul. 2002, pp. 79–86.
[10] H. Kim, P. Howland, H. Park, and N. Christianini, “Dimension reduction in text classification with support vector machines,” J. Mach. Learn. Res., vol. 6, no. 1, 2005.
[11] B. Jeong, J. Yoon, and J.-M. Lee, “Social media mining for product planning: A product opportunity mining approach based on topic modeling and sentiment analysis,” Int. J. Inf. Manag., vol. 48, pp. 280–290, 2019, doi: 10.1016/j.ijinfomgt.2017.09.009.
[12] D. D. Wu, L. Zheng, and D. L. Olson, “A decision support approach for online stock forum sentiment analysis,” IEEE Trans. Syst. Man Cybern. Syst., vol. 44, no. 8, pp. 1077–1087, 2014, doi: 10.1109/TSMC.2013.2295353.
[13] P. Kumari, S. Singh, D. More, D. Talpade, and M. Pathak, “Sentiment analysis of tweets,” Int. J. Sci. Technol. Eng., vol. 1, no. 10, pp. 130–134, 2015, doi: 10.1007/s00484-018-1574-7.
[14] D. Peteiro-Barral and B. Guijarro-Berdiñas, “A survey of methods for distributed machine learning,” Prog. Artif. Intell., vol. 2, no. 1, pp. 1–11, 2013.
[15] J. Verbraeken, M. Wolting, J. Katzy, J. Kloppenburg, T. Verbelen, and J. S. Rellermeyer, “A survey on distributed machine learning,” ACM Comput. Surv. CSUR, vol. 53, no. 2, pp. 1–33, 2020, doi: 10.1145/3377454.
[16] A. Qiao et al., “Litz: Elastic framework for high-performance distributed machine learning,” in 2018 USENIX Annual Technical Conference (USENIX ATC 18), 2018, pp. 631–644.
[17] M. Yui and I. Kojima, “A database-Hadoop hybrid approach to scalable machine learning,” in 2013 IEEE International Congress on Big Data, Jun. 2013, pp. 1–8, doi: 10.1109/BigData.Congress.2013.10.
[18] Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein, “Distributed GraphLab: A framework for machine learning in the cloud,” arXiv:1204.6078 [cs], Apr. 2012.
[19] K. Luu, C. Zhu, and M. Savvides, “Distributed class dependent feature analysis - A big data approach,” in 2014 IEEE Int. Conf. Big Data (Big Data), 2014, doi: 10.1109/BigData.2014.7004233.
[20] M. Assefi, E. Behravesh, G. Liu, and A. P. Tafti, “Big data machine learning using Apache Spark MLlib,” in 2017 IEEE International Conference on Big Data (Big Data), Dec. 2017, pp. 3492–3498, doi: 10.1109/BigData.2017.8258338.
[21] R. Anil et al., “Apache Mahout: Machine learning on distributed dataflow systems,” J. Mach. Learn. Res., vol. 21, no. 127, pp. 1–6, 2020.
[22] N. Kourtellis, G. De Francisci Morales, and A. Bifet, “Large-scale learning from data streams with Apache SAMOA,” in Learning from Data Streams in Evolving Environments: Methods and Applications, M. Sayed-Mouchaweh, Ed. Cham: Springer International Publishing, 2019, pp. 177–207.
[23] J. Huang, J. Lu, and X. Ling, “Comparing naive Bayes, decision trees, and SVM with AUC and accuracy,” in Third IEEE International Conference on Data Mining, Nov. 2003, pp. 553–556, doi: 10.1109/ICDM.2003.1250975.
[24] “SNAP: Web data: Amazon movie reviews.” [Online]. Available: https://snap.stanford.edu/data/web-Movies.html (accessed Feb. 07, 2021).
[25] R. Karsi, M. Zaim, and J. El Alami, “Impact of corpus domain for sentiment classification: An evaluation study using supervised machine learning techniques,” in Journal of Physics: Conference Series, vol. 870, no. 1, 2017.
[26] V. Vapnik, The Nature of Statistical Learning Theory. Springer Science & Business Media, 2013.
[27] S. B. Kotsiantis, I. Zaharakis, and P. Pintelas, “Supervised machine learning: A review of classification techniques,” Emerg. Artif. Intell. Appl. Comput. Eng., vol. 160, no. 1, pp. 3–24, 2007.