TELKOMNIKA, Vol.14, No.4, December 2016, pp. 1510~1520
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v14i4.3654

Received March 30, 2016; Revised October 24, 2016; Accepted November 10, 2016

Research on Identification Method of Anonymous Fake Reviews in E-commerce

Lizhen Liu*¹, Xinlei Zhao², Hanshi Wang³, Wei Song⁴, Chao Du⁵
Information and Engineering College, Capital Normal University, Beijing 100048, P. R. China
*Corresponding author, e-mail: liz_liu@126.com¹, xinlei8912@sina.com², necrostone@sina.com³, wsong@cnu.edu⁴, cnuie_dc@126.com⁵

Abstract
In this paper, a new method is proposed for identifying anonymous fake reviews generated by click farmers in E-commerce, and it improves the identification rate. Anonymous fake reviews differ from genuine reviews. They can be distinguished based on the credibility of users, the average daily number of evaluations, the content similarity, and the degree of word overlapping. The proposed method takes five such features into account to estimate the fake-review content of a product by constructing a multivariate linear regression model. Experiments show that this preliminary work performs well in identifying fake reviews on Chinese E-commerce websites. The extracted features are also useful for identifying fake reviews when the reviewer's identity is not accessible.

Keywords: volume of fake reviews, feature extraction, multi-linear regression, click farming

Copyright © 2016 Universitas Ahmad Dahlan. All rights reserved.

1. Introduction
E-commerce and opinion-sharing websites have gained popularity following the emergence of Web 2.0 [1]. Online commenting and reviewing are widely adopted on many E-commerce platforms such as Amazon, Taobao, and Tmall, which allow users to exchange their purchasing or consuming experiences. Positive reviews can increase a shop's reputation and attract more customers, while negative ones may cause potential sales loss [2]. Consumers use these reviews not only to receive word-of-mouth (WOM) information on products, such as quality, suitability and utility, but also to contribute their own reviews to advise other consumers [3]. As online reviews have become a critical factor in the E-commerce market, hundreds of click farming groups have unfortunately sprung up in recent years, delivering bundles of fake positive reviews to merchants who want a quick and dirty way to boost their popularity. Consumers who rely on fake reviews posted by click farmers when making purchase decisions may be disappointed, because the product will not meet their expectations [4]. Moreover, many E-commerce platforms protect the privacy of users by providing anonymous evaluation services, which potentially facilitates the generation of fake reviews. Data security is now the biggest issue for consumers in E-commerce [5], so the analysis of anonymous fake reviews has drawn the attention of many researchers.
The reason for investigating fake reviews in E-commerce is to identify them and help consumers obtain genuine information about the product or service they want to consume. Identifying anonymous fake reviews in E-commerce is therefore necessary work. However, very few researchers have explored anonymous fake review identification so far. Y. Fu and B. H. Dong [6] provide a model to extract fake reviews from the Taobao and Tmall websites, but their method relies on some internal business data and thus may not be applicable to other E-commerce platforms. This research tries to set up a model that identifies anonymous fake reviews relying purely on public reviews. Instead of using a special customer database, our model achieves the goal by extracting review features and training a model on public reviews.
To address the problems mentioned above, the paper first proposes five assumptions for feature extraction based on an analysis of evaluation processing. At the same time, a concept named VoFR is defined to compute the volume of fake reviews. Then a supervised model for finding click farming products and detecting fake reviews is developed with VoFR. Finally, the experiment shows that the supervised VoFR model achieves an accuracy of 92.5%, a precision of 94.7%, and a recall of 90%. In addition, the unique contribution of the paper is that it fills the gap in the identification of anonymous fake reviews.
The rest of the paper is organized as follows. Related work is reviewed in Section 2. Based on an analysis of evaluation processing, we propose five assumptions in Section 3, followed by the extraction of the feature functions and the supervised VoFR model in Section 4. Section 5 presents our detailed experimental results. We summarize this work in Section 6.

2. Related Work
Generally, research on analyzing fake reviews falls roughly into two categories: identifying the text of fake reviews and identifying the click farmers.
For fake text detection, Jindal, et al., [7-9] first proposed the concept of fake reviews in 2007. They collected reviews from Amazon, manually labelled the fake ones, and employed logistic regression to identify fake reviews. Lai, et al., [10] proposed a recognition method named the unigram model, in which the grammar and format of the text are used as features for classification. Ott, et al., [11] formulated the problem of identifying fake reviews as a binary classification problem. Besides that, fake reviews are identified solely through their texts in [12-14]. Even though huge progress has been made in text-based fake review detection in recent years, these methods require a high-level understanding of the words in the reviews, which is hard to achieve since fake reviews are intended to mislead consumers.
For detecting the click farmers, researchers assume that click farmers generate more fake reviews than genuine reviews. References [4, 15] track the behavior of consumers to detect the click farmers. Liu, et al., [16] exploited an 'undesired rule' to identify fake reviewers. However, the aforementioned algorithms could only identify a specific category of click farmers. Mukherjee, et al., [17, 18] considered the generation of fake reviews as a group behavior and proposed a triple-cross model that identifies click farmers by integrating group features and individual features. References [19, 20] considered the relations between reviewers, reviews and shops to identify the click farmers.
In this paper, neither the user ID nor the historical activity of consumers is available under the anonymous condition, so we propose a new way to solve the problem. Fake reviews are identified through feature extraction and model training. The details of the method are as follows.

3. Click Farming Evaluation Processing
To better understand fake reviews and the features of click farmers, we investigated the pipeline for generating fake reviews on Taobao and Tmall, the two largest E-commerce platforms in China. A detailed model of click farming is studied comprehensively. As shown in Figure 1, the survey found that fake reviews are generated in exactly the same way as genuine reviews. As the review is made anonymously, there is no way to access the real identity of the consumers, which makes our investigation difficult. Therefore, we propose five assumptions to describe anonymous reviews, concerning the users' credibility, the average daily number of evaluations, the similarity between reviews and the product description, the overlap between reviews, and the ratio between sales volume and shop running time. Features are extracted based on these assumptions and then used to train a model to identify fake reviews.

Figure 1. Click farming process

3.1. Users Credibility
Once consumers finish shopping and evaluating, merchants on Taobao and Tmall give them a positive or negative rating. A 'positive' rating adds one point and a 'negative' rating deducts one point, so different consumers end up with different user credibility. The more consumers use their ID for shopping, the more credibility they have. User credibility reflects both the number of goods the consumer has purchased and the credit the consumer has. The credibility cannot be hidden even if the consumer chooses anonymity. According to our investigation, a click farmer's pay depends partially on credibility: higher credibility corresponds to higher pay. However, we found that E-commerce platforms always restrict the maximum number of reviews that one account can post in a certain period. Click farmers therefore usually hold multiple user IDs whose identity has not been verified by the E-commerce platform. These auxiliary accounts always have fewer purchase records and lower user credibility, which in the end lowers the average credibility of a product whose reviews are generated by click farmers. Also, since hiring a click farmer with higher credibility costs more than hiring one with lower credibility, merchants always choose to hire more click farmers to generate more reviews instead of hiring someone with more experience. Hence, we assume that a negative correlation exists between user credibility and the number of fake reviews a product receives:
Assumption I: The average user credibility of a product with more fake reviews is lower than the average user credibility of a product with more genuine reviews.

3.2. Average Daily Number of Evaluations
The evaluation time can be obtained from the website. Almost all merchants who choose click farming do so to improve sales and ratings. High sales imply a better reputation and better products, and shops with higher reputation and sales attract more consumers to buy, so their daily number of reviews increases. Merchants who resort to click farming share a common characteristic: their average daily number of evaluations is lower.
Assumption II: The average daily number of evaluations of click farming (scalping) products is lower than that of normal products.

3.3. Similarity of Reviews and Products Description
Since fake reviews are not based on any customer experience, their contents are always monotonous and tiresome. The only source from which click farmers can obtain information about the product is its online description. Because fake reviews are usually generated and organized from the description, they show a high similarity to the description of the product.
Assumption III: The similarity between fake reviews and the description of the product is higher than the similarity between true reviews and the description of the product.

3.4. Overlap between Reviews
According to expectancy theory [21], the motivation force experienced by an individual to select one behavior from a larger set is some function of the perceived likelihood that that behavior will result in the attainment of various outcomes, weighted by the desirability of these outcomes to the person. Since publishing a carefully written review, which generally pays between 0.3 and 1.3 dollars, is not rewarding, click farmers, who lack real customer experience, always choose to edit and reorganize previous reviews. Therefore, fake reviews always plagiarize each other. Reference [22] describes COPS, a method of detecting text plagiarism that detects document overlap by relying on string matching and sentences. Following this method, the overlap between reviews is calculated.
Assumption IV: The overlap between fake reviews is higher than the overlap between genuine reviews.

3.5. Ratio between Sales Volume and Shop Running Time
Based on statistics, we found that there is a relatively constant ratio between the selling volume and the time since the online shop was established. A click farming shop usually has higher selling volume but a short running time.
Assumption V: The click farming product has a lower ratio between selling volume and selling age than a normal product.

4. Supervised VoFR Model
We propose a new method to identify click farming products by computing the volume of fake reviews (VoFR). The VoFR is computed from the features proposed in Section 3: a multi-linear regression model is adopted, and the result of the multi-linear regression is the VoFR model. Here, the VoFR of a click farming product is 1 and the VoFR of a normal product is 0. The model is defined as follows:

$$VoFR = \beta_0 + \beta_1 \varphi_c(p_i) + \beta_2 \varphi_t(p_i) + \beta_3 \varphi_s(p_i) + \beta_4 \varphi_o(p_i) + \beta_5 \varphi_r(p_i) \tag{1}$$

$\varphi_c(p_i)$, $\varphi_t(p_i)$, $\varphi_s(p_i)$, $\varphi_o(p_i)$ and $\varphi_r(p_i)$ respectively represent the feature functions of users credibility, the average daily number of evaluations, the similarity of reviews and product description, the overlap between reviews, and the ratio between sales volume and registration time. $\beta_0 \sim \beta_5$ are the weights, which are learned from the training dataset.

4.1. Credibility Feature Function
Although the user ID is anonymous and no historical information about any click farmer can be tracked, the user's credibility can still be obtained. The weight is assigned based on Table 1. The subscript of the weight refers to the review of the product.

Table 1. The weight $\omega_{ij}$
Users credibility | $\omega_{ij}$
0 gold crown | 1
1 red heart | 0.9
2 red heart | 0.8
3 red heart | 0.7
4 red heart | 0.6
5 red heart | 0.5
1 diamond | 0.4
2 diamond | 0.3
3 diamond | 0.2
4 diamond | 0.1
5 diamond and above | 0

In Taobao, 0 Golden Crown represents the user with the lowest credibility. The user with the greatest credibility is assigned the lowest weight, which ensures that the credibility of a product falls into the interval [0, 1]. The average credibility is calculated with the following formulas:

$$\varphi'_c(p_i) = C_{ij} = \frac{\sum_j (\omega_{ij} \cdot c_{ij})}{|R_i|} \tag{2}$$

$$\varphi_c(p_i) = \frac{\varphi'_c(p_i) - \min(\varphi'_c(p_i))}{\max(\varphi'_c(p_i)) - \min(\varphi'_c(p_i))} \tag{3}$$

$C_{ij}$ represents the average degree of credibility of product $p_i$'s reviews $r_j$, $|R_i|$ represents the number of all reviews of product $p_i$, and $\sum_j (\omega_{ij} \cdot c_{ij})$ represents the sum of the numbers of product $p_i$'s reviews $r_j$ multiplied by the corresponding weights $\omega_{ij}$. The average credibility is computed over all reviews of one product, both fake and genuine. The final value is normalized through max-min normalization.
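
The computation in Eq. (2) and Eq. (3) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that $c_{ij}$ is the number of reviews of a product posted at each credibility level, the level labels are simply the row names of Table 1, and the two example products are invented for demonstration.

```python
from typing import Dict, List

# Weights copied from Table 1. The string keys are illustrative labels only.
CREDIBILITY_WEIGHTS: Dict[str, float] = {
    "0 gold crown": 1.0,
    "1 red heart": 0.9, "2 red heart": 0.8, "3 red heart": 0.7,
    "4 red heart": 0.6, "5 red heart": 0.5,
    "1 diamond": 0.4, "2 diamond": 0.3, "3 diamond": 0.2,
    "4 diamond": 0.1, "5 diamond and above": 0.0,
}


def raw_credibility(level_counts: Dict[str, int]) -> float:
    """Eq. (2): weighted review counts per level divided by the review total."""
    total = sum(level_counts.values())
    if total == 0:
        raise ValueError("product has no reviews")
    weighted = sum(CREDIBILITY_WEIGHTS[level] * n
                   for level, n in level_counts.items())
    return weighted / total


def min_max(values: List[float]) -> List[float]:
    """Eq. (3): max-min normalization across all products."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all products identical: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical products: review counts per credibility level.
products = [
    {"1 red heart": 8, "2 red heart": 2},        # mostly low-credibility accounts
    {"3 diamond": 5, "5 diamond and above": 5},  # mostly high-credibility accounts
]
raw = [raw_credibility(p) for p in products]
phi_c = min_max(raw)
print(raw, phi_c)
```

Because low-credibility accounts carry the largest weights, a product reviewed mainly by such accounts ends up with a value of $\varphi_c$ close to 1.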

4.2. Time Feature Function
The time feature function describes the average number of evaluations during one day and is represented by $\varphi_t(p_i)$:

$$\varphi'_t(p_i) = \frac{|R_i|}{T} \tag{4}$$

$$\varphi_t(p_i) = 1 - \frac{\varphi'_t(p_i) - \min(\varphi'_t(p_i))}{\max(\varphi'_t(p_i)) - \min(\varphi'_t(p_i))} \tag{5}$$

Here, $R_i$ represents all the reviews of $p_i$, and $T$ represents the total number of days of collection needed to obtain $R_i$.
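
Eq. (4) and Eq. (5) can be sketched as below. This is an illustration under assumptions: the review counts and the 60-day collection window are made-up numbers, and the function names are hypothetical. The inverted normalization means that products with few daily reviews receive values close to 1, in line with Assumption II.

```python
from typing import List


def raw_time_feature(num_reviews: int, days: int) -> float:
    """Eq. (4): average number of reviews per day over the collection window."""
    if days <= 0:
        raise ValueError("collection window must be at least one day")
    return num_reviews / days


def inverted_min_max(values: List[float]) -> List[float]:
    """Eq. (5): one minus the max-min normalized value."""
    lo, hi = min(values), max(values)
    if hi == lo:  # degenerate case: every product has the same raw value
        return [0.0 for _ in values]
    return [1.0 - (v - lo) / (hi - lo) for v in values]


# Hypothetical products observed over the same 60-day window.
raw_t = [raw_time_feature(30, 60), raw_time_feature(600, 60), raw_time_feature(315, 60)]
phi_t = inverted_min_max(raw_t)
print(raw_t, phi_t)
```

The same inverted normalization appears in the ratio feature function of Section 4.5, so `inverted_min_max` could be reused there with sales volume divided by shop age as the raw value.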

4.3. Similarity Feature Function
Since click farmers do not have customer experience, their reviews are always based on the description of the product, so the similarity between the reviews and the product description can be used to characterize the genuineness of a review.

$$sim(r_{ij}, d_i) = \frac{r_{ij}^{T} d_i}{\|r_{ij}\| \, \|d_i\|} \tag{6}$$

$$\varphi'_s(p_i) = Mean_j(sim(r_{ij}, d_i)) \tag{7}$$

$$\varphi_s(p_i) = \frac{\varphi'_s(p_i) - \min(\varphi'_s(p_i))}{\max(\varphi'_s(p_i)) - \min(\varphi'_s(p_i))} \tag{8}$$

$r_{ij}$ represents the vector of review $r_j$ for product $p_i$, $d_i$ represents the vector of the description of product $p_i$, $sim(r_{ij}, d_i)$ is the cosine similarity between a review of product $p_i$ and its description, and $\varphi_s(p_i)$ is the normalized similarity score. A reviews dataset and a description dataset are built to calculate the aforementioned similarity; the process is shown in Figure 2.

Figure 2. Calculating the similarity

The similarity between each review and the product description is calculated to measure the objectivity of the review. The higher the similarity between a review and the product description, the fewer emotional words it uses and the more objective the review is. Based on Assumption III, the more objective the review, the higher the possibility that it is a fake review.
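
A minimal sketch of Eq. (6) and Eq. (7) follows. The paper does not state how review and description vectors are built, so this example assumes simple bag-of-words counts over whitespace-separated tokens; Chinese text would first need a word segmenter, which is outside this sketch. The sample description and reviews are invented.

```python
import math
from collections import Counter
from typing import List


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Eq. (6) on bag-of-words count vectors (assumed vectorization)."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def raw_similarity_feature(reviews: List[str], description: str) -> float:
    """Eq. (7): mean similarity between each review and the description."""
    if not reviews:
        raise ValueError("product has no reviews")
    return sum(cosine_similarity(r, description) for r in reviews) / len(reviews)


# Hypothetical product description and two reviews.
description = "soft cotton shirt with short sleeves"
sample_reviews = ["soft cotton shirt", "arrived late but fits well"]
score = raw_similarity_feature(sample_reviews, description)
print(round(score, 4))
```

The first review copies words from the description and scores high, while the second shares no words with it and scores zero. The raw scores of all products would then be passed through the max-min normalization of Eq. (8).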

4.4. Overlap Feature Function
A word set is built for each review of product $p_i$, excluding any duplicated words. The average overlap value between two word sets is computed as follows:

$$R_i = \{ r_{ij} \}_{j=1}^{N(p_i)} \tag{9}$$

$$\varphi'_o(p_i) = Mean\left(\frac{|R_i \cap R_j|}{\max(|R_i|, |R_j|)}\right) \tag{10}$$

$$\varphi_o(p_i) = \frac{\varphi'_o(p_i) - \min(\varphi'_o(p_i))}{\max(\varphi'_o(p_i)) - \min(\varphi'_o(p_i))} \tag{11}$$

$R_i$ represents the word collection that contains all the word sets of $p_i$, $\varphi'_o(p_i)$ represents the average overlap value between any pair of word sets, and $\varphi_o(p_i)$ represents the normalized value of the overlap score.
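
The pairwise overlap of Eq. (10) can be sketched as follows. The example reads the formula as an average over every pair of review word sets of one product, which is how the surrounding text describes it; whitespace tokenization and the sample reviews are assumptions made for illustration.

```python
from itertools import combinations
from typing import List, Set


def word_set(review: str) -> Set[str]:
    """Distinct words of one review (duplicates are dropped by the set)."""
    return set(review.lower().split())


def pair_overlap(a: Set[str], b: Set[str]) -> float:
    """Shared words divided by the size of the larger word set."""
    if not a or not b:
        return 0.0
    return len(a & b) / max(len(a), len(b))


def raw_overlap_feature(reviews: List[str]) -> float:
    """Eq. (10): mean overlap over all pairs of review word sets."""
    sets = [word_set(r) for r in reviews]
    pairs = list(combinations(sets, 2))
    if not pairs:  # fewer than two reviews: no pair to compare
        return 0.0
    return sum(pair_overlap(a, b) for a, b in pairs) / len(pairs)


# Hypothetical reviews of one product: the first two are near copies.
overlap_reviews = [
    "good quality fast delivery",
    "good quality fast shipping",
    "too small for me",
]
overlap_score = raw_overlap_feature(overlap_reviews)
print(overlap_score)
```

Comparing every pair costs quadratic time in the number of reviews, so for products with tens of thousands of reviews a sampled subset of pairs may be a practical substitute.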

4.5. Ratio Feature Function
Analysis of the selling volume shows that the selling volume of click farming shops is lower than that of normal shops, while their registration times differ only slightly. Hence we add this feature to our model.

$$\varphi'_r(p_i) = \frac{G_i}{L_i} \tag{12}$$

$$\varphi_r(p_i) = 1 - \frac{\varphi'_r(p_i) - \min(\varphi'_r(p_i))}{\max(\varphi'_r(p_i)) - \min(\varphi'_r(p_i))} \tag{13}$$

$G_i$ represents the total selling volume of product $p_i$ and $L_i$ represents the age of the shop of $p_i$. We employ the same normalization method to normalize the data.

5. Experiment Analysis
The overview of the proposed method is shown in Figure 3.

Figure 3. Experimental flow chart

80 pairs of products are randomly selected as the training dataset and 20 pairs as the testing dataset. A multi-linear regression is learned on the training dataset and tested on the testing dataset.
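
To make the fitting step concrete, the sketch below fits a multi-linear regression by ordinary least squares and reports $R^2$. It is a generic illustration, not the authors' code: it solves the normal equations in plain Python, uses only two features instead of five to stay short, and the feature values and labels are synthetic placeholders rather than data from the paper.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(features: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Weights [b0, b1, ...] from the normal equations (X^T X) b = X^T y."""
    rows = [[1.0, *f] for f in features]  # leading 1.0 gives the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights: Sequence[float], feature: Sequence[float]) -> float:
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature))


def r_squared(weights, features, targets) -> float:
    """Coefficient of determination of the fitted model."""
    mean_y = sum(targets) / len(targets)
    ss_res = sum((y - predict(weights, f)) ** 2 for f, y in zip(features, targets))
    ss_tot = sum((y - mean_y) ** 2 for y in targets)
    return 1.0 - ss_res / ss_tot


# Synthetic placeholder data: label 1 = click farming product, 0 = normal product.
train_x = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
train_y = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
weights = fit_ols(train_x, train_y)
print(weights, r_squared(weights, train_x, train_y))
```

In practice a numerical library would normally replace the hand-written solver; the plain-Python version is used here only to keep the example dependency-free.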

5.1. Training Dataset Collection
To investigate the features of fake reviews and understand the pipeline of click farming, 116363 reviews from the Taobao and Tmall platforms are collected. The genuine reviews are selected and downloaded from shops where the authors had purchased something. These shops offer a truly good consumer experience and have existed for a long period of time. To collect the fake reviews, the author pretended to be a click farmer. We found that fake reviews are also widespread on the Tmall platform. After data preprocessing, features are extracted on the basis of the assumptions mentioned in Section 3.

5.1.1. Fake Reviews Dataset
Through investigation, we found that click farming products usually come from newly started online shops whose reviews are few and whose selling volumes are low. The quality of the products they sell is not necessarily worse than that of their competitors. The fake reviews are mainly generated to improve their selling volume, revenue, and rankings. Hence, the following rules are employed in preprocessing.
Rule 1: exclude any reviews that contain images or additional comments. It is unusual to insert images or additional comments in fake reviews.
Rule 2: exclude any reviews that a human cannot identify. A label cannot be assigned if a human cannot distinguish the review.
Rule 3: exclude any reviews that contain advertisements. In practice, these reviews can be excluded through simple heuristic rules.
Rule 4: exclude any reviews that contain real shopping experience.
Rule 5: exclude any negative reviews.
Based on these five rules, the reviews of a neck and shoulder massage product have been preprocessed, as shown in Table 2.

Table 2. Examples of reviews excluded by each preprocessing rule
Reviews text | Rules
Product s is also product, gave a product choice elders birthday gift! Like a product mother, product customer service attitude, logistics fast! Excessive additional period of time to continue. http://img.alicdn.com/bao/uploaded/i1/197630138383715927/TB2IuNRhXXXXXaQXpXXXXXXXXXX_!!0-rate.jpg http://img.alicdn.com/bao/uploaded/i2/197630138383737204/TB2OGdYhXXXXXX4XpXXXXXXXXXX_!!0-rate.jpg | Rule 1
Product very product, praise. | Rule 2
Professional tattoo, please add QQ ********* | Rule 3
Product packaging, housing also has a corresponding manufacturers. But the product gap is large, does not meet the proper quality brand. The use of fever phenomenon, the general attitude of customer service Mike. | Rule 4
It's completely fake, I'm not black you, what is to spread the products, on the value of a 20 yuan a massage head is crooked, the intensity is very small, yet I bought 40 yuan a product, who do not believe, who regret, asked people to click farming all the praise! | Rule 5

After preprocessing, the fake reviews dataset has a total of 80 products, 51 merchants, and 10708 reviews, covering eight categories such as clothing, footwear, electronic appliances, and furniture.

5.1.2. Genuine Reviews Dataset
Like the fake reviews, the genuine reviews also come from Ali's Taobao and Tmall websites. For the same product and price, the higher the ranking, the more likely the product is to be selected, so click farmers are prevalent in newly established shops or for new products. Merchants whose sales rankings are already high have no need for click farming, because their shops have accumulated a great deal of popularity through earlier sales. We therefore choose merchants with higher sales and use their recent reviews to build the genuine reviews dataset. Products with high sales generally have tens of thousands of reviews, so we take the reviews from the most recent 2 months in time order, preprocess them, and manually remove advertising reviews to obtain the genuine reviews dataset used for VoFR model training. The genuine reviews dataset has a total of 80 products, 50 merchants, and 105655 reviews, covering eight categories.

5.2. Supervised VoFR Model Analysis
The calculated feature functions show that fake reviews and genuine reviews differ, expressed as $\varphi_{c.t.s.o.r}(p_{shill}) \neq \varphi_{c.t.s.o.r}(p_{truth})$, in Figures 4-8. We choose 40 pairs of the data as a sample.

Figure 4. Credit's comparison
Figure 5. Time's comparison
Figure 6. Similarity's comparison
Figure 7. Overlap's comparison
Figure 8. Ratio's comparison
Figure 9. Change of $R^2$

For the feature function results of the 80 pairs of products in the training dataset, the VoFR model is fitted in a gradually accumulating manner: the model is fitted with gradually accumulated datasets of 10, 20 and 40 pairs, and the changes in $R^2$ are shown in Figure 9. According to the change of $R^2$, when 60 pairs are chosen, $R^2$ shows a downward trend; when 80 pairs are chosen, $R^2$ improves. We speculate that the degree of model fit gradually increases as the dataset grows. The final choice is 80 pairs as the training set, giving the model parameters shown in Table 3. The final VoFR model is:

$$VoFR = -0.937 + 1.41\varphi_c(p_i) + 0.297\varphi_t(p_i) + 0.157\varphi_s(p_i) + 0.92\varphi_o(p_i) + 0.236\varphi_r(p_i)$$

Table 3. Model parameters $\beta_0 \sim \beta_5$
Parameter | $\beta_0$ | $\beta_1$ | $\beta_2$ | $\beta_3$ | $\beta_4$ | $\beta_5$
Value | -0.937 | 1.410 | 0.297 | 0.157 | 0.920 | 0.236

The test dataset is composed of 20 click farming products (32252, 13 shops) and 20 normal products (2453, 13 shops). The critical value is 0.5. If the VoFR output is higher than 0.5, the product is identified as a click farming product; otherwise it is a normal product. The classification results are shown in Figure 10.

Figure 10. VoFR results of the test dataset
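
The decision rule can be illustrated with the coefficients reported in Table 3 and the critical value of 0.5. This is a sketch only: the dictionary keys and the two example feature vectors are hypothetical, and the feature values are assumed to be already normalized to [0, 1] by the feature functions of Section 4.

```python
from typing import Dict

# Coefficients from Table 3; the key names are illustrative labels.
INTERCEPT = -0.937
COEFFICIENTS: Dict[str, float] = {
    "credibility": 1.410,
    "time": 0.297,
    "similarity": 0.157,
    "overlap": 0.920,
    "ratio": 0.236,
}
CRITICAL_VALUE = 0.5


def vofr(features: Dict[str, float]) -> float:
    """Linear VoFR score for one product from its five normalized features."""
    return INTERCEPT + sum(COEFFICIENTS[name] * features[name] for name in COEFFICIENTS)


def is_click_farming(features: Dict[str, float]) -> bool:
    """A product is flagged when its VoFR score exceeds the critical value."""
    return vofr(features) > CRITICAL_VALUE


# Hypothetical normalized feature values for two products.
suspicious = {name: 0.9 for name in COEFFICIENTS}
ordinary = {name: 0.2 for name in COEFFICIENTS}
print(vofr(suspicious), is_click_farming(suspicious))
print(vofr(ordinary), is_click_farming(ordinary))
```

Because the regression is linear and unbounded, scores can fall below 0 or above 1; only their position relative to the critical value matters for classification.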

5.3. Evaluation Indicators
The evaluation indicators are accuracy, precision, and recall, defined as follows:
|TP| is the number of click farming products correctly predicted as click farming.
|FP| is the number of normal products wrongly predicted as click farming.
|FN| is the number of click farming products wrongly predicted as normal.
|TN| is the number of normal products correctly predicted as normal.
The formulas are as follows:

$$Accuracy = \frac{|TP| + |TN|}{|TP| + |FP| + |TN| + |FN|} \tag{14}$$

$$Precision = \frac{|TP|}{|TP| + |FP|} \tag{15}$$

$$Recall = \frac{|TP|}{|TP| + |FN|} \tag{16}$$

Accuracy represents the proportion of click farming products and normal products that are correctly classified. Precision represents the rate at which products predicted as click farming really are click farming products; it reflects the accuracy of the classification results of the VoFR model. Recall is the probability that a click farming product is correctly classified, that is, the proportion of all click farming products that are detected.
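
As a quick check of Eq. (14) to Eq. (16), the sketch below recomputes the three indicators from the confusion counts that Table 5 of this paper reports for the VoFR model (|TP| = 18, |TN| = 19, |FP| = 1, |FN| = 2). The function names are illustrative.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Eq. (14): share of all products that are classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp: int, fp: int) -> float:
    """Eq. (15): share of predicted click farming products that are correct."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Eq. (16): share of actual click farming products that are detected."""
    return tp / (tp + fn)


# Confusion counts reported for VoFR in Table 5.
tp, tn, fp, fn = 18, 19, 1, 2
print(f"accuracy={accuracy(tp, tn, fp, fn):.3f}")
print(f"precision={precision(tp, fp):.3f}")
print(f"recall={recall(tp, fn):.3f}")
```

The results, 0.925, 0.947 and 0.900, agree with the 92.5%, 94.7% and 90% quoted for the VoFR model.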
At present, almost all research on fake reviews relies on Amazon. We compared the differences between the review data on Amazon, Taobao and Tmall, as shown in Table 4. The differences in the data make the study difficult.

Table 4. Amazon, Taobao and Tmall data comparison
E-commerce platform | Rating | Reviewers ID | Reviews usefulness | Time | Users credibility | Users other purchase | Users ranking
Amazon | √ | √ | √ | √ | √ | √ | √
Taobao | × | × | × | √ | √ | × | ×
Tmall | × | × | × | √ | √ | × | ×

Table 4 shows that the purchase history of anonymous users cannot be tracked and that the reviews cannot be evaluated by rating, so we propose the VoFR model to classify click farming products and fake reviews. In Reference [6], the dataset also comes from Taobao and Tmall. The click farmers' user IDs are obtained, and purchasing information is tracked for analysis to obtain 14 features. The SVM algorithm and the KNN algorithm are each applied to identify click farmers on Taobao. Its evaluation indicators and ours are shown in Table 5.

Table 5. Evaluation Indicators Comparison
Method | Accuracy | Precision | Recall | |TP| | |TN| | |FP| | |FN|
SVM | 93% | 88% | 100% | 30 | 26 | 4 | 0
KNN | 78% | 81% | 73% | 22 | 25 | 5 | 8
VoFR | 92.5% | 94.7% | 90% | 18 | 19 | 1 | 2

First, Reference [6] analyzes click farmers' user IDs and purchasing information, which are difficult to obtain. In the era of data security, such an approach could also infringe on user privacy and is unsafe. Our dataset comes from public information on Taobao and Tmall. Second, since the data in Reference [6] is too specific, its method transfers poorly to other platforms. Third, compared with the 14 features of Reference [6], our research uses 5 features, yet the evaluation results do not differ much, and the precision is better than that of SVM and KNN. This suggests that some features in Reference [6] are redundant. Fourth, Reference [6] studies consumers, while this paper studies products, so the method can be applied to any category, for which more comprehensive data is available.

6. Conclusions and Future Work
Click farming in E-commerce is a challenging issue that cannot be ignored, since it misleads consumers' purchase decisions. This paper compares the click farming pattern with the normal shopping pattern, and then proposes constructing the VoFR model by applying feature functions and the multi-linear regression method. By calculating the volume of fake reviews to identify click farming products, the method gives users a basis for real consumer decisions. The experimental datasets are obtained from the reviews of China's largest E-commerce platforms (Taobao and Tmall), and the anonymous user reviews are manually tagged to build standard datasets and ensure the accuracy of the data. Experiments show that the five feature functions, used as input data for the VoFR model, form an effective identification method.
Future work will further study the VoFR model, extracting additional feature functions and rating the importance of features to further improve the accuracy of the VoFR model. The method will also be extended to other fake reviews to identify missing information, and to expand the applicability of