International Journal of Electrical and Computer Engineering (IJECE)
Vol. 5, No. 6, December 2015, pp. 1553~1563
ISSN: 2088-8708
Journal homepage: http://iaesjournal.com/online/index.php/IJECE

Recommender Systems in Light of Big Data

Khadija A. Almohsen, Huda Al-Jobori
Department of Information Technology, Ahlia University, Bahrain

Article Info
Article history:
Received Feb 13, 2015
Revised Jul 6, 2015
Accepted Jul 28, 2015

ABSTRACT
The growth in the usage of the web, especially e-commerce websites, has led to the development of recommender systems (RSs), which aim to personalize web content for each user and reduce the cognitive load of information on the user. However, as the world enters the Big Data era and lives through the contemporary data explosion, the main goal of an RS becomes providing millions of high-quality recommendations in a few seconds for an increasing number of users and items. One of the successful techniques of RSs is collaborative filtering (CF), which makes recommendations for users based on what other like-minded users have preferred. Despite its success, CF faces challenges posed by Big Data, such as scalability, sparsity and cold start. As a consequence, new CF approaches that overcome these problems have been studied, such as singular value decomposition (SVD). This paper surveys the literature of RSs and reviews their current state together with the main concerns surrounding them due to Big Data. Furthermore, it thoroughly investigates SVD, one of the promising approaches expected to perform well in tackling Big Data challenges, and provides an implementation of it using some of the successful Big Data tools (i.e. Apache Hadoop and Spark). This implementation is intended to validate the applicability of existing contributions to the field of SVD-based RSs, as well as the effectiveness of Hadoop and Spark in developing large-scale systems. The implementation has been evaluated empirically by measuring mean absolute error, which gave results comparable with experiments previously conducted by other researchers on a relatively smaller data set in a non-distributed environment. This demonstrates the scalability of SVD-based RSs and their applicability to Big Data.
Keywords:
Apache Hadoop
Apache Spark
Big data
Collaborative filtering
Recommender system
Singular value decomposition
Copyright © 2015 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Khadija Atiya Almohsen,
Department of Information Technology,
Ahlia University,
Exhibitions Avenue, Manama, Bahrain
Email: kalmohsen@ahlia.edu.bh
1. INTRODUCTION
Advances in technology, its widespread usage and the connectivity of everything to the Internet have made the world experience an unusual rate of generating and storing data, resulting in what is being called the Big Data phenomenon. As a consequence, data is becoming unbelievably large in scale, scope, distribution and heterogeneity. To put it differently, Big Data is characterized by 6Vs: Volume, Variety, Velocity, Veracity, Variability and Value [1]-[3].
As a consequence of the emerging flood of data, normal tasks and activities become challenges. For instance, browsing the web and searching for interesting information or products is a routine and common task. However, the massive amount of data on the web is expanding the noise there, making it harder and more time consuming to choose the interesting pieces of information from all this noise [4]-[5].
Likewise, the currently available systems, technologies and tools show their limitations in processing and managing this massive amount of data. This has led to the invention of new technologies, such as MapReduce from Google, Hadoop from Yahoo! and Spark from the University of California, Berkeley [6]. With this in mind, existing systems have been adapted to meet Big Data by using the newly invented tools and technologies.
One of these systems is the recommender system. Recommender systems were implemented long ago by several Internet giants, such as Amazon.com, Facebook and Google. These systems suggest new items that might be of interest to the user by analyzing users' profiles, their activities on the websites and, if applicable, their purchase history.
However, Big Data increases the cognitive load on the user, posing more challenges for recommender systems. One of these challenges is scalability: the system should be able to deal with a bigger data set without degrading its performance. This is not the case with the current techniques of recommender systems, as the computational time increases with the number of users and items. Another challenge is to provide high-quality recommendations very quickly in order to gain users' satisfaction and retain them. The third challenge results from the sparseness of the data, where each user has rated a relatively small fraction of all the available items. This complicates the process of finding similarity between users, as the number of commonly rated items is very small if not zero. Data sparsity leads to another challenge, called the cold start problem, in which the user does not get personalized recommendations unless s/he rates a sufficient number of items [7]-[11].
This encourages more research work on new recommendation approaches that could solve the existing problems. One of the promising approaches is Singular Value Decomposition (SVD).
This research paper reviews the literature of recommender systems and provides a broad background of their different approaches. In addition, it studies the main concerns surrounding them due to Big Data. Furthermore, it investigates the SVD approach and provides an implementation of it using Big Data tools (i.e. Apache Hadoop and Spark). This work is intended to validate existing contributions to the field of SVD, assess the applicability of SVD to large-scale recommender systems and evaluate the applicability and viability of Hadoop and Spark in building scalable systems.
The rest of the paper is organized as follows. The next section provides a broad background of the theories related to RSs, and CF in particular. In addition, it sheds light on applying the SVD approach to RSs. This is followed by a section which details all the experiments undertaken using Apache Hadoop and Spark to implement an SVD-based RS. It also presents the results of these experiments and discusses them. At the end, the conclusion and future work are presented.
2. BACKGROUND
This section formulates the problem to be solved by RSs and provides a broad background of the different recommender algorithms and approaches, especially the contemporary ones that suit scalable systems. In addition, it addresses the preliminaries as well as the applicability of SVD to CF recommender systems. Furthermore, it reviews related work to give a clear view of the state of the art.
2.1. Recommender System's Problem Formulation
Suppose that a Big Data set records the preferences of a large number of users, denoted by m, for some or all of n items. The preference record usually takes the form of a tuple (userID, itemID, rating), where rating takes a value on a numerical scale (for example from 1 to 5) that expresses how much the user holding userID likes the item with itemID.
Let R be a user-item matrix of size $m \times n$ which represents the preference records such that each cell $R_{ij}$ either holds the rating given by user i to item j or null if the user has not rated the item yet, as shown in Figure 1. In most cases, this matrix is sparse because each user does not normally rate all the items in the data set.
Figure 1. Sample of user-item matrix
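The construction of the user-item matrix from preference tuples can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the function name `build_rating_matrix`, the zero-based integer IDs and the use of NaN to stand for the null cells are all assumptions made for the example.

```python
import numpy as np

def build_rating_matrix(records, n_users, n_items):
    """Return an n_users x n_items array; cells without a rating hold NaN."""
    R = np.full((n_users, n_items), np.nan)
    for user_id, item_id, rating in records:
        R[user_id, item_id] = rating
    return R

# Hypothetical (userID, itemID, rating) tuples on a 1-5 scale.
records = [(0, 0, 5), (0, 2, 3), (1, 1, 4), (2, 0, 1), (2, 2, 2)]
R = build_rating_matrix(records, n_users=3, n_items=3)

# Fraction of cells that are still unrated (the sparsity of the matrix).
sparsity = np.isnan(R).mean()
```

A dense array is used only to keep the sketch short; at the scale discussed in this paper a sparse or distributed representation would be the more realistic choice.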
The mission of an RS is to predict the missing ratings, i.e. to predict how a user would rate an item in the future. This helps the recommender system recommend items that are predicted to receive a high rating from the user [12].
2.2. Recommender Systems and Approaches
Among the commonly used recommendation algorithms are content-based recommenders and collaborative filtering recommenders. Content-based systems analyze the user profile and purchase history by studying the user's main attributes (also called meta-data, such as age, gender and interests) and the features of previously purchased items (such as price, category and description). This approach recommends items with attributes similar to those of the previously purchased ones [8], [13]. The main problem of this approach is that it is domain specific. For example, in movie recommendation the system needs to consider actors and directors as attributes while making recommendations, whereas such computation is not applicable to book recommendation [14].
The other approach, collaborative filtering (CF), makes recommendations based on the existing relationships between users and items. In general, it relies on other users' preferences either to find items similar to what the user has purchased and suggest them as recommendations, or to find like-minded users who have similar taste to the target user and thus recommend whatever they have purchased that the target user has not yet seen [8], [14]. The two common approaches of CF are:
User-based Collaborative Filtering: it examines the entire data set of users and items to generate recommendations by identifying users that have similar interests to the target one and then recommends items that have been bought by the others and not by the target user. This proceeds by constructing the user-item matrix which represents the interaction between users and items. After that, statistical computations (i.e. similarity measures) are applied on the matrix to find the nearest neighbors. These neighbors are supposed to have similar interests to the target user. This is followed by combining the neighbors' preferences and finding the top N items that have been rated highly by the neighbors and not by the target user. These N items form the top N recommendations [8].
Despite the fact that this approach has been adopted widely, it suffers from a scalability problem which was not considered a big issue a few decades ago, when the number of users and items was relatively small. However, as the data set size increases in the Big Data era, the cost of computing the similarity between users grows rapidly (quadratically in the number of users) because each user needs to be compared with all the other users. Moreover, as users interact with more items and change their preferences, the similarity needs to be recomputed, i.e. similarity pre-computation becomes useless. This degrades the performance of RSs and is why it is considered a big problem today. Furthermore, having a sparse user-item matrix, which is usually the case because users interact with a relatively small set of items, also adds to the difficulty of computing user similarity, since the number of common items is relatively small if not zero [8]-[9], [14]-[15].
Item-based Collaborative Filtering: it examines the set of items rated by the target user and finds other items similar to them (which are called neighbors) by considering other users' preferences. To find neighbors, each item is represented by a vector of the ratings given by the different users, and the similarity of two items is then measured by computing the similarity between their vectors, as shown in Figure 2.
Figure 2. Computing item-item similarity [16]
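As a rough illustration of the neighbor search used by both CF variants, the sketch below computes a cosine similarity between two rating vectors and picks the most similar vectors. It is a sketch under stated assumptions, not the method of any cited work: cosine similarity is only one of several possible similarity measures, NaN is assumed to mark missing ratings, the similarity is restricted to co-rated positions, and the names `cosine_similarity` and `nearest_neighbors` are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity over the positions rated in both vectors (NaN = missing)."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if not both.any():
        return 0.0  # no co-rated entries: assumed to mean "no evidence of similarity"
    x, y = a[both], b[both]
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0

def nearest_neighbors(vectors, target, n_neighbors):
    """Indices of the n_neighbors vectors most similar to vectors[target]."""
    scored = [(cosine_similarity(vectors[target], vectors[v]), v)
              for v in range(len(vectors)) if v != target]
    scored.sort(reverse=True)
    return [v for _, v in scored[:n_neighbors]]

# Rows are users, columns are items.
R = np.array([[5.0, 3.0, np.nan],
              [5.0, 3.0, 4.0],
              [1.0, 5.0, np.nan]])

# User-based CF compares rows; item-based CF would pass R.T to compare columns.
user_neighbors = nearest_neighbors(R, target=0, n_neighbors=1)
```

Every call to `nearest_neighbors` compares the target with all other vectors, which is the all-pairs cost behind the scalability concern, and the co-rated mask shows why sparsity hurts: with few common ratings the similarity rests on very little evidence.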
These neighbors form the recommendations and are ranked after predicting the preference of the target user for each one of them. The prediction $P_{u,i}$ of the target user u for one of the neighbors, item i, is given by:

$$P_{u,i} = \frac{\sum_{j=1}^{N} \mathrm{Sim}_{i,j} \times R_{u,j}}{\sum_{j=1}^{N} \mathrm{Sim}_{i,j}}$$

Where N is the number of neighbors, $\mathrm{Sim}_{i,j}$ is the similarity between item j and its neighbor i, and $R_{u,j}$ is the rating given by user u to item j [9], [17].
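The weighted-sum prediction can be illustrated with a few lines of code. This is a minimal sketch of the formula as written above, assuming the similarities and the user's ratings for the N neighboring items are already available; the function name `predict_item_based` and the choice to return `None` when the similarities sum to zero are assumptions made for the example.

```python
def predict_item_based(similarities, user_ratings):
    """Similarity-weighted average of the user's ratings for the neighboring items.

    similarities[j] is the similarity between the target item and neighbor j;
    user_ratings[j] is the rating the target user gave to neighbor j.
    """
    total_similarity = sum(similarities)
    if total_similarity == 0:
        return None  # assumption: no usable neighbors, so no prediction is made
    weighted = sum(s * r for s, r in zip(similarities, user_ratings))
    return weighted / total_similarity

# Two neighbors: a very similar item rated 4 and a less similar item rated 1.
prediction = predict_item_based([1.0, 0.5], [4, 1])
```

Here the prediction is (1.0 × 4 + 0.5 × 1) / 1.5 = 3.0, so the more similar neighbor pulls the estimate toward its rating.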
However, measuring the similarities between items takes a long time and consumes a lot of computing resources. This is the main pitfall of this method. On the other hand, changes in items are not as frequent as changes in users and, thus, such computations can be pre-calculated in an offline mode. Another strength of this algorithm is that it is less affected by having a sparse user-item matrix. This is because, with a large number of users, there will be enough ratings for each item to measure the similarity between the different items and obtain significant statistics [8]-[9], [15].
Generally speaking, CF, whether item-based or user-based, has a well-known strength: it is not domain specific and thus does not rely on the items' properties and attributes. That is why it is applicable to different domains, such as movie recommendation, book recommendation, flowers, food and others. However, CF suffers from the following problems:
1) Scalability: RSs are fed with massive amounts of data which should be processed rapidly. However, the computation time of CF algorithms grows with the continuous increase in the number of users and items [9].
2) Data Sparsity: In an e-commerce website, users usually rate a small fraction of all the available items, resulting in a sparse data set. This degrades the accuracy of the RS because it complicates the process of finding similarities between users, as the number of common items becomes relatively small [9].
3) Cold-start problem: This problem emerges as a consequence of the data sparsity problem, where new users cannot get personalized recommendations unless they rate a sufficient number of items. Likewise, new items cannot be recommended before getting a reasonable number of ratings [11].
4) Synonymy: similar products may have different names in the data set. In this case, a standard CF RS will treat them differently and will not infer the hidden association between them. For illustration, "cartoon film" and "cartoon movie" are two phrases referring to the same item, yet ordinary implementations of CF algorithms treat them differently [18].
5) Grey sheep: this refers to users whose opinions do not match those of any other group of users. Consequently, CF cannot serve grey sheep since it mainly relies on the similarity between users' previous preferences [16].
The aforementioned standard implementations of item-based and user-based CF follow the memory-based approach, in which the entire data set is kept in memory while processing it and searching for similarities between users or items in order to make recommendations. The other approach of implementing CF algorithms is called the model-based approach, in which the data set is used in an offline mode to generate a model by utilizing data mining, machine learning or statistical techniques. This model can be used later on to predict the ratings for unseen items without the need to process the entire data set again and again. Examples of this approach are decision trees, clustering methods and matrix factorization models [19].
A point often overlooked is that the model-based approach generates predictions with lower accuracy when compared with the memory-based approach. However, it has better scalability. Thus, many researchers are investing their effort in studying and enhancing model-based CF. One of these algorithms is Singular Value Decomposition (SVD), which is the one implemented and validated in this work using some Big Data tools on a Big Data set.
2.3. Singular Value Decomposition (SVD)
SVD is one of the well-known matrix factorization techniques. It decomposes a matrix R of size $m \times n$ and rank r into three matrices U, S and V as follows:

$$R = U \cdot S \cdot V^{T}$$

Where:
U: a matrix of size $m \times r$ with orthonormal columns holding the left singular vectors of R; i.e. its r columns hold the eigenvectors of the r nonzero eigenvalues of $R R^{T}$.
S: a diagonal matrix of size $r \times r$ holding the singular values of R in its diagonal entries in decreasing order; i.e. $s_1 \geq s_2 \geq s_3 \geq \dots \geq s_r$. These r values are the nonnegative square roots of the eigenvalues of $R R^{T}$.
V: a matrix of size $n \times r$ with orthonormal columns holding the right singular vectors of R; i.e. its r columns hold the eigenvectors of the r nonzero eigenvalues of $R^{T} R$.
Furthermore, S can be reduced by taking the largest k singular values only, thus obtaining $S_k$ of size $k \times k$. Accordingly, U and V can be reduced by retaining the first k singular vectors and discarding the rest. In other words, $U_k$ is generated by eliminating the last $r - k$ columns of U and, similarly, $V_k$ is generated by eliminating the last $r - k$ columns of V. This yields $U_k$ of size $m \times k$ and $V_k$ of size $n \times k$. As a consequence, $R_k = U_k \cdot S_k \cdot V_k^{T}$ and $R_k \approx R$, where $R_k$ is the closest rank-k approximation to R [9], [18], [20]. See Figure 3.
Figure 3. The reduced matrix $R_k$ of rank k [20]
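The rank-k reduction can be tried out on a small dense matrix with NumPy. This is only an illustrative sketch: the paper's experiments use Hadoop and Spark rather than NumPy, and the helper name `truncated_svd` and the toy matrix are assumptions made for the example.

```python
import numpy as np

def truncated_svd(R, k):
    """Return U_k (m x k), S_k (k x k) and V_k (n x k) for the k largest singular values."""
    # numpy returns the singular values already sorted in decreasing order.
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U[:, :k], np.diag(s[:k]), Vt[:k, :].T

# Toy 4 x 3 matrix standing in for a (filled) user-item matrix.
R = np.array([[5.0, 4.0, 1.0],
              [4.0, 5.0, 1.0],
              [1.0, 1.0, 5.0],
              [2.0, 1.0, 4.0]])

U_k, S_k, V_k = truncated_svd(R, k=2)
R_k = U_k @ S_k @ V_k.T  # rank-2 approximation of R
```

Because $R_k$ is the closest rank-k approximation in the Frobenius norm, the approximation error `np.linalg.norm(R - R_k)` equals the square root of the sum of the squared discarded singular values, which gives a quick sanity check on the truncation.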
2.4. SVD-based Recommender Systems
Applying SVD to recommender systems assumes that the relationship between users and items, as well as the similarity between users/items, can be induced by some latent lower-dimensional structure in the data. For illustration, assuming that items are movies, the rating given by a specific user to a particular movie depends on implicit factors such as the preference of that user across different movie genres. In effect, it treats users and items as unknown feature vectors to be learnt by applying SVD to the user-item matrix and breaking it down into three smaller matrices: U, S and V [12]. This proceeds by constructing the sparse user-item matrix from the input data set and then imputing values to fill the missing ratings and reduce its sparseness before computing its SVD. There are several imputation techniques; the most common ones are: impute by zero, impute each column by its item average, impute each row by its user average, or impute each missing cell by the mean of the user average and the item average [21].
This results in a filled matrix $R_{filled}$, which can be normalized by subtracting the average rating of each user from its corresponding row, resulting in $R_{norm}$. The last step is useful in offsetting the difference in rating scale between the different users [22].
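The imputation and normalization steps can be sketched as follows for the last of the listed techniques (mean of user average and item average). This is a minimal illustration under several assumptions: missing ratings are marked with NaN, every user and every item has at least one rating, and the user average that is subtracted is taken over the observed ratings only. The name `impute_and_normalize` is hypothetical.

```python
import numpy as np

def impute_and_normalize(R):
    """Fill missing cells with the mean of user and item averages, then center each row."""
    user_avg = np.nanmean(R, axis=1)  # average observed rating of each user (row)
    item_avg = np.nanmean(R, axis=0)  # average observed rating of each item (column)
    fill = (user_avg[:, None] + item_avg[None, :]) / 2.0
    R_filled = np.where(np.isnan(R), fill, R)
    R_norm = R_filled - user_avg[:, None]
    return R_filled, R_norm, user_avg

# User 0 has not rated item 1.
R = np.array([[5.0, np.nan],
              [3.0, 1.0]])
R_filled, R_norm, user_avg = impute_and_normalize(R)
```

In this toy case the missing cell is filled with (5 + 1) / 2 = 3, the mean of user 0's average (5) and item 1's average (1). The user averages are returned because they have to be added back when a rating is predicted.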
At this point, SVD can be applied to $R_{norm}$ to compute $U_k$ (which holds the users' features), $S_k$ (which holds the strength of the hidden features) and $V_k$ (which holds the items' features) such that their product $U_k \cdot S_k \cdot V_k^{T}$ gives the closest rank-k approximation to $R_{norm}$. This lower-rank approximation of the user-item matrix is better than the original one since SVD eliminates the noise in the user-item relationship by discarding the small singular values from S [18].
Henceforth, the preference of user i for item j can be predicted by the dot product of their corresponding feature vectors; i.e. compute the dot product of the ith row of $U_k \cdot S_k$ and the jth column of $V_k^{T}$, and add back the user average rating that was subtracted while normalizing $R_{filled}$. This can be expressed as:

$$p_{i,j} = \bar{r}_i + (U_k \cdot S_k)_{(i,\_)} \cdot (V_k^{T})_{(\_,j)}$$

Where $p_{i,j}$ is the predicted rating for user i and item j, $\bar{r}_i$ is the average rating of user i, $(V_k^{T})_{(\_,j)}$ is the jth column of $V_k^{T}$ and $(U_k \cdot S_k)_{(i,\_)}$ is the ith row of the matrix resulting from multiplying $U_k$ and $S_k$.
The dot product of two vectors is proportional to the cosine similarity between them: it equals the cosine of the angle between the vectors scaled by their norms. Thus, the above formula can be interpreted as measuring the similarity between the feature vectors of user i and item j and then adding the user average rating to predict the missing rating $p_{i,j}$.
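The prediction step can be sketched as a dot product of a user's feature row and an item's feature row, plus the user average. This is an illustration under stated assumptions rather than the authors' Scala/Spark code: the function name `predict_rating`, the toy normalized matrix and the user averages are hypothetical, and $V_k$ is stored with one row per item so that its jth row equals the jth column of $V_k^{T}$.

```python
import numpy as np

def predict_rating(U_k, S_k, V_k, user_avg, i, j):
    """p_ij = user average + (row i of U_k S_k) . (row j of V_k)."""
    user_features = (U_k @ S_k)[i, :]
    item_features = V_k[j, :]  # same vector as column j of V_k^T
    return float(user_avg[i] + user_features @ item_features)

# Toy normalized matrix (user averages already subtracted) and the averages themselves.
R_norm = np.array([[0.0, -2.0],
                   [1.0, -1.0]])
user_avg = np.array([5.0, 2.0])

U, s, Vt = np.linalg.svd(R_norm, full_matrices=False)
k = 2  # keeping every singular value reproduces R_norm exactly in this toy case
U_k, S_k, V_k = U[:, :k], np.diag(s[:k]), Vt[:k, :].T

p_01 = predict_rating(U_k, S_k, V_k, user_avg, i=0, j=1)
```

With k equal to the full rank, the prediction for user 0 and item 1 recovers 5 + (-2) = 3; choosing a smaller k would instead give the smoothed, lower-rank estimate discussed in the text.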
2.5. SVD Approach in Research
Sarwar, Karypis, Konstan and Riedl studied the applicability of SVD to the field of RSs by conducting two experiments on a relatively small data set. In the first experiment, they performed several preprocessing steps on the user-item ratings matrix before finding its SVD and predicting the missing users' preferences for items not seen yet. To predict the preference of user i for item j, they multiplied row i of $U_k \sqrt{S_k}$ by column j of $\sqrt{S_k} V_k^{T}$. In the other experiment, they relied on SVD, instead of Pearson correlation and cosine similarity, to elicit the relationship between users and thus find the user's neighbors needed to suggest the top N recommendations. Their results were encouraging for the application of SVD in the field of RSs because they believed that reducing the dimensionality of the ratings matrix succeeds in filtering out the noise from the data [18]. Two years later, the same group of researchers proposed an incremental implementation of their previous work as a solution for the expensive computation of SVD. They gradually constructed a large-scale model by relying on a previously computed small SVD model and projecting new users/ratings onto it [10]. In 2006, Netflix started a competition with a prize of 1 million dollars for the team which could improve the accuracy of their existing RS by at least 10%. This competition ended in 2009, when the grand prize was given to a team who blended an SVD-based recommender with a stochastic artificial neural network technique called Restricted Boltzmann Machines [23].
Gong, Ye and Dai proposed an algorithm which combines SVD and the traditional item-based CF approach. They computed the SVD of the sparse user-item matrix and then multiplied U, S and V again to get a filled matrix with approximations to the originally missing values. CF was then applied on the new matrix to find the closest neighbors to the target item and thus provide good recommendations [17]. Zhou et al. proposed an approximation to SVD which could provide more accurate recommendations than the standard SVD and could be computed more efficiently. Their work can be summarized as sampling the rows of a user-item matrix according to sampling probabilities to construct a smaller matrix C, and then computing SVD on the newly constructed matrix C rather than the original matrix [24].
In another effort by Lee and Chang, Stochastic Singular Value Decomposition was used instead of conventional similarity measures to overcome the scalability problem of existing item-based CF recommender systems. Their work was implemented using Apache Mahout MapReduce [9].
3. EXPERIMENTS AND EVALUATIONS
3.1. Experimental Environment
All the experiments were conducted using the Scala programming language on Eclipse, running on a MacBook Pro with OS X 10.9.3, a 2.4 GHz Intel Core i5 processor and 8 GB of RAM. This machine served as a single-node cluster for Apache Hadoop 2.4.0, which was configured in pseudo-distributed mode. In addition, Apache Spark v. 1.0.2 was used as it provides fast distributed computations.
3.2. Data Set
The data set used in this work is the 1M MovieLens set collected from the MovieLens website by the GroupLens research lab of the Department of Computer Science and Engineering at the University of Minnesota. This data set contains 1 million ratings provided by more than 6000 users for around 3900 movies in the form of tuples (userID, MovieID, rating, timestamp). Ratings take integer values in the interval [1, 5] indicating how much the user likes the movie.
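Reading one such rating tuple can be sketched as below. The `::` field separator is an assumption about how the ratings file is laid out on disk, since the text only states the tuple fields, and the function name `parse_rating_line` and the sample line are made up for the example.

```python
def parse_rating_line(line, sep="::"):
    """Split one ratings line into (userID, movieID, rating, timestamp) integers."""
    user_id, movie_id, rating, timestamp = line.strip().split(sep)
    return int(user_id), int(movie_id), int(rating), int(timestamp)

# Hypothetical line in the assumed format.
record = parse_rating_line("1::1193::5::978300760\n")
```

Only the first three fields are needed to fill the user-item matrix; the timestamp plays no role in the approach described here.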
The aforementioned data set was divided into a training set and a test set based on different ratios known as training ratios [18]. For illustration, a training ratio of 0.8 indicates that 80% of the original data set is used as the training set and the other 20% is kept as the test set. The training set is used to fill the user-item matrix R of size $6040 \times 3900$, where each cell holds the preference of a user for a particular item. This matrix is used to compute the SVD, obtain the $U_k$, $S_k$ and $V_k$ matrices and predict ratings for unrated items. The test set, on the other hand, is used to evaluate the accuracy of the predicted ratings.
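A split by training ratio can be sketched as follows. This is a minimal illustration: the text does not say how ratings were assigned to the two sets, so the random shuffle, the fixed seed and the function name `split_by_training_ratio` are assumptions.

```python
import random

def split_by_training_ratio(records, training_ratio, seed=0):
    """Shuffle the rating records and cut them into disjoint training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(training_ratio * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Ten hypothetical (userID, itemID, rating) records.
records = [(u, i, 1 + (u + i) % 5) for u in range(5) for i in range(2)]
train, test = split_by_training_ratio(records, training_ratio=0.8)
```

With ten records and a training ratio of 0.8, eight records land in the training set and two in the test set, and no record appears in both.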
3.3. Evaluation Metric
Different empirical evaluation metrics exist to assess the quality of the estimated predictions. The most common metrics are statistical ones such as Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). In this work, MAE is used because it is easy to interpret.
The evaluation process of this work is illustrated in Figure 4, where the data set is divided into two disjoint sets, one for training and the other for testing the system, as mentioned in section 3.2. The predicted ratings are compared with the actual ratings in the test set by measuring MAE, which is the average of the absolute differences between each predicted value and its corresponding actual rating [18], i.e.:

$$MAE = \frac{\sum_{i,j} \left| p_{i,j} - r_{i,j} \right|}{N}$$

Where N is the size of the test set, $p_{i,j}$ is the predicted rating for user i and item j, and $r_{i,j}$ is the actual rating given by user i to item j. A smaller value of MAE indicates higher prediction accuracy and thus better recommendations.
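The metric itself is simple to compute once predicted and actual ratings are paired up. The sketch below is an illustration only: the function name `mean_absolute_error` and the sample values are assumptions, not results from the paper.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between paired predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(predicted)

# Three hypothetical test ratings: errors of 1, 0 and 2 average to 1.0.
mae = mean_absolute_error([4, 3, 5], [5, 3, 3])
```

An MAE of 1.0 on a 1-5 scale means the predictions are off by one rating point on average, which is the ease of interpretation referred to above.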
Figure 4. Prediction Evaluation Flow
It is important to note that the empirical evaluation will not address the computation time. That is because the work was done on a single machine while the algorithm is designed to run on a cluster. Thus, measuring its running time on a single machine will not reflect its real performance on a multi-node cluster. However, analytical evaluation can be used to assess the running time. As a matter of fact, the RS consists of two components: an offline component and an online one. The offline component does not affect the real-time performance of the system, as all its operations are pre-computed. Fortunately, computing the SVD, which is the most expensive operation in the whole system, is done in offline mode. On the other hand, the online component simply generates the prediction by multiplying two vectors of size k. This takes O(k) time, which is effectively O(1) since the value of k is constant.
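As a rough illustration of why the online step is cheap, the sketch below computes one prediction as a dot product of two length-k vectors and then adds back the user's average rating, as described for the de-normalization step of this work. The function name `predict_rating` and the toy factor values are hypothetical; in practice the two vectors would come from the pre-computed (offline) decomposition.

```python
def predict_rating(user_factors, item_factors, user_mean):
    """Online prediction: a dot product of two length-k vectors.

    user_factors: the user's row of U.S (assumed pre-computed offline)
    item_factors: the item's column of V transposed (also pre-computed)
    user_mean:    the user's average rating, added back to de-normalize
    """
    if len(user_factors) != len(item_factors):
        raise ValueError("both vectors must have length k")
    dot = sum(a * b for a, b in zip(user_factors, item_factors))
    return dot + user_mean


# Hypothetical factors for k = 3 (the paper uses k = 20).
us_row = [0.5, -0.2, 0.1]
vt_col = [0.4, 0.3, -1.0]
prediction = predict_rating(us_row, vt_col, user_mean=3.6)
print(round(prediction, 2))
```

The loop touches exactly k pairs of numbers, so the cost per prediction does not depend on the number of users or items.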
3.4. Choosing the number of dimensions
Reducing the dimensions of the original matrix R is useful because it helps eliminate noise and focus on the important information. With this in mind, an appropriate value of k should be selected such that it filters out the noise without leading to the loss of important information. In other words, the value of k should be large enough to capture the essential structure of matrix R but small enough to filter out noise and avoid overfitting [18], [20]. The best value of k will be determined experimentally by trying different values.
3.5. Experiments and Results
First, the 1M MovieLens data set was loaded into HDFS, and the training set was then used to fill the user-item matrix R. After that, R underwent two preprocessing operations: imputation and normalization. The imputation filled each missing cell with the mean of the item average rating and the user average rating, after this technique was shown experimentally to be superior to other imputation techniques (refer to Figure 5). Furthermore, the normalization step subtracted the average rating of each user from its corresponding row, resulting in the normalized matrix R.
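The two preprocessing operations can be sketched in plain Python as follows. This is a minimal single-machine illustration, not the authors' Spark code: the function name `impute_and_normalize`, the use of `None` for missing ratings, and the global-mean fallback for users or items without any rating are all assumptions. Following the algorithm listing of this work, the user average is computed from the observed ratings before imputation.

```python
def impute_and_normalize(R):
    """Fill missing cells, then subtract each user's average from its row.

    R is a list of rows (one per user); None marks a missing rating.
    Returns the normalized matrix and the per-user averages, which are
    needed later to de-normalize the predictions.
    """
    observed = [v for row in R for v in row if v is not None]
    if not observed:
        raise ValueError("the matrix has no observed ratings")
    global_mean = sum(observed) / len(observed)  # fallback (assumption)

    def mean(values):
        known = [v for v in values if v is not None]
        return sum(known) / len(known) if known else global_mean

    n_rows, n_cols = len(R), len(R[0])
    user_avg = [mean(row) for row in R]                       # v1
    item_avg = [mean([R[i][j] for i in range(n_rows)])        # v2
                for j in range(n_cols)]

    normalized = []
    for i in range(n_rows):
        new_row = []
        for j in range(n_cols):
            value = R[i][j]
            if value is None:
                # Mean of the user average and the item average.
                value = (user_avg[i] + item_avg[j]) / 2
            new_row.append(value - user_avg[i])
        normalized.append(new_row)
    return normalized, user_avg


# Hypothetical 2 x 3 user-item matrix.
R = [[5, None, 3],
     [4, 2, None]]
normalized, user_means = impute_and_normalize(R)
print(normalized, user_means)
```

In this toy matrix the user averages are 4.0 and 3.0 and the item averages are 4.5, 2.0 and 3.0, so both missing cells are imputed with 3.0 before the row averages are subtracted.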
Figure 5. Comparing different imputation techniques
This was followed by using Apache Spark to compute the SVD and come up with U, V, and S. This is equivalent to extracting both users' and items' features from R. For that purpose, k was set to 20 after it was shown experimentally to be superior to other values (refer to Figure 6).
Figure 6. Determination of the optimal number of dimensions k
In order to compute a missing rating for one user, the user's corresponding row of (U.S) was multiplied by the column of the transpose of V that corresponds to the target item, and the result was then denormalized by adding the user's average rating.
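The prediction step described above can be written compactly as follows. The notation is an assumption introduced here for illustration: $\bar{r}_u$ denotes the average rating of user $u$, and $U_k$, $S_k$, $V_k$ denote the factors truncated to $k$ dimensions.

```latex
p_{u,i} \;=\; \bar{r}_u \;+\; \sum_{f=1}^{k} \left(U_k S_k\right)_{u,f} \, \left(V_k^{T}\right)_{f,i}
```

The sum is the dot product between row $u$ of $U_k S_k$ and column $i$ of $V_k^{T}$, and the added term $\bar{r}_u$ undoes the normalization applied before the decomposition.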
This work can be expressed using the following algorithm:
Algorithm: Large Scale SVD-based Recommender System
//Input: 1M MovieLens Big Data Set
//Output: A filled user-item matrix P with predictions for all the originally missing ratings
read data from HDFS in the form of tuples (userID, itemID, rating)
trainingSet ← 80% of the data
testSet ← 20% of the data
//Construct user-item matrix from trainingSet
for every userID
    find all the ratings given by this user and construct a row r with these ratings
user-item matrix R ← all rows r of all the users
//Impute R by Mean_ItemAvgRating&UserAvgRating
for every r in R
    compute average v1 of all the ratings in r
for every column c in R
    compute average v2 of all the ratings in c
for every cell Rij in R
    if Rij = nil
        Rij ← average of v1i and v2j
//Normalize R
for every r in R
    for every cell Rj in r
        Rj ← Rj - v1
//Compute SVD
number of dimensions k ← 20
compute SVD of R to get U, S, V
//Predicting the missing ratings
compute the dot product of U and S to get US
find the transpose of V to get VT
compute the dot product of US and VT to get the predictions P
//De-normalize P
for every r in P
    for every cell Pj in r
        Pj ← Pj + v1
3.6. Discussion
The main computational advantage of running these experiments, which implement an SVD-based recommender system using Hadoop, Spark and Scala, is their easy parallelization. This demonstrates the power of these frameworks/APIs in implementing large-scale systems with parallelized operations in distributed mode.
The results are comparable with the results of other works discussed previously in the literature review, such as those conducted by Sarwar and his colleagues, by Gong and Dai, and by Zhou and his colleagues, which were carried out on a significantly smaller data set (i.e. the 100K MovieLens data set). This shows that the SVD approach is effective not only for ordinary, small data sets but also for Big Data sets.
Indeed, this work resulted in better predictions than the work of Sarwar et al. [18], although it was carried out on a much bigger data set. To put it differently, the best predictions obtained by Sarwar et al. on the 100K MovieLens data set used a training ratio of 80% and k ∈ [20, 100], for which they obtained MAE values ranging from 0.748 to 0.732. However, the MAE values obtained by our implementation, for the same values of k, the same training ratio and the 1M MovieLens data set, ranged between 0.724 and 0.730.
While looking for the best value of k, 20 was found to be the most favorable since it gave a small MAE across different training ratios. This is reasonable when compared with previous works, which found k = 14 [10], [18] or k = 15 [17] for smaller data sets. Notably, increasing the volume of the data set to 1 million ratings did not dramatically increase the value of k. This agrees with the opinion reported in other research papers that a small number of dimensions usually gives good results with a good approximation to the original matrix R. This is simply because a small value of k is sufficient to capture the important features of users and items and thus make good predictions. However, increasing the value of k might simply add more noise to the data, which does not add value to the process of making predictions.
Furthermore, trying different imputation techniques and tracking their MAE showed the importance of the pre-processing steps and their effect on the prediction accuracy. As per our experiments, Mean_ItemAvgRating&UserAvgRating outperformed the other imputation techniques since it gave a lower MAE.
Moreover, repeating the experiments multiple times with different values of k and different values of the training ratio x revealed the sensitivity of the prediction quality to the sparsity of the data set, since MAE values decrease as the training ratio increases and the sparsity decreases. In addition, it revealed the significant effect of the value of k on the prediction quality, as well as the effectiveness of SVD in dealing with cold-start cases.
4. CONCLUSION
Recommender systems were developed and integrated into many websites, especially e-commerce websites, a long time ago. They have proved their power in providing personalized, customized web content to different users by recommending content (i.e. items in the case of an e-commerce website) of interest to each user, and thus mitigating the problem of information overload on the user.
Different techniques and approaches exist for recommender systems. One of the widely used techniques is collaborative filtering, which mines the interaction records between users and items, such as purchase history, to infer a user's taste and thus recommend items that match that taste.
However, recommender systems, and CF techniques in particular, have started facing some challenges with the dawn of the Big Data era. This new phenomenon, Big Data, is inflating the data volume to be processed by RSs, as the numbers of users and content/items continue to increase, and thus raises concerns about the sparseness of the available data, the scalability of RSs, and the quality of the predictions.
For recommender systems to continue their success, they should process millions of items and users per second without degrading their prediction accuracy. For this purpose, new approaches to CF have been proposed and studied in the literature after the traditional approaches showed their limitations. Among these approaches is Singular Value Decomposition. Furthermore, several Big Data frameworks and APIs (such as Hadoop, Mahout and Spark) have been released and tried in building large-scale recommender systems.
This research work contributes to the state of the art of recommender systems in the sense that it provides an implementation of a large-scale SVD-based recommender system using both Apache Hadoop and Spark. This came as a result of an intensive study of the literature as well as performing multiple experiments using the Scala programming language on top of Apache Hadoop and Spark. The study involved several topics: the Big Data phenomenon, the different techniques and approaches of recommender systems together with their pros and cons, the challenges posed by Big Data to recommender systems and CF in particular, and the applicability of SVD to recommender systems as well as its effectiveness in solving the aforementioned challenges.
The experiments were conducted to determine the optimal values of two essential parameters that affect an SVD-based RS, which are: the imputation technique to be used in filling the user-item
matrix before processing it, and the number of dimensions k to be retained after decomposing the matrix. The results showed that Mean_ItemAvgRating&UserAvgRating is the best imputation technique and k=20 is the optimal number of dimensions, as it gave the lowest MAE.
This work addressed the scalability problem by utilizing Hadoop and its valuable features. In addition, it showed that good prediction quality can be achieved by choosing a robust imputation technique (as a preprocessing step) before applying SVD to the user-item matrix. Moreover, it confirmed that Apache Spark comes with attractive merits which enable easy integration with Hadoop and easy development of parallelizable code. This leads to the conclusion that a careful implementation of a scalable SVD-based collaborative filtering recommender system is effective when the right parameters and the appropriate frameworks and APIs are chosen.
5. FUTURE WORK
The promising results obtained are just the starting point. One might consider deploying this implementation of the SVD-based RS on a multi-node cluster to evaluate its scalability, performance (its computation time in particular) and accuracy in a distributed mode.
In addition, more research should be conducted to explore the Apache Spark implementation of SVD for a given matrix and find possible ways of improving its performance in terms of running time, quality of results, and handling of the cold-start problem. For this purpose, one might consider a hybrid approach that combines the stochastic version of SVD proposed by Lee and Chang [9], the incremental version of SVD proposed by Sarwar et al. [10], and the Expectation Maximization technique presented by Kurucz et al. This would replace the traditional implementation of SVD by an iterative process, which is the heart of Expectation Maximization, that repeatedly applies a stochastic version of SVD to a matrix and uses the outcome of one iteration to impute the input of the next iteration. Stochastic SVD could be done in an incremental manner such that the arrival of a new user does not require re-computing the decomposition of the user-item matrix; instead, the new user is projected onto the existing SVD model.
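One common way to project a new user onto an existing model, often called folding-in, is sketched below. This is an illustrative assumption about what such a projection could look like, not a method specified in this paper; $r_{new}$ denotes the new user's imputed and normalized rating row, and $V_k$, $S_k$ are the factors already computed offline.

```latex
\hat{u}_{new} \;=\; r_{new} \, V_k \, S_k^{-1},
\qquad
p_{new,i} \;=\; \bar{r}_{new} \;+\; \left(\hat{u}_{new} \, S_k \, V_k^{T}\right)_{i}
```

Under this sketch the new user costs one multiplication of a $1 \times n$ row by an $n \times k$ matrix, where $n$ is the number of items, instead of a full re-decomposition of the user-item matrix.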
Another research effort should be dedicated to experimenting with other Big Data tools and frameworks, such as Apache Mahout, and comparing their performance with that of Apache Spark.
REFERENCES
[1] Schönberger V., and Cukier K., “A revolution that will transform how we live, work, and think”, New York: Houghton Mifflin Harcourt, 2013.
[2] Chen J., Chen Y., Du X., Li C., Lu J., and Zhao S., et al., “Big Data Challenge: A data management perspective”, Front. Comput. Sci., Vol. 7, No. 2, pp. 157-164, 2013.
[3] Bizer C., Boncz P., Brodie M. L., and Erling O., “The Meaningful Use of Big Data: Four Perspectives- Four Challenges”, SIGMOD, Vol. 4, No. 4, pp. 56-60, 2011.
[4] Villa A., “Transferring Big Data Across the Globe”, Dissertation, New Hampshire (NH): University of New Hampshire Durham, 2012.
[5] Schelter S., Owen S., Proceedings of ACM RecSys Challenge ’12, Dublin, Ireland, 2012.
[6] Schönberger V. M., Cukier K., “Big Data: A revolution that will transform how we live, work, and think”. New York: Houghton Mifflin Harcourt, 2013.
[7] Chiky R., Ghisloti R., and Aoul Z. K., Proceedings of EGC 2012, Bordeaux, France, 2012.
[8] Thangavel S. K., and Thampi N. S., “Performance Analysis of various Recommendation Algorithm Using Apache Hadoop and Mahout”, IJSER, Vol. 4, No. 12, pp. 279-287, 2013.
[9] Lee C. R., Chang Y. F., “Enhancing Accuracy and Performance of Collaborative Filtering Algorithm by Stochastic SVD and Its MapReduce Implementation”. In: Raś Z W, Ohsuga S, editors. IPDPSW 2013; Cambridge. USA: IEEE Computer Society, pp. 1869-1878, 2013.
[10] Sarwar B., et al., “Incremental Singular Value Decomposition Algorithms for Highly Scalable Recommender Systems”, 5th International Conference on Computer and Information Science, pp. 27-28, 2002.
[11] Kabore S. C., “Design and Implementation of a Recommender System as a Module for Liferay Portal”, Master thesis, UPC: University Polytechnic of Catalunya, 2012.
[12] Melville M., Sindhwani V., “Recommender Systems”, In: Sammut, Claude, Webb, Geoffrey I, editors. Encyclopedia of Machine Learning, US: Springer, pp. 829-838, 2010.
[13] Rijmenam K. V., “Recommender Engines are Crucial for Positive User Experiences”, 2013.
[14] Owen S., Anil R., Dunning T., and Friedman E., “Mahout in Action”, New York: Manning Publications, 2012.
[15] Walunj S., Sadafale K., “An online Recommendation System for E-commerce Based on Apache Mahout Framework”, 2013 annual conference on Computers and people research, New York, pp. 153-158, 2013.
[16] Walunj S., Sadafale K., “Priced based Recommendation System”. International Journal of Research in Computer Engineering and Information Technology, Vol. 1, No. 1, 2013.