Indonesian
J
our
nal
of
Electrical
Engineering
and
Computer
Science
V
ol.
23,
No.
1,
July
2021,
pp.
510
518
ISSN:
2502-4752,
DOI:
10.11591/ijeecs.v23.i1.pp510-518
r
510
Efficient
multi-k
eyw
ord
similarity
sear
ch
o
v
er
encrypted
cloud
documents
A
yad
I.
Abdulsada,
Dhafer
G.
Honi,
Salah
Al-Darraji
Department
of
Computer
Science,
Colle
ge
of
Education
for
Pure
Science,
Uni
v
ersity
of
Basrah,
Iraq
Article
Inf
o
Article
history:
Recei
v
ed
Dec
6,
2020
Re
vised
Apr
12,
2021
Accepted
Jun
16,
2021
K
eyw
ords:
Cloud
computing
Multi-k
e
yw
ords
ranking
search
Pri
v
ac
y
preserving
Searchable
encryption
Simhash
ABSTRA
CT
Man
y
or
g
anizations
and
indi
viduals
are
attracted
to
outsource
the
ir
data
into
remote
cloud
service
pro
viders.
T
o
ensure
pri
v
ac
y
,
sensiti
v
e
dat
a
should
be
encrypted
be-
fore
being
hosted.
Ho
we
v
er
,
encryption
disables
the
direct
application
of
the
essential
data
management
operations
lik
e
searching
and
inde
xing.
Searchable
encryption
is
a
cryptographic
tool
that
gi
v
es
users
the
ability
to
search
the
encrypted
data
while
being
encrypted.
Ho
we
v
er
,
the
e
xisting
schemes
either
serv
e
a
single
e
xact
search
that
loss
the
ability
to
handle
the
misspell
ed
k
e
yw
ords
or
multi-k
e
yw
ord
search
that
generate
v
ery
long
trapdoors.
In
this
paper
,
we
address
the
problem
of
designing
a
practical
multi-k
e
yw
ord
similarity
scheme
that
pro
vides
short
trapdoors
and
returns
the
correct
results
according
to
their
similarity
scores.
T
o
do
so,
each
document
is
translated
into
a
compressed
trapdoor
.
T
rapdoors
are
generated
using
k
e
y
based
hash
functions
to
en-
sure
their
pri
v
ac
y
.
Only
authorized
users
can
issue
v
alid
trapdoors.
Similarity
scores
of
tw
o
te
xtual
documents
are
e
v
aluated
by
comput
ing
the
Hamming
distance
between
their
corresponding
trapdoors.
A
rob
ust
security
definition
is
pro
vided
together
with
its
proof.
Our
e
xperimental
results
illustrate
that
the
proposed
scheme
impro
v
es
the
search
ef
ficienc
y
compared
to
the
e
xisting
schemes.
Furthermore,
it
sho
ws
a
high
le
v
el
of
performance.
This
is
an
open
access
article
under
the
CC
BY
-SA
license
.
Corresponding
A
uthor:
A
yad
I.
Abdulsada
Department
of
Computer
Science,
Colle
ge
of
Education
for
Pure
Science
Uni
v
ersity
of
Basrah,
Iraq
Email:
ayad.abdulsada@uobasrah.edu.iq
1.
INTR
ODUCTION
Cloud
computing
is
a
promising
technology
that
supports
cost-ef
fecti
v
e
solutions
for
storing
and
pro-
cessing
lar
ge
datasets.
F
or
this
reason,
indi
viduals
and
or
g
anizations
with
constrained-resource
machines
tend
to
outsource
their
data
collections
to
such
professional
po
wer
serv
ers.
Ho
we
v
er
,
such
outsourced
service
may
raise
main
concerns
to
w
ards
users
pri
v
ac
y
,
where
personal
data
should
be
preserv
ed
[1].
Such
data
may
include
E-mail,
medical
information,
pri
v
ate
vi
d
e
os,
and
photos.
Therefore,
users
emplo
y
encryption
to
protect
the
pri
v
ac
y
of
their
confidential
data.
Unfortunately
,
e
n
c
ryption
disables
traditional
k
e
yw
ord
search
operations
on
remote
d
a
ta.
Searchable
encrypt
ion
schemes
allo
w
preforming
search
o
v
er
encrypted
data
at
the
serv
er
side
without
decryption.
Lik
e
search
o
v
er
plainte
xt
data,
searchable
encryption
methods
b
uild
a
searchable
inde
x
from
the
entire
dataset,
such
that
during
the
search,
only
trapdoors
generated
using
a
secret
k
e
ys
can
match
inde
x
entries
to
get
rele
v
ant
results.
Inde
x
contents
should
re
v
eal
nothing
to
the
adv
ersary
serv
er
.
Inde
x-based
search
not
only
enhances
search
ef
ficienc
y
,
b
ut
also
isolates
data
and
inde
x
encryption
schemes.
Under
the
set-
ting
of
searchable
symmetric
encryption
(SSE)
schemes,
the
encrypt
ed
data
are
uploaded
and
retrie
v
ed
by
the
J
ournal
homepage:
http://ijeecs.iaescor
e
.com
Evaluation Warning : The document was created with Spire.PDF for Python.
Indonesian
J
Elec
Eng
&
Comp
Sci
ISSN:
2502-4752
r
511
same
party
.
Most
of
SSE
methods
focus
on
servi
n
g
e
xact
single
k
e
yw
ord
search
queries
[2–6].
Some
methods
[7,
8]
focus
on
single
k
e
yw
ord
fuzzy
search.
The
results
of
such
schemes
are
too
lar
ge
especially
when
huge
data
collections
ar
e
used.
Us
ers
can
filter
the
res
ults
locally
by
sending
a
list
of
single-k
e
yw
ord
quer
ies
and
get
the
intersection
of
the
returned
results.
This
scheme
is
inef
ficient
and
requires
intensi
v
e
computations
for
the
user
.
Another
solution
is
to
find
the
intersection
at
the
serv
er
side
and
ask
the
latter
to
return
the
final
results
to
the
user
.
This
enables
the
serv
er
from
learning
undesirable
information
such
as
the
intersection
patter
n
s
for
all
k
e
yw
ords.
Therefore,
it
is
necessary
to
ha
v
e
SSE
schemes
that
enable
the
submission
of
search
queries
with
a
conjunction
of
se
v
eral
k
e
yw
ords.
Such
scheme
reduces
the
computation
b
urden
of
users
since
only
the
best
matching
documents
are
retrie
v
ed.
In
this
paper
,
we
propose
a
ne
w
pri
v
ac
y
preserving
yet
ef
ficient
multi-
k
e
yw
ord
similarity
search
scheme
that
returns
the
rele
v
ant
documents
according
to
their
similarity
scores
e
v
en
in
case
of
pro
viding
misspelled
k
e
yw
ords.
Contrib
utions:
Contrib
utions
are
briefly
described
as
follo
ws:
Firstly
,
we
emplo
y
Simhash
function
to
design
pri
v
ac
y-preserving
scheme
that
supports
multi-k
e
yw
ord
similarity
search.
Such
a
function
transforms
the
te
xtual
documents
into
a
fix
ed
size
v
ectors,
such
that
tw
o
similar
documents
l
ead
to
tw
o
similar
v
ectors.
Secondly
,
we
introduce
adapti
v
e
semantic
security
definition
and
pro
v
e
that
the
proposed
scheme
satisfies
this
definition.
Thirdly
,
we
utilize
a
ranking
method
that
emplo
ys
XOR
operation
between
tw
o
bit
v
ectors.
Finally
,
we
implement
the
proposed
scheme
and
sho
w
its
ef
ficienc
y
and
ef
fecti
v
eness.
Or
g
anization:
The
paper
is
outlined
as
follo
ws.
Secti
on
2
summarizes
the
pre
vious
w
orks.
Section
3
pro
vides
the
problem
description
and
illustrates
its
security
definition.
Section
4
sho
ws
the
basic
steps
of
the
proposed
scheme.
Section
5
reports
the
formal
pro
v
e
of
the
scheme.
Section
6
sho
ws
the
results
of
the
conducted
e
xperiments,
and
section
7
concludes
our
w
ork.
2.
RELA
TED
W
ORK
Single
k
e
yw
ord
SSE
schemes:
The
problem
of
searching
o
v
er
encrypted
data
is
addressed
by
v
arious
w
orks
in
the
literature.
The
majority
of
these
w
orks
focus
on
handling
a
single
k
e
yw
ord
search
requests.
Under
such
a
setting,
searchable
inde
x
is
generated
and
encrypted
before
being
uploaded.
Search
process
scans
the
encrypted
inde
x
to
identify
the
documents
that
includes
the
pro
vided
queries.
Goh
et
al.
[2]
introduced
a
Bloom
filter
based
construction,
where
the
unique
k
e
yw
ords
of
a
gi
v
en
document
are
hashed
into
a
single
filter
.
The
major
problem
of
this
w
ork
is
that
it
may
return
f
alse
positi
v
e
results.
Chang
et
al.
[5]
pro
vided
simulator
-based
security
definition,
which
is
stronger
than
[2].
Their
inde
x
construction
searches
without
f
alse
positi
v
e
results.
Later
on,
Curtmola
et
al.
[6]
emplo
yed
the
in
v
erted
inde
x
structure
to
describe
the
entire
collection.
Here,
search
time
is
log
arithmic
in
the
total
number
of
the
search
k
e
yw
ords
stored
on
the
serv
er
.
Strizho
v
and
Ray
[9]
and
Cash
et
al.
[3]
de
v
eloped
ne
w
schemes
that
gi
v
e
the
data
o
wner
the
ability
to
update
his
collection
with
a
minimum
leak.
Dynamic
SSE
constructions
with
adv
anced
security
definitions
are
presented
in
[10–15].
The
main
disadv
antage
of
the
abo
v
e
mentioned
schemes
is
that
the
y
use
only
single
k
e
yw
ord
search
which
return
massi
v
e
number
of
results,
where
most
of
them
are
not
rele
v
ant
to
the
users.
Single
Similarity
SSE
schemes:
Some
SSE
schemes
enhance
query
richness
to
support
simil
arity
search,
where
the
scheme
can
retrie
v
e
the
correct
results
e
v
en
with
pro
viding
misspelled
k
e
yw
ords.
In
the
w
ork
of
[7],
each
k
e
yw
ord
is
e
xpanded
by
listing
all
its
v
ariants
that
are
within
a
specific
edit
distance.
All
k
e
yw
ord
v
ariants
are
stored
in
the
searchable
inde
x,
which
require
more
computation
and
communication
costs.
K
uzu
et
al.
[8]
emplo
y
the
Minhash
technique
to
capture
the
Jaccard
distance
between
tw
o
sets.
Such
schemes
support
only
single
k
e
yw
ord
similarity
search.
Multy-k
e
yw
ord
SSE
schemes:
T
o
enhance
se
arch
functionality
,
multi-k
e
yw
ord
search
is
used
to
re-
trie
v
e
only
the
rele
v
ant
data
items.
Ca
o
et
al.
[16]
proposed
the
first
scheme
that
performs
rank
ed
multi-k
e
yw
ord
search,
where
each
document
is
represented
as
a
binary
v
ector
.
The
size
of
this
v
ector
is
determined
by
the
total
number
of
k
e
yw
ords
in
the
v
ocab
ulary
.
V
ectors
are
protected
by
multiplication
with
randomly
generated
ma-
trices.
This
scheme
requires
from
the
data
users
to
kno
w
the
position
of
each
k
e
yw
ord
to
generate
v
alid
search
requests.
Furthermore,
binary
v
ectors
ignore
the
importance
of
each
k
e
yw
ord
within
the
pro
vided
document.
Additionally
,
similarity
scores
are
defined
as
the
number
of
matched
k
e
yw
ords
between
tw
o
v
ectors,
which
is
not
standard
operation
for
ranking
the
returned
results.
Cash
et
al.
[4]
designed
an
ef
ficient
construction
that
supports
conjuncti
v
e
queries.
Such
a
construction
returns
only
the
documents
that
match
all
items
of
the
pro
vided
query
.
Ho
we
v
er
,
pairing-based
solutions
require
intensi
v
e
computational
costs.
¨
Orencik
et
al.
[17]
proposed
an
ef
ficient
rank
ed
multi-k
e
yw
ord
search
scheme.
In
this
scheme,
a
fix
ed-size
v
ector
is
generated
Ef
ficient
multi-k
e
ywor
d
similarity
sear
c
h
o
ver
encrypted
cloud
documents
(A
yad
I.
Abdulsada)
Evaluation Warning : The document was created with Spire.PDF for Python.
512
r
ISSN:
2502-4752
for
each
document.
The
underline
method
does
not
allo
w
for
ranking
results.
The
w
ork
of
[18]
described
an
ef
ficient
scheme
that
solv
ed
the
problem
of
document
ranking
with
multi-k
e
yw
ord
search.
The
y
emplo
yed
Minhash
method
for
generating
fingerprint
items
for
each
file.
The
items
of
the
entire
collection
are
used
to
construct
an
in
v
erted
inde
x.
Ho
we
v
er
,
such
scheme
assumes
the
e
xistence
of
tw
o
non-colluding
serv
ers
to
rank
the
returned
results.
Ne
w
security
impro
v
ements
are
presented
in
[19].
Our
scheme
uses
only
one
serv
er
which
performs
all
ranking
tasks.
Strizho
v
et
al.
[9]
used
inner
product
similarity
and
tf-idf
based
ranking
model.
Such
a
model
uses
onl
y
one
serv
er
to
answer
multi-k
e
yw
ord
queries,
where
results
are
returned
without
sorting.
Such
a
model
uses
inef
ficient
fully
homomorphi
c
encryption
scheme.
Baldimtsi
et
al.
[20]
designed
recently
a
tool
for
sorting
an
encrypted
data.
Such
a
tool
w
as
successfully
utilized
to
solv
e
the
problem
of
rank
ed
search
o
v
er
encrypted
data,
where
tw
o
serv
ers
are
needed.
Recently
,
a
secure
multi-k
e
yw
ord
rank
ed
search
that
release
dynamic
search
functionality
is
proposed
in
[21],
[22].
W
ang
et
al.
[23]
proposed
a
secure
scheme
that
protects
the
search
pattern.
Recently
,
Rani
et
al.
[24]
designed
a
tree-based
inde
x
to
solv
e
the
problem
of
secure
search.
3.
PR
OBLEM
DESCRIPTION
AND
SECURITY
DEFINITION
In
this
section,
the
notations,
formal
description
of
the
problem,
and
the
security
definition
are
pre-
sented.
3.1.
Notations
Let
be
the
s
ecurity
parameter
.
D
=
f
D
1
;
D
2
;
:
:
:
;
D
n
g
is
a
document
collection
of
size
n
,
D
i
=
f
w
1
;
:::;
w
m
g
is
a
set
of
m
distinct
k
e
yw
ords,
C
=
f
C
1
;
C
2
;
:
:
:
;
C
n
g
is
the
encrypted
collection,
id
(
C
i
)
is
the
identifier
of
document
C
i
,
j
C
i
j
is
the
size
of
the
encrypted
document
C
i
,
Q
=
f
Q
1
;
Q
2
;
:
:
:
;
Q
q
g
is
a
collection
of
q
successi
v
e
queries,
where
Q
i
=
f
w
1
;
:::w
s
g
is
a
list
of
s
k
e
yw
ords,
D
B
(
k
w
)
is
the
collection
of
documents
that
match
the
multi-k
e
yw
ord
query
k
w
,
and
S
I
=
f
S
I
1
;
:::;
S
I
n
g
is
the
inde
x,
where
S
I
i
is
the
document
inde
x
of
D
i
.
K
D
and
K
S
represent
secret
k
e
ys
for
encrypting
document
collection
and
searchable
inde
x,
respecti
v
ely
.
The
occurrence
of
k
e
yw
ord
w
i
within
a
gi
v
en
document
is
denoted
by
tf
i
,
whereas
T
is
the
secure
trapdoor
for
a
gi
v
en
search
query
.
Finally
,
t
is
the
user
defined
search
threshold.
3.2.
Pr
oblem
description
W
e
consider
the
follo
wing
setting:
a
data
o
wner
who
o
wns
a
pri
v
ate
collection
of
n
te
xtual
documents
D
=
f
D
1
;
:::;
D
n
g
.
He
intend
to
upload
his
pri
v
ate
collection
into
a
profes
sional
yet
untrusted
serv
er
,
which
pro
vides
data
storage
and
computing
services
with
an
ef
ficient
cost.
Before
outsourcing,
data
o
wner
generates
a
searchable
inde
x
S
I
and
uploads
it
together
with
the
encrypted
v
ersion
of
the
encrypted
documents
C
to
the
remote
serv
er
.
During
search
time,
authorized
users
pro
vide
multi-k
e
yw
ord
search
request
k
w
and
use
secret
k
e
ys
to
construct
a
v
alid
trapdoor
T
.
Once
recei
ving
T
,
the
ser
v
er
scans
the
searchable
inde
x
S
I
to
find
the
most
similar
encrypted
documents
that
match
the
pro
vided
trapdoor
.
Be
yond
what
data
o
wner
allo
ws
to
leak,
the
security
guarantee
pre
v
ents
the
serv
er
from
emplo
ying
the
outsourced
data
set
and
the
pro
vided
trapdoors
to
get
additional
information
about
the
underline
collection
and
the
search
k
e
yw
ords.
3.3.
Security
definition
The
objecti
v
e
of
an
y
SSE
scheme
is
to
protect:
(1)
document
collection
pri
v
ac
y
,
(2)
user
query
pri
v
ac
y
,
and
(3)
searchable
inde
x
pri
v
ac
y
.
Document
collection
pri
v
ac
y
is
protected
using
encryption
before
outsourc-
ing.
Interestingly
,
documents
are
encrypted
by
AES
which
supports
PCP
A
(pseudo-randomness
ag
ainst
chosen
plainte
xt
attack)
security
notion
[6].
Such
a
notion
guarantees
that
the
cipherte
xt
is
indistinguishable
from
truly
random
string.
User
queries
and
document
indices
are
tr
ansformed
into
binary
v
ectors
using
secret
k
e
ys.
Such
v
ectors
hide
both
the
number
and
content
of
the
underline
search
k
e
yw
ords.
The
majority
of
the
current
SSE
schemes
define
a
leakage
function,
which
describes
precisely
what
is
allo
wed
to
be
leak
ed
for
the
adv
ersary
serv
er
.
W
e
list
some
of
the
security
definitions
[6].
Definition
1:
History
(
H
q
)
represents
the
document
collection
D
and
the
set
of
search
queries
Q
.
Such
issue
represents
the
sensiti
v
e
information
that
should
not
be
leak
ed
to
the
serv
er
.
F
ormally
,
H
q
=
(
D
;
Q
)
.
Defini-
tion
2:
Access
pattern
(
A
(
Q
i
)
)
includes
the
collection
of
document
identifiers
D
B
(
Q
i
)
that
satisfy
the
pro
vided
search
query
Q
i
.
F
ormally
,
A
(
Q
i
)
=
D
B
(
Q
i
)
.
Definition
3:
Search
pattern
(
P
)
determines
the
repeated
search
queries,
where
it
is
determined
by
collecting
the
repeated
queries.
It
is
formulated
as
a
square
q
q
matrix,
where
P
(
i;
j
)
=
1
if
f
Q
i
=
Q
j
,
8
i;
j
=
1
;
:
:
:
;
q
.
Definition
4:
T
race
(
T
r
(
H
q
)
)
it
is
the
leakage
function.
Gi
v
en
the
Indonesian
J
Elec
Eng
&
Comp
Sci,
V
ol.
23,
No.
1,
July
2021
:
510
–
518
Evaluation Warning : The document was created with Spire.PDF for Python.
Indonesian
J
Elec
Eng
&
Comp
Sci
ISSN:
2502-4752
r
513
history
H
q
,
trace
is
defined
as:
T
r
(
H
q
)
=
f
(
id
(
C
1
)
;
:
:
:
:;
id
(
C
n
))
;
(
j
C
1
j
;
:
:
:
;
j
C
n
j
)
;
A
(
H
q
)
;
P
(
H
q
)
g
.
Defini-
tion
5:
V
ie
w
(
V
(
H
q
)
)
represents
the
actual
information
that
the
adv
ersary
serv
er
can
see
during
the
e
x
ecution
of
the
SSE
scheme.
F
ormally
,
V
(
H
q
)
=
f
(
id
(
C
1
)
;
:
:
:
;
id
(
C
n
))
;
S
I
;
C
;
Q
g
.
Definition
6:
Adapti
v
e
semantic
security:
SSE
is
adapti
v
e
semantic
secure
if
gi
v
en
T
r
(
H
q
)
,
there
e
xists
S
algorithm
that
can
simulates
V
(
H
q
)
with
probability
1
for
all
probabilistic
polynomial
time
(
P
P
T
)
adv
ersaries,
where
is
a
ne
gligible
proba-
bility
.
Our
scheme
assumes
the
cloud
serv
er
to
be
semi-honest
party
,
where
it
follo
ws
e
xactly
the
construction
steps,
while
try
to
analyze
its
accessed
data
to
get
additional
information
be
yond
what
is
allo
wed
to
learn.
4.
THE
PR
OPOSED
SCHEME
W
e
adopt
a
model
of
tw
o-parties:
the
first
one
is
the
client
(data
o
wner),
and
the
second
one
is
the
serv
er
.
The
client
constructs
an
encrypted
inde
x,
and
outsources
it
along
with
the
encrypted
documents
to
the
serv
er
,
who
pro
vides
lar
ge
storage
and
responds
to
the
search
queries
of
the
client.
The
proposed
scheme
is
composed
of
fi
v
e
polynomial-time
algorithms:
=(
Gen
,
IndexBuilding
,
Makestrapdoor
,
Search
,
Dec
).
1.
(
K
S
;
K
D
)
Gen(
1
)
:
this
algorithm
is
run
by
data
o
wner
.
It
tak
es
the
security
parameter
and
returns
tw
o
secret
k
e
ys
K
S
and
K
D
.
2.
(
S
I
;
C
)
IndexBuilding(
D
;
K
S
;
K
D
)
:
this
algorithm
is
run
by
the
data
o
wner
to
generate
the
searchable
inde
x
S
I
and
the
encrypted
collection
C
,
gi
v
en
the
data
collection
D
,
and
the
secret
k
e
ys
K
D
and
K
S
.
3.
T
Makestrapdoor
(
k
w
;
K
S
)
:
Gi
v
en
the
k
e
yw
ord
set
k
w
,
data
o
wner
runs
this
algorithm
to
gen-
erate
the
secret
trapdoor
T
utilizing
the
secret
k
e
y
K
S
.
4.
R
Search
(
T
;
S
I
)
:
Once
recei
ving
search
trapdoor
T
,
cloud
serv
er
compares
it
with
all
of
S
I
entries
to
find
the
list
R
of
the
most
similar
documents.
5.
D
i
Dec
(
C
i
;
K
D
)
:
gi
v
en
the
sec
ret
k
e
y
K
D
,
data
user
runs
this
algorithm
to
get
the
plainte
xt
form
of
document
C
i
.
4.1.
Index
b
uilding
In
this
w
ork,
a
direct
inde
x
[25]
S
I
i
is
generated
for
each
document
D
i
.
Under
such
a
setting,
search
time
is
linear
to
the
number
of
document
collection
files.
The
process
of
inde
x
generation
includes
three
operations:
k
e
yw
ord
set
e
xtraction,
document
inde
x
construction,
and
trapdoor
generation.
K
e
yw
ord
set
e
xtract
ion:
Gi
v
en
the
document
D
i
,
data
o
wner
runs
a
preprocessing
step
to
refine
t
he
k
e
yw
ord
set
of
that
document.
Such
step
includes
some
techniques
that
are
borro
wed
from
information
re-
trie
v
al
systems.
Such
techniques
includes:
lo
wer
case
con
v
ersion,
tok
enization,
remo
ving
stop
w
ords,
and
stemming
[26].
Stemming
con
v
erts
the
dif
ferent
forms
of
the
same
w
ord
into
their
unique
stem.
F
or
e
xample,
the
w
ords
talks
,
talking
,
and
talk
ed
are
all
con
v
erted
to
the
w
ord
talk
.
This
process
increases
the
probability
of
finding
documents
that
ha
v
e
the
pro
vided
k
e
yw
ord
e
v
en
in
dif
ferent
forms.
After
refining
the
k
e
yw
ord
set,
the
occurrence
tf
j
of
each
k
e
yw
ord
w
j
within
D
i
is
computed.
At
the
end,
document
D
i
is
represented
as
high
dimensional
v
ector:
D
i
=
f
(
w
1
;
tf
1
)
;
:
:
:
;
(
w
m
;
tf
m
)
g
.
Document
inde
x
construction:
Gi
v
en
the
document
k
e
yw
ords
and
their
corresponding
oc
currences,
we
need
to
encode
them
together
into
a
small
fingerprint
of
fix
ed
size
l
(usually
l
=64).
T
o
do
so,
we
ap-
ply
the
Simhash
function
[27],
which
reduces
high
dimensional
v
ectors
such
that
tw
o
similar
te
xtual
in-
puts
will
produce
tw
o
fingerprints
of
a
minimum
Hamming
distance.
The
fingerprint
T
of
the
document
D
i
=
f
(
w
1
;
tf
1
)
;
:
:
:
;
(
w
m
;
tf
m
)
g
is
g
e
nerated
as
follo
ws:
initialize
l
zero
counters
<
s
1
;
:::;
s
l
>
.
Each
k
e
yw
ord
w
i
2
D
i
is
hashed
by
SHA-1.
If
the
bit
j
(
j
2
f
1
;
:::;
l
g
)
equals
1,
then
its
corresponding
counter
s
j
is
incremented
by
tf
i
.
Otherwise
s
j
is
decremented
by
tf
i
.
After
the
processing
of
all
document
k
e
yw
ords,
the
counter
v
alue
s
j
(
j
2
f
1
;
::;
l
g
)
is
set
to
1
if
it
is
positi
v
e.
Otherwise,
it
is
set
to
0.
The
fingerprint
is
the
final
binary
bits
of
<
s
0
;
:::;
s
l
>
.
T
rapdoor
generation:
Fingerprints
are
generated
using
SHA-1,
which
is
a
public
function.
Ho
we
v
er
,
using
such
a
function
raises
security
risks.
This
is
because
adv
ersary
serv
er
can
guess
search
k
e
yw
ords
with
brute
force
attack,
where
it
tries
to
compute
the
fingerprint
for
se
v
eral
inputs
until
it
finds
a
collision.
T
o
solv
e
this
problem,
we
generate
a
trapdoor
for
each
fingerprint
using
k
e
y
based
hashing
function
such
as
HMA
C,
where
k
e
yw
ords
are
hashed
using
a
secret
k
e
y
.
This
k
e
y
will
mak
e
the
brute
force
attack
a
hard
job
.
HMA
C
K
S
:
f
0
;
1
g
!
f
0
;
1
g
l
[28]
is
a
popular
message
authentication
code
that
uses
a
secret
k
e
y
K
S
to
hash
Ef
ficient
multi-k
e
ywor
d
similarity
sear
c
h
o
ver
encrypted
cloud
documents
(A
yad
I.
Abdulsada)
Evaluation Warning : The document was created with Spire.PDF for Python.
514
r
ISSN:
2502-4752
each
k
e
yw
ord
of
the
gi
v
en
document.
Other
steps
of
Simhash
function
still
without
changing.
Algorithm
1
illustrates
inde
x
construction.
T
rapdoors
of
all
document
collection
wil
l
be
the
final
inde
x
that
will
be
uploaded
to
the
serv
er
.
Algorithm
1:
Inde
x
b
uilding
Input
:
T
e
xtual
collection
D
=
f
D
1
;
D
2
;
:
:
:
;
D
n
g
,
secret
k
e
ys:
K
S
,
and
K
D
,
fingerprint
size
l
.
Output:
Searchable
inde
x
S
I
,
encrypted
collection
C
.
f
or
i
1
to
n
do
C
i
E
nc
K
D
(
D
i
);
f
(
w
1
;
tf
1
)
;
(
w
2
;
tf
2
)
;
;
:
:
:
;
(
w
m
;
tf
m
)
g
PreProcessing
(
D
i
);
Initialize
zero
v
ector
<
s
1
:
:
:
s
l
>
;
f
or
j
1
to
m
do
MA
C
H
M
AC
K
S
(
w
j
);
f
or
u
1
to
l
do
if
MA
C
u
=
0
then
s
u
=
s
u
tf
j
else
s
u
=
s
u
+
tf
j
f
or
u
1
to
l
do
if
s
u
0
then
S
I
i
[
u
]
=
0
else
S
I
i
[
u
]
=1
S
I
=
f
S
I
1
;
:::;
S
I
n
g
;
C
=
f
C
1
;
:::;
C
n
g
;
Retur
n
S
I
,
C
;
4.2.
T
rapdoor
construction
Gi
v
en
the
search
k
e
yw
ord
set
k
w
=
f
w
1
;
w
2
;
:
:
:
;
w
s
g
,
data
user
utilizes
the
secret
k
e
y
K
S
,
which
is
obtained
from
the
data
o
wner
via
a
secret
channel,
to
gener
ate
Simhash
fingerprint
T
for
such
a
set.
T
rapdoor
is
constructed
in
the
same
w
ay
of
document
inde
x
generation.
Note
that
the
trapdoor
size
is
l
,
which
is
unrelated
to
the
number
of
k
e
yw
ords
s
,
so
s
will
be
hidden
from
the
adv
ersary
serv
er
.
T
o
construct
a
trapdoor
,
data
user
performs
only
hash
functions
and
fe
w
bitwise
comparisons.
4.3.
Pri
v
acy
pr
eser
ving
sear
ch
Cloud
serv
er
e
v
aluates
the
Hamming
distance
between
the
trapdoor
T
and
each
document
inde
x
S
I
i
,
i
=
f
1
;
:::;
n
g
.
P
articularly
,
he
e
v
aluates
the
XOR
operation
between
tw
o
binary
v
ectors.
Similarity
scores
of
tw
o
binary
v
ectors
are
determined
by
identifying
the
number
of
matched
bits.
T
o
control
the
number
of
returned
results,
users
pro
vide
a
threshold
v
alue
t
,
such
that
only
the
encrypted
files
of
the
top-
t
minimum
scores
are
r
eturned.
The
search
operation
cost
is
il
lustrated
as
follo
ws:
cloud
serv
er
needs
to
perform
a
binary
comparison
of
l
-bit
fingerprint
with
n
entries,
then
it
sorts
the
similarity
scores,
and
selects
only
the
t
-minimum
scores.
Algorithm
2
describes
the
steps
of
serv
er
search
operation.
Once
recei
ving
the
t
encrypted
documents
that
ha
v
e
the
best
matching
scores,
data
user
utilizes
the
secret
k
e
y
K
D
to
decrypt
them.
Algorithm
2:
Pri
v
ac
y
preserving
search
Input
:
Simhash
fingerprint
T
,
threshold
t
,
Searchable
inde
x
S
I
,
and
encrypted
collection
C
.
Output:
encrypted
documents
of
the
top-
t
similarity
scores.
f
or
i
1
to
n
do
XV
ect
XOR
(
S
I
i
,
T
)
;
Scores
[
i
]
P
l
j
=1
XV
ect
[
j
]
;
Sor
t
score
Sort
(
Scores
);
Inde
x
Minimum
t
(
Sor
t
score
);
f
or
j
1
to
t
do
R
j
C
(
Inde
x
[
j
])
;
Retur
n
R
;
Indonesian
J
Elec
Eng
&
Comp
Sci,
V
ol.
23,
No.
1,
July
2021
:
510
–
518
Evaluation Warning : The document was created with Spire.PDF for Python.
Indonesian
J
Elec
Eng
&
Comp
Sci
ISSN:
2502-4752
r
515
5.
SECURITY
DISCUSSION
W
e
emplo
y
the
real
vs
ideal
g
ames
approach
to
pro
v
e
the
security
of
our
proposed
scheme
according
to
Definition
6.
In
the
real
g
ame,
the
client
beha
v
es
as
in
the
proposed
scheme,
while
in
the
ideal
g
ame
,
we
b
uilds
a
simulator
who
emplo
ys
the
leak
ed
information
to
simulate
the
beha
vior
of
the
client
to
w
ards
the
serv
er
.
If
there
is
no
PPT
distinguisher
that
can
distinguish
the
beha
vior
of
the
tw
o
g
ames,
then
the
scheme
is
considered
a
se-
cure.
The
intuition
of
this
approach
is:
gi
v
en
user
pri
v
ate
inputs
H
q
,
accessible
information
to
the
serv
er
V
(
H
q
)
,
and
the
leakage
function
T
r
(
H
q
)
,
then
if
there
e
xists
a
simulator
S
that
emplo
ys
T
r
(
H
q
)
to
simulate
V
(
H
q
)
such
that
no
P
P
T
distinguisher
can
distinguish
V
(
H
q
)
from
the
simulated
one,
then
the
leakage
is
not
useful
and
hence
the
security
is
ensured.
This
is
because
T
r
(
H
q
)
does
not
help
the
adv
ersary
to
learn
an
y
useful
in-
formation
from
V
(
H
q
)
.
F
ormally
,
let
the
original
vie
w
V
(
H
q
)
=
V
(
H
q
)
=
f
(
id
(
C
1
)
;
:
:
:
:;
id
(
C
n
)
;
S
I
;
C
;
Q
g
,
and
the
tr
ace
is
T
r
(
H
q
)
=
f
(
id
(
C
1
)
;
:
:
:
:;
id
(
C
n
))
;
(
j
C
1
j
;
:
:
:
;
j
C
n
j
)
;
A
(
H
q
)
;
P
(
H
q
)
g
.
Let
V
(
H
q
)
be
the
simulated
vie
w
,
which
is
defined
as:
V
(
H
q
)
=
f
(
id
(
C
1
)
;
:
:
:
:;
id
(
C
n
))
;
S
I
;
C
;
Q
g
.
Our
scheme
is
con-
sidered
to
be
secure
if
V
(
H
q
)
is
indistinguishable
from
V
(
H
q
)
.
Document
identifiers
id
(
C
j
)
of
V
(
H
q
)
are
a
v
ailable
in
T
r
(
H
q
)
.
Thus
S
can
simulate
them
simply
by
setting
id
(
C
j
)
=
id
(
C
j
)
.
Simulator
kno
ws
from
T
r
(
H
q
)
the
number
of
outsourced
documents
and
the
real
size
of
e
ach
one.
Thus,
it
can
create
a
simulated
v
ersion
C
j
for
each
document
C
j
by
encrypting
a
random
stream
of
size
j
C
j
j
.
This
beha
vior
cannot
be
distin-
guished
by
an
y
PPT
distinguis
her
due
to
the
security
of
PCP
A
encryption
method.
Similarly
,
S
simulates
S
I
by
generating
n
random
binary
v
ectors
S
I
j
of
size
l
.
S
I
j
can
not
be
distinguished
from
S
I
j
since
it
is
generated
using
a
secure
hash
function
HMA
C.
No
w
,
S
simulates
the
q
successi
v
e
queries
Q
=
f
Q
1
;
Q
2
;
:
:
:
;
Q
q
g
of
V
(
H
q
)
as
follo
ws:
if
Q
j
w
as
not
issued
before,
as
indicated
by
A
(
H
q
)
of
T
r
(
H
q
)
,
then
it
constructs
a
random
Q
j
.
Otherwise
it
sets
Q
j
=
Q
p
,
where
p
is
pre
vious
inde
x
for
Q
j
.
Note
that
8
i;
Q
i
is
indistinguishable
from
Q
i
.
W
e
e
xplain
ho
w
S
can
simulate
V
(
H
q
)
.
Hence,
our
proposed
scheme
is
secure.
6.
EXPERIMENT
AL
EV
ALU
A
TION
W
e
demonstrate
the
ef
ficienc
y
and
ef
fecti
v
eness
of
our
proposed
scheme
by
e
v
aluating
thorough
e
x-
periments.
The
proposed
scheme
is
e
v
aluated
by
Ja
v
a
language
using
64-bit
W
indo
ws
10
operating
system
with
Intel
Core-i7
processor
of
1.8
GHz.
During
the
e
xperiments,
8000
randomly
selected
files
are
from
the
RCV1
database
[29],
a
list
of
570
stop
w
ords
are
used,
Porter
algorithm
[30]
is
used
to
stem
the
remaining
w
ords.
Inde
x
generation
time
is
determined
by
the
parameter
l
.
Figure
1a
sho
ws
the
ef
fect
of
l
for
v
ariable
collection
sizes.
It
is
ob
vious
that
long
Simhash
fingerprints
require
more
processing
time.
0
2
;
000
4
;
000
6
;
000
8
;
000
0
100
200
300
400
Number
of
documents
Inde
x
b
uilding
time
(s)
l
=
64
l
=
128
l
=
256
l
=
384
(a)
0
2
;
000
4
;
000
6
;
000
8
;
000
0
100
200
300
400
500
Number
of
documents
Inde
x
b
uilding
time
(s)
Minhash
Simhash
(b)
Figure
1.
Inde
x
b
uilding
time:
(a)
ef
fect
of
l
on
inde
x
generation
time
and
(b)
comparison
of
our
scheme
with
[18]
in
terms
of
inde
x
b
uilding
time
Our
scheme
is
compared
with
the
w
ork
of
[18]
in
terms
of
inde
x
generation
time.
Figure
1b
demon-
strates
that
the
method
of
[18]
requires
more
time
than
our
w
ork,
since
it
uses
hea
vy
cryptographic
primiti
v
es.
W
e
used
tw
o
e
v
aluation
metrics
to
sho
w
the
ef
fecti
v
eness
of
our
scheme:
precision
and
recall.
Preci-
sion
pr
ec
refers
to
the
percentage
of
rele
v
ant
documents
from
the
o
v
erall
returned
documents,
while
recall
r
ec
Ef
ficient
multi-k
e
ywor
d
similarity
sear
c
h
o
ver
encrypted
cloud
documents
(A
yad
I.
Abdulsada)
Evaluation Warning : The document was created with Spire.PDF for Python.
516
r
ISSN:
2502-4752
refers
to
the
percentage
of
rele
v
ant
documents
retrie
v
ed
among
the
total
number
of
rele
v
ant
documents.
Let
k
w
=
f
w
1
;
w
2
;
:
:
:
;
w
s
g
be
the
set
of
search
k
e
yw
ords,
R
(
k
w
)
be
the
set
of
retrie
v
ed
documents
and
R
(
k
w
)
be
the
set
of
retrie
v
ed
documents
that
include
all
k
e
yw
ord
set
k
w
,
D
B
(
k
w
)
be
set
of
documents
in
the
entire
collection
that
contains
all
k
e
yw
ord
set
k
w
.
Note
R
(
k
w
)
R
(
k
w
)
and
R
(
k
w
)
D
B
(
k
w
)
.
W
e
define
the
precision(
P
r
ec
(
k
w
)
),
recall
(
R
ec
(
k
w
)
),
a
v
erage
precision
(
Av
g
pr
ec
(
k
w
)
)
and
A
v
erage
recall
(
Av
g
r
ec
(
k
w
)
)
as
follo
ws:
P
r
ec
(
k
w
)
=
j
R
(
k
w
)
j
j
R
(
k
w
)
j
;
(1)
R
ec
(
k
w
)
=
j
R
(
k
w
)
j
j
D
B
(
k
w
)
j
(2)
Av
g
pr
ec
(
k
w
)
=
q
X
i
=1
pr
ec
(
k
w
i
)
q
;
(3)
Av
g
r
e
c
(
k
w
)
=
q
X
i
=1
r
ec
(
k
w
i
)
q
(4)
Figure
2a
sho
ws
the
accurac
y
of
results
of
our
proposed
scheme
with
v
ariable
Simhash
size
l
.
In
this
e
xperiment,
150
search
queries
are
pro
vided
with
v
ariable
number
of
k
e
yw
ords
and
only
the
top
20
matching
documents
are
returned.
0
100
200
300
400
0
0
:
2
0
:
4
0
:
6
0
:
8
1
Fingureprint
size
Accurac
y
Precision
Recall
(a)
0
1
2
3
4
5
6
7
8
9
10
0
0
:
2
0
:
4
0
:
6
0
:
8
1
Number
of
k
e
yw
ords
Precision
Simhash
Minhash
(b)
Figure
2.
Ef
fecti
v
eness
e
v
aluation:
(a)
Fingureprint
size
and
(b)
Number
of
k
e
yw
ords
In
the
ne
xt
e
xperiment,
we
e
v
aluate
ho
w
the
number
of
k
e
yw
ords
s
in
the
search
query
can
af
fect
the
result
accurac
y
.
Figure
2b
sho
ws
the
precision
of
results
for
dif
ferent
schemes
with
v
ariable
number
of
search
k
e
yw
ords.
F
or
both
schemes,
precision
decreases
as
the
number
of
k
e
yw
ords
increases
from
1
to
10.
This
is
because
when
the
number
of
search
k
e
yw
ords
increased
the
non
rele
v
ant
retrie
v
ed
documents
will
be
accumulated,
leading
to
lo
wer
precision.
Notice
that,
the
accurac
y
of
our
proposed
scheme
outperforms
MinHash-based
scheme,
since
it
better
inte
grates
the
term
frequenc
y
of
each
k
e
yw
ord
in
the
generated
v
ector
,
whereas
MinHash
method
ignores
such
v
aluable
information.
7.
CONCLUSION
In
this
paper
,
we
proposed
a
practical
scheme
for
a
multi-k
e
yw
ord
similarity
search
o
v
er
encrypted
data.
The
proposed
scheme
answers
the
multi-k
e
yw
ord
queries
and
can
handle
the
misspelled
k
e
yw
ords.
Doc-
uments
are
retrie
v
ed
to
the
user
according
to
specific
ranking
method.
The
scheme
is
constructed
according
to
the
setting
of
direct
inde
xing,
where
a
specific
inde
x
is
constructed
for
each
document.
Simhash
function
Indonesian
J
Elec
Eng
&
Comp
Sci,
V
ol.
23,
No.
1,
July
2021
:
510
–
518
Evaluation Warning : The document was created with Spire.PDF for Python.
Indonesian
J
Elec
Eng
&
Comp
Sci
ISSN:
2502-4752
r
517
is
emplo
yed
to
generate
a
small
fix
ed
size
fingerprint
for
each
document.
M
u
l
ti-k
e
yw
ord
search
queries
are
constructed
in
the
same
w
ay
,
so
the
search
operation
is
e
v
aluated
simply
by
performing
Hamming
distance.
T
o
illustrate
the
security
of
our
scheme,
a
precis
definition
of
s
ecurity
is
presented
along
with
its
formal
proof.
Extensi
v
e
e
xperiments
were
conducted
to
sho
w
the
practical
v
alue
of
our
scheme.
The
proposed
scheme
sho
ws
high
precision
of
70%
and
recall
of
85%.
Our
scheme
requires
395
ms
to
generate
the
inde
x
for
8000
docu-
ments.
REFERENCES
[1]
W
.
Hassan,
T
.-S.
Chou,
X.
Li,
P
.
Appiah-K
ubi,
and
O.
T
amer
,
“Latest
trends,
challenges
and
solutions
in
security
in
the
era
of
cloud
computing
and
softw
are
defined
netw
orks,
”
International
J
ournal
of
Informatics
and
Communic
ation
T
ec
hnolo
gy
(IJ-ICT)
,
v
ol.
8,
no.
3,
pp.
162–183,
2019,
doi:
10.11591/ijict.v8i3.pp162-183.
[2]
E.
Goh,
“Secure
inde
x
es,
”
IA
CR
Cryptol.
ePrint
Ar
c
h.
,
v
ol.
2003,
p.
216,
2003.
[Online].
A
v
ailable:
http://eprint.iacr
.or
g/2003/216
[3]
D.
Ca
sh,
J.
Jae
ger
,
S.
Jarecki,
C.
S.
Jutla,
H.
Kra
wczyk,
M.
Rosu,
and
M.
Steiner
,
“Dynamic
searchable
encryption
in
v
ery-lar
ge
databases:
Data
structures
and
implementation,
”
in
21st
Annual
Network
and
Distrib
uted
System
Security
Symposium,
NDSS
2014,
San
Die
go,
California,
USA,
F
ebruary
23-26,
2014
.
The
Internet
Society
,
2014.
[Online].
A
v
ailable:
https://www
.ndss-symposium.or
g/ndss2014/dynamic-sea
rchable-encryption-v
ery-lar
ge-
databases-data-structures-and-implementation
[4]
D.
Cas
h,
S.
Jarecki,
C.
S.
Jutla,
H.
Kra
wczyk,
M.
Rosu,
and
M.
Steiner
,
“Highly-scalable
searchable
symmetric
encryption
with
support
for
boolean
queries,
”
in
Advances
in
Cryptolo
gy
-
CR
YPT
O
2013
-
33r
d
Annual
Cryptolo
gy
Confer
ence
,
Santa
Barbar
a,
CA,
USA,
A
ugust
18-22,
2013.
Pr
oceedings,
P
art
I
,
ser
.
Lecture
Notes
in
Computer
Science,
R.
Canett
i
and
J.
A.
Garay
,
Eds.,
v
ol.
8042.
Springer
,
2013,
pp.
353–373,
doi:
10.1007/978-3-642-40041-4
20.
[5]
Y
.
Chang
and
M.
Mitz
enmacher
,
“Pri
v
ac
y
preserving
k
e
yw
ord
searches
on
remote
encrypted
data,
”
in
Applied
Crypto
gr
aphy
and
Network
Security
,
Thir
d
International
Confer
ence
,
A
CNS
2005,
Ne
w
Y
ork,
NY
,
USA,
J
une
7-10,
2005,
Pr
oceedings
,
ser
.
Lecture
Notes
in
Computer
Science,
J.
Ioannidis,
A.
D.
K
eromytis,
and
M.
Y
ung,
Eds.,
v
ol.
3531,
2005,
pp.
442–455,
doi:
10.1007/11496137
30.
[6]
R.
Curtmola,
J.
A.
Garay
,
S.
Kamara,
and
R.
Ostro
vsk
y
,
“Searchable
symmetric
encryption:
Impro
v
ed
definitions
and
ef
ficient
constructions,
”
IA
CR
Cryptol.
ePrint
Ar
c
h.
,
v
ol.
2006,
p.
210,
2006.
[Online].
A
v
ailable:
http://eprint.iacr
.or
g/2006/210
[7]
J.
Li,
Q.
W
ang,
C.
W
a
ng,
N.
Cao,
K.
Ren,
and
W
.
Lou,
“Fuzzy
k
e
yw
ord
search
o
v
er
encrypted
data
in
cloud
computing,
”
in
2010
Pr
oceedings
IEEE
INFOCOM
,
2010,
pp.
1-5,
doi:
10.1109/INFCOM.2010.5462196.
[8]
M.
K
uzu,
M.
S.
Isl
am,
and
M.
Kantarcioglu,
“Ef
ficient
similarity
search
o
v
er
encrypted
data,
”
in
2012
IEEE
28th
International
Confer
ence
on
Data
Engineering
,
2012,
pp.
1156-1167,
doi:
10.1109/ICDE.2012.23.
[9]
M.
Strizho
v
and
I.
Ray
,
“Secure
multi-k
e
yw
ord
similarity
search
o
v
er
encrypted
cloud
data
supporting
ef
ficient
multi-user
setup,
”
T
r
ans.
Data
Priv
.
,
v
ol.
9,
no.
2,
pp.
131–159,
2016,
doi:
10.5555/2993206.2993208.
[10]
X.
Song,
C.
Dong,
D.
Y
uan,
Q.
Xu,
and
M.
Zhao,
“F
orw
a
rd
pri
v
ate
searchable
symmetric
encryption
with
optimized
I/O
ef
ficienc
y
,
”
IEEE
T
r
ans.
Dependable
Secur
.
Comput.
,
v
ol.
17,
no.
5,
pp.
912–927,
2020,
doi:
10.1109/TDSC.2018.2822294.
[11]
J.
G.
Chamani,
D.
P
apadopoulos,
C.
P
apamanthou,
and
R.
Jalili,
“Ne
w
constructions
for
forw
ard
and
backw
ard
pri
v
ate
symmetric
searchable
encryption,
”
in
Pr
oceedings
of
the
2018
A
CM
SIGSA
C
Confer
ence
on
Computer
and
Communications
Security
,
CCS
2018,
T
or
onto,
ON,
Canada,
October
15-19,
2018
,
D.
Lie,
M.
Mannan,
M.
Back
es,
and
X.
W
ang,
Eds.
A
CM,
2018,
pp.
1038–1055,
doi:
10.1145/3243734.3243833.
[12]
S.
Chatterjee,
S.
K.
P
.
Puria,
and
A.
Shah,
“Ef
ficient
backw
ard
pri
v
ate
searchable
encryption,
”
J
.
Comput.
Secur
.
,
v
ol.
28,
no.
2,
pp.
229–267,
2020,
doi:
10.3233/JCS-191322.
[13]
I.
Demertzis,
J.
G.
Chamani,
D.
P
apadopoulos,
and
C.
P
apamanthou,
“Dynamic
searchable
encryption
with
small
client
storage.
”
IA
CR
Cryptol.
ePrint
Ar
c
h.
,
v
ol.
2019,
p.
1227,
2019.
[Online].
A
v
ailable:
https://eprint.iacr
.or
g/2019/1227
[14]
G.
Amjad,
S.
Kamara,
and
T
.
Moataz,
“F
orw
ard
and
backw
ard
pri
v
ate
searchable
encryption
with
sgx,
”
in
Pr
oceedings
of
the
12th
Eur
opean
W
orkshop
on
Systems
Security
,
2019,
pp.
1–6,
doi:
10.1145/3301417.3312496.
[15]
L.
Bingjie,
Z.
Jun,
and
C.
Zhenfu,
“
A
multi-user
forw
ard
secure
dynamic
symmetric
searchable
encryption
with
enhanced
security
,
”
J
ournal
of
Computer
Resear
c
h
and
De
velopment
,
v
ol.
57,
no.
10,
p.
2104,
2020.
[16]
N.
Cao,
C.
W
ang,
M.
Li,
K.
Ren,
and
W
.
Lou,
“Pri
v
ac
y-preserving
multi-k
e
yw
ord
rank
ed
search
o
v
er
encrypted
cloud
data,
”
in
IEEE
T
r
ansactions
on
P
ar
al
lel
and
Distrib
uted
Systems
,
v
ol.
25,
no.
1,
pp.
222-233,
Jan.
2014,
doi:
10.1109/TPDS.2013.45.
[17]
C.
¨
Orencik
and
E.
Sa
v
as,
“
An
ef
ficient
pri
v
ac
y-preserving
multi-k
e
yw
ord
search
o
v
er
encrypted
cloud
data
with
ranking,
”
Distrib
uted
P
ar
allel
Databases
,
v
ol.
32,
no.
1,
pp.
119–160,
2014,
doi:
10.1007/s10619-013-7123-9.
Ef
ficient
multi-k
e
ywor
d
similarity
sear
c
h
o
ver
encrypted
cloud
documents
(A
yad
I.
Abdulsada)
Evaluation Warning : The document was created with Spire.PDF for Python.
518
r
ISSN:
2502-4752
[18]
C.
¨
Orencik,
M.
Kantarcioglu,
and
E.
Sa
v
as,
“
A
practical
and
secure
multi-k
e
yw
ord
search
method
o
v
er
encrypted
cloud
data,
”
in
2013
IEEE
Sixth
International
Confer
ence
on
Cloud
Computing
,
2013,
pp.
390-397,
doi:
10.1109/CLOUD.2013.18.
[19]
C.
¨
Orencik,
A.
Selcuk,
E.
Sa
v
as,
and
M.
Kantarcioglu,
“Multi-k
e
yw
ord
search
o
v
er
encrypted
data
with
scoring
and
search
pattern
obfuscation,
”
Int.
J
.
Inf
.
Sec.
,
v
ol.
15,
no.
3,
pp.
251–269,
2016,
doi:
10.1007/s10207-015-0294-9.
[20]
F
.
Baldimtsi
and
O.
Ohrimenk
o,
“Sorting
and
searching
behind
the
curtain,
”
in
F
inancial
Crypto
gr
aphy
and
Data
Security
-
19th
International
Confer
ence
,
FC
2015,
San
J
uan,
Puerto
Rico,
J
anuary
26-30,
2015,
Re
vised
Selected
P
aper
s
,
ser
.
Lecture
Notes
in
Computer
Science,
R.
B
¨
ohme
and
T
.
Okamoto,
Eds.,
v
ol.
8975.
Springer
,
2015,
pp.
127–146,
doi:
10.1007/978-3-662-47854-7
8.
[21]
Z.
Xia,
X.
W
ang,
X.
Sun,
and
Q.
W
ang,
“
A
secure
and
dynamic
multi-k
e
yw
ord
rank
ed
search
s
cheme
o
v
er
encrypted
cloud
data,
”
IEEE
T
r
ansactions
on
P
ar
allel
and
Dist
rib
uted
Systems
,
v
ol.
27,
no.
2,
pp.
340-352,
1
Feb
.
2016,
doi:
10.1109/TPDS.2015.2401003.
[22]
C.
Guo,
R.
Zhuang,
C.
Chang,
and
Q.
Y
uan,
“Dynamic
multi-k
e
yw
ord
rank
ed
search
based
on
bloom
filter
o
v
er
encrypted
cloud
data,
”
IEEE
Access
,
v
ol.
7,
pp.
35826-35837,
2019,
doi:
10.1109/A
CCESS.2019.2904763.
[23]
B.
W
ang,
W
.
Song,
W
.
Lou,
and
Y
.
T
.
Hou,
“In
v
erted
i
nde
x
based
multi-k
e
yw
ord
public-k
e
y
searchable
encryption
with
strong
pri
v
ac
y
guarantee,
”
in
2015
IEEE
Confer
ence
on
Computer
Communications
(INFOCOM)
,
2015,
pp.
2092-2100,
doi:
10.1109/INFOCOM.2015.7218594.
[24]
K.
Pushpa
Rani,
L.
Lakshmi,
C.
Sabitha,
B.
Dhana
Lakshmi,
and
S.
Sreeja,
“T
op-k
search
scheme
on
encrypted
data
in
cloud,
”
International
J
ournal
of
Advances
in
Applied
Sciences
(IJ
AAS)
,
v
ol.
9,
no.
1,
pp.
67–69,
2020,
doi:
10.11591/ijaas.v9.i1.pp67-69.
[25]
G.
S.
Poh,
J.
Chin,
W
.
Y
au,
K.
R.
Choo,
and
M.
S.
Mohamad,
“Searchable
symmetric
encryption:
Designs
and
challenges,
”
A
CM
Comput.
Surv
.
,
v
ol.
50,
no.
3,
pp.
1-37,
2017,
doi:
10.1145/3064005.
[26]
M.
Sanderson,
“Christopher
d.
manning,
prabhakar
ragha
v
an,
hinrich
sch
¨
utze,
Introducti
on
to
Information
Retrie
v
al,
cambridge
uni
v
ersity
press
2008.
ISBN-13
978-0-521-86571-5,
xxi
+
482
pages,
”
Nat.
Lang
.
Eng
.
,
v
ol.
16,
no.
1,
pp.
100–103,
2010,
doi:
10.1017/S1351324909005129.
[27]
M.
Charikar
,
“Similarity
estimation
techniques
from
rounding
algorithms,
”
in
Pr
oceedings
on
34th
Annual
A
CM
Symposium
on
Theory
of
Computing
,
May
19-21,
2002,
Montr
´
eal,
Qu
´
ebec,
Canada
,
J.
H.
Reif,
Ed.
A
CM,
2002,
pp.
380–388,
doi:
10.1145/509907.509965.
[28]
M.
Bell
are,
R.
Canetti,
and
H.
Kra
wczyk,
“K
e
ying
hash
functions
for
message
authentication,
”
in
Advances
in
Cryptol-
o
gy
-
CR
YPT
O
’96
,
N.
K
oblitz,
Ed.
Berlin,
Heidelber
g:
Springer
Berl
in
Heidelber
g,
1996,
pp.
1-15,
doi:
10.1007/3-
540-68697-5
1.
[29]
D.
D.
Le
wis,
Y
.
Y
ang,
T
.
G.
Rose,
and
F
.
Li,
“Rcv1:
A
ne
w
benchmark
collection
for
te
xt
cate
gorization
research,
”
J
.
Mac
h.
Learn.
Res.
,
v
ol.
5,
p.
361–397,
Dec.
2004.
[30]
M.
F
.
Porter
,
“
An
algorithm
for
suf
fix
stripping,
”
Pr
o
gr
am:
electr
onic
libr
ary
and
information
systems
,
v
ol.
40,
no.
3,
pp.
211-218,
2006,
doi:
10.1108/00330330610681286.
Indonesian
J
Elec
Eng
&
Comp
Sci,
V
ol.
23,
No.
1,
July
2021
:
510
–
518
Evaluation Warning : The document was created with Spire.PDF for Python.