International Journal of Electrical and Computer Engineering (IJECE)
Vol. 10, No. 4, August 2020, pp. 3615~3622
ISSN: 2088-8708, DOI: 10.11591/ijece.v10i4.pp3615-3622

Journal homepage: http://ijece.iaescore.com/index.php/IJECE

Data loss prevention by using MRSH-v2 algorithm

Basheer Husham Ali, Ahmed Adeeb Jalal, Wasseem N. Ibrahem Al-Obaydy
Computer Engineering Department, College of Engineering, AL-Iraqia University, Iraq

Article Info

Article history:
Received Mar 6, 2019
Revised Dec 15, 2019
Accepted Jan 11, 2020

ABSTRACT
Sensitive data may be stored in different forms. Not only legal owners but also malicious people are interested in obtaining sensitive data. Exposing valuable data to others leads to severe consequences. Customers, organizations, and companies lose money and reputation due to data breaches. There are many reasons for data leakage. Internal threats such as human mistakes and external threats such as DDoS attacks are the two main reasons for data loss. In general, data may be categorized into three kinds: data in use, data at rest, and data in motion. Data Loss Prevention (DLP) tools are good tools for identifying important data. DLP can analyze data content and send feedback to administrators so that they can decide on actions such as filtering, deleting, or encryption. DLP tools are not a final solution for data breaches, but they are considered good security tools for eliminating malicious activities and protecting sensitive information. There are many kinds of DLP techniques, and approximation matching is one of them. Mrsh-v2 is one type of approximation matching. It is implemented and evaluated using the TS dataset and a confusion matrix. Finally, Mrsh-v2 achieves high true positive and sensitivity scores and a low false negative score.

Keywords: Approximation matching, Data breach, DLP, Mrsh-v2, Sensitive data

Copyright © 2020 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author:
Basheer Husham Ali Al-Mafrachi,
Computer Engineering Department, College of Engineering, AL-Iraqia University, Baghdad, Iraq.
Email: Basheer.husham@aliraqia.edu.iq

1. INTRODUCTION
In the past, communication among people was difficult. People depended only on the post office to exchange or share data or packages. Modern technology has solved this problem. Many kinds of devices are available nowadays, such as laptops, computers, iPads, iPods, and mobile phones. The internet helps push the wheel of development. Using the internet, people can carry out many kinds of activities to serve their needs, such as shopping on online websites, exchanging e-mails, studying online, and making electronic reservations. Public and private organizations, universities, hospitals, and companies store, send, or receive different kinds of data. These data may be texts, books, images, or videos. They may be represented in different formats such as txt, pdf, doc, gif, jpg, mp4, xls, ppt, or exe. These data can be stored on different kinds of devices, in online storage, or in the cloud.
Most of these forms contain sensitive information such as full names, full home addresses, social security numbers, mobile phone numbers, e-mail addresses, credit card numbers, or dates of birth. Not only legal owners but also malicious people are interested in obtaining these data to serve their needs. Malicious people use advanced means to obtain this information, such as viruses, junk mail, spyware, or ransomware. Exposing these valuable data to others leads to severe danger.
Customers, organizations, and companies lose money and reputation due to data breaches. They lose billions of dollars, their reputations, and their brands. There are many examples of this. One example is that more than 650 malicious attacks were reported before 2013. According to a report of the Open Security Foundation (OSF), more than one thousand incidents of data loss happened during 2013 [1]. Moreover, more than two hundred million login credentials were compromised before 2012 according to a Symantec report [2]. In addition, more than 165 million documents were stolen during 2011 according to Verizon [3]. Finally, more than 60 million sensitive files were breached from Sony, and more than 5 million login records of LinkedIn users were exposed during 2012 [1].
Because of these dangers, researchers have designed many useful tools against data breaches. Data Loss Prevention (DLP) is considered one kind of data breach prevention. It is used to identify important data and prevent it from falling into the wrong hands [4]. This tool can be used to protect sensitive data at rest, which is data stored on an end-user device or system such as a hard disk or a removable disk. It can also be used to identify, monitor, or protect data in use, which is data actively handled while the system is running, and data in motion, which is data transmitted over the network [5].
There are many kinds of security tools other than DLP that can be used in this regard, such as Intrusion Prevention Systems (IPS) and Intrusion Detection Systems (IDS). Although all of them are considered security tools, there is a main difference between them. DLP is responsible only for capturing and identifying sensitive information, whereas IPS and IDS are in charge of all kinds of dangers or threats that data may face.
DLP has two main phases for identifying important or sensitive information. The first is generating fingerprints or predefined patterns, which are based on extracting certain features from known files. The second is comparing the fingerprints of new files with the existing fingerprints derived in the first stage. Finally, if the result of the comparison is positive, the detected file may be encrypted, blocked, removed, or transferred to a safe place [6, 7].
Furthermore, DLP tools were first used in 2006. They are not a final solution for data breaches, but they are considered good security tools for eliminating malicious activities and protecting sensitive information [8]. In the market, DLP may go by other names such as information monitoring and prevention, information loss protection, data analyzing and prevention, or data leakage prevention [5].
Many well-known companies have developed DLP tools. For example, Palo Alto Networks produced a security tool that protects data in motion by analyzing, monitoring, and detecting sensitive information transmitted over wired or wireless links. AmXecure released a security design for identifying data leakage called Privacy ID [7]. RSA and McAfee have the best DLP security tools for identifying information leakage, as stated in [9, 10]. Finally, Websense and McAfee designed DLP tools that have three stages: data control, data endpoint, and data identification [11, 12].
The rest of the paper is organized as follows. Section 2 explains the popular reasons that lead to data leakage. Section 3 presents the categories of where sensitive data can be stored. Solution methods are presented in detail in Section 4. The implementation and evaluation of mrsh-v2 are given in Sections 5 and 6, respectively. Finally, the conclusion is presented in the last section.

2. POPULAR REASONS FOR DATA LOSS
Well-known organizations and companies lose their reputation and billions of dollars due to breaches of sensitive information. There are two primary causes of data breaches: external and internal threats.

2.1. Internal threats
First of all, internal threats are considered a primary cause of data leakage. Human mistakes are at the top of this type of threat, and people can make many kinds of mistakes. For instance, negligent employees who leave their computers, mobile phones, iPads, or other devices on public transportation or in places such as restaurants, markets, or stores cost their companies a lot of money if these devices fall into the wrong hands. Removable devices such as flash disks or disk drives that are left in internet cafés also belong to this type of threat. Stolen computers accounted for more than forty percent of data loss. As reported in [13], these kinds of problems lead to more than forty-five percent of data leakage in the healthcare sector [14]. According to the statistics, human errors were the main cause of information leakage during 2014 [15].
According to the latest survey carried out by the Group of Healthcare Organizations and Ethics Association, which represents a collection of expert researchers in this field, almost forty percent of data breach incidents happened due to misplaced documents such as important files. The survey also showed that almost 30 percent of data leakage incidents were caused by lost removable devices [16]. As also stated in [16], the public counsel's office of the state of Massachusetts obligated Goldthwait, a healthcare billing company, to pay a fine of more than one hundred and thirty thousand dollars. The fine was imposed because police officers found, in public trash, a USB disk that belonged to an employee of the company and contained more than 65,000 sensitive records related to its patients.
In addition, text messages and e-mails are another form of information leakage. Staff who work in places such as hospitals, schools, universities, companies, or other organizations may send important documents that contain sensitive data outside the boundaries of their workplace by using messaging applications such as Gmail, Yahoo, or Hotmail. They can also use other kinds of communication applications such as Viber, Facebook, WhatsApp, Telegram, or Instagram. They may send such documents to the wrong destination; these types of errors are named miscellaneous mistakes [14]. Ultimately, this can lead to data breaches.
Furthermore, staff may leave one organization and start a new job with another. They may take sensitive data from their previous job with them and expose it to others or to their new employer. According to the statistics, almost fifty percent of staff take confidential data from their previous job when they leave [5]. Atul Malhotra was charged with passing important data from his previous employer, IBM, to his new one, HP [5].
Staff of certain companies believe that their important documents are destroyed when they delete them. On the other hand, these files can be recovered by using certain programs and can be used against their companies unless they are removed permanently. For instance, every device such as a computer, scanner, printer, or phone has memory inside it, and the data in this memory can be recovered even if users remove it. Finally, using weak algorithms helps employees with bad intentions to take data. An incorrect configuration of an organization's system invites attackers to carry out malicious activities such as unauthorized access to the system, according to the Verizon report in 2008 [17, 18].

2.2. External threats
Not only internal threats but also external threats have a strong effect on data breaches. Attackers would like to obtain data in order to extort money from its owners. Therefore, they have developed sophisticated applications to deceive users of large organizations such as healthcare companies, financial institutions, and other business organizations. Point-of-Sale (POS) intrusions are one type of external threat. A POS intrusion is a kind of malicious activity that collects customers' Visa, credit, or debit card data at store checkouts. Sensitive data such as card numbers, expiration dates, and passwords can be collected in one file by attackers. According to the available statistics, more than 10 gigabytes of data were breached at the checkouts of Target, one of the largest store chains in the USA. Almost forty million credit, debit, and Visa card records were stolen when customers swiped their cards at checkout. Seventy people's names, dates of birth, and addresses were also stolen due to these malicious acts [14, 19]. Finally, not only Target stores but also Neiman Marcus stores were targeted by POS attacks. Two thousand credit card records were breached by this attack [19].
Moreover, another kind of external danger is crimeware. Attackers may deceive users into installing malicious applications on their electronic devices without their knowledge in order to obtain their data. Attackers can do that by sending victims a poisoned e-mail that contains malicious applications. Victims may also visit compromised websites by mistake. As soon as victims download and install the malicious application on their devices, attackers can spy on, monitor, or phish their data [20].
Finally, there are many other external threats that lead to data breaches. One of them is the Distributed Denial of Service (DDoS) attack. Statistics showed that DDoS attacks have increased in the last decade [14]. DDoS attacks overload servers with a very large number of packets in order to shut down services to legitimate users. According to a report of the Worldwide Infrastructure Association, seventy-five percent of bank data leakage was caused by DDoS attacks [21].

3. PROTECTION OF IMPORTANT INFO
Data is very important not only for legitimate owners but also for attackers. Many DLP vendors have described the threats that data faces wherever it is stored. Vontu is one of them. It stated that almost two out of eight hundred texts transmitted over the air may contain sensitive data, and that two out of a hundred network messages contain private data. Vontu also mentioned in its report that eight out of ten companies lost data stored on their laptops. In addition, two out of four companies lost information stored on removable drives [8]. Important data can be classified into different categories: data in motion, data at rest, and data in use. These categories are presented in detail in the next three subsections.

3.1. Data in motion
Data in motion is information that is transmitted over the network, whether through wireless or wired links. This data may be electronic books, Word documents, Excel documents, pictures, videos, voice recordings, or PowerPoint slides. Attackers may intercept these communications between two legitimate users in order to steal people's data. They can do that by developing malicious applications, or by exploiting vulnerabilities in network algorithms, network applications, or protocols. DLP solutions are very important for monitoring the network packets that are transferred over the channel. DLP provides feedback or a report to the network administrator if there is any malicious activity. Administrators can then take actions on the data such as blocking, filtering, or encryption.

3.1.1. Network monitor
DLP can perform many tasks, as mentioned earlier. One of them is network monitoring. In general, there are two kinds of network monitors: passive and active. The DLP monitor in particular is considered a passive one. An active monitor can be deployed on both the client and server sides. The server side can obtain a full report about the transmission medium, such as the number of packets, packet loss, throughput, bandwidth, protocol type, delay time, source IP address, destination IP address, destination port, and source port. On the other hand, a passive monitor may be a device that is deployed on only one side (either the client side or the server side) [22]. It is used as a packet sniffer that gathers information in order to analyze or examine packets or flows and identify malicious activities. This can increase the level of confidentiality, performance, and efficiency of the network. DLP resembles an Intrusion Detection System (IDS) in identifying malicious acts and notifying the person in charge of the network. However, an IDS is not designed to prevent data leakage [23]. Finally, the DLP network monitor can be deployed near the router or switch that connects all devices together, in order to control all incoming and outgoing packets [24].

3.1.2. FTP protocol and E-mail
E-mail is the most common way to send data over the internet among people around the world. Users can transfer different kinds of files, which puts e-mail at the top of the list of data leakage channels. Users can receive poisoned links, photos, or other documents. They may also receive executable programs such as botnet clients. As soon as users download these files to their devices, their important data may be in danger. As a result, a DLP solution is an important way to protect e-mail contents, using a few methods. One way is that some DLP tools force e-mail applications to accept only small attachments instead of large ones. This ensures that employees cannot send large amounts of important data to external parties. Furthermore, DLP tools can notify the persons in charge of the network when attacks happen. Finally, DLP tools can encrypt or even block data if there is any suspicion.
Important documents can also be sent by another method, the File Transfer Protocol (FTP). This protocol faces security challenges. Data may be changed or breached on the server side when malicious persons attack this protocol. For instance, important data related to the American army in Iraq were exposed by the Associated Press because of a lack of FTP security [25]. Another consequence of FTP vulnerability occurred when almost eight thousand records belonging to SAIC were exposed, as stated in [25]. DLP tools address FTP problems, but they are not enough to eliminate them. Therefore, Managed File Transfer (MFT), which can be used to transfer documents safely, may work together with DLP to mitigate the dangers of FTP [26].

3.1.3. Filtering, bridge, and blocking solutions
Another action that can be taken after DLP detects data leakage is blocking the data from being breached. Packets that carry data not identified as sensitive can pass through the DLP. However, packets that carry important information can be blocked from passing through. Blocking, filtering, and installing a bridge are all approaches that can be used in this regard.
First of all, a bridge is a network device that can be used to connect computer devices to form a network. It can also be used to connect internal networks with outside networks. A bridge device can perform deep content analysis on the packets that pass through it and stop transferring packets when it finds important data [5]. This device collects incoming packets to form a flow table. Each flow is a group of packets that share the same characteristics, such as source IP address, destination IP address, source MAC address, and destination MAC address. The bridge depends mainly on recording and storing the destination and source MAC addresses of incoming packets in its table. After a while, the device has enough information to decide which traffic may be blocked based on MAC addresses. This increases the security level [18].
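
The flow-table idea described above can be sketched in a few lines of Python. This is only an illustration, assuming packets have already been parsed into dictionaries; the field names (`src_ip`, `dst_ip`, `src_mac`, `dst_mac`) and the helper `is_blocked` are hypothetical and do not come from any particular bridge or DLP product.

```python
from collections import defaultdict


def flow_key(packet):
    """Packets with the same four addresses belong to the same flow."""
    return (packet["src_ip"], packet["dst_ip"],
            packet["src_mac"], packet["dst_mac"])


def build_flow_table(packets):
    """Group parsed packets into flows keyed by their address tuple."""
    table = defaultdict(list)
    for packet in packets:
        table[flow_key(packet)].append(packet)
    return dict(table)


def is_blocked(packet, blocked_macs):
    """Hypothetical policy: drop traffic to or from a blocked MAC address."""
    return (packet["src_mac"] in blocked_macs
            or packet["dst_mac"] in blocked_macs)
```

A real bridge would learn the addresses from live traffic and combine this decision with content inspection; the sketch only shows the grouping and the MAC-based decision.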
A proxy server is a device with high-end characteristics such as a fast microprocessor, a large random access memory (RAM), and a large amount of storage space. This device may be deployed between the internal network and the external network to perform deep inspection. DLP can obtain the packets that pass through the proxy server for analysis. The Internet Content Adaptation Protocol (ICAP) running on the proxy can send a copy of the packet flow to the DLP so that sensitive information can be identified. Finally, DLP can perform full inspection and analysis of incoming packets, as mentioned earlier. When sensitive data is detected, it can take actions such as blocking, filtering, and encryption. This tool may also break the communication between the two sides. For example, this can be done by sending a TCP Reset (RST) packet to break the connection [5, 27].

3.2. Data at rest
Data at rest is inactive data that is not currently being used in the system. It may be stored in different forms such as Word documents, Excel spreadsheets, electronic books (pdf), PowerPoint slides (ppt), videos, voice recordings, images, or other file types. These data can be stored on different kinds of devices such as laptops, computers, server databases, workstations, internal or external hard drives, tape drives, cloud systems, phones, iPads, or iPods [4].

3.3. Data in use
Data in use is best described as any kind of data that is used while the system is active. In other words, it is all the data that users deal with when they use their devices. For instance, data held in Random Access Memory (RAM) may be considered data in use. This is because RAM is empty when the system is off; however, as soon as users start up the system and run programs, data is loaded into RAM. As another example, the Central Processing Unit (CPU) has memory that consists of a few kinds of small registers. Each register has special tasks such as storing temporary results, pointers, memory addresses, increment counts, and decrement counts. These data are considered data in use because they are active while the system works. Data stored on off-line media such as DVD, CD, and Blu-ray that are used while the system is active are also considered data in use. Data stored on floppy disks, internal hard drives, external hard drives, and removable disks are all data in use if they are used while the system is executing. Data handled in office applications such as Word, Excel, PowerPoint, or Outlook are all data in use. Data written to the terminal during the execution of programs written in languages such as Java, C, C++, Python, or MATLAB may also be of this type [4, 5].
DLP is a good solution for this kind of data. For example, applying constraints on machines to prevent data loss is one DLP solution. Another is limiting the use of programs that let users transmit important information outside their devices. Finally, DLP tools restrict employees' access to data content in order to eliminate data leakage [5].

4. TECHNIQUES FOR DLP
DLP tools may use various methods to analyze data content. According to the SANS Institute [28], seven types of methods can be used to implement DLP tools.
First of all, the most popular method for implementing DLP tools is called regular expressions, or rule-based matching. This method looks for specific sensitive information such as social security numbers, user names, e-mail addresses, debit or Visa card numbers, or phone numbers. This technique is suitable for identifying patients' records, employees' records, electricity bills, phone bills, hospital bills, or bank statements. However, it is not appropriate for detecting images, videos, or voice recordings. Its false positive rate is also high. In other words, a large share of the data detected by this method does not match the original sensitive data.
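
As an illustration of the rule-based approach, the following Python sketch scans text with a few regular expressions. The three patterns and the function name `scan_text` are assumptions made for this example; production DLP rule sets are far stricter and usually add checks such as checksum validation.

```python
import re

# Hypothetical, deliberately simple rules for illustration only.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def scan_text(text):
    """Return {rule name: list of matched strings} for every rule that fires."""
    hits = {}
    for name, pattern in PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[name] = found
    return hits
```

Because the rules look only at the shape of the text, a part number such as `000-00-0000` is reported as a social security number, which is the kind of false positive the paragraph above refers to.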
Another method for identifying important data is database fingerprinting. This method looks for a combination of important fields that matches the data available in a database. The combination might be any set of fields, such as debit card numbers and full names, full names and phone numbers, or e-mail addresses and card numbers. This method is well suited to identifying a selection of sensitive data. The amount of data detected by this method that does not match the original sensitive data is low; in other words, this method produces a low false positive rate. However, like the previous approach, it is not suitable for identifying unstructured data such as videos, images, or voice recordings.
Moreover, another approach used for DLP is exact file matching. This method is based on computing fingerprint signatures of documents that contain important data. The next step is comparing these fingerprint signatures with those of new files to find matches. It can be used with all kinds of documents. It also has a low false negative rate. The mrsh-v1, ssdeep, and sdhash algorithms can be used to implement this method.
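
The two steps above (register fingerprints of known files, then compare new files against them) can be sketched as follows. Note that this sketch swaps in a plain SHA-256 digest from Python's `hashlib` as the fingerprint, which is an assumption of the example and not one of the algorithms named in the text; the class name `ExactMatcher` is likewise hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Fingerprint of a file's content; here simply its SHA-256 digest."""
    return hashlib.sha256(data).hexdigest()


class ExactMatcher:
    """Stores fingerprints of known sensitive files and checks new files."""

    def __init__(self):
        self._known = {}  # fingerprint -> document name

    def register(self, name: str, data: bytes) -> None:
        self._known[fingerprint(data)] = name

    def match(self, data: bytes):
        """Return the name of the matching known file, or None."""
        return self._known.get(fingerprint(data))
```

With a cryptographic digest, changing a single byte of the file produces a different fingerprint, so only byte-identical copies are detected; this is the limitation that similarity-preserving schemes are meant to relax.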
Furthermore, partial document matching is another technique. This approach searches for complete or incomplete matches with sensitive data. The rolling hash method is an example of this approach. It is effective in detecting text data. However, it is not suitable for videos, photos, or voice recordings.
In addition, another technique is the statistical method. This method is based on mathematical equations and statistics. A Bayesian approach can be used to implement it. This method is suitable for very large datasets. However, it generates high false positive and high false negative rates. It also requires a massive dataset to produce accurate results.
The conceptual/lexicon method is another technique that can be used to implement DLP. This approach is a collection of rules and dictionaries that can be used to find suspicious behavior and detect important data. This method is appropriate for detecting sexual harassment, private trading through a work account, and illegal stock exchange practices. However, it generates a high false positive rate.
Finally, the last method is categories. This technique is a combination of the previous methods that can be used to build a model for DLP, for example, looking for a specific e-mail address, a specific user name, and one full name together. According to reports, exact file matching is the best approach [28]. Thus, mrsh-v2 is the algorithm implemented in the next section to show the capabilities of this method.

5. IMPLEMENTATION
As mentioned earlier, approximation file matching is one technique that can be used to identify data breaches. It works by identifying leaked files such as PDF or Word documents. This technique has two phases. The first is generating fingerprints or signatures, which are based on extracting certain features from known files. The second is comparing the fingerprints of new files with the existing fingerprints derived in the first stage. The result of a comparison falls within the range of 0 to 100. When the score for two files is high, the files are similar to each other; when the score is low, the files are not similar to each other.
This paper implemented and evaluated Mrsh-v2 (multi-resolution similarity hashing), which is one algorithm based on the approximation matching technique [6, 29].
This paragraph explains the first stage of the algorithm in detail. The input, a sequence of bytes (b1, b2, …, bn), is divided and grouped into several windows, each 7 bytes long. These windows (w1, w2, …, wm) are then processed using the idea of a rolling hash algorithm. This idea is simply based on removing the first element from the old window and inserting a new element to form the new window. The end of a chunk is determined by a pseudo-random function (PRF), evaluated for each element, and the chunk size (c): if PRF ≡ −1 (mod c) for a certain element, no further elements are added to the chunk; otherwise, new elements keep being added to form the chunk. Each chunk is then hashed using the FNV algorithm. The main goal is a chunk size of 160 bytes and a fingerprint size of about 0.5% of the input [6].
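The chunking step described above can be sketched as follows. This is a minimal illustration, not the reference MRSH-v2 code: the rolling function used here (the sum of the bytes in the current 7-byte window) is a hypothetical stand-in for the PRF, the chunk hash is assumed to be 32-bit FNV-1a, and the function names are invented for this example.

```python
WINDOW = 7    # window length in bytes, as stated in the text
CHUNK = 160   # target chunk size c, as stated in the text


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a hash (assumed here as the chunk hash)."""
    h = 0x811C9DC5                          # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF   # multiply by the FNV prime, keep 32 bits
    return h


def split_chunks(data: bytes, window: int = WINDOW, c: int = CHUNK) -> list[bytes]:
    """Cut `data` wherever the rolling value of the last `window` bytes is -1 mod c."""
    chunks, start = [], 0
    rolling = 0                             # stand-in PRF: sum of the bytes in the window
    for i, byte in enumerate(data):
        rolling += byte                     # insert the new element
        if i >= window:
            rolling -= data[i - window]     # remove the oldest element
        window_full = i + 1 >= window
        if window_full and rolling % c == c - 1:   # PRF == -1 (mod c): end of chunk
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])         # remaining bytes form the last chunk
    return chunks


def chunk_hashes(data: bytes) -> list[int]:
    """Hash every chunk; these values are what gets inserted into the Bloom filters."""
    return [fnv1a_32(chunk) for chunk in split_chunks(data)]
```

Because a boundary depends only on the last 7 bytes, inserting or deleting bytes early in a file shifts nearby chunk boundaries but leaves later chunks, and therefore their hashes, unchanged.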
A Bloom filter was used to implement the second phase. A Bloom filter is a method for answering set-membership queries. It depends on a set of input values and on independent hash functions. Consider a Bloom filter E of n bits, all of which are initially set to false. In addition, a set of independent hash functions Hf (Hf1, Hf2, …, Hfn) maps each input to numbers between 0 and n−1. For each element e to be inserted into E, the result of Hf(e) is an index (position) in E, and the bit at that position is set to true.
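A minimal Bloom filter along these lines might look like the sketch below. The size of 2048 bits, the five hash functions, and the use of salted SHA-256 digests to obtain independent indices are illustrative assumptions, not parameters taken from this paper; the class and method names are likewise hypothetical.

```python
import hashlib


class BloomFilter:
    def __init__(self, n_bits: int = 2048, n_hashes: int = 5):
        self.n = n_bits
        self.n_hashes = n_hashes
        self.bits = [False] * n_bits     # every position starts as false
        self.count = 0                   # number of inserted elements

    def _positions(self, item: bytes):
        # Derive n_hashes indices in [0, n-1] from salted SHA-256 digests.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.n

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True        # set the bit at each hashed position
        self.count += 1

    def __contains__(self, item: bytes) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))

    def bits_set(self) -> int:
        return sum(self.bits)
```

A membership test can return a false positive when unrelated elements happen to set the same bits, but it never returns a false negative for an element that was added.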
The second stage compares the fingerprints of new files with the existing fingerprints. To implement this stage, two Bloom filters, E1 and E2, are used. Let e1 and e2 be the number of bits that are set to true in E1 and E2, respectively. The number of bits that are set to true in both filters is k = |E1 ∩ E2|. To compare E1 and E2, k is compared with a cutoff score S: if k is larger than S, the similarity score is high; otherwise, it is 0. S is computed from the maximum (Max) and minimum (Min) number of bits that overlap by chance between E1 and E2. Therefore,

$$S = \beta \cdot (\mathrm{Max} - \mathrm{Min}) + \mathrm{Min} \qquad (1)$$

where β = 0.3, the value that gave the best experimental results. Max is defined as in (2):

$$\mathrm{Max} = \min(e_1, e_2) \qquad (2)$$

Min, however, is calculated as in (3):

$$\mathrm{Min} = n \cdot \left(1 - q^{Hf \cdot e_1'} - q^{Hf \cdot e_2'} + q^{Hf \cdot (e_1' + e_2')}\right) \qquad (3)$$

where n is the number of bits of the Bloom filter and Hf is the number of hash functions, as mentioned earlier in the description of the Bloom filter; e_1' is the number of elements inserted into Bloom filter E1, and e_2' is the number of elements inserted into E2. Finally, q is the rate at which a given bit remains false (zero) in the Bloom filter when a new element is added, and it is defined as in (4):

$$q = 1 - \frac{1}{n} \qquad (4)$$

Therefore, (5) can be used to find the similarity ratio (Sim) between two Bloom filters:

$$\mathrm{Sim} = \begin{cases} 0, & k \le S \\ 100 \cdot \dfrac{k - S}{\mathrm{Max} - S}, & \text{otherwise} \end{cases} \qquad (5)$$

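The comparison in (1) to (5) can be written as one short function. The sketch assumes that Max is the smaller of e1 and e2 and that the last exponent in (3) also carries the factor Hf. The default filter size of 2048 bits and the five hash functions are illustrative values, and the function and parameter names are hypothetical.

```python
def similarity(k: int, e1: int, e2: int, e1_elems: int, e2_elems: int,
               n: int = 2048, hf: int = 5, beta: float = 0.3) -> float:
    """Similarity ratio (0 to 100) of two Bloom filters E1 and E2.

    k          bits set to true in both filters
    e1, e2     bits set to true in E1 and in E2
    e1_elems   number of elements inserted into E1
    e2_elems   number of elements inserted into E2
    n          number of bits per Bloom filter (assumed default)
    hf         number of hash functions (assumed default)
    """
    q = 1 - 1 / n                                             # (4)
    min_bits = n * (1 - q ** (hf * e1_elems)
                      - q ** (hf * e2_elems)
                      + q ** (hf * (e1_elems + e2_elems)))    # (3)
    max_bits = min(e1, e2)                                    # (2)
    s = beta * (max_bits - min_bits) + min_bits               # (1)
    # The second condition only guards against division by zero
    # when the inputs are inconsistent (Max not larger than S).
    if k <= s or max_bits <= s:
        return 0.0
    return 100 * (k - s) / (max_bits - s)                     # (5)
```

With these assumptions, two filters whose set bits coincide completely (k = e1 = e2) score 100, while an overlap no larger than the cutoff S scores 0, so overlaps that could be explained by chance are discarded.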
6. EVALUATION
The MRSH-v2 algorithm was evaluated using a confusion matrix [29, 30]. A confusion matrix provides many metrics that can be used in an evaluation. The metrics used here are true positive (TP), false positive (FP), false negative (FN), true positive rate (TPR, also called sensitivity or recall), false positive rate (FPR, also called fall-out or probability of false alarm), and false negative rate (FNR, also called miss rate). TP means that a file is identified correctly by the algorithm; in other words, the identified file is similar to one that is stored in the database. Sensitivity, or recall, is the rate of files that are correctly identified as similar to files that exist in the database. FP means that MRSH-v2 detects files that do not exist in the database. Fall-out, or probability of false alarm, is the rate of files that are detected by the algorithm but have no corresponding file in the database. FN means that MRSH-v2 fails to detect files that are available in the database. Miss rate is the rate of files that are available in the database but are not detected by the algorithm.
MRSH-v2 was evaluated in a network environment using the TS dataset, which is publicly available online. This dataset contains several kinds of files, such as pdf, exe, doc, gif, xls, ppt, and txt. Approximately three thousand files from the TS dataset were transferred over the network, generating almost two hundred and ninety thousand packets. Of the 290314 packets, 249670 carried files that are available in the dataset and were detected correctly by MRSH-v2, as shown in Table 1. However, 40643 packets were identified by MRSH-v2 as carrying files even though those files are not available in the dataset. On the other hand, 29031 packets carried files that were not discovered by MRSH-v2 although those files are available in the dataset. Finally, the rest of Table 1 shows the number of packets for each kind of file in detail in terms of FP, TP, and FN.
Finally, the value of TPR, or sensitivity, was 0.85 out of 1, as shown in Figure 1. This means that a high proportion of the different kinds of files was detected correctly by MRSH-v2. FPR and FNR were 0.14 and 0.1 out of 1, respectively. This means that MRSH-v2 produces a low number of errors in general for different types of files. In particular, MRSH-v2 detects jpg, gif, and pdf files well, because for these types TPR is high and FNR and FPR are low, as shown in Figure 1.
Table 1. Values of FP, TP, and FN
File Type   FP      TP       FN
jpg         31934   278701   8709
gif         29031   281604   5806
doc         40643   214832   72578
xls         52256   235154   52256
ppt         40643   267088   20321
pdf         34837   278701   11612
txt         40643   252573   34837
exe         52256   243863   43547
total       40643   249670   29031
Figure 1. FPR, TPR, and FNR values by using MRSH-v2
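The text does not state how the rates are derived from the packet counts. As a rough check, the sketch below assumes each rate is the corresponding count divided by the 290314 transferred packets; under that assumption the "total" row of Table 1 gives about 0.86, 0.14, and 0.10, close to the reported 0.85, 0.14, and 0.1. The function name and the per-total normalisation are assumptions of this example, not something the paper confirms.

```python
TOTAL_PACKETS = 290314   # number of transferred packets, from the text

# (FP, TP, FN) per file type, copied from Table 1
TABLE_1 = {
    "jpg":   (31934, 278701, 8709),
    "gif":   (29031, 281604, 5806),
    "doc":   (40643, 214832, 72578),
    "xls":   (52256, 235154, 52256),
    "ppt":   (40643, 267088, 20321),
    "pdf":   (34837, 278701, 11612),
    "txt":   (40643, 252573, 34837),
    "exe":   (52256, 243863, 43547),
    "total": (40643, 249670, 29031),
}


def rates(fp: int, tp: int, fn: int, total: int = TOTAL_PACKETS):
    """Return (TPR, FPR, FNR), each as a fraction of all transferred packets (assumed)."""
    return tp / total, fp / total, fn / total


tpr, fpr, fnr = rates(*TABLE_1["total"])

# For comparison: the usual confusion-matrix definition of recall.
fp, tp, fn = TABLE_1["total"]
textbook_tpr = tp / (tp + fn)
```

Note that the usual definition, TPR = TP / (TP + FN), gives a somewhat higher value of about 0.896 for the same row. FPR cannot be computed in the usual way, FP / (FP + TN), because Table 1 does not report true negatives.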
7. CONCLUSION
It is very important to protect data wherever it is stored. Data can exist in three main states: data in motion, data at rest, and data in use. Data is important not only to its legitimate owners but also to attackers. Data Loss Prevention (DLP) tools are good tools for identifying sensitive data. DLP can analyze data content and send feedback to administrators so that they can make decisions such as filtering, deleting, or encrypting. There are many kinds of DLP techniques. The best one is approximation file matching, and the MRSH-v2 algorithm is an example of this approach. This algorithm was implemented and evaluated using the publicly available TS dataset. The confusion matrix results showed that this algorithm has high TP and TPR. On the other hand, MRSH-v2 has low FP, FN, FPR, and FNR.
REFERENCES
[1] "2019 Data Breach Investigations Report," Verizon, 2019. [Online]. Available: https://enterprise.verizon.com/resources/reports/2019-data-breach-investigations-report.pdf.
[2] D. Antoniades, et al., "Accurate Traffic Categorization," Proceedings of IST Broadband Europe, pp. 1-6, 2006.
[3] W. Ashford, "DDoS is the Most Common Method of Cyber-Attack on Financial Institutions," 2016. [Online]. Available: https://ComputerWeekly.com.
[4] F. Breitinger and I. Baggili, "File Detection on Network Traffic Using Approximate Matching," Journal of Digital Forensics, Security & Law, vol. 9, no. 2, pp. 23-35, 2014.
[5] J. Beeskow, "Reducing Security Risk using Data Loss Prevention Technology," Journal of The Healthcare Financial Management Association, pp. 108-112, 2015.
[6] F. Breitinger and H. Baier, "Similarity Preserving Hashing: Eligible Properties and a New Algorithm MRSH-v2," International ICST Conference on Digital Forensics and Cyber Crime, pp. 167-182, 2012.
[7] B. Blevins, "Best of Data Loss Prevention," Information Security, pp. 13-15, 2014.
[8] A. Burroughs, "Data Breaches Cause Worry," Smart Business Orange County, 2015.
[9] A. Cecil, "A Summary of Network Traffic Monitoring and Analysis Techniques," Computer Systems Analysis, pp. 4-7, 2006.
[10] S. Gumaste, et al., "Proxy Server Experiment and the Behavior of the Web," International Journal of Advanced Research in Computer Science, vol. 4, no. 1, pp. 84-87, 2013.
[11] "Data Loss Prevention Keeping your Sensitive out of the Public Domain," EY, 2011. [Online]. Available: http://www.ey.com/Publication/vwLUAssets/EY_Data_Loss_Prevention/$FILE/EY_Data_Loss_Prevention.pdf
[12] L. Grandia, "Nine Key Cyber Threats Identified in Verizon Data Breach Report," Health Management Technology, vol. 35, no. 6, 2014.
[13] "The Practical Executive's Guide to Data Loss Prevention," Whitepaper, pp. 2-17, 2019. [Online]. Available: https://cdw-prod.adobecqms.net/content/dam/cdw/on-domaincdw/brands/forcepoint/whitepaper-practical-executives-guide-data-loss-prevention-en.pdf.
[14] "Internet security threat report," 2019trends, Symantec, Inc., vol. 24, 2019. [Online]. Available: https://www.symantec.com/content/dam/symantec/docs/reports/istr-24-2019-en.pdf.
[15] J. Jaeger, "Human Error, Not Hackers, Cause Most Data Breaches," Compliance Week, vol. 10, no. 110, pp. 56-57, 2013.
[16] R. Hiesh, "Improving HIPAA Enforcement and Protecting Patient Privacy in a Digital Healthcare Environment," Loyola University Chicago Law Journal, vol. 46, no. 1, pp. 175-223, 2014.
[17] "McAfee Total Protection for Data Loss Prevention," Solution brief, McAfee, 2019. [Online]. Available: https://www.mcafee.com/enterprise/en-us/assets/solution-briefs/sb-total-protection-for-dlp.pdf.
[18] H. Tuttle, "Hacking Away at the Bottom Line," Risk Management, 2014.
[19] L. Musthaler, "The True Cause of Data Breaches," NetworkWorld Asia, pp. 6-6, 2008.
[20] S. Naidu, "Data in Motion-Securing Businesses on the Go," SDA Asia Magazine, pp. 46-48, 2009.
[21] N. Wynne and B. Reed, "Magic Quadrant for Content-Aware Data Loss Prevention," Gartner Research, 2013.
[22] A. Papadogiannakis, et al., "Improving the Performance of Passive Network Monitoring Applications using Locality Buffering," IEEE Xplore, pp. 151-157, 2007.
[23] K. Košt'ál, et al., "Management and Monitoring of IoT Devices Using Blockchain," The Association of Digital Forensics, Security and Law (ADFSL), pp. 1-12, 2019.
[24] L. Strauss, "Data breach study: Criminal attacks now leading cause," J. of health care compliance, pp. 61-63, 2015.
[25] "Survey Cites Human Error as Biggest Cause of Data Breaches," Magazine Article - Information Management, 2015. [Online]. Available: https://www.questia.com/magazine/1G1-436228015/survey-cites-human-error-as-biggest-cause-of-data.
[26] "The GoAnywhere book of secure file transfer project examples," GoAnywhere manag. file transf., pp. 1-33, 2019. [Online]. Available: https://www.infosecurityeurope.com/__novadocuments/585230?v=636906728355170000.
[27] J. Wu, et al., "Keystroke and Mouse Movement Profiling for Data Loss Prevention," Journal of Information Science and Engineering, pp. 23-42, 2015.
[28] R. Mogull and Securosis LLC, "Understanding and Selecting a Data Loss Prevention Solution," Technical report, SANS Institute, 2007.
[29] V. Gupta, "File detection in network traffic using approximate matching," MS Thesis, Institutt for Telematikk, 2013.
[30] K. Ting, "Confusion Matrix," Encyclopedia of Machine Learning and Data Mining, Springer, Boston, 2017.