International Journal of Electrical and Computer Engineering (IJECE)
Vol. 6, No. 3, June 2016, pp. 980~985
ISSN: 2088-8708, DOI: 10.11591/ijece.v6i3.7208
Journal homepage: http://iaesjournal.com/online/index.php/IJECE
URL ATTACKS: Classification of URLs via Analysis and Learning

M. Rajesh1, R. Abhilash2, R. Praveen Kumar3
1 Department of Computer Engineering, M.A.M School of Engineering, India
2 Device Associate, Amazon Development Center, Chennai, India
3 Cognizant Technology Solutions, Chennai, India
Article Info
Article history:
Received Dec 19, 2014
Revised Feb 1, 2016
Accepted Feb 16, 2016

ABSTRACT
Social networks such as Twitter and Facebook have grown remarkably in recent years. The share of tweets or messages that contain URLs increases day by day. As the number of URLs increases, so does the probability of fabrication, both through their HTML content and through the use of tiny URLs. It is therefore important to classify URLs by means of modern techniques. A conditional redirection method is used here, by which the URLs are classified and the target page that the user needs is reached. Learning methods are also introduced to differentiate the URLs, so that fabrication is prevented. In addition, the classifiers efficiently detect suspicious URLs using a link analysis algorithm.
Keywords: HTML, Link Analysis, Social networks, Tiny URL, URL attacks

Copyright © 2016 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author:
M. Rajesh,
Department of Computer Science,
M.A.M School of Engineering,
Tamil Nadu, India.
Email: rajesh.manoharan89@gmail.com
1. INTRODUCTION
Social networking plays an important role as an information-sharing service for transferring messages in the form of tweets or other modes. When users need to share a URL with their close ones, they usually use one of the URL shortening services. The proliferation of social networking [1] has led to an increase in spam activity. Spammers send unsolicited messages for various purposes. Hashtags and shortened URLs [2] such as t.co are frequently abused by spammers. Hashtags are used to denote a topic or the latest trend, which is why spammers abuse them. The ability to disguise a URL's destination has made Twitter and other social networks an attractive target for spammers.

In the first study focusing on spam detection [3], a number of user accounts are collected. Users are identified as spammers by means of special methods and algorithms, and the false positive rate is determined. Here we collect a specific number of user accounts from a small environment, such as a college or a small-scale industry, to detect spamming. This will act as a standalone application for finding spam URLs.
2. RESEARCH METHOD
1. A conditional redirection scheme is used to ignore suspicious URLs, so that fabrication is no longer possible.
2. New features such as learning concepts, classification and link analysis are used to differentiate suspicious from unsuspicious URLs.
3. Data sets consisting of URLs of suspicious and unsuspicious sites were taken, and they are classified by supervised learning methods.
The ultimate goal is to develop a conditional redirection scheme that protects users against suspicious URLs. The current visitor may be using a normal browser. A normal browser does not know that the URL is being redirected to a suspicious site, and so the user is redirected to a malicious page. Here the content of the suspicious URL is not retrieved, since such sites do not reveal their secrets to normal browsers. An analysis algorithm is therefore needed for classification.
2.1. System details
The proposed system consists of the following components: data collection, extraction, learning and classification [4] (Figure 1).
2.1.1. Data Collection
In this phase, messages containing URLs are collected from the public stream and their URL redirections are followed. Tweets are obtained through the streaming APIs, and the IP address of each URL is looked up. IP addresses that appear to be malicious are blocked and skipped. It is known that crawlers cannot reach malicious URLs [5] when conditional redirection is used.
Table 1. Training data set
Phases     Label      Users
Training   Spam       104
           Non Spam   1483
Testing    Spam       104
           Non Spam   1548
2.1.2. Data extraction
This phase involves grouping domains and extracting feature vectors. The phase also monitors the message queue. If several URLs share the same IP address, their sites are replaced by the one in the data set that was found to be benign.
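The domain grouping step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `group_domains_by_ip` and the `FAKE_DNS` lookup table are hypothetical, and a real system would replace the table with actual DNS resolution.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_domains_by_ip(urls, resolve):
    """Group the host names found in `urls` by the IP address `resolve` returns."""
    groups = defaultdict(set)
    for url in urls:
        host = urlparse(url).hostname
        if host is None:
            continue  # skip strings that are not absolute URLs
        groups[resolve(host)].add(host)
    # Sort host names so the result is deterministic.
    return {ip: sorted(hosts) for ip, hosts in groups.items()}


# Hypothetical lookup table standing in for real DNS resolution (assumption).
FAKE_DNS = {
    "a.example": "203.0.113.5",
    "b.example": "203.0.113.5",
    "c.example": "198.51.100.7",
}

groups = group_domains_by_ip(
    ["http://a.example/x", "http://b.example/y", "https://c.example/"],
    FAKE_DNS.get,
)
```

Here `a.example` and `b.example` fall into the same group because they share an IP address, which is the situation the extraction phase looks for.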
2.1.3. Learning
A supervised algorithm is run in offline mode to classify the URLs, and classification is also made on a rank basis (link analysis). Account status is used for labeling: URLs from suspended accounts are considered suspicious, whereas URLs from active (non-suspended) accounts are considered benign.
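The labeling rule can be written down in a few lines. The sketch below assumes each collected message is available as an `(account, url)` pair and that the set of suspended accounts is already known; both the data layout and the name `label_urls` are illustrative assumptions.

```python
def label_urls(messages, suspended_accounts):
    """Return (url, label) pairs, where 1 = suspicious and 0 = benign.

    A URL is labelled suspicious when the account that posted it is suspended.
    """
    labelled = []
    for account, url in messages:
        label = 1 if account in suspended_accounts else 0
        labelled.append((url, label))
    return labelled


# Hypothetical sample data (not taken from the paper's data set).
messages = [("alice", "http://a.example/"), ("spam_bot", "http://b.example/")]
labels = label_urls(messages, suspended_accounts={"spam_bot"})
```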
2.1.4. Classification
Input vectors [6] are used to classify URLs as suspicious or unsuspicious. LIBLINEAR methods were used earlier to implement this classifier. Classifier algorithms such as AdaBoost, Naïve Bayes and Support Vector Classification (SVC) were compared, and a link-analysis-based algorithm, power iteration, was selected because it classifies the URLs effectively and decreases the false positive rate to a greater extent.
Figure 1. System components
The system components are shown in Figure 1. In the data collection phase, tweets or mails containing URLs are collected so that their redirections, which may be suspicious or unsuspicious, can be followed. In the extraction phase, domains that are identical are grouped for classification. Machine learning (supervised learning) is performed in the next phase, and classification is done in the final phase by the link analysis algorithm.
2.2. Steps in machine learning with given data sets
Supervised learning (machine learning) takes a known set of inputs and known responses, and builds a model that generates reasonable predictions for the response to new data.
[Diagram: Known Data and Known Responses -> Model; New Data -> Model -> Predicted Responses]
This method is based on prediction. As a real-world example, suppose we want to predict whether a person will have a heart attack within a year. This can be done by taking training samples that consist of age, height, weight, blood pressure and so on. The existing data are combined into a model that can predict whether a person will have a heart attack within a year. Supervised learning splits into two broad categories. Classification is used for responses that consist of two values, such as 'true' or 'false'; classification algorithms apply to nominal data sets. Regression is used for responses that are real numbers, such as miles per gallon for a particular car. It is advised to create a regression model first, because such models are often more computationally efficient.
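The "known inputs plus known responses give a model, and the model predicts responses for new data" workflow can be illustrated with a deliberately simple classifier. The sketch below uses a nearest-centroid rule, which is a stand-in chosen for brevity and is not one of the classifiers compared in this paper; the function names and the toy data are assumptions.

```python
def fit_centroids(inputs, responses):
    """Build a model: the mean feature vector (centroid) of each response class."""
    sums, counts = {}, {}
    for x, y in zip(inputs, responses):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(model, x):
    """Predict the response whose centroid is closest to the new input x."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(model, key=lambda y: squared_distance(model[y]))


# Toy training data (hypothetical): two features per sample, boolean response.
known_inputs = [[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [3.8, 4.0]]
known_responses = [False, False, True, True]

model = fit_centroids(known_inputs, known_responses)
new_prediction = predict(model, [4.1, 3.9])
```

Because the response takes only two values, this is a classification problem in the sense described above; a regression model would instead return a real number.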
Figure 2. URL Redirection scheme
The data set consists of a number of user accounts, from which spam accounts were detected. The data sets were separated into two parts: training and testing. The features are further divided into a phishing data set and a legitimate data set. Taking 1000 phishing and 1000 legitimate URLs into account, the percentage of legitimate URLs is clearly increased by using the power iteration method.
Table 2. Legitimate and Phishing data in given data sets
Types                    Legitimate   Phishing
IP Address               0%           0.04%
Hexadecimal Character    0%           0.01%
Suspicious symbol        0%           0.01%
Age of domain            35%          75%
Page rank feature        1.2%         88%
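The first three feature types in Table 2 are lexical and can be read directly from the URL string. The following sketch shows one possible way to extract them; the paper does not define the features precisely, so treating a percent-encoded byte as a "hexadecimal character" and the `@` sign as the "suspicious symbol" are assumptions made for illustration, as is the function name `url_features`.

```python
import ipaddress
import re
from urllib.parse import urlparse


def url_features(url):
    """Extract three boolean lexical features from a URL string."""
    host = urlparse(url).hostname or ""
    try:
        ipaddress.ip_address(host)  # raises ValueError if host is not an IP literal
        has_ip = True
    except ValueError:
        has_ip = False
    # Assumption: "hexadecimal character" means a percent-encoded byte such as %2E.
    has_hex = re.search(r"%[0-9a-fA-F]{2}", url) is not None
    # Assumption: the "suspicious symbol" is '@', which can hide the real host.
    has_symbol = "@" in url
    return {
        "ip_address": has_ip,
        "hex_character": has_hex,
        "suspicious_symbol": has_symbol,
    }


features = url_features("http://192.0.2.10/login%2Ephp")
```

The remaining two features (age of domain and page rank) need external data such as registration records and link graphs, so they are not covered by this sketch.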
[Diagram labels: Email/Tweet in URL; URL Classification; Suspicious; Benign; Link Analysis; Learning]
2.3. Algorithm: Power Iteration method
Power iteration is based on the eigenvalues of a given matrix. The algorithm works with eigenvalues and eigenvectors and is also known as Von Mises iteration. The method can be used when the matrix is sparse and very large. It computes only one eigenvalue (the dominant one), and its convergence can be slow. The PageRank iteration algorithm, from which the ranking equation is produced, is given below.
Algorithm: Power iteration
Initial Phase:
  Generate data sets that comprise URLs with suspicious and unsuspicious links.
Classification Phase:
  Initially consider the page count as 1.
  Increment the count each time a user visits the page.
  Attempt to compare with the data sets obtained.
  If successful, consider as normal.
  Else, consider as spam.

Power-Iterate(G)
  P_0 <- e/n
  r <- 1
  Repeat
    P_r <- (1 - d)e + dA^T P_{r-1}
    r <- r + 1
  Until ||P_r - P_{r-1}|| < ε
  Return P_r

After all the conditions are satisfied, the page rank equation is produced:

  P = (1 - d)e + dA^T P

where P is the principal eigenvector, r is the iteration count (initially 1), e is a column vector of ones, n is the number of pages, A is the link matrix of the graph G, and d is the damping factor.
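A minimal Python sketch of the Power-Iterate procedure is shown here. It assumes the link graph is given as an adjacency list and that A is the row-normalised link matrix (each page splits its rank equally among its out-links); the default values d = 0.85 and ε = 1e-8, the L1 norm, and the `max_iter` safety limit are assumptions, since the paper does not state them.

```python
def power_iterate(links, d=0.85, eps=1e-8, max_iter=1000):
    """Iterate P_r = (1 - d)e + d A^T P_{r-1} until the L1 change is below eps.

    links[i] is the list of page indices that page i links to.
    """
    n = len(links)
    p = [1.0 / n] * n  # P_0 = e / n
    for _ in range(max_iter):
        new_p = [1.0 - d] * n  # the (1 - d)e term
        for i, outlinks in enumerate(links):
            if not outlinks:
                continue  # a page without out-links passes on no rank
            share = d * p[i] / len(outlinks)
            for j in outlinks:
                new_p[j] += share  # the d A^T P_{r-1} term
        diff = sum(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if diff < eps:
            break
    return p


# Hypothetical three-page graph: pages 0 and 1 both link to page 2, page 2 links to page 0.
ranks = power_iterate([[2], [2], [0]])
```

In this toy graph page 2 receives the most links and ends up with the highest rank, while page 1, which nobody links to, keeps only the baseline value 1 - d.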
3. RESULTS AND ANALYSIS
As discussed earlier, we use the power iteration method because it shows the highest AUC and the lowest FP (False Positive) rate. AUC is the area under the ROC curve. It is a measure of how well a parameter can distinguish between two diagnostic groups (for example, diseased/normal). In our case the two groups are phishing and legitimate URLs. We compared various classifiers, such as L1-regularized and L2-regularized classifiers and SVC (Support Vector Classification), and a comparison table was obtained that includes the power iteration method. Here we took 10000 sample tweets and found 156896 tweets to be benign and 156,896 to be malicious.
Table 3. Comparison with different classifiers
CLASSIFIER      AUC      ACCURACY %   FP %   FN %
L2R-LR          0.9000   91.11        1.56   6.54
L2-loss SVC     0.8995   90.79        1.49   6.54
Link Analysis   0.9028   91.96        1.13   7.01
SVC             0.8984   91.32        1.33   6.86
From the above table we can conclude that the Link Analysis method increases the accuracy level and thereby reduces the false positive rate.
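For readers who want to reproduce this kind of comparison on their own data, the sketch below shows how the reported quantities are commonly computed. The paper does not give its exact definitions, so the formulas here (FP rate over all benign samples, FN rate over all malicious samples, AUC as the fraction of correctly ordered positive/negative score pairs) are standard textbook definitions assumed for illustration, and all counts and scores are hypothetical.

```python
def rates(tp, fp, tn, fn):
    """Accuracy, false positive rate and false negative rate, in percent."""
    total = tp + fp + tn + fn
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "fp_rate": 100.0 * fp / (fp + tn),
        "fn_rate": 100.0 * fn / (fn + tp),
    }


def auc(scores_pos, scores_neg):
    """AUC as the probability that a positive sample scores above a negative one."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical confusion counts and classifier scores (not the paper's data).
summary = rates(tp=90, fp=5, tn=95, fn=10)
area = auc([0.9, 0.8, 0.4], [0.5, 0.3])
```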
3.1. Performance Analysis
The performance of any proposed and implemented algorithm is one of the main criteria to be considered during research. In our research, performance analysis was carried out for the implemented algorithm using the open source performance testing tool JMeter. The pages in which the algorithm has been implemented are given as input to JMeter. Table 4 and Figure 3 were obtained as the result of performance testing.
Table 4. Performance Analysis
Label                 Number Of Samples (Count)   Average Response Time (ms)   Error %
Home Page             500                         5446                         0
Classification Page   500                         4256                         0
Total                 1000                        4851                         0
Table 4 was obtained as a result of performance testing with JMeter. Label indicates the pages in which the algorithm has been implemented. Number of Samples indicates the number of virtually created users that accessed the page. Average Response Time indicates the mean time, in milliseconds, taken to serve a sample. Error % indicates the percentage of requests that failed during the testing process. Figure 3 shows the graphical representation of the performance analysis obtained from JMeter. In Figure 3, the X axis indicates the page that was tested and the Y axis indicates the response time in milliseconds. The bar value indicates the average response time (5446 ms for the Home Page and 4256 ms for the Classification Page).
Figure 3. Graphical Representation of Performance Analysis
3.2. Inferences
From the performance analysis made on the implemented algorithm, the following inferences were made.
1. The error % of the performance analysis is 0 (zero). This indicates that the application of the algorithm is functionally good.
2. The average response time of the implemented algorithm is 4851 ms over 1000 samples. From this we can infer that the response time for a single sample is about 4.85 seconds (less than 5 seconds). This indicates that the performance of the implemented algorithm holds good.
3. From the above two inferences, it can be concluded that the implemented algorithm is functionally and non-functionally good.
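The overall figure used in the second inference can be checked from the two per-page rows of Table 4, since the total is the sample-weighted mean of the per-page averages. The short sketch below performs that check; the function name `weighted_average_ms` is illustrative.

```python
def weighted_average_ms(rows):
    """Sample-weighted mean of per-page average response times.

    rows is a list of (sample_count, average_response_time_ms) pairs.
    """
    total_samples = sum(count for count, _ in rows)
    total_time = sum(count * avg for count, avg in rows)
    return total_time / total_samples


# Per-page rows from Table 4: Home Page and Classification Page.
table4_rows = [(500, 5446), (500, 4256)]

overall_ms = weighted_average_ms(table4_rows)  # (500*5446 + 500*4256) / 1000
overall_seconds = overall_ms / 1000
```

The result is 4851 ms, that is, roughly 4.85 seconds per sample, which agrees with the Total row of Table 4 and stays under the 5 second mark mentioned above.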
3.3. Discussions
The main goal of our research is to propose and implement an algorithm that is simple, scalable and highly efficient (in terms of rate of detection). Apart from these criteria, the performance of the implemented algorithm is a very important aspect of the research work. Performance analysis of the implemented algorithm has been discussed in Section 3.1 of this paper. Here the implemented algorithm is compared with earlier research work in this area using the criteria described above. From our analysis, Table 5 was compiled; it highlights that the implemented algorithm is far better than the earlier research work. In Table 5, the value 'Yes' indicates that the criterion has been fully satisfied, 'No' indicates that the criterion has not been considered or not satisfied, and 'Partial' indicates that the criterion has been partially satisfied.
Table 5. Comparison of Implemented Algorithm Vs Earlier Research works
S.No   Research Work                              Simple    Scalable   Efficiency   Performance
1      Dhanalkshmi Renganayakulu [1]              Partial   Partial    No           Yes
2      Jelena Isacenkova and Oliver Thonnard [2]  Yes       Partial    Partial      No
3      Kelin.F and Strohmaier.M [3]               Partial   Partial    Partial      Partial
4      Lee.S and Kim.J [4]                        Partial   Partial    No           Yes
5      Nazpar Yazdanfar, Alex Thomo [5]           Yes       Partial    Partial      Yes
6      Stringhini.G, Kruegel.C and Vigna.C [6]    Yes       Yes        Partial      No
7      Song.J, Lee.S and Kim J [7]                Yes       Yes        Yes          No
8      Zachary Miller [8]                         Yes       No         Partial      Partial
9      Our Work                                   Yes       Yes        Yes          Yes
4. CONCLUSION
Table 5 was derived from our analysis, and from it we can infer that the implemented algorithm is simple, efficient, high-performing and scalable compared with the earlier research works. Conventional methods appear to be ineffective at handling conditional redirection and at keeping normal users from being redirected to suspicious pages. Unlike the conventional systems, classification via analysis is robust. As the statistics in Table 3 show, the accuracy and performance of the system with this method are high. In the future, the process has to be extended to handle dynamic redirections.
REFERENCES
[1] D. Renganayakulu, "Detecting Malicious URLs in E-mail," AASRI Procedia Elsevier, vol. 4, pp. 125-131, 2013.
[2] J. Isacenkova and O. Thonnard, "Inside Scam Jungle: a closer look at 419 scam email operations," EURASIP Journal on Information Security, 2014.
[3] F. Kelin and M. Strohmaier, "Short links under Attack: Geographical Analysis of Spam in URL Shortener Network," Proc. 23rd ACM Conf. Hypertext and Social Media (HT), 2012.
[4] S. Lee and J. Kim, "Warning Bird: A Near Real-Time Detection System for Suspicious URLs in Twitter Stream," IEEE transactions on secure computing, vol/issue: 10(3), 2013.
[5] N. Yazdanfar and A. Thomo, "Collaborative-Filtering for Recommending URLs to Twitter Users," Procedia, vol/issue: 19(3), pp. 412-419, 2013.
[6] G. Stringhini, et al., "Detecting Spammers on Social Networks," Proc. 26th Ann. Computer Security Applications Conf. (ACSAC), 2010.
[7] J. Song, et al., "Spam Filtering in Twitter Using Sender-Receiver Relationship," Proc. 14th International Symp. Recent Advances in Intrusion Detection (RAID), 2011.
[8] Z. Miller, "Twitter spammer detection using data stream clustering," Information Sciences Elsevier, vol. 260, pp. 64-73, 2013.