TELKOM
NIKA Indonesia
n
Journal of
Electrical En
gineering
Vol.12, No.6, Jun
e
201
4, pp. 4330 ~ 4
3
3
6
DOI: 10.115
9
1
/telkomni
ka.
v
12i6.461
0
4330
Re
cei
v
ed O
c
t
ober 2, 20
13;
Revi
se
d De
cem
ber 28, 20
13; Accepted
Jan
uary 21, 2
014
Software Vulnerability Analysis Method Based on
Adaptive-K Sequence Clustering
Di Wu
1,2
, Jiadong Re
n
1
*
1
Colle
ge of Info
rmation Sci
enc
e and En
gi
neer
ing, Yans
ha
n Univers
i
t
y
, Qin
hua
ng
dao 0
6
6
004, Ch
in
a
2
Departme
n
t of Information a
n
d
Electron
ic En
gin
eeri
ng,
He
b
e
i Univ
ersit
y
of
Engin
eeri
ng,
HanD
an 0
5
6
0
3
8
,
Chin
a
*Corres
p
o
ndi
n
g
author, e-ma
i
l
: bestmoog
oo
@16
3
.com
A
b
st
r
a
ct
Softw
are vuln
e
r
abil
i
ty an
alys
is
has
bec
ome
a
hot to
p
i
c rec
e
ntly. How
e
ver,
the trad
ition
a
l
meth
ods
for analy
z
i
ng s
o
ftw
are vulner
abil
i
ty have
hi
gher
false positive rate
. In this pap
er, ada
p
t
ive K function
i
s
defined, and S
VAAKSC
(
Softw
are vuln
erab
il
ity ana
lysis
me
thod
b
a
se
d on
ada
ptive-K se
q
uenc
e cluster
i
n
g
)
is pres
ented.
T
he col
l
ected
obj
ects in soft
w
a
re vu
ln
erab
i
lity seq
uenc
e
datab
as
e SVS
D are
pretreat
ed t
o
equ
al l
e
n
g
th v
e
ctors. Moreov
er, accord
in
g to ad
aptiv
e-K b
a
sed s
e
q
u
e
n
ce
clusteri
ng
alg
o
r
ithm,
all s
o
ftwa
r
e
vuln
erab
iliti
e
s i
n
SVSD are cl
ustered i
n
to K clusterin
g
. Afterw
ards, by matchin
g
the si
m
i
lariti
es betw
e
e
n
detected v
u
ln
e
r
abil
i
ty from sof
t
w
a
re and eac
h clusteri
ng
ce
nter, w
hether the detect
ed
vul
nera
b
il
ity is a rea
l
softw
are vuln
erab
ility can
be ju
dge
d. Finally,
th
e correspo
ndi
ng a
n
a
lysis rep
o
rt is obtain
ed. T
h
e
exper
imental r
e
sults and analysis s
how that
SVAAKSC has lower false
posi
tive r
a
te and better analys
is
tim
e
.
Ke
y
w
ords
: softw
are vulnera
b
ility a
nalys
is, sequ
enc
e cl
ust
e
rin
g
, ada
ptive
-
K, false positiv
e rate
Copy
right
©
2014 In
stitu
t
e o
f
Ad
van
ced
En
g
i
n
eerin
g and
Scien
ce. All
rig
h
t
s reser
ve
d
.
1. Introduc
tion
Information
secu
rity tech
n
o
logy ha
s b
e
en a
c
cepted
and
widely a
pplied i
n
a lot
of fields
recently [1].
However, the s
a
fety of network
an
d informatio
n system face
s
great
challe
n
ge
becau
se of the incursion
of hackers.
Software
vulnerability is one of
the root ca
uses of
informatio
n
secu
rity proble
m
s. It is the
d
e
fect in
be
ha
vioral lo
gic,
d
a
ta a
c
cess
a
nd oth
e
r
asp
e
cts
[2]. As to software vuln
era
b
ility has pa
ssive and
st
atic ch
aracte
rist
ics, comp
uter softwa
r
e ca
n
easily be expl
oited by maliciou
s
attacke
r
s
without
aut
hori
z
ation. Th
us it can d
o
g
r
eat dam
age
to
human
societ
y. Software v
u
lnerability a
nalysi
s
ha
s b
e
com
e
a
pop
ular
and im
p
o
rtant topi
c o
f
informatio
n secu
rity theory
rese
arch a
n
d
practi
cal work [3].
In order to
prevent the at
tack
and int
r
usion of
soft
ware
vulnerabilities, the effective
discovery a
n
d
analy
s
is
of the software
vulnera
b
ilitie
s is e
s
sential.
Therefore,
ho
w to an
alyze
the
softwa
r
e vuln
erabilitie
s fast and effectively which
ca
n be see
n
a
s
a key rol
e
for improvin
g
s
o
ftware s
e
curity
pe
rform
a
nce [4].
2. Related Work
A lot of research work of
software vulnerability analysis ha
s been carri
ed
out
in recent
years. Ed
ge
-weig
h
ted
call
grap
hs minin
g
algo
rith
m fo
r software b
u
gs lo
cali
zatio
n
wa
s
pre
s
e
n
t
ed
by Eiching
e
r [
5
]. A novel re
ductio
n
techn
i
que for
call
g
r
aph
s
whi
c
h i
n
trodu
ce
s e
d
ge weight
s was
discu
s
sed. O
n
the basi
s
of
graph mi
ning
and tradi
tion
al feature
sel
e
ction, an a
n
a
lysis te
chni
q
u
e
for weighted
call g
r
ap
hs was al
so i
n
tro
d
u
ce
d. Ho
wev
e
r, when th
e
scale of g
r
a
p
h
is ve
ry larg
e
,
the an
alysi
s
efficien
cy of
softwa
r
e
bug
s i
s
d
e
cr
ea
se
d. In o
r
de
r to
re
duce th
e
amount
of
sli
c
ed
cod
e
s, and f
a
cilitate the sub
s
e
que
nt cal
c
ulatio
n of code cove
rage inform
ation, He et al.
prop
osed a
softwa
r
e d
e
fects
analy
s
is method b
a
sed on
pro
g
ram structu
r
e
reversin
g d
a
ta
depe
nden
cy [6]. Reverse
data dep
end
ence analy
s
i
s
focu
se
s o
n
the same
progra
m
execu
t
ion
path to
anal
yze d
a
ta, an
d the
data
depe
nden
cie
s
a
r
e
extra
c
t
ed a
nd
sto
r
e
d
on
this p
a
t
h.
Mean
while, t
he
store
d
a
nd reverse
d
data
depe
n
den
ce i
s
tra
v
erse
d b
a
se
d on
a p
a
rti
c
ula
r
variable. Th
e
n
cod
e
state
m
ents related
to the specifi
c
variabl
e are
found.
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
2302-4
046
Software Vul
nerability Analysi
s Method Based
on Adaptive-K Sequence
Cl
ustering (Di Wu)
4331
In ord
e
r fu
rth
e
r imp
r
ove th
e analy
s
is time, m
any
scho
lars u
s
ed th
e
clu
s
terin
g
me
thod to
analyze soft
ware vulne
r
a
b
ilities. Mah
a
w
ee
ra
wat et
al. pre
s
ente
d
a two-l
e
vel clusteri
ng met
hod
to predi
ct soft
ware fault [7]. SOM metho
d
wa
s emplo
y
ed to classif
y
historical d
a
ta into clu
s
ters.
Software faul
ts that occurred in clu
s
ter
comp
one
nts
are p
r
edi
cted
by RBFN. Howeve
r, as the
numbe
r
of S
O
M u
n
its
gro
w
s, th
e time
co
st is la
rge.
On t
he
ba
sis of
dynami
c
inform
ation f
l
ow,
softwa
r
e
se
curity failure
s
are a
nalyzed
in literatur
e [8]. The tool of DynFlo
w is u
s
ed to record
informatio
n flow p
r
ofile
s of
executio
ns. I
n
the li
ght
of automatic clu
s
ter
a
nalysi
s
,
the executio
ns
are
sele
cted.
The
efficie
n
c
y of i
n
form
ation flo
w
an
omaly a
nalysis d
epe
nd
s
on the
type
s of
prog
ram
s
, an
d the existed types of prog
ram defec
t
s
. Wan
g
et al. discusse
d DSV
R
DC (Dete
c
ting
Software
Vul
nera
b
ilities
M
e
thod based on Rapid
De
nsity Cl
uste
ri
ng) [9].
Ne
w
definition of
rd
-
entropy a
n
d
s
-order are presented
in this method. According to utilizing
rd
-entro
py-b
a
s
ed
density clustering,
the vulnerability sequence pa
ttern database i
s
built.
Analyzing se
quences are
detecte
d by the variation o
f
s
-or
d
e
r
.
The ab
ove algorith
m
s h
a
ve been im
proved th
e time co
st of softwa
r
e vul
nera
b
ility
analysi
s
.
Ho
wever,
goo
d
seq
uen
ce
si
milarity mea
s
urem
ent fo
r
software
vulne
r
ability is
still not
well a
ddressed. DVCMA (Dete
c
ting Vul
nera
b
ilities
b
a
sin
g
on
Clu
s
terin
g
an
d
Model An
alyzing)
wa
s p
r
op
ose
d
[10]. In thi
s
a
pproa
ch,
the identific
a
t
ion dista
n
ce
is to filtrate
initially befo
r
e
cal
c
ulating the edit di
stance of
sequences. The vul
nerabilitie
s
hi
ding in the software can
be
mined un
der
a novel edit-distan
ce
-ba
s
ed simila
ri
ty functio
n
. Although
DVCM
A can effecti
v
ely
detect
software vul
nerabilit
ies, and
with lower fa
lse positive and false neg
ative rates,
but t
h
e
similarity measurement bet
ween software vulnerab
iliti
e
s sequences
is based on
edi
t-di
stance, i
t
still has a hi
gh computational com
p
lexity.
In this arti
cle,
in
order to im
prove the
analy
sis tim
e
of
software vul
nerability, on the basi
s
of the numb
e
r of all seq
uen
ce elem
e
n
ts, and the
numbe
r of comm
on seq
uen
ce elem
e
n
ts
betwe
en two
vulnera
b
ilities, the vulnerab
ility similarity
measurement
is de
sig
ned.
In acco
rdan
ce
w
i
th
tw
o-
p
has
e
s
i
milar
i
ty matc
hing, whether
DV
(detecte
d
vulnerability) is a real
soft
ware
vulnera
b
ility
or not will
b
e
determi
ned. Afterwa
r
ds,
i
n
o
r
de
r to
re
d
u
ce
the
impa
ct of
paramet
er
K
to the cluste
ri
ng quality, further improve
the perfo
rm
a
n
ce of the fal
s
e po
sitive ra
te. On the basis
of a new de
finition of adaptive
K
function, AKSC (Adaptive-K
-
based Sequence Clusteri
ng)
algorith
m
is p
r
esented.
Th
e
obje
c
t with the small
e
st
A
daptive
(
K
) i
s
deeme
d
to the optimal
K
.
The
remi
nde
r of thi
s
p
ape
r
is o
r
g
anized
as foll
ows. In
se
ction
2, we
de
scribe
the
related
work of software vulne
r
a
b
il
ity analysis.
Section 3 giv
e
s p
r
oble
m
d
e
finitions. Se
ction 4
con
c
l
udes
the SVAAKSC method. Section 5
contains experim
ent
al result
s, and we offer
our
conclusions in
se
ction 6.
3. Problem Definitions
SVSD
(Softwa
r
e vul
nerabili
ty sequ
en
ce
databa
se
) i
s
co
mpo
s
e
d
of coll
ecte
d
softwa
r
e
vulnerability sequences,
SVSD
={
SVS
1
,
SVS
2
,…,
SV
S
N
}, wherei
n,
SVS
(Software vulnerabili
ty
seq
uen
ce
) re
pre
s
ent
s an
orde
rly pro
g
ram ope
ration
sequ
en
ce
whi
c
h can le
ad to prod
uce
vulnerability.
SVS
=
a
1
a
2
…
a
n
.
a
i
denote
s
t
he item of
SV
S
,
a
i
∈
L
(1
≤
i
≤
n
).
L
={
a
1
,
a
2
,…
,
a
m
} is the i
t
e
m
s
e
t,
m
is the
numbe
r of item set.
N
is th
e numbe
r of
SVS
in
SVSD
.
Let
SVSE
be a
set of
the software
vulnerability se
quen
ce
element
s in
SVSD.
SSVSE=
{
SE
1
,SE
2
,…,SE
r
,…,
S
E
|SSVS
E|
}, se
que
nce el
ement
SE
r
is a pair
of it
ems
a
i
a
j
of
S
VS,
whe
r
e
i
<
j
.
T
he
n
u
mb
er
o
f
software vu
lnera
b
ility sequen
ce elem
ents in
SSVSE
is denote
d
as
|
SSVSE
|.
E
a
ch
SVS
can be
represented
as a |
SSVSE
|-dime
nsi
onal vector.
Suppo
se th
at
SVS
x
and
SV
S
y
are any t
w
o
software
vul
nerability sequences i
n
SVSD
,
SE
(
SVS
x
) and
SE
(
SVS
y
) indicate the software vulnerab
ility sequence pair sets of
SVS
x
and
SVS
y
. Th
e simila
rity measu
r
ement be
tween the
m
is e
x
presse
d
as
SVSim
(
SVS
x
,
SVS
y
)=
|
SE
(
SVS
x
)
∩
SE
(
SVS
y
)
/
|
SE
(
SVS
x
)
∪
SE
(
SVS
y
)
.
Whe
r
e
|
SE
(
SV
S
x
)
∪
SE
(
SVS
y
)
re
prese
n
ts the
numb
e
r of
all
se
qu
ence ele
m
ent
s bet
wee
n
SV
S
x
and
SVS
y
.
|
SE
(
SVS
x
)
∩
SE
(
SVS
y
)
d
enote
s
the
numbe
r of
comm
on se
quen
ce el
e
m
ents
betwe
en
SVS
x
and
SVS
y
.
In traditional
K
-mean
s, ho
w to make a
rapid a
nd a
c
curate
para
m
eter
K
is a critical
probl
em. The
sele
ction of
K
has a gre
a
t
influence o
n
the clu
s
teri
ng re
sults. In
this pape
r, to
adju
s
t the nu
mber of cl
ust
e
rs dyn
a
mi
ca
lly, adaptive
K
function is
prop
osed.
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 23
02-4
046
TELKOM
NI
KA
Vol. 12, No. 6, June 20
14: 4330 – 4
336
4332
Definition 1.
Suppo
se that
SVS
is any software vul
nerability sequence in
SV
SD
.
K
represent
s the number of
clusters, the
c
enter of software vulnerability cluster
SVC
P
is de
scribed
as
SVCC
P
. The ada
ptive
K
function is
sh
own a
s
follo
ws.
2
,
1
2
))
,
(
1
(
))
,
(
1
(
)
Adaptive(K
q
P
q
P
K
P
SVC
SVS
P
SVCC
SVCC
SVSim
Min
N
SVCC
SVS
SVSim
P
(1)
Whe
r
ein,
N
is the num
be
r of obje
c
ts i
n
SVSD
.
P
=
1
,2,…,
K
-1;
q
=
P
+1,
P
+2,
…
,
K
. The
adaptive
K
i
s
gai
ned
by the minimu
m Adaptive
(
K
). In this way, the average di
ssimila
rity
between
software vulnerability sequen
ces in the
same cl
uster should be as
small as
possi
ble.
And the mini
mum value of
dissimil
a
rity betwe
en ea
ch clu
s
ter ne
e
d
s to be the l
a
rge
s
t.
4
. The SVAAKSC Me
thod
Software vul
nera
b
ility an
alysis
metho
d
ba
sed
on
adaptive-
K
seq
uen
ce
clu
s
t
e
rin
g
named SVA
AKSC inc
l
udes
five s
t
ages
. Firs
t and
foremos
t, on the bas
is
of the c
o
llected
comm
on software vulnerabilities,
SVSD
is e
s
tabli
s
hed. Afterwa
r
ds, the o
b
je
cts in
SVSD
are
prep
ro
ce
ssed
to eq
ual l
e
ngth ve
ctors.
Moreover
,
by
ada
ptive-
K
-
ba
sed
seq
uen
ce
clu
s
te
ring
algorith
m
,
K
vulnera
b
ility cente
r
s a
r
e
gaine
d. Next
through two
stage simil
a
rity measu
r
e,
th
e
most
similar vulnerability to
DV
can
be get. Fin
a
lly, the ana
lysis
repo
rt
of the dete
c
ted
vulnerability is output. The framewor
k of
SVAAKSC is shown as Figure 1.
Figure 1. The Framework
of SVAAK
SC
4.1.
SVSD
Establishing
Comm
on software vulnerabilities
are collected to establish
SVSD
.
SVSD
is co
mposed
by five-dime
n
s
ion
a
l tupl
es,
and
ea
ch
tu
ple i
s
exp
r
e
s
sed
a
s
<
SVSN
,
SVS
,
SVSTY
,
SVSI
NF
,
SVSRANK
>. Whe
r
ein,
SV
SN
is the
nu
mber of
SVS
.
SVSTY
indicates th
e
SVS
type.
SVSINF
is
the relevant feature info
rm
ation of
SVS
.
SVSRANK denotes the
SVS
rank.
4.2.
SVS
Pre
p
roces
sing
As to the l
e
n
g
ths of
softwar
e vul
nerabil
i
ty seque
nces in
SVSD
a
r
e not
the same
usu
a
lly.
Therefore, it
is ne
ce
ssa
r
y to
prep
ro
ce
ss th
em. By scan
n
ing
the
SVSD
once, item set
L
={
a
1
,
a
2
,…,
a
m
} and
SSVS
E=
{
SE
1
,SE
2
,…,
S
E
r
,…,SE
|
SSVSE|
}
are gained.
C
o
mmon
sof
t
w
a
re
v
u
lnera
b
ilit
ie
s
coll
ec
tio
n
S
o
ftw
are
sou
r
c
e
cod
e
s
Prepro
c
ess
N
sof
t
w
a
re
v
u
lnera
b
ility
v
e
ctors
A
dapt
iv
e-
K
bas
ed
s
e
qu
en
ce
clu
s
te
ri
ng
K
v
u
lnera
b
ility
cen
t
e
r
s
Prepro
c
ess
De
te
cted
v
u
lnera
b
ility
De
te
cted
v
u
lnera
b
ility
v
e
ctor
The fi
rs
t
sta
ge
simil
a
rity
meas
ur
e
Ou
tp
ut
the
ana
ly
sis
rep
o
rt
Obtai
n
t
h
e
most simil
a
r
v
u
lnera
b
ility
to det
ec
ted
v
u
lnera
b
ility
Y
Th
e
se
co
nd
sta
ge
simil
a
rity
meas
ur
e
Similar?
N
SVSD
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
2302-4
046
Software Vul
nerability Analysi
s Method Based
on Adaptive-K Sequence
Cl
ustering (Di Wu)
4333
In accord
an
ce with the su
pport relation
ship
s bet
wee
n
SVS
x
and
SE
r
in
SSVS
E
, each
SVS
x
can be
pre
p
ro
ce
sse
d
as a |
SSVSE
|-dimen
s
io
nal vecto
r
. If
SVS
x
suppo
rts
SE
r
,
then the
value is 1, otherwise, the corre
s
p
ondin
g
value is 0.
For exa
m
ple
,
SVSD=
{
SV
S
1
,
SVS
2
,
SVS
3
}
.
SVS
1
=aba
,
SVS
2
=b
caa
,
SVS
3
=cb
b
a
.
Acco
rdi
ng to
simple
an
alysis,
L
={
a
,
b
,
c
},
SSVS
E={aa,
ab,ac,b
a,bb,b
c
,ca,
cb,cc},
|
SSVSE|
=9. The
pretreatment
result
s are
SVS
1
={1,1,0,1,0,0,0,0,0},
SVS
2
={1,0,0,1,0,1,1,0,0},
SVS
3
={0,0,0,1,1,
0,1,1,0}.
4.3. Adap
tiv
e
-
K
-Bas
ed Sequen
ce Clu
s
tering
In traditional
K
-mean
s seq
uen
ce
clu
s
te
ring al
go
rithm, it ne
ed
s to
set pa
ramete
r
K
ahe
ad
of time by us
er. In this
paper, a novel
AKSC (adaptive-
K
-ba
s
e
d
seque
nce clu
s
tering
) algo
rithm
is pre
s
e
n
ted.
It only needs to restri
ct the ran
ge of
K
. As a resul
t, the similarities of
SVS
in
the
same
are
cl
uster
as la
rg
e as po
ssibl
e
, in t
he meantime, the simila
rities b
e
twee
n different
clusters are as sm
all as possi
ble. By applying AKSC algorithm,
SVS
in
SVSD
are clu
s
tere
d in
to
K
c
l
us
ters
.
K
adaptive software vul
nerability cluste
rs c
an be g
a
i
ned. The
sp
ecific p
r
o
c
e
s
s of
AKSC is
s
h
own as
follows.
Algorithm AKSC (
SVSD
,
N
,
SVS,
K
ma
x
)
Input:
SVSD
: software vu
lnera
b
ility se
quen
ce
data
base;
N
:
the
num
ber of
SVS
in
SVSD
;
SVS:
any softwa
r
e
vulnera
b
ility seque
nce of
SVSD
;
K
ma
x
: T
he value of the large
s
t
K.
Output:
K
adaptive softwa
r
e vulne
r
abilit
y cluste
rs
BEGIN
Step 1: For
K
=
2
to
K
ma
x
Step 2: In
SV
SD
,
sele
ct
K
SVS
as initial cluste
ring
ce
nters
ran
doml
y
;
Step 3:
Com
pute the
si
milarity bet
ween
SVS
in
SVSD
a
nd
cu
rren
t clu
s
teri
ng
centers,
each
SVS
is assign
ed to the most
simil
a
r clu
s
te
r;
Step 4: For each clu
s
ter,
calcul
ate the averag
e si
milaritie
s
of
SVS
, clusteri
ng ce
nters
are
upd
ated,
repe
at the St
ep3 a
nd Step
4, until
the cl
usteri
ng re
sul
t
s
do not
cha
nge
a
n
y
mo
re
,
jump to Step 5;
Step 5: Acco
rding to formul
a (1), the corresp
ondi
ng Ad
aptive
(
K
) i
s
o
b
tained;
Step 6: Com
pare
d
with th
e value of the pre
s
ent Ad
aptive
(
K
) a
n
d
the forme
r
one, the
corre
s
p
ondin
g
K
of the smallest on
e is saved;
Step 7: Output
K
adaptive softwa
r
e vuln
erability clu
s
t
e
rs.
END
Whe
r
ein, i
n
g
eneral
con
d
itions,
2
≤
K
≤
K
max
.
And the paramete
r
K
is m
u
ch le
ss than
N
,
it is the nu
mb
er of
SVS
in
SVSD
. We
set
K
max
is
N
or
N
ln
2
. Here, the l
o
we
r limit of the
corre
s
p
ondin
g
intege
r valu
e is u
s
ed. Th
e optimal
cl
u
s
terin
g
re
sult
s can be
obt
ained effe
ctively.
The impa
ct of inaccurate
selectio
n of paramete
r
K
for the clusterin
g
qualit
y of
tradition
al
K
-
mean
s is red
u
ce
d gre
a
tly.
4.4. T
w
o
-
Pha
se Similarit
y
Measuring
After
SVS
prepro
c
e
s
sing,
the
DV
(detected vulnerability)
and obj
ects in
SVS
D
are all
prep
ro
ce
ssed
to equal-di
m
ensi
onal ve
ctors. He
re,
DV
is extra
c
ted from the softwa
r
e sou
r
ce
cod
e
s after
static an
alysi
s
, and
furthe
r
need
ed to
an
alyze it
s vuln
erability featu
r
e a
c
co
rding
to
some ce
rt
ain rule
s.
In AKSC, the
s
i
milarities
between
SVS
are calculated by utilizing software vulnerability
seq
uen
ce ele
m
ents simila
rity
SVSi
m
(
SVS
x
,
SVS
y
). It is d
e
si
gne
d
to analy
z
e t
o
the
seq
u
e
n
ce
element
s that
SVS
x
and
SVS
y
contains.
Thus the soft
ware vulnera
b
ilities analy
s
is efficien
cy can
be enh
an
ced
greatly.
The p
r
o
c
e
s
s
of simila
rity measurement
is divi
d
ed int
o
two
pha
se
s. In the first
st
age, the
simila
rities of
DV
a
nd
K
software
vul
n
erability clustering
centers are comput
ed. If the m
o
st
simila
r software vuln
era
b
i
lity clusteri
ng
cente
r
to
DV
is found, t
he second
st
age of
simila
rity
measure
will
be started.
In th
e most similar
soft
ware vul
nerability cluster,
by
cal
c
ulati
n
g
simila
rities b
e
twee
n
SVS
and
DV
,
t
he most
simila
r
SVS
to
DV
can be
gai
ned
. The a
nalysi
s
repo
rt of
DV
is recorded.
It is mainly
about t
he rel
e
vant feature
information
SVSINF
of the
corre
s
p
ondin
g
SVS
. On the co
ntra
ry, if
DV
is n
o
t simila
r wi
th any software vuln
erabi
lity
clu
s
terin
g
ce
nters, then it can be vie
w
e
d
as a softwa
r
e ope
ration
seq
uen
ce. Th
e corre
s
po
ndi
ng
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 23
02-4
046
TELKOM
NI
KA
Vol. 12, No. 6, June 20
14: 4330 – 4
336
4334
analysi
s
re
po
rt is outputte
d. In a word,
whethe
r
DV
is a real
SVS
or not will be determi
ned i
n
accordan
ce with
two-pha
se
s
i
milarity matc
hing.
5. Experimental Re
sults
and An
aly
s
is
In order to verify the performance
of SVAAKSC
algorithm, FTP
s
e
rv
er s
o
ftware wu-ftpd
unde
r linux is adopted i
n
this
se
ction. T
r
adition
al
stat
ic analy
s
is to
ol ITS4 is al
so used du
rin
g
comp
ari
ng th
e false po
siti
ve rate
analy
s
is.
Wh
erei
n, the effe
ctive so
urce
co
de
line
s
of
wu
-ftpd
are 13
582, a
nd the vulnerability code li
nes a
r
e
64. 1
0000 softwa
r
e vulnera
b
ility seque
nces
were
colle
cted to e
s
tabli
s
h
SVSD
.
O
u
r e
x
p
e
r
i
men
t
s
ar
e ru
n
on
th
e In
te
l
C
o
r
e
2
D
u
o 2
.
9
3
G
H
z
C
P
U
,
2G
B ma
in memo
r
y
an
d
Mic
r
osoft XP.
All algorithms
are
written in
MyEc
lipse 8.5. We c
o
mpare SVAAKSC with DV
CMA
[9] in false po
sitive rate an
d analysi
s
efficien
cy.
5.1. False Positiv
e
Rate Analy
s
is
In this
se
ctio
n, the fal
s
e p
o
sitive rates
of three
alg
o
rithms a
r
e
an
alyzed
by the
formul
a
as
follows
.
m
t
v
arvt
FPR
Num
Num
m
R
1
)
1
(
1
∗
100%
(2)
Whe
r
ein,
Nu
m
arvt
denotes the num
be
r
of analyzed
real vulne
r
abil
i
ties in th
e
t
-t
h false
positive rate
analysi
s
.
Nu
m
v
is the number of vulnerabilities
in software. We set m=10.
The
analysi
s
resul
t
s of fal
s
e
positi
ve rates
of SVAAKSC, DVCMA
and
IT
S4 algorithms are shown as
Table 1.
Table 1. The
Analysis
Re
sults of False Positive Rate
s
Algorithm or Anal
y
s
is Tool
False Positive R
a
te
(
%
)
SVAAKSC
25.4%
DVCMA
30.8%
ITS4
36.9%
From
Tabl
e
1, we
can
see that
in F
T
P se
rver
so
ftware
wu-ftp
d, the
avera
ge fal
s
e
positive
rate of SVAAKSC i
s
lo
wer t
han
DVCMA and stati
c
anal
ysi
s
tool
ITS4. Thus the
meanin
g
of cl
usteri
ng resul
t
s can b
e
exp
l
ained a
c
cura
tely by SVAAKSC.
For SVAAKSC, by adaptive-
K
-based sequence clust
e
ring algorithm, SVSD is
clustered
by adaptive
-
K
-based sequence clustering
algorith
m
AKSC. It does not set cluster
number
K
in
advan
ce, but
only re
stri
ct its ran
ge. By comp
ari
ng wi
th all the
Adaptiv
e
(
K
),
the
objec
t with the
smalle
st
Adap
tiv
e
(
K
) i
s
d
eemed
to th
e optimal
K
. The
impa
ct
of ina
c
curate sele
ction
of
para
m
eter
K
for the clu
s
tering q
uality is re
du
ced
gr
eatly. The
optimal cl
ust
e
ring
re
sults of
software vul
n
erabilities
can be
gained.
The fal
s
e
po
sitive rate will be redu
ced greatly. In this
way, the final obtaine
d
SVS
is
the mos
t
s
i
milar with
DV
.
5.2. Analy
s
is
Time Test
To test the
software vul
nerabilities
analysi
s
time
of SVAAKSC and
DV
CM
A,
Num
denote
s
the
numbe
r of software vul
n
e
r
abilitie
s. We
set
Num
=10
00, 2000, 3
0
00, 4000, 5
0
00.
The test re
sul
t
s of the runni
ng time of t
he two algorith
m
s are sh
own as Figu
re 2.
As sh
own in
Figure 2, th
e runni
ng tim
e
s of two al
gorithm
s are
growi
ng line
a
rly with
increa
sing
Num
. In accordance
with t
he support relationships
bet
ween software
vulnerability
seq
uen
ce
SV
S
and seq
u
e
n
ce elem
ent
SE
r
in
SSVSE
, each
SVS
can
be
prep
rocesse
d
a
s
a
|
SSVSE
|-dimensi
onal
vector. If
SVS
su
ppo
rts
SE
r
,
then th
e
value
is 1,
otherwi
se,
the
corre
s
p
ondin
g
value i
s
0.
Furth
e
rmo
r
e
,
on the
basi
s
of the
num
ber
of all
seq
uen
ce el
eme
n
t
s
and the nu
m
ber of comm
on se
que
nce element
s bet
wee
n
two
SVS,
the compu
t
ation compl
e
xity
of the s
i
milarity meas
urement in SVA
AKSC is
decreas
ed. Finally, the s
o
ftware vulnerability
analysi
s
time is improved e
ffectively.
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
2302-4
046
Software Vul
nerability Analysi
s Method Based
on Adaptive-K Sequence
Cl
ustering (Di Wu)
4335
Figure 2. Running Times
of
SVAAKSC a
nd DVCMA
6. Conclusio
n
In this work, i
n
ord
e
r to i
m
prove the
pe
rfor
man
c
e
of the false po
sit
i
ve rate of
so
ftware
vulnera
b
ilities analysi
s
time, softwa
r
e
vu
lnerability
analysi
s
m
e
thod ba
se
d
on adaptiv
e-
K
s
e
quenc
e
c
l
us
tering named SVAAKSC
is
disc
us
sed. Firs
t and foremos
t, common
s
o
ft
ware
vulnerabilities are
collect
ed to establish
SVSD
.
Afterwa
r
ds,
according
to the su
pp
ort
relation
shi
p
s
betwe
en soft
ware vulnera
b
ility sequen
ce
s
SVS
and sequ
en
ce
element
s, ea
ch
SVS
can
be
pre
p
ro
ce
sse
d
a
s
eq
ual-d
imensi
onal
vector.
Moreo
v
er, on th
e
basi
s
of
a n
e
w
definition of
a
daptive
K
func
tion,
a novel
adaptive
-
K
-bas
ed sequenc
e
c
l
us
tering algorithm
AKSC
is pre
s
e
n
ted. It does not se
t cluste
r num
ber
K
in adva
n
ce, but only
rest
rict its ran
ge.
The
obje
c
t
wit
h
t
he
sm
a
llest
Adaptive
(
K
) i
s
deem
ed to the
op
timal
K
. By adopting AKSC algorithm
,
K
vulnerability centers of
SVSD
are g
a
ine
d
. The impa
ct of inaccurate
sele
ction of para
m
eter
K
for
the cl
ustering quality i
s
reduced
greatly. The
optimal clusteri
ng re
sults of
software vulnerabiliti
e
s
can
be gai
ne
d. Afterwards, in accordan
ce
with
two
-
p
hase simil
a
rit
y
matching,
wheth
e
r d
e
te
cted
vulnerability
DV
is a real
SVS
or not will be dete
r
mined. On th
e basi
s
of th
e numb
e
r of
al
l
seq
uen
ce
ele
m
ents
and t
he num
be
r o
f
commo
n se
quen
ce
elem
ents b
e
twe
e
n
two
SVS,
the
comp
utation
compl
e
xity of the
simila
rity mea
s
u
r
eme
n
t is
de
cre
a
sed. Finally, th
e an
alysi
s
re
port
of the detec
ted vulnerability is
output.
Ou
r experimental res
u
lt
s
s
h
ow that
SVAAKSC
c
a
n
analyze
DV
of
softwa
r
e source co
de
s with
lo
we
r
fa
lse
p
o
sitive rate,
and better
vuln
era
b
ili
ty
analysi
s
time.
Ackn
o
w
l
e
dg
ements
This work is su
ppo
rted
by the National Natu
ral Scien
c
e
Found
ation
of China
(No.6
117
019
0), Youth
F
ound
ation of
He
bei
Ed
u
c
ation
a
l
Co
mmittee (No
.
Q2012
070
)
and
Scien
c
e
and
Tech
nolo
g
y
Re
sea
r
ch a
n
d
Developm
e
n
t Prog
ram
o
f
Han
dan
(No
:
13211
030
77
-3).
The a
u
thors also
wo
uld
like to exp
r
ess thei
r gratitude to th
e revie
w
e
r
s,
who
s
e val
u
able
comm
ents a
r
e very helpful
in revising th
e pape
r.
Referen
ces
[1]
Li QY, Luo L. Determi
nin
g
the M
i
nim
a
l
Soft
w
a
re
Rel
i
abil
i
t
y
T
e
st Effort b
y
Stratifi
ed S
a
mpl
i
n
g
.
T
E
LKOMNIKA Indon
esi
an Jou
r
nal of Electric
al Eng
i
ne
eri
n
g
.
2013; 1
1
(8): 4
399-
440
6.
[2]
Ren J
D
, Xie Y
J
, Z
hang AG, et al. A Cl
ose
d
Sequ
enti
a
l Pa
ttern Mini
ng Al
gorithm for
Dis
cover
y
of t
h
e
Soft
w
a
re Bugs
F
eature.
Journ
a
l of Co
mp
utati
ona
l Informatio
n
Systems
. 20
11; 7(7): 23
22-
232
9.
[3]
Sun SJ,
Xiao
J
.
A Soft
w
a
re
R
e
lia
bi
lit
y GEP
Mode
l Bas
ed
o
n
Usa
g
e
Profil
e.
T
E
LKOMNIKA Indo
nesi
a
n
Journ
a
l of Elec
trical Eng
i
ne
eri
n
g
. 201
2; 10(7)
: 1756-1
7
6
4
.
[4]
Ning JF
, Hu M. Stud
y
on Softw
a
r
e Qual
it
y
I
m
prov
em
ent b
a
sed o
n
Ra
yl
ei
gh Mod
e
l an
d PDCA Mode
l.
T
E
LKOMNIKA Indon
esi
an Jou
r
nal of Electric
al Eng
i
ne
eri
n
g
.
2013; 1
1
(8): 4
609-
461
5.
[5]
Eichi
nger F
,
Böhm
K, Hube
r M. Mining Edge-W
e
ighte
d
Ca
ll Grap
hs to Loca
lize Sof
t
w
ar
e Bugs
.
Lecture N
o
tes i
n
Co
mp
uter Scienc
e
. 200
8; 5211: 33
3-3
48.
[6]
He H, Z
hao
L, Li Q, et a
l
.
Analy
z
e
So
ftw
are Defects w
i
th Program Structure
Dep
end
ency
.
Procee
din
g
s of
the 2
nd Inter
n
ation
a
l C
onfer
ence
on
Comp
uter an
d Ap
plic
ations
(
CCA).
201
3; 17: 5
3
-
57.
0
150
300
450
600
750
1000
2000
3000
4000
5000
Running
tim
e
T/s
Num
SVAAKSC
DVCMA
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 23
02-4
046
TELKOM
NI
KA
Vol. 12, No. 6, June 20
14: 4330 – 4
336
4336
[7]
Maha
w
e
er
a
w
at
A, Sophat
sath
it P, Lursinsap
C.
Adaptive Se
lf-Organi
z
i
n
g
M
ap Clust
erin
g for Softw
are
F
ault Pred
ictio
n
. Procee
di
ngs
of the 4t
h Intl
. Joint C
onfere
n
ce o
n
C
o
mpu
t
er Scienc
e a
n
d
Soft
w
a
r
e
Engi
neer
in
g. T
hail
a
n
d
. 200
7: 35-4
1
.
[8]
Masri W
,
Podg
urski A. Appl
ic
ation-
base
d
An
omal
y
Intrusion Detection
w
i
th
D
y
namic Infor
m
ation F
l
o
w
Anal
ys
is.
Co
mputers an
d Sec
u
rity
. 2008; 2
7
(
5
): 176-1
87.
[9]
W
ang YY, Wang YN, Re
n
JD. Soft
w
a
re
Vulner
abi
litie
s Detection U
s
ing R
api
d D
ensit
y-b
a
s
e
d
Clusteri
ng.
Jou
r
nal of Co
mput
ation
a
l Infor
m
a
t
ion Syste
m
s. 201
1; 8(14): 32
95-3
302.
[10]
Ren
JD, C
a
i B
L
, He
HT
, et al. A Meth
od f
o
r Detecti
ng S
o
ftw
a
r
e
Vul
nera
b
i
lities B
a
se
d
on
Clust
eri
n
g
and Mo
del A
n
a
l
y
z
ing.
Jo
urna
l of Computati
o
n
a
l Informatio
n
Systems
. 20
11
; 7(4): 1065-1
0
73.
Evaluation Warning : The document was created with Spire.PDF for Python.