International Journal of Electrical and Computer Engineering (IJECE)
Vol. 6, No. 3, June 2016, pp. 1133~1139
ISSN: 2088-8708, DOI: 10.11591/ijece.v6i3.10086
Journal homepage: http://iaesjournal.com/online/index.php/IJECE
DNA Bar-Coding: A Novel Approach for Identifying an Individual Using Extended Levenshtein Distance Algorithm and STR Analysis
Likhitha C. P, Ninitha P, Kanchana V
Department of Computer Science, Amrita Vishwa Vidyapeetham University, Mysuru campus, Karnataka, India
Article Info
Article history:
Received Feb 24, 2016
Revised May 11, 2016
Accepted May 28, 2016

ABSTRACT
DNA bar-coding is a technique that uses short DNA nucleotide sequences from a standard region of a species' genome to identify and group the species to which an organism belongs. Species are identified by their DNA nucleotide sequences in the same way that supermarket items are recognized and billed by scanning their Universal Product Code with a barcode scanner. Two items may look the same to the untrained eye, but their barcodes are distinct. DNA barcodes have been created to characterize species by analysing DNA samples from fish, birds, mammals, plants and invertebrates using the Smith-Waterman and Needleman-Wunsch algorithms. In this work we create human DNA barcodes and implement an Extended Levenshtein distance algorithm together with STR analysis. The method measures the distance between two DNA nucleotide sequences, through which an individual can be identified, and requires less computation time than the previously used algorithms.
Keyword:
Color DNA bar-code
DNA bar-coding
Human identification
Levenshtein distance algorithm
Sequence matching
STR
Copyright © 2016 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Likhitha C P,
Department of Computer Science, Amrita Vishwa Vidyapeetham University, Mysuru campus,
#114, 7th Cross, Bogadi 2nd Stage, Mysuru-570026.
Email: likhiponnappa7@gmail.com
1. INTRODUCTION
DNA bar-coding is a framework for quick and accurate species recognition. By using a short DNA sequence rather than the entire genome, it makes ecological knowledge more accessible. The short DNA sequence is taken from a standard region of the genome known as a marker. Different markers are used for different groups of organisms. Examples are CO1 (cytochrome c oxidase 1) for animals, matK for plants, the internal transcribed spacer (ITS) for fungi, and a mitochondrial gene for humans.
DNA bar-coding has numerous applications in different fields. These include preserving natural resources, protecting endangered species, recognizing disease vectors, identifying agricultural pests, identifying medicinal plants and identifying humans.
Until recently, biological species were recognized using morphological characters such as the shape, size and color of body parts. In many situations an expert could make routine identifications using morphological "keys" (step-by-step instructions on what to look for). Often, however, an experienced professional taxonomist is required. If a specimen is damaged or at an immature stage of development, even a professional taxonomist may be unable to identify it [1]. Bar-coding addresses these problems because even non-experts can obtain standardized identifications from small amounts of tissue. This does not mean that traditional taxonomy has become less important. Rather, DNA bar-coding can serve a dual purpose. It is another tool in the taxonomist's toolbox that supplements their expertise, and it is an innovative tool for non-specialists who need to make a quick identification.
So far, DNA bar-coding has proved useful in identifying species of insects [2], fishes [3], Canadian mosquitoes [4], spiders [5], birds [6] and other animals [7]. It has also been used effectively to examine Hyalella [8], a taxonomically difficult genus of amphipod crustaceans. It has likewise been applied to tussock moths (Lepidoptera: Lymantriidae) [9].
Human DNA bar-coding is a powerful forensic tool for identifying a person [10] from DNA samples stored in a database. The process has three parts:
- A sample DNA sequence is collected from the individual.
- The sequence is converted into a color barcode based on its nucleotide bases [11].
- The barcode is stored in the standard library along with the complete details of that individual.

When a barcode is scanned [12] or a sequence is entered, the new sequence is compared with the sequences stored in the library. If a match is found, the complete details of the matched sequence are displayed to the user. This tool has many applications, such as:
- identifying criminals whose DNA matches evidence left at a crime scene;
- exonerating persons wrongly accused of crimes;
- establishing family relationships.

Human DNA bar-coding involves the following activities:
a) Working with the individuals: collect, identify, classify and store individuals' data in secure repositories.
b) Barcode generation: generate the color DNA barcode of each individual's sequence by pre-processing the nucleotide sequence.
c) Managing data: update the standard library with the generated color DNA barcodes, the sequences and the details of each individual.
d) Finding the match: upload either DNA sequences or barcode images, match them against the database and display the details of the matched sequence.
Previously, the Smith-Waterman [13] and Needleman-Wunsch [14] algorithms were used for sequence alignment. To increase efficiency, the Levenshtein distance algorithm [15] is implemented here to compute the number of mismatches between two long strings. In a few cases, the Levenshtein distance algorithm may return the same mismatch count for multiple sequences. Short Tandem Repeat (STR) analysis [16],[17] is therefore used to find the exact match among those sequences. STR analysis is a forensic tool used to analyse and evaluate specific STR regions [18] on nuclear DNA. The STR regions analysed from nuclear DNA are polymorphic, so forensic testing of these regions can differentiate one DNA profile from another.
2. RESEARCH METHOD
2.1. Bar-coding
Just as every individual's fingerprint is different, every individual's DNA is also different. With DNA bar-coding we can identify species or individuals quickly and accurately. Like fingerprinting technology, DNA bar-coding can also help identify the culprit in criminal cases or an unidentified victim. Biological samples are collected from an individual's blood, saliva, urine, hair, bone or tissue. They are then sent to a laboratory to extract the DNA sequence. The sequence contains four nucleotide bases: A (adenine), T (thymine), G (guanine) and C (cytosine). Combinations of these four nucleotides form long DNA sequences of varying lengths. For example,
1. ATTCAAAAGACCTCGCTAAAAATCTCGCAGTCAACTATCTTTAGCGTTAAATCACGCAACATATTTCAACCGCATTGGAGAGTCGAGGCAGCTAAGCCCGGTAACCCCTTTCATATCTGATCCTACGGGATCTTGGGTTTGTCCGCCATTCTGATTGTGAGAACGGGGTGTGTCCGCAGAACCCCTCTCTAGACAACTAGACCATTCGACTCAG
2. TCGAGAATAAAAGTTTCAGTGTAATAAACCAAGATGTCTTATCTGACGCGAGCTTCCTTCTTTGAAGTAACAGTTCCTGTCTCGTCTTCACTAAATCTTCACAGCGCGTCCTAATACCGGCAGTGAACCGTATCCGGTTACTATATGCTGTTGTTAGAGCGTTCTCGCACGCGACATTACAGTACCTCGCCCAGTCGCAATTCTGCCTGC

and so on.
Each DNA sequence is bar-coded and saved in the database along with all the details of the individual. The DNA is bar-coded by assigning a different color to each of the four nucleotides as follows: A - Green, T - Red, G - Black, and C - Blue.
The following part of a DNA sequence was taken as a sample and bar-coded as shown in Figure 1:
GTTGAAGCGGTTATCGCGCAAAAAAGCTGGCGCCCGGAGAGTGGCATGCAAAGCTGTCAGCAAACCCAACGTTGATCAACGCAGCGCAGCTTGAGTGTCTTTCTTTGGCCCATACCCAGCCCGTGCAATGACCAACGCGTTAGATTGACCTAGT
Figure 1. Color DNA bar-code
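The color assignment above can be written as a lookup table. The sketch below is a minimal illustration, not the authors' implementation. The RGB values are our own assumption, since the paper gives only color names, and `to_color_bars` and `to_ppm` are hypothetical helper names. The sketch draws one vertical bar per base as a binary PPM image, which needs no third-party library. Producing JPEG, GIF, PNG or Bitmap files would require an imaging library.

```python
# Color scheme from Section 2.1: A = green, T = red, G = black, C = blue.
# The RGB triples are an assumption; the paper names the colors only.
BASE_COLORS = {
    "A": ("green", (0, 128, 0)),
    "T": ("red", (255, 0, 0)),
    "G": ("black", (0, 0, 0)),
    "C": ("blue", (0, 0, 255)),
}


def to_color_bars(sequence: str) -> list:
    """Map each nucleotide to the name of its bar color, one bar per base."""
    seq = sequence.strip().upper()
    invalid = set(seq) - set(BASE_COLORS)
    if invalid:
        raise ValueError(f"unexpected symbols in sequence: {sorted(invalid)}")
    return [BASE_COLORS[base][0] for base in seq]


def to_ppm(sequence: str, bar_width: int = 2, height: int = 40) -> bytes:
    """Render the color barcode as a binary PPM (P6) image, one vertical bar per base."""
    seq = sequence.strip().upper()
    to_color_bars(seq)  # validates the alphabet before drawing
    # One image row: each base contributes bar_width pixels of its color.
    row = b"".join(bytes(BASE_COLORS[base][1]) * bar_width for base in seq)
    header = f"P6 {len(seq) * bar_width} {height} 255\n".encode("ascii")
    return header + row * height  # every row of the image is identical
```

Take `to_color_bars("GTTG")`, the first four bases of the sample sequence above. It returns `["black", "red", "red", "black"]`. Writing `to_ppm(sequence)` to a file with a `.ppm` extension gives an image that common image viewers can open.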
The database contains millions of records of individuals. Each record holds the person's details, DNA sequence and the bar-coded image of that sequence. If an unidentified victim is found, a DNA sequence can be extracted from that individual and given as input to the Extended Levenshtein distance algorithm with the STR method. The method finds the matching record and retrieves the victim's details from the database. Information such as name, identification number (ID) and cause of death is included in the barcode for reference.
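The standard library described here can be pictured as a table of records keyed by ID. The following sketch is a hypothetical in-memory stand-in for that database. The class and field names (`IndividualRecord`, `BarcodeLibrary`, `details`) are our own, and a real deployment would need a persistent, access-controlled store.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class IndividualRecord:
    """One library entry; field names are illustrative assumptions."""
    record_id: str
    name: str
    sequence: str
    details: Dict[str, str] = field(default_factory=dict)  # e.g. cause of death


class BarcodeLibrary:
    """Minimal in-memory stand-in for the database of individuals, keyed by ID."""

    def __init__(self) -> None:
        self._records: Dict[str, IndividualRecord] = {}

    def add(self, record: IndividualRecord) -> None:
        """Store a new record; IDs must be unique."""
        if record.record_id in self._records:
            raise KeyError(f"duplicate ID {record.record_id!r}")
        self._records[record.record_id] = record

    def get(self, record_id: str) -> IndividualRecord:
        """Retrieve the full details once a sequence ID has been matched."""
        return self._records[record_id]

    def sequences(self) -> Dict[str, str]:
        """ID -> sequence mapping, the input a sequence matcher would scan."""
        return {rid: rec.sequence for rid, rec in self._records.items()}
```

Keeping the sequences behind a single `sequences()` view separates matching, which only needs IDs and sequences, from retrieving the personal details of the matched ID.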
The barcode library provides a function that encodes the content into an image, which can be saved in JPEG, GIF, PNG or Bitmap format. It also provides a function that decodes such an image.
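The paper does not name the barcode library or describe how text such as a name or ID is encoded. Purely as an illustration, the hypothetical codec below writes each byte of UTF-8 text as four base-4 digits. It then maps the digits to the same four symbols (and therefore colors) used for nucleotides. The digit order A, T, G, C is an arbitrary choice. Rendering the symbols to an image file would still need an imaging library.

```python
# Hypothetical content codec: each byte becomes four base-4 digits,
# and each digit becomes one of the four nucleotide symbols / bar colors.
DIGIT_TO_BASE = "ATGC"  # assumption: 0 -> A, 1 -> T, 2 -> G, 3 -> C
BASE_TO_DIGIT = {base: digit for digit, base in enumerate(DIGIT_TO_BASE)}


def encode_text(text: str) -> str:
    """Encode text as a string over {A, T, G, C}, four symbols per byte."""
    symbols = []
    for byte in text.encode("utf-8"):
        for shift in (6, 4, 2, 0):  # most significant pair of bits first
            symbols.append(DIGIT_TO_BASE[(byte >> shift) & 0b11])
    return "".join(symbols)


def decode_text(symbols: str) -> str:
    """Invert encode_text; the input length must be a multiple of four."""
    if len(symbols) % 4:
        raise ValueError("symbol count must be a multiple of 4")
    data = bytearray()
    for i in range(0, len(symbols), 4):
        byte = 0
        for base in symbols[i:i + 4]:
            byte = (byte << 2) | BASE_TO_DIGIT[base]
        data.append(byte)
    return data.decode("utf-8")
```

For example, `encode_text("A")` returns `"TAAT"`. The character 'A' is byte 65, whose bit pairs are 01 00 00 01.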
Once the individual is identified, the generated color barcode image can easily be printed and attached to the victim's sample for further processing. This avoids repeating the laboratory process. Later, if any details about the victim are required, the attached bar-code image is scanned and input to the system.
2.2. Extended Levenshtein distance algorithm
The Extended Levenshtein distance algorithm computes the distance between two strings. It uses a dynamic programming approach to count the mismatches between them, that is, the minimum number of single-character insertions, deletions or substitutions. The two strings are nucleotide sequences whose lengths, measured in base pairs (bp), may differ. The algorithm compares the two sequences and returns the number of mismatches between them, as shown in Figure 2. The input sequence supplied by the user is compared in turn with every nucleotide sequence stored in the database. Once all stored sequences have been compared, the algorithm returns the minimum mismatch count and the ID of the corresponding sequence. A mismatch count of 0 for a stored sequence is considered an exact match. The individual's information can then be retrieved using the corresponding ID.
Figure 2 legend: “=” match, “x” mismatch
Figure 2. Extended Levenshtein distance algorithm for matching two DNA sequences: (a) mismatch count for different string lengths, (b) mismatch count for the same string length
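The paper does not spell out what its extension adds to the standard algorithm. The sketch below is therefore simply the classic Levenshtein recurrence, with a cost of 0 for equal characters and 1 for unequal ones. It adds a scan over a database that keeps every ID tied at the minimum. The function names and the dictionary-shaped database are assumptions for illustration. The two-row table keeps memory proportional to the shorter sequence, while running time stays proportional to the product of the two lengths.

```python
from typing import Dict, List, Tuple


def levenshtein(s1: str, s2: str) -> int:
    """Minimum number of single-base insertions, deletions and substitutions
    needed to turn s1 into s2 (classic dynamic programming, two rows)."""
    if len(s1) < len(s2):
        s1, s2 = s2, s1  # distance is symmetric; keep the inner row short
    previous = list(range(len(s2) + 1))  # distances from the empty prefix of s1
    for i, a in enumerate(s1, start=1):
        current = [i]
        for j, b in enumerate(s2, start=1):
            cost = 0 if a == b else 1  # equal characters -> 0, unequal -> 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


def best_match(query: str, database: Dict[str, str]) -> Tuple[int, List[str]]:
    """Compare the query with every stored sequence and return the minimum
    mismatch count plus all IDs that attain it (ties are kept for STR analysis)."""
    if not database:
        raise ValueError("database is empty")
    scores = {seq_id: levenshtein(query, seq) for seq_id, seq in database.items()}
    minimum = min(scores.values())
    return minimum, [seq_id for seq_id, d in scores.items() if d == minimum]
```

For sequences of different lengths, as in Figure 2(a), `levenshtein("ACGT", "AGT")` is 1 (a single deletion). For sequences of equal length, as in Figure 2(b), the result never exceeds the number of positions that differ. A substitution at each such position is always one possible edit sequence.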
2.3. Short tandem repeats (STR)
In a few cases, part of one person's DNA sequence is similar to that of another person. In such situations, it is difficult to match the sequence and identify the individual. The Extended Levenshtein distance algorithm yields the lowest dissimilarity between the input sequence and the stored sequences. When that lowest dissimilarity is not 0, the closest sequences are analysed using the STR method. STRs are short DNA sequences, 4-5 base pairs long, that are repeated many times within a single nucleotide sequence. STR analysis compares specific loci in the nucleotide sequences of two or more samples. It does so by measuring the exact number of repeating units in each DNA sequence. For example, a group of four nucleotides (a tetramer) may be taken as the repeat unit. The number of such repeat units in one genomic sequence is then counted and compared with the count of the same tetramer in another sequence. If the repeat-unit counts of the two DNA sequences match, we conclude that both sequences come from the same person.
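Counting repeat units can be sketched with a regular expression. Here we assume that "number of repeat units" means the longest run of back-to-back copies of a chosen tetramer, scanned over the whole sequence. Forensic STR typing instead examines fixed loci, so this is a simplification. The motif `GATA` in the example is an arbitrary choice, and the function names are hypothetical.

```python
import re
from typing import Dict, List


def tandem_repeat_count(sequence: str, motif: str) -> int:
    """Longest run of back-to-back copies of motif in sequence, in repeat units.

    re.findall scans non-overlapping runs from left to right.
    """
    motif = motif.upper()
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


def str_profile(sequence: str, motifs: List[str]) -> Dict[str, int]:
    """Repeat count for each tetramer motif of interest."""
    return {m: tandem_repeat_count(sequence, m) for m in motifs}


def same_str_profile(seq1: str, seq2: str, motifs: List[str]) -> bool:
    """True when every motif has the same repeat count in both sequences."""
    return str_profile(seq1, motifs) == str_profile(seq2, motifs)
```

For instance, `tandem_repeat_count("CCGATAGATAGATATT", "GATA")` returns 3.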
Extended Levenshtein Algorithm with STR method
Step 1: Read S1.
Step 2: For S2 (k to t) from the database:
Step 3: Check each character of S1 (i from 1 to n).
Step 4: Check each character of S2 (j from 1 to m).
Step 5: If S1[i] is equal to S2[j], the mismatch is 0. If S1[i] is not equal to S2[j], the mismatch is 1.
Step 6: k = k + 1; go to Step 2.
Step 7: Obtain the minimum mismatch count and its sequence ID.
Step 8: If the minimum mismatch count is 0, go to Step 13; else go to Step 9.
Step 9: Read the repeat units.
Step 10: Count the number of repeat units in S2.
Step 11: Count the number of repeat units in S1.
Step 12: If the repeat-unit counts of S1 and S2 are equal, a match is found; otherwise, no match is found.
Step 13: Print the details of the matched individual using the sequence ID.
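The steps above can be combined into one runnable sketch. It is a simplified reading of the algorithm, not the authors' code. The names `identify` and `repeat_count` are hypothetical, and the mismatch count is computed as a full Levenshtein distance. Following Section 3, STR comparison is applied whenever the minimum count is greater than 0 or several IDs tie at the minimum. The caller is expected to look up and print the details for the returned ID (Step 13).

```python
import re
from typing import Dict, List, Optional


def levenshtein(s1: str, s2: str) -> int:
    """Mismatch count (unit-cost edit distance) used in Steps 3 to 5."""
    previous = list(range(len(s2) + 1))
    for i, a in enumerate(s1, start=1):
        current = [i]
        for j, b in enumerate(s2, start=1):
            cost = 0 if a == b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


def repeat_count(sequence: str, motif: str) -> int:
    """Longest run of back-to-back copies of motif (Steps 9 to 11)."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


def identify(query: str, database: Dict[str, str], motifs: List[str]) -> Optional[str]:
    """Return the ID of the matching stored sequence, or None if no match is found."""
    if not database or not motifs:
        raise ValueError("need at least one stored sequence and one repeat motif")
    # Steps 2 to 7: compare with every stored sequence and keep the minimum count.
    scores = {seq_id: levenshtein(query, seq) for seq_id, seq in database.items()}
    minimum = min(scores.values())
    candidates = [seq_id for seq_id, score in scores.items() if score == minimum]
    # Step 8: a unique exact match goes straight to Step 13.
    if minimum == 0 and len(candidates) == 1:
        return candidates[0]
    # Steps 9 to 12: compare repeat-unit counts with each closest candidate.
    query_profile = [repeat_count(query, m) for m in motifs]
    for seq_id in candidates:
        if [repeat_count(database[seq_id], m) for m in motifs] == query_profile:
            return seq_id
    return None
```

Consider `db = {"Q": "TTGATAGAGA", "P": "CTGATAGATA"}` and the query `TTGATAGATA`, which is one mismatch away from both entries. Only `P` has the same number of consecutive `GATA` repeats (two), so `identify("TTGATAGATA", db, ["GATA"])` returns `"P"`.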
3. RESULTS AND ANALYSIS
When a new sequence was entered, it was checked against the sequences already stored in the database. For each sequence compared, the system then displayed the number of mismatches together with the ID of the individual, as shown in Figure 3. The final result showed the minimum mismatch count, and the individual was identified from the corresponding sequence ID. If the minimum mismatch count is 0, we can conclude that the two sequences match exactly and belong to the individual with that ID.
Figure 3. Result of sequence matching
Figure 4 shows how the individual is identified. The x-axis represents the DNA sequence ID of the individual, and the y-axis represents the minimum mismatch count of the sequence. The graph plots the values obtained in Figure 3 when the sequence is compared using the Extended Levenshtein distance algorithm. The point at which the minimum mismatch count reaches 0 is identified as an exact match, and the details for that sequence ID are retrieved.
Figure 4. Graphical representation of sequence matching
If the minimum mismatch count is greater than 0, or if several IDs share the same minimum mismatch count, the STR method is used. It counts and compares the tetramer repeat units in the sequences, as shown in Figure 5.
Figure 5. STR analysis
If the count of tetramer repeat units in one sequence equals that in another sequence, we can conclude that both sequences belong to the same individual. The complete details of that person are then displayed with the corresponding ID stored in the database.
The analysis showed that the proposed algorithm is efficient. It took less execution time to compare the biological sequences than the Smith-Waterman and Needleman-Wunsch algorithms, as shown in Figure 6.
Figure 6. Comparison of algorithms
4. SUMMARY AND CONCLUSION
The Extended Levenshtein distance algorithm, together with STR analysis, is implemented to identify people through DNA bar-coding. It works by finding the least number of mismatches between sequences. This can help government organizations find people after natural disasters. The government could collect samples during a census or in other scenarios and store them in the database. The stored samples can then be used in various criminal and forensic applications. In future, if mutations occur in the genomic sequence, examining the variations in the gene sequence could reveal the diseases affecting an individual.
REFERENCES
[1] Ferri G., et al., “Species identification through DNA “barcodes”,” Genetic Testing and Molecular Biomarkers, vol/issue: 13(3), pp. 421-426, 2009.
[2] Wilson J. J., “DNA barcodes for insects,” DNA barcodes: Methods and protocols, pp. 17-46, 2012.
[3] Ward R. D., et al., “The campaign to DNA barcode all fishes, FISH‐BOL,” Journal of fish biology, vol/issue: 74(2), pp. 329-356, 2009.
[4] Cywinska A., et al., “Identifying Canadian mosquito species through DNA barcodes,” Medical and veterinary entomology, vol/issue: 20(4), pp. 413-24, 2006.
[5] Barrett R. D. and Hebert P. D., “Identifying spiders through DNA barcodes,” Canadian Journal of Zoology, vol/issue: 83(3), pp. 481-91, 2005.
[6] Hebert P. D., et al., “Identification of birds through DNA barcodes,” PLoS Biol., vol/issue: 2(10), pp. e312, 2004.
[7] Hebert P. D., et al., “Biological identifications through DNA barcodes,” Proceedings of the Royal Society of London B: Biological Sciences, vol/issue: 270(1512), pp. 313-21, 2003.
[8] Witt J. D., et al., “DNA barcoding reveals extraordinary cryptic diversity in an amphipod genus: implications for desert spring conservation,” Molecular Ecology, vol/issue: 15(10), pp. 3073-82, 2006.
[9] Ball S. L. and Armstrong K. F., “DNA barcodes for insect pest identification: a test case with tussock moths (Lepidoptera: Lymantriidae),” Canadian Journal of Forest Research, vol/issue: 36(2), pp. 337-50, 2006.
[10] Zokaee S. and Faez K., “Human identification based on ECG and palmprint,” International Journal of Electrical and Computer Engineering, vol/issue: 2(2), pp. 261, 2012.
[11] Kim Y., et al., “The nucleotide: DNA sequencing and its clinical application,” Journal of oral and maxillofacial surgery, vol/issue: 60(8), pp. 924-30, 2002.
[12] Wang L. and Alexander C. A., “Applications of Automated Identification Technology in EHR/EMR,” International Journal of Public Health Science (IJPHS), vol/issue: 2(3), pp. 109-122, 2013.
[13] Orabi E. S., et al., “DNA fingerprint using smith waterman algorithm by grid computing,” in 2014 9th International Conference on Informatics and Systems (INFOS), IEEE, pp. PDC-74, 2014.
[14] Shehab S. A., et al., “Fast dynamic algorithm for sequence alignment based on bioinformatics,” International Journal of Computer Applications, vol/issue: 37(7), pp. 54-61, 2012.
[15] Adhitama P., et al., “Lexicon-Driven Word Recognition Based on Levenshtein Distance,” International Journal of Software Engineering and Its Applications, vol/issue: 8(2), pp. 11-20, 2014.
[16] Benson G., “Tandem repeats finder: a program to analyze DNA sequences,” Nucleic acids research, vol/issue: 27(2), pp. 573, 1999.
[17] Kolpakov R., et al., “Mreps: efficient and flexible detection of tandem repeats in DNA,” Nucleic acids research, vol/issue: 31(13), pp. 3672-8, 2003.
[18] Ruitberg C. M., et al., “STRBase: a short tandem repeat DNA database for the human identity testing community,” Nucleic Acids Research, vol/issue: 29(1), pp. 320-2, 2001.
BIOGRAPHIES OF AUTHORS
Likhitha C P was born in Coorg, India in 1992. She received the BCA degree in Computer Science from the Amrita Vishwa Vidyapeetham (Amrita University), Mysuru Campus, India, in 2014. Currently, she is pursuing her MCA degree in Computer Science at the Amrita Vishwa Vidyapeetham (Amrita University), Mysuru Campus, India. Her research interests include Bioinformatics and Image Processing.
Ninitha P was born in Mysuru, India in 1993. She received the BCA degree in Computer Science from the Amrita Vishwa Vidyapeetham (Amrita University), Mysuru Campus, India, in 2014. Currently, she is pursuing her MCA degree in Computer Science at the Amrita Vishwa Vidyapeetham (Amrita University), Mysuru Campus, India. Her research interests include Bioinformatics and Software Engineering.
Kanchana V was born in Mysuru, India in 1979. She received the B.Sc. degree in PMCS from the University of Mysuru, the MCA degree in Computer Science from VTU, and the MTech degree in IT from Karnataka State Open University. She has more than 11 years of academic experience. Currently, she is working as an Assistant Professor in the Department of Computer Science, Amrita Vishwa Vidyapeetham (Amrita University), Mysuru Campus, India. Her research areas include Bioinformatics, MIS, ERP and Software Engineering.