International Journal of Electrical and Computer Engineering (IJECE)
Vol. 8, No. 5, October 2018, pp. 3913–3922
ISSN: 2088-8708
Institute of Advanced Engineering and Science, www.iaesjournal.com
DNA Pool Analysis-based Forgery-Detection of Dairy Products
Francesco Rossi¹, Paola Modesto², Cristina Biolatti³, Alfredo Benso⁴, Stefano Di Carlo⁵, Gianfranco Politano⁶, and Pierluigi Acutis⁷
¹,⁴,⁵,⁶ Department of Control and Computer Engineering, Politecnico di Torino, Italy
²,³,⁷ Istituto Zooprofilattico Sperimentale del Piemonte Liguria e Valle d'Aosta, Italy
Article Info
Article history:
Received December 21, 2017
Revised July 29, 2018
Accepted August 18, 2018
Keywords:
Genetic Programming
CMA-ES
DNA barcoding
STR
Food Safety
ABSTRACT
Food integrity and food safety have received much attention in recent years due to the dramatically increasing number of food frauds. In this article we focus on the problem of dairy product traceability. In particular, we propose an automatic forgery detection system able to detect frauds in milk and cheese. We investigate the use of Short Tandem Repeats analysis data, processed by a Covariance Matrix Adaptation Evolution Strategy algorithm, in order to evaluate a traceability score between the products and their producer, and to highlight possible adulterations and inconsistencies. To demonstrate the usability of the proposed heuristic algorithm in a real setup, we also present the results collected from two real Italian farms.
Copyright © 2018 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Francesco Rossi
Politecnico di Torino, Department of Control and Computer Engineering
Corso Duca degli Abruzzi 24, 10129 Torino, Italy
Email: francesco.rossi@polito.it
1. INTRODUCTION
Food integrity and food safety have received much attention in recent years due to the dramatically increasing number of food frauds. Traceability is a useful method to guarantee foodstuff quality and safety, to guarantee hygiene standards, and to protect consumers' choices and health. Over the past years, DNA analysis has been widely recognized as an effective tool for genetic traceability, gaining a key role in tracing and testing food origin and safety. In this article we analyze dairy products, for which one of the crucial issues is the traceability of traditional cheese. In the case of fraud, a dairy product that should be made from milk of a certified farm may instead be produced using a variable amount of milk from unauthorized farms.

Traceability of dairy products through DNA analysis involves some technical challenges. The cheese (CH) is produced from bulk milk (BM), which contains DNA from different cows of the farm and undergoes several biochemical changes during the ripening process. In this paper, we propose a computer-assisted molecular traceability system able to analyze the origin of a traditional dairy product. We investigate the use of Short Tandem Repeats (STRs) analysis to create a DNA fingerprint of small dairy farms and to link dairy products (milk and cheese) to the corresponding producer.

So far, STR analysis has been applied to blood samples for population genetics analysis [1, 2, 3, 4, 5], or to milk samples in order to identify quantitative trait loci (QTL) associated with traits in animal science [6, 7]. However, applying STR analysis to trace the origin of dairy products is a different and more complex issue. Dairy products contain the DNA of several different individuals, which prevents single-animal traceability. In the literature, dairy product traceability has mainly been addressed by studying the fatty acid and triacylglycerol content using gas chromatography [8]. So far, STR marker analysis has proved valid for detecting adulteration in dairy products only in mono-breed setups [9].
Journal Homepage: http://iaescore.com/journals/index.php/IJECE
DOI: 10.11591/ijece.v8i5.pp3913-3922
To the best of our knowledge, this work is the first attempt to explore the use of pooled STR analysis for the traceability of food products. Two farms owning different cow breeds were included in this study. First, the DNA of each animal was analyzed to compute a DNA signature based on the analysis of known STR loci. The same STR analysis was then performed on the final dairy products. The resulting STR genetic datasets were analyzed through a Covariance Matrix Adaptation Evolution Strategy (CMA-ES) algorithm in order to evaluate the correlation (and therefore traceability) between the dairy products and the corresponding set of animals that contributed to their production. As an outcome, the proposed algorithm was able to highlight possible adulterations and/or inconsistencies. Results showed that bulk milk and derived cheese present an STR profile composed of a subset of the STRs identified in the animals from which the dairy product originated, and that this profile can be efficiently used to trace the origin of the dairy product.
2. RESEARCH METHOD
In this section, we describe the procedure followed to generate the STR datasets, and we present the proposed Computer-Assisted Molecular Traceability system and its implementation based on the CMA-ES [10] algorithm available in R [11].
2.1. STR Dataset
Two farms with different geographic locations and cow breeds were considered for tuning the method. At the beginning of the study, appointed veterinarians collected blood and milk samples from each cow. Afterwards, they sampled BM and CH monthly for 12 months in the first farm and for 11 months in the second one. All collected samples were cold-stored for tuning the analysis protocol and choosing the best genotyping process. The main steps of STR selection and data generation can be summarized as follows:
- Sample collection: DNA extraction from blood, milk somatic cells and cheese collected over the months;
- STR selection: from a panel of 280 STRs available in the literature, 20 STRs were chosen taking into account some of their characteristics, as well as other technical parameters related to the tuning phase of the analysis protocol (the STR selection process is proprietary and, at the moment, cannot be fully disclosed);
- Genotyping process: capillary electrophoresis using a 3130 Genetic Analyzer (Applied Biosystems) and fragment sizing using the STRand software [12];
- Data extraction: the peak height of each allele in relative fluorescence units (RFU) on the electropherogram track was taken as an indication of its quantity and used in the subsequent analyses.
Once the genotyping process was completed, the raw data were organized in a tabular format (Table 1) reporting the allele sizes for each STR and for each cow. The notation in Table 1 must be read as follows: n is the number of processed STRs; m is the number of cows available in the examined farm; $a_{(i,j)}$ ($i \in [1, m]$; $j \in [1, n]$) is the specific allele size (bp) of the i-th cow for the j-th STR. This notation also records whether the cow is heterozygous ($a_{(i,j)x} \neq a_{(i,j)y}$) or homozygous ($a_{(i,j)x} = a_{(i,j)y}$) at that locus.
Similarly, the BM and CH pool genotyping data were organized in tabular form (Table 2). However, unlike Table 1, the information associated with each cell $a^P_j$ ($P \in \{BM, CH\}$, $j \in [1, n]$) of the table is a vector including all the allele values obtained from the genotyping of pool P for the j-th STR. Finally, the absolute RFU peak height (h) of each allele for each cow of the farm, for BM and for CH was organized according to Table 3. At the end, all tabular data were stored in comma-separated values (CSV) text files.
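To make the tabular layout concrete, the sketch below reads per-sample RFU data into nested dictionaries (sample, then STR, then allele size in bp, then RFU height). The long-format CSV layout and its column names (`sample`, `str`, `allele_bp`, `rfu`) are hypothetical, since the paper only states that the tables were stored as CSV files, and the values are made up. A homozygous cow simply shows a single allele entry for that STR.

```python
import csv
import io
from collections import defaultdict

# Hypothetical long-format CSV: one row per (sample, STR, allele).
# Column names and values are illustrative assumptions, not the authors' files.
RAW = """sample,str,allele_bp,rfu
COW1,STR1,152,1830
COW1,STR1,158,1712
COW2,STR1,152,2410
BM,STR1,152,3050
BM,STR1,158,1200
"""


def load_profiles(text):
    """Return {sample: {str_name: {allele_bp: rfu}}} from long-format CSV text."""
    profiles = defaultdict(lambda: defaultdict(dict))
    for row in csv.DictReader(io.StringIO(text)):
        profiles[row["sample"]][row["str"]][int(row["allele_bp"])] = float(row["rfu"])
    return profiles


profiles = load_profiles(RAW)
print(profiles["COW1"]["STR1"])  # {152: 1830.0, 158: 1712.0}
print(profiles["COW2"]["STR1"])  # homozygous at STR1: a single allele entry
```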
Table 1. Example of farm data organization. Here the $a_{(i,j)x}, a_{(i,j)y}$ notation represents the two alleles of each cow for each STR.

| Cows | STR1 | STR2 | STR3 | ... | STRn |
|---|---|---|---|---|---|
| COW1 | $a_{(1,1)x}, a_{(1,1)y}$ | $a_{(1,2)x}, a_{(1,2)y}$ | $a_{(1,3)x}, a_{(1,3)y}$ | ... | $a_{(1,n)x}, a_{(1,n)y}$ |
| COW2 | $a_{(2,1)x}, a_{(2,1)y}$ | $a_{(2,2)x}, a_{(2,2)y}$ | $a_{(2,3)x}, a_{(2,3)y}$ | ... | $a_{(2,n)x}, a_{(2,n)y}$ |
| COW3 | $a_{(3,1)x}, a_{(3,1)y}$ | $a_{(3,2)x}, a_{(3,2)y}$ | $a_{(3,3)x}, a_{(3,3)y}$ | ... | $a_{(3,n)x}, a_{(3,n)y}$ |
| ... | ... | ... | ... | ... | ... |
| COWm | $a_{(m,1)x}, a_{(m,1)y}$ | $a_{(m,2)x}, a_{(m,2)y}$ | $a_{(m,3)x}, a_{(m,3)y}$ | ... | $a_{(m,n)x}, a_{(m,n)y}$ |
Table 2. BM and CH data organization. Here $a^P_j$ represents the allele vector of pool P for each STR.

| Pool | STR1 | STR2 | STR3 | ... | STRn |
|---|---|---|---|---|---|
| BM | $a^{BM}_1$ | $a^{BM}_2$ | $a^{BM}_3$ | ... | $a^{BM}_n$ |
| CH | $a^{CH}_1$ | $a^{CH}_2$ | $a^{CH}_3$ | ... | $a^{CH}_n$ |
2.2. Computer-Assisted Molecular Traceability
The first experiments we performed attempted to evaluate the ability to trace dairy products using well-known software commonly used in genetic distance analysis, such as FSTAT [13], PHYLIP [14] and SMOGD [15], and then STRUCTURE [16]. However, results showed that these tools were not well suited to the intended purpose. They usually apply a Bayesian approach to assign a sample genotype to a specific dataset representing the candidate group of origin. While they work well with diploid data (i.e., only two alleles per locus), they did not perform properly in the experimental setup considered in this paper because every pooled sample (e.g., milk and cheese pooled DNA samples) shows a variable number of alleles for each STR. Therefore, we decided to implement a new approach able to detect whether the BM or CH fingerprint can be traced back to, and compared with, the genetic pool characteristics of the producing farm.

Our method is essentially an automatic heuristic procedure based on the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) algorithm. The heuristic is employed to estimate the likelihood that the STR profile of BM or CH originated from a combination of the STR profiles of the cows from which the dairy product was obtained. The next subsection introduces the general principles of CMA-ES, which are needed to better understand the proposed computer-assisted molecular traceability method described afterwards.
2.2.1. CMA-ES algorithm
The covariance matrix adaptation evolution strategy (CMA-ES) is an optimization method first proposed by Hansen, Ostermeier, and Gawelczyk [17] and further developed in subsequent years [18, 19]. CMA-ES explores the solution space by exploiting a covariance matrix that is closely related to the inverse Hessian on convex-quadratic functions. The approach is particularly suited to solving difficult non-linear, non-convex, and non-separable problems of at least moderate dimensionality (i.e., $n \in [10, 100]$).
In CMA-ES, iteration steps are called generations because of the algorithm's biological foundations. The value of a generic algorithm parameter y during generation g is denoted $y^{(g)}$. The mean vector $m^{(g)} \in \mathbb{R}^n$ represents the favorite, most promising solution so far. The step size $\sigma^{(g)} \in \mathbb{R}_+$ controls the step length, and the covariance matrix $C^{(g)} \in \mathbb{R}^{n \times n}$ determines the shape of the distribution ellipsoid in the search space. The goal of the covariance matrix adaptation is to fit the search distribution to the contour lines of the objective function f to be minimized; the covariance matrix is initialized as $C^{(0)} = I$.
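As a compact reference, the core sampling and mean-update steps of CMA-ES are shown below in the standard formulation used by Hansen and colleagues; they are not specific to this paper. Here λ is the population size, μ the number of selected candidates, $x_{i:\lambda}$ the i-th best of the λ candidates, and $w_i$ positive recombination weights (unrelated to the cow weights W used later):

$$x_k^{(g+1)} \sim m^{(g)} + \sigma^{(g)}\,\mathcal{N}\left(0, C^{(g)}\right), \qquad k = 1, \dots, \lambda$$

$$m^{(g+1)} = \sum_{i=1}^{\mu} w_i\, x_{i:\lambda}^{(g+1)}, \qquad \sum_{i=1}^{\mu} w_i = 1$$

The step size $\sigma^{(g)}$ and the covariance matrix $C^{(g)}$ are then updated from the selected steps; those update rules are omitted here.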
One of the main characteristics of CMA-ES is that, unlike most common heuristic optimization methods [20], it requires almost no parameter tuning: the choice of its internal parameters is not left to the user. Notably, the default population size is comparatively small to allow fast convergence. Restarts with increasing population size have been shown [21] to improve global search performance and are nowadays included as an option in the standard algorithm. In this research we used the CMA-ES package developed in R [10].
Table 3. The height of the RFU allele peaks (h instead of a) in each STR for each cow.

| RFU | STR1 | STR2 | STR3 | ... | STRn |
|---|---|---|---|---|---|
| $COW1_h$ | $h_{(1,1)x}, h_{(1,1)y}$ | $h_{(1,2)x}, h_{(1,2)y}$ | $h_{(1,3)x}, h_{(1,3)y}$ | ... | $h_{(1,n)x}, h_{(1,n)y}$ |
| $COW2_h$ | $h_{(2,1)x}, h_{(2,1)y}$ | $h_{(2,2)x}, h_{(2,2)y}$ | $h_{(2,3)x}, h_{(2,3)y}$ | ... | $h_{(2,n)x}, h_{(2,n)y}$ |
| $COW3_h$ | $h_{(3,1)x}, h_{(3,1)y}$ | $h_{(3,2)x}, h_{(3,2)y}$ | $h_{(3,3)x}, h_{(3,3)y}$ | ... | $h_{(3,n)x}, h_{(3,n)y}$ |
| ... | ... | ... | ... | ... | ... |
| $COWm_h$ | $h_{(m,1)x}, h_{(m,1)y}$ | $h_{(m,2)x}, h_{(m,2)y}$ | $h_{(m,3)x}, h_{(m,3)y}$ | ... | $h_{(m,n)x}, h_{(m,n)y}$ |
| $BM_h$ | $h^{BM}_1$ | $h^{BM}_2$ | $h^{BM}_3$ | ... | $h^{BM}_n$ |
| $CH_h$ | $h^{CH}_1$ | $h^{CH}_2$ | $h^{CH}_3$ | ... | $h^{CH}_n$ |
2.2.2. Computer-assisted molecular traceability pipeline
In this study we assume that, if a set of cows that produced the BM or CH exists, then the BM or CH genetic STR profile should be a linear combination of the STR profiles of those cows. Under this postulate, the automated forgery detection we propose is composed of two steps: data normalization and heuristic simulation.

The purpose of the data normalization step is to preprocess the raw RFU data (see Table 3) of a specific dairy product (CH or BM pool analysis) and those of the cows belonging to the declared farm. This makes them comparable and allows us to perform forgery detection.
All RFU peak profiles are therefore normalized to the range [0, 1], producing the normalized dataset reported in Table 4, where:

$$H_{(i,j)} = \left( \frac{h_{(i,j)x}}{\max\left(h_{(i)x}\right)},\ \frac{h_{(i,j)y}}{\max\left(h_{(i)y}\right)} \right) \tag{1}$$

is the normalized pair of allele RFU peak values for cow i and STR j, and

$$H^{(j)}_P = \frac{h^{(j)}_P}{\max\left(h_P\right)} \tag{2}$$

is the normalized vector of allele RFU peaks for pool P (BM or CH) and STR j.
Table 4. Normalized cow and pool (BM and CH) STR-RFU peak tabular data.

| Normalized | STR1 | STR2 | STR3 | ... | STRn |
|---|---|---|---|---|---|
| $COW1_H$ | $H_{(1,1)x}, H_{(1,1)y}$ | $H_{(1,2)x}, H_{(1,2)y}$ | $H_{(1,3)x}, H_{(1,3)y}$ | ... | $H_{(1,n)x}, H_{(1,n)y}$ |
| $COW2_H$ | $H_{(2,1)x}, H_{(2,1)y}$ | $H_{(2,2)x}, H_{(2,2)y}$ | $H_{(2,3)x}, H_{(2,3)y}$ | ... | $H_{(2,n)x}, H_{(2,n)y}$ |
| $COW3_H$ | $H_{(3,1)x}, H_{(3,1)y}$ | $H_{(3,2)x}, H_{(3,2)y}$ | $H_{(3,3)x}, H_{(3,3)y}$ | ... | $H_{(3,n)x}, H_{(3,n)y}$ |
| ... | ... | ... | ... | ... | ... |
| $COWm_H$ | $H_{(m,1)x}, H_{(m,1)y}$ | $H_{(m,2)x}, H_{(m,2)y}$ | $H_{(m,3)x}, H_{(m,3)y}$ | ... | $H_{(m,n)x}, H_{(m,n)y}$ |
| $BM_H$ | $H^{BM}_1$ | $H^{BM}_2$ | $H^{BM}_3$ | ... | $H^{BM}_n$ |
| $CH_H$ | $H^{CH}_1$ | $H^{CH}_2$ | $H^{CH}_3$ | ... | $H^{CH}_n$ |
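A minimal sketch of the normalization in Eqs. (1) and (2) follows. It assumes heights are plain Python lists and reads $\max(h_{(i)x})$ as the largest x-allele height of cow i over all STRs (likewise for y); each pool is divided by its single largest allele peak. The function names and RFU values are illustrative.

```python
def normalize_cow(hx, hy):
    """Eq. (1): scale cow i's x- and y-allele RFU heights (one value per STR)
    by that cow's largest x- and y-allele height, respectively."""
    max_x, max_y = max(hx), max(hy)
    return [h / max_x for h in hx], [h / max_y for h in hy]


def normalize_pool(pool):
    """Eq. (2): scale every allele height of a pool (BM or CH) by the pool's
    largest peak. `pool` holds one list of allele heights per STR."""
    peak = max(h for str_heights in pool for h in str_heights)
    return [[h / peak for h in str_heights] for str_heights in pool]


# Illustrative RFU values: one cow over three STRs, and one pool.
Hx, Hy = normalize_cow([1830, 2400, 1200], [1712, 2400, 600])
H_pool = normalize_pool([[3050, 1200], [610], [2440, 1525]])
print(Hx)  # [0.7625, 1.0, 0.5]
print(H_pool)
```

Dividing by the maximum keeps every value in [0, 1] for non-negative heights, which is what makes cow and pool profiles comparable in the later SSE.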
The proposed forgery detection heuristic works by analyzing the normalized data reported in Table 4. Our technique assumes that the amount of milk from each cow used in the production of the analyzed dairy product is unknown. The goal of the heuristic is to find the best weighted combination of cows (W) such that the sum of the weighted cow STR profiles produces a pattern as similar as possible to that of the analyzed dairy product.
As an output score, the proposed model returns the sum of squared errors (SSE) of the differences between the alleles of the measured (expected) milk or cheese STR profile and the predicted one, multiplied by two penalty coefficients. The first penalty (P1) is the percentage of alleles that are included in the STR profile of the dairy product but are not present in any cow STR profile. The second penalty (P2) is the percentage of alleles available in the cow profiles but not detected in the genotyping of the pool. In other words, P1 represents the possible introduction of a forgery, while P2 estimates the loss of alleles from the cows' pattern due, for example, to the ripening process or to the sample collection procedure. The outline of the proposed method is shown in Figure 1.
Figure 1. Global scheme of the Forgery Detection Model
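The two penalties can be sketched as set operations over allele sizes per STR. The denominators (all pool alleles for P1, all distinct herd alleles for P2) and the choice to return fractions rather than percentages are assumptions, since the text does not spell them out; `penalties` is a hypothetical helper.

```python
def penalties(pool, cows):
    """Return (P1, P2) as fractions in [0, 1].

    pool: {str_name: set of allele sizes (bp) detected in BM or CH}
    cows: list of {str_name: set of allele sizes (bp)} for the declared herd
    P1: share of pool alleles absent from every cow (possible foreign milk).
    P2: share of herd alleles not detected in the pool (allele loss).
    """
    str_names = set(pool) | {s for cow in cows for s in cow}
    pool_total = pool_foreign = herd_total = herd_missing = 0
    for s in str_names:
        pool_alleles = pool.get(s, set())
        herd_alleles = set()
        for cow in cows:
            herd_alleles |= cow.get(s, set())
        pool_total += len(pool_alleles)
        pool_foreign += len(pool_alleles - herd_alleles)
        herd_total += len(herd_alleles)
        herd_missing += len(herd_alleles - pool_alleles)
    p1 = pool_foreign / pool_total if pool_total else 0.0
    p2 = herd_missing / herd_total if herd_total else 0.0
    return p1, p2


pool = {"STR1": {152, 158, 164}, "STR2": {201}}
cows = [{"STR1": {152}, "STR2": {201, 205}},
        {"STR1": {152, 158}, "STR2": {201, 209}}]
print(penalties(pool, cows))  # (0.25, 0.4)
```

Here 164 bp is the only pool allele missing from the herd (1 of 4), and 205 and 209 bp are the herd alleles not seen in the pool (2 of 5). Multiply by 100 if percentages are preferred.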
The algorithm receives two main inputs:
- $COW_H$ is the $m \times n$ matrix containing all normalized data for the cows composing the farm (Table 4). This table includes all the data required to identify the target production farm for the dairy product under investigation;
- $BM_H$/$CH_H$ is a vector reporting the normalized STR RFU peaks for the dairy product under investigation (BM or CH), following the format reported in Table 4.
As a first step, the algorithm exploits the optimization capability of CMA-ES to search for the linear combination of the STR RFU peaks of the cows composing the farm ($COW_H$) that best reproduces the STR RFU profile of the dairy product under investigation ($BM_H$ or $CH_H$). In practice, this translates into computing a vector W of size m representing the estimated contribution of each cow to the target dairy product. CMA-ES starts with an unknown weight vector initialized to 0 (W = 0) and then evolves it over several generations until a stop condition is reached: the maximum number of iterations or convergence. The best solution W identified by CMA-ES is finally used to calculate the predicted profile for the target dairy product as:

$$pP = W \cdot COW_H \tag{3}$$

where $pP \in \{pBM_H, pCH_H\}$.
The computed profile (pP) and the original pool profile ($BM_H$ or $CH_H$) can then be compared to calculate the sum of squared errors ($SSE_{BM}$ or $SSE_{CH}$) between the two profiles. This error, corrected by the two penalty scores P1 and P2, is then used to compute the final forgery score of the dairy product with respect to the selected farm as:

$$SCORE = SSE_P \cdot P1 \cdot P2 \tag{4}$$

where $SSE_P \in \{SSE_{BM}, SSE_{CH}\}$, with $SSE_{BM} = SSE(BM_H, pBM_H)$ and $SSE_{CH} = SSE(CH_H, pCH_H)$.
In case of fraud, an allele that appears in a specific STR of BM or CH may not appear in any STR allele of the cows; in that case, the RFU peak of that allele is compared, in the SSE computation, against a default value of 0. Conversely, if an allele found in an STR of a cow does not appear in the STR allele vector of the pool, the routine automatically inserts a default value of 0 for that allele in the pool's STR vector. This last circumstance occurs when some alleles are lost or not amplified enough, during the genotyping process or because of cheese ripening.
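The sketch below computes Eq. (3) allele by allele, the zero-padded SSE just described, and the score of Eq. (4), reading the multiplication signs lost in Eq. (4) as a plain product. Profiles are dictionaries `{STR: {allele_bp: normalized height}}`; representing a homozygous cow with a single allele entry is an assumption, as are all function names.

```python
def predicted_profile(weights, cows):
    """Eq. (3): pP = W . COW_H, accumulated allele by allele."""
    pred = {}
    for w, cow in zip(weights, cows):
        for str_name, alleles in cow.items():
            slot = pred.setdefault(str_name, {})
            for allele, height in alleles.items():
                slot[allele] = slot.get(allele, 0.0) + w * height
    return pred


def sse(observed, predicted):
    """SSE over the union of alleles; an allele present on one side only is
    compared against a default value of 0 on the other side."""
    total = 0.0
    for str_name in set(observed) | set(predicted):
        obs = observed.get(str_name, {})
        pred = predicted.get(str_name, {})
        for allele in set(obs) | set(pred):
            total += (obs.get(allele, 0.0) - pred.get(allele, 0.0)) ** 2
    return total


def forgery_score(sse_value, p1, p2):
    """Eq. (4): SCORE = SSE_P * P1 * P2."""
    return sse_value * p1 * p2


cows = [{"STR1": {152: 1.0, 158: 0.5}}, {"STR1": {152: 0.8}}]
bm = {"STR1": {152: 0.9, 158: 0.3, 164: 0.1}}  # 164 bp is absent from the herd
pBM = predicted_profile([0.6, 0.4], cows)
err = sse(bm, pBM)
print(round(err, 6))  # 0.0104
print(forgery_score(err, 0.25, 0.4))
```

The 164 bp allele contributes $0.1^2$ to the SSE because the herd predicts 0 for it, which is exactly the zero-padding rule described above.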
The heuristic simulation is expected to return a score as close as possible to 0 when the dairy product matches the cows of a farm. Otherwise, in case of fraud, we expect the automatic forgery detection to return a higher score, because incoherent cow vs. dairy product STR patterns produce much more inconsistency in the match.
In order to perform its optimization, CMA-ES requires the definition of a fitness function. Essentially, our goal is to minimize the SSE between the BM or CH genetic profile and the corresponding predicted one, computed as a linear combination of the cow profiles. The SSE can therefore be used directly as the fitness function.
The temporary weight vector generated at each generation g is multiplied by the cow profiles to predict the temporary pool pattern. The fitness function returning the SSE value is computed as follows:

$$Fitness = SSE^{(g)}_P \tag{5}$$

where $SSE^{(g)}_P \in \{SSE^{(g)}_{BM}, SSE^{(g)}_{CH}\}$, with

$$SSE^{(g)}_{BM} = SSE\left(BM_H, pBM_H^{(g)}\right), \qquad SSE^{(g)}_{CH} = SSE\left(CH_H, pCH_H^{(g)}\right),$$

$$pBM_H^{(g)} \text{ or } pCH_H^{(g)} = W^{(g)} \cdot COW_H,$$

and $W^{(g)}$ is the temporary weight vector computed by CMA-ES at generation g of the optimization process.
One more important feature implemented in the software concerns W. Since, during the lactation period, each cow of a farm contributes an unknown amount of milk w (which is essentially what the heuristic routine tries to estimate), we assume that each contribution cannot fall outside a predefined range:

$$\text{lower boundary} < w < \text{upper boundary} \tag{6}$$

with $\text{lower boundary} = 0.5/m$ and $\text{upper boundary} = \min(3/m, 1)$.
The weight boundary condition is shown in Figure 2. It accounts for the fact that a cow cannot produce below or above a specific milk rate in relation to the number of milking cows (m). These constraints were chosen after analyzing several bulk milk batches and after several discussions with the farm and veterinary staff. Basically, each cow is assumed to produce more than half and less than three times the mean share of the dairy product (i.e., 1/m). Moreover, the upper boundary cannot exceed 1, since a single cow cannot produce all the dairy product by itself. In any case, these constraints can be freely changed, and they could be used to further refine the analysis when producers provide explicit information about a particular dairy product.
Figure 2. Boundary condition for w during the CMA-ES routine
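The bounded fitness of Eqs. (5) and (6) can be exercised with the sketch below. To stay dependency-free it replaces CMA-ES with a much simpler (1+1) evolution strategy using the 1/5th success rule, so it is a stand-in, not the authors' optimizer (they used the R cmaes package). It assumes alleles were already aligned and zero-padded into fixed columns, so each cow is one row of `cow_h` and the pool is one vector; out-of-range candidates are clipped to the bounds of Eq. (6), with the upper bound read as min(3/m, 1). It starts from equal shares 1/m rather than W = 0 so that the start lies inside the bounds. All names are hypothetical.

```python
import random


def weight_bounds(m):
    """Eq. (6): between half and three times the mean share 1/m, capped at 1."""
    return 0.5 / m, min(3.0 / m, 1.0)


def fitness(weights, cow_h, pool_h):
    """Eq. (5): SSE between the pool profile and W . COW_H.
    cow_h: m rows (cows) x K aligned allele columns; pool_h: K values."""
    predicted = [sum(w * row[k] for w, row in zip(weights, cow_h))
                 for k in range(len(pool_h))]
    return sum((o - p) ** 2 for o, p in zip(pool_h, predicted))


def one_plus_one_es(cow_h, pool_h, generations=2000, sigma=0.1, seed=0):
    """(1+1)-ES with the 1/5th success rule: a simple stand-in for CMA-ES."""
    rng = random.Random(seed)
    m = len(cow_h)
    lo, hi = weight_bounds(m)
    parent = [1.0 / m] * m  # equal shares, inside the bounds
    best = fitness(parent, cow_h, pool_h)
    for _ in range(generations):
        # Gaussian mutation, then clipping to the weight bounds.
        child = [min(max(w + sigma * rng.gauss(0.0, 1.0), lo), hi)
                 for w in parent]
        f = fitness(child, cow_h, pool_h)
        if f < best:
            parent, best = child, f
            sigma *= 1.5           # success: enlarge the step
        else:
            sigma *= 1.5 ** -0.25  # failure: shrink; balances at 1/5 successes
    return parent, best


# Synthetic check: a pool built from known shares that lie inside the bounds.
cow_h = [[1.0, 0.5, 0.0, 0.2],
         [0.0, 1.0, 0.6, 0.0],
         [0.4, 0.0, 1.0, 0.9]]
true_w = [0.5, 0.3, 0.2]
pool_h = [sum(w * row[k] for w, row in zip(true_w, cow_h)) for k in range(4)]
w, f = one_plus_one_es(cow_h, pool_h)
print([round(x, 3) for x in w], f)
```

Because the predicted profile is linear in W, the SSE is a convex quadratic in the weights here; CMA-ES, as used in the paper, additionally learns a full covariance matrix, which matters more when the cow profiles are strongly correlated.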
2.2.3. Experimental Setup
To demonstrate the usability of the proposed approach, we designed three experiments. The first consists of analyzing the dairy product made with 100% milk from the same farm (i.e., $COW_H$ and $BM_H$ or $CH_H$ taken from the same farm). In the second experiment, we analyzed a partial forgery in which a dairy product is produced from 50% randomly selected cows from one farm and 50% randomly selected cows from the other farm. Finally, in the third experiment, we analyzed a full forgery scenario in which we compared the dairy product from one farm against the STR profile of the cows of the second farm.
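One possible way to assemble the reference herd for the 50% scenario is sketched below. It assumes that half of the declared farm's cows are replaced by randomly chosen cows of the other farm while the herd size is kept; the helper and its parameters are hypothetical. In the full-forgery scenario the reference herd is simply the other farm's cows. Drawing a new random herd on each run is one way to obtain the repeated analyses described next.

```python
import random


def mixed_reference_herd(declared, other, share_other=0.5, seed=None):
    """Hypothetical helper: keep a random (1 - share_other) fraction of the
    declared farm's cows and fill the rest with random cows of the other farm."""
    rng = random.Random(seed)
    n_other = round(len(declared) * share_other)
    kept = rng.sample(declared, len(declared) - n_other)
    swapped = rng.sample(other, n_other)  # ValueError if `other` is too small
    return kept + swapped


farm_a = [f"A{i}" for i in range(1, 13)]  # 12 cows, as in Table 5
farm_b = [f"B{i}" for i in range(1, 15)]  # 14 cows, as in Table 5
herd = mixed_reference_herd(farm_a, farm_b, seed=1)
print(sorted(herd))
```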
For each farm and for each of the three forgery levels, every dairy product was analyzed 24 times to highlight possible variations within the results. The whole experiment was executed in parallel on an eight-core machine (Intel Xeon CPU E5-2680 @ 2.70GHz, 64 GB RAM, Ubuntu 14.04 LTS). The STR dataset described in Section 2.1 is summarized in Table 5.
Table 5. Summary of the STR dataset used in the analysis.

| Farm | No. Cows | No. Pool Samples |
|---|---|---|
| A | 12 | Bulk milk: 12; Derived cheese: 12 |
| B | 14 | Bulk milk: 11; Derived cheese: 11 |
3. RESULTS AND ANALYSIS
The main purpose of this work was to develop a new automatic methodology to highlight possible adulterations in dairy products by means of a computational heuristic analysis.
Using the method described in the previous sections, we obtained the results reported in Figure 3 and Figure 4. Figure 3 reports the mean score values computed by the proposed heuristic over the 24 repetitions for the bulk milk analysis in Farms A and B. For each sampled pool and each month, the figure shows the estimates for the three experimental setups described in Section 2.2.3, i.e., for the different forgery percentages. Figure 4 reports the results of the cheese forgery simulation following the same criteria as Figure 3.
Figure 3. Results of the mean score values for Farm A (left side) and Farm B (right side) for the BULK MILK analysis for each available month. Black lines refer to the 100% true cows setup, blue lines to 50% adulterated milk origin, and red lines to 100% forged milk origin.
Overall, the forgery scores agree with the expected behavior: higher scores in case of adulteration and scores close to 0 otherwise. Moreover, the partial forgery simulations generally fall between the 100% forged and the 100% true examples. This behavior can be observed in both milk and cheese predictions. In most cases, the proposed automatic forgery detection shows good accuracy, with the exception of a few examples.
A summary of the aggregated results is given in Figure 5, whose bulk milk and cheese box plots group the results of Figure 3 and Figure 4, respectively.
In general, the scores obtained for the milk and derived cheese simulations indicate that our model can be characterized with progressive cut-offs able to identify whether forgery has occurred. As the figure shows, the bulk milk boxes are well separated, while the cheese boxes show a less sharp separation, in particular for Farm B between the 50% forged and the 100% true groups. A likely explanation is that the STR profiles of the Farm A cows drawn in the random selection of false cows are too similar to the correct ones, so that the scores reveal the forgery clearly only at a higher forgery percentage.

Figure 4. Results of the mean score values for Farm A (left side) and Farm B (right side) for the CHEESE analysis for each available month. Black lines refer to the 100% true cows setup, blue lines to 50% adulterated cows, and red lines to 100% forged cows.
Figure 5. Box plots of grouped scores for Farms A and B in the bulk milk and cheese analysis. Black boxes refer to the 100% true cows setup, blue boxes to 50% adulterated cows, and red boxes to 100% forged cows.
The overall results of the dairy product analysis are shown in Figure 6. Here the global scores are grouped together only to show the differences between the true simulation and the two adulteration ratios. Note that Farm A and Farm B are merged, as are BM and CH. The difference between the three groups (100% true, 50% forged and 100% forged) is statistically significant (p < 0.05).
This result also indicates that it is possible to define cut-offs between distinct levels of the dairy product counterfeiting score (e.g., score = 1 adequately separates non-forged products from half or completely falsified ones, while score = 2.5 is a suitable cut-off for certain complete falsification). From the obtained results, the automatic forgery detection model implemented and described in this paper is capable of identifying irregular dairy product manufacturing and of quantifying the magnitude of the fraud. These results also suggest that this methodology may provide a useful strategy for other food traceability contexts.
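The two example cut-offs can be turned into a simple decision rule, as sketched below. The labels and the handling of a score exactly at a cut-off are assumptions, and the default thresholds are only the example values reported for this two-farm dataset; `classify_score` is a hypothetical helper.

```python
def classify_score(score, genuine_cut=1.0, full_forgery_cut=2.5):
    """Map a forgery score to the three levels discussed in the text.
    Default cut-offs are the example values reported for this dataset."""
    if score < genuine_cut:
        return "not forged"
    if score < full_forgery_cut:
        return "forged (partial or complete)"
    return "complete forgery"


for s in (0.4, 1.7, 3.2):
    print(s, classify_score(s))
```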
Figure 6. Global simulation scores. Both farms and dairy products are grouped. The black box refers to the 100% true cows setup, the blue box to 50% adulterated cows, and the red box to 100% forged cows. The * indicates a significant difference between groups (p < 0.05).

4. CONCLUSION
In this paper we proposed an innovative automatic forgery detection method based on a heuristic procedure. The system measures the likelihood that a traditional dairy product originated from a known farm, thus providing a measure of the level of potential counterfeiting. We investigated the use of Short Tandem Repeats associated with their relative fluorescence units (RFU) to estimate the quantity contributed by each individual to the final pool. We employed a Covariance Matrix Adaptation Evolution Strategy algorithm in order to predict the traceability between dairy products and the corresponding producer. The results obtained in several experiments were excellent and encourage the research community to further investigate the use of this method for other foodstuff traceability issues.
ACKNOWLEDGEMENT
This work was supported by the Italian Ministry of Health grant IZS PLV 01/14 RC.
REFERENCES
[1] H. Megens, et al., "Biodiversity of pig breeds from China and Europe estimated from pooled DNA samples: differences in microsatellite variation between two areas of domestication," Genetics Selection Evolution, vol. 40, no. 1, pp. 103-128, 2008.
[2] H. Schnack, et al., "Accurate determination of microsatellite allele frequencies in pooled DNA samples," European Journal of Human Genetics, vol. 12, no. 11, pp. 925-934, 2004.
[3] G. Skalski, et al., "Evaluation of DNA Pooling for the Estimation of Microsatellite Allele Frequencies: A Case Study Using Striped Bass (Morone saxatilis)," Genetics, vol. 173, no. 2, pp. 863-875, 2006.
[4] C. Likhitha, P. Ninitha, V. Kanchana, "DNA Bar-coding: A Novel Approach for Identifying an Individual Using Extended Levenshtein Distance Algorithm and STR analysis," International Journal of Electrical and Computer Engineering (IJECE), vol. 6, no. 3, pp. 1133-1139, 2016.
[5] M. Widyanto, R. N. Hartono, N. Soedarsono, "A Novel Human STR Similarity Method using Cascade Statistical Fuzzy Rules with Tribal Information Inference," International Journal of Electrical and Computer Engineering (IJECE), vol. 6, no. 6, pp. 3103-3111, 2016.
[6] A. Bagnato, et al., "Quantitative Trait Loci Affecting Milk Yield and Protein Percentage in a Three-Country Brown Swiss Population," Journal of Dairy Science, vol. 91, no. 2, pp. 767-783, 2008.
[7] E. Lipkin, et al., "Quantitative Trait Locus Mapping in Chickens by Selective DNA Pooling with Dinucleotide Microsatellite Markers by Using Purified DNA and Fresh or Frozen Red Blood Cells as Applied to Marker-Assisted Selection," Poultry Science, vol. 81, no. 3, pp. 283-292, 2002.
[8] J. Park, et al., "Determination of the Authenticity of Dairy Products on the Basis of Fatty Acids and Triacylglycerols Content using GC Analysis," Korean Journal for Food Science of Animal Resources, vol. 34, no. 3, pp. 316-324, 2014.
[9] M. Sardina, et al., "Application of microsatellite markers as potential tools for traceability of Girgentana goat breed dairy products," Food Research International, vol. 74, pp. 115-122, 2015.
[10] H. Trautmann, O. Mersmann, D. Arnu, "cmaes: Covariance Matrix Adapting Evolutionary Strategy," R package version 1.0-11, 2011.
[11] R Core Team, "R: A language and environment for statistical computing," Vienna, Austria: R Foundation for Statistical Computing, 2014.
[12] R. Toonen, S. Hughes, "Increased throughput for fragment analysis on an ABI PRISM 377 automated sequencer using a membrane comb and STRand software," Biotechniques, vol. 31, no. 6, pp. 1320-1324, 2001.
[13] J. Goudet, "FSTAT, a program to estimate and test gene diversities and fixation indices (version 2.9.3)," Available: http://www.unil.ch/izea/softwares/fstat.html, 2001.
[14] J. Felsenstein, "PHYLIP (Phylogeny Inference Package)," Available: http://evolution.genetics.washington.edu/phylip.html, 2005.
[15] N. Crawford, "smogd: software for the measurement of genetic diversity," Molecular Ecology Resources, vol. 10, no. 3, pp. 556-557, 2010.
[16] M. Hubisz, et al., "Inferring weak population structure with the assistance of sample group information," Molecular Ecology Resources, vol. 9, no. 5, pp. 1322-1332, 2009.
[17] N. Hansen, A. Ostermeier, A. Gawelczyk, "On the Adaptation of Arbitrary Normal Mutation Distributions in Evolution Strategies: The Generating Set Adaptation," ICGA, 1995, pp. 57-64.
[18] N. Hansen, A. Ostermeier, "Completely Derandomized Self-Adaptation in Evolution Strategies," Evolutionary Computation, vol. 9, no. 2, pp. 159-195, 2001.
[19] N. Hansen, S. Müller, P. Koumoutsakos, "Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES)," Evolutionary Computation, vol. 11, no. 1, pp. 1-18, 2003.
[20] I. Ismail, A. Hanif Halim, "Comparative Study of Meta-heuristics Optimization Algorithm using Benchmark Function," International Journal of Electrical and Computer Engineering (IJECE), vol. 7, no. 3, pp. 1643-1650, 2017.
[21] A. Auger, N. Hansen, "A restart CMA evolution strategy with increasing population size," in The 2005 IEEE Congress on Evolutionary Computation, IEEE, vol. 2, pp. 1769-1776, 2005.