International Journal of Electrical and Computer Engineering (IJECE)
Vol. 8, No. 5, October 2018, pp. 3913-3922
ISSN: 2088-8708

DNA Pool Analysis-based Forgery-Detection of Dairy Products

Francesco Rossi 1, Paola Modesto 2, Cristina Biolatti 3, Alfredo Benso 4, Stefano Di Carlo 5, Gianfranco Politano 6, and Pierluigi Acutis 7
1,4,5,6 Department of Control and Computer Engineering, Politecnico di Torino, Italy
2,3,7 Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta, Italy

Article Info
Article history: Received December 21, 2017; Revised July 29, 2018; Accepted August 18, 2018
Keywords: Genetic Programming, CMA-ES, DNA barcoding, STR, Food Safety

ABSTRACT
Food integrity and food safety have received much attention in recent years due to the dramatic increase in the number of food frauds. In this article we focus on the problem of dairy product traceability. In particular, we propose an automatic forgery detection system able to detect frauds in milk and cheese. We investigate the use of Short Tandem Repeats analysis data, processed by a Covariance Matrix Adaptation Evolution Strategy algorithm, in order to evaluate a traceability score between the products and their producer, and to highlight possible adulterations and inconsistencies. To demonstrate the usability of the proposed heuristic algorithm in a real setup, we also present the results collected from two real Italian farms.

Copyright (c) 2018 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author:
Francesco Rossi
Politecnico di Torino, Department of Control and Computer Engineering
Corso Duca degli Abruzzi 24, 10129 Torino, Italy
Email: francesco.rossi@polito.it

1. INTRODUCTION
Food integrity and food safety have received much attention in recent years due to the dramatic increase in the number of food frauds. Traceability is a useful method to guarantee foodstuff quality and safety, to guarantee hygiene standards, and to protect consumers' choices and health. Over the past years DNA analysis has been widely recognized as an effective tool to deal with genetic traceability issues, gaining a key role in tracing and testing food origin and safety. In this article we analyze dairy products, for which one of the crucial issues is the traceability of traditional cheese. In the case of fraud, a dairy product that should be produced from milk coming from a certified farm may instead be produced using a variable amount of milk coming from unauthorized farms.

Traceability of dairy products through DNA analysis involves some technical challenges. The cheese (CH) is produced from bulk milk (BM), which contains DNA from different cows of the farm and undergoes several biochemical changes during the ripening process. In this paper, we propose a computer-assisted molecular traceability system able to analyze the origin of a traditional dairy product. We investigate the use of Short Tandem Repeats (STRs) analysis to create a DNA fingerprint of small dairy farms and to link dairy products (milk and cheese) to the corresponding producer. So far, STR analysis has been applied to blood samples for population genetics analysis [1, 2, 3, 4, 5], or to milk samples in order to identify quantitative trait loci (QTL) associated with traits in animal science [6, 7]. However, the application of STR analysis to trace the origin of dairy products is a different and more complex issue. Dairy products contain DNA belonging to several different individuals, which prevents single-animal traceability.
In the literature, dairy product traceability has mainly been addressed by studying fatty acid and triacylglycerol content using gas chromatography [8]. So far, STR marker analysis has proved valid for detecting adulteration in dairy products only in a mono-breed setup [9].

Journal Homepage: http://iaescore.com/journals/index.php/IJECE
DOI: 10.11591/ijece.v8i5.pp3913-3922
To the best of our knowledge, this work is the first attempt to explore the use of pooled STR analysis for traceability of food products. Two farms owning different cow breeds were included in this study. First, the DNA of each animal was analyzed to compute a DNA signature based on the analysis of known STR loci. The same STR analysis was then performed on the final dairy products. The obtained STR genetic datasets were analyzed through a Covariance Matrix Adaptation Evolution Strategy (CMA-ES) algorithm in order to evaluate the correlation (and therefore traceability) between the dairy products and the corresponding set of animals that contributed to their production. As an outcome, the proposed algorithm was able to highlight possible adulterations and/or inconsistencies. Results showed that bulk milk and derived cheese present an STR profile composed of a subgroup of the STRs identified in the animals the dairy product originated from, and that the profile could be efficiently used to trace the origin of the dairy product.

2. RESEARCH METHOD
In this section, we describe the procedure followed to generate the STR datasets, and we present the proposed Computer-Assisted Molecular Traceability system and its implementation based on the CMA-ES [10] algorithm available in R [11].

2.1. STR Dataset
Two farms with different geographic locations and cow breeds were considered for the tuning of the method. At the beginning of the study, appointed veterinarians collected blood and milk samples from each cow. Afterwards, they sampled BM and CH monthly, for 12 months in the first farm and 11 months in the second one. All collected samples were cold-stored for the tuning of the analysis protocol and the choice of the best genotyping process.
The main steps of the STRs selection and data generation can be summarized as follows:
- Sample Collection: DNA extraction from blood, milk somatic cells and cheese collected during the months;
- STRs selection: from a panel of 280 available STRs (from literature), 20 STRs were chosen taking into account some of their characteristics, as well as other technical parameters related to the tuning phase of the analysis protocol (the STR selection process is proprietary and, at the moment, it cannot be fully disclosed);
- Genotyping Process: capillary electrophoresis using a 3130 Genetic Analyzer (Applied Biosystems) and fragment sizing using the STRand software [12];
- Data extraction: the peak height of each allele, in relative fluorescence units (RFU) of the electropherogram track, was considered as an indication of its quantity and used in the following analyses.

Once the genotyping process was completed, the obtained raw data were organized in a tabular format (Table 1) reporting the allele frequencies for each STR and for each cow. The notation in Table 1 must be read as follows:
- $n$ is the number of processed STRs;
- $m$ is the number of cows available within the examined farm;
- $a_{(i,j)}$ ($i \in [1, m]$, $j \in [1, n]$) is the specific allele dimension (bp) of the $i$-th cow for the $j$-th STR. This notation includes the indication of whether the polymorphism is heterozygote ($a_{(i,j)x} \neq a_{(i,j)y}$) or homozygote ($a_{(i,j)x} = a_{(i,j)y}$).

Similarly, the BM and CH genotyping pool analysis data were organized in a tabular way (Table 2). However, differently from Table 1, the information associated with each cell $a^{j}_{P}$ ($P \in \{BM, CH\}$, $j \in [1, n]$) of the table is a vector including all the allele values obtained from the genotyping process of the pool $P$ for the $j$-th STR. Finally, the absolute RFU peak heights ($h$) of each allele for each cow of the farm, for BM and for CH, were organized according to Table 3.
Finally, all tabular data were stored as text files in comma-separated values (CSV) format.
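The per-cow genotype table described above can be loaded from CSV with a few lines of Python. This is only an illustrative sketch: the paper does not specify the CSV layout, so the column names, the "alleleX/alleleY" cell encoding, the sample values, and the helper names `load_genotypes` and `is_homozygote` are all assumptions made for the example.

```python
import csv
import io

# Hypothetical CSV layout (not specified in the paper): one row per cow,
# one column per STR, each cell holding the two allele sizes in bp as "x/y".
SAMPLE_CSV = """cow,STR1,STR2
COW1,180/184,121/121
COW2,182/184,119/125
"""


def load_genotypes(text):
    """Return {cow: {str_name: (allele_x, allele_y)}} parsed from CSV text."""
    genotypes = {}
    for row in csv.DictReader(io.StringIO(text)):
        cow = row.pop("cow")
        genotypes[cow] = {
            locus: tuple(int(allele) for allele in cell.split("/"))
            for locus, cell in row.items()
        }
    return genotypes


def is_homozygote(genotype, locus):
    """A cow is homozygote at a locus when its two alleles coincide."""
    allele_x, allele_y = genotype[locus]
    return allele_x == allele_y


farm = load_genotypes(SAMPLE_CSV)
print(farm["COW1"]["STR1"])                  # (180, 184)
print(is_homozygote(farm["COW1"], "STR2"))   # True
```

A nested dictionary keeps the cow-by-STR structure of Table 1 explicit; a pool table (Table 2) would need a variable-length list per cell instead of a pair.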
Table 1. Example of farm data organization. Here the a(i,j)x, a(i,j)y notation represents the two alleles for each cow in each STR.

Cows    STR1               STR2               STR3               ...   STRn
COW1    a(1,1)x, a(1,1)y   a(1,2)x, a(1,2)y   a(1,3)x, a(1,3)y   ...   a(1,n)x, a(1,n)y
COW2    a(2,1)x, a(2,1)y   a(2,2)x, a(2,2)y   a(2,3)x, a(2,3)y   ...   a(2,n)x, a(2,n)y
COW3    a(3,1)x, a(3,1)y   a(3,2)x, a(3,2)y   a(3,3)x, a(3,3)y   ...   a(3,n)x, a(3,n)y
...     ...                ...                ...                ...   ...
COWm    a(m,1)x, a(m,1)y   a(m,2)x, a(m,2)y   a(m,3)x, a(m,3)y   ...   a(m,n)x, a(m,n)y

Table 2. BM and CH data organization. Here a^j_P represents the pool P allele vector for each STR.

Pool    STR1     STR2     STR3     ...   STRn
BM      a^1_BM   a^2_BM   a^3_BM   ...   a^n_BM
CH      a^1_CH   a^2_CH   a^3_CH   ...   a^n_CH

2.2. Computer-Assisted Molecular Traceability
Our first experiments evaluated the ability to trace dairy products using well-known software commonly used in genetic distance analysis, such as FSTAT [13], PHYLIP [14] and SMOGD [15], and then resorting to STRUCTURE [16]. However, results showed that these tools were not well suited to the intended purpose. They usually apply a Bayesian approach to assign a sample genotype to a specific dataset representing the candidate group of origin. While they work well on diploid data (i.e., only two alleles), they did not perform properly in the experimental setup considered in this paper, due to the presence of a variable number of alleles for each STR in every sample (e.g., milk and cheese pooled DNA samples).

Therefore, we decided to implement a new approach able to detect whether the BM or CH fingerprint can be traced and compared with the genetic pool characteristics of the producing farm. Our method is, in essence, an automatic heuristic procedure based on the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) algorithm.
The heuristic is employed to estimate the likelihood that an STR profile of BM or CH originates from a combination of the STR profiles of the cows from which the dairy product was produced. The next subsection provides the general principles of the CMA-ES, which are necessary to better understand the proposed computer-assisted molecular traceability method described afterwards.

2.2.1. CMA-ES algorithm
The covariance matrix adaptation evolution strategy (CMA-ES) is an optimization method first proposed by Hansen, Ostermeier, and Gawelczyk [17] and further developed in subsequent years [18, 19]. The CMA-ES explores a solution space exploiting a covariance matrix, closely related to the inverse Hessian on convex-quadratic functions. The approach is particularly suited to solve difficult non-linear, non-convex, and non-separable problems of at least moderate dimensionality (i.e., $n \in [10, 100]$). In CMA-ES, iteration steps are called generations due to its biological foundations. The value of a generic algorithm parameter $y$ during generation $g$ is denoted by $y^{(g)}$. The mean vector $m^{(g)} \in \mathbb{R}^{n}$ represents the favorite, most promising solution so far. The step size $\sigma^{(g)} \in \mathbb{R}_{+}$ controls the step length, and the covariance matrix $C^{(g)} \in \mathbb{R}^{n \times n}$ determines the shape of the distribution ellipsoid in the search space. The goal of the covariance matrix adaptation is to fit the search distribution to the contour lines of the objective function $f$ to be minimized; initially, $C^{(0)} = I$.

One of the main characteristics of the CMA-ES is that, unlike most common heuristic optimization methods [20], it requires almost no parameter tuning for its application. The choice of its internal parameters is not left to the user. Notably, the default population size is comparatively small to allow for fast convergence.
Restarts with increasing population size have been shown [21] to improve the global search performance, and are nowadays included as an option in the standard algorithm. In this research we used the CMA-ES package developed in R [10].
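For readers unfamiliar with the method, the roles of the mean, step size and covariance matrix can be summarized by the standard CMA-ES sampling and recombination equations. These come from the general CMA-ES literature rather than from this paper; the symbols $\lambda$ (population size), $\mu$ (number of selected candidates) and $w_i$ (recombination weights) are standard notation that the text above does not introduce, and $m^{(g)}$ here is the mean vector, not the number of cows.

```latex
% Sampling of the lambda candidate solutions at generation g+1
x_k^{(g+1)} \sim m^{(g)} + \sigma^{(g)} \, \mathcal{N}\!\left(0, C^{(g)}\right),
\qquad k = 1, \dots, \lambda

% New mean: weighted average of the mu best candidates, ranked by f
m^{(g+1)} = \sum_{i=1}^{\mu} w_i \, x_{i:\lambda}^{(g+1)},
\qquad \sum_{i=1}^{\mu} w_i = 1, \quad w_1 \ge \dots \ge w_\mu > 0
```

Here $x_{i:\lambda}^{(g+1)}$ denotes the $i$-th best of the $\lambda$ sampled candidates according to the objective function $f$. The update rules for $\sigma^{(g)}$ and $C^{(g)}$ are more involved and are described in [18].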
Table 3. The height of the RFU allele peaks (h instead of a) in each STR for each cow.

RFU      STR1               STR2               STR3               ...   STRn
COW1_h   h(1,1)x, h(1,1)y   h(1,2)x, h(1,2)y   h(1,3)x, h(1,3)y   ...   h(1,n)x, h(1,n)y
COW2_h   h(2,1)x, h(2,1)y   h(2,2)x, h(2,2)y   h(2,3)x, h(2,3)y   ...   h(2,n)x, h(2,n)y
COW3_h   h(3,1)x, h(3,1)y   h(3,2)x, h(3,2)y   h(3,3)x, h(3,3)y   ...   h(3,n)x, h(3,n)y
...      ...                ...                ...                ...   ...
COWm_h   h(m,1)x, h(m,1)y   h(m,2)x, h(m,2)y   h(m,3)x, h(m,3)y   ...   h(m,n)x, h(m,n)y
BM_h     h^1_BM             h^2_BM             h^3_BM             ...   h^n_BM
CH_h     h^1_CH             h^2_CH             h^3_CH             ...   h^n_CH

2.2.2. Computer-assisted molecular traceability pipeline
In this study we assume that, if a certain set of cows produced the BM or CH, then the BM or CH genetic STR profile should be a linear combination of the STR profiles of those cows. Under this postulate, the automated forgery detection we propose is composed of two steps: data normalization and heuristic simulation.

The purpose of the data normalization step is to preprocess the raw RFU data (see Table 3) of a specific dairy product (CH or BM pool analysis) and those of the profiles of the cows belonging to the declared farm. This makes them comparable and allows us to perform forgery detection. All RFU peak profiles are therefore normalized to the range [0, 1], producing the normalized dataset reported in Table 4, where:

\[ H_{(i,j)} = \left( \frac{h_{(i,j)x}}{\max(h_{(i)x})}, \; \frac{h_{(i,j)y}}{\max(h_{(i)y})} \right) \tag{1} \]

is the normalized pair of allele RFU peaks for cow $i$ and STR $j$, and

\[ H^{j}_{P} = \frac{h^{j}_{P}}{\max(h_{P})} \tag{2} \]

is the normalized vector of allele RFU peaks for pool $P$ (BM or CH) and STR $j$.

Table 4. Normalized cows and pool (BM and CH) STR-RFU peak tabular data.

Normalized   STR1               STR2               STR3               ...   STRn
COW1_H       H(1,1)x, H(1,1)y   H(1,2)x, H(1,2)y   H(1,3)x, H(1,3)y   ...   H(1,n)x, H(1,n)y
COW2_H       H(2,1)x, H(2,1)y   H(2,2)x, H(2,2)y   H(2,3)x, H(2,3)y   ...   H(2,n)x, H(2,n)y
COW3_H       H(3,1)x, H(3,1)y   H(3,2)x, H(3,2)y   H(3,3)x, H(3,3)y   ...   H(3,n)x, H(3,n)y
...          ...                ...                ...                ...   ...
COWm_H       H(m,1)x, H(m,1)y   H(m,2)x, H(m,2)y   H(m,3)x, H(m,3)y   ...   H(m,n)x, H(m,n)y
BM_H         H^1_BM             H^2_BM             H^3_BM             ...   H^n_BM
CH_H         H^1_CH             H^2_CH             H^3_CH             ...   H^n_CH

The proposed forgery detection heuristic works by analyzing the normalized data reported in Table 4. Our technique assumes that the amount of milk from each cow used in the production of the analyzed dairy product is unknown. The goal of the heuristic is to find the best weighted combination of cows (W) such that the sum of the weighted cows' STR profiles produces a pattern as similar as possible to that of the analyzed dairy product. As an output score, the proposed model returns the sum of the squared errors (SSE) of the differences between the alleles of the expected milk or cheese STR profile and the predicted one, multiplied by two penalty coefficients. The first penalty (P1) is the percentage of alleles that are included in the STR profile of the dairy product but are not present in any cow's STR profile. The second penalty (P2) is the percentage of alleles available in the cow profiles but not detected in the genotyping process of the pool. In other words, P1 represents the possible introduction of a forgery, while P2 estimates the loss of alleles from the cows pattern
due, for example, to the ripening process or the sample collection procedure. The outline of the proposed method is shown in Figure 1.

Figure 1. Global scheme of the Forgery Detection Model

The algorithm receives two main inputs:
- COW_H is the $m \times n$ matrix containing all normalized data for the cows composing the farm (Table 4). This table includes all data required to identify the target production farm for the dairy product under investigation;
- BM_H/CH_H is a vector reporting the normalized STR RFU peaks for the dairy product under investigation (BM or CH), following the format reported in Table 4.

As a first step, the algorithm exploits the optimization capability of the CMA-ES to search for the best linear combination of the STR RFU peaks of the cows composing the farm (COW_H) able to generate the STR RFU profile of the dairy product under investigation (BM_H or CH_H). This translates into the computation of a vector W of size $m$ representing the estimated contribution of each cow to the target dairy product. The CMA-ES starts from an initial weight vector equal to 0 (W = 0) and then works over several generations until a stop condition is reached: maximum number of iterations or convergence. The best solution W identified by the CMA-ES is finally used to calculate the predicted profile for the target dairy product as:

\[ pP = W \cdot COW_H \tag{3} \]

where $pP \in \{ pBM_H, pCH_H \}$.

The computed profile (pP) and the original pool profile (BM_H or CH_H) can then be compared to calculate the sum of squared errors ($SSE_{BM}$ or $SSE_{CH}$) between the two profiles.
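The normalization of Eqs. (1)-(2), the weighted combination of Eq. (3) and the SSE can be sketched in plain Python as follows. This is an illustrative reading, not the authors' R implementation: the function names, the flat-list representation of a profile, and the assumption that all profiles are already aligned on the same (STR, allele) positions are ours.

```python
def normalize_cow(peaks):
    """Eq. (1): scale a cow's x and y peaks by that cow's largest x and y peak.

    peaks: list of (h_x, h_y) raw RFU pairs, one pair per STR.
    """
    max_x = max(h_x for h_x, _ in peaks)
    max_y = max(h_y for _, h_y in peaks)
    return [(h_x / max_x, h_y / max_y) for h_x, h_y in peaks]


def normalize_pool(peaks):
    """Eq. (2): scale every pool peak by the largest peak found in the pool.

    peaks: list of per-STR lists of raw RFU values (variable length).
    """
    top = max(h for locus in peaks for h in locus)
    return [[h / top for h in locus] for locus in peaks]


def predict_profile(weights, cow_profiles):
    """Eq. (3): pP = W . COW_H, one weight per cow.

    cow_profiles: one flat list of normalized peaks per cow; every list is
    assumed to be aligned on the same (STR, allele) positions.
    """
    length = len(cow_profiles[0])
    return [
        sum(w * profile[k] for w, profile in zip(weights, cow_profiles))
        for k in range(length)
    ]


def sse(observed, predicted):
    """Sum of squared errors between two aligned profiles."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


# Toy herd of two cows observed on three (STR, allele) positions.
cows = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
predicted = predict_profile([0.5, 0.5], cows)
print(predicted)                         # [0.5, 0.5, 0.5]
print(sse([0.5, 0.5, 0.5], predicted))   # 0.0
```

In the toy example the pool is an exact half-and-half mix of the two cows, so the SSE is zero; any allele pattern that no weighting of the herd can reproduce leaves a positive residual.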
The SSE, corrected by the two penalty scores P1 and P2, is then used to compute the final forgery score of the dairy product with respect to the selected farm as:

\[ SCORE = SSE_P \cdot P1 \cdot P2 \tag{4} \]

where $SSE_P \in \{ SSE_{BM}, SSE_{CH} \}$, $SSE_{BM} = SSE(BM_H, pBM_H)$ and $SSE_{CH} = SSE(CH_H, pCH_H)$.

In case of fraud, an allele that appears in a specific STR of BM or CH may not appear in any STR allele of the cows; the RFU peak of that allele is then taken into account in the SSE computation against a default value equal to 0. Conversely, if an allele that occurs in an STR of a cow does not appear in the STR allele vector of the pool, the routine automatically inserts a default value equal to 0 for that allele in the pool's STR vector. This last circumstance is possible when, during the genotyping process or due to the ripening of the cheese, some alleles are lost or not amplified enough.
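The zero-filling rule and the two penalties can be illustrated with a small sketch. Several details are assumptions on our side: profiles are stored as dictionaries keyed by a hypothetical (STR name, allele size) pair, the penalties are expressed as fractions in [0, 1], and the score is the literal product described in the text. Under this literal reading the score vanishes whenever one penalty is zero; the paper does not state how the percentages are scaled before multiplication.

```python
def align(pool, predicted):
    """Place both profiles on the union of their alleles, filling gaps with 0."""
    keys = sorted(set(pool) | set(predicted))
    observed = [pool.get(key, 0.0) for key in keys]
    expected = [predicted.get(key, 0.0) for key in keys]
    return keys, observed, expected


def penalties(pool, herd):
    """Return (P1, P2) as fractions.

    P1: share of pool alleles that no cow carries (possible forgery).
    P2: share of herd alleles missing from the pool (allele loss).
    """
    pool_keys, herd_keys = set(pool), set(herd)
    p1 = len(pool_keys - herd_keys) / len(pool_keys)
    p2 = len(herd_keys - pool_keys) / len(herd_keys)
    return p1, p2


def forgery_score(sse_value, p1, p2):
    """Literal reading of the score: SSE multiplied by both penalties."""
    return sse_value * p1 * p2


# Toy profiles keyed by (STR name, allele size in bp); values are made up.
pool = {("STR1", 180): 1.0, ("STR1", 190): 0.5, ("STR2", 121): 0.8}
herd = {("STR1", 180): 0.9, ("STR2", 121): 0.8, ("STR2", 125): 0.2}

keys, observed, expected = align(pool, herd)
sse_value = sum((o - e) ** 2 for o, e in zip(observed, expected))
p1, p2 = penalties(pool, herd)
score = forgery_score(sse_value, p1, p2)
print(keys)
print(round(sse_value, 4), round(p1, 4), round(p2, 4), round(score, 4))
```

In this toy case the allele 190 at STR1 is present only in the pool (it contributes to P1) and the allele 125 at STR2 only in the herd (it contributes to P2); both are compared against 0 in the SSE.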
The heuristic simulation is expected to return a score as close as possible to 0 when the dairy product properly matches the cows of a farm. Otherwise, in case of fraud, we expect the automatic forgery detection to return a higher score, since the match should be much less consistent due to incoherent cow vs. dairy product STR patterns.

In order to perform its optimization, the CMA-ES requires the definition of a fitness function. Essentially, our goal is to minimize the SSE between the BM or CH genetic profile and the corresponding predicted one, computed as a linear combination of the cows' profiles. The SSE can therefore be exploited as an efficient fitness function for our goal. The temporary weight vector generated iteratively during generation $g$ is multiplied by the cows' profiles to predict the temporary pool pattern. The fitness function returning the SSE value is computed as follows:

\[ \mathit{Fitness} = SSE^{(g)}_{P} \tag{5} \]

where $SSE^{(g)}_{P} \in \{ SSE^{(g)}_{BM}, SSE^{(g)}_{CH} \}$, with $SSE^{(g)}_{BM} = SSE(BM_H, pBM_H^{(g)})$ and $SSE^{(g)}_{CH} = SSE(CH_H, pCH_H^{(g)})$; $pBM_H^{(g)}$ or $pCH_H^{(g)} = W^{(g)} \cdot COW_H$; and $W^{(g)}$ is the temporary weight vector computed by the CMA-ES at generation $g$ of the optimization process.

One more important feature implemented in the software concerns W. Since in a farm, during the lactation period, each cow contributes an unknown amount of milk $w$ (which is essentially what the heuristic routine tries to estimate), we assume that every contribution cannot fall outside a predefined range:

\[ \mathit{lower\ boundary} < w < \mathit{upper\ boundary} \tag{6} \]

with $\mathit{lower\ boundary} = 0.5/m$ and $\mathit{upper\ boundary} = \min(3/m, 1)$.

The weight boundary condition is shown in Figure 2.
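A compact sketch of the SSE fitness and of the weight bounds follows. To stay self-contained it uses a deliberately simple (1+1) evolution strategy with clipping as a stand-in optimizer; this is not CMA-ES and not the R package used in the paper. The upper bound is taken as the smaller of 3/m and 1, following the textual description that a cow yields less than three times the mean share and never the whole product. The function names, the starting point 1/m, the closed bounds, the mutation step and the toy data are all assumptions of the example.

```python
import random


def weight_bounds(m):
    """Bounds on each cow's contribution for a herd of m milking cows."""
    lower = 0.5 / m
    upper = min(3.0 / m, 1.0)
    return lower, upper


def make_fitness(cow_profiles, pool_profile):
    """Build the SSE fitness between the pool and the weighted herd profile."""
    def fitness(weights):
        predicted = [
            sum(w * profile[k] for w, profile in zip(weights, cow_profiles))
            for k in range(len(pool_profile))
        ]
        return sum((o - p) ** 2 for o, p in zip(pool_profile, predicted))
    return fitness


def clip(value, lower, upper):
    """Force a weight back into the admissible (closed) interval."""
    return max(lower, min(upper, value))


def one_plus_one_es(fitness, m, generations=2000, sigma=0.05, seed=0):
    """Stand-in optimizer (NOT CMA-ES): keep a mutated candidate if not worse."""
    rng = random.Random(seed)
    lower, upper = weight_bounds(m)
    best = [clip(1.0 / m, lower, upper)] * m
    best_fit = fitness(best)
    for _ in range(generations):
        candidate = [clip(w + rng.gauss(0.0, sigma), lower, upper) for w in best]
        candidate_fit = fitness(candidate)
        if candidate_fit <= best_fit:
            best, best_fit = candidate, candidate_fit
    return best, best_fit


# Toy herd: 4 cows on 5 aligned (STR, allele) positions, with a pool built
# from known weights so that a perfect match exists inside the bounds.
cows = [
    [1.0, 0.0, 0.5, 0.0, 0.2],
    [0.0, 1.0, 0.5, 0.3, 0.0],
    [0.4, 0.0, 1.0, 0.0, 0.6],
    [0.0, 0.7, 0.0, 1.0, 0.1],
]
true_weights = [0.4, 0.3, 0.15, 0.15]
pool = [sum(w * cow[k] for w, cow in zip(true_weights, cows)) for k in range(5)]

fitness = make_fitness(cows, pool)
start_fit = fitness([0.25] * 4)
weights, final_fit = one_plus_one_es(fitness, m=4)
print(start_fit, final_fit)
```

With the R package cited in the paper the optimizer call would differ, but the fitness function and the bounds would play the same role. Clipping is only one way to enforce the range; rejecting out-of-range candidates or penalizing them in the fitness are common alternatives.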
This boundary condition accounts for the fact that a cow cannot produce below or above a specific milk rate in relation to the number of milking cows ($m$). These constraints were chosen after analyzing several bulk milk batches and after several discussions with the farm and veterinary staff. Basically, each cow is supposed to produce more than half and less than three times the mean share of the dairy product (i.e., $1/m$). Moreover, the upper boundary cannot exceed the value 1, since a cow must not produce all the dairy product by itself. These constraints can be freely changed, and they could be used to further refine the analysis when producers provide explicit information about a particular dairy product.

Figure 2. Boundary condition for w during the CMA-ES routine

2.2.3. Experimental Setup
To demonstrate the usability of the proposed approach we designed three experiments. The first one analyzes a dairy product produced with 100% of the milk from the same farm (i.e., COW_H, BM_H or
CH_H taken from the same farm). In the second experiment, instead, we analyzed a partial forgery in which a dairy product is produced from 50% randomly selected cows from one farm and 50% randomly selected cows from the other farm. Finally, in the third experiment, we analyzed a full forgery scenario in which we compared the dairy product from one farm against the STR profiles of the cows of the second farm.

For each farm and for each of the three forgery levels, every dairy product was analyzed 24 times to highlight possible variations within the results. The whole experiment was executed in parallel on an eight-core machine (Intel Xeon CPU E5-2680 @ 2.70 GHz, 64 GB RAM, Ubuntu 14.04 LTS).

The STR dataset previously described in Section 2.1 is summarized in Table 5.

Table 5. Summary of the STR dataset used in the analysis.

Farm   No. Cows   No. Pool Samples
A      12         Bulk milk: 12; Derived cheese: 12
B      14         Bulk milk: 11; Derived cheese: 11

3. RESULT AND ANALYSIS
The main purpose of this work was to develop a new automatic methodology to highlight possible adulterations in dairy products by means of a computational heuristic analysis. Using the method described in the previous sections, we obtained the results reported in Figure 3 and Figure 4.

Figure 3 reports the mean score values computed by the proposed heuristic over the 24 repetitions for the bulk milk analysis in Farm A and Farm B. For each sampled pool, and for each month, the figure shows the scores of the three experimental setups described in Section 2.2.3, with changing forgery percentage. Figure 4 reports the results of the cheese forgery simulation following the same criteria as Figure 3.

Figure 3. Results of the mean score values for Farm A (left side) and Farm B (right side) for the BULK MILK analysis for each available month.
Black lines are related to the 100% true cows setup analysis, the blue ones to 50% adulterated milk origin, and the red ones to 100% forged milk origin.

Overall, the forgery scores agree well with the expected behavior: higher scores in case of adulteration and scores close to 0 otherwise. Moreover, the partial forgery simulations generally fall between the 100% forged and the 100% true examples. This behavior can be observed in both milk and cheese predictions. In the majority of cases, the proposed automatic forgery detection shows good accuracy, with the exception of a few examples.

A summary of the aggregated results is given in Figure 5. These box plots represent the grouped results of Figure 3 and Figure 4, respectively. In general, the scores obtained for the milk and derived cheese simulations indicate that it is possible to characterize our model with progressive cut-offs able to identify whether forgery has occurred. As indicated in the figure, the bulk milk boxes are well separated, while the cheese boxes show a less sharp separation, in particular in Farm B between the 50% forged and the 100% true groups. A plausible explanation is that the STR profiles of Farm A, that occur in the random selection
for false cows, are too similar to the correct ones, so that the forgery is clearly revealed by the scores only at a higher percentage of adulteration.

Figure 4. Results of the mean score values for Farm A (left side) and Farm B (right side) for the CHEESE analysis for each available month. Black lines are related to the 100% true cows setup analysis, the blue ones to 50% adulterated cows, and the red ones to 100% forged cows.

Figure 5. Box plots of grouped scores for Farm A and Farm B in the bulk milk and cheese analysis. Black boxes are related to the 100% true cows setup analysis, the blue ones to 50% adulterated cows, and the red ones to 100% forged cows.

The overall results for the dairy product analysis are shown in Figure 6. Here the global scores are grouped together only to show the differences between the true simulation and the other two ratios of adulteration. Notice that Farm A and Farm B are merged, as are BM and CH. The difference between the three groups (100% true, 50% forged and 100% forged) is statistically significant (p < 0.05). This result also indicates that it is possible to define cut-offs between distinct levels of the dairy product counterfeiting score (e.g., score = 1 adequately separates non-forged products from half or completely falsified ones, and score = 2.5 is a suitable cut-off for certain complete falsification).

From the obtained results, it is evident that the automatic forgery detection model implemented and described in this paper is able to identify the occurrence of irregular dairy product manufacturing and also to quantify the magnitude of the fraud. These results also suggest that this methodology may provide a useful strategy applicable to other food traceability contexts.

4. CONCLUSION
In this paper we proposed an innovative automatic forgery detection method based on a heuristic procedure.
This system is able to measure the likelihood that a traditional dairy product originates from a known farm, thus providing a measure of the level of potential counterfeiting. We investigated the use of Short
Tandem Repeats associated with their relative fluorescence units (RFU) to estimate the quantity of each individual that contributed to the final pool. We employed a Covariance Matrix Adaptation Evolution Strategy algorithm in order to predict the traceability between dairy products and the corresponding producer. The results obtained in several experiments were very good and encourage the research community to investigate the application of this method to other foodstuff traceability issues.

Figure 6. Global simulation scores. Both farms and dairy products are grouped. The black box is related to the 100% true cows setup analysis, the blue one to 50% adulterated cows, and the red one to 100% forged cows. The * indicates a significant difference between groups (p < 0.05).

ACKNOWLEDGEMENT
This work was supported by Italian Ministry of Health grant IZS PLV 01/14 RC.

REFERENCES
[1] H. Megens, et al., "Biodiversity of pig breeds from China and Europe estimated from pooled DNA samples: differences in microsatellite variation between two areas of domestication," Genetics Selection Evolution, vol. 40, no. 1, pp. 103-128, 2008.
[2] H. Schnack, et al., "Accurate determination of microsatellite allele frequencies in pooled DNA samples," European Journal of Human Genetics, vol. 12, no. 11, pp. 925-934, 2004.
[3] G. Skalski, et al., "Evaluation of DNA Pooling for the Estimation of Microsatellite Allele Frequencies: A Case Study Using Striped Bass (Morone saxatilis)," Genetics, vol. 173, no. 2, pp. 863-875, 2006.
[4] C. Likhitha, P. Ninitha, V. Kanchana, "DNA Bar-coding: A Novel Approach for Identifying an Individual Using Extended Levenshtein Distance Algorithm and STR analysis," International Journal of Electrical and Computer Engineering (IJECE), vol. 6, no. 3, pp. 1133-1139, 2016.
[5] M. Widyanto, R. N. Hartono, N.
Soedarsono, "A Novel Human STR Similarity Method using Cascade Statistical Fuzzy Rules with Tribal Information Inference," International Journal of Electrical and Computer Engineering (IJECE), vol. 6, no. 6, pp. 3103-3111, 2016.
[6] A. Bagnato, et al., "Quantitative Trait Loci Affecting Milk Yield and Protein Percentage in a Three-Country Brown Swiss Population," Journal of Dairy Science, vol. 91, no. 2, pp. 767-783, 2008.
[7] E. Lipkin, et al., "Quantitative Trait Locus Mapping in Chickens by Selective DNA Pooling with Dinucleotide Microsatellite Markers by Using Purified DNA and Fresh or Frozen Red Blood Cells as Applied to Marker-Assisted Selection," Poultry Science, vol. 81, no. 3, pp. 283-292, 2002.
[8] J. Park, et al., "Determination of the Authenticity of Dairy Products on the Basis of Fatty Acids and Triacylglycerols Content using GC Analysis," Korean Journal for Food Science of Animal Resources, vol. 34, no. 3, pp. 316-324, 2014.
[9] M. Sardina, et al., "Application of microsatellite markers as potential tools for traceability of Girgentana goat breed dairy products," Food Research International, vol. 74, pp. 115-122, 2015.
[10] H. Trautmann, O. Mersmann, D. Arnu, "cmaes: Covariance Matrix Adapting Evolutionary Strategy," R package version 1.0-11, 2011.
[11] R Core Team, "R: A language and environment for statistical computing," Vienna, Austria: R Foundation for Statistical Computing, 2014.
[12] R. Toonen, S. Hughes, "Increased throughput for fragment analysis on an ABI PRISM 377 automated sequencer using a membrane comb and STRand software," Biotechniques, vol. 31, no. 6, pp. 1320-1324, 2001.
[13] J. Goudet, "FSTAT, a program to estimate and test gene diversities and fixation indices (version 2.9.3)," Available: http://www.unil.ch/izea/softwares/fstat.html, 2001.
[14] J. Felsenstein, "PHYLIP (Phylogeny Inference Package)," Available: http://evolution.genetics.washington.edu/phylip.html, 2005.
[15] N. Crawford, "smogd: software for the measurement of genetic diversity," Molecular Ecology Resources, vol. 10, no. 3, pp. 556-557, 2010.
[16] M. Hubisz, et al., "Inferring weak population structure with the assistance of sample group information," Molecular Ecology Resources, vol. 9, no. 5, pp. 1322-1332, 2009.
[17] N. Hansen, A. Ostermeier, A. Gawelczyk, "On the Adaptation of Arbitrary Normal Mutation Distributions in Evolution Strategies: The Generating Set Adaptation," ICGA, 1995, pp. 57-64.
[18] N. Hansen, A. Ostermeier, "Completely Derandomized Self-Adaptation in Evolution Strategies," Evolutionary Computation, vol. 9, no. 2, pp. 159-195, 2001.
[19] N. Hansen, S. Müller, P. Koumoutsakos, "Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES)," Evolutionary Computation, vol. 11, no. 1, pp. 1-18, 2003.
[20] I. Ismail, A. Hanif Halim, "Comparative Study of Meta-heuristics Optimization Algorithm using Benchmark Function," International Journal of Electrical and Computer Engineering (IJECE), vol. 7, no. 3, pp. 1643-1650, 2017.
[21] A. Auger, N.
Hansen, "A restart CMA evolution strategy with increasing population size," Evolutionary Computation, 2005. The 2005 IEEE Congress on. IEEE, vol. 2, pp. 1769-1776, 2005.