International Journal of Electrical and Computer Engineering (IJECE)
Vol. 11, No. 6, December 2021, pp. 4767-4773
ISSN: 2088-8708, DOI: 10.11591/ijece.v11i6.pp4767-4773

On data collection time by an electronic nose

Piotr Borowik 1, Leszek Adamowicz 2, Rafał Tarakowski 3, Krzysztof Siwek 4, Tomasz Grzywacz 5
1,2,3 Faculty of Physics, Warsaw University of Technology, Warszawa, Poland
4,5 Faculty of Electrical Engineering, Institute of Theory of Electrical Engineering, Measurement and Information Systems, Warsaw University of Technology, Warszawa, Poland

Article Info
Article history: Received Oct 17, 2020; Revised May 12, 2021; Accepted Jun 12, 2021
Keywords: Electronic nose; Measurement time; Multisensor measurement; Odor classification

ABSTRACT
We use electronic nose data of odor measurements to build machine learning classification models. The presented analysis focuses on determining the optimal measurement time, leading to the best model performance. We observe that the most valuable information for classification is available in data collected at the beginning of the adsorption phase and at the beginning of the desorption phase of the measurement. We demonstrate that complex features extracted from the sensors' response give better classification performance than using only the raw values of the sensors' response, normalized by the baseline, as features. We use a group shuffling cross-validation approach to determine the reported models' average accuracy and standard deviation.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Leszek Adamowicz
Faculty of Physics, Warsaw University of Technology
ul. Koszykowa 75, 00-662 Warszawa, Poland
Email: Leszek.Adamowicz@pw.edu.pl

1. INTRODUCTION
Electronic noses (e-noses) [1]-[3] are artificial devices that consist of an array of gas sensors supported by machine learning pattern recognition techniques.
One of the critical fields of application of e-noses is the food industry [4]-[10], for which the odor characteristics of products are one of the essential indications of product quality. The development and verification of machine learning methods for odor classification from e-nose measurement data are as important as the development of the sensors and sensor arrays that make up the e-nose hardware. Similar techniques can be employed regardless of the domain of application of the e-nose. One of the main obstacles in such development is the relatively long time needed to collect sufficient measurement data. To overcome this, one can use publicly available datasets, which have become well known as testbeds for machine learning modeling reported by multiple authors.

In the present study, we use a publicly available dataset [11] of odor measurements by an electronic nose. The same dataset was used in our previous research [12], where we focused on feature extraction and selection, optimization of the number of sensors used, and the possibility of using only a single-sensor electronic nose for classification. The original study of this dataset [13] focused on the possibility of spoilage odor detection after a very short exposure of the electronic nose to the odor sample, lasting a few seconds. Zhang and coworkers [14] used this dataset to demonstrate the performance of their proposed analytical algorithms.

In this report, we address a different optimization problem, concerning the odor measurement time. We are interested in how the classification accuracy depends on the odor measurement time. Recently, Rodriguez Gamboa and coworkers [15] examined several datasets and used deep learning and support vector machine models to demonstrate the potential of using only a part of the electronic nose measurement data for correct odor classification.

The dataset used was collected by a custom-made e-nose consisting of Taguchi-type MQ-series gas sensors. In recent years, many designs of low-cost electronic noses have been suggested, and several groups have proposed devices based on similar sensors [13], [16]-[22]. The findings presented in this report can be relevant to other applications of similar devices.

Considerable research concerning e-nose data focuses on the extraction of complex features describing the curves of the sensors' response to gas exposure. However, there are also other reports, especially those applying deep learning neural networks, in which raw measurement data are used. It is interesting to compare both approaches and to demonstrate the influence of dimensionality reduction by the principal components method.

Journal homepage: http://ijece.iaescore.com

2. METHODS AND PROCEDURES
2.1. Odor measurement
Rodriguez Gamboa and coworkers [13] presented the measurement of odor at various spoilage stages. Twenty-two bottles of commercially available wines of different varieties and vintages from four producers from the São Francisco valley (Pernambuco-Brazil) were used. Thirteen randomly selected bottles were left open for six months, which gave the population of low-quality wines. Four randomly selected bottles were left open for two weeks before measurement, and they are considered average-quality wines. The remaining five bottles are labeled as high-quality wines. In addition to these samples, samples of ethanol diluted in distilled water at six different concentrations were used, which may be considered six additional measured bottles. That gives four categories of odor to be classified.
In total, the dataset consists of measurements of 300 samples, each a collection of sensors' responses of 3,300 points per sensor.

The e-nose developed at Universidade Federal Rural de Pernambuco [13] consists of six commercially available metal-oxide gas sensors produced by Hanwei Sensors (www.hwsensor.com). Two sensors of each type (MQ-3, MQ-4, MQ-6) were used in the presented construction. During the measurement, the first 10 seconds were used to collect the baselines of the sensors' response while the e-nose was exposed to pure air. Then the prepared odor sample was pumped into the sensor chamber, and 80 seconds of the sensors' response during the adsorption phase were collected. After that, the sensors' response was collected during 90 seconds of the desorption phase, when pure air was pumped into the sensor chamber. After the measurement, the e-nose was exposed to pure air for 10 minutes to purge the experimental setup.

2.2. Classification modeling
The measurement data are series of sensors' responses expressed as their resistance R over time. As the first step of data processing, the measurement data are divided by the baseline value of the sensor resistance, R_0, collected just before the electronic nose is exposed to the measured odor sample. In Figure 1, we present an example of sensor signals collected during the measurement of an odor sample, with a schematic representation of the time span used for the extraction of modeling features. The signal is collected at a frequency of 18.5 Hz. To reduce the measurement noise, the average response over 20 observations is calculated. Then two approaches to extracting the features used for training the machine learning classification models were employed.
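The normalization and averaging steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `preprocess` is hypothetical, R_0 is assumed to be the mean resistance over the 10-second baseline phase, and the averaging is assumed to use non-overlapping blocks of 20 samples (the text only states that an average over 20 observations is calculated).

```python
import numpy as np

FS_HZ = 18.5       # sampling frequency stated in the text
BASELINE_S = 10.0  # duration of the baseline phase stated in the text
BLOCK = 20         # number of observations averaged together


def preprocess(resistance, fs_hz=FS_HZ, baseline_s=BASELINE_S, block=BLOCK):
    """Return block-centre times, averaged R/R_0 and G/G_0 for one measurement.

    resistance: array of shape (n_samples, n_sensors) with raw resistances.
    Assumption: R_0 is the mean resistance over the baseline phase.
    """
    resistance = np.asarray(resistance, dtype=float)
    n_baseline = int(round(baseline_s * fs_hz))
    r0 = resistance[:n_baseline].mean(axis=0)  # one baseline value per sensor
    r_norm = resistance / r0                   # R/R_0

    # Assumed scheme: average non-overlapping blocks of `block` samples;
    # trailing samples that do not fill a whole block are dropped.
    n_blocks = r_norm.shape[0] // block
    trimmed = r_norm[: n_blocks * block]
    r_avg = trimmed.reshape(n_blocks, block, -1).mean(axis=1)

    g_avg = 1.0 / r_avg                        # G/G_0 as the inverse of R/R_0
    t = (np.arange(n_blocks) + 0.5) * block / fs_hz  # block-centre times in s
    return t, r_avg, g_avg
```

With these assumptions, each averaged point covers roughly 1.08 seconds of signal (20 samples at 18.5 Hz), which sets the time resolution at which the measurement window can later be shortened.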
First, we used as modeling features just the magnitudes of the sensors' response relative to the baselines, R/R_0, and the inverse of these values, representing the sensors' conductance, G/G_0. In the second approach, we extracted complex features describing the sensors' response curves, e.g., the average value, the maximum value, and the maximum slope. The complete list of the features used for training the classification models is presented in our previous report [12]. Since our study focuses on the dependence of the classification performance on the measurement time by the e-nose, the modeling features are calculated using only part of the available data, in the range from the beginning of gas exposure until the considered time. We represent this by the dashed region in Figure 1.

The odor samples were prepared from 28 bottles, and each of them was used for about ten measurements. It should be noted that such an experimental procedure leads to a correlation between training observations. Hence, to obtain a reliable estimate of the classification models' performance, we applied a group shuffle cross-validation procedure, ensuring that all observations from a given bottle's odor measurements are attributed either to the training or to the testing dataset. The cross-validation procedure was performed 200 times in a loop, and the model performance results were averaged.

Figure 1. Example of sensor signals of an odor measurement: (a) normalized resistance R/R_0 and (b) normalized conductance G/G_0; the dashed rectangle schematically represents an example of the time span of data used to extract the modeling features

Two types of modeling techniques were applied: logistic regression with multinomial classification (LogReg) and support vector machine classification (SVC) with a radial basis function kernel and a one-vs-one multi-class scheme. For both algorithms, we performed two types of tests. In the first case, the modeling features, as described above, were used. In the second case, these input variables were transformed using the principal component analysis method, and only the six most important components were used as the modeling features (PCAReg, PCASVC). The prepared feature dataset was transformed using the standard scaler method.

We decided to use only these classical modeling techniques [23]. We disregarded more complex algorithms such as multilayer neural networks since the number of observations available for modeling is quite limited. In total, the dataset used [11] contains measurements of 300 odor samples. Even though more flexible modeling techniques can provide more expressive classification models, their number of fitted parameters is much higher than in the applied methods. Aggarwal [24] (page 25) indicates that the total number of training data points should be at least 2 to 3 times larger than the number of parameters in the neural network. However, the precise number of data instances depends on the specific model at hand. Hence, the simpler models that we applied should in principle be less prone to over-fitting.
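The evaluation scheme described in this section can be sketched with scikit-learn as follows. It is a hedged illustration rather than the authors' code: the function names `build_models` and `evaluate`, the test fraction of 0.25, the `max_iter` value, and the ordering of scaling before PCA are assumptions that the text does not specify. Only the four model types, the six principal components, the grouping by bottle, and the 200 repetitions come from the text.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_models(n_components=6):
    """Four pipelines named after the model labels used in the text."""
    return {
        # With the default lbfgs solver, multi-class targets are fitted
        # with the multinomial formulation.
        "LogReg": make_pipeline(StandardScaler(),
                                LogisticRegression(max_iter=1000)),
        # SVC handles multi-class problems with a one-vs-one scheme.
        "SVC": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        # PCA variants keep the first n_components components
        # (requires at least n_components input features).
        "PCAReg": make_pipeline(StandardScaler(),
                                PCA(n_components=n_components),
                                LogisticRegression(max_iter=1000)),
        "PCASVC": make_pipeline(StandardScaler(),
                                PCA(n_components=n_components),
                                SVC(kernel="rbf")),
    }


def evaluate(X, y, groups, n_splits=200, test_size=0.25, seed=0):
    """Mean and standard deviation of accuracy over group shuffle splits.

    groups holds one bottle identifier per observation, so that all
    measurements of a bottle land either in the training or the test part.
    test_size=0.25 is an assumed value.
    """
    cv = GroupShuffleSplit(n_splits=n_splits, test_size=test_size,
                           random_state=seed)
    results = {}
    for name, model in build_models().items():
        scores = cross_val_score(model, X, y, groups=groups, cv=cv,
                                 scoring="accuracy")
        results[name] = (scores.mean(), scores.std())
    return results
```

Placing the scaler and PCA inside each pipeline means they are refitted on the training part of every split, so no information from the held-out bottles leaks into the transformation.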
The modeling was performed using code written in Python 3.7 with the scikit-learn module [25].

3. RESULTS AND DISCUSSION
In Figure 2, we compare the average cross-validation accuracy of various types of models as a function of the time, counted from the beginning of the sensors' exposure to the examined odor, up to which data were used for model building. The two subfigures distinguish between the approaches to the extraction of modeling features. In Figure 2(a), the models are built on the raw data of the sensors' response relative to the baseline, while in Figure 2(b) the models are built using an extensive set of complex features [12].

The first observation from these results is that the logistic regression model exhibits the best accuracy. This holds for both considered modeling feature sets. When we compare Figures 2(a) and (b), we can also observe that models trained on complex features exhibit better classification accuracy than the models trained on the raw values of the sensors' response. The significance of the feature extraction procedures developed by the e-nose research community is thus visible.

Figure 2. The average accuracy of various classification techniques as a function of the time of measurement from which data are used for model training and testing: (a) normalized sensors' responses used as modeling features (both R/R_0 and G/G_0) and (b) complex modeling features extracted from the sensors' response curves (the first 10 seconds of measurement are not presented as this was the phase of baseline level collection)

Another important observation can be deduced from Figure 2(b). There is an abrupt increase in model performance just at the beginning of the sensors' exposure to the studied odor. Similar behavior can be noticed at the starting moment of desorption, when the sensors are again exposed to clean air. We deduce that precise measurement during these moments can give the most relevant information for odor classification. Special care [26] in the design of an electronic nose is required to provide a rapid change of the sensors' exposure to different gases while ensuring repeatability of the measurement conditions. Szczurek et al. [27] and Staymates et al. [28] reported measurements in "sniffing" mode, when frequent changes between the studied odor and pure air occur, or in the initial time of the sensors' action [29].

In Figure 3, we present another comparison of the models' performance as a function of the time of measurement from which data are available for model building. We focus on logistic regression models and compare six types of modeling feature sets, which are combinations of two choices, as summarized in Table 1.
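The six feature sets compared here, each computed only from data up to a chosen measurement time, could be assembled along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, only the three example descriptors named in the text (average, maximum, maximum slope) are implemented rather than the full list from [12], and the raw-value sets are assumed to be the flattened normalized samples inside the time window.

```python
import numpy as np

KINDS = {"R", "G", "RG", "feat R", "feat G", "feat RG"}  # labels of Table 1


def complex_features(t, signal):
    """Mean, maximum and maximum slope of each sensor curve (example subset)."""
    slope = np.gradient(signal, t, axis=0)
    return np.concatenate([signal.mean(axis=0),
                           signal.max(axis=0),
                           slope.max(axis=0)])


def feature_set(t, r_norm, kind, t_end, t_start=10.0):
    """Feature vector of one measurement for a feature-set label of Table 1.

    t: sample times in seconds; r_norm: R/R_0 of shape (n_samples, n_sensors).
    Only samples with t_start <= t <= t_end are used; t_start=10.0 is the
    beginning of gas exposure according to the measurement protocol.
    """
    if kind not in KINDS:
        raise ValueError(f"unknown feature set: {kind!r}")
    window = (t >= t_start) & (t <= t_end)
    if window.sum() < 2:
        raise ValueError("time window must contain at least two samples")

    tw = t[window]
    curves = {"R": r_norm[window], "G": 1.0 / r_norm[window]}  # G/G_0 = R_0/R
    use_complex = kind.startswith("feat ")
    parts = []
    for name in kind.replace("feat ", ""):  # iterates over "R" and/or "G"
        curve = curves[name]
        parts.append(complex_features(tw, curve) if use_complex
                     else curve.ravel())
    return np.concatenate(parts)
```

Calling `feature_set` for every measurement with increasing `t_end`, and passing the stacked vectors to a cross-validated classifier, would give accuracy-versus-time curves of the kind shown in Figure 3.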
As we already noticed, the results presented in Figure 3(a) confirm that better classification performance can be achieved when complex features extracted from the sensors' response curves are used, compared to models built on just the raw values of the normalized sensors' response. Another interesting observation in this figure is that the models in which the features are based on the sensors' conductance G also exhibit better performance, especially when the time of odor measurement by the electronic nose is reduced. If the model is built on the sensors' resistance R data, a longer odor measurement is required, in particular one that includes the desorption phase of the sensors' response. The same observation concerning models built on the resistance data is valid for both types of feature sets considered in the present study. In Figure 3(a), one can notice that both R curves exhibit a kind of saturation region. After the beginning of the desorption phase, at 100 seconds, the models' accuracy improves again.

Figure 3(a) suggests that it may be enough to reduce the odor measurement time to about 30 seconds when complex features are extracted from resistance or resistance and conductance response curves. In that case, increasing the measurement time and measuring in the desorption regime do not lead to better classification performance.

More insight is given by Figure 3(b), which presents the standard deviation of the accuracy of the 200 models trained during the group shuffle cross-validation procedure. We can conclude that the odor measurement time should be in the range of 70–90 seconds (including 10 seconds of baseline measurement), which allows us to obtain more stable classification results. When data extracted from the sensors' resistance values are included in the modeling, they introduce some additional noise, which only slightly reduces the classification accuracy but leads to less stable models. This may reduce the classification performance on new data.

Figure 3. Cross-validation estimate of the classification performance of logistic regression models: (a) average accuracy and (b) standard deviation of accuracy (the first 10 seconds of measurement are not presented as this was the phase of baseline level collection)

As one can notice by examining Figure 1 and the description of the measurement procedure in section 2.1, such an optimal data collection time is shorter than half of the measurement time given in [11], [13]. In other research [15], similar results were found for measurements of other types of odors. The advantage of shortening the time of odor detection by an electronic nose is evident. However, one should keep in mind that this time is not directly related to the number of odor samples that the electronic nose device can measure in a given time. After the measurement, there is still a need for device purging and recovery of the sensors' base state in clean air, which take much longer than the odor measurement itself.

Table 1. Types of modeling feature sets
                      sensors' response (R/R_0; G/G_0)    complex features
resistance values     R                                   feat R
conductance values    G                                   feat G
both values           RG                                  feat RG

4. CONCLUSION
In this paper, we presented machine learning classification models built on a publicly available dataset of e-nose measurements of spoilage odor.
The research focused on verifying the optimal choice of the odor measurement time by the e-nose for collecting data to train a machine learning classification model with superior performance. We presented a comparison of various modeling features based on the resistance and conductance of the sensors' response. A group shuffling cross-validation approach was used to determine the reported models' average accuracy and standard deviation. We demonstrate that most of the information used by the models for classification is available firstly in the data from the beginning of the adsorption phase, that is, the sensors' exposure to the studied odor, and secondarily in the data from the beginning of the desorption phase, that is, the sensors' exposure to clean air after exposure to the studied gas. The performed analysis leads us to the following conclusions for the considered case: i) only complex features extracted from the sensors' conductance curves G should be used for a classification model, ii) it is sufficient to use data from measurements performed during the gas adsorption phase only, and iii) the logistic regression algorithm should be used.

One conclusion concerns the recommended machine learning classification method. In many reports, the support vector machine is used as a gold standard for such applications. As we demonstrated, the best choice may depend on the considered application, and there are cases in which logistic regression algorithms show superior performance.

ACKNOWLEDGEMENT
This work was supported by the National Centre for Research and Development under grant agreement BIOSTRATEG3/347105/9/NCBR/2017.

REFERENCES
[1] W. Hu et al., “Electronic noses: From advanced materials to sensors aided with data processing,” Advanced Materials Technologies, vol. 4, 2018, doi: 10.1002/admt.201800488.
[2] H. Alam and S. Saeed, “Modern applications of electronic nose: A review,” International Journal of Electrical and Computer Engineering (IJECE), vol. 3, no. 1, pp. 52–63, 2013, doi: 10.11591/ijece.v3i1.1226.
[3] N. L. Husni, S. Nurmaini, I. Yani, and A. Silvia, “Intelligent sensing using metal oxide semiconductor based-on support vector machine for odor classification,” International Journal of Electrical and Computer Engineering (IJECE), vol. 8, no. 6, pp. 4133–4147, 2018, doi: 10.11591/ijece.v8i6.pp4133-4147.
[4] A. Berna, “Metal oxide sensors for electronic noses and their application to food analysis,” Sensors, vol. 10, pp. 3882–3910, 2010, doi: 10.3390/s100403882.
[5] E. A. Baldwin, J. Bai, A. Plotto, and S. Dea, “Electronic noses and tongues: Applications for the food and pharmaceutical industries,” Sensors, vol. 11, pp. 4744–4766, 2011, doi: 10.3390/s110504744.
[6] A. Gliszczyńska-Świgło and J. Chmielewski, “Electronic nose as a tool for monitoring the authenticity of food. a review,” Food Analytical Methods, vol. 10, pp. 1800–1816, 2017, doi: 10.1007/s12161-016-0739-4.
[7] B. Gunawan, S. Alfarisi, G. Satrio, A. Sudarmaji, M. Malvin, and K. Krisyarangga, “Mos gas sensor of meat freshness analysis on e-nose,” TELKOMNIKA Telecommunication Computing Electronics and Control, vol. 17, no. 2, pp. 771–780, 2019, doi: 10.12928/TELKOMNIKA.v13i2.11787.
[8] R. Sarno and D. R. Wijaya, “Recent development in electronic nose data processing for beef quality assessment,” TELKOMNIKA Telecommunication Computing Electronics and Control, vol. 17, pp. 337–348, 2019, doi: 10.12928/telkomnika.v17i1.10565.
[9] S. N. Hidayat et al., “The electronic nose coupled with chemometric tools for discriminating the quality of black tea samples in situ,” Chemosensors, vol. 7, no. 3, 2019, doi: 10.3390/chemosensors7030029.
[10] S. A. Langa and R. Sarno, “Temperature effect of electronic nose sampling for classifying mixture of beef and pork,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 19, no. 3, pp. 1626–1634, 2020, doi: 10.11591/ijeecs.v19.i3.pp1626-1634.
[11] J. C. Rodriguez Gamboa, E. S. Albarracin E., A. J. da Silva, and T. A. E. Ferreira, “Electronic nose dataset for detection of wine spoilage thresholds,” Data in Brief, vol. 25, 2019, doi: 10.1016/j.dib.2019.104202.
[12] P. Borowik, L. Adamowicz, R. Tarakowski, K. Siwek, and T. Grzywacz, “Odor detection using e-nose with reduced sensors array,” Sensors, vol. 20, no. 12, 2020, doi: 10.3390/s20123542.
[13] J. C. Rodriguez Gamboa, E. S. Albarracin E., A. J. da Silva, L. L. de Andrade Lima, and T. A. E. Ferreira, “Wine quality rapid detection using a compact electronic nose system: Application focused on spoilage thresholds by acetic acid,” LWT - Food Science and Technology, vol. 108, pp. 377–384, 2019, doi: 10.1016/j.lwt.2019.03.074.
[14] C. Zhang, W. Wang, and Y. Pan, “Enhancing electronic nose performance by feature selection using an improved grey wolf optimization based algorithm,” Sensors, vol. 20, 2020, doi: 10.3390/s20154065.
[15] J. C. Rodriguez Gamboa, A. J. da Silva, I. S. C. Araujo, E. E. S. Albarracin, and A. C. W. Duran, “Validation of the rapid detection approach for enhancing the electronic nose systems performance, using different deep learning models and support vector machines,” Sensors and Actuators B: Chemical, vol. 327, 2021, doi: 10.1016/j.snb.2020.128921.
[16] K. T. Tang, S. W. Chiu, C. Pan, H. Y. Hsieh, Y. S. Liang, and S. C. Liu, “Development of a portable electronic nose system for the detection and classification of fruity odors,” Sensors, vol. 10, pp. 9179–9193, 2010, doi: 10.3390/s101009179.
[17] M. Macías, J. Agudo, A. Manso, C. Orellana, H. Velasco, and R. Caballero, “A compact and low cost electronic nose for aroma detection,” Sensors, vol. 13, pp. 5528–5541, 2013, doi: 10.3390/s130505528.
[18] S. Trirongjitmoah, Z. Juengmunkong, K. Srikulnath, and P. Somboon, “Classification of garlic cultivars using an electronic nose,” Computers and Electronics in Agriculture, vol. 113, pp. 148–153, 2015, doi: 10.1016/j.compag.2015.02.007.
[19] W. Chansongkram and N. Nimsuk, “Development of a wireless electronic nose capable of measuring odors both in open and closed systems,” Procedia Computer Science, vol. 86, pp. 192–195, 2016, doi: 10.1016/j.procs.2016.05.060.
[20] T. Majchrzak, W. Wojnowski, T. Dymerski, J. Gębicki, and J. Namieśnik, “Electronic noses in classification and quality control of edible oils: A review,” Food Chemistry, vol. 246, pp. 192–201, 2018, doi: 10.1016/j.foodchem.2017.11.013.
[21] S. Fuentes et al., “Assessment of smoke contamination in grapevine berries and taint in wines due to bushfires using a low-cost e-nose and an artificial intelligence approach,” Sensors, vol. 20, 2020, doi: 10.3390/s20185108.
[22] C. Gonzalez Viejo, S. Fuentes, A. Godbole, B. Widdicombe, and R. R. Unnithan, “Development of a low-cost e-nose to assess aroma profiles: An artificial intelligence application to assess beer quality,” Sensors and Actuators B: Chemical, vol. 308, 2020, doi: 10.1016/j.snb.2020.127688.
[23] V. S. Padala, K. Gandhi, and P. Dasari, “Machine learning: The new language for applications,” IAES International Journal of Artificial Intelligence (IJ-AI), vol. 8, no. 4, pp. 411–421, 2019, doi: 10.11591/ijai.v8.i4.pp411-421.
[24] C. C. Aggarwal, “Neural Networks and Deep Learning,” Springer International Publishing, 2018, doi: 10.1007/978-3-319-94463-0.
[25] F. Pedregosa et al., “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[26] J. Burlachenko, I. Kruglenko, B. Snopok, and K. Persuad, “Sample handling for electronic nose technology: State of the art and future trends,” TrAC Trends in Analytical Chemistry, vol. 82, pp. 222–236, 2016, doi: 10.1016/j.trac.2016.06.007.
[27] A. Szczurek and M. Maciejewska, ““artificial sniffing” based on induced temporary disturbance of gas sensor response,” Sensors and Actuators B: Chemical, vol. 186, pp. 109–116, 2013, doi: 10.1016/j.snb.2013.05.085.
[28] M. E. Staymates et al., “Biomimetic sniffing improves the detection performance of a 3d printed nose of a dog and a commercial trace vapor detector,” Scientific Reports, vol. 6, 2016, doi: 10.1038/srep36876.
[29] C. J. García-Orellana et al., “Low-power and low-cost environmental IoT electronic nose using initial action period measurements,” Sensors, vol. 19, no. 14, 2019, doi: 10.3390/s19143183.