TY - JOUR
T1 - Benchmarking machine learning methods for comprehensive chemical fingerprinting and pattern recognition
AU - Reichenbach, Stephen E.
AU - Zini, Claudia A.
AU - Nicolli, Karine P.
AU - Welke, Juliane E.
AU - Cordero, Chiara
AU - Tao, Qingping
N1 - Funding Information:
We thank the Brazilian National Council for Scientific and Technological Development (CNPq) for C. Zini’s grant (1D 306067/2016-1); Coordination for the Improvement of Higher Education Personnel (CAPES) for K. Nicolli’s scholarship (AUX-PE-PROEX 587/2017); and SIBRATEC/FINEP/FAPEG 01.13.0210-00, Project IP-Campanha, for resources for field sampling transport, wine elaboration, and physico-chemical analyses.
Publisher Copyright:
© 2019 Elsevier B.V.
PY - 2019/6/21
Y1 - 2019/6/21
N2 - Machine learning (ML) has been used previously to recognize particular patterns of constituent compounds. Here, ML is used with comprehensive chemical fingerprints that capture the distribution of all constituent compounds to flexibly perform various pattern recognition tasks. Such pattern recognition requires a sequence of chemical analysis, data analysis, and pattern analysis. Chemical analysis with comprehensive multidimensional chromatography is a maturing approach for highly effective separations of complex samples and so provides a solid foundation for undertaking comprehensive chemical fingerprinting. Data analysis with smart templates employs marker peaks and chemical logic for chromatographic alignment and peak-regions to delineate chromatographic windows in which analytes are quantified and matched consistently across chromatograms to create chemical profiles that serve as complete fingerprints. Pattern analysis uses ML techniques with the resulting fingerprints to recognize sample characteristics, e.g., for classification. Our experiments evaluated the effectiveness of seventeen different ML techniques for various classification problems with chemical fingerprints from a rich data set of 126 wine samples of different varieties, geographic regions, vintages, and wineries. Results of these experiments showed an accuracy range from 58% to 88% for different ML methods on the most difficult classification problems and 96% to 100% for different ML methods on the least difficult classification problems. Averaged over 14 classification problems, accuracy for the different methods ranged from 80% to 90%, with some relatively simple ML techniques among the top-performing methods.
KW - Classification
KW - Comprehensive two-dimensional gas chromatography
KW - Data mining
KW - GCxGC
KW - Machine learning
UR - http://www.scopus.com/inward/record.url?scp=85062234529&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85062234529&partnerID=8YFLogxK
U2 - 10.1016/j.chroma.2019.02.027
DO - 10.1016/j.chroma.2019.02.027
M3 - Article
C2 - 30833025
AN - SCOPUS:85062234529
SN - 0021-9673
VL - 1595
SP - 158
EP - 167
JO - Journal of Chromatography A
JF - Journal of Chromatography A
ER -