TY - JOUR
T1 - Comparative evaluation of machine learning models for groundwater quality assessment
AU - Bedi, Shine
AU - Samal, Ashok
AU - Ray, Chittaranjan
AU - Snow, Daniel
N1 - Funding Information:
This research was supported in part by grant number 16-190-3 from the Nebraska Environmental Trust.
Funding Information:
The authors of this paper thank the Know Your Well project team for its support and Dana W. Kolpin of the U.S. Geological Survey (USGS) at Iowa City, Iowa, for providing detailed metadata and information on features in the dataset used in this study.
Publisher Copyright:
© 2020, Springer Nature Switzerland AG.
PY - 2020/12
Y1 - 2020/12
AB - Contamination from pesticides and nitrate in groundwater is a significant threat to water quality in general and in agriculturally intensive regions in particular. Three widely used machine learning models, namely, artificial neural networks (ANN), support vector machines (SVM), and extreme gradient boosting (XGB), were evaluated for their efficacy in predicting contamination levels using sparse data with non-linear relationships. The predictive ability of the models was assessed using a dataset consisting of 303 wells across 12 Midwestern states in the USA. Multiple hydrogeologic, water quality, and land use features were chosen as the independent variables, and classes were based on measured concentration ranges of nitrate and pesticide. This study evaluates the classification performance of the models for two-, three-, and four-class scenarios and compares them with the corresponding regression models. The study also examines the issue of class imbalance and tests the efficacy of three class imbalance mitigation techniques for all the scenarios: oversampling, weighting, and a combination of oversampling and weighting. The models’ performance is reported using multiple metrics, both insensitive to class imbalance (accuracy) and sensitive to class imbalance (F1 score and MCC). Finally, the study assesses the importance of features using game-theoretic Shapley values to rank features consistently and offer model interpretability.
KW - Artificial neural networks (ANN)
KW - Data imbalance
KW - Feature importance
KW - Groundwater quality
KW - Support vector machines (SVM)
KW - XGBoost
UR - http://www.scopus.com/inward/record.url?scp=85096333112&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85096333112&partnerID=8YFLogxK
U2 - 10.1007/s10661-020-08695-3
DO - 10.1007/s10661-020-08695-3
M3 - Article
C2 - 33219864
AN - SCOPUS:85096333112
SN - 0167-6369
VL - 192
JO - Environmental Monitoring and Assessment
JF - Environmental Monitoring and Assessment
IS - 12
M1 - 776
ER -