TY - JOUR
T1 - Machine Learning to Develop and Internally Validate a Predictive Model for Post-operative Delirium in a Prospective, Observational Clinical Cohort Study of Older Surgical Patients
AU - the RISE Study Group
AU - Racine, Annie M.
AU - Tommet, Douglas
AU - D’Aquila, Madeline L.
AU - Fong, Tamara G.
AU - Gou, Yun
AU - Tabloski, Patricia A.
AU - Metzger, Eran D.
AU - Hshieh, Tammy T.
AU - Schmitt, Eva M.
AU - Vasunilashorn, Sarinnapha M.
AU - Kunze, Lisa
AU - Vlassakov, Kamen
AU - Abdeen, Ayesha
AU - Lange, Jeffrey
AU - Earp, Brandon
AU - Dickerson, Bradford C.
AU - Marcantonio, Edward R.
AU - Steingrimsson, Jon
AU - Travison, Thomas G.
AU - Inouye, Sharon K.
AU - Jones, Richard N.
AU - Arnold, Steven
AU - Dickerson, Bradford
AU - Jones, Richard
AU - Libermann, Towia
AU - Travison, Thomas
AU - Dillon, Simon T.
AU - Hooker, Jacob
AU - Hshieh, Tammy
AU - Ngo, Long
AU - Otu, Hasan
AU - Racine, Annie
AU - Touroutoglou, Alexandra
AU - Vasunilashorn, Sarinnapha
AU - Ayres, Douglas
AU - Brick, Gregory
AU - Chen, Antonia
AU - Davis, Robert
AU - Drew, Jacob
AU - Iorio, Richard
AU - Kornack, Fulton
AU - Weaver, Michael
AU - Webber, Anthony
AU - Wilk, Richard
AU - Shaff, David
AU - Armstrong, Brett
AU - Banda, Angelee
AU - Bertrand, Sylvie
AU - D’Aquila, Madeline
AU - Gallagher, Jacqueline
N1 - Publisher Copyright: © 2020, Society of General Internal Medicine.
PY - 2021/2
Y1 - 2021/2
N2 - Background: Our objective was to assess the performance of machine learning methods to predict post-operative delirium using a prospective clinical cohort. Methods: We analyzed data from an observational cohort study of 560 older adults (≥ 70 years) without dementia undergoing major elective non-cardiac surgery. Post-operative delirium was determined by the Confusion Assessment Method supplemented by a medical chart review (N = 134, 24%). Five machine learning algorithms and a standard stepwise logistic regression model were developed in a training sample (80% of participants) and evaluated in the remaining hold-out testing sample. We evaluated three overlapping feature sets, restricted to variables that are readily available or minimally burdensome to collect in clinical settings, including interview and medical record data. A large feature set included 71 potential predictors. A smaller set of 18 features was selected by an expert panel using a consensus process, and this smaller feature set was considered with and without a measure of pre-operative mental status. Results: The area under the receiver operating characteristic curve (AUC) was higher in the large feature set conditions (range of AUC, 0.62–0.71 across algorithms) versus the selected feature set conditions (AUC range, 0.53–0.57). The restricted feature set with mental status had intermediate AUC values (range, 0.53–0.68). In the full feature set condition, algorithms such as gradient boosting, cross-validated logistic regression, and neural network (AUC = 0.71, 95% CI 0.58–0.83) were comparable with a model developed using traditional stepwise logistic regression (AUC = 0.69, 95% CI 0.57–0.82). Calibration for all models and feature sets was poor. Conclusions: We developed machine learning prediction models for post-operative delirium that performed better than chance and are comparable with traditional stepwise logistic regression. Delirium proved to be a phenotype that was difficult to predict with appreciable accuracy.
KW - delirium
KW - machine learning
KW - model prediction
KW - post-operative
KW - statistical learning
UR - http://www.scopus.com/inward/record.url?scp=85095596009&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85095596009&partnerID=8YFLogxK
U2 - 10.1007/s11606-020-06238-7
DO - 10.1007/s11606-020-06238-7
M3 - Article
C2 - 33078300
AN - SCOPUS:85095596009
SN - 0884-8734
VL - 36
SP - 265
EP - 273
JO - Journal of General Internal Medicine
JF - Journal of General Internal Medicine
IS - 2
ER -