Application of Gradient Boosted Decision Trees for Understanding Predictive Factors of Depression
Background: Depression affects 300 million people globally and is a leading cause of disability [1]. Methods to identify patients with depression and to understand the factors that predict its development are vital. XGBoost, an implementation of the gradient boosted decision trees algorithm, is well suited to this task. This study explores a unique application of classification-based XGBoost for understanding predictive factors of depression from nationally representative survey data.

Methods: This project used 2020 National Health Interview Survey (NHIS) data, comprising 31,568 responses. The data were preprocessed to eliminate features with excessive homogeneity. XGBoost was applied to build machine learning (ML) models that predict the occurrence of depression in an individual (the target variable), defined by whether the respondent had been prescribed antidepressants. Feature selection was applied to determine which other respondent characteristics were closely associated with depression, with features retained on the basis of a feature importance threshold. To maximize accuracy, a randomized hyperparameter search was conducted, and the results from confusion matrices helped guide the optimization.

Results: The model achieved a precision of 0.94, a recall of 0.95, and an F1 score of 0.94. The features the model relied on most to make its predictions included whether respondents took anxiety medication, took sleep medication, received therapy in the last year, and could afford balanced meals.

Discussion: Researchers can use this methodology, together with these predictors, to approximate whether patients have depression directly from medical records. This can be useful for guiding new provider-patient discussions.
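As a minimal illustration of two steps named in the Methods and Results, the sketch below shows how precision, recall, and F1 score follow from confusion-matrix counts, and how features can be retained using an importance threshold. It is not the study's code. The function names, the counts, the feature names, the importance values, and the threshold of 0.10 are all hypothetical and chosen only for illustration.

```python
def classification_metrics(tp, fp, fn):
    """Return (precision, recall, F1) for the positive class.

    tp, fp, fn are the true-positive, false-positive, and
    false-negative counts read off a confusion matrix.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def select_features(importances, threshold):
    """Keep features whose importance meets the threshold,
    ordered from most to least important."""
    kept = [name for name, score in importances.items() if score >= threshold]
    return sorted(kept, key=lambda name: importances[name], reverse=True)


# Hypothetical confusion-matrix counts (NOT taken from the study).
precision, recall, f1 = classification_metrics(tp=80, fp=20, fn=10)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

# Hypothetical importance scores (NOT the study's values).
example_importances = {
    "anxiety_medication": 0.40,
    "sleep_medication": 0.25,
    "therapy_last_year": 0.20,
    "region": 0.02,
}
selected = select_features(example_importances, threshold=0.10)
print(selected)
```

In practice the importance scores would come from the fitted model rather than being typed in by hand, and the threshold would be chosen by examining how model performance changes as features are dropped.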