OC12 - Machine Learning and Big Data

Understanding the Biopsychosocial Mechanisms of Risk for Suicide and Self-harm Behaviours Using Machine Learning
August 30 | 12:00 - 13:00

Suicide is a growing public health concern, with the World Health Organization estimating more than 700,000 deaths by suicide annually. Traditional approaches to modelling suicidal behaviour are often constrained by data derived from narrow, siloed domains. Analyses that take a “whole-person” approach (combining phenomic and exposomic measures through a biopsychosocial lens) are needed to better understand this complex trait. The primary objective of this study was to develop non-linear, interpretable machine learning models of suicide and self-harm behaviours (SB) in a large population-based cohort. We conducted stepwise modelling of recent SB, using repeated ten-fold cross-validation for hyperparameter optimization, in 23,521 UK Biobank participants who had responded to an online follow-up mental health questionnaire. Self-reported SB within the past year (n=4,713) was assessed with three specific questions on an ordinal scale. We compared predictive performance across four model architectures: general linear models, elastic net regression, random forests, and extreme gradient boosting (XGBoost). We included 97 input features across six domains: nine socio-demographic features, five lifestyle and substance use measures, 37 systemic and immune biomarkers, 13 clinical diagnoses (other than depression), and seven psychiatric polygenic risk scores. We also compared models with and without 23 measures of subjective well-being (e.g. “recent feelings of depression”) and a diagnosis of (recurrent) major depressive disorder. Missing data were handled with multivariate imputation by chained equations (MICE) using five iterations. In held-out test data, the XGBoost model performed best, achieving 75% accuracy (95% CI [73.7%, 76.2%]) and an AUC of 0.82 (sensitivity=0.92, specificity=0.44).
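The reported test-set metrics (accuracy with a 95% CI, AUC, sensitivity, specificity) can be computed from true labels and predicted probabilities. The sketch below is a minimal, standard-library illustration under stated assumptions: the outcome is collapsed to binary (1 = any SB, 0 = none), although the abstract describes an ordinal scale; the CI uses a normal-approximation (Wald) interval, since the abstract does not say how its CI was derived; and the labels, scores, and 0.5 threshold are hypothetical toy values, not study data.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels (assumed coding: 1 = SB, 0 = no SB)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy_with_wald_ci(y_true, y_pred, z=1.96):
    """Accuracy with a normal-approximation (Wald) CI, clipped to [0, 1]."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    n = tp + tn + fp + fn
    acc = (tp + tn) / n
    half_width = z * math.sqrt(acc * (1 - acc) / n)
    return acc, (max(0.0, acc - half_width), min(1.0, acc + half_width))


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT study data): true labels and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1, 0.6, 0.3]
threshold = 0.5  # assumed decision threshold
y_pred = [1 if s >= threshold else 0 for s in scores]

acc, ci = accuracy_with_wald_ci(y_true, y_pred)
sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.3f} CI95={ci} sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarises ranking across all thresholds. This is how a model can report high sensitivity with low specificity (0.92 and 0.44 above) at one operating point while still having a fairly high AUC.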
SHapley Additive exPlanation (SHAP) values were calculated to determine feature importance and to identify interaction effects; “recent feelings of depression” was the most important feature, followed by “recent feelings of inadequacy”. The largest interaction effects, within the biological features (alanine aminotransferase with testosterone) and across domains (lymphocyte count with “subjective happiness”), highlight potential mediating effects of biological factors on mental health. This work can guide future research on validating predictive suicide-risk factors and developing proactive anti-suicide interventions.
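SHAP values are Shapley values from cooperative game theory, applied to a single model prediction. The brute-force sketch below computes them exactly for a hypothetical three-feature model by enumerating every feature coalition. It also computes the pairwise SHAP interaction value used to rank interactions. It assumes that "absent" features are replaced by one hypothetical baseline vector. The abstract does not state which SHAP variant or background data were used. Libraries such as `shap` use fast tree-specific algorithms (e.g. `TreeExplainer`) for models like XGBoost rather than this exponential enumeration, which is only practical for a handful of features.

```python
import itertools
import math
from typing import Callable, FrozenSet, Iterator, List, Sequence

Model = Callable[[Sequence[float]], float]


def subsets(items: Sequence[int]) -> Iterator[FrozenSet[int]]:
    """Yield every subset of the given feature indices."""
    for r in range(len(items) + 1):
        for combo in itertools.combinations(items, r):
            yield frozenset(combo)


def coalition_value(model: Model, x, baseline, coalition) -> float:
    """Model output with coalition features taken from x, all others from the baseline."""
    z = [x[k] if k in coalition else baseline[k] for k in range(len(x))]
    return model(z)


def shapley_values(model: Model, x, baseline) -> List[float]:
    """Exact Shapley value of each feature, by enumerating all coalitions (O(2^M))."""
    m = len(x)
    phi = []
    for i in range(m):
        others = [k for k in range(m) if k != i]
        total = 0.0
        for s in subsets(others):
            w = math.factorial(len(s)) * math.factorial(m - len(s) - 1) / math.factorial(m)
            total += w * (coalition_value(model, x, baseline, s | {i})
                          - coalition_value(model, x, baseline, s))
        phi.append(total)
    return phi


def shapley_interaction(model: Model, x, baseline, i: int, j: int) -> float:
    """Off-diagonal SHAP interaction value for features i != j (split equally between (i, j) and (j, i))."""
    m = len(x)
    others = [k for k in range(m) if k not in (i, j)]
    total = 0.0
    for s in subsets(others):
        w = math.factorial(len(s)) * math.factorial(m - len(s) - 2) / (2 * math.factorial(m - 1))
        delta = (coalition_value(model, x, baseline, s | {i, j})
                 - coalition_value(model, x, baseline, s | {i})
                 - coalition_value(model, x, baseline, s | {j})
                 + coalition_value(model, x, baseline, s))
        total += w * delta
    return total


def toy_model(z: Sequence[float]) -> float:
    """Hypothetical model: two main effects plus an interaction between features 0 and 2."""
    return 2.0 * z[0] + 1.0 * z[1] + 3.0 * z[0] * z[2]


x = [1.0, 1.0, 1.0]         # hypothetical individual being explained
baseline = [0.0, 0.0, 0.0]  # hypothetical reference ("feature absent") values

phi = shapley_values(toy_model, x, baseline)
interaction_02 = shapley_interaction(toy_model, x, baseline, 0, 2)
interaction_01 = shapley_interaction(toy_model, x, baseline, 0, 1)
print("SHAP values:", phi)
print("interaction(0, 2):", interaction_02, "interaction(0, 1):", interaction_01)
```

The SHAP values sum to the difference between the prediction for `x` and the baseline prediction (local accuracy). The `3.0 * z[0] * z[2]` term shows up as a non-zero interaction only between features 0 and 2, with half of it credited to each ordering of the pair.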
