OC17 - Internet and Suicide Prevention: Risk and Opportunities

Evaluating the Quality of Suicide-Related Narratives Generated by Large Language Models (LLMs)
August 30 | 12:00 - 13:00

Background
Suicide-related media is known to influence suicide rates. Large Language Models (LLMs), a form of Artificial Intelligence (AI), are increasingly being used as writing tools. However, the quality of LLM-generated suicide-related content has yet to be assessed.
Methods
Our study will examine the outputs from 3 chatbots (GPT-4, Grok, ERNIE), each asked to produce text in 5 writing styles (broadsheet news report, tabloid news report, adult fiction, teen fiction, social media influencer) in response to 11 suicide-related prompts, with each combination asked 5 times (planned n = 825). To date, we have pilot-coded 100 queries using GPT-4. Results will be presented descriptively, with regression analyses comparing outputs across LLMs.
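As a minimal sketch of this factorial design, the full query grid can be enumerated as below; the actual prompt texts are not listed in this abstract, so placeholder labels stand in for the 11 suicide-related prompts.

```python
# Sketch of the planned query grid: 3 chatbots x 5 writing styles
# x 11 prompts x 5 repetitions = 825 planned queries.
from itertools import product

CHATBOTS = ["GPT-4", "Grok", "ERNIE"]
STYLES = [
    "broadsheet news report",
    "tabloid news report",
    "adult fiction",
    "teen fiction",
    "social media influencer",
]
# Placeholder labels; the 11 actual prompts are not given here.
PROMPTS = [f"prompt_{i:02d}" for i in range(1, 12)]
REPETITIONS = range(1, 6)  # each combination asked 5 times

grid = list(product(CHATBOTS, STYLES, PROMPTS, REPETITIONS))
assert len(grid) == 3 * 5 * 11 * 5 == 825  # matches the planned n
```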
Results
In our pilot, 78 responses (78%) were excluded, mainly because GPT-4 refused to write the requested text, citing concerns about generating potentially harmful material. Ultimately, 32 responses (32%) met inclusion criteria, mostly broadsheet and tabloid news reports and social media influencer posts. With respect to the overarching narrative, 13 responses (41%) focused on helpful efforts by society to prevent suicide, and 11 (34%) conveyed other, more general, uplifting messages. The most common putatively harmful response characteristic was the claim that suicide constitutes an epidemic or escalating crisis (11 responses; 34%). Notably, in terms of putatively protective response characteristics, all responses (100%) included a message of hope, and 23 (72%) described alternatives to suicidal behaviour.
Conclusion
These preliminary results suggest that GPT-4 largely adheres to responsible media reporting guidelines, at rates substantially higher than those reported in the literature for human-authored content, and that OpenAI has placed stringent safeguards on the types of suicide-related content GPT-4 can produce. Full results for GPT-4, as well as Grok and ERNIE, will be available at the time of the conference. We expect substantially different results from the latter two, given that Grok has been dubbed a "TruthGPT" that, in contrast to GPT, is not "trained to be politically correct", and that ERNIE is a Chinese-language LLM arising from a different cultural context. Our study will represent the first attempt to characterize the narratives LLMs produce regarding suicide, and its results can be used to inform future suicide prevention efforts through AI.