Achieving Clinical Reliability in Suicide Risk Detection: A Low-Resource Benchmark of RoBERTa vs. DistilBERT

Authors

  • Stefania Berghia, ULBS

Abstract

Detecting suicidal ideation in social media text is a critical public health objective, demanding accurate, deployable models. This study addresses the challenge of achieving clinical reliability under severe hardware constraints. We conduct a comparative fine-tuning benchmark of two Transformer models, DistilBERT and RoBERTa, for binary classification of suicide risk. The models were fine-tuned on a balanced 10,000-sample subset of the Reddit Suicide Detection Dataset under CPU-only, low-resource conditions. Prediction robustness is improved by a simple max-confidence selection scheme: for each input, the system keeps the prediction of whichever of the two models reports the higher confidence score. RoBERTa achieved a peak F1-score of 97.4% and accuracy of 97.4%, substantially outperforming DistilBERT (94.1% F1-score). Crucially for this safety-critical application, error analysis showed that RoBERTa produced 40% fewer false negatives than DistilBERT. The validated framework establishes that high-performance, ethically robust risk detection is feasible under resource limitations, enabling safe integration of the final classifier with a resource-constrained LLaMA 3 conversational module for proactive support.
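
The max-confidence selection described above can be sketched in a few lines of Python. This is a minimal illustration only: the checkpoint paths, the use of softmax probabilities as confidence scores, and the binary label order are assumptions, not artifacts released with the paper.

    # Minimal sketch of the max-confidence selection between two fine-tuned
    # classifiers. Checkpoint paths below are hypothetical placeholders.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    CHECKPOINTS = [
        "./distilbert-suicide-risk",  # assumed fine-tuned DistilBERT checkpoint
        "./roberta-suicide-risk",     # assumed fine-tuned RoBERTa checkpoint
    ]

    def load(path):
        tok = AutoTokenizer.from_pretrained(path)
        model = AutoModelForSequenceClassification.from_pretrained(path)
        model.eval()  # inference mode; CPU-only execution works as-is
        return tok, model

    MODELS = [load(p) for p in CHECKPOINTS]

    @torch.no_grad()
    def classify(text: str) -> tuple[int, float]:
        """Return (label, confidence) from whichever model is more confident."""
        best_label, best_conf = 0, 0.0
        for tok, model in MODELS:
            inputs = tok(text, truncation=True, max_length=512,
                         return_tensors="pt")
            # Treat the softmax probability of the predicted class as the
            # model's confidence score (an assumption about the method).
            probs = torch.softmax(model(**inputs).logits, dim=-1).squeeze(0)
            conf, label = probs.max(dim=-1)
            if conf.item() > best_conf:
                best_conf, best_label = conf.item(), label.item()
        return best_label, best_conf

Under this reading, the two models run independently on each input and the prediction with the higher class probability wins, which matches the "simple selection process" the abstract describes.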

Published

2025-12-05

Section

Articles