The Largest Randomized Trial of AI Mental Health Care Just Validated That It Works


In September 2023, I first wrote an article titled "ChatGPT Can Now Speak: Revolutionizing Mental Health and the Potential to Bring Therapy to Millions of People Worldwide," arguing that AI-powered conversational tools had the potential to deliver real therapeutic value to people who could not access or afford traditional mental health services. I expanded on that thesis in May 2024, writing about GPT-4o's multimodal capabilities and their implications for emotional support at scale. I also discussed this work during my keynote at NYU's AI symposium in April 2024.


At the time, the dominant response from the clinical and research community was skepticism. Published papers from late 2023 and early 2024 carried titles like "ChatGPT is not ready yet for use in providing mental health assessment and interventions." The prevailing view held that AI lacked the clinical judgment, nuance, and safety guardrails required for meaningful therapeutic engagement. Most existing studies were small, short-duration trials, with a median sample size of 148 participants and a follow-up period of roughly one month.


Now, a large-scale randomized controlled trial has produced results that align closely with what I argued more than two years ago. The findings are striking in both their magnitude and their breadth.


The Study


Researchers Manuela Angelucci, Raissa Fábregas, and Antonia Vazquez at the University of Texas at Austin conducted a randomized controlled trial with 1,964 Mexican women experiencing mild to severe psychological distress. Participants were recruited through social media, screened for eligibility using the PHQ-4 depression and anxiety scale, and randomly assigned to receive free access to Mindsurf, a commercially available AI-powered mental health app built on Cognitive Behavioral Therapy principles.


The app combined an AI conversational agent with mood tracking, guided exercises, and self-assessment tools. The control group received access only after six months. The researchers collected data at one, two, and six months through surveys, weekly mood polls via WhatsApp, and administrative app usage data. They also gathered expert predictions to benchmark their results against prevailing scientific expectations.


This study represents a significant departure from the small efficacy trials that previously dominated the literature. The sample was drawn from a policy-relevant population: women in a middle-income country with high rates of psychological distress and limited access to trained providers. Mexico has one psychiatrist per 100,000 people, compared to 16 per 100,000 in the United States.


Mental Health Improvements That Match Traditional Therapy


The headline finding: app access improved mental health by 0.29 standard deviations over six months across a composite index of depression, anxiety, stress, and well-being measures. To put that in context, meta-analytic evidence shows that CBT-based psychotherapy reduces depressive symptoms by approximately 0.22 to 0.60 standard deviations. Pharmacotherapy shows effects around 0.35 standard deviations. The AI app produced gains squarely within that range.


The individual measures tell a consistent story. Depression scores on the PHQ-8 dropped by 1.5 to 1.7 points across follow-up periods. Anxiety on the GAD-7 declined by about 1 point throughout the study. Subjective well-being increased by 4.6 to 5.7 points on the WHO-5 scale. These effects were broadly stable from month one through month six.


Perhaps most important for the safety debate: the app reduced the prevalence of severe depression by 27% and severe anxiety by 14%. Treated participants were no more likely than control participants to experience extreme distress. In fact, extreme distress was more prevalent in the control group. The researchers found no evidence that AI-powered support worsened outcomes for any subgroup.


The expert predictions collected before the results were known underscore how surprising these findings are relative to the scientific consensus. Experts predicted mental health improvements of 0.13 standard deviations at three weeks and 0.12 at eight weeks. The actual effects were roughly 2.5 times larger.


Beyond Mental Health: Sleep, Work, and Behavior


The benefits extended well beyond psychological symptom scores. Treated participants slept an additional 10 minutes per night, reported fewer nighttime awakenings, and showed sustained improvements in sleep quality over six months. They exercised more, practiced better self-care, and missed fewer days of work.


The labor market results are particularly noteworthy. App access produced a 0.10 standard deviation improvement in a composite labor index combining absenteeism, employment status, and hours worked. Participants missed 16% fewer work days. The probability of working for pay increased by 3.1 percentage points. Back-of-the-envelope calculations suggest the intervention generated six-fold gains relative to its costs through reduced absenteeism alone, and up to 400-fold gains when accounting for averted disability.
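To make the structure of that back-of-the-envelope calculation concrete, here is a minimal sketch. Only the 16% absenteeism reduction, the six-month window, and the roughly $1-per-month cost come from the study as reported above; the baseline missed-days and daily-wage figures below are placeholder assumptions, so the resulting ratio is purely illustrative and will not match the study's own six-fold estimate.

```python
# Hypothetical back-of-envelope: value of averted absenteeism vs. app cost.
# Study-reported inputs: 16% fewer missed work days, ~$1/month per user, 6 months.
# Placeholder assumptions (NOT from the study): baseline missed days, daily wage.

app_cost_per_month = 1.00          # study-reported per-user cost (USD)
months = 6
total_cost = app_cost_per_month * months

baseline_missed_days = 1.0         # ASSUMED missed work days per month
daily_wage = 15.00                 # ASSUMED daily wage (USD)

days_averted = baseline_missed_days * 0.16 * months   # 16% reduction (study)
value_averted = days_averted * daily_wage

print(f"Cost: ${total_cost:.2f}, value of averted absence: ${value_averted:.2f}")
print(f"Benefit/cost ratio: {value_averted / total_cost:.1f}x")
```

Swapping in the study's actual baseline absenteeism and local wage data is what drives the ratio toward the six-fold figure the researchers report; the sketch only shows where each input enters the calculation.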


The per-user cost of the app at scale was approximately one dollar per month. The researchers compared this to other scalable mental health interventions and found Mindsurf had the lowest cost per 0.1 standard deviation improvement in mental health.
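That cost-effectiveness comparison follows directly from the figures above. A quick sketch, assuming the roughly $1-per-month cost applies over the full six-month access period and using the 0.29 standard deviation composite effect:

```python
# Cost per 0.1 SD of mental health improvement, from the article's own figures.
cost_per_user = 1.00 * 6      # ~$1/month over the six-month trial (USD)
effect_sd = 0.29              # composite mental health improvement (SD)

cost_per_tenth_sd = cost_per_user * (0.1 / effect_sd)
print(f"${cost_per_tenth_sd:.2f} per 0.1 SD improvement")
```

Under those assumptions the figure comes out to roughly two dollars per 0.1 standard deviation of improvement, which is the kind of number that makes population-scale deployment plausible.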


The Engagement Paradox


One of the study's most interesting findings challenges a core assumption in digital health: that declining app usage signals declining effectiveness. Initial engagement was high, with 82% of treated participants using the app in week one. But usage dropped steadily. By two months, only 36% were active. By six months, fewer than 10% were still using it. Total usage averaged 242 minutes over the entire six-month period.


Yet the mental health effects persisted. The researchers tested this directly by randomly extending app access from three to six months for half the treatment group. Extended access increased usage by 39% but produced no additional mental health gains. The benefits had already been captured.


The explanation appears to be behavioral change. Treated participants continued to practice techniques they learned from the app, including breathing exercises, sleep routines, self-compassion practices, and boundary setting, even after they stopped using it. A mediation analysis showed that improvements in sleep and healthful behaviors accounted for 34 to 48% of the mental health effects.


This finding has significant implications for how digital health tools are evaluated. Investors, donors, and policymakers routinely use engagement and retention metrics as proxies for effectiveness. This study demonstrates that engagement can systematically understate impact when the mechanism of action involves skill acquisition rather than continuous use.


Complement, Not Substitute


The most common concern about AI mental health tools is that they will displace traditional psychotherapy. This study found the opposite. App access produced a 35% increase in the likelihood of seeing a psychologist in the prior month. The effect grew over time, from 3 percentage points at one month to 8 percentage points at six months.


The increase in therapy use was concentrated among participants with moderate and severe baseline distress, suggesting the app made traditional therapy more accessible by improving daily functioning and reducing the non-monetary costs of seeking care. A mediation analysis showed that increased psychotherapy explained at most 5% of the overall mental health gains. The app was doing most of the work independently.


This finding directly addresses one of the primary objections raised against AI therapy tools: that they would serve as a cheap substitute that prevents people from getting the care they actually need. In this population, the app functioned as an on-ramp to professional care rather than a replacement for it.


What I Argued in 2023


When I wrote about ChatGPT's potential to transform mental health access in September 2023, the core argument was straightforward. Hundreds of millions of people worldwide experience psychological distress but cannot access traditional mental health services due to cost, provider shortages, geographic barriers, or stigma. AI-powered conversational tools, delivered through smartphones that most of these people already own, could provide meaningful support at a fraction of the cost.


The skeptics raised valid concerns about safety, clinical appropriateness, and the risk of displacing more effective care. Those concerns deserved serious empirical investigation, and this study provides exactly that. On every dimension the critics flagged, the evidence points in the same direction: the app was effective, safe, complementary to traditional care, and remarkably cost-efficient.


The study also validates a pattern I explore in my upcoming book The Tony Hawk Paradox: capabilities that emerge first in digital and simulated environments eventually produce real-world effects that surprise even domain experts. The experts surveyed for this study predicted effects less than half the size of what was actually observed. The gap between expert priors and empirical reality reflects a broader tendency to underestimate how quickly AI capabilities translate into measurable outcomes in domains previously considered too complex or sensitive for algorithmic intervention.


What Comes Next


This is a single study in a specific population, and the researchers appropriately note that replication across different contexts and demographics would strengthen the evidence base. But the scale of the trial, the rigor of the randomized design, the six-month follow-up, and the consistency of results across multiple outcome measures make this one of the most compelling pieces of evidence to date that AI-powered mental health tools can deliver real clinical and economic value.


The policy implications are substantial. At roughly one dollar per user per month, governments could deploy these tools at population scale for a fraction of what traditional mental health infrastructure costs. The finding that the app complements rather than displaces psychotherapy suggests it could be integrated into stepped-care models, where patients receive progressively more intensive treatment based on need.


For the hundreds of millions of people worldwide who experience psychological distress without access to professional support, this study demonstrates that meaningful help may already be available in their pockets. The technology works. The evidence is here. The question now is how quickly health systems, policymakers, and funders will act on it.

© 2026 by David Borish IP, LLC, All Rights Reserved

bottom of page