The market for AI-assisted mental health tools has grown substantially faster than the clinical evidence base for those tools. As of 2024, the global digital mental health market was valued at over $5 billion, with hundreds of apps available to consumers claiming to deliver cognitive behavioral therapy, mindfulness interventions, or conversational AI support. The clinical evidence, however, is narrower, more mixed, and more carefully qualified than the marketing suggests. What the randomized controlled trial literature actually shows is worth examining in detail.
Woebot: The Most Studied Conversational AI for Mental Health
Woebot is a fully automated conversational agent delivering cognitive behavioral therapy principles through a chat interface. The earliest RCT, published by Fitzpatrick and colleagues in JMIR Mental Health in 2017, randomized 70 college students with self-reported symptoms of depression and anxiety to either Woebot or an information-only control (a PDF of a CBT self-help workbook) for two weeks.
Results showed a significantly greater reduction in PHQ-9 (Patient Health Questionnaire-9, a standard depression severity measure) scores in the Woebot group than in the control group, with a moderate effect size. PHQ-9 scores decreased by a mean of 5.1 points in the Woebot group versus 1.2 points in the control group, a statistically significant difference (p=0.04). The authors were appropriately cautious: the sample was small, the comparator was unusually weak, follow-up lasted only two weeks, and the participants were self-selected college students rather than patients with diagnosed clinical depression.
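The reported group difference can be translated into a standardized effect size. Below is a minimal sketch of the standard between-group Cohen's d calculation on change scores; the trial's actual standard deviations are not reproduced here, so the pooled SD of 6.0 used below is a hypothetical value chosen only to illustrate the arithmetic.

```python
def cohens_d(mean_change_tx: float, mean_change_ctrl: float, sd_pooled: float) -> float:
    """Between-group Cohen's d on change scores:
    d = (treatment change - control change) / pooled SD."""
    return (mean_change_tx - mean_change_ctrl) / sd_pooled

# Reported mean PHQ-9 reductions: 5.1 (Woebot) vs. 1.2 (control).
# sd_pooled=6.0 is an assumed value for illustration only.
d = cohens_d(5.1, 1.2, 6.0)
print(round(d, 2))  # 0.65 under this assumed SD, i.e. a moderate effect
```

Under a different (equally plausible) SD the estimate would shift, which is why the original report's own effect size, not this back-of-envelope figure, is authoritative.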
Subsequent RCTs: A More Complex Picture
Multiple subsequent RCTs (2022–2023, several in JMIR Mental Health) evaluated Woebot, Wysa (an AI mental health chatbot), and similar digital CBT platforms against more rigorous comparators. A meta-analysis by Abd-Alrazaq and colleagues, published in the Journal of Medical Internet Research in 2020, synthesized 9 RCTs of conversational agents for depression and anxiety, finding:
- Statistically significant improvements in depression symptom severity (most commonly measured by PHQ-9) vs. control groups (pooled SMD -0.52, 95% CI -0.80 to -0.23)
- Statistically significant improvements in anxiety symptom severity (most commonly measured by GAD-7) (pooled SMD -0.64, 95% CI -1.01 to -0.26)
- High dropout rates: median dropout across included trials was 31%, substantially higher than typically seen in human therapist RCTs
- Very few trials included active comparators (waitlist control was the most common); direct comparison to human therapist outcomes is largely absent from the literature
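A pooled SMD like the -0.52 above is typically produced by inverse-variance weighting of per-trial effect sizes. The following is a minimal fixed-effect sketch with hypothetical per-trial values (the meta-analysis itself would have used more elaborate random-effects methods; the numbers below are illustrative, not taken from the included trials):

```python
import math

def pool_smd(smds, ses):
    """Fixed-effect inverse-variance pooling of standardized mean
    differences. Returns (pooled SMD, (95% CI lower, 95% CI upper))."""
    weights = [1.0 / se**2 for se in ses]          # precision = 1 / variance
    pooled = sum(w * d for w, d in zip(weights, smds)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))      # SE of the weighted mean
    return pooled, (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)

# Hypothetical per-trial SMDs and standard errors, for illustration only
smds = [-0.44, -0.61, -0.50]
ses = [0.20, 0.25, 0.18]
pooled, (lo, hi) = pool_smd(smds, ses)
```

The pooled estimate always lies within the range of the individual trial effects, and its confidence interval narrows as more (or more precise) trials are added, which is why a pooled SMD with a CI excluding zero is stronger evidence than any single small trial.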
The key limitation across this evidence base is the comparator problem. Demonstrating improvement versus a waitlist does not address the question most clinically relevant to practice: how does AI-delivered therapy compare to human-delivered therapy? The few trials that have attempted this comparison — including a 2022 study in Psychological Medicine comparing Wysa to brief therapist-delivered CBT — found no significant difference in outcomes, but the sample sizes were small and the study was not powered for non-inferiority conclusions.
Where AI Performs Well and Where It Does Not
The evidence points to specific use cases where AI mental health tools have shown a credible signal of benefit:
- Mild-to-moderate depression and anxiety in otherwise healthy adults
- Psychoeducation and skills delivery (breathing exercises, CBT thought records)
- Bridging support during therapy waitlists
- Mood tracking and symptom journaling with automated pattern recognition
The evidence does NOT support AI tools for severe depression, active suicidal ideation, bipolar disorder, psychosis, or personality disorders. No published RCT has demonstrated safety and efficacy of conversational AI in patients with active suicide risk, and several major platforms have developed explicit safety architectures that transfer these users to human crisis resources — an acknowledgment of the boundary.
The FDA Digital Therapeutics Framework
The FDA has authorized prescription digital therapeutics (PDTs) since the 2017 De Novo authorization of Pear Therapeutics’ reSET for substance use disorder, creating a mechanism for evidence-based digital mental health tools to achieve regulatory authorization and, in principle, insurance reimbursement. As of 2025, however, the landscape is more complicated: Pear Therapeutics filed for bankruptcy in 2023, due in part to limited reimbursement uptake. Despite the regulatory framework being in place, the reimbursement pathway for digital mental health tools in the US remains substantially unresolved.
Key Takeaway
RCT evidence supports modest efficacy of AI conversational therapy tools for mild-to-moderate depression and anxiety compared to waitlist or minimal control. High dropout rates, lack of active comparators to human therapy, and population specificity (young, healthy, mild symptoms) are significant limitations. The clinical use case is bridging and augmentation for mild presentations — not replacement of human therapists for complex or severe conditions.
Sources
1. Fitzpatrick KK, Darcy A, Vierhile M. Delivering Cognitive Behavior Therapy to Young Adults With Symptoms of Depression and Anxiety Using a Fully Automated Conversational Agent (Woebot): A Randomized Controlled Trial. JMIR Ment Health. 2017;4(2):e19. doi:10.2196/mental.7785
2. Abd-Alrazaq AA, Rababeh A, Alajlani M, et al. Effectiveness and Safety of Using Chatbots to Improve Mental Health: Systematic Review and Meta-Analysis. J Med Internet Res. 2020;22(7):e16021. doi:10.2196/16021
3. Inkster B, Sarda S, Subramanian V. An Empathy-Driven, Conversational Artificial Intelligence Agent (Wysa) for Digital Mental Well-Being. JMIR Mhealth Uhealth. 2018;6(11):e12106. doi:10.2196/12106
Medical Disclaimer: This article is for informational purposes only and does not constitute medical advice. Always consult a qualified healthcare professional for medical decisions.