
AI Therapy Apps vs Human Therapists — What Randomized Controlled Trials Actually Show

agrovion-local
Author
📅 March 20, 2026
⏱ 4 min read

Conversational AI tools for mental health — including Woebot, Wysa, and a growing list of competitors — have attracted both significant investment and significant skepticism. The clinical question is straightforward: do these tools produce measurable improvements in symptoms of depression, anxiety, and related conditions, and if so, for whom and under what conditions? A series of randomized controlled trials published between 2021 and 2023 provides the most rigorous data available, with results that are neither as promising as vendor communications suggest nor as damning as critics imply.

The Evidence Base for Woebot

Woebot is a conversational AI tool grounded in cognitive behavioral therapy (CBT) principles, delivering psychoeducation, thought records, and behavioral activation exercises through a chat interface. Fitzpatrick et al. published the first RCT of Woebot in JMIR Mental Health in 2017, enrolling college students with self-reported depression and anxiety. Participants randomized to Woebot showed significant reductions in PHQ-9 (depression) and GAD-7 (anxiety) scores over two weeks compared to an information-only control group.

Subsequent RCTs in more clinically representative populations have produced more mixed results. A 2021 trial published in JMIR Mental Health by Fulmer et al. found that Woebot produced significant reductions in anxiety in college students over four weeks. However, a 2022 trial in a more heterogeneous community sample found smaller effect sizes and noted that participants with more severe baseline symptoms showed less response, consistent with the broader literature on self-guided digital interventions.

Wysa and the Engagement Problem

Wysa uses a similar CBT-based conversational framework with additional elements drawn from dialectical behavior therapy and mindfulness-based approaches. A 2018 study published in JMIR mHealth and uHealth by Inkster et al. demonstrated reductions in PHQ-9 scores in users who engaged with the platform at least 40 times over a defined study period. The significant qualifier is that threshold: 40 interactions. Engagement with digital mental health tools follows a steep drop-off curve, and a substantial fraction of users never reach the interaction level at which clinical benefits have been demonstrated.

This engagement problem is not unique to AI tools — it affects digital cognitive behavioral therapy broadly, including validated programs like Beating the Blues and FearFighter. But it takes on particular significance when evaluating AI chatbots against human therapy, because a human therapist can detect disengagement, modify the therapeutic approach, and reach out to a patient who stops attending sessions. An AI tool cannot initiate contact in the same way, and the regulatory and liability frameworks that would govern proactive AI outreach in mental health contexts are not well established.

  • Woebot RCTs show significant symptom reductions in mild-to-moderate anxiety and depression in student populations
  • Effect sizes are smaller and less consistent in community samples with more severe baseline symptoms
  • Wysa clinical benefits have been demonstrated primarily in high-engagement users (40+ interactions)
  • Engagement drop-off is a systemic challenge across all digital mental health interventions

Head-to-Head Comparisons Are Still Rare

The more clinically relevant question — do AI therapy tools produce outcomes comparable to human CBT delivery — has been addressed in very few trials. Most RCTs compare AI tools against waitlist control, information-only control, or treatment-as-usual — conditions that include minimal active intervention. Direct comparison against manualized CBT delivered by a human therapist has been attempted in some trials of internet-delivered CBT broadly but is rare in AI chatbot research specifically.

Where such comparisons exist, they generally favor human therapists for more complex presentations, but find AI tools non-inferior for subclinical or mild presentations with motivated users. This aligns with clinical logic: AI tools are most plausibly useful as a lower step in a stepped-care model — extending reach to populations who cannot access care rather than replacing care for those who need it most.

Regulatory Status and What It Means

Neither Woebot nor Wysa holds FDA clearance as a medical device, because both platforms are designed to operate as wellness tools rather than diagnostic or treatment devices. This classification affects what claims can be made about them, what post-market surveillance is required, and what liability attaches to clinical decisions made in their use. As evidence accumulates and as some AI mental health tools seek regulatory clearance for more specific clinical claims, this landscape will evolve.

Key Takeaway

RCT evidence supports modest but real benefits from AI CBT tools for mild-to-moderate anxiety and depression in engaged users — but the engagement ceiling, the lack of head-to-head comparisons with human therapists, and the absence of FDA clearance all constrain the claims that can responsibly be made about clinical equivalence.

Sources

Fitzpatrick KK, et al. Delivering cognitive behavior therapy to young adults with symptoms of depression and anxiety using a fully automated conversational agent (Woebot). JMIR Mental Health. 2017;4(2):e19. doi:10.2196/mental.7785

Inkster B, et al. An empathy-driven, conversational artificial intelligence agent (Wysa) for digital mental well-being. JMIR mHealth and uHealth. 2018;6(11):e12106.

Medical Disclaimer: AI-based mental health tools described in this article are wellness applications, not FDA-cleared medical devices. Anyone experiencing significant depression, anxiety, or mental health crisis should seek evaluation by a qualified mental health professional.
