Content is user-generated and unverified.

State of Clinical AI Report 2026 - Summary

Source: State of Clinical AI Report 2026 - ARISE Network, January 2026

Executive Summary

AI tools are entering healthcare faster than they are being properly tested. More than 1,200 FDA-cleared devices and 350,000+ consumer apps make up a $70 billion market, but most lack rigorous evaluation showing that they actually improve patient care. Frontier AI models (ChatGPT-style systems applied to medicine) now perform strongly on complex medical reasoning tasks in controlled tests, yet their performance breaks down under real-world conditions such as missing information or clinical uncertainty. The gap between "what AI can theoretically do" and "what it safely accomplishes for actual patients" remains large, with researchers calling for prospective trials and better oversight before widespread clinical deployment.

Authors & Institutions

Lead Authors:

Dr. Peter Brodeur - Rising cardiology fellow, Harvard Medical School's Beth Israel Deaconess Medical Center; ARISE affiliate and NEJM AI reviewer
Dr. Ethan Goh - Executive Director of ARISE; directs Stanford Healthcare AI Leadership Program and Harvard's Agentic AI Executive Course; Associate Editor at BMJ Digital Health & AI
Dr. Adam Rodman - Assistant Professor, Harvard Medical School; Director of AI Programs for Carl J. Shapiro Center; Associate Editor at NEJM AI; host of ACP's Bedside Rounds podcast
Dr. Jonathan H. Chen - Stanford's inaugural Director for Medical Education in AI, Division of Computational Medicine; over 100 publications in human-AI collaboration for healthcare

Contributing Team: Emily Tat, Liam McCoy, David Wu, Priyank Jain, and multiple reviewers from academic and industry positions

Conflicts of Interest

Dr. Goh receives funding from the Gordon and Betty Moore Foundation, the Macy Foundation, and Stanford AI partnerships; consults for Google, OpenAI, Samsung Research America, Roche Diagnostics, Novartis, Hello Heart, Grow Care Inc, and Faculty Connection
Dr. Rodman receives funding from the Moore Foundation, the Macy Foundation, NIH, ARPA-H, Google, and Google DeepMind
Dr. Chen cofounded Reaction Explorer (chemistry education software); does paid consulting as a medical expert witness; receives multiple NIH grants and foundation funding
Report prepared independently despite industry research partnerships and grant support

Key Findings

The Adoption-Evidence Gap

Over 1,200 FDA-cleared AI/ML medical devices exist (1995–2023), but more than 95% were cleared through the 510(k) "equivalency" pathway, which rests on similarity to an existing device rather than on demonstrated clinical benefit
About 50% of FDA device summaries didn't report study design, 53% lacked sample size information, and fewer than 1% documented patient outcomes
95% of device summaries omitted demographic data and 91% lacked bias assessments, creating serious equity and safety concerns

Model Performance vs. Real-World Impact

Frontier reasoning models (like GPT-4, Claude, Gemini) showed dramatic improvements on complex medical reasoning tasks and diagnostic benchmarks
However, these same models demonstrate "very uneven performance": they excel at structured problems but break down when uncertainty, missing information, or changing context appears
The gap between controlled benchmark performance and messy real-world clinical application remains substantial

What Clinicians Actually Want

Clinicians most value AI that reduces administrative burden and workflow friction (documentation, prior authorizations, routine paperwork)
These high-value use cases are systematically understudied and underrepresented in current AI benchmarks and research
There's a mismatch between what researchers test and what would actually help frontline clinicians

Patient-Facing AI

Direct-to-patient AI tools (symptom checkers, mental health chatbots, medical advice apps) are proliferating rapidly
These tools raise distinct safety concerns and require much stronger guardrails and oversight systems than currently exist
The report highlights examples like Grow Therapy's AI coach (between-session mental health support) as more thoughtful approaches with clear role boundaries

Multimodal Applications

AI systems that integrate text, images, and other clinical data types are approaching practical usability for prediction and decision-making
Imaging remains the dominant clinical AI use case, with systems expanding toward multi-task capabilities
Specialties are finding creative ways to repurpose routine, noninvasive data (like ECGs) to assess previously undetectable patient risks

Regulatory Landscape

FDA clearances are increasing, but near-term adoption will favor narrow, task-specific systems over general-purpose AI
Tightly scoped AI tools for specific domains and contexts are more likely to demonstrate value and gain clinical acceptance
Regulatory mechanisms for generative AI remain inadequate, with no significant progress expected in 2026

Strengths

Comprehensive synthesis - The report aggregates findings across model performance, benchmarks, clinical workflows, patient-facing applications, and regulatory landscape into one accessible document
Focus on real-world impact - Rather than just celebrating technical capabilities, the report consistently asks "does this actually improve patient care?" which is the right question
Transparent about evidence gaps - The authors clearly distinguish between what AI can do in controlled settings versus what's been proven in prospective clinical trials
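Acknowledges the adoption-evidence mismatch - Calling out that more than 95% of FDA-cleared devices came through the 510(k) pathway without demonstrating clinical benefit, and that fewer than 1% of device summaries documented patient outcomes, is important accountability that many industry reports avoid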
Interdisciplinary authorship - The team combines clinical expertise (cardiology, internal medicine), informatics, medical education, and health systems leadership, giving multiple perspectives
Practical framing for clinicians - The report identifies what clinicians actually need (workflow support, administrative relief) versus what gets researched and funded
Equity considerations - Highlighting that 95% of device summaries omit demographic data and 91% lack bias assessments directly addresses fairness concerns often ignored in AI hype

Weaknesses

Limited original data - This is primarily a synthesis/review report; it does not present new research findings or meta-analyses with novel statistical analysis
Selection bias in examples - The "Applied AI & Demos" section features tools from research partners and may not represent the full landscape of clinical AI deployment, potentially creating a rosier picture than reality
Vague on methodology - The report doesn't clearly explain how studies were selected for inclusion or how quality was assessed, making it hard to evaluate potential bias in the synthesis
Predictions lack accountability - The "10 Predictions for 2026" section reads more like informed speculation than evidence-based forecasting, with no stated evidence base and no framework for how the predictions will be evaluated
Underexplored implementation science - While the report notes that clinician-valued use cases are understudied, it doesn't deeply explore why this mismatch exists or how to fix it
Limited cost-effectiveness analysis - Given the $70B market mentioned, there's minimal discussion of whether these tools represent good value for healthcare systems facing budget constraints
Regulatory recommendations unclear - The report identifies FDA clearance problems but offers little concrete guidance on what better oversight would look like
Missing patient voices - Despite emphasizing patient-facing AI risks, the report doesn't appear to include patient perspectives or patient safety advocates among reviewers/authors
