State of Clinical AI Report 2026 - Summary
Source: State of Clinical AI Report 2026 - ARISE Network, January 2026
Executive Summary
AI tools are flooding into healthcare faster than they are being properly tested. Over 1,200 FDA-cleared devices and 350,000+ consumer apps make up a $70 billion market, yet most lack rigorous evaluation showing that they actually improve patient care. Frontier AI models (the large language models behind tools like ChatGPT) are performing very well on complex medical reasoning tasks in controlled tests, yet they break down when faced with real-world messiness such as missing information or uncertain situations. The gap between "what AI can theoretically do" and "what it safely accomplishes for actual patients" remains enormous, and the researchers call for prospective trials and better oversight before widespread clinical deployment.
Authors & Institutions
Lead Authors:
- Dr. Peter Brodeur - Rising cardiology fellow, Harvard Medical School's Beth Israel Deaconess Medical Center; ARISE affiliate and NEJM AI reviewer
- Dr. Ethan Goh - Executive Director of ARISE; directs Stanford Healthcare AI Leadership Program and Harvard's Agentic AI Executive Course; Associate Editor at BMJ Digital Health & AI
- Dr. Adam Rodman - Assistant Professor, Harvard Medical School; Director of AI Programs for the Carl J. Shapiro Center; Associate Editor at NEJM AI; host of ACP's Bedside Rounds podcast
- Dr. Jonathan H. Chen - Stanford's inaugural Director for Medical Education in AI, Division of Computational Medicine; over 100 publications on human-AI collaboration for healthcare
Contributing Team: Emily Tat, Liam McCoy, David Wu, Priyank Jain, and multiple reviewers from academic and industry positions
Conflicts of Interest
- Dr. Goh receives funding from Gordon and Betty Moore Foundation, Macy Foundation, Stanford AI partnerships; consults for Google, OpenAI, Samsung Research America, Roche Diagnostics, Novartis, Hello Heart, Grow Care Inc, and Faculty Connection
- Dr. Rodman receives funding from Moore Foundation, Macy Foundation, NIH, ARPA-H, Google, and Google DeepMind
- Dr. Chen cofounded Reaction Explorer (chemistry education software); paid consulting as medical expert witness; receives multiple NIH grants and foundation funding
- The authors state that the report was prepared independently of their industry research partnerships and grant support
Key Findings
The Adoption-Evidence Gap
- Over 1,200 AI/ML medical devices were FDA-cleared between 1995 and 2023, but more than 95% were cleared through the 510(k) "substantial equivalence" pathway, which requires showing similarity to an existing device rather than demonstrating clinical benefit
- About 50% of FDA device summaries didn't report study design, 53% lacked sample size information, and fewer than 1% documented patient outcomes
- 95% of device summaries omitted demographic data and 91% lacked bias assessments, creating serious equity and safety concerns
Model Performance vs. Real-World Impact
- Frontier reasoning models (such as GPT-4, Claude, and Gemini) showed dramatic improvements on complex medical reasoning tasks and diagnostic benchmarks
- However, these same models show "very uneven performance": they excel at structured problems but break down when uncertainty, missing information, or changing context appears
- The gap between controlled benchmark performance and messy real-world clinical application remains substantial
What Clinicians Actually Want
- Clinicians most value AI that reduces administrative burden and workflow friction (documentation, prior authorizations, routine paperwork)
- These high-value use cases are systematically understudied and underrepresented in current AI benchmarks and research
- There's a mismatch between what researchers test and what would actually help frontline clinicians
Patient-Facing AI
- Direct-to-patient AI tools (symptom checkers, mental health chatbots, medical advice apps) are proliferating rapidly
- These raise distinct safety concerns requiring much stronger guardrails and oversight systems that don't currently exist
- The report highlights examples like Grow Therapy's AI coach (between-session mental health support) as more thoughtful approaches with clear role boundaries
Multimodal Applications
- AI systems that integrate text, images, and other clinical data types are approaching practical usability for prediction and decision-making
- Imaging remains the dominant clinical AI use case, with systems expanding toward multi-task capabilities
- Specialties are finding creative ways to repurpose routine, noninvasive data (like ECGs) to assess previously undetectable patient risks
Regulatory Landscape
- FDA clearances are increasing, but near-term adoption will favor narrow, task-specific systems over general-purpose AI
- Tightly scoped AI tools for specific domains and contexts are more likely to demonstrate value and gain clinical acceptance
- Regulatory mechanisms for generative AI remain inadequate with no significant progress expected in 2026
Strengths
- Comprehensive synthesis - The report aggregates findings across model performance, benchmarks, clinical workflows, patient-facing applications, and the regulatory landscape into one accessible document
- Focus on real-world impact - Rather than just celebrating technical capabilities, the report consistently asks "does this actually improve patient care?" which is the right question
- Transparent about evidence gaps - The authors clearly distinguish between what AI can do in controlled settings versus what's been proven in prospective clinical trials
- Acknowledges the adoption-evidence mismatch - Calling out that more than 95% of FDA-cleared devices reached the market through the 510(k) pathway without demonstrating clinical benefit is important accountability that many industry reports avoid
- Interdisciplinary authorship - The team combines clinical expertise (cardiology, internal medicine), informatics, medical education, and health systems leadership, giving multiple perspectives
- Practical framing for clinicians - The report identifies what clinicians actually need (workflow support, administrative relief) versus what gets researched and funded
- Equity considerations - Highlighting that 95% of device summaries omit demographic data and 91% lack bias assessments directly addresses fairness concerns often ignored in AI hype
Weaknesses
- Limited original data - This is primarily a synthesis/review report rather than presenting new research findings or meta-analyses with novel statistical analysis
- Selection bias in examples - The "Applied AI & Demos" section features tools from research partners and may not represent the full landscape of clinical AI deployment, potentially creating a rosier picture than reality
- Vague on methodology - The report doesn't clearly explain how studies were selected for inclusion or how quality was assessed, making it hard to evaluate potential bias in the synthesis
- Predictions lack accountability - The "10 Predictions for 2026" section reads more like informed speculation than evidence-based forecasting, with no framework for how they'll be evaluated or what they're based on
- Underexplored implementation science - While the report notes that clinician-valued use cases are understudied, it doesn't deeply explore why this mismatch exists or how to fix it
- Limited cost-effectiveness analysis - Given the $70 billion market the report cites, there's minimal discussion of whether these tools represent good value for healthcare systems facing budget constraints
- Regulatory recommendations unclear - The report identifies FDA clearance problems but offers little concrete guidance on what better oversight would look like
- Missing patient voices - Despite emphasizing patient-facing AI risks, the report doesn't appear to include patient perspectives or patient safety advocates among reviewers/authors