Content is user-generated and unverified.

GPT-5 Accelerating Biological Research: A Critical Analysis

Source: Measuring AI's capability to accelerate biological research in the wet lab | OpenAI
Published: December 16, 2025

Executive Summary

OpenAI tested whether GPT-5 could autonomously improve a standard molecular biology technique, and the AI succeeded: DNA cloning efficiency rose 79-fold overall through a novel enzyme combination (about 2.6-fold on its own) paired with a modified transformation protocol (about 36-fold on its own). Working with minimal human guidance, GPT-5 proposed adding two proteins, RecA (from E. coli) and gp32 (from bacteriophage T4), in a specific sequence that had not been used together this way, creating what the authors call "RAPF-HiFi assembly." While impressive as a proof of concept that AI can generate genuinely novel scientific ideas and iterate on experimental results, the work was limited to a single simple lab system, and OpenAI's evaluation of its own product raises questions about objectivity and broader applicability.

Authors & Institutions

Authors: Nikolai Eroshenko, Miles Wang, Rachel Smith, Liliana Abramson, Tejal Patwardhan, Kemo Jammeh, Chase Olle, Azadeh Samadian, Nitin Mahadeo

Institutions:

OpenAI (primary research organization)
Red Queen Bio (biosecurity startup partner)
Robot on Rails (robotics collaboration)

Conflicts of Interest

Major Concerns:

OpenAI is evaluating its own product (GPT-5) without independent third-party validation—imagine Ford publishing research claiming their new car is the safest ever made.
Red Queen Bio is a commercial biosecurity startup, suggesting potential business development interests beyond pure science.
No traditional academic peer review mentioned, despite significant scientific claims about novel enzymatic mechanisms.
Published on OpenAI's own website rather than in an independent scientific journal.

The Data: What They Actually Did

Experimental Setup:

Started with standard Gibson assembly (a common DNA "glue" method from 2009) used to insert a green fluorescent protein gene into a bacterial plasmid.
Ran 5 rounds of optimization where GPT-5 proposed 8-10 protocol variants per round, scientists executed them, and results fed back to the AI.
Separately tested 13 different transformation protocols in a single "one-shot" round.
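
The closed-loop setup described above can be sketched in a few lines of Python. This is an illustrative sketch only: `propose_variants` and `run_in_lab` are hypothetical stand-ins (the first for the fixed-prompt GPT-5 call, the second for human or robotic execution), the colony counts are random placeholders, and `per_round=8` is simply the low end of the 8-10 range quoted above. Nothing here reflects OpenAI's actual code.

```python
import random

def propose_variants(history, n, round_no):
    """Hypothetical stand-in for the model call: returns n protocol labels.
    A real system would pass `history` (prior results) to the model."""
    return [f"round{round_no}-variant{i}" for i in range(1, n + 1)]

def run_in_lab(variant, rng):
    """Hypothetical stand-in for wet-lab execution: placeholder colony count."""
    return rng.randint(0, 500)

def optimize(rounds=5, per_round=8, seed=0):
    rng = random.Random(seed)
    history = []  # one dict per round: {variant label: colony count}
    for round_no in range(1, rounds + 1):
        variants = propose_variants(history, per_round, round_no)
        results = {v: run_in_lab(v, rng) for v in variants}
        history.append(results)  # fed back into the next proposal step
    return history

history = optimize()
# Best-scoring variant label in each round (placeholder data only).
best_per_round = [max(r, key=r.get) for r in history]
```

The point of the sketch is the shape of the loop: proposals depend on accumulated results, and no human edits the proposals between rounds.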

Key Results:

RAPF-HiFi assembly (the novel enzymatic approach): 2.6-fold improvement over baseline.
T7 transformation protocol (concentrating bacterial cells): 36-fold improvement over baseline.
Combined: 79-fold total improvement, confirmed with n=3 independent validation experiments.
All successful clones were sequence-verified (confirming the inserted sequences were correct).
Control experiments removing RecA or gp32 individually reduced performance, suggesting both proteins are necessary.
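
A quick arithmetic check on the fold-changes listed above, as a hedged sketch. It assumes the two improvements would combine multiplicatively if they were fully independent, which the source does not claim; the three numbers are the ones quoted in this summary, and the variable names are ours.

```python
# Fold improvements as quoted in this summary (not re-derived from raw data).
assembly_fold = 2.6          # RAPF-HiFi assembly vs. baseline
transformation_fold = 36.0   # T7 transformation protocol vs. baseline
reported_combined_fold = 79.0

# Assumption: independent effects multiply.
expected_if_independent = assembly_fold * transformation_fold
ratio = reported_combined_fold / expected_if_independent

print(f"Expected if multiplicative: {expected_if_independent:.1f}-fold")
print(f"Reported / expected: {ratio:.2f}")
```

Under that assumption the product is 93.6-fold, so the reported 79-fold is about 84% of a purely multiplicative combination. That gap is not necessarily an inconsistency: the individual and combined figures may come from separate experiments with their own variance.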

Strengths: What They Did Right

Solid baseline controls: Used the commercial HiFi assembly kit as a well-established starting point, making improvements meaningful and measurable.
Genuine novelty: The RecA-gp32 combination for Gibson assembly appears to be genuinely new—while both proteins were known individually, using them together this way hadn't been reported in the scientific literature.
Proper validation: Ran triplicate experiments (n=3) for final validation and sequence-confirmed all clones to ensure they weren't false positives.
Mechanistic testing: Tested the proposed mechanism by systematically removing components (RecA alone; neither protein) to check whether both are necessary. The reduced performance in these controls supports the claim, and this is good scientific reasoning.
Honest about limitations: Acknowledged that fixed prompting limited optimization, that some high-performers didn't replicate, and that the transformation protocol creates "jackpot dynamics" with high variance.
No human hand-holding: Used standardized prompts without human intervention to isolate what the AI could do independently—this is critical for measuring true AI capability.
Robotic validation: Built and tested an autonomous robot executing protocols, comparing directly to human performance (albeit with lower absolute yields).

Weaknesses: Red Flags and Limitations

One-trick pony: Only tested on one simple experimental system (GFP into pUC19), so we have no idea whether this approach generalizes to other cloning challenges, different DNA sequences, or more complex applications.
Reproducibility problems they buried: Many "top performers" from initial screening failed to replicate in validation experiments, but this is mentioned only briefly, which raises questions about how reliable the initial optimization actually was.
No expert human comparison: Didn't compare GPT-5's performance to what an experienced molecular biologist could achieve with the same time and iterations—we don't know if this is better than human optimization or just "different."
The "jackpot" problem: Authors admit the ligase-polish reaction family shows high variance and "jackpot dynamics" where occasional outliers can look great but don't consistently perform—this could mean luck rather than genuine optimization drove some results.
Biosecurity handwaving: Mentioned biosecurity implications but provided minimal detail about safeguards or risk assessment—concerning given they're demonstrating AI can design novel biological protocols autonomously.
Self-assessment bias: OpenAI researchers evaluating OpenAI's product is like asking a pharmaceutical company to independently verify their drug works—inherent conflict of interest without external validation.
Cherry-picked success?: We don't know how many other attempts or experiments failed; the publication focuses only on the successful cloning optimization.
Robot underperformed significantly: The autonomous robot had 10-fold lower absolute colony counts than human execution, suggesting the "AI + automation" dream is still far from practical reality.
Not peer-reviewed: Published on company blog rather than submitted to rigorous academic peer review where independent scientists would scrutinize methods, data, and conclusions.
Exploration-exploitation tradeoff: Authors acknowledge the fixed prompting "locked the system into exploration" and limited refinement of discoveries—meaning the 79-fold improvement might be far from optimal.
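
The replication and "jackpot" concerns in the list above can be illustrated with a small simulation of selection under noise. This is a toy model under loud assumptions, not a reconstruction of the study: every variant is given the same true efficiency (1.0), measurement noise is lognormal with an arbitrary `sigma`, and all function and parameter names are hypothetical.

```python
import random
import statistics

def screen_then_validate(rng, n_variants=50, top_k=5, n_replicates=3, sigma=0.5):
    """Screen each variant once, pick the top_k, then re-measure them.
    Assumption: all variants are truly identical, so any 'winner' is pure noise."""
    def measure():
        return rng.lognormvariate(0.0, sigma)

    screen = [measure() for _ in range(n_variants)]
    top_scores = sorted(screen, reverse=True)[:top_k]
    # Re-measure each selected variant n_replicates times (fresh noise).
    validation = [
        statistics.mean([measure() for _ in range(n_replicates)])
        for _ in top_scores
    ]
    return statistics.mean(top_scores), statistics.mean(validation)

def average_over_trials(n_trials=200, seed=0):
    rng = random.Random(seed)
    pairs = [screen_then_validate(rng) for _ in range(n_trials)]
    screen_mean = statistics.mean([p[0] for p in pairs])
    validation_mean = statistics.mean([p[1] for p in pairs])
    return screen_mean, validation_mean

screen_mean, validation_mean = average_over_trials()
print(f"Top performers at screening:  {screen_mean:.2f}")
print(f"Same variants at validation:  {validation_mean:.2f}")
```

Even with no real differences between variants, the screening winners look much better than they do on re-measurement. This is why single-shot top performers that fail to replicate are a warning sign, and why the n=3 validation step matters.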

Bottom Line for Dinner Table Discussion

The Exciting Part: This is genuine evidence that frontier AI models can propose novel scientific ideas that work in the real world—GPT-5 essentially "invented" a new molecular biology technique by reasoning about enzyme mechanisms, and it actually worked when tested in the lab.

The Skeptical Take: This is OpenAI marketing their own product with a single proof-of-concept experiment, no independent validation, and significant reproducibility issues they downplayed. We need to see this replicated by independent labs, tested on diverse systems, and compared to expert human optimization before declaring AI has revolutionized wet lab biology.

The Practical Reality: Even if the science holds up, the autonomous robot yielded roughly 10-fold fewer colonies than human execution, many "optimized" protocols didn't replicate, and the approach was tested on only one simple molecular biology task. The path from "cool demo" to "transforms biological research" is long and uncertain.
