Content is user-generated and unverified.

Extracting Books from AI Models: A Research Summary

Link to paper: https://arxiv.org/abs/2601.02671

Executive Summary

Researchers extracted large portions of copyrighted books (including Harry Potter and the Sorcerer's Stone and 1984), in some cases nearly complete copies, from production AI chatbots. They did this by asking each model to continue a book's opening sentence and then repeatedly requesting more text. Of the four systems tested, two (Google's Gemini 2.5 Pro and xAI's Grok 3) didn't require any "jailbreaking" techniques to bypass safety measures. Claude 3.7 Sonnet required jailbreaking but produced the most extensive extractions, reproducing over 95% of several books near-verbatim. OpenAI's GPT-4.1 was the most resistant and yielded only a small fragment before refusing. This research raises significant questions about whether AI companies' claims that training on copyrighted material is "transformative" hold up when their models can reproduce entire books nearly word-for-word.


Authors & Institutions

Lead Authors (Equal contribution):

  • Ahmed Ahmed - Stanford University
  • A. Feder Cooper - Stanford University and Yale University

Additional Authors:

  • Sanmi Koyejo - Stanford University
  • Percy Liang - Stanford University

Institutional Affiliations:

  • Stanford University Department of Computer Science
  • Yale University Department of Computer Science

Conflicts of Interest:

  • A. Feder Cooper was employed by Microsoft until December 2025 (as a postdoctoral researcher in the FATE group); the authors explicitly note these results should not be attributed to Microsoft.
  • The research was conducted independently and involved purchasing API access to test production AI systems.

What They Did

Research Question:

  • Can copyrighted books be extracted from production AI chatbots that have safety measures designed to prevent this?

Models Tested:

  • Claude 3.7 Sonnet (Anthropic)
  • GPT-4.1 (OpenAI)
  • Gemini 2.5 Pro (Google)
  • Grok 3 (xAI)

Method:

  • Phase 1: Started with the first sentence of a book and asked the AI to continue it verbatim. For Claude 3.7 Sonnet and GPT-4.1, they used "Best-of-N jailbreaking" (trying slightly modified versions of the prompt, with character substitutions, random capitalization, etc., until one worked).
  • Phase 2: Once the AI complied, they repeatedly asked it to "continue" the text, accumulating hundreds of responses to reconstruct large portions of books.
  • They tested 13 books total (11 in-copyright, 2 public domain) mostly selected because prior research showed they were highly memorized by other AI models.

Key Measurement:

  • "nv-recall" (near-verbatim recall): the percentage of a book's words that appeared in order in long, matching chunks (minimum 100 words) in the AI's output.
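To make the metric concrete, here is a minimal Python sketch of an nv-recall-style score. It is a simplification, not the authors' implementation: it assumes whitespace tokenization and counts only exact, in-order word matches found by the standard library's difflib.SequenceMatcher, whereas the paper's near-verbatim matching tolerates small differences. The function name nv_recall_sketch and the min_block_words parameter are hypothetical; the default of 100 mirrors the minimum block length described above.

```python
from difflib import SequenceMatcher


def nv_recall_sketch(book_text: str, generated_text: str,
                     min_block_words: int = 100) -> float:
    """Hypothetical, simplified nv-recall: the fraction of the book's words
    covered by in-order, exactly matching blocks of >= min_block_words words.
    The paper's metric also tolerates near-verbatim differences; this does not."""
    book = book_text.split()        # assumption: whitespace tokenization
    generated = generated_text.split()
    if not book:
        return 0.0
    # autojunk=False stops difflib from discarding frequent words as junk,
    # which would otherwise fragment long matches.
    matcher = SequenceMatcher(a=book, b=generated, autojunk=False)
    # get_matching_blocks() yields non-overlapping blocks in the same order in
    # both sequences, ending with a zero-size sentinel block.
    covered = sum(block.size for block in matcher.get_matching_blocks()
                  if block.size >= min_block_words)
    return covered / len(book)


if __name__ == "__main__":
    toy_book = " ".join(f"w{i}" for i in range(300))  # 300 distinct "words"
    words = toy_book.split()
    partial = " ".join(words[:150] + ["filler"] * 50)
    print(nv_recall_sketch(toy_book, partial))  # 150 of 300 words -> 0.5
```

Because matching blocks must appear in the same order in both texts, output that reproduces a book out of sequence is only partially credited: in this toy 300-word example, swapping the two halves still scores 0.5 rather than 1.0. SequenceMatcher can also be slow on book-length inputs, so this is meant for illustration rather than scale.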

Key Findings

Extraction Success Rates (nv-recall):

  • Claude 3.7 Sonnet: Extracted 95.8% of Harry Potter and the Sorcerer's Stone, 97.5% of The Great Gatsby, 95.5% of 1984, and 94.3% of Frankenstein.
  • Gemini 2.5 Pro: Extracted 76.8% of Harry Potter (no jailbreaking needed).
  • Grok 3: Extracted 70.3% of Harry Potter and 69% of 1984 (no jailbreaking needed).
  • GPT-4.1: Only extracted 4.0% of Harry Potter's first chapter before refusing to continue, despite thousands of jailbreak attempts.

Jailbreaking Results:

  • Gemini 2.5 Pro and Grok 3 directly complied with requests to reproduce copyrighted text without any jailbreaking.
  • Claude 3.7 Sonnet required jailbreaking but often succeeded with relatively few attempts (under 300 for Harry Potter).
  • GPT-4.1 required 10-1000x more jailbreak attempts than Claude and still produced minimal extraction before refusing.

Cost:

  • Ranged from ~$0.19 (GPT-4.1, Frankenstein) to $119.97 (Claude, Harry Potter).
  • Most successful extractions cost $1-10, making this relatively inexpensive.

Additional Observations:

  • When AI systems generated text that wasn't verbatim from books, it often replicated plot elements, character names, and themes from the target book.
  • Some models would generate thousands of words before hitting safety guardrails.

Strengths

Methodology & Rigor:

  • The two-phase approach (probe feasibility, then extract) was systematic and well-documented with reproducible parameters for each model.
  • The "nv-recall" metric is conservative and well-justified—requiring 100+ word blocks to claim extraction avoids counting coincidental short matches.

Measurement Validity:

  • The conservative filtering (requiring long, aligned blocks) means they likely undercounted extraction, making their claims stronger rather than inflated.
  • Testing a negative control (a book published after all models' training cutoffs) appropriately showed the method doesn't produce false positives.

Transparency:

  • Authors clearly documented all experimental settings, costs, and limitations for each model, acknowledging they used different configurations per system.
  • Responsible disclosure: they notified all companies 90 days before publication, following standard security research practices.

Statistical Honesty:

  • Authors explicitly warn against comparing models directly since they used different experimental settings for each, avoiding misleading "Model X is worse than Model Y" claims.
  • They acknowledge their results show what's possible under specific conditions, not comprehensive measurements of overall memorization.

Real-world Relevance:

  • Testing production systems (not just research models) with actual API access reflects what real users could potentially do.
  • Including public domain books (Frankenstein, The Great Gatsby) was a sound choice: it lets the authors demonstrate the method without relying exclusively on the in-copyright works at the center of current copyright controversies.

Weaknesses

Limited Scope:

  • Only tested 13 books across four models during a 5-week window in 2025—this is far too small to make generalizations about overall memorization rates or comparative model safety.
  • Book selection was biased toward books known to be highly memorized in other models, which inflates apparent success rates.

Methodological Inconsistencies:

  • Different experimental configurations for each model (different temperature settings, maximum token lengths, penalties) make it impossible to fairly compare which systems are more vulnerable.
  • For Claude 3.7 Sonnet, they specifically tuned parameters to maximize extraction success, but didn't do equivalent optimization for other models—this could make Claude appear more vulnerable than it might be relative to others.

Measurement Limitations:

  • The "nv-recall" metric only counts in-order, near-verbatim text blocks, potentially missing substantial extraction that's out of order or interrupted.
  • They acknowledge but don't quantify how much valid extraction might be hidden in the "additional" and "missing" categories due to measurement artifacts.

Reproducibility Challenges:

  • Production AI systems change over time and results are non-deterministic—by the authors' own account, Claude 3.7 Sonnet was removed from availability before publication.
  • The jailbreaking approach (Best-of-N with random perturbations) introduces randomness that makes exact replication impossible.

Logical Issues:

  • The paper conflates "this extraction was possible under these specific conditions" with broader claims about copyright risk, but doesn't test how representative these conditions are.
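  • The cost analysis is somewhat misleading: describing extraction as cheap invites comparison with piracy rather than with a legitimate purchase, even though the most expensive run (about $120 for Harry Potter via Claude) costs far more than a copy of the book. It also ignores that most users wouldn't know to attempt this.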

Adversarial Context:

  • The "jailbreaking" techniques and iterative prompting represent adversarial use that's deliberately designed to circumvent safety measures—the relevance to typical copyright infringement risk is unclear.
  • The authors note this themselves but don't fully grapple with how security researchers' ability to extract data differs from what ordinary users (or even motivated bad actors) would discover.

Statistical Rigor:

  • Most results are from single runs per book-model combination, so there's no measure of variability or confidence in the reported extraction percentages.
  • The paper acknowledges they "picked configurations that resulted in the largest amount of extraction" for each model, which is a form of cherry-picking that inflates the apparent vulnerability.

Bottom Line for Dinner Table Discussion

What this means: This research demonstrates that major AI chatbots have memorized substantial portions of copyrighted books and can be prompted to reproduce them, raising genuine questions about companies' legal arguments that AI training is "transformative use." However, the extraction required persistent effort (hundreds of continuation requests, plus thousands of jailbreak attempts for some systems), cost real money, and didn't work consistently, so this isn't evidence that casual users are routinely downloading free Harry Potter books from chatbots.

The copyright question: The findings are particularly relevant because AI companies are defending copyright lawsuits by arguing their use of copyrighted training data is "transformative." If a model can regurgitate a near-complete book, that specific output doesn't look very transformed—though courts will ultimately decide whether this matters legally.

The safety question: The fact that two systems (Gemini and Grok) didn't even require jailbreaking to extract copyrighted text suggests their safety measures may be insufficient, while GPT-4.1's resistance (though imperfect) shows that stronger guardrails are possible even if not foolproof.
