Schema markup has officially crossed from SEO enhancement to AI requirement. In March 2025, Microsoft's Fabrice Canel confirmed at SMX Munich that "schema markup helps Microsoft's LLMs understand content"—the first explicit confirmation from a major search engine that structured data directly feeds AI systems. This report compiles research, official statements, and case studies demonstrating that structured data now serves as the semantic layer powering AI-generated answers across platforms.
The evidence is substantial. A Data.world benchmark, widely cited in industry sources, found that knowledge graph-grounded LLMs achieve 300% higher accuracy than approaches relying solely on unstructured data. Google's Knowledge Graph (500 billion facts as of 2020) is, according to industry analysis, heavily populated by structured data from public websites. A controlled three-site experiment found that only the page with well-implemented schema appeared in AI Overviews. While Google officially states that no "special" schema is required for AI features, its representatives consistently emphasize that structured data helps its systems "understand pages better" and "indirectly leads to better ranks."
Microsoft has provided the most direct confirmation that structured data affects AI search systems. At SMX Munich in March 2025, Fabrice Canel, Principal Program Manager at Microsoft Bing, stated definitively:
"Schema markup helps Microsoft's LLMs understand content." — Fabrice Canel, SMX Munich, March 2025
This statement was reported by Search Engine Land and confirmed on LinkedIn. It is the clearest public acknowledgment from any major search engine that structured data feeds directly into AI language models, including Microsoft Copilot (formerly Bing Chat).
Microsoft's Prometheus model combines the Bing index with OpenAI's GPT models through a technique called "grounding." According to official documentation:
"Selecting the relevant internal queries and leveraging the respective Bing search results is a critical component of Prometheus since it provides relevant and fresh information to the model, enabling it to answer recent questions and reduce inaccuracies—this method is called grounding." — Microsoft Bing Blog, February 2023
Microsoft Research has also published extensively on GraphRAG (Graph Retrieval-Augmented Generation), which uses LLM-generated knowledge graphs to enhance answer quality. A February 2024 Microsoft Research blog post states:
"GraphRAG uses LLM-generated knowledge graphs to provide substantial improvements in question-and-answer performance when conducting document analysis of complex information." — Jonathan Larson & Steven Truitt, Microsoft Research, February 2024
The official GraphRAG GitHub repository (released July 2024) documents this approach: https://github.com/microsoft/graphrag
Bing's webmaster documentation explicitly states: "Bing works hard to understand the content of a page and one of the clues that Bing uses is structured data." Bing was a founding member of Schema.org in 2011 alongside Google, Yahoo, and Yandex.
Canel has also emphasized the combination of structured data with IndexNow for AI optimization:
"Gen AIs value fresh content in particular, partly as a reference check of their LLM training data. Use the API at indexnow.org to push that information as it's published or updated." — Fabrice Canel, SMX Munich 2025
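The IndexNow push Canel recommends is a plain HTTP call. The Python sketch below builds a bulk submission but does not send it. It assumes the JSON POST format described at indexnow.org, which uses the fields `host`, `key`, `keyLocation` and `urlList`. It also assumes the shared `https://api.indexnow.org/indexnow` endpoint. The host, key and URLs are hypothetical placeholders.

```python
import json
import urllib.request

# Assumed shared endpoint; check indexnow.org for the current list of endpoints.
INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"

def build_indexnow_request(host: str, key: str, urls: list[str]) -> urllib.request.Request:
    """Build, but do not send, an IndexNow bulk submission for changed URLs."""
    payload = {
        "host": host,
        "key": key,
        # Engines fetch this file to verify that the submitter controls the host.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )

# Hypothetical site, key, and URL.
req = build_indexnow_request(
    "www.example.com", "hypothetical-key-123", ["https://www.example.com/updated-article"]
)
# urllib.request.urlopen(req) would send it.
```

Per the protocol documentation, a 200 or 202 response indicates the submission was received. Verification only succeeds if the key file is actually served at `keyLocation`.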
Google's official position is more nuanced than Microsoft's. According to their December 2025 AI Features documentation:
"You don't need to create new machine readable files, AI text files, or markup to appear in these features. There's also no special schema.org structured data that you need to add." — Google Search Central, December 2025
However, Google representatives consistently emphasize that structured data helps their systems understand content, which plausibly extends to AI features.
When asked directly on Reddit whether schema markup helps LLMs understand entities, Google's John Mueller responded:
"This question will stick with us for the next year and longer, and the short answer is yes, no, and it depends." — John Mueller, Google Search Advocate
Mueller clarified that some features depend heavily on structured data (like Shopping results for pricing, shipping, availability), while in other cases it enriches results. He also confirmed Google uses RAG and grounding for AI Overviews:
At Google Search Central Live Madrid (2025), Mueller explained the process: User enters question → Search finds relevant information → Information "grounds" the LLM → LLM creates answer with supporting links.
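Mueller's four steps can be sketched as a toy retrieve-then-ground pipeline. This minimal Python illustration rests on several assumptions:

- Word-overlap ranking stands in for a real search index.
- The `Doc` type and the prompt template are hypothetical.
- The final LLM call is omitted.

It is not Google's implementation.

```python
import re
from dataclasses import dataclass

@dataclass
class Doc:
    url: str
    text: str

def _terms(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; a stand-in for real query processing."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, index: list[Doc], k: int = 2) -> list[Doc]:
    """Step 2 (hypothetical ranking): order documents by word overlap with the question."""
    q = _terms(question)
    scored = [(len(q & _terms(d.text)), d) for d in index]
    relevant = [(score, d) for score, d in scored if score > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in relevant[:k]]

def build_grounded_prompt(question: str, docs: list[Doc]) -> str:
    """Step 3: 'ground' the model by placing numbered sources before the question,
    so step 4 (the LLM call, omitted here) can answer with supporting links."""
    sources = "\n".join(f"[{i}] {d.text} ({d.url})" for i, d in enumerate(docs, start=1))
    return (
        "Answer using only the numbered sources and cite them.\n"
        f"{sources}\n\nQuestion: {question}"
    )
```

A production system would replace `retrieve` with a full ranking stack and send the prompt to an LLM. The sketch only shows the order of operations.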
On schema's future, Mueller stated clearly: "Google is not killing schema." He noted that Google regularly retires redundant markup types while keeping essential ones, emphasizing "evergreen structured data that communicates meaning."
Source: https://www.seroundtable.com/mueller-schema-helps-llms-google-40693.html
The most widely cited quote about structured data's value comes from Gary Illyes at Pubcon 2017:
"[Schema markup] will help us understand your pages better, and indirectly, it leads to better ranks in some sense, because we can rank easier... Add structured data to your pages because during indexing, we will be able to better understand what your site is about." — Gary Illyes, Google Analyst, Pubcon 2017
Illyes also encouraged broad adoption: "Don't just think about the structured data that we documented on developers.google.com. Think about any schema.org schema that you could use on your pages."
Source: http://www.thesempost.com/adding-structured-data-helps-google-understand-rank-webpages-better/
The former Google Search Liaison offered measured guidance:
"It's not 'structured data and you win AI.' It simply supports how systems understand and present content, just as it already does across Search features." — Danny Sullivan, Search Off the Record Podcast, December 2025
Sullivan also summed up the point as "Good SEO is good GEO" (GEO being Generative Engine Optimization). This suggests that optimizing for AI isn't fundamentally different from traditional SEO.
Source: https://searchengineland.com/google-danny-sullivan-seo-for-ai-is-still-seo-466368
Perhaps most revealing, a Google software engineer who works on structured data stated:
"A lot of our systems run much better with structured data... it's computationally cheaper than extracting it." — Ryan Levering, Google Search Central Live New York, March 2025
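This suggests that Google's systems, presumably including its AI features, favor structured data when it is available. The reason is simply that it is cheaper to process than extracting the same information from page text.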
Google's Knowledge Graph, launched in 2012, contained 500 billion facts on 5 billion entities by May 2020. According to industry analysis, AI Overviews rely on this Knowledge Graph, which is "heavily populated by structured data pulled in from public websites."
Google Research has published extensively on retrieval-augmented approaches, including the foundational REALM paper (2020): https://arxiv.org/abs/2002.08909
Unlike Microsoft and Google, neither OpenAI nor Perplexity provides official documentation about how their systems process structured data from websites. This represents a significant gap in industry transparency.
OpenAI operates three crawlers:
Critical technical limitation: According to Vercel and MERJ research cited by Daydream, OpenAI's crawlers do NOT execute JavaScript: "Their joint analysis tracked over half a billion GPTBot fetches and found zero evidence of JavaScript execution."
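This means that JSON-LD injected client-side by JavaScript is invisible to OpenAI's crawlers. Only JSON-LD already present in the HTML the server returns, whether server-side rendered or static, can be read.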
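Per the Vercel/MERJ analysis, OpenAI's crawlers read HTML without executing JavaScript. A quick self-check is therefore to parse a page's raw HTML and list the JSON-LD blocks present before any script runs. The sketch below uses only Python's standard library. Fetching the page is omitted, and the class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect JSON-LD blocks from raw HTML without executing any JavaScript."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf: list[str] = []
        self.blocks: list = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buf)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is skipped here; a real audit should report it

def extract_json_ld(html: str) -> list:
    """Return the parsed JSON-LD objects visible to a non-rendering crawler."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Running it on the HTML returned by a plain HTTP fetch approximates what a non-rendering crawler sees. If a block appears only in the browser's rendered DOM, it was injected client-side.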
Community testing on the OpenAI Developer Forum (June 2025) shows mixed results. One user reported: "My own tests show that when a page includes schema markup, ChatGPT's answers include details (emails, trial lengths, certifications) that are only present in the JSON-LD—not visible in plain HTML."
However, official OpenAI documentation provides no confirmation.
Cloudflare's August 2025 investigation into Perplexity's crawlers revealed concerning practices:
"We observed that Perplexity uses not only their declared user-agent, but also a generic browser intended to impersonate Google Chrome on macOS when their declared crawler was blocked... Both their declared and undeclared crawlers were attempting to access the content for scraping contrary to the web crawling norms." — Cloudflare Research, August 2025
Cloudflare observed 20-25 million daily requests from the declared crawler and 3-6 million daily requests from the stealth crawler.
Perplexity's official documentation states PerplexityBot "is not used to crawl content for AI foundation models," but provides zero guidance on structured data usage.
Given these documentation gaps, industry best practice is to implement server-side rendered JSON-LD, because the AI crawlers studied so far do not execute JavaScript. Practitioners also focus on the schema types that industry research associates with AI citations: Article, FAQPage, HowTo, Product, and Organization.
The relationship between structured data, knowledge graphs, and AI systems is well documented in academic literature spanning more than two decades.
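The Python sketch below shows server-side rendering in minimal form. It builds an Article object with nested Person and Organization entities, then serializes it into the `<script>` tag the server emits. That way the markup is in the HTML before any JavaScript runs. The function names and sample values are hypothetical. `headline`, `url`, `datePublished`, `author` and `publisher` are standard schema.org properties.

```python
import json

def article_json_ld(headline: str, url: str, date_published: str,
                    author_name: str, org_name: str, org_url: str) -> dict:
    """Build a schema.org Article with nested Person (author) and Organization (publisher)."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "datePublished": date_published,  # ISO 8601, e.g. "2025-03-01"
        "author": {"@type": "Person", "name": author_name},
        "publisher": {"@type": "Organization", "name": org_name, "url": org_url},
    }

def render_json_ld_script(data: dict) -> str:
    """Serialize to a <script> tag for the server-rendered HTML <head>.

    Escaping "</" as "<\\/" (still valid JSON) stops a value such as
    "</script>" from closing the tag early.
    """
    payload = json.dumps(data, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'

# Hypothetical page values.
tag = render_json_ld_script(article_json_ld(
    "Schema markup and AI search", "https://www.example.com/schema-ai",
    "2025-03-01", "Jane Doe", "Example Inc", "https://www.example.com"))
```

Filling these values from the same record that renders the visible page keeps the markup consistent with on-page content.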
Tim Berners-Lee's Semantic Web Vision (2001)
"The Semantic Web is really data that is processable by machine... data with well defined meaning is exchanged, and computers and people work side by side in cooperation." — Tim Berners-Lee, James Hendler, Ora Lassila, Scientific American, May 2001
The Schema.org Paper (2016)
R.V. Guha, Dan Brickley, and Steve Macbeth published the definitive reference, "Schema.org: Evolution of Structured Data on the Web," in Communications of the ACM. Key finding:
"Annotations in Schema.org are used as a data source for the Knowledge Graph, providing background information about well-known entities." — Guha et al., 2016, DOI: 10.1145/2844544
By 2024, over 45 million web domains use Schema.org markup, with over 450 billion Schema.org objects indexed.
The foundational RAG paper by Patrick Lewis et al. (Facebook AI Research, now Meta AI; NeurIPS 2020) introduced the framework that "combines the generative capabilities of LLMs with external knowledge retrieved from a separate database."
GraphRAG research has advanced significantly. The ACM Transactions survey (2025) by Boci Peng et al. explains:
"GraphRAG leverages structural information across entities to enable more precise and comprehensive retrieval, capturing relational knowledge that traditional RAG fails to represent." — DOI: 10.1145/3777378
Several influential papers demonstrate LLMs benefit from structured knowledge:
Recent research from December 2024 ("Thinking with Knowledge Graphs: Enhancing LLM Reasoning Through Structured Data", arXiv:2412.10654) confirms:
"By grounding the reasoning processes of LLMs with KGs, we can enhance the factual accuracy of the generated text and reduce hallucinations."
A benchmark study from Data.world on question answering over enterprise SQL databases (cited across industry sources) found:
"LLMs grounded in knowledge graphs achieve 300% higher accuracy compared to those relying solely on unstructured data."
"Structured data does come into play here. It's not that it's being trained on the structured data, but the structured data can be ingested during the RAG pipeline." — Mike King, SEO Week 2025
King also articulated the strategic shift: "SEO in the AI Mode world is no longer about chasing blue links. It's about building robust, retrievable, and reusable content artifacts that serve as input for machine synthesis."
"Schema markup allows for the entire web to be treated as a scattered database—with algorithms mining data from all over the web... to return the best answers to any query through the construction of relationship ontologies." — Bill Slawski, SMX Advanced
"Proper use of structured data can help with E-A-T for a number of reasons. For one, structured data helps establish and solidify the relationship between entities, particularly among the various places they are mentioned online." — Lily Ray, Search Engine Journal
"Use author, organization structured data for brand and entity salience that reinforces citation metadata." — Aleyda Solis, AI Search Content Optimization Checklist, June 2025
"Search engines leverage your Schema Markup and knowledge graph as data sources to train their machines and infer new knowledge. By developing your organization's knowledge graph, you can prime your organization's web data to be 'AI-ready'." — Schema App
Methodology: Three identical single-page sites with (1) well-implemented schema, (2) poorly implemented schema, (3) no schema—tested on keywords matched for difficulty (KD:3) and volume (60/month).
Results:
Source: https://searchengineland.com/schema-ai-overviews-structured-data-visibility-462353
Methodology: Added Wikidata references to article schema; ran paired t-tests and chi-square tests.
Results:
Source: https://www.smamarketing.net/blog/structured-data-ai-search-seo
Methodology: Tested identical content on structured vs. unstructured pages; asked ChatGPT same questions.
Result: ChatGPT responses using structured pages scored 30% higher for accuracy, completeness, and presentation quality.
Source: https://contentmarketinginstitute.com/seo-for-content/structured-data-ai-engines
| Metric | Result | Source |
|---|---|---|
| E-commerce traffic increase | 35% post-schema | Schema App |
| Rakuten recipe traffic | 2.7x increase | Google case study |
| Rich results CTR | 20-30% higher vs. standard | Industry studies |
| AI response visibility | 8% → 67% in 60 days | Radiant Elephant/HubSpot |
| AI Overview citations from beyond top-10 | 83.3% | BrightEdge, Sept 2025 |
Brave's independent index (30+ billion pages) includes "Schema Enriched Web Results"—structured data about webpages optimized for AI parsing. Their API provides "rich metadata ready for AI parsing" with a 94.1% F1-score on SimpleQA benchmark.
Offers Web Search API optimized for LLMs with "structured, context-rich data" and "citation-backed results." Claims 95%+ SimpleQA accuracy; DuckDuckGo uses their API for breaking news.
Google Assistant, Alexa, and Siri all rely heavily on structured data:
Glean, Microsoft 365 + Graphwise, and Altair RapidMiner all build knowledge graphs from structured data. Glean builds "unique knowledge graphs for each customer" using triplet structures (subject, predicate, object) to power generative AI responses.
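The triplet structure described above maps naturally onto schema.org JSON-LD. Each property becomes a (subject, predicate, object) triple, and nested objects become entities of their own. The Python sketch below flattens a JSON-LD object into triples. It then gathers the facts within a few hops of an entity as prompt context, a simplified version of the graph retrieval that GraphRAG-style systems perform. It makes three assumptions:

- There is no `@context` expansion; full processing is defined by the W3C JSON-LD specification.
- Nested objects without `@id` get blank-node labels such as `_:b0`.
- The function names are hypothetical.

This is not how Glean or any named product builds its graphs.

```python
import itertools
from collections import deque

Triple = tuple[str, str, str]

def json_ld_to_triples(node: dict, counter=None) -> tuple[str, list[Triple]]:
    """Flatten a compacted JSON-LD object into (subject, predicate, object) triples.
    Returns the node's identifier and its triples. Simplified: no @context expansion."""
    if counter is None:
        counter = itertools.count()
    subject = node.get("@id") or f"_:b{next(counter)}"  # blank node if no @id
    triples: list[Triple] = []
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue
        predicate = "rdf:type" if key == "@type" else key
        for v in value if isinstance(value, list) else [value]:
            if isinstance(v, dict):
                # Nested object: link to it, then emit its own triples.
                child_id, child_triples = json_ld_to_triples(v, counter)
                triples.append((subject, predicate, child_id))
                triples.extend(child_triples)
            else:
                triples.append((subject, predicate, str(v)))
    return subject, triples

def neighborhood(triples: list[Triple], seed: str, max_hops: int = 2) -> list[Triple]:
    """Breadth-first collection of triples within max_hops edges of the seed entity."""
    adjacency: dict[str, list[Triple]] = {}
    for t in triples:  # index by both endpoints so edges can be followed either way
        adjacency.setdefault(t[0], []).append(t)
        adjacency.setdefault(t[2], []).append(t)
    seen_entities, seen_triples = {seed}, set()
    collected: list[Triple] = []
    frontier = deque([(seed, 0)])
    while frontier:
        entity, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for t in adjacency.get(entity, []):
            if t not in seen_triples:
                seen_triples.add(t)
                collected.append(t)
            for neighbor in (t[0], t[2]):
                if neighbor not in seen_entities:
                    seen_entities.add(neighbor)
                    frontier.append((neighbor, depth + 1))
    return collected

def to_context(triples: list[Triple]) -> str:
    """Serialize retrieved facts as plain statements for an LLM prompt."""
    return "\n".join(f"{s} {p} {o}." for s, p, o in triples)
```

In a real graph, shared type nodes such as `Article` would link every article together. A practical traversal would therefore stop at `rdf:type` objects rather than expanding through them.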
The evidence points to a clear conclusion: structured data has evolved from an SEO enhancement to a foundational requirement for AI discoverability. Google maintains that no "special" schema is needed for AI features. Even so, its own representatives confirm that structured data helps its systems understand pages, and Microsoft has stated explicitly that schema markup helps its LLMs understand content.
The most compelling findings include:
For organizations seeking AI visibility, the strategic imperative is clear: implement comprehensive, server-side rendered JSON-LD schema markup—particularly Article, FAQPage, HowTo, Product, Organization, and Person types—while ensuring consistency between markup and visible content. The semantic layer powered by structured data has become the bridge between web content and AI understanding.