Schema.org is a per-engine lever in 2026, not a universal one. Google's May 2026 AI-search guide says structured data is not required for AI Overviews. ChatGPT, Perplexity, and Claude with web search still parse raw JSON-LD during extraction and reward clean author and dateModified fields. Ship schema for the engines that consume it and for classic rich-result eligibility, and stop treating it as the lever that gets you into AI Overviews. This is the practical reference: the seven types that matter, the JSON-LD blocks you can paste, the per-engine breakdown, and the decision tree between Schema.org and llms.txt.
Key takeaways
- Seven types — Article, FAQPage, HowTo, Product, Organization, Person, Review. Ship these as JSON-LD; treat everything else as nice-to-have.
- Two underused fields — author with a profile URL and a current dateModified. The two cheapest interventions with measurable citation lift.
- JSON-LD only — JSON-LD is the format every major generative engine reads reliably. Microdata and RDFa work but are not worth the maintenance cost in 2026.
- Schema vs llms.txt — Schema is per-page metadata. llms.txt is site-level routing. They solve different problems and both should ship.
Google's May 2026 stance on schema for AI
On 15 May 2026, Google Search Central published its first official guide on optimising for generative AI features in Google Search. The guide is explicit on structured data: it is useful for classic rich results, it is not required for AI Overviews or AI Mode eligibility, and "overfocusing on structured data" is listed as one of the tactics teams should not lean on for AI visibility on Google surfaces.
Take the guide at face value. Schema.org no longer earns its keep on Google's AI surfaces specifically. Two reasons it still earns its keep elsewhere:
- Rich results on the classic SERP. Google still uses schema to render FAQ, HowTo, Product, and Review-style enhancements in the standard blue-link results. The dataset is broad and the lift on click-through is well measured.
- Other generative engines. ChatGPT with search, Perplexity, and Claude with web search continue to parse JSON-LD during extraction. CTAIO Labs' 12-variant schema citation test measures per-engine deltas across the four major LLMs over fourteen days.
Mental model: schema for AI is a per-engine lever, not a universal one. Ship it cleanly, don't over-engineer it for AI Overviews, and prioritise the universal moves (people-first content, sourced quotations, author authority) that work on every surface.
The seven types that cover most cases
Schema.org has hundreds of types. In 2026, seven of them produce the majority of measurable citation lift for AI-mediated search. Pages outside specialised domains almost never need more.
- Article (and its subtypes NewsArticle, BlogPosting, TechArticle). For any prose-led editorial page. Schema.org defines no strictly required fields, but the practical minimum for AI-mediated discovery and Google rich-result eligibility is: headline, description, image, datePublished, dateModified, author, and publisher.
- FAQPage. Wraps a set of Question + acceptedAnswer pairs. The single highest-extraction-rate type for ChatGPT and Perplexity; FAQs are pulled into responses verbatim more often than any other content shape.
- HowTo. Step-by-step instructional pages. Generative engines preferentially cite HowTo content when the user query is procedural. Ship the step array as ordered HowToStep items.
- Product. For commercial pages. Include name, brand, offers with price and currency, and aggregateRating if applicable. Generative engines use this to power comparison queries.
- Organization. Almost every page should have it. The entity definition that ties your brand mentions across the web back to the canonical source. Include name, url, logo, and sameAs with links to your Wikipedia entry, Wikidata, LinkedIn, and other authority profiles.
- Person. The author entity. Generative engines weight author authority heavily, especially ChatGPT. Include name, url (to an author profile page on your domain), and sameAs to LinkedIn, Twitter/X, ORCID, or Wikipedia where applicable.
- Review. For pages that include genuine third-party reviews. Generative engines surface review aggregates frequently in commercial queries. Misusing this type (self-review, fake aggregates) gets caught and downranked.
Beyond these seven, most Schema.org types are domain-specific: Recipe for cooking, Course for education, Event for tickets, JobPosting for hiring, SoftwareApplication for apps. Ship them when you operate in those domains. Otherwise, the seven above are where most of the return is.
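For teams that generate markup in a build step rather than by hand, the HowTo and Product shapes described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (how_to_jsonld, product_jsonld, to_script_tag) and the choice to format price as a two-decimal string are assumptions made for this example.

```python
import json


def how_to_jsonld(name, steps):
    """Build a HowTo object whose step array is an ordered list of HowToStep items.

    `steps` is a list of (title, text) tuples; position is 1-based.
    """
    return {
        "@context": "https://schema.org",
        "@type": "HowTo",
        "name": name,
        "step": [
            {"@type": "HowToStep", "position": i, "name": title, "text": text}
            for i, (title, text) in enumerate(steps, start=1)
        ],
    }


def product_jsonld(name, brand, price, currency, rating=None, review_count=None):
    """Build a Product object with offers; aggregateRating only when real data exists."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "brand": {"@type": "Brand", "name": brand},
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",  # assumption: two-decimal string
            "priceCurrency": currency,
        },
    }
    if rating is not None and review_count:
        data["aggregateRating"] = {
            "@type": "AggregateRating",
            "ratingValue": rating,
            "reviewCount": review_count,
        }
    return data


def to_script_tag(data):
    """Serialise to a JSON-LD script tag, escaping "</" so a string value
    cannot close the script element early."""
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


if __name__ == "__main__":
    demo = how_to_jsonld(
        "Add JSON-LD to a page",
        [("Write", "Write the block."), ("Validate", "Run a validator.")],
    )
    print(to_script_tag(demo))
```

Omitting aggregateRating when there is no rating data is deliberate: it keeps the builder from emitting the empty or invented aggregates that the Review entry above warns about.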
Copy-pasteable JSON-LD
The two highest-volume cases. Adapt the URLs, dates, and authors to match your page; everything else is structural.
Article with author + publisher
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How to Rank in ChatGPT: The Per-Engine GEO Playbook",
  "description": "ChatGPT's search mode picks citations differently from Google AI Overviews and Perplexity.",
  "image": "https://wetheflywheel.com/img/og/guide-rank-in-chatgpt.jpg",
  "datePublished": "2026-05-14",
  "dateModified": "2026-05-14",
  "author": {
    "@type": "Person",
    "name": "Thomas Prommer",
    "url": "https://prommer.net/en/about/thomas-prommer/"
  },
  "publisher": {
    "@type": "Organization",
    "name": "We The Flywheel",
    "url": "https://wetheflywheel.com/",
    "logo": {
      "@type": "ImageObject",
      "url": "https://wetheflywheel.com/img/logo.svg"
    }
  },
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://wetheflywheel.com/en/ai-search/how-to-rank-in-chatgpt/"
  }
}
</script>
FAQPage
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is schema for agentic search?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Schema.org markup that makes a page machine-readable for AI agents and generative engines, beyond what classic SEO requires."
      }
    },
    {
      "@type": "Question",
      "name": "Which Schema.org format should I use?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "JSON-LD. Microdata and RDFa work but generative engines parse JSON-LD most reliably."
      }
    }
  ]
}
</script>
Three notes. First, the @context should use https://schema.org; both http and https are technically valid per the specification, but the https form is the modern convention and a few validators warn on http. Second, the author.url should point to a real author page on your domain, not a third-party profile; third-party profiles go in sameAs. Third, FAQPage markup must reflect content actually visible on the page; hidden FAQs are a manual-action risk.
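The three notes lend themselves to an automated pre-publish check. The sketch below is one way to do it, under stated assumptions: lint_jsonld is a hypothetical helper, own_hosts is a set of hostnames you consider yours, and visible_text is the page's rendered text, which you extract however your stack allows. The visibility check is a plain substring match, so treat a warning as a prompt to look, not a verdict.

```python
from urllib.parse import urlparse


def lint_jsonld(data, own_hosts, visible_text):
    """Return a list of warnings for the three conventions described above.

    data         -- one parsed JSON-LD object (a dict)
    own_hosts    -- hostnames treated as first-party, e.g. {"example.com"}
    visible_text -- rendered page text used for the FAQ visibility check
    """
    warnings = []

    # Note 1: prefer the https form of the context URL.
    if data.get("@context") not in ("https://schema.org", "https://schema.org/"):
        warnings.append("@context should be https://schema.org")

    # Note 2: author.url belongs on a first-party host; other profiles go in sameAs.
    author = data.get("author")
    if isinstance(author, dict):
        host = urlparse(author.get("url", "")).hostname
        if host not in own_hosts:
            warnings.append("author.url should point to a first-party author page")

    # Note 3: every FAQ question must appear in the visible page text.
    if data.get("@type") == "FAQPage":
        for question in data.get("mainEntity", []):
            name = question.get("name", "")
            if name not in visible_text:
                warnings.append(f"FAQ question not visible on page: {name!r}")

    return warnings


if __name__ == "__main__":
    faq = {
        "@context": "http://schema.org",
        "@type": "FAQPage",
        "mainEntity": [{"@type": "Question", "name": "Hidden question?"}],
    }
    for warning in lint_jsonld(faq, {"example.com"}, "Only this text is visible."):
        print(warning)
```

Passing a set of hosts rather than a single domain covers sites where the author profile lives on a separate first-party domain from the article.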
Schema vs llms.txt: the decision tree
Schema.org and llms.txt solve different problems and both should ship. The simplest mental model:
- Schema.org describes this page to machines. Per-URL metadata, embedded in the HTML.
- llms.txt describes this site to machines. A single document at the root, indexing the high-value pages with one-line descriptions, grouped by topic.
A typical llms.txt looks like this:
# We The Flywheel
> Editorial site on AI search, agent orchestration, and the tooling that supports both.
## AI Search
- [Generative Engine Optimization (GEO)](https://wetheflywheel.com/en/ai-search/generative-engine-optimization/): The discipline of shaping what generative engines say about your topic. Pillar.
- [Agentic Search](https://wetheflywheel.com/en/ai-search/agentic-search/): Marie Haynes' term for the next shift in search.
- [How to Rank in ChatGPT](https://wetheflywheel.com/en/ai-search/how-to-rank-in-chatgpt/): Per-engine playbook for ChatGPT with search.
- [Schema for Agentic Search](https://wetheflywheel.com/en/ai-search/schema-for-agentic-search/): JSON-LD reference for AI-mediated discovery.
## Guides
- [Best Agent Orchestration Frameworks 2026](https://wetheflywheel.com/en/guides/best-agent-orchestration-frameworks-2026/): Ten frameworks scored.
- [Best LLM Visibility Tools 2026](https://wetheflywheel.com/en/guides/best-llm-visibility-tools-2026/): Ten citation trackers scored.
- [AI Training Data Providers 2026](https://wetheflywheel.com/en/guides/ai-training-data-providers-2026/): Vendor comparison.
## Radar
- [GEO Tools Radar](https://wetheflywheel.com/en/radar/geo-tools/): Scored shortlist.
The spec is intentionally minimal. Markdown headings define sections; list items are links with a colon-separated description. The major agents already parse this file at the start of a session for navigation, and there is consistent practitioner evidence (including CTAIO Labs' 30-day citation experiment) that publishing it increases citation rate within two weeks on at least two of the four major engines.
Ship both. If you have to pick one, ship Schema.org first; it is more granular and lifts more queries. llms.txt is a faster intervention to add but it sits one layer up.
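Because the llms.txt layout shown above is so regular, it is easy to generate from whatever page inventory you already keep. A minimal sketch, assuming a hypothetical render_llms_txt helper and a plain dict of sections; the blank lines between blocks are a readability choice made here, not something this article specifies.

```python
def render_llms_txt(title, summary, sections):
    """Render an llms.txt document.

    title    -- site name, emitted as the top-level heading
    summary  -- one-line description, emitted as a blockquote
    sections -- dict mapping a section heading to a list of
                (link_text, url, description) tuples
    """
    lines = [f"# {title}", "", f"> {summary}"]
    for heading, links in sections.items():
        lines.append("")
        lines.append(f"## {heading}")
        for link_text, url, description in links:
            lines.append(f"- [{link_text}]({url}): {description}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder site data for illustration only.
    print(
        render_llms_txt(
            "Example Site",
            "Editorial site used as a placeholder.",
            {
                "Guides": [
                    ("First Guide", "https://example.com/guides/first/", "Intro."),
                    ("Second Guide", "https://example.com/guides/second/", "Follow-up."),
                ]
            },
        )
    )
```

Generating the file from the same source that drives your sitemap keeps the two from drifting apart.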
How each engine handles schema
- ChatGPT with search. Parses Article, FAQPage, HowTo, Product, Organization, and Person reliably. Uses author and Organization signals during the rerank step.
- Perplexity. Same set, with extra weight on dateModified. Stale dateModified drops a page out of the citation pool faster than on any other engine.
- Google AI Overviews and AI Mode. Google's May 2026 guide states schema is not required for AI eligibility on its own AI surfaces; "overfocusing on structured data" is listed as something not to do for AI Overviews. Schema is still useful for rich results on the classic SERP and for general site quality, just don't treat it as the AI-Overviews lever.
- Gemini. Treats schema similarly to AI Overviews, since both ride the same retrieval and ranking layer. The May 2026 guidance covers Gemini app behaviour as well.
- Bing Copilot. Bing has its own Schema.org coverage that is broader than Google's; Copilot inherits it. Several types Google does not render (like Service details) are parsed by Bing and surface in Copilot answers.
- Claude with web search. Anthropic's behaviour here is the least documented publicly, but in practice ClaudeBot extracts JSON-LD blocks during its fetch and the model uses the fields in subsequent reasoning.
Validation and the agentic edge cases
Three tools cover most validation. Google's Rich Results Test validates the subset Google renders. Schema.org's official validator checks for type correctness against the broader specification. For LLM-specific parsing, the best check is empirical: deploy the schema, then prompt the four major engines for a specific structured data field (the published date, the author URL, a price). If they answer with the value within two weeks, the schema is being parsed.
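Before reaching for the hosted validators, a local smoke test can catch unparseable blocks and missing fields. The sketch below uses only the Python standard library; JsonLdExtractor, extract_jsonld, and missing_fields are names invented for this example, and ARTICLE_MINIMUM mirrors the practical minimum listed for Article earlier in this guide. It does not replace the Rich Results Test or the Schema.org validator.

```python
import json
from html.parser import HTMLParser

# The practical Article minimum named earlier in this guide.
ARTICLE_MINIMUM = (
    "headline", "description", "image",
    "datePublished", "dateModified", "author", "publisher",
)


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            # json.loads raises ValueError on malformed JSON, which is the
            # failure we want to see before shipping.
            self.blocks.append(json.loads("".join(self._buffer)))


def extract_jsonld(html):
    """Return a list of parsed JSON-LD objects found in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


def missing_fields(block, required=ARTICLE_MINIMUM):
    """List the required fields that are absent or empty, in declared order."""
    return [field for field in required if not block.get(field)]


if __name__ == "__main__":
    page = (
        '<html><head><script type="application/ld+json">'
        '{"@type": "Article", "headline": "Hi"}'
        "</script></head><body></body></html>"
    )
    for block in extract_jsonld(page):
        print(block.get("@type"), "missing:", missing_fields(block))
```

Run it against the rendered HTML rather than the template, since that is what a crawler fetches.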
The agentic edge cases that Schema.org does not currently cover well:
- Long-running agent state. No standard Schema.org type for "this agent is currently working on X." The emerging convention is to expose it via an API, referenced from the page rather than described in schema.
- Multi-step task descriptions. HowTo handles linear procedures but not conditional branching. For complex agent workflows, document the steps in prose; agents extract them reliably.
- Machine-actionable APIs. Generative engines and agents increasingly try to call APIs the page describes. OpenAPI specs linked from the page are the practical answer; agents.json is an early proposal that may or may not consolidate.
None of those three are stable enough to bet on as primary in 2026. Ship Schema.org, ship llms.txt, and revisit the agent-specific conventions annually.
Frequently asked questions
Did Google say schema doesn't matter for AI search?
Almost, but with two important carve-outs. On 15 May 2026, Google Search Central's first official AI-search guide said structured data is not required for AI Overviews or AI Mode eligibility and listed 'overfocusing on structured data' as something not to do for AI on Google surfaces. That is authoritative for Google's own AI features. It does not extend to two other reasons schema is still worth shipping. First, Google still uses schema for classic rich results on the standard SERP, which is a real CTR lever independent of AI. Second, the other major generative engines (ChatGPT with search, Perplexity, Claude with web search) continue to parse JSON-LD during their extraction step and reward clean author and dateModified fields. Read schema as a per-engine lever in 2026, not a universal one.
Which Schema.org types matter most for AI-mediated search?
Seven cover the vast majority of cases in 2026: Article (or its subtypes like NewsArticle, BlogPosting, TechArticle), FAQPage, HowTo, Product, Organization, Person, and Review. Most pages need two or three of those. A typical product page ships Product + Organization + Review. A typical editorial article ships Article + Organization + Person (for the author). An FAQ section adds FAQPage. Beyond those seven, most types are domain-specific (Recipe for cooking, Course for education, Event for tickets) and only worth shipping if you operate in that domain.
Should I use JSON-LD, Microdata, or RDFa?
JSON-LD. Every major generative engine parses it reliably and every modern site can ship it without touching the rendered HTML. Microdata embeds the markup inside the visible markup, which makes it heavier to maintain. RDFa is technically more expressive but rarely used in practice. Treat Microdata and RDFa as legacy formats; they still work, but JSON-LD is the format the ecosystem is investing in.
Do AI agents actually read raw JSON-LD?
Yes, and the assumption that they do not is the most common misconception. Many teams skip a Schema.org type because Google Search Console does not surface rich results for it. But generative engines and agentic systems parse the raw JSON-LD during their extraction step regardless of whether Google chose to render a rich snippet. CTAIO Labs is currently running a controlled A/B test of twelve schema variations across four LLMs at /en/labs/agentic-search/schema-citation-test/.
What is the difference between schema and llms.txt?
They solve complementary problems. Schema.org markup lives in the HTML of each page and describes that page's content to machines. llms.txt lives at the root of your domain (yourdomain.com/llms.txt) and provides an index of your most valuable pages with one-line descriptions, organised by topic. Schema is per-page metadata; llms.txt is site-level routing. The right answer is to ship both, not pick between them.
Which fields lift citation rate the most when added to existing schema?
Two are consistently underdeployed and consistently impactful: author (as a Person object with a profile URL, not just a name string) and dateModified (kept current via a build step, not manually). E-E-A-T-style signals (author credentials, publisher organisation, reviewedBy on factual pages) also lift citation rate measurably, especially on engines that weight authority highly like ChatGPT.
Does Schema.org cover the agentic-search use cases that classic SEO does not?
Mostly yes. The gaps are around long-running agent state, multi-step task descriptions, and machine-actionable APIs, none of which Schema.org currently addresses well. For those, the emerging conventions are llms.txt, OpenAPI specifications referenced from the page, and the early-stage agents.json proposal. None of those are stable enough to bet on alone in 2026; ship Schema.org, ship llms.txt, and revisit the more agent-specific specs annually.
How often should I update dateModified?
When the content actually changes. Both Perplexity and Gemini drop content with stale dateModified out of their citation pool faster than ChatGPT does. The wrong move is to bump the field on every deploy regardless of content changes; some engines penalise pages they suspect of artificially refreshing. Build a step into your CMS that updates dateModified only when the article body changes.
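One way to implement "only when the article body changes" is to store a hash of the body next to the date and compare on each build. A minimal sketch with hypothetical names (body_fingerprint, next_date_modified); collapsing whitespace before hashing is an assumption made here so that reformatting alone does not count as a content change.

```python
import hashlib
from datetime import date


def body_fingerprint(body):
    """Hash the article body with whitespace collapsed, so pure reflow
    or trailing-newline changes do not register as edits."""
    normalized = " ".join(body.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def next_date_modified(body, stored_hash, stored_date, today=None):
    """Return (hash, dateModified) to persist for this build.

    The date only moves when the body fingerprint differs from the stored
    one; an unchanged body keeps the previous dateModified.
    """
    new_hash = body_fingerprint(body)
    if new_hash == stored_hash:
        return stored_hash, stored_date
    return new_hash, (today or date.today()).isoformat()


if __name__ == "__main__":
    first_hash, first_date = next_date_modified("Original body.", None, None)
    # A redeploy with identical content keeps the date.
    print(next_date_modified("Original   body.\n", first_hash, first_date))
    # An actual edit moves it to today's date.
    print(next_date_modified("Revised body.", first_hash, first_date))
```

Persist the returned hash and date together, for example in front matter or a CMS field, and feed them back in on the next build.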
How do I validate my JSON-LD before shipping?
Three tools cover the cases. Google's Rich Results Test validates the subset Google supports for rich snippets. Schema.org's own validator (validator.schema.org) checks for type correctness more broadly. For LLM-specific behaviour, the best check is to actually query the engines: ask ChatGPT, Perplexity, and Gemini for a piece of structured data you just added (the published date, the author, a price) and see whether they cite it back. If they cite it within two weeks, the schema is being parsed.