Schema for Agentic Search: A Practical Reference (2026)

Which Schema.org @type to ship for which content, copy-pasteable JSON-LD blocks, and the llms.txt-vs-schema decision tree for AI-mediated search.


Schema.org markup is the most under-deployed lever in AI-mediated search. Most teams treat it as a Google rich-snippet investment, see lukewarm results in Search Console, and deprioritise it. That is a mistake: generative engines and AI agents parse the raw JSON-LD during their extraction step regardless of whether Google ever surfaces a rich result. This is the practical reference: the seven types that matter, the JSON-LD blocks you can paste, and the decision tree for choosing between Schema.org and llms.txt.

Key takeaways

  • Seven types — Article, FAQPage, HowTo, Product, Organization, Person, Review. Ship these as JSON-LD; treat everything else as nice-to-have.
  • Two underused fields — author with a profile URL and a current dateModified. The two cheapest interventions with measurable citation lift.
  • JSON-LD only — JSON-LD is the format every major generative engine reads reliably. Microdata and RDFa work but are not worth the maintenance cost in 2026.
  • Schema vs llms.txt — Schema is per-page metadata. llms.txt is site-level routing. They solve different problems and both should ship.

The seven types that cover most cases

Schema.org has hundreds of types. In 2026, seven of them produce the majority of measurable citation lift for AI-mediated search. Pages outside specialised domains almost never need more.

  1. Article (and its subtypes NewsArticle, BlogPosting, TechArticle). For any prose-led editorial page. Schema.org defines no strictly required fields, but the practical minimum for AI-mediated discovery and Google rich-result eligibility is: headline, description, image, datePublished, dateModified, author, and publisher.
  2. FAQPage. Wraps a set of Question + acceptedAnswer pairs. The single highest-extraction-rate type for ChatGPT and Perplexity; FAQs are pulled into responses verbatim more often than any other content shape.
  3. HowTo. Step-by-step instructional pages. Generative engines preferentially cite HowTo content when the user query is procedural. Ship the step array as ordered HowToStep items.
  4. Product. For commercial pages. Include name, brand, offers with price and currency, and aggregateRating if applicable. Generative engines use this to power comparison queries.
  5. Organization. Almost every page should have it. The entity definition that ties your brand mentions across the web back to the canonical source. Include name, url, logo, and sameAs with links to your Wikipedia entry, Wikidata, LinkedIn, and other authority profiles.
  6. Person. The author entity. Generative engines weight author authority heavily, especially ChatGPT. Include name, url (to an author profile page on your domain), and sameAs to LinkedIn, Twitter/X, ORCID, or Wikipedia where applicable.
  7. Review. For pages that include genuine third-party reviews. Generative engines surface review aggregates frequently in commercial queries. Misusing this type (self-review, fake aggregates) gets caught and downranked.

Beyond these seven, most Schema.org types are domain-specific: Recipe for cooking, Course for education, Event for tickets, JobPosting for hiring, SoftwareApplication for apps. Ship them when you operate in those domains. Otherwise the seven above are the ROI bracket.

Copy-pasteable JSON-LD

The two highest-volume cases. Adapt the URLs, dates, and authors to match your page; everything else is structural.

Article with author + publisher

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How to Rank in ChatGPT: The Per-Engine GEO Playbook",
  "description": "ChatGPT's search mode picks citations differently from Google AI Overviews and Perplexity.",
  "image": "https://wetheflywheel.com/img/og/guide-rank-in-chatgpt.jpg",
  "datePublished": "2026-05-14",
  "dateModified": "2026-05-14",
  "author": {
    "@type": "Person",
    "name": "Thomas Prommer",
    "url": "https://prommer.net/en/about/thomas-prommer/"
  },
  "publisher": {
    "@type": "Organization",
    "name": "We The Flywheel",
    "url": "https://wetheflywheel.com/",
    "logo": {
      "@type": "ImageObject",
      "url": "https://wetheflywheel.com/img/logo.svg"
    }
  },
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://wetheflywheel.com/en/ai-search/how-to-rank-in-chatgpt/"
  }
}
</script>

FAQPage

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is schema for agentic search?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Schema.org markup that makes a page machine-readable for AI agents and generative engines, beyond what classic SEO requires."
      }
    },
    {
      "@type": "Question",
      "name": "Which Schema.org format should I use?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "JSON-LD. Microdata and RDFa work but generative engines parse JSON-LD most reliably."
      }
    }
  ]
}
</script>
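
HowTo follows the same pattern and is worth pasting alongside the two above when the page is procedural. This is a minimal sketch; the name, step titles, and step text are placeholders, not a real page:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to Add JSON-LD to a Page",
  "step": [
    {
      "@type": "HowToStep",
      "position": 1,
      "name": "Draft the block",
      "text": "Write the JSON-LD for the page's primary type."
    },
    {
      "@type": "HowToStep",
      "position": 2,
      "name": "Embed it",
      "text": "Paste the script tag into the page template."
    },
    {
      "@type": "HowToStep",
      "position": 3,
      "name": "Validate",
      "text": "Run the page through validator.schema.org before shipping."
    }
  ]
}
</script>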

Three notes. First, the @context should use https://schema.org; both http and https are technically valid per the specification, but the https form is the modern convention and a few validators warn on http. Second, the author.url should point to a real author page on your domain, not a third-party profile; third-party profiles go in sameAs. Third, FAQPage markup must reflect content actually visible on the page; hidden FAQs are a manual-action risk.
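
A fourth, optional note: the practical-minimum field list for Article is easy to check in a build step. A minimal sketch using only the standard library; the field set mirrors the practical minimum listed above, not a formal Schema.org requirement, and it assumes top-level JSON-LD objects rather than arrays:

```python
import json
import re

# Fields treated here as the practical minimum for Article (per the list
# above, not an official Schema.org requirement).
ARTICLE_MINIMUM = {
    "headline", "description", "image",
    "datePublished", "dateModified", "author", "publisher",
}

def extract_jsonld(html: str) -> list[dict]:
    """Pull every JSON-LD block out of a page's HTML."""
    pattern = re.compile(
        r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
        re.DOTALL | re.IGNORECASE,
    )
    return [json.loads(m.group(1)) for m in pattern.finditer(html)]

def missing_article_fields(html: str) -> set[str]:
    """Report practical-minimum fields absent from the page's Article block."""
    for block in extract_jsonld(html):
        if block.get("@type") in ("Article", "NewsArticle", "BlogPosting", "TechArticle"):
            return ARTICLE_MINIMUM - block.keys()
    return ARTICLE_MINIMUM  # no Article block found at all

html = '<script type="application/ld+json">{"@type": "Article", "headline": "x", "author": {"@type": "Person"}}</script>'
print(sorted(missing_article_fields(html)))
# → ['dateModified', 'datePublished', 'description', 'image', 'publisher']
```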

Schema vs llms.txt: the decision tree

Schema.org and llms.txt solve different problems and both should ship. The simplest mental model:

  • Schema.org describes this page to machines. Per-URL metadata, embedded in the HTML.
  • llms.txt describes this site to machines. A single document at the root, indexing the high-value pages with one-line descriptions, grouped by topic.

A typical llms.txt looks like this:

# We The Flywheel

> Editorial site on AI search, agent orchestration, and the tooling that supports both.

## AI Search

- [Generative Engine Optimization (GEO)](https://wetheflywheel.com/en/ai-search/generative-engine-optimization/): The discipline of shaping what generative engines say about your topic. Pillar.
- [Agentic Search](https://wetheflywheel.com/en/ai-search/agentic-search/): Marie Haynes' term for the next shift in search.
- [How to Rank in ChatGPT](https://wetheflywheel.com/en/ai-search/how-to-rank-in-chatgpt/): Per-engine playbook for ChatGPT with search.
- [Schema for Agentic Search](https://wetheflywheel.com/en/ai-search/schema-for-agentic-search/): JSON-LD reference for AI-mediated discovery.

## Guides

- [Best Agent Orchestration Frameworks 2026](https://wetheflywheel.com/en/guides/best-agent-orchestration-frameworks-2026/): Ten frameworks scored.
- [Best LLM Visibility Tools 2026](https://wetheflywheel.com/en/guides/best-llm-visibility-tools-2026/): Ten citation trackers scored.
- [AI Training Data Providers 2026](https://wetheflywheel.com/en/guides/ai-training-data-providers-2026/): Vendor comparison.

## Radar

- [GEO Tools Radar](https://wetheflywheel.com/en/radar/geo-tools/): Scored shortlist.

The spec is intentionally minimal. Markdown headings define sections; list items are links with a colon-separated description. The major agents already parse this file at the start of a session for navigation, and there is consistent practitioner evidence (including CTAIO Labs' 30-day citation experiment) that publishing it increases citation rate within two weeks on at least two of the four major engines.

Ship both. If you have to pick one, ship Schema.org first; it is more granular and lifts more queries. llms.txt is a faster intervention to add but it sits one layer up.

How each engine handles schema

  • ChatGPT with search. Parses Article, FAQPage, HowTo, Product, Organization, and Person reliably. Uses author and Organization signals during the rerank step.
  • Perplexity. Same set, with extra weight on dateModified. Stale dateModified drops a page out of the citation pool faster than on any other engine.
  • Google AI Overviews. Parses everything Google parses for classic Search. Malformed schema is silently ignored rather than penalised, but pages with clean, complete schema show up in AI Overviews disproportionately often. Validate aggressively.
  • Gemini. Treats schema similarly to AI Overviews, since both ride the same retrieval and ranking layer.
  • Bing Copilot. Bing has its own Schema.org coverage that is broader than Google's; Copilot inherits it. Several types Google does not render (like Service details) are parsed by Bing and surface in Copilot answers.
  • Claude with web search. Anthropic's behaviour here is the least documented publicly, but in practice ClaudeBot extracts JSON-LD blocks during its fetch and the model uses the fields in subsequent reasoning.

Validation and the agentic edge cases

Three tools cover most validation. Google's Rich Results Test validates the subset Google renders. Schema.org's official validator checks for type correctness against the broader specification. For LLM-specific parsing, the best check is empirical: deploy the schema, then prompt the four major engines for a specific structured data field (the published date, the author URL, a price). If they answer with the value within two weeks, the schema is being parsed.

The agentic edge cases that Schema.org does not currently cover well:

  • Long-running agent state. No standard Schema.org type for "this agent is currently working on X." The emerging convention is to expose it via an API, referenced from the page rather than described in schema.
  • Multi-step task descriptions. HowTo handles linear procedures but not conditional branching. For complex agent workflows, document the steps in prose; agents extract them reliably.
  • Machine-actionable APIs. Generative engines and agents increasingly try to call APIs the page describes. OpenAPI specs linked from the page are the practical answer; agents.json is an early proposal that may or may not consolidate.

None of those three are stable enough to bet on as primary in 2026. Ship Schema.org, ship llms.txt, and revisit the agent-specific conventions annually.
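
For the API case, the least speculative way to reference an OpenAPI spec from a page today is a standard link element using the service-desc relation (a registered IANA link relation for pointing at a service description). Whether any given agent actually follows it is an assumption, not documented behaviour, and the URL here is a placeholder:

<link rel="service-desc" type="application/json" href="https://example.com/openapi.json">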


Frequently asked questions

Which Schema.org types matter most for AI-mediated search?

Seven cover the vast majority of cases in 2026: Article (or its subtypes like NewsArticle, BlogPosting, TechArticle), FAQPage, HowTo, Product, Organization, Person, and Review. Most pages need two or three of those. A typical product page ships Product + Organization + Review. A typical editorial article ships Article + Organization + Person (for the author). An FAQ section adds FAQPage. Beyond those seven, most types are domain-specific (Recipe for cooking, Course for education, Event for tickets) and only worth shipping if you operate in that domain.

Should I use JSON-LD, Microdata, or RDFa?

JSON-LD. Every major generative engine parses it reliably and every modern site can ship it without touching the rendered HTML. Microdata embeds the markup inside the visible markup, which makes it heavier to maintain. RDFa is technically more expressive but rarely used in practice. Treat Microdata and RDFa as legacy formats; they still work, but JSON-LD is the format the ecosystem is investing in.

Do AI agents actually read raw JSON-LD?

Yes, and this is the most common misconception. Many teams skip Schema.org because Google's Search Console does not surface rich results for it. But generative engines and agentic systems parse the raw JSON-LD during their extraction step regardless of whether Google chose to render a rich snippet. CTAIO Labs is currently running a controlled A/B test of twelve schema variations across four LLMs at /en/labs/agentic-search/schema-citation-test/.

What is the difference between schema and llms.txt?

They solve complementary problems. Schema.org markup lives in the HTML of each page and describes that page's content to machines. llms.txt lives at the root of your domain (yourdomain.com/llms.txt) and provides an index of your most valuable pages with one-line descriptions, organised by topic. Schema is per-page metadata; llms.txt is site-level routing. The right answer is to ship both, not pick between them.

Which fields lift citation rate the most when added to existing schema?

Two are consistently underdeployed and consistently impactful: author (as a Person object with a profile URL, not just a name string) and dateModified (kept current via a build step, not manually). E-E-A-T-style signals (author credentials, publisher organisation, reviewedBy on factual pages) also lift citation rate measurably, especially on engines that weight authority highly like ChatGPT.

Does Schema.org cover the agentic-search use cases that classic SEO does not?

Mostly yes. The gaps are around long-running agent state, multi-step task descriptions, and machine-actionable APIs, none of which Schema.org currently addresses well. For those, the emerging conventions are llms.txt, OpenAPI specifications referenced from the page, and the early-stage agents.json proposal. None of those are stable enough to bet on alone in 2026; ship Schema.org, ship llms.txt, and revisit the more agent-specific specs annually.

How often should I update dateModified?

When the content actually changes. Both Perplexity and Gemini drop content with stale dateModified out of their citation pool faster than ChatGPT does. The wrong move is to bump the field on every deploy regardless of content changes; some engines penalise pages they suspect of artificially refreshing. Build a step into your CMS that updates dateModified only when the article body changes.
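
That build step can be as small as a content-hash comparison. A sketch; the `meta` dict is an illustrative stand-in for a CMS's page metadata, not a real CMS API:

```python
import hashlib
from datetime import date

def refresh_date_modified(meta: dict, body: str) -> dict:
    """Bump dateModified only when the article body actually changed.

    `meta` is assumed to hold the page's JSON-LD fields plus a stored
    body hash; the shape is hypothetical.
    """
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    if meta.get("bodyHash") != digest:
        meta["bodyHash"] = digest
        meta["dateModified"] = date.today().isoformat()
    return meta

meta = {"dateModified": "2026-05-14", "bodyHash": None}
meta = refresh_date_modified(meta, "Updated article body.")
# First run stores the hash and bumps the date; a rerun with the
# same body leaves dateModified untouched.
```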

How do I validate my JSON-LD before shipping?

Three tools cover the cases. Google's Rich Results Test validates the subset Google supports for rich snippets. Schema.org's own validator (validator.schema.org) checks for type correctness more broadly. For LLM-specific behaviour, the best check is to actually query the engines: ask ChatGPT, Perplexity, and Gemini for a piece of structured data you just added (the published date, the author, a price) and see whether they cite it back. If they cite it within two weeks, the schema is being parsed.
