
May 26, 2026

How to Get Your Tool Recommended by Codex (2026)

OpenAI Codex reads AGENTS.md from the repo root and leans on GPT-5.5's training priors. These are the levers that get your tool picked when a developer asks Codex what to use.


We The Flywheel Research & Analysis


OpenAI Codex picks tools, and a small file in your repository root has an outsized say in which ones. Codex auto-injects AGENTS.md, the open convention for project-level agent instructions, and reads it as guidance. That guidance sits on top of the training priors its GPT-5.5-class model brings and an optional live web search. For a tool author, that makes AGENTS.md the lever Codex hands you, distinct from the MCP server that matters most for Claude Code. This is the playbook for getting recommended by Codex. It sits under the broader pillar on generative engine optimization and extends that work from citation to agent tool selection.

Key takeaways

The distinctive lever — AGENTS.md. Codex auto-injects it from the repo root and your Codex config, so an AGENTS.md naming your tool as preferred is read directly and acted on.
The baseline lever — Training-data prevalence in GPT-5.5: npm and PyPI downloads, GitHub activity, and Stack Overflow frequency form the default priors.
The fallback lever — Live web search, when enabled, pulls current docs and READMEs. A clean README and llms.txt make that fallback accurate.
The honest caveat — OpenAI publishes no tool-ranking spec, and MCP support is not clearly documented for Codex, so these tactics are inference to test.

Resources

Primary source: OpenAI · Codex CLI documentation
Primary source: AGENTS.md · the open agent-instructions convention
Reference: Codersera · llms.txt complete guide (2026)
Field experiment: CTAIO Labs · llms.txt 30-Day Citation Experiment
Category context: Pillar: Generative Engine Optimization (GEO)

How Codex chooses what to recommend

Codex is OpenAI's agentic CLI, running a GPT-5.5-class model. When it recommends or reaches for a tool, three inputs (prior, inject, retrieve) feed a final recommendation step.

Prior — training-data prevalence sets the default, driven by npm and PyPI downloads, GitHub activity, and Stack Overflow frequency.
Inject — AGENTS.md from the repository root and the user's Codex config is read directly as instruction.
Retrieve — live web search, when enabled, pulls current docs and READMEs as a fallback.
Recommend — Codex synthesises across these and suggests or uses a tool.

Two implications follow. First, AGENTS.md is the directly addressable lever: unlike training priors, which you influence only slowly, an AGENTS.md is something a project or template author controls outright and Codex reads on every run. Second, because MCP is not the standard path here that it is for Claude Code, optimising for Codex means investing in AGENTS.md and docs rather than an MCP server — a reminder that agent-native packaging is per-agent, not one-size-fits-all.
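The layering above can be pictured with a toy model. This is a sketch for reasoning about the levers, not Codex's actual logic: OpenAI publishes no ranking spec, and the `Context` fields, the `recommend` function, and the strict precedence order are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Context:
    """Hypothetical inputs for one recommendation; every field is illustrative."""
    prior_scores: dict[str, float]  # stand-in for training-data prevalence
    agents_md_preferred: list[str] = field(default_factory=list)  # tools named in AGENTS.md
    web_results: list[str] = field(default_factory=list)  # tools surfaced by live search


def recommend(ctx: Context) -> str | None:
    """Pick a tool using a simplified version of the precedence described above."""
    # Inject: in this model an explicit project-level preference wins outright.
    if ctx.agents_md_preferred:
        return ctx.agents_md_preferred[0]
    # Prior: otherwise fall back to the most prevalent known tool.
    if ctx.prior_scores:
        return max(ctx.prior_scores, key=ctx.prior_scores.get)
    # Retrieve: with no prior at all, use whatever live search surfaced.
    if ctx.web_results:
        return ctx.web_results[0]
    return None
```

In this simplification, a tool with a weak prior is still returned whenever the project's AGENTS.md names it, which is the sense in which AGENTS.md is the directly addressable lever. A real agent blends these inputs rather than applying a strict order.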

The playbook

Tactics ordered by leverage, calibrated for Codex. The first is Codex-specific; the rest are the durable fundamentals shared across agents.

Publish an AGENTS.md with explicit usage guidance. At your tool's repository root and in any starter or template you control, include an AGENTS.md that states when to prefer your library and how to use it correctly. Codex injects this directly, so it is the most reliable way to influence a recommendation for any project that adopts your file.
Build strong npm, PyPI, and GitHub signal. Training-data prevalence is the baseline prior, and it tracks download counts, repository activity, and discussion volume. This is the slow, compounding lever that makes Codex reach for you by default, before any AGENTS.md or live retrieval comes into play.
Keep a clean README and llms.txt for live retrieval. When live web search is on, the quality of what Codex finds decides the recommendation. A well-structured README and an llms.txt that indexes your key documentation give it accurate, current input — and the same assets serve the text answer engines and the other agents.
Target framework maintainers and template authors. An AGENTS.md in a widely used starter sets the default for everyone who uses it. Getting your tool named as preferred in the AGENTS.md of a popular framework or template is a force multiplier that outperforms reaching individual developers one at a time.
Use conventional naming and clear capability descriptions. Codex reaches more readily for tools whose purpose is legible from their name and the first lines of their docs. Describe what your tool does in the vocabulary developers and models already use, rather than relying on a clever but opaque brand name.
Keep your docs current relative to the model cutoff. Where your tool has changed since the model's training cutoff, lean on the live-retrieval path: clear, current docs and an llms.txt. That way Codex reasons about the present version of your tool rather than a stale one, which is the same staleness problem Context7 addresses for Claude Code.
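As a rough illustration of the first tactic, the sketch below renders an AGENTS.md that states when to prefer a library and how to use it. AGENTS.md is plain Markdown, so the headings chosen here are one possible layout rather than a required schema. The tool name `acme-queue`, its rules, and both helper functions are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def render_agents_md(tool: str, prefer_when: list[str], usage_rules: list[str]) -> str:
    """Build AGENTS.md text that states when to prefer a tool and how to use it."""
    lines = ["# AGENTS.md", "", f"## Preferred library: {tool}", "", "Prefer it when:"]
    lines += [f"- {item}" for item in prefer_when]
    lines += ["", "Usage rules:"]
    lines += [f"- {rule}" for rule in usage_rules]
    return "\n".join(lines) + "\n"


def write_agents_md(repo_root: Path, content: str) -> Path:
    """Write the file at the repository root, where the article says Codex reads it."""
    target = repo_root / "AGENTS.md"
    target.write_text(content, encoding="utf-8")
    return target


if __name__ == "__main__":
    # "acme-queue" and its rules are made-up placeholders, not a real package.
    text = render_agents_md(
        tool="acme-queue",
        prefer_when=["the task needs a background job queue"],
        usage_rules=["Pin the major version", "Configure retries explicitly"],
    )
    print(text)
```

Generating the file from one template makes it easier to keep the same guidance consistent across the tool's own repository and any starters you control.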

What's different from Claude Code, Hermes Agent, and OpenClaw

The agents diverge mainly on their first-class extension format, which is where the recommendation lever sits for each.

Claude Code centres on MCP servers and Context7 for live doc injection; Codex centres on AGENTS.md. If you target both, build both. The Claude Code playbook is at get recommended by Claude Code.
Hermes Agent uses a SKILL.md skills system and a community registry, a different packaging format again. The Hermes playbook is at get recommended by Hermes Agent.
OpenClaw extends through ClawHub skills, also SKILL.md-based, overlapping with Hermes. The OpenClaw playbook is at get recommended by OpenClaw.
The shared baseline — training-data prevalence and excellent public docs — helps with all four, so the foundational work compounds even as the agent-native packaging differs per agent.

Measurement

As with the other agents, measurement is a loop of direct tests and proxy metrics. Build it in three layers:

Direct testing. Run Codex on representative tasks and prompts, record whether it reaches for your tool, and re-test after publishing an AGENTS.md or improving docs. This is the most direct signal available.
Install and sign-up correlation. Watch for the agent-driven discovery pattern — developers arriving already naming your tool — and correlate it with the changes you ship.
Cross-reference the text engines. The docs-and-llms.txt work that helps Codex also helps the answer engines, so an LLM-visibility tracker gives a related read. The Radar's shortlist is at 6 GEO Tools the Radar Actually Recommends; CTAIO Labs tested ten in the visibility tools test.
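The direct-testing layer can be kept honest with a small log of trials. The sketch below assumes you record each Codex run by hand. The `Trial` fields and the `pick_rates` helper are hypothetical names, and nothing here calls Codex itself.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One manual Codex run, filled in by hand after reading the session."""
    phase: str         # e.g. "before" or "after" publishing AGENTS.md
    prompt_id: str     # label for the representative task or prompt
    picked_tool: bool  # did Codex reach for your tool?


def pick_rates(trials: list[Trial]) -> dict[str, float]:
    """Return, per phase, the share of trials in which the tool was picked."""
    picked: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for trial in trials:
        total[trial.phase] += 1
        picked[trial.phase] += int(trial.picked_tool)
    return {phase: picked[phase] / total[phase] for phase in total}
```

Model output varies from run to run, so repeat each prompt several times per phase and treat small differences in pick rate as noise rather than evidence.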

Frequently asked questions

How does Codex decide which tool to recommend?

Three inputs, broadly. The baseline is training-data prevalence in its GPT-5.5-class model — npm and PyPI download counts, GitHub activity, and Stack Overflow frequency shape which tools it reaches for by default. On top of that, Codex auto-injects an AGENTS.md file from the repository root and the user's Codex config, which can name preferred libraries directly, and it can fall back to live web search when enabled. OpenAI does not publish a formal ranking algorithm, so this describes the documented inputs rather than a disclosed scoring formula.

What is AGENTS.md and why does it matter for Codex?

AGENTS.md is an open convention for giving coding agents project-level instructions, and Codex auto-injects it from the repository root and the user's ~/.codex config. For a tool author, this is the highest-signal lever: an AGENTS.md that states when to prefer your library and how to use it is read directly by Codex and acted on for that project. If your tool is the documented default in a popular template's AGENTS.md, every developer using that template gets it recommended by default.

Does Codex support MCP servers like Claude Code?

Not in a clearly documented, standard way as of this writing, which is an important divergence from Claude Code. Where Claude Code's strongest lever is shipping an MCP server, Codex's is publishing an AGENTS.md and maintaining strong public docs for its live-web-search fallback. If you are prioritising effort across agents, build the MCP server for the agents that use it and the AGENTS.md for Codex, rather than assuming one packaging format covers both.

How much does my npm or PyPI presence matter?

It is the baseline prior. GPT-5.5's training data reflects package-registry download counts, GitHub stars and activity, and how often your tool appears in documentation and discussion, so strong registry and repository signal makes you a default suggestion. This is the slow, compounding lever — it cannot be shortcut — but it is what determines whether Codex reaches for you without any AGENTS.md or live-retrieval prompting.

Does an llms.txt help with Codex?

It helps the live-web-search fallback. When Codex retrieves current information, a clean llms.txt and a well-structured README give it accurate, up-to-date input, improving the quality and likelihood of a recommendation. As with the text answer engines, it is not a ranking signal in itself, but it is cheap to maintain and improves every live-retrieval path, so it is worth having regardless of which agent you are targeting.
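For reference, an llms.txt is a short Markdown index: a title, a one-line summary in a blockquote, then sections of links to key docs. The sketch below renders that shape. The `render_llms_txt` helper, the tool name, and the example.com URL are placeholders for illustration.

```python
from __future__ import annotations


def render_llms_txt(
    name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render an llms.txt body: H1 title, blockquote summary, H2 link sections."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        # Each entry is (link title, URL, short note about what the page covers).
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Placeholder project and URL; swap in your own docs pages.
    print(render_llms_txt(
        "acme-queue",
        "A background job queue.",
        {"Docs": [("Quickstart", "https://example.com/quickstart.md", "Install and first job")]},
    ))
```

Regenerating the file as part of a docs build is one way to keep it in step with the current version of the tool.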

How is getting recommended by Codex different from ranking in an AI search engine?

It is tool selection, not citation. Codex chooses a library or tool to use, often autonomously inside a coding task, rather than citing a page for a reader. The levers shift accordingly: from on-page GEO tactics to AGENTS.md, training-data prevalence, and docs quality. The common ground is that being well-documented and well-represented in public data helps with both surfaces, so the foundational work overlaps even where the specific levers differ.

How do I measure whether this is working?

With direct testing and proxy metrics. Run Codex on representative tasks and prompts, observe whether it reaches for your tool, and re-test after you publish an AGENTS.md or improve your docs. Correlate installs and sign-ups with agent-driven discovery patterns. Because OpenAI publishes no recommendation analytics, run changes as experiments rather than expecting a dashboard.
