Key Takeaways
- An AI gateway is a control point, not just a proxy. — It sits in front of multiple LLM providers behind one endpoint and adds token-aware routing, model failover, prompt caching, per-request observability, and spend governance — the operational layer raw provider APIs do not give you.
- Choose by the job, not the model catalog. — Catalog breadth is commoditized; the differentiator is what the gateway does around the call. Governance (Portkey), infra ownership (LiteLLM), observability (Helicone), platform-native simplicity (Cloudflare/Vercel), or an existing API estate (Kong).
- Hosted vs self-hosted is the first fork. — A hosted gateway (OpenRouter, Portkey) is fastest to adopt; a self-hosted router (LiteLLM, Helicone OSS) keeps prompts in your perimeter at the cost of operating it. Decide data residency before features.
- You can layer a gateway over an aggregator. — LiteLLM or Portkey can treat OpenRouter as one provider — keep the catalog breadth while adding governance or self-hosting, without an all-or-nothing migration.
What an AI gateway is
An AI gateway is one endpoint in front of many LLM providers that adds the operational layer raw APIs leave out: token-aware routing, model failover, prompt caching, per-request observability, and spend governance. Your app calls the gateway once, in an OpenAI-compatible shape; the gateway decides which model serves the request and records what it cost. It is a control point, not a passthrough. For the precise contrast with a classic API gateway, see AI gateway vs API gateway.
The six that matter, by job
Catalog breadth is commoditized — most gateways front well over a thousand models — so the choice is about the job each one wins, not the model list.
- Portkey — Managed control plane. Fronts 1,600+ models with guardrails, governance, prompt management, and budget controls.
- LiteLLM — Self-hosted router. Open-source proxy you run yourself; becomes the stable OpenAI-compatible contract for internal apps.
- OpenRouter — Hosted aggregator. Widest hosted-model catalog with the least setup; easiest on-ramp, hosted routing.
- Helicone — Observability-first gateway. OpenAI-compatible gateway built around per-request cost, latency, sessions, and caching.
- Cloudflare / Vercel AI Gateway — Platform-native. Routing, caching, and analytics at the edge of the platform you already deploy to — one fewer vendor.
- Kong AI Gateway — Enterprise API estate. Extends existing API management — the governance, auth, and rate-limiting your REST traffic already uses.
The first fork: hosted or self-hosted
Decide data residency before features. A hosted gateway (OpenRouter, Portkey’s managed tier) is the fastest to adopt and offloads operations. A self-hosted router (LiteLLM, Helicone’s open-source gateway) keeps prompts inside your perimeter and removes per-call margin, at the cost of running it. This fork usually decides more than the feature comparison does.
How to choose
Governance and spend control across teams: Portkey. Prompts that must not leave your network, or one router to standardize on: LiteLLM. Cost and latency visibility: Helicone. One fewer vendor on a platform you already run: Cloudflare or Vercel AI Gateway. Extending an enterprise API estate: Kong AI Gateway. Maximum model breadth with zero setup: OpenRouter. The full head-to-head sits in the best LLM gateway ranking.
You can layer over an aggregator
Leaving a hosted aggregator is rarely all-or-nothing. Because LiteLLM and Portkey can treat OpenRouter as one provider behind them, you can keep its catalog breadth while adding governance, caching, or self-hosting in front. That wrap-then-replace path is covered from the migration side in OpenRouter alternatives. Landscape last verified 2026-06-10; this category moves monthly.
What is an AI gateway?
An AI gateway is a single endpoint that sits in front of multiple LLM providers and adds the operational layer around model calls: token-aware routing to the cheapest or healthiest provider, automatic failover when one is down, prompt caching, per-request observability, and spend governance. Your application calls the gateway once, in an OpenAI-compatible shape, and the gateway handles which model actually serves the request. It is the difference between calling a provider and operating a fleet of them.
What is the difference between an AI gateway and an LLM router?
They overlap heavily and the terms are often used interchangeably. "LLM router" emphasizes the routing decision — picking which model or provider serves each request. "AI gateway" is the broader term for the whole control point, of which routing is one function alongside caching, observability, governance, and failover. In practice most routers have grown into gateways, and most gateways route, so the distinction is mostly emphasis rather than category.
Do I need an AI gateway, or can I call the provider directly?
For one model at low volume, calling the provider directly is fine. A gateway earns its place when you call several models, need resilience against any one provider failing, want to cap and attribute spend across teams, or need request-level observability you do not get from raw APIs. The trigger is operational complexity, not model count alone: the moment "which model served this and what did it cost" becomes a real question, a gateway pays for itself.
Which AI gateway is best?
There is no single best — it depends on the job. Portkey leads for a managed governance control plane, LiteLLM for a self-hosted router inside your own infrastructure, Helicone for observability-first teams, Cloudflare or Vercel AI Gateway when you want one fewer vendor on a platform you already use, and Kong when you are extending an existing enterprise API estate. Our head-to-head ranking breaks this down by use case.
Ready to Find the Right AI Tools?
Browse our data-driven rankings to find the best AI tools for your team.