# Cloudflare AI Gateway
Cloudflare AI Gateway is a caching, rate limiting, and observability layer for AI traffic that runs on Cloudflare's global edge network. It routes to OpenAI, Anthropic, Groq, Workers AI, Azure OpenAI, and many other providers through a single unified endpoint, and adds built-in analytics, request logging, and rate controls.
## Use this page when
- You need the exact command, config, API, or integration details for Cloudflare AI Gateway.
- You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
- If you want a guided rollout instead of a reference page, use the linked workflow pages in Next steps.
Keeptrusts adds policy enforcement and compliance governance on top of Cloudflare AI Gateway. By placing Keeptrusts in front of the Cloudflare gateway, you get Keeptrusts's prompt-injection detection, PII redaction, content safety filters, and audit logging applied before requests reach Cloudflare — giving you a two-layer observability and governance stack.
Keeptrusts performs gateway-specific URL derivation: given your `cloudflare_account_id`, `cloudflare_gateway_id`, and the gateway provider sub-path (e.g. `openai`, `workers-ai`, `anthropic`), Keeptrusts derives the correct `https://gateway.ai.cloudflare.com/v1/{account}/{gateway}/{provider}` URL automatically.
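The derivation described above can be sketched in a few lines. This is an illustrative sketch, not Keeptrusts's actual internals, and the function name `derive_gateway_url` is hypothetical:

```python
def derive_gateway_url(account_id: str, gateway_id: str, provider: str) -> str:
    """Build the Cloudflare AI Gateway base URL from its three components."""
    return f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider}"

# e.g. routing OpenAI traffic through a gateway named "prod-llm":
url = derive_gateway_url("0123abcd", "prod-llm", "openai")
# → https://gateway.ai.cloudflare.com/v1/0123abcd/prod-llm/openai
```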
## Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
## Prerequisites
- Cloudflare account with AI Gateway enabled — access via the Cloudflare dashboard.
- Cloudflare API token with AI Gateway permissions, and your Account ID and Gateway ID from the Cloudflare dashboard.
- Keeptrusts CLI — install `kt` (see the quickstart guide).
- Export your credentials:
```bash
export CLOUDFLARE_ACCOUNT_ID="your-cloudflare-account-id"
export CLOUDFLARE_GATEWAY_ID="your-gateway-id"
export CLOUDFLARE_API_TOKEN="your-cloudflare-api-token"
```
When `cloudflare_account_id_env` and `cloudflare_gateway_id_env` are set, Keeptrusts derives the full gateway URL automatically. You do not need to set `base_url` manually unless you want to override it.
## Configuration
A complete `policy-config.yaml` that routes traffic through Cloudflare AI Gateway (OpenAI backend) with prompt-injection, PII, and safety policies:
```yaml
pack:
  name: cloudflare-via-gateway
  version: 1.0.0
  enabled: true
policies:
  chain:
    - prompt-injection
    - pii-detector
    - safety-filter
    - audit-logger
  policy:
    prompt-injection:
      threshold: 0.8
      action: block
    pii-detector:
      action: redact
    safety-filter:
      mode: strict
      action: block
    audit-logger:
      retention_days: 365
providers:
  strategy: single
  targets:
    - id: cf-gateway-openai
      provider: "cloudflare-gateway:openai:gpt-4o"
      secret_key_ref:
        env: CLOUDFLARE_API_TOKEN
```
Start the gateway:

```bash
kt gateway run \
  --listen 0.0.0.0:8080 \
  --policy-config policy-config.yaml
```
## Provider Shorthand Syntax
The `provider` field encodes both the gateway provider sub-path and the model:

```
cloudflare-gateway:<gateway-provider>:<model>
```

Examples:

```yaml
# OpenAI via Cloudflare AI Gateway
provider: "cloudflare-gateway:openai:gpt-4o"

# Anthropic via Cloudflare AI Gateway
provider: "cloudflare-gateway:anthropic:claude-3-5-sonnet-20241022"

# Cloudflare Workers AI
provider: "cloudflare-gateway:workers-ai:@cf/meta/llama-3.3-70b-instruct-fp8-fast"

# Groq via Cloudflare AI Gateway
provider: "cloudflare-gateway:groq:llama-3.3-70b-versatile"
```
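Splitting the shorthand into its parts is straightforward. This parser is an illustrative sketch (the function name is hypothetical); `maxsplit=2` keeps the model identifier intact even for the long Workers AI forms:

```python
def parse_provider_shorthand(value: str) -> tuple[str, str]:
    """Split 'cloudflare-gateway:<gateway-provider>:<model>' into its two parts."""
    prefix, gateway_provider, model = value.split(":", 2)
    if prefix != "cloudflare-gateway":
        raise ValueError(f"not a cloudflare-gateway provider string: {value!r}")
    return gateway_provider, model

# Workers AI model IDs keep their full "@cf/..." form:
print(parse_provider_shorthand(
    "cloudflare-gateway:workers-ai:@cf/meta/llama-3.3-70b-instruct-fp8-fast"
))
# → ('workers-ai', '@cf/meta/llama-3.3-70b-instruct-fp8-fast')
```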
## Provider Fields
All fields available on a `providers.targets[]` entry for Cloudflare AI Gateway:

| Field | Type | Default | Description |
|---|---|---|---|
| `id` | string | required | Unique identifier for this target. Used in logs, the console dashboard, and routing decisions. |
| `provider` | string | required | Provider ID in the form `"cloudflare-gateway:<gateway-provider>:<model>"`. |
| `cloudflare_account_id` | string | none | Cloudflare account ID (literal value). Use `cloudflare_account_id_env` to reference an env var instead. Alias: `accountId`. |
| `cloudflare_account_id_env` | string | none | Environment variable holding the Cloudflare account ID. Alias: `accountIdEnvar`. |
| `cloudflare_gateway_id` | string | none | Cloudflare gateway ID (literal value). Use `cloudflare_gateway_id_env` to reference an env var instead. Alias: `gatewayId`. |
| `cloudflare_gateway_id_env` | string | none | Environment variable holding the Cloudflare gateway ID. Alias: `gatewayIdEnvar`. |
| `secret_key_ref` | object | none | Object reference to the environment variable holding the Cloudflare API token. Auth is optional for public or upstream-authenticated gateways. |
| `base_url` | string | auto-derived | Explicit base URL override. When set, takes precedence over the derived URL. Format: `https://gateway.ai.cloudflare.com/v1/{account}/{gateway}/{provider}`. |
| `format` | string | `"openai"` | Wire format. Cloudflare AI Gateway uses an OpenAI-compatible format for most backends. |
| `timeout_seconds` | integer | 60 | Maximum wall-clock time for non-streaming requests before the gateway returns a timeout error. |
| `stream_timeout_seconds` | integer | inherits `timeout_seconds` | Maximum wall-clock time for streaming requests. |
| `description` | string | none | Human-readable label shown in the console dashboard and health-check output. |
| `weight` | float | 1.0 | Routing weight used by the `weighted_round_robin` strategy. |
| `health_probe` | object | none | Active health probe configuration. Sub-fields: `enabled` (bool), `interval_seconds` (int), `timeout_seconds` (int). |
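To illustrate how `weight` biases routing, here is a simplified weighted pick over targets. Keeptrusts's `weighted_round_robin` is presumably a deterministic rotation; this random-choice sketch only demonstrates how relative weights shape the traffic split:

```python
import random

def pick_target(targets: list[dict]) -> dict:
    """Choose a target with probability proportional to its weight (default 1.0)."""
    weights = [t.get("weight", 1.0) for t in targets]
    return random.choices(targets, weights=weights, k=1)[0]

targets = [
    {"id": "cf-gateway-openai", "weight": 3.0},  # ~75% of traffic
    {"id": "openai-direct", "weight": 1.0},      # ~25% of traffic
]
counts = {t["id"]: 0 for t in targets}
for _ in range(10_000):
    counts[pick_target(targets)["id"]] += 1
```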
Azure OpenAI via Cloudflare additionally supports:

| Field | Type | Description |
|---|---|---|
| `resource_name` / `resourceName` | string | Azure OpenAI resource name for Cloudflare Azure gateway path derivation. |
| `deployment_name` / `deploymentName` | string | Azure OpenAI deployment name. |
## Supported Models
Cloudflare AI Gateway supports models from multiple providers. The model identifier you use must match what the underlying backend expects:

| Gateway Provider | Example Models |
|---|---|
| `openai` | `gpt-4o`, `gpt-4o-mini`, `o1-preview` |
| `anthropic` | `claude-3-5-sonnet-20241022`, `claude-3-haiku-20240307` |
| `groq` | `llama-3.3-70b-versatile`, `mixtral-8x7b-32768` |
| `mistral` | `mistral-large-latest`, `mistral-small-latest` |
| `workers-ai` | `@cf/meta/llama-3.3-70b-instruct-fp8-fast`, `@cf/mistral/mistral-7b-instruct-v0.1`, `@cf/google/gemma-7b-it` |
| `perplexity-ai` | `llama-3.1-sonar-large-128k-online` |
| `cohere` | `command-r-plus` |
| `google-ai-studio` | `gemini-2.0-flash`, `gemini-1.5-pro` |
See the Cloudflare AI Gateway documentation for the full list of supported providers and model identifiers.
## Client Examples
Once the gateway is running, point your client SDK at `http://localhost:8080` instead of the Cloudflare gateway URL. The standard OpenAI SDK works directly for all backends that use the OpenAI wire format.
Python:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="unused",  # auth is handled by Keeptrusts via CLOUDFLARE_API_TOKEN
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain how Cloudflare AI Gateway caching works."},
    ],
    temperature=0.7,
    max_tokens=512,
)
print(response.choices[0].message.content)
```

Node.js:

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "unused", // auth handled by Keeptrusts via CLOUDFLARE_API_TOKEN
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain how Cloudflare AI Gateway caching works." },
  ],
  temperature: 0.7,
  max_tokens: 512,
});
console.log(response.choices[0].message.content);
```

cURL:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain how Cloudflare AI Gateway caching works."}
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'
```
## Streaming
Keeptrusts fully supports streaming for all Cloudflare AI Gateway backends. Set `stream: true` in your request — the gateway applies policies to each chunk in real time.
```yaml
pack:
  name: cloudflare-gateway-providers-3
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: cf-gateway-streaming
      provider: "cloudflare-gateway:openai:gpt-4o"
      secret_key_ref:
        env: CLOUDFLARE_API_TOKEN
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
Python:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize Cloudflare's approach to AI safety."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

Node.js:

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "unused",
});

const stream = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Summarize Cloudflare's approach to AI safety." }],
  stream: true,
});
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
```

cURL:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize Cloudflare'\''s approach to AI safety."}],
    "stream": true
  }'
```
## Advanced Configuration
### Workers AI
Route to Cloudflare Workers AI models hosted at Cloudflare's edge. For `workers-ai`, Keeptrusts derives the URL as `.../workers-ai/<model>` and passes the full `@cf/...` model identifier:
```yaml
pack:
  name: cloudflare-gateway-providers-4
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: cf-workers-ai-llama
      provider: "cloudflare-gateway:workers-ai:@cf/meta/llama-3.3-70b-instruct-fp8-fast"
      secret_key_ref:
        env: CLOUDFLARE_API_TOKEN
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
### Azure OpenAI via Cloudflare
For Azure OpenAI backends, set `resource_name` and `deployment_name` so Keeptrusts can derive the Azure-specific Cloudflare gateway path:
```yaml
pack:
  name: cloudflare-gateway-providers-5
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: cf-gateway-azure
      provider: "cloudflare-gateway:azure-openai:gpt-4o"
      resource_name: your-azure-resource-name
      deployment_name: your-deployment-name
      secret_key_ref:
        env: CLOUDFLARE_API_TOKEN
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
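Assuming the Azure path follows the standard `{account}/{gateway}/{provider}` pattern extended with resource and deployment segments (verify the exact shape against Cloudflare's Azure OpenAI gateway docs), the derivation can be sketched as follows; the function name is hypothetical:

```python
def derive_azure_gateway_url(account_id: str, gateway_id: str,
                             resource_name: str, deployment_name: str) -> str:
    """Sketch of the Azure-specific Cloudflare gateway path derivation."""
    return ("https://gateway.ai.cloudflare.com/v1/"
            f"{account_id}/{gateway_id}/azure-openai/{resource_name}/{deployment_name}")
```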
### Multi-Backend Fallback
Fall back from Cloudflare Gateway to a direct provider if the gateway is unavailable:
```yaml
pack:
  name: cloudflare-gateway-providers-6
  version: 1.0.0
  enabled: true
providers:
  strategy: fallback
  targets:
    - id: cf-gateway-primary
      provider: "cloudflare-gateway:openai:gpt-4o"
      secret_key_ref:
        env: CLOUDFLARE_API_TOKEN
    - id: openai-direct-fallback
      provider: "openai:chat:gpt-4o"
      secret_key_ref:
        env: OPENAI_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
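Conceptually, fallback routing tries targets in declaration order and returns the first success. A minimal sketch, where `send` stands in for the actual request dispatch that Keeptrusts handles internally:

```python
def call_with_fallback(targets, send):
    """Try each target in order; raise the last error if all fail."""
    last_err = None
    for target in targets:
        try:
            return send(target)
        except ConnectionError as err:  # a real gateway matches specific failure modes
            last_err = err
    raise last_err

# Usage sketch: the primary gateway is down, so the direct target answers.
def send(target):
    if target == "cf-gateway-primary":
        raise ConnectionError("gateway unreachable")
    return f"response via {target}"

print(call_with_fallback(["cf-gateway-primary", "openai-direct-fallback"], send))
# → response via openai-direct-fallback
```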
## Best Practices
- Use env vars for account and gateway IDs — set `cloudflare_account_id_env` and `cloudflare_gateway_id_env` rather than hardcoding values in `policy-config.yaml`. This keeps account identifiers out of source control.
- Let Keeptrusts derive the URL — omit `base_url` and let the gateway construct the correct gateway URL from the account and gateway IDs. Only set `base_url` explicitly if you need to override the standard Cloudflare gateway hostname.
- Layer Keeptrusts policies with Cloudflare analytics — Cloudflare AI Gateway provides request logging and analytics at the edge; Keeptrusts adds policy enforcement and compliance audit trails at the application layer. Using both gives you defense in depth.
- Configure Workers AI model IDs precisely — Workers AI model identifiers use the `@cf/` prefix format (e.g. `@cf/meta/llama-3.3-70b-instruct-fp8-fast`). The exact identifier must match what Cloudflare's Workers AI API expects.
- Set `stream_timeout_seconds` for large models — Workers AI models and large upstream models accessed via the Cloudflare gateway can have higher first-token latencies at peak load. Set `stream_timeout_seconds` to at least `180`.
- Test backend availability independently — Cloudflare AI Gateway's health is independent of the underlying model providers it routes to. Use `health_probe` and a `fallback` strategy to handle upstream model unavailability gracefully.
## For AI systems
- Canonical terms: Keeptrusts gateway, Cloudflare AI Gateway, Workers AI, edge inference, provider target, `policy-config.yaml`.
- Config field names: `provider`, `model`, `base_url`, `secret_key_ref.env`, `format: "openai"`, `cloudflare_account_id`, `cloudflare_gateway_id`, `stream_timeout_seconds`.
- Key behavior: Keeptrusts sits in front of Cloudflare AI Gateway, adding policy enforcement to Cloudflare's caching, rate limiting, and analytics.
- Best next pages: OpenRouter integration (alternative aggregator), Provider routing, Policy configuration.
## For engineers
- Prerequisites: Cloudflare account with AI Gateway enabled, account ID and gateway ID, API token with AI Gateway permissions, `kt` CLI installed.
- Start command: `kt gateway run --listen 0.0.0.0:8080 --policy-config policy-config.yaml`.
- Set `stream_timeout_seconds` to at least 180 — Workers AI and upstream models via Cloudflare can have variable first-token latencies at peak load.
- Cloudflare AI Gateway health is independent of upstream model providers — use `health_probe` and a `fallback` strategy to handle upstream model unavailability.
- Validate: `curl http://localhost:8080/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hello"}]}'`.
## For leaders
- Cloudflare AI Gateway provides edge caching and built-in rate limiting — Keeptrusts adds policy enforcement and audit logging on top.
- Running Keeptrusts in front of Cloudflare gives you vendor-independent policy controls that persist even if you switch edge providers.
- Cloudflare's global edge network reduces latency for geographically distributed users; Keeptrusts policies execute before traffic reaches Cloudflare.
- Monitor both Cloudflare analytics and Keeptrusts events dashboard to get complete visibility into request flow and policy decisions.
## Next steps
- OpenRouter integration — alternative multi-provider aggregation gateway
- AIML API integration — alternative model aggregation endpoint
- Provider routing strategies — fallback and latency-based routing
- Policy configuration — prompt-injection and safety policy reference
- Quickstart — install `kt` and run your first gateway