# Cloudflare AI Gateway
Cloudflare AI Gateway is a caching, rate limiting, and observability layer for AI traffic that runs on Cloudflare's global edge network. It routes to OpenAI, Anthropic, Groq, Workers AI, Azure OpenAI, and many other providers through a single unified endpoint, and adds built-in analytics, request logging, and rate controls.
## Use this page when
- You need the exact command, config, API, or integration details for Cloudflare AI Gateway.
- You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
- If you want a guided rollout instead of a reference page, use the linked workflow pages in Next steps.
Keeptrusts adds policy enforcement and compliance governance on top of Cloudflare AI Gateway. By placing Keeptrusts in front of the Cloudflare gateway, you get Keeptrusts's prompt-injection detection, PII redaction, content safety filters, and audit logging applied before requests reach Cloudflare — giving you a two-layer observability and governance stack.
Keeptrusts performs gateway-specific URL derivation: given your `cloudflare_account_id`, `cloudflare_gateway_id`, and the gateway provider sub-path (e.g. `openai`, `workers-ai`, `anthropic`), Keeptrusts derives the correct `https://gateway.ai.cloudflare.com/v1/{account}/{gateway}/{provider}` URL automatically.
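The derivation described above can be sketched in a few lines. This is an illustrative sketch, not Keeptrusts's actual internals, and the function name `derive_gateway_url` is hypothetical:

```python
def derive_gateway_url(account_id: str, gateway_id: str, provider: str) -> str:
    """Build the Cloudflare AI Gateway base URL from its three components."""
    return f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider}"

# e.g. routing OpenAI traffic through a gateway named "prod-llm":
url = derive_gateway_url("0123abcd", "prod-llm", "openai")
# → https://gateway.ai.cloudflare.com/v1/0123abcd/prod-llm/openai
```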
## Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
## Prerequisites
- Cloudflare account with AI Gateway enabled — access via the Cloudflare dashboard.
- Cloudflare API token with AI Gateway permissions, and your Account ID and Gateway ID from the Cloudflare dashboard.
- Keeptrusts CLI — install `kt` (see the quickstart guide).
- Export your credentials:
```bash
export CLOUDFLARE_ACCOUNT_ID="your-cloudflare-account-id"
export CLOUDFLARE_GATEWAY_ID="your-gateway-id"
export CLOUDFLARE_API_TOKEN="your-cloudflare-api-token"
```
When `cloudflare_account_id_env` and `cloudflare_gateway_id_env` are set, Keeptrusts derives the full gateway URL automatically. You do not need to set `base_url` manually unless you want to override it.
## Configuration
A complete `policy-config.yaml` that routes traffic through Cloudflare AI Gateway (OpenAI backend) with prompt-injection, PII, and safety policies:
```yaml
pack:
  name: cloudflare-via-gateway
  version: 1.0.0
  enabled: true
policies:
  chain:
    - prompt-injection
    - pii-detector
    - safety-filter
    - audit-logger
  policy:
    prompt-injection:
      threshold: 0.8
      action: block
    pii-detector:
      action: redact
    safety-filter:
      mode: strict
      action: block
    audit-logger:
      retention_days: 365
providers:
  strategy: single
  targets:
    - id: cf-gateway-openai
      provider: "cloudflare-gateway:openai:gpt-4o"
      secret_key_ref:
        env: CLOUDFLARE_API_TOKEN
```
Start the gateway:

```bash
kt gateway run \
  --listen 0.0.0.0:8080 \
  --policy-config policy-config.yaml
```
## Provider Shorthand Syntax
The `provider` field encodes both the gateway provider sub-path and the model:

```
cloudflare-gateway:<gateway-provider>:<model>
```

Examples:

```yaml
# OpenAI via Cloudflare AI Gateway
provider: "cloudflare-gateway:openai:gpt-4o"

# Anthropic via Cloudflare AI Gateway
provider: "cloudflare-gateway:anthropic:claude-3-5-sonnet-20241022"

# Cloudflare Workers AI
provider: "cloudflare-gateway:workers-ai:@cf/meta/llama-3.3-70b-instruct-fp8-fast"

# Groq via Cloudflare AI Gateway
provider: "cloudflare-gateway:groq:llama-3.3-70b-versatile"
```
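Splitting the shorthand into its parts is straightforward. This parser is an illustrative sketch (the function name is hypothetical); `maxsplit=2` keeps the model identifier intact even for the long Workers AI forms:

```python
def parse_provider_shorthand(value: str) -> tuple[str, str]:
    """Split 'cloudflare-gateway:<gateway-provider>:<model>' into its two parts."""
    prefix, gateway_provider, model = value.split(":", 2)
    if prefix != "cloudflare-gateway":
        raise ValueError(f"not a cloudflare-gateway provider string: {value!r}")
    return gateway_provider, model

# Workers AI model IDs keep their full "@cf/..." form:
print(parse_provider_shorthand(
    "cloudflare-gateway:workers-ai:@cf/meta/llama-3.3-70b-instruct-fp8-fast"
))
# → ('workers-ai', '@cf/meta/llama-3.3-70b-instruct-fp8-fast')
```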
## Provider Fields
All fields available on a `providers.targets[]` entry for Cloudflare AI Gateway:

| Field | Type | Default | Description |
|---|---|---|---|
| `id` | string | required | Unique identifier for this target. Used in logs, the console dashboard, and routing decisions. |
| `provider` | string | required | Provider ID in the form `"cloudflare-gateway:<gateway-provider>:<model>"`. |
| `cloudflare_account_id` | string | none | Cloudflare account ID (literal value). Use `cloudflare_account_id_env` to reference an env var instead. Alias: `accountId`. |
| `cloudflare_account_id_env` | string | none | Environment variable holding the Cloudflare account ID. Alias: `accountIdEnvar`. |
| `cloudflare_gateway_id` | string | none | Cloudflare gateway ID (literal value). Use `cloudflare_gateway_id_env` to reference an env var instead. Alias: `gatewayId`. |
| `cloudflare_gateway_id_env` | string | none | Environment variable holding the Cloudflare gateway ID. Alias: `gatewayIdEnvar`. |
| `secret_key_ref` | object | none | Object reference to the environment variable holding the Cloudflare API token. Auth is optional for public or upstream-authenticated gateways. |
| `base_url` | string | auto-derived | Explicit base URL override. When set, takes precedence over the derived URL. Format: `https://gateway.ai.cloudflare.com/v1/{account}/{gateway}/{provider}`. |
| `format` | string | `"openai"` | Wire format. Cloudflare AI Gateway uses an OpenAI-compatible format for most backends. |
| `timeout_seconds` | integer | 60 | Maximum wall-clock time for non-streaming requests before the gateway returns a timeout error. |
| `stream_timeout_seconds` | integer | inherits `timeout_seconds` | Maximum wall-clock time for streaming requests. |
| `description` | string | none | Human-readable label shown in the console dashboard and health-check output. |
| `weight` | float | 1.0 | Routing weight used by the `weighted_round_robin` strategy. |
| `health_probe` | object | none | Active health probe configuration. Sub-fields: `enabled` (bool), `interval_seconds` (int), `timeout_seconds` (int). |
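To illustrate how `weight` biases routing, here is a simplified weighted pick over targets. Keeptrusts's `weighted_round_robin` is presumably a deterministic rotation; this random-choice sketch only demonstrates how relative weights shape the traffic split:

```python
import random

def pick_target(targets: list[dict]) -> dict:
    """Choose a target with probability proportional to its weight (default 1.0)."""
    weights = [t.get("weight", 1.0) for t in targets]
    return random.choices(targets, weights=weights, k=1)[0]

targets = [
    {"id": "cf-gateway-openai", "weight": 3.0},  # ~75% of traffic
    {"id": "openai-direct", "weight": 1.0},      # ~25% of traffic
]
counts = {t["id"]: 0 for t in targets}
for _ in range(10_000):
    counts[pick_target(targets)["id"]] += 1
```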
Azure OpenAI via Cloudflare additionally supports:

| Field | Type | Description |
|---|---|---|
| `resource_name` / `resourceName` | string | Azure OpenAI resource name for Cloudflare Azure gateway path derivation. |
| `deployment_name` / `deploymentName` | string | Azure OpenAI deployment name. |
## Supported Models
Cloudflare AI Gateway supports models from multiple providers. The model identifier you use must match what the underlying backend expects:

| Gateway Provider | Example Models |
|---|---|
| `openai` | `gpt-4o`, `gpt-4o-mini`, `o1-preview` |
| `anthropic` | `claude-3-5-sonnet-20241022`, `claude-3-haiku-20240307` |
| `groq` | `llama-3.3-70b-versatile`, `mixtral-8x7b-32768` |
| `mistral` | `mistral-large-latest`, `mistral-small-latest` |
| `workers-ai` | `@cf/meta/llama-3.3-70b-instruct-fp8-fast`, `@cf/mistral/mistral-7b-instruct-v0.1`, `@cf/google/gemma-7b-it` |
| `perplexity-ai` | `llama-3.1-sonar-large-128k-online` |
| `cohere` | `command-r-plus` |
| `google-ai-studio` | `gemini-2.0-flash`, `gemini-1.5-pro` |
See the Cloudflare AI Gateway documentation for the full list of supported providers and model identifiers.
## Client Examples
Once the gateway is running, point your client SDK at `http://localhost:8080` instead of the Cloudflare gateway URL. The standard OpenAI SDK works directly for all backends that use the OpenAI wire format.
Python:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="unused",  # auth is handled by Keeptrusts via CLOUDFLARE_API_TOKEN
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain how Cloudflare AI Gateway caching works."},
    ],
    temperature=0.7,
    max_tokens=512,
)
print(response.choices[0].message.content)
```

Node.js:

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "unused", // auth handled by Keeptrusts via CLOUDFLARE_API_TOKEN
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain how Cloudflare AI Gateway caching works." },
  ],
  temperature: 0.7,
  max_tokens: 512,
});
console.log(response.choices[0].message.content);
```

cURL:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain how Cloudflare AI Gateway caching works."}
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'
```
## Streaming
Keeptrusts fully supports streaming for all Cloudflare AI Gateway backends. Set `stream: true` in your request — the gateway applies policies to each chunk in real time.
```yaml
pack:
  name: cloudflare-gateway-providers-3
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: cf-gateway-streaming
      provider: "cloudflare-gateway:openai:gpt-4o"
      secret_key_ref:
        env: CLOUDFLARE_API_TOKEN
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
Python:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize Cloudflare's approach to AI safety."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

Node.js:

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "unused",
});

const stream = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Summarize Cloudflare's approach to AI safety." }],
  stream: true,
});
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
```

cURL:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize Cloudflare'\''s approach to AI safety."}],
    "stream": true
  }'
```
## Advanced Configuration
### Workers AI
Route to Cloudflare Workers AI models hosted at Cloudflare's edge. For `workers-ai`, Keeptrusts derives the URL as `.../workers-ai/<model>` and passes the full `@cf/...` model identifier:
```yaml
pack:
  name: cloudflare-gateway-providers-4
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: cf-workers-ai-llama
      provider: "cloudflare-gateway:workers-ai:@cf/meta/llama-3.3-70b-instruct-fp8-fast"
      secret_key_ref:
        env: CLOUDFLARE_API_TOKEN
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
### Azure OpenAI via Cloudflare
For Azure OpenAI backends, set `resource_name` and `deployment_name` so Keeptrusts can derive the Azure-specific Cloudflare gateway path:
```yaml
pack:
  name: cloudflare-gateway-providers-5
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: cf-gateway-azure
      provider: "cloudflare-gateway:azure-openai:gpt-4o"
      resource_name: your-azure-resource-name
      deployment_name: your-deployment-name
      secret_key_ref:
        env: CLOUDFLARE_API_TOKEN
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
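Assuming the Azure path follows the standard `{account}/{gateway}/{provider}` pattern extended with resource and deployment segments (verify the exact shape against Cloudflare's Azure OpenAI gateway docs), the derivation can be sketched as follows; the function name is hypothetical:

```python
def derive_azure_gateway_url(account_id: str, gateway_id: str,
                             resource_name: str, deployment_name: str) -> str:
    """Sketch of the Azure-specific Cloudflare gateway path derivation."""
    return ("https://gateway.ai.cloudflare.com/v1/"
            f"{account_id}/{gateway_id}/azure-openai/{resource_name}/{deployment_name}")
```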
### Multi-Backend Fallback
Fall back from Cloudflare Gateway to a direct provider if the gateway is unavailable:
```yaml
pack:
  name: cloudflare-gateway-providers-6
  version: 1.0.0
  enabled: true
providers:
  strategy: fallback
  targets:
    - id: cf-gateway-primary
      provider: "cloudflare-gateway:openai:gpt-4o"
      secret_key_ref:
        env: CLOUDFLARE_API_TOKEN
    - id: openai-direct-fallback
      provider: "openai:chat:gpt-4o"
      secret_key_ref:
        env: OPENAI_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
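Conceptually, fallback routing tries targets in declaration order and returns the first success. A minimal sketch, where `send` stands in for the actual request dispatch that Keeptrusts handles internally:

```python
def call_with_fallback(targets, send):
    """Try each target in order; raise the last error if all fail."""
    last_err = None
    for target in targets:
        try:
            return send(target)
        except ConnectionError as err:  # a real gateway matches specific failure modes
            last_err = err
    raise last_err

# Usage sketch: the primary gateway is down, so the direct target answers.
def send(target):
    if target == "cf-gateway-primary":
        raise ConnectionError("gateway unreachable")
    return f"response via {target}"

print(call_with_fallback(["cf-gateway-primary", "openai-direct-fallback"], send))
# → response via openai-direct-fallback
```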
## Best Practices
- Use env vars for account and gateway IDs — set `cloudflare_account_id_env` and `cloudflare_gateway_id_env` rather than hardcoding values in `policy-config.yaml`. This keeps account identifiers out of source control.
- Let Keeptrusts derive the URL — omit `base_url` and let the gateway construct the correct gateway URL from the account and gateway IDs. Only set `base_url` explicitly if you need to override the standard Cloudflare gateway hostname.
- Layer Keeptrusts policies with Cloudflare analytics — Cloudflare AI Gateway provides request logging and analytics at the edge; Keeptrusts adds policy enforcement and compliance audit trails at the application layer. Using both gives you defense in depth.
- Configure Workers AI model IDs precisely — Workers AI model identifiers use the `@cf/` prefix format (e.g. `@cf/meta/llama-3.3-70b-instruct-fp8-fast`). The exact identifier must match what Cloudflare's Workers AI API expects.
- Set `stream_timeout_seconds` for large models — Workers AI models and large upstream models accessed via the Cloudflare gateway can have higher first-token latencies at peak load. Set `stream_timeout_seconds` to at least `180`.
- Test backend availability independently — Cloudflare AI Gateway's health is independent of the underlying model providers it routes to. Use `health_probe` and a `fallback` strategy to handle upstream model unavailability gracefully.
## For AI systems
- Canonical terms: Keeptrusts gateway, Cloudflare AI Gateway, Workers AI, edge inference, provider target, `policy-config.yaml`.
- Config field names: `provider`, `model`, `base_url`, `secret_key_ref.env`, `format: "openai"`, `cloudflare_account_id`, `cloudflare_gateway_id`, `stream_timeout_seconds`.
- Key behavior: Keeptrusts sits in front of Cloudflare AI Gateway, adding policy enforcement to Cloudflare's caching, rate limiting, and analytics.
- Best next pages: OpenRouter integration (alternative aggregator), Provider routing, Policy configuration.
## For engineers
- Prerequisites: Cloudflare account with AI Gateway enabled, account ID and gateway ID, API token with AI Gateway permissions, `kt` CLI installed.
- Start command: `kt gateway run --listen 0.0.0.0:8080 --policy-config policy-config.yaml`.
- Set `stream_timeout_seconds` to at least 180 — Workers AI and upstream models via Cloudflare can have variable first-token latencies at peak load.
- Cloudflare AI Gateway health is independent of upstream model providers — use `health_probe` and a `fallback` strategy to handle upstream model unavailability.
- Validate: `curl http://localhost:8080/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hello"}]}'`.
## For leaders
- Cloudflare AI Gateway provides edge caching and built-in rate limiting — Keeptrusts adds policy enforcement and audit logging on top.
- Running Keeptrusts in front of Cloudflare gives you vendor-independent policy controls that persist even if you switch edge providers.
- Cloudflare's global edge network reduces latency for geographically distributed users; Keeptrusts policies execute before traffic reaches Cloudflare.
- Monitor both Cloudflare analytics and Keeptrusts events dashboard to get complete visibility into request flow and policy decisions.
## Next steps
- OpenRouter integration — alternative multi-provider aggregation gateway
- AIML API integration — alternative model aggregation endpoint
- Provider routing strategies — fallback and latency-based routing
- Policy configuration — prompt-injection and safety policy reference
- Quickstart — install `kt` and run your first gateway