# Rate Limits Configuration
Keeptrusts supports five independent rate limiting scopes, each configured as a top-level section in your policy config. All scopes can optionally use a distributed Redis/Valkey backend for multi-instance coordination.
## Use this page when
- You are configuring request, IP, user, or token rate limits for your Keeptrusts gateway.
- You need distributed rate limiting across multiple gateway instances using Redis or Valkey.
- You are tuning size limits for request bodies, headers, or response payloads.
## Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
## Quick reference

```yaml
global_rate_limit:
  max_requests: 1000
  window_seconds: 60

ip_rate_limit:
  max_requests: 100
  window_seconds: 60

user_rate_limit:
  max_requests: 30
  window_seconds: 60
  header_names: ["x-user-id"]

token_rate_limit:
  max_tokens: 500000
  window_seconds: 3600
  scope: "global"

size_limits:
  max_body_bytes: 1048576
  max_response_bytes: 10485760
```
## Global rate limit
A single counter for all requests to this gateway instance.
```yaml
global_rate_limit:
  max_requests: 1000    # required
  window_seconds: 60    # required
```

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `max_requests` | integer | yes | — | Maximum requests per window |
| `window_seconds` | integer | yes | — | Fixed window size in seconds |
**Runtime behavior:** An atomic counter with epoch-based fixed-window reset. When the limit is exceeded, the gateway returns HTTP 429 Too Many Requests with a `Retry-After` header.
## Per-IP rate limit
Independent counters per client IP address.
```yaml
ip_rate_limit:
  max_requests: 100       # required
  window_seconds: 60      # required
  trust_proxy_depth: 1    # optional, default: 0
```

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `max_requests` | integer | yes | — | Maximum requests per IP per window |
| `window_seconds` | integer | yes | — | Window size in seconds |
| `trust_proxy_depth` | integer | no | 0 | Number of `X-Forwarded-For` hops to trust. 0 uses the direct connection IP |
### Behind a reverse proxy

When running behind nginx or a load balancer, set `trust_proxy_depth` to the number of trusted proxies in the chain:
```yaml
# Gateway behind one nginx reverse proxy
ip_rate_limit:
  max_requests: 50
  window_seconds: 60
  trust_proxy_depth: 1
```
## Per-user rate limit
Independent counters per user identity extracted from request headers.
```yaml
user_rate_limit:
  max_requests: 30      # required
  window_seconds: 60    # required
  header_names:         # optional
    - "x-user-id"
    - "x-consumer-id"
```

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `max_requests` | integer | yes | — | Maximum requests per user per window |
| `window_seconds` | integer | yes | — | Window size in seconds |
| `header_names` | string[] | no | `["x-user-id"]` | Headers to extract user identity from. First non-empty value wins |
Requests without a matching header are bucketed as `unknown` and share a single counter.
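The first-non-empty-header rule can be sketched like this (a hypothetical helper, not the gateway's code):

```python
def user_bucket(headers: dict[str, str], header_names: list[str]) -> str:
    """Return the rate-limit bucket key: the first configured header
    with a non-empty value wins; requests carrying no identity all
    share the single 'unknown' bucket."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    for name in header_names:
        value = lowered.get(name.lower(), "").strip()
        if value:
            return value
    return "unknown"
```

Because anonymous traffic shares one `unknown` counter, a low `max_requests` can starve unauthenticated clients; either keep a generous limit or require an identity header.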
## Token rate limit
Sliding-window limit on total LLM token consumption (prompt + completion).
```yaml
token_rate_limit:
  max_tokens: 500000    # required
  window_seconds: 3600  # required
  scope: "global"       # optional: global | per_key | per_ip
```

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `max_tokens` | integer | yes | — | Maximum tokens consumed per window |
| `window_seconds` | integer | yes | — | Sliding window size in seconds |
| `scope` | string | no | `"global"` | Bucketing scope: `global`, `per_key`, or `per_ip` |
**Runtime behavior:** Uses a sliding window built from 6 sub-windows. Tokens are recorded after the upstream response (from `usage.total_tokens`), and the next request is checked against the remaining budget. Exceeding the budget returns HTTP 429.
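A minimal sketch of a sub-window-based sliding limit, assuming per-sub-window buckets summed over the live window (an illustration of the technique, not the gateway's implementation):

```python
class SlidingTokenWindow:
    """Approximate sliding window built from 6 fixed sub-windows:
    token usage is bucketed per sub-window, and the budget check sums
    the sub-windows that cover the last `window_seconds`."""

    SUB_WINDOWS = 6

    def __init__(self, max_tokens: int, window_seconds: int):
        self.max_tokens = max_tokens
        self.sub_seconds = window_seconds / self.SUB_WINDOWS
        self.buckets: dict[int, int] = {}   # sub-window index -> tokens used

    def _index(self, now: float) -> int:
        return int(now / self.sub_seconds)

    def record(self, tokens: int, now: float) -> None:
        """Called after the upstream response with usage.total_tokens."""
        i = self._index(now)
        self.buckets[i] = self.buckets.get(i, 0) + tokens

    def allow(self, now: float) -> bool:
        """Check the next request against the remaining budget."""
        i = self._index(now)
        live = range(i - self.SUB_WINDOWS + 1, i + 1)
        used = sum(self.buckets.get(j, 0) for j in live)
        return used < self.max_tokens
```

Sub-windows smooth out the boundary burst a single fixed window allows, at the cost of a small approximation: usage ages out one sub-window (here, one sixth of the window) at a time.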
### Scope examples

```yaml
# Global: single bucket for all traffic
token_rate_limit:
  max_tokens: 1000000
  window_seconds: 3600
  scope: "global"
```

```yaml
# Per API key: each key gets its own budget
token_rate_limit:
  max_tokens: 100000
  window_seconds: 3600
  scope: "per_key"
```

```yaml
# Per IP: each client IP gets its own budget
token_rate_limit:
  max_tokens: 50000
  window_seconds: 3600
  scope: "per_ip"
```
## Size limits
Byte-level limits on request and response payloads.
```yaml
size_limits:
  max_body_bytes: 1048576       # 1 MB request body
  max_header_bytes: 8192        # 8 KB headers
  max_url_bytes: 4096           # 4 KB URL
  max_response_bytes: 10485760  # 10 MB response
```

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `max_body_bytes` | integer | no | unlimited | Maximum request body size in bytes |
| `max_header_bytes` | integer | no | unlimited | Maximum total header size in bytes |
| `max_url_bytes` | integer | no | unlimited | Maximum URL length in bytes |
| `max_response_bytes` | integer | no | unlimited | Maximum response body size in bytes |
Exceeding any limit returns HTTP 413 Payload Too Large. Limits are checked in order: body → headers → URL → response.
Consumer group overrides can relax or tighten these per-key:
```yaml
consumer_groups:
  groups:
    - name: "enterprise"
      api_keys: ["sha256:abc123..."]
      # size overrides applied via runtime API (future)
```
## Distributed rate limiting
By default, rate limit counters are per-process (in-memory). For multi-instance deployments, use a shared Redis or Valkey backend.
### Inline configuration

```yaml
distributed_rate_limit:
  backend: "redis"
  url_env: "REDIS_URL"
```

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `backend` | string | yes | — | `redis` or `valkey` (both use the Redis wire protocol) |
| `url_env` | string | no | `"REDIS_URL"` | Environment variable containing the connection URL |
### Nested under any rate limit section

You can also embed a `distributed` block under any rate limit section:

```yaml
global_rate_limit:
  max_requests: 1000
  window_seconds: 60
  distributed:
    backend: "valkey"
    url_env: "KEEPTRUSTS_REDIS_URL"
```

The gateway checks these JSON paths in priority order:

1. `/distributed_rate_limit` (top level)
2. `/global_rate_limit/distributed`
3. `/ip_rate_limit/distributed`
4. `/token_rate_limit/distributed`
### Hosted gateway automatic configuration

In hosted gateway mode, the distributed backend is auto-configured from the `KEEPTRUSTS_REDIS_URL` or `REDIS_URL` environment variable, without needing the YAML section.
## Complete rate limiting example

```yaml
pack:
  name: "rate-limited-gateway"
  version: "1.0.0"
  enabled: true

# Global ceiling
global_rate_limit:
  max_requests: 5000
  window_seconds: 60

# Per-IP protection
ip_rate_limit:
  max_requests: 100
  window_seconds: 60
  trust_proxy_depth: 1

# Per-user fairness
user_rate_limit:
  max_requests: 30
  window_seconds: 60
  header_names: ["x-user-id", "x-consumer-id"]

# Token budget
token_rate_limit:
  max_tokens: 1000000
  window_seconds: 3600
  scope: "global"

# Request size protection
size_limits:
  max_body_bytes: 2097152       # 2 MB
  max_response_bytes: 20971520  # 20 MB

# Multi-instance coordination
distributed_rate_limit:
  backend: "redis"
  url_env: "REDIS_URL"

providers:
  targets:
    - id: "openai-prod"
      provider: "openai"
      model: "gpt-4o"
      secret_key_ref:
        env: "OPENAI_API_KEY"

policies:
  chain:
    - "audit-logger"
```
## For AI systems

- Canonical terms: Keeptrusts, `global_rate_limit`, `ip_rate_limit`, `user_rate_limit`, `token_rate_limit`, `size_limits`, `distributed_rate_limit`, `window_seconds`, `max_requests`, `max_tokens`
- Config/command names: `global_rate_limit`, `ip_rate_limit`, `user_rate_limit`, `token_rate_limit`, `size_limits`, `distributed_rate_limit`, `trust_proxy_depth`, `scope` (`global` / `per_key` / `per_ip`)
- Best next pages: Routes and Consumer Groups, Providers Configuration, Security and Network Configuration
## For engineers

- Prerequisites: A `policy-config.yaml` file. For distributed rate limiting, a Redis or Valkey instance accessible from all gateway instances.
- Validation: Lint with `kt policy lint --file policy-config.yaml`. Send requests exceeding the configured limit and verify HTTP 429 responses with `Retry-After` headers. For distributed mode, confirm Redis connectivity at startup in the gateway logs.
- Key commands: `kt policy lint`, `kt gateway run`, `curl -w '%{http_code}'`
## For leaders

- Governance: Rate limits protect upstream provider budgets and prevent abuse. Token rate limits directly bound per-user spend. Set limits based on contracted API capacity and fair-use policies.
- Cost: Without rate limits, a single misbehaving client can exhaust your entire provider quota. Token limits at `per_key` scope give each team a bounded budget.
- Rollout: Start with generous global limits, monitor via Events, then tighten per-IP and per-user limits based on observed traffic patterns.
## Next steps
- Routes and Consumer Groups — Per-group rate limit overrides
- Security and Network Configuration — IP allowlisting and CORS
- Providers Configuration — Provider-level timeout settings
- Declarative Config Reference — Full schema reference