# Rate Limits Configuration
Keeptrusts supports five independent rate limiting scopes, each configured as a top-level section in your policy config. All scopes can optionally use a distributed Redis/Valkey backend for multi-instance coordination.
## Use this page when
- You are configuring request, IP, user, or token rate limits for your Keeptrusts gateway.
- You need distributed rate limiting across multiple gateway instances using Redis or Valkey.
- You are tuning size limits for request bodies, headers, or response payloads.
## Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
## Quick reference

```yaml
global_rate_limit:
  max_requests: 1000
  window_seconds: 60

ip_rate_limit:
  max_requests: 100
  window_seconds: 60

user_rate_limit:
  max_requests: 30
  window_seconds: 60
  header_names: ["x-user-id"]

token_rate_limit:
  max_tokens: 500000
  window_seconds: 3600
  scope: "global"

size_limits:
  max_body_bytes: 1048576
  max_response_bytes: 10485760
```
## Global rate limit
A single counter for all requests to this gateway instance.
```yaml
global_rate_limit:
  max_requests: 1000    # required
  window_seconds: 60    # required
```

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `max_requests` | integer | yes | — | Maximum requests per window |
| `window_seconds` | integer | yes | — | Fixed window size in seconds |
**Runtime behavior:** An atomic counter with epoch-based fixed-window reset. When the limit is exceeded, the gateway returns HTTP 429 Too Many Requests with a `Retry-After` header.
## Per-IP rate limit
Independent counters per client IP address.
```yaml
ip_rate_limit:
  max_requests: 100       # required
  window_seconds: 60      # required
  trust_proxy_depth: 1    # optional, default: 0
```

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `max_requests` | integer | yes | — | Maximum requests per IP per window |
| `window_seconds` | integer | yes | — | Window size in seconds |
| `trust_proxy_depth` | integer | no | 0 | Number of `X-Forwarded-For` hops to trust. 0 uses the direct connection IP |
### Behind a reverse proxy

When running behind nginx or a load balancer, set `trust_proxy_depth` to the number of trusted proxies in the chain:
```yaml
# Gateway behind one nginx reverse proxy
ip_rate_limit:
  max_requests: 50
  window_seconds: 60
  trust_proxy_depth: 1
```
## Per-user rate limit
Independent counters per user identity extracted from request headers.
```yaml
user_rate_limit:
  max_requests: 30      # required
  window_seconds: 60    # required
  header_names:         # optional
    - "x-user-id"
    - "x-consumer-id"
```

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `max_requests` | integer | yes | — | Maximum requests per user per window |
| `window_seconds` | integer | yes | — | Window size in seconds |
| `header_names` | string[] | no | `["x-user-id"]` | Headers to extract user identity from. First non-empty value wins |
Requests without a matching header are bucketed as `unknown` and share a single counter.
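The first-non-empty-header rule can be sketched like this (a hypothetical helper, not the gateway's code):

```python
def user_bucket(headers: dict[str, str], header_names: list[str]) -> str:
    """Return the rate-limit bucket key: the first configured header
    with a non-empty value wins; requests carrying no identity all
    share the single 'unknown' bucket."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    for name in header_names:
        value = lowered.get(name.lower(), "").strip()
        if value:
            return value
    return "unknown"
```

Because anonymous traffic shares one `unknown` counter, a low `max_requests` can starve unauthenticated clients; either keep a generous limit or require an identity header.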
## Token rate limit
Sliding-window limit on total LLM token consumption (prompt + completion).
```yaml
token_rate_limit:
  max_tokens: 500000    # required
  window_seconds: 3600  # required
  scope: "global"       # optional: global | per_key | per_ip
```

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `max_tokens` | integer | yes | — | Maximum tokens consumed per window |
| `window_seconds` | integer | yes | — | Sliding window size in seconds |
| `scope` | string | no | `"global"` | Bucketing scope: `global`, `per_key`, or `per_ip` |
**Runtime behavior:** Uses a sliding window built from 6 sub-windows. Tokens are recorded after the upstream response (from `usage.total_tokens`), and the next request is checked against the remaining budget. Exceeding the budget returns HTTP 429.
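A minimal sketch of a sub-window-based sliding limit, assuming per-sub-window buckets summed over the live window (an illustration of the technique, not the gateway's implementation):

```python
class SlidingTokenWindow:
    """Approximate sliding window built from 6 fixed sub-windows:
    token usage is bucketed per sub-window, and the budget check sums
    the sub-windows that cover the last `window_seconds`."""

    SUB_WINDOWS = 6

    def __init__(self, max_tokens: int, window_seconds: int):
        self.max_tokens = max_tokens
        self.sub_seconds = window_seconds / self.SUB_WINDOWS
        self.buckets: dict[int, int] = {}   # sub-window index -> tokens used

    def _index(self, now: float) -> int:
        return int(now / self.sub_seconds)

    def record(self, tokens: int, now: float) -> None:
        """Called after the upstream response with usage.total_tokens."""
        i = self._index(now)
        self.buckets[i] = self.buckets.get(i, 0) + tokens

    def allow(self, now: float) -> bool:
        """Check the next request against the remaining budget."""
        i = self._index(now)
        live = range(i - self.SUB_WINDOWS + 1, i + 1)
        used = sum(self.buckets.get(j, 0) for j in live)
        return used < self.max_tokens
```

Sub-windows smooth out the boundary burst a single fixed window allows, at the cost of a small approximation: usage ages out one sub-window (here, one sixth of the window) at a time.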
### Scope examples

```yaml
# Global: single bucket for all traffic
token_rate_limit:
  max_tokens: 1000000
  window_seconds: 3600
  scope: "global"
```

```yaml
# Per API key: each key gets its own budget
token_rate_limit:
  max_tokens: 100000
  window_seconds: 3600
  scope: "per_key"
```

```yaml
# Per IP: each client IP gets its own budget
token_rate_limit:
  max_tokens: 50000
  window_seconds: 3600
  scope: "per_ip"
```
## Size limits
Byte-level limits on request and response payloads.
```yaml
size_limits:
  max_body_bytes: 1048576       # 1 MB request body
  max_header_bytes: 8192        # 8 KB headers
  max_url_bytes: 4096           # 4 KB URL
  max_response_bytes: 10485760  # 10 MB response
```

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `max_body_bytes` | integer | no | unlimited | Maximum request body size in bytes |
| `max_header_bytes` | integer | no | unlimited | Maximum total header size in bytes |
| `max_url_bytes` | integer | no | unlimited | Maximum URL length in bytes |
| `max_response_bytes` | integer | no | unlimited | Maximum response body size in bytes |
Exceeding any limit returns HTTP 413 Payload Too Large. Limits are checked in order: body → headers → URL → response.
Consumer group overrides can relax or tighten these per-key:
```yaml
consumer_groups:
  groups:
    - name: "enterprise"
      api_keys: ["sha256:abc123..."]
      # size overrides applied via runtime API (future)
```
## Distributed rate limiting
By default, rate limit counters are per-process (in-memory). For multi-instance deployments, use a shared Redis or Valkey backend.
### Inline configuration

```yaml
distributed_rate_limit:
  backend: "redis"
  url_env: "REDIS_URL"
```

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `backend` | string | yes | — | `redis` or `valkey` (both use the Redis wire protocol) |
| `url_env` | string | no | `"REDIS_URL"` | Environment variable containing the connection URL |
### Nested under any rate limit section

You can also embed a `distributed` block under any rate limit section:

```yaml
global_rate_limit:
  max_requests: 1000
  window_seconds: 60
  distributed:
    backend: "valkey"
    url_env: "KEEPTRUSTS_REDIS_URL"
```

The gateway checks these JSON paths in priority order:

1. `/distributed_rate_limit` (top level)
2. `/global_rate_limit/distributed`
3. `/ip_rate_limit/distributed`
4. `/token_rate_limit/distributed`
### Hosted gateway automatic configuration

In hosted gateway mode, the distributed backend is auto-configured from the `KEEPTRUSTS_REDIS_URL` or `REDIS_URL` environment variable, without needing the YAML section.
## Complete rate limiting example

```yaml
pack:
  name: "rate-limited-gateway"
  version: "1.0.0"
  enabled: true

# Global ceiling
global_rate_limit:
  max_requests: 5000
  window_seconds: 60

# Per-IP protection
ip_rate_limit:
  max_requests: 100
  window_seconds: 60
  trust_proxy_depth: 1

# Per-user fairness
user_rate_limit:
  max_requests: 30
  window_seconds: 60
  header_names: ["x-user-id", "x-consumer-id"]

# Token budget
token_rate_limit:
  max_tokens: 1000000
  window_seconds: 3600
  scope: "global"

# Request size protection
size_limits:
  max_body_bytes: 2097152       # 2 MB
  max_response_bytes: 20971520  # 20 MB

# Multi-instance coordination
distributed_rate_limit:
  backend: "redis"
  url_env: "REDIS_URL"

providers:
  targets:
    - id: "openai-prod"
      provider: "openai"
      model: "gpt-4o"
      secret_key_ref:
        env: "OPENAI_API_KEY"

policies:
  chain:
    - "audit-logger"
```
## For AI systems

- Canonical terms: Keeptrusts, `global_rate_limit`, `ip_rate_limit`, `user_rate_limit`, `token_rate_limit`, `size_limits`, `distributed_rate_limit`, `window_seconds`, `max_requests`, `max_tokens`
- Config/command names: `global_rate_limit`, `ip_rate_limit`, `user_rate_limit`, `token_rate_limit`, `size_limits`, `distributed_rate_limit`, `trust_proxy_depth`, `scope` (`global` / `per_key` / `per_ip`)
- Best next pages: Routes and Consumer Groups, Providers Configuration, Security and Network Configuration
## For engineers

- Prerequisites: A `policy-config.yaml` file. For distributed rate limiting, a Redis or Valkey instance accessible from all gateway instances.
- Validation: Lint with `kt policy lint --file policy-config.yaml`. Send requests exceeding the configured limit and verify HTTP 429 responses with `Retry-After` headers. For distributed mode, confirm Redis connectivity at startup in the gateway logs.
- Key commands: `kt policy lint`, `kt gateway run`, `curl -w '%{http_code}'`
## For leaders

- Governance: Rate limits protect upstream provider budgets and prevent abuse. Token rate limits directly bound per-user spend. Set limits based on contracted API capacity and fair-use policies.
- Cost: Without rate limits, a single misbehaving client can exhaust your entire provider quota. Token limits at `per_key` scope give each team a bounded budget.
- Rollout: Start with generous global limits, monitor via Events, then tighten per-IP and per-user limits based on observed traffic patterns.
## Next steps
- Routes and Consumer Groups — Per-group rate limit overrides
- Security and Network Configuration — IP allowlisting and CORS
- Providers Configuration — Provider-level timeout settings
- Declarative Config Reference — Full schema reference