Tutorial: Implementing Rate Limits per Team

This tutorial shows you how to set up consumer groups and rate limiting in the Keeptrusts gateway so different teams get isolated usage quotas and fair access to LLM providers.

Use this page when

  • You are configuring per-team rate limits using consumer groups.
  • You need to set requests-per-minute, tokens-per-minute, and tokens-per-day caps.
  • You want to expose X-RateLimit-* response headers so callers know their remaining quota.
  • You are testing rate limit enforcement and verifying HTTP 429 responses.

Primary audience

  • Primary: Platform engineers preventing abuse and ensuring fair access to LLM capacity
  • Secondary: Engineering managers defining team quotas; security teams mitigating denial-of-wallet attacks

Prerequisites

  • kt CLI installed (first-run tutorial)
  • An OpenAI-compatible API key exported as OPENAI_API_KEY
  • curl and jq installed

Step 1: Create the Configuration with Consumer Groups

Create policy-config.yaml with consumer groups and rate limits:

version: '1'
providers:
  targets:
    - id: openai
      provider: openai
      secret_key_ref:
        env: OPENAI_API_KEY
consumer_groups:
  - name: engineering
    api_key: kt_cg_engineering_abc123
    rate_limits:
      requests_per_minute: 60
      tokens_per_minute: 100000
      tokens_per_day: 2000000
    allowed_models:
      - gpt-4o-mini
      - gpt-4o
  - name: marketing
    api_key: kt_cg_marketing_def456
    rate_limits:
      requests_per_minute: 20
      tokens_per_minute: 50000
      tokens_per_day: 500000
    allowed_models:
      - gpt-4o-mini
  - name: data-science
    api_key: kt_cg_datascience_ghi789
    rate_limits:
      requests_per_minute: 40
      tokens_per_minute: 200000
      tokens_per_day: 5000000
    allowed_models:
      - gpt-4o-mini
      - gpt-4o
rate_limit_defaults:
  requests_per_minute: 10
  tokens_per_minute: 20000
  tokens_per_day: 100000
  response_headers: true
policies:
  - name: content-filter
    type: content_filter
    action: flag
    config:
      categories:
        - hate
        - violence

Configuration breakdown

| Field | Purpose |
| --- | --- |
| consumer_groups[].api_key | Unique key the team includes in requests to identify themselves |
| rate_limits.requests_per_minute | Max requests per rolling 60-second window |
| rate_limits.tokens_per_minute | Max input+output tokens per rolling 60-second window |
| rate_limits.tokens_per_day | Max input+output tokens per rolling 24-hour window |
| rate_limit_defaults | Applied to requests without a recognized consumer group key |
| response_headers | Include X-RateLimit-* headers in responses |
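
The per-minute and per-day token caps interact: a group that sustains its full tokens_per_minute rate reaches its tokens_per_day cap after tokens_per_day / tokens_per_minute minutes. The sketch below works through that arithmetic for the values in this configuration. minutes_to_daily_cap is a hypothetical helper name used for illustration, not a kt command.

```shell
# minutes_to_daily_cap TOKENS_PER_MINUTE TOKENS_PER_DAY
# Prints how many minutes of sustained maximum per-minute usage
# exhaust the daily token cap (integer division).
minutes_to_daily_cap() {
  echo $(( $2 / $1 ))
}

minutes_to_daily_cap 100000 2000000   # engineering:  prints 20
minutes_to_daily_cap 50000 500000     # marketing:    prints 10
minutes_to_daily_cap 200000 5000000   # data-science: prints 25
minutes_to_daily_cap 20000 100000     # defaults:     prints 5
```

With these values the daily cap, not the per-minute cap, bounds sustained usage: each group can run at its full per-minute token rate for well under an hour per day.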

Step 2: Validate and Start the Gateway

kt policy lint --file policy-config.yaml
kt gateway run --policy-config policy-config.yaml --port 41002

Expected output:

INFO keeptrusts::gateway Loaded 3 consumer group(s)
INFO keeptrusts::gateway Rate limits: engineering=60rpm/100k-tpm, marketing=20rpm/50k-tpm, data-science=40rpm/200k-tpm
INFO keeptrusts::gateway Default rate limit: 10rpm/20k-tpm
INFO keeptrusts::gateway Gateway ready

Step 3: Send Requests with Consumer Group Keys

Requests include the consumer group key in the X-Consumer-Group header:

# Request as engineering team
curl -s -D- http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Consumer-Group: kt_cg_engineering_abc123" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}]
  }' 2>&1 | grep -E "^(X-RateLimit|HTTP)"

Expected response headers:

HTTP/1.1 200 OK
X-RateLimit-Limit-Requests: 60
X-RateLimit-Remaining-Requests: 59
X-RateLimit-Limit-Tokens: 100000
X-RateLimit-Remaining-Tokens: 99850
X-RateLimit-Reset: 2026-04-23T10:31:00Z
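
Scripts that call the gateway often need a single header value rather than the whole header block. The following is a minimal sketch of one way to extract it; ratelimit_header is a hypothetical helper, and it assumes the headers look like the sample above. It matches header names case-insensitively and strips the carriage return that ends each HTTP header line.

```shell
# ratelimit_header NAME
# Reads raw HTTP response headers on stdin and prints the value of the
# first header called NAME (case-insensitive match).
ratelimit_header() {
  awk -v name="$1" '
    BEGIN { want = tolower(name) ":" }
    {
      sub(/\r$/, "")                       # drop the trailing CR from CRLF
      if (index(tolower($0), want) == 1) { # line starts with "name:"
        value = substr($0, length(want) + 1)
        sub(/^[ \t]+/, "", value)          # trim leading whitespace
        print value
        exit
      }
    }'
}

# Demo on the sample headers from this step:
printf 'HTTP/1.1 200 OK\r\nX-RateLimit-Limit-Requests: 60\r\nX-RateLimit-Remaining-Requests: 59\r\n' |
  ratelimit_header x-ratelimit-remaining-requests   # prints 59
```

Against the running gateway, the same function can read from `curl -s -D- -o /dev/null` with the request flags shown above, since `-D-` writes the headers to stdout and `-o /dev/null` discards the body.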

Step 4: Test Rate Limit Enforcement

Send rapid requests as the marketing team (limit: 20 rpm) to trigger the rate limit:

# Send 21 rapid requests
for i in $(seq 1 21); do
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
    http://localhost:41002/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "X-Consumer-Group: kt_cg_marketing_def456" \
    -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"ping"}]}')
  echo "Request $i: HTTP $STATUS"
done

Expected output:

Request 1: HTTP 200
Request 2: HTTP 200
...
Request 20: HTTP 200
Request 21: HTTP 429
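
For larger bursts, a tally per status code is easier to read than one line per request. This sketch assumes the "Request N: HTTP CODE" line format printed by the loop above; count_statuses is a hypothetical helper name.

```shell
# count_statuses
# Reads "Request N: HTTP CODE" lines on stdin and prints one
# "CODE COUNT" line per status code, sorted by code.
count_statuses() {
  awk '{ count[$NF]++ } END { for (code in count) print code, count[code] }' | sort
}

# Demo on output shaped like the expected output of this step:
{
  i=1
  while [ "$i" -le 20 ]; do
    echo "Request $i: HTTP 200"
    i=$((i + 1))
  done
  echo "Request 21: HTTP 429"
} | count_statuses
# prints:
# 200 20
# 429 1
```

Piping the loop from this step into count_statuses gives the same summary for real traffic.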

Every further request in the same window is also rejected with 429 Too Many Requests. Send one more to inspect the error body:

curl -s http://localhost:41002/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Consumer-Group: kt_cg_marketing_def456" \
-d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"ping"}]}' | jq .

Expected output:

{
  "error": {
    "message": "Rate limit exceeded for consumer group: marketing",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "details": {
      "consumer_group": "marketing",
      "limit": "requests_per_minute",
      "limit_value": 20,
      "retry_after_seconds": 45
    }
  }
}
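
A client can use the retry_after_seconds field to wait out the window instead of retrying immediately. The sketch below pulls that value from an error body shaped like the one above. retry_after_seconds as a function name is hypothetical, and the fallback of 1 second when the field is missing is an assumption, not documented gateway behavior.

```shell
# retry_after_seconds
# Reads a 429 error body (JSON) on stdin and prints how long to wait,
# in seconds. Falls back to 1 if the field is absent (assumed default).
retry_after_seconds() {
  jq -r '.error.details.retry_after_seconds // 1'
}

# Example (not run here): wait out the window before retrying.
#   body=$(curl -s http://localhost:41002/v1/chat/completions \
#     -H "Content-Type: application/json" \
#     -H "X-Consumer-Group: kt_cg_marketing_def456" \
#     -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"ping"}]}')
#   sleep "$(printf '%s' "$body" | retry_after_seconds)"
```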

Step 5: Verify Per-Group Isolation

Confirm that the engineering team is unaffected while marketing is throttled:

# Engineering should still work
curl -s -o /dev/null -w "Engineering: HTTP %{http_code}\n" \
http://localhost:41002/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Consumer-Group: kt_cg_engineering_abc123" \
-d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"ping"}]}'

# Marketing is still throttled
curl -s -o /dev/null -w "Marketing: HTTP %{http_code}\n" \
http://localhost:41002/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Consumer-Group: kt_cg_marketing_def456" \
-d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"ping"}]}'

Expected output:

Engineering: HTTP 200
Marketing: HTTP 429
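
Isolation follows from each group having its own window. The simulation below is a sketch of that idea, not the gateway's implementation: simulate_rpm is a hypothetical name, the input format is invented for the demo, and it assumes rejected requests do not consume quota.

```shell
# simulate_rpm
# Reads "SECONDS GROUP LIMIT" lines (sorted by time) on stdin and prints
# "GROUP 200" or "GROUP 429" per request, using a separate rolling
# 60-second window for each group.
simulate_rpm() {
  awk '{
    t = $1; g = $2; limit = $3
    # Count this group'"'"'s accepted requests still inside the window.
    live = 0
    for (i = 1; i <= n[g]; i++)
      if (t - accepted[g, i] < 60) live++
    if (live < limit) {
      n[g]++
      accepted[g, n[g]] = t
      print g, 200
    } else {
      print g, 429
    }
  }'
}

# Demo with small limits: marketing=2 per minute, engineering=3 per minute.
printf '%s\n' \
  '0 marketing 2' \
  '1 marketing 2' \
  '2 marketing 2' \
  '2 engineering 3' \
  '61 marketing 2' | simulate_rpm
# prints:
# marketing 200
# marketing 200
# marketing 429
# engineering 200
# marketing 200
```

The engineering request at second 2 succeeds even though marketing is throttled at that moment, and marketing recovers once its earlier requests fall out of the 60-second window.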

Step 6: Enforce Model Access by Group

The marketing group is restricted to gpt-4o-mini. Test access to a disallowed model:

curl -s -w "\nHTTP %{http_code}\n" \
http://localhost:41002/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Consumer-Group: kt_cg_marketing_def456" \
-d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'

Expected output:

{
  "error": {
    "message": "Model gpt-4o is not allowed for consumer group: marketing",
    "type": "access_denied",
    "code": "model_not_allowed"
  }
}
HTTP 403
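
Conceptually, the check compares the requested model against the group's allowed_models list. The sketch below illustrates that comparison under the assumption that matching is an exact string comparison; model_allowed is a hypothetical helper, not part of the kt CLI.

```shell
# model_allowed MODEL ALLOWED_MODEL...
# Succeeds (exit 0) if MODEL exactly matches one of the allowed models.
model_allowed() {
  wanted=$1
  shift
  for m in "$@"; do
    [ "$m" = "$wanted" ] && return 0
  done
  return 1
}

# marketing allows only gpt-4o-mini:
if model_allowed gpt-4o gpt-4o-mini; then echo allowed; else echo denied; fi
# prints: denied

# engineering allows gpt-4o-mini and gpt-4o:
if model_allowed gpt-4o gpt-4o-mini gpt-4o; then echo allowed; else echo denied; fi
# prints: allowed
```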

Step 7: Monitor Rate Limit Usage

Check current rate limit state per consumer group:

kt events tail --last 5 --format json | jq -c '.[] | {consumer_group, tokens_used: .usage.total_tokens, rate_limit_remaining: .rate_limit.remaining_requests}'

Expected output:

{"consumer_group":"engineering","tokens_used":150,"rate_limit_remaining":58}
{"consumer_group":"marketing","tokens_used":120,"rate_limit_remaining":0}
{"consumer_group":"engineering","tokens_used":200,"rate_limit_remaining":57}
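
To compare usage across teams, the same events can be summed per group. This sketch assumes the event stream is a JSON array whose items carry consumer_group and usage.total_tokens, as the jq filter above implies; tokens_by_group is a hypothetical helper name.

```shell
# tokens_by_group
# Reads a JSON array of events on stdin and prints "GROUP TOTAL_TOKENS"
# per consumer group, sorted by group name.
tokens_by_group() {
  jq -r 'group_by(.consumer_group)[]
         | "\(.[0].consumer_group) \(map(.usage.total_tokens) | add)"'
}

# Example (not run here):
#   kt events tail --last 5 --format json | tokens_by_group
# For the three events shown above this would print:
#   engineering 350
#   marketing 120
```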

Step 8: Handle Unknown Consumer Groups

Requests without a recognized key get default rate limits:

curl -s -D- http://localhost:41002/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}' \
2>&1 | grep X-RateLimit-Limit-Requests

Expected:

X-RateLimit-Limit-Requests: 10

The default limit of 10 rpm applies.
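
The fallback can be pictured as a lookup that only matches known keys. The sketch below mirrors the keys and requests_per_minute values from policy-config.yaml; rpm_for_key is a hypothetical helper, and treating the match as exact and case-sensitive is an assumption.

```shell
# rpm_for_key KEY
# Prints the requests_per_minute limit that would apply to KEY, using
# the values from policy-config.yaml. Unknown or empty keys fall
# through to rate_limit_defaults.
rpm_for_key() {
  case "$1" in
    kt_cg_engineering_abc123) echo 60 ;;
    kt_cg_marketing_def456)   echo 20 ;;
    kt_cg_datascience_ghi789) echo 40 ;;
    *)                        echo 10 ;;  # rate_limit_defaults
  esac
}

rpm_for_key kt_cg_marketing_def456   # prints 20
rpm_for_key kt_cg_marketing_DEF456   # prints 10 (key does not match exactly)
rpm_for_key ""                       # prints 10 (no key sent)
```

This is also why a mistyped key shows up as unexpectedly low limits rather than as an authentication error in this tutorial's configuration.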

For AI systems

  • Canonical terms: Keeptrusts gateway, rate limiting, consumer groups, requests per minute, tokens per minute, tokens per day, HTTP 429.
  • Config fields: consumer_groups[].rate_limits.requests_per_minute, tokens_per_minute, tokens_per_day, rate_limit_defaults, response_headers: true.
  • CLI commands: kt gateway run, kt policy lint, kt events tail --status rate_limited.
  • Response headers: X-RateLimit-Limit-Requests, X-RateLimit-Remaining-Requests, X-RateLimit-Reset-Requests.
  • Best next pages: Consumer Group Isolation, Cost Tracking & Budgets, CORS & IP Allowlist.

For engineers

  • Prerequisites: kt CLI, OPENAI_API_KEY exported, curl and jq.
  • Validate: kt policy lint confirms rate limit values are positive and consumer group names are unique.
  • Test enforcement: send more requests than requests_per_minute allows and expect HTTP 429 with a Retry-After header.
  • Check headers: curl -D- shows X-RateLimit-Remaining-Requests decrementing with each request.
  • Defaults: rate_limit_defaults applies to requests without a recognised consumer group API key.

For leaders

  • Rate limits prevent runaway usage from exhausting provider quotas or blowing budgets.
  • Per-team quotas ensure fair capacity distribution across the organisation.
  • Token-per-day limits cap long-term spending while per-minute limits protect against bursts.
  • Rate limit headers give developers self-service visibility into their remaining quota without contacting platform teams.

Troubleshooting

| Symptom | Cause | Fix |
| --- | --- | --- |
| 429 with low traffic | Token limit hit, not request limit | Check tokens_per_minute quota |
| All groups throttled together | Consumer group key not sent | Include X-Consumer-Group header |
| Default limits applied | Key not matching any group | Verify the API key matches exactly |
| No X-RateLimit-* headers | Headers disabled | Set rate_limit_defaults.response_headers: true |