Tutorial: Implementing Rate Limits per Team
This tutorial shows you how to set up consumer groups and rate limiting in the Keeptrusts gateway so different teams get isolated usage quotas and fair access to LLM providers.
Use this page when
- You are configuring per-team rate limits using consumer groups.
- You need to set requests-per-minute, tokens-per-minute, and tokens-per-day caps.
- You want to expose X-RateLimit-* response headers so callers know their remaining quota.
- You are testing rate limit enforcement and verifying HTTP 429 responses.
Primary audience
- Primary: Platform engineers preventing abuse and ensuring fair access to LLM capacity
- Secondary: Engineering managers defining team quotas; security teams mitigating denial-of-wallet attacks
Prerequisites
- kt CLI installed (first-run tutorial)
- An OpenAI-compatible API key exported as OPENAI_API_KEY
- curl and jq installed
Step 1: Create the Configuration with Consumer Groups
Create policy-config.yaml with consumer groups and rate limits:
version: '1'
providers:
  targets:
    - id: openai
      provider: openai
      secret_key_ref:
        env: OPENAI_API_KEY
consumer_groups:
  - name: engineering
    api_key: kt_cg_engineering_abc123
    rate_limits:
      requests_per_minute: 60
      tokens_per_minute: 100000
      tokens_per_day: 2000000
    allowed_models:
      - gpt-4o-mini
      - gpt-4o
  - name: marketing
    api_key: kt_cg_marketing_def456
    rate_limits:
      requests_per_minute: 20
      tokens_per_minute: 50000
      tokens_per_day: 500000
    allowed_models:
      - gpt-4o-mini
  - name: data-science
    api_key: kt_cg_datascience_ghi789
    rate_limits:
      requests_per_minute: 40
      tokens_per_minute: 200000
      tokens_per_day: 5000000
    allowed_models:
      - gpt-4o-mini
      - gpt-4o
rate_limit_defaults:
  requests_per_minute: 10
  tokens_per_minute: 20000
  tokens_per_day: 100000
  response_headers: true
policies:
  - name: content-filter
    type: content_filter
    action: flag
    config:
      categories:
        - hate
        - violence
Configuration breakdown
| Field | Purpose |
|---|---|
| consumer_groups[].api_key | Unique key the team includes in requests to identify themselves |
| rate_limits.requests_per_minute | Max requests per rolling 60-second window |
| rate_limits.tokens_per_minute | Max input+output tokens per rolling 60-second window |
| rate_limits.tokens_per_day | Max input+output tokens per rolling 24-hour window |
| rate_limit_defaults | Applied to requests without a recognized consumer group key |
| response_headers | Include X-RateLimit-* headers in responses |
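The table describes each limit as a rolling window. As a rough illustration of what that means, here is a minimal Python sketch of a rolling-window request counter. It is not the gateway's implementation: the class name RollingWindowLimiter and its allow method are hypothetical, and timestamps are passed in explicitly so the behaviour is easy to trace.

```python
from collections import deque


class RollingWindowLimiter:
    """Allow at most `limit` events within any `window_seconds` span (illustrative only)."""

    def __init__(self, limit: int, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        # Forget accepted requests that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.limit:
            return False  # the caller would answer with HTTP 429
        self._timestamps.append(now)
        return True
```

With limit=20 this mirrors the marketing group's requests_per_minute: a 21st call inside 60 seconds returns False, and capacity comes back as soon as the oldest accepted request is 60 seconds old.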
Step 2: Validate and Start the Gateway
kt policy lint --file policy-config.yaml
kt gateway run --policy-config policy-config.yaml --port 41002
Expected output:
INFO keeptrusts::gateway Loaded 3 consumer group(s)
INFO keeptrusts::gateway Rate limits: engineering=60rpm/100k-tpm, marketing=20rpm/50k-tpm, data-science=40rpm/200k-tpm
INFO keeptrusts::gateway Default rate limit: 10rpm/20k-tpm
INFO keeptrusts::gateway Gateway ready
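This page states that kt policy lint checks that rate limit values are positive and that consumer group names are unique. If you generate the configuration programmatically, a local pre-check along the same lines can catch mistakes before you run the linter. The sketch below assumes the YAML has already been parsed into a Python dict; the function name precheck_consumer_groups is hypothetical and it does not replace kt policy lint.

```python
def precheck_consumer_groups(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    seen_names = set()
    for group in config.get("consumer_groups", []):
        name = group.get("name", "<unnamed>")
        if name in seen_names:
            problems.append(f"duplicate consumer group name: {name}")
        seen_names.add(name)
        for field, value in group.get("rate_limits", {}).items():
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
                problems.append(f"{name}: {field} must be a positive integer, got {value!r}")
    return problems
```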
Step 3: Send Requests with Consumer Group Keys
Requests include the consumer group key in the X-Consumer-Group header:
# Request as engineering team
curl -s -D- http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Consumer-Group: kt_cg_engineering_abc123" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}]
  }' 2>&1 | grep -E "^(X-RateLimit|HTTP)"
Expected response headers:
HTTP/1.1 200 OK
X-RateLimit-Limit-Requests: 60
X-RateLimit-Remaining-Requests: 59
X-RateLimit-Limit-Tokens: 100000
X-RateLimit-Remaining-Tokens: 99850
X-RateLimit-Reset: 2026-04-23T10:31:00Z
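A client can read these headers to slow down before it is throttled. The following is a small sketch, assuming the header names shown in the expected output above; the helper names parse_rate_limit_headers and requests_used_fraction are hypothetical.

```python
def parse_rate_limit_headers(raw_headers: str) -> dict:
    """Collect X-RateLimit-* headers from raw response header text, keyed in lowercase."""
    limits = {}
    for line in raw_headers.splitlines():
        # Split on the first colon only, so timestamp values stay intact.
        name, sep, value = line.partition(":")
        key = name.strip().lower()
        if sep and key.startswith("x-ratelimit-"):
            limits[key] = value.strip()
    return limits


def requests_used_fraction(limits: dict) -> float:
    """Share of the per-minute request quota already consumed (0.0 to 1.0)."""
    limit = int(limits["x-ratelimit-limit-requests"])
    remaining = int(limits["x-ratelimit-remaining-requests"])
    return (limit - remaining) / limit
```

A caller might, for example, pause new work once the used fraction passes a threshold of its own choosing.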
Step 4: Test Rate Limit Enforcement
Send rapid requests as the marketing team (limit: 20 rpm) to trigger the rate limit:
# Send 21 rapid requests
for i in $(seq 1 21); do
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
    http://localhost:41002/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "X-Consumer-Group: kt_cg_marketing_def456" \
    -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"ping"}]}')
  echo "Request $i: HTTP $STATUS"
done
Expected output:
Request 1: HTTP 200
Request 2: HTTP 200
...
Request 20: HTTP 200
Request 21: HTTP 429
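Once the limit is reached, every further request in the same window receives a 429 Too Many Requests response. Send one more request to inspect the error body: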
curl -s http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Consumer-Group: kt_cg_marketing_def456" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"ping"}]}' | jq .
{
  "error": {
    "message": "Rate limit exceeded for consumer group: marketing",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "details": {
      "consumer_group": "marketing",
      "limit": "requests_per_minute",
      "limit_value": 20,
      "retry_after_seconds": 45
    }
  }
}
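Callers can use retry_after_seconds from the error body to decide how long to wait. Here is one possible client-side sketch. It assumes the error shape shown above; send is a hypothetical callable that performs the request and returns a (status_code, parsed_json_body) tuple, and the function names are illustrative.

```python
import time


def retry_delay(body: dict, fallback_seconds: float = 1.0) -> float:
    """Read error.details.retry_after_seconds, or use the fallback if it is absent."""
    details = body.get("error", {}).get("details", {})
    value = details.get("retry_after_seconds")
    if isinstance(value, (int, float)) and value >= 0:
        return float(value)
    return fallback_seconds


def call_with_retry(send, max_attempts: int = 3, sleep=time.sleep):
    """Call send() until it stops returning 429 or the attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        # Return on success, on any non-429 error, or when attempts are exhausted.
        if status != 429 or attempt == max_attempts:
            return status, body
        sleep(retry_delay(body))
```

Passing sleep as a parameter keeps the helper easy to test without real waiting.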
Step 5: Verify Per-Group Isolation
Confirm that the engineering team is unaffected while marketing is throttled:
# Engineering should still work
curl -s -o /dev/null -w "Engineering: HTTP %{http_code}\n" \
  http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Consumer-Group: kt_cg_engineering_abc123" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"ping"}]}'
# Marketing is still throttled
curl -s -o /dev/null -w "Marketing: HTTP %{http_code}\n" \
  http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Consumer-Group: kt_cg_marketing_def456" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"ping"}]}'
Expected output:
Engineering: HTTP 200
Marketing: HTTP 429
Step 6: Enforce Model Access by Group
The marketing group is restricted to gpt-4o-mini. Test access to a disallowed model:
curl -s -w "\nHTTP %{http_code}\n" \
  http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Consumer-Group: kt_cg_marketing_def456" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'
Expected output:
{
  "error": {
    "message": "Model gpt-4o is not allowed for consumer group: marketing",
    "type": "access_denied",
    "code": "model_not_allowed"
  }
}
HTTP 403
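A 403 for a disallowed model and a 429 for an exhausted quota call for different client reactions: waiting helps with the latter but not the former. The sketch below shows one way a client might branch on the status and error code values that appear in this tutorial; the function name next_action and its return labels are hypothetical.

```python
def next_action(status: int, body: dict) -> str:
    """Map a gateway response to a coarse client decision (illustrative labels)."""
    code = body.get("error", {}).get("code")
    if status == 429 and code == "rate_limit_exceeded":
        return "wait_and_retry"      # quota frees up as the window rolls forward
    if status == 403 and code == "model_not_allowed":
        return "use_allowed_model"   # retrying the same model will keep failing
    if 200 <= status < 300:
        return "ok"
    return "fail"
```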
Step 7: Monitor Rate Limit Usage
Check current rate limit state per consumer group:
kt events tail --last 5 --format json | jq -c '.[] | {consumer_group, tokens_used: .usage.total_tokens, rate_limit_remaining: .rate_limit.remaining_requests}'
Expected output:
{"consumer_group":"engineering","tokens_used":150,"rate_limit_remaining":58}
{"consumer_group":"marketing","tokens_used":120,"rate_limit_remaining":0}
{"consumer_group":"engineering","tokens_used":200,"rate_limit_remaining":57}
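To turn per-event lines like these into a per-group summary, something like the following could be used. It is a sketch that assumes one JSON object per line with exactly the three fields shown in the expected output; the function name summarize_events is hypothetical.

```python
import json
from collections import defaultdict


def summarize_events(lines) -> dict:
    """Sum tokens and track the lowest remaining-request count per consumer group."""
    summary = defaultdict(lambda: {"tokens_used": 0, "min_remaining": None})
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in piped input
        event = json.loads(line)
        entry = summary[event["consumer_group"]]
        entry["tokens_used"] += event["tokens_used"]
        remaining = event["rate_limit_remaining"]
        if entry["min_remaining"] is None or remaining < entry["min_remaining"]:
            entry["min_remaining"] = remaining
    return dict(summary)
```

Because the function accepts any iterable of lines, it can be fed sys.stdin when the jq output is piped into a script.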
Step 8: Handle Unknown Consumer Groups
Requests without a recognized key get default rate limits:
curl -s -D- http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}' \
  2>&1 | grep X-RateLimit-Limit-Requests
Expected:
X-RateLimit-Limit-Requests: 10
The default limit of 10 rpm applies.
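The fallback behaviour can be pictured as a simple lookup: match the presented key against each group, otherwise use rate_limit_defaults. The sketch below illustrates that logic against the configuration structure from Step 1, parsed into a dict. It is an assumption about the behaviour described here, not the gateway's code, and the function name resolve_rate_limits is hypothetical.

```python
def resolve_rate_limits(config: dict, consumer_key):
    """Return (group_name, limits); group_name is None when defaults apply."""
    if consumer_key is not None:
        for group in config.get("consumer_groups", []):
            if group.get("api_key") == consumer_key:
                return group["name"], group["rate_limits"]
    # No header, or a key that matches no group: fall back to the defaults.
    # response_headers is a toggle rather than a limit, so leave it out.
    defaults = {
        field: value
        for field, value in config["rate_limit_defaults"].items()
        if field != "response_headers"
    }
    return None, defaults
```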
For AI systems
- Canonical terms: Keeptrusts gateway, rate limiting, consumer groups, requests per minute, tokens per minute, tokens per day, HTTP 429.
- Config fields: consumer_groups[].rate_limits.requests_per_minute, tokens_per_minute, tokens_per_day, rate_limit_defaults, response_headers: true.
- CLI commands: kt gateway run, kt policy lint, kt events tail --status rate_limited.
- Response headers: X-RateLimit-Limit-Requests, X-RateLimit-Remaining-Requests, X-RateLimit-Reset-Requests.
- Best next pages: Consumer Group Isolation, Cost Tracking & Budgets, CORS & IP Allowlist.
For engineers
- Prerequisites: kt CLI, OPENAI_API_KEY exported, curl and jq.
- Validate: kt policy lint confirms rate limit values are positive and consumer group names are unique.
- Test enforcement: send requests exceeding requests_per_minute and expect HTTP 429 with a Retry-After header.
- Check headers: curl -s -D- (as in Step 3) shows X-RateLimit-Remaining-Requests decrementing per request. Avoid curl -I here, since it sends a HEAD request rather than the POST the endpoint expects.
- Defaults: rate_limit_defaults applies to requests without a recognized consumer group API key.
For leaders
- Rate limits prevent runaway usage from exhausting provider quotas or blowing budgets.
- Per-team quotas ensure fair capacity distribution across the organisation.
- Token-per-day limits cap long-term spending while per-minute limits protect against bursts.
- Rate limit headers give developers self-service visibility into their remaining quota without contacting platform teams.
Next steps
- Set up cost tracking to add budget caps alongside rate limits
- Configure PII redaction to layer privacy policies per group
- Tail events to monitor per-group usage patterns
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| 429 with low traffic | Token limit hit, not request limit | Check tokens_per_minute quota |
| All groups throttled together | Consumer group key not sent | Include X-Consumer-Group header |
| Default limits applied | Key not matching any group | Verify the API key matches exactly |
| No X-RateLimit-* headers | Headers disabled | Set rate_limit_defaults.response_headers: true |
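The first row (429 with low traffic) follows from simple arithmetic: dividing tokens_per_minute by requests_per_minute gives the average request size above which the token quota runs out before the request quota. The sketch below works that out; both function names are hypothetical and the calculation ignores tokens_per_day.

```python
def break_even_tokens(requests_per_minute: int, tokens_per_minute: int) -> float:
    """Average tokens per request at which both per-minute limits are hit together."""
    return tokens_per_minute / requests_per_minute


def first_limit_hit(requests_per_minute: int, tokens_per_minute: int,
                    avg_tokens_per_request: float) -> str:
    """Name the per-minute limit a steady stream of equal-sized requests reaches first."""
    # Requests per minute that the token budget alone would allow.
    requests_allowed_by_tokens = tokens_per_minute / avg_tokens_per_request
    if requests_allowed_by_tokens < requests_per_minute:
        return "tokens_per_minute"
    return "requests_per_minute"
```

For the marketing group (20 rpm, 50000 tpm) the break-even is 2500 tokens per request, so a handful of long prompts can trigger a 429 well before 20 requests have been sent.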