Tutorial: Implementing Rate Limits per Team

This tutorial shows you how to set up consumer groups and rate limiting in the Keeptrusts gateway so different teams get isolated usage quotas and fair access to LLM providers.

Use this page when

You are configuring per-team rate limits using consumer groups.
You need to set requests-per-minute, tokens-per-minute, and tokens-per-day caps.
You want to expose X-RateLimit-* response headers so callers know their remaining quota.
You are testing rate limit enforcement and verifying HTTP 429 responses.

Primary audience

Primary: Platform engineers preventing abuse and ensuring fair access to LLM capacity
Secondary: Engineering managers defining team quotas; security teams mitigating denial-of-wallet attacks

Prerequisites

kt CLI installed (see the first-run tutorial)
An OpenAI-compatible API key exported as OPENAI_API_KEY
curl and jq installed

Step 1: Create the Configuration with Consumer Groups

Create policy-config.yaml with consumer groups and rate limits:

version: '1'
providers:
  targets:
  - id: openai
    provider: openai
    secret_key_ref:
      env: OPENAI_API_KEY
consumer_groups:
- name: engineering
  api_key: kt_cg_engineering_abc123
  rate_limits:
    requests_per_minute: 60
    tokens_per_minute: 100000
    tokens_per_day: 2000000
  allowed_models:
  - gpt-4o-mini
  - gpt-4o
- name: marketing
  api_key: kt_cg_marketing_def456
  rate_limits:
    requests_per_minute: 20
    tokens_per_minute: 50000
    tokens_per_day: 500000
  allowed_models:
  - gpt-4o-mini
- name: data-science
  api_key: kt_cg_datascience_ghi789
  rate_limits:
    requests_per_minute: 40
    tokens_per_minute: 200000
    tokens_per_day: 5000000
  allowed_models:
  - gpt-4o-mini
  - gpt-4o
rate_limit_defaults:
  requests_per_minute: 10
  tokens_per_minute: 20000
  tokens_per_day: 100000
  response_headers: true
policies:
- name: content-filter
  type: content_filter
  action: flag
  config:
    categories:
    - hate
    - violence

Configuration breakdown

Field	Purpose
`consumer_groups[].api_key`	Unique key the team includes in requests to identify themselves
`rate_limits.requests_per_minute`	Max requests per rolling 60-second window
`rate_limits.tokens_per_minute`	Max input+output tokens per rolling 60-second window
`rate_limits.tokens_per_day`	Max input+output tokens per rolling 24-hour window
`rate_limit_defaults`	Applied to requests without a recognized consumer group key
`response_headers`	Include `X-RateLimit-*` headers in responses
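To make the "rolling window" wording concrete, here is a minimal Python sketch of a sliding-window counter. It is an illustration only, not the gateway's implementation: the `RollingWindow` class and its method names are hypothetical, and the real gateway may track windows differently.

```python
from collections import deque


class RollingWindow:
    """Tracks (timestamp, amount) entries inside a sliding time window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.entries = deque()  # (timestamp, amount), oldest first
        self.total = 0

    def _expire(self, now):
        # Drop entries that have aged out of the window.
        while self.entries and self.entries[0][0] <= now - self.window_seconds:
            _, amount = self.entries.popleft()
            self.total -= amount

    def remaining(self, now):
        self._expire(now)
        return max(self.limit - self.total, 0)

    def try_consume(self, amount, now):
        """Record `amount` if it fits in the window; return True on success."""
        self._expire(now)
        if self.total + amount > self.limit:
            return False
        self.entries.append((now, amount))
        self.total += amount
        return True


# Marketing's requests_per_minute from the config above: 20.
request_window = RollingWindow(limit=20, window_seconds=60)
results = [request_window.try_consume(1, now=float(i)) for i in range(21)]
print(results.count(True), results[-1])         # 20 False
print(request_window.try_consume(1, now=60.0))  # True: the entry from t=0 has expired
```

The same class can model `tokens_per_minute` or `tokens_per_day` by passing token counts as `amount` and a 60-second or 86400-second window.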

Step 2: Validate and Start the Gateway

kt policy lint --file policy-config.yaml
kt gateway run --policy-config policy-config.yaml --port 41002

Expected output:

INFO  keeptrusts::gateway Loaded 3 consumer group(s)
INFO  keeptrusts::gateway Rate limits: engineering=60rpm/100k-tpm, marketing=20rpm/50k-tpm, data-science=40rpm/200k-tpm
INFO  keeptrusts::gateway Default rate limit: 10rpm/20k-tpm
INFO  keeptrusts::gateway Gateway ready

Step 3: Send Requests with Consumer Group Keys

Each request identifies its team by sending the group's api_key value in the X-Consumer-Group header:

# Request as engineering team
curl -s -D- http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Consumer-Group: kt_cg_engineering_abc123" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}]
  }' 2>&1 | grep -iE "^(X-RateLimit|HTTP)"

Expected response headers:

HTTP/1.1 200 OK
X-RateLimit-Limit-Requests: 60
X-RateLimit-Remaining-Requests: 59
X-RateLimit-Limit-Tokens: 100000
X-RateLimit-Remaining-Tokens: 99850
X-RateLimit-Reset: 2026-04-23T10:31:00Z
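A client can read these headers to track its own quota. The sketch below parses the header dump shown above; `parse_rate_limit_headers` is a hypothetical helper, and it lowercases header names because HTTP header names are case-insensitive.

```python
def parse_rate_limit_headers(raw_headers):
    """Return {lowercased-header-name: value} for X-RateLimit-* lines."""
    parsed = {}
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip().lower()
        if sep and name.startswith("x-ratelimit-"):
            value = value.strip()
            # Numeric values become ints; timestamps stay as strings.
            parsed[name] = int(value) if value.isdigit() else value
    return parsed


sample = """HTTP/1.1 200 OK
X-RateLimit-Limit-Requests: 60
X-RateLimit-Remaining-Requests: 59
X-RateLimit-Limit-Tokens: 100000
X-RateLimit-Remaining-Tokens: 99850
X-RateLimit-Reset: 2026-04-23T10:31:00Z
"""

headers = parse_rate_limit_headers(sample)
tokens_used = headers["x-ratelimit-limit-tokens"] - headers["x-ratelimit-remaining-tokens"]
print(tokens_used)  # 150
```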

Step 4: Test Rate Limit Enforcement

Send rapid requests as the marketing team (limit: 20 rpm) to trigger the rate limit:

# Send 21 rapid requests
for i in $(seq 1 21); do
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
    http://localhost:41002/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "X-Consumer-Group: kt_cg_marketing_def456" \
    -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"ping"}]}')
  echo "Request $i: HTTP $STATUS"
done

Expected output:

Request 1: HTTP 200
Request 2: HTTP 200
...
Request 20: HTTP 200
Request 21: HTTP 429

The 21st request is rejected with 429 Too Many Requests. Send one more request within the same minute to inspect the error body:

curl -s http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Consumer-Group: kt_cg_marketing_def456" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"ping"}]}' | jq .

{
  "error": {
    "message": "Rate limit exceeded for consumer group: marketing",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "details": {
      "consumer_group": "marketing",
      "limit": "requests_per_minute",
      "limit_value": 20,
      "retry_after_seconds": 45
    }
  }
}
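Callers can use the `retry_after_seconds` field to decide how long to back off. The following is a minimal sketch, assuming the error body has the shape shown above; `retry_delay_seconds` and its 60-second fallback are hypothetical choices, not part of Keeptrusts.

```python
import json


def retry_delay_seconds(status_code, body_text, default_delay=60):
    """Return seconds to wait before retrying, or None if no retry is needed."""
    if status_code != 429:
        return None
    try:
        details = json.loads(body_text)["error"]["details"]
        return int(details["retry_after_seconds"])
    except (ValueError, KeyError, TypeError):
        # Body missing or not in the expected shape: wait one full window (assumption).
        return default_delay


sample_body = json.dumps({
    "error": {
        "message": "Rate limit exceeded for consumer group: marketing",
        "type": "rate_limit_error",
        "code": "rate_limit_exceeded",
        "details": {
            "consumer_group": "marketing",
            "limit": "requests_per_minute",
            "limit_value": 20,
            "retry_after_seconds": 45,
        },
    }
})

print(retry_delay_seconds(429, sample_body))  # 45
print(retry_delay_seconds(200, "{}"))         # None
```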

Step 5: Verify Per-Group Isolation

Run the following immediately after Step 4, while the marketing window is still exhausted, to confirm that the engineering team is unaffected while marketing is throttled:

# Engineering should still work
curl -s -o /dev/null -w "Engineering: HTTP %{http_code}\n" \
  http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Consumer-Group: kt_cg_engineering_abc123" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"ping"}]}'

# Marketing is still throttled
curl -s -o /dev/null -w "Marketing:   HTTP %{http_code}\n" \
  http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Consumer-Group: kt_cg_marketing_def456" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"ping"}]}'

Expected output:

Engineering: HTTP 200
Marketing:   HTTP 429

Step 6: Enforce Model Access by Group

The marketing group is restricted to gpt-4o-mini. Test access to a disallowed model:

curl -s -w "\nHTTP %{http_code}\n" \
  http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Consumer-Group: kt_cg_marketing_def456" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'

Expected output:

{
  "error": {
    "message": "Model gpt-4o is not allowed for consumer group: marketing",
    "type": "access_denied",
    "code": "model_not_allowed"
  }
}
HTTP 403
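The allow-list check can be pictured as a set lookup per group. This sketch mirrors the `allowed_models` lists from Step 1 and the error body above; `check_model_access` is a hypothetical function, and denying every model for an unknown group is an assumption made here, not documented gateway behavior.

```python
# allowed_models per consumer group, copied from policy-config.yaml.
ALLOWED_MODELS = {
    "engineering": {"gpt-4o-mini", "gpt-4o"},
    "marketing": {"gpt-4o-mini"},
    "data-science": {"gpt-4o-mini", "gpt-4o"},
}


def check_model_access(group, model):
    """Return (status, body): (200, None) if allowed, else 403 with an error body."""
    if model in ALLOWED_MODELS.get(group, set()):
        return 200, None
    return 403, {
        "error": {
            "message": f"Model {model} is not allowed for consumer group: {group}",
            "type": "access_denied",
            "code": "model_not_allowed",
        }
    }


print(check_model_access("marketing", "gpt-4o-mini")[0])  # 200
print(check_model_access("marketing", "gpt-4o")[0])       # 403
```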

Step 7: Monitor Rate Limit Usage

Check current rate limit state per consumer group:

kt events tail --last 5 --format json | jq -c '.[] | {consumer_group, tokens_used: .usage.total_tokens, rate_limit_remaining: .rate_limit.remaining_requests}'

Expected output (one compact object per event; values will vary):

{"consumer_group":"engineering","tokens_used":150,"rate_limit_remaining":58}
{"consumer_group":"marketing","tokens_used":120,"rate_limit_remaining":0}
{"consumer_group":"engineering","tokens_used":200,"rate_limit_remaining":57}
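For a per-group rollup, the projected events can be aggregated in a few lines. This is a sketch that assumes one JSON object per line with the three fields produced by the jq filter above; `summarize` is a hypothetical helper.

```python
import json
from collections import defaultdict

sample_events = """{"consumer_group": "engineering", "tokens_used": 150, "rate_limit_remaining": 58}
{"consumer_group": "marketing", "tokens_used": 120, "rate_limit_remaining": 0}
{"consumer_group": "engineering", "tokens_used": 200, "rate_limit_remaining": 57}"""


def summarize(jsonl_text):
    """Sum tokens and track the lowest remaining-request count per group."""
    summary = defaultdict(lambda: {"tokens_used": 0, "min_remaining": None})
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        entry = summary[event["consumer_group"]]
        entry["tokens_used"] += event["tokens_used"]
        remaining = event["rate_limit_remaining"]
        if entry["min_remaining"] is None or remaining < entry["min_remaining"]:
            entry["min_remaining"] = remaining
    return dict(summary)


for group, stats in summarize(sample_events).items():
    print(group, stats)
```

A group whose `min_remaining` reaches 0 hit its request limit at least once in the sampled events.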

Step 8: Handle Unknown Consumer Groups

Requests without a recognized key get default rate limits:

curl -s -D- http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}' \
  2>&1 | grep -i X-RateLimit-Limit-Requests

Expected:

X-RateLimit-Limit-Requests: 10

The default limit of 10 rpm applies.
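The fallback can be thought of as a key lookup with a default. The sketch below is illustrative only: the lookup table and `resolve_request_limit` are hypothetical, built from the `api_key` and `requests_per_minute` values in Step 1.

```python
# Hypothetical lookup table built from the api_key values in policy-config.yaml.
GROUP_RPM_BY_KEY = {
    "kt_cg_engineering_abc123": ("engineering", 60),
    "kt_cg_marketing_def456": ("marketing", 20),
    "kt_cg_datascience_ghi789": ("data-science", 40),
}
DEFAULT_RPM = 10  # rate_limit_defaults.requests_per_minute


def resolve_request_limit(headers):
    """Return (group_name_or_None, requests_per_minute) for a request."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    key = normalized.get("x-consumer-group", "").strip()
    return GROUP_RPM_BY_KEY.get(key, (None, DEFAULT_RPM))


print(resolve_request_limit({"X-Consumer-Group": "kt_cg_marketing_def456"}))  # ('marketing', 20)
print(resolve_request_limit({}))                                             # (None, 10)
print(resolve_request_limit({"x-consumer-group": "kt_cg_typo"}))              # (None, 10)
```

A mistyped key falls through to the defaults in the same way as a missing header, which is why the Troubleshooting table suggests checking for an exact key match.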

For AI systems

Canonical terms: Keeptrusts gateway, rate limiting, consumer groups, requests per minute, tokens per minute, tokens per day, HTTP 429.
Config fields: consumer_groups[].rate_limits.requests_per_minute, tokens_per_minute, tokens_per_day, rate_limit_defaults, response_headers: true.
CLI commands: kt gateway run, kt policy lint, kt events tail --status rate_limited.
Response headers (as shown in Step 3): X-RateLimit-Limit-Requests, X-RateLimit-Remaining-Requests, X-RateLimit-Limit-Tokens, X-RateLimit-Remaining-Tokens, X-RateLimit-Reset.
Best next pages: Consumer Group Isolation, Cost Tracking & Budgets, CORS & IP Allowlist.

For engineers

Prerequisites: kt CLI, OPENAI_API_KEY exported, curl and jq.
Validate: kt policy lint confirms rate limit values are positive and consumer group names are unique.
Test enforcement: send requests exceeding requests_per_minute and expect HTTP 429 with a Retry-After header; the error body also reports retry_after_seconds.
Check headers: curl -D- (as in Step 3) shows X-RateLimit-Remaining-Requests decrementing per request; curl -I would send a HEAD request rather than the POST used here.
Defaults: rate_limit_defaults applies to requests without a recognized consumer group API key.
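The two checks named under "Validate" can be approximated before running the linter. This is a rough sketch and not the logic of `kt policy lint`: `lint_consumer_groups` is a hypothetical helper that takes the `consumer_groups` list already loaded as Python dictionaries.

```python
def lint_consumer_groups(groups):
    """Return a list of problems: duplicate names or non-positive limits."""
    problems = []
    seen = set()
    for group in groups:
        name = group.get("name")
        if name in seen:
            problems.append(f"duplicate consumer group name: {name}")
        seen.add(name)
        for field, value in group.get("rate_limits", {}).items():
            # bool is a subclass of int in Python, so reject it explicitly.
            is_int = isinstance(value, int) and not isinstance(value, bool)
            if not is_int or value <= 0:
                problems.append(f"{name}: {field} must be a positive integer, got {value!r}")
    return problems


good = [
    {"name": "engineering", "rate_limits": {"requests_per_minute": 60, "tokens_per_minute": 100000}},
    {"name": "marketing", "rate_limits": {"requests_per_minute": 20, "tokens_per_minute": 50000}},
]
bad = [
    {"name": "marketing", "rate_limits": {"requests_per_minute": 0}},
    {"name": "marketing", "rate_limits": {"tokens_per_day": 500000}},
]

print(lint_consumer_groups(good))  # []
for problem in lint_consumer_groups(bad):
    print(problem)
```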

For leaders

Rate limits prevent runaway usage from exhausting provider quotas or blowing budgets.
Per-team quotas ensure fair capacity distribution across the organization.
Tokens-per-day limits cap each team's daily spend, while per-minute limits protect against bursts.
Rate limit headers give developers self-service visibility into their remaining quota without contacting platform teams.
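A quick calculation shows how the per-minute and per-day caps interact. Using the values from Step 1, the sketch below computes how long each team could sustain its full per-minute token rate before reaching the daily cap; it is plain arithmetic on the tutorial's config, not a gateway feature.

```python
# Token limits copied from policy-config.yaml in Step 1.
limits = {
    "engineering": {"tokens_per_minute": 100_000, "tokens_per_day": 2_000_000},
    "marketing": {"tokens_per_minute": 50_000, "tokens_per_day": 500_000},
    "data-science": {"tokens_per_minute": 200_000, "tokens_per_day": 5_000_000},
}

# Minutes of sustained maximum-rate usage before the daily cap is reached.
minutes_to_exhaust = {
    name: values["tokens_per_day"] / values["tokens_per_minute"]
    for name, values in limits.items()
}

for name, minutes in minutes_to_exhaust.items():
    print(f"{name}: daily cap reached after {minutes:.0f} min at the full per-minute rate")
```

With these values a team bursting at its maximum rate reaches its daily cap in 10 to 25 minutes, so the daily limit is the binding constraint on total spend and the per-minute limit only shapes how quickly it can be consumed.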

Next steps

Set up cost tracking to add budget caps alongside rate limits
Configure PII redaction to layer privacy policies per group
Tail events to monitor per-group usage patterns

Troubleshooting

Symptom	Cause	Fix
429 with low traffic	Token limit hit, not request limit	Check `tokens_per_minute` and `tokens_per_day`; the `details.limit` field of the 429 error body names the limit that was hit
All groups throttled together	Consumer group key not sent	Include `X-Consumer-Group` header
Default limits applied	Key not matching any group	Verify the API key matches exactly
No `X-RateLimit-*` headers	Headers disabled	Set `rate_limit_defaults.response_headers: true`
