Handling Policy Blocks & Errors Gracefully
The Keeptrusts gateway returns structured errors when policies block requests or when upstream providers fail. This guide covers every error type, the envelope format, and production-grade handling patterns.
Use this page when
- You need to handle 409 policy blocks, 429 rate limits, or 5xx gateway errors in your application.
- You want the full error envelope reference (types, codes, fields) for the Keeptrusts gateway.
- You are implementing retry logic, fallback patterns, or user-facing error messages for policy violations.
- You need to differentiate between `input_blocked`, `output_blocked`, `budget_exceeded`, and `escalation_required`.
Audience
- Primary: Developers building production AI applications that must handle gateway errors gracefully
- Secondary: QA Engineers testing error paths, SRE/DevOps Engineers monitoring gateway health
Error Envelope Format
All gateway errors follow a consistent JSON envelope:
```json
{
  "error": {
    "type": "policy_violation",
    "message": "Request blocked: prompt contains disallowed content",
    "policy": "block-prompt-injection",
    "code": "input_blocked"
  }
}
```
Envelope Fields
| Field | Type | Description |
|---|---|---|
| `type` | string | Error category: `policy_violation`, `rate_limit`, `upstream_error`, `validation_error` |
| `message` | string | Human-readable explanation |
| `policy` | string | Name of the policy that triggered the block (only for `policy_violation`) |
| `code` | string | Machine-readable error code |
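If you are not using an SDK, you can inspect the envelope directly from the raw HTTP response. A minimal sketch using `httpx` (the library choice is illustrative; the base URL and placeholder key mirror the examples below):

```python
import httpx

resp = httpx.post(
    "http://localhost:41002/v1/chat/completions",
    headers={"Authorization": "Bearer sk-..."},
    json={"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]},
)
if resp.status_code >= 400:
    # Every gateway error carries the same envelope shape
    envelope = resp.json().get("error", {})
    print(resp.status_code, envelope.get("type"), envelope.get("code"), envelope.get("policy"))
```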
HTTP Status Codes
| Status | Meaning | Action |
|---|---|---|
| 409 | Policy block — input or output violated a policy | Do not retry; show user-friendly message or use fallback |
| 422 | Validation error — malformed request | Fix the request payload |
| 429 | Rate limit — gateway or provider throttling | Retry with exponential backoff |
| 502 | Upstream provider error — provider returned an error | Retry once, then fail gracefully |
| 503 | Gateway unavailable — gateway is starting or overloaded | Retry with backoff |
| 504 | Gateway timeout — upstream provider did not respond | Retry with longer timeout or switch model |
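The table above reduces to a small decision function. This sketch encodes the same rules; the function and return-value names are illustrative, not part of the gateway API:

```python
def retry_action(status: int) -> str:
    """Map a gateway HTTP status to the handling strategy from the table above."""
    if status == 409:
        return "no_retry"       # policy block is deterministic; retrying cannot succeed
    if status == 422:
        return "fix_request"    # correct the payload; retrying as-is will fail again
    if status in (429, 503):
        return "retry_backoff"  # exponential backoff
    if status in (502, 504):
        return "retry_limited"  # retry once / with a longer timeout, then fail gracefully
    return "raise"
```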
Error Codes Reference
Policy Violation Codes (409)
| Code | Trigger |
|---|---|
| `input_blocked` | Input policy blocked the prompt |
| `output_blocked` | Output policy blocked the response |
| `tool_blocked` | Tool/function call blocked by policy |
| `budget_exceeded` | Token or cost budget limit reached |
| `escalation_required` | Request requires human approval |
Rate Limit Codes (429)
| Code | Trigger |
|---|---|
| `gateway_rate_limit` | Gateway-level rate limit |
| `provider_rate_limit` | Upstream provider rate limit (forwarded) |
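When handling 429s, an explicit `Retry-After` header should take precedence over fixed backoff. Whether the gateway forwards such a header from the provider is an assumption here; fall back to exponential backoff when it is absent:

```python
def backoff_seconds(attempt: int, retry_after: str | None) -> float:
    """Prefer an explicit Retry-After value; otherwise back off exponentially."""
    if retry_after:
        try:
            return float(retry_after)  # Retry-After given in seconds
        except ValueError:
            pass  # the HTTP-date form of Retry-After is ignored in this sketch
    return float(2 ** attempt)
```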
Python Error Handling
Comprehensive Handler
```python
from openai import OpenAI, APIStatusError, APIConnectionError
import time

client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="sk-...",
)

def governed_completion(
    messages: list[dict],
    max_retries: int = 3,
    fallback_response: str = "I'm unable to process this request right now.",
) -> str:
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
            )
            return response.choices[0].message.content or ""
        except APIStatusError as e:
            if e.status_code == 409:
                # Policy block: deterministic, do not retry
                error_body = e.response.json()
                error_info = error_body.get("error", {})
                code = error_info.get("code", "unknown")
                policy = error_info.get("policy", "unknown")
                message = error_info.get("message", "Request blocked")
                if code == "escalation_required":
                    return "This request requires human approval. It has been escalated."
                if code == "budget_exceeded":
                    return "Usage budget exceeded. Please contact your administrator."
                return f"Request blocked ({policy}): {message}"
            if e.status_code in (429, 502, 503, 504):
                # Rate limits and transient gateway/upstream errors: exponential backoff
                time.sleep(2 ** attempt)
                continue
            raise
        except APIConnectionError:
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)
                continue
            return fallback_response
    return fallback_response
```
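Callers always receive a string back, either the model output or a user-safe message. For example:

```python
reply = governed_completion([{"role": "user", "content": "Summarize our refund policy."}])
print(reply)  # model output, a block explanation, or the fallback string
```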
Async Handler
```python
import asyncio
from openai import AsyncOpenAI, APIStatusError

client = AsyncOpenAI(
    base_url="http://localhost:41002/v1",
    api_key="sk-...",
)

async def governed_completion_async(
    messages: list[dict],
    max_retries: int = 3,
) -> str:
    for attempt in range(max_retries):
        try:
            response = await client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
            )
            return response.choices[0].message.content or ""
        except APIStatusError as e:
            if e.status_code == 409:
                error_body = e.response.json()
                return f"Blocked: {error_body.get('error', {}).get('message', 'Policy violation')}"
            if e.status_code in (429, 502, 503, 504) and attempt < max_retries - 1:
                await asyncio.sleep(2 ** attempt)
                continue
            raise
    return "Request failed after retries"
```
TypeScript Error Handling
Comprehensive Handler
```typescript
import OpenAI from "openai";

interface PolicyErrorBody {
  error: {
    type: string;
    message: string;
    policy?: string;
    code: string;
  };
}

const client = new OpenAI({
  baseURL: process.env.LLM_GATEWAY_URL ?? "http://localhost:41002/v1",
  apiKey: process.env.OPENAI_API_KEY,
});

async function governedCompletion(
  messages: OpenAI.ChatCompletionMessageParam[],
  maxRetries = 3,
): Promise<string> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await client.chat.completions.create({
        model: "gpt-4o",
        messages,
      });
      return response.choices[0].message.content ?? "";
    } catch (err) {
      if (err instanceof OpenAI.APIError) {
        if (err.status === 409) {
          const body = err.error as PolicyErrorBody["error"];
          return `Blocked (${body?.policy ?? "unknown"}): ${body?.message ?? "Policy violation"}`;
        }
        if (
          err.status !== undefined &&
          [429, 502, 503, 504].includes(err.status) &&
          attempt < maxRetries - 1
        ) {
          await new Promise((r) => setTimeout(r, 2 ** attempt * 1000));
          continue;
        }
      }
      throw err;
    }
  }
  return "Request failed after retries";
}
```
Fallback Patterns
Model Fallback
When the primary model is blocked or unavailable, fall back to an alternative:
```python
def completion_with_fallback(messages: list[dict]) -> str:
    models = ["gpt-4o", "gpt-4o-mini"]
    for model in models:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages,
            )
            return response.choices[0].message.content or ""
        except APIStatusError as e:
            if e.status_code == 409:
                # A policy block applies to the input, so switching models will not help
                message = e.response.json().get("error", {}).get("message", "Policy violation")
                return f"Blocked: {message}"
            if e.status_code in (502, 504):
                continue  # upstream failure: try the next model
            raise
    return "All models unavailable"
```
Cached Response Fallback
```python
import hashlib

# Simple in-process cache keyed by a hash of the prompt
_response_cache: dict[str, str] = {}

def completion_with_cache_fallback(prompt: str) -> str:
    key = hashlib.sha256(prompt.encode()).hexdigest()[:16]
    try:
        result = governed_completion([{"role": "user", "content": prompt}])
        _response_cache[key] = result  # refresh the cached copy on success
        return result
    except Exception:
        cached = _response_cache.get(key)
        if cached:
            return cached  # serve the last known-good answer
        return "Service temporarily unavailable"
```
Logging and Observability
Log policy blocks for operational visibility:
```python
import logging

logger = logging.getLogger("ai_governance")

def observed_completion(messages: list[dict]) -> str:
    try:
        response = client.chat.completions.create(model="gpt-4o", messages=messages)
        logger.info("completion_success", extra={"model": "gpt-4o"})
        return response.choices[0].message.content or ""
    except APIStatusError as e:
        if e.status_code == 409:
            error_body = e.response.json().get("error", {})
            logger.warning(
                "policy_block",
                extra={
                    "policy": error_body.get("policy"),
                    "code": error_body.get("code"),
                    # "message" is reserved on LogRecord, so use a different key
                    "detail": error_body.get("message"),
                },
            )
            return f"Blocked: {error_body.get('message')}"
        logger.error("api_error", extra={"status": e.status_code})
        raise
```
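For quick operational visibility you can also tally blocks per policy in-process. This is an illustrative stand-in, not part of the gateway; a real deployment would export these counts to your metrics backend:

```python
from collections import Counter

policy_block_counts: Counter[str] = Counter()

def record_policy_block(policy: str | None) -> None:
    """Call this inside the 409 branch above to count blocks per policy."""
    policy_block_counts[policy or "unknown"] += 1
```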
Best Practices
- Never retry on 409: policy blocks are deterministic; the same input will always be blocked.
- Use exponential backoff for 429/5xx: start at 1s, double each attempt, cap at 3 retries.
- Type your error envelopes: define interfaces such as `PolicyErrorBody` for type-safe handling.
- Differentiate error codes: `escalation_required` and `budget_exceeded` need different UX flows.
- Log every policy block: correlate with gateway decision events for debugging.
- Provide fallback responses: users should see helpful messages, not raw error JSON.
- Set client timeouts to prevent hung connections from blocking your application (see the sketch after this list).
- Test error paths: use observe-only policies to generate realistic 409 responses during development.
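For the timeout and retry bullets above, both knobs can be set when constructing the client. A sketch with the openai Python SDK, which otherwise applies its own automatic retries that can mask these rules:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="sk-...",
    timeout=30.0,    # seconds; fail fast instead of hanging toward a 504
    max_retries=0,   # disable SDK-level retries so your own logic decides what to retry
)
```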
Next steps
- Developer Quick Start — set up the gateway and send your first request
- Streaming Patterns — handling errors during streaming
- Function Calling — tool-specific error handling
For AI systems
- Canonical terms: error envelope, policy block (409), rate limit (429), upstream error (502), gateway timeout (504), `type`, `code`, `policy`, `message`.
- Error codes: `input_blocked`, `output_blocked`, `tool_blocked`, `budget_exceeded`, `escalation_required`, `gateway_rate_limit`, `provider_rate_limit`.
- Never retry on 409 (deterministic). Exponential backoff for 429/5xx (start 1s, double, cap 3 retries).
- Best next pages: Developer Quick Start, Streaming Patterns, Function Calling.
For engineers
- All gateway errors follow a consistent JSON envelope: `{"error": {"type", "message", "policy", "code"}}`.
- 409 policy blocks are deterministic: the same input always triggers the same block. Do not retry.
- Differentiate error codes: `escalation_required` needs a human-approval UX; `budget_exceeded` needs a top-up flow.
- Use exponential backoff for 429 (start at 1s, double each attempt, cap at 3 retries).
- Type your error envelopes (`PolicyErrorBody` interface) for compile-time safety in TypeScript applications.
- Log every policy block with the `policy` field to correlate with gateway decision events for debugging.
- Set client timeouts to prevent hung connections during 504 scenarios.
For leaders
- Policy blocks (409) are expected behavior, not system failures — they indicate governance is working.
- Applications must provide fallback UX for blocks; users should see helpful messages, not raw error JSON.
- `escalation_required` errors create a human-in-the-loop workflow; plan staffing for escalation review.
- Budget-exceeded errors may indicate insufficient wallet allocation or unexpectedly expensive model usage.
- Error handling quality directly impacts user trust in governed AI systems.