Handling Policy Blocks & Errors Gracefully

The Keeptrusts gateway returns structured errors when policies block requests or when upstream providers fail. This guide covers every error type, the envelope format, and production-grade handling patterns.

Use this page when

  • You need to handle 409 policy blocks, 429 rate limits, or 5xx gateway errors in your application.
  • You want the full error envelope reference (types, codes, fields) for the Keeptrusts gateway.
  • You are implementing retry logic, fallback patterns, or user-facing error messages for policy violations.
  • You need to differentiate between input_blocked, output_blocked, budget_exceeded, and escalation_required.

Primary audience

  • Primary: Developers building production AI applications that must handle gateway errors gracefully
  • Secondary: QA Engineers testing error paths, SRE/DevOps Engineers monitoring gateway health

Error Envelope Format

All gateway errors follow a consistent JSON envelope:

{
  "error": {
    "type": "policy_violation",
    "message": "Request blocked: prompt contains disallowed content",
    "policy": "block-prompt-injection",
    "code": "input_blocked"
  }
}

Envelope Fields

| Field | Type | Description |
| --- | --- | --- |
| type | string | Error category: policy_violation, rate_limit, upstream_error, validation_error |
| message | string | Human-readable explanation |
| policy | string | Policy name that triggered (only for policy_violation) |
| code | string | Machine-readable error code |

HTTP Status Codes

| Status | Meaning | Action |
| --- | --- | --- |
| 409 | Policy block — input or output violated a policy | Do not retry; show user-friendly message or use fallback |
| 422 | Validation error — malformed request | Fix the request payload |
| 429 | Rate limit — gateway or provider throttling | Retry with exponential backoff |
| 502 | Upstream provider error — provider returned an error | Retry once, then fail gracefully |
| 503 | Gateway unavailable — gateway is starting or overloaded | Retry with backoff |
| 504 | Gateway timeout — upstream provider did not respond | Retry with longer timeout or switch model |
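
The table above can be collapsed into a small status classifier for client code. This is a minimal sketch; retry_action and its return labels are illustrative names, not part of any SDK:

```python
def retry_action(status: int) -> str:
    """Map a gateway HTTP status to a client-side action, per the table above."""
    if status == 409:
        return "no_retry"              # policy blocks are deterministic
    if status == 422:
        return "fix_request"           # malformed payload; retrying won't help
    if status in (429, 503):
        return "retry_backoff"         # throttling or temporary unavailability
    if status == 502:
        return "retry_once"            # upstream provider error
    if status == 504:
        return "retry_longer_timeout"  # slow upstream; retry or switch model
    return "raise"                     # unexpected status: surface to the caller

print(retry_action(409))  # no_retry
print(retry_action(429))  # retry_backoff
```

Centralizing this decision keeps the retry policy consistent across every call site.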

Error Codes Reference

Policy Violation Codes (409)

| Code | Trigger |
| --- | --- |
| input_blocked | Input policy blocked the prompt |
| output_blocked | Output policy blocked the response |
| tool_blocked | Tool/function call blocked by policy |
| budget_exceeded | Token or cost budget limit reached |
| escalation_required | Request requires human approval |
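
Since each code calls for a different user experience, one option is a lookup from code to a user-facing message. The wording below is illustrative; adapt it to your product:

```python
# Illustrative user-facing copy per policy-violation code.
USER_MESSAGES = {
    "input_blocked": "Your request was blocked by a content policy.",
    "output_blocked": "The response was blocked by a content policy.",
    "tool_blocked": "A tool call was blocked by policy.",
    "budget_exceeded": "Usage budget exceeded. Please contact your administrator.",
    "escalation_required": "This request requires human approval. It has been escalated.",
}

def user_message(code: str) -> str:
    """Return friendly copy for a 409 error code, with a safe default."""
    return USER_MESSAGES.get(code, "Your request could not be processed.")
```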

Rate Limit Codes (429)

| Code | Trigger |
| --- | --- |
| gateway_rate_limit | Gateway-level rate limit |
| provider_rate_limit | Upstream provider rate limit (forwarded) |

Python Error Handling

Comprehensive Handler

from openai import OpenAI, APIStatusError, APIConnectionError
import time

client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="sk-...",
)

def governed_completion(
    messages: list[dict],
    max_retries: int = 3,
    fallback_response: str = "I'm unable to process this request right now.",
) -> str:
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
            )
            return response.choices[0].message.content or ""

        except APIStatusError as e:
            if e.status_code == 409:
                # Policy block — do not retry
                error_body = e.response.json()
                error_info = error_body.get("error", {})
                code = error_info.get("code", "unknown")
                policy = error_info.get("policy", "unknown")
                message = error_info.get("message", "Request blocked")

                if code == "escalation_required":
                    return "This request requires human approval. It has been escalated."
                if code == "budget_exceeded":
                    return "Usage budget exceeded. Please contact your administrator."

                return f"Request blocked ({policy}): {message}"

            if e.status_code in (429, 502, 503, 504):
                # Transient: back off exponentially, then retry
                wait = 2 ** attempt
                time.sleep(wait)
                continue

            raise

        except APIConnectionError:
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)
                continue
            return fallback_response

    return fallback_response

Async Handler

import asyncio
from openai import AsyncOpenAI, APIStatusError

client = AsyncOpenAI(
    base_url="http://localhost:41002/v1",
    api_key="sk-...",
)

async def governed_completion_async(
    messages: list[dict],
    max_retries: int = 3,
) -> str:
    for attempt in range(max_retries):
        try:
            response = await client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
            )
            return response.choices[0].message.content or ""
        except APIStatusError as e:
            if e.status_code == 409:
                error_body = e.response.json()
                return f"Blocked: {error_body.get('error', {}).get('message', 'Policy violation')}"
            if e.status_code in (429, 502, 503, 504) and attempt < max_retries - 1:
                await asyncio.sleep(2 ** attempt)
                continue
            raise
    return "Request failed after retries"

TypeScript Error Handling

Comprehensive Handler

import OpenAI from "openai";

interface PolicyErrorBody {
  error: {
    type: string;
    message: string;
    policy?: string;
    code: string;
  };
}

const client = new OpenAI({
  baseURL: process.env.LLM_GATEWAY_URL ?? "http://localhost:41002/v1",
  apiKey: process.env.OPENAI_API_KEY,
});

async function governedCompletion(
  messages: OpenAI.ChatCompletionMessageParam[],
  maxRetries = 3,
): Promise<string> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await client.chat.completions.create({
        model: "gpt-4o",
        messages,
      });
      return response.choices[0].message.content ?? "";
    } catch (err) {
      if (err instanceof OpenAI.APIError) {
        if (err.status === 409) {
          const body = err.error as PolicyErrorBody["error"];
          return `Blocked (${body?.policy ?? "unknown"}): ${body?.message ?? "Policy violation"}`;
        }
        if (
          err.status !== undefined &&
          [429, 502, 503, 504].includes(err.status) &&
          attempt < maxRetries - 1
        ) {
          await new Promise((r) => setTimeout(r, 2 ** attempt * 1000));
          continue;
        }
      }
      throw err;
    }
  }
  return "Request failed after retries";
}

Fallback Patterns

Model Fallback

When the primary model is blocked or unavailable, fall back to an alternative:

def completion_with_fallback(messages: list[dict]) -> str:
    models = ["gpt-4o", "gpt-4o-mini"]
    for model in models:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages,
            )
            return response.choices[0].message.content or ""
        except APIStatusError as e:
            if e.status_code == 409:
                return f"Blocked: {e.response.json().get('error', {}).get('message')}"
            if e.status_code in (502, 504):
                continue
            raise
    return "All models unavailable"

Cached Response Fallback

import hashlib

# Simple in-memory cache keyed by prompt hash, refreshed on every success.
# (An lru_cache-wrapped function would re-invoke the failing call inside the
# except branch instead of returning the stored result, so a plain dict is
# used here.)
_response_cache: dict[str, str] = {}

def completion_with_cache_fallback(prompt: str) -> str:
    prompt_hash = hashlib.sha256(prompt.encode()).hexdigest()[:16]
    try:
        result = governed_completion([{"role": "user", "content": prompt}])
        _response_cache[prompt_hash] = result  # update cache on success
        return result
    except Exception:
        cached = _response_cache.get(prompt_hash)
        if cached:
            return cached
        return "Service temporarily unavailable"

Logging and Observability

Log policy blocks for operational visibility:

import logging

logger = logging.getLogger("ai_governance")

def observed_completion(messages: list[dict]) -> str:
    try:
        response = client.chat.completions.create(model="gpt-4o", messages=messages)
        logger.info("completion_success", extra={"model": "gpt-4o"})
        return response.choices[0].message.content or ""
    except APIStatusError as e:
        if e.status_code == 409:
            error_body = e.response.json().get("error", {})
            logger.warning(
                "policy_block",
                extra={
                    "policy": error_body.get("policy"),
                    "code": error_body.get("code"),
                    # "message" is a reserved LogRecord attribute, so use
                    # a different key for the envelope text
                    "block_message": error_body.get("message"),
                },
            )
            return f"Blocked: {error_body.get('message')}"
        logger.error("api_error", extra={"status": e.status_code})
        raise

Best Practices

  1. Never retry on 409 — policy blocks are deterministic; the same input will always be blocked.
  2. Use exponential backoff for 429/5xx — start at 1s, double each attempt, cap at 3 retries.
  3. Type your error envelopes — define interfaces for PolicyErrorBody for type-safe handling.
  4. Differentiate error codes — escalation_required and budget_exceeded need different UX flows.
  5. Log every policy block — correlate with gateway decision events for debugging.
  6. Provide fallback responses — users should see helpful messages, not raw error JSON.
  7. Set client timeouts — prevent hung connections from blocking your application.
  8. Test error paths — use observe-only policies to generate realistic 409 responses during development.
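
The backoff schedule from practice 2 can be sketched as a small helper; jitter is an optional addition to avoid synchronized retry storms, and backoff_schedule is an illustrative name:

```python
import random

def backoff_schedule(
    max_retries: int = 3,
    base: float = 1.0,
    jitter: float = 0.0,
) -> list[float]:
    """Exponential backoff delays: start at `base` seconds, double each attempt."""
    return [
        base * (2 ** attempt) + random.uniform(0, jitter)
        for attempt in range(max_retries)
    ]

print(backoff_schedule())  # [1.0, 2.0, 4.0]
```

Precomputing the schedule makes retry timing easy to assert in tests instead of burying `2 ** attempt` inside each handler.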

Next steps

For AI systems

  • Canonical terms: error envelope, policy block (409), rate limit (429), upstream error (502), gateway timeout (504), type, code, policy, message.
  • Error codes: input_blocked, output_blocked, tool_blocked, budget_exceeded, escalation_required, gateway_rate_limit, provider_rate_limit.
  • Never retry on 409 (deterministic). Exponential backoff for 429/5xx (start 1s, double, cap 3 retries).
  • Best next pages: Developer Quick Start, Streaming Patterns, Function Calling.

For engineers

  • All gateway errors follow a consistent JSON envelope: {"error": {"type", "message", "policy", "code"}}.
  • 409 policy blocks are deterministic — the same input always triggers the same block. Do not retry.
  • Differentiate error codes: escalation_required needs a human-approval UX; budget_exceeded needs a top-up flow.
  • Use exponential backoff for 429 (start at 1s, double each attempt, cap at 3 retries).
  • Type your error envelopes (PolicyErrorBody interface) for compile-time safety in TypeScript applications.
  • Log every policy block with the policy field to correlate with gateway decision events for debugging.
  • Set client timeouts to prevent hung connections during 504 scenarios.

For leaders

  • Policy blocks (409) are expected behavior, not system failures — they indicate governance is working.
  • Applications must provide fallback UX for blocks; users should see helpful messages, not raw error JSON.
  • escalation_required errors create a human-in-the-loop workflow — plan staffing for escalation review.
  • Budget-exceeded errors may indicate insufficient wallet allocation or unexpectedly expensive model usage.
  • Error handling quality directly impacts user trust in governed AI systems.