Handling Policy Blocks & Errors Gracefully
The Keeptrusts gateway returns structured errors when policies block requests or when upstream providers fail. This guide covers every error type, the envelope format, and production-grade handling patterns.
Use this page when
- You need to handle 409 policy blocks, 429 rate limits, or 5xx gateway errors in your application.
- You want the full error envelope reference (types, codes, fields) for the Keeptrusts gateway.
- You are implementing retry logic, fallback patterns, or user-facing error messages for policy violations.
- You need to differentiate between `input_blocked`, `output_blocked`, `budget_exceeded`, and `escalation_required`.
Audience
- Primary: Developers building production AI applications that must handle gateway errors gracefully
- Secondary: QA Engineers testing error paths, SRE/DevOps Engineers monitoring gateway health
Error Envelope Format
All gateway errors follow a consistent JSON envelope:
```json
{
  "error": {
    "type": "policy_violation",
    "message": "Request blocked: prompt contains disallowed content",
    "policy": "block-prompt-injection",
    "code": "input_blocked"
  }
}
```
Envelope Fields
| Field | Type | Description |
|---|---|---|
| `type` | string | Error category: `policy_violation`, `rate_limit`, `upstream_error`, `validation_error` |
| `message` | string | Human-readable explanation |
| `policy` | string | Name of the policy that triggered the block (only for `policy_violation`) |
| `code` | string | Machine-readable error code |
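If you are not using an SDK, you can inspect the envelope directly from the raw HTTP response. A minimal sketch using `httpx` (the library choice is illustrative; the base URL and placeholder key mirror the examples below):

```python
import httpx

resp = httpx.post(
    "http://localhost:41002/v1/chat/completions",
    headers={"Authorization": "Bearer sk-..."},
    json={"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]},
)
if resp.status_code >= 400:
    # Every gateway error carries the same envelope shape
    envelope = resp.json().get("error", {})
    print(resp.status_code, envelope.get("type"), envelope.get("code"), envelope.get("policy"))
```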
HTTP Status Codes
| Status | Meaning | Action |
|---|---|---|
| 409 | Policy block — input or output violated a policy | Do not retry; show user-friendly message or use fallback |
| 422 | Validation error — malformed request | Fix the request payload |
| 429 | Rate limit — gateway or provider throttling | Retry with exponential backoff |
| 502 | Upstream provider error — provider returned an error | Retry once, then fail gracefully |
| 503 | Gateway unavailable — gateway is starting or overloaded | Retry with backoff |
| 504 | Gateway timeout — upstream provider did not respond | Retry with longer timeout or switch model |
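The table above reduces to a small decision function. This sketch encodes the same rules; the function and return-value names are illustrative, not part of the gateway API:

```python
def retry_action(status: int) -> str:
    """Map a gateway HTTP status to the handling strategy from the table above."""
    if status == 409:
        return "no_retry"       # policy block is deterministic; retrying cannot succeed
    if status == 422:
        return "fix_request"    # correct the payload; retrying as-is will fail again
    if status in (429, 503):
        return "retry_backoff"  # exponential backoff
    if status in (502, 504):
        return "retry_limited"  # retry once / with a longer timeout, then fail gracefully
    return "raise"
```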
Error Codes Reference
Policy Violation Codes (409)
| Code | Trigger |
|---|---|
| `input_blocked` | Input policy blocked the prompt |
| `output_blocked` | Output policy blocked the response |
| `tool_blocked` | Tool/function call blocked by policy |
| `budget_exceeded` | Token or cost budget limit reached |
| `escalation_required` | Request requires human approval |
Rate Limit Codes (429)
| Code | Trigger |
|---|---|
| `gateway_rate_limit` | Gateway-level rate limit |
| `provider_rate_limit` | Upstream provider rate limit (forwarded) |
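When handling 429s, an explicit `Retry-After` header should take precedence over fixed backoff. Whether the gateway forwards such a header from the provider is an assumption here; fall back to exponential backoff when it is absent:

```python
def backoff_seconds(attempt: int, retry_after: str | None) -> float:
    """Prefer an explicit Retry-After value; otherwise back off exponentially."""
    if retry_after:
        try:
            return float(retry_after)  # Retry-After given in seconds
        except ValueError:
            pass  # the HTTP-date form of Retry-After is ignored in this sketch
    return float(2 ** attempt)
```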
Python Error Handling
Comprehensive Handler
```python
from openai import OpenAI, APIStatusError, APIConnectionError
import time

client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="sk-...",
)

def governed_completion(
    messages: list[dict],
    max_retries: int = 3,
    fallback_response: str = "I'm unable to process this request right now.",
) -> str:
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
            )
            return response.choices[0].message.content or ""
        except APIStatusError as e:
            if e.status_code == 409:
                # Policy block: deterministic, do not retry
                error_body = e.response.json()
                error_info = error_body.get("error", {})
                code = error_info.get("code", "unknown")
                policy = error_info.get("policy", "unknown")
                message = error_info.get("message", "Request blocked")
                if code == "escalation_required":
                    return "This request requires human approval. It has been escalated."
                if code == "budget_exceeded":
                    return "Usage budget exceeded. Please contact your administrator."
                return f"Request blocked ({policy}): {message}"
            if e.status_code in (429, 502, 503, 504):
                # Rate limits and transient gateway/upstream errors: exponential backoff
                time.sleep(2 ** attempt)
                continue
            raise
        except APIConnectionError:
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)
                continue
            return fallback_response
    return fallback_response
```
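Callers always receive a string back, either the model output or a user-safe message. For example:

```python
reply = governed_completion([{"role": "user", "content": "Summarize our refund policy."}])
print(reply)  # model output, a block explanation, or the fallback string
```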
Async Handler
```python
import asyncio
from openai import AsyncOpenAI, APIStatusError

client = AsyncOpenAI(
    base_url="http://localhost:41002/v1",
    api_key="sk-...",
)

async def governed_completion_async(
    messages: list[dict],
    max_retries: int = 3,
) -> str:
    for attempt in range(max_retries):
        try:
            response = await client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
            )
            return response.choices[0].message.content or ""
        except APIStatusError as e:
            if e.status_code == 409:
                error_body = e.response.json()
                return f"Blocked: {error_body.get('error', {}).get('message', 'Policy violation')}"
            if e.status_code in (429, 502, 503, 504) and attempt < max_retries - 1:
                await asyncio.sleep(2 ** attempt)
                continue
            raise
    return "Request failed after retries"
```
TypeScript Error Handling
Comprehensive Handler
```typescript
import OpenAI from "openai";

interface PolicyErrorBody {
  error: {
    type: string;
    message: string;
    policy?: string;
    code: string;
  };
}

const client = new OpenAI({
  baseURL: process.env.LLM_GATEWAY_URL ?? "http://localhost:41002/v1",
  apiKey: process.env.OPENAI_API_KEY,
});

async function governedCompletion(
  messages: OpenAI.ChatCompletionMessageParam[],
  maxRetries = 3,
): Promise<string> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await client.chat.completions.create({
        model: "gpt-4o",
        messages,
      });
      return response.choices[0].message.content ?? "";
    } catch (err) {
      if (err instanceof OpenAI.APIError) {
        if (err.status === 409) {
          const body = err.error as PolicyErrorBody["error"];
          return `Blocked (${body?.policy ?? "unknown"}): ${body?.message ?? "Policy violation"}`;
        }
        if (
          err.status !== undefined &&
          [429, 502, 503, 504].includes(err.status) &&
          attempt < maxRetries - 1
        ) {
          await new Promise((r) => setTimeout(r, 2 ** attempt * 1000));
          continue;
        }
      }
      throw err;
    }
  }
  return "Request failed after retries";
}
```
Fallback Patterns
Model Fallback
When the primary model is blocked or unavailable, fall back to an alternative:
```python
def completion_with_fallback(messages: list[dict]) -> str:
    models = ["gpt-4o", "gpt-4o-mini"]
    for model in models:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages,
            )
            return response.choices[0].message.content or ""
        except APIStatusError as e:
            if e.status_code == 409:
                # A policy block applies to the input, so switching models will not help
                message = e.response.json().get("error", {}).get("message", "Policy violation")
                return f"Blocked: {message}"
            if e.status_code in (502, 504):
                continue  # upstream failure: try the next model
            raise
    return "All models unavailable"
```
Cached Response Fallback
```python
import hashlib

# Simple in-process cache keyed by a hash of the prompt
_response_cache: dict[str, str] = {}

def completion_with_cache_fallback(prompt: str) -> str:
    key = hashlib.sha256(prompt.encode()).hexdigest()[:16]
    try:
        result = governed_completion([{"role": "user", "content": prompt}])
        _response_cache[key] = result  # refresh the cached copy on success
        return result
    except Exception:
        cached = _response_cache.get(key)
        if cached:
            return cached  # serve the last known-good answer
        return "Service temporarily unavailable"
```
Logging and Observability
Log policy blocks for operational visibility:
```python
import logging

logger = logging.getLogger("ai_governance")

def observed_completion(messages: list[dict]) -> str:
    try:
        response = client.chat.completions.create(model="gpt-4o", messages=messages)
        logger.info("completion_success", extra={"model": "gpt-4o"})
        return response.choices[0].message.content or ""
    except APIStatusError as e:
        if e.status_code == 409:
            error_body = e.response.json().get("error", {})
            logger.warning(
                "policy_block",
                extra={
                    "policy": error_body.get("policy"),
                    "code": error_body.get("code"),
                    # "message" is reserved on LogRecord, so use a different key
                    "detail": error_body.get("message"),
                },
            )
            return f"Blocked: {error_body.get('message')}"
        logger.error("api_error", extra={"status": e.status_code})
        raise
```
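For quick operational visibility you can also tally blocks per policy in-process. This is an illustrative stand-in, not part of the gateway; a real deployment would export these counts to your metrics backend:

```python
from collections import Counter

policy_block_counts: Counter[str] = Counter()

def record_policy_block(policy: str | None) -> None:
    """Call this inside the 409 branch above to count blocks per policy."""
    policy_block_counts[policy or "unknown"] += 1
```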
Best Practices
- Never retry on 409: policy blocks are deterministic; the same input will always be blocked.
- Use exponential backoff for 429/5xx: start at 1s, double each attempt, cap at 3 retries.
- Type your error envelopes: define interfaces such as `PolicyErrorBody` for type-safe handling.
- Differentiate error codes: `escalation_required` and `budget_exceeded` need different UX flows.
- Log every policy block: correlate with gateway decision events for debugging.
- Provide fallback responses: users should see helpful messages, not raw error JSON.
- Set client timeouts to prevent hung connections from blocking your application (see the sketch after this list).
- Test error paths: use observe-only policies to generate realistic 409 responses during development.
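For the timeout and retry bullets above, both knobs can be set when constructing the client. A sketch with the openai Python SDK, which otherwise applies its own automatic retries that can mask these rules:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="sk-...",
    timeout=30.0,    # seconds; fail fast instead of hanging toward a 504
    max_retries=0,   # disable SDK-level retries so your own logic decides what to retry
)
```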
Next steps
- Developer Quick Start — set up the gateway and send your first request
- Streaming Patterns — handling errors during streaming
- Function Calling — tool-specific error handling
For AI systems
- Canonical terms: error envelope, policy block (409), rate limit (429), upstream error (502), gateway timeout (504), `type`, `code`, `policy`, `message`.
- Error codes: `input_blocked`, `output_blocked`, `tool_blocked`, `budget_exceeded`, `escalation_required`, `gateway_rate_limit`, `provider_rate_limit`.
- Never retry on 409 (deterministic). Exponential backoff for 429/5xx (start 1s, double, cap 3 retries).
- Best next pages: Developer Quick Start, Streaming Patterns, Function Calling.
For engineers
- All gateway errors follow a consistent JSON envelope: `{"error": {"type", "message", "policy", "code"}}`.
- 409 policy blocks are deterministic: the same input always triggers the same block. Do not retry.
- Differentiate error codes: `escalation_required` needs a human-approval UX; `budget_exceeded` needs a top-up flow.
- Use exponential backoff for 429 (start at 1s, double each attempt, cap at 3 retries).
- Type your error envelopes (`PolicyErrorBody` interface) for compile-time safety in TypeScript applications.
- Log every policy block with the `policy` field to correlate with gateway decision events for debugging.
- Set client timeouts to prevent hung connections during 504 scenarios.
For leaders
- Policy blocks (409) are expected behavior, not system failures — they indicate governance is working.
- Applications must provide fallback UX for blocks; users should see helpful messages, not raw error JSON.
- `escalation_required` errors create a human-in-the-loop workflow; plan staffing for escalation review.
- Budget-exceeded errors may indicate insufficient wallet allocation or unexpectedly expensive model usage.
- Error handling quality directly impacts user trust in governed AI systems.