Debugging AI Requests with Events
Every request through the Keeptrusts gateway produces a decision event containing the full lifecycle: policy evaluation, upstream latency, token usage, and the final decision. This guide shows you how to use kt events tail, the console Events page, and filtering to diagnose issues fast.
Use this page when
- You need to debug why an AI request was blocked, slow, or returned unexpected results.
- You want to use kt events tail for real-time event streaming during development.
- You are filtering decision events by token name, model, status, or decision type.
- You need to trace end-to-end latency and distinguish gateway overhead from provider latency.
Primary audience
- Primary: Developers debugging AI request issues in development and staging
- Secondary: SRE Engineers investigating production incidents, Platform Engineers monitoring gateway health
Decision Event Structure
Each event captures the complete request lifecycle:
{
"id": "evt_a1b2c3d4",
"timestamp": "2026-04-23T14:30:12Z",
"model": "gpt-4o",
"provider": "openai",
"decision": "allowed",
"policies_evaluated": [
{"name": "block-pii-output", "result": "pass"},
{"name": "max-tokens", "result": "pass"},
{"name": "log-all", "result": "logged"}
],
"latency_ms": 842,
"upstream_latency_ms": 780,
"tokens": {"prompt": 156, "completion": 89, "total": 245},
"token_name": "app-production",
"status_code": 200
}
Key Fields
| Field | Description |
|---|---|
| decision | allowed, blocked, escalated, or modified |
| policies_evaluated | List of policies and their individual results |
| latency_ms | Total round-trip time (gateway overhead + upstream) |
| upstream_latency_ms | Time spent waiting for the LLM provider |
| token_name | Which API or gateway key was used |
| status_code | HTTP status returned to the client |
Using kt events tail
The CLI provides a real-time event stream for debugging:
Basic Tail
kt events tail
Output streams events as they arrive:
14:30:12 [allowed] gpt-4o 842ms 245tok app-production
14:30:15 [blocked] gpt-4o 12ms 0tok dev-prototyping → block-pii-output
14:30:18 [allowed] gpt-4o-mini 356ms 128tok frontend-key
Filtering Events
Filter by decision type:
# Only blocked requests
kt events tail --filter "decision=blocked"
# Only a specific model
kt events tail --filter "model=gpt-4o"
# Only a specific token
kt events tail --filter "token_name=app-production"
# Combine filters
kt events tail --filter "decision=blocked,model=gpt-4o"
Limiting Output
# Last 20 events
kt events tail --limit 20
# Events from the last hour
kt events tail --since 1h
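These flags can be combined. For example, to review recent blocked requests in one pass (a sketch that assumes --filter, --since, and --limit compose in a single invocation):
# Blocked requests from the last hour, at most 50 events
kt events tail --filter "decision=blocked" --since 1h --limit 50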
JSON Output for Scripting
kt events tail --limit 5 --output json | jq '.[] | {decision, model, latency_ms}'
{"decision": "allowed", "model": "gpt-4o", "latency_ms": 842}
{"decision": "blocked", "model": "gpt-4o", "latency_ms": 12}
{"decision": "allowed", "model": "gpt-4o-mini", "latency_ms": 356}
Console Events Page
The management console provides a visual Events page with advanced filtering.
Navigating to Events
- Open the console at your deployment URL.
- Click Events in the sidebar.
- The page displays recent events with sortable columns.
Filtering in the Console
Use the filter bar to narrow results:
- Decision: allowed, blocked, escalated
- Model: Select from models in use
- Time range: Last hour, last 24h, last 7 days, or custom
- Token: Filter by token name
- Policy: Filter by triggering policy name
Event Detail View
Click any event row to see the full detail:
- Request summary: model, provider, timestamp
- Policy evaluation chain: each policy's result in order
- Timing breakdown: gateway overhead vs. upstream latency
- Token usage: prompt, completion, and total tokens
- Error details: for blocked or failed requests
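If you prefer to stay in the terminal, a similar breakdown can be assembled from the CLI. This is a sketch built only on the fields shown in the event structure above:
kt events tail --limit 1 --output json | \
  jq '.[0] | {model, provider, timestamp, decision, policies_evaluated,
              timing: {total: .latency_ms, upstream: .upstream_latency_ms,
                       overhead: (.latency_ms - .upstream_latency_ms)},
              tokens}'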
Debugging Common Issues
"Why Was My Request Blocked?"
kt events tail --filter "decision=blocked" --limit 5
Check the policies_evaluated array to find which policy triggered:
{
"decision": "blocked",
"policies_evaluated": [
{"name": "block-prompt-injection", "result": "blocked", "reason": "pattern match: ignore previous"}
]
}
Fix: Adjust the policy pattern or rephrase the prompt.
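To pull just the triggering policy and its reason out of recent blocked events, you can combine the filter with jq (a sketch; it assumes blocked policy entries carry a reason field, as in the example above):
kt events tail --filter "decision=blocked" --limit 5 --output json | \
  jq -r '.[] | .policies_evaluated[] | select(.result == "blocked") | "\(.name): \(.reason)"'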
"Why Is My Request Slow?"
Compare latency_ms and upstream_latency_ms:
kt events tail --limit 10 --output json | \
jq '.[] | {model, total: .latency_ms, upstream: .upstream_latency_ms, overhead: (.latency_ms - .upstream_latency_ms)}'
{"model": "gpt-4o", "total": 842, "upstream": 780, "overhead": 62}
{"model": "gpt-4o", "total": 2340, "upstream": 2290, "overhead": 50}
If overhead is high (>200ms), check:
- Number of active policies (each adds evaluation time)
- Knowledge base injection size
- Network latency between gateway and API
If upstream is high, the provider is slow — consider switching models or using a fallback chain.
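To surface only the events where gateway overhead crosses that 200ms threshold, filter in jq (a sketch using the same fields as above):
kt events tail --limit 100 --output json | \
  jq '.[] | select((.latency_ms - .upstream_latency_ms) > 200) | {id, model, overhead_ms: (.latency_ms - .upstream_latency_ms)}'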
"Why Am I Getting 401 Errors?"
kt events tail --filter "status_code=401" --limit 5
Common causes:
- Expired gateway key: check with kt tokens inspect --name "your-key"
- Invalid API key in provider config: verify secret_key_ref is set
- Revoked token: list active tokens with kt tokens list
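A quick way to narrow this down is to group recent 401s by token name, then inspect the key that appears most often (a sketch; it assumes the tail flags compose and reuses the app-production token name from the earlier example):
# Which token names are producing 401s in the last hour?
kt events tail --filter "status_code=401" --since 1h --output json | \
  jq -r '.[].token_name' | sort | uniq -c | sort -rn

# Inspect the key that appears most often
kt tokens inspect --name "app-production"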
"My Knowledge Base Isn't Being Used"
Check the event detail for knowledge base injection:
kt events tail --limit 1 --output json | jq '.[0].knowledge_assets_injected'
If empty, verify:
- The asset is promoted: kt knowledge-base list
- The asset is bound in config: check knowledge_base.assets in your YAML
- The gateway reloaded after the config change
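To cross-check what the gateway actually injected against the promoted assets (a sketch; knowledge_assets_injected is the same field queried above, and kt knowledge-base list is the command from the checklist):
# Is anything being injected into recent requests?
kt events tail --limit 5 --output json | jq '.[] | {model, knowledge_assets_injected}'

# Cross-check against the assets that are actually promoted
kt knowledge-base list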
Trace Correlation
When running multiple services, correlate events across the pipeline:
# Find events for a specific request ID
kt events tail --filter "request_id=req_xyz789"
Adding Custom Trace IDs
Pass a trace header through the gateway:
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello"}],
extra_headers={"X-Request-ID": "my-trace-id-123"},
)
# Find the event by your custom trace ID
kt events tail --filter "request_id=my-trace-id-123"
Latency Analysis
Histogram of Recent Latencies
kt events tail --limit 100 --output json | \
jq -r '.[].latency_ms' | \
awk '{
if ($1 < 500) bucket="<500ms"
else if ($1 < 1000) bucket="500ms-1s"
else if ($1 < 2000) bucket="1s-2s"
else bucket=">2s"
print bucket
}' | sort | uniq -c | sort -rn
P95 Latency
kt events tail --limit 100 --output json | \
jq -r '.[].latency_ms' | sort -n | \
awk '{v[NR]=$1} END {idx=int(NR*0.95); if (idx < 1) idx=1; print "P95:", v[idx], "ms"}'
Best Practices
| Practice | Why |
|---|---|
| Use kt events tail during development | Real-time feedback on policy behavior |
| Filter by token name in production | Isolate traffic from specific applications |
| Compare gateway vs. upstream latency | Distinguish policy overhead from provider slowness |
| Add X-Request-ID headers | Enables end-to-end trace correlation |
| Check events after config changes | Verify new policies are evaluated correctly |
| Export events for offline analysis | --output json pipes into jq, pandas, etc. |
Next steps
- Testing AI-Integrated Code — use events to verify test assertions
- Managing API Keys & Gateway Keys — debug token-related 401 errors
- Routing Across Multiple AI Models — debug model routing decisions
For AI systems
- Canonical terms: decision event, kt events tail, filtering, latency_ms, upstream_latency_ms, token_name, policies_evaluated, X-Request-ID, event stream.
- CLI commands: kt events tail (real-time), kt events tail --filter "decision=blocked" (filter by decision), kt events tail --filter "token_name=dev-key" (by key), kt events tail --output json (machine-readable).
- Key event fields: decision (allowed/blocked/escalated/modified), policies_evaluated, latency_ms, upstream_latency_ms, token_name, status_code.
- Best next pages: Testing AI Code, API Key Management, Multi-Model Routing.
For engineers
- Use kt events tail during development for real-time feedback on policy behavior.
- Filter with --filter "decision=blocked" to isolate policy violations; filter by token_name to trace specific application traffic.
- Compare latency_ms vs. upstream_latency_ms to distinguish gateway policy overhead from provider slowness.
- Add X-Request-ID headers in your application to enable end-to-end trace correlation with decision events.
- Use --output json | jq for programmatic analysis and P95 latency calculations.
- Check events immediately after config changes to verify new policies are being evaluated correctly.
For leaders
- Decision events provide complete observability without requiring application-level instrumentation.
- Event-based debugging reduces mean time to resolution (MTTR) for AI-related production incidents.
- Latency decomposition (gateway vs. provider) identifies whether issues are governance overhead or provider problems.
- Event filtering by token name enables per-application traffic isolation for targeted troubleshooting.
- All debugging data is audit-trail quality — the same events used for debugging serve compliance reporting.