Debugging AI Requests with Events
Every request through the Keeptrusts gateway produces a decision event containing the full lifecycle: policy evaluation, upstream latency, token usage, and the final decision. This guide shows you how to use kt events tail, the console Events page, and filtering to diagnose issues fast.
Use this page when
- You need to debug why an AI request was blocked, slow, or returned unexpected results.
- You want to use kt events tail for real-time event streaming during development.
- You are filtering decision events by token name, model, status, or decision type.
- You need to trace end-to-end latency and distinguish gateway overhead from provider latency.
Primary audience
- Primary: Developers debugging AI request issues in development and staging
- Secondary: SRE Engineers investigating production incidents, Platform Engineers monitoring gateway health
Decision Event Structure
Each event captures the complete request lifecycle:
{
"id": "evt_a1b2c3d4",
"timestamp": "2026-04-23T14:30:12Z",
"model": "gpt-4o",
"provider": "openai",
"decision": "allowed",
"policies_evaluated": [
{"name": "block-pii-output", "result": "pass"},
{"name": "max-tokens", "result": "pass"},
{"name": "log-all", "result": "logged"}
],
"latency_ms": 842,
"upstream_latency_ms": 780,
"tokens": {"prompt": 156, "completion": 89, "total": 245},
"token_name": "app-production",
"status_code": 200
}
Key Fields
| Field | Description |
|---|---|
| decision | allowed, blocked, escalated, or modified |
| policies_evaluated | List of policies and their individual results |
| latency_ms | Total round-trip time (gateway overhead + upstream) |
| upstream_latency_ms | Time spent waiting for the LLM provider |
| token_name | Which API or gateway key was used |
| status_code | HTTP status returned to the client |
Using kt events tail
The CLI provides a real-time event stream for debugging:
Basic Tail
kt events tail
Output streams events as they arrive:
14:30:12 [allowed] gpt-4o 842ms 245tok app-production
14:30:15 [blocked] gpt-4o 12ms 0tok dev-prototyping → block-pii-output
14:30:18 [allowed] gpt-4o-mini 356ms 128tok frontend-key
Filtering Events
Filter by decision type:
# Only blocked requests
kt events tail --filter "decision=blocked"
# Only a specific model
kt events tail --filter "model=gpt-4o"
# Only a specific token
kt events tail --filter "token_name=app-production"
# Combine filters
kt events tail --filter "decision=blocked,model=gpt-4o"
Limiting Output
# Last 20 events
kt events tail --limit 20
# Events from the last hour
kt events tail --since 1h
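These flags can be combined. For example, to review recent blocked requests in one pass (a sketch that assumes --filter, --since, and --limit compose in a single invocation):
# Blocked requests from the last hour, at most 50 events
kt events tail --filter "decision=blocked" --since 1h --limit 50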
JSON Output for Scripting
kt events tail --limit 5 --output json | jq '.[] | {decision, model, latency_ms}'
{"decision": "allowed", "model": "gpt-4o", "latency_ms": 842}
{"decision": "blocked", "model": "gpt-4o", "latency_ms": 12}
{"decision": "allowed", "model": "gpt-4o-mini", "latency_ms": 356}
Console Events Page
The management console provides a visual Events page with advanced filtering.
Navigating to Events
- Open the console at your deployment URL.
- Click Events in the sidebar.
- The page displays recent events with sortable columns.
Filtering in the Console
Use the filter bar to narrow results:
- Decision: allowed, blocked, escalated
- Model: Select from models in use
- Time range: Last hour, last 24h, last 7 days, or custom
- Token: Filter by token name
- Policy: Filter by triggering policy name
Event Detail View
Click any event row to see the full detail:
- Request summary: model, provider, timestamp
- Policy evaluation chain: each policy's result in order
- Timing breakdown: gateway overhead vs. upstream latency
- Token usage: prompt, completion, and total tokens
- Error details: for blocked or failed requests
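If you prefer to stay in the terminal, a similar breakdown can be assembled from the CLI. This is a sketch built only on the fields shown in the event structure above:
kt events tail --limit 1 --output json | \
  jq '.[0] | {model, provider, timestamp, decision, policies_evaluated,
              timing: {total: .latency_ms, upstream: .upstream_latency_ms,
                       overhead: (.latency_ms - .upstream_latency_ms)},
              tokens}'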
Debugging Common Issues
"Why Was My Request Blocked?"
kt events tail --filter "decision=blocked" --limit 5
Check the policies_evaluated array to find which policy triggered:
{
"decision": "blocked",
"policies_evaluated": [
{"name": "block-prompt-injection", "result": "blocked", "reason": "pattern match: ignore previous"}
]
}
Fix: Adjust the policy pattern or rephrase the prompt.
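To pull just the triggering policy and its reason out of recent blocked events, you can combine the filter with jq (a sketch; it assumes blocked policy entries carry a reason field, as in the example above):
kt events tail --filter "decision=blocked" --limit 5 --output json | \
  jq -r '.[] | .policies_evaluated[] | select(.result == "blocked") | "\(.name): \(.reason)"'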
"Why Is My Request Slow?"
Compare latency_ms and upstream_latency_ms:
kt events tail --limit 10 --output json | \
jq '.[] | {model, total: .latency_ms, upstream: .upstream_latency_ms, overhead: (.latency_ms - .upstream_latency_ms)}'
{"model": "gpt-4o", "total": 842, "upstream": 780, "overhead": 62}
{"model": "gpt-4o", "total": 2340, "upstream": 2290, "overhead": 50}
If overhead is high (>200ms), check:
- Number of active policies (each adds evaluation time)
- Knowledge base injection size
- Network latency between gateway and API
If upstream is high, the provider is slow — consider switching models or using a fallback chain.
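To surface only the events where gateway overhead crosses that 200ms threshold, filter in jq (a sketch using the same fields as above):
kt events tail --limit 100 --output json | \
  jq '.[] | select((.latency_ms - .upstream_latency_ms) > 200) | {id, model, overhead_ms: (.latency_ms - .upstream_latency_ms)}'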
"Why Am I Getting 401 Errors?"
kt events tail --filter "status_code=401" --limit 5
Common causes:
- Expired gateway key: check with kt tokens inspect --name "your-key"
- Invalid API key in provider config: verify secret_key_ref is set
- Revoked token: list active tokens with kt tokens list
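A quick way to narrow this down is to group recent 401s by token name, then inspect the key that appears most often (a sketch; it assumes the tail flags compose and reuses the app-production token name from the earlier example):
# Which token names are producing 401s in the last hour?
kt events tail --filter "status_code=401" --since 1h --output json | \
  jq -r '.[].token_name' | sort | uniq -c | sort -rn

# Inspect the key that appears most often
kt tokens inspect --name "app-production"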
"My Knowledge Base Isn't Being Used"
Check the event detail for knowledge base injection:
kt events tail --limit 1 --output json | jq '.[0].knowledge_assets_injected'
If empty, verify:
- The asset is promoted: kt knowledge-base list
- The asset is bound in config: check knowledge_base.assets in your YAML
- The gateway reloaded after the config change
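To cross-check what the gateway actually injected against the promoted assets (a sketch; knowledge_assets_injected is the same field queried above, and kt knowledge-base list is the command from the checklist):
# Is anything being injected into recent requests?
kt events tail --limit 5 --output json | jq '.[] | {model, knowledge_assets_injected}'

# Cross-check against the assets that are actually promoted
kt knowledge-base list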
Trace Correlation
When running multiple services, correlate events across the pipeline:
# Find events for a specific request ID
kt events tail --filter "request_id=req_xyz789"
Adding Custom Trace IDs
Pass a trace header through the gateway:
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello"}],
extra_headers={"X-Request-ID": "my-trace-id-123"},
)
# Find the event by your custom trace ID
kt events tail --filter "request_id=my-trace-id-123"
Latency Analysis
Histogram of Recent Latencies
kt events tail --limit 100 --output json | \
jq -r '.[].latency_ms' | \
awk '{
if ($1 < 500) bucket="<500ms"
else if ($1 < 1000) bucket="500ms-1s"
else if ($1 < 2000) bucket="1s-2s"
else bucket=">2s"
print bucket
}' | sort | uniq -c | sort -rn
P95 Latency
kt events tail --limit 100 --output json | \
jq -r '.[].latency_ms' | sort -n | \
awk '{v[NR]=$1} END {idx=int(NR*0.95); if (idx < 1) idx=1; print "P95:", v[idx], "ms"}'
Best Practices
| Practice | Why |
|---|---|
| Use kt events tail during development | Real-time feedback on policy behavior |
| Filter by token name in production | Isolate traffic from specific applications |
| Compare gateway vs. upstream latency | Distinguish policy overhead from provider slowness |
| Add X-Request-ID headers | Enables end-to-end trace correlation |
| Check events after config changes | Verify new policies are evaluated correctly |
| Export events for offline analysis | --output json pipes into jq, pandas, etc. |
Next steps
- Testing AI-Integrated Code — use events to verify test assertions
- Managing API Keys & Gateway Keys — debug token-related 401 errors
- Routing Across Multiple AI Models — debug model routing decisions
For AI systems
- Canonical terms: decision event, kt events tail, filtering, latency_ms, upstream_latency_ms, token_name, policies_evaluated, X-Request-ID, event stream.
- CLI commands: kt events tail (real-time), kt events tail --filter "decision=blocked" (filter by decision), kt events tail --filter "token_name=dev-key" (by key), kt events tail --output json (machine-readable).
- Key event fields: decision (allowed/blocked/escalated/modified), policies_evaluated, latency_ms, upstream_latency_ms, token_name, status_code.
- Best next pages: Testing AI Code, API Key Management, Multi-Model Routing.
For engineers
- Use kt events tail during development for real-time feedback on policy behavior.
- Filter with --filter "decision=blocked" to isolate policy violations; filter by token_name to trace specific application traffic.
- Compare latency_ms vs. upstream_latency_ms to distinguish gateway policy overhead from provider slowness.
- Add X-Request-ID headers in your application to enable end-to-end trace correlation with decision events.
- Use --output json | jq for programmatic analysis and P95 latency calculations.
- Check events immediately after config changes to verify new policies are being evaluated correctly.
For leaders
- Decision events provide complete observability without requiring application-level instrumentation.
- Event-based debugging reduces mean time to resolution (MTTR) for AI-related production incidents.
- Latency decomposition (gateway vs. provider) identifies whether issues are governance overhead or provider problems.
- Event filtering by token name enables per-application traffic isolation for targeted troubleshooting.
- All debugging data is audit-trail quality — the same events used for debugging serve compliance reporting.