Observability for AI-Governed Systems
Every request through the Keeptrusts gateway produces structured telemetry — events, logs, metrics, and traces. This guide shows how to instrument your stack for full observability from application to LLM provider.
Use this page when
- You are configuring structured logging, metrics collection, or OpenTelemetry integration for the gateway
- You need to build Grafana dashboards for gateway performance and policy enforcement outcomes
- You want to correlate application logs with gateway request IDs
- You are setting up alerting on governance metrics (block rate, latency, error rate)
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Observability Architecture
Structured Logging
Gateway Log Format
The gateway emits structured JSON logs with consistent fields:
{
  "timestamp": "2026-04-23T10:15:30.123Z",
  "level": "info",
  "target": "kt_gateway::server",
  "message": "Request completed",
  "request_id": "req_abc123",
  "model": "gpt-4o",
  "provider": "openai",
  "status": 200,
  "latency_ms": 1245,
  "input_tokens": 150,
  "output_tokens": 89,
  "policies_applied": ["content-filter", "pii-redaction"],
  "policy_action": "pass",
  "cache_hit": false
}
Log Configuration
gateway:
  logging:
    # Log level: trace, debug, info, warn, error
    level: info
    # Output format: json or pretty
    format: json
    # Include request/response bodies (⚠️ sensitive data)
    log_bodies: false
    # Redact these fields from logs
    redact_fields: [api_key, authorization]
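The same redaction is worth applying in application code before anything reaches your own log pipeline. A sketch: the field list mirrors `redact_fields` above, but the helper itself is illustrative, not part of the gateway:

```typescript
// Replace configured sensitive fields with a placeholder before logging.
// The field list mirrors gateway.logging.redact_fields; the helper is illustrative.
const REDACT_FIELDS = ['api_key', 'authorization'];

function redact(record: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(record)) {
    out[key] = REDACT_FIELDS.includes(key.toLowerCase()) ? '[REDACTED]' : value;
  }
  return out;
}

console.log(JSON.stringify(redact({ model: 'gpt-4o', api_key: 'sk-secret' })));
// api_key's value is replaced with "[REDACTED]"; other fields pass through
```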
Application-Side Logging
Correlate application logs with gateway request IDs:
import { randomUUID } from 'crypto';

type Message = { role: string; content: string };

// Rough pre-request token estimate (~4 characters per token)
function estimateTokens(messages: Message[]): number {
  return Math.ceil(messages.reduce((sum, m) => sum + m.content.length, 0) / 4);
}

async function callAI(messages: Message[]) {
  const requestId = randomUUID();
  console.log(JSON.stringify({
    level: 'info',
    message: 'Sending AI request',
    request_id: requestId,
    model: 'gpt-4o',
    input_tokens_estimate: estimateTokens(messages),
  }));
  const response = await fetch('http://kt-gateway:41002/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-Request-ID': requestId,
    },
    body: JSON.stringify({ model: 'gpt-4o', messages }),
  });
  const data = await response.json();
  console.log(JSON.stringify({
    level: 'info',
    message: 'AI request completed',
    request_id: requestId,
    status: response.status,
    output_tokens: data.usage?.completion_tokens,
  }));
  return data;
}
Metrics Collection
Gateway Metrics Endpoint
The gateway exposes Prometheus-compatible metrics:
curl http://localhost:41002/metrics
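The endpoint returns the standard Prometheus text exposition format, so you can also read individual series directly, e.g. in a smoke test. A minimal sketch (the sample scrape line is illustrative; labels vary by deployment):

```typescript
// Extract values for one metric from Prometheus text exposition format.
// Returns one entry per label set.
function readCounter(
  exposition: string,
  metric: string,
): { labels: string; value: number }[] {
  const results: { labels: string; value: number }[] = [];
  for (const line of exposition.split('\n')) {
    if (line.startsWith('#') || !line.startsWith(metric)) continue;
    const match = line.match(/^(\w+)(\{[^}]*\})?\s+([0-9.eE+-]+)/);
    if (match && match[1] === metric) {
      results.push({ labels: match[2] ?? '', value: Number(match[3]) });
    }
  }
  return results;
}

const scrape = [
  '# TYPE kt_requests_total counter',
  'kt_requests_total{provider="openai",model="gpt-4o",status="200"} 1042',
].join('\n');
console.log(readCounter(scrape, 'kt_requests_total')); // one series, value 1042
```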
Key Metrics
| Metric | Type | Description |
|---|---|---|
| kt_requests_total | Counter | Total requests by provider, model, status |
| kt_request_duration_seconds | Histogram | Request latency distribution |
| kt_policy_evaluations_total | Counter | Policy evaluations by name, action |
| kt_policy_duration_seconds | Histogram | Policy evaluation latency |
| kt_tokens_total | Counter | Tokens processed (input/output) |
| kt_connections_active | Gauge | Active upstream connections |
| kt_circuit_breaker_state | Gauge | Circuit breaker state per provider |
| kt_cache_hits_total | Counter | Cache hits and misses |
Prometheus Scrape Configuration
# prometheus.yml
scrape_configs:
  - job_name: 'kt-gateway'
    scrape_interval: 15s
    static_configs:
      - targets: ['kt-gateway:41002']
    metrics_path: /metrics
Grafana Dashboard Panels
Key panels for your AI governance dashboard:
Row 1: Traffic Overview
- Requests/sec by provider (kt_requests_total rate)
- Error rate by provider (kt_requests_total{status=~"5.."})
- Active connections (kt_connections_active)
Row 2: Latency
- P50/P90/P99 latency (kt_request_duration_seconds)
- Policy evaluation latency (kt_policy_duration_seconds)
- Time to first byte for streaming
Row 3: Tokens and Cost
- Tokens/sec by model (kt_tokens_total rate)
- Estimated cost/hour
- Cache hit ratio (kt_cache_hits_total)
Row 4: Governance
- Policy actions (block, redact, pass)
- Circuit breaker states
- Escalation rate
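Representative PromQL for the panels above, built from the metrics in the Key Metrics table. Label names such as provider and result are assumptions about your deployment; check your actual scrape output:

```promql
# Requests/sec by provider
sum by (provider) (rate(kt_requests_total[5m]))

# Error rate by provider
sum by (provider) (rate(kt_requests_total{status=~"5.."}[5m]))
  / sum by (provider) (rate(kt_requests_total[5m]))

# P99 latency
histogram_quantile(0.99, sum by (le) (rate(kt_request_duration_seconds_bucket[5m])))

# Cache hit ratio (assumes a result label with hit/miss values)
sum(rate(kt_cache_hits_total{result="hit"}[5m]))
  / sum(rate(kt_cache_hits_total[5m]))
```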
OpenTelemetry Integration
Gateway OTLP Export
Configure the gateway to export spans via OTLP:
gateway:
  telemetry:
    otlp:
      enabled: true
      endpoint: http://otel-collector:4317
      protocol: grpc
      # Sampling rate (1.0 = 100%)
      sample_rate: 0.1
      # Additional resource attributes
      resource_attributes:
        service.name: kt-gateway
        deployment.environment: production
        service.version: "0.12.3"
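With sample_rate: 0.1, roughly one trace in ten is exported. Head sampling is typically made deterministic by keying the decision on the trace ID, so every hop makes the same choice for a given trace. A sketch of that idea; the hashing scheme here is illustrative, not the gateway's actual sampler:

```typescript
// Deterministic head sampling: hash the trace ID into [0, 1) and compare
// against the configured rate. Illustrative only.
function shouldSample(traceId: string, sampleRate: number): boolean {
  let hash = 0;
  for (const ch of traceId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // unsigned 32-bit accumulator
  }
  return hash / 0x100000000 < sampleRate; // maps hash to [0, 1)
}

console.log(shouldSample('req_abc123', 1.0)); // → true (rate 1.0 samples everything)
console.log(shouldSample('req_abc123', 0.0)); // → false (rate 0.0 samples nothing)
```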
OTel Collector Configuration
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
    timeout: 5s
    send_batch_size: 1024
  attributes:
    actions:
      - key: api_key
        action: delete
      - key: authorization
        action: delete
exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, attributes]
      exporters: [otlp/jaeger]
Span Structure
Console Dashboard Correlation
Event-Based Observability
The console dashboard provides a unified view of all gateway events:
# Tail events in real time
kt events tail
# Filter events by status
kt events tail --filter "status=blocked"
# Filter by policy action
kt events tail --filter "policy_action=redact"
# Search events with full-text
kt events search "prompt injection" --last 24h
Event Fields for Debugging
Each event in the console contains:
| Field | Description |
|---|---|
| event_id | Unique event identifier |
| request_id | Correlation ID from the original request |
| timestamp | When the request was processed |
| model | Model requested |
| provider | Provider that served the request |
| status | HTTP status code |
| latency_ms | Total request latency |
| input_tokens | Input token count |
| output_tokens | Output token count |
| policies_applied | List of policies evaluated |
| policy_action | Final policy action (pass/block/redact) |
| gateway_id | Which gateway processed the request |
Console Debugging Workflow
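A typical workflow starts from an alert or a user report, finds the application-side log line, and pivots to the console event carrying the same request_id. Joining the two sides programmatically can be sketched as follows (record shapes follow the fields above; the data is illustrative):

```typescript
// Join application logs with gateway events on request_id.
interface AppLog { request_id: string; message: string }
interface GatewayEvent { request_id: string; policy_action: string; latency_ms: number }

function correlate(appLogs: AppLog[], events: GatewayEvent[]) {
  const byId = new Map(events.map((e) => [e.request_id, e]));
  // Attach the matching gateway event (or null) to each application log.
  return appLogs.map((log) => ({ ...log, gateway: byId.get(log.request_id) ?? null }));
}

const joined = correlate(
  [{ request_id: 'req_abc123', message: 'Sending AI request' }],
  [{ request_id: 'req_abc123', policy_action: 'pass', latency_ms: 1245 }],
);
console.log(joined[0].gateway?.policy_action); // → "pass"
```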
Alerting Rules
Prometheus Alert Examples
groups:
  - name: kt-gateway
    rules:
      - alert: HighErrorRate
        expr: rate(kt_requests_total{status=~"5.."}[5m]) / rate(kt_requests_total[5m]) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Gateway error rate > 5%"
      - alert: HighLatency
        expr: histogram_quantile(0.99, rate(kt_request_duration_seconds_bucket[5m])) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P99 latency > 10s"
      - alert: CircuitBreakerOpen
        expr: kt_circuit_breaker_state == 1
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Circuit breaker open for {{ $labels.provider }}"
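The HighErrorRate expression is a ratio of two rates checked against a threshold. The same check can be reproduced from raw counter deltas, e.g. when validating alert behavior in a test (the numbers are illustrative):

```typescript
// Error-rate check equivalent to the HighErrorRate alert expression:
// rate(5xx) / rate(total) > 0.05, computed from two counter deltas.
function errorRateExceeded(
  errorsDelta: number, // increase in kt_requests_total{status=~"5.."} over the window
  totalDelta: number,  // increase in kt_requests_total over the same window
  threshold = 0.05,
): boolean {
  if (totalDelta === 0) return false; // no traffic, no alert
  return errorsDelta / totalDelta > threshold;
}

console.log(errorRateExceeded(3, 100));  // → false (3% error rate)
console.log(errorRateExceeded(10, 100)); // → true (10% error rate)
```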
Log Aggregation Pipeline
Next steps
- Distributed Tracing Across AI Services — trace propagation and correlation
- Capacity Planning for AI Workloads — alert-driven scaling decisions
- Performance Engineering the AI Gateway — optimize what you measure
For AI systems
- Canonical terms: structured JSON logs, request_id, Prometheus metrics, gateway.logging.level, gateway.logging.format, OTLP spans, OTel collector, console dashboard, kt events stats, policy_action, cache_hit
- Key configuration: gateway.logging (level, format, log_bodies, redact_fields), Prometheus scrape config, OTel collector pipeline
- Best next pages: Distributed Tracing, Performance Engineering, Incident Response
For engineers
- Gateway emits structured JSON logs with request_id, model, provider, status, latency_ms, policies_applied, and policy_action
- Set log_bodies: false in production to avoid logging sensitive request/response content
- Correlate logs: include the same request_id in application-side logs before calling the gateway
- Metrics: scrape the Prometheus endpoint for kt_requests_total, kt_request_duration_seconds, and kt_tokens_total
- Console dashboard provides pre-built panels for interaction volume, policy outcomes, provider mix, and cost trends
For leaders
- Full observability stack (logs, metrics, traces, events) enables proactive governance monitoring rather than reactive incident response
- Console dashboard provides executive-ready views of AI governance posture without requiring Grafana expertise
- Structured logging with request_id correlation reduces mean time to resolution (MTTR) when investigating governance incidents