Observability for AI-Governed Systems
Every request through the Keeptrusts gateway produces structured telemetry — events, logs, metrics, and traces. This guide shows how to instrument your stack for full observability from application to LLM provider.
Use this page when
- You are configuring structured logging, metrics collection, or OpenTelemetry integration for the gateway
- You need to build Grafana dashboards for gateway performance and policy enforcement outcomes
- You want to correlate application logs with gateway request IDs
- You are setting up alerting on governance metrics (block rate, latency, error rate)
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Observability Architecture
Structured Logging
Gateway Log Format
The gateway emits structured JSON logs with consistent fields:
{
  "timestamp": "2026-04-23T10:15:30.123Z",
  "level": "info",
  "target": "kt_gateway::server",
  "message": "Request completed",
  "request_id": "req_abc123",
  "model": "gpt-4o",
  "provider": "openai",
  "status": 200,
  "latency_ms": 1245,
  "input_tokens": 150,
  "output_tokens": 89,
  "policies_applied": ["content-filter", "pii-redaction"],
  "policy_action": "pass",
  "cache_hit": false
}
Log Configuration
gateway:
  logging:
    # Log level: trace, debug, info, warn, error
    level: info
    # Output format: json or pretty
    format: json
    # Include request/response bodies (⚠️ sensitive data)
    log_bodies: false
    # Redact these fields from logs
    redact_fields: [api_key, authorization]
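The same redaction is worth applying in application code before anything reaches your own log pipeline. A sketch: the field list mirrors `redact_fields` above, but the helper itself is illustrative, not part of the gateway:

```typescript
// Replace configured sensitive fields with a placeholder before logging.
// The field list mirrors gateway.logging.redact_fields; the helper is illustrative.
const REDACT_FIELDS = ['api_key', 'authorization'];

function redact(record: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(record)) {
    out[key] = REDACT_FIELDS.includes(key.toLowerCase()) ? '[REDACTED]' : value;
  }
  return out;
}

console.log(JSON.stringify(redact({ model: 'gpt-4o', api_key: 'sk-secret' })));
// api_key's value is replaced with "[REDACTED]"; other fields pass through
```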
Application-Side Logging
Correlate application logs with gateway request IDs:
import { randomUUID } from 'crypto';

type Message = { role: string; content: string };

// Rough pre-request token estimate (~4 characters per token)
function estimateTokens(messages: Message[]): number {
  return Math.ceil(messages.reduce((sum, m) => sum + m.content.length, 0) / 4);
}

async function callAI(messages: Message[]) {
  const requestId = randomUUID();
  console.log(JSON.stringify({
    level: 'info',
    message: 'Sending AI request',
    request_id: requestId,
    model: 'gpt-4o',
    input_tokens_estimate: estimateTokens(messages),
  }));
  const response = await fetch('http://kt-gateway:41002/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-Request-ID': requestId,
    },
    body: JSON.stringify({ model: 'gpt-4o', messages }),
  });
  const data = await response.json();
  console.log(JSON.stringify({
    level: 'info',
    message: 'AI request completed',
    request_id: requestId,
    status: response.status,
    output_tokens: data.usage?.completion_tokens,
  }));
  return data;
}
Metrics Collection
Gateway Metrics Endpoint
The gateway exposes Prometheus-compatible metrics:
curl http://localhost:41002/metrics
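The endpoint returns the standard Prometheus text exposition format, so you can also read individual series directly, e.g. in a smoke test. A minimal sketch (the sample scrape line is illustrative; labels vary by deployment):

```typescript
// Extract values for one metric from Prometheus text exposition format.
// Returns one entry per label set.
function readCounter(
  exposition: string,
  metric: string,
): { labels: string; value: number }[] {
  const results: { labels: string; value: number }[] = [];
  for (const line of exposition.split('\n')) {
    if (line.startsWith('#') || !line.startsWith(metric)) continue;
    const match = line.match(/^(\w+)(\{[^}]*\})?\s+([0-9.eE+-]+)/);
    if (match && match[1] === metric) {
      results.push({ labels: match[2] ?? '', value: Number(match[3]) });
    }
  }
  return results;
}

const scrape = [
  '# TYPE kt_requests_total counter',
  'kt_requests_total{provider="openai",model="gpt-4o",status="200"} 1042',
].join('\n');
console.log(readCounter(scrape, 'kt_requests_total')); // one series, value 1042
```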
Key Metrics
| Metric | Type | Description |
|---|---|---|
| kt_requests_total | Counter | Total requests by provider, model, status |
| kt_request_duration_seconds | Histogram | Request latency distribution |
| kt_policy_evaluations_total | Counter | Policy evaluations by name, action |
| kt_policy_duration_seconds | Histogram | Policy evaluation latency |
| kt_tokens_total | Counter | Tokens processed (input/output) |
| kt_connections_active | Gauge | Active upstream connections |
| kt_circuit_breaker_state | Gauge | Circuit breaker state per provider |
| kt_cache_hits_total | Counter | Cache hits and misses |
Prometheus Scrape Configuration
# prometheus.yml
scrape_configs:
  - job_name: 'kt-gateway'
    scrape_interval: 15s
    static_configs:
      - targets: ['kt-gateway:41002']
    metrics_path: /metrics
Grafana Dashboard Panels
Key panels for your AI governance dashboard:
Row 1: Traffic Overview
- Requests/sec by provider (kt_requests_total rate)
- Error rate by provider (kt_requests_total{status=~"5.."})
- Active connections (kt_connections_active)
Row 2: Latency
- P50/P90/P99 latency (kt_request_duration_seconds)
- Policy evaluation latency (kt_policy_duration_seconds)
- Time to first byte for streaming
Row 3: Tokens and Cost
- Tokens/sec by model (kt_tokens_total rate)
- Estimated cost/hour
- Cache hit ratio (kt_cache_hits_total)
Row 4: Governance
- Policy actions (block, redact, pass)
- Circuit breaker states
- Escalation rate
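Representative PromQL for the panels above, built from the metrics in the Key Metrics table. Label names such as provider and result are assumptions about your deployment; check your actual scrape output:

```promql
# Requests/sec by provider
sum by (provider) (rate(kt_requests_total[5m]))

# Error rate by provider
sum by (provider) (rate(kt_requests_total{status=~"5.."}[5m]))
  / sum by (provider) (rate(kt_requests_total[5m]))

# P99 latency
histogram_quantile(0.99, sum by (le) (rate(kt_request_duration_seconds_bucket[5m])))

# Cache hit ratio (assumes a result label with hit/miss values)
sum(rate(kt_cache_hits_total{result="hit"}[5m]))
  / sum(rate(kt_cache_hits_total[5m]))
```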
OpenTelemetry Integration
Gateway OTLP Export
Configure the gateway to export spans via OTLP:
gateway:
  telemetry:
    otlp:
      enabled: true
      endpoint: http://otel-collector:4317
      protocol: grpc
      # Sampling rate (1.0 = 100%)
      sample_rate: 0.1
      # Additional resource attributes
      resource_attributes:
        service.name: kt-gateway
        deployment.environment: production
        service.version: "0.12.3"
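With sample_rate: 0.1, roughly one trace in ten is exported. Head sampling is typically made deterministic by keying the decision on the trace ID, so every hop makes the same choice for a given trace. A sketch of that idea; the hashing scheme here is illustrative, not the gateway's actual sampler:

```typescript
// Deterministic head sampling: hash the trace ID into [0, 1) and compare
// against the configured rate. Illustrative only.
function shouldSample(traceId: string, sampleRate: number): boolean {
  let hash = 0;
  for (const ch of traceId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // unsigned 32-bit accumulator
  }
  return hash / 0x100000000 < sampleRate; // maps hash to [0, 1)
}

console.log(shouldSample('req_abc123', 1.0)); // → true (rate 1.0 samples everything)
console.log(shouldSample('req_abc123', 0.0)); // → false (rate 0.0 samples nothing)
```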
OTel Collector Configuration
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
    timeout: 5s
    send_batch_size: 1024
  attributes:
    actions:
      - key: api_key
        action: delete
      - key: authorization
        action: delete
exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, attributes]
      exporters: [otlp/jaeger]
Span Structure
Console Dashboard Correlation
Event-Based Observability
The console dashboard provides a unified view of all gateway events:
# Tail events in real time
kt events tail
# Filter events by status
kt events tail --filter "status=blocked"
# Filter by policy action
kt events tail --filter "policy_action=redact"
# Search events with full-text
kt events search "prompt injection" --last 24h
Event Fields for Debugging
Each event in the console contains:
| Field | Description |
|---|---|
| event_id | Unique event identifier |
| request_id | Correlation ID from the original request |
| timestamp | When the request was processed |
| model | Model requested |
| provider | Provider that served the request |
| status | HTTP status code |
| latency_ms | Total request latency |
| input_tokens | Input token count |
| output_tokens | Output token count |
| policies_applied | List of policies evaluated |
| policy_action | Final policy action (pass/block/redact) |
| gateway_id | Which gateway processed the request |
Console Debugging Workflow
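A typical workflow starts from an alert or a user report, finds the application-side log line, and pivots to the console event carrying the same request_id. Joining the two sides programmatically can be sketched as follows (record shapes follow the fields above; the data is illustrative):

```typescript
// Join application logs with gateway events on request_id.
interface AppLog { request_id: string; message: string }
interface GatewayEvent { request_id: string; policy_action: string; latency_ms: number }

function correlate(appLogs: AppLog[], events: GatewayEvent[]) {
  const byId = new Map(events.map((e) => [e.request_id, e]));
  // Attach the matching gateway event (or null) to each application log.
  return appLogs.map((log) => ({ ...log, gateway: byId.get(log.request_id) ?? null }));
}

const joined = correlate(
  [{ request_id: 'req_abc123', message: 'Sending AI request' }],
  [{ request_id: 'req_abc123', policy_action: 'pass', latency_ms: 1245 }],
);
console.log(joined[0].gateway?.policy_action); // → "pass"
```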
Alerting Rules
Prometheus Alert Examples
groups:
  - name: kt-gateway
    rules:
      - alert: HighErrorRate
        expr: rate(kt_requests_total{status=~"5.."}[5m]) / rate(kt_requests_total[5m]) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Gateway error rate > 5%"
      - alert: HighLatency
        expr: histogram_quantile(0.99, rate(kt_request_duration_seconds_bucket[5m])) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P99 latency > 10s"
      - alert: CircuitBreakerOpen
        expr: kt_circuit_breaker_state == 1
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Circuit breaker open for {{ $labels.provider }}"
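The HighErrorRate expression is a ratio of two rates checked against a threshold. The same check can be reproduced from raw counter deltas, e.g. when validating alert behavior in a test (the numbers are illustrative):

```typescript
// Error-rate check equivalent to the HighErrorRate alert expression:
// rate(5xx) / rate(total) > 0.05, computed from two counter deltas.
function errorRateExceeded(
  errorsDelta: number, // increase in kt_requests_total{status=~"5.."} over the window
  totalDelta: number,  // increase in kt_requests_total over the same window
  threshold = 0.05,
): boolean {
  if (totalDelta === 0) return false; // no traffic, no alert
  return errorsDelta / totalDelta > threshold;
}

console.log(errorRateExceeded(3, 100));  // → false (3% error rate)
console.log(errorRateExceeded(10, 100)); // → true (10% error rate)
```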
Log Aggregation Pipeline
Next steps
- Distributed Tracing Across AI Services — trace propagation and correlation
- Capacity Planning for AI Workloads — alert-driven scaling decisions
- Performance Engineering the AI Gateway — optimize what you measure
For AI systems
- Canonical terms: structured JSON logs, request_id, Prometheus metrics, gateway.logging.level, gateway.logging.format, OTLP spans, OTel collector, console dashboard, kt events stats, policy_action, cache_hit
- Key configuration: gateway.logging (level, format, log_bodies, redact_fields), Prometheus scrape config, OTel collector pipeline
- Best next pages: Distributed Tracing, Performance Engineering, Incident Response
For engineers
- Gateway emits structured JSON logs with request_id, model, provider, status, latency_ms, policies_applied, and policy_action
- Set log_bodies: false in production to avoid logging sensitive request/response content
- Correlate logs: include the same request_id in application-side logs before calling the gateway
- Metrics: scrape the Prometheus endpoint for kt_requests_total, kt_request_duration_seconds, and kt_tokens_total
- Console dashboard provides pre-built panels for interaction volume, policy outcomes, provider mix, and cost trends
For leaders
- Full observability stack (logs, metrics, traces, events) enables proactive governance monitoring rather than reactive incident response
- Console dashboard provides executive-ready views of AI governance posture without requiring Grafana expertise
- Structured logging with request_id correlation reduces mean time to resolution (MTTR) when investigating governance incidents