Distributed Tracing Across AI Services
When an AI request flows through your application, the Keeptrusts gateway, and an LLM provider, distributed tracing ties the entire journey into a single trace. This guide covers trace propagation, collector configuration, span enrichment, and trace-to-event correlation.
Use this page when
- You are propagating OpenTelemetry traces through the Keeptrusts gateway to your trace backend
- You need to configure the OTel collector sidecar for gateway span export
- You want to correlate application-level traces with gateway decision events
- You are adding custom span attributes (model, provider, token count) to AI-related spans
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Trace Propagation Architecture
W3C Trace Context
The gateway propagates the W3C traceparent header through the entire request chain:
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
├─ version
├──────── trace-id (32 hex)
├─────────────────────────────────── parent-span-id (16 hex)
└────────────────────────────────────────────────────── flags
If the incoming request includes a traceparent, the gateway creates child spans under that trace. If no trace context is present, the gateway starts a new trace.
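As an illustration of the format only (this helper is not part of any SDK), the header splits into four dash-separated fields:
def parse_traceparent(header: str) -> dict:
    # e.g. "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    version, trace_id, parent_span_id, flags = header.split("-")
    return {
        "version": version,                # currently always "00"
        "trace_id": trace_id,              # 32 hex chars, shared by every span in the trace
        "parent_span_id": parent_span_id,  # 16 hex chars, the span ID of the caller
        "flags": flags,                    # "01" = sampled
    }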
Instrumenting Your Application
Python (OpenTelemetry SDK)
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.propagate import inject
import httpx
# Setup
provider = TracerProvider()
processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-collector:4317"))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("my-app")
async def ask_ai(question: str) -> str:
with tracer.start_as_current_span("ai.completion") as span:
span.set_attribute("ai.model", "gpt-4o")
span.set_attribute("ai.provider", "openai")
span.set_attribute("ai.input_length", len(question))
headers = {"Content-Type": "application/json"}
inject(headers) # Injects traceparent header
async with httpx.AsyncClient() as client:
response = await client.post(
"http://kt-gateway:41002/v1/chat/completions",
headers=headers,
json={
"model": "gpt-4o",
"messages": [{"role": "user", "content": question}],
},
timeout=120.0,
)
data = response.json()
span.set_attribute("ai.output_tokens", data["usage"]["completion_tokens"])
span.set_attribute("ai.status", response.status_code)
return data["choices"][0]["message"]["content"]
TypeScript (OpenTelemetry SDK)
import { trace, context, propagation } from '@opentelemetry/api';
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
const provider = new NodeTracerProvider();
provider.addSpanProcessor(
new BatchSpanProcessor(
new OTLPTraceExporter({ url: 'http://otel-collector:4317' })
)
);
provider.register();
const tracer = trace.getTracer('my-app');
async function askAI(question: string): Promise<string> {
  return tracer.startActiveSpan('ai.completion', async (span) => {
    try {
      span.setAttribute('ai.model', 'gpt-4o');
      span.setAttribute('ai.provider', 'openai');
      const headers: Record<string, string> = {
        'Content-Type': 'application/json',
      };
      propagation.inject(context.active(), headers);
      const response = await fetch(
        'http://kt-gateway:41002/v1/chat/completions',
        {
          method: 'POST',
          headers,
          body: JSON.stringify({
            model: 'gpt-4o',
            messages: [{ role: 'user', content: question }],
          }),
        }
      );
      const data = await response.json();
      span.setAttribute('ai.output_tokens', data.usage.completion_tokens);
      return data.choices[0].message.content;
    } finally {
      // End the span even when the request throws so it is still exported.
      span.end();
    }
  });
}
OTel Collector Sidecar
Deployment Architecture
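The collector runs as a sidecar in the same pod as your application and the gateway, as in the Kubernetes manifest below: both containers export OTLP spans to the collector over localhost, and the collector batches, enriches, and forwards them to your trace backend.
app → kt-gateway → LLM provider
 │        │
 └─ OTLP ─┴─→ otel-collector (sidecar, localhost:4317) → trace backend (Jaeger / Tempo)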
Collector Configuration
# otel-collector-config.yaml
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
batch:
timeout: 5s
send_batch_size: 512
# Enrich spans with deployment metadata
resource:
attributes:
- key: deployment.environment
value: production
action: upsert
- key: service.namespace
value: ai-platform
action: upsert
# Remove sensitive data from spans
attributes:
actions:
- key: http.request.header.authorization
action: delete
- key: http.request.header.x-api-key
action: delete
- key: ai.prompt.content
action: delete
# Tail-based sampling: keep errors and slow requests
tail_sampling:
decision_wait: 10s
policies:
- name: errors
type: status_code
status_code:
status_codes: [ERROR]
- name: slow-requests
type: latency
latency:
threshold_ms: 5000
- name: sample-rest
type: probabilistic
probabilistic:
sampling_percentage: 10
exporters:
otlp/jaeger:
endpoint: jaeger:4317
tls:
insecure: true
otlp/tempo:
endpoint: tempo:4317
tls:
insecure: true
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch, resource, attributes, tail_sampling]
exporters: [otlp/jaeger]
Kubernetes Sidecar Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-ai-app
spec:
template:
spec:
containers:
- name: app
image: my-app:latest
env:
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: "http://localhost:4317"
- name: kt-gateway
image: keeptrusts/gateway:latest
env:
- name: KT_OTLP_ENDPOINT
value: "http://localhost:4317"
- name: otel-collector
image: otel/opentelemetry-collector-contrib:latest
args: ["--config=/etc/otel/config.yaml"]
volumeMounts:
- name: otel-config
mountPath: /etc/otel
ports:
- containerPort: 4317 # gRPC
- containerPort: 4318 # HTTP
volumes:
- name: otel-config
configMap:
name: otel-collector-config
Span Attributes
Gateway-Emitted Span Attributes
| Attribute | Type | Description |
|---|---|---|
| ai.model | string | Requested model name |
| ai.provider | string | Provider that served the request |
| ai.input_tokens | int | Input token count |
| ai.output_tokens | int | Output token count |
| ai.total_tokens | int | Total token count |
| ai.streaming | bool | Whether the response was streamed |
| ai.cache_hit | bool | Whether the response was served from cache |
| kt.gateway_id | string | Gateway instance identifier |
| kt.policy.action | string | Final policy action (pass/block/redact) |
| kt.policy.names | string[] | Policies that were evaluated |
| kt.event_id | string | Corresponding event ID in the control plane |
| kt.request_id | string | Gateway-assigned request identifier |
Custom Span Attributes
Add application-specific attributes for richer traces:
with tracer.start_as_current_span("ai.completion") as span:
span.set_attribute("app.feature", "customer-support")
span.set_attribute("app.user_tier", "enterprise")
span.set_attribute("app.conversation_id", conversation_id)
span.set_attribute("app.retry_attempt", attempt_number)
Trace-to-Event Correlation
How It Works
Every gateway event includes the trace ID, enabling bidirectional lookup:
Lookup by Trace ID
# Find the event for a specific trace
kt events search --trace-id "4bf92f3577b34da6a3ce929d0e0e4736"
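To find the trace ID your application is currently emitting (for example, to feed into the command above), read it from the active span; a minimal sketch with the OpenTelemetry Python API:
from opentelemetry import trace

span = trace.get_current_span()
# trace_id is an int; format it as 32 lowercase hex characters to match --trace-id
trace_id = format(span.get_span_context().trace_id, "032x")
print(trace_id)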
Lookup by Event ID
From the console event detail view, the trace ID is displayed and links directly to your trace backend.
Correlation in Practice
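In practice, correlation means capturing both identifiers at the call site so you can pivot from a slow or failed trace to the corresponding gateway event and back. The sketch below records the gateway request ID on the span and logs it next to the trace ID; the x-kt-request-id response header name is illustrative only, so check which header your gateway version actually returns:
import httpx
from opentelemetry import trace
from opentelemetry.propagate import inject

tracer = trace.get_tracer("my-app")

async def ask_ai_correlated(question: str) -> str:
    with tracer.start_as_current_span("ai.completion") as span:
        headers = {"Content-Type": "application/json"}
        inject(headers)  # propagate traceparent to the gateway
        async with httpx.AsyncClient() as client:
            response = await client.post(
                "http://kt-gateway:41002/v1/chat/completions",
                headers=headers,
                json={"model": "gpt-4o", "messages": [{"role": "user", "content": question}]},
                timeout=120.0,
            )
        # Hypothetical header name -- confirm what your gateway returns
        request_id = response.headers.get("x-kt-request-id", "")
        span.set_attribute("kt.request_id", request_id)
        trace_id = format(span.get_span_context().trace_id, "032x")
        # One log line with both IDs lets you jump between traces, logs, and `kt events search`
        print(f"trace_id={trace_id} kt_request_id={request_id}")
        return response.json()["choices"][0]["message"]["content"]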
Sampling Strategies
Head-Based Sampling
Decide at trace creation whether to sample:
gateway:
telemetry:
otlp:
sample_rate: 0.1 # Sample 10% of traces
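Head-based sampling can also be applied in the application's own OpenTelemetry SDK so the decision is made where the trace starts; a sketch using the standard ParentBased and TraceIdRatioBased samplers:
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Honor the caller's sampling decision when a traceparent is present;
# otherwise sample 10% of newly started traces.
provider = TracerProvider(sampler=ParentBased(TraceIdRatioBased(0.1)))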
Tail-Based Sampling
Decide after the trace completes (requires the OTel collector):
| Strategy | Keep | Drop |
|---|---|---|
| Errors | All error traces | N/A |
| Slow requests | Latency > 5s | N/A |
| Policy blocks | All blocked requests | N/A |
| Normal requests | 10% sample | 90% |
Recommended Production Sampling
# OTel collector tail sampling
tail_sampling:
policies:
- name: always-keep-errors
type: status_code
status_code: { status_codes: [ERROR] }
- name: always-keep-slow
type: latency
latency: { threshold_ms: 5000 }
- name: sample-normal
type: probabilistic
probabilistic: { sampling_percentage: 5 }
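The errors policy can only keep traces whose spans are actually marked as errors, so record failures on your AI spans. The Python context manager does this automatically when an exception escapes the with block; the explicit form below is a sketch for cases where you handle the error yourself and do not re-raise:
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer("my-app")

with tracer.start_as_current_span("ai.completion") as span:
    try:
        ...  # call the gateway as shown earlier
    except Exception as exc:
        span.record_exception(exc)                            # attaches the exception as a span event
        span.set_status(Status(StatusCode.ERROR, str(exc)))   # marks the trace for the errors policy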
Debugging with Traces
Common Trace Patterns
Healthy request:
[App: 250ms] → [Gateway: 245ms] → [Policy.Input: 2ms] → [Upstream: 240ms] → [Policy.Output: 3ms]
Slow policy evaluation:
[App: 5200ms] → [Gateway: 5195ms] → [Policy.Input: 4950ms ⚠️] → [Upstream: 240ms]
Provider timeout:
[App: 30100ms] → [Gateway: 30095ms] → [Policy.Input: 2ms] → [Upstream: 30000ms TIMEOUT ❌]
Next steps
- Observability for AI-Governed Systems — metrics and logging
- Performance Engineering the AI Gateway — optimize traced bottlenecks
- Capacity Planning for AI Workloads — use trace data for planning
For AI systems
- Canonical terms: W3C traceparent, OpenTelemetry, OTel collector, OTLP exporter, span attributes, trace-to-event correlation, ai.model, ai.provider, ai.input_length, request_id, gateway spans, policy spans
- Key config: otel-collector-config.yaml, receivers.otlp, exporters, processors.batch
- Best next pages: Observability for AI-Governed Systems, Performance Engineering, Incident Response
For engineers
- The gateway creates child spans under incoming traceparent headers; if no trace context is present, it starts a new trace
- Instrument with from opentelemetry.propagate import inject, then inject the headers before calling the gateway
- Add semantic attributes ai.model, ai.provider, ai.input_length, ai.output_tokens for AI-specific span data
- Correlate traces to events using the request_id returned in gateway response headers
- Deploy the OTel collector sidecar exporting to your trace backend (Jaeger, Tempo, Honeycomb)
For leaders
- Distributed tracing provides end-to-end latency visibility from application through gateway to LLM provider — critical for SLA tracking
- Trace-to-event correlation enables rapid root cause analysis when governance decisions cause unexpected behavior
- OpenTelemetry is vendor-neutral — switch trace backends without re-instrumenting applications