Distributed Tracing Across AI Services

When an AI request flows through your application, the Keeptrusts gateway, and an LLM provider, distributed tracing ties the entire journey into a single trace. This guide covers trace propagation, collector configuration, span enrichment, and trace-to-event correlation.

Use this page when

  • You are propagating OpenTelemetry traces through the Keeptrusts gateway to your trace backend
  • You need to configure the OTel collector sidecar for gateway span export
  • You want to correlate application-level traces with gateway decision events
  • You are adding custom span attributes (model, provider, token count) to AI-related spans

Primary audience

  • Primary: Technical Engineers
  • Secondary: AI Agents, Technical Leaders

Trace Propagation Architecture

W3C Trace Context

The gateway propagates the W3C traceparent header through the entire request chain:

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             ├─ version:        00
             ├─ trace-id:       4bf92f3577b34da6a3ce929d0e0e4736 (32 hex)
             ├─ parent-span-id: 00f067aa0ba902b7 (16 hex)
             └─ flags:          01

If the incoming request includes a traceparent, the gateway creates child spans under that trace. If no trace context is present, the gateway starts a new trace.
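
For illustration, a minimal sketch of splitting a traceparent value into its four dash-separated fields (standard W3C format; the helper name is ours, not part of any SDK):

def parse_traceparent(value: str) -> dict:
    """Split a W3C traceparent header into its four dash-separated fields."""
    version, trace_id, parent_span_id, flags = value.split("-")
    return {
        "version": version,                 # "00"
        "trace_id": trace_id,               # 32 hex chars
        "parent_span_id": parent_span_id,   # 16 hex chars
        "flags": flags,                     # "01" = sampled
    }

parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")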

Instrumenting Your Application

Python (OpenTelemetry SDK)

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.propagate import inject
import httpx

# Setup
provider = TracerProvider()
processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-collector:4317"))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("my-app")

async def ask_ai(question: str) -> str:
    with tracer.start_as_current_span("ai.completion") as span:
        span.set_attribute("ai.model", "gpt-4o")
        span.set_attribute("ai.provider", "openai")
        span.set_attribute("ai.input_length", len(question))

        headers = {"Content-Type": "application/json"}
        inject(headers)  # Injects the traceparent header

        async with httpx.AsyncClient() as client:
            response = await client.post(
                "http://kt-gateway:41002/v1/chat/completions",
                headers=headers,
                json={
                    "model": "gpt-4o",
                    "messages": [{"role": "user", "content": question}],
                },
                timeout=120.0,
            )
            data = response.json()

        span.set_attribute("ai.output_tokens", data["usage"]["completion_tokens"])
        span.set_attribute("ai.status", response.status_code)
        return data["choices"][0]["message"]["content"]

TypeScript (OpenTelemetry SDK)

import { trace, context, propagation } from '@opentelemetry/api';
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';

const provider = new NodeTracerProvider();
provider.addSpanProcessor(
  new BatchSpanProcessor(
    new OTLPTraceExporter({ url: 'http://otel-collector:4317' })
  )
);
provider.register();

const tracer = trace.getTracer('my-app');

async function askAI(question: string): Promise<string> {
  return tracer.startActiveSpan('ai.completion', async (span) => {
    span.setAttribute('ai.model', 'gpt-4o');
    span.setAttribute('ai.provider', 'openai');

    const headers: Record<string, string> = {
      'Content-Type': 'application/json',
    };
    propagation.inject(context.active(), headers);

    const response = await fetch(
      'http://kt-gateway:41002/v1/chat/completions',
      {
        method: 'POST',
        headers,
        body: JSON.stringify({
          model: 'gpt-4o',
          messages: [{ role: 'user', content: question }],
        }),
      }
    );

    const data = await response.json();
    span.setAttribute('ai.output_tokens', data.usage.completion_tokens);
    span.end();
    return data.choices[0].message.content;
  });
}

OTel Collector Sidecar

Deployment Architecture

The collector runs as a sidecar in each application pod: both the application and the gateway export spans over OTLP to the local collector on localhost:4317, and the collector processes and forwards them to your trace backend (see the Kubernetes manifest below).

Collector Configuration

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
    send_batch_size: 512

  # Enrich spans with deployment metadata
  resource:
    attributes:
      - key: deployment.environment
        value: production
        action: upsert
      - key: service.namespace
        value: ai-platform
        action: upsert

  # Remove sensitive data from spans
  attributes:
    actions:
      - key: http.request.header.authorization
        action: delete
      - key: http.request.header.x-api-key
        action: delete
      - key: ai.prompt.content
        action: delete

  # Tail-based sampling: keep errors and slow requests
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: slow-requests
        type: latency
        latency:
          threshold_ms: 5000
      - name: sample-rest
        type: probabilistic
        probabilistic:
          sampling_percentage: 10

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      # Tail sampling runs before batching so decisions see complete traces
      processors: [resource, attributes, tail_sampling, batch]
      exporters: [otlp/jaeger] # add otlp/tempo to also export to Tempo

Kubernetes Sidecar Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-ai-app
spec:
  selector:
    matchLabels:
      app: my-ai-app
  template:
    metadata:
      labels:
        app: my-ai-app
    spec:
      containers:
        - name: app
          image: my-app:latest
          env:
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://localhost:4317"

        - name: kt-gateway
          image: keeptrusts/gateway:latest
          env:
            - name: KT_OTLP_ENDPOINT
              value: "http://localhost:4317"

        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
          args: ["--config=/etc/otel/config.yaml"]
          volumeMounts:
            - name: otel-config
              mountPath: /etc/otel
          ports:
            - containerPort: 4317 # gRPC
            - containerPort: 4318 # HTTP

      volumes:
        - name: otel-config
          configMap:
            name: otel-collector-config

Span Attributes

Gateway-Emitted Span Attributes

Attribute          Type      Description
ai.model           string    Requested model name
ai.provider        string    Provider that served the request
ai.input_tokens    int       Input token count
ai.output_tokens   int       Output token count
ai.total_tokens    int       Total token count
ai.streaming       bool      Whether the response was streamed
ai.cache_hit       bool      Whether the response was served from cache
kt.gateway_id      string    Gateway instance identifier
kt.policy.action   string    Final policy action (pass/block/redact)
kt.policy.names    string[]  Policies that were evaluated
kt.event_id        string    Corresponding event ID in the control plane
kt.request_id      string    Gateway-assigned request identifier
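
To join your own application spans with the gateway's, you can copy the gateway's request identifier onto your span. A minimal sketch; the response header name used here is an assumption for illustration, so check your gateway version for the exact name:

import httpx
from opentelemetry import trace

tracer = trace.get_tracer("my-app")

async def call_gateway(client: httpx.AsyncClient, payload: dict) -> httpx.Response:
    with tracer.start_as_current_span("ai.completion") as span:
        response = await client.post(
            "http://kt-gateway:41002/v1/chat/completions", json=payload
        )
        # "x-kt-request-id" is an assumed header name, not confirmed by this guide
        request_id = response.headers.get("x-kt-request-id")
        if request_id:
            span.set_attribute("kt.request_id", request_id)
        return response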

Custom Span Attributes

Add application-specific attributes for richer traces:

with tracer.start_as_current_span("ai.completion") as span:
    span.set_attribute("app.feature", "customer-support")
    span.set_attribute("app.user_tier", "enterprise")
    span.set_attribute("app.conversation_id", conversation_id)
    span.set_attribute("app.retry_attempt", attempt_number)

Trace-to-Event Correlation

How It Works

Every gateway event includes the trace ID, enabling bidirectional lookup:
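
The trace ID used in these lookups can be read off the active span in your instrumented application (standard OpenTelemetry API):

from opentelemetry import trace

ctx = trace.get_current_span().get_span_context()
trace_id = format(ctx.trace_id, "032x")  # same 32-hex form as in traceparent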

Lookup by Trace ID

# Find the event for a specific trace
kt events search --trace-id "4bf92f3577b34da6a3ce929d0e0e4736"

Lookup by Event ID

From the console event detail view, the trace ID is displayed and links directly to your trace backend.

Correlation in Practice

A typical investigation works from either side: starting from a slow or failed trace in your backend, copy the trace ID and look up the matching gateway event to see which policies fired; starting from an event in the console, follow its linked trace ID into your trace backend. A sketch of the trace-to-event direction follows.
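
A small helper can feed the active span's trace ID to the documented kt events search command (a minimal sketch; assumes the kt CLI is installed and authenticated):

import subprocess
from opentelemetry import trace

def find_event_for_current_trace() -> str:
    """Look up the gateway event that corresponds to the trace in flight."""
    ctx = trace.get_current_span().get_span_context()
    trace_id = format(ctx.trace_id, "032x")
    result = subprocess.run(
        ["kt", "events", "search", "--trace-id", trace_id],
        capture_output=True, text=True, check=True,
    )
    return result.stdout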

Sampling Strategies

Head-Based Sampling

Decide at trace creation whether to sample:

gateway:
  telemetry:
    otlp:
      sample_rate: 0.1 # Sample 10% of traces
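
To keep your application's sampling consistent with the gateway, the SDK can be configured with an equivalent ratio sampler (standard OpenTelemetry API; the 10% ratio mirrors the gateway setting above):

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Sample 10% of new traces; respect the parent's decision when one exists
provider = TracerProvider(sampler=ParentBased(TraceIdRatioBased(0.1)))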

Tail-Based Sampling

Decide after the trace completes (requires the OTel collector):

Strategy          Keep                  Drop
Errors            All error traces      N/A
Slow requests     Latency > 5 s         N/A
Policy blocks     All blocked requests  N/A
Normal requests   10% sample            90%

# OTel collector tail sampling
tail_sampling:
  policies:
    - name: always-keep-errors
      type: status_code
      status_code: { status_codes: [ERROR] }
    - name: always-keep-slow
      type: latency
      latency: { threshold_ms: 5000 }
    - name: sample-normal
      type: probabilistic
      probabilistic: { sampling_percentage: 5 }

Debugging with Traces

Common Trace Patterns

Healthy request:

[App: 250ms] → [Gateway: 245ms] → [Policy.Input: 2ms] → [Upstream: 240ms] → [Policy.Output: 3ms]

Slow policy evaluation:

[App: 5200ms] → [Gateway: 5195ms] → [Policy.Input: 4950ms ⚠️] → [Upstream: 240ms]

Provider timeout:

[App: 30100ms] → [Gateway: 30095ms] → [Policy.Input: 2ms] → [Upstream: 30000ms TIMEOUT ❌]
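
When reproducing these patterns locally, it can help to print spans to stdout instead of (or alongside) exporting them; this uses the standard OpenTelemetry console exporter:

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Print every finished span (name, timings, attributes) to stdout for inspection
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))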

Next steps

For AI systems

  • Canonical terms: W3C traceparent, OpenTelemetry, OTel collector, OTLP exporter, span attributes, trace-to-event correlation, ai.model, ai.provider, ai.input_length, request_id, gateway spans, policy spans
  • Key config: otel-collector-config.yaml, receivers.otlp, exporters, processors.batch
  • Best next pages: Observability for AI-Governed Systems, Performance Engineering, Incident Response

For engineers

  • The gateway creates child spans under incoming traceparent headers — no trace context = new trace started by gateway
  • Instrument with: from opentelemetry.propagate import inject then inject headers before calling the gateway
  • Add semantic attributes: ai.model, ai.provider, ai.input_length, ai.output_tokens for AI-specific span data
  • Correlate traces to events using request_id returned in gateway response headers
  • Deploy OTel collector sidecar exporting to your trace backend (Jaeger, Tempo, Honeycomb)

For leaders

  • Distributed tracing provides end-to-end latency visibility from application through gateway to LLM provider — critical for SLA tracking
  • Trace-to-event correlation enables rapid root cause analysis when governance decisions cause unexpected behavior
  • OpenTelemetry is vendor-neutral — switch trace backends without re-instrumenting applications