Distributed Tracing Across AI Services
When an AI request flows through your application, the Keeptrusts gateway, and an LLM provider, distributed tracing ties the entire journey into a single trace. This guide covers trace propagation, collector configuration, span enrichment, and trace-to-event correlation.
Use this page when
- You are propagating OpenTelemetry traces through the Keeptrusts gateway to your trace backend
- You need to configure the OTel collector sidecar for gateway span export
- You want to correlate application-level traces with gateway decision events
- You are adding custom span attributes (model, provider, token count) to AI-related spans
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Trace Propagation Architecture
W3C Trace Context
The gateway propagates the W3C traceparent header through the entire request chain:
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
├─ version
├──────── trace-id (32 hex)
├─────────────────────────────────── parent-span-id (16 hex)
└────────────────────────────────────────────────────── flags
If the incoming request includes a traceparent, the gateway creates child spans under that trace. If no trace context is present, the gateway starts a new trace.
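As an illustration of the format only (this helper is not part of any SDK), the header splits into four dash-separated fields:
def parse_traceparent(header: str) -> dict:
    # e.g. "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    version, trace_id, parent_span_id, flags = header.split("-")
    return {
        "version": version,                # currently always "00"
        "trace_id": trace_id,              # 32 hex chars, shared by every span in the trace
        "parent_span_id": parent_span_id,  # 16 hex chars, the span ID of the caller
        "flags": flags,                    # "01" = sampled
    }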
Instrumenting Your Application
Python (OpenTelemetry SDK)
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.propagate import inject
import httpx
# Setup
provider = TracerProvider()
processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-collector:4317"))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("my-app")
async def ask_ai(question: str) -> str:
with tracer.start_as_current_span("ai.completion") as span:
span.set_attribute("ai.model", "gpt-4o")
span.set_attribute("ai.provider", "openai")
span.set_attribute("ai.input_length", len(question))
headers = {"Content-Type": "application/json"}
inject(headers) # Injects traceparent header
async with httpx.AsyncClient() as client:
response = await client.post(
"http://kt-gateway:41002/v1/chat/completions",
headers=headers,
json={
"model": "gpt-4o",
"messages": [{"role": "user", "content": question}],
},
timeout=120.0,
)
data = response.json()
span.set_attribute("ai.output_tokens", data["usage"]["completion_tokens"])
span.set_attribute("ai.status", response.status_code)
return data["choices"][0]["message"]["content"]
TypeScript (OpenTelemetry SDK)
import { trace, context, propagation } from '@opentelemetry/api';
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
const provider = new NodeTracerProvider();
provider.addSpanProcessor(
new BatchSpanProcessor(
new OTLPTraceExporter({ url: 'http://otel-collector:4317' })
)
);
provider.register();
const tracer = trace.getTracer('my-app');
async function askAI(question: string): Promise<string> {
  return tracer.startActiveSpan('ai.completion', async (span) => {
    try {
      span.setAttribute('ai.model', 'gpt-4o');
      span.setAttribute('ai.provider', 'openai');
      const headers: Record<string, string> = {
        'Content-Type': 'application/json',
      };
      propagation.inject(context.active(), headers);
      const response = await fetch(
        'http://kt-gateway:41002/v1/chat/completions',
        {
          method: 'POST',
          headers,
          body: JSON.stringify({
            model: 'gpt-4o',
            messages: [{ role: 'user', content: question }],
          }),
        }
      );
      const data = await response.json();
      span.setAttribute('ai.output_tokens', data.usage.completion_tokens);
      return data.choices[0].message.content;
    } finally {
      // End the span even when the request throws so it is still exported.
      span.end();
    }
  });
}
OTel Collector Sidecar
Deployment Architecture
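The collector runs as a sidecar in the same pod as your application and the gateway, as in the Kubernetes manifest below: both containers export OTLP spans to the collector over localhost, and the collector batches, enriches, and forwards them to your trace backend.
app → kt-gateway → LLM provider
 │        │
 └─ OTLP ─┴─→ otel-collector (sidecar, localhost:4317) → trace backend (Jaeger / Tempo)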
Collector Configuration
# otel-collector-config.yaml
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
batch:
timeout: 5s
send_batch_size: 512
# Enrich spans with deployment metadata
resource:
attributes:
- key: deployment.environment
value: production
action: upsert
- key: service.namespace
value: ai-platform
action: upsert
# Remove sensitive data from spans
attributes:
actions:
- key: http.request.header.authorization
action: delete
- key: http.request.header.x-api-key
action: delete
- key: ai.prompt.content
action: delete
# Tail-based sampling: keep errors and slow requests
tail_sampling:
decision_wait: 10s
policies:
- name: errors
type: status_code
status_code:
status_codes: [ERROR]
- name: slow-requests
type: latency
latency:
threshold_ms: 5000
- name: sample-rest
type: probabilistic
probabilistic:
sampling_percentage: 10
exporters:
otlp/jaeger:
endpoint: jaeger:4317
tls:
insecure: true
otlp/tempo:
endpoint: tempo:4317
tls:
insecure: true
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch, resource, attributes, tail_sampling]
exporters: [otlp/jaeger]
Kubernetes Sidecar Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-ai-app
spec:
template:
spec:
containers:
- name: app
image: my-app:latest
env:
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: "http://localhost:4317"
- name: kt-gateway
image: keeptrusts/gateway:latest
env:
- name: KT_OTLP_ENDPOINT
value: "http://localhost:4317"
- name: otel-collector
image: otel/opentelemetry-collector-contrib:latest
args: ["--config=/etc/otel/config.yaml"]
volumeMounts:
- name: otel-config
mountPath: /etc/otel
ports:
- containerPort: 4317 # gRPC
- containerPort: 4318 # HTTP
volumes:
- name: otel-config
configMap:
name: otel-collector-config
Span Attributes
Gateway-Emitted Span Attributes
| Attribute | Type | Description |
|---|---|---|
| ai.model | string | Requested model name |
| ai.provider | string | Provider that served the request |
| ai.input_tokens | int | Input token count |
| ai.output_tokens | int | Output token count |
| ai.total_tokens | int | Total token count |
| ai.streaming | bool | Whether the response was streamed |
| ai.cache_hit | bool | Whether the response was served from cache |
| kt.gateway_id | string | Gateway instance identifier |
| kt.policy.action | string | Final policy action (pass/block/redact) |
| kt.policy.names | string[] | Policies that were evaluated |
| kt.event_id | string | Corresponding event ID in the control plane |
| kt.request_id | string | Gateway-assigned request identifier |
Custom Span Attributes
Add application-specific attributes for richer traces:
with tracer.start_as_current_span("ai.completion") as span:
span.set_attribute("app.feature", "customer-support")
span.set_attribute("app.user_tier", "enterprise")
span.set_attribute("app.conversation_id", conversation_id)
span.set_attribute("app.retry_attempt", attempt_number)
Trace-to-Event Correlation
How It Works
Every gateway event includes the trace ID, enabling bidirectional lookup:
Lookup by Trace ID
# Find the event for a specific trace
kt events search --trace-id "4bf92f3577b34da6a3ce929d0e0e4736"
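To find the trace ID your application is currently emitting (for example, to feed into the command above), read it from the active span; a minimal sketch with the OpenTelemetry Python API:
from opentelemetry import trace

span = trace.get_current_span()
# trace_id is an int; format it as 32 lowercase hex characters to match --trace-id
trace_id = format(span.get_span_context().trace_id, "032x")
print(trace_id)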
Lookup by Event ID
From the console event detail view, the trace ID is displayed and links directly to your trace backend.
Correlation in Practice
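In practice, correlation means capturing both identifiers at the call site so you can pivot from a slow or failed trace to the corresponding gateway event and back. The sketch below records the gateway request ID on the span and logs it next to the trace ID; the x-kt-request-id response header name is illustrative only, so check which header your gateway version actually returns:
import httpx
from opentelemetry import trace
from opentelemetry.propagate import inject

tracer = trace.get_tracer("my-app")

async def ask_ai_correlated(question: str) -> str:
    with tracer.start_as_current_span("ai.completion") as span:
        headers = {"Content-Type": "application/json"}
        inject(headers)  # propagate traceparent to the gateway
        async with httpx.AsyncClient() as client:
            response = await client.post(
                "http://kt-gateway:41002/v1/chat/completions",
                headers=headers,
                json={"model": "gpt-4o", "messages": [{"role": "user", "content": question}]},
                timeout=120.0,
            )
        # Hypothetical header name -- confirm what your gateway returns
        request_id = response.headers.get("x-kt-request-id", "")
        span.set_attribute("kt.request_id", request_id)
        trace_id = format(span.get_span_context().trace_id, "032x")
        # One log line with both IDs lets you jump between traces, logs, and `kt events search`
        print(f"trace_id={trace_id} kt_request_id={request_id}")
        return response.json()["choices"][0]["message"]["content"]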
Sampling Strategies
Head-Based Sampling
Decide at trace creation whether to sample:
gateway:
telemetry:
otlp:
sample_rate: 0.1 # Sample 10% of traces
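Head-based sampling can also be applied in the application's own OpenTelemetry SDK so the decision is made where the trace starts; a sketch using the standard ParentBased and TraceIdRatioBased samplers:
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Honor the caller's sampling decision when a traceparent is present;
# otherwise sample 10% of newly started traces.
provider = TracerProvider(sampler=ParentBased(TraceIdRatioBased(0.1)))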
Tail-Based Sampling
Decide after the trace completes (requires the OTel collector):
| Strategy | Keep | Drop |
|---|---|---|
| Errors | All error traces | N/A |
| Slow requests | Latency > 5s | N/A |
| Policy blocks | All blocked requests | N/A |
| Normal requests | 10% sample | 90% |
Recommended Production Sampling
# OTel collector tail sampling
tail_sampling:
policies:
- name: always-keep-errors
type: status_code
status_code: { status_codes: [ERROR] }
- name: always-keep-slow
type: latency
latency: { threshold_ms: 5000 }
- name: sample-normal
type: probabilistic
probabilistic: { sampling_percentage: 5 }
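The errors policy can only keep traces whose spans are actually marked as errors, so record failures on your AI spans. The Python context manager does this automatically when an exception escapes the with block; the explicit form below is a sketch for cases where you handle the error yourself and do not re-raise:
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer("my-app")

with tracer.start_as_current_span("ai.completion") as span:
    try:
        ...  # call the gateway as shown earlier
    except Exception as exc:
        span.record_exception(exc)                            # attaches the exception as a span event
        span.set_status(Status(StatusCode.ERROR, str(exc)))   # marks the trace for the errors policy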
Debugging with Traces
Common Trace Patterns
Healthy request:
[App: 250ms] → [Gateway: 245ms] → [Policy.Input: 2ms] → [Upstream: 240ms] → [Policy.Output: 3ms]
Slow policy evaluation:
[App: 5200ms] → [Gateway: 5195ms] → [Policy.Input: 4950ms ⚠️] → [Upstream: 240ms]
Provider timeout:
[App: 30100ms] → [Gateway: 30095ms] → [Policy.Input: 2ms] → [Upstream: 30000ms TIMEOUT ❌]
Next steps
- Observability for AI-Governed Systems — metrics and logging
- Performance Engineering the AI Gateway — optimize traced bottlenecks
- Capacity Planning for AI Workloads — use trace data for planning
For AI systems
- Canonical terms: W3C traceparent, OpenTelemetry, OTel collector, OTLP exporter, span attributes, trace-to-event correlation, ai.model, ai.provider, ai.input_length, request_id, gateway spans, policy spans
- Key config: otel-collector-config.yaml, receivers.otlp, exporters, processors.batch
- Best next pages: Observability for AI-Governed Systems, Performance Engineering, Incident Response
For engineers
- The gateway creates child spans under incoming traceparent headers; if no trace context is present, it starts a new trace
- Instrument with from opentelemetry.propagate import inject, then inject the headers before calling the gateway
- Add semantic attributes ai.model, ai.provider, ai.input_length, ai.output_tokens for AI-specific span data
- Correlate traces to events using the request_id returned in gateway response headers
- Deploy the OTel collector sidecar exporting to your trace backend (Jaeger, Tempo, Honeycomb)
For leaders
- Distributed tracing provides end-to-end latency visibility from application through gateway to LLM provider — critical for SLA tracking
- Trace-to-event correlation enables rapid root cause analysis when governance decisions cause unexpected behavior
- OpenTelemetry is vendor-neutral — switch trace backends without re-instrumenting applications