
Observability Configuration

The callbacks: section sends per-request telemetry to external observability platforms. The health_monitor: section enables background health probing of upstream providers.

Use this page when

  • You need the exact command, config, API, or integration details for Observability Configuration.
  • You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
  • You want a guided rollout instead of a reference page; follow the linked workflow pages in Next steps.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Callbacks

Callbacks are dispatched asynchronously after each proxied request. Six sink types are supported: Langfuse, Datadog, Prometheus, Helicone, Braintrust, and Webhook.

```yaml
callbacks:
  - type: "langfuse"
    host: "https://cloud.langfuse.com"
    public_key_env: "LANGFUSE_PUBLIC_KEY"
    secret_key_env: "LANGFUSE_SECRET_KEY"
```

Langfuse

```yaml
callbacks:
  - type: "langfuse"
    host: "https://cloud.langfuse.com"    # default
    public_key: "pk-lf-..."               # OR public_key_env
    public_key_env: "LANGFUSE_PUBLIC_KEY"
    secret_key: "sk-lf-..."               # OR secret_key_env
    secret_key_env: "LANGFUSE_SECRET_KEY"
```

Sends traces with input/output messages, token counts, latency, model, and policy decisions.

Datadog

```yaml
callbacks:
  - type: "datadog"
    secret_key_ref:
      env: "DD_API_KEY"
    site: "datadoghq.com"             # default
    service_name: "keeptrusts-proxy"  # default
    tags:
      - "env:production"
      - "team:platform"
```

Sends APM-style spans with model, provider, latency, token usage, and policy verdicts.

Prometheus

```yaml
callbacks:
  - type: "prometheus"
```

Exposes a /metrics scrape endpoint with these counters and histograms:

| Metric | Type | Labels |
|---|---|---|
| keeptrusts_llm_requests_total | counter | model, provider, status, verdict |
| keeptrusts_llm_tokens_total | counter | model, provider, direction (input/output) |
| keeptrusts_llm_cost_total | counter | model, provider |
| keeptrusts_llm_latency_seconds_sum | histogram | model, provider |
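A scrape of the /metrics endpoint uses the standard Prometheus text exposition format, so it can be post-processed with ordinary tooling. The following sketch parses a hypothetical sample scrape (the metric names match the table above, but the values and label combinations are made up for illustration) and sums request counts per policy verdict using only the standard library:

```python
# Minimal parser for the Prometheus text exposition format, applied to a
# hypothetical /metrics sample. Values below are illustrative, not real.
from collections import defaultdict

SAMPLE_SCRAPE = """\
keeptrusts_llm_requests_total{model="gpt-4o",provider="openai",status="ok",verdict="allow"} 120
keeptrusts_llm_requests_total{model="gpt-4o",provider="openai",status="ok",verdict="block"} 3
keeptrusts_llm_tokens_total{model="gpt-4o",provider="openai",direction="input"} 52310
"""

def parse_metrics(text):
    """Yield (name, labels_dict, value) for each sample line."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blanks
        name_labels, value = line.rsplit(" ", 1)
        name, _, rest = name_labels.partition("{")
        labels = {}
        if rest:
            for pair in rest.rstrip("}").split(","):
                key, val = pair.split("=", 1)
                labels[key] = val.strip('"')
        yield name, labels, float(value)

requests_by_verdict = defaultdict(float)
for name, labels, value in parse_metrics(SAMPLE_SCRAPE):
    if name == "keeptrusts_llm_requests_total":
        requests_by_verdict[labels["verdict"]] += value

print(requests_by_verdict["allow"])  # 120.0
print(requests_by_verdict["block"])  # 3.0
```

In a real deployment you would point Prometheus itself at the endpoint rather than parsing by hand; the sketch only shows the data shape the counters expose.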

Helicone

```yaml
callbacks:
  - type: "helicone"
    secret_key_ref:
      env: "HELICONE_API_KEY"
```

Injects the Helicone-Auth header into upstream requests for automatic logging.

Braintrust

```yaml
callbacks:
  - type: "braintrust"
    secret_key_ref:
      env: "BRAINTRUST_API_KEY"
    project_name: "my-project"  # default: "default"
```

Webhook

Send events to any HTTP endpoint with optional HMAC-SHA256 signing.

```yaml
callbacks:
  - type: "webhook"
    url: "https://my-service.example.com/events"
    signing_secret_env: "WEBHOOK_SECRET"  # HMAC-SHA256 signing
    headers:
      X-Custom-Header: "value"
    event_filter:
      types:
        - "block"
        - "escalation"
        - "policy_violation"
        - "quality_failure"
        - "request"
      metadata_match:
        environment: "production"
```

Webhook event filter types:

| Type | When |
|---|---|
| request | Every proxied request (default) |
| block | Request was blocked by a policy |
| escalation | Request was escalated for human review |
| policy_violation | A policy triggered a non-blocking violation |
| quality_failure | Quality assertion failed |
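On the receiving end, an HMAC-SHA256 signature lets you reject payloads that weren't signed with your shared secret. This page doesn't specify the signature header name or encoding, so the following receiver-side sketch assumes a hex-encoded digest of the raw request body; check your gateway's actual webhook delivery format before relying on it:

```python
import hashlib
import hmac

def verify_webhook(raw_body: bytes, signature_hex: str, secret: bytes) -> bool:
    """Recompute the HMAC-SHA256 of the raw body and compare in constant time.

    The hex encoding (and the header the signature arrives in) are
    assumptions for illustration, not a documented contract.
    """
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)

# Example: sign and verify a sample event payload.
secret = b"example-webhook-secret"  # in practice, the value behind WEBHOOK_SECRET
body = b'{"type": "block", "metadata": {"environment": "production"}}'
sig = hmac.new(secret, body, hashlib.sha256).hexdigest()
print(verify_webhook(body, sig, secret))          # True
print(verify_webhook(body, "deadbeef", secret))   # False
```

Note the use of `hmac.compare_digest` rather than `==`, which avoids leaking signature prefixes through timing differences.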

Privacy controls

Callbacks support privacy scrubbing before dispatch:

  • redact_message_bodies — Strip request/response message content
  • redact_user — Strip user identity headers
  • Payload fidelity modes:
    • full — All data
    • identity — Metadata + user identity, no message content
    • event_only — Metadata only, no content or identity

These are controlled at the provider level via providers.logging:

```yaml
providers:
  logging:
    redact_message_bodies: true
    redact_api_keys: true
```
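The three fidelity modes above can be pictured as progressive stripping of an event before dispatch. The sketch below is only an illustration of that progression; the event shape and field names are assumptions, not the gateway's actual callback schema:

```python
# Illustrative sketch of the payload fidelity modes described above.
# Field names ("messages", "response", "user") are hypothetical.
def scrub_event(event: dict, mode: str) -> dict:
    """Return a copy of the event reduced to the requested fidelity."""
    if mode == "full":
        return dict(event)  # all data passes through
    # identity and event_only both drop message content
    scrubbed = {k: v for k, v in event.items()
                if k not in ("messages", "response")}
    if mode == "identity":
        return scrubbed  # metadata + user identity, no content
    if mode == "event_only":
        scrubbed.pop("user", None)  # drop identity as well
        return scrubbed  # metadata only
    raise ValueError(f"unknown fidelity mode: {mode}")

event = {"model": "gpt-4o", "user": "alice",
         "messages": [{"role": "user", "content": "hi"}],
         "response": "hello"}
print(sorted(scrub_event(event, "identity")))    # ['model', 'user']
print(sorted(scrub_event(event, "event_only")))  # ['model']
```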

Multiple callbacks

You can combine multiple callback sinks. Each receives the same event independently.

```yaml
callbacks:
  - type: "prometheus"
  - type: "langfuse"
    public_key_env: "LANGFUSE_PUBLIC_KEY"
    secret_key_env: "LANGFUSE_SECRET_KEY"
  - type: "webhook"
    url: "https://slack-webhook.example.com/events"
    event_filter:
      types: ["block", "escalation"]
```
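Independent delivery implies that one sink failing should not prevent the others from receiving the event. A minimal fan-out sketch of that property, using a hypothetical sink interface rather than the gateway's internals:

```python
# Hypothetical fan-out showing independent delivery: each sink receives
# the same event, and a failing sink does not block the others.
delivered = []

def failing_sink(event):
    raise RuntimeError("sink down")

def dispatch(event, sinks):
    """Send one event to every sink; collect per-sink delivery status."""
    results = {}
    for name, send in sinks:
        try:
            send(event)
            results[name] = "ok"
        except Exception as exc:  # isolate failures per sink
            results[name] = f"error: {exc}"
    return results

sinks = [
    ("prometheus", delivered.append),
    ("broken", failing_sink),
    ("langfuse", delivered.append),
]
status = dispatch({"type": "request"}, sinks)
print(status["broken"])  # error: sink down
print(len(delivered))    # 2 -- the healthy sinks still got the event
```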

Health monitor

The health_monitor: section runs background probes against provider endpoints and raises alerts on sustained failures.

```yaml
health_monitor:
  unhealthy_threshold: 3
  alert_callback_urls:
    - "https://pagerduty.example.com/events"
    - "https://slack.example.com/webhook"
  providers:
    - name: "openai"
      endpoint: "https://api.openai.com/v1/models"
      interval_seconds: 60
      timeout_ms: 5000
    - name: "anthropic"
      endpoint: "https://api.anthropic.com/v1/models"
      interval_seconds: 60
      timeout_ms: 5000
```
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| unhealthy_threshold | integer | no | 3 | Consecutive failures before marking unhealthy |
| alert_callback_urls | string[] | no | [] | Webhook URLs for status-change alerts |
| providers[].name | string | yes | | Provider identifier for logging |
| providers[].endpoint | string | yes | | URL to probe (GET request) |
| providers[].interval_seconds | integer | no | 60 | Seconds between probes |
| providers[].timeout_ms | integer | no | 5000 | Probe timeout in milliseconds |

The health monitor:

  1. Runs a background Tokio task per provider
  2. Sends HTTP GET to the endpoint at the configured interval
  3. After unhealthy_threshold consecutive failures, marks the provider unhealthy
  4. POSTs an alert event (JSON) to each URL in alert_callback_urls
  5. Continues probing; marks healthy again after one success
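The threshold logic in steps 3 and 5 (consecutive-failure counting, single-success recovery) amounts to a small state machine. This sketch illustrates the described transitions, not the gateway's actual Tokio implementation:

```python
# Sketch of the health-state transitions described above: a provider is
# marked unhealthy after `unhealthy_threshold` consecutive probe failures
# and healthy again after a single successful probe.
class HealthTracker:
    def __init__(self, unhealthy_threshold: int = 3):
        self.unhealthy_threshold = unhealthy_threshold
        self.consecutive_failures = 0
        self.healthy = True

    def record_probe(self, success: bool) -> bool:
        """Record one probe result; return True if health status changed
        (i.e. an alert should be POSTed to alert_callback_urls)."""
        if success:
            self.consecutive_failures = 0
            changed = not self.healthy
            self.healthy = True  # one success restores health
            return changed
        self.consecutive_failures += 1
        if self.healthy and self.consecutive_failures >= self.unhealthy_threshold:
            self.healthy = False  # threshold reached: status change
            return True
        return False

tracker = HealthTracker(unhealthy_threshold=3)
print([tracker.record_probe(ok) for ok in (False, False, False, True)])
# -> [False, False, True, True]: alert on the 3rd failure, then on recovery
```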

Complete observability example

```yaml
pack:
  name: observable-gateway
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-prod
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic-backup
      provider: anthropic
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_API_KEY
  logging:
    redact_message_bodies: true
    redact_api_keys: true
callbacks:
  - type: prometheus
  - type: langfuse
    public_key_env: LANGFUSE_PUBLIC_KEY
    secret_key_env: LANGFUSE_SECRET_KEY
  - type: datadog
    secret_key_ref:
      env: DD_API_KEY
    tags:
      - env:production
      - service:ai-gateway
  - type: webhook
    url: https://alerts.example.com/keeptrusts
    signing_secret_env: WEBHOOK_SECRET
    event_filter:
      types:
        - block
        - escalation
health_monitor:
  unhealthy_threshold: 3
  alert_callback_urls:
    - https://pagerduty.example.com/keeptrusts
  providers:
    - name: openai
      endpoint: https://api.openai.com/v1/models
      interval_seconds: 60
    - name: anthropic
      endpoint: https://api.anthropic.com/v1/models
      interval_seconds: 60
policies:
  chain:
    - prompt-injection
    - audit-logger
```

For AI systems

  • Canonical terms: Keeptrusts, policy-config.yaml, callbacks (langfuse, datadog, prometheus, helicone, braintrust, webhook), health_monitor, event_filter, signing_secret_env, redact_message_bodies.
  • Callbacks are dispatched asynchronously after each proxied request; health monitor runs background probes.
  • Best next pages: Providers Configuration, Rate Limits Configuration, Declarative Config Reference.

For engineers

  • Six callback sink types: Langfuse, Datadog, Prometheus, Helicone, Braintrust, and Webhook.
  • Prometheus exposes a /metrics scrape endpoint with counters for requests, tokens, cost, and latency histograms.
  • Webhook callbacks support HMAC-SHA256 signing via signing_secret_env and event filtering by type/metadata.
  • Privacy controls: redact_message_bodies strips content; redact_api_keys strips credentials from callback payloads.
  • Health monitor marks providers unhealthy after unhealthy_threshold consecutive probe failures and sends alerts to alert_callback_urls.
  • Multiple callbacks can be combined — each receives the same event independently.

For leaders

  • Observability callbacks provide real-time visibility into AI gateway performance, cost, and policy enforcement across existing tooling (Datadog, Prometheus, Langfuse).
  • Webhook event filtering allows targeted alerting on blocks and escalations without noise from normal traffic.
  • Health monitoring with automated alerts enables proactive provider failover before users are impacted.
  • Privacy controls ensure that observability data doesn't leak sensitive request/response content to external platforms.
  • Prometheus metrics enable SLA dashboards and capacity planning without additional infrastructure.

Next steps