Observability Configuration
The `callbacks:` section sends per-request telemetry to external observability platforms. The `health_monitor:` section enables background health probing of upstream providers.
Use this page when
- You need the exact command, config, API, or integration details for Observability Configuration.
- You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
- You want a reference page rather than a guided rollout (for the latter, use the linked workflow pages under Next steps).
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Callbacks
Callbacks are dispatched asynchronously after each proxied request. Six sink types are supported.
```yaml
callbacks:
  - type: "langfuse"
    host: "https://cloud.langfuse.com"
    public_key_env: "LANGFUSE_PUBLIC_KEY"
    secret_key_env: "LANGFUSE_SECRET_KEY"
```
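The dispatch model can be sketched as a small queue-and-worker loop: the request path enqueues an event and returns immediately, and a background worker fans the event out to every configured sink. This is an illustrative sketch, not Keeptrusts code; the class and method names are made up.

```python
import queue
import threading

class CallbackDispatcher:
    """Illustrative async fan-out: enqueue on the hot path, deliver off-thread."""

    def __init__(self, sinks):
        self.sinks = sinks                 # list of callables, one per sink
        self.events = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def emit(self, event: dict) -> None:
        """Non-blocking; called after each proxied request."""
        self.events.put(event)

    def _drain(self) -> None:
        while True:
            event = self.events.get()
            for sink in self.sinks:
                try:
                    sink(event)            # one failing sink must not affect others
                except Exception:
                    pass
            self.events.task_done()

received = []
dispatcher = CallbackDispatcher([received.append])
dispatcher.emit({"model": "gpt-4o", "latency_ms": 412})
dispatcher.events.join()                   # demo only: wait for the worker
```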
Langfuse
```yaml
callbacks:
  - type: "langfuse"
    host: "https://cloud.langfuse.com"     # default
    public_key: "pk-lf-..."                # OR public_key_env
    public_key_env: "LANGFUSE_PUBLIC_KEY"
    secret_key: "sk-lf-..."                # OR secret_key_env
    secret_key_env: "LANGFUSE_SECRET_KEY"
```
Sends traces with input/output messages, token counts, latency, model, and policy decisions.
Datadog
```yaml
callbacks:
  - type: "datadog"
    secret_key_ref:
      env: "DD_API_KEY"
    site: "datadoghq.com"             # default
    service_name: "keeptrusts-proxy"  # default
    tags:
      - "env:production"
      - "team:platform"
```
Sends APM-style spans with model, provider, latency, token usage, and policy verdicts.
Prometheus
```yaml
callbacks:
  - type: "prometheus"
```
Exposes a `/metrics` scrape endpoint with these counters and histograms:

| Metric | Type | Labels |
|---|---|---|
| `keeptrusts_llm_requests_total` | counter | model, provider, status, verdict |
| `keeptrusts_llm_tokens_total` | counter | model, provider, direction (input/output) |
| `keeptrusts_llm_cost_total` | counter | model, provider |
| `keeptrusts_llm_latency_seconds_sum` | histogram | model, provider |
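As a rough illustration of consuming these metrics, here is a minimal parser for one counter in the Prometheus text exposition format. The sample payload is made up; a real consumer would scrape the gateway's `/metrics` endpoint instead.

```python
# Illustrative /metrics payload (not real output from the gateway).
sample = """\
# TYPE keeptrusts_llm_requests_total counter
keeptrusts_llm_requests_total{model="gpt-4o",provider="openai",status="200",verdict="allow"} 42
keeptrusts_llm_requests_total{model="gpt-4o",provider="openai",status="200",verdict="block"} 3
"""

def counter_sum(text: str, name: str) -> float:
    """Sum every label combination of one counter in Prometheus text format."""
    total = 0.0
    for line in text.splitlines():
        # Metric lines start with the metric name; comment lines start with '#'.
        if line.startswith(name):
            total += float(line.rsplit(" ", 1)[1])
    return total

print(counter_sum(sample, "keeptrusts_llm_requests_total"))  # 45.0
```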
Helicone
```yaml
callbacks:
  - type: "helicone"
    secret_key_ref:
      env: "HELICONE_API_KEY"
```
Injects the Helicone-Auth header into upstream requests for automatic logging.
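A sketch of that injection, assuming the conventional `Helicone-Auth: Bearer <key>` header format; the helper name is hypothetical, not a Keeptrusts function.

```python
import os

def build_upstream_headers(base: dict, helicone_key) -> dict:
    """Copy the outgoing headers and add Helicone-Auth when a key is configured."""
    headers = dict(base)
    if helicone_key:
        headers["Helicone-Auth"] = f"Bearer {helicone_key}"
    return headers

headers = build_upstream_headers(
    {"Authorization": "Bearer sk-..."},
    helicone_key=os.environ.get("HELICONE_API_KEY", "hk-test"),
)
```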
Braintrust
```yaml
callbacks:
  - type: "braintrust"
    secret_key_ref:
      env: "BRAINTRUST_API_KEY"
    project_name: "my-project"  # default: "default"
```
Webhook
Send events to any HTTP endpoint with optional HMAC-SHA256 signing.
```yaml
callbacks:
  - type: "webhook"
    url: "https://my-service.example.com/events"
    signing_secret_env: "WEBHOOK_SECRET"  # HMAC-SHA256 signing
    headers:
      X-Custom-Header: "value"
    event_filter:
      types:
        - "block"
        - "escalation"
        - "policy_violation"
        - "quality_failure"
        - "request"
      metadata_match:
        environment: "production"
```
Webhook event filter types:
| Type | When |
|---|---|
| `request` | Every proxied request (default) |
| `block` | Request was blocked by a policy |
| `escalation` | Request was escalated for human review |
| `policy_violation` | A policy triggered a non-blocking violation |
| `quality_failure` | Quality assertion failed |
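On the receiving side, a signed delivery can be verified by recomputing the HMAC-SHA256 over the raw request body with the shared secret. The signature header name and hex-digest encoding in this sketch are assumptions; confirm them against an actual delivery before relying on it.

```python
import hashlib
import hmac

def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Recompute the body HMAC and compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)

# Simulate a delivery signed with the shared secret.
secret = b"change-me"
body = b'{"type":"block","model":"gpt-4o"}'
sig = hmac.new(secret, body, hashlib.sha256).hexdigest()

assert verify_signature(secret, body, sig)
assert not verify_signature(secret, b'{"type":"request"}', sig)  # tampered body
```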
Privacy controls
Callbacks support privacy scrubbing before dispatch:
- `redact_message_bodies` — Strip request/response message content
- `redact_user` — Strip user identity headers
- Payload fidelity modes:
  - `full` — All data
  - `identity` — Metadata + user identity, no message content
  - `event_only` — Metadata only, no content or identity
These are controlled at the provider level via `providers.logging`:

```yaml
providers:
  logging:
    redact_message_bodies: true
    redact_api_keys: true
```
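The fidelity modes can be pictured as a scrubbing step applied to each event before dispatch. This is an illustrative sketch; the event field names are assumptions, not the gateway's actual schema.

```python
def scrub(event: dict, mode: str) -> dict:
    """Apply one payload fidelity mode before handing the event to a sink."""
    if mode == "full":
        return dict(event)                       # all data
    # Drop message content for both reduced modes.
    scrubbed = {k: v for k, v in event.items() if k not in ("input", "output")}
    if mode == "identity":
        return scrubbed                          # metadata + user identity
    if mode == "event_only":
        scrubbed.pop("user", None)               # metadata only
        return scrubbed
    raise ValueError(f"unknown fidelity mode: {mode}")

event = {"model": "gpt-4o", "user": "alice", "input": "hi", "output": "hello"}
print(scrub(event, "event_only"))  # {'model': 'gpt-4o'}
```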
Multiple callbacks
You can combine multiple callback sinks. Each receives the same event independently.
```yaml
callbacks:
  - type: "prometheus"
  - type: "langfuse"
    public_key_env: "LANGFUSE_PUBLIC_KEY"
    secret_key_env: "LANGFUSE_SECRET_KEY"
  - type: "webhook"
    url: "https://slack-webhook.example.com/events"
    event_filter:
      types: ["block", "escalation"]
```
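Per-sink event filtering can be sketched as a type check plus a metadata subset match. The event shape assumed here (a `type` field and a `metadata` dict) is illustrative.

```python
def matches(event: dict, types: list, metadata_match: dict) -> bool:
    """Deliver only when the event type is listed and all metadata keys match."""
    if types and event.get("type") not in types:
        return False
    meta = event.get("metadata", {})
    return all(meta.get(k) == v for k, v in metadata_match.items())

# Filter equivalent to: types: ["block", "escalation"], metadata_match: {environment: production}
flt_types = ["block", "escalation"]
flt_meta = {"environment": "production"}

assert matches({"type": "block", "metadata": {"environment": "production"}}, flt_types, flt_meta)
assert not matches({"type": "request", "metadata": {"environment": "production"}}, flt_types, flt_meta)
```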
Health monitor
The `health_monitor:` section runs background probes against provider endpoints and raises alerts on sustained failures.
```yaml
health_monitor:
  unhealthy_threshold: 3
  alert_callback_urls:
    - "https://pagerduty.example.com/events"
    - "https://slack.example.com/webhook"
  providers:
    - name: "openai"
      endpoint: "https://api.openai.com/v1/models"
      interval_seconds: 60
      timeout_ms: 5000
    - name: "anthropic"
      endpoint: "https://api.anthropic.com/v1/models"
      interval_seconds: 60
      timeout_ms: 5000
```
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `unhealthy_threshold` | integer | no | 3 | Consecutive failures before marking unhealthy |
| `alert_callback_urls` | string[] | no | [] | Webhook URLs for status-change alerts |
| `providers[].name` | string | yes | — | Provider identifier for logging |
| `providers[].endpoint` | string | yes | — | URL to probe (GET request) |
| `providers[].interval_seconds` | integer | no | 60 | Seconds between probes |
| `providers[].timeout_ms` | integer | no | 5000 | Probe timeout in milliseconds |
The health monitor:
- Runs a background Tokio task per provider
- Sends HTTP GET to the endpoint at the configured interval
- After `unhealthy_threshold` consecutive failures, marks the provider unhealthy
- POSTs an alert event (JSON) to each URL in `alert_callback_urls`
- Continues probing; marks the provider healthy again after one success
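The failure-counting behavior above amounts to a small state machine: consecutive failures accumulate, and a single success resets the provider to healthy. This sketch is illustrative, not the gateway's Rust implementation.

```python
class ProviderHealth:
    """Track one provider's probe results against an unhealthy threshold."""

    def __init__(self, unhealthy_threshold: int = 3):
        self.threshold = unhealthy_threshold
        self.failures = 0
        self.healthy = True

    def record(self, probe_ok: bool) -> bool:
        """Apply one probe result; return True when the status changed (alert)."""
        before = self.healthy
        if probe_ok:
            self.failures = 0
            self.healthy = True
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.healthy = False
        return self.healthy != before

h = ProviderHealth(unhealthy_threshold=3)
changes = [h.record(ok) for ok in (False, False, False, True)]
print(changes)  # [False, False, True, True]; the two True entries are alerts
```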
Complete observability example
```yaml
pack:
  name: observable-gateway
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: openai-prod
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic-backup
      provider: anthropic
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_API_KEY
  logging:
    redact_message_bodies: true
    redact_api_keys: true

callbacks:
  - type: prometheus
  - type: langfuse
    public_key_env: LANGFUSE_PUBLIC_KEY
    secret_key_env: LANGFUSE_SECRET_KEY
  - type: datadog
    secret_key_ref:
      env: DD_API_KEY
    tags:
      - env:production
      - service:ai-gateway
  - type: webhook
    url: https://alerts.example.com/keeptrusts
    signing_secret_env: WEBHOOK_SECRET
    event_filter:
      types:
        - block
        - escalation

health_monitor:
  unhealthy_threshold: 3
  alert_callback_urls:
    - https://pagerduty.example.com/keeptrusts
  providers:
    - name: openai
      endpoint: https://api.openai.com/v1/models
      interval_seconds: 60
    - name: anthropic
      endpoint: https://api.anthropic.com/v1/models
      interval_seconds: 60

policies:
  chain:
    - prompt-injection
    - audit-logger
```
For AI systems
- Canonical terms: Keeptrusts, `policy-config.yaml`, callbacks (`langfuse`, `datadog`, `prometheus`, `helicone`, `braintrust`, `webhook`), `health_monitor`, `event_filter`, `signing_secret_env`, `redact_message_bodies`.
- Callbacks are dispatched asynchronously after each proxied request; health monitor runs background probes.
- Best next pages: Providers Configuration, Rate Limits Configuration, Declarative Config Reference.
For engineers
- Six callback sink types: Langfuse, Datadog, Prometheus, Helicone, Braintrust, and Webhook.
- Prometheus exposes a `/metrics` scrape endpoint with counters for requests, tokens, cost, and latency histograms.
- Webhook callbacks support HMAC-SHA256 signing via `signing_secret_env` and event filtering by type/metadata.
- Privacy controls: `redact_message_bodies` strips content; `redact_api_keys` strips credentials from callback payloads.
- Health monitor marks providers unhealthy after `unhealthy_threshold` consecutive probe failures and sends alerts to `alert_callback_urls`.
- Multiple callbacks can be combined — each receives the same event independently.
For leaders
- Observability callbacks provide real-time visibility into AI gateway performance, cost, and policy enforcement across existing tooling (Datadog, Prometheus, Langfuse).
- Webhook event filtering allows targeted alerting on blocks and escalations without noise from normal traffic.
- Health monitoring with automated alerts enables proactive provider failover before users are impacted.
- Privacy controls ensure that observability data doesn't leak sensitive request/response content to external platforms.
- Prometheus metrics enable SLA dashboards and capacity planning without additional infrastructure.
Next steps
- Providers Configuration — per-provider health probes and logging
- Rate Limits Configuration — Prometheus metrics for rate limit events
- Declarative Config Reference — schema structure