Monitor AI Governance with Datadog

Datadog provides deep observability into your Keeptrusts deployment — from gateway performance metrics to policy enforcement trends and cost tracking. This guide covers custom metrics, log forwarding, dashboard templates, and alerting.

Use this page when

  • You want to monitor Keeptrusts gateway performance and policy enforcement trends in Datadog.
  • You need to set up DogStatsD metrics, log forwarding, and custom dashboards for AI governance.
  • You are configuring anomaly detection alerts for unusual policy block spikes.
  • You need SLO tracking for gateway availability and policy evaluation latency.

Primary audience

  • Primary: Technical Engineers
  • Secondary: AI Agents, Technical Leaders

Architecture overview

Keeptrusts Gateway
→ StatsD / DogStatsD metrics → Datadog Agent → Datadog
→ Logs (stdout/stderr) → Datadog Agent log collection → Datadog
→ OTel Collector sidecar → Datadog Exporter → Datadog APM

Keeptrusts API
→ /v1/webhooks → Datadog Log Intake API (real-time events)
→ kt export → scheduled batch → Datadog Log Archives

Prerequisites

  • Datadog account with API and application keys
  • Datadog Agent installed on gateway hosts or as a Kubernetes DaemonSet
  • Keeptrusts gateway running with logging enabled

Datadog Agent configuration

Kubernetes DaemonSet

# datadog-values.yaml (Helm)
datadog:
  apiKey: <DATADOG_API_KEY>
  appKey: <DATADOG_APP_KEY>
  logs:
    enabled: true
    containerCollectAll: true
  apm:
    portEnabled: true
  dogstatsd:
    useHostPort: true
    hostPortConfig:
      hostPort: 8125

agents:
  containers:
    agent:
      env:
        - name: DD_CONTAINER_LABELS_AS_TAGS
          value: '{"app":"service"}'

Then install the chart:

helm install datadog-agent datadog/datadog \
  -f datadog-values.yaml \
  --namespace monitoring

Host-based Agent

Add to /etc/datadog-agent/conf.d/keeptrusts.d/conf.yaml:

logs:
  - type: file
    path: /var/log/keeptrusts/gateway.log
    service: keeptrusts-gateway
    source: keeptrusts
    sourcecategory: ai-governance

Custom metrics

Gateway performance metrics

Configure the gateway to emit DogStatsD metrics:

kt gateway run \
  --config policy-config.yaml \
  --statsd-address 127.0.0.1:8125 \
  --statsd-prefix keeptrusts
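To confirm the gateway can actually reach DogStatsD, it can help to inspect the raw datagrams on the wire. The helper below is a minimal sketch (not part of the Keeptrusts CLI): it binds a UDP socket on the DogStatsD port and returns whatever metric packets arrive, such as `keeptrusts.gateway.requests:1|c`. Stop the real Agent first, or bind a spare port for testing.

```python
# Minimal sketch: capture raw DogStatsD datagrams to verify metrics are
# flowing. DogStatsD packets look like "metric.name:value|type|#tags".
import socket

def recv_statsd_packets(port=8125, count=1, timeout=5.0):
    """Receive `count` raw DogStatsD datagrams from 127.0.0.1:`port`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", port))
    sock.settimeout(timeout)
    packets = []
    try:
        for _ in range(count):
            data, _addr = sock.recvfrom(4096)
            packets.append(data.decode())
    finally:
        sock.close()
    return packets
```

Run it while the gateway handles a request; if nothing arrives before the timeout, the `--statsd-address` flag or port 8125 routing is the first thing to check.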

Key metrics emitted:

| Metric | Type | Description |
|---|---|---|
| keeptrusts.gateway.requests | counter | Total requests processed |
| keeptrusts.gateway.latency | histogram | End-to-end request latency (ms) |
| keeptrusts.gateway.policy.blocks | counter | Requests blocked by policy |
| keeptrusts.gateway.policy.escalations | counter | Requests escalated |
| keeptrusts.gateway.policy.redactions | counter | Responses with redacted content |
| keeptrusts.gateway.upstream.latency | histogram | Upstream LLM provider latency (ms) |
| keeptrusts.gateway.upstream.errors | counter | Upstream provider errors |

Custom metrics via DogStatsD

# Example: emit custom metrics from a monitoring script
# (requires the `datadog` Python package; DogStatsD defaults to 127.0.0.1:8125)
from datadog import statsd

# After querying /v1/events and computing pending_count
statsd.gauge('keeptrusts.events.pending_escalations', pending_count, tags=['env:production'])
statsd.increment('keeptrusts.events.exported', tags=['format:csv', 'env:production'])
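A sketch of how `pending_count` might be derived from `/v1/events` output before the gauges above are emitted. The `action` and `status` field names here are assumptions about the event schema; adjust them to match your actual Keeptrusts event payloads.

```python
# Sketch: compute a gauge value from a list of event dicts returned by
# /v1/events. Field names ("action", "status") are illustrative.
def count_pending_escalations(events):
    """Count escalation events that have not yet been resolved."""
    return sum(
        1 for e in events
        if e.get("action") == "escalate" and e.get("status") == "pending"
    )
```

Usage: `pending_count = count_pending_escalations(events)` feeds directly into the `statsd.gauge` call shown above.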

Log forwarding

Real-time via webhook

Forward Keeptrusts events to the Datadog Log Intake API:

curl -X POST https://api.keeptrusts.com/v1/webhooks \
  -H "Authorization: Bearer $KEEPTRUSTS_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://http-intake.logs.datadoghq.com/api/v2/logs",
    "description": "Forward events to Datadog Logs",
    "event_types": ["event.*"],
    "active": true,
    "headers": {
      "DD-API-KEY": "<DATADOG_API_KEY>",
      "Content-Type": "application/json"
    }
  }'

Log pipeline

Create a Datadog log pipeline for Keeptrusts events:

  1. Go to Logs → Configuration → Pipelines
  2. Create a new pipeline with filter source:keeptrusts
  3. Add processors:
    • Grok Parser: Extract action, policy_name, model from event JSON
    • Category Processor: Map action=block → severity=error, action=escalate → severity=warning
    • Remapper: Set timestamp as the official log date
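For reference, the processors above assume an event shape roughly like the following. This sample is illustrative; verify the field names against the JSON your Keeptrusts deployment actually emits before writing the Grok pattern.

```json
{
  "timestamp": "2024-05-01T12:34:56Z",
  "action": "block",
  "policy_name": "pii-redaction",
  "model": "gpt-4o",
  "message": "Request blocked by policy"
}
```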

Dashboard template

Create a Keeptrusts governance dashboard with these widgets:

| Widget | Query | Visualization |
|---|---|---|
| Request volume | sum:keeptrusts.gateway.requests{*}.as_count() | Timeseries |
| Block rate | (sum:keeptrusts.gateway.policy.blocks / sum:keeptrusts.gateway.requests) * 100 | Query value (%) |
| P95 latency | p95:keeptrusts.gateway.latency{*} | Timeseries |
| Policy blocks by name | sum:keeptrusts.gateway.policy.blocks{*} by {policy_name}.as_count() | Top list |
| Upstream errors | sum:keeptrusts.gateway.upstream.errors{*} by {provider}.as_count() | Bar chart |
| Escalation trend | sum:keeptrusts.gateway.policy.escalations{*}.as_count() | Timeseries |
| Model usage | sum:keeptrusts.gateway.requests{*} by {model}.as_count() | Pie chart |

Dashboard JSON (import)

{
  "title": "Keeptrusts AI Governance",
  "description": "Real-time AI governance monitoring",
  "widgets": [
    {
      "definition": {
        "title": "Request Volume",
        "type": "timeseries",
        "requests": [
          {
            "q": "sum:keeptrusts.gateway.requests{env:production}.as_count()",
            "display_type": "bars"
          }
        ]
      }
    },
    {
      "definition": {
        "title": "Policy Block Rate",
        "type": "query_value",
        "requests": [
          {
            "q": "(sum:keeptrusts.gateway.policy.blocks{env:production}.as_count() / sum:keeptrusts.gateway.requests{env:production}.as_count()) * 100",
            "aggregator": "avg"
          }
        ],
        "precision": 2,
        "custom_unit": "%"
      }
    }
  ]
}

Anomaly detection

Set up anomaly monitors for unusual AI usage patterns:

Metric: keeptrusts.gateway.policy.blocks
Algorithm: agile
Deviations: 3
Window: 1h
Alert: "Anomalous spike in AI policy blocks detected"
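The same monitor can also be created programmatically via the Datadog Monitors API (POST /api/v1/monitor). A sketch of the payload follows; the query uses Datadog's `anomalies()` function, and the notification handle and threshold windows are example values to adapt.

```json
{
  "name": "Anomalous spike in AI policy blocks detected",
  "type": "query alert",
  "query": "avg(last_1h):anomalies(sum:keeptrusts.gateway.policy.blocks{*}.as_count(), 'agile', 3) >= 1",
  "message": "Policy block volume deviates from the expected pattern. @slack-ai-governance",
  "options": {
    "thresholds": { "critical": 1.0, "critical_recovery": 0.0 },
    "threshold_windows": {
      "trigger_window": "last_1h",
      "recovery_window": "last_15m"
    }
  }
}
```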

Create the monitor:

  1. Go to Monitors → New Monitor → Anomaly
  2. Select metric keeptrusts.gateway.policy.blocks
  3. Set algorithm to Agile, deviations to 3
  4. Configure notification to your Slack channel or PagerDuty

SLO tracking

Track AI governance SLOs:

| SLO | Target | Metric |
|---|---|---|
| Gateway availability | 99.9% | keeptrusts.gateway.requests with no 5xx |
| Policy evaluation latency | P95 < 100 ms | keeptrusts.gateway.latency |
| Event delivery success | 99.5% | Webhook delivery success rate |

SLO: Gateway Availability
Type: Monitor-based
Monitor: "Keeptrusts Gateway Health Check"
Target: 99.9% over 30 days
Warning: 99.95%
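When sizing these targets it helps to translate availability percentages into an error budget. A small sketch of that arithmetic:

```python
# Sketch: minutes of allowed downtime for an availability target over a
# rolling window (e.g. 99.9% over 30 days ≈ 43.2 minutes).
def error_budget_minutes(target_pct, window_days=30):
    """Error budget in minutes for a given availability percentage."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - target_pct / 100)
```

At the 99.95% warning threshold the remaining budget is roughly half that, about 21.6 minutes per 30-day window.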

Cost monitoring

Track LLM spend through Keeptrusts metrics:

# Query cost data from the API
kt events tail --format json --limit 100 \
  | jq '[.[] | .estimated_cost] | add'

Create a Datadog custom metric for cost tracking:

# Emit cost metrics from export data
statsd.gauge('keeptrusts.cost.daily_spend', daily_total, tags=['env:production', 'team:engineering'])
statsd.gauge('keeptrusts.cost.per_model', model_cost, tags=['model:gpt-4o', 'env:production'])
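A sketch of how the per-model totals behind those gauges might be aggregated from exported events. The `model` and `estimated_cost` field names follow the jq example above; verify them against your export schema.

```python
# Sketch: sum estimated cost per model from a list of exported event
# dicts, producing the values fed to the per-model cost gauges.
from collections import defaultdict

def cost_by_model(events):
    """Return {model_name: total_estimated_cost} for the given events."""
    totals = defaultdict(float)
    for e in events:
        totals[e.get("model", "unknown")] += e.get("estimated_cost", 0.0)
    return dict(totals)
```

Each entry can then be emitted as a gauge, e.g. `statsd.gauge('keeptrusts.cost.per_model', cost, tags=[f'model:{name}'])`.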

Troubleshooting

| Issue | Cause | Fix |
|---|---|---|
| No metrics in Datadog | DogStatsD not reachable | Verify Agent is running and port 8125 is accessible |
| Logs missing fields | Pipeline parser misconfigured | Check Grok pattern matches Keeptrusts event JSON |
| Dashboard shows no data | Wrong metric name or tag filter | Verify metric names with datadog-agent status |
| Anomaly alerts too noisy | Deviation threshold too low | Increase deviations to 4 or use the robust algorithm |

For AI systems

  • Canonical terms: Keeptrusts gateway, DogStatsD, --statsd-address, --statsd-prefix keeptrusts, Datadog Agent, log pipeline, anomaly detection, SLO.
  • Key metrics: keeptrusts.gateway.requests, keeptrusts.gateway.latency, keeptrusts.gateway.policy.blocks, keeptrusts.gateway.upstream.latency.
  • Integration methods: DogStatsD (real-time metrics), Datadog Agent log collection (structured logs), OTel Collector exporter (APM traces), webhook to Log Intake API (events).
  • Best next pages: SIEM integration, PagerDuty incident response, Kubernetes deployment.

For engineers

  • Prerequisites: Datadog account with API/app keys, Datadog Agent running (DaemonSet or host-based), gateway with --statsd-address 127.0.0.1:8125.
  • Validate: Check datadog-agent status for metric collection, verify metrics appear under keeptrusts.* in Metrics Explorer.
  • Log pipeline: Create a pipeline with filter source:keeptrusts, add Grok parser for event JSON, remap timestamp.
  • Alert tuning: Start anomaly detection with Agile algorithm and 3 deviations; increase to 4 if too noisy.

For leaders

  • Visibility: Real-time dashboards show policy enforcement rates, block trends, and LLM spend across teams and models.
  • SLOs: Track gateway availability (99.9% target) and policy evaluation latency (P95 < 100ms) with monthly error budget tracking.
  • Cost insight: Custom metrics expose daily/weekly AI spend by team, model, and environment for chargeback.
  • Incident readiness: Anomaly alerts detect unusual block spikes before they become user-reported incidents.

Next steps