Monitor AI Governance with Datadog
Datadog provides deep observability into your Keeptrusts deployment — from gateway performance metrics to policy enforcement trends and cost tracking. This guide covers custom metrics, log forwarding, dashboard templates, and alerting.
Use this page when
- You want to monitor Keeptrusts gateway performance and policy enforcement trends in Datadog.
- You need to set up DogStatsD metrics, log forwarding, and custom dashboards for AI governance.
- You are configuring anomaly detection alerts for unusual policy block spikes.
- You need SLO tracking for gateway availability and policy evaluation latency.
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Architecture overview
Keeptrusts Gateway
→ StatsD / DogStatsD metrics → Datadog Agent → Datadog
→ Logs (stdout/stderr) → Datadog Agent log collection → Datadog
→ OTel Collector sidecar → Datadog Exporter → Datadog APM
Keeptrusts API
→ /v1/webhooks → Datadog Log Intake API (real-time events)
→ kt export → scheduled batch → Datadog Log Archives
Prerequisites
- Datadog account with API and application keys
- Datadog Agent installed on gateway hosts or as a Kubernetes DaemonSet
- Keeptrusts gateway running with logging enabled
Datadog Agent configuration
Kubernetes DaemonSet
# datadog-values.yaml (Helm)
datadog:
  apiKey: <DATADOG_API_KEY>
  appKey: <DATADOG_APP_KEY>
  logs:
    enabled: true
    containerCollectAll: true
  apm:
    portEnabled: true
  dogstatsd:
    useHostPort: true
    hostPortConfig:
      hostPort: 8125
agents:
  containers:
    agent:
      env:
        - name: DD_CONTAINER_LABELS_AS_TAGS
          value: '{"app":"service"}'
Install the Agent with Helm:
helm install datadog-agent datadog/datadog \
  -f datadog-values.yaml \
  --namespace monitoring
Host-based Agent
Add to /etc/datadog-agent/conf.d/keeptrusts.d/conf.yaml:
logs:
  - type: file
    path: /var/log/keeptrusts/gateway.log
    service: keeptrusts-gateway
    source: keeptrusts
    sourcecategory: ai-governance
Custom metrics
Gateway performance metrics
Configure the gateway to emit DogStatsD metrics:
kt gateway run \
  --config policy-config.yaml \
  --statsd-address 127.0.0.1:8125 \
  --statsd-prefix keeptrusts
Key metrics emitted:
| Metric | Type | Description |
|---|---|---|
| keeptrusts.gateway.requests | counter | Total requests processed |
| keeptrusts.gateway.latency | histogram | End-to-end request latency (ms) |
| keeptrusts.gateway.policy.blocks | counter | Requests blocked by policy |
| keeptrusts.gateway.policy.escalations | counter | Requests escalated |
| keeptrusts.gateway.policy.redactions | counter | Responses with redacted content |
| keeptrusts.gateway.upstream.latency | histogram | Upstream LLM provider latency (ms) |
| keeptrusts.gateway.upstream.errors | counter | Upstream provider errors |
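If metrics never arrive, it can help to know that DogStatsD metrics are plain UDP datagrams. A minimal standard-library sketch (host, port, and tag values are illustrative) that hand-builds the same counter and histogram formats the gateway emits:

```python
import socket

def dogstatsd_datagram(name: str, value: float, metric_type: str, tags: list[str]) -> str:
    """Build a DogStatsD wire-format datagram, e.g. 'name:1|c|#tag:a,tag:b'."""
    payload = f"{name}:{value}|{metric_type}"
    if tags:
        payload += "|#" + ",".join(tags)
    return payload

# One gateway request, counted ('c') and timed ('h'); values are illustrative
counter = dogstatsd_datagram("keeptrusts.gateway.requests", 1, "c", ["env:production"])
histogram = dogstatsd_datagram("keeptrusts.gateway.latency", 42.5, "h", ["env:production"])

# Send to the local DogStatsD listener the Agent opens on port 8125
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for datagram in (counter, histogram):
    sock.sendto(datagram.encode("ascii"), ("127.0.0.1", 8125))
```

Sending a hand-built datagram like this from a gateway host is a quick way to confirm the Agent's DogStatsD port is reachable before debugging the gateway itself.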
Custom metrics via DogStatsD
# Example: emit custom metrics from a monitoring script (datadogpy)
from datadog import initialize, statsd

# Point the client at the local DogStatsD listener
initialize(statsd_host='127.0.0.1', statsd_port=8125)

# After querying /v1/events
statsd.gauge('keeptrusts.events.pending_escalations', pending_count, tags=['env:production'])
statsd.increment('keeptrusts.events.exported', tags=['format:csv', 'env:production'])
Log forwarding
Real-time via webhook
Forward Keeptrusts events to the Datadog Log Intake API:
curl -X POST https://api.keeptrusts.com/v1/webhooks \
  -H "Authorization: Bearer $KEEPTRUSTS_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://http-intake.logs.datadoghq.com/api/v2/logs",
    "description": "Forward events to Datadog Logs",
    "event_types": ["event.*"],
    "active": true,
    "headers": {
      "DD-API-KEY": "<DATADOG_API_KEY>",
      "Content-Type": "application/json"
    }
  }'
Log pipeline
Create a Datadog log pipeline for Keeptrusts events:
- Go to Logs → Configuration → Pipelines
- Create a new pipeline with the filter source:keeptrusts
- Add processors:
  - Grok Parser: extract action, policy_name, and model from the event JSON
  - Category Processor: map action=block → severity=error and action=escalate → severity=warning
  - Remapper: set timestamp as the official log date
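The Category Processor rule amounts to a small lookup. A sketch of the equivalent mapping for clarity (the 'info' default for other actions is an assumption, not documented Keeptrusts behavior):

```python
# Map a Keeptrusts event's action to a Datadog log severity,
# mirroring the Category Processor rules above.
SEVERITY_BY_ACTION = {
    "block": "error",
    "escalate": "warning",
}

def enrich(event: dict) -> dict:
    """Return a copy of the event with a severity field based on its action."""
    event = dict(event)  # don't mutate the caller's event
    event["severity"] = SEVERITY_BY_ACTION.get(event.get("action"), "info")
    return event

print(enrich({"action": "block", "policy_name": "no-pii", "model": "gpt-4o"}))
```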
Dashboard template
Create a Keeptrusts governance dashboard with these widgets:
Recommended widgets
| Widget | Query | Visualization |
|---|---|---|
| Request volume | sum:keeptrusts.gateway.requests{*}.as_count() | Timeseries |
| Block rate | (sum:keeptrusts.gateway.policy.blocks / sum:keeptrusts.gateway.requests) * 100 | Query value (%) |
| P95 latency | p95:keeptrusts.gateway.latency{*} | Timeseries |
| Policy blocks by name | sum:keeptrusts.gateway.policy.blocks{*} by {policy_name}.as_count() | Top list |
| Upstream errors | sum:keeptrusts.gateway.upstream.errors{*} by {provider}.as_count() | Bar chart |
| Escalation trend | sum:keeptrusts.gateway.policy.escalations{*}.as_count() | Timeseries |
| Model usage | sum:keeptrusts.gateway.requests{*} by {model}.as_count() | Pie chart |
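The block-rate widget is just a ratio of two counters; a quick sketch of the same arithmetic, useful when sanity-checking the query value against raw counts:

```python
def block_rate_percent(blocks: int, requests: int) -> float:
    """Policy block rate as a percentage, matching the query-value widget."""
    if requests == 0:
        return 0.0
    return round(blocks / requests * 100, 2)

# e.g. 37 blocked out of 1,480 requests
print(block_rate_percent(37, 1480))  # 2.5
```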
Dashboard JSON (import)
{
  "title": "Keeptrusts AI Governance",
  "description": "Real-time AI governance monitoring",
  "widgets": [
    {
      "definition": {
        "title": "Request Volume",
        "type": "timeseries",
        "requests": [
          {
            "q": "sum:keeptrusts.gateway.requests{env:production}.as_count()",
            "display_type": "bars"
          }
        ]
      }
    },
    {
      "definition": {
        "title": "Policy Block Rate",
        "type": "query_value",
        "requests": [
          {
            "q": "(sum:keeptrusts.gateway.policy.blocks{env:production}.as_count() / sum:keeptrusts.gateway.requests{env:production}.as_count()) * 100",
            "aggregator": "avg"
          }
        ],
        "precision": 2,
        "custom_unit": "%"
      }
    }
  ]
}
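The dashboard can also be created through the Datadog Dashboards API (POST /api/v1/dashboard) instead of the UI import. A standard-library sketch that builds the request without sending it; the abbreviated payload and placeholder keys are illustrative, and the layout_type field is added here because the API requires it:

```python
import json
import urllib.request

# Placeholder credentials; in practice read them from a secrets manager
DD_API_KEY = "<DATADOG_API_KEY>"
DD_APP_KEY = "<DATADOG_APP_KEY>"

# Abbreviated payload; use the full dashboard JSON shown above
dashboard = {
    "title": "Keeptrusts AI Governance",
    "description": "Real-time AI governance monitoring",
    "layout_type": "ordered",  # required by the API, not by UI import
    "widgets": [],
}

request = urllib.request.Request(
    "https://api.datadoghq.com/api/v1/dashboard",
    data=json.dumps(dashboard).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "DD-API-KEY": DD_API_KEY,
        "DD-APPLICATION-KEY": DD_APP_KEY,
    },
    method="POST",
)
# urllib.request.urlopen(request)  # uncomment to actually create the dashboard
```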
Anomaly detection
Set up anomaly monitors for unusual AI usage patterns:
Metric: keeptrusts.gateway.policy.blocks
Algorithm: agile
Deviations: 3
Window: 1h
Alert: "Anomalous spike in AI policy blocks detected"
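In Datadog's monitor query language, this spec maps to the anomalies() function. A sketch of a Monitors API payload built from the values above (the notification handle and threshold options are illustrative):

```python
import json

# Monitor query combining the metric, algorithm, and deviation count above:
# anomalies() wraps the metric query with the algorithm name and bound width.
query = (
    "avg(last_1h):anomalies("
    "sum:keeptrusts.gateway.policy.blocks{*}.as_count(), 'agile', 3"
    ") >= 1"
)

monitor = {
    "name": "Anomalous spike in AI policy blocks detected",
    "type": "query alert",
    "query": query,
    "message": "Unusual policy block volume; investigate recent prompts. @slack-ai-governance",
    "options": {"thresholds": {"critical": 1.0}},
}

print(json.dumps(monitor, indent=2))
```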
Create the monitor:
- Go to Monitors → New Monitor → Anomaly
- Select the metric keeptrusts.gateway.policy.blocks
- Set the algorithm to Agile and deviations to 3
- Configure notifications to your Slack channel or PagerDuty
SLO tracking
Track AI governance SLOs:
| SLO | Target | Metric |
|---|---|---|
| Gateway availability | 99.9% | keeptrusts.gateway.requests with no 5xx |
| Policy evaluation latency | P95 < 100ms | keeptrusts.gateway.latency |
| Event delivery success | 99.5% | Webhook delivery success rate |
SLO: Gateway Availability
Type: Monitor-based
Monitor: "Keeptrusts Gateway Health Check"
Target: 99.9% over 30 days
Warning: 99.95%
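The 99.9% target translates into a concrete monthly error budget; the arithmetic:

```python
def error_budget_minutes(target_pct: float, days: int = 30) -> float:
    """Minutes of allowed downtime for an availability target over a window."""
    total_minutes = days * 24 * 60  # 43,200 minutes in 30 days
    return round(total_minutes * (1 - target_pct / 100), 1)

print(error_budget_minutes(99.9))   # 43.2 minutes over 30 days
print(error_budget_minutes(99.95))  # 21.6 minutes (the warning threshold)
```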
Cost monitoring
Track LLM spend through Keeptrusts metrics:
# Query cost data from the API
kt events tail --format json --limit 100 \
| jq '[.[] | .estimated_cost] | add'
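The same aggregation in Python, for scripts that already pull events via the API (the estimated_cost field is taken from the jq filter above; sample values are illustrative):

```python
# Sum estimated cost across a batch of events, mirroring the jq pipeline above.
def total_cost(events: list[dict]) -> float:
    return round(sum(e.get("estimated_cost", 0.0) for e in events), 4)

events = [
    {"model": "gpt-4o", "estimated_cost": 0.0123},
    {"model": "gpt-4o-mini", "estimated_cost": 0.0009},
]
print(total_cost(events))  # 0.0132
```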
Create a Datadog custom metric for cost tracking:
# Emit cost metrics from export data (datadogpy)
from datadog import initialize, statsd

initialize(statsd_host='127.0.0.1', statsd_port=8125)

statsd.gauge('keeptrusts.cost.daily_spend', daily_total, tags=['env:production', 'team:engineering'])
statsd.gauge('keeptrusts.cost.per_model', model_cost, tags=['model:gpt-4o', 'env:production'])
Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| No metrics in Datadog | DogStatsD not reachable | Verify Agent is running and port 8125 is accessible |
| Logs missing fields | Pipeline parser misconfigured | Check Grok pattern matches Keeptrusts event JSON |
| Dashboard shows no data | Wrong metric name or tag filter | Verify metric names with datadog-agent status |
| Anomaly alerts too noisy | Deviation threshold too low | Increase deviations to 4 or use robust algorithm |
For AI systems
- Canonical terms: Keeptrusts gateway, DogStatsD, --statsd-address, --statsd-prefix keeptrusts, Datadog Agent, log pipeline, anomaly detection, SLO.
- Key metrics: keeptrusts.gateway.requests, keeptrusts.gateway.latency, keeptrusts.gateway.policy.blocks, keeptrusts.gateway.upstream.latency.
- Integration methods: DogStatsD (real-time metrics), Datadog Agent log collection (structured logs), OTel Collector exporter (APM traces), webhook to Log Intake API (events).
- Best next pages: SIEM integration, PagerDuty incident response, Kubernetes deployment.
For engineers
- Prerequisites: Datadog account with API/app keys, Datadog Agent running (DaemonSet or host-based), gateway started with --statsd-address 127.0.0.1:8125.
- Validate: check datadog-agent status for metric collection, and verify metrics appear under keeptrusts.* in Metrics Explorer.
- Log pipeline: create a pipeline with the filter source:keeptrusts, add a Grok parser for the event JSON, and remap the timestamp.
- Alert tuning: start anomaly detection with the Agile algorithm and 3 deviations; increase to 4 if too noisy.
For leaders
- Visibility: Real-time dashboards show policy enforcement rates, block trends, and LLM spend across teams and models.
- SLOs: Track gateway availability (99.9% target) and policy evaluation latency (P95 < 100ms) with monthly error budget tracking.
- Cost insight: Custom metrics expose daily/weekly AI spend by team, model, and environment for chargeback.
- Incident readiness: Anomaly alerts detect unusual block spikes before they become user-reported incidents.
Next steps
- Feed events to your SIEM for security correlation
- Automate incident response with Datadog-PagerDuty integration
- Deploy on Kubernetes with Datadog Agent DaemonSet