Monitoring & Alerting for AI Gateway Fleet
A production AI gateway fleet requires comprehensive observability. This guide covers health endpoints, metric collection, dashboard templates, and alerting integration for Keeptrusts infrastructure.
Use this page when
- You need to collect Prometheus metrics from the gateway fleet and API
- You are building Grafana dashboards for throughput, latency, policy evaluations, and cost
- You want to configure SLO-based alerting and PagerDuty/Slack integration for AI infrastructure
Audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Health Endpoints
Every Keeptrusts component exposes health endpoints:
| Component | Endpoint | Port | Purpose |
|---|---|---|---|
| Gateway | /healthz | 41002 | Liveness probe |
| Gateway | /readyz | 41002 | Readiness probe |
| API | /healthz | 8080 | Liveness probe |
| API | /readyz | 8080 | Readiness and migration status |
Configure Kubernetes probes:
livenessProbe:
  httpGet:
    path: /healthz
    port: 41002
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /readyz
    port: 41002
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3
The readiness probe returns 503 while the gateway loads its policy configuration, preventing traffic before policies are active.
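If policy loading routinely takes longer than the readiness window (large policy sets, slow config stores), add a startup probe so the liveness probe does not restart the pod mid-load. A minimal sketch, assuming the same /readyz endpoint; the failure budget of roughly 60 seconds is illustrative:

startupProbe:
  httpGet:
    path: /readyz
    port: 41002
  periodSeconds: 5
  failureThreshold: 12   # allow up to ~60s for policy loading before liveness checks take over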
Prometheus Metrics
Gateway Metrics
The gateway exposes Prometheus-format metrics at /metrics:
# Request throughput
keeptrusts_gateway_requests_total{provider="openai",model="gpt-4o",status="200"}
# Latency histograms
keeptrusts_gateway_request_duration_seconds_bucket{provider="openai",le="0.5"}
keeptrusts_gateway_request_duration_seconds_bucket{provider="openai",le="1.0"}
# Policy evaluation
keeptrusts_gateway_policy_evaluations_total{policy="pii_redaction",result="pass"}
keeptrusts_gateway_policy_evaluations_total{policy="pii_redaction",result="block"}
# Token usage
keeptrusts_gateway_tokens_total{direction="input",model="gpt-4o"}
keeptrusts_gateway_tokens_total{direction="output",model="gpt-4o"}
# Active connections
keeptrusts_gateway_active_connections{provider="anthropic"}
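These counters compose directly in PromQL. For example, the share of 5xx responses per provider over the last five minutes (a sketch; adjust label values to your deployment):

# Fraction of gateway requests returning 5xx, per provider
sum(rate(keeptrusts_gateway_requests_total{status=~"5.."}[5m])) by (provider)
  / sum(rate(keeptrusts_gateway_requests_total[5m])) by (provider)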
API Metrics
The API server exposes:
# Event ingestion
keeptrusts_api_events_ingested_total{source="gateway"}
# Export jobs
keeptrusts_api_export_jobs_total{status="completed"}
keeptrusts_api_export_jobs_total{status="failed"}
# Database connections
keeptrusts_api_db_pool_connections_active
keeptrusts_api_db_pool_connections_idle
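Pool saturation is a useful derived signal: as the ratio below approaches 1.0, the API is running out of database connections and will start queuing queries. A sketch built from the two gauges above:

# Fraction of the DB connection pool in use
keeptrusts_api_db_pool_connections_active
  / (keeptrusts_api_db_pool_connections_active + keeptrusts_api_db_pool_connections_idle)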
Scrape Configuration
Add Keeptrusts targets to your Prometheus configuration:
# prometheus.yml
scrape_configs:
  - job_name: keeptrusts-gateway
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: keeptrusts-gateway
        action: keep
      # Rewrite the scrape address: keep the pod IP from __address__ and
      # take the port from the prometheus.io/port annotation.
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        regex: ([^:]+)(?::\d+)?;(\d+)
        target_label: __address__
        replacement: "${1}:${2}"
    metrics_path: /metrics
    scrape_interval: 15s
  - job_name: keeptrusts-api
    static_configs:
      - targets: ["keeptrusts-api:8080"]
    metrics_path: /metrics
    scrape_interval: 15s
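If you run the Prometheus Operator, a ServiceMonitor replaces the hand-written scrape job. A minimal sketch, assuming the gateway Service carries the label app: keeptrusts-gateway and names its 41002 port "metrics" (both are assumptions about your manifests):

# servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: keeptrusts-gateway
spec:
  selector:
    matchLabels:
      app: keeptrusts-gateway
  endpoints:
    - port: metrics      # the named Service port exposing 41002
      path: /metrics
      interval: 15s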
Grafana Dashboards
Gateway Overview Dashboard
Create a dashboard with these essential panels:
Request Rate
sum(rate(keeptrusts_gateway_requests_total[5m])) by (provider, status)
P99 Latency by Provider
histogram_quantile(0.99, sum(rate(keeptrusts_gateway_request_duration_seconds_bucket[5m])) by (le, provider))
Policy Block Rate
sum(rate(keeptrusts_gateway_policy_evaluations_total{result="block"}[5m]))
/
sum(rate(keeptrusts_gateway_policy_evaluations_total[5m]))
Token Throughput
sum(rate(keeptrusts_gateway_tokens_total[5m])) by (direction, model)
Cost Tracking Dashboard
Hourly Spend by Team
sum(increase(keeptrusts_gateway_request_cost_total[1h])) by (team)
Budget Utilization
keeptrusts_wallet_balance / keeptrusts_wallet_allocated * 100
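Rather than waiting for someone to notice the dashboard, alert on budget utilization directly. A sketch using the same wallet gauges; the 10% threshold is illustrative:

- alert: WalletBudgetNearlyExhausted
  expr: keeptrusts_wallet_balance / keeptrusts_wallet_allocated < 0.10
  for: 15m
  labels:
    severity: warning
    service: keeptrusts
  annotations:
    summary: "Less than 10% of the allocated AI budget remains"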
PagerDuty Integration
Alertmanager Configuration
Route Keeptrusts alerts to PagerDuty:
# alertmanager.yml
route:
  receiver: default
  routes:
    - match:
        severity: critical
        service: keeptrusts
      receiver: keeptrusts-pagerduty
      group_wait: 30s
      group_interval: 5m
receivers:
  - name: keeptrusts-pagerduty
    pagerduty_configs:
      - service_key: "<PAGERDUTY_INTEGRATION_KEY>"
        severity: critical
        description: "{{ .CommonAnnotations.summary }}"
        details:
          component: "{{ .CommonLabels.component }}"
          runbook: "{{ .CommonAnnotations.runbook_url }}"
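For warnings that should notify but not page, add a Slack receiver alongside PagerDuty and point a second route (for example, matching severity: warning) at it. A sketch; the webhook URL and channel are placeholders:

- name: keeptrusts-slack
  slack_configs:
    - api_url: "<SLACK_WEBHOOK_URL>"
      channel: "#keeptrusts-alerts"
      title: "{{ .CommonAnnotations.summary }}"
      text: "component: {{ .CommonLabels.component }}"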
Critical Alerts
Define these Prometheus alerting rules:
# keeptrusts-alerts.yml
groups:
  - name: keeptrusts.critical
    rules:
      - alert: GatewayDown
        expr: up{job="keeptrusts-gateway"} == 0
        for: 2m
        labels:
          severity: critical
          service: keeptrusts
        annotations:
          summary: "Keeptrusts gateway instance is down"
          runbook_url: "https://ops.example.com/runbooks/gateway-down"
      - alert: HighPolicyBlockRate
        expr: >
          sum(rate(keeptrusts_gateway_policy_evaluations_total{result="block"}[5m]))
          / sum(rate(keeptrusts_gateway_policy_evaluations_total[5m])) > 0.5
        for: 5m
        labels:
          severity: warning
          service: keeptrusts
        annotations:
          summary: "Policy block rate above 50% for 5 minutes"
      - alert: GatewayLatencyHigh
        expr: >
          histogram_quantile(0.99, sum(rate(keeptrusts_gateway_request_duration_seconds_bucket[5m])) by (le)) > 10
        for: 5m
        labels:
          severity: critical
          service: keeptrusts
        annotations:
          summary: "Gateway P99 latency exceeds 10 seconds"
      - alert: APIEventIngestionStalled
        expr: rate(keeptrusts_api_events_ingested_total[10m]) == 0
        for: 10m
        labels:
          severity: critical
          service: keeptrusts
        annotations:
          summary: "No events ingested in the last 10 minutes"
SLO Tracking
Define SLOs for your AI governance layer:
| SLO | Target | Measurement |
|---|---|---|
| Gateway availability | 99.9% | up{job="keeptrusts-gateway"} |
| Request success rate | 99.5% | Non-5xx responses / total requests |
| P99 latency overhead | < 200ms | Gateway latency minus provider latency |
| Policy evaluation time | < 50ms | keeptrusts_gateway_policy_duration_seconds |
| Event delivery | 99.9% | Events ingested / events emitted |
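Recording rules keep SLO queries cheap and give alerts a stable series to reference. A sketch for the request success rate SLO; the rule name keeptrusts:request_success_ratio:rate5m is this example's convention, not a shipped rule:

groups:
  - name: keeptrusts.slo
    rules:
      - record: keeptrusts:request_success_ratio:rate5m
        expr: >
          1 - (
            sum(rate(keeptrusts_gateway_requests_total{status=~"5.."}[5m]))
            / sum(rate(keeptrusts_gateway_requests_total[5m]))
          )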
Error Budget Calculation
# 30-day error budget remaining
1 - (
  sum(increase(keeptrusts_gateway_requests_total{status=~"5.."}[30d]))
  / sum(increase(keeptrusts_gateway_requests_total[30d]))
) / (1 - 0.999)
When the error budget drops below 25%, freeze non-critical changes and focus on reliability.
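To make that freeze trigger automatically rather than by inspection, alert when the remaining budget crosses the threshold. A sketch wrapping the query above:

- alert: ErrorBudgetLow
  expr: >
    (
      1 - (
        sum(increase(keeptrusts_gateway_requests_total{status=~"5.."}[30d]))
        / sum(increase(keeptrusts_gateway_requests_total[30d]))
      ) / (1 - 0.999)
    ) < 0.25
  for: 1h
  labels:
    severity: warning
    service: keeptrusts
  annotations:
    summary: "Less than 25% of the 30-day error budget remains"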
OpenTelemetry Collector
For environments using OpenTelemetry, configure the sidecar collector:
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  prometheusremotewrite:
    endpoint: "http://victoria-metrics:8428/api/v1/write"
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheusremotewrite]
Next steps
- Configure Secret Management for alerting integration credentials
- Set up Disaster Recovery procedures for monitoring infrastructure
- Review Capacity Management for cost-focused dashboards
For AI systems
- Canonical terms: Prometheus metrics, Grafana dashboard, PagerDuty, SLO, health endpoints, `/metrics`, `/healthz`, `/readyz`
- Gateway metrics: `keeptrusts_gateway_requests_total`, `keeptrusts_gateway_request_duration_seconds_bucket`, `keeptrusts_gateway_policy_evaluations_total`, `keeptrusts_gateway_tokens_total`, `keeptrusts_gateway_active_connections`
- API metrics: `keeptrusts_api_events_ingested_total`, `keeptrusts_api_export_jobs_total`
- Health endpoints: gateway `/healthz` + `/readyz` (port 41002), API `/healthz` + `/readyz` (port 8080)
- Related pages: Capacity Management, Secret Management, Disaster Recovery
For engineers
- Scrape `/metrics` on port 41002 (gateway) and 8080 (API) with a Prometheus ServiceMonitor or static config
- Configure Kubernetes liveness probes on `/healthz` and readiness probes on `/readyz`
- Readiness returns `503` while policies are loading; use this to gate traffic during startup
- Build Grafana dashboards using `keeptrusts_gateway_requests_total` (throughput), `_duration_seconds_bucket` (latency), and `_policy_evaluations_total` (enforcement)
- Alert on: error rate > 1%, p99 latency > 2s, policy block rate spike, export job failures
- Use an OTel collector sidecar to forward metrics to VictoriaMetrics or Prometheus remote write
For leaders
- SLO tracking (availability, latency, policy enforcement coverage) provides a single health signal for AI infrastructure
- Alert integration with PagerDuty/Slack ensures the on-call team responds to gateway degradation before users notice
- Policy evaluation metrics quantify governance coverage — report the percentage of requests evaluated to auditors
- Cost metrics (via wallet/token tracking) feed into chargeback dashboards for finance visibility
- Dashboard templates enable rapid observability setup without custom development