Monitoring & Alerting for AI Gateway Fleet

A production AI gateway fleet requires comprehensive observability. This guide covers health endpoints, metric collection, dashboard templates, and alerting integration for Keeptrusts infrastructure.

Use this page when

  • You need to collect Prometheus metrics from the gateway fleet and API
  • You are building Grafana dashboards for throughput, latency, policy evaluations, and cost
  • You want to configure SLO-based alerting and PagerDuty/Slack integration for AI infrastructure

Primary audience

  • Primary: Technical Engineers
  • Secondary: AI Agents, Technical Leaders

Health Endpoints

Every Keeptrusts component exposes health endpoints:

Component   Endpoint   Port    Purpose
Gateway     /healthz   41002   Liveness probe
Gateway     /readyz    41002   Readiness probe
API         /healthz   8080    Liveness probe
API         /readyz    8080    Readiness and migration status

Configure Kubernetes probes:

livenessProbe:
  httpGet:
    path: /healthz
    port: 41002
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /readyz
    port: 41002
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3

The readiness probe returns 503 while the gateway loads its policy configuration, preventing traffic before policies are active.
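
If the gateway manages a large policy set, the initial load can outrun the liveness probe. A Kubernetes startupProbe can hold liveness checks until /readyz first succeeds; a minimal sketch, assuming the load completes within about 60 seconds (tune failureThreshold to your fleet):

startupProbe:
  httpGet:
    path: /readyz
    port: 41002
  periodSeconds: 2
  failureThreshold: 30  # tolerates up to ~60s of policy loading before restarting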

Prometheus Metrics

Gateway Metrics

The gateway exposes Prometheus-format metrics at /metrics:

# Request throughput
keeptrusts_gateway_requests_total{provider="openai",model="gpt-4o",status="200"}

# Latency histograms
keeptrusts_gateway_request_duration_seconds_bucket{provider="openai",le="0.5"}
keeptrusts_gateway_request_duration_seconds_bucket{provider="openai",le="1.0"}

# Policy evaluation
keeptrusts_gateway_policy_evaluations_total{policy="pii_redaction",result="pass"}
keeptrusts_gateway_policy_evaluations_total{policy="pii_redaction",result="block"}

# Token usage
keeptrusts_gateway_tokens_total{direction="input",model="gpt-4o"}
keeptrusts_gateway_tokens_total{direction="output",model="gpt-4o"}

# Active connections
keeptrusts_gateway_active_connections{provider="anthropic"}

API Metrics

The API server exposes:

# Event ingestion
keeptrusts_api_events_ingested_total{source="gateway"}

# Export jobs
keeptrusts_api_export_jobs_total{status="completed"}
keeptrusts_api_export_jobs_total{status="failed"}

# Database connections
keeptrusts_api_db_pool_connections_active
keeptrusts_api_db_pool_connections_idle

Scrape Configuration

Add Keeptrusts targets to your Prometheus configuration:

# prometheus.yml
scrape_configs:
  - job_name: keeptrusts-gateway
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: keeptrusts-gateway
        action: keep
      # point the scrape address at the pod IP on the gateway's metrics port
      - source_labels: [__meta_kubernetes_pod_ip]
        target_label: __address__
        replacement: "${1}:41002"
    metrics_path: /metrics
    scrape_interval: 15s

  - job_name: keeptrusts-api
    static_configs:
      - targets: ["keeptrusts-api:8080"]
    metrics_path: /metrics
    scrape_interval: 15s
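
If you run the Prometheus Operator, a ServiceMonitor replaces the static gateway job above. A sketch, assuming a Service labeled app: keeptrusts-gateway that names its metrics port "metrics"; the release label must match your Prometheus instance's serviceMonitorSelector:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: keeptrusts-gateway
  labels:
    release: prometheus  # assumption: the selector used by your Prometheus instance
spec:
  selector:
    matchLabels:
      app: keeptrusts-gateway
  endpoints:
    - port: metrics      # assumption: named port on the gateway Service
      path: /metrics
      interval: 15s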

Grafana Dashboards

Gateway Overview Dashboard

Create a dashboard with these essential panels:

Request Rate

sum(rate(keeptrusts_gateway_requests_total[5m])) by (provider, status)

P99 Latency by Provider

histogram_quantile(0.99, sum(rate(keeptrusts_gateway_request_duration_seconds_bucket[5m])) by (le, provider))

Policy Block Rate

sum(rate(keeptrusts_gateway_policy_evaluations_total{result="block"}[5m]))
/
sum(rate(keeptrusts_gateway_policy_evaluations_total[5m]))

Token Throughput

sum(rate(keeptrusts_gateway_tokens_total[5m])) by (direction, model)

Cost Tracking Dashboard

Hourly Spend by Team

sum(increase(keeptrusts_gateway_request_cost_total[1h])) by (team)

Budget Utilization

keeptrusts_wallet_balance / keeptrusts_wallet_allocated * 100
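
To keep these dashboards reproducible rather than hand-built, they can be provisioned from files checked into version control. A minimal Grafana provisioning sketch; the folder name and filesystem path are assumptions:

# /etc/grafana/provisioning/dashboards/keeptrusts.yaml
apiVersion: 1
providers:
  - name: keeptrusts
    folder: Keeptrusts
    type: file
    options:
      path: /var/lib/grafana/dashboards/keeptrusts  # JSON dashboard files live here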

PagerDuty Integration

Alertmanager Configuration

Route Keeptrusts alerts to PagerDuty:

# alertmanager.yml
route:
  receiver: default
  routes:
    - match:
        severity: critical
        service: keeptrusts
      receiver: keeptrusts-pagerduty
      group_wait: 30s
      group_interval: 5m

receivers:
  - name: keeptrusts-pagerduty
    pagerduty_configs:
      - service_key: "<PAGERDUTY_INTEGRATION_KEY>"
        severity: critical
        description: "{{ .CommonAnnotations.summary }}"
        details:
          component: "{{ .CommonLabels.component }}"
          runbook: "{{ .CommonAnnotations.runbook_url }}"
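
Warning-severity alerts are usually better routed to Slack than to a pager. A sketch of an additional route and receiver, assuming a Slack incoming webhook; the channel name is an assumption:

route:
  routes:
    - match:
        severity: warning
        service: keeptrusts
      receiver: keeptrusts-slack

receivers:
  - name: keeptrusts-slack
    slack_configs:
      - api_url: "<SLACK_WEBHOOK_URL>"
        channel: "#ai-infra-alerts"
        title: "{{ .CommonAnnotations.summary }}"
        text: "Runbook: {{ .CommonAnnotations.runbook_url }}"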

Critical Alerts

Define these Prometheus alerting rules:

# keeptrusts-alerts.yml
groups:
  - name: keeptrusts.critical
    rules:
      - alert: GatewayDown
        expr: up{job="keeptrusts-gateway"} == 0
        for: 2m
        labels:
          severity: critical
          service: keeptrusts
        annotations:
          summary: "Keeptrusts gateway instance is down"
          runbook_url: "https://ops.example.com/runbooks/gateway-down"

      - alert: HighPolicyBlockRate
        expr: >
          sum(rate(keeptrusts_gateway_policy_evaluations_total{result="block"}[5m]))
          / sum(rate(keeptrusts_gateway_policy_evaluations_total[5m])) > 0.5
        for: 5m
        labels:
          severity: warning
          service: keeptrusts
        annotations:
          summary: "More than 50% of requests are being blocked by policy"

      - alert: GatewayLatencyHigh
        expr: >
          histogram_quantile(0.99, sum(rate(keeptrusts_gateway_request_duration_seconds_bucket[5m])) by (le)) > 10
        for: 5m
        labels:
          severity: critical
          service: keeptrusts
        annotations:
          summary: "Gateway P99 latency exceeds 10 seconds"

      - alert: APIEventIngestionStalled
        expr: rate(keeptrusts_api_events_ingested_total[10m]) == 0
        for: 10m
        labels:
          severity: critical
          service: keeptrusts
        annotations:
          summary: "No events ingested in the last 10 minutes"

SLO Tracking

Define SLOs for your AI governance layer:

SLO                      Target    Measurement
Gateway availability     99.9%     up{job="keeptrusts-gateway"}
Request success rate     99.5%     Non-5xx responses / total requests
P99 latency overhead     < 200ms   Gateway latency minus provider latency
Policy evaluation time   < 50ms    keeptrusts_gateway_policy_duration_seconds
Event delivery           99.9%     Events ingested / events emitted
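
Recording rules keep these SLIs cheap to query and identical across dashboards and alerts. A sketch for the first two rows; the rule names are assumptions, not Keeptrusts conventions:

groups:
  - name: keeptrusts.sli
    rules:
      # fraction of requests that did not return a 5xx
      - record: keeptrusts:request_success_ratio:rate5m
        expr: >
          sum(rate(keeptrusts_gateway_requests_total{status!~"5.."}[5m]))
          / sum(rate(keeptrusts_gateway_requests_total[5m]))
      # fleet-wide gateway availability over the last 5 minutes
      - record: keeptrusts:gateway_availability:avg5m
        expr: avg(avg_over_time(up{job="keeptrusts-gateway"}[5m]))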

Error Budget Calculation

# 30-day error budget remaining, as a fraction of the total budget,
# measured against the 99.5% request success SLO defined above
1 - (
  sum(increase(keeptrusts_gateway_requests_total{status=~"5.."}[30d]))
  / sum(increase(keeptrusts_gateway_requests_total[30d]))
) / (1 - 0.995)

When the error budget drops below 25%, freeze non-critical changes and focus on reliability.
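
Rather than paging on remaining budget directly, a multiwindow burn-rate alert catches fast burns early. A sketch against the 99.5% request success SLO; the 14.4x multiplier is the common fast-burn threshold for a 30-day window (a burn at that rate exhausts the budget in about two days), and the exact values are assumptions:

- alert: ErrorBudgetBurnFast
  expr: >
    (sum(rate(keeptrusts_gateway_requests_total{status=~"5.."}[1h]))
      / sum(rate(keeptrusts_gateway_requests_total[1h])) > 14.4 * (1 - 0.995))
    and
    (sum(rate(keeptrusts_gateway_requests_total{status=~"5.."}[5m]))
      / sum(rate(keeptrusts_gateway_requests_total[5m])) > 14.4 * (1 - 0.995))
  for: 2m
  labels:
    severity: critical
    service: keeptrusts
  annotations:
    summary: "Error budget burning at more than 14.4x the sustainable rate"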

OpenTelemetry Collector

For environments using OpenTelemetry, configure the sidecar collector:

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  prometheusremotewrite:
    endpoint: "http://victoria-metrics:8428/api/v1/write"

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheusremotewrite]
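
The same sidecar can also scrape the gateway's /metrics endpoint through the collector's prometheus receiver (included in the contrib distribution), so scraped and OTLP metrics share one pipeline. A sketch, assuming the sidecar shares the pod network namespace with the gateway:

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: keeptrusts-gateway
          scrape_interval: 15s
          static_configs:
            - targets: ["localhost:41002"]  # gateway reachable on localhost in-pod

service:
  pipelines:
    metrics:
      receivers: [otlp, prometheus]
      exporters: [prometheusremotewrite]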

Next steps

For AI systems

  • Canonical terms: Prometheus metrics, Grafana dashboard, PagerDuty, SLO, health endpoints, /metrics, /healthz, /readyz
  • Gateway metrics: keeptrusts_gateway_requests_total, keeptrusts_gateway_request_duration_seconds_bucket, keeptrusts_gateway_policy_evaluations_total, keeptrusts_gateway_tokens_total, keeptrusts_gateway_active_connections
  • API metrics: keeptrusts_api_events_ingested_total, keeptrusts_api_export_jobs_total
  • Health endpoints: gateway /healthz + /readyz (port 41002), API /healthz + /readyz (port 8080)
  • Related pages: Capacity Management, Secret Management, Disaster Recovery

For engineers

  • Scrape /metrics on port 41002 (gateway) and 8080 (API) with Prometheus ServiceMonitor or static config
  • Configure Kubernetes liveness probes on /healthz and readiness probes on /readyz
  • Readiness returns 503 while policies are loading — use this to gate traffic during startup
  • Build Grafana dashboards using keeptrusts_gateway_requests_total (throughput), _duration_seconds_bucket (latency), and _policy_evaluations_total (enforcement)
  • Alert on: error rate > 1%, p99 latency > 2s, policy block rate spike, export job failures
  • Use OTel collector sidecar to forward metrics to VictoriaMetrics or Prometheus remote write

For leaders

  • SLO tracking (availability, latency, policy enforcement coverage) provides a single health signal for AI infrastructure
  • Alert integration with PagerDuty/Slack ensures the on-call team responds to gateway degradation before users notice
  • Policy evaluation metrics quantify governance coverage — report the percentage of requests evaluated to auditors
  • Cost metrics (via wallet/token tracking) feed into chargeback dashboards for finance visibility
  • Dashboard templates enable rapid observability setup without custom development