Monitoring & Alerting for AI Gateway Fleet
A production AI gateway fleet requires comprehensive observability. This guide covers health endpoints, metric collection, dashboard templates, and alerting integration for Keeptrusts infrastructure.
Use this page when
- You need to collect Prometheus metrics from the gateway fleet and API
- You are building Grafana dashboards for throughput, latency, policy evaluations, and cost
- You want to configure SLO-based alerting and PagerDuty/Slack integration for AI infrastructure
Audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Health Endpoints
Every Keeptrusts component exposes health endpoints:
| Component | Endpoint | Port | Purpose |
|---|---|---|---|
| Gateway | /healthz | 41002 | Liveness probe |
| Gateway | /readyz | 41002 | Readiness probe |
| API | /healthz | 8080 | Liveness probe |
| API | /readyz | 8080 | Readiness and migration status |
Configure Kubernetes probes:
livenessProbe:
  httpGet:
    path: /healthz
    port: 41002
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /readyz
    port: 41002
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3
The readiness probe returns 503 while the gateway loads its policy configuration, preventing traffic before policies are active.
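If policy loading routinely takes longer than the readiness window (large policy sets, slow config stores), add a startup probe so the liveness probe does not restart the pod mid-load. A minimal sketch, assuming the same /readyz endpoint; the failure budget of roughly 60 seconds is illustrative:

startupProbe:
  httpGet:
    path: /readyz
    port: 41002
  periodSeconds: 5
  failureThreshold: 12   # allow up to ~60s for policy loading before liveness checks take over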
Prometheus Metrics
Gateway Metrics
The gateway exposes Prometheus-format metrics at /metrics:
# Request throughput
keeptrusts_gateway_requests_total{provider="openai",model="gpt-4o",status="200"}
# Latency histograms
keeptrusts_gateway_request_duration_seconds_bucket{provider="openai",le="0.5"}
keeptrusts_gateway_request_duration_seconds_bucket{provider="openai",le="1.0"}
# Policy evaluation
keeptrusts_gateway_policy_evaluations_total{policy="pii_redaction",result="pass"}
keeptrusts_gateway_policy_evaluations_total{policy="pii_redaction",result="block"}
# Token usage
keeptrusts_gateway_tokens_total{direction="input",model="gpt-4o"}
keeptrusts_gateway_tokens_total{direction="output",model="gpt-4o"}
# Active connections
keeptrusts_gateway_active_connections{provider="anthropic"}
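These counters compose directly in PromQL. For example, the share of 5xx responses per provider over the last five minutes (a sketch; adjust label values to your deployment):

# Fraction of gateway requests returning 5xx, per provider
sum(rate(keeptrusts_gateway_requests_total{status=~"5.."}[5m])) by (provider)
  / sum(rate(keeptrusts_gateway_requests_total[5m])) by (provider)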
API Metrics
The API server exposes:
# Event ingestion
keeptrusts_api_events_ingested_total{source="gateway"}
# Export jobs
keeptrusts_api_export_jobs_total{status="completed"}
keeptrusts_api_export_jobs_total{status="failed"}
# Database connections
keeptrusts_api_db_pool_connections_active
keeptrusts_api_db_pool_connections_idle
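Pool saturation is a useful derived signal: as the ratio below approaches 1.0, the API is running out of database connections and will start queuing queries. A sketch built from the two gauges above:

# Fraction of the DB connection pool in use
keeptrusts_api_db_pool_connections_active
  / (keeptrusts_api_db_pool_connections_active + keeptrusts_api_db_pool_connections_idle)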
Scrape Configuration
Add Keeptrusts targets to your Prometheus configuration:
# prometheus.yml
scrape_configs:
  - job_name: keeptrusts-gateway
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: keeptrusts-gateway
        action: keep
      # Rewrite the scrape address: keep the pod IP from __address__ and
      # take the port from the prometheus.io/port annotation.
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        regex: ([^:]+)(?::\d+)?;(\d+)
        target_label: __address__
        replacement: "${1}:${2}"
    metrics_path: /metrics
    scrape_interval: 15s
  - job_name: keeptrusts-api
    static_configs:
      - targets: ["keeptrusts-api:8080"]
    metrics_path: /metrics
    scrape_interval: 15s
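If you run the Prometheus Operator, a ServiceMonitor replaces the hand-written scrape job. A minimal sketch, assuming the gateway Service carries the label app: keeptrusts-gateway and names its 41002 port "metrics" (both are assumptions about your manifests):

# servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: keeptrusts-gateway
spec:
  selector:
    matchLabels:
      app: keeptrusts-gateway
  endpoints:
    - port: metrics      # the named Service port exposing 41002
      path: /metrics
      interval: 15s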
Grafana Dashboards
Gateway Overview Dashboard
Create a dashboard with these essential panels:
Request Rate
sum(rate(keeptrusts_gateway_requests_total[5m])) by (provider, status)
P99 Latency by Provider
histogram_quantile(0.99, sum(rate(keeptrusts_gateway_request_duration_seconds_bucket[5m])) by (le, provider))
Policy Block Rate
sum(rate(keeptrusts_gateway_policy_evaluations_total{result="block"}[5m]))
/
sum(rate(keeptrusts_gateway_policy_evaluations_total[5m]))
Token Throughput
sum(rate(keeptrusts_gateway_tokens_total[5m])) by (direction, model)
Cost Tracking Dashboard
Hourly Spend by Team
sum(increase(keeptrusts_gateway_request_cost_total[1h])) by (team)
Budget Utilization
keeptrusts_wallet_balance / keeptrusts_wallet_allocated * 100
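Rather than waiting for someone to notice the dashboard, alert on budget utilization directly. A sketch using the same wallet gauges; the 10% threshold is illustrative:

- alert: WalletBudgetNearlyExhausted
  expr: keeptrusts_wallet_balance / keeptrusts_wallet_allocated < 0.10
  for: 15m
  labels:
    severity: warning
    service: keeptrusts
  annotations:
    summary: "Less than 10% of the allocated AI budget remains"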
PagerDuty Integration
Alertmanager Configuration
Route Keeptrusts alerts to PagerDuty:
# alertmanager.yml
route:
  receiver: default
  routes:
    - match:
        severity: critical
        service: keeptrusts
      receiver: keeptrusts-pagerduty
      group_wait: 30s
      group_interval: 5m
receivers:
  - name: keeptrusts-pagerduty
    pagerduty_configs:
      - service_key: "<PAGERDUTY_INTEGRATION_KEY>"
        severity: critical
        description: "{{ .CommonAnnotations.summary }}"
        details:
          component: "{{ .CommonLabels.component }}"
          runbook: "{{ .CommonAnnotations.runbook_url }}"
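For warnings that should notify but not page, add a Slack receiver alongside PagerDuty and point a second route (for example, matching severity: warning) at it. A sketch; the webhook URL and channel are placeholders:

- name: keeptrusts-slack
  slack_configs:
    - api_url: "<SLACK_WEBHOOK_URL>"
      channel: "#keeptrusts-alerts"
      title: "{{ .CommonAnnotations.summary }}"
      text: "component: {{ .CommonLabels.component }}"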
Critical Alerts
Define these Prometheus alerting rules:
# keeptrusts-alerts.yml
groups:
  - name: keeptrusts.critical
    rules:
      - alert: GatewayDown
        expr: up{job="keeptrusts-gateway"} == 0
        for: 2m
        labels:
          severity: critical
          service: keeptrusts
        annotations:
          summary: "Keeptrusts gateway instance is down"
          runbook_url: "https://ops.example.com/runbooks/gateway-down"
      - alert: HighPolicyBlockRate
        expr: >
          sum(rate(keeptrusts_gateway_policy_evaluations_total{result="block"}[5m]))
          / sum(rate(keeptrusts_gateway_policy_evaluations_total[5m])) > 0.5
        for: 5m
        labels:
          severity: warning
          service: keeptrusts
        annotations:
          summary: "Policy block rate above 50% for 5 minutes"
      - alert: GatewayLatencyHigh
        expr: >
          histogram_quantile(0.99, sum(rate(keeptrusts_gateway_request_duration_seconds_bucket[5m])) by (le)) > 10
        for: 5m
        labels:
          severity: critical
          service: keeptrusts
        annotations:
          summary: "Gateway P99 latency exceeds 10 seconds"
      - alert: APIEventIngestionStalled
        expr: rate(keeptrusts_api_events_ingested_total[10m]) == 0
        for: 10m
        labels:
          severity: critical
          service: keeptrusts
        annotations:
          summary: "No events ingested in the last 10 minutes"
SLO Tracking
Define SLOs for your AI governance layer:
| SLO | Target | Measurement |
|---|---|---|
| Gateway availability | 99.9% | up{job="keeptrusts-gateway"} |
| Request success rate | 99.5% | Non-5xx responses / total requests |
| P99 latency overhead | < 200ms | Gateway latency minus provider latency |
| Policy evaluation time | < 50ms | keeptrusts_gateway_policy_duration_seconds |
| Event delivery | 99.9% | Events ingested / events emitted |
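Recording rules keep SLO queries cheap and give alerts a stable series to reference. A sketch for the request success rate SLO; the rule name keeptrusts:request_success_ratio:rate5m is this example's convention, not a shipped rule:

groups:
  - name: keeptrusts.slo
    rules:
      - record: keeptrusts:request_success_ratio:rate5m
        expr: >
          1 - (
            sum(rate(keeptrusts_gateway_requests_total{status=~"5.."}[5m]))
            / sum(rate(keeptrusts_gateway_requests_total[5m]))
          )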
Error Budget Calculation
# 30-day error budget remaining
1 - (
  sum(increase(keeptrusts_gateway_requests_total{status=~"5.."}[30d]))
  / sum(increase(keeptrusts_gateway_requests_total[30d]))
) / (1 - 0.999)
When the error budget drops below 25%, freeze non-critical changes and focus on reliability.
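To make that freeze trigger automatically rather than by inspection, alert when the remaining budget crosses the threshold. A sketch wrapping the query above:

- alert: ErrorBudgetLow
  expr: >
    (
      1 - (
        sum(increase(keeptrusts_gateway_requests_total{status=~"5.."}[30d]))
        / sum(increase(keeptrusts_gateway_requests_total[30d]))
      ) / (1 - 0.999)
    ) < 0.25
  for: 1h
  labels:
    severity: warning
    service: keeptrusts
  annotations:
    summary: "Less than 25% of the 30-day error budget remains"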
OpenTelemetry Collector
For environments using OpenTelemetry, configure the sidecar collector:
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  prometheusremotewrite:
    endpoint: "http://victoria-metrics:8428/api/v1/write"
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheusremotewrite]
Next steps
- Configure Secret Management for alerting integration credentials
- Set up Disaster Recovery procedures for monitoring infrastructure
- Review Capacity Management for cost-focused dashboards
For AI systems
- Canonical terms: Prometheus metrics, Grafana dashboard, PagerDuty, SLO, health endpoints, `/metrics`, `/healthz`, `/readyz`
- Gateway metrics: `keeptrusts_gateway_requests_total`, `keeptrusts_gateway_request_duration_seconds_bucket`, `keeptrusts_gateway_policy_evaluations_total`, `keeptrusts_gateway_tokens_total`, `keeptrusts_gateway_active_connections`
- API metrics: `keeptrusts_api_events_ingested_total`, `keeptrusts_api_export_jobs_total`
- Health endpoints: gateway `/healthz` + `/readyz` (port 41002), API `/healthz` + `/readyz` (port 8080)
- Related pages: Capacity Management, Secret Management, Disaster Recovery
For engineers
- Scrape `/metrics` on port 41002 (gateway) and 8080 (API) with a Prometheus ServiceMonitor or static config
- Configure Kubernetes liveness probes on `/healthz` and readiness probes on `/readyz`
- Readiness returns `503` while policies are loading; use this to gate traffic during startup
- Build Grafana dashboards using `keeptrusts_gateway_requests_total` (throughput), `_duration_seconds_bucket` (latency), and `_policy_evaluations_total` (enforcement)
- Alert on: error rate > 1%, p99 latency > 2s, policy block rate spike, export job failures
- Use an OTel collector sidecar to forward metrics to VictoriaMetrics or Prometheus remote write
For leaders
- SLO tracking (availability, latency, policy enforcement coverage) provides a single health signal for AI infrastructure
- Alert integration with PagerDuty/Slack ensures the on-call team responds to gateway degradation before users notice
- Policy evaluation metrics quantify governance coverage — report the percentage of requests evaluated to auditors
- Cost metrics (via wallet/token tracking) feed into chargeback dashboards for finance visibility
- Dashboard templates enable rapid observability setup without custom development