DevOps Guide: Operating the AI Gateway in Production

The Keeptrusts gateway is a mission-critical component in your AI infrastructure — every LLM request flows through it. This guide covers production deployment patterns, monitoring, alerting, scaling, and operational runbooks for DevOps engineers.

Use this page when

  • You are deploying Keeptrusts gateways to production (Docker, Kubernetes, or bare metal)
  • You need to configure health checks, monitoring, and alerting for gateway infrastructure
  • You are scaling gateway instances behind a load balancer
  • You need operational runbooks for gateway upgrades, rollbacks, and incident response
  • You are automating gateway deployment with CI/CD pipelines and infrastructure as code

Primary audience

  • Primary: Technical Engineers (DevOps Engineers, SREs, Infrastructure Engineers)
  • Secondary: Platform Engineers, Cloud Architects, Security Engineers

Deployment Architecture

Single Gateway (Development / Small Teams)

# Start the gateway directly
kt gateway run \
  --config policy-config.yaml \
  --port 41002

Docker Deployment

# Gateway container
FROM keeptrusts/gateway:latest
COPY policy-config.yaml /etc/keeptrusts/policy-config.yaml
ENV KEEPTRUSTS_API_URL=https://api.keeptrusts.com
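# Pass KEEPTRUSTS_GATEWAY_TOKEN at runtime (for example, docker run -e KEEPTRUSTS_GATEWAY_TOKEN).
# A Dockerfile ENV cannot read host variables without a matching ARG, and build-time secrets persist in image layers.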
EXPOSE 41002
CMD ["kt", "gateway", "run", "--config", "/etc/keeptrusts/policy-config.yaml", "--port", "41002"]

# docker-compose.yml
services:
  keeptrusts-gateway:
    image: keeptrusts/gateway:latest
    ports:
      - "41002:41002"
    volumes:
      - ./policy-config.yaml:/etc/keeptrusts/policy-config.yaml:ro
    environment:
      KEEPTRUSTS_API_URL: http://keeptrusts-api:8080
      OPENAI_API_KEY: ${OPENAI_API_KEY}
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "kt", "doctor"]
      interval: 30s
      timeout: 10s
      retries: 3

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: keeptrusts-gateway
spec:
  replicas: 3
  selector:
    matchLabels:
      app: keeptrusts-gateway
  template:
    metadata:
      labels:
        app: keeptrusts-gateway
    spec:
      containers:
        - name: gateway
          image: keeptrusts/gateway:latest
          ports:
            - containerPort: 41002
          livenessProbe:
            exec:
              command: ["kt", "doctor"]
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            exec:
              command: ["kt", "doctor"]
            initialDelaySeconds: 5
            periodSeconds: 10
          env:
            - name: KEEPTRUSTS_API_URL
              valueFrom:
                secretKeyRef:
                  name: keeptrusts-secrets
                  key: api-url
          volumeMounts:
            - name: config
              mountPath: /etc/keeptrusts
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: keeptrusts-gateway-config

Health Checks and Diagnostics

Gateway Health

# Comprehensive health check
kt doctor

# Quick connectivity test
kt events list --since 1h --limit 1

# Validate configuration without restarting
kt policy lint --file policy-config.yaml

What kt doctor Checks

| Check | What it validates |
| --- | --- |
| Configuration syntax | YAML parsing and schema validation |
| Provider connectivity | API keys and endpoint reachability |
| Control plane connection | API URL and authentication |
| Policy chain integrity | All referenced policies are valid |
| Event pipeline | Events can be submitted to the API |

Monitoring and Observability

Key Metrics to Monitor

| Metric | Source | Alert threshold |
| --- | --- | --- |
| Gateway request latency (p99) | Gateway metrics | > 2s |
| Error rate | Events with status=error | > 5% |
| Policy evaluation time | Gateway metrics | > 500ms |
| Event submission failures | Gateway logs | > 0 sustained |
| Active connections | Gateway metrics | > 80% capacity |
| Configuration age | Last config reload timestamp | > 24h without refresh |
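
The thresholds in the table can be kept as data so that a monitoring job checks every metric the same way. The following is a minimal Python sketch under stated assumptions: the metric keys (such as `p99_latency_s`) and the `breached` helper are hypothetical names, not part of the gateway's metrics interface, and the values are assumed to be collected by your own monitoring stack. The "event submission failures" row is left out because it depends on a sustained window rather than a single sample.

```python
# Hypothetical metric names; the limits are copied from the table above.
THRESHOLDS = {
    "p99_latency_s": 2.0,            # gateway request latency (p99), in seconds
    "error_rate": 0.05,              # fraction of events with status=error
    "policy_eval_s": 0.5,            # policy evaluation time, in seconds
    "connection_utilization": 0.80,  # active connections divided by capacity
    "config_age_h": 24.0,            # hours since the last config reload
}


def breached(sample: dict[str, float]) -> list[str]:
    """Return the sorted names of metrics in `sample` that exceed their limit.

    Metrics missing from `sample` are skipped rather than treated as breaches.
    """
    return sorted(
        name
        for name, limit in THRESHOLDS.items()
        if name in sample and sample[name] > limit
    )


if __name__ == "__main__":
    sample = {"p99_latency_s": 2.4, "error_rate": 0.01, "config_age_h": 30.0}
    print(breached(sample))
```

Comparisons are strict, matching the ">" in the table, so a value exactly at the limit does not count as a breach.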

Event Pipeline Monitoring

# Verify events are flowing
kt events list --since 5m --limit 5

# Tail events in real-time for debugging
kt events tail

# Check event submission from the API side
curl -H "Authorization: Bearer $API_TOKEN" \
  "https://api.keeptrusts.com/v1/events?since=5m&limit=5"

Log Aggregation

The gateway emits structured logs compatible with standard log aggregation tools. Forward these to your existing logging pipeline:

# Example: Docker logging driver
services:
  keeptrusts-gateway:
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"

Alerting Rules

Critical Alerts (Page)

| Condition | Action |
| --- | --- |
| Gateway unreachable for > 2 minutes | Page on-call, check container health |
| Event submission failures > 10 in 5 minutes | Page on-call, check API connectivity |
| Error rate > 10% for 5 minutes | Page on-call, check upstream providers |

Warning Alerts (Ticket)

| Condition | Action |
| --- | --- |
| P99 latency > 2s for 15 minutes | Create ticket, investigate provider latency |
| Configuration not refreshed in 24h | Create ticket, check git sync |
| Disk usage > 80% on gateway host | Create ticket, rotate logs |
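
Conditions such as "error rate > 10% for 5 minutes" should fire only when the threshold is exceeded for the whole window, not on a single spike. The sketch below shows one way to express that rule in Python. It assumes samples arrive at a fixed interval (one per minute here), and `sustained_breach` is a hypothetical helper name, not a gateway feature.

```python
from collections.abc import Sequence


def sustained_breach(samples: Sequence[float], threshold: float, window: int) -> bool:
    """Return True if the most recent `window` samples all exceed `threshold`.

    Assumes `samples` is ordered oldest to newest at a fixed interval
    (for example, one sample per minute).
    """
    if window <= 0 or len(samples) < window:
        return False  # not enough data to call the breach sustained
    return all(value > threshold for value in samples[-window:])


if __name__ == "__main__":
    error_rates = [0.02, 0.12, 0.15, 0.11, 0.13, 0.14]  # one sample per minute
    # Page only if the last five one-minute samples are all above 10%.
    print(sustained_breach(error_rates, threshold=0.10, window=5))
```

Requiring every sample in the window to breach trades a few minutes of detection delay for fewer false pages. Most alerting systems offer an equivalent "for" duration setting, which is preferable to hand-rolled logic when available.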

Scaling Strategies

Horizontal Scaling

Deploy multiple gateway instances behind a load balancer. The gateway is stateless — all state flows through the control-plane API.

# Kubernetes HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: keeptrusts-gateway-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: keeptrusts-gateway
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
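
To reason about how this autoscaler will behave, it helps to work through the scaling rule Kubernetes documents for the HPA: desired replicas = ceil(current replicas × current metric / target metric), bounded by minReplicas and maxReplicas. The Python sketch below reproduces that arithmetic with the values from the manifest. It is a simplification: it ignores the controller's tolerance band and stabilization windows, and `desired_replicas` is a hypothetical helper name. Note also that CPU utilization is measured relative to the container's CPU request, so the Deployment needs CPU requests set for this metric to work.

```python
import math


def desired_replicas(
    current: int,
    cpu_utilization: float,
    target: float = 70.0,
    min_replicas: int = 2,
    max_replicas: int = 10,
) -> int:
    """Approximate the HPA replica calculation for a CPU utilization target.

    `cpu_utilization` and `target` are percentages of the CPU request.
    """
    raw = math.ceil(current * cpu_utilization / target)
    # Clamp to the bounds configured in the HorizontalPodAutoscaler.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    for current, utilization in [(3, 95.0), (3, 35.0), (8, 140.0)]:
        print(current, utilization, desired_replicas(current, utilization))
```

For example, three replicas averaging 95% CPU scale up, while eight replicas at 140% are capped at the configured maximum of 10.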

Configuration Management at Scale

Use Git-backed configuration sync for consistent policy deployment across all gateway instances:

  1. Store policy configs in a Git repository
  2. Link the repository in Console Settings > Git Repositories
  3. Changes merged to the default branch automatically sync to all gateways

# Validate the policy file locally before merging
kt policy lint --file policy-config.yaml
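
Because merged changes sync to every gateway, a lint gate in the pull request pipeline is a natural safeguard. The following Python sketch wraps the `kt policy lint --file` command shown above. It rests on one assumption that should be verified against your CLI version: that the command exits non-zero when linting fails. The `lint_policies` function and its injectable `run` parameter are hypothetical names introduced for illustration.

```python
import subprocess
from collections.abc import Callable, Sequence


def lint_policies(
    paths: Sequence[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> list[str]:
    """Run `kt policy lint --file <path>` for each path and return the failures.

    Assumption: the CLI exits non-zero when a policy file fails linting.
    `run` is injectable so the function can be exercised without the CLI.
    """
    failed = []
    for path in paths:
        result = run(["kt", "policy", "lint", "--file", path])
        if result.returncode != 0:
            failed.append(path)
    return failed
```

In a CI job, pass the changed policy files and fail the build when the returned list is non-empty, for example with `sys.exit(1 if lint_policies(sys.argv[1:]) else 0)`.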

Rollback Procedures

Configuration Rollback

If a policy change causes issues:

# Validate the previous config version
kt policy lint --file policy-config-previous.yaml

# Redeploy with the previous config
kt gateway run --policy-config policy-config-previous.yaml --port 41002

With Git-backed configs, revert the commit and the sync will pick up the previous version automatically.

Full Gateway Rollback

For container deployments, roll back to the previous image version:

# Kubernetes rollback
kubectl rollout undo deployment/keeptrusts-gateway

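# Docker rollback: pin the previous image tag in docker-compose.yml first,
# then recreate the container (re-running against an unchanged tag such as latest does not roll back)
docker compose up -d --no-deps keeptrusts-gateway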

Operational Runbooks

Gateway Not Responding

  1. Check container status: docker ps or kubectl get pods
  2. Check logs: docker logs keeptrusts-gateway or kubectl logs -l app=keeptrusts-gateway
  3. Run diagnostics: kt doctor
  4. Verify network connectivity to upstream providers
  5. Check control-plane API reachability
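
During an incident it can help to capture the first three steps in one pass so the on-call engineer has a single snapshot to read. The sketch below is one way to do that in Python for the Kubernetes variant of the runbook. The commands are the ones listed above, while `TRIAGE_STEPS` and `collect_triage` are hypothetical names. A zero exit code only means the command ran, so the captured output still needs to be read. Steps 4 and 5 remain manual.

```python
import subprocess

# Commands from steps 1-3 of the runbook (Kubernetes variant).
TRIAGE_STEPS = [
    ("pod status", ["kubectl", "get", "pods"]),
    ("gateway logs", ["kubectl", "logs", "-l", "app=keeptrusts-gateway"]),
    ("diagnostics", ["kt", "doctor"]),
]


def collect_triage(steps=TRIAGE_STEPS, run=subprocess.run):
    """Run each step and return (label, exit code, combined output) tuples.

    Every step runs even if an earlier one fails, so the report is complete.
    `run` is injectable so the function can be exercised without a cluster.
    """
    report = []
    for label, command in steps:
        try:
            result = run(command, capture_output=True, text=True)
            report.append((label, result.returncode, result.stdout + result.stderr))
        except FileNotFoundError as exc:  # kubectl or kt is not installed here
            report.append((label, 127, str(exc)))
    return report
```

Calling `collect_triage()` on a host with `kubectl` and `kt` installed returns one tuple per step, which can be printed or attached to the incident ticket.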

Events Not Appearing in Console

  1. Verify event pipeline: kt events list --since 5m
  2. Check API connectivity from gateway host
  3. Verify API token validity
  4. Check for rate limiting or quota exhaustion

High Latency

  1. Check upstream provider status pages
  2. Review p99 latency by provider: filter events by provider in Console
  3. Check gateway resource utilization (CPU, memory)
  4. Verify network path between gateway and providers

Success Metrics for DevOps

| Metric | Target | Source |
| --- | --- | --- |
| Gateway uptime | 99.9% | Health check monitoring |
| Mean time to deploy config change | Under 15 minutes | Deployment pipeline metrics |
| Event delivery success rate | > 99.9% | Event pipeline monitoring |
| Mean time to recovery | Under 30 minutes | Incident tracking |
| Configuration drift | Zero | Configuration deployment verification |
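
The uptime and recovery targets interact, and a short calculation makes the relationship concrete. The Python sketch below converts an availability target into a downtime budget and compares it with the recovery target. The 30-day period and both function names are assumptions chosen for illustration.

```python
def downtime_budget_minutes(slo: float, period_days: int = 30) -> float:
    """Return the allowed downtime in minutes for an availability SLO."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in (0, 1]")
    total_minutes = period_days * 24 * 60
    return round(total_minutes * (1.0 - slo), 2)


def incidents_within_budget(budget_minutes: float, recovery_minutes: float) -> int:
    """Return how many incidents of `recovery_minutes` each fit in the budget."""
    if recovery_minutes <= 0:
        raise ValueError("recovery_minutes must be positive")
    return int(budget_minutes // recovery_minutes)


if __name__ == "__main__":
    budget = downtime_budget_minutes(0.999)  # 99.9% over 30 days
    print(budget, incidents_within_budget(budget, 30.0))
```

At 99.9% over 30 days the budget is roughly 43 minutes, so a single incident that takes the full 30-minute recovery target consumes most of it. Meeting both targets therefore leaves room for about one such outage per month.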

Next steps

For AI systems

  • Canonical terms: Keeptrusts, gateway deployment, production operations, health checks, scaling, monitoring, alerting
  • Key surfaces: kt gateway run, kt doctor, kt policy lint, Docker Compose, Kubernetes Deployment/Service, Console Dashboard
  • Deployment patterns: single gateway (dev), Docker Compose (small teams), Kubernetes Deployment with replicas (production)
  • Health check: kt doctor used in the Docker Compose healthcheck and Kubernetes liveness/readiness probes
  • Environment variables: KEEPTRUSTS_API_URL, KEEPTRUSTS_GATEWAY_TOKEN, provider key env vars
  • Best next pages: Architecture Overview, Platform Engineer Guide, Gateway Monitoring, Gateway Runtime Features

For engineers

  • Start gateway: kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml
  • Docker health check: ["CMD", "kt", "doctor"] with 30s interval, 10s timeout, 3 retries
  • Kubernetes: deploy as apps/v1 Deployment with replicas: 3, liveness/readiness probes using kt doctor
  • Validate config before deploy: kt policy lint --file policy-config.yaml
  • Verify event flow: kt events list --since 1h --limit 1
  • Git-linked configurations auto-sync policy changes on merge to the default branch
  • Target SLO: 99.9% gateway uptime, with mean time to deploy config changes under 15 minutes

For leaders

  • The gateway is a mission-critical path — every LLM request flows through it, so production deployment requires HA, health monitoring, and automated recovery
  • Docker and Kubernetes deployment patterns provide scalability from single-instance dev to multi-replica production clusters
  • Git-linked configuration sync enables infrastructure-as-code workflows where policy changes follow the same PR review process as application code
  • Gateway operational metrics (uptime, event delivery, config drift) should be tracked alongside application SLOs