Zero-Downtime Upgrade Procedures
Upgrading AI governance infrastructure requires coordination across stateless gateways, a stateful API, and frontend consoles. This guide covers procedures for upgrading each component without service interruption.
Use this page when
- You are upgrading gateway, API, or console components without service interruption
- You need to understand version compatibility, upgrade order, and rollback procedures
- You want to implement rolling updates, blue-green console deployment, or canary gateway releases
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Version Compatibility Matrix
Keeptrusts components follow semantic versioning. Each API minor version supports gateways and consoles on the same minor version or one minor version behind:
| API Version | Gateway Versions | Console Versions | Notes |
|---|---|---|---|
| 1.5.x | 1.4.x – 1.5.x | 1.4.x – 1.5.x | Current |
| 1.4.x | 1.3.x – 1.4.x | 1.3.x – 1.4.x | Supported |
| 1.3.x | 1.2.x – 1.3.x | 1.2.x – 1.3.x | End of life |
Upgrade order: API first, then gateways, then console. The API is backward-compatible with the previous minor version of gateways and consoles.
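The rule in the matrix can be checked before an upgrade starts. The following is a minimal sketch, assuming versions are plain MAJOR.MINOR.PATCH strings; `is_compatible` is a hypothetical helper and not part of any Keeptrusts tooling. It treats a component that is newer than the API as unsupported, since the matrix does not list that combination.

```shell
# Hypothetical helper: encodes the compatibility matrix above.
# Usage: is_compatible <api_version> <component_version>
# Returns 0 when the component is on the API's minor version or one behind.
is_compatible() {
  api_major=${1%%.*}
  api_rest=${1#*.}
  api_minor=${api_rest%%.*}
  comp_major=${2%%.*}
  comp_rest=${2#*.}
  comp_minor=${comp_rest%%.*}

  # Assumption: different major versions are never compatible.
  [ "$api_major" -eq "$comp_major" ] || return 1

  # The component may match the API minor version or trail it by one.
  diff=$((api_minor - comp_minor))
  [ "$diff" -eq 0 ] || [ "$diff" -eq 1 ]
}
```

For example, `is_compatible 1.5.0 1.4.3 && echo "safe to upgrade the API first"` prints the message, while `is_compatible 1.5.0 1.3.9` returns a non-zero status.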
Rolling Gateway Updates
Gateways are stateless — rolling updates are straightforward. The key constraint is maintaining policy enforcement continuity.
Kubernetes Rolling Update
apiVersion: apps/v1
kind: Deployment
metadata:
  name: keeptrusts-gateway
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    spec:
      containers:
        - name: gateway
          image: keeptrusts/gateway:1.5.0
          readinessProbe:
            httpGet:
              path: /readyz
              port: 41002
            initialDelaySeconds: 10
            periodSeconds: 5
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 10"]
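The preStop hook delays shutdown by 10 seconds. Kubernetes runs it before sending the termination signal to the container, which gives the pod time to be removed from service endpoints and lets in-flight requests complete. The readiness probe keeps traffic away from pods that are still loading policy configuration.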
Update Procedure
- Update the image tag in your deployment manifest
- Apply the update:
kubectl set image deployment/keeptrusts-gateway \
  gateway=keeptrusts/gateway:1.5.0 \
  -n keeptrusts
- Monitor the rollout:
kubectl rollout status deployment/keeptrusts-gateway -n keeptrusts
- Verify policy enforcement:
# Send a test request that should be blocked
curl -X POST https://gateway.example.com/v1/chat/completions \
  -H "Authorization: Bearer $TEST_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "test blocked content"}]}'
# Expect 409 if policy is active
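The policy check can be scripted so that it fails loudly in a pipeline. This is a sketch under stated assumptions: `expect_blocked` is a hypothetical helper, and the gateway URL, key, and payload are the placeholders used in this guide. It compares only the HTTP status code against the 409 response described in this guide.

```shell
# Hypothetical helper: sends one request that policy should block and
# checks that the gateway answers with HTTP 409.
# Usage: expect_blocked <gateway_url> <api_key>
expect_blocked() {
  status=$(curl -s -o /dev/null -w "%{http_code}" \
    -X POST "$1/v1/chat/completions" \
    -H "Authorization: Bearer $2" \
    -H "Content-Type: application/json" \
    -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "test blocked content"}]}')

  if [ "$status" = "409" ]; then
    echo "policy enforcement active"
  else
    echo "unexpected status: $status" >&2
    return 1
  fi
}
```

A call such as `expect_blocked https://gateway.example.com "$TEST_KEY"` returns a non-zero status when the request is not blocked, which makes it usable as a gate after each rollout.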
Rollback
kubectl rollout undo deployment/keeptrusts-gateway -n keeptrusts
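Monitoring and rollback can be combined so that a stalled rollout is undone without manual intervention. The sketch below wraps the two kubectl commands from this section; `rollout_or_undo` is a hypothetical name, and the default timeout of 300s is an assumption to tune for your cluster.

```shell
# Hypothetical wrapper: waits for a rollout and undoes it on failure or timeout.
# Usage: rollout_or_undo <deployment> <namespace> [timeout]
rollout_or_undo() {
  deployment=$1
  namespace=$2
  timeout=${3:-300s}   # assumed default; adjust to your rollout duration

  if kubectl rollout status "deployment/$deployment" -n "$namespace" --timeout="$timeout"; then
    echo "rollout complete"
  else
    echo "rollout failed, rolling back" >&2
    kubectl rollout undo "deployment/$deployment" -n "$namespace"
    return 1
  fi
}
```

Usage: `rollout_or_undo keeptrusts-gateway keeptrusts 120s`. The function still returns a non-zero status after a rollback so that calling scripts stop before upgrading later components.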
API Migration Auto-Apply
The API server automatically applies pending database migrations at startup. This enables zero-downtime upgrades when migrations are backward-compatible.
Safe Migration Patterns
Migrations that are safe for zero-downtime deployment:
- Add a column with a default: existing code ignores the new column
- Add a new table: no impact on existing queries
- Add an index concurrently: use CREATE INDEX CONCURRENTLY
- Add a new enum value: existing code handles unknown values
Migrations that require coordination:
- Drop a column: deploy code that stops reading the column first
- Rename a column: use a two-phase approach (add new, migrate, drop old)
- Change a column type: add new column, backfill, switch reads, drop old
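The two-phase rename can be illustrated by splitting the SQL into two separately shipped migrations. This is only a sketch: `rename_column_sql` is a hypothetical helper that prints PostgreSQL statements and executes nothing, and the table and column names in the usage example are invented. It also assumes the application writes to both columns between the phases, and that the table is small enough to backfill in one statement.

```shell
# Hypothetical helper: prints the SQL for one phase of a column rename.
# Usage: rename_column_sql <expand|contract> <table> <old_column> <new_column> <type>
rename_column_sql() {
  case "$1" in
    expand)
      # Phase 1: add the new column and copy existing values into it.
      echo "ALTER TABLE $2 ADD COLUMN $4 $5;"
      echo "UPDATE $2 SET $4 = $3 WHERE $4 IS NULL;"
      ;;
    contract)
      # Phase 2: ship only after no deployed version reads the old column.
      echo "ALTER TABLE $2 DROP COLUMN $3;"
      ;;
    *)
      echo "unknown phase: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `rename_column_sql expand events actor_id actor_ref text` prints the statements for the first migration, and the `contract` phase belongs in a later release once the previous API version is retired.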
Upgrade Procedure
- Deploy the new API version alongside the existing one:
kubectl set image deployment/keeptrusts-api \
  api=keeptrusts/api:1.5.0 \
  -n keeptrusts
- The first pod to start applies migrations — subsequent pods wait
- Monitor migration status:
kubectl logs -l app=keeptrusts-api -n keeptrusts | grep "migration"
- Verify the API health:
curl https://api.example.com/readyz
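Because migrations run at startup, the readiness endpoint may not return 200 immediately. A polling loop avoids checking too early. The sketch below assumes `/readyz` returns 200 once the API is ready; `wait_for_ready` is a hypothetical helper, and the attempt count and delay are arbitrary defaults.

```shell
# Hypothetical helper: polls a URL until it returns HTTP 200.
# Usage: wait_for_ready <url> [attempts] [delay_seconds]
wait_for_ready() {
  url=$1
  attempts=${2:-30}   # assumed default
  delay=${3:-2}       # assumed default, in seconds
  i=1
  while [ "$i" -le "$attempts" ]; do
    status=$(curl -s -o /dev/null -w "%{http_code}" "$url")
    if [ "$status" = "200" ]; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after $attempts attempts" >&2
  return 1
}
```

Usage: `wait_for_ready https://api.example.com/readyz 60 5` waits up to five minutes before giving up.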
Migration Rollback
Migrations are forward-only. If a migration causes issues:
- Deploy the previous API version — it will work with the new schema if migrations are backward-compatible
- Create a corrective migration to undo the problematic change
- Never modify a shipped migration file
Console Blue-Green Deployment
The console is a stateless Next.js application. Blue-green deployment provides instant rollback capability.
Procedure
- Build the new console version:
docker build -t keeptrusts/console:1.5.0 -f console/Dockerfile .
- Deploy to the green environment:
kubectl set image deployment/keeptrusts-console-green \
  console=keeptrusts/console:1.5.0 \
  -n keeptrusts
- Verify the green environment:
# Internal health check
curl https://console-green.internal.example.com/api/health
# Smoke test critical pages
curl -s -o /dev/null -w "%{http_code}" \
  https://console-green.internal.example.com/dashboard
- Switch traffic from blue to green:
kubectl patch service keeptrusts-console \
  -p '{"spec":{"selector":{"version":"green"}}}' \
  -n keeptrusts
- Rollback if needed:
kubectl patch service keeptrusts-console \
  -p '{"spec":{"selector":{"version":"blue"}}}' \
  -n keeptrusts
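The switch and the rollback are the same operation in opposite directions, so both can be expressed as a toggle. The following is a sketch built from the kubectl commands above; `switch_console` is a hypothetical helper, and it assumes the service selector carries a `version` label set to either `blue` or `green`.

```shell
# Hypothetical helper: flips the console service between blue and green.
# Usage: switch_console <namespace>
switch_console() {
  namespace=$1
  current=$(kubectl get service keeptrusts-console -n "$namespace" \
    -o jsonpath='{.spec.selector.version}')

  case "$current" in
    blue)  target=green ;;
    green) target=blue ;;
    *)
      echo "unexpected selector version: '$current'" >&2
      return 1
      ;;
  esac

  kubectl patch service keeptrusts-console -n "$namespace" \
    -p "{\"spec\":{\"selector\":{\"version\":\"$target\"}}}" >/dev/null || return 1
  echo "switched $current -> $target"
}
```

Running `switch_console keeptrusts` once moves traffic to the other environment, and running it again reverts the change. Refusing to act on an unknown selector value guards against patching a service that is not set up for blue-green.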
Environment Variables
Console builds bake NEXT_PUBLIC_* variables at build time. Ensure the green build uses the correct values:
docker build \
--build-arg NEXT_PUBLIC_API_URL=https://api.example.com \
--build-arg NEXT_PUBLIC_GATEWAY_URL=https://gateway.example.com \
-t keeptrusts/console:1.5.0 \
-f console/Dockerfile .
Worker Binary Updates
Worker binaries (worker_export, worker_lifecycle, worker_config) process background jobs. Update them after the API:
- Stop the current worker — it will finish its current job
- Deploy the new version — it picks up where the old one left off
- Verify job processing:
curl https://api.example.com/v1/admin/workers/status \
  -H "Authorization: Bearer $ADMIN_TOKEN"
Workers are safe to restart at any time. In-flight jobs are retried on the next poll cycle.
Pre-Upgrade Checklist
- Read the release notes for breaking changes
- Verify version compatibility matrix
- Back up the database (or verify continuous backup)
- Test the upgrade in a staging environment
- Notify teams of the maintenance window (if applicable)
- Verify monitoring dashboards are accessible
Post-Upgrade Verification
- All health endpoints return 200
- Gateway policy enforcement is active (test a blocked request)
- Console login and navigation work
- Event ingestion is flowing (check event count)
- Export jobs are processing
- No error spikes in monitoring dashboards
Coordinated Multi-Component Upgrade
For major version upgrades affecting all components:
1. API (with migrations)
├── Wait for all pods healthy
├── Verify /readyz returns 200
└── Check migration log
2. Gateways (rolling update)
├── Wait for rollout complete
├── Verify /readyz on all pods
└── Test policy enforcement
3. Console (blue-green switch)
├── Verify green environment
├── Switch traffic
└── Smoke test critical flows
4. Workers (restart)
└── Verify job processing resumes
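The ordering above matters most when a step fails: later components should not be upgraded on top of a broken earlier one. A minimal sketch of that control flow follows. `run_in_order` is a hypothetical driver, and the step names in the usage example are placeholders for functions you would write from the commands in this guide.

```shell
# Hypothetical driver: runs each named step in order and stops at the first failure.
# Usage: run_in_order <step_function>...
run_in_order() {
  for step in "$@"; do
    echo "running $step"
    if ! "$step"; then
      echo "$step failed; later steps skipped" >&2
      return 1
    fi
  done
  echo "all steps complete"
}
```

Usage, with placeholder step functions: `run_in_order upgrade_api upgrade_gateways switch_console restart_workers`. Each step function should return a non-zero status when its verification fails, so the driver halts before the next component.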
Next steps
- Set up Multi-Region deployment for geographic upgrade staging
- Configure Disaster Recovery for rollback procedures
- Review Monitoring & Alerting for upgrade health tracking
For AI systems
- Canonical terms: rolling update, blue-green deployment, version compatibility matrix, upgrade order, maxUnavailable, maxSurge, preStop hook, readiness probe, database migration
- Upgrade order: API first (with migrations), then gateways, then console
- Version compatibility: the API supports gateways and consoles on the same or the previous minor version (e.g., gateway 1.4.x works with API 1.5.x)
- Health endpoints: /readyz (readiness), gateway port 41002
- Related pages: Multi-Region, Disaster Recovery, Monitoring & Alerting
For engineers
- Always upgrade API first — migrations auto-apply at startup and are backward-compatible with previous-minor gateways
- Use kubectl set image deployment/keeptrusts-gateway gateway=keeptrusts/gateway:<version> for rolling gateway updates
- Add preStop: sleep 10 to ensure in-flight requests complete before pod termination
- Monitor rollout with kubectl rollout status deployment/keeptrusts-gateway -n keeptrusts
- For console, deploy the new version to a green environment, verify with smoke tests, then switch traffic
- Rollback: kubectl rollout undo deployment/keeptrusts-gateway -n keeptrusts
- Validate: send a test request that should be blocked and confirm 409 response (policy enforcement active)
For leaders
- Zero-downtime upgrades ensure AI traffic is never interrupted during platform maintenance
- Version compatibility matrix (N-1 support) means you don't have to upgrade all components simultaneously
- API-first upgrade order ensures database migrations are in place before gateways or consoles expect new schemas
- Blue-green console deployment provides instant rollback if the new version has issues
- Canary gateway releases (route 10% traffic to new version) reduce blast radius for gateway changes