Microservices Architecture with AI Gateway
Microservice architectures introduce unique challenges for AI governance — multiple services making independent LLM calls need consistent policy enforcement. This guide covers deployment topologies, service discovery patterns, and configuration propagation strategies.
Use this page when
- You have multiple microservices making independent LLM calls that need consistent policy enforcement
- You are choosing between shared (centralized) and sidecar (per-service) gateway deployments in Kubernetes
- You need service discovery patterns for gateway access in containerized environments
- You want to propagate configuration changes to sidecar gateways without redeploying all services
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Deployment Topologies
Shared Gateway (Centralized)
A single gateway instance serves all microservices. Simple to operate but creates a shared dependency:
Configuration:
```yaml
# docker-compose.yml — shared gateway
services:
  kt-gateway:
    image: keeptrusts/gateway:latest
    ports:
      - "41002:41002"
    environment:
      KEEPTRUSTS_API_URL: http://keeptrusts-api:8080
      OPENAI_API_KEY: ${OPENAI_API_KEY}
    volumes:
      - ./policy-config.yaml:/etc/keeptrusts/policy-config.yaml

  service-a:
    build: ./services/service-a
    environment:
      GATEWAY_URL: http://kt-gateway:41002

  service-b:
    build: ./services/service-b
    environment:
      GATEWAY_URL: http://kt-gateway:41002
```
When to use: Fewer than 10 services, uniform policy requirements, single team manages AI governance.
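Even with a single shared gateway, issuing each service its own gateway key keeps audit attribution intact. A minimal sketch, assuming per-service keys; the `X-Service-Name` header is an illustrative assumption, not a documented gateway header:

```python
# Per-service keys against a shared gateway: requests stay attributable
# in audit logs even though every service talks to the same endpoint.
def auth_headers(service_name: str, keys: dict[str, str]) -> dict[str, str]:
    """Build auth headers for a call through the shared gateway.

    The X-Service-Name header is an illustrative assumption, not a
    documented gateway header.
    """
    return {
        "Authorization": f"Bearer {keys[service_name]}",
        "X-Service-Name": service_name,
    }

# In practice each service would read only its own key from the environment.
service_keys = {"service-a": "kt_key_a", "service-b": "kt_key_b"}
headers_a = auth_headers("service-a", service_keys)
```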
Sidecar Gateway (Per-Service)
Each service runs its own gateway instance with service-specific policies:
Kubernetes sidecar configuration:
```yaml
# k8s/deployment-with-sidecar.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-a
spec:
  template:
    spec:
      containers:
        - name: service-a
          image: myregistry/service-a:latest
          env:
            - name: GATEWAY_URL
              value: "http://localhost:41002"
        - name: kt-gateway
          image: keeptrusts/gateway:latest
          ports:
            - containerPort: 41002
          env:
            - name: KEEPTRUSTS_API_URL
              value: "http://keeptrusts-api.platform.svc:8080"
          volumeMounts:
            - name: policy-config
              mountPath: /etc/keeptrusts
      volumes:
        - name: policy-config
          configMap:
            name: service-a-policy
```
When to use: Service-specific policies, independent scaling, blast radius isolation.
Hybrid Topology
Combine shared and sidecar patterns — a shared gateway for common services and dedicated sidecars for high-security workloads:
```yaml
# Shared gateway for general services
services:
  kt-gateway-shared:
    image: keeptrusts/gateway:latest
    environment:
      KEEPTRUSTS_API_URL: http://keeptrusts-api:8080
    volumes:
      - ./policies/shared-policy.yaml:/etc/keeptrusts/policy-config.yaml

  # Dedicated sidecar for PCI-scoped service
  payment-ai-service:
    build: ./services/payment-ai
    environment:
      GATEWAY_URL: http://localhost:41002

  payment-gateway-sidecar:
    image: keeptrusts/gateway:latest
    network_mode: "service:payment-ai-service"
    volumes:
      - ./policies/pci-policy.yaml:/etc/keeptrusts/policy-config.yaml
```
Service Discovery
DNS-Based Discovery
Services resolve the gateway by DNS name. Works with Docker Compose, Kubernetes, and Consul:
```python
# service_client.py
import os

import httpx

GATEWAY_URL = os.environ.get("GATEWAY_URL", "http://kt-gateway:41002")


async def call_ai(prompt: str) -> dict:
    """Route AI call through discovered gateway."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"{GATEWAY_URL}/v1/chat/completions",
            json={"model": "gpt-4o-mini", "messages": [{"role": "user", "content": prompt}]},
            headers={"Authorization": f"Bearer {os.environ['GATEWAY_KEY']}"},
        )
        response.raise_for_status()
        return response.json()
```
Kubernetes Service
```yaml
# k8s/gateway-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: kt-gateway
  namespace: platform
spec:
  selector:
    app: kt-gateway
  ports:
    - port: 41002
      targetPort: 41002
```
Services in any namespace reach the gateway at http://kt-gateway.platform.svc:41002.
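A small helper can derive this FQDN per environment while still honoring an explicit `GATEWAY_URL` override. A sketch, assuming the service name and port used in this guide's manifests:

```python
import os


def gateway_url(namespace: str = "platform",
                service: str = "kt-gateway",
                port: int = 41002) -> str:
    """Build the in-cluster gateway URL, preferring an explicit override.

    GATEWAY_URL (if set) wins, so local dev and Compose environments can
    point elsewhere without code changes.
    """
    override = os.environ.get("GATEWAY_URL")
    if override:
        return override
    return f"http://{service}.{namespace}.svc:{port}"
```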
Configuration Propagation
ConfigMap-Based (Kubernetes)
Store policy configs in ConfigMaps and propagate changes without redeploying services:
```yaml
# k8s/policy-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: shared-policy
  namespace: platform
data:
  policy-config.yaml: |
    gateway:
      port: 41002
      secret_key_ref:
        env: OPENAI_API_KEY
    policies:
      - name: default
        input:
          - type: content_safety
            action: block
            categories: [hate, violence, self_harm, sexual]
          - type: pii_detection
            action: redact
            entities: [ssn, credit_card]
```
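Before rolling a ConfigMap change out, a pre-deploy sanity check can catch obvious mistakes. A sketch that validates only the fields shown in this guide, treating the parsed YAML as plain dicts; the allowed action set (`block`, `redact`, `log`) is an assumption, not the product's full schema:

```python
def validate_policy_config(cfg: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks sane.

    Checks only fields used in this guide's examples — the real gateway
    may enforce a richer schema.
    """
    problems = []
    gateway = cfg.get("gateway", {})
    if not isinstance(gateway.get("port"), int):
        problems.append("gateway.port must be an integer")
    policies = cfg.get("policies", [])
    if not policies:
        problems.append("at least one policy is required")
    for i, policy in enumerate(policies):
        if "name" not in policy:
            problems.append(f"policies[{i}] is missing a name")
        for rule in policy.get("input", []):
            # Allowed actions are assumed for illustration.
            if rule.get("action") not in {"block", "redact", "log"}:
                problems.append(
                    f"policy {policy.get('name', i)}: unknown action {rule.get('action')!r}"
                )
    return problems
```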
Git-Backed Config Sync
Use the Keeptrusts git sync feature to propagate policies from a git repository:
```bash
# Link a git repository for config sync
curl -X POST https://api.keeptrusts.example/v1/git-repos \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://github.com/org/governance-policies.git",
    "branch": "main",
    "path": "policies/",
    "auto_create_configuration": true,
    "poll_interval_seconds": 300
  }'
```
Changes pushed to the repository automatically propagate to all gateways bound to that configuration.
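A gateway can also be asked to re-fetch its configuration on demand via the control-plane reload endpoint (`POST /v1/gateways/{id}/reload`, noted in the engineering summary below). A minimal request builder, assuming bearer-token auth; the gateway id here is illustrative:

```python
import os
import urllib.request

# Control-plane base URL; falls back to the example host from this guide.
API_BASE = os.environ.get("KEEPTRUSTS_API_URL", "https://api.keeptrusts.example")


def build_reload_request(gateway_id: str, token: str) -> urllib.request.Request:
    """Build the POST for the config-reload endpoint.

    Sending (and error handling) is left to the caller; this only
    constructs the request object.
    """
    return urllib.request.Request(
        f"{API_BASE}/v1/gateways/{gateway_id}/reload",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


# "gw-123" is a made-up id for illustration.
req = build_reload_request("gw-123", "example-token")
```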
Per-Service Policy Overrides
Layer service-specific policies on top of shared base configs:
```
governance-policies/
├── base/
│   └── shared-policy.yaml        # Common safety rules
├── services/
│   ├── analytics/
│   │   └── policy-config.yaml    # Allows larger context windows
│   ├── customer-support/
│   │   └── policy-config.yaml    # Strict PII redaction
│   └── internal-tools/
│       └── policy-config.yaml    # Relaxed output policies
```
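The layering itself can be as simple as a recursive dict merge applied before each service's config is rendered. A sketch, assuming a replace-on-conflict rule for non-dict values (the product may define its own merge semantics):

```python
def merge_policies(base: dict, override: dict) -> dict:
    """Recursively merge a service override onto the shared base config.

    Dicts merge key-by-key; any other value (lists included) in the
    override replaces the base value wholesale — a simple layering rule
    assumed here for illustration.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_policies(merged[key], value)
        else:
            merged[key] = value
    return merged


# e.g. the analytics service raising its context budget over the base:
base = {"gateway": {"port": 41002}, "limits": {"max_tokens": 2048}}
merged = merge_policies(base, {"limits": {"max_tokens": 8192}})
```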
Health Checks and Readiness
Configure health probes so orchestrators route traffic only to healthy gateways:
```yaml
# k8s/deployment.yaml — gateway health probes
containers:
  - name: kt-gateway
    livenessProbe:
      httpGet:
        path: /health
        port: 41002
      initialDelaySeconds: 5
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /health
        port: 41002
      initialDelaySeconds: 3
      periodSeconds: 5
```
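The same readiness logic is useful in init scripts and integration tests that must wait for a gateway before sending traffic. A sketch with the probe injected as a callable (e.g. an HTTP GET against `/health`) so the polling loop itself is testable:

```python
import time
from typing import Callable


def wait_until_healthy(probe: Callable[[], bool],
                       attempts: int = 10,
                       delay: float = 0.5) -> bool:
    """Poll a health probe until it passes, mirroring a readiness probe.

    `probe` is any zero-argument callable returning True when healthy;
    returns False if the gateway never becomes healthy within `attempts`.
    """
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False
```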
Comparison Matrix
| Criterion | Shared Gateway | Sidecar | Hybrid |
|---|---|---|---|
| Operational complexity | Low | High | Medium |
| Policy isolation | Shared | Per-service | Mixed |
| Blast radius | All services | Single service | Scoped |
| Resource overhead | Single instance | N instances | Selective |
| Config propagation | Single point | Per-pod ConfigMap | Both |
| Latency | Network hop | Localhost | Varies |
Key Takeaways
- Start with a shared gateway topology and move to sidecars as policy requirements diverge
- Use Kubernetes ConfigMaps or git-backed sync for policy propagation — avoid baking configs into images
- Layer service-specific policy overrides on top of shared base configurations
- Configure liveness and readiness probes so the orchestrator only routes to healthy gateways
- Use DNS-based service discovery so gateway location is configurable per environment
For AI systems
- Canonical terms: shared gateway, sidecar gateway, service mesh, Kubernetes sidecar injection, GATEWAY_URL, config propagation, per-service policies, consumer groups, gateway keys per service
- Key configuration: docker-compose.yml shared gateway, Kubernetes Deployment with sidecar container, KEEPTRUSTS_API_URL, config reload
- Best next pages: Architecture Patterns for AI-Governed Systems, Capacity Planning, Resilience Engineering
For engineers
- Shared gateway: all services set GATEWAY_URL: http://kt-gateway:41002 — best for fewer than 10 services with uniform policies
- Sidecar gateway: each pod runs its own gateway container at localhost:41002 — best for per-service policy isolation
- In Kubernetes, inject the sidecar via a mutating admission webhook or a manual container spec in the Deployment
- Config propagation: sidecar gateways fetch config from the control-plane API on startup; trigger a reload via POST /v1/gateways/{id}/reload
- Service discovery: use Kubernetes service names (kt-gateway.namespace.svc.cluster.local) for shared gateway access
For leaders
- Shared gateway reduces operational overhead (single deployment to manage) but creates a shared dependency across all services
- Sidecar pattern enables team autonomy — each team owns its policy configuration — but increases total resource consumption and operational surface
- Governance coverage is complete only when every service routes through the gateway — audit for services making direct provider calls
Next steps
- Architecture Patterns for AI-Governed Systems — compare all integration patterns
- Capacity Planning for AI Workloads — size per-service and shared gateway instances
- Security Engineering for AI Pipelines — mTLS between services and gateway
- Resilience Engineering for AI Services — failover when the shared gateway is unavailable