Diagnose Gateway Issues with kt doctor

When requests fail, latency spikes, or policies misbehave, kt doctor runs a structured set of diagnostics. It checks connectivity, configuration, upstream health, and performance, then suggests a fix for each failed check.

Use this page when

Requests are failing, latency is spiking, or policies are misbehaving and you need structured diagnostics.
You want to run kt doctor to validate connectivity, configuration, upstream health, or performance.
You need to generate a support bundle or integrate health checks into Kubernetes/Docker probes.

Primary audience

Primary: Technical Engineers and SREs troubleshooting gateway issues
Secondary: On-call operators, Platform Engineers configuring health probes

Running `kt doctor`

# Full diagnostic suite
kt doctor

# Target a specific gateway
kt doctor --gateway gw-prod-01

# Run specific check categories
kt doctor --checks connectivity,config,upstream

Full diagnostic output

Keeptrusts Gateway Diagnostics
══════════════════════════════

Gateway: gw-prod-01 (v2.4.1)
Config:  /etc/keeptrusts/policy-config.yaml

Connectivity
  ✓ API reachable (https://api.keeptrusts.com — 45ms)
  ✓ Gateway port 41002 listening
  ✓ TLS certificate valid (expires 2025-12-01)
  ✗ DNS resolution for api.keeptrusts.com took 850ms (threshold: 200ms)

Configuration
  ✓ Policy config valid (7 policies, 2 phases)
  ✓ All includes resolved
  ✓ Config variables: 4/4 bound
  ⚠ Deprecated policy type "content_filter_v1" in use

Upstream Providers
  ✓ openai/gpt-4o        reachable (112ms)
  ✓ anthropic/claude-3    reachable (98ms)
  ✗ azure/gpt-4           connection refused

Performance
  ✓ Policy chain eval: 12ms avg (last 100 requests)
  ✓ Memory usage: 142 MB (limit: 512 MB)
  ⚠ P99 latency: 890ms (threshold: 500ms)

Summary: 2 errors, 2 warnings
  → Fix DNS resolution: check /etc/resolv.conf or configure DNS cache
  → Fix azure/gpt-4: verify AZURE_API_KEY and endpoint URL
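
If you save the report to a file, you can tally the result markers with standard tools. This is a minimal sketch, assuming the report uses the same ✗ and ⚠ markers as the sample above. The function name `count_markers` and the file name `doctor-report.txt` are illustrative and not part of the kt CLI.

```shell
#!/usr/bin/env bash
# Tally error (✗) and warning (⚠) lines in a saved kt doctor report.
# Assumption: the report uses the same markers as the sample output.

count_markers() {
  # $1 = marker character, $2 = report file
  # grep -c prints the number of matching lines; it exits 1 when the count
  # is zero, so "|| true" keeps the function from failing in that case.
  grep -c -- "$1" "$2" || true
}

# Usage (hypothetical file name):
#   kt doctor > doctor-report.txt
#   echo "errors:   $(count_markers '✗' doctor-report.txt)"
#   echo "warnings: $(count_markers '⚠' doctor-report.txt)"
```

For the sample report above, the two counts should agree with the Summary line.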

Diagnostic categories

Connectivity checks

Validates that the gateway can reach all required services:

kt doctor --checks connectivity

Check	What it validates
API reachability	TCP connection + HTTP health check to the control-plane API
Gateway port	The configured gateway port is listening and accepting connections
TLS certificate	Certificate validity, expiration, and chain completeness
DNS resolution	Resolution time for all configured hostnames
Firewall rules	Outbound connectivity to provider endpoints on required ports

Configuration checks

Validates the active configuration against the schema and runtime state:

kt doctor --checks config

Check	What it validates
Schema validation	Runs the same validation as `kt policy lint --file policy-config.yaml` against the active config source
Include resolution	All fragment files are accessible and parseable
Variable binding	All `secret_key_ref` and variable references resolve to a value
Policy deprecations	Flags deprecated policy types that will be removed in future versions
Config staleness	Warns if the config file has not been reloaded recently

Upstream provider checks

Tests connectivity and authentication to each configured LLM provider:

kt doctor --checks upstream

Upstream Provider Health
────────────────────────
openai/gpt-4o
  ✓ DNS resolution: 12ms
  ✓ TCP connection: 45ms
  ✓ TLS handshake: 89ms
  ✓ Auth validation: 200 OK
  ✓ Model available: true

azure/gpt-4
  ✓ DNS resolution: 15ms
  ✗ TCP connection: timeout after 5000ms
  ✗ FAILED: Connection refused
  → Check: AZURE_OPENAI_ENDPOINT environment variable
  → Check: Network allows outbound to *.openai.azure.com:443

Performance checks

Analyzes gateway performance using recent request data:

kt doctor --checks performance

Metric	Threshold	Description
Chain eval time	< 50ms avg	Time spent evaluating the policy chain
P50 latency	< 200ms	Median total request latency
P99 latency	< 500ms	Tail latency (excludes upstream time)
Memory usage	< 80% limit	Current RSS relative to configured limit
Open connections	< 80% max	Active connections relative to pool size
Event queue depth	< 1000	Pending events waiting for API submission
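
The memory and open-connection thresholds are percentages of a configured limit. The sketch below shows that arithmetic with the sample report's memory figures (142 MB used, 512 MB limit). The function names are illustrative; kt doctor performs its own evaluation, and this only shows how to reproduce the comparison by hand.

```shell
#!/usr/bin/env bash
# Percentage-of-limit arithmetic behind the "< 80% limit" thresholds.
# Function names are illustrative, not part of the kt CLI.

percent_of() {
  # $1 = current value, $2 = configured limit (same unit, both integers)
  # Integer division, so the result is rounded down.
  echo $(( $1 * 100 / $2 ))
}

under_threshold() {
  # $1 = current, $2 = limit, $3 = threshold percent (e.g. 80)
  # Succeeds (exit 0) when usage is strictly below the threshold.
  [ "$(percent_of "$1" "$2")" -lt "$3" ]
}

# Usage with the sample report's memory figures:
#   percent_of 142 512            # prints 27
#   under_threshold 142 512 80 && echo "memory OK"
```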

Debugging specific issues

Requests returning 409 unexpectedly

# Check which policy is triggering
kt events tail --filter "outcome=blocked" --format detailed

# Inspect the active chain
curl -s http://localhost:41002/keeptrusts/config | jq '.policies.chain'

# Re-run lint on the config source
kt policy lint --file policy-config.yaml

High latency

# Check per-policy latency
kt doctor --checks performance --verbose

# Identify slow policies
kt events tail --filter "latency_ms>500" --format detailed

# Check upstream provider response times
kt doctor --checks upstream --verbose

Events not appearing in the console

# Verify API connectivity
kt doctor --checks connectivity

# Check event queue status
kt doctor --checks performance --verbose | grep "queue"

# Manually test event submission
curl -s -o /dev/null -w "%{http_code}" \
  -X POST https://api.keeptrusts.com/v1/events \
  -H "Authorization: Bearer $KT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"test": true}'
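
The curl command above prints only the HTTP status code. The sketch below is one way to turn that code into a next step. It relies on generic HTTP status classes; the exact codes the events API returns are not specified on this page, so treat the hints as assumptions. `interpret_status` is an illustrative name.

```shell
#!/usr/bin/env bash
# Interpret the status code printed by curl -w "%{http_code}".
# Generic HTTP status classes only; the API's actual responses may differ.

interpret_status() {
  case "$1" in
    000)     echo "no HTTP response: check DNS, network, or TLS" ;;
    2??)     echo "accepted: event submission path is working" ;;
    401|403) echo "rejected: check KT_API_KEY" ;;
    4??)     echo "client error: check the request payload" ;;
    5??)     echo "server error: retry, then escalate" ;;
    *)       echo "unexpected status: $1" ;;
  esac
}

# Usage: capture the output of the curl command above, then
#   interpret_status "$code"
```

curl reports `000` when it receives no HTTP response at all, which usually points back to the connectivity checks.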

Config reload failures

# Validate the new config before reloading
kt policy lint --file /path/to/new-config.yaml

# Check if the gateway process can read the file
kt doctor --checks config --verbose

# Reload the running gateway
kt gateway reload \
  --name local-dev \
  --gateway-url http://localhost:41002 \
  --config-path /path/to/new-config.yaml
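
The lint and reload steps above can be chained so that a reload never runs against a config that fails validation. This is a sketch, assuming `kt policy lint` exits non-zero on an invalid config. The function name `safe_reload` and its argument order are illustrative; the kt commands and flags are the ones shown above.

```shell
#!/usr/bin/env bash
# Reload only when lint succeeds.
# Assumption: kt policy lint exits non-zero when the config is invalid.

safe_reload() {
  local name="$1" url="$2" config="$3"

  if ! kt policy lint --file "$config"; then
    echo "lint failed, not reloading: $config" >&2
    return 1
  fi

  kt gateway reload \
    --name "$name" \
    --gateway-url "$url" \
    --config-path "$config"
}

# Usage:
#   safe_reload local-dev http://localhost:41002 /path/to/new-config.yaml
```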

Continuous health monitoring

Run kt doctor as a health check in your monitoring stack:

# Kubernetes liveness probe
# In your deployment manifest:
livenessProbe:
  exec:
    command: ["kt", "doctor", "--checks", "connectivity", "--exit-code"]
  initialDelaySeconds: 10
  periodSeconds: 30

# Docker health check
HEALTHCHECK --interval=30s --timeout=10s \
  CMD kt doctor --checks connectivity,config --exit-code || exit 1

Exit codes

Code	Meaning
`0`	All checks passed
`1`	One or more errors detected
`2`	One or more warnings (no errors)
`3`	Doctor command itself failed (invalid arguments, etc.)
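
Kubernetes exec probes and Docker health checks treat any non-zero exit as a failure, so a warnings-only run (exit code 2) fails the probe just like an error does. If you would rather tolerate warnings, a small wrapper can remap the codes. This is a sketch under that assumption; `probe_result` is an illustrative name.

```shell
#!/usr/bin/env bash
# Map kt doctor exit codes (from the table above) to a probe result.
# Assumption: you want warnings to count as healthy; adjust if you do not.

probe_result() {
  # $1 = exit code returned by kt doctor --exit-code
  case "$1" in
    0|2) return 0 ;;   # all passed, or warnings only
    *)   return 1 ;;   # errors (1), command failure (3), anything else
  esac
}

# Usage inside a probe script:
#   kt doctor --checks connectivity --exit-code; probe_result "$?"
```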

Generating a support bundle

When you need to escalate to Keeptrusts support:

# Generate a full diagnostic bundle
kt doctor --bundle --output support-bundle.tar.gz

The bundle includes:

Full kt doctor output
Sanitized configuration (secrets redacted)
Recent event samples (last 100 events)
Gateway process logs (last 1000 lines)
System resource snapshot (CPU, memory, disk, network)
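
Before sending a bundle, you may want to inspect it yourself. The sketch below uses only standard tar and grep, and assumes the bundle is a gzip-compressed tar archive, as the `.tar.gz` name suggests. The function names are illustrative, and the file layout inside the bundle is not documented on this page.

```shell
#!/usr/bin/env bash
# Inspect a support bundle before sharing it.
# Assumption: the bundle is a gzip-compressed tar archive.

list_bundle() {
  # Print the paths stored in the archive.
  tar -tzf "$1"
}

bundle_contains() {
  # $1 = bundle path, $2 = literal string to search for
  # Streams every file to stdout (-O) and searches without unpacking to disk.
  # Succeeds (exit 0) if the string appears anywhere in the bundle.
  tar -xzOf "$1" | grep -qF -- "$2"
}

# Usage: confirm a known secret value did not survive redaction.
#   list_bundle support-bundle.tar.gz
#   if bundle_contains support-bundle.tar.gz "$KT_API_KEY"; then
#     echo "secret found in bundle, do not send" >&2
#   fi
```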

Business outcomes

Outcome	How diagnostics help
Faster incident resolution	Structured diagnostics pinpoint root causes in minutes, not hours
Proactive issue detection	Scheduled health checks catch problems before users report them
Reduced escalations	Self-service debugging resolves most issues without support tickets
Reliable deployments	Pre-deployment health checks validate connectivity and configuration
Operational confidence	Continuous monitoring proves the gateway is healthy and performant

For AI systems

Canonical terms: kt doctor, --checks connectivity|config|upstream|performance, --exit-code, --bundle, --gateway, --group.
Exit codes: 0 = all passed, 1 = errors, 2 = warnings only, 3 = command failure.
Diagnostic categories: connectivity (API, port, TLS, DNS, firewall), config (schema, includes, variables, deprecations, staleness), upstream (DNS, TCP, TLS, auth, model availability per provider), performance (chain eval, latency percentiles, memory, open connections, queue depth).
Best next pages: Performance Tuning, Multi-Gateway, Live Monitoring.

For engineers

Quick start: kt doctor runs all checks; kt doctor --gateway gw-prod-01 targets a specific instance.
Use in probes: kt doctor --checks connectivity --exit-code as a Kubernetes liveness probe or Docker HEALTHCHECK.
Debugging 409s: kt events tail --filter "outcome=blocked" --format detailed to see which policy triggered.
High latency: kt doctor --checks performance --verbose then check per-policy timing.
Support bundle: kt doctor --bundle --output support-bundle.tar.gz includes sanitized config, recent events, and logs.

For leaders

Structured diagnostics reduce mean-time-to-resolve from hours to minutes — less downtime for governed AI services.
Self-service debugging resolves most issues without vendor support tickets.
Health checks integrated into orchestration probes provide continuous assurance that governance controls are operational.
Support bundles include sanitized data only — no secrets leak to external parties.

Next steps

Tune Gateway Performance — optimize the metrics kt doctor reports
Operate Multiple Gateways — run diagnostics across a fleet
Monitor AI Traffic in Real-Time — complement diagnostics with live event streaming
