
Diagnose Gateway Issues with kt doctor

When requests fail, latency spikes, or policies misbehave, kt doctor gives you a systematic diagnostic toolkit. It validates connectivity, configuration, upstream health, and performance — then tells you exactly what to fix.

Use this page when

  • Requests are failing, latency is spiking, or policies are misbehaving and you need structured diagnostics.
  • You want to run kt doctor to validate connectivity, configuration, upstream health, or performance.
  • You need to generate a support bundle or integrate health checks into Kubernetes/Docker probes.

Primary audience

  • Primary: Technical Engineers and SREs troubleshooting gateway issues
  • Secondary: On-call operators, Platform Engineers configuring health probes

Running kt doctor

# Full diagnostic suite
kt doctor

# Target a specific gateway
kt doctor --gateway gw-prod-01

# Run specific check categories
kt doctor --checks connectivity,config,upstream

Full diagnostic output

Keeptrusts Gateway Diagnostics
══════════════════════════════

Gateway: gw-prod-01 (v2.4.1)
Config: /etc/keeptrusts/policy-config.yaml

Connectivity
✓ API reachable (https://api.keeptrusts.com — 45ms)
✓ Gateway port 41002 listening
✓ TLS certificate valid (expires 2025-12-01)
✗ DNS resolution for api.keeptrusts.com took 850ms (threshold: 200ms)

Configuration
✓ Policy config valid (7 policies, 2 phases)
✓ All includes resolved
✓ Config variables: 4/4 bound
⚠ Deprecated policy type "content_filter_v1" in use

Upstream Providers
✓ openai/gpt-4o reachable (112ms)
✓ anthropic/claude-3 reachable (98ms)
✗ azure/gpt-4 connection refused

Performance
✓ Policy chain eval: 12ms avg (last 100 requests)
✓ Memory usage: 142 MB (limit: 512 MB)
⚠ P99 latency: 890ms (threshold: 500ms)

Summary: 2 errors, 2 warnings
→ Fix DNS resolution: check /etc/resolv.conf or configure DNS cache
→ Fix azure/gpt-4: verify AZURE_API_KEY and endpoint URL
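
When scripting around this report, it can help to count failed and warning checks directly from saved output. The following is a minimal sketch that assumes the plain-text format shown above, where failing lines start with ✗ and warnings with ⚠. The function names are hypothetical helpers, not part of kt.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: count check results in saved `kt doctor` output.
# Assumes the text format shown above (lines prefixed with ✗ or ⚠).

# Count lines on stdin that begin with the given marker,
# allowing for optional leading whitespace.
count_marker() {
  local marker="$1"
  # grep -c prints the count but exits 1 when the count is 0, so mask that.
  grep -c "^[[:space:]]*${marker}" || true
}

count_errors()   { count_marker "✗"; }
count_warnings() { count_marker "⚠"; }
```

For example, after `kt doctor > doctor.txt`, running `count_errors < doctor.txt` prints the number of failed checks. If the CLI offers a machine-readable output mode, prefer that over parsing text.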

Diagnostic categories

Connectivity checks

Validates that the gateway can reach all required services:

kt doctor --checks connectivity

| Check | What it validates |
|---|---|
| API reachability | TCP connection + HTTP health check to the control-plane API |
| Gateway port | The configured gateway port is listening and accepting connections |
| TLS certificate | Certificate validity, expiration, and chain completeness |
| DNS resolution | Resolution time for all configured hostnames |
| Firewall rules | Outbound connectivity to provider endpoints on required ports |

Configuration checks

Validates the active configuration against the schema and runtime state:

kt doctor --checks config

| Check | What it validates |
|---|---|
| Schema validation | Full kt policy lint --file policy-config.yaml against the active config source |
| Include resolution | All fragment files are accessible and parseable |
| Variable binding | All secret_key_ref and variable references resolve to a value |
| Policy deprecations | Flags deprecated policy types that will be removed in future versions |
| Config staleness | Warns if the config file has not been reloaded recently |

Upstream provider checks

Tests connectivity and authentication to each configured LLM provider:

kt doctor --checks upstream
Upstream Provider Health
────────────────────────
openai/gpt-4o
✓ DNS resolution: 12ms
✓ TCP connection: 45ms
✓ TLS handshake: 89ms
✓ Auth validation: 200 OK
✓ Model available: true

azure/gpt-4
✓ DNS resolution: 15ms
✗ TCP connection: timeout after 5000ms
✗ FAILED: Connection refused
→ Check: AZURE_OPENAI_ENDPOINT environment variable
→ Check: Network allows outbound to *.openai.azure.com:443

Performance checks

Analyzes gateway performance using recent request data:

kt doctor --checks performance

| Metric | Threshold | Description |
|---|---|---|
| Chain eval time | < 50ms avg | Time spent evaluating the policy chain |
| P50 latency | < 200ms | Median total request latency |
| P99 latency | < 500ms | Tail latency (excludes upstream time) |
| Memory usage | < 80% limit | Current RSS relative to configured limit |
| Open connections | < 80% max | Active connections relative to pool size |
| Event queue depth | < 1000 | Pending events waiting for API submission |
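
The relative thresholds (memory and open connections) are percentages of a configured limit. As a worked example, the 142 MB of 512 MB in the sample output is about 27%, well under the 80% threshold. The sketch below shows that arithmetic with hypothetical helper functions. It uses integer division and is not how kt itself computes the values.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the "< 80% of limit" style thresholds.

# Print the integer percentage of the limit that is in use (rounded down).
percent_of_limit() {
  local used="$1" limit="$2"
  echo $(( used * 100 / limit ))
}

# Succeed (exit 0) when usage is strictly below max_percent of the limit.
under_threshold() {
  local used="$1" limit="$2" max_percent="$3"
  [ "$(percent_of_limit "$used" "$limit")" -lt "$max_percent" ]
}
```

With the sample values, `percent_of_limit 142 512` prints 27 and `under_threshold 142 512 80` succeeds.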

Debugging specific issues

Requests returning 409 unexpectedly

# Check which policy is triggering
kt events tail --filter "outcome=blocked" --format detailed

# Inspect the active chain
curl -s http://localhost:8080/keeptrusts/config | jq '.policies.chain'

# Re-run lint on the config source
kt policy lint --file policy-config.yaml

High latency

# Check per-policy latency
kt doctor --checks performance --verbose

# Identify slow policies
kt events tail --filter "latency_ms>500" --format detailed

# Check upstream provider response times
kt doctor --checks upstream --verbose

Events not appearing in the console

# Verify API connectivity
kt doctor --checks connectivity

# Check event queue status
kt doctor --checks performance --verbose | grep "queue"

# Manually test event submission
curl -s -o /dev/null -w "%{http_code}" \
-X POST https://api.keeptrusts.com/v1/events \
-H "Authorization: Bearer $KT_API_KEY" \
-H "Content-Type: application/json" \
-d '{"test": true}'
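
The curl command above prints only the HTTP status code. A small helper can turn that code into a next step. This is a sketch based on general HTTP semantics, not on documented Keeptrusts API behavior. The function name is hypothetical, and the API may reject the `{"test": true}` payload with a 4xx even when connectivity and authentication are fine.

```shell
#!/usr/bin/env bash
# Hypothetical helper: interpret the status code printed by
# curl -w "%{http_code}". Hints follow generic HTTP semantics only.
explain_status() {
  case "$1" in
    000)     echo "no response: check DNS, TLS, and outbound network" ;;
    2??)     echo "accepted: API reachable and request authorized" ;;
    401|403) echo "auth problem: check KT_API_KEY and its permissions" ;;
    4??)     echo "request rejected: check payload and endpoint path" ;;
    5??)     echo "server error: retry later or escalate" ;;
    *)       echo "unexpected status: $1" ;;
  esac
}
```

curl reports 000 when it never received an HTTP response, which usually points back to the connectivity checks.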

Config reload failures

# Validate the new config before reloading
kt policy lint --file /path/to/new-config.yaml

# Check if the gateway process can read the file
kt doctor --checks config --verbose

# Reload the running gateway
kt gateway reload \
--name local-dev \
--gateway-url http://localhost:41002 \
--config-path /path/to/new-config.yaml
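
The lint and reload steps can be chained so that a config that fails lint never reaches the running gateway. The wrapper below is a sketch: `safe_reload` and the `KT` override variable are hypothetical, and it assumes `kt policy lint` exits non-zero when validation fails.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: lint a config and reload the gateway only if lint passes.
# KT may be set to another binary path (or a stub when testing).
# Assumes `kt policy lint` exits non-zero on validation failure.
safe_reload() {
  local config="$1" name="$2" url="$3"
  local kt="${KT:-kt}"

  if ! "$kt" policy lint --file "$config"; then
    echo "lint failed; gateway not reloaded" >&2
    return 1
  fi

  "$kt" gateway reload \
    --name "$name" \
    --gateway-url "$url" \
    --config-path "$config"
}
```

Usage: `safe_reload /path/to/new-config.yaml local-dev http://localhost:41002`.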

Continuous health monitoring

Run kt doctor as a health check in your monitoring stack:

# Kubernetes liveness probe
# In your deployment manifest:
livenessProbe:
  exec:
    command: ["kt", "doctor", "--checks", "connectivity", "--exit-code"]
  initialDelaySeconds: 10
  periodSeconds: 30

# Docker health check
HEALTHCHECK --interval=30s --timeout=10s \
  CMD kt doctor --checks connectivity,config --exit-code || exit 1

Exit codes

| Code | Meaning |
|---|---|
| 0 | All checks passed |
| 1 | One or more errors detected |
| 2 | One or more warnings (no errors) |
| 3 | Doctor command itself failed (invalid arguments, etc.) |
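
Kubernetes exec probes treat any non-zero exit code as a failure, and Docker reserves HEALTHCHECK exit code 2, so a warnings-only result (code 2) would mark the container unhealthy if passed through unchanged. One option is a thin wrapper that maps warnings to success. The sketch below uses hypothetical function names. Whether warnings should count as healthy is a policy choice for your environment.

```shell
#!/usr/bin/env bash
# Hypothetical mapping from kt doctor exit codes (table above) to probe status.
# Warnings (2) are treated as healthy so they do not trigger restarts.
probe_status() {
  case "$1" in
    0|2) return 0 ;;  # all passed, or warnings only
    *)   return 1 ;;  # errors (1), command failure (3), anything else
  esac
}

# Run any command and translate its exit code for use in a probe.
run_as_probe() {
  local rc=0
  "$@" || rc=$?
  probe_status "$rc"
}
```

Usage inside a probe script: `run_as_probe kt doctor --checks connectivity --exit-code`.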

Generating a support bundle

When you need to escalate to Keeptrusts support:

# Generate a full diagnostic bundle
kt doctor --bundle --output support-bundle.tar.gz

The bundle includes:

  • Full kt doctor output
  • Sanitized configuration (secrets redacted)
  • Recent event samples (last 100 events)
  • Gateway process logs (last 1000 lines)
  • System resource snapshot (CPU, memory, disk, network)
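
Before sharing a bundle, you may want an independent check that nothing resembling a credential remains in it. The sketch below streams the archive through grep and counts suspicious lines. The function name and the pattern are illustrative assumptions, not a Keeptrusts feature, and a zero count does not guarantee that the bundle is free of secrets.

```shell
#!/usr/bin/env bash
# Hypothetical pre-share check: count lines in a .tar.gz bundle that look
# like an unredacted credential. The pattern is illustrative, not exhaustive.
bundle_secret_hits() {
  local bundle="$1"
  # Fail early if the archive cannot be read.
  tar -tzf "$bundle" >/dev/null || return 2
  # -O streams archived files to stdout; grep -c prints the matching line count.
  # grep exits 1 on a zero count, so mask that with `|| true`.
  tar -xzOf "$bundle" \
    | grep -aEic '(api[_-]?key|secret|bearer)[^[:alnum:]]+[A-Za-z0-9_-]{16,}' \
    || true
}
```

Usage: `bundle_secret_hits support-bundle.tar.gz` prints 0 when no line matches the pattern.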

Business outcomes

| Outcome | How diagnostics help |
|---|---|
| Faster incident resolution | Structured diagnostics pinpoint root causes in minutes, not hours |
| Proactive issue detection | Scheduled health checks catch problems before users report them |
| Reduced escalations | Self-service debugging resolves most issues without support tickets |
| Reliable deployments | Pre-deployment health checks validate connectivity and configuration |
| Operational confidence | Continuous monitoring proves the gateway is healthy and performant |

For AI systems

  • Canonical terms: kt doctor, --checks connectivity|config|upstream|performance, --exit-code, --bundle, --gateway, --group.
  • Exit codes: 0 = all passed, 1 = errors, 2 = warnings only, 3 = command failure.
  • Diagnostic categories: connectivity (API, port, TLS, DNS), config (schema, includes, variables, deprecations), upstream (DNS, TCP, TLS, auth per provider), performance (chain eval, latency percentiles, memory, queue depth).
  • Best next pages: Performance Tuning, Multi-Gateway, Live Monitoring.

For engineers

  • Quick start: kt doctor runs all checks; kt doctor --gateway gw-prod-01 targets a specific instance.
  • Use in probes: kt doctor --checks connectivity --exit-code as a Kubernetes liveness probe or Docker HEALTHCHECK.
  • Debugging 409s: kt events tail --filter "outcome=blocked" --format detailed to see which policy triggered.
  • High latency: kt doctor --checks performance --verbose then check per-policy timing.
  • Support bundle: kt doctor --bundle --output support-bundle.tar.gz includes sanitized config, recent events, and logs.

For leaders

  • Structured diagnostics reduce mean-time-to-resolve from hours to minutes — less downtime for governed AI services.
  • Self-service debugging resolves most issues without vendor support tickets.
  • Health checks integrated into orchestration probes provide continuous assurance that governance controls are operational.
  • Support bundles include sanitized data only — no secrets leak to external parties.

Next steps