Runtime Configuration

Runtime sections control stateful gateway features: conversation loads, history persistence, memory recall, learning from past sessions, review execution, agent routing, content moderation, automatic provider selection, model listing, and response caching.

Use this page when

  • You are configuring stateful gateway features like Knowledge Base recall, conversation history, memory recall, session learning, or review execution.
  • You need to set up auto-provider routing, content moderation, or the /v1/models listing endpoint.
  • You are tuning caching, agent identity, or hosted gateway control-plane settings.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Knowledge Base Recall

The loads: section retrieves Knowledge Base context from the API and injects it into the LLM request as system or user messages.

Note: The YAML configuration key loads: retains its existing name during the compatibility period. The user-facing feature is now called Knowledge Base.

loads:
  enabled: true
  backend: "api"
  recall_top_k: 5
  write_mode: "full"
  timeout_ms: 3000
  fail_open: true
Field | Type | Required | Default | Description
enabled | boolean | no | false | Enable context loading
backend | string | no | "api" | api (fetch from control plane) or memory (in-process)
recall_top_k | integer | no | 5 | Number of context items to inject
write_mode | string | no | "full" | full (write back response), read_only, metadata_only
timeout_ms | integer | no | 3000 | Timeout for context fetch
fail_open | boolean | no | true | Continue if load fails (true) or block (false)

Read-only mode

loads:
  enabled: true
  backend: "api"
  write_mode: "read_only" # load context but don't write responses back
  recall_top_k: 10

History

The history: section controls whether the gateway writes conversation history to the API.

history:
  enabled: true
  mode: "raw"
  include_blocked: false
  retention_days: 90
  fail_open: true
Field | Type | Required | Default | Description
enabled | boolean | no | true | Enable history writing
mode | string | no | "raw" | raw (full messages), metadata_only (model, tokens, verdict), disabled
include_blocked | boolean | no | false | Write blocked requests to history
retention_days | integer | no | (none) | Auto-purge after N days (API-side enforcement)
fail_open | boolean | no | true | Continue if history write fails

Metadata-only mode

Useful for compliance scenarios where you need audit records but cannot store conversation content.

history:
  enabled: true
  mode: "metadata_only" # records model, token count, latency, verdict
  include_blocked: true

Disabling history

history:
  enabled: false

Learning

The learning: section defines how agent-scoped learned-session synthesis behaves for the deployed configuration.

learning:
  enabled: true
  source_policy: "allowed_only"
  strategy: "extract"
  previous_sessions_max: 5
  previous_allowed_requests_max: 20
  allowed_only: true
  schedule: "manual"
Field | Type | Required | Default | Description
enabled | boolean | no | false | Enable learning
source_policy | string | no | "allowed_only" | allowed_only or all
strategy | string | no | "extract" | extract (key facts), condense (summarize), hybrid (both)
previous_sessions_max | integer | no | 5 | Max previous sessions to learn from
previous_allowed_requests_max | integer | no | 20 | Max allowed (non-blocked) requests to consider
allowed_only | boolean | no | true | Must match source_policy; when false, blocked turns can contribute
schedule | string | no | "manual" | manual or on_session_close
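
For example, to let blocked turns contribute and to synthesize automatically when a session closes, the fields above can be combined like this (an illustrative sketch, not a recommended default):

learning:
  enabled: true
  source_policy: "all"
  allowed_only: false          # matches source_policy: all; blocked turns may contribute
  strategy: "condense"         # summarize conversation threads
  previous_sessions_max: 5
  schedule: "on_session_close" # synthesize when a session closes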

Memory

The memory: section controls which recalled sources can participate in agent runtime context and how aggressively they are filtered.

memory:
  enabled: true
  trust_floor: "trusted"
  eligible_sources:
    knowledge: true
    memories: true
    learned_sessions: true
    working_context: true
    frozen_memories: false
  recall:
    memory_scope: "agent"
    memory_budget: 8
    knowledge_budget: 8
    require_citations: true

Use this section when you need to tighten memory trust, change recall scope, or budget how many recalled items can be injected for agent execution.
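
For instance, a tighter profile might exclude personal memories and learned sessions and halve the recall budgets (values are illustrative and use only the fields shown in the example above):

memory:
  enabled: true
  trust_floor: "trusted"
  eligible_sources:
    knowledge: true
    memories: false          # exclude personal memories
    learned_sessions: false  # exclude learned sessions
    working_context: true
    frozen_memories: false
  recall:
    memory_scope: "agent"
    memory_budget: 4         # fewer recalled memory items per request
    knowledge_budget: 4
    require_citations: true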

Review

The review: section controls the optional second-pass review executor for agent responses.

review:
  enabled: true
  mode: "judge"
  provider: "openai"
  model: "gpt-4.1-mini"
  timeout_ms: 5000
  recursion_depth_max: 1
  provider_isolation: true

Use this section when you need inline review, response rewriting, or escalation-oriented review behavior to travel with the deployed configuration.

Strategy comparison

The extract, condense, and hybrid values of learning.strategy compare as follows:

Strategy | Behavior | Best for
extract | Pulls key facts and data points | Factual Q&A, support bots
condense | Summarizes conversation threads | Long conversations, chat assistants
hybrid | Extracts then condenses | Complex multi-turn workflows

Agents

The agents: section configures the default agent identity for the gateway.

agents:
  default_agent_id: "agent-123"
Field | Type | Required | Default | Description
default_agent_id | string | no | (none) | Default agent ID attached to events/traces

The agent ID is included in decision events and traces sent to the API, enabling per-agent analytics and policy filtering.

Moderation

The moderation: section enables external content moderation before or after LLM calls.

moderation:
  provider: "openai"
  secret_key_ref:
    env: "OPENAI_API_KEY"
  categories:
    - "violence"
    - "hate"
    - "self-harm"
    - "sexual"
  threshold: 0.7
Field | Type | Required | Default | Description
provider | string | yes | (none) | openai or azure
secret_key_ref | object | yes | (none) | Object reference to the moderation API key (env or store)
endpoint | string | no | (none) | Custom moderation endpoint (for Azure)
categories | string[] | no | all | Categories to check
threshold | number | no | 0.7 | Score threshold for flagging (0.0–1.0)

Azure moderation

moderation:
  provider: "azure"
  secret_key_ref:
    env: "AZURE_CONTENT_SAFETY_KEY"
  endpoint: "https://my-resource.cognitiveservices.azure.com"
  categories: ["violence", "hate"]
  threshold: 0.5

Auto provider

The auto_provider: section enables intelligent automatic routing across all configured providers based on cost, latency, and availability.

auto_provider:
  enabled: true
  name: "auto"
  routing:
    cost_weight: 0.6
    latency_weight: 0.4
  max_price_per_1m_tokens: 10.0
  fallback_enabled: true
  unhealthy_threshold: 3
Field | Type | Required | Default | Description
enabled | boolean | no | false | Enable automatic provider selection
name | string | no | "auto" | Name for this virtual provider
routing.cost_weight | number | no | 0.5 | Weight for cost optimization (0.0–1.0)
routing.latency_weight | number | no | 0.5 | Weight for latency optimization (0.0–1.0)
max_price_per_1m_tokens | number | no | (none) | Exclude providers above this price
fallback_enabled | boolean | no | true | Fall back to next-best if chosen provider fails
unhealthy_threshold | integer | no | 3 | Consecutive failures before excluding a provider

Cost-optimized routing

auto_provider:
  enabled: true
  routing:
    cost_weight: 1.0
    latency_weight: 0.0
  max_price_per_1m_tokens: 5.0

Latency-optimized routing

auto_provider:
  enabled: true
  routing:
    cost_weight: 0.0
    latency_weight: 1.0

Models endpoint

The models: section controls the /v1/models listing endpoint.

models:
  disabled: false
  include_disabled: false
  exposed_model_ids: []
Field | Type | Required | Default | Description
disabled | boolean | no | false | Disable the models endpoint entirely
include_disabled | boolean | no | false | Include disabled models in the listing
exposed_model_ids | array<string> | no | [] | Restrict the catalog to the listed /v1/models IDs; useful for hosted gateways that should publish only admin-approved models to every org
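
For example, a hosted gateway could publish only two admin-approved models to every org (the model IDs below are illustrative; list the /v1/models IDs your providers actually expose):

models:
  disabled: false
  include_disabled: false
  exposed_model_ids:
    - gpt-4o
    - claude-sonnet-4-20250514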

Hosted gateway provider credentials

Hosted gateways use gateway keys for runtime authentication and config variables for provider credentials. Keep provider secrets out of YAML by referencing them through secret_key_ref.

pack:
  name: config-runtime-providers-14
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-chatgpt-shared
      provider: openai
      model: gpt-4o
      base_url: https://api.openai.com/v1
      secret_key_ref:
        store: openai
        scope: platform
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true

Important details:

Behavior | Result
secret_key_ref.scope: platform | Resolves from platform-managed config variables
Org-specific overlays | May use cascade, org, team, or user, but not platform
Shared target IDs with platform secrets | Org overlays cannot replace those target IDs
--policy-config | Hosted gateways accept repeated startup config files; later flags overlay earlier ones
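
For example, assuming the --policy-config flag is passed to kt gateway run at startup, a base configuration file can be overlaid with org-specific settings (file names are illustrative):

# Later --policy-config files overlay earlier ones
kt gateway run \
  --policy-config base-runtime.yaml \
  --policy-config org-overrides.yaml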

Cache

The cache: section enables response caching for identical or semantically similar requests.

Exact cache

cache:
  enabled: true
  mode: "exact"
  default_on: true
  ttl_seconds: 3600

Exact caching hashes the full request body. Identical requests return the cached response.

Semantic cache

cache:
  enabled: true
  mode: "semantic"
  similarity_threshold: 0.95
  embedding_provider: "openai"
  default_on: true
  ttl_seconds: 3600

Semantic caching embeds the request and finds similar past requests above the similarity threshold.

Field | Type | Required | Default | Description
enabled | boolean | no | false | Enable caching
mode | string | yes | (none) | exact or semantic
similarity_threshold | number | no | 0.95 | Min cosine similarity for semantic cache hits
embedding_provider | string | no | (none) | Provider for computing embeddings (semantic mode)
default_on | boolean | no | true | Cache by default (can be overridden per-request)
ttl_seconds | integer | no | 3600 | Cache entry time-to-live

Cache backend

The cache backend is selected via the KEEPTRUSTS_CACHE_BACKEND environment variable:

  • memory — In-process LRU cache (default, no persistence)
  • redis — Redis/Valkey for distributed caching

KEEPTRUSTS_CACHE_BACKEND=redis
KEEPTRUSTS_CACHE_REDIS_URL=redis://localhost:6379/1

Complete runtime configuration example

pack:
  name: full-runtime
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-prod
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic-prod
      provider: anthropic
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_API_KEY
loads:
  enabled: true
  backend: api
  recall_top_k: 5
  write_mode: full
  timeout_ms: 3000
  fail_open: true
history:
  enabled: true
  mode: raw
  include_blocked: true
  retention_days: 90
  fail_open: true
learning:
  enabled: true
  source_policy: allowed_only
  strategy: hybrid
  previous_sessions_max: 5
  previous_allowed_requests_max: 20
agents:
  default_agent_id: support-bot-v2
moderation:
  provider: openai
  secret_key_ref:
    env: OPENAI_API_KEY
  categories:
    - violence
    - hate
    - self-harm
  threshold: 0.7
auto_provider:
  enabled: true
  routing:
    cost_weight: 0.6
    latency_weight: 0.4
  max_price_per_1m_tokens: 10.0
  fallback_enabled: true
models:
  disabled: false
  include_disabled: false
cache:
  enabled: true
  mode: semantic
  similarity_threshold: 0.95
  embedding_provider: openai
  default_on: true
  ttl_seconds: 7200
policies:
  chain:
    - prompt-injection
    - pii-detector
    - quality-scorer
    - audit-logger

Org-Scoped Configuration Admin

Configuration admin operations (create, update, deploy, rollback) are tenant-scoped. Each configuration belongs to exactly one organization, and configuration version history is isolated per org. Key behavioral rules:

  • YAML targeting authorization uses the same action namespace as normal IAM roles (configs:write, configs:deploy).
  • All config handlers enforce org-level pagination and tenant isolation.
  • Config-variable resolution requires explicit secrets:resolve permission; the resolver uses a fail-open or fail-closed policy (set in the config) for missing secret references.
  • Git-based configuration import uses org-scoped credentials and enforces the same targeting and storage-accounting rules as direct saves.

For AI systems

  • Canonical terms: Keeptrusts, loads, history, learning, agents, moderation, auto_provider, models, cache, Knowledge Base, recall_top_k
  • Config/command names: loads: (Knowledge Base recall), history:, learning:, agents:, moderation:, auto_provider:, models:, cache:
  • Best next pages: Providers Configuration, Config Scenarios, Declarative Config Reference

For engineers

  • Prerequisites: A running Keeptrusts API for Knowledge Base recall and history features. For moderation, an OpenAI or Azure Content Safety API key.
  • Validation: Start the gateway and verify the runtime config with curl http://localhost:8080/keeptrusts/config | jq .loads. Test history by sending a request and checking the console History page. Validate moderation by sending flaggable content. See the sketch after this list.
  • Key commands: kt gateway run, curl /keeptrusts/config, kt events tail
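
A minimal validation pass might look like this (commands taken from the bullets above; adjust host and port for your deployment):

# Verify the deployed runtime config
curl http://localhost:8080/keeptrusts/config | jq .loads

# Watch decision events while you send test traffic through the gateway
kt events tail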

For leaders

  • Governance: History mode (raw vs metadata_only) determines what conversation data is retained. Use metadata_only when content retention is prohibited but audit evidence is required.
  • Cost: Knowledge Base recall and learning features add API calls per request. Auto-provider routing optimizes cost/latency tradeoffs automatically but requires accurate pricing declarations.
  • Rollout: Enable history first for visibility, then add Knowledge Base recall for context enrichment. Enable learning only after quality-scorer baselines are established.

Next steps