Runtime Configuration
Runtime sections control stateful gateway features: conversation loads, history persistence, memory recall, learning from past sessions, review execution, agent routing, content moderation, automatic provider selection, model listing, and response caching.
Use this page when
- You are configuring stateful gateway features like Knowledge Base recall, conversation history, memory recall, session learning, or review execution.
- You need to set up auto-provider routing, content moderation, or the `/v1/models` listing endpoint.
- You are tuning caching, agent identity, or hosted gateway control-plane settings.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Knowledge Base Recall
The loads: section retrieves Knowledge Base context from the API and injects it into the LLM request as system or user messages.
Note: The YAML configuration key `loads:` retains its existing name during the compatibility period; the user-facing feature is now called Knowledge Base.
```yaml
loads:
  enabled: true
  backend: "api"
  recall_top_k: 5
  write_mode: "full"
  timeout_ms: 3000
  fail_open: true
```
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | boolean | no | false | Enable context loading |
| backend | string | no | "api" | api (fetch from control plane) or memory (in-process) |
| recall_top_k | integer | no | 5 | Number of context items to inject |
| write_mode | string | no | "full" | full (write back response), read_only, metadata_only |
| timeout_ms | integer | no | 3000 | Timeout for context fetch, in milliseconds |
| fail_open | boolean | no | true | Continue if load fails (true) or block (false) |
Read-only mode
```yaml
loads:
  enabled: true
  backend: "api"
  write_mode: "read_only" # load context but don't write responses back
  recall_top_k: 10
```
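The recall flow can be pictured as: fetch up to `recall_top_k` items, prepend them to the request as system messages, and fall back to the unmodified request when the fetch fails and `fail_open` is true. A minimal sketch in Python, where the `fetch_context` callable and message shapes are illustrative rather than the gateway's actual internals:

```python
def inject_context(messages, fetch_context, top_k=5, fail_open=True):
    """Prepend up to top_k recalled items as system messages.

    fetch_context is any callable returning a list of context strings.
    On failure, fail_open decides whether to continue without context
    or to propagate the error (blocking the request).
    """
    try:
        items = fetch_context()[:top_k]
    except Exception:
        if fail_open:
            return messages  # proceed without Knowledge Base context
        raise
    context_msgs = [{"role": "system", "content": item} for item in items]
    return context_msgs + messages
```

With `fail_open: false`, the same failure would surface to the caller instead of silently degrading.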
History
The history: section controls whether the gateway writes conversation history to the API.
```yaml
history:
  enabled: true
  mode: "raw"
  include_blocked: false
  retention_days: 90
  fail_open: true
```
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | boolean | no | true | Enable history writing |
| mode | string | no | "raw" | raw (full messages), metadata_only (model, tokens, verdict), disabled |
| include_blocked | boolean | no | false | Write blocked requests to history |
| retention_days | integer | no | — | Auto-purge after N days (API-side enforcement) |
| fail_open | boolean | no | true | Continue if history write fails |
Metadata-only mode
Useful for compliance scenarios where you need audit records but cannot store conversation content.
```yaml
history:
  enabled: true
  mode: "metadata_only" # records model, token count, latency, verdict
  include_blocked: true
```
Disabling history
```yaml
history:
  enabled: false
```
Learning
The learning: section defines how agent-scoped learned-session synthesis behaves for the deployed configuration.
```yaml
learning:
  enabled: true
  source_policy: "allowed_only"
  strategy: "extract"
  previous_sessions_max: 5
  previous_allowed_requests_max: 20
  allowed_only: true
  schedule: "manual"
```
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | boolean | no | false | Enable learning |
| source_policy | string | no | "allowed_only" | allowed_only or all |
| strategy | string | no | "extract" | extract (key facts), condense (summarize), hybrid (both) |
| previous_sessions_max | integer | no | 5 | Max previous sessions to learn from |
| previous_allowed_requests_max | integer | no | 20 | Max allowed (non-blocked) requests to consider |
| allowed_only | boolean | no | true | Must match source_policy; when false, blocked turns can contribute |
| schedule | string | no | "manual" | manual or on_session_close |
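Under these settings, learning considers at most `previous_sessions_max` recent sessions and, within them, at most `previous_allowed_requests_max` non-blocked requests. A rough sketch of that selection, using illustrative request shapes rather than the gateway's internal types:

```python
def select_learning_sources(sessions, sessions_max=5, requests_max=20,
                            allowed_only=True):
    """Pick the requests that learning may synthesize from.

    sessions is an ordered list (oldest first) of sessions, each a list
    of request dicts carrying a 'blocked' flag. Only the most recent
    sessions_max sessions are considered; blocked requests are skipped
    unless allowed_only is False; the total is capped at requests_max.
    """
    picked = []
    for session in sessions[-sessions_max:]:
        for req in session:
            if allowed_only and req.get("blocked"):
                continue
            picked.append(req)
    return picked[:requests_max]
```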
Memory
The memory: section controls which recalled sources can participate in agent runtime context and how aggressively they are filtered.
```yaml
memory:
  enabled: true
  trust_floor: "trusted"
  eligible_sources:
    knowledge: true
    memories: true
    learned_sessions: true
    working_context: true
    frozen_memories: false
  recall:
    memory_scope: "agent"
    memory_budget: 8
    knowledge_budget: 8
    require_citations: true
```
Use this section when you need to tighten memory trust, change recall scope, or budget how many recalled items can be injected for agent execution.
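The filtering these settings describe amounts to: drop recalled items whose source is ineligible or whose trust level falls below the floor, then cap the survivors at the recall budget. A hypothetical sketch; the trust-level ordering and item shapes are assumptions for illustration:

```python
# Assumed ordering from least to most trusted; the real set of
# trust levels is defined by the platform, not by this sketch.
TRUST_ORDER = ["untrusted", "trusted"]

def filter_recall(items, trust_floor="trusted", eligible=None, budget=8):
    """Keep items from eligible sources at or above the trust floor,
    capped at the recall budget. Items are dicts with 'source' and
    'trust' keys (illustrative shapes)."""
    eligible = eligible or {}
    floor = TRUST_ORDER.index(trust_floor)
    kept = [item for item in items
            if eligible.get(item["source"], False)
            and TRUST_ORDER.index(item["trust"]) >= floor]
    return kept[:budget]
```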
Review
The review: section controls the optional second-pass review executor for agent responses.
```yaml
review:
  enabled: true
  mode: "judge"
  provider: "openai"
  model: "gpt-4.1-mini"
  timeout_ms: 5000
  recursion_depth_max: 1
  provider_isolation: true
```
Use this section when you need inline review, response rewriting, or escalation-oriented review behavior to travel with the deployed configuration.
Strategy comparison
| Strategy | Behavior | Best for |
|---|---|---|
| extract | Pulls key facts and data points | Factual Q&A, support bots |
| condense | Summarizes conversation threads | Long conversations, chat assistants |
| hybrid | Extracts then condenses | Complex multi-turn workflows |
Agents
The agents: section configures the default agent identity for the gateway.
```yaml
agents:
  default_agent_id: "agent-123"
```
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| default_agent_id | string | no | — | Default agent ID attached to events/traces |
The agent ID is included in decision events and traces sent to the API, enabling per-agent analytics and policy filtering.
Moderation
The moderation: section enables external content moderation before or after LLM calls.
```yaml
moderation:
  provider: "openai"
  secret_key_ref:
    env: "OPENAI_API_KEY"
  categories:
    - "violence"
    - "hate"
    - "self-harm"
    - "sexual"
  threshold: 0.7
```
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| provider | string | yes | — | openai or azure |
| secret_key_ref | object | yes | — | Object reference to the moderation API key (env or store) |
| endpoint | string | no | — | Custom moderation endpoint (for Azure) |
| categories | string[] | no | all | Categories to check |
| threshold | number | no | 0.7 | Score threshold for flagging (0.0–1.0) |
Azure moderation
```yaml
moderation:
  provider: "azure"
  secret_key_ref:
    env: "AZURE_CONTENT_SAFETY_KEY"
  endpoint: "https://my-resource.cognitiveservices.azure.com"
  categories: ["violence", "hate"]
  threshold: 0.5
```
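However the per-category scores are obtained from the moderation provider, the final step reduces to comparing each configured category's score against `threshold`. A minimal sketch, assuming a dict of category scores in the 0.0–1.0 range like those returned by moderation APIs:

```python
def flag_content(category_scores, categories=None, threshold=0.7):
    """Return the sorted list of categories whose score meets the
    threshold. category_scores maps category name to a 0.0-1.0 score;
    passing categories restricts the check to those names only."""
    checked = categories if categories is not None else list(category_scores)
    return sorted(c for c in checked
                  if category_scores.get(c, 0.0) >= threshold)
```

An empty result means the request passes moderation; a non-empty result is what the gateway would act on (block or annotate).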
Auto provider
The auto_provider: section enables intelligent automatic routing across all configured providers based on cost, latency, and availability.
```yaml
auto_provider:
  enabled: true
  name: "auto"
  routing:
    cost_weight: 0.6
    latency_weight: 0.4
  max_price_per_1m_tokens: 10.0
  fallback_enabled: true
  unhealthy_threshold: 3
```
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | boolean | no | false | Enable automatic provider selection |
| name | string | no | "auto" | Name for this virtual provider |
| routing.cost_weight | number | no | 0.5 | Weight for cost optimization (0.0–1.0) |
| routing.latency_weight | number | no | 0.5 | Weight for latency optimization (0.0–1.0) |
| max_price_per_1m_tokens | number | no | — | Exclude providers above this price |
| fallback_enabled | boolean | no | true | Fall back to next-best if chosen provider fails |
| unhealthy_threshold | integer | no | 3 | Consecutive failures before excluding a provider |
Cost-optimized routing
```yaml
auto_provider:
  enabled: true
  routing:
    cost_weight: 1.0
    latency_weight: 0.0
  max_price_per_1m_tokens: 5.0
```
Latency-optimized routing
```yaml
auto_provider:
  enabled: true
  routing:
    cost_weight: 0.0
    latency_weight: 1.0
```
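One plausible way these knobs combine: exclude unhealthy or over-priced providers, normalize cost and latency across the remainder, and pick the lowest weighted score. The scoring below is an illustrative sketch under those assumptions, not the gateway's documented algorithm:

```python
def pick_provider(providers, cost_weight=0.5, latency_weight=0.5,
                  max_price=None, unhealthy_threshold=3):
    """Pick a provider by weighted, normalized cost and latency.

    Providers are illustrative dicts with 'id', 'price_per_1m',
    'p50_latency_ms', and 'consecutive_failures'. Lower score wins.
    """
    candidates = [
        p for p in providers
        if p["consecutive_failures"] < unhealthy_threshold
        and (max_price is None or p["price_per_1m"] <= max_price)
    ]
    if not candidates:
        return None  # nothing healthy and affordable; caller handles fallback

    # Normalize against the most expensive / slowest candidate so both
    # terms land in [0, 1] before weighting.
    max_cost = max(p["price_per_1m"] for p in candidates) or 1.0
    max_lat = max(p["p50_latency_ms"] for p in candidates) or 1.0

    def score(p):
        return (cost_weight * p["price_per_1m"] / max_cost
                + latency_weight * p["p50_latency_ms"] / max_lat)

    return min(candidates, key=score)["id"]
```

With `cost_weight: 1.0` the cheapest eligible provider always wins; with `latency_weight: 1.0` the fastest does, which matches the two routing presets above.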
Models endpoint
The models: section controls the /v1/models listing endpoint.
```yaml
models:
  disabled: false
  include_disabled: false
  exposed_model_ids: []
```
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| disabled | boolean | no | false | Disable the models endpoint entirely |
| include_disabled | boolean | no | false | Include disabled models in the listing |
| exposed_model_ids | array&lt;string&gt; | no | [] | Restrict the catalog to the listed /v1/models IDs; useful for hosted gateways that should publish only admin-approved models to every org |
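The three settings compose as a simple filter over the model catalog: `disabled` short-circuits everything, `include_disabled` widens the enabled-only default, and `exposed_model_ids` narrows the result to an allow-list. A sketch with an illustrative catalog shape:

```python
def list_models(catalog, disabled=False, include_disabled=False,
                exposed_model_ids=()):
    """Apply the models: settings to a catalog of {'id', 'enabled'}
    entries and return the IDs the endpoint would publish."""
    if disabled:
        return []  # endpoint off entirely
    models = [m for m in catalog if include_disabled or m["enabled"]]
    if exposed_model_ids:
        models = [m for m in models if m["id"] in exposed_model_ids]
    return [m["id"] for m in models]
```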
Hosted gateway provider credentials
Hosted gateways use gateway keys for runtime authentication and config variables for provider credentials. Keep provider secrets out of YAML by referencing them through `secret_key_ref`.
```yaml
pack:
  name: config-runtime-providers-14
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: openai-chatgpt-shared
      provider: openai
      model: gpt-4o
      base_url: https://api.openai.com/v1
      secret_key_ref:
        store: openai
        scope: platform

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
Important details:
| Behavior | Result |
|---|---|
| `secret_key_ref.scope: platform` | Resolves from platform-managed config variables |
| Org-specific overlays | May use cascade, org, team, or user, but not platform |
| Shared target IDs with platform secrets | Org overlays cannot replace those target IDs |
| `--policy-config` | Hosted gateways accept repeated startup config files; later flags overlay earlier ones |
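The overlay behavior for repeated `--policy-config` files can be pictured as a deep merge in which later files win on conflicting keys while untouched keys survive. The merge rule below is an assumption for illustration, not the documented semantics:

```python
def overlay(base, *later):
    """Deep-merge config mappings; later layers win on conflicts.

    Nested dicts are merged key by key; any other value type is
    replaced wholesale by the later layer. This mirrors a 'later
    flags overlay earlier ones' reading of --policy-config.
    """
    merged = dict(base)
    for layer in later:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = overlay(merged[key], value)
            else:
                merged[key] = value
    return merged
```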
Cache
The cache: section enables response caching for identical or semantically similar requests.
Exact cache
```yaml
cache:
  enabled: true
  mode: "exact"
  default_on: true
  ttl_seconds: 3600
```
Exact caching hashes the full request body. Identical requests return the cached response.
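A typical way to implement this is to hash a canonical serialization of the body, so that two requests with the same content map to one cache key regardless of JSON key order. The canonicalization below (sorted-key JSON plus SHA-256) is an illustrative choice, not the gateway's documented algorithm:

```python
import hashlib
import json

def exact_cache_key(request_body):
    """Derive a stable cache key from a JSON-serializable request body.

    Sorting keys and stripping whitespace makes the serialization
    canonical, so semantically identical bodies hash identically.
    """
    canonical = json.dumps(request_body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```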
Semantic cache
```yaml
cache:
  enabled: true
  mode: "semantic"
  similarity_threshold: 0.95
  embedding_provider: "openai"
  default_on: true
  ttl_seconds: 3600
```
Semantic caching embeds the request and finds similar past requests above the similarity threshold.
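The hit test reduces to cosine similarity between the new request's embedding and each cached embedding, compared against `similarity_threshold`. A minimal sketch with illustrative `(embedding, response)` pairs standing in for the cache store:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (math.sqrt(sum(x * x for x in a))
            * math.sqrt(sum(y * y for y in b)))
    return dot / norm if norm else 0.0

def semantic_hit(query_vec, cached, threshold=0.95):
    """Return the cached response most similar to the query embedding,
    provided it clears the threshold; otherwise None (cache miss).

    cached is a list of (embedding, response) pairs.
    """
    best = max(cached, key=lambda c: cosine_similarity(query_vec, c[0]),
               default=None)
    if best and cosine_similarity(query_vec, best[0]) >= threshold:
        return best[1]
    return None
```

A high threshold like 0.95 keeps hits conservative; lowering it trades correctness risk for a better hit rate.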
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | boolean | no | false | Enable caching |
| mode | string | yes | — | exact or semantic |
| similarity_threshold | number | no | 0.95 | Min cosine similarity for semantic cache hits |
| embedding_provider | string | no | — | Provider for computing embeddings (semantic mode) |
| default_on | boolean | no | true | Cache by default (can be overridden per-request) |
| ttl_seconds | integer | no | 3600 | Cache entry time-to-live, in seconds |
Cache backend
The cache backend is selected via the `KEEPTRUSTS_CACHE_BACKEND` environment variable:
- `memory` — In-process LRU cache (default, no persistence)
- `redis` — Redis/Valkey for distributed caching
```bash
KEEPTRUSTS_CACHE_BACKEND=redis
KEEPTRUSTS_CACHE_REDIS_URL=redis://localhost:6379/1
```
Complete runtime configuration example
```yaml
pack:
  name: full-runtime
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: openai-prod
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic-prod
      provider: anthropic
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_API_KEY

loads:
  enabled: true
  backend: api
  recall_top_k: 5
  write_mode: full
  timeout_ms: 3000
  fail_open: true

history:
  enabled: true
  mode: raw
  include_blocked: true
  retention_days: 90
  fail_open: true

learning:
  enabled: true
  source_policy: allowed_only
  strategy: hybrid
  previous_sessions_max: 5
  previous_allowed_requests_max: 20

agents:
  default_agent_id: support-bot-v2

moderation:
  provider: openai
  secret_key_ref:
    env: OPENAI_API_KEY
  categories:
    - violence
    - hate
    - self-harm
  threshold: 0.7

auto_provider:
  enabled: true
  routing:
    cost_weight: 0.6
    latency_weight: 0.4
  max_price_per_1m_tokens: 10.0
  fallback_enabled: true

models:
  disabled: false
  include_disabled: false

cache:
  enabled: true
  mode: semantic
  similarity_threshold: 0.95
  embedding_provider: openai
  default_on: true
  ttl_seconds: 7200

policies:
  chain:
    - prompt-injection
    - pii-detector
    - quality-scorer
    - audit-logger
```
Org-Scoped Configuration Admin
Configuration admin operations (create, update, deploy, rollback) are tenant-scoped. Each configuration belongs to exactly one organization, and configuration version history is isolated per org. Key behavioral rules:
- YAML targeting authorization uses the same action namespace as normal IAM roles (`configs:write`, `configs:deploy`).
- All config handlers enforce org-level pagination and tenant isolation.
- Config-variable resolution requires explicit `secrets:resolve` permission; the resolver uses a fail-open or fail-closed policy (set in the config) for missing secret references.
- Git-based configuration import uses org-scoped credentials and enforces the same targeting and storage-accounting rules as direct saves.
For AI systems
- Canonical terms: Keeptrusts, loads, history, learning, agents, moderation, auto_provider, models, cache, Knowledge Base, recall_top_k
- Config/command names: `loads:` (Knowledge Base recall), `history:`, `learning:`, `agents:`, `moderation:`, `auto_provider:`, `models:`, `cache:`
- Best next pages: Providers Configuration, Config Scenarios, Declarative Config Reference
For engineers
- Prerequisites: A running Keeptrusts API for Knowledge Base recall and history features. For moderation, an OpenAI or Azure Content Safety API key.
- Validation: Start the gateway and verify runtime config with `curl http://localhost:8080/keeptrusts/config | jq .loads`. Test history by sending a request and checking the console History page. Validate moderation by sending flaggable content.
- Key commands: `kt gateway run`, `curl /keeptrusts/config`, `kt events tail`
For leaders
- Governance: History mode (`raw` vs `metadata_only`) determines what conversation data is retained. Use `metadata_only` when content retention is prohibited but audit evidence is required.
- Cost: Knowledge Base recall and learning features add API calls per request. Auto-provider routing optimizes cost/latency tradeoffs automatically but requires accurate pricing declarations.
- Rollout: Enable history first for visibility, then add Knowledge Base recall for context enrichment. Enable learning only after quality-scorer baselines are established.
Next steps
- Providers Configuration — Provider targets for moderation and auto-provider
- Config Scenarios — End-to-end runtime configuration examples
- Declarative Config Reference — Full schema reference
- Quality Scorer — Quality gates for learning source selection