Runtime Configuration

Runtime sections control stateful gateway features: conversation loads, history persistence, memory recall, learning from past sessions, review execution, agent routing, content moderation, automatic provider selection, model listing, and response caching.

Use this page when

  • You are configuring stateful gateway features like Knowledge Base recall, conversation history, memory recall, session learning, or review execution.
  • You need to set up auto-provider routing, content moderation, or the /v1/models listing endpoint.
  • You are tuning caching, agent identity, or hosted gateway control-plane settings.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Knowledge Base Recall

The loads: section retrieves Knowledge Base context from the API and injects it into the LLM request as system or user messages.

Note: The YAML configuration key loads: retains its existing name during the compatibility period. The user-facing feature is now called Knowledge Base.

loads:
  enabled: true
  backend: "api"
  recall_top_k: 5
  write_mode: "full"
  timeout_ms: 3000
  fail_open: true
Field | Type | Required | Default | Description
enabled | boolean | no | false | Enable context loading
backend | string | no | "api" | api (fetch from control plane) or memory (in-process)
recall_top_k | integer | no | 5 | Number of context items to inject
write_mode | string | no | "full" | full (write back response), read_only, metadata_only
timeout_ms | integer | no | 3000 | Timeout for context fetch
fail_open | boolean | no | true | Continue if load fails (true) or block (false)

Read-only mode

loads:
  enabled: true
  backend: "api"
  write_mode: "read_only" # load context but don't write responses back
  recall_top_k: 10

History

The history: section controls whether the gateway writes conversation history to the API.

history:
  enabled: true
  mode: "raw"
  include_blocked: false
  retention_days: 90
  fail_open: true
Field | Type | Required | Default | Description
enabled | boolean | no | true | Enable history writing
mode | string | no | "raw" | raw (full messages), metadata_only (model, tokens, verdict), disabled
include_blocked | boolean | no | false | Write blocked requests to history
retention_days | integer | no | (none) | Auto-purge after N days (API-side enforcement)
fail_open | boolean | no | true | Continue if history write fails

Metadata-only mode

Useful for compliance scenarios where you need audit records but cannot store conversation content.

history:
  enabled: true
  mode: "metadata_only" # records model, token count, latency, verdict
  include_blocked: true

Disabling history

history:
  enabled: false

Learning

The learning: section defines how agent-scoped learned-session synthesis behaves for the deployed configuration.

learning:
  enabled: true
  source_policy: "allowed_only"
  strategy: "extract"
  previous_sessions_max: 5
  previous_allowed_requests_max: 20
  allowed_only: true
  schedule: "manual"
Field | Type | Required | Default | Description
enabled | boolean | no | false | Enable learning
source_policy | string | no | "allowed_only" | allowed_only or all
strategy | string | no | "extract" | extract (key facts), condense (summarize), hybrid (both)
previous_sessions_max | integer | no | 5 | Max previous sessions to learn from
previous_allowed_requests_max | integer | no | 20 | Max allowed (non-blocked) requests to consider
allowed_only | boolean | no | true | Must match source_policy; when false, blocked turns can contribute
schedule | string | no | "manual" | manual or on_session_close
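
For example, to let blocked turns contribute and to synthesize automatically when a session closes, the fields above can be combined like this (an illustrative sketch, not a recommended default):

learning:
  enabled: true
  source_policy: "all"
  allowed_only: false          # matches source_policy: all; blocked turns may contribute
  strategy: "condense"         # summarize conversation threads
  previous_sessions_max: 5
  schedule: "on_session_close" # synthesize when a session closes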

Memory

The memory: section controls which recalled sources can participate in agent runtime context and how aggressively they are filtered.

memory:
  enabled: true
  trust_floor: "trusted"
  eligible_sources:
    knowledge: true
    memories: true
    learned_sessions: true
    working_context: true
    frozen_memories: false
  recall:
    memory_scope: "agent"
    memory_budget: 8
    knowledge_budget: 8
    require_citations: true

Use this section when you need to tighten memory trust, change recall scope, or budget how many recalled items can be injected for agent execution.
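
For instance, a tighter profile might exclude personal memories and learned sessions and halve the recall budgets (values are illustrative and use only the fields shown in the example above):

memory:
  enabled: true
  trust_floor: "trusted"
  eligible_sources:
    knowledge: true
    memories: false          # exclude personal memories
    learned_sessions: false  # exclude learned sessions
    working_context: true
    frozen_memories: false
  recall:
    memory_scope: "agent"
    memory_budget: 4         # fewer recalled memory items per request
    knowledge_budget: 4
    require_citations: true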

Review

The review: section controls the optional second-pass review executor for agent responses.

review:
  enabled: true
  mode: "judge"
  provider: "openai"
  model: "gpt-4.1-mini"
  timeout_ms: 5000
  recursion_depth_max: 1
  provider_isolation: true

Use this section when you need inline review, response rewriting, or escalation-oriented review behavior to travel with the deployed configuration.

Strategy comparison

The extract, condense, and hybrid values of learning.strategy compare as follows:

Strategy | Behavior | Best for
extract | Pulls key facts and data points | Factual Q&A, support bots
condense | Summarizes conversation threads | Long conversations, chat assistants
hybrid | Extracts then condenses | Complex multi-turn workflows

Agents

The agents: section configures the default agent identity for the gateway.

agents:
  default_agent_id: "agent-123"
Field | Type | Required | Default | Description
default_agent_id | string | no | (none) | Default agent ID attached to events/traces

The agent ID is included in decision events and traces sent to the API, enabling per-agent analytics and policy filtering.

Moderation

The moderation: section enables external content moderation before or after LLM calls.

moderation:
  provider: "openai"
  secret_key_ref:
    env: "OPENAI_API_KEY"
  categories:
    - "violence"
    - "hate"
    - "self-harm"
    - "sexual"
  threshold: 0.7
Field | Type | Required | Default | Description
provider | string | yes | (none) | openai or azure
secret_key_ref | object | yes | (none) | Object reference to the moderation API key (env or store)
endpoint | string | no | (none) | Custom moderation endpoint (for Azure)
categories | string[] | no | all | Categories to check
threshold | number | no | 0.7 | Score threshold for flagging (0.0–1.0)

Azure moderation

moderation:
  provider: "azure"
  secret_key_ref:
    env: "AZURE_CONTENT_SAFETY_KEY"
  endpoint: "https://my-resource.cognitiveservices.azure.com"
  categories: ["violence", "hate"]
  threshold: 0.5

Auto provider

The auto_provider: section enables intelligent automatic routing across all configured providers based on cost, latency, and availability.

auto_provider:
  enabled: true
  name: "auto"
  routing:
    cost_weight: 0.6
    latency_weight: 0.4
  max_price_per_1m_tokens: 10.0
  fallback_enabled: true
  unhealthy_threshold: 3
Field | Type | Required | Default | Description
enabled | boolean | no | false | Enable automatic provider selection
name | string | no | "auto" | Name for this virtual provider
routing.cost_weight | number | no | 0.5 | Weight for cost optimization (0.0–1.0)
routing.latency_weight | number | no | 0.5 | Weight for latency optimization (0.0–1.0)
max_price_per_1m_tokens | number | no | (none) | Exclude providers above this price
fallback_enabled | boolean | no | true | Fall back to next-best if chosen provider fails
unhealthy_threshold | integer | no | 3 | Consecutive failures before excluding a provider

Cost-optimized routing

auto_provider:
  enabled: true
  routing:
    cost_weight: 1.0
    latency_weight: 0.0
  max_price_per_1m_tokens: 5.0

Latency-optimized routing

auto_provider:
  enabled: true
  routing:
    cost_weight: 0.0
    latency_weight: 1.0

Models endpoint

The models: section controls the /v1/models listing endpoint.

models:
  disabled: false
  include_disabled: false
  exposed_model_ids: []
Field | Type | Required | Default | Description
disabled | boolean | no | false | Disable the models endpoint entirely
include_disabled | boolean | no | false | Include disabled models in the listing
exposed_model_ids | array<string> | no | [] | Restrict the catalog to the listed /v1/models IDs; useful for hosted gateways that should publish only admin-approved models to every org
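
For example, a hosted gateway could publish only two admin-approved models to every org (the model IDs below are illustrative; list the /v1/models IDs your providers actually expose):

models:
  disabled: false
  include_disabled: false
  exposed_model_ids:
    - gpt-4o
    - claude-sonnet-4-20250514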

Hosted gateway provider credentials

Hosted gateways use gateway keys for runtime authentication and config variables for provider credentials. Keep provider secrets out of YAML by referencing them through secret_key_ref.

pack:
  name: config-runtime-providers-14
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-chatgpt-shared
      provider: openai
      model: gpt-4o
      base_url: https://api.openai.com/v1
      secret_key_ref:
        store: openai
        scope: platform
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true

Important details:

Behavior | Result
secret_key_ref.scope: platform | Resolves from platform-managed config variables
Org-specific overlays | May use cascade, org, team, or user, but not platform
Shared target IDs with platform secrets | Org overlays cannot replace those target IDs
--policy-config | Hosted gateways accept repeated startup config files; later flags overlay earlier ones
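
For example, assuming the --policy-config flag is passed to kt gateway run at startup, a base configuration file can be overlaid with org-specific settings (file names are illustrative):

# Later --policy-config files overlay earlier ones
kt gateway run \
  --policy-config base-runtime.yaml \
  --policy-config org-overrides.yaml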

Cache

The cache: section enables response caching for identical or semantically similar requests.

Exact cache

cache:
  enabled: true
  mode: "exact"
  default_on: true
  ttl_seconds: 3600

Exact caching hashes the full request body. Identical requests return the cached response.

Semantic cache

cache:
  enabled: true
  mode: "semantic"
  similarity_threshold: 0.95
  embedding_provider: "openai"
  default_on: true
  ttl_seconds: 3600

Semantic caching embeds the request and finds similar past requests above the similarity threshold.

Field | Type | Required | Default | Description
enabled | boolean | no | false | Enable caching
mode | string | yes | (none) | exact or semantic
similarity_threshold | number | no | 0.95 | Min cosine similarity for semantic cache hits
embedding_provider | string | no | (none) | Provider for computing embeddings (semantic mode)
default_on | boolean | no | true | Cache by default (can be overridden per-request)
ttl_seconds | integer | no | 3600 | Cache entry time-to-live

Cache backend

The cache backend is selected via the KEEPTRUSTS_CACHE_BACKEND environment variable:

  • memory — In-process LRU cache (default, no persistence)
  • redis — Redis/Valkey for distributed caching

KEEPTRUSTS_CACHE_BACKEND=redis
KEEPTRUSTS_CACHE_REDIS_URL=redis://localhost:6379/1

Complete runtime configuration example

pack:
  name: full-runtime
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-prod
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic-prod
      provider: anthropic
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_API_KEY
loads:
  enabled: true
  backend: api
  recall_top_k: 5
  write_mode: full
  timeout_ms: 3000
  fail_open: true
history:
  enabled: true
  mode: raw
  include_blocked: true
  retention_days: 90
  fail_open: true
learning:
  enabled: true
  source_policy: allowed_only
  strategy: hybrid
  previous_sessions_max: 5
  previous_allowed_requests_max: 20
agents:
  default_agent_id: support-bot-v2
moderation:
  provider: openai
  secret_key_ref:
    env: OPENAI_API_KEY
  categories:
    - violence
    - hate
    - self-harm
  threshold: 0.7
auto_provider:
  enabled: true
  routing:
    cost_weight: 0.6
    latency_weight: 0.4
  max_price_per_1m_tokens: 10.0
  fallback_enabled: true
models:
  disabled: false
  include_disabled: false
cache:
  enabled: true
  mode: semantic
  similarity_threshold: 0.95
  embedding_provider: openai
  default_on: true
  ttl_seconds: 7200
policies:
  chain:
    - prompt-injection
    - pii-detector
    - quality-scorer
    - audit-logger

Org-Scoped Configuration Admin

Configuration admin operations (create, update, deploy, rollback) are tenant-scoped. Each configuration belongs to exactly one organization, and configuration version history is isolated per org. Key behavioral rules:

  • YAML targeting authorization uses the same action namespace as normal IAM roles (configs:write, configs:deploy).
  • All config handlers enforce org-level pagination and tenant isolation.
  • Config-variable resolution requires explicit secrets:resolve permission; the resolver uses a fail-open or fail-closed policy (set in the config) for missing secret references.
  • Git-based configuration import uses org-scoped credentials and enforces the same targeting and storage-accounting rules as direct saves.

For AI systems

  • Canonical terms: Keeptrusts, loads, history, learning, agents, moderation, auto_provider, models, cache, Knowledge Base, recall_top_k
  • Config/command names: loads: (Knowledge Base recall), history:, learning:, agents:, moderation:, auto_provider:, models:, cache:
  • Best next pages: Providers Configuration, Config Scenarios, Declarative Config Reference

For engineers

  • Prerequisites: A running Keeptrusts API for Knowledge Base recall and history features. For moderation, an OpenAI or Azure Content Safety API key.
  • Validation: Start the gateway and verify the runtime config with curl http://localhost:8080/keeptrusts/config | jq .loads. Test history by sending a request and checking the console History page. Validate moderation by sending flaggable content. See the sketch after this list.
  • Key commands: kt gateway run, curl /keeptrusts/config, kt events tail
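
A minimal validation pass might look like this (commands taken from the bullets above; adjust host and port for your deployment):

# Verify the deployed runtime config
curl http://localhost:8080/keeptrusts/config | jq .loads

# Watch decision events while you send test traffic through the gateway
kt events tail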

For leaders

  • Governance: History mode (raw vs metadata_only) determines what conversation data is retained. Use metadata_only when content retention is prohibited but audit evidence is required.
  • Cost: Knowledge Base recall and learning features add API calls per request. Auto-provider routing optimizes cost/latency tradeoffs automatically but requires accurate pricing declarations.
  • Rollout: Enable history first for visibility, then add Knowledge Base recall for context enrichment. Enable learning only after quality-scorer baselines are established.

Next steps