Org-Shared Engineering Cache

Org-shared engineering cache helps teams lower AI usage costs when many engineers work on the same codebases. Keeptrusts builds reusable codebase context once, then gateways and agents reuse that context safely across compatible requests.

Use this page when

You want to reduce AI usage costs by sharing codebase context across engineers working on the same repositories.
You are configuring workflow_cache in your declarative policy config.
You need to understand how Codebase Context Fabric, org-shared cache, and provider prompt-prefix cache interact.
You are setting up agent gateway groups for failover-safe caching.

Primary audience

Primary: Technical Engineers
Secondary: AI Agents, Technical Leaders

How It Works

Keeptrusts uses three layers together:

Codebase Context Fabric builds versioned repository intelligence such as repo maps, file summaries, dependency graphs, test maps, API inventories, symbol indexes, embeddings, recent change summaries, and known failure fingerprints.
Org-shared cache stores reusable fabric artifacts and exact read-only responses under org, repo, codebase identity, policy, and entitlement gates.
Provider prompt-prefix cache reduces provider-side cost when stable context still needs to be sent upstream.

The first fill for an organization can cost more because Keeptrusts needs to build the shared context. Later prompts from other engineers can reuse the same fabric artifacts or exact read-only responses, reducing marginal cost.

Accuracy Defaults

Keeptrusts favors accuracy over blind replay:

Read-only explanations and summaries can use exact response replay when the request, codebase identity, policy, provider, model, and entitlements match.
Semantic lookup is used primarily to find reusable context.
Direct semantic replay is disabled unless enabled by org, repo, agent, and declarative configuration policy.
Code-changing, destructive, security-sensitive, approval, and write-operation prompts generate fresh answers instead of replaying semantic matches.
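
The rules above can be read as a small decision function. The following is a minimal sketch under stated assumptions, not the Keeptrusts implementation: the intent labels, the ReplayContext type, and the choose_strategy function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

# Hypothetical intent labels. The page names these categories but does not
# define an enum for them.
FRESH_ONLY_INTENTS = {
    "code_change",
    "destructive",
    "security_sensitive",
    "approval",
    "write_operation",
}


@dataclass(frozen=True)
class ReplayContext:
    intent: str
    # True when request, codebase identity, policy, provider, model,
    # and entitlements all match a stored response.
    exact_match: bool
    # True when only a semantically similar entry was found.
    semantic_match: bool
    # Mirrors the documented default of direct_semantic_replay_enabled.
    direct_semantic_replay_enabled: bool = False


def choose_strategy(ctx: ReplayContext) -> str:
    """Return 'exact_replay', 'semantic_replay', or 'fresh_answer'."""
    # Sensitive or mutating prompts never replay a stored answer.
    if ctx.intent in FRESH_ONLY_INTENTS:
        return "fresh_answer"
    # Read-only prompts may replay only on an exact match.
    if ctx.intent == "read_only" and ctx.exact_match:
        return "exact_replay"
    # Semantic replay requires an explicit opt-in.
    if ctx.semantic_match and ctx.direct_semantic_replay_enabled:
        return "semantic_replay"
    # Otherwise a semantic match only helps locate reusable context.
    return "fresh_answer"
```

In this sketch a semantic match with replay disabled still yields a fresh answer; reusing the matched context to build that answer is outside the scope of the example.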

Freshness

Code-aware cache entries are tied to repository freshness signals:

Repository ID
Branch or ref
Commit SHA or tree hash
Relevant file digests
Agent version
Policy/config digest
Entitlement digest

If code changes, full-response replay misses instead of returning stale answers. Unchanged fabric slices can still be reused when their source digests match.
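
One way to picture this behavior is to hash the freshness signals into a key. The sketch below is illustrative only: the field names, the response_key and slice_key helpers, and the use of SHA-256 over canonical JSON are assumptions, not the documented key format.

```python
import hashlib
import json


def digest(payload: dict) -> str:
    # Canonical JSON (sorted keys, no extra whitespace) so equal inputs
    # always hash to the same value.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def response_key(signals: dict) -> str:
    # Full-response replay depends on every freshness signal.
    return digest(signals)


def slice_key(repository_id: str, path: str, file_digest: str) -> str:
    # A fabric slice depends only on the digest of its own source file
    # (an assumption made for this sketch).
    return digest(
        {"repository_id": repository_id, "path": path, "file_digest": file_digest}
    )


# Hypothetical signal values for two commits of the same repository.
before = {
    "repository_id": "repo-1",
    "ref": "main",
    "commit_sha": "a1b2",
    "file_digests": {"src/api.py": "d-api-1", "src/util.py": "d-util-1"},
    "agent_version": "1.4.0",
    "policy_config_digest": "p-1",
    "entitlement_digest": "e-1",
}
after = {
    **before,
    "commit_sha": "b2c3",
    "file_digests": {**before["file_digests"], "src/api.py": "d-api-2"},
}
```

With these values, response_key(before) and response_key(after) differ, so a full-response lookup misses after the commit, while the slice key for src/util.py is unchanged and that slice stays reusable.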

Avoided Cost

Cache-hit economics records avoided provider cost only. Cache hits do not debit wallet balance and do not charge a separate platform fee.

Admins can review:

Fill cost
Avoided provider cost
Provider cached-token savings
Net savings
Hit rate
Stale misses
Single-flight collapses
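
To make the relationship between these figures concrete, here is a small worked sketch. The formulas are assumptions: this page does not define how net savings or hit rate are computed, and the CacheEconomics type is a hypothetical name.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class CacheEconomics:
    fill_cost: Decimal
    avoided_provider_cost: Decimal
    provider_cached_token_savings: Decimal
    hits: int
    lookups: int

    def net_savings(self) -> Decimal:
        # Assumed formula: everything saved minus what the first fill cost.
        return (
            self.avoided_provider_cost
            + self.provider_cached_token_savings
            - self.fill_cost
        )

    def hit_rate(self) -> float:
        # Assumed formula: hits as a share of all lookups.
        return self.hits / self.lookups if self.lookups else 0.0


# Hypothetical figures for one reporting period.
period = CacheEconomics(
    fill_cost=Decimal("40"),
    avoided_provider_cost=Decimal("100"),
    provider_cached_token_savings=Decimal("15"),
    hits=30,
    lookups=40,
)
```

For these hypothetical figures, period.net_savings() is 75 and period.hit_rate() is 0.75.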

Chat Prompt Cost Estimates

Before a chat or hosted gateway task is dispatched, Keeptrusts can estimate the prompt plan cost. The estimate includes uncached input cost, provider cached input discount, Context Fabric or org-shared cache avoided cost, output budget, and worst-case total. For unsupported providers, the estimate carries a heuristic confidence label so users can see that it is approximate.
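
The estimate components can be sketched as simple arithmetic. This is an illustration under assumptions: the PromptPlan and Pricing types, the per-1K-token prices, and the way the components combine are hypothetical and do not describe the actual Keeptrusts estimator.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Pricing:
    # Hypothetical prices per 1,000 tokens.
    input_per_1k: Decimal
    cached_input_per_1k: Decimal
    output_per_1k: Decimal


@dataclass(frozen=True)
class PromptPlan:
    input_tokens: int            # total planned prompt tokens
    provider_cached_tokens: int  # sent upstream, billed at the cached rate
    cache_avoided_tokens: int    # served by Context Fabric or org-shared cache
    output_budget_tokens: int    # maximum output tokens allowed


def estimate(plan: PromptPlan, price: Pricing) -> dict:
    def per_1k(tokens: int) -> Decimal:
        return Decimal(tokens) / Decimal(1000)

    uncached = (
        plan.input_tokens - plan.provider_cached_tokens - plan.cache_avoided_tokens
    )
    uncached_input_cost = per_1k(uncached) * price.input_per_1k
    cached_input_cost = per_1k(plan.provider_cached_tokens) * price.cached_input_per_1k
    output_budget_cost = per_1k(plan.output_budget_tokens) * price.output_per_1k
    return {
        "uncached_input_cost": uncached_input_cost,
        # Discount: what the cached tokens would have cost at the full rate,
        # minus what they cost at the cached rate.
        "provider_cached_discount": per_1k(plan.provider_cached_tokens)
        * (price.input_per_1k - price.cached_input_per_1k),
        "cache_avoided_cost": per_1k(plan.cache_avoided_tokens) * price.input_per_1k,
        "output_budget_cost": output_budget_cost,
        # Worst case assumes the full output budget is consumed.
        "worst_case_total": uncached_input_cost + cached_input_cost + output_budget_cost,
    }


plan = PromptPlan(10_000, 4_000, 3_000, 1_000)
price = Pricing(Decimal("0.003"), Decimal("0.0003"), Decimal("0.015"))
result = estimate(plan, price)
```

With these hypothetical numbers, the worst-case total is 0.0252: 0.009 for uncached input, 0.0012 for provider-cached input, and 0.015 for the output budget.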

After completion, when gateway usage metadata is available, chat compares the estimated prompt tokens, output tokens, and total cost against the actual prompt tokens, output tokens, cached input tokens, and total cost. The reconciliation is advisory; provider spend logs remain the billing source of truth.

Declarative Config

Use workflow_cache to configure codebase identity and semantic replay:

workflow_cache:
  enabled: true
  default_tier: org_shared_cache
  org_shared_enabled: true
  direct_semantic_replay_enabled: false
  codebase_identity_mode: repository_isolated

For teams that operate multiple repositories as one monorepo-scale system:

workflow_cache:
  enabled: true
  default_tier: org_shared_cache
  org_shared_enabled: true
  codebase_identity_mode: monorepo_group
  monorepo_group_id: core-platform
  monorepo_repo_ids:
    - api
    - cli
    - console
  agent_gateway_group_cache_sharing_enabled: true
  agent_gateway_group_id: primary-agent-gateways

Repository-isolated mode is the default. Use monorepo grouping only when the repositories should intentionally share one codebase identity.
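
A pre-deployment check for the two identity modes might look like the sketch below. The validation rules are assumptions inferred from the examples on this page, and validate_workflow_cache is a hypothetical helper, not part of any Keeptrusts tooling. It takes the workflow_cache section as an already-parsed dictionary.

```python
KNOWN_MODES = {"repository_isolated", "monorepo_group"}


def validate_workflow_cache(cfg: dict) -> list[str]:
    """Return a list of problems found in a parsed workflow_cache section."""
    problems: list[str] = []
    # Repository-isolated mode is the documented default.
    mode = cfg.get("codebase_identity_mode", "repository_isolated")
    if mode not in KNOWN_MODES:
        problems.append(f"unknown codebase_identity_mode: {mode}")
    elif mode == "monorepo_group":
        # Assumed rule: a group needs an ID and at least one repository.
        if not cfg.get("monorepo_group_id"):
            problems.append("monorepo_group requires monorepo_group_id")
        if not cfg.get("monorepo_repo_ids"):
            problems.append("monorepo_group requires monorepo_repo_ids")
    elif cfg.get("monorepo_group_id") or cfg.get("monorepo_repo_ids"):
        # Assumed rule: monorepo keys have no effect in isolated mode.
        problems.append("monorepo_* keys are set but mode is repository_isolated")
    return problems


isolated = {"enabled": True, "codebase_identity_mode": "repository_isolated"}
grouped = {
    "enabled": True,
    "codebase_identity_mode": "monorepo_group",
    "monorepo_group_id": "core-platform",
    "monorepo_repo_ids": ["api", "cli", "console"],
}
```

Both example dictionaries mirror the configurations shown above and produce an empty problem list in this sketch.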

Agent Gateway Groups

When an agent is served by multiple hosted or edge gateways, Keeptrusts can put those gateways in an agent gateway group. Org-shared cache keys include the agent_id and agent_gateway_group_id, not the physical gateway ID, so a failover from one gateway to another does not invalidate compatible cache entries. Set physical_gateway_private_cache_only: true only when a deployment must isolate cache entries to one physical gateway.
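
The failover behavior follows from which identifiers enter the cache key. The sketch below is a hypothetical illustration: org_shared_cache_key, the org_id and request_digest parameters, and the SHA-256 key format are assumptions, not the actual key layout.

```python
import hashlib


def org_shared_cache_key(
    org_id: str,
    agent_id: str,
    agent_gateway_group_id: str,
    physical_gateway_id: str,
    request_digest: str,
    physical_gateway_private_cache_only: bool = False,
) -> str:
    parts = [org_id, agent_id, agent_gateway_group_id, request_digest]
    # The physical gateway ID enters the key only when the deployment
    # explicitly isolates cache entries to one gateway.
    if physical_gateway_private_cache_only:
        parts.append(physical_gateway_id)
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


# Same request served by two gateways in one group (hypothetical IDs).
key_a = org_shared_cache_key("org-1", "agent-7", "primary-agent-gateways", "gw-a", "req-1")
key_b = org_shared_cache_key("org-1", "agent-7", "primary-agent-gateways", "gw-b", "req-1")
private_a = org_shared_cache_key(
    "org-1", "agent-7", "primary-agent-gateways", "gw-a", "req-1",
    physical_gateway_private_cache_only=True,
)
private_b = org_shared_cache_key(
    "org-1", "agent-7", "primary-agent-gateways", "gw-b", "req-1",
    physical_gateway_private_cache_only=True,
)
```

Here key_a equals key_b, so a failover from gw-a to gw-b still hits the same entry, while private_a and private_b differ because the private-cache flag folds the physical gateway ID into the key.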

Hosted gateway task execution is the public task interface for this flow. Older internal records may still contain runner field names, but new task execution metadata should use execution_surface: hosted_gateway.

Knowledge Base And Context Fabric Together

Chats and hosted gateway tasks can use both curated Knowledge Base assets and Codebase Context Fabric artifacts. Use Knowledge Base for approved guidance, playbooks, policy text, and durable engineering decisions. Use Context Fabric for fresh repository maps, dependency graphs, test maps, symbol indexes, file summaries, and source-digest-bound provenance.

Responses should keep the two source types separate:

Curated Knowledge Base citations point to governed assets and versions.
Codebase Context Fabric provenance points to artifact type, repo or folder identity, source digest, and freshness identity. It does not expose raw hidden source content.

Operational Checks

Before rollout, verify:

worker_cache_warmer is running.
The connected repo has fresh fabric artifacts.
Replay audit shows no cross-org hits.
Cache-hit economics records avoided cost only.
Code-changing prompts do not use direct semantic replay.

For AI systems

Canonical terms: Keeptrusts org-shared engineering cache, Codebase Context Fabric, workflow cache, semantic replay, provider prompt-prefix cache, agent gateway group, cache economics, avoided cost.
Config key: workflow_cache with sub-keys enabled, default_tier, org_shared_enabled, direct_semantic_replay_enabled, codebase_identity_mode, monorepo_group_id, monorepo_repo_ids, agent_gateway_group_cache_sharing_enabled, agent_gateway_group_id, physical_gateway_private_cache_only.
Freshness signals: repository ID, branch/ref, commit SHA or tree hash, relevant file digests, agent version, policy/config digest, entitlement digest.
Related pages: Knowledge Base, Cost & Spend, Gateway Configuration.

For engineers

Before rollout, verify worker_cache_warmer is running and that connected repos have fresh fabric artifacts.
Full-response replay misses when code changes, while fabric slices whose source digests still match remain reusable. Code-changing prompts always generate fresh answers.
Use codebase_identity_mode: repository_isolated (default) unless repositories should intentionally share identity.
Keep direct_semantic_replay_enabled: false (the default) to avoid replaying semantically similar but non-identical responses.
Monitor cache-hit economics: fill cost, avoided provider cost, hit rate, and stale misses in the admin dashboard.
Agent gateway groups allow failover without cache invalidation — cache keys use agent_id and agent_gateway_group_id, not physical gateway ID.

For leaders

Org-shared cache can significantly reduce marginal AI cost — the first fill costs more, but subsequent engineers reuse shared context.
Cache hits do not debit wallet balance and do not incur a platform fee — savings are pure avoided provider cost.
Accuracy is prioritized over savings: code-changing, security-sensitive, and approval-related prompts always get fresh answers.
Review cache economics (fill cost vs. avoided cost, hit rate) to measure ROI before expanding to additional teams or repositories.
Monorepo grouping and agent gateway groups are advanced configurations — start with repository-isolated mode for straightforward governance.
