Org-Shared Engineering Cache
Org-shared engineering cache helps teams lower AI usage costs when many engineers work on the same codebases. Keeptrusts builds reusable codebase context once, then gateways and agents reuse that context safely across compatible requests.
Use this page when
- You want to reduce AI usage costs by sharing codebase context across engineers working on the same repositories.
- You are configuring workflow_cache in your declarative policy config.
- You need to understand how Codebase Context Fabric, org-shared cache, and provider prompt-prefix cache interact.
- You are setting up agent gateway groups for failover-safe caching.
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
How It Works
Keeptrusts uses three layers together:
- Codebase Context Fabric builds versioned repository intelligence such as repo maps, file summaries, dependency graphs, test maps, API inventories, symbol indexes, embeddings, recent change summaries, and known failure fingerprints.
- Org-shared cache stores reusable fabric artifacts and exact read-only responses under org, repo, codebase identity, policy, and entitlement gates.
- Provider prompt-prefix cache reduces provider-side cost when stable context still needs to be sent upstream.
The first fill for an organization can cost more because Keeptrusts needs to build the shared context. Later prompts from other engineers can reuse the same fabric artifacts or exact read-only responses, reducing marginal cost.
Accuracy Defaults
Keeptrusts favors accuracy over blind replay:
- Read-only explanations and summaries can use exact response replay when the request, codebase identity, policy, provider, model, and entitlements match.
- Semantic lookup is used primarily to find reusable context.
- Direct semantic replay is disabled unless enabled by org, repo, agent, and declarative configuration policy.
- Code-changing, destructive, security-sensitive, approval, and write-operation prompts generate fresh answers instead of replaying semantic matches.
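The defaults above can be read as a small decision procedure. The following is a minimal sketch, not the Keeptrusts implementation: the intent labels, the ReplayRequest type, and the returned action names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Hypothetical intent labels for prompts that must never be answered by semantic replay.
FRESH_ONLY_INTENTS = {
    "code_change",
    "destructive",
    "security_sensitive",
    "approval",
    "write_operation",
}


@dataclass(frozen=True)
class ReplayRequest:
    intent: str                           # "read_only" or one of FRESH_ONLY_INTENTS
    exact_match: bool                     # request, codebase identity, policy, provider, model, entitlements all match
    semantic_match: bool                  # a semantically similar cache entry exists
    direct_semantic_replay_enabled: bool  # enabled by org, repo, agent, and declarative config


def choose_action(req: ReplayRequest) -> str:
    """Return an illustrative action name for a request."""
    if req.intent in FRESH_ONLY_INTENTS:
        return "generate_fresh"
    if req.intent == "read_only" and req.exact_match:
        return "replay_exact"
    if req.semantic_match and req.direct_semantic_replay_enabled:
        return "replay_semantic"
    if req.semantic_match:
        # Default: semantic lookup only finds reusable context for a fresh answer.
        return "reuse_context"
    return "generate_fresh"
```

In this sketch, a read-only prompt with only a semantic match yields reuse_context unless direct semantic replay has been explicitly enabled.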
Freshness
Code-aware cache entries are tied to repository freshness signals:
- Repository ID
- Branch or ref
- Commit SHA or tree hash
- Relevant file digests
- Agent version
- Policy/config digest
- Entitlement digest
If code changes, full-response replay misses instead of returning stale answers. Unchanged fabric slices can still be reused when their source digests match.
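One way to picture this behavior is a cache key derived from all of the freshness signals, so that any change produces a different key and therefore a miss. The sketch below is an assumption about how such a key could be built; the function name, field names, and the use of SHA-256 over canonical JSON are illustrative and not taken from Keeptrusts.

```python
import hashlib
import json


def freshness_key(repo_id, ref, commit_sha, file_digests,
                  agent_version, policy_digest, entitlement_digest):
    """Derive a deterministic key from the freshness signals (illustrative only)."""
    payload = {
        "repo_id": repo_id,
        "ref": ref,
        "commit_sha": commit_sha,
        "file_digests": file_digests,  # mapping of path -> content digest
        "agent_version": agent_version,
        "policy_digest": policy_digest,
        "entitlement_digest": entitlement_digest,
    }
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


base = freshness_key("api", "main", "abc123", {"a.py": "d1", "b.py": "d2"},
                     "1.0", "p1", "e1")
changed = freshness_key("api", "main", "abc123", {"a.py": "d1", "b.py": "d9"},
                        "1.0", "p1", "e1")
```

Here base and changed differ because one file digest changed, which models a full-response replay miss. A per-artifact key built only from that artifact's source digests would stay stable for unchanged fabric slices.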
Avoided Cost
Cache-hit economics records avoided provider cost only. Cache hits do not debit wallet balance and do not charge a separate platform fee.
Admins can review:
- Fill cost
- Avoided provider cost
- Provider cached-token savings
- Net savings
- Hit rate
- Stale misses
- Single-flight collapses
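As a rough illustration of how these figures relate, the sketch below computes net savings and hit rate from the other metrics. The formulas are assumptions, since this page does not define them: net savings is taken as avoided provider cost plus provider cached-token savings minus fill cost, and hit rate as hits divided by total lookups. The CacheEconomics type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CacheEconomics:
    fill_cost: float                      # cost of building shared context
    avoided_provider_cost: float          # provider spend avoided by cache hits
    provider_cached_token_savings: float  # discount from provider prompt-prefix cache
    hits: int
    misses: int                           # includes stale misses

    def net_savings(self) -> float:
        # Assumed definition: total savings minus what it cost to fill the cache.
        return (self.avoided_provider_cost
                + self.provider_cached_token_savings
                - self.fill_cost)

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


report = CacheEconomics(fill_cost=40.0, avoided_provider_cost=100.0,
                        provider_cached_token_savings=10.0, hits=3, misses=1)
```

With these sample numbers, net savings is positive once avoided cost exceeds the fill cost, which matches the expectation that the first fill costs more and later reuse pays it back.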
Chat Prompt Cost Estimates
Before a chat or hosted gateway task is dispatched, Keeptrusts can estimate the prompt plan cost. The estimate includes uncached input cost, provider cached input discount, Context Fabric or org-shared cache avoided cost, output budget, and worst-case total. Unsupported providers use heuristic confidence labels so users can see when the estimate is approximate.
After completion, chat compares estimated prompt tokens, estimated output tokens, estimated total cost, actual prompt tokens, actual output tokens, actual cached input tokens, and actual total cost when gateway usage metadata is available. The reconciliation is advisory; provider spend logs remain the billing source of truth.
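The components of the estimate can be sketched as simple arithmetic. This is a hedged illustration only: the function name, the per-1,000-token pricing parameters, and the way the worst-case total is summed are assumptions, not the Keeptrusts estimator.

```python
def estimate_prompt_cost(uncached_input_tokens, cached_input_tokens,
                         avoided_input_tokens, output_budget_tokens, *,
                         input_price_per_1k, cached_input_price_per_1k,
                         output_price_per_1k):
    """Return the estimate components named above (all prices are hypothetical)."""
    uncached_input_cost = uncached_input_tokens / 1000 * input_price_per_1k
    cached_input_cost = cached_input_tokens / 1000 * cached_input_price_per_1k
    # Discount: what the cached tokens would have cost at the full input price, minus what they do cost.
    provider_cached_discount = (cached_input_tokens / 1000
                                * (input_price_per_1k - cached_input_price_per_1k))
    # Tokens that never go upstream because Context Fabric or org-shared cache served them.
    cache_avoided_cost = avoided_input_tokens / 1000 * input_price_per_1k
    output_budget_cost = output_budget_tokens / 1000 * output_price_per_1k
    return {
        "uncached_input_cost": uncached_input_cost,
        "provider_cached_discount": provider_cached_discount,
        "cache_avoided_cost": cache_avoided_cost,
        "output_budget_cost": output_budget_cost,
        # Assumed worst case: the full output budget is consumed.
        "worst_case_total": uncached_input_cost + cached_input_cost + output_budget_cost,
    }


estimate = estimate_prompt_cost(2000, 8000, 4000, 1000,
                                input_price_per_1k=1.0,
                                cached_input_price_per_1k=0.25,
                                output_price_per_1k=2.0)
```

After completion, the same fields could be compared against actual token counts from gateway usage metadata. That comparison stays advisory, because provider spend logs remain the billing source of truth.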
Declarative Config
Use workflow_cache to configure codebase identity and semantic replay:
workflow_cache:
  enabled: true
  default_tier: org_shared_cache
  org_shared_enabled: true
  direct_semantic_replay_enabled: false
  codebase_identity_mode: repository_isolated
For teams that operate multiple repositories as one monorepo-scale system:
workflow_cache:
  enabled: true
  default_tier: org_shared_cache
  org_shared_enabled: true
  codebase_identity_mode: monorepo_group
  monorepo_group_id: core-platform
  monorepo_repo_ids:
    - api
    - cli
    - console
  agent_gateway_group_cache_sharing_enabled: true
  agent_gateway_group_id: primary-agent-gateways
Repository-isolated mode is the default. Use monorepo grouping only when the repositories should intentionally share one codebase identity.
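To make the difference between the two modes concrete, the sketch below resolves a codebase identity from an already-parsed workflow_cache mapping. The config keys come from the examples above. The function name, the "repo:" and "group:" prefixes, and the fallback for repositories outside the group are assumptions made for illustration.

```python
def codebase_identity(workflow_cache: dict, repo_id: str) -> str:
    """Resolve which codebase identity a repository's cache entries fall under."""
    mode = workflow_cache.get("codebase_identity_mode", "repository_isolated")
    grouped_repos = workflow_cache.get("monorepo_repo_ids", [])
    if mode == "monorepo_group" and repo_id in grouped_repos:
        # All listed repositories intentionally share one identity.
        return f"group:{workflow_cache['monorepo_group_id']}"
    # Default, and assumed fallback for repos not listed in the group.
    return f"repo:{repo_id}"


isolated = {"codebase_identity_mode": "repository_isolated"}
grouped = {
    "codebase_identity_mode": "monorepo_group",
    "monorepo_group_id": "core-platform",
    "monorepo_repo_ids": ["api", "cli", "console"],
}
```

Under the grouped config, api and cli resolve to the same identity and can share cache entries, while under the isolated config each repository keeps its own.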
Agent Gateway Groups
When an agent is served by multiple hosted or edge gateways, Keeptrusts can put
those gateways in an agent gateway group. Org-shared cache keys include the
agent_id and agent_gateway_group_id, not the physical gateway ID, so a
failover from one gateway to another does not invalidate compatible cache
entries. Set physical_gateway_private_cache_only: true only when a deployment
must isolate cache entries to one physical gateway.
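The failover behavior follows from which identifiers take part in the cache key. The sketch below shows only that scoping idea. The function name and tuple layout are hypothetical, and a real key would also carry the org, codebase identity, policy, and entitlement components described elsewhere on this page.

```python
def cache_scope(agent_id, agent_gateway_group_id, physical_gateway_id,
                physical_gateway_private_cache_only=False):
    """Return the gateway-related portion of a cache key (illustrative only)."""
    if physical_gateway_private_cache_only:
        # Isolation requested: entries are visible to one physical gateway only.
        return (agent_id, agent_gateway_group_id, physical_gateway_id)
    # Default: the physical gateway ID is left out, so group members share entries.
    return (agent_id, agent_gateway_group_id)


before_failover = cache_scope("agent-1", "primary-agent-gateways", "gw-a")
after_failover = cache_scope("agent-1", "primary-agent-gateways", "gw-b")
```

Because before_failover and after_failover are equal, entries written through gw-a remain addressable through gw-b. With physical_gateway_private_cache_only set to true, the two scopes differ and nothing is shared.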
Hosted gateway task execution is the public task interface for this flow. Older
internal records may still contain runner field names, but new task execution
metadata should use execution_surface: hosted_gateway.
Knowledge Base And Context Fabric Together
Chats and hosted gateway tasks can use both curated Knowledge Base assets and Codebase Context Fabric artifacts. Use Knowledge Base for approved guidance, playbooks, policy text, and durable engineering decisions. Use Context Fabric for fresh repository maps, dependency graphs, test maps, symbol indexes, file summaries, and source-digest-bound provenance.
Responses should keep the two source types separate:
- Curated Knowledge Base citations point to governed assets and versions.
- Codebase Context Fabric provenance points to artifact type, repo or folder identity, source digest, and freshness identity. It does not expose raw hidden source content.
Operational Checks
Before rollout, verify:
- worker_cache_warmer is running.
- The connected repo has fresh fabric artifacts.
- Replay audit shows no cross-org hits.
- Cache-hit economics records avoided cost only.
- Code-changing prompts do not use semantic direct replay.
For AI systems
- Canonical terms: Keeptrusts org-shared engineering cache, Codebase Context Fabric, workflow cache, semantic replay, provider prompt-prefix cache, agent gateway group, cache economics, avoided cost.
- Config key: workflow_cache with sub-keys enabled, default_tier, org_shared_enabled, direct_semantic_replay_enabled, codebase_identity_mode, monorepo_group_id, agent_gateway_group_id.
- Freshness signals: repository ID, branch/ref, commit SHA, file digests, agent version, policy/config digest, entitlement digest.
- Related pages: Knowledge Base, Cost & Spend, Gateway Configuration.
For engineers
- Before rollout, verify worker_cache_warmer is running and that connected repos have fresh fabric artifacts.
- Cache entries are invalidated when source digests change, and code-changing prompts always generate fresh answers.
- Use codebase_identity_mode: repository_isolated (default) unless repositories should intentionally share identity.
- Set direct_semantic_replay_enabled: false (default) to avoid replaying semantically similar but non-identical responses.
- Monitor cache-hit economics: fill cost, avoided provider cost, hit rate, and stale misses in the admin dashboard.
- Agent gateway groups allow failover without cache invalidation: cache keys use agent_id and agent_gateway_group_id, not the physical gateway ID.
For leaders
- Org-shared cache can significantly reduce marginal AI cost — the first fill costs more, but subsequent engineers reuse shared context.
- Cache hits do not debit wallet balance and do not incur a platform fee — savings are pure avoided provider cost.
- Accuracy is prioritized over savings: code-changing, security-sensitive, and approval-related prompts always get fresh answers.
- Review cache economics (fill cost vs. avoided cost, hit rate) to measure ROI before expanding to additional teams or repositories.
- Monorepo grouping and agent gateway groups are advanced configurations — start with repository-isolated mode for straightforward governance.