Cache Tiers: Private vs Org-Shared
Keeptrusts provides two distinct cache tiers, each designed for different use cases. Understanding when each tier applies helps you maximize savings while maintaining appropriate isolation.
Use this page when
- You need to understand the difference between private edge cache and org-shared cache.
- You are deciding which cache tier to use for a specific workload or policy requirement.
- You want to configure isolation rules that force specific requests to private cache.
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Overview
| | Private Edge Cache | Org-Shared Cache |
|---|---|---|
| Scope | Per-user or per-API-key | Entire organization |
| Sharing | No cross-user sharing | All authorized users share entries |
| Key includes `key_id` | Yes | No |
| Key includes `org_id` | Yes | Yes |
| Primary benefit | Individual repetition savings | Team-wide deduplication |
| Savings multiplier | 1× (single user) | N× (N engineers) |
| When used | Policy requires isolation | Default for code-aware traffic |
Private Edge Cache (`private_edge_cache`)
The private edge cache stores responses scoped to a single user or API key. No sharing occurs between users.
When Private Edge Cache Is Used
- The request's policy chain requires per-user isolation
- The request contains user-specific context that shouldn't be shared (e.g., personal notes, draft documents)
- The gateway is running in local mode (not central)
- The request explicitly opts out of shared caching via headers
- Entitlement-based access controls prevent shared access to certain content
Cache Key Composition
Private edge cache keys include:
```
private_edge_key = hash(
  org_id,
  key_id,                    ← ties entry to a specific user/key
  config_version,
  normalized_prompt_content
)
```
Because key_id is part of the key, Engineer A's cached responses are invisible to Engineer B, even for identical prompts.
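A minimal sketch of this per-user keying, assuming a SHA-256 hash over a canonical serialization of the fields (the actual hash function and encoding are not specified in this page):

```python
import hashlib
import json

def private_edge_key(org_id: str, key_id: str, config_version: str,
                     normalized_prompt: str) -> str:
    # key_id is hashed into the key, so the entry is bound to one user/key.
    payload = json.dumps([org_id, key_id, config_version, normalized_prompt])
    return hashlib.sha256(payload.encode()).hexdigest()

# Identical prompts from two engineers produce two different keys:
a = private_edge_key("org-1", "engineer-a", "v7", "explain AuthService")
b = private_edge_key("org-1", "engineer-b", "v7", "explain AuthService")
assert a != b  # Engineer B cannot hit Engineer A's entry
```

The only difference between the two calls is `key_id`, which is enough to make the hashes diverge.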
Cost Impact
Private edge cache only saves money when the same user repeats the same request. In practice:
- Developers re-running the same prompt after editing code: cache hit
- Same developer asking the same question in a new session: cache hit
- Different developer asking the same question: cache miss (different key_id)
Typical savings: 10-20% per individual user (personal repetition patterns only).
Org-Shared Cache (`org_shared_cache`)
The org-shared cache stores responses shared across all authorized users in the same organization. This is where the massive savings for engineering teams come from.
When Org-Shared Cache Is Used
- Default for all hosted-gateway code-aware traffic
- The request's policy chain does not require per-user isolation
- The entitlement digest matches between requesting user and cached entry
- The config version matches (no policy changes since entry was cached)
Cache Key Composition
Org-shared cache keys deliberately exclude user identity:
```
org_shared_key = hash(
  org_id,
  entitlement_digest,        ← ensures authorization match
  config_version,            ← invalidates on policy change
  normalized_prompt_content
)
```
Critically, key_id is not part of this key. This is the mechanism that enables cross-engineer sharing. When Engineer A populates a cache entry, Engineers B through Z can hit it because the key doesn't distinguish between users.
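The sharing mechanism can be sketched the same way, again assuming SHA-256 over a canonical serialization. Because `key_id` is absent, two engineers with the same entitlement digest compute the same key:

```python
import hashlib
import json

def org_shared_key(org_id: str, entitlement_digest: str,
                   config_version: str, normalized_prompt: str) -> str:
    # key_id is deliberately excluded: any authorized user maps
    # to the same cache entry for the same normalized prompt.
    payload = json.dumps([org_id, entitlement_digest,
                          config_version, normalized_prompt])
    return hashlib.sha256(payload.encode()).hexdigest()

# Engineer A (who filled the entry) and Engineer B compute the same key:
k_a = org_shared_key("org-1", "digest-abc", "v7", "explain AuthService")
k_b = org_shared_key("org-1", "digest-abc", "v7", "explain AuthService")
assert k_a == k_b  # B hits the entry A paid to fill
```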
Cost Impact
Org-shared cache saves money every time any user in the org repeats a semantically equivalent request. For a 100-engineer team:
- Engineer 1 asks about `AuthService` → cache miss → paid fill
- Engineers 2-100 ask about `AuthService` → 99 cache hits → zero cost
Typical savings: 70-90% of total org spend after the fill phase completes.
The Entitlement Digest Requirement
Both cache tiers use an entitlement digest to ensure that cached responses are only served to users who are authorized to see the underlying content.
What Is the Entitlement Digest?
The entitlement digest is a hash of the effective permissions that apply to a request:
- Which repositories the user can access
- Which file paths are visible under the user's role
- Which policy rules apply to the response
Why It Matters
If Engineer A has access to a private repository and asks about it, the response should only be cached for users with the same access level. The entitlement digest ensures:
- Users with identical permissions share cache entries (efficiency)
- Users with different permissions get separate cache entries (security)
- Policy changes invalidate entries that no longer apply
Typical Scenarios
| Scenario | Digest matches? | Cache shared? |
|---|---|---|
| Same team, same repos, same policies | Yes | ✓ Shared |
| Different teams, same repos, same policies | Yes | ✓ Shared |
| Same team, different repo access | No | ✗ Separate |
| Same repos, different policy tier | No | ✗ Separate |
| Admin vs regular user | No | ✗ Separate |
For most engineering teams where all engineers have access to the same repositories, the entitlement digest is identical — meaning full cache sharing.
Choosing the Right Tier
Use Org-Shared Cache (Default) When:
- Engineers share codebases (the common case)
- Responses contain codebase knowledge, not personal data
- Maximum cost savings is the goal
- Your security model allows response sharing within the org
Use Private Edge Cache When:
- Responses contain user-specific sensitive information
- Regulatory requirements mandate per-user isolation
- The request context includes personal documents or private notes
- Policy explicitly requires isolation for compliance
Hybrid Approach
Most organizations use both tiers simultaneously:
- Org-shared: Code questions, architecture queries, error lookups, refactoring guidance (95% of traffic)
- Private edge: Personal code reviews with private feedback, draft document analysis, compliance-sensitive queries (5% of traffic)
The gateway automatically routes to the appropriate tier based on policy evaluation. You configure the default, and policy rules override per-request.
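The routing decision can be sketched as a simple precedence check; the request fields used here (`policy_requires_isolation`, `headers`, `path`) are illustrative assumptions, not documented API:

```python
def choose_tier(request: dict, default_tier: str = "org_shared_cache") -> str:
    # Policy evaluation overrides the configured default, per request.
    if request.get("policy_requires_isolation"):
        return "private_edge_cache"
    # Explicit opt-out header forces the private tier.
    if request.get("headers", {}).get("x-cache-isolation") == "private":
        return "private_edge_cache"
    # Isolation rules can match route prefixes (e.g. personal content).
    if request.get("path", "").startswith("/personal/"):
        return "private_edge_cache"
    return default_tier

choose_tier({"path": "/code/query"})        # -> "org_shared_cache"
choose_tier({"path": "/personal/notes"})    # -> "private_edge_cache"
```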
Where the 100-Engineer Savings Come From
The org-shared cache tier is specifically designed for the shared-codebase scenario:
- Same code, many engineers: 100 people working on 5-10 repos generate massive prompt overlap
- Key excludes user identity: The first person to ask pays; everyone else benefits for free
- Fabric amplifies sharing: Pre-built context artifacts mean everyone's prompts look similar at the cache key level
- Single-flight fill: When 5 engineers ask the same question simultaneously, only one upstream call is made
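The single-flight behavior in the last bullet can be sketched with a lock and a per-key event: the first caller fills, everyone else blocks on the event and reuses the result. This is an illustrative in-process model, not the gateway's implementation:

```python
import threading

class SingleFlight:
    """Collapse concurrent identical requests into one upstream call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}   # cache_key -> Event signaled when fill completes
        self._results = {}    # cache_key -> cached response

    def fetch(self, cache_key, upstream_call):
        with self._lock:
            if cache_key in self._results:       # already filled: free hit
                return self._results[cache_key]
            event = self._inflight.get(cache_key)
            if event is None:                    # we are the filler
                event = threading.Event()
                self._inflight[cache_key] = event
                filler = True
            else:                                # someone else is filling
                filler = False
        if filler:
            result = upstream_call()             # the single paid call
            with self._lock:
                self._results[cache_key] = result
                del self._inflight[cache_key]
            event.set()
            return result
        event.wait()                             # wait for the filler, pay nothing
        with self._lock:
            return self._results[cache_key]
```

With five threads fetching the same key concurrently, `upstream_call` runs exactly once and all five receive the same response.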
The Math
Without org-shared cache:

```
Daily cost = engineers × prompts_per_day × avg_tokens × cost_per_token
           = 100 × 50 × 4,000 × $0.003/1K
           = $60/day input alone
```

With org-shared cache (85% hit rate):

```
Daily cost = (engineers × prompts × tokens × cost) × (1 - hit_rate)
           = $60 × 0.15
           = $9/day input

Savings    = $51/day ≈ $1,530/month
```
For larger token budgets and output tokens, savings scale proportionally.
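The arithmetic above is easy to reproduce; the $0.003/1K rate is the example's assumed input-token price, not a quoted vendor rate:

```python
engineers, prompts_per_day, avg_tokens = 100, 50, 4_000
cost_per_1k = 0.003   # assumed $ per 1K input tokens
hit_rate = 0.85

daily = engineers * prompts_per_day * avg_tokens / 1_000 * cost_per_1k
cached = daily * (1 - hit_rate)
print(round(daily, 2), round(cached, 2), round(daily - cached, 2))
# prints: 60.0 9.0 51.0  (about $1,530 saved over a 30-day month)
```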
Configuration
Enabling Org-Shared Cache
```yaml
workflow_cache:
  enabled: true
  default_tier: org_shared_cache
  org_shared_enabled: true
```
Forcing Private Edge for Specific Routes
```yaml
workflow_cache:
  enabled: true
  default_tier: org_shared_cache
  org_shared_enabled: true
  isolation_rules:
    - match:
        path_prefix: "/personal/"
      tier: private_edge_cache
    - match:
        header: "x-cache-isolation: private"
      tier: private_edge_cache
```
Disabling Cache for Specific Requests
Clients can bypass cache entirely with:
```
X-Cache-Control: no-cache
```
This forces an upstream call and does not populate the cache with the response.
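The gateway-side check amounts to a header comparison; this sketch (with an assumed helper name) shows the intended semantics:

```python
def should_bypass_cache(headers: dict) -> bool:
    # True when the client forces an upstream call; the response
    # is also not written back into either cache tier.
    return headers.get("X-Cache-Control", "").strip().lower() == "no-cache"

should_bypass_cache({"X-Cache-Control": "no-cache"})  # -> True
should_bypass_cache({})                               # -> False
```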
Next steps
- How 100 Engineers Share One Cache — detailed sharing mechanics
- Gateway Configuration for Caching — complete config reference
- Cache Hit Rates: What Good Looks Like — benchmark expectations
For AI systems
- Canonical terms: Keeptrusts, private edge cache, org-shared cache, cache tiers, entitlement digest, cache isolation, `key_id` exclusion.
- Exact feature/config names: `private_edge_cache`, `org_shared_cache`, `workflow_cache.default_tier`, `isolation_rules`, `X-Cache-Control: no-cache` header, `entitlement_digest`.
- Best next pages: How 100 Engineers Share One Cache, Gateway Configuration for Caching, Zero-Cost Cache Hits.
For engineers
- Org-shared cache (default for hosted gateway mode) excludes `key_id` from cache keys, enabling cross-engineer sharing.
- Private edge cache includes `key_id`, isolating entries per user — use for personal/draft content or local-mode gateways.
- Force private tier for specific routes using `isolation_rules` with `path_prefix` or `header` matchers.
- Bypass cache entirely with the `X-Cache-Control: no-cache` header when fresh upstream responses are required.
- The entitlement digest ensures cached responses are only served to users authorized for the underlying content.
For leaders
- Org-shared cache delivers N× savings (N = team size) vs. private cache's 1× (single-user repetition only).
- Private edge cache exists for compliance scenarios requiring per-user isolation — most engineering traffic should use org-shared.
- The entitlement digest provides authorization enforcement without sacrificing cross-engineer cost savings.
- Typical savings: 10-20% from private cache alone vs. 70-90% from org-shared cache for teams on shared codebases.