Cache Tiers: Private vs Org-Shared
Keeptrusts provides two distinct cache tiers, each designed for different use cases. Understanding when each tier applies helps you maximize savings while maintaining appropriate isolation.
Use this page when
- You need to understand the difference between private edge cache and org-shared cache.
- You are deciding which cache tier to use for a specific workload or policy requirement.
- You want to configure isolation rules that force specific requests to private cache.
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Overview
| | Private Edge Cache | Org-Shared Cache |
|---|---|---|
| Scope | Per-user or per-API-key | Entire organization |
| Sharing | No cross-user sharing | All authorized users share entries |
| Key includes `key_id` | Yes | No |
| Key includes `org_id` | Yes | Yes |
| Primary benefit | Individual repetition savings | Team-wide deduplication |
| Savings multiplier | 1× (single user) | N× (N engineers) |
| When used | Policy requires isolation | Default for code-aware traffic |
Private Edge Cache (`private_edge_cache`)
The private edge cache stores responses scoped to a single user or API key. No sharing occurs between users.
When Private Edge Cache Is Used
- The request's policy chain requires per-user isolation
- The request contains user-specific context that shouldn't be shared (e.g., personal notes, draft documents)
- The gateway is running in local mode (not central)
- The request explicitly opts out of shared caching via headers
- Entitlement-based access controls prevent shared access to certain content
Cache Key Composition
Private edge cache keys include:
```
private_edge_key = hash(
  org_id,
  key_id,                    ← ties entry to a specific user/key
  config_version,
  normalized_prompt_content
)
```
Because key_id is part of the key, Engineer A's cached responses are invisible to Engineer B, even for identical prompts.
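A minimal sketch of this per-user keying, assuming a SHA-256 hash over a canonical serialization of the fields (the actual hash function and encoding are not specified in this page):

```python
import hashlib
import json

def private_edge_key(org_id: str, key_id: str, config_version: str,
                     normalized_prompt: str) -> str:
    # key_id is hashed into the key, so the entry is bound to one user/key.
    payload = json.dumps([org_id, key_id, config_version, normalized_prompt])
    return hashlib.sha256(payload.encode()).hexdigest()

# Identical prompts from two engineers produce two different keys:
a = private_edge_key("org-1", "engineer-a", "v7", "explain AuthService")
b = private_edge_key("org-1", "engineer-b", "v7", "explain AuthService")
assert a != b  # Engineer B cannot hit Engineer A's entry
```

The only difference between the two calls is `key_id`, which is enough to make the hashes diverge.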
Cost Impact
Private edge cache only saves money when the same user repeats the same request. In practice:
- Developers re-running the same prompt after editing code: cache hit
- Same developer asking the same question in a new session: cache hit
- Different developer asking the same question: cache miss (different key_id)
Typical savings: 10-20% per individual user (personal repetition patterns only).
Org-Shared Cache (`org_shared_cache`)
The org-shared cache stores responses shared across all authorized users in the same organization. This is where the massive savings for engineering teams come from.
When Org-Shared Cache Is Used
- Default for all hosted-gateway code-aware traffic
- The request's policy chain does not require per-user isolation
- The entitlement digest matches between requesting user and cached entry
- The config version matches (no policy changes since entry was cached)
Cache Key Composition
Org-shared cache keys deliberately exclude user identity:
```
org_shared_key = hash(
  org_id,
  entitlement_digest,        ← ensures authorization match
  config_version,            ← invalidates on policy change
  normalized_prompt_content
)
```
Critically, key_id is not part of this key. This is the mechanism that enables cross-engineer sharing. When Engineer A populates a cache entry, Engineers B through Z can hit it because the key doesn't distinguish between users.
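The sharing mechanism can be sketched the same way, again assuming SHA-256 over a canonical serialization. Because `key_id` is absent, two engineers with the same entitlement digest compute the same key:

```python
import hashlib
import json

def org_shared_key(org_id: str, entitlement_digest: str,
                   config_version: str, normalized_prompt: str) -> str:
    # key_id is deliberately excluded: any authorized user maps
    # to the same cache entry for the same normalized prompt.
    payload = json.dumps([org_id, entitlement_digest,
                          config_version, normalized_prompt])
    return hashlib.sha256(payload.encode()).hexdigest()

# Engineer A (who filled the entry) and Engineer B compute the same key:
k_a = org_shared_key("org-1", "digest-abc", "v7", "explain AuthService")
k_b = org_shared_key("org-1", "digest-abc", "v7", "explain AuthService")
assert k_a == k_b  # B hits the entry A paid to fill
```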
Cost Impact
Org-shared cache saves money every time any user in the org repeats a semantically equivalent request. For a 100-engineer team:
- Engineer 1 asks about `AuthService` → cache miss → paid fill
- Engineers 2-100 ask about `AuthService` → 99 cache hits → zero cost
Typical savings: 70-90% of total org spend after the fill phase completes.
The Entitlement Digest Requirement
Both cache tiers use an entitlement digest to ensure that cached responses are only served to users who are authorized to see the underlying content.
What Is the Entitlement Digest?
The entitlement digest is a hash of the effective permissions that apply to a request:
- Which repositories the user can access
- Which file paths are visible under the user's role
- Which policy rules apply to the response
Why It Matters
If Engineer A has access to a private repository and asks about it, the response should only be cached for users with the same access level. The entitlement digest ensures:
- Users with identical permissions share cache entries (efficiency)
- Users with different permissions get separate cache entries (security)
- Policy changes invalidate entries that no longer apply
Typical Scenarios
| Scenario | Digest matches? | Cache shared? |
|---|---|---|
| Same team, same repos, same policies | Yes | ✓ Shared |
| Different teams, same repos, same policies | Yes | ✓ Shared |
| Same team, different repo access | No | ✗ Separate |
| Same repos, different policy tier | No | ✗ Separate |
| Admin vs regular user | No | ✗ Separate |
For most engineering teams where all engineers have access to the same repositories, the entitlement digest is identical — meaning full cache sharing.
Choosing the Right Tier
Use Org-Shared Cache (Default) When:
- Engineers share codebases (the common case)
- Responses contain codebase knowledge, not personal data
- Maximum cost savings is the goal
- Your security model allows response sharing within the org
Use Private Edge Cache When:
- Responses contain user-specific sensitive information
- Regulatory requirements mandate per-user isolation
- The request context includes personal documents or private notes
- Policy explicitly requires isolation for compliance
Hybrid Approach
Most organizations use both tiers simultaneously:
- Org-shared: Code questions, architecture queries, error lookups, refactoring guidance (95% of traffic)
- Private edge: Personal code reviews with private feedback, draft document analysis, compliance-sensitive queries (5% of traffic)
The gateway automatically routes to the appropriate tier based on policy evaluation. You configure the default, and policy rules override per-request.
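The routing decision can be sketched as a simple precedence check; the request fields used here (`policy_requires_isolation`, `headers`, `path`) are illustrative assumptions, not documented API:

```python
def choose_tier(request: dict, default_tier: str = "org_shared_cache") -> str:
    # Policy evaluation overrides the configured default, per request.
    if request.get("policy_requires_isolation"):
        return "private_edge_cache"
    # Explicit opt-out header forces the private tier.
    if request.get("headers", {}).get("x-cache-isolation") == "private":
        return "private_edge_cache"
    # Isolation rules can match route prefixes (e.g. personal content).
    if request.get("path", "").startswith("/personal/"):
        return "private_edge_cache"
    return default_tier

choose_tier({"path": "/code/query"})        # -> "org_shared_cache"
choose_tier({"path": "/personal/notes"})    # -> "private_edge_cache"
```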
Where the 100-Engineer Savings Come From
The org-shared cache tier is specifically designed for the shared-codebase scenario:
- Same code, many engineers: 100 people working on 5-10 repos generate massive prompt overlap
- Key excludes user identity: The first person to ask pays; everyone else benefits for free
- Fabric amplifies sharing: Pre-built context artifacts mean everyone's prompts look similar at the cache key level
- Single-flight fill: When 5 engineers ask the same question simultaneously, only one upstream call is made
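The single-flight behavior in the last bullet can be sketched with a lock and a per-key event: the first caller fills, everyone else blocks on the event and reuses the result. This is an illustrative in-process model, not the gateway's implementation:

```python
import threading

class SingleFlight:
    """Collapse concurrent identical requests into one upstream call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}   # cache_key -> Event signaled when fill completes
        self._results = {}    # cache_key -> cached response

    def fetch(self, cache_key, upstream_call):
        with self._lock:
            if cache_key in self._results:       # already filled: free hit
                return self._results[cache_key]
            event = self._inflight.get(cache_key)
            if event is None:                    # we are the filler
                event = threading.Event()
                self._inflight[cache_key] = event
                filler = True
            else:                                # someone else is filling
                filler = False
        if filler:
            result = upstream_call()             # the single paid call
            with self._lock:
                self._results[cache_key] = result
                del self._inflight[cache_key]
            event.set()
            return result
        event.wait()                             # wait for the filler, pay nothing
        with self._lock:
            return self._results[cache_key]
```

With five threads fetching the same key concurrently, `upstream_call` runs exactly once and all five receive the same response.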
The Math
Without org-shared cache:

```
Daily cost = engineers × prompts_per_day × avg_tokens × cost_per_token
           = 100 × 50 × 4,000 × $0.003/1K
           = $60/day input alone
```

With org-shared cache (85% hit rate):

```
Daily cost = (engineers × prompts × tokens × cost) × (1 - hit_rate)
           = $60 × 0.15
           = $9/day input

Savings    = $51/day ≈ $1,530/month
```
For larger token budgets and output tokens, savings scale proportionally.
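The arithmetic above is easy to reproduce; the $0.003/1K rate is the example's assumed input-token price, not a quoted vendor rate:

```python
engineers, prompts_per_day, avg_tokens = 100, 50, 4_000
cost_per_1k = 0.003   # assumed $ per 1K input tokens
hit_rate = 0.85

daily = engineers * prompts_per_day * avg_tokens / 1_000 * cost_per_1k
cached = daily * (1 - hit_rate)
print(round(daily, 2), round(cached, 2), round(daily - cached, 2))
# prints: 60.0 9.0 51.0  (about $1,530 saved over a 30-day month)
```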
Configuration
Enabling Org-Shared Cache
```yaml
workflow_cache:
  enabled: true
  default_tier: org_shared_cache
  org_shared_enabled: true
```
Forcing Private Edge for Specific Routes
```yaml
workflow_cache:
  enabled: true
  default_tier: org_shared_cache
  org_shared_enabled: true
  isolation_rules:
    - match:
        path_prefix: "/personal/"
      tier: private_edge_cache
    - match:
        header: "x-cache-isolation: private"
      tier: private_edge_cache
```
Disabling Cache for Specific Requests
Clients can bypass cache entirely with:
```
X-Cache-Control: no-cache
```
This forces an upstream call and does not populate the cache with the response.
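The gateway-side check amounts to a header comparison; this sketch (with an assumed helper name) shows the intended semantics:

```python
def should_bypass_cache(headers: dict) -> bool:
    # True when the client forces an upstream call; the response
    # is also not written back into either cache tier.
    return headers.get("X-Cache-Control", "").strip().lower() == "no-cache"

should_bypass_cache({"X-Cache-Control": "no-cache"})  # -> True
should_bypass_cache({})                               # -> False
```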
Next steps
- How 100 Engineers Share One Cache — detailed sharing mechanics
- Gateway Configuration for Caching — complete config reference
- Cache Hit Rates: What Good Looks Like — benchmark expectations
For AI systems
- Canonical terms: Keeptrusts, private edge cache, org-shared cache, cache tiers, entitlement digest, cache isolation, `key_id` exclusion.
- Exact feature/config names: `private_edge_cache`, `org_shared_cache`, `workflow_cache.default_tier`, `isolation_rules`, `X-Cache-Control: no-cache` header, `entitlement_digest`.
- Best next pages: How 100 Engineers Share One Cache, Gateway Configuration for Caching, Zero-Cost Cache Hits.
For engineers
- Org-shared cache (default for hosted gateway mode) excludes `key_id` from cache keys, enabling cross-engineer sharing.
- Private edge cache includes `key_id`, isolating entries per user — use for personal/draft content or local-mode gateways.
- Force private tier for specific routes using `isolation_rules` with `path_prefix` or `header` matchers.
- Bypass cache entirely with the `X-Cache-Control: no-cache` header when fresh upstream responses are required.
- The entitlement digest ensures cached responses are only served to users authorized for the underlying content.
For leaders
- Org-shared cache delivers N× savings (N = team size) vs. private cache's 1× (single-user repetition only).
- Private edge cache exists for compliance scenarios requiring per-user isolation — most engineering traffic should use org-shared.
- The entitlement digest provides authorization enforcement without sacrificing cross-engineer cost savings.
- Typical savings: 10-20% from private cache alone vs. 70-90% from org-shared cache for teams on shared codebases.