
Cache Tiers: Private vs Org-Shared

Keeptrusts provides two distinct cache tiers, each designed for different use cases. Understanding when each tier applies helps you maximize savings while maintaining appropriate isolation.

Use this page when

  • You need to understand the difference between private edge cache and org-shared cache.
  • You are deciding which cache tier to use for a specific workload or policy requirement.
  • You want to configure isolation rules that force specific requests to private cache.

Primary audience

  • Primary: Technical Engineers
  • Secondary: AI Agents, Technical Leaders

Overview

| Attribute | Private Edge Cache | Org-Shared Cache |
| --- | --- | --- |
| Scope | Per-user or per-API-key | Entire organization |
| Sharing | No cross-user sharing | All authorized users share entries |
| Key includes key_id | Yes | No |
| Key includes org_id | Yes | Yes |
| Primary benefit | Individual repetition savings | Team-wide deduplication |
| Savings multiplier | 1× (single user) | N× (N engineers) |
| When used | Policy requires isolation | Default for code-aware traffic |

Private Edge Cache (private_edge_cache)

The private edge cache stores responses scoped to a single user or API key. No sharing occurs between users.

When Private Edge Cache Is Used

  • The request's policy chain requires per-user isolation
  • The request contains user-specific context that shouldn't be shared (e.g., personal notes, draft documents)
  • The gateway is running in local mode (not central)
  • The request explicitly opts out of shared caching via headers
  • Entitlement-based access controls prevent shared access to certain content

Cache Key Composition

Private edge cache keys include:

private_edge_key = hash(
    org_id,
    key_id,                      ← ties entry to specific user/key
    config_version,
    normalized_prompt_content
)

Because key_id is part of the key, Engineer A's cached responses are invisible to Engineer B, even for identical prompts.
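
The composition above can be sketched in Python as follows. This is a minimal illustration, not the gateway's implementation: the choice of SHA-256, the JSON encoding of the fields, and the whitespace-collapsing `normalize_prompt` helper are all assumptions, since this page does not specify how prompts are normalized or hashed.

```python
import hashlib
import json


def normalize_prompt(prompt: str) -> str:
    """Hypothetical normalization: collapse runs of whitespace and trim the ends."""
    return " ".join(prompt.split())


def private_edge_key(org_id: str, key_id: str, config_version: str, prompt: str) -> str:
    """Hash the four fields listed above into one cache key (illustrative only)."""
    fields = [org_id, key_id, config_version, normalize_prompt(prompt)]
    # JSON-encode the list so field boundaries stay unambiguous before hashing.
    payload = json.dumps(fields).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Same engineer, same prompt apart from whitespace: same key.
key_a = private_edge_key("org-1", "key-engineer-a", "v7", "Explain  AuthService")
key_a_again = private_edge_key("org-1", "key-engineer-a", "v7", "Explain AuthService ")

# Different engineer, identical prompt: different key, so no sharing.
key_b = private_edge_key("org-1", "key-engineer-b", "v7", "Explain AuthService")
```

In this sketch `key_a` and `key_a_again` are equal, while `key_b` differs only because `key_id` differs.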

Cost Impact

Private edge cache only saves money when the same user repeats the same request. In practice:

  • Same developer re-running a prompt whose normalized content is unchanged: cache hit
  • Same developer asking the same question in a new session: cache hit
  • Different developer asking the same question: cache miss (different key_id)

Typical savings: 10-20% per individual user (personal repetition patterns only).

Org-Shared Cache (org_shared_cache)

The org-shared cache stores responses that all authorized users in the same organization can reuse. This tier is the source of most of the savings for engineering teams.

When Org-Shared Cache Is Used

  • Default for all hosted-gateway code-aware traffic
  • The request's policy chain does not require per-user isolation
  • The entitlement digest matches between requesting user and cached entry
  • The config version matches (no policy changes since entry was cached)

Cache Key Composition

Org-shared cache keys deliberately exclude user identity:

org_shared_key = hash(
    org_id,
    entitlement_digest,          ← ensures authorization match
    config_version,              ← invalidates on policy change
    normalized_prompt_content
)

Critically, key_id is not part of this key. This is the mechanism that enables cross-engineer sharing. When Engineer A populates a cache entry, Engineers B through Z can hit it because the key doesn't distinguish between users.
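
A small Python sketch can make the sharing mechanism concrete. The hashing scheme (SHA-256 over a JSON-encoded list), the whitespace normalization, the in-memory `cache` dict, and the `lookup` helper are assumptions made for illustration; only the four key fields come from this page.

```python
import hashlib
import json


def org_shared_key(org_id: str, entitlement_digest: str, config_version: str, prompt: str) -> str:
    """Illustrative key: note that there is no key_id parameter at all."""
    normalized = " ".join(prompt.split())  # hypothetical normalization
    payload = json.dumps([org_id, entitlement_digest, config_version, normalized])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


cache = {}  # stand-in for the shared cache store


def lookup(key_id: str, key: str) -> str:
    """Return 'hit' or 'miss'; on a miss, fill the entry (upstream call elided)."""
    # key_id is accepted only to show that it plays no part in the lookup.
    if key in cache:
        return "hit"
    cache[key] = f"response filled by {key_id}"
    return "miss"


key = org_shared_key("org-1", "digest-abc", "v7", "Explain AuthService")
results = [lookup(f"engineer-{n}", key) for n in range(1, 101)]

# After a policy change the config_version differs, so the old entry is no longer addressed.
key_after_policy_change = org_shared_key("org-1", "digest-abc", "v8", "Explain AuthService")
```

Here the first engineer's lookup is the only miss; the other 99 lookups resolve to the same key and hit the entry the first one filled.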

Cost Impact

Org-shared cache saves money every time any user in the org with a matching entitlement digest sends a request that normalizes to the same prompt content. For a 100-engineer team:

  • Engineer 1 asks about AuthService → cache miss → paid fill
  • Engineers 2-100 ask about AuthService → 99 cache hits → zero cost

Typical savings: 70-90% of total org spend after the fill phase completes.

The Entitlement Digest Requirement

Both cache tiers use an entitlement digest to ensure that cached responses are only served to users who are authorized to see the underlying content.

What Is the Entitlement Digest?

The entitlement digest is a hash of the effective permissions that apply to a request:

  • Which repositories the user can access
  • Which file paths are visible under the user's role
  • Which policy rules apply to the response
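
One way to picture the digest is as a hash over a canonical encoding of those three inputs. The sketch below is an assumption-laden illustration: the function name, the use of SHA-256, and the sorted-JSON canonical form are not taken from this page, which does not describe how the digest is actually computed.

```python
import hashlib
import json
from typing import Iterable


def entitlement_digest(
    repos: Iterable[str],
    visible_paths: Iterable[str],
    policy_rules: Iterable[str],
) -> str:
    """Hypothetical digest: hash the three permission sets in a canonical order."""
    canonical = {
        "repos": sorted(set(repos)),
        "visible_paths": sorted(set(visible_paths)),
        "policy_rules": sorted(set(policy_rules)),
    }
    # sort_keys plus sorted values make the digest independent of input order.
    payload = json.dumps(canonical, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


alice = entitlement_digest(["api", "web"], ["src/**"], ["redact-secrets"])
bob = entitlement_digest(["web", "api"], ["src/**"], ["redact-secrets"])  # same access, different order
carol = entitlement_digest(["api"], ["src/**"], ["redact-secrets"])  # narrower repo access
```

Under these assumptions `alice` and `bob` produce the same digest and could share entries, while `carol` gets a different digest and therefore separate entries.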

Why It Matters

If Engineer A has access to a private repository and asks about it, the response should only be cached for users with the same access level. The entitlement digest ensures:

  • Users with identical permissions share cache entries (efficiency)
  • Users with different permissions get separate cache entries (security)
  • Policy changes invalidate entries that no longer apply

Typical Scenarios

| Scenario | Digest matches? | Cache shared? |
| --- | --- | --- |
| Same team, same repos, same policies | Yes | ✓ Shared |
| Different teams, same repos, same policies | Yes | ✓ Shared |
| Same team, different repo access | No | ✗ Separate |
| Same repos, different policy tier | No | ✗ Separate |
| Admin vs regular user | No | ✗ Separate |

For most engineering teams where all engineers have access to the same repositories, the entitlement digest is identical — meaning full cache sharing.

Choosing the Right Tier

Use Org-Shared Cache (Default) When:

  • Engineers share codebases (the common case)
  • Responses contain codebase knowledge, not personal data
  • Maximum cost savings is the goal
  • Your security model allows response sharing within the org

Use Private Edge Cache When:

  • Responses contain user-specific sensitive information
  • Regulatory requirements mandate per-user isolation
  • The request context includes personal documents or private notes
  • Policy explicitly requires isolation for compliance

Hybrid Approach

Most organizations use both tiers simultaneously:

  • Org-shared: Code questions, architecture queries, error lookups, refactoring guidance (95% of traffic)
  • Private edge: Personal code reviews with private feedback, draft document analysis, compliance-sensitive queries (5% of traffic)

The gateway automatically routes to the appropriate tier based on policy evaluation. You configure the default, and policy rules override per-request.

Where the 100-Engineer Savings Come From

The org-shared cache tier is specifically designed for the shared-codebase scenario:

  1. Same code, many engineers: 100 people working on 5-10 repos generate massive prompt overlap
  2. Key excludes user identity: The first person to ask pays; everyone else benefits for free
  3. Fabric amplifies sharing: Pre-built context artifacts mean everyone's prompts look similar at the cache key level
  4. Single-flight fill: When 5 engineers ask the same question simultaneously, only one upstream call is made
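
The single-flight behavior in item 4 can be sketched with standard-library threading. This shows the general technique only; the `SingleFlight` class, its `followers` counter, and the demo key are hypothetical names, and this page does not describe how the gateway implements it.

```python
import threading
import time


class _Call:
    """Holds the shared outcome of one in-flight upstream call."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None


class SingleFlight:
    """Coalesce concurrent calls for the same key into one upstream call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}
        self.followers = 0  # callers that reused another caller's fetch

    def do(self, key, fetch):
        with self._lock:
            call = self._in_flight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._in_flight[key] = call
            else:
                self.followers += 1
        if leader:
            try:
                call.result = fetch()
            except Exception as exc:  # propagate the failure to every waiter
                call.error = exc
            finally:
                with self._lock:
                    del self._in_flight[key]
                call.done.set()
        else:
            call.done.wait()
        if call.error is not None:
            raise call.error
        return call.result


flight = SingleFlight()
upstream_calls = []
answers = []


def fetch_from_upstream():
    upstream_calls.append(1)
    # Demo only: keep the call open until the other four engineers have joined.
    deadline = time.monotonic() + 5
    while flight.followers < 4 and time.monotonic() < deadline:
        time.sleep(0.005)
    return "cached answer about AuthService"


def engineer():
    answers.append(flight.do("key-authservice", fetch_from_upstream))


threads = [threading.Thread(target=engineer) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Five engineers call `do` with the same key, the first becomes the leader and makes the only upstream call, and the other four wait on its result.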

The Math

Without org-shared cache:

Daily cost = engineers × prompts_per_day × avg_tokens × cost_per_token
           = 100 × 50 × 4,000 × $0.003/1K
           = $60/day (input tokens only)

With org-shared cache (85% hit rate):

Daily cost = (engineers × prompts × tokens × cost) × (1 - hit_rate)
           = $60 × 0.15
           = $9/day (input tokens only)
Savings    = $51/day = $1,530/month (30-day month)

For larger token budgets and output tokens, savings scale proportionally.
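
The arithmetic above can be reproduced with a short Python helper, which also makes it easy to plug in your own team size, token counts, and hit rate. The function name and parameters are illustrative; the figures are the ones used on this page, and the 30-day month is an assumption.

```python
def daily_input_cost(engineers, prompts_per_day, avg_tokens, cost_per_1k_tokens, hit_rate=0.0):
    """Daily input-token cost in dollars; cache hits are treated as free."""
    total_tokens = engineers * prompts_per_day * avg_tokens
    uncached_cost = total_tokens / 1000 * cost_per_1k_tokens
    return uncached_cost * (1 - hit_rate)


baseline = daily_input_cost(100, 50, 4_000, 0.003)                      # about $60/day
with_cache = daily_input_cost(100, 50, 4_000, 0.003, hit_rate=0.85)     # about $9/day
daily_savings = baseline - with_cache                                   # about $51/day
monthly_savings = daily_savings * 30                                    # about $1,530 (30-day month)

print(f"baseline=${baseline:.2f} cached=${with_cache:.2f} "
      f"saved/day=${daily_savings:.2f} saved/month=${monthly_savings:.2f}")
```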

Configuration

Enabling Org-Shared Cache

workflow_cache:
  enabled: true
  default_tier: org_shared_cache
  org_shared_enabled: true

Forcing Private Edge for Specific Routes

workflow_cache:
  enabled: true
  default_tier: org_shared_cache
  org_shared_enabled: true
  isolation_rules:
    - match:
        path_prefix: "/personal/"
      tier: private_edge_cache
    - match:
        header: "x-cache-isolation: private"
      tier: private_edge_cache
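
To reason about how rules like these might resolve for a given request, the following Python sketch models the evaluation. It is a guess at the semantics, not the gateway's code: first-match-wins ordering, case-insensitive header names, and the `"name: value"` parsing of the `header` matcher are all assumptions, and `IsolationRule` and `select_tier` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IsolationRule:
    tier: str
    path_prefix: Optional[str] = None
    header: Optional[str] = None  # "name: value", as written in the config above


def select_tier(path, headers, rules, default_tier="org_shared_cache"):
    """Return the tier of the first matching rule, else the default tier."""
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    for rule in rules:
        if rule.path_prefix is not None and path.startswith(rule.path_prefix):
            return rule.tier
        if rule.header is not None:
            name, _, value = rule.header.partition(":")
            if lowered.get(name.strip().lower()) == value.strip():
                return rule.tier
    return default_tier


rules = [
    IsolationRule(tier="private_edge_cache", path_prefix="/personal/"),
    IsolationRule(tier="private_edge_cache", header="x-cache-isolation: private"),
]

by_path = select_tier("/personal/notes", {}, rules)
by_header = select_tier("/v1/chat", {"X-Cache-Isolation": "private"}, rules)
fallback = select_tier("/v1/chat", {}, rules)
```

With these rules, `by_path` and `by_header` both resolve to `private_edge_cache`, and `fallback` resolves to the configured default, `org_shared_cache`.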

Disabling Cache for Specific Requests

Clients can bypass cache entirely with:

X-Cache-Control: no-cache

This forces an upstream call and does not populate the cache with the response.
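
The bypass behavior described above (call upstream, skip the cache write) can be modeled in a few lines of Python. The `handle_request` function, the dict-based cache, and the case-insensitive header comparison are assumptions for illustration rather than the gateway's actual handler.

```python
def handle_request(key, headers, cache, call_upstream):
    """Serve from cache unless the client sent X-Cache-Control: no-cache."""
    lowered = {name.lower(): value.strip().lower() for name, value in headers.items()}
    if lowered.get("x-cache-control") == "no-cache":
        # Forced upstream call; the response is deliberately not written to the cache.
        return call_upstream()
    if key not in cache:
        cache[key] = call_upstream()  # normal miss: fill the cache
    return cache[key]


cache = {}
calls = []


def upstream():
    calls.append(1)
    return f"response #{len(calls)}"


first = handle_request("k", {"X-Cache-Control": "no-cache"}, cache, upstream)
cache_after_bypass = dict(cache)  # still empty: the bypass did not populate it
second = handle_request("k", {}, cache, upstream)  # miss, fills the cache
third = handle_request("k", {}, cache, upstream)   # hit, no upstream call
```

In this model the bypassed request leaves the cache empty, so the next ordinary request is still a miss and pays for its own fill.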

For engineers

  • Org-shared cache (default for hosted gateway mode) excludes key_id from cache keys, enabling cross-engineer sharing.
  • Private edge cache includes key_id, isolating entries per user — use for personal/draft content or local-mode gateways.
  • Force private tier for specific routes using isolation_rules with path_prefix or header matchers.
  • Bypass cache entirely with X-Cache-Control: no-cache header when fresh upstream responses are required.
  • The entitlement digest ensures cached responses are only served to users authorized for the underlying content.

For leaders

  • Org-shared cache delivers N× savings (N = team size) vs. private cache's 1× (single-user repetition only).
  • Private edge cache exists for compliance scenarios requiring per-user isolation — most engineering traffic should use org-shared.
  • The entitlement digest provides authorization enforcement without sacrificing cross-engineer cost savings.
  • Typical savings: 10-20% from private cache alone vs. 70-90% from org-shared cache for teams on shared codebases.