How 100 Engineers Share One Cache

This is the core narrative of Keeptrusts' org-shared caching: when 100 engineers work on the same codebases, the first request about any piece of code pays full price. Every subsequent request from any engineer about the same code costs nothing. The result is 85-95% savings on AI spend.

Use this page when

  • You want to understand the cache key composition that enables cross-engineer sharing.
  • You need to explain the "one pays, all benefit" economic model to your team.
  • You are evaluating expected savings at different team sizes (100, 500, 1000+ engineers).

Primary audience

  • Primary: Technical Engineers
  • Secondary: AI Agents, Technical Leaders

The Economic Model

One Pays, All Benefit

The fundamental principle:

  1. Engineer 1 asks "How does PaymentService.processRefund() work?"

    • Cache: miss
    • Action: Request goes upstream to LLM provider
    • Cost: Full price (input + output tokens)
    • After: Response cached in org-shared tier
  2. Engineers 2-100 ask about the same function (different wording, same intent)

    • Cache: hit
    • Action: Response served from cache
    • Cost: $0 (no provider call, no wallet transaction, no platform fee)
  3. Net result: The org pays for 1 upstream call and serves 100 engineers
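The one-pays-all-benefit accounting can be sketched in a few lines of Python. This is an illustration, not a Keeptrusts API: org_cost, the in-memory set, and the per-call price are all hypothetical.

```python
# Sketch of "one pays, all benefit": the first request for a given
# question pays the upstream price; every later request is free.
# org_cost and the price figure are illustrative, not Keeptrusts billing.

def org_cost(num_engineers: int, price_per_call: float) -> float:
    cache = set()       # stand-in for the org-shared cache tier
    total = 0.0
    for _ in range(num_engineers):
        question = "How does PaymentService.processRefund() work?"  # same intent
        if question not in cache:    # cache miss: the first engineer
            total += price_per_call  # one full-price upstream call
            cache.add(question)      # response lands in the org-shared tier
        # cache hit: engineers 2..N pay nothing
    return total

# 100 engineers; $0.027 per call = 4k input tokens at $3/1M + 1k output at $15/1M
print(org_cost(100, 0.027))  # 0.027 — the org pays for exactly one call
```

Note that the total is independent of team size: `org_cost(1, p)` and `org_cost(100, p)` are identical, which is the whole economic model in one line.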

Why Different Wording Still Hits Cache

Engineers don't ask identical questions. They phrase things differently:

  • "What does processRefund do?"
  • "Explain the refund processing logic"
  • "How are refunds handled in PaymentService?"
  • "Walk me through the refund flow"

The cache handles this through multiple matching strategies:

  • Exact match: Identical normalized prompts (rare but fast)
  • Semantic match: Embedding-based similarity above threshold (common)
  • Fabric-mediated match: When fabric context makes prompts converge (very common)

Fabric-mediated matching is particularly powerful: because all engineers receive the same pre-built file summaries and repo maps as context, their actual prompts look more similar at the cache key level than their raw questions suggest.
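To illustrate threshold-based semantic matching, here is a toy bag-of-words cosine standing in for the real embedding model. Everything here is hypothetical: vectorize, cosine, and the threshold value are illustrative, not Keeptrusts internals.

```python
import math
from collections import Counter

# Toy stand-in for the semantic matcher: the real system compares embedding
# vectors, but a bag-of-words cosine shows the similarity-threshold idea.

def vectorize(text: str) -> Counter:
    return Counter(text.lower().replace("?", "").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

THRESHOLD = 0.5  # illustrative; a production threshold would be tuned/configurable

cached_prompt = "what does processRefund do"
candidates = [
    "what does the refund code do",          # similar intent -> hit
    "walk me through deployment pipelines",  # unrelated -> miss
]
for c in candidates:
    hit = cosine(vectorize(cached_prompt), vectorize(c)) >= THRESHOLD
    print(c, "->", "hit" if hit else "miss")
```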

Cache Key Composition

The org-shared cache key determines what counts as "the same request":

org_shared_key = hash(
    org_id,              ← your organization
    entitlement_digest,  ← authorization level
    config_version,      ← current policy version
    normalized_content   ← the actual prompt/context
)
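A minimal sketch of this key derivation, assuming SHA-256 and a simplified lowercase/whitespace normalization (the actual hash function and normalization rules are richer and not specified here):

```python
import hashlib

# Sketch of the org-shared key from the formula above. Field names follow
# the doc; the hash choice and normalization are illustrative assumptions.

def normalize(prompt: str) -> str:
    # Simplified normalization; the real pipeline does far more than this
    return " ".join(prompt.lower().split())

def org_shared_key(org_id: str, entitlement_digest: str,
                   config_version: str, prompt: str) -> str:
    material = "|".join([org_id, entitlement_digest, config_version, normalize(prompt)])
    return hashlib.sha256(material.encode()).hexdigest()

# Two engineers, same org, same entitlements, same normalized prompt:
k1 = org_shared_key("org-42", "ent-abc", "cfg-7", "How does  AuthService work?")
k2 = org_shared_key("org-42", "ent-abc", "cfg-7", "how does authservice work?")
print(k1 == k2)  # True — no user identity in the key, so the entry is shared
```

Because no `user_id` or `session_id` enters the hash, any engineer in the org with matching entitlements derives the same key and hits the same entry.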

What's Included (and Why)

| Component | Purpose |
| --- | --- |
| org_id | Prevents cross-org cache pollution |
| entitlement_digest | Ensures only authorized users hit entries |
| config_version | Invalidates when policies change |
| normalized_content | The semantic content of the request |

What's Excluded (and Why)

| Excluded | Reason |
| --- | --- |
| key_id | Would prevent cross-engineer sharing |
| user_id | Would prevent cross-engineer sharing |
| team_id | Would limit sharing to single teams |
| timestamp | Would make every request unique |
| session_id | Would prevent cross-session reuse |

The deliberate exclusion of user identity from the cache key is the architectural decision that enables 100-engineer sharing.

Single-Flight Fill Coordination

When multiple engineers ask the same question simultaneously (before the cache is populated), Keeptrusts uses single-flight fill to avoid duplicate upstream calls:

Without Single-Flight Fill

09:00:01 - Engineer A asks about AuthService → cache miss → upstream call #1
09:00:02 - Engineer B asks about AuthService → cache miss → upstream call #2
09:00:03 - Engineer C asks about AuthService → cache miss → upstream call #3
09:00:04 - Engineer D asks about AuthService → cache miss → upstream call #4
09:00:05 - Engineer E asks about AuthService → cache miss → upstream call #5

Result: 5 upstream calls, 5× cost, same response 5 times.

With Single-Flight Fill

09:00:01 - Engineer A asks about AuthService → cache miss → upstream call #1 (leader)
09:00:02 - Engineer B asks about AuthService → cache miss → waits on flight #1
09:00:03 - Engineer C asks about AuthService → cache miss → waits on flight #1
09:00:04 - Engineer D asks about AuthService → cache miss → waits on flight #1
09:00:05 - Engineer E asks about AuthService → cache miss → waits on flight #1
09:00:08 - Response arrives → served to A, B, C, D, E → cached

Result: 1 upstream call, 1× cost, same response served to all waiters.
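The coordination above can be sketched with a small single-flight helper, loosely modeled on Go's singleflight package. This is an illustration under stated assumptions, not Keeptrusts internals:

```python
import threading
import time

# Minimal single-flight sketch: concurrent misses on the same key share one
# upstream call. The first caller becomes the leader; the rest wait.

class SingleFlight:
    def __init__(self):
        self._lock = threading.Lock()
        self._flights = {}  # key -> (done_event, result_holder)

    def do(self, key, fn):
        with self._lock:
            flight = self._flights.get(key)
            if flight is None:                    # first caller becomes leader
                flight = (threading.Event(), {})
                self._flights[key] = flight
                leader = True
            else:
                leader = False                    # waiters join the in-flight call
        done, holder = flight
        if leader:
            holder["value"] = fn()                # the single upstream call
            with self._lock:
                del self._flights[key]
            done.set()                            # wake all waiters
        else:
            done.wait()                           # block until the leader finishes
        return holder["value"]

calls = []
def upstream():
    time.sleep(0.05)                              # simulated provider latency
    calls.append(1)                               # count upstream calls
    return "explanation of AuthService"

sf = SingleFlight()
threads = [threading.Thread(target=sf.do, args=("AuthService", upstream)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))  # 1 — five engineers, one upstream call
```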

Single-flight fill is especially valuable during:

  • Morning startup (many engineers begin work simultaneously)
  • Incident response (many engineers investigate the same symptoms)
  • After deployments (many engineers explore new code)

Fabric Context Reuse

Codebase Context Fabric amplifies cache effectiveness by ensuring all engineers receive the same pre-built context:

Without Fabric

Each engineer's IDE sends raw files as context:

  • Engineer A: sends auth.ts (v1, 847 tokens) + question
  • Engineer B: sends auth.ts (v1, 847 tokens) + slightly different question
  • Raw file bytes plus differently worded questions → divergent cache keys → both miss

With Fabric

All engineers receive the same pre-built file summary:

  • Engineer A: receives file_summary(auth.ts) (212 tokens) + question
  • Engineer B: receives file_summary(auth.ts) (212 tokens) + question
  • Same fabric context → same cache key prefix → high hit likelihood

Fabric creates convergence in the context portion of requests, dramatically increasing the cache hit rate.

Worked Example: 100 Engineers, Real Numbers

Assumptions

| Parameter | Value |
| --- | --- |
| Engineers | 100 |
| Prompts per engineer per day | 50 |
| Average input tokens per prompt | 4,000 |
| Average output tokens per prompt | 1,000 |
| Input cost per 1M tokens | $3.00 |
| Output cost per 1M tokens | $15.00 |
| Cache hit rate (steady state) | 85% |

Without Cache (Baseline)

Daily input tokens = 100 × 50 × 4,000 = 20,000,000
Daily output tokens = 100 × 50 × 1,000 = 5,000,000

Daily input cost = 20M × $3.00/1M = $60.00
Daily output cost = 5M × $15.00/1M = $75.00
Daily total cost = $135.00
Monthly total = $4,050
Annual total = $49,275

With Org-Shared Cache (85% Hit Rate)

Cache hits: 85% → zero cost
Cache misses: 15% → full cost

Daily cost = $135.00 × 0.15 = $20.25
Monthly cost = $607.50
Annual cost = $7,391

Annual savings = $49,275 - $7,391 = $41,884 (85% reduction)

With Org-Shared Cache + Fabric Token Reduction (85% hit, 40% token reduction on misses)

Cache hits: 85% → zero cost
Cache misses: 15% → 60% of full token cost (fabric reduces context size)

Daily miss cost = $135.00 × 0.15 × 0.60 = $12.15
Monthly cost = $364.50
Annual cost = $4,434

Annual savings = $49,275 - $4,434 = $44,841 (91% reduction)
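The arithmetic above can be reproduced directly; all rates are the worked example's illustrative figures:

```python
# Reproducing the 100-engineer worked example (illustrative rates from above).
ENGINEERS, PROMPTS_PER_DAY = 100, 50
IN_TOK, OUT_TOK = 4_000, 1_000
IN_PRICE, OUT_PRICE = 3.00 / 1e6, 15.00 / 1e6   # $ per token

# Baseline: every prompt pays full price
daily_baseline = ENGINEERS * PROMPTS_PER_DAY * (IN_TOK * IN_PRICE + OUT_TOK * OUT_PRICE)
print(round(daily_baseline, 2))                  # 135.0

# Org-shared cache: only misses cost anything
HIT_RATE = 0.85
daily_cached = daily_baseline * (1 - HIT_RATE)
print(round(daily_cached, 2))                    # 20.25

# Fabric on top: misses carry 40% fewer tokens, so cost 60% of full price
FABRIC_TOKEN_FACTOR = 0.60
daily_fabric = daily_cached * FABRIC_TOKEN_FACTOR
print(round(daily_fabric, 2))                    # 12.15

print(round(daily_fabric * 365, 2))              # 4434.75 annual, ~$4,434 as above
```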

Sensitivity Analysis

| Hit rate | Monthly cost | Monthly savings | Savings % |
| --- | --- | --- | --- |
| 60% | $1,620 | $2,430 | 60% |
| 70% | $1,215 | $2,835 | 70% |
| 80% | $810 | $3,240 | 80% |
| 85% | $608 | $3,443 | 85% |
| 90% | $405 | $3,645 | 90% |
| 95% | $203 | $3,848 | 95% |

What Drives Hit Rate Higher

The more overlap in your engineering team's work, the higher your hit rate:

Factors That Increase Hit Rate

  • Fewer repositories: 100 engineers on 3 repos → very high overlap
  • Shared services/modules: Core libraries used by everyone
  • Standard architectures: Consistent patterns generate consistent questions
  • Active development on same areas: Sprint-focused teams exploring same code
  • Onboarding: New engineers ask the same questions as previous new hires

Factors That Decrease Hit Rate

  • Many independent repositories: 100 engineers on 100 repos → little overlap
  • Highly personal codebases: Each engineer owns isolated modules
  • Rapid code churn: Frequent changes invalidate cached responses
  • Diverse technology stacks: Different languages/frameworks reduce overlap
  • Short TTL settings: Aggressive expiry forces more re-fills

For Teams at Scale

100-Engineer Team

  • Expected steady-state hit rate: 80-90%
  • Typical payback period: 3-5 days
  • Monthly savings: $3,000-4,000 (depending on provider and model choice)

500-Engineer Organization

  • Expected steady-state hit rate: 88-95%
  • Higher hit rate because more people = more overlap
  • Monthly savings: $15,000-25,000
  • The cache becomes more valuable with scale, not less

1,000+ Engineer Enterprise

  • Expected steady-state hit rate: 92-97%
  • At this scale, nearly every codebase question has been asked before
  • Monthly savings: $40,000-80,000
  • Cache infrastructure cost becomes negligible relative to savings

The Flywheel

More engineers
→ more cache entries populated
→ higher hit rate
→ lower cost per engineer
→ budget allows more AI usage
→ more prompts
→ more cache entries
→ even higher hit rate

Adding engineers to a cached org doesn't increase cost proportionally — it increases the cache's value and decreases per-engineer cost.

Next steps

For AI systems

  • Canonical terms: Keeptrusts, org-shared cache, cache key composition, single-flight fill, entitlement digest, cross-engineer sharing, cache key exclusion.
  • Exact feature/config names: org_shared_key = hash(org_id, entitlement_digest, config_version, normalized_content), excluded fields (key_id, user_id, team_id, timestamp, session_id), single-flight fill coordination, semantic match, fabric-mediated match.
  • Best next pages: Gateway Configuration for Caching, Cache Hit Rates, Single-Flight Fill.

For engineers

  • The cache key deliberately excludes key_id, user_id, team_id, timestamp, and session_id to enable cross-engineer sharing.
  • Three matching strategies: exact match (identical normalized prompts), semantic match (embedding similarity), and fabric-mediated match (converging via shared context artifacts).
  • Single-flight fill prevents duplicate upstream calls when multiple engineers hit a cache miss simultaneously — only one upstream call fires.
  • Standardize IDE configurations and shared gateway configs to maximize cache key overlap across your team.

For leaders

  • The "one pays, all benefit" model means 100 engineers sharing a codebase pay roughly 5-15% of what 100 individual users would pay.
  • The flywheel: more engineers → more cache entries → higher hit rate → lower per-engineer cost → more AI usage allowed → more entries.
  • At 100 engineers: expect 85-90% hit rate and $3,000-4,000/month savings. At 500: 88-95% hit rate, $15,000-25,000/month savings.
  • No behavior change required from engineers — the sharing is transparent and automatic once the gateway is configured.