How 100 Engineers Share One Cache
This is the core idea behind Keeptrusts' org-shared caching: when 100 engineers work in the same codebases, the first request about any piece of code pays full price, and every subsequent request from any engineer about the same code costs nothing. The result is 85-95% savings on AI spend.
Use this page when
- You want to understand the cache key composition that enables cross-engineer sharing.
- You need to explain the "one pays, all benefit" economic model to your team.
- You are evaluating expected savings at different team sizes (100, 500, 1000+ engineers).
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
The Economic Model
One Pays, All Benefit
The fundamental principle:
- Engineer 1 asks "How does PaymentService.processRefund() work?"
  - Cache: miss
  - Action: Request goes upstream to LLM provider
  - Cost: Full price (input + output tokens)
  - After: Response cached in org-shared tier
- Engineers 2-100 ask about the same function (different wording, same intent)
  - Cache: hit
  - Action: Response served from cache
  - Cost: $0 (no provider call, no wallet transaction, no platform fee)
- Net result: The org pays for 1 upstream call and serves 100 engineers
Why Different Wording Still Hits Cache
Engineers don't ask identical questions. They phrase things differently:
- "What does processRefund do?"
- "Explain the refund processing logic"
- "How are refunds handled in PaymentService?"
- "Walk me through the refund flow"
The cache handles this through multiple matching strategies:
- Exact match: Identical normalized prompts (rare but fast)
- Semantic match: Embedding-based similarity above threshold (common)
- Fabric-mediated match: When fabric context makes prompts converge (very common)
Fabric-mediated matching is particularly powerful: because all engineers receive the same pre-built file summaries and repo maps as context, their actual prompts look more similar at the cache key level than their raw questions suggest.
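To make the semantic strategy concrete, here is a minimal sketch of a similarity-based lookup. The embedding source, the 0.92 threshold, and the entry shape are illustrative assumptions for this example, not Keeptrusts' actual implementation.

```typescript
// Illustrative semantic cache lookup: an entry is a hit if its embedding is
// similar enough to the incoming prompt's embedding.

interface CacheEntry {
  embedding: number[]; // embedding of the normalized prompt
  response: string;    // cached upstream response
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the best cached response above the threshold, or null on a miss.
function semanticLookup(
  promptEmbedding: number[],
  entries: CacheEntry[],
  threshold = 0.92, // illustrative threshold, tuned per deployment
): string | null {
  let best: CacheEntry | null = null;
  let bestScore = threshold;
  for (const entry of entries) {
    const score = cosineSimilarity(promptEmbedding, entry.embedding);
    if (score >= bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best ? best.response : null;
}
```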
Cache Key Composition
The org-shared cache key determines what counts as "the same request":
org_shared_key = hash(
org_id, ← your organization
entitlement_digest, ← authorization level
config_version, ← current policy version
normalized_content ← the actual prompt/context
)
What's Included (and Why)
| Component | Purpose |
|---|---|
| org_id | Prevents cross-org cache pollution |
| entitlement_digest | Ensures only authorized users hit entries |
| config_version | Invalidates when policies change |
| normalized_content | The semantic content of the request |
What's Excluded (and Why)
| Excluded | Reason |
|---|---|
| key_id | Would prevent cross-engineer sharing |
| user_id | Would prevent cross-engineer sharing |
| team_id | Would limit sharing to single teams |
| timestamp | Would make every request unique |
| session_id | Would prevent cross-session reuse |
The deliberate exclusion of user identity from the cache key is the architectural decision that enables 100-engineer sharing.
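To make the composition above concrete, here is a minimal TypeScript sketch of deriving an org-shared key from the four included fields. The SHA-256 hash, field separator, and interface shape are assumptions for illustration; the production key derivation may differ.

```typescript
import { createHash } from "node:crypto";

// Fields that feed the org-shared cache key (per the composition above).
// key_id, user_id, team_id, timestamp, and session_id are deliberately
// absent so every engineer in the org produces the same key.
interface OrgSharedKeyInput {
  orgId: string;             // your organization
  entitlementDigest: string; // authorization level
  configVersion: string;     // current policy version
  normalizedContent: string; // the actual prompt/context after normalization
}

// Illustrative key derivation (SHA-256 over the joined fields).
function orgSharedKey(input: OrgSharedKeyInput): string {
  return createHash("sha256")
    .update(
      [
        input.orgId,
        input.entitlementDigest,
        input.configVersion,
        input.normalizedContent,
      ].join("\u0000"), // unambiguous field separator
    )
    .digest("hex");
}
```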
Single-Flight Fill Coordination
When multiple engineers ask the same question simultaneously (before the cache is populated), Keeptrusts uses single-flight fill to avoid duplicate upstream calls:
Without Single-Flight Fill
09:00:01 - Engineer A asks about AuthService → cache miss → upstream call #1
09:00:02 - Engineer B asks about AuthService → cache miss → upstream call #2
09:00:03 - Engineer C asks about AuthService → cache miss → upstream call #3
09:00:04 - Engineer D asks about AuthService → cache miss → upstream call #4
09:00:05 - Engineer E asks about AuthService → cache miss → upstream call #5
Result: 5 upstream calls, 5× cost, same response 5 times.
With Single-Flight Fill
09:00:01 - Engineer A asks about AuthService → cache miss → upstream call #1 (leader)
09:00:02 - Engineer B asks about AuthService → cache miss → waits on flight #1
09:00:03 - Engineer C asks about AuthService → cache miss → waits on flight #1
09:00:04 - Engineer D asks about AuthService → cache miss → waits on flight #1
09:00:05 - Engineer E asks about AuthService → cache miss → waits on flight #1
09:00:08 - Response arrives → served to A, B, C, D, E → cached
Result: 1 upstream call, 1× cost, same response served to all waiters.
Single-flight fill is especially valuable during:
- Morning startup (many engineers begin work simultaneously)
- Incident response (many engineers investigate the same symptoms)
- After deployments (many engineers explore new code)
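A minimal sketch of the coordination pattern, where concurrent misses on the same key await a single in-flight promise, looks roughly like this. The in-memory maps, the fetchUpstream callback, and the function name are illustrative, not the gateway's actual API.

```typescript
// Illustrative single-flight coordinator: concurrent misses on the same key
// share one upstream call instead of each calling the provider.

const cache = new Map<string, string>();             // key → cached response
const inFlight = new Map<string, Promise<string>>(); // key → pending fill

async function getOrFill(
  key: string,
  fetchUpstream: () => Promise<string>,
): Promise<string> {
  const cached = cache.get(key);
  if (cached !== undefined) return cached;   // cache hit: no upstream cost

  const pending = inFlight.get(key);
  if (pending !== undefined) return pending; // wait on the leader's call

  // This request is the leader: make the single upstream call.
  const flight = fetchUpstream()
    .then((response) => {
      cache.set(key, response);              // populate the shared cache
      return response;
    })
    .finally(() => {
      inFlight.delete(key);                  // clear the in-flight marker once settled
    });

  inFlight.set(key, flight);
  return flight;
}
```

In this sketch, engineers B-E calling getOrFill with the same key while the leader's call is pending all resolve with the leader's response, so only one upstream call fires.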
Fabric Context Reuse
Codebase Context Fabric amplifies cache effectiveness by ensuring all engineers receive the same pre-built context:
Without Fabric
Each engineer's IDE sends raw files as context:
- Engineer A: sends auth.ts (v1, 847 tokens) + question
- Engineer B: sends auth.ts (v1, 847 tokens) + slightly different question
- Different raw context → different cache keys → both miss
With Fabric
All engineers receive the same pre-built file summary:
- Engineer A: receives file_summary(auth.ts) (212 tokens) + question
- Engineer B: receives file_summary(auth.ts) (212 tokens) + question
- Same fabric context → same cache key prefix → high hit likelihood
Fabric creates convergence in the context portion of requests, dramatically increasing the cache hit rate.
Worked Example: 100 Engineers, Real Numbers
Assumptions
| Parameter | Value |
|---|---|
| Engineers | 100 |
| Prompts per engineer per day | 50 |
| Average input tokens per prompt | 4,000 |
| Average output tokens per prompt | 1,000 |
| Input cost per 1M tokens | $3.00 |
| Output cost per 1M tokens | $15.00 |
| Cache hit rate (steady state) | 85% |
Without Cache (Baseline)
Daily input tokens = 100 × 50 × 4,000 = 20,000,000
Daily output tokens = 100 × 50 × 1,000 = 5,000,000
Daily input cost = 20M × $3.00/1M = $60.00
Daily output cost = 5M × $15.00/1M = $75.00
Daily total cost = $135.00
Monthly total = $4,050
Annual total = $49,275
With Org-Shared Cache (85% Hit Rate)
Cache hits: 85% → zero cost
Cache misses: 15% → full cost
Daily cost = $135.00 × 0.15 = $20.25
Monthly cost = $607.50
Annual cost = $7,391
Annual savings = $49,275 - $7,391 = $41,884 (85% reduction)
With Org-Shared Cache + Fabric Token Reduction (85% hit, 40% token reduction on misses)
Cache hits: 85% → zero cost
Cache misses: 15% → 60% of full token cost (fabric reduces context size)
Daily miss cost = $135.00 × 0.15 × 0.60 = $12.15
Monthly cost = $364.50
Annual cost = $4,434
Annual savings = $49,275 - $4,434 = $44,841 (91% reduction)
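The arithmetic above can be reproduced with a short script, assuming the 30-day months and 365-day years implied by the monthly and annual figures:

```typescript
// Reproduces the worked example: 100 engineers, 50 prompts/day each,
// 4,000 input + 1,000 output tokens per prompt, $3/$15 per 1M tokens.

const engineers = 100;
const promptsPerEngineerPerDay = 50;
const inputTokensPerPrompt = 4_000;
const outputTokensPerPrompt = 1_000;
const inputCostPerMillion = 3.0;
const outputCostPerMillion = 15.0;

const dailyPrompts = engineers * promptsPerEngineerPerDay;
const dailyCost =
  (dailyPrompts * inputTokensPerPrompt / 1_000_000) * inputCostPerMillion +
  (dailyPrompts * outputTokensPerPrompt / 1_000_000) * outputCostPerMillion;

console.log(`Baseline daily cost:  $${dailyCost.toFixed(2)}`);          // $135.00
console.log(`Baseline annual cost: $${(dailyCost * 365).toFixed(2)}`);  // $49275.00

// 85% hit rate: only the 15% of misses pay full price.
const hitRate = 0.85;
const cachedDaily = dailyCost * (1 - hitRate);
console.log(`Cached daily cost:  $${cachedDaily.toFixed(2)}`);          // $20.25
console.log(`Cached annual cost: $${(cachedDaily * 365).toFixed(2)}`);  // $7391.25

// 85% hit rate plus 40% token reduction on misses (fabric): misses cost 60%.
const fabricDaily = cachedDaily * 0.60;
console.log(`Cached+fabric daily cost:  $${fabricDaily.toFixed(2)}`);         // $12.15
console.log(`Cached+fabric annual cost: $${(fabricDaily * 365).toFixed(2)}`); // $4434.75
```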
Sensitivity Analysis
| Hit rate | Monthly cost | Monthly savings | Savings % |
|---|---|---|---|
| 60% | $1,620 | $2,430 | 60% |
| 70% | $1,215 | $2,835 | 70% |
| 80% | $810 | $3,240 | 80% |
| 85% | $608 | $3,443 | 85% |
| 90% | $405 | $3,645 | 90% |
| 95% | $203 | $3,848 | 95% |
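The table follows directly from the $4,050 baseline month (30 days × $135/day); a quick sweep over hit rates, using the same figures, reproduces each row:

```typescript
// Sensitivity sweep over hit rates against the $4,050 baseline month.
const baselineMonthly = 4_050;

for (const hit of [0.60, 0.70, 0.80, 0.85, 0.90, 0.95]) {
  const monthlyCost = baselineMonthly * (1 - hit);
  const savings = baselineMonthly - monthlyCost;
  console.log(
    `${(hit * 100).toFixed(0)}% hit rate → ` +
    `cost $${monthlyCost.toFixed(2)}, savings $${savings.toFixed(2)}`
  );
}
```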
What Drives Hit Rate Higher
The more overlap in your engineering team's work, the higher your hit rate:
Factors That Increase Hit Rate
- Fewer repositories: 100 engineers on 3 repos → very high overlap
- Shared services/modules: Core libraries used by everyone
- Standard architectures: Consistent patterns generate consistent questions
- Active development on same areas: Sprint-focused teams exploring same code
- Onboarding: New engineers ask the same questions as previous new hires
Factors That Decrease Hit Rate
- Many independent repositories: 100 engineers on 100 repos → little overlap
- Highly personal codebases: Each engineer owns isolated modules
- Rapid code churn: Frequent changes invalidate cached responses
- Diverse technology stacks: Different languages/frameworks reduce overlap
- Short TTL settings: Aggressive expiry forces more re-fills
For Teams at Scale
100-Engineer Team
- Expected steady-state hit rate: 80-90%
- Typical payback period: 3-5 days
- Monthly savings: $3,000-4,000 (depending on provider and model choice)
500-Engineer Organization
- Expected steady-state hit rate: 88-95%
- Higher hit rate because more people = more overlap
- Monthly savings: $15,000-25,000
- The cache becomes more valuable with scale, not less
1,000+ Engineer Enterprise
- Expected steady-state hit rate: 92-97%
- At this scale, nearly every codebase question has been asked before
- Monthly savings: $40,000-80,000
- Cache infrastructure cost becomes negligible relative to savings
The Flywheel
More engineers
→ more cache entries populated
→ higher hit rate
→ lower cost per engineer
→ budget allows more AI usage
→ more prompts
→ more cache entries
→ even higher hit rate
Adding engineers to a cached org doesn't increase cost proportionally — it increases the cache's value and decreases per-engineer cost.
Next steps
- Gateway Configuration for Team-Wide Caching — set up your gateway for maximum sharing
- Cache Hit Rates: What Good Looks Like — benchmark your team
- Reducing Redundant LLM Calls — eliminate remaining waste
For AI systems
- Canonical terms: Keeptrusts, org-shared cache, cache key composition, single-flight fill, entitlement digest, cross-engineer sharing, cache key exclusion.
- Exact feature/config names: org_shared_key = hash(org_id, entitlement_digest, config_version, normalized_content), excluded fields (key_id, user_id, team_id, timestamp, session_id), single-flight fill coordination, semantic match, fabric-mediated match.
- Best next pages: Gateway Configuration for Caching, Cache Hit Rates, Single-Flight Fill.
For engineers
- The cache key deliberately excludes key_id, user_id, team_id, timestamp, and session_id to enable cross-engineer sharing.
- Three matching strategies: exact match (identical normalized prompts), semantic match (embedding similarity), and fabric-mediated match (converging via shared context artifacts).
- Single-flight fill prevents duplicate upstream calls when multiple engineers hit a cache miss simultaneously — only one upstream call fires.
- Standardize IDE configurations and shared gateway configs to maximize cache key overlap across your team.
For leaders
- The "one pays, all benefit" model means 100 engineers sharing a codebase pay roughly 5-15% of what 100 individual users would pay.
- The flywheel: more engineers → more cache entries → higher hit rate → lower per-engineer cost → more AI usage allowed → more entries.
- At 100 engineers: expect 85-90% hit rate and $3,000-4,000/month savings. At 500: 88-95% hit rate, $15,000-25,000/month savings.
- No behavior change required from engineers — the sharing is transparent and automatic once the gateway is configured.