How 100 Engineers Share One Cache
This is the core idea behind Keeptrusts' org-shared caching: when 100 engineers work in the same codebases, the first request about any piece of code pays full price, and every subsequent request from any engineer about the same code costs nothing. The result is 85-95% savings on AI spend.
Use this page when
- You want to understand the cache key composition that enables cross-engineer sharing.
- You need to explain the "one pays, all benefit" economic model to your team.
- You are evaluating expected savings at different team sizes (100, 500, 1000+ engineers).
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
The Economic Model
One Pays, All Benefit
The fundamental principle:
- Engineer 1 asks "How does PaymentService.processRefund() work?"
  - Cache: miss
  - Action: Request goes upstream to LLM provider
  - Cost: Full price (input + output tokens)
  - After: Response cached in org-shared tier
- Engineers 2-100 ask about the same function (different wording, same intent)
  - Cache: hit
  - Action: Response served from cache
  - Cost: $0 (no provider call, no wallet transaction, no platform fee)
- Net result: The org pays for 1 upstream call and serves 100 engineers
Why Different Wording Still Hits Cache
Engineers don't ask identical questions. They phrase things differently:
- "What does processRefund do?"
- "Explain the refund processing logic"
- "How are refunds handled in PaymentService?"
- "Walk me through the refund flow"
The cache handles this through multiple matching strategies:
- Exact match: Identical normalized prompts (rare but fast)
- Semantic match: Embedding-based similarity above threshold (common)
- Fabric-mediated match: When fabric context makes prompts converge (very common)
Fabric-mediated matching is particularly powerful: because all engineers receive the same pre-built file summaries and repo maps as context, their actual prompts look more similar at the cache key level than their raw questions suggest.
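To make the semantic strategy concrete, here is a minimal sketch of a similarity-based lookup. The embedding source, the 0.92 threshold, and the entry shape are illustrative assumptions for this example, not Keeptrusts' actual implementation.

```typescript
// Illustrative semantic cache lookup: an entry is a hit if its embedding is
// similar enough to the incoming prompt's embedding.

interface CacheEntry {
  embedding: number[]; // embedding of the normalized prompt
  response: string;    // cached upstream response
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the best cached response above the threshold, or null on a miss.
function semanticLookup(
  promptEmbedding: number[],
  entries: CacheEntry[],
  threshold = 0.92, // illustrative threshold, tuned per deployment
): string | null {
  let best: CacheEntry | null = null;
  let bestScore = threshold;
  for (const entry of entries) {
    const score = cosineSimilarity(promptEmbedding, entry.embedding);
    if (score >= bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best ? best.response : null;
}
```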
Cache Key Composition
The org-shared cache key determines what counts as "the same request":
org_shared_key = hash(
org_id, ← your organization
entitlement_digest, ← authorization level
config_version, ← current policy version
normalized_content ← the actual prompt/context
)
What's Included (and Why)
| Component | Purpose |
|---|---|
| org_id | Prevents cross-org cache pollution |
| entitlement_digest | Ensures only authorized users hit entries |
| config_version | Invalidates when policies change |
| normalized_content | The semantic content of the request |
What's Excluded (and Why)
| Excluded | Reason |
|---|---|
| key_id | Would prevent cross-engineer sharing |
| user_id | Would prevent cross-engineer sharing |
| team_id | Would limit sharing to single teams |
| timestamp | Would make every request unique |
| session_id | Would prevent cross-session reuse |
The deliberate exclusion of user identity from the cache key is the architectural decision that enables 100-engineer sharing.
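To make the composition above concrete, here is a minimal TypeScript sketch of deriving an org-shared key from the four included fields. The SHA-256 hash, field separator, and interface shape are assumptions for illustration; the production key derivation may differ.

```typescript
import { createHash } from "node:crypto";

// Fields that feed the org-shared cache key (per the composition above).
// key_id, user_id, team_id, timestamp, and session_id are deliberately
// absent so every engineer in the org produces the same key.
interface OrgSharedKeyInput {
  orgId: string;             // your organization
  entitlementDigest: string; // authorization level
  configVersion: string;     // current policy version
  normalizedContent: string; // the actual prompt/context after normalization
}

// Illustrative key derivation (SHA-256 over the joined fields).
function orgSharedKey(input: OrgSharedKeyInput): string {
  return createHash("sha256")
    .update(
      [
        input.orgId,
        input.entitlementDigest,
        input.configVersion,
        input.normalizedContent,
      ].join("\u0000"), // unambiguous field separator
    )
    .digest("hex");
}
```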
Single-Flight Fill Coordination
When multiple engineers ask the same question simultaneously (before the cache is populated), Keeptrusts uses single-flight fill to avoid duplicate upstream calls:
Without Single-Flight Fill
09:00:01 - Engineer A asks about AuthService → cache miss → upstream call #1
09:00:02 - Engineer B asks about AuthService → cache miss → upstream call #2
09:00:03 - Engineer C asks about AuthService → cache miss → upstream call #3
09:00:04 - Engineer D asks about AuthService → cache miss → upstream call #4
09:00:05 - Engineer E asks about AuthService → cache miss → upstream call #5
Result: 5 upstream calls, 5× cost, same response 5 times.
With Single-Flight Fill
09:00:01 - Engineer A asks about AuthService → cache miss → upstream call #1 (leader)
09:00:02 - Engineer B asks about AuthService → cache miss → waits on flight #1
09:00:03 - Engineer C asks about AuthService → cache miss → waits on flight #1
09:00:04 - Engineer D asks about AuthService → cache miss → waits on flight #1
09:00:05 - Engineer E asks about AuthService → cache miss → waits on flight #1
09:00:08 - Response arrives → served to A, B, C, D, E → cached
Result: 1 upstream call, 1× cost, same response served to all waiters.
Single-flight fill is especially valuable during:
- Morning startup (many engineers begin work simultaneously)
- Incident response (many engineers investigate the same symptoms)
- After deployments (many engineers explore new code)
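A minimal sketch of the coordination pattern, where concurrent misses on the same key await a single in-flight promise, looks roughly like this. The in-memory maps, the fetchUpstream callback, and the function name are illustrative, not the gateway's actual API.

```typescript
// Illustrative single-flight coordinator: concurrent misses on the same key
// share one upstream call instead of each calling the provider.

const cache = new Map<string, string>();             // key → cached response
const inFlight = new Map<string, Promise<string>>(); // key → pending fill

async function getOrFill(
  key: string,
  fetchUpstream: () => Promise<string>,
): Promise<string> {
  const cached = cache.get(key);
  if (cached !== undefined) return cached;   // cache hit: no upstream cost

  const pending = inFlight.get(key);
  if (pending !== undefined) return pending; // wait on the leader's call

  // This request is the leader: make the single upstream call.
  const flight = fetchUpstream()
    .then((response) => {
      cache.set(key, response);              // populate the shared cache
      return response;
    })
    .finally(() => {
      inFlight.delete(key);                  // clear the in-flight marker once settled
    });

  inFlight.set(key, flight);
  return flight;
}
```

In this sketch, engineers B-E calling getOrFill with the same key while the leader's call is pending all resolve with the leader's response, so only one upstream call fires.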
Fabric Context Reuse
Codebase Context Fabric amplifies cache effectiveness by ensuring all engineers receive the same pre-built context:
Without Fabric
Each engineer's IDE sends raw files as context:
- Engineer A: sends auth.ts (v1, 847 tokens) + question
- Engineer B: sends auth.ts (v1, 847 tokens) + slightly different question
- Different raw context → different cache keys → both miss
With Fabric
All engineers receive the same pre-built file summary:
- Engineer A: receives file_summary(auth.ts) (212 tokens) + question
- Engineer B: receives file_summary(auth.ts) (212 tokens) + question
- Same fabric context → same cache key prefix → high hit likelihood
Fabric creates convergence in the context portion of requests, dramatically increasing the cache hit rate.
Worked Example: 100 Engineers, Real Numbers
Assumptions
| Parameter | Value |
|---|---|
| Engineers | 100 |
| Prompts per engineer per day | 50 |
| Average input tokens per prompt | 4,000 |
| Average output tokens per prompt | 1,000 |
| Input cost per 1M tokens | $3.00 |
| Output cost per 1M tokens | $15.00 |
| Cache hit rate (steady state) | 85% |
Without Cache (Baseline)
Daily input tokens = 100 × 50 × 4,000 = 20,000,000
Daily output tokens = 100 × 50 × 1,000 = 5,000,000
Daily input cost = 20M × $3.00/1M = $60.00
Daily output cost = 5M × $15.00/1M = $75.00
Daily total cost = $135.00
Monthly total = $4,050
Annual total = $49,275
With Org-Shared Cache (85% Hit Rate)
Cache hits: 85% → zero cost
Cache misses: 15% → full cost
Daily cost = $135.00 × 0.15 = $20.25
Monthly cost = $607.50
Annual cost = $7,391
Annual savings = $49,275 - $7,391 = $41,884 (85% reduction)
With Org-Shared Cache + Fabric Token Reduction (85% hit, 40% token reduction on misses)
Cache hits: 85% → zero cost
Cache misses: 15% → 60% of full token cost (fabric reduces context size)
Daily miss cost = $135.00 × 0.15 × 0.60 = $12.15
Monthly cost = $364.50
Annual cost = $4,434
Annual savings = $49,275 - $4,434 = $44,841 (91% reduction)
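The arithmetic above can be reproduced with a short script, assuming the 30-day months and 365-day years implied by the monthly and annual figures:

```typescript
// Reproduces the worked example: 100 engineers, 50 prompts/day each,
// 4,000 input + 1,000 output tokens per prompt, $3/$15 per 1M tokens.

const engineers = 100;
const promptsPerEngineerPerDay = 50;
const inputTokensPerPrompt = 4_000;
const outputTokensPerPrompt = 1_000;
const inputCostPerMillion = 3.0;
const outputCostPerMillion = 15.0;

const dailyPrompts = engineers * promptsPerEngineerPerDay;
const dailyCost =
  (dailyPrompts * inputTokensPerPrompt / 1_000_000) * inputCostPerMillion +
  (dailyPrompts * outputTokensPerPrompt / 1_000_000) * outputCostPerMillion;

console.log(`Baseline daily cost:  $${dailyCost.toFixed(2)}`);          // $135.00
console.log(`Baseline annual cost: $${(dailyCost * 365).toFixed(2)}`);  // $49275.00

// 85% hit rate: only the 15% of misses pay full price.
const hitRate = 0.85;
const cachedDaily = dailyCost * (1 - hitRate);
console.log(`Cached daily cost:  $${cachedDaily.toFixed(2)}`);          // $20.25
console.log(`Cached annual cost: $${(cachedDaily * 365).toFixed(2)}`);  // $7391.25

// 85% hit rate plus 40% token reduction on misses (fabric): misses cost 60%.
const fabricDaily = cachedDaily * 0.60;
console.log(`Cached+fabric daily cost:  $${fabricDaily.toFixed(2)}`);         // $12.15
console.log(`Cached+fabric annual cost: $${(fabricDaily * 365).toFixed(2)}`); // $4434.75
```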
Sensitivity Analysis
| Hit rate | Monthly cost | Monthly savings | Savings % |
|---|---|---|---|
| 60% | $1,620 | $2,430 | 60% |
| 70% | $1,215 | $2,835 | 70% |
| 80% | $810 | $3,240 | 80% |
| 85% | $608 | $3,443 | 85% |
| 90% | $405 | $3,645 | 90% |
| 95% | $203 | $3,848 | 95% |
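The table follows directly from the $4,050 baseline month (30 days × $135/day); a quick sweep over hit rates, using the same figures, reproduces each row:

```typescript
// Sensitivity sweep over hit rates against the $4,050 baseline month.
const baselineMonthly = 4_050;

for (const hit of [0.60, 0.70, 0.80, 0.85, 0.90, 0.95]) {
  const monthlyCost = baselineMonthly * (1 - hit);
  const savings = baselineMonthly - monthlyCost;
  console.log(
    `${(hit * 100).toFixed(0)}% hit rate → ` +
    `cost $${monthlyCost.toFixed(2)}, savings $${savings.toFixed(2)}`
  );
}
```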
What Drives Hit Rate Higher
The more overlap in your engineering team's work, the higher your hit rate:
Factors That Increase Hit Rate
- Fewer repositories: 100 engineers on 3 repos → very high overlap
- Shared services/modules: Core libraries used by everyone
- Standard architectures: Consistent patterns generate consistent questions
- Active development on same areas: Sprint-focused teams exploring same code
- Onboarding: New engineers ask the same questions as previous new hires
Factors That Decrease Hit Rate
- Many independent repositories: 100 engineers on 100 repos → little overlap
- Highly personal codebases: Each engineer owns isolated modules
- Rapid code churn: Frequent changes invalidate cached responses
- Diverse technology stacks: Different languages/frameworks reduce overlap
- Short TTL settings: Aggressive expiry forces more re-fills
For Teams at Scale
100-Engineer Team
- Expected steady-state hit rate: 80-90%
- Typical payback period: 3-5 days
- Monthly savings: $3,000-4,000 (depending on provider and model choice)
500-Engineer Organization
- Expected steady-state hit rate: 88-95%
- Higher hit rate because more people = more overlap
- Monthly savings: $15,000-25,000
- The cache becomes more valuable with scale, not less
1,000+ Engineer Enterprise
- Expected steady-state hit rate: 92-97%
- At this scale, nearly every codebase question has been asked before
- Monthly savings: $40,000-80,000
- Cache infrastructure cost becomes negligible relative to savings
The Flywheel
More engineers
→ more cache entries populated
→ higher hit rate
→ lower cost per engineer
→ budget allows more AI usage
→ more prompts
→ more cache entries
→ even higher hit rate
Adding engineers to a cached org doesn't increase cost proportionally — it increases the cache's value and decreases per-engineer cost.
Next steps
- Gateway Configuration for Team-Wide Caching — set up your gateway for maximum sharing
- Cache Hit Rates: What Good Looks Like — benchmark your team
- Reducing Redundant LLM Calls — eliminate remaining waste
For AI systems
- Canonical terms: Keeptrusts, org-shared cache, cache key composition, single-flight fill, entitlement digest, cross-engineer sharing, cache key exclusion.
- Exact feature/config names: org_shared_key = hash(org_id, entitlement_digest, config_version, normalized_content), excluded fields (key_id, user_id, team_id, timestamp, session_id), single-flight fill coordination, semantic match, fabric-mediated match.
- Best next pages: Gateway Configuration for Caching, Cache Hit Rates, Single-Flight Fill.
For engineers
- The cache key deliberately excludes key_id, user_id, team_id, timestamp, and session_id to enable cross-engineer sharing.
- Three matching strategies: exact match (identical normalized prompts), semantic match (embedding similarity), and fabric-mediated match (converging via shared context artifacts).
- Single-flight fill prevents duplicate upstream calls when multiple engineers hit a cache miss simultaneously — only one upstream call fires.
- Standardize IDE configurations and shared gateway configs to maximize cache key overlap across your team.
For leaders
- The "one pays, all benefit" model means 100 engineers sharing a codebase pay roughly 5-15% of what 100 individual users would pay.
- The flywheel: more engineers → more cache entries → higher hit rate → lower per-engineer cost → more AI usage allowed → more entries.
- At 100 engineers: expect 85-90% hit rate and $3,000-4,000/month savings. At 500: 88-95% hit rate, $15,000-25,000/month savings.
- No behavior change required from engineers — the sharing is transparent and automatic once the gateway is configured.