The Cache Fill-Then-Save Model
Keeptrusts' org-shared cache follows a two-phase economic model that transforms how your organization pays for AI-assisted development. You invest once to fill the cache, then save continuously as your team reuses shared context.
Use this page when
- You want to understand the two-phase cost model: expensive initial fill followed by dramatically cheap ongoing usage.
- You are estimating the payback period for your team before enabling org-shared cache.
- You need to explain to stakeholders why fill-phase costs spike before savings materialize.
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Two-Phase Economics
Phase 1: Fill
During the fill phase, your organization pays to build shared context. Every cache miss triggers an upstream provider call at full cost. The cache stores the response so future equivalent requests avoid that cost entirely.
Fill happens naturally as engineers work:
- First engineer asks about a module → cache miss → upstream call → response cached
- First time a file summary is requested → cache miss → upstream call → response cached
- First architecture question about a service → cache miss → upstream call → response cached
The fill phase is the most expensive period. Your daily cost may temporarily exceed your pre-cache baseline because the system is actively building the shared context layer while engineers continue normal work.
Phase 2: Save
Once the cache is populated with your codebase context, the save phase begins. Now most requests hit cache:
- Second engineer asks about the same module → cache hit → zero provider cost
- Anyone requests the same file summary → cache hit → zero provider cost
- Any architecture question with matching context → cache hit → zero provider cost
Cache hits skip upstream provider calls entirely. No tokens are sent to the provider. No wallet reservation is made. No settle transaction occurs. The only record is an avoided-cost entry for your savings dashboard.
Cost Curve Visualization
The cost trajectory for a typical 100-engineer team looks like this:
Daily Cost ($)
│ ██
│ ██ ██                          ← Fill phase: high cost (Days 1-3)
│ ██ ██ ██
│─██─██─██─────────────────────── Baseline (no cache)
│ ██ ██ ██
│ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██  ← Save phase: steady state (Day 4+)
└──────────────────────────────── Time (days)
  1  2  3  4  5  6  7  8  9  10
- Days 1-3: Fill phase. Cost spikes as the cache populates. You pay full provider price for cache misses plus normal engineering traffic.
- Day 4+: Save phase. Cost drops dramatically. Most requests hit cache. You only pay for genuinely new questions and code changes.
- Steady state: After the initial fill, ongoing cost is determined by your code change rate and the rate of truly novel questions — typically 10-20% of total request volume.
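The shape of this curve can be sketched with a toy cost model. The baseline spend, ramp length, and hit rate below are the illustrative figures from this page, not measured data:

```python
# Illustrative fill-then-save cost model (hypothetical numbers):
# daily cost = baseline spend scaled by the miss rate, plus a temporary
# fill surcharge while the cache populates.

BASELINE_DAILY_SPEND = 200.0   # uncached daily spend, $
STEADY_STATE_HIT_RATE = 0.85   # assumed post-fill cache hit rate

def daily_cost(day: int) -> float:
    """Approximate daily spend for a given day since enabling the cache."""
    if day <= 3:
        # Fill phase: misses are paid at full price, and fill traffic
        # pushes cost above the pre-cache baseline, tapering by day 3.
        fill_surcharge = BASELINE_DAILY_SPEND * 0.35 * (4 - day) / 3
        return BASELINE_DAILY_SPEND + fill_surcharge
    # Save phase: only misses (new questions, changed code) cost money.
    return BASELINE_DAILY_SPEND * (1 - STEADY_STATE_HIT_RATE)

costs = {day: round(daily_cost(day), 2) for day in range(1, 8)}
```

Under these assumptions, days 1-3 land above the $200 baseline and day 4 onward settles at $30/day, matching the curve above.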
Cache-Hit Economics
When a request hits the org-shared cache, the following happens:
| Step | Uncached request | Cache hit |
|---|---|---|
| Gateway receives request | ✓ | ✓ |
| Policy evaluation (input) | ✓ | ✓ |
| Cache lookup | Miss | Hit |
| Wallet reserve | ✓ | Skipped |
| Upstream provider call | ✓ | Skipped |
| Wallet settle | ✓ | Skipped |
| Token cost charged | ✓ | $0 |
| Platform fee charged | ✓ | $0 |
| Response returned | ✓ | ✓ |
| Avoided-cost record emitted | — | ✓ |
The key insight: cache hits have zero marginal cost. No provider tokens, no platform fee, no wallet transaction. The only cost is the infrastructure running the cache layer itself, which is amortized across all requests.
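The fast path in the table can be sketched in a few lines. The function and record fields here are illustrative, not the actual gateway API:

```python
# Sketch of the gateway decision from the table above (names are
# hypothetical, not the Keeptrusts API).
from dataclasses import dataclass

@dataclass
class Outcome:
    cache_hit: bool
    tokens_charged: int
    avoided_cost_recorded: bool

def handle_request(prompt_key: str, cache: dict, upstream_tokens: int) -> Outcome:
    """Serve a request, skipping reserve/call/settle on a cache hit."""
    if prompt_key in cache:
        # Cache hit: no wallet reserve, no upstream call, no settle.
        # The only side effect is an avoided-cost record.
        return Outcome(cache_hit=True, tokens_charged=0,
                       avoided_cost_recorded=True)
    # Cache miss: full upstream call, tokens charged, response stored.
    cache[prompt_key] = "<provider response>"
    return Outcome(cache_hit=False, tokens_charged=upstream_tokens,
                   avoided_cost_recorded=False)

cache: dict = {}
first = handle_request("module-overview", cache, upstream_tokens=1200)
second = handle_request("module-overview", cache, upstream_tokens=1200)
```

The first call pays for 1,200 tokens and populates the cache; the second charges nothing and emits only the avoided-cost record.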
Avoided-Cost Records
Every cache hit emits an avoided-cost record that tracks:
- The tokens that would have been sent upstream
- The estimated provider cost avoided
- The cache tier that served the response (org_shared_cache or private_edge_cache)
These records power your savings dashboard and ROI reporting.
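A hypothetical record shape, built from the fields listed above (the exact schema and field names are assumptions):

```python
# Hypothetical avoided-cost record; field names are assumptions,
# not the exact Keeptrusts schema.
from dataclasses import dataclass

@dataclass
class AvoidedCostRecord:
    tokens_avoided: int            # tokens that would have gone upstream
    estimated_avoided_cost: float  # estimated provider cost avoided, $
    cache_tier: str                # "org_shared_cache" or "private_edge_cache"

# Example: a hit that avoided 12,000 tokens at an assumed $2.50 per
# million tokens of provider pricing.
record = AvoidedCostRecord(
    tokens_avoided=12_000,
    estimated_avoided_cost=12_000 / 1_000_000 * 2.50,
    cache_tier="org_shared_cache",
)
```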
For engineers
Declarative Configuration
Enable org-shared cache in your gateway configuration:
    workflow_cache:
      enabled: true
      default_tier: org_shared_cache
      org_shared_enabled: true
      ttl_seconds: 86400
      max_entry_tokens: 32000
This configuration tells the gateway to:
- Check the org-shared cache before making upstream calls
- Store responses in the org-shared tier by default
- Expire entries after 24 hours (configurable)
- Cache responses up to 32,000 tokens
What Gets Cached
The cache stores complete provider responses keyed by a composite key:
- org_id — your organization
- entitlement_digest — ensures only authorized users hit cache entries
- config_version — invalidates cache when policy changes
- Normalized prompt content — semantic or exact matching
Importantly, key_id (the individual API key) is not part of the org-shared cache key. This is what enables cross-engineer sharing.
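One way to picture the composite key; the hashing and normalization below are assumptions for illustration, and only the key components come from this page:

```python
# Illustrative derivation of the org-shared cache key. Note that key_id
# is deliberately absent: any engineer in the org with the same
# entitlements and config version maps to the same entry.
import hashlib

def org_cache_key(org_id: str, entitlement_digest: str,
                  config_version: str, prompt: str) -> str:
    # Toy normalization: lowercase and collapse whitespace.
    normalized = " ".join(prompt.lower().split())
    material = "|".join([org_id, entitlement_digest, config_version, normalized])
    return hashlib.sha256(material.encode()).hexdigest()

# Two engineers, different API keys, same org and equivalent prompt:
# both requests resolve to the same cache entry.
k1 = org_cache_key("org-42", "ent-abc", "v7", "Summarize  billing_service.py")
k2 = org_cache_key("org-42", "ent-abc", "v7", "summarize billing_service.py")
```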
What Doesn't Get Cached
Some requests bypass the cache by design:
- Requests with the cache: skip header
- Requests that trigger policy escalations
- Requests where policy requires per-user isolation
- Streaming responses (cached at completion)
For leaders
Payback Period
The payback period for org-shared cache is the time it takes for cumulative savings to exceed the fill cost:
Payback period = Fill cost ÷ Daily savings rate
For a typical 100-engineer team:
| Metric | Value |
|---|---|
| Daily uncached spend | $200 |
| Fill cost (3-day ramp) | $800 |
| Post-fill daily spend (85% hit rate) | $30 |
| Daily savings | $170 |
| Payback period | 4.7 days |
After the payback period, every day is pure savings. A team that spends $6,000/month uncached drops to under $1,000/month after fill — a 5-6× reduction.
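The payback arithmetic is easy to reproduce; these are the same illustrative 100-engineer figures from the table, not guarantees:

```python
# Payback-period arithmetic using the illustrative figures above.
daily_uncached_spend = 200.0
fill_cost = 800.0              # 3-day fill ramp
post_fill_daily_spend = 30.0   # at an assumed 85% hit rate

daily_savings = daily_uncached_spend - post_fill_daily_spend  # $/day
payback_days = fill_cost / daily_savings                      # ~4.7 days

monthly_uncached = daily_uncached_spend * 30  # ~$6,000/month
monthly_cached = post_fill_daily_spend * 30   # ~$900/month
```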
Fill Cost Is an Investment
Frame the fill phase cost as a capital investment, not an expense:
- It's a one-time cost that unlocks ongoing savings
- It's proportional to codebase complexity, not team size
- It amortizes faster as more engineers join the org
- Incremental fills (new code, new repos) are small compared to initial fill
When Fill Cost Reoccurs
The cache requires re-fill in specific scenarios:
- Major refactoring: Significant code changes invalidate cached responses about the old structure
- Config version bump: Policy changes that affect response content invalidate relevant entries
- TTL expiry: Entries expire after the configured TTL and must be re-filled on next request
- New repository added: Each new repo requires its own fill phase
In practice, daily code changes cause small incremental re-fills that are barely noticeable against the savings baseline.
The Flywheel Effect
Org-shared cache creates a positive flywheel:
- More engineers → more cache hits → lower per-engineer cost
- Lower cost → more generous AI usage policies → more prompts
- More prompts → more cache entries → higher hit rate
- Higher hit rate → even lower cost per engineer
This flywheel means that adding engineers to your org doesn't proportionally increase AI cost — it actually decreases cost per engineer as the shared cache becomes more valuable.
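A toy model makes the flywheel concrete; every number here is an assumption chosen only to show the shape of the effect:

```python
# Toy flywheel model: a fixed fill cost amortized over headcount, with
# hit rate rising (and per-engineer miss spend falling) as more
# engineers share the cache. All parameters are illustrative.

def monthly_cost_per_engineer(engineers: int,
                              fill_cost: float = 800.0,
                              uncached_per_engineer: float = 60.0) -> float:
    # Assume hit rate grows with team size and saturates around 90%.
    hit_rate = min(0.90, 0.50 + engineers / 400)
    miss_spend = uncached_per_engineer * (1 - hit_rate)
    return fill_cost / engineers + miss_spend

small_team = monthly_cost_per_engineer(20)    # fewer hits, bigger fill share
large_team = monthly_cost_per_engineer(200)   # more hits, tiny fill share
```

Under these assumptions the 200-engineer org pays far less per engineer than the 20-engineer org, which is the flywheel in miniature.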
Next steps
- Your First 24 Hours with Org-Shared Cache — see the fill-then-save model in action
- How 100 Engineers Share One Cache — understand the sharing mechanics
- Estimating Fill Cost for a New Repository — plan your fill budget
For AI systems
- Canonical terms: Keeptrusts, fill-then-save, cache fill phase, cache save phase, org-shared cache, avoided-cost record, cache miss, flywheel effect.
- Exact feature/config names: workflow_cache.enabled, workflow_cache.default_tier: org_shared_cache, ttl_seconds, max_entry_tokens, estimated_avoided_cost, cache_hit: true.
- Best next pages: Your First 24 Hours, How 100 Engineers Share One Cache, Zero-Cost Cache Hits.
For engineers
- Enable org-shared cache: set workflow_cache.enabled: true and default_tier: org_shared_cache in your gateway config.
- During fill phase (days 1-3), expect cost spikes above your pre-cache baseline as the system builds shared context.
- After fill phase, verify steady-state by checking Cost Center → Savings for declining fill cost and rising avoided cost.
- Cache entries are keyed by org_id + entitlement_digest + config_version + normalized_content — key_id is excluded to enable cross-engineer sharing.
- Requests with the cache: skip header, policy escalations, or per-user isolation requirements bypass the cache.
For leaders
- Payback period for a 100-engineer team is typically under 5 days: ~$800 fill investment yields $170/day savings.
- Frame the fill phase as a capital investment, not an expense — one-time cost unlocking ongoing 5-6× cost reduction.
- Post-fill monthly spend drops from $6,000 to under $1,000 for typical teams at 85% hit rate.
- The flywheel effect means adding engineers decreases per-engineer cost (more hits per fill dollar) rather than scaling cost linearly.