
The Cache Fill-Then-Save Model

The Keeptrusts org-shared cache follows a two-phase economic model that transforms how your organization pays for AI-assisted development: you invest once to fill the cache, then save continuously as your team reuses shared context.

Use this page when

  • You want to understand the two-phase cost model: expensive initial fill followed by dramatically cheap ongoing usage.
  • You are estimating the payback period for your team before enabling org-shared cache.
  • You need to explain to stakeholders why fill-phase costs spike before savings materialize.

Primary audience

  • Primary: Technical Engineers
  • Secondary: AI Agents, Technical Leaders

Two-Phase Economics

Phase 1: Fill

During the fill phase, your organization pays to build shared context. Every cache miss triggers an upstream provider call at full cost. The cache stores the response so future equivalent requests avoid that cost entirely.

Fill happens naturally as engineers work:

  • First engineer asks about a module → cache miss → upstream call → response cached
  • First time a file summary is requested → cache miss → upstream call → response cached
  • First architecture question about a service → cache miss → upstream call → response cached

The fill phase is the most expensive period. Your daily cost may temporarily exceed your pre-cache baseline because the system is actively building the shared context layer while engineers continue normal work.

Phase 2: Save

Once the cache is populated with your codebase context, the save phase begins. Now most requests hit cache:

  • Second engineer asks about the same module → cache hit → zero provider cost
  • Anyone requests the same file summary → cache hit → zero provider cost
  • Any architecture question with matching context → cache hit → zero provider cost

Cache hits skip upstream provider calls entirely. No tokens are sent to the provider. No wallet reservation is made. No settle transaction occurs. The only record is an avoided-cost entry for your savings dashboard.
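The fill-then-save contrast above can be sketched as a toy model. This is illustrative only, with hypothetical function and record names; it is not the gateway's actual implementation:

```python
# Toy model of the fill-then-save flow (hypothetical names, not the actual
# Keeptrusts gateway internals).

def handle_request(cache, records, prompt, call_provider):
    """Return (response, charged_cost) for one request through the cache."""
    if prompt in cache:
        # Cache hit: no wallet reserve, no upstream call, no settle --
        # only an avoided-cost record for the savings dashboard.
        response, avoided = cache[prompt]
        records.append({"estimated_avoided_cost": avoided,
                        "tier": "org_shared_cache"})
        return response, 0.0
    # Cache miss: full-price upstream call, then store for future hits.
    response, cost = call_provider(prompt)
    cache[prompt] = (response, cost)
    return response, cost

# First engineer fills the cache; an identical later request is free.
cache, records = {}, []
provider = lambda p: ("summary text", 0.02)  # stub upstream call, $0.02
_, cost1 = handle_request(cache, records, "summarize auth module", provider)
_, cost2 = handle_request(cache, records, "summarize auth module", provider)
print(cost1, cost2, len(records))  # 0.02 0.0 1
```

The second call charges nothing and emits exactly one avoided-cost record, matching the lifecycle described above.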

Cost Curve Visualization

The cost trajectory for a typical 100-engineer team looks like this:

Daily Cost ($)

350 │ █                          ← Fill phase: high cost (days 1-3)
300 │ █ █
250 │ █ █ █
200 │ █ █ █ ──────────────────── ← Baseline (no cache)
150 │ █ █ █
100 │ █ █ █
 50 │ █ █ █
 30 │ █ █ █ █ █ █ █ █ █ █        ← Save phase: low cost (day 4+), steady state
    └─────────────────────────── Time (days)
      1 2 3 4 5 6 7 8 9 10
  • Days 1-3: Fill phase. Cost spikes as the cache populates. You pay full provider price for cache misses plus normal engineering traffic.
  • Day 4+: Save phase. Cost drops dramatically. Most requests hit cache. You only pay for genuinely new questions and code changes.
  • Steady state: After the initial fill, ongoing cost is determined by your code change rate and the rate of truly novel questions — typically 10-20% of total request volume.
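A back-of-envelope model of this trajectory, using the illustrative figures from this page ($200/day baseline, a 3-day fill ramp, ~15% novel requests in steady state); the exact ramp shape is an assumption:

```python
# Simple daily-cost model for the curve above (illustrative numbers from
# this page; the linear taper during fill is an assumption).
baseline = 200.0        # daily uncached spend ($)
fill_days = 3
novel_fraction = 0.15   # 10-20% of requests are genuinely new in steady state

def daily_cost(day):
    if day <= fill_days:
        # Fill phase: normal traffic plus fill cost that tapers off as
        # the cache populates.
        return baseline + baseline * (1 - day / (fill_days + 1))
    # Save phase: only novel questions and code changes are charged.
    return baseline * novel_fraction

for day in range(1, 8):
    print(day, round(daily_cost(day), 2))  # 350.0, 300.0, 250.0, then 30.0/day
```

Note the steady-state figure ($30/day at 15% novel traffic) lines up with the payback table later on this page.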

Cache-Hit Economics

When a request hits the org-shared cache, the following happens:

Step                          Uncached request   Cache hit
Gateway receives request      Yes                Yes
Policy evaluation (input)     Yes                Yes
Cache lookup                  Miss               Hit
Wallet reserve                Yes                Skipped
Upstream provider call        Yes                Skipped
Wallet settle                 Yes                Skipped
Token cost charged            Full price         $0
Platform fee charged          Yes                $0
Response returned             Yes                Yes
Avoided-cost record emitted   No                 Yes

The key insight: cache hits have zero marginal cost. No provider tokens, no platform fee, no wallet transaction. The only cost is the infrastructure running the cache layer itself, which is amortized across all requests.

Avoided-Cost Records

Every cache hit emits an avoided-cost record that tracks:

  • The tokens that would have been sent upstream
  • The estimated provider cost avoided
  • The cache tier that served the response (org_shared_cache or private_edge_cache)

These records power your savings dashboard and ROI reporting.
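An avoided-cost record might look like the sketch below. Only estimated_avoided_cost and the tier names appear in this doc; the other field names and the per-token prices are assumptions for illustration:

```python
# Hypothetical avoided-cost record. Field names other than
# estimated_avoided_cost, and the per-token prices, are assumptions.
def estimate_avoided_cost(input_tokens, output_tokens,
                          in_price_per_1k=0.003, out_price_per_1k=0.015):
    """Estimated provider cost ($) that a cache hit avoided."""
    return (input_tokens / 1000 * in_price_per_1k
            + output_tokens / 1000 * out_price_per_1k)

record = {
    "cache_hit": True,
    "tier": "org_shared_cache",       # or "private_edge_cache"
    "avoided_input_tokens": 1850,     # tokens that would have gone upstream
    "avoided_output_tokens": 420,
    "estimated_avoided_cost": round(estimate_avoided_cost(1850, 420), 5),
}
print(record["estimated_avoided_cost"])  # 0.01185
```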

For engineers

Declarative Configuration

Enable org-shared cache in your gateway configuration:

workflow_cache:
  enabled: true
  default_tier: org_shared_cache
  org_shared_enabled: true
  ttl_seconds: 86400
  max_entry_tokens: 32000

This configuration tells the gateway to:

  1. Check the org-shared cache before making upstream calls
  2. Store responses in the org-shared tier by default
  3. Expire entries after 24 hours (configurable)
  4. Cache responses up to 32,000 tokens

What Gets Cached

The cache stores complete provider responses keyed by a composite key:

  • org_id — your organization
  • entitlement_digest — ensures only authorized users hit cache entries
  • config_version — invalidates cache when policy changes
  • Normalized prompt content — semantic or exact matching

Importantly, key_id (the individual API key) is not part of the org-shared cache key. This is what enables cross-engineer sharing.
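A sketch of how such a composite key could be derived. The hashing scheme here is an assumption; the doc specifies only which fields participate, and that key_id does not:

```python
import hashlib

# Sketch of the composite org-shared cache key. The sha256-over-fields scheme
# is an assumption; what matters is which fields participate -- and that
# key_id (the individual API key) does not.
def org_cache_key(org_id, entitlement_digest, config_version,
                  normalized_prompt):
    material = "\x1f".join(
        [org_id, entitlement_digest, config_version, normalized_prompt])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

# Two engineers with different API keys but the same org, entitlements,
# and prompt produce the same key, so they share one cache entry:
k1 = org_cache_key("org-42", "ent-abc", "cfg-7", "summarize billing service")
k2 = org_cache_key("org-42", "ent-abc", "cfg-7", "summarize billing service")
print(k1 == k2)  # True

# A config version bump changes the key, invalidating old entries:
k3 = org_cache_key("org-42", "ent-abc", "cfg-8", "summarize billing service")
print(k1 == k3)  # False
```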

What Doesn't Get Cached

Some requests bypass the cache by design:

  • Requests with cache: skip header
  • Requests that trigger policy escalations
  • Requests where policy requires per-user isolation
  • Streaming responses (cached at completion)
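The bypass rules above can be expressed as a simple predicate. The request shape here is a hypothetical simplification, not the gateway's real request object:

```python
# The bypass rules as a predicate; the request dict shape is a hypothetical
# simplification of the gateway's real request object.
def bypasses_cache(request):
    return (request.get("headers", {}).get("cache") == "skip"
            or request.get("policy_escalated", False)
            or request.get("per_user_isolation", False)
            or request.get("streaming", False))  # cached only at completion

print(bypasses_cache({"headers": {"cache": "skip"}}))   # True
print(bypasses_cache({"streaming": True}))              # True
print(bypasses_cache({"headers": {}}))                  # False
```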

For leaders

Payback Period

The payback period for org-shared cache is the time it takes for cumulative savings to exceed the fill cost:

Payback period = Fill cost ÷ Daily savings rate

For a typical 100-engineer team:

Metric                                 Value
Daily uncached spend                   $200
Fill cost (3-day ramp)                 $800
Post-fill daily spend (85% hit rate)   $30
Daily savings                          $170
Payback period                         4.7 days

After the payback period, every day is pure savings. A team that spends $6,000/month uncached drops to under $1,000/month after fill — a 5-6× reduction.
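The arithmetic behind these figures, worked through:

```python
# The payback arithmetic from the table above.
daily_uncached = 200.0
fill_cost = 800.0                                    # 3-day fill ramp
hit_rate = 0.85
post_fill_daily = daily_uncached * (1 - hit_rate)    # $30/day
daily_savings = daily_uncached - post_fill_daily     # $170/day
payback_days = fill_cost / daily_savings             # ~4.7 days
print(round(post_fill_daily), round(daily_savings), round(payback_days, 1))
# 30 170 4.7
```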

Fill Cost Is an Investment

Frame the fill phase cost as a capital investment, not an expense:

  • It's a one-time cost that unlocks ongoing savings
  • It's proportional to codebase complexity, not team size
  • It amortizes faster as more engineers join the org
  • Incremental fills (new code, new repos) are small compared to initial fill

When Fill Cost Recurs

The cache requires re-fill in specific scenarios:

  • Major refactoring: Significant code changes invalidate cached responses about the old structure
  • Config version bump: Policy changes that affect response content invalidate relevant entries
  • TTL expiry: Entries expire after the configured TTL and must be re-filled on next request
  • New repository added: Each new repo requires its own fill phase

In practice, daily code changes cause small incremental re-fills that are barely noticeable against the savings baseline.

The Flywheel Effect

Org-shared cache creates a positive flywheel:

  1. More engineers → more cache hits → lower per-engineer cost
  2. Lower cost → more generous AI usage policies → more prompts
  3. More prompts → more cache entries → higher hit rate
  4. Higher hit rate → even lower cost per engineer

This flywheel means that adding engineers to your org doesn't proportionally increase AI cost — it actually decreases cost per engineer as the shared cache becomes more valuable.
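A toy model of the flywheel, under the loud assumption that hit rate saturates toward a ceiling as more engineers share the same fill; the specific curve and numbers are illustrative, not measured properties of the product:

```python
# Toy flywheel model. The hit-rate curve (1 - 1/n, capped at 0.99) and the
# per-prompt numbers are illustrative assumptions.
def cost_per_engineer(engineers, prompts_per_eng=50, cost_per_miss=0.02):
    hit_rate = min(0.99, 1 - 1 / engineers)
    misses = engineers * prompts_per_eng * (1 - hit_rate)
    return misses * cost_per_miss / engineers

for n in (5, 20, 100):
    print(n, round(cost_per_engineer(n), 3))
# Cost per engineer falls as the team grows: 0.2 -> 0.05 -> 0.01
```

Under any saturating hit-rate curve, per-engineer cost falls as headcount rises, which is the flywheel's core claim.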

Next steps

For AI systems

  • Canonical terms: Keeptrusts, fill-then-save, cache fill phase, cache save phase, org-shared cache, avoided-cost record, cache miss, flywheel effect.
  • Exact feature/config names: workflow_cache.enabled, workflow_cache.default_tier: org_shared_cache, ttl_seconds, max_entry_tokens, estimated_avoided_cost, cache_hit: true.
  • Best next pages: Your First 24 Hours, How 100 Engineers Share One Cache, Zero-Cost Cache Hits.

For engineers

  • Enable org-shared cache: set workflow_cache.enabled: true and default_tier: org_shared_cache in your gateway config.
  • During fill phase (days 1-3), expect cost spikes above your pre-cache baseline as the system builds shared context.
  • After fill phase, verify steady-state by checking Cost Center → Savings for declining fill cost and rising avoided cost.
  • Cache entries are keyed by org_id + entitlement_digest + config_version + normalized content; key_id is excluded to enable cross-engineer sharing.
  • Requests with cache: skip header, policy escalations, or per-user isolation requirements bypass the cache.

For leaders

  • Payback period for a 100-engineer team is typically under 5 days: ~$800 fill investment yields $170/day savings.
  • Frame the fill phase as a capital investment, not an expense — one-time cost unlocking ongoing 5-6× cost reduction.
  • Post-fill monthly spend drops from $6,000 to under $1,000 for typical teams at 85% hit rate.
  • The flywheel effect means adding engineers decreases per-engineer cost (more hits per fill dollar) rather than scaling cost linearly.