
The Cache Fill-Then-Save Model

The Keeptrusts org-shared cache follows a two-phase economic model that transforms how your organization pays for AI-assisted development: you invest once to fill the cache, then save continuously as your team reuses shared context.

Use this page when

  • You want to understand the two-phase cost model: expensive initial fill followed by dramatically cheap ongoing usage.
  • You are estimating the payback period for your team before enabling org-shared cache.
  • You need to explain to stakeholders why fill-phase costs spike before savings materialize.

Primary audience

  • Primary: Technical Engineers
  • Secondary: AI Agents, Technical Leaders

Two-Phase Economics

Phase 1: Fill

During the fill phase, your organization pays to build shared context. Every cache miss triggers an upstream provider call at full cost. The cache stores the response so future equivalent requests avoid that cost entirely.

Fill happens naturally as engineers work:

  • First engineer asks about a module → cache miss → upstream call → response cached
  • First time a file summary is requested → cache miss → upstream call → response cached
  • First architecture question about a service → cache miss → upstream call → response cached

The fill phase is the most expensive period. Your daily cost may temporarily exceed your pre-cache baseline because the system is actively building the shared context layer while engineers continue normal work.

Phase 2: Save

Once the cache is populated with your codebase context, the save phase begins. Now most requests hit cache:

  • Second engineer asks about the same module → cache hit → zero provider cost
  • Anyone requests the same file summary → cache hit → zero provider cost
  • Any architecture question with matching context → cache hit → zero provider cost

Cache hits skip upstream provider calls entirely. No tokens are sent to the provider. No wallet reservation is made. No settle transaction occurs. The only record is an avoided-cost entry for your savings dashboard.
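The fill-then-save contrast above can be sketched as a toy model. This is illustrative only, with hypothetical function and record names; it is not the gateway's actual implementation:

```python
# Toy model of the fill-then-save flow (hypothetical names, not the actual
# Keeptrusts gateway internals).

def handle_request(cache, records, prompt, call_provider):
    """Return (response, charged_cost) for one request through the cache."""
    if prompt in cache:
        # Cache hit: no wallet reserve, no upstream call, no settle --
        # only an avoided-cost record for the savings dashboard.
        response, avoided = cache[prompt]
        records.append({"estimated_avoided_cost": avoided,
                        "tier": "org_shared_cache"})
        return response, 0.0
    # Cache miss: full-price upstream call, then store for future hits.
    response, cost = call_provider(prompt)
    cache[prompt] = (response, cost)
    return response, cost

# First engineer fills the cache; an identical later request is free.
cache, records = {}, []
provider = lambda p: ("summary text", 0.02)  # stub upstream call, $0.02
_, cost1 = handle_request(cache, records, "summarize auth module", provider)
_, cost2 = handle_request(cache, records, "summarize auth module", provider)
print(cost1, cost2, len(records))  # 0.02 0.0 1
```

The second call charges nothing and emits exactly one avoided-cost record, matching the lifecycle described above.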

Cost Curve Visualization

The cost trajectory for a typical 100-engineer team looks like this:

Daily Cost ($)

350 │ █                          ← Fill phase: high cost (days 1-3)
300 │ █ █
250 │ █ █ █
200 │ █ █ █ ──────────────────── ← Baseline (no cache)
150 │ █ █ █
100 │ █ █ █
 50 │ █ █ █
 30 │ █ █ █ █ █ █ █ █ █ █        ← Save phase: low cost (day 4+), steady state
    └─────────────────────────── Time (days)
      1 2 3 4 5 6 7 8 9 10
  • Days 1-3: Fill phase. Cost spikes as the cache populates. You pay full provider price for cache misses plus normal engineering traffic.
  • Day 4+: Save phase. Cost drops dramatically. Most requests hit cache. You only pay for genuinely new questions and code changes.
  • Steady state: After the initial fill, ongoing cost is determined by your code change rate and the rate of truly novel questions — typically 10-20% of total request volume.
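A back-of-envelope model of this trajectory, using the illustrative figures from this page ($200/day baseline, a 3-day fill ramp, ~15% novel requests in steady state); the exact ramp shape is an assumption:

```python
# Simple daily-cost model for the curve above (illustrative numbers from
# this page; the linear taper during fill is an assumption).
baseline = 200.0        # daily uncached spend ($)
fill_days = 3
novel_fraction = 0.15   # 10-20% of requests are genuinely new in steady state

def daily_cost(day):
    if day <= fill_days:
        # Fill phase: normal traffic plus fill cost that tapers off as
        # the cache populates.
        return baseline + baseline * (1 - day / (fill_days + 1))
    # Save phase: only novel questions and code changes are charged.
    return baseline * novel_fraction

for day in range(1, 8):
    print(day, round(daily_cost(day), 2))  # 350.0, 300.0, 250.0, then 30.0/day
```

Note the steady-state figure ($30/day at 15% novel traffic) lines up with the payback table later on this page.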

Cache-Hit Economics

When a request hits the org-shared cache, the following happens:

Step                          Uncached request   Cache hit
Gateway receives request      Yes                Yes
Policy evaluation (input)     Yes                Yes
Cache lookup                  Miss               Hit
Wallet reserve                Yes                Skipped
Upstream provider call        Yes                Skipped
Wallet settle                 Yes                Skipped
Token cost charged            Full price         $0
Platform fee charged          Yes                $0
Response returned             Yes                Yes
Avoided-cost record emitted   No                 Yes

The key insight: cache hits have zero marginal cost. No provider tokens, no platform fee, no wallet transaction. The only cost is the infrastructure running the cache layer itself, which is amortized across all requests.

Avoided-Cost Records

Every cache hit emits an avoided-cost record that tracks:

  • The tokens that would have been sent upstream
  • The estimated provider cost avoided
  • The cache tier that served the response (org_shared_cache or private_edge_cache)

These records power your savings dashboard and ROI reporting.
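An avoided-cost record might look like the sketch below. Only estimated_avoided_cost and the tier names appear in this doc; the other field names and the per-token prices are assumptions for illustration:

```python
# Hypothetical avoided-cost record. Field names other than
# estimated_avoided_cost, and the per-token prices, are assumptions.
def estimate_avoided_cost(input_tokens, output_tokens,
                          in_price_per_1k=0.003, out_price_per_1k=0.015):
    """Estimated provider cost ($) that a cache hit avoided."""
    return (input_tokens / 1000 * in_price_per_1k
            + output_tokens / 1000 * out_price_per_1k)

record = {
    "cache_hit": True,
    "tier": "org_shared_cache",       # or "private_edge_cache"
    "avoided_input_tokens": 1850,     # tokens that would have gone upstream
    "avoided_output_tokens": 420,
    "estimated_avoided_cost": round(estimate_avoided_cost(1850, 420), 5),
}
print(record["estimated_avoided_cost"])  # 0.01185
```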

For engineers

Declarative Configuration

Enable org-shared cache in your gateway configuration:

workflow_cache:
  enabled: true
  default_tier: org_shared_cache
  org_shared_enabled: true
  ttl_seconds: 86400
  max_entry_tokens: 32000

This configuration tells the gateway to:

  1. Check the org-shared cache before making upstream calls
  2. Store responses in the org-shared tier by default
  3. Expire entries after 24 hours (configurable)
  4. Cache responses up to 32,000 tokens

What Gets Cached

The cache stores complete provider responses keyed by a composite key:

  • org_id — your organization
  • entitlement_digest — ensures only authorized users hit cache entries
  • config_version — invalidates cache when policy changes
  • Normalized prompt content — semantic or exact matching

Importantly, key_id (the individual API key) is not part of the org-shared cache key. This is what enables cross-engineer sharing.
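A sketch of how such a composite key could be derived. The hashing scheme here is an assumption; the doc specifies only which fields participate, and that key_id does not:

```python
import hashlib

# Sketch of the composite org-shared cache key. The sha256-over-fields scheme
# is an assumption; what matters is which fields participate -- and that
# key_id (the individual API key) does not.
def org_cache_key(org_id, entitlement_digest, config_version,
                  normalized_prompt):
    material = "\x1f".join(
        [org_id, entitlement_digest, config_version, normalized_prompt])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

# Two engineers with different API keys but the same org, entitlements,
# and prompt produce the same key, so they share one cache entry:
k1 = org_cache_key("org-42", "ent-abc", "cfg-7", "summarize billing service")
k2 = org_cache_key("org-42", "ent-abc", "cfg-7", "summarize billing service")
print(k1 == k2)  # True

# A config version bump changes the key, invalidating old entries:
k3 = org_cache_key("org-42", "ent-abc", "cfg-8", "summarize billing service")
print(k1 == k3)  # False
```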

What Doesn't Get Cached

Some requests bypass the cache by design:

  • Requests with cache: skip header
  • Requests that trigger policy escalations
  • Requests where policy requires per-user isolation
  • Streaming responses (cached at completion)
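The bypass rules above can be expressed as a simple predicate. The request shape here is a hypothetical simplification, not the gateway's real request object:

```python
# The bypass rules as a predicate; the request dict shape is a hypothetical
# simplification of the gateway's real request object.
def bypasses_cache(request):
    return (request.get("headers", {}).get("cache") == "skip"
            or request.get("policy_escalated", False)
            or request.get("per_user_isolation", False)
            or request.get("streaming", False))  # cached only at completion

print(bypasses_cache({"headers": {"cache": "skip"}}))   # True
print(bypasses_cache({"streaming": True}))              # True
print(bypasses_cache({"headers": {}}))                  # False
```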

For leaders

Payback Period

The payback period for org-shared cache is the time it takes for cumulative savings to exceed the fill cost:

Payback period = Fill cost ÷ Daily savings rate

For a typical 100-engineer team:

Metric                                 Value
Daily uncached spend                   $200
Fill cost (3-day ramp)                 $800
Post-fill daily spend (85% hit rate)   $30
Daily savings                          $170
Payback period                         4.7 days

After the payback period, every day is pure savings. A team that spends $6,000/month uncached drops to under $1,000/month after fill — a 5-6× reduction.
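The arithmetic behind these figures, worked through:

```python
# The payback arithmetic from the table above.
daily_uncached = 200.0
fill_cost = 800.0                                    # 3-day fill ramp
hit_rate = 0.85
post_fill_daily = daily_uncached * (1 - hit_rate)    # $30/day
daily_savings = daily_uncached - post_fill_daily     # $170/day
payback_days = fill_cost / daily_savings             # ~4.7 days
print(round(post_fill_daily), round(daily_savings), round(payback_days, 1))
# 30 170 4.7
```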

Fill Cost Is an Investment

Frame the fill phase cost as a capital investment, not an expense:

  • It's a one-time cost that unlocks ongoing savings
  • It's proportional to codebase complexity, not team size
  • It amortizes faster as more engineers join the org
  • Incremental fills (new code, new repos) are small compared to initial fill

When Fill Cost Recurs

The cache requires re-fill in specific scenarios:

  • Major refactoring: Significant code changes invalidate cached responses about the old structure
  • Config version bump: Policy changes that affect response content invalidate relevant entries
  • TTL expiry: Entries expire after the configured TTL and must be re-filled on next request
  • New repository added: Each new repo requires its own fill phase

In practice, daily code changes cause small incremental re-fills that are barely noticeable against the savings baseline.

The Flywheel Effect

Org-shared cache creates a positive flywheel:

  1. More engineers → more cache hits → lower per-engineer cost
  2. Lower cost → more generous AI usage policies → more prompts
  3. More prompts → more cache entries → higher hit rate
  4. Higher hit rate → even lower cost per engineer

This flywheel means that adding engineers to your org doesn't proportionally increase AI cost — it actually decreases cost per engineer as the shared cache becomes more valuable.
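A toy model of the flywheel, under the loud assumption that hit rate saturates toward a ceiling as more engineers share the same fill; the specific curve and numbers are illustrative, not measured properties of the product:

```python
# Toy flywheel model. The hit-rate curve (1 - 1/n, capped at 0.99) and the
# per-prompt numbers are illustrative assumptions.
def cost_per_engineer(engineers, prompts_per_eng=50, cost_per_miss=0.02):
    hit_rate = min(0.99, 1 - 1 / engineers)
    misses = engineers * prompts_per_eng * (1 - hit_rate)
    return misses * cost_per_miss / engineers

for n in (5, 20, 100):
    print(n, round(cost_per_engineer(n), 3))
# Cost per engineer falls as the team grows: 0.2 -> 0.05 -> 0.01
```

Under any saturating hit-rate curve, per-engineer cost falls as headcount rises, which is the flywheel's core claim.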

Next steps

For AI systems

  • Canonical terms: Keeptrusts, fill-then-save, cache fill phase, cache save phase, org-shared cache, avoided-cost record, cache miss, flywheel effect.
  • Exact feature/config names: workflow_cache.enabled, workflow_cache.default_tier: org_shared_cache, ttl_seconds, max_entry_tokens, estimated_avoided_cost, cache_hit: true.
  • Best next pages: Your First 24 Hours, How 100 Engineers Share One Cache, Zero-Cost Cache Hits.

For engineers

  • Enable org-shared cache: set workflow_cache.enabled: true and default_tier: org_shared_cache in your gateway config.
  • During fill phase (days 1-3), expect cost spikes above your pre-cache baseline as the system builds shared context.
  • After fill phase, verify steady-state by checking Cost Center → Savings for declining fill cost and rising avoided cost.
  • Cache entries are keyed by org_id + entitlement_digest + config_version + normalized content; key_id is excluded to enable cross-engineer sharing.
  • Requests with cache: skip header, policy escalations, or per-user isolation requirements bypass the cache.

For leaders

  • Payback period for a 100-engineer team is typically under 5 days: ~$800 fill investment yields $170/day savings.
  • Frame the fill phase as a capital investment, not an expense — one-time cost unlocking ongoing 5-6× cost reduction.
  • Post-fill monthly spend drops from $6,000 to under $1,000 for typical teams at 85% hit rate.
  • The flywheel effect means adding engineers decreases per-engineer cost (more hits per fill dollar) rather than scaling cost linearly.