Single-Flight Fill: One Request Serves Many

When multiple engineers send the same prompt within seconds — a common occurrence at the start of the workday — the gateway collapses them into a single upstream request. One engineer pays the fill cost. Everyone else receives the cached result. This is single-flight fill.

Use this page when

  • You want to understand how concurrent identical requests are deduplicated to save cost during cache fill.
  • You are diagnosing morning-surge cost spikes and want to verify single-flight is collapsing duplicates.
  • You need to tune single-flight timeout or monitor collapse metrics in the console.

Primary audience

  • Primary: Technical Leaders
  • Secondary: Technical Engineers, AI Agents

The Morning Surge Problem

Consider a team of 100 engineers who start work at 9:00 AM. Within the first 10 minutes:

  • 60 engineers open their IDE and trigger context-loading prompts
  • Many of these prompts are identical (same codebase, same system context, same common questions)
  • Without single-flight: 60 upstream API calls, 60× the cost
  • With single-flight: ~5-8 upstream API calls, the rest served from cache

The gateway detects that multiple in-flight requests share the same cache key and holds subsequent callers until the first response returns.

How Single-Flight Works

Engineer A sends prompt → Cache miss → Forward upstream → Wait for response
Engineer B sends same prompt → Cache miss → Detect in-flight request → Wait
Engineer C sends same prompt → Cache miss → Detect in-flight request → Wait
...
Provider responds to Engineer A's request
→ Store in cache
→ Serve cached response to Engineer A
→ Serve cached response to Engineer B
→ Serve cached response to Engineer C

Only one upstream call fires. Only one fill cost is incurred. All waiting engineers receive the response with minimal additional latency (typically < 100ms beyond the first response).
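The flow above can be sketched in a few lines of Python. This is a minimal illustration of the single-flight pattern, not the gateway's implementation: the `SingleFlight` class, `fake_upstream`, and the `"prompt-hash"` key are hypothetical names used only for this example.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class SingleFlight:
    """Collapse concurrent calls that share a cache key into one upstream call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}  # cache key -> Future owned by the flight leader
        self.cache = {}       # cache key -> stored response

    def get(self, key, fetch):
        with self._lock:
            if key in self.cache:                # normal cache hit, no flight needed
                return self.cache[key]
            future = self._in_flight.get(key)
            leader = future is None
            if leader:                           # first caller becomes the leader
                future = Future()
                self._in_flight[key] = future
        if not leader:
            return future.result()               # waiter: block until the fill lands
        try:
            response = fetch(key)                # the only upstream call for this key
        except Exception as exc:
            with self._lock:
                del self._in_flight[key]         # failed fills are not cached
            future.set_exception(exc)            # waiters receive the same error
            raise
        with self._lock:
            self.cache[key] = response           # store first, then release waiters
            del self._in_flight[key]
        future.set_result(response)
        return response


upstream_calls = []


def fake_upstream(key):
    upstream_calls.append(key)
    time.sleep(0.2)                              # stand-in for provider latency
    return f"response for {key}"


gateway = SingleFlight()
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda _: gateway.get("prompt-hash", fake_upstream), range(3)))

print(len(upstream_calls), "upstream call for", len(results), "requests")
```

Three callers share one key, so only the leader reaches `fake_upstream`; the other two either wait on the leader's future or, if they arrive late, read the stored entry.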

Cost Impact

For the morning surge example:

| Metric | Without Single-Flight | With Single-Flight |
| --- | --- | --- |
| Upstream calls | 60 | ~6 |
| Fill cost (at $0.08/request avg) | $4.80 | $0.48 |
| Cache entries created | 60 (duplicates) | 6 (unique) |
| Latency for first requester | ~2s | ~2s |
| Latency for collapsed requesters | ~2s each (parallel) | ~2.1s (wait + serve) |

Over a month with daily surges, the avoided fill cost adds up: $4.32 per morning surge is roughly $86 across 20 working days for this 10-minute window alone, before collapses later in the day and the ongoing cache-hit savings are counted.
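The arithmetic behind the morning surge figures can be reproduced with a small helper. This is a sketch under the example's own assumptions (60 requests, 6 unique cache keys, an average fill cost of $0.08); `fill_cost_cents` is a hypothetical function, and amounts are kept in integer cents to avoid floating-point rounding.

```python
def fill_cost_cents(requests, unique_keys, cents_per_fill=8, single_flight=True):
    """Fill cost in cents for one window; 8 cents mirrors the $0.08 average."""
    # With single-flight, each unique cache key triggers one upstream call.
    # Without it, every request goes upstream.
    upstream_calls = unique_keys if single_flight else requests
    return upstream_calls * cents_per_fill


without_sf = fill_cost_cents(60, 6, single_flight=False)  # 480 cents = $4.80
with_sf = fill_cost_cents(60, 6)                          # 48 cents  = $0.48
saved = without_sf - with_sf                              # 432 cents = $4.32

print(f"saved ${saved / 100:.2f} ({saved / without_sf:.0%} of the uncollapsed cost)")
```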

Viewing Single-Flight Metrics

Navigate to Cost Center → Cache Performance in the console to see:

| Metric | Description |
| --- | --- |
| single_flight_collapses | Number of requests that waited on an in-flight fill |
| single_flight_groups | Number of distinct cache keys that had multiple waiters |
| avg_waiters_per_group | Average number of requests collapsed per single-flight group |
| max_waiters_per_group | Peak collapse count (useful for capacity planning) |

Example Metrics Output

{
  "period": "2026-04-01T09:00:00Z/2026-04-01T09:10:00Z",
  "single_flight_collapses": 54,
  "single_flight_groups": 6,
  "avg_waiters_per_group": 9,
  "max_waiters_per_group": 18,
  "fill_cost_incurred": 0.48,
  "fill_cost_avoided_by_collapse": 4.32
}

In this 10-minute window, single-flight saved $4.32 by reducing 60 potential upstream calls to 6.
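To sanity-check a metrics payload like this one, the derived figures can be recomputed from the raw counts. The sketch below assumes that every upstream call in the window belonged to a single-flight group, which holds for this example (6 groups, 6 upstream calls) but not necessarily for a live export.

```python
import json

metrics = json.loads("""
{
  "period": "2026-04-01T09:00:00Z/2026-04-01T09:10:00Z",
  "single_flight_collapses": 54,
  "single_flight_groups": 6,
  "avg_waiters_per_group": 9,
  "max_waiters_per_group": 18,
  "fill_cost_incurred": 0.48,
  "fill_cost_avoided_by_collapse": 4.32
}
""")

collapses = metrics["single_flight_collapses"]
groups = metrics["single_flight_groups"]

# Assumption: one leader per group and no requests outside a group.
total_requests = collapses + groups                          # 60
collapse_rate = collapses / total_requests                   # share served without an upstream call
avg_waiters = collapses / groups                             # should match avg_waiters_per_group
cost_per_fill = metrics["fill_cost_incurred"] / groups       # average fill cost in dollars
implied_avoided = collapses * cost_per_fill                  # should match fill_cost_avoided_by_collapse

print(f"{total_requests} requests, {collapse_rate:.0%} collapsed, "
      f"${implied_avoided:.2f} avoided")
```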

Latency Characteristics

Single-flight introduces minimal latency for waiting requests:

  • First requester: Normal upstream latency (1-5 seconds depending on model and token count)
  • Collapsed requesters: First requester's latency + ~10-50ms (local cache serve time)
  • No requester is dropped mid-fill: The gateway holds waiters until the upstream response arrives or the upstream call fails (within the single_flight_timeout_ms limit, where one is configured)

If the upstream call fails, all waiters receive the error. The failed response is not cached.
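The failure behavior can be illustrated with a small asyncio sketch: the leader's error is delivered to every waiter, and nothing is written to the cache. This is an assumption-laden illustration rather than the gateway's code; `get`, `failing_upstream`, and the module-level `cache` and `in_flight` dictionaries are hypothetical.

```python
import asyncio

cache = {}            # cache key -> response
in_flight = {}        # cache key -> asyncio.Future shared by leader and waiters
upstream_calls = []   # records every call that actually reached the provider


async def get(key, fetch):
    if key in cache:
        return cache[key]
    if key in in_flight:
        return await in_flight[key]          # waiter: shares the leader's outcome
    future = asyncio.get_running_loop().create_future()
    in_flight[key] = future
    try:
        response = await fetch(key)
    except Exception as exc:
        future.set_exception(exc)            # every waiter sees the same error
        future.exception()                   # mark as retrieved in case nobody is waiting
        raise
    else:
        cache[key] = response                # only successful fills are cached
        future.set_result(response)
        return response
    finally:
        del in_flight[key]


async def failing_upstream(key):
    upstream_calls.append(key)
    await asyncio.sleep(0.05)                # stand-in for provider latency
    raise RuntimeError("provider unavailable")


async def main():
    return await asyncio.gather(
        *(get("prompt-hash", failing_upstream) for _ in range(3)),
        return_exceptions=True,
    )


outcomes = asyncio.run(main())
print(len(upstream_calls), "upstream call,", len(outcomes), "errors, cache:", cache)
```

Because the failed key is removed from `in_flight` and never stored, the next request for the same prompt starts a fresh upstream call instead of replaying the error.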

When Single-Flight Activates

Single-flight activates when:

  1. A request produces a cache key that matches an in-flight upstream call
  2. The in-flight call has not yet completed
  3. The requesting engineer has valid entitlements for the cached content

Single-flight does not activate when:

  • The cache already contains a valid entry (this is a normal cache hit, even cheaper)
  • The in-flight request targets a different model or configuration
  • Policy evaluation produces a different outcome for the waiting request
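These conditions can be read as a key comparison plus two checks. The sketch below is illustrative only: this page does not document how the gateway derives cache keys, so `cache_key`, its inputs (`model`, `config`, `prompt`), and `should_join_flight` are hypothetical stand-ins.

```python
import hashlib
import json


def cache_key(model, config, prompt):
    """Hypothetical key: a hash over model, configuration, and prompt."""
    payload = json.dumps(
        {"model": model, "config": config, "prompt": prompt},
        sort_keys=True,                 # dict ordering must not change the key
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def should_join_flight(key, cached_keys, in_flight_keys, entitled, same_policy_outcome):
    """Apply the activation rules listed above to one incoming request."""
    if key in cached_keys:
        return False                    # valid entry exists: a normal cache hit
    return key in in_flight_keys and entitled and same_policy_outcome


k1 = cache_key("model-a", {"temperature": 0, "max_tokens": 512}, "Explain the build")
k2 = cache_key("model-a", {"max_tokens": 512, "temperature": 0}, "Explain the build")
k3 = cache_key("model-b", {"temperature": 0, "max_tokens": 512}, "Explain the build")

print(k1 == k2, k1 == k3)
print(should_join_flight(k2, set(), {k1}, entitled=True, same_policy_outcome=True))
```

The second request shares the first one's key and can join its flight; the third targets a different model, produces a different key, and goes upstream on its own.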

Interaction with Wallet Reserve

For single-flight collapsed requests:

  • The first requester's wallet is reserved and settled normally (fill cost)
  • Collapsed requesters are not charged — they receive a cache hit once the fill completes
  • Avoided-cost records are emitted for all collapsed requesters

This means single-flight requests appear in your savings dashboard as avoided cost, just like regular cache hits.
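As a rough model of the accounting described above, one flight produces a single charge and one avoided-cost record per collapsed requester. The `LedgerEntry` type and `settle_flight` function are hypothetical, and the sketch assumes each waiter's avoided cost equals the fill cost, which matches the $0.08 per collapse used elsewhere on this page.

```python
from dataclasses import dataclass


@dataclass
class LedgerEntry:
    engineer: str
    charged_cents: int   # settled against the engineer's wallet
    avoided_cents: int   # reported as avoided cost in the savings dashboard


def settle_flight(leader, waiters, fill_cost_cents):
    """Charge the flight leader; emit avoided-cost records for everyone else."""
    entries = [LedgerEntry(leader, fill_cost_cents, 0)]
    entries += [LedgerEntry(waiter, 0, fill_cost_cents) for waiter in waiters]
    return entries


entries = settle_flight("engineer-a", ["engineer-b", "engineer-c"], fill_cost_cents=8)
total_charged = sum(e.charged_cents for e in entries)   # 8 cents, paid by the leader
total_avoided = sum(e.avoided_cents for e in entries)   # 16 cents across two waiters

for entry in entries:
    print(entry)
```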

Maximizing Single-Flight Effectiveness

To increase the frequency of single-flight collapses:

  1. Standardize IDE configurations — Consistent system prompts produce identical cache keys
  2. Use shared gateway configurations — All engineers on the same codebase should route through the same gateway config
  3. Align work schedules loosely — Teams starting at similar times naturally produce concurrent identical requests
  4. Keep codebase context stable — Rapid file changes between requests reduce key overlap
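Items 1 and 4 come down to making equivalent requests byte-identical before a cache key is computed. The following is a minimal sketch of that idea, assuming normalization happens on the client or in a shared configuration layer; `normalize_request` is a hypothetical helper, not a gateway API.

```python
def normalize_request(system_prompt, context_files):
    """Return a canonical (prompt, files) pair so equivalent requests match."""
    prompt = " ".join(system_prompt.split())      # collapse whitespace differences
    files = tuple(sorted(set(context_files)))     # ignore ordering and duplicates
    return prompt, files


a = normalize_request("You are a  code assistant.\n", ["src/b.py", "src/a.py"])
b = normalize_request("You are a code assistant.", ["src/a.py", "src/b.py", "src/a.py"])

print(a == b)
```

Sorting context is only safe when the order of files does not affect the answer you expect from the model; where order matters, standardize it instead of discarding it.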

Monitoring and Alerting

Set up alerts for single-flight metrics:

  • High collapse counts indicate good cost efficiency but may suggest capacity planning needs
  • Zero collapses during peak hours suggest cache keys are too diverse (investigate context ordering)
  • Failed single-flight groups (upstream errors affecting multiple waiters) warrant provider reliability review
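The three conditions above can be expressed as a simple rule check over one metrics window. This is a sketch with hypothetical pieces: the threshold value, the alert names, and the `single_flight_failed_groups` field are assumptions, since this page does not name a failure metric.

```python
def evaluate_alerts(metrics, is_peak_hours, high_collapse_threshold=1000):
    """Return alert names for one metrics window, following the three rules above."""
    alerts = []
    collapses = metrics.get("single_flight_collapses", 0)
    if collapses >= high_collapse_threshold:
        alerts.append("high_collapse_count")          # efficient, but review capacity
    if is_peak_hours and collapses == 0:
        alerts.append("zero_collapses_at_peak")       # cache keys may be too diverse
    if metrics.get("single_flight_failed_groups", 0) > 0:   # hypothetical metric name
        alerts.append("failed_single_flight_groups")  # review provider reliability
    return alerts


quiet_peak = evaluate_alerts({"single_flight_collapses": 0}, is_peak_hours=True)
healthy = evaluate_alerts({"single_flight_collapses": 54}, is_peak_hours=True)
failing = evaluate_alerts(
    {"single_flight_collapses": 54, "single_flight_failed_groups": 2},
    is_peak_hours=False,
)

print(quiet_peak, healthy, failing)
```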

Scale Example: 100 Engineers, One Monorepo

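| Time of Day | Requests | Unique Keys | Single-Flight Collapses | Upstream Calls | Savings |
| --- | --- | --- | --- | --- | --- |
| 9:00-9:10 AM | 85 | 12 | 73 | 12 | $5.84 |
| 9:10-10:00 AM | 200 | 45 | 155 | 45 | $12.40 |
| 10:00 AM-12:00 PM | 400 | 180 | 220 | 180 | $17.60 |
| Full day | 2,500 | 800 | 1,700 | 800 | $136.00 |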

Daily savings from single-flight alone: $136 (1,700 collapses at $0.08 each). Monthly, assuming 20 working days: ~$2,720.

This is in addition to regular cache-hit savings from previously filled entries.

Next steps

For AI systems

  • Canonical terms: Keeptrusts, single-flight fill, request deduplication, org-shared cache, cache fill cost, concurrent request collapse.
  • Exact feature/config names: single_flight_enabled, single_flight_timeout_ms, single_flight_collapses metric, single_flight_groups metric, workflow_cache config block.
  • Best next pages: Zero-Cost Cache Hits, Wallet Integration with Cache Hits, Tracking Avoided Cost.

For engineers

  • Single-flight activates automatically when single_flight_enabled: true in your workflow_cache configuration.
  • Monitor single-flight effectiveness via Cost Center → Cache Performance metrics: single_flight_collapses, avg_waiters_per_group.
  • Zero collapses during peak hours suggest cache keys are too diverse; check context ordering and prompt normalization.
  • Collapsed requesters pay nothing; only the flight leader's wallet is reserved/settled.

For leaders

  • Single-flight fill reduces the "morning surge" cost spike by 85-90% for teams starting work at similar times.
  • No per-engineer licensing or configuration is required — the gateway handles deduplication transparently.
  • Monthly savings scale with team size: 100 engineers on one monorepo can save ~$2,700/month from single-flight alone.
  • Combine with org-shared cache hits for compound savings (single-flight prevents duplicate fills; cache hits prevent all subsequent calls).