Estimating Fill Cost for a New Repository

The initial cache fill is a one-time investment that unlocks ongoing savings. Before connecting a new repository, estimate the fill cost so you can budget appropriately and set expectations with stakeholders.

Use this page when

  • You are connecting a new repository and need to budget the initial cache fill cost.
  • You want per-artifact-type cost formulas to estimate fabric build expense.
  • You are comparing fill cost against annual uncached spend to justify the investment.

Primary audience

  • Primary: Technical Leaders
  • Secondary: Technical Engineers, AI Agents

What "Fill Cost" Means

Fill cost is the total LLM provider spend required to:

  1. Build fabric artifacts — summarize files, generate embeddings, map dependencies
  2. Populate the response cache — the first request for each common question pays full price

Fabric artifact build is a predictable, calculable cost. Response cache population is driven by organic traffic and happens gradually over days.

Fill Cost Formula

Total fill cost = Fabric build cost + Response cache fill cost

Where:

Fabric build cost = Σ (artifact_type_cost × count_per_type)
Response cache fill cost ≈ Daily uncached spend × fill_days × (1 - early_hit_rate)

In practice, fabric build cost is the smaller and more predictable component. Response cache fill cost is usually larger and depends on traffic patterns.
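The two formulas above can be combined in a few lines. This is a minimal sketch rather than the product's implementation; the function names and the sample inputs are hypothetical and chosen only to exercise the arithmetic.

```python
def fabric_build_cost(artifacts: dict[str, tuple[float, int]]) -> float:
    """Sum of artifact_type_cost x count_per_type, in USD."""
    return sum(unit_cost * count for unit_cost, count in artifacts.values())


def response_cache_fill_cost(daily_uncached_spend: float,
                             fill_days: int,
                             early_hit_rate: float) -> float:
    """Approximate spend on cache misses during the fill window, in USD."""
    return daily_uncached_spend * fill_days * (1 - early_hit_rate)


def total_fill_cost(artifacts, daily_uncached_spend, fill_days, early_hit_rate):
    """Total fill cost = fabric build cost + response cache fill cost."""
    return (fabric_build_cost(artifacts)
            + response_cache_fill_cost(daily_uncached_spend, fill_days, early_hit_rate))


# Hypothetical inputs: 1,000 file summaries at $0.0054 each, one repo map at $0.05,
# $75/day of uncached spend, a 5-day fill, and a 50% average hit rate during the fill.
example = total_fill_cost(
    {"file_summary": (0.0054, 1000), "repo_map": (0.05, 1)},
    daily_uncached_spend=75.0,
    fill_days=5,
    early_hit_rate=0.50,
)
print(f"Total fill cost: ${example:.2f}")
```

With these inputs the response cache term ($187.50) dwarfs the fabric term ($5.45), which is the usual shape of the estimate.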

Fabric Build Cost by Artifact Type

repo_map

Generates a high-level structural map of the repository.

Factor | Impact on cost
Number of top-level directories | Low (1-3 LLM calls total)
Repository documentation | Low (adds context to calls)

Typical cost: $0.02-0.10 per repository

Formula: repo_map_cost ≈ 3 calls × avg_call_cost

file_summary

Generates natural language summaries for each source file.

Factor | Impact on cost
Number of source files | High (1 call per file)
Average file size | Medium (larger files = more input tokens)
Language complexity | Low (all languages similar cost)

Typical cost: $0.002-0.008 per file

Formula: file_summary_cost = num_files × avg_tokens_per_file × input_cost_per_token + num_files × avg_summary_tokens × output_cost_per_token

Example for 1,000 files:

Input: 1,000 files × 800 avg tokens × $3/1M = $2.40
Output: 1,000 files × 200 avg summary tokens × $15/1M = $3.00
Total: $5.40
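The file_summary formula is easy to script when you want to try different file counts or token prices. The sketch below assumes the same per-million-token prices as the example ($3 input, $15 output); the function name and parameter names are illustrative, not part of any API.

```python
def file_summary_cost(num_files: int,
                      avg_tokens_per_file: float,
                      avg_summary_tokens: float,
                      input_usd_per_mtok: float = 3.0,
                      output_usd_per_mtok: float = 15.0) -> tuple[float, float, float]:
    """Return (input_cost, output_cost, total_cost) in USD for summarizing every file."""
    input_cost = num_files * avg_tokens_per_file * input_usd_per_mtok / 1_000_000
    output_cost = num_files * avg_summary_tokens * output_usd_per_mtok / 1_000_000
    return input_cost, output_cost, input_cost + output_cost


# Reproduces the 1,000-file example: 800 input tokens and 200 summary tokens per file.
inp, out, total = file_summary_cost(1_000, 800, 200)
print(f"Input ${inp:.2f} + output ${out:.2f} = ${total:.2f}")
```

Substitute your provider's actual token prices; the defaults here are only the ones used in the worked example.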

dependency_graph

Maps module and package dependencies.

Factor | Impact on cost
Number of modules | Low (static analysis dominant)
Dependency depth | Low (graph traversal, not LLM)

Typical cost: $0.01-0.05 per repository (mostly static analysis, minimal LLM)

test_map

Maps test files to source files.

Factor | Impact on cost
Number of test files | Low-Medium
Test framework complexity | Low

Typical cost: $0.05-0.30 per repository

Formula: test_map_cost ≈ num_test_files × 0.001

api_inventory

Catalogs API endpoints and their schemas.

Factor | Impact on cost
Number of route handlers | Medium
Middleware complexity | Low
Number of endpoints | Medium

Typical cost: $0.10-1.00 per repository (depends on API surface area)

Formula: api_inventory_cost ≈ num_endpoints × 0.005

symbol_index

Indexes functions, classes, and types.

Factor | Impact on cost
Any | None (pure static analysis)

Typical cost: $0.00 (no LLM calls, AST parsing only)

embedding_index

Generates semantic embedding vectors for code chunks.

Factor | Impact on cost
Total lines of code | High (determines number of chunks)
Chunk size setting | Medium (smaller chunks = more embeddings)

Typical cost: $0.0001 per chunk (embedding models are cheap)

Formula: embedding_cost = (total_lines ÷ chunk_size) × embedding_cost_per_call

Example for 100,000 lines (200-line chunks):

Chunks: 100,000 ÷ 200 = 500 chunks
Cost: 500 × $0.0001 = $0.05
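To see how the chunk size setting moves the embedding cost, the formula can be wrapped in a small helper. This is a sketch under two assumptions: chunk size is measured in lines, and a partial final chunk still needs one embedding call (hence the rounding up, which the formula above does not spell out). The function name is hypothetical.

```python
import math


def embedding_index_cost(total_lines: int,
                         chunk_size_lines: int = 200,
                         cost_per_chunk: float = 0.0001) -> tuple[int, float]:
    """Return (number_of_chunks, cost_in_usd) for embedding a repository."""
    # Assumption: a trailing partial chunk is embedded too, so round up.
    chunks = math.ceil(total_lines / chunk_size_lines)
    return chunks, chunks * cost_per_chunk


# The 100,000-line example with 200-line chunks, then with smaller 50-line chunks.
chunks_200, cost_200 = embedding_index_cost(100_000, 200)
chunks_50, cost_50 = embedding_index_cost(100_000, 50)
print(f"200-line chunks: {chunks_200} chunks, ${cost_200:.2f}")
print(f" 50-line chunks: {chunks_50} chunks, ${cost_50:.2f}")
```

Quartering the chunk size quadruples the number of embeddings, but the absolute cost stays small next to file_summary.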

recent_change_summary

Summarizes recent commit history.

Factor | Impact on cost
Number of recent commits | Low-Medium
Commit message quality | Low

Typical cost: $0.05-0.20 per repository

known_failure_fingerprint

Catalogs known error patterns and their resolutions.

Factor | Impact on cost
CI/CD log availability | Medium
Number of known failures | Low

Typical cost: $0.02-0.10 per repository
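The per-artifact figures in this section can be rolled up into a low/high range for a specific repository. The sketch below is an assumption-laden illustration: it uses the typical per-repository ranges quoted above for the flat-cost artifacts, the per-file, per-test-file, per-endpoint, and per-chunk rates for the rest, and a hypothetical function name and input set.

```python
# Typical per-repository ranges (USD) quoted in this section for flat-cost artifacts.
FLAT_RANGES = {
    "repo_map": (0.02, 0.10),
    "dependency_graph": (0.01, 0.05),
    "recent_change_summary": (0.05, 0.20),
    "known_failure_fingerprint": (0.02, 0.10),
    "symbol_index": (0.00, 0.00),  # pure AST parsing, no LLM calls
}


def fabric_build_range(num_files: int, num_test_files: int,
                       num_endpoints: int, num_chunks: int) -> tuple[float, float]:
    """Return a (low, high) fabric build estimate in USD."""
    low = sum(lo for lo, _ in FLAT_RANGES.values())
    high = sum(hi for _, hi in FLAT_RANGES.values())
    # file_summary: $0.002-0.008 per file
    low += num_files * 0.002
    high += num_files * 0.008
    # test_map, api_inventory, and embedding_index use single-rate formulas
    scaled = num_test_files * 0.001 + num_endpoints * 0.005 + num_chunks * 0.0001
    return low + scaled, high + scaled


# Hypothetical repository: 1,000 files, 150 test files, 80 endpoints, 500 chunks.
low, high = fabric_build_range(1_000, 150, 80, 500)
print(f"Fabric build estimate: ${low:.2f}-${high:.2f}")
```

Most of the spread between the low and high ends comes from the file_summary term, so better data on average file size narrows the range fastest.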

Cost Estimation by Repository Size

Small Repository (fewer than 10,000 lines, fewer than 100 files)

Artifact | Estimated cost
repo_map | $0.03
file_summary | $0.50
dependency_graph | $0.02
test_map | $0.05
api_inventory | $0.10
symbol_index | $0.00
embedding_index | $0.01
recent_change_summary | $0.05
known_failure_fingerprint | $0.03
Total fabric build | $0.79

Medium Repository (10,000-100,000 lines, 100-1,000 files)

Artifact | Estimated cost
repo_map | $0.05
file_summary | $3.00-8.00
dependency_graph | $0.03
test_map | $0.15
api_inventory | $0.40
symbol_index | $0.00
embedding_index | $0.05-0.10
recent_change_summary | $0.10
known_failure_fingerprint | $0.05
Total fabric build | $3.83-8.88

Large Repository (100,000-500,000 lines, 1,000-5,000 files)

Artifact | Estimated cost
repo_map | $0.08
file_summary | $8.00-30.00
dependency_graph | $0.05
test_map | $0.30
api_inventory | $1.00-3.00
symbol_index | $0.00
embedding_index | $0.10-0.50
recent_change_summary | $0.15
known_failure_fingerprint | $0.08
Total fabric build | $9.76-34.16

Very Large Repository (500,000+ lines, 5,000+ files)

Artifact | Estimated cost
repo_map | $0.10
file_summary | $30.00-80.00
dependency_graph | $0.08
test_map | $0.50
api_inventory | $3.00-8.00
symbol_index | $0.00
embedding_index | $0.50-2.50
recent_change_summary | $0.20
known_failure_fingerprint | $0.10
Total fabric build | $34.48-91.48
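The four brackets can be turned into a simple lookup. The sketch below is illustrative only: the function name is hypothetical, the upper bounds are treated as exclusive, and when line count and file count point at different brackets it picks the larger one. Both of those choices are assumptions, since the bracket definitions do not say how to handle boundaries or mismatches.

```python
# (name, max_lines_exclusive, max_files_exclusive, low_usd, high_usd)
BRACKETS = [
    ("small", 10_000, 100, 0.79, 0.79),
    ("medium", 100_000, 1_000, 3.83, 8.88),
    ("large", 500_000, 5_000, 9.76, 34.16),
]
VERY_LARGE = ("very large", 34.48, 91.48)


def fabric_bracket(total_lines: int, num_files: int) -> tuple[str, float, float]:
    """Return (bracket_name, low_usd, high_usd) for a repository."""
    for name, max_lines, max_files, low, high in BRACKETS:
        # Both dimensions must fit; otherwise fall through to the next bracket.
        if total_lines < max_lines and num_files < max_files:
            return name, low, high
    return VERY_LARGE


print(fabric_bracket(50_000, 400))
print(fabric_bracket(8_000, 300))  # few lines but many files: treated as medium
```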

Response Cache Fill Cost

Beyond fabric, the response cache fills as engineers work. This cost is:

Response fill cost ≈ daily_traffic × (1 - early_hit_rate) × avg_cost_per_request × fill_days

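For a 100-engineer team sending 5,000 requests per day at an average of $0.015 per request, with the hit rate rising from 10% on day 1 to 78% on day 5: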

Day 1: 5,000 requests × (1 - 0.10) × $0.015 = $67.50
Day 2: 5,000 requests × (1 - 0.35) × $0.015 = $48.75
Day 3: 5,000 requests × (1 - 0.55) × $0.015 = $33.75
Day 4: 5,000 requests × (1 - 0.70) × $0.015 = $22.50
Day 5: 5,000 requests × (1 - 0.78) × $0.015 = $16.50

Total 5-day response fill cost: ~$189
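The day-by-day figures apply the response fill formula once per day with a different hit rate each time. A minimal sketch of that calculation follows; the function name is hypothetical, and the hit-rate schedule is simply the one used in the example, not a measured curve.

```python
def response_fill_by_day(daily_requests: int,
                         avg_cost_per_request: float,
                         hit_rates: list[float]) -> list[float]:
    """Cost of cache misses for each day, given that day's hit rate."""
    return [daily_requests * (1 - rate) * avg_cost_per_request for rate in hit_rates]


# Hit rates for days 1-5 from the 100-engineer example.
daily_costs = response_fill_by_day(5_000, 0.015, [0.10, 0.35, 0.55, 0.70, 0.78])
for day, cost in enumerate(daily_costs, start=1):
    print(f"Day {day}: ${cost:.2f}")
print(f"Total: ${sum(daily_costs):.2f}")
```

Swapping in your own request volume and an observed hit-rate curve gives a fill estimate specific to your team.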

This is not "extra" cost — it's what you would have spent anyway, just with the bonus of building the cache layer for future savings.

Total Fill Cost vs. Annual Uncached Spend

The fill cost is small compared to annual uncached spend:

Repository size | Fabric build | Response fill (5 days) | Total fill | Annual uncached spend | Fill as % of annual
Small | $0.79 | $50 | ~$51 | $12,000 | 0.4%
Medium | $6.00 | $120 | ~$126 | $36,000 | 0.4%
Large | $20.00 | $189 | ~$209 | $48,000 | 0.4%
Very Large | $60.00 | $250 | ~$310 | $60,000 | 0.5%

The fill cost is consistently under 1% of annual uncached spend. It pays for itself within the first week.
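The percentage column is total fill divided by annual uncached spend. The sketch below recomputes it from the table's inputs so you can plug in your own numbers; the function and variable names are hypothetical.

```python
def fill_percent_of_annual(fabric_build: float,
                           response_fill: float,
                           annual_uncached_spend: float) -> float:
    """Total fill cost as a percentage of annual uncached spend."""
    return (fabric_build + response_fill) / annual_uncached_spend * 100


# (fabric build, 5-day response fill, annual uncached spend) per repository size
rows = {
    "Small": (0.79, 50.0, 12_000.0),
    "Medium": (6.00, 120.0, 36_000.0),
    "Large": (20.00, 189.0, 48_000.0),
    "Very Large": (60.00, 250.0, 60_000.0),
}
percents = {size: fill_percent_of_annual(*values) for size, values in rows.items()}
for size, pct in percents.items():
    print(f"{size}: {pct:.2f}% of annual uncached spend")
```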

Ongoing Refill Cost

After initial fill, the cache requires maintenance fills:

Daily Incremental Cost

Daily refill = changed_files × file_summary_cost + new_questions × miss_cost

For a typical day (30 files changed, 500 new unique questions):

Fabric refresh: 30 × $0.005 = $0.15
New response fills: 500 × $0.015 = $7.50
Total daily refill: $7.65

Compare this to daily savings of $150-200: the refill is roughly 4-5% of savings.
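The daily refill formula and its share of savings can be checked with a short script. This is a sketch using the typical-day inputs above; the function name is hypothetical.

```python
def daily_refill_cost(changed_files: int, file_summary_cost: float,
                      new_questions: int, miss_cost: float) -> float:
    """Daily refill = fabric refresh for changed files + first-time response misses."""
    return changed_files * file_summary_cost + new_questions * miss_cost


refill = daily_refill_cost(30, 0.005, 500, 0.015)
print(f"Daily refill: ${refill:.2f}")
for daily_savings in (150.0, 200.0):
    share = refill / daily_savings * 100
    print(f"  {share:.1f}% of ${daily_savings:.0f} daily savings")
```

New unique questions account for almost all of the refill, so the number to watch is how many genuinely new questions your team asks each day.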

Monthly Maintenance Budget

Team size | Daily refill | Monthly refill | Monthly savings | Net monthly savings
50 engineers | $4.00 | $80 | $2,500 | $2,420
100 engineers | $7.65 | $153 | $4,000 | $3,847
200 engineers | $12.00 | $240 | $8,000 | $7,760

Monthly refill assumes 20 working days per month.
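The monthly refill column equals the daily refill multiplied by 20, so the sketch below assumes 20 working days per month. The function name is hypothetical; the inputs are the three rows from the table.

```python
WORKING_DAYS_PER_MONTH = 20  # assumption implied by the table's monthly refill column


def monthly_budget(daily_refill: float, monthly_savings: float,
                   working_days: int = WORKING_DAYS_PER_MONTH) -> tuple[float, float]:
    """Return (monthly_refill, net_monthly_savings) in USD."""
    monthly_refill = daily_refill * working_days
    return monthly_refill, monthly_savings - monthly_refill


teams = {50: (4.00, 2_500.0), 100: (7.65, 4_000.0), 200: (12.00, 8_000.0)}
budgets = {size: monthly_budget(*inputs) for size, inputs in teams.items()}
for size, (refill, net) in budgets.items():
    print(f"{size} engineers: refill ${refill:.0f}/month, net savings ${net:,.0f}/month")
```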

Planning Your Fill Budget

Step 1: Identify Repository Size

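# Count source files, pruning vendor, node_modules, and build output
# (adjust the pruned directory names and extensions to your repo)
find . -type d \( -name node_modules -o -name vendor -o -name target -o -name dist \) -prune \
  -o -type f \( -name "*.ts" -o -name "*.rs" -o -name "*.py" \) -print | wc -l

# Count total lines; -print0 with xargs -0 handles spaces in paths, and
# cat | wc -l yields one total even when xargs splits the file list
find . -type d \( -name node_modules -o -name vendor -o -name target -o -name dist \) -prune \
  -o -type f \( -name "*.ts" -o -name "*.rs" -o -name "*.py" \) -print0 | xargs -0 cat | wc -l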
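If you would rather collect both numbers in one pass, or you are on a system without find and xargs, the same count can be done in Python. This is a sketch: the extension set matches the commands above, while the excluded directory names are an assumed list you should adapt to your repository.

```python
import os

SOURCE_EXTENSIONS = {".ts", ".rs", ".py"}
# Assumed list of directories to skip; adjust for your repository layout.
EXCLUDED_DIRS = {"node_modules", "vendor", "target", "dist", ".git"}


def count_files_and_lines(root: str = ".") -> tuple[int, int]:
    """Return (source_file_count, total_line_count) under root."""
    num_files = 0
    num_lines = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into excluded dirs.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in filenames:
            if os.path.splitext(name)[1] in SOURCE_EXTENSIONS:
                num_files += 1
                # Binary mode avoids decoding errors on files with unusual encodings.
                with open(os.path.join(dirpath, name), "rb") as handle:
                    num_lines += sum(1 for _ in handle)
    return num_files, num_lines
```

Calling count_files_and_lines(".") from the repository root returns the two numbers needed for the size brackets. A final line without a trailing newline is counted here, so the total can differ slightly from wc -l.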

Step 2: Estimate Using Size Bracket

Match your repo to the size brackets above and use the estimated total.

Step 3: Add Response Fill Budget

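Add 3-5 days of normal spend as response fill budget. Using the full 5 days is a conservative upper bound: the hit rate rises during the fill, so actual spend is lower (in the 100-engineer example above, ~$189 against $375 of uncached spend over 5 days).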

Response fill budget = current_daily_AI_spend × 5

Step 4: Set Wallet Balance

Ensure your org wallet has:

Required balance = fabric_build_estimate + response_fill_budget + 20% buffer
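Steps 3 and 4 chain together into one number. The sketch below assumes the 20% buffer is applied to the sum of the fabric estimate and the response fill budget, which is one reasonable reading of the formula; the function names and sample inputs are hypothetical.

```python
def response_fill_budget(current_daily_ai_spend: float, fill_days: int = 5) -> float:
    """Step 3: budget a fixed number of days of normal spend for the response fill."""
    return current_daily_ai_spend * fill_days


def required_wallet_balance(fabric_build_estimate: float,
                            current_daily_ai_spend: float,
                            fill_days: int = 5,
                            buffer: float = 0.20) -> float:
    """Step 4: fabric estimate plus response fill budget, plus a percentage buffer."""
    base = fabric_build_estimate + response_fill_budget(current_daily_ai_spend, fill_days)
    # Assumption: the buffer applies to the combined estimate, not to one component.
    return base * (1 + buffer)


# Hypothetical inputs: upper end of the medium bracket ($8.88) and $75/day of AI spend.
balance = required_wallet_balance(8.88, 75.0)
print(f"Required wallet balance: ${balance:.2f}")
```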

Step 5: Communicate to Stakeholders

Frame the fill cost as:

  • A one-time investment that pays back within 1 week
  • Less than 1% of what you'd spend annually without caching
  • The "price of admission" to 80-95% ongoing savings

For AI systems

  • Canonical terms: Keeptrusts, fill cost, fabric build cost, response cache fill, per-artifact cost, file_summary, embedding_index, symbol_index, cache investment.
  • Formulas: Total fill = Fabric build + Response cache fill, Fabric build = Σ(artifact_type_cost × count), Response fill ≈ daily_traffic × (1 - early_hit_rate) × avg_cost × fill_days.
  • Best next pages: Budget Alerts for Fill Phases, Cache Hit Rates, ROI Calculation for a 100-Engineer Team.

For engineers

  • symbol_index costs $0 (pure AST parsing). embedding_index is cheap ($0.0001/chunk). file_summary dominates cost (1 LLM call per file).
  • Size your repo: find . -name "*.ts" -o -name "*.rs" | wc -l gives the file count; piping the same find into xargs cat | wc -l gives LOC.
  • Fabric build by size: small (fewer than 100 files) ~$0.79; medium (100-1,000 files) ~$4-$9; large (1,000-5,000 files) ~$10-$34; very large (5,000+ files) ~$34-$91.
  • Response fill budget: multiply current daily AI spend by 5 days as fill budget.
  • Wallet balance needed: fabric_estimate + response_fill_budget + 20% buffer.
  • Ongoing daily refill is typically less than 5% of daily savings.

For leaders

  • Fill cost is consistently less than 1% of annual uncached spend. It pays for itself within the first week.
  • Total fill for a medium repo: ~$126 against ~$36,000 in annual uncached spend. At an 80-95% reduction, that is roughly $28,800-$34,200 in annual savings, or about 230-270× the fill cost.
  • The cost is not additional spend — response fills are requests you’d pay for anyway, just with the bonus of populating cache.
  • Frame to stakeholders: one-time investment, less than 1% of annual cost, enables 80-95% ongoing reduction.