Embedding Indexes: Semantic Search Across Your Code
The embedding_index artifact transforms your codebase into a searchable semantic space. Instead of relying on exact keyword matches, engineers ask natural language questions like "find code that handles rate limiting" or "where do we validate user permissions" and receive relevant results based on meaning.
Use this page when
- You want AI to support natural language code search ("find code that handles rate limiting") without keyword matching.
- You need to configure embedding model selection, chunk strategy, vector backend (Qdrant), or update frequency.
- You want to understand how embedding indexes power the semantic cache and context assembly.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
What an Embedding Index Contains
An embedding_index artifact stores vector representations of code segments:
Code Chunk Embeddings
Your codebase is divided into semantic chunks, and each chunk receives a vector embedding:
- Function-level embeddings (one vector per function body)
- File-level embeddings (one vector per file summary)
- Block-level embeddings (logical code blocks within large files)
- Comment and documentation embeddings
Metadata Per Chunk
Each embedding carries associated metadata:
- Source file path and line range
- Language and framework context
- Symbol names and types within the chunk
- Last modification timestamp
- Relevance tags (auth, database, API, UI, etc.)
Index Configuration
Settings that define how the index was built:
- Embedding model and version used
- Chunk size and overlap parameters
- Vector dimensionality
- Distance metric (cosine, dot product, euclidean)
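Taken together, one indexed chunk pairs a vector with its metadata, and the index records how it was built. A minimal sketch in Python, with illustrative field names rather than the actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class ChunkRecord:
    """Illustrative shape of one indexed chunk; not the exact storage schema."""
    vector: list[float]                 # dense embedding, e.g. 768-3072 dimensions
    source_path: str                    # file the chunk came from
    line_range: tuple[int, int]         # start and end lines within that file
    language: str                       # e.g. "python", "typescript"
    symbols: list[str]                  # symbol names and types within the chunk
    modified_at: str                    # last modification timestamp (ISO 8601)
    tags: list[str] = field(default_factory=list)   # relevance tags: "auth", "database", ...

# Index-level configuration recorded alongside the vectors (values are examples)
index_config = {
    "embedding_model": "example-code-embedder-v1",  # hypothetical model name
    "chunk_size": 512,          # tokens per chunk
    "chunk_overlap": 64,        # tokens shared between adjacent chunks
    "dimensions": 1536,
    "distance_metric": "cosine",
}
```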
How Embedding Indexes Work
Embedding Generation
The pipeline processes your codebase through a configured embedding model:
- Chunk — Split source files into semantic units (functions, classes, logical blocks)
- Prepare — Format each chunk with surrounding context for better embedding quality
- Embed — Pass chunks through the embedding model to produce dense vector representations
- Store — Write vectors and metadata to the vector storage backend
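A minimal sketch of that pipeline, assuming a naive line-based chunker and treating the embedding model and vector backend as injected dependencies (`embed_fn` and `store` are placeholders, not Context Fabric APIs):

```python
import pathlib

def chunk_file(path: str, max_lines: int = 40):
    """Naive block-level chunker: fixed-size line windows.
    A real chunker splits on function and class boundaries instead."""
    lines = pathlib.Path(path).read_text(encoding="utf-8").splitlines()
    for start in range(0, len(lines), max_lines):
        block = lines[start:start + max_lines]
        yield (start + 1, start + len(block)), "\n".join(block)

def build_index(source_files, embed_fn, store):
    """Chunk -> prepare -> embed -> store."""
    for path in source_files:
        for (start, end), text in chunk_file(path):
            prepared = f"# file: {path} lines {start}-{end}\n{text}"  # prepend context for better embeddings
            vector = embed_fn(prepared)                               # dense vector from the configured model
            store.upsert(vector=vector,                               # write vector and metadata to the backend
                         metadata={"path": path, "lines": [start, end]})
```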
Query Processing
When an engineer searches with natural language:
- Embed the query — Transform the search query into a vector using the same model
- Similarity search — Find the nearest vectors in the index by distance metric
- Retrieve metadata — Return source locations and context for the top matches
- Rank and filter — Apply relevance thresholds and deduplication
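As a brute-force sketch of those four steps (a production backend uses approximate nearest-neighbor search rather than scoring every vector), with `embed_fn` again standing in for the configured model:

```python
import numpy as np

def semantic_search(query: str, embed_fn, vectors: np.ndarray, metadata: list[dict],
                    top_k: int = 5, min_score: float = 0.3):
    """Embed the query, rank chunks by cosine similarity, then filter and deduplicate."""
    q = np.asarray(embed_fn(query), dtype=np.float32)
    q /= np.linalg.norm(q)
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = normed @ q                                 # cosine similarity against every chunk
    results, seen_files = [], set()
    for i in np.argsort(-scores):
        if scores[i] < min_score:
            break                                       # relevance threshold
        if metadata[i]["path"] in seen_files:
            continue                                    # simple per-file deduplication
        seen_files.add(metadata[i]["path"])
        results.append({"score": float(scores[i]), **metadata[i]})
        if len(results) == top_k:
            break
    return results
```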
Embedding Model Selection
You choose the embedding model that best fits your needs:
Factors to Consider
- Quality — Larger models produce better embeddings but cost more per token
- Speed — Smaller models generate embeddings faster for large codebases
- Specialization — Some models are trained specifically on code, others on general text
- Dimensionality — Higher dimensions capture more nuance but require more storage
Supported Models
Context Fabric supports any embedding model accessible through your configured providers. Common choices include:
- Code-specialized models for high relevance on technical queries
- General-purpose models for natural language descriptions of code behavior
- Multilingual models for codebases with comments in multiple languages
You configure the model in your repository settings. Changing models triggers a full re-index.
Vector Storage
Embedding indexes require a vector database backend for efficient similarity search:
Qdrant
The recommended backend for production deployments. Qdrant provides:
- Fast approximate nearest neighbor search
- Metadata filtering alongside vector similarity
- Horizontal scaling for large indexes
- Segment-based storage for efficient updates
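A hedged sketch of what querying a Qdrant-backed index can look like with the official Python client; the collection name, payload keys, dimensionality, and the `embed` helper are illustrative, not Context Fabric's internal schema:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")   # your Qdrant instance URL

# One-time collection setup: dimensionality and distance metric must match the embedding model
client.recreate_collection(
    collection_name="code_chunks",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE),
)

# Similarity search with metadata filtering (here: restrict results to Python files)
query_vector = embed("where do we validate user permissions")   # same model used to build the index
hits = client.search(
    collection_name="code_chunks",
    query_vector=query_vector,
    query_filter=models.Filter(
        must=[models.FieldCondition(key="language", match=models.MatchValue(value="python"))]
    ),
    limit=5,
)
for hit in hits:
    print(hit.score, hit.payload.get("path"))
```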
Semantic Backend
For smaller deployments or evaluation, Context Fabric supports a built-in semantic storage layer that requires no external infrastructure.
Storage Sizing
Estimate storage requirements based on:
- Number of code chunks (typically 5–20 per source file)
- Vector dimensionality (768–3072 dimensions depending on model)
- Metadata overhead per chunk (200–500 bytes)
A typical 10,000-file repository produces 50,000–200,000 chunks requiring 500MB–2GB of vector storage.
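A back-of-the-envelope check of those numbers, assuming float32 vectors and ignoring ANN index overhead:

```python
def estimate_storage_bytes(num_chunks: int, dimensions: int, metadata_bytes: int = 350) -> int:
    """Rough sizing: 4 bytes per float32 component plus per-chunk metadata overhead."""
    return num_chunks * (dimensions * 4 + metadata_bytes)

# Example: a 10,000-file repository at ~10 chunks per file with 1536-dimensional vectors
size = estimate_storage_bytes(num_chunks=100_000, dimensions=1536)
print(f"{size / 1e9:.2f} GB")   # ~0.65 GB, within the 500MB-2GB range above
```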
How Embedding Indexes Power the Cache
Embedding indexes serve a critical role in the org-shared engineering cache:
Semantic Cache Candidate Retrieval
When a new AI interaction arrives, the system embeds the query and searches the embedding index for relevant code. If a previous interaction asked a semantically similar question about the same code, the cache serves the previous result.
Context Assembly
The embedding index helps assemble optimal context for AI interactions:
- Embed the engineer's question
- Find the most relevant code chunks
- Retrieve cached file summaries for those chunks
- Assemble a focused context window that maximizes relevance per token
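An illustrative sketch of those steps, where `search_fn` wraps the similarity search described earlier and `summaries` maps file paths to cached file summaries (both are placeholders):

```python
def assemble_context(question: str, embed_fn, search_fn, summaries: dict, token_budget: int = 4000) -> str:
    """Embed the question, pull the most relevant chunks, and pack cached
    summaries into the window until the token budget is spent."""
    hits = search_fn(embed_fn(question))            # ranked chunk matches from the embedding index
    context, used = [], 0
    for hit in hits:
        summary = summaries.get(hit["path"])        # cached file summary for the chunk's file
        if summary is None:
            continue
        cost = len(summary) // 4                    # crude chars-to-tokens estimate
        if used + cost > token_budget:
            break
        context.append(summary)
        used += cost
    return "\n\n".join(context)
```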
Deduplication
When multiple engineers ask similar questions, the embedding index identifies semantic overlap. Instead of treating each interaction independently, the cache recognizes equivalence and serves pre-computed answers.
Token Savings
Without semantic search, finding relevant code requires:
- Keyword grep across the codebase (often misses relevant results)
- Reading multiple candidate files to find the right one
- Iterating when initial results are not relevant
A semantic search against the embedding index returns relevant code in a single query, costing only the embedding computation (typically 100–500 tokens for the query). The alternative manual search path costs 10,000–50,000 tokens in file reading and iteration.
When Embedding Indexes Regenerate
Regeneration is incremental:
- New files — Generate embeddings for new chunks and insert into the index
- Modified files — Re-embed affected chunks and update their vectors
- Deleted files — Remove corresponding vectors from the index
- Model change — Full re-index required (new model produces incompatible vectors)
The incremental approach means daily repository changes update only a small fraction of the index, keeping regeneration fast and cost-effective.
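A sketch of that incremental loop, with `store.delete_by_path`, `store.upsert`, and the chunker as hypothetical stand-ins for the backend's actual delete-and-upsert operations:

```python
def incremental_update(changed_files, deleted_files, chunker, embed_fn, store):
    """Re-embed only what changed since the last run."""
    for path in deleted_files:
        store.delete_by_path(path)                  # drop vectors for removed files
    for path in changed_files:                      # covers both new and modified files
        store.delete_by_path(path)                  # clear stale chunks before re-inserting
        for (start, end), text in chunker(path):
            store.upsert(vector=embed_fn(text),
                         metadata={"path": path, "lines": [start, end]})
```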
Configuration
You configure embedding_index generation in your repository settings:
- Embedding model — Select the model for vector generation
- Chunk strategy — Function-level, file-level, or block-level chunking
- Vector backend — Qdrant instance URL or built-in semantic storage
- Update frequency — Real-time on push, scheduled, or manual
- Scope — Which directories and file types to index
- Exclusion patterns — Skip generated code, dependencies, binary files
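As an illustration only (the keys and values below are hypothetical, not the literal settings schema), the options map to something like:

```python
embedding_index_settings = {
    "embedding_model": "example-code-embedder-v1",     # illustrative model name
    "chunk_strategy": "function",                      # "function" | "file" | "block"
    "vector_backend": "https://qdrant.internal:6333",  # Qdrant URL, or built-in semantic storage
    "update_frequency": "on_push",                     # "on_push" | "scheduled" | "manual"
    "scope": ["services/", "libs/"],                   # directories and file types to index
    "exclude": ["**/node_modules/**", "**/*.min.js", "**/generated/**"],
}
```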
Use Cases
Natural Language Code Search
Engineers ask "find the code that validates webhook signatures" and receive the exact function, even if it does not contain those exact words.
Similar Code Discovery
When implementing a new feature, engineers ask "show me code that does something similar to X" to find patterns and examples.
Knowledge Discovery
New team members explore the codebase by describing what they are looking for in plain language rather than guessing file names or function names.
Duplicate Detection
The embedding index reveals semantically similar code blocks that may be candidates for consolidation.
Next steps
- What is Codebase Context Fabric? — Return to the architecture overview
- File Summaries — Combine semantic search with structured summaries
For AI systems
- Canonical terms: Keeptrusts, Codebase Context Fabric, embedding_index artifact, semantic search, vector storage, Qdrant, embedding model, code chunks, similarity search, semantic cache candidate retrieval, context assembly.
- Feature/config names: embedding_index artifact type, embedding model, chunk strategy (function/file/block), vector backend (Qdrant, built-in semantic storage), update frequency (real-time/scheduled/manual), scope, exclusion patterns, distance metric (cosine/dot product/euclidean).
- Best next pages: File Summaries, Fabric Slices Reduce Prompts, Artifact Freshness.
For engineers
- The embedding pipeline: chunk code → prepare with context → embed via configured model → store in Qdrant or built-in backend.
- Regeneration is incremental: new files get new embeddings, modified files re-embed, deleted files remove vectors. Full re-index only on model change.
- Storage sizing: 10,000-file repo → 50,000–200,000 chunks → 500MB–2GB vector storage (depends on dimensionality).
- Configure: embedding model, chunk strategy, backend URL, update frequency, scope patterns, and exclusion patterns in repository settings.
For leaders
- Embedding indexes enable “find code that does X” queries — dramatically accelerating onboarding, knowledge discovery, and duplicate detection.
- Token savings: semantic search finds relevant code in one query (100–500 tokens) vs. manual keyword grep + file reading (10,000–50,000 tokens).
- Infrastructure cost: Qdrant cluster for vector storage. Scale based on codebase size and dimensionality. The index serves the entire org.
- Semantic cache integration: embedding indexes power cache hit detection for semantically equivalent (not just identical) requests.