Embedding Indexes: Semantic Search Across Your Code
The embedding_index artifact transforms your codebase into a searchable semantic space. Instead of relying on exact keyword matches, engineers ask natural language questions like "find code that handles rate limiting" or "where do we validate user permissions" and receive relevant results based on meaning.
Use this page when
- You want AI to support natural language code search ("find code that handles rate limiting") without keyword matching.
- You need to configure embedding model selection, chunk strategy, vector backend (Qdrant), or update frequency.
- You want to understand how embedding indexes power the semantic cache and context assembly.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
What an Embedding Index Contains
An embedding_index artifact stores vector representations of code segments:
Code Chunk Embeddings
Your codebase is divided into semantic chunks, and each chunk receives a vector embedding:
- Function-level embeddings (one vector per function body)
- File-level embeddings (one vector per file summary)
- Block-level embeddings (logical code blocks within large files)
- Comment and documentation embeddings
Metadata Per Chunk
Each embedding carries associated metadata:
- Source file path and line range
- Language and framework context
- Symbol names and types within the chunk
- Last modification timestamp
- Relevance tags (auth, database, API, UI, etc.)
Index Configuration
Settings that define how the index was built:
- Embedding model and version used
- Chunk size and overlap parameters
- Vector dimensionality
- Distance metric (cosine, dot product, euclidean)
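Taken together, one indexed chunk pairs a vector with its metadata, and the index records how it was built. A minimal sketch in Python, with illustrative field names rather than the actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class ChunkRecord:
    """Illustrative shape of one indexed chunk; not the exact storage schema."""
    vector: list[float]                 # dense embedding, e.g. 768-3072 dimensions
    source_path: str                    # file the chunk came from
    line_range: tuple[int, int]         # start and end lines within that file
    language: str                       # e.g. "python", "typescript"
    symbols: list[str]                  # symbol names and types within the chunk
    modified_at: str                    # last modification timestamp (ISO 8601)
    tags: list[str] = field(default_factory=list)   # relevance tags: "auth", "database", ...

# Index-level configuration recorded alongside the vectors (values are examples)
index_config = {
    "embedding_model": "example-code-embedder-v1",  # hypothetical model name
    "chunk_size": 512,          # tokens per chunk
    "chunk_overlap": 64,        # tokens shared between adjacent chunks
    "dimensions": 1536,
    "distance_metric": "cosine",
}
```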
How Embedding Indexes Work
Embedding Generation
The pipeline processes your codebase through a configured embedding model:
- Chunk — Split source files into semantic units (functions, classes, logical blocks)
- Prepare — Format each chunk with surrounding context for better embedding quality
- Embed — Pass chunks through the embedding model to produce dense vector representations
- Store — Write vectors and metadata to the vector storage backend
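A minimal sketch of that pipeline, assuming a naive line-based chunker and treating the embedding model and vector backend as injected dependencies (`embed_fn` and `store` are placeholders, not Context Fabric APIs):

```python
import pathlib

def chunk_file(path: str, max_lines: int = 40):
    """Naive block-level chunker: fixed-size line windows.
    A real chunker splits on function and class boundaries instead."""
    lines = pathlib.Path(path).read_text(encoding="utf-8").splitlines()
    for start in range(0, len(lines), max_lines):
        block = lines[start:start + max_lines]
        yield (start + 1, start + len(block)), "\n".join(block)

def build_index(source_files, embed_fn, store):
    """Chunk -> prepare -> embed -> store."""
    for path in source_files:
        for (start, end), text in chunk_file(path):
            prepared = f"# file: {path} lines {start}-{end}\n{text}"  # prepend context for better embeddings
            vector = embed_fn(prepared)                               # dense vector from the configured model
            store.upsert(vector=vector,                               # write vector and metadata to the backend
                         metadata={"path": path, "lines": [start, end]})
```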
Query Processing
When an engineer searches with natural language:
- Embed the query — Transform the search query into a vector using the same model
- Similarity search — Find the nearest vectors in the index by distance metric
- Retrieve metadata — Return source locations and context for the top matches
- Rank and filter — Apply relevance thresholds and deduplication
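As a brute-force sketch of those four steps (a production backend uses approximate nearest-neighbor search rather than scoring every vector), with `embed_fn` again standing in for the configured model:

```python
import numpy as np

def semantic_search(query: str, embed_fn, vectors: np.ndarray, metadata: list[dict],
                    top_k: int = 5, min_score: float = 0.3):
    """Embed the query, rank chunks by cosine similarity, then filter and deduplicate."""
    q = np.asarray(embed_fn(query), dtype=np.float32)
    q /= np.linalg.norm(q)
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = normed @ q                                 # cosine similarity against every chunk
    results, seen_files = [], set()
    for i in np.argsort(-scores):
        if scores[i] < min_score:
            break                                       # relevance threshold
        if metadata[i]["path"] in seen_files:
            continue                                    # simple per-file deduplication
        seen_files.add(metadata[i]["path"])
        results.append({"score": float(scores[i]), **metadata[i]})
        if len(results) == top_k:
            break
    return results
```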
Embedding Model Selection
You choose the embedding model that best fits your needs:
Factors to Consider
- Quality — Larger models produce better embeddings but cost more per token
- Speed — Smaller models generate embeddings faster for large codebases
- Specialization — Some models are trained specifically on code, others on general text
- Dimensionality — Higher dimensions capture more nuance but require more storage
Supported Models
Context Fabric supports any embedding model accessible through your configured providers. Common choices include:
- Code-specialized models for high relevance on technical queries
- General-purpose models for natural language descriptions of code behavior
- Multilingual models for codebases with comments in multiple languages
You configure the model in your repository settings. Changing models triggers a full re-index.
Vector Storage
Embedding indexes require a vector database backend for efficient similarity search:
Qdrant
The recommended backend for production deployments. Qdrant provides:
- Fast approximate nearest neighbor search
- Metadata filtering alongside vector similarity
- Horizontal scaling for large indexes
- Segment-based storage for efficient updates
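A hedged sketch of what querying a Qdrant-backed index can look like with the official Python client; the collection name, payload keys, dimensionality, and the `embed` helper are illustrative, not Context Fabric's internal schema:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")   # your Qdrant instance URL

# One-time collection setup: dimensionality and distance metric must match the embedding model
client.recreate_collection(
    collection_name="code_chunks",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE),
)

# Similarity search with metadata filtering (here: restrict results to Python files)
query_vector = embed("where do we validate user permissions")   # same model used to build the index
hits = client.search(
    collection_name="code_chunks",
    query_vector=query_vector,
    query_filter=models.Filter(
        must=[models.FieldCondition(key="language", match=models.MatchValue(value="python"))]
    ),
    limit=5,
)
for hit in hits:
    print(hit.score, hit.payload.get("path"))
```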
Semantic Backend
For smaller deployments or evaluation, Context Fabric supports a built-in semantic storage layer that requires no external infrastructure.
Storage Sizing
Estimate storage requirements based on:
- Number of code chunks (typically 5–20 per source file)
- Vector dimensionality (768–3072 dimensions depending on model)
- Metadata overhead per chunk (200–500 bytes)
A typical 10,000-file repository produces 50,000–200,000 chunks requiring 500MB–2GB of vector storage.
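A back-of-the-envelope check of those numbers, assuming float32 vectors and ignoring ANN index overhead:

```python
def estimate_storage_bytes(num_chunks: int, dimensions: int, metadata_bytes: int = 350) -> int:
    """Rough sizing: 4 bytes per float32 component plus per-chunk metadata overhead."""
    return num_chunks * (dimensions * 4 + metadata_bytes)

# Example: a 10,000-file repository at ~10 chunks per file with 1536-dimensional vectors
size = estimate_storage_bytes(num_chunks=100_000, dimensions=1536)
print(f"{size / 1e9:.2f} GB")   # ~0.65 GB, within the 500MB-2GB range above
```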
How Embedding Indexes Power the Cache
Embedding indexes serve a critical role in the org-shared engineering cache:
Semantic Cache Candidate Retrieval
When a new AI interaction arrives, the system embeds the query and searches the embedding index for relevant code. If a previous interaction asked a semantically similar question about the same code, the cache serves the previous result.
Context Assembly
The embedding index helps assemble optimal context for AI interactions:
- Embed the engineer's question
- Find the most relevant code chunks
- Retrieve cached file summaries for those chunks
- Assemble a focused context window that maximizes relevance per token
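An illustrative sketch of those steps, where `search_fn` wraps the similarity search described earlier and `summaries` maps file paths to cached file summaries (both are placeholders):

```python
def assemble_context(question: str, embed_fn, search_fn, summaries: dict, token_budget: int = 4000) -> str:
    """Embed the question, pull the most relevant chunks, and pack cached
    summaries into the window until the token budget is spent."""
    hits = search_fn(embed_fn(question))            # ranked chunk matches from the embedding index
    context, used = [], 0
    for hit in hits:
        summary = summaries.get(hit["path"])        # cached file summary for the chunk's file
        if summary is None:
            continue
        cost = len(summary) // 4                    # crude chars-to-tokens estimate
        if used + cost > token_budget:
            break
        context.append(summary)
        used += cost
    return "\n\n".join(context)
```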
Deduplication
When multiple engineers ask similar questions, the embedding index identifies semantic overlap. Instead of treating each interaction independently, the cache recognizes equivalence and serves pre-computed answers.
Token Savings
Without semantic search, finding relevant code requires:
- Keyword grep across the codebase (often misses relevant results)
- Reading multiple candidate files to find the right one
- Iterating when initial results are not relevant
A semantic search against the embedding index returns relevant code in a single query, costing only the embedding computation (typically 100–500 tokens for the query). The alternative manual search path costs 10,000–50,000 tokens in file reading and iteration.
When Embedding Indexes Regenerate
Regeneration is incremental:
- New files — Generate embeddings for new chunks and insert into the index
- Modified files — Re-embed affected chunks and update their vectors
- Deleted files — Remove corresponding vectors from the index
- Model change — Full re-index required (new model produces incompatible vectors)
The incremental approach means daily repository changes update only a small fraction of the index, keeping regeneration fast and cost-effective.
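A sketch of that incremental loop, with `store.delete_by_path`, `store.upsert`, and the chunker as hypothetical stand-ins for the backend's actual delete-and-upsert operations:

```python
def incremental_update(changed_files, deleted_files, chunker, embed_fn, store):
    """Re-embed only what changed since the last run."""
    for path in deleted_files:
        store.delete_by_path(path)                  # drop vectors for removed files
    for path in changed_files:                      # covers both new and modified files
        store.delete_by_path(path)                  # clear stale chunks before re-inserting
        for (start, end), text in chunker(path):
            store.upsert(vector=embed_fn(text),
                         metadata={"path": path, "lines": [start, end]})
```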
Configuration
You configure embedding_index generation in your repository settings:
- Embedding model — Select the model for vector generation
- Chunk strategy — Function-level, file-level, or block-level chunking
- Vector backend — Qdrant instance URL or built-in semantic storage
- Update frequency — Real-time on push, scheduled, or manual
- Scope — Which directories and file types to index
- Exclusion patterns — Skip generated code, dependencies, binary files
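As an illustration only (the keys and values below are hypothetical, not the literal settings schema), the options map to something like:

```python
embedding_index_settings = {
    "embedding_model": "example-code-embedder-v1",     # illustrative model name
    "chunk_strategy": "function",                      # "function" | "file" | "block"
    "vector_backend": "https://qdrant.internal:6333",  # Qdrant URL, or built-in semantic storage
    "update_frequency": "on_push",                     # "on_push" | "scheduled" | "manual"
    "scope": ["services/", "libs/"],                   # directories and file types to index
    "exclude": ["**/node_modules/**", "**/*.min.js", "**/generated/**"],
}
```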
Use Cases
Natural Language Code Search
Engineers ask "find the code that validates webhook signatures" and receive the exact function, even if it does not contain those exact words.
Similar Code Discovery
When implementing a new feature, engineers ask "show me code that does something similar to X" to find patterns and examples.
Knowledge Discovery
New team members explore the codebase by describing what they are looking for in plain language rather than guessing file names or function names.
Duplicate Detection
The embedding index reveals semantically similar code blocks that may be candidates for consolidation.
Next steps
- What is Codebase Context Fabric? — Return to the architecture overview
- File Summaries — Combine semantic search with structured summaries
For AI systems
- Canonical terms: Keeptrusts, Codebase Context Fabric, embedding_index artifact, semantic search, vector storage, Qdrant, embedding model, code chunks, similarity search, semantic cache candidate retrieval, context assembly.
- Feature/config names: embedding_index artifact type, embedding model, chunk strategy (function/file/block), vector backend (Qdrant, built-in semantic storage), update frequency (real-time/scheduled/manual), scope, exclusion patterns, distance metric (cosine/dot product/euclidean).
- Best next pages: File Summaries, Fabric Slices Reduce Prompts, Artifact Freshness.
For engineers
- The embedding pipeline: chunk code → prepare with context → embed via configured model → store in Qdrant or built-in backend.
- Regeneration is incremental: new files get new embeddings, modified files re-embed, deleted files remove vectors. Full re-index only on model change.
- Storage sizing: 10,000-file repo → 50,000–200,000 chunks → 500MB–2GB vector storage (depends on dimensionality).
- Configure: embedding model, chunk strategy, backend URL, update frequency, scope patterns, and exclusion patterns in repository settings.
For leaders
- Embedding indexes enable “find code that does X” queries — dramatically accelerating onboarding, knowledge discovery, and duplicate detection.
- Token savings: semantic search finds relevant code in one query (100–500 tokens) vs. manual keyword grep + file reading (10,000–50,000 tokens).
- Infrastructure cost: Qdrant cluster for vector storage. Scale based on codebase size and dimensionality. The index serves the entire org.
- Semantic cache integration: embedding indexes power cache hit detection for semantically equivalent (not just identical) requests.