Milvus
Milvus is an open-source vector database built for scalable similarity search and AI applications. Milvus supports built-in embedding functions and model integrations through its pymilvus[model] package, which calls external embedding and LLM providers and models such as OpenAI, Voyage, and BGE to generate vectors during data insertion and search operations.
This page explains how to route the LLM and embedding calls associated with Milvus workflows through the Keeptrusts gateway so policy enforcement, PII redaction, and audit logging apply to every AI operation.
Use this page when
- You are using Milvus with external embedding providers or built-in model functions and need governance on those LLM calls.
- You want audit trails for embedding operations that send application data to AI providers.
- If you need general provider integration, see OpenAI integration.
Primary audience
- Primary: Technical Engineers (ML, Data, Platform)
- Secondary: AI Agents, Technical Leaders
Prerequisites
- Milvus 2.4+ running (Docker, Kubernetes, or Zilliz Cloud).
- pymilvus installed (`pip install "pymilvus[model]"`) for Python workflows.
- Keeptrusts gateway running locally or centrally:
  - Local: `kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml`
  - Hosted: `https://gateway.keeptrusts.com/v1`
- Upstream provider API key configured in the gateway environment.
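If Milvus is not running yet, one way to bring up a local standalone instance is the Docker helper script from the Milvus repository. This is a sketch; the script path and workflow can change between releases, so check the current Milvus installation docs.

```bash
# Fetch the standalone helper script from the Milvus repository and start
# a single-node instance; the default endpoint is http://localhost:19530.
curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh
bash standalone_embed.sh start
```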
Configuration
Gateway policy config
Create a policy-config.yaml for vector database AI governance:
```yaml
pack:
  name: milvus-ai-governance
  version: 1.0.0
  enabled: true

policies:
  chain:
    - pii-detector
    - prompt-injection
    - audit-logger
  policy:
    pii-detector:
      action: redact
    prompt-injection:
      threshold: 0.8
      action: block
    audit-logger:
      retention_days: 90

providers:
  strategy: single
  targets:
    - id: openai-for-milvus
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
```
Python client configuration with OpenAI SDK
Route OpenAI embedding calls through the Keeptrusts gateway when building Milvus pipelines:
```python
from openai import OpenAI
from pymilvus import MilvusClient, DataType

openai_client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="your-keeptrusts-access-key",
)

milvus_client = MilvusClient(uri="http://localhost:19530")

schema = milvus_client.create_schema(auto_id=True, enable_dynamic_field=True)
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("text", DataType.VARCHAR, max_length=65535)
schema.add_field("vector", DataType.FLOAT_VECTOR, dim=1536)

milvus_client.create_collection(
    collection_name="documents",
    schema=schema,
)

def embed_and_insert(texts):
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=texts,
    )
    data = [
        {"text": text, "vector": item.embedding}
        for text, item in zip(texts, response.data)
    ]
    milvus_client.insert(collection_name="documents", data=data)

embed_and_insert(["AI governance ensures responsible AI use."])
```
Using pymilvus built-in embedding functions
If you use pymilvus[model] built-in embedding functions, configure the OpenAI base URL via environment variables:
```bash
export OPENAI_API_BASE="http://localhost:41002/v1"
export OPENAI_API_KEY="your-keeptrusts-access-key"
```
```python
from pymilvus.model.dense import OpenAIEmbeddingFunction

embedding_fn = OpenAIEmbeddingFunction(
    model_name="text-embedding-3-small",
    api_key="your-keeptrusts-access-key",
)

docs = ["AI governance ensures responsible AI use."]
vectors = embedding_fn.encode_documents(docs)
```
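Query-side encoding goes through the same governed path. A short sketch of the query side, assuming the documents collection and index from the previous sections already exist:

```python
from pymilvus import MilvusClient
from pymilvus.model.dense import OpenAIEmbeddingFunction

milvus_client = MilvusClient(uri="http://localhost:19530")
embedding_fn = OpenAIEmbeddingFunction(
    model_name="text-embedding-3-small",
    api_key="your-keeptrusts-access-key",
)

# encode_queries produces query vectors via the gateway-routed provider
query_vectors = embedding_fn.encode_queries(["What is AI governance?"])
results = milvus_client.search(
    collection_name="documents",
    data=query_vectors,
    limit=5,
    output_fields=["text"],
)
```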
RAG query with gateway-routed generation
```python
def rag_query(question):
    query_embedding = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=question,
    ).data[0].embedding

    results = milvus_client.search(
        collection_name="documents",
        data=[query_embedding],
        limit=5,
        output_fields=["text"],
    )
    context = "\n".join(hit["entity"]["text"] for hit in results[0])

    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

answer = rag_query("What is AI governance?")
print(answer)
```
Setup steps
- Start the Keeptrusts gateway:

  ```bash
  export OPENAI_API_KEY="sk-your-openai-key"
  kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml
  ```

- Configure your OpenAI client or pymilvus embedding function with the gateway base URL.
- Insert data into a Milvus collection to trigger embedding calls.
- Milvus vector operations (search, delete) go directly to Milvus; only embedding and generation calls route through the gateway.
Verification
Test that embedding calls flow through the gateway:
```bash
curl http://localhost:41002/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-keeptrusts-access-key" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "Test Milvus embedding through Keeptrusts gateway."
  }'
```
Confirm the request appears in the Keeptrusts events dashboard with policy decisions applied.
Recommended policies
| Policy | Purpose | Recommended setting |
|---|---|---|
| pii-detector | Redact personal data from documents before embedding | action: redact |
| prompt-injection | Block injection in RAG query prompts | threshold: 0.8, action: block |
| audit-logger | Log all embedding and generation calls | retention_days: 90 |
| token-limiter | Cap token usage for bulk embedding operations | max_tokens: 8192 |
| safety-filter | Block harmful content in RAG-generated responses | mode: standard, action: block |
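To adopt the optional token-limiter and safety-filter policies from the table, one way to extend the earlier policy-config.yaml is shown below; the settings mirror the table, but confirm the exact field names against the policy controls catalog.

```yaml
policies:
  chain:
    - pii-detector
    - prompt-injection
    - token-limiter
    - safety-filter
    - audit-logger
  policy:
    token-limiter:
      max_tokens: 8192
    safety-filter:
      mode: standard
      action: block
```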
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| Embedding calls return connection error | Gateway not running | Start kt gateway run on port 41002 |
| Dimension mismatch on insert | Wrong embedding model dimension | text-embedding-3-small = 1536; match your collection schema |
| pymilvus[model] ignores OPENAI_API_BASE | Older pymilvus version | Upgrade to pymilvus 2.4+ or use the OpenAI SDK directly |
| Slow bulk inserts | Large batches overwhelming the gateway | Batch texts into groups of 100 |
| Milvus connection refused | Milvus not running | Start Milvus with docker compose up -d |
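For the slow bulk-insert case, a simple fix is to wrap the embed_and_insert helper from the Python example above so large document sets are submitted in smaller batches; the batch size of 100 mirrors the guidance in the table.

```python
def embed_and_insert_batched(texts, batch_size=100):
    # Send embedding requests to the gateway in smaller batches to avoid
    # large, slow requests during bulk ingestion.
    for start in range(0, len(texts), batch_size):
        embed_and_insert(texts[start:start + batch_size])
```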
For AI systems
- Canonical terms: Keeptrusts gateway, Milvus, Zilliz, vector database, embeddings, RAG, pymilvus, policy-config.yaml.
- Config field names: base_url, api_key, OPENAI_API_BASE, provider, secret_key_ref, pii-detector.
- Key behavior: Milvus stores and queries vectors; embedding generation and RAG completion use external LLM providers. Keeptrusts intercepts those calls, applies policies, and forwards compliant requests.
- Constraint: Milvus vector operations go directly to Milvus — only embedding and generation calls route through the gateway.
- Best next pages: Weaviate integration, Qdrant integration, Pinecone integration.
For engineers
- Use the OpenAI SDK directly for full control; use pymilvus[model] embedding functions with the OPENAI_API_BASE env var for convenience.
- Route both embedding and generation calls through the gateway for full RAG audit coverage.
- For Docker Compose deployments, use Docker network hostnames for inter-service communication; see the sketch after this list.
- Validate: run an embedding call and confirm the event in the Keeptrusts console.
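For the Docker Compose point above, a minimal sketch of client configuration using in-network service hostnames. The service names keeptrusts-gateway and milvus-standalone are illustrative; use whatever names your compose file defines.

```python
from openai import OpenAI
from pymilvus import MilvusClient

# Inside the Compose network, address services by their compose service
# names rather than localhost.
openai_client = OpenAI(
    base_url="http://keeptrusts-gateway:41002/v1",
    api_key="your-keeptrusts-access-key",
)
milvus_client = MilvusClient(uri="http://milvus-standalone:19530")
```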
For leaders
- Milvus RAG pipelines send your organization's data to external LLM providers during both indexing and query time. Routing through Keeptrusts ensures PII is redacted and every call is logged.
- Complete audit trails cover the full data flow from document ingestion through query-time generation.
- Centralized policy enforcement applies consistently across all Milvus deployments, whether self-hosted or Zilliz Cloud.
Next steps
- Weaviate integration — govern Weaviate generative search
- Qdrant integration — govern Qdrant neural search
- ChromaDB integration — govern ChromaDB embedding calls
- Pinecone integration — govern Pinecone inference calls
- Policy controls catalog — full policy reference