
Databricks

Keeptrusts integrates with Databricks Model Serving Foundation Model APIs, giving you a policy enforcement layer over Llama, DBRX, and Mixtral models running inside your Databricks workspace. Because Databricks Foundation Models store no customer data by default, this integration is well-suited for regulated workloads that require full audit trails without sacrificing zero-retention guarantees.

Use this page when

  • You need the exact command, config, API, or integration details for Databricks.
  • You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
  • If you want a guided rollout instead of a reference page, use the linked workflow pages in Next steps.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Prerequisites

  • A Databricks workspace (AWS, Azure, or GCP) with Model Serving enabled
  • A Databricks personal access token (PAT) with CAN_USE permission on the served model endpoints
  • kt CLI installed and authenticated (kt auth login)

Set your token before starting the gateway:

export DATABRICKS_TOKEN="dapi..."

Configuration

Minimal — single Foundation Model endpoint

pack:
  name: databricks-providers-1
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: databricks-llama
      provider: databricks:chat:databricks-meta-llama-3-3-70b-instruct
      base_url: https://{workspace}.azuredatabricks.net/serving-endpoints
      secret_key_ref:
        env: DATABRICKS_TOKEN
policies:
  chain:
    - audit-logger
policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true

Full governance config

pack:
  name: databricks-enterprise
  version: 1.0.0
  enabled: true
policies:
  chain:
    - prompt-injection
    - pii-detector
    - dlp-filter
    - rbac
    - audit-logger
policy:
  rbac:
    roles:
      data-engineer:
        allowed_models:
          - databricks-meta-llama-3-3-70b-instruct
          - databricks-dbrx-instruct
        max_tokens_per_request: 4096
      data-scientist:
        allowed_models:
          - databricks-meta-llama-3-3-70b-instruct
          - databricks-meta-llama-3-1-405b-instruct
          - databricks-dbrx-instruct
          - databricks-mixtral-8x7b-instruct
        max_tokens_per_request: 8192
      analyst:
        allowed_models:
          - databricks-meta-llama-3-3-70b-instruct
        max_tokens_per_request: 2048
  dlp-filter:
    patterns:
      - name: databricks-pat
        regex: 'dapi[a-f0-9]{32}'
        action: block
      - name: jdbc-connection-string
        regex: 'jdbc:databricks://[^\s]+'
        action: redact
      - name: unity-catalog-path
        regex: 'catalog\.schema\.table'
        action: redact
  pii-detector:
    action: redact
    entities:
      - PERSON
      - EMAIL_ADDRESS
      - PHONE_NUMBER
providers:
  targets:
    - id: databricks-llama-70b
      provider: databricks:chat:databricks-meta-llama-3-3-70b-instruct
      base_url: https://{workspace}.azuredatabricks.net/serving-endpoints
      secret_key_ref:
        env: DATABRICKS_TOKEN
    - id: databricks-llama-405b
      provider: databricks:chat:databricks-meta-llama-3-1-405b-instruct
      base_url: https://{workspace}.azuredatabricks.net/serving-endpoints
      secret_key_ref:
        env: DATABRICKS_TOKEN
    - id: databricks-dbrx
      provider: databricks:chat:databricks-dbrx-instruct
      base_url: https://{workspace}.azuredatabricks.net/serving-endpoints
      secret_key_ref:
        env: DATABRICKS_TOKEN
    - id: databricks-embeddings
      provider: databricks
      model: databricks-bge-large-en
      base_url: https://{workspace}.azuredatabricks.net/serving-endpoints
      secret_key_ref:
        env: DATABRICKS_TOKEN

Provider Fields

| Field | Required | Description |
| --- | --- | --- |
| provider | Yes | "databricks" or "databricks:chat:{model-endpoint-name}" |
| base_url | Yes | Your workspace serving endpoint URL: https://{workspace}.azuredatabricks.net/serving-endpoints |
| secret_key_ref | Yes | Environment variable holding the Databricks PAT (e.g. DATABRICKS_TOKEN) |
| model | No | Endpoint name when using the bare "databricks" provider ID |
| format | No | "openai" (default for Foundation Model APIs) |
| data_policy.zero_data_retention | No | true — Databricks Foundation Models do not store request/response data |
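The optional fields combine in a single target like this; the target id and values below are illustrative, showing the bare "databricks" provider form where model must be set explicitly:

```yaml
providers:
  targets:
    - id: databricks-embeddings        # illustrative target id
      provider: databricks             # bare form, so "model" is required
      model: databricks-bge-large-en
      format: openai                   # default for Foundation Model APIs
      base_url: https://{workspace}.azuredatabricks.net/serving-endpoints
      secret_key_ref:
        env: DATABRICKS_TOKEN
      data_policy:
        zero_data_retention: true
```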

Supported Models

Models are served through Databricks Model Serving and billed per-token via Databricks Foundation Model APIs.

| Model Endpoint | Context Window | Input (per 1M) | Output (per 1M) | Notes |
| --- | --- | --- | --- | --- |
| databricks-meta-llama-3-3-70b-instruct | 128k | $0.54 | $0.54 | Best price/performance; recommended default |
| databricks-meta-llama-3-1-405b-instruct | 128k | $5.00 | $15.00 | Highest capability open-weight model |
| databricks-dbrx-instruct | 32k | $0.75 | $2.25 | Databricks flagship MoE |
| databricks-mixtral-8x7b-instruct | 32k | $0.50 | $1.00 | Fast MoE; cost-efficient for high volume |
| databricks-bge-large-en | 512 tokens | $0.10 | n/a | Embeddings only; 1024-dim vectors |

Pricing reflects Databricks published rates. Actual charges depend on your workspace agreement and DBU pricing tier.
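For rough budgeting, the per-1M rates in the table above can be turned into a per-request estimate. This is a hypothetical helper, not a Keeptrusts feature; actual charges depend on your workspace agreement:

```python
# Per-1M-token rates from the table above (published rates; your
# workspace agreement and DBU tier may differ).
PRICES = {  # model -> (input $/1M tokens, output $/1M tokens)
    "databricks-meta-llama-3-3-70b-instruct": (0.54, 0.54),
    "databricks-meta-llama-3-1-405b-instruct": (5.00, 15.00),
    "databricks-dbrx-instruct": (0.75, 2.25),
    "databricks-mixtral-8x7b-instruct": (0.50, 1.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single chat request."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# A 2,000-token prompt with a 500-token completion on the default model:
print(round(estimate_cost("databricks-meta-llama-3-3-70b-instruct", 2000, 500), 6))
```

Comparing models this way makes the tiering in the Best Practices section concrete: the same request on the 405B endpoint costs roughly an order of magnitude more.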

Client Examples

Start the gateway:

export DATABRICKS_TOKEN="dapi..."
kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="unused",  # auth handled by Keeptrusts
)

# Chat completion
response = client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a data engineering expert."},
        {"role": "user", "content": "Write a PySpark query to compute 7-day rolling average sales by region."},
    ],
    max_tokens=1024,
    temperature=0.2,
)
print(response.choices[0].message.content)

# Embeddings
embedding = client.embeddings.create(
    model="databricks-bge-large-en",
    input="quarterly revenue by product line",
)
print(f"Vector dimensions: {len(embedding.data[0].embedding)}")

Streaming

Databricks Foundation Model APIs support server-sent event (SSE) streaming. Keeptrusts passes streams through transparently after policy checks on the initial request.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:41002/v1", api_key="unused")

stream = client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",
    messages=[{"role": "user", "content": "Explain Delta Lake ACID transactions step by step."}],
    max_tokens=2048,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

In your policy config, set a reasonable stream timeout:

pack:
  name: databricks-providers-3
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: databricks-llama-70b
      provider: databricks:chat:databricks-meta-llama-3-3-70b-instruct
      base_url: https://{workspace}.azuredatabricks.net/serving-endpoints
      secret_key_ref:
        env: DATABRICKS_TOKEN
      timeout_seconds: 120  # stream timeout; size to your longest expected generation
policies:
  chain:
    - audit-logger
policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true

Advanced Configuration

Unity Catalog integration and DLP

When Databricks AI runs inside a Unity Catalog-governed workspace, sensitive table names, column names, and schema paths may leak into prompts. Add DLP patterns that match your catalog structure:

policy:
  dlp-filter:
    detect_patterns:
      - '[a-z_]+\.[a-z_]+\.[a-z_]+'
      - 'dbutils\.secrets\.get\([^)]+\)'
      - '0[0-9]{3}-[0-9]{6}-[a-z0-9]{8}'
    action: block
pack:
  name: databricks-example-4
  version: 1.0.0
  enabled: true
policies:
  chain:
    - dlp-filter
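Before deploying a pack like this, it is worth sanity-checking the patterns against sample strings. The snippet below is a local check using Python's re module, not part of Keeptrusts; the pattern names and samples are illustrative:

```python
import re

# The three detect_patterns from the config above, given illustrative names.
PATTERNS = {
    "three-part-path": r"[a-z_]+\.[a-z_]+\.[a-z_]+",       # catalog.schema.table shapes
    "dbutils-secret": r"dbutils\.secrets\.get\([^)]+\)",   # secret scope reads
    "workspace-id": r"0[0-9]{3}-[0-9]{6}-[a-z0-9]{8}",     # cluster/workspace id shapes
}

samples = [
    "SELECT * FROM sales.finance.q3_revenue",
    'token = dbutils.secrets.get(scope="prod", key="pat")',
    "cluster 0123-456789-abcd1234 is running",
    "nothing sensitive here",
]

for text in samples:
    hits = [name for name, pat in PATTERNS.items() if re.search(pat, text)]
    print(f"{text!r} -> {hits or 'clean'}")
```

Note that the three-part-path pattern is deliberately broad: it also matches strings like dbutils.secrets.get, so expect overlapping hits when tuning it for your catalog naming.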

RBAC for multi-team workspaces

Large Databricks deployments typically serve multiple teams with different model budgets. Map workspace groups to Keeptrusts roles and limit each role to cost-appropriate endpoints:

policy:
  rbac:
    roles:
      ai-platform:
        allowed_models:
          - databricks-meta-llama-3-1-405b-instruct
          - databricks-meta-llama-3-3-70b-instruct
          - databricks-dbrx-instruct
          - databricks-mixtral-8x7b-instruct
        max_tokens_per_request: 16384
      application-team:
        allowed_models:
          - databricks-meta-llama-3-3-70b-instruct
        max_tokens_per_request: 4096
      read-only:
        allowed_models: []
        action: block
pack:
  name: databricks-example-5
  version: 1.0.0
  enabled: true
policies:
  chain:
    - rbac
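Conceptually, the rbac policy applies a check like the sketch below on each request. This is an illustration of the semantics, not the Keeptrusts implementation; field names are simplified:

```python
# Simplified mirror of the rbac roles configured above.
ROLES = {
    "ai-platform": {
        "allowed_models": {
            "databricks-meta-llama-3-1-405b-instruct",
            "databricks-meta-llama-3-3-70b-instruct",
            "databricks-dbrx-instruct",
            "databricks-mixtral-8x7b-instruct",
        },
        "max_tokens": 16384,
    },
    "application-team": {
        "allowed_models": {"databricks-meta-llama-3-3-70b-instruct"},
        "max_tokens": 4096,
    },
    "read-only": {"allowed_models": set(), "max_tokens": 0},
}

def is_allowed(role: str, model: str, max_tokens: int) -> bool:
    """True if the role may call the model within its token budget."""
    r = ROLES.get(role)
    return bool(r) and model in r["allowed_models"] and max_tokens <= r["max_tokens"]

print(is_allowed("application-team", "databricks-meta-llama-3-3-70b-instruct", 1024))  # True
print(is_allowed("application-team", "databricks-dbrx-instruct", 1024))                # False
```

The empty allowed_models list for read-only means every model request from that role is rejected, which is how the pack fences off billing for groups that should never call serving endpoints directly.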

Zero-data-retention audit trail

Databricks Foundation Models store no customer data server-side. The Keeptrusts audit logger captures a local event record for every request, so you maintain a compliance trail without relying on provider storage:

policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true
providers:
  targets:
    - id: databricks-llama-70b
      provider: databricks:chat:databricks-meta-llama-3-3-70b-instruct
      base_url: https://{workspace}.azuredatabricks.net/serving-endpoints
      secret_key_ref:
        env: DATABRICKS_TOKEN
pack:
  name: databricks-example-6
  version: 1.0.0
  enabled: true
policies:
  chain:
    - audit-logger

Best Practices

  1. Pin to the workspace region closest to your data — Databricks serving endpoints are regional. Use a base_url that matches your primary data region to minimise latency and avoid cross-region data movement.

  2. Rotate PATs on a schedule — Databricks PATs do not expire by default. Set a 90-day rotation policy, use DATABRICKS_TOKEN in a secrets manager (Vault, AWS Secrets Manager), and inject at runtime rather than baking into config files.

  3. Use databricks-meta-llama-3-3-70b-instruct as the default tier — It offers the best cost/quality ratio for most enterprise tasks. Reserve the 405B endpoint for tasks that demonstrably require it and protect it with a role that requires explicit elevation.

  4. Block raw Databricks credentials in prompts — Add DLP patterns for PAT prefixes (dapi), JDBC connection strings, and secret scope references. A leaked token in a prompt can expose your entire workspace.

  5. Enable zero-data-retention flags in both Keeptrusts and Databricks — Set data_policy.zero_data_retention: true in Keeptrusts and confirm your Databricks workspace has Foundation Model API data retention disabled. Document this in your compliance evidence package.

  6. Test fallback with databricks-mixtral-8x7b-instruct — Use the Keeptrusts routing policy to fall back to Mixtral when the 70B endpoint times out under load. Mixtral is significantly cheaper and can handle most non-critical requests without quality loss.
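The fallback pattern in practice 6 can also be exercised client-side before you rely on the gateway's routing policy. The sketch below shows the idea with a stand-in call_model function; it is not a Keeptrusts API:

```python
# Client-side illustration of practice 6: try the primary endpoint,
# fall back to the cheaper Mixtral endpoint on timeout.
def with_fallback(call_model, primary: str, fallback: str, prompt: str) -> str:
    """Try the primary endpoint; on timeout, retry once on the fallback."""
    try:
        return call_model(primary, prompt)
    except TimeoutError:
        return call_model(fallback, prompt)

# Stub that simulates the 70B endpoint timing out under load.
def flaky(model: str, prompt: str) -> str:
    if model == "databricks-meta-llama-3-3-70b-instruct":
        raise TimeoutError("endpoint busy")
    return f"{model}: ok"

print(with_fallback(flaky,
                    "databricks-meta-llama-3-3-70b-instruct",
                    "databricks-mixtral-8x7b-instruct",
                    "hello"))
# -> databricks-mixtral-8x7b-instruct: ok
```

In production, prefer configuring this in the Keeptrusts routing policy so every client gets the same behavior without duplicating retry logic.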

For AI systems

  • Canonical terms: Keeptrusts gateway, Databricks, Databricks Model Serving, MosaicML, Unity Catalog, provider target, policy-config.yaml, provider: "databricks".
  • Config field names: provider, model, base_url, secret_key_ref.env: "DATABRICKS_TOKEN", format: "openai", timeout_seconds, health_probe.
  • Key behavior: Keeptrusts routes to Databricks Model Serving endpoints using PAT or OAuth token auth with OpenAI-compatible format.
  • Best next pages: AWS Bedrock integration, Together AI integration, Provider routing.

For engineers

  • Prerequisites: Databricks workspace with Model Serving endpoint deployed, Personal Access Token (DATABRICKS_TOKEN), kt CLI installed.
  • Start command: kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml.
  • Validate: curl http://localhost:41002/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"databricks-meta-llama-3-3-70b-instruct","messages":[{"role":"user","content":"hello"}]}'.
  • Base URL follows the Databricks workspace pattern: https://{workspace}.azuredatabricks.net/serving-endpoints on Azure, or https://{workspace}.cloud.databricks.com/serving-endpoints on AWS.
  • Configure a routing fallback to Mixtral 8x7B as a cost-effective backup for when 70B endpoints time out under load.

For leaders

  • Databricks Model Serving integrates with Unity Catalog governance — Keeptrusts adds policy enforcement on the request path that Unity Catalog does not cover.
  • Data stays within your Databricks workspace — no prompts leave your cloud account, satisfying data residency requirements.
  • Fallback from expensive 70B models to Mixtral 8x7B provides cost control without complete service degradation.
  • Keeptrusts audit logging complements Databricks system tables for end-to-end observability of AI workloads.

Next steps