Databricks
Keeptrusts integrates with the Foundation Model APIs in Databricks Model Serving, giving you a policy enforcement layer over Llama, DBRX, and Mixtral models running inside your Databricks workspace. Because Databricks Foundation Models store no customer data by default, this integration is well suited to regulated workloads that require full audit trails without sacrificing zero-retention guarantees.
Use this page when
- You need the exact command, config, API, or integration details for Databricks.
- You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
- If you want a guided rollout instead of a reference page, use the linked workflow pages in Next steps.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Prerequisites
- A Databricks workspace (AWS, Azure, or GCP) with Model Serving enabled
- A Databricks personal access token (PAT) with CAN_USE permission on the served model endpoints
- kt CLI installed and authenticated (kt auth login)
Set your token before starting the gateway:
export DATABRICKS_TOKEN="dapi..."
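Optionally, confirm the token works before wiring it into the gateway. A minimal sketch using the Databricks REST API's serving-endpoints listing (replace {workspace} with your workspace hostname before running):
import json
import os
import urllib.request

# List the Model Serving endpoints visible to this PAT
# (GET /api/2.0/serving-endpoints).
url = "https://{workspace}.azuredatabricks.net/api/2.0/serving-endpoints"
req = urllib.request.Request(
    url, headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}
)
with urllib.request.urlopen(req) as resp:
    for endpoint in json.load(resp).get("endpoints", []):
        print(endpoint["name"])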
Configuration
Minimal — single Foundation Model endpoint
pack:
  name: databricks-providers-1
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: databricks-llama
      provider: databricks:chat:databricks-meta-llama-3-3-70b-instruct
      base_url: https://{workspace}.azuredatabricks.net/serving-endpoints
      secret_key_ref:
        env: DATABRICKS_TOKEN
policies:
  chain:
    - audit-logger
policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true
Full governance config
pack:
  name: databricks-enterprise
  version: 1.0.0
  enabled: true
policies:
  chain:
    - prompt-injection
    - pii-detector
    - dlp-filter
    - rbac
    - audit-logger
policy:
  rbac:
    roles:
      data-engineer:
        allowed_models:
          - databricks-meta-llama-3-3-70b-instruct
          - databricks-dbrx-instruct
        max_tokens_per_request: 4096
      data-scientist:
        allowed_models:
          - databricks-meta-llama-3-3-70b-instruct
          - databricks-meta-llama-3-1-405b-instruct
          - databricks-dbrx-instruct
          - databricks-mixtral-8x7b-instruct
        max_tokens_per_request: 8192
      analyst:
        allowed_models:
          - databricks-meta-llama-3-3-70b-instruct
        max_tokens_per_request: 2048
  dlp-filter:
    patterns:
      - name: databricks-pat
        regex: dapi[a-f0-9]{32}
        action: block
      - name: jdbc-connection-string
        regex: jdbc:databricks://[^\s]+
        action: redact
      - name: unity-catalog-path
        regex: '[a-z_]+\.[a-z_]+\.[a-z_]+'  # three-part catalog.schema.table path; adjust to your naming
        action: redact
  pii-detector:
    action: redact
    entities:
      - PERSON
      - EMAIL_ADDRESS
      - PHONE_NUMBER
providers:
  targets:
    - id: databricks-llama-70b
      provider: databricks:chat:databricks-meta-llama-3-3-70b-instruct
      base_url: https://{workspace}.azuredatabricks.net/serving-endpoints
      secret_key_ref:
        env: DATABRICKS_TOKEN
    - id: databricks-llama-405b
      provider: databricks:chat:databricks-meta-llama-3-1-405b-instruct
      base_url: https://{workspace}.azuredatabricks.net/serving-endpoints
      secret_key_ref:
        env: DATABRICKS_TOKEN
    - id: databricks-dbrx
      provider: databricks:chat:databricks-dbrx-instruct
      base_url: https://{workspace}.azuredatabricks.net/serving-endpoints
      secret_key_ref:
        env: DATABRICKS_TOKEN
    - id: databricks-embeddings
      provider: databricks
      model: databricks-bge-large-en
      base_url: https://{workspace}.azuredatabricks.net/serving-endpoints
      secret_key_ref:
        env: DATABRICKS_TOKEN
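The dlp-filter entries above are ordinary regular expressions, so you can check locally what each one catches before deploying the pack. A quick sketch (the token below is a fabricated placeholder, not a real credential):
import re

# Verify the DLP patterns against illustrative sample strings.
patterns = {
    "databricks-pat": r"dapi[a-f0-9]{32}",
    "jdbc-connection-string": r"jdbc:databricks://[^\s]+",
}
samples = [
    "token: dapi" + "a" * 32,  # fabricated placeholder PAT
    "url: jdbc:databricks://adb-1234567890123456.7.azuredatabricks.net:443/default",
]
for name, pattern in patterns.items():
    for sample in samples:
        if re.search(pattern, sample):
            print(f"{name} matches: {sample}")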
Provider Fields
| Field | Required | Description |
|---|---|---|
| provider | Yes | "databricks" or "databricks:chat:{model-endpoint-name}" |
| base_url | Yes | Your workspace serving endpoint URL: https://{workspace}.azuredatabricks.net/serving-endpoints (Azure shown; the hostname varies by cloud) |
| secret_key_ref | Yes | Environment variable holding the Databricks PAT (e.g. DATABRICKS_TOKEN) |
| model | No | Endpoint name when using the bare "databricks" provider ID |
| format | No | "openai" (default for Foundation Model APIs) |
| data_policy.zero_data_retention | No | true — Databricks Foundation Models do not store request/response data |
Supported Models
Models are served through Databricks Model Serving and billed per-token via Databricks Foundation Model APIs.
| Model Endpoint | Context Window | Input (per 1M) | Output (per 1M) | Notes |
|---|---|---|---|---|
| databricks-meta-llama-3-3-70b-instruct | 128k | $0.54 | $0.54 | Best price/performance; recommended default |
| databricks-meta-llama-3-1-405b-instruct | 128k | $5.00 | $15.00 | Highest-capability open-weight model |
| databricks-dbrx-instruct | 32k | $0.75 | $2.25 | Databricks flagship MoE |
| databricks-mixtral-8x7b-instruct | 32k | $0.50 | $1.00 | Fast MoE; cost-efficient for high volume |
| databricks-bge-large-en | 512 tokens | $0.10 | — | Embeddings only; 1024-dim vectors |
Pricing reflects Databricks' published rates. Actual charges depend on your workspace agreement and DBU pricing tier.
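For budgeting, a request's cost is just token counts multiplied by the per-1M rates in the table above. For example, a 2,000-token prompt with a 1,000-token completion on the 70B endpoint costs about $0.0016:
# Back-of-envelope cost from the per-1M-token rates in the table above.
def request_cost(input_tokens: int, output_tokens: int, in_rate: float, out_rate: float) -> float:
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# 2,000 input + 1,000 output tokens on databricks-meta-llama-3-3-70b-instruct:
print(f"${request_cost(2000, 1000, 0.54, 0.54):.4f}")  # $0.0016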
Client Examples
Start the gateway:
export DATABRICKS_TOKEN="dapi..."
kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml
Python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="unused",  # auth handled by Keeptrusts
)

# Chat completion
response = client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a data engineering expert."},
        {"role": "user", "content": "Write a PySpark query to compute 7-day rolling average sales by region."},
    ],
    max_tokens=1024,
    temperature=0.2,
)
print(response.choices[0].message.content)

# Embeddings
embedding = client.embeddings.create(
    model="databricks-bge-large-en",
    input="quarterly revenue by product line",
)
print(f"Vector dimensions: {len(embedding.data[0].embedding)}")
Node.js
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:41002/v1",
  apiKey: "unused",
});

// Chat completion
const response = await client.chat.completions.create({
  model: "databricks-meta-llama-3-3-70b-instruct",
  messages: [
    { role: "system", content: "You are a data engineering expert." },
    {
      role: "user",
      content: "Write a PySpark query to compute 7-day rolling average sales by region.",
    },
  ],
  max_tokens: 1024,
  temperature: 0.2,
});
console.log(response.choices[0].message.content);

// Embeddings
const embedding = await client.embeddings.create({
  model: "databricks-bge-large-en",
  input: "quarterly revenue by product line",
});
console.log("Vector dimensions:", embedding.data[0].embedding.length);
cURL
# Chat completion
curl -s http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "databricks-meta-llama-3-3-70b-instruct",
    "messages": [
      {"role": "system", "content": "You are a data engineering expert."},
      {"role": "user", "content": "Write a PySpark query for 7-day rolling average sales."}
    ],
    "max_tokens": 1024,
    "temperature": 0.2
  }' | jq '.choices[0].message.content'

# Embeddings
curl -s http://localhost:41002/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "databricks-bge-large-en",
    "input": "quarterly revenue by product line"
  }' | jq '.data[0].embedding[:5]'
Streaming
Databricks Foundation Model APIs support server-sent event (SSE) streaming. Keeptrusts passes streams through transparently after policy checks on the initial request.
from openai import OpenAI
client = OpenAI(base_url="http://localhost:41002/v1", api_key="unused")
stream = client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",
    messages=[{"role": "user", "content": "Explain Delta Lake ACID transactions step by step."}],
    max_tokens=2048,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
In your policy config, set a reasonable stream timeout on the provider target (timeout_seconds below is an illustrative value):
pack:
  name: databricks-providers-3
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: databricks-llama-70b
      provider: databricks:chat:databricks-meta-llama-3-3-70b-instruct
      base_url: https://{workspace}.azuredatabricks.net/serving-endpoints
      secret_key_ref:
        env: DATABRICKS_TOKEN
      timeout_seconds: 120
policies:
  chain:
    - audit-logger
policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true
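On the client side, the OpenAI SDK accepts its own timeout, which should be at least as long as the gateway's; 120 seconds here is an illustrative value:
from openai import OpenAI

# Per-client request timeout in seconds; pair it with the gateway-side
# timeout_seconds so long streams are not cut off by the client first.
client = OpenAI(base_url="http://localhost:41002/v1", api_key="unused", timeout=120)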
Advanced Configuration
Unity Catalog integration and DLP
When model serving runs inside a Unity Catalog-governed workspace, sensitive table names, column names, and schema paths can leak into prompts. Add DLP patterns that match your catalog structure:
policy:
  dlp-filter:
    detect_patterns:
      - '[a-z_]+\.[a-z_]+\.[a-z_]+'
      - dbutils\.secrets\.get\([^)]+\)
      - 0[0-9]{3}-[0-9]{6}-[a-z0-9]{8}
    action: block
pack:
  name: databricks-example-4
  version: 1.0.0
  enabled: true
policies:
  chain:
    - dlp-filter
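The three-part pattern is deliberately broad: it matches any lowercase catalog.schema.table path, but it can also hit unrelated dotted identifiers, so test it against representative prompts before setting action: block. A quick check (table names here are made up):
import re

# The first two are UC-style paths the pattern should catch; the last is a
# false positive (a dotted Python module name), which is why testing matters.
pattern = re.compile(r"[a-z_]+\.[a-z_]+\.[a-z_]+")
for text in ["main.sales.orders", "prod_finance.gl.journal_entries", "pyspark.sql.functions"]:
    print(text, "->", bool(pattern.search(text)))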
RBAC for multi-team workspaces
Large Databricks deployments typically serve multiple teams with different model budgets. Map workspace groups to Keeptrusts roles and limit each role to cost-appropriate endpoints:
policy:
  rbac:
    roles:
      ai-platform:
        allowed_models:
          - databricks-meta-llama-3-1-405b-instruct
          - databricks-meta-llama-3-3-70b-instruct
          - databricks-dbrx-instruct
          - databricks-mixtral-8x7b-instruct
        max_tokens_per_request: 16384
      application-team:
        allowed_models:
          - databricks-meta-llama-3-3-70b-instruct
        max_tokens_per_request: 4096
      read-only:
        allowed_models: []
    action: block
pack:
  name: databricks-example-5
  version: 1.0.0
  enabled: true
policies:
  chain:
    - rbac
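From the client, a denied model shows up as an HTTP error. A hedged sketch, assuming the gateway rejects disallowed models with a 4xx status (the exact code and payload are Keeptrusts-defined):
from openai import OpenAI, APIStatusError

client = OpenAI(base_url="http://localhost:41002/v1", api_key="unused")

# As an application-team caller, the 405B endpoint is outside allowed_models,
# so the rbac policy should reject this request before it reaches Databricks.
try:
    client.chat.completions.create(
        model="databricks-meta-llama-3-1-405b-instruct",
        messages=[{"role": "user", "content": "hello"}],
    )
except APIStatusError as err:
    print("blocked by policy:", err.status_code)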
Zero-data-retention audit trail
Databricks Foundation Models store no customer data server-side. The Keeptrusts audit logger captures a local event record for every request, so you maintain a compliance trail without relying on provider storage:
policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true
providers:
  targets:
    - id: databricks-llama-70b
      provider: databricks:chat:databricks-meta-llama-3-3-70b-instruct
      base_url: https://{workspace}.azuredatabricks.net/serving-endpoints
      secret_key_ref:
        env: DATABRICKS_TOKEN
pack:
  name: databricks-example-6
  version: 1.0.0
  enabled: true
policies:
  chain:
    - audit-logger
Best Practices
- Pin to the workspace region closest to your data — Databricks serving endpoints are regional. Use a base_url that matches your primary data region to minimise latency and avoid cross-region data movement.
- Rotate PATs on a schedule — Databricks PATs do not expire by default. Set a 90-day rotation policy, keep DATABRICKS_TOKEN in a secrets manager (Vault, AWS Secrets Manager), and inject it at runtime rather than baking it into config files.
- Use databricks-meta-llama-3-3-70b-instruct as the default tier — It offers the best cost/quality ratio for most enterprise tasks. Reserve the 405B endpoint for tasks that demonstrably require it and protect it with a role that requires explicit elevation.
- Block raw Databricks credentials in prompts — Add DLP patterns for PAT prefixes (dapi), JDBC connection strings, and secret scope references. A leaked token in a prompt can expose your entire workspace.
- Enable zero-data-retention flags in both Keeptrusts and Databricks — Set data_policy.zero_data_retention: true in Keeptrusts and confirm your Databricks workspace has Foundation Model API data retention disabled. Document this in your compliance evidence package.
- Test fallback with databricks-mixtral-8x7b-instruct — Use the Keeptrusts routing policy to fall back to Mixtral when the 70B endpoint times out under load. Mixtral is significantly cheaper and can handle most non-critical requests without quality loss; a client-side sketch of the same idea follows this list.
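Gateway-side routing is the right place for fallback, but the idea is easy to prototype from the client while you evaluate it. A minimal client-side sketch, assuming only the OpenAI SDK and the gateway endpoints configured above:
from openai import OpenAI, APIStatusError, APITimeoutError

client = OpenAI(base_url="http://localhost:41002/v1", api_key="unused", timeout=30)

def chat_with_fallback(messages):
    # Prefer the 70B endpoint; fall back to Mixtral on timeout or server error.
    for model in (
        "databricks-meta-llama-3-3-70b-instruct",
        "databricks-mixtral-8x7b-instruct",
    ):
        try:
            return client.chat.completions.create(
                model=model, messages=messages, max_tokens=1024
            )
        except (APITimeoutError, APIStatusError):
            continue
    raise RuntimeError("all Databricks endpoints failed")

print(chat_with_fallback([{"role": "user", "content": "hello"}]).choices[0].message.content)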
For AI systems
- Canonical terms: Keeptrusts gateway, Databricks, Databricks Model Serving, Mosaic ML, Unity Catalog, provider target, policy-config.yaml, provider: "databricks".
- Config field names: provider, model, base_url, secret_key_ref.env: "DATABRICKS_TOKEN", format: "openai", timeout_seconds, health_probe.
- Key behavior: Keeptrusts routes to Databricks Model Serving endpoints using PAT or OAuth token auth with the OpenAI-compatible format.
- Best next pages: AWS Bedrock integration, Together AI integration, Provider routing.
For engineers
- Prerequisites: Databricks workspace with a Model Serving endpoint deployed, personal access token (DATABRICKS_TOKEN), kt CLI installed.
- Start command: kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml.
- Validate: curl http://localhost:41002/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"databricks-meta-llama-3-3-70b-instruct","messages":[{"role":"user","content":"hello"}]}'.
- Base URL follows the Databricks workspace pattern: https://{workspace}.azuredatabricks.net/serving-endpoints (Azure shown; AWS and GCP workspaces use different hostnames).
- Use the fallback strategy with Mixtral 8x7B as a cost-effective fallback when 70B endpoints time out under load.
For leaders
- Databricks Model Serving integrates with Unity Catalog governance — Keeptrusts adds policy enforcement on the request path that Unity Catalog does not cover.
- Data stays within your Databricks workspace — no prompts leave your cloud account, satisfying data residency requirements.
- Fallback from expensive 70B models to Mixtral 8x7B provides cost control without complete service degradation.
- Keeptrusts audit logging complements Databricks system tables for end-to-end observability of AI workloads.
Next steps
- AWS Bedrock integration — alternative managed model hosting on AWS
- Together AI integration — hosted open models with broader selection
- Provider routing strategies — fallback and load-based routing
- Policy configuration — prompt-injection and audit-logger reference
- Quickstart — install kt and run your first gateway