
Weights & Biases (W&B)

Keeptrusts integrates with Weights & Biases (W&B) by combining W&B's experiment tracking and model observability with Keeptrusts gateway governance events. When your ML pipeline makes LLM calls through the Keeptrusts gateway, you log both the W&B experiment metrics and the Keeptrusts policy decisions in a single workflow — giving your team unified visibility into model performance and compliance posture.

Use this page when

  • You are combining W&B experiment tracking with Keeptrusts governance events.
  • You need the gateway config and Python integration pattern for dual logging.
  • You want governance metadata (policy decisions, blocked requests) alongside W&B metrics.
  • If you want a general quickstart instead, see Quickstart.

Primary audience

  • Primary: Technical Engineers
  • Secondary: AI Agents, Technical Leaders

Prerequisites

  • A W&B account (free tier or enterprise)
  • W&B Python SDK installed (pip install wandb)
  • Keeptrusts CLI (kt) installed and authenticated (kt auth login)
  • An upstream LLM provider key exported as an environment variable
  • The openai Python SDK installed (pip install openai)

Configuration

Gateway policy config

pack:
  name: wandb-governed-experiments
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: experiment-llm
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
policies:
  chain:
    - prompt-injection
    - pii-detector
    - audit-logger
  policy:
    prompt-injection:
      threshold: 0.8
      action: block
    pii-detector:
      action: redact
      entities:
        - PERSON
        - EMAIL_ADDRESS
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true

Start the gateway

export OPENAI_API_KEY="sk-..."
kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml

Setup steps

1. Initialize W&B and the OpenAI client

import wandb
from openai import OpenAI

wandb.init(project="keeptrusts-governed-llm", config={
    "gateway": "http://localhost:41002/v1",
    "model": "gpt-4o",
    "policies": ["prompt-injection", "pii-detector", "audit-logger"],
})

client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="unused",  # placeholder; the gateway injects the real key via secret_key_ref
)

2. Log governed LLM calls to W&B

import time

prompts = [
    "Summarize the Q4 earnings report.",
    "Draft a customer outreach email.",
    "Explain our refund policy in simple terms.",
]

for i, prompt in enumerate(prompts):
    start = time.time()

    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            max_tokens=512,
            temperature=0.3,
        )
        latency = time.time() - start
        output = response.choices[0].message.content
        tokens_used = response.usage.total_tokens

        wandb.log({
            "prompt_index": i,
            "latency_seconds": latency,
            "tokens_used": tokens_used,
            "status": "allowed",
            "output_length": len(output),
        })

    except Exception as e:
        latency = time.time() - start
        wandb.log({
            "prompt_index": i,
            "latency_seconds": latency,
            "status": "blocked",
            "error": str(e),
        })

wandb.finish()

3. Create a W&B dashboard for governance metrics

In the W&B UI, create a dashboard that tracks:

  • Allowed vs. blocked requests — bar chart on status
  • Latency distribution — histogram on latency_seconds
  • Token usage over time — line chart on tokens_used

This gives your ML team a single pane for both experiment quality and policy compliance.
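As a sanity check before building the dashboard, the same aggregates the panels display can be computed locally from per-request records (a minimal sketch; the record shape mirrors the wandb.log calls in step 2, and the sample values are illustrative):

```python
from statistics import mean

# Per-request records with the same keys passed to wandb.log in step 2.
records = [
    {"status": "allowed", "latency_seconds": 0.82, "tokens_used": 310},
    {"status": "allowed", "latency_seconds": 1.10, "tokens_used": 455},
    {"status": "blocked", "latency_seconds": 0.05},  # blocked requests have no usage
]

allowed = [r for r in records if r["status"] == "allowed"]
blocked = [r for r in records if r["status"] == "blocked"]

summary = {
    "allowed_count": len(allowed),
    "blocked_count": len(blocked),
    "block_rate": len(blocked) / len(records),
    "mean_latency_seconds": mean(r["latency_seconds"] for r in records),
    "total_tokens": sum(r["tokens_used"] for r in allowed),
}
print(summary)
```

If these numbers disagree with what the W&B panels show, check that every request path (allowed and blocked) actually reaches a wandb.log call.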

Verification

# Verify gateway is healthy
curl http://localhost:41002/health

# Run the Python script above
python governed_experiment.py

# Check W&B dashboard for logged metrics
# https://wandb.ai/<your-org>/<project>/runs

# Check Keeptrusts audit log
kt events list --limit 10

Recommended policy settings

| Policy | Purpose | Recommended setting |
| --- | --- | --- |
| pii-detector | Redact personal data from experiment prompts | action: redact, entities: PERSON, EMAIL_ADDRESS |
| prompt-injection | Block adversarial inputs in experiment datasets | threshold: 0.8, action: block |
| audit-logger | Compliance trail for all experiment LLM calls | retention_days: 365, immutable: true |
| rbac | Restrict model access by team or experiment tier | Map W&B teams to Keeptrusts roles |
| cost-tracker | Monitor token spend per experiment run | Track tokens_used alongside W&B cost metrics |
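The rbac and cost-tracker policies do not appear in the gateway config earlier on this page. A sketch of how they might be appended to the chain follows — the field names under these two policies are assumptions for illustration, so check the Policy controls catalog for the exact schema:

```yaml
policies:
  chain:
    - prompt-injection
    - pii-detector
    - audit-logger
    - rbac
    - cost-tracker
  policy:
    rbac:
      # Hypothetical mapping of a role to permitted targets.
      roles:
        ml-team: [experiment-llm]
    cost-tracker:
      # Hypothetical flag; pairs with the tokens_used metric logged to W&B.
      track_tokens: true
```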

Troubleshooting

| Symptom | Cause | Fix |
| --- | --- | --- |
| W&B logs show status: blocked for all requests | Policy chain too restrictive | Test with audit-logger only, then add policies incrementally |
| Missing tokens_used in W&B | Response does not include a usage object | Ensure stream=False (usage is not returned in streamed responses) |
| wandb.init() hangs | W&B API key not set | Run wandb login or export WANDB_API_KEY |
| Gateway returns 502 | Upstream provider unreachable | Check OPENAI_API_KEY and provider connectivity |
| Latency spikes in W&B charts | Policy evaluation overhead | Move pii-detector to async mode if supported; ensure the gateway is co-located with the workload |
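For transient 502s from the gateway, a client-side retry with exponential backoff usually suffices. A minimal sketch with a stubbed call — in the real script, attempt would wrap the client.chat.completions.create call:

```python
import time

def with_retries(attempt, max_tries=3, base_delay=0.1):
    """Retry a callable on exception with exponential backoff."""
    for n in range(max_tries):
        try:
            return attempt()
        except Exception:
            if n == max_tries - 1:
                raise  # out of retries; surface the error to the caller
            time.sleep(base_delay * (2 ** n))  # 0.1s, 0.2s, ...

# Stub that fails twice (like two 502s), then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("502 Bad Gateway")
    return "ok"

print(with_retries(flaky))  # succeeds on the third attempt
```

Note that policy blocks are deterministic, so retrying them only adds latency; apply retries to 5xx upstream errors, not to requests the gateway rejected with status blocked.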

For AI systems

  • Canonical terms: Keeptrusts gateway, Weights & Biases, W&B, experiment tracking, MLOps observability, governance events, policy-config.yaml.
  • Config field names: provider, secret_key_ref.env, pii-detector, audit-logger.
  • Key behavior: W&B logs experiment metrics while Keeptrusts enforces policies on the same LLM calls. Both systems see every request — W&B for observability, Keeptrusts for governance.
  • Best next pages: MLflow integration, LangSmith integration, Policy controls catalog.

For engineers

Prerequisites

  • W&B account, wandb and openai Python SDKs, kt CLI installed.

Validation

  • Run the experiment script and verify metrics appear in W&B.
  • Run kt events list --limit 10 and verify all experiment requests are logged.
  • Confirm that PII in prompts appears redacted in Keeptrusts audit logs but experiment metrics still flow to W&B.

For leaders

  • W&B tracks model quality; Keeptrusts tracks compliance. Together they give ML leadership a unified view of whether experiments are both performing well and following governance rules.
  • Audit trails in Keeptrusts prove that every experiment LLM call was policy-checked, which satisfies internal AI governance committee requirements.
  • Blocked-request metrics in W&B help identify which datasets or prompt templates trigger policy violations, enabling proactive cleanup before production deployment.

Next steps