
Weights & Biases (W&B)

Keeptrusts integrates with Weights & Biases (W&B) by combining W&B's experiment tracking and model observability with Keeptrusts gateway governance events. When your ML pipeline makes LLM calls through the Keeptrusts gateway, you log both the W&B experiment metrics and the Keeptrusts policy decisions in a single workflow — giving your team unified visibility into model performance and compliance posture.

Use this page when

  • You are combining W&B experiment tracking with Keeptrusts governance events.
  • You need the gateway config and Python integration pattern for dual logging.
  • You want governance metadata (policy decisions, blocked requests) alongside W&B metrics.
  • If you want a general quickstart instead, see Quickstart.

Primary audience

  • Primary: Technical Engineers
  • Secondary: AI Agents, Technical Leaders

Prerequisites

  • A W&B account (free tier or enterprise)
  • W&B Python SDK installed (pip install wandb)
  • Keeptrusts CLI (kt) installed and authenticated (kt auth login)
  • An upstream LLM provider key exported as an environment variable
  • The openai Python SDK installed (pip install openai)

Configuration

Gateway policy config

pack:
  name: wandb-governed-experiments
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: experiment-llm
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
policies:
  chain:
    - prompt-injection
    - pii-detector
    - audit-logger
  policy:
    prompt-injection:
      threshold: 0.8
      action: block
    pii-detector:
      action: redact
      entities:
        - PERSON
        - EMAIL_ADDRESS
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true

Start the gateway

export OPENAI_API_KEY="sk-..."
kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml

Setup steps

1. Initialize W&B and the OpenAI client

import wandb
from openai import OpenAI

wandb.init(project="keeptrusts-governed-llm", config={
    "gateway": "http://localhost:41002/v1",
    "model": "gpt-4o",
    "policies": ["prompt-injection", "pii-detector", "audit-logger"],
})

client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="unused",  # placeholder; the gateway injects the real key via secret_key_ref
)

2. Log governed LLM calls to W&B

import time

prompts = [
    "Summarize the Q4 earnings report.",
    "Draft a customer outreach email.",
    "Explain our refund policy in simple terms.",
]

for i, prompt in enumerate(prompts):
    start = time.time()

    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            max_tokens=512,
            temperature=0.3,
        )
        latency = time.time() - start
        output = response.choices[0].message.content
        tokens_used = response.usage.total_tokens

        wandb.log({
            "prompt_index": i,
            "latency_seconds": latency,
            "tokens_used": tokens_used,
            "status": "allowed",
            "output_length": len(output),
        })

    except Exception as e:
        latency = time.time() - start
        wandb.log({
            "prompt_index": i,
            "latency_seconds": latency,
            "status": "blocked",
            "error": str(e),
        })

wandb.finish()

3. Create a W&B dashboard for governance metrics

In the W&B UI, create a dashboard that tracks:

  • Allowed vs. blocked requests — bar chart on status
  • Latency distribution — histogram on latency_seconds
  • Token usage over time — line chart on tokens_used

This gives your ML team a single pane for both experiment quality and policy compliance.
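As a sanity check before building the dashboard, the same aggregates the panels display can be computed locally from per-request records (a minimal sketch; the record shape mirrors the wandb.log calls in step 2, and the sample values are illustrative):

```python
from statistics import mean

# Per-request records with the same keys passed to wandb.log in step 2.
records = [
    {"status": "allowed", "latency_seconds": 0.82, "tokens_used": 310},
    {"status": "allowed", "latency_seconds": 1.10, "tokens_used": 455},
    {"status": "blocked", "latency_seconds": 0.05},  # blocked requests have no usage
]

allowed = [r for r in records if r["status"] == "allowed"]
blocked = [r for r in records if r["status"] == "blocked"]

summary = {
    "allowed_count": len(allowed),
    "blocked_count": len(blocked),
    "block_rate": len(blocked) / len(records),
    "mean_latency_seconds": mean(r["latency_seconds"] for r in records),
    "total_tokens": sum(r["tokens_used"] for r in allowed),
}
print(summary)
```

If these numbers disagree with what the W&B panels show, check that every request path (allowed and blocked) actually reaches a wandb.log call.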

Verification

# Verify gateway is healthy
curl http://localhost:41002/health

# Run the Python script above
python governed_experiment.py

# Check W&B dashboard for logged metrics
# https://wandb.ai/<your-org>/<project>/runs

# Check Keeptrusts audit log
kt events list --limit 10

Recommended policy settings

| Policy | Purpose | Recommended setting |
| --- | --- | --- |
| pii-detector | Redact personal data from experiment prompts | action: redact, entities: PERSON, EMAIL_ADDRESS |
| prompt-injection | Block adversarial inputs in experiment datasets | threshold: 0.8, action: block |
| audit-logger | Compliance trail for all experiment LLM calls | retention_days: 365, immutable: true |
| rbac | Restrict model access by team or experiment tier | Map W&B teams to Keeptrusts roles |
| cost-tracker | Monitor token spend per experiment run | Track tokens_used alongside W&B cost metrics |
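The rbac and cost-tracker policies do not appear in the gateway config earlier on this page. A sketch of how they might be appended to the chain follows — the field names under these two policies are assumptions for illustration, so check the Policy controls catalog for the exact schema:

```yaml
policies:
  chain:
    - prompt-injection
    - pii-detector
    - audit-logger
    - rbac
    - cost-tracker
  policy:
    rbac:
      # Hypothetical mapping of a role to permitted targets.
      roles:
        ml-team: [experiment-llm]
    cost-tracker:
      # Hypothetical flag; pairs with the tokens_used metric logged to W&B.
      track_tokens: true
```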

Troubleshooting

| Symptom | Cause | Fix |
| --- | --- | --- |
| W&B logs show status: blocked for all requests | Policy chain too restrictive | Test with audit-logger only, then add policies incrementally |
| Missing tokens_used in W&B | Response does not include a usage object | Ensure stream=False (usage is not returned in streamed responses) |
| wandb.init() hangs | W&B API key not set | Run wandb login or export WANDB_API_KEY |
| Gateway returns 502 | Upstream provider unreachable | Check OPENAI_API_KEY and provider connectivity |
| Latency spikes in W&B charts | Policy evaluation overhead | Move pii-detector to async mode if supported; ensure the gateway is co-located with the workload |
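For transient 502s from the gateway, a client-side retry with exponential backoff usually suffices. A minimal sketch with a stubbed call — in the real script, attempt would wrap the client.chat.completions.create call:

```python
import time

def with_retries(attempt, max_tries=3, base_delay=0.1):
    """Retry a callable on exception with exponential backoff."""
    for n in range(max_tries):
        try:
            return attempt()
        except Exception:
            if n == max_tries - 1:
                raise  # out of retries; surface the error to the caller
            time.sleep(base_delay * (2 ** n))  # 0.1s, 0.2s, ...

# Stub that fails twice (like two 502s), then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("502 Bad Gateway")
    return "ok"

print(with_retries(flaky))  # succeeds on the third attempt
```

Note that policy blocks are deterministic, so retrying them only adds latency; apply retries to 5xx upstream errors, not to requests the gateway rejected with status blocked.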

For AI systems

  • Canonical terms: Keeptrusts gateway, Weights & Biases, W&B, experiment tracking, MLOps observability, governance events, policy-config.yaml.
  • Config field names: provider, secret_key_ref.env, pii-detector, audit-logger.
  • Key behavior: W&B logs experiment metrics while Keeptrusts enforces policies on the same LLM calls. Both systems see every request — W&B for observability, Keeptrusts for governance.
  • Best next pages: MLflow integration, LangSmith integration, Policy controls catalog.

For engineers

Prerequisites

  • W&B account, wandb and openai Python SDKs, kt CLI installed.

Validation

  • Run the experiment script and verify metrics appear in W&B.
  • Run kt events list --limit 10 and verify all experiment requests are logged.
  • Confirm that PII in prompts appears redacted in Keeptrusts audit logs but experiment metrics still flow to W&B.

For leaders

  • W&B tracks model quality; Keeptrusts tracks compliance. Together they give ML leadership a unified view of whether experiments are both performing well and following governance rules.
  • Audit trails in Keeptrusts prove that every experiment LLM call was policy-checked, which satisfies internal AI governance committee requirements.
  • Blocked-request metrics in W&B help identify which datasets or prompt templates trigger policy violations, enabling proactive cleanup before production deployment.

Next steps