Fireworks AI
Keeptrusts fronts Fireworks AI's inference API with full policy enforcement, audit logging, and real-time content filtering. Fireworks offers high-throughput inference for open-weight models behind an OpenAI-compatible API, making integration seamless — no format translation is needed.
Use this page when
- You need the exact command, config, API, or integration details for Fireworks AI.
- You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
- For a guided rollout instead of a reference page, use the linked workflow pages in Next steps.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Prerequisites
- Fireworks API key — obtain one from the Fireworks Console.
- Keeptrusts CLI — install `kt` (see the quickstart guide).
- Export your API key:
export FIREWORKS_API_KEY="fw_..."
Keeptrusts auto-detects FIREWORKS_API_KEY and the Fireworks base URL when provider is set to "fireworks".
Configuration
Create a policy-config.yaml with your provider targets:
pack:
  name: fireworks-gateway
  version: 1.0.0
  enabled: true
policies:
  chain:
    - prompt-injection
    - pii-detector
    - safety-filter
    - audit-logger
  policy:
    prompt-injection:
      threshold: 0.8
      action: block
    pii-detector:
      action: redact
    safety-filter:
      mode: strict
      action: block
    audit-logger:
      retention_days: 365
providers:
  strategy: single
  targets:
    - id: fireworks-llama-70b
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-70b-instruct
      base_url: https://api.fireworks.ai/inference/v1
      secret_key_ref:
        env: FIREWORKS_API_KEY
Start the gateway:
kt gateway run \
  --listen 0.0.0.0:8080 \
  --policy-config policy-config.yaml
Provider Fields
All fields available on a providers.targets[] entry for Fireworks AI:
| Field | Type | Default | Description |
|---|---|---|---|
| `id` | string | required | Unique identifier for this target |
| `provider` | string | required | Provider ID: `"fireworks"`, or the shorthand `fireworks:chat:<model>` (e.g. `fireworks:chat:accounts/fireworks/models/llama-v3p1-70b-instruct`) |
| `model` | string | required | Model path, e.g. `accounts/fireworks/models/llama-v3p1-70b-instruct` |
| `base_url` | string | `https://api.fireworks.ai/inference/v1` | API base URL (auto-detected for `fireworks`) |
| `secret_key_ref` | object | `env: FIREWORKS_API_KEY` | Object reference to the environment variable holding the API key |
| `timeout_seconds` | integer | 60 | Maximum time for non-streaming requests |
| `stream_timeout_seconds` | integer | none | Maximum time for streaming requests; falls back to `timeout_seconds` |
| `format` | string | `"openai"` | Wire format — Fireworks is natively OpenAI-compatible |
| `provider_type` | string | `"openai"` | Explicit provider type; Fireworks uses the OpenAI-compatible gateway |
| `description` | string | none | Human-readable description for dashboards and logs |
| `weight` | float | 1.0 | Routing weight for the `weighted_round_robin` strategy |
| `pricing` | object | none | Token pricing in USD per 1M tokens (`prompt`, `completion`) |
| `health_probe` | object | none | Active health probe configuration |
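As an illustration, a target combining several of the optional fields above might look like the following (the pricing numbers are placeholders, not actual Fireworks rates):

```yaml
- id: fireworks-llama-70b
  provider: fireworks
  model: accounts/fireworks/models/llama-v3p1-70b-instruct
  description: Primary 70B chat target
  timeout_seconds: 60
  stream_timeout_seconds: 300   # allow longer streaming generations
  weight: 1.0
  pricing:
    prompt: 0.90       # USD per 1M prompt tokens (placeholder)
    completion: 0.90   # USD per 1M completion tokens (placeholder)
  secret_key_ref:
    env: FIREWORKS_API_KEY
```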
Supported Models
| Model | Context Window | Notes |
|---|---|---|
| `accounts/fireworks/models/llama-v3p1-70b-instruct` | 128K | General purpose, high quality |
| `accounts/fireworks/models/llama-v3p1-8b-instruct` | 128K | Fast, cost-effective |
| `accounts/fireworks/models/mixtral-8x22b-instruct` | 64K | Mixture-of-experts, balanced |
| `accounts/fireworks/models/firefunction-v2` | 8K | Optimized for function calling |
| `accounts/fireworks/models/qwen2-72b-instruct` | 128K | Strong multilingual performance |
Any model available on the Fireworks API can be used — set the model field to the full model path. Keeptrusts passes the model identifier through to the upstream without validation.
Client Examples
Once the gateway is running, point your client's base URL at http://localhost:8080/v1 instead of https://api.fireworks.ai/inference/v1. Clients send standard OpenAI-format requests.
- Python
- Node.js
- cURL
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="unused",  # auth is handled by Keeptrusts via FIREWORKS_API_KEY
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p1-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What are the key principles of distributed systems?"},
    ],
    temperature=0.7,
    max_tokens=512,
)
print(response.choices[0].message.content)
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "unused", // auth handled by Keeptrusts via FIREWORKS_API_KEY
});

const response = await client.chat.completions.create({
  model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What are the key principles of distributed systems?" },
  ],
  temperature: 0.7,
  max_tokens: 512,
});
console.log(response.choices[0].message.content);
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "accounts/fireworks/models/llama-v3p1-70b-instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What are the key principles of distributed systems?"}
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'
Streaming
Keeptrusts fully supports Fireworks' streaming mode. Set stream: true in your request — the gateway applies policies to each chunk in real time.
pack:
  name: fireworks-providers-2
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: fireworks-streaming
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-70b-instruct
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
- Python
- cURL
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

stream = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p1-70b-instruct",
    messages=[{"role": "user", "content": "Write a haiku about machine learning."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "accounts/fireworks/models/llama-v3p1-70b-instruct",
    "messages": [{"role": "user", "content": "Write a haiku about machine learning."}],
    "stream": true
  }'
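With `-N`, curl prints the raw server-sent event stream: each event is a `data:` line carrying an OpenAI-format chunk, terminated by a `data: [DONE]` sentinel. A minimal sketch of extracting the text deltas from such a stream body (using a hardcoded sample payload; the real stream arrives over HTTP):

```python
import json

def extract_deltas(sse_text: str) -> str:
    """Collect content deltas from an OpenAI-format SSE stream body."""
    out = []
    for line in sse_text.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank separator lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            out.append(delta)
    return "".join(out)

# Sample stream body in the shape the gateway relays from Fireworks.
sample = (
    'data: {"choices":[{"delta":{"role":"assistant"}}]}\n'
    '\n'
    'data: {"choices":[{"delta":{"content":"Hello"}}]}\n'
    '\n'
    'data: {"choices":[{"delta":{"content":", world"}}]}\n'
    '\n'
    'data: [DONE]\n'
)
print(extract_deltas(sample))  # Hello, world
```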
Advanced Configuration
Multi-Model Fallback
Automatically fail over from the 70B model to the faster 8B model:
pack:
  name: fireworks-providers-3
  version: 1.0.0
  enabled: true
providers:
  strategy: fallback
  targets:
    - id: fireworks-70b-primary
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-70b-instruct
      secret_key_ref:
        env: FIREWORKS_API_KEY
    - id: fireworks-8b-fallback
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-8b-instruct
      secret_key_ref:
        env: FIREWORKS_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
Cross-Provider Fallback
Use Fireworks as the primary with OpenAI as a fallback:
pack:
  name: fireworks-providers-4
  version: 1.0.0
  enabled: true
providers:
  strategy: fallback
  targets:
    - id: fireworks-primary
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-70b-instruct
      secret_key_ref:
        env: FIREWORKS_API_KEY
    - id: openai-fallback
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
Function Calling
Fireworks offers firefunction-v2, a model optimized for tool/function calling:
pack:
  name: fireworks-providers-5
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: fireworks-functions
      provider: fireworks
      model: accounts/fireworks/models/firefunction-v2
      secret_key_ref:
        env: FIREWORKS_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

response = client.chat.completions.create(
    model="accounts/fireworks/models/firefunction-v2",
    messages=[{"role": "user", "content": "What's the weather in London?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"]
            }
        }
    }],
)
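When the model decides to call the tool, the reply carries a `tool_calls` entry in `response.choices[0].message` rather than text. A sketch of dispatching one such call to a local function, shown against a simulated payload in the standard OpenAI shape (`get_weather` here is a stub, not a real weather lookup):

```python
import json

def get_weather(location: str) -> str:
    # Stub implementation; a real tool would call a weather API.
    return f"Sunny in {location}"

TOOLS = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> str:
    """Route one OpenAI-format tool call to the matching local function."""
    fn = TOOLS[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return fn(**args)

# Simulated tool call in the shape found in response.choices[0].message.tool_calls.
call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"location": "London"}'},
}
print(dispatch(call))  # Sunny in London
```

The result would then be appended to the conversation as a `role: "tool"` message and sent back through the gateway for the model's final answer.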
Weighted A/B Testing
Split traffic across models:
pack:
  name: fireworks-providers-6
  version: 1.0.0
  enabled: true
providers:
  strategy: weighted_round_robin
  targets:
    - id: variant-70b
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-70b-instruct
      weight: 0.5
      secret_key_ref:
        env: FIREWORKS_API_KEY
    - id: variant-mixtral
      provider: fireworks
      model: accounts/fireworks/models/mixtral-8x22b-instruct
      weight: 0.5
      secret_key_ref:
        env: FIREWORKS_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
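How the split behaves can be sketched independently of the gateway: weighted selection sends each request to a target with probability proportional to its `weight`. This is an illustration of the concept, not Keeptrusts' internal routing algorithm:

```python
import random
from collections import Counter

def pick_target(targets, rng):
    """Choose a target id proportionally to its weight (illustrative only)."""
    ids = [t["id"] for t in targets]
    weights = [t["weight"] for t in targets]
    return rng.choices(ids, weights=weights, k=1)[0]

targets = [
    {"id": "variant-70b", "weight": 0.5},
    {"id": "variant-mixtral", "weight": 0.5},
]
rng = random.Random(0)  # seeded for reproducibility
counts = Counter(pick_target(targets, rng) for _ in range(1000))
# With equal weights, each variant receives roughly half the traffic.
print(counts)
```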
Circuit Breaker
Temporarily remove unhealthy targets from the rotation:
pack:
  name: fireworks-providers-7
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: fireworks-main
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-70b-instruct
      secret_key_ref:
        env: FIREWORKS_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
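The circuit-breaker settings themselves are not shown in the config above. Conceptually, a breaker takes a target out of rotation after a run of consecutive failures and lets a trial request through once a cooldown has elapsed. A minimal sketch of that state machine (illustrative only, not Keeptrusts' implementation):

```python
import time

class CircuitBreaker:
    """Opens after `max_failures` consecutive errors; half-opens after `cooldown` seconds."""

    def __init__(self, max_failures=3, cooldown=30.0):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def available(self, now=None):
        now = time.monotonic() if now is None else now
        if self.opened_at is None:
            return True  # closed: target is in rotation
        # Half-open: allow a trial request once the cooldown elapses.
        return now - self.opened_at >= self.cooldown

    def record_success(self):
        self.failures = 0
        self.opened_at = None  # close the breaker again

    def record_failure(self, now=None):
        now = time.monotonic() if now is None else now
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = now  # trip: remove target from rotation

cb = CircuitBreaker(max_failures=2, cooldown=30.0)
cb.record_failure(now=0.0)
cb.record_failure(now=1.0)     # second failure trips the breaker
print(cb.available(now=10.0))  # False — still in cooldown
print(cb.available(now=40.0))  # True — half-open, allow a probe
```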
Retry Policy
Retry transient failures automatically:
pack:
  name: fireworks-providers-8
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: fireworks-main
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-70b-instruct
      secret_key_ref:
        env: FIREWORKS_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
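The retry settings themselves are not shown in the config above. If you also want retries on the client side, exponential backoff with jitter is the standard pattern. A sketch, where `flaky` stands in for a request that fails transiently:

```python
import random
import time

def retry(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry `fn` on exception with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            sleep(delay)

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient upstream error")
    return "ok"

# sleep is stubbed out so the example runs instantly.
result = retry(flaky, sleep=lambda _: None)
print(result)  # ok (succeeds on the third attempt)
```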
Best Practices
- Fireworks is OpenAI-compatible — no format translation is needed. Use any OpenAI SDK client without code changes.
- Use full model paths — Fireworks model IDs follow the `accounts/fireworks/models/<name>` pattern.
- Use `firefunction-v2` for tool/function calling workloads — it is specifically optimized for structured output.
- Enable health probes on production targets so routing strategies can react to API outages.
- Prefer the `fallback` strategy for critical workloads; pair Fireworks with a second provider for resilience.
- Declare `pricing` even if approximate — it enables cost dashboards and per-request budget enforcement.
- Separate API keys per environment — use distinct `secret_key_ref` values for dev, staging, and production.
- Set `stream_timeout_seconds` for streaming workloads to accommodate longer generations.
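As an example of what declared `pricing` enables, per-request cost is just token usage multiplied by the per-1M-token rate (the rates below are placeholders, not actual Fireworks prices):

```python
def request_cost_usd(prompt_tokens, completion_tokens, pricing):
    """Cost of one request given per-1M-token USD rates."""
    return (prompt_tokens * pricing["prompt"]
            + completion_tokens * pricing["completion"]) / 1_000_000

# Placeholder rates matching the pricing object shape: {prompt, completion}.
pricing = {"prompt": 0.90, "completion": 0.90}
cost = request_cost_usd(prompt_tokens=1200, completion_tokens=512, pricing=pricing)
print(f"${cost:.6f}")
```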
For AI systems
- Canonical terms: Keeptrusts gateway, Fireworks AI, Fireworks, provider target, policy-config.yaml, `provider: "fireworks"`, function calling, optimized inference.
- Config field names: `provider`, `model`, `base_url: "https://api.fireworks.ai/inference/v1"`, `secret_key_ref.env: "FIREWORKS_API_KEY"`, `format: "openai"`, `stream_timeout_seconds`.
- Provider shorthand: `fireworks:chat:<model>` (e.g., `fireworks:chat:llama-v3p1-70b-instruct`).
- Best next pages: Together AI integration, Groq integration, Provider routing.
For engineers
- Prerequisites: Fireworks AI API key (`FIREWORKS_API_KEY` env var from fireworks.ai), `kt` CLI installed.
- Start command: `kt gateway run --listen 0.0.0.0:8080 --policy-config policy-config.yaml`.
- Validate: `curl http://localhost:8080/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"accounts/fireworks/models/llama-v3p1-70b-instruct","messages":[{"role":"user","content":"hello"}]}'`.
- Fireworks exposes an OpenAI-compatible API with function calling support — standard OpenAI SDKs work without modification.
- Use separate `secret_key_ref` values for dev, staging, and production API keys.
- Set `stream_timeout_seconds` for streaming workloads to accommodate longer generations.
For leaders
- Fireworks AI offers optimized inference with competitive latency and function calling support — suitable for agentic workloads.
- Per-token pricing varies by model; populate `pricing` fields for accurate cost dashboards.
- Fireworks supports fine-tuned model deployment — Keeptrusts policies apply uniformly to base and fine-tuned models.
- OpenAI-compatible format means switching between Fireworks and other providers requires only config changes, not code changes.
Next steps
- Together AI integration — alternative fast inference for open models
- Groq integration — ultra-low latency inference
- Provider routing strategies — fallback and weighted routing
- Policy configuration — prompt-injection and audit-logger reference
- Quickstart — install `kt` and run your first gateway