Fireworks AI
Keeptrusts fronts Fireworks AI's inference API with full policy enforcement, audit logging, and real-time content filtering. Fireworks offers high-throughput inference for open-weight models behind an OpenAI-compatible API, making integration seamless — no format translation is needed.
Use this page when
- You need the exact command, config, API, or integration details for Fireworks AI.
- You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
- For a guided rollout instead of a reference page, use the linked workflow pages in Next steps.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Prerequisites
- Fireworks API key — obtain one from the Fireworks Console.
- Keeptrusts CLI — install `kt` (see the quickstart guide).
- Export your API key:
export FIREWORKS_API_KEY="fw_..."
Keeptrusts auto-detects FIREWORKS_API_KEY and the Fireworks base URL when provider is set to "fireworks".
Configuration
Create a policy-config.yaml with your provider targets:
pack:
  name: fireworks-gateway
  version: 1.0.0
  enabled: true
policies:
  chain:
    - prompt-injection
    - pii-detector
    - safety-filter
    - audit-logger
  policy:
    prompt-injection:
      threshold: 0.8
      action: block
    pii-detector:
      action: redact
    safety-filter:
      mode: strict
      action: block
    audit-logger:
      retention_days: 365
providers:
  strategy: single
  targets:
    - id: fireworks-llama-70b
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-70b-instruct
      base_url: https://api.fireworks.ai/inference/v1
      secret_key_ref:
        env: FIREWORKS_API_KEY
Start the gateway:
kt gateway run \
  --listen 0.0.0.0:8080 \
  --policy-config policy-config.yaml
Provider Fields
All fields available on a providers.targets[] entry for Fireworks AI:
| Field | Type | Default | Description |
|---|---|---|---|
| `id` | string | required | Unique identifier for this target |
| `provider` | string | required | Provider ID: `"fireworks"`, or the shorthand `fireworks:chat:<model>` (e.g. `fireworks:chat:accounts/fireworks/models/llama-v3p1-70b-instruct`) |
| `model` | string | required | Model path, e.g. `accounts/fireworks/models/llama-v3p1-70b-instruct` |
| `base_url` | string | `https://api.fireworks.ai/inference/v1` | API base URL (auto-detected for `fireworks`) |
| `secret_key_ref` | object | `env: FIREWORKS_API_KEY` | Object reference to the environment variable holding the API key |
| `timeout_seconds` | integer | 60 | Maximum time for non-streaming requests |
| `stream_timeout_seconds` | integer | none | Maximum time for streaming requests; falls back to `timeout_seconds` |
| `format` | string | `"openai"` | Wire format — Fireworks is natively OpenAI-compatible |
| `provider_type` | string | `"openai"` | Explicit provider type; Fireworks uses the OpenAI-compatible gateway |
| `description` | string | none | Human-readable description for dashboards and logs |
| `weight` | float | 1.0 | Routing weight for the `weighted_round_robin` strategy |
| `pricing` | object | none | Token pricing in USD per 1M tokens (`prompt`, `completion`) |
| `health_probe` | object | none | Active health probe configuration |
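As an illustration, a target combining several of the optional fields above might look like the following (the pricing numbers are placeholders, not actual Fireworks rates):

```yaml
- id: fireworks-llama-70b
  provider: fireworks
  model: accounts/fireworks/models/llama-v3p1-70b-instruct
  description: Primary 70B chat target
  timeout_seconds: 60
  stream_timeout_seconds: 300   # allow longer streaming generations
  weight: 1.0
  pricing:
    prompt: 0.90       # USD per 1M prompt tokens (placeholder)
    completion: 0.90   # USD per 1M completion tokens (placeholder)
  secret_key_ref:
    env: FIREWORKS_API_KEY
```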
Supported Models
| Model | Context Window | Notes |
|---|---|---|
| `accounts/fireworks/models/llama-v3p1-70b-instruct` | 128K | General purpose, high quality |
| `accounts/fireworks/models/llama-v3p1-8b-instruct` | 128K | Fast, cost-effective |
| `accounts/fireworks/models/mixtral-8x22b-instruct` | 64K | Mixture-of-experts, balanced |
| `accounts/fireworks/models/firefunction-v2` | 8K | Optimized for function calling |
| `accounts/fireworks/models/qwen2-72b-instruct` | 128K | Strong multilingual performance |
Any model available on the Fireworks API can be used — set the model field to the full model path. Keeptrusts passes the model identifier through to the upstream without validation.
Client Examples
Once the gateway is running, point your client's base URL at http://localhost:8080/v1 instead of https://api.fireworks.ai/inference/v1. Clients send standard OpenAI-format requests.
- Python
- Node.js
- cURL
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="unused",  # auth is handled by Keeptrusts via FIREWORKS_API_KEY
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p1-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What are the key principles of distributed systems?"},
    ],
    temperature=0.7,
    max_tokens=512,
)
print(response.choices[0].message.content)
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "unused", // auth handled by Keeptrusts via FIREWORKS_API_KEY
});

const response = await client.chat.completions.create({
  model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What are the key principles of distributed systems?" },
  ],
  temperature: 0.7,
  max_tokens: 512,
});
console.log(response.choices[0].message.content);
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "accounts/fireworks/models/llama-v3p1-70b-instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What are the key principles of distributed systems?"}
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'
Streaming
Keeptrusts fully supports Fireworks' streaming mode. Set stream: true in your request — the gateway applies policies to each chunk in real time.
pack:
  name: fireworks-providers-2
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: fireworks-streaming
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-70b-instruct
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
- Python
- cURL
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

stream = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p1-70b-instruct",
    messages=[{"role": "user", "content": "Write a haiku about machine learning."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "accounts/fireworks/models/llama-v3p1-70b-instruct",
    "messages": [{"role": "user", "content": "Write a haiku about machine learning."}],
    "stream": true
  }'
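With `-N`, curl prints the raw server-sent event stream: each event is a `data:` line carrying an OpenAI-format chunk, terminated by a `data: [DONE]` sentinel. A minimal sketch of extracting the text deltas from such a stream body (using a hardcoded sample payload; the real stream arrives over HTTP):

```python
import json

def extract_deltas(sse_text: str) -> str:
    """Collect content deltas from an OpenAI-format SSE stream body."""
    out = []
    for line in sse_text.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank separator lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            out.append(delta)
    return "".join(out)

# Sample stream body in the shape the gateway relays from Fireworks.
sample = (
    'data: {"choices":[{"delta":{"role":"assistant"}}]}\n'
    '\n'
    'data: {"choices":[{"delta":{"content":"Hello"}}]}\n'
    '\n'
    'data: {"choices":[{"delta":{"content":", world"}}]}\n'
    '\n'
    'data: [DONE]\n'
)
print(extract_deltas(sample))  # Hello, world
```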
Advanced Configuration
Multi-Model Fallback
Automatically fail over from the 70B model to the faster 8B model:
pack:
  name: fireworks-providers-3
  version: 1.0.0
  enabled: true
providers:
  strategy: fallback
  targets:
    - id: fireworks-70b-primary
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-70b-instruct
      secret_key_ref:
        env: FIREWORKS_API_KEY
    - id: fireworks-8b-fallback
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-8b-instruct
      secret_key_ref:
        env: FIREWORKS_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
Cross-Provider Fallback
Use Fireworks as the primary with OpenAI as a fallback:
pack:
  name: fireworks-providers-4
  version: 1.0.0
  enabled: true
providers:
  strategy: fallback
  targets:
    - id: fireworks-primary
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-70b-instruct
      secret_key_ref:
        env: FIREWORKS_API_KEY
    - id: openai-fallback
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
Function Calling
Fireworks offers firefunction-v2, a model optimized for tool/function calling:
pack:
  name: fireworks-providers-5
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: fireworks-functions
      provider: fireworks
      model: accounts/fireworks/models/firefunction-v2
      secret_key_ref:
        env: FIREWORKS_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

response = client.chat.completions.create(
    model="accounts/fireworks/models/firefunction-v2",
    messages=[{"role": "user", "content": "What's the weather in London?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"]
            }
        }
    }],
)
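When the model decides to call the tool, the reply carries a `tool_calls` entry in `response.choices[0].message` rather than text. A sketch of dispatching one such call to a local function, shown against a simulated payload in the standard OpenAI shape (`get_weather` here is a stub, not a real weather lookup):

```python
import json

def get_weather(location: str) -> str:
    # Stub implementation; a real tool would call a weather API.
    return f"Sunny in {location}"

TOOLS = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> str:
    """Route one OpenAI-format tool call to the matching local function."""
    fn = TOOLS[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return fn(**args)

# Simulated tool call in the shape found in response.choices[0].message.tool_calls.
call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"location": "London"}'},
}
print(dispatch(call))  # Sunny in London
```

The result would then be appended to the conversation as a `role: "tool"` message and sent back through the gateway for the model's final answer.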
Weighted A/B Testing
Split traffic across models:
pack:
  name: fireworks-providers-6
  version: 1.0.0
  enabled: true
providers:
  strategy: weighted_round_robin
  targets:
    - id: variant-70b
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-70b-instruct
      weight: 0.5
      secret_key_ref:
        env: FIREWORKS_API_KEY
    - id: variant-mixtral
      provider: fireworks
      model: accounts/fireworks/models/mixtral-8x22b-instruct
      weight: 0.5
      secret_key_ref:
        env: FIREWORKS_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
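How the split behaves can be sketched independently of the gateway: weighted selection sends each request to a target with probability proportional to its `weight`. This is an illustration of the concept, not Keeptrusts' internal routing algorithm:

```python
import random
from collections import Counter

def pick_target(targets, rng):
    """Choose a target id proportionally to its weight (illustrative only)."""
    ids = [t["id"] for t in targets]
    weights = [t["weight"] for t in targets]
    return rng.choices(ids, weights=weights, k=1)[0]

targets = [
    {"id": "variant-70b", "weight": 0.5},
    {"id": "variant-mixtral", "weight": 0.5},
]
rng = random.Random(0)  # seeded for reproducibility
counts = Counter(pick_target(targets, rng) for _ in range(1000))
# With equal weights, each variant receives roughly half the traffic.
print(counts)
```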
Circuit Breaker
Temporarily remove unhealthy targets from the rotation:
pack:
  name: fireworks-providers-7
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: fireworks-main
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-70b-instruct
      secret_key_ref:
        env: FIREWORKS_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
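The circuit-breaker settings themselves are not shown in the config above. Conceptually, a breaker takes a target out of rotation after a run of consecutive failures and lets a trial request through once a cooldown has elapsed. A minimal sketch of that state machine (illustrative only, not Keeptrusts' implementation):

```python
import time

class CircuitBreaker:
    """Opens after `max_failures` consecutive errors; half-opens after `cooldown` seconds."""

    def __init__(self, max_failures=3, cooldown=30.0):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def available(self, now=None):
        now = time.monotonic() if now is None else now
        if self.opened_at is None:
            return True  # closed: target is in rotation
        # Half-open: allow a trial request once the cooldown elapses.
        return now - self.opened_at >= self.cooldown

    def record_success(self):
        self.failures = 0
        self.opened_at = None  # close the breaker again

    def record_failure(self, now=None):
        now = time.monotonic() if now is None else now
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = now  # trip: remove target from rotation

cb = CircuitBreaker(max_failures=2, cooldown=30.0)
cb.record_failure(now=0.0)
cb.record_failure(now=1.0)     # second failure trips the breaker
print(cb.available(now=10.0))  # False — still in cooldown
print(cb.available(now=40.0))  # True — half-open, allow a probe
```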
Retry Policy
Retry transient failures automatically:
pack:
  name: fireworks-providers-8
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: fireworks-main
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-70b-instruct
      secret_key_ref:
        env: FIREWORKS_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
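The retry settings themselves are not shown in the config above. If you also want retries on the client side, exponential backoff with jitter is the standard pattern. A sketch, where `flaky` stands in for a request that fails transiently:

```python
import random
import time

def retry(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry `fn` on exception with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            sleep(delay)

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient upstream error")
    return "ok"

# sleep is stubbed out so the example runs instantly.
result = retry(flaky, sleep=lambda _: None)
print(result)  # ok (succeeds on the third attempt)
```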
Best Practices
- Fireworks is OpenAI-compatible — no format translation is needed. Use any OpenAI SDK client without code changes.
- Use full model paths — Fireworks model IDs follow the `accounts/fireworks/models/<name>` pattern.
- Use `firefunction-v2` for tool/function calling workloads — it is specifically optimized for structured output.
- Enable health probes on production targets so routing strategies can react to API outages.
- Prefer the `fallback` strategy for critical workloads; pair Fireworks with a second provider for resilience.
- Declare `pricing` even if approximate — it enables cost dashboards and per-request budget enforcement.
- Separate API keys per environment — use distinct `secret_key_ref` values for dev, staging, and production.
- Set `stream_timeout_seconds` for streaming workloads to accommodate longer generations.
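As an example of what declared `pricing` enables, per-request cost is just token usage multiplied by the per-1M-token rate (the rates below are placeholders, not actual Fireworks prices):

```python
def request_cost_usd(prompt_tokens, completion_tokens, pricing):
    """Cost of one request given per-1M-token USD rates."""
    return (prompt_tokens * pricing["prompt"]
            + completion_tokens * pricing["completion"]) / 1_000_000

# Placeholder rates matching the pricing object shape: {prompt, completion}.
pricing = {"prompt": 0.90, "completion": 0.90}
cost = request_cost_usd(prompt_tokens=1200, completion_tokens=512, pricing=pricing)
print(f"${cost:.6f}")
```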
For AI systems
- Canonical terms: Keeptrusts gateway, Fireworks AI, Fireworks, provider target, policy-config.yaml, `provider: "fireworks"`, function calling, optimized inference.
- Config field names: `provider`, `model`, `base_url: "https://api.fireworks.ai/inference/v1"`, `secret_key_ref.env: "FIREWORKS_API_KEY"`, `format: "openai"`, `stream_timeout_seconds`.
- Provider shorthand: `fireworks:chat:<model>` (e.g., `fireworks:chat:llama-v3p1-70b-instruct`).
- Best next pages: Together AI integration, Groq integration, Provider routing.
For engineers
- Prerequisites: Fireworks AI API key (`FIREWORKS_API_KEY` env var from fireworks.ai), `kt` CLI installed.
- Start command: `kt gateway run --listen 0.0.0.0:8080 --policy-config policy-config.yaml`.
- Validate: `curl http://localhost:8080/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"accounts/fireworks/models/llama-v3p1-70b-instruct","messages":[{"role":"user","content":"hello"}]}'`.
- Fireworks exposes an OpenAI-compatible API with function calling support — standard OpenAI SDKs work without modification.
- Use separate `secret_key_ref` values for dev, staging, and production API keys.
- Set `stream_timeout_seconds` for streaming workloads to accommodate longer generations.
For leaders
- Fireworks AI offers optimized inference with competitive latency and function calling support — suitable for agentic workloads.
- Per-token pricing varies by model; populate `pricing` fields for accurate cost dashboards.
- Fireworks supports fine-tuned model deployment — Keeptrusts policies apply uniformly to base and fine-tuned models.
- OpenAI-compatible format means switching between Fireworks and other providers requires only config changes, not code changes.
Next steps
- Together AI integration — alternative fast inference for open models
- Groq integration — ultra-low latency inference
- Provider routing strategies — fallback and weighted routing
- Policy configuration — prompt-injection and audit-logger reference
- Quickstart — install `kt` and run your first gateway