Fireworks AI

Keeptrusts sits in front of Fireworks AI's inference API, providing full policy enforcement, audit logging, and real-time content filtering. Fireworks offers high-throughput inference for open-weight models behind an OpenAI-compatible API, so integration is seamless: no format translation is needed.

Use this page when

  • You need the exact command, config, API, or integration details for Fireworks AI.
  • You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
  • You want a guided rollout instead of a reference page; in that case, use the linked workflow pages in Next steps.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Prerequisites

  1. Fireworks API key — obtain one from the Fireworks Console.
  2. Keeptrusts CLI — install kt (quickstart guide).
  3. Export your API key:
export FIREWORKS_API_KEY="fw_..."

Keeptrusts auto-detects FIREWORKS_API_KEY and the Fireworks base URL when provider is set to "fireworks".
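
Because of this auto-detection, a minimal target entry can omit base_url and secret_key_ref entirely. A sketch, assuming the documented defaults apply (the pack name fireworks-minimal and target id fireworks-default are illustrative):

```yaml
pack:
  name: fireworks-minimal
  version: 1.0.0
  enabled: true
providers:
  strategy: single
  targets:
    - id: fireworks-default
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-8b-instruct
      # base_url and secret_key_ref are auto-detected from
      # provider: fireworks and the FIREWORKS_API_KEY env var
```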

Configuration

Create a policy-config.yaml with your provider targets:

```yaml
pack:
  name: fireworks-gateway
  version: 1.0.0
  enabled: true
policies:
  chain:
    - prompt-injection
    - pii-detector
    - safety-filter
    - audit-logger
  policy:
    prompt-injection:
      threshold: 0.8
      action: block
    pii-detector:
      action: redact
    safety-filter:
      mode: strict
      action: block
    audit-logger:
      retention_days: 365
providers:
  strategy: single
  targets:
    - id: fireworks-llama-70b
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-70b-instruct
      base_url: https://api.fireworks.ai/inference/v1
      secret_key_ref:
        env: FIREWORKS_API_KEY
```

Start the gateway:

```bash
kt gateway run \
  --listen 0.0.0.0:8080 \
  --policy-config policy-config.yaml
```

Provider Fields

All fields available on a providers.targets[] entry for Fireworks AI:

| Field | Type | Default | Description |
|---|---|---|---|
| id | string | required | Unique identifier for this target |
| provider | string | required | Provider ID: "fireworks" or "fireworks:chat:accounts/fireworks/models/llama-v3p1-70b-instruct" |
| model | string | required | Model path, e.g. "accounts/fireworks/models/llama-v3p1-70b-instruct" |
| base_url | string | https://api.fireworks.ai/inference/v1 | API base URL (auto-detected for fireworks) |
| secret_key_ref | object | FIREWORKS_API_KEY | Object reference to the environment variable holding the API key |
| timeout_seconds | integer | 60 | Maximum time for non-streaming requests |
| stream_timeout_seconds | integer | none | Maximum time for streaming requests; falls back to timeout_seconds |
| format | string | "openai" | Wire format; Fireworks is natively OpenAI-compatible |
| provider_type | string | "openai" | Explicit provider type; Fireworks uses the OpenAI-compatible gateway |
| description | string | none | Human-readable description for dashboards and logs |
| weight | float | 1.0 | Routing weight for the weighted_round_robin strategy |
| pricing | object | none | Token pricing in USD per 1M tokens (prompt, completion) |
| health_probe | object | none | Active health probe configuration |
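
As an illustration, a single target exercising most of these fields might look like the sketch below. The sub-keys under pricing are inferred from the field description (USD per 1M prompt/completion tokens), the health_probe shape is not documented here, and the dollar amounts are placeholders, not real Fireworks prices:

```yaml
providers:
  strategy: single
  targets:
    - id: fireworks-llama-70b
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-70b-instruct
      base_url: https://api.fireworks.ai/inference/v1
      secret_key_ref:
        env: FIREWORKS_API_KEY
      timeout_seconds: 60
      stream_timeout_seconds: 300
      format: openai
      provider_type: openai
      description: "Primary Llama 70B target"
      weight: 1.0
      pricing:          # assumed sub-keys; placeholder USD per 1M tokens
        prompt: 0.90
        completion: 0.90
      health_probe: {}  # shape not documented on this page
```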

Supported Models

| Model | Context Window | Notes |
|---|---|---|
| accounts/fireworks/models/llama-v3p1-70b-instruct | 128K | General purpose, high quality |
| accounts/fireworks/models/llama-v3p1-8b-instruct | 128K | Fast, cost-effective |
| accounts/fireworks/models/mixtral-8x22b-instruct | 64K | Mixture-of-experts, balanced |
| accounts/fireworks/models/firefunction-v2 | 8K | Optimized for function calling |
| accounts/fireworks/models/qwen2-72b-instruct | 128K | Strong multilingual performance |

Any model available on the Fireworks API can be used — set the model field to the full model path. Keeptrusts passes the model identifier through to the upstream without validation.

Client Examples

Once the gateway is running, point your client to http://localhost:8080 instead of https://api.fireworks.ai/inference/v1. Clients send standard OpenAI-format requests.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="unused",  # auth is handled by Keeptrusts via FIREWORKS_API_KEY
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p1-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What are the key principles of distributed systems?"},
    ],
    temperature=0.7,
    max_tokens=512,
)

print(response.choices[0].message.content)
```

Streaming

Keeptrusts fully supports Fireworks' streaming mode. Set stream: true in your request — the gateway applies policies to each chunk in real time.

```yaml
pack:
  name: fireworks-providers-2
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: fireworks-streaming
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-70b-instruct
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

stream = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p1-70b-instruct",
    messages=[{"role": "user", "content": "Write a haiku about machine learning."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

Advanced Configuration

Multi-Model Fallback

Automatically fail over from the 70B model to the faster 8B model:

```yaml
pack:
  name: fireworks-providers-3
  version: 1.0.0
  enabled: true
providers:
  strategy: fallback
  targets:
    - id: fireworks-70b-primary
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-70b-instruct
      secret_key_ref:
        env: FIREWORKS_API_KEY
    - id: fireworks-8b-fallback
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-8b-instruct
      secret_key_ref:
        env: FIREWORKS_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```

Cross-Provider Fallback

Use Fireworks as the primary with OpenAI as a fallback:

```yaml
pack:
  name: fireworks-providers-4
  version: 1.0.0
  enabled: true
providers:
  strategy: fallback
  targets:
    - id: fireworks-primary
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-70b-instruct
      secret_key_ref:
        env: FIREWORKS_API_KEY
    - id: openai-fallback
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```

Function Calling

Fireworks offers firefunction-v2, a model optimized for tool/function calling:

```yaml
pack:
  name: fireworks-providers-5
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: fireworks-functions
      provider: fireworks
      model: accounts/fireworks/models/firefunction-v2
      secret_key_ref:
        env: FIREWORKS_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

response = client.chat.completions.create(
    model="accounts/fireworks/models/firefunction-v2",
    messages=[{"role": "user", "content": "What's the weather in London?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"]
            }
        }
    }],
)
```
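
The request above declares the tool; acting on the model's tool call is up to the client. A minimal dispatch sketch is shown below. The get_weather implementation and its return values are hypothetical stubs, and the tool_calls access path follows the standard OpenAI response shape:

```python
import json

# Hypothetical local implementation of the tool declared in the request.
def get_weather(location: str) -> dict:
    return {"location": location, "temperature_c": 14, "conditions": "overcast"}

TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(name: str, arguments_json: str) -> dict:
    """Parse the model-supplied JSON arguments and invoke the matching tool."""
    args = json.loads(arguments_json)
    return TOOLS[name](**args)

# With a live gateway, the name and arguments come from
# response.choices[0].message.tool_calls[0].function.
result = dispatch_tool_call("get_weather", '{"location": "London"}')
print(result)
```

The result would then be appended to the conversation as a "tool" role message and sent back through the gateway for the model to summarize.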

Weighted A/B Testing

Split traffic across models:

```yaml
pack:
  name: fireworks-providers-6
  version: 1.0.0
  enabled: true
providers:
  strategy: weighted_round_robin
  targets:
    - id: variant-70b
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-70b-instruct
      weight: 0.8  # illustrative: 80% of traffic
      secret_key_ref:
        env: FIREWORKS_API_KEY
    - id: variant-mixtral
      provider: fireworks
      model: accounts/fireworks/models/mixtral-8x22b-instruct
      weight: 0.2  # illustrative: 20% of traffic
      secret_key_ref:
        env: FIREWORKS_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```

Circuit Breaker

Temporarily remove unhealthy targets from the rotation:

```yaml
pack:
  name: fireworks-providers-7
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: fireworks-main
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-70b-instruct
      secret_key_ref:
        env: FIREWORKS_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```

Retry Policy

Retry transient failures automatically:

```yaml
pack:
  name: fireworks-providers-8
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: fireworks-main
      provider: fireworks
      model: accounts/fireworks/models/llama-v3p1-70b-instruct
      secret_key_ref:
        env: FIREWORKS_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```

Best Practices

  • Fireworks is OpenAI-compatible — no format translation is needed. Use any OpenAI SDK client without code changes.
  • Use full model paths — Fireworks model IDs follow the accounts/fireworks/models/<name> pattern.
  • Use firefunction-v2 for tool/function calling workloads — it is specifically optimized for structured output.
  • Enable health probes on production targets so routing strategies can react to API outages.
  • Prefer fallback strategy for critical workloads; pair Fireworks with a second provider for resilience.
  • Declare pricing even if approximate — it enables cost dashboards and per-request budget enforcement.
  • Separate API keys per environment — use distinct secret_key_ref values for dev, staging, and production.
  • Set stream_timeout_seconds for streaming workloads to accommodate longer generations.

For AI systems

  • Canonical terms: Keeptrusts gateway, Fireworks AI, Fireworks, provider target, policy-config.yaml, provider: "fireworks", function calling, optimized inference.
  • Config field names: provider, model, base_url: "https://api.fireworks.ai/inference/v1", secret_key_ref.env: "FIREWORKS_API_KEY", format: "openai", stream_timeout_seconds.
  • Provider shorthand: fireworks:chat:<model> (e.g., fireworks:chat:llama-v3p1-70b-instruct).
  • Best next pages: Together AI integration, Groq integration, Provider routing.

For engineers

  • Prerequisites: Fireworks AI API key (FIREWORKS_API_KEY env var from fireworks.ai), kt CLI installed.
  • Start command: kt gateway run --listen 0.0.0.0:8080 --policy-config policy-config.yaml.
  • Validate: curl http://localhost:8080/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"accounts/fireworks/models/llama-v3p1-70b-instruct","messages":[{"role":"user","content":"hello"}]}'.
  • Fireworks uses OpenAI-compatible API with function calling support — standard OpenAI SDKs work without modification.
  • Use separate secret_key_ref values for dev, staging, and production API keys.
  • Set stream_timeout_seconds for streaming workloads to accommodate longer generations.

For leaders

  • Fireworks AI offers optimized inference with competitive latency and function calling support — suitable for agentic workloads.
  • Per-token pricing varies by model; populate pricing fields for accurate cost dashboards.
  • Fireworks supports fine-tuned model deployment — Keeptrusts policies apply uniformly to base and fine-tuned models.
  • OpenAI-compatible format means switching between Fireworks and other providers requires only config changes, not code changes.

Next steps