AIML API
AIML API aggregates 200+ AI models — including GPT-4o, Claude, Llama, Gemini, DeepSeek, and more — through a unified OpenAI-compatible endpoint. Keeptrusts routes AIML API requests through its policy engine, enabling governance, PII redaction, prompt-injection detection, and audit logging across the full catalog of hosted models with zero application-side changes.
Use this page when
- You need the exact command, config, API, or integration details for AIML API.
- You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
- If you want a guided rollout instead of a reference page, use the linked workflow pages in Next steps.
Because AIML API uses the OpenAI wire format, any OpenAI SDK client can be pointed at the Keeptrusts gateway without code changes. The gateway handles authentication, policy enforcement, and optional fallback routing transparently.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Prerequisites
- AIML API key — obtain one from AIML API.
- Keeptrusts CLI — install `kt` (quickstart guide).
- Export your API key so the gateway can read it at startup:

```shell
export AIMLAPI_KEY="your-aimlapi-key"
```
When the provider field is set to "aimlapi", Keeptrusts auto-detects both the base URL (https://api.aimlapi.com/v1) and the API key environment variable (AIMLAPI_KEY). You only need to override these if you use a non-standard env-var name.
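Relying on that auto-detection, a minimal target can omit `base_url` and `secret_key_ref` entirely. A sketch (the full explicit form appears under Configuration):

```yaml
providers:
  targets:
    - id: aimlapi-gpt4o
      provider: aimlapi
      model: gpt-4o
```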
Configuration
A minimal policy-config.yaml that routes traffic through AIML API with prompt-injection, PII, and safety policies:
```yaml
pack:
  name: aimlapi-gateway
  version: 1.0.0
  enabled: true

policies:
  chain:
    - prompt-injection
    - pii-detector
    - safety-filter
    - audit-logger
  policy:
    prompt-injection:
      threshold: 0.8
      action: block
    pii-detector:
      action: redact
    safety-filter:
      mode: strict
      action: block
    audit-logger:
      retention_days: 365

providers:
  strategy: single
  targets:
    - id: aimlapi-gpt4o
      provider: aimlapi
      model: gpt-4o
      base_url: https://api.aimlapi.com/v1
      secret_key_ref:
        env: AIMLAPI_KEY
```
Start the gateway:
```shell
kt gateway run \
  --listen 0.0.0.0:8080 \
  --policy-config policy-config.yaml
```
Compact Provider Shorthand
You can encode the model directly in the provider field. The two forms below are equivalent:
```yaml
# Shorthand — model embedded in the provider string
- id: "aimlapi-gpt4o"
  provider: "aimlapi:chat:gpt-4o"

# Explicit — separate provider and model fields
- id: "aimlapi-gpt4o"
  provider: "aimlapi"
  model: "gpt-4o"
```
Provider Fields
All fields available on a providers.targets[] entry for AIML API:
| Field | Type | Default | Description |
|---|---|---|---|
| `id` | string | required | Unique identifier for this target. Used in logs, the console dashboard, and routing decisions. |
| `provider` | string | required | Provider ID. Use `"aimlapi"` or the shorthand `"aimlapi:chat:<model>"`. |
| `model` | string | required | Model name, e.g. `"gpt-4o"` or `"meta-llama/Llama-3.3-70B-Instruct-Turbo"`. Passed through to the upstream API as-is. |
| `base_url` | string | `https://api.aimlapi.com/v1` | API base URL. Auto-detected when provider is `"aimlapi"`. Override for custom routing. |
| `secret_key_ref` | object | `AIMLAPI_KEY` | Object reference to the environment variable holding the AIML API key. Auto-detected for the `"aimlapi"` provider. |
| `format` | string | `"openai"` | Wire format. AIML API is OpenAI-compatible, so this is always `"openai"`. |
| `timeout_seconds` | integer | 60 | Maximum wall-clock time for non-streaming requests before the gateway returns a timeout error. |
| `stream_timeout_seconds` | integer | inherits `timeout_seconds` | Maximum wall-clock time for streaming requests. Set higher than `timeout_seconds` for long generations. |
| `max_context_tokens` | integer | none | Maximum token budget for a request. The gateway rejects requests that exceed this limit before forwarding. |
| `description` | string | none | Human-readable label shown in the console dashboard, logs, and health-check output. |
| `weight` | float | 1.0 | Routing weight used by the `weighted_round_robin` strategy. |
| `pricing` | object | none | Token pricing in USD per 1M tokens. Fields: `prompt` (input cost), `completion` (output cost). |
| `health_probe` | object | none | Active health probe. Sub-fields: `enabled` (bool), `interval_seconds` (int), `timeout_seconds` (int). |
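Putting the optional fields together, a fully populated target might look like the sketch below. The pricing values are placeholders for illustration, not real AIML API rates:

```yaml
providers:
  strategy: single
  targets:
    - id: aimlapi-gpt4o-full
      provider: aimlapi
      model: gpt-4o
      description: "Primary GPT-4o target via AIML API"
      format: openai
      timeout_seconds: 60
      stream_timeout_seconds: 300
      max_context_tokens: 120000
      weight: 1.0
      pricing:
        prompt: 2.50       # USD per 1M input tokens — placeholder rate
        completion: 10.00  # USD per 1M output tokens — placeholder rate
      health_probe:
        enabled: true
        interval_seconds: 30
        timeout_seconds: 5
      secret_key_ref:
        env: AIMLAPI_KEY
```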
Supported Models
AIML API aggregates 200+ models. A representative selection:
| Model | Category | Notes |
|---|---|---|
| `gpt-4o` | OpenAI | Latest multimodal GPT-4o |
| `claude-3-5-sonnet-20241022` | Anthropic | Anthropic's strongest coding and reasoning model |
| `meta-llama/Llama-3.3-70B-Instruct-Turbo` | Meta | High-throughput open-weight Llama 3.3 |
| `gemini-2.0-flash` | Google | Fast, cost-efficient Gemini 2.0 |
| `deepseek-r1` | DeepSeek | Strong mathematical and scientific reasoning |
| `mistral-large-latest` | Mistral | Multilingual flagship from Mistral AI |
| `grok-2` | xAI | Real-time knowledge via xAI Grok |
The full model catalog is available on the AIML API model page. Keeptrusts passes the model identifier through to the upstream without validation, so newly added models are supported automatically. To adopt one, change the `model` field in your config; no provider or SDK changes are required.
Client Examples
Once the gateway is running, point your client SDK to http://localhost:8080 instead of https://api.aimlapi.com/v1. Standard OpenAI-format requests work unchanged.
- Python
- Node.js
- cURL
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="unused",  # auth is handled by Keeptrusts via AIMLAPI_KEY
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the key differences between transformer and diffusion models."},
    ],
    temperature=0.7,
    max_tokens=512,
)
print(response.choices[0].message.content)
```
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "unused", // auth handled by Keeptrusts via AIMLAPI_KEY
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Summarize the key differences between transformer and diffusion models." },
  ],
  temperature: 0.7,
  max_tokens: 512,
});
console.log(response.choices[0].message.content);
```
```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Summarize the key differences between transformer and diffusion models."}
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'
```
Streaming
Keeptrusts fully supports streaming for AIML API. Set stream: true in your request — the gateway applies policies to each chunk in real time, including content filtering and PII redaction on partial tokens.
Configure stream_timeout_seconds to allow enough time for long-running streamed generations:
```yaml
pack:
  name: aimlapi-providers-3
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: aimlapi-streaming
      provider: aimlapi
      model: meta-llama/Llama-3.3-70B-Instruct-Turbo
      stream_timeout_seconds: 300

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
- Python
- Node.js
- cURL
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Explain zero-trust network architecture in depth."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
```javascript
import OpenAI from "openai";

const client = new OpenAI({ baseURL: "http://localhost:8080/v1", apiKey: "unused" });

const stream = await client.chat.completions.create({
  model: "meta-llama/Llama-3.3-70B-Instruct-Turbo",
  messages: [{ role: "user", content: "Explain zero-trust network architecture in depth." }],
  stream: true,
});
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
```
```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    "messages": [{"role": "user", "content": "Explain zero-trust network architecture in depth."}],
    "stream": true
  }'
```
Advanced Configuration
Multi-Model Fallback
Route across different models hosted through AIML API, failing over automatically on errors or timeouts:
```yaml
pack:
  name: aimlapi-providers-4
  version: 1.0.0
  enabled: true

providers:
  strategy: fallback
  targets:
    - id: aimlapi-gpt4o-primary
      provider: aimlapi
      model: gpt-4o
      secret_key_ref:
        env: AIMLAPI_KEY
    - id: aimlapi-llama-fallback
      provider: aimlapi
      model: meta-llama/Llama-3.3-70B-Instruct-Turbo
      secret_key_ref:
        env: AIMLAPI_KEY

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
Cross-Provider A/B Testing
Split traffic between models from different originating providers — all aggregated through a single AIML API key:
```yaml
pack:
  name: aimlapi-providers-5
  version: 1.0.0
  enabled: true

providers:
  strategy: weighted_round_robin
  targets:
    - id: variant-gpt4o
      provider: aimlapi
      model: gpt-4o
      secret_key_ref:
        env: AIMLAPI_KEY
    - id: variant-claude
      provider: aimlapi
      model: claude-3-5-sonnet-20241022
      secret_key_ref:
        env: AIMLAPI_KEY

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
Pair with audit-logger and the console Events dashboard to compare output quality and cost per variant.
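Each target defaults to `weight: 1.0`, which yields an even split under the `weighted_round_robin` strategy. To skew the experiment, set explicit weights per target. A sketch of an 80/20 split:

```yaml
providers:
  strategy: weighted_round_robin
  targets:
    - id: variant-gpt4o
      provider: aimlapi
      model: gpt-4o
      weight: 0.8   # receives ~80% of traffic
    - id: variant-claude
      provider: aimlapi
      model: claude-3-5-sonnet-20241022
      weight: 0.2   # receives ~20% of traffic
```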
Circuit Breaker
Protect your application when AIML API or a specific model becomes degraded:
```yaml
pack:
  name: aimlapi-providers-6
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: aimlapi-primary
      provider: aimlapi
      model: gpt-4o
      secret_key_ref:
        env: AIMLAPI_KEY

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
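One way to detect degradation is an active `health_probe` on the target, using the sub-fields documented in Provider Fields. A sketch (the probe surfaces unhealthy targets; this does not show any additional circuit-breaker thresholds Keeptrusts may support):

```yaml
providers:
  targets:
    - id: aimlapi-primary
      provider: aimlapi
      model: gpt-4o
      health_probe:
        enabled: true
        interval_seconds: 30  # probe cadence
        timeout_seconds: 5    # probe deadline before the target is marked unhealthy
      secret_key_ref:
        env: AIMLAPI_KEY
```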
Best Practices
- Use model shorthand for quick configs:
provider: "aimlapi:chat:gpt-4o"is equivalent to settingprovider: "aimlapi"+model: "gpt-4o"and keeps your config concise. - Set
pricingfor cost tracking: AIML API charges vary by underlying model. Populatepricing.promptandpricing.completionwith the per-model rates so cost dashboards are accurate. - Enable
health_probein production to detect AIML API degradation before clients experience errors. Setinterval_seconds: 30andtimeout_seconds: 5as a baseline. - Increase
stream_timeout_secondsfor large models like Llama 3.3 70B or DeepSeek R1 — they produce tokens more slowly than GPT-4o. Start at 300 seconds and tune down if needed. - Centralize your API key: Store
AIMLAPI_KEYin a secret manager and inject it at runtime. Do not hard-code it inpolicy-config.yaml. - Pin model versions in production: Use exact model identifiers rather than generic aliases to guarantee reproducible outputs across deployments.
For AI systems
- Canonical terms: Keeptrusts gateway, AIML API, provider target, `policy-config.yaml`, `provider: "aimlapi"`, `secret_key_ref`, `AIMLAPI_KEY`.
- Config field names: `provider`, `model`, `base_url`, `secret_key_ref.env`, `format: "openai"`, `pricing.prompt`, `pricing.completion`, `health_probe`, `weight`.
- Provider shorthand: `aimlapi:chat:<model>` (e.g., `aimlapi:chat:gpt-4o`).
- Best next pages: OpenAI integration, Provider routing, Policy configuration.
For engineers
- Prerequisites: AIML API key (`AIMLAPI_KEY` env var), `kt` CLI installed.
- Start command: `kt gateway run --listen 0.0.0.0:8080 --policy-config policy-config.yaml`.
- Validate: `curl http://localhost:8080/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hello"}]}'`.
- AIML API uses the OpenAI wire format — any OpenAI SDK client works without code changes.
- Set `pricing.prompt` and `pricing.completion` per model for accurate cost dashboards.
For leaders
- AIML API aggregates 200+ models under one API key — this simplifies vendor management but creates single-provider concentration risk.
- Cost varies significantly by underlying model (GPT-4o vs Llama 3.3); enforce per-model cost tracking via the `pricing` field.
- All traffic is auditable via the `audit-logger` policy regardless of which upstream model is selected.
- Switching models requires only a `model` field change — no SDK or provider reconfiguration — which accelerates vendor diversification.
Next steps
- OpenAI integration — native OpenAI routing when you don't need aggregation
- Provider routing strategies — fallback, round-robin, and weighted routing
- Policy configuration — prompt-injection, PII, and safety policy reference
- Quickstart — install `kt` and run your first gateway