AIML API

AIML API aggregates 200+ AI models — including GPT-4o, Claude, Llama, Gemini, DeepSeek, and more — through a unified OpenAI-compatible endpoint. Keeptrusts routes AIML API requests through its policy engine, enabling governance, PII redaction, prompt-injection detection, and audit logging across the full catalog of hosted models with zero application-side changes.

Use this page when

  • You need the exact command, config, API, or integration details for AIML API.
  • You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
  • You want a guided rollout rather than a reference page; in that case, use the linked workflow pages in Next steps.

Because AIML API uses the OpenAI wire format, any OpenAI SDK client can be pointed at the Keeptrusts gateway without code changes. The gateway handles authentication, policy enforcement, and optional fallback routing transparently.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Prerequisites

  1. AIML API key — obtain one from AIML API.
  2. Keeptrusts CLI — install kt (quickstart guide).
  3. Export your API key so the gateway can read it at startup:

     export AIMLAPI_KEY="your-aimlapi-key"

When the provider field is set to "aimlapi", Keeptrusts auto-detects both the base URL (https://api.aimlapi.com/v1) and the API key environment variable (AIMLAPI_KEY). You only need to override these if you use a non-standard env-var name.
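If your environment stores the key under a different variable name, the override is a one-field change on the target. A sketch using the documented base_url and secret_key_ref fields (AIML_KEY_PROD is a hypothetical variable name):

```yaml
providers:
  targets:
    - id: aimlapi-custom
      provider: aimlapi
      model: gpt-4o
      base_url: https://api.aimlapi.com/v1  # explicit, same as the auto-detected default
      secret_key_ref:
        env: AIML_KEY_PROD                  # non-standard env-var name
```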

Configuration

A minimal policy-config.yaml that routes traffic through AIML API with prompt-injection, PII, and safety policies:

pack:
  name: aimlapi-gateway
  version: 1.0.0
  enabled: true

policies:
  chain:
    - prompt-injection
    - pii-detector
    - safety-filter
    - audit-logger
  policy:
    prompt-injection:
      threshold: 0.8
      action: block
    pii-detector:
      action: redact
    safety-filter:
      mode: strict
      action: block
    audit-logger:
      retention_days: 365

providers:
  strategy: single
  targets:
    - id: aimlapi-gpt4o
      provider: aimlapi
      model: gpt-4o
      base_url: https://api.aimlapi.com/v1
      secret_key_ref:
        env: AIMLAPI_KEY

Start the gateway:

kt gateway run \
  --listen 0.0.0.0:41002 \
  --policy-config policy-config.yaml

Compact Provider Shorthand

You can encode the model directly in the provider field. The two forms below are equivalent:

# Shorthand — model embedded in the provider string
- id: "aimlapi-gpt4o"
  provider: "aimlapi:chat:gpt-4o"

# Explicit — separate provider and model fields
- id: "aimlapi-gpt4o"
  provider: "aimlapi"
  model: "gpt-4o"
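The split is mechanical. As an illustration only (not Keeptrusts' actual parser), the shorthand can be expanded into the explicit form in a few lines of Python; the middle "chat" segment is ignored in this sketch:

```python
def parse_provider_shorthand(value):
    """Expand a 'provider:mode:model' shorthand (e.g. 'aimlapi:chat:gpt-4o')
    into explicit provider and model fields. Plain provider IDs pass through."""
    parts = value.split(":", 2)
    if len(parts) == 3:
        provider, _mode, model = parts
        return {"provider": provider, "model": model}
    return {"provider": value}

print(parse_provider_shorthand("aimlapi:chat:gpt-4o"))
# {'provider': 'aimlapi', 'model': 'gpt-4o'}
```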

Provider Fields

All fields available on a providers.targets[] entry for AIML API:

| Field | Type | Default | Description |
|---|---|---|---|
| id | string | required | Unique identifier for this target. Used in logs, the console dashboard, and routing decisions. |
| provider | string | required | Provider ID. Use "aimlapi" or the shorthand "aimlapi:chat:<model>". |
| model | string | required | Model name, e.g. "gpt-4o" or "meta-llama/Llama-3.3-70B-Instruct-Turbo". Passed through to the upstream API as-is. |
| base_url | string | https://api.aimlapi.com/v1 | API base URL. Auto-detected when provider is "aimlapi". Override for custom routing. |
| secret_key_ref | object | AIMLAPI_KEY | Reference to the environment variable holding the AIML API key. Auto-detected for the "aimlapi" provider. |
| format | string | "openai" | Wire format. AIML API is OpenAI-compatible, so this is always "openai". |
| timeout_seconds | integer | 60 | Maximum wall-clock time for non-streaming requests before the gateway returns a timeout error. |
| stream_timeout_seconds | integer | inherits timeout_seconds | Maximum wall-clock time for streaming requests. Set higher than timeout_seconds for long generations. |
| max_context_tokens | integer | none | Maximum token budget for the request. The gateway rejects requests that exceed this limit before forwarding. |
| description | string | none | Human-readable label shown in the console dashboard, logs, and health-check output. |
| weight | float | 1.0 | Routing weight used by the weighted_round_robin strategy. |
| pricing | object | none | Token pricing in USD per 1M tokens. Fields: prompt (input cost), completion (output cost). |
| health_probe | object | none | Active health probe. Sub-fields: enabled (bool), interval_seconds (int), timeout_seconds (int). |
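Putting the optional fields together, a fuller target entry might look like the following. The field names and units are those documented above; the values (including the pricing rates) are illustrative:

```yaml
providers:
  strategy: single
  targets:
    - id: aimlapi-gpt4o-prod
      provider: aimlapi
      model: gpt-4o
      description: "Primary GPT-4o via AIML API"
      timeout_seconds: 60
      stream_timeout_seconds: 300
      max_context_tokens: 120000
      weight: 1.0
      pricing:
        prompt: 2.50       # USD per 1M input tokens (example rate)
        completion: 10.00  # USD per 1M output tokens (example rate)
      health_probe:
        enabled: true
        interval_seconds: 30
        timeout_seconds: 5
      secret_key_ref:
        env: AIMLAPI_KEY
```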

Supported Models

AIML API aggregates 200+ models. A representative selection:

| Model | Category | Notes |
|---|---|---|
| gpt-4o | OpenAI | Latest multimodal GPT-4o |
| claude-3-5-sonnet-20241022 | Anthropic | Anthropic's strongest coding and reasoning model |
| meta-llama/Llama-3.3-70B-Instruct-Turbo | Meta | High-throughput open-weight Llama 3.3 |
| gemini-2.0-flash | Google | Fast, cost-efficient Gemini 2.0 |
| deepseek-r1 | DeepSeek | Strong mathematical and scientific reasoning |
| mistral-large-latest | Mistral | Multilingual flagship from Mistral AI |
| grok-2 | xAI | Real-time knowledge via xAI Grok |

The full model catalog is available on the AIML API model page. Keeptrusts passes the model identifier through to the upstream without validation, so newly added models are supported automatically.

Because AIML API aggregates providers, you can switch between GPT-4o, Claude, and Llama by changing only the model field in your config — no provider or SDK changes required.

Client Examples

Once the gateway is running, point your client SDK to http://localhost:41002 (the port from --listen above) instead of https://api.aimlapi.com/v1. Standard OpenAI-format requests work unchanged.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="unused",  # auth is handled by Keeptrusts via AIMLAPI_KEY
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the key differences between transformer and diffusion models."},
    ],
    temperature=0.7,
    max_tokens=512,
)

print(response.choices[0].message.content)

Streaming

Keeptrusts fully supports streaming for AIML API. Set stream: true in your request — the gateway applies policies to each chunk in real time, including content filtering and PII redaction on partial tokens.

Configure stream_timeout_seconds to allow enough time for long-running streamed generations:

pack:
  name: aimlapi-providers-3
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: aimlapi-streaming
      provider: aimlapi
      model: meta-llama/Llama-3.3-70B-Instruct-Turbo
      stream_timeout_seconds: 300

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
The corresponding client consumes the stream chunk by chunk:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:41002/v1", api_key="unused")

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Explain zero-trust network architecture in depth."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Advanced Configuration

Multi-Model Fallback

Route across different models hosted through AIML API, failing over automatically on errors or timeouts:

pack:
  name: aimlapi-providers-4
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: aimlapi-gpt4o-primary
      provider: aimlapi
      model: gpt-4o
      secret_key_ref:
        env: AIMLAPI_KEY
    - id: aimlapi-llama-fallback
      provider: aimlapi
      model: meta-llama/Llama-3.3-70B-Instruct-Turbo
      secret_key_ref:
        env: AIMLAPI_KEY

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true

Cross-Provider A/B Testing

Split traffic between models from different originating providers — all aggregated through a single AIML API key:

pack:
  name: aimlapi-providers-5
  version: 1.0.0
  enabled: true

providers:
  strategy: weighted_round_robin
  targets:
    - id: variant-gpt4o
      provider: aimlapi
      model: gpt-4o
      weight: 0.5
      secret_key_ref:
        env: AIMLAPI_KEY
    - id: variant-claude
      provider: aimlapi
      model: claude-3-5-sonnet-20241022
      weight: 0.5
      secret_key_ref:
        env: AIMLAPI_KEY

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true

Pair with audit-logger and the console Events dashboard to compare output quality and cost per variant.
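As an illustration of that comparison, the sketch below averages per-request cost by variant. The record shape is hypothetical (a real audit-logger export may differ); the per-1M-token rates mirror the pricing.prompt and pricing.completion fields with example values:

```python
from collections import defaultdict

# Hypothetical audit-log records: only target id and token usage are assumed.
records = [
    {"target": "variant-gpt4o", "prompt_tokens": 900, "completion_tokens": 300},
    {"target": "variant-claude", "prompt_tokens": 900, "completion_tokens": 450},
    {"target": "variant-gpt4o", "prompt_tokens": 1100, "completion_tokens": 250},
]

# Example per-1M-token USD rates, in the same units as pricing.prompt / pricing.completion.
pricing = {
    "variant-gpt4o": {"prompt": 2.50, "completion": 10.00},
    "variant-claude": {"prompt": 3.00, "completion": 15.00},
}

def cost_usd(rec):
    """USD cost of one request from its token counts and the variant's rates."""
    rates = pricing[rec["target"]]
    return (rec["prompt_tokens"] * rates["prompt"]
            + rec["completion_tokens"] * rates["completion"]) / 1_000_000

totals = defaultdict(float)
counts = defaultdict(int)
for rec in records:
    totals[rec["target"]] += cost_usd(rec)
    counts[rec["target"]] += 1

for target in totals:
    print(f"{target}: ${totals[target] / counts[target]:.6f} avg per request")
```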

Circuit Breaker

Protect your application when AIML API or a specific model becomes degraded:

pack:
  name: aimlapi-providers-6
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: aimlapi-primary
      provider: aimlapi
      model: gpt-4o
      secret_key_ref:
        env: AIMLAPI_KEY

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true

Best Practices

  • Use model shorthand for quick configs: provider: "aimlapi:chat:gpt-4o" is equivalent to setting provider: "aimlapi" + model: "gpt-4o" and keeps your config concise.
  • Set pricing for cost tracking: AIML API charges vary by underlying model. Populate pricing.prompt and pricing.completion with the per-model rates so cost dashboards are accurate.
  • Enable health_probe in production to detect AIML API degradation before clients experience errors. Set interval_seconds: 30 and timeout_seconds: 5 as a baseline.
  • Increase stream_timeout_seconds for large models like Llama 3.3 70B or DeepSeek R1 — they produce tokens more slowly than GPT-4o. Start at 300 seconds and tune down if needed.
  • Centralize your API key: Store AIMLAPI_KEY in a secret manager and inject it at runtime. Do not hard-code it in policy-config.yaml.
  • Pin model versions in production: Use exact model identifiers rather than generic aliases to guarantee reproducible outputs across deployments.

For AI systems

  • Canonical terms: Keeptrusts gateway, AIML API, provider target, policy-config.yaml, provider: "aimlapi", secret_key_ref, AIMLAPI_KEY.
  • Config field names: provider, model, base_url, secret_key_ref.env, format: "openai", pricing.prompt, pricing.completion, health_probe, weight.
  • Provider shorthand: aimlapi:chat:<model> (e.g., aimlapi:chat:gpt-4o).
  • Best next pages: OpenAI integration, Provider routing, Policy configuration.

For engineers

  • Prerequisites: AIML API key (AIMLAPI_KEY env var), kt CLI installed.
  • Start command: kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml.
  • Validate: curl http://localhost:41002/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hello"}]}'.
  • AIML API uses OpenAI wire format — any OpenAI SDK client works without code changes.
  • Set pricing.prompt and pricing.completion per model for accurate cost dashboards.

For leaders

  • AIML API aggregates 200+ models under one API key — simplifies vendor management but creates single-provider concentration risk.
  • Cost varies significantly by underlying model (GPT-4o vs Llama 3.3); enforce per-model cost tracking via the pricing field.
  • All traffic is auditable via the audit-logger policy regardless of which upstream model is selected.
  • Switching models requires only a model field change — no SDK or provider reconfiguration — which accelerates vendor diversification.

Next steps