AIML API
AIML API aggregates 200+ AI models — including GPT-4o, Claude, Llama, Gemini, DeepSeek, and more — through a unified OpenAI-compatible endpoint. Keeptrusts routes AIML API requests through its policy engine, enabling governance, PII redaction, prompt-injection detection, and audit logging across the full catalog of hosted models with zero application-side changes.
Use this page when
- You need the exact command, config, API, or integration details for AIML API.
- You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
- If you want a guided rollout instead of a reference page, use the linked workflow pages in Next steps.
Because AIML API uses the OpenAI wire format, any OpenAI SDK client can be pointed at the Keeptrusts gateway without code changes. The gateway handles authentication, policy enforcement, and optional fallback routing transparently.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Prerequisites
- AIML API key — obtain one from AIML API.
- Keeptrusts CLI — install `kt` (quickstart guide).
- Export your API key so the gateway can read it at startup:

```shell
export AIMLAPI_KEY="your-aimlapi-key"
```
When the provider field is set to "aimlapi", Keeptrusts auto-detects both the base URL (https://api.aimlapi.com/v1) and the API key environment variable (AIMLAPI_KEY). You only need to override these if you use a non-standard env-var name.
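Relying on that auto-detection, a minimal target can omit `base_url` and `secret_key_ref` entirely. A sketch (the full explicit form appears under Configuration):

```yaml
providers:
  targets:
    - id: aimlapi-gpt4o
      provider: aimlapi
      model: gpt-4o
```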
Configuration
A minimal policy-config.yaml that routes traffic through AIML API with prompt-injection, PII, and safety policies:
```yaml
pack:
  name: aimlapi-gateway
  version: 1.0.0
  enabled: true

policies:
  chain:
    - prompt-injection
    - pii-detector
    - safety-filter
    - audit-logger
  policy:
    prompt-injection:
      threshold: 0.8
      action: block
    pii-detector:
      action: redact
    safety-filter:
      mode: strict
      action: block
    audit-logger:
      retention_days: 365

providers:
  strategy: single
  targets:
    - id: aimlapi-gpt4o
      provider: aimlapi
      model: gpt-4o
      base_url: https://api.aimlapi.com/v1
      secret_key_ref:
        env: AIMLAPI_KEY
```
Start the gateway:
```shell
kt gateway run \
  --listen 0.0.0.0:8080 \
  --policy-config policy-config.yaml
```
Compact Provider Shorthand
You can encode the model directly in the provider field. The two forms below are equivalent:
```yaml
# Shorthand — model embedded in the provider string
- id: "aimlapi-gpt4o"
  provider: "aimlapi:chat:gpt-4o"

# Explicit — separate provider and model fields
- id: "aimlapi-gpt4o"
  provider: "aimlapi"
  model: "gpt-4o"
```
Provider Fields
All fields available on a providers.targets[] entry for AIML API:
| Field | Type | Default | Description |
|---|---|---|---|
| `id` | string | required | Unique identifier for this target. Used in logs, the console dashboard, and routing decisions. |
| `provider` | string | required | Provider ID. Use `"aimlapi"` or the shorthand `"aimlapi:chat:<model>"`. |
| `model` | string | required | Model name, e.g. `"gpt-4o"` or `"meta-llama/Llama-3.3-70B-Instruct-Turbo"`. Passed through to the upstream API as-is. |
| `base_url` | string | `https://api.aimlapi.com/v1` | API base URL. Auto-detected when provider is `"aimlapi"`. Override for custom routing. |
| `secret_key_ref` | object | `AIMLAPI_KEY` | Object reference to the environment variable holding the AIML API key. Auto-detected for the `"aimlapi"` provider. |
| `format` | string | `"openai"` | Wire format. AIML API is OpenAI-compatible, so this is always `"openai"`. |
| `timeout_seconds` | integer | 60 | Maximum wall-clock time for non-streaming requests before the gateway returns a timeout error. |
| `stream_timeout_seconds` | integer | inherits `timeout_seconds` | Maximum wall-clock time for streaming requests. Set higher than `timeout_seconds` for long generations. |
| `max_context_tokens` | integer | none | Maximum token budget for a request. The gateway rejects requests that exceed this limit before forwarding. |
| `description` | string | none | Human-readable label shown in the console dashboard, logs, and health-check output. |
| `weight` | float | 1.0 | Routing weight used by the `weighted_round_robin` strategy. |
| `pricing` | object | none | Token pricing in USD per 1M tokens. Fields: `prompt` (input cost), `completion` (output cost). |
| `health_probe` | object | none | Active health probe. Sub-fields: `enabled` (bool), `interval_seconds` (int), `timeout_seconds` (int). |
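Putting the optional fields together, a fully populated target might look like the sketch below. The pricing values are placeholders for illustration, not real AIML API rates:

```yaml
providers:
  strategy: single
  targets:
    - id: aimlapi-gpt4o-full
      provider: aimlapi
      model: gpt-4o
      description: "Primary GPT-4o target via AIML API"
      format: openai
      timeout_seconds: 60
      stream_timeout_seconds: 300
      max_context_tokens: 120000
      weight: 1.0
      pricing:
        prompt: 2.50       # USD per 1M input tokens — placeholder rate
        completion: 10.00  # USD per 1M output tokens — placeholder rate
      health_probe:
        enabled: true
        interval_seconds: 30
        timeout_seconds: 5
      secret_key_ref:
        env: AIMLAPI_KEY
```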
Supported Models
AIML API aggregates 200+ models. A representative selection:
| Model | Category | Notes |
|---|---|---|
| `gpt-4o` | OpenAI | Latest multimodal GPT-4o |
| `claude-3-5-sonnet-20241022` | Anthropic | Anthropic's strongest coding and reasoning model |
| `meta-llama/Llama-3.3-70B-Instruct-Turbo` | Meta | High-throughput open-weight Llama 3.3 |
| `gemini-2.0-flash` | Google | Fast, cost-efficient Gemini 2.0 |
| `deepseek-r1` | DeepSeek | Strong mathematical and scientific reasoning |
| `mistral-large-latest` | Mistral | Multilingual flagship from Mistral AI |
| `grok-2` | xAI | Real-time knowledge via xAI Grok |
The full model catalog is available on the AIML API model page. Keeptrusts passes the model identifier through to the upstream without validation, so newly added models are supported automatically. To adopt one, change the `model` field in your config; no provider or SDK changes are required.
Client Examples
Once the gateway is running, point your client SDK to http://localhost:8080 instead of https://api.aimlapi.com/v1. Standard OpenAI-format requests work unchanged.
- Python
- Node.js
- cURL
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="unused",  # auth is handled by Keeptrusts via AIMLAPI_KEY
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the key differences between transformer and diffusion models."},
    ],
    temperature=0.7,
    max_tokens=512,
)
print(response.choices[0].message.content)
```
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "unused", // auth handled by Keeptrusts via AIMLAPI_KEY
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Summarize the key differences between transformer and diffusion models." },
  ],
  temperature: 0.7,
  max_tokens: 512,
});
console.log(response.choices[0].message.content);
```
```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Summarize the key differences between transformer and diffusion models."}
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'
```
Streaming
Keeptrusts fully supports streaming for AIML API. Set stream: true in your request — the gateway applies policies to each chunk in real time, including content filtering and PII redaction on partial tokens.
Configure stream_timeout_seconds to allow enough time for long-running streamed generations:
```yaml
pack:
  name: aimlapi-providers-3
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: aimlapi-streaming
      provider: aimlapi
      model: meta-llama/Llama-3.3-70B-Instruct-Turbo
      stream_timeout_seconds: 300

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
- Python
- Node.js
- cURL
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Explain zero-trust network architecture in depth."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
```javascript
import OpenAI from "openai";

const client = new OpenAI({ baseURL: "http://localhost:8080/v1", apiKey: "unused" });

const stream = await client.chat.completions.create({
  model: "meta-llama/Llama-3.3-70B-Instruct-Turbo",
  messages: [{ role: "user", content: "Explain zero-trust network architecture in depth." }],
  stream: true,
});
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
```
```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    "messages": [{"role": "user", "content": "Explain zero-trust network architecture in depth."}],
    "stream": true
  }'
```
Advanced Configuration
Multi-Model Fallback
Route across different models hosted through AIML API, failing over automatically on errors or timeouts:
```yaml
pack:
  name: aimlapi-providers-4
  version: 1.0.0
  enabled: true

providers:
  strategy: fallback
  targets:
    - id: aimlapi-gpt4o-primary
      provider: aimlapi
      model: gpt-4o
      secret_key_ref:
        env: AIMLAPI_KEY
    - id: aimlapi-llama-fallback
      provider: aimlapi
      model: meta-llama/Llama-3.3-70B-Instruct-Turbo
      secret_key_ref:
        env: AIMLAPI_KEY

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
Cross-Provider A/B Testing
Split traffic between models from different originating providers — all aggregated through a single AIML API key:
```yaml
pack:
  name: aimlapi-providers-5
  version: 1.0.0
  enabled: true

providers:
  strategy: weighted_round_robin
  targets:
    - id: variant-gpt4o
      provider: aimlapi
      model: gpt-4o
      secret_key_ref:
        env: AIMLAPI_KEY
    - id: variant-claude
      provider: aimlapi
      model: claude-3-5-sonnet-20241022
      secret_key_ref:
        env: AIMLAPI_KEY

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
Pair with audit-logger and the console Events dashboard to compare output quality and cost per variant.
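Each target defaults to `weight: 1.0`, which yields an even split under the `weighted_round_robin` strategy. To skew the experiment, set explicit weights per target. A sketch of an 80/20 split:

```yaml
providers:
  strategy: weighted_round_robin
  targets:
    - id: variant-gpt4o
      provider: aimlapi
      model: gpt-4o
      weight: 0.8   # receives ~80% of traffic
    - id: variant-claude
      provider: aimlapi
      model: claude-3-5-sonnet-20241022
      weight: 0.2   # receives ~20% of traffic
```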
Circuit Breaker
Protect your application when AIML API or a specific model becomes degraded:
```yaml
pack:
  name: aimlapi-providers-6
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: aimlapi-primary
      provider: aimlapi
      model: gpt-4o
      secret_key_ref:
        env: AIMLAPI_KEY

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
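One way to detect degradation is an active `health_probe` on the target, using the sub-fields documented in Provider Fields. A sketch (the probe surfaces unhealthy targets; this does not show any additional circuit-breaker thresholds Keeptrusts may support):

```yaml
providers:
  targets:
    - id: aimlapi-primary
      provider: aimlapi
      model: gpt-4o
      health_probe:
        enabled: true
        interval_seconds: 30  # probe cadence
        timeout_seconds: 5    # probe deadline before the target is marked unhealthy
      secret_key_ref:
        env: AIMLAPI_KEY
```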
Best Practices
- Use model shorthand for quick configs:
provider: "aimlapi:chat:gpt-4o"is equivalent to settingprovider: "aimlapi"+model: "gpt-4o"and keeps your config concise. - Set
pricingfor cost tracking: AIML API charges vary by underlying model. Populatepricing.promptandpricing.completionwith the per-model rates so cost dashboards are accurate. - Enable
health_probein production to detect AIML API degradation before clients experience errors. Setinterval_seconds: 30andtimeout_seconds: 5as a baseline. - Increase
stream_timeout_secondsfor large models like Llama 3.3 70B or DeepSeek R1 — they produce tokens more slowly than GPT-4o. Start at 300 seconds and tune down if needed. - Centralize your API key: Store
AIMLAPI_KEYin a secret manager and inject it at runtime. Do not hard-code it inpolicy-config.yaml. - Pin model versions in production: Use exact model identifiers rather than generic aliases to guarantee reproducible outputs across deployments.
For AI systems
- Canonical terms: Keeptrusts gateway, AIML API, provider target, `policy-config.yaml`, `provider: "aimlapi"`, `secret_key_ref`, `AIMLAPI_KEY`.
- Config field names: `provider`, `model`, `base_url`, `secret_key_ref.env`, `format: "openai"`, `pricing.prompt`, `pricing.completion`, `health_probe`, `weight`.
- Provider shorthand: `aimlapi:chat:<model>` (e.g., `aimlapi:chat:gpt-4o`).
- Best next pages: OpenAI integration, Provider routing, Policy configuration.
For engineers
- Prerequisites: AIML API key (`AIMLAPI_KEY` env var), `kt` CLI installed.
- Start command: `kt gateway run --listen 0.0.0.0:8080 --policy-config policy-config.yaml`.
- Validate: `curl http://localhost:8080/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hello"}]}'`.
- AIML API uses the OpenAI wire format — any OpenAI SDK client works without code changes.
- Set `pricing.prompt` and `pricing.completion` per model for accurate cost dashboards.
For leaders
- AIML API aggregates 200+ models under one API key — this simplifies vendor management but creates single-provider concentration risk.
- Cost varies significantly by underlying model (GPT-4o vs Llama 3.3); enforce per-model cost tracking via the `pricing` field.
- All traffic is auditable via the `audit-logger` policy regardless of which upstream model is selected.
- Switching models requires only a `model` field change — no SDK or provider reconfiguration — which accelerates vendor diversification.
Next steps
- OpenAI integration — native OpenAI routing when you don't need aggregation
- Provider routing strategies — fallback, round-robin, and weighted routing
- Policy configuration — prompt-injection, PII, and safety policy reference
- Quickstart — install `kt` and run your first gateway