AWS SageMaker
AWS SageMaker lets you deploy foundation models — via JumpStart or custom training — as real-time inference endpoints inside your own VPC. Keeptrusts fronts SageMaker endpoints using AWS Signature Version 4 (SigV4) authentication, eliminating the need to distribute AWS credentials to application teams while enforcing prompt-injection, PII, DLP, and audit policies on every request. The SageMaker endpoint must expose an OpenAI-compatible API (the default for JumpStart-deployed foundation models); clients keep sending standard OpenAI-format requests, and Keeptrusts handles the signed SageMaker invocation transparently.
Use this page when
- You need the exact command, config, API, or integration details for AWS SageMaker.
- You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
- If you want a guided rollout instead of a reference page, use the linked workflow pages in Next steps.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Prerequisites
- An active SageMaker real-time inference endpoint in a supported AWS region
- The endpoint must serve an OpenAI-compatible API (all JumpStart foundation model deployments do by default)
- AWS credentials with `sagemaker:InvokeEndpoint` permission on the target endpoint ARN
- Keeptrusts CLI (`kt`) installed and on your `PATH`
Configuration
Minimal configuration
```yaml
pack:
  name: sagemaker-providers-1
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: sagemaker-llama
      provider: sagemaker:chat:jumpstart-llama-3-3-70b

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
Keeptrusts picks up AWS credentials from the standard credential chain (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY env vars, ~/.aws/credentials, or an EC2/ECS instance role).
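That resolution order can be sketched as a small helper. This is a simplified illustration of the standard AWS chain, not Keeptrusts internals; the `resolve_credentials` name is hypothetical, and the instance-role metadata step is elided:

```python
import os
import configparser

def resolve_credentials(env=None, credentials_path="~/.aws/credentials", profile="default"):
    """Simplified sketch of the AWS default credential chain:
    environment variables first, then the shared credentials file.
    (A real chain would also consult ECS/EC2 instance metadata.)"""
    env = os.environ if env is None else env
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return {"source": "env", "access_key_id": env["AWS_ACCESS_KEY_ID"]}
    path = os.path.expanduser(credentials_path)
    if os.path.exists(path):
        cfg = configparser.ConfigParser()
        cfg.read(path)
        if profile in cfg and "aws_access_key_id" in cfg[profile]:
            return {"source": "file", "access_key_id": cfg[profile]["aws_access_key_id"]}
    return None  # a real chain falls through to instance-role metadata here

print(resolve_credentials(
    env={"AWS_ACCESS_KEY_ID": "AKIAEXAMPLEEXAMPLE12", "AWS_SECRET_ACCESS_KEY": "secret"},
    credentials_path="/nonexistent",
))
# → {'source': 'env', 'access_key_id': 'AKIAEXAMPLEEXAMPLE12'}
```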
Full named configuration with policy chain
```yaml
pack:
  name: sagemaker-enterprise
  version: 1.0.0
  enabled: true

policies:
  chain:
    - prompt-injection
    - pii-detector
    - dlp-filter
    - audit-logger
  policy:
    prompt-injection:
      threshold: 0.8
      action: block
    pii-detector:
      action: redact
      entities:
        - EMAIL
        - PHONE
        - SSN
        - CREDIT_CARD
    dlp-filter:
      patterns:
        - name: aws-access-key
          regex: AKIA[A-Z0-9]{16}
          action: block
        - name: aws-secret-key
          regex: "[A-Za-z0-9/+=]{40}"
          action: redact
        - name: internal-arn
          regex: 'arn:aws:[a-z0-9-]+:[a-z0-9-]*:[0-9]{12}:'
          action: redact
    audit-logger:
      retention_days: 365

providers:
  targets:
    - id: sagemaker-llama-70b
      provider: sagemaker:chat:jumpstart-llama-3-3-70b
```
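The three `dlp-filter` regexes above can be exercised locally before rollout. The sketch below is a standalone illustration; the `apply_dlp` helper is hypothetical, not the gateway's implementation:

```python
import re

# Local mirror of the dlp-filter patterns above. Order matters:
# block rules abort the request, redact rules rewrite matched spans.
PATTERNS = [
    ("aws-access-key", re.compile(r"AKIA[A-Z0-9]{16}"), "block"),
    ("aws-secret-key", re.compile(r"[A-Za-z0-9/+=]{40}"), "redact"),
    ("internal-arn", re.compile(r"arn:aws:[a-z0-9-]+:[a-z0-9-]*:[0-9]{12}:"), "redact"),
]

def apply_dlp(text: str) -> str:
    for name, pattern, action in PATTERNS:
        if action == "block":
            if pattern.search(text):
                raise ValueError(f"blocked by {name}")
        else:
            text = pattern.sub("[REDACTED]", text)
    return text

print(apply_dlp("role arn:aws:iam::123456789012:role/inference"))
# → role [REDACTED]role/inference
```

Note that the ARN regex only matches up to the account-ID colon, so the resource suffix survives redaction; tighten the pattern if you want the whole ARN removed.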
Named AWS profile (non-default credentials)
```yaml
pack:
  name: sagemaker-providers-3
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: sagemaker-prod
      provider: sagemaker:chat:jumpstart-llama-3-3-70b
      aws_profile: sagemaker-inference-role  # named profile from ~/.aws/config

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
Mistral 7B endpoint
```yaml
pack:
  name: sagemaker-providers-4
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: sagemaker-mistral
      provider: sagemaker:chat:jumpstart-mistral-7b

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
Custom (non-JumpStart) endpoint
When your endpoint name does not match a JumpStart alias, supply the exact SageMaker endpoint name as the model value:
```yaml
pack:
  name: sagemaker-providers-5
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: sagemaker-custom-model
      provider: sagemaker:chat:my-finetuned-llama-endpoint

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
Start the gateway
```shell
# Using environment credentials
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_DEFAULT_REGION="us-east-1"
kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml

# Using a named profile
AWS_PROFILE=sagemaker-inference-role kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml
```
Provider Fields
| Field | Required | Default | Description |
|---|---|---|---|
| `provider` | Yes | — | Provider identifier. Use `"sagemaker:chat:<endpoint-name>"` or bare `"sagemaker"`. The model segment must match the SageMaker endpoint name exactly, or one of the JumpStart aliases below. |
| `aws_region` | Yes | — | AWS region where the SageMaker endpoint is deployed (e.g., `us-east-1`, `eu-west-1`). |
| `aws_profile` | No | Default chain | Named profile from `~/.aws/config`. Defaults to the environment credential chain if omitted. |
| `provider_type` | No | `sagemaker` | Forces the SageMaker runtime. Required when the provider string is ambiguous. |
| `format` | No | `openai` | Wire format. SageMaker JumpStart endpoints expose an OpenAI-compatible API; keep this as `openai`. |
| `base_url` | No | Auto-derived | Override the SageMaker runtime URL. Auto-derived as `https://runtime.sagemaker.{aws_region}.amazonaws.com` if omitted. |
| `max_context_tokens` | No | Model default | Maximum context length for the target endpoint. Used to validate requests before they reach SageMaker and avoid expensive rejected calls. |
| `timeout_seconds` | No | 60 | HTTP timeout for a complete response. Large models (70B+) can take 60–120 s for long completions. Increase this for streaming workloads. |
| `options.max_tokens` | No | Model default | Maximum tokens in the completion. |
| `options.temperature` | No | 0.3 | Sampling temperature (0–2). |
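The `max_context_tokens` pre-flight check can also be approximated client-side. The sketch below uses a rough 4-characters-per-token heuristic; both the heuristic and the `validate_request` helper are illustrative assumptions, not Keeptrusts' actual tokenizer:

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def validate_request(messages, max_context_tokens: int, max_tokens: int = 0) -> None:
    """Reject an over-length chat request locally, before it incurs a
    billed SageMaker invocation that the endpoint would reject anyway."""
    prompt_tokens = sum(estimate_tokens(m["content"]) for m in messages)
    if prompt_tokens + max_tokens > max_context_tokens:
        raise ValueError(
            f"request needs ~{prompt_tokens + max_tokens} tokens, "
            f"limit is {max_context_tokens}"
        )

# A short request fits comfortably inside a 32k context:
validate_request(
    [{"role": "user", "content": "hello there"}],
    max_context_tokens=32_000,
    max_tokens=1024,
)
```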
Supported Models
The following JumpStart aliases are recognised by Keeptrusts. For other models, use the exact SageMaker endpoint name as the model field.
| JumpStart Alias | Underlying Model | Context Window | Notes |
|---|---|---|---|
| `jumpstart-llama-3-3-70b` | Meta Llama 3.3 70B Instruct | 128k | Best quality JumpStart LLM; requires ml.g5.48xlarge or larger |
| `jumpstart-llama-3-1-8b` | Meta Llama 3.1 8B Instruct | 128k | Cost-efficient; runs on ml.g5.2xlarge |
| `jumpstart-mistral-7b` | Mistral 7B Instruct v0.3 | 32k | Strong instruction following; fast on ml.g5.2xlarge |
| `jumpstart-falcon-40b` | TII Falcon 40B Instruct | 8k | Legacy; prefer Llama or Mistral for new deployments |
| `jumpstart-mixtral-8x7b` | Mistral Mixtral 8×7B | 32k | Mixture-of-experts; requires ml.p4d.24xlarge |
For custom or non-JumpStart endpoints, the model field in your provider config must exactly match the SageMaker endpoint name shown in the SageMaker console — not the underlying model name. Keeptrusts uses this value verbatim when constructing the SigV4-signed invocation URL.
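Assuming the standard SageMaker runtime REST layout (`/endpoints/{name}/invocations`), the URL construction can be sketched as follows; the `invocation_url` helper is illustrative, not a Keeptrusts API:

```python
def invocation_url(aws_region: str, endpoint_name: str, streaming: bool = False) -> str:
    """Build the SageMaker runtime URL that the gateway signs with SigV4.
    Streaming requests use the InvokeEndpointWithResponseStream path."""
    action = "invocations-response-stream" if streaming else "invocations"
    return (
        f"https://runtime.sagemaker.{aws_region}.amazonaws.com"
        f"/endpoints/{endpoint_name}/{action}"
    )

print(invocation_url("us-east-1", "my-finetuned-llama-endpoint"))
# → https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/my-finetuned-llama-endpoint/invocations
```

Because the endpoint name is embedded verbatim in the path, a typo in the model field produces a 4xx from AWS rather than a Keeptrusts-side error.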
Client Examples
- Python
- Node.js
- cURL
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="unused",  # SigV4 auth is handled by the gateway
)

response = client.chat.completions.create(
    model="jumpstart-llama-3-3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful enterprise assistant."},
        {"role": "user", "content": "Summarise the attached quarterly report in three bullet points."},
    ],
    max_tokens=1024,
    temperature=0.2,
)
print(response.choices[0].message.content)
```
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:41002/v1",
  apiKey: "unused", // SigV4 auth is handled by the gateway
});

const response = await client.chat.completions.create({
  model: "jumpstart-llama-3-3-70b",
  messages: [
    { role: "system", content: "You are a helpful enterprise assistant." },
    { role: "user", content: "Summarise the attached quarterly report in three bullet points." },
  ],
  max_tokens: 1024,
  temperature: 0.2,
});

console.log(response.choices[0].message.content);
```
```shell
curl http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jumpstart-llama-3-3-70b",
    "messages": [
      {"role": "system", "content": "You are a helpful enterprise assistant."},
      {"role": "user", "content": "Summarise the attached quarterly report in three bullet points."}
    ],
    "max_tokens": 1024,
    "temperature": 0.2
  }'
```
Streaming
SageMaker real-time endpoints support streaming via InvokeEndpointWithResponseStream. Keeptrusts forwards SSE chunks after applying per-chunk content policies. Set stream: true in your request:
- Python
- Node.js
- cURL
```python
stream = client.chat.completions.create(
    model="jumpstart-llama-3-3-70b",
    messages=[{"role": "user", "content": "Walk me through a root-cause analysis approach."}],
    max_tokens=2048,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
```javascript
const stream = await client.chat.completions.create({
  model: "jumpstart-llama-3-3-70b",
  messages: [{ role: "user", content: "Walk me through a root-cause analysis approach." }],
  max_tokens: 2048,
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content ?? "";
  process.stdout.write(delta);
}
```
```shell
curl http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jumpstart-llama-3-3-70b",
    "messages": [{"role": "user", "content": "Walk me through a root-cause analysis approach."}],
    "max_tokens": 2048,
    "stream": true
  }'
```
Increase `timeout_seconds` to at least 120 when streaming long-form completions from 70B models. The default 60 s covers most short completions but will time out on 2000+ token responses from large, slow instance types.
Advanced Configuration
Least-privilege IAM policy
Grant the Keeptrusts gateway the minimum permissions needed to invoke your endpoint. Avoid sagemaker:* wildcard grants:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "KeeptrustsSageMakerInvoke",
      "Effect": "Allow",
      "Action": [
        "sagemaker:InvokeEndpoint",
        "sagemaker:InvokeEndpointWithResponseStream"
      ],
      "Resource": "arn:aws:sagemaker:us-east-1:123456789012:endpoint/jumpstart-llama-3-3-70b"
    }
  ]
}
```
Parameterise the ARN per environment. Never share a single IAM policy that covers all endpoints across all environments.
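Parameterisation can be as simple as rendering the policy document per environment. A sketch, with a hypothetical `invoke_policy` helper:

```python
import json

def invoke_policy(region: str, account_id: str, endpoint_name: str) -> dict:
    """Render the least-privilege invoke policy above, scoped to one
    endpoint ARN, so each environment gets its own document."""
    arn = f"arn:aws:sagemaker:{region}:{account_id}:endpoint/{endpoint_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "KeeptrustsSageMakerInvoke",
                "Effect": "Allow",
                "Action": [
                    "sagemaker:InvokeEndpoint",
                    "sagemaker:InvokeEndpointWithResponseStream",
                ],
                "Resource": arn,
            }
        ],
    }

print(json.dumps(invoke_policy("us-east-1", "123456789012", "jumpstart-llama-3-3-70b"), indent=2))
```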
Multi-region active-active routing
Route requests across two regional endpoints for high availability. Keeptrusts will use the first healthy target:
```yaml
pack:
  name: sagemaker-providers-6
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: sagemaker-us-east-1
      provider: sagemaker:chat:jumpstart-llama-3-3-70b
    - id: sagemaker-us-west-2
      provider: sagemaker:chat:jumpstart-llama-3-3-70b

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
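First-healthy selection can be pictured as an ordered failover loop. This is an illustration of the behaviour, not Keeptrusts' routing code; `send` stands in for a target invocation:

```python
def route_first_healthy(targets, send):
    """Try each configured target in order; return the first successful
    response, raising only when every target fails."""
    errors = []
    for target in targets:
        try:
            return target, send(target)
        except Exception as exc:  # network error, 5xx, throttling, ...
            errors.append((target, exc))
    raise RuntimeError(f"all targets failed: {errors}")

# Simulated outage: us-east-1 is down, us-west-2 answers.
def send(target):
    if target == "sagemaker-us-east-1":
        raise ConnectionError("endpoint unreachable")
    return {"ok": True}

target, resp = route_first_healthy(["sagemaker-us-east-1", "sagemaker-us-west-2"], send)
print(target)  # → sagemaker-us-west-2
```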
Timeout tuning by model size
Large models on smaller instance types can be slow. Tune timeout_seconds to match your endpoint's P99 latency:
```yaml
pack:
  name: sagemaker-providers-7
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: sagemaker-mistral-fast
      provider: sagemaker:chat:jumpstart-mistral-7b
      timeout_seconds: 30   # 7B on ml.g5.2xlarge responds quickly
    - id: sagemaker-llama-large
      provider: sagemaker:chat:jumpstart-llama-3-3-70b
      timeout_seconds: 120  # 70B long completions can take 90-120 s

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
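One way to pick a `timeout_seconds` value per target is from your endpoint's measured completion latencies, e.g. a high percentile plus headroom. The `timeout_from_latencies` helper and its constants are illustrative assumptions, not a Keeptrusts feature:

```python
import math

def timeout_from_latencies(latencies_s, percentile=0.99, headroom=1.5, floor=60):
    """Suggest a timeout: the chosen latency percentile times a headroom
    factor, never below a sane floor (the gateway default of 60 s)."""
    ranked = sorted(latencies_s)
    idx = min(len(ranked) - 1, math.ceil(percentile * len(ranked)) - 1)
    return max(floor, math.ceil(ranked[idx] * headroom))

# e.g. a 70B endpoint whose longest completions take ~95 s:
print(timeout_from_latencies([12, 30, 45, 80, 95]))  # → 143
```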
Best Practices
- Use instance role credentials in AWS-hosted deployments — When running the Keeptrusts gateway on ECS, EKS, or EC2, rely on the instance/task IAM role for credentials rather than distributing long-lived `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` env vars. SigV4 signing works transparently with all standard AWS credential chain sources.
- Apply least-privilege IAM — Restrict the IAM policy to the `sagemaker:InvokeEndpoint` and `sagemaker:InvokeEndpointWithResponseStream` actions scoped to the exact endpoint ARN(s) the gateway needs. Never grant `sagemaker:*` or `*:*`.
- Set `max_context_tokens` explicitly — SageMaker charges per token processed, including input tokens. Setting `max_context_tokens` causes Keeptrusts to reject over-length requests before they reach AWS, preventing expensive rejected invocations and unexpected bills.
- Increase `timeout_seconds` for large models — 70B models on standard instance types regularly take 90–120 s for 2k-token completions. Insufficient timeouts cause client-visible errors that are billed by AWS even when the invocation is abandoned.
- Enable DLP filtering for AWS-internal identifiers — Use the `dlp-filter` policy to block or redact AWS ARNs, access key IDs, and secret key patterns before they appear in model completions. SageMaker responses may echo back structured data from the prompt context.
- Pin endpoint names in production — Treat the SageMaker endpoint name in your provider config as a versioned artifact. Endpoint names are reused when models are redeployed, which can silently change model behaviour. Use separate endpoint names for different model versions and update the provider config deliberately.
For AI systems
- Canonical terms: Keeptrusts gateway, AWS SageMaker, SageMaker endpoints, SigV4, IAM, custom models, provider target, policy-config.yaml, `provider: "sagemaker"`.
- Config field names: `provider`, `model`, `aws_region`, `sagemaker_endpoint_name`, `aws_access_key_env`, `aws_secret_key_env`, `format: "openai"`, `provider_type: "sagemaker"`.
- Auth: SigV4 request signing using standard AWS env vars, same as Bedrock.
- Key behavior: Keeptrusts routes to SageMaker real-time inference endpoints and translates between OpenAI format and SageMaker's invoke format.
- Best next pages: AWS Bedrock integration, vLLM integration, Policy configuration.
For engineers
- Prerequisites: SageMaker endpoint deployed and InService, IAM credentials with `sagemaker:InvokeEndpoint` permission, `kt` CLI installed.
- Required config: `sagemaker_endpoint_name`, `aws_region`, and AWS credentials (env vars or instance profile).
- Start command: `kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml`.
- Validate: `curl http://localhost:41002/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"your-endpoint","messages":[{"role":"user","content":"hello"}]}'`.
- Pin endpoint names in production — redeploying a model to the same endpoint name can silently change behavior.
- Use separate endpoint names for different model versions and update provider config deliberately.
For leaders
- SageMaker keeps inference within your AWS account — full data sovereignty and VPC isolation.
- Custom model endpoints support fine-tuned and proprietary models — Keeptrusts applies uniform governance regardless of model origin.
- IAM-based access integrates with existing AWS governance (CloudTrail, SCPs) for centralized control.
- Endpoint scaling and instance costs are your responsibility — capacity planning is separate from Keeptrusts policy configuration.
Next steps
- AWS Bedrock integration — managed foundation models on AWS without endpoint management
- vLLM integration — self-managed high-throughput serving (non-AWS alternative)
- Provider routing strategies — multi-endpoint failover
- Policy configuration — audit-logger and PII policy reference
- Quickstart — install `kt` and run your first gateway