AWS SageMaker
AWS SageMaker lets you deploy foundation models — via JumpStart or custom training — as real-time inference endpoints inside your own VPC. Keeptrusts fronts SageMaker endpoints using AWS Signature Version 4 (SigV4) authentication, eliminating the need to distribute AWS credentials to application teams while enforcing prompt-injection, PII, DLP, and audit policies on every request. The SageMaker endpoint must expose an OpenAI-compatible API (the default for JumpStart-deployed foundation models); clients keep sending standard OpenAI-format requests, and Keeptrusts handles the signed SageMaker invocation transparently.
Use this page when
- You need the exact command, config, API, or integration details for AWS SageMaker.
- You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
- If you want a guided rollout instead of a reference page, use the linked workflow pages in Next steps.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Prerequisites
- An active SageMaker real-time inference endpoint in a supported AWS region
- The endpoint must serve an OpenAI-compatible API (all JumpStart foundation model deployments do by default)
- AWS credentials with `sagemaker:InvokeEndpoint` permission on the target endpoint ARN
- Keeptrusts CLI (`kt`) installed and on your `PATH`
Configuration
Minimal configuration
```yaml
pack:
  name: sagemaker-providers-1
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: sagemaker-llama
      provider: sagemaker:chat:jumpstart-llama-3-3-70b

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
Keeptrusts picks up AWS credentials from the standard credential chain (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY env vars, ~/.aws/credentials, or an EC2/ECS instance role).
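That resolution order can be sketched as a small helper. This is a simplified illustration of the standard AWS chain, not Keeptrusts internals; the `resolve_credentials` name is hypothetical, and the instance-role metadata step is elided:

```python
import os
import configparser

def resolve_credentials(env=None, credentials_path="~/.aws/credentials", profile="default"):
    """Simplified sketch of the AWS default credential chain:
    environment variables first, then the shared credentials file.
    (A real chain would also consult ECS/EC2 instance metadata.)"""
    env = os.environ if env is None else env
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return {"source": "env", "access_key_id": env["AWS_ACCESS_KEY_ID"]}
    path = os.path.expanduser(credentials_path)
    if os.path.exists(path):
        cfg = configparser.ConfigParser()
        cfg.read(path)
        if profile in cfg and "aws_access_key_id" in cfg[profile]:
            return {"source": "file", "access_key_id": cfg[profile]["aws_access_key_id"]}
    return None  # a real chain falls through to instance-role metadata here

print(resolve_credentials(
    env={"AWS_ACCESS_KEY_ID": "AKIAEXAMPLEEXAMPLE12", "AWS_SECRET_ACCESS_KEY": "secret"},
    credentials_path="/nonexistent",
))
# → {'source': 'env', 'access_key_id': 'AKIAEXAMPLEEXAMPLE12'}
```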
Full named configuration with policy chain
```yaml
pack:
  name: sagemaker-enterprise
  version: 1.0.0
  enabled: true

policies:
  chain:
    - prompt-injection
    - pii-detector
    - dlp-filter
    - audit-logger
  policy:
    prompt-injection:
      threshold: 0.8
      action: block
    pii-detector:
      action: redact
      entities:
        - EMAIL
        - PHONE
        - SSN
        - CREDIT_CARD
    dlp-filter:
      patterns:
        - name: aws-access-key
          regex: AKIA[A-Z0-9]{16}
          action: block
        - name: aws-secret-key
          regex: "[A-Za-z0-9/+=]{40}"
          action: redact
        - name: internal-arn
          regex: 'arn:aws:[a-z0-9-]+:[a-z0-9-]*:[0-9]{12}:'
          action: redact
    audit-logger:
      retention_days: 365

providers:
  targets:
    - id: sagemaker-llama-70b
      provider: sagemaker:chat:jumpstart-llama-3-3-70b
```
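The three `dlp-filter` regexes above can be exercised locally before rollout. The sketch below is a standalone illustration; the `apply_dlp` helper is hypothetical, not the gateway's implementation:

```python
import re

# Local mirror of the dlp-filter patterns above. Order matters:
# block rules abort the request, redact rules rewrite matched spans.
PATTERNS = [
    ("aws-access-key", re.compile(r"AKIA[A-Z0-9]{16}"), "block"),
    ("aws-secret-key", re.compile(r"[A-Za-z0-9/+=]{40}"), "redact"),
    ("internal-arn", re.compile(r"arn:aws:[a-z0-9-]+:[a-z0-9-]*:[0-9]{12}:"), "redact"),
]

def apply_dlp(text: str) -> str:
    for name, pattern, action in PATTERNS:
        if action == "block":
            if pattern.search(text):
                raise ValueError(f"blocked by {name}")
        else:
            text = pattern.sub("[REDACTED]", text)
    return text

print(apply_dlp("role arn:aws:iam::123456789012:role/inference"))
# → role [REDACTED]role/inference
```

Note that the ARN regex only matches up to the account-ID colon, so the resource suffix survives redaction; tighten the pattern if you want the whole ARN removed.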
Named AWS profile (non-default credentials)
```yaml
pack:
  name: sagemaker-providers-3
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: sagemaker-prod
      provider: sagemaker:chat:jumpstart-llama-3-3-70b
      aws_profile: sagemaker-inference-role  # named profile from ~/.aws/config

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
Mistral 7B endpoint
```yaml
pack:
  name: sagemaker-providers-4
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: sagemaker-mistral
      provider: sagemaker:chat:jumpstart-mistral-7b

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
Custom (non-JumpStart) endpoint
When your endpoint name does not match a JumpStart alias, supply the exact SageMaker endpoint name as the model value:
```yaml
pack:
  name: sagemaker-providers-5
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: sagemaker-custom-model
      provider: sagemaker:chat:my-finetuned-llama-endpoint

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
Start the gateway
```shell
# Using environment credentials
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_DEFAULT_REGION="us-east-1"
kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml

# Using a named profile
AWS_PROFILE=sagemaker-inference-role kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml
```
Provider Fields
| Field | Required | Default | Description |
|---|---|---|---|
| `provider` | Yes | — | Provider identifier. Use `"sagemaker:chat:<endpoint-name>"` or bare `"sagemaker"`. The model segment must match the SageMaker endpoint name exactly, or one of the JumpStart aliases below. |
| `aws_region` | Yes | — | AWS region where the SageMaker endpoint is deployed (e.g., `us-east-1`, `eu-west-1`). |
| `aws_profile` | No | Default chain | Named profile from `~/.aws/config`. Defaults to the environment credential chain if omitted. |
| `provider_type` | No | `sagemaker` | Forces the SageMaker runtime. Required when the provider string is ambiguous. |
| `format` | No | `openai` | Wire format. SageMaker JumpStart endpoints expose an OpenAI-compatible API; keep this as `openai`. |
| `base_url` | No | Auto-derived | Override the SageMaker runtime URL. Auto-derived as `https://runtime.sagemaker.{aws_region}.amazonaws.com` if omitted. |
| `max_context_tokens` | No | Model default | Maximum context length for the target endpoint. Used to validate requests before they reach SageMaker and avoid expensive rejected calls. |
| `timeout_seconds` | No | 60 | HTTP timeout for a complete response. Large models (70B+) can take 60–120 s for long completions. Increase this for streaming workloads. |
| `options.max_tokens` | No | Model default | Maximum tokens in the completion. |
| `options.temperature` | No | 0.3 | Sampling temperature (0–2). |
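The `max_context_tokens` pre-flight check can also be approximated client-side. The sketch below uses a rough 4-characters-per-token heuristic; both the heuristic and the `validate_request` helper are illustrative assumptions, not Keeptrusts' actual tokenizer:

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def validate_request(messages, max_context_tokens: int, max_tokens: int = 0) -> None:
    """Reject an over-length chat request locally, before it incurs a
    billed SageMaker invocation that the endpoint would reject anyway."""
    prompt_tokens = sum(estimate_tokens(m["content"]) for m in messages)
    if prompt_tokens + max_tokens > max_context_tokens:
        raise ValueError(
            f"request needs ~{prompt_tokens + max_tokens} tokens, "
            f"limit is {max_context_tokens}"
        )

# A short request fits comfortably inside a 32k context:
validate_request(
    [{"role": "user", "content": "hello there"}],
    max_context_tokens=32_000,
    max_tokens=1024,
)
```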
Supported Models
The following JumpStart aliases are recognised by Keeptrusts. For other models, use the exact SageMaker endpoint name as the model field.
| JumpStart Alias | Underlying Model | Context Window | Notes |
|---|---|---|---|
| `jumpstart-llama-3-3-70b` | Meta Llama 3.3 70B Instruct | 128k | Best quality JumpStart LLM; requires ml.g5.48xlarge or larger |
| `jumpstart-llama-3-1-8b` | Meta Llama 3.1 8B Instruct | 128k | Cost-efficient; runs on ml.g5.2xlarge |
| `jumpstart-mistral-7b` | Mistral 7B Instruct v0.3 | 32k | Strong instruction following; fast on ml.g5.2xlarge |
| `jumpstart-falcon-40b` | TII Falcon 40B Instruct | 8k | Legacy; prefer Llama or Mistral for new deployments |
| `jumpstart-mixtral-8x7b` | Mistral Mixtral 8×7B | 32k | Mixture-of-experts; requires ml.p4d.24xlarge |
For custom or non-JumpStart endpoints, the model field in your provider config must exactly match the SageMaker endpoint name shown in the SageMaker console — not the underlying model name. Keeptrusts uses this value verbatim when constructing the SigV4-signed invocation URL.
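Assuming the standard SageMaker runtime REST layout (`/endpoints/{name}/invocations`), the URL construction can be sketched as follows; the `invocation_url` helper is illustrative, not a Keeptrusts API:

```python
def invocation_url(aws_region: str, endpoint_name: str, streaming: bool = False) -> str:
    """Build the SageMaker runtime URL that the gateway signs with SigV4.
    Streaming requests use the InvokeEndpointWithResponseStream path."""
    action = "invocations-response-stream" if streaming else "invocations"
    return (
        f"https://runtime.sagemaker.{aws_region}.amazonaws.com"
        f"/endpoints/{endpoint_name}/{action}"
    )

print(invocation_url("us-east-1", "my-finetuned-llama-endpoint"))
# → https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/my-finetuned-llama-endpoint/invocations
```

Because the endpoint name is embedded verbatim in the path, a typo in the model field produces a 4xx from AWS rather than a Keeptrusts-side error.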
Client Examples
- Python
- Node.js
- cURL
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="unused",  # SigV4 auth is handled by the gateway
)

response = client.chat.completions.create(
    model="jumpstart-llama-3-3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful enterprise assistant."},
        {"role": "user", "content": "Summarise the attached quarterly report in three bullet points."},
    ],
    max_tokens=1024,
    temperature=0.2,
)
print(response.choices[0].message.content)
```
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:41002/v1",
  apiKey: "unused", // SigV4 auth is handled by the gateway
});

const response = await client.chat.completions.create({
  model: "jumpstart-llama-3-3-70b",
  messages: [
    { role: "system", content: "You are a helpful enterprise assistant." },
    { role: "user", content: "Summarise the attached quarterly report in three bullet points." },
  ],
  max_tokens: 1024,
  temperature: 0.2,
});

console.log(response.choices[0].message.content);
```
```shell
curl http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jumpstart-llama-3-3-70b",
    "messages": [
      {"role": "system", "content": "You are a helpful enterprise assistant."},
      {"role": "user", "content": "Summarise the attached quarterly report in three bullet points."}
    ],
    "max_tokens": 1024,
    "temperature": 0.2
  }'
```
Streaming
SageMaker real-time endpoints support streaming via InvokeEndpointWithResponseStream. Keeptrusts forwards SSE chunks after applying per-chunk content policies. Set stream: true in your request:
- Python
- Node.js
- cURL
```python
stream = client.chat.completions.create(
    model="jumpstart-llama-3-3-70b",
    messages=[{"role": "user", "content": "Walk me through a root-cause analysis approach."}],
    max_tokens=2048,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
```javascript
const stream = await client.chat.completions.create({
  model: "jumpstart-llama-3-3-70b",
  messages: [{ role: "user", content: "Walk me through a root-cause analysis approach." }],
  max_tokens: 2048,
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content ?? "";
  process.stdout.write(delta);
}
```
```shell
curl http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jumpstart-llama-3-3-70b",
    "messages": [{"role": "user", "content": "Walk me through a root-cause analysis approach."}],
    "max_tokens": 2048,
    "stream": true
  }'
```
Increase `timeout_seconds` to at least 120 when streaming long-form completions from 70B models. The default 60 s covers most short completions but will time out on 2000+ token responses from large, slow instance types.
Advanced Configuration
Least-privilege IAM policy
Grant the Keeptrusts gateway the minimum permissions needed to invoke your endpoint. Avoid sagemaker:* wildcard grants:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "KeeptrustsSageMakerInvoke",
      "Effect": "Allow",
      "Action": [
        "sagemaker:InvokeEndpoint",
        "sagemaker:InvokeEndpointWithResponseStream"
      ],
      "Resource": "arn:aws:sagemaker:us-east-1:123456789012:endpoint/jumpstart-llama-3-3-70b"
    }
  ]
}
```
Parameterise the ARN per environment. Never share a single IAM policy that covers all endpoints across all environments.
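Parameterisation can be as simple as rendering the policy document per environment. A sketch, with a hypothetical `invoke_policy` helper:

```python
import json

def invoke_policy(region: str, account_id: str, endpoint_name: str) -> dict:
    """Render the least-privilege invoke policy above, scoped to one
    endpoint ARN, so each environment gets its own document."""
    arn = f"arn:aws:sagemaker:{region}:{account_id}:endpoint/{endpoint_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "KeeptrustsSageMakerInvoke",
                "Effect": "Allow",
                "Action": [
                    "sagemaker:InvokeEndpoint",
                    "sagemaker:InvokeEndpointWithResponseStream",
                ],
                "Resource": arn,
            }
        ],
    }

print(json.dumps(invoke_policy("us-east-1", "123456789012", "jumpstart-llama-3-3-70b"), indent=2))
```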
Multi-region active-active routing
Route requests across two regional endpoints for high availability. Keeptrusts will use the first healthy target:
```yaml
pack:
  name: sagemaker-providers-6
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: sagemaker-us-east-1
      provider: sagemaker:chat:jumpstart-llama-3-3-70b
    - id: sagemaker-us-west-2
      provider: sagemaker:chat:jumpstart-llama-3-3-70b

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
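First-healthy selection can be pictured as an ordered failover loop. This is an illustration of the behaviour, not Keeptrusts' routing code; `send` stands in for a target invocation:

```python
def route_first_healthy(targets, send):
    """Try each configured target in order; return the first successful
    response, raising only when every target fails."""
    errors = []
    for target in targets:
        try:
            return target, send(target)
        except Exception as exc:  # network error, 5xx, throttling, ...
            errors.append((target, exc))
    raise RuntimeError(f"all targets failed: {errors}")

# Simulated outage: us-east-1 is down, us-west-2 answers.
def send(target):
    if target == "sagemaker-us-east-1":
        raise ConnectionError("endpoint unreachable")
    return {"ok": True}

target, resp = route_first_healthy(["sagemaker-us-east-1", "sagemaker-us-west-2"], send)
print(target)  # → sagemaker-us-west-2
```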
Timeout tuning by model size
Large models on smaller instance types can be slow. Tune timeout_seconds to match your endpoint's P99 latency:
```yaml
pack:
  name: sagemaker-providers-7
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: sagemaker-mistral-fast
      provider: sagemaker:chat:jumpstart-mistral-7b
      timeout_seconds: 30   # 7B on ml.g5.2xlarge responds quickly
    - id: sagemaker-llama-large
      provider: sagemaker:chat:jumpstart-llama-3-3-70b
      timeout_seconds: 120  # 70B long completions can take 90-120 s

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
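One way to pick a `timeout_seconds` value per target is from your endpoint's measured completion latencies, e.g. a high percentile plus headroom. The `timeout_from_latencies` helper and its constants are illustrative assumptions, not a Keeptrusts feature:

```python
import math

def timeout_from_latencies(latencies_s, percentile=0.99, headroom=1.5, floor=60):
    """Suggest a timeout: the chosen latency percentile times a headroom
    factor, never below a sane floor (the gateway default of 60 s)."""
    ranked = sorted(latencies_s)
    idx = min(len(ranked) - 1, math.ceil(percentile * len(ranked)) - 1)
    return max(floor, math.ceil(ranked[idx] * headroom))

# e.g. a 70B endpoint whose longest completions take ~95 s:
print(timeout_from_latencies([12, 30, 45, 80, 95]))  # → 143
```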
Best Practices
- Use instance role credentials in AWS-hosted deployments — When running the Keeptrusts gateway on ECS, EKS, or EC2, rely on the instance/task IAM role for credentials rather than distributing long-lived `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` env vars. SigV4 signing works transparently with all standard AWS credential chain sources.
- Apply least-privilege IAM — Restrict the IAM policy to the `sagemaker:InvokeEndpoint` and `sagemaker:InvokeEndpointWithResponseStream` actions scoped to the exact endpoint ARN(s) the gateway needs. Never grant `sagemaker:*` or `*:*`.
- Set `max_context_tokens` explicitly — SageMaker charges per token processed, including input tokens. Setting `max_context_tokens` causes Keeptrusts to reject over-length requests before they reach AWS, preventing expensive rejected invocations and unexpected bills.
- Increase `timeout_seconds` for large models — 70B models on standard instance types regularly take 90–120 s for 2k-token completions. Insufficient timeouts cause client-visible errors that are billed by AWS even when the invocation is abandoned.
- Enable DLP filtering for AWS-internal identifiers — Use the `dlp-filter` policy to block or redact AWS ARNs, access key IDs, and secret key patterns before they appear in model completions. SageMaker responses may echo back structured data from the prompt context.
- Pin endpoint names in production — Treat the SageMaker endpoint name in your provider config as a versioned artifact. Endpoint names are reused when models are redeployed, which can silently change model behaviour. Use separate endpoint names for different model versions and update the provider config deliberately.
For AI systems
- Canonical terms: Keeptrusts gateway, AWS SageMaker, SageMaker endpoints, SigV4, IAM, custom models, provider target, policy-config.yaml, `provider: "sagemaker"`.
- Config field names: `provider`, `model`, `aws_region`, `sagemaker_endpoint_name`, `aws_access_key_env`, `aws_secret_key_env`, `format: "openai"`, `provider_type: "sagemaker"`.
- Auth: SigV4 request signing using standard AWS env vars, same as Bedrock.
- Key behavior: Keeptrusts routes to SageMaker real-time inference endpoints and translates between OpenAI format and SageMaker's invoke format.
- Best next pages: AWS Bedrock integration, vLLM integration, Policy configuration.
For engineers
- Prerequisites: SageMaker endpoint deployed and InService, IAM credentials with `sagemaker:InvokeEndpoint` permission, `kt` CLI installed.
- Required config: `sagemaker_endpoint_name`, `aws_region`, and AWS credentials (env vars or instance profile).
- Start command: `kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml`.
- Validate: `curl http://localhost:41002/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"your-endpoint","messages":[{"role":"user","content":"hello"}]}'`.
- Pin endpoint names in production — redeploying a model to the same endpoint name can silently change behavior.
- Use separate endpoint names for different model versions and update provider config deliberately.
For leaders
- SageMaker keeps inference within your AWS account — full data sovereignty and VPC isolation.
- Custom model endpoints support fine-tuned and proprietary models — Keeptrusts applies uniform governance regardless of model origin.
- IAM-based access integrates with existing AWS governance (CloudTrail, SCPs) for centralized control.
- Endpoint scaling and instance costs are your responsibility — capacity planning is separate from Keeptrusts policy configuration.
Next steps
- AWS Bedrock integration — managed foundation models on AWS without endpoint management
- vLLM integration — self-managed high-throughput serving (non-AWS alternative)
- Provider routing strategies — multi-endpoint failover
- Policy configuration — audit-logger and PII policy reference
- Quickstart — install `kt` and run your first gateway