Description
Product
BAML
Describe the bug
Summary
The aws-bedrock provider fails to authenticate in containerized AWS environments (ECS, EKS, Lambda, Bedrock AgentCore Runtime) due to incorrect implementation of the IMDSv2 (Instance Metadata Service v2) credential fetching protocol.
Environment
- BAML Version: 0.218.0 (baml-py)
- Runtime: AWS Bedrock AgentCore Runtime (Firecracker-based containerized environment)
- Affected Deployments: All AWS containerized environments using IAM roles (ECS task roles, EKS IRSA, Lambda execution roles, EC2 instance profiles)
- Related Issue: Feature Request: AWS Web Identity Token / IRSA support for aws-bedrock #2849 (IRSA/Web Identity Token support - same root cause)
Debugging Checklist
- IAM Role is properly attached to the container/instance and has necessary Bedrock permissions
- Security groups allow outbound traffic (even changed to a permissive rule to allow all outbound traffic)
- IMDS is enabled and accessible - can reach http://169.254.169.254 from within the container
- boto3 works in the same environment - confirms IAM role, network, and permissions are correct
- langchain_aws works in the same environment - another library using boto3 authenticates successfully
- No conflicting environment variables - AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, and AWS_PROFILE are unset
- AWS_REGION is properly set to a valid region (us-east-1)
- Manual credential injection works - setting credentials as environment variables allows BAML to work
Problem Description
When running in AWS containerized environments, BAML's aws-bedrock provider throws a DispatchFailure with a nested FailedToLoadToken error, ultimately caused by an HTTP 405 response from the Instance Metadata Service (IMDS).
Error Chain:
DispatchFailure
└─ ConnectorError
   └─ ProviderError
      └─ FailedToLoadToken
         └─ ServiceError
            └─ TokenError { kind: NoTtl }
               └─ HTTP 405: "Not allowed HTTP method"
The IMDS endpoint responds with Allow: GET, indicating that BAML is using the wrong HTTP method for the token request.
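To check which HTTP methods the token endpoint accepts in a given environment, a small probe along these lines may help. It is only a sketch using the Python standard library; `probe_token_endpoint` is a hypothetical helper name, and proxies are bypassed on the assumption that the link-local metadata address should never be reached through one.

```python
import urllib.error
import urllib.request

IMDS_TOKEN_URL = "http://169.254.169.254/latest/api/token"

# Opener with an empty proxy map so HTTP(S)_PROXY settings are ignored.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe_token_endpoint(method, url=IMDS_TOKEN_URL, timeout=2.0):
    """Send one request to the token endpoint; return (status, Allow header)."""
    request = urllib.request.Request(
        url,
        method=method,
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    )
    try:
        with _opener.open(request, timeout=timeout) as response:
            return response.status, response.headers.get("Allow")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; their headers stay readable.
        return err.code, err.headers.get("Allow")
```

Calling `probe_token_endpoint("PUT")` and `probe_token_endpoint("GET")` from inside the container and comparing the two results shows which method the metadata service rejects. Connection failures (for example, when run outside AWS) are not caught and will raise `urllib.error.URLError`.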
Root Cause Analysis
IMDSv2 Protocol Requirements
AWS IMDSv2 requires a two-step authentication flow:
- Token Acquisition (Step 1):
  - Method: PUT (not GET)
  - Endpoint: http://169.254.169.254/latest/api/token
  - Required Header: X-aws-ec2-metadata-token-ttl-seconds: 21600
  - Response: Session token (string)
- Credentials Retrieval (Step 2):
  - Method: GET
  - Endpoint: http://169.254.169.254/latest/meta-data/iam/security-credentials/<role-name>
  - Required Header: X-aws-ec2-metadata-token: <token-from-step-1>
  - Response: JSON with AccessKeyId, SecretAccessKey, Token
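The two steps above can be sketched in Python with only the standard library. This is an illustration of the protocol as described here, not BAML's or boto3's actual code; the function names are hypothetical, and `fetch_role_credentials` only works from inside an environment where the metadata endpoint is reachable.

```python
import json
import urllib.request

IMDS_BASE = "http://169.254.169.254"
TOKEN_TTL_SECONDS = 21600


def build_token_request(base=IMDS_BASE, ttl=TOKEN_TTL_SECONDS):
    """Step 1: PUT /latest/api/token with the TTL header."""
    return urllib.request.Request(
        f"{base}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl)},
    )


def build_credentials_request(role_name, token, base=IMDS_BASE):
    """Step 2: GET the role's credentials, presenting the session token."""
    return urllib.request.Request(
        f"{base}/latest/meta-data/iam/security-credentials/{role_name}",
        method="GET",
        headers={"X-aws-ec2-metadata-token": token},
    )


def parse_credentials(body):
    """Extract the three fields named above from the step 2 JSON body."""
    doc = json.loads(body)
    return {key: doc[key] for key in ("AccessKeyId", "SecretAccessKey", "Token")}


def fetch_role_credentials(role_name, timeout=2.0):
    """Run both steps against the real endpoint (requires IMDS access)."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    request = build_credentials_request(role_name, token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_credentials(resp.read())
```

Keeping request construction separate from the network call makes the method and header of each step easy to inspect without touching the metadata service.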
BAML's Current Implementation Issues
Based on the error traceback, BAML's credential provider is:
- Using wrong HTTP method: Attempting GET instead of PUT for token acquisition
- Missing required header: The NoTtl error indicates the X-aws-ec2-metadata-token-ttl-seconds header is missing or not properly formatted
- Not following the two-step flow: The token exchange protocol is not being implemented correctly
The error originates deep in BAML's Rust core, specifically in the AWS credential provider chain implementation.
Evidence
- boto3 works in same environment: Python's boto3 library successfully authenticates using proper IMDSv2 implementation
- langchain_aws works in same environment: LangChain's high-level AWS SDK integration (which uses boto3) successfully authenticates
- Manual credential injection works: Bypassing IMDS by setting AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN environment variables works
- Local development works: AWS profiles work correctly (they don't use IMDS)
- Error is protocol-level, not authentication: The failure occurs at token fetch, before credentials are even requested
Expected Behavior
BAML should support the standard AWS credential provider chain, which includes (in order):
- Environment variables (AWS_ACCESS_KEY_ID, etc.) ✅ Currently works
- Web Identity Token (AWS_WEB_IDENTITY_TOKEN_FILE + AWS_ROLE_ARN) See IRSA/Web Identity Token support - same root cause
- ECS Container Credentials (via AWS_CONTAINER_CREDENTIALS_RELATIVE_URI) ❌ Broken
- EC2 Instance Metadata (IMDSv2) ❌ Broken (this issue)
- AWS Profiles (~/.aws/credentials) ✅ Currently works
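When debugging, it can help to see which of these sources look configured in the current process. The sketch below is a rough approximation that only inspects environment variables and the credentials file in the order listed above; `available_credential_sources` is a hypothetical helper, and real SDKs may order or detect sources differently. `AWS_EC2_METADATA_DISABLED` is the environment variable AWS SDKs use to switch off IMDS lookups.

```python
import os
from pathlib import Path


def available_credential_sources(env=None, home=None):
    """Return names of credential sources that look configured, in list order."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    sources = []
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        sources.append("environment")
    if env.get("AWS_WEB_IDENTITY_TOKEN_FILE") and env.get("AWS_ROLE_ARN"):
        sources.append("web-identity")
    if env.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"):
        sources.append("ecs-container")
    # IMDS cannot be detected from the environment; treat it as the network
    # fallback unless it has been disabled explicitly.
    if env.get("AWS_EC2_METADATA_DISABLED", "").lower() != "true":
        sources.append("imds")
    if (home / ".aws" / "credentials").is_file():
        sources.append("profile")
    return sources
```

In the failing environment described in this issue, the expected result would be a list containing only the IMDS fallback, which is consistent with the error surfacing in the token fetch.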
Implementation Recommendations
Solution 1: Use aws-config DefaultCredentialsChain (Recommended)
The Rust aws-config crate already provides a complete, production-ready credential provider chain that handles all AWS authentication methods correctly.
Reference: https://docs.rs/aws-config/latest/aws_config/
The aws-config crate includes:
- DefaultCredentialsChain - Implements full AWS SDK credential resolution
- ImdsCredentialsProvider - Proper IMDSv2 implementation
- WebIdentityTokenCredentialsProvider - IRSA support (fixes #2849)
- EcsCredentialsProvider - ECS task role support
- Automatic credential refresh and caching
Implementation approach:
Instead of implementing custom credential resolution in aws_client.rs, use:
use aws_config::BehaviorVersion;
// Let aws-config handle the entire credential chain.
// With no explicit credentials provider set, the loader falls back to
// the default chain (DefaultCredentialsChain).
let config = aws_config::defaults(BehaviorVersion::latest())
    .region(region)
    .load()
    .await;
// Use config.credentials_provider() for the bedrock client
This single change would:
- Fix IMDSv2 protocol handling
- Add IRSA support (resolves Feature Request: AWS Web Identity Token / IRSA support for aws-bedrock #2849)
- Add ECS container credentials support
- Add automatic credential refresh
- Follow AWS SDK best practices
Solution 2: Manual IMDSv2 Fix (Minimal)
If DefaultCredentialsChain cannot be used, the minimal fix requires:
- Change HTTP method from GET to PUT for token endpoint:
  let token_response = http_client
      .put("http://169.254.169.254/latest/api/token")
      .header("X-aws-ec2-metadata-token-ttl-seconds", "21600")
      .send()
      .await?;
- Use token in credentials request:
  let token = token_response.text().await?;
  let creds_response = http_client
      .get("http://169.254.169.254/latest/meta-data/iam/security-credentials/")
      .header("X-aws-ec2-metadata-token", &token)
      .send()
      .await?;
- Implement token caching (tokens are valid for TTL duration)
- Handle credential refresh (temporary credentials expire)
However, this approach is error-prone and doesn't solve related issues like IRSA support.
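To illustrate the token caching step of the manual fix, here is a minimal Python sketch. `CachedToken`, its `refresh_margin` parameter, and the injectable `clock` are assumptions made for the example; they are not part of BAML or any AWS SDK, and a Rust implementation would look different.

```python
import time


class CachedToken:
    """Cache an IMDSv2 session token and refetch it shortly before it expires."""

    def __init__(self, fetch, ttl_seconds=21600, refresh_margin=60,
                 clock=time.monotonic):
        self._fetch = fetch            # callable returning a fresh token string
        self._ttl = ttl_seconds        # TTL requested in the PUT header
        self._margin = refresh_margin  # refetch this many seconds early
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return the cached token, fetching a new one if missing or near expiry."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch()
            self._expires_at = now + self._ttl
        return self._token
```

The same pattern would apply to the temporary credentials themselves, using their expiration time instead of a fixed TTL. This extra state is part of why the manual route is more error-prone than delegating to an existing credential chain.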
References
- IMDSv2 Specification: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html
- aws-config Rust Crate: https://docs.rs/aws-config/latest/aws_config/
- BAML aws_client.rs: https://github.com/BoundaryML/baml/blob/canary/engine/baml-runtime/src/internal/llm_client/primitive/aws/aws_client.rs
- Related Issue #2849 (Feature Request: AWS Web Identity Token / IRSA support for aws-bedrock): IRSA/Web Identity Token support
Please implement proper AWS credential chain support using the aws-config crate's DefaultCredentialsChain. This would resolve this issue, #2849, and provide a production-ready authentication experience that matches boto3 and other AWS SDKs.
Thank you for your attention!
Reproduction Steps
Prerequisites
- AWS ECS/EKS/Lambda/EC2 environment with IAM role attached (no static credentials)
- BAML 0.218.0 (baml-py)
- Network access to IMDS endpoint (169.254.169.254)
Step 1: Create BAML Client Configuration
Create a clients.baml file:
client<llm> BedrockClaude {
  provider aws-bedrock
  options {
    model "us.anthropic.claude-3-5-haiku-20241022-v1:0"
    region env.AWS_REGION
    // Note: NOT setting access_key_id, secret_access_key, or session_token
    // to force BAML to use credential discovery (IMDS)
  }
}
Step 2: Ensure Environment Variables Are NOT Set
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY
unset AWS_SESSION_TOKEN
unset AWS_PROFILE
# AWS_REGION should be set
export AWS_REGION=us-east-1
This forces BAML to use the credential provider chain (IMDS).
Step 3: Attempt to Call BAML Function
from baml_client import b
# This will fail with DispatchFailure -> FailedToLoadToken
response = b.MyFunction("test input")
Step 4: Observe the Error
DispatchFailure(
    DispatchFailure {
        source: ConnectorError {
            kind: Other(None),
            source: ProviderError(
                ProviderError {
                    source: FailedToLoadToken(
                        FailedToLoadToken {
                            source: ServiceError(
                                ServiceError {
                                    source: TokenError {
                                        kind: NoTtl,
                                    },
                                    raw: Response {
                                        status: StatusCode(405),
                                        headers: {
                                            "server": "Firecracker API",
                                            "allow": "GET",
                                        },
                                        body: "Not allowed HTTP method.",
                                    },
                                },
                            ),
                        },
                    ),
                },
            ),
        },
    },
)
Step 5: Verify boto3 Works (Comparison)
In the same environment, boto3 authenticates successfully:
import boto3
# This works - boto3 implements IMDSv2 correctly
session = boto3.Session()
bedrock_runtime = session.client('bedrock-runtime', region_name='us-west-2')
response = bedrock_runtime.invoke_model(
    modelId='us.anthropic.claude-3-5-haiku-20241022-v1:0',
    body='{"messages": [{"role": "user", "content": "test"}], "anthropic_version": "bedrock-2023-05-31", "max_tokens": 100}'
)
# Success - returns model response
BAML Version
0.218.0
Language/Framework
Python
LLM Provider
Other
LLM Model
"us.anthropic.claude-3-5-haiku-20241022-v1:0" via "aws-bedrock" provider
Operating System
None
Browser
None
Code Editor
None