[bug] AWS Bedrock Provider IMDSv2 Protocol Violation Causes HTTP 405 in Containerized Environments #3033

@mallmann02

Product

BAML

Describe the bug

Summary

The aws-bedrock provider fails to authenticate in containerized AWS environments (ECS, EKS, Lambda, Bedrock AgentCore Runtime) due to an incorrect implementation of the IMDSv2 (Instance Metadata Service v2) credential-fetching protocol.

Environment

  • BAML Version: 0.218.0 (baml-py)
  • Runtime: AWS Bedrock AgentCore Runtime (Firecracker-based containerized environment)
  • Affected Deployments: All AWS containerized environments using IAM roles (ECS task roles, EKS IRSA, Lambda execution roles, EC2 instance profiles)
  • Related Issue: #2849, "Feature Request: AWS Web Identity Token / IRSA support for aws-bedrock" (same root cause)

Debugging Checklist

  • IAM Role is properly attached to the container/instance and has necessary Bedrock permissions
  • Security groups allow outbound traffic (verified even with a permissive rule that allows all outbound traffic)
  • IMDS is enabled and accessible - can reach http://169.254.169.254 from within the container
  • boto3 works in the same environment - confirms IAM role, network, and permissions are correct
  • langchain_aws works in the same environment - another library using boto3 authenticates successfully
  • No conflicting environment variables - AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, and AWS_PROFILE are unset
  • AWS_REGION is properly set to a valid region (us-east-1)
  • Manual credential injection works - setting credentials as environment variables allows BAML to work

Problem Description

When running in AWS containerized environments, BAML's aws-bedrock provider throws a DispatchFailure with a nested FailedToLoadToken error, ultimately caused by an HTTP 405 response from the Instance Metadata Service (IMDS).

Error Chain:

DispatchFailure
  └─ ConnectorError
      └─ ProviderError
          └─ FailedToLoadToken
              └─ ServiceError
                  └─ TokenError { kind: NoTtl }
                      └─ HTTP 405: "Not allowed HTTP method"

The 405 response includes an Allow: GET header, which means the IMDS endpoint rejected the HTTP method used for the token request.
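One way to see what the token endpoint accepts from inside the container is to send the token request with each method and compare the status code and Allow header. The following is a minimal sketch using only the Python standard library; the function names interpret and probe are hypothetical, and it assumes the endpoint is reachable at the address given in this report.

```python
import urllib.error
import urllib.request
from typing import Optional

TOKEN_URL = "http://169.254.169.254/latest/api/token"


def interpret(status: int, allow: Optional[str]) -> str:
    """Summarise what a token-endpoint response says about the request."""
    if status == 200:
        return "token issued"
    if status == 405:
        return f"method rejected; endpoint allows: {allow or 'unspecified'}"
    return f"unexpected status {status}"


def probe(method: str, timeout: float = 2.0) -> str:
    """Send one request to the token endpoint and interpret the result.

    Connection failures (urllib.error.URLError) are left to propagate.
    """
    req = urllib.request.Request(
        TOKEN_URL,
        method=method,
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return interpret(resp.status, resp.headers.get("Allow"))
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions but still carry headers.
        return interpret(err.code, err.headers.get("Allow"))
```

Calling probe("PUT") and probe("GET") in the affected container shows which of the two methods the endpoint accepts.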

Root Cause Analysis

IMDSv2 Protocol Requirements

AWS IMDSv2 requires a two-step authentication flow:

  1. Token Acquisition (Step 1):

    • Method: PUT (not GET)
    • Endpoint: http://169.254.169.254/latest/api/token
    • Required Header: X-aws-ec2-metadata-token-ttl-seconds: 21600
    • Response: Session token (string)
  2. Credentials Retrieval (Step 2):

    • Method: GET
    • Endpoint: http://169.254.169.254/latest/meta-data/iam/security-credentials/<role-name>
    • Required Header: X-aws-ec2-metadata-token: <token-from-step-1>
    • Response: JSON with AccessKeyId, SecretAccessKey, Token
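The two steps above can be sketched with the Python standard library. This is an illustration of the protocol as described in this report, not BAML's or boto3's implementation; the function names are hypothetical, and fetch_credentials assumes the first line of the role listing is the role to use.

```python
import json
import urllib.request

IMDS_BASE = "http://169.254.169.254"
CREDS_PATH = "/latest/meta-data/iam/security-credentials/"


def build_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Step 1: PUT to the token endpoint with the TTL header."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_credentials_request(token: str, role_name: str = "") -> urllib.request.Request:
    """Step 2: GET the role listing (empty role_name) or one role's credentials."""
    return urllib.request.Request(
        f"{IMDS_BASE}{CREDS_PATH}{role_name}",
        method="GET",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_credentials(timeout: float = 2.0) -> dict:
    """Run the full flow; only works where the metadata service is reachable."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(build_credentials_request(token), timeout=timeout) as resp:
        role_name = resp.read().decode().strip().splitlines()[0]
    with urllib.request.urlopen(build_credentials_request(token, role_name), timeout=timeout) as resp:
        return json.load(resp)  # contains AccessKeyId, SecretAccessKey, Token
```

Separating request construction from sending keeps the method and header choices easy to inspect without network access.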

BAML's Current Implementation Issues

Based on the error traceback, BAML's credential provider appears to be:

  1. Using wrong HTTP method: Attempting GET instead of PUT for token acquisition
  2. Missing required header: The NoTtl error indicates the X-aws-ec2-metadata-token-ttl-seconds header is missing or not properly formatted
  3. Not following the two-step flow: The token exchange protocol is not being implemented correctly

The error originates deep in BAML's Rust core, specifically in the AWS credential provider chain implementation.

Evidence

  1. boto3 works in same environment: Python's boto3 library successfully authenticates using proper IMDSv2 implementation
  2. langchain_aws works in same environment: LangChain's high-level AWS SDK integration (which uses boto3) successfully authenticates
  3. Manual credential injection works: Bypassing IMDS by setting AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN environment variables works
  4. Local development works: AWS profiles work correctly (they don't use IMDS)
  5. Error is protocol-level, not authentication: The failure occurs at token fetch, before credentials are even requested

Expected Behavior

BAML should support the standard AWS credential provider chain, which includes the following sources (exact precedence varies between SDKs):

  1. Environment variables (AWS_ACCESS_KEY_ID, etc.) ✅ Currently works
  2. Web Identity Token (AWS_WEB_IDENTITY_TOKEN_FILE + AWS_ROLE_ARN): see #2849 (IRSA/Web Identity Token support, same root cause)
  3. ECS Container Credentials (via AWS_CONTAINER_CREDENTIALS_RELATIVE_URI) ❌ Broken
  4. EC2 Instance Metadata (IMDSv2) ❌ Broken (this issue)
  5. AWS Profiles (~/.aws/credentials) ✅ Currently works
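When debugging, it can help to see which of these sources the current environment is actually configured for. The helper below is a hypothetical sketch: it walks the sources in the order the list gives them, treats IMDS as always available because it needs no configuration, and only detects a profile that is selected explicitly through AWS_PROFILE.

```python
from typing import List, Mapping


def configured_sources(env: Mapping[str, str]) -> List[str]:
    """Return the credential sources that `env` is configured for, in list order."""
    sources = []
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        sources.append("environment")
    if env.get("AWS_WEB_IDENTITY_TOKEN_FILE") and env.get("AWS_ROLE_ARN"):
        sources.append("web-identity")
    if env.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"):
        sources.append("ecs-container")
    # IMDS needs no environment configuration, so it is always a candidate.
    sources.append("imds")
    if env.get("AWS_PROFILE"):
        sources.append("profile")
    return sources
```

Running configured_sources(os.environ) inside the failing container (after import os) shows whether anything other than IMDS is configured there.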

Implementation Recommendations

Solution 1: Use aws-config DefaultCredentialsChain (Recommended)

The Rust aws-config crate already provides a complete, production-ready credential provider chain that handles all AWS authentication methods correctly.

Reference: https://docs.rs/aws-config/latest/aws_config/

The aws-config crate includes:

Implementation approach:

Instead of implementing custom credential resolution in aws_client.rs, use:

use aws_config::BehaviorVersion;

// Let aws-config handle the entire credential chain; defaults() wires in
// the default provider chain (DefaultCredentialsChain) automatically
let config = aws_config::defaults(BehaviorVersion::latest())
    .region(region)
    .load()
    .await;

// Use config.credentials_provider() for the bedrock client

This single change would:

Solution 2: Manual IMDSv2 Fix (Minimal)

If DefaultCredentialsChain cannot be used, the minimal fix requires:

  1. Change HTTP method from GET to PUT for token endpoint:

    let token_response = http_client
        .put("http://169.254.169.254/latest/api/token")
        .header("X-aws-ec2-metadata-token-ttl-seconds", "21600")
        .send()
        .await?;
  2. Use token in credentials request:

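    let token = token_response.text().await?;

    // This path lists the attached role name; a second GET to
    // .../security-credentials/<role-name> returns the credentials JSON
    let creds_response = http_client
        .get("http://169.254.169.254/latest/meta-data/iam/security-credentials/")
        .header("X-aws-ec2-metadata-token", &token)
        .send()
        .await?;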
  3. Implement token caching (tokens are valid for TTL duration)

  4. Handle credential refresh (temporary credentials expire)

However, this approach is error-prone and doesn't solve related issues like IRSA support.
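Steps 3 and 4 both come down to reusing a fetched value until shortly before it expires. A minimal sketch of that idea in Python follows; the class name ExpiringCache and its parameters are hypothetical, and the clock is injectable only so the behaviour can be checked without waiting.

```python
import time
from typing import Callable, Optional


class ExpiringCache:
    """Cache one value and refetch it shortly before its TTL runs out."""

    def __init__(
        self,
        fetch: Callable[[], str],
        ttl_seconds: float,
        refresh_margin: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._margin = refresh_margin
        self._clock = clock
        self._value: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Refetch on first use, or once we are inside the refresh margin.
        if self._value is None or now >= self._expires_at - self._margin:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

For the session token, fetch would perform the PUT request and ttl_seconds would match the TTL header (21600 in this report). For credentials, the expiry would instead come from the credentials response, which this sketch does not model.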

References

Please implement proper AWS credential chain support using the aws-config crate's DefaultCredentialsChain. This would resolve this issue, #2849, and provide a production-ready authentication experience that matches boto3 and other AWS SDKs.

Thank you for your attention!

Reproduction Steps

Prerequisites

  • AWS ECS/EKS/Lambda/EC2 environment with IAM role attached (no static credentials)
  • BAML 0.218.0 (baml-py)
  • Network access to IMDS endpoint (169.254.169.254)

Step 1: Create BAML Client Configuration

Create a clients.baml file:

client<llm> BedrockClaude {
  provider aws-bedrock
  options {
    model "us.anthropic.claude-3-5-haiku-20241022-v1:0"
    region env.AWS_REGION
    // Note: NOT setting access_key_id, secret_access_key, or session_token
    // to force BAML to use credential discovery (IMDS)
  }
}

Step 2: Ensure Environment Variables Are NOT Set

unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY
unset AWS_SESSION_TOKEN
unset AWS_PROFILE
# AWS_REGION should be set
export AWS_REGION=us-east-1

This forces BAML to use the credential provider chain (IMDS).

Step 3: Attempt to Call BAML Function

from baml_client import b

# This will fail with DispatchFailure -> FailedToLoadToken
response = b.MyFunction("test input")

Step 4: Observe the Error

DispatchFailure(
    DispatchFailure {
        source: ConnectorError {
            kind: Other(None),
            source: ProviderError(
                ProviderError {
                    source: FailedToLoadToken(
                        FailedToLoadToken {
                            source: ServiceError(
                                ServiceError {
                                    source: TokenError {
                                        kind: NoTtl,
                                    },
                                    raw: Response {
                                        status: StatusCode(405),
                                        headers: {
                                            "server": "Firecracker API",
                                            "allow": "GET",
                                        },
                                        body: "Not allowed HTTP method.",
                                    },
                                },
                            ),
                        },
                    ),
                },
            ),
        },
    },
)

Step 5: Verify boto3 Works (Comparison)

In the same environment, boto3 authenticates successfully:

import boto3

# This works - boto3 implements IMDSv2 correctly
session = boto3.Session()
bedrock_runtime = session.client('bedrock-runtime', region_name='us-west-2')

response = bedrock_runtime.invoke_model(
    modelId='us.anthropic.claude-3-5-haiku-20241022-v1:0',
    body='{"messages": [{"role": "user", "content": "test"}], "anthropic_version": "bedrock-2023-05-31", "max_tokens": 100}'
)
# Success - returns model response
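Since boto3 succeeding is the main point of comparison, it may be worth recording which provider boto3 actually resolved credentials from in this environment. The sketch below reads the method attribute that botocore sets on resolved credentials. The helper names and the human-readable descriptions are illustrative, and boto3 is imported lazily so the mapping can be used without it.

```python
from typing import Optional

# Values botocore assigns to Credentials.method for common providers.
METHOD_DESCRIPTIONS = {
    "env": "environment variables",
    "assume-role-with-web-identity": "web identity token",
    "container-role": "container credentials endpoint",
    "iam-role": "EC2 instance metadata (IMDS)",
    "shared-credentials-file": "shared credentials file",
}


def describe_method(method: Optional[str]) -> str:
    """Translate a botocore credential method name into a short description."""
    if method is None:
        return "no credentials resolved"
    return METHOD_DESCRIPTIONS.get(method, f"other provider ({method})")


def resolved_credential_source() -> str:
    """Report which provider boto3's default chain used in this environment."""
    import boto3  # imported here so describe_method works without boto3

    creds = boto3.Session().get_credentials()
    return describe_method(creds.method if creds else None)
```

If resolved_credential_source() reports something other than IMDS, boto3 and BAML are not exercising the same credential path, which would be useful to add to the comparison.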

BAML Version

0.218.0

Language/Framework

Python

LLM Provider

Other

LLM Model

"us.anthropic.claude-3-5-haiku-20241022-v1:0" via "aws-bedrock" provider

Operating System

None

Browser

None

Code Editor

None
