Multi-architecture container based on Fedora 43 for working with AWS and OpenShift clusters. Designed for AWS Fargate with ECS Exec support.
- AWS CLI: Both Fedora RPM and official AWS CLI v2 with alternatives support
- OpenShift CLI: Versions 4.14 through 4.20 from stable channels
- Claude Code: AI-powered CLI assistant with Amazon Bedrock integration
- Dynamic Version Selection: Switch tool versions via environment variables at runtime
- ECS Exec Ready: Designed for AWS Fargate with ECS Exec support
- Multi-architecture: Supports both x86_64 (amd64) and ARM64 (aarch64)
- OIDC Authentication: Keycloak integration with Lambda-based authorization
- Tag-Based Isolation: Per-user IAM roles with task-level access control
Copy the example environment file and update with your values:
cp .env.example .env
# Edit .env with your AWS account ID and Keycloak URLThe .env file is gitignored and contains:
- AWS account ID
- Keycloak server URL and realm
- OIDC client ID
Authentication scripts automatically load .env if present.
See Fargate Deployment section below for Terraform deployment steps.
rosa-boundary/
├── .env.example # Environment configuration template (copy to .env)
├── Containerfile # Multi-arch container build
├── entrypoint.sh # Runtime initialization and signal handling
├── skel/ # Skeleton files for container users
│ └── sre/.claude/ # Claude Code configuration templates
├── deploy/
│ ├── regional/ # Terraform: ECS, EFS, S3, Lambda, OIDC
│ │ ├── *.tf # Infrastructure definitions
│ │ ├── examples/ # Manual lifecycle scripts
│ │ └── README.md # Deployment guide
│ └── keycloak/ # Terraform: Keycloak realm and clients
├── lambda/
│ └── create-investigation/ # Lambda function for OIDC-authenticated creation
│ ├── handler.py # Group auth, role creation, task tagging
│ └── Makefile # Build Lambda package
├── tools/
│ ├── sre-auth/ # OIDC authentication scripts
│ │ ├── get-oidc-token.sh # Keycloak PKCE flow
│ │ ├── assume-role.sh # AWS role assumption
│ │ └── README.md # Auth documentation
│ └── create-investigation-lambda.sh # End-to-end creation wrapper
├── tests/
│ └── localstack/ # LocalStack integration tests
│ ├── compose.yml # LocalStack Pro + mock OIDC
│ └── integration/ # AWS service tests
├── docs/ # Architecture and implementation docs
└── .github/workflows/ # CI/CD automation
make allmake build-amd64 # Build x86_64 only
make build-arm64 # Build ARM64 onlymake manifest # Combines both architecturesmake clean # Remove all images and manifestsThe easiest way to select tool versions is via environment variables at container startup:
| Variable | Values | Default | Description |
|---|---|---|---|
OC_VERSION |
4.14, 4.15, 4.16, 4.17, 4.18, 4.19, 4.20 |
4.20 |
OpenShift CLI version |
AWS_CLI |
fedora, official |
official |
AWS CLI source |
S3_AUDIT_ESCROW |
S3 URI (e.g., s3://bucket/path/) |
(none) | S3 destination for /home/sre sync on exit |
CLAUDE_CODE_USE_BEDROCK |
0, 1 |
1 |
Enable Claude Code via Amazon Bedrock |
AWS_REGION |
AWS region code | (auto-detect) | AWS region for Bedrock. Auto-detected from ECS metadata; fallback to us-east-1 |
ANTHROPIC_MODEL |
Bedrock model ID | (default) | Override Claude model (e.g., global.anthropic.claude-sonnet-4-5-20250929-v1:0) |
Examples:
# Use OpenShift CLI 4.18
podman run -e OC_VERSION=4.18 rosa-boundary:latest
# Use Fedora's AWS CLI
podman run -e AWS_CLI=fedora rosa-boundary:latest
# Use both together
podman run -e OC_VERSION=4.17 -e AWS_CLI=fedora rosa-boundary:latest
# With a custom command
podman run -e OC_VERSION=4.19 rosa-boundary:latest /bin/bashThe container includes a non-root sre user (uid=1000) designed for SSM/ECS Exec connections. The /home/sre directory is intended to be mounted as EFS via Fargate task definition.
When the container receives termination signals (SIGTERM, SIGINT, SIGHUP) or exits normally, the entrypoint automatically syncs /home/sre to S3 if S3_AUDIT_ESCROW is set:
# Container will sync /home/sre to S3 on exit
podman run -e S3_AUDIT_ESCROW=s3://my-bucket/investigation-123/ rosa-boundary:latestFeatures:
- Automatic sync on container exit or termination signals
- Graceful failure - warns but doesn't block exit if sync fails
- Only syncs if
S3_AUDIT_ESCROWis defined (no sync if unset) - Useful for preserving investigation artifacts after ephemeral container use
The container supports two methods for switching tool versions:
- Environment Variables (recommended): Set
OC_VERSIONorAWS_CLIat container startup (see above) - Alternatives Commands (advanced): Manually switch versions inside a running container
The container includes two AWS CLI versions managed with alternatives:
- fedora (priority 10): Fedora RPM package
- aws-official (priority 20): Official AWS CLI v2 (default)
# View current AWS CLI configuration
alternatives --display aws
# Switch to Fedora version
alternatives --set aws /usr/bin/aws
# Switch to official version
alternatives --set aws /opt/aws-cli-official/v2/current/bin/awsSeven OpenShift CLI versions are available (4.14-4.20), with 4.20 as the default:
# View available oc versions
alternatives --display oc
# Switch to a specific version
alternatives --set oc /opt/openshift/4.17/oc
alternatives --set oc /opt/openshift/4.19/ocThe container includes Claude Code CLI with Amazon Bedrock integration for AI-assisted troubleshooting and automation.
Location: /home/sre/.claude/
Default configuration files are automatically initialized on first run:
settings.json- Bedrock authentication and auto-update settingsCLAUDE.md- SRE workflow guidance and available tools documentation
Authentication: Uses IAM via Amazon Bedrock (no API keys required)
Claude Code automatically detects the AWS region from ECS task metadata:
- Checks if
AWS_REGIONenvironment variable is set (explicit override) - Queries ECS metadata endpoint to extract region from Task ARN
- Falls back to
us-east-1if detection fails
This ensures Claude Code uses Bedrock in the same region as the running container.
The ECS task role needs Bedrock permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream",
"bedrock:ListInferenceProfiles"
],
"Resource": [
"arn:aws:bedrock:*:*:inference-profile/*",
"arn:aws:bedrock:*:*:foundation-model/*"
]
}
]
}# Start Claude Code session
claude
# Get help with a command
claude "How do I check the status of cluster operators?"
# Run interactive investigation
claude "Investigate pods in crashloop in default namespace"
# Disable Claude Code via environment variable
podman run -e CLAUDE_CODE_USE_BEDROCK=0 rosa-boundary:latestConfiguration files in /home/sre/.claude/ are preserved across container restarts when using EFS:
- First run: Skeleton files copied from
/etc/skel-sre/.claude/ - Subsequent runs: Existing configuration preserved (no overwrite)
- Customize: Edit
/home/sre/.claude/CLAUDE.mdto add cluster-specific context
The tools/sre-auth/ directory contains OIDC authentication scripts for AWS federation:
Obtains OIDC ID token from Keycloak via browser-based PKCE flow:
# Get token (uses cache if < 4 minutes old)
TOKEN=$(./tools/sre-auth/get-oidc-token.sh)
# Force fresh authentication
TOKEN=$(./tools/sre-auth/get-oidc-token.sh --force)Features:
- Browser-based authentication with Keycloak
- 4-minute token caching (avoids repeated popups)
- PKCE for secure public client flow
Assumes AWS IAM role using OIDC web identity:
# Assume role created by Lambda
eval $(./tools/sre-auth/assume-role.sh --role arn:aws:iam::123456789012:role/rosa-boundary-user-abc123)
# Verify identity
aws sts get-caller-identityFeatures:
- Uses get-oidc-token.sh internally
- Returns bash export statements for credentials
- Credentials valid for 1 hour
See tools/sre-auth/README.md for detailed documentation.
# Run with default versions (OC 4.20, official AWS CLI)
podman run -it rosa-boundary:latest /bin/bash
# Run with specific versions
podman run -it -e OC_VERSION=4.18 -e AWS_CLI=fedora rosa-boundary:latest /bin/bash
# Check tool versions
podman run --rm rosa-boundary:latest sh -c "aws --version && oc version --client"This container is designed to run as an AWS Fargate task with ECS Exec for remote access. Two deployment approaches are supported:
OIDC-authenticated Lambda function that creates per-user IAM roles with tag-based authorization:
cd deploy/regional/
# Deploy infrastructure (includes Lambda function)
terraform init
terraform apply
# Create investigation with OIDC authentication
cd ../tools/
./create-investigation-lambda.sh rosa-boundary-dev inv-12345
# Script will:
# 1. Authenticate via Keycloak (browser popup)
# 2. Call Lambda to create investigation (role + task)
# 3. Assume OIDC role with tag-based permissions
# 4. Wait for task to be running
# 5. Provide ECS Exec connection commandFeatures:
- Group-based authorization (requires
sre-teammembership) - Per-user IAM roles with tag-based task isolation
- Automatic role creation on first use
- Token caching (4 minutes) to avoid repeated browser authentication
See tools/sre-auth/README.md for authentication details and docs/LAMBDA_AUTH_SUMMARY.md for architecture.
Lower-level scripts for manual investigation management:
cd deploy/regional/examples/
# Create investigation (access point + task definition)
./create_investigation.sh <cluster-id> <investigation-id> [oc-version]
# Launch task
./launch_task.sh <task-family>
# Connect to task
./join_task.sh <task-id>
# Stop task (triggers S3 sync)
./stop_task.sh <task-id>
# Cleanup investigation
./close_investigation.sh <task-family> <access-point-id>See deploy/regional/README.md for complete documentation.
For manual deployment without Terraform:
- Push container image to ECR or container registry
- Create ECS cluster with Container Insights
- Create EFS filesystem with access points per investigation
- Create IAM roles with Bedrock, S3, and ECS Exec permissions
- Create task definition with EFS mount and environment variables
- Launch tasks with
--enable-execute-command
The container runs sleep infinity by default. On termination, it syncs /home/sre to S3 if configured.
- Base: Fedora 43
- AWS CLI: v2.32.16+ (official), v2.27.0 (Fedora RPM)
- OpenShift CLI: 4.14.x, 4.15.x, 4.16.x, 4.17.x, 4.18.x, 4.19.x, 4.20.x
- Claude Code: 2.0.69 (native installer), auto-updates disabled
- Additional tools: util-linux (includes su for user switching)
The manifest list automatically selects the appropriate image for your platform:
linux/amd64- x86_64 architecturelinux/arm64- ARM64/aarch64 architecture (Graviton)
Test AWS functionality locally before production deployment:
# Start LocalStack (requires LocalStack Pro token)
make localstack-up
# Run fast tests (~2-3 min)
make test-localstack-fast
# Run full test suite (~5-7 min)
make test-localstack
# Stop LocalStack
make localstack-downSee tests/localstack/README.md for complete documentation.
cd lambda/create-investigation/
make testGitHub Actions workflow runs on PRs and pushes to main:
- LocalStack Integration Tests - AWS service validation
- Lambda Unit Tests - Handler function validation with moto
Required GitHub Secret: LOCALSTACK_AUTH_TOKEN (LocalStack Pro license)