backend/s3: optimize StateMgr workspace existence check #38214

Open
joewrightfd wants to merge 2 commits into hashicorp:main from joewrightfd:handle_large_s3_buckets

@joewrightfd joewrightfd commented Feb 25, 2026

This is a performance improvement for terraform init with the S3 backend when the bucket contains a large number (10k+) of workspaces.

Issue: #33137

Optimization

Replace O(n) ListObjectsV2 pagination with O(1) HeadObject call when checking if a workspace state file exists in StateMgr.

Previously, StateMgr() called Workspaces(), which listed all objects matching the workspace prefix (env:) and then linearly searched the results. ListObjectsV2 returns at most 1,000 keys per page, so with 15,000 workspaces this meant 15 API calls and 15,000 string comparisons just to check whether one workspace exists.

Now we call HeadObject directly on the specific state file path since we have enough information to know the exact path we need.
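For illustration, the same existence check can be sketched from the command line with the AWS CLI. This is not the code in this PR: `workspace_exists` is a hypothetical helper, and it assumes the default `env:` workspace prefix and the bucket layout used in the reproduction steps.

```shell
# Sketch only: workspace_exists is a hypothetical helper, not Terraform code.
# Assumes the default workspace prefix "env:" shown in this PR's examples.
workspace_exists() {
  bucket="$1"     # e.g. terraform-perf-test-bucket
  workspace="$2"  # e.g. workspace9999
  key="$3"        # backend "key" setting, e.g. terraform.tfstate

  # One HeadObject request; the CLI exits non-zero if the object is missing.
  # Other failures (403, network errors) also exit non-zero, and this
  # sketch does not tell them apart.
  aws s3api head-object \
    --bucket "$bucket" \
    --key "env:/${workspace}/${key}" >/dev/null 2>&1
}
```

A call such as `workspace_exists terraform-perf-test-bucket workspace9999 terraform.tfstate` issues a single request no matter how many other workspaces share the bucket, which is the property the change relies on.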

Benchmark

| Workspaces | Before (ListObjectsV2) | After (HeadObject) | Improvement |
|---|---|---|---|
| 15,000 | 14.4s avg (15 API calls) | ~12s avg (1 API call) | 2.4s faster (17%) |
| 32,000 | 21.1s avg (32 API calls) | ~12s avg (1 API call) | 9.1s faster (43%) |

The "After" time stays roughly constant (~12s) regardless of workspace count, while the "Before" time grows linearly with the number of ListObjectsV2 pages, so the time saved grows with workspace count.
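The API call counts in the benchmark follow from S3's page size. As a rough sketch, assuming one state object per workspace and the ListObjectsV2 maximum of 1,000 keys per response, the number of list requests is the workspace count divided by 1,000, rounded up (`list_pages` is a name made up for this example):

```shell
# Number of ListObjectsV2 requests needed to page through N keys,
# assuming the S3 maximum of 1,000 keys per response.
list_pages() {
  echo $(( ($1 + 999) / 1000 ))
}

list_pages 15000   # prints 15
list_pages 32000   # prints 32
```

The HeadObject path always issues one request, so the gap between the two approaches widens by one request per additional 1,000 workspaces.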

Steps to Reproduce

# 1. Create test S3 bucket
aws s3 mb s3://terraform-perf-test-bucket

# 2. Create initial state file
echo '{"version":4,"terraform_version":"1.5.0","serial":1,"lineage":"test","outputs":{},"resources":[]}' > /tmp/test.tfstate
aws s3 cp /tmp/test.tfstate s3://terraform-perf-test-bucket/env:/workspace1/terraform.tfstate

# 3. Populate with many workspace state files (adjust count as needed)
export AWS_PAGER=""
seq 2 15000 | xargs -P 100 -I {} aws s3 cp \
  s3://terraform-perf-test-bucket/env:/workspace1/terraform.tfstate \
  s3://terraform-perf-test-bucket/env:/workspace{}/terraform.tfstate

# 4. Create test terraform config
mkdir tf-perf-test && cd tf-perf-test
cat > main.tf << 'EOF'
terraform {
  backend "s3" {
    bucket = "terraform-perf-test-bucket"
    key    = "terraform.tfstate"
    region = "us-east-1"
  }
}
EOF

# 5. Benchmark
export TF_WORKSPACE=workspace9999
rm -f .terraform/terraform.tfstate && time terraform init

Target Release

1.15.x

Rollback Plan

  • If a change needs to be reverted, we will roll out an update to the code within 7 days.

Changes to Security Controls

Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.

None.

CHANGELOG entry

  • This change is user-facing and I added a changelog entry.
  • This change is not user-facing.

Replace O(n) ListObjectsV2 pagination with O(1) HeadObject call when
checking if a workspace state file exists in StateMgr.

Previously, StateMgr called Workspaces() which listed ALL objects
matching the workspace prefix, then linearly searched the results.
With 15,000 workspaces this meant 15 API calls and 15,000 string
comparisons just to check if one workspace exists.

Now we call HeadObject directly on the specific state file path.

Benchmark with 15,000 workspace objects:
- Before: 14.4s avg (15 paginated ListObjectsV2 calls)
- After:  10.9s avg (1 HeadObject call)
- Improvement: 24-33% faster
@joewrightfd joewrightfd requested review from a team as code owners February 25, 2026 23:38
hashicorp-cla-app bot commented Feb 25, 2026

CLA assistant check
All committers have signed the CLA.

crw (Contributor) commented Feb 26, 2026

Thanks for this submission! The S3 backend is maintained by the AWS Provider team at HashiCorp, an IBM company. They review PRs infrequently, as their schedule allows. Please expect any updates to come from someone on that team. Thanks!

Development

Successfully merging this pull request may close these issues.

S3 backend directory structure is inefficient with large numbers of projects and workspaces