
fix(s3): skip hydrate calls for buckets outside name/region qualifiers#2740

Open
rmnjaat wants to merge 1 commit into turbot:main from
rmnjaat:fix/aws-s3-bucket-skip-hydrate-on-name-region-filter
rmnjaat commented Apr 29, 2026

Problem

Querying aws_s3_bucket with WHERE region = 'x' or WHERE name = 'x'
triggered all 14 hydrate API calls (HeadBucket plus 13 downstream
hydrates) on every bucket globally before any filtering occurred.

For an account with 276 buckets and WHERE region = 'ap-southeast-2'
(zero matching buckets), this caused ~3,864 API calls (276 × 14) and
queries that ran for more than 12 minutes and timed out, even though
they returned zero results.

Fixes #2737

Root Cause

  • ListBuckets always returns all buckets globally (unavoidable)
  • getBucketRegion (HeadBucket) ran for every bucket regardless of filters
  • All 13 downstream hydrates ran for every bucket regardless of filters
  • Region/name filtering only happened at the SQL engine level, after all API calls completed

Solution

Two early-exit gates are inserted at the earliest possible stage:

Gate 1 — Name filter (inside listS3Buckets, before StreamListItem)

name is available directly from ListBuckets. Buckets not matching a
name = '...' or name IN (...) qualifier are skipped before
d.StreamListItem, so they never enter the hydrate pipeline.

Saves: HeadBucket + 13 downstream API calls per non-matching bucket.
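A minimal sketch of the Gate 1 loop, assuming the name qualifier has already been reduced to a set of bucket names. `nameQualifier`, `streamMatchingBuckets` and the `stream` callback are hypothetical names (the callback stands in for `d.StreamListItem`); this is an illustration of the idea, not the PR's actual code.

```go
package main

import "fmt"

// nameQualifier is a hypothetical, simplified view of the name qualifier:
// nil means "no name = / IN qualifier", so every bucket is streamed.
// In the plugin this set would be built from d.Quals.
type nameQualifier map[string]bool

// streamMatchingBuckets mirrors the Gate 1 loop: names come straight from
// ListBuckets, and only matching names are passed to stream, which stands
// in for d.StreamListItem. Skipped buckets never reach any hydrate function.
func streamMatchingBuckets(listed []string, wanted nameQualifier, stream func(string)) {
	for _, name := range listed {
		if wanted != nil && !wanted[name] {
			continue // no HeadBucket and no downstream hydrates for this bucket
		}
		stream(name)
	}
}

func main() {
	listed := []string{"my-app-assets-prod", "my-app-logs-dev", "my-db-backups"}
	var streamed []string
	streamMatchingBuckets(listed, nameQualifier{"my-app-assets-prod": true}, func(n string) {
		streamed = append(streamed, n)
	})
	fmt.Println(streamed) // [my-app-assets-prod]
}
```

Because the check only needs the bucket name, it costs nothing beyond the ListBuckets call that already has to happen.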

Gate 2 — Region filter (inside doGetBucketRegion, after HeadBucket)

After HeadBucket resolves the actual bucket region, buckets outside the
requested region return nil, nil from getBucketRegion. All 13
dependent hydrate functions have a nil guard at the top and exit
immediately without making any API calls.

Saves: 13 API calls per non-matching bucket (HeadBucket is unavoidable
as region is unknown before it runs).
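The Gate 2 contract can be sketched as follows. `bucketRegion`, the `headBucket` callback (standing in for the HeadBucket request) and `getBucketVersioning` are hypothetical simplifications, not the plugin's real signatures; the point is the `nil, nil` return and the nil guard that short-circuits a dependent hydrate.

```go
package main

import "fmt"

// bucketRegion is a hypothetical stand-in for the value getBucketRegion hands
// to dependent hydrates; a nil *bucketRegion means "outside the requested region".
type bucketRegion struct {
	Name, Region string
}

// getBucketRegion sketches Gate 2. headBucket stands in for the HeadBucket
// call, which always runs because the region is unknown until it returns.
// wanted is the set of requested regions, or nil when there is no region qualifier.
func getBucketRegion(name string, headBucket func(string) (string, error), wanted map[string]bool) (*bucketRegion, error) {
	region, err := headBucket(name)
	if err != nil {
		return nil, err
	}
	if wanted != nil && !wanted[region] {
		return nil, nil // not an error: the bucket is simply filtered out
	}
	return &bucketRegion{Name: name, Region: region}, nil
}

// getBucketVersioning is a hypothetical dependent hydrate showing the nil
// guard that each of the 13 downstream hydrates gets at the top.
func getBucketVersioning(br *bucketRegion, apiCalls *int) (*string, error) {
	if br == nil {
		return nil, nil // skip the API call entirely
	}
	*apiCalls++ // stands in for the real GetBucketVersioning request
	status := "Enabled"
	return &status, nil
}

func main() {
	regions := map[string]string{"my-db-backups": "us-east-1", "my-eu-temp-logs": "eu-west-1"}
	head := func(n string) (string, error) { return regions[n], nil }
	wanted := map[string]bool{"eu-west-1": true}

	calls := 0
	for _, name := range []string{"my-db-backups", "my-eu-temp-logs"} {
		br, err := getBucketRegion(name, head, wanted)
		if err != nil {
			panic(err)
		}
		getBucketVersioning(br, &calls)
	}
	fmt.Println("downstream calls:", calls) // downstream calls: 1
}
```

Since the row has already been streamed at this point, it is presumably still removed by the SQL engine's own region predicate; the gate only avoids the API calls.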

Both gates handle the = and IN operators by looping over d.Quals. Other
operators (e.g. LIKE, !=) fall back to full hydration. The region cache
path applies the same qualifier check, so warm-cache queries are
optimised equally.
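The qualifier handling shared by both gates might look like this in simplified form. `qual` is a hypothetical flat model: it uses an explicit `"in"` operator and string values, whereas the real Steampipe SDK stores protobuf qualifier values and may encode IN differently. `allowedValues` and `cachedRegionPasses` (with a plain map standing in for the region cache) are illustrative names only.

```go
package main

import "fmt"

// qual is a hypothetical, simplified stand-in for one entry in d.Quals.
type qual struct {
	Column   string
	Operator string   // "=" or "in" narrow the set; other operators are ignored
	Values   []string // a single value for "=", the list for "in"
}

// allowedValues returns the values a column is restricted to. ok is false when
// no "=" or "in" qualifier exists for the column, meaning: hydrate everything.
// Several usable qualifiers on the same column are intersected.
func allowedValues(quals []qual, column string) (allowed map[string]bool, ok bool) {
	for _, q := range quals {
		if q.Column != column || (q.Operator != "=" && q.Operator != "in") {
			continue // e.g. LIKE or != does not narrow the set: full hydration
		}
		next := make(map[string]bool, len(q.Values))
		for _, v := range q.Values {
			if allowed == nil || allowed[v] {
				next[v] = true
			}
		}
		allowed = next
	}
	return allowed, allowed != nil
}

// cachedRegionPasses sketches the warm-cache path: a region found in the
// (hypothetical) cache is checked against the same qualifier set, so a cache
// hit can skip the downstream hydrates just like a fresh HeadBucket result.
func cachedRegionPasses(cache map[string]string, bucket string, quals []qual) (region string, hit, pass bool) {
	region, hit = cache[bucket]
	if !hit {
		return "", false, false // caller falls back to HeadBucket
	}
	allowed, filtered := allowedValues(quals, "region")
	return region, true, !filtered || allowed[region]
}

func main() {
	quals := []qual{
		{Column: "region", Operator: "in", Values: []string{"us-east-1", "eu-west-1"}},
		{Column: "name", Operator: "like", Values: []string{"my-%"}},
	}
	_, nameFiltered := allowedValues(quals, "name")
	fmt.Println(nameFiltered) // false: LIKE falls back to full hydration

	cache := map[string]string{"my-db-backups": "us-east-1", "my-app-assets-prod": "ap-south-1"}
	_, _, pass := cachedRegionPasses(cache, "my-app-assets-prod", quals)
	fmt.Println(pass) // false
}
```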

Performance Impact

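+---------------------------------------------+-----------------------+--------------------------+
| Scenario (276 total buckets)                | Before                | After                    |
+---------------------------------------------+-----------------------+--------------------------+
| WHERE region = 'ap-southeast-2' (0 matches) | ~3,864 calls, 12+ min | ~276 calls, ~1.3s        |
| WHERE region = 'us-east-1' (21 matches)     | ~3,864 calls          | ~276 + 273 = 549 calls   |
| WHERE name = 'my-bucket' (1 match)          | ~3,864 calls          | ~15 calls, ~0.2s         |
| No filter                                   | ~3,864 calls          | ~3,864 calls (unchanged) |
+---------------------------------------------+-----------------------+--------------------------+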
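The call counts above follow from a simple model, sketched below under the assumption that every streamed bucket costs one HeadBucket and only buckets passing both gates cost the 13 downstream calls. The single ListBuckets request is excluded, which is why the name row comes out at 14 rather than ~15. `hydrateCalls` is a hypothetical helper, not part of the plugin.

```go
package main

import "fmt"

// downstreamHydrates is the number of hydrate functions that depend on
// getBucketRegion (the 14 hydrates minus HeadBucket itself).
const downstreamHydrates = 13

// hydrateCalls estimates per-bucket API calls after the change, excluding the
// single ListBuckets request. streamed is the number of buckets that pass
// Gate 1 (all listed buckets when there is no name qualifier); inRegion is
// how many of those pass Gate 2 (all of them when there is no region qualifier).
func hydrateCalls(streamed, inRegion int) int {
	// every streamed bucket pays one HeadBucket; only in-region buckets pay the rest
	return streamed + inRegion*downstreamHydrates
}

func main() {
	fmt.Println(hydrateCalls(276, 0))   // region = 'ap-southeast-2': 276
	fmt.Println(hydrateCalls(276, 21))  // region = 'us-east-1': 549
	fmt.Println(hydrateCalls(1, 1))     // name = 'my-bucket': 14, ~15 with ListBuckets
	fmt.Println(hydrateCalls(276, 276)) // no filter: 3864
}
```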

Notes

Integration test logs

$ go test ./aws/... -v
--- PASS: TestConvertPolicy/32 (0.00s)
--- PASS: TestConvertPolicy/33 (0.00s)
--- PASS: TestConvertPolicy/34 (0.00s)
PASS
ok      github.com/turbot/steampipe-plugin-aws/aws      0.830s

Example query results

-- Test 1: name = filter (1 match from 276 buckets)
> select name, region, versioning_enabled, bucket_policy_is_public, tags
  from aws_s3_bucket
  where name = 'my-app-assets-prod';

+--------------------+------------+--------------------+-------------------------+----------------------------------------------+
| name               | region     | versioning_enabled | bucket_policy_is_public | tags                                         |
+--------------------+------------+--------------------+-------------------------+----------------------------------------------+
| my-app-assets-prod | ap-south-1 | false              | false                   | {"Environment":"Prod","Team":"platform"}     |
+--------------------+------------+--------------------+-------------------------+----------------------------------------------+


-- Test 2: name IN filter (2 matches)
> select name, region, versioning_enabled
  from aws_s3_bucket
  where name in ('my-app-assets-prod', 'my-app-logs-dev');

+--------------------+------------+--------------------+
| name               | region     | versioning_enabled |
+--------------------+------------+--------------------+
| my-app-assets-prod | ap-south-1 | false              |
| my-app-logs-dev    | ap-south-1 | false              |
+--------------------+------------+--------------------+


-- Test 3: region = filter (21 matches)
> select name, region
  from aws_s3_bucket
  where region = 'us-east-1'
  order by name;

+------------------------------+-----------+
| name                         | region    |
+------------------------------+-----------+
| my-alb-access-logs-01        | us-east-1 |
| my-app-cdn-logs              | us-east-1 |
| my-app-cloudfront-01         | us-east-1 |
| my-app-ses-logs              | us-east-1 |
| my-backup-bucket-01          | us-east-1 |
| my-billing-reports           | us-east-1 |
| my-cdn-assets                | us-east-1 |
| my-cloudtrail-logs           | us-east-1 |
| my-cost-reports-main         | us-east-1 |
| my-cur-bucket-123456789012   | us-east-1 |
| my-db-backups                | us-east-1 |
| my-deploy-artifacts          | us-east-1 |
| my-elb-logs-prod             | us-east-1 |
| my-lambda-artifacts          | us-east-1 |
| my-ops-cur-bucket            | us-east-1 |
| my-root-activity-alerts      | us-east-1 |
| my-s3-access-logs-03         | us-east-1 |
| my-serveraccess-logs         | us-east-1 |
| my-static-assets             | us-east-1 |
| my-temp-logs-use1            | us-east-1 |
| my-waf-logs-cloudfront       | us-east-1 |
+------------------------------+-----------+


-- Test 4: region = filter with 0 matches
> select name, region
  from aws_s3_bucket
  where region = 'ap-southeast-2';

+------+--------+
| name | region |
+------+--------+
+------+--------+


-- Test 5: region IN filter (multiple regions, 23 matches)
> select name, region
  from aws_s3_bucket
  where region in ('us-east-1', 'eu-west-1')
  order by region, name;

+------------------------------+-----------+
| name                         | region    |
+------------------------------+-----------+
| my-eu-app-uploads            | eu-west-1 |
| my-eu-temp-logs              | eu-west-1 |
| my-alb-access-logs-01        | us-east-1 |
| my-app-cdn-logs              | us-east-1 |
| my-app-cloudfront-01         | us-east-1 |
| my-app-ses-logs              | us-east-1 |
| my-backup-bucket-01          | us-east-1 |
| my-billing-reports           | us-east-1 |
| my-cdn-assets                | us-east-1 |
| my-cloudtrail-logs           | us-east-1 |
| my-cost-reports-main         | us-east-1 |
| my-cur-bucket-123456789012   | us-east-1 |
| my-db-backups                | us-east-1 |
| my-deploy-artifacts          | us-east-1 |
| my-elb-logs-prod             | us-east-1 |
| my-lambda-artifacts          | us-east-1 |
| my-ops-cur-bucket            | us-east-1 |
| my-root-activity-alerts      | us-east-1 |
| my-s3-access-logs-03         | us-east-1 |
| my-serveraccess-logs         | us-east-1 |
| my-static-assets             | us-east-1 |
| my-temp-logs-use1            | us-east-1 |
| my-waf-logs-cloudfront       | us-east-1 |
+------------------------------+-----------+


-- Test 6: no filter — baseline unchanged (276 total buckets)
> select count(*) as total from aws_s3_bucket;

+-------+
| total |
+-------+
| 276   |
+-------+

Querying aws_s3_bucket with a WHERE region or WHERE name filter was
triggering all 14 hydrate API calls for every bucket globally before
any filtering occurred. For accounts with many buckets (e.g. 272),
this caused ~3,800 API calls and query timeouts exceeding 12 minutes.

Two early-exit gates are introduced:

1. Name filter (listS3Buckets): buckets whose name does not match a
   name = '...' or name IN (...) qualifier are skipped before
   d.StreamListItem, so they never enter the hydrate pipeline at all.
   This saves HeadBucket + all 13 downstream API calls per bucket.

2. Region filter (doGetBucketRegion / getBucketRegion): after
   HeadBucket resolves the bucket's actual region, buckets outside
   the requested region return nil, nil, causing all 13 dependent
   hydrate functions to exit immediately via a nil guard added at
   the top of each function.

Both filters handle = and IN operators via d.Quals loops. Other
operators (LIKE, !=) fall back to full hydration.

The region cache check also applies the qual filter so warm-cache
paths are equally optimised.

Fixes turbot#2737

Development

Successfully merging this pull request may close these issues.

Optimize aws_s3_bucket: Skip hydrate calls for buckets outside requested region qualifier
