A comprehensive Terraform module for managing AWS S3 inventory configurations, including automated inventory reports, Glue catalog integration, and Athena querying capabilities.
The variable var.union_view_name has been renamed to var.union_all_view_name
If you were using var.union_view_name, then use var.union_all_view_name instead.
This module no longer creates the S3 inventory bucket or Glue database
If you relied on the default behavior (create_inventory_bucket = true and create_inventory_database = true), note that the module no longer creates these resources. They must now be created externally and passed to the module. You must:
- Create the S3 bucket externally before calling this module
- Create the Glue database externally before calling this module
- Remove the following variables from your module configuration (they no longer exist):
  - create_inventory_bucket
  - create_inventory_database
  - apply_default_inventory_lifecyle_rules
  - inventory_bucket_lifecycle_rules
  - inventory_bucket_object_lock_retention_days
  - inventory_bucket_object_lock_mode
  - inventory_bucket_encryption_config
See the examples for updated usage patterns.
This module no longer creates additional S3 resources
The following resources are no longer created by the module because it no longer creates the S3 bucket. These resources should be created externally (if required):
- Bucket lifecycle rules
- Bucket encryption
- Bucket object lock configuration
This change gives users more flexibility to manage the S3 bucket within their own environment.
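As a rough illustration of managing these resources outside the module, the sketch below attaches default encryption and a lifecycle rule to an externally created inventory bucket. The bucket name, the rule ID, the choice of SSE-S3, and the 90-day expiration are all assumptions chosen for the example, not values taken from this module.

```hcl
# Externally managed inventory bucket (name is a placeholder).
resource "aws_s3_bucket" "inventory" {
  bucket = "my-company-s3-inventory"
}

# Default encryption for inventory files (SSE-S3 assumed here).
resource "aws_s3_bucket_server_side_encryption_configuration" "inventory" {
  bucket = aws_s3_bucket.inventory.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

# Expire old inventory files; the 90-day retention is an illustrative value.
resource "aws_s3_bucket_lifecycle_configuration" "inventory" {
  bucket = aws_s3_bucket.inventory.id

  rule {
    id     = "expire-old-inventory-reports"
    status = "Enabled"

    # Empty filter: the rule applies to every object in the bucket.
    filter {}

    expiration {
      days = 90
    }
  }
}
```

An object lock configuration, if required, would be added in the same way with its own resource.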
- S3 Inventory Management: Creates and configures S3 inventory reports for multiple source buckets
- Glue Catalog Integration: Sets up Glue tables for querying inventory data (database must be provided)
- Union All View: Optional view that unions ALL inventory partitions from all buckets (complete historical data)
- Union Latest View: Optional view that unions only the LATEST partition from each bucket (current state, more efficient)
- Security & Compliance: Optional default bucket policy and configurable LakeFormation permissions
- Flexible Architecture: Bring your own S3 bucket and Glue database with custom configurations
Many features are optional and can be enabled/disabled as required.
Important: S3 inventory only works if the inventory bucket has a bucket policy that allows the AWS S3 service to write inventory files.
The module provides two approaches:
- Default (Recommended): The module automatically generates and attaches the required bucket policy
  - Set attach_bucket_policy = true (this is the default)
  - The module handles everything for you
- Custom Policy: If you need additional policy statements beyond the default
  - You could provide them using additional_bucket_policy_statements, and the module will include them, or:
  - Set attach_bucket_policy = false
  - Use the required_bucket_policy output to get the required policy
  - Merge it with your custom statements using source_policy_documents
  - Apply the combined policy yourself
Note: Only one bucket policy can exist per S3 bucket. For S3 inventory to be able to write to the destination bucket, the bucket policy must include the module's required policy statements.
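The custom policy approach could look roughly like the sketch below. It assumes that aws_s3_bucket.inventory and aws_glue_catalog_database.inventory are defined elsewhere in your configuration, and that the required_bucket_policy output is a JSON string (as its description suggests). The DenyInsecureTransport statement is only an example of a custom statement.

```hcl
module "s3_inventory" {
  source  = "cloudandthings/terraform-aws-s3-inventory/aws"
  version = "~> 2.0"

  inventory_bucket_name   = aws_s3_bucket.inventory.bucket
  inventory_database_name = aws_glue_catalog_database.inventory.name
  source_bucket_names     = ["my-app-data-bucket"]

  # Do not let the module attach a policy; it is attached below instead.
  attach_bucket_policy = false
}

data "aws_iam_policy_document" "inventory_bucket" {
  # Start from the statements S3 inventory needs in order to write reports.
  source_policy_documents = [module.s3_inventory.required_bucket_policy]

  # Example custom statement (an assumption, not part of the module).
  statement {
    sid     = "DenyInsecureTransport"
    effect  = "Deny"
    actions = ["s3:*"]

    resources = [
      aws_s3_bucket.inventory.arn,
      "${aws_s3_bucket.inventory.arn}/*",
    ]

    principals {
      type        = "*"
      identifiers = ["*"]
    }

    condition {
      test     = "Bool"
      variable = "aws:SecureTransport"
      values   = ["false"]
    }
  }
}

# The single bucket policy, containing both required and custom statements.
resource "aws_s3_bucket_policy" "inventory" {
  bucket = aws_s3_bucket.inventory.id
  policy = data.aws_iam_policy_document.inventory_bucket.json
}
```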
# First, create the S3 bucket for inventory storage
resource "aws_s3_bucket" "inventory" {
  bucket = "my-company-s3-inventory"
}
# Create the Glue database for inventory tables
resource "aws_glue_catalog_database" "inventory" {
  name = "s3_inventory_db"
}
# Now configure the S3 inventory module
module "s3_inventory" {
  source  = "cloudandthings/terraform-aws-s3-inventory/aws"
  version = "~> 2.0"
  # Required: Reference the externally created resources
  inventory_bucket_name   = aws_s3_bucket.inventory.bucket
  inventory_database_name = aws_glue_catalog_database.inventory.name
  # Source buckets to inventory
  source_bucket_names = [
    "my-app-data-bucket",
    "my-logs-bucket",
    "my-backup-bucket"
  ]
  # Optional: Union all view - all inventory partitions (complete historical data)
  union_all_view_name = "union_all_view"
  # Optional: Union latest view - latest partition only (current state, more efficient)
  union_latest_view_name = "union_latest_view"
  # Optional: Add LakeFormation permissions
  # database_admin_principals = [...]
  # database_read_principals = [...]
  # By default, the module will attach the required bucket policy automatically
  # attach_bucket_policy = true # This is the default
}

See examples dropdown on Terraform Cloud, or browse the GitHub repo.
Once deployed, you can query your S3 inventory data using Amazon Athena.
Each source bucket has its own Glue table with all historical partitions:
SELECT bucket, key, size, last_modified_date, storage_class
FROM s3_inventory_db.my_app_data_bucket
WHERE dt = '2024-08-29-00-00'
ORDER BY size DESC
LIMIT 100;

Recommended for most use cases - the union latest view queries only the most recent partition from each bucket:
-- Get current object count and total size per bucket
SELECT bucket,
COUNT(*) as object_count,
SUM(size) as total_size,
AVG(size) as avg_size
FROM s3_inventory_db.union_latest_view
GROUP BY bucket
ORDER BY total_size DESC;
-- Find largest objects across all buckets
SELECT bucket, key, size, storage_class, last_modified_date
FROM s3_inventory_db.union_latest_view
ORDER BY size DESC
LIMIT 100;

The union latest view is designed for performance and queries yesterday's data rather than dynamically finding the maximum partition for each bucket. This means:
- Only yesterday's inventory data is included - The view filters for partitions from yesterday (date_add('day', -1, CURRENT_DATE))
- Stale inventories won't appear - If a bucket's inventory hasn't run recently, it won't show up in the latest view
- For stale data, use the union all view - To see historical or stale inventories, query the union all view which includes all partitions
Why this design? Querying the maximum partition of a projected table in Amazon Athena is inefficient and can result in slow query performance and higher costs. By using yesterday's data, the view provides a fast, cost-effective way to get a recent snapshot of your S3 inventory across all buckets.
If you need to access all inventory data regardless of age, use the union all view instead (see below).
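To spot buckets whose inventory has gone stale, one option is to save an occasional diagnostic query against the union all view. The sketch below uses the aws_athena_named_query resource. The query name is hypothetical, and the database and view names are assumed to match the examples in this README. Because this query reads every partition, it is slower and more expensive than the latest view and is meant for ad hoc checks only.

```hcl
# Hypothetical saved query: most recent inventory partition per bucket.
resource "aws_athena_named_query" "latest_partition_per_bucket" {
  name     = "s3-inventory-latest-partition-per-bucket"
  database = "s3_inventory_db"

  # dt values sort chronologically as strings (YYYY-MM-DD-HH-MM),
  # so MAX(dt) returns the newest partition for each bucket.
  query = <<-SQL
    SELECT bucket, MAX(dt) AS latest_dt
    FROM s3_inventory_db.union_all_view
    GROUP BY bucket
    ORDER BY latest_dt
  SQL
}
```

Buckets whose latest_dt is older than yesterday are the ones missing from the union latest view.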
The union all view includes all inventory partitions from all buckets - use this for trend analysis:
-- Track storage growth over time
SELECT dt, bucket, COUNT(*) as object_count, SUM(size) as total_size
FROM s3_inventory_db.union_all_view
WHERE dt >= '2024-08-01-00-00'
GROUP BY dt, bucket
ORDER BY dt DESC, total_size DESC;

- Use the "latest inventories" view for current state queries - the view configured via var.union_latest_view_name; this scans only the most recent partition per bucket (faster and cheaper)
- Use the "all inventories" view for trend analysis - the view configured via var.union_all_view_name; this includes all historical partitions when you need time-series data
- Query individual bucket tables when you are working with a single bucket and need partition filtering
- Always use column projection - select only needed columns instead of SELECT *
- Apply partition filters on individual tables: WHERE dt >= 'YYYY-MM-DD-HH-MM'
As of 2025, Amazon Athena does not properly support dynamic range projection with the S3 inventory partitioning scheme. When using a dynamic range like "NOW-3MONTHS,NOW" with this module, the Glue tables will return zero rows.
To work around this limitation, this Terraform module defaults to using the beginning of the previous year as the start date. The year is calculated based on when the Terraform plan runs. For example, if today is 2025-08-25, the date range defaults to "2024-01-01-00-00,NOW".
Important: This approach causes Terraform state drift annually when the year changes.
To avoid state drift, provide a fixed start date for partition projection, such as:
athena_projection_dt_range = "2025-08-01-00-00,NOW"
Choose your start date based on either:
- Your specific requirements
- The date when your S3 inventories were first deployed
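To make the drift concrete, the sketch below shows one way such a default could be derived. This is an assumption for illustration, not necessarily how the module implements it. The local names are hypothetical, and plantimestamp() requires Terraform 1.5 or later.

```hcl
variable "athena_projection_dt_range" {
  type    = string
  default = null
}

locals {
  # Year of the current plan, minus one (for example 2025 -> 2024).
  previous_year = tonumber(formatdate("YYYY", plantimestamp())) - 1

  # Generated default: start of the previous year up to now.
  default_dt_range = "${local.previous_year}-01-01-00-00,NOW"

  # A user-supplied value wins; otherwise fall back to the generated default.
  effective_dt_range = coalesce(var.athena_projection_dt_range, local.default_dt_range)
}

output "effective_dt_range" {
  value = local.effective_dt_range
}
```

Because the generated default depends on the plan date, its value changes on the first plan of a new year. A fixed athena_projection_dt_range stays stable.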
This module supports configuring AWS Lake Formation permissions for the Glue database and tables. Use the following variables to grant access:
- database_admin_principals - Principals with full admin access (create, update, delete) to the database and tables
- database_read_principals - Principals with read-only access (query tables, describe metadata)
Important validation rules:
- Each list must not contain duplicate values
- A principal cannot appear in both lists - you must choose either admin or read access, not both
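If your principal lists are assembled from several sources, you may want to surface violations of these rules in your own configuration as well. The following is a minimal caller-side sketch using a check block (Terraform 1.5 or later). The local names and the check name are hypothetical, and this is not the module's own validation logic.

```hcl
locals {
  admin_principals = [
    "arn:aws:iam::123456789012:role/data-engineering-admin",
  ]
  read_principals = [
    "arn:aws:iam::123456789012:role/analytics-team",
    "arn:aws:iam::123456789012:role/reporting-service",
  ]

  # Principals that appear in both lists (should be empty).
  overlapping_principals = setintersection(
    toset(local.admin_principals),
    toset(local.read_principals),
  )
}

check "lakeformation_principals" {
  # Each list must not contain duplicate values.
  assert {
    condition     = length(local.admin_principals) == length(distinct(local.admin_principals))
    error_message = "The admin principals list contains duplicate values."
  }

  assert {
    condition     = length(local.read_principals) == length(distinct(local.read_principals))
    error_message = "The read principals list contains duplicate values."
  }

  # A principal may have admin or read access, not both.
  assert {
    condition     = length(local.overlapping_principals) == 0
    error_message = "A principal appears in both the admin and read lists."
  }
}
```

Failed check assertions are reported as warnings and do not block the plan, so they complement the module's validation without replacing it.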
module "s3_inventory" {
  # ... other configuration ...
  database_admin_principals = [
    "arn:aws:iam::123456789012:role/data-engineering-admin",
  ]
  database_read_principals = [
    "arn:aws:iam::123456789012:role/analytics-team",
    "arn:aws:iam::123456789012:role/reporting-service",
  ]
}

- S3 inventory reports are charged per million objects listed
- Additional S3 storage costs for inventory files
- Athena charges apply when querying the data
- Consider lifecycle rules to manage long-term storage costs
Direct contributions are welcome.
See CONTRIBUTING.md for further information.
This project is licensed under the MIT License - see the LICENSE file for details.
This module was created from terraform-aws-template
| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| additional_bucket_policy_statements | Additional IAM policy statements to include in the bucket policy (will be merged with module's statements) | list(object({ | [] | no |
| athena_projection_dt_range | Date range for Athena partition projection (format: START_DATE,END_DATE). If null then a value will be generated, see README for more information. | string | null | no |
| attach_bucket_policy | Whether module should attach the policy to the inventory bucket. Set to false if: - You want to attach the policy yourself using the s3_bucket_policy_json or s3_bucket_required_policy_json outputs - The bucket already has a policy and you want to merge them yourself - You only want to use this module to generate the policy statements | bool | true | no |
| database_admin_principals | List of principal ARNs that will be allowed to manage (create, update, delete) the Glue database and its tables. Must not contain duplicates or overlap with database_read_principals. | list(string) | [] | no |
| database_read_principals | List of principal ARNs that will be allowed to read from the Glue database (query tables, describe metadata). Must not contain duplicates or overlap with database_admin_principals. | list(string) | [] | no |
| enable_bucket_inventory_configs | Whether to create S3 inventory configurations for the specified buckets | bool | true | no |
| inventory_bucket_name | Name of the S3 inventory bucket | string | n/a | yes |
| inventory_config_encryption | Map containing encryption settings for the S3 inventory configuration. | any | {} | no |
| inventory_config_frequency | Frequency of the S3 inventory report generation | string | "Daily" | no |
| inventory_config_name | Name identifier for the S3 inventory configuration | string | "daily" | no |
| inventory_config_object_versions | Which object versions to include in the inventory report | string | "All" | no |
| inventory_database_name | Name of the S3 inventory Glue database | string | n/a | yes |
| inventory_optional_fields | List of optional fields to include in the S3 inventory report | list(string) | [ | no |
| inventory_tables_description | Description to set on every S3 inventory Glue table. If not provided, a default will be used. | string | null | no |
| source_bucket_names | List of S3 bucket names to create inventory reports for | list(string) | [] | no |
| union_all_view_name | Name for the Athena view that unions ALL inventory partitions from all source buckets (complete historical data) | string | null | no |
| union_latest_view_name | Name for the Athena view that unions the LATEST inventory partition from each source bucket (current state only, more efficient) | string | null | no |
No modules.
| Name | Description |
|---|---|
| athena_projection_dt_range | The value used for projection.dt.range on the Glue table |
| bucket_policy | Complete bucket policy JSON including required statements and any additional statements. Use this to attach the policy yourself when attach_bucket_policy = false |
| required_bucket_policy | Required bucket policy JSON (S3 inventory write permissions only). Use with source_policy_documents to merge with your custom policy |
| union_all_view_name | Name of the created union all view (all partitions from all buckets), if enabled |
| union_latest_view_name | Name of the created union latest view (latest partition from each bucket), if enabled |
| Name | Version |
|---|---|
| aws | ~> 6.0 |
| Name | Version |
|---|---|
| terraform | >= 1.5.7 |
| aws | ~> 6.0 |
| Name | Type |
|---|---|
| aws_glue_catalog_table.s3_inventory | resource |
| aws_glue_catalog_table.union_all_view | resource |
| aws_glue_catalog_table.union_latest_view | resource |
| aws_lakeformation_permissions.inventory_database_admin | resource |
| aws_lakeformation_permissions.inventory_database_read | resource |
| aws_lakeformation_permissions.inventory_tables_admin | resource |
| aws_lakeformation_permissions.inventory_tables_read | resource |
| aws_s3_bucket_inventory.this | resource |
| aws_s3_bucket_policy.this | resource |
| aws_caller_identity.current | data source |
| aws_default_tags.current | data source |
| aws_iam_policy_document.additional | data source |
| aws_iam_policy_document.combined | data source |
| aws_iam_policy_document.required | data source |
| aws_partition.current | data source |
| aws_region.current | data source |