Terraform AWS S3 Inventory Module

A comprehensive Terraform module for managing AWS S3 inventory configurations, including automated inventory reports, Glue catalog integration, and Athena querying capabilities.

⚠️ Breaking Changes in v2.0.0

The variable var.union_view_name has been renamed to var.union_all_view_name

If you were using var.union_view_name, then use var.union_all_view_name instead.

This module no longer creates the S3 inventory bucket or Glue database

If you were using the default behavior (create_inventory_bucket = true and create_inventory_database = true), these resources are no longer created by the module. They must now be created externally and passed to the module. You must:

Create the S3 bucket externally before calling this module
Create the Glue database externally before calling this module
Remove the following variables from your module configuration (they no longer exist):
- create_inventory_bucket
- create_inventory_database
- apply_default_inventory_lifecyle_rules
- inventory_bucket_lifecycle_rules
- inventory_bucket_object_lock_retention_days
- inventory_bucket_object_lock_mode
- inventory_bucket_encryption_config

See the examples for updated usage patterns.

This module no longer creates additional S3 resources

The following resources are no longer created by the module because it no longer creates the S3 bucket. These resources should be created externally (if required):

Bucket lifecycle rules
Bucket encryption
Bucket object lock configuration

This change gives you more flexibility to manage the S3 bucket in your own environment.
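
As a rough sketch of what managing these resources externally might look like, the following assumes an inventory bucket declared as aws_s3_bucket.inventory (as in the Quick Start below). The SSE-S3 algorithm and the 90-day expiration are illustrative values only, not recommendations from this module.

```hcl
# Assumes the inventory bucket is declared elsewhere as aws_s3_bucket.inventory.

# Default encryption for inventory files (SSE-S3 chosen for illustration)
resource "aws_s3_bucket_server_side_encryption_configuration" "inventory" {
  bucket = aws_s3_bucket.inventory.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

# Expire old inventory files (90 days is a hypothetical retention period)
resource "aws_s3_bucket_lifecycle_configuration" "inventory" {
  bucket = aws_s3_bucket.inventory.id

  rule {
    id     = "expire-old-inventory-files"
    status = "Enabled"

    # An empty filter applies the rule to every object in the bucket
    filter {}

    expiration {
      days = 90
    }
  }
}
```

Object lock is left out of this sketch because it also depends on how the bucket itself was created.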

Features

S3 Inventory Management: Creates and configures S3 inventory reports for multiple source buckets
Glue Catalog Integration: Sets up Glue tables for querying inventory data (database must be provided)
Union All View: Optional view that unions ALL inventory partitions from all buckets (complete historical data)
Union Latest View: Optional view that unions only the LATEST partition from each bucket (current state, more efficient)
Security & Compliance: Optional default bucket policy and configurable Lake Formation permissions
Flexible Architecture: Bring your own S3 bucket and Glue database with custom configurations

Many features are optional and can be enabled/disabled as required.

S3 Bucket Policy Requirement

Important: The inventory bucket needs a bucket policy that allows the Amazon S3 service to write inventory files. Without this policy, S3 inventory cannot deliver its reports.

The module provides two approaches:

Default (Recommended): The module automatically generates and attaches the required bucket policy
- Set attach_bucket_policy = true (this is the default)
- The module handles everything for you
Custom Policy: If you need additional policy statements beyond the default, either
- Pass them in additional_bucket_policy_statements and let the module merge and attach them, or
- Manage the policy yourself: set attach_bucket_policy = false, take the required policy from the required_bucket_policy output, merge it with your custom statements using source_policy_documents (on an aws_iam_policy_document data source), and apply the combined policy yourself
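
For the additional_bucket_policy_statements route, a minimal sketch could look like the following. The statement itself (denying non-TLS access) is hypothetical, and the value shapes for Principal, Resource and Condition assume the module accepts IAM JSON-style values, as the capitalized attribute names in the Inputs table suggest. Check the module's examples before relying on this.

```hcl
module "s3_inventory" {
  source  = "cloudandthings/terraform-aws-s3-inventory/aws"
  version = "~> 2.0"

  inventory_bucket_name   = aws_s3_bucket.inventory.bucket
  inventory_database_name = aws_glue_catalog_database.inventory.name
  source_bucket_names     = ["my-app-data-bucket"]

  # attach_bucket_policy defaults to true, so the module attaches the
  # required statements together with the extra statement below.
  additional_bucket_policy_statements = [
    {
      Sid       = "DenyInsecureTransport" # hypothetical example statement
      Effect    = "Deny"
      Principal = "*"                     # assumed IAM JSON-style value
      Action    = "s3:*"
      Resource = [
        aws_s3_bucket.inventory.arn,
        "${aws_s3_bucket.inventory.arn}/*",
      ]
      Condition = {
        Bool = { "aws:SecureTransport" = "false" }
      }
    }
  ]
}
```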

Note: Only one bucket policy can exist per S3 bucket. For S3 inventory to be able to write to the destination bucket, the bucket policy must include the module's required policy statements.
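
If you attach the policy yourself, the merge might be sketched as below. This assumes the bucket and database are declared as in the Quick Start, and the extra statement is a hypothetical example. The required_bucket_policy output is passed to source_policy_documents so that the module's required statements end up in the single policy attached to the bucket.

```hcl
module "s3_inventory" {
  source  = "cloudandthings/terraform-aws-s3-inventory/aws"
  version = "~> 2.0"

  inventory_bucket_name   = aws_s3_bucket.inventory.bucket
  inventory_database_name = aws_glue_catalog_database.inventory.name
  source_bucket_names     = ["my-app-data-bucket"]

  # The module only generates the policy; it does not attach it
  attach_bucket_policy = false
}

data "aws_iam_policy_document" "inventory_bucket" {
  # Start from the statements S3 inventory needs to write its reports
  source_policy_documents = [module.s3_inventory.required_bucket_policy]

  # Hypothetical custom statement: deny requests that do not use TLS
  statement {
    sid     = "DenyInsecureTransport"
    effect  = "Deny"
    actions = ["s3:*"]
    resources = [
      aws_s3_bucket.inventory.arn,
      "${aws_s3_bucket.inventory.arn}/*",
    ]

    principals {
      type        = "*"
      identifiers = ["*"]
    }

    condition {
      test     = "Bool"
      variable = "aws:SecureTransport"
      values   = ["false"]
    }
  }
}

# The one bucket policy for the inventory bucket
resource "aws_s3_bucket_policy" "inventory" {
  bucket = aws_s3_bucket.inventory.id
  policy = data.aws_iam_policy_document.inventory_bucket.json
}
```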

Quick Start

# First, create the S3 bucket for inventory storage
resource "aws_s3_bucket" "inventory" {
  bucket = "my-company-s3-inventory"
}

# Create the Glue database for inventory tables
resource "aws_glue_catalog_database" "inventory" {
  name = "s3_inventory_db"
}

# Now configure the S3 inventory module
module "s3_inventory" {
  source  = "cloudandthings/terraform-aws-s3-inventory/aws"
  version = "~> 2.0"

  # Required: Reference the externally created resources
  inventory_bucket_name   = aws_s3_bucket.inventory.bucket
  inventory_database_name = aws_glue_catalog_database.inventory.name

  # Source buckets to inventory
  source_bucket_names = [
    "my-app-data-bucket",
    "my-logs-bucket",
    "my-backup-bucket"
  ]

  # Optional: Union all view - all inventory partitions (complete historical data)
  union_all_view_name = "union_all_view"

  # Optional: Union latest view - latest partition only (current state, more efficient)
  union_latest_view_name = "union_latest_view"

  # Optional: Add Lake Formation permissions
  # database_admin_principals = [...]
  # database_read_principals = [...]

  # By default, the module will attach the required bucket policy automatically
  # attach_bucket_policy = true  # This is the default
}

Usage

See the examples dropdown on Terraform Cloud, or browse the examples directory in the GitHub repo.

Querying Your Inventory Data

Once deployed, you can query your S3 inventory data using Amazon Athena.

Querying Individual Buckets

Each source bucket has its own Glue table with all historical partitions:

SELECT bucket, key, size, last_modified_date, storage_class
FROM s3_inventory_db.my_app_data_bucket
WHERE dt = '2024-08-29-00-00'
ORDER BY size DESC
LIMIT 100;

Querying Current State (Union Latest View)

Recommended for most use cases - the union latest view queries only the most recent partition from each bucket:

-- Get current object count and total size per bucket
SELECT bucket,
       COUNT(*) as object_count,
       SUM(size) as total_size,
       AVG(size) as avg_size
FROM s3_inventory_db.union_latest_view
GROUP BY bucket
ORDER BY total_size DESC;

-- Find largest objects across all buckets
SELECT bucket, key, size, storage_class, last_modified_date
FROM s3_inventory_db.union_latest_view
ORDER BY size DESC
LIMIT 100;

How the Latest View Works

The union latest view is designed for performance and queries yesterday's data rather than dynamically finding the maximum partition for each bucket. This means:

Only yesterday's inventory data is included - The view filters for partitions from yesterday (date_add('day', -1, CURRENT_DATE))
Stale inventories won't appear - If a bucket's inventory hasn't run recently, it won't show up in the latest view
For stale data, use the union all view - To see historical or stale inventories, query the union all view which includes all partitions

Why this design? Querying the maximum partition of a projected table in Amazon Athena is inefficient and can result in slow query performance and higher costs. By using yesterday's data, the view provides a fast, cost-effective way to get a recent snapshot of your S3 inventory across all buckets.

If you need to access all inventory data regardless of age, use the union all view instead (see below).

Querying Historical Data (Union All View)

The union all view includes all inventory partitions from all buckets - use this for trend analysis:

-- Track storage growth over time
SELECT dt, bucket, COUNT(*) as object_count, SUM(size) as total_size
FROM s3_inventory_db.union_all_view
WHERE dt >= '2024-08-01-00-00'
GROUP BY dt, bucket
ORDER BY dt DESC, total_size DESC;

Performance Tips

Use the “latest inventories” view for current state queries – the view configured via var.union_latest_view_name; this scans only yesterday's partition for each bucket (faster and cheaper)
Use the “all inventories” view for trend analysis – the view configured via var.union_all_view_name; this includes all historical partitions when you need time-series data
Query individual bucket tables when you work with a single bucket and need partition filtering
Always use column projection - select only needed columns instead of SELECT *
Apply partition filters on individual tables: WHERE dt >= 'YYYY-MM-DD-HH-MM'

Important Considerations

Athena Partition Date Projection

As of 2025, Amazon Athena does not properly support dynamic range projection with the S3 inventory partitioning scheme. When using a dynamic range like "NOW-3MONTHS,NOW" with this module, the Glue tables will return zero rows.

To work around this limitation, this module defaults the start of the range to the beginning of the previous year, calculated from the time the Terraform plan runs. For example, if today is 2025-08-25, the date range defaults to "2024-01-01-00-00,NOW".

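Important: This approach causes Terraform drift once a year. When the year changes, the computed start date changes with it, so the next plan proposes an update to projection.dt.range on the Glue tables.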
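
To illustrate why the default moves with the calendar, here is a minimal sketch of how such a range could be derived at plan time. It is not taken from the module's source. The local and output names are hypothetical, and plantimestamp() is assumed to be available (Terraform 1.5 or later).

```hcl
locals {
  # Year in which the plan runs, e.g. 2025
  plan_year = tonumber(formatdate("YYYY", plantimestamp()))

  # Beginning of the previous year up to now, e.g. "2024-01-01-00-00,NOW"
  default_dt_range = "${local.plan_year - 1}-01-01-00-00,NOW"
}

output "default_dt_range" {
  value = local.default_dt_range
}
```

Because the value depends on the plan date, a plan run in a new year yields a different string, which is what surfaces as a change.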

Workaround

To avoid state drift, provide a fixed start date for partition projection, such as:

athena_projection_dt_range = "2025-08-01-00-00,NOW"

Choose your start date based on either:

Your specific requirements
The date when your S3 inventories were first deployed

Lake Formation Permissions

This module supports configuring AWS Lake Formation permissions for the Glue database and tables. Use the following variables to grant access:

database_admin_principals - Principals with full admin access (create, update, delete) to the database and tables
database_read_principals - Principals with read-only access (query tables, describe metadata)

Important validation rules:

Each list must not contain duplicate values
A principal cannot appear in both lists - you must choose either admin or read access, not both

module "s3_inventory" {
  # ... other configuration ...

  database_admin_principals = [
    "arn:aws:iam::123456789012:role/data-engineering-admin",
  ]

  database_read_principals = [
    "arn:aws:iam::123456789012:role/analytics-team",
    "arn:aws:iam::123456789012:role/reporting-service",
  ]
}
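
The two validation rules can be expressed in plain Terraform roughly as follows. This is a sketch of the technique, not the module's actual implementation. The terraform_data resource name is hypothetical, and the overlap check is written as a precondition so that it also works on Terraform versions where a variable validation cannot reference another variable.

```hcl
variable "database_admin_principals" {
  type    = list(string)
  default = []

  validation {
    # distinct() drops repeats, so equal lengths means no duplicates
    condition     = length(var.database_admin_principals) == length(distinct(var.database_admin_principals))
    error_message = "database_admin_principals must not contain duplicate values."
  }
}

variable "database_read_principals" {
  type    = list(string)
  default = []

  validation {
    condition     = length(var.database_read_principals) == length(distinct(var.database_read_principals))
    error_message = "database_read_principals must not contain duplicate values."
  }
}

# Hypothetical helper resource that fails the plan if the lists overlap
resource "terraform_data" "principal_overlap_check" {
  lifecycle {
    precondition {
      condition = length(setintersection(
        toset(var.database_admin_principals),
        toset(var.database_read_principals),
      )) == 0
      error_message = "A principal cannot appear in both database_admin_principals and database_read_principals."
    }
  }
}
```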

Costs

S3 inventory reports are charged per million objects listed
Additional S3 storage costs for inventory files
Athena charges apply when querying the data
Consider lifecycle rules to manage long-term storage costs

Contributing

Direct contributions are welcome.

See CONTRIBUTING.md for further information.

License

This project is licensed under the MIT License - see the LICENSE file for details.

This module was created from terraform-aws-template

Documentation

Inputs

Name	Description	Type	Default	Required
additional_bucket_policy_statements	Additional IAM policy statements to include in the bucket policy (will be merged with the module's statements)	`list(object({ Sid = optional(string), Effect = string, Principal = any, Action = any, Resource = any, Condition = optional(any) }))`	`[]`	no
athena_projection_dt_range	Date range for Athena partition projection (format: START_DATE,END_DATE). If null then a value will be generated, see README for more information.	`string`	`null`	no
attach_bucket_policy	Whether the module should attach the policy to the inventory bucket. Set to false if you want to attach the policy yourself using the bucket_policy or required_bucket_policy outputs, if the bucket already has a policy and you want to merge them yourself, or if you only want to use this module to generate the policy statements.	`bool`	`true`	no
database_admin_principals	List of principal ARNs that will be allowed to manage (create, update, delete) the Glue database and its tables. Must not contain duplicates or overlap with database_read_principals.	`list(string)`	`[]`	no
database_read_principals	List of principal ARNs that will be allowed to read from the Glue database (query tables, describe metadata). Must not contain duplicates or overlap with database_admin_principals.	`list(string)`	`[]`	no
enable_bucket_inventory_configs	Whether to create S3 inventory configurations for the specified buckets	`bool`	`true`	no
inventory_bucket_name	Name of the S3 inventory bucket	`string`	n/a	yes
inventory_config_encryption	Map containing encryption settings for the S3 inventory configuration.	`any`	`{}`	no
inventory_config_frequency	Frequency of the S3 inventory report generation	`string`	`"Daily"`	no
inventory_config_name	Name identifier for the S3 inventory configuration	`string`	`"daily"`	no
inventory_config_object_versions	Which object versions to include in the inventory report	`string`	`"All"`	no
inventory_database_name	Name of the S3 inventory Glue database	`string`	n/a	yes
inventory_optional_fields	List of optional fields to include in the S3 inventory report	`list(string)`	`[ "Size", "LastModifiedDate", "IsMultipartUploaded", "ReplicationStatus", "EncryptionStatus", "BucketKeyStatus", "StorageClass", "IntelligentTieringAccessTier", "ETag", "ChecksumAlgorithm", "ObjectLockRetainUntilDate", "ObjectLockMode", "ObjectLockLegalHoldStatus", "ObjectAccessControlList", "ObjectOwner" ]`	no
inventory_tables_description	Description to set on every S3 inventory Glue table. If not provided, a default will be used.	`string`	`null`	no
source_bucket_names	List of S3 bucket names to create inventory reports for	`list(string)`	`[]`	no
union_all_view_name	Name for the Athena view that unions ALL inventory partitions from all source buckets (complete historical data)	`string`	`null`	no
union_latest_view_name	Name for the Athena view that unions the LATEST inventory partition from each source bucket (current state only, more efficient)	`string`	`null`	no

Modules

No modules.

Outputs

Name	Description
athena_projection_dt_range	The value used for projection.dt.range on the Glue table
bucket_policy	Complete bucket policy JSON including required statements and any additional statements. Use this to attach the policy yourself when attach_bucket_policy = false
required_bucket_policy	Required bucket policy JSON (S3 inventory write permissions only). Use with source_policy_documents to merge with your custom policy
union_all_view_name	Name of the created union all view (all partitions from all buckets), if enabled
union_latest_view_name	Name of the created union latest view (latest partition from each bucket), if enabled

Providers

Name	Version
aws	~> 6.0

Requirements

Name	Version
terraform	>= 1.5.7
aws	~> 6.0

Resources

Name	Type
aws_glue_catalog_table.s3_inventory	resource
aws_glue_catalog_table.union_all_view	resource
aws_glue_catalog_table.union_latest_view	resource
aws_lakeformation_permissions.inventory_database_admin	resource
aws_lakeformation_permissions.inventory_database_read	resource
aws_lakeformation_permissions.inventory_tables_admin	resource
aws_lakeformation_permissions.inventory_tables_read	resource
aws_s3_bucket_inventory.this	resource
aws_s3_bucket_policy.this	resource
aws_caller_identity.current	data source
aws_default_tags.current	data source
aws_iam_policy_document.additional	data source
aws_iam_policy_document.combined	data source
aws_iam_policy_document.required	data source
aws_partition.current	data source
aws_region.current	data source
