Feature/aws infra module#224

Open
HariGS-DB wants to merge 3 commits into main from feature/aws-infra-module

@HariGS-DB

closes #178

This PR adds a standard module for AWS infrastructure. The goal is to provide a base AWS setup that any Databricks example involving AWS can build on. It contains the following:

  1. Code for S3 buckets, IAM roles, VPC, and security groups
  2. Optional flag to create resources for the Private Link backend
  3. Optional flag to create a hub-and-spoke architecture
  4. Optional flag to create a firewall to control outbound traffic

@HariGS-DB HariGS-DB requested review from a team as code owners November 11, 2025 16:18
@HariGS-DB HariGS-DB requested review from alexott and rauchy November 11, 2025 16:18
@alexott alexott requested a review from Copilot November 25, 2025 14:34
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces a comprehensive AWS infrastructure module (aws-infra) for Databricks deployments, providing a standardized foundation for developing AWS-based Databricks examples. The module creates VPC infrastructure, S3 storage buckets, IAM roles, and security configurations with optional features for Private Link, hub-spoke architecture, and network firewalls.

Key changes:

  • Creates core AWS infrastructure components (VPC, S3 buckets, IAM roles, VPC endpoints)
  • Adds optional Private Link support for secure Databricks connectivity
  • Implements optional hub-spoke architecture with Transit Gateway and Network Firewall
  • Includes comprehensive documentation and example configurations

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| modules/aws/aws-infra/main.tf | Orchestrates module components and conditionally creates hub networking submodule |
| modules/aws/aws-infra/variables.tf | Defines input variables for networking, storage, IAM, security, and advanced networking configurations |
| modules/aws/aws-infra/locals.tf | Computes local values for tags, availability zones, subnet CIDRs, and IAM configurations |
| modules/aws/aws-infra/networking.tf | Creates VPC infrastructure using AWS VPC module with Databricks-specific security group rules |
| modules/aws/aws-infra/workspacestorage.tf | Creates root S3 bucket for Databricks workspace with encryption and public access blocking |
| modules/aws/aws-infra/ucstorage.tf | Creates Unity Catalog metastore and data S3 buckets with security configurations |
| modules/aws/aws-infra/iam.tf | Creates IAM roles and policies for Databricks cross-account access and Unity Catalog |
| modules/aws/aws-infra/vpc-endpoints.tf | Creates VPC endpoints for S3, STS, and Kinesis services |
| modules/aws/aws-infra/private-link.tf | Creates Private Link resources including subnets, security groups, and VPC endpoints |
| modules/aws/aws-infra/outputs.tf | Exposes VPC ID, S3 bucket names, and IAM role ARNs |
| modules/aws/aws-infra/versions.tf | Specifies Terraform and provider version requirements |
| modules/aws/aws-infra/modules/hub-networking/transit-gateway.tf | Implements Transit Gateway with hub-spoke architecture and routing configuration |
| modules/aws/aws-infra/modules/hub-networking/firewall.tf | Creates Network Firewall with FQDN and network-based rule groups |
| modules/aws/aws-infra/modules/hub-networking/variables.tf | Defines hub networking submodule input variables |
| modules/aws/aws-infra/modules/hub-networking/locals.tf | Computes hub VPC subnet CIDRs and Transit Gateway name |
| modules/aws/aws-infra/modules/hub-networking/outputs.tf | Exposes hub VPC ID |
| modules/aws/aws-infra/components/iam.tf | Duplicate IAM configuration file (appears to be unused) |
| modules/aws/aws-infra/README.md | Comprehensive documentation with architecture diagrams, usage examples, and configuration details |

Comments suppressed due to low confidence (1)

modules/aws/aws-infra/components/iam.tf:1

  • This file appears to be a duplicate of modules/aws/aws-infra/iam.tf with identical content. Having duplicate IAM configurations can lead to maintenance issues and confusion about which file is the source of truth. Remove this duplicate file and use only modules/aws/aws-infra/iam.tf.

"Module" = "aws-infra"
"Prefix" = var.prefix
"Region" = var.region
"CreatedDate" = formatdate("YYYY-MM-DD", timestamp())

Copilot AI Nov 25, 2025

Using timestamp() in common tags will cause Terraform to detect changes on every plan/apply, even when no actual infrastructure changes are needed. This is a known anti-pattern that leads to unnecessary plan noise and potential state drift. Consider removing this tag or using a static value set once during initial deployment.

Suggested change
"CreatedDate" = formatdate("YYYY-MM-DD", timestamp())
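
A minimal sketch of one static alternative, assuming the hashicorp/time provider is an acceptable extra dependency. A `time_static` resource records the timestamp once at creation and keeps it in state, so later plans should see no change in the tag. The resource name `created` and the local `common_tags` are illustrative and not taken from the PR.

```hcl
terraform {
  required_providers {
    time = {
      source = "hashicorp/time"
    }
  }
}

# Captured once on the first apply, then read back from state on later runs.
resource "time_static" "created" {}

locals {
  # Illustrative tag map; only CreatedDate differs from the timestamp() version.
  common_tags = {
    "Module"      = "aws-infra"
    "CreatedDate" = formatdate("YYYY-MM-DD", time_static.created.rfc3339)
  }
}
```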

depends_on = [aws_s3_bucket_public_access_block.root]
}


Copilot AI Nov 25, 2025

[nitpick] The file ends with two blank lines. While not critical, consistent formatting typically uses a single trailing newline.

restrict_public_buckets = true
}


Copilot AI Nov 25, 2025

[nitpick] The file ends with two blank lines. While not critical, consistent formatting typically uses a single trailing newline.

}
}


Copilot AI Nov 25, 2025

[nitpick] The file ends with two blank lines. While not critical, consistent formatting typically uses a single trailing newline.

# Current region (for firewall rules)
current_region = var.region
}

Copilot AI Nov 25, 2025

[nitpick] The file ends with two blank lines. While not critical, consistent formatting typically uses a single trailing newline.

This module creates a complete AWS infrastructure foundation optimized for Databricks, featuring:

- **🔧 Simplified Configuration**: Uses official `terraform-aws-modules/vpc` for networking
- **🔒 Secure Storage**: S3 buckets with encryption for workspace and Unity Catalog
Collaborator

Is the encryption optional/configurable?
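
If encryption were made configurable, one possible shape is sketched below. The variable name `storage_encryption`, its object layout, the bucket name, and the resource label `root` are assumptions for illustration; `optional()` in the type requires Terraform 1.3 or later.

```hcl
# Hypothetical variable shape; the module's actual interface may differ.
variable "storage_encryption" {
  description = "Server-side encryption settings for the module's S3 buckets."
  type = object({
    sse_algorithm = string           # "AES256" (SSE-S3) or "aws:kms" (SSE-KMS)
    kms_key_id    = optional(string) # only meaningful with "aws:kms"
  })
  default = {
    sse_algorithm = "AES256"
  }

  validation {
    condition     = contains(["AES256", "aws:kms"], var.storage_encryption.sse_algorithm)
    error_message = "sse_algorithm must be \"AES256\" or \"aws:kms\"."
  }
}

resource "aws_s3_bucket" "root" {
  bucket = "example-databricks-root" # placeholder name
}

resource "aws_s3_bucket_server_side_encryption_configuration" "root" {
  bucket = aws_s3_bucket.root.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = var.storage_encryption.sse_algorithm
      kms_master_key_id = var.storage_encryption.kms_key_id # null leaves the key unset
    }
  }
}
```

Keeping SSE-S3 as the default means callers who do not care about key management get encryption without supplying a KMS key.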


- **🔧 Simplified Configuration**: Uses official `terraform-aws-modules/vpc` for networking
- **🔒 Secure Storage**: S3 buckets with encryption for workspace and Unity Catalog
- **👤 IAM Integration**: Cross-account and Unity Catalog roles with Databricks-generated policies
Collaborator

How configurable are policies? For cross-account we have different policy types: restricted/managed/...
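
One way to expose the policy type is to pass a variable through to the Databricks provider's `databricks_aws_crossaccount_policy` data source, as in the sketch below. The variable names, role name, and policy name are placeholders. To my understanding the `restricted` type needs additional arguments (AWS account, VPC, region, security group), so this sketch deliberately limits itself to `managed` and `customer`.

```hcl
# Hypothetical variable; values mirror the data source's policy_type argument.
variable "cross_account_policy_type" {
  type    = string
  default = "managed"

  validation {
    condition     = contains(["managed", "customer"], var.cross_account_policy_type)
    error_message = "This sketch supports only \"managed\" or \"customer\"."
  }
}

variable "databricks_account_id" {
  type = string
}

# Permissions policy generated by the Databricks provider.
data "databricks_aws_crossaccount_policy" "this" {
  policy_type = var.cross_account_policy_type
}

# Trust policy that lets Databricks assume the role.
data "databricks_aws_assume_role_policy" "this" {
  external_id = var.databricks_account_id
}

resource "aws_iam_role" "cross_account" {
  name               = "example-crossaccount" # placeholder name
  assume_role_policy = data.databricks_aws_assume_role_policy.this.json
}

resource "aws_iam_role_policy" "cross_account" {
  name   = "example-crossaccount-policy" # placeholder name
  role   = aws_iam_role.cross_account.id
  policy = data.databricks_aws_crossaccount_policy.this.json
}
```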

## Module Components

### Core Components (Always Created)
- **networking.tf** - VPC, subnets, security groups, NAT gateway (via AWS VPC module)
Collaborator

Do we always need a NAT gateway when implementing a Hub & Spoke architecture?
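
A sketch of how the spoke NAT gateway could be made conditional, assuming the hub handles egress whenever hub-spoke mode is on. The variable names, CIDRs, and availability zones are illustrative; routing from the spoke to the Transit Gateway is not shown.

```hcl
variable "hub_spoke_architecture" {
  type    = bool
  default = false
}

variable "enable_nat_gateway" {
  type    = bool
  default = true
}

locals {
  # In hub-spoke mode the hub is assumed to provide egress,
  # so the spoke VPC skips its own NAT gateway.
  spoke_nat_enabled = var.enable_nat_gateway && !var.hub_spoke_architecture
}

module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name            = "example-spoke" # placeholder values below
  cidr            = "10.0.0.0/16"
  azs             = ["eu-west-1a", "eu-west-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]

  enable_nat_gateway = local.spoke_nat_enabled
  single_nat_gateway = true
}
```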

### Core Components (Always Created)
- **networking.tf** - VPC, subnets, security groups, NAT gateway (via AWS VPC module)
- **workspacestorage.tf** - Root S3 bucket for Databricks workspace
- **ucstorage.tf** - Unity Catalog S3 buckets (metastore & data)
Collaborator

I think that it makes sense to have a separate module for UC S3 buckets, but allow them to be linked with this module.

Comment on lines +214 to +217
# Unity Catalog Configuration
create_metastore_bucket = true
unity_catalog_account_id = "414351767826"
external_id = "12345678-1234-1234-1234-123456789abc"
Collaborator

I think it makes sense to move this to a separate module.

}
```

## Inputs
Collaborator

Let's use terraform-docs to generate these tables.

}

# Cross-Account Role Policy
data "aws_iam_policy_document" "cross_account_policy" {
Collaborator

Why is this duplicated in modules/aws/aws-infra/components/iam.tf?

Collaborator

@alexott alexott left a comment

see my & Copilot's comments

…ty, remove duplicate

- Fix AWS Network Firewall policy: remove priority fields (requires STRICT_ORDER mode
  which is incompatible with DEFAULT_ACTION_ORDER rule groups); DEFAULT_ACTION_ORDER
  evaluates specific allow rules before the deny-all catch-all
- Fix FQDN format in README examples: AWS Network Firewall uses leading-dot format
  (.domain.com) not wildcard (*.domain.com); document this with an explicit note
- Add storage_encryption variable (SSE-S3 default, SSE-KMS with kms_key_id option)
- Add cross_account_policy_type variable (managed/restricted/customer-managed)
- Switch to databricks_aws_crossaccount_policy data source instead of inline policy doc
- Replace timestamp() with time_static for CreatedDate tag (prevents plan drift on every apply)
- Auto-disable spoke NAT gateway when hub_spoke_architecture = true (hub handles egress)
- Add external_id-conditional Unity Catalog trust policy with basic trust fallback
- Remove duplicate modules/aws/aws-infra/components/iam.tf
- Add .terraform-docs.yml for terraform-docs integration
- Update README: configurable encryption/policy docs, hub-spoke NAT clarification,
  terraform-docs note
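
To illustrate the leading-dot FQDN format mentioned above, here is a minimal sketch of a stateful domain allow-list rule group. The resource label, rule group name, capacity, and example domain are placeholders, not values from this PR. No rule-order option is set, so the rule group uses the default action order rather than STRICT_ORDER.

```hcl
resource "aws_networkfirewall_rule_group" "allow_fqdns" {
  name     = "example-allow-fqdns" # placeholder name
  type     = "STATEFUL"
  capacity = 100 # placeholder capacity

  rule_group {
    rules_source {
      rules_source_list {
        generated_rules_type = "ALLOWLIST"
        target_types         = ["TLS_SNI", "HTTP_HOST"]

        # Leading dot covers subdomains; "*.cloud.databricks.com" is not accepted.
        targets = [".cloud.databricks.com"]
      }
    }
  }
}
```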

Co-authored-by: Isaac
@HariGS-DB HariGS-DB force-pushed the feature/aws-infra-module branch from d4ada1b to 80c4a20 March 15, 2026 18:05
@HariGS-DB
Author

Thanks for the thorough review @alexott! All comments addressed in the latest commit (80c4a20):

Fixed:

  • timestamp() → time_static for the CreatedDate tag, which prevents plan drift on every apply
  • Trailing blank lines removed from workspacestorage.tf, ucstorage.tf, versions.tf, hub-networking/locals.tf
  • Duplicate components/iam.tf removed
  • Cross-account policy now uses databricks_aws_crossaccount_policy data source with configurable cross_account_policy_type (managed / restricted / customer-managed)
  • S3 encryption is now configurable via storage_encryption variable (SSE-S3 default, SSE-KMS with kms_key_id)
  • Hub-Spoke: NAT gateway is automatically disabled in the spoke when hub_spoke_architecture = true — the hub handles all egress. No need to set enable_nat_gateway = false manually.
  • .terraform-docs.yml added; the <!-- BEGIN_TF_DOCS --> block in the README is now generated by terraform-docs
  • FQDN examples in README corrected to leading-dot format (.domain.com) — AWS Network Firewall does not support *.domain.com wildcard syntax

Noted for follow-up PRs:

  • Extracting UC S3 buckets into a separate reusable submodule
  • Extracting Private Link into a separate submodule

All 4 test scenarios validated end-to-end in eu-west-1:

  • Scenario 1: Basic VPC + IAM + S3
  • Scenario 2: + Private Link (backend + relay endpoints)
  • Scenario 3: + Hub-Spoke with Transit Gateway
  • Scenario 4: + Network Firewall + NAT in hub

@HariGS-DB HariGS-DB requested a review from alexott March 15, 2026 19:05

Development

Successfully merging this pull request may close these issues.

Standardize aws infra modules

3 participants