
Conversation


@edamico edamico commented Jul 21, 2025

  • Created environment-specific variable files for dev and staging environments.
  • Added README documentation for environment configurations.
  • Implemented IAM roles and policies for Glue job execution.
  • Configured S3 bucket for Glue artifacts with versioning and lifecycle rules.
  • Defined Glue job configurations for PySpark and Python shell jobs.
  • Established networking resources including security groups and Glue connections.
  • Added outputs for important resource identifiers and deployment summary.
  • Created example Terraform variables for Glue deployment POC.
  • Implemented local variables for common tags and resource naming conventions.
  • Added main Terraform configuration file as an entry point.
  • Included version constraints for Terraform and required providers.

What's this PR do?

This PR upgrades AWS Glue to v5. Since the Serverless Framework does not support this version, the deployment is migrated to Terraform.

@ulises-jeremias @edamico @Tsemenzato

Summary by CodeRabbit

  • New Features

    • Introduced Terraform-based infrastructure for AWS Glue, including modular scripts for jobs, IAM roles, networking, S3 bucket, and CloudWatch logging.
    • Added comprehensive documentation for deploying and migrating Glue jobs using Terraform.
    • Provided example and environment-specific Terraform variable files for easy configuration.
    • Added a JupyterLab startup script for local development.
  • Improvements

    • Enhanced logging and configuration management in Glue job scripts for better traceability and masking of sensitive data.
    • Updated Docker and local development setup for Glue 5.0 compatibility and improved environment consistency.
    • Expanded and reorganized package scripts for Python linting, formatting, and requirements management.
  • Bug Fixes

    • Corrected import paths and configuration key usage to ensure module reliability and compatibility.
  • Chores

    • Removed Serverless Framework configuration and replaced all references with Terraform.
    • Cleaned up obsolete files and updated version pinning for Node.js and Python.
    • Improved .gitignore coverage for system and editor artifacts.

@edamico edamico requested a review from ulises-jeremias as a code owner July 21, 2025 12:34

coderabbitai bot commented Jul 21, 2025

Walkthrough

This update migrates AWS Glue ETL infrastructure provisioning from the Serverless Framework to a Terraform-based workflow. It removes all Serverless-related files and configuration, introduces comprehensive Terraform modules for Glue jobs, IAM, networking, S3, and logging, and updates documentation and scripts to reflect the new deployment process. Python, Node.js, and Docker environment versions are modernized, and code is refactored for improved logging, configuration management, and import consistency.

Changes

Files / Groups Change Summary
.gitignore, .node-version, .python-version, Pipfile, package.json Ignore rules updated; Node.js and Python versions bumped; Pipfile and package scripts updated for Terraform and code quality tools.
Dockerfile, local/compose.yml, local/localstack/start-jupyter.sh Docker image updated for Glue 5.0, Jupyter startup script and workspace paths changed, new startup script added.
README.md, docs/TERRAFORM_DEPLOYMENT.md Documentation revised to replace Serverless with Terraform, new detailed Terraform deployment and migration guide added.
serverless.yml, glue.yml, jobs/etl/load_s3.py All Serverless Framework configuration and related job config files removed.
jobs/pyspark_hello_world.py, jobs/pythonshell_hello_world.py Logging improvements and code cleanup in example Glue jobs.
libs/config/config.py, libs/logger.py Major config refactor: robust fallback logic, sensitive value masking, and detailed logging. New logger utilities added.
libs/aws/s3.py, libs/common/__init__.py, libs/config/__init__.py, libs/config/secrets.py, libs/db/docdb_mongo.py, libs/db/mongo.py, libs/etl/__init__.py, libs/etl/extract.py, libs/etl/load_documentdb.py, libs/etl/load_postgresql.py, libs/io/__init__.py, libs/io/helpers.py, libs/io/reader.py, libs/io/writer.py Imports reordered or switched to relative; minor code style and structure improvements.
libs/etl/load_s3.py Adds new load_to_s3 function for writing Glue DynamicFrames to S3.
terraform/.gitignore New ignore rules for Terraform state and lock files.
terraform/cloudwatch.tf, terraform/glue_jobs.tf, terraform/iam.tf, terraform/locals.tf, terraform/main.tf, terraform/networking.tf, terraform/outputs.tf, terraform/s3.tf, terraform/scripts.tf, terraform/variables.tf, terraform/versions.tf New Terraform modules for S3, IAM, networking, Glue jobs, CloudWatch logs, outputs, and variable definitions.
terraform/environments/README.md, terraform/environments/dev.tfvars, terraform/environments/staging.tfvars, terraform/terraform.tfvars, terraform/terraform.tfvars.example New environment-specific and example Terraform variable files with documentation.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Terraform
    participant AWS
    participant S3
    participant Glue

    User->>Terraform: terraform init / plan / apply
    Terraform->>AWS: Provision S3, IAM, Networking, Glue Jobs, CloudWatch
    AWS->>S3: Create S3 bucket and upload scripts/support files
    AWS->>IAM: Create Glue job role and attach policies
    AWS->>Glue: Create Glue jobs (PySpark, PythonShell) using S3 scripts
    AWS->>CloudWatch: Create log groups for Glue jobs and service
    User->>Glue: Trigger Glue job (via AWS Console or CLI)
    Glue->>S3: Read scripts and support files
    Glue->>CloudWatch: Stream logs during execution

Estimated code review effort

4 (60–120 minutes)

Poem

🐇
Terraform carrots in neat little rows,
Serverless burrows have come to a close.
Glue jobs now sprout from HCL seed,
With logs and roles all managed in breed.
Python and Docker, both fresh and new—
A bunny’s delight: clean, modular Glue!
🥕




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 28

🔭 Outside diff range comments (1)
libs/io/helpers.py (1)

55-58: {database} and TLS flag never interpolated – missing f prefix

Only the first line of the multi-line string is an f-string. The second line is a plain string, so {database} and the TLS ternary remain literally in the URI.

-    url = (
-        f"mongodb://{encoded_user}:{encoded_pass}@{host}:{port}/"
-        "{database}?tls={'true' if ssl else 'false'}"
-    )
+    url = (
+        f"mongodb://{encoded_user}:{encoded_pass}@{host}:{port}/"
+        f"{database}?tls={'true' if ssl else 'false'}"
+    )
🧹 Nitpick comments (25)
libs/io/helpers.py (1)

2-4: Redundant blank line—follow project style guide

A single blank line between standard-library imports is already present; adding a second one is unnecessary and trips some linters (e.g. ruff E303).

terraform/.gitignore (1)

1-4: Consider broader ignore patterns for Terraform artefacts

Add wildcard variants to avoid accidental commits of env-specific state files or crash logs:

+*.tfstate*
+crash.log
+.terraformrc
libs/config/__init__.py (1)

1-1: Silence Ruff F401 by explicitly re-exporting public symbols

Config and get_config are purposefully re-exported but Ruff flags them as “imported-but-unused”. Add __all__ to make the intent explicit:

 from .config import Config, get_config
+
+# Re-export for library consumers / tests.
+__all__ = ["Config", "get_config"]
libs/common/__init__.py (1)

2-2: Mark re-exports to avoid lint noise

cached_property is likely re-exported for downstream imports. Add it to __all__ (or remove the import if that’s not the intent) to silence F401:

 from .perm import load_tls_ca_bundle
 from .util import cached_property
+
+__all__ = ["load_tls_ca_bundle", "cached_property"]
libs/etl/__init__.py (1)

3-4: Expose public ETL helpers via __all__

Several helpers are imported only to surface a clean package API, triggering Ruff F401. Clarify with:

 from .extract import extract
 from .load_documentdb import load_to_document_db
 from .load_postgresql import load_to_postgresql_db
 from .load_s3 import load_to_s3
+
+__all__ = [
+    "extract",
+    "load_to_document_db",
+    "load_to_postgresql_db",
+    "load_to_s3",
+]
jobs/pythonshell_hello_world.py (1)

3-3: Consider using a module-specific logger instead of configuring the root logger.

Setting the root logger level globally can affect logging behavior of imported libraries. For production code, consider using a module-specific logger:

-logging.getLogger().setLevel(logging.INFO)
+logger = logging.getLogger(__name__)
+logger.setLevel(logging.INFO)

However, for a simple "hello world" script like this, the current approach is acceptable.

libs/etl/extract.py (1)

4-6: Import refactor is fine, but drop the List import if it is actually unused
Switching to relative imports is correct. Static linters will now flag from typing import List as unused; consider removing it in a follow-up.

libs/etl/load_s3.py (2)

1-1: Remove unused import.

GlueContext is imported but never used in this function.

-from awsglue.context import GlueContext, DynamicFrame
+from awsglue.context import DynamicFrame

14-16: Improve path validation.

The current validation only checks for None but should also handle empty strings and whitespace-only strings.

-    if not path:
+    if not path or not path.strip():
         raise ValueError("Missing destination path for S3 write")
jobs/pyspark_hello_world.py (1)

1-1: Remove unused import.

The json import is not used anywhere in the script and should be removed to keep the code clean.

-import json
terraform/s3.tf (1)

4-47: Excellent S3 security configuration!

The S3 bucket configuration follows security best practices with private ACL, public access restrictions, server-side encryption, and appropriate lifecycle rules for cost optimization. The versioning and cleanup policies are well-configured for artifact management.

Optional improvement: Consider pinning the module source to a specific commit hash for production environments to ensure reproducible deployments:

module "s3_bucket" {
  source  = "cloudposse/s3-bucket/aws"
- version = "~> 4.0"
+ version = "4.2.0" # or specific commit hash
local/localstack/start-jupyter.sh (1)

2-2: Consider using stricter error handling.

The set +X line (likely intended as set +x, which disables command tracing) doesn't enable strict error handling. Consider using set -euo pipefail for better error handling in production environments.

-set +X
+set -euo pipefail
libs/logger.py (2)

19-26: Consider preventing duplicate handlers.

The current logic only checks if handlers exist, but doesn't prevent adding duplicate handlers if the function is called multiple times with the same logger name.

 # Configure a handler if none exists
 if not logger.handlers:
     handler = logging.StreamHandler()
     formatter = logging.Formatter(
         "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
     )
     handler.setFormatter(formatter)
     logger.addHandler(handler)
+    # Prevent propagation to avoid duplicate logs
+    logger.propagate = False

42-48: Enhance sensitive value detection.

The current implementation only checks if any sensitive word is in the key, but doesn't handle nested structures or more complex patterns.

Consider improving the detection logic:

 def mask_sensitive_value(key, value):
     """
     Mask sensitive values in logs
     """
-    sensitive_keys = ["password", "secret", "token", "key"]
+    sensitive_keys = ["password", "secret", "token", "key", "credential", "auth"]
     if (
         any(sensitive_word in key.lower() for sensitive_word in sensitive_keys)
-        and value
+        and value is not None and str(value).strip()
     ):
         return "********"
     return value
terraform/terraform.tfvars.example (2)

23-23: Fix formatting inconsistency.

There's extra whitespace in the type field assignment.

-    type               =     "spark"                        # Job type: "spark" or "pythonshell"
+    type               = "spark"                        # Job type: "spark" or "pythonshell"

40-44: Additional support file adds complexity.

The second support file (jobs.zip) in this example differs from the first tfvars file. Ensure this is intentional and document the purpose of each support file.

Consider adding comments to clarify the purpose of each support file:

     support_files = [
       {
-        local_path = "libs.zip"  # Path to local file relative to terraform directory
+        local_path = "libs.zip"  # Shared libraries and dependencies
         s3_prefix  = ""          # S3 prefix where the file will be uploaded
       },
       {
-        local_path = "jobs.zip"
+        local_path = "jobs.zip"  # Job scripts and configurations
         s3_prefix  = ""
       }
     ]
terraform/environments/dev.tfvars (1)

78-80: Consider more restrictive CIDR blocks for development.

The CIDR block 10.0.0.0/8 is very broad and may grant excessive network access for a development environment.

Consider using a more restrictive CIDR block:

-database_cidr_blocks = ["10.0.0.0/8"]
+database_cidr_blocks = ["10.0.0.0/16"]  # More restrictive for development
terraform/cloudwatch.tf (1)

6-21: Logs are un-encrypted; consider KMS for compliance

CloudWatch Log Groups default to AWS-managed encryption. If your org requires customer-managed keys, add kms_key_id.

 resource "aws_cloudwatch_log_group" "glue_job_logs" {
   for_each = var.glue_jobs
-
+  kms_key_id        = var.kms_key_id # new variable

Same change applies to the service & crawler log groups.
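
If that route is taken, the referenced variable also needs to be declared; a minimal sketch follows (the variable name and default are assumptions, not part of the PR):

# Hypothetical variable referenced by the diff above; name and default assumed.
variable "kms_key_id" {
  description = "Optional customer-managed KMS key ARN for encrypting Glue CloudWatch log groups"
  type        = string
  default     = null # falls back to AWS-managed encryption when unset
}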

terraform/networking.tf (1)

28-43: Outbound HTTP/HTTPS to 0.0.0.0/0 may breach least-privilege

If Glue jobs only need to reach AWS services, restrict to AWS IP ranges or VPC endpoints; else document the requirement.
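
One way to narrow the rule, sketched under the assumption that the security group is addressed as aws_security_group.glue (not confirmed by the PR), is to allow HTTPS egress only to the regional S3 prefix list:

# Sketch only: restrict HTTPS egress to the regional S3 prefix list instead of 0.0.0.0/0.
data "aws_prefix_list" "s3" {
  name = "com.amazonaws.${var.aws_region}.s3"
}

resource "aws_security_group_rule" "glue_https_to_s3" {
  type              = "egress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  prefix_list_ids   = [data.aws_prefix_list.s3.id]
  security_group_id = aws_security_group.glue.id # assumed resource name
}

Access to other AWS services (Glue, Secrets Manager, CloudWatch Logs) would go through VPC interface endpoints rather than open egress.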

Dockerfile (1)

10-14: Consider pinning package versions for reproducibility.

While the cleanup is good, running yum update -y without version constraints could lead to non-reproducible builds if package versions change between builds.

Consider specifying versions for critical packages to ensure build reproducibility:

-RUN yum update -y && \
-    yum install -y gcc python3-devel wget && \
+RUN yum install -y \
+    gcc \
+    python3-devel \
+    wget && \
     yum clean all && \
     rm -rf /var/cache/yum
terraform/glue_jobs.tf (1)

84-103: Consider removing explicit aws_region parameter.

AWS Glue jobs automatically inherit the region from their execution environment. Explicitly passing it as a job argument could cause issues if the job is copied to another region.

       # Support files - reference the uploaded zip files
       "--extra-py-files" = "s3://${module.s3_bucket.bucket_id}/libs.zip,s3://${module.s3_bucket.bucket_id}/jobs.zip"
-      "--aws_region"          = var.aws_region
     },
     each.value.default_arguments
   )

If your job code specifically needs the region, it should use the AWS SDK's region detection instead of relying on a passed parameter.

libs/config/config.py (1)

207-236: Consider using consistent log levels for parameter sources.

Currently, default value usage is logged at DEBUG level while other sources use INFO level. This inconsistency could make troubleshooting difficult.

         # Return default if not found in either place
-        logger.debug(f"Using default value for {env_param_name}/{job_param_name}")
+        logger.info(f"Using default value for {env_param_name}/{job_param_name}")
         return default

This ensures all parameter source decisions are visible at the same log level.

terraform/variables.tf (3)

4-8: Avoid hard-coding the AWS region
Hard-wiring us-east-1 as the default can lead to accidental multi-region drift or bill surprises when the variable is not explicitly overridden. Either remove the default or set it to null and compute a fallback via data.aws_region inside the module.

-variable "aws_region" {
-  description = "AWS region for resources"
-  type        = string
-  default     = "us-east-1"
-}
+variable "aws_region" {
+  description = "AWS region for resources"
+  type        = string
+  default     = null # Require the caller to pass it or fall back to the current provider region
+}

44-64: Strengthen validation of the glue_jobs object
Nothing prevents typos such as "sparkk" or negative max_concurrent_runs. Add a top-level variable validation to keep bad input from reaching the provider.

# example validation – place under the variable block
validation {
  condition = alltrue([
    for job in values(var.glue_jobs) :
    contains(["spark","pythonshell"], job.type) &&
    job.max_concurrent_runs >= 1 &&
    (job.number_of_workers == null || job.number_of_workers > 0)
  ])
  error_message = "Each glue_jobs entry must have type 'spark' or 'pythonshell', positive concurrency, and (if set) a positive number_of_workers."
}

86-108: Duplicate literal lists – extract to a local to avoid drift
The permissible CloudWatch retention values are duplicated for log_retention_days and service_log_retention_days. Move the list to locals and reference it in both validations to reduce maintenance overhead.

+locals {
+  cw_retention_days = [
+    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, 3653
+  ]
+}

-  validation {
-    condition = contains([
-      1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, 3653
-    ], var.log_retention_days)
+  validation {
+    condition     = contains(local.cw_retention_days, var.log_retention_days)

-  validation {
-    condition = contains([
-      1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, 3653
-    ], var.service_log_retention_days)
+  validation {
+    condition     = contains(local.cw_retention_days, var.service_log_retention_days)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac73f36 and c415134.

⛔ Files ignored due to path filters (2)
  • Pipfile.lock is excluded by !**/*.lock
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (49)
  • .gitignore (1 hunks)
  • .node-version (1 hunks)
  • .python-version (1 hunks)
  • Dockerfile (1 hunks)
  • Pipfile (1 hunks)
  • README.md (4 hunks)
  • docs/TERRAFORM_DEPLOYMENT.md (1 hunks)
  • glue.yml (0 hunks)
  • jobs/etl/load_s3.py (0 hunks)
  • jobs/pyspark_hello_world.py (2 hunks)
  • jobs/pythonshell_hello_world.py (1 hunks)
  • libs/aws/s3.py (2 hunks)
  • libs/common/__init__.py (1 hunks)
  • libs/config/__init__.py (1 hunks)
  • libs/config/config.py (6 hunks)
  • libs/config/secrets.py (1 hunks)
  • libs/db/docdb_mongo.py (1 hunks)
  • libs/db/mongo.py (1 hunks)
  • libs/etl/__init__.py (1 hunks)
  • libs/etl/extract.py (1 hunks)
  • libs/etl/load_documentdb.py (1 hunks)
  • libs/etl/load_postgresql.py (1 hunks)
  • libs/etl/load_s3.py (1 hunks)
  • libs/io/__init__.py (1 hunks)
  • libs/io/helpers.py (1 hunks)
  • libs/io/reader.py (1 hunks)
  • libs/io/writer.py (1 hunks)
  • libs/logger.py (1 hunks)
  • local/compose.yml (2 hunks)
  • local/localstack/start-jupyter.sh (1 hunks)
  • package.json (2 hunks)
  • serverless.yml (0 hunks)
  • terraform/.gitignore (1 hunks)
  • terraform/cloudwatch.tf (1 hunks)
  • terraform/environments/README.md (1 hunks)
  • terraform/environments/dev.tfvars (1 hunks)
  • terraform/environments/staging.tfvars (1 hunks)
  • terraform/glue_jobs.tf (1 hunks)
  • terraform/iam.tf (1 hunks)
  • terraform/locals.tf (1 hunks)
  • terraform/main.tf (1 hunks)
  • terraform/networking.tf (1 hunks)
  • terraform/outputs.tf (1 hunks)
  • terraform/s3.tf (1 hunks)
  • terraform/scripts.tf (1 hunks)
  • terraform/terraform.tfvars (1 hunks)
  • terraform/terraform.tfvars.example (1 hunks)
  • terraform/variables.tf (1 hunks)
  • terraform/versions.tf (1 hunks)
💤 Files with no reviewable changes (3)
  • serverless.yml
  • jobs/etl/load_s3.py
  • glue.yml
🧰 Additional context used
🧬 Code Graph Analysis (9)
libs/db/docdb_mongo.py (1)
libs/common/perm.py (1)
  • load_tls_ca_bundle (7-10)
libs/io/__init__.py (1)
libs/io/helpers.py (1)
  • get_connection_options (152-250)
libs/etl/__init__.py (1)
libs/etl/load_postgresql.py (1)
  • load_to_postgresql_db (7-11)
libs/etl/extract.py (3)
libs/io/reader.py (1)
  • read_from_options (6-34)
libs/config/config.py (1)
  • Config (100-265)
libs/aws/s3.py (1)
  • AwsS3Client (8-110)
libs/common/__init__.py (1)
libs/common/util.py (1)
  • cached_property (4-14)
libs/aws/s3.py (1)
libs/config/config.py (2)
  • get_config (271-290)
  • s3_vars (136-147)
libs/etl/load_s3.py (2)
libs/io/writer.py (1)
  • write_from_options (6-22)
libs/config/config.py (2)
  • Config (100-265)
  • s3_vars (136-147)
libs/config/config.py (3)
libs/common/util.py (1)
  • cached_property (4-14)
libs/config/env.py (2)
  • get_envs (41-53)
  • get_var (18-35)
libs/config/secrets.py (1)
  • get_secrets_resolver (12-27)
libs/logger.py (1)
libs/config/config.py (2)
  • aws_client_vars (113-133)
  • s3_vars (136-147)
🪛 Ruff (0.12.2)
libs/config/__init__.py

1-1: .config.Config imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


1-1: .config.get_config imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

libs/io/__init__.py

1-1: .helpers.get_connection_options imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

libs/etl/__init__.py

3-3: .load_postgresql.load_to_postgresql_db imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

libs/common/__init__.py

2-2: .util.cached_property imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

libs/io/writer.py

1-1: awsglue.context.GlueContext imported but unused

Remove unused import: awsglue.context.GlueContext

(F401)

libs/etl/load_s3.py

1-1: awsglue.context.GlueContext imported but unused

Remove unused import: awsglue.context.GlueContext

(F401)

jobs/pyspark_hello_world.py

1-1: json imported but unused

Remove unused import: json

(F401)

libs/config/config.py

3-3: os imported but unused

Remove unused import: os

(F401)

🪛 Shellcheck (0.10.0)
local/localstack/start-jupyter.sh

[warning] 10-10: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[warning] 15-15: This $? refers to a condition, not a command. Assign to a variable to avoid it being overwritten.

(SC2319)

🪛 Checkov (3.2.334)
terraform/s3.tf

[MEDIUM] 4-47: Ensure Terraform module sources use a commit hash

(CKV_TF_1)

terraform/iam.tf

[HIGH] 62-90: Ensure no IAM policies documents allow "*" as a statement's resource for restrictable actions

(CKV_AWS_356)


[MEDIUM] 118-143: Ensure Terraform module sources use a commit hash

(CKV_TF_1)

🪛 Gitleaks (8.27.2)
terraform/networking.tf

81-81: Identified a HashiCorp Terraform password field, risking unauthorized infrastructure configuration and security breaches.

(hashicorp-tf-password)

🔇 Additional comments (38)
.node-version (1)

1-1: Verify runtimes using Node v22.11.0

Jumping from v16 → v22 is a breaking change. Ensure:

  1. Local dev containers have been rebuilt.
  2. Any Lambda / Glue-shell jobs that package Node won’t exceed currently supported AWS runtimes (latest ≈ v20).
Pipfile (1)

21-23: Python 3.11 bump – double-check Glue job base images

Glue 5 images currently ship with CPython 3.10. Deploying wheels compiled for 3.11 will fail at runtime. Verify that:

  • Glue job Docker images are also updated to a 3.11 variant, or
  • python_version remains 3.10 for compatibility.
.python-version (1)

1-1: Verify Glue-runtime compatibility before locking to Python 3.11

AWS Glue v5 currently ships with Python 3.10. Bumping the local environment to 3.11 risks introducing language features / wheel builds that will fail at job runtime.
Double-check Glue’s runtime image or pin the local version to 3.10 to stay aligned.

libs/config/secrets.py (1)

2-7: Import grouping is now PEP-8 compliant – good to go
No functional impact was introduced by re-ordering the boto3 import under the third-party section.
LGTM.

libs/io/reader.py (1)

1-2: LGTM! Clean import organization.

The import reordering maintains consistency with the codebase refactoring, and both imports are properly utilized in the function.

libs/db/mongo.py (1)

5-5: Good improvement! Relative imports enhance package portability.

The change from absolute to relative imports follows Python packaging best practices and improves module portability.

libs/db/docdb_mongo.py (1)

3-3: Excellent! Consistent relative import usage.

The change to relative imports aligns with the package-wide refactoring effort and maintains the same functionality for TLS certificate handling.

local/compose.yml (1)

4-4: Confirm start-jupyter.sh exists inside the final image and is executable
Changing the entrypoint path to /home/hadoop/start-jupyter.sh will fail fast if the Dockerfile does not copy + chmod this script at exactly that location with the correct user/group. Please double-check the Dockerfile layer (or add one) to COPY the script and RUN chmod +x.

libs/etl/load_postgresql.py (1)

3-4: 👍 Relative import switch looks correct
The new ..io.writer and ..config paths resolve cleanly one level up from libs/etl. Ensure libs/__init__.py and libs/io/__init__.py exist so the package is import-able by Glue.

libs/etl/load_documentdb.py (1)

3-4: Consistent relative imports adopted
Matches the pattern used in other ETL modules; improves portability inside zipped Glue job packages. No further action.

.gitignore (1)

202-210: LGTM! Good additions to .gitignore.

These are standard ignore patterns for macOS system files, Vim editor files, and common backup/temporary files. This improves repository hygiene by preventing accidental commits of temporary files.

libs/aws/s3.py (1)

5-5: LGTM! Consistent with relative import refactoring.

The change to relative imports aligns with the broader refactoring across the libs package.

terraform/locals.tf (1)

1-14: Excellent Terraform locals configuration.

This locals block follows best practices by:

  • Centralizing common tags for consistent resource tagging
  • Establishing a standardized naming convention with name_prefix
  • Creating reusable variables that promote DRY principles

The structure supports maintainable and consistent infrastructure deployment.
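
For readers who have not opened the file, a locals block of this shape typically looks like the following sketch (variable names and tag keys are assumptions, not copied from the PR):

# Illustrative shape only; names and tag keys assumed.
locals {
  name_prefix = "${var.project_name}-${var.environment}"

  common_tags = {
    Project     = var.project_name
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}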

README.md (2)

3-3: LGTM! Updated to reflect Terraform deployment.

The description accurately reflects the migration to Terraform-based infrastructure deployment.


25-25: Comprehensive documentation update for Terraform migration.

All references have been systematically updated from Serverless Framework to Terraform, maintaining consistency throughout the documentation. The simplified git clone installation approach is more straightforward than the previous npx serverless install method.

Also applies to: 32-32, 39-39, 75-75, 93-93, 99-99

libs/etl/load_s3.py (1)

7-19: Well-implemented S3 loading function.

The function demonstrates good practices:

  • Proper error handling for missing required parameters
  • Safe copying of config to avoid mutation
  • Correct S3 URI construction
  • Integration with the writer utility

The logic is sound and follows the established patterns in the codebase.

terraform/environments/README.md (1)

1-62: Excellent documentation for environment management!

This README provides comprehensive and well-structured guidance for managing environment-specific Terraform configurations. The inclusion of usage examples, step-by-step instructions for creating new environments, and variable precedence explanation makes it very user-friendly.

terraform/versions.tf (1)

1-22: Solid Terraform provider configuration!

The version constraints and provider configuration follow best practices. The AWS provider v5.x requirement aligns perfectly with the Glue v5 upgrade objective, and the default tags setup ensures consistent resource tagging across the infrastructure.
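
The pattern being praised generally looks like the sketch below; the exact pins and tag source in the PR may differ:

# Illustrative sketch; exact version constraints are assumptions.
terraform {
  required_version = ">= 1.5"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = local.common_tags
  }
}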

jobs/pyspark_hello_world.py (1)

9-40: Excellent logging enhancements!

The addition of structured logging throughout the job execution provides valuable observability for production monitoring. The logger setup and comprehensive logging at key stages (job start, data processing, S3 operations, completion) will greatly help with debugging and monitoring job performance.

terraform/scripts.tf (1)

4-28: Well-structured script upload configuration!

The S3 object resources are properly configured with appropriate etag handling for change detection, consistent tagging strategy, and clear organization. The use of filemd5() ensures scripts are only re-uploaded when changed, which is efficient for deployment pipelines.
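
The change-detection idiom mentioned here is roughly the following (resource name, bucket reference, and paths are assumptions for illustration):

# Sketch of the filemd5()/etag idiom; names and paths assumed.
resource "aws_s3_object" "pyspark_script" {
  bucket = module.s3_bucket.bucket_id
  key    = "jobs/pyspark_hello_world.py"
  source = "${path.module}/../jobs/pyspark_hello_world.py"

  # Re-uploads only when the local file's MD5 changes
  etag = filemd5("${path.module}/../jobs/pyspark_hello_world.py")

  tags = local.common_tags
}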

terraform/terraform.tfvars (1)

26-26: Verify AWS Glue 5.0 worker type compatibility

Please confirm that the specified worker type (G.1X) is supported by AWS Glue version 5.0. As of current public information, AWS hasn’t officially documented Glue 5.0’s available worker types. To ensure your Terraform configuration won’t fail at deployment:

• Check the AWS Glue 5.0 documentation or the AWS Console for the list of valid worker types.
• Update worker_type in terraform/terraform.tfvars if Glue 5.0 introduces new options or deprecates existing ones.

terraform/environments/dev.tfvars (3)

32-34: Excellent development-specific configurations.

The inclusion of --enable-spark-ui, --enable-metrics, and --job-bookmark-disable are appropriate for development environments as they enhance debugging capabilities and ensure clean test runs.


83-85: Appropriate logging configuration for development.

The shorter retention periods (7 and 3 days) and enabled crawler logs are well-suited for a development environment, balancing debugging needs with cost efficiency.


24-25: Confirm AWS Glue 5.0 support for “Standard” worker type
I searched the repo for any references to supported worker types and didn’t find guidance on using “Standard” with Spark jobs in Glue 5.0. Please verify against the official AWS Glue 5.0 documentation to ensure this worker type is valid for Spark workloads.

• File requiring review: terraform/environments/dev.tfvars (lines 24–25)
• Configuration to check:

worker_type       = "Standard"
number_of_workers = 2
terraform/environments/staging.tfvars (1)

49-56: Check Python-Shell worker class against Glue 5 docs

Standard is still accepted for Python-Shell, but Glue 5 introduced Z.0X workers with higher memory. If you need >1 GB RAM or Spark UI, consider switching:

worker_type = "Z.0X"

Confirm with the team’s sizing requirements before changing.

terraform/cloudwatch.tf (1)

23-35: Retention of operational logs is hard-coded

var.service_log_retention_days might differ from compliance requirements per env. Ensure the variable is surfaced in every tfvars (dev/staging/prod).

terraform/iam.tf (1)

118-121: Pin module source to a commit or tag

Relying on a floating version ~> 0.19 can introduce breaking changes.

source  = "cloudposse/iam-role/aws"
version = "0.19.2"  # or specific commit SHA
Dockerfile (2)

1-2: LGTM! Good platform support for multi-architecture.

The upgrade to AWS Glue 5.0 and explicit platform support for both amd64 and arm64 architectures is a good practice for cross-platform compatibility.


25-29: LGTM! Good version pinning for JupyterLab components.

Explicitly pinning versions for JupyterLab and related packages ensures consistent behavior across deployments.

docs/TERRAFORM_DEPLOYMENT.md (2)

117-130: LGTM! Good use of Terraform workspaces for environment management.

The documentation correctly demonstrates using Terraform workspaces with environment-specific variable files for proper environment isolation.


407-421: LGTM! Correct Terraform import syntax.

The import commands correctly handle the escaping needed for map keys in Terraform resource addresses.

terraform/outputs.tf (3)

8-11: LGTM! Good use of preconditions for output validation.

The precondition ensures the S3 bucket was created successfully before exposing its name, providing early failure detection.


70-88: LGTM! Well-structured output for Glue jobs.

The output elegantly merges both PySpark and Python shell jobs into a single map with consistent structure, making it easy for consumers to iterate over all jobs regardless of type.


144-152: LGTM! Proper handling of conditional resource outputs.

The outputs correctly return null for resources that aren't created when VPC is not configured, preventing confusion about resource availability.
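A rough sketch of the two patterns noted above, with names assumed rather than taken from the PR:

# Sketch only; output names, module references, and the count-based security group are assumptions.
output "glue_artifacts_bucket" {
  description = "S3 bucket holding Glue scripts and support files"
  value       = module.s3_bucket.bucket_id

  precondition {
    condition     = module.s3_bucket.bucket_id != ""
    error_message = "The Glue artifacts bucket was not created."
  }
}

output "glue_security_group_id" {
  description = "Security group for Glue connections, or null when no VPC is configured"
  value       = length(aws_security_group.glue) > 0 ? aws_security_group.glue[0].id : null
}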

terraform/glue_jobs.tf (1)

119-166: LGTM! Proper configuration for Python shell jobs.

The Python shell job configuration correctly omits Spark-specific settings while maintaining the same logging and monitoring capabilities as PySpark jobs.
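For context, a Python shell job omits worker_type/number_of_workers in favor of max_capacity; a minimal sketch (names and values assumed, not copied from the PR):

# Sketch of a Python shell Glue job without Spark-specific settings.
resource "aws_glue_job" "pythonshell_example" {
  name     = "${local.name_prefix}-pythonshell-hello-world" # assumed naming
  role_arn = aws_iam_role.glue_job.arn                      # assumed role reference

  command {
    name            = "pythonshell"
    script_location = "s3://${module.s3_bucket.bucket_id}/jobs/pythonshell_hello_world.py"
    python_version  = "3.9"
  }

  max_capacity = 1.0 # DPUs instead of worker_type/number_of_workers

  default_arguments = {
    "--enable-continuous-cloudwatch-log" = "true"
  }
}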

libs/config/config.py (2)

56-83: LGTM! Robust parameter retrieval with format flexibility.

The function handles AWS Glue's parameter naming conventions well, trying multiple formats to accommodate different naming styles.


85-98: LGTM! Good security practice for logging sensitive data.

The function appropriately masks sensitive values based on parameter names, preventing accidental exposure in logs.

terraform/variables.tf (1)

73-77: Default CIDR block is overly permissive
10.0.0.0/8 opens the security group to every private IP. Double-check if that is intentional; consider scoping to the actual database subnets.

Dockerfile Outdated
Comment on lines 16 to 22
# Install Python tools and extract requirements
COPY Pipfile Pipfile.lock ./
RUN pip3 install --upgrade pip && \
    pip3 install pipenv && \
    # Use the newer syntax for exporting requirements
    pipenv requirements > requirements.txt && \
    pip3 install -r requirements.txt

🛠️ Refactor suggestion

Optimize Python dependency installation process.

Installing pipenv just to export requirements adds unnecessary overhead to the Docker image. Consider pre-exporting requirements.txt in your build process.

Option 1 (Preferred): Pre-export requirements.txt in your CI/CD pipeline and copy it directly:

-# Install Python tools and extract requirements
-COPY Pipfile Pipfile.lock ./
-RUN pip3 install --upgrade pip && \
-    pip3 install pipenv && \
-    # Use the newer syntax for exporting requirements
-    pipenv requirements > requirements.txt && \
-    pip3 install -r requirements.txt
+# Install Python dependencies from pre-exported requirements
+COPY requirements.txt ./
+RUN pip3 install --upgrade pip && \
+    pip3 install -r requirements.txt

Option 2: Add error handling if keeping the current approach:

 RUN pip3 install --upgrade pip && \
     pip3 install pipenv && \
-    pipenv requirements > requirements.txt && \
+    pipenv requirements > requirements.txt || (echo "Failed to export requirements" && exit 1) && \
     pip3 install -r requirements.txt
🤖 Prompt for AI Agents
In Dockerfile lines 16 to 22, avoid installing pipenv inside the Docker image
just to export requirements.txt, as it adds unnecessary overhead. Instead,
pre-export the requirements.txt file during your CI/CD pipeline and copy it
directly into the image. If you must keep the current approach, add error
handling around the pipenv export command to prevent build failures and improve
robustness.

Comment on lines +34 to +37
# Install extra Spark drivers (PostgreSQL + MongoDB)
RUN mkdir -p /opt/spark/jars && \
    wget https://jdbc.postgresql.org/download/postgresql-42.2.12.jar -O /opt/spark/jars/postgresql-42.2.12.jar && \
    wget https://repo1.maven.org/maven2/org/mongodb/mongodb-driver-sync/5.5.1/mongodb-driver-sync-5.5.1.jar -O /opt/spark/jars/mongodb-driver-sync-5.5.1.jar

⚠️ Potential issue

Add checksum verification for downloaded JAR files.

Downloading JAR files without checksum verification poses a security risk. If the download URLs are compromised, malicious JARs could be introduced into your build.

Add checksum verification for the downloaded files:

-RUN mkdir -p /opt/spark/jars && \
-    wget https://jdbc.postgresql.org/download/postgresql-42.2.12.jar -O /opt/spark/jars/postgresql-42.2.12.jar && \
-    wget https://repo1.maven.org/maven2/org/mongodb/mongodb-driver-sync/5.5.1/mongodb-driver-sync-5.5.1.jar -O /opt/spark/jars/mongodb-driver-sync-5.5.1.jar
+RUN mkdir -p /opt/spark/jars && \
+    wget https://jdbc.postgresql.org/download/postgresql-42.2.12.jar -O /opt/spark/jars/postgresql-42.2.12.jar && \
+    echo "1f06e20e7bcbef36ebf539f0ed3e9b3c5337eeb9  /opt/spark/jars/postgresql-42.2.12.jar" | sha1sum -c - && \
+    wget https://repo1.maven.org/maven2/org/mongodb/mongodb-driver-sync/5.5.1/mongodb-driver-sync-5.5.1.jar -O /opt/spark/jars/mongodb-driver-sync-5.5.1.jar && \
+    echo "expected_sha1_here  /opt/spark/jars/mongodb-driver-sync-5.5.1.jar" | sha1sum -c -

Note: Replace expected_sha1_here with the actual SHA1 checksum of the MongoDB driver.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Install extra Spark drivers (PostgreSQL + MongoDB)
RUN mkdir -p /opt/spark/jars && \
    wget https://jdbc.postgresql.org/download/postgresql-42.2.12.jar -O /opt/spark/jars/postgresql-42.2.12.jar && \
    wget https://repo1.maven.org/maven2/org/mongodb/mongodb-driver-sync/5.5.1/mongodb-driver-sync-5.5.1.jar -O /opt/spark/jars/mongodb-driver-sync-5.5.1.jar
# Install extra Spark drivers (PostgreSQL + MongoDB)
RUN mkdir -p /opt/spark/jars && \
    wget https://jdbc.postgresql.org/download/postgresql-42.2.12.jar -O /opt/spark/jars/postgresql-42.2.12.jar && \
    echo "1f06e20e7bcbef36ebf539f0ed3e9b3c5337eeb9  /opt/spark/jars/postgresql-42.2.12.jar" | sha1sum -c - && \
    wget https://repo1.maven.org/maven2/org/mongodb/mongodb-driver-sync/5.5.1/mongodb-driver-sync-5.5.1.jar -O /opt/spark/jars/mongodb-driver-sync-5.5.1.jar && \
    echo "expected_sha1_here  /opt/spark/jars/mongodb-driver-sync-5.5.1.jar" | sha1sum -c -
🤖 Prompt for AI Agents
In Dockerfile lines 34 to 37, the downloaded PostgreSQL and MongoDB JAR files
lack checksum verification, which is a security risk. To fix this, after each
wget command, add a step to verify the SHA1 checksum of the downloaded file
using a tool like sha1sum and compare it against the expected checksum. If the
checksum does not match, the build should fail to prevent using potentially
compromised files. Obtain the correct SHA1 checksums for both JARs and include
them in the verification commands.

Comment on lines 78 to 83
connection_properties = {
JDBC_CONNECTION_URL = "jdbc:postgresql://localhost:5432/postgres"
USERNAME = "postgres"
PASSWORD = "password"
}


⚠️ Potential issue

Hard-coded JDBC password committed to VCS

Storing credentials in plain text violates security best practices and triggers secret-scanning tools.

-    JDBC_CONNECTION_URL = "jdbc:postgresql://localhost:5432/postgres"
-    USERNAME           = "postgres"
-    PASSWORD           = "password"
+    # Replace with reference to Secrets Manager / SSM
+    JDBC_CONNECTION_URL = data.aws_secretsmanager_secret_version.db_creds.secret_string["url"]
+    USERNAME            = data.aws_secretsmanager_secret_version.db_creds.secret_string["username"]
+    PASSWORD            = data.aws_secretsmanager_secret_version.db_creds.secret_string["password"]

Also add the secret data source and IAM permissions.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    connection_properties = {
      JDBC_CONNECTION_URL = "jdbc:postgresql://localhost:5432/postgres"
      USERNAME            = "postgres"
      PASSWORD            = "password"
    }
    connection_properties = {
      # Replace with reference to Secrets Manager / SSM
      JDBC_CONNECTION_URL = data.aws_secretsmanager_secret_version.db_creds.secret_string["url"]
      USERNAME            = data.aws_secretsmanager_secret_version.db_creds.secret_string["username"]
      PASSWORD            = data.aws_secretsmanager_secret_version.db_creds.secret_string["password"]
    }
🧰 Tools
🪛 Gitleaks (8.27.2)

81-81: Identified a HashiCorp Terraform password field, risking unauthorized infrastructure configuration and security breaches.

(hashicorp-tf-password)

🤖 Prompt for AI Agents
In terraform/networking.tf around lines 78 to 83, the JDBC password is
hard-coded in plain text, which is a security risk. Remove the hard-coded
password and instead reference it securely using a secret data source such as
AWS Secrets Manager or HashiCorp Vault. Additionally, update the Terraform
configuration to include the necessary IAM permissions to allow access to the
secret. This will ensure credentials are not exposed in version control and
comply with security best practices.
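
A minimal sketch of the data source the suggestion above relies on (the secret name is assumed). Note that secret_string is a raw JSON string, so individual fields are usually read with jsondecode():

# Assumed secret name for illustration; not part of the PR.
data "aws_secretsmanager_secret_version" "db_creds" {
  secret_id = "glue/postgres-connection"
}

locals {
  # Decode the JSON payload before indexing into fields
  db_creds = jsondecode(data.aws_secretsmanager_secret_version.db_creds.secret_string)
}

With that local in place, the connection properties would reference local.db_creds["username"] and friends, and the Glue job role additionally needs secretsmanager:GetSecretValue on that secret.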

  pythonshell_hello_world = {
    script_path    = "jobs/pythonshell_hello_world.py"
    type           = "pythonshell"
    python_version = "3.11"

🛠️ Refactor suggestion

Missing glue_version for Python shell job.

The Python shell job example is missing the glue_version parameter, which is required for consistency with the PySpark job configuration.

 pythonshell_hello_world = {
   script_path         = "jobs/pythonshell_hello_world.py"
   type               = "pythonshell"
+  glue_version       = "5.0"
   python_version     = "3.11"
🤖 Prompt for AI Agents
In terraform/terraform.tfvars at line 47, the Python shell job configuration is
missing the required glue_version parameter. Add a glue_version entry with the
appropriate version value to ensure consistency with the PySpark job
configuration and proper job execution.

Comment on lines +32 to +36
variable "bucket_name" {
description = "Name for the S3 bucket storing Glue artifacts"
type = string
default = "glue-artifacts"
}

🛠️ Refactor suggestion

Bucket name is not globally unique – expect deployment collisions
"glue-artifacts" will already exist in most accounts. Either (a) require callers to supply the bucket name, or (b) derive it dynamically with account/region suffixes to guarantee uniqueness.

-variable "bucket_name" {
-  description = "Name for the S3 bucket storing Glue artifacts"
-  type        = string
-  default     = "glue-artifacts"
-}
+# The caller must supply a globally-unique bucket name.  No default provided.
+variable "bucket_name" {
+  description = "Globally unique S3 bucket name to store Glue artifacts"
+  type        = string
+}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
variable "bucket_name" {
  description = "Name for the S3 bucket storing Glue artifacts"
  type        = string
  default     = "glue-artifacts"
}
variable "bucket_name" {
  description = "Globally unique S3 bucket name to store Glue artifacts"
  type        = string
}
🤖 Prompt for AI Agents
In terraform/variables.tf around lines 32 to 36, the bucket_name variable has a
default value "glue-artifacts" which is not globally unique and may cause
deployment collisions. To fix this, remove the default value so the bucket_name
must be supplied by the caller, or modify the default to dynamically include
unique identifiers like the AWS account ID and region as suffixes to ensure
uniqueness.
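
Option (b) could look roughly like the sketch below; the nullable variable and the local name are assumptions for illustration, not code from the PR:

# Sketch: derive a unique default bucket name from account and region.
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

variable "bucket_name" {
  description = "Optional override for the Glue artifacts bucket name"
  type        = string
  default     = null
}

locals {
  artifacts_bucket_name = coalesce(
    var.bucket_name,
    "glue-artifacts-${data.aws_caller_identity.current.account_id}-${data.aws_region.current.name}"
  )
}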


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
docs/TERRAFORM_DEPLOYMENT.md (1)

110-115: Still missing a safety check before executing the upload script

A prior review asked to verify that test_script_upload.sh exists before chmod / execution. That safeguard has not been applied, so running these commands in a wrong directory will fail silently.

-# Make the upload script executable
-chmod +x test_script_upload.sh
-
-# Upload scripts to S3
-./test_script_upload.sh
+# Verify the script exists
+if [ ! -f "test_script_upload.sh" ]; then
+  echo "Error: test_script_upload.sh not found. Ensure you are in the correct directory." >&2
+  exit 1
+fi
+
+# Make the upload script executable
+chmod +x test_script_upload.sh
+
+# Upload scripts to S3
+./test_script_upload.sh
🧹 Nitpick comments (1)
docs/TERRAFORM_DEPLOYMENT.md (1)

179-181: Malformed markdown heading

Lines 179-180 split the ## Migration from Serverless Framework heading over two lines, causing an empty “##” header. Merge them onto one line.

-##
- Migration from Serverless Framework
+## Migration from Serverless Framework
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c415134 and c48dddc.

📒 Files selected for processing (6)
  • Dockerfile (1 hunks)
  • docs/TERRAFORM_DEPLOYMENT.md (1 hunks)
  • terraform/networking.tf (1 hunks)
  • terraform/outputs.tf (1 hunks)
  • terraform/terraform.tfvars (1 hunks)
  • terraform/terraform.tfvars.example (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • terraform/terraform.tfvars
🚧 Files skipped from review as they are similar to previous changes (4)
  • terraform/networking.tf
  • terraform/terraform.tfvars.example
  • Dockerfile
  • terraform/outputs.tf
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: super-linter

Comment on lines +82 to +85
# Optional variables
glue_version = "4.0"
python_version = "3"
```

🛠️ Refactor suggestion

Default Glue version contradicts PR objective

The sample terraform.tfvars still recommends glue_version = "4.0", while this PR explicitly upgrades to Glue v5. Keeping an outdated default will mislead users and cause runtime errors (Glue 4 → Python 3.10, Glue 5 → Python 3.11).

-# Optional variables
-glue_version = "4.0"
+# Optional variables
+# Use Glue v5 unless you have a hard reason to stay on 4.x
+glue_version = "5.0"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Optional variables
glue_version = "4.0"
python_version = "3"
```
# Optional variables
# Use Glue v5 unless you have a hard reason to stay on 4.x
glue_version = "5.0"
python_version = "3"
🤖 Prompt for AI Agents
In docs/TERRAFORM_DEPLOYMENT.md around lines 82 to 85, update the default
glue_version value from "4.0" to "5.0" in the sample terraform.tfvars to align
with the PR's upgrade to Glue v5 and avoid user confusion or runtime errors.
