
feat(data-engineering): add data-engineering extension family #266

Open
jleto wants to merge 1 commit into awslabs:main from jleto:extensions-data-engineering



jleto commented May 14, 2026

Summary

  • Introduces an opt-in data-engineering extension family under aidlc-rules/aws-aidlc-rule-details/extensions/data-engineering/ with seven extensions: baseline, catalog, cicd, orchestration, redshift, s3-lakehouse, and glue-etl.
  • Each extension follows the established pattern: a rule file (<name>.md) plus an opt-in prompt file (<name>.opt-in.md) that uses the A/B/X answer format and an [Answer]: tag.
  • Rules carry P0 (blocking), P1 (warning), and P2 (advisory) severity tiers, defined in the baseline extension, plus verification bullets keyed to AIDLC stages (Requirements Analysis, Functional Design, Infrastructure Design, Code Generation, Build and Test, NFR Requirements).
  • glue-etl covers Glue as a compute surface (Spark, Spark Streaming, Ray, Python shell, interactive sessions, Studio, DataBrew). It complements the catalog and s3-lakehouse extensions, which govern the Glue Data Catalog and Lake Formation.
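The rule-file plus opt-in-file pairing described above can be illustrated with a small discovery routine. This is a minimal sketch, not the project's actual extensions loader: the helper name `discover_extensions` and the assumption that each `<name>.md` sits next to its `<name>.opt-in.md` in the same directory are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

OPT_IN_SUFFIX = ".opt-in.md"


def discover_extensions(root: Path) -> dict[str, tuple[Path, Path | None]]:
    """Map extension name -> (opt-in prompt file, rule file or None).

    Hypothetical helper: assumes <name>.opt-in.md and <name>.md live side by side.
    """
    found: dict[str, tuple[Path, Path | None]] = {}
    # Only opt-in prompts are discovered up front; rule files are just located.
    for opt_in in sorted(root.rglob("*" + OPT_IN_SUFFIX)):
        name = opt_in.name[: -len(OPT_IN_SUFFIX)]  # "glue-etl.opt-in.md" -> "glue-etl"
        rules = opt_in.with_name(name + ".md")
        # A missing rule file is reported as None so a loader can flag it.
        found[name] = (opt_in, rules if rules.exists() else None)
    return found
```

Pointed at aidlc-rules/aws-aidlc-rule-details/extensions/data-engineering/, a routine like this would be expected to yield seven entries (baseline, catalog, cicd, orchestration, redshift, s3-lakehouse, glue-etl), each with a non-None rule file.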

Test plan

  • Verify each *.opt-in.md file is discovered by the extensions loader at workflow start
  • Confirm that opting IN for an extension loads the corresponding rules file on demand
  • Confirm that opting OUT skips loading the full rules file
  • Run a sample inception flow that opts into baseline plus glue-etl, and validate that the rules surface during Requirements Analysis
  • Confirm composition references between extensions (e.g., s3-lakehouse S3LH-02 referenced by glue-etl GLUE-05) resolve as expected
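The opt-in and opt-out checks in the test plan reduce to reading the answer recorded after the [Answer]: tag and loading the rules file only when that answer opts in. The sketch below assumes the answer is a single letter on the same line as the tag and, by default, that `A` means "opt in"; the PR does not define what A, B, and X mean, so that mapping and the helper names `read_answer` and `maybe_load_rules` are hypothetical.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumption: the answer is written as "[Answer]: <letter>" on one line.
ANSWER_RE = re.compile(r"\[Answer\]:[ \t]*([A-Za-z])\b")


def read_answer(opt_in_text: str) -> str | None:
    """Return the upper-cased answer letter, or None if the prompt is unanswered."""
    match = ANSWER_RE.search(opt_in_text)
    return match.group(1).upper() if match else None


def maybe_load_rules(
    opt_in: Path,
    rules: Path,
    opt_in_letters: frozenset[str] = frozenset({"A"}),  # assumed meaning of "A"
) -> str | None:
    """Read the rules file only when the opt-in answer is in opt_in_letters."""
    answer = read_answer(opt_in.read_text(encoding="utf-8"))
    if answer in opt_in_letters:
        return rules.read_text(encoding="utf-8")
    return None  # opted out or unanswered: the full rules file is never read
```

Keeping the letter mapping as a parameter means the sketch does not hard-code an interpretation of B or X that the rules files may define differently.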

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

Introduce opt-in data-engineering extensions for AIDLC: baseline (cross-cutting
data pipeline rules), catalog (dataset registration and discovery), cicd
(version control and promotion), orchestration (MWAA and Step Functions),
redshift (provisioned and Serverless), s3-lakehouse (Iceberg/Delta on S3 with
Glue Catalog), and glue-etl (Spark, Streaming, Ray, Python shell, interactive
sessions, Studio, DataBrew). Each extension follows the rule + opt-in file
pattern with P0/P1/P2 severity tiers and AIDLC-stage-keyed verification.