DAIVI: A Reference Solution with Terraform modules to accelerate development of Data, Analytics, AI and Visualization applications on AWS
- About DAIVI
- Management Data Lake for Cost Analytics
- Deploying DAIVI
- Exploring DAIVI
- Security Recommendations
- Contributors
DAIVI is a reference solution with Terraform modules to accelerate development of Data, Analytics, AI and Visualization applications on AWS using the next generation "Amazon SageMaker" platform. The goal of the DAIVI solution is to provide engineers with sample Terraform modules to build their enterprise data platform on AWS. The solution is being developed working backward from customer priorities.
The solution currently supports the first six modules, marked in green in the solution roadmap below, which include:
- Identity Center modules for Organization Level and Account Level Identity Centers with Users, Groups, Permissions and IAM Mapping
- Data Lake modules to create Hive Tables and Iceberg Tables, both on standard S3 bucket and S3 Table bucket
- SageMaker modules to provision domains and projects, add members to the projects, configure the lakehouse and add compute
- SageMaker Catalog modules to support standard assets, custom assets, standard and custom data lineage, data quality and data mesh
- DataZone modules to support standard assets, custom blueprints, standard and custom data lineage, data quality and data mesh
- Data pipeline module for Glue batch pipelines, data exploration module for Athena, and data visualization module for QuickSight.
The solution supports modules to provision 1) Hive data lakes on standard S3 buckets, 2) Iceberg data lakes on standard S3 buckets and 3) Iceberg data lakes on S3 table buckets. In addition, it covers how customers can implement 1) transactional data lakes, 2) schema evolution and 3) time travel queries on Iceberg data lakes.
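As a rough illustration of the Iceberg-on-standard-S3 case and of a time travel query, the sketch below registers an Iceberg table in the Glue Data Catalog and saves an Athena query that reads the table as it was one day earlier. It is not taken from the DAIVI modules: the database name, table name, columns, and the `data_lake_bucket` variable are all hypothetical, and the bucket is assumed to exist already.

```hcl
# Illustrative sketch only. "cost_analytics", "cur_iceberg", the columns and
# var.data_lake_bucket are hypothetical names, not DAIVI module inputs.
variable "data_lake_bucket" {
  description = "Name of an existing standard S3 bucket used for the data lake"
  type        = string
}

resource "aws_glue_catalog_database" "lake" {
  name = "cost_analytics"
}

# Iceberg table registered in the Glue Data Catalog, stored on a standard S3 bucket.
resource "aws_glue_catalog_table" "cur_iceberg" {
  name          = "cur_iceberg"
  database_name = aws_glue_catalog_database.lake.name
  table_type    = "EXTERNAL_TABLE"

  open_table_format_input {
    iceberg_input {
      metadata_operation = "CREATE"
      version            = "2"
    }
  }

  storage_descriptor {
    location = "s3://${var.data_lake_bucket}/iceberg/cur/"

    columns {
      name = "usage_date"
      type = "date"
    }
    columns {
      name = "service"
      type = "string"
    }
    columns {
      name = "unblended_cost"
      type = "double"
    }
  }
}

# Saved Athena query demonstrating Iceberg time travel:
# it reads the table state as of 24 hours before the query runs.
resource "aws_athena_named_query" "time_travel" {
  name     = "cur-iceberg-time-travel"
  database = aws_glue_catalog_database.lake.name
  query    = <<-SQL
    SELECT service, SUM(unblended_cost) AS cost
    FROM cur_iceberg
    FOR TIMESTAMP AS OF (current_timestamp - interval '1' day)
    GROUP BY service
  SQL
}
```

Athena also accepts `FOR VERSION AS OF <snapshot_id>` to query a specific Iceberg snapshot instead of a point in time.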
The solution plans to support 1) batch data pipelines using Glue, 2) streaming data pipelines using Glue and 3) streaming data pipelines using Flink. The initial release supports modules to provision batch data pipelines using Glue.
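A Glue batch pipeline of this kind can be sketched in Terraform as a job plus a scheduled trigger. The example below is a minimal outline under stated assumptions, not the DAIVI module itself: the job name, script path, schedule, and the `glue_role_arn` and `script_bucket` variables are hypothetical, and the IAM role and ETL script are assumed to exist.

```hcl
# Illustrative sketch only. Names, the script path and both variables are
# hypothetical and are not inputs of the DAIVI modules.
variable "glue_role_arn" {
  description = "ARN of an existing IAM role that Glue can assume"
  type        = string
}

variable "script_bucket" {
  description = "Name of an existing S3 bucket holding the ETL script"
  type        = string
}

# Batch ETL job; the script itself is assumed to be uploaded separately.
resource "aws_glue_job" "batch_load" {
  name              = "daivi-example-batch-load"
  role_arn          = var.glue_role_arn
  glue_version      = "4.0"
  worker_type       = "G.1X"
  number_of_workers = 2

  command {
    name            = "glueetl"
    script_location = "s3://${var.script_bucket}/scripts/batch_load.py"
    python_version  = "3"
  }

  default_arguments = {
    # Enables Glue's built-in Iceberg support for jobs that write Iceberg tables.
    "--datalake-formats" = "iceberg"
    "--job-language"     = "python"
  }
}

# Runs the job once a day at 02:00 UTC.
resource "aws_glue_trigger" "nightly" {
  name     = "daivi-example-nightly"
  type     = "SCHEDULED"
  schedule = "cron(0 2 * * ? *)"

  actions {
    job_name = aws_glue_job.batch_load.name
  }
}
```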
The solution supports modules to provision IAM Identity Center instances, at organization level or account level, create users and groups, and grant them the required permissions.
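For the user and group part, a minimal Terraform sketch might look like the following. It assumes an IAM Identity Center instance is already enabled in the account or organization and looks it up rather than creating it. The group name, user details, and email address are placeholders, and granting permissions (permission sets and account assignments) is not shown.

```hcl
# Illustrative sketch only. Assumes an IAM Identity Center instance already
# exists. The group and user values are placeholders, not DAIVI defaults.
data "aws_ssoadmin_instances" "this" {}

locals {
  identity_store_id = tolist(data.aws_ssoadmin_instances.this.identity_store_ids)[0]
}

resource "aws_identitystore_group" "data_engineers" {
  identity_store_id = local.identity_store_id
  display_name      = "data-engineers"
  description       = "Example group for data platform engineers"
}

resource "aws_identitystore_user" "example" {
  identity_store_id = local.identity_store_id
  display_name      = "Jane Doe"
  user_name         = "jane.doe"

  name {
    given_name  = "Jane"
    family_name = "Doe"
  }

  emails {
    value   = "jane.doe@example.com"
    primary = true
  }
}

# Adds the example user to the example group.
resource "aws_identitystore_group_membership" "example" {
  identity_store_id = local.identity_store_id
  group_id          = aws_identitystore_group.data_engineers.group_id
  member_id         = aws_identitystore_user.example.user_id
}
```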
The solution supports modules to provision SageMaker domains and SageMaker projects, integrated with IAM Identity Center to support the various SageMaker domain and project roles. It provisions one domain and creates two projects, a producer and a consumer.
The solution supports modules to configure SageMaker projects to load data lakes into SageMaker Lakehouse.
The solution demonstrates how to implement data processing using Athena, Redshift and EMR.
The solution supports modules to configure SageMaker projects to create data sources and load assets from data lakes, and to view data quality and data lineage. It also supports modules to create custom assets and custom lineage.
The solution demonstrates how to implement a data mesh between a producer project and a consumer project, with data governance enforced by the producer project.
The solution demonstrates how to create visualizations in QuickSight using data from the data lakes.
Note: Additional reading about the DAIVI architecture and design principles is available here.
Working backward from the needs of a customer, the solution implements a sample management data lake for cost analytics to demonstrate how to create Hive and Iceberg data lakes on S3 buckets and S3 table buckets, and how to implement batch data pipelines using Glue.
All three data pipelines below receive data from different data sources and store it in Hive and Iceberg data lakes, using standard S3 buckets or S3 table buckets.
The solution uses a Glue batch data pipeline to load the AWS Cost and Usage Report into Hive and Iceberg data lakes.
The solution uses a Glue batch data pipeline to load the S3 Inventory report into Hive and Iceberg data lakes.
The solution uses a Glue batch data pipeline to load operational data from a Splunk instance into Iceberg data lakes.
- Deploying DAIVI
- Exploring DAIVI
- Security Recommendations
- Additional Resources: