DAIVI : A Reference Solution with Terraform modules to accelerate development of Data, Analytics, AI and Visualization applications on AWS

Table of Contents

About DAIVI on AWS

DAIVI is a reference solution with Terraform modules to accelerate development of Data, Analytics, AI, and Visualization applications on AWS using the next-generation Amazon SageMaker platform. The goal of the DAIVI solution is to provide engineers with sample Terraform modules to build their enterprise data platform on AWS. The solution is being developed working backward from customer priorities.

The solution currently supports the first six modules marked in green in the solution roadmap below, which include:

  1. Identity Center modules for organization-level and account-level Identity Center instances, with users, groups, permissions, and IAM mapping
  2. Data Lake modules to create Hive tables and Iceberg tables, on both standard S3 buckets and S3 Table buckets
  3. SageMaker modules to provision domains and projects, add members to the projects, configure the lakehouse, and add compute
  4. SageMaker Catalog modules to support standard and custom assets, standard and custom data lineage, data quality, and data mesh
  5. DataZone modules to support standard assets, custom blueprints, standard and custom data lineage, data quality, and data mesh
  6. Data pipeline module for Glue batch pipelines, a data exploration module for Athena, and a data visualization module for QuickSight
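As a rough sketch of how modules like these might be consumed from a root configuration (the module source paths and variable names here are hypothetical, not the repository's actual interface):

```hcl
# Hypothetical consumption of DAIVI-style modules from a root Terraform
# configuration. Source paths and variable names are illustrative only.
module "identity_center" {
  source = "./modules/identity-center"

  instance_scope = "account"           # or "organization"
  groups         = ["Admins", "DataEngineers"]
}

module "data_lake" {
  source = "./modules/data-lake"

  table_format = "iceberg"             # or "hive"
  bucket_type  = "s3-table"            # or "standard"
}
```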

Solution Vision


Solution Architecture


Solution Components

Data Lakes

The solution provides modules to provision 1) Hive data lakes on standard S3 buckets, 2) Iceberg data lakes on standard S3 buckets, and 3) Iceberg data lakes on S3 Table buckets. In addition, it covers how customers can implement 1) transactional data lakes, 2) schema evolution, and 3) time-travel queries on Iceberg data lakes.
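A minimal sketch of the two storage flavors in Terraform (bucket and database names are placeholders; the S3 Tables resource assumes a recent AWS provider version with S3 Tables support):

```hcl
# A Glue database backing a Hive-style data lake on a standard S3 bucket.
resource "aws_s3_bucket" "lake" {
  bucket = "example-daivi-data-lake"   # hypothetical bucket name
}

resource "aws_glue_catalog_database" "hive" {
  name         = "daivi_hive_db"
  location_uri = "s3://${aws_s3_bucket.lake.id}/hive/"
}

# An S3 Table bucket for Iceberg tables.
resource "aws_s3tables_table_bucket" "iceberg" {
  name = "example-daivi-table-bucket"  # hypothetical bucket name
}
```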

Data Pipelines

The solution plans to support 1) batch data pipelines using Glue, 2) streaming data pipelines using Glue, and 3) streaming data pipelines using Flink. The initial release includes modules to provision batch data pipelines using Glue.
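A Glue batch job of the kind such a pipeline module would provision can be sketched as follows (the job name, script location, and IAM role reference are placeholders, not the repository's actual values):

```hcl
# A minimal Glue batch ETL job; the IAM role is assumed to be defined
# elsewhere with the AWSGlueServiceRole managed policy attached.
resource "aws_glue_job" "batch" {
  name     = "daivi-batch-pipeline"            # hypothetical name
  role_arn = aws_iam_role.glue.arn             # role defined elsewhere

  glue_version      = "4.0"
  worker_type       = "G.1X"
  number_of_workers = 2

  command {
    name            = "glueetl"
    script_location = "s3://example-scripts/etl.py"  # placeholder path
  }
}
```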

IAM Identity Center

The solution provides modules to provision IAM Identity Center instances at the organization or account level, create users and groups, and grant them the required permissions.
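In Terraform, the user-and-group portion of this might look like the following sketch (it assumes an Identity Center instance already exists; the group and user names are illustrative):

```hcl
# Look up the existing Identity Center instance's identity store.
data "aws_ssoadmin_instances" "this" {}

locals {
  identity_store_id = tolist(data.aws_ssoadmin_instances.this.identity_store_ids)[0]
}

resource "aws_identitystore_group" "engineers" {
  identity_store_id = local.identity_store_id
  display_name      = "DataEngineers"          # hypothetical group
}

resource "aws_identitystore_user" "jane" {
  identity_store_id = local.identity_store_id
  display_name      = "Jane Doe"               # hypothetical user
  user_name         = "jane.doe"

  name {
    given_name  = "Jane"
    family_name = "Doe"
  }
}

# Add the user to the group.
resource "aws_identitystore_group_membership" "jane_engineers" {
  identity_store_id = local.identity_store_id
  group_id          = aws_identitystore_group.engineers.group_id
  member_id         = aws_identitystore_user.jane.user_id
}
```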

SageMaker Unified Studio

The solution provides modules to provision SageMaker domains and projects, integrated with IAM Identity Center to support the various SageMaker domain and project roles. It provisions one domain and creates two projects: a producer and a consumer.
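Since SageMaker Unified Studio domains build on Amazon DataZone, the domain portion can be sketched with the DataZone resources (the domain name is a placeholder, and the execution role is assumed to be defined elsewhere):

```hcl
# A DataZone-backed domain underpinning a SageMaker Unified Studio setup.
# The execution role must trust the DataZone service.
resource "aws_datazone_domain" "studio" {
  name                  = "daivi-domain"                    # hypothetical
  domain_execution_role = aws_iam_role.domain_execution.arn # defined elsewhere
}
```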

SageMaker Lakehouse

The solution provides modules to configure SageMaker projects to load data lakes into SageMaker Lakehouse.

SageMaker Data Processing

The solution demonstrates how to implement data processing using Athena, Redshift, and EMR.
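For the Athena portion, a dedicated workgroup with a query-results location is the usual starting point; a sketch (the workgroup and bucket names are placeholders):

```hcl
# An Athena workgroup whose query results land in a dedicated bucket.
resource "aws_athena_workgroup" "exploration" {
  name = "daivi-exploration"                    # hypothetical name

  configuration {
    result_configuration {
      output_location = "s3://example-athena-results/"  # placeholder bucket
    }
  }
}
```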

SageMaker Catalog

The solution provides modules to configure SageMaker projects to create data sources, load assets from data lakes, and view data quality and data lineage. It also provides modules to create custom assets and custom lineage.
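The project side of this can be sketched with the DataZone project resource (it assumes a domain resource named `aws_datazone_domain.studio` is defined elsewhere in the configuration; the project name is illustrative):

```hcl
# A producer project inside an existing DataZone-backed domain.
resource "aws_datazone_project" "producer" {
  domain_identifier = aws_datazone_domain.studio.id  # domain defined elsewhere
  name              = "producer"                     # hypothetical name
}
```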

SageMaker Data Mesh

The solution demonstrates how to implement a data mesh between producer and consumer projects, with data governance enforced by the producer project.

SageMaker QuickSight

The solution demonstrates how to create visualizations in QuickSight using data from the data lakes.
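Connecting QuickSight to the data lakes typically goes through Athena; a sketch of the data source (the identifiers are placeholders, and the default `primary` workgroup is assumed):

```hcl
# A QuickSight data source that reads from the data lake via Athena.
resource "aws_quicksight_data_source" "lake" {
  data_source_id = "daivi-athena-source"  # hypothetical identifier
  name           = "daivi-athena-source"
  type           = "ATHENA"

  parameters {
    athena {
      work_group = "primary"
    }
  }
}
```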

Note: Additional reading about DAIVI architecture and design principles is available here.

Management Data Lake for Cost Analytics

Working backward from the needs of a customer, the solution implements a sample management data lake for cost analytics to demonstrate how to create Hive and Iceberg data lakes on S3 buckets and S3 table buckets, and how to implement batch data pipelines using Glue.
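The source end of that pipeline is a Cost and Usage Report definition; a sketch (report and bucket names are placeholders, and note that CUR report definitions must be created in us-east-1):

```hcl
# A Cost and Usage Report definition feeding the cost data lake.
# Parquet output plus ATHENA integration requires OVERWRITE_REPORT versioning.
resource "aws_cur_report_definition" "cost" {
  report_name                = "daivi-cur"          # hypothetical name
  time_unit                  = "DAILY"
  format                     = "Parquet"
  compression                = "Parquet"
  additional_schema_elements = ["RESOURCES"]
  s3_bucket                  = "example-cur-bucket" # placeholder bucket
  s3_region                  = "us-east-1"
  s3_prefix                  = "cur"
  additional_artifacts       = ["ATHENA"]
  report_versioning          = "OVERWRITE_REPORT"
}
```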

High level architecture


Data Lakes

All three data pipelines below receive data from different data sources and store it in Hive and Iceberg data lakes on standard S3 buckets or S3 Table buckets.

1. AWS Cost and Usage Data Lake

The solution uses a Glue batch data pipeline to load the AWS Cost and Usage Report into Hive and Iceberg data lakes.


2. AWS S3 Inventory Data Lake

The solution uses a Glue batch data pipeline to load AWS S3 Inventory reports into Hive and Iceberg data lakes.
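The inventory reports themselves come from an S3 inventory configuration on the source bucket; a sketch (both bucket names are placeholders and are assumed to exist):

```hcl
# An S3 inventory configuration publishing daily Parquet reports into a
# destination bucket, ready for a Glue pipeline to pick up.
resource "aws_s3_bucket_inventory" "daily" {
  bucket = "example-source-bucket"       # placeholder source bucket
  name   = "daivi-inventory"

  included_object_versions = "Current"

  schedule {
    frequency = "Daily"
  }

  destination {
    bucket {
      format     = "Parquet"
      bucket_arn = "arn:aws:s3:::example-inventory-bucket"  # placeholder
    }
  }
}
```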


3. Splunk Data Lake

The solution uses a Glue batch data pipeline to load operational data from a Splunk instance into Iceberg data lakes.


Next Steps

Contributors
