DAIVI : A Reference Solution with Terraform modules to accelerate development of Data, Analytics, AI and Visualization applications on AWS

Table of Contents

About DAIVI on AWS

DAIVI is a reference solution with Terraform modules to accelerate development of Data, Analytics, AI, and Visualization applications on AWS using the next-generation Amazon SageMaker platform. The goal of the DAIVI solution is to provide engineers with sample Terraform modules to build their enterprise data platform on AWS. The solution is being developed working backward from customer priorities.

The solution currently supports the first six modules marked in green in the solution roadmap below, which include:

  1. Identity Center modules for organization-level and account-level Identity Center instances, with users, groups, permissions, and IAM mapping
  2. Data Lake modules to create Hive tables and Iceberg tables, on both standard S3 buckets and S3 Table buckets
  3. SageMaker modules to provision domains and projects, add members to the projects, configure the lakehouse, and add compute
  4. SageMaker Catalog modules to support standard and custom assets, standard and custom data lineage, data quality, and data mesh
  5. DataZone modules to support standard assets, custom blueprints, standard and custom data lineage, data quality, and data mesh
  6. Data pipeline module for Glue batch pipelines, a data exploration module for Athena, and a data visualization module for QuickSight
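As a rough sketch of how modules like these might be consumed from a root configuration (the module source paths and variable names here are hypothetical, not the repository's actual interface):

```hcl
# Hypothetical consumption of DAIVI-style modules from a root Terraform
# configuration. Source paths and variable names are illustrative only.
module "identity_center" {
  source = "./modules/identity-center"

  instance_scope = "account"           # or "organization"
  groups         = ["Admins", "DataEngineers"]
}

module "data_lake" {
  source = "./modules/data-lake"

  table_format = "iceberg"             # or "hive"
  bucket_type  = "s3-table"            # or "standard"
}
```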

Solution Vision


Solution Architecture


Solution Components

Data Lakes

The solution provides modules to provision 1) Hive data lakes on standard S3 buckets, 2) Iceberg data lakes on standard S3 buckets, and 3) Iceberg data lakes on S3 Table buckets. In addition, it covers how customers can implement 1) transactional data lakes, 2) schema evolution, and 3) time-travel queries on Iceberg data lakes.
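A minimal sketch of the two storage flavors in Terraform (bucket and database names are placeholders; the S3 Tables resource assumes a recent AWS provider version with S3 Tables support):

```hcl
# A Glue database backing a Hive-style data lake on a standard S3 bucket.
resource "aws_s3_bucket" "lake" {
  bucket = "example-daivi-data-lake"   # hypothetical bucket name
}

resource "aws_glue_catalog_database" "hive" {
  name         = "daivi_hive_db"
  location_uri = "s3://${aws_s3_bucket.lake.id}/hive/"
}

# An S3 Table bucket for Iceberg tables.
resource "aws_s3tables_table_bucket" "iceberg" {
  name = "example-daivi-table-bucket"  # hypothetical bucket name
}
```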

Data Pipelines

The solution plans to support 1) batch data pipelines using Glue, 2) streaming data pipelines using Glue, and 3) streaming data pipelines using Flink. The initial release includes modules to provision batch data pipelines using Glue.
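A Glue batch job of the kind such a pipeline module would provision can be sketched as follows (the job name, script location, and IAM role reference are placeholders, not the repository's actual values):

```hcl
# A minimal Glue batch ETL job; the IAM role is assumed to be defined
# elsewhere with the AWSGlueServiceRole managed policy attached.
resource "aws_glue_job" "batch" {
  name     = "daivi-batch-pipeline"            # hypothetical name
  role_arn = aws_iam_role.glue.arn             # role defined elsewhere

  glue_version      = "4.0"
  worker_type       = "G.1X"
  number_of_workers = 2

  command {
    name            = "glueetl"
    script_location = "s3://example-scripts/etl.py"  # placeholder path
  }
}
```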

IAM Identity Center

The solution provides modules to provision IAM Identity Center instances at the organization or account level, create users and groups, and grant them the required permissions.
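In Terraform, the user-and-group portion of this might look like the following sketch (it assumes an Identity Center instance already exists; the group and user names are illustrative):

```hcl
# Look up the existing Identity Center instance's identity store.
data "aws_ssoadmin_instances" "this" {}

locals {
  identity_store_id = tolist(data.aws_ssoadmin_instances.this.identity_store_ids)[0]
}

resource "aws_identitystore_group" "engineers" {
  identity_store_id = local.identity_store_id
  display_name      = "DataEngineers"          # hypothetical group
}

resource "aws_identitystore_user" "jane" {
  identity_store_id = local.identity_store_id
  display_name      = "Jane Doe"               # hypothetical user
  user_name         = "jane.doe"

  name {
    given_name  = "Jane"
    family_name = "Doe"
  }
}

# Add the user to the group.
resource "aws_identitystore_group_membership" "jane_engineers" {
  identity_store_id = local.identity_store_id
  group_id          = aws_identitystore_group.engineers.group_id
  member_id         = aws_identitystore_user.jane.user_id
}
```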

SageMaker Unified Studio

The solution provides modules to provision SageMaker domains and projects, integrated with IAM Identity Center to support the various SageMaker domain and project roles. It provisions one domain and creates two projects: a producer and a consumer.
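Since SageMaker Unified Studio domains build on Amazon DataZone, the domain portion can be sketched with the DataZone resources (the domain name is a placeholder, and the execution role is assumed to be defined elsewhere):

```hcl
# A DataZone-backed domain underpinning a SageMaker Unified Studio setup.
# The execution role must trust the DataZone service.
resource "aws_datazone_domain" "studio" {
  name                  = "daivi-domain"                    # hypothetical
  domain_execution_role = aws_iam_role.domain_execution.arn # defined elsewhere
}
```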

SageMaker Lakehouse

The solution provides modules to configure SageMaker projects to load data lakes into SageMaker Lakehouse.

SageMaker Data Processing

The solution demonstrates how to implement data processing using Athena, Redshift, and EMR.
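For the Athena portion, a dedicated workgroup with a query-results location is the usual starting point; a sketch (the workgroup and bucket names are placeholders):

```hcl
# An Athena workgroup whose query results land in a dedicated bucket.
resource "aws_athena_workgroup" "exploration" {
  name = "daivi-exploration"                    # hypothetical name

  configuration {
    result_configuration {
      output_location = "s3://example-athena-results/"  # placeholder bucket
    }
  }
}
```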

SageMaker Catalog

The solution provides modules to configure SageMaker projects to create data sources, load assets from data lakes, and view data quality and data lineage. It also provides modules to create custom assets and custom lineage.
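The project side of this can be sketched with the DataZone project resource (it assumes a domain resource named `aws_datazone_domain.studio` is defined elsewhere in the configuration; the project name is illustrative):

```hcl
# A producer project inside an existing DataZone-backed domain.
resource "aws_datazone_project" "producer" {
  domain_identifier = aws_datazone_domain.studio.id  # domain defined elsewhere
  name              = "producer"                     # hypothetical name
}
```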

SageMaker Data Mesh

The solution demonstrates how to implement a data mesh between producer and consumer projects, with data governance enforced by the producer project.

SageMaker QuickSight

The solution demonstrates how to create visualizations in QuickSight using data from the data lakes.
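Connecting QuickSight to the data lakes typically goes through Athena; a sketch of the data source (the identifiers are placeholders, and the default `primary` workgroup is assumed):

```hcl
# A QuickSight data source that reads from the data lake via Athena.
resource "aws_quicksight_data_source" "lake" {
  data_source_id = "daivi-athena-source"  # hypothetical identifier
  name           = "daivi-athena-source"
  type           = "ATHENA"

  parameters {
    athena {
      work_group = "primary"
    }
  }
}
```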

Note: Additional reading about DAIVI architecture and design principles is available here.

Management Data Lake for Cost Analytics

Working backward from the needs of a customer, the solution implements a sample management data lake for cost analytics to demonstrate how to create Hive and Iceberg data lakes on S3 buckets and S3 table buckets, and how to implement batch data pipelines using Glue.
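The source end of that pipeline is a Cost and Usage Report definition; a sketch (report and bucket names are placeholders, and note that CUR report definitions must be created in us-east-1):

```hcl
# A Cost and Usage Report definition feeding the cost data lake.
# Parquet output plus ATHENA integration requires OVERWRITE_REPORT versioning.
resource "aws_cur_report_definition" "cost" {
  report_name                = "daivi-cur"          # hypothetical name
  time_unit                  = "DAILY"
  format                     = "Parquet"
  compression                = "Parquet"
  additional_schema_elements = ["RESOURCES"]
  s3_bucket                  = "example-cur-bucket" # placeholder bucket
  s3_region                  = "us-east-1"
  s3_prefix                  = "cur"
  additional_artifacts       = ["ATHENA"]
  report_versioning          = "OVERWRITE_REPORT"
}
```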

High level architecture


Data Lakes

All three data pipelines below receive data from different data sources and store it in Hive and Iceberg data lakes on standard S3 buckets or S3 Table buckets.

1. AWS Cost and Usage Data Lake

The solution uses a Glue batch data pipeline to load the AWS Cost and Usage Report into Hive and Iceberg data lakes.


2. AWS S3 Inventory Data Lake

The solution uses a Glue batch data pipeline to load AWS S3 Inventory reports into Hive and Iceberg data lakes.
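The inventory reports themselves come from an S3 inventory configuration on the source bucket; a sketch (both bucket names are placeholders and are assumed to exist):

```hcl
# An S3 inventory configuration publishing daily Parquet reports into a
# destination bucket, ready for a Glue pipeline to pick up.
resource "aws_s3_bucket_inventory" "daily" {
  bucket = "example-source-bucket"       # placeholder source bucket
  name   = "daivi-inventory"

  included_object_versions = "Current"

  schedule {
    frequency = "Daily"
  }

  destination {
    bucket {
      format     = "Parquet"
      bucket_arn = "arn:aws:s3:::example-inventory-bucket"  # placeholder
    }
  }
}
```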


3. Splunk Data Lake

The solution uses a Glue batch data pipeline to load operational data from a Splunk instance into Iceberg data lakes.


Next Steps

Contributors
