Skip to content

Latest commit

 

History

History
71 lines (46 loc) · 2.49 KB

File metadata and controls

71 lines (46 loc) · 2.49 KB

Analytics

Athena

  • Used to query data in S3 using SQL queries

  • Interactive query service for analyzing data in Amazon S3 using SQL

  • Serverless; no need to manage infrastructure

  • Works with data formats like CSV, JSON, Parquet, and ORC

  • Integration with AWS Glue for data cataloging and schema discovery

Kinesis Family

  • Used for stream or real-time processing of data (such as stock prices or video streaming)

  • Real-time data streaming and analytics platform

  • Captures, processes, and stores streaming data

  • Loads streaming data into data lakes, warehouses, or analytics tools like Redshift and S3

  • Processes real-time streams using SQL queries

Quicksight

  • Used for business intelligence/analytics to visualize your data using reports and dashboard

  • Cloud-native business intelligence tool for data visualization and reporting

  • Supports interactive dashboards, machine learning insights, and collaboration

  • Directly integrates with AWS data sources like Redshift, Athena, and S3

  • Supports pay-per-session pricing for cost efficiency

Amazon Redshift

  • Fully managed data warehouse optimized for large-scale analytics
  • Columnar storage and advanced compression for high query performance
  • Integrates with S3 for data lake querying using Redshift Spectrum

AWS Glue

  • Serverless data integration and ETL (extract, transform, load) service
  • Manages metadata with a central data catalog
  • Auto-generates ETL scripts and supports Python and Scala

AWS Lake Formation

  • Simplifies the creation of secure data lakes
  • Automates data ingestion, transformation, and security policy application
  • Integrates with S3 and other analytics services like Athena and Redshift

EMR (Elastic MapReduce)

  • A hosted hadoop framework that also supports Spark, Hive and Presto

  • Managed service for big data processing using frameworks like Apache Spark, Hadoop, and Presto

  • Scales to process petabytes of data

  • Cost-effective with spot instances and auto-scaling

Amazon OpenSearch Service

  • Managed service for search, log analytics, and real-time monitoring
  • Fully integrates with Kinesis, CloudWatch, and Lambda
  • Formerly known as Amazon Elasticsearch Service

AWS Data Pipeline

  • Workflow orchestration for data movement and transformation
  • Automates ETL processes and integrates with various AWS services like S3, Redshift, and DynamoDB

AWS Step Functions

  • Coordinates distributed applications and analytics workflows
  • Used to orchestrate analytics jobs across services like Glue, Lambda, and EMR