This repository documents my experiments with a range of Big Data technologies and infrastructure components. The goal is to understand how these tools work, both individually and together, within a modern data platform.
- Apache Hadoop – Distributed storage and processing
- Apache Hive – Data warehousing on top of Hadoop
- PostgreSQL – Relational database for metadata and integration
- Apache Spark – Fast, in-memory data processing engine
- Apache Kylin – OLAP engine for low-latency analytics on large datasets
- Apache HBase – Column-oriented NoSQL database on top of HDFS
- Apache Kafka – Distributed messaging and streaming platform
- Cloudflare – Edge networking and security (DNS, caching, etc.)
- Docker Compose – Local container orchestration
- Kubernetes – Scalable container orchestration for cloud-native deployments
- MinIO – High-performance, S3-compatible object storage for cloud-native data infrastructure