Welcome to the C++ Data Pipeline Project! This project simulates a multi-stage data pipeline in modern C++, with stages communicating through a custom thread-safe queue. It demonstrates real-world concepts such as multithreading, synchronization, the producer-consumer pattern, and data transformation.
- Three pipeline stages with thread-safe communication between them:
  - Extraction: read or generate raw data
  - Transformation: modify or clean the data
  - Loading: save or process the final data
- Custom ThreadSafeQueue for safe data sharing between threads
- Condition variables, so waiting threads block instead of busy-waiting and wake up when data arrives
- Scalable design for future extensions
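The queue at the heart of this design can be sketched as follows. This is a minimal illustration, not the project's actual `thread_safe_queue.hpp`: the class name `ThreadSafeQueue` comes from the feature list above, but the `push`/`pop`/`close` methods and the use of `std::optional` (C++17) to signal shutdown are assumptions. A mutex guards a `std::queue`, and a condition variable lets consumers sleep until an item arrives or the queue is closed.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <utility>

// Minimal sketch of a blocking, thread-safe FIFO queue (hypothetical API).
template <typename T>
class ThreadSafeQueue {
public:
    // Enqueue an item, then wake one waiting consumer.
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(value));
        }
        cond_.notify_one();  // notify after releasing the lock
    }

    // Signal that no more items will be pushed; wakes every waiting consumer.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cond_.notify_all();
    }

    // Block until an item is available or the queue is closed and drained.
    // Returns std::nullopt only in the closed-and-empty case.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate overload re-checks the condition after spurious wakeups.
        cond_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) {
            return std::nullopt;
        }
        std::optional<T> item{std::move(queue_.front())};
        queue_.pop();
        return item;
    }

private:
    std::queue<T> queue_;
    std::mutex mutex_;
    std::condition_variable cond_;
    bool closed_ = false;
};
```

A consumer stage can then loop with `while (auto item = queue.pop()) { ... }`, which ends once the producer has called `close()` and the remaining items have been drained.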
- /include
  - pipeline_manager.hpp // Manages pipeline threads
  - thread_safe_queue.hpp // Thread-safe queue implementation
- /source
  - pipeline_manager.cpp // Pipeline execution logic
  - main.cpp // Program entry point
- /_pipeline_build (generated by CMake)
- CMakeLists.txt // Build configuration
- README.md // This file
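`pipeline_manager.hpp` is described as managing the pipeline threads. One plausible shape for that role is sketched below. The `PipelineManager` class and its `add_stage`/`run` methods are hypothetical names, not the project's real interface. Each registered stage runs on its own `std::thread`, and `run()` joins them all before returning.

```cpp
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch: launches one std::thread per pipeline stage.
class PipelineManager {
public:
    // Register a stage; it does not start until run() is called.
    void add_stage(std::function<void()> stage) {
        stages_.push_back(std::move(stage));
    }

    // Start every stage on its own thread, then block until all have finished.
    void run() {
        std::vector<std::thread> threads;
        threads.reserve(stages_.size());
        for (auto& stage : stages_) {
            threads.emplace_back(stage);  // the thread gets its own copy of the callable
        }
        for (auto& t : threads) {
            t.join();  // wait for each stage to complete
        }
    }

private:
    std::vector<std::function<void()>> stages_;
};
```

In a full pipeline, each stage callable would typically loop: pop from its input queue, process the item, push to its output queue, and close that output queue once its input is exhausted, so that shutdown propagates from extraction through to loading.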
- Clone the repository:
git clone https://github.com/arrowten/Ingestor.git
cd Ingestor
- Build:
sh build.sh
- Run:
sh run.sh
Sample output:
[INFO]: Extracted: data item
[INFO]: Transformed: DATA ITEM
Loaded: DATA ITEM
[INFO]: Extracted: data item
[INFO]: Transformed: DATA ITEM
Loaded: DATA ITEM
...
Each item is extracted, transformed (uppercased in this example), and loaded, with the stages running concurrently on separate threads.
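Judging by the sample output, the transformation stage converts each item to uppercase (`data item` becomes `DATA ITEM`). A hedged sketch of such a step follows. `transform_item` is a hypothetical name, and the project's real transformation may differ.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical transformation step: returns an uppercased copy of the item.
std::string transform_item(std::string item) {
    std::transform(item.begin(), item.end(), item.begin(),
                   [](unsigned char c) {
                       // Cast to unsigned char first: std::toupper requires a value
                       // representable as unsigned char (or EOF).
                       return static_cast<char>(std::toupper(c));
                   });
    return item;
}
```

The `unsigned char` parameter matters: passing a negative `char` (possible for non-ASCII bytes where `char` is signed) to `std::toupper` is undefined behavior.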
- C++17/20 (Modern C++ features)
- CMake (for building)
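- C++ standard library threading (std::thread, std::mutex, std::condition_variable), typically implemented on top of POSIX threads on Linux and macOS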