From 554ac745641ad0090390f63fbc2340c5e3c91bdf Mon Sep 17 00:00:00 2001
From: automationdream
Date: Thu, 14 Aug 2025 23:40:38 +0200
Subject: [PATCH 1/2] Fix Docker Compose: update console image registry path

- Changed console image from docker.redpanda.com/vectorized/console:v2.2.3
  to redpandadata/console:v2.2.3 (same version, correct registry)
- Removed deprecated version field
- Maintained all original component versions
---
 docker-compose.yaml | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/docker-compose.yaml b/docker-compose.yaml
index cd55c96..1301e4d 100644
--- a/docker-compose.yaml
+++ b/docker-compose.yaml
@@ -1,4 +1,3 @@
-version: "3.7"
 services:
   redpanda:
     command:
@@ -26,7 +25,7 @@ services:
       - "19644:9644"
   console:
     container_name: redpanda-console
-    image: docker.redpanda.com/vectorized/console:v2.2.3
+    image: redpandadata/console:v2.2.3
     entrypoint: /bin/sh
     command: -c 'echo "$$CONSOLE_CONFIG_FILE" > /tmp/config.yml; /app/console'
     environment:

From d4830f8755936e41ed39e2290b76c174d8222d81 Mon Sep 17 00:00:00 2001
From: automationdream
Date: Fri, 15 Aug 2025 00:08:25 +0200
Subject: [PATCH 2/2] Enhance Readme.md with project status and detailed setup
 instructions

- Added project status section highlighting readiness for streaming processing.
- Expanded the "Project Setup and Data Loading" section with steps to compile
  the project and load data into Kafka.
- Included commands for running transactions and state producers, along with
  verification steps for data loading.
---
 Readme.md | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 65 insertions(+), 2 deletions(-)

diff --git a/Readme.md b/Readme.md
index 8d34a16..f1e7b98 100644
--- a/Readme.md
+++ b/Readme.md
@@ -7,11 +7,19 @@ Stream Processing with Apache Flink
 This repository contains the code for the book **[Stream Processing: Hands-on with Apache Flink](https://leanpub.com/streamprocessingwithapacheflink)**.
 
+> **🚀 Project Status: Ready for Streaming Processing!**
+>
+> ✅ **Infrastructure**: RedPanda (Kafka) + Flink cluster running
+> ✅ **Project**: Compiled and packaged
+> ✅ **Data Loaded**: 1M+ transactions, 5K+ customers, 4K+ accounts in Kafka topics
+> ✅ **Next Step**: Ready to run Flink streaming jobs
+
 ### Table of Contents
 1. [Environment Setup](#environment-setup)
-2. [Register UDF](#register-udf)
-3. [Deploy a JAR file](#deploy-a-jar-file)
+2. [Project Setup and Data Loading](#project-setup-and-data-loading)
+3. [Register UDF](#register-udf)
+4. [Deploy a JAR file](#deploy-a-jar-file)
 
 ### Environment Setup
 
@@ -35,6 +43,61 @@ or this command for kafka setup
 ```
 
+### Project Setup and Data Loading
+This section documents the steps to compile the project and load data into Kafka for streaming processing.
+
+#### 1. Compile the Project
+The project is a Maven-based Java application. To compile it:
+
+```shell
+# Compile only
+mvn clean compile
+
+# Compile and create the JAR file
+mvn clean package
+```
+
+#### 2. Load Data into Kafka
+After starting the infrastructure (RedPanda/Kafka + Flink), you need to populate the topics with data:
+
+**Run the Transactions Producer:**
+```shell
+mvn exec:java -Dexec.mainClass="io.streamingledger.producers.TransactionsProducer"
+```
+This will:
+- Load 1,000,000 transactions from `/data/transactions.csv`
+- Send them to the `transactions` topic
+- Use optimized producer settings (batch size: 64KB, compression: gzip)
+
+**Run the State Producer:**
+```shell
+mvn exec:java -Dexec.mainClass="io.streamingledger.producers.StateProducer"
+```
+This will:
+- Load 5,369 customers from `/data/customers.csv` → `customers` topic
+- Load 4,500 accounts from `/data/accounts.csv` → `accounts` topic
+
+#### 3. Verify Data Loading
+After running both producers, you should have:
+- **`transactions`** topic: 1,000,000 transaction records
+- **`customers`** topic: 5,369 customer records
+- **`accounts`** topic: 4,500 account records
+
+You can monitor the data flow through the RedPanda Console at `http://localhost:8080`.
+
+#### 4. Alternative Ways to Run
+```shell
+# Option 1: Using the Maven exec plugin (recommended)
+mvn exec:java -Dexec.mainClass="io.streamingledger.producers.TransactionsProducer"
+
+# Option 2: Using the compiled classes on the classpath
+java -cp target/classes:target/dependency/* io.streamingledger.producers.TransactionsProducer
+
+# Option 3: Using the packaged JAR
+java -jar target/spf-0.1.0.jar
+```
+
+
 ### Register UDF
 ```shell
 CREATE FUNCTION maskfn AS 'io.streamingledger.udfs.MaskingFn' LANGUAGE JAVA USING JAR '/opt/flink/jars/spf-0.1.0.jar';