67 changes: 65 additions & 2 deletions Readme.md
@@ -7,11 +7,19 @@ Stream Processing with Apache Flink

This repository contains the code for the book **[Stream Processing: Hands-on with Apache Flink](https://leanpub.com/streamprocessingwithapacheflink)**.

> **🚀 Project Status: Ready for Stream Processing!**
>
> ✅ **Infrastructure**: RedPanda (Kafka) + Flink cluster running
> ✅ **Project**: Compiled and packaged
> ✅ **Data Loaded**: 1M+ transactions, 5K+ customers, 4K+ accounts in Kafka topics
> ✅ **Next Step**: Ready to run Flink streaming jobs


### Table of Contents
1. [Environment Setup](#environment-setup)
2. [Project Setup and Data Loading](#project-setup-and-data-loading)
3. [Register UDF](#register-udf)
4. [Deploy a JAR file](#deploy-a-jar-file)


### Environment Setup
@@ -35,6 +43,61 @@ or this command for kafka setup


### Project Setup and Data Loading
This section documents the steps to compile the project and load data into Kafka for stream processing.

#### 1. Compile the Project
The project is a Maven-based Java application. To compile it:

```shell
# Compile only
mvn clean compile

# Compile and create JAR file
mvn clean package
```

#### 2. Load Data into Kafka
After starting the infrastructure (RedPanda/Kafka + Flink), you need to populate the topics with data:

**Run Transactions Producer:**
```shell
mvn exec:java -Dexec.mainClass="io.streamingledger.producers.TransactionsProducer"
```
This will:
- Load 1,000,000 transactions from `/data/transactions.csv`
- Send them to the `transactions` topic
- Use optimized producer settings (batch size: 64 KB, compression: gzip), as sketched below
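
For context, here is a minimal sketch of a producer configured with the settings described above. This is illustrative only, not the repository's actual `TransactionsProducer` code, and the bootstrap address is an assumption you should adjust to your setup:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerConfigSketch {
    public static KafkaProducer<String, String> create() {
        Properties props = new Properties();
        // Assumption: adjust the bootstrap address to your RedPanda port mapping
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:19092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // The settings described above: 64 KB batches, gzip compression
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024);
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip");
        return new KafkaProducer<>(props);
    }
}
```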

**Run State Producer:**
```shell
mvn exec:java -Dexec.mainClass="io.streamingledger.producers.StateProducer"
```
This will:
- Load 5,369 customers from `/data/customers.csv` → `customers` topic
- Load 4,500 accounts from `/data/accounts.csv` → `accounts` topic (both loads follow the pattern sketched below)
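
Both producers follow the same read-and-send shape. A hedged sketch of that loop, where the CSV layout and the key choice (first column as record key) are assumptions rather than the book's exact code:

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CsvLoaderSketch {
    public static void load(KafkaProducer<String, String> producer,
                            String csvPath, String topic) throws Exception {
        List<String> lines = Files.readAllLines(Paths.get(csvPath));
        // Skip the header row; key each record by its first column (assumed to be the ID)
        for (String line : lines.subList(1, lines.size())) {
            String key = line.split(",")[0];
            producer.send(new ProducerRecord<>(topic, key, line));
        }
        producer.flush();
    }
}
```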

#### 3. Verify Data Loading
After running both producers, you should have:
- **`transactions`** topic: 1,000,000 transaction records
- **`customers`** topic: 5,369 customer records
- **`accounts`** topic: 4,500 account records

You can monitor the data flow through the RedPanda Console at `http://localhost:8080`.
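
If you prefer to verify the counts programmatically, a small sketch that sums each topic's end offsets. This equals the record count only for topics that were never compacted or truncated, and the bootstrap address is again an assumption:

```java
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TopicCountsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:19092"); // assumption
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            for (String topic : List.of("transactions", "customers", "accounts")) {
                // Sum the end offsets across all partitions of the topic
                List<TopicPartition> parts = consumer.partitionsFor(topic).stream()
                        .map(p -> new TopicPartition(topic, p.partition()))
                        .collect(Collectors.toList());
                long count = consumer.endOffsets(parts).values().stream()
                        .mapToLong(Long::longValue).sum();
                System.out.printf("%s: %d records%n", topic, count);
            }
        }
    }
}
```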

#### 4. Alternative Ways to Run
```shell
# Option 1: Using the Maven exec plugin (recommended)
mvn exec:java -Dexec.mainClass="io.streamingledger.producers.TransactionsProducer"

# Option 2: Using the compiled classes and copied dependencies
# (run `mvn dependency:copy-dependencies` first so target/dependency is populated)
java -cp "target/classes:target/dependency/*" io.streamingledger.producers.TransactionsProducer

# Option 3: Using the packaged JAR (runs the main class named in the manifest)
java -jar target/spf-0.1.0.jar
```


### Register UDF
```sql
CREATE FUNCTION maskfn AS 'io.streamingledger.udfs.MaskingFn' LANGUAGE JAVA USING JAR '/opt/flink/jars/spf-0.1.0.jar';
```
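
For context, a masking UDF like `io.streamingledger.udfs.MaskingFn` would typically extend Flink's `ScalarFunction`. A hypothetical sketch, with masking logic that is illustrative rather than the book's actual implementation:

```java
package io.streamingledger.udfs;

import org.apache.flink.table.functions.ScalarFunction;

public class MaskingFn extends ScalarFunction {
    // Mask all but the last four characters, e.g. for card or account numbers
    public String eval(String input) {
        if (input == null || input.length() <= 4) {
            return input;
        }
        int masked = input.length() - 4;
        return "*".repeat(masked) + input.substring(masked);
    }
}
```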
3 changes: 1 addition & 2 deletions docker-compose.yaml
Original file line number Diff line number Diff line change
@@ -1,4 +1,3 @@
version: "3.7"
services:
redpanda:
command:
@@ -26,7 +25,7 @@ services:
- "19644:9644"
console:
container_name: redpanda-console
image: redpandadata/console:v2.2.3
entrypoint: /bin/sh
command: -c 'echo "$$CONSOLE_CONFIG_FILE" > /tmp/config.yml; /app/console'
environment: