In this demo, we will show you how to use docker-compose to run multiple datagen instances and produce 30GB of data to a Kafka cluster.
The docker-compose.yaml file defines the following services:
- redpanda: A single-node Kafka instance.
- 3 datagen instances that produce data to Redpanda simultaneously.
Each datagen instance produces 10GB of random data to Redpanda, using an auto-incrementing key driven by the iteration.index identifier in the schemas/schema.json file. Because all three instances generate the same sequence of keys, you can simulate an upsert source with a total of 30GB of data but only 10GB of unique data.
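The key overlap can be illustrated with a toy model. This is only a sketch: it assumes the index starts at 0 and uses 5 records per instance instead of the demo's 10024, and the variable names are made up for illustration.

```shell
#!/bin/sh
# Toy model of the key overlap: every "instance" emits keys 0..N-1.
# Assumptions: the index starts at 0; N is 5 here rather than 10024.
instances=3
records_per_instance=5

# Collect the keys emitted by all instances, one key per line.
keys=$(
  i=1
  while [ "$i" -le "$instances" ]; do
    seq 0 $((records_per_instance - 1))
    i=$((i + 1))
  done
)

# Count all records, then count distinct keys.
total=$(printf '%s\n' "$keys" | wc -l | tr -d ' ')
unique=$(printf '%s\n' "$keys" | sort -u | wc -l | tr -d ' ')

echo "total records: $total"
echo "unique keys:   $unique"
```

With these toy values the script reports 15 total records and 5 unique keys, the same 3:1 ratio as 30GB of data holding 10GB of unique data.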
Example of the datagen instance configuration:
    datagen1:
      image: materialize/datagen:latest
      container_name: datagen1
      depends_on:
        - redpanda
      environment:
        KAFKA_BROKERS: redpanda:9092
      volumes:
        - ./schemas:/schemas
      entrypoint:
        datagen -s /schemas/schema.json -f json -n 10024 --record-size 1048576 -d

Rundown of the datagen instance configuration:
- image: The datagen Docker image.
- container_name: The name of the container. This should be unique for each instance.
- depends_on: The datagen instance depends on the redpanda service.
- environment: The KAFKA_BROKERS environment variable is used to configure the Kafka/Redpanda brokers. If you are using a Kafka cluster with SASL authentication, you can also set the SASL_USERNAME, SASL_PASSWORD and SASL_MECHANISM environment variables.
- volumes: The datagen instance mounts the schemas directory to the /schemas directory in the container. This is where we have the schema.json file.
- entrypoint: The datagen command line arguments. The -s flag is used to specify the schema file. The -f flag is used to specify the output format. The -n flag is used to specify the number of records to generate. The --record-size flag is used to specify the size of each record. The -d flag is used to enable debug logging.
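As a rough check on the 10GB and 30GB figures, the -n and --record-size values from the entrypoint can be multiplied out. This sketch assumes --record-size is given in bytes and that the shell uses 64-bit integer arithmetic. The variable names are illustrative.

```shell
#!/bin/sh
# Values taken from the entrypoint of each datagen instance.
records=10024          # -n
record_size=1048576    # --record-size, assumed to be bytes (1 MiB)
instances=3

# Data produced by one instance and by all instances together.
per_instance_bytes=$((records * record_size))
total_bytes=$((per_instance_bytes * instances))

echo "per instance:  $per_instance_bytes bytes ($((per_instance_bytes / 1048576)) MiB)"
echo "all instances: $total_bytes bytes ($((total_bytes / 1048576)) MiB)"
```

Under those assumptions each instance produces 10024 MiB (about 10.5 GB) and the three instances together produce 30072 MiB (about 31.5 GB), so the 10GB and 30GB figures are round numbers.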
- Clone the datagen repository:

      git clone https://github.com/MaterializeInc/datagen.git
      cd datagen/examples/docker-compose

- Start the demo:

      docker-compose up -d

  The demo will take a few minutes to start up. You should see the following output:

      Creating network "docker-compose_default" with the default driver
      Creating docker-compose_redpanda_1 ... done
      Creating docker-compose_datagen_1 ... done
      Creating docker-compose_datagen_2 ... done
      Creating docker-compose_datagen_3 ... done
- Verify that the demo is running:

      docker-compose ps -a
- Stop the demo (the -v flag also removes its volumes):

      docker-compose down -v