Skip to content

Add Kafka to Kafka RAW example pipeline configuration #34992

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

Ashfaqbs
Copy link

This Apache Beam YAML pipeline demonstrates a basic Kafka-to-Kafka message mirroring use case. It reads raw byte messages from a source Kafka topic (i-topic) and writes them directly to a target Kafka topic (o-topic) without any transformation or decoding.

The pipeline is defined using Beam’s YAML DSL and leverages the ReadFromKafka and WriteToKafka transforms. It is configured to use the RAW data format, making it suitable for scenarios where the message structure is opaque or processing is deferred downstream.

This example serves as a minimal reference for validating Kafka I/O configurations and can be extended for more complex streaming dataflows. It is compatible with runners that support Kafka I/O, including Direct Runner and Flink Runner.

Copy link
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@robertwb
Copy link
Contributor

Looks like this is failing because we are trying to test it (and it has no output asserts nor the requisite kafka topics set up). You could explicitly skip this particular example at https://github.com/apache/beam/blob/master/sdks/python/apache_beam/yaml/examples/testing/examples_test.py#L310 (though it might be good to at least ensure that it parses).

The RAT failure is because explicit license headers are required on all files, see for example https://github.com/apache/beam/blob/master/sdks/python/apache_beam/yaml/examples/wordcount_minimal.yaml that you could copy over. (A brief description of this pipeline would be good as well.)

@Ashfaqbs
Copy link
Author

assign @robertwb

@Ashfaqbs
Copy link
Author

Hi Robert

As discussed via email, this PR contributes a minimal Kafka-to-Kafka example in RAW byte format using Beam's YAML DSL. It's a simple pass-through pipeline that reads from an input Kafka topic and writes to an output topic without any transformation or decoding. This should serve as a helpful reference for users validating Kafka I/O integration with Beam YAML.

✅ Apache license header and description are included
✅ File is renamed to kafka-to-kafka_example.yaml to skip automated test execution (as the pipeline has no assertions and requires Kafka infra)
✅ All non-functional checks (RAT, Lint, Formatters, Docker, etc.) have passed
❌ The Python PreCommit test failures are unrelated and expected — they attempt to run YAMLs without assertable output

Let me know if any further changes are needed. Thanks again for your time and review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants