[mqtt] Add portable MqttIO Read/Write transforms (revives #32385)#38493

Draft
tkaymak wants to merge 2 commits into apache:master from tkaymak:mqtt-xlang-schematransform

tkaymak commented May 13, 2026

What

Revives the approved diff from PR #32385 (Add portable Mqtt source and sink transforms) and wires the new SchemaTransforms into Python cross-language wrapper generation.

After this lands, Python users can do:

from apache_beam.io import ReadFromMqtt, WriteToMqtt

and reach MqttIO over xlang, the same way ReadFromKafka / WriteToKafka work today.

How

Two commits:

  1. [mqtt] Add SchemaTransform providers for MqttIO Read/Write

    • Decorate MqttIO.ConnectionConfiguration with @DefaultSchema(AutoValueSchema.class) + @SchemaFieldDescription so it round-trips through Beam Schemas.
    • New MqttReadSchemaTransformProvider (beam:schematransform:org.apache.beam:mqtt_read:v1) and MqttWriteSchemaTransformProvider (beam:schematransform:org.apache.beam:mqtt_write:v1), both @AutoService-registered.
    • New MqttSchemaTransformProviderTest covering the read-with-timeout-no-data case and a write-then-read round trip via an embedded ActiveMQ broker.
    • Pull :sdks:java:io:mqtt into :sdks:java:io:expansion-service so the providers are discoverable by ExpansionService.
  2. [mqtt] Wire MqttIO into Python xlang wrapper generation

    • Add name overrides in sdks/standard_expansion_services.yaml so the generated wrappers use Kafka-style naming (ReadFromMqtt / WriteToMqtt).
    • Regenerate sdks/standard_external_transforms.yaml via :sdks:python:generateExternalTransformsConfig.
    • Add an I/Os entry to CHANGES.md for 2.74.0.
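
The effect of the name overrides can be sketched in plain Python. This is a minimal illustration, not the code in gen_xlang_wrappers.py: the URN pattern, the `NAME_OVERRIDES` dict, and both helper functions are hypothetical, and the snake_case to CamelCase rule is an assumption chosen because it reproduces the auto-derived names MqttRead / MqttWrite mentioned in this PR.

```python
import re

# Assumed URN shape, inferred from the two identifiers in this PR.
URN_PATTERN = re.compile(
    r"^beam:schematransform:org\.apache\.beam:([\w-]+):(\w+)$")

# Hypothetical in-memory stand-in for the overrides added to
# standard_expansion_services.yaml (this is not the file's real format).
NAME_OVERRIDES = {
    "beam:schematransform:org.apache.beam:mqtt_read:v1": "ReadFromMqtt",
    "beam:schematransform:org.apache.beam:mqtt_write:v1": "WriteToMqtt",
}


def auto_derived_name(urn):
    """Turn the snake_case URN segment into CamelCase (assumed rule)."""
    match = URN_PATTERN.match(urn)
    if match is None:
        raise ValueError("not a standard SchemaTransform URN: %r" % urn)
    parts = re.split(r"[_-]", match.group(1))
    return "".join(part.capitalize() for part in parts if part)


def wrapper_name(urn):
    """Prefer an explicit override, else fall back to the derived name."""
    if urn in NAME_OVERRIDES:
        return NAME_OVERRIDES[urn]
    return auto_derived_name(urn)


if __name__ == "__main__":
    for urn in NAME_OVERRIDES:
        print(urn, auto_derived_name(urn), "->", wrapper_name(urn))
```

In this sketch, a URN without an override keeps its derived name, so only the two MQTT identifiers are renamed.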

Notes vs. PR #32385

  • Generic-typing fix for the post-#32668 API (#32668 added read-with-metadata support to MqttIO): MqttIO.Read<byte[]> / MqttIO.Write<byte[]> are used instead of the raw types in the original PR.
  • Naming: ReadFromMqtt / WriteToMqtt (Kafka-style) instead of the auto-derived MqttRead / MqttWrite, per @Abacn's roadmap comment about onboarding through standard_expansion_services.yaml.
  • The shape of the regenerated standard_external_transforms.yaml differs slightly from the original PR's (fields is now a list and Python types have a more compact representation) because gen_xlang_wrappers.py has evolved since 2024-08. The regenerated diff follows the format on current master.
  • topic is now Optional in the generated schema because MqttIO.ConnectionConfiguration#getTopic() was made @Nullable by PR #32668 (readWithMetadata).
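
The nullability change in the last note can be pictured as follows. This is a hedged sketch of how a nullable schema field typically surfaces as `Optional[...]` in a Python type hint; the `FIELDS` table and the `python_hint` helper are hypothetical, and `server_uri` is included only as an assumed example of a non-nullable field.

```python
from typing import Optional

# Hypothetical field table: name -> (Python base type, nullable?).
# `topic` is nullable per this PR; `server_uri` is an assumed required field.
FIELDS = {
    "server_uri": (str, False),
    "topic": (str, True),
}


def python_hint(base_type, nullable):
    """Wrap the base type in Optional when the schema field is nullable."""
    return Optional[base_type] if nullable else base_type


# Resulting hints, as a generated wrapper might expose them.
HINTS = {
    name: python_hint(base, nullable)
    for name, (base, nullable) in FIELDS.items()
}

if __name__ == "__main__":
    for name, hint in HINTS.items():
        print(name, hint)
```

A caller of such a wrapper could therefore omit `topic` or pass `None`, while the assumed required field would still need a value.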

Scope

Batch only. The streaming-mode failure that @twosom flagged on the original PR (comment: batch worked, streaming did not) was never root-caused. That investigation is intentionally out of scope here and will be addressed in a dedicated follow-up PR.

Credits

Original work by @ahmedabu98 and @twosom on PR #32385; @damondouglas approved that PR before it went stale and auto-closed on 2025-10-14. This revives that change with the small adjustments above.

Verification

GRADLE_USER_HOME=/tmp/.gradle ./gradlew \
    :sdks:java:io:mqtt:test \
    :sdks:java:io:expansion-service:build \
    :validateChanges

All pass locally.

Closes the gap from #32385 / addresses #21060 (Python MQTT IO).

tkaymak added 2 commits May 13, 2026 21:20
Adds MqttReadSchemaTransformProvider and MqttWriteSchemaTransformProvider
so MqttIO can be used through the portable SchemaTransform API and exposed
as cross-language transforms. Decorates MqttIO.ConnectionConfiguration with
@DefaultSchema(AutoValueSchema.class) and @SchemaFieldDescription so the
config round-trips through Beam Schemas.

Wires :sdks:java:io:mqtt into :sdks:java:io:expansion-service so the
SchemaTransforms are picked up by ExpansionService via @AutoService.

Tests cover a read-with-timeout-no-data case and a write-then-read round
trip against an embedded ActiveMQ broker. Batch only; the streaming case
flagged on PR apache#32385 will be addressed in a follow-up.

Revives the approved diff from PR apache#32385 (ahmedabu98, twosom) and adapts
it to the post-apache#32668 generic API (MqttIO.Read<T> / MqttIO.Write<T>).

Adds name overrides for the new MqttIO SchemaTransforms in
standard_expansion_services.yaml so the generated Python wrappers follow
the kafka-style naming (ReadFromMqtt / WriteToMqtt) and become available
under apache_beam.io.

Regenerates standard_external_transforms.yaml via
:sdks:python:generateExternalTransformsConfig — the file now includes the
mqtt_read:v1 and mqtt_write:v1 entries with their inferred Row schema for
ConnectionConfiguration.

Adds an I/Os entry in CHANGES.md for the upcoming 2.74.0 release.