Sets PubsubMessageWithAttributesAndMessageIdAndOrderingKeyCoder as default coder for all PubsubMessage #24887

Closed

@egalpin egalpin commented Jan 4, 2023

This PR changes the default Coder for PubsubMessage objects from PubsubMessageWithAttributesCoder to PubsubMessageWithAttributesAndMessageIdAndOrderingKeyCoder. A previous dev@ thread[1] on this subject raised no opposition to the change.

The one important detail to call out is that pipeline authors may need to explicitly set the coder to the previous default, PubsubMessageWithAttributesCoder, when a running pipeline must be updated. I do have some concern over whether that's completely sufficient or even possible, based on the behaviour observed in #21162 (comment).

I'm not sure this should be merged until further testing can be done to validate whether a running pipeline can be successfully updated.

Fixes #23525

[1] https://lists.apache.org/thread/c3qk0cp3rbhk2wh8m0z8gqxqy931dbwc

@egalpin egalpin added this to the 2.45.0 Release milestone Jan 4, 2023
@egalpin egalpin requested review from johnjcasey and apilloud January 4, 2023 22:51
@egalpin egalpin marked this pull request as ready for review January 4, 2023 22:52
- private static final Coder<String> MESSAGE_ID_CODER = StringUtf8Coder.of();
+ // A message's messageId can only be null when the message is an outgoing message (i.e. to be
+ // published). Incoming messages will always have a non-null messageId.
+ private static final Coder<String> MESSAGE_ID_CODER = NullableCoder.of(StringUtf8Coder.of());
Member

This is a breaking change as it adds a byte to the encoded message. Why is it needed?

Member Author

This would be needed to allow for use of this coder for outgoing messages, where the messageId field must be null to allow publishing.
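
To make the compatibility concern concrete, here is a minimal, self-contained sketch of why wrapping a string encoding in a nullable layer changes the bytes on the wire. It does not use Beam's classes: `NullablePrefixSketch`, `encodePlain`, and `encodeNullable` are hypothetical names, and the fixed 4-byte length prefix is a simplification chosen for illustration. The only behaviour it assumes is the one discussed in the review, namely that a nullable wrapper writes one presence byte ahead of the wrapped value.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration only; this is not Beam's coder code.
final class NullablePrefixSketch {

  // Plain encoding: a 4-byte length followed by the UTF-8 bytes.
  // It has no way to represent a null value.
  static byte[] encodePlain(String value) {
    byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(4 + utf8.length).putInt(utf8.length).put(utf8).array();
  }

  // Nullable encoding: one marker byte (0 = null, 1 = present),
  // followed by the plain encoding when the value is present.
  static byte[] encodeNullable(String value) {
    if (value == null) {
      return new byte[] {0};
    }
    byte[] plain = encodePlain(value);
    return ByteBuffer.allocate(1 + plain.length).put((byte) 1).put(plain).array();
  }

  public static void main(String[] args) {
    byte[] plain = encodePlain("id-1");
    byte[] nullable = encodeNullable("id-1");
    // The nullable form is one byte longer, so a decoder expecting the plain
    // layout would misread the marker byte as the start of the length field.
    System.out.println("plain length:    " + plain.length);
    System.out.println("nullable length: " + nullable.length);
    System.out.println("null length:     " + encodeNullable(null).length);
  }
}
```

The extra marker byte is what lets an outgoing message carry a null messageId, and it is also why data written with the old layout cannot be read back with the new one.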

@github-actions
Contributor

github-actions bot commented Jan 4, 2023

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@johnjcasey
Contributor

@egalpin and @apilloud should this be blocking the 2.45 release?

@egalpin
Member Author

egalpin commented Jan 23, 2023

@johnjcasey no I don’t believe so

@damccorm
Contributor

@egalpin and @apilloud should this be blocking the 2.46 release? I'm guessing no given that it wasn't a 2.45 blocker

@egalpin
Member Author

egalpin commented Feb 17, 2023

@damccorm I don’t believe it should be a blocker, no

@damccorm damccorm removed this from the 2.46.0 Release milestone Feb 17, 2023
@kennknowles
Member

I notice this has been attached to an open P1 for a while. What do we need to make progress?

@egalpin
Member Author

egalpin commented Apr 28, 2023

@kennknowles thanks for the bump. I have not had much bandwidth to meaningfully test this change. Something that concerns me and requires validation is whether or not the coder can be successfully set/changed from default so as to allow backward compatibility for pipeline authors. It appears that using .setCoder was not entirely sufficient based on prior exploration[1], but I have not yet revisited or gotten into a good repro-test feedback loop.

I have no issue handing this off if it’s high priority and someone else is keen to take up the work.

[1] #21162 (comment)

@github-actions
Contributor

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jun 27, 2023
@github-actions
Contributor

github-actions bot commented Jul 4, 2023

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Jul 4, 2023
Successfully merging this pull request may close these issues.

[Bug]: Default PubsubMessage coder will drop message id and orderingKey