Skip to content

ContextId refactor #8119

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 18 commits into
base: staging
Choose a base branch
from
Open

ContextId refactor #8119

wants to merge 18 commits into from

Conversation

mgoworko
Copy link
Contributor

@mgoworko mgoworko commented May 20, 2025

Describe your changes

Changes:

  • separated parts of the ContextId structure - it is no longer a simple string, but instead case class that contains a few fields representing scenario id, source node id, index etc.
  • it does not fundamentally change the ContextId representation - after serialization to string is looks the same
  • with one significant modification - the Split node now adds suffix to context ids of each of its outgoing branches; Split essentially clones the event and sends copy to each of its branches, those outgoing contexts need to be identifiable
  • the new ContextId structure is returned in test results data structures

Checklist before merge

  • Related issue ID is placed at the beginning of PR title in [brackets] (can be GH issue or Nu Jira issue)
  • Code is cleaned from temporary changes and commented out lines
  • Parts of the code that are not easy to understand are documented in the code
  • Changes are covered by automated tests
  • Showcase in dev-application.conf added to demonstrate the feature
  • Documentation added or updated
  • Added entry in Changelog.md describing the change from the perspective of a public distribution user
  • Added MigrationGuide.md entry in the appropriate subcategory if introducing a breaking change
  • Verify that PR will be squashed during merge

@mgoworko mgoworko changed the title In progress, do not review: ContextId refactor ContextId refactor May 21, 2025
}

case class ContextId(value: String)
case class ContextId(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This refactor separates constituent parts of the ContextId, but essentially it does not change this structure. Should we go in any particular direction? J suggested adding some displayableName.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we shouldn't strictly assume restriction that ContextId should be exactly the same as before change. Instead, we should think what parts where necessary to provide the current identity behaviour and extract them. We should avoid magical parts:

  • prefix: did you find some places where it was something different than scenarioName? If, so, maybe we have two separate parts: scenarioName + optional sth else
  • nodeId: let's make it even more explicit. nodeId was added to easily provide identity between context generated by different nodes. Let's call it e.g. originatingNodeId
  • suffix: I assume that it was used in places where new context was generated and neither taskId nor node id was sufficient? can you give some examples?
  • index was used in for-each or other contexts?

With regards to displayableName, IMO we should easily change presentation so I think that we shouldn't provide this and make FE decide how to display these information.

@@ -65,4 +67,37 @@ object RowConversions {

}

private implicit def contextIdCodec: Codec[ContextId] =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use the circe codec here, and keep the contextId in the single String field, as before. The reason is that we may need to further modify ContextId representation, without making multiple (very) problematic changes in the Flink serialization. But at the same time I am not sure if we are supposed to use circe in this context - whether it will have impact on performance.

@mgoworko mgoworko requested a review from arkadius May 22, 2025 09:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants