Skip to content

Use now + 2mins as the end timestamp for change stream read API if the connector endTimestamp is omitted #34967

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 23, 2025

Conversation

changliiu
Copy link
Contributor

V1 change stream can use null end timestamp for the query, however V2 the end timestamp of the query should be NOT NULL, and should be at most 30 mins from the max(now, start_timestamp).

To allow users to still omit the connector endTimestamp field to run the connector forever, but to give a valid endTimestamp when try to query change stream, we set the change stream endTimestamp in this case as now + 2 mins.

This solution works as the Apache beam checkpoints the ReadChangeStreamPartition execution every 5s or 5MB of output data produced.
Moreover the change stream query has a hard 1 min deadline.

@changliiu changliiu force-pushed the refactor-run-forever branch 4 times, most recently from 4d78dc3 to 6d6b262 Compare May 16, 2025 23:14
@changliiu changliiu marked this pull request as ready for review May 16, 2025 23:15
@changliiu changliiu force-pushed the refactor-run-forever branch from 6d6b262 to 953531c Compare May 16, 2025 23:56
Copy link
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@changliiu changliiu force-pushed the refactor-run-forever branch from 953531c to 3585ffe Compare May 27, 2025 01:12
@changliiu
Copy link
Contributor Author

Checked failed integration test, not relevant to this PR. No need to block the review.

@changliiu
Copy link
Contributor Author

assign set of reviewers

Copy link
Contributor

Assigning reviewers:

R: @m-trieu for label java.
R: @nielm for label spanner.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

Copy link
Contributor

github-actions bot commented Jun 7, 2025

Reminder, please take a look at this pr: @m-trieu @nielm

Copy link
Contributor

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @robertwb for label java.
R: @nielm for label spanner.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

Copy link
Contributor

@nielm nielm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LTGM

Test failures appear to be unrelated to this change

Copy link
Contributor

@thiagotnunes thiagotnunes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we run a pipeline to test this?

@changliiu
Copy link
Contributor Author

Did we run a pipeline to test this?

Yes I ran a pipeline containing this change in order to run a connector without specifying end timestamp.

However, I don't record any evidence for that. If you have opinion, I can run again and put evidence somewhere maybe here.
Please let me know your thoughts, thanks!

@nancyxu825
Copy link

nancyxu825 commented Jun 13, 2025

Just so I understand things correctly:

  1. The change stream v2 partition token does not have an explicit end timestamp
  2. Users however are required to provide an end timestamp to their change stream V2 queries within 30 minutes in the future
  3. We chose 2 minutes, since it is relatively small and > the Dataflow 5s checkpointing.

Was wondering if you could confirm?

@changliiu changliiu force-pushed the refactor-run-forever branch from 3585ffe to 33798df Compare June 16, 2025 17:10
@changliiu
Copy link
Contributor Author

Just so I understand things correctly:

  1. The change stream v2 partition token does not have an explicit end timestamp
  2. Users however are required to provide an end timestamp to their change stream V2 queries within 30 minutes in the future
  3. We chose 2 minutes, since it is relatively small and > the Dataflow 5s checkpointing.

Was wondering if you could confirm?

  1. V2 change stream required end_timestamp, and it's cannot be omitted.
  2. However, connector end_timestamp can be skipped. If so, the connector needs to run forever.
  3. Yes.

Please let me know if more details/explanation is needed. Thanks!

@changliiu changliiu closed this Jun 16, 2025
@changliiu
Copy link
Contributor Author

Friendly ping :)

@changliiu changliiu reopened this Jun 17, 2025
Copy link
Contributor

@nielm nielm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

@Abacn
Copy link
Contributor

Abacn commented Jun 18, 2025

please fix:

Execution failed for task ':sdks:java:io:google-cloud-platform:spotlessJavaCheck'.
> The following files had format violations:
      sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/changestreams/action/QueryChangeStreamAction.java
          @@ -299,7 +299,8 @@
           ??}
           
           ??//?Return?(now?+?2?mins)?as?the?end?timestamp?for?reading?change?streams.?This?is?only?used?if
          -??//?users?want?to?run?the?connector?forever.?This?approach?works?because?Google?Dataflow?checkpoints
          +??//?users?want?to?run?the?connector?forever.?This?approach?works?because?Google?Dataflow
          +??//?checkpoints
           ??//?every?5s?or?5MB?output?provided?and?the?change?stream?query?has?deadline?for?1?min.
           ??private?Timestamp?getNextReadChangeStreamEndTimestamp()?{
           ????final?Timestamp?current?=?Timestamp.now();
  Run './gradlew :sdks:java:io:google-cloud-platform:spotlessApply' to fix these violations.

@changliiu changliiu force-pushed the refactor-run-forever branch from 33798df to 3d76a66 Compare June 20, 2025 17:22
@Abacn Abacn merged commit 33ea938 into apache:master Jun 23, 2025
16 checks passed
@changliiu changliiu deleted the refactor-run-forever branch June 25, 2025 20:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants