-
Notifications
You must be signed in to change notification settings - Fork 123
Activity pause interrupt heartbeat #909
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 5 commits
e24fee7
c11dd71
2a017f0
fbd1bc1
bf47c10
7e4bfa2
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -10,7 +10,10 @@ use std::{ | |
| time::{Duration, Instant}, | ||
| }; | ||
| use temporal_sdk_core_protos::{ | ||
| coresdk::{ActivityHeartbeat, IntoPayloadsExt, activity_task::ActivityCancelReason}, | ||
| coresdk::{ | ||
| ActivityHeartbeat, IntoPayloadsExt, | ||
| activity_task::{ActivityCancelReason, ActivityCancellationDetails, ActivityTask}, | ||
| }, | ||
| temporal::api::{ | ||
| common::v1::Payload, workflowservice::v1::RecordActivityTaskHeartbeatResponse, | ||
| }, | ||
|
|
@@ -142,12 +145,19 @@ impl ActivityHeartbeatManager { | |
| .record_activity_heartbeat(tt.clone(), details.into_payloads()) | ||
| .await | ||
| { | ||
| Ok(RecordActivityTaskHeartbeatResponse { cancel_requested, activity_paused: _ }) => { | ||
| if cancel_requested { | ||
| Ok(RecordActivityTaskHeartbeatResponse { cancel_requested, activity_paused }) => { | ||
| if cancel_requested || activity_paused { | ||
| // Prioritize Cancel over Pause | ||
| let reason = if cancel_requested { ActivityCancelReason::Cancelled } else { ActivityCancelReason::Paused}; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fmt is checked by CI, buuuuut: This makes me think that
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah just verified, both fields can be set to true: I don't mind changing it to
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Interesting that https://github.com/temporalio/sdk-core/actions/runs/14725400139/job/41326870507?pr=909 passes and As for repeated, yeah, usually I would say show them all and let lang figure out how to handle, but I can also see that Core needs to pick an "interruption reason" winner. I may bring this up internally.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah I don't think it's about "picking", per se, we may need to expose both all the way to the user
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, after discussion, it is the case that multiple booleans on this can be true (again there is a
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also after discussion, it should be known that "pause" can flip between true and false, so we may want to memoize its first true and not let it be set back to false, or maybe we are ok with the flipping.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This formatting does indeed seem weird, but don't know why CI isn't complaining. |
||
| cancels_tx | ||
| .send(PendingActivityCancel::new( | ||
| tt.clone(), | ||
| ActivityCancelReason::Cancelled, | ||
| reason, | ||
| ActivityCancellationDetails { | ||
| is_cancelled: cancel_requested, | ||
| is_paused: activity_paused, | ||
| ..Default::default() | ||
| } | ||
| )) | ||
| .expect( | ||
| "Receive half of heartbeat cancels not blocked", | ||
|
|
@@ -164,6 +174,7 @@ impl ActivityHeartbeatManager { | |
| .send(PendingActivityCancel::new( | ||
| tt.clone(), | ||
| ActivityCancelReason::NotFound, | ||
| ActivityTask::primary_reason_to_cancellation_details(ActivityCancelReason::NotFound) | ||
| )) | ||
| .expect("Receive half of heartbeat cancels not blocked"); | ||
| } | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -67,7 +67,18 @@ message Start { | |
|
|
||
| // Attempt to cancel a running activity | ||
| message Cancel { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Question, say I heartbeat and get paused as true, so we send a cancel, then I heartbeat again and paused is false, then I heartbeat again and paused is true again. Do we send multiple cancellations or does Core only send the first cancellation reason it sees? This can also happen if say, paused becomes true, then cancel becomes true or then worker shutdown occurs. My concern if Core does only apply the first cancellation reason, is that we can't get updates to these This question is a bit blocking for me because I'm trying to understand how we model updating activity state as it comes back from the server/Core. I wonder if we should separate this cancel that only happens once from "activity state updated" we may send more frequently (and how we can make sure langs account for it).
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it's the former, as in, only apply first cancellation. Suppose the
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IMO it does not make sense for core to send multiple cancels. I don't think it should or will ever do this right now... (I would have to double-check the shutdown case). It adds a bunch of complication, and I'm not sure what benefit you'd get even if you did it. We can always, if it turns out to be needed for some reason, add some kind of
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Completely agree on sending multiple cancels. But langs now need to make it clear that "is_paused" does not mean "is paused" (which it very well may be) but rather "pausing was reason for cancellation request". |
||
| // Primary cancellation reason | ||
| ActivityCancelReason reason = 1; | ||
| // Activity cancellation details, surfaces all cancellation reasons. | ||
| ActivityCancellationDetails details = 2; | ||
| } | ||
|
|
||
| message ActivityCancellationDetails { | ||
| bool is_not_found = 1; | ||
| bool is_cancelled = 2; | ||
| bool is_paused = 3; | ||
| bool is_timed_out = 4; | ||
| bool is_worker_shutdown = 5; | ||
| } | ||
|
|
||
| enum ActivityCancelReason { | ||
|
|
@@ -79,6 +90,8 @@ enum ActivityCancelReason { | |
| TIMED_OUT = 2; | ||
| // Core is shutting down and the graceful timeout has elapsed | ||
| WORKER_SHUTDOWN = 3; | ||
| // Activity was paused | ||
| PAUSED = 4; | ||
| } | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need this I think. If it is necessary, see if we can find a way to avoid the sleep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for other locations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I basically adapted this from
heartbeats_report_cancels_only_once, my guess was that the sleep was needed to wait for some heartbeat flush timeout to elapse. (if this is the case) I suppose we only need to flush this once for all heartbeats, instead of per heartbeat, now that I think about this more.Does the "general testing pattern" comment in the test line up accurately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aaaah, right. Yeah... I suppose that's a bit unavoidable here. The comment makes sense.