Skip to content

Activity pause interrupt heartbeat#909

Merged
THardy98 merged 6 commits intomasterfrom
activity_pause_interrupt_heartbeat
May 1, 2025
Merged

Activity pause interrupt heartbeat#909
THardy98 merged 6 commits intomasterfrom
activity_pause_interrupt_heartbeat

Conversation

@THardy98
Copy link
Copy Markdown
Contributor

@THardy98 THardy98 commented Apr 29, 2025

What was changed

Add Pause as a reason for activity cancellations.

Why?

Provide visibility to consuming lang-SDKs to discern whether the activity has been paused.

Note that if both the cancellation and activity flags are set, the activity will be cancelled, not paused. This follows the same priority as Java.

  1. Closes Heartbeating activities should be interrupted when the activities are paused. #895

  2. How was this tested:
    Unit test

  3. Any docs updates needed?
    Nope

@THardy98 THardy98 requested a review from a team as a code owner April 29, 2025 07:04
if cancel_requested {
Ok(RecordActivityTaskHeartbeatResponse { cancel_requested, activity_paused }) => {
if cancel_requested || activity_paused {
let reason = if cancel_requested { ActivityCancelReason::Cancelled } else { ActivityCancelReason::Paused};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cargo fmt may be needed

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fmt is checked by CI, buuuuut:

This makes me think that ActivityCancelReason needs to be a repeated field in the Cancel proto, since it is possible to have cancel/paused happen at the same time. Assuming that's actually a thing server can produce, we'll need to reflect that.

Copy link
Copy Markdown
Contributor Author

@THardy98 THardy98 Apr 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah just verified, both fields can be set to true:
https://github.com/temporalio/temporal/blob/main/service/history/api/recordactivitytaskheartbeat/api.go#L85-L130

I don't mind changing it to repeated, but I'm unsure what the benefit is. I feel like it just makes lang SDKs pick which reason is relevant when we could be doing that in core, giving us consistent behaviour by default.

Copy link
Copy Markdown
Contributor

@cretz cretz Apr 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fmt is checked by CI:

Interesting that https://github.com/temporalio/sdk-core/actions/runs/14725400139/job/41326870507?pr=909 passes and let reason = if cancel_requested { ActivityCancelReason::Cancelled } else { ActivityCancelReason::Paused}; is accepted considering the inconsistency on the trailing space before end brace.

As for repeated, yeah, usually I would say show them all and let lang figure out how to handle, but I can also see that Core needs to pick an "interruption reason" winner. I may bring this up internally.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I don't think it's about "picking", per se, we may need to expose both all the way to the user

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, after discussion, it is the case that multiple booleans on this can be true (again there is a activity_reset boolean coming later). So let's go ahead and prepare for a collection of reasons here. And maybe even a "primary reason" so that each SDK doesn't have to pick the winning reason in cases where only a single one is supported.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also after discussion, it should be known that "pause" can flip between true and false, so we may want to memoize its first true and not let it be set back to false, or maybe we are ok with the flipping.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This formatting does indeed seem weird, but don't know why CI isn't complaining.

@cretz
Copy link
Copy Markdown
Contributor

cretz commented Apr 29, 2025

@THardy98 and @Sushisource - note, as part of temporalio/api#578, there will be a RESET reason too coming soon. Not sure you need to change anything in this PR, just FYI.

task_token: act.task_token.clone(),
details: vec![vec![1_u8, 2, 3].into()],
});
sleep(Duration::from_millis(10)).await;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need this I think. If it is necessary, see if we can find a way to avoid the sleep.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for other locations

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I basically adapted this from heartbeats_report_cancels_only_once, my guess was that the sleep was needed to wait for some heartbeat flush timeout to elapse. (if this is the case) I suppose we only need to flush this once for all heartbeats, instead of per heartbeat, now that I think about this more.

Does the "general testing pattern" comment in the test line up accurately?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaaah, right. Yeah... I suppose that's a bit unavoidable here. The comment makes sense.

if cancel_requested {
Ok(RecordActivityTaskHeartbeatResponse { cancel_requested, activity_paused }) => {
if cancel_requested || activity_paused {
let reason = if cancel_requested { ActivityCancelReason::Cancelled } else { ActivityCancelReason::Paused};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fmt is checked by CI, buuuuut:

This makes me think that ActivityCancelReason needs to be a repeated field in the Cancel proto, since it is possible to have cancel/paused happen at the same time. Assuming that's actually a thing server can produce, we'll need to reflect that.

@THardy98 THardy98 requested a review from Sushisource April 29, 2025 18:59
@THardy98 THardy98 requested a review from cretz April 30, 2025 21:48
@THardy98
Copy link
Copy Markdown
Contributor Author

THardy98 commented Apr 30, 2025

@Sushisource opted for a "primary reason" + "details" to expose all possible cancellation reasons.

I figure this reduces the diff for both core & lang. Primary reason gives some lang some direction, with flexibility by inspecting the details (if need be).

@@ -67,7 +67,18 @@ message Start {

// Attempt to cancel a running activity
message Cancel {
Copy link
Copy Markdown
Contributor

@cretz cretz Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question, say I heartbeat and get paused as true, so we send a cancel, then I heartbeat again and paused is false, then I heartbeat again and paused is true again. Do we send multiple cancellations or does Core only send the first cancellation reason it sees? This can also happen if say, paused becomes true, then cancel becomes true or then worker shutdown occurs.

My concern if Core does only apply the first cancellation reason, is that we can't get updates to these is_ details that may change. My concern if Core does not only apply the first but instead sends a cancellation for each, is that langs may be unprepared for multiple cancellations (not sure we ever tested swallowed cancellation followed by worker shutdown that may trigger a second).

This question is a bit blocking for me because I'm trying to understand how we model updating activity state as it comes back from the server/Core. I wonder if we should separate this cancel that only happens once from "activity state updated" we may send more frequently (and how we can make sure langs account for it).

Copy link
Copy Markdown
Contributor Author

@THardy98 THardy98 Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's the former, as in, only apply first cancellation.
But I should take another look.

Suppose the is_ details do change from a heartbeat response, which is currently limited to cancel/pause flags. From what I can tell, worst-case scenario, we would have been able to run an activity that we have now cancelled. In which case, wouldn't we retry?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it does not make sense for core to send multiple cancels. I don't think it should or will ever do this right now... (I would have to double-check the shutdown case).

It adds a bunch of complication, and I'm not sure what benefit you'd get even if you did it. We can always, if it turns out to be needed for some reason, add some kind of currentActivityCancellationState() function to the Core API. But, I hope not.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it does not make sense for core to send multiple cancels

Completely agree on sending multiple cancels. But langs now need to make it clear that "is_paused" does not mean "is paused" (which it very well may be) but rather "pausing was reason for cancellation request".

Copy link
Copy Markdown
Member

@Sushisource Sushisource left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Maybe double check what's going on with the formatting

@@ -67,7 +67,18 @@ message Start {

// Attempt to cancel a running activity
message Cancel {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it does not make sense for core to send multiple cancels. I don't think it should or will ever do this right now... (I would have to double-check the shutdown case).

It adds a bunch of complication, and I'm not sure what benefit you'd get even if you did it. We can always, if it turns out to be needed for some reason, add some kind of currentActivityCancellationState() function to the Core API. But, I hope not.

if cancel_requested {
Ok(RecordActivityTaskHeartbeatResponse { cancel_requested, activity_paused }) => {
if cancel_requested || activity_paused {
let reason = if cancel_requested { ActivityCancelReason::Cancelled } else { ActivityCancelReason::Paused};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This formatting does indeed seem weird, but don't know why CI isn't complaining.

@THardy98
Copy link
Copy Markdown
Contributor Author

split the conditional on separate lines, no complaints from the formatter either way 🤷

@THardy98 THardy98 merged commit a6b3157 into master May 1, 2025
17 checks passed
@THardy98 THardy98 deleted the activity_pause_interrupt_heartbeat branch May 1, 2025 00:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Heartbeating activities should be interrupted when the activities are paused.

3 participants