Activity not cancelled after start_to_close_timeout when heartbeat_timeout is shorter

### Bug Description

When an activity has both `startToCloseTimeout` and `heartbeatTimeout` set, and `heartbeatTimeout < startToCloseTimeout`, the Core SDK's local timeout timer never fires because successful heartbeats keep resetting it. The activity runs indefinitely on the worker even after the server has timed it out.

This affects all SDKs built on the Core SDK (Python, TypeScript). The Go SDK is not affected because it uses `context.WithDeadline` based on `startToCloseTimeout` directly.

### Reproduction

**Setup**: Activity that polls in a loop, heartbeats every 2 seconds, never completes.

| Setting | Value |
|---------|-------|
| `startToCloseTimeout` | 10s |
| `heartbeatTimeout` | 5s |
| `retryPolicy.maximumAttempts` | 1 |

**Expected**: Activity receives cancellation ~10s after start (when `startToCloseTimeout` fires).

**Actual**: Activity runs forever. No cancellation delivered. Heartbeats succeed with no errors even after the server has timed out the activity.

Tested on:
- Python SDK v1.23.0 with `temporalio.testing.WorkflowEnvironment.start_local()` (Temporal Server v1.30.2) — **not cancelled**
- TypeScript SDK (latest) with `TestWorkflowEnvironment.createLocal()` (Temporal Server v1.30.2) — **not cancelled**
- Go SDK v1.41.1 with real local server — **properly cancelled** via `context.WithDeadline`

### Python Reproduction

```python
import asyncio
import time
from datetime import timedelta
from temporalio import activity, workflow
from temporalio.common import RetryPolicy
from temporalio.testing import WorkflowEnvironment
from temporalio.worker import Worker

activity_cancelled = False

@activity.defn
async def stuck_activity() -> str:
    global activity_cancelled
    # Background heartbeat
    async def hb_loop():
        while True:
            try:
                activity.heartbeat()
            except Exception:
                pass
            await asyncio.sleep(2)
    hb_task = asyncio.create_task(hb_loop())
    try:
        while True:
            await asyncio.sleep(1)
    except asyncio.CancelledError:
        activity_cancelled = True
        raise
    finally:
        hb_task.cancel()

@workflow.defn
class TestWf:
    @workflow.run
    async def run(self) -> str:
        try:
            return await workflow.execute_activity(
                stuck_activity,
                start_to_close_timeout=timedelta(seconds=10),
                heartbeat_timeout=timedelta(seconds=5),
                retry_policy=RetryPolicy(maximum_attempts=1),
            )
        except Exception as e:
            return f"failed: {e}"

async def main():
    async with await WorkflowEnvironment.start_local() as env:
        async with Worker(env.client, task_queue="q", workflows=[TestWf], activities=[stuck_activity]):
            result = await env.client.execute_workflow(TestWf.run, id="test", task_queue="q", execution_timeout=timedelta(seconds=30))
            print(f"Result: {result}")
            print(f"Activity cancelled: {activity_cancelled}")
            # Prints: activity_cancelled = False

asyncio.run(main())
```

### TypeScript Reproduction

```typescript
// Same pattern — activity polls with setInterval heartbeat, never completes.
// startToCloseTimeout=10s, heartbeatTimeout=5s, maximumAttempts=1.
// Result: activity NOT cancelled after timeout. Only cancelled on worker shutdown.
```

### Go SDK (NOT affected)

```go
// Same pattern — activity heartbeats in a loop.
// startToCloseTimeout=10s, heartbeatTimeout=5s, maximumAttempts=1.
// Result: ctx.Done() fires after exactly 10s with "context deadline exceeded".
// Go SDK uses context.WithDeadline — works correctly.
```

### Root Cause

In `crates/sdk-core/src/worker/activities.rs` (lines ~585-638), the local timeout implementation picks **only the minimum** of `heartbeat_timeout` and `start_to_close_timeout`:

```rust
let timeout_at = [
    (HEARTBEAT_TYPE, task.resp.heartbeat_timeout),
    ("start_to_close", task.resp.start_to_close_timeout),
]
.into_iter()
.filter_map(...)
.min_by(|(_, d1), (_, d2)| d1.cmp(d2));
```

When `heartbeat_timeout (5s) < start_to_close_timeout (10s)`:
1. Only the heartbeat timer is created (type = `HEARTBEAT_TYPE`)
2. A `timeout_resetter` (Notify) is created for the heartbeat timer
3. Every successful heartbeat calls `resetter.notify_one()`, resetting the timer
4. The `start_to_close_timeout` timer is **never created**
5. As long as heartbeats succeed, the activity runs forever

### What the Go SDK Does Differently

The Go SDK (`internal/internal_task_handlers.go`) sets a hard context deadline:
```go
ctx, dlCancelFunc := context.WithDeadline(ctx, info.deadline)
```
Where `info.deadline` is computed from `startToCloseTimeout` (not heartbeat_timeout). This fires regardless of heartbeating.

### Suggested Fix

Create **both** timers when both timeouts are set:
1. A **resettable** heartbeat timer that resets on each heartbeat
2. A **non-resettable** `start_to_close_timeout` timer that always fires

This was explicitly requested by @cretz in the [PR #620 review](https://github.com/temporalio/sdk-core/pull/620):

> "Ideally we can have a start to close timer that will fail task AND a heartbeat timer that will fail the task, the latter simply getting reset every heartbeat call."

### Production Impact

This caused a production incident where activities (with `startToCloseTimeout=15min`, `heartbeatTimeout=3min`) ran for **5+ hours** as zombies on a worker after the server timed them out. Each zombie held a worker task slot, eventually saturating all 20 slots and rendering the worker unable to process any new activities.

### Related

- #620 — PR that added local timeout enforcement (this bug is in its implementation)
- #490 — Original feature request for local timeout enforcement
- temporalio/features#170 — Feature spec
- temporalio/sdk-python#700 — Related cancellation UX gap


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Activity not cancelled after start_to_close_timeout when heartbeat_timeout is shorter #1188

Bug Description

Reproduction

Python Reproduction

TypeScript Reproduction

Go SDK (NOT affected)

Root Cause

What the Go SDK Does Differently

Suggested Fix

Production Impact

Related

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Setting	Value
`startToCloseTimeout`	10s
`heartbeatTimeout`	5s
`retryPolicy.maximumAttempts`	1

Activity not cancelled after start_to_close_timeout when heartbeat_timeout is shorter #1188

Description

Bug Description

Reproduction

Python Reproduction

TypeScript Reproduction

Go SDK (NOT affected)

Root Cause

What the Go SDK Does Differently

Suggested Fix

Production Impact

Related

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions