Investigate "cannot find task in worker to cancel" error #242

@sharpener6

Description

[INFO]2025-09-23 23:36:29+0000: TestClient:test_heavy_function ==============================================
[INFO]2025-09-23 23:36:29+0000: logging to ('/dev/stdout',)
[INFO]2025-09-23 23:36:29+0000: ObjectStorageServer: start and listen to tcp://127.0.0.1:53551
[INFO]2025-09-23 23:36:29+0000: SchedulerClusterCombo: started
[INFO]2025-09-23 23:36:29+0000: ObjectStorageServer: started
[INFO]2025-09-23 23:36:29+0000: ScalerClient: connect to scheduler at tcp://127.0.0.1:44511
[INFO]2025-09-23 23:36:29+0000: ZMQAsyncConnector: started
[INFO]2025-09-23 23:36:29+0000: ZMQAsyncConnector: started
[INFO]2025-09-23 23:36:29+0000: ClientHeartbeatManager: started
[INFO]2025-09-23 23:36:30+0000: logging to ('/dev/stdout',)
[INFO]2025-09-23 23:36:30+0000: use event loop: builtin
[INFO]2025-09-23 23:36:30+0000: ConfigController: event_loop = builtin
[INFO]2025-09-23 23:36:30+0000: ConfigController: address = tcp://127.0.0.1:44511
[INFO]2025-09-23 23:36:30+0000: ConfigController: storage_address = ObjectStorageConfig(host='127.0.0.1', port=53551, identity='ObjectStorageServer')
[INFO]2025-09-23 23:36:30+0000: ConfigController: monitor_address = None
[INFO]2025-09-23 23:36:30+0000: ConfigController: adapter_webhook_url = None
[INFO]2025-09-23 23:36:30+0000: ConfigController: io_threads = 1
[INFO]2025-09-23 23:36:30+0000: ConfigController: max_number_of_tasks_waiting = -1
[INFO]2025-09-23 23:36:30+0000: ConfigController: client_timeout_seconds = 60
[INFO]2025-09-23 23:36:30+0000: ConfigController: worker_timeout_seconds = 60
[INFO]2025-09-23 23:36:30+0000: ConfigController: object_retention_seconds = 60
[INFO]2025-09-23 23:36:30+0000: ConfigController: load_balance_seconds = 1
[INFO]2025-09-23 23:36:30+0000: ConfigController: load_balance_trigger_times = 2
[INFO]2025-09-23 23:36:30+0000: ConfigController: protected = True
[INFO]2025-09-23 23:36:30+0000: ConfigController: allocate_policy = AllocatePolicy.even
[INFO]2025-09-23 23:36:30+0000: ConfigController: object_storage_address = tcp://127.0.0.1:53551
[INFO]2025-09-23 23:36:30+0000: ConfigController: updated `monitor_address` from `None` to `tcp://127.0.0.1:44513`
[INFO]2025-09-23 23:36:30+0000: Scheduler: listen to scheduler address tcp://127.0.0.1:44511
[INFO]2025-09-23 23:36:30+0000: Scheduler: connect to object storage server tcp://127.0.0.1:53551
[INFO]2025-09-23 23:36:30+0000: Scheduler: listen to scheduler monitor address tcp://127.0.0.1:44513
[INFO]2025-09-23 23:36:30+0000: ZMQAsyncBinder: started
[INFO]2025-09-23 23:36:30+0000: PyAsyncObjectStorageConnector: started
[INFO]2025-09-23 23:36:30+0000: VanillaGraphTaskController: started
[INFO]2025-09-23 23:36:30+0000: VanillaBalanceController: started
[INFO]2025-09-23 23:36:30+0000: VanillaClientController: started
[INFO]2025-09-23 23:36:30+0000: VanillaObjectController: started
[INFO]2025-09-23 23:36:30+0000: VanillaWorkerController: started
[INFO]2025-09-23 23:36:30+0000: VanillaInformationController: started
[INFO]2025-09-23 23:36:30+0000: logging to ('/dev/stdout',)
[INFO]2025-09-23 23:36:30+0000: Cluster: starting 3 workers, heartbeat_interval_seconds=2, task_timeout_seconds=0
[INFO]2025-09-23 23:36:30+0000: WorkerID(17350|Worker|runnervmf4ws1_0|60101aee8ca24ffe89f7f56111710fd3) started
[INFO]2025-09-23 23:36:30+0000: WorkerID(17350|Worker|runnervmf4ws1_1|97e733d600ad4e1a84ede686a7783cd6) started
[INFO]2025-09-23 23:36:30+0000: WorkerID(17350|Worker|runnervmf4ws1_2|03d65155bca64486891ac2bddabeb77b) started
[INFO]2025-09-23 23:36:30+0000: ClientID(16928|Client|a3584385bcff459e91c8b069479872d7) connected
[INFO]2025-09-23 23:36:30+0000: ScalerClient: connect to object storage at tcp://127.0.0.1:53551
[INFO]2025-09-23 23:36:30+0000: beginning submit 10000 heavy function (500mb) for 10000 tasks
[INFO]2025-09-23 23:36:31+0000: logging to ('/dev/stdout',)
[INFO]2025-09-23 23:36:31+0000: use event loop: builtin
[INFO]2025-09-23 23:36:31+0000: WorkerID(17350|Worker|runnervmf4ws1_0|60101aee8ca24ffe89f7f56111710fd3): start Processor[17377]
[INFO]2025-09-23 23:36:31+0000: ZMQAsyncConnector: started
[INFO]2025-09-23 23:36:31+0000: PyAsyncObjectStorageConnector: started
[INFO]2025-09-23 23:36:31+0000: ZMQAsyncBinder: started
[INFO]2025-09-23 23:36:31+0000: VanillaHeartbeatManager: started
[INFO]2025-09-23 23:36:31+0000: VanillaTimeoutManager: started
[INFO]2025-09-23 23:36:31+0000: VanillaTaskManager: started
[INFO]2025-09-23 23:36:31+0000: VanillaProfilingManager: started
[INFO]2025-09-23 23:36:31+0000: worker WorkerID(17350|Worker|runnervmf4ws1_0|60101aee8ca24ffe89f7f56111710fd3) connected
[INFO]2025-09-23 23:36:31+0000: logging to ('/dev/stdout',)
[INFO]2025-09-23 23:36:31+0000: use event loop: builtin
[INFO]2025-09-23 23:36:31+0000: WorkerID(17350|Worker|runnervmf4ws1_2|03d65155bca64486891ac2bddabeb77b): start Processor[17380]
[INFO]2025-09-23 23:36:31+0000: ZMQAsyncConnector: started
[INFO]2025-09-23 23:36:31+0000: PyAsyncObjectStorageConnector: started
[INFO]2025-09-23 23:36:31+0000: ZMQAsyncBinder: started
[INFO]2025-09-23 23:36:31+0000: VanillaHeartbeatManager: started
[INFO]2025-09-23 23:36:31+0000: VanillaTimeoutManager: started
[INFO]2025-09-23 23:36:31+0000: VanillaTaskManager: started
[INFO]2025-09-23 23:36:31+0000: VanillaProfilingManager: started
[INFO]2025-09-23 23:36:31+0000: worker WorkerID(17350|Worker|runnervmf4ws1_2|03d65155bca64486891ac2bddabeb77b) connected
[INFO]2025-09-23 23:36:31+0000: logging to ('/dev/stdout',)
[INFO]2025-09-23 23:36:31+0000: use event loop: builtin
[INFO]2025-09-23 23:36:31+0000: WorkerID(17350|Worker|runnervmf4ws1_1|97e733d600ad4e1a84ede686a7783cd6): start Processor[17383]
[INFO]2025-09-23 23:36:31+0000: ZMQAsyncConnector: started
[INFO]2025-09-23 23:36:31+0000: PyAsyncObjectStorageConnector: started
[INFO]2025-09-23 23:36:31+0000: ZMQAsyncBinder: started
[INFO]2025-09-23 23:36:31+0000: VanillaHeartbeatManager: started
[INFO]2025-09-23 23:36:31+0000: VanillaTimeoutManager: started
[INFO]2025-09-23 23:36:31+0000: VanillaTaskManager: started
[INFO]2025-09-23 23:36:31+0000: VanillaProfilingManager: started
[INFO]2025-09-23 23:36:31+0000: worker WorkerID(17350|Worker|runnervmf4ws1_1|97e733d600ad4e1a84ede686a7783cd6) connected
[INFO]2025-09-23 23:36:32+0000: logging to ('/dev/stdout',)
[INFO]2025-09-23 23:36:32+0000: logging to ('/dev/stdout',)
[INFO]2025-09-23 23:36:32+0000: logging to ('/dev/stdout',)
[INFO]2025-09-23 23:36:51+0000: balancing task: {WorkerID(17350|Worker|runnervmf4ws1_1|97e733d600ad4e1a84ede686a7783cd6): 118}
[EROR]2025-09-23 23:36:51+0000: TaskID(0077851c4d8a4d2ea141f02d23f15ba3): cannot find task in worker to cancel
[EROR]2025-09-23 23:36:51+0000: TaskID(0077851c4d8a4d2ea141f02d23f15ba3): cannot apply TaskTransition.TaskCancelConfirmNotFound to current state TaskState.BalanceCanceling
[INFO]2025-09-23 23:36:51+0000: TaskID(0077851c4d8a4d2ea141f02d23f15ba3): unknown transition: TaskTransition.TaskCancelConfirmNotFound
[EROR]2025-09-23 23:36:51+0000: TaskID(006540eae42341aeb2c5d3e44ec30543): cannot find task in worker to cancel
[EROR]2025-09-23 23:36:51+0000: TaskID(006540eae42341aeb2c5d3e44ec30543): cannot apply TaskTransition.TaskCancelConfirmNotFound to current state TaskState.BalanceCanceling
[INFO]2025-09-23 23:36:51+0000: TaskID(006540eae42341aeb2c5d3e44ec30543): unknown transition: TaskTransition.TaskCancelConfirmNotFound
[EROR]2025-09-23 23:36:51+0000: TaskID(20fd72fe0a954be4af958ab7dd17b495): cannot find task in worker to cancel
[EROR]2025-09-23 23:36:51+0000: TaskID(20fd72fe0a954be4af958ab7dd17b495): cannot apply TaskTransition.TaskCancelConfirmNotFound to current state TaskState.BalanceCanceling
[INFO]2025-09-23 23:36:51+0000: TaskID(20fd72fe0a954be4af958ab7dd17b495): unknown transition: TaskTransition.TaskCancelConfirmNotFound
[EROR]2025-09-23 23:36:51+0000: TaskID(4796a43b763846b989793de24f66e9c9): cannot find task in worker to cancel
[EROR]2025-09-23 23:36:51+0000: TaskID(4796a43b763846b989793de24f66e9c9): cannot apply TaskTransition.TaskCancelConfirmNotFound to current state TaskState.BalanceCanceling
[INFO]2025-09-23 23:36:51+0000: TaskID(4796a43b763846b989793de24f66e9c9): unknown transition: TaskTransition.TaskCancelConfirmNotFound
[EROR]2025-09-23 23:36:51+0000: TaskID(f8dd4e493f5c40f593531821e9ba6ba0): cannot find task in worker to cancel
[EROR]2025-09-23 23:36:51+0000: TaskID(f8dd4e493f5c40f593531821e9ba6ba0): cannot apply TaskTransition.TaskCancelConfirmNotFound to current state TaskState.BalanceCanceling
[INFO]2025-09-23 23:36:51+0000: TaskID(f8dd4e493f5c40f593531821e9ba6ba0): unknown transition: TaskTransition.TaskCancelConfirmNotFound
[EROR]2025-09-23 23:36:51+0000: TaskID(6dc93feef63445b7809eec6f98230f9d): cannot find task in worker to cancel
[EROR]2025-09-23 23:36:51+0000: TaskID(6dc93feef63445b7809eec6f98230f9d): cannot apply TaskTransition.TaskCancelConfirmNotFound to current state TaskState.BalanceCanceling
[INFO]2025-09-23 23:36:51+0000: TaskID(6dc93feef63445b7809eec6f98230f9d): unknown transition: TaskTransition.TaskCancelConfirmNotFound
[EROR]2025-09-23 23:36:51+0000: TaskID(d5e77e0849c941fe8fe4897e2825cebc): cannot find task in worker to cancel
[EROR]2025-09-23 23:36:51+0000: TaskID(d5e77e0849c941fe8fe4897e2825cebc): cannot apply TaskTransition.TaskCancelConfirmNotFound to current state TaskState.BalanceCanceling
[INFO]2025-09-23 23:36:51+0000: TaskID(d5e77e0849c941fe8fe4897e2825cebc): unknown transition: TaskTransition.TaskCancelConfirmNotFound
[EROR]2025-09-23 23:36:51+0000: TaskID(588c624304b3498cac0a30a9c9741b35): cannot find task in worker to cancel
[EROR]2025-09-23 23:36:51+0000: TaskID(588c624304b3498cac0a30a9c9741b35): cannot apply TaskTransition.TaskCancelConfirmNotFound to current state TaskState.BalanceCanceling
[INFO]2025-09-23 23:36:51+0000: TaskID(588c624304b3498cac0a30a9c9741b35): unknown transition: TaskTransition.TaskCancelConfirmNotFound
[EROR]2025-09-23 23:36:51+0000: TaskID(7d94654802e840a2b584fdca53a2a97b): cannot find task in worker to cancel
[EROR]2025-09-23 23:36:51+0000: TaskID(7d94654802e840a2b584fdca53a2a97b): cannot apply TaskTransition.TaskCancelConfirmNotFound to current state TaskState.BalanceCanceling
[INFO]2025-09-23 23:36:51+0000: TaskID(7d94654802e840a2b584fdca53a2a97b): unknown transition: TaskTransition.TaskCancelConfirmNotFound
[EROR]2025-09-23 23:36:51+0000: TaskID(2539effe1ca847919c4e7462d2a29346): cannot find task in worker to cancel
[EROR]2025-09-23 23:36:51+0000: TaskID(2539effe1ca847919c4e7462d2a29346): cannot apply TaskTransition.TaskCancelConfirmNotFound to current state TaskState.BalanceCanceling
[INFO]2025-09-23 23:36:51+0000: TaskID(2539effe1ca847919c4e7462d2a29346): unknown transition: TaskTransition.TaskCancelConfirmNotFound
[EROR]2025-09-23 23:36:51+0000: TaskID(a8b7423223974aaaa2afd3f6b1ebbb3a): cannot find task in worker to cancel
[EROR]2025-09-23 23:36:51+0000: TaskID(a8b7423223974aaaa2afd3f6b1ebbb3a): cannot apply TaskTransition.TaskCancelConfirmNotFound to current state TaskState.BalanceCanceling
[INFO]2025-09-23 23:36:51+0000: TaskID(a8b7423223974aaaa2afd3f6b1ebbb3a): unknown transition: TaskTransition.TaskCancelConfirmNotFound
[EROR]2025-09-23 23:36:51+0000: TaskID(c54020609d0a4f8abca17cf31ea21379): cannot find task in worker to cancel
[EROR]2025-09-23 23:36:51+0000: TaskID(c54020609d0a4f8abca17cf31ea21379): cannot apply TaskTransition.TaskCancelConfirmNotFound to current state TaskState.BalanceCanceling
[INFO]2025-09-23 23:36:51+0000: TaskID(c54020609d0a4f8abca17cf31ea21379): unknown transition: TaskTransition.TaskCancelConfirmNotFound
[EROR]2025-09-23 23:36:51+0000: TaskID(7ff5622ede824b4ab5cbd739fac88e77): cannot find task in worker to cancel
[EROR]2025-09-23 23:36:51+0000: TaskID(7ff5622ede824b4ab5cbd739fac88e77): cannot apply TaskTransition.TaskCancelConfirmNotFound to current state TaskState.BalanceCanceling
[INFO]2025-09-23 23:36:51+0000: TaskID(7ff5622ede824b4ab5cbd739fac88e77): unknown transition: TaskTransition.TaskCancelConfirmNotFound
[EROR]2025-09-23 23:36:51+0000: TaskID(b51d51a78b934cec86577e4097c49a78): cannot find task in worker to cancel
[EROR]2025-09-23 23:36:51+0000: TaskID(b51d51a78b934cec86577e4097c49a78): cannot apply TaskTransition.TaskCancelConfirmNotFound to current state TaskState.BalanceCanceling
[INFO]2025-09-23 23:36:51+0000: TaskID(b51d51a78b934cec86577e4097c49a78): unknown transition: TaskTransition.TaskCancelConfirmNotFound
[EROR]2025-09-23 23:36:51+0000: TaskID(b3848514bf364afdad7760cc28fa589b): cannot find task in worker to cancel
[EROR]2025-09-23 23:36:51+0000: TaskID(b3848514bf364afdad7760cc28fa589b): cannot apply TaskTransition.TaskCancelConfirmNotFound to current state TaskState.BalanceCanceling
[INFO]2025-09-23 23:36:51+0000: TaskID(b3848514bf364afdad7760cc28fa589b): unknown transition: TaskTransition.TaskCancelConfirmNotFound
[EROR]2025-09-23 23:36:51+0000: TaskID(3d8a896c289741f2a45051aca2ddf6f9): cannot find task in worker to cancel
[EROR]2025-09-23 23:36:51+0000: TaskID(3d8a896c289741f2a45051aca2ddf6f9): cannot apply TaskTransition.TaskCancelConfirmNotFound to current state TaskState.BalanceCanceling
[INFO]2025-09-23 23:36:51+0000: TaskID(3d8a896c289741f2a45051aca2ddf6f9): unknown transition: TaskTransition.TaskCancelConfirmNotFound
[EROR]2025-09-23 23:36:51+0000: TaskID(fdd50988794143198f2acc98585df4ea): cannot find task in worker to cancel
[EROR]2025-09-23 23:36:51+0000: TaskID(fdd50988794143198f2acc98585df4ea): cannot apply TaskTransition.TaskCancelConfirmNotFound to current state TaskState.BalanceCanceling
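One way to size the problem during the investigation is to group the cancel-related lines above per task and compare them with the balancing request. The following is a minimal Python sketch. It assumes only the log line format visible in this issue. The regexes and the `summarize` helper are hypothetical and are not part of Scaler.

```python
import re
from collections import Counter

# Hypothetical regexes, written against the log lines quoted in this issue only;
# other Scaler log messages may be formatted differently.
TASK_RE = re.compile(r"^\[[A-Z]+\]\S+ \S+: TaskID\((?P<task>[0-9a-f]+)\): (?P<msg>.*)$")
REJECT_RE = re.compile(
    r"cannot apply TaskTransition\.(?P<transition>\w+) to current state TaskState\.(?P<state>\w+)"
)
BALANCE_RE = re.compile(r"balancing task: \{(?P<body>.*)\}")
WORKER_COUNT_RE = re.compile(r"WorkerID\((?P<worker>[^)]*)\): (?P<count>\d+)")


def summarize(lines):
    """Hypothetical helper: group cancel-related errors in a log by task."""
    balanced = Counter()   # tasks the balancer asked to move away, per worker
    not_found = set()      # tasks the worker reported as missing when asked to cancel
    rejected = Counter()   # (transition, state) pairs the scheduler refused to apply
    for line in lines:
        balance = BALANCE_RE.search(line)
        if balance:
            for m in WORKER_COUNT_RE.finditer(balance.group("body")):
                balanced[m.group("worker")] += int(m.group("count"))
            continue
        task = TASK_RE.match(line)
        if not task:
            continue  # not a per-task line (startup, config, etc.)
        msg = task.group("msg")
        if msg == "cannot find task in worker to cancel":
            not_found.add(task.group("task"))
        reject = REJECT_RE.search(msg)
        if reject:
            rejected[(reject.group("transition"), reject.group("state"))] += 1
    return balanced, not_found, rejected
```

For a saved log file, `summarize(open(path))` works, since a file object yields lines. The excerpt above is cut off partway through the last task. Within it, 17 distinct tasks report `cannot find task in worker to cancel`. Each of those tasks then has `TaskCancelConfirmNotFound` rejected while in `BalanceCanceling`. In the same run, the balancer asked to move 118 tasks off `runnervmf4ws1_1`.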

Metadata

Labels

python

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests