Describe the issue:
When custom setup code is provided to a worker as a preload, failures in that code do not propagate; they are only logged to the console.
Relevant code:
distributed/distributed/preloading.py, lines 233 to 236 (commit a57ab42)
I run critical setup code in a preload, so I need the worker to fail if, for example, a database connection fails, instead of starting anyway and hitting runtime errors later. Currently I use the following hack to at least prevent workers from registering with the scheduler:
```python
import sys

from dask.distributed import Worker


# Dask calls this function.
# All code that might fail must live inside the try block.
async def dask_setup(worker: Worker):
    try:
        from backend.dask.preload import preload

        await preload(worker)
    except Exception as e:
        print("preload failed:", e)
        # Explicitly exit to prevent the worker from running without the preload code.
        # This does not kill the pod, but it prevents the worker from registering with the scheduler.
        # worker.stop() does not work for some reason.
        sys.exit(1)
```
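One plausible reason `sys.exit(1)` gets further than a plain exception: assuming the preload manager wraps each preload in an `except Exception` handler (as the `Failed to start preload` log line followed by a traceback suggests), `SystemExit` is not caught by it, because `SystemExit` derives from `BaseException` rather than `Exception`. The sketch below uses a hypothetical `run_preloads` function, not the actual `distributed` source, to show the difference.

```python
import asyncio
import logging

logger = logging.getLogger("preload-sketch")


async def run_preloads(setups):
    """Hypothetical stand-in for a preload manager.

    Assumption: each preload is wrapped in `except Exception` and only logged.
    """
    completed = []
    for setup in setups:
        try:
            await setup()
            completed.append(setup.__name__)
        except Exception:
            # Ordinary errors are logged and swallowed; startup continues.
            logger.exception("Failed to start preload: %s", setup.__name__)
    return completed


async def failing_setup():
    1 / 0  # ZeroDivisionError subclasses Exception, so it is swallowed


async def exiting_setup():
    raise SystemExit(1)  # SystemExit subclasses BaseException only, so it escapes


async def ok_setup():
    pass


def demo():
    # The ZeroDivisionError is swallowed and the next preload still runs.
    swallowed = asyncio.run(run_preloads([failing_setup, ok_setup]))
    # SystemExit bypasses `except Exception` and aborts the whole run.
    try:
        asyncio.run(run_preloads([exiting_setup, ok_setup]))
        escaped = False
    except SystemExit:
        escaped = True
    return swallowed, escaped
```

Calling `demo()` returns the names of the preloads that completed after the swallowed error, plus whether `SystemExit` escaped the handler.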
Minimal Complete Verifiable Example:
```python
async def dask_setup(worker):
    1 / 0
```
This results in:
```text
2024-06-28 14:15:26,013 - distributed.nanny - INFO - Start Nanny at: 'tcp://10.244.9.106:44589'
2024-06-28 14:15:26,476 - distributed.preloading - INFO - Creating preload: /opt/backend/dask/bin.runfiles/_main/backend/dask/preload_entrypoint.py
2024-06-28 14:15:26,477 - distributed.utils - INFO - Reload module preload_entrypoint from .py file
2024-06-28 14:15:26,477 - distributed.preloading - INFO - Import preload module: /opt/backend/dask/bin.runfiles/_main/backend/dask/preload_entrypoint.py
2024-06-28 14:15:26,876 - distributed.preloading - INFO - Run preload setup: /opt/backend/dask/bin.runfiles/_main/backend/dask/preload_entrypoint.py
2024-06-28 14:15:26,876 - distributed.preloading - ERROR - Failed to start preload: /opt/backend/dask/bin.runfiles/_main/backend/dask/preload_entrypoint.py
Traceback (most recent call last):
  File "/opt/backend/dask/bin.runfiles/rules_python~0.26.0~pip~pypi_311_distributed/site-packages/distributed/preloading.py", line 234, in start
    await preload.start()
  File "/opt/backend/dask/bin.runfiles/rules_python~0.26.0~pip~pypi_311_distributed/site-packages/distributed/preloading.py", line 213, in start
    await future
  File "/tmp/dask-scratch-space/worker-78qpzjdj/preload_entrypoint.py", line 8, in dask_setup
    1 / 0
    ~~^~~
ZeroDivisionError: division by zero
```
But the worker keeps running as if nothing happened.
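The expected behavior is that a failing preload stops the worker. As a minimal sketch of that requested fail-fast behavior (the `run_preloads_strict` and `start_worker` functions are hypothetical illustrations, not existing `distributed` functions or options), a runner can still log the error but re-raise it, so that startup aborts instead of continuing:

```python
import asyncio
import logging

logger = logging.getLogger("preload-sketch")


async def run_preloads_strict(setups):
    """Hypothetical fail-fast preload runner: log the error, then re-raise it."""
    for setup in setups:
        try:
            await setup()
        except Exception:
            logger.exception("Failed to start preload: %s", setup.__name__)
            raise  # propagate so the caller (e.g. worker startup) can abort


async def failing_setup():
    1 / 0


def start_worker(setups):
    """Hypothetical startup routine: reports "running" only if every preload succeeded."""
    try:
        asyncio.run(run_preloads_strict(setups))
    except Exception as exc:
        return f"startup aborted: {type(exc).__name__}"
    return "running"
```

With this design the error is still logged in the same way, but `start_worker([failing_setup])` reports an aborted startup instead of a running worker.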
Anything else we need to know?:
Environment:
- Dask version: 2024.5.2
- Python version: 3.11
- Operating System: Google's distroless_python
- Install method (conda, pip, source): pip