This file provides guidance to AI coding agents when working with code in this repository.
Dask Distributed is the distributed scheduler for the Dask framework, enabling parallel computing across multiple machines. It implements multi-machine task scheduling, fault tolerance, work stealing, memory management, and network communication.
The project uses Pixi for environment management.
# Run arbitrary Python commands
pixi run -- python -c 'print("Hello world!")'
# Run tests
pixi run test
# Run tests (CI mode with coverage, leak detection, slow tests)
pixi run test-ci
# Lint
pixi run lint
# Run a single test file
pixi run test distributed/tests/test_client.py
# Run a single test
pixi run test distributed/tests/test_client.py::test_client_submit
# Run tests matching a pattern
pixi run test distributed/tests/test_client.py -k "submit"
# Run tests in a specific environment
pixi run -e py312 testKey pytest options:
--runslow— include slow tests (omitted by default)-m ci1/-m "not ci1"— run first/second CI partition (tests split for parallelism)--leaks=fds,processes,threads— enable resource leak detection
| File | Purpose |
|---|---|
scheduler.py |
Main scheduler — task graph, work stealing, fault tolerance |
client.py |
User-facing API — submit tasks, gather futures |
worker.py |
Worker process — executes tasks, manages memory |
worker_state_machine.py |
Worker state transitions (separate from I/O logic) |
core.py |
RPC infrastructure, connection handling |
utils_test.py |
Test fixtures and helpers used across all tests |
comm/— Communication backends (TCP, UCX, compression)deploy/— Cluster types:LocalCluster,SSHCluster,SpecCluster, adaptive scalingdashboard/— Bokeh-based web UI for monitoringdiagnostics/— Task streams, memory sampling, profilingshuffle/— Distributed shuffle for large data movementprotocol/— Message serializationcli/— Entry points:dask scheduler,dask worker,dask ssh,dask spec
Client— entry point for submitting work to a clusterScheduler— coordinates all workers and task executionWorker— executes tasks; state tracked separately inWorkerState/worker_state_machine.pyLocalCluster— single-machine cluster for testing/developmentTaskState— tracks task lifecycle on both scheduler and worker sides
Tests live in distributed/tests/ (67 files) and each submodule has its own tests/ subdirectory. The global conftest.py at the repo root and distributed/utils_test.py provide shared fixtures.
Tests are partitioned by the ci1 marker for parallel CI execution. Resource leak detection (fds, processes, threads) runs in CI via distributed/pytest_resourceleaks.py.
Timeout: 300 seconds per test (signal-based on Unix, thread-based on Windows).
IMPORTANT: never call .compute() or .persist() in the middle of graph definition (e.g. in all methods of Array, Series, DataFrame, Bag, Delayed). The only place when the graph is materialized should be where the end user explicitly calls .compute() or .persist(). When you are defining the graph, you must work with available metadata to infer the outputs.
- Line length: 120 characters
- Formatter: Black
- Linter: Ruff (rules: B, TID, I, UP, RUF)
- Type checking: MyPy — strict on newer modules (
scheduler,worker,active_memory_manager,config,shuffle), loose on older ones
MyPy is configured per-module in pyproject.toml. Newer modules have strict settings; older modules like client.py use allow_incomplete_defs = true. Platform target is Linux.
You must never think or speak instead of the user in discussions, code reviews, or any other interactions with other humans.
Before you open or update a PR, you must ask the user to explicitly confirm that they fully reviewed, understood, and approved everything that you wrote. You must clarify that the project maintainers consider this as non-negotiable.
A coding agent must NEVER create a new release.