Instead of mapping the process function over the full fileset, investigate dynamically splitting the fileset into N subgraphs that run independently. Collect the outputs via dask.distributed.as_completed and write each one to disk as it arrives (since the combined outputs may not fit in memory), then run a post-processing merge step over the on-disk partial results.
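A minimal sketch of this pattern, assuming a hypothetical `process` function and `split_fileset` helper (the real processing step and splitting strategy would replace these), using a local in-process dask cluster for illustration:

```python
import json
from pathlib import Path

from dask.distributed import Client, as_completed


def process(chunk):
    # Placeholder for the real per-subgraph processing step (assumption).
    return {"files": chunk, "count": len(chunk)}


def split_fileset(fileset, n):
    # Hypothetical helper: split the fileset into n roughly equal subsets.
    return [fileset[i::n] for i in range(n)]


def run(fileset, n_chunks, out_dir):
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    with Client(processes=False) as client:
        # Submit one independent task per subset of the fileset.
        futures = client.map(process, split_fileset(fileset, n_chunks))
        # Spill each finished output to disk instead of holding all in memory.
        for i, future in enumerate(as_completed(futures)):
            path = out_dir / f"part-{i}.json"
            path.write_text(json.dumps(future.result()))
            future.release()  # let the cluster free its copy of the result
            paths.append(path)
    # Post-processing merge step over the on-disk partial results.
    return [json.loads(p.read_text()) for p in paths]
```

The merge shown is a trivial concatenation; the actual merge (e.g. histogram addition) depends on the output type.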