Description
Right now, nx-parallel only has embarrassingly parallel algorithms, and we support all the parallel configurations that joblib offers. We pass those configurations as-is into joblib's `parallel_config` context manager and then run the `joblib.Parallel` call (which is inside nx-parallel) with those configurations.
I'd like to propose adding support for algorithms that might not fit neatly into joblib's structure, for example algorithms that only work with some of the parallel backends joblib supports, or with backends that joblib does not support. We could introduce additional parameters for such non-embarrassingly-parallel graph algorithms in the `_configure_if_nx_active` decorator. For example:
```python
@nxp._configure_if_nx_active(supported_backends=["dask", "ray"])
def non_embarrassingly_parallel_graph_algo(G, nodes=None, get_chunks="chunks"):
    """This algorithm is written only to work with `dask` and `ray` backends..."""
    ...
```
Our aim should be to support all the configurations that joblib provides, but if we cannot because of the nature of the algorithm (or for some other reason), we should still be able to incorporate that algorithm into nx-parallel. It also seems reasonable to me to add algorithms that only support a few backends right now but may (or may not) support more in the future. We could also have different implementations of the same algorithm supporting different backends, and then dispatch internally (within nx-parallel) based on the configuration values.
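The internal dispatch idea could look something like the following. This is a hypothetical sketch, not nx-parallel's actual API: the `_DISPATCH` table and the `_impl_*` helpers are illustrative names, and the real mechanism would presumably read the backend from the active configuration rather than a keyword argument.

```python
# Two hypothetical backend-specific implementations of one algorithm.
def _impl_joblib(G):
    return ("joblib", G)

def _impl_dask(G):
    return ("dask", G)

# Map backend names to the implementation that supports them.
_DISPATCH = {
    "loky": _impl_joblib,
    "threading": _impl_joblib,
    "dask": _impl_dask,
}

def some_parallel_algo(G, *, backend="loky"):
    # dispatch internally based on the configured backend name
    if backend not in _DISPATCH:
        raise NotImplementedError(
            f"backend {backend!r} is not supported by this algorithm"
        )
    return _DISPATCH[backend](G)
```

A table like this also makes the set of supported backends easy to introspect, which could back a `supported_backends=` check in the decorator.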
I'm not sure what the exact standard should be for an implementation to be added to nx-parallel. On one end would be parallel implementations that are fixed and do not let the user configure anything; on the other end would be algorithms that support all of joblib's configs along with their own configurations/backends (that joblib does not offer). I'm no expert on this, but we should make sure that there is some consistency across the repository, that the parameters passed in the decorator are not too specific to one algorithm, and that the additional configuration(s) for a few algorithms don't start to look like backend-specific kwargs.
Somewhat related: in some past discussions, @dPys and I talked about supporting multiple levels of parallelism within nx-parallel, at the level of functions (which we currently have in nx-parallel) and at the level of graph objects. Parallelism at the level of graph objects means that if someone has 100 graph (or subgraph) objects and wants to run the same algorithm on all of them, that process could be done in parallel. But I'm not sure how much faster running an algorithm on 100 graphs while processing those graphs in parallel would be compared to running a parallel algorithm on the 100 graphs one by one. Also, how would dispatching work in this case?
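For illustration, graph-level parallelism could be as simple as mapping a (here, sequential) NetworkX algorithm over many graphs with joblib, one graph per task. Whether this actually beats running a parallel algorithm on each graph one by one would need benchmarking, as noted above; this sketch just shows the shape of the idea.

```python
import networkx as nx
from joblib import Parallel, delayed

# many small graphs, same algorithm applied to each
graphs = [nx.path_graph(n) for n in range(3, 8)]

# one task per graph; each task runs the algorithm sequentially
results = Parallel(n_jobs=2)(
    delayed(nx.diameter)(G) for G in graphs
)  # -> [2, 3, 4, 5, 6]
```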
Feel free to leave more ideas/thoughts on this below!
Also pls LMK if you think anything above is not clear and/or needs more explanation.
Thank you :)
PS: Thank you to @dschult for asking me questions like "... whether there are fundamental limits to our approach...", "Do you feel the project is in shape to have people work on this?", and "Are these the next steps you would have taken given the time to do so?", which led me to think about all of the above and create this issue. Thanks!