183 add dask parallelism for cube single field imaging#237
Conversation
…-add-dask-parallelism-for-cube-single-field-imaging
… of https://github.com/casangi/astroviper into 183-add-dask-parallelism-for-cube-single-field-imaging
…-add-dask-parallelism-for-cube-single-field-imaging
… of https://github.com/casangi/astroviper into 183-add-dask-parallelism-for-cube-single-field-imaging
… of https://github.com/casangi/astroviper into 183-add-dask-parallelism-for-cube-single-field-imaging
… of https://github.com/casangi/astroviper into 183-add-dask-parallelism-for-cube-single-field-imaging
|
Jan-Willem Steeb seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
❌ 8 Tests Failed:
View the top 3 failed test(s) by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
smcastro
left a comment
There was a problem hiding this comment.
@Jan-Willem, thanks for this major effort to refactor and provide consistency across the project. I have a few suggestions that could improve clarify, in case you are considering modifications at the naming level.
Naming
distributed_graphs
The suffix _graphs is a bit ambiguous as this belongs to graphviper. The main purpose of this directory is to construct parallel_coords, map node_tasks over chunks, etc. and it does not implement graphs. Wouldn't a different name be more consistent and shorter? Some suggestions:
- distributed
- parallel
node_tasks
At start I had to read the documentation to understand the purpose of this directory. The suffix task seems to be overloaded everywhere and I was wondering if there could be a better name here that does not cause ambiguity in the overall VIPER ecosystem. For example, currently task is used as:
- Prefect @task
- Dask delayed task
- Graphviper node_task parameter
Since node_tasks is a chunk (slice) worker, some suggestions could be to use
chunk_workers or map_workers, which describe the roles in place and do not collide with Prefect or Dask.
processing_functions
The suffix "functions" doesn't add anything and makes the name very long. Some suggestions:
- processing
- domain
Symbols
There is some symbol repetition across the layers, which could use some suffix to lighten the use and prevent confusion. For example, currently we have:
distributed_graphs.imaging.image_cube_single_field # driver
node_tasks.imaging.image_cube_single_field # chunk worker
processing_functions.imaging.image_cube_single_field # science
The feather is breaking this by using feather_core for science, which could be followed by the other APIs. Some suggestions:
distributed.imaging.image_cube_single_field #driver or
parallel.imaging.image_cube_single_field #driver
chunk_workers.imaging.image_cube_single_field # chunk worker
processing.imaging.image_cube_single_field_core. # science
The _core suffix will indicate this is the core algorithm (domain layer) and will be clearer at the high level when a flowviper workflow calls it.
Tests
I would like to suggest to rename stakeholders to component, which aligns to the naming convention of the Testing Strategy (unit, component, integration). Integration will be used inside testviper and each VIPER component should contain unit and component tests.
No description provided.