Most data science workflows involve a Directed Acyclic Graph (DAG) of data-processing steps. Since many of these steps are also time-consuming, they are a very good use case for task caching.
The issue is that most data science workflows also use a configuration file to define things such as hyper-parameters.
With the current implementation of the task runner, this means that every task depends on the configuration file. Any change to the file would therefore rerun the entire pipeline, even if most pipeline steps were unaffected by the change.
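To make the problem concrete: if the runner could see which parameters each step actually reads, changing one hyper-parameter would only mark that step as stale. A minimal sketch of that comparison, assuming the configuration is loaded into a nested dict and that you maintain a hypothetical `TASK_PARAMS` mapping from each step to the dotted keys it uses (the step names and keys are illustrative only):

```python
from typing import Any

# Hypothetical mapping: which config keys each pipeline step reads.
TASK_PARAMS = {
    "preprocess": ["data.path", "data.test_size"],
    "train": ["model.learning_rate", "model.epochs"],
    "evaluate": ["eval.metrics"],
}


def lookup(cfg: dict, dotted: str) -> Any:
    """Follow a dotted key such as 'model.epochs' through nested dicts.

    Raises KeyError if any part of the path is missing.
    """
    node = cfg
    for part in dotted.split("."):
        node = node[part]
    return node


def affected_tasks(old_cfg: dict, new_cfg: dict, task_params: dict) -> list[str]:
    """Return the tasks whose own parameters differ between two configs."""
    return [
        task
        for task, keys in task_params.items()
        if any(lookup(old_cfg, k) != lookup(new_cfg, k) for k in keys)
    ]
```

For instance, two configs that differ only in `model.learning_rate` yield `["train"]`. In a real DAG, steps that consume `train`'s outputs would still be rerun through their ordinary file dependencies on those outputs.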
Alternatively, a separate configuration file could be defined for each step. However, this is quite messy and would most likely result in many small files. Not ideal.
Is there a way to make this work that I might be missing? How can I keep a single configuration file but still have the task runner determine which tasks need to run, based on the parameters that affect each specific task?
I am also open to other ways of structuring the project that might help solve the issue...
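One possible workaround, sketched here under stated assumptions, is to keep the single configuration file but generate one small, derived parameter file per task, and have each task declare that derived file as its dependency instead of the whole config. This assumes the task runner can track arbitrary files as task sources and that the config is JSON (a YAML config would need a parser such as PyYAML). `TASK_PARAMS`, `write_param_stamps` and the `params/` directory are hypothetical names, not part of any task runner's API.

```python
import json
from pathlib import Path
from typing import Any

# Hypothetical mapping: which config keys each pipeline step reads.
TASK_PARAMS = {
    "preprocess": ["data.path", "data.test_size"],
    "train": ["model.learning_rate", "model.epochs"],
    "evaluate": ["eval.metrics"],
}


def lookup(cfg: dict, dotted: str) -> Any:
    """Follow a dotted key such as 'model.epochs' through nested dicts."""
    node = cfg
    for part in dotted.split("."):
        node = node[part]
    return node


def write_param_stamps(cfg: dict, task_params: dict, out_dir: Path) -> list[str]:
    """Write one JSON file per task holding only that task's parameters.

    A file is rewritten only when its content changes, so its contents and
    modification time stay the same for tasks whose parameters did not
    change. Returns the tasks whose stamp file was (re)written.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for task, keys in task_params.items():
        subset = {k: lookup(cfg, k) for k in keys}
        # sort_keys makes the output deterministic for identical parameters.
        text = json.dumps(subset, sort_keys=True, indent=2) + "\n"
        stamp = out_dir / f"{task}.json"
        if not stamp.exists() or stamp.read_text() != text:
            stamp.write_text(text)
            written.append(task)
    return written
```

A cheap first task would call `write_param_stamps(json.loads(Path("config.json").read_text()), TASK_PARAMS, Path("params"))` whenever the config changes, and, for example, the training task would list `params/train.json` as a source. Because unaffected stamp files are never rewritten, neither checksum-based nor timestamp-based up-to-date checks should trigger for those tasks. The many small files still exist, but they are generated rather than hand-maintained.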