Tasks and Data Science configuration files #3582

jrr96 · 2025-04-12T15:12:05Z

Most data science workflows involve a Directed Acyclic Graph (DAG) of data processing steps. Since many tasks in the pipeline are also time-consuming, they are a very good use case for task caching.

The issue is that most data science workflows also use a configuration file to define things such as hyper-parameters.

With the current implementation of the task runner, this would mean that every task depends on the configuration file. The entire pipeline would then rerun whenever the file changes, even if most pipeline steps were unaffected by the change.

Alternatively, each step could have its own configuration file. However, this is quite messy and would most likely result in many small files. Not ideal.

Is there any way to make this work that I might be missing? How can I keep a single configuration file while still having the task runner identify which tasks need to run, based on the parameters that affect each specific task?
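To make the question concrete, here is a minimal Python sketch of the desired behaviour. It is not part of any task runner's API: the function `task_fingerprint`, the section names `preprocess` and `train`, and the parameter values are all hypothetical. The idea is that each task is fingerprinted from only the configuration sections it reads, so a change elsewhere in the file leaves its fingerprint untouched.

```python
import hashlib
import json


def task_fingerprint(config: dict, sections: list[str]) -> str:
    """Hash only the config sections that a given task reads (hypothetical helper)."""
    relevant = {name: config.get(name) for name in sections}
    # Sorted keys and fixed separators give a stable serialization,
    # so the hash does not depend on key order in the file.
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical single configuration file, already parsed into a dict.
config = {
    "preprocess": {"test_split": 0.2},
    "train": {"learning_rate": 0.01, "epochs": 10},
}

preprocess_before = task_fingerprint(config, ["preprocess"])
train_before = task_fingerprint(config, ["train"])

# Change a hyper-parameter that only the training step uses.
config["train"]["learning_rate"] = 0.001

preprocess_after = task_fingerprint(config, ["preprocess"])
train_after = task_fingerprint(config, ["train"])

# Only the training step should be considered stale.
print("preprocess needs rerun:", preprocess_before != preprocess_after)
print("train needs rerun:", train_before != train_after)
```

A fingerprint like this could in principle be written to a small per-task file that the task declares as its input, but whether the task runner can consume something like that is exactly the open question.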

I am also open to other ways of structuring the project that might help solve the issue...
