Right now, we run every pipeline ID as a separate pipeline run, each with its own top-level pipeline.
There should be a way to get everything (e.g. imputation step, reducer step, clustering step, etc.) operating as separate sub-pipelines, in which case we can run the whole thing as one big pipeline. This will allow us to be smarter about what gets regenerated during a pipeline run.
I think this will involve fixing up the namespaces, since they'll need to be totally unique.
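Rough sketch of one way to get unique namespaces, not tied to any particular pipeline framework. The assumption here is that each pipeline ID is an ordered list of step configs; `step_namespace`, `namespaces_for`, and the config keys (`step`, `k`, `n_components`) are made-up names for illustration. The namespace of a step is derived from its own config plus everything upstream of it, so two pipeline IDs that share the same imputation settings get the same imputation namespace and only diverge where their configs diverge.

```python
import hashlib
import json


def step_namespace(configs_up_to_step):
    """Build a deterministic namespace from a step's config and all upstream configs."""
    # sort_keys makes the hash independent of dict key order
    payload = json.dumps(configs_up_to_step, sort_keys=True)
    digest = hashlib.sha1(payload.encode("utf-8")).hexdigest()[:8]
    step_name = configs_up_to_step[-1]["step"]
    return f"{step_name}_{digest}"


def namespaces_for(pipeline_config):
    """Return one namespace per step, in order, for a single pipeline ID."""
    return [
        step_namespace(pipeline_config[: i + 1])
        for i in range(len(pipeline_config))
    ]


# Two hypothetical pipeline IDs: same imputation, different reducer settings.
pipeline_a = [{"step": "impute", "k": 5}, {"step": "reduce", "n_components": 2}]
pipeline_b = [{"step": "impute", "k": 5}, {"step": "reduce", "n_components": 3}]

names_a = namespaces_for(pipeline_a)
names_b = namespaces_for(pipeline_b)
# names_a[0] == names_b[0] (shared imputation), names_a[1] != names_b[1]
```

Hashing the whole upstream chain (rather than just the step's own config) is deliberate: a reducer with identical settings still needs a different namespace if it sits on top of a different imputation.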
Benefits:
- This will speed things up for early, expensive steps like KNN imputation, since we'll only have to run them once.
- It will add further speed-ups when we start varying the clustering more, since the upstream steps won't need to be rerun for each variation.
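
To make the "only do it once" idea concrete, here is a toy sketch in plain Python. Everything in it is hypothetical: `run_combined` is a made-up helper, and `impute`, `reduce_step`, and `cluster` are stand-ins for the real steps. Each step result is cached under a key built from the full chain of upstream step names and parameters, so pipeline IDs that share a prefix reuse the earlier outputs.

```python
def impute(rows, fill):
    # Stand-in for an expensive step such as KNN imputation
    return [fill if v is None else v for v in rows]


def reduce_step(rows, scale):
    # Stand-in for the reducer step
    return [v * scale for v in rows]


def cluster(rows, k):
    # Stand-in for the clustering step
    return [v % k for v in rows]


def run_combined(pipelines, step_functions, data):
    """Run all pipeline IDs together, executing each unique step chain once."""
    cache = {}   # maps chain-of-(step, params) -> step output
    calls = []   # records which steps were actually executed
    results = {}
    for pipeline_id, steps in pipelines.items():
        current = data
        key = ()
        for step_name, params in steps:
            key = key + ((step_name, tuple(sorted(params.items()))),)
            if key not in cache:
                cache[key] = step_functions[step_name](current, **params)
                calls.append(step_name)
            current = cache[key]
        results[pipeline_id] = current
    return results, calls


step_functions = {"impute": impute, "reduce": reduce_step, "cluster": cluster}
pipelines = {
    "run_a": [("impute", {"fill": 0}), ("reduce", {"scale": 2}), ("cluster", {"k": 2})],
    "run_b": [("impute", {"fill": 0}), ("reduce", {"scale": 2}), ("cluster", {"k": 3})],
}
results, calls = run_combined(pipelines, step_functions, [1, None, 3, 4])
# Imputation and reduction each run once; only clustering runs twice.
```

In a real pipeline framework the cache would presumably be the framework's own dataset catalog rather than a dict, which is where the unique namespaces come in.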
#todo