Intermediate data sources when running multiple child-producing operators

The primary operators in tomviz that produce children via `dataset.create_child_dataset()` are reconstruction operators. Since a data source usually gets only one reconstruction, running multiple child-producing operators on one data source is currently uncommon.

With #2061 coming, however, child-producing operators may become more common. Scalars whose dimensions do not match the other scalars on a dataset are disposed of, so setting scalars with different dimensions is easier to do on a child than on the original dataset.

This issue is partly to document the current behavior of running multiple child-producing operators on one data source, and partly to discuss what the behavior should be. I think the big question is: should we be producing intermediate data sources, as some of the examples below do?

The attached operator is used in these examples. It simply produces a child that is the inverse of its parent.
[make_inverted_child.py.gz](https://github.com/OpenChemistry/tomviz/files/4325934/make_inverted_child.py.gz)
[make_inverted_child.json.gz](https://github.com/OpenChemistry/tomviz/files/4325935/make_inverted_child.json.gz)
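
For readers who do not want to unpack the attachments, here is a rough sketch of what such an operator might look like. It is not the attached file. `StandInDataset` is a hypothetical stand-in so the snippet runs outside tomviz, "inverse" is assumed to mean flipping each value about the data range (`max + min - value`), and the `"inverted"` key in the returned dictionary is also an assumption. Only `create_child_dataset()` is taken from this issue.

```python
class StandInDataset:
    """Hypothetical stand-in for the tomviz dataset object; not the real API."""

    def __init__(self, scalars):
        self.active_scalars = list(scalars)

    def create_child_dataset(self):
        # Assumption: the child starts out as a copy of the parent's scalars.
        return StandInDataset(self.active_scalars)


def transform(dataset):
    """Return a child whose scalars are the parent's, flipped about the range."""
    values = dataset.active_scalars
    lo, hi = min(values), max(values)
    child = dataset.create_child_dataset()
    # Assumed meaning of "inverse": the largest value becomes the smallest.
    child.active_scalars = [hi + lo - v for v in values]
    # Assumed return convention: a dict mapping a child name to the child.
    return {"inverted": child}


parent = StandInDataset([0, 2, 10])
first = transform(parent)["inverted"]
second = transform(first)["inverted"]
print(first.active_scalars)   # [10, 8, 0]
print(second.active_scalars)  # [0, 2, 10]
```

Under this assumed definition, running the operator twice gives back the original values, which makes it easy to tell in the screenshots below whether a step really inverted its input or just copied it.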

Internal pipeline with description.json
============================

![Screenshot from 2020-03-12 14-54-57](https://user-images.githubusercontent.com/9558430/76557213-82fe3000-6471-11ea-933d-6e5bc718efab.png)

This produces intermediate data sources, which are kept rather than deleted when the next operator runs. Each child's parent is the data source before it. The modules are moved down only once, when the first child is created, but the output of each step appears to be correct.

Internal pipeline with no description.json
===============================
![Screenshot from 2020-03-12 14-58-24](https://user-images.githubusercontent.com/9558430/76557451-f607a680-6471-11ea-8229-cb504d2d306b.png)

In this case, there are no intermediate data sources. However, the output is not correct, because the internal pipeline requires a `description.json`. Without one, it ignores the child and just copies the input down.

External pipeline with description.json
=============================

![Screenshot from 2020-03-12 15-02-34](https://user-images.githubusercontent.com/9558430/76557773-88a84580-6472-11ea-89d8-e5c213337a6b.png)

This produces intermediate data sources. The first child output is correct, but all subsequent child outputs are not: after the first one, it seems to just copy the input.

External pipeline with no description.json
===============================
![Screenshot from 2020-03-12 15-00-24](https://user-images.githubusercontent.com/9558430/76557623-4b43b800-6472-11ea-9e2c-271f0d418f33.png)

There are no intermediate data sources, but each output is correct: it properly inverts the data source each time.
