**Is your feature request related to a problem? Please describe.**
I want to know how to accomplish the following. It doesn't need to require zero effort on the user's part, but there needs to be a clear best-practices path in Hangar to a workable setup.
Take a source data format that is complex (e.g. DICOM, JPEG) and infeasible to reconstitute bit-exact from the tensor+metadata form. Each instance of the raw data produces a sample that consists of 2 (or more) tensors: an image tensor, and a 1D tensor that encodes things like lat/long or age/sex/etc. (to be concatenated with the output of the convolutional layers prior to the fully connected layers). To be clear, this is intended to be an illustrative example, not a concrete use case.
Per my reading of the docs, right now these two tensors wouldn't qualify as belonging to the same Hangar dataset (it's not clear whether that's problematic or not).
Let's express the above conversion as:

`f_v1(raw) -> (t1, t2)`
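As a concrete sketch, such a conversion function might look like the following. The record fields (`pixels`, `lat`, etc.) are hypothetical placeholders standing in for a real DICOM/JPEG decode, not an actual pipeline:

```python
import numpy as np

def f_v1(raw_record):
    """Convert one raw record into an (image, side-info) tensor pair.

    `raw_record` stands in for an already-decoded DICOM/JPEG record;
    the field names below are illustrative placeholders.
    """
    # t1: the image tensor, e.g. H x W (or H x W x C) float32 pixels.
    t1 = np.asarray(raw_record["pixels"], dtype=np.float32)
    # t2: a 1-D tensor of scalar side information (lat/long, age, ...),
    # meant to be concatenated after the convolutional layers.
    t2 = np.array(
        [raw_record["lat"], raw_record["long"], raw_record["age"]],
        dtype=np.float32,
    )
    return t1, t2
```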
Users will need to:

- Update the conversion function to `f_v2` and repopulate `t1` and `t2`.
- Update the conversion function to `f_v3`, which outputs `(t1, t2, t3)`.
- Update the raw data for a sample and repopulate `t1` and `t2`.
- Be handed the pair of `t1` and `t2` for training/validation (including when training is randomized).
- Retrieve the raw data given IDs/tags/metadata included with the training sample (for use in an external viewer, manual investigation, etc.).
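The "update the conversion function and repopulate" steps above can be sketched as a plain loop. A dict keyed by sample id stands in for whatever Hangar-backed storage this would actually use; `f_v2` is a made-up revision of the earlier conversion:

```python
import numpy as np

def f_v2(raw_record):
    # Hypothetical revised conversion: same tensor pair as f_v1,
    # but with pixel values normalized to [0, 1].
    t1 = np.asarray(raw_record["pixels"], dtype=np.float32) / 255.0
    t2 = np.array([raw_record["age"]], dtype=np.float32)
    return t1, t2

def repopulate(raw_records, convert, store):
    """Re-run a conversion function over every raw record and overwrite
    the stored tensors, keyed by sample id.

    `store` is a plain dict standing in for the real storage layer.
    """
    for sample_id, raw in raw_records.items():
        t1, t2 = convert(raw)
        store[sample_id] = {"t1": t1, "t2": t2}
    return store
```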
**Describe the solution you'd like**
I think that changing the definition of a sample to be a tuple of binary blobs plus a tuple of tensors plus metadata would work, but I haven't fully considered the impact of that kind of change. It seems potentially large.
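One way to picture that proposed sample definition, as a rough sketch only (the type and field names are mine, not anything Hangar actually defines):

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

import numpy as np

@dataclass
class Sample:
    """Hypothetical sample: raw blobs + derived tensors + metadata."""
    blobs: Tuple[bytes, ...]          # original raw bytes (DICOM/JPEG)
    tensors: Tuple[np.ndarray, ...]   # (t1, t2, ...) produced by f_vN
    metadata: Dict[str, str] = field(default_factory=dict)  # ids/tags
```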
**Describe alternatives you've considered**
Another option would be to have separate datasets for `t1` and `t2` and combine them manually, plus manage the binary blobs separately. That seems like a lot of infrastructure work, and risks drift both between the tensor datasets themselves and between the tensors and the blobs.
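A minimal sketch of that manual combination, using plain dicts keyed by sample id in place of real Hangar datasets. The failure it raises on is exactly the drift risk described above:

```python
def combine(t1_store, t2_store, blob_index):
    """Join per-sample tensors from two stores plus a raw-blob lookup.

    Raises KeyError if the stores have drifted (a sample id present in
    one store but missing from another).
    """
    keys = set(t1_store) | set(t2_store) | set(blob_index)
    samples = {}
    for k in sorted(keys):
        if not (k in t1_store and k in t2_store and k in blob_index):
            raise KeyError(f"sample {k!r} missing from one of the stores")
        samples[k] = (t1_store[k], t2_store[k], blob_index[k])
    return samples
```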
**Additional context**
I suspect that I want/expect Hangar to solve a larger slice of the problem than it's intended to, but it's not clear at first glance what the intended approach is for more complicated setups like the one above.