Marked as low priority, since I don't think we have a pressing use case for this right now, but capturing the discussion from #11, quoting @makkus:
What is a file bundle
A data type that contains one or several files, each identified by an internal (relative) sub-path within the bundle. The contained files are usually related in some way that is relevant to the computations that will be done on them (for example, multiple text files belonging to the same corpus).
When would you use that rather than just importing lots of files individually?
Whenever you have files that share that context and would be fed into a downstream operation at the same time. Otherwise the downstream operation would need an input field for every individual file, which would be inefficient and only possible if you know exactly how many (sub-)files you will be dealing with.
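To make the definition a bit more concrete, here is a minimal plain-Python sketch of the idea: a bundle modelled as a mapping from relative sub-path to file content. This is only an illustration of the concept, not kiara's actual `file_bundle` implementation or import API; `build_bundle` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Dict


def build_bundle(root: str) -> Dict[str, bytes]:
    """Collect every file under `root`, keyed by its path relative to `root`.

    Hypothetical helper for illustration only, not part of kiara.
    """
    root_path = Path(root)
    return {
        # Use POSIX-style separators so sub-paths look the same on every OS.
        p.relative_to(root_path).as_posix(): p.read_bytes()
        for p in sorted(root_path.rglob("*"))
        if p.is_file()
    }
```

Calling `build_bundle("my_corpus")` on a folder of text files would give one object that carries the whole corpus, with each document addressable by its sub-path.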
Is there anything you can do with a file bundle you can't do with a file or vice versa?
Technically not, I guess, but the real question is which operations that make sense for a single file also make sense for a file bundle. The only thing I can think of is doing the same operation on every sub-file of a bundle, which would be very inefficient and painful to do manually, so it'd be nice to have a module that takes a file bundle and applies that operation to all included files. But we haven't had a use case like that so far, if I remember right.
For kiara's purposes, a file and a file_bundle are two different data types, and a module that takes one as input can't be used with the other. You'd have to use a 'pick.file' operation on a file bundle first, for example, if the operation you want to use has a single file input. Or you'd have to 'augment' a single file with an internal relative path (which basically means adding information to the data) if you wanted to convert a single file to a file_bundle (but that's not something we've had to do so far, I think).
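The "same operation on every sub-file" idea could be sketched roughly like this in plain Python. The bundle is again modelled as a dict of sub-path to content, and `map_bundle` and `count_words` are hypothetical names made up for this example; no such kiara module exists as far as this thread says.

```python
from typing import Callable, Dict


def map_bundle(
    bundle: Dict[str, str], operation: Callable[[str], int]
) -> Dict[str, int]:
    """Apply a single-file operation to every sub-file, keeping the sub-paths."""
    return {sub_path: operation(content) for sub_path, content in bundle.items()}


def count_words(text: str) -> int:
    """Example single-file operation: count whitespace-separated words."""
    return len(text.split())


# Toy corpus standing in for a bundle of text files (assumed data).
corpus = {
    "doc1.txt": "the quick brown fox",
    "letters/doc2.txt": "hello world",
}
word_counts = map_bundle(corpus, count_words)
```

The result keeps the bundle's shape (one entry per sub-path), which is what makes this more convenient than wiring up one input per file.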
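The two conversions described above could look something like the following in plain Python. This is a conceptual sketch only: `pick_file` and `to_bundle` are hypothetical stand-ins, not the real 'pick.file' operation or any kiara API, and the bundle is assumed to be a dict of sub-path to bytes.

```python
from typing import Dict


def pick_file(bundle: Dict[str, bytes], sub_path: str) -> bytes:
    """Bundle -> file: select one file by its relative sub-path."""
    if sub_path not in bundle:
        raise KeyError(f"no file at {sub_path!r} in bundle")
    return bundle[sub_path]


def to_bundle(content: bytes, sub_path: str) -> Dict[str, bytes]:
    """File -> bundle: 'augment' a single file with a relative sub-path.

    The sub-path is extra information the single file did not carry itself.
    """
    return {sub_path: content}


# Round trip on assumed toy data.
single = b"id,label\n1,a\n"
bundle = to_bundle(single, "nodes.csv")
picked = pick_file(bundle, "nodes.csv")
```

Note the asymmetry: picking a file drops information (the sub-path), while going the other way requires the caller to supply it.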
- Provide a code example of how to import a file bundle, in the context of an actual use case (e.g. you have a corpus of documents).
- Include an example of how to get individual files out of the bundle in case you need them.
- Discuss when you should use a file bundle instead of multiple files, and what the tradeoffs are. E.g. why do the network analysis examples have nodes and edges CSVs but usually import them separately (is this wrong?)