How do I import a bundle of files into Kiara via the python API? #13

Open
@caro401

Marked as low-priority, since I don't think we have a pressing use case for this right now, but capturing the discussion from #11, quoting @makkus:

What is a file bundle?

A data type that contains one or several files, each identified by an internal (relative) sub-path within the bundle. The contained files are usually related in some way that is relevant to the computations that will be done on them (for example, multiple text files belonging to the same corpus).
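To make the "relative sub-path" idea concrete, here is a minimal plain-Python sketch of the concept. It is not kiara's API or its internal representation: the `collect_bundle` helper and the dict-based bundle are hypothetical stand-ins, and the corpus files are made up for the demo.

```python
import tempfile
from pathlib import Path


def collect_bundle(root: Path) -> dict[str, str]:
    """Map each file's sub-path (relative to root, POSIX style) to its text.

    Hypothetical helper for illustration only, not part of kiara.
    """
    return {
        p.relative_to(root).as_posix(): p.read_text(encoding="utf-8")
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


# Build a tiny throwaway corpus on disk and collect it into one bundle.
with tempfile.TemporaryDirectory() as tmp:
    corpus = Path(tmp)
    (corpus / "letters").mkdir()
    (corpus / "letters" / "1901.txt").write_text("Dear Anna", encoding="utf-8")
    (corpus / "readme.txt").write_text("A tiny corpus", encoding="utf-8")
    bundle = collect_bundle(corpus)

# The keys are the internal relative sub-paths that identify each file.
bundle_paths = sorted(bundle)
```

The point of the sketch is only that every file keeps an identifier relative to the bundle root, so the bundle stays meaningful when the folder is moved.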

When would you use that rather than just importing lots of files individually?

Whenever you have files that share that context and would be fed into a downstream operation at the same time. Otherwise the downstream operation would need an input field for every individual file, which would be inefficient and only possible if you know exactly how many (sub-)files you will be dealing with.
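The "one input field per file" problem can be sketched in plain Python. This is an illustration under assumptions, not kiara code: the bundle is modelled as a hypothetical dict from relative sub-path to text, and `total_word_count` is a made-up downstream operation.

```python
from typing import Mapping


def total_word_count(bundle: Mapping[str, str]) -> int:
    """Hypothetical downstream operation with a single bundle input.

    It works for any number of files, because it never names them individually.
    """
    return sum(len(text.split()) for text in bundle.values())


# The same operation handles two files or three without changing its signature.
small = {"a.txt": "one two", "b.txt": "three"}
large = {**small, "sub/c.txt": "four five six"}

small_total = total_word_count(small)
large_total = total_word_count(large)
```

An equivalent operation with individual file inputs would need a different signature for each corpus size, which is the inefficiency described above.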

Is there anything you can do with a file bundle that you can't do with a file, or vice versa?

Technically not, I guess, but the question really is which operations that make sense for a single file also make sense for a file bundle. The only thing I can think of is doing the same operation on every sub-file of a bundle, which would be very inefficient and painful to do manually, so it'd be nice to have a module that takes a file bundle and performs that operation on all included files. But we haven't had a use case like that so far, if I remember right.
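A rough sketch of what such a "do this for every included file" step could look like, again in plain Python rather than as a kiara module. The `map_bundle` and `count_lines` names are hypothetical, and the bundle is modelled as a dict from relative sub-path to text.

```python
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")


def map_bundle(bundle: Mapping[str, str], op: Callable[[str], T]) -> dict[str, T]:
    """Apply a single-file operation to every file, keyed by relative sub-path."""
    return {rel_path: op(text) for rel_path, text in bundle.items()}


def count_lines(text: str) -> int:
    """Example single-file operation: count the lines in one file."""
    return len(text.splitlines())


# Made-up bundle contents for the demo.
bundle = {
    "nodes.csv": "id,label\n1,a\n2,b\n",
    "edges.csv": "source,target\n1,2\n",
}
line_counts = map_bundle(bundle, count_lines)
```

Keeping the result keyed by sub-path means each output can still be traced back to the file it came from.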

For kiara's purposes, a file and a file_bundle are two different data types, and a module that takes one as input can't be used with the other. If an operation you want to use has a single file input, you'd first have to run something like a 'pick.file' operation on the file bundle. Conversely, to convert a single file into a file_bundle you'd have to 'augment' it with an internal relative path (which basically means adding information to the data), but that's not something we've had to do so far, I think.
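The two conversions described above can be sketched conceptually as follows. This is plain Python under assumptions, not the implementation of kiara's 'pick.file' operation: `pick_file` and `as_bundle` are hypothetical names, and the bundle is modelled as a dict from relative sub-path to text.

```python
def pick_file(bundle: dict[str, str], rel_path: str) -> str:
    """Bundle -> single file: select one file by its relative sub-path."""
    try:
        return bundle[rel_path]
    except KeyError:
        available = sorted(bundle)
        raise KeyError(f"no file at {rel_path!r}; available: {available}") from None


def as_bundle(text: str, rel_path: str) -> dict[str, str]:
    """Single file -> bundle: the caller has to supply the missing sub-path."""
    return {rel_path: text}


# Made-up bundle contents for the demo.
bundle = {"corpus/a.txt": "alpha", "corpus/b.txt": "beta"}
picked = pick_file(bundle, "corpus/b.txt")
wrapped = as_bundle(picked, "b.txt")
```

Picking discards information (the other files), while wrapping needs extra information (the sub-path), which is why the two types are not interchangeable.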

  • Provide a code example of how to import a file bundle, in the context of an actual use case (e.g. you have a corpus of documents).
  • Include an example of how to get individual files out of the bundle in case you need them.
  • Discuss when you should use a file bundle instead of multiple files, and what the tradeoffs are. For example, why do the network analysis examples have nodes and edges CSVs but usually import them separately (is this wrong?)

Labels

  • how-to: Request or outline for a how-to or tutorial type doc
  • low-priority: Things we don't have resources to address right now
  • python API: Docs about how to use kiara via Python API (in jupyter or otherwise)
