Create LLM-optimized text representation of the contents of an NWB file #104

@rly

It would be useful to have a text/JSON-based representation of an NWB file for feeding into an LLM. That LLM could then generate code to explore the file or plot data from the file. It could also create a text-based summary of the file contents.

Initial thoughts from @magland:

For the purpose of feeding it to an LLM, I think a transformation would make sense:

  • Remove all of the references to remote data chunks
  • Keep the group and dataset attributes (but adjust the attribute values where necessary to be text readable)
  • Keep the dataset values for only small datasets (such as strings) but apply suitable transformations so they are readable text (e.g. no base64 or other encodings)
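
The three steps above could be sketched roughly as follows. This is only an illustration, assuming the file is available as a kerchunk-style reference mapping in which metadata keys end in `.zattrs`, `.zgroup`, or `.zarray`, remote chunks are `[url, offset, length]` lists, and inline chunks are strings with an optional `base64:` prefix. The function name `summarize_refs` and the `max_inline_chars` threshold are hypothetical, not part of any existing API.

```python
import base64
import json

METADATA_NAMES = (".zattrs", ".zgroup", ".zarray")


def summarize_refs(refs, max_inline_chars=200):
    """Return a text-friendly copy of a kerchunk-style reference mapping.

    Hypothetical helper: assumes `refs` maps keys such as "group/.zattrs"
    or "group/dataset/0" to JSON strings, inline strings, or
    [url, offset, length] lists.
    """
    out = {}
    for key, value in refs.items():
        name = key.rsplit("/", 1)[-1]
        if name in METADATA_NAMES:
            # Keep group/dataset metadata, parsed into plain JSON objects.
            out[key] = json.loads(value) if isinstance(value, str) else value
        elif isinstance(value, list):
            # Reference to a remote data chunk: drop it.
            continue
        elif isinstance(value, str):
            if value.startswith("base64:"):
                raw = base64.b64decode(value[len("base64:"):])
                try:
                    text = raw.decode("utf-8")
                except UnicodeDecodeError:
                    # Binary payload that is not readable text: drop it.
                    continue
            else:
                text = value
            # Keep only small inline values.
            if len(text) <= max_inline_chars:
                out[key] = text
    return out
```

The result is a plain dictionary, so it can be passed to `json.dumps(..., indent=2)` to produce the text handed to the LLM.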

And from me:

In particular, for small columns in the trials and units tables, it might be useful for the LLM to know which values are observed. For example, a "stimulus_type" column might take five values such as "beach", "tool", "person", "building", and "forest". Those column values could be included in the file, or a possible_values list could be extracted.
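
A minimal sketch of the possible_values idea, assuming the table columns have already been read into plain Python lists of hashable values. The function name `possible_values` and the `max_unique` cutoff are hypothetical choices for illustration.

```python
def possible_values(columns, max_unique=10):
    """Map each low-cardinality column name to its distinct observed values.

    Hypothetical helper: `columns` is a dict of column name -> list of
    hashable values. Columns with more than `max_unique` distinct values
    (e.g. timestamps) are omitted from the result.
    """
    summary = {}
    for name, values in columns.items():
        seen = set()
        unique = []  # preserves first-seen order
        for v in values:
            if v not in seen:
                seen.add(v)
                unique.append(v)
                if len(unique) > max_unique:
                    break  # too many distinct values; stop scanning
        if len(unique) <= max_unique:
            summary[name] = unique
    return summary


trials = {
    "stimulus_type": ["beach", "tool", "beach", "person", "tool"],
    "start_time": [float(i) for i in range(100)],
}
print(possible_values(trials))
```

Here only `stimulus_type` would be reported, since `start_time` has far more than ten distinct values.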
