Using select_streams argument traverses file twice

Currently, when one uses the `select_streams` argument, it will first traverse the file's chunks to find/parse the headers, and sets some flags used subsequently for a check during the `load_xdf` main loop, which traverses and parses the file again*. It does that using an alternative code path that mirrors the chunk traversal and header parsing of the main loop, but currently skips, among others, some error checking in case of file corruptions etc.

_Side note_: I do like the idea of breaking `load_xdf` up into smaller subroutines, and I like the generator approach of `_read_chunks`, i.e., there may be a possibility to refactor `load_xdf` in this overall style, especially if the logic were to get more complex over the years. However, as it stands, the current main loop is still pretty simple and easy to follow (esp when one reads it alongside the spec), and could continue to serve as an approachable reference implementation for future language ports for a couple more years. So I'm not ready to pick a side at this point, also considering the effort that a full & clean refactor would come down to.

For now, maybe a way to reconcile the code duplication (which I hope is temporary) and double-traversal of the file could be to add a self.skip (as in, skip processing chunks of this stream) bool in the `StreamData` constructor, and we could move the matching logic there or into a helper method/function. This way, it would run the first time a header is encountered, and then whenever one sees a chunk of that stream, one can, roughly, do an `if streams[StreamId].skip: continue` near the place where it currently does that check. We could then earmark the remainder of that alternative code path (parse_chunks, _read_chunks, ...) for future consideration when or if we take on a refactor of `load_xdf` in this general style (maybe with a git tag). 

I'd be willing to implement the suggested change (using `StreamData.skip`) this week if there's no objection. I think this may also get us a closer to a future simple and fast load-only-headers option for `load_xdf` (some time soon I'm hoping to have a separate discussion on that).

(*): The double-traversal may not sound like much, it'd be relatively more costly on a network file system.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Using select_streams argument traverses file twice #60

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Using select_streams argument traverses file twice #60

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions