Description
Currently, when one uses the select_streams argument, load_xdf first traverses the file's chunks to find and parse the headers, and sets some flags that are subsequently checked during the load_xdf main loop, which traverses and parses the file again*. The first pass uses an alternative code path that mirrors the chunk traversal and header parsing of the main loop, but currently skips, among other things, some error checking for file corruption etc.
Side note: I do like the idea of breaking load_xdf up into smaller subroutines, and I like the generator approach of _read_chunks, i.e., there may be a possibility to refactor load_xdf in this overall style, especially if the logic were to get more complex over the years. However, as it stands, the current main loop is still pretty simple and easy to follow (esp when one reads it alongside the spec), and could continue to serve as an approachable reference implementation for future language ports for a couple more years. So I'm not ready to pick a side at this point, also considering the effort that a full & clean refactor would come down to.
For now, maybe a way to reconcile the code duplication (which I hope is temporary) and double-traversal of the file could be to add a self.skip (as in, skip processing chunks of this stream) bool in the StreamData constructor, and we could move the matching logic there or into a helper method/function. This way, it would run the first time a header is encountered, and then whenever one sees a chunk of that stream, one can, roughly, do an if streams[StreamId].skip: continue near the place where it currently does that check. We could then earmark the remainder of that alternative code path (parse_chunks, _read_chunks, ...) for future consideration when or if we take on a refactor of load_xdf in this general style (maybe with a git tag).
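A rough sketch of the idea, not the actual pyxdf code: the StreamData class below is a simplified stand-in, the matches helper and the load function are hypothetical names, and the chunks are faked as (tag, stream_id, payload) tuples instead of being parsed from a file. It also assumes that select_streams can be an int, a list of ints, or a list of dicts of header fields.

```python
class StreamData:
    """Simplified, hypothetical stand-in for pyxdf's StreamData."""

    def __init__(self, header, select_streams=None):
        self.stream_id = header["stream_id"]
        self.name = header.get("name")
        self.type = header.get("type")
        # Decided once, the first time the stream's header is encountered.
        self.skip = not matches(header, select_streams)
        self.samples = []


def matches(header, select_streams):
    """Hypothetical helper: does this header satisfy select_streams?"""
    if select_streams is None:
        return True  # no selection means every stream is loaded
    if isinstance(select_streams, int):
        select_streams = [select_streams]
    for criterion in select_streams:
        if isinstance(criterion, int):
            if header["stream_id"] == criterion:
                return True
        elif all(header.get(key) == value for key, value in criterion.items()):
            return True
    return False


def load(chunks, select_streams=None):
    """Single pass over (tag, stream_id, payload) tuples (a fake chunk format)."""
    streams = {}
    for tag, stream_id, payload in chunks:
        if tag == "header":
            streams[stream_id] = StreamData(payload, select_streams)
        elif tag == "samples":
            if streams[stream_id].skip:
                continue  # skip processing chunks of this stream
            streams[stream_id].samples.extend(payload)
    # Only the selected streams are returned.
    return {sid: s for sid, s in streams.items() if not s.skip}
```

With something like this, the file is traversed only once, and the header parsing and error checking stay on the main loop's code path.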
I'd be willing to implement the suggested change (using StreamData.skip) this week if there's no objection. I think this may also get us closer to a future simple and fast load-only-headers option for load_xdf (some time soon I'm hoping to have a separate discussion on that).
(*): The double-traversal may not sound like much, but it'd be relatively more costly on a network file system.