Skip to content

Feature add batchwise filtered reading#32

Open
smoosbau wants to merge 10 commits intoseemoo-lab:mainfrom
smoosbau:feature_add_batchwise_filtered_reading
Open

Feature add batchwise filtered reading#32
smoosbau wants to merge 10 commits intoseemoo-lab:mainfrom
smoosbau:feature_add_batchwise_filtered_reading

Conversation

@smoosbau
Copy link
Copy Markdown

@smoosbau smoosbau commented Dec 4, 2025

As I needed a possibility to (a) filter the data read from the binary files and (b) read the data batchwise due to the size (my measurements take about 8h (sampling rate 1024Hz)) I added this functionalities.

Filtering is added to the complete BinaryReader class, while batchwise reading is realized using a new subclass.

This Pull Request is currently a draft, as I'm not sure wether the maintainers prefere another realization of that features or discard it completet.
Thus, I'd like to ask @lumagi to review this draft and share his thoughts with me. Afterwards, I can either proceed with documentation and tests, change the design / implementation, or delete this PR.

Any feedback is appreciated. :)

@lumagi
Copy link
Copy Markdown
Collaborator

lumagi commented Dec 6, 2025

Hi, thank you for the PR and for the effort. Could you please format the code with Black? It's not really mentioned anywhere yet, but I recently formatted the entire code base with it. It looks like there are some formatting changes in the diff.

@lumagi
Copy link
Copy Markdown
Collaborator

lumagi commented Dec 6, 2025

@smoosbau I integrated the black code check into the pipeline. Can you merge main into your PR? Then it should automatically pick up the check as part of the pipeline.

@Basti-iTAS-PLMS
Copy link
Copy Markdown

Hi @lumagi,
sorry, I was busy the last days.
I'll make the requested changes asap! Thanks for checking my PR. :)

@smoosbau smoosbau marked this pull request as ready for review December 15, 2025 09:50
@smoosbau
Copy link
Copy Markdown
Author

@lumagi, formatting is done.
Please let me know what to do next. :)

@lumagi
Copy link
Copy Markdown
Collaborator

lumagi commented Dec 21, 2025

Hi, I took a thorough look through the code. Do I understand correctly that the reason behind your batched approach is to avoid loading all the 8h data into memory? I like the batched approach that you are using. I even think it is possible to still use a single Reader class that offers both a batched and a none-batched approach. If all data should be read, the Reader class internally uses the batched approach to read all data into a pre-created buffer. If the user wants to use a batched approach, they can directly use the public method for reading the data batched. I specifically like the yield semantics in Python for this. What do you think about this? Then there is no algorithmic duplication in the code.

@lumagi
Copy link
Copy Markdown
Collaborator

lumagi commented Dec 21, 2025

I didn't do this when I first wrote the code, but the data array for the non-batched read approach can even be preallocated since the file size is known.

@lumagi
Copy link
Copy Markdown
Collaborator

lumagi commented Dec 21, 2025

I have yet another question: Why did you always reset the sample counter for the synchronization offset in your batched read operation to 0? Would it not make more sense to keep the sample counter for proper global synchronization?

@smoosbau
Copy link
Copy Markdown
Author

Hi @lumagi,
sorry, for the late response. I've been on vacation.
I like the idea of fusing the batched-wise and complete reading approach into one. Didn't do it, because I wanted to keep code changes small (according to my result, achievement of this goal can be discussed :D).

According to the counter, I need to re-check that. I'm currently not sure which counter you mean.

Futhermore, I have some additional ideas for improvement we could / should discuss.
Are you in for a chat elswhere (i.e. MS Teams or Discord)?
I would like to boost reading time by not seeking so much through the files and avoid reloading the complete header everytime, as most of the data in there is static.
If wee agree on common goals, we could add issues and I could work on them. What do you think about that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants