
Conversation


@woodtp woodtp commented Aug 22, 2025

Per suggestions from @shishlo, I investigated how frequently Bunch::print flushes the output buffer. This PR replaces all instances of std::endl with '\n', since the former triggers a flush for every line written to the output file; for very large bunches, this substantially inflates write times. I've also removed manual flushes of the output buffer, since the buffer is flushed automatically whenever it fills.

In testing with a bunch size of N=100_000_000, with MPI enabled on 8 cores, these changes reduced bunch write time by 60-70% across several trials, with no significant increase in RAM consumption relative to the existing implementation.

As an aside, I experimented with pre-allocating the output buffer at sizes ranging from 64 kB to 16384 kB, while also varying the chunkSize that controls how much data is communicated to the rank 0 process at once. Neither yielded any further decrease in write times.

Contributor

shishlo commented Aug 25, 2025

Thank you Tony! I could never have imagined that std::endl has such a harmful effect on file-writing performance. I always assumed it was used just to be consistent across Windows and Unix operating systems. I will try to avoid using it in the future.

@azukov azukov merged commit 7671a02 into PyORBIT-Collaboration:main Sep 2, 2025
5 checks passed
@woodtp woodtp deleted the improvement/faster-bunch-dump branch September 22, 2025 17:55
