feat: function to collect bunch across MPI ranks into a single Python dictionary #72
Conversation
I don't see why not, if that's preferable. That layout seems more consistent with how a Bunch stores this information internally.
A dictionary containing the collected bunch attributes. Returns None if not on the root MPI rank or if the global bunch size is 0.

BunchDict structure:

    {
        "coords": NDArray[np.float64] of shape (N, 6), where N is the total number of macroparticles
            and the 6 columns correspond to [x, xp, y, yp, z, dE] in units of [m, rad, m, rad, m, eV], respectively,
        "sync_part": {
            "coords": NDArray[np.float64] of shape (3,),
            "kin_energy": np.float64,
            "momentum": np.float64,
            "beta": np.float64,
            "gamma": np.float64,
            "time": np.float64
        },
        "attributes": {
            <bunch attribute name>: <attribute value (np.float64 or np.int32)>,
            ...
        }
    }
@austin-hoover, I incorporated your request. See the lines quoted above for the updated list of keys/values.
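For readers skimming the thread, here is the same structure rendered as Python typing. This is only an illustrative sketch: `SyncPartDict` and the `TypedDict` rendering are my own naming, not necessarily how the PR defines `BunchDict`.

```python
from typing import TypedDict

import numpy as np
from numpy.typing import NDArray


class SyncPartDict(TypedDict):
    # Synchronous-particle quantities; "coords" has shape (3,), the rest are scalars.
    coords: NDArray[np.float64]
    kin_energy: np.float64
    momentum: np.float64
    beta: np.float64
    gamma: np.float64
    time: np.float64


class BunchDict(TypedDict):
    # (N, 6) array of [x, xp, y, yp, z, dE] in [m, rad, m, rad, m, eV].
    coords: NDArray[np.float64]
    sync_part: SyncPartDict
    # Scalar bunch attributes keyed by name.
    attributes: dict[str, np.float64 | np.int32]
```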
@azukov correctly pointed out to me that there was an issue with the implementation in the case where the global bunch size wasn't divisible by the total number of MPI nodes. I have rewritten the utility entirely to resolve that issue while also avoiding loading the entire bunch into RAM, unless explicitly requested. It now works by creating a memory-mapped array on disk instead of holding everything in memory. I timed collecting a bunch with 100M particles using the new method, split across 7 nodes (to force an uneven distribution), and here are the results:
    # file_desc, fname = tempfile.mkstemp(suffix=".dat", prefix="collect_bunch_", dir="/tmp")
    # os.close(file_desc)
    #
    # TODO: this doesn't seem to work. "SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats"
Note: Initially, I tried to use the built-in tempfile module to initialize the temporary file for the memmap. When attempting to broadcast the name of the file to the other nodes, it seems like I'm running into an instance of #65... So, I'm hard-coding the name of the file for now.
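For context, a minimal sketch of the approach that hit the broadcast problem. mpi4py is used here purely for illustration (the PR presumably goes through the project's own MPI bindings, which is where #65 shows up); the filenames mirror the commented-out snippet above.

```python
import os
import tempfile

from mpi4py import MPI  # illustration only; not necessarily the MPI layer used in the PR

comm = MPI.COMM_WORLD
fname = None
if comm.Get_rank() == 0:
    # Root rank creates a unique temporary file to back the memmap...
    file_desc, fname = tempfile.mkstemp(suffix=".dat", prefix="collect_bunch_", dir="/tmp")
    os.close(file_desc)

# ...then broadcasts the name so every rank uses the same path. Broadcasting the
# string is the step that reportedly trips over #65, hence the hard-coded
# filename workaround in the PR.
fname = comm.bcast(fname, root=0)
```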
Force-pushed 46848ca to 79ee8cb
Ok, it still wasn't working properly after some particles in the bunch were lost and I was getting some fun artifacts when I tried plotting emittance spectra. I was able to fix that by writing bunch coords to separate mem-mapped arrays on each MPI node, then concatenating them on the primary rank at the end of the routine. I also squashed the previous commits for readability.
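Roughly, the fixed pattern looks like the sketch below. This is only an approximation: the file-naming scheme, the use of mpi4py, and the dummy per-rank data are assumptions, and it presumes every rank can see the same /tmp (single node or shared filesystem); the real routine pulls the coordinates from the `Bunch` and cleans up the intermediate files afterwards.

```python
import numpy as np
from mpi4py import MPI  # illustration only

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Stand-in for this rank's surviving particles; uneven counts across ranks are fine.
local_coords = np.random.default_rng(rank).normal(size=(1000 + rank, 6))

# Each rank writes its own coordinates to a per-rank memmap on disk.
local_path = f"/tmp/collect_bunch_rank{rank}.dat"
local_mm = np.memmap(local_path, dtype=np.float64, mode="w+", shape=local_coords.shape)
local_mm[:] = local_coords
local_mm.flush()

# The primary rank learns each rank's particle count, then concatenates the
# per-rank memmaps into a single (N_total, 6) memmap.
counts = comm.gather(local_coords.shape[0], root=0)
comm.Barrier()  # make sure every rank has flushed before root reads

if rank == 0:
    total = sum(counts)
    out = np.memmap("/tmp/collect_bunch_all.dat", dtype=np.float64, mode="w+", shape=(total, 6))
    offset = 0
    for r, n in enumerate(counts):
        part = np.memmap(f"/tmp/collect_bunch_rank{r}.dat", dtype=np.float64, mode="r", shape=(n, 6))
        out[offset:offset + n] = part
        offset += n
    out.flush()
```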
Force-pushed 19150c1 to 2de7945
Please add a benchmark for reading the bunch from a binary file.
@azukov here are some benchmarking results. I realized, after the fact, that I hadn't rebased on main after merging #75, so I included benchmarks with and without that PR as well. FWIW, the final file sizes for this size bunch are 7.8G and 4.5G for the ASCII and binary formats, respectively.
…nary of numpy objects

Each MPI node writes the bunch coordinates to a memory-mapped numpy array in /tmp. The primary rank concatenates them into a single memory-mapped array, and the extras are removed from disk. Also introduces a FileHandler protocol, which can define the schema for handling different filetypes, e.g., numpy binaries, HDF5, etc. The desired FileHandler can be passed as an argument to the functions in `collect_bunch.py`.
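A hedged sketch of what such a protocol could look like. The method names and the numpy-backed implementation below are illustrative guesses, not the PR's actual `FileHandler` definition.

```python
from typing import Protocol

import numpy as np
from numpy.typing import NDArray


class FileHandler(Protocol):
    """Schema for writing/reading a rank's (n, 6) coordinate array (illustrative)."""

    def write(self, path: str, coords: NDArray[np.float64]) -> None: ...

    def read(self, path: str) -> NDArray[np.float64]: ...


class NumpyBinaryHandler:
    """One possible implementation backed by .npy files."""

    def write(self, path: str, coords: NDArray[np.float64]) -> None:
        np.save(path, coords)

    def read(self, path: str) -> NDArray[np.float64]:
        return np.load(path)
```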
Force-pushed 2de7945 to dc24558
@woodtp I think it's all good now.

This PR adds `collect_bunch` to `orbit.bunch_utils`, which accepts a `Bunch` and collects its attributes across MPI ranks into a Python dictionary. The 6 coordinates (x, px, y, py, z, dE) are stored in the dictionary at key `"coords"` in an `np.ndarray` with shape `(N, 6)`. Attributes of the synchronous particle, and those contained in the bunch structure retrieved via `Bunch.bunchAttrDouble` or `Bunch.bunchAttrInt`, are stored in their own dictionaries nested at keys `"sync_part"` and `"attributes"`, respectively, as either `np.float64` or `np.int32` where appropriate.

Since the output of `collect_bunch` is a dictionary of numpy objects, the user has the flexibility to pass it directly to visualization libraries and/or store the output in whichever format they wish. E.g.,
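a sketch along these lines (illustrative only; it assumes `bunch` is an existing `Bunch`, and the specific plotting/saving calls are my own, not from the PR):

```python
import matplotlib.pyplot as plt
import numpy as np

from orbit.bunch_utils import collect_bunch

bunch_dict = collect_bunch(bunch)  # `bunch` is an existing Bunch; returns None off the root rank
if bunch_dict is not None:
    coords = bunch_dict["coords"]  # (N, 6): [x, xp, y, yp, z, dE]

    # Pass directly to a visualization library...
    plt.hist2d(coords[:, 0], coords[:, 1], bins=200)
    plt.savefig("x_xp_density.png")

    # ...and/or persist in whatever format is convenient.
    np.save("bunch_coords.npy", coords)
```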