Description
Original report (archived issue) by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).
There are a few ways we might speed up the overall rate at which shots are run. Some are pretty invasive, so this would not be a minor change, but the basic idea is to split up transition_to_buffered() and transition_to_manual() into multiple steps, and then a) only call the steps that are necessary, and b) run the steps that do not depend on previous steps simultaneously.
So for example, transition_to_manual() could be split into:
- Read data from hardware (this can be done before the shot is even complete)
- Save data to shot file (this can be done after the shot is complete, even whilst the next shot is running)
- Get hardware into an appropriate state for either a) another buffered run or b) actual transition to manual (`program_manual` could be skipped unless the queue is paused)
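The split above can be sketched as three independent functions; a caller can then run or skip each one as appropriate. This is an illustrative sketch only: the function names and the dict-based "device" stand in for the real BLACS worker API.

```python
# Hypothetical split of transition_to_manual() into independent steps.
# The dict-based "device" and "shot_file" are placeholders, not the real
# BLACS worker/h5py objects.

def read_data(device):
    """Step 1: read acquired data from hardware.
    Can start before the shot is even complete."""
    return {"trace": device["acquired"]}

def save_data(shot_file, data):
    """Step 2: write data to the shot file.
    Can run while the next shot is already executing."""
    shot_file.update(data)

def end_of_shot(device, queue_paused):
    """Step 3: put hardware into an appropriate state.
    program_manual (here: setting mode to 'manual') is skipped
    unless the queue is paused."""
    device["armed"] = True  # ready for another buffered run
    if queue_paused:
        device["mode"] = "manual"  # actual transition to manual
```

Because the three steps share no state beyond their arguments, the scheduler is free to reorder or overlap them relative to the shot boundary.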
transition_to_buffered() could be split into:
- Read instructions from the shot file (this can be done before the previous shot is complete, and can be completely skipped if it is known that the shot is an exact repeat of the previous one)
- Program the hardware
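A minimal sketch of that split, with the repeat-shot skip made explicit. The names (`read_instructions`, `program_hardware`, the module-level cache) are assumptions for illustration, not the existing API:

```python
# Hypothetical split of transition_to_buffered(): a read step that can run
# before the previous shot finishes (and is skipped for exact repeats),
# and a separate programming step.

_last_instructions = {}  # per-device cache of the previously parsed table

def read_instructions(shot_file, device_name, is_repeat):
    """Step 1: parse instructions from the shot file.
    Skipped entirely if the shot is an exact repeat of the previous one."""
    if is_repeat:
        return _last_instructions[device_name]
    instructions = shot_file[device_name]
    _last_instructions[device_name] = instructions
    return instructions

def program_hardware(device, instructions):
    """Step 2: write the instruction table to the device."""
    device["table"] = instructions
```

For a repeat shot the file is never touched, so a retriggerable device could skip both steps.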
Running as many of these steps as possible simultaneously, and skipping unnecessary ones, could go some way to speeding up the cycle time of BLACS. In the ideal case, devices that are retriggerable with the same data will not need any reconfiguration in between shots, and will contribute no overhead.
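Assuming the split steps above, the independent ones can be overlapped: saving the previous shot's data has no dependency on programming the next shot, so the two can run in a thread pool together. A hedged sketch (`save_previous`, `program_next`, `between_shots` are illustrative names):

```python
# Sketch: run independent between-shot steps simultaneously.
from concurrent.futures import ThreadPoolExecutor

def save_previous(results, store):
    # Save the previous shot's data (can overlap the next shot).
    store.append(results)

def program_next(device, instructions):
    # Program the hardware for the next shot.
    device["table"] = instructions

def between_shots(results, store, device, instructions):
    # The two steps share no state, so submit both at once.
    with ThreadPoolExecutor(max_workers=2) as pool:
        saving = pool.submit(save_previous, results, store)
        programming = pool.submit(program_next, device, instructions)
        saving.result()      # .result() also re-raises any exception
        programming.result()
```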
Profiling will reveal what the actual overhead is. If, after fixing the above sources of overhead (assuming they are what's dominating), it turns out that opening and closing HDF5 files is the slow part, then we could have some kind of intelligent "readahead": as soon as the shot arrives in BLACS, a single process reads the data in one hit, knowing from previous shots which groups and datasets a particular driver opened. The worker process would then see a proxy HDF5 file object which requires no zlock to open and which already has all data available, only opening the actual shot file if the driver attempts to read a group that was not read in advance. This would consume more RAM, so it should of course be disableable.
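The proxy idea might look like the following read-through cache. This is a plain-Python sketch, not real h5py: `preloaded` stands in for the data read ahead of time, and `open_file` stands in for opening the actual shot file (which, in the real system, is what requires the zlock):

```python
# Illustrative readahead proxy for an HDF5-like file. Names and structure
# are assumptions; a real version would wrap h5py objects.

class ReadaheadFile:
    def __init__(self, preloaded, open_file):
        self._cache = dict(preloaded)  # groups/datasets read ahead of time
        self._open_file = open_file    # fallback: open the actual shot file
        self.fallback_opens = 0        # count cache misses, for profiling

    def __getitem__(self, name):
        if name not in self._cache:
            # Driver asked for something not read in advance: only now do
            # we open the real shot file.
            self.fallback_opens += 1
            self._cache[name] = self._open_file()[name]
        return self._cache[name]
```

Drivers that only touch the same groups as on previous shots never trigger a real file open at all.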
These are the sorts of optimisations we could do, but before any of that I would want to do profiling: marking particular functions and when they were called, and gathering statistics to see where the bottlenecks actually are.
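Such instrumentation could be as simple as a decorator that records per-function durations. A minimal sketch (the decorator and the `timings` store are hypothetical, not existing BLACS code):

```python
# Minimal profiling instrumentation: mark functions and accumulate timing
# statistics so the dominant bottlenecks can be identified first.
import time
from collections import defaultdict
from functools import wraps

timings = defaultdict(list)  # function name -> list of durations (seconds)

def profiled(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            timings[func.__name__].append(time.perf_counter() - start)
    return wrapper

@profiled
def transition_to_buffered():
    time.sleep(0.01)  # stand-in for real work
```

After a run, `timings` gives call counts and total time per marked function, which is enough to decide which of the optimisations above is worth doing.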