
Commit 4cc7ac6

alextercete and idan-arm committed
remote/coordinator: run reservation scheduling under lock

Reservation scheduling mutates shared coordinator state (`reservations`,
`places`, and the derived `place.reservation` view), but the periodic
scheduler path was performing these updates without holding the
coordinator lock. This can lead to races between the poll-based scheduler
and concurrent RPC handlers (e.g. acquire/release/reservation operations),
potentially resulting in inconsistent state or incorrect reservation
transitions.

We now run reservation scheduling under the coordinator lock by making it
async and awaiting it in the poll path. This ensures that all mutations of
coordinator state are serialized consistently with the rest of the
coordinator operations.

Testing
=======
- Ran existing unit tests covering reservation and acquire/release flows.
- Exercised reservation creation/cancel and place acquire/release
  scenarios manually.
- No deadlocks were observed in these runs.
- This patch does not add a dedicated stress test, and the current tests
  do not prove the absence of deadlocks.

Performance
===========
- Measured `schedule_reservations` runtime under load. With 100 places and
  200 reservations created in parallel, observed runtime was ~0.0013 s,
  with worst case ~0.0036 s.
- RPC paths were already invoking reservation scheduling while holding the
  coordinator lock; the primary change is that the poll-based scheduler
  path is now also serialized under the same lock.

This patch focuses on fixing the scheduler locking model and does not
change coordinator scheduling behavior.

Signed-off-by: Alex Tercete <alex.tercete@arm.com>
Reviewed-by: Alex Tercete <alex.tercete@arm.com> # gatekeeper
Co-authored-by: Idan Saadon <idan.saadon@arm.com>
1 parent 06fd06c commit 4cc7ac6

1 file changed

Lines changed: 4 additions & 2 deletions

labgrid/remote/coordinator.py

Lines changed: 4 additions & 2 deletions
@@ -241,8 +241,9 @@ async def _poll_step_sync_resources(self):
 
     async def _poll_step_schedule(self):
         # update reservations
-        with warn_if_slow("schedule reservations"):
-            self.schedule_reservations()
+        async with self.lock:
+            with warn_if_slow("schedule reservations"):
+                self.schedule_reservations()
 
     async def poll(self, step_func):
         while not self.loop.is_closed():
@@ -956,6 +957,7 @@ def schedule_reservations(self):
         # only have a copy for convenience.
 
         # expire reservations
+        assert self.lock.locked()
         for res in list(self.reservations.values()):
             if res.state is ReservationState.acquired:
                 # acquired reservations do not expire
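
The locking pattern in this diff can be illustrated in isolation. The following is a minimal sketch, not labgrid's code: `MiniCoordinator`, `poll_step_schedule`, `rpc_create_reservation`, and the `schedule_runs` counter are hypothetical names chosen for the example, and it assumes the coordinator lock is an `asyncio.Lock`. It shows a synchronous scheduling method that asserts the lock is held, with both a poll path and an RPC-style path acquiring the same lock before calling it.

```python
import asyncio


class MiniCoordinator:
    """Hypothetical stand-in for the coordinator; not labgrid's actual class."""

    def __init__(self):
        self.lock = asyncio.Lock()
        self.reservations = {}
        self.schedule_runs = 0  # illustration only: counts scheduler passes

    def schedule_reservations(self):
        # Synchronous on purpose: it never awaits, so once the caller holds
        # the lock, no other lock-holding coroutine can interleave with it.
        assert self.lock.locked()
        self.schedule_runs += 1

    async def poll_step_schedule(self):
        # Poll path: take the lock before touching shared state.
        async with self.lock:
            self.schedule_reservations()

    async def rpc_create_reservation(self, token):
        # RPC-style path: mutates state and reschedules under the same lock.
        async with self.lock:
            self.reservations[token] = "waiting"
            await asyncio.sleep(0)  # yield to the loop while holding the lock
            self.schedule_reservations()


async def demo():
    coord = MiniCoordinator()
    # Both paths run concurrently; the lock serializes their critical sections.
    await asyncio.gather(
        coord.rpc_create_reservation("res-1"),
        coord.poll_step_schedule(),
    )
    return coord


async def call_without_lock():
    # Calling the scheduler without the lock trips the assertion.
    coord = MiniCoordinator()
    try:
        coord.schedule_reservations()
    except AssertionError:
        return True
    return False


if __name__ == "__main__":
    coord = asyncio.run(demo())
    print(coord.schedule_runs, coord.reservations)
    print(asyncio.run(call_without_lock()))
```

Note that `assert` statements are skipped when Python runs with `-O`, so a check like `assert self.lock.locked()` guards against programming mistakes during development and testing rather than enforcing the invariant in every deployment.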

0 commit comments
