Skip to content

Commit 217f4b8

Browse files
coordinator/client: add lease-based place locking
Place acquisition is currently indefinite, which can lead to stale locks when users stop interacting with a place without releasing it. Add a lease-based locking mode that allows places to be acquired for a limited time and expire automatically unless extended. Leases are tied to the lifetime of a reservation. This prevents stale locks and reduces the need for manual cleanup. Signed-off-by: Asher Pemberton <asher.pemberton@arm.com> Reviewed-by: Asher Pemberton <asher.pemberton@arm.com> # gatekeeper Co-authored-by: Idan Saadon <idan.saadon@arm.com>
1 parent 4e6d27e commit 217f4b8

14 files changed

Lines changed: 1542 additions & 261 deletions

doc/getting_started.rst

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -311,6 +311,14 @@ To interact with this place, it needs to be acquired first, this is done by
311311
312312
labgrid-venv $ labgrid-client -p example-place acquire
313313
314+
For short-lived access, a place can also be leased using a reservation:
315+
316+
.. code-block:: bash
317+
318+
labgrid-venv $ labgrid-client reserve board=example --shell
319+
labgrid-venv $ labgrid-client wait
320+
labgrid-venv $ labgrid-client -p + lease
321+
314322
Now we can connect to the serial console:
315323

316324
.. code-block:: bash

doc/man/client.rst

Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -88,6 +88,23 @@ a name defined within the exporter, cls is the class of the exported resource
8888
and name is its name. Wild cards in match patterns are explicitly allowed, *
8989
matches anything.
9090

91+
Leases
92+
------
93+
A lease is a time-limited acquisition of a place.
94+
95+
Leases are always associated with a reservation. The reserved place is referenced
96+
using ``-p +``. To keep a lease active, the reservation must be extended using
97+
the ``extend`` command.
98+
99+
Use ``labgrid-client extend`` to refresh the lease once, or
100+
``labgrid-client extend --keepalive`` to keep it alive automatically.
101+
102+
If a lease extension fails, the coordinator cancels the lease and releases the
103+
place to avoid leaving the system in an inconsistent state.
104+
105+
If the reservation expires, the coordinator automatically releases the leased
106+
place.
107+
91108
Adding Named Resources
92109
----------------------
93110
If a target contains multiple Resources of the same type, named matches need to
@@ -122,6 +139,28 @@ Open a console to the acquired place:
122139
123140
$ labgrid-client -p <placename> console
124141
142+
Lease a place using a reservation:
143+
144+
.. code-block:: bash
145+
146+
$ labgrid-client reserve board=imx8 --shell
147+
$ labgrid-client wait
148+
$ labgrid-client -p + lease
149+
150+
Extend the lease to keep it alive:
151+
152+
.. code-block:: bash
153+
154+
# refresh once by passing the token explicitly
155+
$ labgrid-client extend ABC123
156+
157+
# or take the token from the environment (LG_TOKEN)
158+
$ export LG_TOKEN=ABC123
159+
$ labgrid-client extend
160+
161+
# keep the lease alive automatically
162+
$ labgrid-client extend --keepalive
163+
125164
Add all resources with the group "example-group" to the place example-place:
126165

127166
.. code-block:: bash

doc/usage.rst

Lines changed: 60 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -117,6 +117,7 @@ can lock that place.
117117
timeout: 2019-08-06 12:59:11.840780
118118
$ labgrid-client -p +SP37P5OQRU console
119119
120+
120121
When using reservation in a CI job or to save some typing, the ``labgrid-client
121122
reserve`` command supports a ``--shell`` command to print code for evaluating
122123
in the shell.
@@ -166,6 +167,65 @@ allocated before returning.
166167
A reservation will time out after a short time, if it is neither refreshed nor
167168
used by locked places.
168169

170+
Leasing a place using a reservation
171+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
172+
173+
After a reservation has been allocated, the reserved place can be acquired
174+
either indefinitely or as a lease.
175+
176+
First, create a reservation and export the reservation token:
177+
178+
.. code-block:: bash
179+
180+
$ labgrid-client reserve board=imx8 --shell
181+
export LG_TOKEN=ABC123
182+
183+
Once the reservation is allocated, the reserved place can be referenced using
184+
``-p +``, which expands to the place allocated to the reservation.
185+
186+
.. code-block:: bash
187+
188+
$ labgrid-client -p + lease
189+
leased place board-01
190+
191+
A lease is time-limited and must be kept alive by extending the reservation.
192+
This is done using the ``extend`` command.
193+
194+
To refresh the reservation once:
195+
196+
.. code-block:: bash
197+
198+
$ labgrid-client extend ABC123
199+
200+
The reservation token can also be taken from the environment variable
201+
``LG_TOKEN``:
202+
203+
.. code-block:: bash
204+
205+
$ export LG_TOKEN=ABC123
206+
$ labgrid-client extend
207+
208+
To continuously keep the reservation alive, use the keepalive mode:
209+
210+
.. code-block:: bash
211+
212+
$ labgrid-client extend --keepalive
213+
214+
In keepalive mode, the client automatically renews the lease at a suitable interval.
215+
216+
If a lease extension fails (for example due to exporter issues), the lease is
217+
cancelled and the place is released to avoid inconsistent state.
218+
219+
To acquire the place indefinitely (unchanged behaviour), use ``lock`` or
220+
``acquire`` without lease mode:
221+
222+
.. code-block:: bash
223+
224+
$ labgrid-client -p + lock
225+
# or
226+
$ labgrid-client -p + acquire
227+
228+
169229
Library
170230
-------
171231
labgrid can be used directly as a Python library, without the infrastructure

labgrid/remote/client.py

Lines changed: 152 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -383,6 +383,19 @@ async def print_resources(self):
383383
else:
384384
print(line)
385385

386+
async def _get_reservation_by_token(self, token):
387+
request = labgrid_coordinator_pb2.GetReservationsRequest()
388+
try:
389+
response = await self.stub.GetReservations(request)
390+
except grpc.aio.AioRpcError as e:
391+
raise ServerError(e.details()) from e
392+
393+
for res_pb in response.reservations:
394+
res = Reservation.from_pb2(res_pb)
395+
if res.token == token:
396+
return res
397+
return None
398+
386399
async def print_places(self):
387400
"""Print out the places"""
388401
if self.args.sort_last_changed:
@@ -525,8 +538,13 @@ def get_acquired_place(self, place=None):
525538
async def print_place(self):
526539
"""Print out the current place and related resources"""
527540
place = self.get_place()
541+
reservation = None
542+
if place.reservation:
543+
reservation = await self._get_reservation_by_token(place.reservation)
544+
528545
print(f"Place '{place.name}':")
529-
place.show(level=1)
546+
place.show(level=1, reservation=reservation)
547+
530548
if place.acquired:
531549
for resource_path in place.acquired_resources:
532550
(exporter, group_name, cls, resource_name) = resource_path
@@ -723,6 +741,20 @@ async def acquire(self):
723741
raise errors[0]
724742
raise ErrorGroup("Multiple errors occurred during acquire", errors)
725743

744+
async def lease(self):
745+
errors = []
746+
places = self.get_place_names_from_env() if self.env else [self.args.place]
747+
for place in places:
748+
try:
749+
await self._lease_place(place)
750+
except Error as e:
751+
errors.append(e)
752+
753+
if errors:
754+
if len(errors) == 1:
755+
raise errors[0]
756+
raise ErrorGroup("Multiple errors occurred during lease", errors)
757+
726758
async def _acquire_place(self, place):
727759
"""Acquire a place, marking it unavailable for other clients"""
728760
place = self.get_place(place)
@@ -769,6 +801,55 @@ async def _acquire_place(self, place):
769801

770802
raise ServerError(e.details())
771803

804+
async def _lease_place(self, place):
805+
"""Lease a place using a reservation"""
806+
place = self.get_place(place)
807+
808+
if place.acquired:
809+
host, user = place.acquired.split("/")
810+
allowhelp = f"'labgrid-client -p {place.name} allow {self.gethostname()}/{self.getuser()}' on {host}."
811+
812+
if self.getuser() != user:
813+
raise UserError(
814+
f"Place {place.name} is already acquired by {place.acquired}. "
815+
f"To work simultaneously, {user} can execute {allowhelp}"
816+
)
817+
818+
if self.gethostname() == host:
819+
raise UserError(f"You have already acquired place {place.name}.")
820+
821+
raise UserError(
822+
f"You have already acquired place {place.name} on {host}. To work simultaneously, execute {allowhelp}"
823+
)
824+
825+
if not self.args.allow_unmatched:
826+
self.check_matches(place)
827+
828+
request = labgrid_coordinator_pb2.LeasePlaceRequest(placename=place.name)
829+
830+
try:
831+
await self.stub.LeasePlace(request)
832+
await self.sync_with_coordinator()
833+
print(f"leased place {place.name}")
834+
except grpc.aio.AioRpcError as e:
835+
# check potential failure causes
836+
for exporter, groups in sorted(self.resources.items()):
837+
for group_name, group in sorted(groups.items()):
838+
for resource_name, resource in sorted(group.items()):
839+
resource_path = (exporter, group_name, resource.cls, resource_name)
840+
if not resource.acquired:
841+
continue
842+
match = place.getmatch(resource_path)
843+
if match is None:
844+
continue
845+
name = resource_name
846+
if match.rename:
847+
name = match.rename
848+
print(
849+
f"Matching resource '{name}' ({exporter}/{group_name}/{resource.cls}/{resource_name}) already acquired by place '{resource.acquired}'"
850+
) # pylint: disable=line-too-long
851+
raise ServerError(e.details())
852+
772853
async def release(self):
773854
errors = []
774855
places = self.get_place_names_from_env() if self.env else [self.args.place]
@@ -1592,16 +1673,25 @@ async def cancel_reservation(self):
15921673
except grpc.aio.AioRpcError as e:
15931674
raise ServerError(e.details())
15941675

1595-
async def _wait_reservation(self, token: str, verbose=True):
1596-
while True:
1597-
request = labgrid_coordinator_pb2.PollReservationRequest(token=token)
1676+
async def _extend_lease(self, token: str) -> Reservation:
1677+
request = labgrid_coordinator_pb2.ExtendLeaseRequest(token=token)
1678+
try:
1679+
response = await self.stub.ExtendLease(request)
1680+
except grpc.aio.AioRpcError as e:
1681+
raise ServerError(e.details())
1682+
return Reservation.from_pb2(response.reservation)
15981683

1599-
try:
1600-
response: labgrid_coordinator_pb2.PollReservationResponse = await self.stub.PollReservation(request)
1601-
except grpc.aio.AioRpcError as e:
1602-
raise ServerError(e.details())
1684+
async def _poll_reservation(self, token: str) -> Reservation:
1685+
request = labgrid_coordinator_pb2.PollReservationRequest(token=token)
1686+
try:
1687+
response: labgrid_coordinator_pb2.PollReservationResponse = await self.stub.PollReservation(request)
1688+
except grpc.aio.AioRpcError as e:
1689+
raise ServerError(e.details())
1690+
return Reservation.from_pb2(response.reservation)
16031691

1604-
res = Reservation.from_pb2(response.reservation)
1692+
async def _wait_reservation(self, token: str, verbose=True):
1693+
while True:
1694+
res = await self._poll_reservation(token)
16051695
if verbose:
16061696
res.show()
16071697
if res.state is ReservationState.waiting:
@@ -1613,6 +1703,37 @@ async def wait_reservation(self):
16131703
token = self.args.token
16141704
await self._wait_reservation(token)
16151705

1706+
async def _get_lease_config(self):
1707+
request = labgrid_coordinator_pb2.GetLeaseConfigRequest()
1708+
try:
1709+
response = await self.stub.GetLeaseConfig(request)
1710+
except grpc.aio.AioRpcError as e:
1711+
raise ServerError(e.details()) from e
1712+
return response
1713+
1714+
async def extend_reservation(self):
1715+
token = self.args.token
1716+
1717+
if not self.args.keepalive:
1718+
res = await self._extend_lease(token)
1719+
if not self.args.quiet:
1720+
print(f"extended lease {res.token} until {datetime.fromtimestamp(res.timeout)}")
1721+
return
1722+
1723+
lease_config = await self._get_lease_config()
1724+
interval = max(1.0, lease_config.default_extend_duration / 2.0)
1725+
1726+
while True:
1727+
res = await self._extend_lease(token)
1728+
if res.state is ReservationState.expired:
1729+
raise UserError("Reservation is expired; cannot extend lease")
1730+
if not self.args.quiet:
1731+
print(
1732+
f"extended lease {res.token} until {datetime.fromtimestamp(res.timeout)} "
1733+
f"(next keepalive in {interval:.1f}s)"
1734+
)
1735+
await asyncio.sleep(interval)
1736+
16161737
async def print_reservations(self):
16171738
request = labgrid_coordinator_pb2.GetReservationsRequest()
16181739

@@ -2054,6 +2175,12 @@ def get_parser(auto_doc_mode=False) -> "argparse.ArgumentParser | AutoProgramArg
20542175
)
20552176
subparser.set_defaults(func=ClientSession.acquire)
20562177

2178+
subparser = subparsers.add_parser("lease", help="lease a place (time-limited, requires extension)")
2179+
subparser.add_argument(
2180+
"--allow-unmatched", action="store_true", help="allow missing resources for matches when locking the place"
2181+
)
2182+
subparser.set_defaults(func=ClientSession.lease)
2183+
20572184
subparser = subparsers.add_parser("release", aliases=("unlock",), help="release a place")
20582185
subparser.add_argument(
20592186
"-k", "--kick", action="store_true", help="release a place even if it is acquired by a different user"
@@ -2312,6 +2439,21 @@ def get_parser(auto_doc_mode=False) -> "argparse.ArgumentParser | AutoProgramArg
23122439
subparser.add_argument("token", type=str, nargs="?")
23132440
subparser.set_defaults(func=ClientSession.wait_reservation)
23142441

2442+
subparser = subparsers.add_parser("extend", help="extend a lease by the coordinator default duration")
2443+
subparser.add_argument("token", type=str, nargs="?")
2444+
subparser.add_argument(
2445+
"--keepalive",
2446+
action="store_true",
2447+
help="keep extending the lease automatically using coordinator policy",
2448+
)
2449+
subparser.add_argument(
2450+
"-q",
2451+
"--quiet",
2452+
action="store_true",
2453+
help="do not print lease status on each extension",
2454+
)
2455+
subparser.set_defaults(func=ClientSession.extend_reservation)
2456+
23152457
subparser = subparsers.add_parser("reservations", help="list current reservations")
23162458
subparser.set_defaults(func=ClientSession.print_reservations)
23172459

@@ -2388,7 +2530,7 @@ def main():
23882530
if args.initial_state is None:
23892531
args.initial_state = initial_state
23902532

2391-
if args.command in ["cancel-reservation", "wait"] and args.token is None:
2533+
if args.command in ["cancel-reservation", "wait", "extend"] and args.token is None:
23922534
if token:
23932535
args.token = token
23942536
else:

0 commit comments

Comments
 (0)