Multiple processes competing for the same BLE adapter cause persistent InProgress errors -- no cross-process serialization #233

@cgoudie

Problem

On systems running multiple BLE services, connection attempts from different processes collide on the same adapter. BlueZ allows only one LE Create Connection per adapter at a time. When two processes call Device1.Connect() simultaneously, one gets org.bluez.Error.InProgress. With 3-5 services sharing 2 adapters, ~40% of connection attempts fail this way -- not because of stale state, but because another process is genuinely using the adapter.

establish_connection() has no mechanism to serialize operations across processes. Each process retries independently, creating a thundering-herd pattern where retries collide again.

Environment

  • Victron Cerbo GX, Venus OS v3.67, BlueZ 5.x
  • 2 USB BLE adapters (hci0, hci1)
  • 3-5 concurrent BLE services: BMS batteries (2 instances), power monitor, relay switches (7 devices), advertisement scanner

Production Evidence

[device] hci1 busy (InProgress, direct)
[device] hci0 busy (InProgress, scan)
[device] hci1 busy (InProgress, scan)
[device] InProgress dominated (3), clearing BlueZ

Both adapters return InProgress simultaneously because a battery service is connecting on hci1 and an advertisement scanner is scanning on hci0. The service exhausts its retry budget without getting a free adapter timeslot. On a single-service system this never happens.

Proposed Approach

An opt-in per-adapter file lock using fcntl.flock, configured through a LockConfig dataclass:

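import asyncio
from dataclasses import dataclass

@dataclass
class LockConfig:
    enabled: bool = False       # opt-in; locking is off by default
    lock_dir: str = "/run"      # directory that holds the lock files
    lock_template: str = "bleak-retry-connector-{adapter}.lock"  # one file per adapter
    lock_timeout: float = 15.0  # seconds to wait before proceeding without the lock

async def establish_connection(
    ...,  # existing parameters, unchanged
    lock_config: LockConfig | None = None,
    in_process_semaphore: asyncio.Semaphore | None = None,
    **kwargs,
) -> AnyBleakClient: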
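
As a rough illustration of how these fields could map to a lock file, the sketch below builds the per-adapter path from lock_dir and lock_template. The helper name lock_path_for is hypothetical and not part of the proposal; LockConfig is repeated only so the snippet runs on its own.

```python
import os
from dataclasses import dataclass


@dataclass
class LockConfig:
    enabled: bool = False
    lock_dir: str = "/run"
    lock_template: str = "bleak-retry-connector-{adapter}.lock"
    lock_timeout: float = 15.0


def lock_path_for(config: LockConfig, adapter: str) -> str:
    """Hypothetical helper: build the lock file path for one adapter."""
    filename = config.lock_template.format(adapter=adapter)
    return os.path.join(config.lock_dir, filename)


# On Linux the defaults give /run/bleak-retry-connector-hci0.lock for hci0.
print(lock_path_for(LockConfig(enabled=True), "hci0"))
```

Deployments where services cannot write to /run could point lock_dir at another directory, as long as every participating service agrees on it.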

Key design decisions:

  • fcntl.flock is released automatically when the fd is closed, including on process crash. No stale locks, no cleanup.
  • Non-blocking with async retry: Uses LOCK_NB so the event loop isn't blocked. While the lock is held elsewhere, retries every 0.25 s via asyncio.sleep(0.25).
  • Graceful degradation: If the lock isn't acquired within lock_timeout, the attempt proceeds without it. Prevents deadlocks.
  • Per-adapter granularity: Each adapter gets its own lock file (e.g., /run/bleak-retry-connector-hci0.lock). Services on different adapters don't block each other.
  • Released between retries: Lock held only during a single connection attempt, then released so other processes can interleave.
  • Optional in_process_semaphore: For asyncio tasks sharing an adapter within one process. Can combine with the file lock.
  • Linux-only: No-ops when fcntl is absent.
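
The decisions above can be sketched roughly as follows. This is a minimal illustration, not the code on the reference branch: the name adapter_lock and the True/False result are assumptions made for the example. It tries a non-blocking exclusive flock, sleeps between attempts, gives up after the timeout and proceeds unlocked, and releases the lock by closing the descriptor.

```python
import asyncio
import contextlib
import os
import time

try:
    import fcntl  # available on Linux and other POSIX systems
except ImportError:  # e.g. Windows: locking becomes a no-op
    fcntl = None

RETRY_INTERVAL = 0.25  # seconds between non-blocking attempts


@contextlib.asynccontextmanager
async def adapter_lock(path: str, timeout: float = 15.0):
    """Hypothetical helper: yields True if the lock is held, False if degraded."""
    if fcntl is None:
        yield False
        return
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    acquired = False
    try:
        deadline = time.monotonic() + timeout
        while True:
            try:
                # LOCK_NB makes flock fail immediately instead of blocking the loop.
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                acquired = True
                break
            except BlockingIOError:
                if time.monotonic() >= deadline:
                    break  # graceful degradation: carry on without the lock
                await asyncio.sleep(RETRY_INTERVAL)
        yield acquired
    finally:
        os.close(fd)  # closing the descriptor also drops the flock
```

A caller would wrap a single connection attempt in `async with adapter_lock(path, timeout) as held:` and could log when held is False. Because the lock belongs to the open file description, the kernel drops it when the process exits, which is what removes the need for stale-lock cleanup.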

This only covers connect() operations via establish_connection(). External services not using this library (e.g., Victron's advertisement scanner) would need their own participation for full system-wide serialization.

No behavior change when lock_config is not provided.
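
One way to picture how the optional guards compose is the sketch below, which assumes a hypothetical wrapper named guarded_attempt. It enters whichever guards were supplied and runs the attempt directly when none were. Taking the semaphore before the file lock is a choice made for this example, not something the proposal specifies.

```python
import asyncio
import contextlib


async def guarded_attempt(attempt, semaphore=None, file_lock=None):
    """Hypothetical wrapper: run one attempt under the guards that were supplied."""
    async with contextlib.AsyncExitStack() as stack:
        if semaphore is not None:
            await stack.enter_async_context(semaphore)  # serializes tasks in this process
        if file_lock is not None:
            await stack.enter_async_context(file_lock)  # serializes across processes
        return await attempt()


async def peak_concurrency(use_semaphore: bool, tasks: int = 3) -> int:
    """Run several fake attempts and report how many overlapped at most."""
    semaphore = asyncio.Semaphore(1) if use_semaphore else None
    active = 0
    peak = 0

    async def attempt() -> None:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stand-in for a connection attempt
        active -= 1

    await asyncio.gather(
        *(guarded_attempt(attempt, semaphore) for _ in range(tasks))
    )
    return peak
```

With the semaphore the fake attempts run one at a time. With no guards they overlap exactly as they would today, which matches the opt-in behavior described above.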

What This Fixes

  • Adapter saturation on multi-service systems: Among services that use this library with locking enabled, the per-adapter lock lets only one process attempt a connection on a given adapter at a time (unless lock_timeout expires), eliminating InProgress collisions between those services.
  • Thundering-herd retries: Forces sequential access instead of competing retries that keep colliding.
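
The effect can be illustrated with a small simulation. It is a toy model, not a measurement: FakeAdapter is a made-up stand-in that rejects overlapping connects the way BlueZ reports InProgress, and the asyncio tasks stand in for separate services. Each task opens the lock file itself, so the flock calls contend as they would across processes. The demo polls faster than 0.25 s and omits the timeout for brevity.

```python
import asyncio
import os
import tempfile

try:
    import fcntl
except ImportError:  # non-POSIX platforms: the demo falls back to unlocked
    fcntl = None


class FakeAdapter:
    """Made-up stand-in for one adapter: only one connect may run at a time."""

    def __init__(self) -> None:
        self.busy = False
        self.in_progress_errors = 0

    async def connect(self) -> None:
        if self.busy:
            self.in_progress_errors += 1
            raise RuntimeError("InProgress")
        self.busy = True
        try:
            await asyncio.sleep(0.05)  # simulated connection time
        finally:
            self.busy = False


async def locked_attempt(adapter: FakeAdapter, lock_path: str) -> None:
    """Hold the per-adapter flock for exactly one attempt, then release it."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        while True:
            try:
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                await asyncio.sleep(0.01)  # shortened poll interval for the demo
        await adapter.connect()
    finally:
        os.close(fd)  # releases the lock so the next service can proceed


async def run_services(count: int, use_lock: bool) -> int:
    """Start `count` simultaneous attempts and return the InProgress count."""
    adapter = FakeAdapter()
    with tempfile.TemporaryDirectory() as lock_dir:
        path = os.path.join(lock_dir, "demo-hci0.lock")
        if use_lock and fcntl is not None:
            attempts = [locked_attempt(adapter, path) for _ in range(count)]
        else:
            attempts = [adapter.connect() for _ in range(count)]
        await asyncio.gather(*attempts, return_exceptions=True)
    return adapter.in_progress_errors
```

In this model, four unlocked attempts started together leave three of them with InProgress, while the locked variant queues them and none fail.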

Reference Implementation

Branch with code and tests: feat/cross-process-lock


Related Upstream Issues
