
feat: add per-adapter cross-process file locking for BLE serialization #222

Closed
cgoudie wants to merge 2 commits into Bluetooth-Devices:main from TechBlueprints:feat/cross-process-lock

Conversation

cgoudie commented Feb 16, 2026

Summary

  • Adds LockConfig dataclass for configuring per-adapter file locks that serialize BLE operations across processes (a field-level sketch follows this list)
  • Adds lock.py module with async-safe acquire_lock() / release_lock() using non-blocking fcntl.flock
  • Adds lock_config and in_process_semaphore parameters to establish_connection() for cross-process and in-process serialization
  • Lock is acquired before each connection attempt and released in a finally block, allowing other processes to interleave between retries
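
As a rough illustration of that dataclass, here is a minimal sketch under assumed names; the fields (`enabled`, `lock_dir`, `lock_template`, `lock_timeout`) and the `path_for_adapter` helper are inferred from the test names and design notes below, not copied from the PR:

```python
# Hypothetical sketch of the LockConfig dataclass described above;
# field names are inferred from the tests and design notes, not the PR.
from dataclasses import dataclass
from pathlib import Path


@dataclass
class LockConfig:
    enabled: bool = False  # opt-in: existing callers are unaffected
    lock_dir: Path = Path("/run/bleak-retry-connector")  # runtime state dir
    lock_template: str = "ble-{adapter}.lock"  # one lock file per adapter
    lock_timeout: float = 10.0  # proceed without the lock after this many seconds

    def path_for_adapter(self, adapter: str) -> Path:
        """Return the per-adapter lock-file path, e.g. for 'hci0'."""
        return self.lock_dir / self.lock_template.format(adapter=adapter)
```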

Problem

On multi-service systems (e.g. Venus OS / Cerbo GX), several processes compete for the same BLE adapter simultaneously:

| Service | BLE operations |
| --- | --- |
| dbus-serialbattery (1 per BLE battery) | Scan + connect every ~60s |
| dbus-power-watchdog | Scan on discovery + persistent connect |
| dbus-shyion-switch | Scan on discovery + connect-on-demand |
| dbus-ble-sensors (Victron) | Continuous passive scan |
| vesmart-server (Victron) | Periodic GATT connect |

Without coordination, 3-5 processes compete for a single BlueZ adapter, producing org.bluez.Error.InProgress errors on ~40% of connection attempts.

Design decisions

| Decision | Rationale |
| --- | --- |
| Per-adapter locks (not global) | hci0 and hci1 are independent radios; a global lock would unnecessarily serialize operations that could run in parallel on different adapters |
| Opt-in (`LockConfig(enabled=True)`) | Existing callers are unaffected; only multi-service deployments need this |
| Non-blocking flock (LOCK_NB + asyncio.sleep retry) | A blocking LOCK_EX would freeze the asyncio event loop (a minimal sketch follows this table) |
| Graceful degradation on timeout | Proceeds without the lock after lock_timeout seconds, preventing deadlock if a lock holder crashes |
| fcntl.flock auto-release | Released automatically on fd close / process exit; a crashed process cannot hold the lock permanently |
| in_process_semaphore parameter | fcntl.flock is per-process, not per-thread; multiple asyncio tasks in the same process need an asyncio.Semaphore for serialization. Both can be used together |
| Lock released per attempt | The lock is held only during client.connect(), not during backoff, so other processes can interleave between retry attempts |
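
The non-blocking flock row above compresses the core mechanism; a minimal sketch of that pattern, assuming helpers shaped roughly like the PR's `acquire_lock()` / `release_lock()`, could look like this (illustrative, not the PR's actual lock.py):

```python
# Illustrative sketch of non-blocking flock with async retry; this is
# the pattern the table describes, not the PR's actual implementation.
import asyncio
import fcntl
import os
import time


async def acquire_lock(path: str, timeout: float) -> int | None:
    """Acquire an exclusive lock without ever blocking the event loop.

    Returns an fd that holds the lock, or None on graceful timeout.
    """
    try:
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    except OSError:
        return None  # graceful degradation: missing/unwritable lock directory
    deadline = time.monotonic() + timeout
    while True:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # never blocks
            return fd  # dropped on os.close(fd) or automatically on process exit
        except BlockingIOError:
            if time.monotonic() >= deadline:
                os.close(fd)
                return None  # graceful degradation: proceed without the lock
            await asyncio.sleep(0.05)  # yield so other tasks can run


def release_lock(fd: int | None) -> None:
    """Release the lock; safe no-op on None (cf. test_release_lock_none)."""
    if fd is not None:
        os.close(fd)  # closing the fd drops the flock
```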

Stuck states this addresses

| State | Name | How this PR helps |
| --- | --- | --- |
| State 4 | Scan Collision (InProgress During Scan) | Direct fix. Per-adapter file lock prevents concurrent scans on the same adapter across processes |
| State 10 | InProgress Dominance (All Adapters Stuck) | Partial fix. Serialization reduces the probability of all adapters being stuck simultaneously from concurrent operations |

Usage

```python
import asyncio

from bleak import BleakClient
from bleak_retry_connector import establish_connection, LockConfig

# Multi-process: per-adapter file lock
lock_config = LockConfig(enabled=True)

# Multi-task (same process): shared semaphore
semaphore = asyncio.Semaphore(1)

client = await establish_connection(
    BleakClient,
    device,
    "my-service",
    lock_config=lock_config,
    in_process_semaphore=semaphore,
)
```
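
A note on how these two knobs compose with retries: per the design table, the file lock is taken around each individual connect attempt, not around the whole retry loop. Reusing the hypothetical helpers sketched earlier, the shape is roughly (paraphrased from the PR description, not its code):

```python
# Rough shape of per-attempt locking; reuses the hypothetical
# acquire_lock/release_lock sketched earlier in this description.
import asyncio

from bleak import BleakClient
from bleak.exc import BleakError


async def connect_with_lock(
    client: BleakClient,
    lock_path: str,
    lock_timeout: float,
    max_attempts: int = 3,
    backoff: float = 2.0,
) -> bool:
    for _ in range(max_attempts):
        fd = await acquire_lock(lock_path, lock_timeout)
        try:
            await client.connect()
            return True
        except BleakError:
            pass  # fall through to backoff
        finally:
            release_lock(fd)  # always released, even on failure
        await asyncio.sleep(backoff)  # the lock is NOT held during backoff
    return False
```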

Test plan

  • test_lock_config_path_for_adapter — correct path generation per adapter
  • test_lock_config_custom_template — custom lock templates work
  • test_acquire_release_lock — basic acquire/release cycle
  • test_acquire_lock_disabled — no lock when config disabled
  • test_acquire_lock_no_fcntl — graceful degradation without fcntl
  • test_release_lock_none — safe no-op on None
  • test_lock_contention_timeout — second acquirer times out (illustrative sketch after this list)
  • test_lock_released_after_holder_closes — re-acquire after release
  • test_lock_per_adapter_independent — hci0 and hci1 locks are independent
  • test_lock_bad_directory — graceful degradation on missing directory
  • test_establish_connection_with_lock_config — lock acquired/released around connection
  • test_establish_connection_with_semaphore — semaphore acquired/released
  • test_establish_connection_lock_released_on_failure — lock+semaphore released on error
  • test_establish_connection_no_lock_by_default — no lock when not configured
  • All 68 existing tests pass
  • All pre-commit hooks pass (black, flake8, mypy, bandit, isort, etc.)
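
For a sense of what the contention test exercises: a second acquirer must time out gracefully while the first still holds the lock. An illustrative shape, assuming pytest-asyncio and the hypothetical acquire_lock sketched above (not the PR's test code):

```python
# Illustrative shape of the contention test; assumes pytest-asyncio and
# the hypothetical acquire_lock sketched earlier, not the PR's tests.
import os

import pytest


@pytest.mark.asyncio
async def test_lock_contention_timeout(tmp_path):
    path = str(tmp_path / "ble-hci0.lock")
    fd1 = await acquire_lock(path, timeout=1.0)  # first holder succeeds
    assert fd1 is not None
    fd2 = await acquire_lock(path, timeout=0.2)  # second acquirer times out
    assert fd2 is None  # graceful degradation: no exception raised
    os.close(fd1)
```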

Made with Cursor

On multi-service systems (e.g. Venus OS / Cerbo GX) several processes
may compete for the same BLE adapter, causing InProgress errors on
~40% of connection attempts. This adds optional per-adapter file
locking via fcntl.flock to serialize BLE operations across processes.

New components:
- LockConfig dataclass (const.py): configures lock directory, template,
  and timeout. Per-adapter lock files allow full parallelism across
  adapters while preventing InProgress from concurrent scans on the
  same adapter.
- lock.py: acquire_lock() / release_lock() helpers using non-blocking
  fcntl.flock with async retry. Graceful degradation on timeout or
  missing directory.
- establish_connection() gains lock_config and in_process_semaphore
  parameters. Lock is acquired before each connection attempt and
  released in a finally block, allowing other processes to interleave
  between retries.

Key design decisions:
- Opt-in via LockConfig(enabled=True) — existing callers unaffected
- Per-adapter, not global — hci0 and hci1 are independent radios
- Non-blocking flock with asyncio.sleep retry — never blocks event loop
- Graceful degradation on timeout — prevents deadlock
- fcntl.flock auto-releases on fd close / process exit
- in_process_semaphore for same-process serialization (asyncio tasks)

Co-authored-by: Cursor <cursoragent@cursor.com>
codecov bot commented Feb 16, 2026

Codecov Report

❌ Patch coverage is 93.93939% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.10%. Comparing base (69ea901) to head (77a7830).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/bleak_retry_connector/lock.py | 90.47% | 4 Missing ⚠️ |
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main     #222      +/-   ##
==========================================
+ Coverage   79.64%   81.10%   +1.45%     
==========================================
  Files           6        7       +1     
  Lines         570      635      +65     
  Branches      112      118       +6     
==========================================
+ Hits          454      515      +61     
- Misses         70       74       +4     
  Partials       46       46              
```


/run is the standard location for runtime state files — cleared on
reboot so stale locks cannot survive, and not subject to tmpwatch or
noexec mount concerns that affect /tmp.

Co-authored-by: Cursor <cursoragent@cursor.com>
bdraco (Member) commented Feb 16, 2026

Closing this. Please do not open PRs without first opening an issue to discuss the problem you're trying to solve and the proposed approach.

These PRs appear to be bulk AI-generated without understanding the project's architecture or real-world usage. Before contributing, please:

  1. Open an issue first describing the specific problem you're experiencing (with logs, hardware details, reproduction steps)
  2. Discuss the approach before writing code — many of these changes introduce significant architectural decisions (subprocess calls, file locking, thread-level watchdogs) that need discussion
  3. Keep PRs small and focused — one logical change per PR

bdraco closed this Feb 16, 2026