fix: Properly handle docling threading on macOS #11140

erichare · 2025-12-23T17:09:23Z

This pull request refactors the DoclingInlineComponent to support both threading and multiprocessing for worker execution, improving compatibility and stability across platforms (notably fixing issues on macOS). The changes generalize worker management code, allowing the component to use threads on macOS (to avoid fork safety issues) and processes elsewhere for performance. Worker monitoring, termination, and queue handling logic have all been updated for this flexibility.

Worker management and cross-platform support:

Refactored worker handling to support both threading.Thread (used on macOS) and multiprocessing.Process (used on Linux/Windows), switching queue types and worker creation accordingly in process_files (DoclingInlineComponent).
Renamed and generalized monitoring and termination methods to work with both threads and processes, updating their logic and docstrings for clarity (_wait_for_result_with_worker_monitoring, _terminate_worker_gracefully).
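
The monitoring idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `wait_for_result`, its parameters, and the error messages are assumptions. It relies only on the fact that `threading.Thread` and `multiprocessing.Process` both expose `is_alive()`, and that both queue types raise `queue.Empty` when a timed `get()` expires.

```python
import queue
import threading
import time


def wait_for_result(result_queue, worker, timeout=300.0, poll_interval=0.1):
    """Poll result_queue until a result arrives, the worker dies, or time runs out.

    Hypothetical helper: works for a thread or a process worker because it
    only uses is_alive() and a timed get().
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            return result_queue.get(timeout=poll_interval)
        except queue.Empty:
            pass
        if not worker.is_alive():
            # The worker may have queued its result just before exiting,
            # so check one last time before reporting a failure.
            try:
                return result_queue.get(timeout=poll_interval)
            except queue.Empty:
                kind = "thread" if isinstance(worker, threading.Thread) else "process"
                raise RuntimeError(f"Worker {kind} exited without returning a result") from None
    raise TimeoutError(f"Worker did not finish within {timeout} seconds")
```

Polling with a short timeout, rather than blocking on `get()` for the full duration, lets the caller notice a crashed worker promptly instead of waiting out the whole timeout.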

Logging and error handling improvements:

Updated log and error messages to use "worker" instead of "process", and to distinguish between thread and process failures for clearer diagnostics.

Resource cleanup:

Enhanced queue cleanup logic to only close and join multiprocessing queues (which support these methods), preventing errors when using thread queues.

Imports and type handling:

Adjusted imports to support both threading and multiprocessing, and aliased queue.Queue to avoid naming conflicts.

Summary by CodeRabbit

Release Notes

Performance Improvements
- Optimized cross-platform worker execution with platform-specific strategies for enhanced reliability
Bug Fixes
- Improved timeout handling with clearer diagnostic messages when operations exceed time limits
- Enhanced error reporting and logging for worker failures
- Better resource cleanup and graceful termination to prevent memory leaks


coderabbitai · 2025-12-23T17:10:10Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Refactors worker-monitoring and process-termination logic in the Docling component to support flexible worker-based execution. Introduces platform-specific worker management: macOS uses threading with thread-safe queues, while other platforms use multiprocessing with process workers. Updates method signatures, result retrieval, termination strategies, and error handling accordingly.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Worker orchestration and platform-specific execution `src/lfx/src/lfx/components/docling/docling_inline.py` | Renamed `_wait_for_result_with_process_monitoring()` to `_wait_for_result_with_worker_monitoring()` to handle both Thread and Process workers; now polls queue with timeout and logs results differently per worker type. Renamed `_terminate_process_gracefully()` to `_terminate_worker_gracefully()` to handle thread termination (await and warn) separately from process termination (SIGTERM/SIGKILL). Added platform detection for macOS (threading with ThreadQueue) vs other platforms (multiprocessing.Process with Queue). Updated resource cleanup to conditionally close/join queues. Adjusted timeout and error messages to reflect worker-based terminology. Enhanced processing flow to initialize, start, and monitor worker based on platform. |
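
The split between thread termination (await and warn) and process termination (SIGTERM, then SIGKILL) might look something like the sketch below. The function name `terminate_worker`, the return value, and the use of `print` for the warning are illustrative assumptions, not the PR's actual implementation.

```python
import threading


def terminate_worker(worker, timeout_terminate=10.0, timeout_kill=5.0):
    """Stop a worker and return True if it is no longer running afterwards.

    Hypothetical helper: anything that is not a threading.Thread is treated
    as a multiprocessing.Process-like object.
    """
    if isinstance(worker, threading.Thread):
        # Python offers no API to stop a thread, so the only option is to
        # wait for it and warn if it is still running afterwards.
        worker.join(timeout=timeout_terminate)
        if worker.is_alive():
            print("Warning: worker thread is still running and cannot be force-stopped")
        return not worker.is_alive()

    if worker.is_alive():
        worker.terminate()  # sends SIGTERM on POSIX
        worker.join(timeout=timeout_terminate)
    if worker.is_alive():
        worker.kill()  # sends SIGKILL on POSIX
        worker.join(timeout=timeout_kill)
    return not worker.is_alive()
```

The asymmetry is inherent to the two worker types: a process can be stopped from outside, whereas a thread has to finish on its own.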

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 2 warnings, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Test Coverage For New Implementations | ❌ Error | PR implements significant refactoring for threading/multiprocessing worker execution but includes no test files or behavioral assertions. | Add regression tests verifying threading on macOS and multiprocessing on other platforms, covering worker management logic, queue handling, timeout behavior, and resource cleanup. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |
| Test Quality And Coverage | ⚠️ Warning | The pull request introduces substantial new worker management logic with critical behavioral changes, but zero tests exist for the DoclingInlineComponent class or its refactored methods. | Create comprehensive test file for DoclingInlineComponent covering _wait_for_result_with_worker_monitoring(), _terminate_worker_gracefully(), process_files() platform-specific logic, and error scenarios. |
| Test File Naming And Structure | ❓ Inconclusive | Cannot locate test files for DoclingInlineComponent in repository structure despite extensive search using multiple methods and patterns. | Verify test file location, naming convention used in project, and whether tests exist for threading/multiprocessing refactoring changes. |
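
As a rough idea of what the requested platform regression tests could look like, the sketch below patches `sys.platform` and asserts on the selected strategy. The function `uses_threads` is a hypothetical stand-in for the component's platform check; real tests would exercise DoclingInlineComponent itself, whose test layout is not shown here.

```python
import sys
import unittest
from unittest import mock


def uses_threads():
    """Hypothetical stand-in for the component's macOS check."""
    return sys.platform == "darwin"


class WorkerSelectionTest(unittest.TestCase):
    def test_macos_uses_threads(self):
        # Pretend to run on macOS regardless of the real host platform.
        with mock.patch.object(sys, "platform", "darwin"):
            self.assertTrue(uses_threads())

    def test_other_platforms_use_processes(self):
        for name in ("linux", "win32"):
            with mock.patch.object(sys, "platform", name):
                self.assertFalse(uses_threads())
```

Patching `sys.platform` keeps both branches testable on a single CI runner, which matters here because the macOS path would otherwise only run on macOS hosts.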

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix: Properly handle docling threading on macOS' accurately summarizes the main change: introducing proper threading support on macOS for the Docling component by refactoring worker handling to use threads on macOS and processes on other platforms. |
| Excessive Mock Usage Warning | ✅ Passed | The pull request does not include any test files with excessive mock usage. The only docling-related test file uses real objects rather than mocks, testing actual behavior without obscuring what's being tested. |

Copilot

Pull request overview

This PR refactors the DoclingInlineComponent to handle Docling processing using threading on macOS and multiprocessing on other platforms, addressing fork safety issues specific to macOS while maintaining performance elsewhere.

Key Changes:

- Platform-specific worker execution: threading for macOS, multiprocessing for Linux/Windows
- Generalized worker management methods to handle both threads and processes
- Enhanced queue cleanup to avoid errors with thread-based queues

src/lfx/src/lfx/components/docling/docling_inline.py

github-actions · 2025-12-23T17:14:40Z

Frontend Unit Test Coverage Report

Coverage Summary

| Lines | Statements | Branches | Functions |
|---|---|---|---|
| | 16.68% (4707/28211) | 9.98% (2177/21801) | 10.96% (679/6192) |

Unit Test Results

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 1830 | 0 💤 | 0 ❌ | 0 🔥 | 25.376s ⏱️ |

Co-authored-by: Copilot <[email protected]>

codecov · 2025-12-23T17:19:49Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 33.24%. Comparing base (9ce7d84) to head (01b4d82).

Additional details and impacted files

@@           Coverage Diff           @@
##             main   #11140   +/-   ##
=======================================
  Coverage   33.23%   33.24%           
=======================================
  Files        1394     1394           
  Lines       66068    66068           
  Branches     9778     9778           
=======================================
+ Hits        21956    21961    +5     
+ Misses      42986    42980    -6     
- Partials     1126     1127    +1

| Flag | Coverage Δ | |
|---|---|---|
| backend | `52.47% <ø> (+0.03%)` | ⬆️ |
| frontend | `15.37% <ø> (ø)` | |
| lfx | `39.48% <ø> (-0.01%)` | ⬇️ |

Flags with carried forward coverage won't be shown.
see 4 files with indirect coverage changes

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (3)

src/lfx/src/lfx/components/docling/docling_inline.py (3)
6-6: ThreadQueue alias may cause confusion.

The alias ThreadQueue for queue.Queue is used alongside multiprocessing queues. Consider using queue.Queue directly at usage sites or choosing a more descriptive name like ThreadSafeQueue to make the distinction clearer.

196-225: Extract platform-specific worker creation logic.

The platform detection and worker creation logic adds significant complexity to process_files. Consider extracting this into a dedicated method like _create_worker() to improve testability and maintainability.
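
One possible shape for such a helper is sketched below. It is only an illustration: the name `create_worker`, the `platform` override (added to make the function testable), the `daemon=True` flag, and the convention of appending the queue as the last argument are all assumptions rather than details taken from the PR. The `spawn` context follows the `get_context("spawn")` call mentioned elsewhere in this review.

```python
import multiprocessing
import queue
import sys
import threading


def create_worker(target, args=(), platform=None):
    """Return (worker, result_queue) suited to the current platform.

    Hypothetical helper: the result queue is passed to `target` as its
    last positional argument.
    """
    platform = sys.platform if platform is None else platform
    if platform == "darwin":
        # macOS: run in-process on a thread to sidestep fork-safety issues.
        result_queue = queue.Queue()
        worker = threading.Thread(
            target=target, args=(*args, result_queue), daemon=True
        )
    else:
        # Elsewhere: isolate the work in a separate process.
        ctx = multiprocessing.get_context("spawn")
        result_queue = ctx.Queue()
        worker = ctx.Process(target=target, args=(*args, result_queue))
    return worker, result_queue
```

Returning the queue together with the worker keeps the two choices paired, so the caller never mixes a thread with a multiprocessing queue or the reverse.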

244-248: Clarify queue cleanup with isinstance check.

The nested hasattr checks make the cleanup logic harder to follow. Consider checking the queue type once using isinstance to make the intent clearer:
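import multiprocessing.queues  # explicit import: a bare `import multiprocessing` may not load this submodule

if isinstance(queue, multiprocessing.queues.Queue):
    try:
        queue.close()
        queue.join_thread()
    except Exception as e:
        self.log(f"Warning: Error during queue cleanup - {e}")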

🧹 Nitpick comments (3)

src/lfx/src/lfx/components/docling/docling_inline.py (3)
98-98: Add type annotations for improved type safety.

The queue and worker parameters lack type annotations. Consider adding them to improve static analysis and code clarity:
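def _wait_for_result_with_worker_monitoring(
    self,
    # multiprocessing.Queue is a factory function, not a class, so annotate with the class from multiprocessing.queues
    queue: ThreadQueue | multiprocessing.queues.Queue,
    worker: threading.Thread | multiprocessing.Process,
    timeout: int = 300,
):
The `X | Y` annotation syntax requires Python 3.10 or later; use typing.Union when targeting older versions.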

140-140: Add type annotation for worker parameter.

The worker parameter should have a type annotation to clarify it accepts both threading.Thread and multiprocessing.Process:
def _terminate_worker_gracefully(
    self, 
    worker: threading.Thread | multiprocessing.Process, 
    timeout_terminate: int = 10, 
    timeout_kill: int = 5
):
196-196: Consider documenting platform-specific behavior.

The code only checks for sys.platform == "darwin" to handle macOS fork-safety issues. While this addresses the immediate concern, consider:

- Documenting why macOS specifically requires threading (CoreFoundation fork-safety)
- Evaluating whether other BSD variants might have similar issues
- Noting that Windows always uses spawn for multiprocessing (already handled by get_context("spawn"))

This would help future maintainers understand the platform-specific logic.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5211855 and 80ba0ba.

📒 Files selected for processing (2)

src/lfx/src/lfx/_assets/component_index.json
src/lfx/src/lfx/components/docling/docling_inline.py

🧰 Additional context used

🧬 Code graph analysis (1)

src/lfx/src/lfx/components/docling/docling_inline.py (1)

src/lfx/src/lfx/base/data/docling_utils.py (1)

docling_worker (155-389)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (25)

GitHub Check: Run Frontend Tests / Playwright Tests - Shard 11/17
GitHub Check: Run Frontend Tests / Playwright Tests - Shard 9/17
GitHub Check: Run Frontend Tests / Playwright Tests - Shard 16/17
GitHub Check: Run Frontend Tests / Playwright Tests - Shard 13/17
GitHub Check: Run Frontend Tests / Playwright Tests - Shard 14/17
GitHub Check: Run Frontend Tests / Playwright Tests - Shard 7/17
GitHub Check: Run Frontend Tests / Playwright Tests - Shard 17/17
GitHub Check: Run Frontend Tests / Playwright Tests - Shard 8/17
GitHub Check: Run Frontend Tests / Playwright Tests - Shard 12/17
GitHub Check: Run Frontend Tests / Playwright Tests - Shard 10/17
GitHub Check: Run Frontend Tests / Playwright Tests - Shard 15/17
GitHub Check: Run Frontend Tests / Playwright Tests - Shard 5/17
GitHub Check: Run Frontend Tests / Playwright Tests - Shard 4/17
GitHub Check: Run Frontend Tests / Playwright Tests - Shard 1/17
GitHub Check: Run Frontend Tests / Playwright Tests - Shard 6/17
GitHub Check: Run Frontend Tests / Playwright Tests - Shard 3/17
GitHub Check: Run Frontend Tests / Playwright Tests - Shard 2/17
GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 5
GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 4
GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 1
GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 3
GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 2
GitHub Check: Run Backend Tests / LFX Tests - Python 3.10
GitHub Check: Run Backend Tests / Integration Tests - Python 3.10
GitHub Check: Test Starter Templates

🔇 Additional comments (1)

src/lfx/src/lfx/components/docling/docling_inline.py (1)

198-210: The signal handler registration in docling_worker is already safely guarded. Lines 196-202 of docling_utils.py wrap the signal.signal() calls in a try-except block that catches ValueError (the exception raised when attempting to register signal handlers from a non-main thread). The code logs a warning and continues gracefully instead of crashing.

Likely an incorrect or invalid review comment.
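
The guard described in this comment relies on a documented property of Python's `signal` module: `signal.signal()` raises `ValueError` when called outside the main thread. A minimal sketch of that pattern is shown below; the function name `install_handlers` and the boolean return value are assumptions, and the real `docling_worker` logs a warning rather than returning a flag.

```python
import signal
import threading


def install_handlers(handler):
    """Try to register SIGTERM/SIGINT handlers.

    Hypothetical helper: returns False instead of raising when called
    from a thread other than the main thread.
    """
    try:
        signal.signal(signal.SIGTERM, handler)
        signal.signal(signal.SIGINT, handler)
    except ValueError:
        # signal.signal() may only be called from the main thread of the
        # main interpreter; in a worker thread, carry on without handlers.
        return False
    return True
```

This is why the same worker function can run unchanged both as a process entry point, where it is the main thread, and as a thread target on macOS.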

src/lfx/src/lfx/components/docling/docling_inline.py

fix: Properly handle threading on macOS

955a86c

erichare requested a review from Copilot December 23, 2025 17:09

github-actions bot added the "bug" label (Something isn't working) Dec 23, 2025

Copilot AI reviewed Dec 23, 2025

Update docling_inline.py

2168447