-
Notifications
You must be signed in to change notification settings - Fork 8.2k
fix: Properly handle docling threading on macOS #11140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughRefactors worker-monitoring and process-termination logic in the Docling component to support flexible worker-based execution. Introduces platform-specific worker management: macOS uses threading with thread-safe queues, while other platforms use multiprocessing with process workers. Updates method signatures, result retrieval, termination strategies, and error handling accordingly. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Pre-merge checks and finishing touchesImportant Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 2 warnings, 1 inconclusive)
✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the DoclingInlineComponent to handle Docling processing using threading on macOS and multiprocessing on other platforms, addressing fork safety issues specific to macOS while maintaining performance elsewhere.
Key Changes:
- Platform-specific worker execution: threading for macOS, multiprocessing for Linux/Windows
- Generalized worker management methods to handle both threads and processes
- Enhanced queue cleanup to avoid errors with thread-based queues
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <[email protected]>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #11140 +/- ##
=======================================
Coverage 33.23% 33.24%
=======================================
Files 1394 1394
Lines 66068 66068
Branches 9778 9778
=======================================
+ Hits 21956 21961 +5
+ Misses 42986 42980 -6
- Partials 1126 1127 +1
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
src/lfx/src/lfx/components/docling/docling_inline.py (3)
6-6: ThreadQueue alias may cause confusion.The alias
ThreadQueueforqueue.Queueis used alongside multiprocessing queues. Consider usingqueue.Queuedirectly at usage sites or choosing a more descriptive name likeThreadSafeQueueto make the distinction clearer.
196-225: Extract platform-specific worker creation logic.The platform detection and worker creation logic adds significant complexity to
process_files. Consider extracting this into a dedicated method like_create_worker()to improve testability and maintainability.
244-248: Clarify queue cleanup with isinstance check.The nested
hasattrchecks make the cleanup logic harder to follow. Consider checking the queue type once usingisinstanceto make the intent clearer:if isinstance(queue, multiprocessing.queues.Queue): try: queue.close() queue.join_thread() except Exception as e: self.log(f"Warning: Error during queue cleanup - {e}")
🧹 Nitpick comments (3)
src/lfx/src/lfx/components/docling/docling_inline.py (3)
98-98: Add type annotations for improved type safety.The
queueandworkerparameters lack type annotations. Consider adding them to improve static analysis and code clarity:def _wait_for_result_with_worker_monitoring( self, queue: ThreadQueue | multiprocessing.Queue, worker: threading.Thread | multiprocessing.Process, timeout: int = 300 ):Or use
Unionif targeting older Python versions.
140-140: Add type annotation for worker parameter.The
workerparameter should have a type annotation to clarify it accepts boththreading.Threadandmultiprocessing.Process:def _terminate_worker_gracefully( self, worker: threading.Thread | multiprocessing.Process, timeout_terminate: int = 10, timeout_kill: int = 5 ):
196-196: Consider documenting platform-specific behavior.The code only checks for
sys.platform == "darwin"to handle macOS fork-safety issues. While this addresses the immediate concern, consider:
- Documenting why macOS specifically requires threading (CoreFoundation fork-safety)
- Evaluating whether other BSD variants might have similar issues
- Noting that Windows always uses spawn for multiprocessing (already handled by
get_context("spawn"))This would help future maintainers understand the platform-specific logic.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/lfx/src/lfx/_assets/component_index.jsonsrc/lfx/src/lfx/components/docling/docling_inline.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/lfx/src/lfx/components/docling/docling_inline.py (1)
src/lfx/src/lfx/base/data/docling_utils.py (1)
docling_worker(155-389)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (25)
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 11/17
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 9/17
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 16/17
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 13/17
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 14/17
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 7/17
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 17/17
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 8/17
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 12/17
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 10/17
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 15/17
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 5/17
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 4/17
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 1/17
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 6/17
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 3/17
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 2/17
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 5
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 4
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 1
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 3
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 2
- GitHub Check: Run Backend Tests / LFX Tests - Python 3.10
- GitHub Check: Run Backend Tests / Integration Tests - Python 3.10
- GitHub Check: Test Starter Templates
🔇 Additional comments (1)
src/lfx/src/lfx/components/docling/docling_inline.py (1)
198-210: The signal handler registration indocling_workeris already safely guarded. Lines 196-202 ofdocling_utils.pywrap thesignal.signal()calls in a try-except block that catchesValueError(the exception raised when attempting to register signal handlers from a non-main thread). The code logs a warning and continues gracefully instead of crashing.Likely an incorrect or invalid review comment.
This pull request refactors the
DoclingInlineComponentto support both threading and multiprocessing for worker execution, improving compatibility and stability across platforms (notably fixing issues on macOS). The changes generalize worker management code, allowing the component to use threads on macOS (to avoid fork safety issues) and processes elsewhere for performance. Worker monitoring, termination, and queue handling logic have all been updated for this flexibility.Worker management and cross-platform support:
threading.Thread(used on macOS) andmultiprocessing.Process(used on Linux/Windows), switching queue types and worker creation accordingly inprocess_files(DoclingInlineComponent)._wait_for_result_with_worker_monitoring,_terminate_worker_gracefully). [1] [2]Logging and error handling improvements:
Resource cleanup:
Imports and type handling:
queue.Queueto avoid naming conflicts.Summary by CodeRabbit
Release Notes
Performance Improvements
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.