refactor: improve code quality in worker, scheduler, and format modules#3409
Open
devkp-dalhousie wants to merge 1 commit intospotify:masterfrom
Open
refactor: improve code quality in worker, scheduler, and format modules#3409devkp-dalhousie wants to merge 1 commit intospotify:masterfrom
devkp-dalhousie wants to merge 1 commit intospotify:masterfrom
Conversation
jpsnyder
reviewed
Mar 17, 2026
Comment on lines
+861
to
+863
| task_does_not_exist = task is None | ||
| worker_disabled_and_task_not_running = task is not None and task.status != RUNNING and not worker.enabled | ||
| if task_does_not_exist or worker_disabled_and_task_not_running: |
There was a problem hiding this comment.
I would argue this is change is harder to read. Negative variable names makes code more difficult to follow.
| logger.info("[pid %s] Worker %s running %s", os.getpid(), self.worker_id, self.task) | ||
|
|
||
| if self.use_multiprocessing: | ||
| # Need to have different random seeds if running in separate processes |
Comment on lines
-186
to
-189
| # Verify that all the tasks are fulfilled! For external tasks we | ||
| # don't care about unfulfilled dependencies, because we are just | ||
| # checking completeness of self.task so outputs of dependencies are | ||
| # irrelevant. |
There was a problem hiding this comment.
Should this comment be converted to a docstring on _check_unfulfilled_dependencies()?
412301d to
4ee3107
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Apply seven refactoring techniques to improve readability and maintainability across three core modules (luigi/worker.py, luigi/scheduler.py, luigi/format.py).
Set I Refactorings:
Extract Method: Decompose TaskProcess.run() into _execute_external_task(), _execute_regular_task(), _check_unfulfilled_dependencies(); extract _determine_task_status() from Worker._add()
Rename Variable: Fix PEP 8 naming conventions (processID -> process_id, currentTime -> current_time) in TaskProcess.run()
Decompose Conditional: Simplify Worker._keep_alive() from nested if/elif chain into sequential guard clauses with explaining variables
Introduce Explaining Variable: Replace complex inline boolean expression in Scheduler.add_task() with named variables (task_does_not_exist, worker_disabled_and_task_not_running, etc.)
Set II Refactorings:
Extract Class: Create WorkerNotifier class to separate notification/email concerns from the Worker class
Move Method: Promote _validate_dependency() and _check_complete_value() to module-level functions (they had no dependency on Worker instance state)
Pull-up Method: Introduce CompressionFormat base class for GzipFormat and Bzip2Format to eliminate duplicated pipe_reader/pipe_writer boilerplate
Motivation and Context
The Worker class in worker.py had 38 methods and a WMC of 135, mixing task scheduling, execution, notification, and lifecycle management. Key methods like TaskProcess.run() (71 lines) and Worker._add() (83 lines) were long with deep nesting. GzipFormat and Bzip2Format in format.py duplicated the same pipe reader/writer pattern. These refactorings reduce complexity, improve separation of concerns, and make the codebase easier to extend and maintain.
Have you tested this? If so, how?
I ran the existing test suite and all tests continue to pass. No test modifications were required.