Skip to content

refactor: improve code quality in worker, scheduler, and format modules#3409

Open
devkp-dalhousie wants to merge 1 commit intospotify:masterfrom
devkp-dalhousie:refactoring-improvements
Open

refactor: improve code quality in worker, scheduler, and format modules#3409
devkp-dalhousie wants to merge 1 commit intospotify:masterfrom
devkp-dalhousie:refactoring-improvements

Conversation

@devkp-dalhousie
Copy link

Apply seven refactoring techniques to improve readability and maintainability across three core modules (luigi/worker.py, luigi/scheduler.py, luigi/format.py).

Set I Refactorings:

Extract Method: Decompose TaskProcess.run() into _execute_external_task(), _execute_regular_task(), _check_unfulfilled_dependencies(); extract _determine_task_status() from Worker._add()
Rename Variable: Fix PEP 8 naming conventions (processID -> process_id, currentTime -> current_time) in TaskProcess.run()
Decompose Conditional: Simplify Worker._keep_alive() from nested if/elif chain into sequential guard clauses with explaining variables
Introduce Explaining Variable: Replace complex inline boolean expression in Scheduler.add_task() with named variables (task_does_not_exist, worker_disabled_and_task_not_running, etc.)
Set II Refactorings:

Extract Class: Create WorkerNotifier class to separate notification/email concerns from the Worker class
Move Method: Promote _validate_dependency() and _check_complete_value() to module-level functions (they had no dependency on Worker instance state)
Pull-up Method: Introduce CompressionFormat base class for GzipFormat and Bzip2Format to eliminate duplicated pipe_reader/pipe_writer boilerplate
Motivation and Context
The Worker class in worker.py had 38 methods and a WMC of 135, mixing task scheduling, execution, notification, and lifecycle management. Key methods like TaskProcess.run() (71 lines) and Worker._add() (83 lines) were long with deep nesting. GzipFormat and Bzip2Format in format.py duplicated the same pipe reader/writer pattern. These refactorings reduce complexity, improve separation of concerns, and make the codebase easier to extend and maintain.

Have you tested this? If so, how?
I ran the existing test suite and all tests continue to pass. No test modifications were required.

@devkp-dalhousie devkp-dalhousie requested review from a team and dlstadther as code owners March 17, 2026 05:45
Comment on lines +861 to +863
task_does_not_exist = task is None
worker_disabled_and_task_not_running = task is not None and task.status != RUNNING and not worker.enabled
if task_does_not_exist or worker_disabled_and_task_not_running:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue this is change is harder to read. Negative variable names makes code more difficult to follow.

logger.info("[pid %s] Worker %s running %s", os.getpid(), self.worker_id, self.task)

if self.use_multiprocessing:
# Need to have different random seeds if running in separate processes

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove the comment?

Comment on lines -186 to -189
# Verify that all the tasks are fulfilled! For external tasks we
# don't care about unfulfilled dependencies, because we are just
# checking completeness of self.task so outputs of dependencies are
# irrelevant.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this comment be converted to a docstring on _check_unfulfilled_dependencies()?

@devkp-dalhousie devkp-dalhousie force-pushed the refactoring-improvements branch from 412301d to 4ee3107 Compare March 17, 2026 22:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants