Skip to content

Refactor fetching tasks to process #75

@emyller

Description

@emyller

A couple issues have been observed around the database function responsible for retrieving and locking tasks to process.

1. Result inconsistency.

Especially when running tests, somewhat often the database query will return no results, leading to flaky test failures, e.g.

    def test_run_task_runs_task_and_creates_task_run_object_when_success(
        database: str,
        dummy_task: TaskHandler[[str, str]],
    ) -> None:
        # Given
        task = Task.create(dummy_task.task_identifier, scheduled_for=timezone.now())
        task.save(using=database)
    
        # When
        task_runs = run_tasks(database)
    
        # Then
>       assert cache.get(DEFAULT_CACHE_KEY)
E       AssertionError: assert None
E        +  where None = get('foo')
E        +    where get = <django.utils.connection.ConnectionProxy object at 0x107958080>.get

tests/unit/task_processor/test_unit_task_processor_processor.py:93: AssertionError

We became suspicious of a time zone issue so we forced UTC in both test databases as an experiment to confirm the bug. While the change led to tests passing consistently, eventually the same exception has randomly arose again.

We're still not sure what causes this. Time might be involved.

2. Locked tasks

Since the database function will select N tuples and lock them in-query, and if the associated Django queryset is not exhausted — e.g. because of an exception, we'll result in a state with tasks locked but never processed.

Related code:

tasks = task_manager.get_tasks_to_process(num_tasks)
if tasks:
logger.debug(f"Running {len(tasks)} task(s) from database '{database}'")
executed_tasks = []
task_runs = []
for task in tasks:
task, task_run = _run_task(task)
executed_tasks.append(task)
assert isinstance(task_run, TaskRun)
task_runs.append(task_run)

The current implementation wraps running tasks within a try/except block which should prevent dangling tasks. Still, the design could be improved, and perhaps future-proofed, by using the Django ORM to somehow achieve a similar effect and efficiency.

Metadata

Metadata

Assignees

Labels

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions