
Poor scheduling throughput when a low percentage of tasks are schedulable. #31185


Apache Airflow version

2.6.0

What happened

Airflow task-scheduling throughput suffers with relatively moderate numbers of total outstanding tasks. When a few thousand tasks belong to running dags but only a small fraction of them are schedulable, the scheduler spends a large portion of each scheduler-loop cycle considering tasks before it finds the one or two that are eligible for scheduling.

Similarly, a single dag with many blocked tasks can adversely impact an otherwise simple dagbag with moderate throughput requirements.

In practice, even under nearly ideal circumstances the scheduler can consider only roughly 500-1000 tasks per second.

This appears to be a combination of two factors.

  1. Airflow considers all tasks that belong to a dag and re-considers each task on every run of the scheduler loop, instead of in response to events that can cause state changes.
  2. Airflow's per-task consideration (including ti_deps) in the main scheduler loop costs 1-2 ms per task, drawn from the set of all non-completed tasks, which is not fast enough to brute-force through without impacting scheduler latency (rough arithmetic below).
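
As a rough back-of-envelope (using the numbers above, not a new measurement), the per-task cost alone puts a floor of several seconds on each pass over a few thousand outstanding tasks:

```python
# Back-of-envelope: cost of one scheduler pass over all non-completed tasks,
# using the 1-2 ms per-task figure observed above (midpoint assumed).
per_task_ms = 1.5          # assumed midpoint of the observed 1-2 ms range
outstanding_tasks = 5000   # tasks belonging to running dags, mostly blocked
loop_cost_s = outstanding_tasks * per_task_ms / 1000
print(f"~{loop_cost_s:.1f} s per pass")  # ~7.5 s, even if only 1-2 tasks are runnable
```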

What you think should happen instead

Either:

ti_deps should be stored in the database and modified in response to events. The main scheduler loop should pull only tasks with no unsatisfied task dependencies and should not check task dependencies at run time.
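
To make this concrete, here is a minimal sketch of what the critical-path query could look like, assuming a hypothetical unsatisfied_dep_count column maintained by event handlers (no such column exists in Airflow today):

```python
# Hypothetical sketch only: assumes a counter of unsatisfied dependencies per
# task instance, decremented by event handlers as upstream tasks complete.
from sqlalchemy import select

from airflow.models.taskinstance import TaskInstance as TI
from airflow.utils.state import TaskInstanceState

# The scheduler's hot loop would then fetch only already-runnable rows,
# instead of re-evaluating ti_deps for every non-completed task:
runnable_query = select(TI).where(
    TI.state == TaskInstanceState.SCHEDULED,
    TI.unsatisfied_dep_count == 0,  # hypothetical column, not in Airflow today
)
```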

OR:

per-task consideration on the main scheduler loop's critical path should be reduced dramatically. Do any of the ti_deps result in further database calls for individual instances? If not, this path may not be plausible without the scheduler taking out a broad-scoped lock and keeping substantial state in memory.

How to reproduce

Construct a dag bag with two dags:

  • one dag (dag_id=sleep-then-parallel) that sleeps for an hour and then spawns 5000 parallel trivial tasks
  • another dag (dag_id=serial-dag) that runs every minute and spawns a 20-deep series of simple tasks (both sketched below).

Each step of the 20-deep series of simple tasks will be delayed by several seconds, as the previous scheduler loop takes that long to finish considering the 5000 tasks (if the sleep-then-parallel dag is processed after the serial-dag) or the new scheduler loop processes the 5000 tasks first (if the sleep-then-parallel dag is processed before the serial-dag).
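
A minimal sketch of the two DAGs, assuming Airflow 2.6-style imports; the exact operators and schedules are illustrative:

```python
import time
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import PythonOperator

with DAG(
    dag_id="sleep-then-parallel",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
):
    # Sleep for an hour, then fan out into 5000 trivial parallel tasks.
    sleep = PythonOperator(task_id="sleep", python_callable=lambda: time.sleep(3600))
    sleep >> [EmptyOperator(task_id=f"leaf_{i}") for i in range(5000)]

with DAG(
    dag_id="serial-dag",
    start_date=datetime(2023, 1, 1),
    schedule=timedelta(minutes=1),
    catchup=False,
):
    # A 20-deep chain of simple tasks; each hop needs its own scheduler pass.
    prev = EmptyOperator(task_id="step_0")
    for i in range(1, 20):
        step = EmptyOperator(task_id=f"step_{i}")
        prev >> step
        prev = step
```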

Operating System

MacOS against my will

Versions of Apache Airflow Providers

N/A

Deployment

Other

Deployment details

Local source build with a custom no-op executor for benchmarking, plus additional instrumentation in SQLAlchemy, Python, and elsewhere.

Anything else

My feeling is that one or both of these items should be filed as feature requests if there is consensus on a course of action.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
