Fix concurrent batches deadlock #12335
Open
QMalcolm wants to merge 12 commits into main from qmalcolm--11420-fix-concurrent-batches-hanging
+103 −17
Conversation
Keeping track of these things will allow us to avoid executing "too many" microbatch model runners in the upcoming commits.
…ads, rounded down
Each `MicrobatchModelRunner` is essentially a batch orchestrator, scheduling `MicrobatchBatchRunner` instances. In a multi-threaded environment, if the number of `MicrobatchModelRunner`s running was equal to the number of threads, the run would lock up, because there'd be no threads available for running batches. By limiting the number of running `MicrobatchModelRunner` instances, we avoid deadlock.
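A minimal sketch of that cap, purely illustrative (the helper name is hypothetical, not dbt-core's actual internals): with `threads` worker threads, at most half of them, rounded down and never fewer than one, may be occupied by model runners.

```python
# Hypothetical helper illustrating the cap described above; dbt-core's real
# implementation may differ in detail.
def max_concurrent_model_runners(threads: int) -> int:
    # Reserve roughly half the threads for batch execution,
    # but always allow at least one model runner.
    return max(1, threads // 2)


assert max_concurrent_model_runners(1) == 1
assert max_concurrent_model_runners(4) == 2
assert max_concurrent_model_runners(5) == 2  # rounded down
assert max_concurrent_model_runners(64) == 32
```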
…ilable
It turns out `dbt run` (no threading) and `dbt run --threads=1` are not the same. The former is synchronous execution; the latter attempts asynchronous execution (but only has 1 thread). In the latter case, when running `dbt run --threads=1`, a `MicrobatchModelRunner` would occupy the only available thread and not be able to run any `MicrobatchBatchRunner`s, causing deadlock. This change makes it so that, if we're doing asynchronous execution (even if there is only one thread), we only submit the batch for asynchronous execution if there are threads available. If there are no threads available, the `MicrobatchBatchRunner` gets run synchronously on the thread of the `MicrobatchModelRunner`. This conveniently also means that the `MicrobatchModelRunner` doesn't "just" orchestrate batches in an asynchronous environment, but will also synchronously execute batches when threads are maxed out.
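A rough sketch of that fallback, under the assumption that the runner can see how many worker threads are currently in use (`execute_batch`, `threads_in_use`, and `total_threads` are hypothetical names, not dbt-core's actual API):

```python
from concurrent.futures import ThreadPoolExecutor


def execute_batch(batch):
    """Stand-in for the work a MicrobatchBatchRunner does for one batch."""
    return f"ran {batch}"


def run_batch(pool: ThreadPoolExecutor, threads_in_use: int, total_threads: int, batch):
    if threads_in_use < total_threads:
        # A worker thread is free: hand the batch to the pool and wait on it.
        return pool.submit(execute_batch, batch).result()
    # Every worker is busy (for example, held by other model runners), so run
    # the batch synchronously on this thread instead of waiting forever.
    return execute_batch(batch)
```

The key point is that the decision to go asynchronous is made per batch at submission time, so `--threads=1` degrades to inline execution rather than hanging.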
Codecov Report
@@ Coverage Diff @@
## main #12335 +/- ##
=======================================
Coverage 91.35% 91.35%
=======================================
Files 203 203
Lines 25044 25063 +19
=======================================
+ Hits 22878 22896 +18
- Misses 2166 2167 +1
MichelleArk reviewed Jan 9, 2026
MichelleArk reviewed Jan 9, 2026
MichelleArk reviewed Jan 9, 2026
… node_ids
We weren't doing this :face-palm:. This meant we were overcounting how many "microbatch" nodes were being run. This didn't cause any failures/deadlock, but it did "slow down" the process by making it think more microbatch models were running than there really were.
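Illustratively, the bookkeeping looks something like the following sketch (hypothetical names such as `_running_microbatch_node_ids` and `run_microbatch_model`, not dbt-core's actual code): record the node_id when the runner starts, and always remove it when the runner finishes, so the count used for scheduling stays accurate.

```python
import threading

# Hypothetical bookkeeping sketch; dbt-core's real tracking differs in detail.
_running_microbatch_node_ids = set()
_lock = threading.Lock()


def run_microbatch_model(node_id: str, execute) -> None:
    with _lock:
        _running_microbatch_node_ids.add(node_id)
    try:
        execute()
    finally:
        # Without this removal, finished models still look "running", and the
        # scheduler believes more microbatch models are active than there
        # really are.
        with _lock:
            _running_microbatch_node_ids.discard(node_id)
```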
Resolves #11420
Problem
When microbatch models were being run concurrently (only possible with snowflake currently), people were experiencing deadlock 💀 😬 This happened when the number of distinct microbatch models being run reached the number of threads available. That is, say I have a project with ~1000 models, 100 of them microbatch models, and I'm running with 64 threads. If all of those threads were executing distinct microbatch models, you'd suddenly have deadlock. This is because there are `MicrobatchModelRunner`s and `MicrobatchBatchRunner`s. In a multi-threaded environment, each `MicrobatchModelRunner` takes up a thread and acts as an orchestrator of `MicrobatchBatchRunner`s, which are run on separate threads (except for the first and last batch, which are always run synchronously). So if you had 64 threads held by `MicrobatchModelRunner`s that were each trying to run 3+ batches, there were no threads left to execute the actual batches (`MicrobatchBatchRunner`s) 🤦🏻
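The shape of this deadlock can be reproduced with a toy thread pool (purely illustrative, not dbt code; a timeout is used so the demo terminates instead of hanging):

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

THREADS = 2
pool = ThreadPoolExecutor(max_workers=THREADS)


def batch(i: int) -> int:
    time.sleep(0.05)  # pretend to do batch work
    return i


def orchestrator(name: str) -> str:
    # Each "model runner" submits its batches to the same pool it runs on...
    batch_futures = [pool.submit(batch, i) for i in range(3)]
    # ...and then waits for them. With every worker occupied by an
    # orchestrator, the batches never start. The timeout only exists so this
    # demo returns instead of hanging forever.
    done, not_done = wait(batch_futures, timeout=2)
    return f"{name}: {len(done)} batches finished, {len(not_done)} stuck"


# Fill every worker thread with an orchestrator, mirroring the failure mode.
model_futures = [pool.submit(orchestrator, f"model_{i}") for i in range(THREADS)]
for f in model_futures:
    print(f.result())
pool.shutdown()
```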
Solution

- Limit `MicrobatchModelRunner` threads to half the number of possible threads, rounded down (minimum 1)
- Allow `MicrobatchModelRunner`s to execute `MicrobatchBatchRunner`s synchronously when all threads are currently held by other processes

Of note, I had taken a prior approach to this. It was similar in philosophy, but implemented differently. I don't have that code anymore; it was in a stash that I blew away. That first stab unfortunately caused the second-to-last batch to always hang. I'm not sure what I did differently this time, and I unfortunately don't have the code to compare. This implementation, though, does not suffer from that problem 🎉
Testing
Unfortunately, we don't/can't have an integration test for this currently, because our integration test suite uses postgres and the only adapter that currently supports concurrent batch execution is snowflake. However, I did manually test this, for what it's worth.
My testing process was:

Without this change:
a. ✅ `dbt run --single-threaded` (will work)
b. ❌ `dbt run --threads=1` (will deadlock)
c. ❌ `dbt run --threads=2` (will deadlock)
d. ✅ `dbt run --threads=3` (will work)

With this change:
a. ✅ `dbt run --single-threaded` (will work)
b. ✅ `dbt run --threads=1` (will work)
c. ✅ `dbt run --threads=2` (will work)
d. ✅ `dbt run --threads=3` (will work)

Checklist