@QMalcolm
Contributor

@QMalcolm QMalcolm commented Jan 9, 2026

Resolves #11420

Problem

When microbatch models were run concurrently (currently only possible with snowflake), people were experiencing deadlock 💀 😬 This happened when the number of distinct microbatch models being run reached the number of threads available. For example, say I have a project with ~1000 models, 100 of them microbatch models, running with 64 threads. If all of those threads were executing distinct microbatch models, you'd suddenly have deadlock. This is because there are MicrobatchModelRunners and MicrobatchBatchRunners. In a multi-threaded environment, each MicrobatchModelRunner takes up a thread and acts as an orchestrator of MicrobatchBatchRunners, which are run on separate threads (except for the first and last batch, which are always run synchronously). So if you had 64 threads held by MicrobatchModelRunners that were each trying to run 3+ batches, there were no threads left to execute the actual batches (MicrobatchBatchRunners) 🤦🏻

Solution

  1. Limit the number of MicrobatchModelRunner threads to half the number of possible threads, rounded down (minimum 1)
  2. Enable MicrobatchModelRunners to execute MicrobatchBatchRunners synchronously when all threads are currently held by other processes
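
To make item 1 concrete, here is a minimal sketch of the cap. The function name `max_microbatch_model_runners` is hypothetical and is not taken from the dbt codebase; only the rule itself (half the threads, rounded down, minimum 1) comes from the description above.

```python
def max_microbatch_model_runners(threads: int) -> int:
    """Hypothetical helper: how many model-level runners may run at once."""
    # Half the available threads, rounded down, but never fewer than one.
    return max(1, threads // 2)
```

Under this rule, a run with 64 threads would allow at most 32 orchestrating runners, which leaves at least 32 threads for batches. With 1, 2, or 3 threads the cap is 1.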

Of note, I had taken a prior approach to this. It was similar in philosophy, but implemented differently. I don't have that code anymore; it was in a stash that I blew away. That first stab unfortunately caused the second-to-last batch to always hang. I'm not sure what I did differently this time, and I unfortunately don't have the code to compare. This implementation, though, does not suffer from that problem 🎉

Testing

Unfortunately we don't/can't have an integration test for this currently because our integration test suite uses postgres and the only adapter that currently supports concurrent batch execution is snowflake. However, I did manually test this, for what it's worth.

My testing process was

  1. Set up a snowflake project with 2 microbatch models with a lookback of 4 (3 should also do the trick, but I did 4 for good measure)
  2. Run the following on main
    a. ✅ dbt run --single-threaded (will work)
    b. ❌ dbt run --threads=1 (will deadlock)
    c. ❌ dbt run --threads=2 (will deadlock)
    d. ✅ dbt run --threads=3 (will work)
  3. Run the following off this branch
    a. ✅ dbt run --single-threaded (will work)
    b. ✅ dbt run --threads=1 (will work)
    c. ✅ dbt run --threads=2 (will work)
    d. ✅ dbt run --threads=3 (will work)

Checklist

  • I have read the contributing guide and understand what's expected of me.
  • I have run this code in development, and it appears to resolve the stated issue.
  • This PR includes tests, or tests are not required or relevant for this PR.
  • This PR has no interface changes (e.g., macros, CLI, logs, JSON artifacts, config files, adapter interface, etc.) or this PR has already received feedback and approval from Product or DX.
  • This PR includes type annotations for new and modified functions.

Keeping track of these things will allow us to not execute "too many"
microbatch model runners in the upcoming commits
…ads, rounded down

Each `MicrobatchModelRunner` is essentially a batch orchestrator, scheduling
`MicrobatchBatchRunner` instances. In a multi-threaded environment,
if the number of running `MicrobatchModelRunner`s was equal to the number
of threads, then the run would lock up, because there'd be no threads available
for running batches. By limiting the number of running `MicrobatchModelRunner`
instances, we ensure that deadlock is avoided.
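
One way to picture the limiting described above is a small tracker that the scheduler consults before starting a model-level runner. This is only a sketch: the `MicrobatchRunTracker` class and its method names are hypothetical, not the actual dbt implementation.

```python
import threading
from typing import Set


class MicrobatchRunTracker:
    """Hypothetical tracker of model-level microbatch runners in flight."""

    def __init__(self, limit: int) -> None:
        self._limit = limit
        self._running: Set[str] = set()
        self._lock = threading.Lock()

    def try_start(self, node_id: str) -> bool:
        """Register node_id as running if under the limit; return whether it may start."""
        with self._lock:
            if len(self._running) >= self._limit:
                return False
            self._running.add(node_id)
            return True

    def finish(self, node_id: str) -> None:
        """Forget a finished node so it stops counting against the limit."""
        with self._lock:
            self._running.discard(node_id)
```

In this sketch, a scheduler would call `try_start` before handing a microbatch model to a thread and `finish` once the model completes. Skipping `finish` would leave the count inflated and hold back models that could otherwise run.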
…ilable

It turns out `dbt run` (no threading) and `dbt run --threads=1` are
not the same. The former is synchronous execution; the latter attempts
asynchronous execution (but only has 1 thread). In the latter case,
with `dbt run --threads=1`, a `MicrobatchModelRunner` would occupy the
only available thread and not be able to run any `MicrobatchBatchRunner`s,
causing deadlock.

This change makes it such that if we're doing asynchronous execution
(even if there is only one thread), we only submit the batch for
asynchronous execution if there are threads available. If there are no
threads available, the `MicrobatchBatchRunner` gets run synchronously
on the thread of the `MicrobatchModelRunner`. This conveniently also makes
it so that the `MicrobatchModelRunner` doesn't "just" orchestrate batches
in an asynchronous environment, but will also synchronously execute batches
when threads are maxed.
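
The fallback described in this commit message can be sketched with the standard library alone. The `BatchSubmitter` class, its `free_threads` argument, and the use of a semaphore to count idle threads are all assumptions made for illustration; dbt's real thread pool handling may look quite different.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


class BatchSubmitter:
    """Hypothetical sketch: use a pool thread if one is free, else run inline."""

    def __init__(self, pool: ThreadPoolExecutor, free_threads: int) -> None:
        self._pool = pool
        # Counts threads that are not currently held by any runner.
        self._free = threading.Semaphore(free_threads)

    def run_batch(self, batch: Callable[[], str]) -> "Future[str]":
        # Non-blocking check: claim a free thread only if one exists right now.
        if self._free.acquire(blocking=False):
            future = self._pool.submit(batch)
            # Hand the thread back once the batch finishes.
            future.add_done_callback(lambda _: self._free.release())
            return future
        # No thread available: execute the batch on the caller's own thread.
        done: "Future[str]" = Future()
        try:
            done.set_result(batch())
        except Exception as exc:
            done.set_exception(exc)
        return done
```

In the `--threads=1` case, the orchestrating runner holds the only thread, so `free_threads` would be 0 and every batch takes the inline path instead of waiting on a thread that can never free up.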
@QMalcolm QMalcolm requested a review from a team as a code owner January 9, 2026 20:28
@cla-bot cla-bot bot added the cla:yes label Jan 9, 2026
@codecov

codecov bot commented Jan 9, 2026

Codecov Report

❌ Patch coverage is 96.55172% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 91.35%. Comparing base (9b4a8bb) to head (4c7c73c).

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #12335   +/-   ##
=======================================
  Coverage   91.35%   91.35%           
=======================================
  Files         203      203           
  Lines       25044    25063   +19     
=======================================
+ Hits        22878    22896   +18     
- Misses       2166     2167    +1     
Flag Coverage Δ
integration 88.24% <96.55%> (+0.02%) ⬆️
unit 65.27% <62.06%> (-0.02%) ⬇️


Components Coverage Δ
Unit Tests 65.27% <62.06%> (-0.02%) ⬇️
Integration Tests 88.24% <96.55%> (+0.02%) ⬆️

… node_ids

We weren't doing this :face-palm:. This meant we were overcounting how many
"microbatch" nodes were being run. This didn't cause any failures/deadlock,
but it did "slow down" the process by making it think more microbatch things
were running than there really were.

Development

Successfully merging this pull request may close these issues.

[Bug] Microbatch models hang with dbt-snowflake

3 participants