(v1-2/3) tea.go: teardown-related deadlock & race condition fixes #1373

desertwitch · 2025-03-26T14:35:59Z

Overview

This PR resolves three possible deadlocks and a number of race conditions:

p.Wait() deadlocking (externally) after the program aborts or is killed.
The proposed change also accounts for program abort/kill and reports completion in those cases too.
Multiple p.Wait() calls deadlocking (externally) because the p.finished channel was never closed.
The proposed change closes that channel before program exit, releasing all waiting callers.
p.shutdown() deadlocking (internally) when executed more than once, due to the p.finished buffer size.
The proposed change no longer sends into the completion channel but closes it instead, signalling all waiting callers.
p.Kill() race conditions and points of failure from calling p.shutdown() again, even though it is already called at program end.
The proposed change cancels the program's context instead, decoupling p.Kill() from program state and letting the natural program teardown eventually reach p.shutdown() as needed. p.Kill() is now idempotent, safe and versatile.

Specifics of occasional test failures regarding `p.shutdown()` and `p.Wait()`:

The failing test helped uncover multiple deadlock situations that the original code (the if clause) was probably put in place to treat symptomatically, while itself causing deadlocks in the process, albeit external ones. p.shutdown() may be called multiple times throughout the code. In the test's case (which fails or passes depending on the timing of the race), it is called once by our explicit p.Kill() call and then a second time when p.eventLoop() exits. This resulted in two sends to the p.finished channel even though it only has a buffer of one.
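The double-send problem described above can be reproduced in isolation. The following is a minimal sketch, not Bubble Tea's actual code: `finished` is a stand-in for p.finished, and `trySend` is a hypothetical helper that uses a non-blocking send so the example terminates instead of hanging.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether it succeeded.
// With a plain `ch <- struct{}{}` the failing case would block forever
// unless some other goroutine receives from the channel.
func trySend(ch chan struct{}) bool {
	select {
	case ch <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	// Stand-in for p.finished: a channel with a buffer of one.
	finished := make(chan struct{}, 1)

	first := trySend(finished)  // first shutdown: the value fits in the buffer
	second := trySend(finished) // second shutdown: buffer is full, a plain send would block

	fmt.Println(first, second) // prints "true false"
}
```

With nobody draining the channel between the two calls, the second plain send would never return, which matches the internal deadlock described for repeated p.shutdown() executions.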

To mitigate this problem, the logic that reports completion through that channel was moved closer to the actual end of the program, using a defer call placed right after the channel is created (for logic and readability).

The code was also refactored to close the p.finished channel on teardown instead of sending into it. This unblocks all blocked p.Wait() calls rather than just one, where previously the rest would deadlock because only a single value could be drained.
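Why closing works where sending does not: a receive on a closed channel returns immediately for every receiver, so one close acts as a broadcast. The sketch below illustrates this with plain goroutines. `waitAll` and its variables are hypothetical names chosen for the example and are not part of Bubble Tea.

```go
package main

import (
	"fmt"
	"sync"
)

// waitAll starts n goroutines that each block on done (like several
// concurrent p.Wait() callers), closes done once, and returns how many
// of the waiters were released.
func waitAll(n int) int {
	done := make(chan struct{})
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		released int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-done // a receive on a closed channel returns immediately
			mu.Lock()
			released++
			mu.Unlock()
		}()
	}
	close(done) // a single close releases every waiter
	wg.Wait()
	return released
}

func main() {
	fmt.Println(waitAll(3)) // prints 3
}
```

Had the example sent a single value instead of closing, only one of the goroutines would have been released and wg.Wait() would never return.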

Specifics of occasional (older and new) test failures regarding `p.Kill()`:

First off, we're lucky this happened exactly here - because it doesn't always:

Running all tests for 1000 times without result caching...
=========================
Test Summary
Successes: 830
Failures:  170
=========================

You may have seen this in other commits and considered it an occasional blip of the CI:
https://github.com/charmbracelet/bubbletea/actions/runs/13250713846/job/36987719494
https://github.com/charmbracelet/bubbletea/actions/runs/13796125964/job/38588086635
https://github.com/charmbracelet/bubbletea/actions/runs/13268485968/job/37041941350

But at present there is a myriad of race conditions (seen in these test failures) and possible pitfalls associated with calling p.Kill() either too soon or too late, that is, when the program has not fully initialised yet or when another shutdown is already in progress from inside the program. It all stems from directly calling p.shutdown(true) inside p.Kill():

bubbletea/tea.go

Lines 710 to 712 in 6a1ebaa

func (p *Program) Kill() {
	p.shutdown(true)
}

This call makes no sense, because p.shutdown() always gets called at the end of the program flow anyway (whether that end is natural or accelerated by a quit/kill). The shutdown function then runs twice, waits twice, attempts to restore the terminal twice, you get the idea.

bubbletea/tea.go

Line 658 in 6a1ebaa

p.shutdown(killed)

Ideally a user should never need to care about when it is safe to call p.Kill(); it should just make the program aware that it needs to tear down fast, and the program should then do so. Since the program is already (internally) context-aware, it is safer to just cancel that context inside p.Kill() and let the program tear itself down, always eventually ending in the needed p.shutdown() at the end of p.Run().

func (p *Program) Kill() {
	p.cancel()
}

This change makes the p.Kill() function versatile, idempotent and not tied to the program's state.
Plus, it also eliminates all race conditions, double executions and related points of failure - which I think is sweet. 😎
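To illustrate the pattern in isolation, here is a minimal, self-contained sketch. The `program` type, its fields and `newProgram` are hypothetical stand-ins and do not mirror Bubble Tea's real Program struct. It assumes only what is described above: Kill cancels a context, Run tears down once when the context is done, and completion is signalled by closing a channel.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
)

// program is a hypothetical stand-in, not Bubble Tea's Program type.
type program struct {
	ctx       context.Context
	cancel    context.CancelFunc
	finished  chan struct{}
	shutdowns atomic.Int32 // counts how often teardown ran
}

func newProgram() *program {
	ctx, cancel := context.WithCancel(context.Background())
	return &program{ctx: ctx, cancel: cancel, finished: make(chan struct{})}
}

// Kill only cancels the context. Calling a CancelFunc again after the
// first call does nothing, so Kill is safe at any time and any number of times.
func (p *program) Kill() { p.cancel() }

// Run blocks until the context is cancelled, then tears down exactly once.
func (p *program) Run() {
	defer close(p.finished) // releases all Wait callers at the very end
	<-p.ctx.Done()
	p.shutdowns.Add(1) // stand-in for the single p.shutdown(killed) call
}

// Wait blocks until Run has finished.
func (p *program) Wait() { <-p.finished }

func main() {
	p := newProgram()
	p.Kill() // before Run has started
	go p.Run()
	p.Kill() // while Run is starting, running or already done
	p.Wait()
	p.Wait() // a second Wait returns immediately
	p.Kill() // after the program has ended

	fmt.Println(p.shutdowns.Load()) // prints 1
}
```

In this sketch the teardown lives in exactly one place (the end of Run), so no ordering of Kill calls can trigger it twice.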

Testing the proposed changes

Running all tests for 1000 times without result caching...
=========================
Test Summary
Successes: 1000
Failures:  0
=========================

codecov · 2025-03-26T16:41:15Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (main@6a1ebaa). Learn more about missing BASE report.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1373   +/-   ##
=======================================
  Coverage        ?   69.26%           
=======================================
  Files           ?       17           
  Lines           ?     1692           
  Branches        ?        0           
=======================================
  Hits            ?     1172           
  Misses          ?      472           
  Partials        ?       48


desertwitch · 2025-03-27T10:26:53Z

All cleaned up (PR branch), changes explained and documented in PR... ready for review! 😎 🚀

aymanbagabas · 2025-04-20T10:15:37Z

This was covered in #1376

desertwitch requested review from meowgorithm and aymanbagabas as code owners March 26, 2025 14:36

fix: release p.Wait() on p.Kill() to prevent external deadlocks

9d66bd3

desertwitch force-pushed the patch-2 branch from c289dfd to 79b149d Compare March 27, 2025 08:24

desertwitch changed the title ~~tea.go: fix p.Wait() deadlock after program aborts/was killed~~ tea.go: teardown-related deadlock & race condition fixes Mar 27, 2025

desertwitch added 2 commits March 27, 2025 11:22

fix: resolve race conditions caused by p.Kill()

4fcd83f

fix(tests): account for multiple p.Wait()

ae1259d

desertwitch force-pushed the patch-2 branch from 5c7c8df to ae1259d Compare March 27, 2025 10:23

desertwitch mentioned this pull request Mar 28, 2025

(v1-3/3) tea.go: fix panic handling race condition + wrongful nil returns despite errors (ctx, panic, ...) #1375

Closed

desertwitch changed the title ~~tea.go: teardown-related deadlock & race condition fixes~~ (2/3) tea.go: teardown-related deadlock & race condition fixes Mar 28, 2025

desertwitch mentioned this pull request Mar 28, 2025

(v1-0/3) tea.go: all fixes combined (for convenience/CI) #1376

Merged

bashbunni added this to the v2.0.0 milestone Apr 7, 2025

caarlos0 approved these changes Apr 7, 2025


desertwitch mentioned this pull request Apr 13, 2025

(v2) tea.go: deadlock/error return fixes, additional tests (targeted at v2) #1388

Merged

desertwitch changed the title ~~(2/3) tea.go: teardown-related deadlock & race condition fixes~~ (v1-2/3) tea.go: teardown-related deadlock & race condition fixes Apr 13, 2025

aymanbagabas approved these changes Apr 14, 2025


aymanbagabas closed this Apr 20, 2025
