(v1-2/3) tea.go: teardown-related deadlock & race condition fixes #1373

Open
wants to merge 3 commits into
base: main
from

Conversation

desertwitch

@desertwitch desertwitch commented Mar 26, 2025

Overview

This PR resolves three possible deadlocks and a number of race conditions:

  • p.Wait() deadlocking (externally) after the program aborts or is killed.
    The proposed change reports completion in the abort/kill case as well.

  • Multiple p.Wait() calls deadlocking (externally) because the p.finished channel was never closed.
    The proposed change closes the channel before program exit, releasing all waiting callers.

  • p.shutdown() deadlocking (internally) when executed more than once, due to the p.finished buffer size.
    The proposed change no longer sends into the completion channel, but closes it instead, to signal all waiting functions.

  • p.Kill() race conditions and points of failure caused by calling p.shutdown() again, even though it is already called at program end.
    The proposed change cancels the program's context instead. This unties p.Kill() from program state and lets the natural program teardown run, which eventually results in p.shutdown() as needed. p.Kill() is now idempotent, safe and versatile.

Specifics of occasional test failures regarding p.shutdown() and p.Wait():

The failing test helped uncover multiple deadlock situations that the original code (an if clause) was probably put in place to treat symptomatically, while causing deadlocks of its own in the process, albeit external ones. p.shutdown() may get called multiple times throughout the code. In the case of this test (which sometimes fails and sometimes passes, depending on the timing of the race), it is called once by our explicit p.Kill() call and then a second time when p.eventLoop() exits. This resulted in sending to the p.finished channel twice even though it only has a buffer of one.
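
The double send can be reproduced in isolation. The following is a minimal sketch, not the bubbletea code: trySend and sendTwice are hypothetical helpers, and finished only mirrors the shape of p.finished as described above (a channel with a buffer of one). A non-blocking send is used so the example reports where a plain send would block instead of hanging.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether it succeeded.
// A false result means a plain `ch <- struct{}{}` would have blocked.
func trySend(ch chan struct{}) bool {
	select {
	case ch <- struct{}{}:
		return true
	default:
		return false
	}
}

// sendTwice mimics two shutdown passes reporting completion on the same
// channel with a buffer of one and nobody receiving yet.
func sendTwice() (first, second bool) {
	finished := make(chan struct{}, 1)
	first = trySend(finished)  // fits into the buffer
	second = trySend(finished) // buffer is full, a plain send would block
	return first, second
}

func main() {
	first, second := sendTwice()
	fmt.Println(first)  // true
	fmt.Println(second) // false
}
```

With a plain send in place of trySend, the second call would block until some caller drains the channel, which matches the internal deadlock described above.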

To mitigate this problem, the logic for reporting completion through that channel was moved closer to the actual end of the program, using a defer call placed right after the channel is created (for logic and readability).

The code was also refactored to close the p.finished channel on teardown instead of sending into it. Closing unblocks all blocked p.Wait() calls, whereas a single sent value is drained by just one of them and leaves the rest deadlocked.
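
Why closing releases every waiter can be shown with a small standalone sketch. The program type below is a hypothetical stand-in, not tea.Program; it only assumes a finished channel that Wait receives from. A receive on a closed channel returns immediately, so close acts as a broadcast.

```go
package main

import (
	"fmt"
	"sync"
)

// program is a hypothetical stand-in reduced to the completion channel.
type program struct {
	finished chan struct{}
}

// Wait blocks until finished is closed.
func (p *program) Wait() {
	<-p.finished
}

// releasedWaiters starts n goroutines blocked in Wait, closes the channel
// once, and returns how many of them were released.
func releasedWaiters(n int) int {
	p := &program{finished: make(chan struct{})}

	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		released int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.Wait()
			mu.Lock()
			released++
			mu.Unlock()
		}()
	}

	close(p.finished) // one close wakes every current and future receiver
	wg.Wait()
	return released
}

func main() {
	fmt.Println(releasedWaiters(3)) // 3
}
```

Had the channel received a single value instead of being closed, only one of the three goroutines would return and wg.Wait() would never complete.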

Specifics of occasional (older and new) test failures regarding p.Kill():

First off, we're lucky this happened exactly here - because it doesn't always:

Running all tests for 1000 times without result caching...
=========================
Test Summary
Successes: 830
Failures:  170
=========================

You may have seen this in other commits and considered it an occasional blip of the CI:
https://github.com/charmbracelet/bubbletea/actions/runs/13250713846/job/36987719494
https://github.com/charmbracelet/bubbletea/actions/runs/13796125964/job/38588086635
https://github.com/charmbracelet/bubbletea/actions/runs/13268485968/job/37041941350

At present, however, there is a myriad of race conditions (seen in these test failures) and possible pitfalls associated with calling p.Kill() either too soon or too late, that is, before the program has fully initialised or while another shutdown is already in progress from inside the program. It all stems from directly calling p.shutdown(true) inside of p.Kill():

bubbletea/tea.go

Lines 710 to 712 in 6a1ebaa

func (p *Program) Kill() {
	p.shutdown(true)
}

This call is redundant, because p.shutdown() always gets called at the end of the program flow anyway, whether the program ends naturally or is accelerated by a quit/kill. The shutdown function then runs twice, waits twice, attempts to restore the terminal twice, you get the idea.

p.shutdown(killed)

Ideally a user should never need to care about when it is safe to call p.Kill(). The call should just make the program aware that it needs to tear down fast, and the program should then do that. Since the program is already (internally) context-aware, it is safer to just cancel that context inside p.Kill() and let the program tear itself down, always eventually ending in the needed p.shutdown() at the end of p.Run().

func (p *Program) Kill() {
	p.cancel()
}

This change makes the p.Kill() function versatile, idempotent and not tied to the program's state.
Plus, it also eliminates the race conditions, double executions and related points of failure described above - which I think is sweet. 😎
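
The idea behind the new p.Kill() can be sketched outside of bubbletea. Everything below is a hypothetical, reduced model (the program type, its fields and killAtAnyTime are assumptions for illustration, not the library's code): Kill only cancels a context, and the single teardown step lives at the end of Run. Calling a context.CancelFunc more than once is documented as safe, which is what makes Kill idempotent here.

```go
package main

import (
	"context"
	"fmt"
)

// program is a hypothetical model of a context-aware program.
type program struct {
	ctx       context.Context
	cancel    context.CancelFunc
	finished  chan struct{}
	shutdowns int // counts how often the teardown step ran
}

func newProgram() *program {
	ctx, cancel := context.WithCancel(context.Background())
	return &program{ctx: ctx, cancel: cancel, finished: make(chan struct{})}
}

// Kill only signals that the program should stop. It never runs teardown
// itself, so it is safe before, during and after Run.
func (p *program) Kill() { p.cancel() }

// Run blocks until the context is cancelled, then tears down exactly once.
func (p *program) Run() {
	defer close(p.finished) // release every Wait caller at the very end
	<-p.ctx.Done()
	p.shutdowns++ // stands in for the single shutdown at the end of Run
}

// Wait blocks until Run has finished.
func (p *program) Wait() { <-p.finished }

// killAtAnyTime calls Kill before Run starts, while it runs, and after it
// has finished, then reports how often teardown ran.
func killAtAnyTime() int {
	p := newProgram()
	p.Kill() // too early: Run has not started yet
	go p.Run()
	p.Kill() // repeated call: a no-op
	p.Wait()
	p.Kill() // too late: teardown already happened
	return p.shutdowns
}

func main() {
	fmt.Println(killAtAnyTime()) // 1
}
```

In this model the teardown count stays at one no matter when or how often Kill is called, because the only code path that tears down is the end of Run.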

Testing the proposed changes

Running all tests for 1000 times without result caching...
=========================
Test Summary
Successes: 1000
Failures:  0
=========================


codecov bot commented Mar 26, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (main@6a1ebaa). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1373   +/-   ##
=======================================
  Coverage        ?   69.26%           
=======================================
  Files           ?       17           
  Lines           ?     1692           
  Branches        ?        0           
=======================================
  Hits            ?     1172           
  Misses          ?      472           
  Partials        ?       48           

@desertwitch desertwitch changed the title tea.go: fix p.Wait() deadlock after program aborts/was killed tea.go: teardown-related deadlock & race condition fixes Mar 27, 2025
@desertwitch
Author

desertwitch commented Mar 27, 2025

All cleaned up (PR branch), changes explained and documented in PR... ready for review! 😎 🚀

@desertwitch desertwitch changed the title tea.go: teardown-related deadlock & race condition fixes (2/3) tea.go: teardown-related deadlock & race condition fixes Mar 28, 2025
@bashbunni bashbunni added this to the v2.0.0 milestone Apr 7, 2025
@desertwitch desertwitch changed the title (2/3) tea.go: teardown-related deadlock & race condition fixes (v1-2/3) tea.go: teardown-related deadlock & race condition fixes Apr 13, 2025
4 participants