seastar could crash if a stall report is emitted at an inopportune time #2697

Open
@travisdowns

Description

The stall reporter calls ::backtrace from a signal handler, and this function is not officially async-signal-safe. There are mixed reports of its "practical" signal safety, including other issues in this repo, but ultimately it seems that it is not, in fact, safe. In particular, ::backtrace shares machinery with the libgcc exception unwinder, and if a stall report is emitted at a critical time, while the interrupted fiber is unwinding the stack due to an exception, the stall report may crash inside backtrace. This is especially hard to diagnose because the SIGSEGV handler cannot emit a backtrace in this case either: it crashes during backtrace in the same way (I have a partial fix for this, which I will push soon).
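To make the pattern concrete outside of Seastar, here is a minimal sketch of the same shape: a periodic signal whose handler calls ::backtrace, interrupting code that throws and catches exceptions in a loop. It assumes Linux with glibc. Every name in it (`sketch`, `on_tick`, `install`, `throw_loop`) is hypothetical and not part of Seastar, and it has not been verified to crash on any particular toolchain. It only illustrates where the overlap between the handler and the unwinder can occur.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <execinfo.h>   // ::backtrace (glibc)
#include <signal.h>     // sigaction, SIGALRM
#include <sys/time.h>   // setitimer

// Hypothetical names throughout; this is an illustration, not Seastar code.
namespace sketch {

// Number of times the handler captured a backtrace.
std::atomic<std::size_t> reports{0};

// Plays the role of a stall report: captures a backtrace of whatever code
// the signal interrupted. ::backtrace is not on the POSIX async-signal-safe
// list, which is the hazard described above.
extern "C" void on_tick(int) {
    void* frames[64];
    int n = ::backtrace(frames, 64);
    if (n > 0) {
        reports.fetch_add(1, std::memory_order_relaxed);
    }
}

// Install the handler; optionally arm a periodic 1 ms wall-clock timer.
void install(bool arm_timer) {
    struct sigaction sa{};
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    int rc = sigaction(SIGALRM, &sa, nullptr);
    assert(rc == 0);
    (void)rc;
    if (arm_timer) {
        itimerval tv{};
        tv.it_interval.tv_usec = 1000;  // repeat every 1 ms
        tv.it_value.tv_usec = 1000;     // first tick after 1 ms
        setitimer(ITIMER_REAL, &tv, nullptr);
    }
}

[[gnu::noinline]]
void thrower() {
    throw std::runtime_error("foo");
}

// Throw and catch `iters` exceptions; returns how many were caught.
// With the timer armed, a tick can land while the unwinder is running.
int throw_loop(int iters) {
    int caught = 0;
    for (int i = 0; i < iters; ++i) {
        try {
            thrower();
        } catch (const std::runtime_error&) {
            ++caught;
        }
    }
    return caught;
}

} // namespace sketch
```

Calling `sketch::install(true)` and then `sketch::throw_loop` with a large iteration count exercises the risky overlap. Whether it actually faults depends on the compiler, the unwinder, and timing, so treat it as a way to reason about the failure mode rather than as a guaranteed crash.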

Here is a reproducer:

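// Includes and using-directives below are assumed; the original snippet omits them.
#include <chrono>
#include <stdexcept>
#include <fmt/core.h>
#include <seastar/core/thread.hh>
#include <seastar/testing/thread_test_case.hh>

using namespace seastar;
using namespace std::chrono_literals;

constexpr int max_depth = 2;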

struct state {
    int caught = 0, top = 0, thrown = 0;
};

[[gnu::noinline]]
void impl(state& s, int x) {
    ++s.top;
    if (x <= 0) {
        throw std::runtime_error("foo");
    } else {
        try {
            impl(s, x - 1);
        } catch (...) {
            ++s.caught;
            ++s.thrown;
            throw;
        }
    }
}

SEASTAR_THREAD_TEST_CASE(stall_detector_crash) {
    auto total_iters = 100000000;
    auto now = [] { return std::chrono::high_resolution_clock::now(); };

    auto next_yield = now();
    state s;
    for (int a = 0; a < total_iters; a++) {
        if (now() > next_yield) {
            thread::yield();
            next_yield = now() + 10ms;
        }

        try {
            impl(s, a % max_depth);
        } catch (...) {
        }

        if (a % 100000 == 0) {
            fmt::print("Making progress: {:6.3f}%, top: {}, caught: {}\n",
                       100. * a / total_iters, s.top, s.caught);
        }
    }
}

Compile with clang-18 and run with -- --blocked-reactor-reports-per-minute=99999 --blocked-reactor-notify-ms=1; it should crash almost immediately. Note that it can also crash with the default stall notifier parameters; these values just make it happen quickly.
