Add source location to task and tasktrace object #2707

radoslawcybulski · 2025-03-31T14:23:38Z

Adds source location of next resume point to task objects. When constructing tasktrace object, source location of resume points will be added to it. When printing, the source location of next resume point will be printed as well, hopefully improving debug experience.

Example:

0x2b3899 /home/y/work/scylladb-test/build/Debug/seastar/libseastar.so+0x454c364 ...
   --------
   test_app/main.cpp:15:15
   seastar::continuation<seastar::internal::promise_base_with_type<void>, ...

Partially fixes #2381

avikivity · 2025-03-31T14:48:28Z

include/seastar/core/task.hh

@@ -35,6 +36,8 @@ class task {
 protected:
    scheduling_group _sg;
 private:
+    std::source_location resume_point = {};
+


This is making a very heavily used object larger.

After adding std::source_location

coroutine_traits_base<>::promise_type (which inherits from task) is 64 bytes,
continuation<Promise, Func, Wrapper, T> is around 104-120, depending on exact version.

No difference except for that single unlucky value, who is "promoted" to the next 64 byte cache line in case of continuation.

I think it's okay.

avikivity · 2025-03-31T14:51:12Z

Aren't those addresses derivable from the stack trace? Or are those points lost?

How large is std::source_location?

radoslawcybulski · 2025-03-31T14:55:09Z

Aren't those addresses derivable from the stack trace? Or are those points lost?

Probably with a little bit of magic. You'd still have to store a pointer somewhere, because stacktrace is valid for an actually running task, while my patch adds "next instruction" to tasks, which are suspended.

How large is std::source_location?

8 bytes.

avikivity · 2025-03-31T15:09:23Z

Aren't those addresses derivable from the stack trace? Or are those points lost?

Probably with a little bit of magic. You'd still have to store a pointer somewhere, because stacktrace is valid for an actually running task, while my patch adds "next instruction" to tasks, which are suspended.

How large is std::source_location?

8 bytes.

Maybe we can tolerate it. @tgrabiec what's your opinion?

Please run scylladb's perf-simple-query and report insn-per-op before and after (although we won't learn of the impact of the size change).

radoslawcybulski · 2025-03-31T16:21:39Z

my patch

instructions_per_op: mean=48465.87 standard-deviation=33.03 median=48459.42 median-absolute-deviation=22.63 maximum=48520.81 minimum=48437.81
instructions_per_op: mean=48482.90 standard-deviation=21.16 median=48487.16 median-absolute-deviation=6.97 maximum=48504.13 minimum=48447.34
instructions_per_op: mean=48469.42 standard-deviation=18.34 median=48472.45 median-absolute-deviation=17.26 maximum=48488.53 minimum=48448.81

master

instructions_per_op: mean=48337.41 standard-deviation=21.29 median=48334.08 median-absolute-deviation=19.67 maximum=48364.48 minimum=48317.02
instructions_per_op: mean=48352.36 standard-deviation=27.87 median=48347.35 median-absolute-deviation=16.28 maximum=48395.26 minimum=48322.29
instructions_per_op: mean=48339.83 standard-deviation=26.48 median=48323.12 median-absolute-deviation=20.80 maximum=48372.64 minimum=48319.03

My patch will run more instructions, as there are additional stores to the task object (std::source_location needs to be written).

Why do we measure instructions per operation? We are not billed by instruction count, rather by time, we should measure total running time instead? Does it work as a sanity check or am i missing something?

avikivity · 2025-03-31T17:37:35Z

Why do we measure instructions per operation? We are not billed by instruction count, rather by time, we should measure total running time instead? Does it work as a sanity check or am i missing something?

Time is very unstable (just try it) since it depends on temperature and cooling.

Instructions is a more stable proxy for time (but inaccurate since instructions-per-clock can change).

avikivity · 2025-03-31T17:38:40Z

So we add 0.25% overhead. Hard to judge if it's worthwhile.

radoslawcybulski · 2025-03-31T18:01:15Z

Time is very unstable (just try it) since it depends on temperature and cooling.

Agree.

Instructions is a more stable proxy for time (but inaccurate since instructions-per-clock can change).

It's absolutely stable, but we're counting steps, which - in my opinion - sort-of blinds us. It's like two different walks, the same step count, but step sizes differ, so is distance. Here for example if we're running at max IPC, then we've 0.25% overhead. If otherwise we're waiting for memory, then the write is pretty much free (we've already waited for this cache line, because other stuff is read / written there) and proc is waiting for other memory pieces, so it can do our write.

avikivity · 2025-03-31T18:08:08Z

Time is very unstable (just try it) since it depends on temperature and cooling.

Agree.

Instructions is a more stable proxy for time (but inaccurate since instructions-per-clock can change).

It's absolutely stable, but we're counting steps, which - in my opinion - sort-of blinds us. It's like two different walks, the same step count, but step sizes differ, so is distance. Here for example if we're running at max IPC, then we've 0.25% overhead. If otherwise we're waiting for memory, then the write is pretty much free (we've already waited for this cache line, because other stuff is read / written there) and proc is waiting for other memory pieces, so it can do our write.

Writes are more-or-less free, except for the increase in instruction footprint (which are extra reads).

We aren't close to running at max IPC. Typical for mini-benchmarks like perf-simple-query is 2, and for production ~1. The main killer is waiting for instruction fetch (although maybe in production mispredicts would also contribute since the the data is less regular). I'm sure an M4 would give much higher IPC.

tchaikov

if the performance impact could be a concern, can we make it an optional feature which is enabled only in the debug builds?

tchaikov · 2025-04-07T07:50:40Z

include/seastar/core/coroutine.hh

@@ -30,6 +30,7 @@

 #ifndef SEASTAR_MODULE
 #include <coroutine>
+#include <source_location>


it took me a while to check if it's safe to use <source_location> instead of util/std-compat.hh . the answer is yes. as we support the latest two major versions of the GCC and Clang compilers. in general, we use libstdc++, which added the support to source_location in gcc-mirror/gcc@57d76ee, which is included by gcc 12 and up. and the latest stable release of GCC is GCC 14.2.

radoslawcybulski · 2025-04-14T06:27:30Z

if the performance impact could be a concern, can we make it an optional feature which is enabled only in the debug builds?

This has been discussed here (#2381) and consensus is that the release is where it's useful the most.

tgrabiec · 2025-04-17T11:53:38Z

Aren't those addresses derivable from the stack trace? Or are those points lost?

Probably with a little bit of magic. You'd still have to store a pointer somewhere, because stacktrace is valid for an actually running task, while my patch adds "next instruction" to tasks, which are suspended.

How large is std::source_location?

8 bytes.

Maybe we can tolerate it. @tgrabiec what's your opinion?

I think it's probably ok.

tchaikov

lgtm

bitpathfinder · 2025-04-22T07:22:48Z

include/seastar/core/task.hh

@@ -35,6 +36,8 @@ class task {
 protected:
    scheduling_group _sg;
 private:
+    std::source_location resume_point = {};


I'd put the resume_point first since it's larger than scheduling_group. that way the task object size is 4 bytes less.

Not really - if the class is not packed then the size includes the alignment, so order doesn't matter.

Yes, that's true. The object will be aligned to 8 bytes boundary. However in case we add something after these variables, less than 5 bytes, then there will be a difference, so having it ordered by size might be beneficial in the future.

bitpathfinder · 2025-04-22T12:34:43Z

include/seastar/core/task.hh

 #endif

 namespace seastar {

 SEASTAR_MODULE_EXPORT
 class task {
+    std::source_location resume_point = {};


(nit) Should not we prepend class member variables with an underscore: resume_point->_resume_point?

Add an empty, default constructed std::source_location object to the task object and getter / setter.

Add calls to update `resume_point` variable with location of next resume location to all `await_suspend` functions and `then` functions.

Add resume point locations to `tasktrace` object. Update `formatter::format` to print source location of next resume alone with task type.

radoslawcybulski · 2025-04-22T13:13:22Z

@scylladb/seastar-maint please merge.

…ktrace

denesb · 2025-06-27T14:11:51Z

So we add 0.25% overhead. Hard to judge if it's worthwhile.

@avikivity I think the gain is much higher than 0.25% here. We need to reduce the bar of entry into debugging seastar applications.

radoslawcybulski · 2025-06-27T18:42:59Z

@denesb can you tell me, how did you figure out performance differences for my patch (or any patch)? Sub 1% precision looks amazing to me, how did you do that?

denesb · 2025-06-30T07:49:44Z

@denesb can you tell me, how did you figure out performance differences for my patch (or any patch)? Sub 1% precision looks amazing to me, how did you do that?

I didn't, I just reacted to Avi's comment. I didn't claim that the perf regression is something else, just that 0.25% that Avi claimed is worth it for the gain of seeing line numbers.

michoecho · 2025-06-30T11:32:22Z

@denesb can you tell me, how did you figure out performance differences for my patch (or any patch)? Sub 1% precision looks amazing to me, how did you do that?

We're talking about the perf_simple_query results posted earlier in this thread.

radoslawcybulski force-pushed the pr2381-add-source-location-to-tasktrace branch from 02aa70a to c0149e3 Compare March 31, 2025 14:28

radoslawcybulski requested a review from tchaikov March 31, 2025 14:30

avikivity reviewed Mar 31, 2025

View reviewed changes

radoslawcybulski force-pushed the pr2381-add-source-location-to-tasktrace branch from c0149e3 to 08d3c8a Compare March 31, 2025 14:58

radoslawcybulski requested a review from avikivity March 31, 2025 16:22

tchaikov reviewed Apr 7, 2025

View reviewed changes

radoslawcybulski requested a review from tchaikov April 14, 2025 10:35

radoslawcybulski self-assigned this Apr 14, 2025

tgrabiec approved these changes Apr 17, 2025

View reviewed changes

tchaikov approved these changes Apr 22, 2025

View reviewed changes

bitpathfinder reviewed Apr 22, 2025

View reviewed changes

radoslawcybulski force-pushed the pr2381-add-source-location-to-tasktrace branch 2 times, most recently from 6fe697d to 80ce73a Compare April 22, 2025 09:46

radoslawcybulski requested a review from bitpathfinder April 22, 2025 12:31

bitpathfinder reviewed Apr 22, 2025

View reviewed changes

Radosław Cybulski added 3 commits April 22, 2025 14:38

Add a std::source_location (resume_point) to task object.

13d5be4

Add an empty, default constructed std::source_location object to the task object and getter / setter.

Add calls to update resume_point

2ffc305

Add calls to update `resume_point` variable with location of next resume location to all `await_suspend` functions and `then` functions.

Update backtrace with source locations of resume points

58aa04e

Add resume point locations to `tasktrace` object. Update `formatter::format` to print source location of next resume alone with task type.

radoslawcybulski force-pushed the pr2381-add-source-location-to-tasktrace branch from 80ce73a to 58aa04e Compare April 22, 2025 12:38

radoslawcybulski requested a review from bitpathfinder April 22, 2025 12:40

bitpathfinder approved these changes Apr 22, 2025

View reviewed changes

Merge branch 'scylladb:master' into pr2381-add-source-location-to-tas…

8026e1e

…ktrace

Add source location to task and tasktrace object #2707

Are you sure you want to change the base?

Add source location to task and tasktrace object #2707

Uh oh!

Conversation

radoslawcybulski commented Mar 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

avikivity commented Mar 31, 2025

Uh oh!

radoslawcybulski commented Mar 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

avikivity commented Mar 31, 2025

Uh oh!

radoslawcybulski commented Mar 31, 2025

Uh oh!

avikivity commented Mar 31, 2025

Uh oh!

avikivity commented Mar 31, 2025

Uh oh!

radoslawcybulski commented Mar 31, 2025

Uh oh!

avikivity commented Mar 31, 2025

Uh oh!

tchaikov left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

radoslawcybulski commented Apr 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tgrabiec commented Apr 17, 2025

Uh oh!

tchaikov left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

radoslawcybulski commented Apr 22, 2025

Uh oh!

denesb commented Jun 27, 2025

Uh oh!

radoslawcybulski commented Jun 27, 2025

Uh oh!

denesb commented Jun 30, 2025

Uh oh!

michoecho commented Jun 30, 2025

Uh oh!

Uh oh!

radoslawcybulski commented Mar 31, 2025 •

edited

Loading

radoslawcybulski commented Mar 31, 2025 •

edited

Loading

radoslawcybulski commented Apr 14, 2025 •

edited

Loading