Skip to content

Add source location to task and tasktrace object #2707

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

radoslawcybulski
Copy link

@radoslawcybulski radoslawcybulski commented Mar 31, 2025

Adds source location of next resume point to task objects. When constructing tasktrace object, source location of resume points will be added to it. When printing, the source location of next resume point will be printed as well, hopefully improving debug experience.

Example:

0x2b3899 /home/y/work/scylladb-test/build/Debug/seastar/libseastar.so+0x454c364 ...
   --------
   test_app/main.cpp:15:15
   seastar::continuation<seastar::internal::promise_base_with_type<void>, ...

Partially fixes #2381

Radosław Cybulski added 2 commits March 31, 2025 16:27
Add an empty, default constructed std::source_location object
to the task object and getter / setter.
Add calls to update `resume_point` variable with location of
next resume location to all `await_suspend` functions and `then`
functions.
@radoslawcybulski radoslawcybulski force-pushed the pr2381-add-source-location-to-tasktrace branch from 02aa70a to c0149e3 Compare March 31, 2025 14:28
@@ -35,6 +36,8 @@ class task {
protected:
scheduling_group _sg;
private:
std::source_location resume_point = {};

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is making a very heavily used object larger.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After adding std::source_location

coroutine_traits_base<>::promise_type (which inherits from task) is 64 bytes,
continuation<Promise, Func, Wrapper, T> is around 104-120, depending on exact version.

No difference except for that single unlucky value, who is "promoted" to the next 64 byte cache line in case of continuation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's okay.

@avikivity
Copy link
Member

Aren't those addresses derivable from the stack trace? Or are those points lost?

How large is std::source_location?

@radoslawcybulski
Copy link
Author

radoslawcybulski commented Mar 31, 2025

Aren't those addresses derivable from the stack trace? Or are those points lost?

Probably with a little bit of magic. You'd still have to store a pointer somewhere, because stacktrace is valid for an actually running task, while my patch adds "next instruction" to tasks, which are suspended.

How large is std::source_location?

8 bytes.

Add resume point locations to `tasktrace` object.
Update `formatter::format` to print source location of next resume alone
with task type.
@radoslawcybulski radoslawcybulski force-pushed the pr2381-add-source-location-to-tasktrace branch from c0149e3 to 08d3c8a Compare March 31, 2025 14:58
@avikivity
Copy link
Member

Aren't those addresses derivable from the stack trace? Or are those points lost?

Probably with a little bit of magic. You'd still have to store a pointer somewhere, because stacktrace is valid for an actually running task, while my patch adds "next instruction" to tasks, which are suspended.

How large is std::source_location?

8 bytes.

Maybe we can tolerate it. @tgrabiec what's your opinion?

Please run scylladb's perf-simple-query and report insn-per-op before and after (although we won't learn of the impact of the size change).

@radoslawcybulski
Copy link
Author

my patch

instructions_per_op: mean=48465.87 standard-deviation=33.03 median=48459.42 median-absolute-deviation=22.63 maximum=48520.81 minimum=48437.81
instructions_per_op: mean=48482.90 standard-deviation=21.16 median=48487.16 median-absolute-deviation=6.97 maximum=48504.13 minimum=48447.34
instructions_per_op: mean=48469.42 standard-deviation=18.34 median=48472.45 median-absolute-deviation=17.26 maximum=48488.53 minimum=48448.81

master

instructions_per_op: mean=48337.41 standard-deviation=21.29 median=48334.08 median-absolute-deviation=19.67 maximum=48364.48 minimum=48317.02
instructions_per_op: mean=48352.36 standard-deviation=27.87 median=48347.35 median-absolute-deviation=16.28 maximum=48395.26 minimum=48322.29
instructions_per_op: mean=48339.83 standard-deviation=26.48 median=48323.12 median-absolute-deviation=20.80 maximum=48372.64 minimum=48319.03

My patch will run more instructions, as there are additional stores to the task object (std::source_location needs to be written).

Why do we measure instructions per operation? We are not billed by instruction count, rather by time, we should measure total running time instead? Does it work as a sanity check or am i missing something?

@avikivity
Copy link
Member

Why do we measure instructions per operation? We are not billed by instruction count, rather by time, we should measure total running time instead? Does it work as a sanity check or am i missing something?

Time is very unstable (just try it) since it depends on temperature and cooling.

Instructions is a more stable proxy for time (but inaccurate since instructions-per-clock can change).

@avikivity
Copy link
Member

So we add 0.25% overhead. Hard to judge if it's worthwhile.

@radoslawcybulski
Copy link
Author

Time is very unstable (just try it) since it depends on temperature and cooling.

Agree.

Instructions is a more stable proxy for time (but inaccurate since instructions-per-clock can change).

It's absolutely stable, but we're counting steps, which - in my opinion - sort-of blinds us. It's like two different walks, the same step count, but step sizes differ, so is distance. Here for example if we're running at max IPC, then we've 0.25% overhead. If otherwise we're waiting for memory, then the write is pretty much free (we've already waited for this cache line, because other stuff is read / written there) and proc is waiting for other memory pieces, so it can do our write.

@avikivity
Copy link
Member

Time is very unstable (just try it) since it depends on temperature and cooling.

Agree.

Instructions is a more stable proxy for time (but inaccurate since instructions-per-clock can change).

It's absolutely stable, but we're counting steps, which - in my opinion - sort-of blinds us. It's like two different walks, the same step count, but step sizes differ, so is distance. Here for example if we're running at max IPC, then we've 0.25% overhead. If otherwise we're waiting for memory, then the write is pretty much free (we've already waited for this cache line, because other stuff is read / written there) and proc is waiting for other memory pieces, so it can do our write.

Writes are more-or-less free, except for the increase in instruction footprint (which are extra reads).

We aren't close to running at max IPC. Typical for mini-benchmarks like perf-simple-query is 2, and for production ~1. The main killer is waiting for instruction fetch (although maybe in production mispredicts would also contribute since the the data is less regular). I'm sure an M4 would give much higher IPC.

Copy link
Contributor

@tchaikov tchaikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the performance impact could be a concern, can we make it an optional feature which is enabled only in the debug builds?

@@ -30,6 +30,7 @@

#ifndef SEASTAR_MODULE
#include <coroutine>
#include <source_location>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it took me a while to check if it's safe to use <source_location> instead of util/std-compat.hh . the answer is yes. as we support the latest two major versions of the GCC and Clang compilers. in general, we use libstdc++, which added the support to source_location in gcc-mirror/gcc@57d76ee, which is included by gcc 12 and up. and the latest stable release of GCC is GCC 14.2.

@radoslawcybulski
Copy link
Author

radoslawcybulski commented Apr 14, 2025

if the performance impact could be a concern, can we make it an optional feature which is enabled only in the debug builds?

This has been discussed here (#2381) and consensus is that the release is where it's useful the most.

@radoslawcybulski radoslawcybulski self-assigned this Apr 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Looking for stuck/deadlocked fibers could be easier
3 participants