-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Add source location to task and tasktrace object #2707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add source location to task and tasktrace object #2707
Conversation
02aa70a
to
c0149e3
Compare
@@ -35,6 +36,8 @@ class task { | |||
protected: | |||
scheduling_group _sg; | |||
private: | |||
std::source_location resume_point = {}; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is making a very heavily used object larger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After adding std::source_location
coroutine_traits_base<>::promise_type
(which inherits from task) is 64 bytes,
continuation<Promise, Func, Wrapper, T>
is around 104-120, depending on exact version.
No difference except for that single unlucky value, who is "promoted" to the next 64 byte cache line in case of continuation
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's okay.
Aren't those addresses derivable from the stack trace? Or are those points lost? How large is std::source_location? |
Probably with a little bit of magic. You'd still have to store a pointer somewhere, because stacktrace is valid for an actually running task, while my patch adds "next instruction" to tasks, which are suspended.
8 bytes. |
c0149e3
to
08d3c8a
Compare
Maybe we can tolerate it. @tgrabiec what's your opinion? Please run scylladb's perf-simple-query and report insn-per-op before and after (although we won't learn of the impact of the size change). |
my patch
master
My patch will run more instructions, as there are additional stores to the task object (std::source_location needs to be written). Why do we measure instructions per operation? We are not billed by instruction count, rather by time, we should measure total running time instead? Does it work as a sanity check or am i missing something? |
Time is very unstable (just try it) since it depends on temperature and cooling. Instructions is a more stable proxy for time (but inaccurate since instructions-per-clock can change). |
So we add 0.25% overhead. Hard to judge if it's worthwhile. |
Agree.
It's absolutely stable, but we're counting steps, which - in my opinion - sort-of blinds us. It's like two different walks, the same step count, but step sizes differ, so is distance. Here for example if we're running at max IPC, then we've 0.25% overhead. If otherwise we're waiting for memory, then the write is pretty much free (we've already waited for this cache line, because other stuff is read / written there) and proc is waiting for other memory pieces, so it can do our write. |
Writes are more-or-less free, except for the increase in instruction footprint (which are extra reads). We aren't close to running at max IPC. Typical for mini-benchmarks like perf-simple-query is 2, and for production ~1. The main killer is waiting for instruction fetch (although maybe in production mispredicts would also contribute since the the data is less regular). I'm sure an M4 would give much higher IPC. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the performance impact could be a concern, can we make it an optional feature which is enabled only in the debug builds?
@@ -30,6 +30,7 @@ | |||
|
|||
#ifndef SEASTAR_MODULE | |||
#include <coroutine> | |||
#include <source_location> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it took me a while to check if it's safe to use <source_location>
instead of util/std-compat.hh
. the answer is yes. as we support the latest two major versions of the GCC and Clang compilers. in general, we use libstdc++, which added the support to source_location
in gcc-mirror/gcc@57d76ee, which is included by gcc 12 and up. and the latest stable release of GCC is GCC 14.2.
This has been discussed here (#2381) and consensus is that the release is where it's useful the most. |
I think it's probably ok. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
include/seastar/core/task.hh
Outdated
@@ -35,6 +36,8 @@ class task { | |||
protected: | |||
scheduling_group _sg; | |||
private: | |||
std::source_location resume_point = {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd put the resume_point
first since it's larger than scheduling_group
. that way the task
object size is 4 bytes less.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really - if the class is not packed then the size includes the alignment, so order doesn't matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's true. The object will be aligned to 8 bytes boundary. However in case we add something after these variables, less than 5 bytes, then there will be a difference, so having it ordered by size might be beneficial in the future.
6fe697d
to
80ce73a
Compare
include/seastar/core/task.hh
Outdated
#endif | ||
|
||
namespace seastar { | ||
|
||
SEASTAR_MODULE_EXPORT | ||
class task { | ||
std::source_location resume_point = {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) Should not we prepend class member variables with an underscore: resume_point
->_resume_point
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Add an empty, default constructed std::source_location object to the task object and getter / setter.
Add calls to update `resume_point` variable with location of next resume location to all `await_suspend` functions and `then` functions.
Add resume point locations to `tasktrace` object. Update `formatter::format` to print source location of next resume alone with task type.
80ce73a
to
58aa04e
Compare
@scylladb/seastar-maint please merge. |
@avikivity I think the gain is much higher than 0.25% here. We need to reduce the bar of entry into debugging seastar applications. |
@denesb can you tell me, how did you figure out performance differences for my patch (or any patch)? Sub 1% precision looks amazing to me, how did you do that? |
I didn't, I just reacted to Avi's comment. I didn't claim that the perf regression is something else, just that 0.25% that Avi claimed is worth it for the gain of seeing line numbers. |
We're talking about the |
Adds source location of next resume point to
task
objects. When constructingtasktrace
object, source location of resume points will be added to it. When printing, the source location of next resume point will be printed as well, hopefully improving debug experience.Example:
Partially fixes #2381