Improve tasks dump#3434
Conversation
Update `log` function variants, that take `rate_limit` to return true if message was actually logged and false otherwise. This is useful, when you want to rate limit group of messages - in such case you rate limit first message and ignore printing rest of them if first one was skipped.
adbe1dd to
b62c5a7
Compare
There was a problem hiding this comment.
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
b62c5a7 to
2d0be18
Compare
|
@avikivity - can you please review? |
@avikivity - re-pinging. |
|
I don't think it will fix SCYLLADB-1734, it will make the infinite loop not crash on OOM, but it will still be infinite. |
| template <typename... Args> | ||
| void log(log_level level, rate_limit& rl, format_info_t<Args...> fmt, Args&&... args) noexcept { | ||
| bool log(log_level level, rate_limit& rl, format_info_t<Args...> fmt, Args&&... args) noexcept { | ||
| if (is_enabled(level) && rl.check()) { |
There was a problem hiding this comment.
I think the rate-limit check should be done outside the loop. Check once, then call the log() overload that doesn't take rate-limit as a parameter. Otherwise a long multi-line log will randomly break in the middle.
There was a problem hiding this comment.
I'm not following.
The check in my version is done on the first line of the whole output (Too long queue accumulated.. line), outside of the for loop doing the task dump. Technically it's inside the while loop doing tasks, but the rate limi is exactly for this purpose - it will print for one task every N seconds.
| if (count_text.size() < 8) | ||
| count_text.resize(8, ' '); | ||
|
|
||
| for(auto t = task; t; t = t->waiting_task()) { |
There was a problem hiding this comment.
We use spaces after for, for for is not a function.
Previously dump of large task queue would build one, large message. It turned out it can cause memory allocation failures in rare cases. The patch updates function to log task queue info's each line as it's own log message. In current version it looks like this: WARN 2026-06-01 10:20:01,565 [shard 0:main] seastar - Too long queue accumulated for main (35 tasks) WARN 2026-06-01 10:20:01,565 [shard 0:main] seastar - 4: seastar/include/seastar/core/smp.hh:378:32 WARN 2026-06-01 10:20:01,566 [shard 0:main] seastar - 15: seastar/include/seastar/core/semaphore.hh:648:37 WARN 2026-06-01 10:20:01,566 [shard 0:main] seastar - seastar/src/core/reactor.cc:3750:36 WARN 2026-06-01 10:20:01,566 [shard 0:main] seastar - 16: seastar/src/core/reactor.cc:3750:36 WARN 2026-06-01 10:20:01,566 [shard 0:main] seastar - End of dump for main (35 tasks) Since log messages might interleave with other logs, to retrieve whole task dump you need to grep by `[shard 1:main]` and gather all lines between `Too long queue accumulated...` and `End of dump for` with the same name. Refs: SCYLLADB-1734
2d0be18 to
45ea21c
Compare
|
Patch updated - remove spurious end-of-line character in log output. |
|
@avikivity - ping for re-review. |
Previously dump of large task queue would build one, large message. It turned out it can cause memory allocation failures in rare cases. The patch updates function to log task queue info's each line as it's own log message. In current version it looks like this:
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - Too long queue accumulated for main (13 tasks)
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - 1: seastar/src/core/reactor.cc:3749:36
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - 2: tasks/task_manager.cc:322:22
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - tasks/task_manager.cc:322:22
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - 2: seastar/include/seastar/core/semaphore.hh:648:37
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - seastar/include/seastar/core/semaphore.hh:648:37
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - 3: seastar/include/seastar/core/smp.hh:240:50
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - 2: compaction/task_manager_module.cc:811:9
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - compaction/task_manager_module.cc:811:9
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - 1: PN7seastar4taskE
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - 2: tasks/task_manager.cc:253:9
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - tasks/task_manager.cc:253:9
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - End of dump for main (13 tasks)
Since log messages might interleave with other logs, to retrieve whole task dump you need to grep by
[shard 1:main]and gather all lines betweenToo long queue accumulated...andEnd of dump forwith the same name.Refs: SCYLLADB-1734