Skip to content

Improve tasks dump#3434

Open
radoslawcybulski wants to merge 2 commits into
scylladb:masterfrom
radoslawcybulski:improve-tasks-dump
Open

Improve tasks dump#3434
radoslawcybulski wants to merge 2 commits into
scylladb:masterfrom
radoslawcybulski:improve-tasks-dump

Conversation

@radoslawcybulski

@radoslawcybulski radoslawcybulski commented May 27, 2026

Copy link
Copy Markdown

Previously dump of large task queue would build one, large message. It turned out it can cause memory allocation failures in rare cases. The patch updates function to log task queue info's each line as it's own log message. In current version it looks like this:

WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - Too long queue accumulated for main (13 tasks)
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - 1: seastar/src/core/reactor.cc:3749:36
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - 2: tasks/task_manager.cc:322:22
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - tasks/task_manager.cc:322:22
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - 2: seastar/include/seastar/core/semaphore.hh:648:37
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - seastar/include/seastar/core/semaphore.hh:648:37
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - 3: seastar/include/seastar/core/smp.hh:240:50
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - 2: compaction/task_manager_module.cc:811:9
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - compaction/task_manager_module.cc:811:9
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - 1: PN7seastar4taskE
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - 2: tasks/task_manager.cc:253:9
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - tasks/task_manager.cc:253:9
WARN 2026-05-27 12:49:03,505 [shard 1:main] seastar - End of dump for main (13 tasks)

Since log messages might interleave with other logs, to retrieve whole task dump you need to grep by [shard 1:main] and gather all lines between Too long queue accumulated... and End of dump for with the same name.

Refs: SCYLLADB-1734

Update `log` function variants, that take `rate_limit` to return true if
message was actually logged and false otherwise. This is useful, when
you want to rate limit group of messages - in such case you rate limit
first message and ignore printing rest of them if first one was skipped.
@radoslawcybulski radoslawcybulski requested a review from xemul May 27, 2026 10:58
@ScyllaPiotr ScyllaPiotr requested a review from Copilot May 27, 2026 12:07

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@mykaul

mykaul commented May 28, 2026

Copy link
Copy Markdown
Contributor

@avikivity - can you please review?

@mykaul

mykaul commented May 31, 2026

Copy link
Copy Markdown
Contributor

@avikivity - can you please review?

@avikivity - re-pinging.

@avikivity

Copy link
Copy Markdown
Member

I don't think it will fix SCYLLADB-1734, it will make the infinite loop not crash on OOM, but it will still be infinite.

template <typename... Args>
void log(log_level level, rate_limit& rl, format_info_t<Args...> fmt, Args&&... args) noexcept {
bool log(log_level level, rate_limit& rl, format_info_t<Args...> fmt, Args&&... args) noexcept {
if (is_enabled(level) && rl.check()) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the rate-limit check should be done outside the loop. Check once, then call the log() overload that doesn't take rate-limit as a parameter. Otherwise a long multi-line log will randomly break in the middle.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not following.

The check in my version is done on the first line of the whole output (Too long queue accumulated.. line), outside of the for loop doing the task dump. Technically it's inside the while loop doing tasks, but the rate limi is exactly for this purpose - it will print for one task every N seconds.

Comment thread src/core/reactor.cc Outdated
if (count_text.size() < 8)
count_text.resize(8, ' ');

for(auto t = task; t; t = t->waiting_task()) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use spaces after for, for for is not a function.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Previously dump of large task queue would build one, large message. It
turned out it can cause memory allocation failures in rare cases. The
patch updates function to log task queue info's each line as it's own
log message. In current version it looks like this:

WARN  2026-06-01 10:20:01,565 [shard 0:main] seastar - Too long queue accumulated for main (35 tasks)
WARN  2026-06-01 10:20:01,565 [shard 0:main] seastar -  4:     seastar/include/seastar/core/smp.hh:378:32
WARN  2026-06-01 10:20:01,566 [shard 0:main] seastar -  15:    seastar/include/seastar/core/semaphore.hh:648:37
WARN  2026-06-01 10:20:01,566 [shard 0:main] seastar -         seastar/src/core/reactor.cc:3750:36
WARN  2026-06-01 10:20:01,566 [shard 0:main] seastar -  16:    seastar/src/core/reactor.cc:3750:36
WARN  2026-06-01 10:20:01,566 [shard 0:main] seastar - End of dump for main (35 tasks)

Since log messages might interleave with other logs, to retrieve whole
task dump you need to grep by `[shard 1:main]` and gather all lines
between `Too long queue accumulated...` and `End of dump for` with the
same name.

Refs: SCYLLADB-1734
@radoslawcybulski

Copy link
Copy Markdown
Author

Patch updated - remove spurious end-of-line character in log output.

@mykaul

mykaul commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

@avikivity - ping for re-review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants