cpu_profiler: add scheduler group #25394

Open · wants to merge 2 commits into base: dev

Conversation

travisdowns
Member

cpu_profiler: add scheduler group

Add scheduler group to each sample. This is recorded on the seastar side at
the moment of the sample, and we now include it in the output.

This is useful for understanding what is running in what scheduler
group.
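
For context, a minimal sketch of the idea. The `profiler_sample` and `take_sample` names are hypothetical stand-ins, not Redpanda's actual profiler types; `seastar::current_scheduling_group()` and `scheduling_group` are real Seastar APIs:

```cpp
// Hedged sketch only: hypothetical types illustrating "record the scheduling
// group at the moment of the sample"; not the PR's actual implementation.
#include <seastar/core/scheduling.hh>
#include <string>
#include <utility>

struct profiler_sample {
    // backtrace captured when the sampling interrupt fired (representation elided)
    std::string user_backtrace;
    // scheduling group that was running when the sample was taken
    seastar::scheduling_group sg;
};

profiler_sample take_sample(std::string backtrace) {
    return profiler_sample{
      .user_backtrace = std::move(backtrace),
      // recorded on the Seastar side at the moment of the sample
      .sg = seastar::current_scheduling_group(),
    };
}
```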

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Improvements

  • The built-in CPU profiler now includes in its output the scheduler group that was active at the moment of each sample.

@travisdowns travisdowns requested a review from a team as a code owner March 15, 2025 04:01
@travisdowns travisdowns requested review from StephanDollberg and removed request for a team March 15, 2025 04:02
@travisdowns travisdowns requested a review from ballard26 March 15, 2025 04:02
@vbotbuildovich
Collaborator

CI test results

test results on build#63191
| test_id | test_kind | job_url | test_status | passed |
| --- | --- | --- | --- | --- |
| rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade | ducktape | https://buildkite.com/redpanda/redpanda/builds/63191#01959840-73b8-4796-9575-793c3e172469 | FLAKY | 1/2 |
| rptest.tests.datalake.custom_partitioning_test.DatalakeCustomPartitioningTest.test_spec_evolution.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_HADOOP | ducktape | https://buildkite.com/redpanda/redpanda/builds/63191#01959840-73b7-4381-8345-9903024e185c | FLAKY | 1/2 |
| rptest.tests.datalake.datalake_upgrade_test.DatalakeUpgradeTest.test_upload_through_upgrade.cloud_storage_type=CloudStorageType.S3.query_engine=QueryEngineType.SPARK | ducktape | https://buildkite.com/redpanda/redpanda/builds/63191#01959854-70bc-4cf3-b173-b91ffd0606f2 | FLAKY | 1/5 |
| rptest.tests.full_disk_test.FullDiskReclaimTest.test_full_disk_triggers_gc | ducktape | https://buildkite.com/redpanda/redpanda/builds/63191#01959840-73b8-4796-9575-793c3e172469 | FLAKY | 1/2 |
| rptest.tests.tiered_storage_pause_test.TestTieredStoragePause.test_safe_pause_resume.allow_gaps_topic_level=True.allow_gaps_cluster_level=False | ducktape | https://buildkite.com/redpanda/redpanda/builds/63191#01959854-70bb-4714-a93c-b17737b1f69e | FLAKY | 1/2 |
| rptest.tests.upgrade_test.UpgradeWithWorkloadTest.test_rolling_upgrade_with_rollback.upgrade_after_rollback=True | ducktape | https://buildkite.com/redpanda/redpanda/builds/63191#01959840-73b7-4381-8345-9903024e185c | FLAKY | 1/2 |

```diff
 for (auto& result : results_buffer.samples) {
-    backtraces[result.user_backtrace]++;
+    ++backtraces[{result.user_backtrace, result.sg.name()}];
```
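
For reference, here is a self-contained sketch of the aggregation this one-line change expresses, using hypothetical stand-in types (`sample`, `aggregate`) rather than Redpanda's actual profiler code: identical (backtrace, scheduling-group name) pairs are folded into counts.

```cpp
// Hedged sketch: stand-in types only, not the real cpu_profiler code.
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct sample {
    std::string user_backtrace; // collapsed stack for this sample
    std::string sg_name;        // scheduling group active when it was taken
};

// Aggregate identical (backtrace, scheduling group) pairs into counts, which
// keeps the JSON/pprof input small while still letting pprof merge or split
// on the scheduling-group label later.
std::map<std::pair<std::string, std::string>, uint64_t>
aggregate(const std::vector<sample>& samples) {
    std::map<std::pair<std::string, std::string>, uint64_t> backtraces;
    for (const auto& s : samples) {
        ++backtraces[{s.user_backtrace, s.sg_name}];
    }
    return backtraces;
}
```
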
Member

This will still allow us to later aggregate the scheduling group away on the pprof side right?

Member Author

Yeah, and I think that happens by default? I.e., it doesn't split by additional tags by default.

... and currently I don't do anything with this attribute in convert.py, so right now it should always be aggregated together (but I'd better double check the pprof behavior here: that it's fine to get two identical backtraces in its input and sum their occurrences).

Member

@ballard26 will probably know but I guess it begs the question why we do this sample aggregation in the first place.

Member Author

@StephanDollberg - yeah, I'm pretty sure pprof would work fine if every sample just had occurrences=1 and there were repeats. I guess the aggregation helps reduce the size of everything downstream of it, e.g., the json response, etc. Aggregation makes sense to me currently, though if we want to add more attributes it might stop working at some point, e.g., if we added a timestamp or serial number then every sample would be unique.

Contributor

PProf works fine with identical samples AFAIK. I added aggregation for memory usage and json size savings.

Contributor (@ballard26, Mar 16, 2025)

I.e., currently we see PProf aggregate identical traces from different shards as we'd expect when the shard label isn't being set as the tag root.

Member Author

@StephanDollberg tested a bit and confirmed:

  • pprof seems fine with the same stack appearing multiple times in the input; in views like the flame graph it just sums the sample counts from any identical stacks
  • the difference is still visible in some cases, e.g., the pprof CLI traces view shows unaggregated values, similar to the input (i.e., if you have two identical stacks with 5 and 5 samples, they mostly behave the same as 1 stack with 10 samples, but in the traces output you do see 5/5)

None of the above really has anything to do with the scheduling group per se: it is a "label" and doesn't have any effect on the aggregation by default. If you use --tag_root then it uses the tag to create a synthetic root node, as you are probably aware.

Here's an example from localhost using --tag_root scheduling_group:

[screenshot: pprof flame graph with scheduling_group as the synthetic root]

Note that I added an sg: prefix to the scheduling group name, since otherwise the main group gets conflated with the main() function, causing weirdness.
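
A small sketch of the kind of prefixing described above; `scheduling_group_label` is a hypothetical helper, not the name used in the PR:

```cpp
// Hedged sketch: hypothetical helper illustrating the "sg:" prefix; the actual
// prefixing happens wherever the scheduling-group label is emitted.
#include <string>

std::string scheduling_group_label(const std::string& sg_name) {
    // "main" would otherwise collide with the main() function node when pprof
    // builds a synthetic root from the tag, so prefix every group name.
    return "sg:" + sg_name;
}
```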

Member Author

Sorry, I hadn't refreshed and seen @ballard26's comments before posting my reply, but they amount to the same thing.

Add the resource management tests to bazel.

In the storage packages, we needed to make batch_cache visible to this
package as there is a test for it here.

Fixes CORE-7586.
Fixes CORE-7587.
Fixes CORE-7588.
Fixes CORE-7589.
Fixes CORE-7590.
Add scheduler group to each sample. This is recorded on the seastar side at
the moment of the sample, and we now include it in the output.

This is useful for understanding what is running in what scheduler
group.

Add a new unit test case for this functionality.