cpu_profiler: add scheduler group #25394

Merged
merged 1 commit into redpanda-data:dev on Mar 19, 2025

Conversation

travisdowns
Member

cpu_profiler: add scheduler group

Add the scheduler group to each sample. This is recorded on the Seastar side
at the moment of the sample, and we now include it in the output.

This is useful for understanding what is running in which scheduler
group.
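
As a rough sketch of the recording side (the struct and field names here are assumptions, not the actual Redpanda code; seastar::current_scheduling_group() is the Seastar call that reports the group currently running on the reactor):

#include <seastar/core/scheduling.hh>

struct profiler_sample {
    // Hypothetical placeholder for the captured backtrace.
    unsigned long user_backtrace;
    // Scheduling group that was running when the sample was taken.
    seastar::scheduling_group sg;
};

profiler_sample take_sample(unsigned long backtrace) {
    // Record the group the reactor is executing right now, tying each
    // sampled stack to the group active at the moment of the sample.
    return {backtrace, seastar::current_scheduling_group()};
}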

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Improvements

  • The built-in CPU profiler now includes in its output the scheduler group that was active at the moment of each sample.

@travisdowns requested a review from a team as a code owner March 15, 2025 04:01
@travisdowns requested review from StephanDollberg and removed request for a team March 15, 2025 04:02
@travisdowns requested a review from ballard26 March 15, 2025 04:02
@vbotbuildovich
Collaborator

vbotbuildovich commented Mar 15, 2025

CI test results

test results on build#63191
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/63191#01959840-73b8-4796-9575-793c3e172469 FLAKY 1/2
rptest.tests.datalake.custom_partitioning_test.DatalakeCustomPartitioningTest.test_spec_evolution.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_HADOOP ducktape https://buildkite.com/redpanda/redpanda/builds/63191#01959840-73b7-4381-8345-9903024e185c FLAKY 1/2
rptest.tests.datalake.datalake_upgrade_test.DatalakeUpgradeTest.test_upload_through_upgrade.cloud_storage_type=CloudStorageType.S3.query_engine=QueryEngineType.SPARK ducktape https://buildkite.com/redpanda/redpanda/builds/63191#01959854-70bc-4cf3-b173-b91ffd0606f2 FLAKY 1/5
rptest.tests.full_disk_test.FullDiskReclaimTest.test_full_disk_triggers_gc ducktape https://buildkite.com/redpanda/redpanda/builds/63191#01959840-73b8-4796-9575-793c3e172469 FLAKY 1/2
rptest.tests.tiered_storage_pause_test.TestTieredStoragePause.test_safe_pause_resume.allow_gaps_topic_level=True.allow_gaps_cluster_level=False ducktape https://buildkite.com/redpanda/redpanda/builds/63191#01959854-70bb-4714-a93c-b17737b1f69e FLAKY 1/2
rptest.tests.upgrade_test.UpgradeWithWorkloadTest.test_rolling_upgrade_with_rollback.upgrade_after_rollback=True ducktape https://buildkite.com/redpanda/redpanda/builds/63191#01959840-73b7-4381-8345-9903024e185c FLAKY 1/2
test results on build#63208
test_id test_kind job_url test_status passed
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.preparing.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/63208#0195a249-1b02-4cb8-93c6-a2debaac4eb3 FLAKY 1/2
rptest.tests.scaling_up_test.ScalingUpTest.test_scaling_up_with_recovered_topic ducktape https://buildkite.com/redpanda/redpanda/builds/63208#0195a263-042f-4493-8eed-f908e66ff256 FLAKY 1/2
test_kafka_protocol_unit_rpunit.test_kafka_protocol_unit_rpunit unit https://buildkite.com/redpanda/redpanda/builds/63208#0195a208-9cb5-4b67-a80f-70bf1ff64a26 FLAKY 1/2
test results on build#63243
test_id test_kind job_url test_status passed
rptest.tests.availability_test.AvailabilityTests.test_recovery_after_catastrophic_failure ducktape https://buildkite.com/redpanda/redpanda/builds/63243#0195a581-8218-4999-aa9d-25b2a22d9677 FLAKY 1/2
rptest.tests.scaling_up_test.ScalingUpTest.test_scaling_up_with_recovered_topic ducktape https://buildkite.com/redpanda/redpanda/builds/63243#0195a59c-dae4-4e43-b0b3-d5d4021235dd FLAKY 1/2
rptest.tests.shadow_indexing_compacted_topic_test.ShadowIndexingCompactedTopicTest.test_upload.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/63243#0195a581-8219-4962-a5ce-d64275001455 FLAKY 1/2
rptest.tests.upgrade_test.UpgradeWithWorkloadTest.test_rolling_upgrade_with_rollback.upgrade_after_rollback=True ducktape https://buildkite.com/redpanda/redpanda/builds/63243#0195a581-8218-4999-aa9d-25b2a22d9677 FLAKY 1/2
test results on build#63347
test_id test_kind job_url test_status passed
kafka_server_rpfixture.kafka_server_rpfixture unit https://buildkite.com/redpanda/redpanda/builds/63347#0195aebc-1bdf-4d7e-8bc1-da01c6158d0c FLAKY 1/2
rptest.tests.archival_test.ArchivalTest.test_all_partitions_leadership_transfer.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/63347#0195af01-8fda-4c24-9476-634002395888 FLAKY 1/2
rptest.tests.consumer_group_test.ConsumerGroupTest.test_group_lag_metrics ducktape https://buildkite.com/redpanda/redpanda/builds/63347#0195af01-8fdb-4bc2-9b5e-e5bc75540266 FLAKY 1/3
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=.cancellation.dir.in.stage.preparing.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/63347#0195aefd-f266-4c6a-9d49-4171c9cd3093 FLAKY 1/3
rptest.tests.datalake.datalake_e2e_test.DatalakeE2ETests.test_topic_lifecycle.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_HADOOP ducktape https://buildkite.com/redpanda/redpanda/builds/63347#0195aefd-f264-444e-8ef2-b16e3b64395b FLAKY 1/2
rptest.tests.scaling_up_test.ScalingUpTest.test_scaling_up_with_recovered_topic ducktape https://buildkite.com/redpanda/redpanda/builds/63347#0195af01-8fda-4c24-9476-634002395888 FLAKY 1/2
rptest.tests.upgrade_test.UpgradeWithWorkloadTest.test_rolling_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/63347#0195aefd-f264-444e-8ef2-b16e3b64395b FLAKY 1/2

for (auto& result : results_buffer.samples) {
-    backtraces[result.user_backtrace]++;
+    ++backtraces[{result.user_backtrace, result.sg.name()}];
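
For context, a standalone sketch of the aggregation pattern above, using assumed stand-in types rather than the actual Redpanda definitions; identical (backtrace, scheduling-group) pairs collapse into a single counted entry:

#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the profiler's sample fields.
struct sample {
    std::string user_backtrace;
    std::string sg_name;
};

int main() {
    std::vector<sample> samples = {
        {"a;b;c", "main"}, {"a;b;c", "main"}, {"a;b;c", "fetch"}};

    // Mirrors ++backtraces[{result.user_backtrace, result.sg.name()}]:
    // identical (backtrace, group) pairs are folded into one counted entry.
    std::map<std::pair<std::string, std::string>, int> backtraces;
    for (const auto& s : samples) {
        ++backtraces[{s.user_backtrace, s.sg_name}];
    }
    // backtraces now holds ("a;b;c", "main") -> 2 and ("a;b;c", "fetch") -> 1.
}
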
StephanDollberg
Member

This will still allow us to later aggregate the scheduling group away on the pprof side right?

travisdowns
Member Author

Yeah, and I think that happens by default? I.e., it doesn't split by additional tags by default.

... and currently I don't do anything with this attribute in convert.py, so right now it should always be aggregated together (but I'd better double-check the pprof behavior here: that it's fine with two identical backtraces in its input and sums their occurrences).

StephanDollberg
Member

@ballard26 will probably know, but I guess it raises the question of why we do this sample aggregation in the first place.

travisdowns
Member Author

@StephanDollberg - yeah, I'm pretty sure pprof would work fine if every sample just had occurrences=1 and there were repeats. I guess the aggregation helps reduce the size of everything downstream of it, e.g., the JSON response, etc. Aggregation makes sense to me currently, though if we want to add more attributes it might stop working at some point, e.g., if we added a timestamp or serial number then every sample would be unique.
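
A small sketch (with made-up key types) of the concern above: once the aggregation key contains a per-sample unique field such as a serial number, every entry ends up with a count of 1 and the size saving from aggregation disappears:

#include <map>
#include <string>
#include <tuple>

int main() {
    // Same backtrace and scheduling group for every sample, but a unique
    // serial number is part of the key, so nothing aggregates.
    using key = std::tuple<std::string, std::string, unsigned>;
    std::map<key, int> backtraces;
    for (unsigned serial = 0; serial < 3; ++serial) {
        ++backtraces[key{"a;b;c", "main", serial}];
    }
    // backtraces.size() == 3, each entry with a count of 1, instead of a
    // single entry with a count of 3.
}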

ballard26
Contributor

PProf works fine with identical samples, AFAIK. I added aggregation for memory usage and JSON size savings.

ballard26
Contributor
Mar 16, 2025

I.e., currently we see PProf aggregate identical traces from different shards, as we'd expect, when the shard label isn't being set as the tag root.

travisdowns
Member Author

@StephanDollberg tested a bit and confirmed:

  • pprof seems fine with the same stack appearing multiple times in the input; in views like the flame graph it just sums the sample counts from identical stacks
  • the difference is still visible in some cases, e.g., the traces view in the pprof CLI shows unaggregated values, similar to the input (i.e., if you have two identical stacks with 5 samples each, they mostly behave the same as one stack with 10 samples, but in the traces output you do see 5/5)

None of the above really has anything to do with the scheduling group per se: it is a "label" and doesn't have any effect on the aggregation by default. If you use --tag_root then pprof uses the tag to create a synthetic root node, as you are probably aware.

Here's an example from localhost using --tag_root scheduling_group:

[screenshot: pprof output with --tag_root scheduling_group]

Note that I added an sg: prefix to the scheduling group name, since otherwise the main group gets conflated with the main() function, causing weirdness.
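
A minimal sketch of that prefixing (the helper name is made up and this is not the actual code path):

#include <string>

// Illustrative helper: prefix the scheduling-group name before it is emitted
// as a pprof label, so the group named "main" is not confused with the main()
// function frame when pprof builds a synthetic root via --tag_root.
std::string scheduling_group_label(const std::string& group_name) {
    return "sg:" + group_name; // e.g. "main" -> "sg:main"
}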

travisdowns
Member Author

Sorry, I hadn't refreshed and seen @ballard26's comments before posting my reply, but they amount to the same thing.

StephanDollberg
Member

Great. Btw, did you test whether --tag_root takes multiple args? I.e., can you do shard+sg?

travisdowns
Member Author

@StephanDollberg yes, it does; here's the result of pprof -tagroot thread_id,scheduling_group -http 127.0.0.1:10002 prof.proto:

[screenshot: pprof output with -tagroot thread_id,scheduling_group]

@travisdowns
Member Author

/ci-repeat 1

Add the scheduler group to each sample. This is recorded on the Seastar side
at the moment of the sample, and we now include it in the output.

This is useful for understanding what is running in which scheduler
group.

Add a new unit test case for this functionality.
@travisdowns
Member Author

Push 67a9d53 removes the commit shared with a PR that went in earlier.

StephanDollberg (Member) left a comment

Going to backport?

@travisdowns merged commit cb150c6 into redpanda-data:dev Mar 19, 2025
19 checks passed
@piyushredpanda modified the milestone: v25.1.1-rc3 Mar 20, 2025
@vbotbuildovich
Collaborator

Oops! Something went wrong.

Workflow run logs.

@travisdowns
Member Author

/backport v25.1.x
