cargo: Updating iai-callgrind to gungraun #1692

Draft
teofr wants to merge 3 commits into main from teofr/update-iai-callgrind

Conversation

teofr commented Apr 15, 2026
Contributor

Fixes #1690

Upgrades iai-callgrind to gungraun (its new name) and adapts to some breaking changes in the API.

NOTE:

Upgrading from iai-callgrind (v0.12.3) to gungraun (v0.18.1) changes how the proc macro formats benchmark IDs. The part after the bench case name now wraps the setup expression in parentheses:

Before: `comparison::cooldogs_group::slang_cooldogs test:tests :: setup :: setup(stringify! (cooldogs))`
After: `comparison::cooldogs_group::slang_cooldogs test:(tests :: setup :: setup(stringify! (cooldogs)))`

Solution

Before merging the PR, rename the existing benchmarks in Bencher Cloud using the `bencher benchmark update` CLI command. This rewrites the names in place, so that when gungraun starts reporting with the new ID format, Bencher matches them to the existing historical data. That preserves continuity across all 3 projects (slang-dashboard-cargo-slang, slang-dashboard-cargo-cmp, slang-dashboard-cargo-slang-v2).
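
As an illustration of the rename described above, here is a minimal sketch of how an old benchmark ID could be rewritten into the new format before passing it to the Bencher CLI. The function name `to_gungraun_id` is hypothetical, and the sketch assumes that the module path contains no spaces and that the bench ID contains no colon; it is not taken from the PR.

```rust
/// Rewrites a benchmark ID from the old iai-callgrind layout,
/// `<path> <bench_id>:<arguments>`, to the layout gungraun reports,
/// `<path> <bench_id>:(<arguments>)`.
///
/// Hypothetical helper, not part of the PR. Returns `None` when the
/// input does not have the expected shape.
fn to_gungraun_id(old: &str) -> Option<String> {
    // Assumption: the module path has no spaces, so the first space ends it.
    let (path, rest) = old.split_once(' ')?;
    // Assumption: the bench ID has no colon, so the first ':' ends it.
    let (bench_id, args) = rest.split_once(':')?;
    // Wrap the setup expression in parentheses, as gungraun does.
    Some(format!("{path} {bench_id}:({args})"))
}
```

Applied to the "Before" name above, this yields the "After" name; inputs without a space or a colon are rejected rather than guessed at.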

teofr requested review from a team as code owners April 15, 2026 12:27

changeset-bot Bot commented Apr 15, 2026

⚠️ No Changeset found

Latest commit: 32bdf49

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

teofr added the ci:perf label (runs performance test dry-runs in a PR, rather than the smoke-tests) Apr 15, 2026

socket-security Bot commented Apr 15, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
| --- | --- | --- | --- | --- | --- | --- |
| Added | cargo/gungraun@0.18.1 | 77 | 100 | 96 | 100 | 100 |

View full report

teofr marked this pull request as draft April 15, 2026 12:31

teofr commented Apr 15, 2026
Contributor Author

Let's wait 2 days for the cooldown check; I don't think this is urgent.

github-actions Bot commented Apr 15, 2026
Contributor

🐰 Bencher Report

Branch: teofr/update-iai-callgrind
Testbed: ci

⚠️ WARNING: Truncated view!

The full continuous benchmarking report exceeds the maximum length allowed on this platform.

🚨 3 Alerts

🐰 View full continuous benchmarking report in Bencher

github-actions Bot commented Apr 15, 2026
Contributor

🐰 Bencher Report

Branch: teofr/update-iai-callgrind
Testbed: ci

⚠️ WARNING: Truncated view!

The full continuous benchmarking report exceeds the maximum length allowed on this platform.

⚠️ WARNING: No Threshold found!

Without a Threshold, no Alerts will ever be generated.

🐰 View full continuous benchmarking report in Bencher

teofr force-pushed the teofr/update-iai-callgrind branch from 428b470 to 24957a9 on April 20, 2026 18:23
teofr marked this pull request as ready for review April 20, 2026 18:24

nebasuke left a comment

Member

Thanks Teo, looks good!

teofr force-pushed the teofr/update-iai-callgrind branch from 24957a9 to b2d21ac on April 30, 2026 14:15
teofr force-pushed the teofr/update-iai-callgrind branch from b2d21ac to 430fdf7 on May 6, 2026 12:07

ggiraldez left a comment

Contributor

🚀

teofr force-pushed the teofr/update-iai-callgrind branch from 430fdf7 to 7bd5b5b on May 11, 2026 09:36
teofr added a commit that referenced this pull request May 13, 2026
…rks names (#1750)

This PR changes how each `gungraun` benchmark is named:

`gungraun` names benchmarks using the following template
`{file}::{group}::{function} <bench_id>:(<arguments>)`.

Before this change we were using:
- each `group` for a different project (like `weighted_pool`);
- each `function` for a different phase, while also repeating the
project name (since function names have to be unique);
- `test` as the `<bench_id>`, always;
- and `arguments`, which is usually ignored (since it's also truncated).

This change makes them more uniform and less repetitive:
- `{file}` is the bench binary: `slang`, `slang_v2`, or `comparison`
(same as before);
- `{group}` is `pipeline` for `slang` and `slang_v2`, or `parsers` for
`comparison` (it carries no real information today, but we may want to
add other comparison points, or end-to-end use cases for slang(_v2));
- `{function}` is the pipeline stage (`parser`, `cursor`, `query`,
`bindings_build`, …) for `slang` / `slang_v2`, or the parser
implementation (`slang`, `slang_v2`, `solar`, `tree_sitter`) for
`comparison`;
- `<bench_id>` is the `<project>`, one of the entries in
[`projects.json`](./projects.json), attached as a
`#[bench::<project>("<project>")]` attribute on the function.

This makes benchmark names go from:
`slang::weighted_pool_full::weighted_pool_parser test:(...)`

to:
`slang::pipeline::parser weighted_pool:(...)`

This also simplifies how some of the benchmarks express which projects
they can run.
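
The mapping from the old naming scheme to the new one can be sketched roughly as follows. This is an illustrative helper, not the migration code from the PR: the name `migrate_name` is hypothetical, and the group suffixes (`_full`, `_full_v2`, `_group`) are assumptions inferred from the old benchmark names listed elsewhere in this conversation. The trailing `:(<arguments>)` part is left out.

```rust
/// Maps an old benchmark `(file, group, function)` triple to its new name,
/// `{file}::{group}::{function} <project>`.
///
/// Hypothetical sketch. Returns `None` when the old name does not match
/// the assumed shapes.
fn migrate_name(file: &str, old_group: &str, old_function: &str) -> Option<String> {
    // Assumption: old groups were `<project>` plus one of these suffixes.
    let project = ["_full_v2", "_full", "_group"]
        .iter()
        .find_map(|suffix| old_group.strip_suffix(*suffix))?;

    let (group, function) = if file == "comparison" {
        // Old comparison functions looked like `<parser>_<project>`.
        ("parsers", old_function.strip_suffix(project)?.strip_suffix('_')?)
    } else {
        // Old pipeline functions looked like `<project>_<stage>`.
        ("pipeline", old_function.strip_prefix(project)?.strip_prefix('_')?)
    };

    // The project becomes the bench ID in the new scheme.
    Some(format!("{file}::{group}::{function} {project}"))
}
```

For the example above, `migrate_name("slang", "weighted_pool_full", "weighted_pool_parser")` produces `slang::pipeline::parser weighted_pool`. Returning `None` for unrecognized shapes keeps any unexpected name visible instead of silently renaming it.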

-----
Note: This is a big change since it requires either losing the history
within bencher, or migrating the old benchmark names. But since we already
have to do that because of #1692, it's a good time to improve on it.


Also:

This PR adds another step to `infra perf cargo --pr-benchmark` that
compares the benchmark names in the current code with the ones present
in bencher. If there are any differences, it reports the new and
orphan benchmarks.

Given how `gungraun` names benchmarks, how we can recover them locally
without actually executing them, and how bencher stores them, we may see
some false negatives in the future. If this best-effort approach becomes
too cumbersome, we could revert it.
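
The comparison step described above amounts to two set differences. The following is a minimal sketch under that assumption; the type `BenchListDiff` and the function `diff_bench_lists` are hypothetical names, and how the local and remote name lists are obtained is out of scope here.

```rust
use std::collections::BTreeSet;

/// Result of comparing benchmark names found in code with those on bencher.
/// Hypothetical type, not taken from the PR.
#[derive(Debug, PartialEq)]
struct BenchListDiff {
    /// Present in code but not yet recorded on bencher.
    new: Vec<String>,
    /// Recorded on bencher but no longer present in code.
    orphan: Vec<String>,
}

/// Computes the new and orphan benchmarks as two set differences.
/// Both result lists come out sorted, because `BTreeSet` iterates in order.
fn diff_bench_lists(local: &[&str], remote: &[&str]) -> BenchListDiff {
    let local: BTreeSet<&str> = local.iter().copied().collect();
    let remote: BTreeSet<&str> = remote.iter().copied().collect();
    BenchListDiff {
        new: local.difference(&remote).map(|s| (*s).to_owned()).collect(),
        orphan: remote.difference(&local).map(|s| (*s).to_owned()).collect(),
    }
}
```

A rename shows up as one entry in each list, since exact string comparison cannot tell a renamed benchmark from a removed one plus an added one.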
teofr force-pushed the teofr/update-iai-callgrind branch from 4d0f601 to 8d932d2 on May 13, 2026 07:26

github-actions Bot commented May 13, 2026
Contributor

Bench list diff for slang_v2 (project slang-dashboard-cargo-slang-v2)

Orphan benchmarks (36) — recorded on bencher but no longer in code. Either renamed in this PR or candidates for archival.
  • slang_v2::cooldogs_full_v2::cooldogs_compute_contracts_abi::test
  • slang_v2::cooldogs_full_v2::cooldogs_ir_builder::test
  • slang_v2::cooldogs_full_v2::cooldogs_parser::test
  • slang_v2::cooldogs_full_v2::cooldogs_semantic::test
  • slang_v2::create_x_full_v2::create_x_compute_contracts_abi::test
  • slang_v2::create_x_full_v2::create_x_ir_builder::test
  • slang_v2::create_x_full_v2::create_x_parser::test
  • slang_v2::create_x_full_v2::create_x_semantic::test
  • slang_v2::merkle_proof_full::merkle_proof_ir_builder::test
  • slang_v2::merkle_proof_full::merkle_proof_parser::test
  • slang_v2::merkle_proof_full_v2::merkle_proof_compute_contracts_abi::test
  • slang_v2::merkle_proof_full_v2::merkle_proof_ir_builder::test
  • slang_v2::merkle_proof_full_v2::merkle_proof_parser::test
  • slang_v2::merkle_proof_full_v2::merkle_proof_semantic::test
  • slang_v2::multicall3_full_v2::multicall3_compute_contracts_abi::test
  • slang_v2::multicall3_full_v2::multicall3_ir_builder::test
  • slang_v2::multicall3_full_v2::multicall3_parser::test
  • slang_v2::multicall3_full_v2::multicall3_semantic::test
  • slang_v2::one_step_leverage_f_full_v2::one_step_leverage_f_compute_contracts_abi::test
  • slang_v2::one_step_leverage_f_full_v2::one_step_leverage_f_ir_builder::test
  • slang_v2::one_step_leverage_f_full_v2::one_step_leverage_f_parser::test
  • slang_v2::one_step_leverage_f_full_v2::one_step_leverage_f_semantic::test
  • slang_v2::pointer_libraries_full_v2::pointer_libraries_compute_contracts_abi::test
  • slang_v2::pointer_libraries_full_v2::pointer_libraries_ir_builder::test
  • slang_v2::pointer_libraries_full_v2::pointer_libraries_parser::test
  • slang_v2::pointer_libraries_full_v2::pointer_libraries_semantic::test
  • slang_v2::ui_pool_data_provider_v3_full::ui_pool_data_provider_v3_ir_builder::test
  • slang_v2::ui_pool_data_provider_v3_full::ui_pool_data_provider_v3_parser::test
  • slang_v2::ui_pool_data_provider_v3_full_v2::ui_pool_data_provider_v3_compute_contracts_abi::test
  • slang_v2::ui_pool_data_provider_v3_full_v2::ui_pool_data_provider_v3_ir_builder::test
  • slang_v2::ui_pool_data_provider_v3_full_v2::ui_pool_data_provider_v3_parser::test
  • slang_v2::ui_pool_data_provider_v3_full_v2::ui_pool_data_provider_v3_semantic::test
  • slang_v2::uniswap_full_v2::uniswap_compute_contracts_abi::test
  • slang_v2::uniswap_full_v2::uniswap_ir_builder::test
  • slang_v2::uniswap_full_v2::uniswap_parser::test
  • slang_v2::uniswap_full_v2::uniswap_semantic::test

github-actions Bot commented May 13, 2026
Contributor

Bench list diff for comparison (project slang-dashboard-cargo-cmp)

Orphan benchmarks (34) — recorded on bencher but no longer in code. Either renamed in this PR or candidates for archival.
  • comparison::cooldogs_group::slang_cooldogs::test
  • comparison::cooldogs_group::slang_v2_cooldogs::test
  • comparison::cooldogs_group::solar_cooldogs::test
  • comparison::cooldogs_group::tree_sitter_cooldogs::test
  • comparison::create_x_group::slang_create_x::test
  • comparison::create_x_group::slang_v2_create_x::test
  • comparison::create_x_group::solar_create_x::test
  • comparison::merkle_proof_group::slang_merkle_proof::test
  • comparison::merkle_proof_group::slang_v2_merkle_proof::test
  • comparison::merkle_proof_group::solar_merkle_proof::test
  • comparison::merkle_proof_group::tree_sitter_merkle_proof::test
  • comparison::mooniswap_group::slang_mooniswap::test
  • comparison::mooniswap_group::tree_sitter_mooniswap::test
  • comparison::multicall3_group::slang_multicall3::test
  • comparison::multicall3_group::slang_v2_multicall3::test
  • comparison::multicall3_group::solar_multicall3::test
  • comparison::multicall3_group::tree_sitter_multicall3::test
  • comparison::one_step_leverage_f_group::slang_one_step_leverage_f::test
  • comparison::one_step_leverage_f_group::slang_v2_one_step_leverage_f::test
  • comparison::one_step_leverage_f_group::solar_one_step_leverage_f::test
  • comparison::one_step_leverage_f_group::tree_sitter_one_step_leverage_f::test
  • comparison::pointer_libraries_group::slang_pointer_libraries::test
  • comparison::pointer_libraries_group::slang_v2_pointer_libraries::test
  • comparison::pointer_libraries_group::solar_pointer_libraries::test
  • comparison::ui_pool_data_provider_v3_group::slang_ui_pool_data_provider_v3::test
  • comparison::ui_pool_data_provider_v3_group::slang_v2_ui_pool_data_provider_v3::test
  • comparison::ui_pool_data_provider_v3_group::solar_ui_pool_data_provider_v3::test
  • comparison::ui_pool_data_provider_v3_group::tree_sitter_ui_pool_data_provider_v3::test
  • comparison::uniswap_group::slang_uniswap::test
  • comparison::uniswap_group::slang_v2_uniswap::test
  • comparison::uniswap_group::solar_uniswap::test
  • comparison::weighted_pool_group::slang_weighted_pool::test
  • comparison::weighted_pool_group::solar_weighted_pool::test
  • comparison::weighted_pool_group::tree_sitter_weighted_pool::test

github-actions Bot commented May 13, 2026
Contributor

Bench list diff for slang (project slang-dashboard-cargo-slang)

Orphan benchmarks (16) — recorded on bencher but no longer in code. Either renamed in this PR or candidates for archival.
  • slang::merkle_proof_full::merkle_proof_binder_v2_cleanup::test
  • slang::merkle_proof_full::merkle_proof_binder_v2_run::test
  • slang::merkle_proof_full::merkle_proof_bindings_build::test
  • slang::merkle_proof_full::merkle_proof_bindings_resolve::test
  • slang::merkle_proof_full::merkle_proof_cleanup::test
  • slang::merkle_proof_full::merkle_proof_cursor::test
  • slang::merkle_proof_full::merkle_proof_parser::test
  • slang::merkle_proof_full::merkle_proof_query::test
  • slang::weighted_pool_full::weighted_pool_binder_v2_cleanup::test
  • slang::weighted_pool_full::weighted_pool_binder_v2_run::test
  • slang::weighted_pool_full::weighted_pool_bindings_build::test
  • slang::weighted_pool_full::weighted_pool_bindings_resolve::test
  • slang::weighted_pool_full::weighted_pool_cleanup::test
  • slang::weighted_pool_full::weighted_pool_cursor::test
  • slang::weighted_pool_full::weighted_pool_parser::test
  • slang::weighted_pool_full::weighted_pool_query::test

teofr added a commit that referenced this pull request May 14, 2026
…rks names (#1750)

teofr force-pushed the teofr/update-iai-callgrind branch 2 times, most recently from 1faae5f to cb54152 on May 14, 2026 14:22
teofr marked this pull request as draft May 14, 2026 16:24

teofr commented May 14, 2026
Contributor Author

Converting to draft until we understand what changed with the DHAT measurements.

teofr force-pushed the teofr/update-iai-callgrind branch from cb54152 to 9718ba4 on May 14, 2026 19:13
teofr force-pushed the teofr/update-iai-callgrind branch from 9718ba4 to e0a8a2d on May 20, 2026 08:25
teofr force-pushed the teofr/update-iai-callgrind branch 2 times, most recently from 997ad18 to 21e96bf on May 20, 2026 11:42
teofr force-pushed the teofr/update-iai-callgrind branch from 21e96bf to 32bdf49 on May 21, 2026 08:26
Labels

ci:perf Runs performance test dry-runs in a PR (rather than the smoke-tests)

Development

Successfully merging this pull request may close these issues.

Migrate perf benchmarks from iai-callgrind to gungraun

3 participants