
Support reporting statistics in spark datasource #8057

Open
robert3005 wants to merge 1 commit into develop from rk/sparkstats


@robert3005
Contributor

Spark mostly focuses on sizeInBytes, which we populate from the file sizes with
scaling applied. We also report numRows, since that is already available in our datasource.

Signed-off-by: Robert Kruszewski <github@robertk.io>
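
For readers unfamiliar with the API involved: in Spark's DataSource V2 interface, a scan reports estimates through `SupportsReportStatistics.estimateStatistics()`, which returns a `Statistics` object exposing `sizeInBytes()` and `numRows()` as `OptionalLong`. The following is a minimal sketch of the approach described above, not the code in this PR. It uses a local stand-in interface so that it runs without Spark on the classpath. The `FileInfo` class, the `estimate` helper, and the scaling factor of 3.0 are all hypothetical; the PR description does not say what scaling is applied.

```java
import java.util.List;
import java.util.OptionalLong;

public class ScanStatisticsSketch {
    /** Local stand-in mirroring the two accessors of Spark's Statistics interface. */
    public interface Statistics {
        OptionalLong sizeInBytes();
        OptionalLong numRows();
    }

    /** Hypothetical per-file metadata; the names are illustrative only. */
    public static final class FileInfo {
        final long onDiskBytes;
        final long rowCount;

        public FileInfo(long onDiskBytes, long rowCount) {
            this.onDiskBytes = onDiskBytes;
            this.rowCount = rowCount;
        }
    }

    /** Assumed scaling factor from on-disk size to in-memory size (not from the PR). */
    public static final double ASSUMED_SCALE = 3.0;

    /** Sums file sizes and row counts, then scales the byte total. */
    public static Statistics estimate(List<FileInfo> files, double scale) {
        long diskBytes = 0;
        long rows = 0;
        for (FileInfo f : files) {
            // addExact throws on overflow instead of silently wrapping.
            diskBytes = Math.addExact(diskBytes, f.onDiskBytes);
            rows = Math.addExact(rows, f.rowCount);
        }
        final long scaledBytes = (long) Math.ceil(diskBytes * scale);
        final long totalRows = rows;
        return new Statistics() {
            @Override
            public OptionalLong sizeInBytes() {
                return OptionalLong.of(scaledBytes);
            }

            @Override
            public OptionalLong numRows() {
                return OptionalLong.of(totalRows);
            }
        };
    }

    public static void main(String[] args) {
        Statistics stats = estimate(
                List.of(new FileInfo(1_000, 10), new FileInfo(2_000, 20)),
                ASSUMED_SCALE);
        System.out.println("sizeInBytes = " + stats.sizeInBytes().getAsLong());
        System.out.println("numRows = " + stats.numRows().getAsLong());
    }
}
```

In a real datasource the anonymous class would implement `org.apache.spark.sql.connector.read.Statistics`, and the scan would implement `SupportsReportStatistics`. Returning `OptionalLong.empty()` for either value tells Spark that no estimate is available.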
@codspeed-hq

codspeed-hq Bot commented May 22, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 2 improved benchmarks
❌ 1 regressed benchmark
✅ 1234 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | fast_lt_out_of_range[4, 65536] | 204.3 µs | 262.3 µs | -22.09% |
| Simulation | baseline_eq[16, 65536] | 287.6 µs | 259.6 µs | +10.78% |
| Simulation | baseline_lt[16, 65536] | 302.7 µs | 274.7 µs | +10.19% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing rk/sparkstats (41c0d4f) with develop (012d0ec)
