Add minimal benchmark runner for comparing refs using ASV #128
sanskarmit wants to merge 1 commit into sympy:master
Conversation
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
```python
def rev_parse(ref: str) -> str:
    """Resolve a git ref to a commit hash."""
    return subprocess.run(
        ["git", "rev-parse", ref], check=True, capture_output=True, text=True
    ).stdout.strip()
```
rev_parse() runs git rev-parse in the current working directory, which resolves refs/commits for this repository, not the repository being benchmarked by ASV (configured via asv.conf.json as https://github.com/sympy/sympy.git). This will typically fail for defaults like origin/master (if this repo’s default branch is main) and, even if it succeeds, it will produce commit hashes that won’t match the ASV result filenames, leading to empty/incorrect reports. Consider removing this and instead (a) accept commit hashes as inputs, or (b) determine the commit hash from the ASV results produced in results_dir (e.g., by inspecting the created *-*.json filenames / metadata), or (c) run git rev-parse inside ASV’s checkout of the benchmarked repo.
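A minimal sketch of option (b) from the comment above: recover the commit hash from the ASV result files themselves rather than running `git rev-parse` in the wrong repository. The helper name `commit_hash_from_results` is hypothetical, and it relies on ASV result files carrying a top-level `commit_hash` field (an assumption about ASV's result format; verify against the ASV version in use).

```python
import json
from pathlib import Path
from typing import Optional


def commit_hash_from_results(results_dir: Path) -> Optional[str]:
    """Return the commit hash recorded in the first ASV result file found.

    ASV names result files "<hash-prefix>-<env-name>.json"; the full commit
    hash is assumed to be stored in the file's "commit_hash" field.
    """
    for result_file in sorted(results_dir.glob("**/*-*.json")):
        try:
            data = json.loads(result_file.read_text())
        except json.JSONDecodeError:
            continue  # skip non-result JSON such as partial writes
        commit = data.get("commit_hash")
        if commit:
            return commit
    return None
```

This keeps the runner independent of the current working directory's git state, since the hash comes from whatever repository ASV actually benchmarked.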
```python
result_files = list(results_dir.glob(f"**/{commit_hash[:8]}-*.json")) + list(
    results_dir.glob(f"**/{commit_hash}-*.json")
)

merged: Dict[str, float] = {}
for result_file in result_files:
    data = json.loads(result_file.read_text())
    for benchmark_name, payload in data.get("results", {}).items():
        value = extract_numeric(payload.get("result") if isinstance(payload, dict) else payload)
        if value is not None:
            merged[benchmark_name] = value
```
load_results() merges results across all matching JSON files but iterates files in filesystem order and overwrites merged[benchmark_name] on collisions. If ASV produces multiple result files for the same commit (e.g., multiple envs from the matrix), this makes the output nondeterministic and can silently drop data. Consider sorting result_files and either grouping results by environment/file or erroring when the same benchmark appears with different values.
Suggested change:

```diff
-result_files = list(results_dir.glob(f"**/{commit_hash[:8]}-*.json")) + list(
-    results_dir.glob(f"**/{commit_hash}-*.json")
-)
-merged: Dict[str, float] = {}
-for result_file in result_files:
-    data = json.loads(result_file.read_text())
-    for benchmark_name, payload in data.get("results", {}).items():
-        value = extract_numeric(payload.get("result") if isinstance(payload, dict) else payload)
-        if value is not None:
-            merged[benchmark_name] = value
+result_files = sorted(
+    list(results_dir.glob(f"**/{commit_hash[:8]}-*.json"))
+    + list(results_dir.glob(f"**/{commit_hash}-*.json")),
+    key=lambda p: str(p),
+)
+merged: Dict[str, float] = {}
+benchmark_sources: Dict[str, Path] = {}
+for result_file in result_files:
+    data = json.loads(result_file.read_text())
+    for benchmark_name, payload in data.get("results", {}).items():
+        value = extract_numeric(
+            payload.get("result") if isinstance(payload, dict) else payload
+        )
+        if value is None:
+            continue
+        if benchmark_name in merged:
+            if merged[benchmark_name] != value:
+                raise ValueError(
+                    f"Conflicting results for benchmark '{benchmark_name}': "
+                    f"{merged[benchmark_name]} from {benchmark_sources[benchmark_name]} "
+                    f"vs {value} from {result_file}"
+                )
+        else:
+            merged[benchmark_name] = value
+            benchmark_sources[benchmark_name] = result_file
```
```python
    parser.add_argument("--base-ref", default="origin/master")
    parser.add_argument("--target-ref", default="HEAD")
    parser.add_argument("--config", default="asv.conf.json")
    parser.add_argument("--output", default="benchmark_results/runner_report.json")
    parser.add_argument("--machine", default="benchmark-runner")
    parser.add_argument("--bench", default=".*")
    return parser.parse_args()


def main() -> None:
    args = parse_args()
    output_path = Path(args.output)
    output_path.parent.mkdir(parents=True, exist_ok=True)

    base_hash = rev_parse(args.base_ref)
    target_hash = rev_parse(args.target_ref)

    base_results_dir = output_path.parent / "base_results"
    target_results_dir = output_path.parent / "target_results"
    if base_results_dir.exists():
        shutil.rmtree(base_results_dir)
    if target_results_dir.exists():
        shutil.rmtree(target_results_dir)
```
The default output location (benchmark_results/...) and the derived base_results/target_results directories are not covered by .gitignore (whereas results/ is). Running this script will likely leave many untracked files in the working tree. Consider defaulting output/results under an already-ignored directory (like results/) or updating the ignore patterns accordingly.
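One possible fix, assuming the repository uses a plain `.gitignore` with a `results/` entry as described: add the runner's default output directory (the directory name is taken from the script's `--output` default) to the ignore patterns.

```gitignore
# Output of the benchmark runner (default --output location),
# including the derived base_results/ and target_results/ directories.
benchmark_results/
```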
```python
def main() -> None:
    args = parse_args()
    output_path = Path(args.output)
    output_path.parent.mkdir(parents=True, exist_ok=True)

    base_hash = rev_parse(args.base_ref)
    target_hash = rev_parse(args.target_ref)

    base_results_dir = output_path.parent / "base_results"
    target_results_dir = output_path.parent / "target_results"
    if base_results_dir.exists():
        shutil.rmtree(base_results_dir)
    if target_results_dir.exists():
        shutil.rmtree(target_results_dir)

    run_asv(args.base_ref, args.config, base_results_dir, machine=args.machine, bench=args.bench)
    run_asv(args.target_ref, args.config, target_results_dir, machine=args.machine, bench=args.bench)

    report = {
        "metadata": {
            "base_ref": args.base_ref,
            "target_ref": args.target_ref,
            "bench": args.bench,
            "machine": args.machine,
        },
        "base": {
            "ref": args.base_ref,
            "commit": base_hash,
            "results": load_results(base_results_dir, base_hash),
        },
        "target": {
            "ref": args.target_ref,
            "commit": target_hash,
            "results": load_results(target_results_dir, target_hash),
        },
    }

    output_path.write_text(json.dumps(report, indent=2, sort_keys=True))
    print(f"Wrote {output_path}")
```
The CLI path in main() (arg parsing + running both refs + writing the report) is currently untested; the tests cover run_asv() command construction and result parsing only. Adding an integration-style unit test that monkeypatches subprocess.run and creates fake ASV result files would help prevent regressions (and would catch issues like resolving commit hashes in the wrong repo).
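A minimal sketch of the scaffolding such a test could use. The `load_results` below is a simplified stand-in mirroring the PR's logic (in a real test you would import `load_results` and `main` from the runner module, and use pytest's `tmp_path`/`monkeypatch` fixtures instead of the manual setup shown here); the result filename pattern `<hash-prefix>-<env>.json` is an assumption about ASV's naming.

```python
import json
import subprocess
import tempfile
from pathlib import Path
from typing import Dict
from unittest import mock


def load_results(results_dir: Path, commit_hash: str) -> Dict[str, float]:
    """Stand-in for the PR's load_results: merge numeric results by name."""
    merged: Dict[str, float] = {}
    for result_file in sorted(results_dir.glob(f"**/{commit_hash[:8]}-*.json")):
        data = json.loads(result_file.read_text())
        for name, payload in data.get("results", {}).items():
            value = payload.get("result") if isinstance(payload, dict) else payload
            if isinstance(value, (int, float)):
                merged[name] = float(value)
    return merged


with tempfile.TemporaryDirectory() as tmp:
    results_dir = Path(tmp) / "base_results"
    results_dir.mkdir(parents=True)
    commit = "deadbeefcafebabe"

    # Fake ASV result file, named <hash-prefix>-<env>.json (assumed pattern).
    (results_dir / f"{commit[:8]}-py3.11.json").write_text(
        json.dumps({"results": {"bench.TimeSuite.time_foo": 0.0123}})
    )

    # Patch subprocess.run so `git rev-parse` "resolves" without touching git.
    fake = subprocess.CompletedProcess(
        ["git", "rev-parse", "HEAD"], 0, stdout=commit + "\n", stderr=""
    )
    with mock.patch.object(subprocess, "run", return_value=fake):
        resolved = subprocess.run(
            ["git", "rev-parse", "HEAD"], check=True, capture_output=True, text=True
        ).stdout.strip()

    assert resolved == commit
    assert load_results(results_dir, commit) == {"bench.TimeSuite.time_foo": 0.0123}
    print("fake plumbing works")
```

With this plumbing in place, a test that patches `subprocess.run`, pre-creates the fake result files, and then calls the module's `main()` would exercise the full CLI path end to end.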
References to other Issues or PRs
No.
Brief description of what is fixed or changed
Adds a minimal ASV-based benchmark runner that compares performance between two refs and generates a structured JSON report. Unit tests are included.
AI Generation Disclosure
I used online resources and AI tools to assist in understanding ASV usage patterns, structuring parts of the implementation, and improving code readability. All of the code was reviewed, adapted, and tested by me to ensure accuracy and suitability for the project's requirements.
I am familiar with the implementation, and I can explain or modify any part of the code.