
Add minimal benchmark runner for comparing refs using ASV #128

Open

sanskarmit wants to merge 1 commit into sympy:master from sanskarmit:clean-benchmark-runner

Conversation

@sanskarmit

References to other Issues or PRs

NO

Brief description of what is fixed or changed

Adds a minimal ASV-based benchmark runner that compares performance between two refs and generates a structured JSON report. Unit tests are included.

AI Generation Disclosure

I used online resources and AI tools to help understand ASV usage patterns, structure parts of the implementation, and improve code readability. All of the code was reviewed, adapted, and tested by me to ensure accuracy and fit with the project requirements.

I am familiar with the implementation, and I can explain or modify any part of the code.

Copilot AI review requested due to automatic review settings March 30, 2026 18:43

Copilot AI left a comment


Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.



Comment thread: benchmark_runner.py, lines +69 to +73

```python
def rev_parse(ref: str) -> str:
    """Resolve a git ref to a commit hash."""
    return subprocess.run(
        ["git", "rev-parse", ref], check=True, capture_output=True, text=True
    ).stdout.strip()
```

Copilot AI Mar 30, 2026


rev_parse() runs git rev-parse in the current working directory, which resolves refs/commits for this repository, not the repository being benchmarked by ASV (configured via asv.conf.json as https://github.com/sympy/sympy.git). This will typically fail for defaults like origin/master (if this repo’s default branch is main) and, even if it succeeds, it will produce commit hashes that won’t match the ASV result filenames, leading to empty/incorrect reports. Consider removing this and instead (a) accept commit hashes as inputs, or (b) determine the commit hash from the ASV results produced in results_dir (e.g., by inspecting the created *-*.json filenames / metadata), or (c) run git rev-parse inside ASV’s checkout of the benchmarked repo.
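Option (b) could be sketched as a small helper that recovers the commit hash from the ASV result filenames themselves. This is only a sketch: the function name `commit_hash_from_results` is hypothetical, and it assumes the `<hash>-<env>.json` naming that `load_results()` already relies on.

```python
import re
from pathlib import Path
from typing import Optional


def commit_hash_from_results(results_dir: Path) -> Optional[str]:
    # Hypothetical helper: infer the benchmarked commit hash from ASV result
    # filenames, assumed here to look like '<hash>-<env>.json'. Non-matching
    # files (e.g. machine.json) are skipped.
    for result_file in sorted(results_dir.glob("**/*.json")):
        match = re.match(r"^([0-9a-f]{8,40})-", result_file.name)
        if match:
            return match.group(1)
    return None
```

This sidesteps the wrong-repository problem entirely, since the hash is read from what ASV actually produced rather than resolved from the runner's working directory.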

Comment thread: benchmark_runner.py, lines +46 to +56

```python
result_files = list(results_dir.glob(f"**/{commit_hash[:8]}-*.json")) + list(
    results_dir.glob(f"**/{commit_hash}-*.json")
)

merged: Dict[str, float] = {}
for result_file in result_files:
    data = json.loads(result_file.read_text())
    for benchmark_name, payload in data.get("results", {}).items():
        value = extract_numeric(payload.get("result") if isinstance(payload, dict) else payload)
        if value is not None:
            merged[benchmark_name] = value
```

Copilot AI Mar 30, 2026


load_results() merges results across all matching JSON files but iterates files in filesystem order and overwrites merged[benchmark_name] on collisions. If ASV produces multiple result files for the same commit (e.g., multiple envs from the matrix), this makes the output nondeterministic and can silently drop data. Consider sorting result_files and either grouping results by environment/file or erroring when the same benchmark appears with different values.

Suggested change

```python
result_files = list(results_dir.glob(f"**/{commit_hash[:8]}-*.json")) + list(
    results_dir.glob(f"**/{commit_hash}-*.json")
)

merged: Dict[str, float] = {}
for result_file in result_files:
    data = json.loads(result_file.read_text())
    for benchmark_name, payload in data.get("results", {}).items():
        value = extract_numeric(payload.get("result") if isinstance(payload, dict) else payload)
        if value is not None:
            merged[benchmark_name] = value
```

becomes:

```python
result_files = sorted(
    list(results_dir.glob(f"**/{commit_hash[:8]}-*.json"))
    + list(results_dir.glob(f"**/{commit_hash}-*.json")),
    key=lambda p: str(p),
)

merged: Dict[str, float] = {}
benchmark_sources: Dict[str, Path] = {}
for result_file in result_files:
    data = json.loads(result_file.read_text())
    for benchmark_name, payload in data.get("results", {}).items():
        value = extract_numeric(
            payload.get("result") if isinstance(payload, dict) else payload
        )
        if value is None:
            continue
        if benchmark_name in merged:
            if merged[benchmark_name] != value:
                raise ValueError(
                    f"Conflicting results for benchmark '{benchmark_name}': "
                    f"{merged[benchmark_name]} from {benchmark_sources[benchmark_name]} "
                    f"vs {value} from {result_file}"
                )
        else:
            merged[benchmark_name] = value
            benchmark_sources[benchmark_name] = result_file
```
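The same conflict-checking merge can be lifted into a standalone, testable helper. This is a sketch under hypothetical names (`merge_results` does not exist in the PR); `extract_numeric` is passed in as a callable so the example does not depend on the rest of the module.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional


def merge_results(
    result_files: Iterable[Path],
    extract_numeric: Callable[[object], Optional[float]],
) -> Dict[str, float]:
    # Hypothetical standalone version of the suggested change: sort files for
    # determinism and raise instead of silently overwriting on conflicts.
    merged: Dict[str, float] = {}
    sources: Dict[str, Path] = {}
    for result_file in sorted(result_files, key=str):
        data = json.loads(Path(result_file).read_text())
        for name, payload in data.get("results", {}).items():
            value = extract_numeric(
                payload.get("result") if isinstance(payload, dict) else payload
            )
            if value is None:
                continue
            if name in merged and merged[name] != value:
                raise ValueError(
                    f"Conflicting results for {name!r}: "
                    f"{merged[name]} from {sources[name]} vs {value} from {result_file}"
                )
            merged[name] = value
            sources[name] = result_file
    return merged
```

Factoring the loop out this way also makes the nondeterminism fixable in one place and trivially unit-testable with a couple of temp JSON files.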

Comment thread: benchmark_runner.py, lines +78 to +100

```python
    parser.add_argument("--base-ref", default="origin/master")
    parser.add_argument("--target-ref", default="HEAD")
    parser.add_argument("--config", default="asv.conf.json")
    parser.add_argument("--output", default="benchmark_results/runner_report.json")
    parser.add_argument("--machine", default="benchmark-runner")
    parser.add_argument("--bench", default=".*")
    return parser.parse_args()


def main() -> None:
    args = parse_args()
    output_path = Path(args.output)
    output_path.parent.mkdir(parents=True, exist_ok=True)

    base_hash = rev_parse(args.base_ref)
    target_hash = rev_parse(args.target_ref)

    base_results_dir = output_path.parent / "base_results"
    target_results_dir = output_path.parent / "target_results"
    if base_results_dir.exists():
        shutil.rmtree(base_results_dir)
    if target_results_dir.exists():
        shutil.rmtree(target_results_dir)
```

Copilot AI Mar 30, 2026


The default output location (benchmark_results/...) and the derived base_results/target_results directories are not covered by .gitignore (whereas results/ is). Running this script will likely leave many untracked files in the working tree. Consider defaulting output/results under an already-ignored directory (like results/) or updating the ignore patterns accordingly.
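One possible fix, assuming the repository's ignore file currently covers only `results/`, is a one-line addition (a hypothetical `.gitignore` fragment, since the default `base_results/` and `target_results/` directories both live under `benchmark_results/`):

```
# .gitignore (hypothetical addition)
benchmark_results/
```

Alternatively, the `--output` default could simply point under the already-ignored `results/` directory.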

Comment thread: benchmark_runner.py, lines +87 to +125

```python
def main() -> None:
    args = parse_args()
    output_path = Path(args.output)
    output_path.parent.mkdir(parents=True, exist_ok=True)

    base_hash = rev_parse(args.base_ref)
    target_hash = rev_parse(args.target_ref)

    base_results_dir = output_path.parent / "base_results"
    target_results_dir = output_path.parent / "target_results"
    if base_results_dir.exists():
        shutil.rmtree(base_results_dir)
    if target_results_dir.exists():
        shutil.rmtree(target_results_dir)

    run_asv(args.base_ref, args.config, base_results_dir, machine=args.machine, bench=args.bench)
    run_asv(args.target_ref, args.config, target_results_dir, machine=args.machine, bench=args.bench)

    report = {
        "metadata": {
            "base_ref": args.base_ref,
            "target_ref": args.target_ref,
            "bench": args.bench,
            "machine": args.machine,
        },
        "base": {
            "ref": args.base_ref,
            "commit": base_hash,
            "results": load_results(base_results_dir, base_hash),
        },
        "target": {
            "ref": args.target_ref,
            "commit": target_hash,
            "results": load_results(target_results_dir, target_hash),
        },
    }

    output_path.write_text(json.dumps(report, indent=2, sort_keys=True))
    print(f"Wrote {output_path}")
```

Copilot AI Mar 30, 2026


The CLI path in main() (arg parsing + running both refs + writing the report) is currently untested; the tests cover run_asv() command construction and result parsing only. Adding an integration-style unit test that monkeypatches subprocess.run and creates fake ASV result files would help prevent regressions (and would catch issues like resolving commit hashes in the wrong repo).
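A self-contained sketch of that approach: stub out `subprocess.run` and hand-write a fake ASV result file, then check that the resolved hash matches the filename pattern `load_results()` globs for. All names here are illustrative; a real test would import `benchmark_runner` and call its `main()` instead of the inline glob.

```python
import json
import subprocess
from pathlib import Path
from tempfile import TemporaryDirectory
from unittest import mock


def fake_run(cmd, **kwargs):
    # Pretend 'git rev-parse <ref>' resolves to a fixed hash and that
    # any other command (e.g. 'asv run ...') succeeds without doing work.
    if cmd[:2] == ["git", "rev-parse"]:
        return subprocess.CompletedProcess(cmd, 0, stdout="deadbeefcafe\n", stderr="")
    return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")


with TemporaryDirectory() as tmp, mock.patch("subprocess.run", side_effect=fake_run):
    results_dir = Path(tmp) / "base_results"
    results_dir.mkdir()
    # Fake ASV result file, named the way load_results() matches: <hash8>-<env>.json
    (results_dir / "deadbeef-env.json").write_text(
        json.dumps({"results": {"bench.TimeSuite.time_sort": 0.0123}})
    )
    commit = subprocess.run(
        ["git", "rev-parse", "HEAD"], check=True, capture_output=True, text=True
    ).stdout.strip()
    matches = list(results_dir.glob(f"**/{commit[:8]}-*.json"))

assert commit == "deadbeefcafe"
assert [p.name for p in matches] == ["deadbeef-env.json"]
```

Because `subprocess.run` is patched, a test like this exercises the hash-resolution and file-matching logic without needing git, ASV, or a network checkout.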
