Add option to use latest directory and add run printing#1586
Add option to use latest directory and add run printing#1586c-hagem wants to merge 4 commits intoawslabs:mainfrom
Conversation
12e26af to
b9172e4
Compare
| def find_multirun_dir(index: int = 0) -> str: | ||
| """Find the Nth latest directory in multirun (0=most recent, 1=previous, etc.)""" | ||
| if not Path('multirun').exists(): | ||
| warnings.warn("multirun directory not found") |
There was a problem hiding this comment.
Raise an exception here?
| if not sorted_subdirs: | ||
| warnings.warn("No experiment directories found in multirun") | ||
| sys.exit(1) | ||
| return sorted_subdirs[index][1] |
There was a problem hiding this comment.
Above there are warnings, but if the index is out of range, this can raise an exception without a handy description.
There was a problem hiding this comment.
This will be caught as an IndexError, since we currently only use index 0 i think.
|
|
||
| parser.add_argument('--csv-output', help='Optional CSV file to write the results to') | ||
| parser.add_argument( | ||
| '--runs', choices=['tri', 'all'], help='Show run numbers in results (tri=min/median/max, all=all runs)' |
There was a problem hiding this comment.
I've also not heard of it. Wondering if there's a slightly less technical term which also works here
|
|
||
| results_rows = [] | ||
| for config_key, throughput_data in grouped_results.items(): | ||
| throughputs = [t for t, _ in throughput_data] |
There was a problem hiding this comment.
Nit: throughputs, run_numbers = zip(*throughput_data)
| if args.runs == "tri": | ||
| # Find min, max, and median run numbers based on throughput | ||
| sorted_by_throughput = sorted(zip(throughputs, run_numbers)) | ||
| min_run = sorted_by_throughput[0][1] |
There was a problem hiding this comment.
Unclear why we're zipping after just unzipping above
|
|
||
| row.append(",".join(unique_runs)) | ||
| else: | ||
| sorted_by_throughput = sorted(zip(throughputs, run_numbers), reverse=True) |
There was a problem hiding this comment.
Why are we reverse sorting here but not above?
There was a problem hiding this comment.
Above we pick min, p50 and max , I guess we could use reverse sorting in both
There was a problem hiding this comment.
Adjusted to reverse sort once
|
|
||
| selected_runs = [max_run, median_run, min_run] | ||
| # Remove duplicates while preserving order | ||
| unique_runs = [] |
There was a problem hiding this comment.
How large is this list realistically getting? This approach is O(n^2)
There was a problem hiding this comment.
The list only has.3 elements (i.e. selected_runs).
There was a problem hiding this comment.
Fixed to faster method
Adds two features to the autogroup.py script, namely - the possibility to print either the numbers of all runs of a category (ordered descendingly by throughput), when specifying `--runs=all`, or just to print max, median and min run numbers `--runs=tri` - makes default order of throughputs consistent with sorting order if runs are specified - adds possibility to use print the latest run (by default latest is inferred from the run number, but can also be switched to modification time). Additionally, with `--latest=K` the k-th latest run acording to the specified order is picked. Signed-off-by: Christian Hagemeier <chagem@amazon.com>
Signed-off-by: Christian Hagemeier <chagem@amazon.com>
Signed-off-by: Christian Hagemeier <chagem@amazon.com>
Signed-off-by: Christian Hagemeier <chagem@amazon.com>
|
|
||
| # Add run numbers column if requested | ||
| if args.runs: | ||
| sorted_by_throughput = sorted(zip(throughputs, run_numbers), reverse=True) |
There was a problem hiding this comment.
Isn't zip(throughputs, run_numbers) equivalent to throughput_data?
| selected_runs = [max_run, median_run, min_run] | ||
| # Remove duplicates while preserving order using dict.fromkeys() | ||
| # (works in python > 3.7) | ||
| unique_runs = list(dict.fromkeys(selected_runs)) |
There was a problem hiding this comment.
I think your previous approach was more readable 😅
Perhaps just move it to a function instead?
|
|
||
| row.append(",".join(unique_runs)) | ||
| else: | ||
| all_runs = [r for _, r in sorted_by_throughput] |
Adds two features to the autogroup.py script, namely
--runs=all, or just to print max, median and min run numbers--runs=rep(for representative)Only changes behaviour of the benchmarking autogroup script, so no Changelog entries / version bumps needed.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).