Add option to use latest directory and add run printing by c-hagem · Pull Request #1586 · awslabs/mountpoint-s3

c-hagem · 2025-08-28T06:26:53Z

Adds two features to the autogroup.py script, namely

- the possibility to print either the numbers of all runs of a category (in descending order of throughput) when specifying --runs=all, or just the max, median and min run numbers with --runs=rep (for representative)
- makes the default order of throughputs consistent with the sorting order if runs are specified
- adds the possibility to use the latest run, which is used automatically when no directory is specified

Only changes behaviour of the benchmarking autogroup script, so no Changelog entries / version bumps needed.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

muddyfish · 2025-08-28T13:56:36Z

benchmark/analysis-scripts/autogroup.py

+def find_multirun_dir(index: int = 0) -> str:
+    """Find the Nth latest directory in multirun (0=most recent, 1=previous, etc.)"""
+    if not Path('multirun').exists():
+        warnings.warn("multirun directory not found")


Raise an exception here?
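
A minimal sketch of what raising instead of warning could look like. The helper name `require_multirun_dir` and the choice of `FileNotFoundError` are assumptions for illustration, not the code in this PR:

```python
from pathlib import Path


def require_multirun_dir(root: str = "multirun") -> Path:
    """Return the multirun directory, or raise if it does not exist.

    Hypothetical helper: the name and the exception type are assumptions
    made for this sketch, not taken from autogroup.py.
    """
    path = Path(root)
    if not path.is_dir():
        # Fail loudly so the caller cannot continue with a missing directory.
        raise FileNotFoundError(f"multirun directory not found: {path}")
    return path
```

Compared with `warnings.warn`, an exception stops execution at the point of failure and leaves it to the caller to decide whether to exit.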

muddyfish · 2025-08-28T13:58:26Z

benchmark/analysis-scripts/autogroup.py

+    if not sorted_subdirs:
+        warnings.warn("No experiment directories found in multirun")
+        sys.exit(1)
+    return sorted_subdirs[index][1]


Above there are warnings, but if the index is out of range, this can raise an exception without a handy description.

This will surface as an IndexError, though I think we currently only use index 0.
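
One way to give the out-of-range case a descriptive message, sketched under assumptions: `pick_nth_latest` is a hypothetical name, and `sorted_subdirs` is assumed to hold `(sort_key, path)` tuples ordered newest first, mirroring the `sorted_subdirs[index][1]` access in the diff.

```python
from typing import Sequence, Tuple


def pick_nth_latest(sorted_subdirs: Sequence[Tuple[float, str]], index: int = 0) -> str:
    """Return the path of the Nth latest entry (0 = most recent).

    Hypothetical helper for illustration; assumes (sort_key, path) tuples
    that are already ordered newest first.
    """
    if not sorted_subdirs:
        raise FileNotFoundError("No experiment directories found in multirun")
    if not 0 <= index < len(sorted_subdirs):
        # Explicit bounds check so the error names the requested index.
        raise IndexError(
            f"Requested run index {index}, but only {len(sorted_subdirs)} "
            "experiment directories exist in multirun"
        )
    return sorted_subdirs[index][1]
```

The explicit check also rejects negative indices, which plain list indexing would silently treat as counting from the end.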

muddyfish · 2025-08-28T13:59:24Z

benchmark/analysis-scripts/autogroup.py

+
    parser.add_argument('--csv-output', help='Optional CSV file to write the results to')
+    parser.add_argument(
+        '--runs', choices=['tri', 'all'], help='Show run numbers in results (tri=min/median/max, all=all runs)'


I've also not heard of it. Wondering if there's a slightly less technical term which also works here

muddyfish · 2025-08-28T14:02:41Z

benchmark/analysis-scripts/autogroup.py

+
+    results_rows = []
+    for config_key, throughput_data in grouped_results.items():
+        throughputs = [t for t, _ in throughput_data]


Nit: throughputs, run_numbers = zip(*throughput_data)
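
For illustration, the suggested unpacking behaves like this on made-up sample data (the values are hypothetical; run numbers are assumed to be strings because the diff later joins them with `",".join(...)`):

```python
# Hypothetical sample: (throughput, run_number) pairs for one configuration.
throughput_data = [(10.5, "3"), (12.0, "1"), (9.8, "2")]

# zip(*pairs) transposes the list of pairs into two tuples.
throughputs, run_numbers = zip(*throughput_data)

print(throughputs)  # (10.5, 12.0, 9.8)
print(run_numbers)  # ('3', '1', '2')
```

Note that the unpacking raises `ValueError` when `throughput_data` is empty, and that both results are tuples rather than lists.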

muddyfish · 2025-08-28T14:03:52Z

benchmark/analysis-scripts/autogroup.py

+            if args.runs == "tri":
+                # Find min, max, and median run numbers based on throughput
+                sorted_by_throughput = sorted(zip(throughputs, run_numbers))
+                min_run = sorted_by_throughput[0][1]


Unclear why we're zipping after just unzipping above

muddyfish · 2025-08-28T14:04:17Z

benchmark/analysis-scripts/autogroup.py

+
+                row.append(",".join(unique_runs))
+            else:
+                sorted_by_throughput = sorted(zip(throughputs, run_numbers), reverse=True)


Why are we reverse sorting here but not above?

Above we pick min, p50 and max; I guess we could use reverse sorting in both.

Adjusted to reverse sort once
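
A sketch of picking the three representative runs from a single descending sort. The function name `representative_runs` and the median index `len // 2` are assumptions; the PR excerpt does not show how the median is chosen for an even number of runs.

```python
def representative_runs(throughput_data):
    """Return [max_run, median_run, min_run] using one descending sort.

    Hypothetical helper: assumes a non-empty list of
    (throughput, run_number) tuples.
    """
    ordered = sorted(throughput_data, reverse=True)
    max_run = ordered[0][1]                      # highest throughput first
    median_run = ordered[len(ordered) // 2][1]   # middle element (assumed rule)
    min_run = ordered[-1][1]                     # lowest throughput last
    return [max_run, median_run, min_run]


print(representative_runs([(10.5, "3"), (12.0, "1"), (9.8, "2")]))  # ['1', '3', '2']
```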

muddyfish · 2025-08-28T14:08:05Z

benchmark/analysis-scripts/autogroup.py

+
+                selected_runs = [max_run, median_run, min_run]
+                # Remove duplicates while preserving order
+                unique_runs = []


How large is this list realistically getting? This approach is O(n^2)

The list only has 3 elements (i.e. selected_runs).

Switched to a faster method.

Adds two features to the autogroup.py script, namely
- the possibility to print either the numbers of all runs of a category (in descending order of throughput) when specifying `--runs=all`, or just the max, median and min run numbers with `--runs=tri`
- makes the default order of throughputs consistent with the sorting order if runs are specified
- adds the possibility to use the latest run (by default, latest is inferred from the run number, but this can also be switched to modification time). Additionally, with `--latest=K` the K-th latest run according to the specified order is picked.

Signed-off-by: Christian Hagemeier <chagem@amazon.com>

muddyfish · 2025-08-29T14:20:26Z

benchmark/analysis-scripts/autogroup.py

+
+        # Add run numbers column if requested
+        if args.runs:
+            sorted_by_throughput = sorted(zip(throughputs, run_numbers), reverse=True)


Isn't zip(throughputs, run_numbers) equivalent to throughput_data?
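
A quick check of the equivalence, assuming `throughput_data` is a list of `(throughput, run_number)` tuples (the sample values are made up). If it held lists instead of tuples, the elements would differ in type after re-zipping.

```python
# Hypothetical sample: (throughput, run_number) pairs for one configuration.
throughput_data = [(10.5, "3"), (12.0, "1"), (9.8, "2")]
throughputs, run_numbers = zip(*throughput_data)

# Re-zipping the two unpacked tuples reproduces the original pairs,
# so sorting throughput_data directly gives the same result.
rezipped = sorted(zip(throughputs, run_numbers), reverse=True)
direct = sorted(throughput_data, reverse=True)

print(rezipped == direct)  # True
```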

muddyfish · 2025-08-29T14:26:49Z

benchmark/analysis-scripts/autogroup.py

+                selected_runs = [max_run, median_run, min_run]
+                # Remove duplicates while preserving order using dict.fromkeys()
+                # (works in python > 3.7)
+                unique_runs = list(dict.fromkeys(selected_runs))


I think your previous approach was more readable 😅
Perhaps just move it to a function instead?
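
A sketch of the loop-based approach moved into a function, as suggested. The name `unique_preserving_order` is hypothetical; it returns the same result as `list(dict.fromkeys(items))`, which relies on dicts preserving insertion order (guaranteed from Python 3.7 onward).

```python
from typing import Hashable, Iterable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def unique_preserving_order(items: Iterable[T]) -> List[T]:
    """Drop duplicates while keeping the first occurrence of each item."""
    seen = set()
    unique = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


# With a single run, max, median and min all refer to the same run number.
print(unique_preserving_order(["7", "7", "7"]))  # ['7']
```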

muddyfish · 2025-08-29T14:27:18Z

benchmark/analysis-scripts/autogroup.py

+
+                row.append(",".join(unique_runs))
+            else:
+                all_runs = [r for _, r in sorted_by_throughput]


Isn't this run_numbers?
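
Whether `all_runs` equals `run_numbers` depends on whether `throughput_data` is already ordered by throughput, which the excerpt does not show. On made-up, unsorted sample data the two differ in order:

```python
# Hypothetical sample: (throughput, run_number) pairs in arrival order.
throughput_data = [(10.5, "3"), (12.0, "1"), (9.8, "2")]
_, run_numbers = zip(*throughput_data)

sorted_by_throughput = sorted(throughput_data, reverse=True)
all_runs = [r for _, r in sorted_by_throughput]

print(list(run_numbers))  # ['3', '1', '2']
print(all_runs)           # ['1', '3', '2']
```

If the data were sorted in descending order before unpacking, the comprehension would indeed be redundant.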

c-hagem had a problem deploying to PR integration tests August 28, 2025 06:26 — with GitHub Actions Failure

c-hagem force-pushed the autogroup-latest-option branch from 12e26af to b9172e4 Compare August 28, 2025 06:33

c-hagem had a problem deploying to PR integration tests August 28, 2025 06:33 — with GitHub Actions Failure

c-hagem had a problem deploying to PR integration tests August 28, 2025 11:08 — with GitHub Actions Failure

c-hagem requested a review from sahityadg August 28, 2025 11:10

c-hagem had a problem deploying to PR integration tests August 28, 2025 12:32 — with GitHub Actions Failure

sahityadg previously approved these changes Aug 28, 2025

muddyfish reviewed Aug 28, 2025

c-hagem added 4 commits August 29, 2025 13:13

Remove superflous options

fc98a6b

Signed-off-by: Christian Hagemeier <chagem@amazon.com>

Remove latest option

1ff675c

Signed-off-by: Christian Hagemeier <chagem@amazon.com>

address feedback

137139b

Signed-off-by: Christian Hagemeier <chagem@amazon.com>

muddyfish reviewed Aug 29, 2025
