Analytics cli #894

bkorycki · 2025-03-11T01:12:44Z

First attempt at an analytics CLI tool: run-csv.
In short, each run creates a new directory containing the output results file and a metadata JSON file.

The output structure is largely based on the sample dvc directory in the modelbench-private dvc-experiments branch. However, I was not able to follow it exactly. Here are the key differences:

The metadata lists suts instead of a single sut. This is because the pipeline runner is shared with the already-existing run-csv-items command, which allows multi-sut runs.
I changed the prompts metadata key to input so that it generalizes to runs where the input is prompts + responses.
It is not currently possible to include provider and provider_id in the metadata. The best I could do is include the SUT initialization record. Annotators do not have initialization records, however, which maybe we should fix.
I excluded the unique_count for responses from the metadata because it would be difficult to keep track of. If it's important, we can include it in a later PR.
There is no cache (see the conversation in Discord).
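
To make the layout above concrete, here is a minimal sketch of one run directory holding a results file and a metadata JSON file. The helper name `write_run`, the file names, the `run_id` value, and every metadata key other than `suts` and `input` are assumptions for illustration, not the actual modelgauge code.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_run(output_dir, run_id, sut_uids, input_name, rows):
    """Create <output_dir>/<run_id>/ with a results CSV and a metadata JSON file.

    Hypothetical helper: file names and most metadata keys are illustrative.
    """
    run_dir = Path(output_dir) / run_id
    run_dir.mkdir(parents=True, exist_ok=False)  # one fresh directory per run

    # Results file: one row per (prompt, SUT) response.
    with open(run_dir / "results.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["UID", "Text", "SUT", "Response"])
        writer.writerows(rows)

    metadata = {
        "run_id": run_id,
        "suts": list(sut_uids),  # a list, since the shared runner allows multi-sut runs
        "input": {"source": input_name},  # "input" rather than "prompts"
        "responses": {"count": len(rows)},  # total only, no unique_count
    }
    with open(run_dir / "metadata.json", "w") as f:
        json.dump(metadata, f, indent=2)
    return run_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = write_run(
            tmp, "run-001", ["demo_sut"], "prompts.csv",
            [["p1", "Hello?", "demo_sut", "Hi."]],
        )
        print(sorted(p.name for p in run_dir.iterdir()))
```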

Some questions

Would it be better to include the full path to the input source?
Any advice on how to make the unit tests less repetitive would be appreciated.
Also open to better names for the cli tool.

github-actions · 2025-03-11T01:12:55Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

rogthefrog · 2025-03-11T20:10:06Z

src/modelgauge/command_line.py

@@ -102,6 +102,8 @@ def validate_uid(ctx, param, value):
    Raises a BadParameter exception if the user-supplied arg(s) are not valid UIDs.
    Applicable for parameters '--sut', '--test', and '--annotator'.
    """
+    if not value:


That seems inconsistent with the comment right above: isn't a null or blank value a bad UID?

Yeah, I can see how it looks a little counterintuitive. But this callback is called even when no argument is supplied, so it shouldn't raise an error when we aren't actually validating a UID.

Before, every click option that used this validate_uid callback either required exactly one argument or optionally allowed a list of args. So validate_uid would never receive None.

However, my new CLI has an optional option --sut. Since it's not a list, its default value is None if no sut is specified, instead of an empty list.
That's why I need to add this line of code.

Basically, this function only validates UID strings that are actually provided. Does that make sense?
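
The behavior described above can be sketched as a callback that passes empty values straight through and validates only what was actually supplied. `KNOWN_UIDS` is a hypothetical stand-in for the real registry lookup, and the fallback `BadParameter` class is there only so the sketch runs without click installed; this is not the actual modelgauge code.

```python
try:
    from click import BadParameter
except ImportError:  # fallback so the sketch runs without click installed
    class BadParameter(Exception):
        pass

# Hypothetical stand-in for the real SUT/test/annotator registries.
KNOWN_UIDS = {"demo_sut", "demo_annotator"}


def validate_uid(ctx, param, value):
    """Click-style callback: validate UIDs only when some were supplied."""
    if not value:
        # None (optional single-value option) or an empty tuple/list
        # (optional multi-value option): nothing to validate.
        return value
    # Normalize a single string and a sequence of strings to one code path.
    values = [value] if isinstance(value, str) else list(value)
    unknown = [v for v in values if v not in KNOWN_UIDS]
    if unknown:
        raise BadParameter(f"Unknown uid(s): {unknown}")
    return value
```

With this shape, an omitted --sut (value None) and an omitted multi-value option (empty sequence) are both returned unchanged, while any UID that is supplied still has to be known.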

rogthefrog · 2025-03-11T20:12:27Z

src/modelgauge/main.py

+def run_csv(sut_uid, annotator_uids, workers, output_dir, tag, debug, input_path, max_tokens, temp, top_p, top_k):
+    """Run rows in a CSV through some SUTs and/or annotators.
+
+    If running a SUT, the file must have 'UID' and 'Text' columns. The output will be saved to a CSV file.


What is Text? Is it the prompt?

Yes. This is identical to the already existing run-csv-items command.

As long as we're changing things, do we want to change this so that it will read our prompt files? Because I think that's going to be the most common use case.

I can do that, but those changes will also affect the run-csv-items command that the eval folks use.
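
For reference, the column requirement in the docstring above could be checked along these lines. This is a minimal sketch using only the standard library; `missing_columns` and `REQUIRED_SUT_COLUMNS` are hypothetical names, not part of modelgauge.

```python
import csv
import io

# Columns the docstring says are required when running a SUT.
REQUIRED_SUT_COLUMNS = {"UID", "Text"}


def missing_columns(csv_file, required=REQUIRED_SUT_COLUMNS):
    """Return the required column names absent from the CSV header, sorted."""
    reader = csv.DictReader(csv_file)
    header = set(reader.fieldnames or [])  # fieldnames is None for an empty file
    return sorted(set(required) - header)


if __name__ == "__main__":
    print(missing_columns(io.StringIO("UID,prompt\np1,Hello\n")))
```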


wpietri · 2025-03-11T21:08:06Z

Also open to better names for the cli tool.

I think the CSV-ness of this is incidental to the purpose, so I might be more inclined to go with run-prompts or run-job. Or you could look for a more poetic name. Maybe something with a name derived from kitchen tools or hand tools?

wpietri

Looks good! I think not having everything in the metadata for now is fine. Let's get it into people's hands and see what they want once they're using it.


wpietri · 2025-03-11T21:17:14Z

tests/modelgauge_tests/test_cli.py

+        catch_exceptions=False,
+    )
+
+    assert result.exit_code == 0


I see what you're saying about the tests. Have you tried creating an object that holds all of those variables? You could at least move the boilerplate inside it, and I suspect it would be a good place for some convenience methods that would clear up your code some.

Yeah, I thought about that, but I'm having trouble setting up the boilerplate because each test differs slightly in input file, CLI args, and expected output file.
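
One possible shape for such an object, as a rough sketch: a small dataclass that takes the three things that vary (input rows, extra CLI args, expected output name) and keeps the shared setup in methods. The class name `JobCase`, the `--output-dir` flag, the positional input path, and the default output file name are all assumptions for illustration.

```python
import csv
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class JobCase:
    """Bundles what varies between tests; shared boilerplate lives in the methods."""

    tmp_path: Path
    rows: list
    extra_args: list = field(default_factory=list)
    expected_output: str = "results.csv"  # hypothetical file name

    def input_file(self):
        """Write the input CSV and return its path."""
        path = self.tmp_path / "input.csv"
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["UID", "Text"])
            writer.writerows(self.rows)
        return path

    def argv(self):
        """Build the argument list to hand to the CLI test runner."""
        # The flag name and argument order here are assumed, not taken from the real CLI.
        return [
            "run-csv",
            "--output-dir", str(self.tmp_path),
            *self.extra_args,
            str(self.input_file()),
        ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        case = JobCase(Path(tmp), rows=[["p1", "Hello"]], extra_args=["--sut", "demo_sut"])
        print(case.argv())
```

Each test would then construct one `JobCase` with only its own differences and pass `case.argv()` to the runner.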

bkorycki · 2025-03-12T23:04:27Z

@wpietri I'm not very poetic, so I renamed it to run-job. Thanks for the suggestion!

bkorycki added 11 commits February 28, 2025 10:28

run-stuff is straight copy of run-csv-items

a8e634d

only 1 sut per run

a94397a

fix small bug

0c0f2f8

correct output directory structure

4dea393

The runners are responsible for setting up the output directory

be39b59

write empty metadata

9a1bbb8

rename sub_dir_name to run_id

bb5c747

basic metadata

009417d

annotation and response counts in metadata

45c1504

No more --cache-dir

e5ad582

run-stuff is now run-csv

bdc9736

bkorycki requested review from wpietri and rogthefrog March 11, 2025 01:12

bkorycki requested a review from a team as a code owner March 11, 2025 01:12

bkorycki had a problem deploying to Scheduled Testing March 11, 2025 01:12 — with GitHub Actions Failure

bkorycki had a problem deploying to Scheduled Testing March 11, 2025 01:12 — with GitHub Actions Error

bkorycki added 2 commits March 10, 2025 18:14

black

1033453

mypy

e8308e0

bkorycki temporarily deployed to Scheduled Testing March 11, 2025 01:18 — with GitHub Actions Inactive

rogthefrog approved these changes Mar 11, 2025


wpietri approved these changes Mar 11, 2025


more details about how validate_uid works

e844115

bkorycki temporarily deployed to Scheduled Testing March 12, 2025 19:22 — with GitHub Actions Inactive

log instead of print

c6ad0b9

bkorycki temporarily deployed to Scheduled Testing March 12, 2025 19:33 — with GitHub Actions Inactive

use logger instead of warnings in main

87b74d2

bkorycki temporarily deployed to Scheduled Testing March 12, 2025 19:35 — with GitHub Actions Inactive

rename to run-job

6d054db

bkorycki temporarily deployed to Scheduled Testing March 12, 2025 23:04 — with GitHub Actions Inactive

bkorycki merged commit 602c62d into main Mar 12, 2025
4 checks passed

github-actions bot locked and limited conversation to collaborators Mar 12, 2025
