Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Analytics cli #894

Merged
merged 17 commits into from
Mar 12, 2025
Merged

Analytics cli #894

merged 17 commits into from
Mar 12, 2025

Conversation

bkorycki
Copy link
Contributor

@bkorycki bkorycki commented Mar 11, 2025

First attempt at an analytics CLI tool: run-csv.
To summarize, each run results in a new directory containing the output results file and a metadata json file.

The output structure is largely based off of the sample dvc directory in the modelbench-private dvc-experiments branch. However, I was not able to follow it 100%. Here are the key differences:

  • The metadata lists suts instead of one sut. This is because the pipeline runner is shared with the already-exisiting run-csv-ITEMS which allows for multi-sut runs.
  • I changed the prompts metadata key to input in order to generalize to runs where the input is prompts + responses.
  • It is not currently possible to include provider and provider_id in the metadata. The best I could do is include the SUT initialization record. Annotators do not have initialization records, however, which maybe we should fix.
  • I excluded the unique_count for responses in the metadata because it would be kind of difficult to keep track of. If it's super important, we can include it in a later PR.
  • There is no cache (see convo in discord)

Some questions

  • Would it be better to include the full path to the input source?
  • Any advice on how to make the unit tests less repetitive would be appreciated.
  • Also open to better names for the cli tool.

@bkorycki bkorycki requested review from wpietri and rogthefrog March 11, 2025 01:12
@bkorycki bkorycki requested a review from a team as a code owner March 11, 2025 01:12
Copy link

github-actions bot commented Mar 11, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@bkorycki bkorycki temporarily deployed to Scheduled Testing March 11, 2025 01:18 — with GitHub Actions Inactive
@bkorycki bkorycki temporarily deployed to Scheduled Testing March 11, 2025 01:18 — with GitHub Actions Inactive
@bkorycki bkorycki temporarily deployed to Scheduled Testing March 11, 2025 01:18 — with GitHub Actions Inactive
@@ -102,6 +102,8 @@ def validate_uid(ctx, param, value):
Raises a BadParameter exception if the user-supplied arg(s) are not valid UIDs.
Applicable for parameters '--sut', '--test', and '--annotator'.
"""
if not value:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems inconsistent with the comment right above: isn't a null or blank value a bad UID?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can see how it looks a little counter intuitive. But this callback is called even when no argument is supplied. So it shouldn't return False when we aren't actually validating a UID.

Before, every click option that used this validate_uid callback either required exactly one argument or optionally allows a list of args. So validate_uid would never receive None.

However, my new cli has an optional option --sut. Since it's not a list, it's default value is None if no sut is specified, instead of an empty list.
That's why I need to add this line of code.

Basically, this function only validates UID strings that are actually provided. Does that make sense?

def run_csv(sut_uid, annotator_uids, workers, output_dir, tag, debug, input_path, max_tokens, temp, top_p, top_k):
"""Run rows in a CSV through some SUTs and/or annotators.

If running a SUT, the file must have 'UID' and 'Text' columns. The output will be saved to a CSV file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is Text? Is it the prompt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This is identical to the already existing run-csv-items command.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as we're changing things, do we want to change this so that will read our prompt files? Because I think that's going to be the most common use case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that but those changes will also affect the run-csv-items command that the eval folks use.

@wpietri
Copy link
Contributor

wpietri commented Mar 11, 2025

  • Also open to better names for the cli tool.

I think the CSV-ness of this is incidental to the purpose, so I might be more inclined to go with run-prompts or run-job. Or you could look for a more poetic name. Maybe something with a name derived from kitchen tools or hand tools?

Copy link
Contributor

@wpietri wpietri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I think not having everything in the metadata for now is fine. Let's get it into people's hands and see what they want once they're using it.

catch_exceptions=False,
)

assert result.exit_code == 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you're saying about the tests. Have you tried creating an object that holds all of those variables? You could at least move the boilerplate inside it, and I suspect it would be a good place for some convenience methods that would clear up your code some.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I thought about that but I'm having trouble setting up boiler plate because each test differs slightly in input file, cli args, and expected output file.

@bkorycki bkorycki temporarily deployed to Scheduled Testing March 12, 2025 19:22 — with GitHub Actions Inactive
@bkorycki bkorycki temporarily deployed to Scheduled Testing March 12, 2025 19:22 — with GitHub Actions Inactive
@bkorycki bkorycki temporarily deployed to Scheduled Testing March 12, 2025 19:22 — with GitHub Actions Inactive
@bkorycki bkorycki temporarily deployed to Scheduled Testing March 12, 2025 19:33 — with GitHub Actions Inactive
@bkorycki bkorycki temporarily deployed to Scheduled Testing March 12, 2025 19:33 — with GitHub Actions Inactive
@bkorycki bkorycki temporarily deployed to Scheduled Testing March 12, 2025 19:33 — with GitHub Actions Inactive
@bkorycki bkorycki temporarily deployed to Scheduled Testing March 12, 2025 19:35 — with GitHub Actions Inactive
@bkorycki bkorycki temporarily deployed to Scheduled Testing March 12, 2025 19:35 — with GitHub Actions Inactive
@bkorycki bkorycki temporarily deployed to Scheduled Testing March 12, 2025 19:35 — with GitHub Actions Inactive
@bkorycki bkorycki temporarily deployed to Scheduled Testing March 12, 2025 23:04 — with GitHub Actions Inactive
@bkorycki bkorycki temporarily deployed to Scheduled Testing March 12, 2025 23:04 — with GitHub Actions Inactive
@bkorycki bkorycki temporarily deployed to Scheduled Testing March 12, 2025 23:04 — with GitHub Actions Inactive
@bkorycki
Copy link
Contributor Author

@wpietri I'm not very poetic so I re-named to run-job. Thanks for the suggestion!

@bkorycki bkorycki merged commit 602c62d into main Mar 12, 2025
4 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Mar 12, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants