Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use public prompt sets instead of prompt sets that require auth #865

Merged
merged 12 commits into from
Mar 11, 2025

Conversation

rogthefrog
Copy link
Contributor

@rogthefrog rogthefrog commented Feb 14, 2025

Some tests download real prompt sets, which may require an auth token.

Instead of monkeypatching the secrets to use the auth token defined in the real secrets just for the test, we replace prompt set URLs with their demo equivalents.

https://github.com/mlcommons/modelbench-private/issues/53

@rogthefrog rogthefrog requested a review from a team as a code owner February 14, 2025 00:42
Copy link

github-actions bot commented Feb 14, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@rogthefrog
Copy link
Contributor Author

Supersedes #863

TOO_SLOW = {"real_toxicity_prompts", "bbq"}


def ensure_public_dependencies(dependencies):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you say more about the need for this? It seems a little magic for my tastes. I take it it's difficult to instantiate the tests while explicitly specifying the demo prompt set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's difficult in a couple of ways:

  • a local file dependency isn't a drop-in replacement for the WebData dependencies in the tests in question; it's possible but not super clean.
  • some of the WebData dependencies are Google Drive links, some are modellab links, and in principle they could be anything. There's no generic way to replace all the WebData dependencies with local files without this test knowing where those files are and what's in them.

Our options are:

  • modify how dependencies are defined in tests, by injecting them into the test, rather than having the test manage its own dependencies. This is a far-reaching change, just to accommodate this test;
  • overwrite specific tests' dependencies with local file dependencies, which adds some complexity (we need to write code to convert a WebData object to a local file data object + we need to maintain local files for tests);
  • overwrite specific tests' dependencies by using public demo prompt sets vs the practice prompt sets they use.

I picked option 3 in this PR because it's the least amount of code, it's local to this test, and more importantly it doesn't significantly change the internals of the test objects by replacing a web download with a local file (i.e. it still does a web download). If an error happens while using the local file, it's something we have to fix that doesn't really tell us anything interesting about the test we're testing (which doesn't use a local file outside of tests).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's helpful, I think the WebData stuff is wildly overgeneral. There was a hot minute when we were using Google Drive links, and at one point we had dreams of accommodating a lot of general tests. But in the concrete present, we pull a small number of files from exactly one server, sometimes authed and sometimes otherwise. I don't expect that to change soon. So if you want to go on a complexity reduction spree, I'm all for it.

I think that in exactly one spot we should have a test that makes sure all of that works together, like a smoke test. And then ideally, the rest of the tests run straightforwardly from local files or mocks. But I'm not religious on any of that, so if this is the lowest-magic/lowest-effort opportunity for now, I say go for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we place a dummy private dataset file in run_data/tests/SafeTestWhatever? Because the dependency helper won't try to retrieve the file if it thinks it already has it downloaded locally. I might be overlooking something, but this would be simplest option imho.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having a remote file download is worth testing in a smoke test. I also think having a reliable local file using the existing local file capabilities is worth doing for unit tests. So I'll tidy this up for the smoke test, and then look at other places in the code where a remote resource can be replaced with a local resource.

@rogthefrog rogthefrog marked this pull request as draft February 19, 2025 05:15
@rogthefrog rogthefrog temporarily deployed to Scheduled Testing March 11, 2025 21:23 — with GitHub Actions Inactive
@rogthefrog rogthefrog temporarily deployed to Scheduled Testing March 11, 2025 21:23 — with GitHub Actions Inactive
@rogthefrog rogthefrog temporarily deployed to Scheduled Testing March 11, 2025 21:23 — with GitHub Actions Inactive
@rogthefrog rogthefrog marked this pull request as ready for review March 11, 2025 21:36
@rogthefrog
Copy link
Contributor Author

@wpietri @dhosterman @bkorycki ready for another review please.

Copy link
Contributor

@wpietri wpietri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@rogthefrog rogthefrog merged commit 18e16c1 into main Mar 11, 2025
4 checks passed
@rogthefrog rogthefrog deleted the fix/53-use-demo branch March 11, 2025 22:25
@github-actions github-actions bot locked and limited conversation to collaborators Mar 11, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants