use public prompt sets instead of prompt sets that require auth #865

rogthefrog · 2025-02-14T00:42:33Z

Some tests download real prompt sets, which may require an auth token.

Instead of monkeypatching the secrets to use the auth token defined in the real secrets just for the test, we replace prompt set URLs with their demo equivalents.

https://github.com/mlcommons/modelbench-private/issues/53

github-actions · 2025-02-14T00:42:47Z

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

rogthefrog · 2025-02-14T01:12:59Z

Supersedes #863

wpietri · 2025-02-14T15:57:02Z

plugins/validation_tests/test_object_creation.py

+TOO_SLOW = {"real_toxicity_prompts", "bbq"}
+
+
+def ensure_public_dependencies(dependencies):


Could you say more about the need for this? It seems a little magic for my tastes. I take it it's difficult to instantiate the tests while explicitly specifying the demo prompt set?

rogthefrog · 2025-02-15

It's difficult in a couple of ways:

- A local file dependency isn't a drop-in replacement for the WebData dependencies in the tests in question; it's possible but not super clean.
- Some of the WebData dependencies are Google Drive links, some are modellab links, and in principle they could be anything. There's no generic way to replace all the WebData dependencies with local files without this test knowing where those files are and what's in them.

Our options are:

1. Modify how dependencies are defined in tests by injecting them into the test, rather than having the test manage its own dependencies. This is a far-reaching change just to accommodate this test.
2. Overwrite specific tests' dependencies with local file dependencies, which adds some complexity (we need to write code to convert a WebData object to a local file data object, and we need to maintain local files for tests).
3. Overwrite specific tests' dependencies by using public demo prompt sets instead of the practice prompt sets they use.

I picked option 3 in this PR because it needs the least code, it's local to this test, and, more importantly, it doesn't significantly change the internals of the test objects by replacing a web download with a local file (i.e. it still does a web download). If an error happened while using a local file, we'd have to fix it, but it wouldn't tell us anything interesting about the test we're testing (which doesn't use a local file outside of tests).
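For readers following along, option 3 could look roughly like the sketch below. Only the name `ensure_public_dependencies` comes from the diff; the `WebData` stand-in class, the `source_url` field, and the `_practice_` to `_demo_` naming scheme are assumptions made for illustration, not the actual implementation in this PR.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WebData:
    """Hypothetical stand-in for a web-download dependency (not the real class)."""

    source_url: str


def to_demo_url(url: str) -> str:
    """Map a practice prompt set URL to a demo URL (assumed naming scheme)."""
    return url.replace("_practice_", "_demo_")


def ensure_public_dependencies(dependencies: dict) -> dict:
    """Return a copy of the mapping with every WebData URL pointed at a demo file.

    Non-WebData entries are passed through unchanged, and the input is not mutated.
    """
    return {
        key: replace(dep, source_url=to_demo_url(dep.source_url))
        if isinstance(dep, WebData)
        else dep
        for key, dep in dependencies.items()
    }
```

The dependency is still a web download after the swap, which is the property the comment above cares about; only the URL changes.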

wpietri · 2025-02-17

If it's helpful, I think the WebData stuff is wildly overgeneral. There was a hot minute when we were using Google Drive links, and at one point we had dreams of accommodating a lot of general tests. But in the concrete present, we pull a small number of files from exactly one server, sometimes authed and sometimes otherwise. I don't expect that to change soon. So if you want to go on a complexity reduction spree, I'm all for it.

I think that in exactly one spot we should have a test that makes sure all of that works together, like a smoke test. And then ideally, the rest of the tests run straightforwardly from local files or mocks. But I'm not religious on any of that, so if this is the lowest-magic/lowest-effort opportunity for now, I say go for it.
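One way to read the split described above, as a rough sketch: the network call sits behind a single function that only a smoke test would exercise, and every other test passes in a fake. The function names, the `fetch` parameter, and the `prompt_text` column are hypothetical and do not come from the modelbench code.

```python
import csv
import io
import urllib.request
from unittest import mock


def download_text(url: str) -> str:
    """Real network fetch; only the single smoke test would exercise this."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def load_prompts(url: str, fetch=download_text) -> list[str]:
    """Parse a prompt CSV obtained through `fetch` (column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(fetch(url)))
    return [row["prompt_text"] for row in reader]


def test_load_prompts_without_network():
    # The fake replaces the download, so this test never touches the network.
    fake_fetch = mock.Mock(return_value="prompt_text\nhello\nworld\n")
    prompts = load_prompts("https://example.org/prompts.csv", fetch=fake_fetch)
    assert prompts == ["hello", "world"]
    fake_fetch.assert_called_once_with("https://example.org/prompts.csv")
```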

bkorycki · 2025-02-18

What if we place a dummy private dataset file in run_data/tests/SafeTestWhatever? The dependency helper won't try to retrieve the file if it thinks it has already downloaded it locally. I might be overlooking something, but this would be the simplest option imho.
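A minimal sketch of the idea in this comment, assuming a helper that skips the download when the target file already exists. `fetch_dependency` and the file name `prompts.csv` are hypothetical, and the real dependency helper may decide "already downloaded" differently (for example by version or hash), so treat this as an illustration of the technique only.

```python
import tempfile
from pathlib import Path


def fetch_dependency(cache_dir: Path, name: str, download) -> Path:
    """Return the cached file for `name`, calling `download` only on a cache miss."""
    target = cache_dir / name
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(download())
    return target


def fail_if_called() -> bytes:
    """Download stub that proves no retrieval was attempted."""
    raise AssertionError("network download attempted")


def read_with_seeded_cache() -> str:
    """Seed the cache with a dummy file, then resolve the dependency offline."""
    with tempfile.TemporaryDirectory() as tmp:
        cache_dir = Path(tmp) / "run_data" / "tests" / "SafeTestWhatever"
        cache_dir.mkdir(parents=True)
        (cache_dir / "prompts.csv").write_text("prompt_uid,prompt_text\n1,hello\n")
        path = fetch_dependency(cache_dir, "prompts.csv", fail_if_called)
        return path.read_text()
```

Because the dummy file is in place before the helper runs, the download stub is never invoked and no auth token is needed.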

rogthefrog · 2025-02-25

I think having a remote file download is worth testing in a smoke test. I also think having a reliable local file using the existing local file capabilities is worth doing for unit tests. So I'll tidy this up for the smoke test, and then look at other places in the code where a remote resource can be replaced with a local resource.

…dantic-core (2.27.2) from archive pydantic_core-2.27.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl not found in known hashes (was: sha256:ed4964723e97cdf8c70abebd1495001f511491b8eeac817b033db1af28a86bb5)'

rogthefrog · 2025-03-11T21:36:28Z

@wpietri @dhosterman @bkorycki ready for another review please.

wpietri

🚀

rogthefrog requested a review from a team as a code owner February 14, 2025 00:42

rogthefrog temporarily deployed to Scheduled Testing February 14, 2025 00:42 — with GitHub Actions Inactive

rogthefrog requested review from wpietri, dhosterman and bkorycki February 14, 2025 01:12

rogthefrog mentioned this pull request Feb 14, 2025

a less janky, more local way for a specific test to request real secrets #863

Closed

wpietri reviewed Feb 14, 2025

rogthefrog force-pushed the fix/53-use-demo branch from b97c0eb to 25ddb9d Compare February 18, 2025 20:38

rogthefrog temporarily deployed to Scheduled Testing February 18, 2025 20:38 — with GitHub Actions Inactive

rogthefrog marked this pull request as draft February 19, 2025 05:15

rogthefrog force-pushed the fix/53-use-demo branch from 25ddb9d to eb197f9 Compare February 26, 2025 01:38

rogthefrog had a problem deploying to Scheduled Testing February 26, 2025 01:38 — with GitHub Actions Failure

rogthefrog had a problem deploying to Scheduled Testing February 26, 2025 01:38 — with GitHub Actions Error

rogthefrog had a problem deploying to Scheduled Testing February 26, 2025 01:52 — with GitHub Actions Error

rogthefrog had a problem deploying to Scheduled Testing February 26, 2025 01:52 — with GitHub Actions Failure

rogthefrog temporarily deployed to Scheduled Testing February 26, 2025 02:03 — with GitHub Actions Inactive

rogthefrog force-pushed the fix/53-use-demo branch from afdf679 to c3f4b94 Compare February 26, 2025 02:16

rogthefrog temporarily deployed to Scheduled Testing February 26, 2025 02:16 — with GitHub Actions Inactive

rogthefrog had a problem deploying to Scheduled Testing March 11, 2025 02:19 — with GitHub Actions Failure

rogthefrog had a problem deploying to Scheduled Testing March 11, 2025 02:19 — with GitHub Actions Error

rogthefrog temporarily deployed to Scheduled Testing March 11, 2025 20:51 — with GitHub Actions Inactive

rogthefrog temporarily deployed to Scheduled Testing March 11, 2025 21:19 — with GitHub Actions Inactive

rogthefrog added 12 commits March 11, 2025 17:22

use public prompt sets instead of prompt sets that require auth

a173f43

disable a noisy test until we fix it properly

24f12aa

find the public demo prompt set that corresponds to a non-public prompt set

3f0e7f9

less janky still way to turn a token-protected prompt set file into a publicly-accessible one

252b678

download public files instead of private or protected

8e7ef50

fixed test to reflect new function behavior

c0da68d

appease mypy

3a65153

remove test removed elsewhere; remove unneeded tmp_path fixture

fd823d9

remove unsupported model

508b0dc

refresh lock file with the right version of Poetry

0406bac

restore gemini 1.0 pro, which is really supposed to be supported.

1333ea7

rogthefrog force-pushed the fix/53-use-demo branch from 73523fe to 1333ea7 Compare March 11, 2025 21:23

rogthefrog temporarily deployed to Scheduled Testing March 11, 2025 21:23 — with GitHub Actions Inactive

rogthefrog marked this pull request as ready for review March 11, 2025 21:36

wpietri approved these changes Mar 11, 2025

rogthefrog merged commit 18e16c1 into main Mar 11, 2025
4 checks passed

rogthefrog deleted the fix/53-use-demo branch March 11, 2025 22:25

github-actions bot locked and limited conversation to collaborators Mar 11, 2025
