use public prompt sets instead of prompt sets that require auth #865
Conversation
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Supersedes #863
TOO_SLOW = {"real_toxicity_prompts", "bbq"}

def ensure_public_dependencies(dependencies):
Could you say more about the need for this? It seems a little magic for my tastes. I take it it's difficult to instantiate the tests while explicitly specifying the demo prompt set?
It's difficult in a couple of ways:
- a local file dependency isn't a drop-in replacement for the WebData dependencies in the tests in question; it's possible but not super clean.
- some of the WebData dependencies are Google Drive links, some are modellab links, and in principle they could be anything. There's no generic way to replace all the WebData dependencies with local files without this test knowing where those files are and what's in them.
Our options are:
- modify how dependencies are defined in tests, by injecting them into the test, rather than having the test manage its own dependencies. This is a far-reaching change, just to accommodate this test;
- overwrite specific tests' dependencies with local file dependencies, which adds some complexity (we need to write code to convert a WebData object to a local file data object + we need to maintain local files for tests);
- overwrite specific tests' dependencies by using public demo prompt sets instead of the practice prompt sets they normally use.
I picked option 3 in this PR because it's the least amount of code, it's local to this test, and more importantly it doesn't significantly change the internals of the test objects by replacing a web download with a local file (i.e. it still does a web download). If an error happens while using the local file, it's something we have to fix that doesn't really tell us anything interesting about the test we're testing (which doesn't use a local file outside of tests).
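For illustration, here is a rough sketch of what that helper could look like. It assumes each dependency is a dataclass-style WebData object with a source_url field, and that the demo prompt set lives at the same URL with "practice" swapped for "demo"; both are assumptions made for the sketch, not the actual modelbench API.

from dataclasses import replace

def ensure_public_dependencies(dependencies: dict) -> dict:
    """Return a copy of the dependency mapping with practice prompt-set URLs
    rewritten to their public demo equivalents."""
    public = {}
    for key, dependency in dependencies.items():
        url = getattr(dependency, "source_url", None)
        if url and "practice" in url:
            # dataclasses.replace assumes the dependency is a dataclass instance;
            # adjust if WebData is built differently.
            dependency = replace(dependency, source_url=url.replace("practice", "demo"))
        public[key] = dependency
    return public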
If it's helpful, I think the WebData stuff is wildly overgeneral. There was a hot minute when we were using Google Drive links, and at one point we had dreams of accommodating a lot of general tests. But in the concrete present, we pull a small number of files from exactly one server, sometimes authed and sometimes otherwise. I don't expect that to change soon. So if you want to go on a complexity reduction spree, I'm all for it.
I think that in exactly one spot we should have a test that makes sure all of that works together, like a smoke test. And then ideally, the rest of the tests run straightforwardly from local files or mocks. But I'm not religious on any of that, so if this is the lowest-magic/lowest-effort opportunity for now, I say go for it.
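If it helps to picture it, here is a sketch of that single smoke test, with an assumed custom pytest marker to keep it out of the default run; the marker name and URL are placeholders, not anything that exists in the repo.

import urllib.request

import pytest

@pytest.mark.smoke  # assumed marker; it would need registering in the pytest config
def test_demo_prompt_set_is_downloadable():
    # Placeholder URL standing in for wherever the public demo prompt set is hosted.
    url = "https://example.com/prompt-sets/demo_prompts.csv"
    with urllib.request.urlopen(url, timeout=30) as response:
        # Successfully fetching the real remote file is the whole point of the smoke test.
        assert response.status == 200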
What if we place a dummy private dataset file in run_data/tests/SafeTestWhatever? The dependency helper won't try to retrieve the file if it thinks it has already been downloaded locally. I might be overlooking something, but this seems like the simplest option, IMHO.
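A hypothetical sketch of that pre-seeding, where the run_data/tests/SafeTestWhatever path, the file name, and the CSV header are placeholders rather than the real cache layout:

from pathlib import Path

def seed_dummy_prompt_file(data_dir: Path) -> Path:
    """Create a placeholder file where the dependency helper expects the download,
    so it treats the dependency as already fetched and skips the network call."""
    cache_dir = data_dir / "tests" / "SafeTestWhatever"
    cache_dir.mkdir(parents=True, exist_ok=True)
    dummy = cache_dir / "prompts.csv"
    # A header-only CSV satisfies an "is it already downloaded" check, though any
    # test that actually reads rows would need representative data.
    dummy.write_text("release_prompt_id,prompt_text,hazard\n")
    return dummy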
I think having a remote file download is worth testing in a smoke test. I also think having a reliable local file using the existing local file capabilities is worth doing for unit tests. So I'll tidy this up for the smoke test, and then look at other places in the code where a remote resource can be replaced with a local resource.
Force-pushed from b97c0eb to 25ddb9d.
Force-pushed from 25ddb9d to eb197f9.
Force-pushed from afdf679 to c3f4b94.
… publicly-accessible one
…dantic-core (2.27.2) from archive pydantic_core-2.27.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl not found in known hashes (was: sha256:ed4964723e97cdf8c70abebd1495001f511491b8eeac817b033db1af28a86bb5)'
Force-pushed from 73523fe to 1333ea7.
@wpietri @dhosterman @bkorycki ready for another review please.
🚀
Some tests download real prompt sets, which may require an auth token.
Instead of monkeypatching the test's secrets so they reuse the auth token from the real secrets, we replace the prompt set URLs with their public demo equivalents, which need no auth.
https://github.com/mlcommons/modelbench-private/issues/53