SUT Options defined at test level #897

Open
wants to merge 8 commits into
base: main

bkorycki
Contributor

This is part of the effort to chisel away the need for the test wrapper class. More specifically, this PR makes it easy to access the SUT options used by a test object.

However, there is no guarantee that the SUT options returned by test.sut_options() are actually used in all of the test items. This is because the tests are responsible for assembling their own test items (i.e., bundling the prompts + contexts + SUT options).

I see 3 options here:

  1. Tests are responsible for ensuring that sut_options() is accurate (i.e., what this PR does). I don't like this weak link between the method and what it actually represents, but it is the easiest approach.
  2. Tests produce test items sans SUT options, and the benchmark runner is responsible for assembling the final TestItem object.
  3. SUTOptions is completely removed from TestItem and we pass it as a separate object to the SUT, i.e., sut.translate_text_prompt(prompt) becomes sut.translate_text_prompt(prompt, sut_options).
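
To make option 3 concrete, here is a minimal sketch of how it might look. Everything apart from the names SUTOptions, TestItem, sut_options, and translate_text_prompt is an assumption for illustration: the option fields, EchoSUT, DemoTest, make_test_items, and run are hypothetical, and plain dataclasses stand in for the project's real models.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SUTOptions:
    # Hypothetical fields; the real SUTOptions may differ.
    max_tokens: int = 100
    temperature: float = 1.0


@dataclass(frozen=True)
class TestItem:
    # Under option 3 the item carries only the prompt, not the options.
    prompt: str


class EchoSUT:
    """Toy SUT (hypothetical) that returns the request it would send."""

    def translate_text_prompt(self, prompt: str, sut_options: SUTOptions) -> dict:
        return {
            "prompt": prompt,
            "max_tokens": sut_options.max_tokens,
            "temperature": sut_options.temperature,
        }


class DemoTest:
    """Hypothetical test that declares its options in one place."""

    def sut_options(self) -> SUTOptions:
        return SUTOptions(max_tokens=50, temperature=0.0)

    def make_test_items(self) -> List[TestItem]:
        return [TestItem(prompt="hello"), TestItem(prompt="world")]


def run(test: DemoTest, sut: EchoSUT) -> List[dict]:
    # The runner reads the options once and passes them with every prompt,
    # so sut_options() cannot drift from what is actually sent to the SUT.
    options = test.sut_options()
    return [
        sut.translate_text_prompt(item.prompt, options)
        for item in test.make_test_items()
    ]


requests = run(DemoTest(), EchoSUT())
```

With this shape, the test no longer has a way to attach different options to individual items, which is the weak link described under option 1.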

@bkorycki bkorycki requested review from wpietri and rogthefrog March 13, 2025 20:27
@bkorycki bkorycki requested a review from a team as a code owner March 13, 2025 20:27
@bkorycki bkorycki temporarily deployed to Scheduled Testing March 13, 2025 20:27 — with GitHub Actions Inactive

github-actions bot commented Mar 13, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@rogthefrog
Contributor

Conceptually, I like the idea of removing sut options from TestItems and passing them in where needed.

Contributor

@wpietri wpietri left a comment

Seems good other than the disabled test.

As to how to make it so people are less likely to accidentally break things by using the wrong options, would it make sense for the TestItem to have a pointer to the Test? Then anything that needs them, like a runner, could either do test_item.test.sut_options() or just test_item.sut_options() which has a default implementation that passes the request along to the Test.
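
A rough sketch of that delegation idea, assuming plain dataclasses rather than the project's actual models. BaseTest, ShortAnswerTest, and the max_tokens field are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SUTOptions:
    # Hypothetical field; the real SUTOptions may differ.
    max_tokens: int = 100


class BaseTest:
    """Hypothetical base class with default options."""

    def sut_options(self) -> SUTOptions:
        return SUTOptions()


class ShortAnswerTest(BaseTest):
    def sut_options(self) -> SUTOptions:
        return SUTOptions(max_tokens=10)


@dataclass
class TestItem:
    prompt: str
    # Back-pointer to the owning test; hidden from repr to keep output short.
    test: BaseTest = field(repr=False)

    def sut_options(self) -> SUTOptions:
        # Default implementation: pass the request along to the Test.
        return self.test.sut_options()


item = TestItem(prompt="2 + 2 = ?", test=ShortAnswerTest())
options = item.sut_options()
```

Here both item.sut_options() and item.test.sut_options() resolve to the same value, so a runner can use either spelling.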

@@ -50,6 +51,12 @@ def get_annotators(cls) -> List[str]:
        """
        pass

    @classmethod
Contributor

Why is this a class method? It seems to me that it could happen at the instance level, although given our small number of tests, I suppose it's hard to say.

Contributor

I might also not make it an abstract method, come to think of it. Having a default that uses a default SUTOptions seems fine by me.
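
One way this could look, as a sketch only: sut_options() becomes a concrete instance method with a default, while other methods stay abstract. The make_test_items method, the subclasses, and the max_tokens field are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SUTOptions:
    # Hypothetical field; the real SUTOptions may differ.
    max_tokens: int = 100


class BaseTest(ABC):
    @abstractmethod
    def make_test_items(self) -> List[str]:
        """Hypothetical abstract method that every test must still implement."""

    def sut_options(self) -> SUTOptions:
        """Concrete default; override only when a test needs other options."""
        return SUTOptions()


class PlainTest(BaseTest):
    # Relies on the inherited default options.
    def make_test_items(self) -> List[str]:
        return ["hello"]


class TerseTest(PlainTest):
    # Overrides the default for a test that wants short completions.
    def sut_options(self) -> SUTOptions:
        return SUTOptions(max_tokens=5)


plain_options = PlainTest().sut_options()
terse_options = TerseTest().sut_options()
```

Existing tests that never cared about options would keep working unchanged under this arrangement, since they inherit the default.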

Contributor Author

I thought it might be useful for it to be a class method so that we could get information without having to instantiate a test object, which might be expensive. But it definitely doesn't have to be a class method.

And yeah, you're right about getting rid of the abstract portion. I'll add a default.

Contributor

Oh, yeah, if that's something we're doing, that would be a good reason. But if we don't need it now, I'd say stick with an instance method and we can always rejigger later.

Contributor Author

Okay I think I did it.

    @classmethod
    def sut_options(cls) -> SUTOptions:
        """Returns the SUT options that are supplied in each test item."""
        return SUTOptions(
Contributor

This makes a new one each time, which is fine if it's not called often, but "once per test item" is for us a pretty frequent call.

Contributor Author

Yeah that's true. I could make it a class property instead.
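
A minimal sketch of one way to avoid the per-call allocation, assuming the options are immutable: build the object once as a class-level attribute and return that same instance every time. DemoTest, _SUT_OPTIONS, and the max_tokens field are hypothetical, and this is not necessarily what the PR ended up doing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SUTOptions:
    # Hypothetical field; frozen=True guards the shared instance
    # against accidental mutation by any caller.
    max_tokens: int = 100


class DemoTest:
    # Built once when the class is defined, then shared by all callers.
    _SUT_OPTIONS = SUTOptions(max_tokens=50)

    def sut_options(self) -> SUTOptions:
        # Returns the shared instance instead of constructing a new one.
        return self._SUT_OPTIONS


test = DemoTest()
first = test.sut_options()
second = test.sut_options()
```

Sharing one instance is only safe if nothing mutates it, which is why the sketch freezes the dataclass.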

@bkorycki bkorycki temporarily deployed to Scheduled Testing March 13, 2025 21:39 — with GitHub Actions Inactive
@bkorycki
Contributor Author

@wpietri @rogthefrog I ended up going with Roger's suggestion of removing sut_options from TestItem. I tried doing William's suggestion but you can't really point to non-pydantic models in a pydantic model.

@bkorycki bkorycki deployed to Scheduled Testing March 14, 2025 21:15 — with GitHub Actions Active
@bkorycki bkorycki requested a review from wpietri March 14, 2025 21:15
@bkorycki bkorycki temporarily deployed to Scheduled Testing March 14, 2025 21:15 — with GitHub Actions Inactive
@bkorycki bkorycki requested a review from rogthefrog March 14, 2025 21:15
Contributor

@wpietri wpietri left a comment

Thanks for all the cleanup work!

3 participants