SUT Options defined at test level #897

bkorycki · 2025-03-13T20:27:06Z

This is part of the effort to chisel away the need for the test wrapper class. More specifically, this PR makes it easy to access the SUT options used by a test object.

However, there is no guarantee that the SUT options returned by test.sut_options() are actually used in all of the test items. This is because the tests are responsible for assembling their own test items (i.e., bundling the prompts + contexts + SUT options).

I see 3 options here:

1. Tests are responsible for ensuring that sut_options() is accurate (i.e., what this PR does). I don't like this weak link between the method and what it actually represents, but it is the easiest approach.
2. Tests produce test items sans SUT options, and the benchmark runner is responsible for assembling the final TestItem object.
3. SUTOptions is completely removed from TestItem and we pass it as a separate object to the SUT, i.e., sut.translate_text_prompt(prompt) becomes sut.translate_text_prompt(prompt, sut_options).
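
For illustration, a minimal sketch of what the third option could look like from the SUT's side. The fields on SUTOptions, the TextPrompt class, and the EchoSUT class are all hypothetical stand-ins (plain dataclasses rather than the project's real models); only the method name translate_text_prompt and its two-argument shape come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SUTOptions:
    # Hypothetical fields; the real SUTOptions may differ.
    max_tokens: int = 100
    temperature: Optional[float] = None


@dataclass(frozen=True)
class TextPrompt:
    # Hypothetical prompt type that no longer carries any SUT options.
    text: str


class EchoSUT:
    """Toy SUT that receives the options as a separate argument."""

    def translate_text_prompt(self, prompt: TextPrompt, sut_options: SUTOptions) -> dict:
        # Build a request from the prompt plus the separately supplied options.
        request = {"prompt": prompt.text, "max_tokens": sut_options.max_tokens}
        if sut_options.temperature is not None:
            request["temperature"] = sut_options.temperature
        return request


options = SUTOptions(max_tokens=50)
request = EchoSUT().translate_text_prompt(TextPrompt(text="Hello"), options)
```

Because the caller supplies the options on every call, there is a single source of truth for them and nothing in the test item can drift out of sync.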

github-actions · 2025-03-13T20:27:21Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

rogthefrog · 2025-03-13T21:13:50Z

Conceptually, I like the idea of removing sut options from TestItems and passing them in where needed.

tests/modelgauge_tests/test_prompt_sets.py

wpietri

Seems good other than the disabled test.

As to how to make it so people are less likely to accidentally break things by using the wrong options, would it make sense for the TestItem to have a pointer to the Test? Then anything that needs them, like a runner, could either do test_item.test.sut_options() or just test_item.sut_options() which has a default implementation that passes the request along to the Test.
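
A rough sketch of the back-pointer idea described above, assuming plain Python dataclasses rather than the project's actual models. DemoTest and the max_tokens field are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SUTOptions:
    # Hypothetical field, for illustration only.
    max_tokens: int = 100


class DemoTest:
    """Stand-in for a test object that owns its SUT options."""

    def __init__(self, options: SUTOptions):
        self._options = options

    def sut_options(self) -> SUTOptions:
        return self._options


@dataclass
class TestItem:
    prompt: str
    test: DemoTest  # back-pointer to the owning test

    def sut_options(self) -> SUTOptions:
        # Default implementation: pass the request along to the Test.
        return self.test.sut_options()


demo_test = DemoTest(SUTOptions(max_tokens=200))
item = TestItem(prompt="Hello", test=demo_test)
```

With this shape a runner can call either item.test.sut_options() or item.sut_options() and get the same object, so the item itself never stores a second copy of the options.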

wpietri · 2025-03-13T21:24:10Z

src/modelgauge/base_test.py

@@ -50,6 +51,12 @@ def get_annotators(cls) -> List[str]:
        """
        pass

+    @classmethod


Why is this a class method? It seems to me that it could happen at the instance level, although given our small number of tests, I suppose it's hard to say.

I might also not make it an abstract method, come to think of it. Having a default that uses a default SUTOptions seems fine by me.

I thought it might be useful for it to be a class method so that we could get information without having to instantiate a test object, which might be expensive. But it definitely doesn't have to be a class method.

And yeah you're right about getting rid of the abstract portion. I'll add a default.

Oh, yeah, if that's something we're doing, that would be a good reason. But if we don't need it now, I'd say stick with an instance method and we can always rejigger later.

Okay I think I did it.
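
The outcome of this thread (an instance method with a default instead of an abstract class method) might look roughly like the sketch below. The class names BaseTest, PlainTest, and LongAnswerTest and the max_tokens field are assumptions made for illustration; only get_annotators and sut_options appear in the diff.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SUTOptions:
    # Hypothetical field, for illustration only.
    max_tokens: int = 100


class BaseTest(ABC):
    @classmethod
    @abstractmethod
    def get_annotators(cls) -> List[str]:
        """Still abstract: every test must name its annotators."""

    def sut_options(self) -> SUTOptions:
        """Not abstract: tests that don't care get default SUTOptions."""
        return SUTOptions()


class PlainTest(BaseTest):
    @classmethod
    def get_annotators(cls) -> List[str]:
        return []


class LongAnswerTest(BaseTest):
    @classmethod
    def get_annotators(cls) -> List[str]:
        return []

    def sut_options(self) -> SUTOptions:
        # Override only where a test needs non-default options.
        return SUTOptions(max_tokens=500)


plain_options = PlainTest().sut_options()
long_options = LongAnswerTest().sut_options()
```

Keeping it an instance method leaves room to derive the options from constructor arguments later, at the cost of needing an instance to ask the question.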

wpietri · 2025-03-13T21:26:20Z

src/modelgauge/tests/safe_v1.py

+    @classmethod
+    def sut_options(cls) -> SUTOptions:
+        """Returns the SUT options that are supplied in each test item."""
+        return SUTOptions(


This makes a new one each time, which is fine if it's not called often, but "once per test item" is for us a pretty frequent call.

Yeah that's true. I could make it a class property instead.
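
One possible way to avoid building a new object per call, sketched under the assumption that the options object is immutable and can therefore be shared safely. SafeTestSketch and the field values are hypothetical, not the actual safe_v1.py code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SUTOptions:
    # Hypothetical fields and values, for illustration only.
    max_tokens: int = 100
    temperature: float = 0.0


class SafeTestSketch:
    # Built once when the class is defined, then shared by every call.
    _SUT_OPTIONS = SUTOptions(max_tokens=500, temperature=0.01)

    def sut_options(self) -> SUTOptions:
        """Returns the shared SUT options without allocating a new object."""
        return self._SUT_OPTIONS


first = SafeTestSketch().sut_options()
second = SafeTestSketch().sut_options()
```

Freezing the dataclass matters here: a shared mutable object would let one caller's change leak into every other test item.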

tests/modelgauge_tests/test_prompt_sets.py

bkorycki · 2025-03-14T21:00:52Z

@wpietri @rogthefrog I ended up going with Roger's suggestion of removing sut_options from TestItem. I tried doing William's suggestion but you can't really point to non-pydantic models in a pydantic model.

wpietri

Thanks for all the cleanup work!

bkorycki added 2 commits March 13, 2025 13:04

class method sut_options

248b3c3

trim test wrapper

fc6ff0e

bkorycki requested review from wpietri and rogthefrog March 13, 2025 20:27

bkorycki requested a review from a team as a code owner March 13, 2025 20:27

bkorycki temporarily deployed to Scheduled Testing March 13, 2025 20:27 — with GitHub Actions Inactive

rogthefrog reviewed Mar 13, 2025

wpietri requested changes Mar 13, 2025

put back test

59db6d1

bkorycki temporarily deployed to Scheduled Testing March 13, 2025 21:39 — with GitHub Actions Inactive

wpietri approved these changes Mar 13, 2025

bkorycki added 4 commits March 13, 2025 16:55

sut_options is an instance method

6af4e1c

First try: SUTOptions is separate from TestItem

a3c976c

finish refactor

478e8d4

Remove another sut_options from testitem

4fd3a1b

bkorycki had a problem deploying to Scheduled Testing March 14, 2025 20:59 — with GitHub Actions Failure

bkorycki had a problem deploying to Scheduled Testing March 14, 2025 20:59 — with GitHub Actions Error

bkorycki had a problem deploying to Scheduled Testing March 14, 2025 20:59 — with GitHub Actions Failure

Fix plugins

fbe3e7b

bkorycki temporarily deployed to Scheduled Testing March 14, 2025 21:15 — with GitHub Actions Inactive

bkorycki requested a review from wpietri March 14, 2025 21:15

bkorycki temporarily deployed to Scheduled Testing March 14, 2025 21:15 — with GitHub Actions Inactive

bkorycki requested a review from rogthefrog March 14, 2025 21:15

wpietri approved these changes Mar 15, 2025

bkorycki merged commit 668c93b into main Mar 17, 2025
4 checks passed

bkorycki deleted the sut-options-test-level branch March 17, 2025 15:59

github-actions bot locked and limited conversation to collaborators Mar 17, 2025
