
Conversation

@epugh (Collaborator) commented Jun 14, 2025

Description

This addresses Issue #46, adding the ability to import documents and evaluation results.

This is a re-implementation of the work that @sstults did in #73, after the discussion about using the POST HTTP verb. It also updates to the latest data structures.

Issues Resolved

Closes #46

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@martin-gaievski (Member) left a comment

I think this implementation unnecessarily overcomplicates things. It closely follows the sequence from the PUT action, but there we actually run the experiment. For the POST action the logic should be much simpler.
At a high level, these are the over-engineered steps/areas:

 public class PostExperimentTransportAction {
    
    // Creates experiment with PROCESSING status
    Experiment initialExperiment = new Experiment(
        id, timestamp, type, AsyncStatus.PROCESSING, // WHY?
        querySetId, searchConfigurationList, judgmentList, size, new ArrayList<>()
    );
    
    // Background async processing that's not needed
    private void triggerAsyncProcessing(String experimentId, PostExperimentRequest request) {
        // This should be synchronous!
    }
    
    // Complex atomic counters for tracking "pending" operations
    AtomicInteger pendingQueries = new AtomicInteger(request.getEvaluationResultList().size());
    AtomicBoolean hasFailure = new AtomicBoolean(false);
    
    // Multiple status updates PROCESSING → COMPLETED
    private void updateFinalExperiment(...) {
        // Just create it as COMPLETED from the start!
    }
}

You can improve the implementation by following this sequence:

protected void doExecute() {
    try {
        // Validate input (synchronously)
        validateRequest(request);
        
        // Create experiment as COMPLETED (data already processed)
        String experimentId = UUID.randomUUID().toString();
        List<Map<String, Object>> results = new ArrayList<>();
        
        // Persist evaluation results (synchronously) 
        for (Map<String, Object> evalData : request.getEvaluationResultList()) {
            EvaluationResult result = createEvaluationResult(evalData, request);
            evaluationResultDao.putEvaluationResultSync(result); // Sync call
            results.add(createResultSummary(result));
        }
        
        // Create final experiment with COMPLETED status
        Experiment experiment = new Experiment(
            experimentId, TimeUtils.getTimestamp(), request.getType(),
            AsyncStatus.COMPLETED, // Already complete!
            request.getQuerySetId(), request.getSearchConfigurationList(),
            request.getJudgmentList(), request.getSize(), results
        );
        
        // Persist and return
        experimentDao.putExperiment(experiment, listener);
        
    } catch (Exception e) {
        listener.onFailure(new SearchRelevanceException("Import failed", e, RestStatus.BAD_REQUEST));
    }
}


try {
    String id = UUID.randomUUID().toString();
    LOGGER.warn("Experiment ID: " + id);
Member

warn is a bit too much; info or debug suits better here.

Collaborator Author

makes sense, doing it now.
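For reference, a minimal sketch of the suggested change, assuming the plugin's Log4j2-style logger (the parameterized form also avoids building the message string when the level is disabled):

// sketch: lower the level and use a parameterized message
String id = UUID.randomUUID().toString();
LOGGER.debug("Experiment ID: {}", id);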

XContentParser parser = request.contentParser();
Map<String, Object> source = parser.map();

String querySetId = (String) source.get("querySetId");
Member

please add validation for missing or blank values

Collaborator Author

done...
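A minimal sketch of what such a check could look like, assuming validation happens right after parsing (the exception choice is illustrative; OpenSearch maps IllegalArgumentException to a 400 response):

Map<String, Object> source = parser.map();
String querySetId = (String) source.get("querySetId");
// reject missing or blank values before doing any work
if (querySetId == null || querySetId.trim().isEmpty()) {
    throw new IllegalArgumentException("querySetId is required and must not be blank");
}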

@epugh (Collaborator Author) commented Jul 21, 2025

@sstults can you tackle the feedback from @martin-gaievski on this one when you are back in the office?

@epugh (Collaborator Author) commented Jul 23, 2025

@sstults I went through again, and yes, I think @martin-gaievski flagged some other items separate from your previous PR. Also, I added you as a collaborator to epugh/search-relevance so you can update the PR directly.

@sstults commented Jul 24, 2025

Still working on the first comment, but I've addressed the other two.

@sstults commented Jul 28, 2025

@martin-gaievski I believe I've addressed the issues found in the review. Could you re-review?

@martin-gaievski (Member) left a comment

In addition to the few comments below, one missing thing is a proper integration test. It can be a simple happy path, but for a new API we need to have one.
Because it's a new API we need to have an app sec review, and we cannot merge to the main branch before it is signed off by the security team. We can only merge to a feature branch. Do you have a proper design doc and threat model in order to start the app sec process?

UPD: As per later discussion, let's focus on addressing the code-level comments (adding validations, tests, etc.). Once this is completed the PR should be good to merge, and the design and other technical details can be addressed in parallel. I've created a PR with a template of the technical document we'll be looking for, #201; it's still in review, but I don't expect any fundamental-level requests as part of that PR's review.
^ @epugh @sstults


@Override
public ActionRequestValidationException validate() {
    return null;
Member

I think you need to have validations for all required fields, something like:

    ActionRequestValidationException validationException = null;
    
    if (type == null) {
        validationException = addValidationError("type is required", validationException);
    }
    if (querySetId == null || querySetId.isEmpty()) {
        validationException = addValidationError("querySetId is required", validationException);
    }
    if (searchConfigurationList == null || searchConfigurationList.isEmpty()) {
        validationException = addValidationError("searchConfigurationList is required", validationException);
    }
    if (evaluationResultList == null || evaluationResultList.isEmpty()) {
        validationException = addValidationError("evaluationResultList is required", validationException);
    }
    
    return validationException;

Collaborator Author

I can imagine situations where we don't actually need things like a SearchConfiguration; however, I bet our frontend UI will blow up, so we should require them until we have a strong use case to NOT require them!

I've added checks for empty values, but I'm leaving the nullness check to the annotations.

List<Map<String, Object>> results = new ArrayList<>();

// Persist evaluation results (synchronously)
for (Map<String, Object> evalData : request.getEvaluationResultList()) {
Member

Without checking for a limit this loop is potentially dangerous: each result is written synchronously, potentially blocking the transport thread, and we don't have any timeout protection.
My recommendations:

  • add a limit for the number of elements in the list (see the sketch after the reply below)
  • add a timeout (optional)
  • implement a queue-based approach for large imports (super optional)

Collaborator Author

I think we can reference SEARCH_RELEVANCE_QUERY_SET_MAX_LIMIT (and fix the JavaDocs!).
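A minimal sketch of the limit check, reusing the settings accessor shown later in this thread; the exception constructor shape is an assumption:

// guard before the synchronous write loop
int maxResults = settingsAccessor.getMaxQuerySetAllowed();
if (request.getEvaluationResultList().size() > maxResults) {
    listener.onFailure(new SearchRelevanceException(
        "evaluationResultList exceeds the maximum of " + maxResults + " entries",
        RestStatus.BAD_REQUEST));
    return;
}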

);

// Persist and return
experimentDao.putExperiment(experiment, listener);
Member

What is the behavior in case the last of 100 evaluation results fails? I assume 99 evaluation records will be stored, the last evaluation record is skipped, and the experiment document is skipped too. We should do better in this case. Unfortunately we cannot have transactional semantics, or rather it's very hard to implement. But because we saved some of the evaluation documents, we need to store the experiment document with status ERROR instead of COMPLETED.
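A sketch of the suggested behavior, assuming AsyncStatus has an ERROR value as the comment implies (the other names follow the code quoted earlier in the review):

// persist the experiment with ERROR status if any evaluation write failed,
// so a partial import is visible instead of silently lost
AsyncStatus status = hasFailure ? AsyncStatus.ERROR : AsyncStatus.COMPLETED;
Experiment experiment = new Experiment(
    experimentId, TimeUtils.getTimestamp(), request.getType(), status,
    request.getQuerySetId(), request.getSearchConfigurationList(),
    request.getJudgmentList(), request.getSize(), results
);
experimentDao.putExperiment(experiment, listener);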

@andrross (Member)

> Because it's a new API we need to have an app sec review, and we cannot merge to the main branch before it is signed off by the security team. We can only merge to a feature branch. Do you have a proper design doc and threat model in order to start the app sec process?

I don't think we should block this PR on a sign-off from a security team separate from the OpenSearch Project. Many companies use OpenSearch, and they will all have different security requirements and processes. It's not tenable to put any of those processes into the critical path of committing code into open-source.

That being said, I think it's totally fair game for the OpenSearch Project to implement certain standards and requirements related to security (e.g. new APIs must include a high level architecture and threat model as a part of design and implementation). I'm tagging @getsaurabh02 @msfroh from the build interest group as "compliance with security standards" is part of their charter. Do you have anything to add here?

@martin-gaievski I'm a little concerned that this is something of an 11th hour requirement from @epugh's perspective. How can we get this merged before the 3.2 release? Are there any existing artifacts that Eric can use as an example for this case to compile in the next couple days? I believe this plugin was released for the first time in the 3.1 distribution and it introduced new APIs. What design docs/threat models were used for that case? I would think this case should just be a very small update to those existing artifacts.

@martin-gaievski (Member) commented Jul 30, 2025

>> Because it's a new API we need to have an app sec review, and we cannot merge to the main branch before it is signed off by the security team. We can only merge to a feature branch. Do you have a proper design doc and threat model in order to start the app sec process?
>
> I don't think we should block this PR on a sign-off from a security team separate from the OpenSearch Project. Many companies use OpenSearch, and they will all have different security requirements and processes. It's not tenable to put any of those processes into the critical path of committing code into open-source.
>
> That being said, I think it's totally fair game for the OpenSearch Project to implement certain standards and requirements related to security (e.g. new APIs must include a high level architecture and threat model as a part of design and implementation). I'm tagging @getsaurabh02 @msfroh from the build interest group as "compliance with security standards" is part of their charter. Do you have anything to add here?
>
> @martin-gaievski I'm a little concerned that this is something of an 11th hour requirement from @epugh's perspective. How can we get this merged before the 3.2 release? Are there any existing artifacts that Eric can use as an example for this case to compile in the next couple days? I believe this plugin was released for the first time in the 3.1 distribution and it introduced new APIs. What design docs/threat models were used for that case? I would think this case should just be a very small update to those existing artifacts.

I think the most effective way to address the requirements is to put those technical details and security-related answers into an RFC; there are many examples of those with the RFC label, even in this repository: RFC1, RFC2, RFC3. Eric and team can have an extended section regarding security issues as part of the RFC: what the potential risks are (like limiting the type and size of user-provided content), how they are addressed in the current implementation, any new feature flags, etc.
Regarding the features of search-relevance that were released in 3.1, there are RFCs with those details for some of the features: ref1, ref2, ref3. I'm not sure if you're aware of this, but most content of this repository was migrated from the o19s repository close to the 3.1 code freeze date (ref). That repo was deleted after the migration, including all GH issues that may have had those technical details. In 3.1 all features were behind a feature flag in "opt in" mode, partially because of the in-process security review (ref). It was flipped to "opt out" only recently, after all security concerns were addressed.

@andrross (Member)

> what are potential risks (like limiting the type and size of user-provided content)

@martin-gaievski If you have specific technical concerns like this, then as a maintainer you can and should ensure they are addressed before merging. Similarly if you have specific concerns about making this behavior available by default then you can propose to make it opt-in via a feature flag. My biggest concern is preventing code that is otherwise ready from being merged pending an opaque review from a specific company.

@martin-gaievski (Member)

>> what are potential risks (like limiting the type and size of user-provided content)
>
> @martin-gaievski If you have specific technical concerns like this, then as a maintainer you can and should ensure they are addressed before merging. Similarly if you have specific concerns about making this behavior available by default then you can propose to make it opt-in via a feature flag. My biggest concern is preventing code that is otherwise ready from being merged pending an opaque review from a specific company.

That's what I put into my recent review, for instance this comment.
Looking at the SRW experiments in general, I cannot find any document that describes the design, event flow, and other similar details (@epugh or @sstults, correct me if I'm wrong about this). That is my main reason for requesting a GH issue (or similar doc). While this PR extends the workflow for experiments in SRW, it's hard to understand the implications of this change without seeing the whole picture.
I want to clarify that my intention is not to prevent code from being merged unnecessarily. Rather, my concerns are focused on ensuring that the code being integrated is ready and of high quality. As maintainers, our responsibility is to keep the system stable and free from regressions. Having proper documentation and understanding the full design helps us achieve this goal. I believe that addressing these technical concerns through collaborative discussion will lead to a better product for all users.

@epugh (Collaborator Author) commented Jul 31, 2025

I will take the template that was added, https://github.com/opensearch-project/search-relevance/blob/main/docs/DESIGN_TEMPLATE.md, go through it, and add details to cover this PR. @martin-gaievski are we thinking this document goes into the PR as text, or do we create an issue with the text, or do we add a new doc at docs/PR-120-TECHNICAL-DESIGN.md? Or is it an RFC? What are we thinking the next steps are?

@martin-gaievski (Member)

> I will take the template that was added, https://github.com/opensearch-project/search-relevance/blob/main/docs/DESIGN_TEMPLATE.md, go through it, and add details to cover this PR. @martin-gaievski are we thinking this document goes into the PR as text, or do we create an issue with the text, or do we add a new doc at docs/PR-120-TECHNICAL-DESIGN.md? Or is it an RFC? What are we thinking the next steps are?

In general, a new GitHub issue is the best option, which would look very much like an RFC. This approach makes sense for relatively big features with multiple components and user-facing changes. For smaller features or additions to existing features, this can be part of the feature request (if available at the time of creation, or added as a comment) or most realistically, part of the PR description.

For this PR, because there appears to be no overall design documentation for Experiments in SRW (please correct me if I'm wrong and I've simply missed the relevant document), I believe we should create a combined RFC with:

  • A high-level overview of the Experiment flow in SRW (with fewer details, primarily for context to understand why we need the import API)
  • Details about the functionality added in this PR

To meet the upcoming 3.2 deadline, we can start by adding details related to the new import experiment API directly to this PR (I think most details are already there, just make sure everything is up to date). This should be sufficient to merge the PR. The broader RFC can be developed in parallel and shouldn't block this PR for the 3.2 release.

@epugh (Collaborator Author) commented Jul 31, 2025

The design for the Experiments was covered in the Search Relevance Workbench RFC. A lot of the questions that the design template highlights, though, like "Threat Analysis", are not covered at all in the RFC.

I will take your suggestion and, for this PR, modify the PR and add the details out of the DESIGN_TEMPLATE.md and see how it goes!

@martin-gaievski (Member)

> The design for the Experiments was covered in the Search Relevance Workbench RFC. A lot of the questions that the design template highlights, though, like "Threat Analysis", are not covered at all in the RFC.
>
> I will take your suggestion and, for this PR, modify the PR and add the details out of the DESIGN_TEMPLATE.md and see how it goes!

Yes, the [Search Relevance Workbench RFC](opensearch-project/OpenSearch#17735) answers the high-level questions, thanks for sharing the link. Then an update in this PR only should be sufficient.

@epugh (Collaborator Author) commented Aug 2, 2025

@martin-gaievski I wanted to draw your attention to the separate PR that @sstults linked with the design doc: #205. I know we talked about it being in the GitHub issue as well. How does this look towards getting this reviewed and merged? I'm aware of the narrowing window we have for getting this in for 3.2! @sstults has also promised to get those last review comments dealt with this weekend.

@epugh epugh requested a review from wrigleyDan August 2, 2025 11:00
@epugh (Collaborator Author) commented Aug 2, 2025

Also, I think I am just realizing that I need two approvals in this repo for a merge? I'm going to tag more maintainers to get reviews...

@epugh epugh requested review from fen-qin and heemin32 August 2, 2025 11:03
@msfroh commented Aug 2, 2025

> Similarly if you have specific concerns about making this behavior available by default then you can propose to make it opt-in via a feature flag.

I would even argue that if a company has an internal review process that needs to be followed before exposing a new feature that has been accepted into the project following the normal code review process, the onus should be on that company to leave the feature off until they do their review. That is, I would make it opt-out instead of opt-in.

Of course, asking for the ability to easily disable the feature (maybe via a static node-level flag) is reasonable. Also, there's nothing stopping anyone from patching out a feature on their internal branch.
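A sketch of such a flag as a static node-level setting; the setting name here is hypothetical, but Setting.boolSetting with Property.NodeScope is the standard OpenSearch plugin mechanism for this:

import org.opensearch.common.settings.Setting;

// defaults to enabled (opt-out), as argued above; operators can turn it off per node
public static final Setting<Boolean> EXPERIMENT_IMPORT_ENABLED = Setting.boolSetting(
    "plugins.search_relevance.experiment_import.enabled",
    true,
    Setting.Property.NodeScope
);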

@sstults sstults force-pushed the use_post_to_import_experiment_results branch from 88988f3 to a46c34c Compare August 2, 2025 18:55
@fen-qin (Collaborator) commented Aug 2, 2025

Would you like to rebase from the current main branch?

epugh added 3 commits August 2, 2025 18:24
Re-implementation of the work that @sstults did in opensearch-project#73 that uses a POST verb to set up the import instead of an ExperimentType.POINTWISE_EVALUATION_IMPORT.

Signed-off-by: Eric Pugh <[email protected]>
Signed-off-by: Eric Pugh <[email protected]>
@epugh (Collaborator Author) commented Aug 25, 2025

I did some testing, and what we import doesn't mirror what we generate when directly running an experiment. I am going to work on this PR for a bit, but if it gets too confusing I may start yet another PR as a fresh attempt. Ugh.

@fen-qin (Collaborator) commented Sep 19, 2025

@epugh
Can you confirm that this PR is up to date and ready for final review, since you mentioned:

> I may start yet another PR as a fresh attempt

Because this feature aims to import external experiment results into SRW, we'd like to evaluate the risk carefully:

  • There are risks of resource exhaustion due to missing payload and field-size validation.
  • There are not enough restrictions/validations for external experiment results in the current code, especially at the API boundary.
  • Would you like to add explicit limits/checks for payload size, result count, and field size to improve robustness and security?

Let me share the validation check items:

1. Number of Evaluation Results

The RestPostExperimentAction parses the entire input request body using parser.map(), which deserializes the whole JSON payload into memory. Large evaluation results can exhaust memory or processing resources, leading to denial-of-service (DoS) or node instability. Add explicit checks for evaluation result size, either via OpenSearch settings or directly in the REST handler (e.g., check request.content().length()). A hypothetical guard is sketched below.
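Along those lines, this illustrative guard would run in the REST handler before parser.map() materializes the payload (the 10 MB cap is a placeholder, not an agreed value):

private static final long MAX_IMPORT_BYTES = 10 * 1024 * 1024; // placeholder limit

// reject oversized bodies before deserializing the whole payload
if (request.content().length() > MAX_IMPORT_BYTES) {
    throw new IllegalArgumentException("import payload exceeds " + MAX_IMPORT_BYTES + " bytes");
}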

As an example, we introduced a setting for imported query set size; we can extend it to be more general, or add another size limit for external experiment results:

int maxLimit = settingsAccessor.getMaxQuerySetAllowed();
int processCount = Math.min(evaluationResultList.size(), maxLimit);

2. Size of Individual Fields

There is minimal validation of the size of individual fields (e.g., a single searchText, elements of metrics, document IDs, etc.). The REST layer does not check for maximum string lengths or map/list sizes within each result. Large individual fields (e.g., a searchText with megabytes of content, or extremely large metric arrays) can cause excessive memory or storage use, slow processing, or even failures in downstream DAO layers. Can you define and enforce maximum allowed sizes for key fields in the REST handler or the validate() method?

As an example, we added TextValidationUtil for imported query sets (https://github.com/opensearch-project/search-relevance/blob/main/src/main/java/org/opensearch/searchrelevance/utils/TextValidationUtil.java); we can extend it to be more general, or add more functions for external experiment results.
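A sketch of a field-length guard in the spirit of TextValidationUtil; the method name and the example cap are assumptions for illustration:

private static void validateFieldLength(String fieldName, String value, int maxLength) {
    if (value != null && value.length() > maxLength) {
        throw new IllegalArgumentException(
            fieldName + " exceeds the maximum length of " + maxLength + " characters");
    }
}

// e.g. validateFieldLength("searchText", searchText, 2000);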

@ajleong623 (Contributor) left a comment

(I meant for this review to be an inline one. I have added the proper review below. Feel free to dismiss this comment.) What were you thinking for the behavior of hasFailure when putting an evaluation? Should the loop continue to add experiment results, or would there be a short-circuit where null results are returned?

updateFinalExperiment(experimentId, request, finalResults, judgmentList, hasFailure.get());
}
}, error -> {
hasFailure.set(true);
Contributor

What were you thinking for the behavior of hasFailure when putting an evaluation? Should the loop continue to add experiment results, or would there be a short-circuit where null results are returned?

Collaborator Author

I think if anything fails, then we mark the import as failed. We don't really want a partial import; it wouldn't be useful. And it's not like we are loading data that is sooo big that you are like "oh good, I got 1 out of 3 terabytes of data in, and now I fix my bug and start with the remainder"! If anything goes wrong, just set the whole thing to failed.
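A sketch of that fail-fast behavior in the synchronous shape suggested earlier in the review (the DAO and helper names come from the review comments above; the early break is the point being illustrated):

AtomicBoolean hasFailure = new AtomicBoolean(false);
for (Map<String, Object> evalData : request.getEvaluationResultList()) {
    try {
        evaluationResultDao.putEvaluationResultSync(createEvaluationResult(evalData, request));
    } catch (Exception e) {
        hasFailure.set(true);
        break; // short-circuit: a partial import is not useful
    }
}
// then persist the experiment with a failed status if hasFailure is set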


1) The `movies_queryset.json` was hand extracted.

1) The `movies_queryset.json` was done using the Python script `extract_quepid_judgements.py`.
Contributor

I noticed that on lines 5 and 7, you mentioned that movies_queryset.json was both hand extracted and done using the Python script.

metrics = []
for doc in item.get("ratings", []):
    doc_id = doc.get("doc_id", "")
    rating = {
Contributor

What is the rating object used for?

@epugh epugh added v3.4.0 and removed v3.3.0 labels Sep 30, 2025
codecov bot commented Sep 30, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 0.00%. Comparing base (b4ded15) to head (e3f1d90).
⚠️ Report is 13 commits behind head on main.


Development

Successfully merging this pull request may close these issues:

[FEATURE] Add Import Capability for externally run Search Evaluation