Random integration test failures for OPERA_RTC_S1 API validation #2742

@jtherrmann

For #2724 (the opera-rtc branch) I'm seeing apparently random integration test failures for OPERA_RTC_S1 API validation.

One failure: https://github.com/ASFHyP3/hyp3/actions/runs/14805284454/job/41572395801

=========================== short test summary info ============================
FAILED tests/test_api/test_opera_rtc_s1.py::test_opera_rtc_s1_static_coverage - assert 200 == <HTTPStatus.BAD_REQUEST: 400>
 +  where 200 = <WrapperTestResponse streamed [200 OK]>.status_code
 +  and   <HTTPStatus.BAD_REQUEST: 400> = HTTPStatus.BAD_REQUEST
================= 1 failed, 202 passed, 830 warnings in 59.70s =================

Two failures: https://github.com/ASFHyP3/hyp3/actions/runs/14805283725/job/41572394216

=========================== short test summary info ============================
FAILED tests/test_api/test_opera_rtc_s1.py::test_opera_rtc_s1_static_coverage - assert 200 == <HTTPStatus.BAD_REQUEST: 400>
 +  where 200 = <WrapperTestResponse streamed [200 OK]>.status_code
 +  and   <HTTPStatus.BAD_REQUEST: 400> = HTTPStatus.BAD_REQUEST
FAILED tests/test_api/test_opera_rtc_s1.py::test_opera_rtc_s1_dem_coverage - assert 200 == <HTTPStatus.BAD_REQUEST: 400>
 +  where 200 = <WrapperTestResponse streamed [200 OK]>.status_code
 +  and   <HTTPStatus.BAD_REQUEST: 400> = HTTPStatus.BAD_REQUEST
============ 2 failed, 201 passed, 833 warnings in 62.69s (0:01:02) ============

These tests assert that known-bad granules fail the appropriate validation checks (static coverage and DEM coverage). The failures show that the HyP3 API returned a 200 response for those granules when it was expected to return a 400.

Some possibilities:

  1. It could be a problem with the tests or test configuration. For example, the responses mocking from the unit tests might not be getting cleaned up (I attempted to fix this as part of #2724, Add OPERA_RTC job type, but there could still be an issue), though I wouldn't expect that to cause random failures like this. Or one of the monkeypatch validator mocks from test_opera_rtc_s1_validation_order might not be getting cleaned up reliably.

  2. It could be a problem with the HyP3 API itself not returning the right status code for some reason (despite the validator function raising the appropriate exception). This seems the least likely explanation.

  3. It could be a problem with the validator functions themselves. E.g. for check_opera_rtc_s1_static_coverage, if CMR was occasionally returning a static granule entry in the _has_opera_rtc_s1_static_coverage helper function for the bad test granule, that would cause the test to fail. Or if _get_cmr_metadata was failing to return the CMR entry for the bad test granule, I believe check_dem_coverage would just succeed because it wouldn't actually be checking any granules, which would cause the test to fail. Either of these explanations would seem more likely if we hadn't seen both of these tests fail; it seems unlikely that CMR malfunctioned in two different ways as part of the same test run.

  4. It could be a problem with GitHub Actions, though I don't know what that problem would be.
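
To make the second scenario in (3) concrete, here is a minimal sketch of how a validator that loops over granule metadata passes vacuously when the metadata lookup returns nothing. This is not the actual HyP3 code: check_coverage_sketch, ValidationError, and the covered field are hypothetical names used only for illustration.

```python
class ValidationError(Exception):
    """Hypothetical stand-in for the exception the API would map to a 400."""


def check_coverage_sketch(granule_metadata: list[dict]) -> None:
    """Raise if any granule lacks coverage; an empty list raises nothing."""
    for granule in granule_metadata:
        if not granule['covered']:
            raise ValidationError(f"{granule['name']} lacks coverage")


def validates(granule_metadata: list[dict]) -> bool:
    """Return True if the sketch validator accepts the given metadata."""
    try:
        check_coverage_sketch(granule_metadata)
    except ValidationError:
        return False
    return True


bad_granule = {'name': 'BAD_GRANULE', 'covered': False}

# The bad granule is rejected when its metadata is present...
rejected_when_present = not validates([bad_granule])
# ...but an empty lookup result means no granule is ever checked.
accepted_when_missing = validates([])
```

If the real validator behaves like this sketch, an explicit check that the metadata lookup returned one entry per requested granule would turn a silent pass into a visible error.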

My bet is on (1) being the place to start. But to help rule out (3), we could repeatedly submit the same batch of jobs with validate_only and confirm that the same granules fail those two validators each time, for as long as it takes us to feel confident that those validators are working correctly.
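
The repeated validate_only check could be sketched roughly as follows. The submit argument is a hypothetical callable that submits one granule with validate_only set and returns the HTTP status code; how it talks to the API is deliberately left out, since that depends on the deployment and credentials. The function and parameter names are illustrative, not part of HyP3.

```python
import time
from collections import Counter, defaultdict
from typing import Callable, Iterable


def tally_validation_results(
    submit: Callable[[str], int],
    granules: Iterable[str],
    rounds: int,
    delay_seconds: float = 0.0,
) -> dict[str, Counter]:
    """Submit every granule `rounds` times and count the status codes seen."""
    granules = list(granules)
    results: dict[str, Counter] = defaultdict(Counter)
    for round_number in range(rounds):
        for granule in granules:
            results[granule][submit(granule)] += 1
        # Optionally wait between rounds to spread the checks over time.
        if delay_seconds and round_number < rounds - 1:
            time.sleep(delay_seconds)
    return dict(results)


def unstable_granules(results: dict[str, Counter]) -> list[str]:
    """Return the granules that produced more than one distinct status code."""
    return [granule for granule, counts in results.items() if len(counts) > 1]
```

Any granule reported by unstable_granules would point toward (3), since the same input produced different validation outcomes.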

To start investigating (1), we probably need to reproduce the issue first. I've manually re-run the full test suite several times and it passes every time. We could write a script to run the tests in a loop until there's a failure, just to see if we can reproduce the issue at all.
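
A minimal sketch of such a loop, assuming the suite can be started with a single command whose exit code is nonzero on failure (as pytest's is). The function name and the max_runs cap are my own choices for illustration.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_until_failure(command: Sequence[str], max_runs: int) -> Optional[int]:
    """Run `command` up to `max_runs` times.

    Return the 1-based number of the first failing run, or None if every
    run exited with status 0.
    """
    for run in range(1, max_runs + 1):
        completed = subprocess.run(command, capture_output=True, text=True)
        if completed.returncode != 0:
            print(f'Run {run} failed with exit code {completed.returncode}')
            # Show the tail of the output, where pytest prints its summary.
            print(completed.stdout[-2000:])
            return run
    return None
```

For example, run_until_failure([sys.executable, '-m', 'pytest', 'tests/'], max_runs=50) would stop at the first failing run and print the end of its output. The tests/ path is an assumption; the invocation should match whatever the GitHub Actions workflow runs, since a different test selection or order could hide a mock cleanup problem.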

Labels: bug (Something isn't working)