Skip to content

chore: Update output data validations for frames and ctf #484

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

Bento007
Copy link
Contributor

@Bento007 Bento007 commented Apr 16, 2025

Description

resolves chanzuckerberg/cryoet-data-portal#1528

Validations for MDOC/Frames

  1. number of mdoc sections == number of frame files on s3 == number of items in frames table/metadata
  2. frame file names match file names in mdoc section
  3. tilt axis angle in mdoc file matches that in tilt series metadata (+/- 10 deg)
  4. tilt axis angle in mdoc file matches that in the alignment metadata [PSAP. in_plane_rotation] (+/- 10 deg)
  5. number of frames >= # of per section parameters

Validations for Frames Metadata

  1. When isGainCorrected == False, a Gains entity exists for the run
  2. max(acquisitionOrder) <= number of frames - 1
  3. Sum of exposureDose of all frames associated with a tilt series == totalFlux+-1 of tilt series
  4. Sorting acquisitionOrder low-to-high and accumulatedDose low-to-high results in the same order

Validations for PerSectionParameter Metadata

  1. -180 <= astigmaticAngle <= 180
  2. majorDefocus > minorDefocus
  3. 0 <= phaseShift <= 2*pi
  4. maxResolution > 0
  5. rawAngle matches mdoc TiltAngle (+-10^-3 deg).
  6. 0 <= zIndex <= (z-Dimension of tilt series - 1)

Extra

  • generalize helper_angles_injection_errors angle_tolerance to work with more use cases.
  • Remove len(errors) == 0 assertion when multiple errors are caught. This reduces noise in the error message received.
  • Add frame_metadata and frames_meta_file pytest fixtures to facilitate testing.

- update poetry lock
…ose low-to-high results in the same order"

- add test for "Sum of exposureDose of all frames associated with a tilt series == totalFlux of tilt series"
- add test for "max(acquisitionOrder) < number of frames - 1"
- add test for "When isGainCorrected == False, a Gains entity exists for the run"
- test_mdoc_frame_paths covers "frame file names match file names in mdoc section"
- updated test_mdoc_frames to cover "number of mdoc sections == number of frame files on s3 == number of items in frames table/metadata"
- There is no per_section_alignment_parameter.rotation_angle so I'm using tilt_angle.
- test_tilt_axis_angle
- add test for "number of frames >= # of per section parameters"
- -180 <= astigmaticAngle <= 180
- majorDefocus > minorDefocus
- 0 <= phaseShift <= 2*pi
- maxResolution > 0
- rawAngle matches mdoc angle as defined above
- 0 <= zIndex <= (z-Dimension of tilt series - 1)
@Bento007 Bento007 changed the title tsmith/1528-frame-ctf-valdiation chore: Update output data validations for frames and ctf Apr 16, 2025
@Bento007
Copy link
Contributor Author

tilt axis angle in mdoc file matches that in the alignment metadata [PSAP.rotation_angle] (+/- 10 deg)

@uermel I don't see rotation_angle in Per Section Alignment Parameter(PSAP.rotation_angle). What value in PSAP should be compared to tiltseries_tilt_axis

@Bento007
Copy link
Contributor Author

Bento007 commented Apr 17, 2025

max(acquisitionOrder) < number of frames - 1

@uermel Should this be "max(acquisitionOrder) <= number of frames - 1."

@uermel
Copy link
Contributor

uermel commented Apr 18, 2025

tilt axis angle in mdoc file matches that in the alignment metadata [PSAP.rotation_angle] (+/- 10 deg)

@uermel I don't see rotation_angle in Per Section Alignment Parameter(PSAP.rotation_angle). What value in PSAP should be compared to tiltseries_tilt_axis

@Bento007 It should be in_plane_rotation.

@uermel
Copy link
Contributor

uermel commented Apr 18, 2025

max(acquisitionOrder) < number of frames - 1

@uermel Should this be "max(acquisitionOrder) <= number of frames - 1."

@Bento007 Yes, that's right.

@Bento007
Copy link
Contributor Author

testing with the following datasets 10014, 10447, 10085, 10156

10014

FAILED: "Tiltseries: sum of exposureDose of all frames associated with a tilt series == totalFlux of tilt series"
tests/test_tiltseries.py:130 (TestTiltseries.test_exposure_dose[10014-pda2021-01-12-1-100])
45.001599999999996 != 50.0

Expected :50.0
Actual :45.001599999999996

@uermel
Copy link
Contributor

uermel commented Apr 18, 2025

testing with the following datasets 10014, 10447, 10085, 10156

10014

FAILED: "Tiltseries: sum of exposureDose of all frames associated with a tilt series == totalFlux of tilt series" tests/test_tiltseries.py:130 (TestTiltseries.test_exposure_dose[10014-pda2021-01-12-1-100]) 45.001599999999996 != 50.0

Expected :50.0 Actual :45.001599999999996

@Bento007 Interesting! The validations already find mistakes! The total flux should indeed be 45 for this (based on the methods in the associated publication), I need to fix that in the config. Forgot to update total flux and only updated dose rate.

Can we please compare this with a pytest.approx with a precision of ~1? Due to the precision of the values for dose rate this may never match exactly.

@Bento007 Bento007 requested review from manasaV3, Copilot and uermel and removed request for manasaV3 April 21, 2025 20:59
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates and expands the data validations for MDOC/Frames, Frames Metadata, and PerSectionParameter Metadata while adding additional pytest fixtures to support testing.

  • Adds new tests for tiltseries and frame validations including exposure dose, per section parameters, tilt axis angle comparisons, etc.
  • Updates error handling in tests to raise AssertionErrors instead of simple assertions.
  • Enhances fixtures and helper functions to support new file filtering and metadata loading.

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
ingestion_tools/scripts/data_validation/standardized/tests/test_tiltseries.py Adds tests for exposure dose, per section parameter count, astigmatic angle, phase shift, max resolution, raw angle, and z_index validations.
ingestion_tools/scripts/data_validation/standardized/tests/test_frame.py Introduces tests for gain correction, acquisition order, and sorting of frames.
ingestion_tools/scripts/data_validation/standardized/tests/test_alignments.py Implements tests for tilt angle consistency between mdoc and alignment metadata with improved error handling.
ingestion_tools/scripts/data_validation/standardized/fixtures/path.py Updates file filtering to exclude .json files in addition to .mdoc.
ingestion_tools/scripts/data_validation/standardized/fixtures/data.py Adds a fixture to load frame metadata from frames_metadata.json.
ingestion_tools/scripts/data_validation/shared/helper/tiltseries_helper.py Updates tilt axis angle test to verify consistency with a 10° tolerance.
ingestion_tools/scripts/data_validation/shared/helper/tilt_angles_helper.py Adjusts error raising for raw tilt tests.
ingestion_tools/scripts/data_validation/shared/helper/mdoc_helper.py Enhances mdoc tests to include verification of frame metadata length and more robust error reporting.
ingestion_tools/scripts/data_validation/shared/helper/angles_helper.py Updates helper to utilize the passed angle_tolerance parameter.

@Bento007
Copy link
Contributor Author

A couple of the test datasets are failing but they look like actual errors with the data and not the tests

@Bento007 Bento007 closed this Apr 21, 2025
@Bento007 Bento007 reopened this Apr 21, 2025
if filesystem.exists(dst):
return dst
else:
pytest.fail(f"Frames metadata file not found: {dst}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we update the message such that, it informs that Frames directory exists but the metadata.json doesn't?

pytest.skip("Alignment metadata missing per_section_alignment_parameters.")
errors = []
for i, psap in enumerate(per_section_alignment_parameters):
if (in_plane_rotation := psap.get("in_plane_rotation")) is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc, the in_plane_rotation is a 2x2 matrix. should there be a conversion for it before the following check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific way we do that conversion?

frame_metadata: Dict,
gain_headers: Dict[str, Union[List[tifffile.TiffPage], MrcInterpreter]], # this is skipped if it is not found
):
if frame_metadata.get("is_gain_corrected") is False: # todo is none considered false?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would treat None as False for this case.

Copy link
Contributor Author

@Bento007 Bento007 Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By that you mean the if condition should be true, if is_gain_corrected is None, which would be equivalent to

Suggested change
if frame_metadata.get("is_gain_corrected") is False: # todo is none considered false?
if not frame_metadata.get("is_gain_corrected"):

@allure.title("Frames: max(acquisitionOrder) <= number of frames -1")
def test_max_acquisition_order(self, frame_metadata: Dict):
acquisition_order_max = max(f.get("acquisition_order", 0) for f in frame_metadata["frames"])
assert acquisition_order_max <= len(frame_metadata["frames"]) -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: space after - ?

def test_exposure_dose(self, frame_metadata: Dict, tiltseries_metadata: Dict):
assert sum(f.get("exposure_dose", 0) for f in frame_metadata["frames"]) == pytest.approx(tiltseries_metadata["total_flux"], abs=1)

@allure.title("PerSectionParameters: number of frames >= # of per section parameters.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The label of what it is testing against like "PerSectionParameters" is very neat! 😄

- fixture frames_meta_file failure message indicate the frames directory exists.
- frame_metadata["is_gain_corrected"] is None is treated as
- move mdoc_tilt_axis_angle outside TiltSeriesHelper to make it easier to share.
- Add function to convert matrix to degress to be used in test_mdoc_tilt_axis_angle_in_alignment_per_section_alignment_parameters

Signed-off-by: Bento007 <[email protected]>
@Bento007 Bento007 requested a review from manasaV3 April 24, 2025 23:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update output data validations for frames and ctf
3 participants