Validator rewrite #53

Open
rzlim08 wants to merge 9 commits into main from rlim/port-dca-zarr-validator-2

rzlim08 commented Jun 1, 2026

Collaborator
  • Ports the parallel zarr validation framework from dca_helpers and adopts it
    as the OPS zarr validator.
  • Replaces pd.read_parquet with lazy polars in CellDataValidator.
  • Migrates the CLI to typer and removes three modules that no callers
    exercise.

Sorry for the big PR. It's mostly because this is a port.

rzlim08 changed the title from "Rlim/port dca zarr validator 2" to "Validator rewrite" on Jun 2, 2026
rzlim08 requested review from VBaratham and cathystoli on June 3, 2026 15:51

VBaratham left a comment

Collaborator

Sorry, just getting around to this. Looks good to me; I just flagged a few things we might want to update in either the validator or the schema itself.

Collaborator

Do we really want to remove all the cross-artifact validations?

Collaborator Author

I don't think this was wired into the main validation command. I can add it back in if wanted, though; it was just a bit confusing to me.

Collaborator

That is true - it wasn't wired in, which was probably an oversight on Claude's part. So I get why it was removed here. But I think we should add it back and wire it in, since the things it verifies really are required by the schema.

Collaborator Author

ok sounds good

Collaborator

Thanks!

Collaborator

Claude flagged a few validation changes between this and the new zarr validator - I think they are all fine, but can we add text to the schema specification to account for these? (SHOULD statements for warnings, MUST statements for errors)

- Chunk size: < 512 KB uncompressed → ERROR; < 1 MB → SHOULD warning. (none before)
- Shard size: ≥ 5 TB → ERROR; > 5 GB → warning. (none before)
- Downsampling factors across levels: T/C must be factor 1 → ERROR; spatial dims should be ~2 → warning. (none before)
- dtype recommendation: anything outside {uint8, uint16, uint32} → SHOULD warning. (none before)
- Compression recommendations: integer/float data on blosc → warning; zstd level > 3 → warning. (old only hard-errored on a non-allowlisted final codec)
- HCS acquisition FK: field acquisition id must appear in plate.acquisitions → ERROR. (none before)
- Cross-field uniformity: axes and level-0 chunk shape uniform across plate fields → warnings. (none before)
- OME-NGFF structural validation now runs via ome-zarr-models open_ome_zarr(), which can reject structurally-malformed stores the old attr-parsing path tolerated (and vice versa).
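
For reference, the size and downsampling thresholds in the list above could be sketched roughly as follows. This is only an illustration, not the validator's actual code: the function names are hypothetical, KB/MB/GB/TB are assumed to be binary units, "~2" is assumed to mean within a tolerance of 0.25, and downsampling factors are assumed to be derived from the shapes of two adjacent levels.

```python
from math import prod

# Assumption: the thresholds use binary units (1 KB = 1024 bytes).
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
TIB = 1024 * GIB


def chunk_size_severity(chunk_shape, itemsize):
    """Classify an uncompressed chunk: < 512 KB is an error, < 1 MB a warning."""
    nbytes = prod(chunk_shape) * itemsize
    if nbytes < 512 * KIB:
        return "error"
    if nbytes < 1 * MIB:
        return "warning"
    return "ok"


def shard_size_severity(nbytes):
    """Classify a shard: >= 5 TB is an error, > 5 GB a warning."""
    if nbytes >= 5 * TIB:
        return "error"
    if nbytes > 5 * GIB:
        return "warning"
    return "ok"


def downsampling_severity(axis_types, shape_hi, shape_lo, tol=0.25):
    """Compare two adjacent pyramid levels axis by axis.

    Time/channel axes must keep a factor of exactly 1 (error otherwise);
    spatial axes should shrink by roughly 2 (warning otherwise).
    """
    worst = "ok"
    for kind, hi, lo in zip(axis_types, shape_hi, shape_lo):
        factor = hi / lo
        if kind in ("time", "channel"):
            if factor != 1:
                return "error"
        elif abs(factor - 2) > tol:
            worst = "warning"
    return worst
```

For example, a (1, 1, 512, 768) chunk of uint16 is 768 KB uncompressed, which would fall in the warning band under these assumptions.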

Collaborator Author

Yeah, I think some of these are straight from the DCA spec. But it might not make sense to add them here, or we might at least want to relax them from MUST to SHOULD.

I know it's difficult to enforce that, e.g., outside labs are chunking properly, but on some level it would be good to know which standards are and aren't being followed.

Collaborator

Hmm, you may be right that we don't want these here, or maybe we can just log them instead of erroring. By my reading, there are certain fields in the OPS schema that are intended to be aligned with the DCA spec, but the OPS schema doesn't adopt ALL the DCA requirements. We might need @cathystoli to make a call here.
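
One way the "log instead of erroring" option could look is a per-rule severity override applied after the checks run. The sketch below is purely illustrative: `Finding`, the rule ids, and `OPS_OVERRIDES` are hypothetical names, and which rules (if any) get relaxed is exactly the call that still needs to be made.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Finding:
    rule: str       # hypothetical rule id, e.g. "chunk_size_min"
    severity: str   # "error", "warning", or "info"
    message: str


# Hypothetical OPS profile: demote some DCA-derived rules without deleting them.
OPS_OVERRIDES = {
    "chunk_size_min": "warning",  # MUST relaxed to SHOULD
    "shard_size_max": "info",     # logged only
}


def apply_overrides(findings, overrides):
    """Return findings with severities replaced for any overridden rule."""
    return [
        replace(f, severity=overrides.get(f.rule, f.severity))
        for f in findings
    ]


def has_errors(findings):
    """A run fails only if at least one finding is still an error."""
    return any(f.severity == "error" for f in findings)
```

Keeping the findings and only changing their severity means the report still shows which standards are and aren't being followed, even when a rule no longer fails validation.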

Comment thread validator/src/ops_validator/models/zarr_images.py
Comment thread validator/src/ops_validator/models/zarr_images.py
f"Value {annotation_type!r} accepted with warning.",
)

# source_channel.index must match a channel in channels_metadata

Collaborator

Can we implement this in the new validator?

Collaborator Author

yep!
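
A minimal sketch of what the `source_channel.index` check might look like, assuming plain dicts rather than the validator's real models. The function name and the exact metadata layout (an `annotations` list whose entries carry `source_channel.index`, and a `channels_metadata` list whose entries carry `index`) are assumptions for illustration only.

```python
def check_source_channel_indices(annotations, channels_metadata):
    """Return one error message per annotation whose source channel is unknown."""
    known = {ch["index"] for ch in channels_metadata}
    errors = []
    for i, ann in enumerate(annotations):
        idx = ann.get("source_channel", {}).get("index")
        if idx is None:
            # Assumption: a missing source_channel is reported by another check.
            continue
        if idx not in known:
            errors.append(
                f"annotations[{i}].source_channel.index={idx} "
                f"not found in channels_metadata indices {sorted(known)}"
            )
    return errors
```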

rzlim08 and others added 2 commits June 23, 2026 16:15
Add two MUST checks to the OPS v0.1 image spec model:

1. axis_unit_rules (OPSStoreSpecV0_1): space/time axes must carry a
   physical unit; channel axes must not. This required promoting the
   `axes` field from list[str] to list[OPSAxis] (name/type/unit) and
   forwarding type/unit through _build_node_dict. The plate-field
   uniformity check keeps comparing names only, so its set() stays
   hashable.

2. check_unique_channel_indices (OPSPlateChannelsSpec): channels_metadata[]
   .index must form a 0-based, gap-free, duplicate-free set. The check sorts
   before comparing to range(n), so list order does not matter. The class is
   renamed from _OPSPlateChannelsSpec to OPSPlateChannelsSpec so tests can
   import it directly.

Also narrow requires-python to >=3.11,<3.14 to match ome-zarr-models, which
unblocks `uv run`/`uv sync` (the >=3.10 floor made the resolver unsatisfiable).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…om-validation

feat(zarr): add axis-unit and channel-index validation to OPS v0.1 model
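
The two MUST checks described in the commit message above can be sketched with the standard library as follows. This is a simplified stand-in, not the code from the commit: the `OPSAxis` dataclass here only mirrors the name/type/unit fields mentioned in the message, and the helper names `axis_unit_errors` and `channel_indices_valid` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OPSAxis:
    # Stand-in for the real model; only name/type/unit come from the commit message.
    name: str
    type: str
    unit: Optional[str] = None


def axis_unit_errors(axes):
    """Space/time axes must carry a physical unit; channel axes must not."""
    errors = []
    for ax in axes:
        if ax.type in ("space", "time") and not ax.unit:
            errors.append(f"axis {ax.name!r} ({ax.type}) must carry a physical unit")
        elif ax.type == "channel" and ax.unit is not None:
            errors.append(f"axis {ax.name!r} (channel) must not carry a unit")
    return errors


def channel_indices_valid(indices):
    """Indices must be 0-based, gap-free, and duplicate-free; order is irrelevant."""
    return sorted(indices) == list(range(len(indices)))
```

Sorting before comparing to `range(n)` catches gaps, duplicates, and a non-zero start in one comparison, while still accepting channels listed in any order.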