Validator rewrite#53
Conversation
VBaratham
left a comment
There was a problem hiding this comment.
Sorry, just getting around to this. Looks good to me, just flagged a few things we might want to update in either the validator or the schema itself
There was a problem hiding this comment.
Do we really want to remove all the cross-artifact validations?
There was a problem hiding this comment.
I don't think this was wired into the main validation command, I can add it in if wanted though, it was just a bit confusing to me.
There was a problem hiding this comment.
That is true - it wasn't wired in, which was probably an oversight on Claude's part. So I get why it was removed here. But I think we should add it back and wire it in, since the things it verifies really are required by the schema.
There was a problem hiding this comment.
Claude flagged a few validation changes between this and the new zarr validator - I think they are all fine, but can we add text to the schema specification to account for these? (SHOULD statements for warnings, MUST statements for errors)
- Chunk size: < 512 KB uncompressed → ERROR; < 1 MB → SHOULD warning. (none before)
- Shard size: ≥ 5 TB → ERROR; > 5 GB → warning. (none before)
- Downsampling factors across levels: T/C must be factor 1 → ERROR; spatial dims should be ~2 → warning. (none before)
- dtype recommendation: anything outside {uint8, uint16, uint32} → SHOULD warning. (none before)
- Compression recommendations: integer/float data on blosc → warning; zstd level > 3 → warning. (old only hard-errored on a non-allowlisted final codec)
- HCS acquisition FK: field acquisition id must appear in plate.acquisitions → ERROR. (none before)
- Cross-field uniformity: axes and level-0 chunk shape uniform across plate fields → warnings. (none before)
- OME-NGFF structural validation now runs via ome-zarr-models open_ome_zarr(), which can reject structurally-malformed stores the old attr-parsing path tolerated (and vice versa).
There was a problem hiding this comment.
Yeah I think some of these are straight from the DCA spec. But it might not make sense to add here, or might at least want to relax from MUST -> SHOULD.
I know it's difficult to enforce e.g. outside labs are chunking properly but on some level it would be good to know what standards are and aren't being followed.
There was a problem hiding this comment.
Hmmm you may be right that we don't want these here, or maybe we can just log them instead of erroring. By my reading, it looks like there are certain fields in the OPS schema that are intended to be aligned with the DCA spec, but the OPS schema doesn't adopt ALL the DCA requirements. We might need @cathystoli to make a call here.
| f"Value {annotation_type!r} accepted with warning.", | ||
| ) | ||
|
|
||
| # source_channel.index must match a channel in channels_metadata |
There was a problem hiding this comment.
Can we implement this in the new validator?
Add two MUST checks to the OPS v0.1 image spec model: 1. axis_unit_rules (OPSStoreSpecV0_1): space/time axes must carry a physical unit; channel axes must not. This required promoting the `axes` field from list[str] to list[OPSAxis] (name/type/unit) and forwarding type/unit through _build_node_dict. The plate-field uniformity check keeps comparing names only, so its set() stays hashable. 2. check_unique_channel_indices (OPSPlateChannelsSpec): channels_metadata[] .index must form a 0-based, gap-free, duplicate-free set. The check sorts before comparing to range(n), so list order does not matter. The class is renamed from _OPSPlateChannelsSpec to OPSPlateChannelsSpec so tests can import it directly. Also narrow requires-python to >=3.11,<3.14 to match ome-zarr-models, which unblocks `uv run`/`uv sync` (the >=3.10 floor made the resolver unsatisfiable). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…om-validation feat(zarr): add axis-unit and channel-index validation to OPS v0.1 model
dca_helpersand adopts itas the OPS zarr validator
pd.read_parquetwith lazy polars inCellDataValidatortyper, and removes three modulesthat no callers exercise.
Sorry for the big PR. It's mostly because this is a port.