Fix NumPy 2.0 compatibility by upgrading to zarr v3 and consolidating dependencies #42

@adamjtaylor

Problem

The Docker build is experiencing dependency version conflicts that break the zarr library:

Error: AttributeError: module 'numpy' has no attribute 'PINF'

Root Cause

  1. Conda installs NumPy 1.x with constraint >=1.23.0,<2.0
  2. Subsequent pip install commands (especially opencv-python-headless) upgrade NumPy to 2.0
  3. The zarr library (v2) uses deprecated np.PINF which was removed in NumPy 2.0
  4. This breaks the build with AttributeError
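The failure in step 3 can be illustrated without a full Docker build. The following is a minimal sketch, assuming stand-in objects in place of the real numpy module; the helper name `removed_aliases` is hypothetical and not part of this repository. It reports which legacy aliases a NumPy-like module no longer exposes. `np.inf` exists in both NumPy 1.x and 2.x, which is why it is the usual replacement for `np.PINF`.

```python
from types import SimpleNamespace

# Legacy aliases relevant to this issue; NumPy 2.0 removed both in favour of np.inf.
LEGACY_ALIASES = ("PINF", "NINF")


def removed_aliases(np_module, names=LEGACY_ALIASES):
    """Return the alias names that np_module does not expose."""
    return [name for name in names if not hasattr(np_module, name)]


# Stand-ins for the two NumPy generations (illustrative only, not the real modules).
numpy_1x_like = SimpleNamespace(inf=float("inf"), PINF=float("inf"), NINF=float("-inf"))
numpy_2x_like = SimpleNamespace(inf=float("inf"))

print(removed_aliases(numpy_1x_like))
print(removed_aliases(numpy_2x_like))
```

Inside the image, passing the real module (`import numpy; removed_aliases(numpy)`) should give an empty list while NumPy 1.x is installed.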

Current Architecture

  • Multi-stage Dockerfile: conda environment (stage 1) + pip installs (stage 2)
  • Dependencies split between environment.yml and Dockerfile pip commands
  • No automated testing of dependency versions after merge

Solution

Consolidate dependencies and pin to the NumPy 1.x stack to prevent version drift:

  1. Maintain NumPy 1.x (numpy>=1.23.0,<2.0) - required for zarr v2
  2. Pin opencv-python-headless to 4.9.0.80 (last version supporting NumPy 1.x)
  3. Consolidate all pip dependencies into environment.yml - prevents version drift
  4. Simplify Dockerfile - remove separate pip install commands, add gcc for C extensions
  5. Add post-merge CI testing - validate dependency versions after merge

Why Pin to NumPy 1.x?

BLOCKER: minerva-lib (critical dependency) requires zarr==2.6.1, which uses deprecated NumPy APIs removed in NumPy 2.0.

  • minerva-lib dependency chain: minerva-lib → zarr==2.6.1 → requires NumPy 1.x
  • opencv-python-headless 4.9.0.80: pinned as a release that works with NumPy 1.x (it predates NumPy 2.0, which shipped in mid-2024)
  • Temporary solution: This pins versions until minerva-lib updates to support zarr v3

Implementation

Files Changed

  1. docker/environment.yml - Add pip dependencies with version pins, keep NumPy <2.0
  2. docker/Dockerfile - Add gcc/g++/make for C extensions, remove pip installs from runtime stage, add --ignore-missing-files to conda-pack
  3. .github/workflows/build-test.yml - New post-merge CI workflow with dependency validation

Key Changes

docker/environment.yml:

dependencies:
  - numpy>=1.23.0,<2.0  # Keep NumPy 1.x for zarr v2 compatibility
  - pip:
      - opencv-python-headless==4.9.0.80  # Pin to NumPy 1.x compatible version
      - openslide-python
      - synapseclient
      - mantel
      - "ome_types>=0.4.2"
      - git+https://github.com/labsyspharm/minerva-lib-python@master#egg=minerva-lib

docker/Dockerfile:

  • Added build tools (gcc, g++, make) in build stage for minerva-lib C extensions
  • Removed all pip install commands from runtime stage
  • Added --ignore-missing-files flag to conda-pack to handle pip overwrites

.github/workflows/build-test.yml:

  • Triggers on push to main/master (after merge)
  • Builds Docker image with smoke tests
  • Validates NumPy 1.x and zarr 2.x versions
  • Tests zarr functionality and critical package imports

Testing Strategy

Automated (CI - Post-merge):

  • Build Docker image on every push to main
  • Smoke test: Verify NumPy 1.x is installed
  • Smoke test: Verify zarr 2.x is installed (required by minerva-lib)
  • Functional test: Create zarr array successfully
  • Import test: Ensure all critical packages load
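The two version smoke tests above could be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of build-test.yml: the names `EXPECTED_MAJOR`, `check_versions`, `installed_versions`, and `main` are hypothetical, and only the expected major versions (NumPy 1.x, zarr 2.x) come from this issue.

```python
from importlib import metadata

# Expected major versions, taken from the issue text (NumPy 1.x, zarr 2.x).
EXPECTED_MAJOR = {"numpy": 1, "zarr": 2}


def major_version(version):
    """Return the leading integer of a version string such as '1.26.4'."""
    return int(version.split(".")[0])


def check_versions(installed, expected=EXPECTED_MAJOR):
    """Compare a {name: version} mapping against the expected major versions."""
    problems = []
    for name, want in expected.items():
        got = installed.get(name)
        if got is None:
            problems.append(f"{name}: not installed")
        elif major_version(got) != want:
            problems.append(f"{name}: expected {want}.x, found {got}")
    return problems


def installed_versions(names):
    """Look up installed distribution versions, skipping missing packages."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return found


def main():
    """Return 0 when every check passes, 1 otherwise (usable as an exit code)."""
    problems = check_versions(installed_versions(EXPECTED_MAJOR))
    for line in problems:
        print(line)
    return 1 if problems else 0
```

A CI step could run this inside the built image and fail the job when `main()` returns a non-zero value. Keeping `check_versions` separate from the metadata lookup makes the comparison logic testable without the packages installed.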

Manual (Local validation):

  • Test Docker build locally before pushing to main
  • Run full nextflow pipeline with test data
  • Verify zarr array I/O operations

Verification

After merging, verify:

  • Docker image builds successfully
  • NumPy version is 1.x: python -c "import numpy; print(numpy.__version__)"
  • Zarr version is 2.x: python -c "import zarr; print(zarr.__version__)"
  • opencv-python-headless is 4.9.0.80
  • minerva-lib imports successfully
  • Nextflow pipeline runs end-to-end
  • CI workflow passes on main branch
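The import and zarr items in this checklist can also be scripted. The sketch below is a hedged example: the import names in `CRITICAL_MODULES` are assumptions (the issue lists distribution names, and `minerva_lib` in particular is a guess at the import name for minerva-lib), and both helper names are hypothetical.

```python
import importlib

# Assumed import names; e.g. opencv-python-headless is imported as cv2.
CRITICAL_MODULES = (
    "numpy",
    "zarr",
    "cv2",
    "openslide",
    "synapseclient",
    "ome_types",
    "minerva_lib",  # assumption: import name of minerva-lib
)


def import_failures(module_names):
    """Try to import each module and return {name: error} for the failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # an AttributeError such as np.PINF may surface here
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def zarr_roundtrip_ok():
    """Create a small in-memory zarr array, write to it, and read it back."""
    import zarr  # imported lazily so this module loads without zarr installed

    arr = zarr.zeros((4, 4), chunks=(2, 2), dtype="i4")
    arr[:] = 1
    return int(arr[:].sum()) == 16
```

Inside the container, `import_failures(CRITICAL_MODULES)` returning an empty dict and `zarr_roundtrip_ok()` returning `True` would cover the import and zarr I/O items; the Nextflow end-to-end run still needs to be exercised separately.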

Future Work

When minerva-lib updates to support zarr v3:

  1. Remove opencv-python-headless pin → allow latest
  2. Update NumPy constraint: >=1.26.0 (allow 2.x)
  3. Update zarr: >=3.0.0
  4. Update CI to verify NumPy 2.x and zarr 3.x
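To make the effect of the constraint change in step 2 concrete, here is a small sketch that compares the current range (>=1.23.0,<2.0) with the proposed one (>=1.26.0). It is a simplified stand-in for a real version-specifier parser: the helpers `numeric_parts` and `satisfies` are hypothetical, and only the leading numeric components of a version are considered.

```python
import re


def numeric_parts(version):
    """Return the leading dotted-integer part of a version as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies(version, minimum, below=None):
    """Check minimum <= version, and version < below when an upper bound is given."""
    parts = numeric_parts(version)
    if parts < numeric_parts(minimum):
        return False
    if below is not None and not parts < numeric_parts(below):
        return False
    return True


# Current constraint from environment.yml: numpy>=1.23.0,<2.0
print(satisfies("1.26.4", "1.23.0", below="2.0"))
print(satisfies("2.0.0", "1.23.0", below="2.0"))
# Proposed future constraint: numpy>=1.26.0
print(satisfies("2.0.0", "1.26.0"))
```

Under the current range NumPy 2.0.0 is rejected, while the future range accepts it and drops support for releases older than 1.26.0.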
