forked from adamjtaylor/htan-artist
Open
Description
Problem
The Docker build is experiencing dependency version conflicts that break the zarr library:
Error: AttributeError: module 'numpy' has no attribute 'PINF'
Root Cause
- Conda installs NumPy 1.x with the constraint >=1.23.0,<2.0
- Subsequent pip install commands (especially opencv-python-headless) upgrade NumPy to 2.0
- The zarr library (v2) uses the deprecated np.PINF, which was removed in NumPy 2.0
- This breaks the build with an AttributeError
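The failure mode above can be illustrated with a minimal sketch. It does not import NumPy; two stand-in namespaces play the role of a NumPy 1.x and a NumPy 2.x module. The helper name missing_aliases and the stand-in objects are hypothetical, and NINF is included only as a second alias that NumPy 2.0 also removed (the error reported here involves PINF only).

```python
from types import SimpleNamespace

# Aliases removed in NumPy 2.0. The issue reports PINF; NINF is listed
# here as a second removed alias for illustration.
REMOVED_ALIASES = ("PINF", "NINF")


def missing_aliases(module, names=REMOVED_ALIASES):
    """Return the names from `names` that `module` does not define."""
    return [name for name in names if not hasattr(module, name)]


# Hypothetical stand-ins for the two NumPy generations, so the sketch
# runs without NumPy installed.
numpy_1x_like = SimpleNamespace(
    PINF=float("inf"), NINF=float("-inf"), inf=float("inf")
)
numpy_2x_like = SimpleNamespace(inf=float("inf"))

print(missing_aliases(numpy_1x_like))  # []
print(missing_aliases(numpy_2x_like))  # ['PINF', 'NINF']
```

Inside the built image, the same helper could be called on the real module, for example missing_aliases(numpy), to confirm that the installed NumPy still exposes the names zarr v2 expects.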
Current Architecture
- Multi-stage Dockerfile: conda environment (stage 1) + pip installs (stage 2)
- Dependencies split between environment.yml and Dockerfile pip commands
- No automated testing of dependency versions after merge
Solution
Consolidate dependencies and pin to NumPy 1.x stack to prevent version drift:
- Maintain NumPy 1.x (numpy>=1.23.0,<2.0) - required for zarr v2
- Pin opencv-python-headless to 4.9.0.80 (last version supporting NumPy 1.x)
- Consolidate all pip dependencies into environment.yml - prevents version drift
- Simplify Dockerfile - remove separate pip install commands, add gcc for C extensions
- Add post-merge CI testing - validate dependency versions after merge
Why Pin to NumPy 1.x?
BLOCKER: minerva-lib (critical dependency) requires zarr==2.6.1, which uses deprecated NumPy APIs removed in NumPy 2.0.
- minerva-lib dependency chain: minerva-lib → zarr==2.6.1 → requires NumPy 1.x
- opencv-python-headless 4.9.0.80: Last version supporting NumPy 1.x (from mid-2024)
- Temporary solution: This pins versions until minerva-lib updates to support zarr v3
Implementation
Files Changed
- docker/environment.yml - Add pip dependencies with version pins, keep NumPy <2.0
- docker/Dockerfile - Add gcc/g++/make for C extensions, remove pip installs from runtime stage, add --ignore-missing-files to conda-pack
- .github/workflows/build-test.yml - New post-merge CI workflow with dependency validation
Key Changes
docker/environment.yml:
dependencies:
  - numpy>=1.23.0,<2.0  # Keep NumPy 1.x for zarr v2 compatibility
  - pip:
      - opencv-python-headless==4.9.0.80  # Pin to NumPy 1.x compatible version
      - openslide-python
      - synapseclient
      - mantel
      - "ome_types>=0.4.2"
      - git+https://github.com/labsyspharm/minerva-lib-python@master#egg=minerva-lib
docker/Dockerfile:
- Added build tools (gcc, g++, make) in build stage for minerva-lib C extensions
- Removed all pip install commands from runtime stage
- Added --ignore-missing-files flag to conda-pack to handle pip overwrites
.github/workflows/build-test.yml:
- Triggers on push to main/master (after merge)
- Builds Docker image with smoke tests
- Validates NumPy 1.x and zarr 2.x versions
- Tests zarr functionality and critical package imports
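The version validation step described above could look roughly like the following sketch. It is not the contents of build-test.yml; the function names and the EXPECTED_MAJOR table are hypothetical, and only the expected major versions (NumPy 1.x, zarr 2.x) come from this issue. It uses only the standard library, so it can run inside the image without extra dependencies.

```python
from __future__ import annotations

from importlib import metadata

# Hypothetical rule table: distribution name -> required major version.
EXPECTED_MAJOR = {"numpy": 1, "zarr": 2}


def major(version: str) -> int:
    """Return the leading numeric component of a version string."""
    return int(version.split(".")[0])


def installed_versions(names) -> dict[str, str]:
    """Look up the installed version of each distribution, skipping absent ones."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return found


def check_versions(installed: dict[str, str], expected=EXPECTED_MAJOR) -> list[str]:
    """Return one message per package that is missing or has the wrong major version."""
    problems = []
    for name, want in expected.items():
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not installed")
        elif major(have) != want:
            problems.append(f"{name}: found {have}, expected {want}.x")
    return problems


def main() -> int:
    """Print any problems and return a process exit code (0 = all good)."""
    problems = check_versions(installed_versions(EXPECTED_MAJOR))
    for line in problems:
        print(line)
    return 1 if problems else 0
```

A CI step could run this inside the built image and end with raise SystemExit(main()), so that a NumPy 2.x or zarr 3.x install fails the job instead of surfacing later as an AttributeError.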
Testing Strategy
Automated (CI - Post-merge):
- Build Docker image on every push to main
- Smoke test: Verify NumPy 1.x is installed
- Smoke test: Verify zarr 2.x is installed (required by minerva-lib)
- Functional test: Create zarr array successfully
- Import test: Ensure all critical packages load
Manual (Local validation):
- Test Docker build locally before pushing to main
- Run full nextflow pipeline with test data
- Verify zarr array I/O operations
Verification
After merging, verify:
- Docker image builds successfully
- NumPy version is 1.x: python -c "import numpy; print(numpy.__version__)"
- Zarr version is 2.x: python -c "import zarr; print(zarr.__version__)"
- opencv-python-headless is 4.9.0.80
- minerva-lib imports successfully
- Nextflow pipeline runs end-to-end
- CI workflow passes on main branch
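The import checks in the list above can be scripted in a few lines. This is a sketch under assumptions: the helper name failed_imports is hypothetical, and the module names in CRITICAL_MODULES are the usual import names for the packages listed in environment.yml (minerva_lib in particular is assumed, not confirmed by this issue).

```python
import importlib

# Assumed import names for the pinned packages; adjust if they differ.
CRITICAL_MODULES = (
    "numpy",
    "zarr",
    "cv2",          # provided by opencv-python-headless
    "openslide",    # provided by openslide-python
    "synapseclient",
    "ome_types",
    "minerva_lib",  # assumed import name for minerva-lib
)


def failed_imports(module_names=CRITICAL_MODULES):
    """Try to import each module and return the names that fail, with the reason."""
    failures = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # AttributeError from np.PINF lands here too
            failures.append((name, f"{type(exc).__name__}: {exc}"))
    return failures
```

Catching Exception rather than only ImportError is deliberate: the breakage described in this issue shows up as an AttributeError raised while a module is being imported, not as a missing package.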
Future Work
When minerva-lib updates to support zarr v3:
- Remove opencv-python-headless pin → allow latest
- Update NumPy constraint: >=1.26.0 (allow 2.x)
- Update zarr: >=3.0.0
- Update CI to verify NumPy 2.x and zarr 3.x
References
- opencv-python-headless 4.9.0.80 - Last version with NumPy 1.x support
- minerva-lib source
- NumPy 2.0 Migration Guide
Related Issues/PRs
- Fix dockerfile #40 - Docker build failing
- Pin numpy to <2.0 to fix zarr compatibility #41 - Temporary fix
- Commit 04c975e - Initial NumPy constraint