
fix(ci): replace minikube devstack with standalone MinIO container #3157

Open
npow wants to merge 3 commits into master from minio-ci-fixes


npow (Collaborator) commented Apr 27, 2026

Summary

  • Replace the minikube-based metaflow-dev all-up S3 test environment with a lightweight standalone MinIO Docker container
  • Skip tests incompatible with MinIO (SSE, non-ASCII filenames)
  • Skip heavy parametrizations (5gb_file, 3000_files) that exceed the CI timeout
  • Pin CI to Python 3.11 with a 90-minute timeout
  • Add workflow_dispatch trigger

Extracted from #3148: these are the MinIO CI changes only, without the boto3 refactor.

Test plan

  • Verify metaflow.s3_tests.minio CI workflow passes with the new Docker-based MinIO setup
  • Confirm skipped tests are correctly gated on MINIO_TEST env var
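For the second test-plan item, here is a minimal sketch of what `MINIO_TEST` gating can look like in pytest. It is not the PR's code: the helper names (`minio_enabled`, `skip_on_minio`, `gated_params`) and the two example tests are hypothetical, and treating any non-empty `MINIO_TEST` value as "on" is an assumption. It uses `pytest.mark.skipif` for a whole-test skip (as the PR does for the non-ASCII filename test) and per-case `pytest.param(..., marks=...)` for heavy parametrizations such as `5gb_file` and `3000_files`.

```python
import os

import pytest


def minio_enabled(env=os.environ) -> bool:
    # Assumption: any non-empty MINIO_TEST value means "running against MinIO".
    return bool(env.get("MINIO_TEST"))


IS_MINIO = minio_enabled()

# Whole-test skip, e.g. for keys with non-ASCII characters.
skip_on_minio = pytest.mark.skipif(
    IS_MINIO, reason="not supported by the MinIO CI backend"
)

# Hypothetical set of parametrization ids too slow for the CI timeout.
HEAVY_IDS = {"5gb_file", "3000_files"}


def gated_params(cases, minio: bool = IS_MINIO):
    """Wrap (id, value) pairs in pytest.param, skipping heavy ids under MinIO."""
    return [
        pytest.param(
            value,
            id=case_id,
            marks=pytest.mark.skipif(
                minio and case_id in HEAVY_IDS, reason="too slow for MinIO CI"
            ),
        )
        for case_id, value in cases
    ]


@skip_on_minio
def test_non_ascii_key():
    ...  # hypothetical test body


@pytest.mark.parametrize(
    "size", gated_params([("small", 1), ("5gb_file", 5 * 1024**3)])
)
def test_put_get(size):
    ...  # hypothetical test body
```

Marking cases as skipped (rather than dropping them from the list) keeps them visible in the pytest report, which makes the gating easy to confirm from CI output.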

🤖 Generated with Claude Code

greptile-apps (bot, Contributor) commented Apr 27, 2026

Greptile Summary

This PR replaces the minikube-based MinIO devstack with a lightweight standalone Docker container, pins CI to Python 3.11 with a 90-minute timeout, and gates SSE, non-ASCII, and heavy-parametrization tests behind MINIO_TEST. The full-stack-test.yml improvements (timeout, teardown guard, failure dump) are clean. The two minor findings, in the SSE fixture and the retry count, are both P2 and non-blocking.

Confidence Score: 5/5

Safe to merge; all findings are P2 style suggestions, and the known P1s are already tracked in prior review threads.

No new P0/P1 findings are introduced by this PR beyond the issues already flagged in previous review threads. The SSE duplicate run and the high retry count are minor quality concerns that do not affect correctness.

.github/workflows/metaflow.s3_tests.minio.yml still has open issues from prior threads (untagged image, silent readiness timeout, broken workflow_dispatch) that should be addressed before relying on this workflow in production CI.

Important Files Changed

| Filename | Overview |
|---|---|
| .github/workflows/metaflow.s3_tests.minio.yml | Replaces the minikube devstack with a standalone MinIO Docker container; known issues with the untagged image, the silent readiness-loop timeout, and the broken workflow_dispatch are flagged in prior review threads |
| .github/workflows/full-stack-test.yml | Adds a 30-min timeout, extends wait-ready to 900 s, adds a debug dump on failure, and ensures teardown always runs; a clean improvement |
| test/data/s3/test_s3.py | The SSE fixture returns None for MinIO, causing `encryption_settings = [None, None]`, so tests run twice identically instead of being skipped; functional but wasteful |
| test/data/s3/test_s3op.py | Correctly gates the non-ASCII filename test behind the MINIO_TEST env var using `pytest.mark.skipif` |
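One way to address the `test_s3.py` finding, sketched under assumptions: the fixture's internals are not shown here, so `encryption_settings` below is a hypothetical helper that takes whatever SSE value the fixture yields (`None` under MinIO, per the review) and builds the parametrization list without the duplicate entry. `"AES256"` is only an example SSE value.

```python
from typing import List, Optional


def encryption_settings(sse_setting: Optional[str]) -> List[Optional[str]]:
    """Hypothetical: build the SSE settings to parametrize tests over.

    "No encryption" (None) is always included; the SSE setting is added only
    when the backend provides one. Under MinIO the fixture yields None, so the
    result is [None] rather than the duplicated [None, None].
    """
    settings: List[Optional[str]] = [None]
    if sse_setting is not None:
        settings.append(sse_setting)
    return settings
```

This drops the SSE case under MinIO instead of reporting it as skipped; wrapping it in `pytest.param(..., marks=pytest.mark.skip(...))` would keep it visible in the test report, which is closer to what the review suggests.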

Reviews (2): Last reviewed commit: "ci: retrigger checks"

npow and others added 3 commits April 27, 2026 23:15
Replace the minikube-based S3 test environment with a lightweight
standalone MinIO Docker container. This eliminates the ~150s spin-up
delay and removes the kubernetes dependency.

Changes:
- Use `docker run minio/minio` instead of `metaflow-dev all-up`
- Set env vars at job level instead of inside a heredoc shell
- Add health-check polling loop for MinIO readiness
- Create test bucket via boto3 instead of relying on devstack
- Skip SSE tests under MinIO (unsupported)
- Skip non-ASCII filename test under MinIO (unsupported)
- Pin to Python 3.11, add 90min timeout, fail-fast: false
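The workflow's readiness check is a shell loop. The Python sketch below shows the same polling idea and, unlike a loop that quietly stops after its last attempt, raises on timeout (the silent-timeout concern raised in review). It assumes MinIO listens on `http://localhost:9000` (MinIO's default port) and probes MinIO's `/minio/health/live` endpoint; the function names are hypothetical, not taken from the PR.

```python
import http.client
import time
import urllib.request
from typing import Callable


def minio_is_live(endpoint: str, timeout: float = 2.0) -> bool:
    """Return True if MinIO's liveness endpoint answers with HTTP 200."""
    url = endpoint.rstrip("/") + "/minio/health/live"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError, connection refused, and socket timeouts are all
        # OSError subclasses; HTTPException covers malformed early responses.
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll probe() until it returns True; return the number of attempts.

    Raises TimeoutError once the deadline passes, so CI fails at this step
    instead of continuing and failing later in an unrelated test.
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        if probe():
            return attempts
        if clock() >= deadline:
            raise TimeoutError(
                f"not ready after {attempts} attempts over {timeout_s}s"
            )
        sleep(interval_s)
```

Once `wait_until_ready(lambda: minio_is_live("http://localhost:9000"))` returns, the test bucket can be created with boto3, for example `boto3.client("s3", endpoint_url="http://localhost:9000").create_bucket(Bucket=name)`; the endpoint and the bucket `name` are placeholders, not values from the PR.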

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The full-stack-test workflow was timing out on generate-configs with
WAIT_TIMEOUT=600 (10 min). CI runners are slow and services sometimes
need longer to initialize.

- Increase WAIT_TIMEOUT from 600 to 900 (15 min)
- Add timeout-minutes: 30 to prevent runaway jobs (previously the job fell back to the 6-hour default)
- Add diagnostic step on failure: dump tilt resource status and recent logs
- Run teardown with if: always() so cleanup happens on failure too

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>