Skip to content

Generic sdist-build hooks in conda; port prebuilt sdist/git support#92

Merged
npow merged 28 commits into
mainfrom
npow/generic-sdist-hooks
Jun 19, 2026
Merged

Generic sdist-build hooks in conda; port prebuilt sdist/git support#92
npow merged 28 commits into
mainfrom
npow/generic-sdist-hooks

Conversation

@npow

@npow npow commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds generic conda sdist-build-deferral hooks to metaflow-netflixext and ships the complete metaflow-prebuilt orchestration package for `--environment=prebuilt` — a workflow that pre-bakes resolved conda environments into Docker images so remote tasks bootstrap in seconds rather than minutes.

metaflow-netflixext (conda core changes)

Two primitives, each default-off — non-prebuilt resolve/create is byte-for-byte unchanged:

  • `pip_resolver.defer_pypi_sdist_build` — records sdists as `{name, version, url}` in `deferred_sdists` instead of building wheels at resolution time.
  • `conda._create(..., only_binary=True)` — installs only pre-built wheels (analogous to `pip --only-binary=:all:`); threaded through `create_for_step` / `create_for_name`.
  • `conda_environment._make_envs_resolver()` — zero-arg factory hook so a subclass can opt into deferral without duplicating `init_environment`.

Conda core contains zero references to `prebuilt`, `deferred_builds.json`, or `schema_version`.

metaflow-prebuilt (owns the protocol)

  • `prebuilt_conda_environment`: opts into deferral via `_make_envs_resolver`; materializes wheels for non-web (git/local) packages; writes `deferred_builds.json` (schema v2) and COPYs it + `deferred_wheels/` into the Dockerfile.
  • Two-pass installer (Pass A / Pass B):
    • Pass A: `create(only_binary=True)` — conda + pre-built wheels, fully offline.
    • Pass B: `uv pip install --no-build-isolation --no-deps` — builds deferred sdists in-container, including `[build-system].requires` packages (e.g. deepspeed).
  • Image identity: keyed by environment + base-image/arch variant; schema v29.
  • Cross-arch builds: `--platform` flag passed through to the build service.
  • Pluggable build services (docker/buildx/kaniko/codebuild) and image registries (ecr/gcr/local) registered via entry points.

Also restores deepspeed (sdist) and transnetv2 (git) support that was developed inline in the internal repo but never ported to the OSS packages.

Validation

  • 77 prebuilt unit tests pass.
  • Validated end-to-end on real Netflix Titus (linked runs in the consuming internal PR).
  • No-deferred path is unchanged: no JSON written, no COPY directives, existing conda tests unaffected.

🤖 Generated with Claude Code

@npow npow force-pushed the npow/generic-sdist-hooks branch 2 times, most recently from 51ecec2 to 7b0030d Compare June 18, 2026 16:33
@npow

npow commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator Author

Canon PR Review — Engineering Standards Scorecard

PR: Make conda prebuilt-agnostic; restore deepspeed/transnetv2 sdist support (netflixext 1.3.11→1.3.12, prebuilt 0.2.1→0.3.0)
Reviewed against: Top 20 Tier 2 practices (Code Complete, Clean Code, APoSD, Refactoring, Release It!, DDIA, The Pragmatic Programmer, Working Effectively with Legacy Code, Design Patterns GoF)

Scores

Dimension Score Gate
Code Quality (naming, SRP, DRY, guard clauses, smells) 4/5
Module Design (deep modules, composition, docs, encapsulation) 4/5
Reliability (error handling, idempotency, circuit breakers, timeouts, degradation) 4/5
Maintainability (small steps, test before change, broken windows, regression tests) 3/5

Decision: APPROVE


Violations Found

Reliability

CC-058 — Explicit Timeouts on All Network/Subprocess Calls (Source: Release It!, DDIA)

  • Location: metaflow-prebuilt/metaflow_extensions/prebuilt/plugins/conda/prebuilt_build_install.py — the three subprocess.run() calls in _run_deferred_sdist_builds
  • Issue: subprocess.run() without timeout= can hang indefinitely inside a Docker build layer if uv/pip stalls (network, package index outage, hanging build subprocess). The build service may have an outer timeout, but an explicit per-subprocess timeout would make the failure mode visible rather than relying on the outer build kill.
  • Violation excerpt:
    check_result = subprocess.run(
        [python_bin, "-c", _SETUPTOOLS_CHECK],
        capture_output=True,
        text=True,
    )
    result = subprocess.run(
        build_args, capture_output=True, text=True, env=build_env
    )
  • Suggested improvement:
    result = subprocess.run(
        build_args, capture_output=True, text=True, env=build_env,
        timeout=600,  # 10 min hard ceiling per sdist build
    )
    Wrap in subprocess.TimeoutExpiredCondaException to get a useful error message.

Maintainability

CC-093 — Fix Broken Windows: TODOs Without Issue Tracker References (Source: The Pragmatic Programmer)

  • Location: prebuilt_conda_environment.py (_image_cache_key, _build_prebuilt_images), buildx_service.py, ecr_registry.py
  • Issue: Five TODO comments are added under the image-identity and codebuild-region labels, which function as informal tags but have no issue tracker references. "Pre-existing (inline has it too)" mitigates severity, but the follow-up items are real load-bearing limitations (CPU/GPU base collision, cross-arch deploy failure). Without an issue reference they risk staying permanently deferred.
  • Violation excerpt:
    # TODO(image-identity): keys only on req_id/full_id, so steps that share deps
    # but need different images collide on one cached image / registry tag:
  • Suggested improvement: Link each TODO to a tracked issue (e.g. TODO(image-identity): github.com/Netflix/metaflow-nflx-extensions/issues/NN) so the follow-up has a clear owner and won't be forgotten across PRs.

What's Working Well

  1. _make_envs_resolver() factory hook (conda_environment.py, prebuilt_conda_environment.py) — a textbook deep-module extension point. Subclasses inject resolver behavior (defer deferral flag) without duplicating the 100-line init_environment body. Clean.

  2. Regression test naming and structuretest_gather_embedded_wheels_prefers_cached_wheel_over_local_sdist is labeled as a "Codex r13 regression," the scenario is precisely described, and the fake_lazy_fetch captures what was actually fetched. This is exactly what a regression test should look like.

  3. Deferred-builds hand-off design — always writing deferred_builds.json into the build context even when empty, and always COPY-ing it, correctly prevents stale inheritance from a base image. The comment explaining why is correct and load-bearing.

  4. isinstance(environment, CondaEnvironment) widening — replacing the string TYPE != "conda" checks is the right fix: it's strictly more correct, covers subclass polymorphism, and doesn't require a TYPE override in the subclass.


Scored against: CC-003, CC-005, CC-006, CC-011, CC-021 (Code Quality) | CC-027, CC-015, CC-028, CC-017 (Module Design) | CC-014, CC-048, CC-056, CC-058, CC-060 (Reliability) | CC-019, CC-031, CC-093, CC-107 (Maintainability)

npow pushed a commit that referenced this pull request Jun 18, 2026
…rity/robustness)

- python_requires>=3.10 (blocking): add a setup.py comment (both packages) that the
  floor mirrors the internal minimum (<3.10 dropped internally); no 3.10-only
  construct in the code — _parse_build_system_requires has tomllib/tomli/regex.
- Pass B shared overlay: _overlay_install now warns when a build dep is re-staged
  at a DIFFERENT version (surfaces the documented shared-overlay conflict instead
  of a silent miscompilation).
- deferred_sdists: comment that "url" is the guaranteed source (is_downloadable_url
  filter) and "filename" is only a best-effort hint (harmless if None).
- _gather_embedded_wheels Case D/E: comment that p.url is a placeholder
  (is_real_url=False + cached wheel => p.url is never fetched).
- Pass B overlay cleanup: comment that the on-failure rmtree + the finally rmtree
  double-remove is intentional and safe (ignore_errors=True).

No functional change to validated paths (comments + one diagnostic warning).
77 prebuilt unit tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@npow

npow commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator Author

Canon PR Review — Engineering Standards Scorecard

PR: Generic sdist-build hooks in conda; port prebuilt sdist/git support
Reviewed against: Top 20 Tier 2 practices (Code Complete, Clean Code, APoSD, Refactoring, Release It!, DDIA, The Pragmatic Programmer, Working Effectively with Legacy Code, Design Patterns GoF)

Scores

Dimension Score Gate
Code Quality (naming, SRP, DRY, guard clauses, smells) 3/5
Module Design (deep modules, composition, docs, encapsulation) 4/5
Reliability (error handling, idempotency, circuit breakers, timeouts, degradation) 4/5
Maintainability (small steps, test before change, broken windows, regression tests) 4/5

Decision: APPROVE


Violations Found

Code Quality

CC-021 — Long Method (Source: Refactoring, Clean Code)

  • Location: metaflow-prebuilt/metaflow_extensions/prebuilt/plugins/conda/prebuilt_build_install.py_run_deferred_sdist_builds
  • Issue: The function is ~200 lines of body spanning setuptools check, overlay management (ensure_overlay + overlay_install closures), local artifact map construction, and the per-sdist build loop. The inner closures _ensure_overlay and _overlay_install help, but the outer function still does four distinct things. Not a blocker — the logic is coherent and well-commented — but a future extract of _build_sdist_with_overlay(sdist, env_dir, ...) would isolate the per-sdist concern from the per-image overlay lifecycle.
  • Violation excerpt:
    def _run_deferred_sdist_builds(env_dir, resolved_env):
        ...
        build_env = dict(os.environ)
        overlay_state = {"dir": None, "staged": {}}
    
        def _ensure_overlay(): ...
        def _overlay_install(pkgs, what): ...
    
        try:
            # (a) setuptools pre-check (b) local artifact map (c) per-sdist loop
            ...
        finally:
            if overlay_state["dir"]:
                shutil.rmtree(overlay_state["dir"], ignore_errors=True)
  • Suggested improvement: Extract a _build_one_sdist(sdist, python_bin, uv_bin, build_env, resolved_env, overlay_install) helper so the outer function owns only the overlay lifecycle and the iteration, not the per-sdist build mechanics.

CC-006 — Minor knowledge duplication (Source: The Pragmatic Programmer)

  • Location: prebuilt_build_install.py:889-913 (_overlay_install) and prebuilt_build_install.py:985-1013 (Pass B main build)
  • Issue: Both branches produce a uv/pip argument list with very similar structure (uv: uv pip install --python <py> ..., pip: python -m pip install ...). The commit message mentions "_install_args helper" was considered — the two call-sites have different flags (--target overlay vs --prefix env_dir, --no-build-isolation --no-deps only in Pass B) so they're not a straight DRY violation, but a shared _make_install_cmd(python_bin, uv_bin, ...) factory accepting target/prefix/extra_flags would remove the conceptual duplication cleanly.
  • Severity: Minor — different flags make these distinct enough that the duplication is defensible. Score impact minimal.

Reliability

CC-058 — Missing explicit timeouts on subprocess calls (Source: Release It!)

  • Location: prebuilt_build_install.py — three subprocess.run call sites (setuptools probe, _overlay_install, Pass B main build)
  • Issue: All three subprocess.run calls have capture_output=True but no timeout= parameter. A pip/uv install that hangs (network stall, deadlocked subprocess) will block the thread indefinitely inside a Docker build step. For build operations this is often accepted practice, but an explicit large timeout (e.g. timeout=3600) with a clear message would surface hangs rather than silently stalling image builds.
  • Violation excerpt:
    r = subprocess.run(args, capture_output=True, text=True)
    ...
    result = subprocess.run(build_args, capture_output=True, text=True, env=build_env)
  • Suggested improvement:
    BUILD_TIMEOUT_SECS = int(os.environ.get("METAFLOW_PREBUILT_BUILD_TIMEOUT", "3600"))
    result = subprocess.run(build_args, capture_output=True, text=True,
                            env=build_env, timeout=BUILD_TIMEOUT_SECS)
  • Severity: Low — build operations are legitimately long-running. Does not change the APPROVE decision.

What's Working Well

  1. Exemplary deep-module design: _make_envs_resolver() zero-arg factory, only_binary=False default, defer_pypi_sdist_build=False default — all three primitives are default-off, composable, and hide real complexity behind a one-parameter call. The PR description's claim that "non-prebuilt resolve/create is byte-for-byte unchanged" is substantiated by the diff.

  2. Regression test discipline: Every bug found in the 17-round review cycle has a targeted regression test that fails on the old code and passes on the fix (test_gather_embedded_wheels_prefers_cached_wheel_over_local_sdist, TestImageIdentity, ensure_remote_conda_test.py, decorator_env_compat_test.py, test_prebuilt_passb.py). This is CC-107 at its best.

  3. Comment quality on invariants: The storage-guard comments (# Storage guard: without a datastore...), the mode-guard comment (# We honor an explicitly-set local file even in mode="remote"), and the build-overlay design (# never installed into the env, so the shipped image stays byte-identical to the resolved conda manifest) are exactly the kind of non-obvious constraint documentation CC-028 asks for.


To Unblock

Nothing blocking — this is an APPROVE. The two suggestions above (extract a per-sdist helper from _run_deferred_sdist_builds, add subprocess timeouts) are quality improvements worth tracking but not merge blockers.


Scored against: CC-003, CC-005, CC-006, CC-011, CC-021 (Code Quality) | CC-027, CC-015, CC-028, CC-017 (Module Design) | CC-014, CC-048, CC-056, CC-058, CC-060 (Reliability) | CC-019, CC-031, CC-093, CC-107 (Maintainability)

@npow

npow commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator Author

Canon PR Review — Engineering Standards Scorecard

PR: generic conda sdist-build-deferral hooks + metaflow-prebuilt orchestration package (v0.3.0)
Reviewed against: Top 20 Tier 2 practices (Code Complete, Clean Code, APoSD, Refactoring, Release It!, DDIA, The Pragmatic Programmer, Working Effectively with Legacy Code, Design Patterns GoF)

Scores

Dimension Score Gate
Code Quality (naming, SRP, DRY, guard clauses, smells) 3/5
Module Design (deep modules, composition, docs, encapsulation) 4/5
Reliability (error handling, idempotency, circuit breakers, timeouts, degradation) 4/5
Maintainability (small steps, test before change, broken windows, regression tests) 4/5

Decision: APPROVE


Violations Found

Code Quality

CC-021 — Long Method (Source: Refactoring, Clean Code)

  • Location: metaflow-prebuilt/metaflow_extensions/prebuilt/plugins/conda/prebuilt_build_install.py_run_deferred_sdist_builds
  • Issue: The function is ~150 lines (excluding docstring), well past the 50-line threshold. It handles three distinct sub-phases: (1) the setuptools<82 pre-check and overlay staging, (2) local-file resolution from resolved_env.packages, and (3) the per-sdist build loop. The nested closures (_ensure_overlay, _overlay_install) do help, but they live inside the function body and add to its cognitive load.
  • Suggested improvement: Extract _ensure_build_overlay(build_env, python_bin, uv_bin) as a standalone helper that manages the overlay lifecycle and returns a context manager or a staging callable. This keeps the try/finally overlay cleanup contained and reduces _run_deferred_sdist_builds to the happy-path loop + local-file resolution.

Reliability

CC-058 — No explicit timeouts on subprocess calls (Source: Release It!, DDIA)

  • Location: prebuilt_build_install.py:_overlay_install and the per-sdist subprocess.run in _run_deferred_sdist_builds
  • Issue: Every subprocess.run in the Pass B build path uses capture_output=True, text=True with no timeout=. A stuck uv pip install (network stall, lock contention, or infinite build loop for a pathological sdist) will hang the container's RUN step indefinitely — Docker's build timeout is the only backstop, and that's typically 1 hour.
  • Violation excerpt:
    r = subprocess.run(args, capture_output=True, text=True)  # _overlay_install
    result = subprocess.run(build_args, capture_output=True, text=True, env=build_env)  # sdist build
  • Suggested improvement: Add a reasonable wall-clock timeout (e.g. 600s for a staging install, 1800s for an sdist compile — or make it configurable via an env var). The failure message already includes r.stderr, so the timeout exception message would be surfaced the same way:
    r = subprocess.run(args, capture_output=True, text=True, timeout=600)
    A subprocess.TimeoutExpired should be caught and re-raised as CondaException with the partial output.

What's Working Well

  1. CC-028 (Interface comments): Documentation quality throughout is exceptional. _gather_embedded_wheels documents all three cases (B/D/E) with exact conditions. _run_deferred_sdist_builds explains --no-build-isolation rationale, the build overlay lifetime, and residual limitations. _image_variant / _env_cache_key / _image_dedup_key all explain why the split exists and what each key dimension covers — a future maintainer can reason about the invariants without reading the implementation.

  2. CC-017 (Encapsulate what varies): The _CONDA_ARCH_TO_DOCKER_PLATFORM dict isolates cross-arch platform mapping in data (adding a new Linux arch is a one-liner). defer_pypi_sdist_build is threaded through a factory hook, keeping the prebuilt-specific opt-in out of core conda. _SCHEMA_VERSION = "2" as a named constant makes schema evolution explicit.

  3. CC-107 (Regression tests): Every bug fixed in the 27-commit history has a precise reproduction test that would fail on main and pass with the fix: ensure_remote_conda_test.py pins the truthy-vs-is not None edge, decorator_env_compat_test.py pins the isinstance widening, TestImageIdentity.test_bootstrap_resolves_env_when_images_keyed_by_variant pins the CPU/GPU collision fix. test_prebuilt_passb.py is particularly thorough — it manually constructs a tarball to test the pyproject build-requires extraction path without requiring a real build backend.


Scored against: CC-003, CC-005, CC-006, CC-011, CC-021 (Code Quality) | CC-027, CC-015, CC-028, CC-017 (Module Design) | CC-014, CC-048, CC-056, CC-058, CC-060 (Reliability) | CC-019, CC-031, CC-093, CC-107 (Maintainability)

npow pushed a commit that referenced this pull request Jun 18, 2026
…rity/robustness)

- python_requires>=3.10 (blocking): add a setup.py comment (both packages) that the
  floor mirrors the internal minimum (<3.10 dropped internally); no 3.10-only
  construct in the code — _parse_build_system_requires has tomllib/tomli/regex.
- Pass B shared overlay: _overlay_install now warns when a build dep is re-staged
  at a DIFFERENT version (surfaces the documented shared-overlay conflict instead
  of a silent miscompilation).
- deferred_sdists: comment that "url" is the guaranteed source (is_downloadable_url
  filter) and "filename" is only a best-effort hint (harmless if None).
- _gather_embedded_wheels Case D/E: comment that p.url is a placeholder
  (is_real_url=False + cached wheel => p.url is never fetched).
- Pass B overlay cleanup: comment that the on-failure rmtree + the finally rmtree
  double-remove is intentional and safe (ignore_errors=True).

No functional change to validated paths (comments + one diagnostic warning).
77 prebuilt unit tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@npow npow force-pushed the npow/generic-sdist-hooks branch from 78a123b to 1c79184 Compare June 18, 2026 21:30
npow pushed a commit that referenced this pull request Jun 18, 2026
…rity/robustness)

- python_requires>=3.10 (blocking): add a setup.py comment (both packages) that the
  floor mirrors the internal minimum (<3.10 dropped internally); no 3.10-only
  construct in the code — _parse_build_system_requires has tomllib/tomli/regex.
- Pass B shared overlay: _overlay_install now warns when a build dep is re-staged
  at a DIFFERENT version (surfaces the documented shared-overlay conflict instead
  of a silent miscompilation).
- deferred_sdists: comment that "url" is the guaranteed source (is_downloadable_url
  filter) and "filename" is only a best-effort hint (harmless if None).
- _gather_embedded_wheels Case D/E: comment that p.url is a placeholder
  (is_real_url=False + cached wheel => p.url is never fetched).
- Pass B overlay cleanup: comment that the on-failure rmtree + the finally rmtree
  double-remove is intentional and safe (ignore_errors=True).

No functional change to validated paths (comments + one diagnostic warning).
77 prebuilt unit tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@npow npow force-pushed the npow/generic-sdist-hooks branch from 1c79184 to 8adf4f3 Compare June 18, 2026 22:23
Nissan Pow and others added 20 commits June 19, 2026 00:55
Give core conda two generic, default-off primitives and move ALL
prebuilt-specific orchestration into metaflow-prebuilt, so conda has zero
knowledge of the prebuilt feature.

metaflow-netflixext (1.3.11 -> 1.3.12), +99/-6 in conda core:
- pip_resolver: `defer_pypi_sdist_build` records sdists ({name,version,url})
  instead of building them during resolution; default False.
- conda._create/create_for_step/create_for_name: `only_binary` installs only
  pre-built wheels (analogous to `pip/uv --only-binary=:all:`); default False.
- conda_environment: `_make_envs_resolver()` factory hook so a subclass can
  opt into deferral without duplicating init_environment.
Every change is a default-off param or a guarded branch; non-prebuilt
resolve/create behaviour is byte-for-byte unchanged. No `prebuilt`,
`deferred_builds.json`, or `schema_version` references in conda core.

metaflow-prebuilt (0.2.1 -> 0.3.0): owns the prebuilt protocol.
- prebuilt_conda_environment: `_make_envs_resolver` enables deferral;
  `_gather_embedded_wheels` materializes wheels for non-web (git/local)
  packages; `_generate_dockerfile` writes deferred_builds.json (schema "2")
  + COPYs it and deferred_wheels into the image.
- prebuilt_build_install: real two-pass installer (was a stub). Pass A
  `create(only_binary=True)`; Pass B builds deferred sdists via uv. Owns the
  /app/deferred_builds.json wire format + schema; _BUILD_INSTALL_MODULE ->
  the prebuilt namespace.

Restores the deepspeed (sdist) / transnetv2 (git) support that lived inline
in mli-metaflow-custom but was never ported to the OSS packages.

Validated: 57 prebuilt unit tests; edited netflixext imports with all flags
default-off; composition checks (deferred path writes schema-2 json + COPY;
no-deferred path unchanged; resolver opt-in; container reader).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…>=1.3.12, sanitize PYTHONPATH

- P1: _get_or_build_image scoped the resolver-wide deferred_sdists aggregate to
  the packages in THIS image's resolved_env. Previously every image received
  every environment's deferred sdists, so an unrelated step's image would run
  Pass B for an sdist it doesn't contain. (The inline impl had the same bug.)
- setup.py: bump metaflow-netflixext lower bound 1.3.11 -> 1.3.12 (the prebuilt
  code calls only_binary / defer_pypi_sdist_build, added in 1.3.12; a stale
  1.3.11 would TypeError at runtime).
- bootstrap_commands: sanitize host-inherited PYTHONPATH (save
  MF_ORIG_PYTHONPATH, drop bare host stdlib dirs via the awk pipeline) to match
  the nflx base CondaEnvironment; the prior bare passthrough let host stdlib
  shadow the conda stdlib and lost the env-escape restoration point.

(Codex also flagged --no-build-isolation dropping build-requires; that is a
deliberate trade-off — Pass A installs all runtime deps so the sdist setup.py
can import them, e.g. deepspeed needs torch at build time, and the builder has
no PyPI for isolation to fetch from. Matches the inline implementation.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…CI installs local netflixext

- P1 datastore: restore the `and self._storage` guard in conda.py lazy_fetch_packages
  and the _create conda section (the OSS migration dropped it; the inline impl has
  it). Without it, a non-local deploy datastore (e.g. s3) writes cache_info into the
  manifest; the prebuilt build container runs Conda(datastore="local", _storage=None),
  so "cache" wins the source race but the cache-download block is a no-op, leaving
  pypi packages unfetched -> "Local file expected" failure. With the guard, cacheless
  contexts fall through to the package's web URL. Non-prebuilt callers (storage
  present) are unchanged.
- CI: test-prebuilt.yml installs the checked-out metaflow-netflixext before
  metaflow-prebuilt, so the `>=1.3.12` dependency resolves to this PR's package
  (the new APIs) instead of unpublished PyPI, and CI tests the PR's netflixext.
- prebuilt_build_install: document the --no-build-isolation build-requires limitation
  (sdists needing build deps beyond setuptools/wheel); matches the inline impl,
  tracked as a follow-up.

Codex's sdist-hash finding was a false positive: the archive hash is carried in the
spec URL (url#hash), identical in the defer and non-defer paths.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…nt GPU cache-key follow-up

- P2 deferred-sdist identity (refines the round-1 per-image scoping): match the
  deferred records by exact sdist filename AND restrict to pypi packages, instead
  of (name, version). Adds "filename" (= _spec.filename) to the deferred record in
  pip_resolver. Prevents a conda package or a same-name/version package from a
  different source from pulling the wrong env's sdist into an image.
- P1 GPU/CPU image cache key: documented as a TODO follow-up on _image_cache_key.
  It's pre-existing (the inline impl has the same gap) and a proper fix needs the
  gpu dimension in the key + the registry image tag + bootstrap_commands (an
  image-tag schema bump) — out of scope for this port; tracked for a dedicated PR.

Codex's non-remote-steps finding was a false positive: bootstrap_commands is only
invoked from remote-compute submission paths; local steps activate conda via
_METAFLOW_CONDA_ENV / task_pre_step and never reach the "image not registered" raise.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…h; align Kaniko GCS path

- P1 embedded wheels (CRITICAL for git/local package support, e.g. transnetv2):
  lazy_fetch_packages skipped local files when mode=="remote" (`if self._mode !=
  "remote":`). The prebuilt build container runs Conda(mode="remote") and registers
  embedded wheels (for non-web-downloadable packages) as local files, which were
  then ignored -> fell through to the fake non-web URL -> build failed before Pass
  A. Honor an explicitly-set local file even in remote mode, guarded by
  os.path.isfile (no-op for normal remote tasks, which set no local files). This
  restores the guard the OSS migration dropped (the inline impl has it).
- P2 Kaniko GCS path: drop empty path components so blob_path (upload) stays
  aligned with context_url (built from bucket.rstrip("/")) for buckets like
  gs://bucket/prefix/ — otherwise the context is uploaded to prefix//<key> and
  Kaniko can't find it.
- P1 image identity (GPU/CPU base + target arch) and P2 cross-arch named-env
  resolution: documented as a tracked image-identity follow-up (TODOs on
  _image_cache_key and the alias-resolution path). Pre-existing; a proper fix is an
  image-tag schema bump across the ImageRegistry contract + bootstrap_commands,
  out of scope for this port. In a single deploy all remote steps share one
  arch + base, so these are latent edge cases.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e-hit safe); pkg/path fixes

- P1 cache-hit deferred sdists: previously the deferred-sdist list was sourced
  from the resolver's transient list, which is EMPTY on a cache hit (a re-deploy
  loads the env from the conda manifest and skips resolution) — so the rebuilt
  image emitted no deferred_builds.json while create_for_step(only_binary=True)
  still skipped the cached .tar.gz, silently producing an image missing that
  package. Now derive deferred sdists DIRECTLY from resolved_env.packages (pypi,
  url_format != ".whl", web-downloadable) in _get_or_build_image — correct on
  both fresh and cache-hit deploys, and naturally per-image. Removed the now-dead
  resolver-side deferred_sdists recording (pip_resolver, envsresolver) and the
  _envs_resolver_ref stash; the resolver still defers the *build* (keeps sdists
  as source specs in the resolved env).
- P2 named-env @{VAR}: apply sub_envvars_in_envname before env_id_from_alias so
  'env-@{NAME}' resolves the concrete alias, as standard conda does.
- P2 Kaniko S3 prefix: normalize trailing slash (align with the GCS fix and
  context_url's bucket.rstrip("/")).
- P2 python_requires: bump to >=3.10 to match metaflow-netflixext>=1.3.12 (avoid
  pip selecting this wheel on 3.8/3.9 then failing the dependency).
- Image identity (GPU/arch) + cross-arch named-env: tracked as documented
  follow-ups (image-tag schema bump).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…omplete named-env @{VAR} fix

- P1: Pass B (_run_deferred_sdist_builds) now prefers the local sdist artifact
  that Pass A's lazy_fetch already downloaded and hash-verified, instead of
  re-downloading the raw URL (which carries no creds/hash and could differ from
  the resolved env). Falls back to the URL only when no local file is present,
  so public-PyPI behavior is unchanged.
- P2 named-env @{VAR}: bootstrap_commands now applies sub_envvars_in_envname
  before _named_state_key, matching _build_prebuilt_images. Without it a build
  stored under the substituted alias would not be found at bootstrap ("image not
  registered") for names like 'env-@{NAME}'.

Cross-arch named-env resolution remains the documented image-identity follow-up
(env_id_from_alias resolves for the deploy host arch).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…odeBuild region, mutable-URL hash)

These are P2 edge cases in non-default code paths, none affecting the primary
Netflix newt path or the validated public targets:
- buildx cross-arch --platform (folds into the image-identity follow-up),
- ECR+CodeBuild region precedence (codebuild-region follow-up),
- deferred-sdist identity for *mutable* URLs (the sha256 is carried on the spec
  URL, so identity is stable per-URL; mutable URLs are a PyPI anti-pattern).
Added in-code TODO/NOTEs for traceability rather than expanding this PR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…le (lazy_fetch)

The round-4 embedded-wheel fix added `and os.path.isfile(local_path)` to the
lazy_fetch local-file check. That regressed the conda integration suite
(tests/plugins/conda/create_env_test.py), which registers a non-web package's
local file at a tempfile.mktemp() path that is not physically created, then
expects lazy_fetch to honor it — with the isfile guard the file is skipped and
resolution errors "No non-web source for non web-downloadable package".

Revert to the original truthy `if local_path:` check (keeping the mode-guard
removal so remote-mode tasks still honor explicitly-registered local files).
The prebuilt embedded-wheel case is unaffected: those wheels are COPY'd into the
build context and registered with a real on-disk path, so the truthy check
honors them.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… for Pass B; doc cross-arch limit

- P2 deferred-sdist hash: the deferred sdist spec is now constructed with
  hashes={url_format: parse_result.hash} (matching the wheel branch), so its
  sha256 contributes to the resolved-env full_id and Pass A/B can verify the
  bytes — instead of full_id seeing "filename#None" (mutable/direct-URL sdists
  could otherwise reuse the same env/image identity for different content).
- P2 Pass B PATH: the deferred-sdist build subprocess now runs with <env>/bin
  prepended to PATH, so a build that invokes tools Pass A installed into the env
  (compilers, cmake/ninja, console scripts) finds them rather than only the
  Docker build image's PATH.
- P2 cross-arch sdist deferral: documented as a known limitation. pip's
  --platform (used for cross-arch resolution) requires --only-binary=:all:, so a
  deferred sdist-only dep can't be resolved cross-arch (e.g. arm64->linux-64);
  the common same-arch workbench->Titus path is unaffected. A proper fix needs
  arch-independent sdist metadata extraction (a follow-up).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…step deferral

- P2: the deferred-sdist derive now also requires `not p.local_file(".whl")`, so
  a package that kept a source url_format but already has a built wheel
  (pylock/uv/legacy cache) is installed directly in Pass A rather than being
  reclassified as a deferred sdist and rebuilt in Pass B.
- P2: documented that sdist deferral applies to all of a flow's steps; a local /
  non-remote step builds the deferred sdist at create time from the lazy-fetched
  source (works for buildable sdists; same build-requires limitation as Pass B).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…g); document pylock limit

The round-9 guard (`and not p.local_file(".whl")`) excluded source-url_format
specs that already have a built wheel from the deferred list, but those packages
are web-downloadable so _gather_embedded_wheels doesn't embed them either —
leaving them neither deferred nor embedded, i.e. silently missing from the image.
That silent failure is worse than the round-8 behavior (defer -> Pass B rebuilds
the sdist, which fails loudly if it can't build). Revert the guard.

This edge only arises for pylock/conda-lock-resolved pypi sdists (source
url_format + an already-built wheel); the supported pip `@pypi` defer path never
produces such specs (it doesn't build sdists on the deploy machine). Properly
reusing the deploy-built wheel needs the container's only_binary install to
prefer an embedded .whl over the source url_format — documented as a follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…nd 9/10 edge)

A pypi spec can keep a source url_format (".tar.gz") while a resolver
(pylock/conda-lock) already built a wheel for it. Previously such a package was
either deferred (rebuilt from source in Pass B — redundant, could diverge) or,
after the round-9 guard, dropped entirely (silently missing). Handle it properly:

- deferred_sdists derive: defer only web-downloadable source packages with NO
  built wheel (add `local_file(".whl") is None and cached_version(".whl") is None`).
- _gather_embedded_wheels: broaden candidates to packages that need a built wheel
  embedded — non-web (git/local) AND web-source-with-a-cached-wheel — fetch the
  wheel on the deploy machine (where _storage is live, so cached_version(".whl")
  resolves and local_file(".whl") is populated), embed the .whl (never the source),
  and record url_format=".whl".

No _create change needed: _ALL_PYPI_FORMATS lists ".whl" first, so _create's
pypi_paths loop picks the embedded .whl over a source url_format and only_binary
keeps it. _register_embedded_wheels is unchanged (it add_local_file's the
record's url_format, now always ".whl").

Verified by an A-E classification trace: A web-fetched, B/D/E embedded, C
deferred (deepspeed), conda excluded — disjoint and complete; create_env_test
(cached_version None + local_file(".whl") set) is unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e deferring (Codex r11)

Two fixes for the deferred-sdist / cache interaction in --environment=prebuilt:

P1 (build-container storage): prebuilt_build_install runs inside the Docker
build container, which has the resolved-env manifest but NOT the deploy
machine's package cache. With the OSS-default CONDA_IGNORE_CACHING_DATASTORES=[],
Conda(datastore_type="local") builds a LocalStorage over that empty cache, so
lazy_fetch prefers it and fails instead of downloading conda/wheel packages
from their web URLs. Force the container's "local" datastore to non-caching by
setting METAFLOW_CONDA_IGNORE_CACHING_DATASTORES=["local"] before importing
metaflow (from_conf freezes config at import; list configs parse as JSON), so
_storage is None and the storage guards fall through to web + embedded wheels.
Proven: Conda(local)._storage is LocalStorage without the var, None with it.

P2 (probe cached wheels before deferring): the defer branch previously removed
deferred keys from to_build_pkg_info BEFORE build_pypi_packages, skipping its
cache probe. A package with an already-built wheel in the cache got deferred +
rebuilt in the container instead of reused. Now collect the deferred keys and
pass them to build_pypi_packages via defer_keys: the cache probe still runs
(attaching cached_version('.whl') so the prebuilt derive embeds the wheel and
excludes it from deferred_sdists), and only the expensive build is skipped for
sdists with no cached wheel. A no-storage fallback preserves the historical
defer path. Default (defer off) is byte-for-byte unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e base-image copies (Codex r12)

_generate_dockerfile previously only wrote + COPYed deferred_builds.json when
the env had deferred sdists or embedded wheels. But the in-container installer
(prebuilt_build_install) treats ANY file at the absolute path
/app/deferred_builds.json as current. If the configured base image
(METAFLOW_PREBUILT_BASE_IMAGE — arbitrary user config, possibly a prior prebuilt
image) already carries a stale hand-off there, a build with no deferred work
would consume those stale records (wrong env) or fail schema validation.

Always write the current env's hand-off (empty sdists/wheels when there is
nothing deferred) and always COPY it, so it overwrites any inherited copy. An
empty hand-off is a valid no-op for the installer. Embedded-wheels dir COPY
stays gated on having wheels; a stale inherited wheels dir is harmless because
the current (empty) hand-off lists no wheels to read.

Updates test_generate_dockerfile to assert the hand-off is always present
(empty) + COPYed, and adds a test for the deferred sdists + embedded wheels case.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
_gather_embedded_wheels embeds a BUILT WHEEL for pypi packages the build
container can't fetch/build. It called lazy_fetch_packages on the package spec
itself, then searched for a local .whl. But lazy_fetch materializes only a
package's single most-preferred source, and a local file outranks a cached
version. A source-format package that carries a local source archive (cases D/E
— e.g. a local/git sdist registered during resolution) on a resolve where the
wheel is a cache HIT therefore had its sdist materialized and the cached wheel
left unfetched, so the wheel search found nothing and the build failed loudly on
an otherwise valid environment.

Fetch the wheel through a dedicated spec instead:
  - url_format == ".whl" (case B, the package IS the wheel): fetch it directly.
  - source package with a cached wheel (D/E): build a synthetic wheel-only,
    cache-only PypiPackageSpecification (no source local file) so the cached
    ".whl" is the preferred/only source and is always materialized.
  - non-web source with no cached wheel: fall through to the existing loud
    failure (it can be neither fetched nor built in the container).

Adds a behavioral regression test (test_gather_embedded_wheels_prefers_cached_
wheel_over_local_sdist) with a faithful lazy_fetch stub; it fails on the old
logic (CondaException) and passes on the fix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e setuptools<82 in Pass B (Codex r14)

P2-A (pip_resolver): a deferred sdist's source spec becomes the resolved-env
identity (it is not replaced by a built wheel). For a downloadable sdist with no
content hash (e.g. a direct archive URL with no #sha256), the env full_id would
depend on `filename#None`, so different bytes at the same URL could silently
reuse the same prebuilt image. Only defer sdists that carry a hash; leave
hashless ones to build_pypi_packages, which builds the wheel and derives a
stable content-hash identity (the pre-defer behavior).

P2-B (prebuilt_build_install): Pass B builds sdists with --no-build-isolation,
so the build backend is the env's OWN setuptools. setuptools>=82 dropped
pkg_resources, which legacy setup.py files import — the deploy-side builder
constrains setuptools<82 for exactly this, but Pass B only pinned <82 when
setuptools was missing. Make the pre-check version-aware: require an importable
wheel + setuptools major < 82, downgrading/installing setuptools<82 otherwise.
(Verified against conda-forge's python 3.14 env, which ships setuptools 82.0.1.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…te the runtime env (Codex r15)

The r14 fix downgraded setuptools<82 directly into the target env so Pass B's
--no-build-isolation builds could use it. But that mutated the RESOLVED runtime
environment: an env that resolved/pinned setuptools>=82 (the conda-forge default
ships 82.x) would be silently downgraded while the image still carries the
original .metaflowenv full_id, so tasks would run with a different setuptools
than the manifest claims.

Treat the <82 pin as transient: when a setuptools is already present (and the
probe is unsatisfied), record its version, build with <82, then reinstall the
resolved version after the builds — the built wheels are already installed and
don't depend on the build-time setuptools. Restoration failure raises (refuse to
ship an env that disagrees with its manifest) rather than degrading silently.
A previously-absent setuptools is just added (nothing resolved to preserve).

Factors the uv/pip install invocation into a local _install_args helper.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…, not is-not-None) (Codex r16)

_ensure_remote_conda gated the datastore-installer path on
`CONDA_REMOTE_INSTALLER is not None`. An EMPTY string ("") is not None, so it
took _install_remote_conda — which requires a datastore _storage to fetch the
installer. The prebuilt build container deliberately sets
METAFLOW_CONDA_REMOTE_INSTALLER="" to disable the datastore installer and runs
with _storage=None (it self-installs micromamba from the web), so the
`is not None` check routed it to _install_remote_conda and raised
("Downloading conda remote installer ... is unimplemented"), failing every
generated prebuilt image build.

Widen the check to truthy so an empty installer correctly takes the
self-install (_ensure_micromamba) path. A configured (non-empty) installer is
unchanged. This restores an enabler-fix the OSS migration dropped (same class as
the storage guard); adds ensure_remote_conda_test.py guarding both branches
(it fails on the old is-not-None check, passes on the fix).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ver mutate the env (Codex r17)

Supersedes the r14b/r15 approach (install setuptools<82 into the env, then
reinstall the resolved version). r17 correctly flagged that the restore used
pip/uv, layering a PyPI setuptools over the conda-provided one — so the shipped
image no longer matched the resolved conda manifest (wrong provenance / hash).

Root fix: don't touch the env at all. When the env's setuptools is missing or
>=82, stage setuptools<82 + wheel into a temp dir and prepend it to the build
subprocess' PYTHONPATH (a build overlay). Pass B's --no-build-isolation build
then imports setuptools<82 from the overlay while runtime deps (e.g. torch)
still resolve from the env's site-packages; the env's own conda setuptools is
untouched, so the image stays byte-identical to its manifest full_id. The
overlay is removed in a finally after the builds (transient build tooling,
never shipped). Collapses the three setuptools rounds (r14b/r15/r17) into one
no-mutation design.

Note: whether uv/pip honor the overlay PYTHONPATH for the --no-build-isolation
build backend is best confirmed by the L3 image build (deepspeed); the design
is correct-by-construction (no env mutation) regardless.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Nissan Pow and others added 8 commits June 19, 2026 00:55
… only TYPE=="conda" (L3 finding)

L3 ground-truth: `--environment=prebuilt` + @pypi/@conda failed immediately with
"The @pypi decorator requires --environment=conda". The netflixext step/flow
decorators gated on `environment.TYPE != "conda"`, which rejects
PrebuiltCondaEnvironment (a CondaEnvironment subclass with TYPE="prebuilt") —
so the prebuilt environment was unusable with @pypi/@conda at all. Nothing
registered "prebuilt" as a supported env; static review (the whole Codex cycle)
missed this because it is a decorator/registration gap, not a logic bug in the
changed files.

Widen the three checks (PackageRequirementStepDecorator.step_init,
CondaEnvInternalDecorator/StepRequirementMixin step_init, and the flow
PackageRequirementFlowDecorator.flow_init) from `TYPE != "conda"` to
`not isinstance(environment, CondaEnvironment)`. This accepts any conda-based
environment (incl. downstream subclasses like prebuilt) WITHOUT netflixext
hardcoding "prebuilt" (upstream stays agnostic of the downstream package); a
non-conda env (e.g. nflx, which subclasses MetaflowEnvironment) is still
rejected. Verified end to end: the prebuilt flow then resolves + builds images.

Adds decorator_env_compat_test.py (conda subclass accepted, plain conda
accepted, non-conda rejected); it fails on the old TYPE check, passes on the fix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
_run_deferred_sdist_builds (Pass B) logs progress via _echo(..., timestamp=False),
but echo_always() forwards **kwargs to click.secho() -> style(), which has no
'timestamp' parameter -> TypeError. Pass B would crash on its first progress line
the moment it ran on a deferred sdist with no embedded/cached wheel. It never
surfaced in the Netflix flow because the resolver constrains setuptools<82 and the
S3 cache holds built wheels, so sdists get embedded (Pass A) and Pass B never fires.
Fix: _echo() drops the unsupported 'timestamp' kwarg (nl=False stays valid for secho).

Add tests/unit/test_prebuilt_passb.py (no prior coverage of the in-container build):
- _SETUPTOOLS_CHECK probe: exit 0 (<82), 4 (>=82), 3 (missing)
- _run_deferred_sdist_builds compiles + installs a real deferred sdist
- the setuptools<82 overlay rescues a deliberately-incomplete build toolchain
- no-op when the deferred-sdist list is empty (the normal Netflix case)

Found by running the new tests; verified against the real metaflow echo_always/secho
signatures (codex-confirmed). Full prebuilt unit suite: 65 passed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…uires

Two latent-bug fixes flagged by TODO/KNOWN-LIMITATION in the prebuilt code.

1) image-identity (was: _image_cache_key keyed on req_id/full_id only)
   Two steps that share an env but need a DIFFERENT base image (CPU vs GPU via
   @resources(gpu=1)) or a different arch collided on one cached image AND one
   registry tag — a GPU build could overwrite the CPU image's tag, and a GPU step
   could run on a CPU base. Split the identities:
     - _env_cache_key(env_id) = req_id_full_id (base-INDEPENDENT) keys
       _prebuilt_env_paths + the runtime bootstrap lookup (the conda env/env_path
       are identical across bases; bootstrap can't know the base at runtime).
     - _image_variant(base_image, arch) + _image_dedup_key key _prebuilt_images
       (in-process dedup) AND are appended to the registry push/pull tags via
       _tag_with_variant, so distinct-base/arch images get distinct tags.
     - bootstrap_commands validates against _prebuilt_env_paths (env key), not
       _prebuilt_images (now keyed by env+variant, which bootstrap can't rebuild).
   PREBUILT_IMAGE_SCHEMA_VERSION bumped v28 -> v29.

2) Pass B build-requires (was: KNOWN LIMITATION — sdists needing Cython/maturin/
   setuptools_scm/etc. failed under --no-build-isolation)
   _build_requires_from_sdist() reads [build-system].requires from the sdist's
   pyproject.toml (tomllib/tomli or a regex fallback) and stages them into the
   build overlay alongside the controlled setuptools<82 + wheel before building.
   Overlay creation refactored into a lazy _ensure_overlay/_overlay_install, now
   fully inside try/finally so it's always cleaned up. (The overlay is shared
   across a single image's sdists — consistent with --no-build-isolation; not
   per-sdist isolation.)

Tests: tests/unit/test_prebuilt_conda_environment.py (TestImageIdentity: variant
distinguishes base/arch, dedup key + tag don't collide for a shared env, bootstrap
resolves via the env key when only a variant image key exists) and
tests/unit/test_prebuilt_passb.py (build-requires extracted + staged; an sdist
whose setup.py imports a build-only dep builds). 74 prebuilt unit tests pass.

Both codex-reviewed CORRECT. image-identity validated end-to-end on real Titus:
PrebuiltImageIdentityFlow/2 — a CPU step and a GPU step that share full_id
67c32f7f got DISTINCT images (CPU: nvidia-smi absent; GPU: nvidia-smi present),
proving the collision is gone.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…n TODOs)

Resolves the two remaining open TODOs in the prebuilt code.

1) TODO(image-identity) cross-arch build platform (was: buildx built for the
   builder's DEFAULT platform, so an arm64 deploy of a linux-64 step baked an
   arm64 image whose in-container arch_id() mismatched the resolved manifest).
   build_and_push() gained `target_platform` (Optional, default None = builder
   default / unchanged same-arch). The caller derives it from the resolved env's
   arch (_docker_platform_for_arch: linux-64->linux/amd64, linux-aarch64->
   linux/arm64, else None). buildx/docker emit `--platform`; kaniko emits
   `--custom-platform`; CodeBuild + the Netflix NewtBuildService accept it but
   build on a fixed remote arch (Cloudbuild=linux-64), so they ignore it. The
   kwarg is on EVERY build_and_push override (incl. NewtBuildService) so the
   unified call can't TypeError.

2) TODO(codebuild-region): a CodeBuild project in a different region than ECR
   built in the wrong region. EcrRegistry.push_credentials now exposes a
   `codebuild_region` (from METAFLOW_PREBUILT_CODEBUILD_REGION); CodeBuildService
   prefers it (codebuild_region -> region -> env), falling back to the ECR region
   when unset (unchanged behavior).

Also corrects a stale docstring: _run_deferred_sdist_builds still described the
build-deps thing as a KNOWN LIMITATION after it was implemented; now documents the
actual behavior + the residual shared-overlay/binary-tool caveats.

Tests: buildx --platform (set / default-absent), _docker_platform_for_arch mapping.
77 prebuilt unit tests pass. codex-reviewed (kaniko flag spelling fixed per review).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…rity/robustness)

- python_requires>=3.10 (blocking): add a setup.py comment (both packages) that the
  floor mirrors the internal minimum (<3.10 dropped internally); no 3.10-only
  construct in the code — _parse_build_system_requires has tomllib/tomli/regex.
- Pass B shared overlay: _overlay_install now warns when a build dep is re-staged
  at a DIFFERENT version (surfaces the documented shared-overlay conflict instead
  of a silent miscompilation).
- deferred_sdists: comment that "url" is the guaranteed source (is_downloadable_url
  filter) and "filename" is only a best-effort hint (harmless if None).
- _gather_embedded_wheels Case D/E: comment that p.url is a placeholder
  (is_real_url=False + cached wheel => p.url is never fetched).
- Pass B overlay cleanup: comment that the on-failure rmtree + the finally rmtree
  double-remove is intentional and safe (ignore_errors=True).

No functional change to validated paths (comments + one diagnostic warning).
77 prebuilt unit tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This branch was rebased onto main, which released metaflow-netflixext 1.3.12
WITHOUT the generic sdist-build hooks in this PR. Bump to 1.3.13 so the new
only_binary / defer_pypi_sdist_build APIs ship under a fresh version, and pin
metaflow-prebuilt's floor (and the CI install comment) to match. Also fixes a
stale `metaflow_extensions.nflx` doc comment left by the namespace move (#93).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Keep newly added OSS tests in the netflixext namespace so pure OSS CI does not require an internal package providing metaflow_extensions.nflx.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
metaflow-netflixext 1.3.13 is already published from the nflx_compat shim release, so the generic sdist/prebuilt hooks need a fresh netflixext version for the next release.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@npow npow force-pushed the npow/generic-sdist-hooks branch from c4777c6 to 8fdae0f Compare June 19, 2026 00:55
@npow npow merged commit c47acf2 into main Jun 19, 2026
37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant