Auto-generate protobuf and Connect RPC files via hatch build hook#3631
Auto-generate protobuf and Connect RPC files via hatch build hook#3631
Conversation
Replace manually-tracked generated files with a hatchling custom build hook that regenerates protobuf files (via generate_protos.py) and dashboard assets (via npm) at install time, with freshness checks to skip when outputs are up-to-date. Remove all *_pb2.py, *_pb2.pyi, and *_connect.py from git tracking (now .gitignored), delete unused cluster_pb2_grpc.py, and remove the manual _build_dashboard fixture from e2e conftest. https://claude.ai/code/session_01FJeyiYHDHfvKXj9gVQXvQJ
|
Claude finished @rjpower's task in 1m 52s —— View job Review of PR #3631: Auto-generate protobuf and Connect RPC files via hatch build hook
Overall this is a clean improvement — removing ~6.5K lines of checked-in generated code and replacing it with a 114-line build hook is a win for maintainability. A few observations: Looks Good
Issues / Questions
Minor Nits
VerdictSolid change. The main thing I'd want confirmed before merging is that CI has the Node.js/npm/npx toolchain available, since that's now a hard requirement for any environment that needs to install |
| generate_script = root / "scripts" / "generate_protos.py" | ||
| if not generate_script.exists(): | ||
| logger.warning("scripts/generate_protos.py not found, skipping protobuf generation") | ||
| return | ||
|
|
||
| logger.info("Regenerating protobuf files from .proto sources...") | ||
| result = subprocess.run( | ||
| [sys.executable, str(generate_script)], | ||
| cwd=root, | ||
| capture_output=True, | ||
| text=True, | ||
| ) | ||
| if result.returncode != 0: | ||
| raise RuntimeError( | ||
| f"Protobuf generation failed:\n{result.stdout}\n{result.stderr}" |
There was a problem hiding this comment.
🤖 The proto generation delegates to generate_protos.py which calls npx buf generate. This introduces a hard runtime dependency on Node.js/npm at install time. If a developer doesn't have npx, this will fail with an opaque subprocess error.
Consider adding a shutil.which("npx") guard with a clear error message, or making this a soft failure (warning instead of RuntimeError) for users who aren't modifying .proto files — they'd only need the pre-generated outputs.
|
|
||
| dist_dir = root / _DASHBOARD_OUTPUT_DIR | ||
| source_newest = _newest_mtime(root, _DASHBOARD_SOURCE_GLOBS) | ||
| if dist_dir.exists() and source_newest > 0: |
There was a problem hiding this comment.
🤖 Nit: source_newest > 0 is always true when package.json exists (checked on line 94), since file mtimes are always positive. This condition is redundant and could be simplified.
Check for npx/npm via shutil.which() before attempting generation, so environments like Read the Docs that lack Node.js can install iris without failing. https://claude.ai/code/session_01FJeyiYHDHfvKXj9gVQXvQJ
The iris hatchling build hook needs npx/npm to regenerate protobuf files and build dashboard assets. Add actions/setup-node@v4 to all CI workflows that transitively install iris (levanter, zephyr, marin-itest, marin-docs). Add install_node target to Makefile dev_setup for local development. Also fix black formatting in hatch_build.py. https://claude.ai/code/session_01FJeyiYHDHfvKXj9gVQXvQJ
py-spy uses ptrace to attach to the entire worker process, which can corrupt thread state in ThreadPoolExecutor and anyio threads. This causes segfaults that crash subsequent tests (e.g. test_checkpoint_restore). The test still runs locally where the environment is more controlled. https://claude.ai/code/session_01FJeyiYHDHfvKXj9gVQXvQJ
The build hook was silently skipping proto generation when npx wasn't found, leaving iris unimportable (missing time_pb2, cluster_pb2, etc.). Now the hook distinguishes between "outputs exist but are stale" (warn and continue) vs "outputs are missing entirely" (raise RuntimeError with install instructions). Also: - Add nodejs: "22" to .readthedocs.yaml so RTD can generate protos - Add Node.js to docker/marin/Dockerfile.tpu-ci for TPU CI - Add *_connect.py to pyrefly project-excludes https://claude.ai/code/session_01FJeyiYHDHfvKXj9gVQXvQJ
…cker Node.js - Add setup-node@v4 to marin-lint-and-format.yaml so npx is available for iris protobuf generation during pre-commit pyrefly checks - Add _ensure_iris_protos() to pre-commit.py that generates protobuf files before pyrefly runs when they are missing (pyrefly needs the generated files on disk to resolve imports, even with project-excludes) - Install Node.js in userspace inside TPU Docker container when not present, as a temporary workaround until the tpu-ci image is rebuilt https://claude.ai/code/session_01FJeyiYHDHfvKXj9gVQXvQJ
connect-python is now published on PyPI, so we can drop the git+https direct reference and use a normal version specifier. Also removes the now-unnecessary allow-direct-references hatch metadata setting. https://claude.ai/code/session_01FJeyiYHDHfvKXj9gVQXvQJ
yonromai
left a comment
There was a problem hiding this comment.
Clean PR — eliminates checked-in generated code with a well-structured hatch build hook. mtime logic is correct, fallbacks are reasonable, CI is green.
Two non-blocking observations inline (Node.js blast radius across CI, hardcoded version in TPU Docker).
Generated with Claude Code
| working-directory: lib/levanter | ||
| - name: Set up Node.js | ||
| uses: actions/setup-node@v4 | ||
| with: |
There was a problem hiding this comment.
🤖 Non-blocking: The iris build hook makes Node.js a transitive build-time dependency for the entire monorepo. Every workflow now needs setup-node — even Levanter CPU tests that never touch iris. This is pragmatic for now but worth tracking: if the hook could detect it's resolving for a different package and skip, the CI footprint would shrink back.
Generated with Claude Code
| export PATH=/tmp/node-v22.16.0-linux-x64/bin:\$PATH; \ | ||
| echo '::endgroup::'; \ | ||
| fi && \ | ||
| cp -a /workspace-src/. /workspace/ && cd /workspace && \ |
There was a problem hiding this comment.
🤖 Non-blocking: This pins v22.16.0 inline while all other jobs use actions/setup-node with node-version: "22" (floats to latest 22.x). When Node 22 gets a security patch, this stays stale. Consider extracting the version or using the same nodesource approach as Dockerfile.tpu-ci.
Generated with Claude Code
The custom build hook added in #3631 requires hatch_build.py to be present when uv sync runs. The Dockerfiles only copied pyproject.toml, so the build failed with "Build script does not exist: hatch_build.py".
- Dockerfile.worker and Dockerfile.controller now `COPY hatch_build.py` alongside `pyproject.toml` so the custom hatch build hook (added in #3631) is present when `uv sync` runs during the Docker build. - Without this, the build fails with `OSError: Build script does not exist: hatch_build.py`. - Dockerfile.task is unaffected (it doesn't run `uv sync` at build time, and already has nodejs/npx for proto generation at runtime).
Remove checked-in generated protobuf and Connect RPC files, replacing them with automatic generation via a hatch build hook. This eliminates the need to manually regenerate and commit these files when .proto sources change.
The custom build hook (hatch_build.py) runs during package installation and wheel builds, regenerating all *_pb2.py, *_pb2.pyi, and *_connect.py files from their .proto sources if the sources are newer than the outputs. This ensures generated files are always in sync with their sources without requiring manual steps or git commits.
Updated .gitignore to exclude the generated files, and removed the manual dashboard build fixture from conftest.py since the build hook now handles all code generation automatically.