Auto-generate protobuf and Connect RPC files via hatch build hook by rjpower · Pull Request #3631 · marin-community/marin

rjpower · 2026-03-13T22:00:27Z

Remove checked-in generated protobuf and Connect RPC files, replacing them with automatic generation via a hatch build hook. This eliminates the need to manually regenerate and commit these files when .proto sources change.

The custom build hook (hatch_build.py) runs during package installation and wheel builds, regenerating all *_pb2.py, *_pb2.pyi, and *_connect.py files from their .proto sources if the sources are newer than the outputs. This ensures generated files are always in sync with their sources without requiring manual steps or git commits.

Updated .gitignore to exclude the generated files, and removed the manual dashboard build fixture from conftest.py since the build hook now handles all code generation automatically.

Replace manually-tracked generated files with a hatchling custom build hook that regenerates protobuf files (via generate_protos.py) and dashboard assets (via npm) at install time, with freshness checks to skip when outputs are up-to-date. Remove all *_pb2.py, *_pb2.pyi, and *_connect.py from git tracking (now .gitignored), delete unused cluster_pb2_grpc.py, and remove the manual _build_dashboard fixture from e2e conftest. https://claude.ai/code/session_01FJeyiYHDHfvKXj9gVQXvQJ

claude · 2026-03-13T22:00:45Z

Claude finished @rjpower's task in 1m 52s.

Review of PR #3631: Auto-generate protobuf and Connect RPC files via hatch build hook

- Read PR changes and understand the approach
- Review hatch_build.py build hook implementation
- Review pyproject.toml and .gitignore changes
- Review conftest.py changes
- Post review feedback

Overall this is a clean improvement — removing ~6.5K lines of checked-in generated code and replacing it with a 114-line build hook is a win for maintainability. A few observations:

Looks Good

- Incremental rebuild logic via _newest_mtime / _oldest_mtime is correct and avoids unnecessary rebuilds.
- .gitignore patterns match the three output glob patterns exactly.
- Removing the _build_dashboard fixture from conftest.py is the right call since the hook now handles it.
- Deleting cluster_pb2_grpc.py is safe: it was legacy (not generated by buf.gen.yaml) and nothing imports it.

Issues / Questions

- npx / npm required at install time: The proto generation path calls generate_protos.py, which runs npx buf generate, and the dashboard build calls npm ci + npm run build. This means a plain pip install -e . or uv sync now silently requires Node.js/npm on the system. If a user doesn't have npm, the build hook will fail with an unclear subprocess error. Consider:
  - Logging a clear message if npm/npx is not found (e.g., a shutil.which("npx") check).
  - Or making proto generation a soft failure with a warning rather than a hard RuntimeError, since developers who aren't modifying .proto files would never need to regenerate.
- First clone / CI cold start: After this PR, a fresh clone has no *_pb2.py files. The _needs_rebuild check will correctly trigger generation (since _oldest_mtime returns 0.0 when no outputs exist). But this means CI and new contributors must have npx and the buf plugin available. Is that already guaranteed in CI? Worth verifying.
- Race condition in parallel uv sync: If multiple processes (e.g., pytest-xdist workers) trigger uv sync concurrently, nothing serializes the builds. The old conftest had fcntl.flock protection for the dashboard build; the hatch hook has no such locking. Hatch likely serializes hook execution within a single install, but if two installs run in parallel (e.g., in a CI matrix), the npm ci calls could conflict. Low risk but worth noting.
- Missing *_pb2_grpc.py in .gitignore: The old codebase had cluster_pb2_grpc.py checked in. While buf.gen.yaml no longer generates grpc files, if someone ever re-adds a grpc plugin, those files would get committed. Minor, but adding *_pb2_grpc.py to .gitignore for completeness wouldn't hurt.
- _DASHBOARD_OUTPUT_DIR rebuild check (lines 100-102): The condition source_newest > 0 is always true if any source file exists (mtimes are always positive). This isn't a bug, but the check is redundant: _newest_mtime returns 0.0 only if no files match the glob, which would mean package.json doesn't exist, and that is already checked on line 94. Consider simplifying.
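For the race-condition point, one possible mitigation is an advisory file lock around the build step, in the spirit of the fcntl.flock protection the old conftest reportedly had. The sketch below is hypothetical: `build_lock` and the lock file location are illustrative names, not part of the PR, and fcntl is only available on POSIX systems.

```python
import fcntl
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def build_lock(lock_path: Path) -> Iterator[None]:
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    lock_path.parent.mkdir(parents=True, exist_ok=True)
    with open(lock_path, "w") as handle:
        # Blocks until any other process holding the lock releases it.
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

A hook could wrap the npm invocation in `with build_lock(root / ".build.lock"):` so that concurrent installs of the same checkout run the build one at a time. The lock is advisory, so it only helps if every builder takes it.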

Minor Nits

hatch_build.py:80-88 — The subprocess.run for proto generation doesn't forward stdout/stderr to the logger on success. If the user runs uv sync -v, they won't see buf generate output unless it fails. Consider logger.debug(result.stdout) on success for debuggability.
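A minimal sketch of the suggestion above, assuming the hook uses a module-level logger and a helper that wraps subprocess.run. The `run_generator` name and the logger name are illustrative, not taken from hatch_build.py.

```python
import logging
import subprocess
from typing import Optional

logger = logging.getLogger("hatch_build_sketch")


def run_generator(cmd: list[str], cwd: Optional[str] = None) -> subprocess.CompletedProcess:
    """Run a generation command, raising on failure and logging output on success."""
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"Protobuf generation failed:\n{result.stdout}\n{result.stderr}")
    # On success, surface the captured output only at debug level so that
    # normal installs stay quiet while verbose runs can still see it.
    if result.stdout:
        logger.debug("generator stdout:\n%s", result.stdout)
    if result.stderr:
        logger.debug("generator stderr:\n%s", result.stderr)
    return result
```

Whether `uv sync -v` actually shows these messages depends on how the build backend configures logging, which this sketch does not address.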

Verdict

Solid change. The main thing I'd want confirmed before merging is that CI has the Node.js/npm/npx toolchain available, since that's now a hard requirement for any environment that needs to install iris from source. The code itself is well-structured.

claude · 2026-03-13T22:02:49Z

+        generate_script = root / "scripts" / "generate_protos.py"
+        if not generate_script.exists():
+            logger.warning("scripts/generate_protos.py not found, skipping protobuf generation")
+            return
+
+        logger.info("Regenerating protobuf files from .proto sources...")
+        result = subprocess.run(
+            [sys.executable, str(generate_script)],
+            cwd=root,
+            capture_output=True,
+            text=True,
+        )
+        if result.returncode != 0:
+            raise RuntimeError(
+                f"Protobuf generation failed:\n{result.stdout}\n{result.stderr}"


🤖 The proto generation delegates to generate_protos.py which calls npx buf generate. This introduces a hard runtime dependency on Node.js/npm at install time. If a developer doesn't have npx, this will fail with an opaque subprocess error.

Consider adding a shutil.which("npx") guard with a clear error message, or making this a soft failure (warning instead of RuntimeError) for users who aren't modifying .proto files — they'd only need the pre-generated outputs.
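The guard suggested here might look like the sketch below. `require_tool` is a hypothetical helper name and the error wording is illustrative; only `shutil.which` is a real standard-library call.

```python
import shutil


def require_tool(name: str) -> str:
    """Return the resolved path of an executable, or raise with a readable message."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"{name!r} was not found on PATH. Protobuf generation runs "
            "`npx buf generate`, so Node.js (which provides npm and npx) "
            "must be installed before installing this package from source."
        )
    return path
```

Calling `require_tool("npx")` before launching the subprocess turns an opaque subprocess failure into an actionable error.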

claude · 2026-03-13T22:02:50Z

+
+        dist_dir = root / _DASHBOARD_OUTPUT_DIR
+        source_newest = _newest_mtime(root, _DASHBOARD_SOURCE_GLOBS)
+        if dist_dir.exists() and source_newest > 0:


🤖 Nit: source_newest > 0 is always true when package.json exists (checked on line 94), since file mtimes are always positive. This condition is redundant and could be simplified.

Check for npx/npm via shutil.which() before attempting generation, so environments like Read the Docs that lack Node.js can install iris without failing. https://claude.ai/code/session_01FJeyiYHDHfvKXj9gVQXvQJ

The iris hatchling build hook needs npx/npm to regenerate protobuf files and build dashboard assets. Add actions/setup-node@v4 to all CI workflows that transitively install iris (levanter, zephyr, marin-itest, marin-docs). Add install_node target to Makefile dev_setup for local development. Also fix black formatting in hatch_build.py. https://claude.ai/code/session_01FJeyiYHDHfvKXj9gVQXvQJ

py-spy uses ptrace to attach to the entire worker process, which can corrupt thread state in ThreadPoolExecutor and anyio threads. This causes segfaults that crash subsequent tests (e.g. test_checkpoint_restore). The test still runs locally where the environment is more controlled. https://claude.ai/code/session_01FJeyiYHDHfvKXj9gVQXvQJ

The build hook was silently skipping proto generation when npx wasn't found, leaving iris unimportable (missing time_pb2, cluster_pb2, etc.). Now the hook distinguishes between "outputs exist but are stale" (warn and continue) vs "outputs are missing entirely" (raise RuntimeError with install instructions).

Also:
- Add nodejs: "22" to .readthedocs.yaml so RTD can generate protos
- Add Node.js to docker/marin/Dockerfile.tpu-ci for TPU CI
- Add *_connect.py to pyrefly project-excludes

https://claude.ai/code/session_01FJeyiYHDHfvKXj9gVQXvQJ
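The stale-versus-missing policy in this commit message can be sketched as below. This is an assumption-laden illustration rather than the merged code: `should_generate` is a hypothetical name, and treating "outputs exist" as "at least one file matches an output glob" is a simplification.

```python
import logging
import shutil
from pathlib import Path

logger = logging.getLogger("hatch_build_sketch")


def should_generate(root: Path, output_globs: list[str], tool: str = "npx") -> bool:
    """Return True if generation can run; warn or raise when the tool is missing."""
    if shutil.which(tool) is not None:
        return True
    # Assumption: any matching file counts as "outputs exist".
    outputs_exist = any(next(root.glob(g), None) is not None for g in output_globs)
    if outputs_exist:
        # Possibly stale outputs are still importable, so warn and continue.
        logger.warning("%s not found; keeping existing generated files", tool)
        return False
    # Nothing to fall back on: the package would be unimportable.
    raise RuntimeError(
        f"{tool} not found and no generated protobuf files exist. "
        "Install Node.js (which provides npm and npx) and retry."
    )
```

The asymmetry is deliberate: a warning is acceptable when old outputs can still be imported, while a missing module would otherwise surface later as a confusing ImportError.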

…cker Node.js

- Add setup-node@v4 to marin-lint-and-format.yaml so npx is available for iris protobuf generation during pre-commit pyrefly checks
- Add _ensure_iris_protos() to pre-commit.py that generates protobuf files before pyrefly runs when they are missing (pyrefly needs the generated files on disk to resolve imports, even with project-excludes)
- Install Node.js in userspace inside TPU Docker container when not present, as a temporary workaround until the tpu-ci image is rebuilt

https://claude.ai/code/session_01FJeyiYHDHfvKXj9gVQXvQJ

connect-python is now published on PyPI, so we can drop the git+https direct reference and use a normal version specifier. Also removes the now-unnecessary allow-direct-references hatch metadata setting. https://claude.ai/code/session_01FJeyiYHDHfvKXj9gVQXvQJ

yonromai

Clean PR — eliminates checked-in generated code with a well-structured hatch build hook. mtime logic is correct, fallbacks are reasonable, CI is green.

Two non-blocking observations inline (Node.js blast radius across CI, hardcoded version in TPU Docker).

Generated with Claude Code

yonromai · 2026-03-14T01:13:40Z

          working-directory: lib/levanter
+      - name: Set up Node.js
+        uses: actions/setup-node@v4
+        with:


🤖 Non-blocking: The iris build hook makes Node.js a transitive build-time dependency for the entire monorepo. Every workflow now needs setup-node — even Levanter CPU tests that never touch iris. This is pragmatic for now but worth tracking: if the hook could detect it's resolving for a different package and skip, the CI footprint would shrink back.

yonromai · 2026-03-14T01:13:40Z

+                export PATH=/tmp/node-v22.16.0-linux-x64/bin:\$PATH; \
+                echo '::endgroup::'; \
+              fi && \
+              cp -a /workspace-src/. /workspace/ && cd /workspace && \


🤖 Non-blocking: This pins v22.16.0 inline while all other jobs use actions/setup-node with node-version: "22" (floats to latest 22.x). When Node 22 gets a security patch, this stays stale. Consider extracting the version or using the same nodesource approach as Dockerfile.tpu-ci.

follow up #3631

The custom build hook added in #3631 requires hatch_build.py to be present when uv sync runs. The Dockerfiles only copied pyproject.toml, so the build failed with "Build script does not exist: hatch_build.py".

- Dockerfile.worker and Dockerfile.controller now `COPY hatch_build.py` alongside `pyproject.toml` so the custom hatch build hook (added in #3631) is present when `uv sync` runs during the Docker build.
- Without this, the build fails with `OSError: Build script does not exist: hatch_build.py`.
- Dockerfile.task is unaffected (it doesn't run `uv sync` at build time, and already has nodejs/npx for proto generation at runtime).

claude bot reviewed Mar 13, 2026

claude added 6 commits March 13, 2026 22:05

Skip protobuf and dashboard build hooks when Node.js is unavailable

b3d1f73

rjpower requested a review from yonromai March 13, 2026 23:07

claude bot mentioned this pull request Mar 13, 2026

iris: split up protos #3636

Open

rjpower enabled auto-merge (squash) March 14, 2026 00:24

Merge branch 'main' into claude/pyproject-build-actions-hyngK

2ec64ad

rjpower requested a review from dlwh March 14, 2026 00:57

yonromai approved these changes Mar 14, 2026

rjpower merged commit 2d7123b into main Mar 14, 2026
16 checks passed

rjpower deleted the claude/pyproject-build-actions-hyngK branch March 14, 2026 01:13

ravwojdyla mentioned this pull request Mar 14, 2026

Update uv lock #3648

Merged

ravwojdyla added a commit that referenced this pull request Mar 14, 2026

Update uv lock (#3648)

3363c5b

follow up #3631

rjpower mentioned this pull request Mar 14, 2026

iris: copy hatch_build.py into Docker images #3650

Merged

yonromai mentioned this pull request Mar 16, 2026

iris CLI broken locally after #3649: missing query_pb2 #3709

Closed

rjpower merged 8 commits into main from claude/pyproject-build-actions-hyngK

3 participants
