Auto-generate protobuf and Connect RPC files via hatch build hook #3631

Merged
rjpower merged 8 commits into main from claude/pyproject-build-actions-hyngK on Mar 14, 2026

Conversation

@rjpower
Collaborator

@rjpower rjpower commented Mar 13, 2026

Remove checked-in generated protobuf and Connect RPC files, replacing them with automatic generation via a hatch build hook. This eliminates the need to manually regenerate and commit these files when .proto sources change.

The custom build hook (hatch_build.py) runs during package installation and wheel builds, regenerating all *_pb2.py, *_pb2.pyi, and *_connect.py files from their .proto sources if the sources are newer than the outputs. This ensures generated files are always in sync with their sources without requiring manual steps or git commits.

Updated .gitignore to exclude the generated files, and removed the manual dashboard build fixture from conftest.py since the build hook now handles all code generation automatically.

Replace manually-tracked generated files with a hatchling custom build hook
that regenerates protobuf files (via generate_protos.py) and dashboard assets
(via npm) at install time, with freshness checks to skip when outputs are
up-to-date. Remove all *_pb2.py, *_pb2.pyi, and *_connect.py from git
tracking (now .gitignored), delete unused cluster_pb2_grpc.py, and remove
the manual _build_dashboard fixture from e2e conftest.

https://claude.ai/code/session_01FJeyiYHDHfvKXj9gVQXvQJ
@claude
Contributor

claude bot commented Mar 13, 2026

Claude finished @rjpower's task in 1m 52s


Review of PR #3631: Auto-generate protobuf and Connect RPC files via hatch build hook

  • Read PR changes and understand the approach
  • Review hatch_build.py build hook implementation
  • Review pyproject.toml and .gitignore changes
  • Review conftest.py changes
  • Post review feedback

Overall this is a clean improvement — removing ~6.5K lines of checked-in generated code and replacing it with a 114-line build hook is a win for maintainability. A few observations:

Looks Good

  • Incremental rebuild logic via _newest_mtime / _oldest_mtime is correct and avoids unnecessary rebuilds.
  • .gitignore patterns match the three output glob patterns exactly.
  • Removing the _build_dashboard fixture from conftest.py is the right call since the hook now handles it.
  • Deletion of cluster_pb2_grpc.py — this was legacy (not generated by buf.gen.yaml) and nothing imports it.

Issues / Questions

  1. npx / npm required at install time — The proto generation path calls generate_protos.py which runs npx buf generate, and the dashboard build calls npm ci + npm run build. This means a plain pip install -e . or uv sync now silently requires Node.js/npm on the system. If a user doesn't have npm, the build hook will fail with an unclear subprocess error. Consider:

    • Logging a clear message if npm/npx is not found (e.g., shutil.which("npx") check).
    • Or making proto generation a soft failure with a warning rather than a hard RuntimeError, since developers who aren't modifying .proto files would never need to regenerate.
  2. First clone / CI cold start — After this PR, a fresh clone has no *_pb2.py files. The _needs_rebuild check will correctly trigger generation (since _oldest_mtime returns 0.0 when no outputs exist). But this means CI and new contributors must have npx and the buf plugin available. Is that already guaranteed in CI? Worth verifying.

  3. Race condition in parallel uv sync — If multiple processes (e.g., pytest-xdist workers) trigger uv sync concurrently, the old conftest had fcntl.flock protection for the dashboard build. The hatch hook has no such locking. Hatch likely serializes hook execution within a single install, but if two installs run in parallel (e.g., in CI matrix), the npm ci calls could conflict. Low risk but worth noting.

  4. Missing *_pb2_grpc.py in .gitignore — The old codebase had cluster_pb2_grpc.py checked in. While buf.gen.yaml no longer generates grpc files, if someone ever re-adds a grpc plugin, those files would get committed. Minor, but adding *_pb2_grpc.py to .gitignore for completeness wouldn't hurt.

  5. _DASHBOARD_OUTPUT_DIR rebuild check (lines 100-102): the condition source_newest > 0 is always true if any source file exists, since mtimes are always positive. This isn't a bug, but the check is redundant: _newest_mtime returns 0.0 only if no files match the glob, which would mean package.json doesn't exist, and that case is already handled on line 94. Consider simplifying.
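
For the locking concern in item 3, one possible shape of a guard is sketched below. It is an assumption, not code from the PR or from the old conftest: the `build_lock` helper and the lock-file path are hypothetical, and `fcntl` is only available on Unix-like systems.

```python
from __future__ import annotations

import fcntl
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def build_lock(lock_path: Path) -> Iterator[None]:
    """Hold an exclusive advisory lock on lock_path while the block runs."""
    lock_path.parent.mkdir(parents=True, exist_ok=True)
    with open(lock_path, "w") as handle:
        # Blocks until any other holder of the lock releases it.
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

A hook could wrap the `npm ci` and `npm run build` calls in `with build_lock(root / ".build.lock"):` (path hypothetical) so that two concurrent installs on the same checkout run the dashboard build one after the other.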

Minor Nits

  • hatch_build.py:80-88 — The subprocess.run for proto generation doesn't forward stdout/stderr to the logger on success. If the user runs uv sync -v, they won't see buf generate output unless it fails. Consider logger.debug(result.stdout) on success for debuggability.
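
The suggestion in this nit could look something like the sketch below. The `run_logged` helper and the logger name are hypothetical; only the `subprocess.run(..., capture_output=True, text=True)` call and the `RuntimeError` on failure mirror the snippet under review.

```python
from __future__ import annotations

import logging
import subprocess
import sys

logger = logging.getLogger("hatch_build_sketch")  # hypothetical logger name


def run_logged(cmd: list[str], cwd: str | None = None) -> subprocess.CompletedProcess[str]:
    """Run cmd; log captured output at DEBUG on success, raise with it on failure."""
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"Command failed: {cmd}\n{result.stdout}\n{result.stderr}")
    # On success the output is still available to anyone running with verbose logging.
    if result.stdout:
        logger.debug(result.stdout)
    if result.stderr:
        logger.debug(result.stderr)
    return result
```

For example, `run_logged([sys.executable, "scripts/generate_protos.py"], cwd=str(root))` would keep the current failure behavior while surfacing generator output when debug logging is enabled.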

Verdict

Solid change. The main thing I'd want confirmed before merging is that CI has the Node.js/npm/npx toolchain available, since that's now a hard requirement for any environment that needs to install iris from source. The code itself is well-structured.

Comment thread lib/iris/hatch_build.py Outdated
Comment on lines +74 to +88
generate_script = root / "scripts" / "generate_protos.py"
if not generate_script.exists():
    logger.warning("scripts/generate_protos.py not found, skipping protobuf generation")
    return

logger.info("Regenerating protobuf files from .proto sources...")
result = subprocess.run(
    [sys.executable, str(generate_script)],
    cwd=root,
    capture_output=True,
    text=True,
)
if result.returncode != 0:
    raise RuntimeError(
        f"Protobuf generation failed:\n{result.stdout}\n{result.stderr}"

🤖 The proto generation delegates to generate_protos.py which calls npx buf generate. This introduces a hard runtime dependency on Node.js/npm at install time. If a developer doesn't have npx, this will fail with an opaque subprocess error.

Consider adding a shutil.which("npx") guard with a clear error message, or making this a soft failure (warning instead of RuntimeError) for users who aren't modifying .proto files — they'd only need the pre-generated outputs.
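
A guard along the lines suggested here might be sketched as follows. `shutil.which` is the standard-library call named in the comment; the helper names `missing_tools` and `require_tools` and the wording of the error message are assumptions for illustration.

```python
from __future__ import annotations

import shutil


def missing_tools(tools: tuple[str, ...] = ("npx", "npm")) -> list[str]:
    """Return the names of required executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def require_tools(tools: tuple[str, ...] = ("npx", "npm")) -> None:
    """Raise a readable error instead of an opaque subprocess failure."""
    missing = missing_tools(tools)
    if missing:
        raise RuntimeError(
            f"Protobuf generation needs {', '.join(missing)} on PATH; "
            "install Node.js and retry."
        )
```

Calling `require_tools()` before launching the generator turns a missing Node.js toolchain into an explicit message; a softer variant could log a warning and return instead of raising.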

Comment thread lib/iris/hatch_build.py Outdated

dist_dir = root / _DASHBOARD_OUTPUT_DIR
source_newest = _newest_mtime(root, _DASHBOARD_SOURCE_GLOBS)
if dist_dir.exists() and source_newest > 0:

🤖 Nit: source_newest > 0 is always true when package.json exists (checked on line 94), since file mtimes are always positive. This condition is redundant and could be simplified.

claude added 6 commits March 13, 2026 22:05
Check for npx/npm via shutil.which() before attempting generation, so
environments like Read the Docs that lack Node.js can install iris
without failing.

https://claude.ai/code/session_01FJeyiYHDHfvKXj9gVQXvQJ
The iris hatchling build hook needs npx/npm to regenerate protobuf files
and build dashboard assets. Add actions/setup-node@v4 to all CI workflows
that transitively install iris (levanter, zephyr, marin-itest, marin-docs).
Add install_node target to Makefile dev_setup for local development.

Also fix black formatting in hatch_build.py.

https://claude.ai/code/session_01FJeyiYHDHfvKXj9gVQXvQJ
py-spy uses ptrace to attach to the entire worker process, which can
corrupt thread state in ThreadPoolExecutor and anyio threads. This
causes segfaults that crash subsequent tests (e.g. test_checkpoint_restore).
The test still runs locally where the environment is more controlled.

https://claude.ai/code/session_01FJeyiYHDHfvKXj9gVQXvQJ
The build hook was silently skipping proto generation when npx wasn't
found, leaving iris unimportable (missing time_pb2, cluster_pb2, etc.).
Now the hook distinguishes between "outputs exist but are stale" (warn
and continue) vs "outputs are missing entirely" (raise RuntimeError
with install instructions).

Also:
- Add nodejs: "22" to .readthedocs.yaml so RTD can generate protos
- Add Node.js to docker/marin/Dockerfile.tpu-ci for TPU CI
- Add *_connect.py to pyrefly project-excludes

https://claude.ai/code/session_01FJeyiYHDHfvKXj9gVQXvQJ
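
The stale-versus-missing distinction described in the commit message above can be summarized as a small decision function. This is only a sketch of the stated behavior; the `Action` enum and the `decide` function are hypothetical names, and the real hook may structure the logic differently.

```python
from enum import Enum


class Action(Enum):
    SKIP = "skip"                  # outputs exist and are up to date
    GENERATE = "generate"          # tooling available, regenerate
    WARN_STALE = "warn_stale"      # no npx, but old outputs can still be imported
    FAIL_MISSING = "fail_missing"  # no npx and no outputs: package would be unimportable


def decide(outputs_exist: bool, outputs_stale: bool, npx_available: bool) -> Action:
    """Pick what the build hook should do for the protobuf outputs."""
    if outputs_exist and not outputs_stale:
        return Action.SKIP
    if npx_available:
        return Action.GENERATE
    if outputs_exist:
        return Action.WARN_STALE
    return Action.FAIL_MISSING
```

Under this reading, only `FAIL_MISSING` would raise a `RuntimeError` with install instructions; `WARN_STALE` would log a warning and let the install continue.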
…cker Node.js

- Add setup-node@v4 to marin-lint-and-format.yaml so npx is available
  for iris protobuf generation during pre-commit pyrefly checks
- Add _ensure_iris_protos() to pre-commit.py that generates protobuf
  files before pyrefly runs when they are missing (pyrefly needs the
  generated files on disk to resolve imports, even with project-excludes)
- Install Node.js in userspace inside TPU Docker container when not
  present, as a temporary workaround until the tpu-ci image is rebuilt

https://claude.ai/code/session_01FJeyiYHDHfvKXj9gVQXvQJ
connect-python is now published on PyPI, so we can drop the git+https
direct reference and use a normal version specifier. Also removes the
now-unnecessary allow-direct-references hatch metadata setting.

https://claude.ai/code/session_01FJeyiYHDHfvKXj9gVQXvQJ
@rjpower rjpower requested a review from yonromai March 13, 2026 23:07
@claude claude bot mentioned this pull request Mar 13, 2026
@rjpower rjpower enabled auto-merge (squash) March 14, 2026 00:24
@rjpower rjpower requested a review from dlwh March 14, 2026 00:57
Contributor

@yonromai yonromai left a comment


Clean PR — eliminates checked-in generated code with a well-structured hatch build hook. mtime logic is correct, fallbacks are reasonable, CI is green.

Two non-blocking observations inline (Node.js blast radius across CI, hardcoded version in TPU Docker).

Generated with Claude Code

  working-directory: lib/levanter
- name: Set up Node.js
  uses: actions/setup-node@v4
  with:

🤖 Non-blocking: The iris build hook makes Node.js a transitive build-time dependency for the entire monorepo. Every workflow now needs setup-node — even Levanter CPU tests that never touch iris. This is pragmatic for now but worth tracking: if the hook could detect it's resolving for a different package and skip, the CI footprint would shrink back.

Generated with Claude Code

export PATH=/tmp/node-v22.16.0-linux-x64/bin:\$PATH; \
echo '::endgroup::'; \
fi && \
cp -a /workspace-src/. /workspace/ && cd /workspace && \

🤖 Non-blocking: This pins v22.16.0 inline while all other jobs use actions/setup-node with node-version: "22" (floats to latest 22.x). When Node 22 gets a security patch, this stays stale. Consider extracting the version or using the same nodesource approach as Dockerfile.tpu-ci.

Generated with Claude Code

@rjpower rjpower merged commit 2d7123b into main Mar 14, 2026
16 checks passed
@rjpower rjpower deleted the claude/pyproject-build-actions-hyngK branch March 14, 2026 01:13
@ravwojdyla ravwojdyla mentioned this pull request Mar 14, 2026
ravwojdyla added a commit that referenced this pull request Mar 14, 2026
rjpower added a commit that referenced this pull request Mar 14, 2026
The custom build hook added in #3631 requires hatch_build.py to be
present when uv sync runs. The Dockerfiles only copied pyproject.toml,
so the build failed with "Build script does not exist: hatch_build.py".
rjpower added a commit that referenced this pull request Mar 14, 2026
- Dockerfile.worker and Dockerfile.controller now `COPY hatch_build.py`
alongside `pyproject.toml` so the custom hatch build hook (added in
#3631) is present when `uv sync` runs during the Docker build.
- Without this, the build fails with `OSError: Build script does not
exist: hatch_build.py`.
- Dockerfile.task is unaffected (it doesn't run `uv sync` at build time,
and already has nodejs/npx for proto generation at runtime).

3 participants