Skip to content

fix(bundler): support git gems#1629

Open
romanblanco wants to merge 4 commits into
hermetoproject:mainfrom
romanblanco:fix-yabeda-dep
Open

fix(bundler): support git gems#1629
romanblanco wants to merge 4 commits into
hermetoproject:mainfrom
romanblanco:fix-yabeda-dep

Conversation

@romanblanco

@romanblanco romanblanco commented Jun 25, 2026

Copy link
Copy Markdown

Git gems (declared via github: or git: in Gemfile) are absent from runtime images in hermetic builds that use multi-stage Dockerfiles.

Root cause

BUNDLE_LOCAL.<gem> triggers bundler's local override path in git.rb:

# source/git.rb
def local_override!(path)
  set_install_path!(path)  # @install_path = local clone, not BUNDLE_PATH
  allow_git_ops? && !local? && !locked_revision_checked_out?
  #                 ^^^^^^^^ false when BUNDLE_LOCAL is enabled → skips checkout
end

Bundler uses the gem in-place from the clone and never copies it into BUNDLE_PATH.
The runtime stage copies only .bundle/ and the clone is lost.

Fix

Replace BUNDLE_LOCAL.* with GIT_CONFIG_GLOBAL pointing to a generated git config with url.insteadOf redirects for each git dependency.
Bundler clones from the local file:// path, local? stays false and checkout runs which should get the git gem in BUNDLE_PATH.

Fixes: #1630

Documentation:
https://git-scm.com/docs/git-config#Documentation/git-config.txt-urlbaseinsteadOf
https://git-scm.com/docs/git-config#Documentation/git-config.txt-GITCONFIGCOUNT

@romanblanco romanblanco marked this pull request as ready for review June 25, 2026 09:21

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors how git dependencies are handled during hermetic builds in the Bundler package manager. Instead of using BUNDLE_LOCAL.* configurations, it redirects git remote URLs to pre-fetched local clones using git's insteadOf rewrites in a global git config. Feedback on these changes includes: 1) Handling a potential issue where multiple git dependencies point to the same repository URL but with different revisions, which would cause git's insteadOf to only respect the last entry. It is suggested to detect this and raise an UnsupportedFeature error. 2) Aligning the new unit test with the repository style guide by avoiding substring matching (in) and instead asserting exact equality against a dedented expected string.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread hermeto/core/package_managers/bundler/main.py Outdated
Comment thread tests/unit/package_managers/bundler/test_main.py Outdated
@romanblanco romanblanco force-pushed the fix-yabeda-dep branch 3 times, most recently from de167b4 to 62456d8 Compare June 26, 2026 07:51
@eskultety

Copy link
Copy Markdown
Member

Replace BUNDLE_LOCAL.* with GIT_CONFIG_GLOBAL pointing to a generated git config with url.insteadOf redirects for each git dependency.

I feel like the blast radius of GIT_CONFIG_GLOBAL is quite big and quite risky. From Hermeto's POV everything seems fine because we "did our job" and if the user edits their Dockerfile manually it will work. However, one needs to consider Hermeto being used in CI/CD deployments where the context is too wide for the CI/CD system to always do the right thing and (as a matter of fact) . <dir>/hermeto.env may be prepended to every RUN instruction in the Dockerfile and so every command that you run inside your build (which can be anything really and even use git underneath) could now be indirectly affected by this variable. Note, it's not about the specific options in the config, it's about the fact that we replace the whole global config.

We may be safer with using dedicated key-value pairs: https://git-scm.com/docs/git-config#Documentation/git-config.txt-GITCONFIGCOUNT although that will be more fiddly to correctly generate and still won't be 100% bulletproof in the end, but definitely a lot less likely to cause issues during container build, if this git override works correctly that is (I haven't verified your claim).

@romanblanco

Copy link
Copy Markdown
Author

Replace BUNDLE_LOCAL.* with GIT_CONFIG_GLOBAL pointing to a generated git config with url.insteadOf redirects for each git dependency.

We may be safer with using dedicated key-value pairs: https://git-scm.com/docs/git-config#Documentation/git-config.txt-GITCONFIGCOUNT

Thanks for the suggestion!
Updated the PR.

@romanblanco romanblanco force-pushed the fix-yabeda-dep branch 3 times, most recently from 4233efd to 984fa62 Compare June 26, 2026 15:45
Comment thread docs/design/bundler.md
Comment thread docs/design/bundler.md
Comment thread docs/design/bundler.md
Comment on lines +459 to +462
To solve this without coupling to bundler's internal cache format
(which has [changed across versions](https://github.com/ruby/rubygems/commit/7d6b6316)), we use git's
[`url.insteadOf`](https://git-scm.com/docs/git-config#Documentation/git-config.txt-urlltbasegtinsteadOf)
via `GIT_CONFIG_COUNT`/`KEY`/`VALUE` environment variables to redirect each remote URL to the

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be best to have a brief example on this so it's easier for imagination.

Comment thread hermeto/core/package_managers/bundler/gem_models.py
Comment thread hermeto/core/package_managers/bundler/gem_models.py
Comment thread tests/unit/package_managers/bundler/test_main.py Outdated
Comment thread tests/unit/package_managers/bundler/test_main.py Outdated
@romanblanco romanblanco force-pushed the fix-yabeda-dep branch 2 times, most recently from 3e85c3f to 48e4b66 Compare June 30, 2026 13:58

@eskultety eskultety left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romanblanco https://github.com/hermetoproject/hermeto/blob/main/CONTRIBUTING.md#pull-request-guidelines

  • Run linters & unit tests before opening a PR
  • Ensure every single commit passes CI

Comment thread docs/design/bundler.md
Comment thread docs/design/bundler.md
@eskultety

Copy link
Copy Markdown
Member

@romanblanco you will need to squash the following to the 2nd commit:

diff --git tests/integration/test_data/bundler_e2e/.build-config.yaml tests/integration/test_data/bundler_e2e/.build-config.yaml
index b23f60ab..34ab6c93 100644
--- tests/integration/test_data/bundler_e2e/.build-config.yaml
+++ tests/integration/test_data/bundler_e2e/.build-config.yaml
@@ -2,13 +2,13 @@ environment_variables:
 - name: BUNDLE_APP_CONFIG
   value: ${output_dir}/bundler/config_override
 - name: GIT_CONFIG_COUNT
-  value: "2"
+  value: '2'
 - name: GIT_CONFIG_KEY_0
   value: url.file://${output_dir}/deps/bundler/json-schema-26487618a684/.insteadOf
-- name: GIT_CONFIG_VALUE_0
-  value: https://github.com/3scale/json-schema
 - name: GIT_CONFIG_KEY_1
   value: protocol.file.allow
+- name: GIT_CONFIG_VALUE_0
+  value: https://github.com/3scale/json-schema
 - name: GIT_CONFIG_VALUE_1
   value: always
 project_files:
diff --git tests/integration/test_data/bundler_e2e_missing_gemspec/.build-config.yaml tests/integration/test_data/bundler_e2e_missing_gemspec/.build-config.yaml
index b23f60ab..34ab6c93 100644
--- tests/integration/test_data/bundler_e2e_missing_gemspec/.build-config.yaml
+++ tests/integration/test_data/bundler_e2e_missing_gemspec/.build-config.yaml
@@ -2,13 +2,13 @@ environment_variables:
 - name: BUNDLE_APP_CONFIG
   value: ${output_dir}/bundler/config_override
 - name: GIT_CONFIG_COUNT
-  value: "2"
+  value: '2'
 - name: GIT_CONFIG_KEY_0
   value: url.file://${output_dir}/deps/bundler/json-schema-26487618a684/.insteadOf
-- name: GIT_CONFIG_VALUE_0
-  value: https://github.com/3scale/json-schema
 - name: GIT_CONFIG_KEY_1
   value: protocol.file.allow
+- name: GIT_CONFIG_VALUE_0
+  value: https://github.com/3scale/json-schema
 - name: GIT_CONFIG_VALUE_1
   value: always
 project_files:

@eskultety eskultety left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romanblanco I did some further experiments on our side in context of this request of mine: https://github.com/hermetoproject/hermeto/pull/1629/changes#r3490366370

diff --git tests/integration/test_bundler.py tests/integration/test_bundler.py
index 55438345..35966c93 100644
--- tests/integration/test_bundler.py
+++ tests/integration/test_bundler.py
@@ -68,9 +68,19 @@ def test_bundler_packages(
                 packages=({"path": ".", "type": "bundler", "binary": {}},),
                 check_output=True,
             ),
-            [],  # No additional commands are run to verify the build
             [],
-            id="bundler_e2e",
+            [],
+            id="bundler_e2e_ruby33",
+        ),
+        pytest.param(
+            utils.TestParameters(
+                branch="bundler/e2e",
+                packages=({"path": ".", "type": "bundler", "binary": {}},),
+                check_output=True,
+            ),
+            [],
+            [],
+            id="bundler_e2e_ruby40",
         ),
         pytest.param(
             utils.TestParameters(

Then I migrated/introduced test case/s to cover both old and new bundler with multi-stage builds, e.g.:

diff --git tests/integration/test_data/bundler_e2e_ruby40/container/Containerfile tests/integration/test_data/bundler_e2e_ruby40/container/Containerfile
new file mode 100644
index 00000000..0507284e
--- /dev/null
+++ tests/integration/test_data/bundler_e2e_ruby40/container/Containerfile
@@ -0,0 +1,15 @@
+FROM mirror.gcr.io/ruby:4.0 AS builder
+
+# Test disabled network access
+RUN if getent hosts www.google.com; then echo "Has network access!"; exit 1; fi
+
+WORKDIR /src
+
+ENV BUNDLE_PATH=/gems
+RUN . /tmp/hermeto.env && bundle install
+
+FROM mirror.gcr.io/ruby:4.0
+
+COPY --from=builder /gems /gems
+RUN test -d /gems/ruby/*/bundler/gems/json-schema-*

and it worked. I'm not encouraging to copy this verbatim, this is something I botched quickly locally. However, I think the PR could go like this:

  1. Rename the existing integration test bundler_e2e test data to bundler_e2e_ruby33
  2. Convert to cloning bare repos
  3. Introducing the GIT_CONFIG configuration
  4. Changing the renamed e2e_ruby33 test to a multi-stage docker build and introducing a new ruby 4.0 test pretty much identical to the e2e_ruby33 one so that we have coverage for both old and new bundler going forward.

How does ^that sound?

@eskultety

Copy link
Copy Markdown
Member

feat(bundler): support git gems

btw. This is not a feature, it is a bugfix ;) .

@romanblanco romanblanco changed the title feat(bundler): support git gems fix(bundler): support git gems Jul 2, 2026
Signed-off-by: Roman Blanco <rblanco@redhat.com>
Use bare git clones (`git clone --bare`) instead of full clones for
git dependencies. Bare clones contain only git objects without a
working tree, which is all that bundler needs to resolve the source.

Drop the `branch` field from the lockfile parser output — bundler's
git source only requires `remote` and `revision` to identify and
install a git dependency [1][2]. The branch metadata was parsed but
never consumed during the offline install.

Remove BUNDLE_LOCAL configuration from the hermetic build setup.
BUNDLE_LOCAL requires a working tree with a checked-out revision,
which bare clones do not provide.

[1] https://github.com/ruby/rubygems/blob/v4.0.15/bundler/lib/bundler/source/git.rb#L53-L62
[2] https://github.com/ruby/rubygems/blob/e7167b9a42e9804a52c04a2c16aab31949471d69/bundler/lib/bundler/source/git.rb#L38

Signed-off-by: Roman Blanco <rblanco@redhat.com>
…teadOf

BUNDLE_LOCAL causes bundler to use git gems in-place from the clone
directory rather than installing them into BUNDLE_PATH. In multi-stage
container builds, the runtime stage copies only BUNDLE_PATH — the
in-place clones are lost, producing LoadError at startup.

Replace BUNDLE_LOCAL with git's url.<base>.insteadOf mechanism [1],
configured through GIT_CONFIG_COUNT/KEY/VALUE environment variables.
This transparently redirects each git dependency's remote URL to a
local file:// path pointing at the prefetched bare clone. Bundler
treats the dependency as a normal git source and installs it into
BUNDLE_PATH, surviving stage transitions.

Also add protocol.file.allow=always to the git configuration, as
git may restrict file:// protocol access by default [2].

Reject multiple git gems pointing at the same remote URL, since
url.insteadOf maps one URL to one local path.

[1] https://git-scm.com/docs/git-config#Documentation/git-config.txt-urlltbasegtinsteadOf
[2] https://git-scm.com/docs/git-config#ENVIRONMENT

Closes hermetoproject#1630

Signed-off-by: Roman Blanco <rblanco@redhat.com>
…ge builds

Multi-stage Containerfiles prove that url.insteadOf-based git
dependency resolution survives COPY --from=builder stage transitions.
The second stage verifies the git-sourced gem (json-schema) lands in
the expected bundler/gems directory without network access.

Add Ruby 4.0 test variant (bundler 4.x) alongside existing Ruby 3.3
(bundler 2.x) to cover both current and next-generation bundler
behavior.

Signed-off-by: Roman Blanco <rblanco@redhat.com>
romanblanco pushed a commit to romanblanco/hermeto that referenced this pull request Jul 2, 2026
EnvironmentVariable instances are never mutated after construction.
resolve_value() is a pure function returning a new string. Making
the model frozen enforces this invariant and enables hashing, which
allows using EnvironmentVariable in sets and as dict keys.

Reference: hermetoproject#1629 (comment)

Signed-off-by: Erik Skultety <eskultet@redhat.com>

@eskultety eskultety left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also needs to be documented in the user-facing docs, i.e. docs/bundler.md to tell users about the build-time env variables we expose. This is because it's still not a bulletproof solution, IOW the GIT_CONFIG_N numbered variables we expose might potentially conflict with other identically named variables populated from outside the container image during build in which case our variables would overwrite those. That said, what we've come up with seems like the only way for now and is definitely a smaller blast radius than setting global git config. Please make sure to warn users about this, so they can account for that in their builds.

Other than that the logic is IMO sound, I verified each commit in this revision with integration tests to make sure we don't magically break in between commits (it's important here) and everything is green.

Minor comments left since there will have to be 1 more revision and we should be good to go.

Comment thread docs/design/bundler.md
tzinfo-2.0.6.gem
```

Notice that all the `.gem` dependencies are kept in their original format, and Git dependencies are just plain clones

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message:

Use bare git clones (git clone --bare) instead of full clones for
git dependencies. Bare clones contain only git objects without a
working tree, which is all that bundler needs to resolve the source.

This is all true, undisputed, but I'd like to see more context in the message for bookkeeping reasons, i.e. this commit deletes the BUNDLE_LOCAL_* variables and converts to bare git clones but doesn't discuss why that is okay to do, especially since newer bundler would likely complain. Therefore, I'd like to see context expansion saying something along the lines of the following:

  • the design was originally based on newer bundler than we tested for which differs in the git gem cache format
  • adjust the design to favour the older bundler approach (bare git clones) as tested, the next patch will expand this with proper support for newer bundler and provide version agnostic support

Comment thread docs/design/bundler.md
Comment thread docs/design/bundler.md
Comment on lines -456 to +457
Bundler seems to follow a different approach when it comes to git dependencies since in its default
configuration it always tries to fetch the source from the remote to ensure the application is built
against the correct branch/revision. This argument is indirectly supported by the
[docs](https://bundler.io/guides/deploying.html#deploying-your-application), more specifically:

> If you have run bundle pack, checked in the vendor/cache directory, and do not have any git gems,
Bundler will not contact the internet while installing your bundle.

This is a problem for hermetic builds and as such setting `BUNDLE_DEPLOYMENT` alone doesn't help
and we need more settings. In order to overcome this behavioral trait, we need to follow the
recommendation in the [config](https://bundler.io/v2.5/man/bundle-config.1.html#LOCAL-GIT-REPOS)
docs and override each git dependency with the location on the disk we fetched the git dependency
to and tell bundler about it with the `BUNDLE_LOCAL__<GEM_NAME>` configuration key.
However, this still isn't enough for Bundler to honour offline installs with git dependencies,
because then it's trying to enforce further checks as outlined in the
[docs](https://bundler.io/v2.5/man/bundle-config.1.html#LOCAL-GIT-REPOS):

>Bundler does many checks to ensure a developer won't work with invalid references. Particularly,
>we force a developer to specify a branch in the Gemfile in order to use this feature. If the
>branch specified in the Gemfile and the current branch in the local git repository do not
>match, Bundler will abort.

Therefore, we additionally need to enforce both `BUNDLE_DISABLE_LOCAL_BRANCH_CHECK` and
`BUNDLE_DISABLE_LOCAL_REVISION_CHECK`.
Bundler always tries to fetch git dependencies from the remote, so `BUNDLE_DEPLOYMENT` alone
doesn't prevent network access.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this hunk here I'd like to be more thorough though - it should go to the earlier commit switching to bare git clones. It's fine to keep the section in that commit brief, we'll add the rest (below) in this commit.

Comment thread docs/design/bundler.md
Comment on lines +459 to +478
To solve this without coupling to bundler's internal cache format
(which has [changed across versions](https://github.com/ruby/rubygems/commit/7d6b6316)), we use git's
[`url.insteadOf`](https://git-scm.com/docs/git-config#Documentation/git-config.txt-urlltbasegtinsteadOf)
via `GIT_CONFIG_COUNT`/`KEY`/`VALUE` environment variables to redirect each remote URL to the
pre-fetched local bare clone. This operates at the git transport layer — bundler runs its standard
flow but git silently fetches from disk instead of the network.

For example, given a git dependency on `https://github.com/3scale/json-schema`, hermeto sets:

```
GIT_CONFIG_COUNT=2
GIT_CONFIG_KEY_0=url.file:///output/deps/bundler/json-schema-26487618a684/.insteadOf
GIT_CONFIG_VALUE_0=https://github.com/3scale/json-schema
GIT_CONFIG_KEY_1=protocol.file.allow
GIT_CONFIG_VALUE_1=always
```

The `protocol.file.allow=always` entry ensures git permits the `file://` protocol after the URL
rewrite. See the [git documentation](https://git-scm.com/docs/git-config#Documentation/git-config.txt-protocolallow)
for details.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this hunk discussing the actual solution we've implemented should remain here as is. :)

if len(dirs) > 1:
raise UnsupportedFeature(
f"Multiple git dependencies point to the same repository ({url}) "
f"but use different revisions: {', '.join(sorted(dirs))}. "

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use populate revisions in the error message with dirs which are filesystem dir names (e.g. name-abc123) rather than bare refs.

@a-ovchinnikov a-ovchinnikov left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of minor nitpicks, LGTM otherwise.

"""Raise if multiple git deps share the same URL with different revisions."""
url_to_dirs: dict[str, set[str]] = {}
for _packname, dirname, url in git_paths:
url_to_dirs.setdefault(url, set()).add(dirname)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking nitpicks: _packname is not used in this function thus _ should be enough (that would also tell the reader that the element is not used); since you are not using the return value of setdefault it might be better to use defaultdict:

from collections improt defaultdict
...
...
...
urls_to_dirs = defaultfict(set)
for _, dirname, url in git_paths:
    urls_to_dirs[url].add(dirname)

It would also be nice to find all offending url:dirs pairs in one go (there could be more than one and users would get really upset if they were to fish for them one at a time):

offenders = [(url, dirs) for url, dirs in urls_to_dirs.items() if len(dirs) > 1]
if offenders:
    raise ...

EnvironmentVariable(name="BUNDLE_APP_CONFIG", value="${output_dir}/" + CONFIG_OVERRIDE),
]
if git_paths:
# Redirect git remote URLs to pre-fetched local clones via git's

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A non-blocking request: could you please move lines 206 to 221 into a separate helper, maybe _produce_git_redirection_variables(git_paths: list[tuple[DepName, FSDepName, DepURL]] | None) -> list[EnvironmentVariable] to get something like this:

env_vars = [
        # Contains path to a directory where a new config could be found.
        EnvironmentVariable(name="BUNDLE_APP_CONFIG", value="${output_dir}/" + CONFIG_OVERRIDE),
]
env_vars.extend(_produce_git_redirection_variables(git_paths))
return env_vars

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

bundler: git gems missing from runtime image in multi-stage container builds

3 participants