Skip to content
Open
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
67 changes: 27 additions & 40 deletions docs/design/bundler.md
Original file line number Diff line number Diff line change
Expand Up @@ -271,17 +271,17 @@ nokogiri-1.16.6.gem
tzinfo-2.0.6.gem
```

Notice that all the `.gem` dependencies are kept in their original format, and Git dependencies are just plain clones

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message:

Use bare git clones (git clone --bare) instead of full clones for
git dependencies. Bare clones contain only git objects without a
working tree, which is all that bundler needs to resolve the source.

This is all true, undisputed, but I'd like to see more context in the message for bookkeeping reasons, i.e. this commit deletes the BUNDLE_LOCAL_* variables and converts to bare git clones but doesn't discuss why that is okay to do, especially since newer bundler would likely complain. Therefore, I'd like to see context expansion saying something along the lines of the following:

  • the design was originally based on newer bundler than we tested for which differs in the git gem cache format
  • adjust the design to favour the older bundler approach (bare git clones) as tested, the next patch will expand this with proper support for newer bundler and provide version agnostic support

Notice that all the `.gem` dependencies are kept in their original format, and Git dependencies are bare git clones
Comment thread
eskultety marked this conversation as resolved.
of the repository placed in a folder. For Git dependencies, the folder name must match this specific
[format](https://github.com/rubygems/rubygems/blob/3da9b1dda0824d1d770780352bb1d3f287cb2df5/bundler/lib/bundler/source/git.rb#L130):

```ruby
"#{base_name}-#{shortref_for_path(revision)}"
```

The name of the directory **must come from the Git URL**, not the actual name of the gem, and the cloned folder must
Comment thread
eskultety marked this conversation as resolved.
contain unpacked source code. Any other format will cause bundler to try to re-download the repository, causing the
build to fail.
The name of the directory **must come from the Git URL**, not the actual name of the gem. The clones are bare
repositories used as local remotes via git's `url.insteadOf` mechanism (see
[Offline installs involving git dependencies](#offline-installs-involving-git-dependencies)).
Comment thread
eskultety marked this conversation as resolved.
Comment thread
eskultety marked this conversation as resolved.

##### Multiple Gems in a single repository

Expand Down Expand Up @@ -389,8 +389,6 @@ BUNDLE_DEPLOYMENT=true
BUNDLE_NO_PRUNE=true
BUNDLE_ALLOW_OFFLINE_INSTALL=true
BUNDLE_DISABLE_VERSION_CHECK=true
BUNDLE_DISABLE_LOCAL_BRANCH_CHECK=true
Comment thread
eskultety marked this conversation as resolved.
BUNDLE_DISABLE_LOCAL_REVISION_CHECK=true
```

- **BUNDLE_CACHE_PATH**: The directory that Bundler will place cached gems in when running bundle package, and that
Expand All @@ -415,16 +413,6 @@ package cache.
- **BUNDLE_DISABLE_VERSION_CHECK**: Stop Bundler from checking if a newer Bundler version is
available on rubygems.org.

- **BUNDLE_DISABLE_LOCAL_REVISION_CHECK**: Allow Bundler to use a local git override without a
branch specified in the Gemfile

- **BUNDLE_DISABLE_LOCAL_BRANCH_CHECK**: Allow Bundler to use a local git override without checking
if the revision present in the lockfile is present in the repository.

- **BUNDLE_LOCAL__<GEM_NAME>**: Instead of checking out the remote git repository for GEM_NAME,
the local git directory override will be used. See below for more information on Bundler's git
dependency handling.

For more information, see Bundler's [documentation](https://bundler.io/v2.5/man/bundle-config.1.html).

#### Offline installation pitfalls
Expand Down Expand Up @@ -465,30 +453,29 @@ flag](https://github.com/rubygems/rubygems/issues/8265) via configuration option
making use of the deployment mode.**

##### Offline installs involving git dependencies
Bundler seems to follow a different approach when it comes to git dependencies since in its default
configuration it always tries to fetch the source from the remote to ensure the application is built
against the correct branch/revision. This argument is indirectly supported by the
[docs](https://bundler.io/guides/deploying.html#deploying-your-application), more specifically:

> If you have run bundle pack, checked in the vendor/cache directory, and do not have any git gems,
Bundler will not contact the internet while installing your bundle.

This is a problem for hermetic builds and as such setting `BUNDLE_DEPLOYMENT` alone doesn't help
and we need more settings. In order to overcome this behavioral trait, we need to follow the
recommendation in the [config](https://bundler.io/v2.5/man/bundle-config.1.html#LOCAL-GIT-REPOS)
docs and override each git dependency with the location on the disk we fetched the git dependency
to and tell bundler about it with the `BUNDLE_LOCAL__<GEM_NAME>` configuration key.
However, this still isn't enough for Bundler to honour offline installs with git dependencies,
because then it's trying to enforce further checks as outlined in the
[docs](https://bundler.io/v2.5/man/bundle-config.1.html#LOCAL-GIT-REPOS):

>Bundler does many checks to ensure a developer won't work with invalid references. Particularly,
>we force a developer to specify a branch in the Gemfile in order to use this feature. If the
>branch specified in the Gemfile and the current branch in the local git repository do not
>match, Bundler will abort.

Therefore, we additionally need to enforce both `BUNDLE_DISABLE_LOCAL_BRANCH_CHECK` and
`BUNDLE_DISABLE_LOCAL_REVISION_CHECK`.
Bundler always tries to fetch git dependencies from the remote, so `BUNDLE_DEPLOYMENT` alone
doesn't prevent network access.
Comment on lines -456 to +457

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this hunk here I'd like to be more thorough though - it should go to the earlier commit switching to bare git clones. It's fine to keep the section in that commit brief, we'll add the rest (below) in this commit.


To solve this without coupling to bundler's internal cache format
(which has [changed across versions](https://github.com/ruby/rubygems/commit/7d6b6316)), we use git's
[`url.insteadOf`](https://git-scm.com/docs/git-config#Documentation/git-config.txt-urlltbasegtinsteadOf)
via `GIT_CONFIG_COUNT`/`KEY`/`VALUE` environment variables to redirect each remote URL to the
Comment on lines +459 to +462

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be best to have a brief example on this so it's easier for imagination.

pre-fetched local bare clone. This operates at the git transport layer — bundler runs its standard
flow but git silently fetches from disk instead of the network.

For example, given a git dependency on `https://github.com/3scale/json-schema`, hermeto sets:

```
GIT_CONFIG_COUNT=2
GIT_CONFIG_KEY_0=url.file:///output/deps/bundler/json-schema-26487618a684/.insteadOf
GIT_CONFIG_VALUE_0=https://github.com/3scale/json-schema
GIT_CONFIG_KEY_1=protocol.file.allow
GIT_CONFIG_VALUE_1=always
```

The `protocol.file.allow=always` entry ensures git permits the `file://` protocol after the URL
rewrite. See the [git documentation](https://git-scm.com/docs/git-config#Documentation/git-config.txt-protocolallow)
for details.
Comment on lines +459 to +478

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this hunk discussing the actual solution we've implemented should remain here as is. :)


##### Offline installation using deployment mode
Deployment mode is a way of vendoring one's code along with the dependencies.
Expand Down
10 changes: 2 additions & 8 deletions hermeto/core/package_managers/bundler/gem_models.py
Original file line number Diff line number Diff line change
Expand Up @@ -100,12 +100,10 @@ class GitDependency(_GemMetadata):

Attributes:
url: The URL of the git repository.
branch: The branch to checkout.
ref: Commit hash.
"""

url: AcceptedUrl
branch: str | None = None
ref: AcceptedGitRef

@cached_property
Expand Down Expand Up @@ -134,17 +132,13 @@ def download_to(self, deps_dir: RootedPath) -> None:
git_repo_path.path.mkdir(parents=True)

log.info("Cloning git repository %s", self.url)
repo = GitRepo.clone_from(
GitRepo.clone_from(
url=str(self.url),
to_path=git_repo_path.path,
bare=True,
env={"GIT_TERMINAL_PROMPT": "0"},
)
Comment thread
eskultety marked this conversation as resolved.

if self.branch is not None:
repo.git.checkout(self.branch)

repo.git.reset("--hard", self.ref)

Comment thread
eskultety marked this conversation as resolved.

class PathDependency(_GemMetadata):
"""
Expand Down
86 changes: 48 additions & 38 deletions hermeto/core/package_managers/bundler/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -35,9 +35,6 @@
def fetch_bundler_source(request: Request) -> RequestOutput:
"""Resolve and process all bundler packages."""
components: list[Component] = []
environment_variables: list[EnvironmentVariable] = (
_prepare_environment_variables_for_hermetic_build()
)
project_files: list[ProjectFile] = []
git_paths = []

Expand All @@ -50,9 +47,11 @@ def fetch_bundler_source(request: Request) -> RequestOutput:
)
components.extend(_comps)
git_paths.extend(_git_paths)
project_files.append(
_prepare_for_hermetic_build(request.source_dir, request.output_dir, git_paths)

environment_variables: list[EnvironmentVariable] = (
_prepare_environment_variables_for_hermetic_build(git_paths)
)
project_files.append(_prepare_for_hermetic_build(request.source_dir, request.output_dir))

annotations = []
if backend_annotation := create_backend_annotation(components, "bundler"):
Expand All @@ -65,17 +64,17 @@ def fetch_bundler_source(request: Request) -> RequestOutput:
)


# Aliases for git dependency name and git dependency name as
# it is written to file system:
# Aliases for git dependency name, file system name, and remote URL:
DepName = str
FSDepName = str
DepURL = str


def _resolve_bundler_package(
package_dir: RootedPath,
output_dir: RootedPath,
binary_filters: BundlerBinaryFilters | None = None,
) -> tuple[list[Component], list[tuple[DepName, FSDepName]]]:
) -> tuple[list[Component], list[tuple[DepName, FSDepName, DepURL]]]:
"""Process a request for a single bundler package."""
deps_dir = output_dir.join_within_root("deps", "bundler")
deps_dir.path.mkdir(parents=True, exist_ok=True)
Expand Down Expand Up @@ -110,7 +109,7 @@ def _resolve_bundler_package(
files_to_download[dep.remote_location] = dep.download_location(deps_dir)
case GitDependency():
dep.download_to(deps_dir)
git_paths.append((dep.name, dep.repo_name + "-" + dep.ref[:12]))
git_paths.append((dep.name, dep.repo_name + "-" + dep.ref[:12], str(dep.url)))

c = Component(name=dep.name, version=dep.version, purl=dep.purl, properties=properties)
components.append(c)
Expand Down Expand Up @@ -196,16 +195,51 @@ def _get_repo_name_from_origin_remote(package_dir: RootedPath) -> str:
return str(resolved_path)


def _prepare_environment_variables_for_hermetic_build() -> list[EnvironmentVariable]:
return [
def _prepare_environment_variables_for_hermetic_build(
git_paths: list[tuple[DepName, FSDepName, DepURL]] | None = None,
) -> list[EnvironmentVariable]:
env_vars = [
# Contains path to a directory where a new config could be found.
EnvironmentVariable(name="BUNDLE_APP_CONFIG", value="${output_dir}/" + CONFIG_OVERRIDE),
]
if git_paths:
# Redirect git remote URLs to pre-fetched local clones via git's

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A non-blocking request: could you please move lines 206 to 221 into a separate helper, maybe _produce_git_redirection_variables(git_paths: list[tuple[DepName, FSDepName, DepURL]] | None) -> list[EnvironmentVariable] to get something like this:

env_vars = [
        # Contains path to a directory where a new config could be found.
        EnvironmentVariable(name="BUNDLE_APP_CONFIG", value="${output_dir}/" + CONFIG_OVERRIDE),
]
env_vars.extend(_produce_git_redirection_variables(git_paths))
return env_vars

# GIT_CONFIG_COUNT/KEY/VALUE mechanism. This injects url.insteadOf
# entries without replacing the global git config.
# See: https://git-scm.com/docs/git-config#ENVIRONMENT
_check_for_duplicate_git_urls(git_paths)
# (key, value) pairs for GIT_CONFIG_KEY_N / GIT_CONFIG_VALUE_N
git_config: list[tuple[str, str]] = []
for _packname, dirname, url in git_paths:
clone_file_url = "file://${output_dir}/deps/bundler/" + dirname + "/"
git_config.append((f"url.{clone_file_url}.insteadOf", url))
git_config.append(("protocol.file.allow", "always"))

env_vars.append(EnvironmentVariable(name="GIT_CONFIG_COUNT", value=str(len(git_config))))
for idx, (key, value) in enumerate(git_config):
env_vars.append(EnvironmentVariable(name=f"GIT_CONFIG_KEY_{idx}", value=key))
env_vars.append(EnvironmentVariable(name=f"GIT_CONFIG_VALUE_{idx}", value=value))
return env_vars


def _check_for_duplicate_git_urls(
git_paths: list[tuple[DepName, FSDepName, DepURL]],
) -> None:
"""Raise if multiple git deps share the same URL with different revisions."""
url_to_dirs: dict[str, set[str]] = {}
for _packname, dirname, url in git_paths:
url_to_dirs.setdefault(url, set()).add(dirname)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking nitpicks: _packname is not used in this function thus _ should be enough (that would also tell the reader that the element is not used); since you are not using the return value of setdefault it might be better to use defaultdict:

from collections improt defaultdict
...
...
...
urls_to_dirs = defaultfict(set)
for _, dirname, url in git_paths:
    urls_to_dirs[url].add(dirname)

It would also be nice to find all offending url:dirs pairs in one go (there could be more than one and users would get really upset if they were to fish for them one at a time):

offenders = [(url, dirs) for url, dirs in urls_to_dirs.items() if len(dirs) > 1]
if offenders:
    raise ...

for url, dirs in url_to_dirs.items():
if len(dirs) > 1:
raise UnsupportedFeature(
f"Multiple git dependencies point to the same repository ({url}) "
f"but use different revisions: {', '.join(sorted(dirs))}. "

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use populate revisions in the error message with dirs which are filesystem dir names (e.g. name-abc123) rather than bare refs.

"This is not supported because git's url.insteadOf redirect "
"can only map a repository URL to a single local clone."
)


def _prepare_for_hermetic_build(
source_dir: RootedPath, output_dir: RootedPath, git_paths: list | None = None
) -> ProjectFile:
def _prepare_for_hermetic_build(source_dir: RootedPath, output_dir: RootedPath) -> ProjectFile:
"""Prepare a package for hermetic build by injecting necessary config."""
potential_bundle_config = source_dir.join_within_root(".bundle/config").path
hermetic_config = dedent(
Expand All @@ -218,30 +252,6 @@ def _prepare_for_hermetic_build(
BUNDLE_VERSION: "system"
"""
)
# Note: if a package depends on a git revision then the following variables
# are necessary for a hermetic build:
# BUNDLE_DISABLE_LOCAL_BRANCH_CHECK
# BUNDLE_DISABLE_LOCAL_REVISION_CHECK
# because otherwise some (potentially all, depending on exact set of
# ecosystem components versions, environment variables and celestial
# alignment) Bundler versions will try to fetch the latest changes of the
# remotes which may be present even when instructed not to with --local
# flag.
# See https://bundler.io/guides/git.html#local-git-repos for details.
# (or https://github.com/rubygems/bundler-site/blob/
# 9ff3b76e9866524ecefe165633ffb547f0004a99/source/guides/git.html.md
# if the link above ceases to exist).
if git_paths is not None:
hermetic_config += 'BUNDLE_DISABLE_LOCAL_BRANCH_CHECK: "true"\n'
hermetic_config += 'BUNDLE_DISABLE_LOCAL_REVISION_CHECK: "true"\n'
for packname, dirname in git_paths:
# "-" in variable names is deprecated in Bundler and now generates
# a warning and a suggestion to replace all dashes with triple
# underscores. Package names sometimes contain dashes:
varname = "BUNDLE_LOCAL." + packname.upper().replace("-", "___")
location = "${output_dir}/deps/bundler/" + dirname
config_entry = varname + f': "{location}"'
hermetic_config += f"{config_entry}\n"
if potential_bundle_config.is_file():
config_data = potential_bundle_config.read_text()
config_data += hermetic_config
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,6 @@
version: spec.version,
type: 'git',
url: spec.source.uri,
branch: spec.source.branch,
ref: spec.source.revision
}
parsed_specs << parsed_spec
Expand Down
2 changes: 1 addition & 1 deletion tests/integration/test_bundler.py
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,7 @@ def test_bundler_packages(
),
[], # No additional commands are run to verify the build
[],
id="bundler_e2e",
id="bundler_e2e_ruby33",
),
pytest.param(
utils.TestParameters(
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,16 @@
environment_variables:
- name: BUNDLE_APP_CONFIG
value: ${output_dir}/bundler/config_override
- name: GIT_CONFIG_COUNT
value: '2'
- name: GIT_CONFIG_KEY_0
value: url.file://${output_dir}/deps/bundler/json-schema-26487618a684/.insteadOf
- name: GIT_CONFIG_KEY_1
value: protocol.file.allow
- name: GIT_CONFIG_VALUE_0
value: https://github.com/3scale/json-schema
- name: GIT_CONFIG_VALUE_1
value: always
project_files:
- abspath: ${test_case_tmp_path}/hermeto-output/bundler/config_override/config
template: |2
Expand All @@ -11,6 +21,3 @@ project_files:
BUNDLE_ALLOW_OFFLINE_INSTALL: "true"
BUNDLE_DISABLE_VERSION_CHECK: "true"
BUNDLE_VERSION: "system"
BUNDLE_DISABLE_LOCAL_BRANCH_CHECK: "true"
BUNDLE_DISABLE_LOCAL_REVISION_CHECK: "true"
BUNDLE_LOCAL.JSON___SCHEMA: "${output_dir}/deps/bundler/json-schema-26487618a684"
Original file line number Diff line number Diff line change
@@ -1,6 +1,16 @@
environment_variables:
- name: BUNDLE_APP_CONFIG
value: ${output_dir}/bundler/config_override
- name: GIT_CONFIG_COUNT
value: '2'
- name: GIT_CONFIG_KEY_0
value: url.file://${output_dir}/deps/bundler/json-schema-26487618a684/.insteadOf
- name: GIT_CONFIG_KEY_1
value: protocol.file.allow
- name: GIT_CONFIG_VALUE_0
value: https://github.com/3scale/json-schema
- name: GIT_CONFIG_VALUE_1
value: always
project_files:
- abspath: ${test_case_tmp_path}/hermeto-output/bundler/config_override/config
template: |2
Expand All @@ -11,6 +21,3 @@ project_files:
BUNDLE_ALLOW_OFFLINE_INSTALL: "true"
BUNDLE_DISABLE_VERSION_CHECK: "true"
BUNDLE_VERSION: "system"
BUNDLE_DISABLE_LOCAL_BRANCH_CHECK: "true"
BUNDLE_DISABLE_LOCAL_REVISION_CHECK: "true"
BUNDLE_LOCAL.JSON___SCHEMA: "${output_dir}/deps/bundler/json-schema-26487618a684"
Loading