| 1 | +# Content-addressable gems in the **v1** compact index — resolution proof |
| 2 | + |
| 3 | +This prototype implements the proposal in which
| 4 | +content-addressable ("skinny") gems are listed **directly in the v1 compact index**,
| 5 | +gated by a `rubygems:>=` requirement so that:
| 6 | + |
| 7 | +- **old clients ignore** the skinny rows (they don't satisfy `rubygems:>=`), and |
| 8 | +- **new clients process** them, match on the `platform:` metadata token, and |
| 9 | + download the content-addressed `name-version-<sha>.gem`. |
| 10 | + |
| 11 | +An `/info/hola` response with skinny rows looks like this (content-addressable rows last):
| 12 | + |
| 13 | +``` |
| 14 | +--- |
| 15 | +1.0.0 |checksum:…,ruby:>= 3.1,rubygems:>= 3.3.22 |
| 16 | +1.0.0-x86_64-linux |checksum:…,ruby:>= 3.1 |
| 17 | +1.0.0-ef716ba7a6 |checksum:…,ruby:~> 4.0.0,rubygems:>= 4.1.0.dev,platform:= x86_64-linux |
| 18 | +``` |
| 19 | + |
| 20 | +- The version slot's "platform" (`ef716ba7a6`) is the **content address**:
| 21 | +  `sha256(.gem)[0,10]`, the first 10 hex characters of the gem file's SHA-256 digest. It replaces the platform in the gem's `full_name` and download path.
| 22 | +- The `platform:` **metadata token** carries the real platform, used for |
| 23 | + compatibility matching. |
| 24 | +- `rubygems:>= 4.1.0.dev` is the gate: `4.1.0.dev` is the RubyGems version in which
| 25 | +  content-addressable support is assumed to ship.
| 26 | + |
| 27 | +## Run it |
| 28 | + |
| 29 | +```bash |
| 30 | +# 1. Resolution proof — no compiler/network needed (pure parser + selection) |
| 31 | +./dev/content-addressable-v1-demo/test_resolution.sh |
| 32 | + |
| 33 | +# 2. Local install path — build a native gem, gem install, bundle install --local |
| 34 | +./dev/content-addressable-v1-demo/test_local.sh |
| 35 | + |
| 36 | +# 3. Remote install path — serve a v1 index from a fake server, bundle install |
| 37 | +PORT=8920 ./dev/content-addressable-v1-demo/test_remote_v1.sh |
| 38 | + |
| 39 | +# 4. Old-client compatibility — stock RubyGems/Bundler ignores the skinny rows |
| 40 | +PORT=8920 ./dev/content-addressable-v1-demo/test_remote_v1_oldclient.sh |
| 41 | +``` |
| 42 | + |
| 43 | +Each script ends by printing `ALL GOOD`. `test_local.sh` and `test_remote_v1.sh` build a tiny
| 44 | +native gem (`demo/`) and need a C toolchain plus a one-time `rake-compiler`
| 45 | +install. They use the Ruby on your `PATH`; the demo gemspec pins its Ruby
| 46 | +requirement with `~>` to that Ruby's minor version, so the gem is "skinny" for
| 47 | +whatever Ruby you run. Override the Ruby with `RUBY_PREFIX=/opt/rubies/X`.
| 48 | + |
| 49 | +## What it proves |
| 50 | + |
| 51 | +### Resolution (`test_resolution.sh`) |
| 52 | + |
| 53 | +Given a single `hola 1.0.0` published as **source**, **fat** (regular
| 54 | +precompiled), and **skinny** (content-addressable) variants, the test runs the *real*
| 55 | +client code (`Gem::Resolver::APISet::GemParser`, `Bundler::EndpointSpecification`
| 56 | +built exactly like `Bundler::Fetcher#specs`, and
| 57 | +`Bundler::MatchPlatform.select_best_platform_match`) and shows that:
| 58 | + |
| 59 | +1. **New client picks the skinny one** and its download name reconstructs to |
| 60 | + `hola-1.0.0-ef716ba7a6.gem`. |
| 61 | +2. **Old client ignores the skinny rows.** The `rubygems:>= 4.1.0.dev` |
| 62 | + requirement is not satisfied by a pre-CA RubyGems (e.g. 3.5.0), so |
| 63 | + `matches_current_rubygems?` is false and the row is dropped; the old client |
| 64 | + falls back to the fat binary. |
| 65 | + |
| 66 | +### Install paths (`test_local.sh`, `test_remote_v1.sh`) |
| 67 | + |
| 68 | +These exercise the install machinery ported from |
| 69 | +[PR #168](https://github.com/Shopify/rubygems/pull/168) on top of the v1 branch: |
| 70 | + |
| 71 | +- **Build:** `rake native gem` content-addresses the skinny gem, renaming |
| 72 | + `hola-1.0.0-<platform>.gem` → `hola-1.0.0-<sha>.gem` in `Gem::Package.build`. |
| 73 | +- **`--local`:** `gem install` installs under `gems/hola-1.0.0-<sha>/` and |
| 74 | + records the sha in the stub line (`# stub: hola 1.0.0 <platform> lib <sha>`), |
| 75 | + so `full_name` reconstructs the content-addressed name offline. |
| 76 | + `bundle install --local` resolves it (the lockfile stays portable — |
| 77 | + `hola (1.0.0-<platform>)` — and Bundler bridges back to the on-disk |
| 78 | + `name-version-<sha>` gem), `bundle exec require` + `bundle list` work, and |
| 79 | + re-install is idempotent. |
| 80 | +- **Remote (v1):** Bundler fetches `/versions` and `/info/hola` (unprefixed — |
| 81 | + **no `/v2/` endpoint**), selects the skinny variant, downloads |
| 82 | + `/gems/hola-1.0.0-<sha>.gem`, installs it, and `require` works. |
| 83 | + |
| 84 | +### Old-client compatibility (`test_remote_v1_oldclient.sh`) |
| 85 | + |
| 86 | +The most important backwards-compat guarantee: a **new publisher** serving |
| 87 | +content-addressable gems must not break **old consumers**. This test serves a |
| 88 | +real fat binary plus a gated skinny row, then installs with two clients against |
| 89 | +the same v1 index: |
| 90 | + |
| 91 | +- **Old client** (the stock RubyGems/Bundler on `PATH` — here 4.0.10, older than |
| 92 | + the `4.1.0.dev` gate and without the patches): ignores the skinny row |
| 93 | + (`rubygems:>= 4.1.0.dev` is unsatisfied, and the `<sha>` token doesn't match |
| 94 | + the local platform anyway), installs `hola-1.0.0-<platform>.gem`, and |
| 95 | + `require` works. It **never requests** the skinny `.gem`, and doesn't choke on |
| 96 | + the `<sha>` version token or the `platform:` metadata token. |
| 97 | +- **New client** (patched): selects and downloads the skinny |
| 98 | + `hola-1.0.0-<sha>.gem`. |
| 99 | + |
| 100 | +The server request log shows exactly which `.gem` each client fetched, proving |
| 101 | +the split. |
| 102 | + |
| 103 | +## Client changes that make this work (vs. master) |
| 104 | + |
| 105 | +All changes sit on top of `master`; **no v2 endpoint involved**. Naming convention on this
| 106 | +branch: the sha identity is `version_suffix`; the real platform from metadata is
| 107 | +`platform_requirement`.
| 108 | + |
| 109 | +### RubyGems core (build + install) |
| 110 | + |
| 111 | +| File | Change | |
| 112 | +| --- | --- | |
| 113 | +| `lib/rubygems/platform.rb` | Preserve a 10-hex-char **version suffix** verbatim instead of normalizing it to `os="unknown"`. Kept in its own `version_suffix` field; `==`/`hash`/`===` treat it as an exact-match token. | |
| 114 | +| `lib/rubygems/specification.rb` | `content_addressable?` / `content_addressable_ruby_abi` (skinny detection: `~> X.Y.Z` and rake-compiler's `>= X.Y, < X.(Y+1).dev`); `to_ruby` writes the version suffix into the stub line. | |
| 115 | +| `lib/rubygems/package.rb` | `Gem::Package.build` renames skinny gems to `name-version-<sha>.gem`. | |
| 116 | +| `lib/rubygems/package_task.rb` | Move the (renamed) built file to the package dir. | |
| 117 | +| `lib/rubygems/basic_specification.rb` | `version_suffix` accessor; `full_name` returns `name-version-<sha>` when set. | |
| 118 | +| `lib/rubygems/stub_specification.rb` | Read the optional 5th stub-line field (the sha); `full_name` reconstructs the content-addressed name; `to_spec` carries the suffix onto the loaded full spec. | |
| 119 | +| `lib/rubygems/installer.rb` | `assign_version_suffix` derives the sha from the gem's bytes before any path is computed. | |
| 120 | + |
| 121 | +### Bundler (resolution + install bridge) |
| 122 | + |
| 123 | +| File | Change | |
| 124 | +| --- | --- | |
| 125 | +| `bundler/lib/bundler/endpoint_specification.rb` | Parse the `platform:` metadata token into `platform_requirement`; add `content_addressable?`, `version_suffix`, and a platform match that uses the platform requirement. | |
| 126 | +| `bundler/lib/bundler/match_platform.rb` | When a skinny variant compatible with the running Ruby exists, prefer it **exclusively** over fat/source (per-Ruby-minor `~>` ranges are disjoint, so at most one qualifies). | |
| 127 | +| `bundler/lib/bundler/lazy_specification.rb` | Carry `platform_requirement`/`version_suffix`; reconstruct `full_name` as `name-version-<sha>`; on a `--local` exact-match miss, retry by name+version so the portable lockfile entry resolves to the on-disk `name-version-<sha>` gem. | |
| 128 | +| `bundler/lib/bundler/stub_specification.rb` | Delegate `full_name`/`version_suffix` to the underlying RubyGems stub. | |
| 129 | + |
| 130 | +## Differences from PR #168 |
| 131 | + |
| 132 | +- **No v2 endpoint.** PR #168 negotiates a `/v2/` compact-index namespace to hide |
| 133 | + content-addressable entries from old clients. This branch keeps everything in |
| 134 | + the **v1** index and hides skinny rows from old clients with a `rubygems:>=` |
| 135 | + gate instead, so `compact_index_client.rb` / `cache.rb` / |
| 136 | + `fetcher/compact_index.rb` are left untouched. |
| 137 | +- **Naming.** PR #168 uses `content_address` / `real_platform`; this branch uses |
| 138 | + `version_suffix` / `platform_requirement`. |
| 139 | + |
| 140 | +## Known nuance |
| 141 | + |
| 142 | +On the **remote** path the gem currently installs into |
| 143 | +`gems/hola-1.0.0-<real-platform>/`, whereas the **`--local`** path installs into |
| 144 | +`gems/hola-1.0.0-<sha>/`. Both are internally consistent and work for a single |
| 145 | +active Ruby, but they disagree on the on-disk directory name (same nuance noted |
| 146 | +in PR #168). |