Commit 7ee1a23

Add content-addressable v1 index demo and tests
Self-contained, native end-to-end tests for content-addressable ("skinny") binaries served from a v1 compact index (no v2 endpoint, no rubygems.org, no containers):

- demo/: a tiny native gem whose required_ruby_version pins the building Ruby's minor, so it builds as a skinny binary.
- fake_compact_index.rb: a dependency-free threaded HTTP server that serves a directory as a v1 compact index (/versions, /info/<gem>, /gems/*).
- prove_resolution.rb / test_resolution.sh: using the real parser, the real EndpointSpecification, and the real MatchPlatform selection, prove a new client picks the skinny variant over source+fat, and an old client drops it via the rubygems:>= gate.
- test_local.sh: build, gem install (gems/name-version-<sha>/ + sha in the stub), bundle install --local (lockfile round-trip), require, idempotent re-install.
- test_remote_v1.sh: bundle install against the fake v1 server; the skinny gem is selected and downloaded as /gems/name-version-<sha>.gem.
- test_remote_v1_oldclient.sh: a stock RubyGems/Bundler older than the gate installs the fat binary and never requests the skinny gem, while the patched client picks the skinny one, against the same index.

README.md documents what each proves and how this differs from PR #168.

Assisted-By: devx/4ab7951d-76be-4b93-8ffb-c3581711ac1f
1 parent 4fc3dd7 commit 7ee1a23

13 files changed

Lines changed: 691 additions & 0 deletions

Lines changed: 146 additions & 0 deletions
@@ -0,0 +1,146 @@
# Content-addressable gems in the **v1** compact index: resolution proof

This prototype implements the proposal where content-addressable ("skinny")
gems are **directly in the v1 compact index**, gated by a `rubygems:>=`
requirement so that:

- **old clients ignore** the skinny rows (they don't satisfy `rubygems:>=`), and
- **new clients process** them, match on the `platform:` metadata token, and
  download the content-addressed `name-version-<sha>.gem`.

A skinny `/info/hola` entry looks like (content-addressable rows last):

```
---
1.0.0 |checksum:…,ruby:>= 3.1,rubygems:>= 3.3.22
1.0.0-x86_64-linux |checksum:…,ruby:>= 3.1
1.0.0-ef716ba7a6 |checksum:…,ruby:~> 4.0.0,rubygems:>= 4.1.0.dev,platform:= x86_64-linux
```

- The version slot's "platform" (`ef716ba7a6`) is the **content address** =
  `sha256(.gem)[0,10]`. It becomes the gem's `full_name` and download path.
- The `platform:` **metadata token** carries the real platform, used for
  compatibility matching.
- `rubygems:>= 4.1.0.dev` is the gate: this is the RubyGems version
  content-addressable support is assumed to ship in.
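
To make the row layout concrete, here is a minimal sketch of splitting one row into its tokens. It is not the real `Gem::Resolver::APISet::GemParser`: the helper name `parse_info_row`, the returned hash keys, and the `checksum:abc` placeholder are assumptions made for illustration, and the simplified layout `VERSION[-SUFFIX] DEPS|KEY:VALUE,...` is assumed.

```ruby
# Hypothetical helper (not the real GemParser): split one compact-index info
# row into version, version-slot suffix, dependency text, and metadata tokens.
def parse_info_row(line)
  left, metadata = line.split("|", 2)
  version_slot, deps = left.split(" ", 2)
  version, suffix = version_slot.split("-", 2)

  # Metadata tokens are comma-separated "key:value" pairs.
  requirements = metadata.to_s.split(",").to_h do |pair|
    key, value = pair.split(":", 2)
    [key, value]
  end

  { version: version, suffix: suffix, deps: deps.to_s.strip, requirements: requirements }
end

row = parse_info_row(
  "1.0.0-ef716ba7a6 |checksum:abc,ruby:~> 4.0.0,rubygems:>= 4.1.0.dev,platform:= x86_64-linux"
)

puts row[:suffix]                   # the content address from the version slot
puts row[:requirements]["platform"] # the real platform from the metadata token
```

For the skinny row, the suffix in the version slot is the sha and the real platform only appears under the `platform` key, which is the split the bullets above describe.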

## Run it

```bash
# 1. Resolution proof: no compiler/network needed (pure parser + selection)
./dev/content-addressable-v1-demo/test_resolution.sh

# 2. Local install path: build a native gem, gem install, bundle install --local
./dev/content-addressable-v1-demo/test_local.sh

# 3. Remote install path: serve a v1 index from a fake server, bundle install
PORT=8920 ./dev/content-addressable-v1-demo/test_remote_v1.sh

# 4. Old-client compatibility: stock RubyGems/Bundler ignores the skinny rows
PORT=8920 ./dev/content-addressable-v1-demo/test_remote_v1_oldclient.sh
```

Each ends with `ALL GOOD`. `test_local.sh` / `test_remote_v1.sh` build a tiny
native gem (`demo/`) and need a C toolchain plus a one-time `rake-compiler`
install. They use the Ruby on your `PATH` (the demo gemspec pins `~>` that
Ruby's minor, so it is "skinny" for whatever Ruby you run); override with
`RUBY_PREFIX=/opt/rubies/X`.

## What it proves

### Resolution (`test_resolution.sh`)

For a single `hola 1.0.0` published as **source**, **fat** (regular
precompiled), and **skinny** (content-addressable) variants, using the *real*
client code (`Gem::Resolver::APISet::GemParser`, `Bundler::EndpointSpecification`
built exactly like `Bundler::Fetcher#specs`, and
`Bundler::MatchPlatform.select_best_platform_match`):

1. **New client picks the skinny one** and its download name reconstructs to
   `hola-1.0.0-ef716ba7a6.gem`.
2. **Old client ignores the skinny rows.** The `rubygems:>= 4.1.0.dev`
   requirement is not satisfied by a pre-CA RubyGems (e.g. 3.5.0), so
   `matches_current_rubygems?` is false and the row is dropped; the old client
   falls back to the fat binary.
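
The gate itself is ordinary RubyGems requirement matching, so it can be checked in isolation. A small sketch using only `Gem::Requirement` and `Gem::Version`; the `4.1.0` "new client" version is an assumption standing in for whichever release ends up carrying content-addressable support.

```ruby
require "rubygems"

gate = Gem::Requirement.new(">= 4.1.0.dev")

old_client = Gem::Version.new("3.5.0") # pre-CA RubyGems, as in the text
new_client = Gem::Version.new("4.1.0") # assumed first release with CA support

puts gate.satisfied_by?(old_client) # => false, so the skinny row is dropped
puts gate.satisfied_by?(new_client) # => true, so the skinny row is kept
```

Because `4.1.0.dev` is a prerelease, it sorts below `4.1.0`, which lets development builds of the patched client pass the gate as well.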

### Install paths (`test_local.sh`, `test_remote_v1.sh`)

These exercise the install machinery ported from
[PR #168](https://github.com/Shopify/rubygems/pull/168) on top of the v1 branch:

- **Build:** `rake native gem` content-addresses the skinny gem, renaming
  `hola-1.0.0-<platform>.gem` to `hola-1.0.0-<sha>.gem` in `Gem::Package.build`.
- **`--local`:** `gem install` installs under `gems/hola-1.0.0-<sha>/` and
  records the sha in the stub line (`# stub: hola 1.0.0 <platform> lib <sha>`),
  so `full_name` reconstructs the content-addressed name offline.
  `bundle install --local` resolves it (the lockfile stays portable,
  `hola (1.0.0-<platform>)`, and Bundler bridges back to the on-disk
  `name-version-<sha>` gem), `require` under `bundle exec` and `bundle list`
  work, and re-install is idempotent.
- **Remote (v1):** Bundler fetches `/versions` and `/info/hola` (unprefixed:
  **no `/v2/` endpoint**), selects the skinny variant, downloads
  `/gems/hola-1.0.0-<sha>.gem`, installs it, and `require` works.
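
The naming step in the build bullet can be illustrated with a short sketch. This is not the code in `Gem::Package.build`; the helper name `content_addressed_name` is hypothetical, and the input bytes are a stand-in rather than a real `.gem` file. It only assumes what the README states: the suffix is the first 10 hex characters of the SHA-256 of the gem's bytes.

```ruby
require "digest"

# Hypothetical helper: derive the content-addressed file name from gem bytes.
def content_addressed_name(name, version, gem_bytes)
  sha10 = Digest::SHA256.hexdigest(gem_bytes)[0, 10]
  "#{name}-#{version}-#{sha10}.gem"
end

sample_bytes = "not a real gem, just sample bytes"
puts content_addressed_name("hola", "1.0.0", sample_bytes)
```

The same bytes always yield the same name, which is what lets the installer re-derive the suffix from the file it was handed.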

### Old-client compatibility (`test_remote_v1_oldclient.sh`)

The most important backwards-compat guarantee: a **new publisher** serving
content-addressable gems must not break **old consumers**. This test serves a
real fat binary plus a gated skinny row, then installs with two clients against
the same v1 index:

- **Old client** (the stock RubyGems/Bundler on `PATH`, here 4.0.10, older than
  the `4.1.0.dev` gate and without the patches): ignores the skinny row
  (`rubygems:>= 4.1.0.dev` is unsatisfied, and the `<sha>` token doesn't match
  the local platform anyway), installs `hola-1.0.0-<platform>.gem`, and
  `require` works. It **never requests** the skinny `.gem`, and doesn't choke on
  the `<sha>` version token or the `platform:` metadata token.
- **New client** (patched): selects and downloads the skinny
  `hola-1.0.0-<sha>.gem`.

The server request log shows exactly which `.gem` each client fetched, proving
the split.
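
One way to read that log programmatically is sketched below. The line shape (`[fake-index] METHOD PATH -> STATUS`) follows the `warn` call in `fake_compact_index.rb`; the helper name `fetched_gems` and the sample log contents are made up for illustration and are not taken from an actual test run.

```ruby
# Hypothetical helper: list the .gem files the fake index served with a 200.
def fetched_gems(log_text)
  log_text.each_line.filter_map do |line|
    match = line.match(%r{\A\[fake-index\] GET /gems/(\S+\.gem) -> 200})
    match && match[1]
  end
end

# Illustrative log only; paths and port are placeholders.
sample_log = <<~LOG
  [fake-index] serving /tmp/index on http://127.0.0.1:8920
  [fake-index] GET /versions -> 200
  [fake-index] GET /info/hola -> 200
  [fake-index] GET /gems/hola-1.0.0-ef716ba7a6.gem -> 200
LOG

p fetched_gems(sample_log)
```

Running this once per client log would give the two file lists to compare: the old client's list should contain only the `<platform>` gem, and the new client's only the `<sha>` gem.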

## Client changes that make this work (vs. master)

All on top of `master`; **no v2 endpoint involved**. Naming convention on this
branch: the sha identity is `version_suffix`; the real platform from metadata is
`platform_requirement`.

### RubyGems core (build + install)

| File | Change |
| --- | --- |
| `lib/rubygems/platform.rb` | Preserve a 10-hex-char **version suffix** verbatim instead of normalizing it to `os="unknown"`. Kept in its own `version_suffix` field; `==`/`hash`/`===` treat it as an exact-match token. |
| `lib/rubygems/specification.rb` | `content_addressable?` / `content_addressable_ruby_abi` (skinny detection: `~> X.Y.Z` and rake-compiler's `>= X.Y, < X.(Y+1).dev`); `to_ruby` writes the version suffix into the stub line. |
| `lib/rubygems/package.rb` | `Gem::Package.build` renames skinny gems to `name-version-<sha>.gem`. |
| `lib/rubygems/package_task.rb` | Move the (renamed) built file to the package dir. |
| `lib/rubygems/basic_specification.rb` | `version_suffix` accessor; `full_name` returns `name-version-<sha>` when set. |
| `lib/rubygems/stub_specification.rb` | Read the optional 5th stub-line field (the sha); `full_name` reconstructs the content-addressed name; `to_spec` carries the suffix onto the loaded full spec. |
| `lib/rubygems/installer.rb` | `assign_version_suffix` derives the sha from the gem's bytes before any path is computed. |
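
The two requirement shapes named for skinny detection can be recognized with the public `Gem::Requirement#requirements` accessor. The sketch below is not the branch's `content_addressable_ruby_abi`; the helper name `pinned_ruby_minor` and its return convention (`"X.Y"` or `nil`) are assumptions for illustration.

```ruby
require "rubygems"

# Hypothetical helper: return "X.Y" when the requirement pins one Ruby minor
# in either of the two shapes from the table, otherwise nil.
def pinned_ruby_minor(requirement)
  reqs = requirement.requirements # array of [operator, Gem::Version] pairs

  # Shape 1: a lone "~> X.Y.Z".
  if reqs.size == 1
    op, version = reqs.first
    return nil unless op == "~>" && version.segments.size == 3
    return version.segments.first(2).join(".")
  end

  # Shape 2: ">= X.Y" paired with "< X.(Y+1).dev" (rake-compiler style).
  return nil unless reqs.size == 2
  lower = reqs.find {|op, _| op == ">=" }&.last
  upper = reqs.find {|op, _| op == "<" }&.last
  return nil unless lower && upper

  major, minor = lower.segments
  return nil unless minor.is_a?(Integer)
  return nil unless upper.segments == [major, minor + 1, "dev"]

  "#{major}.#{minor}"
end

p pinned_ruby_minor(Gem::Requirement.new("~> 4.0.0"))
p pinned_ruby_minor(Gem::Requirement.new(">= 3.3", "< 3.4.dev"))
p pinned_ruby_minor(Gem::Requirement.new(">= 3.1"))
```

An open-ended requirement such as `>= 3.1` yields `nil` here, which corresponds to a fat or source gem rather than a skinny one.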

### Bundler (resolution + install bridge)

| File | Change |
| --- | --- |
| `bundler/lib/bundler/endpoint_specification.rb` | Parse the `platform:` metadata token into `platform_requirement`; add `content_addressable?`, `version_suffix`, and a platform match that uses the platform requirement. |
| `bundler/lib/bundler/match_platform.rb` | When a skinny variant compatible with the running Ruby exists, prefer it **exclusively** over fat/source (per-Ruby-minor `~>` ranges are disjoint, so at most one qualifies). |
| `bundler/lib/bundler/lazy_specification.rb` | Carry `platform_requirement`/`version_suffix`; reconstruct `full_name` as `name-version-<sha>`; on a `--local` exact-match miss, retry by name+version so the portable lockfile entry resolves to the on-disk `name-version-<sha>` gem. |
| `bundler/lib/bundler/stub_specification.rb` | Delegate `full_name`/`version_suffix` to the underlying RubyGems stub. |
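
The "at most one qualifies" claim in the `match_platform.rb` row follows from how `~> X.Y.0` ranges partition Ruby versions. A small sketch with plain `Gem::Requirement` objects; the suffix strings are made-up placeholders rather than real content addresses, and `compatible_suffixes` is a hypothetical helper, not Bundler's selection code.

```ruby
require "rubygems"

# Skinny variants of one release, each pinned to a single Ruby minor.
# The suffix keys are placeholders for illustration only.
SKINNY_ROWS = {
  "aaaaaaaaaa" => Gem::Requirement.new("~> 3.3.0"),
  "bbbbbbbbbb" => Gem::Requirement.new("~> 3.4.0"),
  "cccccccccc" => Gem::Requirement.new("~> 4.0.0"),
}.freeze

# Hypothetical helper: suffixes whose Ruby requirement accepts ruby_version.
def compatible_suffixes(rows, ruby_version)
  version = Gem::Version.new(ruby_version)
  rows.select {|_, requirement| requirement.satisfied_by?(version) }.keys
end

p compatible_suffixes(SKINNY_ROWS, "3.4.2") # => ["bbbbbbbbbb"]
p compatible_suffixes(SKINNY_ROWS, "3.2.9") # => []
```

When the list is empty, as for Ruby 3.2.9 here, no skinny variant applies and selection would have to fall back to the fat or source rows.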

## Differences from PR #168

- **No v2 endpoint.** PR #168 negotiates a `/v2/` compact-index namespace to hide
  content-addressable entries from old clients. This branch keeps everything in
  the **v1** index and hides skinny rows from old clients with a `rubygems:>=`
  gate instead, so `compact_index_client.rb` / `cache.rb` /
  `fetcher/compact_index.rb` are left untouched.
- **Naming.** PR #168 uses `content_address` / `real_platform`; this branch uses
  `version_suffix` / `platform_requirement`.

## Known nuance

On the **remote** path the gem currently installs into
`gems/hola-1.0.0-<real-platform>/`, whereas the **`--local`** path installs into
`gems/hola-1.0.0-<sha>/`. Both are internally consistent and work for a single
active Ruby, but they disagree on the on-disk directory name (same nuance noted
in PR #168).
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
/tmp/
/pkg/
/lib/hola/*.bundle
/lib/hola/*.so
*.gem
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
require "rake/extensiontask"
spec = Gem::Specification.load("hola.gemspec")
Rake::ExtensionTask.new("hola", spec) do |ext|
  ext.lib_dir = "lib/hola"
end
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
require "mkmf"
create_makefile("hola/hola")
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
#include <ruby.h>

static VALUE hola(VALUE self) {
  return rb_str_new_cstr("hola from native");
}

void Init_hola(void) {
  rb_define_global_function("hola", hola, 0);
}
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
Gem::Specification.new do |s|
  s.name = "hola"
  s.version = "1.0.0"
  s.summary = "content-addressable native gem demo"
  s.authors = ["poc"]
  s.files = Dir["lib/**/*.rb"] + Dir["ext/**/*"]
  s.extensions = ["ext/hola/extconf.rb"]
  # Pin to a single Ruby ABI (the Ruby building this gem) so it is a "skinny"
  # binary -> content-addressable. Computed at build time so the demo is skinny
  # for whatever Ruby you run it with, not just 3.3.
  minor = RUBY_VERSION.split(".").first(2).join(".")
  s.required_ruby_version = "~> #{minor}.0"
end
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
require "hola/hola"
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
#!/usr/bin/env ruby
# frozen_string_literal: true
#
# Minimal compact-index gem server for testing content-addressable ("skinny")
# binaries end-to-end over the **v1** compact index, with no external deps
# (raw TCPServer, threaded).
#
# Serves a directory tree as static files over HTTP/1.1:
#   GET /versions         -> <root>/versions
#   GET /info/<gem>       -> <root>/info/<gem>
#   GET /gems/<file>.gem  -> <root>/gems/<file>.gem
#
# Always returns the full body (200); ignores Range (Bundler's compact-index
# updater handles a full response even when it sent a Range header). Logs every
# request to stderr so you can see exactly what Bundler asks for.
#
# Usage: ruby fake_compact_index.rb <root_dir> <port>

require "socket"

root = File.expand_path(ARGV[0] || ".")
port = Integer(ARGV[1] || "8899")

server = TCPServer.new("127.0.0.1", port)
warn "[fake-index] serving #{root} on http://127.0.0.1:#{port}"

loop do
  conn = server.accept
  Thread.new(conn) do |c|
    begin
      request_line = c.gets
      next unless request_line
      method, path, = request_line.split(" ")
      # drain headers
      while (line = c.gets) && line != "\r\n"; end

      clean = path.split("?", 2).first.to_s
      file = File.join(root, clean)
      warn "[fake-index] #{method} #{clean} -> #{File.file?(file) ? "200" : "404"}"

      if File.file?(file)
        body = File.binread(file)
        ctype = clean.end_with?(".gem") ? "application/octet-stream" : "text/plain"
        head = +"HTTP/1.1 200 OK\r\n"
        head << "Content-Type: #{ctype}\r\n"
        head << "Content-Length: #{body.bytesize}\r\n"
        head << "Accept-Ranges: none\r\n"
        head << "Connection: close\r\n\r\n"
        c.write(head)
        c.write(body) unless method == "HEAD"
      else
        c.write("HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\nConnection: close\r\n\r\n")
      end
    rescue => e
      warn "[fake-index] error: #{e.class}: #{e.message}"
    ensure
      c.close rescue nil
    end
  end
end
Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
# frozen_string_literal: true

# Proof: content-addressable ("skinny") gems encoded in the **v1** compact index
# are (a) chosen by a new client over the fat and source variants, and (b)
# ignored by old clients via the `rubygems:` requirement gate.
#
# This deliberately uses the *real* client code paths -- the host RubyGems
# compact-index line parser (Gem::Resolver::APISet::GemParser), the real
# Bundler::EndpointSpecification (built exactly like Bundler::Fetcher#specs
# does), and the real Bundler::MatchPlatform platform selection -- so the
# demo proves the production behaviour, not a reimplementation.
#
# Run with the patched RubyGems + Bundler from this checkout:
#
#   ruby --disable-gems -Ilib -rrubygems -Ibundler/lib \
#     dev/content-addressable-v1-demo/prove_resolution.rb
#
# (test_resolution.sh wires that up for you.)

require "bundler"
require "bundler/endpoint_specification"
require "bundler/match_platform"
require "rubygems/resolver"

# The RubyGems version at which content-addressable support ships. New clients
# satisfy this; old clients do not and therefore drop the skinny rows.
CA_SUPPORTED_FROM = "4.1.0.dev"

LOCAL      = Gem::Platform.local                         # e.g. arm64-darwin-24 / x86_64-linux
RUBY_MINOR = RUBY_VERSION.split(".").first(2).join(".")  # e.g. "4.0"
SHA10      = "ef716ba7a6"                                # sha256(.gem)[0,10]

# A single gem `hola 1.0.0` published in three shapes, exactly as it would
# appear in the v1 `/info/hola` file. Order mirrors the Slack proposal: the
# content-addressable rows come last and carry `rubygems:>=` + `platform:`.
INFO = <<~INFO
  ---
  1.0.0 |checksum:#{"a" * 64},ruby:>= 3.1,rubygems:>= 3.3.22
  1.0.0-#{LOCAL} |checksum:#{"b" * 64},ruby:>= 3.1
  1.0.0-#{SHA10} |checksum:#{"c" * 64},ruby:~> #{RUBY_MINOR}.0,rubygems:>= #{CA_SUPPORTED_FROM},platform:= #{LOCAL}
INFO

# Minimal spec fetcher stand-in: EndpointSpecification only needs #uri (for
# checksum attribution). #fetch_spec is never reached in this offline demo.
FakeFetcher = Struct.new(:uri) do
  def fetch_spec(*) = raise("network access not expected in this demo")
end
FETCHER = FakeFetcher.new("https://example.test")

# Build EndpointSpecifications from the info file using the real parser, the
# same way Bundler::Fetcher#specs does (name, version, platform, deps, metadata).
def build_specs(info)
  parser = Gem::Resolver::APISet::GemParser.new
  lines = info.split("\n")
  body = lines[(lines.index("---") + 1)..]
  body.map do |line|
    version, platform, deps, reqs = parser.parse(line)
    Bundler::EndpointSpecification.new("hola", version, platform, FETCHER, deps, reqs)
  end
end

def describe(spec)
  kind =
    if spec.content_addressable? then "SKINNY (content-addressable)"
    elsif spec.platform == Gem::Platform::RUBY then "source (ruby platform)"
    else "fat (precompiled platform)"
    end
  "#{spec.full_name.ljust(28)} -> #{kind}"
end

specs = build_specs(INFO)

puts "Running RubyGems #{Gem::VERSION}, Ruby #{RUBY_VERSION}, platform #{LOCAL}"
puts
puts "Parsed v1 /info/hola into #{specs.size} candidate specs:"
specs.each {|s| puts " #{describe(s)}" }
puts

# ---------------------------------------------------------------------------
# Part 1: a NEW client (this RubyGems) picks the skinny variant.
# ---------------------------------------------------------------------------
chosen = Bundler::MatchPlatform.select_best_platform_match(specs, LOCAL)
raise "expected exactly one winner, got #{chosen.size}" unless chosen.size == 1
winner = chosen.first

puts "[new client] select_best_platform_match(#{LOCAL}) chose:"
puts " #{describe(winner)}"
puts " download name: #{winner.full_name}.gem (reconstructed from version + sha)"
puts

unless winner.content_addressable?
  abort "FAIL: expected the skinny (content-addressable) gem to be chosen"
end

# ---------------------------------------------------------------------------
# Part 2: an OLD client drops the skinny rows via the `rubygems:` gate.
# `matches_current_rubygems?` is exactly the lever the resolver uses to
# exclude metadata-incompatible candidates. We simulate an old client by
# checking the gate against a pre-CA RubyGems version.
# ---------------------------------------------------------------------------
old_rubygems = Gem::Version.new("3.5.0")
skinny = specs.find(&:content_addressable?)
gate = skinny.required_rubygems_version
old_client_keeps_skinny = gate.satisfied_by?(old_rubygems)
new_client_keeps_skinny = skinny.matches_current_rubygems?

puts "[gate] skinny row declares rubygems:#{gate}"
puts " old client (RubyGems #{old_rubygems}): keeps skinny? #{old_client_keeps_skinny} -> ignores it"
puts " new client (RubyGems #{Gem::VERSION}): keeps skinny? #{new_client_keeps_skinny} -> processes it"
puts

if old_client_keeps_skinny
  abort "FAIL: an old client should NOT satisfy the rubygems gate"
end
unless new_client_keeps_skinny
  abort "FAIL: this (new) client should satisfy the rubygems gate"
end

# What an OLD client would actually resolve: drop gated-out rows first, then
# run the same platform selection. It must fall back to the fat binary.
old_visible = specs.select {|s| s.required_rubygems_version.satisfied_by?(old_rubygems) }
old_winner = Bundler::MatchPlatform.select_best_platform_match(old_visible, LOCAL).first
puts "[old client] after dropping gated rows, select_best_platform_match chose:"
puts " #{describe(old_winner)}"
if old_winner.content_addressable?
  abort "FAIL: old client should fall back to the fat binary, not the skinny one"
end
puts

puts "ALL GOOD: new clients choose the skinny gem; old clients fall back to fat."
