@raphael-proust
Contributor

The idea for this PR is to allow the URL field of opam files to include a signed-by field:

url {
  src: "git+https://<git-repo-url>#<hash-or-tag>"
  signed-by: <public key here>
}

The status is that it's not ready in any way but that the basic structure of the code is there. (You know, all that remains is the mere pesky little details of cryptography.)

This PR is meant more as a place to discuss primarily the basic idea of adding commit-signature verification into opam, and then secondarily the file format changes and the implementation and such…

  • Is this idea workable in any way?
  • What obvious issues have I missed?
  • Any comments? Counter proposals? etc.

@raphael-proust raphael-proust changed the title from "WIP: thinking out loud about signed-by field to accompany git: urls" to "RFC: thinking out loud about signed-by field to accompany git: urls" on Sep 19, 2025
@raphael-proust
Contributor Author

pinging @kit-ty-kate who has discussed some of this before and @hannesm who has thought about package signing in the context of opam-repo

@kit-ty-kate
Member

I'm unfamiliar with the invariants brought by git signing (e.g. how does this check the final content, is the check redundant, what algorithms are used, …) so i'll leave that to @hannesm who probably knows better.

Using this in opam-repository at scale brings a lot of new issues. Here's what i could think of so far:

  • non-shallow git clones are extremely slow and network taxing compared to a simple http request. Shallow clones have been explored in the past but they break some users in subtle ways (Git --depth 1 breaks orb/dune #6145), so i think to make this work properly, opam should have a new "download mode" other than git(+https):// which downloads via git but will just remove the .git directory before running any of the commands
  • this would require a new way of caching download artifacts (currently archives are cached in $OPAMROOT/download-cache), but adding even bare directories brings a new class of worries at scale in terms of IO (e.g. what i'm trying to avoid with [WIP] Improve performance of opam update/init by changing the structure of the internal http opam repositories (use the tar.gz as-is) #6625), so i think it would probably be best to use ocaml-git (or any newer stack) directly and store the artifact as a tarball using ocaml-tar
  • as a more minor issue, new fields can't be added as-is in opam 2.x until everyone uses opam >= 2.3 (Silently mark packages requiring an unsupported version of opam as unavailable #5665) so it would have to be an x-signed-by field instead, similar to the x-env-path-rewrite added in opam 2.2, or this would have to wait for opam 3.0 (which seems likely, given the number of large issues)
  • similar to a previous point but at another level, how would the archive cache work? (e.g. https://opam.ocaml.org/cache)
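
To make the "download via git, drop .git, store as a tarball" idea from the list above concrete, here is a minimal Python sketch (not opam code, and not the ocaml-git/ocaml-tar implementation suggested in the comment). The function name `pack_without_git` is hypothetical. It assumes the checkout already exists on disk and normalises timestamps and ownership so that repeated runs have a better chance of producing the same bytes.

```python
import gzip
import tarfile


def pack_without_git(src_dir: str, out_path: str) -> None:
    """Pack src_dir into a .tar.gz, leaving out the top-level .git directory."""

    def keep(info: tarfile.TarInfo):
        # Entries are named "." for the root and "./<path>" below it.
        if info.name == "./.git" or info.name.startswith("./.git/"):
            return None  # None drops the entry; a dropped directory is not descended into
        # Normalise metadata that would otherwise differ between machines.
        info.mtime = 0
        info.uid = info.gid = 0
        info.uname = info.gname = ""
        return info

    with open(out_path, "wb") as raw:
        # filename="" and mtime=0 keep local details out of the gzip header.
        with gzip.GzipFile(filename="", fileobj=raw, mode="wb", mtime=0) as gz:
            with tarfile.open(fileobj=gz, mode="w") as tar:
                tar.add(src_dir, arcname=".", filter=keep)
```

Whether the resulting archive can be matched against an existing cache entry depends on both sides agreeing on this kind of normalisation, which is part of the open question.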

@kit-ty-kate kit-ty-kate marked this pull request as draft September 19, 2025 16:12
@kit-ty-kate kit-ty-kate added the PR: WIP Not for merge at this stage label Sep 19, 2025
@kit-ty-kate
Member

kit-ty-kate commented Sep 19, 2025

src: "git+https://<git-repo-url>#<hash-or-tag>"

Also i don't think opam should allow tags in signed-by mode as it breaks the reproducibility principle (the author can, maliciously or not, force-push the tag to something completely different; as far as i understand, signing doesn't protect against that at all).

Hashes can be fine though, iff the algorithms used are sound.
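
A minimal Python sketch of that restriction, assuming the `src` value always carries its revision after a `#` as in the example above. The helper name `pinned_commit` is hypothetical, and accepting 40-character SHA-1 names at all is exactly the "iff the algorithms used are sound" question, so treat that as an assumption too.

```python
import re

# Full object names only: 40 hex chars (SHA-1) or 64 hex chars (SHA-256 repositories).
_FULL_HASH = re.compile(r"[0-9a-f]{40}|[0-9a-f]{64}")


def pinned_commit(src: str) -> str:
    """Return the commit hash after '#', or raise ValueError.

    Hypothetical helper: tags, branch names and abbreviated hashes are all
    rejected, since the remote can repoint the first two at will.
    """
    fragment = src.partition("#")[2]
    if not _FULL_HASH.fullmatch(fragment):
        raise ValueError(f"signed-by requires a full commit hash, got {fragment!r}")
    return fragment
```

For instance, `pinned_commit("git+https://example.org/repo.git#v1.0")` raises `ValueError`, while a URL ending in a full 40-character hash returns that hash.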

@raphael-proust
Contributor Author

Thanks for the answer!

re: tags breaking the reproducibility: Wouldn't the checksum check take care of detecting changes? Actually thinking about it: what does the checksum do for git urls? should it check the commit hash? Gonna look into this a bit.

re: storing and caching (but actually it also relates to checking reproducibility): I think we can work something out because we can go from commit object to tree hash, from tree hash to tree (via clone or checkout), from tree to archive (via git archive), and we can get a checksum from the archive. We can also do archive to tree/tree-hash (via write-tree) although I'm not 100% sure. So the download process for a signed commit would be:

  • get the commit object
  • check signature
  • check cache (local and remote) based on checksum
  • if miss: download tree, check that archiving + hashing gives the correct checksum
  • if hit: check that "treeifying" gives the same hash as from the commit object

There are a lot of details to figure out and it feels like maybe there should be something simpler.
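
The steps above can be sketched as control flow. This is Python rather than opam's OCaml, and every name in it (`fetch_signed`, `CommitObject`, the four callables) is hypothetical: the git and cache operations are passed in as functions so that only the order of checks is shown, and SHA-256 is assumed for the archive checksum.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


class VerificationError(Exception):
    """Raised when a signature, checksum or tree hash does not match."""


@dataclass
class CommitObject:
    tree: str           # tree hash recorded in the commit object
    signature_ok: bool  # stand-in for "the signature verifies against signed-by"


def fetch_signed(
    commit_id: str,
    expected_sha256: str,
    get_commit: Callable[[str], CommitObject],
    cache_get: Callable[[str], Optional[bytes]],
    archive_tree: Callable[[str], bytes],
    tree_hash_of: Callable[[bytes], str],
) -> bytes:
    commit = get_commit(commit_id)              # get the commit object
    if not commit.signature_ok:                 # check signature
        raise VerificationError("commit signature did not verify")
    archive = cache_get(expected_sha256)        # check cache based on checksum
    if archive is None:
        archive = archive_tree(commit.tree)     # miss: download tree and archive it
        if hashlib.sha256(archive).hexdigest() != expected_sha256:
            raise VerificationError("archive checksum mismatch")
    elif tree_hash_of(archive) != commit.tree:  # hit: "treeify" and compare
        raise VerificationError("cached archive does not match the signed tree")
    return archive
```

The miss branch only works if archiving the same tree always yields the same bytes, which is one of the details still to be figured out.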

@hannesm
Member

hannesm commented Sep 20, 2025

What a nice idea. But what is the gain and what are the failure semantics?

So we have a package A that is signed with some key at a commit C. How should an update to C+n work? What should happen (inside opam?) if the signature verification fails? Last time I looked, the signature data is out of the git tree and thus a malicious server (or person in the middle) could strip the signature off.

And what about opam files with released packages, where we have the tarballs?

And will the code for signature verification be part of opam? Or done differently?

I guess with this and conex we can verify pins (so something pinned to a git repository) and releases at the same time, which sounds great.

But another thing is, for conex and the threat models, it is crucial that the metadata (the opam file) itself is signed - since there's the set of dependencies (so nobody injects you some other dependency).... I guess with this proposal here only the source code would be signed (which is already better than nothing, but maybe not enough).

I suggest to read through https://theupdateframework.github.io/specification/latest/#goals-to-protect-against-specific-attacks to get an idea what are attack vectors (by no means this says we shouldn't do anything that doesn't cover all of them).

@raphael-proust
Contributor Author

So we have a package A that is signed with some key at a commit C. How should an update to C+n work?

I think the idea I had is that C+n would be a new release, necessitating a new package description file. So you'd have a file packages/foobar/foobar.0.3/opam mentioning commit 0123456789abcdef and you'd have packages/foobar/foobar.0.4/opam mentioning commit 0987654321fedcab.

The advantage is that it makes it easy to know that both packages were published by the same author. (Assuming the author's key has not been stolen etc.) Two packages with the same "signed-by" field can only validly point to commits that have been signed by the same person (once again, assuming…).

A nice future work is for opam to warn users when they update from a version to another and the signing key is different.
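
A minimal Python sketch of such a warning, assuming opam can read the signed-by value of the currently installed version and of the version being installed. The function name and the message are hypothetical; what to do when a version has no signed-by field at all is left as a separate policy question.

```python
from typing import Optional


def key_change_warning(
    package: str, installed_key: Optional[str], candidate_key: Optional[str]
) -> Optional[str]:
    """Return a warning if both versions are signed and the keys differ, else None."""
    if installed_key is None or candidate_key is None:
        return None  # at least one side is unsigned: not handled in this sketch
    # Compare modulo whitespace, since keys are often written in spaced groups.
    if "".join(installed_key.split()) == "".join(candidate_key.split()):
        return None
    return f"{package}: signing key differs from the installed version"
```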

What should happen (inside opam?) if the signature verification fails? Last time I looked, the signature data is out of the git tree and thus a malicious server (or person in the middle) could strip the signature off.

Signature verification failure should be treated like a checksum failure: don't even save the data in the local cache.
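
As a sketch of that rule (Python, hypothetical names, not opam's actual cache code): the data only reaches the cache directory after the check passes, and it is written to a temporary file first so that a partial write never appears under the final name.

```python
import os
import tempfile
from typing import Callable


def store_if_verified(
    cache_dir: str, name: str, data: bytes, verify: Callable[[bytes], bool]
) -> bool:
    """Write data to cache_dir/name only if verify(data) holds; return whether it was stored."""
    if not verify(data):
        return False  # same handling as a checksum failure: nothing is saved
    os.makedirs(cache_dir, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, os.path.join(cache_dir, name))  # atomic rename into place
    except BaseException:
        os.unlink(tmp)  # the rename did not happen, so the temp file is still there
        raise
    return True
```

Here `verify` stands for whichever check applies (checksum, signature, or both).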

The remote (or some intermediary) can remove the signature, but that's the same threat as the remote sending a tarball with different data than was published. Right? (I might be missing something.)

And what about opam files with released packages, where we have the tarballs?

I don't understand the question. The idea is to be able to use either tarball-style publishing (as it currently exists) or commit-style publishing (new proposal). Packages can do either. But I'm not sure this answers your question.

And will the code for signature verification be part of opam? Or done differently?

It could be done by a separate tool that opam calls when it tries to download git urls with signed-by field specifically. At this stage I don't know if that's better than opam doing the check itself.
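
As a rough illustration of the "separate tool" option, here is a Python sketch built on `git verify-commit --raw`, which prints GnuPG status lines to standard error. It makes several assumptions that are not part of the proposal: commits are OpenPGP-signed (SSH-signed commits produce no such status lines), the signer's key is already in the local keyring, and the signed-by value is compared as a key fingerprint. Both function names are hypothetical.

```python
import subprocess
from typing import Optional


def signer_fingerprint(status: str) -> Optional[str]:
    """Extract the signing-key fingerprint from GnuPG status output, if any."""
    for line in status.splitlines():
        fields = line.split()
        # A good signature yields: [GNUPG:] VALIDSIG <fingerprint> <date> ...
        if len(fields) >= 3 and fields[0] == "[GNUPG:]" and fields[1] == "VALIDSIG":
            return fields[2].upper()
    return None


def commit_signed_by(repo: str, commit: str, expected_fpr: str) -> bool:
    """Check that commit in repo carries a valid signature from expected_fpr."""
    proc = subprocess.run(
        ["git", "-C", repo, "verify-commit", "--raw", commit],
        capture_output=True,
        text=True,
    )
    wanted = expected_fpr.replace(" ", "").upper()
    return proc.returncode == 0 and signer_fingerprint(proc.stderr) == wanted
```

Note that the fingerprint reported first on the VALIDSIG line is that of the key that made the signature, which may be a subkey rather than the primary key.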

I guess with this and conex we can verify pins (so something pinned to a git repository) and releases at the same time, which sounds great.

But another thing is, for conex and the threat models, it is crucial that the metadata (the opam file) itself is signed - since there's the set of dependencies (so nobody injects you some other dependency).... I guess with this proposal here only the source code would be signed (which is already better than nothing, but maybe not enough).

Correct, it's only the source code that is signed.
I guess that if we add code to check commit signatures, we might share some of it with some mechanism to check signatures on the opam-repository commits too. I don't know that it's something we'd want to do.

I suggest to read through https://theupdateframework.github.io/specification/latest/#goals-to-protect-against-specific-attacks to get an idea what are attack vectors (by no means this says we shouldn't do anything that doesn't cover all of them).

Thanks, I'll have a look.

@hannesm
Member

hannesm commented Oct 20, 2025

The advantage is that it makes it easy to know that both packages were published by the same author. (Assuming the author's key has not been stolen etc.) Two packages with the same "signed-by" field can only validly point to commits that have been signed by the same person (once again, assuming…).

A nice future work is for opam to warn users when they update from a version to another and the signing key is different.

This I guess is one of the really difficult bits and pieces: the public key infrastructure you want to establish. For your personal project this may be feasible (assuming you never lose your key) -- but for some organization publishing releases (or yourself losing your key), key rollover (so another key signs the next release) is crucial. It may be feasible with your proposed signed-by field to reference sigstore (https://www.sigstore.dev/)?

If every key rollover triggers a warning in opam, I suspect you'll end up with lots of false positives and users/clients will ignore the warning -- so a compromise won't be detected.

Also, it sounds computationally heavy that for each update you'll need to check all previous releases of that package to check whether the key id is the same.

And the other issue is that the metadata (the opam file itself) is not signed -- i.e. an attacker with access to the repository can introduce arbitrary dependencies.

Labels

AREA: DESIGN PR: WIP Not for merge at this stage


3 participants