Add encryptionScheme parameter to htsget by brainstorm · Pull Request #808 · samtools/hts-specs

brainstorm · 2025-02-03T09:26:53Z

This is a non-breaking, optional change for htsget.

This is already supported as an experimental feature in htsget-rs since umccr/htsget-rs#298. It specifically allows for Crypt4GH requests for each underlying bioinformatics format.

There is also corresponding work in progress in the htsget-compliance test suite to support this extra parameter and its schemes in ga4gh/htsget-compliance#11.

/cc @ohofmann @mmalenic @mlin @jppay

daviesrob · 2025-02-03T10:14:58Z

How exactly are you implementing crypt4gh in htsget-rs?

There are basically two ways you could do it: either encrypt each segment linked in the htsget response separately, or have them all join together to make a single encrypted file. HTSlib has historically supported the latter, and also looks for the magic number so it can tell an encrypted payload apart from a plain one without having to make any changes to the htsget specification.
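The magic-number detection mentioned above can be sketched as follows. This is only an illustration, not HTSlib's code: it relies on the Crypt4GH container starting with the 8 ASCII bytes `crypt4gh`, and the function names `looks_like_crypt4gh` and `classify_payload` are hypothetical.

```python
# Minimal sketch: tell a Crypt4GH payload apart from a plain one by its
# leading bytes. Function names are hypothetical, not HTSlib API.

CRYPT4GH_MAGIC = b"crypt4gh"  # first 8 bytes of a Crypt4GH header


def looks_like_crypt4gh(payload: bytes) -> bool:
    """Return True if the payload begins with the Crypt4GH magic number."""
    return payload[: len(CRYPT4GH_MAGIC)] == CRYPT4GH_MAGIC


def classify_payload(payload: bytes) -> str:
    """Label a joined htsget response as 'crypt4gh' or 'plain'."""
    return "crypt4gh" if looks_like_crypt4gh(payload) else "plain"


if __name__ == "__main__":
    # A made-up header prefix: magic followed by a little-endian version field.
    print(classify_payload(b"crypt4gh\x01\x00\x00\x00"))
    # The gzip/BGZF magic bytes that start a BAM or BCF stream.
    print(classify_payload(b"\x1f\x8b\x08\x04"))
```

Because the check only needs the first 8 bytes of the concatenated response, a client can decide how to decode the stream before reading the rest of it.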

mmalenic · 2025-02-04T02:14:58Z

> How exactly are you implementing crypt4gh in htsget-rs?

It's implemented by joining segments together into a single encrypted file. It calculates the bytes to return to the client, and then re-encrypts the Crypt4GH header and adds an edit list packet using a configured public/private key pair.

> and also looks for the magic number so it can tell an encrypted payload apart from a plain one without having to make any changes to the htsget specification.

This works; however, the encryptionScheme parameter is non-breaking and optional, and I think it would be clearer and more practical for the client.

The advantage of adding an additional parameter is that it allows the client to directly request encrypted or non-encrypted files without having to inspect the response header type. I think this lines up with the logic of specifying CRAM/BAM or BCF/VCF in the format parameter. It also allows the client to request encrypted or non-encrypted data for the same <id>.

Similar discussion to #581, where clients can give hints on the data they are expecting to make their logic easier.
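As a rough illustration of the request shape being argued for, the sketch below builds an htsget ticket URL that carries both `format` and the proposed `encryptionScheme` query parameter. The host `htsget.example.org`, the id `sample1`, and the helper `build_htsget_url` are assumptions made up for the example; `c4gh` is the value as written in this PR at this point in the thread.

```python
# Minimal sketch of a client building an htsget ticket request that asks
# for an encrypted edition. Host, id and helper name are hypothetical.
from typing import Optional
from urllib.parse import quote, urlencode


def build_htsget_url(
    base_url: str,
    endpoint: str,  # "reads" or "variants"
    object_id: str,
    fmt: Optional[str] = None,
    encryption_scheme: Optional[str] = None,
) -> str:
    """Return the ticket URL, adding only the parameters that were given."""
    params = {}
    if fmt is not None:
        params["format"] = fmt
    if encryption_scheme is not None:
        # Proposed, optional parameter from this PR.
        params["encryptionScheme"] = encryption_scheme
    url = f"{base_url.rstrip('/')}/{endpoint}/{quote(object_id)}"
    if params:
        url += "?" + urlencode(params)
    return url


if __name__ == "__main__":
    print(build_htsget_url("https://htsget.example.org", "reads", "sample1",
                           fmt="BAM", encryption_scheme="c4gh"))
    # Same <id>, no encryptionScheme: the request is unchanged from today.
    print(build_htsget_url("https://htsget.example.org", "reads", "sample1",
                           fmt="BAM"))
```

Omitting the parameter leaves the URL exactly as it is today, which is what makes the change non-breaking for existing clients.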

brainstorm · 2025-02-10T04:22:46Z

@daviesrob If needed, shall we discuss this on the 4-weekly htsget APAC call?

GA4GH htsget videoconference taking place at an Asia-Pacific friendly time of 10pm BST / 2pm PDT / 5pm EDT and next day 7am Melbourne AEST.

jmarshall · 2025-02-12T00:01:44Z

I agree with @daviesrob that detecting what you got back is and should be the primary way of selecting how to decode the response.

However it does seem like there would be some benefit in enabling the client to say it wants encryption and having a way for the client and server to negotiate that. Somewhat unclear that that should be in the initial request URL rather than something negotiated via the ticket… 🤔

jmarshall · 2025-02-12T00:03:44Z

> If encryptionScheme is specified a 4 letter code […] When c4gh is specified […]

Where do these 4-character codes come from? Are they in common with some other protocol, or have they been invented for this PR? Are letters really in such short supply that this has to be specifically 4 characters?

jmarshall · 2025-02-12T00:05:23Z

htsget.md

+
+If `encryptionScheme` is specified a 4 letter code for the encryption standard MUST be returned. When `c4gh` is specified, Crypt4GH payload for a particular object will be returned (if available).
+
+The server SHOULD reply with a `NotFound` error if the requested reference does not exist.


This doesn't make sense and appears to have been copy/pasted from the referenceName section. Do you mean if the server does not recognise a requested encryption code?

Good catch, I meant (encrypted) object Id, fixed in 4e6fa1a

If this is about the resource's <id>, i.e., what forms the bulk of the path part of the URL, then it should be in the URL parameters section (in fact, it seems this has always been missing there!). That would be a separate PR.

If this is about the 4-letter code designating the encryption method, then it should be here and use the same code terminology.


brainstorm · 2025-02-12T00:45:12Z

> If encryptionScheme is specified a 4 letter code […] When c4gh is specified […]
>
> Where do these 4-character codes come from? Are they in common with some other protocol, or have they been invented for this PR? Are letters really in such short supply that this has to be specifically 4 characters?

I was hesitant to go for free-form string and I borrowed the idea of restricting format names to (at most) a 4 letter enum: BAM, CRAM, VCF, BCF.

Admittedly, a lot of encryption schemes might require more than 4 letters to declare. I wish there were a standard (IETF? W3C? OpenSSL source code?) that enumerates the most common encryption scheme strings. Anyway, very good point; what would you suggest, @jmarshall?

brainstorm · 2025-02-12T00:48:35Z

@jmarshall:

> (...) is and should be the primary way of selecting how to decode the response.

Why?

jmarshall · 2025-11-18T21:17:26Z

Discussion from the September meeting:

Let's spell out encryptionScheme=crypt4gh rather than abbreviating as c4gh.
Further consider this proposal vs requesting via Accept headers (probably this proposal is more specific and bespoke so appropriate in this case) and what interplay there is with adding fields in the ticket to represent what encryption is available.
Consider also adding an encryptionScheme=none|unencrypted|plaintext (pick one!) value for the client to request specifically unencrypted data.
What should the server return when encryptionScheme=crypt4gh is requested but the server only has unencrypted data available? Perhaps a plain NotFound indistinguishable from when it has no data at all for that sample/etc; or a new NotAvailableEncrypted error; or it just returns the plaintext version despite the client's request (almost certainly not).
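To make the open questions above concrete, here is a minimal sketch of one possible server-side policy. It is only an assumption-laden illustration, not agreed behaviour: it picks the plain `NotFound` option, uses `none` as a placeholder spelling for the unencrypted value (the meeting has not picked one), and assumes a server that prefers the encrypted copy when the client states no preference. The names `select_edition` and `NotFound` are hypothetical.

```python
# Minimal sketch of one candidate policy for choosing which edition to serve.
# Every name and policy choice here is an assumption for illustration.
from typing import Optional, Set

CRYPT4GH = "crypt4gh"
PLAIN = "none"  # placeholder spelling; none/unencrypted/plaintext is undecided


class NotFound(Exception):
    """Stands in for the htsget NotFound error response."""


def select_edition(requested: Optional[str], available: Set[str]) -> str:
    """Pick the edition to serve, or raise NotFound.

    requested: value of encryptionScheme, or None if the client omitted it.
    available: editions the server holds for this id.
    """
    if not available:
        raise NotFound("no data for this id")
    if requested is None:
        # No stated preference: assumed policy of preferring encrypted data.
        return CRYPT4GH if CRYPT4GH in available else PLAIN
    if requested in available:
        return requested
    # Requested edition missing: indistinguishable from "no data at all".
    # Never fall back to plaintext when encryption was explicitly requested.
    raise NotFound("requested edition is not available")


if __name__ == "__main__":
    print(select_edition("crypt4gh", {"crypt4gh", "none"}))
    print(select_edition(None, {"none"}))
```

Swapping the final `raise` for a dedicated error type would model the `NotAvailableEncrypted` alternative without changing the rest of the logic.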

We also clarified the motivation for this functionality:

Clients may want to say “I definitely want the transport used for data coming to me to be crypt4gh”.
Provides a way to select which edition to provide if a server has both plain and encrypted copies of the same data.

jmarshall · 2025-11-18T22:13:44Z

Often it'll be the server that's motivated to not send unencrypted data, and it can already just return an error instead. But still some utility in this explicitness.

Add encryptionScheme parameter to htsget

30c52eb

jmarshall reviewed Feb 12, 2025


jmarshall added the htsget label Feb 12, 2025

Address samtools#808 (comment)

4e6fa1a

Address samtools#808 (comment)

016f595

brainstorm mentioned this pull request Mar 17, 2025

feat: add in-depth demo for htsget-rs umccr-svc/site#8

Merged

brainstorm wants to merge 3 commits into samtools:master from umccr:htsget_crypt4gh


4 participants

