Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR adds TLS/SSL cipher suite and protocol version information to WARC records by implementing the WARC-Cipher-Suite and WARC-Protocol headers as defined in the WARC specifications proposal.
- Extracts cipher suite and protocol version from TLS connections
- Maps TLS version constants to standardized protocol strings
- Adds conditional WARC headers when TLS information is available
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
@NGTmeaty any chance you can resolve the conflict and look at Copilot's review? Thanks! |
|
i beg you to merge this |
|
surely this is ready now.... 😄 |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } else if !strings.HasPrefix(protocol, "tls/") { | ||
| t.Errorf("WARC-Protocol should start with 'tls/' for HTTPS, got: %s", protocol) |
There was a problem hiding this comment.
If the WARC-Protocol value is http/1.1 (which is valid per iipc/warc-specifications#42), then the test would incorrectly fail according to the spec, even if the point of this test is to look for a specific WARC-Protocol header
Given that we define type Header as a string to string map, we are not equipped to handle WARC-Protocol headers because the draft spec we're implementing supports multiple instances of the header, and attempting to write multiple headers right now would repeatedly overwrite the last header due to the map's unique key requirement. I don't think this limitation is the end of the world, but it should probably be noted in our readme? We have the same problem for WARC-Concurrent-To, and I'm not sure what our WARC verify command would report for a WARC with repeated headers
| switch state.Version { | ||
| case tls.VersionSSL30: | ||
| selectedProtocol = "ssl/3" | ||
| case tls.VersionTLS10: | ||
| selectedProtocol = "tls/1.0" | ||
| case tls.VersionTLS11: | ||
| selectedProtocol = "tls/1.1" | ||
| case tls.VersionTLS12: | ||
| selectedProtocol = "tls/1.2" | ||
| case tls.VersionTLS13: | ||
| selectedProtocol = "tls/1.3" | ||
| } |
There was a problem hiding this comment.
In addition to writing a WARC-Protocol value for the encryption used, it could be nice to add support for which HTTP verison was archived to have feature parity with wget-lua:
- http/0.9
- http/1.0
- http/1.1
Not a request for right now, but noted all the same
Fixes #40