Skip to content

Add WARC-Cipher-Suite and WARC-Protocol#50

Open
NGTmeaty wants to merge 5 commits intomasterfrom
warc-http-headers
Open

Add WARC-Cipher-Suite and WARC-Protocol#50
NGTmeaty wants to merge 5 commits intomasterfrom
warc-http-headers

Conversation

@NGTmeaty
Copy link
Copy Markdown
Collaborator

Fixes #40

@NGTmeaty NGTmeaty marked this pull request as ready for review September 27, 2024 04:58
@NGTmeaty NGTmeaty marked this pull request as draft September 27, 2024 10:19
@CorentinB CorentinB requested a review from Copilot August 22, 2025 23:16
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds TLS/SSL cipher suite and protocol version information to WARC records by implementing the WARC-Cipher-Suite and WARC-Protocol headers as defined in the WARC specifications proposal.

  • Extracts cipher suite and protocol version from TLS connections
  • Maps TLS version constants to standardized protocol strings
  • Adds conditional WARC headers when TLS information is available

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment thread dialer.go Outdated
Comment thread dialer.go Outdated
@CorentinB CorentinB marked this pull request as ready for review August 23, 2025 09:21
@CorentinB
Copy link
Copy Markdown
Collaborator

@NGTmeaty any chance you can resolve the conflict and look at Copilot's review? Thanks!

@CorentinB
Copy link
Copy Markdown
Collaborator

i beg you to merge this

@NGTmeaty
Copy link
Copy Markdown
Collaborator Author

surely this is ready now.... 😄

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread client_test.go Outdated
Comment thread read_test.go
Comment on lines +328 to +329
} else if !strings.HasPrefix(protocol, "tls/") {
t.Errorf("WARC-Protocol should start with 'tls/' for HTTPS, got: %s", protocol)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the WARC-Protocol value is http/1.1 (which is valid per iipc/warc-specifications#42), then the test would incorrectly fail according to the spec, even if the point of this test is to look for a specific WARC-Protocol header

Given that we define type Header as a string to string map, we are not equipped to handle WARC-Protocol headers because the draft spec we're implementing supports multiple instances of the header, and attempting to write multiple headers right now would repeatedly overwrite the last header due to the map's unique key requirement. I don't think this limitation is the end of the world, but it should probably be noted in our readme? We have the same problem for WARC-Concurrent-To, and I'm not sure what our WARC verify command would report for a WARC with repeated headers

Comment thread dialer.go
Comment on lines +507 to +518
switch state.Version {
case tls.VersionSSL30:
selectedProtocol = "ssl/3"
case tls.VersionTLS10:
selectedProtocol = "tls/1.0"
case tls.VersionTLS11:
selectedProtocol = "tls/1.1"
case tls.VersionTLS12:
selectedProtocol = "tls/1.2"
case tls.VersionTLS13:
selectedProtocol = "tls/1.3"
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to writing a WARC-Protocol value for the encryption used, it could be nice to add support for which HTTP verison was archived to have feature parity with wget-lua:

- http/0.9
- http/1.0
- http/1.1

Not a request for right now, but noted all the same

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add WARC-Cipher-Suite and WARC-Protocol WARC headers

4 participants