-
Notifications
You must be signed in to change notification settings - Fork 22
Open
Description
Where there are multiple Set-Cookie headers in a server response from a WARC record the value of httpHeaders.Set-Cookie is always the last one in the list. This should be returned as an array of the Set-Cookie headers if that change doesnt break other things or there should be another method to get all of the cookies from the headers block. Another option would be to keep the line endings (\n) for the response so it is still a string but you can split it if you want.
Example WARC record (minus the content block):
WARC/1.0
WARC-Type: request
WARC-Date: 2019-06-15T21:54:45Z
WARC-Record-ID: <urn:uuid:1e7aaba9-c5b9-49cd-b0a8-6a4d7460c9b3>
Content-Length: 296
Content-Type: application/http; msgtype=request
WARC-Warcinfo-ID: <urn:uuid:07d8abda-2416-492c-b139-8fb526d5f792>
WARC-IP-Address: 95.216.246.36
WARC-Target-URI: https://www.bpazar.com/index.php?route=product/search&search=Sarj&page=4
GET /index.php?route=product/search&search=Sarj&page=4 HTTP/1.1
User-Agent: CCBot/2.0 (https://commoncrawl.org/faq/)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Host: www.bpazar.com
Connection: Keep-Alive
Accept-Encoding: gzip
WARC/1.0
WARC-Type: response
WARC-Date: 2019-06-15T21:54:45Z
WARC-Record-ID: <urn:uuid:3f3d6e43-9e5d-42ba-a111-43fcd90dd633>
Content-Length: 1043231
Content-Type: application/http; msgtype=response
WARC-Warcinfo-ID: <urn:uuid:07d8abda-2416-492c-b139-8fb526d5f792>
WARC-Concurrent-To: <urn:uuid:1e7aaba9-c5b9-49cd-b0a8-6a4d7460c9b3>
WARC-IP-Address: 95.216.246.36
WARC-Target-URI: https://www.bpazar.com/index.php?route=product/search&search=Sarj&page=4
WARC-Payload-Digest: sha1:N2WQFAUYKKXT6MRWSCXCQC7FOZRQCLTI
WARC-Block-Digest: sha1:S3FKWWFJ7LCYFOHUZ4RBPFAMYNQSVQMH
WARC-Identified-Payload-Type: text/html
HTTP/1.1 200 OK
Server: nginx
Date: Sat, 15 Jun 2019 21:54:44 GMT
Content-Type: text/html; charset=UTF-8
X-Crawler-Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
Set-Cookie: OCSESSID=d4163e3479bec29a507792acc4; path=/
Set-Cookie: OCSESSID=57bfbd42e2fe9d4d5af66485f7; path=/
Set-Cookie: language=tr-tr; expires=Mon, 15-Jul-2019 21:54:40 GMT; Max-Age=2592000; path=/; domain=tr-tr
Set-Cookie: currency=TRY; expires=Mon, 15-Jul-2019 21:54:40 GMT; Max-Age=2592000; path=/; domain=www.bpazar.com
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
X-Nginx-Cache-Status: BYPASS
X-Server-Powered-By: Engintron
X-Crawler-Content-Encoding: gzip
Response from console.log(record.httpHeaders); when used in the record callback:
{ Server: 'nginx',
Date: 'Sat, 15 Jun 2019 21:54:44 GMT',
'Content-Type': 'text/html; charset=UTF-8',
'X-Crawler-Transfer-Encoding': 'chunked',
Connection: 'keep-alive',
Vary: 'Accept-Encoding',
'Set-Cookie':
'currency=TRY; expires=Mon, 15-Jul-2019 21:54:40 GMT; Max-Age=2592000; path=/; domain=www.bpazar.com',
'X-XSS-Protection': '1; mode=block',
'X-Content-Type-Options': 'nosniff',
'X-Nginx-Cache-Status': 'BYPASS',
'X-Server-Powered-By': 'Engintron',
'X-Crawler-Content-Encoding': 'gzip' }
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels