When the filtering tool is run on ARC files, the output is a WARC file. For ARC records with just one HTTP header line, however, the conversion fails to produce a valid WARC record: the HTTP header is missing, and the record has only the new WARC header.
This is the start of an ARC record with only a single HTTP header line:
http://wolfgrass.myetang.com:80/ 202.96.96.20 20010926085547 no-type 5913
<SCRIPT LANGUAGE="JavaScript" SRC="/-fs0/sys/pop-up.js"></SCRIPT>
<html>
And this is the start of the record in the converted WARC file (no HTTP header):
WARC/1.1
Content-Length: 37069
Content-Type: application/http;msgtype=response
WARC-Date: 2001-09-28T13:38:05Z
WARC-IP-Address: 202.96.96.20
WARC-Target-URI: http://bigmouthnet.myetang.com:80/
WARC-Type: response
<SCRIPT LANGUAGE="JavaScript" SRC="/-fs0/sys/pop-up.js"></SCRIPT>
<html>
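To find other affected records in a converted file, one option is to check whether each response payload begins with an HTTP status line. The following is only a rough sketch and not part of the filtering tool: the function names `split_warc_record` and `payload_has_http_header` are hypothetical, and it assumes a single uncompressed record is already available as bytes.

```python
import re

# Hypothetical helper pattern: an HTTP status line such as "HTTP/1.1 200 OK".
_STATUS_LINE = re.compile(rb"HTTP/\d(?:\.\d)? \d{3}\b")


def split_warc_record(record: bytes):
    """Split one uncompressed WARC record into (WARC header block, payload).

    The split happens at the first blank line, accepting CRLF or bare LF.
    """
    hits = []
    for sep in (b"\r\n\r\n", b"\n\n"):
        pos = record.find(sep)
        if pos != -1:
            hits.append((pos, sep))
    if not hits:
        # No blank line found: treat everything as header, empty payload.
        return record, b""
    pos, sep = min(hits)
    return record[:pos], record[pos + len(sep):]


def payload_has_http_header(payload: bytes) -> bool:
    """Return True if the payload starts with an HTTP status line."""
    return _STATUS_LINE.match(payload) is not None
```

A record like the converted one above, whose payload starts directly with `<SCRIPT ...>`, would be reported as lacking an HTTP header, while a payload starting with `HTTP/1.1 200 OK` would pass.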
I can give you an ARC file so you can reproduce it (/netarkivet/042g/fildir/DENMARK-EXTRACTED-2001-part-00001117.arc.gz).