Releases: iipc/jwarc

v0.28.0: Release 0.28.0

27 Jul 06:45
@ato

New features:

  • Added fetch options to WarcWriter.fetch and fetch tool: maxTime, maxLength, readTimeout, userAgent
  • Added fetch tool option --output-file
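The maxLength and maxTime options bound how much of a response gets captured. As a rough illustration only, here is a hypothetical stream wrapper that reports end-of-stream once either budget is spent, which is one way a fetched body ends up truncated. The class name, its constructor and the millisecond unit of maxTimeMillis are assumptions of this sketch, not jwarc's FetchOptions API.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not jwarc's implementation): stops returning data once
 * maxLength bytes have been read or maxTimeMillis has elapsed since construction.
 */
final class LimitedInputStream extends FilterInputStream {
    private final long maxLength;                     // byte budget for the body
    private final long maxNanos;                      // time budget, converted from milliseconds
    private final long startNanos = System.nanoTime();
    private long count;                               // bytes returned to the caller so far

    LimitedInputStream(InputStream in, long maxLength, long maxTimeMillis) {
        super(in);
        this.maxLength = maxLength;
        // toNanos saturates at Long.MAX_VALUE instead of overflowing.
        this.maxNanos = TimeUnit.MILLISECONDS.toNanos(maxTimeMillis);
    }

    /** True once either budget is exhausted; further reads then report end of stream. */
    private boolean limitReached() {
        return count >= maxLength || System.nanoTime() - startNanos >= maxNanos;
    }

    @Override
    public int read() throws IOException {
        if (limitReached()) return -1;
        int b = super.read();
        if (b >= 0) count++;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (limitReached()) return -1;
        // Never ask the underlying stream for more than the remaining byte budget.
        int n = super.read(buf, off, (int) Math.min(len, maxLength - count));
        if (n > 0) count += n;
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        if (n <= 0 || limitReached()) return 0;
        long skipped = super.skip(Math.min(n, maxLength - count));
        count += skipped;
        return skipped;
    }

    // mark/reset would let a caller re-read data without it being counted.
    @Override public boolean markSupported() { return false; }
    @Override public void mark(int readlimit) { }
    @Override public void reset() throws IOException { throw new IOException("mark/reset not supported"); }

    public static void main(String[] args) throws IOException {
        byte[] body = new byte[10_000];
        try (InputStream in = new LimitedInputStream(new ByteArrayInputStream(body), 4096, 30_000)) {
            byte[] buf = new byte[1024];
            long total = 0;
            for (int n; (n = in.read(buf)) != -1; ) total += n;
            System.out.println("read " + total + " of " + body.length + " bytes"); // prints "read 4096 of 10000 bytes"
        }
    }
}
```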

Bugs fixed:

  • Fixed missing response.http().body().size() value when response is truncated and WarcReader.calculateBlockDigest() is enabled

v0.27.1: Release 0.27.1

26 Jul 07:56
@ato

Bugs fixed:

  • Lenient HTTP parser now accepts folded header lines that use LF instead of CRLF
  • Fixed bug where bogus ARC MIME field could be prepended to the length field
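Obsolete line folding continues a header field on the next line when that line starts with a space or tab. A hypothetical lenient reader (not jwarc's parser; the class and method names are made up for illustration) that accepts either CRLF or bare LF line endings and joins folded lines with a single space, as RFC 7230 section 3.2.4 asks of user agents:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not jwarc's parser): lenient header-block parsing that
 * accepts CRLF or bare LF line endings and unfolds obsolete line folding.
 */
final class HeaderUnfolder {
    /** Returns {name, value} pairs in order; stops at the first empty line. */
    static List<String[]> parse(String block) {
        List<String[]> fields = new ArrayList<>();
        // Split on CRLF or a bare LF.
        for (String line : block.split("\r?\n", -1)) {
            if (line.isEmpty()) break; // an empty line ends the header block
            char first = line.charAt(0);
            if ((first == ' ' || first == '\t') && !fields.isEmpty()) {
                // Continuation line: append it to the previous field's value,
                // replacing the fold with a single space.
                String[] last = fields.get(fields.size() - 1);
                String cont = line.trim();
                if (!cont.isEmpty()) last[1] = last[1].isEmpty() ? cont : last[1] + " " + cont;
            } else {
                int colon = line.indexOf(':');
                if (colon < 0) continue; // lenient: ignore lines that are not fields
                fields.add(new String[] {line.substring(0, colon), line.substring(colon + 1).trim()});
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        for (String[] f : parse("Content-Type: text/html;\n charset=utf-8\r\nServer: x\n\n")) {
            System.out.println(f[0] + " = " + f[1]);
        }
    }
}
```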

v0.27.0: Release 0.27.0

26 Jul 06:11
@ato

New features:

  • Added a HttpRequest.Builder(method, uri) constructor that populates the Host header.
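RFC 7230 (section 5.4) requires the Host value to be identical to the target URI's authority, minus any userinfo and its "@". A minimal sketch of deriving it from a java.net.URI; the helper class is hypothetical and is not how jwarc's HttpRequest.Builder is implemented:

```java
import java.net.URI;

/** Hypothetical helper (not jwarc's API): derives a Host header value from a URI. */
final class HostHeaderSketch {
    static String hostValue(URI uri) {
        // Raw authority keeps the exact host text, including any explicit port.
        String authority = uri.getRawAuthority();
        if (authority == null) throw new IllegalArgumentException("URI has no authority: " + uri);
        int at = authority.lastIndexOf('@');
        // Drop "user:pass@" if present.
        return at >= 0 ? authority.substring(at + 1) : authority;
    }

    public static void main(String[] args) {
        System.out.println(hostValue(URI.create("http://user@example.com:8080/"))); // prints "example.com:8080"
    }
}
```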

Bugs fixed:

  • WarcWriter.fetch(uri) was omitting the query string
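In origin-form, the request-target is the absolute path followed by "?" and the query when one is present, with "/" substituted for an empty path (RFC 7230 section 5.3.1). java.net.URI.getRawPath() does not include the query, so it has to be appended separately. A hypothetical helper, not jwarc's code:

```java
import java.net.URI;

/** Hypothetical helper (not jwarc's API): builds an origin-form request-target. */
final class RequestTargetSketch {
    static String requestTarget(URI uri) {
        // Raw accessors preserve percent-encoding exactly as given.
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) path = "/";
        String query = uri.getRawQuery();
        return query == null ? path : path + "?" + query;
    }

    public static void main(String[] args) {
        System.out.println(requestTarget(URI.create("http://example.com/search?q=a%20b"))); // prints "/search?q=a%20b"
    }
}
```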

Changes:

  • ARC parser now accepts garbage in the MIME field
  • HTTP parser in lenient mode now accepts messages without a minor version number (e.g. "HTTP/2") #70
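A lenient status-line parser can accept the "HTTP/2" form from the note above by making the ".minor" part optional. The regex-based sketch below is hypothetical and is not jwarc's parser. Treating a missing minor version as 0 is a choice made here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not jwarc's parser): accepts "HTTP/2 200" as well as
 * "HTTP/1.1 200 OK" by making the minor version optional.
 */
final class LenientStatusLine {
    // "HTTP/" major ["." minor] SP 3-digit status [SP reason]
    private static final Pattern STATUS_LINE =
            Pattern.compile("HTTP/(\\d+)(?:\\.(\\d+))? (\\d{3})(?: (.*))?");

    final int major;
    final int minor;   // 0 when the line omits it, e.g. "HTTP/2"
    final int status;
    final String reason;

    private LenientStatusLine(int major, int minor, int status, String reason) {
        this.major = major;
        this.minor = minor;
        this.status = status;
        this.reason = reason;
    }

    static LenientStatusLine parse(String line) {
        Matcher m = STATUS_LINE.matcher(line);
        if (!m.matches()) throw new IllegalArgumentException("not a status line: " + line);
        int minor = m.group(2) == null ? 0 : Integer.parseInt(m.group(2));
        String reason = m.group(4) == null ? "" : m.group(4);
        return new LenientStatusLine(Integer.parseInt(m.group(1)), minor,
                Integer.parseInt(m.group(3)), reason);
    }
}
```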

v0.26.0: Release 0.26.0

19 Jul 06:11
@ato

New features:

  • CDX tool gained a --warc-full-path option to emit absolute paths in the filename field #76 (Thomas Egense)
  • Added a CdxWriter class that provides a programmatic interface to the CDX indexing tool

v0.25.0: Release 0.25.0

14 Jul 04:10
@ato

New features:

  • CDX tool: New option -r or --revisits-included to include revisit records #75 (Thomas Egense)

v0.24.1: Release 0.24.1

10 Jul 08:01
@ato

Changes:

  • Removed optional dependency on jackson-core. This was only used when processing JSON request bodies with the --post-append option of the CDX tool. jwarc now includes a small JSON tokenizer instead.
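For a sense of scale, a tokenizer that splits JSON text into punctuation, string, number and literal tokens fits in well under a hundred lines of Java. The sketch below is hypothetical and is not the tokenizer jwarc ships; all names are invented here, and its number scanning is deliberately lenient rather than a full JSON validator.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not jwarc's tokenizer): splits JSON text into tokens. */
final class TinyJsonTokenizer {
    enum Type { BEGIN_OBJECT, END_OBJECT, BEGIN_ARRAY, END_ARRAY, COLON, COMMA, STRING, NUMBER, LITERAL }

    static final class Token {
        final Type type;
        final String text; // decoded string value, number text, literal name, or "" for punctuation
        Token(Type type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return text.isEmpty() ? type.toString() : type + "(" + text + ")"; }
    }

    static List<Token> tokenize(String s) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            switch (c) {
                case ' ': case '\t': case '\n': case '\r': i++; break;
                case '{': tokens.add(new Token(Type.BEGIN_OBJECT, "")); i++; break;
                case '}': tokens.add(new Token(Type.END_OBJECT, "")); i++; break;
                case '[': tokens.add(new Token(Type.BEGIN_ARRAY, "")); i++; break;
                case ']': tokens.add(new Token(Type.END_ARRAY, "")); i++; break;
                case ':': tokens.add(new Token(Type.COLON, "")); i++; break;
                case ',': tokens.add(new Token(Type.COMMA, "")); i++; break;
                case '"': i = readString(s, i + 1, tokens); break;
                default: {
                    int start = i;
                    if (c == '-' || (c >= '0' && c <= '9')) {
                        // Lenient number scan over characters that may appear in a JSON number.
                        while (i < s.length() && "+-0123456789.eE".indexOf(s.charAt(i)) >= 0) i++;
                        tokens.add(new Token(Type.NUMBER, s.substring(start, i)));
                    } else if (c >= 'a' && c <= 'z') {
                        while (i < s.length() && s.charAt(i) >= 'a' && s.charAt(i) <= 'z') i++;
                        String word = s.substring(start, i);
                        if (!word.equals("true") && !word.equals("false") && !word.equals("null")) {
                            throw new IllegalArgumentException("unexpected literal: " + word);
                        }
                        tokens.add(new Token(Type.LITERAL, word));
                    } else {
                        throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
                    }
                }
            }
        }
        return tokens;
    }

    /** Reads a string starting just after the opening quote; returns the index after the closing quote. */
    private static int readString(String s, int i, List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        while (true) {
            if (i >= s.length()) throw new IllegalArgumentException("unterminated string");
            char c = s.charAt(i++);
            if (c == '"') break;
            if (c != '\\') { sb.append(c); continue; }
            if (i >= s.length()) throw new IllegalArgumentException("unterminated escape");
            char e = s.charAt(i++);
            switch (e) {
                case '"': case '\\': case '/': sb.append(e); break;
                case 'b': sb.append('\b'); break;
                case 'f': sb.append('\f'); break;
                case 'n': sb.append('\n'); break;
                case 'r': sb.append('\r'); break;
                case 't': sb.append('\t'); break;
                case 'u': // four hex digits follow
                    if (i + 4 > s.length()) throw new IllegalArgumentException("truncated unicode escape");
                    sb.append((char) Integer.parseInt(s.substring(i, i + 4), 16));
                    i += 4;
                    break;
                default: throw new IllegalArgumentException("bad escape: " + e);
            }
        }
        tokens.add(new Token(Type.STRING, sb.toString()));
        return i;
    }
}
```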

v0.24.0: Release 0.24.0

28 Jun 05:25
@ato

New features:

  • CDX tool gained a --digest-unchanged option to output the raw value of the WARC-Payload-Digest field #74 (Thomas Egense)
  • MediaType and WarcDigest gained a .raw() method for accessing the original unparsed string
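Keeping the unparsed string next to the parsed fields lets a tool write a value back out exactly as it appeared in the record rather than in a normalized form, the kind of behavior the --digest-unchanged option describes. A hypothetical media-type class illustrating that pattern (not jwarc's MediaType; parameters are simply ignored here):

```java
import java.util.Locale;

/**
 * Hypothetical sketch (not jwarc's MediaType): a parsed value that also keeps
 * the exact input string, the idea behind a raw() accessor.
 */
final class MediaTypeSketch {
    private final String raw;     // original field value, exactly as read
    private final String type;    // normalized (lower-case) type
    private final String subtype; // normalized (lower-case) subtype

    private MediaTypeSketch(String raw, String type, String subtype) {
        this.raw = raw;
        this.type = type;
        this.subtype = subtype;
    }

    static MediaTypeSketch parse(String raw) {
        int semi = raw.indexOf(';');
        String essence = semi >= 0 ? raw.substring(0, semi) : raw; // parameters ignored in this sketch
        int slash = essence.indexOf('/');
        if (slash < 0) throw new IllegalArgumentException("missing '/': " + raw);
        return new MediaTypeSketch(raw,
                essence.substring(0, slash).trim().toLowerCase(Locale.ROOT),
                essence.substring(slash + 1).trim().toLowerCase(Locale.ROOT));
    }

    /** The original unparsed string. */
    String raw() { return raw; }

    /** Normalized form; may differ from raw(). */
    @Override
    public String toString() { return type + "/" + subtype; }
}
```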

v0.23.1: Release 0.23.1

20 Jun 06:02
@ato

Bugs fixed:

  • CdxRequestEncoder: Now correctly matches pywb's 4096-character truncation
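The note does not say what the earlier mismatch was. One general pitfall when mirroring a Python length limit in Java is that slicing a Python str counts Unicode code points, while String.substring counts UTF-16 chars, so the two disagree on characters outside the Basic Multilingual Plane. The hypothetical helper below assumes the 4096 limit counts code points. That is an assumption about pywb, not something these notes confirm.

```java
/**
 * Hypothetical sketch: caps a string at a number of Unicode code points (the
 * unit Python str slicing uses) rather than Java's UTF-16 chars.
 */
final class CodePointTruncate {
    static final int LIMIT = 4096; // figure from the 0.23.1 note; the unit is an assumption

    static String truncate(String s, int maxCodePoints) {
        if (s.codePointCount(0, s.length()) <= maxCodePoints) return s;
        // offsetByCodePoints gives the char index just past maxCodePoints code points,
        // so a surrogate pair is never split.
        return s.substring(0, s.offsetByCodePoints(0, maxCodePoints));
    }

    static String truncate(String s) { return truncate(s, LIMIT); }

    public static void main(String[] args) {
        String face = new String(Character.toChars(0x1F600)); // 1 code point, 2 chars
        String s = face + face + face;
        System.out.println(s.length() + " chars, " + truncate(s, 2).length() + " after truncation"); // prints "6 chars, 4 after truncation"
    }
}
```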

v0.23.0

19 Jun 09:55
@ato

New features:

  • Response parser for the Gemini protocol

Bugs fixed:

  • CdxRequestEncoder: Improved compatibility with pywb for JSON and text/plain request types
  • ListTool: Fixed exception when reading non-HTTP records
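A Gemini response begins with a single header line, <STATUS><SPACE><META><CR><LF>, where STATUS is two digits and META is at most 1024 bytes of UTF-8; the body follows. A hypothetical reader for that header (not jwarc's Gemini parser). Tolerating a bare LF and a missing META are choices of this sketch:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a Gemini response header: "<STATUS><SPACE><META><CR><LF>". */
final class GeminiHeader {
    // Two status digits + space + at most 1024 bytes of META + CR, before the LF.
    private static final int MAX_HEADER_BYTES = 2 + 1 + 1024 + 1;

    final int status;  // two-digit code; the first digit gives the category (2x = success)
    final String meta; // MIME type on success, otherwise a prompt, redirect URL or error text

    private GeminiHeader(int status, String meta) {
        this.status = status;
        this.meta = meta;
    }

    /** Parses a header line with the trailing line ending already removed. */
    static GeminiHeader parse(String line) {
        if (line.length() < 2 || !isDigit(line.charAt(0)) || !isDigit(line.charAt(1))) {
            throw new IllegalArgumentException("bad Gemini status: " + line);
        }
        String meta = "";
        if (line.length() > 2) {
            if (line.charAt(2) != ' ') throw new IllegalArgumentException("expected space after status: " + line);
            meta = line.substring(3);
        }
        return new GeminiHeader(Integer.parseInt(line.substring(0, 2)), meta);
    }

    /** Reads the header from in, leaving the stream positioned at the first body byte. */
    static GeminiHeader read(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        for (int b; (b = in.read()) != '\n'; ) {
            if (b == -1) throw new IOException("end of stream inside Gemini header");
            if (buf.size() >= MAX_HEADER_BYTES) throw new IOException("Gemini header too long");
            buf.write(b);
        }
        byte[] bytes = buf.toByteArray();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') len--; // strip CR; a bare LF is tolerated
        return parse(new String(bytes, 0, len, StandardCharsets.UTF_8));
    }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("20 text/gemini\r\n# Hello\n".getBytes(StandardCharsets.UTF_8));
        GeminiHeader h = read(in);
        System.out.println(h.status + " " + h.meta); // prints "20 text/gemini"
    }
}
```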

v0.22.0

11 May 07:30
@ato

New features:

  • WarcWriter.fetch() now returns a FetchResult object containing the request and response WARC and HTTP headers