v0.7.0

@ato released this 11 Mar 04:07

New features

  • jwarc now includes a simple filter language for selecting matching WARC records.
    • jwarc filter 'warc-type != "request"'
    • jwarc filter ':status == 200 && http:content-type =~ "image/.*"'
    • long errors = reader.records().filter(WarcFilter.compile(":status >= 400")).count();
  • Native binary builds of the jwarc CLI tool are now available for Linux and macOS. These are built with GraalVM and do not require Java to be installed. (The cross-platform .jar remains the recommended version.)
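
To make the filter examples above more concrete, here is a rough plain-Java sketch of what each expression appears to select. It does not use jwarc and is not how jwarc implements filters: the Rec class, the FilterSketch class and the sample data are hypothetical stand-ins, and it assumes that =~ must match the whole header value.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class FilterSketch {
    // Hypothetical stand-in for a WARC record; this is NOT a jwarc type.
    static final class Rec {
        final String warcType;     // value of the WARC-Type header
        final int status;          // HTTP status, 0 when the record has none
        final String contentType;  // HTTP Content-Type, null when absent

        Rec(String warcType, int status, String contentType) {
            this.warcType = warcType;
            this.status = status;
            this.contentType = contentType;
        }
    }

    // Roughly what 'warc-type != "request"' selects.
    static final Predicate<Rec> NOT_REQUEST = r -> !"request".equals(r.warcType);

    // Roughly what ':status == 200 && http:content-type =~ "image/.*"' selects,
    // assuming (unconfirmed) that =~ has to match the entire value.
    static final Pattern IMAGE = Pattern.compile("image/.*");
    static final Predicate<Rec> OK_IMAGE = r ->
            r.status == 200 && r.contentType != null && IMAGE.matcher(r.contentType).matches();

    // Roughly what ':status >= 400' selects.
    static final Predicate<Rec> ERROR = r -> r.status >= 400;

    // Counts the records accepted by a filter, mirroring the stream example above.
    static long count(List<Rec> records, Predicate<Rec> filter) {
        return records.stream().filter(filter).count();
    }

    // Made-up sample data for illustration only.
    static List<Rec> sample() {
        return Arrays.asList(
                new Rec("response", 200, "image/png"),
                new Rec("response", 404, "text/html"),
                new Rec("request", 0, null),
                new Rec("response", 500, "text/plain"));
    }

    public static void main(String[] args) {
        List<Rec> records = sample();
        System.out.println("not requests: " + count(records, NOT_REQUEST));
        System.out.println("200 images:   " + count(records, OK_IMAGE));
        System.out.println("errors:       " + count(records, ERROR));
    }
}
```

With jwarc itself, WarcFilter.compile(...) is passed straight to Stream.filter as in the one-liner above, so the compiled filter plays the role these hand-written predicates play here.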

Changed

  • Calling record.http() no longer invalidates record.body(), although care must still be taken.
  • Removed the HttpParser.Handler interface.