Releases: internetarchive/Zeno

v2.0.13

18 Jul 23:20
09aa1c4

What's Changed

  • Add arm64 builds on release with Zig by @NGTmeaty in #401
  • Move SetDocument(nil) to Item.Close by @vbanos in #393
  • chore: reduce LQ waiting time by @yzqzss in #394
  • Update ExtractAPIPostPermalinks to use models.URL instead of models.Item by @vbanos in #391
  • Refactor HTMLOutlinks to use models.URL instead of models.Item by @vbanos in #390
  • Refactor ExtractAssets by @vbanos in #399
  • Outlink extractor refactoring using interfaces by @vbanos in #400

Full Changelog: v2.0.12...v2.0.13

v2.0.12

17 Jul 01:12
b99f7d5

Full Changelog: v2.0.11...v2.0.12

v2.0.8

16 Jul 22:27
e94bf8b

What's Changed

  • Implement a shared interface for source so anyone can implement their own source of items by @equals215 in #377
  • Fix outlinks extraction and widen HTML assets extraction by @equals215 in #378
  • chore(deps): update github.com/ada-url/goada to v1.0.0 by @otkd in #379
  • Add support for embed tag in HTML extractor by @vbanos in #381
  • Fix: discard hooks does not interrupt downloads by @yzqzss in #383
  • Fix: BadStatusCode responses are not handled by copyWithTimeout() by @yzqzss in #385
  • Simplify WARC writer stats by @vbanos in #382
  • Unit test for extractAssets by @vbanos in #387
  • chore(deps): bump the go-modules group with 2 updates by @dependabot[bot] in #384
  • Separate reddit.com preprocessor / postprocessor by @vbanos in #386
  • Refactor request preprocessors by @vbanos in #389
  • Implement CheckDiskUsage() for Windows by @yzqzss in #376

New Contributors

  • @otkd made their first contribution in #379

Full Changelog: v2.0.7...v2.0.8

v2.0.7

09 Jul 21:08
5e496ff

What's Changed

Resolved a long-standing bug in the gowarc library where CDX timestamps were written as invalid values in WARC-Refers-To-Date headers.
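
For context, CDX indexes conventionally use 14-digit timestamps (YYYYMMDDhhmmss), while WARC date fields expect a UTC timestamp in the W3C profile of ISO 8601. The sketch below shows one way to convert between the two forms in Go. It is an illustration only: `cdxToWARCDate` is a hypothetical helper, not a gowarc function, and the actual fix in gowarc may look different.

```go
package main

import (
	"fmt"
	"time"
)

// cdxLayout is the 14-digit CDX timestamp form (YYYYMMDDhhmmss),
// written in Go's reference-time notation.
const cdxLayout = "20060102150405"

// cdxToWARCDate is a hypothetical helper (not part of gowarc). It parses a
// 14-digit CDX timestamp and returns it as an RFC 3339 UTC string, the form
// used by WARC date fields.
func cdxToWARCDate(cdx string) (string, error) {
	t, err := time.Parse(cdxLayout, cdx)
	if err != nil {
		return "", fmt.Errorf("invalid CDX timestamp %q: %w", cdx, err)
	}
	return t.UTC().Format(time.RFC3339), nil
}

func main() {
	date, err := cdxToWARCDate("20250709210800")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(date) // prints 2025-07-09T21:08:00Z
}
```

Returning an error for malformed input, rather than writing the raw value through, keeps an invalid date from reaching the record header.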

Full Changelog: v2.0.6...v2.0.7

v2.0.6

30 Jun 17:55
4231a33

What's Changed

  • Discard oversized Content-Length and oversized streaming responses during download by @yzqzss in #369

Full Changelog: v2.0.5...v2.0.6

v2.0.5

26 Jun 21:07
5031666

What's Changed

  • Update gowarc and add Doppelganger support by @NGTmeaty in #357
  • Extracting URLs from CSS @import rule by @yzqzss in #339
  • Cascadingly capture css @import urls and extracting urls from separate css item by @yzqzss in #345
  • Improve GitHub archiving by @yzqzss in #353
  • Exponential backoff for disk watcher by @vbanos in #331
  • Simplify URL logging by @vbanos in #333
  • Remove unused vars in rotate_file.go by @vbanos in #332
  • Move closeBody(seed *model.Item) to Item.Close() by @vbanos in #334
  • Use strings.ContainsAny instead of multiple strings.Contains by @vbanos in #336
  • chore(deps): bump the go-modules group with 2 updates by @dependabot in #335
  • Redundant disk util function by @vbanos in #341
  • Remove unused GetSHA1 utility function by @vbanos in #342
  • Remove unused DedupeURLs utility function by @vbanos in #343
  • chore(FieldedLogger): log with ordered prefix fields by @yzqzss in #340
  • strip hyphens from job name when updating globalPromStats by @willmhowes in #350
  • Improve URL mimetype / content-type handling by @vbanos in #344
  • fix: missing text/css mimetype by @yzqzss in #352
  • Cause we had to say bye ✌️ by @equals215 in #358
  • Main channel descriptions by @vbanos in #359
  • removed unused config parameters by @fosterlynch in #337
  • Simplify JSON isValidURL by @vbanos in #364
  • Remove redundant code in domainscrawl by @vbanos in #365
  • Remove redundant done var from Start() by @vbanos in #363
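
As an illustration of the strings.ContainsAny cleanup listed above (#336), the following is a minimal sketch of that kind of refactor. The function names and the character set are assumptions made for the example and are not taken from the Zeno code base.

```go
package main

import (
	"fmt"
	"strings"
)

// hasSpecialChained checks for each character with its own strings.Contains call.
// (Hypothetical example function; the characters are chosen for illustration.)
func hasSpecialChained(s string) bool {
	return strings.Contains(s, "?") || strings.Contains(s, "#") || strings.Contains(s, "&")
}

// hasSpecialAny expresses the same check in a single call: ContainsAny reports
// whether any code point from the second argument occurs in s.
func hasSpecialAny(s string) bool {
	return strings.ContainsAny(s, "?#&")
}

func main() {
	for _, s := range []string{"/path", "/path?x=1", "/path#frag"} {
		fmt.Println(s, hasSpecialChained(s), hasSpecialAny(s))
	}
}
```

The two forms are equivalent only when every substring being searched for is a single character. Multi-character substrings still need strings.Contains.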

Full Changelog: v2.0.4...v2.0.5

v2.0.4

05 Jun 02:32
adf43d4

What's Changed

  • add dependabot.yml by @7h3-3mp7y-m4n in #264
  • chore(deps): bump the go-modules group with 6 updates by @dependabot in #268
  • chore(deps): bump golang from 1.22.6-alpine3.20 to 1.24.2-alpine3.20 in the ci group by @dependabot in #267
  • chore(deps): bump wangyoucao577/go-release-action from 1.51 to 1.53 in the github-actions group by @dependabot in #266
  • chore(deps): bump github.com/gabriel-vasile/mimetype from 1.4.8 to 1.4.9 in the go-modules group by @dependabot in #269
  • Enhance base URL handling by @yzqzss in #272
  • chore(deps): bump github.com/refraction-networking/utls from 1.6.7 to 1.7.0 in the go_modules group by @dependabot in #271
  • update warc library package references by @willmhowes in #276
  • Make matchRegexExclusion testable and add unit tests by @vbanos in #282
  • feat: Extract WACZ files from replayweb embeds by @NGTmeaty in #279
  • chore(deps): bump golang from 1.24.2-alpine3.20 to 1.24.3-alpine3.20 in the ci group by @dependabot in #278
  • chore(deps): bump the go-modules group across 1 directory with 4 updates by @dependabot in #277
  • Increase mimetype detection buffer from 2k to 3k by @vbanos in #287
  • Optimise XML extractor document loading by @vbanos in #286
  • Simplify logWithLevel by @vbanos in #285
  • chore(deps): bump github.com/ncruces/go-sqlite3 from 0.25.1 to 0.25.2 in the go-modules group by @dependabot in #283
  • update Dockerfile for multi-stage build to improve build efficiency by @sk-pathak in #270
  • Define models.NewURL to simplify URL object creation by @vbanos in #289
  • Filter outlinks with unwanted protocols by @vbanos in #288
  • Improve HTML extractor and add unit tests by @vbanos in #284
  • Optimize the OSS extraction process and implement Azure Blob extractor by @yzqzss in #281
  • Regex improvements and add strict regex toggle by @NGTmeaty in #290
  • Make Azure test URLs insensitive to the order of the URL query parameters by @yzqzss in #292
  • Config code simplification by @vbanos in #291
  • Resolve panics by @NGTmeaty in #294
  • Refactor HTML unit tests and increase coverage by @vbanos in #299
  • Make terminal logging colorful 🌈 by @yzqzss in #295
  • Add test with corrupt PDF by @vbanos in #304
  • NewURL(): return URL with raw string even if parsing error by @yzqzss in #307
  • Fix the nil item.url.parsed panic html test by @yzqzss in #308
  • Fix HQ sleep time by @NGTmeaty in #300
  • Extract iframe SRC as outlinks by @NGTmeaty in #302
  • Support extracting URL from meta refresh HTML tag by @vbanos in #303
  • Extract url quoted in single-quotes from meta[content] by @yzqzss in #309
  • Increase ProcessBody memory buffer from 2 to 8MB by @vbanos in #306
  • Cleanup unused stats methods by @vbanos in #311
  • chore(deps): bump github.com/fatih/color from 1.16.0 to 1.18.0 in the go-modules group by @dependabot in #305
  • Remove useless fmt.Sprintf, use a plain "zeno" string by @vbanos in #315
  • ineffectual return statements by @vbanos in #316
  • Simplify models.NewItem by @vbanos in #318
  • Simplify item.GetStatus() logging by @vbanos in #319
  • fix: combinedArgs was not used in FieldedLogger by @yzqzss in #317
  • Simplify retrySleepTime logging by @vbanos in #320
  • Extract more types of outlinks by @vbanos in #327
  • Use goleak in all package tests by @vbanos in #330
  • chore(deps): bump alpine from 3.21 to 3.22 in the ci group by @dependabot in #326
  • chore(deps): bump the go-modules group with 2 updates by @dependabot in #325
  • Replacing the regex-based CSS extractor with a standard CSS parser by @yzqzss in #324

Full Changelog: v2.0.3...v2.0.4

v2.0.3

11 Apr 16:24
8fc5950

Full Changelog: v2.0.2...v2.0.3

v2.0.2

11 Apr 10:00
c45d49d

What's Changed

  • chore(deps): bump golang.org/x/net from 0.35.0 to 0.36.0 in the go_modules group across 1 directory by @dependabot in #247
  • Switch to DiscardHook by @yzqzss in #263

Full Changelog: v2.0.1...v2.0.2

v2.0.1

07 Apr 09:00
6d7cb95

Full Changelog: v2.0.0...v2.0.1