Releases: internetarchive/Zeno
v2.0.13
What's Changed
- Add arm64 builds on release with Zig by @NGTmeaty in #401
- Move SetDocument(nil) to Item.Close by @vbanos in #393
- chore: reduce LQ waiting time by @yzqzss in #394
- Update ExtractAPIPostPermalinks to use models.URL instead of models.Item by @vbanos in #391
- Refactor HTMLOutlinks to use models.URL instead of models.Item by @vbanos in #390
- Refactor ExtractAssets by @vbanos in #399
- Outlink extractor refactoring using interfaces by @vbanos in #400
Full Changelog: v2.0.12...v2.0.13
v2.0.12
Full Changelog: v2.0.11...v2.0.12
v2.0.8
What's Changed
- Implement a shared interface for `source` so anyone can implement their own source of items by @equals215 in #377
- Fix outlinks extraction and widen HTML assets extraction by @equals215 in #378
- chore(deps): update github.com/ada-url/goada to v1.0.0 by @otkd in #379
- Add support for embed tag in HTML extractor by @vbanos in #381
- Fix: discard hooks does not interrupt downloads by @yzqzss in #383
- Fix: `BadStatusCode` responses are not handled by `copyWithTimeout()` by @yzqzss in #385
- Simplify WARC writer stats by @vbanos in #382
- Unit test for extractAssets by @vbanos in #387
- chore(deps): bump the go-modules group with 2 updates by @dependabot[bot] in #384
- Separate reddit.com preprocessor / postprocessor by @vbanos in #386
- Refactor request preprocessors by @vbanos in #389
- Implement `CheckDiskUsage()` for Windows by @yzqzss in #376
Full Changelog: v2.0.7...v2.0.8
v2.0.7
What's Changed
Resolved a long-standing bug in the gowarc library that produced invalid CDX timestamps in WARC Refers-To-Date headers.
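For context on the two timestamp formats involved: CDX index timestamps are 14-digit strings (YYYYMMDDhhmmss), while WARC date fields use the UTC ISO 8601 form (YYYY-MM-DDThh:mm:ssZ). The sketch below shows one way to convert between them in Go. `cdxToWARCDate` is a hypothetical helper written for illustration, not a gowarc function, and the actual fix in gowarc may look different.

```go
package main

import (
	"fmt"
	"time"
)

// cdxLayout is the 14-digit timestamp layout used in CDX index lines.
const cdxLayout = "20060102150405"

// cdxToWARCDate (hypothetical helper) converts a CDX timestamp into the
// UTC ISO 8601 form used by WARC date fields.
func cdxToWARCDate(cdx string) (string, error) {
	t, err := time.Parse(cdxLayout, cdx)
	if err != nil {
		return "", fmt.Errorf("invalid CDX timestamp %q: %w", cdx, err)
	}
	return t.UTC().Format(time.RFC3339), nil
}

func main() {
	date, err := cdxToWARCDate("20250102030405")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(date)
}
```

Rejecting malformed input with an error, rather than passing it through, keeps an invalid value from ever reaching a header.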
Full Changelog: v2.0.6...v2.0.7
v2.0.6
v2.0.5
What's Changed
- Update gowarc and add Doppelganger support by @NGTmeaty in #357
- Extracting URLs from CSS `@import` rule by @yzqzss in #339
- Cascadingly capture css `@import` urls and extracting urls from separate css item by @yzqzss in #345
- Improve GitHub archiving by @yzqzss in #353
- Exponential backoff for disk watcher by @vbanos in #331
- Simplify URL logging by @vbanos in #333
- Remove unused vars in rotate_file.go by @vbanos in #332
- Move closeBody(seed *model.Item) to Item.Close() by @vbanos in #334
- Use strings.ContainsAny instead of multiple strings.Contains by @vbanos in #336
- chore(deps): bump the go-modules group with 2 updates by @dependabot in #335
- Redundant disk util function by @vbanos in #341
- Remove unused GetSHA1 utility function by @vbanos in #342
- Remove unused DedupeURLs utility function by @vbanos in #343
- chore(FieldedLogger): log with ordered prefix fields by @yzqzss in #340
- strip hyphens from job name when updating globalPromStats by @willmhowes in #350
- Improve URL mimetype / content-type handling by @vbanos in #344
- fix: missing text/css mimetype by @yzqzss in #352
- Cause we had to say bye ✌️ by @equals215 in #358
- Main channel descriptions by @vbanos in #359
- removed unused config parameters by @fosterlynch in #337
- Simplify JSON isValidURL by @vbanos in #364
- Remove redundant code in domainscrawl by @vbanos in #365
- Remove redundant done var from Start() by @vbanos in #363
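As an illustration of the `strings.ContainsAny` cleanup listed above (#336), the sketch below contrasts the two styles. The function names and the character set `?#&` are assumptions chosen for the example, not taken from Zeno's code.

```go
package main

import (
	"fmt"
	"strings"
)

// hasSpecialVerbose checks each character with its own strings.Contains call.
func hasSpecialVerbose(s string) bool {
	return strings.Contains(s, "?") || strings.Contains(s, "#") || strings.Contains(s, "&")
}

// hasSpecial performs the same check with a single strings.ContainsAny call.
func hasSpecial(s string) bool {
	return strings.ContainsAny(s, "?#&")
}

func main() {
	for _, s := range []string{"/path", "/path?x=1", "/path#frag"} {
		fmt.Println(s, hasSpecialVerbose(s), hasSpecial(s))
	}
}
```

The two forms are interchangeable only when every needle is a single character, because `strings.ContainsAny` treats its second argument as a set of characters rather than as a substring.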
New Contributors
- @fosterlynch made their first contribution in #337
Full Changelog: v2.0.4...v2.0.5
v2.0.4
What's Changed
- add dependabot.yml by @7h3-3mp7y-m4n in #264
- chore(deps): bump the go-modules group with 6 updates by @dependabot in #268
- chore(deps): bump golang from 1.22.6-alpine3.20 to 1.24.2-alpine3.20 in the ci group by @dependabot in #267
- chore(deps): bump wangyoucao577/go-release-action from 1.51 to 1.53 in the github-actions group by @dependabot in #266
- chore(deps): bump github.com/gabriel-vasile/mimetype from 1.4.8 to 1.4.9 in the go-modules group by @dependabot in #269
- Enhance base URL handling by @yzqzss in #272
- chore(deps): bump github.com/refraction-networking/utls from 1.6.7 to 1.7.0 in the go_modules group by @dependabot in #271
- update warc library package references by @willmhowes in #276
- Make matchRegexExclusion testable and add unit tests by @vbanos in #282
- feat: Extract WACZ files from replayweb embeds by @NGTmeaty in #279
- chore(deps): bump golang from 1.24.2-alpine3.20 to 1.24.3-alpine3.20 in the ci group by @dependabot in #278
- chore(deps): bump the go-modules group across 1 directory with 4 updates by @dependabot in #277
- Increase mimetype detection buffer from 2k to 3k by @vbanos in #287
- Optimise XML extractor document loading by @vbanos in #286
- Simplify logWithLevel by @vbanos in #285
- chore(deps): bump github.com/ncruces/go-sqlite3 from 0.25.1 to 0.25.2 in the go-modules group by @dependabot in #283
- update Dockerfile for multi-stage build to improve build efficiency by @sk-pathak in #270
- Define models.NewURL to simplify URL object creation by @vbanos in #289
- Filter outlinks with unwanted protocols by @vbanos in #288
- Improve HTML extractor and add unit tests by @vbanos in #284
- Optimize the OSS extraction process and implement Azure Blob extractor by @yzqzss in #281
- Regex improvements and add strict regex toggle by @NGTmeaty in #290
- Make Azure test URLs insensitive to the order of the URL query parameters by @yzqzss in #292
- Config code simplification by @vbanos in #291
- Resolve panics by @NGTmeaty in #294
- Refactor HTML unit tests and increase coverage by @vbanos in #299
- Make terminal logging colorful 🌈 by @yzqzss in #295
- Add test with corrupt PDF by @vbanos in #304
- NewURL(): return URL with raw string even if parsing error by @yzqzss in #307
- Fix the nil `item.url.parsed` panic html test by @yzqzss in #308
- Fix HQ sleep time by @NGTmeaty in #300
- Extract iframe SRC as outlinks by @NGTmeaty in #302
- Support extracting URL from meta refresh HTML tag by @vbanos in #303
- Extract url quoted in single-quotes from `meta[content]` by @yzqzss in #309
- Increase ProcessBody memory buffer from 2 to 8MB by @vbanos in #306
- Cleanup unused stats methods by @vbanos in #311
- chore(deps): bump github.com/fatih/color from 1.16.0 to 1.18.0 in the go-modules group by @dependabot in #305
- Remove useless fmt.Sprintf, use a plain "zeno" string by @vbanos in #315
- ineffectual return statements by @vbanos in #316
- Simplify models.NewItem by @vbanos in #318
- Simplify item.GetStatus() logging by @vbanos in #319
- fix: combinedArgs was not used in FieldedLogger by @yzqzss in #317
- Simplify retrySleepTime logging by @vbanos in #320
- Extract more types of outlinks by @vbanos in #327
- Use goleak in all package tests by @vbanos in #330
- chore(deps): bump alpine from 3.21 to 3.22 in the ci group by @dependabot in #326
- chore(deps): bump the go-modules group with 2 updates by @dependabot in #325
- Replacing the regex-based CSS extractor with a standard CSS parser by @yzqzss in #324
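The meta refresh entries above (#303, #309) concern pulling a URL out of a tag such as `<meta http-equiv="refresh" content="0; url='https://example.com/'">`. The following is a minimal sketch of that kind of parsing, assuming the `content` value has already been read from the tag. `metaRefreshURL` is a hypothetical name and this is not Zeno's extractor.

```go
package main

import (
	"fmt"
	"strings"
)

// metaRefreshURL (hypothetical helper) extracts the target URL from the
// content attribute of a meta refresh tag, e.g. "5; url='https://example.com/'".
// It returns "" when no url= part is present.
func metaRefreshURL(content string) string {
	// Split "delay; url=..." at the first semicolon.
	_, rest, found := strings.Cut(content, ";")
	if !found {
		return ""
	}
	rest = strings.TrimSpace(rest)
	// Match the "url=" key case-insensitively.
	if len(rest) < 4 || !strings.EqualFold(rest[:4], "url=") {
		return ""
	}
	target := strings.TrimSpace(rest[4:])
	// Strip one pair of matching single or double quotes, if present.
	if len(target) >= 2 {
		first, last := target[0], target[len(target)-1]
		if (first == '\'' || first == '"') && first == last {
			target = target[1 : len(target)-1]
		}
	}
	return target
}

func main() {
	fmt.Println(metaRefreshURL("0; url='https://example.com/a'"))
}
```

Only a matching pair of quotes is stripped, so a URL that merely ends in a quote character is left untouched.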
New Contributors
- @7h3-3mp7y-m4n made their first contribution in #264
- @willmhowes made their first contribution in #276
- @sk-pathak made their first contribution in #270
Full Changelog: v2.0.3...v2.0.4
v2.0.3
Full Changelog: v2.0.2...v2.0.3
v2.0.2
What's Changed
- chore(deps): bump golang.org/x/net from 0.35.0 to 0.36.0 in the go_modules group across 1 directory by @dependabot in #247
- Switch to `DiscardHook` by @yzqzss in #263
New Contributors
- @dependabot made their first contribution in #247
Full Changelog: v2.0.1...v2.0.2
v2.0.1
Full Changelog: v2.0.0...v2.0.1