Harden release downloader and finalize releases-first model#99
Conversation
Use releases.amikos.tech as fixed primary source with GitHub fallback, enforce checksum/ABI validations, and remove release source env overrides. Add coverage for header behavior, checksum parsing, and helper utilities; align workflows/docs/examples with the finalized download model.
PR Review: Harden release downloader and finalize releases-first modelSummaryOverall this is a solid improvement — the releases-first architecture is cleaner, auth token scoping is correct, and strict checksum enforcement closes real gaps. A few things worth addressing: Issues / Concerns1. Double-load inefficiency in
Consider returning the library handle from a combined verify+load function to avoid the redundant syscalls. 2. Wrapper functions for constants add unnecessary indirection
3. In if parsed.IsAbs() {
if !strings.EqualFold(parsed.Scheme, "https") {
return "", fmt.Errorf("invalid checksums_url scheme...")
}
return checksumsURL, nil // could be any HTTPS host
}If Consider pinning the checksums URL to the same host as 4. Module-level
Positives
|
Benchmark Comparison |
PR Review: Harden release downloader and finalize releases-first modelOverall, this is a solid hardening PR. Good fixes: scoped auth headers to GitHub API only, os.MkdirTemp replacing shared temp dir (eliminates concurrent-download races), proper checksumForAsset parsing with typed sentinel errors, and post-download ABI validation moved into DownloadAndCacheLibrary. The test additions are valuable. A few things worth considering: SecurityresolveChecksumsURL accepts arbitrary HTTPS host from latest.json (download.go): If releases.amikos.tech is compromised, checksums_url in latest.json can redirect checksum fetching to an attacker-controlled server - the scheme-only https check is necessary but not sufficient. Consider validating the host against a known allowlist (e.g. releases.amikos.tech and objects.githubusercontent.com). PerformanceDouble library load in the fast path (library_loading.go): verifyLibraryABICompatibility loads and immediately closes the library, then loadLibrary loads it again. For the happy path (cached library, compatible), that is 2 full dlopen/dlclose cycles. Consider returning the handle from the check or splitting out a checkSymbolsAndVersion(handle) helper the caller invokes after opening. API behavioral changeGetAvailableVersions now silently returns at most 1 version - the []string return type implies a list, but the new implementation always returns a single-element slice. The in-code comment explains this, but callers expecting multiple versions will silently get less data. Worth noting more prominently in the function doc comment or changelog. Minor
|
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Go Benchmark'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.
| Benchmark suite | Current: e07d837 | Previous: 659e4cf | Ratio |
|---|---|---|---|
BenchmarkDecode/WithSpecialTokens |
22208 ns/op 740 B/op 10 allocs/op |
18443 ns/op 740 B/op 10 allocs/op |
1.20 |
BenchmarkDecode/WithSpecialTokens - ns/op |
22208 ns/op |
18443 ns/op |
1.20 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @tazarov
ReviewGood overall direction — the releases-first architecture, auth header scoping, strict checksum enforcement, and replacing panic-recovery symbol probing with explicit Performance RegressionThe benchmark CI flagged regressions (1.2–1.3x) on
|
Supplementary Review NoteThree prior Claude reviews already cover the key issues well (double-load on the hot path, Stale ABI verification cache on external file replacement
Summary of open items from prior reviews
|
ReviewOverall this is a solid improvement — the primary/fallback download architecture is clean, checksum enforcement is now strict, and auth token scoping to MediumDouble library load in post-download verification (
Minor
Nits
|
Follow-up to prior reviewsTwo previous review comments raised concerns that are actually already addressed in this PR — clarifying for record: Corrections to prior review concernsChecksums URL host restriction — both prior reviews flagged resolveChecksumsURL as accepting arbitrary HTTPS hosts. The PR already implements a host allowlist (allowedChecksumsHosts) initialized at package load time and enforced in resolveChecksumsURL. The allowlist contains releases.amikos.tech and objects.githubusercontent.com. The concern is resolved. Double-load in LoadTokenizerLibrary — prior reviews cited verifyLibraryABICompatibility loading+closing then loadLibrary re-opening. The PR fixes this: LoadTokenizerLibrary now calls loadLibrary once to get libh and passes the already-open handle to verifyLibraryABICompatibilityHandle(libh), avoiding the redundant syscall on the hot path. One new minor observationlibraryABIFingerprint uses size:mtime as a weak cache key (download.go): A local-process cache replacement preserving size and mtime would bypass the ABI re-check. Acceptable for a locally-trusted cache but worth a comment explaining the trade-off. Overall this is a well-structured PR. The security hardening (scoped auth headers, strict checksum enforcement, host allowlist, os.MkdirTemp replacing shared temp path) are all meaningful improvements. |
Supplemental Review NotesNoting a correction and a couple of minor items not covered in the prior review. Correction on prior review point #3 (checksums URL host): The existing review claims Minor: The test Minor: package-level
|
Supplementary ReviewSeveral prior reviews cover the main points well. Two gaps worth adding: Benchmark regression root cause: double
|
Summary
This PR finalizes the releases-first download architecture and hardens the downloader, verification, and docs/workflow consistency.
What changed
https://releases.amikos.tech/pure-tokenizersamikos-tech/pure-tokenizersTOKENIZERS_RELEASES_BASE_URL,TOKENIZERS_RELEASES_PROJECT,TOKENIZERS_GITHUB_REPO) from runtime behavior.TOKENIZERS_GITHUB_REPOis set (now ignored).purego.Dlsymwindows.GetProcAddressDownloadAndCacheLibrary()so standalone calls are safe.LoadTokenizerLibrary().rust-release.yml,go-release.yml,make test, checksum manifest mention).examples/example_download.goto compile (removes non-existentWithDownloadLibrary()).Validation
go test ./...TOKENIZERS_REQUIRE_ONLINE_TESTS=1 go test -run 'TestDownloadFunctionality/GetAvailableVersions|TestDownloadLibraryFromGitHub' -count=1 ./...Notes
skills-lock.jsonis intentionally not included in this PR.