Update tokenizers crate to 0.22.2 #27
Merged
Bumps the Hugging Face tokenizers Rust dependency to the latest version. No source code changes were needed. Re-vendored dependencies (tarball shrank from ~10MB to ~7.7MB due to removed transitive deps).
Use COPYFILE_DISABLE=1 and --no-xattrs when creating the vendor tarball to prevent macOS xattr metadata (com.apple.provenance) from being embedded, which causes warnings when extracted with GNU tar on Linux CI.
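A minimal sketch of the tarball step described above, run against a throwaway directory. The paths, the `example-crate` name, the `vendor.tar.gz` file name, and the gzip compression are all assumptions for illustration; the actual vendoring script may use different names and a different compressor.

```shell
# Hypothetical stand-in for the real vendor directory.
workdir="$(mktemp -d)"
mkdir -p "$workdir/vendor/example-crate"
printf '[package]\nname = "example-crate"\n' > "$workdir/vendor/example-crate/Cargo.toml"

# COPYFILE_DISABLE=1 asks macOS tar to skip AppleDouble (._*) metadata entries;
# --no-xattrs keeps extended attributes such as com.apple.provenance out of the archive.
COPYFILE_DISABLE=1 tar --no-xattrs -czf "$workdir/vendor.tar.gz" -C "$workdir" vendor

# List the archive contents to confirm what was packed.
tar -tzf "$workdir/vendor.tar.gz"
```

On Linux the environment variable has no effect, so the same command can be used on either platform.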
These WASI-only crates are never compiled for our supported targets (Linux, macOS, Windows) but Cargo still parses their manifests when using vendored sources. wit-bindgen v0.51.0 uses edition 2024 which requires Cargo >= 1.85, breaking offline builds with older toolchains.
wit-bindgen v0.51.0 uses edition 2024, which Cargo < 1.85 cannot parse. Since it is never compiled for our targets (only needed for WASI), patch its Cargo.toml to edition 2021 after vendoring so older Cargo can read the manifest without error.
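One way the post-vendoring patch could look, shown on a stand-in manifest. The `vendor/wit-bindgen/Cargo.toml` path and the exact `edition = "2024"` line format are assumptions; the real script in this PR may locate and rewrite the manifest differently.

```shell
# Hypothetical layout: a stand-in for the vendored wit-bindgen manifest.
workdir="$(mktemp -d)"
manifest="$workdir/vendor/wit-bindgen/Cargo.toml"
mkdir -p "$(dirname "$manifest")"
printf '[package]\nname = "wit-bindgen"\nversion = "0.51.0"\nedition = "2024"\n' > "$manifest"

# Rewrite only the edition line. Writing to a temp file and moving it into
# place avoids the differing in-place flags of GNU and BSD sed.
sed 's/^edition = "2024"$/edition = "2021"/' "$manifest" > "$manifest.tmp"
mv "$manifest.tmp" "$manifest"

# Show the patched line.
grep '^edition' "$manifest"
```

Cargo verifies vendored files against each crate's `.cargo-checksum.json`, so a real patch step would probably also need to account for the changed manifest hash; the commit message does not say how this PR handles that.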
Remove the .Rbuildignore exclusion for tests/testthat/_snaps so that snapshot tests work correctly during R CMD check.
unicode-segmentation 1.13.2 requires rustc 1.85+. Pin to 1.12.0 to maintain compatibility with the MSRV of 1.81 specified in Cargo.toml.
CRAN's Debian testing has rustc 1.92, so raising the MSRV to 1.91 is safe. This removes the need to pin unicode-segmentation (1.13.2 requires 1.85+) and avoids other MSRV-related compilation issues with tokenizers 0.22.2.
Match the MSRV update in Cargo.toml and DESCRIPTION.
Summary
- Bump `tokenizers` Rust dependency from 0.20.3 to 0.22.2
- Updated dependencies (`thiserror` 1.x→2.x, `rand` 0.8→0.9, `indicatif` 0.17→0.18)
- Removed dependencies (`lazy_static`, old `windows-targets` sub-crates, etc.)
- `extendr-api` stays at 0.8.1 (already latest)

Test plan