Tokenizer concurrency safety and lifecycle hardening for shared service usage.

**Key Deliverables:**
- Prevent Close() from racing in-flight Encode/Decode/EncodePairs/VocabSize calls
- Ensure Rust FFI entrypoints fail safely instead of aborting the process on panic
- Add race and regression test coverage for concurrent use and shutdown
- Document the supported concurrency and shutdown contract

**Target:** A patch release that is safe to reuse across goroutines in production and that shuts down cleanly.
No due date • 6/11 issues closed

Advanced Features - Full Tokenizer API

This milestone completes the tokenizer API with advanced features: training, serialization, and access to pipeline components.

**Key Features:**
- Training Support (Train, TrainFromIterator)
- Serialization (Save, ToJSON)
- Pipeline Component Access (get/set normalizer, pre-tokenizer, post-processor)

**Target:** Complete feature parity with Hugging Face tokenizers for advanced use cases, including custom tokenizer training and fine-grained component control.
No due date • 0/3 issues closed

Extended Functionality - Medium Priority Features

This milestone extends the tokenizer with dynamic token management and richer encoding information.

**Key Features:**
- Dynamic Token Management (AddTokens, AddSpecialTokens)
- Enhanced Encoding Information (WordIDs, SequenceIDs, mapping methods)

**Target:** Advanced functionality for use cases that require runtime vocabulary modification and detailed encoding analysis.
No due date • 0/2 issues closed

Core Functionality - High Priority Features

This milestone focuses on essential batch processing and token/vocabulary access that extend the core tokenization capabilities.

**Key Features:**
- Batch Processing (EncodeBatch, DecodeBatch)
- Token/Vocabulary Access (TokenToID, IDToToken, GetVocab)

**Target:** Essential functionality for production use cases that require batch operations and vocabulary introspection.
No due date • 2/11 issues closed