Releases: brody-0125/dart_sentencepiece_tokenizer

v1.3.2

07 Apr 16:27
891f2b3

What's New

GitHub Actions CI Pipeline (#19)

Automated continuous integration is now configured for the project:

  • Analyze job: enforces dart format consistency and runs dart analyze --fatal-infos, which fails the build on any info, warning, or error.
  • Test job: runs the full test suite across a matrix of Dart stable and Dart 3.10.7 (the minimum supported SDK version).
  • Hardening: minimal permissions (contents: read) and concurrency groups that cancel stale runs.

Improvements

  • Code formatting — Applied dart format to 23 source files for consistent code style across the codebase.
  • Static analysis cleanup — Resolved all dart analyze --fatal-infos issues:
    • Removed deprecated avoid_returning_null_for_future lint rule.
    • Added curly braces to if statements, and applied const constructors and final local variables where required.
  • Documentation (#21) — Added inline comments clarifying google/sentencepiece proto spec compliance for default token IDs (unkId=0, bosId=1, eosId=2, padId=-1).
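For reference, these defaults match the TrainerSpec defaults in google/sentencepiece's proto definition, where an ID of -1 disables the token. The Dart sketch below simply mirrors those defaults; the SpecialTokenIds class and its hasPad getter are hypothetical names for illustration, not part of this package's API.

```dart
/// Hypothetical holder for special-token IDs. Defaults mirror the
/// google/sentencepiece TrainerSpec proto: unk=0, bos=1, eos=2, pad=-1.
class SpecialTokenIds {
  final int unkId;
  final int bosId;
  final int eosId;

  /// -1 means "no pad token", matching the proto's disabled-by-default pad.
  final int padId;

  const SpecialTokenIds({
    this.unkId = 0,
    this.bosId = 1,
    this.eosId = 2,
    this.padId = -1,
  });

  /// A negative ID marks the token as absent from the vocabulary.
  bool get hasPad => padId >= 0;
}
```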

Notes

This is a maintenance release with no API changes or breaking changes. Focus: CI infrastructure, code hygiene, and documentation clarity.

Full Changelog: v1.3.1...v1.3.2

v1.3.1 — HuggingFace tokenizer.json Native Support

03 Apr 16:12
147bd0b

Load HuggingFace tokenizers directly from tokenizer.json — no conversion step required.

What's New

HuggingFace tokenizer.json Format Support

You can now load any HuggingFace tokenizer.json file without converting it to SentencePiece .model format first. This makes it straightforward to use tokenizers published on the HuggingFace Hub.

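```dart
import 'dart:convert';
import 'dart:io';
// Assumed main library path for the package.
import 'package:dart_sentencepiece_tokenizer/dart_sentencepiece_tokenizer.dart';

Future<void> main() async {
  // Load from file
  final fromFile = await HuggingFaceTokenizerLoader.fromJsonFile('tokenizer.json');
  // Load from a pre-parsed map (here, the decoded contents of tokenizer.json)
  final jsonMap = jsonDecode(await File('tokenizer.json').readAsString()) as Map<String, dynamic>;
  final fromMap = HuggingFaceTokenizerLoader.fromMap(jsonMap);
  // Auto-detection: works transparently with TokenizerJsonLoader
  final autoDetected = await TokenizerJsonLoader.fromJsonFile('tokenizer.json');
}
```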

Supported model types:

  • Unigram: T5, ALBERT, XLNet, and other Unigram-based models
  • BPE: Llama, Gemma, GPT-2, RoBERTa, and other BPE-based models

Automatic configuration inference:

  • Special tokens (unk, bos, eos, pad) are detected from the added_tokens section
  • Normalizer settings (addDummyPrefix, escapeWhitespaces) are inferred from the HuggingFace normalizer config
  • Post-processor flags (addBosToken, addEosToken) are parsed from TemplateProcessing
  • Byte fallback behavior is detected from the decoder configuration
  • Added tokens whose IDs fall outside the base vocabulary are handled automatically
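To make the TemplateProcessing and decoder rules concrete, here is a minimal Dart sketch that reads a decoded tokenizer.json map. It assumes the standard HuggingFace layout: a post_processor whose single template lists SpecialToken and Sequence entries, and a decoder that may wrap a ByteFallback step inside a Sequence decoder. The functions inferTemplateFlags and inferByteFallback are hypothetical and are not the package's actual inference code.

```dart
/// Hypothetical sketch: a SpecialToken placed before the `$A` Sequence in the
/// `single` template implies BOS is added; one placed after it implies EOS.
({bool addBos, bool addEos}) inferTemplateFlags(Map<String, dynamic> json) {
  final post = json['post_processor'];
  if (post is! Map || post['type'] != 'TemplateProcessing') {
    return (addBos: false, addEos: false);
  }
  final single = post['single'];
  if (single is! List) return (addBos: false, addEos: false);

  var seenSequence = false;
  var addBos = false;
  var addEos = false;
  for (final piece in single) {
    if (piece is! Map) continue;
    if (piece.containsKey('Sequence')) {
      seenSequence = true;
    } else if (piece.containsKey('SpecialToken')) {
      // Special tokens before the sequence act as BOS, after it as EOS.
      if (seenSequence) {
        addEos = true;
      } else {
        addBos = true;
      }
    }
  }
  return (addBos: addBos, addEos: addEos);
}

/// Hypothetical sketch: byte fallback is assumed on when the decoder, or any
/// decoder nested in a `Sequence` decoder, has type `ByteFallback`.
bool inferByteFallback(Map<String, dynamic> json) {
  bool visit(Object? decoder) {
    if (decoder is! Map) return false;
    if (decoder['type'] == 'ByteFallback') return true;
    final children = decoder['decoders'];
    return children is List && children.any(visit);
  }

  return visit(json['decoder']);
}
```

Real tokenizers can also declare pair templates and other post-processor types, which this sketch ignores.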

Format detection:

TokenizerJsonLoader.isHuggingFaceFormat() lets you check whether a JSON map uses the HuggingFace format. When you call TokenizerJsonLoader.fromJsonFile(), HuggingFace format is detected and delegated automatically — no code changes needed if you already use TokenizerJsonLoader.
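A check of this kind can be approximated by looking for the top-level structure that HuggingFace's tokenizers library writes. The heuristic below is an assumption for illustration only: looksLikeHuggingFaceTokenizer and huggingFaceModelType are hypothetical helpers, and TokenizerJsonLoader.isHuggingFaceFormat() may use different criteria.

```dart
/// Hypothetical heuristic: a HuggingFace tokenizer.json has a top-level
/// `model` object containing a `vocab`, plus an `added_tokens` list.
bool looksLikeHuggingFaceTokenizer(Map<String, dynamic> json) {
  final model = json['model'];
  return model is Map &&
      model.containsKey('vocab') &&
      json['added_tokens'] is List;
}

/// Returns the declared model type (e.g. "Unigram" or "BPE"), or null when
/// the map does not look like a HuggingFace tokenizer or has no type field.
String? huggingFaceModelType(Map<String, dynamic> json) {
  if (!looksLikeHuggingFaceTokenizer(json)) return null;
  final type = (json['model'] as Map)['type'];
  return type is String ? type : null;
}
```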

Install / Upgrade

```yaml
dependencies:
  dart_sentencepiece_tokenizer: ^1.3.1
```

Full Changelog: https://github.com/brody-0125/dart_sentencepiece_tokenizer/blob/develop/CHANGELOG.md

1.3.0

02 Feb 14:12
b2f9d49

What's Changed

  • feat: add HuggingFace TextStreamer compatible streaming API by @brody-0125 in #8
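For context, HuggingFace's TextStreamer (in the transformers library) buffers generated token IDs, re-decodes the whole buffer at each step, and only emits text that can no longer change: everything on a newline, otherwise text up to the last space. A simplified Dart sketch of that buffering strategy follows. TextStreamerSketch and its decode callback are hypothetical, the CJK special case from the original is omitted, and this is not the package's streaming API.

```dart
/// Hypothetical sketch of TextStreamer-style buffering.
class TextStreamerSketch {
  TextStreamerSketch(this.decode, this.onText);

  /// Assumed decoder: turns a list of token IDs into text.
  final String Function(List<int> ids) decode;

  /// Receives each finalized chunk of text.
  final void Function(String text) onText;

  final List<int> _cache = [];
  int _printLen = 0;

  void put(List<int> ids) {
    _cache.addAll(ids);
    final text = decode(_cache);
    String printable;
    if (text.endsWith('\n')) {
      // A newline finalizes everything buffered so far; reset the cache.
      printable = text.substring(_printLen);
      _cache.clear();
      _printLen = 0;
    } else {
      // Emit only up to and including the last space; the trailing
      // partial word may still change as more tokens arrive.
      final cut = text.lastIndexOf(' ') + 1;
      printable = cut > _printLen ? text.substring(_printLen, cut) : '';
      _printLen += printable.length;
    }
    if (printable.isNotEmpty) onText(printable);
  }

  /// Flushes whatever text is still buffered.
  void end() {
    final text = decode(_cache);
    final rest = text.length > _printLen ? text.substring(_printLen) : '';
    _cache.clear();
    _printLen = 0;
    if (rest.isNotEmpty) onText(rest);
  }
}
```

Re-decoding the full buffer, rather than decoding each token in isolation, lets the decoder merge pieces (such as byte-fallback bytes) correctly before any text is shown.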

Full Changelog: 1.2.2...1.3.0

1.2.2

02 Feb 14:11
4e75dff

What's Changed

  • feat: optimize memory usage and refactor tests for v1.2.2 by @brody-0125 in #7

Full Changelog: 1.2.1...1.2.2

1.2.1

27 Jan 16:47
2f5c652

What's Changed

  • feat: add JSON Serialization API, Dynamic Token Addition API and Optimized BPE Algorithm by @brody-0125 in #5
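As background on what a BPE encoder does, the classic merge loop repeatedly joins the adjacent symbol pair with the lowest merge rank until no ranked pair remains. The sketch below is that textbook version, not the optimized algorithm from #5; bpeMerge and the space-separated "left right" key format for the ranks map are assumptions for illustration.

```dart
/// Hypothetical textbook BPE merge loop. `ranks` maps "left right" pair keys
/// to merge priority (lower merges first); assumes no symbol contains a space.
List<String> bpeMerge(List<String> symbols, Map<String, int> ranks) {
  final parts = List<String>.of(symbols);
  while (parts.length > 1) {
    var bestIndex = -1;
    var bestRank = 0;
    // Find the leftmost adjacent pair with the lowest rank.
    for (var i = 0; i < parts.length - 1; i++) {
      final rank = ranks['${parts[i]} ${parts[i + 1]}'];
      if (rank != null && (bestIndex < 0 || rank < bestRank)) {
        bestIndex = i;
        bestRank = rank;
      }
    }
    if (bestIndex < 0) break; // No mergeable pair left.
    parts[bestIndex] = parts[bestIndex] + parts[bestIndex + 1];
    parts.removeAt(bestIndex + 1);
  }
  return parts;
}
```

This version rescans every pair after each merge, which is quadratic in the word length; optimized implementations typically avoid that, for example by keeping candidate pairs in a priority queue.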

Full Changelog: 1.2.0...1.2.1

1.1.0~1.2.0

17 Jan 12:49
929d90e

What's Changed

  • feat: improve BPE Algorithm by @brody-0125 in #1
  • feat: improve BPE Algorithm (#1) by @brody-0125 in #2
  • feat: add JSON Serialization API, Dynamic Token Addition API and Optimized BPE Algorithm by @brody-0125 in #3

Full Changelog: 1.0.0...1.2.0

1.0.0

02 Jan 16:23
