Make sure our tokenizer handles numbers well, e.g. right-to-left parsing, place-aligned chunking, digit splitting, etc.

- [From Digits to Decisions: How Tokenization Impacts Arithmetic in LLMs](https://huggingface.co/spaces/huggingface/number-tokenization-blog)
- [Arcee Trinity Large Technical Report](https://arxiv.org/html/2602.17004v1#S2), Section 2.1.1: notes that a standard regex has pathological backtracking behavior and details their fix.
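A minimal Python sketch of three of the strategies named above. It assumes a pretokenization step that isolates ASCII digit runs and a chunk size of 3. All function names are hypothetical, and this is not the scheme from either linked reference. Right-to-left chunking is done with plain string slicing rather than a lookahead regex, so the work stays linear in the length of each digit run.

```python
import re
from typing import Callable

# Hypothetical helpers illustrating the number-handling options above.
# A sketch only, not the pretokenizer of any specific model.

# ASCII digits only (Python's \d would also match other Unicode digits).
DIGIT_RUN = re.compile(r"[0-9]+")


def split_each(digits: str) -> list[str]:
    """Digit splitting: one piece per digit."""
    return list(digits)


def chunk_l2r(digits: str, size: int = 3) -> list[str]:
    """Left-to-right chunking: 1234567 -> 123 | 456 | 7 (ignores place value)."""
    return [digits[i:i + size] for i in range(0, len(digits), size)]


def chunk_r2l(digits: str, size: int = 3) -> list[str]:
    """Right-to-left chunking: 1234567 -> 1 | 234 | 567.

    The short chunk goes at the front, so each chunk lines up with one
    place-value group (ones, thousands, millions, ...).
    """
    head = len(digits) % size
    chunks = [digits[:head]] if head else []
    chunks += [digits[i:i + size] for i in range(head, len(digits), size)]
    return chunks


def pretokenize_numbers(text: str, chunker: Callable[[str], list[str]]) -> list[str]:
    """Split text into pieces, applying `chunker` to each run of digits.

    Non-digit spans pass through whole, so "".join(result) == text.
    """
    pieces: list[str] = []
    pos = 0
    for m in DIGIT_RUN.finditer(text):
        if m.start() > pos:
            pieces.append(text[pos:m.start()])  # text before this digit run
        pieces.extend(chunker(m.group()))
        pos = m.end()
    if pos < len(text):
        pieces.append(text[pos:])  # trailing non-digit text
    return pieces


if __name__ == "__main__":
    for chunker in (split_each, chunk_l2r, chunk_r2l):
        print(chunker.__name__, pretokenize_numbers("Pay 1234567 now", chunker))
```

Running it prints:

```
split_each ['Pay ', '1', '2', '3', '4', '5', '6', '7', ' now']
chunk_l2r ['Pay ', '123', '456', '7', ' now']
chunk_r2l ['Pay ', '1', '234', '567', ' now']
```

With left-to-right chunking, the same chunk position in two numbers of different lengths can hold different place values. The right-to-left variant keeps the last chunk as the ones/tens/hundreds group regardless of length.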