A Python port of the Jina Tokenizer regex pattern. Based on: https://x.com/_philschmid/status/1825121514816938105
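As a rough illustration of how a regex pattern like this can be applied in Python, here is a minimal sketch. It assumes the pattern is used to split text into chunks. `CHUNK_PATTERN` and `chunk_text` are hypothetical names, and the pattern is a deliberately tiny stand-in, not the Jina pattern itself.

```python
import re

# Hypothetical, heavily simplified stand-in (NOT the Jina pattern):
# match a Markdown heading line, otherwise a sentence-like run of text.
CHUNK_PATTERN = re.compile(
    r"""
    ^\#{1,6}[ \t]+[^\n]+      # a Markdown heading line
    |                         # or
    [^\n.!?]+[.!?]?           # a sentence-like run within a single line
    """,
    re.MULTILINE | re.VERBOSE,
)


def chunk_text(text: str) -> list[str]:
    """Return the non-empty chunks matched by CHUNK_PATTERN, in order."""
    chunks = []
    for match in CHUNK_PATTERN.finditer(text):
        chunk = match.group(0).strip()
        if chunk:  # skip matches that are only whitespace
            chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    sample = "# Title\nFirst sentence. Second one!\nThird"
    for chunk in chunk_text(sample):
        print(repr(chunk))
```

A longer pattern would plug into the same `finditer` loop. `re.VERBOSE` keeps a long alternation readable, but a literal `#` must then be escaped (as above) or placed inside a character class.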