[torch-xla 2.9RC1] Random crash in sentencepiece with torch-xla 2.9 when doing vocab loading #9691

@jeffhataws

Description

🐛 Bug

We are seeing a random crash (likely memory corruption) in sentencepiece with torch-xla 2.9 when loading a sentencepiece vocab:

#import torch
import torch_xla
import sentencepiece as spm
sp_model = spm.SentencePieceProcessor("/home/ubuntu/souseki_sentencepiece.model")

On Ubuntu 22 we get:

(test_venv_py310) ubuntu@ip-172-31-1-215:~$ python repro2.py
WARNING:root:Defaulting to PJRT_DEVICE=CPU
terminate called after throwing an instance of 'std::system_error'
  what():  Invalid argument
Aborted (core dumped)

In repeated runs, it sometimes works without any crash:

(test_venv_py310) ubuntu@ip-172-31-1-215:~$ python repro2.py
WARNING:root:Defaulting to PJRT_DEVICE=CPU
(test_venv_py310) ubuntu@ip-172-31-1-215:~$ 

If we uncomment import torch, we still see intermittent crashes, sometimes with a different error (a c10::Error at interpreter exit):

WARNING:root:Defaulting to PJRT_DEVICE=CPU
terminate called after throwing an instance of 'std::system_error'
  what():  Invalid argument
Aborted (core dumped)
(test_venv_py310) ubuntu@ip-172-31-1-215:~$ python repro2.py
WARNING:root:Defaulting to PJRT_DEVICE=CPU
(test_venv_py310) ubuntu@ip-172-31-1-215:~$ python repro2.py
WARNING:root:Defaulting to PJRT_DEVICE=CPU
(test_venv_py310) ubuntu@ip-172-31-1-215:~$ vi repro2.py 
(test_venv_py310) ubuntu@ip-172-31-1-215:~$ python repro2.py
WARNING:root:Defaulting to PJRT_DEVICE=CPU
terminate called after throwing an instance of 'c10::Error'
  what():  kernels_.find(DispatchKey::Undefined) == kernels_.end() INTERNAL ASSERT FAILED at "/pytorch/aten/src/ATen/core/dispatch/OperatorEntry.cpp":278, please report a bug to PyTorch. 
Exception raised from hasKernelForAnyDispatchKey at /pytorch/aten/src/ATen/core/dispatch/OperatorEntry.cpp:278 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x80 (0x70c71717cb80 in /home/ubuntu/test_venv_py310/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0x69 (0x70c71710f095 in /home/ubuntu/test_venv_py310/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::impl::OperatorEntry::hasKernelForAnyDispatchKey(c10::DispatchKeySet) const + 0x6a (0x70c6fc1dd2ba in /home/ubuntu/test_venv_py310/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #3: c10::impl::OperatorEntry::computeDispatchTableEntryWithDebug(c10::Dispatcher const&, c10::DispatchKey) const + 0x124 (0x70c6fc1e0d04 in /home/ubuntu/test_venv_py310/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #4: c10::impl::OperatorEntry::computeDispatchTableEntry(c10::Dispatcher const&, c10::DispatchKey) const + 0x9 (0x70c6fc1e0ea9 in /home/ubuntu/test_venv_py310/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #5: c10::impl::OperatorEntry::updateDispatchTableEntry_(c10::Dispatcher const&, c10::DispatchKey) + 0x38 (0x70c6fc1e0ee8 in /home/ubuntu/test_venv_py310/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #6: c10::impl::OperatorEntry::updateDispatchTable_(c10::Dispatcher const&, c10::DispatchKey) + 0x95 (0x70c6fc1e1095 in /home/ubuntu/test_venv_py310/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #7: c10::Dispatcher::deregisterImpl_(c10::OperatorHandle const&, c10::OperatorName const&, std::optional<c10::DispatchKey>, std::_List_iterator<c10::impl::AnnotatedKernel>) + 0x27 (0x70c6fc1d2a17 in /home/ubuntu/test_venv_py310/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0x19d2b31 (0x70c6fc1d2b31 in /home/ubuntu/test_venv_py310/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #9: torch::detail::TorchLibraryInit::~TorchLibraryInit() + 0x38 (0x70c5dc23a698 in /home/ubuntu/test_venv_py310/lib/python3.10/site-packages/_XLAC.cpython-310-x86_64-linux-gnu.so)
frame #10: <unknown function> + 0x45495 (0x70c71ea45495 in /lib/x86_64-linux-gnu/libc.so.6)
frame #11: on_exit + 0 (0x70c71ea45610 in /lib/x86_64-linux-gnu/libc.so.6)
frame #12: <unknown function> + 0x29d97 (0x70c71ea29d97 in /lib/x86_64-linux-gnu/libc.so.6)
frame #13: __libc_start_main + 0x80 (0x70c71ea29e40 in /lib/x86_64-linux-gnu/libc.so.6)
<omitting python frames>

Aborted (core dumped)

You may have to run the script several times to trigger the crash. Strangely, the behavior becomes stable if we either uninstall or install accelerate, depending on the environment.

(test_venv_py310) ubuntu@ip-172-31-1-215:~$ pip uninstall accelerate
Found existing installation: accelerate 1.11.0
Uninstalling accelerate-1.11.0:
  Would remove:
    /home/ubuntu/test_venv_py310/bin/accelerate
    /home/ubuntu/test_venv_py310/bin/accelerate-config
    /home/ubuntu/test_venv_py310/bin/accelerate-estimate-memory
    /home/ubuntu/test_venv_py310/bin/accelerate-launch
    /home/ubuntu/test_venv_py310/bin/accelerate-merge-weights
    /home/ubuntu/test_venv_py310/lib/python3.10/site-packages/accelerate-1.11.0.dist-info/*
    /home/ubuntu/test_venv_py310/lib/python3.10/site-packages/accelerate/*
Proceed (Y/n)? y
  Successfully uninstalled accelerate-1.11.0
(test_venv_py310) ubuntu@ip-172-31-1-215:~$ python repro2.py
WARNING:root:Defaulting to PJRT_DEVICE=CPU
(test_venv_py310) ubuntu@ip-172-31-1-215:~$ python repro2.py
WARNING:root:Defaulting to PJRT_DEVICE=CPU
(test_venv_py310) ubuntu@ip-172-31-1-215:~$ python repro2.py
WARNING:root:Defaulting to PJRT_DEVICE=CPU
(test_venv_py310) ubuntu@ip-172-31-1-215:~$ python repro2.py
WARNING:root:Defaulting to PJRT_DEVICE=CPU
(test_venv_py310) ubuntu@ip-172-31-1-215:~$ python repro2.py
WARNING:root:Defaulting to PJRT_DEVICE=CPU
(test_venv_py310) ubuntu@ip-172-31-1-215:~$ python repro2.py
WARNING:root:Defaulting to PJRT_DEVICE=CPU

By bisecting torch-xla commits, I narrowed the regression down to 748ac9b, the August OpenXLA pin upgrade, which pulls in an update to protobuf 6.31.1 (openxla/xla@72a784f):

b098be87dde58fe48e5effe72c0bb6b9b4ba5b6e    bad     8/22/2025
748ac9b1032cea9499f8062a10607eceb4a84cb7    bad     8/22/2025
6b6ef5c7d757f955565b2083c48d936bfd758dcd    good    8/22/2025
b84c83b46615f767e6d94cda959db8178ddd95b5    good    8/21/2025
0f56dec9a33a993d4c14cb755bdd25490cabba21    good    8/19/2025
a1c6ee92c85e8b0955c20892ed68f032a6015c09    good    8/16/2025

Building torch-xla with DEBUG=1 also avoids the sentencepiece crash, so a debug build is not an option for investigating this.

Looking at sentencepiece, which bundles its own copy of protobuf, I see that the bundled copy was last updated to version 3.14 five years ago (google/sentencepiece@152a87f).

Compiling the latest sentencepiece didn't help, and I don't know how to update the bundled protobuf-lite there.
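
To sanity-check the duplicate-protobuf theory, one can count the protobuf symbols each extension module exposes. A minimal sketch, assuming typical wheel layouts (_XLAC*.so at the top of site-packages, as in the traceback above; _sentencepiece*.so inside the sentencepiece package) and binutils' nm on PATH:

# Sketch under the assumptions stated above; the glob patterns are
# guesses and may need adjusting per install.
import glob
import os
import subprocess
import sysconfig

site = sysconfig.get_paths()["purelib"]
patterns = ["_XLAC*.so", os.path.join("sentencepiece", "_sentencepiece*.so")]
for pattern in patterns:
    for so in glob.glob(os.path.join(site, pattern)):
        out = subprocess.run(["nm", "-D", so], capture_output=True, text=True)
        hits = [s for s in out.stdout.splitlines() if "protobuf" in s]
        print(f"{os.path.basename(so)}: {len(hits)} protobuf symbols")

If both modules carry their own protobuf symbols, an ODR-style clash between the two copies would be consistent with the random corruption seen here.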

To Reproduce

Install the packages (assuming a Python 3.10 environment):

pip install https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.9.0rc1-cp310-cp310-linux_x86_64.whl
pip install accelerate torch==2.9 sentencepiece
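
Optionally, sanity-check the installed versions first. The guarded google.protobuf import is an assumption here, since the Python protobuf package may not be present; torch_xla is deliberately not imported so the check itself cannot trigger the crash.

# Print the versions involved, useful when comparing environments.
import torch
import sentencepiece

print("torch:", torch.__version__)
print("sentencepiece:", sentencepiece.__version__)
try:
    # The Python protobuf package is optional in this environment.
    import google.protobuf
    print("protobuf (python):", google.protobuf.__version__)
except ImportError:
    print("protobuf (python): not installed")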

Download any sentencepiece model, for example from Hugging Face:

cd ~/
wget https://huggingface.co/ganchengguang/RoBERTa-base-japanese-sentencepiece/resolve/main/souseki_sentencepiece.model

The model argument to SentencePieceProcessor needs to be an absolute path; change it to match your environment.

#import torch
import torch_xla
import sentencepiece as spm
sp_model = spm.SentencePieceProcessor("/home/ubuntu/souseki_sentencepiece.model")
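
One experiment worth trying (not verified in this report) is to reverse the import order, so that sentencepiece's statically linked protobuf-lite is loaded before torch_xla's:

# Experiment: import sentencepiece before torch_xla (order swapped
# relative to the repro above).
import sentencepiece as spm
import torch_xla

sp_model = spm.SentencePieceProcessor("/home/ubuntu/souseki_sentencepiece.model")

If the crash disappears with this ordering, that would further point at a protobuf symbol clash between the two extensions.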

Expected behavior

No crash

Environment

  • Reproducible on XLA backend [CPU/TPU]: CPU
  • torch_xla version: 2.9
