Backend impacted
The Rust implementation
Operating system
OSX
Hardware
CPU
Description
There are times when the model stops producing tokens. I wonder if this could be due to noise: could high noise scramble the model for a short time, with the reset then not functioning correctly? I am using it in a streaming scenario.
Extra information
It's a bit difficult to reproduce. I wonder if you have encountered it a few times.
Environment
Fill in the following information on your system.
- Operating system version:
If the backend impacted is PyTorch:
- Python version:
- PyTorch version:
- CUDA version (run python -c 'import torch; print(torch.version.cuda)'):
- GPU model and memory:
If the backend is MLX: