When running chat_cli.py, the streaming output sometimes displays garbled text for Chinese characters. In UTF-8, a Chinese character occupies several bytes, and the tokenizer can split those bytes across consecutive tokens during generation. Decoding and printing each token on its own then produces incomplete byte sequences, which show up as garbled characters in the output.
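The failure can be reproduced without a model. The sketch below assumes a byte-level tokenizer in which each token is exactly one UTF-8 byte (real tokenizers may group bytes differently). Decoding each piece separately yields only replacement characters, while decoding the full sequence recovers the text:

```python
text = "你好"
data = text.encode("utf-8")

# Each of these Chinese characters occupies 3 bytes in UTF-8.
print(len(data))  # 6

# Simulate token-by-token printing: decode each single-byte piece on its own.
pieces = [data[i:i + 1].decode("utf-8", errors="replace") for i in range(len(data))]
print("".join(pieces))  # six U+FFFD replacement characters

# Decoding the whole byte sequence at once gives the correct text.
print(data.decode("utf-8"))  # 你好
```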
Proposed Solution:
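To resolve the garbled output, we can check whether the decoded text ends with \ufffd (U+FFFD, the Unicode replacement character) and, if it does, hold back printing until the remaining bytes of that character arrive, so that only complete characters are printed. I suggest introducing a dedicated buffer_token_ids list used only for stream printing: append each new token ID, decode the buffer, and print and clear it once the decoded text no longer ends with \ufffd.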
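A minimal sketch of the buffering idea follows. It is not the actual chat_cli.py code: stream_decode and toy_decode are hypothetical helpers, and it assumes a decode callable that maps a list of token IDs to a string and emits U+FFFD for an incomplete UTF-8 sequence (for example, a tokenizer's decode method that decodes bytes with errors="replace"). toy_decode stands in for a byte-level tokenizer where each token ID is one UTF-8 byte.

```python
from typing import Callable, Iterable, Iterator, List

REPLACEMENT_CHAR = "\ufffd"


def stream_decode(token_ids: Iterable[int],
                  decode: Callable[[List[int]], str]) -> Iterator[str]:
    """Yield printable text pieces, holding back incomplete characters (hypothetical helper)."""
    buffer_token_ids: List[int] = []
    for token_id in token_ids:
        buffer_token_ids.append(token_id)
        text = decode(buffer_token_ids)
        if text.endswith(REPLACEMENT_CHAR):
            # The last character is still incomplete: wait for more tokens.
            continue
        yield text
        buffer_token_ids.clear()
    # Flush anything left when generation ends; this may still contain
    # U+FFFD if the stream stopped in the middle of a character.
    if buffer_token_ids:
        yield decode(buffer_token_ids)


def toy_decode(ids: List[int]) -> str:
    # Hypothetical byte-level tokenizer: each token ID is a single UTF-8 byte.
    return bytes(ids).decode("utf-8", errors="replace")


if __name__ == "__main__":
    ids = list("你好, world".encode("utf-8"))
    for piece in stream_decode(ids, toy_decode):
        print(piece, end="", flush=True)
    print()
```

Keeping the print buffer separate from the generation history means the model's full token sequence is never modified; only the display path waits for complete characters.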