[Bug] Garbled Chinese text during streaming output in chat_cli.py #3

Description

@Zon9hu

When running chat_cli.py, the streaming output sometimes shows garbled text in place of Chinese characters. This happens because a multi-byte character (such as a Chinese character) can be split across several tokens during generation. Decoding and printing each token on its own then attempts to decode an incomplete character, which comes out as the \ufffd replacement character instead of the intended text.

Proposed Solution:
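To resolve the garbled output, we can check the decoded text for \ufffd (the Unicode replacement character) and print only once the pending characters are complete. I suggest introducing a dedicated buffer_token_ids list used purely for stream printing: new tokens accumulate there until the buffer decodes cleanly, at which point the text is printed and the buffer is cleared.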
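
A minimal sketch of this idea, not the actual chat_cli.py code: toy_decode is a hypothetical stand-in for the real tokenizer's decode call (here every token id is simply one UTF-8 byte), and stream_text is an illustrative name. Only buffer_token_ids and the \ufffd check come from the proposal above.

```python
from typing import Callable, Iterable, Iterator, List

REPLACEMENT_CHAR = "\ufffd"


def toy_decode(token_ids: List[int]) -> str:
    """Hypothetical stand-in for tokenizer.decode: one token id == one UTF-8 byte."""
    return bytes(token_ids).decode("utf-8", errors="replace")


def stream_text(
    token_ids: Iterable[int],
    decode: Callable[[List[int]], str] = toy_decode,
) -> Iterator[str]:
    """Yield printable text chunks, holding back tokens that end mid-character."""
    buffer_token_ids: List[int] = []  # tokens generated but not yet printed
    for token_id in token_ids:
        buffer_token_ids.append(token_id)
        text = decode(buffer_token_ids)
        if text.endswith(REPLACEMENT_CHAR):
            # The buffer ends in an incomplete character: wait for more tokens.
            continue
        yield text
        buffer_token_ids.clear()
    if buffer_token_ids:
        # Generation ended mid-character: flush whatever is left.
        yield decode(buffer_token_ids)


if __name__ == "__main__":
    ids = list("hi 你好".encode("utf-8"))
    for chunk in stream_text(ids):
        print(chunk, end="", flush=True)
    print()
```

Two caveats with this approach: a model that genuinely emits \ufffd would have that output held back until the final flush, and with some real tokenizers decoding a short buffer in isolation may handle leading whitespace differently from decoding the full sequence, so the printed text should be compared against a full decode.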
