fix: resolve Windows GBK encoding issue in CLI output #472

Open
fanghuaqi wants to merge 1 commit into deedy5:main from fanghuaqi:fix/windows-encoding-issue
fanghuaqi commented Jun 2, 2026

Summary

Fixes UnicodeEncodeError when running ddgs CLI commands on Windows with GBK console encoding.

Changes

  • Set stdout/stderr to UTF-8 encoding in safe_entry_point()
  • Fixes UnicodeEncodeError when output contains special Unicode characters (e.g., Arabic characters, Chinese characters, trademark symbols ™)
  • Only applies on Windows when sys.stdout.reconfigure() is available
  • Add test case for extract command

Problem

On Windows systems configured for a Simplified Chinese locale, the default console encoding is GBK (code page 936). When the ddgs CLI outputs results containing characters that GBK cannot encode (such as Arabic characters, some Chinese characters, or special symbols), it crashes with:

UnicodeEncodeError: 'gbk' codec can't encode character 'ع' in position 61: illegal multibyte sequence
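
The failure can be reproduced without a Windows console. The following is a minimal sketch, not code from this PR: `write_with_encoding` is a hypothetical helper that pushes text through an `io.TextIOWrapper`, which is the same kind of object as `sys.stdout`, using whichever encoding it is given.

```python
import io


def write_with_encoding(text: str, encoding: str) -> bytes:
    """Write text through a text stream with the given encoding; return the raw bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)  # raises UnicodeEncodeError if the codec cannot encode the text
    stream.flush()
    return raw.getvalue()


sample = "ع"  # the Arabic letter from the traceback above

try:
    write_with_encoding(sample, "gbk")
except UnicodeEncodeError as exc:
    # Same exception class the CLI hits on a GBK console
    print(f"gbk failed: {exc.reason}")

# UTF-8 can encode every Unicode character, so this call succeeds
print(write_with_encoding(sample, "utf-8"))
```

The point of the sketch is that the crash comes from the stream's encoding, not from the individual print call, which is why changing the encoding once at the entry point is sufficient.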

Solution

Instead of adding exception handling to every click.secho() call, we set the stdout/stderr encoding to UTF-8 at the program entry point in safe_entry_point(). This is a cleaner, more comprehensive solution that:

  1. Fixes the encoding issue for all CLI commands at once
  2. Doesn't require modifying individual output functions
  3. Falls back gracefully if reconfigure() is not available

if sys.platform == "win32" and hasattr(sys.stdout, "reconfigure"):
    try:
        sys.stdout.reconfigure(encoding="utf-8")
        sys.stderr.reconfigure(encoding="utf-8")
    except Exception:
        pass
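
To illustrate how the guard above behaves, here is a hedged, self-contained sketch. `force_utf8` is a hypothetical helper, not a function in ddgs; it omits the `sys.platform` check so it can run on any OS, and it uses an in-memory GBK stream as a stand-in for a Windows console. `TextIOWrapper.reconfigure()` is available from Python 3.7.

```python
import io


def force_utf8(stream) -> bool:
    """Try to switch a text stream to UTF-8; return True if it worked."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        # Stream was replaced by an object without reconfigure(); leave it alone
        return False
    try:
        reconfigure(encoding="utf-8")
    except Exception:
        # Mirrors the PR: any failure leaves the original encoding in place
        return False
    return True


# Stand-in for a GBK console (assumption: a real console behaves like this wrapper)
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="gbk")

if force_utf8(console):
    console.write("ع ™ 中文")  # would raise UnicodeEncodeError under GBK
    console.flush()

print(console.encoding)
print(raw.getvalue().decode("utf-8"))
```

One design trade-off worth noting: emitting UTF-8 bytes avoids the crash, but whether the characters render correctly still depends on the terminal's active code page. Passing `errors="replace"` to `reconfigure()` while keeping the original encoding would be an alternative that never crashes but loses the unencodable characters.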

Testing

  • All 9 CLI tests pass
  • Added test_extract_command test case
  • Verified with Chinese characters, Arabic characters, and special symbols

fanghuaqi force-pushed the fix/windows-encoding-issue branch from af728b2 to b317e2a on June 2, 2026 03:54
fanghuaqi changed the title from "fix: resolve Windows console encoding issue in extract command" to "fix: resolve Windows GBK encoding issue in CLI output" on Jun 2, 2026