Commit 18c1e24

avoid encoding errors with unicode content piped through stdio on Windows
Consider this trivial file (with a trailing LF):

    print('This is a unicode character: ≠'.encode("UTF-8"))

This command worked in cmd.exe or an MSYS terminal, and printed ≠ correctly:

    $ cat test.py | pyupgrade.exe --py38-plus -

This crashed with an encoding error:

    $ cat test.py | pyupgrade.exe --py38-plus - > reformated.py
    Traceback (most recent call last):
      File "C:\hgdev\python39-x64\lib\runpy.py", line 197, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "C:\hgdev\python39-x64\lib\runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "c:\Users\Matt\.local\bin\pyupgrade.exe\__main__.py", line 7, in <module>
      File "C:\Users\Matt\pipx\venvs\pyupgrade\lib\site-packages\pyupgrade\_main.py", line 389, in main
        ret |= _fix_file(filename, args)
      File "C:\Users\Matt\pipx\venvs\pyupgrade\lib\site-packages\pyupgrade\_main.py", line 330, in _fix_file
        print(contents_text, end='')
      File "C:\hgdev\python39-x64\lib\encodings\cp1252.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\u2260' in position 36: character maps to <undefined>

Since bytes are read from `stdin.buffer` and decoded as UTF-8 when the input file is '-', it makes sense to write UTF-8 bytes to `stdout.buffer`, and avoid using the default codepage.

The use case here is wiring this up to the `hg fix` extension, which writes content to the tool's stdin and reads it back from its stdout to reformat files. That shouldn't change the encoding. A workaround using the existing code is to set `PYTHONUTF8=1` in the environment, but that's not obvious or always easily done.

This change also has the nice side effect of no longer changing LF input to CRLF output. (You'd think that `print(..., end='')` would avoid printing the EOL, but that's apparently baked into the `TextIO` object that is `sys.stdout`, and not something the print function can override.)
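The two behaviors described in the message can be reproduced on any platform with a rough sketch like the one below. It is not code from pyupgrade: `write_via_text_layer` and `write_via_buffer` are hypothetical helper names, and an `io.TextIOWrapper` configured with `encoding='cp1252'` and `newline='\r\n'` is only an assumed stand-in for a redirected `sys.stdout` on Windows.

```python
import io


def write_via_text_layer(text: str, encoding: str = "cp1252") -> bytes:
    """Write text through a text wrapper, as print() does with sys.stdout."""
    raw = io.BytesIO()
    # Assumption: newline='\r\n' imitates the LF -> CRLF translation that a
    # Windows text-mode stdout applies to every '\n' written to it.
    wrapper = io.TextIOWrapper(raw, encoding=encoding, newline="\r\n")
    print(text, end="", file=wrapper)
    wrapper.flush()
    return raw.getvalue()


def write_via_buffer(text: str) -> bytes:
    """Write UTF-8 bytes straight to the binary layer, bypassing the wrapper."""
    raw = io.BytesIO()
    raw.write(text.encode())  # str.encode() defaults to UTF-8
    return raw.getvalue()


if __name__ == "__main__":
    source = "print('This is a unicode character: ≠')\n"
    try:
        write_via_text_layer(source)
    except UnicodeEncodeError as exc:
        print(f"text layer failed: {exc}")
    print(write_via_buffer(source))
```

The newline translation happens inside the text wrapper for every `'\n'` in the written string, which is why `end=''` alone does not prevent CRLF output.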
1 parent 1a2a5ae commit 18c1e24

1 file changed: +1 -1 lines changed

pyupgrade/_main.py

@@ -327,7 +327,7 @@ def _fix_file(filename: str, args: argparse.Namespace) -> int:
     contents_text = _fix_tokens(contents_text)
 
     if filename == '-':
-        print(contents_text, end='')
+        sys.stdout.buffer.write(contents_text.encode())
     elif contents_text != contents_text_orig:
         print(f'Rewriting {filename}', file=sys.stderr)
         with open(filename, 'w', encoding='UTF-8', newline='') as f:
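For context, the stdin-to-stdout pattern this hunk completes might look roughly like the sketch below. It is a minimal illustration and not pyupgrade's actual code: `filter_stream` and its `fix` callback are hypothetical names, and in-memory `io.BytesIO` objects stand in for `sys.stdin.buffer` and `sys.stdout.buffer`.

```python
import io
from typing import BinaryIO, Callable


def filter_stream(src: BinaryIO, dst: BinaryIO, fix: Callable[[str], str]) -> None:
    """Read UTF-8 bytes, transform the decoded text, write UTF-8 bytes back."""
    text = src.read().decode("UTF-8")
    # Writing to the binary stream keeps the encoding and the line endings
    # under the tool's control instead of the console codepage's.
    dst.write(fix(text).encode("UTF-8"))


if __name__ == "__main__":
    src = io.BytesIO("x = '≠'\n".encode("UTF-8"))
    dst = io.BytesIO()
    filter_stream(src, dst, lambda s: s)  # identity transform for illustration
    print(dst.getvalue())
```

In a real filter the call would presumably be `filter_stream(sys.stdin.buffer, sys.stdout.buffer, fix)`, so a caller such as `hg fix` gets back bytes in the same encoding it sent.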
