Standardize Unicode encodings to explicitly use UTF-8 #635
dvmukul wants to merge 1 commit into microsoft:main
This change adds encoding='utf-8' to file open operations in the script runner, configuration manager, marketplace client, and other core modules. This resolves reported issues (e.g., microsoft#604) where UnicodeDecodeError occurs on Windows when handling prompts and configuration files with non-ASCII characters.
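The sketch below reproduces the two ways a locale-dependent decode can go wrong. It uses only the standard library and decodes UTF-8 bytes explicitly with `cp1252`, a common Windows code page, so it behaves the same on any OS. The `decode_as` helper and the sample strings are illustrative and not part of `apm`.

```python
import locale


def decode_as(data: bytes, encoding: str) -> str:
    """Decode bytes with the given codec; return an error description instead of raising."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc.reason}"


if __name__ == "__main__":
    # Bytes as they would appear in a UTF-8 prompt or configuration file.
    accented = "Á".encode("utf-8")  # b'\xc3\x81'
    cjk = "中文".encode("utf-8")

    # Correct: decode with the encoding the file was written in.
    print(decode_as(accented, "utf-8"))
    print(decode_as(cjk, "utf-8"))

    # Wrong: decode with a legacy Windows code page instead.
    # Byte 0x81 is undefined in cp1252, so this fails with UnicodeDecodeError.
    print(decode_as(accented, "cp1252"))
    # Every byte of this input is defined in cp1252, so nothing is raised,
    # but the result is garbled text (mojibake) rather than the original.
    print(decode_as(cjk, "cp1252"))

    # The locale's preferred encoding, which a bare open() typically falls back to.
    # Platform-dependent: often a code page such as cp1252 on Windows.
    print(locale.getpreferredencoding(False))
```

The second case is arguably worse than the first: no exception is raised, so corrupted text can flow silently into compiled prompts.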
@dvmukul please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.
sergio-sisternes-epam left a comment
Looks good overall — the changes are correct and directly fix the Windows UnicodeDecodeError from #604. Two suggestions in the inline comments (missed open() calls + test coverage).
```diff
 if not os.path.exists(CONFIG_FILE):
-    with open(CONFIG_FILE, "w") as f:
+    with open(CONFIG_FILE, "w", encoding="utf-8") as f:
```
👍 Good fix. Note that an AST scan of the codebase found five more text-mode open() calls that still lack explicit encoding='utf-8':
| File | Line | Operation |
|---|---|---|
| `adapters/client/codex.py` | 65 | `toml.dump()`: write config |
| `adapters/client/codex.py` | 80 | `toml.load()`: read config |
| `adapters/client/copilot.py` | 68 | `json.dump()`: write config |
| `adapters/client/copilot.py` | 83 | `json.load()`: read config |
| `deps/github_downloader.py` | 184 | write empty gitconfig |
Since the PR title says "Standardize … across the codebase," it would be great to cover these too (same pattern, same Windows risk).
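One way to run a scan like this is sketched below with the standard-library `ast` module; it is not necessarily the script used for the table above. It flags calls to the builtin `open()` whose mode is not a binary literal and that pass no `encoding` argument. It does not catch `Path.open()`, `Path.read_text()`, or `io.open()`, so its output is a starting point for review rather than a complete audit.

```python
import ast
import sys
from pathlib import Path


def is_text_mode(call: ast.Call) -> bool:
    """Return True unless the mode argument is a string literal containing 'b'.

    A missing mode means the default "r" (text). Non-literal modes are treated
    as text mode so they get flagged for manual review.
    """
    mode = call.args[1] if len(call.args) >= 2 else None
    for kw in call.keywords:
        if kw.arg == "mode":
            mode = kw.value
    if isinstance(mode, ast.Constant) and isinstance(mode.value, str):
        return "b" not in mode.value
    return True


def find_open_without_encoding(source: str, filename: str = "<string>"):
    """Yield (filename, lineno) for text-mode builtin open() calls lacking an encoding."""
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "open"):
            continue
        # encoding is open()'s 4th positional parameter: open(file, mode, buffering, encoding)
        has_encoding = (any(kw.arg == "encoding" for kw in node.keywords)
                        or len(node.args) >= 4)
        if is_text_mode(node) and not has_encoding:
            yield filename, node.lineno


def scan_tree(root: Path):
    """Scan every .py file under root, reading sources as UTF-8."""
    for path in sorted(root.rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        yield from find_open_without_encoding(source, str(path))


if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for filename, lineno in scan_tree(root):
        print(f"{filename}:{lineno}: open() without encoding=")
```

Run against the package directory, this prints one `file:line` entry per remaining call, which could also be wired into CI to keep new bare `open()` calls from creeping back in.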
```diff
 self.compiled_dir.mkdir(parents=True, exist_ok=True)
 ...
-with open(prompt_path, "r") as f:
+with open(prompt_path, "r", encoding="utf-8") as f:
```
This is the core fix for #604. Consider adding a small test that round-trips a non-ASCII string (e.g. CJK characters) through PromptCompiler.compile() to lock in the fix — the existing test_cli_encoding.py only covers console stream reconfiguration, not file I/O.
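A minimal sketch of such a test using the standard-library `unittest` is shown below. Because the `PromptCompiler.compile()` signature is not shown here, the sketch exercises a hypothetical `read_prompt()` helper that performs the same `open(..., encoding="utf-8")` read; in the real suite the call would go through `PromptCompiler.compile()` instead. The file is written as raw UTF-8 bytes so the fixture does not depend on the platform default.

```python
import tempfile
import unittest
from pathlib import Path


def read_prompt(prompt_path: Path) -> str:
    """Hypothetical stand-in for the file read inside PromptCompiler.compile()."""
    with open(prompt_path, "r", encoding="utf-8") as f:
        return f.read()


class PromptEncodingRoundTripTest(unittest.TestCase):
    def test_non_ascii_prompt_round_trips(self):
        # Non-ASCII content mixing CJK, accented Latin, and an emoji.
        text = "---\ndescription: 提示詞 café 🚀\n---\n請總結這段文字。\n"
        with tempfile.TemporaryDirectory() as tmp:
            path = Path(tmp) / "example.prompt.md"
            # Write raw UTF-8 bytes so the fixture is identical on every OS.
            path.write_bytes(text.encode("utf-8"))
            self.assertEqual(read_prompt(path), text)


if __name__ == "__main__":
    unittest.main()
```

On Linux and macOS the default encoding is usually UTF-8, so a test like this mainly guards the fix when it runs on Windows CI runners or under a non-UTF-8 locale.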
Summary of Changes
This Pull Request standardizes file operations across the codebase by adding explicit `encoding='utf-8'` to `open()` calls.

Rationale
On Windows, the default encoding used by `open()` is often locale-specific (e.g., CP1252 or CP950). Reading a UTF-8 file that contains non-ASCII characters (such as prompt templates or localized marketplace metadata) under that default raises `UnicodeDecodeError` or silently produces garbled text, and writing such characters can raise `UnicodeEncodeError`. This fix ensures that `apm` behaves consistently across Windows, macOS, and Linux.

Key Fixes:
- `script_runner.py`: Fixed prompt compilation and script reading (resolves #604: `UnicodeDecodeError` reading `.prompt.md` on Windows CP950 because `open()` was missing the `encoding` parameter).
- `config.py`: Standardized global configuration file operations.
- `marketplace/client.py` & `registry.py`: Fixed local caching of marketplace manifests.
- `models/plugin.py`: Ensured stable reading of `plugin.json` metadata.
- `runtime/copilot_runtime.py`: Fixed MCP configuration retrieval.

Verification
Each change was audited to confirm that it targets a text-mode operation rather than a binary one, to avoid data corruption. Existing unit tests were relied on to check for regressions in core functionality.
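The text-versus-binary distinction behind this audit can be sketched as follows. `encoding` applies only to text mode; Python rejects it in binary mode with a `ValueError`, so binary reads and writes keep `"rb"`/`"wb"` with no encoding. The helper names and the `plugin.json` contents below are illustrative, not taken from `apm`.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(path: Path, metadata: dict) -> None:
    # Text mode with an explicit encoding; ensure_ascii=False keeps non-ASCII
    # characters as-is, so the bytes on disk depend on the chosen encoding.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(metadata, f, ensure_ascii=False)


def read_metadata(path: Path) -> dict:
    # Read back with the same encoding that was used to write.
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def read_raw(path: Path) -> bytes:
    # Binary mode: bytes in, bytes out, no decoding step at all.
    with open(path, "rb") as f:
        return f.read()


def binary_mode_rejects_encoding(path: Path) -> bool:
    # Passing encoding= together with a binary mode is rejected with ValueError.
    try:
        with open(path, "rb", encoding="utf-8"):
            pass
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "plugin.json"
        meta = {"name": "示例插件", "description": "Überprüfung"}
        write_metadata(path, meta)
        print(read_metadata(path) == meta)
        print(read_raw(path).decode("utf-8"))
        print(binary_mode_rejects_encoding(path))
```

Because Python refuses the combination outright, adding `encoding` to a binary-mode call by mistake fails loudly instead of corrupting data; the real risk in such an audit is switching a binary call to text mode.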