fix: centralize YAML I/O with UTF-8 encoding for cross-platform safety#433
Merged
danielmeppiel merged 4 commits into main on Mar 24, 2026
Add yaml_io.py module (load_yaml, dump_yaml, yaml_to_str) that guarantees UTF-8 encoding and allow_unicode=True on all YAML file operations. This prevents silent mojibake on Windows where the default file encoding is cp1252.

Migrated 20 YAML I/O sites across 15 source files to use the new helpers. Added CI guard that catches any yaml.dump/safe_dump writing to a file handle outside yaml_io.py.

Addresses the root cause behind #387: non-ASCII characters in apm.yml (e.g. accented author names) are now preserved as real UTF-8 on all platforms instead of being escaped as \xNN sequences.

Based on the investigation in #388 by @alopezsanchez.

Co-authored-by: Alejandro Lopez Sanchez <alejandrolsan@inditex.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR centralizes all YAML read/write behavior in apm_cli to a single UTF-8-safe helper module, preventing Unicode escaping and Windows cp1252 mojibake issues when round-tripping apm.yml and related YAML content.
Changes:
- Add `src/apm_cli/utils/yaml_io.py` helpers (`load_yaml`, `dump_yaml`, `yaml_to_str`) enforcing UTF-8 + `allow_unicode=True`.
- Migrate YAML I/O call sites across CLI commands, deps, bundling, compilation, and integrators to use the helpers.
- Add unit tests for UTF-8/unicode round-trips and a CI workflow guard intended to prevent reintroducing direct `yaml.dump`/`safe_dump` file-handle writes.
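A minimal sketch of what such a helper module might look like, assuming PyYAML's `safe_load`/`safe_dump`. The three function names come from the PR; the signatures, the `sort_keys` parameter, and the bodies are assumptions for illustration, not the merged code.

```python
from pathlib import Path
from typing import Any

import yaml  # PyYAML (third-party)


def load_yaml(path: Path) -> Any:
    """Read a YAML file, always decoding it as UTF-8 (hypothetical signature)."""
    with open(path, "r", encoding="utf-8") as fh:
        return yaml.safe_load(fh)


def dump_yaml(data: Any, path: Path, *, sort_keys: bool = False) -> None:
    """Write YAML as UTF-8, keeping non-ASCII characters unescaped."""
    with open(path, "w", encoding="utf-8") as fh:
        yaml.safe_dump(
            data,
            fh,
            allow_unicode=True,        # emit "é" instead of "\xE9"
            sort_keys=sort_keys,       # assumption: preserve insertion order by default
            default_flow_style=False,  # block style, one key per line
        )


def yaml_to_str(data: Any, *, sort_keys: bool = False) -> str:
    """Serialize to a YAML string with the same unicode settings as dump_yaml."""
    return yaml.safe_dump(
        data,
        allow_unicode=True,
        sort_keys=sort_keys,
        default_flow_style=False,
    )
```

Routing every call site through helpers like these means the encoding and `allow_unicode` choices are made in one place rather than repeated at each `open()` and `yaml.dump` call.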
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `src/apm_cli/utils/yaml_io.py` | New centralized YAML UTF-8 + unicode-safe helpers. |
| `tests/unit/test_yaml_io.py` | New unit tests covering unicode preservation, ordering, and UTF-8 round-trips. |
| `tests/unit/test_deps.py` | Updates mocks to patch `yaml_io` internals after refactor. |
| `src/apm_cli/models/apm_package.py` | Switch `apm.yml` parsing to `load_yaml()`. |
| `src/apm_cli/integration/mcp_integrator.py` | Switch lazy `apm.yml` load to `load_yaml()`. |
| `src/apm_cli/deps/verifier.py` | Switch config load to `load_yaml()`. |
| `src/apm_cli/deps/plugin_parser.py` | Switch YAML serialization + parsing to `yaml_to_str()` / `load_yaml()`. |
| `src/apm_cli/deps/lockfile.py` | Switch lockfile YAML string serialization to `yaml_to_str()`. |
| `src/apm_cli/deps/github_downloader.py` | Switch `apm.yml` version-stamping read/write to `load_yaml()` / `dump_yaml()`. |
| `src/apm_cli/deps/aggregator.py` | Switch workflow dependency YAML write to `dump_yaml()`. |
| `src/apm_cli/core/script_runner.py` | Switch `apm.yml` load to `load_yaml()`. |
| `src/apm_cli/compilation/agents_compiler.py` | Switch `apm.yml` compilation config load to `load_yaml()`. |
| `src/apm_cli/commands/uninstall/engine.py` | Switch `apm.yml` read during orphan cleanup to `load_yaml()`. |
| `src/apm_cli/commands/uninstall/cli.py` | Switch `apm.yml` read/write to `load_yaml()` / `dump_yaml()`. |
| `src/apm_cli/commands/install.py` | Switch `apm.yml` read/write to `load_yaml()` / `dump_yaml()`. |
| `src/apm_cli/commands/_helpers.py` | Switch shared `apm.yml` load + minimal `apm.yml` write to `load_yaml()` / `dump_yaml()`. |
| `src/apm_cli/bundle/plugin_exporter.py` | Switch `apm.yml` read to `load_yaml()` for devDependencies filtering. |
| `src/apm_cli/bundle/lockfile_enrichment.py` | Switch pack/lock YAML rendering to `yaml_to_str()`. |
| `CHANGELOG.md` | Add unreleased changelog entry describing the YAML encoding fix. |
| `.github/workflows/ci.yml` | Add CI grep guard step intended to enforce `yaml_io` usage. |
Address PR review comments:
- Remove unused 'import yaml' from 5 files after yaml_io migration
- Switch CI guard from grep -E (POSIX ERE) to grep -P (PCRE) for reliable \s and \b matching on Ubuntu runners
- Add PR number to CHANGELOG entry

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
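The guard's actual grep pattern is not shown in this conversation. As an illustration of the idea only, here is a Python sketch with a hypothetical pattern that uses `\b` and `\s` (both supported by Python's `re`, as in PCRE). The names `UNSAFE_DUMP`, `ALLOWED_SUFFIX`, and `find_violations` are invented for this example.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical pattern: yaml.dump / yaml.safe_dump called with a second
# positional argument (or stream=...), i.e. writing to a file handle.
# Calls whose second argument is another keyword (e.g. allow_unicode=True)
# only build a string and are not flagged.
UNSAFE_DUMP = re.compile(
    r"\byaml\.(?:safe_)?dump\(\s*[^,()]+,\s*(?:stream\s*=|(?!\w+\s*=)\w)"
)

# The one file that is allowed to call yaml.dump on a file handle.
ALLOWED_SUFFIX = "utils/yaml_io.py"


def find_violations(sources: Dict[str, str]) -> List[Tuple[str, int]]:
    """Return (path, line number) for each direct file-handle dump."""
    hits: List[Tuple[str, int]] = []
    for path, text in sources.items():
        if path.endswith(ALLOWED_SUFFIX):
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            if UNSAFE_DUMP.search(line):
                hits.append((path, lineno))
    return hits


if __name__ == "__main__":
    demo = {
        "src/apm_cli/commands/install.py": "yaml.dump(data, f)\n",
        "src/apm_cli/utils/yaml_io.py": "yaml.safe_dump(data, fh)\n",
    }
    print(find_violations(demo))
```

This is a line-based heuristic, like grep: it misses calls split across lines and calls whose first argument itself contains parentheses or commas.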
Description
Adds a centralized `yaml_io.py` module (`load_yaml`, `dump_yaml`, `yaml_to_str`) that guarantees UTF-8 encoding and `allow_unicode=True` on all YAML file operations. This prevents silent mojibake on Windows where the default file encoding is cp1252.

Root cause: PyYAML's default `allow_unicode=False` escapes non-ASCII characters as `\xNN` sequences, and many `open()` calls lacked explicit `encoding='utf-8'`. On Windows (cp1252 default encoding), this created a write-read mismatch that would silently corrupt non-ASCII data like accented author names.

Addresses the root cause behind #387.
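To make the write-read mismatch concrete, the following standard-library sketch shows what happens when the same text is encoded with one of UTF-8 or cp1252 and decoded with the other. The sample string is made up, and which direction occurred at any given call site in `apm_cli` is not stated here, so both are shown.

```python
text = "author: José"

# Direction 1: bytes written as UTF-8, then read back as cp1252.
# Decoding succeeds, so the corruption is silent.
utf8_bytes = text.encode("utf-8")
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)  # author: JosÃ©

# Direction 2: bytes written as cp1252, then read back as UTF-8.
# The lone 0xE9 byte is not valid UTF-8, so a strict decode raises.
cp1252_bytes = text.encode("cp1252")
try:
    cp1252_bytes.decode("utf-8")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True
print(strict_decode_failed)  # True

# With the same explicit encoding on both sides, the text round-trips.
assert utf8_bytes.decode("utf-8") == text
```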
Changes
- `src/apm_cli/utils/yaml_io.py` -- 3 functions, ~55 LOC
- `ci.yml` catches any `yaml.dump`/`safe_dump` writing to file handles outside `yaml_io.py`
- Tests in `tests/unit/test_yaml_io.py` covering UTF-8 round-trips, unicode preservation, cross-platform safety, key ordering, edge cases

Type of change
Testing
Relationship to #388
PR #388 by @alopezsanchez correctly identified the `allow_unicode` bug but only fixed write sites, leaving read sites vulnerable to mojibake on Windows. This PR fixes the complete I/O path structurally. Once merged, #388 can be closed or rebased as a much smaller diff.