Skip to content

fix: centralize YAML I/O with UTF-8 encoding for cross-platform safety#433

Merged
danielmeppiel merged 4 commits intomainfrom
fix/yaml-utf8-encoding
Mar 24, 2026
Merged

fix: centralize YAML I/O with UTF-8 encoding for cross-platform safety#433
danielmeppiel merged 4 commits intomainfrom
fix/yaml-utf8-encoding

Conversation

@danielmeppiel
Copy link
Copy Markdown
Collaborator

Description

Adds a centralized yaml_io.py module (load_yaml, dump_yaml, yaml_to_str) that guarantees UTF-8 encoding and allow_unicode=True on all YAML file operations. This prevents silent mojibake on Windows where the default file encoding is cp1252.

Root cause: PyYAML's default allow_unicode=False escapes non-ASCII characters as \xNN sequences, and many open() calls lacked explicit encoding='utf-8'. On Windows (cp1252 default encoding), this created a write-read mismatch that would silently corrupt non-ASCII data like accented author names.

Addresses the root cause behind #387.

Changes

  • New module: src/apm_cli/utils/yaml_io.py -- 3 functions, ~55 LOC
  • Migrated 20 YAML I/O sites across 15 source files to use the new helpers
  • CI guard: grep-based step in ci.yml catches any yaml.dump/safe_dump writing to file handles outside yaml_io.py
  • 18 new tests in tests/unit/test_yaml_io.py covering UTF-8 round-trips, unicode preservation, cross-platform safety, key ordering, edge cases

Type of change

  • Bug fix
  • Maintenance / refactor

Testing

  • All 2,676 unit tests pass
  • 18 new yaml_io-specific tests pass
  • CI encoding guard passes locally (zero violations)

Relationship to #388

PR #388 by @alopezsanchez correctly identified the allow_unicode bug but only fixed write sites -- leaving read sites vulnerable to mojibake on Windows. This PR fixes the complete I/O path structurally. Once merged, #388 can be closed or rebased as a much smaller diff.

Copilot AI review requested due to automatic review settings March 24, 2026 03:28
Add yaml_io.py module (load_yaml, dump_yaml, yaml_to_str) that
guarantees UTF-8 encoding and allow_unicode=True on all YAML file
operations. This prevents silent mojibake on Windows where the default
file encoding is cp1252.

Migrated 20 YAML I/O sites across 15 source files to use the new
helpers. Added CI guard that catches any yaml.dump/safe_dump writing
to a file handle outside yaml_io.py.

Addresses the root cause behind #387: non-ASCII characters in apm.yml
(e.g. accented author names) are now preserved as real UTF-8 on all
platforms instead of being escaped as \xNN sequences.

Based on the investigation in #388 by @alopezsanchez.

Co-authored-by: Alejandro Lopez Sanchez <alejandrolsan@inditex.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR centralizes all YAML read/write behavior in apm_cli to a single UTF-8-safe helper module, preventing Unicode escaping and Windows cp1252 mojibake issues when round-tripping apm.yml and related YAML content.

Changes:

  • Add src/apm_cli/utils/yaml_io.py helpers (load_yaml, dump_yaml, yaml_to_str) enforcing UTF-8 + allow_unicode=True.
  • Migrate YAML I/O call sites across CLI commands, deps, bundling, compilation, and integrators to use the helpers.
  • Add unit tests for UTF-8/unicode round-trips and a CI workflow guard intended to prevent reintroducing direct yaml.dump/safe_dump file-handle writes.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
src/apm_cli/utils/yaml_io.py New centralized YAML UTF-8 + unicode-safe helpers.
tests/unit/test_yaml_io.py New unit tests covering unicode preservation, ordering, and UTF-8 round-trips.
tests/unit/test_deps.py Updates mocks to patch yaml_io internals after refactor.
src/apm_cli/models/apm_package.py Switch apm.yml parsing to load_yaml().
src/apm_cli/integration/mcp_integrator.py Switch lazy apm.yml load to load_yaml().
src/apm_cli/deps/verifier.py Switch config load to load_yaml().
src/apm_cli/deps/plugin_parser.py Switch YAML serialization + parsing to yaml_to_str() / load_yaml().
src/apm_cli/deps/lockfile.py Switch lockfile YAML string serialization to yaml_to_str().
src/apm_cli/deps/github_downloader.py Switch apm.yml version-stamping read/write to load_yaml() / dump_yaml().
src/apm_cli/deps/aggregator.py Switch workflow dependency YAML write to dump_yaml().
src/apm_cli/core/script_runner.py Switch apm.yml load to load_yaml().
src/apm_cli/compilation/agents_compiler.py Switch apm.yml compilation config load to load_yaml().
src/apm_cli/commands/uninstall/engine.py Switch apm.yml read during orphan cleanup to load_yaml().
src/apm_cli/commands/uninstall/cli.py Switch apm.yml read/write to load_yaml() / dump_yaml().
src/apm_cli/commands/install.py Switch apm.yml read/write to load_yaml() / dump_yaml().
src/apm_cli/commands/_helpers.py Switch shared apm.yml load + minimal apm.yml write to load_yaml() / dump_yaml().
src/apm_cli/bundle/plugin_exporter.py Switch apm.yml read to load_yaml() for devDependencies filtering.
src/apm_cli/bundle/lockfile_enrichment.py Switch pack/lock YAML rendering to yaml_to_str().
CHANGELOG.md Add unreleased changelog entry describing the YAML encoding fix.
.github/workflows/ci.yml Add CI grep guard step intended to enforce yaml_io usage.

danielmeppiel and others added 2 commits March 24, 2026 04:58
Address PR review comments:
- Remove unused 'import yaml' from 5 files after yaml_io migration
- Switch CI guard from grep -E (POSIX ERE) to grep -P (PCRE) for
  reliable \s and \b matching on Ubuntu runners
- Add PR number to CHANGELOG entry

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel merged commit fb9dfa3 into main Mar 24, 2026
6 checks passed
@danielmeppiel danielmeppiel deleted the fix/yaml-utf8-encoding branch March 24, 2026 04:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Author field in apm.yml is not being properly filled due to unicode escaping

2 participants