yaya (Yet Another YAML AST transformer) is a Python library for editing YAML while preserving the rest of the file byte for byte. Unlike ruamel.yaml's round-trip mode, which preserves most formatting but still makes small changes, yaya guarantees that only the values you explicitly modify will change.
We use a clean AST architecture for truly lossless editing:
- Parse YAML with ruamel.yaml to get AST + position info
- Extract all formatting metadata (quotes, indentation, styles, blank lines) from original bytes
- Convert ruamel's CommentedMap/CommentedSeq to a clean, immutable AST
- Track modifications in the clean AST as you make changes
- Re-serialize the entire document, preserving all formatting for unchanged sections
This guarantees byte-for-byte preservation while supporting arbitrary modifications.
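The steps above can be pictured with a minimal sketch. It is not yaya's actual implementation: the `Scalar` and `KeyValue` shapes, the `raw` fields, and the `emit` and `set_value` helpers are all assumptions made for illustration. The idea shown is that each immutable node keeps the exact source text it came from, and the emitter reuses that text unless the node was modified.

```python
from typing import NamedTuple, Optional, Tuple


class Scalar(NamedTuple):
    value: str
    raw: Optional[str]  # exact source text, or None once the node is modified


class KeyValue(NamedTuple):
    key: str
    value: Scalar
    raw_prefix: str  # original text from line start up to the value
    raw_suffix: str  # original trailing comment, whitespace, and newline


def emit(pairs: Tuple[KeyValue, ...]) -> str:
    """Render the pairs, reusing original text wherever it is still available."""
    out = []
    for kv in pairs:
        text = kv.value.raw if kv.value.raw is not None else kv.value.value
        out.append(kv.raw_prefix + text + kv.raw_suffix)
    return "".join(out)


def set_value(pairs: Tuple[KeyValue, ...], key: str, new_value: str) -> Tuple[KeyValue, ...]:
    """Return a new tuple in which only the matching key holds a modified scalar."""
    return tuple(
        kv._replace(value=Scalar(new_value, None)) if kv.key == key else kv
        for kv in pairs
    )


# Hypothetical two-line document with odd spacing, quotes, and a trailing space.
doc = (
    KeyValue("name", Scalar("demo", "'demo'"), "name:   ", "  # keep me \n"),
    KeyValue("run", Scalar("pytest src", "pytest src"), "run: ", "\n"),
)
original = emit(doc)
edited = emit(set_value(doc, "run", "pytest lib/src"))
```

Because the nodes are `NamedTuple`s, `set_value` builds new nodes instead of mutating `doc`, so the original tree stays available for comparison.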
src/yaya/
├── __init__.py # Package exports
├── document.py # YAYA class (main interface)
├── converter.py # Convert ruamel AST → clean AST
├── emitter.py # Serialize clean AST → bytes
├── nodes.py # Clean AST node types (Scalar, Mapping, Sequence, etc.)
├── extract.py # Extract formatting from original bytes
├── serialization.py # ruamel.yaml wrapper for programmatic nodes
├── path.py # Path parsing and navigation
├── formatting.py # Style hints for programmatic nodes
└── jinja2_helpers.py # Detect/preserve Jinja2 expressions
Key components:
- nodes.py: Immutable AST nodes (Document, Mapping, Sequence, Scalar, Comment, BlankLines, etc.)
- converter.py: convert_to_clean_ast() - Extract formatting and build clean AST
- emitter.py: serialize() - Render clean AST to bytes with full formatting preservation
- extract.py: Extract quotes, indentation, styles, offsets from original bytes
- YAYA class: Main interface with dict-like access, path navigation, modifications
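As a rough illustration of the kind of work extract.py is described as doing, the sketch below reads quote style and indentation straight from the original bytes. The function names `detect_quote_style` and `line_indentation` are hypothetical and are not yaya's API; the byte offset is assumed to have been derived already from the parser's line and column information.

```python
def detect_quote_style(source: bytes, offset: int) -> str:
    """Classify the scalar that starts at `offset` in the original bytes."""
    first = source[offset:offset + 1]
    if first == b"'":
        return "single"
    if first == b'"':
        return "double"
    return "plain"


def line_indentation(source: bytes, offset: int) -> int:
    """Count the leading spaces on the line that contains `offset`."""
    line_start = source.rfind(b"\n", 0, offset) + 1
    rest = source[line_start:]
    return len(rest) - len(rest.lstrip(b" "))


source = b"steps:\n  - name: 'build'\n    run: \"make\"\n"
single = detect_quote_style(source, source.index(b"'build'"))
double = detect_quote_style(source, source.index(b'"make"'))
plain = detect_quote_style(source, source.index(b"steps"))
run_indent = line_indentation(source, source.index(b"run"))
```

Reading these details from the bytes, rather than from the parsed values, is what lets an emitter reproduce them later.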
- ✅ String replacements (literal and regex)
- ✅ Comment preservation (inline and standalone)
- ✅ Whitespace preservation (including trailing spaces)
- ✅ Blank line preservation (including within dicts)
- ✅ Quote style preservation (single, double, unquoted)
- ✅ Block scalar handling with indicator preservation
- ✅ Flow vs block style control
- ✅ List indentation detection (aligned vs indented)
- ✅ Path navigation and assertions
- ✅ Key addition, replacement, deletion with order control
- ✅ Jinja2 expression preservation
- ✅ Idempotency
- ✅ Real-world workflow transformations
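For the Jinja2 item in the list above, detection can be sketched with a regular expression. This is an assumption about one possible approach, not a description of jinja2_helpers.py; `JINJA_SPAN` and `find_jinja_spans` are hypothetical names.

```python
import re
from typing import List, Tuple

# Matches {{ ... }} expressions and {% ... %} statements, non-greedily.
JINJA_SPAN = re.compile(r"\{\{.*?\}\}|\{%.*?%\}", re.DOTALL)


def find_jinja_spans(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of every Jinja2 span found in `text`."""
    return [match.span() for match in JINJA_SPAN.finditer(text)]


text = "run: echo {{ name }} {% if x %}"
spans = find_jinja_spans(text)
found = [text[start:end] for start, end in spans]
```

Once the spans are known, an editor can copy those ranges through verbatim instead of treating them as YAML syntax.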
- Add yq-style path selectors with wildcards (.jobs.test.steps[*].run)
- Better error messages when modifications fail
- Explicit testing for anchors/aliases, multi-document streams
- Callback-based value transformation
- More flexible key insertion (not just add_key_after, insert_key_between)
- Preserve and manipulate standalone comments (not attached to keys)
- src/yaya/document.py: Main YAYA class (user-facing API)
  - replace_in_values(), replace_in_values_regex(): String replacements
  - add_key(), replace_key(), add_key_after(), insert_key_between(), delete_key(): Key manipulation
  - get_path(), assert_value(), assert_present(), assert_absent(): Navigation and assertions
  - ensure_key(): Idempotent key addition
  - set_list_indent_style(): Control list indentation
- src/yaya/converter.py: Convert ruamel AST to clean AST
  - convert_to_clean_ast(): Main entry point
  - _convert_mapping(): Extract formatting from mappings (includes blank line preservation)
  - _convert_sequence(): Extract formatting from sequences
  - _convert_scalar(): Extract quotes and values from scalars
- src/yaya/emitter.py: Serialize clean AST to bytes
  - serialize(): Main entry point
  - _emit_mapping(), _emit_sequence(), _emit_scalar(): Per-node renderers
  - Preserves all formatting metadata during rendering
- src/yaya/nodes.py: Clean AST node definitions
  - Document, Mapping, Sequence, Scalar: Core structure
  - Comment, BlankLines, InlineCommented: Formatting metadata
  - KeyValue: Mapping key-value pairs
  - All nodes are immutable (NamedTuple)
- src/yaya/extract.py: Extract formatting from original bytes
  - extract_quote_style(): Detect single/double/plain quotes
  - extract_indentation(): Find indentation at line
  - extract_mapping_style(), extract_sequence_style(): Flow vs block
  - extract_sequence_offset(): List dash offset
- src/yaya/path.py: Path parsing and navigation
  - parse_path(): Parse dotted paths with array indices
  - navigate_to_path(): Navigate in ruamel AST
- tests/: Comprehensive test suite (95 tests)
  - test_basic.py: Core functionality
  - test_blank_lines.py: Blank line preservation (NEW in 0.3.0)
  - test_quote_preservation.py: Quote style handling
  - test_list_indentation.py: List indent detection
  - test_style_hints.py: Flow vs block style control
  - test_workflow_transforms.py: Real-world transformations
  - test_jinja2_expressions.py: Jinja2 preservation
  - Plus many more specialized tests
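The path.py entry above mentions dotted paths with array indices. A minimal sketch of that kind of parsing follows. `parse_dotted_path` and `navigate` are hypothetical stand-ins, not yaya's `parse_path()` or `navigate_to_path()`, and the sketch walks plain dicts and lists rather than a ruamel AST.

```python
import re
from typing import Any, List, Union

# Either a key (no dots or brackets) or a bracketed integer index.
_SEGMENT = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def parse_dotted_path(path: str) -> List[Union[str, int]]:
    """Split 'jobs.test.steps[0].run' into keys (str) and indices (int)."""
    parts: List[Union[str, int]] = []
    pos = 0
    while pos < len(path):
        if path[pos] == ".":
            pos += 1  # dots only separate segments
            continue
        match = _SEGMENT.match(path, pos)
        if match is None:
            raise ValueError(f"cannot parse path at position {pos}: {path!r}")
        key, index = match.groups()
        parts.append(key if key is not None else int(index))
        pos = match.end()
    return parts


def navigate(data: Any, path: str) -> Any:
    """Follow the parsed path through nested dicts and lists."""
    node = data
    for part in parse_dotted_path(path):
        node = node[part]
    return node


workflow = {"jobs": {"test": {"steps": [{"run": "pytest src/marin/tests"}]}}}
parts = parse_dotted_path("jobs.test.steps[0].run")
value = navigate(workflow, "jobs.test.steps[0].run")
```

Keeping indices as `int` and keys as `str` means the same loop can index both mappings and sequences without special cases.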
from yaya import YAYA  # assuming YAYA is exported from the package root

# Inspect tracked modifications before and after an edit
doc = YAYA.load('test.yaml')
print(f"Before: {doc.modifications}") # Should be {}
doc.replace_in_values('old', 'new')
print(f"After: {doc.modifications}") # Should show byte ranges

from ruamel.yaml import YAML

# Inspect ruamel.yaml's line/column data
yaml = YAML()
with open('test.yaml') as f:
    data = yaml.load(f)
# For mappings
if hasattr(data, 'lc'):
    print(data.lc.data) # {key: [key_line, key_col, val_line, val_col]}
# For sequences
if hasattr(data['list'], 'lc'):
    print(data['list'].lc.data) # {index: [line, col]}

# Run all tests
pytest tests/ -v
# Run specific test
pytest tests/test_basic.py::test_block_scalar -v
# Run with debugging
pytest tests/ -vv -s

This library was created to solve a specific problem: updating file paths in GitHub Actions workflows when restructuring a monorepo. For example:
# Before
run: pytest src/marin/tests
# After (preserving everything else)
run: pytest lib/marin/src/marin/tests

Using ruamel.yaml's round-trip mode would change:
- Block scalar indicators (| → |-)
- Trailing whitespace in blank lines
- Sometimes indentation
This library guarantees those stay untouched.
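The guarantee can be made concrete with a small byte-splicing sketch. It only illustrates what "untouched" means and is not how yaya is implemented; `apply_edits` and the `(start, end, replacement)` edit shape are assumptions for the example.

```python
from typing import List, Tuple

Edit = Tuple[int, int, bytes]  # (start, end, replacement) in original offsets


def apply_edits(source: bytes, edits: List[Edit]) -> bytes:
    """Splice replacements into `source`, leaving every other byte as it was."""
    result = source
    # Apply from the end of the file backwards so earlier offsets stay valid.
    for start, end, replacement in sorted(edits, reverse=True):
        result = result[:start] + replacement + result[end:]
    return result


source = (
    b"steps:\n"
    b"  - run: |\n"
    b"      pytest src/marin/tests\n"
    b"  \n"  # blank line that carries trailing spaces
)
old = b"src/marin/tests"
new = b"lib/marin/src/marin/tests"
start = source.index(old)
edited = apply_edits(source, [(start, start + len(old), new)])
```

Everything before `start` and after the replaced range is copied through unchanged, so the `|` indicator and the whitespace-only line survive exactly.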
- The lossless approach works with stock ruamel.yaml
- Easier to maintain as a separate library
- Can iterate independently
- ruamel.yaml maintainer might not want this complexity
- Guarantees perfect preservation
- Simpler mental model: "only change what I explicitly modified"
- Avoids the complexity of tracking every formatting detail in the AST
- yq is written in Go and doesn't integrate well with Python workflows
- yq doesn't support arbitrary string replacement within values
- We want programmatic Python access to the AST
- ruamel.yaml docs: https://yaml.dev/doc/ruamel.yaml/
- ruamel.yaml source: https://sourceforge.net/p/ruamel-yaml/code/ (Mercurial)
- Original discussion in: /Users/ryan/c/ruamel-yaml/ (cloned from SourceForge)
- Refactored codebase (latest): Split single 772-line file into focused modules
  - Created byte_ops.py for low-level operations
  - Created path.py for path parsing/navigation
  - Created serialization.py for YAML serialization
  - Created modifications.py for modification tracking
  - Simplified document.py (main YAYA class)
  - All tests still pass (21/21)
- Block scalar handling: Fixed indentation preservation
- Nested structures: Fixed tracking of mappings within sequences
- List indentation: Smart detection and configuration
- Path operations: Full support for dotted paths with array indices
- Key manipulation: Can add, replace, and insert keys