Nested-only step definitions can be silently wiped when a parent step is updated #6165

@pabloarosado

Written by Claude Code — @pabloarosado at the wheel.

Problem

The nested DAG format lets a step be defined only as a nested dependency under a parent (no top-level entry). When the parent's dependency list is later rewritten — for example by `etl update --include-usages`, or by a manual edit — the nested child's definition is lost along with it. The version tracker never gets to flag the old step as `UNUSED → ARCHIVABLE`, and `etl archive` never gets a chance to move it to the archive DAG.

Example

Starting DAG (step defined only nested):

steps:
  export://multidim/foo/latest/bar:
    - data://grapher/foo/2025-01-01/bar:    # only definition (nested)
      - data://garden/foo/2025-01-01/bar    # only definition (nested)

Run:

etl update data://grapher/foo/2025-01-01/bar --include-usages

Result:

steps:
  data://garden/foo/2026-01-01/bar: [...]   # new top-level
  data://grapher/foo/2026-01-01/bar:        # new top-level
    - data://garden/foo/2026-01-01/bar
  export://multidim/foo/latest/bar:
    - data://grapher/foo/2026-01-01/bar     # dep pointer replaced

The two `2025-01-01` entries are gone — wiped, not archived.
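
The loss can be made explicit by diffing the set of steps mentioned before and after the update. This is a minimal sketch, assuming the DAG is loaded as plain Python dicts and lists; `iter_steps` and `all_steps` are hypothetical helpers, not part of `etl`, and the `[...]` dependencies elided above are stood in for by an empty list.

```python
def iter_steps(deps):
    """Yield every step URI mentioned in a (possibly nested) dependency list."""
    for dep in deps:
        if isinstance(dep, dict):
            # A nested entry: {step URI: [its own dependencies]}.
            for step, children in dep.items():
                yield step
                yield from iter_steps(children or [])
        else:
            # A plain string dependency.
            yield dep


def all_steps(dag):
    """Collect top-level steps plus everything nested beneath them."""
    found = set(dag)
    for deps in dag.values():
        found.update(iter_steps(deps or []))
    return found


before = {
    "export://multidim/foo/latest/bar": [
        {"data://grapher/foo/2025-01-01/bar": ["data://garden/foo/2025-01-01/bar"]},
    ],
}
after = {
    # Dependencies are elided ("[...]") in the issue; an empty list is a placeholder.
    "data://garden/foo/2026-01-01/bar": [],
    "data://grapher/foo/2026-01-01/bar": ["data://garden/foo/2026-01-01/bar"],
    "export://multidim/foo/latest/bar": ["data://grapher/foo/2026-01-01/bar"],
}

lost = all_steps(before) - all_steps(after)
print(sorted(lost))
# ['data://garden/foo/2025-01-01/bar', 'data://grapher/foo/2025-01-01/bar']
```

Both `2025-01-01` steps show up in `lost`: they are mentioned nowhere in the resulting DAG, so nothing downstream can see them.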

Why it happens

`_write_to_nested_dag_file` replaces the entire dependency list at the found location: `mapping[key] = _build_dependency_sequence(...)`. When the export step's deps get rewritten, anything nested inside that subtree is overwritten — including step definitions that lived nowhere else.
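
The effect of that assignment can be reproduced on a plain dict. This is only a sketch of the mechanism: `overwrite_deps` is a hypothetical stand-in for the quoted line, not the real `_write_to_nested_dag_file`, and the DAG is assumed to be held as ordinary dicts and lists.

```python
EXPORT = "export://multidim/foo/latest/bar"


def overwrite_deps(mapping, key, new_deps):
    """Replace the whole dependency list at `key`, returning the old value.

    Mirrors the shape of `mapping[key] = ...`: nothing from the previous
    value is merged into the new one.
    """
    previous = mapping.get(key)
    mapping[key] = list(new_deps)
    return previous


dag = {
    EXPORT: [
        # Nested-only definitions: these steps have no top-level entry.
        {"data://grapher/foo/2025-01-01/bar": ["data://garden/foo/2025-01-01/bar"]},
    ],
}

previous = overwrite_deps(dag, EXPORT, ["data://grapher/foo/2026-01-01/bar"])

# After the overwrite, the DAG no longer mentions any 2025-01-01 step;
# the nested subtree survives only in the value the caller was handed back.
print("2025-01-01" in repr(dag))       # False
print("2025-01-01" in repr(previous))  # True
```

Returning the previous value is just one way to make the loss visible. Whether the writer should hoist nested definitions to the top level, hand them to the archive path, or refuse the update is a design decision this sketch does not settle.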
