Skip to content

fix(dashboard): drop ELK edges that reference nodes removed by orphan/cycle repair passes#456

Open
tirth8205 wants to merge 1 commit into
Egonex-AI:mainfrom
tirth8205:fix/elk-layout-orphan-edges
Open

fix(dashboard): drop ELK edges that reference nodes removed by orphan/cycle repair passes#456
tirth8205 wants to merge 1 commit into
Egonex-AI:mainfrom
tirth8205:fix/elk-layout-orphan-edges

Conversation

@tirth8205

Copy link
Copy Markdown
Contributor

Problem

  • repairElkInput is supposed to hand ELK a self-consistent graph, but the orphan-edge pass (step 4) validates edges against allIds, which is computed once by walking childrenB — the child set BEFORE step 3 removes orphan children and BEFORE step 5 removes containment-cycle nodes. So an edge whose endpoint is a node that gets dropped by step 3 or step 5 is NOT dropped: it survives in the returned edges array pointing at a node that no longer exists in children. Verified by reproducing the logic: input children [a, orphan(parentId:ghost)] with edge a -> orphan; after repair children is [a] (orphan dropped) yet edges still contains e1: a -> orphan. This dangling edge is exactly the kind of malformed input that can make elk.layout() throw, which then trips the elk-layout-failed fatal path the repair function exists to avoid.

Fix

  • Move the orphan-edge pass (step 4) to run AFTER the containment-cycle pass (step 5), and recompute the id set from the final child tree (childrenD) instead of reusing the stale allIds from childrenB. Concretely: rename the step-4 allIds use to a freshly-walked set, e.g. const finalIds = new Set(); const walkFinal = (children: ElkChild[]) => { for (const c of children) { finalIds.add(c.id); if (c.children) walkFinal(c.children); } }; walkFinal(childrenD); let orphanEdges = 0; const…

Testing

Adds unit test(s) that fail before the change and pass after. The full dashboard test suite, eslint, and tsc --noEmit all pass locally on this branch.

Found via a static correctness audit of the dashboard ELK layout repair.

🤖 Generated with Claude Code

…/cycle repair passes

The orphan-edge pass validated edges against an id set walked from the
pre-removal child tree (childrenB), so an edge whose endpoint is a node
later dropped by the orphan-child or containment-cycle passes survived,
pointing at a node no longer present in children. ELK can throw on such
dangling input, tripping the elk-layout-failed fatal path.

Move the orphan-edge pass to run after the containment-cycle pass and
validate edges against a freshly-walked id set from the final child tree
(childrenD).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@thejesh23 thejesh23 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few concerns before this lands.

1. Layout is still the de-facto deduper. Step 3 (orphan-child) and step 4 (cycle) know they're removing nodes but don't drop their own dangling edges — they leave it to step 5 to clean up. Worse, when step 3 filters a parent it drops c.children wholesale, so every edge into that subtree's descendants gets silently dropped by step 5 too. If the orphan/cycle passes returned {children, removedIds} the edge filter could be local to each pass and the reason for each drop would be recoverable.

2. Observability is lossy. The issue message is "Dropped N edge(s) referencing nonexistent nodes" and the category is elk-orphan-edge regardless of whether the missing endpoint was an original ghost, an orphan-child drop, or a cycle drop. Combined with appendLayoutIssues deduping by level|message (store.ts:776), a second layout run that drops a different single edge produces the same string and is swallowed — so diff-impact re-layouts won't surface fresh losses. At minimum, include edge ids or the missing-endpoint id in the message.

3. Test gaps. The new test covers the orphan-child case but not the cycle case (step 4 removing a node that an edge targets), nor the cascading case where step 3 drops a parent and an edge into a now-vanished grandchild gets quietly dropped by step 5. Both are reachable from the same root cause and would lock the contract this PR is establishing.

Nit: // 4. dropCircularContainment comment block was renumbered but the inline doc on step 5 says "step 3 or step 4" — fine until someone renumbers again. Consider naming the steps instead of numbering them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants