Commit d9a5ed1

Merge pull request #26 from jongalloway/docs
Add documentation for rendering pipeline, semantic model, testing, and theming
2 parents c9abde0 + bc78ffe commit d9a5ed1

6 files changed

Lines changed: 1483 additions & 0 deletions

doc/layout-architecture.md

Lines changed: 326 additions & 0 deletions
@@ -0,0 +1,326 @@
1+
# Layout Engine — Architecture & Roadmap
2+
3+
## 1. Purpose
4+
5+
This document describes the current DiagramForge layout engine, compares it
6+
with the layout system used by [mermaid-js/mermaid](https://github.com/mermaid-js/mermaid),
7+
and proposes an incremental improvement roadmap.
8+
9+
It is intended to inform contributors and maintainers making design decisions
10+
about layout quality, subgraph handling, edge routing, and when (or whether) to
11+
adopt an external graph-layout library.
12+
13+
---
14+
15+
## 2. Architecture Context
16+
17+
DiagramForge converts diagram text to SVG through a four-stage pipeline:
18+
19+
```
Diagram text → Parser → Semantic Model → Layout Engine → SVG Renderer
```
22+
23+
The **layout engine** is the third stage. It receives a fully populated
24+
`Diagram` (nodes, edges, groups, layout hints) and mutates each element's
25+
`X`, `Y`, `Width`, and `Height` in place. The renderer then reads those
26+
coordinates — it never computes positions itself.
27+
28+
The interface is intentionally minimal:
29+
30+
```csharp
public interface ILayoutEngine
{
    void Layout(Diagram diagram, Theme theme);
}
```
36+
37+
This makes the engine swappable: any implementation that assigns coordinates to
38+
every node and group satisfies the contract.
39+
40+
---
41+
42+
## 3. Current Implementation (`DefaultLayoutEngine`)
43+
44+
### 3.1 Sizing Pass
45+
46+
Each node's width is estimated from its label using a character-count
47+
heuristic:
48+
49+
```
width = max(MinNodeWidth, charCount × fontSize × 0.6 + 2 × NodePadding)
```
52+
53+
The constant `0.6` approximates the average glyph advance for Latin sans-serif
54+
fonts (Segoe UI, Inter, Arial). Height is uniform (`MinNodeHeight = 40`).
55+
56+
There is no DOM or font-metrics engine, so the estimate can be off by about ±10%. The
57+
padding budget absorbs most of the error.
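
As a rough illustration of the heuristic above, the sizing rule could be written as follows. This is a sketch, not the project's code: the `NodeSizing` class name and the `MinNodeWidth` and `NodePadding` values are assumptions made for the example; only the `0.6` factor and `MinNodeHeight = 40` come from the text.

```csharp
using System;

public static class NodeSizing
{
    // Assumed values for illustration; the real defaults may differ.
    public const double MinNodeWidth = 80;
    public const double NodePadding = 12;

    // Stated in the document.
    public const double MinNodeHeight = 40;
    public const double AverageGlyphAdvance = 0.6;

    // width = max(MinNodeWidth, charCount × fontSize × 0.6 + 2 × NodePadding)
    public static double EstimateWidth(string label, double fontSize)
    {
        double textWidth = label.Length * fontSize * AverageGlyphAdvance;
        return Math.Max(MinNodeWidth, textWidth + 2 * NodePadding);
    }
}
```

With these assumed constants a short label such as `"Start"` at 14 px falls back to the minimum width, while a 23-character label gets roughly 217 px.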
58+
59+
### 3.2 Layer Assignment (Ranking)
60+
61+
Nodes are assigned to layers (ranks) using **Kahn's algorithm** — a BFS-based
62+
topological sort:
63+
64+
1. Compute in-degree for every node from the edge list.
65+
2. Enqueue all nodes with in-degree 0 (roots) at layer 0.
66+
3. For each dequeued node, raise every successor's rank to
67+
`max(current_rank, parent_rank + 1)` and decrement its in-degree.
68+
4. When a successor's in-degree reaches 0, enqueue it.
69+
70+
This is equivalent to a **longest-path** heuristic: each node's rank is the
71+
length of the longest path leading to it from any root.
72+
73+
**Cycle handling:** Nodes that never reach in-degree 0 (cycle members and their descendants)
74+
are appended at incrementing ranks after the last BFS layer, in stable sorted
75+
order by node ID.
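
The ranking steps and the cycle fallback can be sketched as below. This is a minimal illustration under assumptions: nodes are represented by string IDs and edges by `(Source, Target)` tuples, and the `LayerAssignment` class is a hypothetical name, not the actual `DefaultLayoutEngine` code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LayerAssignment
{
    public static Dictionary<string, int> AssignRanks(
        IEnumerable<string> nodeIds,
        IEnumerable<(string Source, string Target)> edges)
    {
        var ids = nodeIds.ToList();
        var successors = ids.ToDictionary(id => id, _ => new List<string>());
        var inDegree = ids.ToDictionary(id => id, _ => 0);
        foreach (var (src, dst) in edges)
        {
            successors[src].Add(dst);
            inDegree[dst]++;
        }

        // Best (deepest) rank seen so far for each node.
        var candidate = ids.ToDictionary(id => id, _ => 0);
        var rank = new Dictionary<string, int>();
        var queue = new Queue<string>(ids.Where(id => inDegree[id] == 0));

        while (queue.Count > 0)
        {
            var node = queue.Dequeue();
            rank[node] = candidate[node];
            foreach (var next in successors[node])
            {
                candidate[next] = Math.Max(candidate[next], rank[node] + 1);
                if (--inDegree[next] == 0) queue.Enqueue(next);
            }
        }

        // Cycle fallback: leftover nodes get incrementing ranks after the
        // last BFS layer, in stable order by node ID.
        int nextRank = rank.Count == 0 ? 0 : rank.Values.Max() + 1;
        foreach (var id in ids.Where(id => !rank.ContainsKey(id))
                              .OrderBy(id => id, StringComparer.Ordinal))
        {
            rank[id] = nextRank++;
        }
        return rank;
    }
}
```

For a diamond `A→B`, `A→C`, `B→D`, `C→D` with an extra shortcut `A→D`, node `D` still lands at rank 2, which is the longest-path behaviour described above.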
76+
77+
### 3.3 Coordinate Assignment
78+
79+
Coordinates are assigned in a single forward pass based on `LayoutDirection`:
80+
81+
| Direction | Layer axis | Within-layer axis |
82+
|-----------|-----------|-------------------|
83+
| **TB / BT** | Each layer → horizontal row, advancing in Y | Nodes advance in X by their individual width + `HorizontalSpacing` |
84+
| **LR / RL** | Each layer → vertical column, advancing in X by the widest node in the column | Nodes stack in Y by uniform height + `VerticalSpacing` |
85+
86+
For **RL** and **BT**, a mirror step flips coordinates along the major axis
87+
after placement.
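
A sketch of the TB case, with the mirror step that turns it into BT, might look like this. The `Box` type, the `CoordinateAssignment` class, and the parameter names are stand-ins invented for the example; the real engine works on the `Diagram` model and reads spacing from the theme or layout hints.

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed class Box
{
    public double X { get; set; }
    public double Y { get; set; }
    public double Width { get; set; }
    public double Height { get; set; }
}

public static class CoordinateAssignment
{
    // Lays out each layer as a horizontal row (TB). With mirror = true the
    // result is flipped along the Y axis afterwards, giving BT.
    public static void PlaceRows(
        List<List<Box>> layers,
        double padding, double horizontalSpacing, double verticalSpacing,
        bool mirror = false)
    {
        double y = padding;
        foreach (var layer in layers)
        {
            double x = padding;
            foreach (var node in layer)
            {
                node.X = x;
                node.Y = y;
                x += node.Width + horizontalSpacing; // running offset in X
            }
            double rowHeight = layer.Count == 0 ? 0 : layer.Max(n => n.Height);
            y += rowHeight + verticalSpacing;
        }

        if (!mirror) return;

        var all = layers.SelectMany(l => l).ToList();
        if (all.Count == 0) return;
        double top = all.Min(n => n.Y);
        double bottom = all.Max(n => n.Y + n.Height);
        foreach (var n in all)
        {
            // Reflect each box within the [top, bottom] band.
            n.Y = top + bottom - (n.Y + n.Height);
        }
    }
}
```

The LR / RL case is the same loop with the axes swapped, advancing X by the widest node in each column.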
88+
89+
### 3.4 Group Bounding Boxes
90+
91+
Groups (subgraphs) are handled **post-hoc**: after all node positions are
92+
final, each group's bounding rectangle is computed from its member nodes plus
93+
padding. An extra top inset is added when the group has a label, reserving room
94+
for the header text.
95+
96+
If any group extends into negative coordinate space (label padding exceeds
97+
diagram padding), the entire diagram is shifted so nothing clips.
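
The bounding-box step and the corrective shift can be illustrated as follows. This is a sketch under assumptions: `Rect`, `GroupBounds`, and the `padding` / `labelInset` parameters are hypothetical names, not the engine's actual members.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record Rect(double X, double Y, double Width, double Height);

public static class GroupBounds
{
    // Bounding rectangle of the member nodes plus padding; labelInset is the
    // extra room reserved at the top when the group has a header label.
    public static Rect Compute(IReadOnlyCollection<Rect> members, double padding, double labelInset)
    {
        double left = members.Min(m => m.X) - padding;
        double top = members.Min(m => m.Y) - padding - labelInset;
        double right = members.Max(m => m.X + m.Width) + padding;
        double bottom = members.Max(m => m.Y + m.Height) + padding;
        return new Rect(left, top, right - left, bottom - top);
    }

    // Offset to apply to every element so that no rectangle starts at a
    // negative coordinate. Returns (0, 0) when nothing needs to move.
    public static (double Dx, double Dy) ShiftIntoView(IEnumerable<Rect> rects)
    {
        var list = rects.ToList();
        if (list.Count == 0) return (0, 0);
        double dx = Math.Max(0, -list.Min(r => r.X));
        double dy = Math.Max(0, -list.Min(r => r.Y));
        return (dx, dy);
    }
}
```

Because the shift is computed after all groups are known, it has to be applied uniformly to nodes and groups alike, otherwise the groups would drift away from their members.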
98+
99+
### 3.5 Edge Rendering
100+
101+
Edges are drawn by the SVG renderer (not the layout engine) as **cubic
102+
Bézier curves** between anchor points on node edges:
103+
104+
- The dominant direction (horizontal vs. vertical) between source and target
105+
determines whether anchors sit on the sides or top/bottom of the nodes.
106+
- Control points are offset by 40% of the gap distance, producing a smooth S-curve.
107+
- Edge labels are positioned at the midpoint of the start and end anchors.
108+
109+
There are no bend-points, no edge-routing around intervening nodes, and no
110+
crossing-avoidance logic.
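
To make the curve construction concrete, here is one way the path data could be produced from two anchor points. It is a sketch, not the renderer's code: `EdgePath` and its methods are hypothetical, and it assumes the anchors have already been chosen and that the control points are pushed along the dominant axis only.

```csharp
using System;
using System.Globalization;

public static class EdgePath
{
    // Builds SVG path data for a cubic Bézier between two anchors. Control
    // points sit 40% of the gap away from each anchor along the dominant axis.
    public static string CubicBezier(double x1, double y1, double x2, double y2)
    {
        double dx = x2 - x1, dy = y2 - y1;
        bool horizontal = Math.Abs(dx) >= Math.Abs(dy);

        double c1x = x1, c1y = y1, c2x = x2, c2y = y2;
        if (horizontal) { c1x += 0.4 * dx; c2x -= 0.4 * dx; }
        else            { c1y += 0.4 * dy; c2y -= 0.4 * dy; }

        return string.Format(CultureInfo.InvariantCulture,
            "M {0:0.##} {1:0.##} C {2:0.##} {3:0.##}, {4:0.##} {5:0.##}, {6:0.##} {7:0.##}",
            x1, y1, c1x, c1y, c2x, c2y, x2, y2);
    }

    // Edge labels sit at the midpoint of the two anchors.
    public static (double X, double Y) LabelPosition(double x1, double y1, double x2, double y2)
        => ((x1 + x2) / 2, (y1 + y2) / 2);
}
```

Formatting with `CultureInfo.InvariantCulture` matters here: SVG path data requires `.` as the decimal separator regardless of the machine's locale.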
111+
112+
### 3.6 Canvas Sizing
113+
114+
`ComputeWidth` / `ComputeHeight` scan all nodes **and** groups to find the
115+
maximum extent, then add `DiagramPadding`. This ensures group borders that
116+
extend beyond their member nodes are not clipped.
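
The extent scan is simple enough to show in a few lines. The method names match the ones mentioned above, but the signatures, the `Rect` type, and the `CanvasSizing` class are assumptions made for this sketch.

```csharp
using System.Collections.Generic;
using System.Linq;

public record Rect(double X, double Y, double Width, double Height);

public static class CanvasSizing
{
    // Right-most extent across nodes and groups, plus the diagram padding.
    public static double ComputeWidth(IEnumerable<Rect> nodes, IEnumerable<Rect> groups, double diagramPadding)
        => nodes.Concat(groups).Select(r => r.X + r.Width).DefaultIfEmpty(0.0).Max() + diagramPadding;

    // Bottom-most extent across nodes and groups, plus the diagram padding.
    public static double ComputeHeight(IEnumerable<Rect> nodes, IEnumerable<Rect> groups, double diagramPadding)
        => nodes.Concat(groups).Select(r => r.Y + r.Height).DefaultIfEmpty(0.0).Max() + diagramPadding;
}
```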
117+
118+
---
119+
120+
## 4. Mermaid.js Layout Architecture
121+
122+
Mermaid supports four pluggable layout engines, selectable per diagram:
123+
124+
| Engine | Algorithm family | Typical use |
125+
|--------|-----------------|-------------|
126+
| **dagre** (default) | Sugiyama layered graph | Flowcharts, state diagrams |
127+
| **ELK** | Eclipse Layout Kernel (Sugiyama + many variants) | Complex flowcharts, nested subgraphs |
128+
| **cose-bilkent** | Force-directed (spring-embedder) | Organic / unstructured layouts |
129+
| **tidy-tree** | Reingold-Tilford | Hierarchical trees, mindmaps |
130+
131+
### 4.1 Dagre (Default Flowchart Engine)
132+
133+
Dagre implements the full **Sugiyama pipeline**, which is the academic
134+
gold-standard for layered graph drawing:
135+
136+
1. **Cycle removal** — back-edges are reversed to make the graph a DAG.
137+
2. **Layer assignment** — network simplex algorithm (optimal-depth ranking,
138+
minimises total edge length).
139+
3. **Crossing minimization** — barycenter / median heuristics, iterated over
140+
multiple up-down sweeps to reorder nodes within each layer.
141+
4. **Coordinate assignment** — Brandes-Köpf algorithm for compact, balanced
142+
positioning that minimises white-space and keeps nodes aligned with their
143+
predecessors.
144+
5. **Edge routing** — polylines or splines with computed control points to
145+
route around nodes and minimise crossings.
146+
147+
### 4.2 ELK
148+
149+
ELK provides the same Sugiyama pipeline with additional capabilities:
150+
151+
- Native support for **compound graphs** (nested subgraphs are laid out as
152+
first-class containers that influence node placement).
153+
- More layout variants (force-based, stress-majorization, radial).
154+
- Configurable per-graph, per-node, and per-edge options.
155+
156+
### 4.3 Text Measurement
157+
158+
Mermaid runs in a browser and uses **DOM `getBBox()`** for pixel-accurate text
159+
measurement. This means node sizes are exact, not estimated.
160+
161+
---
162+
163+
## 5. Comparison
164+
165+
| Aspect | DiagramForge (`DefaultLayoutEngine`) | Mermaid (dagre) |
166+
|--------|--------------------------------------|-----------------|
167+
| **Layer assignment** | BFS longest-path (Kahn's) | Network simplex (optimal) |
168+
| **Crossing minimization** | None (stable ID sort within layers) | Barycenter/median, multi-sweep |
169+
| **Coordinate assignment** | Running offset per layer | Brandes-Köpf (compact, balanced) |
170+
| **Cycle handling** | Append to end layers; no edge reversal | Reverse back-edges; full DAG treatment |
171+
| **Subgraph clustering** | Post-hoc bounding box | Layout-aware: dagre `setParent`, ELK compound nodes |
172+
| **Edge routing** | Cubic Bézier from anchor to anchor; no obstacle avoidance | Polylines / splines with control points routed around nodes |
173+
| **Text measurement** | `charCount × fontSize × 0.6` heuristic | Browser DOM `getBBox()` (pixel-accurate) |
174+
| **Pluggable engines** | 1 (behind `ILayoutEngine`) | 4 (dagre, ELK, cose-bilkent, tidy-tree) |
175+
| **Direction support** | TB, BT, LR, RL | TB, BT, LR, RL |
176+
| **Runtime dependency** | None (pure .NET, no browser) | Browser DOM required |
177+
178+
### 5.1 Where DiagramForge Is Good Enough
179+
180+
For **simple to moderate flowcharts** — linear chains, trees, small DAGs with
181+
a handful of subgraphs — BFS layering + grid placement produces clean,
182+
readable output comparable to dagre's. The variable-width sizing pass and
183+
four-direction support cover the most common real-world cases.
184+
185+
### 5.2 Where the Gap Shows
186+
187+
| Scenario | Symptom |
188+
|----------|---------|
189+
| Many parallel paths with cross-links | Unnecessary edge crossings (no reordering) |
190+
| Dense DAGs (wide fan-out + fan-in) | Overly wide layers and long edges (longest-path ranking minimises layer count, not width or total edge length) |
191+
| Nested / overlapping subgraphs | Group rectangles may overlap because members interleave in the same BFS layer |
192+
| Cycles (back-edges) | Cycle nodes are pushed to the end instead of being naturally integrated |
193+
| Long label text | Width estimate can be off by ~10%, occasionally clipping or excess padding |
194+
| Back-edges in flowcharts | No visual distinction (edge appears as a forward long-distance edge) |
195+
196+
---
197+
198+
## 6. Improvement Roadmap
199+
200+
The improvements below are ordered by **impact-to-effort ratio**. Each is
201+
independently shippable and testable.
202+
203+
### Phase 1 — Quick Wins
204+
205+
#### 1a. Crossing Minimization (Barycenter Heuristic)
206+
207+
**Impact:** Large visual improvement for any graph with ≥ 3 nodes per layer.
208+
209+
After layer assignment, reorder nodes within each layer to minimise edge
210+
crossings:
211+
212+
1. For each node in layer *i*, compute the **barycenter** (average position of
213+
connected nodes in layers *i−1* and *i+1*).
214+
2. Sort the layer by barycenter.
215+
3. Repeat top-down then bottom-up for 4–8 sweeps (convergence is fast).
216+
217+
This is a well-understood algorithm with no API or model changes required — it
218+
operates purely on the layer lists before coordinate assignment.
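
A minimal sketch of the sweep is shown below, assuming layers are lists of node IDs and edges are `(Source, Target)` tuples; `CrossingMinimization` is a hypothetical name. Note one simplification: each pass uses only the adjacent layer that was just fixed (the layer above when sweeping down, the layer below when sweeping up), which is the common formulation, rather than averaging over both neighbours at once.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CrossingMinimization
{
    // Reorders `layer` in place by the mean index of each node's neighbours
    // in `fixedLayer`. Nodes without neighbours there keep their position.
    private static void SortByBarycenter(
        List<string> layer, List<string> fixedLayer, ILookup<string, string> neighbours)
    {
        var index = new Dictionary<string, int>();
        for (int i = 0; i < fixedLayer.Count; i++) index[fixedLayer[i]] = i;

        double Barycenter(string id)
        {
            var positions = neighbours[id]
                .Where(n => index.ContainsKey(n))
                .Select(n => (double)index[n])
                .ToList();
            return positions.Count > 0 ? positions.Average() : layer.IndexOf(id);
        }

        // OrderBy is a stable sort, so ties keep their current order.
        var sorted = layer.OrderBy(id => Barycenter(id)).ToList();
        layer.Clear();
        layer.AddRange(sorted);
    }

    public static void Sweep(
        List<List<string>> layers,
        IEnumerable<(string Source, string Target)> edges,
        int sweeps = 4)
    {
        // Treat edges as undirected when looking up neighbours.
        var neighbours = edges
            .SelectMany(e => new[] { (Key: e.Source, Other: e.Target), (Key: e.Target, Other: e.Source) })
            .ToLookup(p => p.Key, p => p.Other);

        for (int s = 0; s < sweeps; s++)
        {
            for (int i = 1; i < layers.Count; i++)          // top-down
                SortByBarycenter(layers[i], layers[i - 1], neighbours);
            for (int i = layers.Count - 2; i >= 0; i--)     // bottom-up
                SortByBarycenter(layers[i], layers[i + 1], neighbours);
        }
    }
}
```

For two layers `[A, B]` and `[X, Y]` with edges `A→Y` and `B→X`, a single sweep reorders the second layer to `[Y, X]` and removes the crossing.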
219+
220+
#### 1b. Cluster-Aware Layer Ordering
221+
222+
**Impact:** Subgraph rectangles stop overlapping in most practical cases.
223+
224+
After barycenter reordering, apply a **grouping constraint**: nodes belonging
225+
to the same `Group` should occupy **adjacent** slots within their layer. This is
226+
a sorting tie-breaker, not a hard constraint, so it does not conflict with
227+
crossing minimisation.
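
One possible reading of this constraint, sketched under assumptions: after barycenter ordering, members of the same group are pulled together and each cluster is placed at the average position its members had. The `ClusterOrdering` class and the `groupOf` map (node ID to group ID, absent for ungrouped nodes) are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ClusterOrdering
{
    public static List<string> GroupAdjacent(
        IReadOnlyList<string> orderedLayer,
        IReadOnlyDictionary<string, string> groupOf)
    {
        // Ungrouped nodes form singleton clusters keyed by their own ID.
        var slots = orderedLayer.Select((id, pos) => (
            Id: id,
            Pos: pos,
            Key: groupOf.TryGetValue(id, out var grp) ? "g:" + grp : "n:" + id));

        return slots
            .GroupBy(s => s.Key)
            .OrderBy(cluster => cluster.Average(s => s.Pos))   // stable: ties keep first-seen order
            .SelectMany(cluster => cluster.OrderBy(s => s.Pos)) // keep barycenter order inside a cluster
            .Select(s => s.Id)
            .ToList();
    }
}
```

Given a layer `[a, x, b, y]` where `a` and `b` share a group, the result is `[a, b, x, y]`. This can reintroduce a crossing that barycenter ordering had removed, so it is a trade-off rather than a free improvement.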
228+
229+
### Phase 2 — Structural Improvements
230+
231+
#### 2a. Edge Reversal for Cycles
232+
233+
Instead of appending cycle members to the end, reverse back-edges to create a
234+
proper DAG (the standard Sugiyama preprocessing step). This allows cycle nodes
235+
to participate in normal layer assignment and appear at their natural depth.
236+
The reversed edges are flagged and rendered with a distinct visual treatment
237+
(e.g., dashed with a curved-back arrowhead).
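
A common way to find the edges to reverse is a depth-first search that flags any edge pointing at a node still on the DFS stack. The sketch below takes that approach; it is an assumption about how this step could be done, not a description of existing code, and `CycleRemoval` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CycleRemoval
{
    // Returns the edge list with every DFS back-edge flipped and flagged, so
    // the result is acyclic (self-loops excepted, which need separate handling).
    public static List<(string Source, string Target, bool Reversed)> MakeAcyclic(
        IEnumerable<string> nodeIds,
        IEnumerable<(string Source, string Target)> edges)
    {
        var edgeList = edges.ToList();
        var outgoing = edgeList.ToLookup(e => e.Source);
        var state = new Dictionary<string, int>(); // absent = unvisited, 1 = on stack, 2 = done
        var backEdges = new HashSet<(string, string)>();

        void Visit(string node)
        {
            state[node] = 1;
            foreach (var e in outgoing[node])
            {
                state.TryGetValue(e.Target, out int s);
                if (s == 1) backEdges.Add((e.Source, e.Target)); // target is an ancestor
                else if (s == 0) Visit(e.Target);
            }
            state[node] = 2;
        }

        foreach (var id in nodeIds)
        {
            if (!state.ContainsKey(id)) Visit(id);
        }

        return edgeList
            .Select(e => backEdges.Contains((e.Source, e.Target))
                ? (Source: e.Target, Target: e.Source, Reversed: true)
                : (Source: e.Source, Target: e.Target, Reversed: false))
            .ToList();
    }
}
```

The `Reversed` flag is what the renderer would later use to draw the edge in its original direction with the distinct visual treatment. Which edge of a cycle gets reversed depends on the DFS start order, so iterating nodes in a stable order keeps the output deterministic.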
238+
239+
#### 2b. Edge Routing with Bend-Points
240+
241+
Add a post-layout pass that computes bend-points for edges that would otherwise
242+
cross through intervening nodes:
243+
244+
1. For each edge, check whether the straight path intersects any node bounding
245+
box.
246+
2. If so, route the edge around the obstacle using a simple offset algorithm.
247+
3. Store bend-points as a `List<(double X, double Y)>` on the `Edge` model.
248+
4. The renderer draws a polyline or multi-segment Bézier through the points.
249+
250+
This requires adding a `BendPoints` property to `Edge` and updating
251+
`AppendEdge` in `SvgRenderer`.
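
Steps 1 to 3 could be prototyped along these lines. The intersection test is the standard Liang-Barsky segment clip; the detour rule is a deliberately naive stand-in for the "simple offset algorithm", assumes a downward (TB) edge, and does not re-check the detour segments against other nodes. `Rect`, `EdgeRouting`, and the `clearance` parameter are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public record Rect(double X, double Y, double Width, double Height);

public static class EdgeRouting
{
    // Liang-Barsky test: does the segment a→b intersect the rectangle?
    public static bool SegmentHitsRect((double X, double Y) a, (double X, double Y) b, Rect r)
    {
        double t0 = 0, t1 = 1, dx = b.X - a.X, dy = b.Y - a.Y;

        bool Clip(double p, double q)
        {
            if (p == 0) return q >= 0; // parallel to this boundary
            double t = q / p;
            if (p < 0) { if (t > t1) return false; if (t > t0) t0 = t; }
            else       { if (t < t0) return false; if (t < t1) t1 = t; }
            return true;
        }

        return Clip(-dx, a.X - r.X) && Clip(dx, r.X + r.Width - a.X)
            && Clip(-dy, a.Y - r.Y) && Clip(dy, r.Y + r.Height - a.Y);
    }

    // For each blocking node, add two bend-points that pass it on the side
    // nearer to the edge, `clearance` pixels away from the box.
    public static List<(double X, double Y)> Route(
        (double X, double Y) start, (double X, double Y) end,
        IEnumerable<Rect> obstacles, double clearance = 10)
    {
        var bends = new List<(double X, double Y)>();
        foreach (var r in obstacles.OrderBy(o => o.Y))
        {
            if (!SegmentHitsRect(start, end, r)) continue;
            double edgeX = (start.X + end.X) / 2;
            double centre = r.X + r.Width / 2;
            double detourX = edgeX <= centre ? r.X - clearance : r.X + r.Width + clearance;
            bends.Add((detourX, r.Y - clearance));
            bends.Add((detourX, r.Y + r.Height + clearance));
        }
        return bends;
    }
}
```

The returned list has the same shape as the proposed `BendPoints` property, so a renderer could draw `start`, the bend-points, and `end` as a polyline.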
252+
253+
#### 2c. Improved Text Measurement
254+
255+
Replace the fixed `0.6` constant with a **per-character width table** for
256+
common Latin glyphs (derived from Inter or Segoe UI metrics). This narrows the
257+
error band from ±10% to ±2% without requiring a font-loading library.
258+
259+
Alternatively, accept an optional `Func<string, double, double>` text-measure
260+
delegate via `Theme` or `LayoutHints`, allowing callers with access to real
261+
font metrics to plug them in.
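
Both options can coexist, as the sketch below suggests: a width table with a fallback, and an optional delegate that overrides it. Everything here is hypothetical, including the `TextMeasure` class and the assumed delegate parameter order (text, font size) returning a width. The advance values are illustrative placeholders only and are not taken from real Inter or Segoe UI metrics.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public static class TextMeasure
{
    // Placeholder advances as a fraction of the font size. Real values would
    // have to be derived from the actual font files.
    private static readonly Dictionary<char, double> Advance = new()
    {
        ['i'] = 0.25, ['l'] = 0.25, [' '] = 0.3,
        ['m'] = 0.85, ['w'] = 0.8, ['M'] = 0.9, ['W'] = 0.95,
    };

    // Characters missing from the table fall back to the existing constant.
    private const double FallbackAdvance = 0.6;

    public static double TableWidth(string text, double fontSize)
        => text.Sum(c => Advance.TryGetValue(c, out var a) ? a : FallbackAdvance) * fontSize;

    // Uses the caller-supplied measurer when present, else the table.
    public static double Measure(
        string text, double fontSize, Func<string, double, double>? custom = null)
        => custom is null ? TableWidth(text, fontSize) : custom(text, fontSize);
}
```

A caller with access to real font metrics would pass its own function as `custom`; everyone else gets the table-based estimate with no extra dependency.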
262+
263+
### Phase 3 — Advanced Layout Engines
264+
265+
#### 3a. Network Simplex Ranking
266+
267+
Replace the BFS longest-path ranking with the **network simplex** algorithm
268+
for minimum-total-edge-length layer assignment. This produces tighter, more
269+
balanced layouts for complex DAGs.
270+
271+
This is the most algorithm-intensive change; it can be implemented from the
272+
classic Gansner et al. (1993) paper or adapted from the dagre source
273+
(MIT-licensed).
274+
275+
#### 3b. Force-Directed Layout Engine
276+
277+
Add a second `ILayoutEngine` implementation for **non-hierarchical** diagrams
278+
(entity-relationship, organic networks) using a basic spring-embedder (Fruchterman-Reingold).
279+
280+
This would be selected by parsers that produce diagrams without a clear
281+
directional flow.
282+
283+
#### 3c. Tree Layout Engine
284+
285+
Add a **Reingold-Tilford** implementation for strict tree structures (mindmaps,
286+
org charts, hierarchies). The conceptual DSL's `hierarchy` and `mindmap`
287+
diagram types would benefit from this over the BFS grid.
288+
289+
### Phase 4 — External Integration (Optional)
290+
291+
#### 4a. MSAGL Integration
292+
293+
[Microsoft Automatic Graph Layout (MSAGL)](https://github.com/microsoft/automatic-graph-layout)
294+
is a mature .NET graph layout library with Sugiyama, force-directed, and
295+
large-graph layout algorithms. Wrapping MSAGL behind `ILayoutEngine` would
296+
bring production-grade layout quality to DiagramForge without reimplementing
297+
the core algorithms.
298+
299+
**Trade-offs:** MSAGL is a significant dependency (~2 MB, MIT-licensed). It
300+
would be offered as an optional package (e.g., `DiagramForge.Layout.Msagl`),
301+
not a core requirement.
302+
303+
---
304+
305+
## 7. Decision Framework
306+
307+
When choosing which improvements to prioritise, consider:
308+
309+
| Factor | Guidance |
310+
|--------|----------|
311+
| **Diagram complexity ceiling** | If target users rarely exceed 10–15 nodes, Phase 1 alone may be sufficient. |
312+
| **Subgraph usage** | If Mermaid subgraph support is a key feature, Phase 1b + 2b are high priority. |
313+
| **Non-flowchart diagrams** | Conceptual diagrams (cycle, pyramid, matrix) use specialised layout logic in their parsers, so the general engine matters less for those. |
314+
| **Dependency tolerance** | MSAGL (Phase 4) gives the most layout quality per line of code, but adds a large dependency. |
315+
| **Rendering fidelity** | If output is used in slides / print, text measurement accuracy (Phase 2c) matters more than in web previews. |
316+
317+
---
318+
319+
## 8. References
320+
321+
- Gansner, Koutsofios, North, Vo. "A Technique for Drawing Directed Graphs" (1993) — network simplex, Sugiyama pipeline.
322+
- Brandes, Köpf. "Fast and Simple Horizontal Coordinate Assignment" (2001) — coordinate assignment.
323+
- [dagrejs/dagre](https://github.com/dagrejs/dagre) — MIT-licensed JavaScript Sugiyama implementation used by Mermaid.
324+
- [Eclipse ELK](https://www.eclipse.org/elk/) — Eclipse Layout Kernel.
325+
- [microsoft/automatic-graph-layout (MSAGL)](https://github.com/microsoft/automatic-graph-layout) — .NET graph layout library.
326+
- [Mermaid Layouts documentation](https://mermaid.js.org/config/layouts.html) — supported layout engines in Mermaid.
