- # Bounded Fan-Out Tiling
+ # Bounded Fanout Tiling

  `BlockPartitioning` is latency-oriented. It decomposes one dimension at a time
  and gives a shallow communication tree, but the local fan-out can be large.

  `BoundedFanout k` exposes this tradeoff as a tiling parameter.

- Large `k` gives shallow, block-partitioning-like schedules. Small feasible `k`
+ Large `k` approaches `BlockPartitioning` at the root; interior tiles continue to
+ observe the cap as they recurse, so the per-hop fan-out budget is honored
+ throughout the tree rather than only at the entry point. Small feasible `k`
  gives deeper, bisection-like schedules with lower local fan-out. The
  interpolation is behavioral rather than structural: `BoundedFanout` computes a
  local rectangular frontier, while `BlockPartitioning` and `Bisection` use
@@ -111,6 +113,13 @@ The fallback is local. A high-dimensional root may require more than the
  requested cap, but interior tiles often have fewer active dimensions after
  earlier splits. At those interior tiles, the requested cap is honored again.

+ When the requested cap exceeds the per-dimension floor, the surplus is
+ distributed across active dimensions in declaration order, up to each
+ dimension's capacity (`n - 1` for a dimension of size `n`). So `BoundedFanout 3`
+ over a `2 x 4` tile gives one frontier piece to dimension `0` (capacity `1`,
+ saturated) and two frontier pieces to dimension `1`. Dimension order matters
+ when the cap is between the floor and the maximum.
+
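The surplus rule can be sketched in Haskell as follows. This is a minimal illustration, not the code in `src/Tile/Tiling.hs`: the name `allocateGroupsSketch`, its argument order, and the choice to drop size-1 dimensions from the result are assumptions.

```haskell
-- Hypothetical sketch of the surplus rule; the name and signature are assumptions.
allocateGroupsSketch :: Int -> [Int] -> [Int]
allocateGroupsSketch k sizes = go surplus caps
  where
    -- Capacity of each active dimension: a dimension of size n > 1 holds n - 1 pieces.
    caps = [n - 1 | n <- sizes, n > 1]
    -- Every active dimension gets one piece first; what is left of the cap is surplus.
    surplus = max 0 (k - length caps)
    go _ [] = []
    go s (c : cs) =
      let extra = min s (c - 1) -- fill this dimension before moving to the next
       in (1 + extra) : go (s - extra) cs
```

Under these assumptions, `allocateGroupsSketch 3 [2, 4]` evaluates to `[1, 2]`, and `allocateGroupsSketch 3 [3, 3]` evaluates to `[2, 1]`, which is where declaration order becomes visible.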
  ## Relations

  A tiling returns structural children as `TileNode`s. Each child has a

  The root fan-out is bounded by `2`, and the depth increases accordingly.

+ ### Multi-dimensional frontier: `BoundedFanout 3` over `2 x 4`
+
+ For a `2 x 4` tile:
+
+ ```text
+ A B C D
+ E F G H
+ ```
+
+ dimension `0` has capacity `1` (size `2`), dimension `1` has capacity `3`
+ (size `4`). `minimumFanout = 2`; `effectiveFanout` with `k = 3` is `3`.
+ `allocateGroups` gives `[1, 2]`: one frontier piece for dim `0`, two for dim
+ `1`. The root frontier:
+
+ ```text
+ [E F G H]  dim 0, away-from-root
+ [B C]      dim 1, with dim 0 anchored
+ [D]        dim 1, with dim 0 anchored
+ ```
+
+ and the send tree:
+
+ ```text
+ A
+ ├─ E
+ │ ├─ F
+ │ ├─ G
+ │ └─ H
+ ├─ B
+ │ └─ C
+ └─ D
+ ```
+
+ Root fan-out is exactly `3`. At interior tile `E` (a `1 x 4` subtile),
+ `activeDims` shrinks to one, the per-dim floor drops to one, and the cap is
+ fully available, so `E`'s fan-out is again `3`. At `B` (a `1 x 2` subtile)
+ geometry caps fan-out at `1`.
+
+ ### Narrow rectangle with larger `k`: `BoundedFanout 4` over `1 x 8`
+
+ To show that `BoundedFanout` isn't just a relabelled bisection, raise `k`
+ above `2` on the same `1 x 8` tile. With `k = 4`, `allocateGroups` gives `[4]`,
+ `boundedIntervals 4 8` yields four intervals of sizes `[2, 2, 2, 1]`, and the
+ root frontier becomes:
+
+ ```text
+ [B C] [D E] [F G] [H]
+ ```
+
+ The send tree:
+
+ ```text
+ A
+ ├─ B
+ │ └─ C
+ ├─ D
+ │ └─ E
+ ├─ F
+ │ └─ G
+ └─ H
+ ```
+
+ Compare against bisection on the same `1 x 8` (fan-out 3, depth 3) and block
+ partitioning (fan-out 7, depth 1). `BoundedFanout 4` sits between them with
+ fan-out `4` and depth `2`, which is exactly the tunable point the parameter is
+ meant to expose. The intervals are sized by integer division (the first
+ `extra = remaining mod groups` intervals get one more element), so the
+ distribution is deterministic and interval sizes differ by at most one.
+
  ## Interpretation

  The tradeoff now lives in the tiling algebra:
@@ -216,9 +294,29 @@ Bisection
    low-fan-out reference point

  BoundedFanout k
-   tunable fan-out/depth tradeoff
+   tunable fan-out/depth tradeoff, honored at every tile in the tree
  ```

  The schedule and executor do not need special cases. They read the hop tree
  produced by the tiler.

+ ## When to pick which
+
+ - **`BlockPartitioning`** when latency dominates and per-hop fan-out is
+   unconstrained: a single hop reaches every member, and tree depth is
+   minimal.
+ - **`Bisection`** as a low-fan-out reference point for benchmarking or when
+   the per-hop budget is genuinely the smallest geometrically possible.
+ - **`BoundedFanout k`** when there is a known per-hop fan-out budget (network
+   fan-out cap, per-process outgoing connection limit, etc.). `k` is honored
+   at every interior tile, not just at the root.
+
+ ## Implementation
+
+ The implementation lives in `src/Tile/Tiling.hs`, which exports
+ `BoundedFanout`, `minimumFanout`, and `effectiveFanout`, and provides the
+ instance `Tiling BoundedFanout`. Helpers `allocateGroups`,
+ `boundedIntervals`, `frontierTile`, `anchorPrefix`, and `rootPointTile` are
+ internal. The same `Relation` algebra (`Anchor` / `Sibling`) is shared with
+ `BlockPartitioning` and `Bisection`; `contractAnchors` requires no changes
+ for the new tiler.
+