Commit 1338794

add some bounded fanout tests over slices
1 parent 2b018ec, commit 1338794

2 files changed

Lines changed: 135 additions & 3 deletions

File tree

article/bounded-fanout.md

Lines changed: 101 additions & 3 deletions
Original file line number, diff line number, diff line change
@@ -1,4 +1,4 @@
1-
# Bounded Fan-Out Tiling
1+
# Bounded Fanout Tiling
22

33
`BlockPartitioning` is latency-oriented. It decomposes one dimension at a time
44
and gives a shallow communication tree, but the local fan-out can be large.
@@ -38,7 +38,9 @@ A
3838

3939
`BoundedFanout k` exposes this tradeoff as a tiling parameter.
4040

41-
Large `k` gives shallow, block-partitioning-like schedules. Small feasible `k`
41+
Large `k` approaches `BlockPartitioning` at the root; interior tiles continue to
42+
observe the cap as they recurse, so the per-hop fan-out budget is honored
43+
throughout the tree rather than only at the entry point. Small feasible `k`
4244
gives deeper, bisection-like schedules with lower local fan-out. The
4345
interpolation is behavioral rather than structural: `BoundedFanout` computes a
4446
local rectangular frontier, while `BlockPartitioning` and `Bisection` use
@@ -111,6 +113,13 @@ The fallback is local. A high-dimensional root may require more than the
111113
requested cap, but interior tiles often have fewer active dimensions after
112114
earlier splits. At those interior tiles, the requested cap is honored again.
113115

116+
When the requested cap exceeds the per-dimension floor, the surplus is
117+
distributed across active dimensions in declaration order, up to each
118+
dimension's capacity (`n - 1` for a dimension of size `n`). So `BoundedFanout 3`
119+
over a `2 x 4` tile gives one frontier piece to dimension `0` (capacity `1`,
120+
saturated) and two frontier pieces to dimension `1` — the dim order matters
121+
when the cap is between the floor and the maximum.
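
The allocation rule in this paragraph can be sketched in a few lines of Haskell. This is not the `allocateGroups` helper from `src/Tile/Tiling.hs`: the name `allocateSketch`, its argument order, and the clamping of the budget to the floor and to the total capacity are assumptions made for illustration. Capacities are given in declaration order.

```haskell
-- Hypothetical sketch of surplus distribution; not the library code.
-- `k` is the requested cap; `caps` lists each dimension's capacity
-- (n - 1 for a dimension of size n) in declaration order.
allocateSketch :: Int -> [Int] -> [Int]
allocateSketch k caps = go surplus caps
  where
    -- Floor: one frontier piece per dimension that can still split.
    floorTotal = length (filter (> 0) caps)
    -- Assumed clamping: raise k to the floor, lower it to the total capacity.
    budget = min (sum caps) (max k floorTotal)
    surplus = budget - floorTotal

    go _ [] = []
    go s (c : cs)
      | c <= 0 = 0 : go s cs -- inactive dimension gets nothing
      | otherwise =
          let extra = min s (c - 1) -- room left above the floor piece
           in (1 + extra) : go (s - extra) cs
```

Under these assumptions, `allocateSketch 3 [1, 3]` evaluates to `[1, 2]`: dimension `0` is already saturated by its floor piece, so the single surplus piece goes to dimension `1`. With two roomy dimensions, `allocateSketch 3 [3, 3]` evaluates to `[2, 1]`, which is where declaration order becomes visible.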
122+
114123
## Relations
115124

116125
A tiling returns structural children as `TileNode`s. Each child has a
@@ -204,6 +213,75 @@ A
204213

205214
The root fan-out is bounded by `2`, and the depth increases accordingly.
206215

216+
### Multi-dimensional frontier: `BoundedFanout 3` over `2 x 4`
217+
218+
For a `2 x 4` tile:
219+
220+
```text
221+
A B C D
222+
E F G H
223+
```
224+
225+
dimension `0` has capacity `1` (size `2`), dimension `1` has capacity `3`
226+
(size `4`). `minimumFanout = 2`; `effectiveFanout` with `k = 3` is `3`.
227+
`allocateGroups` gives `[1, 2]` — one frontier piece for dim `0`, two for dim
228+
`1`. The root frontier:
229+
230+
```text
231+
[E F G H] dim 0, away-from-root
232+
[B C] dim 1, with dim 0 anchored
233+
[D] dim 1, with dim 0 anchored
234+
```
235+
236+
and the send tree:
237+
238+
```text
239+
A
240+
├─ E
241+
│ ├─ F
242+
│ ├─ G
243+
│ └─ H
244+
├─ B
245+
│ └─ C
246+
└─ D
247+
```
248+
249+
Root fan-out is exactly `3`. At interior tile `E` (a `1 x 4` subtile),
250+
`activeDims` shrinks to one, the per-dim floor drops to one, and the cap is
251+
fully available — `E`'s fan-out is again `3`. At `B` (a `1 x 2` subtile)
252+
geometry caps fan-out at `1`.
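
The three fan-out values quoted above (`3` at the root, `3` at `E`, `1` at `B`) follow from a floor and a ceiling that depend only on the tile's shape. The sketch below works on a plain list of dimension sizes rather than on the library's `Tile` type. The `...Sketch` names are hypothetical, and whether the real `effectiveFanout` applies the geometric ceiling itself is an assumption.

```haskell
-- Hypothetical helpers over a plain shape (a list of dimension sizes).
-- They mirror names used in the text but are not the library functions.

-- A dimension is active while it still has more than one member.
activeDims :: [Int] -> [Int]
activeDims = filter (> 1)

-- Floor: one frontier piece per active dimension.
minimumFanoutSketch :: [Int] -> Int
minimumFanoutSketch = length . activeDims

-- Ceiling imposed by geometry: n - 1 pieces for a dimension of size n.
maximumFanoutSketch :: [Int] -> Int
maximumFanoutSketch = sum . map (subtract 1) . activeDims

-- Requested cap k, raised to the floor and (by assumption) lowered to the ceiling.
effectiveFanoutSketch :: [Int] -> Int -> Int
effectiveFanoutSketch shape k =
  min (maximumFanoutSketch shape) (max k (minimumFanoutSketch shape))
```

With `k = 3`, the shapes `[2, 4]`, `[1, 4]`, and `[1, 2]` give `3`, `3`, and `1` respectively, matching the root, `E`, and `B` in the tree above. A cap below the floor, such as `k = 1` on `[2, 4]`, is raised to `2`.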
253+
254+
### Narrow rectangle with larger `k`: `BoundedFanout 4` over `1 x 8`
255+
256+
To show that `BoundedFanout` isn't just a relabelled bisection, raise `k`
257+
above `2` on the same `1 x 8` tile. With `k = 4`, `allocateGroups` gives `[4]`,
258+
`boundedIntervals 4 8` yields four intervals of sizes `[2, 2, 2, 1]`, and the
259+
root frontier becomes:
260+
261+
```text
262+
[B C] [D E] [F G] [H]
263+
```
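
A minimal sketch of how the interval sizes above could be computed. It is not the library's `boundedIntervals`: the function names, the argument order (groups first, then dimension size), and the half-open offset convention are assumptions. The root occupies offset `0`, so only `n - 1` members are distributed.

```haskell
-- Hypothetical sketch; names and conventions are illustrative only.

-- Sizes of at most `groups` intervals covering the n - 1 non-root members.
-- Any remainder goes to the leading intervals, one extra member each.
intervalSizesSketch :: Int -> Int -> [Int]
intervalSizesSketch groups n
  | groups <= 0 || remaining <= 0 = []
  | otherwise = replicate extra (base + 1) ++ replicate (g - extra) base
  where
    remaining = n - 1
    g = min groups remaining -- never more intervals than members
    (base, extra) = remaining `divMod` g

-- Half-open (begin, end) offsets along the dimension, starting after the root.
intervalsSketch :: Int -> Int -> [(Int, Int)]
intervalsSketch groups n = zip starts (drop 1 starts)
  where
    starts = scanl (+) 1 (intervalSizesSketch groups n)
```

Here `intervalSizesSketch 4 8` evaluates to `[2, 2, 2, 1]` and `intervalsSketch 4 8` to `[(1,3),(3,5),(5,7),(7,8)]`, which corresponds to `[B C] [D E] [F G] [H]` when `A` sits at offset `0`.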
264+
265+
The send tree:
266+
267+
```text
268+
A
269+
├─ B
270+
│ └─ C
271+
├─ D
272+
│ └─ E
273+
├─ F
274+
│ └─ G
275+
└─ H
276+
```
277+
278+
Compare against bisection on the same `1 x 8` (fan-out 3, depth 3) and block
279+
partitioning (fan-out 7, depth 1). `BoundedFanout 4` sits between them with
280+
fan-out `4` and depth `2` — exactly the tunable point the parameter is meant
281+
to expose. The intervals are sized by integer division (the first `extra =
282+
remaining mod groups` intervals get one more element), so distribution is
283+
deterministic and balanced.
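
The fan-out and depth figures for the one-dimensional case can be checked with a small model. The sketch below is not the library's tiler: `Send`, `sendTree`, `maxFanout`, and `depth` are hypothetical names, members are plain characters, and the model assumes `k >= 1` and that the first member of each interval becomes the child that serves the rest of that interval.

```haskell
-- Hypothetical 1-D model of the send tree; not the library's types.
data Send = Send Char [Send]
  deriving (Eq, Show)

-- Interval sizes for `remaining` non-root members under cap k;
-- the remainder goes to the leading intervals.
frontierSizes :: Int -> Int -> [Int]
frontierSizes k remaining
  | k <= 0 || remaining <= 0 = []
  | otherwise = replicate extra (base + 1) ++ replicate (groups - extra) base
  where
    groups = min k remaining
    (base, extra) = remaining `divMod` groups

-- Split a list into consecutive chunks of the given sizes.
splitSizes :: [Int] -> [a] -> [[a]]
splitSizes [] _ = []
splitSizes (n : ns) xs = let (h, t) = splitAt n xs in h : splitSizes ns t

-- Root `r` forwards to the head of each interval; each head recurses
-- over the rest of its interval with the same cap.
sendTree :: Int -> Char -> String -> Send
sendTree k r rest =
  Send r [sendTree k c cs | (c : cs) <- splitSizes (frontierSizes k (length rest)) rest]

-- Largest number of children at any node.
maxFanout :: Send -> Int
maxFanout (Send _ kids) = maximum (length kids : map maxFanout kids)

-- Number of hops on the longest root-to-leaf path.
depth :: Send -> Int
depth (Send _ []) = 0
depth (Send _ kids) = 1 + maximum (map depth kids)
```

In this model, `sendTree 4 'A' "BCDEFGH"` has `maxFanout` `4` and `depth` `2`, and `sendTree 7 'A' "BCDEFGH"` has `maxFanout` `7` and `depth` `1`, the block-partitioning-like end of the range.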
284+
207285
## Interpretation
208286

209287
The tradeoff now lives in the tiling algebra:
@@ -216,9 +294,29 @@ Bisection
216294
low-fan-out reference point
217295
218296
BoundedFanout k
219-
tunable fan-out/depth tradeoff
297+
tunable fan-out/depth tradeoff, honored at every tile in the tree
220298
```
221299

222300
The schedule and executor do not need special cases. They read the hop tree
223301
produced by the tiler.
224302

303+
## When to pick which
304+
305+
- **`BlockPartitioning`** when latency dominates and per-hop fan-out is
306+
unconstrained — a single hop reaches every member, and tree depth is
307+
minimal.
308+
- **`Bisection`** as a low-fan-out reference point for benchmarking or when
309+
the per-hop budget is genuinely the smallest geometrically possible.
310+
- **`BoundedFanout k`** when there is a known per-hop fan-out budget (network
311+
fan-out cap, per-process outgoing connection limit, etc.). `k` is honored
312+
at every interior tile, not just at the root.
313+
314+
## Implementation
315+
316+
`src/Tile/Tiling.hs`. Exports `BoundedFanout`, `minimumFanout`, and
317+
`effectiveFanout`, plus the instance `Tiling BoundedFanout`. Helpers `allocateGroups`,
318+
`boundedIntervals`, `frontierTile`, `anchorPrefix`, and `rootPointTile` are
319+
internal. The same `Relation` algebra (`Anchor` / `Sibling`) is shared with
320+
`BlockPartitioning` and `Bisection`; `contractAnchors` requires no changes
321+
for the new tiler.
322+

test/Main.hs

Lines changed: 34 additions & 0 deletions
Original file line number, diff line number, diff line change
@@ -68,7 +68,9 @@ theoremTests =
6868
testProperty "T3 affine slicing is closed and included in its parent" propAffineSliceIncluded,
6969
testProperty "T4 structural and communication children are included in their parent" propChildrenIncluded,
7070
testProperty "T5 bounded fanout respects effective fan-out" propBoundedFanoutChildren,
71+
testProperty "T5s bounded fanout respects effective fan-out on slices" propBoundedFanoutChildrenSliced,
7172
testProperty "T6 bounded fanout honors achievable caps" propBoundedFanoutHonorsAchievableCap,
73+
testProperty "T6s bounded fanout honors achievable caps on slices" propBoundedFanoutHonorsAchievableCapSliced,
7274
testProperty "T7 fault-free schedules form a spanning send tree" propFaultFreeScheduleSpansTile,
7375
testProperty "T8 occluded schedules deliver exactly to live members" propOccludedScheduleCoversLiveMembers
7476
]
@@ -158,6 +160,27 @@ propBoundedFanoutHonorsAchievableCap =
158160
in minimumFanout parent <= k ==>
159161
length (children (BoundedFanout k) parent) <= k
160162

163+
propBoundedFanoutChildrenSliced :: Property
164+
propBoundedFanoutChildrenSliced =
165+
forAll genAffineSlice $ \(shape, dim, begin, end, step) ->
166+
forAll (chooseInt (1, 8)) $ \k ->
167+
case select (rowMajor shape) dim begin end step of
168+
Nothing -> property True
169+
Just sliced ->
170+
let parent = Tile sliced
171+
in property (length (children (BoundedFanout k) parent) <= effectiveFanout parent k)
172+
173+
propBoundedFanoutHonorsAchievableCapSliced :: Property
174+
propBoundedFanoutHonorsAchievableCapSliced =
175+
forAll genAffineSlice $ \(shape, dim, begin, end, step) ->
176+
forAll (chooseInt (1, 8)) $ \k ->
177+
case select (rowMajor shape) dim begin end step of
178+
Nothing -> property True
179+
Just sliced ->
180+
let parent = Tile sliced
181+
in minimumFanout parent <= k ==>
182+
length (children (BoundedFanout k) parent) <= k
183+
161184
propFaultFreeScheduleSpansTile :: Property
162185
propFaultFreeScheduleSpansTile =
163186
forAll genShape $ \shape ->
@@ -445,6 +468,17 @@ inclusionTests =
445468
map tileRanks (children BlockPartitioning middleColumns)
446469
@?= [[5, 6], [2]]
447470
mapM_ (assertRanksIncludedIn middleColumns) (children BlockPartitioning middleColumns),
471+
testCase "BoundedFanout 2 over a sliced tile preserves base-rank frame" $ do
472+
let full = rootTile [2, 4]
473+
middleColumns = expectTile "middle columns" (Tile <$> select (space full) 1 1 3 1)
474+
-- slice members: {1, 2, 5, 6}; root rank 1.
475+
root middleColumns @?= 1
476+
sort (tileRanks middleColumns) @?= [1, 2, 5, 6]
477+
-- communication children: sibling rank 5 (covers {5, 6}) and sibling rank 2.
478+
let kids = children (BoundedFanout 2) middleColumns
479+
map root kids @?= [5, 2]
480+
map (sort . tileRanks) kids @?= [[5, 6], [2]]
481+
mapM_ (assertRanksIncludedIn middleColumns) kids,
448482
testCase "occluded schedule over jagged region sends only to live members" $ do
449483
let members = ["A", "B", "C", "D", "E", "F", "G", "H", "I"]
450484
occ = Occlusion (`elem` ["F", "H", "I"])
