The second parameter `0` is the dimension offset (always 0 for 1D vectors). This loads a **vectorized chunk** of data in a single operation. The exact number of elements loaded depends on your GPU's SIMD capabilities.
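For reference, here is a minimal sketch of that pattern. The kernel name, buffer names, and `SIMD_WIDTH` value are hypothetical (not from this excerpt); it assumes the `LayoutTensor.load`/`store` style used elsewhere in the book, where the trailing `0` is the dimension offset of a 1D tensor.

```mojo
# Hypothetical sketch -- names and SIMD_WIDTH are placeholders, not book code.
from gpu import thread_idx
from layout import Layout, LayoutTensor

alias dtype = DType.float32
alias SIZE = 1024
alias SIMD_WIDTH = 4  # placeholder; the real width depends on the target GPU
alias vec_layout = Layout.row_major(SIZE)


fn double_kernel(
    output: LayoutTensor[mut=True, dtype, vec_layout],
    a: LayoutTensor[mut=False, dtype, vec_layout],
):
    var i = thread_idx.x * SIMD_WIDTH
    if i + SIMD_WIDTH <= SIZE:
        # One vectorized load: `i` is the element index, the trailing `0`
        # is the dimension offset (always 0 because the tensor is 1D).
        var chunk = a.load[SIMD_WIDTH](i, 0)
        output.store[SIMD_WIDTH](i, 0, chunk * 2.0)
```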
book/src/puzzle_25/puzzle_25.md (1 addition & 1 deletion)
@@ -35,7 +35,7 @@ Lane 0 ──broadcast──> All lanes (0, 1, 2, ..., 31)
### **Warp communication operations in Mojo**
- Learn the core communication primitives from `gpu.warp`:
+ Learn the core communication primitives from `gpu.primitives.warp`:
1. **[`shuffle_down(value, offset)`](https://docs.modular.com/mojo/stdlib/gpu/warp/shuffle_down)**: Get value from lane at higher index (neighbor access)
2. **[`broadcast(value)`](https://docs.modular.com/mojo/stdlib/gpu/warp/broadcast)**: Share lane 0's value with all other lanes (one-to-many)
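As a quick orientation (not part of the puzzle's own code), a minimal sketch of those two primitives in a single-warp kernel follows. The kernel and buffer names are made up; the import path is `gpu.warp` before the module move and `gpu.primitives.warp` after it.

```mojo
from gpu import thread_idx
from gpu.warp import shuffle_down, broadcast  # `gpu.primitives.warp` after the module move
from memory import UnsafePointer


# Hypothetical single-warp kernel: each lane combines its right neighbor's
# value (shuffle_down) with lane 0's value (broadcast).
fn warp_demo(output: UnsafePointer[Float32], data: UnsafePointer[Float32]):
    var lane = thread_idx.x  # lane index, assuming one warp per block
    var val = data[lane]

    var next_val = shuffle_down(val, 1)  # value held by lane + 1
    var lane0_val = broadcast(val)       # lane 0's value, shared with all lanes

    output[lane] = next_val + lane0_val
```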
book/src/puzzle_27/block_broadcast.md (3 additions & 3 deletions)
@@ -1,8 +1,8 @@
# block.broadcast() Vector Normalization
- Implement vector mean normalization by combining [block.sum](https://docs.modular.com/mojo/stdlib/gpu/block/sum) and [block.broadcast](https://docs.modular.com/mojo/stdlib/gpu/block/broadcast) operations to demonstrate the complete block-level communication workflow. Each thread will contribute to computing the mean, then receive the broadcast mean to normalize its element, showcasing how block operations work together to solve real parallel algorithms.
+ Implement vector mean normalization by combining [block.sum](https://docs.modular.com/mojo/stdlib/gpu/primitives/block/sum) and [block.broadcast](https://docs.modular.com/mojo/stdlib/gpu/primitives/block/broadcast) operations to demonstrate the complete block-level communication workflow. Each thread will contribute to computing the mean, then receive the broadcast mean to normalize its element, showcasing how block operations work together to solve real parallel algorithms.
- **Key insight:** _The [block.broadcast()](https://docs.modular.com/mojo/stdlib/gpu/block/broadcast) operation enables one-to-all communication, completing the fundamental block communication patterns: reduction (all→one), scan (all→each), and broadcast (one→all)._
+ **Key insight:** _The [block.broadcast()](https://docs.modular.com/mojo/stdlib/gpu/primitives/block/broadcast) operation enables one-to-all communication, completing the fundamental block communication patterns: reduction (all→one), scan (all→each), and broadcast (one→all)._
## Key concepts
@@ -126,7 +126,7 @@ if local_i == 0:
**Why thread 0?** Consistent with `block.sum()` pattern where thread 0 receives the result.
- ### 4. **[block.broadcast()](https://docs.modular.com/mojo/stdlib/gpu/block/broadcast) API concepts**
+ ### 4. **[block.broadcast()](https://docs.modular.com/mojo/stdlib/gpu/primitives/block/broadcast) API concepts**
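To tie the two operations together, here is a minimal sketch of the mean-normalization workflow described above. The kernel and buffer names are hypothetical, and the `block_size`/`broadcast` parameters are assumptions based on the linked stdlib docs rather than verified signatures, so check them against your Mojo release.

```mojo
# Sketch only: names are hypothetical; parameter names on block.sum and
# block.broadcast are assumed from the docs linked above.
from gpu import block, thread_idx
from memory import UnsafePointer

alias TPB = 256  # threads per block; assumed to cover the whole vector


fn mean_normalize(output: UnsafePointer[Float32], data: UnsafePointer[Float32]):
    var i = thread_idx.x
    var my_val = data[i]

    # All -> one: thread 0 ends up with the block-wide total.
    var total = block.sum[block_size=TPB, broadcast=False](my_val)

    var mean: Float32 = 0.0
    if i == 0:
        mean = total / Float32(TPB)

    # One -> all: every thread receives thread 0's mean.
    mean = block.broadcast[block_size=TPB](mean)

    output[i] = my_val - mean
```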
book/src/puzzle_27/block_prefix_sum.md (2 additions & 2 deletions)
@@ -1,8 +1,8 @@
# block.prefix_sum() Parallel Histogram Binning
- This puzzle implements parallel histogram binning using block-level [block.prefix_sum](https://docs.modular.com/mojo/stdlib/gpu/block/prefix_sum) operations for advanced parallel filtering and extraction. Each thread determines its element's target bin, then applies `block.prefix_sum()` to compute write positions for extracting elements from a specific bin, showing how prefix sum enables sophisticated parallel partitioning beyond simple reductions.
+ This puzzle implements parallel histogram binning using block-level [block.prefix_sum](https://docs.modular.com/mojo/stdlib/gpu/primitives/block/prefix_sum) operations for advanced parallel filtering and extraction. Each thread determines its element's target bin, then applies `block.prefix_sum()` to compute write positions for extracting elements from a specific bin, showing how prefix sum enables sophisticated parallel partitioning beyond simple reductions.
- **Key insight:** _The [block.prefix_sum()](https://docs.modular.com/mojo/stdlib/gpu/block/prefix_sum) operation provides parallel filtering and extraction by computing cumulative write positions for matching elements across all threads in a block._
+ **Key insight:** _The [block.prefix_sum()](https://docs.modular.com/mojo/stdlib/gpu/primitives/block/prefix_sum) operation provides parallel filtering and extraction by computing cumulative write positions for matching elements across all threads in a block._
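A minimal sketch of that extraction pattern follows. All names are hypothetical, and the `block_size`/`exclusive` parameters are assumptions based on the linked docs rather than a verified signature.

```mojo
# Sketch only: hypothetical kernel; prefix_sum parameter names are assumed.
from gpu import block, thread_idx
from memory import UnsafePointer

alias TPB = 256  # threads per block; assumed to match the input length


fn extract_bin(
    output: UnsafePointer[Float32],
    data: UnsafePointer[Float32],
    bins: UnsafePointer[Int32],
    target_bin: Int32,
):
    var i = thread_idx.x

    # 1 if this thread's element falls in the requested bin, else 0.
    var belongs: Int32 = 1 if bins[i] == target_bin else 0

    # Exclusive prefix sum of the flags gives each match its write position.
    var pos = block.prefix_sum[block_size=TPB, exclusive=True](belongs)

    if belongs == 1:
        output[Int(pos)] = data[i]
```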