Skip to content

Commit 62d32a2

Browse files
lukstaficlaude
andcommitted
Add proposal for gh-ocannl-291: audit manual sharing specification
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 6b46f45 commit 62d32a2

1 file changed

Lines changed: 99 additions & 0 deletions

File tree

docs/proposals/gh-ocannl-291.md

Lines changed: 99 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,99 @@
1+
# Audit manually specifying sharing in tensor node memory modes
2+
3+
GitHub issue: [ahrefs/ocannl#291](https://github.com/ahrefs/ocannl/issues/291)
4+
5+
## Goal
6+
7+
Audit the correctness of manually specifying the `sharing` property (`Per_stream` vs `Shared_cross_streams`) on tensor node memory modes, identify bugs or inconsistencies between manual specification and the inference/allocation machinery, and either fix issues or document limitations. Since multi-streaming was removed (#341), the "shared cross-stream but updated from multiple streams" scenario from the original issue is largely moot -- the audit focuses on whether manual sharing specification works correctly in the current single-stream-per-device context and whether dead sharing code should be cleaned up.
8+
9+
## Acceptance Criteria
10+
11+
- [ ] **Manual sharing specification paths verified**: Confirm that calling `Tn.update_memory_mode` with `On_device Shared_cross_streams` or `On_device Per_stream` before compilation correctly influences `alloc_if_needed` allocation behavior in `backends.ml`.
12+
- [ ] **update_memory_sharing consistency audit**: Verify all case arms in `update_memory_sharing` (tnode.ml lines 357-388) are reachable and correct; identify any arms that are dead code after multi-streaming removal.
13+
- [ ] **alloc_if_needed sharing logic audit**: Verify the three-way branch in `alloc_if_needed` (backends.ml lines 520-559) -- read-only/constant path, `known_shared_cross_streams` writable path, and per-stream fallback -- handles all manual specification cases correctly.
14+
- [ ] **Conflict detection audit**: Verify that manually specifying sharing as `Shared_cross_streams` on a node that is then written to by different streams produces a clear error (not silent corruption).
15+
- [ ] **Dead sharing code identified**: List sharing-related code that is unreachable after multi-streaming removal (cross-stream candidates, owner_stream tracking, shared_writer_streams synchronization in `backends.ml`) and file a cleanup issue if substantial.
16+
- [ ] **Test coverage assessed**: Determine whether existing tests exercise manual sharing specification; if not, add a minimal test or document why testing is impractical in the single-stream context.
17+
- [ ] **Documentation of current behavior**: Update code comments or docs to clarify the post-multi-streaming-removal semantics of `sharing` -- what `Shared_cross_streams` means when there is only one stream per device.
18+
19+
## Context
20+
21+
### Memory mode and sharing types (tnode.ml)
22+
23+
The `sharing` type has three variants:
24+
- `Unset` -- not yet determined, will be inferred as `Per_stream` or `Shared_cross_streams`
25+
- `Per_stream` -- separate buffer per stream (each stream gets its own allocation)
26+
- `Shared_cross_streams` -- single buffer per device, reused across streams
27+
28+
The `sharing` property appears embedded in:
29+
- `On_device of sharing` -- device-resident tensor with specified sharing
30+
- `Hosted (Changed_on_devices of sharing)` -- hosted tensor whose device copy has specified sharing
31+
32+
Users can manually set sharing via:
33+
- `Tn.update_memory_mode tn (On_device Shared_cross_streams) provenance` -- directly on tnode
34+
- `Tn.update_memory_sharing tn Shared_cross_streams provenance` -- updates sharing while preserving the memory mode category
35+
- `Train.set_materialized` -- sets `Materialized` (sharing-agnostic, resolved later)
36+
- `Train.set_on_host` / `Train.set_hosted` -- sets hosted modes (sharing resolved via `Changed_on_devices Unset`)
37+
38+
### Sharing inference in alloc_if_needed (backends.ml lines 495-559)
39+
40+
The allocation logic in `alloc_if_needed` determines sharing at link time:
41+
1. **Read-only/constant nodes**: If not `known_non_cross_stream`, uses `cross_stream_candidates` hashtable to share a single buffer; calls `update_memory_sharing Shared_cross_streams 39`.
42+
2. **Writable + `known_shared_cross_streams`**: Requires an owner stream; looks up shared buffer from `cross_stream_candidates`; errors if written from multiple streams.
43+
3. **Writable + not shared**: Falls through to `update_memory_sharing Per_stream 410` and allocates a fresh buffer.
44+
45+
Key concern from the original issue: If a user manually marks a node as `Shared_cross_streams` but the node is writable and appears in multiple streams, the code at line 543-550 raises a `User_error`. This is the intended safety check, but since multi-streaming is removed, the scenario cannot arise in practice.
46+
47+
### Synchronization infrastructure (backends.ml)
48+
49+
The `shared_writer_streams`, `host_reading_streams`, and related event-tracking machinery in `Add_buffer_retrieval_and_syncing` was designed for multi-stream synchronization. Key functions:
50+
- `wait_for_all` / `wait_for_ready` -- synchronize across streams before transfers
51+
- `to_host` -- checks `potentially_cross_stream` before host transfer
52+
- `update_writer_event` -- tracks writer events for cross-stream nodes
53+
- `sync_routine.pre` -- waits for cross-stream writer events before routine execution
54+
55+
With single-stream-per-device, most of this synchronization code is technically unnecessary but harmless. The `potentially_cross_stream` predicate still fires for nodes without explicit `Per_stream` marking.
56+
57+
### Related tasks and issues
58+
59+
- **gh-ocannl-333**: Remove hosted memory mode entirely (would subsume much of the sharing complexity)
60+
- **gh-ocannl-341**: Multi-streaming removed (closed, "Not planned") -- makes cross-stream sharing moot
61+
- **gh-ocannl-296**: Low-level audit (adjacent concern, memory mode inference in `cleanup_virtual_llc`)
62+
63+
### Code pointers
64+
65+
- **Sharing types**: `arrayjit/lib/tnode.ml` lines 25-34
66+
- **Memory mode types**: `arrayjit/lib/tnode.ml` lines 47-65
67+
- **update_memory_mode**: `arrayjit/lib/tnode.ml` lines 306-352 (~46 case arms)
68+
- **update_memory_sharing**: `arrayjit/lib/tnode.ml` lines 357-388 (~12 case arms)
69+
- **Sharing predicates**: `known_shared_cross_streams`, `known_non_cross_stream`, `potentially_cross_stream` (tnode.ml lines 285-299)
70+
- **alloc_if_needed**: `arrayjit/lib/backends.ml` lines 495-559
71+
- **Synchronization**: `arrayjit/lib/backends.ml` lines 38-90 (`wait_for_all`, `to_host`, `update_writer_event`)
72+
- **Device ref type**: `arrayjit/lib/backend_intf.ml` lines 91-100 (`cross_stream_candidates`, `owner_stream`, `shared_writer_streams`)
73+
- **Metal backend sharing**: `arrayjit/lib/metal_backend.ml` line 764 (uses `cross_stream_candidates` for constant buffers)
74+
- **Train helpers**: `lib/train.ml` lines 63-182 (`set_on_host`, `set_materialized`, `set_hosted`, `set_virtual`)
75+
76+
## Approach
77+
78+
### Phase 1: Static analysis of sharing code paths
79+
80+
1. Trace all call sites of `update_memory_sharing` (only 2: backends.ml lines 541 and 557) and `update_memory_mode` with sharing-carrying modes to build a complete map of how sharing gets set.
81+
2. For each case arm in `update_memory_sharing`, determine: (a) is it reachable in single-stream context? (b) is the behavior correct?
82+
3. Analyze the `alloc_if_needed` three-way branch for all combinations of manual sharing specification vs inference.
83+
84+
### Phase 2: Identify dead code and simplification opportunities
85+
86+
1. Catalog sharing-related infrastructure that is unreachable with single-stream-per-device: `owner_stream` tracking, `shared_writer_streams` synchronization, multi-stream `wait_for_all` patterns.
87+
2. Determine what simplifications are safe vs what should be preserved for potential future multi-stream reintroduction.
88+
3. Consider interaction with gh-ocannl-333 (hosted mode removal) -- if that lands first, much of the sharing complexity disappears.
89+
90+
### Phase 3: Verify or add test coverage
91+
92+
1. Check whether any existing test exercises manual sharing specification (search for explicit `On_device Shared_cross_streams` or `Per_stream` in test code -- current search shows none).
93+
2. Write a minimal test that manually sets sharing before compilation and verifies correct allocation behavior, or document why this is impractical in the current architecture.
94+
95+
### Phase 4: Document findings and clean up
96+
97+
1. Update code comments on `sharing` type and `update_memory_sharing` to reflect post-multi-streaming semantics.
98+
2. File cleanup issue for dead sharing infrastructure if warranted.
99+
3. Update task notes with complete audit findings.

0 commit comments

Comments
 (0)