docs: add 3D tensor scaling section to PR description

katosh · claude · katosh · commit 38e77af84e35 · 2026-03-30T13:31:11.000-07:00
Explains factored storage pattern for large tensors: register_section
for compact rank-R factors + register_anndata_namespace for on-demand
tensor reconstruction and O(rank) point queries. Includes compression
ratio (1.5M× for 1M cells) and sparse.COO note.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/PR4_DESCRIPTION.md b/PR4_DESCRIPTION.md
@@ -191,6 +191,66 @@ True
 | `repr(adata)` | Shows when non-empty |
 | View copy-on-write | Writing to a view triggers copy |
 
+### Scaling 3D tensors: factored storage + accessor
+
+A dense `(n_obs × n_obs × n_vars)` tensor is infeasible for large datasets (1M cells × 1M cells × 30K genes ≈ 3 × 10^16 entries). The practical pattern is to store compact rank-R factors and reconstruct on demand:
+
+```python
+# Register factor storage (tiny: n_obs × rank and n_vars × rank)
+@register_section("comm_obs", alignment="obs")
+class CommObs:
+    pass
+
+@register_section("comm_var", alignment="var")
+class CommVar:
+    pass
+
+# Register accessor for tensor reconstruction
+@register_anndata_namespace("comm")
+class CellCommAccessor:
+    def __init__(self, adata: ad.AnnData):
+        self._adata = adata
+
+    def tensor(self, key="default"):
+        """Reconstruct (obs × obs × var) tensor from factors."""
+        U = self._adata.comm_obs[key]   # (n_obs, rank)
+        V = self._adata.comm_var[key]   # (n_vars, rank)
+        return np.einsum("ir,jr,kr->ijk", U, U, V)
+
+    def query(self, sender, receiver, gene, key="default"):
+        """O(rank) point query without materializing tensor."""
+        U = self._adata.comm_obs[key]
+        V = self._adata.comm_var[key]
+        i = self._adata.obs_names.get_loc(sender)
+        j = self._adata.obs_names.get_loc(receiver)
+        k = self._adata.var_names.get_loc(gene)
+        return float(U[i] @ (U[j] * V[k]))
+```
+
+```python
+>>> adata.comm_obs["lr"] = np.random.rand(100, 10)   # factors: 12 KB
+>>> adata.comm_var["lr"] = np.random.rand(50, 10)
+
+>>> adata.comm.tensor("lr").shape                      # dense tensor: 4 MB
+(100, 100, 50)
+
+>>> adata.comm.query("cell_0", "cell_1", "CD8A", "lr") # O(rank), no tensor
+0.7386
+
+>>> t_cells = adata[adata.obs["cell_type"] == "T"]
+>>> t_cells.comm.tensor("lr").shape                     # factors were subsetted
+(50, 50, 50)
+
+>>> adata.write("test.h5ad")                            # only factors written
+>>> adata2 = ad.read_h5ad("test.h5ad")
+>>> adata2.comm.tensor("lr").shape                      # reconstructs from factors
+(100, 100, 50)
+```
+
+This combines `@register_section` (factor storage with automatic subsetting and IO) with `@register_anndata_namespace` (tensor API and point queries). For 1M cells with rank 20, the float64 obs factors take ~160 MB. The dense float64 tensor would take ~240 TB with only 30 variables (a 1,500,000× compression) and ~240 PB with the 30K genes of the example above (about 1.5 billion×).
+
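The sizes quoted in this section can be checked with a small NumPy-only sketch. It assumes float64 storage and the 30K-gene figure from the introductory example; `factor_bytes` and `dense_bytes` are hypothetical helper names used for illustration and are not part of this PR. It also confirms that the O(rank) point query agrees with the corresponding entry of the reconstructed tensor.

```python
import numpy as np


def factor_bytes(n_obs, n_vars, rank, itemsize=8):
    """Bytes for the factors U (n_obs x rank) and V (n_vars x rank)."""
    return (n_obs + n_vars) * rank * itemsize


def dense_bytes(n_obs, n_vars, itemsize=8):
    """Bytes for a dense (n_obs x n_obs x n_vars) tensor."""
    return n_obs * n_obs * n_vars * itemsize


# Small case from the doctest above: 100 cells, 50 genes, rank 10.
rng = np.random.default_rng(0)
U = rng.random((100, 10))  # obs factors
V = rng.random((50, 10))   # var factors

# Full reconstruction and a single-entry query must agree.
dense = np.einsum("ir,jr,kr->ijk", U, U, V)
i, j, k = 3, 7, 11
point = float(U[i] @ (U[j] * V[k]))
assert np.isclose(dense[i, j, k], point)

small_factors = factor_bytes(100, 50, 10)  # 12 KB
small_dense = dense_bytes(100, 50)         # 4 MB

# Large case (assumed): 1M cells, 30K genes, rank 20, float64.
large_factors = factor_bytes(1_000_000, 30_000, 20)
large_dense = dense_bytes(1_000_000, 30_000)
ratio = large_dense / large_factors

print(f"small: {small_factors} B factors vs {small_dense} B dense")
print(f"large: {large_factors / 1e6:.1f} MB factors vs {large_dense / 1e15:.0f} PB dense")
print(f"compression: {ratio:.3g}x")
```

Only the small case materializes the tensor; the large case is pure arithmetic, since the whole point of the factored layout is that the dense array never has to exist.
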
+For moderately sized datasets, `sparse.COO` from the PyData `sparse` package also works directly in registered sections (subsetting handles N-D sparse arrays).
+
 ### Also in this PR
 
 - **`@register_anndata_namespace`** — custom accessor APIs (`adata.spatial.images`)
@@ -199,7 +259,7 @@ True
 
 ### Test coverage
 
-67 tests covering all alignment patterns, custom validation, custom IO (JSON, xarray), 3D tensor subsetting, copy-on-write, and end-to-end workflows for TreeData-like, SpatialData-like, CellChat-like, and SCENIC-like scenarios.
+73 tests covering all alignment patterns, custom validation, custom IO (JSON, xarray), 3D tensor subsetting, factored tensor with accessor, copy-on-write, and end-to-end workflows for TreeData-like, SpatialData-like, CellChat-like, SCENIC-like, and factored communication scenarios.
 
 ### Future direction