Make action-summary concatenation sound by chunking bundles#2385
Open
paulfitz wants to merge 3 commits into
Open
Make action-summary concatenation sound by chunking bundles#2385paulfitz wants to merge 3 commits into
paulfitz wants to merge 3 commits into
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR reworks ActionSummary generation to be robust when a single bundle contains multiple schema changes by splitting bundles into summarizable chunks and composing per-chunk summaries, with an additional fast path that uses engine-provided undo ownership when available.
Changes:
- Add
undoOwnerpropagation from the sandbox engine through the server into action history, and include it in action hashing when present. - Introduce
ActionLayoutchunkers (chunkByOwners,chunkByLattice) and updatesummarizeStoredAndUndoto chunk-then-concatenate. - Add extensive unit + fuzz tests to validate chunking, concatenation associativity, and tricky rename/recycle/conversion shapes.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| test/server/lib/ActionSummaryFuzz.ts | New fuzz/property harness comparing per-action vs combined-bundle summaries and cross-checking both chunkers. |
| test/server/lib/ActionSummary.ts | Expanded unit coverage for canonicalization/associativity, defunct re-keying, and chunker behavior. |
| test/server/lib/ActionHistory.ts | Tests for hashing behavior when undoOwner is absent vs present. |
| sandbox/grist/useractions.py | Records undo ownership for undos produced while applying each stored action. |
| sandbox/grist/docactions.py | Documents the “expected inverses” contract relied on by the lattice chunker. |
| sandbox/grist/action_summary.py | Tags preserved-row undos with their stored-update owner; leaves front restores unowned. |
| sandbox/grist/action_obj.py | Adds undo_owner tracking to ActionGroup/ActionBundle JSON (undoOwner). |
| sandbox/grist/acl.py | Resolves identity-keyed undo-owner map into an undoOwner list parallel to undo. |
| app/server/lib/Sharing.ts | Plumbs undoOwner from sandbox bundle into server-side LocalActionBundle. |
| app/server/lib/DocApi.ts | Simplifies change aggregation to a single concatenateSummaries call over action summaries. |
| app/server/lib/ActionHistoryImpl.ts | Hashes undoOwner when present; clarifies write-once hash behavior. |
| app/common/TimeQuery.ts | Switches incremental folding to concatenateSummaryPair to avoid per-step canonicalization loss. |
| app/common/DocActions.ts | Normalizes row-id extraction using Array.isArray. |
| app/common/ActionSummarizer.ts | Implements chunk-then-walk-then-concat, introduces canonicalization and stable rename ordering. |
| app/common/ActionLayout.ts | New chunking algorithms (owner-driven and lattice DP) plus stored-side analysis for attribution/orphans. |
| app/common/ActionBundle.ts | Adds undoOwner fields to SandboxActionBundle and LocalActionBundle types. |
| app/client/models/DataTableModelWithDiff.ts | Keeps locally-added rows visible in “changed rows only” mode. |
| app/client/components/ActionLog.ts | Combines change summaries in chronological order using concatenateSummaries. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
8ffd356 to
5a74e6d
Compare
Concatenating two action summaries, or summarizing a bundle that carries several schema changes at once, could produce the wrong result. A single forward/reverse walk over a whole bundle cannot tell a name that appeared and vanished mid-bundle from one that really sits at the boundary, so it picks the wrong answer on recycled names, renames, and type conversions. This splits a bundle into chunks that each summarize cleanly and then composes them. ActionLayout adds a lattice chunker: chunking becomes a least-cost path over a (stored x undo) grid, priced against the inverse shapes the engine actually emits (read from docactions.py, so it is a contract rather than a guess). The summarizer concatenation is reworked to canonicalize renames, preserve recycled entities, and route removed rows onto the right defunct names. The lattice reconstructs which undo entries invert which stored actions. The data engine already knows that correspondence as it applies each action, so we also record it and use it directly. Each stored doc action tags the undo entries it produces, keyed by object identity so the tag survives the later reorderings (the ModifyColumn pop and re-append, the calc-flush front inserts). A front calc-flush restore of a removed column, table, or row is left unowned. The server attributes it to its removal the same way the lattice does. The result rides along as an undoOwner list parallel to undo, enveloped across the sandbox boundary like the other action arrays and flattened into the local bundle and action history. On the server, chunkByOwners groups by that ownership, one chunk per stored action, which is the finest split the lattice searches for but here is known directly. summarizeStoredAndUndo takes this path when the ownership is present and well-formed, unless ignoreUndoGrouping is set. Otherwise it falls back to the lattice, so older history and engine-less bundles are unaffected. Both chunkers share the front-restore attribution. undoOwner is folded into the action hash when present, so it is covered by the checksum. This happens only when present, so actions recorded before this keep their hashes. A fuzz harness drives chunk-then-concat against a live summary oracle over a recycling corpus, cross-checks that the two chunkers agree on every bundle, and unit tests pin the tricky single-table shapes, truncation, and the chunker's edge cases. This was heavily machine assisted, but wasn't done quickly, there were a few dozen false starts before getting a handle on how to do this robustly. There's a case for just ignoring past history and supporting clean summarization only for future actions :-)
The "changed rows only" filter (TableDataWithDiff.getKeepFunc) kept updated rows, removed rows, and synthetic rows, but not locally-added rows. A locally-added row is a real row in the core table, present only in the delta's addRows, so it was filtered out and never shown as a local-add. It rendered before only when an added row also happened to appear in updateRows, which made the existing comparison test flaky. Keep leftTableDelta.addRows in the filter, alongside updates and removals. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two test-only fixes to the ActionSummary fuzz harness, both flagged while addressing the Copilot review of this branch. randomValue() had its probability thresholds out of order: the empty-string branch (r < 0.14) sat behind the encoded-list branch (r < 0.16) and was unreachable, so empty-string values were never generated and the Text-default and "0"-vs-0 coercion cases went untested. Reorder into monotonic bands. docSnapshot()'s same-final-state precondition compared only user-table row data, so it missed a column-type divergence that lives purely in metadata when the table ends with no surviving rows. The engine's formula-to-data type guess is batch-sensitive (Text vs Numeric), which let such scenarios slip past the gate and spuriously fail the oracle even though the summarizer was faithful. Fold column type/isFormula into the snapshot so these scenarios are skipped, as the harness already intends for type-divergent cases. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
5a74e6d to
0c61850
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Concatenating two action summaries, or summarizing a bundle that carries several schema changes at once, could produce the wrong result. A single forward/reverse walk over a whole bundle cannot tell a name that appeared and vanished mid-bundle from one that really sits at the boundary, so it picks the wrong answer on recycled names, renames, and type conversions. We've known about these problems, but in practice, for documents edited via the web client, such bundles don't happen, so it hasn't been a big issue. But for building new features, it is an awkward limitation.
The data engine knows more about which undo entries invert which stored actions than it used to tell. Now, each stored doc action tags the undo entries it produces (in
useractions.py), in a way that survives some gnarly reorderings (the ModifyColumn pop and re-append inaction_obj.py, the calc-flush front inserts inaction_summary.py). A front calc-flush restore of a removed column, table, or row is left unowned. That tag rides along as anundoOwnerlist parallel to undo, making its way into the local bundle and action history. TheundoOwneris included in the action hash when present, so it is covered by the checksum. This happens only when present, so actions recorded before this keep their hashes.However, that change only affects actions in Grist documents from now on. There's still action history in existing documents, which is valuable, and which could persist for some time. That's where this work gets interesting, some might say overkill. It turns out, if you're careful, it IS possible to split a bundle into chunks without knowing the precise relationship between actions and their undos up front, by using dynamic programming and the constraints of what the data engine can issue. The data engine might change in the future, but that doesn't matter, since for future bundles we have the new tags.
Our goal is to split a bundle into chunks that each summarize cleanly and then can be composed. ActionLayout adds a lattice chunker: chunking becomes a least-cost path over a (stored x undo) grid, priced against the inverse shapes the engine actually emits (read from
docactions.py). The summarizer concatenation is reworked to canonicalize renames, preserve recycled entities, and route removed rows onto the right defunct names.On the server,
chunkByOwnersgroups actions and undos trivially byundoOwnerif known. Otherwise we fall back to the lattice chunker, so older history and engine-less bundles are unaffected.A fuzz harness drives chunk-then-concat against a live summary oracle over a recycling corpus, cross-checks that the two chunkers agree on every bundle, and unit tests pin the tricky single-table shapes, truncation, and the chunker's edge cases.
Also in this branch, a small related fix: the "changed rows only" filter when showing comparisons now keeps locally-added rows alongside updates and removals, so a row added locally shows as a local-add instead of being filtered out (it previously rendered only when an added row also happened to appear in
updateRows, which made the comparison test flaky).This was heavily machine assisted. But it wasn't done quickly. There were a few dozen false starts before getting a handle on how to do this robustly. There's a case for just ignoring past history and supporting clean summarization only for future actions :-)
Related: investigating this turned up a data engine bug #2387