Skip to content

Make action-summary concatenation sound by chunking bundles#2385

Open
paulfitz wants to merge 3 commits into
mainfrom
paulfitz/associative-concat
Open

Make action-summary concatenation sound by chunking bundles#2385
paulfitz wants to merge 3 commits into
mainfrom
paulfitz/associative-concat

Conversation

@paulfitz

@paulfitz paulfitz commented Jun 8, 2026

Copy link
Copy Markdown
Member

Concatenating two action summaries, or summarizing a bundle that carries several schema changes at once, could produce the wrong result. A single forward/reverse walk over a whole bundle cannot tell a name that appeared and vanished mid-bundle from one that really sits at the boundary, so it picks the wrong answer on recycled names, renames, and type conversions. We've known about these problems, but in practice, for documents edited via the web client, such bundles don't happen, so it hasn't been a big issue. But for building new features, it is an awkward limitation.

The data engine knows more about which undo entries invert which stored actions than it used to tell. Now, each stored doc action tags the undo entries it produces (in useractions.py), in a way that survives some gnarly reorderings (the ModifyColumn pop and re-append in action_obj.py, the calc-flush front inserts in action_summary.py). A front calc-flush restore of a removed column, table, or row is left unowned. That tag rides along as an undoOwner list parallel to undo, making its way into the local bundle and action history. The undoOwner is included in the action hash when present, so it is covered by the checksum. This happens only when present, so actions recorded before this keep their hashes.

However, that change only affects actions in Grist documents from now on. There's still action history in existing documents, which is valuable, and which could persist for some time. That's where this work gets interesting, some might say overkill. It turns out, if you're careful, it IS possible to split a bundle into chunks without knowing the precise relationship between actions and their undos up front, by using dynamic programming and the constraints of what the data engine can issue. The data engine might change in the future, but that doesn't matter, since for future bundles we have the new tags.

Our goal is to split a bundle into chunks that each summarize cleanly and then can be composed. ActionLayout adds a lattice chunker: chunking becomes a least-cost path over a (stored x undo) grid, priced against the inverse shapes the engine actually emits (read from docactions.py). The summarizer concatenation is reworked to canonicalize renames, preserve recycled entities, and route removed rows onto the right defunct names.

On the server, chunkByOwners groups actions and undos trivially by undoOwner if known. Otherwise we fall back to the lattice chunker, so older history and engine-less bundles are unaffected.

A fuzz harness drives chunk-then-concat against a live summary oracle over a recycling corpus, cross-checks that the two chunkers agree on every bundle, and unit tests pin the tricky single-table shapes, truncation, and the chunker's edge cases.

Also in this branch, a small related fix: the "changed rows only" filter when showing comparisons now keeps locally-added rows alongside updates and removals, so a row added locally shows as a local-add instead of being filtered out (it previously rendered only when an added row also happened to appear in updateRows, which made the comparison test flaky).

This was heavily machine assisted. But it wasn't done quickly. There were a few dozen false starts before getting a handle on how to do this robustly. There's a case for just ignoring past history and supporting clean summarization only for future actions :-)

Related: investigating this turned up a data engine bug #2387

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reworks ActionSummary generation to be robust when a single bundle contains multiple schema changes by splitting bundles into summarizable chunks and composing per-chunk summaries, with an additional fast path that uses engine-provided undo ownership when available.

Changes:

  • Add undoOwner propagation from the sandbox engine through the server into action history, and include it in action hashing when present.
  • Introduce ActionLayout chunkers (chunkByOwners, chunkByLattice) and update summarizeStoredAndUndo to chunk-then-concatenate.
  • Add extensive unit + fuzz tests to validate chunking, concatenation associativity, and tricky rename/recycle/conversion shapes.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
test/server/lib/ActionSummaryFuzz.ts New fuzz/property harness comparing per-action vs combined-bundle summaries and cross-checking both chunkers.
test/server/lib/ActionSummary.ts Expanded unit coverage for canonicalization/associativity, defunct re-keying, and chunker behavior.
test/server/lib/ActionHistory.ts Tests for hashing behavior when undoOwner is absent vs present.
sandbox/grist/useractions.py Records undo ownership for undos produced while applying each stored action.
sandbox/grist/docactions.py Documents the “expected inverses” contract relied on by the lattice chunker.
sandbox/grist/action_summary.py Tags preserved-row undos with their stored-update owner; leaves front restores unowned.
sandbox/grist/action_obj.py Adds undo_owner tracking to ActionGroup/ActionBundle JSON (undoOwner).
sandbox/grist/acl.py Resolves identity-keyed undo-owner map into an undoOwner list parallel to undo.
app/server/lib/Sharing.ts Plumbs undoOwner from sandbox bundle into server-side LocalActionBundle.
app/server/lib/DocApi.ts Simplifies change aggregation to a single concatenateSummaries call over action summaries.
app/server/lib/ActionHistoryImpl.ts Hashes undoOwner when present; clarifies write-once hash behavior.
app/common/TimeQuery.ts Switches incremental folding to concatenateSummaryPair to avoid per-step canonicalization loss.
app/common/DocActions.ts Normalizes row-id extraction using Array.isArray.
app/common/ActionSummarizer.ts Implements chunk-then-walk-then-concat, introduces canonicalization and stable rename ordering.
app/common/ActionLayout.ts New chunking algorithms (owner-driven and lattice DP) plus stored-side analysis for attribution/orphans.
app/common/ActionBundle.ts Adds undoOwner fields to SandboxActionBundle and LocalActionBundle types.
app/client/models/DataTableModelWithDiff.ts Keeps locally-added rows visible in “changed rows only” mode.
app/client/components/ActionLog.ts Combines change summaries in chronological order using concatenateSummaries.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread test/server/lib/ActionSummaryFuzz.ts
Comment thread app/common/ActionLayout.ts
@paulfitz paulfitz force-pushed the paulfitz/associative-concat branch from 8ffd356 to 5a74e6d Compare June 15, 2026 14:10
paulfitz and others added 3 commits June 18, 2026 16:31
Concatenating two action summaries, or summarizing a bundle that carries
several schema changes at once, could produce the wrong result. A single
forward/reverse walk over a whole bundle cannot tell a name that appeared
and vanished mid-bundle from one that really sits at the boundary, so it
picks the wrong answer on recycled names, renames, and type conversions.

This splits a bundle into chunks that each summarize cleanly and then
composes them. ActionLayout adds a lattice chunker: chunking becomes a
least-cost path over a (stored x undo) grid, priced against the inverse
shapes the engine actually emits (read from docactions.py, so it is a
contract rather than a guess). The summarizer concatenation is reworked to
canonicalize renames, preserve recycled entities, and route removed rows
onto the right defunct names.

The lattice reconstructs which undo entries invert which stored actions. The
data engine already knows that correspondence as it applies each action, so we
also record it and use it directly. Each stored doc action tags the undo
entries it produces, keyed by object identity so the tag survives the later
reorderings (the ModifyColumn pop and re-append, the calc-flush front inserts).
A front calc-flush restore of a removed column, table, or row is left unowned.
The server attributes it to its removal the same way the lattice does. The
result rides along as an undoOwner list parallel to undo, enveloped across the
sandbox boundary like the other action arrays and flattened into the local
bundle and action history.

On the server, chunkByOwners groups by that ownership, one chunk per stored
action, which is the finest split the lattice searches for but here is known
directly. summarizeStoredAndUndo takes this path when the ownership is present
and well-formed, unless ignoreUndoGrouping is set. Otherwise it falls back to
the lattice, so older history and engine-less bundles are unaffected. Both
chunkers share the front-restore attribution.

undoOwner is folded into the action hash when present, so it is covered by the
checksum. This happens only when present, so actions recorded before this keep
their hashes.

A fuzz harness drives chunk-then-concat against a live summary oracle over a
recycling corpus, cross-checks that the two chunkers agree on every bundle, and
unit tests pin the tricky single-table shapes, truncation, and the chunker's
edge cases.

This was heavily machine assisted, but wasn't done quickly, there were a few
dozen false starts before getting a handle on how to do this robustly. There's
a case for just ignoring past history and supporting clean summarization only
for future actions :-)
The "changed rows only" filter (TableDataWithDiff.getKeepFunc) kept updated
rows, removed rows, and synthetic rows, but not locally-added rows. A
locally-added row is a real row in the core table, present only in the delta's
addRows, so it was filtered out and never shown as a local-add.

It rendered before only when an added row also happened to appear in updateRows,
which made the existing comparison test flaky. Keep leftTableDelta.addRows in the
filter, alongside updates and removals.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two test-only fixes to the ActionSummary fuzz harness, both flagged while
addressing the Copilot review of this branch.

randomValue() had its probability thresholds out of order: the empty-string
branch (r < 0.14) sat behind the encoded-list branch (r < 0.16) and was
unreachable, so empty-string values were never generated and the Text-default
and "0"-vs-0 coercion cases went untested. Reorder into monotonic bands.

docSnapshot()'s same-final-state precondition compared only user-table row data,
so it missed a column-type divergence that lives purely in metadata when the
table ends with no surviving rows. The engine's formula-to-data type guess is
batch-sensitive (Text vs Numeric), which let such scenarios slip past the gate
and spuriously fail the oracle even though the summarizer was faithful. Fold
column type/isFormula into the snapshot so these scenarios are skipped, as the
harness already intends for type-divergent cases.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@paulfitz paulfitz force-pushed the paulfitz/associative-concat branch from 5a74e6d to 0c61850 Compare June 18, 2026 20:31
@paulfitz paulfitz marked this pull request as ready for review June 19, 2026 14:55
@paulfitz paulfitz changed the title Make ActionSummary chunk+concat sound; add ActionLayout chunker Make action-summary concatenation sound by chunking bundles Jun 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants