RTC: Sanitize remote CRDT changes from untrusted contributors#77207
RTC: Sanitize remote CRDT changes from untrusted contributors#77207maxschmeling wants to merge 20 commits intotrunkfrom
Conversation
When a remote RTC collaborator sends malicious content (e.g. `<script>` tags, event handler attributes, `javascript:` URIs) through the shared CRDT document, it could execute in other collaborators' browsers when written to the local store. This adds DOMPurify sanitization at the single gateway where remote CRDT content enters the local entity store (`_updateEntityRecord` in the sync manager). All string values in the changes object are recursively sanitized before being passed to `editRecord`, while preserving WordPress block comment delimiters and safe HTML. Made-with: Cursor
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message. To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
|
Size Change: +9.33 kB (+0.12%) Total Size: 7.76 MB 📦 View Changed
ℹ️ View Unchanged
|
|
Flaky tests detected in d54d55f. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/24528527757
|
Track which users submit sync updates via a contributor list stored in post meta on the sync storage post. On each poll response the server checks every contributor against the `unfiltered_html` capability and returns a `trustworthy` flag. The list is cleared on post save so the check resets for subsequent editing sessions. On the client side the polling provider relays the flag through a new `trustworthy` provider event. The sync manager stores it per entity and only runs DOMPurify sanitization when `trustworthy` is false, avoiding unnecessary filtering when all collaborators are trusted. Made-with: Cursor
| $storage = new WP_Sync_Post_Meta_Storage(); | ||
| $storage->clear_contributors( $room ); | ||
| } | ||
| add_action( 'save_post', 'gutenberg_clear_sync_contributors_on_save', 10, 2 ); |
There was a problem hiding this comment.
Are there any other actions we might want to clear out contributors for? Maybe on wp_delete_post() we'd want to check for and clean up contributor meta? Not immediately important, though.
There was a problem hiding this comment.
Actually not sure on delete. I would think we might want to just leave the standard WP behavior there.
Move contributor tracking to the top of the per-room loop in handle_request so that every authenticated access is recorded, including awareness-only polls. This is simpler and more conservative than gating on whether the request carries document updates. Made-with: Cursor
Replace the read-modify-write pattern (get list, append, update row) with add_post_meta inserting one row per contributor per access. Concurrent requests from different users no longer risk overwriting each other. Duplicates across rows are deduplicated in get_contributors() via array_unique. Made-with: Cursor
Check for an existing row with the same meta key and user ID value before inserting. This prevents unbounded row growth from repeated polls by the same user. A rare race between concurrent requests can produce at most one extra row per user, which get_contributors() already handles via array_unique. Made-with: Cursor
Go back to plain add_post_meta without the direct DB dedup query. Made-with: Cursor
Check existing meta rows via get_post_meta before inserting to avoid growing the table on every poll. A rare race can still produce one duplicate per user, handled by array_unique in get_contributors(). Made-with: Cursor
| export function sanitizeRemoteChanges( | ||
| changes: Partial< ObjectData > | ||
| ): Partial< ObjectData > { | ||
| return sanitizeObjectData( changes ); |
There was a problem hiding this comment.
Does this cause any issues with rich text HTML?
There was a problem hiding this comment.
I'm having trouble testing this PR due to some build errors, but looking at how DOMPurify works I don't think rich text will be an issue. There's a huge default whitelist which includes tags like <strong>.
| 'room' => $room, | ||
| 'should_compact' => $should_compact, | ||
| 'total_updates' => $total_updates, | ||
| 'trustworthy' => $trustworthy, |
There was a problem hiding this comment.
Instead of trustworthy, can we have this 1:1 match the capability it represents? E.g. trust_unfiltered_html. I can imagine we might inspect other capabilities that affect syncing in the future, and "trustworthy" might mean something else in that context.
There was a problem hiding this comment.
Or, even more generally, we could use the capability directly:
{
sync_capabilities: [ 'unfiltered_html' ] // When all users have the capability
// or
sync_capabilities: [] // When an untrusted user joins
}
@peterwilsoncc I agree with the overall idea that the code should be explicit and easy to read about being deny-first in your code comments above. I pushed up some changes in f095565 to address that, even if it's functionally the same logic. |
|
Hmm, I noticed one issue with this change: gutenberg/packages/sync/src/manager.ts Line 267 in d54d55f This causes the document to load with an assumed Then save and refresh the page, the script is sanitized on initial load before we receive the |
peterwilsoncc
left a comment
There was a problem hiding this comment.
I've left a couple of notes inline.
DOMPurify looks to be maintained based on a quick review of the repo so I think it's fine to add. @youknowriad @ellatrix @desrosj do any of you have strong feelings on this?
| 'window._wpCollaborationEnabled = ' . wp_json_encode( $enabled ) . ';' . | ||
| 'window._wpCollaborationKsesHtml = ' . wp_json_encode( wp_kses_allowed_html( 'post' ) ) . ';', |
There was a problem hiding this comment.
Maybe:
$config = [
enabled: $enabled,
kses: $enabled ? wp_kses_allowed_html( 'post' ) : []
];
window._wpCollaboration = wp_json_encode( $config );
It might be helpful to keep window._wpCollaborationEnabled for a bit while wp-dev and gb get synced up with intent to remove.
| // URI scheme filtering is left to DOMPurify's built-in `IS_ALLOWED_URI` | ||
| // pattern (blocks `javascript:`, `data:`, `vbscript:`, etc.) rather than a | ||
| // hand-rolled regex around `wp_allowed_protocols()`. Slightly stricter than | ||
| // kses (a few exotic schemes like `irc:`, `webcal:` are dropped), which is | ||
| // acceptable for a security-first sanitizer. |
There was a problem hiding this comment.
Yeah, I'm happy with this if allowed protocols is overly complex.
Also, use multi-line comment standard per WP standards for inline comments (applies throughout so I won't repeat myself).
Still a work in progress. This can break valid administrative HTML on load, and doesn't account for poisoned CRDT doc changes getting persisted in a trusted save.
Summary
@wordpress/syncpackage via DOMPurify, configured with thewp_kses_allowed_html( 'post' )allowlist injected from PHP, mitigating XSS risk from RTC collaborators without theunfiltered_htmlcapability injecting dangerous content.permissions.unfilteredHtmlflag computed server-side: when all tracked contributors share theunfiltered_htmlcapability, remote changes are passed through; when any contributor lacks it, all string values in the changes object are recursively sanitized before reaching the local entity store.How it works
When a remote update arrives via the Yjs CRDT document, the sync manager extracts the changed properties through
getChangesFromCRDTDocand writes them to the local store viahandlers.editRecord(). The newsanitizeRemoteChanges()helper intercepts this flow when the room is untrusted, recursively walking the changes object and running every string through DOMPurify. The allowlist is built fromwindow._wpCollaborationKsesHtml, which PHP populates fromwp_kses_allowed_html( 'post' ), and a customuponSanitizeAttributehook enforces per-tag attribute scoping to match kses semantics.The trust signal flows from the server: PHP tracks every authenticated contributor per room in postmeta, computes
permissions: { unfiltered_html: bool }on each poll response (fail-closed on empty lists), clears the list onsave_post, and the HTTP polling provider relays the flag as apermissionsevent before any updates in the same response are applied.Test plan
@wordpress/synctests continue to passsanitize.test.tscovers script stripping, event handler removal,javascript:URI blocking on anchors and form actions,meta/base/stylestripping, per-tag attribute scoping, safe HTML preservation, block comment preservation, primitive pass-through, recursive traversal of arrays/objects, and the class-instance recursion boundarymanager.tsandpolling-manager.test.tsverify that XSS payloads from a simulated remote peer are stripped whenpermissions.unfilteredHtmlis false, including the default-false path before the first poll arrives, and that the wirepermissionsfield is only honored when strictly=== trueMade with Cursor and Claude Code