-
Notifications
You must be signed in to change notification settings - Fork 166
✨[PANA-3971] Add a more compact experimental DOM mutation encoding #4060
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨[PANA-3971] Add a more compact experimental DOM mutation encoding #4060
Conversation
4d23639 to
287703c
Compare
|
✅ Tests 🎉 All green!❄️ No new flaky tests detected 🎯 Code Coverage 🔗 Commit SHA: 9233797 | Docs | Datadog PR Page | Was this helpful? Give us feedback! |
Bundles Sizes Evolution
🚀 CPU PerformancePending... 🧠 Memory PerformancePending... |
| let privacyLevel: NodePrivacyLevel | ||
|
|
||
| const selfPrivacyLevel = getNodeSelfPrivacyLevel(node) | ||
| if (selfPrivacyLevel) { | ||
| privacyLevel = reducePrivacyLevel(selfPrivacyLevel, parentPrivacyLevel) | ||
| } else { | ||
| privacyLevel = parentPrivacyLevel | ||
| } | ||
|
|
||
| if (privacyLevel === NodePrivacyLevel.HIDDEN) { | ||
| serializeHiddenNodePlaceholder(cursor, node, transaction) | ||
| return | ||
| } | ||
|
|
||
| // Totally ignore risky or unwanted elements. (e.g. <script>, some <link> and <meta> elements) | ||
| if (privacyLevel === NodePrivacyLevel.IGNORE) { | ||
| return | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to execute this on every node instead of just on element nodes as before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the first part, where we call getNodeSelfPrivacyLevel(), it seems more different than it really is. getNodeSelfPrivacyLevel() immediately returns undefined for non-elements, and in response we immediately set privacyLevel to parentPrivacyLevel and skip the invocation of reducePrivacyLevel().
Regarding the second part, where we make checks like if privacyLevel === NodePrivacyLevel.HIDDEN, this part needs to be there for correctness, because external callers can pass arbitrary nodes into this function, and so we might be handed a non-element node without passing through an element ancestor first. The old implementation would've been better off having these checks, too; the current state of affairs requires callers to defensively implement the checks themselves, with the potential for a privacy bug if they forget.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I was also hoping to keep as much of the privacy logic in a single place as possible, to make it easier to understand.
| transaction: ChangeSerializationTransaction | ||
| ): void { | ||
| const { nodeId, insertionPoint } = cursor.advance(document) | ||
| transaction.addNode(insertionPoint, '#document') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We used to have NodeType[type] (which was misleading because it did not match https://developer.mozilla.org/en-US/docs/Web/API/Node/nodeType ) Still, shouldn't we keep using numbers instead of strings and use this opportunity to match the spec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW we still use numbers, in the end, but the numbers are indexes into the string table. 🙂
We are actually mostly aligned with the spec here, but it's the spec for nodeName that we're aligned with, not nodeType. (We don't perfectly match it because there are a few cases where we diverge to do something a bit nicer -- e.g. we handle DocumentType nodes differently, since spec'd nodeName would yield html for <!DOCTYPE html>, which is not what we want.)
Why nodeName and not nodeType? nodeName is nice because it eliminates the need to store separate fields for the node type and the tag name of element nodes. Non-element nodes have nodeNames that start with # (excluding the oddballs like DocumentType), and because element tag names cannot start with # in HTML, this is never ambiguous. By eliminating the need for separate node type and tag name fields, each serialized node is a bit smaller.
e5ca210 to
81de533
Compare
|
/to-staging |
|
View all feedbacks in Devflow UI.
Commit 81de533e2f will soon be integrated into staging-03.
We couldn't automatically merge the commit 81de533e2f into staging-03! To solve the conflicts directly in Github, click here to create a fix pull request. Alternatively, you can also click here reset the integration branch or use the following Slack command: |
|
🚂 Branch Integration: starting soon, merge expected in approximately 14m (p90) Commit 81de533e2f will soon be integrated into staging-03. |
|
🚂 Branch Integration Commit 81de533e2f has been merged into staging-03 in merge commit 11f81c700c. Check out the triggered pipeline on Gitlab 🦊 If you need to revert this integration, you can use the following command: |
packages/rum/src/domain/record/serialization/insertionCursor.ts
Outdated
Show resolved
Hide resolved
packages/rum/src/domain/record/serialization/serializationTransaction.ts
Outdated
Show resolved
Hide resolved
| transaction.addNode(insertionPoint, encodedElementName(node), [PRIVACY_ATTR_NAME, PRIVACY_ATTR_VALUE_HIDDEN]) | ||
| const { width, height } = node.getBoundingClientRect() | ||
| transaction.setSize(nodeId, width, height) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: will setSize be used for something else? Else, why not just use style attributes like we did before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will setSize be used for something else?
SizeChange will eventually be used for recording the size of censored images as well.
why not just use style attributes like we did before?
- It's nice to clearly distinguish between content that was present in the real DOM at recording time, and content that we generate. (This is, admittedly, more of a philosophical stance than anything else.)
- It's more compact to represent these sizes as a simple array with two integers than to represent them using CSS syntax.
- It's possible for hidden nodes and censored images to have
styleattributes already, and we may find in the future that it's necessary to preserve style information on these nodes to produce the correct layout. (I've seen the need for that in the past.) If we find ourselves needing to support that, it'll be nice to not have to worry about merging our generated styles with customer styles.
Motivation
We have observed that there are a number of opportunities for reducing the size of session replay data. This would be beneficial for a number of reasons:
Less overhead, more reliability; seems like a clear win!
This PR takes a significant step in that direction by adding an experimental implementation of a new encoding for DOM mutations. In experiments, this new encoding reduced the size of session replay data by 77.3% before compression, and by 34.7% after compression is taken into account. (Both of these numbers exclude stylesheet data, which requires a somewhat different approach; this is a separate opportunity which isn't addressed in this PR.)
The new encoding is a drop-in replacement for the old encoding; it can be converted to the old representation losslessly, producing a byte-for-byte identical result. The tests included in this PR verify that this is true for the session replay unit tests. For additional verification in a real-world scenario, I plan to push a commit to staging tomorrow that will perform the same verification on all recorded pages within our staging environment.
Changes
A new experimental feature,
USE_CHANGE_RECORDS, has been added. Enabling this feature has the following effects:serializeNodeAsChange, which produces aChangerecord instead of aFullSnapshotrecord.serializeNodeAsChangeand the functions it recursively calls are the heart of the new encoding implementation.RecordingScopedata structure resets between full snapshots; all ids are cleared. The new encoding assigns ids implicitly in the order that nodes and other objects are encountered; this means that there is no way to carry over ids from a different full snapshot, so resetting theRecordingScopeis necessary.serializeNodeAsChangeis supported by several new types that help implement its functionality:ChangeSerializationTransactiontype exposes builder methods which are used to construct theChangerecord. Once we're confident in this code, we can move this functionality onto the existingSerializationTransactiontype; I've separated things out in this PR only to keep the new code as isolated as possible.InsertionCursortracks the current position in the DOM during the serialization process. It's responsible for allocating node ids and generating anInsertionPositionfor each node; theseInsertionPositions are used in the new encoding to tell the player where in the DOM tree to insert each node.ChangeEncoderautomatically optimizes eachChangerecord as it's being constructed. It splits out strings into a separate string table for deduplication purposes, and it groups changes of the same type together to minimize the number of bytes that must be devoted to identifying the type of each change.It's probably also worth calling out
changeConversions.ts, which is unfortunately by far the largest file in this PR despite not actually being code that runs as part of the browser SDK. This file containsconvertChangeToFullSnapshot(), which is used in tests to verify thatserializeNodeAsChange()produces byte-for-byte identical results toserializeNode(). It would ordinarily be calledchangeConversions.specHelper.ts, but I need it to be temporarily importable in non-test code so I can use it to perform validation on staging. I'll give it the.specHelper.tsextension before merging. (EDIT: I've completed this validation and renamed the file tochangeConversions.specHelper.ts.)There are some changes which are deliberately not included in this PR:
InsertionCursorcan only be positioned at the root of the DOM tree right now.)Changerecords to the session replay sandbox immediately, but that code lives in a separate repo, so it can't be included in this PR. I'll bump the session replay sandbox version used in the browser extension in a followup. (Edit: This was ✨ [PANA-5359] - Support Change records in the developer extension #4072.)Test instructions
This PR has been validated by pushing a commit to staging that generates serializations in both formats for every full snapshot, converts the
BrowserChangeRecordto aBrowserFullSnapshotRecord. The current version of the PR producesBrowserChangeRecords which, when converted toBrowserFullSnapshotRecords, are byte-for-byte identical in all circumstances encountered on staging.Manually testing the changes is currently a bit tricky, since the browser SDK extension does not support rendering
BrowserChangeRecords. I've merged a PR for the replay code that adds this functionality, so it should be possible to test things visually in an easier way soon, but that PR has not been deployed yet. I'm happy to share the details on Slack, if you want to test things visually.Checklist