@sethfowler-datadog
Contributor

@sethfowler-datadog sethfowler-datadog commented Jan 5, 2026

Motivation

We have observed that there are a number of opportunities for reducing the size of session replay data. This would be beneficial for a number of reasons:

  • We'd use less client-side resources, decreasing the performance impact of the SDK.
  • Along the same lines, we'd have less impact on end user mobile data and battery life.
  • Fewer bytes take less time to transmit, which reduces the likelihood that the data is lost because the user navigates away from the page or loses network connectivity.

Less overhead, more reliability; seems like a clear win!

This PR takes a significant step in that direction by adding an experimental implementation of a new encoding for DOM mutations. In experiments, this new encoding reduced the size of session replay data by 77.3% before compression, and by 34.7% after compression is taken into account. (Both of these numbers exclude stylesheet data, which requires a somewhat different approach; this is a separate opportunity which isn't addressed in this PR.)

The new encoding is a drop-in replacement for the old encoding; it can be converted to the old representation losslessly, producing a byte-for-byte identical result. The tests included in this PR verify that this is true for the session replay unit tests. For additional verification in a real-world scenario, I plan to push a commit to staging tomorrow that will perform the same verification on all recorded pages within our staging environment.

Changes

A new experimental feature, USE_CHANGE_RECORDS, has been added. Enabling this feature has the following effects:

  1. The full snapshot is captured using a new serialization function, serializeNodeAsChange, which produces a Change record instead of a FullSnapshot record. serializeNodeAsChange and the functions it recursively calls are the heart of the new encoding implementation.
  2. The RecordingScope data structure resets between full snapshots; all ids are cleared. The new encoding assigns ids implicitly in the order that nodes and other objects are encountered; this means that there is no way to carry over ids from a different full snapshot, so resetting the RecordingScope is necessary.
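As a rough illustration of why implicit ids force a reset between full snapshots, here is a minimal sketch. `SketchTreeNode`, `SketchIdScope`, `serializeLabels`, and `recoverIds` are hypothetical names invented for this example, and starting ids at 0 is an assumption; the real implementation walks actual DOM nodes and lives in RecordingScope and InsertionCursor.

```typescript
// Hypothetical minimal tree shape; the real SDK walks actual DOM nodes.
interface SketchTreeNode {
  label: string
  children: SketchTreeNode[]
}

// Hypothetical stand-in for the id-tracking part of RecordingScope.
class SketchIdScope {
  private nextId = 0 // assumption: ids start at 0
  private ids = new Map<SketchTreeNode, number>()

  // Assign the next id to a node the first time it is encountered.
  assign(node: SketchTreeNode): number {
    const existing = this.ids.get(node)
    if (existing !== undefined) {
      return existing
    }
    const id = this.nextId++
    this.ids.set(node, id)
    return id
  }

  // Called between full snapshots: every id is forgotten.
  reset(): void {
    this.nextId = 0
    this.ids.clear()
  }
}

// Pre-order walk. Ids are never written to the output; only labels are.
function serializeLabels(root: SketchTreeNode, scope: SketchIdScope): string[] {
  const out: string[] = []
  const visit = (node: SketchTreeNode): void => {
    scope.assign(node)
    out.push(node.label)
    node.children.forEach(visit)
  }
  visit(root)
  return out
}

// The receiving side recovers ids purely from the order of the records.
function recoverIds(labels: string[]): Map<number, string> {
  const byId = new Map<number, string>()
  labels.forEach((label, index) => byId.set(index, label))
  return byId
}
```

Because the receiver derives each id from a node's position in the stream, an id carried over from an earlier snapshot would no longer match that position, which is why the scope has to be cleared first.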

serializeNodeAsChange is supported by several new types that help implement its functionality:

  1. The new ChangeSerializationTransaction type exposes builder methods which are used to construct the Change record. Once we're confident in this code, we can move this functionality onto the existing SerializationTransaction type; I've separated things out in this PR only to keep the new code as isolated as possible.
  2. InsertionCursor tracks the current position in the DOM during the serialization process. It's responsible for allocating node ids and generating an InsertionPosition for each node; these InsertionPositions are used in the new encoding to tell the player where in the DOM tree to insert each node.
  3. ChangeEncoder automatically optimizes each Change record as it's being constructed. It splits out strings into a separate string table for deduplication purposes, and it groups changes of the same type together to minimize the number of bytes that must be devoted to identifying the type of each change.
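The two optimizations attributed to ChangeEncoder above can be sketched as follows. This is a simplified illustration under assumptions, not the actual ChangeEncoder: `SketchStringTable`, `SketchChange`, and `encodeChanges` are hypothetical names, and the record layout is invented for the example.

```typescript
// Hypothetical string table: each distinct string is stored once and
// referenced by its index afterwards.
class SketchStringTable {
  readonly strings: string[] = []
  private indexes = new Map<string, number>()

  intern(value: string): number {
    let index = this.indexes.get(value)
    if (index === undefined) {
      index = this.strings.length
      this.strings.push(value)
      this.indexes.set(value, index)
    }
    return index
  }
}

// Hypothetical change shape: a type tag plus a payload of numbers and strings.
type SketchChange = { type: string; payload: Array<string | number> }

// Group payloads by change type so each type tag is written once, and
// replace every string in a payload with its string table index.
// Note: after encoding, literal numbers and string indexes look alike, so a
// real format has to tell them apart by their position in the payload.
function encodeChanges(changes: SketchChange[]): {
  strings: string[]
  groups: Array<[string, number[][]]>
} {
  const table = new SketchStringTable()
  const groups = new Map<string, number[][]>()
  for (const change of changes) {
    const encoded = change.payload.map((item) =>
      typeof item === 'string' ? table.intern(item) : item
    )
    const group = groups.get(change.type)
    if (group) {
      group.push(encoded)
    } else {
      groups.set(change.type, [encoded])
    }
  }
  return { strings: table.strings, groups: Array.from(groups.entries()) }
}
```

With this layout, a page containing hundreds of `div` elements stores the string `div` once, and the type tag for node insertions appears once per group rather than once per node.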

It's probably also worth calling out changeConversions.ts, which is unfortunately by far the largest file in this PR despite not actually being code that runs as part of the browser SDK. This file contains convertChangeToFullSnapshot(), which is used in tests to verify that serializeNodeAsChange() produces byte-for-byte identical results to serializeNode(). It would ordinarily be called changeConversions.specHelper.ts, but I need it to be temporarily importable in non-test code so I can use it to perform validation on staging. I'll give it the .specHelper.ts extension before merging. (EDIT: I've completed this validation and renamed the file to changeConversions.specHelper.ts.)

There are some changes which are deliberately not included in this PR:

  • Support for incremental mutations will be added in a separate PR. This one is huge as it is, and I felt it would be better to review those changes in a more focused way, since incremental mutations are more complex. I've tried to remove all vestiges of them in this PR. (This is why, for example, a new InsertionCursor can only be positioned at the root of the DOM tree right now.)
  • I want to add some new telemetry associated with this change, but I need to make some changes to the implementation of the old encoding as well, so I'll handle telemetry in a followup PR. Within this PR, the goal is to generate the same telemetry that we generate right now, and nothing further. (Edit: This is 📈 [PANA-5371] Add telemetry to help evaluate new DOM mutation encoding #4077.)
  • I plan to add support for rendering Change records to the session replay sandbox immediately, but that code lives in a separate repo, so it can't be included in this PR. I'll bump the session replay sandbox version used in the browser extension in a followup. (Edit: This was ✨ [PANA-5359] - Support Change records in the developer extension #4072.)

Test instructions

This PR has been validated by pushing a commit to staging that generates serializations in both formats for every full snapshot, converts the BrowserChangeRecord to a BrowserFullSnapshotRecord, and compares the result with the directly serialized BrowserFullSnapshotRecord. The current version of the PR produces BrowserChangeRecords which, when converted to BrowserFullSnapshotRecords, are byte-for-byte identical in all circumstances encountered on staging.
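The shape of that check can be sketched roughly as below. This is an illustration only: `sketchRoundTripMatches` and the toy formats are hypothetical stand-ins for the real serializers and for convertChangeToFullSnapshot(), and comparing `JSON.stringify` output is an assumption about how "byte-for-byte identical" is evaluated.

```typescript
// Returns true when converting the new-format record back to the old format
// yields exactly the same JSON text as serializing in the old format directly.
function sketchRoundTripMatches<Input, Full, Change>(
  input: Input,
  serializeFull: (input: Input) => Full,
  serializeChange: (input: Input) => Change,
  convertToFull: (change: Change) => Full
): boolean {
  const expected = JSON.stringify(serializeFull(input))
  const actual = JSON.stringify(convertToFull(serializeChange(input)))
  return expected === actual
}

// Toy formats standing in for BrowserFullSnapshotRecord and BrowserChangeRecord.
type ToyFull = { nodes: Array<{ id: number; tag: string }> }
type ToyChange = { tags: string[] }

const toySerializeFull = (tags: string[]): ToyFull => ({
  nodes: tags.map((tag, id) => ({ id, tag })),
})
const toySerializeChange = (tags: string[]): ToyChange => ({ tags })
const toyConvert = (change: ToyChange): ToyFull => ({
  nodes: change.tags.map((tag, id) => ({ id, tag })),
})
```

Comparing the serialized text rather than doing a deep structural comparison is stricter: a converter that emits the same fields in a different key order would fail this check.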

Manually testing the changes is currently a bit tricky, since the browser SDK extension does not support rendering BrowserChangeRecords. I've merged a PR for the replay code that adds this functionality, so it should be possible to test things visually in an easier way soon, but that PR has not been deployed yet. I'm happy to share the details on Slack, if you want to test things visually.

Checklist

  • Tested locally
  • Tested on staging
  • Added unit tests for this change.
  • Added e2e/integration tests for this change.

@sethfowler-datadog sethfowler-datadog requested review from a team as code owners January 5, 2026 18:25
@sethfowler-datadog sethfowler-datadog force-pushed the seth.fowler/PANA-3971-add-a-more-compact-dom-mutation-encoding branch from 4d23639 to 287703c Compare January 5, 2026 18:28
@datadog-official

datadog-official bot commented Jan 5, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage
Patch Coverage: 78.82%
Overall Coverage: 77.30% (+0.01%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 9233797

@cit-pr-commenter

cit-pr-commenter bot commented Jan 5, 2026

Bundles Sizes Evolution

| 📦 Bundle Name | Base Size | Local Size | 𝚫 | 𝚫% | Status |
| --- | --- | --- | --- | --- | --- |
| Rum | 164.35 KiB | 164.36 KiB | +10 B | +0.01% | |
| Rum Profiler | 4.33 KiB | 4.33 KiB | 0 B | 0.00% | |
| Rum Recorder | 20.04 KiB | 24.27 KiB | +4.23 KiB | +21.11% | |
| Logs | 56.14 KiB | 56.14 KiB | 0 B | 0.00% | |
| Flagging | 944 B | 944 B | 0 B | 0.00% | |
| Rum Slim | 121.57 KiB | 121.57 KiB | 0 B | 0.00% | |
| Worker | 23.63 KiB | 23.63 KiB | 0 B | 0.00% | |

🚀 CPU Performance

Pending...

🧠 Memory Performance

Pending...

🔗 RealWorld

Comment on lines +26 to +43
  let privacyLevel: NodePrivacyLevel

  const selfPrivacyLevel = getNodeSelfPrivacyLevel(node)
  if (selfPrivacyLevel) {
    privacyLevel = reducePrivacyLevel(selfPrivacyLevel, parentPrivacyLevel)
  } else {
    privacyLevel = parentPrivacyLevel
  }

  if (privacyLevel === NodePrivacyLevel.HIDDEN) {
    serializeHiddenNodePlaceholder(cursor, node, transaction)
    return
  }

  // Totally ignore risky or unwanted elements. (e.g. <script>, some <link> and <meta> elements)
  if (privacyLevel === NodePrivacyLevel.IGNORE) {
    return
  }
Contributor

Any reason to execute this on every node instead of just on element nodes as before?

Contributor Author

Regarding the first part, where we call getNodeSelfPrivacyLevel(), it seems more different than it really is. getNodeSelfPrivacyLevel() immediately returns undefined for non-elements, and in response we immediately set privacyLevel to parentPrivacyLevel and skip the invocation of reducePrivacyLevel().

Regarding the second part, where we make checks like if privacyLevel === NodePrivacyLevel.HIDDEN, this part needs to be there for correctness, because external callers can pass arbitrary nodes into this function, and so we might be handed a non-element node without passing through an element ancestor first. The old implementation would've been better off having these checks, too; the current state of affairs requires callers to defensively implement the checks themselves, with the potential for a privacy bug if they forget.

Contributor Author

In general, I was also hoping to keep as much of the privacy logic in a single place as possible, to make it easier to understand.

  transaction: ChangeSerializationTransaction
): void {
  const { nodeId, insertionPoint } = cursor.advance(document)
  transaction.addNode(insertionPoint, '#document')
Contributor

We used to have NodeType[type] (which was misleading because it did not match https://developer.mozilla.org/en-US/docs/Web/API/Node/nodeType). Still, shouldn't we keep using numbers instead of strings and use this opportunity to match the spec?

Contributor Author

@sethfowler-datadog sethfowler-datadog Jan 7, 2026

FWIW we still use numbers, in the end, but the numbers are indexes into the string table. 🙂

We are actually mostly aligned with the spec here, but it's the spec for nodeName that we're aligned with, not nodeType. (We don't perfectly match it because there are a few cases where we diverge to do something a bit nicer -- e.g. we handle DocumentType nodes differently, since spec'd nodeName would yield html for <!DOCTYPE html>, which is not what we want.)

Why nodeName and not nodeType? nodeName is nice because it eliminates the need to store separate fields for the node type and the tag name of element nodes. Non-element nodes have nodeNames that start with # (excluding the oddballs like DocumentType), and because element tag names cannot start with # in HTML, this is never ambiguous. By eliminating the need for separate node type and tag name fields, each serialized node is a bit smaller.
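To make the nodeName idea concrete, here is a minimal sketch. It is not the SDK's actual code: `SketchDomNode`, `sketchEncodedNodeName`, and `sketchIsElementName` are hypothetical names, and the `'#doctype'` label is an assumption chosen purely for illustration, since the real label for DocumentType nodes isn't stated in this thread.

```typescript
// Minimal structural stand-in for a DOM node, so the sketch runs without a browser.
interface SketchDomNode {
  nodeType: number
  nodeName: string
}

// Numeric value of Node.DOCUMENT_TYPE_NODE in the DOM spec.
const SKETCH_DOCUMENT_TYPE_NODE = 10

// One string per node carries both "what kind of node is this" and,
// for elements, "which tag is it".
function sketchEncodedNodeName(node: SketchDomNode): string {
  if (node.nodeType === SKETCH_DOCUMENT_TYPE_NODE) {
    // The spec'd nodeName of <!DOCTYPE html> is "html", which would be
    // indistinguishable from an element name, so a distinct label is needed.
    // '#doctype' is an assumption for this sketch, not the real label.
    return '#doctype'
  }
  // '#text', '#comment', '#document', ... for non-elements; tag name for elements.
  return node.nodeName
}

// Element tag names cannot start with '#', so the prefix alone tells a
// decoder whether the string names an element.
function sketchIsElementName(encoded: string): boolean {
  return encoded.charAt(0) !== '#'
}
```

A decoder can therefore branch on the first character of the string and does not need a separate node type field.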

@sethfowler-datadog sethfowler-datadog force-pushed the seth.fowler/PANA-3971-add-a-more-compact-dom-mutation-encoding branch from e5ca210 to 81de533 Compare January 13, 2026 11:19
@sethfowler-datadog
Contributor Author

/to-staging

@gh-worker-devflow-routing-ef8351

gh-worker-devflow-routing-ef8351 bot commented Jan 13, 2026

View all feedbacks in Devflow UI.

2026-01-13 11:39:04 UTC ℹ️ Start processing command /to-staging


2026-01-13 11:39:10 UTC ℹ️ Branch Integration: starting soon, merge expected in approximately 14m (p90)

Commit 81de533e2f will soon be integrated into staging-03.


2026-01-13 11:39:23 UTC 🚨 Branch Integration: this merge request has conflicts which couldn't be solved automatically

We couldn't automatically merge the commit 81de533e2f into staging-03!

To solve the conflicts directly in Github, click here to create a fix pull request.

Alternatively, you can also click here to reset the integration branch or use the following Slack command: @devflow reset-branch -r browser-sdk -b staging-03

dd-devflow bot added a commit that referenced this pull request Jan 13, 2026
@dd-devflow
Contributor

dd-devflow bot commented Jan 13, 2026

🚂 Branch Integration: starting soon, merge expected in approximately 14m (p90)

Commit 81de533e2f will soon be integrated into staging-03.

@dd-devflow
Contributor

dd-devflow bot commented Jan 13, 2026

🚂 Branch Integration

Commit 81de533e2f has been merged into staging-03 in merge commit 11f81c700c.

Check out the triggered pipeline on Gitlab 🦊

If you need to revert this integration, you can use the following command: /code revert-integration -b staging-03

Comment on lines +196 to +198
transaction.addNode(insertionPoint, encodedElementName(node), [PRIVACY_ATTR_NAME, PRIVACY_ATTR_VALUE_HIDDEN])
const { width, height } = node.getBoundingClientRect()
transaction.setSize(nodeId, width, height)
Member

question: will setSize be used for something else? Else, why not just use style attributes like we did before?

Contributor Author

> will setSize be used for something else?

SizeChange will eventually be used for recording the size of censored images as well.

> why not just use style attributes like we did before?

  1. It's nice to clearly distinguish between content that was present in the real DOM at recording time, and content that we generate. (This is, admittedly, more of a philosophical stance than anything else.)
  2. It's more compact to represent these sizes as a simple array with two integers than to represent them using CSS syntax.
  3. It's possible for hidden nodes and censored images to have style attributes already, and we may find in the future that it's necessary to preserve style information on these nodes to produce the correct layout. (I've seen the need for that in the past.) If we find ourselves needing to support that, it'll be nice to not have to worry about merging our generated styles with customer styles.
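For a rough sense of the compactness point in item 2, the sketch below compares two hypothetical payloads for the same size. Both formats are assumptions made for illustration: the exact style string used by the old encoding and the exact layout of a SizeChange are not shown in this thread, and `sketchSizeAsTuple` and `sketchSizeAsStyle` are invented names.

```typescript
// Size recorded as a plain two-number array (assumed layout).
function sketchSizeAsTuple(width: number, height: number): string {
  return JSON.stringify([width, height])
}

// Size recorded as a CSS style attribute value (assumed syntax).
function sketchSizeAsStyle(width: number, height: number): string {
  return JSON.stringify(`width: ${width}px; height: ${height}px;`)
}
```

For a 120 by 80 element, the tuple serializes to `[120,80]`, which is several times shorter than the quoted style string before compression.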

@sethfowler-datadog sethfowler-datadog merged commit f8fdaa9 into main Jan 13, 2026
21 checks passed
@sethfowler-datadog sethfowler-datadog deleted the seth.fowler/PANA-3971-add-a-more-compact-dom-mutation-encoding branch January 13, 2026 18:58
@github-actions github-actions bot locked and limited conversation to collaborators Jan 13, 2026