
parseXml: optimize DASH SegmentTimeline <S> parsing #4984

@PascalThuet


Is your feature request related to a problem? Please describe.

Yes.

Large DASH live DVR manifests block the main thread during parsing.

This issue is about the XML parser in @svta/cml-xml, not about changing dash.js itself; dash.js is only the workload used to measure the problem.

On an XL synthetic manifest (50 Periods x 4 AdaptationSets x 500 S entries, about 102k XML nodes), the full dash.js manifest parse takes about 40 ms on an M1. The XML parser alone takes about 30.5 ms of that total, or 62%.
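A generator along these lines can produce a manifest of roughly this shape for local experiments. It is a hypothetical, structurally simplified sketch (no Representation elements, made-up attribute values) and not the benchmark's actual input. Its element count (100,651) therefore does not exactly match the ~102k nodes quoted above.

```javascript
// Hypothetical generator for an "XL"-style synthetic manifest:
// periods x adaptationSets x sPerSet <S> entries. Layout and values are
// illustrative assumptions, not the issue's actual benchmark input.
function buildSyntheticMpd(periods = 50, adaptationSets = 4, sPerSet = 500) {
  const parts = ['<?xml version="1.0"?>', '<MPD type="dynamic">'];
  for (let p = 0; p < periods; p++) {
    parts.push(`<Period id="p${p}">`);
    for (let a = 0; a < adaptationSets; a++) {
      parts.push(`<AdaptationSet id="${a}">`, '<SegmentTemplate timescale="90000">', '<SegmentTimeline>');
      for (let s = 0; s < sPerSet; s++) {
        // First entry carries an explicit start time; later ones only a duration.
        parts.push(s === 0 ? `<S t="${p * 1000000}" d="180000" />` : '<S d="180000" />');
      }
      parts.push('</SegmentTimeline>', '</SegmentTemplate>', '</AdaptationSet>');
    }
    parts.push('</Period>');
  }
  parts.push('</MPD>');
  return parts.join('\n');
}

const xml = buildSyntheticMpd();
// Element start tags only: closing tags and the <?xml ...?> declaration are not matched.
const elementCount = (xml.match(/<[A-Za-z]/g) || []).length;
// <S followed by whitespace, '/' or '>' (excludes <SegmentTemplate>, <SegmentTimeline>).
const sCount = (xml.match(/<S[\s/>]/g) || []).length;
console.log(sCount, elementCount);
```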

About 98% of the nodes in these manifests are SegmentTimeline <S> entries. They are simple self-closing elements carrying only integer attributes such as t, d, r, and k, yet each one still goes through the full generic parsing path.

Describe the solution you'd like

Add a specialized fast path in parseXml() for self-closing DASH SegmentTimeline <S> nodes.

The change should be made in @svta/cml-xml, inside parseXml().

Expected behavior

  • detect <S ... /> nodes while parsing
  • parse t, d, r, and k directly as integers
  • skip unescapeHtml() for numeric attributes
  • avoid unnecessary generic attribute parsing work for these nodes
  • reuse a shared empty childNodes array for self-closing nodes when safe
  • preserve the current output shape
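A minimal sketch of such a fast path follows. The helper name tryParseSegmentS, the node shape { nodeName, attributes, childNodes }, and storing attribute values as numbers are all hypothetical choices for illustration, not @svta/cml-xml's actual internals. The idea is that the generic parser would try it whenever it reaches a "<" and fall back to its normal path whenever it returns null.

```javascript
// Hypothetical fast path for self-closing DASH SegmentTimeline <S .../> nodes.
// The node shape { nodeName, attributes, childNodes } is an assumption for
// illustration, not @svta/cml-xml's actual internal type.
const EMPTY_CHILDREN = Object.freeze([]); // one shared, frozen array for all <S/> nodes
const NUMERIC_S_ATTRS = new Set(['t', 'd', 'r', 'k']);

function isSpace(c) {
  return c === 32 || c === 9 || c === 10 || c === 13; // space, \t, \n, \r
}

// Tries to parse `<S .../>` where xml[pos] === '<'.
// Returns { node, end } (end = index just past '/>') or null, meaning
// "not a simple <S/>: let the generic parser handle it".
function tryParseSegmentS(xml, pos) {
  if (xml.charCodeAt(pos + 1) !== 83 /* 'S' */) return null;
  let i = pos + 2;
  const first = xml.charCodeAt(i);
  if (!isSpace(first) && first !== 47 /* '/' */) return null; // e.g. <SegmentTimeline>
  const attributes = {};
  for (;;) {
    while (isSpace(xml.charCodeAt(i))) i++;
    if (xml.charCodeAt(i) === 47 /* '/' */) {
      if (xml.charCodeAt(i + 1) !== 62 /* '>' */) return null;
      return { node: { nodeName: 'S', attributes, childNodes: EMPTY_CHILDREN }, end: i + 2 };
    }
    // Only single-letter t/d/r/k names are handled; anything else falls back.
    const name = xml[i];
    if (!NUMERIC_S_ATTRS.has(name) || name in attributes) return null;
    i++;
    while (isSpace(xml.charCodeAt(i))) i++;
    if (xml.charCodeAt(i) !== 61 /* '=' */) return null;
    i++;
    while (isSpace(xml.charCodeAt(i))) i++;
    const quote = xml.charCodeAt(i);
    if (quote !== 34 /* '"' */ && quote !== 39 /* "'" */) return null;
    i++;
    // Parse an optionally negative decimal integer in place: no substring and
    // no unescapeHtml(), since a run of digits cannot contain entity references.
    let sign = 1;
    if (xml.charCodeAt(i) === 45 /* '-' */) { sign = -1; i++; }
    const digitsStart = i;
    let value = 0;
    let digit = xml.charCodeAt(i) - 48;
    while (digit >= 0 && digit <= 9) {
      value = value * 10 + digit; // exact as long as value stays below 2^53
      digit = xml.charCodeAt(++i) - 48;
    }
    if (i === digitsStart || xml.charCodeAt(i) !== quote) return null;
    i++;
    attributes[name] = sign * value;
  }
}
```

For example, tryParseSegmentS('<S t="123456" d="180000" />', 0) yields attributes { t: 123456, d: 180000 } with end === 27, while '<SegmentTimeline>', an entity such as d="18&#48;", an unknown attribute such as n, or a non-self-closing <S ...> all return null. Bailing out on anything unexpected keeps the fast path conservative, so non-DASH XML and unusual <S> nodes still take the existing path. If the current parser emits attribute values as strings, the sketch would need to keep strings to preserve the output shape; it could still skip unescapeHtml() for values that passed the digit check.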

In a synthetic benchmark, a specialized eager parser for <S> nodes reduced the XML parsing cost from about 39.8 ms to about 19.1 ms on the XL manifest, a reduction of about 52% (~2.1x faster).

Describe alternatives you've considered

  • Lazy parsing of SegmentTimeline.S

    Rejected. These entries are typically consumed immediately after manifest parsing for duration calculation, segment counting, and segment lookup.

  • A local XML parser variant inside dash.js

    Rejected. A local clone was slower than the current @svta/cml-xml implementation in synthetic benchmarks.

  • A downstream dash.js optimization in DashParser.processNode()

    Useful, but secondary. For example, replacing arrayNodes.indexOf() with Set.has() helps, but the main bottleneck is still cmlParseXml().
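The downstream change is a plain membership-test swap. In the sketch below, the list name and its contents are hypothetical examples, not dash.js's actual arrayNodes list. Because such a list is short, the gain is a constant factor per node, which is consistent with treating it as secondary to the parser itself.

```javascript
// Hypothetical illustration: membership test on a list of element names that
// must always be exposed as arrays. The names are examples, not dash.js's list.
const ARRAY_NODE_NAMES = ['Period', 'AdaptationSet', 'Representation', 'S'];

// Linear scan of the array on every processed node.
function isArrayNodeIndexOf(name) {
  return ARRAY_NODE_NAMES.indexOf(name) !== -1;
}

// Build the Set once, then do a hashed lookup for every processed node.
const ARRAY_NODE_SET = new Set(ARRAY_NODE_NAMES);
function isArrayNodeSet(name) {
  return ARRAY_NODE_SET.has(name);
}
```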

Additional context

Example hot-path nodes

<S t="123456" d="180000" />
<S d="180000" r="14" />
<S d="180000" r="-1" k="3" />
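To illustrate why lazy parsing buys little, here is a sketch of the kind of work a consumer does right after parsing: walking every <S> entry to derive segment start times, the segment count, and the covered duration. It assumes attributes are already integers and follows the usual DASH rules (an explicit t resets the clock; r counts additional repeats; a negative r repeats until the next entry's t or the end of the timeline). The timelineEnd parameter is an assumed input, and k is ignored here.

```javascript
// Sketch: expand SegmentTimeline <S> entries into segment start times.
// `timelineEnd` (timescale units) is an assumed input used to resolve r < 0.
// @k (segment sequences) is ignored in this sketch.
function expandTimeline(entries, timelineEnd = Infinity) {
  const starts = [];
  let time = 0;
  for (let i = 0; i < entries.length; i++) {
    const { t, d, r = 0 } = entries[i];
    if (t !== undefined) time = t; // explicit start time resets the running clock
    let repeat = r;
    if (repeat < 0) {
      // Negative r: repeat until the next entry's @t, or until timelineEnd.
      const next = entries[i + 1];
      const end = next && next.t !== undefined ? next.t : timelineEnd;
      if (!Number.isFinite(end)) throw new Error('r < 0 needs a known end time');
      repeat = Math.ceil((end - time) / d) - 1;
    }
    for (let n = 0; n <= repeat; n++) {
      starts.push(time);
      time += d;
    }
  }
  return { starts, count: starts.length, duration: time - (starts[0] ?? 0) };
}
```

With the three example nodes above and a timeline ending at 123456 + 20 × 180000, expandTimeline returns 20 segments covering 3600000 timescale units; every entry has to be visited, so deferring their parsing would only move the cost.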

Reproduction

node test/bench-manifest-parsing.mjs
node test/bench-parsexml.mjs

Real public live DVR example

  • https://livesim2.dashif.org/livesim2/segtimeline_1/tsbd_21600/testpic_2s/Manifest.mpd
  • verified on March 10, 2026
  • timeShiftBufferDepth="PT6H"
  • about 170 KB
  • about 5402 literal <S> nodes in the fetched MPD

Larger multi-period variant

  • https://livesim2.dashif.org/livesim2/segtimeline_1/tsbd_21600/periods_60/continuous_1/testpic_2s/Manifest.mpd
  • timeShiftBufferDepth="PT6H"
  • about 901 KB
  • 361 periods
  • about 5942 literal <S> nodes in the fetched MPD

Why this matters for users

  • on desktop, going from 39.8 ms to 19.1 ms removes about 52% of the XML parsing work from the main thread
  • on lower-end Smart TV / STB class hardware, that same reduction can translate to roughly 100-200 ms less blocking per manifest refresh
  • on live DVR streams, this cost repeats on every manifest update cycle, so users can experience recurring UI hitching rather than a one-time delay
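The Smart TV / STB range is an extrapolation rather than a measurement. One way to arrive at it: assume such hardware runs this parsing code roughly s = 5 to 10 times slower than an M1. The factor s is an assumption chosen to match the quoted range, not a benchmark result.

```latex
\Delta t_{\text{M1}} = 39.8\,\text{ms} - 19.1\,\text{ms} = 20.7\,\text{ms},
\qquad
\Delta t_{\text{STB}} \approx s \cdot \Delta t_{\text{M1}},\quad s \in [5, 10]
\;\Rightarrow\; \Delta t_{\text{STB}} \approx 104\text{--}207\,\text{ms}
```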

Environment used for the measurements above

What would make this complete

  • no observable output change for current consumers
  • measurable improvement on large DASH manifests with dense SegmentTimeline
  • no regression on non-DASH XML inputs
  • benchmark or test coverage proving both correctness and performance
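For the correctness half, a differential (fuzz) test is one option: generate many random self-closing <S> tags and check that a candidate fast path either falls back (returns null) or agrees with a simple reference parser. Everything below (the seeded PRNG, the tag generator, the regex reference, which is adequate for these generated inputs but is not a general XML parser) is an illustrative assumption. The fast path under test would be plugged in as candidate through a small adapter.

```javascript
// Sketch of a differential (fuzz) test for an <S> fast path.
// A parser takes one tag string and returns { nodeName, attributes } or null.

// mulberry32: small seeded PRNG so failing cases are reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Random self-closing <S> tag with mixed whitespace and quote styles.
function randomSTag(rand) {
  const ws = () => [' ', '  ', '\n', '\t'][Math.floor(rand() * 4)];
  const q = rand() < 0.5 ? '"' : "'";
  const attrs = [];
  if (rand() < 0.3) attrs.push(['t', Math.floor(rand() * 1e12)]);
  attrs.push(['d', 1 + Math.floor(rand() * 200000)]);
  if (rand() < 0.5) attrs.push(['r', Math.floor(rand() * 20) - 1]); // includes r="-1"
  if (rand() < 0.2) attrs.push(['k', 1 + Math.floor(rand() * 5)]);
  return '<S' + attrs.map(([n, v]) => `${ws()}${n}=${q}${v}${q}`).join('') + ws() + '/>';
}

// Simple regex-based reference, sufficient for the generated tags above.
function referenceParse(tag) {
  const m = /^<S(\s[^>]*)?\/>$/.exec(tag);
  if (!m) return null;
  const attributes = {};
  for (const a of (m[1] || '').matchAll(/([A-Za-z]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g)) {
    attributes[a[1]] = Number(a[2] ?? a[3]);
  }
  return { nodeName: 'S', attributes };
}

// Returns the tags on which `candidate` disagrees with `reference`.
// A null from the candidate means "fell back to the generic path" and is accepted.
function differentialTest(candidate, reference, tags) {
  // Key order matches because both parsers insert attributes in document order.
  const key = (n) => JSON.stringify(n && [n.nodeName, n.attributes]);
  const mismatches = [];
  for (const tag of tags) {
    const got = candidate(tag);
    if (got === null) continue;
    if (key(got) !== key(reference(tag))) mismatches.push(tag);
  }
  return mismatches;
}
```

A seeded run such as differentialTest(candidate, referenceParse, Array.from({ length: 10000 }, () => randomSTag(mulberry32(1)))) should return an empty array; any returned tag is a reproducible counterexample.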
