VTT subtitles timing misalignment on HLS if the timestamp map doesn't account for discontinuity sequences

**Have you read the [FAQ](https://bit.ly/ShakaFAQ) and checked for duplicate open issues?**
Yes

**If the problem is related to FairPlay, have you read the tutorial?**

No

**What version of Shaka Player are you using?**

A fork based on 4.10. I haven't tested the latest main branch, but the relevant piece of code seems mostly identical, so I assume it will be an issue there too.

**Can you reproduce the issue with our latest release version?**


**Can you reproduce the issue with the latest code from `main`?**


**Are you using the demo app or your own custom app?**


**If custom app, can you reproduce the issue using our demo app?**


**What browser and OS are you using?**
Safari, but likely any browser. Only have an example that runs on Safari at the moment.

**For embedded devices (smart TVs, etc.), what model and firmware version are you using?**


**What are the manifest and license server URIs?**

I don't have a stream to share.

**What configuration are you using?  What is the output of `player.getNonDefaultConfiguration()`?**

Standard config.

**What did you do?**

With HLS, we have an SSAI stream with segmented VTT captions in text files. In the VTT, the timestamp map (`X-TIMESTAMP-MAP`) maps the VTT times to the content time.
content.vtt:
```vtt
WEBVTT
X-TIMESTAMP-MAP=MPEGTS:0,LOCAL:00:00:00.000

00:00:06.000 --> 00:00:08.000
example text
```
The HLS manifest looks something like this:
```m3u8
#EXT-X-DISCONTINUITY
#EXTINF:15,
empty.vtt
#EXT-X-DISCONTINUITY
#EXTINF:30,
content.vtt
```

**What did you expect to happen?**
I'd expect the cue `example text` to display 6 seconds into its discontinuity sequence, i.e. 21s from the beginning of the combined timeline.
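
To spell out the arithmetic, here is a minimal sketch. `expectedCueTime` is a hypothetical helper written for this issue, not a Shaka Player API, and it assumes the timestamp map in each VTT file is relative to the start of its own discontinuity sequence (as with `MPEGTS:0,LOCAL:00:00:00.000` above).

```javascript
// Hypothetical helper for illustration only; not part of Shaka Player.
// periodDurations: duration in seconds of each discontinuity sequence,
// in playlist order. cueLocalTime: cue start relative to its own sequence.
function expectedCueTime(periodDurations, periodIndex, cueLocalTime) {
  let periodStart = 0;
  for (let i = 0; i < periodIndex; i++) {
    periodStart += periodDurations[i];
  }
  return periodStart + cueLocalTime;
}

// 15s of empty.vtt, then content.vtt with a cue at 00:00:06.000.
console.log(expectedCueTime([15, 30], 1, 6));  // 21
```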

**What actually happened?**


Instead, depending on when the captions are enabled, the cues will show up at 6s rather than 21s. If I wait to make sure that the video segments from the second discontinuity sequence are loaded, then it will correctly display the cue at 21s.

This is because, when VTT cues are parsed, nothing ensures that we have the timestamp offset associated with that particular discontinuity sequence. In here, https://github.com/shaka-project/shaka-player/blob/70904f52759af19b13e0f24b1f6a61be5878f7e5/lib/media/media_source_engine.js#L1149-L1159, the timestamp offset that we wait for could be from the first discontinuity sequence, even though we're already trying to parse `content.vtt`.

My proposal is that we should keep track of the timestamp offsets of each discontinuity sequence and if we're trying to parse the cues and we don't yet have the expected timestamp offset, we hold off until it is provided. Segment references already have a `discontinuitySequence` property on them, so, we can pass it in to the text engine. Then, [potentially here](https://github.com/shaka-project/shaka-player/blob/70904f52759af19b13e0f24b1f6a61be5878f7e5/lib/media/media_source_engine.js#L1254), we pass the timestamp offset along with the discontinuity sequence number to the text engine. When we call TextEngine#appendBuffer, we also pass the reference's discontinuitySequence through as well.

In TextEngine#appendBuffer, we can see if we have the correct timestamp offset and if so we can continue to parse, using that stored value. Otherwise, we'd want to delay parsing until we get the timestamp offset associated with that disco sequence. In playing around, I naively have it schedule a DelayedTick which just calls appendBuffer again, but there's probably a better way of handling that.

The TextEngine would also need a map of disco sequence numbers to timestamp offsets, rather than a single timestamp offset.
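
As a rough sketch of what that map could look like, and one possible alternative to rescheduling with a DelayedTick: the class below stores offsets per discontinuity sequence and hands out a promise that resolves once the offset for a given sequence is known. `DiscontinuityOffsets`, `set`, and `waitFor` are hypothetical names for illustration; nothing here exists in Shaka Player today.

```javascript
// Hypothetical sketch; none of these names exist in Shaka Player.
class DiscontinuityOffsets {
  constructor() {
    // Known timestamp offsets, keyed by discontinuity sequence number.
    this.offsets_ = new Map();
    // Pending resolve callbacks, keyed by discontinuity sequence number.
    this.waiters_ = new Map();
  }

  // Record the offset for a sequence and release anything waiting on it.
  set(discontinuitySequence, timestampOffset) {
    this.offsets_.set(discontinuitySequence, timestampOffset);
    const waiters = this.waiters_.get(discontinuitySequence) || [];
    this.waiters_.delete(discontinuitySequence);
    for (const resolve of waiters) {
      resolve(timestampOffset);
    }
  }

  // Resolve with the offset for a sequence, waiting until it is provided.
  waitFor(discontinuitySequence) {
    if (this.offsets_.has(discontinuitySequence)) {
      return Promise.resolve(this.offsets_.get(discontinuitySequence));
    }
    return new Promise((resolve) => {
      const waiters = this.waiters_.get(discontinuitySequence) || [];
      waiters.push(resolve);
      this.waiters_.set(discontinuitySequence, waiters);
    });
  }
}

const offsets = new DiscontinuityOffsets();
offsets.waitFor(1).then((offset) => {
  console.log('parse content.vtt with offset', offset);
});
offsets.set(0, 0);   // Offset for the first sequence; the waiter stays pending.
offsets.set(1, 15);  // Offset for the second sequence; the waiter is released.
```

In this sketch a waiter stays pending forever if its offset never arrives, so a real implementation would presumably also need to reject or clear pending waiters on seek, unload, or destroy.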

With my current naive implementation, seeking also produces `Assertion Failed: There should not be a gap in text references >1s` from the TextEngine, but I think that's addressable in the changes.

Any thoughts on the matter would also be greatly appreciated.

**Are you planning to send a PR to fix it?**
Yes. I'm working on changes on our fork and can submit a PR once that's ready/approved for contribution, but I was also hoping for input on my proposal and the issue at hand.

Excerpt from `lib/media/media_source_engine.js` at the lines linked above (L1149-L1159):

```js
if (contentType == ContentType.TEXT) {
  if (this.manifestType_ == shaka.media.ManifestParser.HLS) {
    // This won't be known until the first video segment is appended.
    const offset = await this.textSequenceModeOffset_;
    this.textEngine_.setTimestampOffset(offset);
  }
  await this.textEngine_.appendBuffer(
      data,
      reference ? reference.startTime : null,
      reference ? reference.endTime : null,
      reference ? reference.getUris()[0] : null);
```

VTT subtitles timing misalignment on HLS if the timestamp map doesn't account for discontinuity sequences #9470

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

VTT subtitles timing misalignment on HLS if the timestamp map doesn't account for discontinuity sequences #9470

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions