Added distributed transcoding and playback design doc #914
@@ -0,0 +1,178 @@
# Distributed transcoding and how playback works

Kyoo provides videos via [HTTP live streaming](https://www.cloudflare.com/learning/video/what-is-http-live-streaming/) (HLS). HLS consists of two components: a "playlist" (using the `.m3u8` file extension) and "transport stream segments", or "segments" (typically using the `.ts` file extension; some HLS versions also support fMP4 segments). Playlists list a set of segments, which are pieces of the video being streamed. When playing a video, the client first requests the video's playlist, then requests segments as needed.
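For illustration, a minimal HLS media playlist referencing three transcoded segments might look like the following (segment names and durations are hypothetical, not Kyoo's actual URL scheme):

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:4
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:4.000,
segment-0.ts
#EXTINF:4.000,
segment-1.ts
#EXTINF:3.500,
segment-2.ts
#EXT-X-ENDLIST
```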

Segments can be generated from videos as-is (direct playback) or transcoded; Kyoo supports both. Transcoding happens on the fly, slightly ahead of when a client is expected to request segments. It may be done one segment at a time or in batches, which generally yields better transcoding performance. Once transcoded, segments are cached in the storage backend (filesystem, S3) for a user-configurable duration and eventually removed when they have not been recently accessed. Cleanup runs as a background job, so old segments may not be removed immediately.
The transcoding service is designed to be highly available. When multiple transcoding service instances are deployed and configured properly, users should not notice when one or more instances fail, even if the failed instances were transcoding a video being actively played. This is because Kyoo supports _distributed, parallel transcoding_: the service can be configured so that a minimum number of transcoder instances transcode the same parts of the same video. When multiple instances transcode the same parts of the same video at the same time, only one has to succeed for each segment for transcoding to be successful.
**Owner:** I'm not convinced we want to have an option to configure multiple instances that would transcode the same parts. This makes the code way more complex & aggravates mismatch bugs (if segments are not perfectly cut). The benefit for this complexity (+ wasted compute) is somewhat debatable; I think we could handle service failures without ALWAYS running transcodes 2+ times.

**Contributor (author):** The issue I have with this isn't that service failures cannot be handled at all - it's that they cannot be handled fast enough. A transcoding job essentially amounts to:

When there is a failure in step 2, 3, or 4, the whole process needs to start over. Starting over adds transcoding latency, increasing the time it takes for transcoded segments to become available to clients. When the transcoder takes a relatively long time (larger segment sizes, slower hardware, other streams being processed) and is barely faster than the actual client playback, restarting the job results in buffering. When the application is configured to transcode the same content on multiple instances, this latency is entirely mitigated. IMO the issue of wasted compute shouldn't be considered here: the user should have to explicitly enable/configure parallel transcoding, so they should be the ones to decide whether the compute tradeoff is worth it. As for segment cutting bugs, this approach should not introduce any bugs that would not also be introduced by supporting transcoding job restarts in the first place. If all transcoding for an entire video is not handled by a single atomic job (so that all segments come from a single ffmpeg call), then any segment cutting bug will affect playback anyway.

**Owner:** We upload segments & update the pg metadata after every segment. We could just detect if segment transcoding takes more than
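The "only one instance has to succeed per segment" rule amounts to an idempotent claim: the first writer records the segment, later duplicates are discarded. A minimal sketch, using an in-memory map as a stand-in for what would presumably be an `INSERT ... ON CONFLICT DO NOTHING` against Postgres (all names are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// segmentStore records the first successful transcode of each segment.
// Concurrent duplicates produced by parallel transcoder instances are
// discarded, mirroring INSERT ... ON CONFLICT DO NOTHING in Postgres.
type segmentStore struct {
	mu   sync.Mutex
	urls map[int]string // segment index -> storage URL
}

// recordSegment returns true if this caller won the race for segment idx.
func (s *segmentStore) recordSegment(idx int, url string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, exists := s.urls[idx]; exists {
		return false // another instance already finished this segment
	}
	s.urls[idx] = url
	return true
}

func main() {
	store := &segmentStore{urls: map[int]string{}}
	var wg sync.WaitGroup
	wins := make(chan bool, 2)
	// Two "instances" transcode the same segment in parallel.
	for _, instance := range []string{"transcoder-a", "transcoder-b"} {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			wins <- store.recordSegment(0, "s3://cache/"+name+"/segment-0.ts")
		}(instance)
	}
	wg.Wait()
	close(wins)
	winners := 0
	for won := range wins {
		if won {
			winners++
		}
	}
	fmt.Println(winners) // 1: exactly one instance's segment is kept
}
```

Because the claim is idempotent, it does not matter which instance wins; this only works if parallel segments are bit-compatible in timing, which is what the alignment constraints below (or above, in the doc) exist to guarantee.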

Because no two segments are guaranteed to come from the same transcoder instance, it is critical that all segments are entirely independent of each other and do not overlap. "Parallel segments" (segments covering the same video and the same time range, produced by different instances) must always start with an [I-frame](https://en.wikipedia.org/wiki/Video_compression_picture_types). The start and end times of parallel segments must also match up exactly, with no extra (or missing) frames. Additionally, for interoperability with direct playback, segments must line up exactly with keyframes in the source video. See [here](https://zoriya.dev/blogs/transcoder/) for more information.
**Owner:** Segments also need to start with an I-frame for better seeking by clients (without that, clients might need to fetch the segment before the seeked time [& create another transcode job just to get that one frame]).

## Playback

### Playlist requests
```mermaid
sequenceDiagram
    participant cvp as Client video player
    box Transcoder service (1..N)
        participant api as Web API
        participant jobs as Job worker
    end
    box Backend (HA)
        participant db as Postgres
        participant fs as Storage
    end

    cvp ->> api: Request playlist
    api ->> db: Get video metadata (segmentation times)
    db -->> api: Return result
    critical Metadata not available
        api -->> api: Generate metadata
        api -) db: Cache metadata
    end
    api ->> api: Generate playlist
    api -) db: Create transcoding job for first k segments
    api ->> cvp: Return video playlist
```

**Owner** (on lines +33 to +35, suggested change): we do not wait for the first k segments to be ready to return the playlist, I think this change would highlight it.

**Contributor (author):** Just to clarify the
Are you sure that you'd prefer "Return video playlist" before "Generate playlist"?

### Segment requests
```mermaid
sequenceDiagram
    participant cvp as Client video player
    box Transcoder service (1..N)
        participant api as Web API
        participant jobs as Job worker
    end
    box Backend (HA)
        participant db as Postgres
        participant fs as Storage
    end

    cvp ->> api: Request video segment
    loop Until segment is available
        critical Get segment URL
            api ->> db: Request segment URL (worker, S3)
        option Segment exists, not pending deletion
            db ->> db: Update segment access time<br/>via trigger (for cleanup)
            db -->> api: Return URL
        option Segment pending deletion
            db -->> api: Return not available
        option Segment does not exist, job not in progress
            db ->> db: Create transcoding job for k segments
            db -->> api: Return not available
        end
    end
    api ->> fs: Request segment
    fs -->> api: Return segment
    api -->> cvp: Return segment
```

**Owner** (on lines +42 to +49): fs & jobs are never used in the schema (maybe just remove them?)

**Contributor (author):** Yeah, sure, I can do this on all the diagrams where they aren't in use. I just had all the participants listed to make it a little easier to scroll back and forth between the diagrams.

**Owner:** Yeah, I saw that afterwards. Might be good to keep them, but the document should explain what they are before the schemas. I have no clue what

**Owner** (on the "Until segment is available" loop): In practice we'd wait for an event from pg (using

**Contributor (author):** Yeah, I should change it to this. I wrote this before adding the job completion notifications.

**Owner** (on "Segment pending deletion"): What does

**Contributor (author):** This is a corner case where a segment hasn't been accessed in a while and is in the "pending deletion" part of segment cleanup here. This ensures that there isn't a race condition between DB and S3 state when requesting a segment that is about to disappear.

**Owner:** Then we probably want to handle that exactly like a

### Segment cleanup
```mermaid
sequenceDiagram
    participant cvp as Client video player
    box Transcoder service (1..N)
        participant api as Web API
        participant jobs as Job worker
    end
    box Backend (HA)
        participant db as Postgres
        participant fs as Storage
    end

    loop pg_cron: trigger every time duration d
        db ->> db: Create segment cleanup job
        jobs ->> fs: Get all segments
        fs -->> jobs: Return segments
        loop For each segment
            critical Cleanup old segments
                jobs ->> db: Get last accessed time
            option No record and segment older than expiration time t, or<br/>record exists and access time older than expiration time t
                jobs ->> db: Mark segment as "pending deletion"
                jobs ->> fs: Delete segment
                jobs ->> db: Delete segment record
            end
        end
    end
```

**Owner** (on lines +83 to +95): This looks super complex for what it is. I don't think we need pg_cron (which is an extension that is a pain to install), nor do we need a specific worker for that.

**Contributor (author):** Yeah, you're right. I was on the fence about whether this should be triggered by the DB, or whether workers should have a long-running ticker/goroutine that handles cleanup.

**Owner:** I would keep the current goroutine/timer workflow; let's keep it simple.

### Job creation and processing
```mermaid
sequenceDiagram
    participant cvp as Client video player
    box Transcoder service (1..N)
        participant api as Web API
        participant jobs as Job worker
    end
    box Backend (HA)
        participant db as Postgres
        participant fs as Storage
    end

    api ->> db: Add job details to job type-specific jobs table, if matching job not in progress
    api ->> db: NOTIFY job is available with job type, ID as payload
    db ->> jobs: Forward the NOTIFY payload to all workers

    jobs ->> db: Get job details from job type-specific jobs table
    db -->> jobs: Return job details
    jobs ->> jobs: If job can be accepted (allow for job-specific logic here),<br/>set to "pending" state in thread-safe job processing map
    jobs ->> db: Record worker taking job in the job type-specific processing table,<br/>IF below desired worker count (count matching rows)
    critical Job processing
    option Other workers already processing job
        jobs ->> jobs: Remove job from job processing map
    option Job is cancelled, completed elsewhere
        db ->> jobs: Forward the NOTIFY payload to the listener
        jobs ->> jobs: Cancel job context
    option Job acceptance was successfully recorded
        jobs ->> jobs: Set job state to "processing" in job processing map
        jobs ->> jobs: Process job
        jobs ->> fs: Upload result to storage (if needed)
        jobs ->> db: Update records (if needed)
        loop Retry on failure
            jobs ->> db: Record job completion type (pass, fail), runtime<br/>(jobs table, processing table)
            jobs ->> db: Record error (if any) in job type-specific error table
            jobs ->> db: NOTIFY job is complete with job type, ID as payload
        end
        db -) api: Forward the NOTIFY payload to all API instances
        db -) jobs: Forward the NOTIFY payload to all workers
        jobs ->> jobs: Remove job from processing map
    end
```
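The "thread-safe job processing map" referenced in the diagram could be as simple as a mutex-guarded map keyed by job ID, tracking the pending/processing states; a sketch (the actual states, key format, and method names may differ):

```go
package main

import (
	"fmt"
	"sync"
)

type jobState int

const (
	statePending jobState = iota
	stateProcessing
)

// jobMap tracks which jobs this worker is handling, so duplicate NOTIFY
// deliveries or races between goroutines don't double-process a job.
type jobMap struct {
	mu   sync.Mutex
	jobs map[string]jobState
}

// accept marks a job as pending; returns false if it is already tracked.
func (m *jobMap) accept(id string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, ok := m.jobs[id]; ok {
		return false
	}
	m.jobs[id] = statePending
	return true
}

// start promotes a pending job to processing (once the DB row recording that
// this worker took the job has been inserted successfully).
func (m *jobMap) start(id string) {
	m.mu.Lock()
	m.jobs[id] = stateProcessing
	m.mu.Unlock()
}

// drop removes a job (done, cancelled, or lost to another worker).
func (m *jobMap) drop(id string) {
	m.mu.Lock()
	delete(m.jobs, id)
	m.mu.Unlock()
}

func main() {
	m := &jobMap{jobs: map[string]jobState{}}
	fmt.Println(m.accept("transcode:video42:seg0")) // true
	fmt.Println(m.accept("transcode:video42:seg0")) // false: already tracked
	m.start("transcode:video42:seg0")
	m.drop("transcode:video42:seg0")
	fmt.Println(m.accept("transcode:video42:seg0")) // true: free again after drop
}
```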

### Job tracker cleanup
```mermaid
sequenceDiagram
    participant cvp as Client video player
    box Transcoder service (1..N)
        participant api as Web API
        participant jobs as Job worker
    end
    box Backend (HA)
        participant db as Postgres
        participant fs as Storage
    end

    loop pg_cron: trigger every time duration d
        loop For each job type
            db ->> db: Delete old jobs (cascade delete processing records)
        end
    end
```

### Worker startup
```mermaid
sequenceDiagram
    participant cvp as Client video player
    box Transcoder service (1..N)
        participant api as Web API
        participant jobs as Job worker
    end
    box Backend (HA)
        participant db as Postgres
        participant fs as Storage
    end

    jobs ->> db: LISTEN for job notifications
    jobs ->> db: Look for available (pending) jobs
    jobs ->> jobs: Process pending jobs (see Jobs section)
```

**Owner** (on lines +142 to +178): If we don't have 2+ transcoders running on the same parts, we can almost remove all of that (we'd only need some logic to detect when a transcoder died unexpectedly, I think).

**Contributor (author):** I don't think this is the case. To support multiple instances of the service, with both handling API calls, there must be some form of communication between them. Otherwise, only process-level failures can be handled (and handled downstream). The following failures could not be handled (as examples):

**Owner:** I don't understand how the proposed workflow fixes those issues, ngl. How I see those problems handled:

- Shouldn't the workload be handled by a load balancer?
- This should just fall back to software transcoding; we can't really know in advance if hwaccel will be available.
- I don't think we can (or should) handle that well. This should just error out.
- Updates should be transparent & allow both the old & new ones to work together. We store on db
- This should just error out.
**Reviewer:** Segments aren't necessarily `.ts` files; some versions of HLS support fMP4 (see #542), and one day another segment format could be adopted.