
Commit 8d82025

Introduce dual mode
1 parent 359d305 commit 8d82025

File tree

1 file changed

+128
-47
lines changed

draft-ietf-avtcore-rtp-sframe.md

Lines changed: 128 additions & 47 deletions
@@ -27,6 +27,14 @@ author:
2727
organization: Microsoft
2828
2929

30+
normative:
31+
WebRTC_Encoded_Transform:
32+
author:
33+
org: World Wide Web Consortium
34+
title: WebRTC Encoded Transform
35+
date: 2025-05
36+
target: https://w3c.github.io/webrtc-encoded-transform/
37+
3038
--- abstract
3139

3240
This document describes the RTP payload format of SFrame.
@@ -35,8 +43,8 @@ This document describes the RTP payload format of SFrame.
3543

3644
# Introduction
3745

38-
SFrame {{?I-D.draft-ietf-sframe-enc-01}} describes an end-to-end encryption and authentication mechanism
39-
for media frames in a multiparty conference call, in which central media servers (SFUs) can access the
46+
SFrame {{!RFC9605}} describes an end-to-end encryption and authentication mechanism
47+
for media data in a multiparty conference call, in which central media servers (SFUs) can access the
4048
media metadata needed to make forwarding decisions without having access to the actual media.
4149

4250
This document describes how to packetize a media frame encrypted using SFrame into RTP packets.
@@ -48,77 +56,150 @@ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
4856
document are to be interpreted as described in BCP 14 {{!RFC2119}} {{!RFC8174}}
4957
when, and only when, they appear in all capitals, as shown here.
5058

59+
# SFrame format
60+
61+
An SFrame ciphertext comprises a header and encrypted data.
62+
The SFrame header varies in size from 1 to 17 bytes.
63+
The encrypted data can be of arbitrary length and is larger than the unencrypted data by a fixed overhead that depends on the encryption algorithm.
64+
The overhead can be up to 16 bytes.
5165

52-
# RTP Packetization of a media frame encrypted by SFrame
66+
Because an SFrame ciphertext can be arbitrarily long, an application may decide to partition the data to be encrypted with SFrame into pieces
67+
small enough that each resulting SFrame ciphertext fits in a single RTP packet.
68+
We call this per-packet SFrame.
69+
This has the advantage that the content of each packet can be decrypted as soon as it is received.
5370

54-
In order to packetize SFrame into RTP, packetization is done in 2 stages.
55-
In the first stage, before SFrame encryption, media is packetized into RTP packets in a way specific to the media format.
56-
In the second stage, each RTP packet from the first stage is packetized into RTP packets in a way specific to SFrame.
57-
SFrame encryption is applied to the payload of each RTP packet between the first and second stages.
71+
An alternative is to encrypt the data, a media frame typically, and send the SFrame ciphertext over several RTP packets.
72+
We call this per-frame SFrame.
73+
This has the advantage of limiting the SFrame overhead, especially for video frames.
74+
This alternative is also compatible with {{WebRTC_Encoded_Transform}}, which is important for backward compatibility of existing services.
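
The overhead difference between the two modes can be illustrated with a rough calculation. The sketch below is not part of the specification: the function name, the default sizes (the worst-case 17-byte SFrame header and 16-byte encryption overhead quoted above) and the `max_payload` parameter are assumptions for illustration, and media-format-specific payload headers are ignored.

```python
import math

def sframe_overhead(frame_len, max_payload, header_len=17, tag_len=16):
    """Return (per_packet, per_frame) total overhead in bytes for one media frame.

    Hypothetical helper: header_len and tag_len default to the worst-case
    sizes quoted in the text; max_payload is the assumed RTP payload budget.
    """
    rtp_hdr = 1  # one-byte SFrame RTP header carried by every packet

    # Per-packet SFrame: each packet carries its own SFrame header and tag.
    unit = header_len + tag_len + rtp_hdr
    room = max_payload - unit  # media bytes that fit in one packet
    if room <= 0:
        raise ValueError("max_payload too small for per-packet SFrame")
    per_packet = math.ceil(frame_len / room) * unit

    # Per-frame SFrame: one SFrame header and tag for the whole frame,
    # plus one SFrame RTP header byte per fragment.
    ciphertext_len = frame_len + header_len + tag_len
    fragments = math.ceil(ciphertext_len / (max_payload - rtp_hdr))
    per_frame = header_len + tag_len + fragments * rtp_hdr

    return per_packet, per_frame

print(sframe_overhead(30000, 1200))
```

Under these assumptions, a 30000-byte video frame sent in 1200-byte payloads costs 884 bytes of overhead in per-packet mode against 59 bytes in per-frame mode.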
5875

59-
For example, if a media frame to be encrypted by SFrame is encoded using VP8, the media frame is first
60-
packetized according to {{!RFC7741}} into one RTP packets with VP8-specific payloads. Each of those
61-
VP8 RTP payloads are then encrypted using SFrame, resulting in an SFrame-encrypted RTP payload of VP8.
62-
SFrame-specific packetization is then applied to the SFrame-encrypted RTP payload of VP8, resulting in
63-
RTP packets with SFrame-specific RTP payloads.
76+
The RTP format presented in this document supports both alternatives.
6477

65-
SFrame-specific packetization is done by first breaking up the output of SFrame encryption
66-
into fragments, and then prepending some fragment metadata necessary for depacketization. Finally,
67-
fragments are combined with values from the RTP header of the output of the media-format-specific
68-
packetization.
78+
# RTP Header Usage
6979

70-
The SFrame-specific RTP payloads (fragments with prepended metadata) have the following format:
80+
The general RTP payload format for SFrame is depicted below.
7181

7282
~~~
73-
0 1 2
74-
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
75-
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
76-
|L| media PT | media frame ID |
77-
| fragment index | fragment ... |
78-
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
83+
0 1 2 3
84+
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
85+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
86+
|V=2|P|X| CC |M| PT | sequence number |
87+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
88+
| timestamp |
89+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
90+
| synchronization source (SSRC) identifier |
91+
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
92+
| contributing source (CSRC) identifiers |
93+
| .... |
94+
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
95+
|S E x x x x x x| |
96+
| |
97+
: SFrame payload :
98+
| |
99+
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
100+
| : OPTIONAL RTP padding |
101+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
79102
~~~
80103

81-
The media PT must be the payload type of the output of the media-format-specific packetization.
82-
The frame index of the first fragment of each media frame MUST be 0.
83-
The frame index of each subsequent fragment MUST be one more than the previous fragment.
84-
The L bit MUST be 0 for all fragments except for the last one of the media frame.
85-
The media frame ID must be unique enough that a depacketizer may be able to differentiate
86-
the fragments of one media frame from another.
104+
The first byte of the RTP payload is the SFrame RTP header.
105+
106+
The S bit of the SFrame RTP header MUST be 0 for all fragments except for the first one of the SFrame frame.
107+
108+
The E bit of the SFrame RTP header MUST be 0 for all fragments except for the last one of the SFrame frame.
109+
110+
The 6 remaining bits of the SFrame RTP header are reserved for future use.
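
As an illustration of the one-byte SFrame RTP header, the following sketch builds and parses it. It assumes that the "first bit" S is the most significant bit of the byte (network bit order, as in the diagram above) and that reserved bits are written as zero and ignored on receipt; neither the function names nor the handling of reserved bits come from this document.

```python
S_BIT = 0x80  # assumed: S is the most significant bit of the header byte
E_BIT = 0x40  # assumed: E is the second most significant bit

def build_sframe_rtp_header(start, end):
    """Return the one-byte SFrame RTP header with reserved bits set to zero."""
    return bytes([(S_BIT if start else 0) | (E_BIT if end else 0)])

def parse_sframe_rtp_header(payload):
    """Return (S, E) read from the first byte of an RTP payload."""
    if not payload:
        raise ValueError("empty RTP payload")
    first = payload[0]
    # Reserved bits are ignored here; that is an assumption of this sketch.
    return bool(first & S_BIT), bool(first & E_BIT)
```

For example, a packet carrying a complete SFrame ciphertext would start with the byte `0xC0`, and a middle fragment with `0x00`.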
111+
112+
The payload type (PT) identifies the format of the media encrypted with SFrame.
113+
87114
The SSRC, timestamp, marker bit, and CSRCs of the SFrame RTP packets MUST be the same
88115
as those of the output of the media-format-specific packetization.
89116
The header extensions of the SFrame RTP packets SHOULD be the same
90117
as those of the output of the media-format-specific packetization, but some may be omitted
91118
if it is known that the omitted header extensions do not need to be duplicated on each SFrame RTP packet.
92-
The payload type of the SFrame RTP packets must be a payload type that indicates the payload
93-
format defined in this document, and it must have a negotiated RTP clock rate that is the same as the
94-
media-format-specific RTP packet.
119+
120+
# RTP Packetization of SFrame
121+
122+
SFrame packets can be generated either from RTP media packets or from media frames as defined by {{WebRTC_Encoded_Transform}}.
123+
124+
For per-packet SFrame, the following processing is done, with a media frame as input:
125+
126+
1. Generate a group of RTP media packets from the media frame using a media-format-specific packetizer.
127+
The media-format-specific packetizer needs to be made aware of the SFrame overhead added to each RTP packet.
128+
2. For each RTP packet of the group, encrypt its payload with SFrame.
129+
3. Prepend to each RTP packet payload an SFrame RTP header with the S and E bits set to 1.
130+
4. Send each RTP packet of the group.
131+
132+
For per-frame SFrame, the following processing is done, with a media frame as input:
133+
134+
1. Generate an SFrame ciphertext from the media frame data.
135+
2. Fragment the SFrame ciphertext into a group of payloads so that the RTP packets generated from them do not exceed the network maximum transmission unit (MTU) size.
136+
3. Prepend a zero byte as the SFrame RTP header to all payloads of the group.
137+
4. Set the first bit S of the SFrame RTP header of the first packet to 1.
138+
5. Set the second bit E of the SFrame RTP header of the last packet to 1.
139+
6. Generate a group of RTP packets from the group of payloads, using the media frame to generate the RTP header, including RTP header extensions.
140+
7. Send each RTP packet of the RTP packet group.
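
The two packetization procedures above can be sketched as follows. This is a minimal illustration, not a reference implementation: SFrame encryption is passed in as a hypothetical `encrypt` callable, `max_payload` is an assumed per-packet payload budget, RTP header generation is left out, and S and E are assumed to be the two most significant bits of the header byte.

```python
def packetize_per_frame(ciphertext, max_payload):
    """Split one SFrame ciphertext into RTP payloads (per-frame SFrame)."""
    chunk = max_payload - 1  # leave room for the one-byte SFrame RTP header
    if chunk <= 0:
        raise ValueError("max_payload must be at least 2")
    pieces = [ciphertext[i:i + chunk] for i in range(0, len(ciphertext), chunk)]
    payloads = []
    for index, piece in enumerate(pieces):
        header = 0x00                 # start from a zero header byte
        if index == 0:
            header |= 0x80            # S bit on the first fragment
        if index == len(pieces) - 1:
            header |= 0x40            # E bit on the last fragment
        payloads.append(bytes([header]) + piece)
    return payloads

def packetize_per_packet(media_payloads, encrypt):
    """Encrypt each media RTP payload on its own (per-packet SFrame).

    `encrypt` is a hypothetical callable standing in for SFrame encryption.
    """
    # Each packet holds a complete ciphertext, so S and E are both set.
    return [bytes([0xC0]) + encrypt(payload) for payload in media_payloads]
```

A ciphertext that fits in one payload yields a single packet with both bits set, so a receiver can treat the two modes uniformly when regrouping fragments.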
95141

96142
# RTP depacketization of SFrame
97143

98-
Depacketization is done by doing the packetization process in reverse:
144+
Reception of SFrame packets is done as follows:
145+
146+
1. The fragments of a given SFrame ciphertext are grouped together in order of the RTP sequence number,
147+
the first packet of the group having its S bit set to 1 and the last packet of the group having its E bit set to 1.
148+
All packets in between the first and last need to be in the group.
149+
2. Concatenate the payloads of all packets of the group to form the SFrame ciphertext.
150+
3. Decrypt the SFrame ciphertext to obtain the media decrypted data.
151+
4. If per-packet SFrame is being used, the following processing is done:
152+
1. Assert that the group of packets consists of a single packet.
153+
2. Set the media decrypted data as the payload of the packet and send the packet to the media-format-specific RTP depacketizer.
154+
3. If the depacketizer cannot generate a media frame yet, abort these steps. Otherwise, generate a media frame from the depacketizer.
155+
5. If per-frame SFrame is being used, the following processing is done:
156+
1. Assert that the group of packets all have the same payload type.
157+
2. Extract the media metadata from the group of packets.
158+
3. Generate a media frame from the media decrypted data and the media metadata.
159+
6. Send the media frame to the receiving pipeline.
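
Steps 1 and 2 of the reception procedure above (grouping and concatenation) can be sketched as follows. The function name and the `(sequence_number, payload)` input shape are assumptions for illustration; sequence number wraparound, jitter buffering, decryption and media depacketization are deliberately left out.

```python
def reassemble(packets):
    """Return the SFrame ciphertexts recovered from (seq, payload) pairs.

    Hypothetical helper: groups fragments from an S-marked packet to an
    E-marked packet and drops any group with a missing sequence number.
    """
    ciphertexts = []
    current = None   # fragments of the group being built, or None
    prev_seq = None
    for seq, payload in sorted(packets, key=lambda p: p[0]):
        if not payload:
            continue  # no SFrame RTP header byte to inspect
        start = bool(payload[0] & 0x80)  # S bit (assumed most significant)
        end = bool(payload[0] & 0x40)    # E bit
        if start:
            current = []
        elif current is None or seq != prev_seq + 1:
            current = None               # gap or missing first fragment
            prev_seq = seq
            continue
        current.append(payload[1:])      # strip the SFrame RTP header byte
        prev_seq = seq
        if end:
            ciphertexts.append(b"".join(current))
            current = None
    return ciphertexts
```

Each returned ciphertext would then be handed to SFrame decryption, after which the per-packet or per-frame branch of the procedure applies.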
160+
161+
# SFrame SDP negotiation
162+
163+
SFrame packetization is indicated via a new "a=sframe" SDP attribute defined in this specification.
164+
This attribute is used at the media level; it does not appear at the session level.
165+
166+
The presence of the "a=sframe" attribute in a media section (in either an offer or an answer) indicates that the
167+
endpoint is expecting to receive RTP packets encrypted with SFrame for that media section, as defined below.
99168

100-
1. The fragments of a given media frame ID are grouped together in order of fragment index and concatenated together, resulting in a media frame encrypted by SFrame.
169+
Once each peer has verified that the other party expects to receive SFrame RTP packets, senders are expected to send SFrame encrypted RTP packets.
170+
If one peer expects to use SFrame for a media section and identifies that the other peer does not support it, the peer
171+
is expected to stop the transceiver associated with the media section, which will generate a zero port for that m-section.
101172

102-
2. The media frame is decrypted using SFrame, resulting in a media-format-specific RTP payload.
173+
When SFrame is in use for that media section, it will apply to the relevant media encodings defined for that media section.
174+
This includes RTP payload types bound to media packetizers and media depacketizers as defined in {{!RFC7656}}, typically audio formats such as Opus and RTP video formats such as H264.
175+
This notably includes RTP payload types representing {{WebRTC_Encoded_Transform}} [encoded video frame formats](https://w3c.github.io/webrtc-encoded-transform/#dom-rtcencodedvideoframe-data) and [encoded audio frame formats](https://w3c.github.io/webrtc-encoded-transform/#dom-rtcencodedaudioframe-data).
103176

104-
3. The media-format-specific RTP payload is combined with the RTP headers of the RTP packet with fragment index 0, resulting in a media-format-specific RTP packet.
105-
The "media PT" from the SFrame RTP payload header is used as the payload type of the media-format-specific RTP packet.
177+
This does not include RTP-Based Redundancy mechanisms as defined in {{!RFC7656}}.
178+
For instance, RTX defined in {{!RFC4588}} will retransmit SFrame-based packets.
179+
Forward error correction formats as defined in {{!RFC5109}} will protect the encrypted content.
180+
For Redundant Audio Data, known as RED, as defined in {{!RFC2198}}, a RED packetizer will take as input SFrame encrypted media data instead of unencrypted media data.
106181

107-
4. The media-format-specific RTP packet is passed into a media-format-specific RTP depacketizer, resulting in a media frame.
182+
If BUNDLE is in use and the "a=sframe" attribute is present for a media section but not for another media section of the same BUNDLE,
183+
payload types for media encodings that are relevant for SFrame MUST NOT be reused between the two media sections.
108184

185+
Questions:
109186

110-
# SFrame payload type negotiation
187+
1. Should we specify how RTX/FEC works with SFrame packetization? No impact AFAIK since RTX/FEC would work on packets (whether SFramed or not).
188+
2. Is the current RED proposal (transmit SFrame ciphertext blocks) good enough? An alternative is to have SFrame being applied on the entire RED packet payload.
189+
3. Should we allow `a=sframe` at session level to mean that all media sections want sframe?
111190

112-
Because the payload type of an RTP packet that results from SFrame-specific packetization must match the
113-
clock rate of the payload type of the RTP packet that results from media-format-specific packetization,
114-
it may be necessary to negotiate more than one SFrame payload type. For example, if one were to use SDP
115-
to negotiate payload types, the following payload types could be negotiated with different clock rates:
191+
Here is an example of SFrame being negotiated for audio (opus and CN) and for video (H264 and VP8):
116192

117193
~~~
118-
m=audio 50000 RTP/SAVPF 96
119-
a=rtpmap:96 sframe/48000
120-
m=video 50002 RTP/SAVPF 97
121-
a=rtpmap:97 sframe/90000
194+
m=audio 50000 RTP/SAVPF 10 11
195+
a=sframe
196+
a=rtpmap:10 opus/48000/2
197+
a=rtpmap:11 CN/8000
198+
199+
m=video 50002 RTP/SAVPF 100 101
200+
a=sframe
201+
a=rtpmap:100 H264/90000
202+
a=rtpmap:101 VP8/90000
122203
~~~
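
To illustrate the media-level scope of the attribute, the following sketch reports which media sections of an SDP description carry `a=sframe`. It is a simplified line scan rather than a full SDP parser, and the function name is an assumption; an `a=sframe` line appearing before the first `m=` line is ignored, in line with the media-level-only rule above.

```python
def sframe_media_sections(sdp):
    """Return a list of (media_kind, has_sframe) pairs, one per m= section."""
    sections = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # A new media section starts; its kind is the first m= field.
            sections.append([line[2:].split()[0], False])
        elif line == "a=sframe" and sections:
            sections[-1][1] = True
    return [(kind, flag) for kind, flag in sections]

example = """\
m=audio 50000 RTP/SAVPF 10 11
a=sframe
a=rtpmap:10 opus/48000/2
a=rtpmap:11 CN/8000
m=video 50002 RTP/SAVPF 100 101
a=rtpmap:100 H264/90000
a=rtpmap:101 VP8/90000
"""
print(sframe_media_sections(example))
```

In this variant of the example, only the audio section requests SFrame, so an endpoint would have to apply the BUNDLE payload type rule above if both sections shared a transport.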
123204

124205
# Security Considerations
