diff --git a/proposals/4140-delayed-events-futures.md b/proposals/4140-delayed-events-futures.md new file mode 100644 index 00000000000..50d28027238 --- /dev/null +++ b/proposals/4140-delayed-events-futures.md @@ -0,0 +1,677 @@ +# MSC4140: Cancellable delayed events + +Scheduling messages to be sent at a defined later time is a feature present on a number of other +messaging platforms such as [Teams] or [Telegram]. This mechanism has a wide range of possible +applications such as tea timers, reminders, self-destructing messages (where the redaction is +scheduled to be sent later), or other ephemeral events such as temporary power level changes. + +[Teams]: https://support.microsoft.com/en-us/office/schedule-chat-messages-in-microsoft-teams-2fc5ea77-7bb4-4511-8f59-e62bac1c0f6a +[Telegram]: https://telegram.org/blog/scheduled-reminders-themes + +Another possible use case is reliable "hang up" events in VoIP calls. [MSC4143], for instance, +communicates call membership through timeline events. Leaving a call requires sending a new +event. The leaving client may not be able to send this event on its own, however, as it may lose +connectivity due to network issues or because the application was shut down unexpectedly. +In this situation it would be helpful if the client could schedule its "hang up" event to be +sent by the server at a defined later time. While the client is still connected and in the call, +it could repeatedly push the scheduled time forward as a kind of "heartbeat" mechanism. If the +client then loses connectivity, the server would emit the "hang up" event at the scheduled time +resulting in reliable call membership status for other participants. + +[MSC4143]: https://github.com/matrix-org/matrix-spec-proposals/pull/4143 + +This proposal caters to the use cases described above and introduces a mechanism by which a +Matrix client can schedule "delayed events" which will be sent into a room at a later time by +the homeserver. 
This includes APIs for scheduling delayed events and managing scheduled delayed +events as well as a way to delegate the management of a delayed event to external services such as +[Selective Forwarding Units (SFUs)]. + +[Selective Forwarding Units (SFUs)]: https://trueconf.com/blog/wiki/sfu + +## Proposal + +The following operations are added to the Client-Server API: + +- Schedule an event to be sent at a later time +- Retrieve details on scheduled and finalised delayed events + - A delayed event is said to be "finalised" when it has been sent, + or has been cancelled due to user action or an error on the attempt to send the event. +- Restart the timer of a scheduled delayed event +- Send a scheduled delayed event immediately +- Cancel a scheduled delayed event so that it is never sent + +At the point of an event being scheduled, the homeserver is unable to allocate the event ID[^eventId]. +Instead, the homeserver allocates a `delay_id` to the scheduled event which is used during the above API operations. + +### Scheduling a delayed event + +A new authenticated Client-Server API endpoint at +`PUT /_matrix/client/v3/rooms/{roomId}/delayed_event/{eventType}/{txnId}` allows clients to +schedule the sending of a message or state event. + +The path parameters of this endpoint are the same as for the existing +[`PUT /_matrix/client/v3/rooms/{roomId}/send/{eventType}/{txnId}`]( +https://spec.matrix.org/v1.18/client-server-api/#put_matrixclientv3roomsroomidsendeventtypetxnid) +endpoint. It also makes use of [transaction identifiers]( +https://spec.matrix.org/v1.18/client-server-api/#transaction-identifiers), +and supports [timestamp massaging](https://spec.matrix.org/latest/application-service-api/#timestamp-massaging) +when called by an application service. + +The body for requests to this endpoint is a JSON object containing the following fields: + +- `delay_ms` - Required. A positive non-zero number of milliseconds the homeserver should wait before sending the event. 
+- `state_key` - The state key for the event to be sent, if it is to be a state event; absent otherwise. +- `content` - Required. The content of the event to be sent. + +The homeserver schedules the event to be sent with the specified delay and responds with an +opaque `delay_id` field (omitting the `event_id` as it is not available): + +```http +200 OK +Content-Type: application/json + +{ + "delay_id": "1234567890" +} +``` + +The `delay_id` is an [opaque identifier](https://spec.matrix.org/v1.18/appendices/#opaque-identifiers) +generated by the homeserver. +It MUST be globally unique and SHOULD be cryptographically secure (in the sense that it is infeasible to predict). +This way, the `delay_id` alone may be provided to approved external services to control the delayed event through the +unauthenticated management endpoints introduced [below](#managing-scheduled-delayed-events), +with the cryptographic security of the `delay_id` preventing unapproved access to the delayed event. + +The homeserver MUST NOT send the event before the scheduled time. +To support batch sending, homeservers MAY add up to 30 seconds to the scheduled send time. +Note: clients might find that their events are delayed further due to server load and similar conditions. + +The homeserver MUST persist scheduled delayed events such that they will not be lost by the homeserver being restarted. +Moreover, when a homeserver restarts, it MUST scan for all scheduled delayed events whose send time has already passed +(i.e. delayed events that were scheduled to be sent while the homeserver was offline), +and send those delayed events as soon as possible, in chronological order of their scheduled send times. + +The homeserver MAY enforce a maximum allowed delay for delayed events. 
+If a requested delay exceeds this maximum, the homeserver will respond with HTTP 400 +and a [standard error response](https://spec.matrix.org/v1.18/client-server-api/#standard-error-response) +with an `errcode` of `M_INVALID_PARAM`. + +The homeserver SHOULD apply rate limiting to the scheduling of delayed events to provide mitigation against the +[Resource Exhaustion](https://spec.matrix.org/v1.18/appendices/#threat-resource-exhaustion) threat. + +The homeserver SHOULD enforce a limit of how many delayed events a user may have scheduled at once +to provide mitigation against both the +[High Volume of Messages](https://spec.matrix.org/v1.18/appendices/#threat-high-volume-of-messages) and +[Resource Exhaustion](https://spec.matrix.org/v1.18/appendices/#threat-resource-exhaustion) threats. +If a user's request to schedule a delayed event would exceed this limit, the homeserver will respond with HTTP 429, +a [standard error response](https://spec.matrix.org/latest/client-server-api/#standard-error-response) +with an `errcode` of `M_LIMIT_EXCEEDED`, and a `Retry-After` header whose value is set to the time of/until +the scheduled send time of the next of the user's delayed events to be sent, +rounded up to the nearest second. + +```http +429 Too Many Requests +Content-Type: application/json +Retry-After: 1200 + +{ + "errcode": "M_LIMIT_EXCEEDED", + "error": "The maximum number of delayed events has been reached." +} +``` + +As a special case, if the homeserver has set either of these limits such that scheduling delayed events is disallowed +(i.e. it sets the maximum allowed delay to 0 seconds, or a limit of 0 scheduled delayed events per user), it may respond +to event scheduling requests with HTTP 400 and a standard error response with an `errcode` of `M_UNKNOWN`. 
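The request shape described in this section can be sketched from the client's side as follows. This is a minimal, illustrative sketch and not a reference implementation: the helper names `build_schedule_request` and `retry_after_seconds` and the base URL used with them are hypothetical, and only the endpoint path, the body fields, and the `Retry-After` handling are taken from the text above. No network request is made.

```python
import json
from urllib.parse import quote


def build_schedule_request(base_url, room_id, event_type, txn_id,
                           delay_ms, content, state_key=None):
    """Return (method, url, body) for scheduling a delayed event.

    Hypothetical helper: only the path and body fields come from the proposal.
    """
    # The proposal requires a positive, non-zero delay.
    if isinstance(delay_ms, bool) or not isinstance(delay_ms, int) or delay_ms <= 0:
        raise ValueError("delay_ms must be a positive integer")
    url = "{}/_matrix/client/v3/rooms/{}/delayed_event/{}/{}".format(
        base_url.rstrip("/"),
        quote(room_id, safe=""),
        quote(event_type, safe=""),
        quote(txn_id, safe=""),
    )
    body = {"delay_ms": delay_ms, "content": content}
    # state_key is present only for state events (it may be an empty string).
    if state_key is not None:
        body["state_key"] = state_key
    return "PUT", url, json.dumps(body)


def retry_after_seconds(status, headers):
    """Return the number of seconds to wait after a 429, or None.

    Assumes the header carries a delay in seconds, as in the 429 example
    above; an HTTP-date value is not handled in this sketch.
    """
    if status != 429:
        return None
    value = headers.get("Retry-After")
    if value is not None and value.isdigit():
        return int(value)
    return None
```

A client following this sketch would wait for the returned number of seconds before retrying, since by then at least one of the user's scheduled delayed events is expected to have been sent.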
+ +#### Delayed event limits as a capability + +The values of both the maximum allowed delay and the maximum allowed number of scheduled events are advertised as a +[capability](https://spec.matrix.org/v1.18/client-server-api/#capabilities-negotiation) named `m.delayed_events`, via +the values of fields named `max_delay_ms` and `max_scheduled` respectively. +If the server doesn't enforce one of these limits, its representative field MUST be absent from the capability. +If the server enforces none of these limits, the capability MAY be omitted entirely instead of having an empty body. + +For example, the following specifies a maximum allowed delay of 24 hours and a per-user limit of 10 delayed events: + +```json +{ + "capabilities": { + "m.delayed_events": { + "max_delay_ms": 86400000, + "max_scheduled": 10 + } + } +} +``` + +### Managing scheduled delayed events + +A set of new unauthenticated Client-Server API endpoints at +`POST /_matrix/client/v1/delayed_events/{delay_id}/{action}` allows clients to +manage the sending of scheduled delayed events. + +The `action` specifies the management action to take on the scheduled delayed event with the specified `delay_id`. +The supported `action`s are the following: + +- `send` - Send the delayed event immediately instead of waiting for its scheduled send time. +- `cancel` - Cancel the delayed event so that it is never sent. +- `restart` - Reset the delayed event's scheduled send time to be the current time + its original `delay_ms`. + +For example, the following would send the delayed event with `delay_id` `1234567890` immediately: + +```http +POST /_matrix/client/v1/delayed_events/1234567890/send +Content-Type: application/json + +{ +} +``` + +These endpoints are unauthenticated so that control over a particular scheduled delayed event may be +[delegated to an external service](#delegating-scheduled-delayed-events) +by sharing the target delayed event's `delay_id` with the service. 
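To make the three actions concrete, the following sketch builds the management URL and models how `restart` moves the send time. The names `ACTIONS`, `management_url`, and `restarted_send_time` and the base URL are hypothetical; only the path layout and the "current time + original `delay_ms`" rule come from the text above.

```python
from urllib.parse import quote

# The three management actions defined by the proposal.
ACTIONS = ("send", "cancel", "restart")


def management_url(base_url, delay_id, action):
    """Build the URL of a management request (a POST with an empty JSON object)."""
    if action not in ACTIONS:
        raise ValueError("unsupported action: {!r}".format(action))
    return "{}/_matrix/client/v1/delayed_events/{}/{}".format(
        base_url.rstrip("/"), quote(delay_id, safe=""), action
    )


def restarted_send_time(now_ms, delay_ms):
    """Send time after `restart`: the current time plus the original delay."""
    return now_ms + delay_ms
```

Because no access token is attached, whoever holds the `delay_id` can build and call these URLs, which is the property that delegation relies on.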
+ +Where the `action` is `send`, the homeserver SHOULD apply rate limiting to provide mitigation against the +[High Volume of Messages](https://spec.matrix.org/v1.18/appendices/#threat-high-volume-of-messages) threat. + +For all `action`s, the homeserver SHOULD apply rate limiting to provide mitigation against the +[Resource Exhaustion](https://spec.matrix.org/v1.18/appendices/#threat-resource-exhaustion) threat. + +If no delayed event with the specified `delay_id` can be found, +the homeserver will respond with HTTP 404 +and a [standard error response](https://spec.matrix.org/latest/client-server-api/#standard-error-response) +with an `errcode` of `M_NOT_FOUND`. + +On success, the homeserver will respond with HTTP 200 +and a response body of an empty object. A future MSC may define additional keys, such as returning +the event ID for an `action` of `send` or the new expected send time for an `action` of `restart`. + +To allow safely retrying requests, the homeserver will respond with success +if the target delayed event is already finalised with an outcome that matches the `action`, i.e. +if the `action` is `send` and the delayed event has already been sent, +or if the `action` is `cancel` and the delayed event has already been cancelled +(either due to user action or an error). + +If the target delayed event is already finalised with an outcome that conflicts with the `action`, i.e. +if the `action` is `send` or `restart` and the delayed event has already been cancelled, +or if the `action` is `cancel` and the delayed event has already been sent, +the homeserver will respond with HTTP 409 +and a [standard error response](https://spec.matrix.org/latest/client-server-api/#standard-error-response) +with an `errcode` of `M_UNKNOWN`. + +If the action is `send` and the delayed event is unable to be sent due to an error, +the homeserver will respond with that error (e.g. 
HTTP 403 +and a [standard error response](https://spec.matrix.org/latest/client-server-api/#standard-error-response) +with an `errcode` of `M_FORBIDDEN` if the user doesn't have permission to send the event at the time of sending, +or HTTP 429 if the user has exceeded rate limits for sending room events at that time), +as if the request had been to send the event as a non-delayed event with either the `/send` or `/state` endpoint. +The homeserver SHOULD keep the delayed event scheduled, to account for the fact that the cause of the error +may resolve by the time of the delayed event's scheduled send time, +and to allow retries of the `send` action until then. + +#### Delegating scheduled delayed events + +It is useful for external services to also interact with scheduled delayed events. +If a client disconnects, an external service can be the best source to send the delayed event/"last will". + +To permit this, the `delay_id` that uniquely identifies a delayed event also behaves as a scoped access token +that only allows interaction with the `POST /delayed_events/{delay_id}/{action}` endpoints on that specific `delay_id`. + +With this, an SFU that tracks the current client connection state could be given the power to control the delayed event. +The client would share the `delay_id` and the required details, so that the SFU can call the +`POST /delayed_events/{delay_id}/restart` endpoint while a user is connected +and can call the `POST /delayed_events/{delay_id}/send` endpoint once the user disconnects. +This way, the SFU can be used as the source of truth for the call membership events without knowing anything about +the Matrix call. + +Since the SFU has a much lower chance of running into a network issue, +`POST /delayed_events/{delay_id}/restart` calls may be sent much less frequently. +Instead of calling that endpoint every couple of seconds, a delayed event's +timeout can be set to be long (e.g. 
6 hours), as the SFU can be expected to not forget sending the +`POST /delayed_events/{delay_id}/send` requests when it detects a disconnecting client. + +### Getting delayed events + +A set of new authenticated Client-Server API endpoints allows clients to look up +both scheduled and finalised delayed events owned by the requesting user. + +The homeserver SHOULD apply rate limiting to these endpoints to provide mitigation against the +[Resource Exhaustion](https://spec.matrix.org/v1.18/appendices/#threat-resource-exhaustion) threat. +They most likely require (dependent on the implementation) serialization steps +and can be used to slow down the homeserver. + +#### Getting a single delayed event + +A new authenticated Client-Server API endpoint at +`GET /_matrix/client/v1/delayed_events/{delay_id}` responds with +details on the delayed event with the specified `delay_id` owned by the requesting user. + +If no such delayed event can be found, the homeserver will respond with HTTP 404 +and a [standard error response](https://spec.matrix.org/latest/client-server-api/#standard-error-response) +with an `errcode` of `M_NOT_FOUND`. + +On success, the homeserver will respond with HTTP 200 and a JSON object containing the following fields: + +- `delay_id` - Required. The ID of the delayed event. +- `room_id` - Required. The ID of the room that the delayed event was scheduled to be sent in. +- `type` - Required. The event type of the delayed event. +- `state_key` - The state key of the delayed event if it is a state event; absent otherwise. +- `delay_ms` - Required. The delay in milliseconds after the point of scheduling that the event is/was to be sent at. +- `scheduled_at` - Required. The timestamp (as Unix time in milliseconds) when the delayed event was scheduled or + last restarted. +- `content` - Required. The content of the delayed event. + This is the body of the original `PUT` request, not a preview of the full event after sending. 
+- `error` - Present only for finalised events that were cancelled due to an error. + The [standard error response](https://spec.matrix.org/v1.18/client-server-api/#standard-error-response) + of the error that prevented the delayed event from being sent. +- `event_id` - The `event_id` this event got in case it was sent. +- `finalised_ts` - The timestamp (as Unix time in milliseconds) when the event was finalised; + absent if it is still scheduled. + Using [timestamp massaging](https://spec.matrix.org/latest/application-service-api/#timestamp-massaging) + does not affect the value of this field. + +Whether a delayed event is still scheduled, or has been sent, failed to be sent due to an error, or was cancelled +can be determined by examining which of the optional fields are present in the response object: + +- If `finalised_ts` is absent, then the delayed event is still scheduled. +- Otherwise, if `event_id` is present, then the delayed event has been sent. + - If `finalised_ts` < `scheduled_at` + `delay_ms`, then the event was sent manually by + [the `/send` endpoint](#managing-scheduled-delayed-events); otherwise, it was sent on its scheduled send time. +- Otherwise, if `error` is present, then the delayed event failed to be sent (and was descheduled) due to an error. +- Otherwise, the delayed event was cancelled by [the `/cancel` endpoint](#managing-scheduled-delayed-events). + +#### Getting a list of delayed events + +A new authenticated Client-Server API endpoint at +`GET /_matrix/client/v1/delayed_events` responds with +a list of details about scheduled delayed events owned by the requesting user. + +Delayed events are returned in chronological order of their intended send time, which is `scheduled_at` + `delay_ms`. + +On success, the response is HTTP 200 and a JSON object containing the following fields: + +- `delayed_events` - An array of objects describing delayed events owned by the requesting user. 
+ These objects contain the same fields as the object returned by + [the single-item lookup](#getting-a-single-delayed-event), + except for the fields exclusive to finalised delayed events (`error`, `event_id`, and `finalised_ts`). + +```http +200 OK +Content-Type: application/json + +{ + "delayed_events": [ + { + "delay_id": "...", + "room_id": "!roomid:example.com", + "type": "m.room.message", + "delay_ms": 5500, + "scheduled_at": 1721732853284, + "content": { + "msgtype": "m.text", + "body": "I am now offline" + } + }, + ... + ] +} +``` + +#### Retention of finalised delayed events + +The number of finalised events that stay on the homeserver can be chosen by the homeserver. +The recommended strategy is to retain finalised events for up to 7 days or 1000 events per user, +whichever occurs first. + +There is no guarantee for a client that events will be available +if they exceed the limits of their homeserver. +Additionally, a homeserver MAY discard finalised delayed events that have been returned by a +`GET /_matrix/client/v1/delayed_events/{delay_id}` response. + +### Additional homeserver behaviour + +#### `delay_id` in `unsigned` event data +The `delay_id` of a sent delayed event MUST be included in the resulting room event's `unsigned` data +if, and only if, the client being given the event is authenticated as the event's sender. + +#### Power levels are evaluated at the point of sending + +Power levels are evaluated for each event only once the delay has elapsed and the event is about to be distributed/inserted into the +DAG. This implies a delayed event can fail if it violates power levels at the time the delay passes. + +Conversely, it's also possible to successfully schedule an event that the user has no permission to send at the time of +scheduling. If the power level situation has changed by the time the delay passes, the event can even reach the DAG. 
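The timing of the power level check can be illustrated with a small sketch. It is a simplified model under stated assumptions: it looks only at the content of an `m.room.power_levels` event (using the defaults that apply when such an event is present) and ignores every other authorisation rule; the function names `required_level` and `may_send` are hypothetical.

```python
def required_level(power_levels, event_type, is_state):
    """Power level needed to send an event of the given type."""
    events = power_levels.get("events", {})
    if event_type in events:
        return events[event_type]
    if is_state:
        return power_levels.get("state_default", 50)
    return power_levels.get("events_default", 0)


def may_send(power_levels, user_id, event_type, is_state):
    """Simplified check: is the user's level at least the required level?"""
    user_level = power_levels.get("users", {}).get(
        user_id, power_levels.get("users_default", 0)
    )
    return user_level >= required_level(power_levels, event_type, is_state)


# Power levels when the delayed event is scheduled: the check is NOT made here.
at_scheduling = {"users": {"@alice:example.com": 0}}
# Power levels when the delay elapses: this is the state that decides the outcome.
at_sending = {"users": {"@alice:example.com": 50}}

allowed_when_scheduled = may_send(at_scheduling, "@alice:example.com", "m.room.topic", True)
allowed_when_sent = may_send(at_sending, "@alice:example.com", "m.room.topic", True)
```

In this example the user could not have sent the state event when it was scheduled, but the event is sent anyway because only the power levels in effect at send time are consulted. Swapping the two dictionaries gives the opposite case, where scheduling succeeds and the later send fails.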
+ +#### Rate-limiting at the point of sending + +Further to the rate limiting of the API endpoints, the homeserver SHOULD apply rate limiting to the sending +of delayed messages at the point that they are inserted into the DAG. + +This is to provide mitigation against the [High Volume of Messages]( +https://spec.matrix.org/v1.18/appendices/#threat-high-volume-of-messages) threat where a malicious actor +could schedule a large volume of events ahead of time without exceeding a rate limit on the initial `PUT` request, +but has specified a `delay_ms` that corresponds to a common point of time in the future. + +If a delayed event fails to be sent at its scheduled send time due to a rate limit failure, +the homeserver SHOULD NOT retry sending the event. Instead, the event will be stored as a finalised delayed event +with its `error` field set, available to be retrieved by a client for the user who requested the event. +It then becomes the user's responsibility to fetch this error and retry sending the event as appropriate. + +### Guest accounts + +All delayed event related endpoints are available to guest accounts. +This allows guest accounts to participate in MatrixRTC sessions. + +## Potential issues + +### Compatibility with Cryptographic Identities + +Ideally, this proposal should be compatible with other proposals such as [MSC4080: Cryptographic Identities]( +https://github.com/matrix-org/matrix-spec-proposals/pull/4080) which introduce mechanisms +to allow the recipient of an event to determine whether it was sent by a client as opposed to have been spoofed/injected +by a malicious homeserver. + +In the context of this proposal, the delayed events should be signed with the same cryptographic identity as the client +that scheduled them. + +This means that the content of the original scheduled event must be sent "as is" without modification by the homeserver. 
+The consequence is an implementation detail that client developers must be aware of: if the content of the delayed +event contains a timestamp, then it would be the timestamp of when the event was originally scheduled rather than +anything later. + +However, the `origin_server_ts` of the delayed event should be the time that the event is actually sent +by the homeserver. + +This is a general problem that arises with the introduction +of [Cryptographic Identities](https://github.com/matrix-org/matrix-spec-proposals/pull/4080). +A user can intentionally, or caused by network conditions, delay the signing and sending of an event. +A possible solution would be the introduction of a `signing_ts` (in the signed section) and keep the `origin_server_ts` +in the unsigned section. +Both are reasonable data points that clients might want to use. +This would solve issues related to delayed events since +it would make it transparent to clients, when an event was scheduled and when it was distributed over federation. + +### Conflicting delayed state events + +A delayed state event can overwrite other state events that were sent in between the delayed event +being scheduled and it being sent. Whether or not this is problematic strongly depends on the +use case, though. When the overwrite is undesired, a possible remedy could be to cancel the scheduled +delayed event when a conflicting new state event is sent into the room. Alternatively, it might be +possible to avoid the conflict in the first place by using separate `state_key`s or by not relying on +state events to begin with. Additionally, this type of race condition can also happen without delayed +events due to federation delay. Potentially addressing this situation is, therefore, left to a future +proposal. + +### Rate-limiting for heartbeats + +In office environments several clients might share the same public IP address. 
If the server +rate limits based on IP address and multiple clients use the heartbeat pattern (where a scheduled +delayed event is repeatedly rescheduled), it is more likely for them to run into rate limit blocks. +This can result in their delayed event being triggered too early which can negatively affect application +logic. For instance, if the delayed event is a call disconnect event, triggering it too early will result +in unexpected disconnections from the call[^call-rate-limit]. + +The same problem with rate limiting based on IP address also occurs when an external service +manages delayed events for a large number of users. + +[^call-rate-limit]: See also https://github.com/element-hq/element-call/issues/3985. + +To mitigate this, the server SHOULD rate limit the management endpoints based on the `delay_id`. + +### Inability to filter and paginate delayed events + +`GET /_matrix/client/v1/delayed_events` lacks request parameters for filtering and pagination. It also +doesn't allow querying finalised delayed events. This could be limiting in some cases. A future proposal +such as [MSC4486] may extend the endpoint to support those use cases. + +[MSC4486]: https://github.com/matrix-org/matrix-spec-proposals/pull/4486 + +## Alternatives + +### OAuth 2.0 scope for management endpoints + +Instead of the [delayed event management endpoints](#managing-scheduled-delayed-events) being unauthenticated +to permit [delegation to an external service](#delegating-scheduled-delayed-events), +those endpoints could be given an OAuth 2.0 scope and be restricted to sessions that have requested it. +The scope would be within the existing `urn:matrix:client:api:*` scope, +so that access to the entirety of the Client-Server API would include access to these endpoints as well. 
+ +Using OAuth 2.0 to restrict access on these endpoints has many benefits over using a path parameter (`delay_id`) as an +access token, such as more fine-grained revocability on access to the endpoints, and +better identification of what entity is requesting these endpoints, which can be used to apply per-entity rate limits. + +The downsides of this approach are the required work of having to implement scoped access tokens in homeservers +and the additional network/configuration overhead for external services to request access to this scope. + +### Management endpoint action in request body + +A previous version of this MSC defined the [delayed event management endpoints](#managing-scheduled-delayed-events) +with a single URL for all management actions, where the action to perform was specified in the request's JSON body +in a field named "action": + +```http +POST /_matrix/client/v1/delayed_events/1234567890 +Content-Type: application/json + +{ + "action": "send" +} +``` + +This has been changed to permit more fine-grained routing/load-balancing/authentication/scopes on those endpoints, +and to optimize network traffic by eliminating the payload of these requests. + +### Batch sending + +In some scenarios it is important to be able to send an event and an associated +delayed event at the same time. + +- One example would be redacting an event. It only makes sense to redact the event if it exists. + It might be important to have the guarantee that the delayed redaction is received + by the homeserver at the same time as the original message is sent. +- In the case of a state event, a user might want to set the state to `A` and after a + timeout change it back to `{}`. By using two separate requests, sending `A` could work, + but the event with content `{}` could fail. The state would not automatically + reset to `{}`. + +For this use case, batch sending of multiple delayed events would be desired. 
+ +Batch sending is not included in the proposal of this MSC however, since batch sending should +become a generic Matrix concept as proposed with `/send_pdus`. +(see: [MSC4080: Cryptographic Identities](https://github.com/matrix-org/matrix-spec-proposals/pull/4080)) + +[MSC2716: Incrementally importing history into existing rooms]( +https://github.com/matrix-org/matrix-spec-proposals/pull/2716) already proposes a `batch_send` endpoint. +However, it is limited to application services and focuses on historic +data. Since the additional capability to use a template `event_id` parameter is also needed, +this probably is not a good fit. + +### Reusing the `send`/`state` endpoints + +Instead of creating a new endpoint for scheduling a delayed event, +the `/send` and `/state` endpoints could support sending events with a delay, +via an optional query parameter for specifying the desired delay: + +`PUT /_matrix/client/v1/rooms/{roomId}/send/{eventType}/{txnId}?delay={delay_ms}` + +`PUT /_matrix/client/v1/rooms/{roomId}/state/{eventType}/{stateKey}?delay={delay_ms}` + +This would require the endpoints to give a different response body when sending a delayed event, +as the usual response of the event ID is unavailable at the time of scheduling a delayed event. + +The main benefit of reusing the endpoints as described is the potential for code reuse in homeserver implementations. +However, this could be negated by the complexity of having to support multiple response body formats. + +This approach also requires using a query parameter for a purpose other than filtering, +which defies the usual semantics of a URI query. + +### Sync-loop based heartbeats + +[MSC4018: Reliable call membership] was an earlier attempt at making call membership more reliable. +It used the client's sync loop as an indicator to determine if the call membership event is expired. +Conceptually, this is similar to [MQTT]'s "Will Message" that is published by the server when a client +disconnects. 
+ +[MSC4018: Reliable call membership]: https://github.com/matrix-org/matrix-spec-proposals/pull/4018 +[MQTT]: https://mqtt.org/ + +The advantage of this approach is that it doesn't require a new system for emitting heartbeats. An API +for letting clients store the leave event to be sent would still be required due to encryption, however. + +On the downside, the common timeouts for syncs are much larger than what would be desirable for +call membership (30s vs. 5s). Overall this alternative feels overly implicit and less flexible than the +solution proposed above. + +### Federated delayed events + +Delayed events could be sent over federation immediately to then have the receiving homeservers send them down to +clients at the appropriate time. + +Downsides of this approach that have been considered are that: + +- Restarts ("heartbeats") would need to be distributed via federation, meaning more traffic and processing. +- If any homeservers missed the federated "heartbeat"/restart message, then they might decide that the event is visible + to clients whereas other homeservers might have received it and come to a different conclusion. + If the event was later cancelled, then resolving the inconsistency feels more complex than if the event was never sent + in the first place. + +[MSC3277: Scheduled messages](https://github.com/matrix-org/matrix-spec-proposals/pull/3277) proposes a similar feature +and there is an extensive analysis of the pros and cons of this MSC vs MSC3277 +[in this discussion](https://github.com/matrix-org/matrix-spec-proposals/pull/4140#discussion_r1653083566). + +If modifying scheduled delayed events is not required, there is a benefit in federating them. It increases +resilience because the sender's homeserver can disconnect and the delayed event will still be delivered to receiving +clients by their own homeservers. + +However, for the MatrixRTC use case it's required to be able to modify the event after it has been scheduled. 
As such, +this approach has been discounted. + +The considerations above apply irrespective of whether delayed events are federated directly +or through other means such as by (ab)using typing notification EDUs. + +### `M_MAX_DELAYED_EVENTS_EXCEEDED` instead of `M_LIMIT_EXCEEDED` + +A new error code `M_MAX_DELAYED_EVENTS_EXCEEDED` could be introduced instead of reusing the existing `M_LIMIT_EXCEEDED` +code. This would allow clients to better distinguish delayed event scheduling limits from other resource limits. +Given that other resource limits are currently not differentiated via separate error codes in the API either, +reusing `M_LIMIT_EXCEEDED` seems reasonable though. + +### Naming + +The following alternative names for this concept are considered: + +- Futures: Doesn't seem like a good match because the result of the API call cannot be "awaited" as you + would normally `await` a future in various programming languages. +- Postponed Events: Similar to "delayed events" but longer. +- Last Will: Seems unfitting because the feature can be used regardless of whether the client goes offline + or not. The term also feels somewhat dark. + +### Don't provide a `send` action + +Instead of providing a `send` management action for scheduled delayed events, +the client could cancel the scheduled delayed event and send a new non-delayed event instead. + +This would simplify the API, but it's less efficient since the client would have to send two requests instead of one. + +### Use `DELETE` HTTP method for `cancel` action + +Instead of providing a `cancel` management action for scheduled delayed events, +the client could send a `DELETE` request to an endpoint representing a target delayed event. + +This feels more elegant, but it doesn't feel like a good suggestion for how the other actions are mapped. 
+Also, `DELETE` suggests that the target resource will be truly deleted, whereas a cancelled delayed event +is retained as a finalised event for later lookup. + +### Alternative to `scheduled_at` field + +Some alternative names for the `scheduled_at` field on the `GET` response are: + +- `running_since` - clearly indicates the purpose of the field, but + no other part of this proposal uses the term `running` to describe a scheduled delayed event, and + no other part of the spec uses a suffix of `since` for the name of a timestamp-valued field +- `scheduled_ts` - clearly designates the field as a timestamp due to its suffix of `ts`, + but might be misinterpreted as the scheduled send time instead of when the delayed event was scheduled/restarted +- `created_ts` - also clearly a timestamp, but no other part of this proposal uses the term `created` +- `delaying_from` - `delaying` might be clearer than `scheduled`/`running`, and `from` might be clearer than `at`/`since` +- `delayed_since` - using past tense might better convey that this time is in the past +- `delaying_since` - `since` might be a clearer suffix than `from` +- `last_restart` - but this feels less clear than other names for a delayed event that hasn't been restarted + +An alternative field altogether is `send_ts`, with a value of `delay_ms` + the start time of the timer. +However, explicitly returning the scheduled send time suggests a strong guarantee of exactly when a delayed event will +be sent, despite this proposal allowing homeservers to adjust the scheduled send time to support batch sending. + +### Syncing failed delayed events + +Currently, clients have to fetch the delayed event info after the timeout to find out whether the event failed to send, +or whether another client belonging to the same user cancelled the scheduled delayed event. +We could instead define a new method to push failed & cancelled delayed events down `/sync` to the sender.
+For application services, this information would need to be pushed via transactions. +However, this is not strictly necessary for delayed events to be usable, and may thus be discussed in a separate MSC +in the interest of keeping this MSC focused on the core functionality of delayed events. + +## Security considerations + +### Authentication + +All new endpoints are either authenticated or require knowledge of a homeserver-generated `delay_id`. +As such, generated `delay_id`s MUST be cryptographically random such that they are difficult to guess. + +### Limits + +To mitigate the risk of users flooding the delayed events database, homeservers MUST impose limits on the number and +timeout duration of scheduled delayed events. The exact limits are left as an implementation detail. + +It is the homeserver maintainer's responsibility to evaluate the best trade-off between the use cases +their users have for delayed events and the resources they are able to provide. + +It is the homeserver implementer's responsibility to communicate this, to educate homeserver hosters about +the trade-offs, and potentially to give reasonable example values for those configurations. + +As described [above](#power-levels-are-evaluated-at-the-point-of-sending), the homeserver MUST evaluate and enforce the +power levels at the time of the delayed event being sent (i.e. added to the DAG). + +This feature has the risk of being used by a malicious actor to circumvent existing rate limiting measures, which +corresponds to the [High Volume of Messages](https://spec.matrix.org/v1.18/appendices/#threat-high-volume-of-messages) +threat. The homeserver SHOULD apply rate-limiting to both the scheduling of delayed events and their later sending to +mitigate this risk, as well as limit the number of scheduled events a user can have at any one time.
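
The authentication and limit requirements above could be sketched as follows. This is a minimal, non-normative
illustration rather than how any particular homeserver implements them: the `DelayedEventStore` class, its method
names, and the default limit values are all hypothetical, and the proposal deliberately leaves the exact limits to
the homeserver. The sketch assumes Python's standard `secrets` module as the source of cryptographically random
`delay_id`s.

```python
import secrets
from typing import Dict


class LimitExceeded(Exception):
    """Hypothetical error a server could map to an M_LIMIT_EXCEEDED response."""


class DelayedEventStore:
    """Toy in-memory store illustrating per-user limits on delayed events."""

    def __init__(self, max_events_per_user: int = 100,
                 max_delay_ms: int = 24 * 60 * 60 * 1000) -> None:
        # Both defaults are made-up example values, not taken from the proposal.
        self.max_events_per_user = max_events_per_user
        self.max_delay_ms = max_delay_ms
        # user_id -> {delay_id: delay_ms}
        self._scheduled: Dict[str, Dict[str, int]] = {}

    def schedule(self, user_id: str, delay_ms: int) -> str:
        # delay_ms must be a positive, non-zero number within the server's limit.
        # Which error code a server returns here is not assumed by this sketch.
        if delay_ms <= 0 or delay_ms > self.max_delay_ms:
            raise ValueError("delay_ms outside the range allowed by this server")

        user_events = self._scheduled.setdefault(user_id, {})
        if len(user_events) >= self.max_events_per_user:
            raise LimitExceeded("too many scheduled delayed events for this user")

        # 32 random bytes from the OS CSPRNG, URL-safe encoded, so the ID is
        # difficult to guess for anyone who was not given it.
        delay_id = secrets.token_urlsafe(32)
        user_events[delay_id] = delay_ms
        return delay_id

    def finalise(self, user_id: str, delay_id: str) -> None:
        # A sent or cancelled event no longer counts against the user's limit.
        self._scheduled.get(user_id, {}).pop(delay_id, None)
```

Counting only scheduled (not yet finalised) events against the limit is one possible design choice; a homeserver
could equally bound the total number of stored delayed events, including finalised ones retained for lookup.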
+ +## Unstable prefix + +Whilst the MSC is unstable: + +- `PUT /_matrix/client/unstable/org.matrix.msc4140/rooms/{roomId}/delayed_event/{eventType}/{txnId}` should be used + instead of the `PUT /_matrix/client/v3/rooms/{roomId}/delayed_event/{eventType}/{txnId}` endpoint. +- `POST /_matrix/client/unstable/org.matrix.msc4140/delayed_events/{delay_id}/{action}` should be used + instead of the `POST /_matrix/client/v1/delayed_events/{delay_id}/{action}` endpoints. +- `GET /_matrix/client/unstable/org.matrix.msc4140/delayed_events` should be used + instead of the `GET /_matrix/client/v1/delayed_events` endpoint. +- `org.matrix.msc4140.delay_id` should be used instead of `delay_id` as the key in `unsigned` event data. +- `org.matrix.msc4140.delayed_events` should be used instead of the `m.delayed_events` capability name. + +Additionally, the feature is to be advertised as an unstable feature in the `GET /_matrix/client/versions` response, +with the key `org.matrix.msc4140` set to `true`. So, the response could then look as follows: + +```json +{ + "versions": ["..."], + "unstable_features": { + "org.matrix.msc4140": true + } +} +``` + +Once the MSC is accepted, but before the homeserver advertises the spec version that includes the MSC, the homeserver +should advertise `org.matrix.msc4140.stable` as an unstable feature flag to let clients know that they can use the +stable endpoints for sending and managing delayed events. + +## Dependencies + +None. + +[^eventId]: An event's ID is computed from its [reference hash](https://spec.matrix.org/v1.18/rooms/v11/#event-ids) +which is obtained by combining several event properties including `origin_server_ts`. The latter is +only available once the event has actually been sent, however. Since scheduled delayed events may be +cancelled or re-scheduled, the `origin_server_ts` and, thus, the event ID cannot be determined ahead +of time.