
Add Events design sketch proposal #1

Open
pja-ant wants to merge 4 commits into main from pja/design-sketch

Conversation

@pja-ant
Contributor

@pja-ant pja-ant commented Apr 9, 2026

Initial design sketch for the MCP Events primitive — subscription model, delivery modes (poll/push/webhook), and protocol surface.

Draft for WG discussion.

@pja-ant pja-ant marked this pull request as ready for review April 9, 2026 20:31
@pja-ant pja-ant requested a review from clareliguori April 9, 2026 20:31
- Mermaid sequence diagrams for poll/push/webhook delivery
- Appendix: end-to-end GitHub example with Redis sub store
- Push: clarify multiple concurrent streams allowed; note HTTP/2 dependency
- Webhook: specify 2xx ACK requires durability or end-consumer processing
- Use nextPollSeconds consistently
Comment thread docs/design-sketch-proposal.md Outdated
- **Servers advertise delivery modes; no mode is mandatory.** Each event type lists the delivery modes it supports. The client picks the best mode it can use. If there is no overlap between what the server offers and what the client can consume, that event type is simply unavailable to that client — the protocol does not require a universal fallback. (Reference SDKs are still encouraged to make poll cheap to support so that overlap is common in practice.)
- **SDK-level polling, not LLM-level polling.** The client SDK drives the polling loop without burning LLM inference tokens. The LLM is only invoked when events actually arrive.
- **No durable subscription state.** Poll holds no subscription state at all. Push scopes subscription state to connection lifetime. Webhook uses soft state with mandatory TTL — the server holds subscriptions in memory, but they expire automatically if the client stops refreshing. The client is always the source of truth across all three modes. (This principle is about *subscription* state. Servers that relay from upstream sources will still hold other state — upstream credentials, webhook registrations with the upstream, an in-memory event buffer — but none of it is owed to any particular MCP client and all of it is outside the protocol's concern.)
- **Client owns subscription state.** In all modes, the client holds the canonical list of subscriptions. For poll and push, the server has no subscription state at all. For webhook, the server holds TTL-scoped subscription records, but the client drives their lifecycle via periodic refresh.
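
The mode-overlap rule in the first bullet can be sketched as a small client-side helper (illustrative only; the function name is invented, and the preference order mirrors the SDK selection section later in the sketch):

```python
# Minimal sketch of client-side delivery-mode selection. The client picks
# the best mode in the intersection of what the server advertises and what
# the client can consume; no overlap means the event type is unavailable.
PREFERENCE = ["webhook", "push", "poll"]  # best-first, per the SDK section

def pick_mode(server_modes, client_modes):
    """Return the preferred usable mode, or None if there is no overlap."""
    usable = set(server_modes) & set(client_modes)
    for mode in PREFERENCE:
        if mode in usable:
            return mode
    return None  # event type simply unavailable; no mandatory fallback
```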

"For poll and push, the server has no subscription state at all."
This somewhat conflicts with the previous bullet: "Push scopes subscription state to connection lifetime." The server will have connection-scoped state of what events to send back on the stream

Contributor Author

Ah yeah, this is true. I think I meant "durable" state but should just reword.

Comment thread docs/design-sketch-proposal.md Outdated

- **Servers advertise delivery modes; no mode is mandatory.** Each event type lists the delivery modes it supports. The client picks the best mode it can use. If there is no overlap between what the server offers and what the client can consume, that event type is simply unavailable to that client — the protocol does not require a universal fallback. (Reference SDKs are still encouraged to make poll cheap to support so that overlap is common in practice.)
- **SDK-level polling, not LLM-level polling.** The client SDK drives the polling loop without burning LLM inference tokens. The LLM is only invoked when events actually arrive.
- **No durable subscription state.** Poll holds no subscription state at all. Push scopes subscription state to connection lifetime. Webhook uses soft state with mandatory TTL — the server holds subscriptions in memory, but they expire automatically if the client stops refreshing. The client is always the source of truth across all three modes. (This principle is about *subscription* state. Servers that relay from upstream sources will still hold other state — upstream credentials, webhook registrations with the upstream, an in-memory event buffer — but none of it is owed to any particular MCP client and all of it is outside the protocol's concern.)

Webhook uses soft state with mandatory TTL

This is an interesting choice, let's discuss about pros/cons: "Webhook uses soft state with mandatory TTL — the server holds subscriptions in memory, but they expire automatically if the client stops refreshing."

I can imagine a mostly-idle agent waking up every 5 minutes or every hour, calling some tool to get the current state of the world or poll (in case of missed webhook events), and then refreshing their webhook subscription. Does that match your mental model?

Contributor Author

I think this is a bit orthogonal to what the agent does, it's more a transport concern. The agent just says "subscribe(event)" once, but the transport (via the SDK, or otherwise) periodically sends the actual subscribe message to keep the subscription fresh via TTL. It's effectively a keepalive for the webhook transport. Eventually the agent would unsubscribe, or die, or have its own application-level TTL - and then the SDK would stop sending the subscribes.
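
The keepalive pattern described here could be sketched as follows (illustrative only; the class and method names, the refresh interval, and the `send_subscribe` hook are assumptions, not part of the proposal):

```python
# Sketch of an SDK-level TTL keepalive: the agent subscribes once, and the
# transport re-sends the subscribe on a timer to keep the soft state fresh,
# until the agent unsubscribes or dies.
import threading

class WebhookKeepalive:
    def __init__(self, send_subscribe, ttl_seconds=300):
        self._send = send_subscribe
        # Refresh well inside the TTL window to tolerate delays and races.
        self._interval = ttl_seconds * 0.5
        self._stop = threading.Event()

    def start(self, subscription):
        def loop():
            while not self._stop.is_set():
                self._send(subscription)  # idempotent refresh resets the TTL
                self._stop.wait(self._interval)
        threading.Thread(target=loop, daemon=True).start()

    def stop(self):
        # Agent unsubscribed (or hit its own application-level TTL):
        # the SDK simply stops refreshing and the server state expires.
        self._stop.set()
```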

- **SDK-level polling, not LLM-level polling.** The client SDK drives the polling loop without burning LLM inference tokens. The LLM is only invoked when events actually arrive.
- **No durable subscription state.** Poll holds no subscription state at all. Push scopes subscription state to connection lifetime. Webhook uses soft state with mandatory TTL — the server holds subscriptions in memory, but they expire automatically if the client stops refreshing. The client is always the source of truth across all three modes. (This principle is about *subscription* state. Servers that relay from upstream sources will still hold other state — upstream credentials, webhook registrations with the upstream, an in-memory event buffer — but none of it is owed to any particular MCP client and all of it is outside the protocol's concern.)
- **Client owns subscription state.** In all modes, the client holds the canonical list of subscriptions. For poll and push, the server has no subscription state at all. For webhook, the server holds TTL-scoped subscription records, but the client drives their lifecycle via periodic refresh.
- **Event payloads are untrusted data.** The spec must be explicit that event payloads carry the same injection risks as tool results.

What should applications do about this? What is our current guidance how to treat tool results as injection risks?

Contributor Author

I think MCP is generally (and perhaps intentionally) a bit vague on this. In practice, it could mean things like scanning for prompt injection attacks, audit logging, etc.

Comment thread docs/design-sketch-proposal.md Outdated

There are three delivery modes with different subscription mechanisms:

- **Poll mode:** Client calls `events/poll` with event name, params, and cursor. No separate subscribe step needed — the first poll with a null cursor bootstraps the subscription. Server is fully stateless.
Contributor Author

Hot take: is events/poll the same as events/stream, except that the server immediately closes once it has delivered all pending messages? Does it need to be a separate message? Semantically it could be the same, but perhaps there is utility in nextPollSeconds and hasMore, which are quite specific to polling and make the intended usage clear.


For this one, I am thinking that aligning a bit more with the transports work, to make the transport less stateful, may be desirable? (I am thinking of SEP 2322 - MRTR)

@panyam

panyam commented Apr 14, 2026

One clarification - are UI-originated events (user interactions) in scope of this, or is this strictly about external/backend events? Because the delivery patterns and the 'how does it reach the LLM' answer may be quite different for the two cases?

@pja-ant
Contributor Author

pja-ant commented Apr 14, 2026

One clarification - are UI-originated events (user interactions) in scope of this, or is this strictly about external/backend events? Because the delivery patterns and the 'how does it reach the LLM' answer may be quite different for the two cases?

This is for MCP, so it is about events between an MCP client and an MCP server.

Hosts are welcome to, e.g. spin up a local MCP server and publish events for UI stuff over an MCP channel if that helps them in some way, but I don't imagine that to be a common interaction pattern. For that, it would likely make sense for the host to just have its own in-process event system that doesn't use MCP (but could hook into the same handlers if needed).

…ity, security hardening

Resolve internal contradictions and design gaps found in review:
- Webhook cursors are now client-owned (per-event retry, no server watermark); name/params/delivery.url are immutable identity fields; unsubscribe requires delivery.url
- Unify error/terminated shape across modes; define heartbeat and terminated notifications; renumber error codes to -32011..-32016 to avoid base-spec collision
- SSRF: validate at delivery time, block fe80::/10, MUST NOT follow redirects; HMAC: X-MCP-Subscription-Id header, hex-lowercase encoding, mandatory timestamp, retry regenerates
- events/stream scoped to event notifications only (existing GET SSE retained); StreamEventsResult send rules clarified per transport
- Broadcast emit now uses author-supplied match/transform hooks with ctx; poll lease keyed on (principal, eventName, hash(params)); eventId SHOULD be upstream's stable identifier
- Qualify at-least-once for emit-only types; refresh reactivates suspended delivery; capability uses listChanged
- Add Open Questions on multi-event-name subscriptions and ownership-verification handshake

Add design-sketch-revision-deltas.md documenting all changes for implementations built against the prior revision.
@pja-ant pja-ant requested a review from a team as a code owner April 17, 2026 10:52

- `events/subscribe` is ONLY used for webhook delivery. Poll and push do not need it.
- `id` is a client-generated, high-entropy identifier for the logical subscription. It MUST contain at least 122 bits of entropy (e.g., a UUIDv4). See *Subscription Identity* below.
- `secret` is a shared secret for HMAC-SHA256 signature verification of webhook deliveries. The server generates it when the subscription is created and returns it in the response; it is not returned on subsequent refreshes of an existing subscription. If the server has lost the subscription (restart, TTL expiry) the refresh creates a new one, and a fresh `secret` appears in the response — the client MUST check for its presence on every refresh and update the verifier accordingly. The client MAY supply `delivery.secret` in the request to override server generation (e.g., when the secret is provisioned out-of-band in a vault); servers SHOULD accept this but are not required to.

The spec here says that if the client provides delivery.secret, the sever SHOULD accept this, but is not required to.

This would mean that clients that have pre-provisioned secrets (from IAC) will not be able to connect to such MCP servers.

Is the reason here Compliance requirements, entropy quality assurance, or keeping the server implementation minimal?


I forget where we landed on this last week, but I do remember advocating for server-defined secrets, and my main argument is entropy quality assurance. The other benefit is that depending on webhook platform, secret behaviors change massively, and this prevents the MCP server from needing to intermediate/rewrite webhooks from upstream platforms as much.

Contributor Author

So yeah, this is the point that's still swirling in my head. Here's my current chain of thought. (Note: this isn't reflected in the sketch right now)

The gateway is the party that verifies deliveries, so it has to recognize the secret before the first POST arrives. Server-generated forces a per-subscription write-back — host subscribes, server returns a fresh secret, host pushes it to the gateway, racing the first delivery. Client-supplied avoids that: the host sends a secret the gateway can already recognize, typically derived from a single shared root (secret = HMAC(root, sub-tuple)), so there's zero per-subscription host↔gateway traffic — and derivation is impossible with server-generated because each server picks an unpredictable value. It also changes what the secret proves: server-generated only proves "this came from the MCP server," which doesn't stop an attacker subscribing with url=<victim>; client-supplied proves "the endpoint owner asked for this," so a subscription the gateway didn't originate is inert by construction. Server-generated works for Stripe/GitHub because a human copies the secret out of a dashboard once; MCP subscriptions are per-user × per-topic and agent-initiated, so there's no dashboard moment.

Once the secret is client-supplied and high-entropy, it can carry more weight than just HMAC: make it part of the subscription identity ((url, secret, name, params)). Then subscribe is idempotent on the tuple, unsubscribe authority is "possess the secret" (no challenge, no principal special-casing), the key shape is identical on auth and unauth servers, and the client-generated id field disappears. One primitive — the secret — covers verification, proof-of-intent, and identity. The server still returns a derived id = hash(tuple) for receivers to route on, but nobody generates or remembers anything except the secret.

Trade-offs: the server can't vouch for entropy (spec a hard floor, MUST reject < 24 bytes; SDKs generate by default), it runs against the Stripe/GitHub norm (Twitch/Zoom are the client-supplied precedent), and rotating the secret means a new subscription — acceptable given everything is TTL'd anyway.

I'm like 30% confident on this though...
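
The derived-secret scheme floated above (secret = HMAC(root, sub-tuple)) could look like the following sketch; the canonical JSON encoding and the field names in the tuple are assumptions for illustration:

```python
# Sketch of client-supplied secret derivation from a single shared root.
# Because the gateway holds the same root, it can recognize any
# subscription's secret without per-subscription host<->gateway write-back.
import hashlib, hmac, json

def derive_secret(root_key: bytes, url: str, name: str, params: dict) -> str:
    # Canonical encoding so host and gateway derive the same value.
    tuple_bytes = json.dumps(
        {"url": url, "name": name, "params": params}, sort_keys=True
    ).encode()
    return hmac.new(root_key, tuple_bytes, hashlib.sha256).hexdigest()
```

A subscription the gateway did not originate carries a secret it cannot derive, so its deliveries fail verification by construction.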


Push state is scoped to the lifetime of the `events/stream` connection — when the connection closes, all loops and listeners stop.

**For webhook mode:** The SDK uses the same two patterns (poll-driven or direct emit), but instead of writing to an SSE stream, it POSTs events to the subscriber's callback URL with HMAC signatures. The SDK holds webhook subscriptions in memory with TTL — no external storage is required. If the server restarts, all webhook subscriptions are lost. Clients will re-subscribe on their next refresh cycle, passing their last-persisted cursor; for event types backed by a durable upstream this resumes without gaps, while emit-only event types lose events that occurred during the outage. This is by design: the mandatory TTL + refresh mechanism eliminates the need for durable subscription storage.

question: Is sending an event after TTL expiry considered non-conformance? Implicitly I think yes, but there's certainly some infrastructural considerations here that a softer TTL limit would help.

Contributor Author

I think it's allowed, but they should expect an error back from the client that the sub is not found. There are races here, and that's fine, server should just expect it near the TTL boundary.


There are three delivery modes with different subscription mechanisms:

- **Poll mode:** Client calls `events/poll` with event name, params, and cursor. No separate subscribe step needed — the first poll with a null cursor bootstraps the subscription. Server holds no protocol-required state (the SDK MAY track an ephemeral poll lease for lifecycle hooks; see *Unsubscribe timing by mode*).

question: Certainly more philosophical in nature: we could theoretically build this on top of resource templates. What would the relationship between this shape and a resources/read response?

],
"cursor": "historyId_99842",
"hasMore": false,
"nextPollSeconds": 30

suggestion: Per Discord (and I believe last week's meeting), we've discussed removing batching support from polling. Reading multiple nextPollSeconds items makes it a bit clearer why this is useful; having multiple poll timelines to batch against adds a lot of complexity for a client looking to comply fully.

Contributor Author

Yes, we'll not have batching. Aligned.

- `id` is a client-provided identifier for each subscription. It is opaque to the server and echoed back in responses to allow the client to correlate results with subscriptions. It must be unique within a single `events/poll` request.
- `cursor` is opaque to the client. The client stores it and passes it back on the next poll. A `null` cursor means "start from now" — the server returns no events and provides a fresh cursor for subsequent polls.
- `eventId` enables client-side deduplication across polls (e.g., after a crash/restart). It is **server-assigned**: when the upstream source provides a stable event identifier (Stripe `evt_*`, GitHub delivery GUID, Kafka offset, Gmail message ID), the server SHOULD use that value as `eventId` so that the same upstream event surfaced via multiple paths (e.g., webhook emit and poll backfill) carries the same `eventId` and dedup works. The SDK auto-generates an `eventId` only when the author supplies none.
- `maxEvents` is an optional top-level field that caps the number of events returned per subscription. If more events are available than the limit, the server returns a partial batch with an intermediate cursor and sets `hasMore: true`. The client SHOULD poll again immediately (ignoring `nextPollSeconds`) to drain the backlog. If omitted, the server uses its own default limit.
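
A minimal client-side poll loop under these rules might look like the sketch below; `poll_fn` stands in for the actual `events/poll` call, and all helper names are illustrative:

```python
# Illustrative drain loop: poll again immediately while hasMore is set,
# dedupe on eventId across polls, persist the cursor as it advances, and
# only then honour nextPollSeconds.
def drain(poll_fn, cursor, seen, max_events=100):
    """Poll until the backlog is drained; return (new_events, cursor, wait)."""
    fresh = []
    while True:
        result = poll_fn(cursor=cursor, maxEvents=max_events)
        for ev in result["events"]:
            if ev["eventId"] not in seen:   # dedup, e.g. after crash/restart
                seen.add(ev["eventId"])
                fresh.append(ev)
        cursor = result["cursor"]           # client persists this
        if not result["hasMore"]:
            return fresh, cursor, result["nextPollSeconds"]
```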

nitpick: "events per subscription" is a bit ambiguous here, would be nice to make it clear it's per subscription in the poll context

- **Heartbeat.** The server MUST send periodic keepalive messages on the push stream so the client can distinguish "nothing to send" from "connection is dead." On Streamable HTTP, this is an SSE comment (`: keepalive\n\n`). On stdio, this is a `notifications/events/heartbeat` notification with empty params: `{"jsonrpc":"2.0","method":"notifications/events/heartbeat","params":{}}`. The server SHOULD send a heartbeat at least every 30 seconds. The client SHOULD treat absence of any data (events or heartbeats) beyond a threshold (e.g., 60 seconds) as connection failure and reconnect with cursors.
- **Cancellation.** On Streamable HTTP, the client aborts the request stream. On stdio, the client sends `notifications/cancelled` with the `requestId` matching the `events/stream` request's `id`. In both cases, the server MUST stop delivering events and release any associated resources. On stdio, the server SHOULD then send the `StreamEventsResult` (the result is empty and harmless; base MCP says servers SHOULD NOT respond to cancelled requests, so a server that omits it is also compliant). On Streamable HTTP, the abort is the terminal signal and no result is sent.
- **Updating subscriptions.** A client MAY hold multiple concurrent `events/stream` requests open, each with its own subscription list. To add subscriptions, open an additional stream; to remove them, cancel only the stream that carries them. Nothing prevents a client from instead consolidating onto a single stream by cancelling and re-issuing `events/stream` with the full updated list — cursor replay covers the transition gap — but this is an optimization, not a requirement. Note that on HTTP/1.1 each stream consumes a TCP connection, so clients that expect many independent subscriptions effectively depend on HTTP/2 multiplexing for the multi-stream approach to scale; SDKs SHOULD coalesce subscriptions onto fewer streams when the transport does not multiplex.
- **Reconnection after failure.** If the connection drops (HTTP) or the server stops sending (stdio), the client sends a new `events/stream` with the same subscriptions and their last-known cursors.
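
The heartbeat/timeout rule above can be sketched as a small client-side watchdog (illustrative; the class name and defaults are assumptions):

```python
# Any data on the stream -- an event or a heartbeat -- resets the deadline.
# Silence beyond the threshold means the connection is dead, and the client
# reconnects with its stored cursors.
import time

class StreamWatchdog:
    def __init__(self, threshold_seconds=60):
        self.threshold = threshold_seconds
        self.last_activity = time.monotonic()

    def on_data(self):
        """Call for every event AND every heartbeat/keepalive."""
        self.last_activity = time.monotonic()

    def is_dead(self):
        return time.monotonic() - self.last_activity > self.threshold
```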

suggestion: I wonder if there's a way to implement best-effort updates. If, for example, you have a game that has updates you probably don't care about previous events but re-subscription doesn't feel like the right response here.

Contributor Author

Does subscribing with cursor: null solve this? Server is also under no obligation to replay if it just has "current" state.

- `events/subscribe` is ONLY used for webhook delivery. Poll and push do not need it.
- `id` is a client-generated, high-entropy identifier for the logical subscription. It MUST contain at least 122 bits of entropy (e.g., a UUIDv4). See *Subscription Identity* below.
- `secret` is a shared secret for HMAC-SHA256 signature verification of webhook deliveries. The server generates it when the subscription is created and returns it in the response; it is not returned on subsequent refreshes of an existing subscription. If the server has lost the subscription (restart, TTL expiry) the refresh creates a new one, and a fresh `secret` appears in the response — the client MUST check for its presence on every refresh and update the verifier accordingly. The client MAY supply `delivery.secret` in the request to override server generation (e.g., when the secret is provisioned out-of-band in a vault); servers SHOULD accept this but are not required to.
- `cursor` (request only) tells the server where to begin delivery. `null` means "start from now." A non-null value requests replay from that position (honoured when the event type is backed by a durable upstream). The cursor is **client-owned**: the server does not track a delivery watermark, and the response does not include a cursor. The client persists the `cursor` carried in each delivered payload (see *Webhook Event Delivery*) and supplies it on every refresh. If the subscription is live, the supplied cursor is at or behind the server's in-flight position and the server treats it as a no-op (delivery continues uninterrupted). If the subscription has lapsed or the server has restarted, the cursor becomes the replay point. This means clients use a single rule — always pass the last-persisted cursor — and the server is idempotent under it.

question: is cursor required? event replay is certainly a fairly advanced feature of webhook delivery I doubt many will implement it in practice

Contributor Author

Agree, it shouldn't be. Need to soften the language. I think servers that do not support cursors should just return cursor: null on every event.

- `events/subscribe` is idempotent within the caller's subscription scope (see *Subscription Identity*). If a subscription with the same scoped key exists, the server resets the TTL and updates mutable fields in place. If the subscription has expired — or the server has restarted and lost it — the server creates a fresh subscription using the provided cursor.
- The server holds subscription state (id, event name, params, callback URL, secret) in memory with TTL. No durable storage is required — if the server restarts, clients will re-subscribe on their next refresh cycle. For event types backed by a durable upstream, the client's persisted cursor recovers any events that occurred during the gap; for emit-only event types, events during the gap are not recoverable (see *Emit-only event types*).

#### Subscription Identity

question: This section seems to push pretty hard into MCP server implementation internals, is the goal to say that principal confusion is such a huge issue we should be making a prescription here?

Contributor Author

I agree to an extent, but subscription identity is a protocol issue, and especially since we need to deal with the webhook receiver that doesn't have the principal, so we do need to say something. Aligned with trying not to be over-prescriptive and please push back if you feel it is.

Comment on lines +432 to +434
X-MCP-Subscription-Id: f47ac10b-58cc-4372-a567-0e02b2c3d479
X-MCP-Signature: sha256=<hex-lowercase HMAC-SHA256(secret, timestamp + "." + body)>
X-MCP-Timestamp: 1739980800
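
A receiver verifying these headers might proceed as in the sketch below; the 5-minute replay tolerance is an assumption, not something the sketch specifies:

```python
# Recompute HMAC-SHA256(secret, timestamp + "." + body), compare in
# constant time, and reject stale timestamps to bound replay.
import hashlib, hmac, time

def verify_delivery(secret: bytes, headers: dict, body: bytes,
                    tolerance_seconds=300, now=None) -> bool:
    ts = headers["X-MCP-Timestamp"]
    expected = "sha256=" + hmac.new(
        secret, ts.encode() + b"." + body, hashlib.sha256
    ).hexdigest()  # hex-lowercase, per the sketch
    if not hmac.compare_digest(expected, headers["X-MCP-Signature"]):
        return False
    now = now if now is not None else time.time()
    return abs(now - int(ts)) <= tolerance_seconds
```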

Bringing down some thoughts on whether we want to make the webhook naming scheme here MCP-specific, versus pulling in Standard Webhooks or similar's naming scheme to avoid bikeshedding

The SDK intersects the event type's advertised `delivery` list with the modes the client is configured to use, and picks in this preference order:

1. **Webhook** if the client has `WebhookConfig` set and the server lists it — low latency without tying up a persistent connection.
2. **Push** if the transport supports streaming (stdio, or HTTP when webhook isn't configured) and the server lists it — low latency, but requires holding a connection open.

Should we also consider UDP in push mode? This would be super helpful for voice and video scenarios.

@panyam panyam Apr 24, 2026

Today we leverage HTTP for the transports. Two areas where this might be considered:

  1. SEP 2322 - for relaxing some of the dependence on specific transport types to make the transport stateless overall
  2. UDP (as a transport-level requirement for events) would mean decoupling events from transports. I believe there is a Custom Transports effort that allows one to swap these in - but hosts (and proxies/firewalls in between) would also need to support it

Definitely would be interesting to see this blossom - but I also suspect our shortest path to UDP might be HTTP/3?

Contributor Author

Mentioned in the meeting, but will mention again here: I'd love for MCP to support voice and video! For this SEP and WG though (and perhaps this should be made clearer), I'm considering real-time media streaming as out of scope. My read is that the protocols and interactions needed for them are very different, e.g. media probably needs something like WebRTC or similar and I expect the signaling layer protocol to be quite different from what we're trying to do here (which is more discrete event delivery).

I'd actually be very interested in whether folks have LLM use cases for voice and video over MCP. It's not something I've explored.


The broader point I was trying to get at is whether the events model should leave room for more continuous or high-frequency event sources, not just discrete business events like “email received” or “incident created”.

Examples:

  • long-running job progress: queued → running → 30% → retrying → completed/failed
  • monitoring/agent telemetry: CPU/memory/error-rate threshold changes, health state transitions, periodic stats snapshots
  • media-adjacent events: “audio stream started”, “speech segment detected”, “transcript chunk ready”, “video analysis result available”

So I agree media payload transport itself can be out of scope, but I’d like to get clarity on whether event streams can represent status/progress/telemetry-style updates, and where the boundary is
between:

  1. discrete MCP events,
  2. progress/logging/notifications that already exist,
  3. high-frequency telemetry,
  4. actual real-time media streams.

Maybe the SEP should explicitly say real-time media transport is out of scope, while control-plane events and derived events from media or monitoring systems are in scope if they fit the cursor/dedup/delivery
model.


The tradeoff I'd see with adding voice/video is that it couples the intent of events (as triggers of changes?) with specific app payloads. Today we don't have guarantees on latencies of events (or even notifications/pushes etc.), and the lack of those guarantees means we are looking at best-effort scenarios for now? Real-time to me feels like starting to need clearer guarantees there, so not being in scope makes sense.


@srikalyan Good point. One thing - for long-running job progress, tasks (SEP 2663) may be a better option (it already posts notifications on progress etc.). Events to me look like sources of progress where the work did NOT originate from an (MCP) host.

But the monitoring/telemetry is a good use case and may actually fit here - "start monitoring for cpu usage on X" and this would be a windowed listing (last N events or X seconds).
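That windowed-listing idea could be sketched roughly like this. Purely illustrative: the `cpu.usage` event name, the window parameters, and the event dict shape are my assumptions, not part of the proposal.

```python
import time
from collections import deque


class CpuUsageWindow:
    """Keeps the last max_events samples that are no older than max_age_s seconds."""

    def __init__(self, max_events=100, max_age_s=60.0):
        self.max_events = max_events
        self.max_age_s = max_age_s
        self._events = deque()

    def record(self, percent, now=None):
        now = time.time() if now is None else now
        self._events.append({"name": "cpu.usage", "ts": now, "data": {"percent": percent}})
        self._trim(now)

    def list(self, now=None):
        """Windowed listing: last N events within the age window."""
        self._trim(time.time() if now is None else now)
        return list(self._events)

    def _trim(self, now):
        # Drop oldest samples beyond the count cap, then anything past the age cap.
        while len(self._events) > self.max_events:
            self._events.popleft()
        while self._events and now - self._events[0]["ts"] > self.max_age_s:
            self._events.popleft()
```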


The client owns all subscription state across all three delivery modes. There is no server-side subscription listing method. The client maintains its own subscription registry and can reconstruct server-side state at any time via idempotent `events/subscribe` (for webhook mode) or by resuming poll/push with stored cursors.

Enterprise governance tools can inspect the client SDK's subscription registry for audit purposes. For webhook mode, orphaned subscriptions (e.g., from a crashed client) are cleaned up automatically by TTL expiry — no server-side listing is needed.
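A minimal sketch of what that client-side registry might look like. The `events/subscribe` method name comes from the sketch above; the transport callable, field names, and resume logic are illustrative assumptions.

```python
import uuid


class SubscriptionRegistry:
    """Client-side source of truth for subscriptions; server soft state is reconstructable."""

    def __init__(self, send_request):
        self.send_request = send_request  # assumed callable(method, params)
        self.subs = {}  # subscriptionId -> record we can replay at any time

    def subscribe_webhook(self, event, params, delivery):
        sub_id = str(uuid.uuid4())
        record = {"event": event, "params": params, "delivery": delivery}
        self.subs[sub_id] = record
        # Idempotent: safe to replay after a crash or after server-side TTL expiry.
        self.send_request("events/subscribe", {"subscriptionId": sub_id, **record})
        return sub_id

    def resume_all(self):
        """After a restart, rebuild server-side soft state from our registry."""
        for sub_id, record in self.subs.items():
            self.send_request("events/subscribe", {"subscriptionId": sub_id, **record})
```

Governance tooling would audit `subs` directly, matching the "inspect the client SDK's subscription registry" point above.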

How do server operators audit active webhook subscriptions, investigate abuse, or perform emergency shutdowns without relying only on client SDK state?


Are you thinking about how much say clients should have in this? Today the server has full authority to do this; I am suspecting this is about how much of it should be notified to the client?


So you are saying that the server owner should keep track of all the subscriptions?


- `delivery` lists the delivery modes this event type supports — any non-empty subset of `"poll"`, `"push"`, `"webhook"`. No mode is mandatory. A client that cannot use any of the listed modes cannot subscribe to this event type.
- `inputSchema` is a JSON Schema describing valid subscription parameters — these may include filters (which narrow the event stream), transforms (which modify payloads), or other server-defined configuration. This mirrors the `inputSchema` on tools for consistency.
- `payloadSchema` describes the shape of `data` in delivered events.
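To make the descriptor concrete, here is a hypothetical listing entry. Only the three field names above come from the sketch; the event name, params, and schemas are invented for illustration.

```python
# Hypothetical entry a server might return when listing event types.
event_type = {
    "name": "email.received",
    "delivery": ["poll", "webhook"],  # any non-empty subset of poll/push/webhook
    "inputSchema": {  # validates subscription params, mirroring tool inputSchema
        "type": "object",
        "properties": {
            "label": {"type": "string"},        # filter: narrows the event stream
            "redact_pii": {"type": "boolean"},  # transform: modifies payloads
        },
    },
    "payloadSchema": {  # shape of `data` in delivered events
        "type": "object",
        "properties": {
            "messageId": {"type": "string"},
            "from": {"type": "string"},
        },
    },
}
```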

How do event payloads evolve compatibly, do event names need namespaces/versioning, and should MCP reserve an `mcp.*` prefix?


**Subscribe-time:** The server MUST verify that the authenticated user has permission to subscribe to the requested event type with the given params. For example, a Slack server must verify the user has access to the channel specified in the params.

**Delivery-time:** The server SHOULD periodically re-verify permissions. If the user's access is revoked (e.g., removed from a Slack channel), the server terminates the subscription. The termination signal carries the same nested-`error` shape across all modes:

Should authorization be checked on every poll, every push delivery, and every webhook delivery, or only on a configured interval? And what happens to queued events after revocation?


This is a good one. The auth (and fine-grained-auth) WG is also looking at this; I am thinking deferring to that would keep it simpler here (though I am sure the integration won't be zero-effort).

@calclavia
Member

Posting here as well as Discord for more feedback/visibility:

I had an orthogonal proposal from some time ago on a way to avoid webhooks on the protocol layer completely by putting the responsibility on the transport layer.

The premise is that if MCP is already bidirectional when stateful, push should just work, and we don't need to invent anything new for triggers/events, and webhooks exist primarily because the bidirectional connection isn't durable (the network might drop). So it becomes more of a question of: how do we ensure durable SSE?

The idea is to patch the problem for Streamable HTTP: when a notification (server -> client) fails to get pulled by SSE, hit a webhook backup to deliver the events.

In this setting, the current event proposal will focus solely on event semantics, and not on how it gets delivered. That separated concern goes into the transport layer. It feels more elegant because stdio servers, for example, don't seem to make a ton of sense supporting a webhook.

Would love to hear if this idea has value!
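As a rough illustration of the transport-layer idea: try the live stream first, and only fall back to the webhook when delivery over the connection fails. All names here are invented; this is a sketch of the concept, not a proposed SDK API.

```python
class DurableNotifier:
    """Transport-layer sketch: deliver over the live SSE stream, fall back to a webhook.

    `sse_send` and `webhook_post` are assumed callables. Event semantics stay
    untouched -- only the delivery path changes, matching the separation of
    concerns described above.
    """

    def __init__(self, sse_send, webhook_post):
        self.sse_send = sse_send
        self.webhook_post = webhook_post

    def deliver(self, notification):
        try:
            self.sse_send(notification)  # normal bidirectional path
            return "sse"
        except ConnectionError:
            # Stream dropped: hit the webhook backup so the event is not lost.
            self.webhook_post(notification)
            return "webhook"
```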

Comment on lines +603 to +609
```python
for msg in history.messages:
    if matches_params(params, msg):
        data = {"messageId": msg.id, "from": msg.sender, ...}
        if params.get("redact_pii"):
            data = redact(data)
        events.append(Event(name="email.received", eventId=msg.id, data=data))
return EventResult(events=events, cursor=history.historyId)
```

@caseychow-oai caseychow-oai Apr 24, 2026


thoughts on doing a yield-style setup?

Contributor Author


Open to it! All the SDK stuff here is non-normative. I'd be interested in seeing prototypes of that to see how well it works.


This is interesting @caseychow-oai - will share a demo of this. It was interesting because I am thinking (and this is obviously an impl detail) that a yield source and an actual event source could have different expectations on whether the listing is by a tool call or just for getting events.
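For concreteness, a yield-style source might look like this. Non-normative sketch: the `Event` dataclass, message shape, and `label` filter are stand-ins, not the sketch's actual SDK surface.

```python
from dataclasses import dataclass
from itertools import islice


@dataclass
class Event:
    name: str
    eventId: str
    data: dict


def email_events(messages, params):
    """Yield-style source: produce events lazily instead of building a list eagerly."""
    for msg in messages:
        # Hypothetical filter param narrowing the stream by label.
        if params.get("label") and params["label"] not in msg.get("labels", []):
            continue
        yield Event(
            name="email.received",
            eventId=msg["id"],
            data={"messageId": msg["id"], "from": msg["from"]},
        )


def poll_page(messages, params, limit=50):
    """The SDK can still materialize a bounded page for poll delivery."""
    return list(islice(email_events(messages, params), limit))
```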


## Enterprise Governance

The spec does not mandate specific governance mechanisms but is designed to enable them:

One thing I've gotten a lot of questions about lately has been private cloud deployments with webhooks, especially "what do I put in my WAF?"-style questions; it may be helpful to pre-empt those here.


6. **Should webhook subscription require an ownership-verification handshake?** Before activating delivery, the server would POST a challenge token to `delivery.url` and require the endpoint to echo it back (cf. Slack's URL verification, SNS `SubscriptionConfirmation`). This proves the subscriber controls the endpoint, preventing a client from pointing deliveries at a third party. Cost: an extra round-trip and an endpoint-side requirement. The current SSRF defenses (blocklist, no-redirect, delivery-time IP validation) mitigate internal-target abuse but not third-party-target abuse.

7. **Should webhook deliveries support multiple signatures for zero-downtime secret rotation?** The current design has a single `X-MCP-Signature` header, and rotation happens via an atomic upsert of `delivery.secret`. But there's a race: webhooks in flight when the upsert lands were signed with the old secret, and a receiver that has already switched to validating against the new secret will reject them. Stripe-style multi-signature (e.g., `X-MCP-Signature: t=<ts>,v1=<sig_old>,v1=<sig_new>`) lets the server sign with both secrets during a grace window so the receiver can verify against either. Is the in-flight window small enough to ignore, or should the spec allow `delivery.secret` to be an array (or require servers to dual-sign for N seconds after rotation)?

+1 to this, very hard to add after
