Skip to content

Conversation

@MauriceVanVeen
Copy link
Member

@MauriceVanVeen MauriceVanVeen commented Nov 6, 2025

Stream sourcing/mirroring on WorkQueue or Interest streams is not reliable due to the use of an ephemeral AckNone consumer. This ADR proposes a design for durable stream sourcing/mirroring that works reliably on all stream retention types.

Signed-off-by: Maurice van Veen [email protected]

@roeschter

This comment was marked as outdated.

@roeschter

This comment was marked as outdated.

@neilalexander
Copy link
Member

@roeschter Please add comments inline using the review function, otherwise it's extremely difficult to thread and track replies to individual comments.

Copy link
Contributor

@roeschter roeschter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing specifications?
-Will the sourcing side verify that the consumer has the right AckPolicy of AckFlowControl?
-What will happen to sourcing from a workqueue is NOT consumer is set? WE shouldn't fail (compatibility), but should we at least print a warning

adr/ADR-57.md Outdated

Some additional tooling will be required to create the durable consumer with the proper configuration. But through the
use of new fields/values on the consumer configuration, the server will be able to help enforce the correct
configuration.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too vague. Which fields? What will be enforced? We need to at least reference where the details are specified,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to:

... But through the use of a new AckPolicy=AckFlowControl field, the server will be able to help enforce the correct configuration.

The enforcing it does is then mentioned in the "Consumer configuration" section, specifically:

  • Requires FlowControl and Heartbeat to be set.

adr/ADR-57.md Outdated

The durable consumer used for stream sourcing/mirroring will need to be just as performant as the current ephemeral
variant. The current ephemeral consumer configuration uses `AckNone` which is problematic for WorkQueue and Interest
streams. A different `AckPolicy` will need to be used to ensure that messages are not lost.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new AckPolicy=AckFlowControl will be introduce.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just introductory text explaining AckNone can't be used, the mention of the new field is done below:

  • Uses AckPolicy=AckFlowControl instead of AckNone

adr/ADR-57.md Outdated
- `AckPolicy=AckFlowControl` will function like `AckAll` although the server will not use the ack reply on the received
messages.
- The server responds to the flow control messages, including the stream sequence (`Nats-Last-Stream`) and delivery
sequence (`Nats-Last-Consumer`) as headers to specify which messages have been successfully stored.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The receiving stream responds with a flow control message, which includes the stream sequence (Nats-Last-Stream) and delivery sequence (Nats-Last-Consumer) as headers to signalling which messages have been successfully stored

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly updated to:

The receiving server responds to the flow control messages, which includes the stream sequence (Nats-Last-Stream) and delivery sequence (Nats-Last-Consumer) as headers to signal which messages have been successfully stored.

Specifically the server that's receiving the flow control message responds to it. It doesn't send flow control messages itself.

adr/ADR-57.md Outdated
messages.
- The server responds to the flow control messages, including the stream sequence (`Nats-Last-Stream`) and delivery
sequence (`Nats-Last-Consumer`) as headers to specify which messages have been successfully stored.
- The server receiving the flow control response will ack messages based on these stream/delivery sequences.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The server receiving the flow control response will internally acknowledge all prior messages in the consumer according to its retention policy. For work queue and interest based streams this may result in messages deletion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have kept the addition of For WorkQueue and Interest streams this may result in messages deletion. But am inclined to not change the former. A response to a flow control message will not mean ALL prior messages are to be acknowledged. Only those messages that are below the stream/delivery sequences are acked.

adr/ADR-57.md Outdated
- The consumer will be reset, resembling the delivery state of creating a new consumer with `opt_start_seq` set to the
specified sequence.
- The pending and redelivered messages will always be reset.
- The delivered stream/consumer sequences will always be reset.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The delivered consumer sequence will always be reset.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have updated this to delivered stream and consumer sequence. These are two independent sequences that need to be reset.

@jnmoyne
Copy link
Contributor

jnmoyne commented Dec 1, 2025

LGTM

@MauriceVanVeen
Copy link
Member Author

Opened a draft PR on the server with a working version for this. I'll update this ADR with some refinements soon.

@MauriceVanVeen
Copy link
Member Author

This ADR is now also updated to reflect the initial implemention. Requiring to specify MaxAckPending as upper bound for pending messages, unset AckWait/BackOff, and infinite MaxDeliver.

@MauriceVanVeen MauriceVanVeen force-pushed the maurice/durable-sourcing branch from f741e0a to ed4847e Compare December 5, 2025 09:05
Copy link
Collaborator

@ripienaar ripienaar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, feels like we need a proper ok from @derekcollison also maybe?

@MauriceVanVeen MauriceVanVeen force-pushed the maurice/durable-sourcing branch from 06e8ab3 to 00aef99 Compare January 6, 2026 15:26
@MauriceVanVeen MauriceVanVeen changed the title ADR-57: JetStream durable stream sourcing/mirroring ADR-58: JetStream durable stream sourcing/mirroring Jan 6, 2026
@MauriceVanVeen MauriceVanVeen force-pushed the maurice/durable-sourcing branch from 00aef99 to e7c3980 Compare January 6, 2026 15:40
adr/ADR-58.md Outdated

The durable consumer used for stream sourcing/mirroring will need to be just as performant as the current ephemeral
variant. The current ephemeral consumer configuration uses `AckNone` which is problematic for WorkQueue and Interest
streams. A different `AckPolicy` (`AckFlowControl`) will need to be used to ensure that messages are not lost.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trivial change:
A new AckPolicy (AckFlowControl) will need to be used to ensure that performance is on par with an OrderedConsumer and the messages are not lost.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

adr/ADR-58.md Outdated
- Requires `FlowControl` and `Heartbeat` to be set.
- Uses `AckPolicy=AckFlowControl` instead of `AckNone`.
- `AckPolicy=AckFlowControl` will function like `AckAll` although the receiving server will not use the current ack
reply format by acknowledging individual messages.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is hard to understand - Suggestion:
AckPolicy=AckFlowControl will function like AckAll. The flow control messages (see below) will ack message rather than the current ack reply format.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

adr/ADR-58.md Outdated
- Uses `AckPolicy=AckFlowControl` instead of `AckNone`.
- `AckPolicy=AckFlowControl` will function like `AckAll` although the receiving server will not use the current ack
reply format by acknowledging individual messages.
- The receiving server responds to the flow control messages, which includes the stream sequence (`Nats-Last-Stream`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"With a" rather than "to a"
The receiving server responds with a flow control messages

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

and delivery sequence (`Nats-Last-Consumer`) as headers to signal which messages have been successfully stored.
- The server receiving the flow control response will ack messages based on these stream/delivery sequences. For
WorkQueue and Interest streams this may result in messages deletion.
- Acknowledgements happen based on flow control limits, usually a data window size. But if the stream is idle the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vaguely remember there is a reasons for moving the ack floor. In the next section we talk about flow control ack being done according to window size. So this one is for very slow moevign streams?

But if the source/mirror is idle (low message rate) the Heartbeat will also trigger a flow control message to move the acknowledgement floor up even if the window size has not been reached.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, if a stream receives small messages very slowly, it will not trigger the flow control messages often. So, if the stream hasn't received messages for Heartbeat amount of time, then a flow control message will also be sent. This ensures that a WorkQueue/Interest stream can remove messages after ack in such a slow/idle stream case.

`MaxAckPending` setting. `MaxAckPending` determines the maximum number of pending messages that can be sent before the
sourcing pauses. A flow control message will be automatically sent (no need to wait for a `Heartbeat`) so these
messages are acknowledged and new messages can be sent as soon as possible.
- Since acknowledgements happen based on dynamic flow control, it being determined either by data size, `MaxAckPending`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MUST HAVE is fine, but is this being verified? Will ackwait or backoff settings result in an error? a warning? Or result in undefined behavior?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything that's not allowed will error. Have added that here.


Importantly, the server should also handle the case where a user manually resets the consumer that's used for sourcing.
The server should handle this gracefully and ensure no messages are lost. However, the user could also reset the
consumer such that it moves ahead in the stream. The server should also handle this by properly skipping over those
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will the receiving server detect that this is an intentional skip rather than a gap? Please document.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, consumer delivery sequence goes back to 1, the gap is the difference between the received stream sequence and the previously received stream sequence.


Client should support the `$JS.API.CONSUMER.RESET.<STREAM>.<CONSUMER>` reset API. Clients should not rely on this call
to be initiated by the client process, but it potentially being called by another process, by the CLI for example.
Importantly, clients should not fail when the consumer delivery sequence is not monotonic, except when needed for the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even with a reset the consumer delivery sequence should still be monotonic. BUT the stream sequence may move backwards!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resetting the consumer also resets the consumer delivery sequence back to 1, in that case it's not monotonic anymore due to the reset. But it's intended to be monotonic after that (until another reset), and a gap/unordered is detected if it's not monotonically increasing.

The stream sequence may move arbitrarily forward or backward and that doesn't matter for the client's ordered consumer implementations, also why it's not mentioned here.

Signed-off-by: Maurice van Veen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants