Commit 29d0926

fix(async): add stateless async bus guide (#70)
Signed-off-by: Frank Spitulski <fspitulski@nvidia.com>
1 parent f9db5e6 commit 29d0926

2 files changed

Lines changed: 239 additions & 0 deletions

File tree

docs/stateless-async-bus.md

Lines changed: 236 additions & 0 deletions
@@ -0,0 +1,236 @@
1+
# Why a Message Bus - Stateless Async
2+
3+
At gigawatt scale, polling breaks down quickly. Applications cannot all poll every other application for possible updates every second and still leave room for the work that actually matters.
4+
5+
The bus model pushes signals on change and on a cadence, then lets every
6+
interested consumer receive the same signal. A producer does not need to know
7+
who is listening, and a consumer does not need to know who else cares.
8+
9+
## Design Principles
10+
11+
- **Push, Don’t Poll**: React quickly at very large scale without building a polling
12+
mesh.
13+
- **Publish Once, Fan Out Many**: One state publication reaches all interested
14+
consumers without per-consumer producer load.
15+
- **Decouple Producers From Consumers**: Producers publish state. Consumers
16+
independently decide how to react.
17+
- **Converge on Current State**: Consistency comes from publishing current
18+
state, not preserving a perfect message stream. Failure is corrected by later
19+
publications, so the system self-heals.
20+
21+
## Polling at 1GW
22+
23+
Leak detection shows why polling breaks down. A liquid rack leak needs fast
24+
reaction, and more than one consumer may care. Host management, workload
25+
migration, facility response, alerting, and analysis may all need it. Waiting
26+
for the next one-minute or five-minute poll is too slow, and having every consumer
27+
poll the BMS every second is wasteful.
28+
29+
The BMS publishes the leak state when it changes. The bus fans that
30+
publication out to the consumers that need it.
31+
32+
```mermaid
33+
flowchart LR
34+
BMS["BMS leak state"] -->|publish on detection| Bus["Bus"]
35+
Bus --> NICo["Host management"]
36+
Bus --> Workload["Workload migration"]
37+
Bus --> Facilities["Facility response"]
38+
Bus --> Alerting["Alerting"]
39+
Bus --> Analysis["Analysis"]
40+
```
41+
42+
| Polling mesh | Stateless async bus |
43+
| :----------- | :------------------ |
44+
| Every consumer polls each producer | Producer publishes each message once |
45+
| Fast reaction needs tight polling | Fast reaction comes from push delivery |
46+
| Adding consumers adds producer load | Adding consumers adds bus load |
47+
48+
Push delivery removes the need for every application to poll
49+
every other application, and it improves reaction times.
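
The table's load argument can be made concrete with a little arithmetic. This is a rough sketch only: the producer and consumer counts, the change rate, and the republish cadence are hypothetical numbers chosen to show how the two models scale, not measurements from any deployment.

```python
def polling_requests_per_second(producers: int, consumers: int,
                                polls_per_second: float) -> float:
    """Requests producers must serve when every consumer polls every producer."""
    return producers * consumers * polls_per_second


def bus_publishes_per_second(producers: int, changes_per_second: float,
                             republishes_per_second: float) -> float:
    """Messages producers send when each publishes once per change plus a periodic republish."""
    return producers * (changes_per_second + republishes_per_second)


# Hypothetical numbers, for illustration only.
producers, consumers = 1_000, 5
mesh = polling_requests_per_second(producers, consumers, polls_per_second=1.0)
bus = bus_publishes_per_second(producers, changes_per_second=0.01,
                               republishes_per_second=1 / 60)

# Doubling the consumers doubles producer load in the mesh,
# while the producer-side publish rate on the bus does not change.
mesh_doubled = polling_requests_per_second(producers, consumers * 2, polls_per_second=1.0)

print(f"polling mesh: {mesh:.0f} requests/s served by producers")
print(f"bus:          {bus:.1f} publications/s sent by producers")
print(f"mesh with 2x consumers: {mesh_doubled:.0f} requests/s")
```

In the bus model the extra consumers still cost something, but that cost lands on the bus fan-out rather than on each producer.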
50+
51+
## Stateless by Default
52+
53+
DSX Exchange carries live, current state events. It is not a database for every
54+
application's state. The source application owns its state, publishes current
55+
state when it changes, and periodically republishes current state at a cadence
56+
it can sustain.
57+
58+
Consumers converge on current state. Messages should carry _current values_, not
59+
deltas. A temperature message should say the current temperature is 24, not that
60+
the temperature changed by 2. Consumers apply those current values idempotently,
61+
so repeated messages are safe.
62+
63+
The normal flow has three parts:
64+
65+
1. Publish when state changes.
66+
2. Periodically republish current state even when it did not change.
67+
3. Consumers process messages idempotently.
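
The third step can be sketched as a consumer that keeps only the latest current value per topic. This is a minimal illustration, not a DSX Exchange API: the `Reading` and `ConsumerView` names, the topic string, and the use of an observation timestamp to discard older messages are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Reading:
    topic: str          # identifies which value is being updated
    value: float        # the current value, not a delta
    observed_at: float  # when the source observed the value, in seconds


class ConsumerView:
    """Local view that converges on the latest current value per topic."""

    def __init__(self) -> None:
        self._latest: Dict[str, Reading] = {}

    def apply(self, reading: Reading) -> bool:
        """Store the reading unless it is a duplicate or older; return True if stored."""
        known = self._latest.get(reading.topic)
        if known is not None and reading.observed_at <= known.observed_at:
            return False  # redelivery or late message: safe to ignore
        self._latest[reading.topic] = reading
        return True

    def value(self, topic: str) -> Optional[float]:
        reading = self._latest.get(topic)
        return None if reading is None else reading.value


topic = "example/cdu-1/liquid-temperature"  # hypothetical topic name
view = ConsumerView()
first = view.apply(Reading(topic, 24.0, observed_at=100.0))
repeat = view.apply(Reading(topic, 24.0, observed_at=100.0))  # same message again
late = view.apply(Reading(topic, 22.0, observed_at=90.0))     # older message arrives late
print(view.value(topic))
```

Because each message carries the full current value, applying it once or many times leaves the view in the same state.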
68+
69+
```mermaid
70+
sequenceDiagram
71+
participant Source as Source application
72+
participant Bus
73+
participant Consumer
74+
75+
Source->>Bus: Publish current value on change
76+
Bus->>Consumer: Deliver current value
77+
Consumer->>Consumer: Apply current value idempotently
78+
loop Periodic current state republish
79+
Source->>Source: Republish interval
80+
Source->>Bus: Republish current value
81+
Bus->>Consumer: Deliver current value
82+
Consumer->>Consumer: Apply current value idempotently
83+
end
84+
```
85+
86+
Failure can mean a missed update, producer bug, broker problem, network issue,
87+
consumer bug, bad local state, data corruption, power loss across too many
88+
high-availability domains, or any other problem that leaves a consumer with the
89+
wrong state. The next update or periodic current state republish gives the
90+
consumer the current value again.
91+
92+
This design gives the system eventual consistency and self-reconciliation.
93+
Change publications provide fast reaction, while slower scheduled republishes
94+
provide reconciliation. If local state drifts, the next changed state message or
95+
scheduled publish brings it back.
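
A small simulation shows how the two publish triggers work together. It is a sketch under stated assumptions: the `Source` class, the tick-based clock, and the lost publication at tick 3 are all invented for the illustration.

```python
class Source:
    """Owns a value; publishes on change and on a fixed republish interval."""

    def __init__(self, value, republish_every: int) -> None:
        self.value = value
        self.republish_every = republish_every  # in ticks, a hypothetical unit
        self._last_value = None
        self._last_tick = None

    def should_publish(self, now: int) -> bool:
        """Return True when the current value should go out at this tick."""
        never_published = self._last_tick is None
        changed = never_published or self.value != self._last_value
        due = never_published or now - self._last_tick >= self.republish_every
        if changed or due:
            self._last_value = self.value
            self._last_tick = now
            return True
        return False


source = Source(value=22, republish_every=5)
lost_ticks = {3}        # simulate the publication sent at tick 3 being lost
consumer_value = None
history = []

for now in range(12):
    if now == 3:
        source.value = 24                      # state changes at tick 3
    if source.should_publish(now) and now not in lost_ticks:
        consumer_value = source.value          # consumer applies the current value
    history.append(consumer_value)

print(history)
```

The consumer misses the change at tick 3 and holds the stale value until the scheduled republish at tick 8 brings it back in line with the source.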
96+
97+
Keeping the bus stateless is both a correctness choice and a performance choice.
98+
_Correctness_ comes from convergence on the source's next current-state
99+
publication. This also repairs a missed message, stale consumer cache, or incorrect local value. The source publishes the current value again and
100+
consumers apply it idempotently. _Performance_ comes from keeping high-rate state
101+
on the live message path without turning each publication into replicated
102+
persistent state.
103+
104+
### The Startup Problem
105+
106+
Bootstrapping consumers is where the stateless event flow needs help. A new
107+
consumer starts with no local view. For fast-changing values, the normal stream
108+
is enough. Consumers subscribe and wait for the next current state publication.
109+
110+
Slow-changing context is different. For example, BMS metadata barely changes, so
111+
a new consumer could wait too long to learn the context needed to interpret live
112+
values. Without a bootstrap path, the consumer may be connected and receiving
113+
live values but unable to use them correctly.
114+
115+
In MQTT, use retained messages for this startup case. DSX Exchange persists the
116+
retained set so a new consumer can build its first view immediately. The
117+
retained set should stay small and slow-changing. Retained messages should be
118+
used as an optimization, not for correctness. The source application still owns
119+
the data and is responsible for republishing for reconciliation. Assume retained
120+
data can eventually be lost. At gigawatt scale, unlikely events happen.
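
The retained-message behavior can be sketched with a toy in-memory broker. This is a stand-in for illustration only, not an MQTT client or the DSX Exchange broker: `ToyBroker` and the topic names are hypothetical, and it matches topics exactly, with no wildcards.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[str, bytes], None]


class ToyBroker:
    """In-memory model of retained-message delivery (exact-match topics only)."""

    def __init__(self) -> None:
        self._retained: Dict[str, bytes] = {}
        self._subscribers: Dict[str, List[Handler]] = {}

    def publish(self, topic: str, payload: bytes, retain: bool = False) -> None:
        if retain:
            if payload:
                self._retained[topic] = payload  # keep only the latest retained message
            else:
                self._retained.pop(topic, None)  # empty retained payload clears the topic
        for handler in self._subscribers.get(topic, []):
            handler(topic, payload)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers.setdefault(topic, []).append(handler)
        if topic in self._retained:
            handler(topic, self._retained[topic])  # new subscriber gets it immediately


broker = ToyBroker()
metadata_topic = "example/metadata/cdu-1"        # hypothetical, slow-changing context
live_topic = "example/value/cdu-1/temperature"   # hypothetical, high-rate live value

broker.publish(metadata_topic, b'{"unit": "C"}', retain=True)
broker.publish(live_topic, b"24.0")              # live value: not retained

received: List[Tuple[str, bytes]] = []
broker.subscribe(metadata_topic, lambda t, p: received.append((t, p)))
broker.subscribe(live_topic, lambda t, p: received.append((t, p)))
print(received)
```

The late subscriber gets the retained metadata right away and then waits for the next live publication, which is the split this section recommends.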
121+
122+
This is a compromise. Retained messages _are_ broker state. Even in memory,
123+
that state has to be stored and replicated, so it has lower maximum throughput
124+
than the stateless live message path. Keep high-rate live values on the
125+
stateless path and recover missed, stale, or incorrect live values through the
126+
next current state publication.
127+
128+
## Decoupled Intent - Remodeling a Sync Request as Async
129+
130+
Traditional synchronous requests can often be modeled as async to gain scaling,
131+
decoupling, and self-healing benefits.
132+
133+
A requester publishes intent when it wants another application to change
134+
something. The state owner uses its own rules to decide what to do with that intent.
135+
136+
The state owner keeps publishing current state or status on its normal stream.
137+
That stream does not depend on an intent message. It is the ongoing source of
138+
truth for every consumer.
139+
140+
If the owner accepts the intent, the next state or status it publishes shows the
141+
accepted value. If the owner ignores it, clamps it, or falls back, the stream
142+
shows that result instead. The requester confirms the outcome by reading the
143+
same stream as every other consumer.
144+
145+
The state owner does not need a response topic, callback address, or connection
146+
back to the requester.
147+
148+
```mermaid
149+
sequenceDiagram
150+
participant Requester as Requester application
151+
participant Bus
152+
participant StateOwner as State owner
153+
154+
loop Normal state or status stream
155+
StateOwner->>Bus: Publish current state or status
156+
Bus->>Requester: Deliver current state or status
157+
end
158+
159+
Requester->>Bus: Publish intent
160+
Bus->>StateOwner: Deliver intent
161+
StateOwner->>StateOwner: Apply local rules
162+
StateOwner->>Bus: Publish current state or status
163+
Bus->>Requester: Deliver current state or status
164+
```
165+
166+
## BMS Setpoint Example
167+
168+
Even straightforward synchronous requests such as "set the target temperature
169+
for a CDU" can be modeled as async. With this model, multiple integrating systems
170+
can see and direct BMS state without increasing the load on the BMS.
171+
172+
A CDU liquid temperature control loop has three values in the bus model:
173+
174+
- The current temperature is what the BMS measures and publishes.
175+
176+
```text
177+
BMS/v1/PUB/Value/CDU/LiquidTemperature/{currentTemperatureTagPath}
178+
```
179+
180+
- The target setpoint is the value the BMS is trying to hold. The BMS publishes
181+
it as BMS state.
182+
183+
```text
184+
BMS/v1/PUB/Value/CDU/LiquidTemperature/{targetSetpointTagPath}
185+
```
186+
187+
- The requested target setpoint is what the integration wants the BMS to use.
188+
The integration publishes it on the topic the BMS listens to.
189+
190+
```text
191+
BMS/v1/{integration}/Value/CDU/LiquidTemperatureSpRequest/{requestTagPath}
192+
```
193+
194+
The requested target setpoint is intent. The BMS may apply it, ignore it, clamp
195+
it to a configured range, or fall back to a local default. The integration does
196+
not get a callback from the BMS.
197+
198+
Confirmation comes from the BMS-published target setpoint. If the BMS accepts
199+
the request, the target setpoint changes to the accepted value. If the BMS
200+
clamps or falls back, the target setpoint shows the value the BMS actually
201+
chose. The current temperature remains the measured value.
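
The request and confirmation flow can be sketched as follows. This is an illustration under assumptions, not BMS behavior taken from the source: the `SetpointOwner` class, the clamp limits, and the outcome labels are hypothetical, and a real BMS may also ignore a request or fall back to a default.

```python
from dataclasses import dataclass


@dataclass
class SetpointOwner:
    """Stand-in for the BMS side of the setpoint loop; limits are hypothetical."""
    target_setpoint: float
    min_setpoint: float
    max_setpoint: float

    def handle_request(self, requested: float) -> float:
        """Apply local rules to an intent and return the target setpoint to publish."""
        clamped = max(self.min_setpoint, min(self.max_setpoint, requested))
        self.target_setpoint = clamped
        return self.target_setpoint


def request_outcome(requested: float, published_target: float) -> str:
    """Requester-side check that uses only the published target setpoint."""
    if published_target == requested:
        return "accepted"
    return "not applied as requested"


owner = SetpointOwner(target_setpoint=24.0, min_setpoint=18.0, max_setpoint=27.0)

in_range = owner.handle_request(25.0)       # within limits: applied as requested
out_of_range = owner.handle_request(30.0)   # above the limit: clamped by the owner

print(in_range, request_outcome(25.0, in_range))
print(out_of_range, request_outcome(30.0, out_of_range))
```

The requester never receives a reply to its request. It learns the outcome by comparing what it asked for with the target setpoint that every consumer sees.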
202+
203+
## When Not to Use a Bus
204+
205+
Async is the right approach for live state, fan-out, and decoupled integration, but it is not the right approach for every workflow.
206+
207+
Use a direct API when one caller needs an immediate response from one known
208+
owner before it can continue. Provisioning a machine, rebooting a known host,
209+
creating a VPC, or changing a setting that requires acknowledgement of that
210+
exact request are better as direct requests.
211+
212+
DSX Exchange can still carry the resulting resource state after the direct API
213+
call creates or changes the resource. That gives interested consumers a native
214+
async state stream without making the bus part of the synchronous request path.
215+
216+
## Practical Checklist for AsyncAPI Design
217+
218+
- Publish the current value when it changes.
219+
- Republish the current value periodically at a cadence the source can sustain.
220+
- Make repeated current-value messages safe to process idempotently.
221+
- Prefer current values over deltas.
222+
- Include a timestamp for when the value was observed or created.
223+
- Include enough identifying information in the topic, metadata, or subject for
224+
consumers to know which value is being updated.
225+
- Include a correlation field if a later status message must tie back to an intent
226+
or request.
227+
- Retain infrequently changing metadata needed at startup.
228+
- Do not retain frequently changing live values.
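
Several checklist items can be folded into one message shape. The sketch below is an assumption for illustration, not the DSX Exchange schema: the `StateMessage` fields and the JSON encoding are hypothetical, and the actual schemas are described in the related docs.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class StateMessage:
    value: float                          # current value, never a delta
    observed_at: float                    # when the value was observed, Unix seconds
    correlation_id: Optional[str] = None  # set only when tying back to an intent


def encode(message: StateMessage) -> bytes:
    """Serialize to JSON bytes with stable key order."""
    return json.dumps(asdict(message), sort_keys=True).encode("utf-8")


def decode(payload: bytes) -> StateMessage:
    """Parse JSON bytes back into a StateMessage."""
    return StateMessage(**json.loads(payload.decode("utf-8")))


# A periodic republish carries no correlation field.
republish = StateMessage(value=24.0, observed_at=1_700_000_000.0)

# A status that follows an intent carries the requester's identifier (hypothetical value).
status = StateMessage(value=25.0, observed_at=1_700_000_060.0,
                      correlation_id="intent-0001")

round_trip = decode(encode(status))
print(encode(republish))
print(round_trip)
```

Identifying information such as the tag path is left to the topic here, which is one of the options the checklist allows.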
229+
230+
## Related Docs
231+
232+
- [Architecture](architecture.md)
233+
- [BMS Integration](bms-integration.md)
234+
- [BMS Event Bus Schema](schema-bms.mdx)
235+
- [NICo Host State Schema](schema-nico.mdx)
236+
- [Power Management Schema](schema-power-management.mdx)

fern/docs.yml

Lines changed: 3 additions & 0 deletions
@@ -39,6 +39,9 @@ navigation:
3939
- page: Architecture
4040
path: ../docs/architecture.md
4141
slug: architecture
42+
- page: Stateless Async Bus
43+
path: ../docs/stateless-async-bus.md
44+
slug: stateless-async-bus
4245
- page: Pre-Deployment
4346
path: ../docs/pre-deployment.md
4447
slug: pre-deployment
