Description
Upgrading a sled requires it to be vacated of instances and services. This requires that, at any time,
- A rack must have enough aggregate free capacity to hold the contents of any individual sled, and
- For any sled S, it must always be possible to arrange the workloads on other sleds in the rack so that there is an acceptable destination for each workload running on S. (In other words, the available space can't become fragmented such that there's a sled's worth of space in aggregate, but there's no way to gather the free space onto a single sled to allow a large workload to land there.)
(All this applies equally to multi-rack deployments; the important thing is that if a workload can land in some domain, there must be enough possibly-contiguous capacity to empty a sled in that domain.)
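The two properties can be checked mechanically on a small model. The sketch below is illustrative only and is not how Nexus represents sleds: the `Rack` layout, the function names, and the single scalar resource (standing in for CPU, RAM, and storage) are all assumptions. It is also conservative, since it only places the vacated sled's workloads into existing free space and never reshuffles workloads already running on the other sleds.

```python
from typing import Dict, List, Tuple

# Hypothetical model: sled name -> (capacity, sizes of workloads on it),
# with a single scalar resource standing in for CPU, RAM, and storage.
Rack = Dict[str, Tuple[int, List[int]]]


def free_on_others(rack: Rack, victim: str) -> List[int]:
    """Free capacity of every sled except the one being vacated."""
    return [cap - sum(loads) for name, (cap, loads) in rack.items() if name != victim]


def has_aggregate_capacity(rack: Rack, victim: str) -> bool:
    """First property: the other sleds have enough free space in total."""
    return sum(free_on_others(rack, victim)) >= sum(rack[victim][1])


def can_evacuate(rack: Rack, victim: str) -> bool:
    """Second property: every workload on `victim` fits on some other sled.

    Exhaustive backtracking, so only suitable for small examples.
    """
    items = sorted(rack[victim][1], reverse=True)  # place large workloads first
    bins = free_on_others(rack, victim)

    def place(i: int) -> bool:
        if i == len(items):
            return True
        tried = set()  # skip sleds with identical free space (same outcome)
        for b, space in enumerate(bins):
            if space >= items[i] and space not in tried:
                tried.add(space)
                bins[b] -= items[i]
                if place(i + 1):
                    return True
                bins[b] += items[i]  # undo and try the next sled
        return False

    return place(0)


fragmented = {"a": (8, [6]), "b": (8, [4]), "c": (8, [5])}
print(has_aggregate_capacity(fragmented, "a"))  # True: 4 + 3 free >= 6 needed
print(can_evacuate(fragmented, "a"))            # False: no single sled has 6 free
```

The `fragmented` rack is the failure case described in the second bullet: there is enough space in aggregate to absorb sled `a`, but no destination for its one large workload.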
If these properties don't hold, users or operators will have to stop workloads to take a sled out of service. We would prefer to avoid this, at least in cases where all sleds are operating normally and nothing has failed.
Instance provisioning doesn't currently guarantee either property. It only ensures that an instance lands on a sled that has space available for it; it doesn't take global usage into account. Even if Nexus did track domain-wide resource usage, we would have to discover and implement a bin-packing scheme that preserves the two properties above. That seems difficult, though maybe this is a solved problem whose solution I don't know.
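To make the gap concrete, here is a minimal sketch contrasting a local-only placement check with one that also tracks domain-wide usage. None of this is Nexus code: the `capacity` table, `provision_local`, and `provision_with_headroom` are hypothetical names, and resources are reduced to one scalar. The guard covers only the aggregate-capacity property. It relies on the observation that the free space on sleds other than S covers S's usage exactly when total free space is at least S's capacity, so it is enough to keep total free space at or above the largest sled's capacity. It does nothing about fragmentation.

```python
from typing import Dict, Optional

# Hypothetical rack: three sleds, one scalar resource each.
capacity: Dict[str, int] = {"a": 8, "b": 8, "c": 8}


def total_free(used: Dict[str, int]) -> int:
    """Free capacity summed over the whole rack."""
    return sum(capacity[s] - used[s] for s in capacity)


def provision_local(used: Dict[str, int], size: int) -> Optional[str]:
    """Local-only check: take the first sled with room for the instance."""
    for sled in capacity:
        if capacity[sled] - used[sled] >= size:
            used[sled] += size
            return sled
    return None


def provision_with_headroom(used: Dict[str, int], size: int) -> Optional[str]:
    """Refuse any placement that would leave less than one sled's worth free."""
    if total_free(used) - size < max(capacity.values()):
        return None
    return provision_local(used, size)


local = {"a": 0, "b": 0, "c": 0}
print([provision_local(local, 6) for _ in range(3)])  # ['a', 'b', 'c']
print(total_free(local))                              # 6, less than one sled (8)

guarded = {"a": 0, "b": 0, "c": 0}
print([provision_with_headroom(guarded, 6) for _ in range(3)])  # ['a', 'b', None]
```

With the local-only check, three instances are accepted and the rack can no longer absorb any sled. The guarded variant rejects the third instance instead.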
In the short term, it may be simplest to address this by allowing a sled to be put into a state where it's only eligible to receive migratory workloads and suggesting that operators use this mechanism to keep a sled in reserve for updates.
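One way such a state could feed into sled selection is sketched below. This is an assumption about shape, not an existing Omicron interface: `ProvisionPolicy`, `Sled`, and `eligible_sleds` are invented for illustration, and capacity is again a single scalar.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ProvisionPolicy(Enum):
    """Hypothetical per-sled setting chosen by an operator."""
    GENERAL = "general"                # accepts new instances and migrations
    MIGRATION_ONLY = "migration_only"  # held in reserve for migrations


@dataclass
class Sled:
    name: str
    free: int
    policy: ProvisionPolicy = ProvisionPolicy.GENERAL


def eligible_sleds(sleds: List[Sled], size: int, is_migration: bool) -> List[Sled]:
    """Sleds with room for the workload that are allowed to receive it."""
    return [
        s for s in sleds
        if s.free >= size
        and (is_migration or s.policy is ProvisionPolicy.GENERAL)
    ]


rack = [Sled("a", free=2), Sled("reserve", free=8, policy=ProvisionPolicy.MIGRATION_ONLY)]
print([s.name for s in eligible_sleds(rack, 4, is_migration=False)])  # []
print([s.name for s in eligible_sleds(rack, 4, is_migration=True)])   # ['reserve']
```

A new instance of size 4 finds no eligible sled even though the reserve sled is empty, while the same workload arriving by migration can land there.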