Description
After #2954, Sled Agent has a `POST /cockroachdb` API that initializes the control plane database. The current implementation is not idempotent, leading to #3498.
> There's another bit of this I'm a little worried about, which is exposing an API in sled agent to initialize the CockroachDB cluster. That seems a little dangerous and also overkill since we only ever intend to do this once, and only before the control plane is initialized. I considered changing this to instead have RSS pass configuration to an SMF service which would do this. The problem is that there's not a great way to propagate success/failure information back to RSS so that it can decide whether to proceed (or, I guess, burn down the world and try again). I'm going to defer fixing this for now because we really need to start playing with multi-node CockroachDB.
I think it's worth looking at the SMF option more closely. This would solve a few issues:
- it can't be called concurrently (which is good)
- it can't be called by an arbitrary component in the system at an arbitrary time (the current API can be, and such a call would almost never be valid)
- it replaces an implicit interface between Sled Agent and the CockroachDB zone with a more explicit, well-defined one (see #3407, "Sled Agent uses implicit interfaces with components it provisions")
None of these affect idempotency per se, but I think this would give a better foundation for making it idempotent.
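
To make the idempotency goal concrete, here is a minimal sketch of a one-time initialization guard. It is not the Sled Agent implementation: `InitGuard`, `InitOutcome`, and the marker-file approach are all hypothetical, chosen only to illustrate the "repeat calls are no-ops" behavior. It assumes a single caller at a time (as the SMF option would give us) and does nothing to guard against concurrent callers.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Outcome of a call to `InitGuard::ensure_initialized` (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum InitOutcome {
    /// This call performed the initialization.
    Initialized,
    /// An earlier call already completed it; nothing was done.
    AlreadyInitialized,
}

/// Hypothetical guard that records a completed initialization in a marker
/// file, so that repeated calls become no-ops.
pub struct InitGuard {
    marker: PathBuf,
}

impl InitGuard {
    pub fn new(marker: impl Into<PathBuf>) -> Self {
        InitGuard { marker: marker.into() }
    }

    /// Runs `init` only if no earlier call has completed successfully.
    ///
    /// If `init` fails, no marker is written, so a later call retries.
    /// Not safe against concurrent callers: this sketch assumes the caller
    /// is serialized by some other mechanism.
    pub fn ensure_initialized<F>(&self, init: F) -> io::Result<InitOutcome>
    where
        F: FnOnce() -> io::Result<()>,
    {
        if self.marker.exists() {
            return Ok(InitOutcome::AlreadyInitialized);
        }
        init()?;
        // Record success only after `init` has returned Ok.
        fs::write(&self.marker, b"initialized\n")?;
        Ok(InitOutcome::Initialized)
    }
}
```

One caveat with this shape: a crash after `init` succeeds but before the marker is written causes `init` to run again on the next call, so the initialization step itself still needs to tolerate being repeated.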