
BlueprintSledConfig: One generation number for sled-agent instead of three #7788

Merged
merged 14 commits from john/bp-sled-config-one-gen into main on Mar 17, 2025

Conversation

jgallagher (Contributor)

This is the first half of #7774. I naively expected to do the blueprint side and the sled-agent side together, but the blueprint side alone is pretty big already, so here it is.

The primary changes are in nexus-types:

  • Remove BlueprintZonesConfig, BlueprintDatasetsConfig, and BlueprintPhysicalDisksConfig; each of these was a generation number + an appropriate IdMap
  • BlueprintSledConfig now contains a single generation number and three IdMaps (a rough sketch of this shape change appears below)

and everything else is fallout from that. Most of the changes are mechanical (and I highly recommend viewing the diff with whitespace ignored; it cuts a few hundred +/- lines). The bits that are less mechanical and deserve a closer look are:

  • Database schema changes
  • Blueprint diff changes (bikeshedding welcome on how to present the single config generation number)
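
For orientation, here's a rough sketch of the shape change described above. This is a hand-written approximation, not the exact nexus-types definitions: the aliases exist only so the sketch compiles on its own, and the field name `sled_agent_generation` is an assumption.

```rust
use std::collections::BTreeMap;

// Placeholder stand-ins so this sketch is self-contained; the real types
// live in nexus-types / omicron-common and are richer than this.
type Generation = u64;
type IdMap<T> = BTreeMap<String, T>;
struct BlueprintZoneConfig;
struct BlueprintDatasetConfig;
struct BlueprintPhysicalDiskConfig;

// Before this PR: three wrapper types, each carrying its own generation.
struct BlueprintZonesConfig {
    generation: Generation,
    zones: IdMap<BlueprintZoneConfig>,
}
struct BlueprintDatasetsConfig {
    generation: Generation,
    datasets: IdMap<BlueprintDatasetConfig>,
}
struct BlueprintPhysicalDisksConfig {
    generation: Generation,
    disks: IdMap<BlueprintPhysicalDiskConfig>,
}

// After this PR: one generation covering all three IdMaps.
struct BlueprintSledConfig {
    sled_agent_generation: Generation,
    zones: IdMap<BlueprintZoneConfig>,
    datasets: IdMap<BlueprintDatasetConfig>,
    disks: IdMap<BlueprintPhysicalDiskConfig>,
}
```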

One decision I think is fine but is also worth double checking: as of this PR, we still have to send 3 generation numbers to sled-agent in OmicronSledConfig, so I'm replicating the single BlueprintSledConfig generation for all three of the sub-configs. I think this is fine; in practice if deployed, it means a sled might see its disks or datasets generation number jump a few units up without any changes to the actual disks or datasets, but I believe that's effectively a no-op other than rewriting the ledger JSON.
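
To make the "replicating the single generation" point concrete, here is a hedged sketch of the fan-out. The type and field names (`SubConfig`, `SledAgentRequest`) are purely illustrative stand-ins for the real `OmicronSledConfig` sub-configs:

```rust
// Illustrative only: how a single blueprint generation could be copied into
// each of the three generations sled-agent still expects today.
#[derive(Debug)]
struct SubConfig {
    generation: u64,
    // item payloads omitted; only the generation matters for this sketch
}

#[derive(Debug)]
struct SledAgentRequest {
    disks: SubConfig,
    datasets: SubConfig,
    zones: SubConfig,
}

fn fan_out(blueprint_generation: u64) -> SledAgentRequest {
    // The same generation lands in all three sub-configs, so a blueprint
    // generation bump shows up in the disks/datasets generations even when
    // nothing about the disks or datasets actually changed.
    SledAgentRequest {
        disks: SubConfig { generation: blueprint_generation },
        datasets: SubConfig { generation: blueprint_generation },
        zones: SubConfig { generation: blueprint_generation },
    }
}

fn main() {
    println!("{:?}", fan_out(7));
}
```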

@jgallagher (Contributor, Author)

> I think this is fine; in practice if deployed, it means a sled might see its disks or datasets generation number jump a few units up without any changes to the actual disks or datasets, but I believe that's effectively a no-op other than rewriting the ledger JSON.

I think this is actually not fine; the helios/deploy test failure looks like another instance of #7622 (albeit for a different reason):

% rg 'physical disks ensure' oxide-sled-agent_default.log | looker
21:37:56.429Z INFO SledAgent: physical disks ensure
    file = sled-agent/src/sled_agent.rs:895
    sled_id = 019961bd-caf2-4093-bb29-adcee288a2a4
21:38:00.195Z INFO SledAgent: physical disks ensure: Updated storage
    file = sled-agent/src/sled_agent.rs:899
    sled_id = 019961bd-caf2-4093-bb29-adcee288a2a4
21:38:00.195Z INFO SledAgent: physical disks ensure: Propagating new generation of disks
    file = sled-agent/src/sled_agent.rs:925
    generation = Generation(2)
    sled_id = 019961bd-caf2-4093-bb29-adcee288a2a4
21:38:00.195Z INFO SledAgent: physical disks ensure: Updated storage monitor
    file = sled-agent/src/sled_agent.rs:930
    sled_id = 019961bd-caf2-4093-bb29-adcee288a2a4
21:38:00.195Z INFO SledAgent: physical disks ensure: Updated zone bundler
    file = sled-agent/src/sled_agent.rs:935
    sled_id = 019961bd-caf2-4093-bb29-adcee288a2a4
21:38:00.195Z INFO SledAgent: physical disks ensure: Updated probes
    file = sled-agent/src/sled_agent.rs:941
    sled_id = 019961bd-caf2-4093-bb29-adcee288a2a4
21:38:00.195Z INFO SledAgent: physical disks ensure: Updated instances
    file = sled-agent/src/sled_agent.rs:946
    sled_id = 019961bd-caf2-4093-bb29-adcee288a2a4
21:38:12.714Z INFO SledAgent: physical disks ensure
    file = sled-agent/src/sled_agent.rs:895
    sled_id = 019961bd-caf2-4093-bb29-adcee288a2a4
21:38:12.715Z INFO SledAgent: physical disks ensure: Updated storage
    file = sled-agent/src/sled_agent.rs:899
    sled_id = 019961bd-caf2-4093-bb29-adcee288a2a4
21:38:12.715Z INFO SledAgent: physical disks ensure: Propagating new generation of disks
    file = sled-agent/src/sled_agent.rs:925
    generation = Generation(3)
    sled_id = 019961bd-caf2-4093-bb29-adcee288a2a4

I suspect that when the generation is bumped without any of the disks actually changing, the new generation is not propagated out and so we get stuck waiting for it here just like after expunging a nonexistent disk:

// Ensure that the StorageMonitor, and the dump devices, have committed
// to start using new disks and stop using old ones.
self.inner.storage_monitor.await_generation(*our_gen).await?;
info!(self.log, "physical disks ensure: Updated storage monitor");
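
To illustrate the failure mode, here is a minimal model of a waiter like this over a `tokio::sync::watch` channel. This is an assumed simplification for the sake of the example, not the actual StorageMonitor code:

```rust
use std::time::Duration;
use tokio::sync::watch;

// Waits until the observed generation reaches `target`. If the producer bumps
// its internal generation but never publishes on the channel (because it
// decided "nothing changed"), this loop never completes.
async fn await_generation(
    mut rx: watch::Receiver<u64>,
    target: u64,
) -> Result<(), watch::error::RecvError> {
    loop {
        if *rx.borrow_and_update() >= target {
            return Ok(());
        }
        rx.changed().await?;
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = watch::channel(1u64);

    // Model the buggy path: the producer considers generation 2 applied but
    // skips `tx.send(2)` because the set of disks didn't change.
    let _internal_generation_only: u64 = 2;

    // The waiter is now wedged; demonstrate with a timeout.
    let result = tokio::time::timeout(
        Duration::from_millis(100),
        await_generation(rx, 2),
    )
    .await;
    assert!(result.is_err(), "waiter never sees generation 2");

    drop(tx);
}
```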

@jgallagher (Contributor, Author)

> I think this is actually not fine; the helios/deploy test failure looks like another instance of #7622 (albeit for a different reason)

FWIW: I think it's fine to review this PR as-is. I'm going to work on fixing #7622 separately, and once that lands I'm expecting the deploy test will pass.

@andrewjstone (Contributor) left a comment:

Looks great. Love the simplification.

@@ -371,17 +388,13 @@ impl BpTableData for BlueprintZonesConfig {

// Useful implementation for printing two column tables stored in a `BTreeMap`, where
// each key and value impl `Display`.
impl<S1, S2> BpTableData for (Generation, &BTreeMap<S1, S2>)

Contributor:

🎉

.sleds
.added
.values()
.fold(0, |acc, c| acc + c.zones_config.zones.len())

Contributor:

It's so nice to get rid of those intermediate types. This reads much better now.

}
}
}

#[derive(Debug)]
pub struct DiffPhysicalDisksDetails {
// Newly added sleds don't have "before" disks

Contributor:

These generations were the most annoying things to work with. Very happy to get rid of them and simplify all the callers above.

@@ -0,0 +1,2 @@
ALTER INDEX IF EXISTS omicron.public.bp_sled_metadata @ bp_sled_state_pkey

Contributor:

TIL

jgallagher added a commit that referenced this pull request Mar 14, 2025
…ources` (#7793)

This isn't a very big PR, but I'm not super familiar or comfortable with
the pieces surrounding these changes, so please review carefully.
Rationale for these changes follows.

---

In both #7622 and
#7788 (comment),
we see `omicron_physical_disks_ensure` get stuck (apparently
indefinitely) here:
https://github.com/oxidecomputer/omicron/blob/3802ae4736a26b25b66be028c0a2217f6428a722/sled-agent/src/sled_agent.rs#L927-L930

I believe this can happen when there's a generation number change to the
disks config but the corresponding execution of
`StorageResources::synchronize_disk_management()` does not set `updated`
to true, which means it [will not send the new disk config on the
`disks_updated`
channel](https://github.com/oxidecomputer/omicron/blob/3802ae4736a26b25b66be028c0a2217f6428a722/sled-storage/src/resources.rs#L403-L405).
This leaves `StorageMonitor::await_generation()` stuck [waiting for an
update that is never
coming](https://github.com/oxidecomputer/omicron/blob/8222b27e850d2c3adc496222a317fe5ff96d1289/sled-agent/src/storage_monitor.rs#L59-L64).

Both of the above issues are examples of such paths:

1. In #7622, if the new config removes a disk that is physically not
present, we [do not set `updated` to
true](https://github.com/oxidecomputer/omicron/blob/3802ae4736a26b25b66be028c0a2217f6428a722/sled-storage/src/resources.rs#L331-L340).
2. In
#7788 (comment),
if the generation is increased but there have been no changes to the
disk config, we never set `updated` to true.

I think there are two design choices in `StorageResources` that make
this difficult to reason about, and this PR attempts to address them
both.

1. `StorageResources` holds both [`disks: AllDisks` and `disk_updates:
watch::Sender<AllDisks>`](https://github.com/oxidecomputer/omicron/blob/3802ae4736a26b25b66be028c0a2217f6428a722/sled-storage/src/resources.rs#L221-L231),
allowing the two to diverge at times and forcing us to choose when to sync
them back up. This PR removes `disk_updates` and changes `disks` to be the
`watch::Sender<AllDisks>`, removing the possibility of having two
different `AllDisks` that have diverged in contents. (We still have to
make local clones of the `AllDisks` via `Arc::make_mut()`, so we
temporarily have multiple copies that intentionally diverge, but they're
local and dropped when the methods return.) I tried to make it so we
didn't need to keep track of `updated` manually. (A minimal sketch of this
pattern follows the list.)
2. `StorageResources::set_config()` [updates the generation held inside
`self.disks`](https://github.com/oxidecomputer/omicron/blob/3802ae4736a26b25b66be028c0a2217f6428a722/sled-storage/src/resources.rs#L274),
but neither sends that information along the `disk_updates` channel nor
informs `synchronize_disk_resources` that the generation has changed.
This seems particularly dicey: after `set_config` returns, we've
increased the generation inside `self.disks` but have not actually acted
on the new config from that generation, so inspecting `self.disks` at
that point would be very confusing. This PR changes `set_config` to
record the generation "on the side" (i.e., not updating the generation
inside `self.disks`), and defers updating the generation inside
`self.disks` to `synchronize_disk_resources`.
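
As mentioned in item 1, here is a minimal sketch of the "state lives in the `watch::Sender`" pattern, using a simplified stand-in for `AllDisks`; the real sled-storage types and method names differ:

```rust
use std::sync::Arc;
use tokio::sync::watch;

// Simplified stand-in for `AllDisks`.
#[derive(Clone, PartialEq, Eq, Debug)]
struct Disks {
    generation: u64,
    names: Vec<String>,
}

// The watch sender *is* the single source of truth: there is no second copy
// to drift out of sync and no manual `updated` flag to forget to set.
struct Resources {
    disks: watch::Sender<Arc<Disks>>,
}

impl Resources {
    fn new() -> Self {
        let (tx, _rx) = watch::channel(Arc::new(Disks {
            generation: 0,
            names: Vec::new(),
        }));
        Self { disks: tx }
    }

    fn subscribe(&self) -> watch::Receiver<Arc<Disks>> {
        self.disks.subscribe()
    }

    fn apply_config(&self, generation: u64, names: Vec<String>) {
        // `send_if_modified` mutates the stored value in place and notifies
        // receivers iff the closure reports a change. Because a generation
        // bump alone counts as a change, it always reaches waiters, even
        // when the set of disks is unchanged.
        self.disks.send_if_modified(|disks| {
            let new = Disks { generation, names };
            if **disks == new {
                return false;
            }
            *Arc::make_mut(disks) = new;
            true
        });
    }
}

fn main() {
    let resources = Resources::new();
    let rx = resources.subscribe();
    resources.apply_config(2, vec!["u.2-0".to_string()]);
    assert_eq!(rx.borrow().generation, 2);
}
```

Any receiver obtained via `subscribe()` then observes every change that `apply_config` actually makes, including generation-only bumps, which is exactly the case the stuck `await_generation` path needed.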

I tested this on a4x2 by manually performing a "spurious disk config
generation bump", and confirmed sled-agent did not get wedged as it does
on `main`.

---

Fixes #7622.
@jgallagher (Contributor, Author)

> I think this is actually not fine; the helios/deploy test failure looks like another instance of #7622 (albeit for a different reason)
>
> FWIW: I think it's fine to review this PR as-is. I'm going to work on fixing #7622 separately, and once that lands I'm expecting the deploy test will pass.

This was fixed by #7793; after merging in main with that change landed, the tests on this PR pass with no further changes.

@jgallagher jgallagher merged commit 0c690f8 into main Mar 17, 2025
18 checks passed
@jgallagher jgallagher deleted the john/bp-sled-config-one-gen branch March 17, 2025 14:29