Description
I saw an internal server error installing the control plane this evening. I built and installed Omicron using omicron-package
, building from a local commit merging #1298 and the current main, which is at 154a4a6. From a fresh reboot I:
pfexec ./tools/create_virtual_hardware.sh
cargo build --bin omicron-package --release
./target/release/omicron-package package
pfexec ./target/release/omicron-package install
Looking at the sled agent logs, I saw:
[2022-06-29T03:33:29.08339844Z] INFO: SledAgent/StorageManager/6346 on feldspar: CRDB is online
[2022-06-29T03:33:29.084297094Z] INFO: SledAgent/StorageManager/6346 on feldspar: Formatting CRDB
[2022-06-29T03:33:35.983551762Z] INFO: SledAgent/StorageManager/6346 on feldspar: halt_and_remove_logged: Previous zone state: Running
zone: oxz_cockroachdb_oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b
[2022-06-29T03:33:35.986214134Z] INFO: SledAgent/StorageManager/6346 on feldspar: Stopped and uninstalled zone
zone: oxz_cockroachdb_oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b
[2022-06-29T03:33:35.993636362Z] WARN: SledAgent/StorageManager/6346 on feldspar: failed to notify nexus about datasets, will retry in 327.794992ms
[2022-06-29T03:33:35.994011147Z] WARN: SledAgent/StorageManager/6346 on feldspar: failed to notify nexus, will retry in 1.07985318s
[2022-06-29T03:33:35.994340186Z] WARN: SledAgent/StorageManager/6346 on feldspar: failed to notify nexus about datasets, will retry in 471.061865ms
[2022-06-29T03:33:35.994666535Z] WARN: SledAgent/StorageManager/6346 on feldspar: failed to notify nexus about datasets, will retry in 488.023497ms
[2022-06-29T03:33:35.995017057Z] WARN: SledAgent/StorageManager/6346 on feldspar: failed to notify nexus, will retry in 925.598032ms
[2022-06-29T03:33:35.995311707Z] INFO: SledAgent/dropshot (SledAgent)/6346 on feldspar: request completed (req_id=ed6c07d9-c91a-44ec-aa51-d7948d830391, uri=/filesystem, method=PUT, remote_addr=[fd00:1122:3344:101::1]:37061, local_addr=[fd00:1122:3344:101::1]:12345, error_message_external="Internal Server Error", response_code=500)
error_message_internal: Error managing storage: Error running command in zone 'oxz_cockroachdb_oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b': Command [/usr/sbin/zlogin oxz_cockroachdb_oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b /opt/oxide/cockroachdb/bin/cockroach sql --insecure --host [fd00:1122:3344:101::2]:32221 --file /opt/oxide/cockroachdb/sql/dbinit.sql] executed and failed with status: exit status: 1. Stdout: CREATE DATABASE
CREATE ROLE
GRANT
CREATE TABLE
CREATE TABLE
CREATE INDEX
CREATE TYPE
CREATE TABLE
CREATE INDEX
CREATE TABLE
CREATE TYPE
CREATE TABLE
CREATE INDEX
CREATE TABLE
CREATE INDEX
CREATE TABLE
CREATE TYPE
CREATE TABLE
CREATE INDEX
CREATE TABLE
CREATE INDEX
CREATE TYPE
CREATE TABLE
CREATE INDEX
CREATE INDEX
CREATE TABLE
, Stderr: NOTICE: GRANT INSERT, SELECT, UPDATE, DELETE ON DATABASE is deprecated.
This statement was automatically converted to USE omicron; ALTER DEFAULT PRIVILEGES GRANT INSERT, SELECT, UPDATE, DELETE ON TABLES TO omicron;
Please use ALTER DEFAULT PRIVILEGES going forward
ERROR: driver: bad connection
ERROR: driver: bad connection
Failed running "sql"
[2022-06-29T03:33:35.996129256Z] WARN: SledAgent/RSS/6346 on feldspar: failed to create filesystem (error="Error Response")
[2022-06-29T03:33:36.64487338Z] INFO: SledAgent/RSS/6346 on feldspar: creating new filesystem: DatasetEnsureBody { id: 4d08fc19-3d5f-4f6b-9c48-925f8eac7255, zpool_id: d462a7f7-b628-40fe-80ff-4e4189e2d62b, dataset_kind: CockroachDb { all_addresses: [[fd00:1122:3344:101::2]:32221] }, address: [fd00:1122:3344:101::2]:32221 }
[2022-06-29T03:33:36.647081411Z] INFO: SledAgent/StorageManager/6346 on feldspar: add_dataset: NewFilesystemRequest { zpool_id: d462a7f7-b628-40fe-80ff-4e4189e2d62b, dataset_kind: CockroachDb { all_addresses: [[fd00:1122:3344:101::2]:32221] }, address: [fd00:1122:3344:101::2]:32221, responder: Sender { inner: Some(Inner { state: State { is_complete: false, is_closed: false, is_rx_task_set: true, is_tx_task_set: false } }) } }
[2022-06-29T03:33:36.647918134Z] INFO: SledAgent/StorageManager/6346 on feldspar: Ensuring dataset oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b/cockroachdb exists
[2022-06-29T03:33:36.66757923Z] INFO: SledAgent/StorageManager/6346 on feldspar: Ensuring zone for oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b/cockroachdb is running
[2022-06-29T03:33:36.679000179Z] INFO: SledAgent/StorageManager/6346 on feldspar: Zone for oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b/cockroachdb was not found
[2022-06-29T03:33:36.695774353Z] INFO: SledAgent/StorageManager/6346 on feldspar: Configuring new Omicron zone: oxz_cockroachdb_oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b
[2022-06-29T03:33:36.715740835Z] INFO: SledAgent/StorageManager/6346 on feldspar: Installing Omicron zone: oxz_cockroachdb_oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b
[2022-06-29T03:33:39.927704567Z] INFO: SledAgent/StorageManager/6346 on feldspar: Zone booting
zone: oxz_cockroachdb_oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b
[2022-06-29T03:33:41.969687215Z] INFO: SledAgent/StorageManager/6346 on feldspar: Adding address: Static(V6(Ipv6Network { addr: fd00:1122:3344:101::2, prefix: 64 }))
zone: oxz_cockroachdb_oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b
[2022-06-29T03:33:42.536102317Z] INFO: SledAgent/StorageManager/6346 on feldspar: start_zone: Loading CRDB manifest
[2022-06-29T03:33:42.905934849Z] INFO: SledAgent/StorageManager/6346 on feldspar: start_zone: setting CRDB's config/listen_addr: [fd00:1122:3344:101::2]:32221
[2022-06-29T03:33:42.94067554Z] INFO: SledAgent/StorageManager/6346 on feldspar: start_zone: setting CRDB's config/store
[2022-06-29T03:33:42.971693725Z] INFO: SledAgent/StorageManager/6346 on feldspar: start_zone: setting CRDB's config/join_addrs
[2022-06-29T03:33:42.998374003Z] INFO: SledAgent/StorageManager/6346 on feldspar: start_zone: refreshing manifest
[2022-06-29T03:33:43.023893849Z] INFO: SledAgent/StorageManager/6346 on feldspar: start_zone: enabling CRDB service
[2022-06-29T03:33:43.047899489Z] INFO: SledAgent/StorageManager/6346 on feldspar: start_zone: awaiting liveness of CRDB
[2022-06-29T03:33:43.05261359Z] WARN: SledAgent/6346 on feldspar: failed to contact nexus, will retry in 1.316873734s
[2022-06-29T03:33:43.052927673Z] WARN: SledAgent/StorageManager/6346 on feldspar: cockroachdb not yet alive
[2022-06-29T03:33:43.251247719Z] WARN: SledAgent/StorageManager/6346 on feldspar: cockroachdb not yet alive
[2022-06-29T03:33:43.520264543Z] WARN: SledAgent/StorageManager/6346 on feldspar: cockroachdb not yet alive
[2022-06-29T03:33:44.370186624Z] INFO: SledAgent/6346 on feldspar: contacting server nexus, registering sled: fb0f7546-4d46-40ca-9d56-cbb810684ca7
[2022-06-29T03:33:44.472806569Z] WARN: SledAgent/StorageManager/6346 on feldspar: cockroachdb not yet alive
[2022-06-29T03:33:45.822259473Z] INFO: SledAgent/StorageManager/6346 on feldspar: CRDB is online
[2022-06-29T03:33:45.823833209Z] INFO: SledAgent/StorageManager/6346 on feldspar: Formatting CRDB
[2022-06-29T03:33:58.271956658Z] INFO: SledAgent/StorageManager/6346 on feldspar: Formatting CRDB - Completed
The relevant bit there is the long list of database permissions. The sled agent starts up CRDB, then runs dbinit.sql
to initialize the database and schema. That includes a statement granting permissions to operate on the tables in the database. That syntax appears outdated as of cockroachdb/cockroach#73065. The sled agent apparently gets a non-zero exit code from the subprocess running that SQL file, since it tears down the zone and tries again. The second attempt succeeds. I'm not sure why there would be a difference.
We should probably adopt the recommended action which is to use ALTER DEFAULT PRIVILEGES
instead.