Test flake: silent failure to initialize cockroachdb? #3889

Open
@jgallagher

https://github.com/oxidecomputer/omicron/pull/3887/checks?check_run_id=15946171931 failed; the immediate symptom in the deploy job was a timeout after trying to log in for 10 minutes:

673	2023-08-16T15:51:53.331Z	2023-08-16 15:51:52.586682700 UTC: login failed: logging in: error sending request for url (https://recovery.sys.oxide.test/v1/login/recovery/local): error trying to connect: operation timed out
674	2023-08-16T15:51:54.332Z	2023-08-16 15:51:53.587485600 UTC: attempting to log into API
675	2023-08-16T15:52:09.366Z	2023-08-16 15:52:08.621092329 UTC: login failed: logging in: error sending request for url (https://recovery.sys.oxide.test/v1/login/recovery/local): error trying to connect: operation timed out
676	2023-08-16T15:52:10.368Z	2023-08-16 15:52:09.622786274 UTC: attempting to log into API
677	2023-08-16T15:52:25.402Z	2023-08-16 15:52:24.656701751 UTC: login failed: logging in: error sending request for url (https://recovery.sys.oxide.test/v1/login/recovery/local): error trying to connect: operation timed out
678	2023-08-16T15:52:26.403Z	2023-08-16 15:52:25.657578343 UTC: attempting to log into API
679	2023-08-16T15:52:41.436Z	2023-08-16 15:52:40.691037367 UTC: login failed: logging in: error sending request for url (https://recovery.sys.oxide.test/v1/login/recovery/local): error trying to connect: operation timed out
680	2023-08-16T15:52:42.439Z	Error: logging in
681	2023-08-16T15:52:42.464Z	
682	2023-08-16T15:52:42.489Z	Caused by:
683	2023-08-16T15:52:43.040Z	    timed out after 609.325828065s
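
For readers unfamiliar with this kind of failure, the log above is consistent with a retry loop of roughly the following shape. This is only a minimal sketch under assumptions, not the deploy job's actual code: `retry_until_deadline`, its parameters, and the choice to check the deadline only after a failed attempt are all hypothetical. One consequence of checking the deadline between attempts is that the reported elapsed time can overshoot the nominal limit, which would be one possible explanation for "609.325828065s" against a 10 minute budget.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not from omicron): call `attempt` until it succeeds
/// or until at least `total` time has elapsed since the first call.
/// On timeout, returns the last error together with the elapsed time.
fn retry_until_deadline<T, E>(
    total: Duration,
    pause: Duration,
    mut attempt: impl FnMut() -> Result<T, E>,
) -> Result<T, (E, Duration)> {
    let start = Instant::now();
    loop {
        match attempt() {
            Ok(value) => return Ok(value),
            Err(err) => {
                // The deadline is only checked after an attempt fails, so a
                // slow final attempt pushes the total past `total`.
                let elapsed = start.elapsed();
                if elapsed >= total {
                    return Err((err, elapsed));
                }
                // Short pause before the next attempt.
                std::thread::sleep(pause);
            }
        }
    }
}
```

Under these assumptions, a caller would pass something like a 600 second budget and a 1 second pause, with `attempt` wrapping the login request. If every attempt fails with a connect timeout, the only error surfaced is the final timeout, which says nothing about why the API was unreachable.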

All three nexus logs have many warnings that seem to indicate the database wasn't set up correctly, I think?

WARN	nexus: Cannot look up rack: Object (of type ById(24ed7902-3649-4e0e-8635-cd082ae9b0c0)) not found: rack
    file = nexus/src/app/rack.rs:555

but the sled-agent logs do not report any issues with initial CRDB setup:

328	2023-08-16T15:43:19.640Z	INFO	SledAgent (ServiceManager): Formatting CRDB
    file = sled-agent/src/services.rs:2248
...
330	2023-08-16T15:43:22.432Z	INFO	SledAgent (ServiceManager): Formatting CRDB - Completed
    file = sled-agent/src/services.rs:2269

Labels: Test Flake, database
