Uncaught exception on initial deployment on vsphere with attached volumes #602

Open
@paulomach

Description

Reported by James Beedy/Omnivector
Occasionally fails to start up.

Steps to reproduce

Not entirely clear, but it requires the storage to not be ready, i.e.:

$ juju status
Model        Controller                 Cloud/Region                  Version  SLA          Timestamp  Notes
slurm-admin  admera-vsphere-controller  admera-vsphere/Admera Health  3.4.4    unsupported  16:18:46Z  suspended since cloud credential is not valid

App            Version          Status   Scale  Charm                     Channel      Rev  Exposed  Message
admera-hpc                      active       1  admera-hpc                               4  no       
apptainer      1.3.2-1~ubun...  active       1  apptainer                 latest/edge    1  no       
mysql          8.0.39-0ubun...  active       1  mysql                     8.0/stable   313  no       
mysqlv0        8.0.39-0ubun...  active       1  mysql                     8.0/stable   313  no       
node-exporter                   active       2  prometheus-node-exporter                 1  no       node-exporter started
slurmctldv0    23.02.7-0ubu...  active       1  slurmctld                                1  no       
slurmctldv2                     unknown      0  slurmctld                 latest/edge   85  no       
slurmdbdv0     23.02.7-0ubu...  active       1  slurmdbd                                 1  no       

Unit                Workload  Agent  Machine  Public address  Ports           Message
mysql/1*            blocked   idle   11       192.168.7.133   3306,33060/tcp  Failed to initialize MySQL users
  node-exporter/0*  active    idle            192.168.7.133                   node-exporter started
mysqlv0/1*          blocked   idle   13       192.168.7.56    3306,33060/tcp  Failed to initialize MySQL users
  node-exporter/1   active    idle            192.168.7.56                    node-exporter started
slurmctldv0/0*      active    idle   6        192.168.7.132                   
  admera-hpc/1*     active    idle            192.168.7.132                   
  apptainer/1*      active    idle            192.168.7.132                   
slurmdbdv0/0*       active    idle   4        192.168.7.140                   

Machine  State    Address        Inst id         Base          AZ  Message
4        started  192.168.7.140  juju-90f657-4   [email protected]      poweredOn
6        started  192.168.7.132  juju-90f657-6   [email protected]      poweredOn
11       started  192.168.7.133  juju-90f657-11  [email protected]      poweredOn
12       started  192.168.7.54   juju-90f657-12  [email protected]      poweredOn
13       started  192.168.7.56   juju-90f657-13  [email protected]      poweredOn

Offer        Application  Charm      Rev  Connected  Endpoint  Interface  Role
slurmctldv0  slurmctldv0  slurmctld  1    2/2        slurmd    slurmd     requirer
ubuntu@deployer:~
$ juju storage
ERROR getting details for storage database/1: filesystem for storage instance "database/1" not found

Expected behavior

Bootstrap the database and users, then start operation

Actual behavior

Uncaught exception on workload initialization

Versions

Juju CLI: 3.4.4, 3.6.2

Juju agent: 3.4.4, 3.6.2

Charm revision: 313

Log output

Logs are at: https://paste.ubuntu.com/p/K6YGjDwhNJ/

Juju storage failures

ubuntu@deployer:~
$ juju list-storage
ERROR getting details for storage database/1: filesystem for storage instance "database/1" not found
ubuntu@deployer:~
$ juju remove-storage database/1
removing database/1
ubuntu@deployer:~
$ juju list-storage
ERROR getting details for storage database/2: filesystem for storage instance "database/2" not found
ubuntu@deployer:~
$ juju remove-storage database/2
removing database/2
ubuntu@deployer:~
$ juju list-storage
Unit       Storage ID  Type        Pool    Size    Status    Message
mysql/1    database/3  filesystem  rootfs  48 GiB  attached  
mysqlv0/1  database/4  filesystem  rootfs  48 GiB  attached  
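
The manual sequence above (list, remove the storage instance named in the error, list again) can be sketched as a small shell loop. This is only an illustration under stated assumptions: it assumes the error text has exactly the format shown in the transcript, and that `juju list-storage` exits non-zero when it prints that error. The function names `stale_storage_id` and `cleanup_stale_storage` and the `JUJU` variable are hypothetical helpers, not part of Juju or the charm. Note that `juju remove-storage` is destructive, so review the reported ID before running anything like this against a real model.

```shell
#!/bin/sh
# Sketch only: remove storage instances that `juju list-storage` reports as
# missing their filesystem. Assumes the error format shown in the log above.

JUJU="${JUJU:-juju}"  # hypothetical override, so the loop can be tried against a stub

# Read text on stdin and print the storage ID named in a
# "filesystem for storage instance ... not found" error line, if any.
stale_storage_id() {
  sed -n 's/^ERROR getting details for storage \([^:]*\): filesystem for storage instance .* not found$/\1/p'
}

# Repeat list-storage / remove-storage until the listing succeeds,
# an unrecognized error appears, or the attempt limit is reached.
cleanup_stale_storage() {
  max_attempts="${1:-10}"
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    # Assumption: list-storage returns a non-zero status on the error above.
    if output="$("$JUJU" list-storage 2>&1)"; then
      printf '%s\n' "$output"
      return 0
    fi
    id="$(printf '%s\n' "$output" | stale_storage_id | head -n 1)"
    if [ -z "$id" ]; then
      # Not the error this sketch knows about: surface it and stop.
      printf '%s\n' "$output" >&2
      return 1
    fi
    "$JUJU" remove-storage "$id"
    attempt=$((attempt + 1))
  done
  return 1
}
```

Sourcing the file and calling `cleanup_stale_storage 5` would try at most five removals before giving up. The limit is there so that an error which keeps reappearing does not loop forever.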

Metadata

Labels

bug (Something isn't working as expected)
