Description
Reported by James Beedy/Omnivector
Occasionally fails to start up.
Steps to reproduce
Not entirely clear, but it requires the storage not to be ready, for example:
$ juju status
Model        Controller                 Cloud/Region                  Version  SLA          Timestamp  Notes
slurm-admin  admera-vsphere-controller  admera-vsphere/Admera Health  3.4.4    unsupported  16:18:46Z  suspended since cloud credential is not valid

App            Version          Status   Scale  Charm                     Channel      Rev  Exposed  Message
admera-hpc                      active   1      admera-hpc                             4    no
apptainer      1.3.2-1~ubun...  active   1      apptainer                 latest/edge  1    no
mysql          8.0.39-0ubun...  active   1      mysql                     8.0/stable   313  no
mysqlv0        8.0.39-0ubun...  active   1      mysql                     8.0/stable   313  no
node-exporter                   active   2      prometheus-node-exporter               1    no       node-exporter started
slurmctldv0    23.02.7-0ubu...  active   1      slurmctld                              1    no
slurmctldv2                     unknown  0      slurmctld                 latest/edge  85   no
slurmdbdv0     23.02.7-0ubu...  active   1      slurmdbd                               1    no

Unit                Workload  Agent  Machine  Public address  Ports           Message
mysql/1*            blocked   idle   11       192.168.7.133   3306,33060/tcp  Failed to initialize MySQL users
  node-exporter/0*  active    idle            192.168.7.133                   node-exporter started
mysqlv0/1*          blocked   idle   13       192.168.7.56    3306,33060/tcp  Failed to initialize MySQL users
  node-exporter/1   active    idle            192.168.7.56                    node-exporter started
slurmctldv0/0*      active    idle   6        192.168.7.132
  admera-hpc/1*     active    idle            192.168.7.132
  apptainer/1*      active    idle            192.168.7.132
slurmdbdv0/0*       active    idle   4        192.168.7.140

Machine  State    Address        Inst id         Base               AZ  Message
4        started  192.168.7.140  juju-90f657-4   [email protected]      poweredOn
6        started  192.168.7.132  juju-90f657-6   [email protected]      poweredOn
11       started  192.168.7.133  juju-90f657-11  [email protected]      poweredOn
12       started  192.168.7.54   juju-90f657-12  [email protected]      poweredOn
13       started  192.168.7.56   juju-90f657-13  [email protected]      poweredOn

Offer        Application  Charm      Rev  Connected  Endpoint  Interface  Role
slurmctldv0  slurmctldv0  slurmctld  1    2/2        slurmd    slurmd     requirer

ubuntu@deployer:~
$ juju storage
ERROR getting details for storage database/1: filesystem for storage instance "database/1" not found
Expected behavior
Bootstrap the database and users, then start operation.
Actual behavior
Uncaught exception on workload initialization
Versions
Juju CLI: 3.4.4, 3.6.2
Juju agent: 3.4.4, 3.6.2
Charm revision: 313
Log output
Logs are at: https://paste.ubuntu.com/p/K6YGjDwhNJ/
Juju storage failures
ubuntu@deployer:~
$ juju list-storage
ERROR getting details for storage database/1: filesystem for storage instance "database/1" not found
ubuntu@deployer:~
$ juju remove-storage database/1
removing database/1
ubuntu@deployer:~
$ juju list-storage
ERROR getting details for storage database/2: filesystem for storage instance "database/2" not found
ubuntu@deployer:~
$ juju remove-storage database/2
removing database/2
ubuntu@deployer:~
$ juju list-storage
Unit       Storage ID  Type        Pool    Size    Status    Message
mysql/1    database/3  filesystem  rootfs  48 GiB  attached
mysqlv0/1  database/4  filesystem  rootfs  48 GiB  attached
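
The manual cycle above (list storage, read the failing storage ID from the error, remove it, list again) could be scripted roughly as follows. This is only a sketch, not a tested fix: it assumes the error line keeps exactly the format shown above and that `juju list-storage` exits non-zero when it prints that error. The function names `stale_storage_id` and `remove_stale_storage` are hypothetical helpers, not part of Juju. Note that `juju remove-storage` is destructive, so this should only be pointed at storage instances that are already known to be broken.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read Juju output on stdin and print the storage ID
# from any line of the form
#   ERROR getting details for storage database/1: filesystem for storage instance "database/1" not found
# Prints nothing when no line matches.
stale_storage_id() {
    sed -n 's/^ERROR getting details for storage \([^:]*\): filesystem for storage instance .* not found$/\1/p'
}

# Hypothetical helper: repeat the list/remove cycle until `juju list-storage`
# succeeds, an unrecognised error appears, or max_attempts removals were tried.
# Assumption: `juju list-storage` returns a non-zero exit code on the error above.
remove_stale_storage() {
    local max_attempts="${1:-10}" attempt=0 output id
    while [ "$attempt" -lt "$max_attempts" ]; do
        if output="$(juju list-storage 2>&1)"; then
            # Listing works again: show it and stop.
            printf '%s\n' "$output"
            return 0
        fi
        id="$(printf '%s\n' "$output" | stale_storage_id | head -n 1)"
        if [ -z "$id" ]; then
            # Some other failure: surface it instead of guessing.
            printf '%s\n' "$output" >&2
            return 1
        fi
        juju remove-storage "$id" || return 1
        attempt=$((attempt + 1))
    done
    return 1
}
```

After sourcing the snippet, `remove_stale_storage 5` would try at most five removals before giving up. The bound is there because storage removal may not complete immediately, in which case the same ID could be reported again on the next listing.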