Provide troubleshooting guidance when "TASK [core/cluster : cluster | Create new cluster]" hangs #581

Open
@troppens

Describe the bug
I provisioned three VMs on virtual infrastructure and tried to create a three-node Spectrum Scale cluster.

The following step hung for about an hour:

TASK [core/cluster : cluster | Create new cluster] *********************************************

I added a debug task to core/cluster.yml, directly before the task that creates the cluster:

    - debug:
        msg: "/usr/lpp/mmfs/bin/mmcrcluster -N /var/mmfs/tmp/NodeFile -C {{ scale_cluster_clustername }} {{ profile_type }} {{ extra_option }}"

    - name: cluster | Create new cluster
      command: /usr/lpp/mmfs/bin/mmcrcluster -N /var/mmfs/tmp/NodeFile -C {{ scale_cluster_clustername }} {{ profile_type }} {{ extra_option }}
      notify: accept-licenses
      register: mmcrcluster_results

On the next run of the playbook, the debug output showed the exact command being executed:

TASK [core/cluster : debug] ******************************************************************************************************************************************************************
ok: [sc1-n1 -> sc1-n1] => {
    "msg": "/usr/lpp/mmfs/bin/mmcrcluster -N /var/mmfs/tmp/NodeFile -C gpfs1.local  "
}

So I tried this command without Ansible:

[root@sc1-n1 ~]# /usr/lpp/mmfs/bin/mmcrcluster -N /var/mmfs/tmp/NodeFile -C gpfs1.local
mmcrcluster: Performing preliminary node verification ...
mmcrcluster: Processing quorum and other critical nodes ...
The authenticity of host 'sc1-n3.fyre.ibm.com (10.11.22.161)' can't be established.
ECDSA key fingerprint is SHA256:2J35XBfRLzv5RUqHYH9rGCGA+jS1KR/Lw1f1+n0JbSU.
The authenticity of host 'sc1-n1.fyre.ibm.com (10.11.17.240)' can't be established.
ECDSA key fingerprint is SHA256:2J35XBfRLzv5RUqHYH9rGCGA+jS1KR/Lw1f1+n0JbSU.
The authenticity of host 'sc1-n2.fyre.ibm.com (10.11.22.160)' can't be established.
ECDSA key fingerprint is SHA256:2J35XBfRLzv5RUqHYH9rGCGA+jS1KR/Lw1f1+n0JbSU.
Are you sure you want to continue connecting (yes/no/[fingerprint])?

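Ah. SSH is not set up properly: the nodes' host keys are not yet trusted, so mmcrcluster waits on an interactive SSH prompt that never gets answered when the command runs under Ansible. For a new user this is not easy to diagnose, although the SSH requirement is mentioned in the README.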

To improve usability, it would be good to have an additional check in the role for ssh connectivity to make the troubleshooting easier for new users.
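
As a rough illustration of what such a check could look like, here is a minimal sketch and not the role's actual implementation. It assumes the tasks would run before "cluster | Create new cluster", and it introduces a hypothetical variable `ssh_check_targets` that defaults to the hosts in the play. Because known_hosts entries are keyed by the name used to connect, the list should contain the same node names as the NodeFile (the FQDNs in the output above). The task names are made up for this example.

```yaml
# Sketch only: hypothetical pre-check, not part of the role as shipped.
# ssh_check_targets is an assumed variable; it falls back to the play's hosts.
- name: cluster | Check non-interactive SSH to all nodes (sketch)
  command: >
    ssh -o BatchMode=yes -o StrictHostKeyChecking=yes -o ConnectTimeout=10
    {{ item }} true
  loop: "{{ ssh_check_targets | default(ansible_play_hosts) }}"
  register: ssh_check
  changed_when: false   # read-only probe, never reports a change
  failed_when: false    # collect all results first, report them below

- name: cluster | Fail early if SSH would prompt or is refused (sketch)
  fail:
    msg: >-
      Non-interactive SSH from {{ inventory_hostname }} to {{ item.item }}
      failed (rc={{ item.rc }}): {{ item.stderr }}
  loop: "{{ ssh_check.results }}"
  when: item.rc != 0
```

With `BatchMode=yes`, ssh exits with an error instead of asking for a password or host key confirmation, so an untrusted host key should surface as a failed task with a readable message rather than as a hang.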

I also considered adding a Troubleshooting section to the README, but a check in the role would be preferable.

To Reproduce
Steps to reproduce the behavior:

  1. Provision three VMs
  2. Follow instructions in README and run ansible-playbook -i hosts playbook.yml

Expected behavior
Described above.

Environment
Please run the following and paste your output here:

  • Spectrum Scale 5.1.2.1
  • Ansible from EPEL
  • Current version of this project

Labels

Component: Core, Type: Enhancement