Repository for the Scale Solution Tile implementation files.
Follow the steps below to provision an IBM Spectrum Scale cluster using IBM Cloud CLI.
$ cp sample/configs/hpc_workspace_config.json config.json
$ ibmcloud iam api-key-create my-api-key --file ~/.ibm-api-key.json -d "my api key"
$ cat ~/.ibm-api-key.json | jq -r ."apikey"
# copy your apikey
$ vim config.json
# paste your apikey and set all the required input parameters to create the Spectrum Scale cluster (an illustrative sketch of the config structure follows)
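The exact fields come from sample/configs/hpc_workspace_config.json; the fragment below is only a generic illustration of the IBM Cloud Schematics workspace payload shape (workspace metadata plus a variablestore holding the input parameters documented later in this README), not the literal contents of the sample file:
{
  "name": "hpcc-scale-test",
  "type": ["terraform_v1.0"],
  "location": "us-east",
  "resource_group": "Default",
  "template_repo": { "url": "https://github.com/<your_org>/<your_repo>" },
  "template_data": [{
    "folder": ".",
    "type": "terraform_v1.0",
    "variablestore": [
      { "name": "vpc_region", "value": "us-east" },
      { "name": "bastion_key_pair", "value": "scale-ssh-key" }
    ]
  }]
}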
You also need to generate a GitHub token if you use a private GitHub repository.
$ ibmcloud schematics workspace new -f config.json --github-token xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
$ ibmcloud schematics workspace list | grep <workspace name provided in config.json>
Name              ID                                           Description   Version             Status     Frozen
hpcc-scale-test   us-east.workspace.hpcc-scale-test.3172ff2f                 Terraform v1.0.11   INACTIVE   False
OK
$ ibmcloud schematics plan --id us-east.workspace.hpcc-scale-test.3172ff2f
Activity ID 4a606ec27712d879159464a0f8d33f1e
OK
$ ibmcloud schematics apply --id us-east.workspace.hpcc-scale-test.3172ff2f
Do you really want to perform this action? [y/N]> y
Activity ID 0f69588331523aab748361fbb854e6d0
OK
$ ibmcloud schematics logs --id us-east.workspace.hpcc-scale-test.3172ff2f
2022/04/20 05:11:57 Terraform apply | Apply complete! Resources: 40 added, 0 changed, 0 destroyed.
2022/04/20 05:11:57 Terraform apply |
2022/04/20 05:11:57 Terraform apply | Outputs:
2022/04/20 05:11:57 Terraform apply |
2022/04/20 05:11:57 Terraform apply | shematics_controller_ip = [
2022/04/20 05:11:57 Terraform apply | "169.63.173.216",
2022/04/20 05:11:57 Terraform apply | ]
2022/04/20 05:11:57 Terraform apply | ssh_command = "ssh -J ubuntu@<bastion_public_ip> vpcuser@<bootstrap_node_ip>"
2022/04/20 05:11:57 Command finished successfully.
2022/04/20 05:12:04 Done with the workspace action
OK
$ ssh -J ubuntu@<bastion node IP> vpcuser@<IP of bootstrap/storage/compute node>
$ ibmcloud schematics destroy --id us-east.workspace.hpcc-scale-test.3172ff2f
Do you really want to perform this action? [y/N]> y
Activity ID facd6ab01ae28d368b38198598d5e37c
OK
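Optionally, once the destroy job has finished and the workspace is no longer needed, it can be removed as well (a sketch; substitute your own workspace ID):
$ ibmcloud schematics workspace delete --id us-east.workspace.hpcc-scale-test.3172ff2f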
- Go to https://cloud.ibm.com/schematics/workspaces and click "Create workspace".
- On the Schematics workspace creation page, specify the GitHub repository URL and provide the GitHub token if the repository is private. Select Terraform version 1.0 and click Next.
- Update the workspace details with the workspace name and resource group, choose the region in which the workspace needs to be created, and click Save.
- Go to the Schematics workspace Settings page and, under the Variables section, use the menu ("burger") icon next to each variable to update the following parameters:
- Provide the vpc_region and vpc_availability_zones values for the region and zone where the Scale cluster resources need to be provisioned.
- Update bastion_key_pair, compute_cluster_key_pair, and storage_cluster_key_pair with the name of an IBM Cloud SSH key (for example, "scale-ssh-key") created in the same region of your IBM Cloud account.
- If required, update resource_prefix to match your naming convention.
- Fetch the public IP address of your device and set it in remote_cidr_blocks (see the example commands after this list).
- Provide your IBM Customer Number (bring your own license) in ibm_customer_number for the entitlement check.
- Update the total_storage_cluster_instances and total_compute_cluster_instances counts as per your requirements.
- Update compute_cluster_gui_username, compute_cluster_gui_password, storage_cluster_gui_username, and storage_cluster_gui_password.
Note: If the IBM Customer Number is not provided, cluster creation will fail.
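For example, assuming the IBM Cloud VPC infrastructure CLI plugin is installed and the target region is already set, the public IP lookup and SSH key creation could be done from a terminal as follows (the key name "scale-ssh-key" and the key file path are only illustrative):
$ curl -s https://ipv4.icanhazip.com
$ ibmcloud is key-create scale-ssh-key @~/.ssh/id_rsa.pub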
- Click "Generate plan" and ensure there are no errors; fix any errors that are reported.
- After "Generate plan" completes without errors, click "Apply plan" to create the resources.
- Check the "Jobs" section on the left-hand side to view the resource creation progress.
- If the "Apply plan" activity is successful, open the log and copy the output SSH command to your local terminal to SSH to the bootstrap, storage, or compute nodes.
- If your device connects through a different network than usual (Wi-Fi/LAN/mobile hotspot), update the new public IP address in the bastion security group so that you can still SSH to the nodes (a CLI example follows).
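As a sketch, the new public IP can also be added to the bastion security group from the CLI instead of the console; the security group name below is a placeholder for the one created by this deployment:
$ ibmcloud is security-group-rule-add <bastion_security_group_name_or_id> inbound tcp --port-min 22 --port-max 22 --remote <your_new_public_ip>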
- The Spectrum Scale storage and compute nodes are each configured as a GPFS cluster. The storage cluster (owningCluster) owns and serves the file system to be mounted.
- The compute cluster (accessingCluster) is the cluster that accesses the owningCluster and remotely mounts its file system.
- The file system mount point on the owningCluster (storage GPFS cluster) is specified in the variable storage_cluster_filesystem_mountpoint. Default value = "/gpfs/fs1"
- The file system mount point on the accessingCluster (compute GPFS cluster) is specified in the variable compute_cluster_filesystem_mountpoint. Default value = "/gpfs/fs1"
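As a quick check of the remote mount configuration, the following GPFS command can typically be run as root on a compute (accessing) cluster node, after adding /usr/lpp/mmfs/bin to the PATH as shown below, to list the remote file system definition and its mount point:
# mmremotefs show all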
- Log in to a storage node using SSH (ssh -J ubuntu@<bastion_ip> root@<any_storage_node_ip>)
- The commands below switch to root and add the Spectrum Scale binaries to the PATH so that the validation commands in the rest of this section can be run
# sudo su
# export PATH=$PATH:/usr/lpp/mmfs/bin
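Optionally, to keep the Scale binaries on the PATH across SSH sessions, the export can be appended to the root user's shell profile, for example:
# echo 'export PATH=$PATH:/usr/lpp/mmfs/bin' >> ~/.bashrc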
- The command below shows the status of the cluster
# mmgetstate -a
 Node number  Node name                 GPFS state
------------------------------------------------------
       1      spectrum-scale-storage-2  active
       2      spectrum-scale-storage-3  active
       3      spectrum-scale-storage-1  active
- The command below shows complete information about the GPFS cluster: node names, IP addresses, admin node names, and designations
# mmlscluster
GPFS cluster information
========================
GPFS cluster name: spectrum-scale.storage
GPFS cluster id: 9876153676758860235
GPFS UID domain: spectrum-scale.storage
Remote shell command: /usr/bin/ssh
Remote file copy command: /usr/bin/scp
Repository type: CCR
Node Daemon node name IP address Admin node name Designation
---------------------------------------------------------------------------------------------------------------
1 spectrum-scale-storage-2.strgscale.com 10.241.1.7 spectrum-scale-storage-2.strgscale.com quorum-manager-perfmon
2 spectrum-scale-storage-3.strgscale.com 10.241.1.8 spectrum-scale-storage-3.strgscale.com quorum-manager-perfmon
3 spectrum-scale-storage-1.strgscale.com 10.241.1.9 spectrum-scale-storage-1.strgscale.com quorum-perfmon
- The command below shows the details about the file system
# mmlsfs all
File system attributes for /dev/fs1:
====================================
flag value description
------------------- ------------------------ -----------------------------------
-f 8192 Minimum fragment (subblock) size in bytes
-i 4096 Inode size in bytes
-I 32768 Indirect block size in bytes
-m 2 Default number of metadata replicas
-M 2 Maximum number of metadata replicas
-r 1 Default number of data replicas
-R 2 Maximum number of data replicas
-j cluster Block allocation type
-D nfs4 File locking semantics in effect
-k all ACL semantics in effect
-n 32 Estimated number of nodes that will mount file system
-B 4194304 Block size
-Q none Quotas accounting enabled
none Quotas enforced
none Default quotas enabled
--perfileset-quota no Per-fileset quota enforcement
--filesetdf no Fileset df enabled?
-V 27.00 (5.1.3.0) File system version
--create-time Thu Apr 21 11:43:47 2022 File system creation time
-z no Is DMAPI enabled?
-L 33554432 Logfile size
-E yes Exact mtime mount option
-S relatime Suppress atime mount option
-K whenpossible Strict replica allocation option
--fastea yes Fast external attributes enabled?
--encryption no Encryption enabled?
--inode-limit 3433472 Maximum number of inodes
--log-replicas 0 Number of log replicas
--is4KAligned yes is4KAligned?
--rapid-repair yes rapidRepair enabled?
--write-cache-threshold 0 HAWC Threshold (max 65536)
--subblocks-per-full-block 512 Number of subblocks per full block
-P system Disk storage pools in file system
--file-audit-log no File Audit Logging enabled?
--maintenance-mode no Maintenance Mode enabled?
--flush-on-close no flush cache on file close enabled?
-d nsd_10_241_1_7_vdb;nsd_10_241_1_7_vdc;nsd_10_241_1_8_vdb;nsd_10_241_1_8_vdc;nsd_10_241_1_9_vdb;nsd_10_241_1_9_vdc Disks in file system
-A yes Automatic mount option
-o none Additional mount options
-T /gpfs/fs1 Default mount point
--mount-priority 0 Mount priority
- The command below shows the file system mount status on all the compute and storage nodes
# mmlsmount all -L
File system fs1 is mounted on 6 nodes:
10.241.1.8 spectrum-scale-storage-3.strgscale spectrum-scale.storage
10.241.1.9 spectrum-scale-storage-1.strgscale spectrum-scale.storage
10.241.1.7 spectrum-scale-storage-2.strgscale spectrum-scale.storage
10.241.0.5 spectrum-scale-compute-3.compscale spectrum-scale.compute
10.241.0.7 spectrum-scale-compute-1.compscale spectrum-scale.compute
10.241.0.6 spectrum-scale-compute-2.compscale spectrum-scale.compute
- The command below shows information about the NSD servers and disk names
# mmlsnsd -a
File system Disk name NSD servers
------------------------------------------------------------------------------
fs1 nsd_10_241_1_7_vdb spectrum-scale-storage-2.strgscale.com
fs1 nsd_10_241_1_7_vdc spectrum-scale-storage-2.strgscale.com
fs1 nsd_10_241_1_8_vdb spectrum-scale-storage-3.strgscale.com
fs1 nsd_10_241_1_8_vdc spectrum-scale-storage-3.strgscale.com
fs1 nsd_10_241_1_9_vdb spectrum-scale-storage-1.strgscale.com
fs1 nsd_10_241_1_9_vdc spectrum-scale-storage-1.strgscale.com
- The command below shows the health status of the cluster
# mmhealth cluster show
Component Total Failed Degraded Healthy Other
-------------------------------------------------------------------------------------
NODE 3 0 0 2 1
GPFS 3 0 0 2 1
NETWORK 3 0 0 3 0
FILESYSTEM 1 0 0 1 0
DISK 6 0 0 6 0
FILESYSMGR 1 0 0 1 0
GUI 1 0 0 1 0
PERFMON 3 0 0 3 0
THRESHOLD 3 0 0 3 0
- The command below shows the status of the individual nodes
# mmhealth node show
Node name: spectrum-scale-storage-3.strgscale.com
Node status: HEALTHY
Status Change: 2 hours ago
Component Status Status Change Reasons & Notices
----------------------------------------------------------------
FILESYSMGR HEALTHY 2 hours ago -
GPFS HEALTHY 2 hours ago -
NETWORK HEALTHY 2 hours ago -
FILESYSTEM HEALTHY 2 hours ago -
DISK HEALTHY 2 hours ago -
PERFMON HEALTHY 2 hours ago -
THRESHOLD HEALTHY 2 hours ago -
- The command below shows how to check the status of a node from another node within the same cluster (storage-to-storage or compute-to-compute)
# mmhealth node show -N 10.241.1.9
Node name: spectrum-scale-storage-1.strgscale.com
Node status: HEALTHY
Status Change: 2 hours ago
Component Status Status Change Reasons & Notices
----------------------------------------------------------------
GPFS HEALTHY 2 hours ago -
NETWORK HEALTHY 2 hours ago -
FILESYSTEM HEALTHY 2 hours ago -
DISK HEALTHY 2 hours ago -
PERFMON HEALTHY 2 hours ago -
THRESHOLD HEALTHY 2 hours ago -
- The command below shows how to access another node from one node within the same cluster
# ssh root@<IP of another node in the same cluster>
###########################################################################################
# You have logged in to Instance storage virtual server. #
# - Instance storage is temporary storage that's available only while your virtual #
# server is running. #
# - Data on the drive is unrecoverable after instance shutdown, disruptive maintenance, #
# or hardware failure. #
# #
# Refer: https://cloud.ibm.com/docs/vpc?topic=vpc-instance-storage #
###########################################################################################
Activate the web console with: systemctl enable --now cockpit.socket
This system is not registered to Red Hat Insights. See https://cloud.redhat.com/
To register this system, run: insights-client --register
Last login: Thu Apr 21 11:49:39 2022 from 10.241.1.5
- The command below shows the authorization between the storage and compute clusters
# mmauth show
Cluster name: spectrum-scale.compute
Cipher list: AUTHONLY
SHA digest: 1645130b96b8518d98b420a80c078638384a3a58a08c100c63e0de9f34501641
File system access: fs1 (rw, root allowed)
Cluster name: spectrum-scale.storage (this cluster)
Cipher list: AUTHONLY
SHA digest: 173bb739f290dab735ce250c1d954d6da1d912aa8b650d5b3fca49ac2e9475fd
File system access: (all rw)
- The command below shows the complete configuration of the cluster
# mmlsconfig
Configuration data for cluster spectrum-scale.storage:
------------------------------------------------------
clusterName spectrum-scale.storage
clusterId 9876153676758860235
autoload yes
profile storagesncparams
dmapiFileHandleSize 32
minReleaseLevel 5.1.3.0
tscCmdAllowRemoteConnections no
ccrEnabled yes
cipherList AUTHONLY
sdrNotifyAuthEnabled yes
maxblocksize 16M
restripeOnDiskFailure yes
unmountOnDiskFail meta
readReplicaPolicy local
workerThreads 128
maxStatCache 0
maxFilesToCache 64k
ignorePrefetchLUNCount yes
prefetchAggressivenessWrite 0
prefetchAggressivenessRead 2
[storagenodegrp]
pagepool 1G
[common]
tscCmdPortRange 60000-61000
adminMode central
File systems in cluster spectrum-scale.storage:
-----------------------------------------------
/dev/fs1
Note: The commands above can be run from both compute and storage nodes. The output is the same, irrespective of the node they are run from.
- The command below shows the complete storage/compute cluster information
# sudo su
# mmcloudworkflows cluster info
Spectrum Scale Storage Cluster
|-------------------------------------------|---------------|--------|-----|
| Instance Id | Private IP | Quorum | GUI |
|-------------------------------------------|---------------|--------|-----|
| 0787_46b70435-ca51-42ed-9953-5723a14329a3 | 10.241.1.9 | Y | |
| 0787_45bf9f53-e480-4e0e-af49-c48a2f5a792e | 10.241.1.7 | Y | Y |
| 0787_b201c97f-744f-4bf1-8e46-abd175f1cec1 | 10.241.1.8 | Y | |
|-------------------------------------------|---------------|--------|-----|
Admin Node: 10.241.1.7
Spectrum Scale Compute Cluster
|-------------------------------------------|---------------|--------|-----|
| Instance Id | Private IP | Quorum | GUI |
|-------------------------------------------|---------------|--------|-----|
| 0787_5d168e74-f278-4fd7-9d59-ee1ac8cb0002 | 10.241.0.7 | Y | |
| 0787_0f8d00e2-4210-4ddd-8c00-427b3bdd9229 | 10.241.0.6 | Y | |
| 0787_cb3a9b6d-d882-4d60-9630-644222756da8 | 10.241.0.5 | Y | Y |
|-------------------------------------------|---------------|--------|-----|
Admin Node: 10.241.0.5
- The command below shows how to destroy the storage and compute nodes that are part of the Scale cluster
# mmcloudworkflows cluster destroy ibmcloud
2022-04-21 14:13:14,594 - INFO - Logging in to file: /var/adm/ras/ibm_cloud_workflow_logs/mm_cloud_workflow_teardown.log_2022-Apr-21_14-13-14
=======================================================================
| ! Danger Zone ! |
======================================================================|
| This workflow, will result in teardown of IBM Spectrum Scale |
| cluster and resources. However, it will not destroy VPC, Bastion |
| Host resources and the s3 bucket. |
| |
| Notes: |
| 1. Ensure to STOP all your applications before proceeding further. |
| 2. All IBM Scale Scale instances must be in either 'running', |
| 'pending', 'stopping' or 'stopped' state. |
=======================================================================
Do you want to continue teardown [y/N]: y
2022-04-21 14:13:33,844 - INFO - Proceeding for tear down ..
2022-04-21 14:13:33,844 - INFO - Obtaining necessary permissions to destroy cluster resources
2022-04-21 14:13:34,695 - INFO - Proceeding to destroy the IBM Spectrum Scale cluster
2022-04-21 14:13:34,695 - INFO - This may take a few minutes to complete.
Note:
- For the best experience with the Spectrum Scale cluster destruction process, always log in to the bootstrap node first and run the destroy command above, then wait for the storage and compute resources to be deleted.
- Once the command above has run successfully, log in to your IBM Cloud account and use Schematics to destroy the remaining resources from the workspace.
- The commands below need to be run from your local machine to access the GUI of the storage and compute clusters for monitoring resources
# eval `ssh-agent`
# ssh-add -k <path_of_region_specific_key>
# ssh -A -L 22443:<GUI_node_IP>:443 -N ubuntu@<bastion_host_IP>
Note:
- Provide the IP address of the GUI node for the compute or storage cluster.
- To fetch the IP, log in to the bootstrap node and run the command shown above (mmcloudworkflows cluster info). The row with "Y" in the GUI column shows where the GUI has been installed.
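Once the tunnel above is established, the GUI should be reachable from a local browser (the port matches the local forward in the ssh command above); log in with the corresponding GUI username and password that were provided as input variables:
https://localhost:22443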
The deployment uses the following Terraform provider:

Name | Version |
---|---|
ibm | 1.41.0 |
Input parameters:

Name | Description | Type |
---|---|---|
bastion_key_pair | Name of the SSH key configured in your IBM Cloud account that is used to establish a connection to the Bastion and Bootstrap nodes. Ensure that the SSH key is present in the same resource group and region where the cluster is being provisioned; our automation supports only one SSH key that can be attached to the Bastion and Bootstrap nodes. If you do not have an SSH key in your IBM Cloud account, create one by using the SSH keys instructions. | string |
compute_cluster_gui_password | Password used for logging in to the compute cluster through the GUI. Note: Password should contain a minimum of 8 characters, and for a strong password it must be a combination of uppercase letters, lowercase letters, one number and a special character. Ensure that the password doesn't include the username. | string |
compute_cluster_gui_username | GUI username to perform system management and monitoring tasks on the compute cluster. Note: Username should be at least 4 characters, any combination of lowercase and uppercase letters. | string |
compute_cluster_key_pair | Name of the SSH key configured in your IBM Cloud account that is used to establish a connection to the Compute cluster nodes. Ensure that the SSH key is present in the same resource group and region where the cluster is being provisioned and our automation supports only one ssh key that can be attached to compute nodes. If you do not have an SSH key in your IBM Cloud account, create one by using the SSH keys instructions. | string |
ibm_customer_number | IBM Customer number to be used for BYOL (bring your own license) entitlement check. | string |
remote_cidr_blocks | Comma separated list of IP addresses that can access the Spectrum Scale cluster Bastion node via SSH. For the purpose of security provide the public IP address(es) assigned to the device(s) authorized to establish SSH connections. (Example : ["169.45.117.34"]) Learn more. | list(string) |
storage_cluster_gui_password | Password used for logging in to the storage cluster through the GUI. Note: Password should contain a minimum of 8 characters, and for a strong password it must be a combination of uppercase letters, lowercase letters, one number and a special character. Ensure that the password doesn't include the username. | string |
storage_cluster_gui_username | GUI username to perform system management and monitoring tasks on the storage cluster. Note: Username should be at least 4 characters, any combination of lowercase and uppercase letters. | string |
storage_cluster_key_pair | Name of the SSH key configured in your IBM Cloud account that is used to establish a connection to the Storage cluster nodes. Ensure that the SSH key is present in the same resource group and region where the cluster is being provisioned and our automation supports only one ssh key that can be attached to storage nodes. If you do not have an SSH key in your IBM Cloud account, create one by using the SSH keys instructions. | string |
vpc_availability_zones | IBM Cloud Availability Zone name(s) within the selected region where the Spectrum Scale cluster should be deployed. (Examples: ["us-south-1"]) For more information, see Region and data center locations for resource deployment. | list(string) |
vpc_region | Name of the IBM Cloud region where the resources need to be provisioned.(Examples: us-east, us-south, etc.) For more information, see Region and data center locations for resource deployment. | string |
TF_PARALLELISM | Limits the number of concurrent Terraform operations. | string |
TF_VERSION | The version of the Terraform engine that's used in the Schematics workspace. | string |
bastion_osimage_name | Name of the image that will be used to provision the Bastion node for the Spectrum Scale cluster. Only Ubuntu stock images of any version available to the IBM Cloud account in the specific region are supported. | string |
bastion_vsi_profile | The virtual server instance profile type name to be used to create the Bastion node. For more information, see Instance Profiles | string |
bootstrap_osimage_name | Name of the custom image that you would like to use to create the Bootstrap node for the Spectrum Scale cluster. Our automation supports only custom images that include the Scale functionality; using any other custom image will lead to cluster creation failure. | string |
bootstrap_vsi_profile | The virtual server instance profile type name to be used to create the Bootstrap node. For more information, see Instance Profiles. | string |
compute_cluster_filesystem_mountpoint | Spectrum Compute cluster (accessing Cluster) file system mount point. The accessingCluster is the cluster that accesses the owningCluster. Learn more. | string |
compute_vsi_osimage_name | Name of the custom image that you would like to use to create the Compute cluster nodes for the Spectrum Scale cluster. Our automation supports both stock images of any version and custom RHEL 7.9 and 8.4 images that include the Scale functionality. | string |
compute_vsi_profile | The virtual server instance profile type name to be used to create the Compute cluster nodes. For more information, see Instance Profiles. | string |
filesystem_block_size | File system block size. Spectrum Scale supported block sizes (in bytes) include: 256K, 512K, 1M, 2M, 4M, 8M, 16M. | string |
resource_group | Resource group name from your IBM Cloud account where the VPC resources should be deployed. For more information, see Managing resource groups. | string |
resource_prefix | Prefix that is used to name the IBM Cloud resources that are provisioned to build the Spectrum Scale cluster. It is not possible to create multiple resources with the same name, so make sure that the prefix is unique. | string |
storage_cluster_filesystem_mountpoint | Spectrum Scale storage cluster (owningCluster) file system mount point. The owningCluster is the cluster that owns and serves the file system to be mounted. Learn more. | string |
storage_vsi_osimage_name | Name of the custom image that you would like to use to create the Storage cluster nodes for the Spectrum Scale cluster. Our automation supports both stock images of any version and custom RHEL 8.4 images that include the Scale functionality. | string |
storage_vsi_profile | Specify the virtual server instance profile type name to be used to create the Storage nodes. For more information, see Instance Profiles. | string |
total_compute_cluster_instances | Total number of Compute cluster instances required. A minimum of 3 nodes and a maximum of 64 nodes are supported. | number |
total_storage_cluster_instances | Total number of Storage cluster instances required. A minimum of 3 nodes and a maximum of 18 nodes are supported. | number |
vpc_cidr_block | IBM Cloud VPC address prefixes required for the VPC creation. Since our automation supports only a single availability zone, provide one CIDR address prefix for the VPC creation. Learn more. | list(string) |
vpc_compute_cluster_dns_domain | IBM Cloud DNS domain name to be used for compute cluster. | string |
vpc_compute_cluster_private_subnets_cidr_blocks | CIDR_block required for the creation of the compute cluster private subnet. Modify when the CIDR block has already been reserved/used for other applications within the VPC or conflicts with any on-premise CIDR blocks when using a hybrid environment. Provide only one cidr_block for the creation of compute subnet. | list(string) |
vpc_storage_cluster_dns_domain | IBM Cloud DNS domain name to be used for storage cluster. | string |
vpc_storage_cluster_private_subnets_cidr_blocks | CIDR_block required for the creation of the storage cluster private subnet. Modify when the CIDR block has already been reserved/used for other applications within the VPC or conflicts with any on-premise CIDR blocks when using a hybrid environment. Provide only one cidr_block for the creation of storage subnet. | list(string) |
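As an illustration only (the values below are placeholders, not defaults), a minimal set of the required inputs for a small cluster might look like this, entered either through the Schematics Variables section or the config.json variablestore:
vpc_region = "us-east"
vpc_availability_zones = ["us-east-1"]
resource_prefix = "spectrum-scale"
bastion_key_pair = "scale-ssh-key"
compute_cluster_key_pair = "scale-ssh-key"
storage_cluster_key_pair = "scale-ssh-key"
remote_cidr_blocks = ["<your_public_ip>"]
ibm_customer_number = "<your_ibm_customer_number>"
total_compute_cluster_instances = 3
total_storage_cluster_instances = 3
compute_cluster_gui_username = "computeadmin"
compute_cluster_gui_password = "<strong_password>"
storage_cluster_gui_username = "storageadmin"
storage_cluster_gui_password = "<strong_password>"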
Outputs:

Name | Description |
---|---|
shematics_controller_ip | IP address used by Schematics to SSH to the Bastion node and push the user input data file for creating the storage and compute clusters. |
ssh_command | SSH command that can be used to log in to the Bootstrap node (for example, to destroy the cluster). The same command can be used to SSH to any storage/compute node by replacing the Bootstrap node IP with the IP of the respective node. (Example: ssh -J ubuntu@<bastion_ip> vpcuser@<IP of storage/compute node>) |
trusted_profile_id | IBM Cloud Trusted Profile ID. |
vpc_compute_cluster_dns_service_id | IBM Cloud DNS compute cluster resource instance server ID. |
vpc_compute_cluster_dns_zone_id | IBM Cloud DNS compute cluster zone ID. |
vpc_compute_cluster_private_subnets | List of IDs of compute cluster private subnets. |
vpc_custom_resolver_id | IBM Cloud DNS custom resolver ID. |
vpc_id | The ID of the VPC. |
vpc_storage_cluster_dns_service_id | IBM Cloud DNS storage cluster resource instance server ID. |
vpc_storage_cluster_dns_zone_id | IBM Cloud DNS storage cluster zone ID. |
vpc_storage_cluster_private_subnets | List of IDs of storage cluster private subnets. |