@@ -6,7 +6,7 @@ HPC Toolkit is an open-source software offered by Google Cloud which makes it
66easy for customers to deploy HPC environments on Google Cloud.
77
88HPC Toolkit allows customers to deploy turnkey HPC environments (compute,
9- networking, storage, etc) following Google Cloud best-practices, in a repeatable
9+ networking, storage, etc.) following Google Cloud best-practices, in a repeatable
1010manner. The HPC Toolkit is designed to be highly customizable and extensible,
1111and intends to address the HPC deployment needs of a broad range of customers.
1212
@@ -140,14 +140,14 @@ packer build .
140140
141141## HPC Toolkit Components
142142
143- The HPC Toolkit has been designed to simplify the process of deploying a
144- familiar HPC cluster on Google Cloud. The block diagram below describes the
145- individual components of the HPC toolkit.
143+ The HPC Toolkit has been designed to simplify the process of deploying an HPC
144+ cluster on Google Cloud. The block diagram below describes the individual
145+ components of the HPC Toolkit.
146146
147147``` mermaid
148148graph LR
149149 subgraph HPC Environment Configuration
150- A(1. GCP-provided Blueprint Examples) --> B(2. HPC Blueprint)
150+ A(1. Provided Blueprint Examples) --> B(2. HPC Blueprint)
151151 end
152152 B --> D
153153 subgraph Creating an HPC Deployment
@@ -159,19 +159,19 @@ graph LR
159159 end
160160```
161161
162- 1 . ** GCP-provided Blueprint Examples** – A set of vetted reference blueprints
163- can be found in the examples directory. These can be used to create a
164- predefined deployment for a cluster or as a starting point for creating a
165- custom deployment.
162+ 1. **Provided Blueprint Examples** – A set of vetted reference blueprints can be
163+ found in the ./examples and ./community/examples directories. These can be
164+ used to create a predefined deployment for a cluster or as a starting point
165+ for creating a custom deployment.
1661662 . ** HPC Blueprint** – The primary interface to the HPC Toolkit is an HPC
167167 Blueprint file. This is a YAML file that defines which modules to use and how
168168 to customize them.
169- 3 . ** gHPC Engine** – The gHPC engine converts the blueprint file into a
170- self-contained deployment directory.
171- 4 . ** HPC Modules** – The building blocks of a deployment directory are the
169+ 3. **HPC Modules** – The building blocks of a deployment directory are the
172170 modules. Modules can be found in the ./modules and ./community/modules
173171 directories. They are composed of Terraform, Packer and/or script files that
174172 meet the expectations of the gHPC engine.
173+ 4. **gHPC Engine** – The gHPC engine converts the blueprint file into a
174+ self-contained deployment directory.
1751755 . ** Deployment Directory** – A self-contained directory that can be used to
176176 deploy a cluster onto Google Cloud. This is the output of the gHPC engine.
1771776 . ** HPC environment on GCP** – After deployment, an HPC environment will be
@@ -239,15 +239,14 @@ to the Google Cloud Console.
239239Many of the above examples are easily executed within a Cloud Shell environment.
240240Be aware that Cloud Shell has [ several limitations] [ cloud-shell-limitations ] ,
241241in particular an inactivity timeout that will close running shells after 20
242- minutes. Please consider it only for small blueprints that are quickly
243- deployed.
242+ minutes. Please consider it only for blueprints that are quickly deployed.
244243
245244## Blueprint Warnings and Errors
246245
247246By default, each blueprint is configured with a number of "validator" functions
248- which perform basic tests of your global variables. If ` project_id ` , ` region ` ,
249- and ` zone ` are defined as global variables, then the following validators are
250- enabled:
247+ which perform basic tests of your deployment variables. If `project_id`,
248+ `region`, and `zone` are defined as deployment variables, then the following
249+ validators are enabled:
251250
252251``` yaml
253252validators :
@@ -344,7 +343,7 @@ To view the Cloud Billing reports for your Cloud Billing account:
344343 [`Billing`](https://console.cloud.google.com/billing/overview).
3453442. At the prompt, choose the Cloud Billing account for which you'd like to view
346345 reports. The Billing Overview page opens for the selected billing account.
347- 3. In the Billing navigation menu, select Reports.
346+ 3. In the Billing navigation menu, select `Reports`.
348347
349348On the right side, expand the Filters view and then filter by label, specifying the key `ghpc_deployment` (or `ghpc_blueprint`) and the desired value.
350349
@@ -468,7 +467,7 @@ can be found in the [Slurm on Google Cloud User Guide][slurm-on-gcp-ug],
468467specifically the section titled "Create Service Accounts".
469468
470469After creating the service account, it can be set via the
471- " compute_node_service_account" and " controller_service_account" settings on the
470+ `compute_node_service_account` and `controller_service_account` settings on the
472471[slurm-on-gcp controller module][slurm-on-gcp-con] and the
473472"login_service_account" setting on the
474473[slurm-on-gcp login module][slurm-on-gcp-login].
@@ -493,7 +492,7 @@ message. Here are some common reasons for the deployment to fail:
493492* **Filestore resource limit:** When regularly deploying filestore instances
494493 with a new VPC, you may see an error during deployment such as:
495494 `System limit for internal resources has been reached`. See
496- [this doc](https://cloud.google.com/filestore/docs/troubleshooting#api_cannot_be_disabled )
495+ [this doc](https://cloud.google.com/filestore/docs/troubleshooting#system_limit_for_internal_resources_has_been_reached_error_when_creating_an_instance)
497496 for the solution.
498497* **Required permission not found:**
499498 * Example: `Required 'compute.projects.get' permission for 'projects/... forbidden`
@@ -536,13 +535,34 @@ drop-down menu at the top-left.
536535
537536## Inspecting the Deployment
538537
539- The deployment is created in the directory matching the provided
540- ` deployment_name` variable in the blueprint. Within this directory are all the
541- modules needed to deploy your cluster. The deployment directory will contain
542- subdirectories representing the deployment groups defined in the blueprint file.
543- Most example configurations contain a single deployment group.
538+ The deployment will be created with the following directory structure:
539+
540+ ```text
541+ <<OUTPUT_PATH>>/<<DEPLOYMENT_NAME>>/{<<DEPLOYMENT_GROUPS>>}/
542+ ```
543+
544+ If an output directory is provided with the `--output/-o` flag, the deployment
545+ directory will be created in the output directory, represented as
546+ `<<OUTPUT_PATH>>` here. If not provided, `<<OUTPUT_PATH>>` will default to the
547+ current working directory.
548+
549+ The deployment directory is created in `<<OUTPUT_PATH>>` as a directory matching
550+ the provided `deployment_name` deployment variable (`vars`) in the blueprint.
551+
552+ Within the deployment directory are directories representing each deployment
553+ group in the blueprint, each named after the `group` field of the
554+ corresponding element in `deployment_groups`.
544555
545- From the [example above](#quick-start) we get the following deployment directory:
556+ Each deployment group directory contains all of the configuration scripts and
557+ modules needed to deploy. The modules are in a directory named `modules`, each
558+ in a subdirectory named the same as the source module; for example, the
559+ [vpc module](./modules/network/vpc/README.md) is in a directory named `vpc`.
560+
561+ A hidden directory containing meta information and backups is also created and
562+ named `.ghpc`.
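
As a rough illustration of the layout described above, the following shell sketch assembles the path expected for a single deployment group. The helper name `deployment_group_dir` and the group name `primary` are hypothetical and not part of the Toolkit; the sketch only mirrors the `<<OUTPUT_PATH>>/<<DEPLOYMENT_NAME>>/<<GROUP>>/` pattern.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ghpc): prints the directory expected for one
# deployment group, following <<OUTPUT_PATH>>/<<DEPLOYMENT_NAME>>/<<GROUP>>.
deployment_group_dir() {
  local output_path="${1:-.}"   # falls back to the current working directory
  local deployment_name="$2"    # value of the deployment_name variable
  local group="$3"              # value of a group field in deployment_groups
  # Strip one trailing slash from the output path so the result has no "//".
  printf '%s/%s/%s\n' "${output_path%/}" "$deployment_name" "$group"
}

# Example: output directory "deployments/", deployment "hpc-small",
# and an assumed group named "primary".
deployment_group_dir "deployments/" "hpc-small" "primary"
```

Passing an empty first argument imitates running without the `--output/-o` flag, in which case the path is rooted at the current working directory.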
563+
564+ From the [hpc-cluster-small.yaml example](./examples/hpc-cluster-small.yaml), we
565+ get the following deployment directory:
546566
547567```text
548568hpc-small/
@@ -556,54 +576,9 @@ hpc-small/
556576 SchedMD-slurm-on-gcp-login-node/
557577 SchedMD-slurm-on-gcp-partition/
558578 vpc/
579+ .ghpc/
559580```
560581
561- # # `ghpc` Commands
562-
563- # ## Create
564-
565- ` ` ` shell
566- ./ghpc create <blueprint.yaml>
567- ` ` `
568-
569- The create command is the primary interface for the HPC Toolkit. This command
570- takes the path to a blueprint file as an input and creates a deployment based on
571- it. Further information on creating this blueprint file, see
572- [Writing an HPC Blueprint](examples/README.md#writing-an-hpc-blueprint).
573-
574- By default, the deployment directory will be created in the same directory as
575- the `ghpc` binary and will have the name specified by the `deployment_name`
576- field from the blueprint. Optionally, the output directory can be specified with
577- the `-o` flag as shown in the following example.
578-
579- ` ` ` shell
580- ./ghpc create examples/hpc-cluster-small.yaml -o deployments/
581- ` ` `
582-
583- # ## Expand
584-
585- ` ` ` shell
586- ./ghpc expand <blueprint.yaml> –out <expanded-blueprint.yaml>
587- ` ` `
588-
589- The expand command creates an expanded blueprint file with all settings
590- explicitly listed and variables expanded. This can be a useful tool for creating
591- explicit, detailed examples and for debugging purposes. The expanded blueprint
592- is still valid as input to [`ghpc create`](#create) to create the deployment.
593-
594- # ## Completion
595-
596- ` ` ` shell
597- ./ghpc completion [bash|zsh|fish|powershell]
598- ` ` `
599-
600- The completion command creates a shell completion config file for the specified
601- shell. To apply the configuration file created by the command, it is required to
602- set up for each shell. For example, loading the completion config by .bashrc is
603- required for Bash.
604-
605- Call `ghpc completion --help` for shell specific setup instructions.
606-
607582## Dependencies
608583
609584Much of the HPC Toolkit deployment is built using Terraform and Packer, and
@@ -620,33 +595,59 @@ List of dependencies:
620595* make
621596* git
622597
623- # # MacOS Details
598+ ### macOS Additional Dependencies
599+
600+ On macOS, `make` is packaged with the Xcode command line developer tools. To
601+ install, run the following command:
624602
625- * Install GNU `findutils` with Homebrew or Conda
626- * `brew install findutils` (and follow instructions for modifying `PATH`)
627- * `conda install findutils`
628- * If using `conda`, it's easier to use conda-forge Golang without CGO
629- * `conda install go go-nocgo go-nocgo_osx-64`
603+ ```shell
604+ xcode-select --install
605+ ```
606+
607+ Alternatively, you can build `ghpc` directly using `go build ghpc.go`.
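
The choice between the two build paths can be sketched in shell as follows. This is only an illustration: the function name `choose_build_cmd` is hypothetical, and it assumes that running `make` in the repository root builds `ghpc`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the build command to use. Prefers `make` when it
# is installed; otherwise falls back to the direct Go build mentioned above.
# Assumption: the default make target in the repository builds ghpc.
choose_build_cmd() {
  if command -v make >/dev/null 2>&1; then
    echo "make"
  else
    echo "go build ghpc.go"
  fi
}

echo "Suggested build command: $(choose_build_cmd)"
```

The sketch only prints the suggested command, so it is safe to run outside the repository.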
608+
609+ ### Notes on Packer
610+
611+ The Toolkit supports Packer templates in the contemporary [HCL2 file
612+ format][pkrhcl2] and not in the legacy JSON file format. We require the use of
613+ Packer 1.7 or above, and recommend using the latest release.
614+
615+ The Toolkit's [Packer template module documentation][pkrmodreadme] describes
616+ input variables and their behavior. An [image-building example][pkrexample]
617+ and [usage instructions][pkrexamplereadme] are provided. The example integrates
618+ Packer, Terraform and
619+ [startup-script](./modules/scripts/startup-script/README.md) runners to
620+ demonstrate the power of customizing images using the same scripts that can be
621+ applied at boot-time.
622+
623+ [pkrhcl2]: https://www.packer.io/guides/hcl
624+ [pkrmodreadme]: modules/packer/custom-image/README.md
625+ [pkrexamplereadme]: examples/README.md#image-builderyaml
626+ [pkrexample]: examples/image-builder.yaml
630627
631628## Development
632629
633630The following setup is in addition to the [dependencies](#dependencies) needed
634631to build and run HPC-Toolkit.
635632
636633Please use the `pre-commit` hooks [configured](./.pre-commit-config.yaml) in
637- this repository to ensure that all Terraform and golang modules are validated
638- and properly documented before pushing code changes. The pre-commits configured
634+ this repository to ensure that all changes are validated, tested and properly
635+ documented before pushing code changes. The pre-commits configured
639636in the HPC Toolkit have a set of dependencies that must be installed before
640637the hooks can pass.
641638
639+ Follow these steps to install and set up pre-commit in your cloned repository:
640+
6426411. Install pre-commit using the instructions from [the pre-commit website](https://pre-commit.com/).
6436421. Install TFLint using the instructions from
644643 [the TFLint documentation](https://github.com/terraform-linters/tflint#installation).
645- * Note: The version of TFLint must be compatible with the Google plugin
646- version identified in [tflint.hcl](.tflint.hcl). Versions of the plugin
647- ` >=0.16.0` should use `tflint>=0.35.0` and versions of the plugin
648- ` <=0.15.0` should preferably use `tflint==0.34.1`. These versions are
649- readily available via GitHub or package managers.
644+
645+ > **_NOTE:_** The version of TFLint must be compatible with the Google plugin
646+ > version identified in [tflint.hcl](.tflint.hcl). Versions of the plugin
647+ > `>=0.16.0` should use `tflint>=0.35.0` and versions of the plugin
648+ > `<=0.15.0` should preferably use `tflint==0.34.1`. These versions are
649+ > readily available via GitHub or package managers.
650+
6506511. Install ShellCheck using the instructions from
651652 [the ShellCheck documentation](https://github.com/koalaman/shellcheck#installing)
6526531. The other dev dependencies can be installed by running the following command
@@ -665,26 +666,16 @@ successfully passing.
665666
666667Now pre-commit is configured to automatically run before you commit.
667668
668- # ## Packer
669+ ### Development on macOS
669670
670- The Toolkit supports Packer templates in the contemporary [HCL2 file
671- format][pkrhcl2] and not in the legacy JSON file format. We require the use of
672- Packer 1.7 or above, and recommend using the latest release.
671+ While macOS is a supported environment for building and executing the Toolkit,
672+ it is not supported for Toolkit development due to GNU-specific shell scripts.
673673
674- The Toolkit's [Packer template module documentation][pkrmodreadme] describes
675- input variables and their behavior. An [image-building example][pkrexample]
676- and [usage instructions][pkrexamplereadme] are provided. The example integrates
677- Packer, Terraform and Toolkit Runners to demonstrate the power of customizing
678- images using the same scripts that can be applied at boot-time.
679-
680- [pkrhcl2] : https://www.packer.io/guides/hcl
681- [pkrmodreadme] : modules/packer/custom-image/README.md
682- [pkrexamplereadme] : examples/README.md#image-builderyaml
683- [pkrexample] : examples/image-builder.yaml
674+ If developing on a Mac, a workaround is to install GNU tooling by installing
675+ `coreutils` and `findutils` from a package manager such as Homebrew or conda.
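
One way to check whether GNU tooling is already first on your `PATH` is sketched below. The helper `is_gnu_tool` is hypothetical and not part of the Toolkit; it relies on the fact that GNU tools print "GNU" in their `--version` output, while the BSD variants shipped with macOS reject that flag.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of the Toolkit): succeeds when the given
# command identifies itself as a GNU tool in its --version output.
is_gnu_tool() {
  "$1" --version 2>/dev/null | grep -q "GNU"
}

# `find` comes from findutils and `ls` from coreutils.
for tool in find ls; do
  if is_gnu_tool "$tool"; then
    echo "$tool: GNU version found"
  else
    # For example: `brew install coreutils findutils`, then adjust PATH
    # as the package manager instructs.
    echo "$tool: not GNU; consider installing GNU tooling"
  fi
done
```

The loop only reports what it finds and installs nothing.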
684676
685677### Contributing
686678
687679Please refer to the [contributing file](CONTRIBUTING.md) in our GitHub repo, or
688680to
689681[Google’s Open Source documentation](https://opensource.google/docs/releasing/template/CONTRIBUTING/#).
690- Before submitting, we recommend contributors run pre-commit tests (more below).