
Commit c194064

Merge pull request #341 from GoogleCloudPlatform/develop
Version 0.7.3
2 parents cc87651 + d69a058 commit c194064

168 files changed, with 1780 additions and 803 deletions

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Copyright 2021 Google LLC
+# Copyright 2022 Google LLC
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.

.tfdocs-markdown.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Copyright 2021 Google LLC
+# Copyright 2022 Google LLC
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.

.tflint.hcl

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-// Copyright 2021 Google LLC
+// Copyright 2022 Google LLC
 //
 // Licensed under the Apache License, Version 2.0 (the "License");
 // you may not use this file except in compliance with the License.

Makefile

Lines changed: 4 additions & 4 deletions
@@ -11,8 +11,8 @@ MIN_GOLANG_VERSION=1.16 # for building ghpc
 check-tflint check-pre-commit
 
 ENG = ./cmd/... ./pkg/...
-TERRAFORM_FOLDERS=$(shell find ./modules ./community/modules ./tools -type f -name "*.tf" -not -path '*/\.*' -printf '%h\n' | sort -u)
-PACKER_FOLDERS=$(shell find ./modules ./community/modules ./tools -type f -name "*.pkr.hcl" -not -path '*/\.*' -printf '%h\n' | sort -u)
+TERRAFORM_FOLDERS=$(shell find ./modules ./community/modules ./tools -type f -name "*.tf" -not -path '*/\.*' -exec dirname "{}" \; | sort -u)
+PACKER_FOLDERS=$(shell find ./modules ./community/modules ./tools -type f -name "*.pkr.hcl" -not -path '*/\.*' -exec dirname "{}" \; | sort -u)
 
 # RULES MEANT TO BE USED DIRECTLY
 
@@ -23,12 +23,12 @@ ghpc: warn-go-version warn-terraform-version warn-packer-version $(shell find ./
 install-user:
 $(info ******** installing ghpc in ~/bin *********************)
 mkdir -p ~/bin
-install -t ~/bin ./ghpc
+install ./ghpc ~/bin
 
 ifeq ($(shell id -u), 0)
 install:
 $(info ***** installing ghpc in /usr/local/bin ***************)
-install -t /usr/local/bin ./ghpc
+install ./ghpc /usr/local/bin
 
 else
 install: install-user
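
The Makefile hunks above replace two GNU-specific idioms with forms that BSD userlands such as macOS also accept: `find ... -printf '%h\n'` becomes `find ... -exec dirname "{}" \;`, and `install -t DIR FILE` becomes `install FILE DIR`. The commit does not state its motivation, so portability is an inference. The following is a minimal sketch of both idioms on a throwaway tree; the scratch directory and the `ghpc-demo` file name are hypothetical and are not part of the repository.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch tree; none of these paths come from the repository.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/modules/a" "$tmp/modules/b" "$tmp/bin"
touch "$tmp/modules/a/main.tf" "$tmp/modules/a/variables.tf" "$tmp/modules/b/main.tf"

# List each directory that contains *.tf files exactly once.
# `-exec dirname` stands in for GNU find's `-printf '%h\n'`.
folders=$(find "$tmp/modules" -type f -name "*.tf" -exec dirname "{}" \; | sort -u)
echo "$folders"

# Install a file without the GNU-only `-t` flag: source first, directory last.
printf '#!/bin/sh\necho ok\n' > "$tmp/ghpc-demo"
install "$tmp/ghpc-demo" "$tmp/bin"
```

Running `dirname` once per file is slower than `-printf`, but for a few hundred files the difference is unlikely to matter, and `sort -u` still collapses duplicates.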

README.md

Lines changed: 91 additions & 100 deletions
@@ -6,7 +6,7 @@ HPC Toolkit is an open-source software offered by Google Cloud which makes it
 easy for customers to deploy HPC environments on Google Cloud.
 
 HPC Toolkit allows customers to deploy turnkey HPC environments (compute,
-networking, storage, etc) following Google Cloud best-practices, in a repeatable
+networking, storage, etc.) following Google Cloud best-practices, in a repeatable
 manner. The HPC Toolkit is designed to be highly customizable and extensible,
 and intends to address the HPC deployment needs of a broad range of customers.
 
@@ -140,14 +140,14 @@ packer build .
 
 ## HPC Toolkit Components
 
-The HPC Toolkit has been designed to simplify the process of deploying a
-familiar HPC cluster on Google Cloud. The block diagram below describes the
-individual components of the HPC toolkit.
+The HPC Toolkit has been designed to simplify the process of deploying an HPC
+cluster on Google Cloud. The block diagram below describes the individual
+components of the HPC toolkit.
 
 ```mermaid
 graph LR
 subgraph HPC Environment Configuration
-A(1. GCP-provided Blueprint Examples) --> B(2. HPC Blueprint)
+A(1. Provided Blueprint Examples) --> B(2. HPC Blueprint)
 end
 B --> D
 subgraph Creating an HPC Deployment
@@ -159,19 +159,19 @@ graph LR
 end
 ```
 
-1. **GCP-provided Blueprint Examples** – A set of vetted reference blueprints
-can be found in the examples directory. These can be used to create a
-predefined deployment for a cluster or as a starting point for creating a
-custom deployment.
+1. **Provided Blueprint Examples** – A set of vetted reference blueprints can be
+found in the ./examples and ./community/examples directories. These can be
+used to create a predefined deployment for a cluster or as a starting point
+for creating a custom deployment.
 2. **HPC Blueprint** – The primary interface to the HPC Toolkit is an HPC
 Blueprint file. This is a YAML file that defines which modules to use and how
 to customize them.
-3. **gHPC Engine** – The gHPC engine converts the blueprint file into a
-self-contained deployment directory.
-4. **HPC Modules** – The building blocks of a deployment directory are the
+3. **HPC Modules** – The building blocks of a deployment directory are the
 modules. Modules can be found in the ./modules and community/modules
 directories. They are composed of terraform, packer and/or script files that
 meet the expectations of the gHPC engine.
+4. **gHPC Engine** – The gHPC engine converts the blueprint file into a
+self-contained deployment directory.
 5. **Deployment Directory** – A self-contained directory that can be used to
 deploy a cluster onto Google Cloud. This is the output of the gHPC engine.
 6. **HPC environment on GCP** – After deployment, an HPC environment will be
@@ -239,15 +239,14 @@ to the Google Cloud Console.
 Many of the above examples are easily executed within a Cloud Shell environment.
 Be aware that Cloud Shell has [several limitations][cloud-shell-limitations],
 in particular an inactivity timeout that will close running shells after 20
-minutes. Please consider it only for small blueprints that are quickly
-deployed.
+minutes. Please consider it only for blueprints that are quickly deployed.
 
 ## Blueprint Warnings and Errors
 
 By default, each blueprint is configured with a number of "validator" functions
-which perform basic tests of your global variables. If `project_id`, `region`,
-and `zone` are defined as global variables, then the following validators are
-enabled:
+which perform basic tests of your deployment variables. If `project_id`,
+`region`, and `zone` are defined as deployment variables, then the following
+validators are enabled:
 
 ```yaml
 validators:
@@ -344,7 +343,7 @@ To view the Cloud Billing reports for your Cloud Billing account:
 [`Billing`](https://console.cloud.google.com/billing/overview).
 2. At the prompt, choose the Cloud Billing account for which you'd like to view
 reports. The Billing Overview page opens for the selected billing account.
-3. In the Billing navigation menu, select Reports.
+3. In the Billing navigation menu, select `Reports`.
 
 In the right side, expand the Filters view and then filter by label, specifying the key `ghpc_deployment` (or `ghpc_blueprint`) and the desired value.
 
@@ -468,7 +467,7 @@ can be found in the [Slurm on Google Cloud User Guide][slurm-on-gcp-ug],
 specifically the section titled "Create Service Accounts".
 
 After creating the service account, it can be set via the
-"compute_node_service_account" and "controller_service_account" settings on the
+`compute_node_service_account` and `controller_service_account` settings on the
 [slurm-on-gcp controller module][slurm-on-gcp-con] and the
 "login_service_account" setting on the
 [slurm-on-gcp login module][slurm-on-gcp-login].
@@ -493,7 +492,7 @@ message. Here are some common reasons for the deployment to fail:
 * **Filestore resource limit:** When regularly deploying filestore instances
 with a new vpc you may see an error during deployment such as:
 `System limit for internal resources has been reached`. See
-[this doc](https://cloud.google.com/filestore/docs/troubleshooting#api_cannot_be_disabled)
+[this doc](https://cloud.google.com/filestore/docs/troubleshooting#system_limit_for_internal_resources_has_been_reached_error_when_creating_an_instance)
 for the solution.
 * **Required permission not found:**
 * Example: `Required 'compute.projects.get' permission for 'projects/... forbidden`
@@ -536,13 +535,34 @@ drop-down menu at the top-left.
 
 ## Inspecting the Deployment
 
-The deployment is created in the directory matching the provided
-`deployment_name` variable in the blueprint. Within this directory are all the
-modules needed to deploy your cluster. The deployment directory will contain
-subdirectories representing the deployment groups defined in the blueprint file.
-Most example configurations contain a single deployment group.
+The deployment will be created with the following directory structure:
+
+```text
+<<OUTPUT_PATH>>/<<DEPLOYMENT_NAME>>/{<<DEPLOYMENT_GROUPS>>}/
+```
+
+If an output directory is provided with the `--output/-o` flag, the deployment
+directory will be created in the output directory, represented as
+`<<OUTPUT_PATH>>` here. If not provided, `<<OUTPUT_PATH>>` will default to the
+current working directory.
+
+The deployment directory is created in `<<OUTPUT_PATH>>` as a directory matching
+the provided `deployment_name` deployment variable (`vars`) in the blueprint.
+
+Within the deployment directory are directories representing each deployment
+group in the blueprint named the same as the `group` field for each element
+in `deployment_groups`.
 
-From the [example above](#quick-start) we get the following deployment directory:
+In each deployment group directory, are all of the configuration scripts and
+modules needed to deploy. The modules are in a directory named `modules` named
+the same as the source module, for example the
+[vpc module](./modules/network/vpc/README.md) is in a directory named `vpc`.
+
+A hidden directory containing meta information and backups is also created and
+named `.ghpc`.
+
+From the [hpc-cluster-small.yaml example](./examples/hpc-cluster-small.yaml), we
+get the following deployment directory:
 
 ```text
 hpc-small/
@@ -556,54 +576,9 @@ hpc-small/
 SchedMD-slurm-on-gcp-login-node/
 SchedMD-slurm-on-gcp-partition/
 vpc/
+.ghpc/
 ```
 
-## `ghpc` Commands
-
-### Create
-
-``` shell
-./ghpc create <blueprint.yaml>
-```
-
-The create command is the primary interface for the HPC Toolkit. This command
-takes the path to a blueprint file as an input and creates a deployment based on
-it. Further information on creating this blueprint file, see
-[Writing an HPC Blueprint](examples/README.md#writing-an-hpc-blueprint).
-
-By default, the deployment directory will be created in the same directory as
-the `ghpc` binary and will have the name specified by the `deployment_name`
-field from the blueprint. Optionally, the output directory can be specified with
-the `-o` flag as shown in the following example.
-
-```shell
-./ghpc create examples/hpc-cluster-small.yaml -o deployments/
-```
-
-### Expand
-
-```shell
-./ghpc expand <blueprint.yaml> –out <expanded-blueprint.yaml>
-```
-
-The expand command creates an expanded blueprint file with all settings
-explicitly listed and variables expanded. This can be a useful tool for creating
-explicit, detailed examples and for debugging purposes. The expanded blueprint
-is still valid as input to [`ghpc create`](#create) to create the deployment.
-
-### Completion
-
-```shell
-./ghpc completion [bash|zsh|fish|powershell]
-```
-
-The completion command creates a shell completion config file for the specified
-shell. To apply the configuration file created by the command, it is required to
-set up for each shell. For example, loading the completion config by .bashrc is
-required for Bash.
-
-Call `ghpc completion --help` for shell specific setup instructions.
-
 ## Dependencies
 
 Much of the HPC Toolkit deployment is built using Terraform and Packer, and
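
The added "Inspecting the Deployment" text describes the layout `<<OUTPUT_PATH>>/<<DEPLOYMENT_NAME>>/{<<DEPLOYMENT_GROUPS>>}/`, with the output path defaulting to the current working directory. A small sketch of that path rule follows, assuming nothing beyond the added README text. The helper `deployment_group_dir` and the group name `primary` are hypothetical and are not part of `ghpc`.

```shell
#!/bin/sh
set -eu

# Print the directory for one deployment group.
#   $1: output path (the value given to --output/-o); empty means the current directory
#   $2: deployment_name from the blueprint's vars
#   $3: the group field of one deployment_groups entry
deployment_group_dir() {
  output_path="${1:-.}"
  # Drop one trailing slash so "deployments/" and "deployments" give the same result.
  output_path="${output_path%/}"
  printf '%s/%s/%s\n' "$output_path" "$2" "$3"
}

# "primary" is a hypothetical group name used only for illustration.
deployment_group_dir "deployments/" "hpc-small" "primary"
deployment_group_dir "" "hpc-small" "primary"
```

The first call mirrors the `-o deployments/` invocation shown in the removed `ghpc` Commands section; the second shows the default when no output directory is given.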
@@ -620,33 +595,59 @@ List of dependencies:
 * make
 * git
 
-## MacOS Details
+### MacOS Additional Dependencies
+
+On macOS, `make` is packaged with the Xcode command line developer tools. To
+install, run the following command:
 
-* Install GNU `findutils` with Homebrew or Conda
-* `brew install findutils` (and follow instructions for modifying `PATH`)
-* `conda install findutils`
-* If using `conda`, it's easier to use conda-forge Golang without CGO
-* `conda install go go-nocgo go-nocgo_osx-64`
+```shell
+xcode-select --install
+```
+
+Alternatively you can build `ghpc` directly using `go build ghpc.go`.
+
+### Notes on Packer
+
+The Toolkit supports Packer templates in the contemporary [HCL2 file
+format][pkrhcl2] and not in the legacy JSON file format. We require the use of
+Packer 1.7 or above, and recommend using the latest release.
+
+The Toolkit's [Packer template module documentation][pkrmodreadme] describes
+input variables and their behavior. An [image-building example][pkrexample]
+and [usage instructions][pkrexamplereadme] are provided. The example integrates
+Packer, Terraform and
+[startup-script](./modules/scripts/startup-script/README.md) runners to
+demonstrate the power of customizing images using the same scripts that can be
+applied at boot-time.
+
+[pkrhcl2]: https://www.packer.io/guides/hcl
+[pkrmodreadme]: modules/packer/custom-image/README.md
+[pkrexamplereadme]: examples/README.md#image-builderyaml
+[pkrexample]: examples/image-builder.yaml
 
 ## Development
 
 The following setup is in addition to the [dependencies](#dependencies) needed
 to build and run HPC-Toolkit.
 
 Please use the `pre-commit` hooks [configured](./.pre-commit-config.yaml) in
-this repository to ensure that all Terraform and golang modules are validated
-and properly documented before pushing code changes. The pre-commits configured
+this repository to ensure that all changes are validated, tested and properly
+documented before pushing code changes. The pre-commits configured
 in the HPC Toolkit have a set of dependencies that need to be installed before
 successfully passing.
 
+Follow these steps to install and setup pre-commit in your cloned repository:
+
 1. Install pre-commit using the instructions from [the pre-commit website](https://pre-commit.com/).
 1. Install TFLint using the instructions from
 [the TFLint documentation](https://github.com/terraform-linters/tflint#installation).
-* Note: The version of TFLint must be compatible with the Google plugin
-version identified in [tflint.hcl](.tflint.hcl). Versions of the plugin
-`>=0.16.0` should use `tflint>=0.35.0` and versions of the plugin
-`<=0.15.0` should preferably use `tflint==0.34.1`. These versions are
-readily available via GitHub or package managers.
+
+> **_NOTE:_** The version of TFLint must be compatible with the Google plugin
+> version identified in [tflint.hcl](.tflint.hcl). Versions of the plugin
+> `>=0.16.0` should use `tflint>=0.35.0` and versions of the plugin
+> `<=0.15.0` should preferably use `tflint==0.34.1`. These versions are
+> readily available via GitHub or package managers.
+
 1. Install ShellCheck using the instructions from
 [the ShellCheck documentation](https://github.com/koalaman/shellcheck#installing)
 1. The other dev dependencies can be installed by running the following command
@@ -665,26 +666,16 @@ successfully passing.
 
 Now pre-commit is configured to automatically run before you commit.
 
-### Packer
+### Development on macOS
 
-The Toolkit supports Packer templates in the contemporary [HCL2 file
-format][pkrhcl2] and not in the legacy JSON file format. We require the use of
-Packer 1.7 or above, and recommend using the latest release.
+While macOS is a supported environment for building and executing the Toolkit,
+it is not supported for Toolkit development due to GNU specific shell scripts.
 
-The Toolkit's [Packer template module documentation][pkrmodreadme] describes
-input variables and their behavior. An [image-building example][pkrexample]
-and [usage instructions][pkrexamplereadme] are provided. The example integrates
-Packer, Terraform and Toolkit Runners to demonstrate the power of customizing
-images using the same scripts that can be applied at boot-time.
-
-[pkrhcl2]: https://www.packer.io/guides/hcl
-[pkrmodreadme]: modules/packer/custom-image/README.md
-[pkrexamplereadme]: examples/README.md#image-builderyaml
-[pkrexample]: examples/image-builder.yaml
+If developing on a mac, a workaround is to install GNU tooling by installing
+`coreutils` and `findutils` from a package manager such as homebrew or conda.
 
 ### Contributing
 
 Please refer to the [contributing file](CONTRIBUTING.md) in our github repo, or
 to
 [Google’s Open Source documentation](https://opensource.google/docs/releasing/template/CONTRIBUTING/#).
-Before submitting, we recommend contributors run pre-commit tests (more below).
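
The TFLint note in the Development section of the README diff pairs Google plugin versions with TFLint versions: plugin `>=0.16.0` with `tflint>=0.35.0`, and plugin `<=0.15.0` with `tflint==0.34.1`. A rough sketch of that mapping follows. The function name `recommended_tflint` is hypothetical, and the handling of plugin versions strictly between 0.15.0 and 0.16.0, which the note does not cover, is an assumption.

```shell
#!/bin/sh
set -eu

# Print the TFLint constraint for a Google plugin version of the form MAJOR.MINOR.PATCH.
recommended_tflint() {
  major="${1%%.*}"     # text before the first dot
  rest="${1#*.}"       # text after the first dot
  minor="${rest%%.*}"  # text between the first and second dot
  if [ "$major" -gt 0 ] || [ "$minor" -ge 16 ]; then
    echo "tflint>=0.35.0"
  else
    # Assumption: versions between 0.15.0 and 0.16.0 are grouped with <=0.15.0.
    echo "tflint==0.34.1"
  fi
}

recommended_tflint "0.16.0"
recommended_tflint "0.15.0"
```

The plugin version itself would still need to be read from `.tflint.hcl` by hand or by a separate parser; this sketch only encodes the pairing stated in the note.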
