@@ -10,183 +10,37 @@ networking, storage, etc.) following Google Cloud best-practices, in a repeatabl
 manner. The HPC Toolkit is designed to be highly customizable and extensible,
 and intends to address the HPC deployment needs of a broad range of customers.
 
-## Installation
+More information can be found on the
+[Google Cloud Docs](https://cloud.google.com/hpc-toolkit/docs/overview).
 
-These instructions assume you are using
-[Cloud Shell](https://cloud.google.com/shell) which comes with the
-[dependencies](#dependencies) pre-installed.
+## Quickstart
 
-To use the HPC-Toolkit, you must clone the project from GitHub and build the
-`ghpc` binary.
+Running through the
+[quickstart tutorial](https://cloud.google.com/hpc-toolkit/docs/quickstarts/slurm-cluster)
+is the recommended path to get started with the HPC Toolkit.
 
-1. Execute `gh auth login`
-   * Select GitHub.com
-   * Select HTTPS
-   * Select Yes for "Authenticate Git with your GitHub credentials?"
-   * Select "Login with a web browser"
-   * Copy the one time code presented in the terminal
-   * Press [enter]
-   * Click the link https://github.com/login/device presented in the terminal
+Find a full list of tutorials [here](docs/tutorials/README.md).
 
-   A web browser will open, paste the one time code into the web browser prompt.
-   Continue to log into GitHub, then return to the terminal. You should see a
-   message that includes "Authentication complete."
+---
 
-You can now clone the Toolkit:
+If a self-directed path is preferred, you can use the following commands to
+build the `ghpc` binary:
 
 ```shell
-gh repo clone GoogleCloudPlatform/hpc-toolkit
+git clone git@github.com:GoogleCloudPlatform/hpc-toolkit.git
+cd hpc-toolkit
+make
+./ghpc --version
+./ghpc --help
 ```
 
-Finally, build the toolkit.
-
-```shell
-cd hpc-toolkit && make
-```
-
-You should now have a binary named `ghpc` in the project root directory.
-Optionally, you can run `./ghpc --version` to verify the build.
-
-## Quick Start
-
-To create an HPC deployment, an HPC blueprint file needs to be written or
-adapted from one of the [core examples](examples/) or
-[community examples](community/examples/).
-
-These instructions will use
-[examples/hpc-cluster-small.yaml](examples/hpc-cluster-small.yaml), which is a
-good starting point and creates a deployment containing:
-
-* a new network
-* a filestore instance
-* a slurm login node
-* a slurm controller
-
-> **_NOTE:_** More information on the example blueprints can be found in
-> [examples/README.md](examples/README.md).
-
-These instructions assume you are using
-[Cloud Shell](https://cloud.google.com/shell) in the context of the GCP project
-you wish to deploy in, and that you are in the root directory of the hpc-toolkit
-repo cloned during [installation](#installation).
-
-Run the ghpc binary with the following command:
-
-```shell
-./ghpc create examples/hpc-cluster-small.yaml --vars "project_id=${GOOGLE_CLOUD_PROJECT}"
-```
-
-> **_NOTE:_** The `--vars` argument supports a comma-separated list of name=value
-> variables to override blueprint variables. This feature only supports
-> variables of string type.
-
-This will create a deployment directory named `hpc-small/`.
-
-After successfully running `ghpc create`, a short message displaying how to
-proceed is displayed. For the `hpc-cluster-small` example, the message will
-appear similar to:
-
-```shell
-terraform -chdir=hpc-cluster-small/primary init
-terraform -chdir=hpc-cluster-small/primary validate
-terraform -chdir=hpc-cluster-small/primary apply
-```
-
-Use these commands to run terraform and deploy your cluster. If the `apply` is
-successful, a message similar to the following will be displayed:
-
-```shell
-Apply complete! Resources: 13 added, 0 changed, 0 destroyed.
-```
-
-> **_NOTE:_** Before you run this for the first time you may need to enable some
-> APIs and possibly request additional quotas. See
-> [Enable GCP APIs](#enable-gcp-apis) and
-> [Small Example Quotas](examples/README.md#hpc-cluster-smallyaml).\
-> **_NOTE:_** If not using cloud shell you may need to set up
-> [GCP Credentials](#gcp-credentials).\
-> **_NOTE:_** Cloud Shell times out after 20 minutes of inactivity. This example
-> deploys in about 5 minutes but for more complex deployments it may be
-> necessary to deploy (`terraform apply`) from a cloud VM. The same process
-> above can be used, although [dependencies](#dependencies) will need to be
-> installed first.
-
-Once successfully deployed, take the following steps to run a job:
-
-* First navigate to `Compute Engine` > `VM instances` in the Google Cloud Console.
-* Next click on the `SSH` button associated with the `slurm-hpc-small-login0` instance.
-* Finally run the `hostname` command on 3 nodes by running the following command in the shell popup:
-
-```shell
-$ srun -N 3 hostname
-slurm-hpc-slurm-small-debug-0-0
-slurm-hpc-slurm-small-debug-0-1
-slurm-hpc-slurm-small-debug-0-2
-```
-
-By default, this runs the job on the `debug` partition. See details in
-[examples/](examples/README.md#compute-partition) for how to run on the more
-performant `compute` partition.
-
-This example does not contain any Packer-based modules but for completeness,
-you can use the following commands to deploy a Packer-based deployment group:
-
-```shell
-cd <deployment-directory>/<packer-group>/<custom-vm-image>
-packer init .
-packer validate .
-packer build .
-```
+> **_NOTE:_** You may need to [install dependencies](#dependencies) first.
 
 ## HPC Toolkit Components
 
-The HPC Toolkit has been designed to simplify the process of deploying an HPC
-cluster on Google Cloud. The block diagram below describes the individual
-components of the HPC Toolkit.
-
-```mermaid
-graph LR
-    subgraph HPC Environment Configuration
-    A(1. Provided Blueprint Examples) --> B(2. HPC Blueprint)
-    end
-    B --> D
-    subgraph Creating an HPC Deployment
-    C(3. Modules, eg. Terraform, Scripts) --> D(4. ghpc Engine)
-    D --> E(5. Deployment Directory)
-    end
-    subgraph Google Cloud
-    E --> F(6. HPC environment on GCP)
-    end
-```
-
-1. **Provided Blueprint Examples** – A set of vetted reference blueprints can be
-   found in the ./examples and ./community/examples directories. These can be
-   used to create a predefined deployment for a cluster or as a starting point
-   for creating a custom deployment.
-2. **HPC Blueprint** – The primary interface to the HPC Toolkit is an HPC
-   Blueprint file. This is a YAML file that defines which modules to use and how
-   to customize them.
-3. **HPC Modules** – The building blocks of a deployment directory are the
-   modules. Modules can be found in the ./modules and ./community/modules
-   directories. They are composed of terraform, packer and/or script files that
-   meet the expectations of the gHPC engine.
-4. **gHPC Engine** – The gHPC engine converts the blueprint file into a
-   self-contained deployment directory.
-5. **Deployment Directory** – A self-contained directory that can be used to
-   deploy a cluster onto Google Cloud. This is the output of the gHPC engine.
-6. **HPC environment on GCP** – After deployment, an HPC environment will be
-   available in Google Cloud.
-
-Users can configure a set of modules, and using the gHPC Engine of the HPC
-Toolkit, they can produce a deployment directory with instructions for
-deploying. Terraform is the primary method for defining the modules behind the
-HPC cluster, but other modules based on tools like ansible and Packer are
-available.
-
-The HPC Toolkit can provide extra flexibility to configure a cluster to the
-specifications of a customer by making the deployment directory available and
-editable before deploying. Any HPC customer seeking a quick on-ramp to building
-out their infrastructure on GCP can benefit from this.
+Learn about the components that make up the HPC Toolkit and more on how it works
+on the
+[Google Cloud Docs Product Overview](https://cloud.google.com/hpc-toolkit/docs/overview#components).
 
 ## GCP Credentials
 
@@ -309,23 +163,18 @@ In a new GCP project there are several apis that must be enabled to deploy your
 HPC cluster. These will be caught when you perform `terraform apply` but you can
 save time by enabling them upfront.
 
-List of APIs to enable ([instructions](https://cloud.google.com/apis/docs/getting-started#enabling_apis)):
-
-* Compute Engine API
-* Cloud Filestore API
-* Cloud Runtime Configuration API - _needed for `high-io` example_
+See
+[Google Cloud Docs](https://cloud.google.com/hpc-toolkit/docs/setup/configure-environment#enable-apis)
+for instructions.
 
 ## GCP Quotas
 
 You may need to request additional quota to be able to deploy and use your HPC
-cluster. For example, by default the `SchedMD-slurm-on-gcp-partition` module
-uses `c2-standard-60` VMs for compute nodes. Default quota for C2 CPUs may be as
-low as 8, which would prevent even a single node from being started.
-
-Required quotas will be based on your custom HPC configuration. Minimum quotas
-have been [documented](examples/README.md#example-blueprints) for the provided examples.
+cluster.
 
-Quotas can be inspected and requested at `IAM & Admin` > `Quotas`.
+See
+[Google Cloud Docs](https://cloud.google.com/hpc-toolkit/docs/setup/hpc-blueprint#request-quota)
+for more information.
 
 ## Billing Reports
 
@@ -581,30 +430,8 @@ hpc-small/
 
 ## Dependencies
 
-Much of the HPC Toolkit deployment is built using Terraform and Packer, and
-therefore they must be available on the same machine calling the Toolkit. In
-addition, building the HPC Toolkit from source requires git, make, and Go to be
-installed.
-
-List of dependencies:
-
-* Terraform: version >= 1.0.0 - [install instructions](https://www.terraform.io/downloads.html)
-* Packer: version >= 1.6.0 - [install instructions](https://www.packer.io/downloads)
-* golang: version >= 1.16 - [install instructions](https://golang.org/doc/install)
-  * To set up GOPATH and the development environment: `export PATH=$PATH:$(go env GOPATH)/bin`
-* make
-* git
-
-### MacOS Additional Dependencies
-
-On macOS, `make` is packaged with the Xcode command line developer tools. To
-install, run the following command:
-
-```shell
-xcode-select --install
-```
-
-Alternatively you can build `ghpc` directly using `go build ghpc.go`.
+See
+[Cloud Docs on Installing Dependencies](https://cloud.google.com/hpc-toolkit/docs/setup/install-dependencies).
 
 ### Notes on Packer
 