
Commit 0328c0e

docs: Issue263 Installation doc updates (#270)
* docs: Updated the installation doc to include a section on cloning the repo, docs on authentication, and updating Terraform instructions so that it can function
* docs: Reformat sections to be more context friendly. Add more clear language and link to Google Cloud SDK installation guide
* docs: specified terraform folder
1 parent 3c21ee5 commit 0328c0e

File tree

1 file changed: +35 -17 lines changed

docs/installation.md

Lines changed: 35 additions & 17 deletions
@@ -1,48 +1,59 @@
# Data Validation Tool Installation Guide
-The data validation tool can be installed on any machine that has Python 3.6+ installed.

The tool natively supports BigQuery connections. If you need to connect to other databases such as Teradata or Oracle, you will need to install the appropriate connection libraries. (See the [Connections](connections.md) page for details)

This tool can be natively installed on your machine or can be containerized and run with Docker.


## Prerequisites
-The Data Validation Tool can be configured to store the results of validation runs into BigQuery tables. To allow tool to do that, we need to do following:
+
+- Any machine with Python 3.6+ installed.
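To check which Python version a machine has (assuming `python3` is on the `PATH`; any 3.6+ interpreter works):

```
python3 --version
```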
1111

1212
## Setup
1313

14-
To write results to BigQuery, you'll need to setup the required cloud
15-
resources, local authentication, and configure the tool.
14+
By default, the data validation tool writes the results of data validation to `stdout`. However, we recommend storing the results of validations to a BigQuery table in order to standardize the process and share results across a team. In order to allow the data validation tool to write to a BigQuery table, users need to have a BigQuery table created with a specific schema. If you choose to write results to a BigQuery table, there are a couple of requirements:
15+
16+
- A Google Cloud Platform project with the BigQuery API enabled.
17+
18+
- A Google user account with appropriate permissions. If you plan to run this tool in production, it's recommended that you create a service account specifically for running the tool. See our [guide](https://cloud.google.com/docs/authentication/production) on how to authenticate with your service account. If you are using a service account, you need to grant your service account appropriate roles on your project so that it has permissions to create and read resources.
19+
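The two requirements above can be met with a handful of `gcloud` commands. A rough sketch only: the project ID `YOUR_PROJECT_ID`, the service-account name `dvt-runner`, and the chosen roles are placeholder assumptions, not values from this repo.

```
# Sketch only: placeholder project and service-account names; pick roles to match your needs.
gcloud config set project YOUR_PROJECT_ID

# Make sure the BigQuery API is enabled in the project.
gcloud services enable bigquery.googleapis.com

# Local runs: authenticate with your own Google user account.
gcloud auth application-default login

# Production runs: create a dedicated service account and grant it BigQuery access.
gcloud iam service-accounts create dvt-runner
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:dvt-runner@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:dvt-runner@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"
```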
+Clone the repository onto your machine and navigate inside the directory:
+
+```
+git clone https://github.com/GoogleCloudPlatform/professional-services-data-validator.git
+cd professional-services-data-validator
+```
+
+There are two methods of creating the BigQuery output table for the tool: via *Terraform* or the *Cloud SDK*.

-A Google Cloud Platform project with the BigQuery API enabled is required.

-Confirm which Google user account will be used to execute the tool. If you plan to run this tool in
-production, it's recommended that you create a service account specifically
-for running the tool.
-There are two methods of creating the Cloud resources necessary for the tool: via Terraform or the Cloud SDK.
-### Create cloud resources - Terraform
+### Cloud Resource Creation - Terraform

-You can use Terraform to create the necessary BigQuery resources. (See next
-section for manually creating resources with `gcloud`.)
+By default, Terraform is run inside a test environment and needs to be directed to your project. Perform the following steps to direct the creation of the BigQuery table to your project:
+
+1. Delete the `testenv.tf` file inside the `terraform` folder
+2. View `variables.tf` inside the `terraform` folder and replace `default = "pso-kokoro-resources"` with `default = "YOUR_PROJECT_ID"`
+
+
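Assuming `pso-kokoro-resources` appears only in that one `default = ...` line of `variables.tf`, the two steps above can be scripted roughly as follows (`YOUR_PROJECT_ID` is a placeholder; GNU `sed` shown, on macOS/BSD use `sed -i ''`):

```
cd terraform
# Step 1: drop the test-environment config so Terraform targets your own project.
rm testenv.tf
# Step 2: swap the default project ID for yours.
sed -i 's/pso-kokoro-resources/YOUR_PROJECT_ID/' variables.tf
cd ..
```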
+After installing the [terraform CLI tool](https://learn.hashicorp.com/tutorials/terraform/install-cli) and completing the steps above, run the following commands from inside the root of the repo:

```
cd terraform
terraform init
terraform apply
```

-You should see a dataset named `pso_data_validator` and a table named
-`results`.
+### Cloud Resource Creation - Cloud SDK (gcloud)

-### Create cloud resources - Cloud SDK (gcloud)
+Install the [Google Cloud SDK](https://cloud.google.com/sdk/docs/install) if necessary.

-Create a dataset for validation results.
+Create a dataset for validation results:

```
bq mk pso_data_validator
```

-Create a table.
+Create a table:

```
bq mk --table \
@@ -52,6 +63,13 @@ bq mk --table \
terraform/results_schema.json
```

+### Cloud Resource Creation - After success
+
+You should see a dataset named `pso_data_validator` and a table named
+`results` created inside of your project.
+
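A quick way to confirm the dataset and table exist is to list the dataset and print the table schema (a sketch; assumes `bq` is pointed at the same project used above):

```
bq ls pso_data_validator
bq show --schema --format=prettyjson pso_data_validator.results
```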
+After installing the CLI tool using the instructions below, you will be ready to run data validation commands and output the results to BigQuery. See an example [here](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/examples.md#store-results-in-a-bigquery-table).
+

## Deploy Data Validation CLI on your machine

