* docs: Updated the installation doc to include a section on cloning the repo, added docs on authentication, and updated the Terraform instructions so that they work
* docs: Reformatted sections to be more reader-friendly, added clearer language, and linked to the Google Cloud SDK installation guide
* docs: Specified the `terraform` folder

`docs/installation.md` (35 additions, 17 deletions):

# Data Validation Tool Installation Guide

The tool natively supports BigQuery connections. If you need to connect to other databases such as Teradata or Oracle, you will need to install the appropriate connection libraries. (See the [Connections](connections.md) page for details)

This tool can be natively installed on your machine or can be containerized and run with Docker.

## Prerequisites

- Any machine with Python 3.6+ installed.
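
To confirm that your interpreter meets this requirement, a quick check (assuming `python3` is on your `PATH`) is:

```
python3 --version  # should report Python 3.6 or higher
```
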

## Setup

By default, the data validation tool writes validation results to `stdout`. However, we recommend storing the results in a BigQuery table in order to standardize the process and share results across a team. To allow the tool to write to BigQuery, you need a table created with a specific schema, which brings a couple of requirements:

- A Google Cloud Platform project with the BigQuery API enabled.
- A Google user account with appropriate permissions. If you plan to run this tool in production, it's recommended that you create a service account specifically for running the tool. See our [guide](https://cloud.google.com/docs/authentication/production) on how to authenticate with your service account. If you use a service account, grant it the appropriate roles on your project so that it has permission to create and read resources.
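
As a sketch of the two common authentication flows (these are standard `gcloud` and Application Default Credentials mechanics, not commands specific to this tool):

```
# Authenticate as your user account via Application Default Credentials:
gcloud auth application-default login

# Or point client libraries at a service account key file
# (the path below is a hypothetical placeholder):
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
```
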
Clone the repository onto your machine and navigate inside the directory:
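
For example (a minimal sketch, using the repository URL referenced at the end of this guide):

```
git clone https://github.com/GoogleCloudPlatform/professional-services-data-validator.git
cd professional-services-data-validator
```
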
There are two methods of creating the BigQuery output table for the tool: via *Terraform* or the *Cloud SDK*.

### Cloud Resource Creation - Terraform

By default, Terraform is run inside a test environment and needs to be directed to your project. Perform the following steps to direct the creation of the BigQuery table to your project:

1. Delete the `testenv.tf` file inside the `terraform` folder
2. Open `variables.tf` inside the `terraform` folder and replace `default = "pso-kokoro-resources"` with `default = "YOUR_PROJECT_ID"` (a scripted version of both steps follows)
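
A scripted equivalent of these two steps, assuming GNU `sed` and that you run it from the repo root, might look like:

```
rm terraform/testenv.tf
sed -i 's/pso-kokoro-resources/YOUR_PROJECT_ID/' terraform/variables.tf
```
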
After installing the [terraform CLI tool](https://learn.hashicorp.com/tutorials/terraform/install-cli) and completing the steps above, run the following commands from inside the root of the repo:

```
cd terraform
terraform init
terraform apply
```

### Cloud Resource Creation - Cloud SDK (gcloud)
Install the [Google Cloud SDK](https://cloud.google.com/sdk/docs/install) if necessary.
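
After installing, make sure the SDK points at the project that should hold the results table (a standard SDK setup step; `YOUR_PROJECT_ID` is a placeholder):

```
gcloud config set project YOUR_PROJECT_ID
```
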
Create a dataset for validation results:

```
bq mk pso_data_validator
```

Create a table:

```
bq mk --table \
...
terraform/results_schema.json
```

### Cloud Resource Creation - After success

You should see a dataset named `pso_data_validator` and a table named `results` created inside of your project.
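
To verify from the command line, the standard `bq` commands below should list both resources (assuming the Cloud SDK is installed and authenticated):

```
bq ls                               # the pso_data_validator dataset should appear
bq ls pso_data_validator            # the results table should appear
bq show pso_data_validator.results  # prints the table metadata and schema
```
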
After installing the CLI tool using the instructions below, you will be ready to run data validation commands and output the results to BigQuery. See an example [here](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/examples.md#store-results-in-a-bigquery-table).