
Commit 89f14d4

Merge pull request #94 from bqbooster/fix-docs
Fix documentation and especially configuration one
2 parents eee30a2 + dd9e3c8 commit 89f14d4

File tree: 6 files changed (+134, -122 lines)

docs/audit-logs-vs-information-schema.md renamed to docs/configuration/audit-logs-vs-information-schema.md (+1 -1)

@@ -1,5 +1,5 @@
 ---
-sidebar_position: 5
+sidebar_position: 4.1
 slug: /audit-logs-vs-information-schema
 ---

docs/configuration/audit-logs.md (new file, +29)

@@ -0,0 +1,29 @@
+---
+sidebar_position: 4.2
+slug: /configuration/audit-logs
+---
+
+# GCP BigQuery audit logs
+
+In this mode, the package monitors all the jobs written to a GCP BigQuery audit logs table instead of using the `INFORMATION_SCHEMA.JOBS` one.
+
+:::tip
+
+To get the best out of this mode, you should enable the `should_combine_audit_logs_and_information_schema` setting to combine both sources.
+More details on [the related page](/audit-logs-vs-information-schema).
+
+:::
+
+To enable the "cloud audit logs mode", you'll need to explicitly define the following mandatory settings in the `dbt_project.yml` file:
+
+```yml
+vars:
+  enable_gcp_bigquery_audit_logs: true
+  gcp_bigquery_audit_logs_storage_project: 'my-gcp-project'
+  gcp_bigquery_audit_logs_dataset: 'my_dataset'
+  gcp_bigquery_audit_logs_table: 'my_table'
+  # should_combine_audit_logs_and_information_schema: true # Optional, defaults to false, but you might want to combine both sources
+```
+
+[You can use environment variables as well](/configuration/package-settings).
+
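A hedged aside on the snippet above: since these are plain dbt vars, they can also be passed per invocation through dbt's standard `--vars` flag instead of being hard-coded. The variable names below come from the snippet; the project, dataset, and table values are placeholders.

```bash
# Sketch: enabling audit-logs mode for a single run via dbt's --vars flag.
# Variable names are taken from the dbt_project.yml example above; values are placeholders.
dbt run --vars '{enable_gcp_bigquery_audit_logs: true, gcp_bigquery_audit_logs_storage_project: my-gcp-project, gcp_bigquery_audit_logs_dataset: my_dataset, gcp_bigquery_audit_logs_table: my_table}'
```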

docs/configuration/configuration.md (new file, +66)

@@ -0,0 +1,66 @@
+---
+sidebar_position: 4
+slug: /configuration
+---
+
+# Configuration
+
+Settings have default values that can be overridden using:
+
+- dbt project variables (and therefore also CLI variable overrides; see the sketch below)
+- environment variables
+
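As a hedged illustration of those two override paths (`bq_region` and its `DBT_BQ_MONITORING_REGION` counterpart are documented in the package settings tables; the `eu` value is only an example):

```bash
# Sketch: the same setting overridden both ways.

# 1. dbt CLI variable override (follows dbt's usual variable precedence):
dbt run --vars '{bq_region: eu}'

# 2. Environment variable (name from the package settings tables):
export DBT_BQ_MONITORING_REGION=eu
dbt run
```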
+Please note that the default region is `us` and, at the time of writing, there's no way to query cross-region tables, but you can run this project in each region you want to monitor and [then replicate the tables to a central region](https://cloud.google.com/bigquery/docs/data-replication) to build an aggregated view.
+
+To find the region a job ran in, open `Job history` (bottom panel) in the BigQuery UI, click a job, and check its `Location` field. You can also find the region of a dataset or table by opening its details panel and checking the `Data location` field.
+
+:::tip
+
+To get the best out of this package, you should probably configure all data sources and settings:
+- Choose the [Baseline mode](#modes) that fits your GCP setup
+- [Add metadata to queries](#add-metadata-to-queries-recommended-but-optional)
+- [GCP BigQuery Audit logs](/configuration/audit-logs)
+- [GCP Billing export](/configuration/gcp-billing)
+- [Settings](/configuration/package-settings) (especially the pricing ones)
+
+:::
+
+
+## Modes
+
+### Region mode (default)
+
+In this mode, the package will monitor all the GCP projects in the region specified in the `dbt_project.yml` file.
+
+```yml
+vars:
+  # dbt bigquery monitoring vars
+  bq_region: 'us'
+```
+
+**Requirements**
+
+- The execution project needs to be the same as the storage project; otherwise you'll need to use the second mode.
+- If you have multiple GCP projects in the same region, you should use the "project mode" (with the `input_gcp_projects` setting to specify them), otherwise you will run into errors such as: `Within a standard SQL view, references to tables/views require explicit project IDs unless the entity is created in the same project that is issuing the query, but these references are not project-qualified: "region-us.INFORMATION_SCHEMA.JOBS"`.
+
+### Project mode
+
+To enable the "project mode", you'll need to explicitly define one mandatory setting in the `dbt_project.yml` file:
+
+```yml
+vars:
+  # dbt bigquery monitoring vars
+  input_gcp_projects: [ 'my-gcp-project', 'my-gcp-project-2' ]
+```
+
+## Add metadata to queries (Recommended but optional)
+
+To enhance your query metadata with dbt model information, the package provides a dedicated macro that leverages "dbt query comments" (the header set at the top of each query).
+To configure the query comments, add the following config to `dbt_project.yml`:
+
+```yaml
+query-comment:
+  comment: '{{ dbt_bigquery_monitoring.get_query_comment(node) }}'
+  job-label: True # Use query comment JSON as job labels
+```
+
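For context on what that config produces: dbt renders the query comment and attaches it to every query it executes, so each BigQuery job ends up carrying a JSON header roughly like the sketch below. The exact fields are whatever `get_query_comment(node)` emits; the keys shown here are purely illustrative, not the macro's real output.

```sql
/* Illustrative only: the real JSON comes from
   dbt_bigquery_monitoring.get_query_comment(node); these keys are hypothetical. */
/* {"app": "dbt", "node_id": "model.my_project.my_model"} */
select 1 as placeholder
```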

docs/configuration/gcp-billing.md (new file, +19)

@@ -0,0 +1,19 @@
+---
+sidebar_position: 4.3
+slug: /configuration/gcp-billing
+---
+
+# GCP Billing export
+GCP Billing export is a feature that allows you to export your billing data to BigQuery. It allows the package to track the real cost of your queries and storage over time.
+
+To enable it on the GCP end, you can follow the [official documentation](https://cloud.google.com/billing/docs/how-to/export-data-bigquery) to set up the export.
+
+Then, to enable GCP billing export monitoring in the package, define the following settings in the `dbt_project.yml` file:
+
+```yml
+vars:
+  enable_gcp_billing_export: true
+  gcp_billing_export_storage_project: 'my-gcp-project'
+  gcp_billing_export_dataset: 'my_dataset'
+  gcp_billing_export_table: 'my_table'
+```
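A hedged note on the page above: these source settings also have environment-variable equivalents. `DBT_BQ_MONITORING_GCP_BILLING_EXPORT_DATASET` and `DBT_BQ_MONITORING_GCP_BILLING_EXPORT_TABLE` are documented in the package settings tables; the storage-project name below is an assumption that follows the same naming pattern.

```bash
# Sketch: pointing the package at the billing export via environment variables.
# The DATASET and TABLE names are documented in the package settings tables;
# the STORAGE_PROJECT name is an assumption following the same pattern.
export DBT_BQ_MONITORING_GCP_BILLING_EXPORT_STORAGE_PROJECT=my-gcp-project
export DBT_BQ_MONITORING_GCP_BILLING_EXPORT_DATASET=my_dataset
export DBT_BQ_MONITORING_GCP_BILLING_EXPORT_TABLE=my_table
```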

docs/configuration.md renamed to docs/configuration/package-settings.md (+13 -105)

@@ -1,117 +1,21 @@
 ---
-sidebar_position: 4
-slug: /configuration
+sidebar_position: 4.4
+slug: /configuration/package-settings
 ---

-# Configuration
-
-Settings have default values that can be overriden using:
-
-- dbt project variables (and therefore also by CLI variable override)
-- environment variables
-
-Please note that the default region is `us` and there's no way, at the time of writing, to query cross region tables but you might run that project in each region you want to monitor and [then replicate the tables to a central region](https://cloud.google.com/bigquery/docs/data-replication) to build an aggregated view.
-
-To know which region is related to a job, in the BQ UI, use the `Job history` (bottom panel), take a job and look at `Location` field when clicking on a job. You can also access the region of a dataset/table by opening the details panel of it and check the `Data location` field.
-
-## Modes
-
-### Region mode (default)
-
-In this mode, the package will monitor all the GCP projects in the region specified in the `dbt_project.yml` file.
-
-```yml
-vars:
-  # dbt bigquery monitoring vars
-  bq_region: 'us'
-```
-
-**Requirements**
-
-- Execution project needs to be the same as the storage project else you'll need to use the second mode.
-- If you have multiple GCP Projects in the same region, you should use the "project mode" (with `input_gcp_projects` setting to specify them) as else you will run into errors such as: `Within a standard SQL view, references to tables/views require explicit project IDs unless the entity is created in the same project that is issuing the query, but these references are not project-qualified: "region-us.INFORMATION_SCHEMA.JOBS"`.
-
-### Project mode
-
-To enable the "project mode", you'll need to define explicitly one mandatory setting to set in the `dbt_project.yml` file:
-
-```yml
-vars:
-  # dbt bigquery monitoring vars
-  input_gcp_projects: [ 'my-gcp-project', 'my-gcp-project-2' ]
-```
-
-##### GCP Billing export
-GCP Billing export is a feature that allows you to export your billing data to BigQuery. It allows the package to track the real cost of your queries and storage overtime.
-To enable on GCP end, you can follow the [official documentation](https://cloud.google.com/billing/docs/how-to/export-data-bigquery) to set up the export.
-Then enable the GCP billing export monitoring in the package, you'll need to define the following settings in the `dbt_project.yml` file:
-
-```yml
-vars:
-  enable_gcp_bigquery_audit_logs: true
-  gcp_bigquery_audit_logs_storage_project: 'my-gcp-project'
-  gcp_bigquery_audit_logs_dataset: 'my_dataset'
-  gcp_bigquery_audit_logs_table: 'my_table'
-```
-
-
-
-### BigQuery audit logs mode
-
-In this mode, the package will monitor all the jobs that written to a GCP BigQuery Audit logs table instead of using `INFORMATION_SCHEMA.JOBS` one.
-
-To enable the "cloud audit logs mode", you'll need to define explicitly one mandatory setting to set in the `dbt_project.yml` file:
-
-```yml
-vars:
-  # dbt bigquery monitoring vars
-  bq_region: 'us'
-  cloud_audit_logs_table: 'my-gcp-project.my_dataset.my_table'
-```
-
-[You might use environment variable as well](#gcp-bigquery-audit-logs-configuration).
-
-### GCP Billing export
-
-GCP Billing export is a feature that allows you to export your billing data to BigQuery. It allows the package to track the real cost of your queries and storage overtime.
-
-To enable on GCP end, you can follow the [official documentation](https://cloud.google.com/billing/docs/how-to/export-data-bigquery) to set up the export.
-
-Then enable the GCP billing export monitoring in the package, you'll need to define the following settings in the `dbt_project.yml` file:
-
-```yml
-vars:
-  # dbt bigquery monitoring vars
-  enable_gcp_billing_export: true
-  gcp_billing_export_storage_project: 'my-gcp-project'
-  gcp_billing_export_dataset: 'my_dataset'
-  gcp_billing_export_table: 'my_table'
-```
-
-## Add metadata to queries (Recommended but optional)
-
-To enhance your query metadata with dbt model information, the package provides a dedicated macro that leverage "dbt query comments" (the header set at the top of each query)
-To configure the query comments, add the following config to `dbt_project.yml`.
-
-```yaml
-query-comment:
-  comment: '{{ dbt_bigquery_monitoring.get_query_comment(node) }}'
-  job-label: True # Use query comment JSON as job labels
-```
-
-## Customizing the package configuration
+# Customizing the package settings

 The following settings can be overridden to customize the package configuration.
 To do so, you can set the following variables in your `dbt_project.yml` file or use environment variables.

-### Environment
+## Environment

 | Variable | Environment Variable | Description | Default |
 |----------|-------------------|-------------|---------|
 | `input_gcp_projects` | `DBT_BQ_MONITORING_GCP_PROJECTS` | List of GCP projects to monitor | `[]` |
 | `bq_region` | `DBT_BQ_MONITORING_REGION` | Region where the monitored projects are located | `us` |
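A minimal sketch of overriding both environment settings in `dbt_project.yml`; the variable names come from the table above, while the project list and region are placeholders:

```yml
vars:
  # dbt bigquery monitoring vars (names from the table above; values are placeholders)
  input_gcp_projects: [ 'my-gcp-project', 'my-gcp-project-2' ]
  bq_region: 'eu'
```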

-### Pricing
+## Pricing

 | Variable | Environment Variable | Description | Default |
 |----------|-------------------|-------------|---------|
@@ -126,7 +30,9 @@ To do so, you can set the following variables in your `dbt_project.yml` file or
 | `bi_engine_gb_hourly_price` | `DBT_BQ_MONITORING_BI_ENGINE_GB_HOURLY_PRICE` | Hourly price in US dollars per BI engine GB of memory | `0.0416` |
 | `free_storage_gb_per_month` | `DBT_BQ_MONITORING_FREE_STORAGE_GB_PER_MONTH` | Free storage GB per month | `10` |

-### Package
+## Package
+
+These settings are used to configure how dbt will run and materialize the models.

 | Variable | Environment Variable | Description | Default |
 |----------|-------------------|-------------|---------|
@@ -136,7 +42,9 @@ To do so, you can set the following variables in your `dbt_project.yml` file or
 | `output_partition_expiration_days` | `DBT_BQ_MONITORING_OUTPUT_LIMIT_SIZE` | Default table expiration in days for incremental models | `365` days |
 | `use_copy_partitions` | `DBT_BQ_MONITORING_USE_COPY_PARTITIONS` | Whether to use copy partitions or not | `true` |

-#### GCP Billing export configuration
+### GCP Billing export configuration
+
+See [GCP Billing export](/configuration/gcp-billing) for more information.

 | Variable | Environment Variable | Description | Default |
 |----------|-------------------|-------------|---------|
@@ -145,9 +53,9 @@ To do so, you can set the following variables in your `dbt_project.yml` file or
 | `gcp_billing_export_dataset` | `DBT_BQ_MONITORING_GCP_BILLING_EXPORT_DATASET` | The dataset for GCP billing export data | `'placeholder'` if enabled, `None` otherwise |
 | `gcp_billing_export_table` | `DBT_BQ_MONITORING_GCP_BILLING_EXPORT_TABLE` | The table for GCP billing export data | `'placeholder'` if enabled, `None` otherwise |

-#### GCP BigQuery Audit logs configuration
+### GCP BigQuery Audit logs configuration

-See [GCP BigQuery Audit logs](#bigquery-audit-logs-mode) for more information.
+See [GCP BigQuery Audit logs](/configuration/audit-logs) for more information.

 | Variable | Environment Variable | Description | Default |
 |----------|-------------------|-------------|---------|

docs/contributing.md (+6 -16)

@@ -10,8 +10,6 @@ slug: /contributing
 You're free to use the environment management tools you prefer, but if you're familiar with those, you can use the following:

 - pipx (to isolate the global tools from your local environment)
-- tox (to run the tests)
-- pre-commit (to run the linter)
 - SQLFluff (to lint SQL)
 - changie (to generate CHANGELOG entries)


@@ -27,20 +25,12 @@ pipx ensurepath
 Then you'll be able to install SQLFluff with pipx:

 ```bash
-pipx install tox
-pipx install pre-commit
 pipx install sqlfluff
 ```

 To install changie, there are a few options depending on your OS.
 See the [installation guide](https://changie.dev/guide/installation/) for more details.

-To configure pre-commit hooks:
-
-```bash
-pre-commit install
-```
-
 To configure your dbt profile, run the following command and follow the prompts:

 ```bash
@@ -52,7 +42,7 @@ dbt init
 - Fork the repo
 - Create a branch from `main`
 - Make your changes
-- Run `tox` to run the tests
+- Run the tests
 - Create your changelog entry with `changie new` (don't edit the CHANGELOG.md directly)
 - Commit your changes (it will run the linter through pre-commit)
 - Push your branch and open a PR on the repository
@@ -71,27 +61,27 @@ We use SQLFluff to keep SQL style consistent. By installing `pre-commit` per the

 Lint all models in the /models directory:
 ```bash
-tox -e lint_all
+sqlfluff lint
 ```

 Fix all models in the /models directory:
 ```bash
-tox -e fix_all
+sqlfluff fix
 ```

 Lint (or substitute `lint` with `fix`) a specific model:
 ```bash
-tox -e lint -- models/path/to/model.sql
+sqlfluff lint -- models/path/to/model.sql
 ```

 Lint (or substitute `lint` with `fix`) a specific directory:
 ```bash
-tox -e lint -- models/path/to/directory
+sqlfluff lint -- models/path/to/directory
 ```

 #### Rules

-Enforced rules are defined within `tox.ini`. To view the full list of available rules and their configuration, see the [SQLFluff documentation](https://docs.sqlfluff.com/en/stable/rules.html).
+Enforced rules are defined within `.sqlfluff`. To view the full list of available rules and their configuration, see the [SQLFluff documentation](https://docs.sqlfluff.com/en/stable/rules.html).

 ## Generation of dbt base google models
