
Commit 89f14d4

Merge pull request #94 from bqbooster/fix-docs
Fix documentation and especially configuration one
2 parents eee30a2 + dd9e3c8 commit 89f14d4

File tree: 6 files changed (+134, -122 lines)

docs/audit-logs-vs-information-schema.md renamed to docs/configuration/audit-logs-vs-information-schema.md (+1 -1)

@@ -1,5 +1,5 @@
 ---
-sidebar_position: 5
+sidebar_position: 4.1
 slug: /audit-logs-vs-information-schema
 ---

docs/configuration/audit-logs.md (new file, +29)

@@ -0,0 +1,29 @@
+---
+sidebar_position: 4.2
+slug: /configuration/audit-logs
+---
+
+# GCP BigQuery audit logs
+
+In this mode, the package monitors all the jobs written to a GCP BigQuery audit logs table instead of using the `INFORMATION_SCHEMA.JOBS` one.
+
+:::tip
+
+To get the best out of this mode, you should enable the `should_combine_audit_logs_and_information_schema` setting to combine both sources.
+More details on [the related page](/audit-logs-vs-information-schema).
+
+:::
+
+To enable the "cloud audit logs mode", you'll need to explicitly define the following mandatory settings in the `dbt_project.yml` file:
+
+```yml
+vars:
+  enable_gcp_bigquery_audit_logs: true
+  gcp_bigquery_audit_logs_storage_project: 'my-gcp-project'
+  gcp_bigquery_audit_logs_dataset: 'my_dataset'
+  gcp_bigquery_audit_logs_table: 'my_table'
+  # should_combine_audit_logs_and_information_schema: true # Optional, defaults to false, but you might want to combine both sources
+```
+
+[You can use environment variables as well](/configuration/package-settings).
+
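A hedged aside on the snippet above: since these are plain dbt vars, they can also be passed per invocation through dbt's standard `--vars` flag instead of being hard-coded. The variable names below come from the snippet; the project, dataset, and table values are placeholders.

```bash
# Sketch: enabling audit-logs mode for a single run via dbt's --vars flag.
# Variable names are taken from the dbt_project.yml example above; values are placeholders.
dbt run --vars '{enable_gcp_bigquery_audit_logs: true, gcp_bigquery_audit_logs_storage_project: my-gcp-project, gcp_bigquery_audit_logs_dataset: my_dataset, gcp_bigquery_audit_logs_table: my_table}'
```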

docs/configuration/configuration.md (new file, +66)

@@ -0,0 +1,66 @@
+---
+sidebar_position: 4
+slug: /configuration
+---
+
+# Configuration
+
+Settings have default values that can be overridden using:
+
+- dbt project variables (and therefore also CLI variable overrides; see the sketch below)
+- environment variables
+
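As a hedged illustration of those two override paths (`bq_region` and its `DBT_BQ_MONITORING_REGION` counterpart are documented in the package settings tables; the `eu` value is only an example):

```bash
# Sketch: the same setting overridden both ways.

# 1. dbt CLI variable override (follows dbt's usual variable precedence):
dbt run --vars '{bq_region: eu}'

# 2. Environment variable (name from the package settings tables):
export DBT_BQ_MONITORING_REGION=eu
dbt run
```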
+Please note that the default region is `us` and, at the time of writing, there's no way to query cross-region tables, but you can run this project in each region you want to monitor and [then replicate the tables to a central region](https://cloud.google.com/bigquery/docs/data-replication) to build an aggregated view.
+
+To find the region a job ran in, open `Job history` (bottom panel) in the BigQuery UI, click a job, and check its `Location` field. You can also find the region of a dataset or table by opening its details panel and checking the `Data location` field.
+
+:::tip
+
+To get the best out of this package, you should probably configure all data sources and settings:
+- Choose the [Baseline mode](#modes) that fits your GCP setup
+- [Add metadata to queries](#add-metadata-to-queries-recommended-but-optional)
+- [GCP BigQuery Audit logs](/configuration/audit-logs)
+- [GCP Billing export](/configuration/gcp-billing)
+- [Settings](/configuration/package-settings) (especially the pricing ones)
+
+:::
+
+
+## Modes
+
+### Region mode (default)
+
+In this mode, the package will monitor all the GCP projects in the region specified in the `dbt_project.yml` file.
+
+```yml
+vars:
+  # dbt bigquery monitoring vars
+  bq_region: 'us'
+```
+
+**Requirements**
+
+- The execution project needs to be the same as the storage project; otherwise you'll need to use the second mode.
+- If you have multiple GCP projects in the same region, you should use the "project mode" (with the `input_gcp_projects` setting to specify them), otherwise you will run into errors such as: `Within a standard SQL view, references to tables/views require explicit project IDs unless the entity is created in the same project that is issuing the query, but these references are not project-qualified: "region-us.INFORMATION_SCHEMA.JOBS"`.
+
+### Project mode
+
+To enable the "project mode", you'll need to explicitly define one mandatory setting in the `dbt_project.yml` file:
+
+```yml
+vars:
+  # dbt bigquery monitoring vars
+  input_gcp_projects: [ 'my-gcp-project', 'my-gcp-project-2' ]
+```
+
+## Add metadata to queries (Recommended but optional)
+
+To enhance your query metadata with dbt model information, the package provides a dedicated macro that leverages "dbt query comments" (the header set at the top of each query).
+To configure the query comments, add the following config to `dbt_project.yml`:
+
+```yaml
+query-comment:
+  comment: '{{ dbt_bigquery_monitoring.get_query_comment(node) }}'
+  job-label: True # Use query comment JSON as job labels
+```
+
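For context on what that config produces: dbt renders the query comment and attaches it to every query it executes, so each BigQuery job ends up carrying a JSON header roughly like the sketch below. The exact fields are whatever `get_query_comment(node)` emits; the keys shown here are purely illustrative, not the macro's real output.

```sql
/* Illustrative only: the real JSON comes from
   dbt_bigquery_monitoring.get_query_comment(node); these keys are hypothetical. */
/* {"app": "dbt", "node_id": "model.my_project.my_model"} */
select 1 as placeholder
```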

docs/configuration/gcp-billing.md (new file, +19)

@@ -0,0 +1,19 @@
+---
+sidebar_position: 4.3
+slug: /configuration/gcp-billing
+---
+
+# GCP Billing export
+GCP Billing export is a feature that allows you to export your billing data to BigQuery. It allows the package to track the real cost of your queries and storage over time.
+
+To enable it on the GCP end, you can follow the [official documentation](https://cloud.google.com/billing/docs/how-to/export-data-bigquery) to set up the export.
+
+Then, to enable GCP billing export monitoring in the package, define the following settings in the `dbt_project.yml` file:
+
+```yml
+vars:
+  enable_gcp_billing_export: true
+  gcp_billing_export_storage_project: 'my-gcp-project'
+  gcp_billing_export_dataset: 'my_dataset'
+  gcp_billing_export_table: 'my_table'
+```
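A hedged note on the page above: these source settings also have environment-variable equivalents. `DBT_BQ_MONITORING_GCP_BILLING_EXPORT_DATASET` and `DBT_BQ_MONITORING_GCP_BILLING_EXPORT_TABLE` are documented in the package settings tables; the storage-project name below is an assumption that follows the same naming pattern.

```bash
# Sketch: pointing the package at the billing export via environment variables.
# The DATASET and TABLE names are documented in the package settings tables;
# the STORAGE_PROJECT name is an assumption following the same pattern.
export DBT_BQ_MONITORING_GCP_BILLING_EXPORT_STORAGE_PROJECT=my-gcp-project
export DBT_BQ_MONITORING_GCP_BILLING_EXPORT_DATASET=my_dataset
export DBT_BQ_MONITORING_GCP_BILLING_EXPORT_TABLE=my_table
```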

docs/configuration.md renamed to docs/configuration/package-settings.md (+13 -105)

@@ -1,117 +1,21 @@
 ---
-sidebar_position: 4
-slug: /configuration
+sidebar_position: 4.4
+slug: /configuration/package-settings
 ---

-# Configuration
-
-Settings have default values that can be overriden using:
-
-- dbt project variables (and therefore also by CLI variable override)
-- environment variables
-
-Please note that the default region is `us` and there's no way, at the time of writing, to query cross region tables but you might run that project in each region you want to monitor and [then replicate the tables to a central region](https://cloud.google.com/bigquery/docs/data-replication) to build an aggregated view.
-
-To know which region is related to a job, in the BQ UI, use the `Job history` (bottom panel), take a job and look at `Location` field when clicking on a job. You can also access the region of a dataset/table by opening the details panel of it and check the `Data location` field.
-
-## Modes
-
-### Region mode (default)
-
-In this mode, the package will monitor all the GCP projects in the region specified in the `dbt_project.yml` file.
-
-```yml
-vars:
-  # dbt bigquery monitoring vars
-  bq_region: 'us'
-```
-
-**Requirements**
-
-- Execution project needs to be the same as the storage project else you'll need to use the second mode.
-- If you have multiple GCP Projects in the same region, you should use the "project mode" (with `input_gcp_projects` setting to specify them) as else you will run into errors such as: `Within a standard SQL view, references to tables/views require explicit project IDs unless the entity is created in the same project that is issuing the query, but these references are not project-qualified: "region-us.INFORMATION_SCHEMA.JOBS"`.
-
-### Project mode
-
-To enable the "project mode", you'll need to define explicitly one mandatory setting to set in the `dbt_project.yml` file:
-
-```yml
-vars:
-  # dbt bigquery monitoring vars
-  input_gcp_projects: [ 'my-gcp-project', 'my-gcp-project-2' ]
-```
-
-##### GCP Billing export
-GCP Billing export is a feature that allows you to export your billing data to BigQuery. It allows the package to track the real cost of your queries and storage overtime.
-To enable on GCP end, you can follow the [official documentation](https://cloud.google.com/billing/docs/how-to/export-data-bigquery) to set up the export.
-Then enable the GCP billing export monitoring in the package, you'll need to define the following settings in the `dbt_project.yml` file:
-
-```yml
-vars:
-  enable_gcp_bigquery_audit_logs: true
-  gcp_bigquery_audit_logs_storage_project: 'my-gcp-project'
-  gcp_bigquery_audit_logs_dataset: 'my_dataset'
-  gcp_bigquery_audit_logs_table: 'my_table'
-```
-
-
-
-### BigQuery audit logs mode
-
-In this mode, the package will monitor all the jobs that written to a GCP BigQuery Audit logs table instead of using `INFORMATION_SCHEMA.JOBS` one.
-
-To enable the "cloud audit logs mode", you'll need to define explicitly one mandatory setting to set in the `dbt_project.yml` file:
-
-```yml
-vars:
-  # dbt bigquery monitoring vars
-  bq_region: 'us'
-  cloud_audit_logs_table: 'my-gcp-project.my_dataset.my_table'
-```
-
-[You might use environment variable as well](#gcp-bigquery-audit-logs-configuration).
-
-### GCP Billing export
-
-GCP Billing export is a feature that allows you to export your billing data to BigQuery. It allows the package to track the real cost of your queries and storage overtime.
-
-To enable on GCP end, you can follow the [official documentation](https://cloud.google.com/billing/docs/how-to/export-data-bigquery) to set up the export.
-
-Then enable the GCP billing export monitoring in the package, you'll need to define the following settings in the `dbt_project.yml` file:
-
-```yml
-vars:
-  # dbt bigquery monitoring vars
-  enable_gcp_billing_export: true
-  gcp_billing_export_storage_project: 'my-gcp-project'
-  gcp_billing_export_dataset: 'my_dataset'
-  gcp_billing_export_table: 'my_table'
-```
-
-## Add metadata to queries (Recommended but optional)
-
-To enhance your query metadata with dbt model information, the package provides a dedicated macro that leverage "dbt query comments" (the header set at the top of each query)
-To configure the query comments, add the following config to `dbt_project.yml`.
-
-```yaml
-query-comment:
-  comment: '{{ dbt_bigquery_monitoring.get_query_comment(node) }}'
-  job-label: True # Use query comment JSON as job labels
-```
-
-## Customizing the package configuration
+# Customizing the package settings

 The following settings can be overridden to customize the package configuration.
 To do so, you can set the following variables in your `dbt_project.yml` file or use environment variables.

-### Environment
+## Environment

 | Variable | Environment Variable | Description | Default |
 |----------|-------------------|-------------|---------|
 | `input_gcp_projects` | `DBT_BQ_MONITORING_GCP_PROJECTS` | List of GCP projects to monitor | `[]` |
 | `bq_region` | `DBT_BQ_MONITORING_REGION` | Region where the monitored projects are located | `us` |
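A minimal sketch of overriding both environment settings in `dbt_project.yml`; the variable names come from the table above, while the project list and region are placeholders:

```yml
vars:
  # dbt bigquery monitoring vars (names from the table above; values are placeholders)
  input_gcp_projects: [ 'my-gcp-project', 'my-gcp-project-2' ]
  bq_region: 'eu'
```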

-### Pricing
+## Pricing

 | Variable | Environment Variable | Description | Default |
 |----------|-------------------|-------------|---------|
@@ -126,7 +30,9 @@ To do so, you can set the following variables in your `dbt_project.yml` file or
 | `bi_engine_gb_hourly_price` | `DBT_BQ_MONITORING_BI_ENGINE_GB_HOURLY_PRICE` | Hourly price in US dollars per BI engine GB of memory | `0.0416` |
 | `free_storage_gb_per_month` | `DBT_BQ_MONITORING_FREE_STORAGE_GB_PER_MONTH` | Free storage GB per month | `10` |

-### Package
+## Package
+
+These settings are used to configure how dbt will run and materialize the models.

 | Variable | Environment Variable | Description | Default |
 |----------|-------------------|-------------|---------|
@@ -136,7 +42,9 @@ To do so, you can set the following variables in your `dbt_project.yml` file or
 | `output_partition_expiration_days` | `DBT_BQ_MONITORING_OUTPUT_LIMIT_SIZE` | Default table expiration in days for incremental models | `365` days |
 | `use_copy_partitions` | `DBT_BQ_MONITORING_USE_COPY_PARTITIONS` | Whether to use copy partitions or not | `true` |

-#### GCP Billing export configuration
+### GCP Billing export configuration
+
+See [GCP Billing export](/configuration/gcp-billing) for more information.

 | Variable | Environment Variable | Description | Default |
 |----------|-------------------|-------------|---------|
@@ -145,9 +53,9 @@ To do so, you can set the following variables in your `dbt_project.yml` file or
 | `gcp_billing_export_dataset` | `DBT_BQ_MONITORING_GCP_BILLING_EXPORT_DATASET` | The dataset for GCP billing export data | `'placeholder'` if enabled, `None` otherwise |
 | `gcp_billing_export_table` | `DBT_BQ_MONITORING_GCP_BILLING_EXPORT_TABLE` | The table for GCP billing export data | `'placeholder'` if enabled, `None` otherwise |

-#### GCP BigQuery Audit logs configuration
+### GCP BigQuery Audit logs configuration

-See [GCP BigQuery Audit logs](#bigquery-audit-logs-mode) for more information.
+See [GCP BigQuery Audit logs](/configuration/audit-logs) for more information.

 | Variable | Environment Variable | Description | Default |
 |----------|-------------------|-------------|---------|

docs/contributing.md (+6 -16)

@@ -10,8 +10,6 @@ slug: /contributing
 You're free to use the environment management tools you prefer, but if you're familiar with those, you can use the following:

 - pipx (to isolate the global tools from your local environment)
-- tox (to run the tests)
-- pre-commit (to run the linter)
 - SQLFluff (to lint SQL)
 - changie (to generate CHANGELOG entries)


@@ -27,20 +25,12 @@ pipx ensurepath
 Then you'll be able to install SQLFluff with pipx:

 ```bash
-pipx install tox
-pipx install pre-commit
 pipx install sqlfluff
 ```

 To install changie, there are a few options depending on your OS.
 See the [installation guide](https://changie.dev/guide/installation/) for more details.

-To configure pre-commit hooks:
-
-```bash
-pre-commit install
-```
-
 To configure your dbt profile, run the following command and follow the prompts:

 ```bash
@@ -52,7 +42,7 @@ dbt init
 - Fork the repo
 - Create a branch from `main`
 - Make your changes
-- Run `tox` to run the tests
+- Run the tests
 - Create your changelog entry with `changie new` (don't edit the CHANGELOG.md directly)
 - Commit your changes (it will run the linter through pre-commit)
 - Push your branch and open a PR on the repository
@@ -71,27 +61,27 @@ We use SQLFluff to keep SQL style consistent. By installing `pre-commit` per the

 Lint all models in the /models directory:
 ```bash
-tox -e lint_all
+sqlfluff lint
 ```

 Fix all models in the /models directory:
 ```bash
-tox -e fix_all
+sqlfluff fix
 ```

 Lint (or substitute `lint` with `fix`) a specific model:
 ```bash
-tox -e lint -- models/path/to/model.sql
+sqlfluff lint -- models/path/to/model.sql
 ```

 Lint (or substitute `lint` with `fix`) a specific directory:
 ```bash
-tox -e lint -- models/path/to/directory
+sqlfluff lint -- models/path/to/directory
 ```

 #### Rules

-Enforced rules are defined within `tox.ini`. To view the full list of available rules and their configuration, see the [SQLFluff documentation](https://docs.sqlfluff.com/en/stable/rules.html).
+Enforced rules are defined within `.sqlfluff`. To view the full list of available rules and their configuration, see the [SQLFluff documentation](https://docs.sqlfluff.com/en/stable/rules.html).

 ## Generation of dbt base google models
