106 changes: 106 additions & 0 deletions .claude/implementation/data_sources.md
@@ -0,0 +1,106 @@
# Data lookups module (`modules/data_lookups`)

This module centralizes **read-only** account discovery for cases where YAML references resources that are **not** defined in the same Terraform state (for example, connections that already exist in dbt Cloud, or account-specific GitHub App installation IDs).

It mirrors the intent of `modules/projects_v2/data_sources.tf` on the importer branch, but as an explicit child module with clear inputs and outputs so root orchestration stays predictable.

## When the module is instantiated

Root enables `module.data_lookups` when **either**:

- The merged project YAML contains at least one **`LOOKUP:`** global-connection placeholder (see below), **or**
- `var.dbt_pat` is set (so GitHub installations can be fetched from the dbt Cloud integrations API).

Gating uses `local._lookup_connection_ref_strings` in `variables.tf`; keep that extraction **in sync** with the `lookup_connection_keys` logic in `modules/data_lookups/main.tf`.
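
The gating condition can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual root code: `local.needs_data_lookups` and the module `source` path are assumed names, while `local._lookup_connection_ref_strings`, `var.dbt_pat`, and the `count`-indexed `module.data_lookups` come from this document.

```hcl
locals {
  # Hypothetical flag name; true when either gating condition holds.
  needs_data_lookups = (
    length(local._lookup_connection_ref_strings) > 0 ||
    var.dbt_pat != null
  )
}

module "data_lookups" {
  source = "./modules/data_lookups" # assumed path

  # Zero instances when no lookup is needed, so callers index with [0].
  count = local.needs_data_lookups ? 1 : 0

  # Module inputs omitted; they are not listed in this document.
}
```

Because the module uses `count`, every consumer has to reference `module.data_lookups[0]` and should tolerate the module being absent.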

## `LOOKUP:` global connections

### Syntax

Use a **string** value that starts with `LOOKUP:` followed by the **exact display name** of an existing global connection in the target dbt Cloud account (the `name` field returned by `data.dbtcloud_global_connections`).

Example:

```yaml
environments:
  - name: Prod
    key: prod
    type: deployment
    connection_key: "LOOKUP:Snowflake Production"
```

The map key passed to `modules/environments` and `modules/profiles` is the **full placeholder string** (e.g. `LOOKUP:Snowflake Production`), not the name alone.

### Where placeholders are scanned

- **Environments**: `connection` if set, otherwise `connection_key` (same precedence as `modules/environments` resolution).
- **Profiles**: `connection_key` only.
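
The scan described above might look like the sketch below. It assumes a hypothetical `var.projects` input holding the decoded list of project objects, and `startswith` requires Terraform 1.3 or later; the real extraction in `modules/data_lookups/main.tf` may differ.

```hcl
locals {
  # Environments: `connection` takes precedence over `connection_key`.
  environment_connection_refs = flatten([
    for p in var.projects : [
      for e in try(p.environments, []) :
      try(coalesce(try(e.connection, null), try(e.connection_key, null)), null)
    ]
  ])

  # Profiles: only `connection_key` is considered.
  profile_connection_refs = flatten([
    for p in var.projects : [
      for pr in try(p.profiles, []) : try(pr.connection_key, null)
    ]
  ])

  # Keep only LOOKUP: placeholders; try() turns null references into false.
  lookup_connection_keys = toset([
    for ref in concat(local.environment_connection_refs, local.profile_connection_refs) :
    ref if try(startswith(ref, "LOOKUP:"), false)
  ])
}
```

Collecting into a set deduplicates placeholders that several environments or profiles share.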

### Resolution

1. `data.dbtcloud_global_connections` runs **only** when at least one such placeholder exists (avoids an unnecessary read).
2. `lookup_connection_ids` maps each placeholder to `tostring(connection.id)` where `connection.name == replace(placeholder, "LOOKUP:", "")`.
3. Root builds `local.global_connection_ids_effective`:

`merge(lookup_connection_ids, managed_global_connection_ids)`

**Managed Terraform connections win on key collision** (in practice YAML keys and `LOOKUP:…` keys should not overlap).
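
The three resolution steps can be sketched as follows. The data source label `all`, the `connections` attribute name, and `local.account_connections` are assumptions made for illustration; `one()` is one possible way to get the `null`-on-no-match behaviour and is not necessarily what the module uses.

```hcl
# modules/data_lookups (sketch): read connections only when a placeholder exists.
data "dbtcloud_global_connections" "all" {
  count = length(local.lookup_connection_keys) > 0 ? 1 : 0
}

locals {
  # Assumption: the data source exposes a `connections` collection
  # whose elements carry `id` and `name`.
  account_connections = try(data.dbtcloud_global_connections.all[0].connections, [])

  # one() yields null when no connection matches the display name and
  # raises an error when several connections share it.
  lookup_connection_ids = {
    for placeholder in local.lookup_connection_keys :
    placeholder => one([
      for c in local.account_connections :
      tostring(c.id) if c.name == replace(placeholder, "LOOKUP:", "")
    ])
  }
}

# root (sketch): merge() lets later arguments override earlier ones,
# so managed connection ids win on a key collision.
locals {
  global_connection_ids_effective = merge(
    try(module.data_lookups[0].lookup_connection_ids, {}),
    local.managed_global_connection_ids
  )
}
```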

### Validation (V-01)

`validation.tf` **does not** require `LOOKUP:…` values to appear under `global_connections[]`. Placeholders are intentionally for **pre-existing** connections. If no matching name exists in the account, resolution yields `null` and apply can fail on the environment resource; fixing that is an operational/data issue, not schema validation.

## GitHub App installations

When `var.dbt_pat` is non-null, the module calls:

`GET {dbt_host}/api/v2/integrations/github/installations/`

with `Authorization: Bearer <dbt_pat>`.

Outputs:

- `github_installation_by_owner` — map of **lowercase** GitHub `account.login` → installation **numeric id**.
- `github_installation_fallback_id` — first installation in the filtered list when owner matching is not used.

**Note:** Service tokens cannot use this API; use a PAT. Default host for the HTTP call is `coalesce(var.dbt_host_url, "https://cloud.getdbt.com")` with a trailing `/api` segment stripped if present.
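
A minimal sketch of this call using the `hashicorp/http` data source is shown below. The data source label, the local names, and the response shape (a JSON array of objects with `id` and `account.login`) are assumptions; the filtering step the module applies before choosing the fallback is not reproduced here.

```hcl
locals {
  # Default host, with a trailing "/api" segment removed if present.
  dbt_api_host = trimsuffix(coalesce(var.dbt_host_url, "https://cloud.getdbt.com"), "/api")
}

data "http" "github_installations" {
  count = var.dbt_pat != null ? 1 : 0

  url = "${local.dbt_api_host}/api/v2/integrations/github/installations/"

  request_headers = {
    Authorization = "Bearer ${var.dbt_pat}"
  }
}

locals {
  # Assumption: the response body is a JSON array of installation objects.
  github_installations = try(jsondecode(data.http.github_installations[0].response_body), [])

  # Lowercase owner login => numeric installation id.
  github_installation_by_owner = {
    for inst in local.github_installations :
    lower(inst.account.login) => inst.id
  }

  # First installation, or null when none were returned.
  github_installation_fallback_id = try(local.github_installations[0].id, null)
}
```

In this sketch, two installations with the same lowercase login would produce a duplicate-key error in the map expression, so a real implementation may need to group or deduplicate first.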

### Consumption in `modules/repository`

Root passes `module.data_lookups[0].github_installation_by_owner` and `github_installation_fallback_id` into the repository module when `data_lookups` is enabled (same conditions as above). The repository module resolves **`github_installation_id`** in order:

1. **`repository.github_installation_id`** from YAML, if set
2. **`github_installation_by_owner[lower(owner)]`** where `owner` is parsed from `remote_url` (`github.com/<owner>/…` or `git@github.com:<owner>/…`)
3. **`github_installation_fallback_id`** (first installation returned for the account)

**Auto-detect GitHub** (`remote_url` on github.com, no explicit `git_clone_strategy`) uses **`github_app`** only when a non-null resolved installation id exists **or** `dbt_pat` is set (discovery may fill the id at apply). Otherwise it uses **`deploy_key`**.

Explicit **`git_clone_strategy: github_app`** follows the same rule: without a YAML id, a discovery map entry, a fallback id, or a PAT, the strategy is downgraded to **`deploy_key`**.
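
The resolution order and the strategy choice might be expressed as in the sketch below. The variable names `var.repository`, `var.github_installation_by_owner`, `var.github_installation_fallback_id`, and the boolean `var.dbt_pat_provided` are hypothetical, and the strategy expression is simplified to the github.com case only.

```hcl
locals {
  # Owner segment from "github.com/<owner>/…" or "git@github.com:<owner>/…";
  # null when remote_url is not a github.com URL.
  repo_owner = try(regex("github\\.com[:/]([^/]+)/", var.repository.remote_url)[0], null)

  # First non-null candidate wins: YAML id, owner match, then fallback.
  github_installation_id = try(coalesce(
    try(var.repository.github_installation_id, null),
    try(var.github_installation_by_owner[lower(local.repo_owner)], null),
    var.github_installation_fallback_id
  ), null)

  # Simplified strategy choice; var.dbt_pat_provided is a hypothetical
  # boolean telling the module that a PAT is set.
  git_clone_strategy = (
    local.github_installation_id != null || var.dbt_pat_provided
  ) ? "github_app" : "deploy_key"
}
```

Wrapping `coalesce` in `try` keeps the result `null` when all three candidates are missing, which is what triggers the downgrade to `deploy_key`.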

Root still exposes the GitHub outputs for debugging and for any external callers.

## Repository `LOOKUP:` (scalar, legacy)

If `project.repository` is a **scalar** string beginning with `LOOKUP:` (v2 / importer style), it is collected in `lookup_repository_keys`. There is **no** resolution here yet; repository linking for the current v1 object-shaped `repository` block is unchanged.

## Root outputs

| Output | Meaning |
|--------|---------|
| `connection_ids` | **Effective** map used by environments/profiles (managed + `LOOKUP:`). |
| `lookup_connection_ids` | Only the `LOOKUP:`-resolved entries. |
| `github_installation_by_owner` / `github_installation_fallback_id` | From integrations API when PAT is set. |

## Dependencies

- **Provider**: `hashicorp/http` (declared in root `providers.tf` and the module).
- **dbt Cloud**: `data.dbtcloud_global_connections` uses the default `dbtcloud` provider configuration at root (`dbt_token`, `dbt_account_id`).

## Extending this module

When adding new lookup types:

1. Add **inputs** only if root cannot derive them from existing YAML/locals.
2. Gate **expensive** `data` sources with a `count` tied to a `local.needs_*` flag.
3. Expose stable **outputs**; merge at root if multiple modules need the same id map.
4. Update **this document** and, where relevant, `schemas/v1.json` descriptions.
55 changes: 36 additions & 19 deletions .terraform.lock.hcl


2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -11,6 +11,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changed

- Documentation now matches the **YAML schema version 1** layout: `version: 1`, `account`, `globals` (connections, service tokens, groups, notifications, PrivateLink), environment field **`connection`** (not `connection_key`), `project_artefacts` / `semantic_layer_config`, and job **`environment_variable_overrides`**. Examples and troubleshooting were updated accordingly.

### Fixed

### Removed
86 changes: 50 additions & 36 deletions README.md
@@ -15,7 +15,7 @@ This downloads the [examples/basic/](examples/basic/) starter into `./my-dbt-clo
Then:

```bash
cd my-dbt-platform
cd my-dbt-cloud
cp .env.example .env # fill in your dbt Cloud credentials
# edit dbt-config.yml # replace YOUR_ placeholders with your warehouse details
source .env && terraform init && terraform apply
@@ -69,7 +69,26 @@ module "dbt_cloud" {

**2. Create `dbt-config.yml`**

Configuration uses **`version: 1`**, an **`account`** block (including `host_url` for the dbt Cloud region), shared resources under **`globals`** (connections, service tokens, groups, notifications, PrivateLink endpoints), and a **`projects`** list. Validate in your editor with [`schemas/v1.json`](docs/configuration/yaml-schema.md).

```yaml
# yaml-language-server: $schema=https://raw.githubusercontent.com/trouze/terraform-dbtcloud-yaml/main/schemas/v1.json

version: 1
account:
  name: Your Account
  host_url: https://cloud.getdbt.com

globals:
  connections:
    - name: Databricks Production
      key: databricks_prod
      type: databricks
      details:
        host: adb-1234567890.1.azuredatabricks.net
        http_path: /sql/1.0/warehouses/abc123
        catalog: main

projects:
  - name: Analytics
    key: analytics
@@ -81,7 +100,7 @@ projects:
    environments:
      - name: Production
        key: prod
        connection_key: databricks_prod # references global_connections key below
        connection: databricks_prod # globals.connections[].key (or numeric id / LOOKUP:…)
        deployment_type: production
        type: deployment
        custom_branch: main
@@ -93,8 +112,12 @@ projects:

      - name: Development
        key: dev
        connection_key: databricks_prod
        connection: databricks_prod
        type: development
        credential:
          credential_type: databricks
          catalog: main
          schema: analytics_dev

    jobs:
      - name: Daily Build
@@ -110,14 +133,6 @@ projects:
schedule_type: days_of_week
schedule_days: [1, 2, 3, 4, 5]
schedule_hours: [6]

global_connections:
  - name: Databricks Production
    key: databricks_prod
    type: databricks
    host: adb-1234567890.1.azuredatabricks.net
    http_path: /sql/1.0/warehouses/abc123
    catalog: main
```

**3. Create `terraform.tfvars`**
@@ -157,9 +172,9 @@ Sensitive values are never in the YAML file. They're passed as Terraform variabl

| Variable | Key format | Matches |
|---|---|---|
| `token_map` | `"my_token_name"` | `credential.token_name` in YAML (Databricks legacy) |
| `token_map` | `"my_token_name"` | `credential.token_name` (Databricks legacy) or `jobs[].environment_variable_overrides` values prefixed with `secret_` |
| `environment_credentials` | `"project_key_env_key"` | Environment credential by composite key |
| `connection_credentials` | `"connection_key"` | `global_connections[].key` in YAML |
| `connection_credentials` | `"connection_key"` | `globals.connections[].key` in YAML |
| `lineage_tokens` | `"project_key_integration_key"` | `lineage_integrations[].key` composite |
| `oauth_client_secrets` | `"oauth_config_key"` | `oauth_configurations[].key` in YAML |

@@ -169,27 +184,27 @@ The composite key for `environment_credentials` uses underscores: a project with

## What you can manage

**Account-level**
**Account-level** (optional unless noted; shared connections and RBAC live under `globals` in YAML)
- `account_features` — advanced CI, partial parsing, repo caching flags
- `global_connections` — shared warehouse connections (Databricks, Snowflake, BigQuery, Postgres, Redshift)
- `service_tokens` — API tokens with scoped permissions
- `groups` — user groups with project/account permissions
- `user_groups` — user-to-group assignments
- `notifications` — email, Slack, PagerDuty, webhook alerts
- `globals.connections` — shared warehouse connections (Databricks, Snowflake, BigQuery, Postgres, Redshift, and other adapter types supported by the provider)
- `globals.service_tokens` — API tokens with scoped permissions
- `globals.groups` — user groups with project/account permissions
- `user_groups` — user-to-group assignments (document root)
- `globals.notifications` — job alerts (dbt Cloud user, Slack channel, or external email)
- `oauth_configurations` — OAuth provider configs
- `ip_restrictions` — IP allowlist/denylist rules

**Per-project**
- `repository` — Git integration (GitHub App, GitLab deploy token, Azure DevOps, SSH)
- `environments` — deployment and development environments
- `credentials` — warehouse credentials (14 types: Databricks, Snowflake password/keypair, BigQuery, Postgres, Redshift, Athena, Fabric, Synapse, Starburst, Spark, Teradata)
- `jobs` — scheduled, CI, merge, and on-demand jobs
- `environment_variables` — project and environment-level dbt vars
- `extended_attributes` — connection-level overrides per environment
- `profiles` — links connection + credential + extended attributes
- `lineage_integrations` — Tableau/Looker lineage config
- `artefacts` — docs job and freshness job links
- `semantic_layer` — semantic layer configuration
- `repository` — Git integration (GitHub App, GitLab, Azure DevOps, deploy key/token)
- `environments` — deployment and development environments (reference a global connection with `connection`, or use `primary_profile_key` when using profiles)
- per-environment `credential` — warehouse credentials (many adapter types; secrets via `environment_credentials`)
- `jobs` — scheduled, CI, merge, and other job types; optional `environment_variable_overrides` for job-specific env vars
- `environment_variables` — project- and environment-scoped dbt vars (with map or list `environment_values` forms normalized at apply time)
- `extended_attributes` — connection-level override payloads linked from environments
- `profiles` — link connection, credentials, and extended attributes for deployment environments
- `lineage_integrations` — Tableau / Looker lineage config
- `project_artefacts` — docs job and freshness job keys
- `semantic_layer_config` — semantic layer target environment

---

@@ -198,11 +213,12 @@ The composite key for `environment_credentials` uses underscores: a project with
Set `protected: true` on any resource to prevent accidental deletion:

```yaml
global_connections:
  - name: Databricks Production
    key: databricks_prod
    protected: true # terraform destroy will be blocked for this resource
    ...
globals:
  connections:
    - name: Databricks Production
      key: databricks_prod
      protected: true # terraform destroy will be blocked for this resource
      ...

projects:
  - name: Analytics
@@ -241,8 +257,6 @@ terraform apply -var="yaml_file=./configs/finance.yml"
terraform apply -var="yaml_file=./configs/marketing.yml"
```

**Backward compatibility:** If your existing YAML uses the singular `project:` key, it still works — the module automatically wraps it in a list.

---

## Job scheduling
11 changes: 7 additions & 4 deletions docs/configuration/environment-variables.md
@@ -199,7 +199,7 @@ The key `"analytics_prod"` maps to a project with `key: analytics` and an enviro

### `connection_credentials`

Map of connection credential objects for global connections, keyed by `global_connections[].key`:
Map of connection credential objects for global connections, keyed by `globals.connections[].key`:

```bash
export TF_VAR_connection_credentials='{
@@ -216,13 +216,16 @@ export TF_VAR_connection_credentials='{

### `token_map`

Legacy Databricks token map, keyed by `credential.token_name` in YAML:
Used in two ways:

1. **Legacy Databricks** — keyed by `credential.token_name` in an environment `credential` block.
2. **Job env var overrides** — when a job sets `environment_variable_overrides` and a value starts with `secret_`, the prefix is removed and the remainder is looked up in this map (see [YAML Schema](yaml-schema.md)).

```bash
export TF_VAR_token_map='{"my_databricks_token": "dapi_abc123"}'
export TF_VAR_token_map='{"my_databricks_token": "dapi_abc123", "ci_override_secret": "sensitive-value"}'
```

This is the older pattern. Prefer `environment_credentials` for new setups.
For warehouse credentials, prefer `environment_credentials` over legacy `token_name` when possible.
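
The `secret_` prefix handling for job overrides could be sketched as below. The local names and the example override map are hypothetical, `startswith` requires Terraform 1.3 or later, and the module's real lookup may be implemented differently.

```hcl
locals {
  # Hypothetical job overrides; only the second value is a secret reference.
  example_overrides = {
    DBT_TARGET_NAME = "ci"
    API_KEY         = "secret_ci_override_secret"
  }

  # Values prefixed with "secret_" are swapped for the matching token_map
  # entry (null if the key is missing); other values pass through unchanged.
  resolved_overrides = {
    for name, value in local.example_overrides :
    name => (
      startswith(value, "secret_")
      ? lookup(var.token_map, trimprefix(value, "secret_"), null)
      : value
    )
  }
}
```

With a `token_map` that contains a `ci_override_secret` entry, `API_KEY` resolves to that entry's value while `DBT_TARGET_NAME` stays `ci`.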

### `lineage_tokens`
