trouze · Apr 7, 2026
@@ -0,0 +1,106 @@
+# Data lookups module (`modules/data_lookups`)
+
+This module centralizes **read-only** account discovery used when YAML references resources that are **not** defined in the same Terraform state (for example connections that already exist in dbt Cloud, or GitHub App installation IDs that are account-specific).
+
+It mirrors the intent of `modules/projects_v2/data_sources.tf` on the importer branch, but as an explicit child module with clear inputs and outputs so root orchestration stays predictable.
+
+## When the module is instantiated
+
+Root enables `module.data_lookups` when **either**:
+
+- The merged project YAML contains at least one **`LOOKUP:`** global-connection placeholder (see below), **or**
+- `var.dbt_pat` is set (so GitHub installations can be fetched from the dbt Cloud integrations API).
+
+Gating uses `local._lookup_connection_ref_strings` in `variables.tf`; keep that extraction **in sync** with the `lookup_connection_keys` logic in `modules/data_lookups/main.tf`.
+
+## `LOOKUP:` global connections
+
+### Syntax
+
+Use a **string** value that starts with `LOOKUP:` followed by the **exact display name** of an existing global connection in the target dbt Cloud account (the `name` field returned by `data.dbtcloud_global_connections`).
+
+Example:
+
+```yaml
+environments:
+  - name: Prod
+    key: prod
+    type: deployment
+    connection_key: "LOOKUP:Snowflake Production"
+```
+
+The map key passed to `modules/environments` and `modules/profiles` is the **full placeholder string** (e.g. `LOOKUP:Snowflake Production`), not the name alone.
+
+### Where placeholders are scanned
+
+- **Environments**: `connection` if set, otherwise `connection_key` (same precedence as `modules/environments` resolution).
+- **Profiles**: `connection_key` only.
+
+### Resolution
+
+1. `data.dbtcloud_global_connections` runs **only** when at least one such placeholder exists (avoids an unnecessary read).
+2. `lookup_connection_ids` maps each placeholder to `tostring(connection.id)` where `connection.name == replace(placeholder, "LOOKUP:", "")`.
+3. Root builds `local.global_connection_ids_effective`:
+
+   `merge(lookup_connection_ids, managed_global_connection_ids)`
+
+   **Managed Terraform connections win on key collision** (in practice YAML keys and `LOOKUP:…` keys should not overlap).
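+
+The three steps above can be sketched roughly as follows. This is an illustration, not the module's actual source: the variable names, the data source label `all`, and the `connections` attribute name are assumptions. Only the `LOOKUP:` prefix handling and the `merge` order come from the description above.
+
+```hcl
+# Hypothetical inputs; the real module may derive these from locals instead.
+variable "lookup_connection_keys" {
+  type    = set(string) # e.g. ["LOOKUP:Snowflake Production"]
+  default = []
+}
+
+variable "managed_global_connection_ids" {
+  type    = map(string) # YAML connection key => id of a Terraform-managed connection
+  default = {}
+}
+
+# Step 1: read the account's connections only when a placeholder exists.
+data "dbtcloud_global_connections" "all" {
+  count = length(var.lookup_connection_keys) > 0 ? 1 : 0
+}
+
+locals {
+  # Flatten the 0-or-1 data source instances into one list of connections.
+  # Assumes each instance exposes a `connections` collection with `id` and `name`.
+  account_connections = flatten([
+    for d in data.dbtcloud_global_connections.all : tolist(d.connections)
+  ])
+
+  # Assumes display names are unique; duplicate names would fail here.
+  connection_id_by_name = {
+    for c in local.account_connections : c.name => tostring(c.id)
+  }
+
+  # Step 2: the map key stays the full placeholder; an unmatched name yields null.
+  lookup_connection_ids = {
+    for placeholder in var.lookup_connection_keys :
+    placeholder => try(
+      local.connection_id_by_name[replace(placeholder, "LOOKUP:", "")],
+      null
+    )
+  }
+
+  # Step 3: later arguments to merge() win, so managed connections take precedence.
+  global_connection_ids_effective = merge(
+    local.lookup_connection_ids,
+    var.managed_global_connection_ids
+  )
+}
+```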
+
+### Validation (V-01)
+
+`validation.tf` **does not** require `LOOKUP:…` values to appear under `global_connections[]`, because placeholders are intentionally for **pre-existing** connections. If no connection with a matching name exists in the account, the placeholder resolves to `null` and `terraform apply` can fail on the environment resource. That is an operational/data problem to fix in the account or in the YAML, not something schema validation catches.
+
+## GitHub App installations
+
+When `var.dbt_pat` is non-null, the module calls:
+
+`GET {dbt_host}/api/v2/integrations/github/installations/`
+
+with `Authorization: Bearer <dbt_pat>`.
+
+Outputs:
+
+- `github_installation_by_owner` — map of **lowercase** GitHub `account.login` → installation **numeric id**.
+- `github_installation_fallback_id` — first installation in the filtered list when owner matching is not used.
+
+**Note:** Service tokens cannot use this API; use a PAT. Default host for the HTTP call is `coalesce(var.dbt_host_url, "https://cloud.getdbt.com")` with a trailing `/api` segment stripped if present.
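+
+A minimal sketch of that call using the `hashicorp/http` data source is shown below. The variable declarations, the data source label, and the local names are hypothetical. The response is assumed to be a JSON array of objects carrying `id` and `account.login`, and any filtering the module applies before picking the fallback is not reproduced here.
+
+```hcl
+# Hypothetical variable declarations (sensitivity handling omitted for brevity).
+variable "dbt_pat" {
+  type    = string
+  default = null
+}
+
+variable "dbt_host_url" {
+  type    = string
+  default = null
+}
+
+locals {
+  # Default host, with a trailing slash and a trailing "/api" segment stripped.
+  dbt_host = trimsuffix(
+    trimsuffix(coalesce(var.dbt_host_url, "https://cloud.getdbt.com"), "/"),
+    "/api"
+  )
+}
+
+# Only issue the request when a PAT is available.
+data "http" "github_installations" {
+  count = var.dbt_pat != null ? 1 : 0
+  url   = "${local.dbt_host}/api/v2/integrations/github/installations/"
+
+  request_headers = {
+    Authorization = "Bearer ${var.dbt_pat}"
+  }
+}
+
+locals {
+  # Assumption: the body decodes to a list of installation objects.
+  installations = flatten([
+    for r in data.http.github_installations : jsondecode(r.response_body)
+  ])
+
+  # Lowercase owner login => installation id (assumes one installation per owner).
+  github_installation_by_owner = {
+    for i in local.installations : lower(i.account.login) => i.id
+  }
+
+  # First installation, or null when the list is empty or no PAT was given.
+  github_installation_fallback_id = try(local.installations[0].id, null)
+}
+```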
+
+### Consumption in `modules/repository`
+
+Root passes `module.data_lookups[0].github_installation_by_owner` and `github_installation_fallback_id` into the repository module when `data_lookups` is enabled (same conditions as above). The repository module resolves **`github_installation_id`** in order:
+
+1. **`repository.github_installation_id`** from YAML, if set  
+2. **`github_installation_by_owner[lower(owner)]`** where `owner` is parsed from `remote_url` (`github.com/<owner>/…` or `git@github.com:<owner>/…`)  
+3. **`github_installation_fallback_id`** (first installation returned for the account)
+
+**Auto-detect GitHub** (`remote_url` on github.com, no explicit `git_clone_strategy`) uses **`github_app`** only when a non-null resolved installation id exists **or** `dbt_pat` is set (discovery may fill the id at apply). Otherwise it uses **`deploy_key`**.
+
+An explicit **`git_clone_strategy: github_app`** follows the same rule: if there is no YAML id, no discovery map entry, no fallback id, and no PAT, the strategy downgrades to **`deploy_key`**.
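+
+The resolution order and the strategy rule can be expressed compactly in HCL. The sketch below is an assumption about how such logic might look, not a copy of `modules/repository`. All variable names are hypothetical, and the regular expression is one simple way to cover both URL forms mentioned above.
+
+```hcl
+# Hypothetical inputs for illustration.
+variable "remote_url" {
+  type = string # e.g. "git@github.com:my-org/analytics.git"
+}
+
+variable "yaml_github_installation_id" {
+  type    = number
+  default = null
+}
+
+variable "github_installation_by_owner" {
+  type    = map(number)
+  default = {}
+}
+
+variable "github_installation_fallback_id" {
+  type    = number
+  default = null
+}
+
+variable "dbt_pat" {
+  type    = string
+  default = null
+}
+
+locals {
+  # Capture <owner> from "github.com/<owner>/..." or "git@github.com:<owner>/...".
+  owner_matches = regexall("github\\.com[:/]([^/]+)/", var.remote_url)
+  owner         = try(lower(local.owner_matches[0][0]), null)
+
+  # 1. YAML id, 2. owner match, 3. fallback; null when none is available.
+  github_installation_id = try(coalesce(
+    var.yaml_github_installation_id,
+    try(var.github_installation_by_owner[local.owner], null),
+    var.github_installation_fallback_id
+  ), null)
+
+  # github_app needs a resolved id, or a PAT that may resolve one at apply time.
+  git_clone_strategy = (
+    local.github_installation_id != null || var.dbt_pat != null
+  ) ? "github_app" : "deploy_key"
+}
+```
+
+`coalesce` raises an error when every argument is null, so the outer `try` converts that case into `null` and lets the strategy fall through to `deploy_key`.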
+
+Root still exposes the GitHub outputs for debugging and for any external callers.
+
+## Repository `LOOKUP:` (scalar, legacy)
+
+If `project.repository` is a **scalar** string beginning with `LOOKUP:` (v2 / importer style), it is collected in `lookup_repository_keys`. There is **no** resolution here yet; repository linking for the current v1 object-shaped `repository` block is unchanged.
+
+## Root outputs
+
+| Output | Meaning |
+|--------|---------|
+| `connection_ids` | **Effective** map used by environments/profiles (managed + `LOOKUP:`). |
+| `lookup_connection_ids` | Only the `LOOKUP:`-resolved entries. |
+| `github_installation_by_owner` / `github_installation_fallback_id` | From integrations API when PAT is set. |
+
+## Dependencies
+
+- **Provider**: `hashicorp/http` (declared in root `providers.tf` and the module).
+- **dbt Cloud**: `data.dbtcloud_global_connections` uses the default `dbtcloud` provider configuration at root (`dbt_token`, `dbt_account_id`).
+
+## Extending this module
+
+When adding new lookup types:
+
+1. Add **inputs** only if root cannot derive them from existing YAML/locals.
+2. Gate **expensive** `data` sources with a `count` tied to a `local.needs_*` flag.
+3. Expose stable **outputs**; merge at root if multiple modules need the same id map.
+4. Update **this document** and, where relevant, `schemas/v1.json` descriptions.
@@ -11,6 +11,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Changed
 
+- Documentation now matches the **YAML schema version 1** layout: `version: 1`, `account`, `globals` (connections, service tokens, groups, notifications, PrivateLink), environment field **`connection`** (not `connection_key`), `project_artefacts` / `semantic_layer_config`, and job **`environment_variable_overrides`**. Examples and troubleshooting were updated accordingly.
+
 ### Fixed
 
 ### Removed

@@ -15,7 +15,7 @@ This downloads the [examples/basic/](examples/basic/) starter into `./my-dbt-clo
 Then:
 
 ```bash
-cd my-dbt-platform
+cd my-dbt-cloud
 cp .env.example .env        # fill in your dbt Cloud credentials
 # edit dbt-config.yml       # replace YOUR_ placeholders with your warehouse details
 source .env && terraform init && terraform apply
@@ -69,7 +69,26 @@ module "dbt_cloud" {
 
 **2. Create `dbt-config.yml`**
 
+Configuration uses **`version: 1`**, an **`account`** block (including `host_url` for the dbt Cloud region), shared resources under **`globals`** (connections, service tokens, groups, notifications, PrivateLink endpoints), and a **`projects`** list. Validate in your editor with [`schemas/v1.json`](docs/configuration/yaml-schema.md).
+
 ```yaml
+# yaml-language-server: $schema=https://raw.githubusercontent.com/trouze/terraform-dbtcloud-yaml/main/schemas/v1.json
+
+version: 1
+account:
+  name: Your Account
+  host_url: https://cloud.getdbt.com
+
+globals:
+  connections:
+    - name: Databricks Production
+      key: databricks_prod
+      type: databricks
+      details:
+        host: adb-1234567890.1.azuredatabricks.net
+        http_path: /sql/1.0/warehouses/abc123
+        catalog: main
+
 projects:
   - name: Analytics
     key: analytics
@@ -81,7 +100,7 @@ projects:
     environments:
       - name: Production
         key: prod
-        connection_key: databricks_prod   # references global_connections key below
+        connection: databricks_prod      # globals.connections[].key (or numeric id / LOOKUP:…)
         deployment_type: production
         type: deployment
         custom_branch: main
@@ -93,8 +112,12 @@ projects:
 
       - name: Development
         key: dev
-        connection_key: databricks_prod
+        connection: databricks_prod
         type: development
+        credential:
+          credential_type: databricks
+          catalog: main
+          schema: analytics_dev
 
     jobs:
       - name: Daily Build
@@ -110,14 +133,6 @@ projects:
         schedule_type: days_of_week
         schedule_days: [1, 2, 3, 4, 5]
         schedule_hours: [6]
-
-global_connections:
-  - name: Databricks Production
-    key: databricks_prod
-    type: databricks
-    host: adb-1234567890.1.azuredatabricks.net
-    http_path: /sql/1.0/warehouses/abc123
-    catalog: main
 ```
 
 **3. Create `terraform.tfvars`**
@@ -157,9 +172,9 @@ Sensitive values are never in the YAML file. They're passed as Terraform variabl
 
 | Variable | Key format | Matches |
 |---|---|---|
-| `token_map` | `"my_token_name"` | `credential.token_name` in YAML (Databricks legacy) |
+| `token_map` | `"my_token_name"` | `credential.token_name` (Databricks legacy) or `jobs[].environment_variable_overrides` values prefixed with `secret_` |
 | `environment_credentials` | `"project_key_env_key"` | Environment credential by composite key |
-| `connection_credentials` | `"connection_key"` | `global_connections[].key` in YAML |
+| `connection_credentials` | `"connection_key"` | `globals.connections[].key` in YAML |
 | `lineage_tokens` | `"project_key_integration_key"` | `lineage_integrations[].key` composite |
 | `oauth_client_secrets` | `"oauth_config_key"` | `oauth_configurations[].key` in YAML |
 
@@ -169,27 +184,27 @@ The composite key for `environment_credentials` uses underscores: a project with
 
 ## What you can manage
 
-**Account-level**
+**Account-level** (optional unless noted; shared connections and RBAC live under `globals` in YAML)
 - `account_features` — advanced CI, partial parsing, repo caching flags
-- `global_connections` — shared warehouse connections (Databricks, Snowflake, BigQuery, Postgres, Redshift)
-- `service_tokens` — API tokens with scoped permissions
-- `groups` — user groups with project/account permissions
-- `user_groups` — user-to-group assignments
-- `notifications` — email, Slack, PagerDuty, webhook alerts
+- `globals.connections` — shared warehouse connections (Databricks, Snowflake, BigQuery, Postgres, Redshift, and other adapter types supported by the provider)
+- `globals.service_tokens` — API tokens with scoped permissions
+- `globals.groups` — user groups with project/account permissions
+- `user_groups` — user-to-group assignments (document root)
+- `globals.notifications` — job alerts (dbt Cloud user, Slack channel, or external email)
 - `oauth_configurations` — OAuth provider configs
 - `ip_restrictions` — IP allowlist/denylist rules
 
 **Per-project**
-- `repository` — Git integration (GitHub App, GitLab deploy token, Azure DevOps, SSH)
-- `environments` — deployment and development environments
-- `credentials` — warehouse credentials (14 types: Databricks, Snowflake password/keypair, BigQuery, Postgres, Redshift, Athena, Fabric, Synapse, Starburst, Spark, Teradata)
-- `jobs` — scheduled, CI, merge, and on-demand jobs
-- `environment_variables` — project and environment-level dbt vars
-- `extended_attributes` — connection-level overrides per environment
-- `profiles` — links connection + credential + extended attributes
-- `lineage_integrations` — Tableau/Looker lineage config
-- `artefacts` — docs job and freshness job links
-- `semantic_layer` — semantic layer configuration
+- `repository` — Git integration (GitHub App, GitLab, Azure DevOps, deploy key/token)
+- `environments` — deployment and development environments (reference a global connection with `connection`, or use `primary_profile_key` when using profiles)
+- per-environment `credential` — warehouse credentials (many adapter types; secrets via `environment_credentials`)
+- `jobs` — scheduled, CI, merge, and other job types; optional `environment_variable_overrides` for job-specific env vars
+- `environment_variables` — project- and environment-scoped dbt vars (with map or list `environment_values` forms normalized at apply time)
+- `extended_attributes` — connection-level override payloads linked from environments
+- `profiles` — link connection, credentials, and extended attributes for deployment environments
+- `lineage_integrations` — Tableau / Looker lineage config
+- `project_artefacts` — docs job and freshness job keys
+- `semantic_layer_config` — semantic layer target environment
 
 ---
 
@@ -198,11 +213,12 @@ The composite key for `environment_credentials` uses underscores: a project with
 Set `protected: true` on any resource to prevent accidental deletion:
 
 ```yaml
-global_connections:
-  - name: Databricks Production
-    key: databricks_prod
-    protected: true   # terraform destroy will be blocked for this resource
-    ...
+globals:
+  connections:
+    - name: Databricks Production
+      key: databricks_prod
+      protected: true   # terraform destroy will be blocked for this resource
+      ...
 
 projects:
   - name: Analytics
@@ -241,8 +257,6 @@ terraform apply -var="yaml_file=./configs/finance.yml"
 terraform apply -var="yaml_file=./configs/marketing.yml"
 ```
 
-**Backward compatibility:** If your existing YAML uses the singular `project:` key, it still works — the module automatically wraps it in a list.
-
 ---
 
 ## Job scheduling

@@ -199,7 +199,7 @@ The key `"analytics_prod"` maps to a project with `key: analytics` and an enviro
 
 ### `connection_credentials`
 
-Map of connection credential objects for global connections, keyed by `global_connections[].key`:
+Map of connection credential objects for global connections, keyed by `globals.connections[].key`:
 
 ```bash
 export TF_VAR_connection_credentials='{
@@ -216,13 +216,16 @@ export TF_VAR_connection_credentials='{
 
 ### `token_map`
 
-Legacy Databricks token map, keyed by `credential.token_name` in YAML:
+Used in two ways:
+
+1. **Legacy Databricks** — keyed by `credential.token_name` in an environment `credential` block.
+2. **Job env var overrides** — when a job sets `environment_variable_overrides` and a value starts with `secret_`, the prefix is removed and the remainder is looked up in this map (see [YAML Schema](yaml-schema.md)).
 
 ```bash
-export TF_VAR_token_map='{"my_databricks_token": "dapi_abc123"}'
+export TF_VAR_token_map='{"my_databricks_token": "dapi_abc123", "ci_override_secret": "sensitive-value"}'
 ```
 
-This is the older pattern. Prefer `environment_credentials` for new setups.
+For warehouse credentials, prefer `environment_credentials` over legacy `token_name` when possible.
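+
+As a rough illustration of the `secret_` convention, the sketch below resolves a map of override values against `token_map`. The variable `environment_variable_overrides` and the local `resolved_overrides` are hypothetical names used only for this example; the module's internal implementation may differ.
+
+```hcl
+# Hypothetical inputs for illustration.
+variable "token_map" {
+  type      = map(string)
+  default   = {}
+  sensitive = true
+}
+
+variable "environment_variable_overrides" {
+  type    = map(string) # e.g. { DBT_API_KEY = "secret_ci_override_secret" }
+  default = {}
+}
+
+locals {
+  resolved_overrides = {
+    for name, value in var.environment_variable_overrides :
+    # A value carries the prefix exactly when trimming it changes the string.
+    name => trimprefix(value, "secret_") != value
+    ? var.token_map[trimprefix(value, "secret_")]
+    : value
+  }
+}
+```
+
+With the `TF_VAR_token_map` export above, an override value of `secret_ci_override_secret` would resolve to the `ci_override_secret` entry, while values without the prefix pass through unchanged.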
 
 ### `lineage_tokens`