sdebruyn
diff --git a/docs/authentication.md b/docs/authentication.md
Lines changed: 245 additions & 0 deletions
diff --git a/docs/feature-comparison.md b/docs/feature-comparison.md
Lines changed: 8 additions & 0 deletions
diff --git a/docs/python-models.md b/docs/python-models.md
Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,245 @@
+# Authentication
+
+The dbt-fabric-samdebruyn adapter supports a variety of authentication methods so you can connect to Microsoft Fabric from any environment. This guide walks through each method, explains when to use it, and provides ready-to-use `profiles.yml` examples.
+
+!!! tip "Quick recommendation"
+
+    | Scenario | Recommended method |
+    | --- | --- |
+    | Local development | [`CLI`](#azure-cli) or [`auto`](#automatic-defaultazurecredential) |
+    | CI/CD pipelines | [`environment`](#environment-variables) or [`ActiveDirectoryServicePrincipal`](#service-principal) |
+    | Fabric Notebook | [`environment`](#environment-variables) or [`ActiveDirectoryServicePrincipal`](#service-principal) |
+
+All examples below assume the following base profile structure. Only the authentication-related keys change per method.
+
+```yaml
+default:
+  target: dev
+  outputs:
+    dev:
+      type: fabric
+      workspace: My Workspace
+      database: my_data_warehouse
+      schema: dbt
+      # + authentication keys shown below
+```
+
+??? tip "Use environment variables for secrets"
+
+    Never hardcode secrets in your `profiles.yml`. Use Jinja to reference environment variables:
+
+    ```yaml
+    client_secret: "{{ env_var('AZURE_CLIENT_SECRET') }}"
+    ```
+
+---
+
+## Local development
+
+### Azure CLI
+
+The simplest way to authenticate during local development. Log in once with the Azure CLI and dbt will reuse that session.
+
+**Step 1 — Log in**
+
+```bash
+az login
+```
+
+Your account does not need access to any Azure subscription — it only needs access to your Fabric workspace.
+
+**Step 2 — Configure your profile**
+
+```yaml
+default:
+  target: dev
+  outputs:
+    dev:
+      type: fabric
+      database: my_data_warehouse
+      schema: dbt
+      workspace: My Workspace  # or use host
+      authentication: CLI
+```
+
+!!! info "Keep your Azure CLI up to date"
+
+    There have been reports of issues when using an outdated version of the Azure CLI. Run `az upgrade` to make sure you are on the latest version.
+
+The Azure CLI itself supports [multiple login methods](https://learn.microsoft.com/cli/azure/authenticate-azure-cli?view=azure-cli-latest&WT.mc_id=MVP_310840) (browser, device code, service principal, managed identity, …), making this a flexible option that adapts to many scenarios.
+
+### Automatic (`DefaultAzureCredential`)
+
+Set `authentication` to `auto`, or omit it entirely since `auto` is the default. The adapter uses the Azure Identity SDK's [`DefaultAzureCredential`](https://learn.microsoft.com/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python&WT.mc_id=MVP_310840), which tries several credential sources in order:
+
+1. Environment variables
+2. Workload identity
+3. Managed identity
+4. Azure CLI
+5. Azure PowerShell
+6. Azure Developer CLI
+7. Interactive browser (if available)
+
+```yaml
+default:
+  target: dev
+  outputs:
+    dev:
+      type: fabric
+      database: my_data_warehouse
+      schema: dbt
+      workspace: My Workspace
+      # authentication: auto  ← this is the default, can be omitted
+```
+
+Because Azure PowerShell is part of this chain, a session started with `Connect-AzAccount` is picked up automatically, with no extra configuration.
+
+!!! tip "When to use `auto` vs `CLI`"
+
+    `auto` tries multiple credential sources in a chain, which means it can be slightly slower on first connection. It can also pick up credentials you don't intend to use — for example, a managed identity or environment variables left over from another tool. If you know you will always use the Azure CLI, setting `authentication: CLI` explicitly skips the chain, connects faster, and ensures no unexpected credentials are used.
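+
+As a sketch of how both methods can live side by side, the profile below defines one target per method. The target names `dev_cli` and `dev_auto` are illustrative; every key is taken from the examples above.
+
+```yaml
+default:
+  target: dev_cli  # default target: Azure CLI only
+  outputs:
+    dev_cli:
+      type: fabric
+      database: my_data_warehouse
+      schema: dbt
+      workspace: My Workspace
+      authentication: CLI   # skips the credential chain
+    dev_auto:
+      type: fabric
+      database: my_data_warehouse
+      schema: dbt
+      workspace: My Workspace
+      authentication: auto  # walks the DefaultAzureCredential chain
+```
+
+Running `dbt run --target dev_auto` then switches to the credential chain without editing the profile.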
+
+---
+
+## CI/CD & automated environments
+
+### Service Principal
+
+Use a Microsoft Entra ID app registration (service principal) with a client secret. This is ideal for unattended, automated runs.
+
+**Prerequisites:**
+
+- A registered application in Microsoft Entra ID
+- The application must have access to your Fabric workspace
+- You need the **client ID**, **client secret**, and **tenant ID**
+
+```yaml
+default:
+  target: ci
+  outputs:
+    ci:
+      type: fabric
+      database: my_data_warehouse
+      schema: dbt
+      workspace: My Workspace
+      authentication: ActiveDirectoryServicePrincipal
+      tenant_id: "{{ env_var('AZURE_TENANT_ID') }}"
+      client_id: "{{ env_var('AZURE_CLIENT_ID') }}"
+      client_secret: "{{ env_var('AZURE_CLIENT_SECRET') }}"
+```
+
+!!! warning "Tenant ID is required"
+
+    When using `ActiveDirectoryServicePrincipal` together with [`workspace_name`](configuration.md#workspace_name) or [`workspace_id`](configuration.md#workspace_id) — or when running Python models — the `tenant_id` must be provided.
+
+### Environment variables
+
+Set `authentication` to `environment` and configure credentials through environment variables. The adapter uses Azure Identity's [`EnvironmentCredential`](https://learn.microsoft.com/python/api/azure-identity/azure.identity.environmentcredential?view=azure-python&WT.mc_id=MVP_310840), which supports the following variables:
+
+=== "Service principal with secret"
+
+    | Variable | Description |
+    | --- | --- |
+    | `AZURE_TENANT_ID` | Microsoft Entra tenant ID |
+    | `AZURE_CLIENT_ID` | Application (client) ID |
+    | `AZURE_CLIENT_SECRET` | Client secret |
+
+=== "Service principal with certificate"
+
+    | Variable | Description |
+    | --- | --- |
+    | `AZURE_TENANT_ID` | Microsoft Entra tenant ID |
+    | `AZURE_CLIENT_ID` | Application (client) ID |
+    | `AZURE_CLIENT_CERTIFICATE_PATH` | Path to a PEM or PKCS12 certificate |
+    | `AZURE_CLIENT_CERTIFICATE_PASSWORD` | *(optional)* Certificate password |
+
+=== "Username & password"
+
+    | Variable | Description |
+    | --- | --- |
+    | `AZURE_TENANT_ID` | Microsoft Entra tenant ID |
+    | `AZURE_CLIENT_ID` | Application (client) ID |
+    | `AZURE_USERNAME` | Username |
+    | `AZURE_PASSWORD` | Password |
+
+```yaml
+default:
+  target: ci
+  outputs:
+    ci:
+      type: fabric
+      database: my_data_warehouse
+      schema: dbt
+      workspace: My Workspace
+      authentication: environment
+```
+
+This method keeps all credential settings out of `profiles.yml`: unlike the explicit `ActiveDirectoryServicePrincipal` method, the profile needs no `tenant_id`, `client_id`, or `client_secret` keys.
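+
+To illustrate how the variables might be supplied in a pipeline, here is a minimal sketch that assumes GitHub Actions as the CI system. The workflow name, job name, and repository secret names are hypothetical, the install step assumes the package is published under the adapter's name, and `profiles.yml` with the `ci` target is assumed to sit in the repository root.
+
+```yaml
+# .github/workflows/dbt.yml (hypothetical file name)
+name: dbt
+on: [push]
+jobs:
+  dbt-build:
+    runs-on: ubuntu-latest
+    env:
+      # Read by EnvironmentCredential; the secret names are assumptions
+      AZURE_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
+      AZURE_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
+      AZURE_CLIENT_SECRET: ${{ secrets.AZURE_CLIENT_SECRET }}
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - run: pip install dbt-fabric-samdebruyn  # assumed package name
+      - run: dbt build --target ci
+```
+
+Other CI systems follow the same pattern: expose the three variables to the step that runs dbt and leave the profile untouched.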
+
+---
+
+## Fabric Notebook
+
+When running dbt inside a **Fabric Notebook**, the recommended approach is to use **environment variable** or **service principal** authentication.
+
+Configure your notebook to set the required environment variables (e.g. `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`) and use the [`environment`](#environment-variables) or [`ActiveDirectoryServicePrincipal`](#service-principal) method.
+
+```yaml
+default:
+  target: notebook
+  outputs:
+    notebook:
+      type: fabric
+      database: my_data_warehouse
+      schema: dbt
+      workspace: My Workspace
+      authentication: environment
+```
+
+Alternatively, with explicit service principal configuration:
+
+```yaml
+default:
+  target: notebook
+  outputs:
+    notebook:
+      type: fabric
+      database: my_data_warehouse
+      schema: dbt
+      workspace: My Workspace
+      authentication: ActiveDirectoryServicePrincipal
+      tenant_id: "{{ env_var('AZURE_TENANT_ID') }}"
+      client_id: "{{ env_var('AZURE_CLIENT_ID') }}"
+      client_secret: "{{ env_var('AZURE_CLIENT_SECRET') }}"
+```
+
+!!! warning "`FabricSpark` is currently broken"
+
+    The adapter also has a `FabricSpark` (alias `SynapseSpark`) authentication method that uses [NotebookUtils](https://learn.microsoft.com/fabric/data-engineering/notebook-utilities?WT.mc_id=MVP_310840) to obtain an access token from the notebook session. However, this method **does not work** at the moment: the Fabric notebook runtime returns a token whose scope is not allowed to access Data Warehouses and SQL endpoints. Use one of the alternatives above instead.
+
+---
+
+## Other methods
+
+The adapter supports several additional authentication methods such as managed identity, interactive browser, and pre-acquired access tokens. For a complete list of all supported methods and their configuration options, see the [configuration documentation](configuration.md#authentication).
+
+---
+
+## Troubleshooting
+
+### Which authentication method is being used?
+
+Run `dbt debug` to see the resolved connection information, including the active authentication method.
+
+```bash
+dbt debug
+```
+
+### Common issues
+
+| Symptom | Likely cause | Fix |
+| --- | --- | --- |
+| `Login timeout expired` | Slow network or restrictive firewall | Increase [`login_timeout`](configuration.md#login_timeout) (e.g. `30`) |
+| `AADSTS700016: Application not found` | Wrong `client_id` or the app isn't registered in the correct tenant | Verify the app registration in Microsoft Entra ID |
+| `DefaultAzureCredential failed` | No valid credential source found | Make sure you are logged in (`az login` / `Connect-AzAccount`) or that environment variables are set |
+| `Token expired` when using `access_token` | The pre-acquired token has expired | Refresh the token before running dbt |
+| `notebookutils not found` | Using `FabricSpark` outside of a Fabric/Synapse notebook | Switch to a different authentication method |
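+
+For the `Login timeout expired` row, a sketch of raising the timeout in `profiles.yml`. The value `30` is the example from the table; that it is measured in seconds is an assumption, so check the configuration documentation for the unit.
+
+```yaml
+default:
+  target: dev
+  outputs:
+    dev:
+      type: fabric
+      database: my_data_warehouse
+      schema: dbt
+      workspace: My Workspace
+      authentication: CLI
+      login_timeout: 30  # assumed to be seconds
+```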
@@ -35,6 +35,10 @@ While most authentication methods have been contributed back to dbt-fabric, some
 
 ## MERGE in incremental and microbatch models
 
+!!! info
+
+    Microsoft's dbt-fabric adapter has recently added MERGE support as well.
+
 Incremental models in dbt-fabric support the `append`, `insert_overwrite`, and `delete+insert` strategies.
 
 This adapter adds support for [MERGE](https://learn.microsoft.com/sql/t-sql/statements/merge-transact-sql?view=sql-server-ver17&WT.mc_id=MVP_310840).
@@ -72,6 +76,10 @@ select * from source('my_source', 'my_table')
 {% endif %}
 ```
 
+## Better support for [warehouse snapshots](warehouse-snapshots.md)
+
+Both adapters support Fabric [warehouse snapshots](https://learn.microsoft.com/fabric/data-warehouse/warehouse-snapshot?WT.mc_id=MVP_310840), but Microsoft's implementation hijacks Python runtime components and does not respect the proper dbt lifecycle. This adapter exposes a macro you can call from `on-run-start`, `on-run-end`, `post-hook`, or any other Jinja context — giving you full control over when and how often snapshots are taken.
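+
+A sketch of wiring such a macro into the run lifecycle from `dbt_project.yml`. The macro name `my_snapshot_macro` is a hypothetical stand-in, not the adapter's actual macro; see the warehouse snapshots page for the real name and arguments.
+
+```yaml
+# dbt_project.yml
+on-run-end:
+  # hypothetical macro name, replace with the one documented in warehouse-snapshots.md
+  - "{{ my_snapshot_macro() }}"
+```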
+
 ## Better support for popular packages
 
 [dbt-utils](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/) is already fully supported and more packages are being tested and added.
 
@@ -0,0 +1,102 @@
+# Python models
+
+The dbt-fabric-samdebruyn adapter supports [Python models](https://docs.getdbt.com/docs/build/python-models), allowing you to use PySpark DataFrames to transform data in your Fabric Data Warehouse. This is a feature exclusive to this adapter — Microsoft's upstream dbt-fabric does not support it.
+
+Python models are useful when you need transformations that are difficult or impossible to express in SQL, such as machine learning inference, complex string parsing, or calling external APIs.
+
+---
+
+## Prerequisites
+
+To use Python models, your `profiles.yml` must include the following additional configuration options on top of the standard connection settings:
+
+| Option | Description |
+| --- | --- |
+| [`workspace`](configuration.md#workspace_name) or [`workspace_id`](configuration.md#workspace_id) | Identifies your Fabric Workspace. Required so the adapter can locate the Livy API endpoint. |
+| [`lakehouse`](configuration.md#lakehouse_name) or [`lakehouse_id`](configuration.md#lakehouse_id) | Identifies the Lakehouse where Spark sessions run. A Lakehouse must exist in your workspace. |
+
+!!! warning "Tenant ID required for service principal auth"
+
+    If you are using [`ActiveDirectoryServicePrincipal`](configuration.md#activedirectoryserviceprincipal) authentication, you must also provide the [`tenant_id`](configuration.md#tenant_id) option.
+
+### Example profile
+
+```yaml
+default:
+  target: dev
+  outputs:
+    dev:
+      type: fabric
+      workspace: My Workspace
+      database: my_data_warehouse
+      schema: dbt
+      lakehouse: My Lakehouse
+      authentication: CLI
+```
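+
+When authenticating as a service principal instead, the same profile might look like the sketch below. It assumes the credentials are exposed through the `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, and `AZURE_CLIENT_SECRET` environment variables; `tenant_id` is included because of the warning above.
+
+```yaml
+default:
+  target: ci
+  outputs:
+    ci:
+      type: fabric
+      workspace: My Workspace
+      database: my_data_warehouse
+      schema: dbt
+      lakehouse: My Lakehouse
+      authentication: ActiveDirectoryServicePrincipal
+      tenant_id: "{{ env_var('AZURE_TENANT_ID') }}"  # required for Python models
+      client_id: "{{ env_var('AZURE_CLIENT_ID') }}"
+      client_secret: "{{ env_var('AZURE_CLIENT_SECRET') }}"
+```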
+
+---
+
+## Writing a Python model
+
+A Python model is a `.py` file in your `models/` directory that defines a `model()` function. This function receives a `dbt` object and a `spark` session, and must return a PySpark DataFrame.
+
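+```python
+from pyspark.sql import functions as F
+
+
+def model(dbt, spark):
+    # dbt.ref() returns a PySpark DataFrame for the upstream model
+    source_df = dbt.ref("my_upstream_model")
+
+    # Concatenate the two name columns with a space in between
+    result_df = source_df.withColumn(
+        "full_name",
+        F.concat(F.col("first_name"), F.lit(" "), F.col("last_name")),
+    )
+
+    return result_df
+```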
+
+### The `dbt` object
+
+The `dbt` object provides the same interface as in other adapters:
+
+- **`dbt.ref("model_name")`** — Returns a PySpark DataFrame for the referenced model.
+- **`dbt.source("source_name", "table_name")`** — Returns a PySpark DataFrame for the referenced source.
+- **`dbt.config.get("key")`** — Access the model's configuration.
+
+### The `spark` object
+
+The `spark` object is a standard PySpark `SparkSession`. Behind the scenes, the adapter configures it with Fabric's [synapsesql connector](https://learn.microsoft.com/fabric/data-engineering/spark-data-warehouse-connector?WT.mc_id=MVP_310840) so that `dbt.ref()` and `dbt.source()` read directly from your Data Warehouse.
+
+---
+
+## How it works
+
+Understanding the execution flow can help with debugging:
+
+1. **Code generation** — dbt compiles your Python model and wraps it with boilerplate that configures the Spark session and sets up the `synapsesql` connector for reads and writes.
+2. **Livy session** — The adapter connects to the [Livy API](https://learn.microsoft.com/fabric/data-engineering/lakehouse-api?WT.mc_id=MVP_310840) on your Lakehouse and either reuses an existing Spark session named `dbt-fabric` or creates a new one.
+3. **Statement execution** — The compiled code is submitted as a PySpark statement to the Livy session.
+4. **Write back** — The returned DataFrame is written to your Data Warehouse using `synapsesql` in `overwrite` mode.
+
+All Python models in a single dbt run share the same Livy session, which avoids the overhead of starting a new Spark session for each model.
+
+---
+
+## Limitations
+
+| Limitation | Details |
+| --- | --- |
+| **Table materialization only** | Python models only support the `table` materialization. Incremental models are not supported. |
+| **PySpark DataFrames only** | Your `model()` function must return a PySpark DataFrame. Pandas DataFrames are not supported. |
+| **Always full refresh** | The table is fully replaced (`overwrite` mode) on each run. |
+| **Session timeout** | The adapter polls for session and statement completion with a timeout of approximately 5 minutes. Long-running Spark jobs may hit this limit. |
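+
+Since `table` is the only supported materialization, it can be pinned explicitly in a properties file. A minimal sketch using standard dbt properties syntax, assuming a model file `models/my_python_model.py` (the model name is hypothetical):
+
+```yaml
+# models/schema.yml (hypothetical file name)
+version: 2
+
+models:
+  - name: my_python_model
+    config:
+      materialized: table  # the only materialization Python models support here
+```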
+
+---
+
+## Troubleshooting
+
+### Common issues
+
+| Symptom | Likely cause | Fix |
+| --- | --- | --- |
+| `workspace_id must be provided` | Missing workspace configuration | Add [`workspace`](configuration.md#workspace_name) or [`workspace_id`](configuration.md#workspace_id) to your profile |
+| `lakehouse_id must be provided` | Missing lakehouse configuration | Add [`lakehouse`](configuration.md#lakehouse_name) or [`lakehouse_id`](configuration.md#lakehouse_id) to your profile |
+| Livy session times out | The Spark session took too long to start | Retry — Fabric Spark sessions can be slow to start on first use |
+| Statement fails with `synapsesql` error | Connection between Spark and the Data Warehouse failed | Verify that the Lakehouse and Data Warehouse are in the same workspace |
+| `HTTP 429` errors in logs | Fabric API rate limiting | The adapter handles this automatically with retries — no action needed |