[Bug] [BigQuery] dbt-adapters 1.23.0 breaks seeds with empty CSV — table dropped without recreation #1959

@ljubo-does-data

Is this a new bug in dbt-adapters?

  • I believe this is a new bug in dbt-adapters
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

On BigQuery, a seed defined by an empty (header-only) CSV with column_types specified in YAML has been a valid pattern: dbt loads the CSV via the BigQuery load_table_from_file API with CREATE_IF_NEEDED, creating an empty table whose schema comes from column_types. The table is then available for relationship tests and downstream ref()s, and its rows are added later via PR.

Since dbt-adapters 1.23.0 (introduced by #1866, --empty support for dbt seed), this pattern is broken on BigQuery:

  • On the first build where the seed table already exists, bigquery__reset_csv_table drops the table via client.delete_table() (REST API — does not appear in INFORMATION_SCHEMA.JOBS_BY_PROJECT).
  • The new rows_affected > 0 guard in the default seed materialization (dbt/include/global_project/macros/materializations/seeds/seed.sql) then skips load_csv_rows, which on BigQuery is the only macro that actually issues the LOAD job.
  • bigquery__create_csv_table is a no-op upstream.
  • Net result: the table is dropped and never recreated. Subsequent builds cannot recreate it either — the path goes through create_csv_table for non-existing tables, which is also a no-op.
  • dbt reports seed.<name> success INSERT 0 despite the table being gone.
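
The sequence above can be sketched as a toy model in plain Python. This is not dbt code: the `warehouse` dict, the `run_seed` function and the `guard_on_rows` flag are hypothetical stand-ins, and the branch order (reset for an existing table, create otherwise, then a guarded load) is an assumption taken from the description in this issue.

```python
# Toy model of the seed flow described in this issue. This is NOT dbt code:
# the "warehouse" is a plain dict and every function is a hypothetical stand-in.

def reset_csv_table(warehouse, name):
    """Stand-in for bigquery__reset_csv_table: drops the table, nothing else."""
    warehouse.pop(name, None)


def create_csv_table(warehouse, name):
    """Stand-in for bigquery__create_csv_table: a no-op."""


def load_csv_rows(warehouse, name, columns, rows):
    """Stand-in for the LOAD job (CREATE_IF_NEEDED + WRITE_TRUNCATE)."""
    warehouse[name] = {"columns": list(columns), "rows": list(rows)}


def run_seed(warehouse, name, columns, rows, guard_on_rows):
    """Run one seed build; guard_on_rows=True models the 1.23.0 guard."""
    if name in warehouse:
        reset_csv_table(warehouse, name)
    else:
        create_csv_table(warehouse, name)
    rows_affected = len(rows)
    if not guard_on_rows or rows_affected > 0:
        load_csv_rows(warehouse, name, columns, rows)
    return f"INSERT {rows_affected}"


warehouse = {}
columns = ["id", "label"]

# Pre-1.23.0 behaviour: the load always runs, so the empty table exists.
run_seed(warehouse, "my_lookup", columns, [], guard_on_rows=False)
existed_before = "my_lookup" in warehouse

# With the guard: the existing table is dropped and the load is skipped.
status = run_seed(warehouse, "my_lookup", columns, [], guard_on_rows=True)
exists_after = "my_lookup" in warehouse

# A further build takes the create path, which is a no-op, so nothing returns.
run_seed(warehouse, "my_lookup", columns, [], guard_on_rows=True)
exists_after_retry = "my_lookup" in warehouse

print(existed_before, status, exists_after, exists_after_retry)
```

In this model the guarded run still returns `INSERT 0` while the table is absent afterwards, which matches the misleading success status reported above.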

Expected Behavior

The empty-CSV seed pattern should continue to work as it did in 1.22.x and earlier: the table is created (or maintained) with the schema specified by column_types, even when the CSV has no data rows.

Steps To Reproduce

  1. Install dbt-adapters>=1.23.0 and dbt-bigquery==1.11.1.
  2. Define a seed with a header-only CSV and column_types in _seeds.yml:

```yaml
seeds:
  - name: my_lookup
    config:
      column_types:
        id: string
        label: string
```

```csv
id,label
```

  3. dbt seed --select my_lookup once: table is created (no rows, schema from column_types).
  4. dbt seed --select my_lookup again (existing table): table is dropped via delete_table REST call; load_csv_rows is skipped due to the new guard; table is not recreated.
  5. Any downstream not_null/unique/relationships test on the seed subsequently errors with Not found: Table ....
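
To illustrate why a header-only CSV trips the guard, here is a minimal sketch using only the Python standard library. dbt itself parses seeds with agate, not `csv`, and the assumption here is that `rows_affected` corresponds to the number of data rows; `count_data_rows` is a hypothetical helper, not a dbt function.

```python
import csv
import io


def count_data_rows(csv_text):
    """Return (header, number of data rows) for a CSV given as a string."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])            # first line holds the column names
    rows = [row for row in reader if row]  # ignore blank trailing lines
    return header, len(rows)


# The header-only seed from the reproduction steps.
header, rows_affected = count_data_rows("id,label\n")

# Assumed equivalent of the guard in seed.sql: the load runs only if this is True.
would_load = rows_affected > 0

print(header, rows_affected, would_load)
```

The column names survive parsing, but with zero data rows the guard condition is false, so the load step is never reached.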

Relevant log output

```
seed..my_lookup success INSERT 0
test..not_null_my_lookup_id error
Database Error: Not found: Table :.my_lookup was not found in location
```

Environment

  • Python: 3.12
  • dbt-adapters: 1.23.0, 1.24.0, 1.24.1 (all reproduce — seed.sql is byte-identical across these)
  • dbt-bigquery: 1.11.1
  • dbt-core: 1.11.8 through 1.11.10 (all reproduce)

Which database adapter are you using?

BigQuery. The regression originates in the default seed materialization in dbt-adapters. The breakage manifests specifically on BigQuery because of how bigquery__create_csv_table and bigquery__reset_csv_table are defined.

Additional Context

bigquery__create_csv_table in dbt-bigquery is a no-op:

```jinja
{% macro bigquery__create_csv_table(model, agate_table) %}
-- no-op
{% endmacro %}
```

bigquery__reset_csv_table only drops:

```jinja
{% macro bigquery__reset_csv_table(model, full_refresh, old_relation, agate_table) %}
{{ adapter.drop_relation(old_relation) }}
{% endmacro %}
```

bigquery__load_csv_rows is the only macro that calls adapter.load_dataframe, which is what actually issues the BigQuery LOAD job (with CREATE_IF_NEEDED and WRITE_TRUNCATE). In #1866 / 1.23.0, that call is now guarded by rows_affected > 0 in the default seed materialization:

```jinja
{% set sql = "" %}
{% if rows_affected > 0 %}
  {% set sql = load_csv_rows(model, agate_table) %}
{% endif %}
```

For BigQuery, the result is that an empty CSV no longer triggers the LOAD job at all — neither to create the table nor to maintain it.

A possible fix could live in dbt-bigquery: have bigquery__create_csv_table issue an adapter.load_dataframe(...) call when the agate table has zero rows, and adjust bigquery__reset_csv_table to be a no-op (or drop-and-immediately-recreate) for empty CSVs. The fix likely needs to live in the BigQuery adapter rather than the global materialization, because the guard in seed.sql is intentional for --empty and the issue is that the BigQuery seed macros assume load_csv_rows will always run.
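
The proposed direction can be sketched in the same toy-model style. This is an illustration of the control flow only, not a patch for dbt-bigquery: `warehouse`, `load_dataframe`, `create_csv_table_fixed`, `reset_csv_table_fixed` and `run_seed_fixed` are all hypothetical stand-ins, and the drop-and-immediately-recreate variant is chosen here arbitrarily over the no-op variant.

```python
# Toy model of the proposed fix. NOT dbt code: every name is a hypothetical
# stand-in, and the "warehouse" is a plain dict of table name -> contents.

def load_dataframe(warehouse, name, columns, rows):
    """Stand-in for the LOAD job (CREATE_IF_NEEDED + WRITE_TRUNCATE)."""
    warehouse[name] = {"columns": list(columns), "rows": list(rows)}


def create_csv_table_fixed(warehouse, name, columns, rows):
    """Create the table from the schema when the CSV has no data rows."""
    if len(rows) == 0:
        load_dataframe(warehouse, name, columns, rows)


def reset_csv_table_fixed(warehouse, name, columns, rows):
    """Drop the table; for an empty CSV, recreate it immediately."""
    warehouse.pop(name, None)
    if len(rows) == 0:
        load_dataframe(warehouse, name, columns, rows)


def run_seed_fixed(warehouse, name, columns, rows):
    """Seed flow with the rows_affected > 0 guard left in place."""
    if name in warehouse:
        reset_csv_table_fixed(warehouse, name, columns, rows)
    else:
        create_csv_table_fixed(warehouse, name, columns, rows)
    rows_affected = len(rows)
    if rows_affected > 0:
        load_dataframe(warehouse, name, columns, rows)
    return f"INSERT {rows_affected}"


warehouse = {}
columns = ["id", "label"]

# Empty CSV, table missing: the create path now builds the empty table.
run_seed_fixed(warehouse, "my_lookup", columns, [])
created_when_missing = "my_lookup" in warehouse

# Empty CSV, table present: the reset path drops and recreates it.
run_seed_fixed(warehouse, "my_lookup", columns, [])
kept_when_existing = "my_lookup" in warehouse

# Non-empty CSV: the guarded load still runs as before.
run_seed_fixed(warehouse, "my_lookup", columns, [("1", "a")])
row_count = len(warehouse["my_lookup"]["rows"])

print(created_when_missing, kept_when_existing, row_count)
```

Keeping the guard untouched in this sketch reflects the point above that the guard is intentional for --empty; only the adapter-level stand-ins change.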
