fix: migrate to AdapterLogger, separate extract/delta columns, fix INSERT order, parallelize ADLS walk by mdrakiburrahman · Pull Request #15 · microsoft/dbt-scope

mdrakiburrahman · 2026-04-08T01:54:35Z

Why this change is needed

Several issues were discovered during production pipeline testing:

Noisy logging — The adapter used logging.getLogger(__name__) which bypassed dbt's log routing, flooding stdout with internal debug noise. Operators couldn't distinguish adapter messages from dbt's own output.
Extract vs. Delta column conflation — A single scope_columns list (with an extract: bool flag) was used for both CREATE TABLE and EXTRACT clauses. This breaks when source file columns differ from the final Delta table schema (e.g. computed/derived columns, renamed columns).
INSERT column-order mismatch — INSERT INTO @target SELECT * FROM @batch_data assumed positional column alignment between the user's SELECT and the Delta table schema. Any reordering caused silent data corruption or runtime failures.
Slow ADLS Gen1 directory listing — Recursive _walk() was serial, blocking the adapter for minutes on deep directory trees.
Per-model AU/priority not wired — Model-level au and priority config was parsed but never forwarded to the ADLA job submission.

How

1. Migrate logging to `AdapterLogger` (all adapter modules)

Replaced logging.getLogger(__name__) with AdapterLogger("scope") across all adapter modules (impl.py, connections.py, script_builder.py, checkpoint.py, adls_gen1_client.py, delta_lake.py, file_tracker.py, _file_lock.py, sqlglot_parser.py). Demoted most log.info() calls to log.debug() so routine operational detail only appears at debug verbosity. This funnels all adapter output through dbt's event system.

Added _pretty_print_file_batch() in impl.py — uses pandas to render a human-readable table of file metadata (timestamps → ISO-8601, sizes → human-readable) for debug-level batch logging.
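The two conversions mentioned above (timestamps to ISO-8601, sizes to human-readable units) can be illustrated with a minimal sketch. It is not the adapter's `_pretty_print_file_batch()`: it swaps pandas for plain string formatting so it runs on the standard library alone. The function names, the dict keys (`path`, `modified_ms`, `size`), and the assumption that timestamps arrive as epoch milliseconds are all hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timezone


def human_size(num_bytes: int) -> str:
    """Render a byte count with binary units, e.g. 1536 -> '1.5 KiB'."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


def iso_timestamp(epoch_ms: int) -> str:
    """Convert epoch milliseconds (assumed input format) to ISO-8601 in UTC."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).isoformat()


def render_file_batch(files: list[dict]) -> str:
    """Build one aligned text line per file, suitable for a debug log call."""
    rows = [
        (f["path"], iso_timestamp(f["modified_ms"]), human_size(f["size"]))
        for f in files
    ]
    # Pad the path column so timestamps line up across rows.
    width = max((len(row[0]) for row in rows), default=0)
    return "\n".join(
        "  ".join((path.ljust(width), modified, size))
        for path, modified, size in rows
    )
```

In the adapter this kind of output would go through `log.debug(...)`, so it only appears at debug verbosity.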

2. Separate Delta table columns from Extract columns

Config rename: scope_columns is split into two keys, delta_table_columns (the CREATE TABLE schema) and extract_columns (the EXTRACT column list).

ScriptConfig: replaced columns: list[ColumnDef] with delta_columns and extract_columns. Removed ColumnDef.extract flag — no longer needed since the two column sets are explicit.

ScriptBuilder: _create_table() uses delta_columns, _extract_from_files() uses extract_columns, and _model_transform_and_insert() receives delta_columns for the explicit INSERT column list.

Macros: table.sql and incremental.sql read both config keys and pass them through separately. utils.sql helper macros renamed accordingly.

impl.py (build_script_config): parses delta_table_columns and extract_columns from model config independently.
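As a rough illustration of the split, the sketch below parses the two config keys independently into two explicit column lists. The class and field names follow the description above, but the bodies are assumptions: in particular, the `type` attribute on `ColumnDef`, the `parse_columns` helper, and the list-of-dicts config shape are hypothetical and may not match the adapter.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ColumnDef:
    name: str
    type: str  # assumed attribute; the real ColumnDef may carry more fields


@dataclass
class ScriptConfig:
    # Schema of the final Delta table (drives CREATE TABLE and INSERT).
    delta_columns: list[ColumnDef] = field(default_factory=list)
    # Columns read from the source files (drives EXTRACT).
    extract_columns: list[ColumnDef] = field(default_factory=list)


def parse_columns(raw: list[dict] | None) -> list[ColumnDef]:
    """Turn a list of {'name': ..., 'type': ...} dicts into ColumnDef objects."""
    return [ColumnDef(name=c["name"], type=c["type"]) for c in raw or []]


def build_script_config(model_config: dict) -> ScriptConfig:
    """Parse each key on its own, so neither list is derived from the other."""
    return ScriptConfig(
        delta_columns=parse_columns(model_config.get("delta_table_columns")),
        extract_columns=parse_columns(model_config.get("extract_columns")),
    )
```

Keeping the lists separate lets a model extract a raw column (for example a string timestamp) while the Delta table stores a derived or renamed column, which a single list with an `extract` flag could not express cleanly.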

3. Fix INSERT/SELECT column-order mismatch

_model_transform_and_insert() now takes delta_columns and emits:

INSERT INTO @target
SELECT col1, col2, col3 FROM @batch_data;

instead of SELECT *, guaranteeing column alignment with the Delta table schema regardless of the user's SELECT order.

The Jinja fallback path in table.sql also changed from SELECT * to SELECT {{ delta_table_columns | map(attribute='name') | join(', ') }}.
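The idea can be sketched in a few lines of Python: the SELECT list is built from the Delta table's column names, in table-definition order, rather than from whatever order the model's SQL happens to produce. The function name `build_insert` and its parameters are hypothetical; only the emitted statement shape is taken from the example above.

```python
def build_insert(delta_column_names, target="@target", source="@batch_data"):
    """Emit an INSERT whose SELECT list follows the Delta table column order."""
    if not delta_column_names:
        # SELECT * is exactly what this fix avoids, so refuse to fall back to it.
        raise ValueError("delta_column_names must not be empty")
    select_list = ", ".join(delta_column_names)
    return f"INSERT INTO {target}\nSELECT {select_list} FROM {source};"


if __name__ == "__main__":
    print(build_insert(["col1", "col2", "col3"]))
```

Because the names come from the table definition, reordering columns in the model's own SELECT no longer changes which value lands in which Delta column.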

4. Parallel ADLS Gen1 directory walk

AdlsGen1Client._walk() rewritten to use ThreadPoolExecutor with concurrent.futures.wait(FIRST_COMPLETED). Each directory is listed in parallel (default max_workers=8), with per-directory timing logged at debug level. Zero-length files are now skipped. FileInfo gains a raw: dict field preserving the original ADLS entry for debug display.

list_relations_without_caching in impl.py also gains per-step timing instrumentation.
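A minimal sketch of the pattern follows, assuming a hypothetical `list_dir` function backed by an in-memory tree in place of real ADLS Gen1 calls. It is not the adapter's `_walk()`; it only shows how `wait(..., return_when=FIRST_COMPLETED)` lets newly discovered subdirectories be submitted as soon as any listing finishes.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

# Hypothetical stand-in for the remote store: path -> (kind, path, size) entries.
FAKE_TREE = {
    "/data": [("dir", "/data/2026", 0), ("file", "/data/readme.txt", 12)],
    "/data/2026": [
        ("file", "/data/2026/a.csv", 100),
        ("file", "/data/2026/empty.csv", 0),
        ("dir", "/data/2026/04", 0),
    ],
    "/data/2026/04": [("file", "/data/2026/04/b.csv", 50)],
}


def list_dir(path):
    """Stand-in for a single (slow, network-bound) directory listing call."""
    return FAKE_TREE.get(path, [])


def walk(root, max_workers=8):
    """Return (path, size) for every non-empty file under root."""
    files = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {pool.submit(list_dir, root)}
        while pending:
            # Wake up as soon as any listing completes, not when all do.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                for kind, path, size in future.result():
                    if kind == "dir":
                        pending.add(pool.submit(list_dir, path))
                    elif size > 0:  # zero-length files are skipped
                        files.append((path, size))
    # Completion order is nondeterministic, so sort for a stable result.
    return sorted(files)
```

Results are only appended from the coordinating thread, so the `files` list needs no lock; the worker threads do nothing but list a directory.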

5. Per-model AU and priority support

ScopeConnectionHandle gains _next_job_au and _next_job_priority fields.
ScopeAdapter exposes set_next_job_au() and set_next_job_priority() as @available methods.
ScopeConnectionManager.execute() reads and clears these per-call overrides.
Both table.sql and incremental.sql macros call the new setters when the model config specifies au or priority.
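The read-and-clear behaviour described above can be sketched as follows. This collapses the adapter, connection manager, and handle into one hypothetical class for brevity; the default values and the `take_job_settings` method are assumptions, and only the `_next_job_au` / `_next_job_priority` field names and the setter names come from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectionHandle:
    default_au: int = 10          # hypothetical connection-level default
    default_priority: int = 1000  # hypothetical connection-level default
    _next_job_au: Optional[int] = None
    _next_job_priority: Optional[int] = None

    def set_next_job_au(self, au: int) -> None:
        self._next_job_au = au

    def set_next_job_priority(self, priority: int) -> None:
        self._next_job_priority = priority

    def take_job_settings(self) -> tuple:
        """Return (au, priority) for the next job, then clear the overrides."""
        au = self._next_job_au if self._next_job_au is not None else self.default_au
        priority = (
            self._next_job_priority
            if self._next_job_priority is not None
            else self.default_priority
        )
        # Clearing here keeps one model's override from leaking into the next job.
        self._next_job_au = None
        self._next_job_priority = None
        return au, priority
```

With this shape, a model that sets `au` affects exactly one job submission, and the following job falls back to the defaults.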

6. README and integration test updates

README examples updated to use delta_table_columns / extract_columns syntax with partition_column_in_extract flag.
Integration test models (append_no_delete.sql, filtered_edition.sql) updated to use the new config keys.

Test

All existing unit tests pass with updated fixtures reflecting the new column config shape.
New unit tests added for AdlsGen1Client parallel walk (test_adls_gen1_client.py).
New unit tests for explicit INSERT column list generation (test_script_builder.py).
Integration tests updated to use the new delta_table_columns / extract_columns config.

The previous commit (6e4a398) fixed _model_transform_and_insert in script_builder.py, but the materializations use the Jinja macro scope__build_file_based_script in table.sql — not the Python ScriptBuilder. Apply the same explicit column list fix to the Jinja template so INSERT INTO @target includes column names. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…e order The previous fix added an explicit column list to INSERT but kept SELECT *, which still emits columns in model SELECT order. Replace SELECT * with SELECT col1, col2, ... in delta_columns (table definition) order so the positional mapping is correct regardless of model SQL order. Fixes both script_builder.py and the Jinja macro in table.sql. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

mdrakiburrahman added 2 commits April 8, 2026 01:51

Improve logging by using adapter

e3dcb43

Pull out Delta Table columns vs Extract Columns

c842acc

This was linked to issues Apr 8, 2026

bug: Use adapter logging not Python logging #13

Closed

bug: SCOPE Extractor tries to pull out columns with the same name as Delta #14

Closed

mdrakiburrahman and others added 3 commits April 8, 2026 02:33

Fix column name ordering bug

6e4a398

mdrakiburrahman changed the title ~~fix: Improve logging by using adapter and several bug fixes~~ fix: migrate to AdapterLogger, separate extract/delta columns, fix INSERT order, parallelize ADLS walk Apr 8, 2026

mdrakiburrahman marked this pull request as ready for review April 8, 2026 03:20

query_poll_timeout_seconds

986f37e

mdrakiburrahman merged commit 8d4c89c into main Apr 8, 2026
2 checks passed

mdrakiburrahman deleted the dev/mdrrahman/perf-improvements branch April 8, 2026 15:23

mdrakiburrahman merged 6 commits into main from dev/mdrrahman/perf-improvements
