fix: migrate to AdapterLogger, separate extract/delta columns, fix INSERT order, parallelize ADLS walk #15
Merged
mdrakiburrahman merged 6 commits into main on Apr 8, 2026
This was linked to issues on Apr 8, 2026.
The previous commit (6e4a398) fixed _model_transform_and_insert in script_builder.py, but the materializations use the Jinja macro scope__build_file_based_script in table.sql, not the Python ScriptBuilder. Apply the same explicit column list fix to the Jinja template so INSERT INTO @target includes column names.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…e order
The previous fix added an explicit column list to INSERT but kept SELECT *, which still emits columns in model SELECT order. Replace SELECT * with SELECT col1, col2, ... in delta_columns (table definition) order so the positional mapping is correct regardless of model SQL order. Fixes both script_builder.py and the Jinja macro in table.sql.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Why this change is needed
Several issues were discovered during production pipeline testing:
- Adapter modules used `logging.getLogger(__name__)`, which bypassed dbt's log routing and flooded stdout with internal debug noise. Operators couldn't distinguish adapter messages from dbt's own output.
- The `scope_columns` list (with an `extract: bool` flag) was used for both the `CREATE TABLE` and `EXTRACT` clauses. This breaks when source file columns differ from the final Delta table schema (e.g. computed/derived columns, renamed columns).
- `INSERT INTO @target SELECT * FROM @batch_data` assumed positional column alignment between the user's SELECT and the Delta table schema. Any reordering caused silent data corruption or runtime failures.
- `_walk()` was serial, blocking the adapter for minutes on deep directory trees.
- `au` and `priority` config was parsed but never forwarded to the ADLA job submission.

How
1. Migrate logging to `AdapterLogger` (all adapter modules)

Replaced `logging.getLogger(__name__)` with `AdapterLogger("scope")` across all adapter modules (`impl.py`, `connections.py`, `script_builder.py`, `checkpoint.py`, `adls_gen1_client.py`, `delta_lake.py`, `file_tracker.py`, `_file_lock.py`, `sqlglot_parser.py`). Demoted most `log.info()` calls to `log.debug()` so routine operational detail only appears at debug verbosity. This funnels all adapter output through dbt's event system.

Added `_pretty_print_file_batch()` in `impl.py`, which uses pandas to render a human-readable table of file metadata (timestamps → ISO-8601, sizes → human-readable) for debug-level batch logging.

2. Separate Delta table columns from Extract columns
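A minimal sketch of the idea behind this change, not the adapter's actual code: the two column lists are parsed from separate config keys, so the Delta schema can contain derived columns that never appear in the source files. The `type` field, the `parse_columns` helper, and the sample column names are assumptions for illustration; `build_script_config` here is a simplified stand-in that only borrows the name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnDef:
    name: str
    type: str  # assumed field; only `name` is confirmed by the PR text


@dataclass(frozen=True)
class ScriptConfig:
    delta_columns: list[ColumnDef]    # schema used for CREATE TABLE
    extract_columns: list[ColumnDef]  # columns read by EXTRACT


def parse_columns(raw: list[dict]) -> list[ColumnDef]:
    """Hypothetical helper: turn config dicts into ColumnDef objects."""
    return [ColumnDef(name=c["name"], type=c["type"]) for c in raw]


def build_script_config(model_config: dict) -> ScriptConfig:
    # Each key is parsed on its own, so the two lists are free to differ.
    return ScriptConfig(
        delta_columns=parse_columns(model_config.get("delta_table_columns", [])),
        extract_columns=parse_columns(model_config.get("extract_columns", [])),
    )


config = build_script_config({
    "delta_table_columns": [
        {"name": "event_id", "type": "string"},
        {"name": "event_date", "type": "DateTime"},  # derived, not in the files
    ],
    "extract_columns": [
        {"name": "event_id", "type": "string"},
        {"name": "raw_timestamp", "type": "string"},
    ],
})

delta_names = [c.name for c in config.delta_columns]
extract_names = [c.name for c in config.extract_columns]
print(delta_names)    # ['event_id', 'event_date']
print(extract_names)  # ['event_id', 'raw_timestamp']
```

With a single shared list plus an `extract` flag, a column such as `raw_timestamp` that exists only in the files could not be expressed without also landing in the table schema.
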
- Config rename: `scope_columns` → split into `delta_table_columns` (CREATE TABLE schema) + `extract_columns` (EXTRACT column list).
- `ScriptConfig`: replaced `columns: list[ColumnDef]` with `delta_columns` and `extract_columns`. Removed the `ColumnDef.extract` flag, which is no longer needed since the two column sets are explicit.
- `ScriptBuilder`: `_create_table()` uses `delta_columns`, `_extract_from_files()` uses `extract_columns`, and `_model_transform_and_insert()` receives `delta_columns` for the explicit INSERT column list.
- Macros: `table.sql` and `incremental.sql` read both config keys and pass them through separately. `utils.sql` helper macros renamed accordingly.
- `impl.py` (`build_script_config`): parses `delta_table_columns` and `extract_columns` from model config independently.

3. Fix INSERT/SELECT column-order mismatch
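The positional-mapping problem behind this fix can be reproduced in a few lines. The sketch below uses SQLite from the Python standard library as a stand-in for SCOPE (an assumption made purely so the example runs anywhere); `build_insert` and the table and column names are hypothetical.

```python
import sqlite3


def build_insert(target: str, source: str, delta_columns: list[str]) -> str:
    """Hypothetical helper: name the columns on both sides, in table order."""
    cols = ", ".join(delta_columns)
    return f"INSERT INTO {target} ({cols}) SELECT {cols} FROM {source}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id TEXT, amount TEXT)")
# The batch has the columns in the model's SELECT order, which differs.
conn.execute("CREATE TABLE batch_data (amount TEXT, id TEXT)")
conn.execute("INSERT INTO batch_data (amount, id) VALUES ('9.99', 'a1')")

# Old behaviour: SELECT * maps by position, so the values land swapped.
conn.execute("INSERT INTO target SELECT * FROM batch_data")
# New behaviour: explicit column lists in table-definition order.
conn.execute(build_insert("target", "batch_data", ["id", "amount"]))

rows = conn.execute("SELECT id, amount FROM target ORDER BY rowid").fetchall()
print(rows)  # [('9.99', 'a1'), ('a1', '9.99')]
```

The first row shows the silent corruption (the amount stored in `id`), the second the aligned result.
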
`_model_transform_and_insert()` now takes `delta_columns` and emits an `INSERT INTO @target` with an explicit column list and a matching `SELECT col1, col2, ...` in `delta_columns` order instead of `SELECT *`, guaranteeing column alignment with the Delta table schema regardless of the user's SELECT order.

The Jinja fallback path in `table.sql` also changed from `SELECT *` to `SELECT {{ delta_table_columns | map(attribute='name') | join(', ') }}`.

4. Parallel ADLS Gen1 directory walk
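A minimal sketch of a parallel directory walk built on `ThreadPoolExecutor` and `wait(..., return_when=FIRST_COMPLETED)`, the pattern this section names. It is not the adapter's `_walk()`: the in-memory `TREE`, the `list_dir` function, and the entry keys (`type`, `path`, `length`) are assumptions standing in for real ADLS Gen1 listings.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

# Hypothetical directory tree; a real client would call the storage API.
TREE = {
    "/data": [
        {"type": "DIRECTORY", "path": "/data/2026", "length": 0},
        {"type": "FILE", "path": "/data/readme.txt", "length": 12},
    ],
    "/data/2026": [
        {"type": "DIRECTORY", "path": "/data/2026/04", "length": 0},
        {"type": "FILE", "path": "/data/2026/empty.csv", "length": 0},
    ],
    "/data/2026/04": [
        {"type": "FILE", "path": "/data/2026/04/part-0.csv", "length": 2048},
    ],
}


def list_dir(path: str) -> list[dict]:
    """Stand-in for a single (slow) directory listing call."""
    return TREE.get(path, [])


def walk(root: str, max_workers: int = 8) -> list[str]:
    files = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {pool.submit(list_dir, root)}
        while pending:
            # Wake up as soon as any listing finishes, then fan out further.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                for entry in future.result():
                    if entry["type"] == "DIRECTORY":
                        pending.add(pool.submit(list_dir, entry["path"]))
                    elif entry["length"] > 0:  # skip zero-length files
                        files.append(entry["path"])
    return sorted(files)


found = walk("/data")
print(found)  # ['/data/2026/04/part-0.csv', '/data/readme.txt']
```

Results are collected only on the calling thread, so no lock is needed around `files`; the worker threads do nothing but list directories.
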
- `AdlsGen1Client._walk()` rewritten to use `ThreadPoolExecutor` with `concurrent.futures.wait(..., return_when=FIRST_COMPLETED)`. Each directory is listed in parallel (default `max_workers=8`), with per-directory timing logged at debug level.
- Zero-length files are now skipped.
- `FileInfo` gains a `raw: dict` field preserving the original ADLS entry for debug display.
- `list_relations_without_caching` in `impl.py` also gains per-step timing instrumentation.

5. Per-model AU and priority support
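The "set, then read and clear" pattern for per-call overrides can be sketched as follows. This is an illustration under assumptions, not the adapter's implementation: the `ConnectionHandle` class, its default values, and `take_job_settings()` are hypothetical; only the `_next_job_au` and `_next_job_priority` field names and the setter names come from this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectionHandle:
    default_au: int = 100          # assumed default, for illustration only
    default_priority: int = 1000   # assumed default, for illustration only
    _next_job_au: Optional[int] = None
    _next_job_priority: Optional[int] = None

    def set_next_job_au(self, au: int) -> None:
        self._next_job_au = au

    def set_next_job_priority(self, priority: int) -> None:
        self._next_job_priority = priority

    def take_job_settings(self) -> dict:
        """Resolve settings for one job, then clear the overrides."""
        au = self.default_au if self._next_job_au is None else self._next_job_au
        priority = (
            self.default_priority
            if self._next_job_priority is None
            else self._next_job_priority
        )
        # Clearing here keeps one model's override from leaking into the next.
        self._next_job_au = None
        self._next_job_priority = None
        return {"au": au, "priority": priority}


handle = ConnectionHandle()
handle.set_next_job_au(50)
first = handle.take_job_settings()
second = handle.take_job_settings()
print(first)   # {'au': 50, 'priority': 1000}
print(second)  # {'au': 100, 'priority': 1000}
```

Clearing on read means a model that does not specify `au` or `priority` falls back to the defaults even when the previous model on the same connection set an override.
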
- `ScopeConnectionHandle` gains `_next_job_au` and `_next_job_priority` fields.
- `ScopeAdapter` exposes `set_next_job_au()` and `set_next_job_priority()` as `@available` methods.
- `ScopeConnectionManager.execute()` reads and clears these per-call overrides.
- `table.sql` and `incremental.sql` macros call the new setters when the model config specifies `au` or `priority`.

6. README and integration test updates
- README updated to the `delta_table_columns`/`extract_columns` syntax with the `partition_column_in_extract` flag.
- Integration test models (`append_no_delete.sql`, `filtered_edition.sql`) updated to use the new config keys.

Test
- `AdlsGen1Client` parallel walk (`test_adls_gen1_client.py`).
- Script builder (`test_script_builder.py`).
- `delta_table_columns`/`extract_columns` config.