Description
First of all, I am really enjoying this tool! Unfortunately, I have come across a bug that is blocking a rollout to the wider team, so I am hoping there is a quick fix!
Describe the bug
When comparing two piperider reports, the only warning printed is 'bigquery' and no comparison summary is generated.
$ piperider compare-reports --base /target_prod_profile/run.json --target /outputs/latest/run.json --output ./comparison_report
────────────────────────────── Comparison report ───────────────────────────────
Selected reports:
Base: /target_prod_profile/run.json
Target: /outputs/pre-release-20231206213745/run.json
Warning:
'bigquery'
Got problem to generate changeset.
Comparison report:
/data_warehouse/comparison_report/index.html
Reproduce
Unfortunately, I cannot provide the manifest JSON files, but I will do my best to describe the issue and the steps taken.
- I generate a run report on production, covering all models in prod
dbt compile -t prod
piperider run --dbt-target prod --debug --report-dir $CI_PROJECT_DIR
- When I open an MR, I run dbt run in staging on the modified and new models only
dbt run --fail-fast -t pre-release --select "state:modified.body+ state:modified.configs+ state:new+" --defer --state /target_prod
- I create the staging piperider run report only on the models above
piperider run --select "state:modified.body+ state:modified.configs+ state:new+" --state /target_prod --dbt-target pre-release --debug --report-dir $CI_PROJECT_DIR
- I then compare the two reports
piperider compare-reports --base /target_prod_profile/run.json --target /outputs/latest/run.json --output ./comparison_report
What's strange is that the diff summary report works for some MRs but not others. I have tried to find the common trait but I am unable to.
The MR and subsequent report comparison that works is a very simple test case where I add a text column to an existing table with a constant value e.g.
...
"apples" as fruit,
...
Looking at the comparison report, row and column information is recorded for both base (production) and target (staging).
What I have tried
- Tried locally and in my CI environment for the failed cases
- Tested the run.json of the MR that did work above both locally and in the CI pipeline and it still works
- Tried different combinations of adding cols, removing cols, enforcing data contracts etc
- piperider diagnose passes
I attached a debugger and tried to figure out what was going on.
- The GraphDataChangeSet object fails to be created
- This is due to the function call for list_changes_in_unique_id failing
- This fails because, when it invokes the dbt task, the AdapterContainer's lookup_adapter function is called, which attempts to extract the bigquery adapter using the bigquery key and raises a KeyError for bigquery (hence the warning printed on the console)
def lookup_adapter(self, adapter_name: str) -> Adapter:
    return self.adapters[adapter_name]
Relevant code linked here
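To illustrate why the console shows nothing more than 'bigquery', here is a minimal sketch of the failure mode. It assumes the lookup is a plain dict access, as in the snippet above, and that the adapter was never registered in the container at the point the dbt task runs. The AdapterContainer class below is a hypothetical stand-in and describe_lookup_failure is a made-up helper; neither is dbt's or piperider's real code.

```python
class AdapterContainer:
    """Hypothetical stand-in for the adapter registry, not the real dbt class."""

    def __init__(self):
        # Maps adapter name -> adapter object; empty until something registers one.
        self.adapters = {}

    def lookup_adapter(self, adapter_name):
        # Same shape as the quoted snippet: a plain dict lookup with no fallback.
        return self.adapters[adapter_name]


def describe_lookup_failure(container, adapter_name):
    """Return the text a caller would see if it printed the caught exception."""
    try:
        container.lookup_adapter(adapter_name)
    except KeyError as exc:
        # str() of a KeyError is the repr of the missing key, quotes included.
        return str(exc)
    return None


container = AdapterContainer()
print(describe_lookup_failure(container, "bigquery"))  # 'bigquery'

# Once an adapter is registered under that key, the lookup succeeds.
container.adapters["bigquery"] = object()
print(describe_lookup_failure(container, "bigquery"))  # None
```

If this is what is happening, the open question is why the bigquery adapter is missing from the container for some MRs and present for others.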
Expected behavior
A diff summary report should be generated for the dbt models that have changed.
Example output from the successful MR comparison:
Selected reports:
Base: /target_prod_profile/run.json
Target: /outputs/pre-release-20231206165611/run.json
Impact Summary:
Code Changes: added=0, removed=0, modified=2
Resource Impact: potentially_impacted=7, assessed=7, skipped=0, impacted=5
Comparison report:
/data_warehouse/comparison_report/index.html
Comparison summary:
/data_warehouse/comparison_report/summary.md
Desktop (please complete the following information):
- OS: macOS local env and ubuntu in CI env
- Python version: 3.10
- piperider version: v0.41
- dbt-core 1.7.2
- dbt-bigquery 1.7.2