Feature/dbt producer compatibility test #180
- Add atomic test runner with CLI interface and validation
- Add OpenLineage event generation and PIE framework integration
- Add scenario-based testing structure for csv_to_duckdb_local
- Include comprehensive documentation and maintainer info
- Add gitignore exclusions for local artifacts and sensitive files

This implements a complete dbt producer compatibility test that validates:

- OpenLineage event generation from dbt runs
- Event schema compliance using PIE framework validation
- Column lineage, schema, and SQL facet extraction
- Community-standard directory structure and documentation

Signed-off-by: roller100 (BearingNode) <[email protected]>
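As a rough illustration of what "event schema compliance" means at its simplest, the sketch below checks one event for the top-level fields the OpenLineage specification marks as required on a run event. It is not the PIE framework validation used in this PR; the function name and the reported field labels are hypothetical.

```python
# Top-level fields the OpenLineage spec marks as required on a RunEvent.
REQUIRED_TOP_LEVEL = ("eventTime", "producer", "schemaURL", "run", "job")


def missing_required_fields(event: dict) -> list[str]:
    """Return the names of required fields that are absent from one event."""
    # Hypothetical helper for illustration; not part of the PR's test runner.
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in event]
    # A run must carry a runId; a job must carry a namespace and a name.
    if "runId" not in event.get("run", {}):
        missing.append("run.runId")
    for key in ("namespace", "name"):
        if key not in event.get("job", {}):
            missing.append(f"job.{key}")
    return missing
```

An empty list means the event passed this minimal check. Full validation against the published JSON schema covers far more, including facet structure.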
…mentation version testing framework as future Signed-off-by: roller100 (BearingNode) <[email protected]>
…enhance documentation, and improve testing framework Signed-off-by: roller100 (BearingNode) <[email protected]>
… testing and recommendations Signed-off-by: roller100 (BearingNode) <[email protected]>
d0c3ace to 6d9938a
Hi @roller100, thanks a lot for the contribution! I'll review it and reach out about integrating it into our workflows :)
- Add producer_dbt.yml workflow for automated CI/CD testing
- Add run-scenario command to CLI for per-scenario event generation
- Update releases.json to include dbt version tracking
- Fix requirements.txt syntax for pip compatibility

The workflow follows the official OpenLineage compatibility test framework:

- Uses get_valid_test_scenarios.sh for version-based scenario filtering
- Generates events in per-scenario directories as individual JSON files
- Integrates with run_event_validation action for syntax/semantic validation
- Produces standardized test reports for compatibility tracking

This addresses Steering Committee feedback on PR OpenLineage#180 to integrate dbt producer tests with GitHub Actions workflows.
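The "individual JSON files in per-scenario directories" step can be pictured with the sketch below. It assumes the captured events arrive as one JSON object per line in a single file; the function name and the event-NNNN.json naming are hypothetical and not taken from the PR's run-scenario command.

```python
import json
from pathlib import Path


def split_events(events_file: Path, scenario_dir: Path) -> list[Path]:
    """Write each event from a JSON-lines file to its own file in scenario_dir."""
    # Assumption: one JSON event per line; blank lines are skipped.
    scenario_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    with events_file.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            event = json.loads(line)
            # Hypothetical naming scheme: event-0001.json, event-0002.json, ...
            target = scenario_dir / f"event-{len(written) + 1:04d}.json"
            target.write_text(json.dumps(event, indent=2), encoding="utf-8")
            written.append(target)
    return written
```

Keeping one event per file lets a validation step report failures per event rather than per run.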
Merges workflow integration from test branch including:

- GitHub Actions workflow (producer_dbt.yml)
- Test runner enhancements (run-scenario command)
- Version management (versions.json)
- dbt test fixes (all 15 tests passing)
- Release tracking updates

Addresses Steering Committee feedback on PR OpenLineage#180 for automated CI/CD testing.
Add version matrix and scenario constraints:

- versions.json: Define testable dbt-core (1.8.0) and OpenLineage (1.23.0) versions
- config.json: Add component_versions and openlineage_versions to scenario

Tested with get_valid_test_scenarios.sh - scenario correctly detected.

These version constraints allow the workflow to filter which scenarios run for given version combinations, matching the pattern used by spark_dataproc and hive_dataproc producers.

Signed-off-by: roller100 (BearingNode) <[email protected]>
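To make the filtering idea concrete, here is a minimal sketch of how a scenario's version constraints might be checked. The keys component_versions and openlineage_versions come from the commit message; the inner min/max layout and all function names are assumptions, and this is not the logic of get_valid_test_scenarios.sh.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '1.8.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def in_range(version: str, bounds: dict) -> bool:
    """Check a version against optional 'min' and 'max' bounds (assumed keys)."""
    current = parse_version(version)
    low, high = bounds.get("min"), bounds.get("max")
    if low is not None and current < parse_version(low):
        return False
    if high is not None and current > parse_version(high):
        return False
    return True


def scenario_applies(config: dict, component_version: str, openlineage_version: str) -> bool:
    """A scenario runs only if both versions fall inside its declared bounds."""
    return in_range(component_version, config.get("component_versions", {})) and in_range(
        openlineage_version, config.get("openlineage_versions", {})
    )
```

With a hypothetical config of {"component_versions": {"min": "1.8.0"}, "openlineage_versions": {"min": "1.23.0"}}, the scenario would be selected for dbt-core 1.8.0 with OpenLineage 1.23.0 and skipped for dbt-core 1.7.9.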
- Add producer_dbt.yml workflow for automated CI/CD testing
- Add run-scenario command to CLI for per-scenario event generation
- Update releases.json to include dbt version tracking
- Fix requirements.txt syntax for pip compatibility

The workflow follows the official OpenLineage compatibility test framework:

- Uses get_valid_test_scenarios.sh for version-based scenario filtering
- Generates events in per-scenario directories as individual JSON files
- Integrates with run_event_validation action for syntax/semantic validation
- Produces standardized test reports for compatibility tracking

This addresses Steering Committee feedback on PR OpenLineage#180 to integrate dbt producer tests with GitHub Actions workflows.

Signed-off-by: roller100 (BearingNode) <[email protected]>
Add 'schema: main' to source definition so dbt tests can find the seed tables. Without this, source tests were looking for tables in a non-existent 'raw_data' schema, causing 7 test failures. Result: All 15 dbt tests now pass (PASS=15 WARN=0 ERROR=0) Signed-off-by: roller100 (BearingNode) <[email protected]>
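The failure mode described here follows from dbt's default: when a source definition omits schema, dbt uses the source's name as the schema. The sketch below models that rule only; the helper name is hypothetical, and the source name raw_data is inferred from the commit message rather than read from the project files.

```python
def resolve_source_schema(source: dict) -> str:
    """Mimic dbt's rule: use the explicit schema, else fall back to the source name."""
    return source.get("schema") or source["name"]


# Before the fix: no schema given, so lookups target the source name.
before = {"name": "raw_data"}
# After the fix: the explicit schema points at where the seed tables live.
after = {"name": "raw_data", "schema": "main"}

print(resolve_source_schema(before))
print(resolve_source_schema(after))
```

The first call resolves to raw_data and the second to main, which matches the before and after behaviour the commit describes.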
dcfd892 to d80a8cb
Trivial change to test GitHub Actions dbt workflow execution. Signed-off-by: roller100 (BearingNode) <[email protected]>
Enable workflow_dispatch to allow manual testing of dbt workflow. Signed-off-by: roller100 (BearingNode) <[email protected]>
[TEST] dbt Workflow Validation
Trivial change to test dbt workflow with complete integration. Signed-off-by: roller100 (BearingNode) <[email protected]>
[TEST] Validate dbt Workflow
Add missing dbt file change detection in pull request workflow. This enables the dbt job to trigger when dbt producer files are modified. Signed-off-by: roller100 (BearingNode) <[email protected]>
- Add workflow_dispatch event with component selection input
- Support manual workflow execution without requiring PRs
- Add conditional logic to handle both PR and manual triggers
- Add dbt job definition with matrix strategy
- Add dbt to collect-and-compare-reports dependencies

This eliminates the need for internal test branches and PRs. Testing can now be done directly on the feature branch with:

gh workflow run main_pr.yml --ref feature/dbt-producer-compatibility-test

Resolves workflow testing complexity and branch proliferation issues.
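The conditional logic for the two triggers can be summarised in a few lines. This is a sketch of the decision only, written in Python rather than workflow syntax; the function name, the "all" option, and the use of the producer/dbt/ path prefix for change detection are assumptions, not the contents of main_pr.yml.

```python
def should_run_dbt(event_name: str, changed_files: tuple[str, ...] = (), component: str = "") -> bool:
    """Decide whether the dbt job should run for a given trigger."""
    if event_name == "workflow_dispatch":
        # Manual run: honour the component selection input ("all" is hypothetical).
        return component in ("dbt", "all")
    if event_name == "pull_request":
        # PR run: trigger only when dbt producer files were modified.
        return any(path.startswith("producer/dbt/") for path in changed_files)
    return False
```

Separating the manual and PR branches keeps a manual run from depending on file-change detection, which is what allows testing directly on the feature branch.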
Update: Switching to PostgreSQL for Enhanced CI/CD Integration

During workflow validation testing, we discovered that the current DuckDB-based scenario is incompatible with the

Issue Identified

Revised Approach: PostgreSQL with GitHub Actions Service Containers

We're switching to PostgreSQL to align with automated testing best practices and ensure full compatibility:

Benefits:

Implementation:

This change strengthens the test by ensuring it runs reliably in automated CI/CD environments while maintaining the same test coverage for OpenLineage event generation. Will update the PR shortly with the PostgreSQL-based implementation.
Hello OpenLineage Maintainers,
This pull request introduces a new compatibility test for the openlineage-dbt producer.

Contribution

This test provides a standardized framework for validating that the openlineage-dbt integration generates events that are compliant with the OpenLineage specification. It covers both dbt run and dbt test operations, including validation for key facets like columnLineage and dataQualityAssertions.

The test is self-contained and designed for reproducibility. It uses a local dbt project with a DuckDB backend and captures events using the file transport for validation.
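A facet check of the kind described above might look like the following sketch. It collects the facet names attached to an event's input and output datasets so a test can assert that columnLineage or dataQualityAssertions is present. The function name is hypothetical, and this is an illustration rather than the validation code in this PR.

```python
def dataset_facet_names(event: dict) -> set[str]:
    """Collect the names of all facets attached to an event's datasets."""
    names: set[str] = set()
    for side in ("inputs", "outputs"):
        for dataset in event.get(side, []):
            # Datasets carry general facets plus input- or output-specific ones.
            for group in ("facets", "inputFacets", "outputFacets"):
                names.update(dataset.get(group, {}))
    return names
```

A test could then assert, for example, that "columnLineage" appears in the result for an event emitted by dbt run, and "dataQualityAssertions" for one emitted by dbt test.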
Documentation
We have included comprehensive documentation to explain the architecture, workflow, and validation scope. For a full overview, please see the new README.md in the producer/dbt/ directory.

A detailed facet-by-facet coverage analysis is also available in producer/dbt/SPECIFICATION_COVERAGE_ANALYSIS.md.

Future Work

Additionally, this contribution includes design documents in the producer/dbt/future/ directory that explore a potential approach for multi-spec and multi-implementation testing. We hope this can serve as a useful starting point for community discussions on this topic.

We believe this test will be a valuable addition to the OpenLineage compatibility testing suite and look forward to your feedback.
Thank you,
The BearingNode Team