
Conversation

@roller100

Hello OpenLineage Maintainers,

This pull request introduces a new compatibility test for the openlineage-dbt producer.

Contribution

This test provides a standardized framework for validating that the openlineage-dbt integration generates events that are compliant with the OpenLineage specification. It covers both dbt run and dbt test operations, including validation for key facets like columnLineage and dataQualityAssertions.
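As a rough illustration of what such facet validation involves, the sketch below checks whether a captured event carries a given output-dataset facet. The event stub and the helper function are hand-written for this example, not taken from the test code itself:

```python
# Minimal sketch of the kind of facet check the compatibility test performs.
# The event below is a hand-written stub, not real test output.
event = {
    "eventType": "COMPLETE",
    "outputs": [
        {
            "name": "stg_customers",
            "facets": {"columnLineage": {"fields": {"id": {"inputFields": []}}}},
        }
    ],
}

def has_output_facet(event, facet_name):
    """Return True if any output dataset of the event carries the named facet."""
    return any(facet_name in ds.get("facets", {}) for ds in event.get("outputs", []))

print(has_output_facet(event, "columnLineage"))  # prints True
```

The actual test validates events against the OpenLineage JSON schema rather than ad-hoc checks like this one.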

The test is self-contained and designed for reproducibility. It uses a local dbt project with a DuckDB backend and captures events using the file transport for validation.
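For reference, capturing events with the file transport is typically configured through an OpenLineage client config file. A sketch (the path and file name used by the actual test project may differ):

```yaml
# Illustrative openlineage.yml; log_file_path is a placeholder.
transport:
  type: file
  log_file_path: ./events/openlineage-events.json
  append: true
```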

Documentation

We have included comprehensive documentation to explain the architecture, workflow, and validation scope. For a full overview, please see the new README.md in the producer/dbt/ directory.

A detailed facet-by-facet coverage analysis is also available in producer/dbt/SPECIFICATION_COVERAGE_ANALYSIS.md.

Future Work

This contribution also includes design documents in the producer/dbt/future/ directory that explore a potential approach to multi-spec and multi-implementation testing. We hope they can serve as a useful starting point for community discussion on this topic.

We believe this test will be a valuable addition to the OpenLineage compatibility testing suite and look forward to your feedback.

Thank you,
The BearingNode Team

roller100 (BearingNode) added 4 commits September 22, 2025 11:19
- Add atomic test runner with CLI interface and validation
- Add OpenLineage event generation and PIE framework integration
- Add scenario-based testing structure for csv_to_duckdb_local
- Include comprehensive documentation and maintainer info
- Add gitignore exclusions for local artifacts and sensitive files

This implements a complete dbt producer compatibility test that validates:
- OpenLineage event generation from dbt runs
- Event schema compliance using PIE framework validation
- Column lineage, schema, and SQL facet extraction
- Community-standard directory structure and documentation

Signed-off-by: roller100 (BearingNode) <[email protected]>
…mentation version testing framework as future

Signed-off-by: roller100 (BearingNode) <[email protected]>
…enhance documentation, and improve testing framework

Signed-off-by: roller100 (BearingNode) <[email protected]>
… testing and recommendations

Signed-off-by: roller100 (BearingNode) <[email protected]>
@roller100 roller100 force-pushed the feature/dbt-producer-compatibility-test branch from d0c3ace to 6d9938a on September 22, 2025 10:20
@tnazarew
Collaborator

hi @roller100, thanks a lot for the contribution!! I'll review it and reach out about integrating it into our workflows :)

roller100 pushed a commit to BearingNode/openlineage-compatibility-tests that referenced this pull request Nov 18, 2025
- Add producer_dbt.yml workflow for automated CI/CD testing
- Add run-scenario command to CLI for per-scenario event generation
- Update releases.json to include dbt version tracking
- Fix requirements.txt syntax for pip compatibility

The workflow follows the official OpenLineage compatibility test framework:
- Uses get_valid_test_scenarios.sh for version-based scenario filtering
- Generates events in per-scenario directories as individual JSON files
- Integrates with run_event_validation action for syntax/semantic validation
- Produces standardized test reports for compatibility tracking

This addresses Steering Committee feedback on PR OpenLineage#180 to integrate
dbt producer tests with GitHub Actions workflows.
roller100 pushed a commit to BearingNode/openlineage-compatibility-tests that referenced this pull request Nov 18, 2025
Merges workflow integration from test branch including:
- GitHub Actions workflow (producer_dbt.yml)
- Test runner enhancements (run-scenario command)
- Version management (versions.json)
- dbt test fixes (all 15 tests passing)
- Release tracking updates

Addresses Steering Committee feedback on PR OpenLineage#180 for automated CI/CD testing.
roller100 (BearingNode) added 3 commits November 18, 2025 12:04
Add version matrix and scenario constraints:
- versions.json: Define testable dbt-core (1.8.0) and OpenLineage (1.23.0) versions
- config.json: Add component_versions and openlineage_versions to scenario

Tested with get_valid_test_scenarios.sh; the scenario is correctly detected.

These version constraints allow the workflow to filter which scenarios
run for given version combinations, matching the pattern used by
spark_dataproc and hive_dataproc producers.
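The filtering idea can be sketched as follows. Note that the real filtering is done by get_valid_test_scenarios.sh, and the exact matching rules shown here are an assumption for illustration:

```python
# Illustrative version-filtering logic; the real check lives in
# get_valid_test_scenarios.sh and may differ in detail.
def scenario_is_valid(scenario, dbt_version, openlineage_version):
    """A scenario runs only if both versions are declared testable for it."""
    return (
        dbt_version in scenario["component_versions"]
        and openlineage_version in scenario["openlineage_versions"]
    )

# Mirrors the versions declared in versions.json / config.json above.
scenario = {"component_versions": ["1.8.0"], "openlineage_versions": ["1.23.0"]}
print(scenario_is_valid(scenario, "1.8.0", "1.23.0"))  # prints True
print(scenario_is_valid(scenario, "1.7.0", "1.23.0"))  # prints False
```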

Signed-off-by: roller100 (BearingNode) <[email protected]>
- Add producer_dbt.yml workflow for automated CI/CD testing
- Add run-scenario command to CLI for per-scenario event generation
- Update releases.json to include dbt version tracking
- Fix requirements.txt syntax for pip compatibility

The workflow follows the official OpenLineage compatibility test framework:
- Uses get_valid_test_scenarios.sh for version-based scenario filtering
- Generates events in per-scenario directories as individual JSON files
- Integrates with run_event_validation action for syntax/semantic validation
- Produces standardized test reports for compatibility tracking

This addresses Steering Committee feedback on PR OpenLineage#180 to integrate
dbt producer tests with GitHub Actions workflows.

Signed-off-by: roller100 (BearingNode) <[email protected]>
Add 'schema: main' to source definition so dbt tests can find the
seed tables. Without this, source tests were looking for tables in
a non-existent 'raw_data' schema, causing 7 test failures.

Result: All 15 dbt tests now pass (PASS=15 WARN=0 ERROR=0)
Signed-off-by: roller100 (BearingNode) <[email protected]>
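The fix described above would look roughly like this in the project's sources file (the file name, source name, and table names here are illustrative, not taken from the repository):

```yaml
# models/sources.yml (illustrative names)
version: 2
sources:
  - name: raw_data
    schema: main   # point source tests at the schema where the seeds land
    tables:
      - name: customers
      - name: orders
```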
@roller100 roller100 force-pushed the feature/dbt-producer-compatibility-test branch from dcfd892 to d80a8cb on November 18, 2025 12:04
roller100 (BearingNode) and others added 3 commits November 18, 2025 12:26
Trivial change to test GitHub Actions dbt workflow execution.

Signed-off-by: roller100 (BearingNode) <[email protected]>
Enable workflow_dispatch to allow manual testing of dbt workflow.

Signed-off-by: roller100 (BearingNode) <[email protected]>
Trivial change to test dbt workflow with complete integration.

Signed-off-by: roller100 (BearingNode) <[email protected]>
roller100 and others added 3 commits November 18, 2025 13:04
Add missing dbt file change detection in pull request workflow.
This enables the dbt job to trigger when dbt producer files are modified.

Signed-off-by: roller100 (BearingNode) <[email protected]>
- Add workflow_dispatch event with component selection input
- Support manual workflow execution without requiring PRs
- Add conditional logic to handle both PR and manual triggers
- Add dbt job definition with matrix strategy
- Add dbt to collect-and-compare-reports dependencies

This eliminates the need for internal test branches and PRs.
Testing can now be done directly on feature branch with:
  gh workflow run main_pr.yml --ref feature/dbt-producer-compatibility-test

Resolves workflow testing complexity and branch proliferation issues.
@roller100
Author

Update: Switching to PostgreSQL for Enhanced CI/CD Integration

During workflow validation testing, we discovered that the current DuckDB-based scenario is incompatible with the dbt-ol (OpenLineage dbt integration) in automated CI/CD environments.

Issue Identified

dbt-ol does not support the DuckDB adapter. While dbt itself works perfectly with DuckDB for local development, the OpenLineage integration layer fails during event generation:

NotImplementedError: Only bigquery, snowflake, redshift, spark, postgres, 
databricks, sqlserver, dremio, athena adapters are supported. Passed duckdb

Revised Approach: PostgreSQL with GitHub Actions Service Containers

We're switching to PostgreSQL to align with automated testing best practices and ensure full compatibility:

Benefits:

  • Fully supported by dbt-ol - PostgreSQL is a tier-1 supported adapter
  • Zero external infrastructure - Uses GitHub Actions service containers (no secrets, no external databases)
  • Self-contained testing - PostgreSQL runs locally in the workflow runner
  • Consistent with existing patterns - Aligns with spark_dataproc producer approach
  • Easier to reproduce - Standard PostgreSQL setup anyone can replicate

Implementation:

  • Add PostgreSQL service container to producer_dbt.yml workflow
  • Update dbt profiles to use postgres adapter with localhost connection
  • Keep same CSV seeds and model transformations (just executing in PostgreSQL)
  • All OpenLineage events still captured to local file transport
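A minimal sketch of what the planned service container in producer_dbt.yml could look like (job name, image tag, and credentials are placeholders, not the final implementation):

```yaml
jobs:
  dbt-producer-test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_USER: dbt
          POSTGRES_PASSWORD: dbt
          POSTGRES_DB: dbt_test
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
```

The dbt profile would then target localhost:5432 with the same credentials.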

This change strengthens the test by ensuring it runs reliably in automated CI/CD environments while maintaining the same test coverage for OpenLineage event generation.

Will update the PR shortly with the PostgreSQL-based implementation.

@roller100 roller100 closed this Nov 18, 2025