Contribute Openlineage to dbt-core
#11688
base: main
Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA. In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above. If you have questions about the CLA, or if you believe you've received this message in error, please reach out through a comment on this PR. CLA has not been signed by users: @MassyB
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.
```python
def add_to_parser(self, parser: OptionParser, ctx: Context):
def parser_process(value: str, state: ParsingState):
@t.no_type_check
```
My pre-commit mypy step was failing on this, so I added some type annotations and mypy ignore comments.
```python
ol_handler = OpenLineageHandler(ctx)
callbacks = ctx.obj.get("callbacks", [])
if is_runnable_dbt_command(flags):
    callbacks.append(ol_handler.handle)
```
This is where the OL callback is added.
```python
)


ALL_PROTO_TYPES: Dict[str, Any] = {
```
Useful for converting a dict into an actual type defined in the proto.
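For readers unfamiliar with the pattern: a registry like `ALL_PROTO_TYPES` lets you look up a message class by name and then fill it from a plain dict (with protobuf, `google.protobuf.json_format.ParseDict(d, msg_cls())` does the filling, as in the `return ParseDict(e, msg_cls())` excerpt in this PR). The sketch below is a stdlib-only stand-in, not the PR's code: it swaps protobuf classes for dataclasses, and `NodeStart`, `NodeFinished`, `MESSAGE_TYPES`, and `dict_to_message` are hypothetical names.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Type


# Hypothetical message types standing in for generated protobuf classes.
@dataclass
class NodeStart:
    unique_id: str
    node_status: str


@dataclass
class NodeFinished:
    unique_id: str
    node_status: str
    execution_time: float


# Registry keyed by class name, similar in spirit to ALL_PROTO_TYPES.
MESSAGE_TYPES: Dict[str, Type[Any]] = {
    cls.__name__: cls for cls in (NodeStart, NodeFinished)
}


def dict_to_message(name: str, payload: Dict[str, Any]) -> Any:
    """Look up the class registered under `name` and build it from `payload`.

    Raises KeyError for unknown names. Keys the class does not declare are
    dropped (a choice made for this sketch only).
    """
    msg_cls = MESSAGE_TYPES[name]
    known = {f.name for f in fields(msg_cls)}
    return msg_cls(**{k: v for k, v in payload.items() if k in known})
```

Dropping unknown keys keeps the sketch lenient; note that protobuf's `ParseDict` raises on unknown fields unless called with `ignore_unknown_fields=True`.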
```python
return f"Artifacts skipped for command : {self.msg}"


class OpenLineageException(WarnLevel):
```
Now that all the events have been moved to https://github.com/dbt-labs/proto-python-public, how do we add a new event? The documentation still references core_types.proto, but I couldn't find it.
@@ -0,0 +1,410 @@
```python
import traceback
```
This is the most important part of the PR: it is where we construct OL events from the dbt structured logs.
```python
"pydantic<2",
# ----
# OpenLineage Dependencies
"openlineage-python==1.30.1",
```
Only the Python client is added from OL.
```python
return ParseDict(e, msg_cls())


def assert_ol_events_match(expected_event, actual_event):
```
This is the main function used in functional tests to assert that two sets of events are the same.
The interesting part is the regex-like feature: a pattern such as {{ .* }} can be used to match a given string.
To use a regex, enclose it like so:
{{<space><YOUR-REGEX-HERE><space>}}
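A minimal sketch of how such a regex-aware comparison can work, written for illustration and not taken from the PR: it walks the expected JSON recursively, treats any string of the form `{{ <regex> }}` as a pattern that must fully match the actual string, and reports every discrepancy as a JSON-path-like location. `find_mismatches` is a hypothetical name, and this sketch only checks keys that appear in the expected event; whether the PR's helper is stricter is not shown here.

```python
import re
from typing import Any, List

# Expected strings of the form "{{ <regex> }}" are treated as patterns.
PATTERN = re.compile(r"\{\{ (.*) \}\}", re.DOTALL)


def find_mismatches(expected: Any, actual: Any, path: str = "$") -> List[str]:
    """Return JSON-path-like locations of all discrepancies (empty = match)."""
    errors: List[str] = []
    if isinstance(expected, dict):
        if not isinstance(actual, dict):
            return [f"{path}: expected an object, got {actual!r}"]
        for key, exp_value in expected.items():
            if key not in actual:
                errors.append(f"{path}.{key}: missing")
            else:
                errors.extend(find_mismatches(exp_value, actual[key], f"{path}.{key}"))
        return errors
    if isinstance(expected, list):
        if not isinstance(actual, list) or len(actual) != len(expected):
            return [f"{path}: expected a list of {len(expected)} items"]
        for i, (exp_item, act_item) in enumerate(zip(expected, actual)):
            errors.extend(find_mismatches(exp_item, act_item, f"{path}[{i}]"))
        return errors
    if isinstance(expected, str):
        match = PATTERN.fullmatch(expected)
        if match is not None:
            regex = match.group(1)
            if isinstance(actual, str) and re.fullmatch(regex, actual):
                return []
            return [f"{path}: {actual!r} does not match /{regex}/"]
    if expected != actual:
        return [f"{path}: expected {expected!r}, got {actual!r}"]
    return []
```

An `assert_*` wrapper can then be as small as `assert not errors, "\n".join(errors)`, which surfaces every mismatching path at once instead of stopping at the first one.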
```python
try:
    self.handle_unsafe(e)
except Exception as exception:
    self._handle_exception(exception)
```
All exceptions related to OL are non-critical: they don't make dbt fail.
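The pattern here is a wrapper that catches every exception from the OL code path and reports it instead of re-raising, so a lineage failure cannot fail the dbt run. A minimal stdlib sketch, assuming Python's `logging` as a stand-in for dbt's `fire_event(OpenLineageException(...))`; `SafeHandler` is a hypothetical name.

```python
import logging
import traceback
from typing import Any, Callable

logger = logging.getLogger("openlineage_sketch")


class SafeHandler:
    """Wrap a handler so any exception is logged as a warning and swallowed."""

    def __init__(self, handle_unsafe: Callable[[Any], None]) -> None:
        self._handle_unsafe = handle_unsafe

    def handle(self, event: Any) -> None:
        try:
            self._handle_unsafe(event)
        except Exception as exc:  # deliberately broad: OL failures are non-critical
            self._handle_exception(exc)

    def _handle_exception(self, exc: Exception) -> None:
        # Called from inside the except block, so format_exc() still sees
        # the active exception. Stand-in for dbt's fire_event(...).
        logger.warning("OpenLineage error: %s\n%s", exc, traceback.format_exc())
```

Catching `Exception` (not `BaseException`) keeps `KeyboardInterrupt` and `SystemExit` working, so users can still stop a run.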
```python
    self._handle_exception(exception)

def _handle_exception(self, e: Exception):
    fire_event(OpenLineageException(exc=str(e), exc_info=traceback.format_exc()))
```
I need help adding this new event, OpenLineageException, following the new public proto.
@@ -0,0 +1,1010 @@
```json
[
```
This is an example of the generated OL events.
@@ -0,0 +1,1010 @@
```json
[
  {
    "eventTime":"{{ .* }}",
```
Regexes have to be written as follows:
{{<space><REGEX><space>}}
Signed-off-by: Massy Bourennani <[email protected]>
Resolves #11750
TL;DR
This PR integrates dbt with OpenLineage. It unlocks lineage tracking and observability for dbt pipelines.
OpenLineage is an open-source standard. From its main page:
OpenLineage (OL) defines events according to a specification. This PR constructs those OL events by consuming dbt's structured logs and sends them to an endpoint.
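To make the idea concrete, here is a hedged sketch of turning one dbt structured log line into an OpenLineage-style run event. Assumptions: it reads the JSON form of the logs (as produced with `--log-format json`), using the `info.name`, `info.ts`, `info.invocation_id`, and `data.node_info` fields; inside dbt-core, callbacks receive `EventMsg` objects rather than dicts. The status-to-state mapping, the producer URI, and the name `to_ol_event` are hypothetical, and the event carries only a subset of the RunEvent fields (no `schemaURL`, no facets).

```python
import uuid
from typing import Any, Dict, Optional

# Hypothetical producer URI; a real integration would identify itself.
PRODUCER = "https://example.com/dbt-openlineage-sketch"
# Assumed mapping from dbt node status to OpenLineage run state.
FINISHED_STATES = {"success": "COMPLETE", "pass": "COMPLETE", "error": "FAIL", "fail": "FAIL"}


def to_ol_event(log: Dict[str, Any], namespace: str) -> Optional[Dict[str, Any]]:
    """Map one structured log line to an OL-style run event dict, or None."""
    info, data = log["info"], log["data"]
    name = info["name"]
    if name == "NodeStart":
        event_type = "START"
    elif name == "NodeFinished":
        status = data["node_info"]["node_status"]
        event_type = FINISHED_STATES.get(status, "OTHER")
    else:
        return None  # not an event this sketch handles

    unique_id = data["node_info"]["unique_id"]
    # Stable run id: START and COMPLETE/FAIL of one node in one invocation
    # produce the same runId, so a backend can pair them.
    run_id = uuid.uuid5(uuid.NAMESPACE_URL, f"{info['invocation_id']}/{unique_id}")
    return {
        "eventType": event_type,
        "eventTime": info["ts"],
        "run": {"runId": str(run_id)},
        "job": {"namespace": namespace, "name": unique_id},
        "producer": PRODUCER,
        "inputs": [],
        "outputs": [],
    }
```

Deriving the run id with `uuid5` (instead of `uuid4`) is what lets the start and finish events of the same node be correlated without keeping state between log lines.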
The endpoint that consumes OL events is fully configurable by the user: it can be Marquez, Datadog, or something else. The examples in this PR use Datadog.
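The openlineage-python client handles this through its own configurable transports (for example HTTP and console). The sketch below only illustrates the underlying idea, that the code producing events does not need to know where they go, using hypothetical classes (`Transport`, `ConsoleTransport`, `InMemoryTransport`, `Emitter`); it is not the client's API.

```python
import json
from typing import Any, Dict, List, Protocol


class Transport(Protocol):
    """Anything with an emit(event) method can receive OL events."""

    def emit(self, event: Dict[str, Any]) -> None: ...


class ConsoleTransport:
    """Print each event as one JSON line, handy for local debugging."""

    def emit(self, event: Dict[str, Any]) -> None:
        print(json.dumps(event, sort_keys=True))


class InMemoryTransport:
    """Collect events in a list, e.g. for tests."""

    def __init__(self) -> None:
        self.events: List[Dict[str, Any]] = []

    def emit(self, event: Dict[str, Any]) -> None:
        self.events.append(event)


class Emitter:
    """Send events through whichever transport was configured."""

    def __init__(self, transport: Transport) -> None:
        self.transport = transport

    def send(self, event: Dict[str, Any]) -> None:
        self.transport.emit(event)
```

Swapping Marquez for Datadog then becomes a configuration change (a different transport or URL) rather than a code change in the dbt integration.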
Problem
Let's build the jaffle shop project using the following command:
We get this output:
I've truncated the output, but:
This output doesn't tell us which SQL queries each model executed. We can use the `--debug` flag for that:
We get the following output:
Observability of the dbt pipeline is not ideal:
This PR is about enhancing the observability of dbt pipelines and addressing the shortcomings mentioned above.
Solution
Instead of relying on textual logs to report the progress of the dbt pipeline, this PR integrates dbt-core with OpenLineage, as has already been done for Apache Airflow.
Below are examples of how we use those OL events in Datadog to report on the progress of dbt pipelines.
When running:
In the waterfall view, we can see:
This is what we get when we build the entire jaffle shop project:
An interesting flame graph view, when the jaffle shop project is executed using two threads:
PR details
You can see a presentation of the integration in this short YouTube video (the relevant part is about 10 minutes long). Be sure to check the linked PRs for more context.
In a nutshell, this PR adds a callback that listens for specific dbt structured log events.
For each of those events, an OL event is generated and emitted.
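The listener described above can be sketched as a small dispatch table keyed by event name: log lines whose name has no registered builder are ignored, and whatever OL-style event a builder returns is emitted. This is an illustrative sketch, not the PR's handler; it assumes dict-shaped log lines (the JSON log format) rather than the `EventMsg` objects dbt-core passes to callbacks, and `StructuredLogListener` and `on` are hypothetical names.

```python
from typing import Any, Callable, Dict, Optional

LogLine = Dict[str, Any]
OLEvent = Dict[str, Any]
Builder = Callable[[LogLine], Optional[OLEvent]]


class StructuredLogListener:
    """Route structured log lines to per-event builders and emit the results."""

    def __init__(self, emit: Callable[[OLEvent], None]) -> None:
        self._emit = emit
        self._builders: Dict[str, Builder] = {}

    def on(self, event_name: str, builder: Builder) -> None:
        """Register a builder for one dbt event name."""
        self._builders[event_name] = builder

    def __call__(self, log: LogLine) -> None:
        builder = self._builders.get(log["info"]["name"])
        if builder is None:
            return  # not an event we listen for
        ol_event = builder(log)
        if ol_event is not None:
            self._emit(ol_event)
```

Because the listener is a plain callable, an instance can be appended to a list of callbacks just like a function.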
How to test
This PR adds functional tests that check the generated OL events against the expected ones.
You can run them by setting up a dev environment and executing the following command:
pytest "tests/functional/openlineage/openlineage_project.py"
If a test fails, you get a JSON-path-like pointer to the attribute that has a discrepancy.
For unit tests, you can run:
pytest "tests/unit/openlineage/"
Linked PRs/Issues
Additional context from the OpenLineage repository:
`test` and `build` commands OpenLineage/OpenLineage#3362
Checklist
PS
Perhaps the most important motivation for this PR: tell your dbt teammate @le-brice that Massy says hi.