Update Custom Jobs (OpenLineage) docs #37278
… reference, and dataset naming conventions Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…open-lineage-docs
…itional Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OliviaShoup
left a comment
thanks for the PR! this is a strong restructure and makes the page more readable
left some inline comments. also one bigger thing to consider:
now there are no worked examples with inputs/outputs. the page tells users to "include inputs and outputs in your event" for lineage edges, and the dataset-naming table explains the namespace/name formats, but the code examples had their inputs removed, so nothing actually demonstrates a dataset reference. the PR description mentions an "optional COMPLETE with datasets" example that doesn't appear on the page. maybe you can add a concrete snippet (a COMPLETE event, or an annotated inputs/outputs block) so readers have a model?
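A minimal sketch of the kind of snippet requested here: a `COMPLETE` event whose `inputs` and `outputs` each carry a dataset reference. It assumes the OpenLineage `RunEvent` shape (`eventType`, `eventTime`, `run`, `job`, `inputs`, `outputs`, `producer`, `schemaURL`) and the OpenLineage naming formats for Postgres and S3. The job namespace and name, the Postgres host and table, the S3 bucket, the `producer` URI, and the spec version in `schemaURL` are hypothetical placeholders.

```python
import json
import uuid
from datetime import datetime, timezone

# Assumption: spec version in the schema URL; check the current OpenLineage spec.
RUN_EVENT_SCHEMA = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"
# Hypothetical URI identifying the code that emits the events.
PRODUCER = "https://example.com/my-custom-job"


def dataset(namespace: str, name: str) -> dict:
    """A dataset reference: namespace identifies the data source, name the dataset in it."""
    return {"namespace": namespace, "name": name}


def complete_event(run_id: str, job_namespace: str, job_name: str,
                   inputs: list, outputs: list) -> dict:
    """Build a COMPLETE RunEvent that declares the lineage edges of a run."""
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": inputs,    # datasets the run read from
        "outputs": outputs,  # datasets the run wrote to
        "producer": PRODUCER,
        "schemaURL": RUN_EVENT_SCHEMA,
    }


if __name__ == "__main__":
    event = complete_event(
        run_id=str(uuid.uuid4()),  # in practice, reuse the runId sent with START
        job_namespace="my-namespace",
        job_name="daily_orders_export",
        # Hypothetical Postgres table: namespace postgres://{host}:{port}, name {database}.{schema}.{table}
        inputs=[dataset("postgres://db.example.com:5432", "shop.public.orders")],
        # Hypothetical S3 object: namespace s3://{bucket}, name is the object path
        outputs=[dataset("s3://analytics-bucket", "exports/orders")],
    )
    print(json.dumps(event, indent=2))
```

The datasets are plain `namespace`/`name` pairs, so the same helper covers any source once its naming format is known.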
| ## Step 1: Send a `START` event | ||
| Choose a method to send OpenLineage events to Datadog. All examples use the same `runId` UUID throughout the run—generate one and keep it. |
Datadog style avoids em dashes that join clauses able to stand alone:
Suggested change:
- Choose a method to send OpenLineage events to Datadog. All examples use the same `runId` UUID throughout the run—generate one and keep it.
+ Choose a method to send OpenLineage events to Datadog. All examples use the same `runId` UUID throughout the run. Generate one and keep it.
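To illustrate the `runId` guidance, a small sketch assuming the OpenLineage `RunEvent` fields: the run identifier is generated once with `uuid.uuid4()` when a hypothetical `JobRun` object is created, and every event built from that object reuses it. `JobRun`, the job names, the `producer` URI, and the spec version in `schemaURL` are illustrative, not part of any SDK.

```python
import uuid
from datetime import datetime, timezone

# Assumption: spec version in the schema URL; check the current OpenLineage spec.
RUN_EVENT_SCHEMA = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"
PRODUCER = "https://example.com/my-custom-job"  # hypothetical


class JobRun:
    """Hypothetical helper: one instance per run, so every event shares one runId."""

    def __init__(self, namespace: str, name: str):
        self.namespace = namespace
        self.name = name
        # Generated once and reused for START and the final COMPLETE or FAIL event.
        self.run_id = str(uuid.uuid4())

    def event(self, event_type: str) -> dict:
        return {
            "eventType": event_type,
            "eventTime": datetime.now(timezone.utc).isoformat(),
            "run": {"runId": self.run_id},
            "job": {"namespace": self.namespace, "name": self.name},
            "producer": PRODUCER,
            "schemaURL": RUN_EVENT_SCHEMA,
        }


run = JobRun("my-namespace", "daily_orders_export")
start = run.event("START")
# ... the job does its work here ...
complete = run.event("COMPLETE")
```

Tying the `runId` to an object's lifetime makes it hard to accidentally send `START` and `COMPLETE` with different IDs.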
| #### `integration` values | ||
| Use `custom` for custom jobs. The values below are used by Datadog's native integrations—using them for custom jobs may produce unexpected behavior. In particular, `SPARK` prevents span generation. |
marking for em dash
Suggested change:
- Use `custom` for custom jobs. The values below are used by Datadog's native integrations—using them for custom jobs may produce unexpected behavior. In particular, `SPARK` prevents span generation.
+ Use `custom` for custom jobs. The values below are used by Datadog's native integrations. Using them for custom jobs may produce unexpected behavior. In particular, `SPARK` prevents span generation.
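A sketch of a `jobType` job facet for a custom job that follows the `integration` guidance. It assumes the OpenLineage `JobTypeJobFacet` fields `processingType`, `integration`, and `jobType`. The schema version in `_schemaURL`, the `producer` URI, and the helper name are assumptions. The guard rejects `SPARK`, the only native-integration value named in this thread; the remaining reserved values would come from the values table on the page.

```python
# Values reserved for Datadog's native integrations. Only SPARK is named in this
# thread; add the other values from the docs table (assumption).
NATIVE_INTEGRATIONS = {"SPARK"}
PRODUCER = "https://example.com/my-custom-job"  # hypothetical
# Assumption: facet schema version; check the OpenLineage spec.
JOB_TYPE_SCHEMA = "https://openlineage.io/spec/facets/2-0-2/JobTypeJobFacet.json#/$defs/JobTypeJobFacet"


def job_type_facet(processing_type: str = "BATCH", integration: str = "custom",
                   job_type: str = "JOB") -> dict:
    """Build job.facets["jobType"], refusing values reserved for native integrations."""
    if integration.upper() in NATIVE_INTEGRATIONS:
        raise ValueError(f"{integration!r} is reserved for a native integration; use 'custom'")
    if processing_type not in ("BATCH", "STREAMING"):
        raise ValueError("processingType must be BATCH or STREAMING")
    return {
        "_producer": PRODUCER,
        "_schemaURL": JOB_TYPE_SCHEMA,
        "processingType": processing_type,
        "integration": integration,
        "jobType": job_type,
    }


job = {
    "namespace": "my-namespace",
    "name": "daily_orders_export",
    "facets": {"jobType": job_type_facet()},
}
```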
| ## Prerequisites | ||
| - A Datadog API key. See [API and Application Keys][6]. | ||
| - Your Datadog [site URL][3]. The examples on this page use `datadoghq.com`. Replace the hostname in the examples with the intake endpoint for your site. To find your site, see [Getting started with Datadog sites][3]. |
[3] is linked twice in this one bullet ("site URL" and "Getting started with Datadog sites" both point to the same page). we can just link once like this:
Suggested change:
- - Your Datadog [site URL][3]. The examples on this page use `datadoghq.com`. Replace the hostname in the examples with the intake endpoint for your site. To find your site, see [Getting started with Datadog sites][3].
+ - Your Datadog [site URL][3]. The examples on this page use `datadoghq.com`; replace the hostname with the intake endpoint for your site.
| ```shell | ||
| export DD_API_KEY=your-datadog-api-key | ||
| export DD_API_KEY=<YOUR_API_KEY> |
marking this for consistency (the curl and Python examples use <DD_API_KEY>, but this one uses <YOUR_API_KEY>)
Suggested change:
- export DD_API_KEY=<YOUR_API_KEY>
+ export DD_API_KEY=<DD_API_KEY>
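Because the `<DD_API_KEY>` placeholder is easy to leave in place by accident, a Python script that sends events could fail fast when `DD_API_KEY` is unset or still holds the placeholder. A minimal sketch; `load_api_key` is a hypothetical helper, not part of a Datadog library.

```python
import os
from typing import Mapping, Optional

PLACEHOLDER = "<DD_API_KEY>"


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return DD_API_KEY from the environment, rejecting missing or placeholder values."""
    env = os.environ if env is None else env
    key = env.get("DD_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError("Set DD_API_KEY to your Datadog API key before sending events")
    return key
```

Passing a plain dict as `env` keeps the check testable without touching the real environment.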
| | Facet | What Datadog does | | ||
| |---|---| | ||
| | parent | Creates parent-child job hierarchy in the lineage graph | |
code-formatting for consistency
Suggested change:
- | parent | Creates parent-child job hierarchy in the lineage graph |
+ | `parent` | Creates parent-child job hierarchy in the lineage graph |
Same for the rows below (errorMessage, tags, sql).
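For the `parent` row, a sketch of how a child run might point at its parent, assuming the OpenLineage `ParentRunFacet` layout (the parent's `run.runId` plus its `job.namespace` and `job.name`) under `run.facets`. The job names, the `producer` URI, and the schema version are hypothetical.

```python
import uuid

PRODUCER = "https://example.com/my-custom-job"  # hypothetical
# Assumption: facet schema version; check the OpenLineage spec.
PARENT_SCHEMA = "https://openlineage.io/spec/facets/1-0-0/ParentRunFacet.json#/$defs/ParentRunFacet"


def parent_facet(parent_run_id: str, parent_namespace: str, parent_name: str) -> dict:
    """Build run.facets["parent"], linking a child run to its parent run."""
    return {
        "_producer": PRODUCER,
        "_schemaURL": PARENT_SCHEMA,
        "run": {"runId": parent_run_id},
        "job": {"namespace": parent_namespace, "name": parent_name},
    }


parent_run_id = str(uuid.uuid4())  # the parent's own runId, shared by all of its children
child_run = {
    "runId": str(uuid.uuid4()),    # the child run still has its own runId
    "facets": {"parent": parent_facet(parent_run_id, "my-namespace", "nightly_pipeline")},
}
```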
| | errorMessage | Generates error spans with `error.message` and `error.stack` tags | |
Suggested change:
- | errorMessage | Generates error spans with `error.message` and `error.stack` tags |
+ | `errorMessage` | Generates error spans with `error.message` and `error.stack` tags |
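A sketch of populating the `errorMessage` facet from a caught Python exception, so the error span described in the `errorMessage` row has a message and a stack to draw on. It assumes the OpenLineage `ErrorMessageRunFacet` fields `message`, `programmingLanguage`, and `stackTrace`, attached under `run.facets` on a `FAIL` event. The schema version, the `producer` URI, and the helper name are assumptions.

```python
import traceback

PRODUCER = "https://example.com/my-custom-job"  # hypothetical
# Assumption: facet schema version; check the OpenLineage spec.
ERROR_SCHEMA = "https://openlineage.io/spec/facets/1-0-0/ErrorMessageRunFacet.json#/$defs/ErrorMessageRunFacet"


def error_message_facet(exc: BaseException) -> dict:
    """Build run.facets["errorMessage"] from an exception and its traceback."""
    stack = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return {
        "_producer": PRODUCER,
        "_schemaURL": ERROR_SCHEMA,
        "message": str(exc),
        "programmingLanguage": "python",
        "stackTrace": stack,
    }


try:
    raise ValueError("orders table is empty")
except ValueError as exc:
    # Send these facets with eventType "FAIL" and the run's original runId.
    fail_run_facets = {"errorMessage": error_message_facet(exc)}
```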
| | tags | Adds span tags to the run; `_dd.ol_service` value maps to the Datadog service name | |
Suggested change:
- | tags | Adds span tags to the run; `_dd.ol_service` value maps to the Datadog service name |
+ | `tags` | Adds span tags to the run; `_dd.ol_service` value maps to the Datadog service name |
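A sketch of a `tags` run facet that sets the Datadog service name through the `_dd.ol_service` key, as the `tags` row describes. It assumes the OpenLineage `TagsRunFacet` layout, a list of `key`/`value` entries under `run.facets["tags"]`. The service name, the extra `team` tag, the `producer` URI, and the schema version are hypothetical.

```python
PRODUCER = "https://example.com/my-custom-job"  # hypothetical
# Assumption: facet schema version; check the OpenLineage spec.
TAGS_SCHEMA = "https://openlineage.io/spec/facets/1-0-0/TagsRunFacet.json#/$defs/TagsRunFacet"


def tags_facet(tags: dict) -> dict:
    """Build run.facets["tags"] from a plain dict of tag keys and values."""
    return {
        "_producer": PRODUCER,
        "_schemaURL": TAGS_SCHEMA,
        "tags": [{"key": k, "value": str(v)} for k, v in tags.items()],
    }


run_facets = {
    "tags": tags_facet({
        "_dd.ol_service": "orders-etl",  # value maps to the Datadog service name
        "team": "data-platform",         # hypothetical extra span tag
    })
}
```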
| | sql | Parses and masks the SQL query; generates query events | |
Suggested change:
- | sql | Parses and masks the SQL query; generates query events |
+ | `sql` | Parses and masks the SQL query; generates query events |
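A sketch of attaching a query through the `sql` facet. In the OpenLineage spec this is a job facet (`SQLJobFacet`, with a `query` field), so it sits under `job.facets` rather than `run.facets`. Because the table says Datadog parses and masks the query, the sketch sends the SQL text without masking it. The table names, the `producer` URI, and the schema version are hypothetical.

```python
PRODUCER = "https://example.com/my-custom-job"  # hypothetical
# Assumption: facet schema version; check the OpenLineage spec.
SQL_SCHEMA = "https://openlineage.io/spec/facets/1-0-0/SQLJobFacet.json#/$defs/SQLJobFacet"


def sql_facet(query: str) -> dict:
    """Build job.facets["sql"]; only surrounding whitespace is trimmed, masking is left to Datadog."""
    return {"_producer": PRODUCER, "_schemaURL": SQL_SCHEMA, "query": query.strip()}


job = {
    "namespace": "my-namespace",
    "name": "daily_orders_export",
    "facets": {
        "sql": sql_facet("""
            INSERT INTO analytics.daily_orders
            SELECT order_date, count(*) FROM public.orders
            WHERE status = 'paid' GROUP BY order_date
        """),
    },
}
```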
…s/outputs are optional Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hi @OliviaShoup! I made the changes you requested and added example inputs/outputs. I also found a few more things I wanted to change or add, so I made those as well. Please lmk your thoughts!
OliviaShoup
left a comment
thanks for addressing the feedback so fast! looks great :) approving with one tiny style comment
| | Value | Platform | | ||
| |---|---| | ||
| | `custom` | Custom or unsupported platforms | | ||
| | `SPARK` | Apache Spark (native integration only—do not use for custom jobs) | |
one remaining em dash (it was fixed in the prose sections but missed in this table cell)
Suggested change:
- | `SPARK` | Apache Spark (native integration only—do not use for custom jobs) |
+ | `SPARK` | Apache Spark (native integration only; do not use for custom jobs) |
What does this PR do? What is the motivation?
Updates `content/en/data_observability/jobs_monitoring/openlineage/_index.md` to reflect the current state of the Custom Jobs (OpenLineage) product.
…Datadog)
…`JobTypeJobFacet` with integration values, `processingType`, and `jobType` options

Merge instructions
Merge readiness:
For Datadog employees:
Your branch name MUST follow the `<name>/<description>` convention and include the forward slash (`/`). Without this format, your pull request will not pass CI, the GitLab pipeline will not run, and you won't get a branch preview. Getting a branch preview makes it easier for us to check any issues with your PR, such as broken links.

If your branch doesn't follow this format, rename it or create a new branch and PR.
[6/5/2025] Merge queue has been disabled on the documentation repo. If you have write access to the repo, the PR has been reviewed by a Documentation team member, and all of the required checks have passed, you can use the Squash and Merge button to merge the PR. If you don't have write access, or you need help, reach out in the #documentation channel in Slack.
AI assistance
Used Claude Code for drafting and editing content, with manual review and corrections against internal docs.
Additional notes