
feat(ingestion): add MarkDeprecated transformer #18000

Open
rospe wants to merge 1 commit into datahub-project:master from rospe:feat/mark-deprecated-transformer

@rospe rospe commented Jun 23, 2026

Contributor

Summary

Add a built-in transformer for setting the deprecation aspect on entities during ingestion. Currently there is no recipe-configurable way to mark assets as deprecated: users must either run datahub put one entity at a time or write a custom transformer.

Motivation

When decommissioning data sources, teams need to bulk-deprecate datasets, dashboards, and other assets as part of their ingestion pipelines. This transformer makes that a single config block in any recipe.

Changes

New transformer: mark_deprecated

  • Sets deprecated, note, actor, replacement, and decommissionTime on entities
  • decommission_time defaults to the current time at pipeline start
  • urns filter: if populated, only matching entities are affected; if empty, all entities in the pipeline are marked
  • Supports OVERWRITE (default) and PATCH semantics
    • PATCH merges with existing server state — preserves the original note, actor, and decommission date if already set (useful for recurring pipelines where you want to keep the first deprecation timestamp)
  • Supported entity types: dataset, chart, dashboard, dataFlow, dataJob, container
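The filtering and merge rules above can be sketched in plain Python. This is only an illustration of the described behavior, not the code in this PR: `Deprecation`, `should_mark`, and `resolve` are hypothetical names, and taking `deprecated` and `replacement` from the new config under PATCH is an assumption, since the description only says that note, actor, and decommission date are preserved.

```python
from dataclasses import dataclass, replace
from typing import Optional, Set


@dataclass(frozen=True)
class Deprecation:
    """Hypothetical stand-in for the deprecation aspect (not the DataHub class)."""
    deprecated: bool
    note: str = ""
    actor: Optional[str] = None
    replacement: Optional[str] = None
    decommission_time: Optional[int] = None  # assumed to be epoch milliseconds


def should_mark(entity_urn: str, urns: Set[str]) -> bool:
    # An empty filter means every entity in the pipeline is affected.
    return not urns or entity_urn in urns


def resolve(desired: Deprecation, existing: Optional[Deprecation], semantics: str) -> Deprecation:
    if semantics not in ("OVERWRITE", "PATCH"):
        raise ValueError(f"unsupported semantics: {semantics}")
    # OVERWRITE, or PATCH with nothing on the server yet: use the configured values.
    if semantics == "OVERWRITE" or existing is None:
        return desired
    # PATCH: keep the original note, actor, and decommission date if already set.
    # Assumption: deprecated and replacement still come from the new config.
    return replace(
        desired,
        note=existing.note or desired.note,
        actor=existing.actor or desired.actor,
        decommission_time=(
            existing.decommission_time
            if existing.decommission_time is not None
            else desired.decommission_time
        ),
    )
```

With this shape, a recurring pipeline running under PATCH would keep the first deprecation timestamp on every later run, while OVERWRITE would reset it each time.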

Entry points: registered as mark_deprecated in both setup.py and pyproject.toml

Docs: added to the Universal Transformers page with config table and examples

Tests: unit tests covering OVERWRITE, PATCH, URN filtering, all entity types, and default decommission_time behavior

Example usage

transformers:
  - type: "mark_deprecated"
    config:
      semantics: PATCH
      deprecated: true
      note: "Source system decommissioned"
      replacement: "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.new_table,PROD)"
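For readers who want to see how the recipe keys might map onto a config object, here is a minimal sketch using a standard-library dataclass. It is not the config class from this PR: `MarkDeprecatedConfigSketch` is a hypothetical name, the `actor` key, the `deprecated: true` default, and the epoch-millisecond unit for `decommission_time` are assumptions, and the default is computed when the object is created as a stand-in for "pipeline start".

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


def _now_millis() -> int:
    # Assumed unit: milliseconds since the Unix epoch.
    return int(time.time() * 1000)


@dataclass
class MarkDeprecatedConfigSketch:
    """Hypothetical model of the mark_deprecated config block."""
    deprecated: bool = True                 # assumed default
    note: str = ""
    actor: Optional[str] = None             # assumed key name
    replacement: Optional[str] = None
    # Defaults to "now" when the config object is built.
    decommission_time: int = field(default_factory=_now_millis)
    semantics: str = "OVERWRITE"            # OVERWRITE is the documented default
    urns: List[str] = field(default_factory=list)  # empty list: no filtering

    def __post_init__(self) -> None:
        if self.semantics not in ("OVERWRITE", "PATCH"):
            raise ValueError(f"unsupported semantics: {self.semantics}")


# Mirrors the recipe above; urns and decommission_time fall back to defaults.
cfg = MarkDeprecatedConfigSketch(
    semantics="PATCH",
    deprecated=True,
    note="Source system decommissioned",
    replacement="urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.new_table,PROD)",
)
```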

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly PR Title Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

Adds a recipe-configurable transformer for setting the deprecation aspect
on entities flowing through the pipeline. Supports OVERWRITE and PATCH
semantics, decommission_time (defaults to now), replacement URN, and
URN-based filtering.

Supported entity types: dataset, chart, dashboard, dataFlow, dataJob,
container.

@github-actions github-actions Bot added the ingestion and community-contribution labels Jun 23, 2026
@github-actions

Contributor

Linear: ING-2900

Thanks for your contribution! We have created an internal ticket to track this PR. A member of the core DataHub team will be assigned to review it within the next few business days - you will get a follow-up comment once a reviewer is assigned.

@codecov

codecov Bot commented Jun 23, 2026

Bundle Report

Bundle size has no change ✅

@maggiehays maggiehays added the needs-review label Jun 23, 2026

2 participants