17 Mar 20:29

skrydal

a5e4380

v1.4.0.9 Latest


🚀 Ingestion Release Notes: v1.4.0.8 → v1.4.0.9rc1

🌟 New Features

dbt — Extract and emit stats from catalog.json (datahub-project#16044) — @alfiyas-datahub
BigQuery — Enrich external table metadata with source format, URIs, compression, and max bad records (datahub-project#16348) — @EladLeev
Glue — Iceberg Lineage support (datahub-project#16562) — @ligfx
Power BI — Add external URL for Power BI App entities (datahub-project#16572) — @alfiyas-datahub

🐛 Bug Fixes

Ingestion — Bump authlib to >=1.6.9 to address JWE RSA1_5 padding oracle vulnerability (datahub-project#16633) — @david-leifker
CLI — Add .gql files to the wheel build (datahub-project#16637) — @skrydal

🙌 Contributors

Thanks to @alfiyas-datahub, @EladLeev, @ligfx, @david-leifker, and @skrydal for their contributions to this release!


17 Mar 19:12

skrydal

v1.4.0.9rc1

a5e4380

v1.4.0.9rc1 Pre-release


Full Changelog: v1.4.0.8...v1.4.0.9rc1


17 Mar 13:17

skrydal

v1.4.0.8

cbf7f9f

v1.4.0.8

DataHub Ingestion v1.4.0.8

🌟 New Features

Configurable ingestion report sample sizes — You can now control how many failure and warning entries appear in ingestion reports via environment variables DATAHUB_REPORT_FAILURE_SAMPLE_SIZE and DATAHUB_REPORT_WARNING_SAMPLE_SIZE (default: 10 each). Useful when debugging large ingestion runs where you need more failure context. (datahub-project#16165) — thanks @rob-1019!

🐛 Bug Fixes

Pin sqlglot dependency — Pinned sqlglotc to prevent unexpected version drift that could break SQL parsing in ingestion sources. (datahub-project#16614) — thanks @jayacryl!

📄 Documentation

Fix dead link on Integrations page — Added the missing request-connector documentation page, resolving a broken link on the Integrations page that pointed to a non-existent route. (datahub-project#16617) — thanks @shirshanka!

Full Changelog: v1.4.0.7...v1.4.0.8

Contributors

shirshanka, jayacryl, and rob-1019


16 Mar 17:27

askumar27

v1.4.0.7

92367ce

v1.4.0.7

What's New in v1.4.0.7

Breaking Changes 🚨

Browse paths for DataFlow/DataJob with platform_instance: When platform_instance is configured, DataFlow and DataJob entities now receive a browsePathsV2 aspect with the platform instance as the root path. Previously these entities were placed in a generic "Default" folder, mixing entities from multiple platform instances. Affects Fivetran, Glue, Kafka-Connect, and other sources that emit DataFlow/DataJob entities with platform_instance. Sources without platform_instance are unaffected. (datahub-project#15270) by @treff7es
DataHubRestEmitter.emit_mcps() return type changed: The method now returns List[TraceData] instead of int. To get the previous chunk count, use len(result) on the returned list. emit_mcp() now returns Optional[TraceData] instead of None. (datahub-project#15744) by @pedro93
Mode connector: SQL parsing behavior change: Join resolution for CTEs and subqueries was optimized. In rare edge cases with unusual CTE patterns, join metadata may differ from previous results. Affects all SQL-based lineage connectors. (datahub-project#16300) by @treff7es
PowerBI: extract_column_level_lineage now defaults to true: Previously defaulted to false. Set extract_column_level_lineage: false in your recipe to restore previous behavior. (datahub-project#16568) by @ligfx
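
For callers that relied on the old integer return value of emit_mcps(), a small compatibility shim is one way to bridge both behaviors. This is a minimal sketch, not part of the DataHub SDK: the helper name chunk_count is hypothetical, and it relies only on the statement above that len(result) gives the previous chunk count.

```python
from typing import List, Union


def chunk_count(result: Union[int, List[object]]) -> int:
    """Normalize the return value of emit_mcps() to a chunk count.

    Older releases returned an int. Per the breaking-change note, newer
    releases return a list whose length is the previous chunk count.
    """
    if isinstance(result, int):
        return result
    return len(result)


# Hypothetical usage, where `emitter` is a DataHubRestEmitter and `mcps`
# is the list of metadata change proposals to send:
#   chunks = chunk_count(emitter.emit_mcps(mcps))
```

Keeping the normalization in one place means calling code does not need to know which SDK version is installed.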

New Features ✨

New RDF ingestion connector: Ingest metadata from RDF data sources including support for FIBO and BCBS-239 ontology dialects, glossary terms, domains, and relationships. Supports Turtle, JSON-LD, RDF/XML, and other formats, with SPARQL filtering. (datahub-project#15741) by @stephengoldbaum
Trace ID support in REST emitter: The SDK now exposes trace IDs for SYNC_PRIMARY and ASYNC emit modes, enabling easier debugging and status checking of ingestion operations. (datahub-project#15744) by @pedro93
Mode connector performance improvements: Major performance upgrade — concurrent API fetching with threading, SQL query response caching, improved rate limiting, and SQL parsing optimizations. Column-level lineage now falls back to table-level lineage when parsing times out. (datahub-project#16300) by @treff7es
Kafka-Connect: Debezium and Confluent JDBC sink connector support: Added support for Debezium source connectors and Confluent JDBC sink connectors, expanding lineage coverage for Kafka Connect pipelines. (datahub-project#16483) by @acrylJonny
datahub search CLI command: New command with semantic search, field projection, and agent context support for querying DataHub from the command line. (datahub-project#16471) by @shirshanka
SQLAlchemy profiler feature parity: The SQLAlchemy profiler now achieves feature parity with the GE profiler, including additional column statistics and type mappings. (datahub-project#16529) by @sgomezvillamor
Glue: lastModified from table UpdateTime: AWS Glue datasets now populate the lastModified field in dataset properties from the Glue table's UpdateTime. (datahub-project#16508) by @alokr-dhub
Configurable ingestion report sample sizes: New options to control the number of failure/warning samples kept in ingestion reports, with improved failure logging for easier debugging. (datahub-project#16165) by @rob-1019

Bug Fixes 🐛

Glue: fine-grained lineage without a graph: Fixed an error that prevented column-level lineage generation when no graph service is configured. (datahub-project#16494) by @ligfx
Glue: treat table UpdateTime as UTC: AWS Glue returns update times as UTC without timezone info; the connector now correctly interprets them as UTC. (datahub-project#16561) by @ligfx
BigQuery: thread-safe GCP client credentials: Fixed a thread-safety issue by passing explicit credentials to GCP clients, preventing credential sharing between threads during concurrent ingestion. (datahub-project#16579) by @jayacryl
Ingestion emit modes regression: Restored correct emit mode behavior (SYNC_PRIMARY, ASYNC, SYNC) after a regression in a prior release. (datahub-project#16521) by @askumar27
Kafka-Connect: platform instance resolution: Fixed incorrect platform instance resolution in the Kafka Connect schema resolver. (datahub-project#16526) by @kevinkarchacryl
Kafka-Connect: JDBC sink DataJobs when runtime topics API is empty: Fixed an issue where DataJobs were not produced for JDBC sink connectors when the runtime topics API returned no data. (datahub-project#16557) by @acrylJonny
Schema resolver bulk-fetch caching: Fixed a caching bug in the schema resolver's bulk-fetch path that could cause redundant API calls and slow down SQL lineage resolution. (datahub-project#16499) by @treff7es
Pin sqlglotc dependency: Pinned sqlglotc to prevent unexpected breakage from upstream updates. (datahub-project#16614) by @jayacryl
PyArrow minimum version bumped for CVE: Updated the minimum pyarrow version to address a known security vulnerability. (datahub-project#16563) by @david-leifker
Security dependency updates: Applied CVE minimum versions via constraints and bumped Authlib and filelock to address known vulnerabilities. (datahub-project#16517) by @david-leifker
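
The Glue UpdateTime fix above comes down to attaching UTC to a timezone-naive datetime rather than letting it be read as local time. The sketch below illustrates that idea with the standard library only; the function names are hypothetical and this is not the connector's actual code. Converting to epoch milliseconds is shown purely for illustration.

```python
from datetime import datetime, timezone


def as_utc(value: datetime) -> datetime:
    """Return a timezone-aware UTC datetime.

    A naive value is assumed to already be in UTC, so only the tzinfo is
    attached and the clock time is left unchanged. An aware value is
    converted to UTC.
    """
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


def epoch_millis(value: datetime) -> int:
    """Convert a datetime to milliseconds since the Unix epoch, treating naive input as UTC."""
    return int(as_utc(value).timestamp() * 1000)
```

Calling timestamp() directly on a naive datetime would interpret it in the machine's local timezone, which is the kind of skew this fix avoids.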

Improvements 🔧

Reproducible ingestion builds: Added uv.lock and constraints.txt to pin all transitive dependencies, enabling fully reproducible ingestion environment builds. (datahub-project#16489) by @kyungsoo-datahub
Lock file freshness checks: Added CI validation to verify that uv.lock and constraints.txt stay in sync with dependency manifests. (datahub-project#16559) by @kyungsoo-datahub
Dependency constraint fixes: Added missing dependency constraints to resolve installation conflicts in certain environments. (datahub-project#16513) by @kyungsoo-datahub
DataPlex: updated datacatalog lineage and protobuf dependencies: Upgraded the DataPlex connector to use newer library versions. (datahub-project#16560) by @sgomezvillamor
Kafka connector configurable replication factor: Replication factor is now configurable per topic for Kafka topic ingestion. (datahub-project#16585) by @david-leifker

Documentation 📚

Connector docs structure consistency: Standardized the structure of all ingestion connector documentation pages. (datahub-project#16431) by @sgomezvillamor
Power BI docs updated for Entra configuration: Updated Power BI ingestion documentation to reflect Microsoft Entra (formerly Azure AD) authentication setup. (datahub-project#16519) by @ligfx
Streamlined integrations catalog: Improved the integrations page with an expanded connector catalog and updated logos for many platforms including RDF, Confluent, DataPlex, and more. (datahub-project#16597) by @shirshanka
RDF connector documentation: Added documentation for the new RDF ingestion connector. (datahub-project#16589, datahub-project#16617) by @shirshanka

Contributors

Thanks to all contributors: @treff7es, @stephengoldbaum, @pedro93, @rob-1019, @acrylJonny, @ligfx, @shirshanka, @kyungsoo-datahub, @alokr-dhub, @sgomezvillamor, @jayacryl, @askumar27, @kevinkarchacryl, @david-leifker, @Dutt23

Full Changelog: v1.4.0.6...v1.4.0.7


16 Mar 22:56

askumar27

v1.4.0.8rc1

0b134a5

v1.4.0.8rc1 Pre-release


DataHub Ingestion v1.4.0.8rc1

🌟 New Features

Configurable ingestion report sample sizes — You can now control how many failure and warning entries appear in ingestion reports via environment variables DATAHUB_REPORT_FAILURE_SAMPLE_SIZE and DATAHUB_REPORT_WARNING_SAMPLE_SIZE (default: 10 each). Useful when debugging large ingestion runs where you need more failure context. (datahub-project#16165) — thanks @rob-1019!
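
One way to apply these settings is to set the variables in the environment of the process that runs ingestion. The sketch below assumes the datahub CLI is on PATH and that a recipe exists at recipe.yml (a hypothetical path). The helper names ingestion_env and run_ingestion are illustrative and not part of DataHub.

```python
import os
import subprocess


def ingestion_env(failure_samples: int, warning_samples: int) -> dict:
    """Return a copy of the current environment with custom report sample sizes."""
    env = dict(os.environ)
    env["DATAHUB_REPORT_FAILURE_SAMPLE_SIZE"] = str(failure_samples)
    env["DATAHUB_REPORT_WARNING_SAMPLE_SIZE"] = str(warning_samples)
    return env


def run_ingestion(recipe_path: str = "recipe.yml") -> int:
    """Run `datahub ingest` keeping 50 failure and 25 warning samples in the report."""
    completed = subprocess.run(
        ["datahub", "ingest", "-c", recipe_path],
        env=ingestion_env(50, 25),
        check=False,  # return the exit code instead of raising on failure
    )
    return completed.returncode
```

Building the environment as a copy keeps the larger sample sizes scoped to the ingestion subprocess instead of changing the parent process.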

🐛 Bug Fixes

Pin sqlglot dependency — Pinned sqlglotc to prevent unexpected version drift that could break SQL parsing in ingestion sources. (datahub-project#16614) — thanks @jayacryl!

📄 Documentation

Fix dead link on Integrations page — Added the missing request-connector documentation page, resolving a broken link on the Integrations page that pointed to a non-existent route. (datahub-project#16617) — thanks @shirshanka!

Full Changelog: v1.4.0.7...v1.4.0.8rc1

Contributors

shirshanka, jayacryl, and rob-1019


16 Mar 16:57

askumar27

v1.4.0.7rc3

92367ce

v1.4.0.7rc3 Pre-release


DataHub Ingestion — v1.4.0.7rc3

🌟 New Features

Configurable ingestion report sample sizes — Control how many failure and warning samples appear in ingestion reports via environment variables (DATAHUB_REPORT_FAILURE_SAMPLE_SIZE, DATAHUB_REPORT_WARNING_SAMPLE_SIZE). Defaults remain unchanged at 10. (datahub-project#16165, @rob-1019)
SQLAlchemy profiler feature parity with Great Expectations — The SQLAlchemy profiler now matches GE behavior: generates basic profiles even when row count fails due to permission errors, skips column profiling for empty tables (performance win), and adds support for DECIMAL/NUMERIC column types via a new ProfilerDataType.NUMERIC type. (datahub-project#16529, @sgomezvillamor)
Improved Integrations catalog page — The integrations catalog is now auto-generated from docgen with descriptions, logos, support tiers, and platform type metadata. The page features category pill filters, support-level badges, and improved card layout. (datahub-project#16597, @shirshanka)

🐛 Bug Fixes

Kafka Connect: column-level lineage URN fix — When building column-level lineage URNs for Kafka topics, the connect_to_platform_map was being ignored. The fix uses the existing get_platform_instance() helper that correctly checks both platform_instance_map and connect_to_platform_map. (datahub-project#16526, @kevinkarchacryl)
Kafka Connect: JDBC sink DataJobs when runtime topics API is empty — When a JDBC sink connector hasn't yet processed messages or after a topic reset, the runtime topics API returns an empty list, causing lineage edges and DataJob entities to be silently dropped. The fix falls back to config-defined topics when the runtime API returns nothing. (datahub-project#16557, @acrylJonny)
RDF connector: fixed docGen CI failure — docgen.py threw a KeyError: 'rdf' because it only iterated source_registry plugins. Now iterates the union of source_registry and connector_registry, fixing CI on master. Also adds missing lockfiles from the original RDF PR. (datahub-project#16589, @shirshanka)
Pin sqlglot to a stable version to prevent unexpected breakage from upstream releases. (datahub-project#16614, @jayacryl)
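
The JDBC sink fix above describes a fallback: prefer topics reported by the runtime API, and use the topics declared in the connector config when the runtime list is empty. A minimal sketch of that pattern follows. The function names are hypothetical, and it assumes the connector config carries a comma-separated topics entry, as Kafka Connect sink configs commonly do. It is not the connector's actual implementation.

```python
from typing import List, Optional


def topics_from_config(connector_config: dict) -> List[str]:
    """Parse a comma-separated `topics` entry from a sink connector config."""
    raw = connector_config.get("topics", "")
    return [topic.strip() for topic in raw.split(",") if topic.strip()]


def resolve_topics(
    runtime_topics: Optional[List[str]], config_topics: List[str]
) -> List[str]:
    """Prefer runtime topics; fall back to config-defined topics when none are reported."""
    if runtime_topics:
        return list(runtime_topics)
    return list(config_topics)
```

With this shape, a connector that has not processed any messages yet still yields topics for lineage instead of an empty list.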

🔒 Security

Protobuf upgraded to 5.x (CVE-2026-0994) — The google-cloud-datacatalog-lineage dependency was pinned to 0.2.2, constraining protobuf to vulnerable versions <5.x. Updated to >=0.5.0,<1.0.0 with migrated import paths and regenerated lockfiles. (datahub-project#16560, @sgomezvillamor)

📚 Documentation

Added a request-connector page to fix a dead link on the Integrations page. The page guides users to request new connectors via the FeatureOS portal, GitHub issues, or by building their own. (datahub-project#16617, @shirshanka)

Full Changelog: v1.4.0.7rc2...v1.4.0.7rc3

Contributors

sgomezvillamor, shirshanka, and 4 other contributors


15 Mar 18:36

skrydal

v1.4.0.7rc2

adf9bd9

v1.4.0.7rc2 Pre-release


DataHub v1.4.0.7rc2 Release Notes (Ingestion)

New Features

RDF Connector (MVP) (#15741) — New connector for RDF/Linked Data metadata ingestion, supporting Turtle, N-Triples, and other RDF serialization formats. by @stephengoldbaum
GraphQL Query Projection System (#16522) — Introduces a GraphQL query projection system for schema compatibility, improving the reliability of the CLI and Python SDK against varying server versions. by @shirshanka
SQLAlchemy Profiler Feature Parity (#16529) — The SQLAlchemy profiler now achieves full feature parity with the Great Expectations profiler, including improved type mapping. by @sgomezvillamor
Configurable Report Sample Sizes (#16165) — Adds configurable sample sizes for ingestion reports and enhanced failure logging for better observability. by @rob-1019
PowerBI Column-Level Lineage Enabled by Default (#16568) — extract_column_level_lineage is now true by default for PowerBI. Set extract_column_level_lineage: false to restore the previous behavior. by @ligfx
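
For recipes that are built or patched programmatically, restoring the previous PowerBI default could look like the sketch below. It assumes the usual recipe layout with source.type and source.config; the helper name is hypothetical, and connection settings are omitted from the example recipe.

```python
import copy


def disable_powerbi_column_lineage(recipe: dict) -> dict:
    """Return a copy of the recipe with extract_column_level_lineage set to False.

    Only recipes whose source type is "powerbi" are changed; any other
    recipe is returned as an unmodified copy.
    """
    updated = copy.deepcopy(recipe)
    source = updated.get("source", {})
    if source.get("type") == "powerbi":
        source.setdefault("config", {})["extract_column_level_lineage"] = False
    return updated


# Example recipe as a Python dict; connection settings intentionally omitted.
recipe = {"source": {"type": "powerbi", "config": {}}}
patched = disable_powerbi_column_lineage(recipe)
```

In a YAML recipe, the equivalent is the single line extract_column_level_lineage: false under the source config.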

Bug Fixes

Emit Modes Regression (#16521) — Restored correct async/sync/test emit modes in the DataHub REST sink following a regression introduced in datahub-project#15968. by @askumar27
Kafka Connect JDBC Sink DataJobs (#16557) — Kafka Connect now correctly emits DataJob entities for JDBC sink connectors when the runtime topics API returns empty results. by @acrylJonny
Kafka Platform Instance Helper (#16526) — Fixed platform instance resolution in the Kafka Connect schema resolver. by @kevinkarchacryl
BigQuery Thread Safety (#16579) — BigQuery ingestion now passes explicit credentials to all GCP clients, preventing credential sharing across threads. by @jayacryl
Glue Table UpdateTime Timezone (#16561) — Fixed AWS Glue table UpdateTime not being correctly interpreted as UTC. by @ligfx

Security

PyArrow Minimum Version (#16563) — Bumped minimum pyarrow version to address CVE-2026-25087. by @david-leifker

Other Improvements

Updated google-cloud-datacatalog-lineage and protobuf dependencies (#16560) by @sgomezvillamor
Pinned sqlglotc for ingestion stability (#16614) by @jayacryl

Documentation

Streamlined Integrations page and improved connector catalog generation (#16597) by @shirshanka
Added request-connector page, fixing a dead link on the Integrations page (#16617) by @shirshanka
Fixed doc generation failure for the RDF connector (#16589) by @shirshanka

Contributors

Thanks to all 11 contributors for this release: @acrylJonny, @askumar27, @david-leifker, @jayacryl, @kevinkarchacryl, @ligfx, @rob-1019, @sgomezvillamor, @shirshanka, @stephengoldbaum, and @alokr-dhub.

Full Changelog: v1.4.0.7rc1...v1.4.0.7rc2


12 Mar 08:03

treff7es

v1.4.0.6

d1e9bb4

v1.4.0.6

DataHub Ingestion v1.4.0.6

🚨 Breaking Changes

Oracle connector URN update — When connecting via service_name to a multitenant Oracle database, dataset URNs now use the Pluggable Database (PDB) name instead of the Container Database (CDB) name. Set urn_db_name in your recipe to preserve old URNs. (datahub-project#16396) — @acrylJonny
Python packaging migration — Dependency declarations now use pyproject.toml (PEP 621). setup.py remains the source of truth for now but will be deprecated in a future release. (datahub-project#16339) — @kyungsoo-datahub

🌟 New Features

New Snowplow connector — Ingest metadata from Snowplow analytics pipelines (datahub-project#15735) — @treff7es
dbt semantic models — Full support for ingesting dbt semantic model metadata (datahub-project#16236) — @alfiyas-datahub
dbt convert_urns_to_lowercase — Opt-in flag to prevent duplicate entities from mixed-case identifiers on case-insensitive platforms like Snowflake (datahub-project#16358) — @alfiyas-datahub
Snowflake pattern pushdown — Metadata pattern pushdown and table type filtering for improved performance (datahub-project#16100) — @rajatoss
Trino column-level lineage — Column-level lineage support on upstreamLineage (datahub-project#16292) — @alfiyas-datahub
Iceberg domain assignment — Ingestion-time domain assignment for Iceberg sources (datahub-project#16443) — @sergey-pozdnyakov-epam
MongoDB AWS IAM auth — Added pymongo[aws] extra for AWS IAM authentication (datahub-project#16412) — @javabrett
Kafka Avro validation toggle — Option to disable Avro schema name validation (datahub-project#16310) — @Devarsh23
Kafka Connect bundled JVM — Bundle JVM via jdk4py to remove system Java dependency (datahub-project#16445) — @StanDmitrievAiven
CLI agent improvements — Agent-friendly datahub graphql and datahub init enhancements (datahub-project#16476) — @shirshanka

🐛 Bug Fixes

Redshift — Boundary-aware segment stitching for query reconstruction (datahub-project#16253) — @kyungsoo-datahub
Tableau — Apply project filters to embedded datasources when emit_all_embedded_datasources is enabled (datahub-project#16340) — @aviraj-gour
Dagster — Preserve DataJob lineage on failed/canceled runs (datahub-project#16386) — @treff7es
Snowflake — Quoting fix (datahub-project#16393), map COPY query type to INSERT (datahub-project#16461) — @treff7es
Teradata — Set DATABASE context for view HELP commands (datahub-project#16208) — @JohnRTurner
Kafka Connect — Use canonical mssql platform for Debezium SQL Server (datahub-project#16413) — @treff7es
Oracle — Fix profiling crashes and silent table exclusions (datahub-project#16396) — @acrylJonny
Snowplow — Add missing cachetools dependency (datahub-project#16442) — @treff7es

⚡ Performance

Snowflake tags — Emulate tag inheritance in-memory to eliminate N+1 queries (datahub-project#16400) — @treff7es
Dataplex — Streamline ingestion by removing unnecessary entity lookups (datahub-project#16063) — @NehaGslab

🔧 Maintenance

Upgrade to urllib3 v2 (datahub-project#16464) — @sgomezvillamor
Iceberg source recipe examples updated (datahub-project#16417) — @skrydal

📚 Documentation

Microsoft Copilot Context Kit guide and misc docs improvements (datahub-project#16452) — @jjoyce0510
Connector development guide for datahub-skills (datahub-project#16435) — @maggiehays

Contributors: @acrylJonny, @alfiyas-datahub, @aviraj-gour, @Devarsh23, @javabrett, @jjoyce0510, @JohnRTurner, @kyungsoo-datahub, @maggiehays, @NehaGslab, @rajatoss, @sergey-pozdnyakov-epam, @sgomezvillamor, @shirshanka, @skrydal, @StanDmitrievAiven, @treff7es


13 Mar 04:08

kyungsoo-datahub

v1.4.0.7rc1

d4e2a4b

v1.4.0.7rc1 Pre-release


Full Changelog: v1.4.0.6...v1.4.0.7rc1


12 Mar 07:38

treff7es

v1.4.0.6rc5

d1e9bb4

v1.4.0.6rc5 Pre-release


DataHub Ingestion `v1.4.0.6rc5`

🚨 Breaking Changes

Dependency declarations migrated to pyproject.toml (PEP 621) — setup.py remains the source of truth for editing dependencies for now; pyproject.toml is auto-generated via ./gradlew :metadata-ingestion:generatePyprojectDeps. setup.py will be deprecated in a future release. (datahub-project#16339) — @kyungsoo-datahub

🌟 Features

dbt: Semantic model support — Ingestion now extracts dbt semantic models, including entities, measures, and dimensions, from both dbt Cloud and dbt Core. (datahub-project#16236) — @alfiyas-datahub
Iceberg: Ingestion-time domain assignment — You can now assign domains to Iceberg datasets at ingestion time via source configuration. (datahub-project#16443) — @sergey-pozdnyakov-epam
CLI: Agent-friendly datahub graphql and datahub init — The CLI now supports schema introspection, operation discovery, and dry-run mode for GraphQL queries, making it easier to integrate with AI agents and automation. (datahub-project#16476) — @shirshanka

⚡ Performance

Snowflake: In-memory tag inheritance — Tag inheritance is now emulated in-memory, eliminating N+1 queries against Snowflake and significantly improving ingestion speed for tagged environments. (datahub-project#16400) — @treff7es
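
To illustrate what emulating tag inheritance in memory can mean, the sketch below resolves effective tags by walking a parent chain (for example database, schema, table) over data that was fetched once, instead of issuing one query per object. All names are hypothetical. The override rule, where a tag set closer to the object wins, is an assumption for illustration and not a description of the connector's actual logic.

```python
from typing import Dict, List, Optional


def effective_tags(
    obj: str,
    parents: Dict[str, Optional[str]],
    direct_tags: Dict[str, Dict[str, str]],
) -> Dict[str, str]:
    """Merge tags from the root of the hierarchy down to `obj`.

    `parents` maps each object to its parent (None for the root) and
    `direct_tags` maps each object to the tags set directly on it.
    """
    # Collect the chain obj -> parent -> ... -> root.
    chain: List[str] = []
    current: Optional[str] = obj
    while current is not None:
        chain.append(current)
        current = parents.get(current)

    # Apply tags from the root downward so closer levels override.
    merged: Dict[str, str] = {}
    for name in reversed(chain):
        merged.update(direct_tags.get(name, {}))
    return merged
```

Because both lookups are plain dictionaries, resolving tags for every table costs no additional round trips once the direct tags have been loaded.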

🐛 Bug Fixes

Tableau: Project filters now apply to embedded datasources — When emit_all_embedded_datasources is enabled, project filters are correctly applied to embedded datasources. (datahub-project#16340) — @aviraj-gour

🛠️ Maintenance

urllib3 upgraded to v2 — The ingestion framework now uses urllib3 v2, bringing improved connection handling and modern TLS defaults. (datahub-project#16464) — @sgomezvillamor

📖 Documentation

DataHub Skills connector development guide added. (datahub-project#16435) — @maggiehays
Microsoft Copilot Context Kit guide and miscellaneous docs improvements. (datahub-project#16452) — @jjoyce0510

Contributors

sgomezvillamor, shirshanka, and 7 other contributors


Releases: acryldata/datahub
