feat(rdf): ontology-gated relationship routing and extension triple s… #17998

Draft
stephengoldbaum wants to merge 1 commit into master from rdf-ingest-2

@stephengoldbaum

Contributor

Supporting new RDF capabilities in the existing RDF Ingestor

Route harvested RDF triples via a bundled TBox to native glossaryRelatedTerms or fixed hierarchical structured properties, with predicate glossary terms, reference closure, OWL axiom filtering, and synchronous SP definition registration.


This pull request introduces significant enhancements to the RDF ingestion pipeline, particularly around relationship and structured property handling. It refactors the relationship entity module to support a more flexible, ontology-driven routing of relationships, adds support for RDF extension structured properties, and improves the classification of glossary terms. The changes also include new data structures, entity registration mechanisms, and summary reporting enhancements.

Relationship and Structured Property Routing Enhancements:

  • Refactored the relationship entity module to support ontology-gated routing: it now harvests all property triples whose object is a URI and routes them, via DataHub TBox alignments, to either native relationship fields or extension structured properties. Deprecated the old RelationshipConverter and the legacy-only types, and introduced new data structures (RDFStatement, DataHubNativeRelationship, DataHubStructuredPropertyAssignment, and DataHubStructuredPropertyDefinition) for more granular handling.
  • Registered the new rdf_structured_property entity type, including its MCP builder and AST class, and updated the entity registry to support it with appropriate CLI names and processing order.
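
As a rough illustration of the routing described above, the following minimal sketch sends each harvested statement either to a native relationship field or to an extension assignment. Everything here is an assumption made for illustration: the `Statement`, `NativeRelationship`, and `ExtensionAssignment` shapes, the hard-coded `NATIVE_ALIGNMENTS` table (the PR reads alignments from a bundled TBox), the particular SKOS-to-field mapping, and the `local_name` helper used as a stand-in property id. It is not the PR's implementation.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

SKOS = "http://www.w3.org/2004/02/skos/core#"


@dataclass(frozen=True)
class Statement:
    """One harvested triple whose object is a URI (hypothetical shape)."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class NativeRelationship:
    """A relationship that maps onto a native glossary field (hypothetical)."""
    source: str
    target: str
    field: str


@dataclass(frozen=True)
class ExtensionAssignment:
    """A value stored as an extension structured property (hypothetical)."""
    entity: str
    property_id: str
    value: str


# Assumed alignment table; a real one would be derived from the bundled TBox.
NATIVE_ALIGNMENTS: Dict[str, str] = {
    SKOS + "broader": "isRelatedTerms",
    SKOS + "related": "relatedTerms",
}


def local_name(iri: str) -> str:
    """Return the fragment or last path segment of an IRI."""
    return re.split(r"[#/]", iri.rstrip("#/"))[-1]


def route(
    statements: Iterable[Statement], alignments: Dict[str, str]
) -> Tuple[List[NativeRelationship], List[ExtensionAssignment]]:
    """Split statements into native relationships and extension assignments."""
    native: List[NativeRelationship] = []
    extension: List[ExtensionAssignment] = []
    for st in statements:
        field_name = alignments.get(st.predicate)
        if field_name is not None:
            # The predicate is aligned to a native field: keep it native.
            native.append(NativeRelationship(st.subject, st.obj, field_name))
        else:
            # No alignment: fall back to an extension structured property.
            extension.append(
                ExtensionAssignment(st.subject, local_name(st.predicate), st.obj)
            )
    return native, extension
```

Keeping the alignment table as data rather than branching on predicates in code is what makes the routing "ontology-gated" in this sketch: changing the table changes the routing without touching `route`.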

Glossary Term Classification Improvements:

  • Added a new mechanism for classifying glossary terms by kind (e.g., concept, class, named individual, predicate, materialized), using a dedicated property and resolver function. This ensures each term's kind is inferred and stored as a custom property during extraction.
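
One way such a resolver could look is sketched below. The `rdf:entityKind` key is taken from the test diffs further down this page; the kind strings, the precedence order, and the rule that a term with no recognized `rdf:type` counts as "materialized" are all assumptions for illustration, as are the function names.

```python
from typing import Dict, Iterable

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# rdf:type values that mark a term as a predicate (assumed set).
PREDICATE_TYPES = frozenset(
    {
        RDF + "Property",
        OWL + "ObjectProperty",
        OWL + "DatatypeProperty",
        OWL + "AnnotationProperty",
    }
)

# Custom property key, as seen in the golden-file diffs.
ENTITY_KIND_KEY = "rdf:entityKind"


def resolve_entity_kind(rdf_types: Iterable[str]) -> str:
    """Infer a term kind from its rdf:type IRIs (assumed precedence order)."""
    types = set(rdf_types)
    if types & PREDICATE_TYPES:
        return "predicate"
    if SKOS + "Concept" in types:
        return "concept"
    if OWL + "Class" in types:
        return "class"
    if OWL + "NamedIndividual" in types:
        return "named_individual"
    # Assumption: terms without a recognized type were created on demand.
    return "materialized"


def with_entity_kind(
    custom_properties: Dict[str, str], rdf_types: Iterable[str]
) -> Dict[str, str]:
    """Return a copy of the custom properties with the kind recorded."""
    updated = dict(custom_properties)
    updated[ENTITY_KIND_KEY] = resolve_entity_kind(rdf_types)
    return updated
```

Because the kind is written into `customProperties` for every term, any golden file containing a glossary term will differ after this change, which matches the failures reported by codecov below.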

Core AST and Summary Reporting:

  • Extended the core DataHubGraph AST to track native relationships, structured property definitions, and assignments, and updated its summary reporting to include these new fields.
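
A minimal sketch of what tracking these collections and reporting on them might look like. `GraphSketch`, its field names, and `get_summary` are hypothetical stand-ins, not the actual DataHubGraph API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class GraphSketch:
    """Hypothetical stand-in for an AST that tracks the new collections."""
    glossary_terms: List[Any] = field(default_factory=list)
    native_relationships: List[Any] = field(default_factory=list)
    structured_property_definitions: List[Any] = field(default_factory=list)
    structured_property_assignments: List[Any] = field(default_factory=list)

    def get_summary(self) -> Dict[str, int]:
        """Count the items in each tracked collection."""
        names = (
            "glossary_terms",
            "native_relationships",
            "structured_property_definitions",
            "structured_property_assignments",
        )
        return {name: len(getattr(self, name)) for name in names}
```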

Other Improvements and Cleanups:

  • Updated the entity registry to remove the legacy relationship converter, streamline registration logic, and ensure proper dependency and processing order.
  • Added packaging support for ontology .ttl files.
  • Minor bugfix in domain/term processing to avoid mutating the glossary term list during iteration.
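
The last item refers to a common Python pitfall. The sketch below shows the general pattern only; the function names and data are hypothetical and the PR's actual fix may differ.

```python
from typing import List, Set


def drop_assigned_buggy(terms: List[str], assigned: Set[str]) -> List[str]:
    """Removes items while iterating the same list, so elements get skipped."""
    for term in terms:
        if term in assigned:
            terms.remove(term)  # shifts later items left; the next one is skipped
    return terms


def drop_assigned(terms: List[str], assigned: Set[str]) -> List[str]:
    """Iterates over a snapshot, so removals cannot disturb the loop."""
    for term in list(terms):
        if term in assigned:
            terms.remove(term)
    return terms
```

With `["a", "b", "c"]` and `{"a", "b"}`, the buggy version leaves `"b"` behind because removing `"a"` moves `"b"` into the slot the loop has already visited.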

Files and Data Structure Additions:

  • Added new files for structured property entity registration, AST, and MCP builder, with logic for collecting definitions, building MCPs, and synchronous registration.

Relationship and Structured Property Routing:

  • Refactored the relationship entity to support ontology-gated routing, harvesting all property triples and routing them to native fields or extension structured properties; deprecated legacy converter and types, and introduced new data structures for more granular relationship and property handling.
  • Registered the new rdf_structured_property entity type, including AST, MCP builder, and CLI names; updated the registry to support this entity and set its processing order.

Glossary Term Classification:

  • Added glossary term kind classification with a resolver function and ensured each term's kind is stored as a custom property during extraction.

Core AST and Summary Reporting:

  • Extended DataHubGraph AST to include native relationships, structured property definitions, and assignments; updated summary reporting to reflect these fields.

Other Improvements:

  • Cleaned up entity registry logic, removed legacy relationship converter, and ensured proper registration and dependency order.
  • Added packaging support for .ttl ontology files.
  • Fixed a bug in term/domain processing to avoid mutating lists during iteration.

@github-actions github-actions Bot added the ingestion label (PR or Issue related to the ingestion of metadata) Jun 23, 2026
…torage

Route harvested RDF triples via a bundled TBox to native glossaryRelatedTerms
or fixed hierarchical structured properties, with predicate glossary terms,
reference closure, OWL axiom filtering, and synchronous SP definition registration.

Co-authored-by: Cursor <cursoragent@cursor.com>
@codecov

codecov Bot commented Jun 23, 2026

❌ 14 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 1662 | 14 | 1648 | 101 |
View the top 3 failed test(s) by shortest run time
tests.integration.rdf.test_rdf_source::test_fibo_named_individual_ingestion
Stack Traces | 0.086s run time
Metadata files differ (use `pytest --update-golden-files` to update):
Urn changed, urn:li:glossaryTerm:example.org.ClassTerm:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.
tests.integration.rdf.test_rdf_source::test_export_only_filter
Stack Traces | 0.096s run time
Metadata files differ (use `pytest --update-golden-files` to update):
Urn changed, urn:li:glossaryTerm:example.org.glossary.ChildTerm:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.

Urn changed, urn:li:glossaryTerm:example.org.glossary.ParentTerm:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.
tests.integration.rdf.test_rdf_source::test_stateful_ingestion
Stack Traces | 0.098s run time
Metadata files differ (use `pytest --update-golden-files` to update):
Urn changed, urn:li:glossaryTerm:example.org.glossary.CustomerName:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.

Urn changed, urn:li:glossaryTerm:example.org.glossary.AccountIdentifier:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.
tests.integration.rdf.test_rdf_source::test_fibo_exclude_provisional_terms
Stack Traces | 0.102s run time
Metadata files differ (use `pytest --update-golden-files` to update):
Urn added, urn:li:structuredProperty:io.datahub.rdf.predicate.omg.org.spec.Commons.AnnotationVocabulary.hasMaturityLevel
Urn added, urn:li:glossaryTerm:example.org.ProvisionalTerm

Urn changed, urn:li:glossaryTerm:example.org.ReleasedTerm:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.
<structuredProperties> added

Urn changed, urn:li:glossaryTerm:example.org.NoMaturityTerm:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.
tests.integration.rdf.test_rdf_source::test_recursive_directory_ingestion
Stack Traces | 0.103s run time
Metadata files differ (use `pytest --update-golden-files` to update):
Urn changed, urn:li:glossaryTerm:example.org.glossary.AnotherTerm:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.

Urn changed, urn:li:glossaryTerm:example.org.glossary.AccountIdentifier:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.

Urn changed, urn:li:glossaryTerm:example.org.glossary.CustomerName:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.
tests.integration.rdf.test_rdf_source::test_glossary_with_domains_ingestion
Stack Traces | 0.116s run time
Metadata files differ (use `pytest --update-golden-files` to update):
Urn changed, urn:li:glossaryTerm:bank.com.trading.loans.Customer_Name:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.

Urn changed, urn:li:glossaryTerm:bank.com.trading.loans.Loan_Amount:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.
tests.integration.rdf.test_rdf_source::test_glossary_with_relationships_ingestion
Stack Traces | 0.116s run time
Metadata files differ (use `pytest --update-golden-files` to update):
Urn changed, urn:li:glossaryTerm:example.org.glossary.ChildTerm:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.

Urn changed, urn:li:glossaryTerm:example.org.glossary.ParentTerm:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.
tests.integration.rdf.test_rdf_source::test_rdf_xml_format
Stack Traces | 0.118s run time
Metadata files differ (use `pytest --update-golden-files` to update):
Urn changed, urn:li:glossaryTerm:example.org.glossary.TestTerm:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.
tests.integration.rdf.test_rdf_source::test_json_ld_format
Stack Traces | 0.125s run time
Metadata files differ (use `pytest --update-golden-files` to update):
Urn changed, urn:li:glossaryTerm:example.org.glossary.TestTerm:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.
tests.integration.rdf.test_rdf_source::test_skip_export_filter
Stack Traces | 0.134s run time
Metadata files differ (use `pytest --update-golden-files` to update):
Urn changed, urn:li:glossaryTerm:example.org.glossary.ChildTerm:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.

Urn changed, urn:li:glossaryTerm:example.org.glossary.ParentTerm:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.
tests.integration.rdf.test_rdf_source::test_fibo_include_provisional_terms
Stack Traces | 0.137s run time
Metadata files differ (use `pytest --update-golden-files` to update):
Urn added, urn:li:structuredProperty:io.datahub.rdf.predicate.omg.org.spec.Commons.AnnotationVocabulary.hasMaturityLevel

Urn changed, urn:li:glossaryTerm:example.org.ReleasedTerm:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.
<structuredProperties> added

Urn changed, urn:li:glossaryTerm:example.org.NoMaturityTerm:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.

Urn changed, urn:li:glossaryTerm:example.org.ProvisionalTerm:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.
<structuredProperties> added
tests.integration.rdf.test_rdf_source::test_sparql_filter_multiple_namespaces
Stack Traces | 0.149s run time
Metadata files differ (use `pytest --update-golden-files` to update):
Urn changed, urn:li:glossaryTerm:example.org.module1.Term1:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.

Urn changed, urn:li:glossaryTerm:example.org.module1.Term2:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.

Urn changed, urn:li:glossaryTerm:example.org.module2.Term1:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.

Urn changed, urn:li:glossaryTerm:example.org.module2.Term2:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.
tests.integration.rdf.test_rdf_source::test_simple_glossary_ingestion
Stack Traces | 0.153s run time
Metadata files differ (use `pytest --update-golden-files` to update):
Urn changed, urn:li:glossaryTerm:example.org.glossary.CustomerName:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.

Urn changed, urn:li:glossaryTerm:example.org.glossary.AccountIdentifier:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.
tests.integration.rdf.test_rdf_source::test_sparql_filter_single_namespace
Stack Traces | 0.254s run time
Metadata files differ (use `pytest --update-golden-files` to update):
Urn changed, urn:li:glossaryTerm:example.org.module1.Term1:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.

Urn changed, urn:li:glossaryTerm:example.org.module1.Term2:
<glossaryTermInfo> changed:
	Item aspect['customProperties']['rdf:entityKind'] added to dictionary.
