Croissant RDF: merge-rdf and convert RDF back to Croissant JSON-LD #958
Some datasets return @context as a simple string (e.g., "https://schema.org") while others return it as a dict with @vocab and namespace prefixes. Updated test_fetch_data_workflow to handle both formats correctly.
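A minimal sketch of how a test helper might accept both `@context` shapes. The helper name `context_iri` and the sample prefix mapping are hypothetical, chosen for illustration; this is not the code in `test_fetch_data_workflow`.

```python
from typing import Any, Optional


def context_iri(context: Any) -> Optional[str]:
    """Return the IRI a record's @context points at, or None if absent.

    Hypothetical helper: it only covers the two shapes described above.
    """
    if isinstance(context, str):
        # String form: the value is itself the context IRI.
        return context
    if isinstance(context, dict):
        # Dict form: the default vocabulary lives under "@vocab".
        return context.get("@vocab")
    return None


# Both shapes can then be checked the same way in a test.
string_form = context_iri("https://schema.org")
dict_form = context_iri(
    {"@vocab": "https://schema.org/", "cr": "http://mlcommons.org/croissant/"}
)
```

A test can then assert that the returned value is not `None` and mentions `schema.org`, without branching on the type at every call site.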
Implements the reverse operation to regenerate Croissant JSON-LD from RDF files. This addresses one of the key objectives from issue mlcommons#850.

Changes:
- Add convert_from_rdf() method to CroissantHarvester
- Create new rdf-to-jsonld CLI tool for easy conversion
- Add comprehensive tests for round-trip conversion
- Supports all RDF formats (Turtle, N-Triples, RDF/XML, etc.)
Implements the ability to merge RDF files from multiple Croissant providers into a unified knowledge graph. This addresses an objective of issue mlcommons#850.

Features:
- Merge multiple RDF files with automatic deduplication
- Support for various RDF formats (Turtle, N-Triples, RDF/XML, etc.)
- CLI tool 'merge-rdf' for easy merging
- Wildcard support for batch merging (e.g., *.ttl)
- Output format selection (turtle, json-ld, n3, nt, xml)
- Comprehensive tests for merging and deduplication

Example: merge-rdf huggingface.ttl openml.ttl kaggle.ttl -o unified.ttl
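To illustrate why merging deduplicates naturally: an RDF graph is a set of triples, so a triple that appears in two provider files is stored once in the union. The sketch below shows the idea on N-Triples text (one triple per line) using only the standard library. The function name `merge_ntriples` and the example IRIs are hypothetical, and this is a simplification: it compares lines textually and does not handle blank-node renaming, which a real RDF library would take care of.

```python
from typing import Iterable, List


def merge_ntriples(documents: Iterable[str]) -> List[str]:
    """Union N-Triples documents, keeping the first occurrence of each line.

    Simplified illustration: statements are compared as stripped text,
    so two spellings of the same triple are not recognised as duplicates.
    """
    seen = set()
    merged = []
    for doc in documents:
        for line in doc.splitlines():
            statement = line.strip()
            # Skip blank lines and N-Triples comments.
            if not statement or statement.startswith("#"):
                continue
            if statement not in seen:
                seen.add(statement)
                merged.append(statement)
    return merged


# Two hypothetical provider exports that share one triple.
shared = '<http://example.org/d1> <https://schema.org/name> "mnist" .'
provider_a = shared + "\n"
provider_b = shared + '\n<http://example.org/d2> <https://schema.org/name> "iris" .\n'

merged = merge_ntriples([provider_a, provider_b])
```

Here `merged` holds two statements even though the inputs contain three lines in total, because the shared triple is kept only once.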
Major improvements:
- Added comprehensive Quick Start section with all providers
- Documented new CLI tools: rdf-to-jsonld and merge-rdf
- Added CLI tools reference table
- Included practical use cases (cross-platform catalogs, bioinformatics KG)
- Improved SPARQL query examples with better descriptions
- Added architecture diagram
- Reorganized development section for better clarity
- Highlighted multi-provider and knowledge graph merging capabilities

The README now reflects all new features implemented for issue mlcommons#850.
MLCommons CLA bot: All contributors have signed the MLCommons CLA.
Changed rdf-to-jsonld and merge-rdf from standalone commands to subcommands (to-jsonld and merge) under a unified croissant-rdf CLI. The old standalone commands remain for backward compatibility.
Added documentation for the new croissant-rdf command with to-jsonld and merge subcommands. Updated all usage examples to show the new unified CLI while noting that legacy commands remain available.
Only the unified croissant-rdf CLI with to-jsonld and merge subcommands is now available. Updated documentation accordingly.
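One way such a unified command with `to-jsonld` and `merge` subcommands could be wired is with `argparse` subparsers. This is a sketch under assumptions, not the PR's actual implementation: only the command name, the two subcommand names, the `-o` option on merge, and the list of output formats come from the text above; the positional argument names, the `--format` flag, and the `-o` option on `to-jsonld` are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with 'to-jsonld' and 'merge' subcommands (illustrative)."""
    parser = argparse.ArgumentParser(prog="croissant-rdf")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # to-jsonld: convert one RDF file back to Croissant JSON-LD.
    to_jsonld = subparsers.add_parser("to-jsonld", help="Convert RDF to JSON-LD")
    to_jsonld.add_argument("input")  # hypothetical argument name
    to_jsonld.add_argument("-o", "--output")  # hypothetical for this subcommand

    # merge: combine several RDF files into one graph.
    merge = subparsers.add_parser("merge", help="Merge RDF files")
    merge.add_argument("inputs", nargs="+")  # hypothetical argument name
    merge.add_argument("-o", "--output", required=True)
    merge.add_argument(
        "--format",  # hypothetical flag name
        default="turtle",
        choices=["turtle", "json-ld", "n3", "nt", "xml"],
    )
    return parser


# Parse the merge example from the commit message, adapted to the subcommand form.
args = build_parser().parse_args(
    ["merge", "huggingface.ttl", "openml.ttl", "kaggle.ttl", "-o", "unified.ttl"]
)
```

Keeping both operations under one entry point means a single installed script and a shared `--help`, at the cost of one extra word per invocation.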
Looks good @david4096 ! I only wonder if the simple dispatches to |
This PR implements features from issue #850 to enhance croissant-rdf with round-trip RDF conversion and multi-provider graph merging capabilities.
Changes
1. Fix test for varying @context formats

Some datasets return `@context` as a string while others return it as a dict. Updated `test_fetch_data_workflow` to handle both formats correctly.

2. RDF to JSON-LD conversion

Implements the reverse operation to regenerate Croissant JSON-LD from RDF files.

- `convert_from_rdf()` method in `CroissantHarvester`
- `rdf-to-jsonld`

3. Multi-provider RDF merging

Enables combining RDF files from multiple Croissant providers into unified knowledge graphs.

- `merge-rdf` CLI tool

4. Documentation improvements
Test Results
All tests passing: 21/21 (71% code coverage)
CLI Tools Added

- `rdf-to-jsonld`: Convert RDF back to Croissant JSON-LD
- `merge-rdf`: Merge multiple RDF files into unified graphs

Example Usage
Addresses objectives from #850. @stefanches7