Further Croissant RDF integration #858
base: main
… script. Fixed networkx adjacency matrix.
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
About the common dependencies: I am not quite sure how it should work. Should I somehow reference cc @david4096
@stefanches7 I meant that the croissant-rdf code would be included in mlcroissant's library, but we didn't go down that path. Thanks a lot for the follow-up!
@@ -25,8 +25,7 @@ dependencies = [
"kaggle >=1.6.17",
"openml >=0.15.1",
"rich >=13.9.4",
nit: Remove rich in favour of tqdm, as pointed out in the initial PR?
We started with tqdm but moved to rich because it's... richer ;) We'd prefer to keep it; otherwise we're sort of thrashing. https://medium.com/pythoneers/from-tqdm-to-rich-my-quest-for-better-progress-bars-afff39985ffc
Let's stick to rich then, and revisit the decision if we ever merge the code with mlcroissant.
@@ -3,7 +3,7 @@
import pytest
from rdflib import Graph

-from croissant_rdf.providers import HuggingfaceHarvester
+from croissant_rdf import HuggingfaceHarvester

OUTPUT_FILEPATH = "./tests/test_output.ttl"
nit: We should avoid this pattern, as pointed out in the initial review. It causes issues when distributing the tests or running them in parallel. Instead, we should create temporary files within the test.
The same comment would apply to all tests in this PR.
Sorry, I overlooked that. Thanks for the link!
Thanks!
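A minimal sketch of the temporary-file approach suggested above, assuming pytest's built-in `tmp_path` fixture (pytest injects it by parameter name, so no import is needed): each test writes into its own unique directory instead of the shared module-level `OUTPUT_FILEPATH`. The names `write_turtle` and `test_writes_turtle` are hypothetical; `write_turtle` only stands in for the real harvester call so the example runs offline, without rdflib or network access.

```python
from pathlib import Path


def write_turtle(output_filepath: Path) -> Path:
    """Hypothetical stand-in for the harvester call that serializes RDF to Turtle.

    The real tests would run the croissant-rdf harvester here; this writes a
    tiny fixed Turtle document so the sketch needs no network or extra deps.
    """
    output_filepath.write_text(
        "@prefix schema: <https://schema.org/> .\n"
        "<https://example.org/ds> a schema:Dataset .\n",
        encoding="utf-8",
    )
    return output_filepath


def test_writes_turtle(tmp_path: Path) -> None:
    # `tmp_path` is a fresh directory unique to this test invocation, so
    # parallel runs never collide on the same output file.
    output_filepath = tmp_path / "test_output.ttl"
    write_turtle(output_filepath)
    assert output_filepath.exists()
    assert "schema:Dataset" in output_filepath.read_text(encoding="utf-8")
```

Because the directory is created per test under pytest's own base temp directory, the pattern also works with parallel runners such as pytest-xdist and does not write into the repository's `tests/` folder.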
This is really great to see, thanks @stefanches7!
Discussed with @david4096 that depending on the
If you ever need to do that, let's just merge both packages. Introducing a dependency would be too hard to maintain, for zero return, as I pointed out in the first PR. For now, we can keep them distinct; ping me if we ever need mlcroissant from croissant-rdf. Thanks again for the good work! Ready to merge once the temporary-file issue is solved.
Thanks to being on Windows the
Could it be merged?
@marcenacp @ccl-core, how about merging this, or is anything preventing it? Thanks
It seems the CI tests are still failing. Is that related to this PR?
Thanks! I'll have a look at the other PR :) |
According to the comments on #848, there are further points to address in the croissant-rdf code integration.