Commit 0f95e04

Rerun Croissant Health reports for Hugging Face and OpenML (#660)
1 parent d65b6ce commit 0f95e04

3 files changed: 51 additions and 805 deletions

health/README.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ pip install -r requirements.txt
 
 # Test the spider locally.
 # In huggingface.py you can uncomment the line in
-# `start_requests` to produce crawl fake data.
+# `list_datasets` to produce crawl fake data.
 scrapy crawl huggingface
 
 # When you're ready, the following commands launch a new job:

health/crawler/spiders/openml.py

Lines changed: 3 additions & 1 deletion
@@ -20,4 +20,6 @@ def list_datasets(self):
 
     def get_url(self, dataset_id: str):
         """See base class."""
-        return f"https://openml1.win.tue.nl/dataset{dataset_id}/croissant.json"
+        return (
+            f"https://openml1.win.tue.nl/{dataset_id // 10000:04d}/{dataset_id:04d}/dataset_{dataset_id}_croissant.json"
+        )
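
The new `get_url` body in the hunk above builds a sharded path: a four-digit bucket from `dataset_id // 10000`, then the zero-padded ID, then the file name. The standalone sketch below reproduces that format string so the resulting URLs can be inspected without running the spider. The helper name `openml_croissant_url` is hypothetical and not part of the repository. The `int()` cast is also an assumption: the signature annotates `dataset_id` as `str`, but `//` and the `:04d` format spec only work on integers, so a string ID would raise a `TypeError` unless it is converted first.

```python
# Hypothetical standalone helper mirroring the f-string in the diff above.
# Not part of the repository; the int() cast is an assumption (see lead-in).
BASE_URL = "https://openml1.win.tue.nl"


def openml_croissant_url(dataset_id) -> str:
    """Build the sharded Croissant URL for an OpenML dataset ID."""
    # Accept either "61" or 61; the arithmetic below needs an integer.
    numeric_id = int(dataset_id)
    # Bucket directory: the ID divided by 10000, zero-padded to 4 digits.
    bucket = f"{numeric_id // 10000:04d}"
    # Dataset directory: the ID itself, zero-padded to at least 4 digits.
    padded_id = f"{numeric_id:04d}"
    return f"{BASE_URL}/{bucket}/{padded_id}/dataset_{numeric_id}_croissant.json"


if __name__ == "__main__":
    for example_id in ("61", 45678):
        print(openml_croissant_url(example_id))
```

Note that `:04d` sets a minimum width only, so IDs of five or more digits are emitted unpadded in the second path segment. Whether the server actually lays out its files this way for every ID is not confirmed by the diff.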

health/visualizer/report_huggingface.ipynb

Lines changed: 47 additions & 803 deletions
Large diffs are not rendered by default.
