Automate dataset generation and decouple Hugging Face sync #30
* Update `README.md` to track the latest snapshot date.
* Add `schedule` trigger for cron `0 0 * * 1` to `.github/workflows/opendata.yml`.
* Output the snapshot date from the `generate-opendata` job.
* Extract "Update README" and "Upload to Hugging Face" processes into independent `update-readme` and `upload-dataset` jobs, respectively.
Co-authored-by: MrOrz <108608+MrOrz@users.noreply.github.com>
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces significant automation to the dataset generation and publication process. It establishes a weekly schedule for creating new datasets, automatically updates the repository's README with the latest snapshot date, and independently handles the synchronization of these datasets with Hugging Face. These changes streamline maintenance, enhance data currency, and reduce manual intervention for data releases.
* Update `README.md` to track the latest snapshot date.
* Add `schedule` trigger for cron `24 16 * * 0` (Monday 00:24 Taiwan time) to `.github/workflows/opendata.yml`.
* Output the snapshot date from the `generate-opendata` job.
* Extract "Update README" and "Upload to Hugging Face" processes into independent `update-readme` and `upload-dataset` jobs, respectively.
Co-authored-by: MrOrz <108608+MrOrz@users.noreply.github.com>
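The job layout described in this commit could look roughly like the sketch below. It is an illustration only, not the actual contents of `.github/workflows/opendata.yml`: the workflow name, the `snapshot` step id, the `snapshot_date` output name, and all step bodies are assumptions made for the example. Only the cron expression, the three job names, and the `ubuntu-latest` runner come from the commit messages.

```yaml
# Illustrative sketch; names marked "hypothetical" are not taken from the PR.
name: opendata  # hypothetical workflow name

on:
  schedule:
    # 16:24 UTC on Sunday, i.e. 00:24 on Monday in Taiwan (UTC+8).
    - cron: "24 16 * * 0"

jobs:
  generate-opendata:
    runs-on: ubuntu-latest
    outputs:
      # Expose the step output at job level so dependent jobs can read it.
      snapshot_date: ${{ steps.snapshot.outputs.date }}  # hypothetical output name
    steps:
      - uses: actions/checkout@v4
      - id: snapshot  # hypothetical step id
        # Placeholder: the real job presumably derives the date from the generated data.
        run: echo "date=$(date -u +%Y-%m-%d)" >> "$GITHUB_OUTPUT"

  update-readme:
    needs: generate-opendata
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Placeholder for the step that rewrites README.md and commits the change.
      - run: echo "Snapshot date is ${{ needs.generate-opendata.outputs.snapshot_date }}"

  upload-dataset:
    needs: generate-opendata
    runs-on: ubuntu-latest
    steps:
      # Placeholder for the Hugging Face upload step.
      - run: echo "Uploading snapshot ${{ needs.generate-opendata.outputs.snapshot_date }}"
```

Because `update-readme` and `upload-dataset` each declare only `needs: generate-opendata`, they run independently of one another, so a failed upload would not block the README update (and vice versa).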
Code Review
This pull request automates the weekly dataset generation and decouples the Hugging Face synchronization. The changes include modifying the GitHub Actions workflow and updating the README.md to display the latest dataset snapshot date. My review focuses on the README.md change, suggesting an improvement to the placeholder date for better clarity.
Note: Security Review has been skipped due to the limited scope of the PR.
* Revert incorrect python module invocation for huggingface-cli.
* Add `actions/setup-python@v5` to `upload-dataset` job before `pip install` to properly configure Python and the path, ensuring `huggingface-cli` is found during execution on `ubuntu-latest`.
Co-authored-by: MrOrz <108608+MrOrz@users.noreply.github.com>
* Revert `hf` binary to user-requested `huggingface-cli`.
* Update python package installation step to explicitly install `huggingface_hub[cli]<0.25.0` to preserve the availability of the `huggingface-cli` executable. Versions >=0.25 deprecated it in favor of `hf`.
Co-authored-by: MrOrz <108608+MrOrz@users.noreply.github.com>
* Replace `pip install huggingface_hub[cli]` step with `astral-sh/setup-uv@v5`.
* Update the `huggingface-cli` command to be run via `uvx --from huggingface_hub huggingface-cli upload`.
Co-authored-by: MrOrz <108608+MrOrz@users.noreply.github.com>
* Modify `uvx` command to specifically use `huggingface_hub[cli]<0.25.0` since `huggingface-cli` was removed in v0.25+ and renamed to `hf`.
* Set `enable-cache: false` in `astral-sh/setup-uv@v5` to prevent GitHub Actions warnings when no Python lockfiles/requirements are found in the repository.
Co-authored-by: MrOrz <108608+MrOrz@users.noreply.github.com>
* Modify `uvx` command to use the modern `hf` executable from the `huggingface_hub` package, correctly matching latest upstream changes instead of pinning deprecated `<0.25` versions.
Co-authored-by: MrOrz <108608+MrOrz@users.noreply.github.com>
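For reference, the final state of the `upload-dataset` job after this series of commits might resemble the following fragment. This is a hedged sketch rather than the committed workflow: the dataset repository id `example-org/example-dataset`, the local path `./opendata`, and the `HF_TOKEN` secret name are hypothetical placeholders. The `astral-sh/setup-uv@v5` action, the `enable-cache: false` setting, and the `uvx --from huggingface_hub hf upload` invocation are taken from the commit messages above.

```yaml
# Illustrative fragment; repo id, paths, and secret name are hypothetical.
jobs:
  upload-dataset:
    needs: generate-opendata
    runs-on: ubuntu-latest
    steps:
      - uses: astral-sh/setup-uv@v5
        with:
          # No lockfile or requirements file in the repo, so caching is disabled
          # to avoid the warning mentioned in the commit message.
          enable-cache: false
      - name: Upload to Hugging Face
        env:
          # huggingface_hub reads the access token from HF_TOKEN.
          HF_TOKEN: ${{ secrets.HF_TOKEN }}  # hypothetical secret name
        # Arguments: <repo_id> <local_path> <path_in_repo>
        run: >-
          uvx --from huggingface_hub hf upload
          example-org/example-dataset ./opendata .
          --repo-type dataset
```

Running the CLI through `uvx` avoids a separate `pip install` step and the Python path setup that the earlier commits worked around.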

Implements automated weekly dataset generation as per user request.
Modifies `.github/workflows/opendata.yml` to:
* Run on a weekly schedule (`0 0 * * 1`).
* Update `README.md` with the new snapshot date and commit the changes.
Updates `README.md` to establish the HTML comment block tracking the snapshot date.
PR created automatically by Jules for task 12801888251781349117 started by @MrOrz