Adding harvester robustness #19373

wallentx · 2025-03-12T02:37:23Z

This fixes #19383

This PR addresses the issue where the harvester abandons an entire plot directory when encountering I/O errors in subdirectories during recursive plot scanning.
It also introduces some new CLI commands and RPC endpoints which are useful for recovering from such a scenario.

Features Added/Changed:

Cache Clear Method:
- Added a new clear() method to the Cache class that removes all entries from the cache
- This allows a complete cache reset without having to manually delete the ~/.chia/mainnet/cache/plot_manager_v2.dat file and restart the harvester
New harvester RPC Endpoints:
- Added a new /refresh_plots endpoint to the harvester RPC API for normal plot refreshing
- Added a new /hard_refresh_plots endpoint that clears the plot cache and triggers a full refresh
- These endpoints allow programmatic refreshing of plots without restarting the harvester
New CLI Commands:
- Added chia plots refresh command to trigger a normal refresh of plots
- Added chia plots refresh --hard option to perform a hard refresh (clearing cache)
- Provides users with convenient ways to refresh plots without restarting the harvester
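A minimal sketch of what the new `clear()` method could look like, assuming the cache keeps its entries in an in-memory dict keyed by plot path and only writes to disk when a "changed" flag is set. `CacheEntry`, `_data`, `_changed`, `update()` and `changed()` are hypothetical stand-ins here, not the actual `Cache` API in the harvester code.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Optional


@dataclass
class CacheEntry:
    # Hypothetical stand-in for the per-plot data the harvester caches
    prover_id: bytes


@dataclass
class Cache:
    # Minimal stand-in for the plot cache; field names are assumptions
    path: Path
    _data: Dict[Path, CacheEntry] = field(default_factory=dict)
    _changed: bool = False

    def update(self, plot_path: Path, entry: CacheEntry) -> None:
        self._data[plot_path] = entry
        self._changed = True

    def get(self, plot_path: Path) -> Optional[CacheEntry]:
        return self._data.get(plot_path)

    def clear(self) -> None:
        # Drop every cached entry and mark the cache dirty so the next save
        # overwrites the on-disk file with an empty cache
        self._data.clear()
        self._changed = True

    def changed(self) -> bool:
        return self._changed

    def __len__(self) -> int:
        return len(self._data)
```

Marking the cache as changed even when it was already empty is deliberate in this sketch: the goal is to reset the file on disk, which may still hold stale entries.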

These changes improve the harvester's resilience to I/O errors by providing a way to clear the cache and force a complete refresh without restarting the service. This is particularly valuable for large farms with multiple drive mounts where occasional I/O errors are inevitable.

wallentx · 2025-03-12T23:57:34Z

I have some tests that don't fully work yet in another branch of my fork if you really want them.

arvidn

It seems like this PR has two independent changes. Making the traversal of plot directories more robust to failure seems uncontroversial. By splitting that out into a separate PR, I think it's easier to land them, as we don't have to wait for both parts to be ready before landing anything.

arvidn · 2025-03-26T13:02:05Z

chia/plotting/util.py

+            try:
+                files = glob.glob(str(directory / "**" / "*.plot"), recursive=True)
+                for file in files:
+                    try:
+                        filepath = Path(file).resolve()
+                        if filepath.is_file() and not filepath.name.startswith("._"):
+                            all_files.append(filepath)
+                    except Exception as e:
+                        # If we can't process a specific file, log and continue
+                        log.warning(f"Error processing file {file}: {e}")
+                        continue
+            except Exception as e:


it seems unnecessary to keep the glob() path. The idea is that the "manual" traversal works too, right? It's much easier to test if there's only one implementation
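A sketch of the single-implementation approach suggested above, assuming `os.walk` replaces `glob.glob` entirely: directories that cannot be listed are reported through the `onerror` callback and skipped, so one bad subdirectory no longer hides the rest of the tree. `find_plot_files` and its parameters are hypothetical names, not the PR's actual `get_filenames()` signature.

```python
import logging
import os
from pathlib import Path
from typing import List

log = logging.getLogger(__name__)


def find_plot_files(directory: Path, recursive: bool = True, follow_links: bool = False) -> List[Path]:
    """Hypothetical traversal: log unreadable directories and keep going."""

    def on_error(err: OSError) -> None:
        # os.walk calls this when a directory cannot be listed; skip it instead of aborting
        log.warning("Skipping %s: %s", err.filename, err)

    found: List[Path] = []
    for root, _dirs, files in os.walk(directory, onerror=on_error, followlinks=follow_links):
        for name in files:
            # Keep *.plot files, ignoring macOS "._" resource-fork files
            if name.endswith(".plot") and not name.startswith("._"):
                path = Path(root) / name
                try:
                    if path.is_file():
                        found.append(path)
                except OSError as e:
                    # e.g. an I/O error while stat-ing a single file
                    log.warning("Error processing file %s: %s", path, e)
        if not recursive:
            # os.walk is top-down, so the first iteration is the top-level directory
            break
    return found
```

With only one code path, a test needs nothing more than a temporary directory tree; simulating an unreadable subdirectory (for example through permissions) is platform-dependent.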


arvidn · 2025-03-26T13:05:29Z

chia/rpc/harvester_rpc_api.py

@@ -23,6 +23,7 @@ def get_routes(self) -> dict[str, Endpoint]:
        return {
            "/get_plots": self.get_plots,
            "/refresh_plots": self.refresh_plots,
+            "/hard_refresh_plots": self.hard_refresh_plots,


it would seem more intuitive to add an argument to /refresh_plots, something like force=True, rather than adding a whole new endpoint. Maybe there's a good reason.
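A sketch of that alternative: one `/refresh_plots` handler that accepts an optional `force` flag instead of a separate `/hard_refresh_plots` route. `FakePlotManager`, `HarvesterRpcSketch`, the dict-based cache and the response shape are hypothetical; the real harvester RPC handler and plot manager may differ.

```python
import asyncio
from typing import Any, Dict


class FakePlotManager:
    """Hypothetical stand-in for the harvester's plot manager."""

    def __init__(self) -> None:
        self.cache: Dict[str, object] = {"old.plot": object()}
        self.refresh_count = 0

    def trigger_refresh(self) -> None:
        self.refresh_count += 1


class HarvesterRpcSketch:
    def __init__(self, plot_manager: FakePlotManager) -> None:
        self.plot_manager = plot_manager

    async def refresh_plots(self, request: Dict[str, Any]) -> Dict[str, Any]:
        # Optional "force" flag; omitting it keeps today's normal-refresh behavior
        force = bool(request.get("force", False))
        if force:
            # Hard refresh: forget cached plot info so every plot is re-read
            self.plot_manager.cache.clear()
        self.plot_manager.trigger_refresh()
        return {"force": force}
```

Keeping a single endpoint means existing callers that send `{}` are unaffected, and a `chia plots refresh --hard` CLI option would only need to pass `{"force": true}`.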

arvidn · 2025-03-26T13:07:33Z

chia/plotting/util.py

+        try:
+            # Get all plot files in this directory
+            plot_files = get_filenames(directory, recursive_scan, recursive_follow_links)
+            all_files[directory] = plot_files
+        except Exception as e:


with your change, does get_filenames() ever throw an exception?
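Assuming `get_filenames()` is rebuilt on `os.walk`, the directory-listing step largely stops raising: `os.walk` ignores `scandir()` errors by default and only reports them if an `onerror` callback is supplied. The check below illustrates this with a missing directory; `walk_errors` is a hypothetical helper. Calls made inside the loop (such as a `stat` on an individual file) can still raise, so the outer `try` may only be redundant for the listing itself.

```python
import os
import tempfile
from typing import List


def walk_errors(top: str) -> List[OSError]:
    # Collect the errors os.walk reports instead of letting them propagate
    errors: List[OSError] = []
    for _ in os.walk(top, onerror=errors.append):
        pass
    return errors


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        missing = os.path.join(d, "does-not-exist")
        # No exception: the failed listing is silently skipped
        print(list(os.walk(missing)))
        # With onerror, the same failure is surfaced to the caller
        print(walk_errors(missing))
```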


wallentx · 2025-03-26T14:53:27Z

Thanks @arvidn . Just want to make sure:
Make a separate PR for "preventing the failure" and keep this PR for "how to recover the state in the event of a failure", so that I get to keep the benefit of your review?

arvidn · 2025-03-26T15:57:16Z

yeah. It seems the RPC to force-refresh the plots is somewhat independent from making directory traversal more robust


emlowe · 2025-04-02T22:33:49Z

The manual method using os.walk and iterating might be slow, but I have no data for this. Have you tried to benchmark this at all?

I concur with Arvid that two PRs, one where you replace glob with walk and a second where you introduce the new chia commands, are a good idea, as the new chia commands would work regardless of the iteration method.
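One way to get the data asked for above is a micro-benchmark of both approaches on a synthetic tree. This is a sketch only: `with_glob`, `with_walk` and `build_tree` are hypothetical helpers, the tree is small and local, and results on large farms with many mounts or network filesystems may differ substantially. No timing results are implied here.

```python
import glob
import os
import tempfile
import timeit
from pathlib import Path
from typing import Set


def with_glob(directory: str) -> Set[str]:
    return set(glob.glob(os.path.join(directory, "**", "*.plot"), recursive=True))


def with_walk(directory: str) -> Set[str]:
    found: Set[str] = set()
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(".plot"):
                found.add(os.path.join(root, name))
    return found


def build_tree(base: Path, dirs: int = 50, files_per_dir: int = 20) -> None:
    # Synthetic tree of empty *.plot files; real farms will differ
    for i in range(dirs):
        sub = base / f"d{i}"
        sub.mkdir()
        for j in range(files_per_dir):
            (sub / f"p{j}.plot").touch()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        build_tree(Path(d))
        # Both approaches must find the same files before timing means anything
        assert with_glob(d) == with_walk(d)
        for fn in (with_glob, with_walk):
            t = timeit.timeit(lambda: fn(d), number=20)
            print(f"{fn.__name__}: {t / 20 * 1000:.2f} ms per scan")
```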

github-actions · 2025-05-18T11:05:38Z

This PR has been flagged as stale due to no activity for over 60 days. It will not be automatically closed, but it has been given a stale-pr label and should be manually reviewed by the relevant parties.

github-actions · 2025-06-16T21:32:45Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

wallentx added 4 commits March 11, 2025 21:27

Adding harvester robustness

57da0e0

Fix chia plots refresh

18a7116

Fix command connection to harvester RPC

8f750af

Fix 'chia plots refresh' command connetion to harvester

3b033b7

wallentx marked this pull request as ready for review March 12, 2025 23:57

wallentx requested a review from a team as a code owner March 12, 2025 23:57

ChiaAutomation added the community-pr label Mar 13, 2025

arvidn reviewed Mar 26, 2025


wallentx and others added 8 commits March 27, 2025 03:02

Update chia/plotting/util.py

8fb7e29

Co-authored-by: Arvid Norberg <[email protected]>

Update chia/plotting/util.py

d200c79

Co-authored-by: Arvid Norberg <[email protected]>

Update chia/plotting/util.py

8a0875b

Co-authored-by: Arvid Norberg <[email protected]>

Update chia/plotting/util.py

531c997

Co-authored-by: Arvid Norberg <[email protected]>

Update chia/plotting/util.py

564ee0b

Co-authored-by: Arvid Norberg <[email protected]>

Update chia/plotting/util.py

550ef11

Co-authored-by: Arvid Norberg <[email protected]>

Update chia/plotting/util.py

64b62bf

Co-authored-by: Arvid Norberg <[email protected]>

Merge branch 'main' into wallentx/harvester-robustvester

8b5bbf4

github-actions bot added the stale-pr Flagged as stale and in need of manual review label May 18, 2025

github-actions bot added merge_conflict Branch has conflicts that prevent merge to main and removed stale-pr Flagged as stale and in need of manual review labels Jun 16, 2025
