Adding harvester robustness #19373
base: main
I have some tests that don't fully work yet in another branch of my fork, if you really want them.
It seems like this PR has two independent changes. Making the traversal of plot directories more robust to failure seems uncontroversial. If that part is split out into a separate PR, I think it's easier to land both, because we don't have to wait for both parts to be ready before landing anything.
try:
    files = glob.glob(str(directory / "**" / "*.plot"), recursive=True)
    for file in files:
        try:
            filepath = Path(file).resolve()
            if filepath.is_file() and not filepath.name.startswith("._"):
                all_files.append(filepath)
        except Exception as e:
            # If we can't process a specific file, log and continue
            log.warning(f"Error processing file {file}: {e}")
            continue
except Exception as e:
It seems unnecessary to keep the `glob()` path. The idea is that the "manual" traversal works too, right? It's much easier to test if there's only one implementation.
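Here is a minimal sketch of a single "manual" traversal of the kind the review suggests. It assumes the goal is to skip only the unreadable part of the tree rather than the whole plot directory. `scan_plot_files` and its parameters are hypothetical names chosen for illustration. They echo Chia's `get_filenames` but are not its implementation. The `._` filter mirrors the excerpt above.

```python
import logging
import os
from pathlib import Path

log = logging.getLogger(__name__)


def scan_plot_files(directory: Path, recursive: bool = True, follow_links: bool = False) -> list[Path]:
    """Collect *.plot files under `directory` (hypothetical helper).

    An OSError while listing one directory is logged and only that directory is
    skipped; its siblings and the rest of the tree are still scanned.
    """
    found: list[Path] = []
    pending: list[str] = [os.fspath(directory)]
    seen: set[str] = set()  # real paths already visited, guards against symlink loops
    while pending:
        current = pending.pop()
        real = os.path.realpath(current)
        if real in seen:
            continue
        seen.add(real)
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    try:
                        if entry.is_dir(follow_symlinks=follow_links):
                            if recursive:
                                pending.append(entry.path)
                        elif (
                            entry.is_file()
                            and entry.name.endswith(".plot")
                            and not entry.name.startswith("._")
                        ):
                            found.append(Path(entry.path))
                    except OSError as e:
                        # stat() failed for this one entry (e.g. a flaky mount): skip only it
                        log.warning(f"Error processing {entry.path}: {e}")
        except OSError as e:
            # Could not list this directory at all: log it and keep scanning the rest
            log.warning(f"Error scanning directory {current}: {e}")
    return sorted(found)
```

Errors are handled per directory and per entry, so one bad mount point costs only its own subtree. That single code path is also what makes the behaviour straightforward to unit-test.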
@@ -23,6 +23,7 @@ def get_routes(self) -> dict[str, Endpoint]:
        return {
            "/get_plots": self.get_plots,
            "/refresh_plots": self.refresh_plots,
            "/hard_refresh_plots": self.hard_refresh_plots,
It would seem more intuitive to add an argument to `/refresh_plots`, something like `force=True`, rather than adding a whole new endpoint. Maybe there's a good reason.
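The suggested `force` argument could look something like the sketch below. `HarvesterRpcSketch` and `FakePlotManager` are hypothetical stand-ins, not Chia's actual harvester RPC or plot manager classes. The `force` field name comes from the suggestion above rather than from any existing API.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any

Endpoint = Callable[[dict[str, Any]], Awaitable[dict[str, Any]]]


class FakePlotManager:
    """Hypothetical stand-in for the harvester's plot manager."""

    def __init__(self) -> None:
        self.cache: dict[str, Any] = {"/plots/old.plot": {"size": 32}}
        self.refresh_requests = 0

    def clear_cache(self) -> None:
        self.cache.clear()

    def trigger_refresh(self) -> None:
        self.refresh_requests += 1


class HarvesterRpcSketch:
    """Hypothetical RPC handler with a single refresh route and a `force` flag."""

    def __init__(self, plot_manager: FakePlotManager) -> None:
        self.plot_manager = plot_manager

    def get_routes(self) -> dict[str, Endpoint]:
        # No separate /hard_refresh_plots route: the request body selects the behaviour
        return {"/refresh_plots": self.refresh_plots}

    async def refresh_plots(self, request: dict[str, Any]) -> dict[str, Any]:
        # Only an explicit JSON `true` triggers a hard refresh; anything else is a normal one
        force = request.get("force", False) is True
        if force:
            # Forget cached plot metadata so every plot is re-read on the next refresh
            self.plot_manager.clear_cache()
        self.plot_manager.trigger_refresh()
        return {"force": force}


if __name__ == "__main__":
    rpc = HarvesterRpcSketch(FakePlotManager())
    print(asyncio.run(rpc.refresh_plots({"force": True})))
```

With a single route, clients only need to learn one endpoint. A CLI `--hard` option could then just set `force` in the request body.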
try:
    # Get all plot files in this directory
    plot_files = get_filenames(directory, recursive_scan, recursive_follow_links)
    all_files[directory] = plot_files
except Exception as e:
With your change, does `get_filenames()` ever throw an exception?
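Whether the traversal can still throw can be settled with a test that simulates an unreadable subdirectory, so no failing drive is needed. The sketch below assumes a traversal that never raises and instead returns the `OSError`s it skipped. It uses the `onerror` hook of `os.walk` and patches `os.scandir` with `unittest.mock`. `scan_plot_files_walk` and the test name are hypothetical. The patching relies on CPython's `os.walk` listing directories through `os.scandir`, which is an implementation detail.

```python
import os
import tempfile
from pathlib import Path
from unittest import mock


def scan_plot_files_walk(directory: Path, follow_links: bool = False) -> tuple[list[Path], list[OSError]]:
    """Hypothetical traversal that never raises: returns plots found plus errors skipped."""
    found: list[Path] = []
    errors: list[OSError] = []
    # os.walk hands each listing error to `onerror`, then moves on to the next directory
    for root, _dirs, files in os.walk(directory, onerror=errors.append, followlinks=follow_links):
        for name in files:
            if name.endswith(".plot") and not name.startswith("._"):
                found.append(Path(root) / name)
    return sorted(found), errors


def test_unreadable_subdirectory_is_skipped() -> None:
    real_scandir = os.scandir

    def flaky_scandir(path):
        # Simulate an I/O failure on the "bad" subdirectory only
        if os.path.basename(path) == "bad":
            raise PermissionError(13, "Permission denied", path)
        return real_scandir(path)

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for sub in ("good", "bad"):
            (root / sub).mkdir()
            (root / sub / f"{sub}.plot").write_text("")
        # Relies on CPython's os.walk listing directories via os.scandir
        with mock.patch("os.scandir", side_effect=flaky_scandir):
            files, errors = scan_plot_files_walk(root)
        assert [p.name for p in files] == ["good.plot"]
        assert len(errors) == 1 and isinstance(errors[0], PermissionError)
```

If the traversal reports errors this way, the caller's `try`/`except` around it becomes unnecessary. The caller can log the returned errors instead.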
Thanks @arvidn. Just want to make sure:
Yeah. It seems the RPC to force-refresh the plots is somewhat independent of making directory traversal more robust.
Co-authored-by: Arvid Norberg <[email protected]>
The manual method using …
I concur with Arvid that two PRs, one where you replace …
This PR has been flagged as stale due to no activity for over 60 days. It will not be automatically closed, but it has been given a stale-pr label and should be manually reviewed by the relevant parties.
This fixes #19383
This PR addresses the issue where the harvester abandons an entire plot directory when encountering I/O errors in subdirectories during recursive plot scanning.
It also introduces some new CLI commands and RPC endpoints which are useful for recovering from such a scenario.
Features Added/Changed:
- Cache Clear Method:
  - `clear()` method to the `Cache` class that removes all entries from the cache … `~/.chia/mainnet/cache/plot_manager_v2.dat` file, and restarting the harvester
- New harvester RPC Endpoints:
  - `/refresh_plots` endpoint to the harvester RPC API for normal plot refreshing
  - `/hard_refresh_plots` endpoint that clears the plot cache and triggers a full refresh
- New CLI Commands:
  - `chia plots refresh` command to trigger a normal refresh of plots
  - `chia plots refresh --hard` option to perform a hard refresh (clearing cache)

These changes improve the harvester's resilience to I/O errors by providing a way to clear the cache and force a complete refresh without restarting the service. This is particularly valuable for large farms with multiple drive mounts where occasional I/O errors are inevitable.
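The sketch below shows roughly what a cache `clear()` could do, assuming the cache is an in-memory mapping persisted to a file. `PlotCacheSketch` is hypothetical and uses JSON only for illustration. The real `Cache` class and the `plot_manager_v2.dat` format are not reproduced here. The PR description does not say whether its `clear()` also writes the empty cache to disk immediately; this sketch chooses to.

```python
import json
from pathlib import Path
from typing import Any


class PlotCacheSketch:
    """Hypothetical plot cache persisted as JSON (illustration only, not Chia's format)."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._entries: dict[str, Any] = {}
        if path.exists():
            self._entries = json.loads(path.read_text())

    def update(self, plot_path: str, info: Any) -> None:
        self._entries[plot_path] = info

    def save(self) -> None:
        self._path.write_text(json.dumps(self._entries))

    def clear(self) -> None:
        # Drop every entry and persist the empty cache, so a later restart
        # does not reload stale entries from disk
        self._entries.clear()
        self.save()

    def __len__(self) -> int:
        return len(self._entries)
```

With a method like this, a hard refresh could reduce to calling `clear()` and then triggering the normal refresh. No service restart is involved.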