The crawler currently handles browser crashes (or other interruptions) by exiting with a specific error code, assuming that the crawler container will be restarted. This has many advantages (ensuring full cleanup, etc.) and works well with Kubernetes pod behavior.
Since we run the crawler in production only in Kubernetes, we have leaned into this behavior more and more.
However, we understand many users don't want to run the crawler in Kubernetes, or with an external controller or process manager.
For these deployments, having the crawler simply exit with a status code is not ideal. Perhaps a wrapper shell script that restarts the node process with exponential backoff would provide a good standalone feature? We would then default to running with `restartsOnError` set to true, and that would be the default path.
The exponential back-off script could be something simple; there are many examples, like: https://gist.github.com/nathforge/62456d9b18e276954f58eb61bf234c17
It would need to have additional properties, to mimic the Kubernetes behavior:
- Reset the backoff timer if the crawler has been running successfully for some amount of time (e.g. 10 without any exits)
- Don't restart on certain error codes, such as time limit reached or out of disk space
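To make the idea concrete, here is a minimal sketch of such a wrapper. The specific exit codes (14, 16) and the timing parameters are placeholders, not actual Browsertrix Crawler values, and the function/variable names are hypothetical:

```shell
#!/bin/sh
# Sketch of a standalone restart wrapper with exponential backoff.
# NOTE: exit codes and thresholds below are illustrative assumptions.

MAX_BACKOFF=300           # cap the delay between restarts (seconds)
RESET_AFTER=600           # a run lasting at least this long resets the backoff
NO_RESTART_CODES="14 16"  # hypothetical terminal codes (time limit, disk full)

run_with_backoff() {
  backoff=1
  while true; do
    start=$(date +%s)
    "$@"
    code=$?
    elapsed=$(( $(date +%s) - start ))

    # Clean exit: nothing to restart
    [ "$code" -eq 0 ] && return 0

    # Mimic Kubernetes behavior: some failures are terminal, don't restart
    for c in $NO_RESTART_CODES; do
      [ "$code" -eq "$c" ] && return "$code"
    done

    # Reset the backoff if the crawler ran successfully for long enough
    [ "$elapsed" -ge "$RESET_AFTER" ] && backoff=1

    echo "crawler exited with code $code; restarting in ${backoff}s" >&2
    sleep "$backoff"
    backoff=$(( backoff * 2 ))
    [ "$backoff" -gt "$MAX_BACKOFF" ] && backoff=$MAX_BACKOFF
  done
}

# Usage (assumed invocation, for illustration only):
# run_with_backoff node crawler.js
```

The wrapper propagates terminal exit codes to the caller, so an outer process manager (or a human) can still distinguish "time limit reached" from a crash loop.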
As this wouldn't be our production deployment, we would want help from the community in testing this approach, since we won't have much bandwidth to test it ourselves, especially for longer-running crawls.
I can see it being helpful for issues such as the one in #927 and especially for openzim/zimit#527
For users running larger-scale crawls with just Browsertrix Crawler (@benoit74, @gitreich, @Mr0grog): would you be willing to help test this type of setup? What do you think of this approach?