[Feature]: Optionally restarting on browser crashes with exponential backoff #938

@ikreymer

The crawler currently handles browser crashes (or other interruptions) by exiting with a specific error code and assuming that the crawler container will be restarted. This has many advantages, such as ensuring full cleanup, and works well with Kubernetes pod behavior.
Since we run the crawler in production only in Kubernetes, we have leaned into this behavior more and more.

However, we understand many users don't want to run the crawler in Kubernetes, or with an external controller or process manager.

For these deployments, having the crawler exit with a status code is not ideal. I'm wondering whether a wrapper shell script that restarts the node process with exponential backoff would make a good standalone feature. We would then default to running with restartsOnError set to true, making this the default path.

The exponential backoff script could be something simple. There are many examples, such as: https://gist.github.com/nathforge/62456d9b18e276954f58eb61bf234c17

It would need to have additional properties, to mimic the Kubernetes behavior:

  • Reset the backoff delay if the crawler has been running successfully for some amount of time (e.g. 10 without any exits)
  • Don't restart on certain error codes, like time limit reached or out of disk space
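To make the two properties above concrete, here is a minimal sketch of the restart loop. It is written in Python rather than shell only to keep the logic easy to read and test, and it is not an existing part of the crawler. Everything named here is an assumption: `supervise`, `FATAL_EXIT_CODES` and its placeholder values (not the crawler's real exit codes), the 1 second base delay, the 5 minute cap, and the 10 minute reset window (chosen to match the kubelet's behavior of resetting backoff after 10 minutes without problems).

```python
import subprocess
import time
from typing import Callable, Collection, Sequence

# Placeholder exit codes that should never trigger a restart (standing in for
# e.g. "time limit reached" or "out of disk space"). These are NOT the
# crawler's real codes; substitute the actual values.
FATAL_EXIT_CODES = frozenset({3, 4})


def supervise(
    cmd: Sequence[str],
    *,
    fatal_codes: Collection[int] = FATAL_EXIT_CODES,
    base: float = 1.0,           # first delay, in seconds (assumed)
    cap: float = 300.0,          # maximum delay, in seconds (assumed)
    reset_after: float = 600.0,  # healthy run time that resets the delay (assumed)
    run: Callable[[Sequence[str]], int] = subprocess.call,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run cmd, restarting it with exponential backoff after each crash.

    Returns the exit code of the last run: 0 on success, or one of
    fatal_codes when restarting would not help.
    """
    delay = base
    while True:
        started = clock()
        code = run(cmd)
        ran_for = clock() - started

        # A clean exit or a non-restartable error ends the loop.
        if code == 0 or code in fatal_codes:
            return code

        # A long enough healthy run resets the backoff to the base delay.
        if ran_for >= reset_after:
            delay = base

        sleep(delay)
        # Double the delay for the next crash, up to the cap.
        delay = min(delay * 2, cap)
```

Calling `supervise([...])` with the crawler's command line as the argument list would block until the crawl finishes or hits a non-restartable code. The `run`, `sleep`, and `clock` parameters exist only so the loop can be tested without spawning processes or waiting.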

As this wouldn't be our production deployment, we would want help from the community in testing this approach, since we won't have much bandwidth to test it ourselves, especially for longer-running crawls.

I can see it being helpful for issues such as the one in #927, and especially for openzim/zimit#527.

For users running larger-scale crawls with just Browsertrix Crawler (@benoit74, @gitreich, @Mr0grog): would you be willing to help test this type of setup? What do you think of this approach?
