[question] How to check progress of large downloads? #941

@wblondel

Hello everyone,

I'm archiving a website that has a lot of large files, and this takes a very long time.
Is there any way to check the progress of these downloads?
I can see that files are being downloaded by looking at the /tmp folder in the container, but the real names of the files are not there.
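
To illustrate the /tmp observation, here is a rough sketch of how the growth of those temporary files could be polled as a stand-in for a progress indicator. It is a workaround under stated assumptions, not a crawler feature: it assumes the in-progress downloads are regular files directly inside /tmp, that Python is available wherever that folder is visible, and the helper names (`snapshot_sizes`, `growth`, `watch`) are hypothetical. It cannot recover the real file names or the expected total size.

```python
import os
import time


def snapshot_sizes(directory):
    """Return {path: size_in_bytes} for regular files directly inside directory."""
    sizes = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            try:
                if entry.is_file(follow_symlinks=False):
                    sizes[entry.path] = entry.stat(follow_symlinks=False).st_size
            except FileNotFoundError:
                # Temporary files can disappear between listing and stat.
                continue
    return sizes


def growth(previous, current):
    """Return {path: bytes_added} for files that are new or grew since `previous`."""
    changes = {}
    for path, size in current.items():
        delta = size - previous.get(path, 0)
        if delta > 0:
            changes[path] = delta
    return changes


def watch(directory="/tmp", interval=5.0):
    """Print per-file growth every `interval` seconds until interrupted (Ctrl+C)."""
    previous = snapshot_sizes(directory)
    while True:
        time.sleep(interval)
        current = snapshot_sizes(directory)
        for path, delta in sorted(growth(previous, current).items()):
            rate_kib = delta / interval / 1024
            print(f"{path}: {current[path]} bytes total, +{delta} bytes ({rate_kib:.1f} KiB/s)")
        previous = current
```

Calling `watch("/tmp")` prints one line per file that grew during each interval, which at least shows whether a download is still moving and roughly how fast.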

As an extra note, I had to set the timeout parameter to 999999, because downloads always time out after the timeout duration plus 100 seconds. For example, if timeout is set to 180, I get "Finishing Fetch Timed Out" after 280 seconds. I haven't found any setting that defaults to 100.

Current config:

collection: "mycatalog"
workers: 2

seeds:
    - url: https://subdomain.domain.local/
      scopeType: "host"
      extraHops: 1
      exclude:
        - subdomain.domain.local/logout
        - logout$

blockRules:
   - url: googleanalytics.com

profile: /crawls/profiles/profile.tar.gz
generateWACZ: true
combineWARC: true
text: "to-pages"

blockAds: true
timeout: 999999
screencastPort: 9037
