Open
Description
Hello everyone,
I'm archiving a website that has a lot of large files, and this takes a very long time.
Is there any way to check the progress of these downloads?
I can see that files are being downloaded by looking at the /tmp folder in the container, but the real file names are not shown there.
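As a stopgap, the growth of those temporary files can at least be watched from a small script. This is only a minimal sketch of that workaround, not a crawler feature: it assumes Python is available wherever /tmp is visible (inside the container or via a mounted path), and the helper names `snapshot_sizes`, `growth` and `watch` are made up for illustration. It reports byte counts only and cannot recover the real file names.

```python
import os
import time


def snapshot_sizes(directory):
    """Return {file_name: size_in_bytes} for regular files directly inside directory."""
    sizes = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            try:
                if entry.is_file(follow_symlinks=False):
                    sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
            except FileNotFoundError:
                # The file disappeared between listing and stat; skip it.
                continue
    return sizes


def growth(before, after):
    """Return {file_name: bytes_added} for files that are new or grew between two snapshots."""
    return {
        name: size - before.get(name, 0)
        for name, size in after.items()
        if size > before.get(name, 0)
    }


def watch(directory, interval_s, rounds):
    """Print files that grew during each interval, for a fixed number of rounds."""
    previous = snapshot_sizes(directory)
    for _ in range(rounds):
        time.sleep(interval_s)
        current = snapshot_sizes(directory)
        for name, delta in sorted(growth(previous, current).items()):
            print(f"{name}: +{delta} bytes ({current[name]} total)")
        previous = current
```

For example, `watch("/tmp", 5, 12)` would poll every 5 seconds for one minute. The directory and interval are assumptions taken from the description above, not crawler settings.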
As an extra note, I had to set the timeout parameter to 999999, because downloads always time out after the configured timeout plus 100 seconds. For example, if timeout is set to 180, I get "Finishing Fetch Timed Out" after 280 seconds. I haven't found any setting that defaults to 100.
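To make the arithmetic explicit, the observed behaviour can be written down as a tiny sketch. The constant and function name are hypothetical and only restate what was observed; they are not taken from the crawler's code or documentation.

```python
# Offset observed in practice; NOT a documented crawler setting (assumption).
OBSERVED_EXTRA_SECONDS = 100


def observed_fetch_timeout(configured_timeout_s: int) -> int:
    """Seconds after which the fetch was seen to time out for a given timeout setting."""
    return configured_timeout_s + OBSERVED_EXTRA_SECONDS


print(observed_fetch_timeout(180))  # 280, matching the example above
```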
Current config:
collection: "mycatalog"
workers: 2
seeds:
- url: https://subdomain.domain.local/
scopeType: "host"
extraHops: 1
exclude:
- subdomain.domain.local/logout
- logout$
blockRules:
- url: googleanalytics.com
profile: /crawls/profiles/profile.tar.gz
generateWACZ: true
combineWARC: true
text: "to-pages"
blockAds: true
timeout: 999999
screencastPort: 9037