Description
Command to reproduce:
docker run -v $PWD/output:/output --name crawlme --rm webrecorder/browsertrix-crawler:1.5.4 crawl --url "https://www.survivorlibrary.com/index.php/Accounting" --scopeType host --cwd /output --sizeLimit 100000000
Expected behavior: the crawler should stop once the archive size reaches about 100 MB.
Logs
{"timestamp":"2025-02-25T08:26:20.760Z","logLevel":"info","context":"general","message":"Browsertrix-Crawler 1.5.4 (with warcio.js 2.4.3)","details":{}}
{"timestamp":"2025-02-25T08:26:20.761Z","logLevel":"info","context":"general","message":"Seeds","details":[{"url":"https://www.survivorlibrary.com/index.php/Accounting","scopeType":"host","include":["/^https?:\\/\\/www\\.survivorlibrary\\.com\\//"],"exclude":[],"allowHash":false,"depth":-1,"sitemap":null,"auth":null,"_authEncoded":null,"maxExtraHops":0,"maxDepth":1000000}]}
{"timestamp":"2025-02-25T08:26:20.761Z","logLevel":"info","context":"general","message":"Link Selectors","details":[{"selector":"a[href]","extract":"href","isAttribute":false}]}
{"timestamp":"2025-02-25T08:26:20.761Z","logLevel":"info","context":"general","message":"Behavior Options","details":{"message":"{\"autoplay\":true,\"autofetch\":true,\"autoscroll\":true,\"siteSpecific\":true,\"log\":\"__bx_log\",\"startEarly\":true,\"clickSelector\":\"a\"}"}}
{"timestamp":"2025-02-25T08:26:21.163Z","logLevel":"info","context":"worker","message":"Creating 1 workers","details":{}}
{"timestamp":"2025-02-25T08:26:21.164Z","logLevel":"info","context":"worker","message":"Worker starting","details":{"workerid":0}}
{"timestamp":"2025-02-25T08:26:21.245Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/index.php/Accounting"}}
{"timestamp":"2025-02-25T08:26:21.246Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":0,"total":1,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:26:21.165Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/index.php\\/Accounting\",\"added\":\"2025-02-25T08:26:20.848Z\",\"depth\":0}"]}}
{"timestamp":"2025-02-25T08:26:22.425Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://www.survivorlibrary.com/index.php/Accounting","workerid":0}}
{"timestamp":"2025-02-25T08:26:23.428Z","logLevel":"warn","context":"recorder","message":"Skipping URL from unknown frame","details":{"url":"https://www.survivorlibrary.com/index.php/Accounting","frameId":"1BCD0EDE107814DF87457333EB010696"}}
{"timestamp":"2025-02-25T08:26:27.635Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://www.survivorlibrary.com/index.php/Accounting"],"page":"https://www.survivorlibrary.com/index.php/Accounting","workerid":0}}
{"timestamp":"2025-02-25T08:26:27.635Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://www.survivorlibrary.com/index.php/Accounting","page":"https://www.survivorlibrary.com/index.php/Accounting","workerid":0}}
{"timestamp":"2025-02-25T08:26:28.157Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"Skipping autoscroll, page seems to not be responsive to scrolling events","page":"https://www.survivorlibrary.com/index.php/Accounting","workerid":0}}
{"timestamp":"2025-02-25T08:26:28.158Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"done!","page":"https://www.survivorlibrary.com/index.php/Accounting","workerid":0}}
{"timestamp":"2025-02-25T08:26:28.159Z","logLevel":"info","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://www.survivorlibrary.com/index.php/Accounting","page":"https://www.survivorlibrary.com/index.php/Accounting","workerid":0}}
{"timestamp":"2025-02-25T08:26:28.159Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://www.survivorlibrary.com/index.php/Accounting","workerid":0}}
{"timestamp":"2025-02-25T08:26:30.662Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://www.survivorlibrary.com/index.php/Accounting","workerid":0}}
{"timestamp":"2025-02-25T08:26:30.673Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/"}}
{"timestamp":"2025-02-25T08:26:30.674Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":1,"total":84,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:26:30.673Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/\",\"added\":\"2025-02-25T08:26:27.573Z\",\"depth\":1}"]}}
{"timestamp":"2025-02-25T08:26:31.596Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://www.survivorlibrary.com/","workerid":0}}
{"timestamp":"2025-02-25T08:26:35.680Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://www.survivorlibrary.com/"],"page":"https://www.survivorlibrary.com/","workerid":0}}
{"timestamp":"2025-02-25T08:26:35.680Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://www.survivorlibrary.com/","page":"https://www.survivorlibrary.com/","workerid":0}}
{"timestamp":"2025-02-25T08:26:36.199Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"Skipping autoscroll, page seems to not be responsive to scrolling events","page":"https://www.survivorlibrary.com/","workerid":0}}
{"timestamp":"2025-02-25T08:26:36.200Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"done!","page":"https://www.survivorlibrary.com/","workerid":0}}
{"timestamp":"2025-02-25T08:26:36.200Z","logLevel":"info","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://www.survivorlibrary.com/","page":"https://www.survivorlibrary.com/","workerid":0}}
{"timestamp":"2025-02-25T08:26:36.200Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://www.survivorlibrary.com/","workerid":0}}
{"timestamp":"2025-02-25T08:26:38.702Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://www.survivorlibrary.com/","workerid":0}}
{"timestamp":"2025-02-25T08:26:38.712Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/index.php/main-library-index/"}}
{"timestamp":"2025-02-25T08:26:38.713Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":2,"total":94,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:26:38.712Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/index.php\\/main-library-index\\/\",\"added\":\"2025-02-25T08:26:27.574Z\",\"depth\":1}"]}}
{"timestamp":"2025-02-25T08:26:39.769Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://www.survivorlibrary.com/index.php/main-library-index/","workerid":0}}
{"timestamp":"2025-02-25T08:26:43.756Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://www.survivorlibrary.com/index.php/main-library-index/"],"page":"https://www.survivorlibrary.com/index.php/main-library-index/","workerid":0}}
{"timestamp":"2025-02-25T08:26:43.756Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://www.survivorlibrary.com/index.php/main-library-index/","page":"https://www.survivorlibrary.com/index.php/main-library-index/","workerid":0}}
{"timestamp":"2025-02-25T08:26:44.276Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"Skipping autoscroll, page seems to not be responsive to scrolling events","page":"https://www.survivorlibrary.com/index.php/main-library-index/","workerid":0}}
{"timestamp":"2025-02-25T08:26:44.277Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"done!","page":"https://www.survivorlibrary.com/index.php/main-library-index/","workerid":0}}
{"timestamp":"2025-02-25T08:26:44.278Z","logLevel":"info","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://www.survivorlibrary.com/index.php/main-library-index/","page":"https://www.survivorlibrary.com/index.php/main-library-index/","workerid":0}}
{"timestamp":"2025-02-25T08:26:44.278Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://www.survivorlibrary.com/index.php/main-library-index/","workerid":0}}
{"timestamp":"2025-02-25T08:26:46.780Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://www.survivorlibrary.com/index.php/main-library-index/","workerid":0}}
{"timestamp":"2025-02-25T08:26:46.791Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/index.php/store/"}}
{"timestamp":"2025-02-25T08:26:46.791Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":3,"total":260,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:26:46.790Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/index.php\\/store\\/\",\"added\":\"2025-02-25T08:26:27.574Z\",\"depth\":1}"]}}
{"timestamp":"2025-02-25T08:26:48.516Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://www.survivorlibrary.com/index.php/store/","workerid":0}}
{"timestamp":"2025-02-25T08:26:52.599Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://www.survivorlibrary.com/index.php/store/"],"page":"https://www.survivorlibrary.com/index.php/store/","workerid":0}}
{"timestamp":"2025-02-25T08:26:52.599Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://www.survivorlibrary.com/index.php/store/","page":"https://www.survivorlibrary.com/index.php/store/","workerid":0}}
{"timestamp":"2025-02-25T08:26:53.114Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"Skipping autoscroll, page seems to not be responsive to scrolling events","page":"https://www.survivorlibrary.com/index.php/store/","workerid":0}}
{"timestamp":"2025-02-25T08:26:53.115Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"done!","page":"https://www.survivorlibrary.com/index.php/store/","workerid":0}}
{"timestamp":"2025-02-25T08:26:53.115Z","logLevel":"info","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://www.survivorlibrary.com/index.php/store/","page":"https://www.survivorlibrary.com/index.php/store/","workerid":0}}
{"timestamp":"2025-02-25T08:26:53.115Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://www.survivorlibrary.com/index.php/store/","workerid":0}}
{"timestamp":"2025-02-25T08:26:55.619Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://www.survivorlibrary.com/index.php/store/","workerid":0}}
{"timestamp":"2025-02-25T08:26:55.629Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/index.php/about-us/"}}
{"timestamp":"2025-02-25T08:26:55.629Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":4,"total":273,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:26:55.628Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/index.php\\/about-us\\/\",\"added\":\"2025-02-25T08:26:27.575Z\",\"depth\":1}"]}}
{"timestamp":"2025-02-25T08:26:56.656Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://www.survivorlibrary.com/index.php/about-us/","workerid":0}}
{"timestamp":"2025-02-25T08:27:00.498Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://www.survivorlibrary.com/index.php/about-us/"],"page":"https://www.survivorlibrary.com/index.php/about-us/","workerid":0}}
{"timestamp":"2025-02-25T08:27:00.498Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://www.survivorlibrary.com/index.php/about-us/","page":"https://www.survivorlibrary.com/index.php/about-us/","workerid":0}}
{"timestamp":"2025-02-25T08:27:01.012Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"Skipping autoscroll, page seems to not be responsive to scrolling events","page":"https://www.survivorlibrary.com/index.php/about-us/","workerid":0}}
{"timestamp":"2025-02-25T08:27:01.012Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"done!","page":"https://www.survivorlibrary.com/index.php/about-us/","workerid":0}}
{"timestamp":"2025-02-25T08:27:01.013Z","logLevel":"info","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://www.survivorlibrary.com/index.php/about-us/","page":"https://www.survivorlibrary.com/index.php/about-us/","workerid":0}}
{"timestamp":"2025-02-25T08:27:01.013Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://www.survivorlibrary.com/index.php/about-us/","workerid":0}}
{"timestamp":"2025-02-25T08:27:03.516Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://www.survivorlibrary.com/index.php/about-us/","workerid":0}}
{"timestamp":"2025-02-25T08:27:03.648Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/index.php/library-faqs/"}}
{"timestamp":"2025-02-25T08:27:03.648Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":5,"total":274,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:27:03.524Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/index.php\\/library-faqs\\/\",\"added\":\"2025-02-25T08:26:27.575Z\",\"depth\":1}"]}}
{"timestamp":"2025-02-25T08:27:04.678Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://www.survivorlibrary.com/index.php/library-faqs/","workerid":0}}
{"timestamp":"2025-02-25T08:27:05.361Z","logLevel":"warn","context":"recorder","message":"Skipping URL from unknown frame","details":{"url":"https://www.survivorlibrary.com/index.php/library-faqs/","frameId":"23694ECCB488D794F9C10D5E3CE48705"}}
{"timestamp":"2025-02-25T08:27:05.387Z","logLevel":"warn","context":"recorder","message":"Skipping URL from unknown frame","details":{"url":"https://www.survivorlibrary.com/wp-content/uploads/2024/06/Slider1-7.jpg","frameId":"23694ECCB488D794F9C10D5E3CE48705"}}
{"timestamp":"2025-02-25T08:27:05.388Z","logLevel":"warn","context":"recorder","message":"Skipping URL from unknown frame","details":{"url":"https://www.survivorlibrary.com/wp-content/uploads/2024/06/Slide-2.jpg","frameId":"23694ECCB488D794F9C10D5E3CE48705"}}
{"timestamp":"2025-02-25T08:27:08.601Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://www.survivorlibrary.com/index.php/library-faqs/"],"page":"https://www.survivorlibrary.com/index.php/library-faqs/","workerid":0}}
{"timestamp":"2025-02-25T08:27:08.601Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://www.survivorlibrary.com/index.php/library-faqs/","page":"https://www.survivorlibrary.com/index.php/library-faqs/","workerid":0}}
{"timestamp":"2025-02-25T08:27:09.116Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"Skipping autoscroll, page seems to not be responsive to scrolling events","page":"https://www.survivorlibrary.com/index.php/library-faqs/","workerid":0}}
{"timestamp":"2025-02-25T08:27:09.116Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"done!","page":"https://www.survivorlibrary.com/index.php/library-faqs/","workerid":0}}
{"timestamp":"2025-02-25T08:27:09.117Z","logLevel":"info","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://www.survivorlibrary.com/index.php/library-faqs/","page":"https://www.survivorlibrary.com/index.php/library-faqs/","workerid":0}}
{"timestamp":"2025-02-25T08:27:09.117Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://www.survivorlibrary.com/index.php/library-faqs/","workerid":0}}
{"timestamp":"2025-02-25T08:27:11.620Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://www.survivorlibrary.com/index.php/library-faqs/","workerid":0}}
{"timestamp":"2025-02-25T08:27:11.630Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/index.php/contact-me/"}}
{"timestamp":"2025-02-25T08:27:11.631Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":6,"total":283,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:27:11.630Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/index.php\\/contact-me\\/\",\"added\":\"2025-02-25T08:26:27.576Z\",\"depth\":1}"]}}
{"timestamp":"2025-02-25T08:27:12.650Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://www.survivorlibrary.com/index.php/contact-me/","workerid":0}}
{"timestamp":"2025-02-25T08:27:16.470Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://www.survivorlibrary.com/index.php/contact-me/"],"page":"https://www.survivorlibrary.com/index.php/contact-me/","workerid":0}}
{"timestamp":"2025-02-25T08:27:16.470Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://www.survivorlibrary.com/index.php/contact-me/","page":"https://www.survivorlibrary.com/index.php/contact-me/","workerid":0}}
{"timestamp":"2025-02-25T08:27:16.982Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"Skipping autoscroll, page seems to not be responsive to scrolling events","page":"https://www.survivorlibrary.com/index.php/contact-me/","workerid":0}}
{"timestamp":"2025-02-25T08:27:16.982Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"done!","page":"https://www.survivorlibrary.com/index.php/contact-me/","workerid":0}}
{"timestamp":"2025-02-25T08:27:16.983Z","logLevel":"info","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://www.survivorlibrary.com/index.php/contact-me/","page":"https://www.survivorlibrary.com/index.php/contact-me/","workerid":0}}
{"timestamp":"2025-02-25T08:27:16.983Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://www.survivorlibrary.com/index.php/contact-me/","workerid":0}}
{"timestamp":"2025-02-25T08:27:19.486Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://www.survivorlibrary.com/index.php/contact-me/","workerid":0}}
{"timestamp":"2025-02-25T08:27:19.494Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/library/20th_century_bookkeeping_and_accounting_1922.pdf"}}
{"timestamp":"2025-02-25T08:27:19.495Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":7,"total":283,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:27:19.494Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/library\\/20th_century_bookkeeping_and_accounting_1922.pdf\",\"added\":\"2025-02-25T08:26:27.576Z\",\"depth\":1}"]}}
{"timestamp":"2025-02-25T08:27:19.946Z","logLevel":"info","context":"fetch","message":"Directly fetching page URL without browser","details":{"url":"https://www.survivorlibrary.com/library/20th_century_bookkeeping_and_accounting_1922.pdf","page":"https://www.survivorlibrary.com/library/20th_century_bookkeeping_and_accounting_1922.pdf","workerid":0}}
{"timestamp":"2025-02-25T08:27:23.502Z","logLevel":"info","context":"fetch","message":"Direct fetch successful","details":{"url":"https://www.survivorlibrary.com/library/20th_century_bookkeeping_and_accounting_1922.pdf","mime":"application/pdf","page":"https://www.survivorlibrary.com/library/20th_century_bookkeeping_and_accounting_1922.pdf","workerid":0}}
{"timestamp":"2025-02-25T08:27:23.502Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":2,"page":"https://www.survivorlibrary.com/library/20th_century_bookkeeping_and_accounting_1922.pdf","workerid":0}}
{"timestamp":"2025-02-25T08:27:23.517Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/library/accounts_in_theory_and_practice_1920.pdf"}}
{"timestamp":"2025-02-25T08:27:23.518Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":8,"total":283,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:27:23.517Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/library\\/accounts_in_theory_and_practice_1920.pdf\",\"added\":\"2025-02-25T08:26:27.577Z\",\"depth\":1}"]}}
{"timestamp":"2025-02-25T08:27:23.662Z","logLevel":"info","context":"fetch","message":"Directly fetching page URL without browser","details":{"url":"https://www.survivorlibrary.com/library/accounts_in_theory_and_practice_1920.pdf","page":"https://www.survivorlibrary.com/library/accounts_in_theory_and_practice_1920.pdf","workerid":0}}
{"timestamp":"2025-02-25T08:27:24.942Z","logLevel":"info","context":"fetch","message":"Direct fetch successful","details":{"url":"https://www.survivorlibrary.com/library/accounts_in_theory_and_practice_1920.pdf","mime":"application/pdf","page":"https://www.survivorlibrary.com/library/accounts_in_theory_and_practice_1920.pdf","workerid":0}}
{"timestamp":"2025-02-25T08:27:24.943Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":2,"page":"https://www.survivorlibrary.com/library/accounts_in_theory_and_practice_1920.pdf","workerid":0}}
{"timestamp":"2025-02-25T08:27:24.959Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/library/bookkeeping_and_accountancy_1911.pdf"}}
{"timestamp":"2025-02-25T08:27:24.962Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":9,"total":283,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:27:24.958Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/library\\/bookkeeping_and_accountancy_1911.pdf\",\"added\":\"2025-02-25T08:26:27.577Z\",\"depth\":1}"]}}
{"timestamp":"2025-02-25T08:27:25.110Z","logLevel":"info","context":"fetch","message":"Directly fetching page URL without browser","details":{"url":"https://www.survivorlibrary.com/library/bookkeeping_and_accountancy_1911.pdf","page":"https://www.survivorlibrary.com/library/bookkeeping_and_accountancy_1911.pdf","workerid":0}}
{"timestamp":"2025-02-25T08:27:26.668Z","logLevel":"info","context":"fetch","message":"Direct fetch successful","details":{"url":"https://www.survivorlibrary.com/library/bookkeeping_and_accountancy_1911.pdf","mime":"application/pdf","page":"https://www.survivorlibrary.com/library/bookkeeping_and_accountancy_1911.pdf","workerid":0}}
{"timestamp":"2025-02-25T08:27:26.668Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":2,"page":"https://www.survivorlibrary.com/library/bookkeeping_and_accountancy_1911.pdf","workerid":0}}
{"timestamp":"2025-02-25T08:27:26.684Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/library/bookkeeping-the_principles_and_practice_of_double_entry_1904.pdf"}}
{"timestamp":"2025-02-25T08:27:26.685Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":10,"total":283,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:27:26.683Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/library\\/bookkeeping-the_principles_and_practice_of_double_entry_1904.pdf\",\"added\":\"2025-02-25T08:26:27.578Z\",\"depth\":1}"]}}
{"timestamp":"2025-02-25T08:27:26.844Z","logLevel":"info","context":"fetch","message":"Directly fetching page URL without browser","details":{"url":"https://www.survivorlibrary.com/library/bookkeeping-the_principles_and_practice_of_double_entry_1904.pdf","page":"https://www.survivorlibrary.com/library/bookkeeping-the_principles_and_practice_of_double_entry_1904.pdf","workerid":0}}
{"timestamp":"2025-02-25T08:27:27.909Z","logLevel":"info","context":"fetch","message":"Direct fetch successful","details":{"url":"https://www.survivorlibrary.com/library/bookkeeping-the_principles_and_practice_of_double_entry_1904.pdf","mime":"application/pdf","page":"https://www.survivorlibrary.com/library/bookkeeping-the_principles_and_practice_of_double_entry_1904.pdf","workerid":0}}
{"timestamp":"2025-02-25T08:27:27.910Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":2,"page":"https://www.survivorlibrary.com/library/bookkeeping-the_principles_and_practice_of_double_entry_1904.pdf","workerid":0}}
{"timestamp":"2025-02-25T08:27:27.923Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/library/bookkeeping_complete_course_1912.pdf"}}
{"timestamp":"2025-02-25T08:27:27.924Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":11,"total":283,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:27:27.922Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/library\\/bookkeeping_complete_course_1912.pdf\",\"added\":\"2025-02-25T08:26:27.578Z\",\"depth\":1}"]}}
{"timestamp":"2025-02-25T08:27:28.068Z","logLevel":"info","context":"fetch","message":"Directly fetching page URL without browser","details":{"url":"https://www.survivorlibrary.com/library/bookkeeping_complete_course_1912.pdf","page":"https://www.survivorlibrary.com/library/bookkeeping_complete_course_1912.pdf","workerid":0}}
{"timestamp":"2025-02-25T08:27:30.718Z","logLevel":"info","context":"fetch","message":"Direct fetch successful","details":{"url":"https://www.survivorlibrary.com/library/bookkeeping_complete_course_1912.pdf","mime":"application/pdf","page":"https://www.survivorlibrary.com/library/bookkeeping_complete_course_1912.pdf","workerid":0}}
{"timestamp":"2025-02-25T08:27:30.719Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":2,"page":"https://www.survivorlibrary.com/library/bookkeeping_complete_course_1912.pdf","workerid":0}}
{"timestamp":"2025-02-25T08:27:30.736Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/library/business_accounting_vol_1_1920.pdf"}}
{"timestamp":"2025-02-25T08:27:30.741Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":12,"total":283,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:27:30.735Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/library\\/business_accounting_vol_1_1920.pdf\",\"added\":\"2025-02-25T08:26:27.579Z\",\"depth\":1}"]}}
{"timestamp":"2025-02-25T08:27:30.897Z","logLevel":"info","context":"fetch","message":"Directly fetching page URL without browser","details":{"url":"https://www.survivorlibrary.com/library/business_accounting_vol_1_1920.pdf","page":"https://www.survivorlibrary.com/library/business_accounting_vol_1_1920.pdf","workerid":0}}
{"timestamp":"2025-02-25T08:27:32.542Z","logLevel":"info","context":"fetch","message":"Direct fetch successful","details":{"url":"https://www.survivorlibrary.com/library/business_accounting_vol_1_1920.pdf","mime":"application/pdf","page":"https://www.survivorlibrary.com/library/business_accounting_vol_1_1920.pdf","workerid":0}}
{"timestamp":"2025-02-25T08:27:32.543Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":2,"page":"https://www.survivorlibrary.com/library/business_accounting_vol_1_1920.pdf","workerid":0}}
{"timestamp":"2025-02-25T08:27:32.563Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/library/business_accounting_vol_2_1920.pdf"}}
{"timestamp":"2025-02-25T08:27:32.568Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":13,"total":283,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:27:32.562Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/library\\/business_accounting_vol_2_1920.pdf\",\"added\":\"2025-02-25T08:26:27.579Z\",\"depth\":1}"]}}
{"timestamp":"2025-02-25T08:27:32.717Z","logLevel":"info","context":"fetch","message":"Directly fetching page URL without browser","details":{"url":"https://www.survivorlibrary.com/library/business_accounting_vol_2_1920.pdf","page":"https://www.survivorlibrary.com/library/business_accounting_vol_2_1920.pdf","workerid":0}}
{"timestamp":"2025-02-25T08:27:34.468Z","logLevel":"info","context":"fetch","message":"Direct fetch successful","details":{"url":"https://www.survivorlibrary.com/library/business_accounting_vol_2_1920.pdf","mime":"application/pdf","page":"https://www.survivorlibrary.com/library/business_accounting_vol_2_1920.pdf","workerid":0}}
{"timestamp":"2025-02-25T08:27:34.468Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":2,"page":"https://www.survivorlibrary.com/library/business_accounting_vol_2_1920.pdf","workerid":0}}
{"timestamp":"2025-02-25T08:27:34.472Z","logLevel":"info","context":"general","message":"Size threshold reached 101863539 >= 100000000, stopping","details":{}}
{"timestamp":"2025-02-25T08:27:34.481Z","logLevel":"info","context":"general","message":"Crawler interrupted, gracefully finishing current pages","details":{}}
{"timestamp":"2025-02-25T08:27:34.481Z","logLevel":"info","context":"worker","message":"Worker done, all tasks complete","details":{"workerid":0}}
{"timestamp":"2025-02-25T08:27:35.343Z","logLevel":"info","context":"general","message":"Saving crawl state to: /output/collections/crawl-20250225082620743/crawls/crawl-20250225082735-5ca503fe02f8.yaml","details":{}}
{"timestamp":"2025-02-25T08:27:35.345Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":14,"total":283,"pending":0,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":[]}}
{"timestamp":"2025-02-25T08:27:35.345Z","logLevel":"info","context":"general","message":"Crawling done","details":{}}
{"timestamp":"2025-02-25T08:27:35.346Z","logLevel":"info","context":"general","message":"Exiting, Crawl status: interrupted","details":{}}
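To pull the relevant numbers out of a log like the one above, a short script can scan the JSON lines for the size-limit message and the last `crawlStatus` entry. This is a minimal sketch. It assumes the crawler's stdout was saved to a file with one JSON object per line, using the `context`, `message` and `details` keys seen above. The function name `summarize_log` is hypothetical and not part of Browsertrix Crawler.

```python
import json
import re
from typing import Iterable

# Matches e.g. "Size threshold reached 101863539 >= 100000000, stopping"
SIZE_RE = re.compile(r"Size threshold reached (\d+) >= (\d+)")


def summarize_log(lines: Iterable[str]) -> dict:
    """Scan JSON-lines crawler output for the size-limit message and the final stats."""
    summary: dict = {"size_bytes": None, "limit_bytes": None, "last_stats": None}
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON output
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines
        match = SIZE_RE.search(entry.get("message", ""))
        if match:
            summary["size_bytes"] = int(match.group(1))
            summary["limit_bytes"] = int(match.group(2))
        if entry.get("context") == "crawlStatus":
            summary["last_stats"] = entry.get("details")  # keep the latest one
    return summary


if __name__ == "__main__":
    # Two lines copied from the log above
    sample = [
        '{"timestamp":"2025-02-25T08:27:34.472Z","logLevel":"info","context":"general","message":"Size threshold reached 101863539 >= 100000000, stopping","details":{}}',
        '{"timestamp":"2025-02-25T08:27:35.345Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":14,"total":283,"pending":0,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":[]}}',
    ]
    s = summarize_log(sample)
    overshoot = s["size_bytes"] - s["limit_bytes"]
    print(s["last_stats"]["crawled"], overshoot)  # prints: 14 1863539
```

For a saved log, call `summarize_log(open("crawl.log"))` (file name hypothetical). On this run, the reported size is only 1,863,539 bytes (about 1.9%) over the limit after 14 pages.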
One of the last log messages is, as expected: Size threshold reached 101863539 >= 100000000, stopping
But the archive contents tell a different story:
> ls -lah ./output/crawl-20250109130150478/archive
-rw-r--r-- 1 me me 957M Jan 9 13:03 rec-f3a6661b2366-20250109130155511-0.warc.gz
-rw-r--r-- 1 me me 202M Jan 9 13:08 rec-f3a6661b2366-20250109130324090-0.warc.gz
The website contains many large PDFs, so it is expected that the crawler does not stop at exactly 100 MB. But the two WARC files total about 1.2 GB (957M + 202M), far above the 100 MB limit.
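To compare the on-disk archive size against the limit directly, the sketch below sums the sizes of the WARC files in an archive directory. As a rough worked check from the listing: `ls -h` reports sizes in powers of 1024, so 957 MiB + 202 MiB ≈ 1,159 MiB ≈ 1.2 × 10^9 bytes. That is roughly 12 times the 100,000,000-byte limit. The script name, its arguments, and the assumption that only `*.warc.gz` files count are illustrative choices, not crawler behavior.

```python
import sys
from pathlib import Path


def archive_size(archive_dir: str) -> int:
    """Total size in bytes of all *.warc.gz files directly inside archive_dir."""
    return sum(p.stat().st_size for p in Path(archive_dir).glob("*.warc.gz"))


def report(archive_dir: str, size_limit: int) -> str:
    """Human-readable comparison of on-disk WARC bytes against a byte limit."""
    total = archive_size(archive_dir)
    ratio = total / size_limit
    return f"{total} bytes on disk vs limit {size_limit} ({ratio:.1f}x)"


if __name__ == "__main__":
    # Hypothetical usage:
    #   python check_archive_size.py ./output/collections/<crawl-id>/archive 100000000
    if len(sys.argv) == 3:
        print(report(sys.argv[1], int(sys.argv[2])))
    else:
        print("usage: check_archive_size.py ARCHIVE_DIR SIZE_LIMIT_BYTES")
```

Running this right after the "Size threshold reached" message would show whether the crawler's own size figure matches what is actually on disk.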
I'm starting an initial investigation and will keep you updated.