--sizeLimit behavior seems inappropriate #776

@benoit74

Description

Command to repro:

docker run -v $PWD/output:/output --name crawlme --rm  webrecorder/browsertrix-crawler:1.5.4 crawl --url "https://www.survivorlibrary.com/index.php/Accounting" --scopeType host --cwd /output --sizeLimit 100000000

Expected behavior: the crawler should stop once the archive size has reached about 100MB

Logs

{"timestamp":"2025-02-25T08:26:20.760Z","logLevel":"info","context":"general","message":"Browsertrix-Crawler 1.5.4 (with warcio.js 2.4.3)","details":{}} {"timestamp":"2025-02-25T08:26:20.761Z","logLevel":"info","context":"general","message":"Seeds","details":[{"url":"https://www.survivorlibrary.com/index.php/Accounting","scopeType":"host","include":["/^https?:\\/\\/www\\.survivorlibrary\\.com\\//"],"exclude":[],"allowHash":false,"depth":-1,"sitemap":null,"auth":null,"_authEncoded":null,"maxExtraHops":0,"maxDepth":1000000}]} {"timestamp":"2025-02-25T08:26:20.761Z","logLevel":"info","context":"general","message":"Link Selectors","details":[{"selector":"a[href]","extract":"href","isAttribute":false}]} {"timestamp":"2025-02-25T08:26:20.761Z","logLevel":"info","context":"general","message":"Behavior Options","details":{"message":"{\"autoplay\":true,\"autofetch\":true,\"autoscroll\":true,\"siteSpecific\":true,\"log\":\"__bx_log\",\"startEarly\":true,\"clickSelector\":\"a\"}"}} {"timestamp":"2025-02-25T08:26:21.163Z","logLevel":"info","context":"worker","message":"Creating 1 workers","details":{}} {"timestamp":"2025-02-25T08:26:21.164Z","logLevel":"info","context":"worker","message":"Worker starting","details":{"workerid":0}} {"timestamp":"2025-02-25T08:26:21.245Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/index.php/Accounting"}} {"timestamp":"2025-02-25T08:26:21.246Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":0,"total":1,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:26:21.165Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/index.php\\/Accounting\",\"added\":\"2025-02-25T08:26:20.848Z\",\"depth\":0}"]}} {"timestamp":"2025-02-25T08:26:22.425Z","logLevel":"info","context":"general","message":"Awaiting page 
load","details":{"page":"https://www.survivorlibrary.com/index.php/Accounting","workerid":0}} {"timestamp":"2025-02-25T08:26:23.428Z","logLevel":"warn","context":"recorder","message":"Skipping URL from unknown frame","details":{"url":"https://www.survivorlibrary.com/index.php/Accounting","frameId":"1BCD0EDE107814DF87457333EB010696"}} {"timestamp":"2025-02-25T08:26:27.635Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://www.survivorlibrary.com/index.php/Accounting"],"page":"https://www.survivorlibrary.com/index.php/Accounting","workerid":0}} {"timestamp":"2025-02-25T08:26:27.635Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://www.survivorlibrary.com/index.php/Accounting","page":"https://www.survivorlibrary.com/index.php/Accounting","workerid":0}} {"timestamp":"2025-02-25T08:26:28.157Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"Skipping autoscroll, page seems to not be responsive to scrolling events","page":"https://www.survivorlibrary.com/index.php/Accounting","workerid":0}} {"timestamp":"2025-02-25T08:26:28.158Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"done!","page":"https://www.survivorlibrary.com/index.php/Accounting","workerid":0}} {"timestamp":"2025-02-25T08:26:28.159Z","logLevel":"info","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://www.survivorlibrary.com/index.php/Accounting","page":"https://www.survivorlibrary.com/index.php/Accounting","workerid":0}} {"timestamp":"2025-02-25T08:26:28.159Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://www.survivorlibrary.com/index.php/Accounting","workerid":0}} {"timestamp":"2025-02-25T08:26:30.662Z","logLevel":"info","context":"pageStatus","message":"Page 
Finished","details":{"loadState":4,"page":"https://www.survivorlibrary.com/index.php/Accounting","workerid":0}} {"timestamp":"2025-02-25T08:26:30.673Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/"}} {"timestamp":"2025-02-25T08:26:30.674Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":1,"total":84,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:26:30.673Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/\",\"added\":\"2025-02-25T08:26:27.573Z\",\"depth\":1}"]}} {"timestamp":"2025-02-25T08:26:31.596Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://www.survivorlibrary.com/","workerid":0}} {"timestamp":"2025-02-25T08:26:35.680Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://www.survivorlibrary.com/"],"page":"https://www.survivorlibrary.com/","workerid":0}} {"timestamp":"2025-02-25T08:26:35.680Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://www.survivorlibrary.com/","page":"https://www.survivorlibrary.com/","workerid":0}} {"timestamp":"2025-02-25T08:26:36.199Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"Skipping autoscroll, page seems to not be responsive to scrolling events","page":"https://www.survivorlibrary.com/","workerid":0}} {"timestamp":"2025-02-25T08:26:36.200Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"done!","page":"https://www.survivorlibrary.com/","workerid":0}} {"timestamp":"2025-02-25T08:26:36.200Z","logLevel":"info","context":"behavior","message":"Run Script 
Finished","details":{"frameUrl":"https://www.survivorlibrary.com/","page":"https://www.survivorlibrary.com/","workerid":0}} {"timestamp":"2025-02-25T08:26:36.200Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://www.survivorlibrary.com/","workerid":0}} {"timestamp":"2025-02-25T08:26:38.702Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://www.survivorlibrary.com/","workerid":0}} {"timestamp":"2025-02-25T08:26:38.712Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/index.php/main-library-index/"}} {"timestamp":"2025-02-25T08:26:38.713Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":2,"total":94,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:26:38.712Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/index.php\\/main-library-index\\/\",\"added\":\"2025-02-25T08:26:27.574Z\",\"depth\":1}"]}} {"timestamp":"2025-02-25T08:26:39.769Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://www.survivorlibrary.com/index.php/main-library-index/","workerid":0}} {"timestamp":"2025-02-25T08:26:43.756Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://www.survivorlibrary.com/index.php/main-library-index/"],"page":"https://www.survivorlibrary.com/index.php/main-library-index/","workerid":0}} {"timestamp":"2025-02-25T08:26:43.756Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://www.survivorlibrary.com/index.php/main-library-index/","page":"https://www.survivorlibrary.com/index.php/main-library-index/","workerid":0}} 
{"timestamp":"2025-02-25T08:26:44.276Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"Skipping autoscroll, page seems to not be responsive to scrolling events","page":"https://www.survivorlibrary.com/index.php/main-library-index/","workerid":0}} {"timestamp":"2025-02-25T08:26:44.277Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"done!","page":"https://www.survivorlibrary.com/index.php/main-library-index/","workerid":0}} {"timestamp":"2025-02-25T08:26:44.278Z","logLevel":"info","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://www.survivorlibrary.com/index.php/main-library-index/","page":"https://www.survivorlibrary.com/index.php/main-library-index/","workerid":0}} {"timestamp":"2025-02-25T08:26:44.278Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://www.survivorlibrary.com/index.php/main-library-index/","workerid":0}} {"timestamp":"2025-02-25T08:26:46.780Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://www.survivorlibrary.com/index.php/main-library-index/","workerid":0}} {"timestamp":"2025-02-25T08:26:46.791Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/index.php/store/"}} {"timestamp":"2025-02-25T08:26:46.791Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":3,"total":260,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:26:46.790Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/index.php\\/store\\/\",\"added\":\"2025-02-25T08:26:27.574Z\",\"depth\":1}"]}} {"timestamp":"2025-02-25T08:26:48.516Z","logLevel":"info","context":"general","message":"Awaiting page 
load","details":{"page":"https://www.survivorlibrary.com/index.php/store/","workerid":0}} {"timestamp":"2025-02-25T08:26:52.599Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://www.survivorlibrary.com/index.php/store/"],"page":"https://www.survivorlibrary.com/index.php/store/","workerid":0}} {"timestamp":"2025-02-25T08:26:52.599Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://www.survivorlibrary.com/index.php/store/","page":"https://www.survivorlibrary.com/index.php/store/","workerid":0}} {"timestamp":"2025-02-25T08:26:53.114Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"Skipping autoscroll, page seems to not be responsive to scrolling events","page":"https://www.survivorlibrary.com/index.php/store/","workerid":0}} {"timestamp":"2025-02-25T08:26:53.115Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"done!","page":"https://www.survivorlibrary.com/index.php/store/","workerid":0}} {"timestamp":"2025-02-25T08:26:53.115Z","logLevel":"info","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://www.survivorlibrary.com/index.php/store/","page":"https://www.survivorlibrary.com/index.php/store/","workerid":0}} {"timestamp":"2025-02-25T08:26:53.115Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://www.survivorlibrary.com/index.php/store/","workerid":0}} {"timestamp":"2025-02-25T08:26:55.619Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://www.survivorlibrary.com/index.php/store/","workerid":0}} {"timestamp":"2025-02-25T08:26:55.629Z","logLevel":"info","context":"worker","message":"Starting 
page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/index.php/about-us/"}} {"timestamp":"2025-02-25T08:26:55.629Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":4,"total":273,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:26:55.628Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/index.php\\/about-us\\/\",\"added\":\"2025-02-25T08:26:27.575Z\",\"depth\":1}"]}} {"timestamp":"2025-02-25T08:26:56.656Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://www.survivorlibrary.com/index.php/about-us/","workerid":0}} {"timestamp":"2025-02-25T08:27:00.498Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://www.survivorlibrary.com/index.php/about-us/"],"page":"https://www.survivorlibrary.com/index.php/about-us/","workerid":0}} {"timestamp":"2025-02-25T08:27:00.498Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://www.survivorlibrary.com/index.php/about-us/","page":"https://www.survivorlibrary.com/index.php/about-us/","workerid":0}} {"timestamp":"2025-02-25T08:27:01.012Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"Skipping autoscroll, page seems to not be responsive to scrolling events","page":"https://www.survivorlibrary.com/index.php/about-us/","workerid":0}} {"timestamp":"2025-02-25T08:27:01.012Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"done!","page":"https://www.survivorlibrary.com/index.php/about-us/","workerid":0}} {"timestamp":"2025-02-25T08:27:01.013Z","logLevel":"info","context":"behavior","message":"Run Script 
Finished","details":{"frameUrl":"https://www.survivorlibrary.com/index.php/about-us/","page":"https://www.survivorlibrary.com/index.php/about-us/","workerid":0}} {"timestamp":"2025-02-25T08:27:01.013Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://www.survivorlibrary.com/index.php/about-us/","workerid":0}} {"timestamp":"2025-02-25T08:27:03.516Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://www.survivorlibrary.com/index.php/about-us/","workerid":0}} {"timestamp":"2025-02-25T08:27:03.648Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/index.php/library-faqs/"}} {"timestamp":"2025-02-25T08:27:03.648Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":5,"total":274,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:27:03.524Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/index.php\\/library-faqs\\/\",\"added\":\"2025-02-25T08:26:27.575Z\",\"depth\":1}"]}} {"timestamp":"2025-02-25T08:27:04.678Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://www.survivorlibrary.com/index.php/library-faqs/","workerid":0}} {"timestamp":"2025-02-25T08:27:05.361Z","logLevel":"warn","context":"recorder","message":"Skipping URL from unknown frame","details":{"url":"https://www.survivorlibrary.com/index.php/library-faqs/","frameId":"23694ECCB488D794F9C10D5E3CE48705"}} {"timestamp":"2025-02-25T08:27:05.387Z","logLevel":"warn","context":"recorder","message":"Skipping URL from unknown frame","details":{"url":"https://www.survivorlibrary.com/wp-content/uploads/2024/06/Slider1-7.jpg","frameId":"23694ECCB488D794F9C10D5E3CE48705"}} 
{"timestamp":"2025-02-25T08:27:05.388Z","logLevel":"warn","context":"recorder","message":"Skipping URL from unknown frame","details":{"url":"https://www.survivorlibrary.com/wp-content/uploads/2024/06/Slide-2.jpg","frameId":"23694ECCB488D794F9C10D5E3CE48705"}} {"timestamp":"2025-02-25T08:27:08.601Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://www.survivorlibrary.com/index.php/library-faqs/"],"page":"https://www.survivorlibrary.com/index.php/library-faqs/","workerid":0}} {"timestamp":"2025-02-25T08:27:08.601Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://www.survivorlibrary.com/index.php/library-faqs/","page":"https://www.survivorlibrary.com/index.php/library-faqs/","workerid":0}} {"timestamp":"2025-02-25T08:27:09.116Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"Skipping autoscroll, page seems to not be responsive to scrolling events","page":"https://www.survivorlibrary.com/index.php/library-faqs/","workerid":0}} {"timestamp":"2025-02-25T08:27:09.116Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"done!","page":"https://www.survivorlibrary.com/index.php/library-faqs/","workerid":0}} {"timestamp":"2025-02-25T08:27:09.117Z","logLevel":"info","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://www.survivorlibrary.com/index.php/library-faqs/","page":"https://www.survivorlibrary.com/index.php/library-faqs/","workerid":0}} {"timestamp":"2025-02-25T08:27:09.117Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://www.survivorlibrary.com/index.php/library-faqs/","workerid":0}} {"timestamp":"2025-02-25T08:27:11.620Z","logLevel":"info","context":"pageStatus","message":"Page 
Finished","details":{"loadState":4,"page":"https://www.survivorlibrary.com/index.php/library-faqs/","workerid":0}} {"timestamp":"2025-02-25T08:27:11.630Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/index.php/contact-me/"}} {"timestamp":"2025-02-25T08:27:11.631Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":6,"total":283,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:27:11.630Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/index.php\\/contact-me\\/\",\"added\":\"2025-02-25T08:26:27.576Z\",\"depth\":1}"]}} {"timestamp":"2025-02-25T08:27:12.650Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://www.survivorlibrary.com/index.php/contact-me/","workerid":0}} {"timestamp":"2025-02-25T08:27:16.470Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://www.survivorlibrary.com/index.php/contact-me/"],"page":"https://www.survivorlibrary.com/index.php/contact-me/","workerid":0}} {"timestamp":"2025-02-25T08:27:16.470Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://www.survivorlibrary.com/index.php/contact-me/","page":"https://www.survivorlibrary.com/index.php/contact-me/","workerid":0}} {"timestamp":"2025-02-25T08:27:16.982Z","logLevel":"info","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"Skipping autoscroll, page seems to not be responsive to scrolling events","page":"https://www.survivorlibrary.com/index.php/contact-me/","workerid":0}} {"timestamp":"2025-02-25T08:27:16.982Z","logLevel":"info","context":"behaviorScript","message":"Behavior 
log","details":{"state":{"segments":1},"msg":"done!","page":"https://www.survivorlibrary.com/index.php/contact-me/","workerid":0}} {"timestamp":"2025-02-25T08:27:16.983Z","logLevel":"info","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://www.survivorlibrary.com/index.php/contact-me/","page":"https://www.survivorlibrary.com/index.php/contact-me/","workerid":0}} {"timestamp":"2025-02-25T08:27:16.983Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://www.survivorlibrary.com/index.php/contact-me/","workerid":0}} {"timestamp":"2025-02-25T08:27:19.486Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://www.survivorlibrary.com/index.php/contact-me/","workerid":0}} {"timestamp":"2025-02-25T08:27:19.494Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/library/20th_century_bookkeeping_and_accounting_1922.pdf"}} {"timestamp":"2025-02-25T08:27:19.495Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":7,"total":283,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:27:19.494Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/library\\/20th_century_bookkeeping_and_accounting_1922.pdf\",\"added\":\"2025-02-25T08:26:27.576Z\",\"depth\":1}"]}} {"timestamp":"2025-02-25T08:27:19.946Z","logLevel":"info","context":"fetch","message":"Directly fetching page URL without browser","details":{"url":"https://www.survivorlibrary.com/library/20th_century_bookkeeping_and_accounting_1922.pdf","page":"https://www.survivorlibrary.com/library/20th_century_bookkeeping_and_accounting_1922.pdf","workerid":0}} {"timestamp":"2025-02-25T08:27:23.502Z","logLevel":"info","context":"fetch","message":"Direct fetch 
successful","details":{"url":"https://www.survivorlibrary.com/library/20th_century_bookkeeping_and_accounting_1922.pdf","mime":"application/pdf","page":"https://www.survivorlibrary.com/library/20th_century_bookkeeping_and_accounting_1922.pdf","workerid":0}} {"timestamp":"2025-02-25T08:27:23.502Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":2,"page":"https://www.survivorlibrary.com/library/20th_century_bookkeeping_and_accounting_1922.pdf","workerid":0}} {"timestamp":"2025-02-25T08:27:23.517Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/library/accounts_in_theory_and_practice_1920.pdf"}} {"timestamp":"2025-02-25T08:27:23.518Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":8,"total":283,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:27:23.517Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/library\\/accounts_in_theory_and_practice_1920.pdf\",\"added\":\"2025-02-25T08:26:27.577Z\",\"depth\":1}"]}} {"timestamp":"2025-02-25T08:27:23.662Z","logLevel":"info","context":"fetch","message":"Directly fetching page URL without browser","details":{"url":"https://www.survivorlibrary.com/library/accounts_in_theory_and_practice_1920.pdf","page":"https://www.survivorlibrary.com/library/accounts_in_theory_and_practice_1920.pdf","workerid":0}} {"timestamp":"2025-02-25T08:27:24.942Z","logLevel":"info","context":"fetch","message":"Direct fetch successful","details":{"url":"https://www.survivorlibrary.com/library/accounts_in_theory_and_practice_1920.pdf","mime":"application/pdf","page":"https://www.survivorlibrary.com/library/accounts_in_theory_and_practice_1920.pdf","workerid":0}} {"timestamp":"2025-02-25T08:27:24.943Z","logLevel":"info","context":"pageStatus","message":"Page 
Finished","details":{"loadState":2,"page":"https://www.survivorlibrary.com/library/accounts_in_theory_and_practice_1920.pdf","workerid":0}} {"timestamp":"2025-02-25T08:27:24.959Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/library/bookkeeping_and_accountancy_1911.pdf"}} {"timestamp":"2025-02-25T08:27:24.962Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":9,"total":283,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:27:24.958Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/library\\/bookkeeping_and_accountancy_1911.pdf\",\"added\":\"2025-02-25T08:26:27.577Z\",\"depth\":1}"]}} {"timestamp":"2025-02-25T08:27:25.110Z","logLevel":"info","context":"fetch","message":"Directly fetching page URL without browser","details":{"url":"https://www.survivorlibrary.com/library/bookkeeping_and_accountancy_1911.pdf","page":"https://www.survivorlibrary.com/library/bookkeeping_and_accountancy_1911.pdf","workerid":0}} {"timestamp":"2025-02-25T08:27:26.668Z","logLevel":"info","context":"fetch","message":"Direct fetch successful","details":{"url":"https://www.survivorlibrary.com/library/bookkeeping_and_accountancy_1911.pdf","mime":"application/pdf","page":"https://www.survivorlibrary.com/library/bookkeeping_and_accountancy_1911.pdf","workerid":0}} {"timestamp":"2025-02-25T08:27:26.668Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":2,"page":"https://www.survivorlibrary.com/library/bookkeeping_and_accountancy_1911.pdf","workerid":0}} {"timestamp":"2025-02-25T08:27:26.684Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/library/bookkeeping-the_principles_and_practice_of_double_entry_1904.pdf"}} 
{"timestamp":"2025-02-25T08:27:26.685Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":10,"total":283,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:27:26.683Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/library\\/bookkeeping-the_principles_and_practice_of_double_entry_1904.pdf\",\"added\":\"2025-02-25T08:26:27.578Z\",\"depth\":1}"]}} {"timestamp":"2025-02-25T08:27:26.844Z","logLevel":"info","context":"fetch","message":"Directly fetching page URL without browser","details":{"url":"https://www.survivorlibrary.com/library/bookkeeping-the_principles_and_practice_of_double_entry_1904.pdf","page":"https://www.survivorlibrary.com/library/bookkeeping-the_principles_and_practice_of_double_entry_1904.pdf","workerid":0}} {"timestamp":"2025-02-25T08:27:27.909Z","logLevel":"info","context":"fetch","message":"Direct fetch successful","details":{"url":"https://www.survivorlibrary.com/library/bookkeeping-the_principles_and_practice_of_double_entry_1904.pdf","mime":"application/pdf","page":"https://www.survivorlibrary.com/library/bookkeeping-the_principles_and_practice_of_double_entry_1904.pdf","workerid":0}} {"timestamp":"2025-02-25T08:27:27.910Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":2,"page":"https://www.survivorlibrary.com/library/bookkeeping-the_principles_and_practice_of_double_entry_1904.pdf","workerid":0}} {"timestamp":"2025-02-25T08:27:27.923Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/library/bookkeeping_complete_course_1912.pdf"}} {"timestamp":"2025-02-25T08:27:27.924Z","logLevel":"info","context":"crawlStatus","message":"Crawl 
statistics","details":{"crawled":11,"total":283,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:27:27.922Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/library\\/bookkeeping_complete_course_1912.pdf\",\"added\":\"2025-02-25T08:26:27.578Z\",\"depth\":1}"]}} {"timestamp":"2025-02-25T08:27:28.068Z","logLevel":"info","context":"fetch","message":"Directly fetching page URL without browser","details":{"url":"https://www.survivorlibrary.com/library/bookkeeping_complete_course_1912.pdf","page":"https://www.survivorlibrary.com/library/bookkeeping_complete_course_1912.pdf","workerid":0}} {"timestamp":"2025-02-25T08:27:30.718Z","logLevel":"info","context":"fetch","message":"Direct fetch successful","details":{"url":"https://www.survivorlibrary.com/library/bookkeeping_complete_course_1912.pdf","mime":"application/pdf","page":"https://www.survivorlibrary.com/library/bookkeeping_complete_course_1912.pdf","workerid":0}} {"timestamp":"2025-02-25T08:27:30.719Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":2,"page":"https://www.survivorlibrary.com/library/bookkeeping_complete_course_1912.pdf","workerid":0}} {"timestamp":"2025-02-25T08:27:30.736Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/library/business_accounting_vol_1_1920.pdf"}} {"timestamp":"2025-02-25T08:27:30.741Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":12,"total":283,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:27:30.735Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/library\\/business_accounting_vol_1_1920.pdf\",\"added\":\"2025-02-25T08:26:27.579Z\",\"depth\":1}"]}} {"timestamp":"2025-02-25T08:27:30.897Z","logLevel":"info","context":"fetch","message":"Directly fetching page 
URL without browser","details":{"url":"https://www.survivorlibrary.com/library/business_accounting_vol_1_1920.pdf","page":"https://www.survivorlibrary.com/library/business_accounting_vol_1_1920.pdf","workerid":0}} {"timestamp":"2025-02-25T08:27:32.542Z","logLevel":"info","context":"fetch","message":"Direct fetch successful","details":{"url":"https://www.survivorlibrary.com/library/business_accounting_vol_1_1920.pdf","mime":"application/pdf","page":"https://www.survivorlibrary.com/library/business_accounting_vol_1_1920.pdf","workerid":0}} {"timestamp":"2025-02-25T08:27:32.543Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":2,"page":"https://www.survivorlibrary.com/library/business_accounting_vol_1_1920.pdf","workerid":0}} {"timestamp":"2025-02-25T08:27:32.563Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.survivorlibrary.com/library/business_accounting_vol_2_1920.pdf"}} {"timestamp":"2025-02-25T08:27:32.568Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":13,"total":283,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-02-25T08:27:32.562Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.survivorlibrary.com\\/library\\/business_accounting_vol_2_1920.pdf\",\"added\":\"2025-02-25T08:26:27.579Z\",\"depth\":1}"]}} {"timestamp":"2025-02-25T08:27:32.717Z","logLevel":"info","context":"fetch","message":"Directly fetching page URL without browser","details":{"url":"https://www.survivorlibrary.com/library/business_accounting_vol_2_1920.pdf","page":"https://www.survivorlibrary.com/library/business_accounting_vol_2_1920.pdf","workerid":0}} {"timestamp":"2025-02-25T08:27:34.468Z","logLevel":"info","context":"fetch","message":"Direct fetch 
successful","details":{"url":"https://www.survivorlibrary.com/library/business_accounting_vol_2_1920.pdf","mime":"application/pdf","page":"https://www.survivorlibrary.com/library/business_accounting_vol_2_1920.pdf","workerid":0}} {"timestamp":"2025-02-25T08:27:34.468Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":2,"page":"https://www.survivorlibrary.com/library/business_accounting_vol_2_1920.pdf","workerid":0}} {"timestamp":"2025-02-25T08:27:34.472Z","logLevel":"info","context":"general","message":"Size threshold reached 101863539 >= 100000000, stopping","details":{}} {"timestamp":"2025-02-25T08:27:34.481Z","logLevel":"info","context":"general","message":"Crawler interrupted, gracefully finishing current pages","details":{}} {"timestamp":"2025-02-25T08:27:34.481Z","logLevel":"info","context":"worker","message":"Worker done, all tasks complete","details":{"workerid":0}} {"timestamp":"2025-02-25T08:27:35.343Z","logLevel":"info","context":"general","message":"Saving crawl state to: /output/collections/crawl-20250225082620743/crawls/crawl-20250225082735-5ca503fe02f8.yaml","details":{}} {"timestamp":"2025-02-25T08:27:35.345Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":14,"total":283,"pending":0,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":[]}} {"timestamp":"2025-02-25T08:27:35.345Z","logLevel":"info","context":"general","message":"Crawling done","details":{}} {"timestamp":"2025-02-25T08:27:35.346Z","logLevel":"info","context":"general","message":"Exiting, Crawl status: interrupted","details":{}}

One of the last messages is as expected: Size threshold reached 101863539 >= 100000000, stopping
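For anyone who wants to pull the reported figures out of a longer log, here is a minimal sketch. It assumes the log is stored with one JSON object per line and that the message text keeps the wording shown above. The function name `find_size_threshold` is hypothetical and is not part of Browsertrix-Crawler.

```python
import json
import re

# Matches the wording of the log message quoted in this report.
SIZE_MSG = re.compile(r"Size threshold reached (\d+) >= (\d+)")


def find_size_threshold(log_lines):
    """Return (reported_size, limit) in bytes from the first matching entry, or None."""
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # Skip anything that is not a JSON log entry.
            continue
        if not isinstance(entry, dict):
            continue
        match = SIZE_MSG.search(str(entry.get("message", "")))
        if match:
            return int(match.group(1)), int(match.group(2))
    return None


# Log entry copied from the logs in this report.
sample = [
    '{"timestamp":"2025-02-25T08:27:34.472Z","logLevel":"info",'
    '"context":"general","message":"Size threshold reached '
    '101863539 >= 100000000, stopping","details":{}}'
]

result = find_size_threshold(sample)
if result is not None:
    reported, limit = result
    print(f"reported {reported / 1e6:.1f} MB against a limit of {limit / 1e6:.1f} MB")
```

On the entry above, the crawler's own accounting is only about 2% over the limit, so the reported value alone does not explain the much larger files on disk.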

But looking at the archive content:

> ls -lah ./output/crawl-20250109130150478/archive
-rw-r--r-- 1 me me 957M Jan  9 13:03 rec-f3a6661b2366-20250109130155511-0.warc.gz
-rw-r--r-- 1 me me 202M Jan  9 13:08 rec-f3a6661b2366-20250109130324090-0.warc.gz

The website contains many big PDFs, so it is expected/normal that the crawler does not stop at exactly 100MB. But 1.2G is way above 100M.
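To put a number on the gap, the on-disk total can be compared with the limit passed to --sizeLimit. The sketch below sums the sizes of the `*.warc.gz` files in an archive directory. The helper names are hypothetical, and it is an assumption that the compressed on-disk size is a fair quantity to compare against; the crawler may be measuring something else internally.

```python
from pathlib import Path


def archive_size_bytes(archive_dir):
    """Sum the on-disk sizes of all *.warc.gz files directly inside archive_dir."""
    return sum(p.stat().st_size for p in Path(archive_dir).glob("*.warc.gz"))


def overshoot_ratio(total_bytes, size_limit):
    """Return how many times larger the archive is than the configured limit."""
    if size_limit <= 0:
        raise ValueError("size_limit must be positive")
    return total_bytes / size_limit


def report(archive_dir, size_limit):
    """Build a one-line summary comparing the archive size with the limit."""
    total = archive_size_bytes(archive_dir)
    ratio = overshoot_ratio(total, size_limit)
    return f"{total} bytes on disk, {ratio:.1f}x the limit of {size_limit} bytes"


# Example call (path taken from the listing above, limit from the repro command):
# print(report("./output/crawl-20250109130150478/archive", 100_000_000))
```

With the two files listed above (957M and 202M as rounded by ls), the total is roughly twelve times the 100000000-byte limit.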

I'm starting a first investigation and will keep you updated.
