Description
What is the current behavior?
It looks like the crawler doesn't call `preRequest` for links with an empty `href`.
If the current behavior is a bug, please provide the steps to reproduce
```js
const HCCrawler = require('headless-chrome-crawler');

const seedUrl = 'https://en.comparis.ch/gesundheit/arzt/search?searchcat=doctor';
const capUrl = 'https://en.comparis.ch/gesundheit/arzt';
// Accept empty/missing URLs and anything under capUrl
const testUrl = (url) => !url || url.startsWith(capUrl);

HCCrawler.launch({
  obeyRobotsTxt: false,
  args: ['--disable-web-security'],
  maxDepth: 2,
  // Log every candidate URL ('+' = allowed, '-' = skipped), then return the verdict
  preRequest: (options) => console.log(`${testUrl(options.url) ? '+' : '-'} [${options.url}]`) || testUrl(options.url),
  evaluatePage: () => ({ text: window.document.body.innerText }),
  onSuccess: (result) => {
    // console.log(` === ${result.options.url} === `);
  },
})
  .then((crawler) => {
    crawler.queue(seedUrl);
    crawler.onIdle()
      .then(() => crawler.close());
  });
```
What is the expected behavior?
I expect empty URLs to be tested and to appear in the console log as well, for example the pagination buttons.
What is the motivation / use case for changing the behavior?
To be able to navigate dynamic sites, where links with an empty `href` attribute are handled by the page's JavaScript.
Please tell us about your environment:
- Version: 1.8.0
- Platform / OS version: Ubuntu / 20.04
- Node.js version: v14.0.0