[Feature Request]: Add callto: to special_protocols set to prevent errors during crawling #1916
Eibich
started this conversation in
Feature requests
What needs to be done?
The current implementation of special protocol handling covers mailto:, tel:, ftp:, file:, data:, and javascript:, but does not include callto:. While callto: is not part of any W3C or IETF standard (it was a proprietary scheme introduced by Skype), it is still widely used across many websites. When the crawler encounters a callto: link, it throws an error because the href is not recognized as a special protocol and is presumably processed as a relative URL instead.
crawl4ai/crawl4ai/utils.py
Lines 2444 to 2447 in 1debe5f
What problem does this solve?
Make the crawler more resilient against real-world markup.
Target users/beneficiaries
Everyone.
Current alternatives/workarounds
Add a filter that accepts only http and https links.
Then apply the filter wherever it is needed.
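The original snippets for this workaround did not survive on the page. As a rough, library-independent sketch of the idea, the check below accepts a link only when its scheme is http or https. The names ALLOWED_SCHEMES and is_crawlable are hypothetical and are not part of crawl4ai; how such a predicate is wired into the crawler's own filter mechanism is left open here.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list: only these schemes are passed on to the crawler.
ALLOWED_SCHEMES = {"http", "https"}


def is_crawlable(href: str) -> bool:
    """Return True only for links whose scheme is http or https."""
    try:
        scheme = urlsplit(href.strip()).scheme.lower()
    except ValueError:
        # urlsplit rejects some malformed URLs (e.g. a broken IPv6 host).
        return False
    return scheme in ALLOWED_SCHEMES


links = ["https://example.com/docs", "callto:user123", "mailto:a@example.com"]
crawlable = [link for link in links if is_crawlable(link)]
```

Note that this sketch also rejects scheme-less (relative) hrefs, so it is meant to run on links that have already been resolved to absolute URLs.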
Proposed approach
Add callto: to the special_protocols set. Depending on the project's scope, it might also be worth considering other non-standard but commonly encountered URI schemes such as sip:, sips:, skype:, viber:, whatsapp:, or tg: (Telegram) to make the crawler more resilient against real-world markup.
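A minimal sketch of what the extended set could look like follows. It assumes the check is a simple case-insensitive prefix match on the href; the actual code in utils.py is not reproduced here and may differ. SPECIAL_PROTOCOLS and is_special_protocol are illustrative names, not the project's identifiers.

```python
# Schemes listed in the request as already handled.
EXISTING = {"mailto:", "tel:", "ftp:", "file:", "data:", "javascript:"}

# Proposed additions: callto: plus the optional non-standard schemes.
PROPOSED = {"callto:", "sip:", "sips:", "skype:", "viber:", "whatsapp:", "tg:"}

SPECIAL_PROTOCOLS = frozenset(EXISTING | PROPOSED)


def is_special_protocol(href: str) -> bool:
    """Return True if href starts with one of the special schemes.

    Assumption: matching ignores case and surrounding whitespace.
    """
    # str.startswith accepts a tuple of prefixes.
    return href.strip().lower().startswith(tuple(SPECIAL_PROTOCOLS))
```

With a check of this shape, a link such as callto:user123 would be recognized as special and could be skipped or passed through unchanged instead of being treated as a path.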