Description
Hi @Jontpan, we started using Intric a few weeks ago at Ekonomistyrningsverket. I came back from parental leave last week and I now have an account.
Where would be a good place to send you feature requests? Can I do it by creating issues here?
I'll start with a first one:
I recently created my first assistant using a website as a source, and the crawling worked great. That being said, I noticed that the text extracted from webpages contains a lot of useless content: headers, footers, menus and sidebars.
I would like to suggest using a package such as Mozilla's Readability (Python version) to extract the article or the main text of the page, when it exists.
I had a look at the code and I think it could be used here instead of html2text. Nothing wrong with Aaron's 13-year-old package, but website menus aren't useful content here.
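
To make the idea concrete, here is a minimal sketch of the kind of filtering I mean. It is not Readability's algorithm and not Intric's code: it only uses Python's standard `html.parser` to drop text nested inside a few tags that I assume mark page chrome. The names `SKIPPED_TAGS`, `MainTextExtractor` and `extract_main_text` are hypothetical, made up for this example.

```python
from html.parser import HTMLParser

# Assumption for this sketch: content inside these tags is page chrome,
# not article text. Real pages are messier, which is why a dedicated
# library such as Readability is the actual suggestion.
SKIPPED_TAGS = {"header", "footer", "nav", "aside", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collect text that is not nested inside any of SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside at least one skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def extract_main_text(html: str) -> str:
    """Return the page text with header, footer, menu and sidebar content removed."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = """
<html><body>
<header><a href="/">Home</a></header>
<nav><ul><li>Menu item</li></ul></nav>
<article><h1>Title</h1><p>Main text.</p></article>
<footer>Contact us</footer>
</body></html>
"""
print(extract_main_text(sample))
```

On the sample page this keeps only the article's title and paragraph. A tag-based filter like this breaks down on sites that build their menus out of plain `div` elements, which is the case a content-scoring approach like Readability is meant to handle.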