RAGFlow can handle large numbers of HTML files in a knowledge base, but it only parses the static HTML content. The built-in HTML parser extracts the title and main text using the readability library and does not execute JavaScript, so any content generated or modified by JavaScript in your help files will not be included in the extraction (see source).

If your help files rely on JavaScript to render important content, preprocess them with an external tool (such as a headless browser or a site scraper that can render JS and output static HTML) before importing them into RAGFlow. Static help files can be used directly.

For structured directories, you can batch upload the HTML files. Make sure they are parsed after upload, because files only become searchable in RAGFlow once parsing has run (see source).

Also, RAGFlow currently requires you to specify the file parsing method when creating the knowledge base, which can be limiting if you have mixed file types. There are suggestions to make this more flexible in the future (see source).
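The preprocessing step above could be sketched roughly as follows. This is only a minimal sketch under assumptions, not part of RAGFlow: `flatten_help_tree` and its `render` parameter are hypothetical names, and `render` stands in for whatever JS-capable tool you choose (for example a wrapper around a headless browser). The sketch mirrors the subdirectory layout so the category structure survives, and only calls the renderer for files that contain a `<script` tag.

```python
from pathlib import Path
from typing import Callable, List, Optional


def flatten_help_tree(
    src_root,
    dst_root,
    render: Optional[Callable[[Path], str]] = None,
) -> List[Path]:
    """Copy every *.html file under src_root into dst_root, keeping subdirectories.

    `render` is a hypothetical hook: given a source file, it should return the
    fully rendered static HTML (e.g. produced by a headless browser). Files
    without a <script tag, or all files when no hook is given, are copied as-is.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    written: List[Path] = []

    # sorted() materializes the file list before anything is written.
    # Note: only the ".html" suffix is matched here, not ".htm".
    for src in sorted(src_root.rglob("*.html")):
        raw = src.read_text(encoding="utf-8", errors="replace")

        # Crude heuristic: a <script tag suggests the page may need rendering.
        needs_render = render is not None and "<script" in raw.lower()
        html = render(src) if needs_render else raw

        # Keep the same relative path so the category folders are preserved.
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.write_text(html, encoding="utf-8")
        written.append(dst)

    return written
```

You would then batch upload the contents of the output directory and trigger parsing in RAGFlow as described above. The `<script` check is deliberately simple; if most of your pages load scripts that do not change the visible text, it may be cheaper to render only a sample and compare the results first.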
I use CAD software that ships with a lot of help files in HTML format. The HTML files are structured, meaning there are subdirectories that categorize the different CAD topics.
Is it possible to use all of these HTML files (maybe several thousand) in a knowledge base? Can RAGFlow parse HTML with JS correctly?
Or do you have any suggestions for handling HTML files for RAG?
Thank you!