Replies: 3 comments
Yes, RAGFlow lets you customize how documents are chunked. You can define chunking strategies by configuring the parser (using the parser_config dictionary) to split by token count, delimiters, section headers, or even hierarchical document structure, so chunking by article or subsection is possible. This is handled in the chunking logic for each file type (like PDF, DOCX, images), and you can adapt it to your needs using section titles, outlines, or bullet patterns as cues [source].

For metadata, each chunk is a dictionary that can include fields like section title, page number, position, document name, and more. You can set or enrich these fields programmatically during chunking. There are APIs (like setMeta, create_chunk, set_chunk) to attach or update metadata for each chunk [source].

You can also use LLMs to generate or enrich chunk metadata. The "tag knowledge base" feature (v0.16.0+) supports bulk tagging of chunks with LLMs during ingestion, and there are workflows for LLM-powered keyword extraction and content tagging [source]. For images, computer vision LLMs can generate text descriptions as metadata [source].

One limitation: persistent chunk-level metadata isn't natively stored in the main database (only document-level metadata is), but you can extend the storage layer if you need this. Filtering or retrieval by chunk-level tags is not yet supported, but it's on the roadmap [source].
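To make the idea of chunking by article and attaching per-chunk metadata more concrete, here is a minimal sketch in plain Python. It does not use RAGFlow's code or APIs: the function `chunk_by_heading`, the heading pattern, and the field names (`title`, `content`, `doc_name`, `position`, `tags`) are all hypothetical, and `keyword_tagger` is only a stand-in for wherever an LLM call would go.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical heading pattern: lines such as "Article 12" or "Section 3.1 Scope".
HEADING_RE = re.compile(
    r"^(?:Article|Section)[ \t]+\d+(?:\.\d+)*\b.*$", re.MULTILINE
)


def chunk_by_heading(
    text: str,
    doc_name: str,
    tagger: Optional[Callable[[str], List[str]]] = None,
) -> List[Dict]:
    """Split text at heading lines and return one dict per chunk."""
    matches = list(HEADING_RE.finditer(text))
    chunks: List[Dict] = []

    # Text before the first heading (if any) becomes an untitled chunk.
    first_start = matches[0].start() if matches else len(text)
    preamble = text[:first_start].strip()
    if preamble:
        chunks.append({"title": None, "content": preamble})

    # Each heading owns the text up to the next heading (or end of text).
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunks.append({
            "title": m.group(0).strip(),
            "content": text[m.end():end].strip(),
        })

    # Attach metadata; `tagger` is where an LLM-based tagger could plug in.
    for position, chunk in enumerate(chunks):
        chunk["doc_name"] = doc_name
        chunk["position"] = position
        chunk["tags"] = tagger(chunk["content"]) if tagger else []
    return chunks


def keyword_tagger(content: str) -> List[str]:
    """Stand-in for an LLM call: tags a chunk by simple keyword lookup."""
    return [w for w in ("widget", "law") if w in content.lower()]


sample = """Preamble text.
Article 1 Scope
This law applies to all widgets.
Article 2 Definitions
A widget is a small device.
"""

chunks = chunk_by_heading(sample, "widget_act.txt", tagger=keyword_tagger)
for c in chunks:
    print(c["position"], c["title"], c["tags"])
```

Passing the tagger in as a callable keeps the splitting logic independent of any particular model or service, so the same function works with a keyword lookup in tests and a real LLM client in production.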
Is #8914 the same as this case?
Is there any way to define how the text chunks of a document are created, for example by article or by subsection? Also, how can I create metadata for each chunk? Is that possible, perhaps with an LLM?