Description
Self Checks
- I have searched for existing issues, including closed ones.
- I confirm that I am using English to submit this report (Language Policy).
- Non-English title submissions will be closed directly (Language Policy).
- Please do not modify this template :) and fill in all the required fields.
Is your feature request related to a problem?
The problem is that RAPTOR indexing currently runs as a single, long-running monolithic task for the whole batch. When using remote APIs (like SiliconFlow) for large Knowledge Bases (100+ files), the process can take 10+ hours. If a network fluctuation occurs at the 5th hour, the entire job fails and all progress is lost ("all-or-nothing"). There is no checkpointing or per-document granularity, making it extremely difficult and expensive to use RAPTOR on large datasets.
Describe the feature you'd like
I propose granularizing the RAPTOR task execution logic. Instead of treating the entire RAPTOR generation as one giant task, please implement one of the following:
- Per-Document Checkpointing: Save the RAPTOR index progress after each document or small batch is processed. If the task fails, allow resuming from the last successful document instead of restarting from zero (a rough sketch follows after this list).
- Independent Tasks: Decouple RAPTOR generation so it runs independently for each file (similar to how file parsing works). If one file fails due to an API timeout, only that specific file should be marked as 'Error', while others continue to completion.
- Better Error Handling: If an API call fails mid-process, the system should pause or retry that specific chunk, rather than crashing the entire batch job.
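For illustration only, here is a minimal sketch of what per-document checkpointing with failure isolation could look like. The names `build_raptor_for_document`, `run_raptor_batch`, and the checkpoint file path are hypothetical, not existing RAGFlow APIs:

```python
import json
import logging
from pathlib import Path

CHECKPOINT_FILE = Path("raptor_checkpoint.json")  # hypothetical location

def build_raptor_for_document(doc_id: str) -> None:
    """Stand-in for the real per-document RAPTOR build (hypothetical)."""
    ...

def load_checkpoint() -> set[str]:
    """Return the set of document IDs that finished successfully in earlier runs."""
    if CHECKPOINT_FILE.exists():
        return set(json.loads(CHECKPOINT_FILE.read_text()))
    return set()

def save_checkpoint(done: set[str]) -> None:
    """Persist progress after every document so a crash loses at most one file."""
    CHECKPOINT_FILE.write_text(json.dumps(sorted(done)))

def run_raptor_batch(doc_ids: list[str]) -> None:
    done = load_checkpoint()
    for doc_id in doc_ids:
        if doc_id in done:
            continue  # resume: skip documents already indexed in a previous run
        try:
            build_raptor_for_document(doc_id)
        except Exception:
            # isolate the failure: log / mark only this file as 'Error' and keep going
            logging.exception("RAPTOR failed for %s, continuing with the rest", doc_id)
            continue
        done.add(doc_id)
        save_checkpoint(done)
```

With this shape, a crash or timeout costs at most the document that was in flight, and rerunning the task picks up where it left off.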
Describe implementation you've considered
I strongly suggest reverting to the previous logic where RAPTOR was processed per individual file during the parsing stage, which isolated failures to specific documents.
Alternatively, if batch processing is necessary, implementing intermediate checkpoints is crucial. If an error occurs (e.g., API timeout after 5 hours), restarting the task must resume from the last successful point rather than starting over from scratch. Without these mechanisms, the current implementation is too fragile for production use with unstable APIs.
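As a rough sketch of the retry behaviour described above, assuming a generic `call_llm` callable (not an actual RAGFlow function) and illustrative retry parameters:

```python
import random
import time
from typing import Callable

def call_with_retry(call_llm: Callable[[str], str], prompt: str,
                    max_retries: int = 5, base_delay: float = 2.0) -> str:
    """Retry a flaky remote API call with exponential backoff instead of
    letting a single timeout abort the whole batch."""
    for attempt in range(max_retries):
        try:
            return call_llm(prompt)
        except Exception:
            if attempt == max_retries - 1:
                raise  # exhausted retries: caller marks only this chunk as 'Error'
            # back off 2s, 4s, 8s, ... plus jitter before retrying
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
```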
Documentation, adoption, use case
Environment: RAGFlow v0.22.0 / v0.22.1
LLM Service: SiliconFlow API (DeepSeek/Qwen)
Dataset: 100+ PDF documents
Impact: This issue is particularly critical for users relying on paid API tokens. When a task fails after 6 hours due to a single timeout, 6 hours' worth of API tokens are permanently wasted with no result. The lack of resilience currently makes RAPTOR viable only for very small KBs or local models, limiting its potential for serious academic or enterprise use cases.
Additional information
No response