OpenRAG Version
0.5.0
Deployment Method
Local development (make dev)
Operating System
Ubuntu 24.04.4 LTS
Python Version
3.13.13
Affected Area
Ingestion (document processing, upload, Docling)
Bug Description
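The crawl depth used for URL ingestion (following links from a page to its sub-pages) needs to be reconciled with product requirements: the agent crawls with a depth of 1, while the expected default is 2.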
Steps to Reproduce
- Go to Chat
- Enter prompt: "Ingest this URL: https://crawler-test.com/"
- URL successfully ingested
- Crawl Depth used by agent is 1 (instead of 2)
- BUG(?): Unable to find any content from sub-pages
- See screenshot below
Expected Behavior
Verify: Only pages up to the configured crawl depth (default 2) are ingested; no runaway crawl
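To make the depth semantics under discussion concrete, here is a minimal sketch of a depth-limited breadth-first crawl. It is not OpenRAG's implementation: the `crawl` function, the `SITE` link graph, and the `example.test` URLs are hypothetical, and it assumes the convention used in this issue, where depth 1 means the start page only and depth 2 means the start page plus the pages it links to directly.

```python
from collections import deque


def crawl(start_url, links, max_depth=2):
    """Return the URLs visited by a depth-limited BFS, in visit order.

    Assumed convention (from this issue): depth 1 = start page only,
    depth 2 = start page plus the pages it links to directly.
    `links` is a hypothetical in-memory map of page -> outgoing links,
    standing in for real HTTP fetching and link extraction.
    """
    if max_depth < 1:
        return []
    seen = {start_url}          # guards against cycles and duplicates
    visited = []
    queue = deque([(start_url, 1)])
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:  # depth limit reached: do not follow links
            continue
        for child in links.get(url, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return visited


# Hypothetical site: the root links to two sub-pages; one links deeper
# and also back to the root (a cycle).
ROOT = "https://example.test/"
SITE = {
    ROOT: [ROOT + "a", ROOT + "b"],
    ROOT + "a": [ROOT + "a/deep", ROOT],
    ROOT + "b": [],
}

depth_1 = crawl(ROOT, SITE, max_depth=1)  # start page only
depth_2 = crawl(ROOT, SITE, max_depth=2)  # start page plus direct sub-pages
```

Under this convention, a crawl at depth 1 would never ingest sub-page content, which would match the observation in the steps above. The `seen` set together with the depth check is what keeps the crawl from running away on cyclic links.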
Actual Behavior
- Crawl Depth used by agent is 1 (instead of 2)
Relevant Logs
Screenshots
Additional Context
ℹ️ Feedback from @lucaseduoli
- Crawl depth should be based on the length of the page (rather than sub-pages)
- Should delegate and let agent decide what sub-pages (if any) should be crawled
- No known competitor RAG tools automatically crawl to a depth of 2 (sub-pages); they only crawl to a depth of 1 (same page)
- Default crawl depth should be 1 (which is the current behavior)
- Should consult with Product team to verify
- This test scenario is really valid
Checklist