Agentic Retrieval fails to surface deep nested sections, returns overly generic answers #115

@EricNGOntos

Problem

When querying documents with deep heading hierarchies (e.g. 5+ levels), agentic retrieval consistently returns high-level summaries instead of the precise leaf-level content that answers the question.

Root Cause Observed (User Perspective)

The document navigation phase only "sees" the top-level outline sections. When a parent node (e.g. Section 5: Safety Measures) is selected during discovery, its child chunks (e.g. 5.3 / 5.3.1 Monitoring) are loaded into a flat buffer rather than being properly nested into the document tree. As a result:

  • The agent cannot drill down to child sections within the same navigation turn.
  • Deep chunks are displayed as orphaned text instead of being nested under their section headings.
  • The answer is assembled from the generic parent summary, missing the specific technical details.

Steps to Reproduce

  1. Ingest a structured document with at least 4 heading levels (e.g. a technical construction or safety report).
  2. Ask a question that requires a specific detail from a level-3 or deeper sub-section.
  3. Observe that the retrieved evidence contains only level-1 section summaries.
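For a repeatable reproduction, a synthetic document with one uniquely marked sentence per heading level makes it easy to see how deep the retrieved evidence actually goes. The sketch below is only an illustration: it assumes the ingestion pipeline accepts Markdown, and `make_nested_doc`, `deepest_marker`, and the `MARKER-L<n>` convention are hypothetical helpers, not part of the project.

```python
import re


def make_nested_doc(depth: int = 5) -> str:
    """Build a Markdown document with one heading per level (1..depth).

    Each level carries a unique marker sentence so retrieved evidence can be
    traced back to the heading level it came from.
    """
    if not 1 <= depth <= 6:
        raise ValueError("Markdown ATX headings support levels 1 to 6")
    lines = []
    for level in range(1, depth + 1):
        number = ".".join(["1"] * level)  # 1, 1.1, 1.1.1, ...
        lines.append(f"{'#' * level} {number} Section at level {level}")
        lines.append("")
        lines.append(f"MARKER-L{level}: detail that appears only at level {level}.")
        lines.append("")
    return "\n".join(lines)


def deepest_marker(evidence: str) -> int:
    """Return the deepest heading level whose marker appears in the evidence.

    Returns 0 when no marker is present at all.
    """
    levels = [int(m) for m in re.findall(r"MARKER-L(\d+)", evidence)]
    return max(levels, default=0)
```

Ingest the output of `make_nested_doc(5)`, ask for the level-5 detail, and pass the retrieved evidence text to `deepest_marker`. If the behavior described here is present, the result should stay at 1 rather than reaching 5.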

Expected Behavior

  • When the agent COLLECTs a non-leaf section, all descendant chunks should be reparented into the correct child subtree.
  • The navigation runner should build a hierarchical outline by nesting outline metadata + hydrated leaf content, level by level.
  • A shallow hydration mode should exist so the agent can fetch direct children only (avoiding over-fetching the entire subtree).
  • The depth-limit filter in section loading should be bypassable when the navigation runner explicitly needs the full tree.
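The expected behavior above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `Node`, `reparent`, `render`, and the `max_depth` parameter are hypothetical names, and the sketch assumes each chunk carries a dotted section number such as "5.3.1".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One section in the document tree (hypothetical structure)."""
    number: str                   # dotted section number, e.g. "5.3.1"
    title: str = ""
    text: Optional[str] = None    # hydrated content; None means outline only
    children: dict = field(default_factory=dict)  # section number -> Node


def reparent(root: Node, chunks: list, max_depth: Optional[int] = None) -> None:
    """Nest flat (number, title, text) chunks under `root`.

    max_depth=1 is the shallow mode: only direct children are attached.
    max_depth=None bypasses the depth limit and builds the full subtree.
    """
    base = root.number.split(".")
    for number, title, text in chunks:
        parts = number.split(".")
        if parts[:len(base)] != base or len(parts) <= len(base):
            continue              # not a descendant of root
        if max_depth is not None and len(parts) - len(base) > max_depth:
            continue              # depth-limit filter
        node = root
        # Walk down level by level, creating missing intermediate sections.
        for i in range(len(base) + 1, len(parts) + 1):
            key = ".".join(parts[:i])
            node = node.children.setdefault(key, Node(number=key))
        node.title, node.text = title, text


def render(node: Node, indent: int = 0) -> list:
    """Return the nested outline as indented lines, one per section."""
    lines = ["  " * indent + f"{node.number} {node.title}".rstrip()]
    for child in node.children.values():
        lines.extend(render(child, indent + 1))
    return lines
```

With a root of `Node("5", "Safety Measures")` and flat chunks numbered 5.3 and 5.3.1, `reparent(root, chunks)` places 5.3.1 under 5.3 instead of leaving it orphaned, while `reparent(root, chunks, max_depth=1)` attaches only 5.3.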

Impact

Users asking detailed, fact-specific questions on long documents receive dangerously incomplete answers. This erodes trust in the knowledge retrieval system for professional use cases (legal, engineering, medical documentation).

Metadata

Labels

agentic-rag-core: Core agentic RAG retrieval pipeline
