[Question]: DeepDoc + “Parse as Paper” Generates Only OCR Text; Missing LLM-Based Graph Descriptions (Works in General Parsing) #11492

### Self Checks

- [x] I have searched for [existing issues](https://github.com/infiniflow/ragflow/issues), including closed ones.
- [x] I confirm that I am using English to submit this report ([Language Policy](https://github.com/infiniflow/ragflow/issues/5910)).
- [x] Non-English title submissions will be closed directly ( 非英文标题的提交将会被直接关闭 ) ([Language Policy](https://github.com/infiniflow/ragflow/issues/5910)).
- [x] Please do not modify this template :) and fill in all the required fields.

### Describe your problem

When using DeepDoc with the “Parse as Paper” option, the output shown in the graph view contains only the OCR-extracted text from the PDF.
No semantic (LLM-generated) description is created for the graph nodes.

In contrast, when the exact same file is processed using “Parse as General”, the system successfully generates:

- OCR extraction
- an LLM-generated description / summary for each graph node

This suggests that Paper parsing mode disables or skips the LLM description step, resulting in graph nodes that are incomplete and less useful for retrieval or downstream reasoning.

At the moment, “Parse as Paper” = OCR only, while “Parse as General” = OCR + LLM description.

This behavior is unexpected because Paper mode is typically used for structured scientific documents, where semantic descriptions are even more important.

**Steps to Reproduce**

1. Upload any PDF with structured content (e.g., a research paper).
2. Choose DeepDoc as the parser.
3. Select “Parse as Paper.”
4. After parsing completes, open the Graph View.
5. Observe that:
   - Each node contains only OCR text chunks.
   - The field where a semantic description normally appears is missing or empty.
6. Re-upload or re-parse the same file using “Parse as General.”
7. In Graph View, each node now includes not only OCR text but also the LLM-generated description, as expected.

**Expected Behavior**

Unless a setting explicitly disables it, DeepDoc + Parse as Paper should produce both:

- OCR-extracted raw text
- LLM-based description / summarization, exactly like General parsing

The graph should contain a semantic description field that enhances retrieval quality and document understanding.

**Actual Behavior**

Parse as Paper output:

- Only plain OCR text in graph nodes
- No LLM summary or description generated
- Graph nodes appear incomplete compared to the General mode

Parse as General output:

- OCR text + full LLM description
- Graph nodes are enriched and more usable
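To make the difference between the two modes easier to quantify, here is a minimal sketch of a check that could be run against node data exported from each parse. It assumes the nodes are available as a list of dictionaries; the field names `id` and `description`, the helper names, and the sample nodes are all hypothetical placeholders, not RAGFlow's actual schema or real output.

```python
from typing import Any

# Hypothetical field names: adjust them to whatever the exported node data uses.
ID_FIELD = "id"
DESCRIPTION_FIELD = "description"


def nodes_missing_description(nodes: list[dict[str, Any]]) -> list[Any]:
    """Return the IDs of nodes whose description is absent, non-string, or blank."""
    missing = []
    for node in nodes:
        description = node.get(DESCRIPTION_FIELD)
        if not isinstance(description, str) or not description.strip():
            missing.append(node.get(ID_FIELD))
    return missing


def compare_modes(
    paper_nodes: list[dict[str, Any]],
    general_nodes: list[dict[str, Any]],
) -> dict[str, dict[str, int]]:
    """Summarize, per parsing mode, how many nodes lack a description."""
    summary = {}
    for mode, nodes in (("paper", paper_nodes), ("general", general_nodes)):
        summary[mode] = {
            "total": len(nodes),
            "missing_description": len(nodes_missing_description(nodes)),
        }
    return summary


# Placeholder nodes for illustration only; these are not real parser output.
sample_paper_nodes = [
    {"id": "p1", "text": "OCR text of section 1"},
    {"id": "p2", "text": "OCR text of section 2", "description": ""},
]
sample_general_nodes = [
    {"id": "g1", "text": "OCR text of section 1", "description": "Summary of section 1"},
    {"id": "g2", "text": "OCR text of section 2", "description": "Summary of section 2"},
]

print(compare_modes(sample_paper_nodes, sample_general_nodes))
```

If the behavior described above holds, the `paper` entry would report every node as missing a description while the `general` entry would report none.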


This issue affects:

- Document comprehension
- RAG quality (embeddings rely heavily on semantic descriptions)
- Accuracy of downstream QA
- Graph-based retrieval usefulness

Especially for research papers, losing LLM descriptions significantly reduces the value of DeepDoc’s processing.
