[Feature Request]: Optimize Table Structure in Document Parsing for Better Token Efficiency

### Self Checks

- [x] I have [searched for existing issues](https://github.com/infiniflow/ragflow/issues), including closed ones.
- [x] I confirm that I am using English to submit this report ([Language Policy](https://github.com/infiniflow/ragflow/issues/5910)).
- [x] Non-English title submissions will be closed directly ([Language Policy](https://github.com/infiniflow/ragflow/issues/5910)).
- [x] Please do not modify this template :) and fill in all the required fields.

### Is your feature request related to a problem?

```Markdown

```

### Describe the feature you'd like

Feature Request: Optimize Table Structure in Document Parsing for Better Token Efficiency

Currently, parsed documents containing tabular data (e.g., reports) are chunked and stored as raw HTML `<table>` elements. While this approach faithfully preserves complex table structures (e.g., merged cells, nested headers), it results in low data density: most tokens are spent on HTML markup (`<tr>`, `<td>`, and their closing tags) rather than on the actual cell content. This adds significant noise, especially for simple two-dimensional tables that could be represented far more compactly as CSV or Markdown tables.
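To make the density gap concrete, the following sketch serializes one small table three ways and prints the length of each form. The table values are made-up sample data, the helper names `as_html`, `as_markdown`, and `as_csv` are hypothetical, and character count is only a rough stand-in for token count (actual numbers depend on the downstream model's tokenizer).

```python
# Illustrative comparison of one small table serialized three ways.
# The values are made-up sample data, and character count is only a rough
# proxy for token count; real numbers depend on the downstream tokenizer.
import csv
import io

HEADER = ["Quarter", "Revenue", "Growth"]
ROWS = [["Q1", "120", "4%"], ["Q2", "135", "12%"]]


def as_html(header, rows):
    # Plain-text sample values, so no HTML escaping is applied here.
    head = "<tr>" + "".join(f"<th>{h}</th>" for h in header) + "</tr>"
    body = "".join(
        "<tr>" + "".join(f"<td>{c}</td>" for c in row) + "</tr>" for row in rows
    )
    return f"<table>{head}{body}</table>"


def as_markdown(header, rows):
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


def as_csv(header, rows):
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    for name, fn in (("html", as_html), ("markdown", as_markdown), ("csv", as_csv)):
        print(f"{name:<9}{len(fn(HEADER, ROWS)):>4} chars")
```

For this sample the HTML string is roughly twice as long as the Markdown one and more than three times as long as the CSV one. The gap is largest for tables with short cell values, where per-cell tags outweigh the content itself.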

Proposal:

Introduce intelligent table serialization during document parsing:

- For simple 2D tables (no merged cells, consistent structure): convert to a compact format such as CSV or Markdown to improve token efficiency and reduce noise.
- For complex tables (merged cells, irregular layouts): retain the HTML representation, or explore alternative structured representations that balance fidelity and density.

This would allow the system to adapt its output format to each table's complexity, optimizing downstream processing (e.g., LLM consumption, embedding, retrieval) without sacrificing the ability to handle sophisticated layouts.
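One way to implement this split is a small pre-check on each parsed table. The sketch below assumes each table reaches the chunker as a well-formed HTML string and uses Python's standard-library `html.parser`. The complexity heuristic (any `rowspan`/`colspan` other than 1, a nested `<table>`, or rows with differing cell counts) and the names `serialize_table` and `_TableScanner` are illustrative assumptions, not existing RAGFlow code.

```python
# Minimal sketch: route simple HTML tables to Markdown, keep complex ones as HTML.
# Assumptions (not RAGFlow code): tables arrive as well-formed HTML strings, and a
# table is "simple" if it has no rowspan/colspan other than 1, no nested <table>,
# and every row has the same number of cells.
from html.parser import HTMLParser


class _TableScanner(HTMLParser):
    """Collects cell text per row and flags features that make a table complex."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self.depth = 0          # current <table> nesting depth
        self.has_span = False   # any cell with rowspan/colspan other than 1
        self.nested = False     # a <table> found inside another <table>
        self._cell = None       # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            self.nested = self.nested or self.depth > 1
        elif self.depth == 1 and tag == "tr":
            self.rows.append([])
        elif self.depth == 1 and tag in ("td", "th"):
            for name, value in attrs:
                if name in ("rowspan", "colspan") and (value or "1").strip() != "1":
                    self.has_span = True
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "table":
            self.depth -= 1
        elif self.depth == 1 and tag in ("td", "th") and self._cell is not None:
            if self.rows:  # ignore stray cells outside any <tr>
                self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def _escape(cell):
    # A literal "|" would otherwise split a Markdown table cell.
    return cell.replace("|", "\\|")


def _to_markdown(rows):
    header, body = rows[0], rows[1:]
    lines = ["| " + " | ".join(map(_escape, header)) + " |",
             "|" + "---|" * len(header)]
    lines += ["| " + " | ".join(map(_escape, row)) + " |" for row in body]
    return "\n".join(lines)


def serialize_table(html):
    """Return a Markdown table for simple tables, else the original HTML."""
    scanner = _TableScanner()
    scanner.feed(html)
    scanner.close()
    rows = [row for row in scanner.rows if row]
    widths = {len(row) for row in rows}
    if scanner.has_span or scanner.nested or len(widths) != 1:
        return html
    return _to_markdown(rows)


if __name__ == "__main__":
    simple = ("<table><tr><th>Item</th><th>Qty</th></tr>"
              "<tr><td>Bolt</td><td>4</td></tr></table>")
    merged = ('<table><tr><th colspan="2">Totals</th></tr>'
              "<tr><td>A</td><td>1</td></tr></table>")
    print(serialize_table(simple))
    print(serialize_table(merged))  # unchanged: contains a colspan
```

Tables that fail the check are returned unchanged, so the existing HTML path still covers merged cells and irregular layouts. Markdown is used here only because its separator row keeps the header distinct; swapping `_to_markdown` for a `csv.writer`-based function would produce the denser CSV form instead.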

We’d appreciate community input on strategies for detecting table complexity and alternative compact representations for non-trivial tables.
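As one possible compact representation for non-trivial tables, merged cells can be expanded into a regular grid by repeating each spanned value in every position it covers; the resulting rows can then be emitted as Markdown or CSV. The sketch below is hypothetical: `expand_spans` and `_CellCollector` are illustrative names, and it assumes well-formed HTML without nested tables.

```python
# Hypothetical sketch: flatten merged cells into a rectangular grid by copying each
# spanned value into every slot it covers. Assumes well-formed HTML with no nested
# tables; the names here are illustrative, not an existing RAGFlow API.
from html.parser import HTMLParser


def _span(value):
    """Parse a rowspan/colspan attribute, falling back to 1 on missing or bad input."""
    try:
        return max(1, int(value))
    except (TypeError, ValueError):
        return 1


class _CellCollector(HTMLParser):
    """Records (text, rowspan, colspan) for every cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # [text fragments, rowspan, colspan] while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            a = dict(attrs)
            self._cell = [[], _span(a.get("rowspan")), _span(a.get("colspan"))]

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            parts, rowspan, colspan = self._cell
            if self.rows:  # ignore stray cells outside any <tr>
                text = " ".join("".join(parts).split())
                self.rows[-1].append((text, rowspan, colspan))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)


def expand_spans(html):
    """Return a list of equal-length rows with spanned values repeated."""
    parser = _CellCollector()
    parser.feed(html)
    parser.close()
    grid = {}  # (row, col) -> cell text
    for r, cells in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in cells:
            while (r, c) in grid:  # skip slots already filled by a rowspan above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    n_rows = len(parser.rows)  # spans reaching past the last <tr> are clipped
    n_cols = max((c for r, c in grid if r < n_rows), default=-1) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


if __name__ == "__main__":
    sample = ('<table><tr><th rowspan="2">Region</th><th colspan="2">Sales</th></tr>'
              "<tr><th>2023</th><th>2024</th></tr>"
              "<tr><td>North</td><td>10</td><td>12</td></tr></table>")
    for row in expand_spans(sample):
        print(row)
```

The trade-off is that the explicit merge information is lost and repeated values cost some extra tokens, but multi-level headers become readable row by row: in the sample, the `Sales` header lands above both the `2023` and `2024` columns.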


### Describe implementation you've considered

_No response_

### Documentation, adoption, use case

```Markdown

```

### Additional information

_No response_
