
feat(docreader): structure Excel table extraction #1796

Open
langcaiye wants to merge 2 commits into Tencent:main from langcaiye:feat/structured-excel-parser

@langcaiye

Contributor

Summary

  • add structured extraction for table-like Excel workbooks
  • preserve sheet notes separately from table records to avoid merged-cell duplication
  • emit sheet/row/header metadata for structured Excel chunks while keeping the legacy fallback
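
The three points above can be illustrated with a minimal sketch. It is not the parser from this PR: the function name `extract_sheet`, the record layout, and the rule "a row with exactly one filled cell is a note" are all assumptions made for illustration. It works on rows that have already been read into plain lists, where a merged region is assumed to carry its value only in the top-left cell.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = List[Optional[Any]]


def is_blank(cell: Optional[Any]) -> bool:
    """True for None or whitespace-only cells."""
    return cell is None or str(cell).strip() == ""


def extract_sheet(
    sheet_name: str, rows: List[Row]
) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Hypothetical helper: split one sheet into table records and notes.

    Assumption: a row with a single filled cell (for example a merged
    title row) is a note and is stored once, not repeated per column.
    The first row with two or more filled cells is taken as the header.
    """
    records: List[Dict[str, Any]] = []
    notes: List[str] = []
    headers: Optional[Dict[int, str]] = None
    header_row: Optional[int] = None

    for row_number, row in enumerate(rows, start=1):
        filled = [(i, c) for i, c in enumerate(row) if not is_blank(c)]
        if not filled:
            continue  # skip fully empty rows
        if len(filled) == 1:
            notes.append(str(filled[0][1]).strip())
            continue
        if headers is None:
            headers = {i: str(c).strip() for i, c in filled}
            header_row = row_number
            continue
        values = {headers[i]: c for i, c in filled if i in headers}
        if values:
            records.append(
                {
                    "sheet": sheet_name,
                    "row": row_number,
                    "header_row": header_row,
                    "headers": list(headers.values()),
                    "values": values,
                }
            )
    return records, notes


# Usage with made-up data; an empty `records` list would be the signal
# to fall back to the legacy plain-text path.
rows = [
    ["Q3 sales report", None, None],  # merged title row, kept as a note
    ["Region", "Units", "Revenue"],
    ["North", 10, 250.0],
    [None, None, None],
    ["South", 4, 90.5],
]
records, notes = extract_sheet("Sales", rows)
```

Keeping notes in their own list means a merged title is stored once instead of being copied into every record, and the `sheet`, `row`, and `headers` fields give each chunk enough context to be traced back to its source cell range.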

Test

  • PYTHONPATH=/Users/langcaiye/research/zankb/WeKnora docreader/.venv/bin/python -m unittest discover -s docreader/tests -p 'test_excel_parser.py'

langcaiye force-pushed the feat/structured-excel-parser branch 2 times, most recently from b2ae936 to 43a34ed, June 25, 2026 03:51
langcaiye force-pushed the feat/structured-excel-parser branch from 43a34ed to 03765ad, June 25, 2026 03:57