Angel98518 commented Jan 27, 2026

This PR addresses issue #4036 by ensuring bytes are properly wrapped in BytesIO before passing to pd.read_excel().

Implementation:
The code at line 203 in unstructured/partition/xlsx.py already wraps bytes in BytesIO before calling pd.read_excel(), which is the correct solution to avoid the FutureWarning deprecation.

Changes made:

  1. Added a comment (lines 200-201) explaining why bytes are wrapped in BytesIO
  2. Added a test (test_partition_xlsx_no_future_warning_for_bytes) to verify no FutureWarning is raised

This ensures the fix is documented and tested, preventing future regressions.
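A minimal sketch of how such a regression test can check for the absence of a FutureWarning, using only the standard library. `fake_read_excel` is a hypothetical stand-in for `pd.read_excel()` that warns on raw bytes as described in issue #4036; `read_wrapped` and `future_warnings_from` are likewise illustrative names, not code from this PR.

```python
import io
import warnings


def fake_read_excel(source):
    """Hypothetical stand-in for pd.read_excel(): warns when given raw bytes."""
    if isinstance(source, (bytes, bytearray)):
        warnings.warn(
            "Passing bytes is deprecated; wrap them in BytesIO.",
            FutureWarning,
            stacklevel=2,
        )
        source = io.BytesIO(source)
    return source.read()


def read_wrapped(data: bytes) -> bytes:
    """Wrap bytes in BytesIO before handing them to the reader."""
    return fake_read_excel(io.BytesIO(data))


def future_warnings_from(func, *args):
    """Call func(*args) and return the FutureWarning records it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones already seen in this process.
        warnings.simplefilter("always")
        func(*args)
    return [w for w in caught if issubclass(w.category, FutureWarning)]
```

In a real test, the stand-in would presumably be replaced by a call to `partition_xlsx` on the bytes read from the example file, with the same assertion that the list of recorded FutureWarning entries is empty.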

This test ensures that partition_xlsx properly wraps bytes in BytesIO
before passing to pd.read_excel(), preventing the deprecation warning
mentioned in issue Unstructured-IO#4036.
@Angel98518
Author

Hi @badGarnet, I'd love to get your review and approval.
Thank you for your time.

Comment on lines +180 to +184
with open("example-docs/stanley-cups.xlsx", "rb") as f:
    file_bytes = f.read()

# Create a BytesIO object from bytes to simulate the scenario
file_like = io.BytesIO(file_bytes)
Collaborator

This points to a good solution to issue #4036: internally, at the pd.read_excel call in the Excel partition function, we should open the file and wrap its contents in BytesIO.

Collaborator

So I would suggest that this PR actually implement that solution instead of adding this test.
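
One way to read the suggestion above is a small normalization step placed before the pd.read_excel call. The sketch below is an assumption about what that could look like, not the code in unstructured/partition/xlsx.py; `to_excel_buffer` is a hypothetical helper name, and it uses only the standard library.

```python
import io
from pathlib import Path
from typing import IO, Union


def to_excel_buffer(
    source: Union[bytes, bytearray, str, Path, IO[bytes]],
) -> io.BytesIO:
    """Return an in-memory binary buffer for any supported input (hypothetical)."""
    if isinstance(source, (bytes, bytearray)):
        # Raw bytes: wrap them directly.
        return io.BytesIO(bytes(source))
    if isinstance(source, (str, Path)):
        # A filename: open the file and wrap its contents.
        return io.BytesIO(Path(source).read_bytes())
    # Otherwise assume a binary file-like object; rewind if possible, then copy.
    if hasattr(source, "seek"):
        source.seek(0)
    return io.BytesIO(source.read())


# The result would then be passed on, for example:
#   pd.read_excel(to_excel_buffer(source), sheet_name=None, header=None)
```

With this shape, the reader always receives a BytesIO object regardless of whether the caller supplied a path, bytes, or an open file.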

Author

> so I would suggest this PR to actually implement that solution instead of adding this test

So, can you let me know what I have to do?

Addresses reviewer feedback by adding a comment explaining why bytes
are wrapped in BytesIO before passing to pd.read_excel to avoid
FutureWarning deprecation. References issue Unstructured-IO#4036.