
feat: add mime_type to ByteStream in LibreOfficeFileConverter #3057

Merged
julian-risch merged 4 commits into deepset-ai:main from StealthTensor:fix-libreoffice-mime-type
Mar 30, 2026

Conversation

@StealthTensor StealthTensor commented Mar 26, 2026

Contributor

Related Issues

Proposed Changes:

Added a _resolve_mime_type helper method to ensure the mime_type field is properly populated when returning ByteStream objects.

  • The logic first attempts to resolve the MIME type using Python's standard mimetypes.guess_type().
  • If that returns None, it safely falls back to a deterministic MIME_TYPE_FALLBACKS dictionary based on the output file's extension.
  • Hooked this resolution into both the run and run_async execution paths.
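The two-step resolution described above can be sketched roughly as follows. This is a minimal illustration, not the merged implementation: the function name `resolve_mime_type`, its signature, and the entries in `MIME_TYPE_FALLBACKS` are assumptions, since the PR's actual helper body and fallback table are not shown in this conversation.

```python
import mimetypes
from pathlib import Path
from typing import Optional, Union

# Hypothetical fallback table: the real MIME_TYPE_FALLBACKS entries in the PR
# are not shown in this thread, so these three are illustrative only.
MIME_TYPE_FALLBACKS = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".odt": "application/vnd.oasis.opendocument.text",
}


def resolve_mime_type(output_path: Union[str, Path]) -> Optional[str]:
    """Return a MIME type for the converted file, or None if it cannot be determined."""
    # Step 1: ask the standard library, which consults the platform's MIME tables.
    guessed, _encoding = mimetypes.guess_type(str(output_path))
    if guessed is not None:
        return guessed
    # Step 2: fall back to a fixed extension lookup so the result does not
    # depend on what the host system happens to have registered.
    return MIME_TYPE_FALLBACKS.get(Path(output_path).suffix.lower())
```

The fallback matters because `mimetypes.guess_type()` draws on system-level tables that can differ between machines; a fixed dictionary keeps the result stable for the extensions the converter is expected to produce.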

How did you test it?

Ran the local unit test suite using hatch run test:unit inside the integrations/libreoffice directory. All tests passed successfully without any regressions.

Notes for the reviewer

Thanks for tagging this as a good first issue! I didn't add any new unit tests since the existing suite covers the ByteStream creation, but let me know if you want a specific test added for the MIME resolution or if you need any tweaks to the fallback dictionary.

Checklist

Copilot AI review requested due to automatic review settings March 26, 2026 14:11
@StealthTensor StealthTensor requested a review from a team as a code owner March 26, 2026 14:11
@StealthTensor StealthTensor requested review from julian-risch and removed request for a team March 26, 2026 14:11

Copilot AI left a comment

Pull request overview

Adds MIME type detection to LibreOfficeFileConverter so returned ByteStream outputs include mime_type, improving downstream handling (fixes #3055).

Changes:

  • Introduces a _resolve_mime_type() helper using mimetypes.guess_type() with an extension-based fallback map.
  • Populates mime_type on ByteStream outputs in both run and run_async.


@CLAassistant

CLAassistant commented Mar 26, 2026


CLA assistant check
All committers have signed the CLA.

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.



Comment thread integrations/libreoffice/tests/test_converter.py
@StealthTensor

Contributor Author

Friendly ping — PR is ready for review. Happy to address any feedback. Thanks!

@github-actions github-actions Bot added the type:documentation Improvements or additions to documentation label Mar 30, 2026

@julian-risch julian-risch left a comment

Member


Looks good to me! Thank you for your contribution @StealthTensor and congratulations on your first haystack-core-integrations pull request! I extended the existing integration tests to check that the MIME type is returned.
I also looked into adding or extending unit tests to cover the new MIME types, but that seemed overly complicated and would have increased the complexity of the tests a lot. I ran only the unit tests locally, not the integration tests, because of the installation steps involved.

@julian-risch julian-risch merged commit 880057f into deepset-ai:main Mar 30, 2026
11 checks passed

Labels

integration:libreoffice type:documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add mime_type to the returned ByteStreams from LibreOfficeFileConverter

4 participants