fix: use Document.from_dict in InMemoryDocumentStore.load_from_disk by Ayushhgit · Pull Request #11594 · deepset-ai/haystack

Ayushhgit · 2026-06-12T08:08:00Z

Related Issues

Fixes #11593: InMemoryDocumentStore.load_from_disk corrupts documents with blob or sparse_embedding (loaded as raw dicts)

Proposed Changes:

InMemoryDocumentStore.load_from_disk rebuilt documents with the plain Document(**doc) constructor, which performs no conversion of nested fields. Since save_to_disk serializes with Document.to_dict(flatten=False) (converting blob to ByteStream.to_dict() and sparse_embedding to SparseEmbedding.to_dict()), any document saved with those fields came back with raw dicts in their place. The corrupted documents crashed repr(), to_dict(), equality comparison, a second save_to_disk, and any component accessing document.blob.data (e.g. image pipelines).
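The failure mode can be reproduced without Haystack. The sketch below uses hypothetical stand-in dataclasses (`Blob` and `Doc` are not Haystack classes, and `dataclasses.asdict` stands in for `to_dict(flatten=False)`) to show that a plain keyword-argument constructor leaves a serialized nested field as a dict.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Blob:
    """Hypothetical stand-in for a nested value type such as ByteStream."""

    data: bytes


@dataclass
class Doc:
    """Hypothetical stand-in for Document, reduced to two fields."""

    content: str
    blob: Optional[Blob] = None


original = Doc(content="hello", blob=Blob(data=b"\x00\x01"))

# Serialization turns the nested Blob into a plain dict.
serialized = asdict(original)

# The plain constructor stores whatever it is given; nothing converts
# the nested dict back into a Blob.
reloaded = Doc(**serialized)
blob_is_dict = isinstance(reloaded.blob, dict)

# Any code that expects a Blob now fails on attribute access.
try:
    reloaded.blob.data
    access_failed = False
except AttributeError:
    access_failed = True
```

In this stand-in the reloaded object merely compares unequal to the original; in the real `Document` the mismatch is reported above to crash `repr()`, `to_dict()` and equality comparison.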

One-line fix: reconstruct with Document.from_dict(doc), the documented inverse of to_dict, which restores ByteStream and SparseEmbedding instances.
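As a rough illustration of the pattern (not the actual Haystack code), the sketch below pairs each `to_dict` with a `from_dict` that rebuilds nested objects, and loads documents through that classmethod. `Blob`, `Doc` and `load_documents` are hypothetical names introduced only for this example.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Blob:
    """Hypothetical stand-in for a nested value type such as ByteStream."""

    data: bytes

    def to_dict(self) -> Dict[str, Any]:
        # Bytes are stored as a list of ints so the result is JSON-safe.
        return {"data": list(self.data)}

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "Blob":
        return cls(data=bytes(d["data"]))


@dataclass
class Doc:
    """Hypothetical stand-in for Document."""

    content: str
    blob: Optional[Blob] = None

    def to_dict(self) -> Dict[str, Any]:
        blob = self.blob.to_dict() if self.blob is not None else None
        return {"content": self.content, "blob": blob}

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "Doc":
        fields = dict(d)  # copy so the caller's dict is not mutated
        if fields.get("blob") is not None:
            fields["blob"] = Blob.from_dict(fields["blob"])
        return cls(**fields)


def load_documents(raw_docs: List[Dict[str, Any]]) -> List[Doc]:
    # The analogue of the one-line change: Doc.from_dict(raw), not Doc(**raw).
    return [Doc.from_dict(raw) for raw in raw_docs]


doc = Doc(content="hello", blob=Blob(data=b"\x00\x01"))
payload = json.dumps([doc.to_dict()])
restored = load_documents(json.loads(payload))
```

Keeping the conversion inside `from_dict` means the loader does not need to know which fields are nested types.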

How did you test it?

New regression test test_save_to_disk_and_load_from_disk_with_blob_and_sparse_embedding: saves a document with both a blob and a sparse_embedding, reloads, asserts proper types, equality with the original, and that the reloaded store can be saved again. Fails on main, passes with this fix.
hatch run test:unit test/document_stores/test_in_memory.py — 148 passed, 4 skipped.
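For readers without the repository checked out, the shape of the regression test can be sketched with stand-ins. Everything below is hypothetical: `Sparse`, `Doc` and `Store` only mimic the roles of SparseEmbedding, Document and InMemoryDocumentStore, and the JSON layout with a top-level "documents" key is an assumption made for this example.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, List, Optional


@dataclass
class Sparse:
    """Hypothetical stand-in for SparseEmbedding."""

    indices: List[int]
    values: List[float]

    def to_dict(self) -> Dict[str, Any]:
        return {"indices": self.indices, "values": self.values}

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "Sparse":
        return cls(indices=d["indices"], values=d["values"])


@dataclass
class Doc:
    """Hypothetical stand-in for Document."""

    id: str
    content: str
    sparse: Optional[Sparse] = None

    def to_dict(self) -> Dict[str, Any]:
        sparse = self.sparse.to_dict() if self.sparse is not None else None
        return {"id": self.id, "content": self.content, "sparse": sparse}

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "Doc":
        raw = d.get("sparse")
        sparse = Sparse.from_dict(raw) if raw is not None else None
        return cls(id=d["id"], content=d["content"], sparse=sparse)


class Store:
    """Hypothetical stand-in for a document store persisted as JSON."""

    def __init__(self) -> None:
        self.docs: Dict[str, Doc] = {}

    def write(self, docs: List[Doc]) -> None:
        for doc in docs:
            self.docs[doc.id] = doc

    def save_to_disk(self, path: Path) -> None:
        payload = {"documents": [d.to_dict() for d in self.docs.values()]}
        path.write_text(json.dumps(payload))

    @classmethod
    def load_from_disk(cls, path: Path) -> "Store":
        store = cls()
        raw_docs = json.loads(path.read_text())["documents"]
        store.write([Doc.from_dict(raw) for raw in raw_docs])
        return store


def check_round_trip() -> bool:
    original = Doc(id="1", content="hello", sparse=Sparse([0, 3], [0.5, 1.25]))
    store = Store()
    store.write([original])
    with tempfile.TemporaryDirectory() as tmp:
        first, second = Path(tmp) / "a.json", Path(tmp) / "b.json"
        store.save_to_disk(first)
        reloaded = Store.load_from_disk(first)
        assert isinstance(reloaded.docs["1"].sparse, Sparse)  # proper type
        assert reloaded.docs["1"] == original                 # equality holds
        reloaded.save_to_disk(second)                         # second save works
        assert json.loads(first.read_text()) == json.loads(second.read_text())
    return True
```

The three assertions mirror the checks listed above: restored types, equality with the original, and a successful second save.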

Notes for the reviewer

Document.from_dict also handles the nested meta dict produced by to_dict(flatten=False), so documents without blob/sparse fields round-trip exactly as before (covered by the existing test_save_to_disk_and_load_from_disk).

Checklist

I have read the contributors guidelines and the code of conduct.
I have updated the related issue with new insights and changes.
I have added unit tests and updated the docstrings.
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
I have documented my code.
I have added a release note file, following the contributors guidelines.
I have run pre-commit hooks and fixed any issue.

🤖 Generated with Claude Code

load_from_disk rebuilt documents with the plain Document constructor, which does not convert nested fields. Documents saved with a blob (ByteStream) or sparse_embedding (SparseEmbedding) came back with those fields as raw dicts, crashing repr(), to_dict(), equality comparison, save_to_disk of the reloaded store, and any component accessing document.blob.data.

save_to_disk serializes with Document.to_dict(flatten=False); Document.from_dict is its inverse and restores the proper types.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

vercel · 2026-06-12T08:08:08Z

@Ayushhgit is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

davidsbatista · 2026-06-12T08:51:48Z

@Ayushhgit you currently have 3 open PRs and keep opening more. Please, focus on one PR at a time.

Ayushhgit · 2026-06-12T09:40:42Z

Hey @davidsbatista, these were my last. I'll wait until all my current PRs are closed before starting a new one. Sorry if I caused any inconvenience.

Ayushhgit requested a review from a team as a code owner June 12, 2026 08:08

Ayushhgit requested review from davidsbatista and removed request for a team June 12, 2026 08:08

github-actions Bot added the topic:tests label Jun 12, 2026

Ayushhgit wants to merge 1 commit into deepset-ai:main from Ayushhgit:fix-load-from-disk-document-from-dict
