
[Bug]: legacy_json_to_doc drops persisted doc_id and generates a new UUID #20749

@gautamvarmadatla

Description

Bug Description

When loading legacy node JSON via legacy_json_to_doc, the doc_id stored in the legacy payload is not preserved; a new UUID is generated instead. This breaks backward compatibility for users restoring or migrating persisted stores that depend on stable node IDs (e.g., docstore lookups and relationship resolution).
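
To illustrate why stable IDs matter, here is a minimal, library-independent sketch. It assumes a plain dict as the docstore and models a relationship as nothing more than a stored target ID; `reload_store` and `resolve_source` are hypothetical helpers, not llama_index APIs. If IDs are regenerated on load, a reference persisted elsewhere no longer resolves.

```python
import uuid
from typing import Dict, Optional


def reload_store(persisted: Dict[str, str], preserve_ids: bool) -> Dict[str, str]:
    """Rebuild a hypothetical ID -> text store from persisted entries.

    preserve_ids=False mimics the reported bug: every node gets a fresh UUID.
    """
    rebuilt: Dict[str, str] = {}
    for stored_id, text in persisted.items():
        new_id = stored_id if preserve_ids else str(uuid.uuid4())
        rebuilt[new_id] = text
    return rebuilt


def resolve_source(store: Dict[str, str], source_id: str) -> Optional[str]:
    # A relationship keeps only the target's ID, so resolving it is a key lookup.
    return store.get(source_id)


persisted = {"doc-123": "hello"}
child_source_id = "doc-123"  # e.g. a child node's source reference, stored by ID

print(resolve_source(reload_store(persisted, preserve_ids=True), child_source_id))   # hello
print(resolve_source(reload_store(persisted, preserve_ids=False), child_source_id))  # None
```

The second lookup fails not because the data is gone but because the key it was stored under changed.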

Version

llama-index-core 0.14.15

Steps to Reproduce

from llama_index.core.constants import DATA_KEY, TYPE_KEY
from llama_index.core.schema import Document
from llama_index.core.storage.docstore.utils import legacy_json_to_doc

doc_dict = {
    TYPE_KEY: Document.get_type(),
    DATA_KEY: {
        "text": "hello",
        "extra_info": {},
        "doc_id": "doc-123",
        "relationships": {},
    },
}

loaded = legacy_json_to_doc(doc_dict)

print("expected:", "doc-123")
print("actual:  ", loaded.id_)

assert loaded.id_ == "doc-123", "BUG: legacy loader lost persisted doc_id"
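
The assertion above encodes the expected contract. Below is a minimal sketch of a loader that satisfies it, assuming a simplified dataclass in place of llama_index's node classes (`LegacyNode` and `load_legacy_data` are hypothetical names, not library API). The stored doc_id is forwarded to the ID field, and a new UUID is generated only when the payload has no doc_id at all.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class LegacyNode:
    """Hypothetical stand-in for a node class; not llama_index's Document."""
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    # Default factory: only used when no ID is supplied explicitly.
    id_: str = field(default_factory=lambda: str(uuid.uuid4()))


def load_legacy_data(data: Dict[str, Any]) -> LegacyNode:
    """Build a node from a legacy data payload, keeping the persisted ID."""
    kwargs: Dict[str, Any] = {
        "text": data.get("text", ""),
        # Legacy payloads store metadata under "extra_info" (may be None).
        "metadata": data.get("extra_info") or {},
    }
    doc_id = data.get("doc_id")
    if doc_id is not None:
        # Pass the stored ID under the field's exact name so the
        # default UUID factory is not triggered.
        kwargs["id_"] = doc_id
    return LegacyNode(**kwargs)


node = load_legacy_data(
    {"text": "hello", "extra_info": {}, "doc_id": "doc-123", "relationships": {}}
)
assert node.id_ == "doc-123"
```

Forwarding the value under the exact field name is the important detail: a dataclass rejects an unknown keyword with a TypeError, whereas models configured to ignore extra keyword arguments (pydantic's default) would silently drop it and fall back to the default factory.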

Relevant Logs/Tracebacks

expected: doc-123
actual:   b980379e-b733-49dd-8cfc-4e5bfed7fca5
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/tmp/ipython-input-767892816.py in <cell line: 0>()
     18 print("actual:  ", loaded.id_)
     19 
---> 20 assert loaded.id_ == "doc-123", "BUG: legacy loader lost persisted doc_id"

AssertionError: BUG: legacy loader lost persisted doc_id

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working), triage (Issue needs to be triaged/prioritized)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
