Skip to content

fix: off-by-one error in RecursiveJsonSplitter.split_json#35649

Open
Ethan T. (gambletan) wants to merge 1 commit intolangchain-ai:masterfrom
gambletan:fix/json-splitter-off-by-one
Open

fix: off-by-one error in RecursiveJsonSplitter.split_json#35649
Ethan T. (gambletan) wants to merge 1 commit intolangchain-ai:masterfrom
gambletan:fix/json-splitter-off-by-one

Conversation

@gambletan
Copy link
Contributor

Summary

Fixes #29153

Two bugs in RecursiveJsonSplitter._json_split() that cause data loss at chunk boundaries:

  1. Off-by-one boundary check: Changed size < remaining to size <= remaining so items that fit exactly at the chunk size boundary are added to the current chunk instead of being pushed to a new one.

  2. Empty dict/leaf value loss: When a value doesn't fit in the current chunk and a new chunk is started, the code recursed into the value. For empty dicts {}, the for key, value in data.items() loop has zero iterations, silently dropping the key-value pair from all chunks. Added explicit handling to directly set leaf values and empty dicts instead of recursing.

Reproduction

from langchain_text_splitters import RecursiveJsonSplitter

data = {
    "projects": {
        "GTMS": {f"GTMS-{i}": {} for i in range(1, 23)},
        "ITSAMPLE": {f"ITSAMPLE-{i}": {} for i in range(1, 4)},
    }
}

splitter = RecursiveJsonSplitter(max_chunk_size=300)
chunks = splitter.split_json(data)

# Before fix: GTMS-10 and ITSAMPLE-2 are lost at chunk boundaries
# After fix: all items are present in chunks

Test plan

🤖 Generated with Claude Code

Two issues fixed:
1. Changed `size < remaining` to `size <= remaining` so items that fit
   exactly at the boundary are added to the current chunk instead of
   being pushed to a new one.
2. Added explicit handling for empty dicts and leaf values when they
   don't fit in the current chunk. Previously, recursing into an empty
   dict `{}` caused the for loop to have zero iterations, silently
   dropping the key-value pair from all chunks.

Fixes langchain-ai#29153

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added text-splitters Related to the package `text-splitters` external fix For PRs that implement a fix labels Mar 8, 2026
@org-membership-reviewer org-membership-reviewer bot added the size: XS < 50 LOC label Mar 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

external fix For PRs that implement a fix size: XS < 50 LOC text-splitters Related to the package `text-splitters`

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Offset by 1 bug on RecursiveJsonSplitter::split_json() function

1 participant