refactor(json_tracker): simplify using sibling heuristic by thomasnormal · Pull Request #2000 · 567-labs/instructor

thomasnormal · 2026-01-13T22:31:14Z

I meant to include this in the last PR, but you merge them so fast :-)

Summary

Simplify JsonCompleteness from 302 to 139 lines (54% reduction)
Remove all string scanning and brace counting logic
Use jiter for parsing, simple structure walking for completeness

Key Insight

If a value has a next sibling in the parsed structure, it must be complete (jiter had to finish parsing it to find the next sibling). For the last sibling, we don't need to know - parent validation will cover it when complete.
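The heuristic can be illustrated with a minimal sketch that walks an already-parsed structure. This is not the PR's `_check_siblings()` or `_mark_all()`; the function names `complete_paths` and `all_paths` and the dotted path format are assumptions made for illustration. It also assumes the input came from a partial-mode parser, so only the last element at each nesting level can be truncated.

```python
from __future__ import annotations

from typing import Any


def _children(value: Any, path: str) -> list[tuple[str, Any]]:
    """List (path, child) pairs for a dict or list; scalars have none."""
    if isinstance(value, dict):
        return [(f"{path}.{k}" if path else str(k), v) for k, v in value.items()]
    if isinstance(value, list):
        return [(f"{path}[{i}]", v) for i, v in enumerate(value)]
    return []


def all_paths(value: Any, path: str) -> set[str]:
    """Every path nested under `value` (hypothetical helper)."""
    found: set[str] = set()
    for child_path, child in _children(value, path):
        found.add(child_path)
        found |= all_paths(child, child_path)
    return found


def complete_paths(value: Any, path: str = "") -> set[str]:
    """Paths known to be complete under the sibling heuristic."""
    done: set[str] = set()
    children = _children(value, path)
    for index, (child_path, child) in enumerate(children):
        if index < len(children) - 1:
            # A later sibling exists, so the parser must have finished
            # this value and everything nested inside it.
            done.add(child_path)
            done |= all_paths(child, child_path)
        else:
            # Last sibling: its own completeness is unknown, but its
            # earlier children can still be marked.
            done |= complete_paths(child, child_path)
    return done


partial = {"user": {"name": "Ada", "tags": ["a", "b"]}, "bio": "unfin"}
print(sorted(complete_paths(partial)))
```

Here `user` and everything under it are marked complete because `bio` follows it, while `bio` itself stays unmarked until the parent object closes.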

Before (302 lines)

Custom JSON parsing with _analyze_object(), _analyze_array(), _analyze_string(), _analyze_number() and various helpers.

After (139 lines)

Core logic is now ~25 lines in _check_siblings() - just walks the structure jiter already parsed.

Test plan

All 45 partial streaming tests pass
Edge case tests for nested objects, arrays, strings with braces
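To see why strings containing braces are worth testing, here is a small sketch contrasting a deliberately naive brace counter with a real parser. The counter is not the removed code, which may well have tracked string context; it only illustrates the class of bug that hand-written scanning invites. The standard library `json` module stands in for jiter here, and both function names are hypothetical.

```python
import json


def naive_braces_balanced(text: str) -> bool:
    """Count brackets while ignoring string context (deliberately naive)."""
    depth = 0
    for ch in text:
        if ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
    return depth == 0


def parses_strictly(text: str) -> bool:
    """Let a real parser decide whether the document is complete."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return False
    return True


truncated = '{"note": "a } inside a string'
finished = '{"note": "a } inside"}'

# The brace inside the string fools the counter in both directions.
print(naive_braces_balanced(truncated), parses_strictly(truncated))
print(naive_braces_balanced(finished), parses_strictly(finished))
```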

Generated with Claude Code

Important

Refactor JsonCompleteness in json_tracker.py to simplify JSON completeness tracking using jiter and a sibling heuristic, reducing code complexity and size.

Behavior:
- Refactor JsonCompleteness in json_tracker.py to use jiter for parsing JSON.
- Simplifies logic by using sibling heuristic: if a value has a next sibling, it is complete.
- Removes custom parsing methods like _analyze_object(), _analyze_array(), etc.
Functions:
- Adds is_json_complete() to check if a JSON string is complete using jiter.
- Implements _mark_all() and _check_siblings() to mark paths as complete based on sibling heuristic.
Misc:
- Reduces JsonCompleteness from 302 to 139 lines, removing string scanning and brace counting logic.

This description was created by Ellipsis for 0e32d26. You can customize this summary. It will automatically update as commits are pushed.

Reduce JsonCompleteness from 302 to 139 lines (54% reduction).

Key insight: if a value has a next sibling in the parsed structure, it must be complete (jiter had to finish parsing it to find the next). For the last sibling, parent validation will cover it when complete.

Changes:
- Remove all string scanning and brace counting logic
- Use jiter for parsing, simple structure walking for completeness
- Core logic now just ~25 lines in _check_siblings()

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

ellipsis-dev

Important

Looks good to me! 👍

Reviewed everything up to 0e32d26 in 2 minutes and 18 seconds.

Reviewed 374 lines of code in 1 file
Skipped 0 files when reviewing.
Skipped posting 4 draft comments. View those below.

1. instructor/dsl/json_tracker.py:66

Draft comment:
Avoid using a bare 'except Exception:' block. Consider catching specific exceptions or logging the error to aid debugging.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 85%.

The code is dealing with JSON parsing during streaming, where incomplete or malformed JSON is expected. The bare except Exception: blocks appear intentional - they're handling cases where parsing might fail for various reasons (incomplete JSON, malformed JSON, encoding issues, etc.). At line 65, is_json_complete() already checked if the JSON is complete, so the try-except at line 66-70 seems defensive. The comment is technically correct that bare exceptions are generally not best practice, but in this streaming context where various parsing failures are expected and the code needs to be resilient, this might be acceptable. However, the comment does suggest an improvement - at minimum, specific exceptions from jiter could be caught, or errors could be logged for debugging.

The bare exception handling might be intentional for this streaming use case where various parsing errors are expected and should be silently handled. The code already has a guard with is_json_complete(), so the exception case might be rare. Additionally, this is internal library code where silent failure and graceful degradation might be the desired behavior rather than logging every parsing error during streaming.

While silent failure might be intentional, the comment raises a valid code quality point. Even in streaming contexts, catching specific exceptions (like jiter's parsing exceptions) would make the code more maintainable and debuggable. However, looking at the rules, I should only keep comments that suggest "actionable and clear" refactors. The comment is somewhat generic ("consider catching specific exceptions or logging") without being specific about which exceptions or how to log them in this context.

This is a generic code quality suggestion about exception handling. While technically correct, it's not specific enough to be clearly actionable - it doesn't specify which exceptions from jiter should be caught or how logging should be implemented in this streaming context. The bare exception handling appears intentional for resilience during streaming.

2. instructor/dsl/json_tracker.py:75

Draft comment:
Consider using a named constant for the partial_mode value ('trailing-strings') to improve maintainability.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 85% None

3. instructor/dsl/json_tracker.py:106

Draft comment:
Clarify in a comment that last siblings are intentionally not marked complete—even if scalar—since their completeness is only confirmed by the parent structure.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 85% None

4. instructor/dsl/json_tracker.py:61

Draft comment:
The check for empty JSON strings (using 'if not json_str or not json_str.strip()') appears in both is_json_complete() and analyze(). Consider refactoring to centralize this logic and avoid duplication.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 85% None

Workflow ID: wflow_oFcA623ch47ythcU



jxnl · 2026-01-15T15:42:00Z

thank you so much for all the contributions!


ellipsis-dev bot reviewed Jan 13, 2026


thomasahle added 2 commits January 13, 2026 23:35

fix: catch ValueError specifically instead of bare Exception

3ac734f

perf: avoid double parsing for complete JSON

b548cff

Previously analyze() called is_json_complete() which parses once, then parsed again. Now we try strict parsing first and reuse the result.
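The parse-once pattern described in this commit message might look roughly like the sketch below. It is an illustration under stated assumptions, not the code from b548cff: `parse_once` is a hypothetical name, `json.loads` stands in for the strict jiter parse, and the partial parser is passed in as a callable so the example runs without third-party packages.

```python
import json
from typing import Any, Callable, Tuple


def parse_once(
    json_str: str, partial_parser: Callable[[str], Any]
) -> Tuple[Any, bool]:
    """Return (parsed_value, is_complete), parsing complete input only once."""
    try:
        # Strict parse first; on success the result is reused directly.
        return json.loads(json_str), True
    except ValueError:
        # Incomplete input: fall back to the lenient parser.
        return partial_parser(json_str), False


calls = []


def fake_partial_parser(text: str) -> dict:
    """Stub that records calls; a real one would return partial data."""
    calls.append(text)
    return {}


value, complete = parse_once('{"a": 1}', fake_partial_parser)
print(value, complete, len(calls))

value, complete = parse_once('{"a": ', fake_partial_parser)
print(value, complete, len(calls))
```

For complete JSON the partial parser is never invoked, which is the saving the commit describes.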

jxnl merged commit a937635 into 567-labs:main Jan 15, 2026
14 checks passed

cursor bot pushed a commit that referenced this pull request Jan 16, 2026

chore(release): bump version to 1.14.4

25304f8

Update CHANGELOG for PRs #2000, #2002, #2005, #2007, and #2011. Co-authored-by: jason <jason@jxnl.co>

jxnl mentioned this pull request Jan 16, 2026

New release preparation #2013

Merged

3 tasks

jxnl merged 3 commits into 567-labs:main from thomasnormal:refactor/simplify-json-tracker
