Skip to content

refactor(json_tracker): simplify using sibling heuristic#2000

Merged
jxnl merged 3 commits into567-labs:mainfrom
thomasnormal:refactor/simplify-json-tracker
Jan 15, 2026
Merged

refactor(json_tracker): simplify using sibling heuristic#2000
jxnl merged 3 commits into567-labs:mainfrom
thomasnormal:refactor/simplify-json-tracker

Conversation

@thomasnormal
Copy link
Contributor

@thomasnormal thomasnormal commented Jan 13, 2026

I meant to include this in the last PR, but you merge them so fast :-)

Summary

  • Simplify JsonCompleteness from 302 to 139 lines (54% reduction)
  • Remove all string scanning and brace counting logic
  • Use jiter for parsing, simple structure walking for completeness

Key Insight

If a value has a next sibling in the parsed structure, it must be complete (jiter had to finish parsing it to find the next sibling). For the last sibling, we don't need to know - parent validation will cover it when complete.

Before (302 lines)

Custom JSON parsing with _analyze_object(), _analyze_array(), _analyze_string(), _analyze_number() and various helpers.

After (139 lines)

Core logic is now ~25 lines in _check_siblings() - just walks the structure jiter already parsed.

Test plan

  • All 45 partial streaming tests pass
  • Edge case tests for nested objects, arrays, strings with braces

Generated with Claude Code


Important

Refactor JsonCompleteness in json_tracker.py to simplify JSON completeness tracking using jiter and a sibling heuristic, reducing code complexity and size.

  • Behavior:
    • Refactor JsonCompleteness in json_tracker.py to use jiter for parsing JSON.
    • Simplifies logic by using sibling heuristic: if a value has a next sibling, it is complete.
    • Removes custom parsing methods like _analyze_object(), _analyze_array(), etc.
  • Functions:
    • Adds is_json_complete() to check if a JSON string is complete using jiter.
    • Implements _mark_all() and _check_siblings() to mark paths as complete based on sibling heuristic.
  • Misc:
    • Reduces JsonCompleteness from 302 to 139 lines, removing string scanning and brace counting logic.

This description was created by Ellipsis for 0e32d26. You can customize this summary. It will automatically update as commits are pushed.

Reduce JsonCompleteness from 302 to 139 lines (54% reduction).

Key insight: if a value has a next sibling in the parsed structure,
it must be complete (jiter had to finish parsing it to find the next).
For the last sibling, parent validation will cover it when complete.

Changes:
- Remove all string scanning and brace counting logic
- Use jiter for parsing, simple structure walking for completeness
- Core logic now just ~25 lines in _check_siblings()

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Looks good to me! 👍

Reviewed everything up to 0e32d26 in 2 minutes and 18 seconds. Click for details.
  • Reviewed 374 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. instructor/dsl/json_tracker.py:66
  • Draft comment:
    Avoid using a bare 'except Exception:' block. Consider catching specific exceptions or logging the error to aid debugging.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 85% The code is dealing with JSON parsing during streaming, where incomplete or malformed JSON is expected. The bare except Exception: blocks appear intentional - they're handling cases where parsing might fail for various reasons (incomplete JSON, malformed JSON, encoding issues, etc.). At line 65, is_json_complete() already checked if the JSON is complete, so the try-except at line 66-70 seems defensive. The comment is technically correct that bare exceptions are generally not best practice, but in this streaming context where various parsing failures are expected and the code needs to be resilient, this might be acceptable. However, the comment does suggest an improvement - at minimum, specific exceptions from jiter could be caught, or errors could be logged for debugging. The bare exception handling might be intentional for this streaming use case where various parsing errors are expected and should be silently handled. The code already has a guard with is_json_complete(), so the exception case might be rare. Additionally, this is internal library code where silent failure and graceful degradation might be the desired behavior rather than logging every parsing error during streaming. While silent failure might be intentional, the comment raises a valid code quality point. Even in streaming contexts, catching specific exceptions (like jiter's parsing exceptions) would make the code more maintainable and debuggable. However, looking at the rules, I should only keep comments that suggest "actionable and clear" refactors. The comment is somewhat generic ("consider catching specific exceptions or logging") without being specific about which exceptions or how to log them in this context. This is a generic code quality suggestion about exception handling. While technically correct, it's not specific enough to be clearly actionable - it doesn't specify which exceptions from jiter should be caught or how logging should be implemented in this streaming context. The bare exception handling appears intentional for resilience during streaming.
2. instructor/dsl/json_tracker.py:75
  • Draft comment:
    Consider using a named constant for the partial_mode value ('trailing-strings') to improve maintainability.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 85% None
3. instructor/dsl/json_tracker.py:106
  • Draft comment:
    Clarify in a comment that last siblings are intentionally not marked complete—even if scalar—since their completeness is only confirmed by the parent structure.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 85% None
4. instructor/dsl/json_tracker.py:61
  • Draft comment:
    The check for empty JSON strings (using 'if not json_str or not json_str.strip()') appears in both is_json_complete() and analyze(). Consider refactoring to centralize this logic and avoid duplication.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 85% None

Workflow ID: wflow_oFcA623ch47ythcU

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Previously analyze() called is_json_complete() which parses once,
then parsed again. Now we try strict parsing first and reuse the result.
@jxnl jxnl merged commit a937635 into 567-labs:main Jan 15, 2026
14 checks passed
@jxnl
Copy link
Collaborator

jxnl commented Jan 15, 2026

thank you so much for all the contributions!

cursor bot pushed a commit that referenced this pull request Jan 16, 2026
Update CHANGELOG for PRs #2000, #2002, #2005, #2007, and #2011.

Co-authored-by: jason <jason@jxnl.co>
@jxnl jxnl mentioned this pull request Jan 16, 2026
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants