refactor(json_tracker): simplify using sibling heuristic#2000
Merged
jxnl merged 3 commits into567-labs:mainfrom Jan 15, 2026
Merged
refactor(json_tracker): simplify using sibling heuristic#2000jxnl merged 3 commits into567-labs:mainfrom
jxnl merged 3 commits into567-labs:mainfrom
Conversation
Reduce JsonCompleteness from 302 to 139 lines (54% reduction). Key insight: if a value has a next sibling in the parsed structure, it must be complete (jiter had to finish parsing it to find the next). For the last sibling, parent validation will cover it when complete. Changes: - Remove all string scanning and brace counting logic - Use jiter for parsing, simple structure walking for completeness - Core logic now just ~25 lines in _check_siblings() Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Important
Looks good to me! 👍
Reviewed everything up to 0e32d26 in 2 minutes and 18 seconds. Click for details.
- Reviewed
374lines of code in1files - Skipped
0files when reviewing. - Skipped posting
4draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. instructor/dsl/json_tracker.py:66
- Draft comment:
Avoid using a bare 'except Exception:' block. Consider catching specific exceptions or logging the error to aid debugging. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 85% The code is dealing with JSON parsing during streaming, where incomplete or malformed JSON is expected. The bareexcept Exception:blocks appear intentional - they're handling cases where parsing might fail for various reasons (incomplete JSON, malformed JSON, encoding issues, etc.). At line 65,is_json_complete()already checked if the JSON is complete, so the try-except at line 66-70 seems defensive. The comment is technically correct that bare exceptions are generally not best practice, but in this streaming context where various parsing failures are expected and the code needs to be resilient, this might be acceptable. However, the comment does suggest an improvement - at minimum, specific exceptions from jiter could be caught, or errors could be logged for debugging. The bare exception handling might be intentional for this streaming use case where various parsing errors are expected and should be silently handled. The code already has a guard withis_json_complete(), so the exception case might be rare. Additionally, this is internal library code where silent failure and graceful degradation might be the desired behavior rather than logging every parsing error during streaming. While silent failure might be intentional, the comment raises a valid code quality point. Even in streaming contexts, catching specific exceptions (like jiter's parsing exceptions) would make the code more maintainable and debuggable. However, looking at the rules, I should only keep comments that suggest "actionable and clear" refactors. The comment is somewhat generic ("consider catching specific exceptions or logging") without being specific about which exceptions or how to log them in this context. This is a generic code quality suggestion about exception handling. While technically correct, it's not specific enough to be clearly actionable - it doesn't specify which exceptions from jiter should be caught or how logging should be implemented in this streaming context. The bare exception handling appears intentional for resilience during streaming.
2. instructor/dsl/json_tracker.py:75
- Draft comment:
Consider using a named constant for the partial_mode value ('trailing-strings') to improve maintainability. - Reason this comment was not posted:
Confidence changes required:33%<= threshold85%None
3. instructor/dsl/json_tracker.py:106
- Draft comment:
Clarify in a comment that last siblings are intentionally not marked complete—even if scalar—since their completeness is only confirmed by the parent structure. - Reason this comment was not posted:
Confidence changes required:33%<= threshold85%None
4. instructor/dsl/json_tracker.py:61
- Draft comment:
The check for empty JSON strings (using 'if not json_str or not json_str.strip()') appears in both is_json_complete() and analyze(). Consider refactoring to centralize this logic and avoid duplication. - Reason this comment was not posted:
Confidence changes required:33%<= threshold85%None
Workflow ID: wflow_oFcA623ch47ythcU
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Previously analyze() called is_json_complete() which parses once, then parsed again. Now we try strict parsing first and reuse the result.
Collaborator
|
thank you so much for all the contributions! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
I meant to include this in the last PR, but you merge them so fast :-)
Summary
Key Insight
If a value has a next sibling in the parsed structure, it must be complete (jiter had to finish parsing it to find the next sibling). For the last sibling, we don't need to know - parent validation will cover it when complete.
Before (302 lines)
Custom JSON parsing with _analyze_object(), _analyze_array(), _analyze_string(), _analyze_number() and various helpers.
After (139 lines)
Core logic is now ~25 lines in _check_siblings() - just walks the structure jiter already parsed.
Test plan
Generated with Claude Code
Important
Refactor
JsonCompletenessinjson_tracker.pyto simplify JSON completeness tracking usingjiterand a sibling heuristic, reducing code complexity and size.JsonCompletenessinjson_tracker.pyto usejiterfor parsing JSON._analyze_object(),_analyze_array(), etc.is_json_complete()to check if a JSON string is complete usingjiter._mark_all()and_check_siblings()to mark paths as complete based on sibling heuristic.JsonCompletenessfrom 302 to 139 lines, removing string scanning and brace counting logic.This description was created by
for 0e32d26. You can customize this summary. It will automatically update as commits are pushed.