Rust implementation by kavitharaju · Pull Request #392 · Bridgeconn/usfm-grammar

kavitharaju · 2026-04-28T05:21:55Z

User description

Implements Rust version for usfm-grammar #370

* Parsing and Partial Validation done * commented changes are executed * Add link to demo in readme * corrected the error collection * draft validator * updates .gitignore * updates .gitignore * updated .gitignore * updated validation * target folder removed --------- Co-authored-by: kavitharaju <kavitharaju18@gmail.com> Co-authored-by: Joel Mathew <mail@joelmath.io>

* On branch rust new file: rust-usfm-parser/Cargo.toml new file: rust-usfm-parser/src/main.rs * rust project init * rust project folder init * rust floder init * added README.md * read me * readme * readme * rust init * rust init * rust init * node to json * Add link to demo in readme * usjGenerator * all markers added * all markers added. * more type support added * more type support added * Rename main.rs to usjGenerator.rs * Delete rust-usfm-parser/src/usj_generator.rs * Delete rust-usfm-parser/src/usj_generator.js * Delete rust-usfm-parser/src/main * usj_generator added modified: rust-usfm-parser/Cargo.lock modified: rust-usfm-parser/Cargo.toml renamed: rust-usfm-parser/src/input.usfm -> rust-usfm-parser/input.usfm modified: rust-usfm-parser/src/main.rs new file: rust-usfm-parser/src/usj_generator.rs * new file: rust-usfm-parser/src/globals.rs modified: rust-usfm-parser/src/main.rs modified: rust-usfm-parser/src/parser.rs modified: rust-usfm-parser/src/usj_generator.rs * Delete rust-usfm-parser/target directory * Update usjGenerator.rs * Delete .idea directory * Delete rust-usfm-parser/target directory * Delete .idea directory * Update package.json * Update usj_generator.py * Delete rust-usfm-parser/src/usjGenerator.rs * Update usj_generator.rs * Update validator.rs * Update schema.rs * Update parser.rs * Update main.rs * Update globals.rs * Delete rust-usfm-parser/captures.rs * Delete rust-usfm-parser/captures.rs * fixed some bugs modified: rust-usfm-parser/input.usfm modified: rust-usfm-parser/src/usj_generator.rs * Delete node-usfm-parser/package-lock.json * Update input.usfm --------- Co-authored-by: kavitharaju <kavitharaju18@gmail.com> Co-authored-by: Joel Mathew <mail@joelmath.io>

* Add link to demo in readme * lib added new file: rust-usfm-parser/src/lib.rs * lib edit modified: rust-usfm-parser/src/lib.rs * test cases added ,filter addded to parser.rs * Test suit verified * added seperate files for test utils and test_usj functions * added seperate files for test_utils and test_usj functions --------- Co-authored-by: kavitharaju <kavitharaju18@gmail.com> Co-authored-by: Joel Mathew <mail@joelmath.io>

* On branch rust new file: rust-usfm-parser/Cargo.toml new file: rust-usfm-parser/src/main.rs * rust project init * rust project folder init * rust floder init * added README.md * read me * readme * readme * rust init * rust init * rust init * node to json * Add link to demo in readme * usjGenerator * all markers added * all markers added. * more type support added * more type support added * Rename main.rs to usjGenerator.rs * Delete rust-usfm-parser/src/usj_generator.rs * Delete rust-usfm-parser/src/usj_generator.js * Delete rust-usfm-parser/src/main * usj_generator added modified: rust-usfm-parser/Cargo.lock modified: rust-usfm-parser/Cargo.toml renamed: rust-usfm-parser/src/input.usfm -> rust-usfm-parser/input.usfm modified: rust-usfm-parser/src/main.rs new file: rust-usfm-parser/src/usj_generator.rs * new file: rust-usfm-parser/src/globals.rs modified: rust-usfm-parser/src/main.rs modified: rust-usfm-parser/src/parser.rs modified: rust-usfm-parser/src/usj_generator.rs * Delete rust-usfm-parser/target directory * Update usjGenerator.rs * Delete .idea directory * Delete rust-usfm-parser/target directory * Delete .idea directory * Update package.json * Update usj_generator.py * Delete rust-usfm-parser/src/usjGenerator.rs * Update usj_generator.rs * Update validator.rs * Update schema.rs * Update parser.rs * Update main.rs * Update globals.rs * Delete rust-usfm-parser/captures.rs * Delete rust-usfm-parser/captures.rs * fixed some bugs modified: rust-usfm-parser/input.usfm modified: rust-usfm-parser/src/usj_generator.rs * Delete node-usfm-parser/package-lock.json * Update input.usfm * lib added new file: rust-usfm-parser/src/lib.rs * lib edit modified: rust-usfm-parser/src/lib.rs * test cases added ,filter addded to parser.rs * Test suit verified * added seperate files for test utils and test_usj functions * added seperate files for test_utils and test_usj functions * format of the errors have updated for test functions * format of the errors have updated for test functions --------- Co-authored-by: amal-J-k <22l024@gecskp.ac.in> Co-authored-by: amal-J-k <143809111+amal-J-k@users.noreply.github.com> Co-authored-by: kavitharaju <kavitharaju18@gmail.com> Co-authored-by: Joel Mathew <mail@joelmath.io>

* latest usj_generator modified: rust-usfm-parser/src/main.rs modified: rust-usfm-parser/src/parser.rs modified: rust-usfm-parser/src/usj_generator.rs modified: rust-usfm-parser/tests/test_usj.rs * cahnged node2_usj modified: rust-usfm-parser/src/main.rs modified: rust-usfm-parser/src/usj_generator.rs * usj_generator changed modified: rust-usfm-parser/src/usj_generator.rs * small changes in node_2_usj modified: rust-usfm-parser/src/usj_generator.rs * debug usj_generator.rs modified: rust-usfm-parser/src/usj_generator.rs * major bug that caused repetition of fixed modified: rust-usfm-parser/src/usj_generator.rs

* latest usj_generator modified: rust-usfm-parser/src/main.rs modified: rust-usfm-parser/src/parser.rs modified: rust-usfm-parser/src/usj_generator.rs modified: rust-usfm-parser/tests/test_usj.rs * cahnged node2_usj modified: rust-usfm-parser/src/main.rs modified: rust-usfm-parser/src/usj_generator.rs * usj_generator changed modified: rust-usfm-parser/src/usj_generator.rs * small changes in node_2_usj modified: rust-usfm-parser/src/usj_generator.rs * debug usj_generator.rs modified: rust-usfm-parser/src/usj_generator.rs * major bug that caused repetition of fixed modified: rust-usfm-parser/src/usj_generator.rs * Update usj_generator.rs

* threading * implimented threading in the new code * updated gitignore * implimented threading in the new code

* Update grammar to support char in rem * Bump version: 3.0.0 → 3.0.1-alpha.1 * Add new test case to check char in rem support

* Fix issue of missing contents when paras are excluded * Always include BCV when converting to List * Copy changes in node to web also * Test and fix * Bump version: 3.0.1-alpha.1 → 3.0.1-alpha.2 * Include link to issue in \rem test case

* Implement biblenlp to usfm conversion * Add a test case for BibleNlp to USFM conversion * Include BibleNLP's default vref in code base * Implement CLI for in_format biblenlp * Add usage instruction for from_biblenlp option in README * Fix some linting issues * Fix more linting issues * Copy biblenlp to usfm conversion logic to node * Add tests in node * Minor test fix * Copy the conversion logic to web module * Tests for web module * Documentation for BibleNlp input usage in web and node modules * Bump version: 3.0.1-alpha.2 → 3.0.1-alpha.3 * Correct Readmes * Correct Readmes * Add links to inner readme files from the outer readme * Allow optional bookcode arg when using full vref and/or texts * Allow optional bookcode arg when using full vref in Node * Allow optional bookcode arg when using full vref in Web module * Test and fix node and web * Allow text list to be of length 23213, 31170 or 41899 * Handle the changes in python CLI * Ignore pylint warning * Ignore pylint warning * Ignore pylint warning

Bumps [tar-fs](https://github.com/mafintosh/tar-fs) from 2.1.1 to 2.1.2. - [Commits](mafintosh/tar-fs@v2.1.1...v2.1.2) --- updated-dependencies: - dependency-name: tar-fs dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix the USJ conversion when there is nestedchar in footnote. In python * Add a test case with char in footnote with and without nesting * Fix the USX conversion in python * Fix the issue in node and web * Update grammar to allow empty \ft and other notes elements * update the usj output in testsuite

* Output version in cli * Bump version: 3.0.1-alpha.3 → 3.0.1-alpha.4 * Validate the usfm starts with \ before parsing * Exempt empty file tests * Change to_list() to generate reference list and change how include exclude works * Implement toList() changes in node * Implement toList() changes in web * Update error message in test for usfm sanity check * Bump version: 3.0.1-alpha.4 → 3.1.0 * Handle linting issues

Upgrades the versions of github actions due to the deprecation. - upload-artifact@v3 to upload-artifacts@v4 - download-artifacts@v3 to download-artifacts@v4

* fix the special handling of c and v not reachable * List xmldom as a peerdependency * Bump version: 3.1.0 → 3.1.1

…#324)

* Update grammar to allow v and empty string in table cell * Bump version: 3.1.1 → 3.1.2 * Add test for empty table cell and v in table * In USX gen, to add verse end, find the previous rows last cell when in for cell of current row * save state of last versetext parent and sid in python * Implement the fix in node and web * Minor change

Bumps [tar-fs](https://github.com/mafintosh/tar-fs) from 2.1.2 to 2.1.3. - [Commits](https://github.com/mafintosh/tar-fs/commits) --- updated-dependencies: - dependency-name: tar-fs dependency-version: 2.1.3 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update grammar to support \q4 * Change parse tree outputs in grammar tests * Add a test case for \q4 * Let \liv, \th, \thr, \tc, \tcr, etc have unlimited numbering * Let \h have unrestricted numbering

* Add grammar rules for \tcc * Update python methods to include \tcc * Add testcase for \tcc * Update web module to support \tcc * Update node module to support \tcc * Fix linting issues * Fix typo

* Update testpypi_publish.yml * Update testpypi_publish.yml * Update pypi_publish.yml * Update pypi_publish.yml * Update pypi_publish.yml * Update pypi_publish.yml * Update pypi_publish.yml

* Allow multiple \sr and \r markers under one \s# * Bump version: 3.1.2 → 3.1.3

Bumps [deepdiff](https://github.com/seperman/deepdiff) from 7.0.1 to 8.6.1. - [Release notes](https://github.com/seperman/deepdiff/releases) - [Changelog](https://github.com/seperman/deepdiff/blob/master/docs/changelog.rst) - [Commits](https://github.com/seperman/deepdiff/commits) --- updated-dependencies: - dependency-name: deepdiff dependency-version: 8.6.1 dependency-type: direct:development ... Signed-off-by: dependabot[bot] <support@github.com>

Bumps [jupyterlab](https://github.com/jupyterlab/jupyterlab) from 3.6.8 to 4.4.8. - [Release notes](https://github.com/jupyterlab/jupyterlab/releases) - [Changelog](https://github.com/jupyterlab/jupyterlab/blob/main/RELEASE.md) - [Commits](https://github.com/jupyterlab/jupyterlab/compare/@jupyterlab/vdom@3.6.8...@jupyterlab/lsp@4.4.8) --- updated-dependencies: - dependency-name: jupyterlab dependency-version: 4.4.8 dependency-type: direct:development ... Signed-off-by: dependabot[bot] <support@github.com>

Bumps [tar-fs](https://github.com/mafintosh/tar-fs) from 2.1.3 to 2.1.4. - [Commits](mafintosh/tar-fs@v2.1.3...v2.1.4) --- updated-dependencies: - dependency-name: tar-fs dependency-version: 2.1.4 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>

… out generated ones in comparison tests

codeant-ai · 2026-04-28T05:22:00Z

CodeAnt AI is reviewing your PR.

Thanks for using CodeAnt! 🎉

We're free for open-source projects. if you're enjoying it, help us grow by sharing.

Share on X ·
Reddit ·
LinkedIn

codeant-ai · 2026-04-28T05:26:52Z

CodeAnt AI is running the review.

Thanks for using CodeAnt! 🎉

We're free for open-source projects. if you're enjoying it, help us grow by sharing.

Share on X ·
Reddit ·
LinkedIn

codeant-ai · 2026-04-28T05:28:20Z

Sequence Diagram

This diagram shows how the new Rust USFMParser accepts multiple input formats, builds a syntax tree with tree sitter, and converts the content into USJ, USX, table, USFM, or BibleNLP formats with optional marker filtering.

sequenceDiagram
    participant Client
    participant USFMParser
    participant TreeSitter
    participant Converters

    Client->>USFMParser: Create from USFM USJ USX or BibleNLP data
    USFMParser->>TreeSitter: Parse USFM string into syntax tree and collect errors
    TreeSitter-->>USFMParser: Syntax tree plus errors warnings
    Client->>USFMParser: Request output in chosen format
    USFMParser->>USFMParser: Apply marker filters and combine adjacent text
    USFMParser->>Converters: Convert filtered USJ to USJ USX table USFM or BibleNLP
    Converters-->>USFMParser: Target format data
    USFMParser-->>Client: Return converted output or write files

Generated by CodeAnt AI

codeant-ai · 2026-04-28T05:30:19Z

CodeAnt AI finished reviewing your PR.

codeant-ai · 2026-04-28T05:31:51Z

+        let (vrefs, texts): (Vec<&str>, Vec<&str>) = if let Some(ref bc) = book_code_upper {
+            let pairs: Vec<(&str, &str)> = input
+                .vref
+                .iter()
+                .zip(input.text.iter())
+                .filter(|(r, _)| r.trim().to_uppercase().starts_with(bc.as_str()))
+                .map(|(r, t)| (r.as_str(), t.as_str()))
+                .collect();
+            pairs.iter().map(|(r, t)| (*r, *t)).unzip()


🟠 Architect Review — HIGH

In biblenlp_to_usfm, when a book_code is provided, vref and text are zipped and filtered before the length check, so mismatched input lists can silently drop extra verses instead of raising the intended length/versification error.

Suggestion: Mirror the Python logic by validating the original vref/text lengths before zipping, and when filtering by book_code, filter both lists in lockstep and then enforce a strict length equality check that raises a ParsingError on mismatch instead of truncating.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is an **Architect / Logical Review** comment left during a code review. These reviews are first-class, important findings — not optional suggestions. Do NOT dismiss this as a 'big architectural change' just because the title says architect review; most of these can be resolved with a small, localized fix once the intent is understood. **Path:** rust-usfm-parser/src/usfm_generator.rs **Line:** 429:437 **Comment:** *HIGH: In biblenlp_to_usfm, when a book_code is provided, vref and text are zipped and filtered before the length check, so mismatched input lists can silently drop extra verses instead of raising the intended length/versification error. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. If a suggested approach is provided above, use it as the authoritative instruction. If no explicit code suggestion is given, you MUST still draft and apply your own minimal, localized fix — do not punt back with 'no suggestion provided, review manually'. Keep the change as small as possible: add a guard clause, gate on a loading state, reorder an await, wrap in a conditional, etc. Do not refactor surrounding code or expand scope beyond the finding. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

codeant-ai · 2026-04-28T05:31:51Z

+                let align = if style.contains("tcc") || style.contains("thc") {
+                    "center"
+                } else if style.ends_with('r') {
+                    "end"
+                } else {
+                    "start"


🟠 Architect Review — HIGH

In USXGenerator::node_2_usx_table, table cell alignment uses style.ends_with('r'), so numbered right-aligned markers like "tcr3" and "thr3" are misclassified as "start" rather than "end", diverging from the USJGenerator alignment logic and the tests' expectations.

Suggestion: Normalize the alignment rule for table cells to match USJGenerator (e.g. detect right alignment with style.contains('r') while keeping the special tcc/thc center check) so numbered tcr*/thr* styles are correctly marked as end-aligned in USX output.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is an **Architect / Logical Review** comment left during a code review. These reviews are first-class, important findings — not optional suggestions. Do NOT dismiss this as a 'big architectural change' just because the title says architect review; most of these can be resolved with a small, localized fix once the intent is understood. **Path:** rust-usfm-parser/src/usx_generator.rs **Line:** 615:620 **Comment:** *HIGH: In USXGenerator::node_2_usx_table, table cell alignment uses style.ends_with('r'), so numbered right-aligned markers like "tcr3" and "thr3" are misclassified as "start" rather than "end", diverging from the USJGenerator alignment logic and the tests' expectations. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. If a suggested approach is provided above, use it as the authoritative instruction. If no explicit code suggestion is given, you MUST still draft and apply your own minimal, localized fix — do not punt back with 'no suggestion provided, review manually'. Keep the change as small as possible: add a guard clause, gate on a loading state, reorder an await, wrap in a conditional, etc. Do not refactor surrounding code or expand scope beyond the finding. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

codeant-ai · 2026-04-28T05:32:11Z

+fn punct_no_space_after() -> &'static Regex {
+    static RE: OnceLock<Regex> = OnceLock::new();
+    RE.get_or_init(|| Regex::new(r#"[,.\-—/;:!?@$%^)}\]>»]$"#).unwrap())
+}


Suggestion: The "no-space-after" punctuation regex is inverted: it currently matches trailing closing punctuation like . and ,, which suppresses spaces after sentence punctuation and can merge words incorrectly during text combination. Use the opening-punctuation set for this check (as in the Python behavior) so spaces are only suppressed where appropriate. [logic error]

Severity Level: Critical 🚨

- ❌ BibleNLP text output merges words after sentence punctuation. - ❌ USJ output with combine_texts=true mis-spaces verse text. - ⚠️ Table/list exports with combine_texts=true may join tokens.

Steps of Reproduction ✅

1. Build and run the Rust CLI in `rust-usfm-parser/src/main.rs` with a real USFM file that contains sentence-ending punctuation, e.g. `tests/usfmjsTests/psa_quotes/origin.usfm` (verified in that file at `tests/usfmjsTests/psa_quotes/origin.usfm:13-16` with text ending in `,` and `.`). 2. Invoke BibleNLP export so that the Rust code paths are exercised by running: `usfm-grammar --infile tests/usfmjsTests/psa_quotes/origin.usfm --out_format biblenlp --ignore_errors` which hits the `match out_format` arm `"biblenlp"` in `rust-usfm-parser/src/main.rs:27-48` and calls `parser.to_biblenlp_format(ignore_errors)` at `main.rs:41`. 3. Inside `USFMParser::to_biblenlp_format` in `rust-usfm-parser/src/usfm_parser.rs:380-401`, the parser builds a USJ tree and then calls `include_markers_in_usj(&usj_root, &keep, true)` at `usfm_parser.rs:17-19`, passing `combine_texts = true`, which in turn causes `include_inner` in `rust-usfm-parser/src/filter.rs:77-146` to call `combine_consecutive_text_contents` at `filter.rs:128-129`. 4. In `combine_consecutive_text_contents` (`rust-usfm-parser/src/filter.rs:37-67`), the decision to insert a space between accumulated `buffer` and incoming `text` uses `!punct_no_space_before().is_match(&text) && !punct_no_space_after().is_match(&buffer)` at `filter.rs:47-48`. However, `punct_no_space_after` is currently defined as `Regex::new(r#"[,.\-—/;:!?@$%^)}\]>»]$"#)` at `filter.rs:16-18`, which matches *closing* punctuation like `.` and `,`. As a result, when a verse fragment ends with `.` (e.g. the `mockers.` at `psa_quotes/origin.usfm:15`) and the next fragment starts with a word, `punct_no_space_after().is_match(&buffer)` returns true and `add_space` becomes false, producing concatenated output like `mockers.Blessed` in the generated BibleNLP text instead of `mockers. Blessed`. Comparing to the reference JavaScript implementation in `web-usfm-parser/src/filters.js:12-16` shows that `punctPatternNoSpaceAfter` there intentionally matches *opening* punctuation (`/[\-—/`@^&({[<"«]$/`), so the Rust version's use of closing punctuation is inverted and causes incorrect space suppression after sentence punctuation.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** rust-usfm-parser/src/filter.rs **Line:** 16:19 **Comment:** *Logic Error: The "no-space-after" punctuation regex is inverted: it currently matches trailing closing punctuation like `.` and `,`, which suppresses spaces after sentence punctuation and can merge words incorrectly during text combination. Use the opening-punctuation set for this check (as in the Python behavior) so spaces are only suppressed where appropriate. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

👍 | 👎

codeant-ai · 2026-04-28T05:32:11Z

+    } else if is_biblenlp {
+        Err("BibleNLP input is not yet supported in this CLI. \
+             Supply a USFM, USJ, or USX file instead.".into())


Suggestion: --in_format biblenlp is advertised as a valid input format, but this branch always returns an error instead of constructing a parser from BibleNLP data. This breaks the CLI contract and makes BibleNLP input unusable. Implement BibleNLP parsing here (including reading vref data) and call USFMParser::from_biblenlp(...) instead of hard-failing. [logic error]

Severity Level: Critical 🚨

- ❌ CLI cannot import BibleNLP data via --in_format biblenlp. - ⚠️ Rust CLI diverges from BibleNLP support promised in PR. - ⚠️ Prevents round-tripping BibleNLP → USFM in Rust tool.

Steps of Reproduction ✅

1. Build the Rust CLI binary for this crate (e.g. `cargo build -p rust-usfm-parser`) so that `rust-usfm-parser/src/main.rs` is compiled into an executable using `main()` at lines 130–404. 2. Run the CLI with an explicit BibleNLP input format, for example: `target/debug/rust-usfm-parser --in_format biblenlp some_input.txt` (the `in_format` flag is accepted because it is registered with `value_parser(["usfm", "usj", "usx", "biblenlp"])` in `main` at lines 153–157). 3. In `main` at lines 243–249, the program calls `handle_input(infile, Some(in_format.unwrap_or("usfm")))`, passing `"biblenlp"` as `in_format`; inside `handle_input` (lines 92–124), `fmt_str` is computed and `is_biblenlp` is set to `true` at line 107 when `fmt_str.as_deref() == Some("biblenlp")`. 4. The `else if is_biblenlp` branch at lines 117–119 is taken and returns `Err("BibleNLP input is not yet supported in this CLI. Supply a USFM, USJ, or USX file instead.".into())`, which `main` treats as a fatal error at lines 243–249 (`eprintln!("Error: {}", e); process::exit(1);`), so any `--in_format biblenlp` invocation fails even though the flag is advertised as valid input and `USFMParser::from_biblenlp(...)` exists in `rust-usfm-parser/src/usfm_parser.rs:20–28`.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** rust-usfm-parser/src/main.rs **Line:** 117:119 **Comment:** *Logic Error: `--in_format biblenlp` is advertised as a valid input format, but this branch always returns an error instead of constructing a parser from BibleNLP data. This breaks the CLI contract and makes BibleNLP input unusable. Implement BibleNLP parsing here (including reading vref data) and call `USFMParser::from_biblenlp(...)` instead of hard-failing. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

👍 | 👎

codeant-ai · 2026-04-28T05:32:11Z

+                    for row in rows {
+                        let line = [
+                            row.book.as_str(),
+                            row.chapter.as_str(),
+                            row.verse.as_str(),
+                            row.text.as_str(),
+                            row.r#type.as_str(),
+                            row.marker.as_str(),
+                        ]
+                        .join(col_sep);
+                        write!(out, "{}{}", line, row_sep).unwrap();


Suggestion: Table output is generated by raw string joining, so any field containing the chosen separator or row separator will corrupt the table structure (shifted columns or broken rows). Use a proper CSV/TSV writer with escaping/quoting instead of manual join. [logic error]

Severity Level: Major ⚠️

- ⚠️ Table/CSV exports can be malformed when text has separators. - ⚠️ Downstream spreadsheet imports may misalign verse columns. - ⚠️ Rust CLI behavior diverges from Python csv.writer implementation.

Steps of Reproduction ✅

1. Build the Rust CLI binary for this crate (e.g. `cargo build -p rust-usfm-parser`) so `main()` in `rust-usfm-parser/src/main.rs:130–404` is used. 2. Prepare an input that leads to a `ListRow` whose `text` includes the chosen column or row separator; for example, create a small USFM file under `tests-data/` where a verse text contains a literal tab character (`\t`) or newline, which `USFMParser::to_list(...)` ultimately copies into `ListRow.text` without stripping (see `rust-usfm-parser/src/list_generator.rs:141–169` where `text.to_string()` is assigned directly). 3. Run the CLI with table output: `target/debug/rust-usfm-parser --out_format table --csv_col_sep "\t" --csv_row_sep "\n" path/to/that.usfm`. In `main` at lines 297–307, the program calls `parser.to_list(...)` and iterates `rows` at lines 308–320. 4. For each row, `main.rs` lines 311–319 build `line` by `[..].join(col_sep)` and then write it with `write!(out, "{}{}", line, row_sep)` at line 320; when `row.text` contains `col_sep` or `row_sep`, the output row gains extra separators or embedded newlines, so downstream CSV/TSV consumers (e.g. Python's `csv.reader` or spreadsheet tools) see shifted columns or split rows instead of one logical record—unlike the Python CLI, which uses `csv.writer` with proper quoting in `py-usfm-parser/src/usfm_grammar/__main__.py:18–30`.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** rust-usfm-parser/src/main.rs **Line:** 310:320 **Comment:** *Logic Error: Table output is generated by raw string joining, so any field containing the chosen separator or row separator will corrupt the table structure (shifted columns or broken rows). Use a proper CSV/TSV writer with escaping/quoting instead of manual `join`. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

👍 | 👎

codeant-ai · 2026-04-28T05:32:30Z

+            let v_end = append_empty(parent, "verse");
+            v_end.set_attr("eid", eid.as_str());
+        }


Suggestion: Verse end milestones are closed on the current parent instead of the paragraph/row that actually owned the previous verse content. When a verse boundary crosses into a new paragraph or table row, this will attach the eid to the wrong node and produce incorrect USX verse structure. Track and use the previous verse container (like the Python prev_verse_parent) when emitting the previous eid. [logic error]

Severity Level: Critical 🚨

- ❌ USFM-to-USX conversion misplaces verse closing milestones. - ⚠️ Round-trip conversions can desynchronize verse boundaries.

Steps of Reproduction ✅

1. Construct a `USXGenerator` with USFM input where a verse ends in one paragraph and the next `\v` marker starts a new paragraph (e.g., `\p \v 1 ...` followed by `\p \v 2 ...` in the same chapter) and obtain a tree-sitter AST root node. 2. Call `USXGenerator::get_usx` at `rust-usfm-parser/src/usx_generator.rs:142`, which dispatches to `node_2_usx_chapter` for the chapter node at `rust-usfm-parser/src/usx_generator.rs:293`. 3. Inside `node_2_usx_chapter` (lines 293–301), the first `paragraph` child is handled by `node_2_usx` with `parent` pointing to the USX root; `node_2_usx_para` (lines 460–514) creates the first `<para>` and, when it encounters `v 1`, calls `node_2_usx_verse` (lines 376–433) with `parent` pointing to that first `<para>`, setting `prev_verse_sid_to_close = Some("BOOK 1:1")`. 4. When the second `paragraph` (containing `v 2`) is processed, `node_2_usx_para` creates a second `<para>`; its `v 2` node again calls `node_2_usx_verse`, but at entry the `parent` now points to the second `<para>`. The code at lines 379–381 appends `<verse eid="BOOK 1:1"/>` to the second `<para>`, so the end milestone for verse 1 is attached to the wrong container instead of the paragraph or row that actually contained verse 1's content.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** rust-usfm-parser/src/usx_generator.rs **Line:** 379:381 **Comment:** *Logic Error: Verse end milestones are closed on the current `parent` instead of the paragraph/row that actually owned the previous verse content. When a verse boundary crosses into a new paragraph or table row, this will attach the `eid` to the wrong node and produce incorrect USX verse structure. Track and use the previous verse container (like the Python `prev_verse_parent`) when emitting the previous `eid`. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

👍 | 👎

codeant-ai · 2026-04-28T05:32:30Z

+// #[test]
+fn test_compare_usx_with_testsuite_samples() {


Suggestion: The main test that compares generated USX with testsuite reference XML is not annotated as a test, so it never runs. This silently removes regression coverage for exact USX output differences. Re-enable the test annotation so comparison failures are caught in CI. [code quality]

Severity Level: Critical 🚨

- ❌ `cargo test` omits USX-vs-origin.xml regression coverage. - ⚠️ Structural USX changes may ship without detection. - ⚠️ Differences in normalized whitespace may go unnoticed. - ⚠️ Bible testsuite origin.xml alignment not enforced by Rust tests.

Steps of Reproduction ✅

1. Open `rust-usfm-parser/tests/test_usx_conversion.rs` and locate the section `// test_compare_usx_with_testsuite_samples` starting at line 146. 2. Observe at lines 213–214 that the test attribute is commented out: `// #[test]` followed by `fn test_compare_usx_with_testsuite_samples() {`, meaning this function is a plain helper and not a Rust test. 3. Search for usages with `rg "test_compare_usx_with_testsuite_samples" -n /workspace/usfm-grammar` and confirm that the only occurrence is this definition; no other test or module calls it, so it is never executed during `cargo test`. 4. Run the Rust test suite for this crate (e.g., `cd rust-usfm-parser && cargo test`) and note that only `test_successful_usx_conversion` (lines 73–89), `test_all_markers_are_in_usx_output` (lines 95–143), and `test_usx_round_tripping` (lines 294–339) execute; there is no output or failure related to exact USX-vs-origin.xml comparison even though `test_compare_usx_with_testsuite_samples` contains that logic at lines 215–287, so regressions in exact USX serialization can pass unnoticed.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** rust-usfm-parser/tests/test_usx_conversion.rs **Line:** 213:214 **Comment:** *Code Quality: The main test that compares generated USX with testsuite reference XML is not annotated as a test, so it never runs. This silently removes regression coverage for exact USX output differences. Re-enable the test annotation so comparison failures are caught in CI. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

👍 | 👎

codeant-ai · 2026-04-28T05:33:14Z

+        if !parser.errors.is_empty() {
+            continue;
+        }


Suggestion: This test silently skips files that fail parsing instead of failing the test, which can hide real regressions in marker preservation. Treat parse errors as failures here so a valid fixture that starts failing does not get ignored. [logic error]

Severity Level: Major ⚠️

- ⚠️ Marker coverage test may pass while fixture parsing fails. - ⚠️ Parser regressions may slip through CI unnoticed.

Steps of Reproduction ✅

1. Run `cargo test test_usj_all_markers_are_in_output` in `rust-usfm-parser/` so the test at `rust-usfm-parser/tests/test_json_conversion.rs:214-246` executes. 2. Inside this test, the loop over `positive_files()` (constructed from `negative_tests()` in `rust-usfm-parser/tests/common.rs:126-131`) initializes a parser for each `origin.usfm` via `initialise_parser` at `rust-usfm-parser/tests/common.rs:176-181`. 3. For any supposedly "positive" fixture where `USFMParser::new` produces errors, `parser.errors` is non-empty and the guard at `test_json_conversion.rs:221-223` (`if !parser.errors.is_empty() { continue; }`) skips that file without pushing anything into the local `failures` vector. 4. The final assertion at `test_json_conversion.rs:245` only checks `failures.is_empty()`, so the test still passes even though at least one "valid" USFM fixture failed to parse and its marker preservation was never validated (unlike the Python original at `py-usfm-parser/tests/test_json_conversion.py:12-21`, which asserts on `test_parser.errors`).

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** rust-usfm-parser/tests/test_json_conversion.rs **Line:** 221:223 **Comment:** *Logic Error: This test silently skips files that fail parsing instead of failing the test, which can hide real regressions in marker preservation. Treat parse errors as failures here so a valid fixture that starts failing does not get ignored. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

👍 | 👎

codeant-ai · 2026-04-28T05:33:14Z

+        if !parser1.errors.is_empty() {
+            continue;
+        }


Suggestion: In the round-trip test, parse errors in the source file are ignored with continue, so the test can falsely pass without exercising USJ→USFM round-tripping for affected fixtures. Record these as failures instead of skipping. [logic error]

Severity Level: Critical 🚨

- ⚠️ Round-trip conversion regressions may not be detected. - ⚠️ USJ→USFM pipeline untested for some valid fixtures.

Steps of Reproduction ✅

1. Run `cargo test test_usj_round_tripping` so the round-trip test at `rust-usfm-parser/tests/test_json_conversion.rs:255-291` executes. 2. For each file in `positive_files()` (from `tests/common.rs:126-131`), the test calls `initialise_parser(&path)` at `tests/common.rs:176-181`, creating `parser1` with any parse diagnostics stored in `parser1.errors`. 3. If a "positive" USFM fixture now produces parse errors (e.g. due to a regression in `USFMParser::new`), the guard at `test_json_conversion.rs:260-262` (`if !parser1.errors.is_empty() { continue; }`) skips that fixture entirely, so neither `parser1.to_usj` (lines 264-269) nor `USFMParser::from_usj` and the re-parse at lines 272-282 are exercised. 4. Because no entry is added to the local `failures` vector for such skipped fixtures, the final `assert!(failures.is_empty())` at `test_json_conversion.rs:291` still passes, meaning USFM→USJ→USFM round-tripping is not validated for the failing files (in contrast to the Python test at `py-usfm-parser/tests/test_json_conversion.py:48-57`, which asserts on `test_parser1.errors`).

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** rust-usfm-parser/tests/test_json_conversion.rs **Line:** 260:262 **Comment:** *Logic Error: In the round-trip test, parse errors in the source file are ignored with `continue`, so the test can falsely pass without exercising USJ→USFM round-tripping for affected fixtures. Record these as failures instead of skipping. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

👍 | 👎

codeant-ai · 2026-04-28T05:33:14Z

+        if !parser.errors.is_empty() {
+            continue;
+        }


Suggestion: This comparison test also skips files when parsing fails, which hides regressions and makes results unreliable. The test should fail for unexpected parser errors on positive fixtures. [logic error]

Severity Level: Major ⚠️

- ⚠️ USJ vs origin.json mismatches may go untested. - ⚠️ Parser failures can hide broken JSON comparison logic.

Steps of Reproduction ✅

1. Run `cargo test test_compare_usj_with_testsuite_samples` so the comparison test at `rust-usfm-parser/tests/test_json_conversion.rs:333-380` executes. 2. For each path returned by `positive_files()` (from `tests/common.rs:126-131`), the test constructs `parser` via `initialise_parser(&path)` at `tests/common.rs:176-181`, populating `parser.errors` if the USFM fails to parse. 3. When any "positive" fixture yields non-empty `parser.errors`, the guard at `test_json_conversion.rs:342-344` (`if !parser.errors.is_empty() { continue; }`) skips that fixture, so no USJ is generated and no comparison to `origin.json` is performed for that file. 4. Since skipped files do not add entries to the `failures` vector, the final assertion at `test_json_conversion.rs:379` (`assert!(failures.is_empty(), ...)`) can still pass, meaning the test suite may report success while some "valid" fixtures both fail to parse and never have their USJ compared to the reference JSON (whereas the Python test at `py-usfm-parser/tests/test_json_conversion.py:198-215` asserts on `test_parser.errors` before comparison).

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** rust-usfm-parser/tests/test_json_conversion.rs **Line:** 342:344 **Comment:** *Logic Error: This comparison test also skips files when parsing fails, which hides regressions and makes results unreliable. The test should fail for unexpected parser errors on positive fixtures. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

👍 | 👎

codeant-ai · 2026-04-28T05:33:15Z

+        let mut origin_usj: Value = match fs::read_to_string(&usj_path) {
+            Ok(text) => match serde_json::from_str(&text) {
+                Ok(v) => v,
+                Err(_) => continue,
+            },
+            Err(_) => continue, // no reference file — skip
+        };


Suggestion: Errors while reading or parsing origin.json are swallowed and treated as skips, so missing/corrupt reference fixtures are never reported and the test can pass without performing the intended comparison. Only skip missing files if that is intended; malformed JSON should fail the test. [logic error]

Severity Level: Major ⚠️

- ⚠️ Corrupted origin.json fixtures are silently ignored in tests. - ⚠️ USJ comparison coverage weaker than Python test suite.

Steps of Reproduction ✅

1. Run `cargo test test_compare_usj_with_testsuite_samples` to execute the test at `rust-usfm-parser/tests/test_json_conversion.rs:333-380`. 2. For each "positive" fixture path, the test computes `usj_path = path.with_file_name("origin.json")` at line 352 and then executes the nested `match` at lines 353-359, calling `fs::read_to_string(&usj_path)` and `serde_json::from_str(&text)`. 3. If the `origin.json` file exists but is malformed JSON (e.g. truncated or hand-edited incorrectly), `serde_json::from_str` returns `Err`, and the inner arm at `test_json_conversion.rs:354-357` executes `Err(_) => continue`, causing the test to silently skip that fixture instead of recording a failure in the `failures` vector. 4. Because missing files are also skipped via the outer `Err(_) => continue` at line 358 (matching the Python test's `FileNotFoundError` handling at `py-usfm-parser/tests/test_json_conversion.py:208-217`), the final `assert!(failures.is_empty())` at `test_json_conversion.rs:379` can still succeed even when a reference `origin.json` is present but invalid and never actually compared.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** rust-usfm-parser/tests/test_json_conversion.rs **Line:** 353:359 **Comment:** *Logic Error: Errors while reading or parsing `origin.json` are swallowed and treated as skips, so missing/corrupt reference fixtures are never reported and the test can pass without performing the intended comparison. Only skip missing files if that is intended; malformed JSON should fail the test. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

👍 | 👎

codeant-ai · 2026-04-28T05:33:59Z

+                    let val_str = val.as_str().unwrap_or("");
+                    if key == "file" {
+                        self.usfm_string.push_str(&format!("src=\"{}\" ", val_str));
+                    } else {
+                        self.usfm_string.push_str(&format!("{}=\"{}\" ", key, val_str));
+                    }


Suggestion: Attribute values are written into quoted USFM attributes without escaping embedded quotes, so a value containing " will break attribute syntax and can corrupt the regenerated USFM. Sanitize or escape quotes in USJ attribute values before interpolation, consistently with the USX conversion path. [possible bug]

Severity Level: Major ⚠️

- ❌ CLI USJ→USX fails when attributes contain double quotes. - ⚠️ USJ→USFM round-trip produces syntactically malformed USFM. - ⚠️ Extra attributes after quote lost or misparsed on parse.

Steps of Reproduction ✅

1. Prepare a USJ JSON file for the CLI input, using the format consumed in `rust-usfm-parser/src/main.rs:96-18` when `is_json` is true (extension `.json` or `.usj`). Include at least one object with an extra attribute whose value contains a double quote, e.g.: ```json { "type": "char", "marker": "x", "content": [], "custom": "foo\"bar" }

Run the rust-usfm-parser CLI (binary with main in
rust-usfm-parser/src/main.rs:35-53) with this file as infile, --in_format=usj (or
.json extension), --out_format=usx and without --ignore_errors, so is_json is
true at main.rs:14-18 and USFMParser::from_usj(&usj) is called at main.rs:15-17.

Inside USFMParser::from_usj (rust-usfm-parser/src/usfm_parser.rs:10-17), the code
constructs a USFMGenerator and calls usj_to_usfm(usj, false)
(usfm_generator.rs:100-104). For the USJ object with "custom": "foo"bar", the "custom" key is not in NON_ATTRIB_USJ_KEYS (usfm_generator.rs:15-18), so the "Extra attributes" block at usfm_generator.rs:178-195runs:val_strbecomes"foo"bar" (line 187), and self.usfm_string.push_str(&format!("custom="{}" ", val_str));atline 191-192producescustom="foo"bar "`. This string has an embedded double quote that prematurely
closes the attribute value, making the generated USFM syntactically malformed.

Still inside from_usj, the malformed USFM is passed to USFMParser::build
(usfm_parser.rs:45-66), which parses it with tree-sitter and collects ERROR nodes into
self.errors (usfm_parser.rs:67-79). When the CLI later calls
parser.to_usx(ignore_errors) at rust-usfm-parser/src/main.rs:19-21 with
ignore_errors false by default (main.rs:201-205,225), guard_errors at
usfm_parser.rs:42-48 sees non-empty self.errors, returns an error, and the CLI prints
the error and exits with code 1 at main.rs:21-29. Thus, any USJ input containing an
extra attribute value with a double quote reliably breaks USJ→USFM→USX conversion.

</details> [Fix in Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20rust-usfm-parser%2Fsrc%2Fusfm_generator.rs%0A%2A%2ALine%3A%2A%2A%20187%3A192%0A%2A%2AComment%3A%2A%2A%0A%09%2APossible%20Bug%3A%20Attribute%20values%20are%20written%20into%20quoted%20USFM%20attributes%20without%20escaping%20embedded%20quotes%2C%20so%20a%20value%20containing%20%60%22%60%20will%20break%20attribute%20syntax%20and%20can%20corrupt%20the%20regenerated%20USFM.%20Sanitize%20or%20escape%20quotes%20in%20USJ%20attribute%20values%20before%20interpolation%2C%20consistently%20with%20the%20USX%20conversion%20path.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A) | [Fix in VSCode Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20rust-usfm-parser%2Fsrc%2Fusfm_generator.rs%0A%2A%2ALine%3A%2A%2A%20187%3A192%0A%2A%2AComment%3A%2A%2A%0A%09%2APossible%20Bug%3A%20Attribute%20values%20are%20written%20into%20quoted%20USFM%20attributes%20without%20escaping%20embedded%20quotes%2C%20so%20a%20value%20containing%20%60%22%60%20will%20break%20attribute%20syntax%20and%20can%20corrupt%20the%20regenerated%20USFM.%20Sanitize%20or%20escape%20quotes%20in%20USJ%20attribute%20values%20before%20interpolation%2C%20consistently%20with%20the%20USX%20conversion%20path.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A) *(Use Cmd/Ctrl + Click for best experience)* <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** rust-usfm-parser/src/usfm_generator.rs **Line:** 187:192 **Comment:** *Possible Bug: Attribute values are written into quoted USFM attributes without escaping embedded quotes, so a value containing `"` will break attribute syntax and can corrupt the regenerated USFM. Sanitize or escape quotes in USJ attribute values before interpolation, consistently with the USX conversion path. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

👍 | 👎

codeant-ai · 2026-04-28T05:33:59Z

+        let sid = format!(
+            "{} {}:{}",
+            self.parse_state.book_slug.as_deref().unwrap_or(""),
+            self.parse_state.current_chapter.as_deref().unwrap_or(""),
+            num
+        );
+
+        let mut v_obj = json!({
+            "type":   "verse",
+            "marker": "v",
+            "number": num,
+            "sid":    sid,


Suggestion: The verse reference id is always constructed and emitted even when book or chapter state is missing, which produces malformed references like empty-book/empty-chapter SIDs on partial or error-tolerant parses. Only set a SID when both required state values are available, otherwise omit it. [logic error]

Severity Level: Major ⚠️

- ⚠️ USJ from to_usj() can contain malformed verse sid fields. - ⚠️ BibleNLP export via to_biblenlp_format() inherits invalid SIDs. - ⚠️ Downstream tools consuming sid cannot rely on verse references.

Steps of Reproduction ✅

1. Prepare malformed or partial USFM that includes a `\v` verse marker without a preceding `\id` book marker or `\c` chapter marker, so the tree-sitter parse cannot populate book/chapter state (grammar for `id` and `chapter` is in `tree-sitter-usfm3/grammar.js:45-48` and `146-154`). 2. In Rust, call `USFMParser::to_usj(exclude_markers, include_markers, ignore_errors = true, combine_texts)` from `rust-usfm-parser/src/usfm_parser.rs:26-48` so that `guard_errors()` allows generation despite parse errors caused by the missing markers. 3. `to_usj()` builds a `USJGenerator` and calls `get_usj(self.syntax_tree.root_node())`; within `get_usj`, `node_2_usj()` (`rust-usfm-parser/src/usj_generator.rs:382-447`) dispatches each `v` node to `node_2_usj_verse()` at `rust-usfm-parser/src/usj_generator.rs:270-330`. 4. In `node_2_usj_verse()`, the `sid` is constructed at `usj_generator.rs:308-313` using `self.parse_state.book_slug.as_deref().unwrap_or("")` and `self.parse_state.current_chapter.as_deref().unwrap_or("")`, both still `None` because no `id` or `c` node was processed, so the resulting USJ verse objects contain malformed identifiers like `"sid": " :5"` with empty book and chapter components.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** rust-usfm-parser/src/usj_generator.rs **Line:** 308:319 **Comment:** *Logic Error: The verse reference id is always constructed and emitted even when book or chapter state is missing, which produces malformed references like empty-book/empty-chapter SIDs on partial or error-tolerant parses. Only set a SID when both required state values are available, otherwise omit it. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

👍 | 👎

codeant-ai · 2026-04-28T05:33:59Z

+                if nestable {
+                    self.node_2_usj(child, &mut para_obj);
+                } else {
+                    if !node_added {
+                        Self::push_obj(parent, para_obj.clone());
+                        node_added = true;
+                    } 
+                    self.node_2_usj(child, parent);
+                }


Suggestion: The paragraph object is pushed as a clone the first time a non-nestable child is seen, and later nestable children are still appended to the local para_obj that is no longer connected to the output tree. This can silently drop trailing inline content from the generated USJ. Push the paragraph object once up front (or mutate the already-pushed node) so all nestable children are preserved. [logic error]

Severity Level: Critical 🚨

- ❌ USJ output from to_usj() drops text after figures. - ❌ List output from to_list() omits trailing heading content. - ⚠️ BibleNLP export via to_biblenlp_format() may miss title text.

Steps of Reproduction ✅

1. Create a USFM file that uses a generic paragraph-style marker such as `\cd` with inline text, an embedded figure, and trailing text, for example: `\cd First part \fig ...\fig* second part`. The `cd` rule allowing `fig` is defined in `tree-sitter-usfm3/grammar.js:180-186`. 2. In Rust, construct a parser and call `USFMParser::to_usj(..)` from `rust-usfm-parser/src/usfm_parser.rs:26-48` on this USFM, so it builds a syntax tree and then invokes `USJGenerator::get_usj(self.syntax_tree.root_node())`. 3. Inside `USJGenerator::node_2_usj()` at `rust-usfm-parser/src/usj_generator.rs:382-447`, the `cd` node is dispatched to `node_2_usj_generic()` (`rust-usfm-parser/src/usj_generator.rs:306-352`), which creates a local `para_obj` and then iterates its children. 4. In `node_2_usj_generic()`, the first `text` child is treated as `nestable` and appended to `para_obj`, the `fig` child is treated as non-nestable and causes `para_obj.clone()` to be pushed to `parent` at `usj_generator.rs:738-743`, and any subsequent `text` children (the "second part") are again treated as `nestable` and appended only to the local `para_obj` that is no longer pushed. Inspecting the `to_usj()` JSON output shows that text after the `\fig` (e.g. "second part") is missing from the generated USJ.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** rust-usfm-parser/src/usj_generator.rs **Line:** 738:746 **Comment:** *Logic Error: The paragraph object is pushed as a clone the first time a non-nestable child is seen, and later nestable children are still appended to the local `para_obj` that is no longer connected to the output tree. This can silently drop trailing inline content from the generated USJ. Push the paragraph object once up front (or mutate the already-pushed node) so all nestable children are preserved. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

👍 | 👎

codeant-ai · 2026-04-28T05:34:04Z

CodeAnt AI finished running the review.

Thanks for using CodeAnt! 🎉

We're free for open-source projects. if you're enjoying it, help us grow by sharing.

Share on X ·
Reddit ·
LinkedIn

Kali-05 and others added 30 commits January 29, 2025 20:30

Threding (#320)

5cf20e3

* threading * implimented threading in the new code * updated gitignore * implimented threading in the new code

Char in rem (#306)

07336ce

* Update grammar to support char in rem * Bump version: 3.0.0 → 3.0.1-alpha.1 * Add new test case to check char in rem support

List with bcv (#307)

9a0b936

* Fix issue of missing contents when paras are excluded * Always include BCV when converting to List * Copy changes in node to web also * Test and fix * Bump version: 3.0.1-alpha.1 → 3.0.1-alpha.2 * Include link to issue in \rem test case

Update pypi_publish github action (#321)

411c381

Upgrades the versions of github actions due to the deprecation. - upload-artifact@v3 to upload-artifacts@v4 - download-artifacts@v3 to download-artifacts@v4

3.1.0 node n web issues (#323)

ba77757

* fix the special handling of c and v not reachable * List xmldom as a peerdependency * Bump version: 3.1.0 → 3.1.1

try fix the artifact already exists issue in py pyblish on gitactions (…

d35f0f1

…#324)

q4 support #330 (#332)

a8daae0

* Update grammar to support \q4 * Change parse tree outputs in grammar tests * Add a test case for \q4 * Let \liv, \th, \thr, \tc, \tcr, etc have unlimited numbering * Let \h have unrestricted numbering

New marker tcc (#334)

16efbf3

* Add grammar rules for \tcc * Update python methods to include \tcc * Add testcase for \tcc * Update web module to support \tcc * Update node module to support \tcc * Fix linting issues * Fix typo

Hotfix/pypi publish june2025 (#336)

74c19fb

* Update testpypi_publish.yml * Update testpypi_publish.yml * Update pypi_publish.yml * Update pypi_publish.yml * Update pypi_publish.yml * Update pypi_publish.yml * Update pypi_publish.yml

Multiple sr (#343)

99d2b89

* Allow multiple \sr and \r markers under one \s# * Bump version: 3.1.2 → 3.1.3

resolve conflict

455111a

Bump version: 3.1.3 → 3.2.0-beta.1

9bc253c

Not to commit packagae-lock.json

d7e0c0b

Fix spelling mistake for "Python" in README.md

a5bac6a

remove pnpm-lock and unused eslintrc

742d322

kavitharaju added 14 commits April 9, 2026 16:44

Test and fix: filtering usj

6bfe069

Test and fix: Correct table cell in usj generation

8ec76de

Test and fix: handle space handling difference in test suite usjs and…

7b6ea21

… out generated ones in comparison tests

Test and fix: Correct table cell align in testsuite samples

ec7ff94

Test and fix: add space notmalization in USX comparison tests

8bfe389

Test and fix: Try some more special handling for USX comparison

24d4fa9

Add CLI support in main.rs

71599f8

Test and fix: add verse end tag to previous para at chapter end.

62b0682

Test and fix: Remove quick xml dependency

4b90a3e

replace the old rust implementation with new

80a7ee0

Add a readme.md

0b5575a

remove old readme

7a36327

Correct readme

dc4ef01

Make a beta.1 release

5d2b906

kavitharaju requested a review from joelthe1 April 28, 2026 05:21

kavitharaju added the Epic label Apr 28, 2026

codeant-ai Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files label Apr 28, 2026

Merge branch 'master' into rust

2ff7b3b

codeant-ai Bot added size:XXL This PR changes 1000+ lines, ignoring generated files and removed size:XXL This PR changes 1000+ lines, ignoring generated files labels Apr 28, 2026

codeant-ai Bot reviewed Apr 28, 2026

View reviewed changes

Conversation

kavitharaju commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

User description

Uh oh!

codeant-ai Bot commented Apr 28, 2026

Thanks for using CodeAnt! 🎉

Uh oh!

codeant-ai Bot commented Apr 28, 2026

Thanks for using CodeAnt! 🎉

Uh oh!

codeant-ai Bot commented Apr 28, 2026

Sequence Diagram

Uh oh!

codeant-ai Bot commented Apr 28, 2026

Uh oh!

codeant-ai Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

codeant-ai Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

codeant-ai Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

codeant-ai Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

codeant-ai Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

codeant-ai Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

codeant-ai Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

codeant-ai Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

codeant-ai Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

codeant-ai Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

codeant-ai Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

codeant-ai Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

codeant-ai Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

codeant-ai Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

codeant-ai Bot commented Apr 28, 2026

Thanks for using CodeAnt! 🎉

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

kavitharaju commented Apr 28, 2026 •

edited

Loading