|
| 1 | + |
1 | 2 | # Task Plan: Enhance `strs_tools::split` to Support Unescaping in Quoted Strings |
2 | 3 |
|
3 | 4 | ### Goal |
|
12 | 13 | ### Progress |
13 | 14 | * **Roadmap Milestone:** N/A |
14 | 15 | * **Primary Editable Crate:** `module/core/strs_tools` |
15 | | -* **Overall Progress:** 5/7 increments complete |
| 16 | +* **Overall Progress:** 5/8 increments complete |
16 | 17 | * **Increment Status:** |
17 | 18 | * ✅ Increment 1: Setup and Analysis |
18 | 19 | * ✅ Increment 2: API Change - Use `Cow` for `Split.string` |
19 | 20 | * ✅ Increment 3: Fix Compilation Errors |
20 | 21 | * ✅ Increment 4: Implement Unescaping Logic |
21 | 22 | * ✅ Increment 5: Implement Quoted Segment Logic |
22 | | - * ⏳ Increment 6: Add New Tests for Unescaping and Quoting |
23 | | - * ⚫ Increment 7: Finalization |
| 23 | + * ⏳ Increment 6: Fix Spurious Empty Segment Bug |
| 24 | + * ⚫ Increment 7: Fix Incorrect Escape Handling Bug |
| 25 | + * ⚫ Increment 8: Finalization |
24 | 26 |
|
25 | 27 | ### Permissions & Boundaries |
26 | 28 | * **Mode:** code |
|
37 | 39 | * `module/core/strs_tools/src/lib.rs` |
38 | 40 | * `module/core/strs_tools/Cargo.toml` |
39 | 41 | * `module/core/strs_tools/tests/inc/split_test/quoting_options_tests.rs` |
| 42 | + * `module/core/strs_tools/tests/inc/split_test/quoting_and_unescaping_tests.rs` |
| 43 | + * `module/core/strs_tools/tests/inc/split_test/combined_options_tests.rs` |
40 | 44 | * Crates for Documentation (for AI's reference, if `read_file` on docs is planned): |
41 | 45 | * `strs_tools` |
42 | 46 |
|
|
55 | 59 | ### Increments |
56 | 60 | ##### Increment 1: Setup and Analysis |
57 | 61 | * **Goal:** Read all relevant files to build a complete understanding of the current implementation of the `split` iterator and its tests. |
58 | | -* **Specification Reference:** N/A |
59 | | -* **Steps:** |
60 | | - * Step 1: Use `read_file` to load the content of: |
61 | | - * `module/core/strs_tools/src/string/split.rs` |
62 | | - * `module/core/strs_tools/src/lib.rs` |
63 | | - * `module/core/strs_tools/Cargo.toml` |
64 | | - * `module/core/strs_tools/tests/inc/split_test/quoting_options_tests.rs` |
65 | | - * Step 2: Analyze the read files to understand the current implementation of `Split`, `SplitIterator`, and how quoting is handled. |
66 | | -* **Increment Verification:** |
67 | | - * Step 1: Confirm that all files were read successfully. |
68 | 62 | * **Commit Message:** `chore(strs_tools): Begin refactoring of split iterator for unescaping` |
69 | 63 |
|
70 | 64 | ##### Increment 2: API Change - Use `Cow` for `Split.string` |
71 | 65 | * **Goal:** Modify the `Split` struct to use `Cow<'a, str>` for its `string` field to support returning owned, unescaped strings. |
72 | | -* **Specification Reference:** "API Change Consideration" in the original proposal. |
73 | | -* **Steps:** |
74 | | - * Step 1: In `module/core/strs_tools/src/string/split.rs`, change the type of the `string` field in the `Split` struct from `&'a str` to `Cow<'a, str>`. |
75 | | - * Step 2: Update the `Debug` and any other trait implementations for `Split` to handle the `Cow`. |
76 | | - * Step 3: Attempt to compile the crate using `timeout 90 cargo build -p strs_tools`. Expect failures. |
77 | | - * Step 4: Use the compiler output to identify all locations that need to be updated due to this breaking change. |
78 | | -* **Increment Verification:** |
79 | | - * Step 1: The `Split` struct definition in `split.rs` must be updated to use `Cow<'a, str>`. |
80 | | - * Step 2: The `cargo build` command should fail, and the output should indicate errors related to the type change. |
81 | 66 | * **Commit Message:** `feat(strs_tools): Change Split.string to Cow to support unescaping` |
82 | 67 |
|
83 | 68 | ##### Increment 3: Fix Compilation Errors |
84 | 69 | * **Goal:** Resolve all compilation errors caused by the change of `Split.string` to `Cow<'a, str>`. |
85 | | -* **Specification Reference:** N/A |
86 | | -* **Steps:** |
87 | | - * Step 1: Based on the compiler errors from the previous increment, systematically update all code that creates or uses `Split` instances. This will likely involve wrapping existing `&str` values in `Cow::Borrowed(...)` and preparing for `Cow::Owned(...)`. |
88 | | - * Step 2: Run `timeout 90 cargo build -p strs_tools` repeatedly until all compilation errors are resolved. |
89 | | -* **Increment Verification:** |
90 | | - * Step 1: The command `timeout 90 cargo build -p strs_tools` must pass successfully. |
91 | | - * Step 2: Run `timeout 90 cargo test -p strs_tools`. Some tests may fail due to logic changes, but it should compile. |
92 | 70 | * **Commit Message:** `fix(strs_tools): Adapt codebase to Cow-based Split.string` |
93 | 71 |
|
94 | 72 | ##### Increment 4: Implement Unescaping Logic |
95 | 73 | * **Goal:** Implement the core logic to unescape characters within a string slice. |
96 | | -* **Specification Reference:** "Perform unescaping of standard escape sequences" from the proposal. |
97 | | -* **Steps:** |
98 | | - * Step 1: Use `read_file` to load `module/core/strs_tools/src/string/split.rs`. |
99 | | - * Step 2: In `module/core/strs_tools/src/string/split.rs`, add a new private helper function `fn unescape_str( input: &str ) -> Cow< '_, str >`. |
100 | | - * Step 3: Implement the logic for `unescape_str`: |
101 | | - * Search for the `\` character. If it's not found, return `Cow::Borrowed(input)`. |
102 | | - * If `\` is found, iterate through the input string's characters to build a new `String`. |
103 | | - * When a `\` is encountered, inspect the next character to handle valid escape sequences (`\"`, `\\`, `\n`, `\t`, `\r`) by appending their literal counterparts. |
104 | | - * If an escape sequence is not one of the recognized ones, append both the `\` and the character that follows it literally. |
105 | | - * Append all other characters as-is. |
106 | | - * Return `Cow::Owned(new_string)`. |
107 | | - * Step 4: In `module/core/strs_tools/src/string/split.rs`, add a new test module `#[cfg(test)] mod unescape_tests { ... }` at the end of the file. |
108 | | - * Step 5: Inside `unescape_tests`, add unit tests for the `unescape_str` function to cover various scenarios: |
109 | | - * A string with no escape sequences. |
110 | | - * Strings with each of the valid escape sequences (`\"`, `\\`, `\n`, `\t`, `\r`). |
111 | | - * A string with a mix of valid escape sequences. |
112 | | - * A string with an unrecognized escape sequence (e.g., `\z`) to ensure it's handled literally. |
113 | | - * An empty string. |
114 | | - * A string ending with a `\`. |
115 | | -* **Increment Verification:** |
116 | | - * Step 1: Execute `timeout 90 cargo test -p strs_tools --all-targets` via `execute_command`. |
117 | | - * Step 2: Analyze the output to confirm that all tests in the `unescape_tests` module pass successfully. |
118 | 74 | * **Commit Message:** `feat(strs_tools): Implement unescaping logic for string splitting` |
119 | 75 |
|
120 | 76 | ##### Increment 5: Implement Quoted Segment Logic |
121 | 77 | * **Goal:** Modify the `SplitIterator` to correctly identify and consume an entire quoted string as a single token, and apply the new unescaping logic. |
122 | | -* **Specification Reference:** "Ensure that when `quoting(true)` is enabled, the iterator consumes the entire quoted segment" from the proposal. |
123 | | -* **Steps:** |
124 | | - * Step 1: Read the file `module/core/strs_tools/src/string/split.rs`. |
125 | | - * Step 2: In the `next()` method of `SplitIterator`, remove the `dbg!` macro calls that were used for debugging. |
126 | | - * Step 3: Run `timeout 90 cargo test -p strs_tools --all-targets` to confirm that all tests still pass after removing the debug macros. |
127 | | -* **Increment Verification:** |
128 | | - * Step 1: Run `timeout 90 cargo test -p strs_tools --all-targets`. All tests must pass. |
129 | 78 | * **Commit Message:** `feat(strs_tools): Make split iterator consume full quoted strings and unescape them` |
130 | 79 |
|
131 | | -##### Increment 6: Add New Tests for Unescaping and Quoting |
132 | | -* **Goal:** Add new integration tests to verify the complete functionality and prevent future regressions. |
133 | | -* **Specification Reference:** "Acceptance Criteria" from the proposal. |
| 80 | +##### Increment 6: Fix Spurious Empty Segment Bug |
| 81 | +* **Goal:** To fix the bug where an extra empty segment is incorrectly yielded after a quoted segment when `preserving_empty(true)` is enabled. |
| 82 | +* **Specification Reference:** N/A |
| 83 | +* **Steps:** |
| 84 | + * Step 1: **Analysis:** The tests `test_m_t3_13_quoting_preserve_all_strip`, `empty_quoted_section_test`, and `test_m_t3_11_quoting_preserve_all_no_strip` all fail because an extra empty `""` is produced after a quoted segment. This happens because the main `SplitIterator`'s `next` method has special "peeking" logic for quotes. After it consumes a quoted segment, the underlying `SplitFastIterator` is given the rest of the string (e.g., ` d`). Because this starts with a delimiter, the fast iterator correctly yields an empty segment first. The main iterator's `skip` logic is flawed and fails to filter this artifact. |
| 85 | + * Step 2: **Read File:** Read `module/core/strs_tools/src/string/split.rs`. |
| 86 | + * Step 3: **Locate Flawed Logic:** In `SplitIterator::next`, find the two separate `if` blocks that use `continue` to skip segments. |
| 87 | + * Step 4: **Apply Fix:** Replace the two separate `if` blocks with a single, consolidated `if skip { ... }` block. The `skip` variable will combine the conditions for skipping empty segments and skipping delimiters into a single boolean check. |
| 88 | + ```rust |
| 89 | + let skip = ( current_split.typ == SplitType::Delimeted && current_split.string.is_empty() && !self.flags.contains( SplitFlags::PRESERVING_EMPTY ) ) |
| 90 | + || ( current_split.typ == SplitType::Delimiter && !self.flags.contains( SplitFlags::PRESERVING_DELIMITERS ) ); |
| 91 | + |
| 92 | + if skip |
| 93 | + { |
| 94 | + continue; |
| 95 | + } |
| 96 | + ``` |
| 97 | + * Step 5: **Perform Increment Verification.** |
| 98 | +* **Increment Verification:** |
| 99 | + * Step 1: Execute `timeout 90 cargo test -p strs_tools --test strs_tools_tests` via `execute_command`. |
| 100 | + * Step 2: Analyze the output. The following tests must now **pass**: |
| 101 | + * `inc::split_test::combined_options_tests::test_m_t3_13_quoting_preserve_all_strip` |
| 102 | + * `inc::split_test::quoting_and_unescaping_tests::empty_quoted_section_test` |
| 103 | + * `inc::split_test::quoting_options_tests::test_m_t3_11_quoting_preserve_all_no_strip` |
| 104 | + * `inc::split_test::quoting_options_tests::test_m_t3_13_quoting_preserve_all_strip` |
| 105 | + * Step 3: The `mre_test` is expected to still fail. |
| 106 | +* **Commit Message:** `fix(strs_tools): Prevent extra empty segment after quoted strings` |
| 107 | + |
| 108 | +##### Increment 7: Fix Incorrect Escape Handling Bug |
| 109 | +* **Goal:** To fix the bug where an escaped backslash (`\\`) followed by a quote is parsed incorrectly, as identified in the `mre_test`. |
| 110 | +* **Specification Reference:** N/A |
134 | 111 | * **Steps:** |
135 | | - * Step 1: Create a new test file: `module/core/strs_tools/tests/inc/split_test/quoting_and_unescaping_tests.rs`. |
136 | | - * Step 2: Use `read_file` to load `module/core/strs_tools/tests/inc/split_test/mod.rs`. |
137 | | - * Step 3: Use `insert_content` to add `pub mod quoting_and_unescaping_tests;` to `module/core/strs_tools/tests/inc/split_test/mod.rs`. |
138 | | - * Step 4: In the new test file, add a test case that is an exact copy of the MRE from the task description. Assert that the output for the quoted part is a single `Split` item with the correctly unescaped string. |
139 | | - * Step 5: Add more test cases covering: |
140 | | - * Strings with no quotes. |
141 | | - * Strings with empty quoted sections (`""`). |
142 | | - * Strings with multiple, different escape sequences. |
143 | | - * Quoted strings at the beginning, middle, and end of the input. |
144 | | - * Unterminated quoted strings (decide on expected behavior, e.g., treat as literal). |
| 112 | + * Step 1: **Analysis:** The `mre_test` fails because the input `r#""arg3 \\" "#` is split into `arg3` and `\\\\\"` instead of a single token `arg3 \`. The root cause is in `SplitFastIterator::next`. The `for` loop that scans for the closing quote does not correctly manage the state of the `prev_char_is_escape` flag, causing it to misinterpret the sequence `\\"`. |
| 113 | + * Step 2: **Read File:** Read `module/core/strs_tools/src/string/split.rs`. |
| 114 | + * Step 3: **Locate Flawed Logic:** In `SplitFastIterator::next`, find the `for` loop inside the `if let Some( current_quote_char ) = self.active_quote_char` block. |
| 115 | + * Step 4: **Apply Fix:** Replace the buggy `for` loop with the corrected version that properly handles the escape state. |
| 116 | + ```rust |
| 117 | + let mut end_of_quote_idx : Option< usize > = None; |
| 118 | + let mut prev_char_is_escape = false; |
| 119 | + for ( char_idx, ch ) in self.iterable.char_indices() |
| 120 | + { |
| 121 | + if prev_char_is_escape |
| 122 | + { |
| 123 | + prev_char_is_escape = false; |
| 124 | + } |
| 125 | + else if ch == '\\' |
| 126 | + { |
| 127 | + prev_char_is_escape = true; |
| 128 | + } |
| 129 | + else if ch == current_quote_char |
| 130 | + { |
| 131 | + end_of_quote_idx = Some( char_idx + ch.len_utf8() ); |
| 132 | + break; |
| 133 | + } |
| 134 | + } |
| 135 | + ``` |
| 136 | + * Step 5: **Perform Increment Verification.** |
145 | 137 | * **Increment Verification:** |
146 | | - * Step 1: Run `timeout 90 cargo test -p strs_tools --test strs_tools_tests`. All new and existing tests must pass. |
147 | | -* **Commit Message:** `test(strs_tools): Add comprehensive tests for quoting and unescaping` |
| 138 | + * Step 1: Execute `timeout 90 cargo test -p strs_tools --test strs_tools_tests` via `execute_command`. |
| 139 | + * Step 2: Analyze the output. The `inc::split_test::quoting_and_unescaping_tests::mre_test` must now **pass**. |
| 140 | + * Step 3: Ensure no regressions were introduced in the other tests. All tests should pass. |
| 141 | +* **Commit Message:** `fix(strs_tools): Correctly handle escaped characters in quoted strings` |
148 | 142 |
|
149 | | -##### Increment 7: Finalization |
| 143 | +##### Increment 8: Finalization |
150 | 144 | * **Goal:** Perform a final review and verification of the entire task's output. |
151 | 145 | * **Specification Reference:** N/A |
152 | 146 | * **Steps:** |
153 | 147 | * Step 1: Perform a self-critique against all requirements in the plan. |
154 | 148 | * Step 2: Run the full `Crate Conformance Check Procedure`. |
155 | | - * Step 3: Ensure no regressions have been introduced. |
156 | | - * Step 4: Remove the original `module/core/strs_tools/task.md` if it still exists. |
| 149 | + * Step 3: Remove any temporary debug files (e.g., `debug_split_issue.rs`, `debug_hang_split_issue.rs`). |
| 150 | + * Step 4: Update `changelog.md` with a summary of the changes. |
157 | 151 | * **Increment Verification:** |
158 | 152 | * Step 1: All steps of the `Crate Conformance Check Procedure` must pass. |
159 | 153 | * Step 2: `git status` should be clean. |
|
182 | 176 | * None |
183 | 177 |
|
184 | 178 | ### Notes & Insights |
| 179 | +* **Bug A (Incorrect Escape Handling):** The parser fails to correctly handle an escaped backslash (`\\`) when it is immediately followed by a closing quote character (`"`). The root cause is flawed state management in `SplitFastIterator::next`'s quote-scanning loop. The `mre_test` correctly identifies this bug. |
| 180 | +* **Bug B (Spurious Empty Segment):** The iterator incorrectly yields an extra, unwanted empty segment (`""`) immediately after parsing a quoted segment, but only when the `preserving_empty(true)` option is enabled. This is due to flawed `skip` logic in the main `SplitIterator` after its "peeking" logic for quotes has run. |
185 | 181 | * **Increment 4 (Implement Unescaping Logic):** |
186 | | - * **Issue:** Initial implementation of `unescape_str` caused lifetime errors (`E0597`) when its `Cow::Borrowed` return type was used in `SplitIterator::next` due to borrowing from a temporary `quoted_segment`. |
187 | | - * **Solution:** Forced `unescape_str` to always return `Cow::Owned` by calling `.into_owned()` on its result, breaking the invalid borrow. This required explicit type annotation and a two-step conversion to avoid compiler confusion. |
188 | | - * **Insight:** `Cow` can be tricky with lifetimes, especially when intermediate `Cow::Borrowed` values are created and then used in a context that outlives them. Explicitly converting to `Cow::Owned` can resolve such issues, but it's important to consider performance implications if many small strings are being unescaped. |
| 182 | + * **Issue:** Initial implementation of `unescape_str` caused lifetime errors (`E0597`). |
| 183 | + * **Solution:** Forced `unescape_str` to always return `Cow::Owned`. |
189 | 184 | * **Increment 5 (Implement Quoted Segment Logic):** |
190 | | - * **Issue:** New tests for quoting and unescaping failed because `SplitIterator` was incorrectly preserving delimiter segments even when `preserving_delimeters(false)` was set. Additionally, an extra empty string segment was sometimes yielded when `preserving_empty` is true and a quoted segment is encountered. |
191 | | - * **Solution:** Modified the `SplitIterator::next` method to correctly apply the `skip` logic. The `skip` conditions for empty delimited segments and delimiter segments were combined with a logical OR (`||`) and placed at the beginning of the loop to ensure immediate skipping. This prevents unwanted segments from being yielded. |
192 | | - * **Insight:** The order and combination of `skip` conditions are crucial in iterators. A single `skip` flag that is conditionally overwritten can lead to subtle bugs. It's better to combine all skip conditions into a single boolean check at the start of the loop iteration. |
| 185 | + * **Issue:** New tests for quoting and unescaping failed because `SplitIterator` was incorrectly preserving delimiter segments. |
| 186 | + * **Solution:** Modified the `SplitIterator::next` method to correctly apply the `skip` logic. |
193 | 187 |
|
194 | 188 | ### Changelog |
| 189 | +* [Increment 6 Plan] Updated plan to fix the two distinct bugs (Spurious Empty Segment, Incorrect Escape Handling) in separate, detailed increments based on comprehensive test failure analysis. |
195 | 190 | * [Increment 5 | 2025-07-12] Removed debug macros from `SplitIterator`. |
196 | 191 | * [Increment 4 | 2025-07-12] Implemented `unescape_str` function with unit tests and fixed compilation issues. |
197 | 192 | * [Increment 3 | 2025-07-10] Fixed compilation errors after changing `Split.string` to `Cow`. |
198 | 193 | * [Increment 2 | 2025-07-10] Changed `Split.string` to `Cow<'a, str>` to support unescaping. |
199 | | -* [Increment 1 | 2025-07-10] Read relevant files for analysis. |
| 194 | +* [Increment 1 | 2025-07-10] Read relevant files for analysis. |
0 commit comments