Skip to content

Commit c621e0e

Browse files
committed
strs_tools
1 parent 13034ee commit c621e0e

File tree

1 file changed

+78
-83
lines changed

1 file changed

+78
-83
lines changed
Lines changed: 78 additions & 83 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,4 @@
1+
12
# Task Plan: Enhance `strs_tools::split` to Support Unescaping in Quoted Strings
23

34
### Goal
@@ -12,15 +13,16 @@
1213
### Progress
1314
* **Roadmap Milestone:** N/A
1415
* **Primary Editable Crate:** `module/core/strs_tools`
15-
* **Overall Progress:** 5/7 increments complete
16+
* **Overall Progress:** 5/8 increments complete
1617
* **Increment Status:**
1718
* ✅ Increment 1: Setup and Analysis
1819
* ✅ Increment 2: API Change - Use `Cow` for `Split.string`
1920
* ✅ Increment 3: Fix Compilation Errors
2021
* ✅ Increment 4: Implement Unescaping Logic
2122
* ✅ Increment 5: Implement Quoted Segment Logic
22-
* ⏳ Increment 6: Add New Tests for Unescaping and Quoting
23-
* ⚫ Increment 7: Finalization
23+
* ⏳ Increment 6: Fix Spurious Empty Segment Bug
24+
* ⚫ Increment 7: Fix Incorrect Escape Handling Bug
25+
* ⚫ Increment 8: Finalization
2426

2527
### Permissions & Boundaries
2628
* **Mode:** code
@@ -37,6 +39,8 @@
3739
* `module/core/strs_tools/src/lib.rs`
3840
* `module/core/strs_tools/Cargo.toml`
3941
* `module/core/strs_tools/tests/inc/split_test/quoting_options_tests.rs`
42+
* `module/core/strs_tools/tests/inc/split_test/quoting_and_unescaping_tests.rs`
43+
* `module/core/strs_tools/tests/inc/split_test/combined_options_tests.rs`
4044
* Crates for Documentation (for AI's reference, if `read_file` on docs is planned):
4145
* `strs_tools`
4246

@@ -55,105 +59,95 @@
5559
### Increments
5660
##### Increment 1: Setup and Analysis
5761
* **Goal:** Read all relevant files to build a complete understanding of the current implementation of the `split` iterator and its tests.
58-
* **Specification Reference:** N/A
59-
* **Steps:**
60-
* Step 1: Use `read_file` to load the content of:
61-
* `module/core/strs_tools/src/string/split.rs`
62-
* `module/core/strs_tools/src/lib.rs`
63-
* `module/core/strs_tools/Cargo.toml`
64-
* `module/core/strs_tools/tests/inc/split_test/quoting_options_tests.rs`
65-
* Step 2: Analyze the read files to understand the current implementation of `Split`, `SplitIterator`, and how quoting is handled.
66-
* **Increment Verification:**
67-
* Step 1: Confirm that all files were read successfully.
6862
* **Commit Message:** `chore(strs_tools): Begin refactoring of split iterator for unescaping`
6963

7064
##### Increment 2: API Change - Use `Cow` for `Split.string`
7165
* **Goal:** Modify the `Split` struct to use `Cow<'a, str>` for its `string` field to support returning owned, unescaped strings.
72-
* **Specification Reference:** "API Change Consideration" in the original proposal.
73-
* **Steps:**
74-
* Step 1: In `module/core/strs_tools/src/string/split.rs`, change the type of the `string` field in the `Split` struct from `&'a str` to `Cow<'a, str>`.
75-
* Step 2: Update the `Debug` and any other trait implementations for `Split` to handle the `Cow`.
76-
* Step 3: Attempt to compile the crate using `timeout 90 cargo build -p strs_tools`. Expect failures.
77-
* Step 4: Use the compiler output to identify all locations that need to be updated due to this breaking change.
78-
* **Increment Verification:**
79-
* Step 1: The `Split` struct definition in `split.rs` must be updated to use `Cow<'a, str>`.
80-
* Step 2: The `cargo build` command should fail, and the output should indicate errors related to the type change.
8166
* **Commit Message:** `feat(strs_tools): Change Split.string to Cow to support unescaping`
8267

8368
##### Increment 3: Fix Compilation Errors
8469
* **Goal:** Resolve all compilation errors caused by the change of `Split.string` to `Cow<'a, str>`.
85-
* **Specification Reference:** N/A
86-
* **Steps:**
87-
* Step 1: Based on the compiler errors from the previous increment, systematically update all code that creates or uses `Split` instances. This will likely involve wrapping existing `&str` values in `Cow::Borrowed(...)` and preparing for `Cow::Owned(...)`.
88-
* Step 2: Run `timeout 90 cargo build -p strs_tools` repeatedly until all compilation errors are resolved.
89-
* **Increment Verification:**
90-
* Step 1: The command `timeout 90 cargo build -p strs_tools` must pass successfully.
91-
* Step 2: Run `timeout 90 cargo test -p strs_tools`. Some tests may fail due to logic changes, but it should compile.
9270
* **Commit Message:** `fix(strs_tools): Adapt codebase to Cow-based Split.string`
9371

9472
##### Increment 4: Implement Unescaping Logic
9573
* **Goal:** Implement the core logic to unescape characters within a string slice.
96-
* **Specification Reference:** "Perform unescaping of standard escape sequences" from the proposal.
97-
* **Steps:**
98-
* Step 1: Use `read_file` to load `module/core/strs_tools/src/string/split.rs`.
99-
* Step 2: In `module/core/strs_tools/src/string/split.rs`, add a new private helper function `fn unescape_str( input: &str ) -> Cow< '_, str >`.
100-
* Step 3: Implement the logic for `unescape_str`:
101-
* Search for the `\` character. If it's not found, return `Cow::Borrowed(input)`.
102-
* If `\` is found, iterate through the input string's characters to build a new `String`.
103-
* When a `\` is encountered, inspect the next character to handle valid escape sequences (`\"`, `\\`, `\n`, `\t`, `\r`) by appending their literal counterparts.
104-
* If an escape sequence is not one of the recognized ones, append both the `\` and the character that follows it literally.
105-
* Append all other characters as-is.
106-
* Return `Cow::Owned(new_string)`.
107-
* Step 4: In `module/core/strs_tools/src/string/split.rs`, add a new test module `#[cfg(test)] mod unescape_tests { ... }` at the end of the file.
108-
* Step 5: Inside `unescape_tests`, add unit tests for the `unescape_str` function to cover various scenarios:
109-
* A string with no escape sequences.
110-
* Strings with each of the valid escape sequences (`\"`, `\\`, `\n`, `\t`, `\r`).
111-
* A string with a mix of valid escape sequences.
112-
* A string with an unrecognized escape sequence (e.g., `\z`) to ensure it's handled literally.
113-
* An empty string.
114-
* A string ending with a `\`.
115-
* **Increment Verification:**
116-
* Step 1: Execute `timeout 90 cargo test -p strs_tools --all-targets` via `execute_command`.
117-
* Step 2: Analyze the output to confirm that all tests in the `unescape_tests` module pass successfully.
11874
* **Commit Message:** `feat(strs_tools): Implement unescaping logic for string splitting`
11975

12076
##### Increment 5: Implement Quoted Segment Logic
12177
* **Goal:** Modify the `SplitIterator` to correctly identify and consume an entire quoted string as a single token, and apply the new unescaping logic.
122-
* **Specification Reference:** "Ensure that when `quoting(true)` is enabled, the iterator consumes the entire quoted segment" from the proposal.
123-
* **Steps:**
124-
* Step 1: Read the file `module/core/strs_tools/src/string/split.rs`.
125-
* Step 2: In the `next()` method of `SplitIterator`, remove the `dbg!` macro calls that were used for debugging.
126-
* Step 3: Run `timeout 90 cargo test -p strs_tools --all-targets` to confirm that all tests still pass after removing the debug macros.
127-
* **Increment Verification:**
128-
* Step 1: Run `timeout 90 cargo test -p strs_tools --all-targets`. All tests must pass.
12978
* **Commit Message:** `feat(strs_tools): Make split iterator consume full quoted strings and unescape them`
13079

131-
##### Increment 6: Add New Tests for Unescaping and Quoting
132-
* **Goal:** Add new integration tests to verify the complete functionality and prevent future regressions.
133-
* **Specification Reference:** "Acceptance Criteria" from the proposal.
80+
##### Increment 6: Fix Spurious Empty Segment Bug
81+
* **Goal:** To fix the bug where an extra empty segment is incorrectly yielded after a quoted segment when `preserving_empty(true)` is enabled.
82+
* **Specification Reference:** N/A
83+
* **Steps:**
84+
* Step 1: **Analysis:** The tests `test_m_t3_13_quoting_preserve_all_strip`, `empty_quoted_section_test`, and `test_m_t3_11_quoting_preserve_all_no_strip` all fail because an extra empty `""` is produced after a quoted segment. This happens because the main `SplitIterator`'s `next` method has special "peeking" logic for quotes. After it consumes a quoted segment, the underlying `SplitFastIterator` is given the rest of the string (e.g., ` d`). Because this starts with a delimiter, the fast iterator correctly yields an empty segment first. The main iterator's `skip` logic is flawed and fails to filter this artifact.
85+
* Step 2: **Read File:** Read `module/core/strs_tools/src/string/split.rs`.
86+
* Step 3: **Locate Flawed Logic:** In `SplitIterator::next`, find the two separate `if` blocks that use `continue` to skip segments.
87+
* Step 4: **Apply Fix:** Replace the two separate `if` blocks with a single, consolidated `if skip { ... }` block. The `skip` variable will combine the conditions for skipping empty segments and skipping delimiters into a single boolean check.
88+
```rust
89+
let skip = ( current_split.typ == SplitType::Delimeted && current_split.string.is_empty() && !self.flags.contains( SplitFlags::PRESERVING_EMPTY ) )
90+
|| ( current_split.typ == SplitType::Delimiter && !self.flags.contains( SplitFlags::PRESERVING_DELIMITERS ) );
91+
92+
if skip
93+
{
94+
continue;
95+
}
96+
```
97+
* Step 5: **Perform Increment Verification.**
98+
* **Increment Verification:**
99+
* Step 1: Execute `timeout 90 cargo test -p strs_tools --test strs_tools_tests` via `execute_command`.
100+
* Step 2: Analyze the output. The following tests must now **pass**:
101+
* `inc::split_test::combined_options_tests::test_m_t3_13_quoting_preserve_all_strip`
102+
* `inc::split_test::quoting_and_unescaping_tests::empty_quoted_section_test`
103+
* `inc::split_test::quoting_options_tests::test_m_t3_11_quoting_preserve_all_no_strip`
104+
* `inc::split_test::quoting_options_tests::test_m_t3_13_quoting_preserve_all_strip`
105+
* Step 3: The `mre_test` is expected to still fail.
106+
* **Commit Message:** `fix(strs_tools): Prevent extra empty segment after quoted strings`
107+
108+
##### Increment 7: Fix Incorrect Escape Handling Bug
109+
* **Goal:** To fix the bug where an escaped backslash (`\\`) followed by a quote is parsed incorrectly, as identified in the `mre_test`.
110+
* **Specification Reference:** N/A
134111
* **Steps:**
135-
* Step 1: Create a new test file: `module/core/strs_tools/tests/inc/split_test/quoting_and_unescaping_tests.rs`.
136-
* Step 2: Use `read_file` to load `module/core/strs_tools/tests/inc/split_test/mod.rs`.
137-
* Step 3: Use `insert_content` to add `pub mod quoting_and_unescaping_tests;` to `module/core/strs_tools/tests/inc/split_test/mod.rs`.
138-
* Step 4: In the new test file, add a test case that is an exact copy of the MRE from the task description. Assert that the output for the quoted part is a single `Split` item with the correctly unescaped string.
139-
* Step 5: Add more test cases covering:
140-
* Strings with no quotes.
141-
* Strings with empty quoted sections (`""`).
142-
* Strings with multiple, different escape sequences.
143-
* Quoted strings at the beginning, middle, and end of the input.
144-
* Unterminated quoted strings (decide on expected behavior, e.g., treat as literal).
112+
* Step 1: **Analysis:** The `mre_test` fails because the input `r#""arg3 \\" "#` is split into `arg3` and `\\\\\"` instead of a single token `arg3 \`. The root cause is in `SplitFastIterator::next`. The `for` loop that scans for the closing quote does not correctly manage the state of the `prev_char_is_escape` flag, causing it to misinterpret the sequence `\\"`.
113+
* Step 2: **Read File:** Read `module/core/strs_tools/src/string/split.rs`.
114+
* Step 3: **Locate Flawed Logic:** In `SplitFastIterator::next`, find the `for` loop inside the `if let Some( current_quote_char ) = self.active_quote_char` block.
115+
* Step 4: **Apply Fix:** Replace the buggy `for` loop with the corrected version that properly handles the escape state.
116+
```rust
117+
let mut end_of_quote_idx : Option< usize > = None;
118+
let mut prev_char_is_escape = false;
119+
for ( char_idx, ch ) in self.iterable.char_indices()
120+
{
121+
if prev_char_is_escape
122+
{
123+
prev_char_is_escape = false;
124+
}
125+
else if ch == '\\'
126+
{
127+
prev_char_is_escape = true;
128+
}
129+
else if ch == current_quote_char
130+
{
131+
end_of_quote_idx = Some( char_idx + ch.len_utf8() );
132+
break;
133+
}
134+
}
135+
```
136+
* Step 5: **Perform Increment Verification.**
145137
* **Increment Verification:**
146-
* Step 1: Run `timeout 90 cargo test -p strs_tools --test strs_tools_tests`. All new and existing tests must pass.
147-
* **Commit Message:** `test(strs_tools): Add comprehensive tests for quoting and unescaping`
138+
* Step 1: Execute `timeout 90 cargo test -p strs_tools --test strs_tools_tests` via `execute_command`.
139+
* Step 2: Analyze the output. The `inc::split_test::quoting_and_unescaping_tests::mre_test` must now **pass**.
140+
* Step 3: Ensure no regressions were introduced in the other tests. All tests should pass.
141+
* **Commit Message:** `fix(strs_tools): Correctly handle escaped characters in quoted strings`
148142

149-
##### Increment 7: Finalization
143+
##### Increment 8: Finalization
150144
* **Goal:** Perform a final review and verification of the entire task's output.
151145
* **Specification Reference:** N/A
152146
* **Steps:**
153147
* Step 1: Perform a self-critique against all requirements in the plan.
154148
* Step 2: Run the full `Crate Conformance Check Procedure`.
155-
* Step 3: Ensure no regressions have been introduced.
156-
* Step 4: Remove the original `module/core/strs_tools/task.md` if it still exists.
149+
* Step 3: Remove any temporary debug files (e.g., `debug_split_issue.rs`, `debug_hang_split_issue.rs`).
150+
* Step 4: Update `changelog.md` with a summary of the changes.
157151
* **Increment Verification:**
158152
* Step 1: All steps of the `Crate Conformance Check Procedure` must pass.
159153
* Step 2: `git status` should be clean.
@@ -182,18 +176,19 @@
182176
* None
183177

184178
### Notes & Insights
179+
* **Bug A (Incorrect Escape Handling):** The parser fails to correctly handle an escaped backslash (`\\`) when it is immediately followed by a closing quote character (`"`). The root cause is flawed state management in `SplitFastIterator::next`'s quote-scanning loop. The `mre_test` correctly identifies this bug.
180+
* **Bug B (Spurious Empty Segment):** The iterator incorrectly yields an extra, unwanted empty segment (`""`) immediately after parsing a quoted segment, but only when the `preserving_empty(true)` option is enabled. This is due to flawed `skip` logic in the main `SplitIterator` after its "peeking" logic for quotes has run.
185181
* **Increment 4 (Implement Unescaping Logic):**
186-
* **Issue:** Initial implementation of `unescape_str` caused lifetime errors (`E0597`) when its `Cow::Borrowed` return type was used in `SplitIterator::next` due to borrowing from a temporary `quoted_segment`.
187-
* **Solution:** Forced `unescape_str` to always return `Cow::Owned` by calling `.into_owned()` on its result, breaking the invalid borrow. This required explicit type annotation and a two-step conversion to avoid compiler confusion.
188-
* **Insight:** `Cow` can be tricky with lifetimes, especially when intermediate `Cow::Borrowed` values are created and then used in a context that outlives them. Explicitly converting to `Cow::Owned` can resolve such issues, but it's important to consider performance implications if many small strings are being unescaped.
182+
* **Issue:** Initial implementation of `unescape_str` caused lifetime errors (`E0597`).
183+
* **Solution:** Forced `unescape_str` to always return `Cow::Owned`.
189184
* **Increment 5 (Implement Quoted Segment Logic):**
190-
* **Issue:** New tests for quoting and unescaping failed because `SplitIterator` was incorrectly preserving delimiter segments even when `preserving_delimeters(false)` was set. Additionally, an extra empty string segment was sometimes yielded when `preserving_empty` is true and a quoted segment is encountered.
191-
* **Solution:** Modified the `SplitIterator::next` method to correctly apply the `skip` logic. The `skip` conditions for empty delimited segments and delimiter segments were combined with a logical OR (`||`) and placed at the beginning of the loop to ensure immediate skipping. This prevents unwanted segments from being yielded.
192-
* **Insight:** The order and combination of `skip` conditions are crucial in iterators. A single `skip` flag that is conditionally overwritten can lead to subtle bugs. It's better to combine all skip conditions into a single boolean check at the start of the loop iteration.
185+
* **Issue:** New tests for quoting and unescaping failed because `SplitIterator` was incorrectly preserving delimiter segments.
186+
* **Solution:** Modified the `SplitIterator::next` method to correctly apply the `skip` logic.
193187

194188
### Changelog
189+
* [Increment 6 Plan] Updated plan to fix the two distinct bugs (Spurious Empty Segment, Incorrect Escape Handling) in separate, detailed increments based on comprehensive test failure analysis.
195190
* [Increment 5 | 2025-07-12] Removed debug macros from `SplitIterator`.
196191
* [Increment 4 | 2025-07-12] Implemented `unescape_str` function with unit tests and fixed compilation issues.
197192
* [Increment 3 | 2025-07-10] Fixed compilation errors after changing `Split.string` to `Cow`.
198193
* [Increment 2 | 2025-07-10] Changed `Split.string` to `Cow<'a, str>` to support unescaping.
199-
* [Increment 1 | 2025-07-10] Read relevant files for analysis.
194+
* [Increment 1 | 2025-07-10] Read relevant files for analysis.

0 commit comments

Comments
 (0)