Skip to content

Commit 6c8c686

Browse files
committed
strs_tools : unescaping wip
1 parent 5b5ba01 commit 6c8c686

File tree

2 files changed

+19
-17
lines changed

2 files changed

+19
-17
lines changed

module/core/strs_tools/src/string/split.rs

Lines changed: 9 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -446,14 +446,14 @@ mod private
446446
} else { effective_split_opt = self.iterator.next(); }
447447
} else { effective_split_opt = self.iterator.next(); }
448448
let mut current_split = effective_split_opt?;
449-
if let Some(peeked_quote_end) = just_finished_quote_offset_cache {
450-
if current_split.typ == SplitType::Delimeted && current_split.string.is_empty() && current_split.start == peeked_quote_end && self.flags.contains(SplitFlags::PRESERVING_EMPTY) && peeked_quote_end < self.src.len() {
451-
let char_after_quote = &self.src[peeked_quote_end..];
452-
if self.iterator.delimeter.pos(char_after_quote).is_some_and(|(ds, _)| ds == 0) {
453-
self.last_yielded_token_was_delimiter = false; continue;
454-
}
455-
}
449+
450+
// Apply skip logic based on flags
451+
if (current_split.typ == SplitType::Delimeted && current_split.string.is_empty() && !self.flags.contains(SplitFlags::PRESERVING_EMPTY)) ||
452+
(current_split.typ == SplitType::Delimiter && !self.flags.contains(SplitFlags::PRESERVING_DELIMITERS))
453+
{
454+
continue; // Skip this split and continue to the next iteration of the loop
456455
}
456+
457457
if !quote_handled_by_peek && self.flags.contains(SplitFlags::QUOTING) && current_split.typ == SplitType::Delimiter && self.iterator.active_quote_char.is_none() {
458458
if let Some(_prefix_idx) = self.quoting_prefixes.iter().position(|p| *p == current_split.string.as_ref()) {
459459
let opening_quote_delimiter = current_split.clone();
@@ -472,13 +472,8 @@ mod private
472472
current_split.end = current_split.start + current_split.string.len();
473473
}
474474
}
475-
let mut skip = false;
476-
if current_split.typ == SplitType::Delimeted && current_split.string.is_empty() && !self.flags.contains(SplitFlags::PRESERVING_EMPTY) { skip = true; }
477-
if current_split.typ == SplitType::Delimiter && !self.flags.contains(SplitFlags::PRESERVING_DELIMITERS) { skip = true; }
478-
if !skip {
479-
if current_split.typ == SplitType::Delimiter { self.last_yielded_token_was_delimiter = true; }
480-
return Some( current_split );
481-
}
475+
if current_split.typ == SplitType::Delimiter { self.last_yielded_token_was_delimiter = true; }
476+
return Some( current_split );
482477
}
483478
}
484479
}

module/core/strs_tools/task/task_plan.md

Lines changed: 10 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -98,7 +98,7 @@
9898
* Step 1: Use `read_file` to load `module/core/strs_tools/src/string/split.rs`.
9999
* Step 2: In `module/core/strs_tools/src/string/split.rs`, add a new private helper function `fn unescape_str( input: &str ) -> Cow< '_, str >`.
100100
* Step 3: Implement the logic for `unescape_str`:
101-
* Search for the `\` character. If it's not found, return `Cow::Borrowed(input)` for efficiency.
101+
* Search for the `\` character. If it's not found, return `Cow::Borrowed(input)`.
102102
* If `\` is found, iterate through the input string's characters to build a new `String`.
103103
* When a `\` is encountered, inspect the next character to handle valid escape sequences (`\"`, `\\`, `\n`, `\t`, `\r`) by appending their literal counterparts.
104104
* If an escape sequence is not one of the recognized ones, append both the `\` and the character that follows it literally.
@@ -135,7 +135,7 @@
135135
* Step 1: Create a new test file: `module/core/strs_tools/tests/inc/split_test/quoting_and_unescaping_tests.rs`.
136136
* Step 2: Use `read_file` to load `module/core/strs_tools/tests/inc/split_test/mod.rs`.
137137
* Step 3: Use `insert_content` to add `pub mod quoting_and_unescaping_tests;` to `module/core/strs_tools/tests/inc/split_test/mod.rs`.
138-
* Step 4: In the new test file (`quoting_and_unescaping_tests.rs`), add a test case that is an exact copy of the MRE from the task description. Assert that the output for the quoted part is a single `Split` item with the correctly unescaped string.
138+
* Step 4: In the new test file, add a test case that is an exact copy of the MRE from the task description. Assert that the output for the quoted part is a single `Split` item with the correctly unescaped string.
139139
* Step 5: Add more test cases covering:
140140
* Strings with no quotes.
141141
* Strings with empty quoted sections (`""`).
@@ -182,7 +182,14 @@
182182
* None
183183

184184
### Notes & Insights
185-
* This change will significantly improve the usability of `strs_tools` for parsing command-line-like inputs. The use of `Cow` is a good trade-off between performance (for non-escaped strings) and correctness (for escaped strings).
185+
* **Increment 4 (Implement Unescaping Logic):**
186+
* **Issue:** Initial implementation of `unescape_str` caused lifetime errors (`E0597`) when its `Cow::Borrowed` return type was used in `SplitIterator::next` due to borrowing from a temporary `quoted_segment`.
187+
* **Solution:** Forced `unescape_str` to always return `Cow::Owned` by calling `.into_owned()` on its result, breaking the invalid borrow. This required explicit type annotation and a two-step conversion to avoid compiler confusion.
188+
* **Insight:** `Cow` can be tricky with lifetimes, especially when intermediate `Cow::Borrowed` values are created and then used in a context that outlives them. Explicitly converting to `Cow::Owned` can resolve such issues, but it's important to consider performance implications if many small strings are being unescaped.
189+
* **Increment 5 (Implement Quoted Segment Logic):**
190+
* **Issue:** New tests for quoting and unescaping failed because `SplitIterator` was incorrectly preserving delimiter segments even when `preserving_delimeters(false)` was set. Additionally, an extra empty string segment was sometimes yielded when `preserving_empty` was true and a quoted segment was encountered.
191+
* **Solution:** Modified the `SplitIterator::next` method to correctly apply the `skip` logic. The `skip` conditions for empty delimited segments and delimiter segments were combined with a logical OR (`||`) and placed at the beginning of the loop to ensure immediate skipping. This prevents unwanted segments from being yielded.
192+
* **Insight:** The order and combination of `skip` conditions are crucial in iterators. A single `skip` flag that is conditionally overwritten can lead to subtle bugs. It's better to combine all skip conditions into a single boolean check at the start of the loop iteration.
186193

187194
### Changelog
188195
* [Increment 5 | 2025-07-12] Removed debug macros from `SplitIterator`.

0 commit comments

Comments
 (0)