feat(string): implement RopeString with thin vtable and lazy flattening #5006
akash-R-A-J wants to merge 18 commits into `boa-dev:main` from …
Codecov Report: ❌ Patch coverage is …

Additional details and impacted files:

| Coverage Diff | main | #5006 | +/- |
|---|---:|---:|---:|
| Coverage | 47.24% | 59.33% | +12.09% |
| Files | 476 | 591 | +115 |
| Lines | 46892 | 63973 | +17081 |
| Hits | 22154 | 37959 | +15805 |
| Misses | 24738 | 26014 | +1276 |
```rust
// SAFETY: Caller must ensure the type matches.
unsafe { self.ptr.cast::<T>().as_ref() }
#[must_use]
pub fn as_str(&self) -> JsStr<'static> {
```
This is unsound. It's returning a `'static` reference to a temporary `JsString`, so you could deallocate the original string and the compiler would happily let you access the `JsStr<'static>`:

```rust
let s = JsString::from_str("undefined behaviour!");
let temp = s.as_str();
drop(s);
println!("{}", temp.display_lossy()); // UB!
```
Thanks for pointing this out! You're absolutely right: that was a leftover from the initial pointer-based POC, written while I was focusing on the memory layout. I've now refactored the vtable to use HRTBs for `as_str`, which replaces that temporary shim with a proper lifetime binding. It was my fault.
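To make the HRTB fix concrete, here is a minimal sketch of a vtable entry declared with a higher-ranked trait bound. `Header`, `VTable`, `flat_as_str` and `FLAT_VTABLE` are hypothetical stand-ins rather than Boa's actual types, and a plain `&str` replaces `JsStr<'a>`. Because the function pointer is typed `for<'a> fn(&'a Header) -> &'a str`, every call returns a borrow tied to the header it was given, so the `drop(s)` example above would be rejected by the borrow checker instead of leaving a dangling `JsStr<'static>`.

```rust
/// Hypothetical string header standing in for `RawJsString`/`JsStringHeader`.
struct Header {
    vtable: &'static VTable,
    data: Box<str>,
}

/// Hypothetical thin vtable. The `for<'a>` bound means: for every borrow
/// lifetime `'a`, `as_str` maps `&'a Header` to `&'a str`, so the returned
/// slice can never outlive the header it came from.
struct VTable {
    as_str: for<'a> fn(&'a Header) -> &'a str,
}

/// Implementation for a flat (non-rope) string: borrows the owned data.
fn flat_as_str<'a>(header: &'a Header) -> &'a str {
    &header.data
}

static FLAT_VTABLE: VTable = VTable { as_str: flat_as_str };

impl Header {
    fn new_flat(s: &str) -> Self {
        Header { vtable: &FLAT_VTABLE, data: s.into() }
    }

    /// Dispatches through the vtable; the elided lifetime ties the result to `&self`.
    fn as_str(&self) -> &str {
        (self.vtable.as_str)(self)
    }
}
```

With this signature, inserting `drop(s);` between `let v = s.as_str();` and a later use of `v` fails to compile (cannot move out of `s` while it is borrowed), which is exactly the guarantee the `JsStr<'static>` version lacked.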
zhuzhu81998 left a comment:

CI shows a segmentation fault (potential cause below).
```diff
 let specifier = specifier.cow_replace('/', "\\");

-let short_path = Path::new(&specifier);
+let short_path = Path::new(&*specifier);
```
What does this have to do with rope strings?
```rust
pub(crate) mod rope;
pub(crate) use rope::RopeString;

/// Header for all `JsString` allocations.
```
Then perhaps call this `JsStringHeader` instead of `RawJsString`?

Sounds good, I'll rename `RawJsString` to the suggested `JsStringHeader`.
```rust
unsafe impl Sync for RawJsString {}
// SAFETY: RawJsString contains only thread-safe data.
unsafe impl Send for RawJsString {}
```
Do you need this somewhere?
```rust
// SAFETY: We only mutate refcount and hash via atomic-casts when kind != Static.
unsafe impl Sync for JsStringVTable {}
// SAFETY: JsStringVTable contains only thread-safe data.
unsafe impl Send for JsStringVTable {}
```
Same question: where is this needed?
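The question behind both comments is whether the manual `unsafe impl`s are needed at all. `Send` and `Sync` are auto traits: a struct gets them automatically when all of its fields have them. The sketch below assumes a hypothetical header made only of atomics, a plain integer and a `&'static` reference to a vtable of function pointers; under that assumption the compiler derives both traits with no `unsafe` code. If the real header holds raw pointers such as `NonNull<T>`, the auto impls are lost, and only then would a manual `unsafe impl` with a justified `// SAFETY:` comment be required.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

/// Hypothetical thin vtable: function pointers are `Send + Sync`.
struct VTable {
    len: fn(&Header) -> usize,
}

/// Hypothetical header: atomics, a plain integer and a `&'static` vtable.
/// `&'static VTable` is `Send + Sync` because `VTable` is `Sync`.
struct Header {
    vtable: &'static VTable,
    refcount: AtomicUsize,
    hash: AtomicU64,
    len: usize,
}

fn header_len(header: &Header) -> usize {
    header.len
}

static VTABLE: VTable = VTable { len: header_len };

impl Header {
    fn new(len: usize) -> Self {
        Header {
            vtable: &VTABLE,
            refcount: AtomicUsize::new(1),
            hash: AtomicU64::new(0),
            len,
        }
    }

    /// Increments the refcount; safe to call from several threads at once.
    fn inc_ref(&self) -> usize {
        self.refcount.fetch_add(1, Ordering::Relaxed) + 1
    }
}

/// Compiles only if `T` is both `Send` and `Sync`.
const fn assert_send_sync<T: Send + Sync>() {}

// Compile-time check: neither type needs an `unsafe impl`.
const _: () = {
    assert_send_sync::<VTable>();
    assert_send_sync::<Header>();
};
```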
```rust
/// A rope string that is a tree of other strings.
#[repr(C)]
pub(crate) struct RopeString {
    /// Standardized header for all strings.
    pub(crate) header: RawJsString,
    pub(crate) left: JsString,
    pub(crate) right: JsString,
    flattened: OnceCell<JsString>,
    pub(crate) depth: u8,
}
```
Hmm, so my understanding is that the `JsString`s here will most likely be GC-tracked, and there is no way for the GC to know that `RopeString` is holding references to `left` and `right`. Are you doing something to prevent this?
To clarify: `JsString`s are not GC-tracked; they're independently reference-counted, so it's fine not to trace through them. I'm fairly certain the UB comes from the modifications made to `as_str` that removed the lifetime inheritance in favor of `JsStr<'static>`.
Indeed, it's not the GC: disabling the GC does not resolve the segmentation fault XD.
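The exchange above hinges on rope children being kept alive by reference counting rather than by the GC. Here is a minimal sketch of that ownership model, assuming `Rc<str>` as a stand-in for `JsString` (the real type has its own refcount and Latin-1/UTF-16 storage); `Str`, `Rope`, `write_to` and `as_flat` are hypothetical names. It mirrors the `left`/`right`/`depth`/`flattened` fields of the `RopeString` struct shown earlier: concatenation only bumps refcounts, and the flat copy is built on first access and cached in a `OnceCell`.

```rust
use std::cell::OnceCell;
use std::rc::Rc;

/// Hypothetical stand-in for `JsString`: a refcounted handle that is cheap to clone.
#[derive(Clone)]
enum Str {
    Flat(Rc<str>),
    Rope(Rc<Rope>),
}

/// Hypothetical rope node mirroring the shape of `RopeString`.
struct Rope {
    left: Str, // holding a `Str` keeps the child alive via its refcount
    right: Str,
    depth: u8,
    flattened: OnceCell<Rc<str>>, // filled on the first `as_flat` call
}

impl Str {
    fn flat(s: &str) -> Self {
        Str::Flat(Rc::from(s))
    }

    fn depth(&self) -> u8 {
        match self {
            Str::Flat(_) => 0,
            Str::Rope(rope) => rope.depth,
        }
    }

    /// O(1): the clones only bump refcounts; no characters are copied.
    /// (A real implementation would rebalance before `depth` grows large.)
    fn concat(left: &Str, right: &Str) -> Str {
        Str::Rope(Rc::new(Rope {
            left: left.clone(),
            right: right.clone(),
            depth: left.depth().max(right.depth()) + 1,
            flattened: OnceCell::new(),
        }))
    }

    /// Appends this string's characters, reusing a cached flat copy when present.
    fn write_to(&self, out: &mut String) {
        match self {
            Str::Flat(s) => out.push_str(s),
            Str::Rope(rope) => match rope.flattened.get() {
                Some(flat) => out.push_str(flat),
                None => {
                    rope.left.write_to(out);
                    rope.right.write_to(out);
                }
            },
        }
    }

    /// Lazy flattening: the first call builds and caches the flat string.
    fn as_flat(&self) -> Rc<str> {
        match self {
            Str::Flat(s) => Rc::clone(s),
            Str::Rope(rope) => Rc::clone(rope.flattened.get_or_init(|| {
                let mut out = String::new();
                rope.left.write_to(&mut out);
                rope.right.write_to(&mut out);
                Rc::from(out)
            })),
        }
    }
}
```

Dropping the caller's handle to a child does not free it while a rope still references it, which is why no GC tracing is needed for `left` and `right`.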
Force-pushed from `6587b0c` to `951985f`.
Test262 conformance changes

Broken tests (10): …

Tested main commit: …
@jedel1043 I believe all requested changes have now been addressed. CI is green across all platforms, and test262 results match … Please let me know if anything else needs adjustment.
Force-pushed from `c21f4cb` to `b6134ad`.

Force-pushed from `f21309a` to `af923c7`.
Hey @jedel1043, whenever you have some time, could you please take another look at this PR? All previously mentioned issues should now be addressed, including the rope rebalancing changes, safety fixes, and flattening adjustments discussed in the earlier review. Here are the latest benchmark results:
Test262 conformance changes

Broken tests (10): …

Tested main commit: …
Force-pushed from `3482050` to `fe779dc`.
The failed tests are probably related.
```rust
/// Reference to the static vtable for this string kind.
pub vtable: &'static JsStringVTable,
```
I like the thin vtable here. We could make this even more space-efficient by storing only the kind and keeping a table of vtables (a `u8` that we could potentially even encode in the `len` or `refcount`). A reference to a vtable makes more sense for objects whose types can't be determined at compile time and aren't limited in number, but `JsString` types are limited 😄.
```rust
const VTABLES: [JsStringVTable; NUM_VTABLES] = [
    LATIN1_VTABLE,
    ROPE_VTABLE,
    // ...
];
// ...
// usage:
let dealloc = VTABLES[header.kind as usize].dealloc;
```

Keeping the vtables by value instead of by reference in the table may also help with cache locality for all `JsString` internal implementations.
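A minimal sketch of that suggestion, assuming three hypothetical kinds (`Latin1`, `Utf16`, `Rope`) and a single hypothetical vtable entry, `data_size`. The kind is packed into the top 8 bits of a 64-bit length word, so the header needs no separate pointer-sized vtable field, and the vtables live by value in one `static` array indexed by the kind. A real `JsStringVTable` would carry the full set of entries (`clone`, `drop`, `as_str`, `code_unit_at` and so on).

```rust
/// Hypothetical string kinds; the discriminant doubles as the vtable index.
#[repr(u8)]
#[derive(Clone, Copy)]
enum Kind {
    Latin1 = 0,
    Utf16 = 1,
    Rope = 2,
}

/// Hypothetical thin vtable with a single entry.
struct VTable {
    /// Bytes of character data owned by a string of this kind.
    data_size: fn(&Header) -> u64,
}

fn latin1_size(h: &Header) -> u64 { h.len() }
fn utf16_size(h: &Header) -> u64 { h.len() * 2 }
fn rope_size(_: &Header) -> u64 { 0 } // ropes own child handles, not characters

/// Vtables stored by value, in `Kind` discriminant order.
static VTABLES: [VTable; 3] = [
    VTable { data_size: latin1_size },
    VTable { data_size: utf16_size },
    VTable { data_size: rope_size },
];

const KIND_SHIFT: u32 = 56;
const LEN_MASK: u64 = (1u64 << KIND_SHIFT) - 1;

/// Header packing the kind into the top byte of the length word.
struct Header {
    len_and_kind: u64,
}

impl Header {
    fn new(kind: Kind, len: u64) -> Self {
        assert!(len <= LEN_MASK, "length must fit in 56 bits");
        Header { len_and_kind: ((kind as u64) << KIND_SHIFT) | len }
    }

    fn kind(&self) -> u8 {
        (self.len_and_kind >> KIND_SHIFT) as u8
    }

    fn len(&self) -> u64 {
        self.len_and_kind & LEN_MASK
    }

    /// Dispatch goes through an indexed load instead of a stored pointer.
    fn vtable(&self) -> &'static VTable {
        &VTABLES[self.kind() as usize]
    }
}
```

Compared with a `&'static JsStringVTable` field, this saves a pointer-sized word per header at the cost of a shift, a mask and an indexed load per dispatch.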
```rust
/// Fibonacci numbers for rope balancing thresholds.
/// `F(n) = Fib(n + 2)`. A rope of depth `n` is balanced if its length >= `F(n)`.
static FIBONACCI_THRESHOLDS: [usize; 46] = [
```
Suggestion: we can halve the size of this lookup table here (at least on 64-bit systems) 😄, since the largest entry, `Fib(47)` = 2,971,215,073, still fits in a `u32`.
```diff
-static FIBONACCI_THRESHOLDS: [usize; 46] = [
+static FIBONACCI_THRESHOLDS: [u32; 46] = [
```
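The suggested `u32` table could also be generated at compile time instead of written out by hand. The sketch below assumes the threshold definition from the doc comment (`F(n) = Fib(n + 2)`, balanced when `len >= F(n)`); `fibonacci_thresholds` and `is_balanced` are hypothetical helpers. Intermediate values are computed in `u64`, so calculating `Fib(48)` (which exceeds `u32::MAX`) on the final iteration does not overflow during const evaluation.

```rust
/// Builds the threshold table at compile time: `table[n] = Fib(n + 2)`.
const fn fibonacci_thresholds() -> [u32; 46] {
    let mut table = [0u32; 46];
    // Start from Fib(1) and Fib(2); use u64 so the last step cannot overflow.
    let (mut prev, mut cur): (u64, u64) = (1, 1);
    let mut i = 0;
    while i < 46 {
        table[i] = cur as u32; // Fib(i + 2) <= Fib(47) < 2^32
        let next = prev + cur;
        prev = cur;
        cur = next;
        i += 1;
    }
    table
}

static FIBONACCI_THRESHOLDS: [u32; 46] = fibonacci_thresholds();

/// A rope of the given depth is balanced if its length reaches the threshold.
/// Depths past the end of the table are treated as unbalanced.
fn is_balanced(depth: u8, len: usize) -> bool {
    match FIBONACCI_THRESHOLDS.get(depth as usize) {
        Some(&min_len) => len >= min_len as usize,
        None => false,
    }
}
```

The table occupies 184 bytes as `[u32; 46]`, versus 368 bytes as `[usize; 46]` on a 64-bit target.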
```diff
 };
 ( $x:expr, $y:expr ) => {
-$crate::string::JsString::concat($crate::string::JsStr::from($x), $crate::string::JsStr::from($y))
+$crate::string::JsString::concat(&$crate::string::JsString::from($x), &$crate::string::JsString::from($y))
```
Suggestion: recursively calling `js_string!` here has the benefit of creating a static string for literal arguments instead of two dynamic strings (e.g. `js_string!(s, ";")`).
```diff
-$crate::string::JsString::concat(&$crate::string::JsString::from($x), &$crate::string::JsString::from($y))
+$crate::string::JsString::concat(&$crate::js_string!($x), &$crate::js_string!($y))
```
```diff
 };
 ( $( $s:expr ),+ ) => {
-$crate::string::JsString::concat_array(&[ $( $crate::string::JsStr::from($s) ),+ ])
+$crate::string::JsString::concat_array_strings(&[ $( $crate::string::JsString::from($s) ),+ ])
```
```diff
-$crate::string::JsString::concat_array_strings(&[ $( $crate::string::JsString::from($s) ),+ ])
+$crate::string::JsString::concat_array_strings(&[ $( $crate::js_string!($s) ),+ ])
```

This Pull Request fixes/closes #5005.

This PR introduces Rope strings to `boa_string` and integrates them into the engine’s concatenation pipeline to eliminate the pathological O(N²) behavior of repeated string concatenation.

### Key Changes

#### Rope Strings

- `RopeString` representation (`core/string/src/vtable/rope.rs`)

#### Engine Integration

- `JsValue::add` now creates ropes for large concatenations (`len(lhs) + len(rhs) > 1024`)
- `ConcatToString` updated to use balanced batch concatenation
- `String.prototype.concat` refactored to use the new API

#### Thin VTable & Performance Recovery

- … (`clone`, `drop`, `as_str`, `code_unit_at`)
- `u64` hash to speed up property lookups
- `ptr_eq` + hash checks to optimize equality comparisons

### Benchmarks

#### Concatenation Stress Test

≈ 21.7× faster

#### V8 Combined Suite
### Summary
This change removes quadratic concatenation behavior while maintaining baseline performance for general workloads. The thin vtable and devirtualization ensure that ropes introduce minimal overhead when not used.
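The Engine Integration changes listed above can be sketched as follows. This is a simplified model, not Boa's code: `Str`, `concat` and `concat_balanced` are hypothetical, lengths are counted in bytes of a Rust `str` rather than UTF-16 code units, and the cut-off reuses the `> 1024` rule from the description. Small results are copied eagerly, large ones become an O(1) rope node, and batch concatenation splits the input in half recursively so that `n` parts yield a tree of depth about `log2(n)` instead of `n - 1`.

```rust
use std::rc::Rc;

/// Hypothetical cut-off mirroring `len(lhs) + len(rhs) > 1024`.
const ROPE_THRESHOLD: usize = 1024;

/// Hypothetical string: either flat text or a rope node over two children.
#[derive(Clone)]
enum Str {
    Flat(Rc<str>),
    Rope { left: Rc<Str>, right: Rc<Str>, len: usize, depth: u8 },
}

impl Str {
    fn flat(s: &str) -> Self {
        Str::Flat(Rc::from(s))
    }

    fn len(&self) -> usize {
        match self {
            Str::Flat(s) => s.len(),
            Str::Rope { len, .. } => *len,
        }
    }

    fn depth(&self) -> u8 {
        match self {
            Str::Flat(_) => 0,
            Str::Rope { depth, .. } => *depth,
        }
    }

    fn write_to(&self, out: &mut String) {
        match self {
            Str::Flat(s) => out.push_str(s),
            Str::Rope { left, right, .. } => {
                left.write_to(out);
                right.write_to(out);
            }
        }
    }

    fn to_flat_string(&self) -> String {
        let mut out = String::with_capacity(self.len());
        self.write_to(&mut out);
        out
    }

    /// Small results are copied eagerly; large ones become an O(1) rope node.
    fn concat(lhs: &Str, rhs: &Str) -> Str {
        let len = lhs.len() + rhs.len();
        if len > ROPE_THRESHOLD {
            Str::Rope {
                left: Rc::new(lhs.clone()),
                right: Rc::new(rhs.clone()),
                len,
                depth: lhs.depth().max(rhs.depth()) + 1,
            }
        } else {
            let mut s = lhs.to_flat_string();
            rhs.write_to(&mut s);
            Str::Flat(Rc::from(s))
        }
    }

    /// Balanced batch concatenation: halve the slice and recurse.
    fn concat_balanced(parts: &[Str]) -> Str {
        match parts {
            [] => Str::flat(""),
            [only] => only.clone(),
            _ => {
                let (left, right) = parts.split_at(parts.len() / 2);
                Str::concat(&Str::concat_balanced(left), &Str::concat_balanced(right))
            }
        }
    }
}
```

With eight 2,000-byte parts, `concat_balanced` produces a rope of depth 3, whereas folding the same parts left to right with `concat` produces depth 7.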