Skip to content

Comments

Ncollection strings#71

Open
dpasukhi wants to merge 12 commits intomasterfrom
ncollection_strings
Open

Ncollection strings#71
dpasukhi wants to merge 12 commits intomasterfrom
ncollection_strings

Conversation

@dpasukhi
Copy link
Owner

No description provided.

dpasukhi and others added 11 commits January 16, 2026 10:41
…string_view support

- Added constructor and assignment operator for std::u16string_view.
- Implemented conversion operator to std::u16string_view.
- Introduced methods to append std::u16string_view.
- Added Copy and assignment operator overloads for char16_t pointers.
- Fixed variable name inconsistencies (myLength vs mylength).
- Added new string manipulation methods: LeftAdjust, RightAdjust, LeftJustify, RightJustify, Center, Capitalize, Prepend, FirstLocationInSet, FirstLocationNotInSet, IntegerValue, IsIntegerValue, RealValue, IsRealValue, IsSameString.
…and methods for better handling of empty strings and parameter validation
…r and real number strings in TCollection_ExtendedString constructors
…xtendedString

- Updated constructor and method documentation for clarity and consistency.
- Renamed parameters in constructors and methods to follow a consistent naming convention (e.g., `astring` to `theString`, `isMultiByte` to `theIsMultiByte`).
- Enhanced examples in documentation to illustrate usage more effectively.
- Improved comments for methods to provide better context and usage scenarios.
…String.cxx

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Pasukhin Dmitry <pasuhinsvzn@gmail.com>
…ndedString to use char16_t for ASCII characters
@dpasukhi
Copy link
Owner Author

@coderabbitai full review

@coderabbitai
Copy link

coderabbitai bot commented Jan 18, 2026

✅ Actions performed

Full review triggered.

@coderabbitai
Copy link

coderabbitai bot commented Jan 18, 2026

Walkthrough

This pull request refactors the TCollection_ExtendedString class with internal member renaming from mystring/mylength to myString/myLength and updates all associated allocation and deallocation paths. New constructors are added for multiple source types including char16_t*, int, double, TCollection_AsciiString, and C++17 std::u16string_view. Additional public methods are introduced for string manipulation (LeftAdjust, RightJustify, Capitalize, etc.), location-based search (FirstLocationInSet, FirstLocationNotInSet), numeric parsing (IntegerValue, RealValue), and comparison utilities. A comprehensive unit test suite validates these methods across multiple scenarios. The XML driver file receives a minor change replacing a narrow character null terminator with a Unicode terminator.

🚥 Pre-merge checks | ❌ 3
❌ Failed checks (2 warnings, 1 inconclusive)
Check name Status Explanation Resolution
Description check ⚠️ Warning No pull request description was provided by the author, making it impossible to assess whether the description relates to the changeset. Add a clear pull request description explaining the purpose of the changes, the scope of modifications (UTF-16 support, new APIs, refactoring), and any breaking changes or migration notes.
Docstring Coverage ⚠️ Warning Docstring coverage is 70.09% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Ncollection strings' is vague and does not clearly convey the specific changes made. It appears to be a generic label rather than a descriptive summary of the primary modification. Use a more specific title that highlights the main change, e.g., 'Add UTF-16 support and string manipulation APIs to TCollection_ExtendedString' or 'Expand TCollection_ExtendedString with char16_t and std::u16string_view support'.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@src/FoundationClasses/TKernel/TCollection/TCollection_ExtendedString.cxx`:
- Around line 227-233: The TCollection_ExtendedString constructor that takes
(const int theLength, const char16_t theFiller) must validate theLength is
non-negative before calling allocate to avoid underflow/oversized allocations;
add a guard at the start of
TCollection_ExtendedString::TCollection_ExtendedString to check theLength >= 0
and handle invalid values (e.g., throw an appropriate exception or set length to
0) and only call allocate(theLength) and fill myString when theLength is valid.
- Around line 495-590: Several pointer+length string methods (IsEqual,
IsDifferent, IsLess, IsGreater, StartsWith, EndsWith) perform memcmp/indexing
without validating theOther or theLength, causing UB for nullptr or negative
lengths; add guarded early checks: if theLength < 0 return false (or false/true
consistent with semantics), treat theOther == nullptr && theLength == 0 as empty
(continue), otherwise if theOther == nullptr with positive length return false
(or appropriate comparison result), and only call memcmp or index myString when
theOther != nullptr and theLength > 0; apply the same guard pattern to each
named method (IsEqual, IsDifferent, IsLess, IsGreater, StartsWith, EndsWith).
- Around line 1200-1214: TCollection_ExtendedString::Prepend currently captures
theOther before reallocate which can dangle if theOther aliases into the current
buffer (self‑prepend); fix by detecting aliasing first: save the old buffer
pointer (oldPtr = myString) and old length (anOldLength), check if theOther lies
in [oldPtr, oldPtr + anOldLength) and if so compute an index offset = theOther -
oldPtr, then call reallocate(myLength + theLength), use memmove(myString +
theLength, myString, anOldLength * sizeof(char16_t)) to shift old contents, and
finally copy the prefix using memmove(myString, myString + theLength + index,
theLength * sizeof(char16_t)) when aliased (otherwise copy directly from
theOther with memcpy or memmove); reference functions/vars:
TCollection_ExtendedString::Prepend, reallocate, myString, myLength,
anOldLength, theOther, memmove/memcpy.
- Around line 305-341: The AssignCat overloads use memcpy after reallocate which
causes UB for self-append or when the source pointer lies inside the current
buffer (dangling after reallocate); change the logic in AssignCat(const
TCollection_ExtendedString&), AssignCat(const char16_t*,...), and the
pointer-based path to detect overlap or self-append (e.g. if &theOther == this
or if theString lies between myString and myString+myLength) and in those cases
first copy the source range into a temporary buffer (stack array or
std::vector<char16_t>), then call reallocate and memcpy from the temporary; for
non-overlapping cases you can reallocate and memcpy directly (or use memmove
where appropriate). Ensure references to reallocate, myString, myLength,
AssignCat and theOther are updated accordingly.
🧹 Nitpick comments (1)
src/FoundationClasses/TKernel/TCollection/TCollection_ExtendedString.hxx (1)

69-122: Use Standard_ typedefs for new public APIs.*

Several new signatures use raw int/double/bool (e.g., Line 69+, Line 893+, and the internal length helper/member). OCCT style prefers Standard_Integer, Standard_Real, and Standard_Boolean. Consider aligning these declarations (and the .cxx definitions) for consistency.

♻️ Example alignment (apply broadly)
-  Standard_EXPORT TCollection_ExtendedString(const int theValue);
-  Standard_EXPORT TCollection_ExtendedString(const double theValue);
+  Standard_EXPORT TCollection_ExtendedString(const Standard_Integer theValue);
+  Standard_EXPORT TCollection_ExtendedString(const Standard_Real theValue);

-  Standard_EXPORT int IntegerValue() const;
-  Standard_EXPORT bool IsIntegerValue() const;
-  Standard_EXPORT double RealValue() const;
-  Standard_EXPORT bool IsRealValue(bool theToCheckFull = false) const;
+  Standard_EXPORT Standard_Integer IntegerValue() const;
+  Standard_EXPORT Standard_Boolean IsIntegerValue() const;
+  Standard_EXPORT Standard_Real RealValue() const;
+  Standard_EXPORT Standard_Boolean IsRealValue(Standard_Boolean theToCheckFull = Standard_False) const;
As per learnings, prefer Standard_Real/Standard_Integer/Standard_Boolean in src/**/*.@(cxx|hxx).

Also applies to: 893-934, 949-969

Comment on lines 227 to 233
TCollection_ExtendedString::TCollection_ExtendedString(const int theLength,
const char16_t theFiller)
{
allocate(length);
for (int i = 0; i < length; i++)
mystring[i] = filler;
allocate(theLength);
for (int i = 0; i < theLength; i++)
myString[i] = theFiller;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate negative length in filler constructor.

Line 227 currently accepts negative lengths, which can underflow allocation sizing and trigger oversized allocations.

🛠 Proposed fix
TCollection_ExtendedString::TCollection_ExtendedString(const int theLength,
                                                       const char16_t theFiller)
{
+  if (theLength < 0)
+  {
+    throw Standard_OutOfRange("TCollection_ExtendedString : negative length");
+  }
   allocate(theLength);
   for (int i = 0; i < theLength; i++)
     myString[i] = theFiller;
}
🤖 Prompt for AI Agents
In `@src/FoundationClasses/TKernel/TCollection/TCollection_ExtendedString.cxx`
around lines 227 - 233, The TCollection_ExtendedString constructor that takes
(const int theLength, const char16_t theFiller) must validate theLength is
non-negative before calling allocate to avoid underflow/oversized allocations;
add a guard at the start of
TCollection_ExtendedString::TCollection_ExtendedString to check theLength >= 0
and handle invalid values (e.g., throw an appropriate exception or set length to
0) and only call allocate(theLength) and fill myString when theLength is valid.

Comment on lines 305 to 341
void TCollection_ExtendedString::AssignCat(const TCollection_ExtendedString& theOther)
{
if (theOther.mylength == 0)
if (theOther.myLength == 0)
{
return;
}

const int anOtherLength = theOther.mylength;
const int anOldLength = mylength;
reallocate(mylength + anOtherLength);
memcpy(mystring + anOldLength, theOther.mystring, anOtherLength * sizeof(char16_t));
const int anOtherLength = theOther.myLength;
const int anOldLength = myLength;
reallocate(myLength + anOtherLength);
memcpy(myString + anOldLength, theOther.myString, anOtherLength * sizeof(char16_t));
}

//=================================================================================================
//==================================================================================================

void TCollection_ExtendedString::AssignCat(const char16_t theChar)
{
if (theChar != '\0')
if (theChar != u'\0')
{
reallocate(mylength + 1);
mystring[mylength - 1] = theChar;
reallocate(myLength + 1);
myString[myLength - 1] = theChar;
}
}

//=================================================================================================
//==================================================================================================

TCollection_ExtendedString TCollection_ExtendedString::Cat(
const TCollection_ExtendedString& other) const
void TCollection_ExtendedString::AssignCat(const char16_t* theString, const int theLength)
{
TCollection_ExtendedString res(mylength + other.mylength, 0);
if (mylength > 0)
memcpy(res.mystring, mystring, mylength * sizeof(char16_t));
if (other.mylength > 0)
memcpy(res.mystring + mylength, other.mystring, other.mylength * sizeof(char16_t));
return res;
if (theString == nullptr || theLength <= 0)
{
return;
}

const int anOldLength = myLength;
reallocate(myLength + theLength);
memcpy(myString + anOldLength, theString, theLength * sizeof(char16_t));
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fix self‑append / overlapping source handling in AssignCat.

Line 315 uses memcpy after reallocate, so s += s or overlapping sources are UB; the pointer+length overload can also dangle if the source points into the current buffer.

🛠 Proposed fix
void TCollection_ExtendedString::AssignCat(const TCollection_ExtendedString& theOther)
{
  if (theOther.myLength == 0)
  {
    return;
  }

  const int anOtherLength = theOther.myLength;
  const int anOldLength   = myLength;
  reallocate(myLength + anOtherLength);
-  memcpy(myString + anOldLength, theOther.myString, anOtherLength * sizeof(char16_t));
+  // memmove to allow self-append (overlapping ranges)
+  memmove(myString + anOldLength, theOther.myString, anOtherLength * sizeof(char16_t));
}

void TCollection_ExtendedString::AssignCat(const char16_t* theString, const int theLength)
{
  if (theString == nullptr || theLength <= 0)
  {
    return;
  }

+  // Guard against aliasing with this->myString (reallocate may move the buffer)
+  if (theString >= myString && theString < myString + myLength)
+  {
+    TCollection_ExtendedString aTmp(theString, theLength);
+    AssignCat(aTmp);
+    return;
+  }
  const int anOldLength = myLength;
  reallocate(myLength + theLength);
  memcpy(myString + anOldLength, theString, theLength * sizeof(char16_t));
}
🤖 Prompt for AI Agents
In `@src/FoundationClasses/TKernel/TCollection/TCollection_ExtendedString.cxx`
around lines 305 - 341, The AssignCat overloads use memcpy after reallocate
which causes UB for self-append or when the source pointer lies inside the
current buffer (dangling after reallocate); change the logic in AssignCat(const
TCollection_ExtendedString&), AssignCat(const char16_t*,...), and the
pointer-based path to detect overlap or self-append (e.g. if &theOther == this
or if theString lies between myString and myString+myLength) and in those cases
first copy the source range into a temporary buffer (stack array or
std::vector<char16_t>), then call reallocate and memcpy from the temporary; for
non-overlapping cases you can reallocate and memcpy directly (or use memmove
where appropriate). Ensure references to reallocate, myString, myLength,
AssignCat and theOther are updated accordingly.

Comment on lines 495 to 590
bool TCollection_ExtendedString::IsEqual(const char16_t* theOther, const int theLength) const
{
return ExtStrCmp(mystring, other) < 0;
if (myLength != theLength)
{
return false;
}
if (theLength == 0)
{
return true;
}
return memcmp(myString, theOther, theLength * sizeof(char16_t)) == 0;
}

//=================================================================================================
//==================================================================================================

bool TCollection_ExtendedString::IsLess(const TCollection_ExtendedString& other) const
bool TCollection_ExtendedString::IsDifferent(const char16_t* theOther, const int theLength) const
{
return ExtStrCmp(mystring, other.mystring) < 0;
if (myLength != theLength)
{
return true;
}
if (theLength == 0)
{
return false;
}
return memcmp(myString, theOther, theLength * sizeof(char16_t)) != 0;
}

//=================================================================================================
//==================================================================================================

bool TCollection_ExtendedString::IsGreater(const char16_t* other) const
bool TCollection_ExtendedString::IsLess(const char16_t* theOther, const int theLength) const
{
return ExtStrCmp(mystring, other) > 0;
const int aMinLen = (myLength < theLength) ? myLength : theLength;
for (int i = 0; i < aMinLen; ++i)
{
if (myString[i] < theOther[i])
{
return true;
}
if (myString[i] > theOther[i])
{
return false;
}
}
return myLength < theLength;
}

//=================================================================================================
//==================================================================================================

bool TCollection_ExtendedString::IsGreater(const TCollection_ExtendedString& other) const
bool TCollection_ExtendedString::IsGreater(const char16_t* theOther, const int theLength) const
{
return ExtStrCmp(mystring, other.mystring) > 0;
const int aMinLen = (myLength < theLength) ? myLength : theLength;
for (int i = 0; i < aMinLen; ++i)
{
if (myString[i] > theOther[i])
{
return true;
}
if (myString[i] < theOther[i])
{
return false;
}
}
return myLength > theLength;
}

//=================================================================================================
//==================================================================================================

bool TCollection_ExtendedString::StartsWith(const TCollection_ExtendedString& theStartString) const
bool TCollection_ExtendedString::StartsWith(const char16_t* theStartString,
const int theLength) const
{
if (this == &theStartString)
if (theLength == 0)
{
return true;
}

return mylength >= theStartString.mylength
&& memcmp(theStartString.mystring, mystring, theStartString.mylength * sizeof(char16_t))
== 0;
if (myLength < theLength)
{
return false;
}
return memcmp(myString, theStartString, theLength * sizeof(char16_t)) == 0;
}

//=================================================================================================
//==================================================================================================

bool TCollection_ExtendedString::EndsWith(const TCollection_ExtendedString& theEndString) const
bool TCollection_ExtendedString::EndsWith(const char16_t* theEndString, const int theLength) const
{
if (this == &theEndString)
if (theLength == 0)
{
return true;
}

return mylength >= theEndString.mylength
&& memcmp(theEndString.mystring,
mystring + mylength - theEndString.mylength,
theEndString.mylength * sizeof(char16_t))
== 0;
if (myLength < theLength)
{
return false;
}
return memcmp(myString + myLength - theLength, theEndString, theLength * sizeof(char16_t)) == 0;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add nullptr/negative‑length guards for pointer+length comparisons.

Line 495+ uses memcmp/indexing without validating theOther or theLength; a nullptr with positive length (or a negative length) is UB. Consider early validation (e.g., treat nullptr+0 as empty, otherwise return false/throw) across IsEqual, IsDifferent, IsLess, IsGreater, StartsWith, and EndsWith.

🛠 Proposed fix (apply similarly to the other pointer+length overloads)
bool TCollection_ExtendedString::IsEqual(const char16_t* theOther, const int theLength) const
{
+  if (theLength < 0)
+  {
+    throw Standard_OutOfRange("TCollection_ExtendedString::IsEqual : negative length");
+  }
+  if (theOther == nullptr)
+  {
+    return (myLength == 0 && theLength == 0);
+  }
  if (myLength != theLength)
  {
    return false;
  }
  if (theLength == 0)
  {
    return true;
  }
  return memcmp(myString, theOther, theLength * sizeof(char16_t)) == 0;
}
🤖 Prompt for AI Agents
In `@src/FoundationClasses/TKernel/TCollection/TCollection_ExtendedString.cxx`
around lines 495 - 590, Several pointer+length string methods (IsEqual,
IsDifferent, IsLess, IsGreater, StartsWith, EndsWith) perform memcmp/indexing
without validating theOther or theLength, causing UB for nullptr or negative
lengths; add guarded early checks: if theLength < 0 return false (or false/true
consistent with semantics), treat theOther == nullptr && theLength == 0 as empty
(continue), otherwise if theOther == nullptr with positive length return false
(or appropriate comparison result), and only call memcmp or index myString when
theOther != nullptr and theLength > 0; apply the same guard pattern to each
named method (IsEqual, IsDifferent, IsLess, IsGreater, StartsWith, EndsWith).

Comment on lines 1200 to 1214
void TCollection_ExtendedString::Prepend(const char16_t* theOther, const int theLength)
{
if (theOther == nullptr || theLength <= 0)
{
return;
}

const int anOldLength = myLength;
reallocate(myLength + theLength);

// Move existing characters to the end
memmove(myString + theLength, myString, anOldLength * sizeof(char16_t));
// Copy theOther to the beginning
memcpy(myString, theOther, theLength * sizeof(char16_t));
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Prepend needs overlap‑safe handling (self‑prepend / aliasing).

Line 1208 captures the source pointer before reallocate; if theOther points into this string (including self‑prepend), that pointer can dangle after reallocation.

🛠 Proposed fix
void TCollection_ExtendedString::Prepend(const char16_t* theOther, const int theLength)
{
  if (theOther == nullptr || theLength <= 0)
  {
    return;
  }

+  // Protect against aliasing (e.g., self-prepend)
+  TCollection_ExtendedString aTmp;
+  if (theOther >= myString && theOther < myString + myLength)
+  {
+    aTmp    = TCollection_ExtendedString(theOther, theLength);
+    theOther = aTmp.myString;
+  }
+
  const int anOldLength = myLength;
  reallocate(myLength + theLength);

  // Move existing characters to the end
  memmove(myString + theLength, myString, anOldLength * sizeof(char16_t));
  // Copy theOther to the beginning
  memcpy(myString, theOther, theLength * sizeof(char16_t));
}
🤖 Prompt for AI Agents
In `@src/FoundationClasses/TKernel/TCollection/TCollection_ExtendedString.cxx`
around lines 1200 - 1214, TCollection_ExtendedString::Prepend currently captures
theOther before reallocate which can dangle if theOther aliases into the current
buffer (self‑prepend); fix by detecting aliasing first: save the old buffer
pointer (oldPtr = myString) and old length (anOldLength), check if theOther lies
in [oldPtr, oldPtr + anOldLength) and if so compute an index offset = theOther -
oldPtr, then call reallocate(myLength + theLength), use memmove(myString +
theLength, myString, anOldLength * sizeof(char16_t)) to shift old contents, and
finally copy the prefix using memmove(myString, myString + theLength + index,
theLength * sizeof(char16_t)) when aliased (otherwise copy directly from
theOther with memcpy or memmove); reference functions/vars:
TCollection_ExtendedString::Prepend, reallocate, myString, myLength,
anOldLength, theOther, memmove/memcpy.

Base automatically changed from IR to master February 15, 2026 19:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant