Conversation

@Sulfoxide1819

Add new method IsAlpha and rewrite IsNameStartChar to avoid using library functions. Since the old functions were applied only to US-ASCII characters, a bit mask can be used in the rewritten methods. This gives a good speedup: 11-15% on x86 and about 20% on RISC-V.

Copilot AI left a comment

Pull request overview

This PR replaces standard library functions isalpha and isspace with custom bit mask implementations to achieve significant performance improvements (11-15% on x86, ~20% on RISC-V). The optimization focuses on US-ASCII character processing using efficient bit manipulation operations.

Key changes:

  • Introduced new IsSpace function using a 64-bit bitmask to identify whitespace characters (tab, newline, vertical tab, form feed, carriage return, space)
  • Rewrote IsNameStartChar to use a 4-element bitmask array covering the full 8-bit character range
  • Added comprehensive unit tests for IsNameStartChar covering edge cases and boundary values
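
For readers who want to see the idea behind the 4-element bitmask array, here is a minimal sketch. It is not the PR's actual code: the names `IsNameStartCharSketch`, `Bit`, and `Range` are hypothetical, and it assumes an ASCII execution character set. Each 64-bit word covers 64 consecutive byte values; `ch >> 6` selects the word and `ch & 63` selects the bit within it.

```cpp
#include <cstdint>

// Hypothetical helpers: build mask words at compile time (assumes ASCII).
// Bit(c) sets the bit for byte c inside its own 64-value word.
constexpr std::uint64_t Bit( int c ) {
    return 1ULL << ( c & 63 );
}
// Range(lo, hi) sets the bits for bytes lo..hi; both must lie in the same word.
constexpr std::uint64_t Range( int lo, int hi ) {
    return ( ~0ULL >> ( 63 - ( hi - lo ) ) ) << ( lo & 63 );
}

// Sketch of a table-driven check: true for ':', '_', ASCII letters,
// and every byte >= 128.
inline bool IsNameStartCharSketch( unsigned char ch ) {
    static constexpr std::uint64_t mask[4] = {
        Bit( ':' ),                                      // bytes 0..63
        Range( 'A', 'Z' ) | Bit( '_' ) | Range( 'a', 'z' ), // bytes 64..127
        ~0ULL,                                           // bytes 128..191
        ~0ULL                                            // bytes 192..255
    };
    return ( mask[ch >> 6] >> ( ch & 63 ) ) & 1;
}
```

Because the index `ch >> 6` is always in 0..3 for an unsigned char, the lookup needs no range check, which is presumably where the saving over a library call comes from.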

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File | Description
tinyxml2.h | Implements bit mask-based character classification functions replacing library calls
xmltest.cpp | Adds unit tests for IsNameStartChar validating correct behavior for special characters, alphabetic ranges, and high-order bytes

Comment on lines +564 to +576
inline static bool IsSpace( unsigned char ch ) {
    static constexpr uint64_t mask =
          1ULL << 9
        | 1ULL << 10
        | 1ULL << 11
        | 1ULL << 12
        | 1ULL << 13
        | 1ULL << 32;
    if ( ch > 32 ) {
        return false;
    }
    return mask >> ch & 1;
}

Copilot AI Nov 23, 2025

[nitpick] Missing documentation for new function. Consider adding a comment explaining what this function does, similar to the comment on line 582 for IsNameStartChar. For example: "// Checks if a character is whitespace (tab, newline, vertical tab, form feed, carriage return, or space)"

Owner

This is a pretty opaque (if cool) function. Why does this even work? Is there a reference for where the algorithm comes from?
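
One way to see why it works, offered as a sketch rather than as the PR author's explanation: the six whitespace characters have ASCII codes 9 through 13 and 32, all below 64, so a single 64-bit integer can act as a 64-entry table of booleans in which bit i is set exactly when byte i is whitespace. Shifting the mask right by `ch` moves that entry into bit 0. The `ch > 32` guard rejects every other byte and also keeps the shift count below 64, since shifting a 64-bit value by 64 or more is undefined behavior. The names `kSpaceMask` and `IsSpaceSketch` below are hypothetical, and an ASCII execution character set is assumed.

```cpp
#include <cstdint>

// Build the table from the characters themselves (assumes ASCII codes).
constexpr std::uint64_t kSpaceMask =
      ( 1ULL << '\t' ) | ( 1ULL << '\n' ) | ( 1ULL << '\v' )
    | ( 1ULL << '\f' ) | ( 1ULL << '\r' ) | ( 1ULL << ' ' );

// Same value as the literal bit positions used in the diff: 9..13 and 32.
static_assert( kSpaceMask == 0x100003E00ULL, "bits 9-13 and bit 32" );

// The range check short-circuits, so the shift count never exceeds 32.
constexpr bool IsSpaceSketch( unsigned char ch ) {
    return ch <= 32 && ( ( kSpaceMask >> ch ) & 1 );
}

static_assert( IsSpaceSketch( ' ' ), "space is whitespace" );
static_assert( IsSpaceSketch( '\n' ), "newline is whitespace" );
static_assert( !IsSpaceSketch( 'a' ), "letters are rejected by the range check" );
static_assert( !IsSpaceSketch( 0 ), "NUL is in range but its bit is clear" );
```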

    if ( ch > 32 ) {
        return false;
    }
    return mask >> ch & 1;

Copilot AI Nov 23, 2025

[nitpick] Consider adding parentheses to the bitwise operation. In C++ the shift operators bind more tightly than bitwise AND, so mask >> ch & 1 already parses as (mask >> ch) & 1 and returns the intended bit; it is not evaluated as mask >> (ch & 1). Explicit parentheses only make that grouping obvious to readers.

Optional, for readability: return (mask >> ch) & 1;

Suggested change
- return mask >> ch & 1;
+ return (mask >> ch) & 1;
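
The grouping can be checked at compile time. The following is a small sketch with hypothetical constants (`kMask`, `kCh`), relying only on the standard C++ precedence rule that `>>` binds more tightly than `&`:

```cpp
#include <cstdint>

constexpr std::uint64_t kMask = 1ULL << 10;  // only bit 10 is set
constexpr unsigned char kCh = 10;

// Unparenthesized and parenthesized spellings give the same result.
static_assert( ( kMask >> kCh & 1 ) == ( ( kMask >> kCh ) & 1 ), "same grouping" );
static_assert( ( kMask >> kCh & 1 ) == 1, "bit 10 is reported as set" );

// The other grouping would compute something different:
// 10 & 1 is 0, so the mask would not be shifted at all.
static_assert( ( kMask >> ( kCh & 1 ) ) == ( 1ULL << 10 ), "mask left unshifted" );
```

Both forms of the return statement therefore compile to the same behavior; the parenthesized one is purely a readability choice.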

Comment on lines +2733 to +2750
    // ---------- Testing IsNameStartChar ----------
    {
        XMLUtil test;
        // Tests validate key edge cases for IsNameStartChar without exhaustive coverage
        XMLTest("IsNameStartChar(':')", true, test.IsNameStartChar(':'));
        XMLTest("IsNameStartChar('_')", true, test.IsNameStartChar('_'));
        XMLTest("IsNameStartChar('@')", false, test.IsNameStartChar('@'));
        XMLTest("IsNameStartChar('A')", true, test.IsNameStartChar('A'));
        XMLTest("IsNameStartChar('Z')", true, test.IsNameStartChar('Z'));
        XMLTest("IsNameStartChar('[')", false, test.IsNameStartChar('['));
        XMLTest("IsNameStartChar('`')", false, test.IsNameStartChar('`'));
        XMLTest("IsNameStartChar('a')", true, test.IsNameStartChar('a'));
        XMLTest("IsNameStartChar('z')", true, test.IsNameStartChar('z'));
        XMLTest("IsNameStartChar('{')", false, test.IsNameStartChar('{'));
        XMLTest("IsNameStartChar(127)", false, test.IsNameStartChar(static_cast<unsigned char>(127)));
        XMLTest("IsNameStartChar(128)", true, test.IsNameStartChar(static_cast<unsigned char>(128)));
        XMLTest("IsNameStartChar(255)", true, test.IsNameStartChar(static_cast<unsigned char>(255)));
    }

Copilot AI Nov 23, 2025

Inconsistent indentation: this test block uses spaces for indentation, while the rest of the file uses tabs. Please convert the indentation to tabs to match the existing code style.

Suggested change (indentation only: leading spaces replaced with tabs)
	// ---------- Testing IsNameStartChar ----------
	{
		XMLUtil test;
		// Tests validate key edge cases for IsNameStartChar without exhaustive coverage
		XMLTest("IsNameStartChar(':')", true, test.IsNameStartChar(':'));
		XMLTest("IsNameStartChar('_')", true, test.IsNameStartChar('_'));
		XMLTest("IsNameStartChar('@')", false, test.IsNameStartChar('@'));
		XMLTest("IsNameStartChar('A')", true, test.IsNameStartChar('A'));
		XMLTest("IsNameStartChar('Z')", true, test.IsNameStartChar('Z'));
		XMLTest("IsNameStartChar('[')", false, test.IsNameStartChar('['));
		XMLTest("IsNameStartChar('`')", false, test.IsNameStartChar('`'));
		XMLTest("IsNameStartChar('a')", true, test.IsNameStartChar('a'));
		XMLTest("IsNameStartChar('z')", true, test.IsNameStartChar('z'));
		XMLTest("IsNameStartChar('{')", false, test.IsNameStartChar('{'));
		XMLTest("IsNameStartChar(127)", false, test.IsNameStartChar(static_cast<unsigned char>(127)));
		XMLTest("IsNameStartChar(128)", true, test.IsNameStartChar(static_cast<unsigned char>(128)));
		XMLTest("IsNameStartChar(255)", true, test.IsNameStartChar(static_cast<unsigned char>(255)));
	}

    return IsSpace( static_cast<unsigned char>(p) );
}

// The method checks a char for matching ':', '_', alphabetic symbols or char >= 128 by bit mask

Copilot AI Nov 23, 2025

[nitpick] Trailing whitespace at end of comment line. Remove the trailing space after "bit mask".

Suggested change (whitespace only: trailing space removed)
// The method checks a char for matching ':', '_', alphabetic symbols or char >= 128 by bit mask


2 participants