Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

impl(spanner): add uuid type #15023

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

scotthart
Copy link
Member

@scotthart scotthart commented Mar 10, 2025

This change is Reviewable

@scotthart scotthart requested a review from a team as a code owner March 10, 2025 17:12
@product-auto-label product-auto-label bot added the api: spanner Issues related to the Spanner API. label Mar 10, 2025
@scotthart scotthart requested a review from cuiy0006 March 10, 2025 17:12
Copy link

codecov bot commented Mar 10, 2025

Codecov Report

Attention: Patch coverage is 98.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 92.90%. Comparing base (518d46a) to head (5fd97bc).
Report is 10 commits behind head on main.

Files with missing lines Patch % Lines
google/cloud/spanner/uuid.cc 95.83% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##           main   #15023       +/-   ##
=========================================
+ Coverage      0   92.90%   +92.90%     
=========================================
  Files         0     2354     +2354     
  Lines         0   210355   +210355     
=========================================
+ Hits          0   195435   +195435     
- Misses        0    14920    +14920     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member Author

@scotthart scotthart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 1 unresolved discussion (waiting on @cuiy0006)

@scotthart
Copy link
Member Author

/gcbrun

kMaxUuidNumberOfHexDigits, str),
GCP_ERROR_INFO());
}
std::string original_str = std::string(str);
Copy link
Contributor

@devbww devbww Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this could/should be done without copying the string.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's necessary to copy the original string for error message contents.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, copying the string isn't necessary. Nothing here mutates the characters of the argument string_view.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few of the operations performed do mutate the string_view. I added additional expectation criteria to the tests that verified that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You appear to be confusing "mutate the string_view" with "mutate the characters of the string_view". Nothing does the latter. Indeed, string_view prohibits it. So, there is no need to copy the string.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying the string_view instead.

str.remove_suffix(1);
}

// Check for leading hyphen after braces.
Copy link
Contributor

@devbww devbww Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"hyphen after braces" implies that this only applies to a hyphen after a brace. So, perhaps "hyphen after stripping any surrounding braces" would sound better?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
std::string original_str = std::string(str);
// Check and remove optional braces
bool has_braces = (str[0] == '{');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has_braces == true seems wrong for something we only know starts with a brace. Given that the variable is only used once, I'd just get rid of it. For example, ...

  if (absl::ConsumePrefix(&str, "{")) {
    if (!absl::ConsumeSuffix(&str, "}") {
      return internal::InvalidArgumentError(...);
    }
  }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 46 to 47
absl::StrFormat("UUID must be at least %d characters long: %s",
kMaxUuidNumberOfHexDigits, original_str),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider a string of 31 hex digits, each separated by a hyphen, for a total length of 61 chars. Complaining that the string must be at least 32 chars long, when it is, seems wrong. Something about "not enough hex digits" would be more appropriate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this error message and some others to specify hex digits as appropriate.

// Helper function to parse a single hexadecimal block of a UUID.
// A hexadecimal block is a 16-digit hexadecimal number, which is represented
// as 8 bytes.
StatusOr<uint64_t> ParseHexBlock(absl::string_view& str,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/uint64_t/std::uint64_t/ et seq.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 50 to 48
if (str[0] == '-') {
return internal::InvalidArgumentError(
absl::StrFormat("UUID cannot contain consecutive hyphens: %s",
original_str),
GCP_ERROR_INFO());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be clearer to move this into the if (it == char_to_hex->end() block, if only to highlight that we're just special-casing the error message for a particular invalid character.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 100 to 108
to_hex(high_bits, 15, 8, target);
*(target + kHyphenPos[0]) = '-';
to_hex(high_bits, 7, 4, target + kHyphenPos[0] + 1);
*(target + kHyphenPos[1]) = '-';
to_hex(high_bits, 3, 0, target + kHyphenPos[1] + 1);
*(target + kHyphenPos[2]) = '-';
to_hex(low_bits, 15, 12, target + kHyphenPos[2] + 1);
*(target + kHyphenPos[3]) = '-';
to_hex(low_bits, 11, 0, target + kHyphenPos[3] + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're going to generalize using kHyphenPos, you should also compute the start/end_index pairs using it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I began computing start/end pairs per call to to_hex, looked at the results and thought "I keep recomputing the values of start/end, this could be a loop". Refactored it into a loop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the operator std::string() snippet above, which has a single loop, and doesn't do any index computations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #15043 for future improvements as I'm out of time budget to work on this.

return !(lhs < rhs);
}
friend bool operator>(Uuid const& lhs, Uuid const& rhs) {
return (rhs < lhs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove redundant parentheses?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


StatusOr<Uuid> MakeUuid(absl::string_view str) {
// Early checks for invalid length or leading hyphen
if (str.size() < kMaxUuidNumberOfHexDigits) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test seems best omitted, and then handled by the actual failure to find enough hex digits.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this check as recommended, and also added a guard for an empty string.

Copy link
Member Author

@scotthart scotthart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, Brad. I appreciate you donating your time.

Reviewable status: 0 of 6 files reviewed, 10 unresolved discussions (waiting on @cuiy0006 and @devbww)

return !(lhs < rhs);
}
friend bool operator>(Uuid const& lhs, Uuid const& rhs) {
return (rhs < lhs);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// Helper function to parse a single hexadecimal block of a UUID.
// A hexadecimal block is a 16-digit hexadecimal number, which is represented
// as 8 bytes.
StatusOr<uint64_t> ParseHexBlock(absl::string_view& str,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 46 to 47
absl::StrFormat("UUID must be at least %d characters long: %s",
kMaxUuidNumberOfHexDigits, original_str),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this error message and some others to specify hex digits as appropriate.

Comment on lines 50 to 48
if (str[0] == '-') {
return internal::InvalidArgumentError(
absl::StrFormat("UUID cannot contain consecutive hyphens: %s",
original_str),
GCP_ERROR_INFO());
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 100 to 108
to_hex(high_bits, 15, 8, target);
*(target + kHyphenPos[0]) = '-';
to_hex(high_bits, 7, 4, target + kHyphenPos[0] + 1);
*(target + kHyphenPos[1]) = '-';
to_hex(high_bits, 3, 0, target + kHyphenPos[1] + 1);
*(target + kHyphenPos[2]) = '-';
to_hex(low_bits, 15, 12, target + kHyphenPos[2] + 1);
*(target + kHyphenPos[3]) = '-';
to_hex(low_bits, 11, 0, target + kHyphenPos[3] + 1);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I began computing start/end pairs per call to to_hex, looked at the results and thought "I keep recomputing the values of start/end, this could be a loop". Refactored it into a loop.


StatusOr<Uuid> MakeUuid(absl::string_view str) {
// Early checks for invalid length or leading hyphen
if (str.size() < kMaxUuidNumberOfHexDigits) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this check as recommended, and also added a guard for an empty string.

kMaxUuidNumberOfHexDigits, str),
GCP_ERROR_INFO());
}
std::string original_str = std::string(str);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's necessary to copy the original string for error message contents.

}
std::string original_str = std::string(str);
// Check and remove optional braces
bool has_braces = (str[0] == '{');
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

str.remove_suffix(1);
}

// Check for leading hyphen after braces.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* Parses a textual representation a `Uuid` from a string of hexadecimal digits.
* Returns an error if unable to parse the given input.
*
* Acceptable input strings must consist of [0-f] digits with formatting such:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[0-f] is not good definition of a character class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expanded definition. If this still isn't a "good" definition, please suggest one.

/// Default construction yields a zero value UUID.
Uuid() = default;

/// Construct a UUID from a packed integer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what "packed" is supposed to convey.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reworded

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, although we now have "128 bit unsigned" and "unsigned 64 bit" in consecutive comments. Consistency would be nice. And "N-bit" should be hyphenated when used as an adjective.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworded across usages for consistency.

Comment on lines 44 to 45
/// Construct a UUID from two 64 bit pieces.
Uuid(std::uint64_t high_bits, std::uint64_t low_bits);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd get rid of this constructor. If the caller has two std::uint64_ts, they can build the absl::uint128 by themselves. It shouldn't be part of this class (just like there shouldn't be an accessor that returns a pair of std::uint64_ts).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to give users a options that uses std integer types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, there should be an accessor that returns a pair of std::uint64_ts.

And, you need to #include <cstdint>.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// @name Conversion to packed integer representation.
explicit operator absl::uint128() const { return uuid_; }

/// @name Conversion to a lower case string using formatted:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"using formatted:" doesn't parse.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reworded

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/ using//

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


std::string output;
output.resize(kUuidStringLen);
char* target = &((output)[output.size() - kUuidStringLen]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output.size() - kUuidStringLen is zero. Why obfuscate that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

De-obfuscated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const_casting isn't good either. Why not just ...

  char* target = &output[0];
  char* const last = &output[output.size()];

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

&output[0] turns into a clang-tidy diagnostic about using .data() instead which until c++17 only returns a const char*. I have multiple non-ideal choices.

Comment on lines +140 to +143
auto high_bits = ParseHexBlock(str, original_str);
if (!high_bits.ok()) return std::move(high_bits).status();
auto low_bits = ParseHexBlock(str, original_str);
if (!low_bits.ok()) return std::move(low_bits).status();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, I would suggest not introducing two 64-bit values here. (Otherwise, why not four 32-bit values, for example?) A single 128-bit value should suffice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lacking a std::uint128_t and absl::uint128 not behaving exactly like a std::uint128_t, std::uint64_t is the largest integer currently available.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I don't know what absl::uint128 behavior you think is lacking. For example,

#include <cctype>
#include <cstring>

constexpr char kHexDigits[] = "0123456789abcdef";

StatusOr<Uuid> MakeUuid(absl::string_view str) {
  absl::uint128 uuid = 0;
  auto const original_str = str;

  if (absl::ConsumePrefix(&str, "{")) {
    if (!absl::ConsumeSuffix(&str, "}")) {
      return internal::InvalidArgumentError(
          absl::StrFormat("UUID missing closing '}': %s", original_str),
          GCP_ERROR_INFO());
    }
  }
  if (absl::StartsWith(str, "-")) {
    return internal::InvalidArgumentError(
        absl::StrFormat("UUID cannot begin with '-': %s", original_str),
        GCP_ERROR_INFO());
  }

  constexpr int kUuidNumberOfHexDigits = 32;
  for (int j = 0; j != kUuidNumberOfHexDigits; ++j) {
    absl::ConsumePrefix(&str, "-");
    if (str.empty()) {
      return internal::InvalidArgumentError(
          absl::StrFormat("UUID must contain %d hexadecimal digits: %s",
                          kUuidNumberOfHexDigits, original_str),
          GCP_ERROR_INFO());
    }
    auto const* dp = std::strchr(
        kHexDigits, std::tolower(static_cast<unsigned char>(str[0])));
    if (dp == nullptr) {
      if (str[0] == '-') {
        return internal::InvalidArgumentError(
            absl::StrFormat("UUID cannot contain consecutive hyphens: %s",
                            original_str),
            GCP_ERROR_INFO());
      }
      return internal::InvalidArgumentError(
          absl::StrFormat("UUID contains invalid character (%c): %s", str[0],
                          original_str),
          GCP_ERROR_INFO());
    }
    str.remove_prefix(1);
    uuid <<= 4;
    uuid += dp - kHexDigits;
  }

  if (!str.empty()) {
    return internal::InvalidArgumentError(
        absl::StrFormat("Extra characters found after parsing UUID: %s", str),
        GCP_ERROR_INFO());
  }
  return Uuid{uuid};
}

No "blocks" (or hash maps). Just a simple per-nibble loop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #15043 for future improvements as I'm out of time budget to work on this.

Comment on lines 147 to 148
absl::StrFormat("Extra characters (%d) found after parsing UUID: %s",
str.size(), original_str),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not provide the actual extra characters rather than the number of them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

StatusOr<Uuid> MakeUuid(absl::string_view str) {
if (str.empty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this check as recommended, and also added a guard for an empty string.

I wouldn't make a special case of the empty string either. Just change the

  if (str[0] == '-') {

test to

  if (absl::StartsWith(str, "-")) {

and the empty string will be treated like any other illegal argument.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer the early exit and specific error message.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Early exit" is only a preferred strategy when it makes the following code simpler. Here it remains the same.

Making "UUID cannot be empty" a special case over "UUID must contain 32 hexadecimal digits" is not worth the redundancy. It makes the code more cluttered and more difficult to understand. And why shouldn't MakeUuid("{}") behave the same way?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this check and added a test for "{}". Both now result in a satisfactory error message.

GOOGLE_CLOUD_CPP_INLINE_NAMESPACE_BEGIN
namespace {

constexpr int kMaxUuidNumberOfHexDigits = 32;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this localized to ParseHexBlock() just like kMaxUuidBlockLength is?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 58 to 62
friend bool operator==(Uuid const& lhs, Uuid const& rhs);
friend bool operator!=(Uuid const& lhs, Uuid const& rhs) {
return !(lhs == rhs);
}
friend bool operator<(Uuid const& lhs, Uuid const& rhs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that == and < are simple operations on the representation, I don't know why we wouldn't choose to inline them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member Author

@scotthart scotthart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 21 unresolved discussions (waiting on @cuiy0006 and @devbww)

/// Default construction yields a zero value UUID.
Uuid() = default;

/// Construct a UUID from a packed integer.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reworded

Comment on lines 44 to 45
/// Construct a UUID from two 64 bit pieces.
Uuid(std::uint64_t high_bits, std::uint64_t low_bits);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to give users a options that uses std integer types.

Comment on lines 58 to 62
friend bool operator==(Uuid const& lhs, Uuid const& rhs);
friend bool operator!=(Uuid const& lhs, Uuid const& rhs) {
return !(lhs == rhs);
}
friend bool operator<(Uuid const& lhs, Uuid const& rhs);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// @name Conversion to packed integer representation.
explicit operator absl::uint128() const { return uuid_; }

/// @name Conversion to a lower case string using formatted:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reworded

* Parses a textual representation a `Uuid` from a string of hexadecimal digits.
* Returns an error if unable to parse the given input.
*
* Acceptable input strings must consist of [0-f] digits with formatting such:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expanded definition. If this still isn't a "good" definition, please suggest one.


std::string output;
output.resize(kUuidStringLen);
char* target = &((output)[output.size() - kUuidStringLen]);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

De-obfuscated

output.resize(kUuidStringLen);
char* target = &((output)[output.size() - kUuidStringLen]);
char* const last = &((output)[output.size()]);
auto bits = Uint128High64(uuid_);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

absl::uint128 isn't a drop in replacement in the existing code (I tried). When we get a std::uint128_t I'm happy to revisit.

}

StatusOr<Uuid> MakeUuid(absl::string_view str) {
if (str.empty()) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer the early exit and specific error message.

Comment on lines +140 to +143
auto high_bits = ParseHexBlock(str, original_str);
if (!high_bits.ok()) return std::move(high_bits).status();
auto low_bits = ParseHexBlock(str, original_str);
if (!low_bits.ok()) return std::move(low_bits).status();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lacking a std::uint128_t and absl::uint128 not behaving exactly like a std::uint128_t, std::uint64_t is the largest integer currently available.

Comment on lines 147 to 148
absl::StrFormat("Extra characters (%d) found after parsing UUID: %s",
str.size(), original_str),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@devbww devbww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're welcome.

Aside: "switch from absl::flat_hash_map to std::unordered_map for 32 bit compatibility" sounds wacky.

Reviewed 3 of 6 files at r1, 3 of 3 files at r6, all commit messages.
Reviewable status: all files reviewed, 12 unresolved discussions (waiting on @cuiy0006 and @scotthart)


google/cloud/spanner/uuid.cc line 31 at r6 (raw file):

StatusOr<std::uint64_t> ParseHexBlock(absl::string_view& str,
                                      absl::string_view original_str) {
  constexpr int kMaxUuidNumberOfHexDigits = 32;

In what sense is this a "Max" value? That implies that we could have fewer hex digits, but that is not the case.


google/cloud/spanner/uuid.cc line 33 at r6 (raw file):

  constexpr int kMaxUuidNumberOfHexDigits = 32;
  constexpr int kMaxUuidBlockLength = 16;
  static auto const* char_to_hex = new std::unordered_map<char, std::uint8_t>(

#include <unordered_map>

/// Default construction yields a zero value UUID.
Uuid() = default;

/// Construct a UUID from a packed integer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, although we now have "128 bit unsigned" and "unsigned 64 bit" in consecutive comments. Consistency would be nice. And "N-bit" should be hyphenated when used as an adjective.

Comment on lines 44 to 45
/// Construct a UUID from two 64 bit pieces.
Uuid(std::uint64_t high_bits, std::uint64_t low_bits);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, there should be an accessor that returns a pair of std::uint64_ts.

And, you need to #include <cstdint>.

/// @name Conversion to packed integer representation.
explicit operator absl::uint128() const { return uuid_; }

/// @name Conversion to a lower case string using formatted:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/ using//

Comment on lines 100 to 108
to_hex(high_bits, 15, 8, target);
*(target + kHyphenPos[0]) = '-';
to_hex(high_bits, 7, 4, target + kHyphenPos[0] + 1);
*(target + kHyphenPos[1]) = '-';
to_hex(high_bits, 3, 0, target + kHyphenPos[1] + 1);
*(target + kHyphenPos[2]) = '-';
to_hex(low_bits, 15, 12, target + kHyphenPos[2] + 1);
*(target + kHyphenPos[3]) = '-';
to_hex(low_bits, 11, 0, target + kHyphenPos[3] + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the operator std::string() snippet above, which has a single loop, and doesn't do any index computations.

kMaxUuidNumberOfHexDigits, str),
GCP_ERROR_INFO());
}
std::string original_str = std::string(str);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You appear to be confusing "mutate the string_view" with "mutate the characters of the string_view". Nothing does the latter. Indeed, string_view prohibits it. So, there is no need to copy the string.


std::string output;
output.resize(kUuidStringLen);
char* target = &((output)[output.size() - kUuidStringLen]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const_casting isn't good either. Why not just ...

  char* target = &output[0];
  char* const last = &output[output.size()];

output.resize(kUuidStringLen);
char* target = &((output)[output.size() - kUuidStringLen]);
char* const last = &((output)[output.size()]);
auto bits = Uint128High64(uuid_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what you tried, but if absl::uint128 didn't work as an integral type then that would be an Abseil bug.

In any case, I'll assert that it does work.

constexpr char kHexDigits[] = "0123456789abcdef";

Uuid::operator std::string() const {
  constexpr char kTemplate[] = "00000000-0000-0000-0000-000000000000";
  char buf[sizeof kTemplate];
  auto uuid = uuid_;
  for (auto j = sizeof buf; j-- != 0;) {
    if (kTemplate[j] != '0') {
      buf[j] = kTemplate[j];
    } else {
      buf[j] = kHexDigits[static_cast<int>(uuid & 0xf)];
      uuid >>= 4;
    }
  }
  return buf;
}

}

StatusOr<Uuid> MakeUuid(absl::string_view str) {
if (str.empty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Early exit" is only a preferred strategy when it makes the following code simpler. Here it remains the same.

Making "UUID cannot be empty" a special case over "UUID must contain 32 hexadecimal digits" is not worth the redundancy. It makes the code more cluttered and more difficult to understand. And why shouldn't MakeUuid("{}") behave the same way?

Comment on lines +140 to +143
auto high_bits = ParseHexBlock(str, original_str);
if (!high_bits.ok()) return std::move(high_bits).status();
auto low_bits = ParseHexBlock(str, original_str);
if (!low_bits.ok()) return std::move(low_bits).status();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I don't know what absl::uint128 behavior you think is lacking. For example,

#include <cctype>
#include <cstring>

constexpr char kHexDigits[] = "0123456789abcdef";

StatusOr<Uuid> MakeUuid(absl::string_view str) {
  absl::uint128 uuid = 0;
  auto const original_str = str;

  if (absl::ConsumePrefix(&str, "{")) {
    if (!absl::ConsumeSuffix(&str, "}")) {
      return internal::InvalidArgumentError(
          absl::StrFormat("UUID missing closing '}': %s", original_str),
          GCP_ERROR_INFO());
    }
  }
  if (absl::StartsWith(str, "-")) {
    return internal::InvalidArgumentError(
        absl::StrFormat("UUID cannot begin with '-': %s", original_str),
        GCP_ERROR_INFO());
  }

  constexpr int kUuidNumberOfHexDigits = 32;
  for (int j = 0; j != kUuidNumberOfHexDigits; ++j) {
    absl::ConsumePrefix(&str, "-");
    if (str.empty()) {
      return internal::InvalidArgumentError(
          absl::StrFormat("UUID must contain %d hexadecimal digits: %s",
                          kUuidNumberOfHexDigits, original_str),
          GCP_ERROR_INFO());
    }
    auto const* dp = std::strchr(
        kHexDigits, std::tolower(static_cast<unsigned char>(str[0])));
    if (dp == nullptr) {
      if (str[0] == '-') {
        return internal::InvalidArgumentError(
            absl::StrFormat("UUID cannot contain consecutive hyphens: %s",
                            original_str),
            GCP_ERROR_INFO());
      }
      return internal::InvalidArgumentError(
          absl::StrFormat("UUID contains invalid character (%c): %s", str[0],
                          original_str),
          GCP_ERROR_INFO());
    }
    str.remove_prefix(1);
    uuid <<= 4;
    uuid += dp - kHexDigits;
  }

  if (!str.empty()) {
    return internal::InvalidArgumentError(
        absl::StrFormat("Extra characters found after parsing UUID: %s", str),
        GCP_ERROR_INFO());
  }
  return Uuid{uuid};
}

No "blocks" (or hash maps). Just a simple per-nibble loop.

Copy link
Member Author

@scotthart scotthart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the m32-pr build was failing with an assert from absl::flat_hash_map about the constructed from value not being the same as the key type. As inefficient as it is, std::unordered_map did not have the same issue.

Reviewable status: 3 of 6 files reviewed, 12 unresolved discussions (waiting on @cuiy0006 and @devbww)


google/cloud/spanner/uuid.cc line 31 at r6 (raw file):

Previously, devbww (Bradley White) wrote…

In what sense is this a "Max" value? That implies that we could have fewer hex digits, but that is not the case.

Renamed


google/cloud/spanner/uuid.cc line 33 at r6 (raw file):

Previously, devbww (Bradley White) wrote…

#include <unordered_map>

Added.

/// Default construction yields a zero value UUID.
Uuid() = default;

/// Construct a UUID from a packed integer.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworded across usages for consistency.

Comment on lines 44 to 45
/// Construct a UUID from two 64 bit pieces.
Uuid(std::uint64_t high_bits, std::uint64_t low_bits);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// @name Conversion to packed integer representation.
explicit operator absl::uint128() const { return uuid_; }

/// @name Conversion to a lower case string using formatted:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 100 to 108
to_hex(high_bits, 15, 8, target);
*(target + kHyphenPos[0]) = '-';
to_hex(high_bits, 7, 4, target + kHyphenPos[0] + 1);
*(target + kHyphenPos[1]) = '-';
to_hex(high_bits, 3, 0, target + kHyphenPos[1] + 1);
*(target + kHyphenPos[2]) = '-';
to_hex(low_bits, 15, 12, target + kHyphenPos[2] + 1);
*(target + kHyphenPos[3]) = '-';
to_hex(low_bits, 11, 0, target + kHyphenPos[3] + 1);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #15043 for future improvements as I'm out of time budget to work on this.

kMaxUuidNumberOfHexDigits, str),
GCP_ERROR_INFO());
}
std::string original_str = std::string(str);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying the string_view instead.


std::string output;
output.resize(kUuidStringLen);
char* target = &((output)[output.size() - kUuidStringLen]);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

&output[0] turns into a clang-tidy diagnostic about using .data() instead which until c++17 only returns a const char*. I have multiple non-ideal choices.

output.resize(kUuidStringLen);
char* target = &((output)[output.size() - kUuidStringLen]);
char* const last = &((output)[output.size()]);
auto bits = Uint128High64(uuid_);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't try much. I changed to_hex first parameter to an absl::uint128 and got compile errors trying to index into kHexChar
Created #15043 for future improvements as I'm out of time budget to work on this.

}

StatusOr<Uuid> MakeUuid(absl::string_view str) {
if (str.empty()) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this check and added a test for "{}". Both now result in a satisfactory error message.

Comment on lines +140 to +143
auto high_bits = ParseHexBlock(str, original_str);
if (!high_bits.ok()) return std::move(high_bits).status();
auto low_bits = ParseHexBlock(str, original_str);
if (!low_bits.ok()) return std::move(low_bits).status();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #15043 for future improvements as I'm out of time budget to work on this.

Copy link
Contributor

@devbww devbww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to understand what "constructed value does not match the lookup key" means rather than switching to a container that doesn't make a similar check.

Reviewed 3 of 3 files at r7, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @cuiy0006)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the Spanner API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants