Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add IPAddress Presto type #10596

Closed
wants to merge 21 commits into from
Closed

Conversation

mohsaka
Copy link
Contributor

@mohsaka mohsaka commented Jul 29, 2024

Split of #10538

In response to @mbasmanova :
#10538 (review)

Took a look and added the to and from varbinary casting
Contains CAST functions for IPAddress <-> Varchar and IPAddress <-> Varbinary utilizing the folly library.

In response to:
https://velox-lib.io/blog/optimize-try-more

I believe the folly library utilizes makeUnexpected already.
https://github.com/facebook/folly/blob/main/folly/IPAddress.cpp
I added one error for Varbinary conversion utilizing context.setStatus
If I'm not understanding something correctly please let me know.

As for adding a limited fuzzer @aditi-pandit
#10538 (comment)

Is this to test the casting IPAddress <-> Varchar and IPAddress <-> Varbinary or something else.
We already test the casting using the duckdb type, we also have the prestissimo test in the E2E test.
Fuzzer support will be added in after the types and functions have been added.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 29, 2024
Copy link

netlify bot commented Jul 29, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 5eea4c6
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/66ba4ed04b54c6000862e306

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mohsaka Nice clean change. Thank you for putting the effort. Overall looks good. Some comments below. I still need to understand the cast logic, but it will be easier once you add documentation. Thanks.

velox/docs/develop/types.rst Show resolved Hide resolved
velox/functions/prestosql/IPAddressFunctions.h Outdated Show resolved Hide resolved
velox/functions/prestosql/types/IPAddressType.h Outdated Show resolved Hide resolved
velox/functions/prestosql/types/IPAddressType.cpp Outdated Show resolved Hide resolved
velox/functions/prestosql/types/IPAddressType.cpp Outdated Show resolved Hide resolved
velox/functions/prestosql/tests/IPAddressFunctionsTest.cpp Outdated Show resolved Hide resolved
velox/functions/prestosql/tests/IPAddressFunctionsTest.cpp Outdated Show resolved Hide resolved
velox/functions/prestosql/tests/IPAddressFunctionsTest.cpp Outdated Show resolved Hide resolved
velox/functions/prestosql/tests/IPAddressFunctionsTest.cpp Outdated Show resolved Hide resolved
@mbasmanova mbasmanova changed the title Add Presto IPAddress type Add IPAddress Presto type Aug 1, 2024
Copy link
Collaborator

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mohsaka. Have some minor comments.

velox/functions/prestosql/types/IPAddressType.cpp Outdated Show resolved Hide resolved
velox/functions/prestosql/types/IPAddressType.cpp Outdated Show resolved Hide resolved
velox/functions/prestosql/types/IPAddressType.cpp Outdated Show resolved Hide resolved
velox/functions/prestosql/tests/IPAddressFunctionsTest.cpp Outdated Show resolved Hide resolved
velox/functions/prestosql/tests/IPAddressFunctionsTest.cpp Outdated Show resolved Hide resolved
velox/functions/prestosql/tests/IPAddressFunctionsTest.cpp Outdated Show resolved Hide resolved
velox/docs/functions/presto/conversion.rst Outdated Show resolved Hide resolved
velox/docs/functions/presto/conversion.rst Outdated Show resolved Hide resolved
velox/docs/functions/presto/conversion.rst Outdated Show resolved Hide resolved
velox/docs/functions/presto/conversion.rst Outdated Show resolved Hide resolved
velox/docs/functions/presto/conversion.rst Outdated Show resolved Hide resolved
velox/docs/functions/presto/conversion.rst Outdated Show resolved Hide resolved
velox/docs/functions/presto/conversion.rst Show resolved Hide resolved
velox/docs/functions/presto/conversion.rst Outdated Show resolved Hide resolved
velox/docs/functions/presto/conversion.rst Outdated Show resolved Hide resolved
velox/docs/develop/types.rst Show resolved Hide resolved
@mohsaka
Copy link
Contributor Author

mohsaka commented Aug 8, 2024

@mbasmanova Thank you for all of the review comments. I believe I got everything. If there is anything else or I misunderstood something please let me know. Thanks!

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mohsaka Thank you for adding the documentation. This is very helpful. One question.

velox/docs/develop/types.rst Outdated Show resolved Hide resolved
type is BIGINT. The format that the address is stored in is defined as part of `(RFC 4291#section-2.5.5.2) <https://datatracker.ietf.org/doc/html/rfc4291.html#section-2.5.5.2>`_
As Velox is run on Little Endian systems and the standard is network byte(Big Endian)
order, we reverse the bytes to allow for masking and other bit operations
used in IPADDRESS/IPPREFIX related functions. This type can be used to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an orderable type? If so, does OrderBy produce correct results if it treats values of this type the same as values of BIGINT type? If not, we may have a problem similar to #10338

CC: @pedroerp @bikramSingh91

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbasmanova
I believe that it is working somewhat correctly. To confirm I do have an E2E test that was added for my complete PR of IPADDRESS/IPPREFIX/functions.
prestodb/presto#23282

I believe that the reason for this is due to the 1 to 1 mapping of IPADDRESS to BIGINT and how they are stored.

For example the ipaddress
1.1.1.1 in hex is 00000000000000000000ffff01010101 and which has a binary value of 281470698586369

1.1.1.2 in hex is 00000000000000000000ffff01010102 which has a binary value of 281470698586370 which is one larger.

1.1.2.1 in hex is 00000000000000000000ffff01010201 which is 281470698586625 which is larger than both 1.1.1.1 and 1.1.1.2 as it should be.

Similar with IPV6.

That being said that does point out two issues. One that I don't think presto even takes care of.

  1. When we sort a mixture of IPV4 and IPV6 there will be a split between IPV6 and IPV4 addresses at ::FFFF:FFFF:FFFF This one should exist in presto.
    So the ordering will be for example
::FFFE:FFFF:FFFF
0.0.0.0
....
255.255.255.255
::1:0000:0000:0000
  1. The other problem being that any IPV6 IPADDRESS with the MSB set will be placed at the start instead of at the end. This one shouldn't exist in presto as its doing a compareUnsigned.
    For example
FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF
::
::1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for looking into this. I'll need more time to understand.

I see that compare in Presto for IPADDRESS is not the same as for BIGINT (or maybe they just look different). This suggest that IPADDRESS type is affected by #10338 and therefore we won't be able to use it (without causing correctness issues) until that issue is resolved or we find a way to block code paths that may be affected.

    @Override
    public int compareTo(Block leftBlock, int leftPosition, Block rightBlock, int rightPosition)
    {
        int compare = Long.compareUnsigned(reverseBytes(leftBlock.getLong(leftPosition, 0)), reverseBytes(rightBlock.getLong(rightPosition, 0)));
        if (compare != 0) {
            return compare;
        }
        return Long.compareUnsigned(reverseBytes(leftBlock.getLong(leftPosition, SIZE_OF_LONG)), reverseBytes(rightBlock.getLong(rightPosition, SIZE_OF_LONG)));
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem. Simplest fix would be to switch to varbinary but that is not very efficient. Other option would be to override the operator similar to how we override cast? Not sure if this is possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is possible

Not yet. @bikramSingh91 has been looking into this as part of #10338

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mohsaka One option might be to proceed to land this change and continue landing functions that use that type, but document that it is not "ready to be used" and add a plan validator to Presto to reject plans with that type when running native engine (we have that for TS_w_TS type).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mohsaka We should add the plan validator with a config like it was done for timestamp with timezone in Presto coordiantor first before merging this change in velox.
Example:
prestodb/presto#23200
prestodb/presto#23374
It will allow us to test it, but disable for production or correctness critical use cases. If we merge this and worker start executing queries with IPAddress type unconditonally, we will face correctness problems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amitkdutta Thank you for the pointers. Will work on adding the plan changes before merging in these changes.

velox/docs/functions/presto/conversion.rst Show resolved Hide resolved
folly::IPAddressV6 v6Addr(addrBytes);

exec::StringWriter<false> result(flatResult, row);
result.reserve(kIPAddressMaxStrLen);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reserve capacity for all rows outside of the loop?

See getRawStringBufferWithSpace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followed example

flatResult->resize(rows.size());

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mohsaka Looks great % this type suffering from #10338

CC: @amitkdutta

@mbasmanova mbasmanova added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Aug 9, 2024
const auto* ipaddresses = input.as<SimpleVector<int128_t>>();
folly::ByteArray16 addrBytes;

flatResult->resize(rows.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not right. I'm out next week. Maybe @bikramSingh91 can help.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mohsaka Thank you for contributing.
Let me get back to you by EOD today.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bikramSingh91 Thank you. I went off of

flatResult->resize(rows.size());

as stated before. Which when I looked at it it seemed to me to be
rows.size() - number of elements
size_t totalBytes = flagBytes_ * rows.size() + fixedWidthRowSize * rows.size(); - Total number of bytes for all of the elements.

I guess I misunderstood something. Thank you for helping me out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mohsaka thanks for the update, I unfortunately did not get time to review this today and need to head out of work, I will try to get to it over the weekend.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

context: rows here is a selectivity vector which can have inactive rows at the end. Therefore you can have cases where rows.size() is say 10 but has 6 inactive trailing rows. Therefore, if you use rows.size() then you would be allocating extra space than which is actually required as in expression eval (as eval here is only concerned with the active rows for efficiency, the code around expression eval ensures those rows are copied over to the final result or this vector is sufficiently resized). rows.end() is therefore more appropriate here which returns one past the last selected value.

(My recommendation) However, to make implementation simpler, we have a utility function context.ensureWritable() which takes care of these lower level details for you. See example: https://github.com/facebookincubator/velox/blob/main/velox/expression/CastExpr.cpp#L167

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

context.ensureWritable() should already be called in castTo and castFrom. More follow up on
#10596 (comment)

velox/docs/develop/types.rst Outdated Show resolved Hide resolved
const auto* ipaddresses = input.as<SimpleVector<int128_t>>();
folly::ByteArray16 addrBytes;

flatResult->resize(rows.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

context: rows here is a selectivity vector which can have inactive rows at the end. Therefore you can have cases where rows.size() is say 10 but has 6 inactive trailing rows. Therefore, if you use rows.size() then you would be allocating extra space than which is actually required as in expression eval (as eval here is only concerned with the active rows for efficiency, the code around expression eval ensures those rows are copied over to the final result or this vector is sufficiently resized). rows.end() is therefore more appropriate here which returns one past the last selected value.

(My recommendation) However, to make implementation simpler, we have a utility function context.ensureWritable() which takes care of these lower level details for you. See example: https://github.com/facebookincubator/velox/blob/main/velox/expression/CastExpr.cpp#L167

folly::ByteArray16 addrBytes;

flatResult->resize(rows.size());
flatResult->getRawStringBufferWithSpace(rows.size() * kIPAddressMaxStrLen);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you dont need this, since you are using exec::StringWriter, it takes care of string buffer management for you.

In fact calling getRawStringBufferWithSpace() would be counter productive as this would try to allocate a single large buffer of this size, whereas the vector might already have a number of smaller buffers that can accomodate values that you will be writing next. Using exec::StringWriter will abstract away all those lower level details for you so would be the preferred way to go.

Another way to do this (without using exec::StringWriter) would be to use the FlatVector's API FlatVector<StringView>::set(vector_size_t idx, StringView value) which also takes care of buffer management

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bikramSingh91 Thank you for the tips. The reason I tried to use the allocator was via @mbasmanova's suggestion of preallocating the strings before hand. See
#10596 (comment)
&
#10596 (comment)

At first I didn't have any reservation for the string space, then I reserved each individual row's string space. But then @mbasmanova suggested I allocate the entire string space before entering the row by row casting.

exec::EvalCtx& context,
const SelectivityVector& rows,
BaseVector& result) {
auto* flatResult = result.as<FlatVector<int128_t>>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call context.ensureWritable before writing into the vector. here and other cast* implementations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really appreciate your advice but I believe that I already call this in the caller, see castTo castFrom functions.

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mohsaka
Copy link
Contributor Author

mohsaka commented Aug 12, 2024

@xiaoxmeng This PR is not ready to merge. If possible could you remove the [ready-to-merge] label?

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng merged this pull request in e092535.

Copy link

Conbench analyzed the 1 benchmark run on commit e0925359.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

@wraymo
Copy link

wraymo commented Oct 1, 2024

Did you consider the compatibility with Presto IPADDRESS type? When casting a string to IPADDRESS, Presto uses Slice as the return type, which is not compatible with BIGINT. It can result in this error encoding == kindEncoding Serialized encoding is not compatible with requested type: BIGINT. Expected LONG_ARRAY. Got VARIABLE_WIDTH.

@mohsaka
Copy link
Contributor Author

mohsaka commented Oct 1, 2024

Did you consider the compatibility with Presto IPADDRESS type? When casting a string to IPADDRESS, Presto uses Slice as the return type, which is not compatible with BIGINT. It can result in this error encoding == kindEncoding Serialized encoding is not compatible with requested type: BIGINT. Expected LONG_ARRAY. Got VARIABLE_WIDTH.

I used UUID as the example for the IPADDRESS type. So I figured that we would already have compatibility between fixed width 128 type <-> HUGEINT. But it seems like I may be wrong. I did do E2E testing with Prestissimo, but I did not try creating tables in Presto and reading them in Prestissimo.

Is that what you are talking about? Please confirm. Thanks! Also could you give UUID a try as well? There's another type TimestampWithTimezone but I think that may be fine, being an AbstractLongType in Presto.

We need a Serializer change here if that is the case.

Thanks @wraymo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants