Add IPAddress Presto type #10596

mohsaka · 2024-07-29T18:11:26Z

Split of #10538

In response to @mbasmanova :
#10538 (review)

Took a look and added the to and from varbinary casting
Contains CAST functions for IPAddress <-> Varchar and IPAddress <-> Varbinary utilizing the folly library.

In response to:
https://velox-lib.io/blog/optimize-try-more

I believe the folly library utilizes makeUnexpected already.
https://github.com/facebook/folly/blob/main/folly/IPAddress.cpp
I added one error for Varbinary conversion utilizing context.setStatus
If I'm not understanding something correctly please let me know.

As for adding a limited fuzzer @aditi-pandit
#10538 (comment)

Is this to test the casting IPAddress <-> Varchar and IPAddress <-> Varbinary or something else.
We already test the casting using the duckdb type, we also have the prestissimo test in the E2E test.
Fuzzer support will be added in after the types and functions have been added.

netlify · 2024-07-29T18:11:43Z

✅ Deploy Preview for meta-velox canceled.

Name	Link
🔨 Latest commit	`5eea4c6`
🔍 Latest deploy log	https://app.netlify.com/sites/meta-velox/deploys/66ba4ed04b54c6000862e306

mbasmanova

@mohsaka Nice clean change. Thank you for putting the effort. Overall looks good. Some comments below. I still need to understand the cast logic, but it will be easier once you add documentation. Thanks.

velox/docs/develop/types.rst

velox/functions/prestosql/IPAddressFunctions.h

velox/functions/prestosql/types/IPAddressType.h

velox/functions/prestosql/types/IPAddressType.cpp

velox/functions/prestosql/tests/IPAddressFunctionsTest.cpp

aditi-pandit

Thanks @mohsaka. Have some minor comments.

velox/functions/prestosql/types/IPAddressType.cpp

velox/functions/prestosql/tests/IPAddressFunctionsTest.cpp

velox/docs/functions/presto/conversion.rst

velox/docs/develop/types.rst

velox/functions/prestosql/types/IPAddressType.h

velox/functions/prestosql/types/IPAddressType.cpp

mohsaka · 2024-08-08T22:20:49Z

@mbasmanova Thank you for all of the review comments. I believe I got everything. If there is anything else or I misunderstood something please let me know. Thanks!

mbasmanova

@mohsaka Thank you for adding the documentation. This is very helpful. One question.

velox/docs/develop/types.rst

mbasmanova · 2024-08-08T23:24:02Z

velox/docs/develop/types.rst

+type is BIGINT. The format that the address is stored in is defined as part of `(RFC 4291#section-2.5.5.2) <https://datatracker.ietf.org/doc/html/rfc4291.html#section-2.5.5.2>`_
+As Velox is run on Little Endian systems and the standard is network byte(Big Endian)
+order, we reverse the bytes to allow for masking and other bit operations
+used in IPADDRESS/IPPREFIX related functions. This type can be used to


Is this an orderable type? If so, does OrderBy produce correct results if it treats values of this type the same as values of BIGINT type? If not, we may have a problem similar to #10338

CC: @pedroerp @bikramSingh91

@mbasmanova
I believe that it is working somewhat correctly. To confirm I do have an E2E test that was added for my complete PR of IPADDRESS/IPPREFIX/functions.
prestodb/presto#23282

I believe that the reason for this is due to the 1 to 1 mapping of IPADDRESS to BIGINT and how they are stored.

For example the ipaddress
1.1.1.1 in hex is 00000000000000000000ffff01010101 and which has a binary value of 281470698586369

1.1.1.2 in hex is 00000000000000000000ffff01010102 which has a binary value of 281470698586370 which is one larger.

1.1.2.1 in hex is 00000000000000000000ffff01010201 which is 281470698586625 which is larger than both 1.1.1.1 and 1.1.1.2 as it should be.

Similar with IPV6.

That being said that does point out two issues. One that I don't think presto even takes care of.

When we sort a mixture of IPV4 and IPV6 there will be a split between IPV6 and IPV4 addresses at ::FFFF:FFFF:FFFF This one should exist in presto.
So the ordering will be for example

::FFFE:FFFF:FFFF 0.0.0.0 .... 255.255.255.255 ::1:0000:0000:0000

The other problem being that any IPV6 IPADDRESS with the MSB set will be placed at the start instead of at the end. This one shouldn't exist in presto as its doing a compareUnsigned.
For example

FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF :: ::1

Thank you for looking into this. I'll need more time to understand.

I see that compare in Presto for IPADDRESS is not the same as for BIGINT (or maybe they just look different). This suggest that IPADDRESS type is affected by #10338 and therefore we won't be able to use it (without causing correctness issues) until that issue is resolved or we find a way to block code paths that may be affected.

@Override public int compareTo(Block leftBlock, int leftPosition, Block rightBlock, int rightPosition) { int compare = Long.compareUnsigned(reverseBytes(leftBlock.getLong(leftPosition, 0)), reverseBytes(rightBlock.getLong(rightPosition, 0))); if (compare != 0) { return compare; } return Long.compareUnsigned(reverseBytes(leftBlock.getLong(leftPosition, SIZE_OF_LONG)), reverseBytes(rightBlock.getLong(rightPosition, SIZE_OF_LONG))); }

No problem. Simplest fix would be to switch to varbinary but that is not very efficient. Other option would be to override the operator similar to how we override cast? Not sure if this is possible.

Not sure if this is possible

Not yet. @bikramSingh91 has been looking into this as part of #10338

@mohsaka One option might be to proceed to land this change and continue landing functions that use that type, but document that it is not "ready to be used" and add a plan validator to Presto to reject plans with that type when running native engine (we have that for TS_w_TS type).

@mohsaka We should add the plan validator with a config like it was done for timestamp with timezone in Presto coordiantor first before merging this change in velox.
Example:
prestodb/presto#23200
prestodb/presto#23374
It will allow us to test it, but disable for production or correctness critical use cases. If we merge this and worker start executing queries with IPAddress type unconditonally, we will face correctness problems.

@amitkdutta Thank you for the pointers. Will work on adding the plan changes before merging in these changes.

velox/docs/functions/presto/conversion.rst

mbasmanova · 2024-08-08T23:30:57Z

velox/functions/prestosql/types/IPAddressType.cpp

+      folly::IPAddressV6 v6Addr(addrBytes);
+
+      exec::StringWriter<false> result(flatResult, row);
+      result.reserve(kIPAddressMaxStrLen);


Can we reserve capacity for all rows outside of the loop?

See getRawStringBufferWithSpace

Followed example

velox/velox/exec/RowContainer.cpp

Line 676 in c0f0a31

flatResult->resize(rows.size());

velox/functions/prestosql/types/IPAddressType.cpp

mbasmanova

@mohsaka Looks great % this type suffering from #10338

CC: @amitkdutta

mbasmanova · 2024-08-09T19:07:54Z

velox/functions/prestosql/types/IPAddressType.cpp

+    const auto* ipaddresses = input.as<SimpleVector<int128_t>>();
+    folly::ByteArray16 addrBytes;
+
+    flatResult->resize(rows.size());


This is not right. I'm out next week. Maybe @bikramSingh91 can help.

@mohsaka Thank you for contributing.
Let me get back to you by EOD today.

@bikramSingh91 Thank you. I went off of

velox/velox/exec/RowContainer.cpp

Line 676 in c0f0a31

flatResult->resize(rows.size());

as stated before. Which when I looked at it it seemed to me to be
rows.size() - number of elements
size_t totalBytes = flagBytes_ * rows.size() + fixedWidthRowSize * rows.size(); - Total number of bytes for all of the elements.

I guess I misunderstood something. Thank you for helping me out.

@mohsaka thanks for the update, I unfortunately did not get time to review this today and need to head out of work, I will try to get to it over the weekend.

context: rows here is a selectivity vector which can have inactive rows at the end. Therefore you can have cases where rows.size() is say 10 but has 6 inactive trailing rows. Therefore, if you use rows.size() then you would be allocating extra space than which is actually required as in expression eval (as eval here is only concerned with the active rows for efficiency, the code around expression eval ensures those rows are copied over to the final result or this vector is sufficiently resized). rows.end() is therefore more appropriate here which returns one past the last selected value.

(My recommendation) However, to make implementation simpler, we have a utility function context.ensureWritable() which takes care of these lower level details for you. See example: https://github.com/facebookincubator/velox/blob/main/velox/expression/CastExpr.cpp#L167

context.ensureWritable() should already be called in castTo and castFrom. More follow up on
#10596 (comment)

velox/docs/develop/types.rst

bikramSingh91 · 2024-08-11T18:27:18Z

velox/functions/prestosql/types/IPAddressType.cpp

+    const auto* ipaddresses = input.as<SimpleVector<int128_t>>();
+    folly::ByteArray16 addrBytes;
+
+    flatResult->resize(rows.size());


context: rows here is a selectivity vector which can have inactive rows at the end. Therefore you can have cases where rows.size() is say 10 but has 6 inactive trailing rows. Therefore, if you use rows.size() then you would be allocating extra space than which is actually required as in expression eval (as eval here is only concerned with the active rows for efficiency, the code around expression eval ensures those rows are copied over to the final result or this vector is sufficiently resized). rows.end() is therefore more appropriate here which returns one past the last selected value.

(My recommendation) However, to make implementation simpler, we have a utility function context.ensureWritable() which takes care of these lower level details for you. See example: https://github.com/facebookincubator/velox/blob/main/velox/expression/CastExpr.cpp#L167

bikramSingh91 · 2024-08-11T18:31:21Z

velox/functions/prestosql/types/IPAddressType.cpp

+    folly::ByteArray16 addrBytes;
+
+    flatResult->resize(rows.size());
+    flatResult->getRawStringBufferWithSpace(rows.size() * kIPAddressMaxStrLen);


you dont need this, since you are using exec::StringWriter, it takes care of string buffer management for you.

In fact calling getRawStringBufferWithSpace() would be counter productive as this would try to allocate a single large buffer of this size, whereas the vector might already have a number of smaller buffers that can accomodate values that you will be writing next. Using exec::StringWriter will abstract away all those lower level details for you so would be the preferred way to go.

Another way to do this (without using exec::StringWriter) would be to use the FlatVector's API FlatVector<StringView>::set(vector_size_t idx, StringView value) which also takes care of buffer management

@bikramSingh91 Thank you for the tips. The reason I tried to use the allocator was via @mbasmanova's suggestion of preallocating the strings before hand. See
#10596 (comment)
&
#10596 (comment)

At first I didn't have any reservation for the string space, then I reserved each individual row's string space. But then @mbasmanova suggested I allocate the entire string space before entering the row by row casting.

bikramSingh91 · 2024-08-11T18:37:29Z

velox/functions/prestosql/types/IPAddressType.cpp

+      exec::EvalCtx& context,
+      const SelectivityVector& rows,
+      BaseVector& result) {
+    auto* flatResult = result.as<FlatVector<int128_t>>();


call context.ensureWritable before writing into the vector. here and other cast* implementations.

Really appreciate your advice but I believe that I already call this in the caller, see castTo castFrom functions.

facebook-github-bot · 2024-08-12T22:13:04Z

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

mohsaka · 2024-08-12T22:16:23Z

@xiaoxmeng This PR is not ready to merge. If possible could you remove the [ready-to-merge] label?

facebook-github-bot · 2024-08-13T08:59:12Z

@xiaoxmeng merged this pull request in e092535.

conbench-facebook · 2024-08-13T09:40:33Z

Conbench analyzed the 1 benchmark run on commit e0925359.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

wraymo · 2024-10-01T00:02:00Z

Did you consider the compatibility with Presto IPADDRESS type? When casting a string to IPADDRESS, Presto uses Slice as the return type, which is not compatible with BIGINT. It can result in this error encoding == kindEncoding Serialized encoding is not compatible with requested type: BIGINT. Expected LONG_ARRAY. Got VARIABLE_WIDTH.

mohsaka · 2024-10-01T00:31:06Z

Did you consider the compatibility with Presto IPADDRESS type? When casting a string to IPADDRESS, Presto uses Slice as the return type, which is not compatible with BIGINT. It can result in this error encoding == kindEncoding Serialized encoding is not compatible with requested type: BIGINT. Expected LONG_ARRAY. Got VARIABLE_WIDTH.

I used UUID as the example for the IPADDRESS type. So I figured that we would already have compatibility between fixed width 128 type <-> HUGEINT. But it seems like I may be wrong. I did do E2E testing with Prestissimo, but I did not try creating tables in Presto and reading them in Prestissimo.

Is that what you are talking about? Please confirm. Thanks! Also could you give UUID a try as well? There's another type TimestampWithTimezone but I think that may be fine, being an AbstractLongType in Presto.

We need a Serializer change here if that is the case.

Thanks @wraymo

mohsaka added 2 commits July 26, 2024 16:08

Add varbinary

16d69ff

All formatted

5df77ca

facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 29, 2024

amitkdutta requested review from mbasmanova and Yuhta August 1, 2024 03:58

mbasmanova reviewed Aug 1, 2024

View reviewed changes

mbasmanova changed the title ~~Add Presto IPAddress type~~ Add IPAddress Presto type Aug 1, 2024

aditi-pandit reviewed Aug 2, 2024

View reviewed changes

mohsaka added 5 commits August 2, 2024 15:26

Address comments

0541476

comments

a8dd6fb

Comments

f56b527

Format

b7e4836

Add IPAddress to docs

fadf329

mbasmanova reviewed Aug 7, 2024

View reviewed changes

Test case changes

6aaf937

mbasmanova reviewed Aug 8, 2024

View reviewed changes

mohsaka added 9 commits August 8, 2024 10:23

Some documentation fixes

44845a8

Address more comments

6c39388

From ipaddress casts

f31614a

Address comments

a941c93

format

e8897a2

Address more comments

0519e94

Format

afc20ab

add return

3bbc2fa

Add type

7ad4fcd

mohsaka requested review from mbasmanova and aditi-pandit August 8, 2024 22:21

mbasmanova reviewed Aug 8, 2024

View reviewed changes

address comments

63d9ad2

mbasmanova approved these changes Aug 9, 2024

View reviewed changes

mbasmanova added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Aug 9, 2024

Allocate entire buffer

2047cca

mbasmanova reviewed Aug 9, 2024

View reviewed changes

bikramSingh91 reviewed Aug 11, 2024

View reviewed changes

mohsaka added 2 commits August 12, 2024 10:58

Fix documentation

af963ee

Remove buffer allocation

5eea4c6

facebook-github-bot closed this in e092535 Aug 13, 2024

facebook-github-bot added the Merged label Aug 13, 2024

mohsaka mentioned this pull request Aug 13, 2024

[native] Disable IPAddress type on presto native prestodb/presto#23436

Merged

6 tasks

amitkdutta mentioned this pull request Aug 21, 2024

Function to aggregate IP prefixes. prestodb/presto#23375

Closed

mohsaka mentioned this pull request Aug 22, 2024

Add IPPrefix Presto type #10816

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add IPAddress Presto type #10596

Add IPAddress Presto type #10596

mohsaka commented Jul 29, 2024 •

edited

Loading

netlify bot commented Jul 29, 2024 •

edited

Loading

mbasmanova left a comment

aditi-pandit left a comment

mohsaka commented Aug 8, 2024

mbasmanova left a comment

mbasmanova Aug 8, 2024

mohsaka Aug 8, 2024

mbasmanova Aug 9, 2024

mohsaka Aug 9, 2024

mbasmanova Aug 9, 2024

mbasmanova Aug 9, 2024

amitkdutta Aug 9, 2024

mohsaka Aug 9, 2024

mbasmanova Aug 8, 2024

mohsaka Aug 9, 2024

mbasmanova left a comment

mbasmanova Aug 9, 2024

bikramSingh91 Aug 9, 2024

mohsaka Aug 9, 2024

bikramSingh91 Aug 10, 2024

bikramSingh91 Aug 11, 2024

mohsaka Aug 12, 2024

bikramSingh91 Aug 11, 2024

bikramSingh91 Aug 11, 2024

mohsaka Aug 12, 2024

bikramSingh91 Aug 11, 2024

mohsaka Aug 12, 2024

facebook-github-bot commented Aug 12, 2024

mohsaka commented Aug 12, 2024

facebook-github-bot commented Aug 13, 2024

conbench-facebook bot commented Aug 13, 2024

wraymo commented Oct 1, 2024 •

edited

Loading

mohsaka commented Oct 1, 2024 •

edited

Loading

Add IPAddress Presto type #10596

Add IPAddress Presto type #10596

Conversation

mohsaka commented Jul 29, 2024 • edited Loading

netlify bot commented Jul 29, 2024 • edited Loading

✅ Deploy Preview for meta-velox canceled.

mbasmanova left a comment

Choose a reason for hiding this comment

aditi-pandit left a comment

Choose a reason for hiding this comment

mohsaka commented Aug 8, 2024

mbasmanova left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mbasmanova left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

facebook-github-bot commented Aug 12, 2024

mohsaka commented Aug 12, 2024

facebook-github-bot commented Aug 13, 2024

conbench-facebook bot commented Aug 13, 2024

wraymo commented Oct 1, 2024 • edited Loading

mohsaka commented Oct 1, 2024 • edited Loading

mohsaka commented Jul 29, 2024 •

edited

Loading

netlify bot commented Jul 29, 2024 •

edited

Loading

wraymo commented Oct 1, 2024 •

edited

Loading

mohsaka commented Oct 1, 2024 •

edited

Loading