-
Notifications
You must be signed in to change notification settings - Fork 3.2k
Implement native ESRI reader #25241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Implement native ESRI reader #25241
Conversation
1361efe
to
7f29bc9
Compare
...ng/trino-product-tests/src/main/java/io/trino/tests/product/hive/TestHiveStorageFormats.java
Show resolved
Hide resolved
5b41a78
to
16784ed
Compare
@@ -17,6 +17,11 @@ | |||
</properties> | |||
|
|||
<dependencies> | |||
<dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that stands out is that the library https://github.com/Esri/geometry-api-java has not received any update for almost a year and its latest release is over 4 years ago.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from the community point of view, what would you suggest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@findinpath we already ship esri so this doesn't change anything.
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java
Outdated
Show resolved
Hide resolved
break; | ||
} | ||
catch (ParseException e) { | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is intended and we return null
for unsupported timestamp format
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After skimming the code and trying TestEsri
I definitely understand the purpose of this contribution.
The referenced library geometry-api-java
seems not lively anymore.
It would be useful to have a test reading all types.
However before adding any other changes, I think it is worth asking the maintainers @wendigo , @dain whether this contribution is basically fit from a functional perspective to be inclued in the Trino project code.
16784ed
to
f02ca1b
Compare
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveStorageFormat.java
Outdated
Show resolved
Hide resolved
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriReader.java
Show resolved
Hide resolved
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java
Outdated
Show resolved
Hide resolved
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java
Outdated
Show resolved
Hide resolved
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java
Outdated
Show resolved
Hide resolved
06d5677
to
d28634f
Compare
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriReader.java
Outdated
Show resolved
Hide resolved
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriReader.java
Outdated
Show resolved
Hide resolved
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java
Outdated
Show resolved
Hide resolved
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java
Outdated
Show resolved
Hide resolved
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriReader.java
Outdated
Show resolved
Hide resolved
@@ -17,6 +17,11 @@ | |||
</properties> | |||
|
|||
<dependencies> | |||
<dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from the community point of view, what would you suggest?
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/HiveClassNames.java
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/esri/EsriPageSourceFactory.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/esri/EsriPageSourceFactory.java
Outdated
Show resolved
Hide resolved
8de204a
to
87d9d72
Compare
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java
Show resolved
Hide resolved
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java
Show resolved
Hide resolved
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java
Outdated
Show resolved
Hide resolved
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java
Outdated
Show resolved
Hide resolved
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java
Outdated
Show resolved
Hide resolved
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java
Outdated
Show resolved
Hide resolved
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriReader.java
Outdated
Show resolved
Hide resolved
87d9d72
to
2a28c44
Compare
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java
Outdated
Show resolved
Hide resolved
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java
Outdated
Show resolved
Hide resolved
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/esri/EsriPageSourceFactory.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/esri/EsriPageSource.java
Outdated
Show resolved
Hide resolved
2a28c44
to
a7df4a8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent a good while reviewing this. Overall I think the approach is sound, but the code is missing defenses against bad data files. In this code we should strive to be bug-for-bug compatible with hive, and this includes handling of "bad" files, because users often rely on these undocumented behaviors.
Additionally, Jackson has some unexpected behaviors when recursing into nested structures, and this code falls into that trap. Specifically, the code isn't properly skipping nexted data which can result in processing inside of objects that is not expected (I had to learn this the hard way a couple of years back). In general, I used (copied) the framework laid out in the Json reader, which handles these issues.
Instead of adding a lot of mundane comments, I just applied them to the code which you can find in this commit dain@8b95a73
Finally, the tests seem to be missing cases for some of the supported attribute types... I see then when running the tests with coverage.
@@ -17,6 +17,11 @@ | |||
</properties> | |||
|
|||
<dependencies> | |||
<dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@findinpath we already ship esri so this doesn't change anything.
DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm").withZone(UTC_ZONE), | ||
DATE_FORMATTER); | ||
|
||
private final int numColumns; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trino avoids abbreviations where possible, so in this case I would call this columnCount
. That said, I don't think this is necessary, you can simply use columnNames.size()
geometryColumn = i; | ||
} | ||
} | ||
this.geometryColumn = geometryColumn; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throw an exception if no geometry column is specified?
this.geometryColumn = geometryColumn; | ||
} | ||
|
||
private ImmutableMap<String, Integer> createColumnNameToIndexMap() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Trino, return types, function arguments, and class fields should use the generic collection type (e.g. Map
) and not the implementation type ImmutableMap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I'd just inline this method
public final class EsriDeserializer | ||
{ | ||
private static final VarHandle INT_HANDLE_BIG_ENDIAN = MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN); | ||
private static final ZoneId UTC_ZONE = ZoneId.of("UTC"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a constant for UTC built into Java ZoneOffset.UTC
, but static import the field when using it.
continue; | ||
} | ||
|
||
if (GEOMETRY_FIELD_NAME.equals(parser.currentName())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if there is a field not named "geometry" or "attributes"? Is it an error, or should it be skipped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decoding a document like this will fail:
String json = """
{
"extra-junk": {
"geometry": null,
"attributes": {
"id": 42
}
},
"attributes": {
"id": 1
},
"geometry": {
"x": 10,
"y": 20
}
}
""";
This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack. |
Description
This PR implements the native ESRI reader for reading Esri JSON which can be used for geospatial queries. (NOTE: we only support UTC timezone in this port)
Customer can now submit geospatial query on a table using ESRI serde.
DDL example
Example data is from https://docs.aws.amazon.com/athena/latest/ug/geospatial-example-queries.html
DML example (note: the test table data is large enough (8 MB) and needed multiple pages as it exceeded maximum page size (1 MB))
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( O ) Release notes are required, with the following suggested text: