API, Core: Add scan planning apis to REST Catalog #11180
Conversation
Thanks @rahil-c for kicking off the implementation! I wonder what others think, but IMO this can be combined with the serializer implementations that get registered in `RESTSerializers`. Then we can have more robust tests which exercise all the validation logic in the model.
Resolved review threads (outdated):
- core/src/main/java/org/apache/iceberg/rest/responses/PlanTableScanResponse.java
- core/src/main/java/org/apache/iceberg/rest/responses/FetchPlanningResultResponse.java
- core/src/main/java/org/apache/iceberg/rest/requests/PlanTableScanRequest.java (two threads)
- core/src/main/java/org/apache/iceberg/rest/responses/FetchScanTasksResponse.java
- core/src/main/java/org/apache/iceberg/rest/requests/FetchScanTasksRequest.java (two threads)
@rahil-c can you also please add the respective JSON parser(s), with tests for those parser(s), as those request/response classes won't be serialized/deserialized.
Thanks @amogh-jahagirdar and @nastra for reviewing, will follow up on this PR with serializers, parsers, and the tests for them.
Resolved review threads (outdated):
- core/src/main/java/org/apache/iceberg/rest/requests/FetchScanTasksRequest.java
- core/src/main/java/org/apache/iceberg/rest/responses/FetchPlanningResultResponse.java
```java
Preconditions.checkArgument(
    planStatus() != null, "invalid, status can not be null: ", planStatus());
Preconditions.checkArgument(
    planStatus() != PlanStatus.COMPLETED && (planTasks() != null || fileScanTasks() != null),
    "invalid response, tasks can only be returned in a 'completed' status");
```
Why does this throw `IllegalArgumentException`? Should this be some other REST exception, because the service sent an invalid response? It isn't an argument that was the problem, it was the response itself.
Not sure I follow, but I think it can be considered an `IllegalArgumentException`, since planStatus and tasks can be considered arguments to the request builder, and if the combination of arguments is logically in a state that is not spec compliant, I feel like it makes sense to fail here with that exception. Another compelling part of this implementation is that anyone using this response builder from the reference implementation gets immediate validation of whether their response is spec compliant.
Thanks @amogh-jahagirdar for sharing your thoughts; that was my understanding as well. The request and response require certain arguments that the spec mandates to be present, otherwise they are considered invalid, so the `IllegalArgumentException` seems appropriate.
When checking Iceberg's core `rest` module, we already do this in several places, so it seems to be the pattern.
Examples in the request classes:
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/requests/CommitTransactionRequest.java#L40
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/requests/CreateNamespaceRequest.java#L48
Examples in parser classes:
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/requests/CommitTransactionRequestParser.java#L44
@rdblue if this is an issue, though, I can try to follow your suggestion as well; let me know what you think.
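As a minimal sketch of the builder-time validation pattern being discussed (class and enum names here are hypothetical, and a tiny stand-in replaces the relocated Guava `Preconditions`), the "tasks only with a 'completed' status" rule might look like:

```java
import java.util.List;

public class PlanValidationSketch {
  // stand-in for the relocated Guava Preconditions used in iceberg-core
  static void checkArgument(boolean expression, String message) {
    if (!expression) {
      throw new IllegalArgumentException(message);
    }
  }

  public enum PlanStatus { SUBMITTED, COMPLETED, CANCELLED, FAILED }

  // the intent under discussion: tasks may only accompany a 'completed' status
  public static void validate(PlanStatus status, List<String> planTasks) {
    checkArgument(status != null, "invalid response, status can not be null");
    checkArgument(
        status == PlanStatus.COMPLETED || planTasks == null,
        "invalid response, tasks can only be returned in a 'completed' status");
  }
}
```

With this shape, any caller of the reference builder fails fast with `IllegalArgumentException` on a non-spec-compliant combination, which is the behavior the thread is weighing against a dedicated REST exception.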
```java
Preconditions.checkArgument(planStatus() != null, "invalid response, status can not be null");
Preconditions.checkArgument(
    planStatus() != PlanStatus.CANCELLED,
    "invalid response, 'cancelled' is not a valid status for this endpoint");
```
This should also not be an `IllegalArgumentException`.
Commented here: https://github.com/apache/iceberg/pull/11180/files#r1779211148
If this is an issue, I'm assuming you would like this to also be some specific new REST exception on the actual field's value being invalid?
```java
@Nullable
@Value.Default
default Boolean useSnapShotSchema() {
```
Same problem here.
Also, I don't think this is the correct place to add a default. The parser should add the default. When constructing a request the caller should be required to set this.
Will fix this
Actually, when reading this again, I agree that the parser can add the default value, but I'm not understanding why a caller should be required to set this. In the spec it's not a required field: https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L4210
Update: can still update this for `useSnapShotSchema` and `caseSensitive`.
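A sketch of the "defaults live in the parser" idea: the model stays free of `@Value.Default`, and the parser supplies the spec defaults for missing optional fields. The field names and defaults below mirror the ones in this thread (`case-sensitive` defaulting to true, `use-snapshot-schema` to false); a `Map` stands in for the JSON node so the sketch runs standalone:

```java
import java.util.Map;

public class ParserDefaultsSketch {
  // defaults applied at parse time rather than via @Value.Default on the model
  public static boolean caseSensitive(Map<String, Boolean> json) {
    return json.getOrDefault("case-sensitive", Boolean.TRUE);
  }

  public static boolean useSnapshotSchema(Map<String, Boolean> json) {
    return json.getOrDefault("use-snapshot-schema", Boolean.FALSE);
  }
}
```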
Resolved review threads (outdated):
- api/src/main/java/org/apache/iceberg/exceptions/NoSuchPlanIdException.java
- core/src/main/java/org/apache/iceberg/rest/requests/PlanTableScanRequest.java
- core/src/main/java/org/apache/iceberg/rest/responses/FetchPlanningResultResponse.java
@rdblue @danielcweeks @amogh-jahagirdar @nastra @jackye1995 @singhpk234 When implementing the parsers, one thing that I noticed was that not having a […] I have marked these areas with […]. The […]
```diff
@@ -26,7 +26,7 @@
 import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
 import org.apache.iceberg.types.Types;

-class GenericDataFile extends BaseFile<DataFile> implements DataFile {
+public class GenericDataFile extends BaseFile<DataFile> implements DataFile {
```
This and GenericDeleteFile shouldn't be made public. Based on discussion with @rahil-c, this is just an artifact of having the parser implementation in the REST package, which can't access this class. I think the real solution is to uplevel the new parser to core and reuse as much of the existing content file parsers as possible.
If we do my suggestion below, this along with the revapi changes should go away, and the diff gets a lot smaller by reusing ContentFileParser as it is.
Will fix this thanks @amogh-jahagirdar
```java
// TODO at the time of deserialization we don't have the partition spec, we just have the spec id.
// We will need to get the spec from table metadata using the spec id.
// We will need to send the partition spec, I believe; put a null placeholder here for now.
```
Yeah, I think there's a compelling case to update the REST spec so we send back the full serialized partition spec. In case we were worried about payload sizes going over the wire, we could follow the same concept we do for delete file references, where there's a top-level list in which each entry appears only once, and the tasks themselves carry just an integer index. But this complicates the protocol.
I think payload size is more of a concern for sending schemas, which I'm pretty against. I feel like resolving schemas is probably worth it on the client. Besides, clients have full context on what's being projected and can just send the right schema to the file scan task. There's also an argument for the server sending back a schema ID so that the client doesn't have to do the work of resolving a schema to pass to the file scan task, but the schema in FileScanTask is ultimately for any client-side grouping the client wants to do, and we could just use the current schema or the snapshot schema.
> we send back the full serialized partition spec. In case we were worried about payload sizes going over the wire we could follow the same concept we do for delete file references where there's a top level list where it's only listed once, and in the tasks itself it's just an integer index. But this complicates the protocol.

Presently this leaves us with no option but to bind the partition spec with the current schema of the snapshot being queried, which is honestly not a bad option, as schema/partition evolution takes care of not allowing backward-incompatible schemas by default. The interesting cases would be when we allow this flag in schema evolution.
Having a schema_id would be beneficial: if for now we want to project col a and col b, a data file having neither of them would still be returned to the client, which the engines/callers will eventually discard. This makes me think more in terms of column-level access control, but I totally understand we lack this mechanism today.
```java
public PartitionData() {
  // This constructor has the niche use case of being used at REST serialization/deserialization
  // time, where we do not have the partition spec and cannot populate these values.
  // These values will be refreshed before returning to the client engine.
  this.partitionType = null;
  this.size = 0;
  this.data = new Object[size];
  this.schema = null;
  this.stringSchema = null;
}
```
Discussed with @rahil-c technically we could avoid this constructor by passing in a null value to the GenericDataFile and rebinding that later, but the bigger question is if we want the server to just send back a serialized spec
Will fix this in next revision.
```java
public Boolean caseSensitive() {
  return caseSensitive;
}

public Boolean useSnapshotSchema() {
  return useSnapshotSchema;
}
```
I think these should always be `boolean`. From a model perspective I think you'll want these methods to always return some value, especially since we have defaults defined. This will also simplify the parser logic, where you don't have to if/else these cases.
Will fix this.
```java
if (request.caseSensitive() != null) {
  gen.writeBooleanField(CASE_SENSITIVE, request.caseSensitive());
} else {
  gen.writeBooleanField(CASE_SENSITIVE, true);
}
if (request.useSnapshotSchema() != null) {
  gen.writeBooleanField(USE_SNAPSHOT_SCHEMA, request.useSnapshotSchema());
} else {
  gen.writeBooleanField(USE_SNAPSHOT_SCHEMA, false);
}
```
See my comment above; I think we should write the model in the library in a manner where request.useSnapshotSchema() always returns a concrete boolean value and can't be null.
The spec says it's optional, but that doesn't mean the reference implementation for the spec should enable that behavior for the in-memory representation of these structures. The spec just gives that flexibility to clients and defines what the server should do in case these boolean fields are missing.
In this case, having these well defined in the API makes it so callers have to set this when building, and servers can expect the field to exist when handling such a request instead of doing some sort of null check.
Also newlines after if blocks please. I think this is a case where it's really helpful for readability.
Will fix this thanks
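A sketch of the "concrete boolean in the model" suggestion (the class below is hypothetical, not the PR's actual `PlanTableScanRequest`): with primitive fields, the serializer writes both fields unconditionally and the null branches disappear:

```java
public class BooleanModelSketch {
  public static final class PlanRequest {
    final boolean caseSensitive;
    final boolean useSnapshotSchema;

    public PlanRequest(boolean caseSensitive, boolean useSnapshotSchema) {
      this.caseSensitive = caseSensitive;
      this.useSnapshotSchema = useSnapshotSchema;
    }
  }

  // no if/else on null: the model guarantees a concrete value for each field
  public static String toJson(PlanRequest request) {
    return "{\"case-sensitive\":" + request.caseSensitive
        + ",\"use-snapshot-schema\":" + request.useSnapshotSchema + "}";
  }
}
```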
.palantir/revapi.yml
Outdated
```yaml
org.apache.iceberg:iceberg-api:
- code: "java.class.defaultSerializationChanged"
  old: "class org.apache.iceberg.expressions.ResidualEvaluator"
  new: "class org.apache.iceberg.expressions.ResidualEvaluator"
  justification: "local testing"
```
Seeing how we can get rid of this...
If we do what I mentioned below we should be able to get rid of this.
Why was it necessary to modify ResidualEvaluator?
It's not really needed; per #11180 (comment), we can just set the residual evaluator once we have that info.
@rdblue I'll use the example of the existing `FileScanTaskParser`. Currently, when deserializing FileScanTasks in the `fromJson` of this parser (which is similar to the `RESTFileScanTaskParser`), constructing a `BaseFileScanTask` takes a `ResidualEvaluator`:
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/FileScanTaskParser.java#L117
The `ResidualEvaluator` constructor currently requires a `spec` as well as a `caseSensitive` flag: https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/expressions/ResidualEvaluator.java#L84. At the time of deserialization we have neither, hence an additional constructor was created which only passes in the `filter`.
Following @amogh-jahagirdar's suggestion, so will remove this.
```java
import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
import org.apache.iceberg.util.JsonUtil;

public class RESTContentFileParser {
```
A big chunk of this PR is this class and the other classes, which are quite similar to the existing parsers. Discussed with @rahil-c that we should reuse as much as possible.
Will fix this.
```java
List<Integer> deleteFileReferences = null;
if (jsonNode.has(DELETE_FILE_REFERENCES)) {
  ImmutableList.Builder deleteFileReferencesBuilder = ImmutableList.builder();
  JsonNode deletesArray = jsonNode.get(DELETE_FILE_REFERENCES);
  for (JsonNode deleteRef : deletesArray) {
    deleteFileReferencesBuilder.add(deleteRef);
  }
  deleteFileReferences = deleteFileReferencesBuilder.build();
}

if (deleteFileReferences != null) {
  ImmutableList.Builder matchedDeleteFilesBuilder = ImmutableList.builder();
  for (Integer deleteFileIdx : deleteFileReferences) {
    matchedDeleteFilesBuilder.add(allDeleteFiles.get(deleteFileIdx));
  }
  matchedDeleteFiles = (DeleteFile[]) matchedDeleteFilesBuilder.build().stream().toArray();
}
```
I think the logic here for resolving the delete files for the task is incorrect, and it can also just be simplified to something like the following:
```java
Set<Integer> deleteFileReferences = Sets.newHashSet();
if (jsonNode.has(DELETE_FILE_REFERENCES)) {
  deleteFileReferences.addAll(JsonUtil.getIntegerList(DELETE_FILE_REFERENCES, jsonNode));
}

List<DeleteFile> deleteFilesForTask = Lists.newArrayList();
for (int i = 0; i < deleteFiles.size(); i++) {
  if (deleteFileReferences.contains(i)) {
    deleteFilesForTask.add(deleteFiles.get(i));
  }
}
```
+1. Should we add tests for these parsers as well, or skip them for now?
Thanks @amogh-jahagirdar for pointing this out and for the concrete suggestion.
@singhpk234 will look into adding tests for the parsers; I was first just trying to get most of the `src` implementation out so others can comment.
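The index-based resolution suggested in this thread can be restated as a runnable sketch, with plain strings standing in for `DeleteFile` objects, to make the intended semantics concrete:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DeleteFileReferenceSketch {
  // pick out the delete files a task references, by index into the top-level list
  public static List<String> resolve(List<String> allDeleteFiles, List<Integer> references) {
    Set<Integer> wanted = new HashSet<>(references);
    List<String> forTask = new ArrayList<>();
    for (int i = 0; i < allDeleteFiles.size(); i++) {
      if (wanted.contains(i)) {
        forTask.add(allDeleteFiles.get(i));
      }
    }
    return forTask;
  }
}
```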
```java
// TODO revisit this logic
if (deleteFiles != null) {
  generator.writeArrayFieldStart(DELETE_FILE_REFERENCES);
  for (int delIndex = 0; delIndex < deleteFiles.size(); delIndex++) {
    generator.writeNumber(delIndex);
  }
  generator.writeEndArray();
}
```
Yeah, this seems incorrect. Why do we need to pass the full delete files list when serializing the scan task? I think we just need to pass in the references; that's really what's relevant for serializing the scan task.
Will fix this thanks @amogh-jahagirdar
```java
if (response.fileScanTasks() != null) {
  gen.writeArrayFieldStart(FILE_SCAN_TASKS);
  for (FileScanTask fileScanTask : response.fileScanTasks()) {
    RESTFileScanTaskParser.toJson(fileScanTask, deleteFiles, gen);
```
See my comment #11180 (comment): passing the deleteFiles like this is incorrect. We should be passing only the indices of the top-level list that are relevant for the task we are serializing.
```java
Map<String, Integer> deleteFilePathToIndex = Maps.newHashMap();
if (response.deleteFiles() != null) {
  deleteFiles = response.deleteFiles();
  gen.writeArrayFieldStart(DELETE_FILES);
  for (int i = 0; i < deleteFiles.size(); i++) {
    deleteFilePathToIndex.put(deleteFiles.get(i).location(), i);
    RESTContentFileParser.toJson(deleteFiles.get(i), gen);
  }
  gen.writeEndArray();
}

if (response.fileScanTasks() != null) {
  List<Integer> deleteFileReferences = Lists.newArrayList();
  gen.writeArrayFieldStart(FILE_SCAN_TASKS);
  for (FileScanTask fileScanTask : response.fileScanTasks()) {
    if (deleteFiles != null) {
      for (DeleteFile taskDelete : taskDeletes) {
        deleteFileReferences.add(deleteFilePathToIndex.get(taskDelete.location()));
      }
    }
    RESTFileScanTaskParser.toJson(fileScanTask, deleteFileReferences, gen);
  }
```
Thanks @amogh-jahagirdar for the clarification here.
I was looking at this again locally; a correction in the second half:
```java
if (response.fileScanTasks() != null) {
  gen.writeArrayFieldStart(FILE_SCAN_TASKS);
  for (FileScanTask fileScanTask : response.fileScanTasks()) {
    // the list should be initialized here, the delete file references are per file scan task
    List<Integer> deleteFileReferences = Lists.newArrayList();
    if (deleteFiles != null) {
      for (DeleteFile taskDelete : taskDeletes) {
        deleteFileReferences.add(deleteFilePathToIndex.get(taskDelete.location()));
      }
    }
    RESTFileScanTaskParser.toJson(fileScanTask, deleteFileReferences, gen);
  }
```
Anyway, with tests this will all be made clear.
Thanks @amogh-jahagirdar
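The serialization side of the same scheme can be sketched in isolation (strings stand in for delete file locations; the real code would use `DeleteFile.location()`): build one location-to-index map for the top-level list, then rebuild the reference list fresh for every task:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DeleteFileIndexingSketch {
  // index the single top-level delete-file list by location
  public static Map<String, Integer> indexByLocation(List<String> deleteFiles) {
    Map<String, Integer> byLocation = new HashMap<>();
    for (int i = 0; i < deleteFiles.size(); i++) {
      byLocation.put(deleteFiles.get(i), i);
    }
    return byLocation;
  }

  // the delete-file references written for one scan task; a fresh list per task
  public static List<Integer> referencesFor(List<String> taskDeletes, Map<String, Integer> byLocation) {
    List<Integer> references = new ArrayList<>();
    for (String location : taskDeletes) {
      references.add(byLocation.get(location));
    }
    return references;
  }
}
```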
```java
  // Needed for Jackson Deserialization.
}

public FetchScanTasksResponse(
```
Since these are following a builder pattern, I think these constructors should be private.
```java
  this.planTasks = withPlanTasks;
  return this;
}
```
I think the builder API for specifying file scan tasks should also accept another argument specifying a list of partition specs that corresponds to the list of file scan tasks. This would get used in serialization, and then we can avoid all the custom stuff in `RESTContentFileParser` around serializing partition values and just use `ContentFileParser` as it is today.
It's not like the builder/in memory model can't have extra fields compared to the spec by the way. As long as during serialization we're serializing according to the spec that's good.
Synced offline with @amogh-jahagirdar on this, will try this idea out thanks for the helpful suggestion!
```java
  return this;
}

public Builder withDeleteFiles(List<DeleteFile> withDeleteFiles) {
```
Double check the names of these args; I think you just copied the method name. It should just be `deleteFiles`.
This may have been done due to checkstyle complaining about hidden fields, my bad. It'd be better if we could have different names
Yeah, this was due to checkstyle's shadowed/hidden field rule. Will use different names.
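A sketch of the naming fix agreed on here: the setter keeps its `with` prefix, but the parameter gets its own name, so nothing shadows the field and checkstyle's hidden-field check stays quiet (names are illustrative):

```java
import java.util.List;

public class BuilderNamingSketch {
  public static final class Builder {
    private List<String> planTasks;

    // parameter named differently from the field it assigns, so it does not shadow it
    public Builder withPlanTasks(List<String> tasksToPlan) {
      this.planTasks = tasksToPlan;
      return this;
    }

    public List<String> planTasks() {
      return planTasks;
    }
  }
}
```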
```java
public Builder withPlanStatus(PlanStatus withPlanStatus) {
  this.planStatus = withPlanStatus;
  return this;
}

public Builder withPlanTasks(List<String> withPlanTasks) {
  this.planTasks = withPlanTasks;
  return this;
}

public Builder withFileScanTasks(List<FileScanTask> withFileScanTasks) {
  this.fileScanTasks = withFileScanTasks;
  return this;
}

public Builder withDeleteFiles(List<DeleteFile> withDeleteFiles) {
  this.deleteFiles = withDeleteFiles;
  return this;
}
```
Same as below: double check some of the names on these args; they shouldn't have the `with` prefix.
```java
public PlanTableScanRequest() {
  // Needed for Jackson Deserialization.
}
```
These shouldn't be needed anymore.
Will remove these.
```java
BaseFileScanTask baseFileScanTask =
    new BaseFileScanTask(dataFile, matchedDeleteFiles, null, null, residualEvaluator);
```
We can introduce an `UnboundFileScanTask` which extends BaseFileScanTask and overrides the spec/schema accessors to throw Unsupported. That class could have a `bind(PartitionSpec spec, Schema schema)` API which returns a BaseFileScanTask with the updated spec/schema and the data file partition data updated correctly. Binding would happen when planning the tasks.
Will try this idea out thanks again @amogh-jahagirdar !
I think we also need to bind the residual as well.
[doubt] These fileScanTasks can belong to different snapshots (let's say we define both a start and end snapshot), and we would have to bind this `UnboundFileScanTask` to get a FileScanTask by first getting the schema of the snapshot the file scan task belongs to, then getting the PartitionSpec by binding that schema with the unbound partition spec. Do we have metadata for mapping a FileScanTask to a snapshot id?
See this interesting case: #11162
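A very rough shape of the proposed unbound-then-bind flow; every type below is a hypothetical stand-in (the real version would extend BaseFileScanTask and carry a DataFile, PartitionSpec, and Schema):

```java
public class UnboundTaskSketch {
  public static class UnboundTask {
    private final String dataFileLocation;

    public UnboundTask(String dataFileLocation) {
      this.dataFileLocation = dataFileLocation;
    }

    // unbound: spec/schema accessors throw until bind(...) is called
    public String spec() {
      throw new UnsupportedOperationException("unbound task has no partition spec");
    }

    // binding happens at planning time, once spec/schema are known
    public BoundTask bind(String spec, String schema) {
      return new BoundTask(dataFileLocation, spec, schema);
    }
  }

  public static class BoundTask {
    public final String dataFileLocation;
    public final String spec;
    public final String schema;

    public BoundTask(String dataFileLocation, String spec, String schema) {
      this.dataFileLocation = dataFileLocation;
      this.spec = spec;
      this.schema = schema;
    }
  }
}
```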
Resolved review thread (outdated):
- core/src/main/java/org/apache/iceberg/RESTFileScanTaskParser.java
```java
    jsonNode.isObject(), "Invalid JSON node for file scan task: non-object (%s)", jsonNode);

GenericDataFile dataFile =
    (GenericDataFile) ContentFileParser.unboundContentFileFromJson(jsonNode.get(DATA_FILE));
```
Please use `JsonUtil.get(..)` where possible, since it already includes error handling and will show if this field is missing.
```java
String planningModeValue = response.config().get(REST_SERVER_PLANNING_MODE_KEY);
RESTPlanningMode planningMode = RESTPlanningMode.fromName(planningModeValue);
if (planningMode == RESTPlanningMode.REQUIRED
    || ((planningMode == RESTPlanningMode.SUPPORTED) && restServerPlanningEnabled)) {
```
Suggested change:
```diff
-    || ((planningMode == RESTPlanningMode.SUPPORTED) && restServerPlanningEnabled)) {
+    || (planningMode == RESTPlanningMode.SUPPORTED && restServerPlanningEnabled)) {
```
```diff
@@ -125,4 +125,38 @@ public String view(TableIdentifier ident) {
   public String renameView() {
     return SLASH.join("v1", prefix, "views", "rename");
   }

+  public String planTableScan(TableIdentifier ident) {
```
I believe all places that use these new endpoints need to do an `Endpoint.check(...)`, as it's still possible that some of these endpoints don't exist even if `rest-server-planning-enabled` was set by a client.
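The gist of that guard, sketched over a plain set of endpoint identifiers (the real `Endpoint.check` API in iceberg-core differs, and the endpoint strings below are made up; this only illustrates the failure mode being asked for):

```java
import java.util.Set;

public class EndpointCheckSketch {
  // fail fast if the server's advertised endpoints don't include the one we need
  public static void check(Set<String> supportedEndpoints, String endpoint) {
    if (!supportedEndpoints.contains(endpoint)) {
      throw new UnsupportedOperationException("Server does not support endpoint: " + endpoint);
    }
  }
}
```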
* Cleanup plan table scan response
* More response cleanup
Please see the following PR: #11369, which contains only the changes for the model and parsers.
@amogh-jahagirdar @rdblue @danielcweeks @nastra Moving on to adding tests.
```java
planTasks.addAll(response.planTasks());
// TODO need to confirm recursive call here
if (response.planTasks() != null) {
  getScanTasksIterable(response.planTasks());
```
Seems like we're not using this result anywhere at the moment to populate the top level list? unless i'm missing something
Thanks for catching this; will fix it and add a test to exercise this code path.
It looks like this won't be ready this week, so I'll move it out to the next release milestone.
* Fix partitioned table planning
* Fix condition for failing when specs by id is missing
Force-pushed from 53e475d to 136ca3b.
@rdblue @danielcweeks Was wondering if you could take a look whenever you get a chance?
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
Commenting to avoid this being marked as stale.
This PR adds the new scan planning APIs based on the recent REST spec changes made in #9695.
In addition, there is a spec change to `loadTable` (https://github.com/apache/iceberg/pull/11156/files) that allows the service to notify the client when to call these new APIs.

cc @amogh-jahagirdar @nastra @rdblue @danielcweeks @jackye1995 @singhpk234 @geruh