-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-53891][SQL] Model DSV2 Commit Operation Metrics API #52595
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
b249670
to
09c7d44
Compare
09c7d44
to
4fe3f19
Compare
3769ff8
to
84b2eb6
Compare
* </ul> | ||
* </li> | ||
* </ul> | ||
* @param operationMetrics operation metrics collected from the query producing write. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we missing an empty space here? Can we also mention that it is best effort to collect/compute these so that users don't report bugs for cases that we currently don't support?
import org.apache.spark.annotation.Evolving; | ||
|
||
/** | ||
* Interface that provides merge operation metrics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: merge
-> MERGE
?
* @since 4.1.0 | ||
*/ | ||
@Evolving | ||
public interface OperationMetrics { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This naming seems reasonable to me but if folks have better ideas, it would be great to hear them.
cc @cloud-fan @viirya @gengliangwang @dongjoon-hyun @huaxingao
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for the AS-IS name, OperationMetrics
.
/** | ||
* Returns the number of target rows copied unmodified because they did not match any action. | ||
*/ | ||
long numTargetRowsCopied(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we will use -1 if unknown. Shall we document this?
collectFirst(query) { case m: MergeRowsExec => m }.map { n => | ||
val metrics = n.metrics | ||
MergeOperationMetricsImpl( | ||
metrics.get("numTargetRowsCopied").map(_.value).getOrElse(-1L), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add constants for these in a separate PR? It seems fragile.
* @since 4.1.0 | ||
*/ | ||
@Evolving | ||
public interface OperationMetrics { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am thinking how to simplify consumption of these objects in connectors. The question is whether this interface should have some sort of operation
method that would tell the type of metrics. That said, it is probably not the end of the world if connectors do a class check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts anyone?
I feel like using a proper object is the right call here compared to the map. Left some questions. Would love to hear what others think too. |
What changes were proposed in this pull request?
#51377 added a DataSourceV2 API that sends operation metrics along with the commit, via a map of string, long. Change this to a proper model.
Suggestion from @aokolnychyi
Why are the changes needed?
It would be cleaner to model it as a proper object so that it is more clear what metrics Spark sends, and to handle future cases where metrics may not be long values.
Does this PR introduce any user-facing change?
No, unreleased DSV2 API.
How was this patch tested?
Existing tests
Was this patch authored or co-authored using generative AI tooling?
No