-
Notifications
You must be signed in to change notification settings - Fork 2.1k
[Improve][Core][Metrics] Refactor and enhance job metrics handling for multiple sinks with updated tests #10273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
…multiple sinks with updated tests
… and source indices for accurate assertions
… and source indices for accurate assertions
corgy-w
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM cc @hawk9821
| return; | ||
| } | ||
|
|
||
| java.util.List<String> sinkIdentifiers = tableToSinkIdentifiersMap.get(tableName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| java.util.List<String> sinkIdentifiers = tableToSinkIdentifiersMap.get(tableName); | |
| List<String> sinkIdentifiers = tableToSinkIdentifiersMap.get(tableName); |
| return ""; | ||
| } | ||
|
|
||
| Pattern pattern = Pattern.compile("((?:Sink|Source|Transform)\\[\\d+\\])"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not recommended to put the Pattern inside the method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request refactors job metrics handling to support multiple sinks/sources with the same table name by adding unique identifiers (e.g., "Sink[0]", "Source[0]") to metric keys. This prevents metric collision when different sinks write to tables with identical names.
Key changes:
- Enhanced metrics aggregation logic to extract and use plugin identifiers from DAG information
- Modified metric key format to include source/sink identifiers for disambiguation
- Updated all related tests and E2E tests to expect the new metric key format with identifiers
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| BaseService.java | Core refactoring: added JobDAGInfo parameter to getJobMetrics, implemented identifier extraction and mapping logic, refactored metric processing to use qualified keys |
| JobConfigParser.java | Changed visibility of action name creation methods from package-private to public to enable test usage |
| BaseServiceTableMetricsTest.java | Added new test for multiple sinks scenario, updated existing tests to pass JobDAGInfo parameter |
| RestApiIT.java | Updated metric key assertions to include Source/Sink identifier prefixes |
| MultiTableMetricsIT.java | Updated metric key assertions to use qualified names with identifiers |
| CommittedMetricsIT.java | Updated metric key assertions to use qualified names with identifiers |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| int arraySize = metricNode.size(); | ||
|
|
||
| if (arraySize == sinkIdentifiers.size()) { | ||
| ObjectMapper mapper = new ObjectMapper(); |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A new ObjectMapper instance is created inside the loop at line 424, which is inefficient. ObjectMapper instantiation is relatively expensive and should be reused. Consider creating the ObjectMapper instance once before the loop (at the method level or as a static/instance field) to improve performance, especially when processing multiple sink identifiers.
| if (arraySize == sinkIdentifiers.size()) { | ||
| ObjectMapper mapper = new ObjectMapper(); | ||
| for (int i = 0; i < arraySize; i++) { | ||
| String sinkIdentifier = sinkIdentifiers.get(i); |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable 'sinkIdentifier' at line 426 is misleadingly named. This variable holds an identifier that could be for either a source or a sink, depending on the context. The name should reflect this dual purpose. Consider renaming it to 'identifier' to avoid confusion.
| public static String createSourceActionName(int configIndex, String pluginName) { | ||
| return String.format("Source[%s]-%s", configIndex, pluginName); | ||
| } | ||
|
|
||
| static String createSinkActionName(int configIndex, String pluginName, String table) { | ||
| public static String createSinkActionName(int configIndex, String pluginName, String table) { | ||
| return String.format("Sink[%s]-%s-%s", configIndex, pluginName, table); | ||
| } | ||
|
|
||
| static String createTransformActionName(int configIndex, String pluginName) { | ||
| public static String createTransformActionName(int configIndex, String pluginName) { | ||
| return String.format("Transform[%s]-%s", configIndex, pluginName); |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The methods createSourceActionName, createSinkActionName, and createTransformActionName are changed from package-private to public visibility. While this change enables their use in tests, there is no documentation explaining their purpose, parameters, or expected usage. Consider adding JavaDoc comments to these newly public methods to help external callers understand how to use them correctly, especially since they are now part of the public API of the class.
| Map<String, List<String>> tableToSinkIdentifiersMap = new HashMap<>(); | ||
| if (jobDAGInfo != null && jobDAGInfo.getVertexInfoMap() != null) { | ||
| for (VertexInfo vertexInfo : jobDAGInfo.getVertexInfoMap().values()) { | ||
| String identifier = extractSinkIdentifier(vertexInfo.getConnectorType()); |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition at line 273 lacks a clarifying comment. The check 'identifier.equals(vertexInfo.getConnectorType())' is used to detect when the extractSinkIdentifier method failed to extract a pattern (meaning it returned the original input unchanged). This logic is not immediately obvious to readers. Consider adding a comment explaining that this condition filters out vertices where no identifier pattern was found, or refactor extractSinkIdentifier to return an Optional to make the intent clearer.
| String identifier = extractSinkIdentifier(vertexInfo.getConnectorType()); | |
| String identifier = extractSinkIdentifier(vertexInfo.getConnectorType()); | |
| // When extractSinkIdentifier cannot find a pattern, it returns the original | |
| // connectorType. In that case (identifier equals connectorType) we skip this | |
| // vertex because no specific identifier was extracted for metrics grouping. |
| processMetric( | ||
| metricName, tableName, metricNode, tableMetricsMaps); | ||
| metricName, | ||
| tableName, | ||
| metricNode, | ||
| tableMetricsMaps, | ||
| identifiersMap); |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential null pointer issue: When identifiersMap is null (which happens when a metric name contains "#" but doesn't start with TableSource, Source, TableSink, or Sink), the method processMetric is still called with a null map at line 375. Inside processMetric at line 415, calling .get(tableName) on a null map will throw a NullPointerException. While current metric names appear to always match the expected prefixes, this creates a fragile dependency. Consider adding a null check at line 415 (e.g., 'if (tableToSinkIdentifiersMap != null)') or validating that identifiersMap is not null before calling processMetric to make the code more robust.
| return; | ||
| } | ||
|
|
||
| java.util.List<String> sinkIdentifiers = tableToSinkIdentifiersMap.get(tableName); |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable 'sinkIdentifiers' at line 415 is misleadingly named. This variable can contain either source or sink identifiers depending on which map is passed in (tableToSourceIdentifiersMap or tableToSinkIdentifiersMap). Using the name 'sinkIdentifiers' when it might actually contain source identifiers makes the code confusing. Consider renaming it to 'identifiers' to accurately reflect what it contains.
| } | ||
| } | ||
|
|
||
| private String extractSinkIdentifier(String vertexName) { |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method name 'extractSinkIdentifier' is misleading because it can extract identifiers for Source, Sink, or Transform plugins, not just sinks. The regex pattern on line 500 explicitly matches all three types. Consider renaming the method to 'extractPluginIdentifier' or 'extractVertexIdentifier' to accurately reflect its broader purpose.
| private String extractSinkIdentifier(String vertexName) { | |
| private String extractPluginIdentifier(String vertexName) { |
| if (sinkIdentifiers != null | ||
| && !sinkIdentifiers.isEmpty() | ||
| && metricNode.isArray() | ||
| && sinkIdentifiers.size() > 1) { | ||
| int arraySize = metricNode.size(); | ||
|
|
||
| if (arraySize == sinkIdentifiers.size()) { | ||
| ObjectMapper mapper = new ObjectMapper(); | ||
| for (int i = 0; i < arraySize; i++) { | ||
| String sinkIdentifier = sinkIdentifiers.get(i); | ||
| String metricKey = sinkIdentifier + "." + tableName; | ||
|
|
||
| try { | ||
| String json = "[" + mapper.writeValueAsString(metricNode.get(i)) + "]"; | ||
| JsonNode arrayNode = mapper.readTree(json); | ||
| putMetricToMap(metricName, metricKey, arrayNode, tableMetricsMaps); | ||
| } catch (JsonProcessingException e) { | ||
| putMetricToMap(metricName, metricKey, metricNode.get(i), tableMetricsMaps); | ||
| } | ||
| } | ||
| return; | ||
| } | ||
| } | ||
|
|
||
| String metricKey = tableName; | ||
| if (sinkIdentifiers != null && !sinkIdentifiers.isEmpty()) { | ||
| metricKey = sinkIdentifiers.get(0) + "." + tableName; |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition at line 420 checks if sinkIdentifiers.size() is greater than 1, but the subsequent check at line 423 verifies if arraySize equals sinkIdentifiers.size(). This means when there's only one identifier (sinkIdentifiers.size() == 1), the code skips to line 441 and uses sinkIdentifiers.get(0). However, when sinkIdentifiers.size() > 1 but arraySize != sinkIdentifiers.size(), the code also falls through to line 441 and only uses the first identifier, potentially losing data. Consider adding a warning log or handling this mismatch case more explicitly to alert developers when the array size doesn't match the expected number of identifiers.
| if (vertexInfo.getType() == PluginType.SOURCE) { | ||
| targetMap = tableToSourceIdentifiersMap; | ||
| } else if (vertexInfo.getType() == PluginType.SINK) { | ||
| targetMap = tableToSinkIdentifiersMap; |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code at lines 276-281 only handles SOURCE and SINK plugin types, ignoring TRANSFORM types. However, the extractSinkIdentifier method on line 271 explicitly extracts Transform identifiers (as shown in the regex pattern on line 500). This inconsistency could lead to confusion. If Transform metrics are not expected to be processed in this context, consider updating the regex pattern to exclude Transform. If Transform metrics should be handled, add a case for PluginType.TRANSFORM.
| targetMap = tableToSinkIdentifiersMap; | |
| targetMap = tableToSinkIdentifiersMap; | |
| } else if (vertexInfo.getType() == PluginType.TRANSFORM) { | |
| // Currently, transform plugin metrics are not aggregated by table | |
| // in this method. We still call extractSinkIdentifier for all | |
| // plugin types, but intentionally do not populate a target map | |
| // for TRANSFORM here. |
| String json = "[" + mapper.writeValueAsString(metricNode.get(i)) + "]"; | ||
| JsonNode arrayNode = mapper.readTree(json); |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JSON serialization and deserialization at lines 430-431 appears inefficient. The code converts a single JsonNode element to a JSON string, wraps it in brackets to create an array string, and then parses it back to a JsonNode array. This round-trip conversion is unnecessary overhead. Consider creating the array node directly using Jackson's API (e.g., mapper.createArrayNode().add(metricNode.get(i))) instead of string manipulation and re-parsing.
…
Purpose of this pull request
Does this PR introduce any user-facing change?
How was this patch tested?
Check list
New License Guide
incompatible-changes.mdto describe the incompatibility caused by this PR.