Skip to content

Conversation

@zhangshenghang
Copy link
Member

Purpose of this pull request

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

@github-actions github-actions bot added the e2e label Jan 5, 2026
@corgy-w corgy-w requested a review from liugddx January 7, 2026 14:05
Copy link
Contributor

@corgy-w corgy-w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM cc @hawk9821

return;
}

java.util.List<String> sinkIdentifiers = tableToSinkIdentifiersMap.get(tableName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
java.util.List<String> sinkIdentifiers = tableToSinkIdentifiersMap.get(tableName);
List<String> sinkIdentifiers = tableToSinkIdentifiersMap.get(tableName);

return "";
}

Pattern pattern = Pattern.compile("((?:Sink|Source|Transform)\\[\\d+\\])");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not recommended to put the Pattern inside the method.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request refactors job metrics handling to support multiple sinks/sources with the same table name by adding unique identifiers (e.g., "Sink[0]", "Source[0]") to metric keys. This prevents metric collision when different sinks write to tables with identical names.

Key changes:

  • Enhanced metrics aggregation logic to extract and use plugin identifiers from DAG information
  • Modified metric key format to include source/sink identifiers for disambiguation
  • Updated all related tests and E2E tests to expect the new metric key format with identifiers

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
BaseService.java Core refactoring: added JobDAGInfo parameter to getJobMetrics, implemented identifier extraction and mapping logic, refactored metric processing to use qualified keys
JobConfigParser.java Changed visibility of action name creation methods from package-private to public to enable test usage
BaseServiceTableMetricsTest.java Added new test for multiple sinks scenario, updated existing tests to pass JobDAGInfo parameter
RestApiIT.java Updated metric key assertions to include Source/Sink identifier prefixes
MultiTableMetricsIT.java Updated metric key assertions to use qualified names with identifiers
CommittedMetricsIT.java Updated metric key assertions to use qualified names with identifiers

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

int arraySize = metricNode.size();

if (arraySize == sinkIdentifiers.size()) {
ObjectMapper mapper = new ObjectMapper();
Copy link

Copilot AI Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new ObjectMapper instance is created inside the loop at line 424, which is inefficient. ObjectMapper instantiation is relatively expensive and should be reused. Consider creating the ObjectMapper instance once before the loop (at the method level or as a static/instance field) to improve performance, especially when processing multiple sink identifiers.

Copilot uses AI. Check for mistakes.
if (arraySize == sinkIdentifiers.size()) {
ObjectMapper mapper = new ObjectMapper();
for (int i = 0; i < arraySize; i++) {
String sinkIdentifier = sinkIdentifiers.get(i);
Copy link

Copilot AI Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable 'sinkIdentifier' at line 426 is misleadingly named. This variable holds an identifier that could be for either a source or a sink, depending on the context. The name should reflect this dual purpose. Consider renaming it to 'identifier' to avoid confusion.

Copilot uses AI. Check for mistakes.
Comment on lines +49 to 58
public static String createSourceActionName(int configIndex, String pluginName) {
return String.format("Source[%s]-%s", configIndex, pluginName);
}

static String createSinkActionName(int configIndex, String pluginName, String table) {
public static String createSinkActionName(int configIndex, String pluginName, String table) {
return String.format("Sink[%s]-%s-%s", configIndex, pluginName, table);
}

static String createTransformActionName(int configIndex, String pluginName) {
public static String createTransformActionName(int configIndex, String pluginName) {
return String.format("Transform[%s]-%s", configIndex, pluginName);
Copy link

Copilot AI Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The methods createSourceActionName, createSinkActionName, and createTransformActionName are changed from package-private to public visibility. While this change enables their use in tests, there is no documentation explaining their purpose, parameters, or expected usage. Consider adding JavaDoc comments to these newly public methods to help external callers understand how to use them correctly, especially since they are now part of the public API of the class.

Copilot uses AI. Check for mistakes.
Map<String, List<String>> tableToSinkIdentifiersMap = new HashMap<>();
if (jobDAGInfo != null && jobDAGInfo.getVertexInfoMap() != null) {
for (VertexInfo vertexInfo : jobDAGInfo.getVertexInfoMap().values()) {
String identifier = extractSinkIdentifier(vertexInfo.getConnectorType());
Copy link

Copilot AI Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition at line 273 lacks a clarifying comment. The check 'identifier.equals(vertexInfo.getConnectorType())' is used to detect when the extractSinkIdentifier method failed to extract a pattern (meaning it returned the original input unchanged). This logic is not immediately obvious to readers. Consider adding a comment explaining that this condition filters out vertices where no identifier pattern was found, or refactor extractSinkIdentifier to return an Optional to make the intent clearer.

Suggested change
String identifier = extractSinkIdentifier(vertexInfo.getConnectorType());
String identifier = extractSinkIdentifier(vertexInfo.getConnectorType());
// When extractSinkIdentifier cannot find a pattern, it returns the original
// connectorType. In that case (identifier equals connectorType) we skip this
// vertex because no specific identifier was extracted for metrics grouping.

Copilot uses AI. Check for mistakes.
Comment on lines 370 to +375
processMetric(
metricName, tableName, metricNode, tableMetricsMaps);
metricName,
tableName,
metricNode,
tableMetricsMaps,
identifiersMap);
Copy link

Copilot AI Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential null pointer issue: When identifiersMap is null (which happens when a metric name contains "#" but doesn't start with TableSource, Source, TableSink, or Sink), the method processMetric is still called with a null map at line 375. Inside processMetric at line 415, calling .get(tableName) on a null map will throw a NullPointerException. While current metric names appear to always match the expected prefixes, this creates a fragile dependency. Consider adding a null check at line 415 (e.g., 'if (tableToSinkIdentifiersMap != null)') or validating that identifiersMap is not null before calling processMetric to make the code more robust.

Copilot uses AI. Check for mistakes.
return;
}

java.util.List<String> sinkIdentifiers = tableToSinkIdentifiersMap.get(tableName);
Copy link

Copilot AI Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable 'sinkIdentifiers' at line 415 is misleadingly named. This variable can contain either source or sink identifiers depending on which map is passed in (tableToSourceIdentifiersMap or tableToSinkIdentifiersMap). Using the name 'sinkIdentifiers' when it might actually contain source identifiers makes the code confusing. Consider renaming it to 'identifiers' to accurately reflect what it contains.

Copilot uses AI. Check for mistakes.
}
}

private String extractSinkIdentifier(String vertexName) {
Copy link

Copilot AI Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method name 'extractSinkIdentifier' is misleading because it can extract identifiers for Source, Sink, or Transform plugins, not just sinks. The regex pattern on line 500 explicitly matches all three types. Consider renaming the method to 'extractPluginIdentifier' or 'extractVertexIdentifier' to accurately reflect its broader purpose.

Suggested change
private String extractSinkIdentifier(String vertexName) {
private String extractPluginIdentifier(String vertexName) {

Copilot uses AI. Check for mistakes.
Comment on lines +417 to +443
if (sinkIdentifiers != null
&& !sinkIdentifiers.isEmpty()
&& metricNode.isArray()
&& sinkIdentifiers.size() > 1) {
int arraySize = metricNode.size();

if (arraySize == sinkIdentifiers.size()) {
ObjectMapper mapper = new ObjectMapper();
for (int i = 0; i < arraySize; i++) {
String sinkIdentifier = sinkIdentifiers.get(i);
String metricKey = sinkIdentifier + "." + tableName;

try {
String json = "[" + mapper.writeValueAsString(metricNode.get(i)) + "]";
JsonNode arrayNode = mapper.readTree(json);
putMetricToMap(metricName, metricKey, arrayNode, tableMetricsMaps);
} catch (JsonProcessingException e) {
putMetricToMap(metricName, metricKey, metricNode.get(i), tableMetricsMaps);
}
}
return;
}
}

String metricKey = tableName;
if (sinkIdentifiers != null && !sinkIdentifiers.isEmpty()) {
metricKey = sinkIdentifiers.get(0) + "." + tableName;
Copy link

Copilot AI Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition at line 420 checks if sinkIdentifiers.size() is greater than 1, but the subsequent check at line 423 verifies if arraySize equals sinkIdentifiers.size(). This means when there's only one identifier (sinkIdentifiers.size() == 1), the code skips to line 441 and uses sinkIdentifiers.get(0). However, when sinkIdentifiers.size() > 1 but arraySize != sinkIdentifiers.size(), the code also falls through to line 441 and only uses the first identifier, potentially losing data. Consider adding a warning log or handling this mismatch case more explicitly to alert developers when the array size doesn't match the expected number of identifiers.

Copilot uses AI. Check for mistakes.
if (vertexInfo.getType() == PluginType.SOURCE) {
targetMap = tableToSourceIdentifiersMap;
} else if (vertexInfo.getType() == PluginType.SINK) {
targetMap = tableToSinkIdentifiersMap;
Copy link

Copilot AI Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code at lines 276-281 only handles SOURCE and SINK plugin types, ignoring TRANSFORM types. However, the extractSinkIdentifier method on line 271 explicitly extracts Transform identifiers (as shown in the regex pattern on line 500). This inconsistency could lead to confusion. If Transform metrics are not expected to be processed in this context, consider updating the regex pattern to exclude Transform. If Transform metrics should be handled, add a case for PluginType.TRANSFORM.

Suggested change
targetMap = tableToSinkIdentifiersMap;
targetMap = tableToSinkIdentifiersMap;
} else if (vertexInfo.getType() == PluginType.TRANSFORM) {
// Currently, transform plugin metrics are not aggregated by table
// in this method. We still call extractSinkIdentifier for all
// plugin types, but intentionally do not populate a target map
// for TRANSFORM here.

Copilot uses AI. Check for mistakes.
Comment on lines +430 to +431
String json = "[" + mapper.writeValueAsString(metricNode.get(i)) + "]";
JsonNode arrayNode = mapper.readTree(json);
Copy link

Copilot AI Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JSON serialization and deserialization at lines 430-431 appears inefficient. The code converts a single JsonNode element to a JSON string, wraps it in brackets to create an array string, and then parses it back to a JsonNode array. This round-trip conversion is unnecessary overhead. Consider creating the array node directly using Jackson's API (e.g., mapper.createArrayNode().add(metricNode.get(i))) instead of string manipulation and re-parsing.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants