Skip to content

Fix how "all matches" extractor are handled#2568

Merged
lampajr merged 4 commits intoHyperfoil:masterfrom
lampajr:js_all_match_extractor
Oct 31, 2025
Merged

Fix how "all matches" extractor are handled#2568
lampajr merged 4 commits intoHyperfoil:masterfrom
lampajr:js_all_match_extractor

Conversation

@lampajr
Copy link
Member

@lampajr lampajr commented Oct 29, 2025

Fixes Issue

The expectation is that "previewing" a label execution and its actual label value computation return the same result, which is not true a the moment when one of the extractor is marked as "all matches".

Changes proposed

  • Removed apparently wrong flatten((ArrayNode) row[3])
  • Added some more tests

Check List (Check all the applicable boxes)

  • My code follows the code style of this project.
  • My change requires changes to the documentation.
  • I have updated the documentation accordingly.
  • All new and existing tests passed.

@lampajr
Copy link
Member Author

lampajr commented Oct 29, 2025

Commit 66e7c0b reproduces the problem raised by other users, it seems that when using "all matches" extractor (which should usually produce an array or arrays as result) is not working as expected when computing the corresponding label value.

In contrast, the "label preview" seems working as expected resulting in different results between the preview and the actual label value.

This seems to happen only if using JS function, in this case the input (array of arrays) is flattened when trying to compute the label value.

@lampajr
Copy link
Member Author

lampajr commented Oct 29, 2025

Worth noting that the "label preview" is calling evaluateOnce under the hood, whereas the label values computation is calling evaluateWithCombinationFunction.

Signed-off-by: Andrea Lamparelli <a.lamparelli95@gmail.com>
@lampajr lampajr force-pushed the js_all_match_extractor branch from 1829377 to 66e7c0b Compare October 29, 2025 10:36
@lampajr
Copy link
Member Author

lampajr commented Oct 29, 2025

Found that the issue is happening only when there is one single extractor, and that extractor is "all matches" one.

Signed-off-by: Andrea Lamparelli <a.lamparelli95@gmail.com>
Signed-off-by: Andrea Lamparelli <a.lamparelli95@gmail.com>
@lampajr
Copy link
Member Author

lampajr commented Oct 29, 2025

Added some more tests covering first and all matches scenarios , especially when dealing with one single extractor.

@lampajr lampajr force-pushed the js_all_match_extractor branch from 615d685 to 131e5ca Compare October 29, 2025 11:04
FingerprintDAO.deleteById(datasetId);
Util.evaluateWithCombinationFunction(toCompute,
(row) -> (String) row[2],
(row) -> (row[3] instanceof ArrayNode ? flatten((ArrayNode) row[3]) : (JsonNode) row[3]),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know why this flatten was introduced, it seems forcing to do the "first match" in case of array of arrays..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, idk if we have any data that would be affected if we remove it?

@lampajr lampajr force-pushed the js_all_match_extractor branch from 131e5ca to 530402e Compare October 29, 2025 12:10
@lampajr lampajr marked this pull request as ready for review October 29, 2025 12:10
@lampajr lampajr force-pushed the js_all_match_extractor branch from 530402e to 5cb808f Compare October 29, 2025 12:22
public void testDatasetLabelAsyncSingleWithReduceFunctionArray() {
withExampleSchemas((schemas) -> {
int labelReduce = addLabel(schemas[0], "sum", "async (value) => { return value.reduce((a,b) => a+b); }",
int labelReduce = addLabel(schemas[0], "sum", "async (value) => { return value[0].reduce((a,b) => a+b); }",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this test testDatasetLabelAsyncSingleWithReduceFunctionArray and testDatasetLabelSingleWithReduceFunctionArray were created to test exactly that use case when using "All Matches" and reduce function.. but in my opinion the expectation was wrong and did not make sense as it would change the results/expectation a user will have when dealing with such arrays as input.

IMHO the proper way is to consider the input as an array of arrays which is exactly what it would be in all other use cases

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree, but it would be good to understand why... From the git log it seems @willr3 created this. @willr3, any comments?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created testDatasetLabelAsyncSingleWithReduceFunctionArray to test the use of async and based it off what already existed in testDatasetLabelSingleWithReduceFunctionArray so I cannot comment about the use of value[0] vs value.

@lampajr lampajr requested review from stalep and willr3 October 29, 2025 12:24
@lampajr lampajr self-assigned this Oct 29, 2025
@lampajr lampajr force-pushed the js_all_match_extractor branch from 5cb808f to c34a943 Compare October 29, 2025 12:38
Copy link
Member

@stalep stalep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm! Very nice with those tests added!!
It would be good to get some background info from @willr3 if he remembers why it's like this in the first place (if I'm correct that he wrote it :).

FingerprintDAO.deleteById(datasetId);
Util.evaluateWithCombinationFunction(toCompute,
(row) -> (String) row[2],
(row) -> (row[3] instanceof ArrayNode ? flatten((ArrayNode) row[3]) : (JsonNode) row[3]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, idk if we have any data that would be affected if we remove it?

public void testDatasetLabelAsyncSingleWithReduceFunctionArray() {
withExampleSchemas((schemas) -> {
int labelReduce = addLabel(schemas[0], "sum", "async (value) => { return value.reduce((a,b) => a+b); }",
int labelReduce = addLabel(schemas[0], "sum", "async (value) => { return value[0].reduce((a,b) => a+b); }",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree, but it would be good to understand why... From the git log it seems @willr3 created this. @willr3, any comments?

Copy link
Collaborator

@willr3 willr3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only made minor comments about readability and the background on the async test I added

@lampajr lampajr force-pushed the js_all_match_extractor branch from c34a943 to 6edcc0c Compare October 31, 2025 16:27
@lampajr
Copy link
Member Author

lampajr commented Oct 31, 2025

Only made minor comments about readability and the background on the async test I added

Applied your suggestions ✅

Signed-off-by: Andrea Lamparelli <a.lamparelli95@gmail.com>
@lampajr lampajr force-pushed the js_all_match_extractor branch from 6edcc0c to 8c6579a Compare October 31, 2025 16:29
@lampajr lampajr merged commit ff1f77a into Hyperfoil:master Oct 31, 2025
2 checks passed
@lampajr lampajr deleted the js_all_match_extractor branch October 31, 2025 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants