Add dfs transformation function in XContentMapValues #17612

Open
wants to merge 6 commits into
base: main

Conversation

jmazanec15
Member

Description

In k-NN, we are developing a feature that removes vectors from the source of documents before serialization. Originally, we used the XContentMapValues.filter function to remove the vector fields from the source objects (ref). The problem with filter is that when filtering leaves an object empty and that object sits inside an array, the object is removed from the array. This alters the structure of the source map, so putting the vectors back becomes very complex for nested cases (opensearch-project/k-NN#2583).
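
To make the structural problem concrete, here is a minimal standalone sketch. It does not call XContentMapValues.filter; `excludeAndPrune` is a hypothetical stand-in that only imitates the behavior described above (remove a field, then drop objects that end up empty), and the field names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FilterPruningSketch {

    // Hypothetical stand-in for the pruning behavior described above:
    // remove `field` from every object and drop objects that become empty.
    static List<Map<String, Object>> excludeAndPrune(List<Map<String, Object>> array, String field) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> element : array) {
            Map<String, Object> copy = new LinkedHashMap<>(element);
            copy.remove(field);
            if (!copy.isEmpty()) {
                out.add(copy); // only non-empty objects keep a slot in the array
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("vector", List.of(1.0f, 2.0f));   // contains only the vector
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("vector", List.of(3.0f, 4.0f));
        second.put("tag", "b");

        List<Map<String, Object>> nested = new ArrayList<>();
        nested.add(first);
        nested.add(second);

        // The first object vanishes, so the surviving object moves from index 1 to index 0.
        System.out.println(excludeAndPrune(nested, "vector"));
    }
}
```

After pruning, the array no longer lines up index-for-index with the original, which is why re-inserting vectors into the right nested object gets complicated.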

After some discussion, we are pivoting to an approach where, instead of filtering out the fields, we mask each field with a smaller representation on write and reverse the mask on read. I added a PoC here: https://github.com/jmazanec15/k-NN-1/tree/mask-derived-poc. This change adds a transform method to XContentMapValues that performs a depth-first traversal of a map, potentially applying transformations to different values along the way. I could add it in the k-NN plugin, but thought it might be useful in core in the future. The DFS approach works well with the structure of nested documents.
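
As a rough illustration of the masking idea, the sketch below walks a map depth first and replaces values at configured dotted paths. It is not the signature or implementation added in this PR: `DfsTransformSketch`, the `transform` helper, the path convention (array elements share their parent's path), and the mask value `1` are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class DfsTransformSketch {

    // Hypothetical entry point: `transformers` maps a dotted field path to the
    // function applied to the value found at that path.
    static Map<String, Object> transform(Map<String, Object> source,
                                         Map<String, Function<Object, Object>> transformers) {
        return visitMap(source, "", transformers);
    }

    private static Map<String, Object> visitMap(Map<?, ?> map, String prefix,
                                                Map<String, Function<Object, Object>> transformers) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<?, ?> entry : map.entrySet()) {
            String key = String.valueOf(entry.getKey());
            String path = prefix.isEmpty() ? key : prefix + "." + key;
            out.put(key, visitValue(entry.getValue(), path, transformers));
        }
        return out;
    }

    private static Object visitValue(Object value, String path,
                                     Map<String, Function<Object, Object>> transformers) {
        Function<Object, Object> transformer = transformers.get(path);
        if (transformer != null) {
            return transformer.apply(value); // replace the value but keep its slot
        }
        if (value instanceof Map) {
            return visitMap((Map<?, ?>) value, path, transformers);
        }
        if (value instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object element : (List<?>) value) {
                out.add(visitValue(element, path, transformers)); // same path for every element
            }
            return out;
        }
        return value; // leaf value with no transformer: copied through unchanged
    }

    public static void main(String[] args) {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("vector", List.of(1.0f, 2.0f));
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("vector", List.of(3.0f, 4.0f));
        second.put("tag", "b");
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("title", "doc");
        source.put("nested", List.of(first, second));

        Map<String, Function<Object, Object>> mask = new LinkedHashMap<>();
        mask.put("nested.vector", value -> 1); // assumed mask: a small placeholder value

        System.out.println(transform(source, mask));
    }
}
```

Because a masked value stays in place, the `nested` array keeps both elements at their original indices, so a reverse transform on read can find each placeholder where the vector used to be.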

Related Issues

Resolves opensearch-project/k-NN#2377

Check List

  • Functionality includes testing.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Contributor

✅ Gradle check result for 4617895: SUCCESS


codecov bot commented Mar 17, 2025

Codecov Report

Attention: Patch coverage is 91.93548% with 5 lines in your changes missing coverage. Please review.

Project coverage is 72.37%. Comparing base (dcad6b8) to head (4e06dc5).
Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rch/common/xcontent/support/XContentMapValues.java | 91.93% | 0 Missing and 5 partials ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #17612      +/-   ##
============================================
- Coverage     72.46%   72.37%   -0.10%     
+ Complexity    65757    65713      -44     
============================================
  Files          5311     5311              
  Lines        305001   305073      +72     
  Branches      44230    44243      +13     
============================================
- Hits         221022   220796     -226     
- Misses        65932    66116     +184     
- Partials      18047    18161     +114     

Contributor

✅ Gradle check result for 6e6b7ae: SUCCESS

Contributor

❕ Gradle check result for 4e06dc5: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.remotestore.RemoteStoreStatsIT.testZeroLagOnCreateIndex

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Collaborator

@jed326 jed326 left a comment

Thanks @jmazanec15, I had one suggestion for an additional test case, but otherwise LGTM

@jmazanec15
Member Author

Thanks @jed326 - added the test

@jmazanec15 jmazanec15 requested a review from jed326 March 19, 2025 18:28
Collaborator

@jed326 jed326 left a comment

LGTM, thanks! @msfroh do you want to take another pass at this?

Contributor

❌ Gradle check result for f9b40ec: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@jed326
Collaborator

jed326 commented Mar 20, 2025

❌ Gradle check result for f9b40ec: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

I'm not sure what is actually failing here. @jmazanec15, maybe try rebasing your branch if the next retry fails again.

Adds a transformation function for XContentMapValues that performs depth
first traversal into a map, potentially applying transformations to
different values along the way. Main application for the method will be
to provide masks that change values in the map without compromising the
structure.

Signed-off-by: John Mazanec <[email protected]>
@jmazanec15
Member Author

Sure @jed326, just rebased

Contributor

❌ Gradle check result for f43e1d6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Labels
bug Something isn't working cicd
Development

Successfully merging this pull request may close these issues.

[RFC] Derived Source for Vectors
5 participants