Email search scenario tests of using dotnet spark as reducers #205
Open
ruihmicrosoft wants to merge 2 commits into dotnet:main
…l search scenarios today. They are typical examples because they cover the two most common data transformations: taking N rows and outputting 1 row, and taking N rows and outputting M rows. We use the two unit tests to demonstrate the feasibility of migrating the existing reducers used by email search to HDI. Note that this isn't a performance test yet.
1) The first reducer computes the top N contacts per user. It first groups by user id and then collapses each group into one row by concatenating the records. A UDF is used to apply some string operations.
2) The second reducer takes N rows and outputs M rows. Given a successful search query with flattened folder and item lists, the reducer explodes the input DataFrame by pairing the corresponding folder and item ids. If the folder and item lists contain multiple records, the reducer generates multiple rows as output.
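The two transformation shapes described above can be sketched in plain Python, without Spark, to make the row-count behavior concrete. This is only an illustration under stated assumptions, not the code in this PR: the function names, the `(user_id, contact, count)` and `(query_id, folder_ids, item_ids)` row layouts, ranking contacts by a count column, the lowercase step standing in for the UDF's string operations, the `;` and `,` delimiters, and positional pairing of folder and item ids are all hypothetical.

```python
from collections import defaultdict


def top_contacts_per_user(rows, n):
    """N rows -> 1 row per user (hypothetical layout: user_id, contact, count)."""
    by_user = defaultdict(list)
    for user_id, contact, count in rows:
        by_user[user_id].append((contact, count))

    result = []
    for user_id in sorted(by_user):
        # Assumption: "top" means highest count; ties are broken by name.
        top = sorted(by_user[user_id], key=lambda c: (-c[1], c[0]))[:n]
        # Stand-in for the UDF's string operations: lowercase, then concatenate.
        joined = ";".join(f"{contact.lower()}:{count}" for contact, count in top)
        result.append((user_id, joined))
    return result


def explode_folder_item_pairs(rows):
    """N rows -> M rows (hypothetical layout: query_id, folder_ids, item_ids)."""
    out = []
    for query_id, folder_ids, item_ids in rows:
        # Assumption: the flattened lists are comma-delimited strings and the
        # i-th folder id corresponds to the i-th item id.
        for folder, item in zip(folder_ids.split(","), item_ids.split(",")):
            out.append((query_id, folder, item))
    return out


contacts = [("u1", "Alice", 3), ("u1", "Bob", 5), ("u1", "Carol", 1), ("u2", "Dave", 2)]
print(top_contacts_per_user(contacts, 2))

queries = [("q1", "f1,f2", "i1,i2"), ("q2", "f3", "i3")]
print(explode_folder_item_pairs(queries))
```

In a Spark DataFrame version, the first function would roughly correspond to a group-by followed by an aggregation and a UDF, and the second to an explode over zipped arrays. The exact calls used in the PR are not shown in this conversation.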
Contributor
@ruihmicrosoft, we have been adding examples with some documentation under the example folder: #319, #320. If you want to push this PR forward, can you please convert it to an example rather than an E2E test? Thanks.
This commit adds two unit tests that mimic the reducers used in email search scenarios today. They are typical examples because they cover the two most common data transformations: taking N rows and outputting 1 row, and taking N rows and outputting M rows. We use the two unit tests to demonstrate the feasibility of migrating the existing reducers used by email search to HDI. Note that this isn't a performance test yet.