-
Notifications
You must be signed in to change notification settings - Fork 3
test: DH-21118: Added with_serial benchmarks #415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds three new benchmark tests to measure UDF (User-Defined Function) performance when using the with_serial() method, which forces single-threaded execution. These tests complement existing benchmarks to help measure the performance impact of serial versus parallel execution on GIL-based systems.
Key changes:
- Added serial variants of existing UDF benchmarks for No Hints, Python Hints, and Numpy Hints scenarios
- Each serial test uses
Selectable.parse().with_serial()to enforce single-threaded execution - Tests maintain consistency with their non-serial counterparts in terms of setup and configuration
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return num1 + num2 | ||
| from deephaven.table import Selectable | ||
| col1 = Selectable.parse('num1=f(num1, num2)').with_serial() | ||
| col2 = Selectable.parse('num1=(double)num1').with_serial() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the second cast for? Seems like it is going to do extra work beyond the UDF; making us measure not quite what we intend. I see we have it in the other test, so changing it would be inconsistent, but I would like to know the motivation so we can decide if it is really necessary going forward.
If we do need the cast, because there are no hints; why do we prefer it this way instead of just"(double)f(num1, num2)" as a single statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the javadoc heading I have the following...
Note: The "No Hints" tests have casts to make them equivalent to the hints tests, otherwise
the return value would always be a PyObject and not really the same test. They use two
formulas to achieve this, otherwise vectorization would not happen on "No Hints" benchmarks.
... Jianfeng pointed the vectorization issue out in a review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that wse are testing exactly what we want though; because with a select like that we have to write down a PyObject. Then we have to read the PyObject from the first column and cast it.
col1 = Selectable.parse('num1=(double)f(num1, num2)').with_serial()
Would take the PyObject inside of the formula, then turn it into a double, rather than writing it down and holding onto it for ever.
Fixing/changing the benchmark does have some negative effects though; because we basically lose the history.
with_serialto track single threaded usage.