@stanbrub stanbrub commented Dec 5, 2025

  • Added three more UDF benchmarks that use with_serial to track single-threaded usage.
  • The tests have counterparts for No Hints, Python Hints, and Numpy Hints.
  • These tests now show the disparity between stateless and serial behavior on a GIL system.

Copilot AI left a comment

Pull request overview

This PR adds three new benchmark tests to measure UDF (User-Defined Function) performance when using the with_serial() method, which forces single-threaded execution. These tests complement existing benchmarks to help measure the performance impact of serial versus parallel execution on GIL-based systems.

Key changes:

  • Added serial variants of existing UDF benchmarks for No Hints, Python Hints, and Numpy Hints scenarios
  • Each serial test uses Selectable.parse().with_serial() to enforce single-threaded execution
  • Tests maintain consistency with their non-serial counterparts in terms of setup and configuration
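
As a rough illustration of the serial versus parallel distinction described above, here is a minimal plain-Python sketch. It is not Deephaven code and does not use with_serial(); the function `f` is a hypothetical stand-in for the benchmark UDF, and `ThreadPoolExecutor` is only an assumed analogue for evaluating a column on several threads. On a GIL-based CPython build, a pure-Python function like this generally gains little from threads, which is the kind of gap the serial benchmarks are meant to expose.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def f(num1, num2):
    # Hypothetical stand-in for the benchmark UDF: a trivial pure-Python function.
    return num1 + num2


def run_serial(xs, ys):
    # Evaluate every row on the calling thread, in order.
    return [f(x, y) for x, y in zip(xs, ys)]


def run_threaded(xs, ys, workers=4):
    # Split the rows into contiguous chunks and evaluate one chunk per task.
    n = len(xs)
    step = max(1, (n + workers - 1) // workers)
    chunks = [(xs[i:i + step], ys[i:i + step]) for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda c: run_serial(*c), chunks)
    # pool.map yields results in submission order, so row order is preserved.
    return [v for part in parts for v in part]


def timed(fn, *args):
    # Return the function result together with elapsed wall-clock seconds.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


xs = list(range(200_000))
ys = list(range(200_000))
serial_result, serial_secs = timed(run_serial, xs, ys)
threaded_result, threaded_secs = timed(run_threaded, xs, ys)
print(f"serial: {serial_secs:.4f}s, threaded: {threaded_secs:.4f}s")
```

The timings vary by machine and interpreter build, so the sketch prints them rather than asserting any particular ratio.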

return num1 + num2
from deephaven.table import Selectable
col1 = Selectable.parse('num1=f(num1, num2)').with_serial()
col2 = Selectable.parse('num1=(double)num1').with_serial()

What is the second cast for? It seems to do extra work beyond the UDF, so we end up measuring not quite what we intend. I see the other test has it too, so changing it here would be inconsistent, but I would like to know the motivation so we can decide whether it is really necessary going forward.

If we do need the cast because there are no hints, why do we prefer it this way instead of just "(double)f(num1, num2)" as a single statement?

Collaborator Author

In the javadoc heading I have the following...

Note: The "No Hints" tests have casts to make them equivalent to the hints tests, otherwise 
the return value would always be a PyObject and not really the same test. They use two 
formulas to achieve this, otherwise vectorization would not happen on "No Hints" benchmarks.

... Jianfeng pointed the vectorization issue out in a review.

I'm not sure that we are testing exactly what we want, though, because with a select like that we have to write down a PyObject. Then we have to read the PyObject from the first column and cast it.

    col1 = Selectable.parse('num1=(double)f(num1, num2)').with_serial()

This would take the PyObject inside the formula and turn it into a double, rather than writing it down and holding onto it forever.

Fixing or changing the benchmark does have some negative effects, though, because we basically lose the history.
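
The difference between the two formula shapes discussed in this thread can be sketched in plain Python. This is only an analogy under stated assumptions, not how the Deephaven engine evaluates formulas: `Boxed` is a hypothetical stand-in for an opaque PyObject result, and the lists stand in for columns. The sketch also does not model vectorization, which is the reason the javadoc note gives for using two formulas.

```python
class Boxed:
    # Hypothetical stand-in for an opaque PyObject returned by an unhinted UDF.
    def __init__(self, value):
        self.value = value


def f(num1, num2):
    # Hypothetical UDF without type hints: the caller only sees an opaque object.
    return Boxed(num1 + num2)


def two_formulas(num1, num2):
    # Formula 1: materialize a whole intermediate column of boxed results.
    boxed_column = [f(a, b) for a, b in zip(num1, num2)]
    # Formula 2: read the intermediate column back and cast each value.
    return [float(obj.value) for obj in boxed_column]


def one_formula(num1, num2):
    # Single formula: cast each result as it is produced, so no
    # intermediate column of boxed objects is kept around.
    return [float(f(a, b).value) for a, b in zip(num1, num2)]


print(two_formulas([1, 2], [3, 4]))
print(one_formula([1, 2], [3, 4]))
```

Both shapes produce the same values; what differs is whether the boxed intermediate results are stored before the cast, which is the extra work the review comment is asking about.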

@stanbrub stanbrub requested a review from cpwright December 5, 2025 22:52
@stanbrub stanbrub merged commit 324ef4e into deephaven:main Dec 11, 2025
9 checks passed
@stanbrub stanbrub deleted the with-serial-udf branch December 11, 2025 18:01

2 participants