-
Notifications
You must be signed in to change notification settings - Fork 690
Enable multi-column user-defined aggregate functions #2396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
Co-authored-by: mathiasrw <[email protected]>
|
@copilot There is a slight misunderstanding. We dont want to hardcode the CORR function. We want to make it possible for users to make aggrecated functions with more than one column. I imagine we would have to look into managing multiple inputs in custom functions and then make sure they are passed to the function in a generic way without breaking changes. |
Co-authored-by: mathiasrw <[email protected]>
You're absolutely right! I've updated the implementation to enable generic multi-column user-defined aggregates instead of hardcoding CORR. The changes allow users to create custom aggregate functions with multiple columns by defining them like: alasql.aggr.MY_FUNC = function(col1, col2, ..., accumulator, stage) {
// Function logic with stage 1 (init), stage 2 (accumulate), stage 3 (finalize)
}Then use them in SQL: The implementation is backward compatible - single-column aggregates work exactly as before. All changes are in the parser and groupby compiler to pass multiple arguments through the 3-stage aggregation process. See commit 2e20581 for the full implementation with test cases for 2-column and 3-column user-defined aggregates. |
| assert.deepEqual(res.length, 1); | ||
| assert(Math.abs(res[0].corr - 1) < 0.0001, 'Expected correlation close to 1'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want you to correlate a deep equal to the exact output you are expecting. This applies to all tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CAn we please also test with data coming as parameters (corr(?,?))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any tests of 3 or 4 parameters?
User attempted to implement CORR() as a custom aggregate but discovered AlaSQL's aggregate architecture only supports single-column inputs. Instead of hardcoding CORR, this PR extends the REDUCE mechanism to allow users to create custom aggregate functions with multiple columns.
Changes
Extended REDUCE aggregate mechanism for multi-column support:
src/alasqlparser.jisonandsrc/alasqlparser.js) to store all arguments in theargsarray when creating REDUCE aggregatesexprlist.pop()toexprlist[0], args: exprlistto preserve all column expressionsUpdated GROUP BY compiler (
src/423groupby.js):Updated finalization logic (
src/38query.js):undefinedvalues followed by accumulator and stage for consistent function signaturesAdded comprehensive tests (
test/test2600.js):Usage
Users can now define custom aggregates with multiple columns:
Backward Compatibility
Single-column user-defined aggregates continue to work exactly as before. The system automatically detects the number of arguments and handles them appropriately.
Original prompt
This section details on the original issue you should resolve
<issue_title>Trying to implement CORR() using custom function logic, but it is not working</issue_title>
<issue_description>Hello. I am trying to implement CORR() using custom function logic provided here:
https://github.com/alasql/alasql/wiki/User-Defined-Functions
This is my code.
But, somehow this returned
{0 : {corr: 3}}, which signifies the calculation is not being done properly, considering pearson correlation always lies between -1 and 1...Any guess or help to figure out what to do?
Thank you in advance.
</issue_description>
<agent_instructions>Please look into this issue and determine if there is a bug for us to solve, or if he is using SQL the wrong way. </agent_instructions>
Comments on the Issue (you are @copilot in this section)
@mathiasrw If you provide some example data and expected output it makes it easier to help out.From the AlaSQL Bot
Your implementation of the
CORR()function logic looks well-structured, but there seems to be an issue with the Pearson correlation coefficient calculation, possibly due to either the data or the implementation of the accumulation logic.Here are a few potential issues to check:
Data Points: Make sure that the data passed to
CORR()contains valid numeric values and there are no undefined or non-numeric values. IfvalueXorvalueYis non-numeric or undefined, it should not contribute to the calculation.Numerator and Denominator Calculation:
(count * sumXY) - (sumX * sumY)is correct and makes sense given the data.(denominatorX * denominatorY)does not lead to a division by zero. IfdenominatorXordenominatorYis zero, the correlation should be undefined, hence returningnull(which you already handle).Edge Cases: Ensure your function handles edge cases, such as when there is only one data point or when all
valueXorvalueYare the same, which would result in a zero denominator.Here's an improved implementation suggestion:
stage 1/2accumulates data correctly.nullwhen invalid data (e.g., single data point or zero variance) is encountered.