-
The link is 404.
-
Here is a good paper on the high-level differences (in the background section):
-
I didn't look at your code too closely, but the actual datasource itself also seems to make many allocations.

As @tustvold said in Discord

Here is some documentation on how to do it: https://datafusion.apache.org/library-user-guide/profiling.html

Note that it is possible to reuse the allocations in DataFusion's functions, though most of the built-in ones don't, as we don't normally see allocations as the bottleneck in filter evaluation.

See the example here: datafusion/datafusion-examples/examples/advanced_udf.rs, lines 203 to 246 in 4d2e06f

Most
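To make the allocation-reuse idea concrete, here is a minimal sketch in plain Rust. It does not use DataFusion's or Arrow's API; `filter_alloc`, `filter_into`, and `sum_filtered` are hypothetical names chosen for illustration, and a bare `Vec<i64>` stands in for a column batch. It only shows the general pattern of keeping one scratch buffer alive across batches instead of allocating a fresh output per batch.

```rust
/// Allocating variant: every call builds a brand-new output vector.
fn filter_alloc(batch: &[i64], threshold: i64) -> Vec<i64> {
    batch.iter().copied().filter(|v| *v > threshold).collect()
}

/// Reusing variant: the caller owns `out`, so its heap buffer survives
/// across calls. `clear` drops the old contents but keeps the capacity.
fn filter_into(batch: &[i64], threshold: i64, out: &mut Vec<i64>) {
    out.clear();
    out.extend(batch.iter().copied().filter(|v| *v > threshold));
}

/// Filter-then-sum over many batches with a single scratch buffer.
fn sum_filtered(batches: &[Vec<i64>], threshold: i64) -> i64 {
    let mut scratch = Vec::new();
    let mut total = 0;
    for batch in batches {
        filter_into(batch, threshold, &mut scratch);
        total += scratch.iter().sum::<i64>();
    }
    total
}
```

Whether this matters in practice is exactly what the profiling guide linked above helps answer: if allocations do not show up in the profile, the simpler allocating variant is usually fine.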
-
One additional point is that real row-based system code can't be as optimized as your demo implementation:

```rust
let row = input.get_row(i);
let elem = row.get_col(row.get_schema().get_column_datatype(j), j);
```

The key issue is that, to extract a single element, the system has to pay the overhead of multiple function calls for every row. If you benchmark an analytical query in a row-based system like PostgreSQL, most of the execution time will be spent on these function calls interpreting each row, rather than on the actual intended computation.

In contrast, vectorized engines like DataFusion only incur this function-calling overhead once per vector, which can contain thousands of elements. This significantly improves efficiency by amortizing function call overhead across many data points.

To generate such an optimized row-based implementation, there is another technique called compiled execution, which translates SQL queries directly into low-level code for execution. Currently, I think this kind of dark magic is mostly found in academia. I remember this idea is discussed in https://15721.courses.cs.cmu.edu/spring2023/papers/03-storage/p967-abadi.pdf, or possibly in another paper from the reading list at https://15721.courses.cs.cmu.edu/spring2023/schedule.html.
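The per-row interpretation cost described above can be sketched roughly as follows. This is a toy model in plain Rust, not PostgreSQL's or DataFusion's actual code: `DataType`, `Datum`, `Schema`, `Row`, and `Column` are all hypothetical types invented for the illustration. The row-at-a-time path looks up the column type and checks it for every single element, while the vectorized path resolves the type once and then runs a tight loop over the whole column.

```rust
// All types below are hypothetical stand-ins, not a real engine's API.

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int,
    Float,
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Datum {
    Int(i64),
    Float(f64),
}

struct Schema {
    types: Vec<DataType>,
}

impl Schema {
    fn get_column_datatype(&self, j: usize) -> DataType {
        self.types[j]
    }
}

struct Row<'a> {
    schema: &'a Schema,
    values: &'a [Datum],
}

impl<'a> Row<'a> {
    fn get_schema(&self) -> &'a Schema {
        self.schema
    }

    /// Interprets one element: the type is checked on every call.
    fn get_col(&self, ty: DataType, j: usize) -> Datum {
        let value = self.values[j];
        match (ty, value) {
            (DataType::Int, Datum::Int(_)) | (DataType::Float, Datum::Float(_)) => value,
            _ => panic!("column {} does not match its declared type", j),
        }
    }
}

/// Row-at-a-time: schema lookup, type dispatch, and unwrapping per row.
fn sum_row_at_a_time(schema: &Schema, rows: &[Vec<Datum>], j: usize, threshold: i64) -> i64 {
    let mut sum = 0;
    for values in rows {
        let row = Row { schema, values: values.as_slice() };
        let elem = row.get_col(row.get_schema().get_column_datatype(j), j);
        if let Datum::Int(v) = elem {
            if v > threshold {
                sum += v;
            }
        }
    }
    sum
}

#[allow(dead_code)]
enum Column {
    Int(Vec<i64>),
    Float(Vec<f64>),
}

/// Vectorized: one type dispatch per column, then a tight typed loop.
fn sum_vectorized(column: &Column, threshold: i64) -> i64 {
    match column {
        Column::Int(values) => values.iter().copied().filter(|v| *v > threshold).sum(),
        Column::Float(_) => panic!("expected an integer column"),
    }
}
```

Both functions compute the same filtered sum; the difference is only in how often the type dispatch happens. In this toy the compiler may well inline the per-row calls away, which is the point made above about demo code being easier to optimize than a real row-based engine.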
-
When implementing a simple filter-and-sum query using Arrow, I observed that performance fell short of expectations. Compared to the row-oriented implementation, the slowdown appears to come from additional memory allocations. The row-oriented engine performs better because it can avoid deep copies when passing data between operators.
The experimental codebase is here.