Custom aggregate functions for MySQL, implemented as User-Defined Functions (UDFs) in C++.
Multiplies all non-NULL values in a group and returns the result — the multiplicative equivalent of SUM().
SELECT product(value) FROM t;| Behaviour | Detail |
|---|---|
| Return type | DOUBLE |
| Argument | Any numeric column or expression |
| NULL handling | NULL values are skipped (not treated as zero) |
| Empty group | Returns 1.0 (multiplicative identity) |
Returns the median of all non-NULL values in a group. For an even-sized group, returns the average of the two middle values.
SELECT median(value) FROM t;| Behaviour | Detail |
|---|---|
| Return type | DOUBLE |
| Argument | Any numeric column or expression |
| NULL handling | NULL values are skipped |
| Empty group | Returns NULL |
| Memory | Buffers all non-NULL values for the group; O(n) per group |
Returns the most frequently occurring non-NULL value in a group. If multiple values tie for most frequent, an arbitrary one of them is returned.
SELECT mode(value) FROM t;| Behaviour | Detail |
|---|---|
| Return type | DOUBLE |
| Argument | Any numeric column or expression |
| NULL handling | NULL values are skipped |
| Empty group | Returns NULL |
| Memory | Hash map keyed by distinct values; O(d) per group where d is distinct value count |
MySQL aggregate UDFs follow a five-function lifecycle that MySQL calls once per query (or once per group when used with GROUP BY). For an aggregate named xxx:
| Function | Called when | What it does |
|---|---|---|
xxx_init |
Query starts | Allocates per-query state, validates argument count, coerces input types |
xxx_clear |
Each new group begins | Resets the accumulator |
xxx_add |
Each row in the group | Folds the row's value into the accumulator; skips NULLs |
xxx |
Group is complete | Returns the accumulated result |
xxx_deinit |
Query finishes | Frees state allocated in _init |
Each UDF in this repo follows that pattern with its own state shape:
| UDF | State held in UDF_INIT->ptr |
Final step |
|---|---|---|
product |
double accumulator (init 1.0) |
Return accumulator |
median |
std::vector<double> of all values |
nth_element to find middle, average two middles if even count |
mode |
std::unordered_map<double, uint64_t> of value → count |
Linear scan for max count |
apt-get install -y cmake libmysqlclient-dev build-essential
# For ARM cross-compilation only:
apt-get install -y crossbuild-essential-arm64Output is placed in a build/ subdirectory. The CMake build supports native compilation and cross-compilation for a different target architecture.
Native (matches your host machine):
mkdir build && cd build
cmake ..
makeCross-compile for ARM (aarch64) from an x86 host:
mkdir build-arm && cd build-arm
cmake .. -DBUILD_ARM=ON
makeCross-compile for x86_64 from an ARM host:
mkdir build && cd build
cmake .. -DBUILD_X86=ON
makeThe compiled libraries are product.so, median.so, and mode.so — one per UDF.
-
Copy the libraries you want to use into MySQL's plugin directory:
PLUGIN_DIR=$(mysql --help | grep 'plugin-dir' | awk '{print $2}') sudo cp build/product.so build/median.so build/mode.so "$PLUGIN_DIR"
-
Register each function in MySQL:
CREATE AGGREGATE FUNCTION product RETURNS REAL SONAME 'product.so'; CREATE AGGREGATE FUNCTION median RETURNS REAL SONAME 'median.so'; CREATE AGGREGATE FUNCTION mode RETURNS REAL SONAME 'mode.so';
-
To unload one later:
DROP FUNCTION product;
The test_server/ directory contains a Docker Compose setup that spins up a MySQL instance with the plugin pre-loaded — useful for quick testing without touching a local MySQL installation.
-
Build the libraries and copy them into the plugin mount:
mkdir build && cd build && cmake .. && make && cd .. cp build/product.so build/median.so build/mode.so test_server/plugin/
-
Start the container:
cd test_server docker compose up -d -
Connect and load the functions:
docker compose exec mysql mysql -uroot -proot test
SOURCE /entrypoint.sh; -- or run load_function.sql manually CREATE AGGREGATE FUNCTION product RETURNS REAL SONAME 'product.so'; CREATE AGGREGATE FUNCTION median RETURNS REAL SONAME 'median.so'; CREATE AGGREGATE FUNCTION mode RETURNS REAL SONAME 'mode.so';
The container's entrypoint handles the plugin directory automatically — if MySQL's actual plugin path differs from
/usr/lib/mysql/plugin, it creates a symlink so the mounted.sois found regardless.
-- Multiply all values in a column
SELECT product(price) FROM items;
-- Per-group product with GROUP BY
SELECT category, product(multiplier)
FROM adjustments
GROUP BY category;
-- Median latency per endpoint
SELECT endpoint, median(latency_ms)
FROM requests
GROUP BY endpoint;
-- Most common rating per product
SELECT product_id, mode(rating)
FROM reviews
GROUP BY product_id;NULL values are skipped by every aggregate. A group containing only NULLs returns the identity for product (1.0) and NULL for median and mode.
The full specification for writing MySQL UDFs is in the official docs: MySQL 8.4 — Adding a Loadable Function
Relevant sections:
UDF_INIT— struct passed to every lifecycle function; itsptrfield is the standard way to carry per-query state across_init→_clear→_add→ main →_deinitcallsUDF_ARGS— struct describing the arguments MySQL passes to each call;args->args[n]is achar*that MySQL sets toNULLfor SQL NULLs, which is how_adddetects and skips them- Naming rules — for an aggregate function named
xxx, MySQL expects exactly five symbols exported from the.so:xxx_init,xxx_clear,xxx_add,xxx, andxxx_deinit initid->maybe_null— set to1in_initformedianandmodeso MySQL knows the result can be NULL on empty groups;productleaves it default since it always returns1.0