Duckdb contains an optimisation (see here) that effectively rewrites:
CASE
WHEN cosine_sim(l, r) > 0.9 THEN 1
WHEN cosine_sim(l, r) > 0.8 THEN 2
WHEN cosine_sim(l, r) > 0.7 THEN 3
...
END
to
WITH precomputed_cosine AS (
SELECT cosine_sim(l, r) AS precompute_cosine_value, ...
)
SELECT
CASE
WHEN precompute_cosine_value > 0.9 THEN 1
WHEN precompute_cosine_value > 0.8 THEN 2
WHEN precompute_cosine_value > 0.7 THEN 3
...
END
However, testing has show this optimisation only works when the case statement ONLY includes precompute_cosine_value.
As soon as the original statement is:
CASE
WHEN l or r is null then 0
WHEN cosine_sim(l, r) > 0.9 THEN 1
WHEN cosine_sim(l, r) > 0.8 THEN 2
WHEN cosine_sim(l, r) > 0.7 THEN 3
...
END
the optimisation fails
Ref:
#2738
[TODO: verify this by looking at the relevant duckdb code)