-
-
Notifications
You must be signed in to change notification settings - Fork 372
no_attributes? true relationship with multiple parent() refs in or filter produces cartesian product#2577Description
Code of Conduct
- I agree to follow this project's Code of Conduct
AI Policy
- I agree to follow this project's AI Policy, or I agree that AI was not used while creating this issue.
Is your feature request related to a problem? Please describe.
When a no_attributes? true relationship filter combines multiple parent() references with or, the generated SQL joins the parent relationships with LEFT OUTER JOINs before the final LATERAL join. This creates a cartesian product — if one parent relationship has N rows and another has M rows, the intermediate result has N×M rows, and the lateral join executes N×M times.
defmodule Post do
relationships do
has_many :tags_a, Tag do
no_attributes? true
filter expr(parent(author.favorite_tags.id) == id)
end
has_many :tags_b, Tag do
no_attributes? true
filter expr(parent(category.default_tags.id) == id)
end
# Combine both into one relationship
# When this relationship has any sort (explicit or via default_sort),
# the lateral subquery includes row_number() AS __order__,
# which breaks DISTINCT deduplication.
has_many :all_tags, Tag do
no_attributes? true
sort inserted_at: :asc # <-- any sort triggers the correctness bug
filter expr(
parent(tags_a.id) == id or
parent(tags_b.id) == id
)
end
end
endThe generated SQL looks roughly like:
SELECT DISTINCT post.id, lateral_result.*, lateral_result.__order__
FROM post
LEFT JOIN favorite_tag ON ... -- tags_a path (N rows)
LEFT JOIN LATERAL (...) AS resolved_a ON TRUE
LEFT JOIN default_tag ON ... -- tags_b path (M rows)
LEFT JOIN LATERAL (...) AS resolved_b ON TRUE
INNER JOIN LATERAL (
SELECT *, row_number() OVER (ORDER BY ...) AS __order__
FROM tag
WHERE resolved_a.id = tag.id OR resolved_b.id = tag.id
) AS lateral_result ON TRUEThere are two issues:
-
Cartesian product (performance) — The LEFT JOINs for
tags_a(N rows) andtags_b(M rows) multiply, creating N×M intermediate rows. The final lateral join executes N×M times instead of once. This happens regardless of whether a sort exists. -
Duplicate results when sort exists (correctness) — Ash applies
SELECT DISTINCTto deduplicate the cartesian product. Without any sort, this works correctly — the same record always has identical column values across lateral executions, soDISTINCTremoves all duplicates.
However, when the relationship has any sort (explicit sort in the relationship definition, default_sort from an extension, or sort applied via read_action), the lateral subquery includes row_number() OVER (ORDER BY ...) AS __order__ to preserve ordering, and this column is included in the DISTINCT evaluation.
Since each lateral execution operates on a different subset of matching tags (depending on which cartesian product row it sees), the same tag can receive different __order__ values across executions, making DISTINCT unable to deduplicate.
Concrete example:
tags_a matches: [tag_1(inserted_at=10:00), tag_2(inserted_at=12:00)]
tags_b matches: [tag_3(inserted_at=11:00)]
Lateral execution 1 (resolved_a=tag_1):
matching set = {tag_1(10:00), tag_3(11:00)}
→ tag_3 gets __order__ = 2
Lateral execution 2 (resolved_a=tag_2):
matching set = {tag_3(11:00), tag_2(12:00)}
→ tag_3 gets __order__ = 1
DISTINCT sees (tag_3, __order__=2) and (tag_3, __order__=1) as different rows.
Result: tag_3 appears twice.
Note: if the sort column values happen to be monotonically ordered across sources (e.g., all tags_a timestamps < all tags_b timestamps), __order__ stays consistent and DISTINCT works correctly. This makes the bug hard to reproduce in tests (where records are created sequentially with monotonic timestamps) but easy to trigger in production (where records from different sources have interleaved sort values).
Describe the solution you'd like
Approach 1: Automatic optimization (no syntax change)
When Ash encounters multiple parent(rel.id) == id conditions combined with or in a no_attributes? true relationship filter, it should automatically generate EXISTS subqueries instead of lateral joins. This requires no API change — the current expression:
filter expr(
parent(tags_a.id) == id or
parent(tags_b.id) == id
)-- Current: lateral joins → cartesian product
FROM tag
JOIN LATERAL (...tags_a...) ON true
JOIN LATERAL (...tags_b...) ON true
WHERE tags_a.id = tag.id OR tags_b.id = tag.id
-- Proposed: EXISTS subqueries → no cartesian product
FROM tag
WHERE EXISTS (SELECT 1 FROM ...tags_a... WHERE tags_a.id = tag.id)
OR EXISTS (SELECT 1 FROM ...tags_b... WHERE tags_b.id = tag.id)Approach 2: Explicit parent_exists expression
If automatic detection is too complex, a new expression could let users opt in explicitly:
filter expr(
parent_exists(:tags_a, id == parent(id)) or
parent_exists(:tags_b, id == parent(id))
)Where parent_exists(rel, condition) means: "on the parent resource, check if rel has a record matching condition." Inside the condition, id refers to the relationship target's field, and parent(id) refers back to the current child resource's field.
This is analogous to exists/2 but operates on the parent's relationships from within a no_attributes? true filter context.
Describe alternatives you've considered
- Implementing a manual relationship that loads and unions the constituent relationships instead of relying on a single complex filter expression.
defmodule MyApp.Relationships.UnionMany do
use Ash.Resource.ManualRelationship
@impl true
def load(records, opts, ctx) do
relationships = opts |> Keyword.fetch!(:relationships)
ash = to_opts(ctx)
records =
records
|> Ash.load!(relationships, [lazy?: true] ++ ash)
{:ok,
Map.new(records, fn record ->
items =
relationships
|> Enum.flat_map(&(record |> Map.get(&1) |> List.wrap()))
|> Enum.uniq_by(& &1.id)
{record.id, items}
end)}
end
end# Usage
has_many :all_tags, Tag do
manual {MyApp.Relationships.UnionMany, relationships: [:tags_a, :tags_b]}
end-
Rewriting the filter using child-side FKs — Instead of
parent(rel.id) == id, usechild_fk == parent(field)andcolumn == :valueto avoid the extra joins. This works when the child has direct FKs but doesn't generalize to cases where 2+parent()joins are unavoidable. -
Using a calculation instead of a relationship — Loading the sub-relationships individually and concatenating. This works but loses the ability to further load nested relationships on the result.
Additional context
No response