Skip to content

no_attributes? true relationship with multiple parent() refs in or filter produces cartesian product #2577

@nallwhy

Description

@nallwhy

Code of Conduct

  • I agree to follow this project's Code of Conduct

AI Policy

  • I agree to follow this project's AI Policy, or I agree that AI was not used while creating this issue.

Is your feature request related to a problem? Please describe.

When a no_attributes? true relationship filter combines multiple parent() references with or, the generated SQL joins the parent relationships with LEFT OUTER JOINs before the final LATERAL join. This creates a cartesian product — if one parent relationship has N rows and another has M rows, the intermediate result has N×M rows, and the lateral join executes N×M times.

defmodule Post do
  relationships do
    has_many :tags_a, Tag do
      no_attributes? true
      filter expr(parent(author.favorite_tags.id) == id)
    end

    has_many :tags_b, Tag do
      no_attributes? true
      filter expr(parent(category.default_tags.id) == id)
    end

    # Combine both into one relationship
    # When this relationship has any sort (explicit or via default_sort),
    # the lateral subquery includes row_number() AS __order__,
    # which breaks DISTINCT deduplication.
    has_many :all_tags, Tag do
      no_attributes? true
      sort inserted_at: :asc   # <-- any sort triggers the correctness bug

      filter expr(
        parent(tags_a.id) == id or
          parent(tags_b.id) == id
      )
    end
  end
end

The generated SQL looks roughly like:

SELECT DISTINCT post.id, lateral_result.*, lateral_result.__order__
FROM post
  LEFT JOIN favorite_tag  ON ...   -- tags_a path (N rows)
  LEFT JOIN LATERAL (...)  AS resolved_a ON TRUE
  LEFT JOIN default_tag   ON ...   -- tags_b path (M rows)
  LEFT JOIN LATERAL (...)  AS resolved_b ON TRUE
  INNER JOIN LATERAL (
    SELECT *, row_number() OVER (ORDER BY ...) AS __order__
    FROM tag
    WHERE resolved_a.id = tag.id OR resolved_b.id = tag.id
  ) AS lateral_result ON TRUE

There are two issues:

  1. Cartesian product (performance) — The LEFT JOINs for tags_a (N rows) and tags_b (M rows) multiply, creating N×M intermediate rows. The final lateral join executes N×M times instead of once. This happens regardless of whether a sort exists.

  2. Duplicate results when sort exists (correctness) — Ash applies SELECT DISTINCT to deduplicate the cartesian product. Without any sort, this works correctly — the same record always has identical column values across lateral executions, so DISTINCT removes all duplicates.

However, when the relationship has any sort (explicit sort in the relationship definition, default_sort from an extension, or sort applied via read_action), the lateral subquery includes row_number() OVER (ORDER BY ...) AS __order__ to preserve ordering, and this column is included in the DISTINCT evaluation.

Since each lateral execution operates on a different subset of matching tags (depending on which cartesian product row it sees), the same tag can receive different __order__ values across executions, making DISTINCT unable to deduplicate.

Concrete example:

tags_a matches: [tag_1(inserted_at=10:00), tag_2(inserted_at=12:00)]
tags_b matches: [tag_3(inserted_at=11:00)]

Lateral execution 1 (resolved_a=tag_1):
  matching set = {tag_1(10:00), tag_3(11:00)}
  → tag_3 gets __order__ = 2

Lateral execution 2 (resolved_a=tag_2):
  matching set = {tag_3(11:00), tag_2(12:00)}
  → tag_3 gets __order__ = 1

DISTINCT sees (tag_3, __order__=2) and (tag_3, __order__=1) as different rows.
Result: tag_3 appears twice.

Note: if the sort column values happen to be monotonically ordered across sources (e.g., all tags_a timestamps < all tags_b timestamps), __order__ stays consistent and DISTINCT works correctly. This makes the bug hard to reproduce in tests (where records are created sequentially with monotonic timestamps) but easy to trigger in production (where records from different sources have interleaved sort values).

Describe the solution you'd like

Approach 1: Automatic optimization (no syntax change)

When Ash encounters multiple parent(rel.id) == id conditions combined with or in a no_attributes? true relationship filter, it should automatically generate EXISTS subqueries instead of lateral joins. This requires no API change — the current expression:

filter expr(
  parent(tags_a.id) == id or
    parent(tags_b.id) == id
)
-- Current: lateral joins → cartesian product
FROM tag
  JOIN LATERAL (...tags_a...) ON true
  JOIN LATERAL (...tags_b...) ON true
WHERE tags_a.id = tag.id OR tags_b.id = tag.id

-- Proposed: EXISTS subqueries → no cartesian product
FROM tag
WHERE EXISTS (SELECT 1 FROM ...tags_a... WHERE tags_a.id = tag.id)
   OR EXISTS (SELECT 1 FROM ...tags_b... WHERE tags_b.id = tag.id)

Approach 2: Explicit parent_exists expression

If automatic detection is too complex, a new expression could let users opt in explicitly:

filter expr(
  parent_exists(:tags_a, id == parent(id)) or
    parent_exists(:tags_b, id == parent(id))
)

Where parent_exists(rel, condition) means: "on the parent resource, check if rel has a record matching condition." Inside the condition, id refers to the relationship target's field, and parent(id) refers back to the current child resource's field.

This is analogous to exists/2 but operates on the parent's relationships from within a no_attributes? true filter context.

Describe alternatives you've considered

  1. Implementing a manual relationship that loads and unions the constituent relationships instead of relying on a single complex filter expression.
defmodule MyApp.Relationships.UnionMany do
  use Ash.Resource.ManualRelationship

  @impl true
  def load(records, opts, ctx) do
    relationships = opts |> Keyword.fetch!(:relationships)
    ash = to_opts(ctx)

    records =
      records
      |> Ash.load!(relationships, [lazy?: true] ++ ash)

    {:ok,
     Map.new(records, fn record ->
       items =
         relationships
         |> Enum.flat_map(&(record |> Map.get(&1) |> List.wrap()))
         |> Enum.uniq_by(& &1.id)

       {record.id, items}
     end)}
  end
end
# Usage
has_many :all_tags, Tag do
  manual {MyApp.Relationships.UnionMany, relationships: [:tags_a, :tags_b]}
end
  1. Rewriting the filter using child-side FKs — Instead of parent(rel.id) == id, use child_fk == parent(field) and column == :value to avoid the extra joins. This works when the child has direct FKs but doesn't generalize to cases where 2+ parent() joins are unavoidable.

  2. Using a calculation instead of a relationship — Loading the sub-relationships individually and concatenating. This works but loses the ability to further load nested relationships on the result.

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions