Skip to content

Fix nil map merges #4601

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 7, 2025
Merged

Conversation

greg-rychlewski
Copy link
Member

Closes #4598

The problem was actually more general than CTEs. Any map merging where the right side of the merge is a {source, schema, prefix} with a nil return value would do the same thing. For example, you can recreate the problem with this query

TestRepo.insert!(%Post{title: "hi"})

from p in Post,
  left_join: s in Comment,
  on: p.id == s.post_id,
  select: %{title: p.title},
  select_merge: map(s, [:text])

Repo.all(query)
# [%{title: "hi"}]

@greg-rychlewski
Copy link
Member Author

It looks like one test is failing here https://github.com/elixir-ecto/ecto/blob/master/integration_test/cases/repo.exs#L1548

This changes the behaviour when the left and right side have common key names. Previously it would take the left side value if the right side is all nil. Now it overwrites common keys with nil.

Without knowing the background behind the current logic, I think we want the new behaviour. If you ran the query in a psql client or something like that the right side would return nil it wouldn't disappear. But it might be we have the logic this way for a good reason. So don't want to assume anything without asking.

@josevalim
Copy link
Member

I think we could argue both ways, actually. We could also argue that nils in SQL are mostly a no-op (i.e. they disappear) so in this case we should ignore the nils, as if nothing is being merged.

Perhaps if you the behaviour in this PR, then you should do an explicit map instead select_merge: %{stake: c.stake}? Then we document that when merging another source, nothing is merged if the source is nil.

@greg-rychlewski
Copy link
Member Author

I'm ok with either. I'll change the docs instead to explain the existing behaviour.

@greg-rychlewski
Copy link
Member Author

After playing around with this more, I think it is not so simple =/.

If we are not dealing with joins then this still returns a map instead of just nil

TestRepo.insert!(%Post{title: nil})

q = from p in Post, select: map(p, [:title])
TestRepo.all(q)
# [%{title: nil}]

It seems a bit inconsistent to me if it didn't retain the same behaviour on merging/joining. But tbh it's not clear to me what kind of query would trigger that all_nil behaviour except these joins. So I need to dig more.

@greg-rychlewski
Copy link
Member Author

greg-rychlewski commented Apr 5, 2025

For example

query =
  from(Barebone)
  |> join(:left, [s], t in Barebone, on: s.id == t.id)
  |> select([s, t], {s, t})

TestRepo.all(query)

# [
#  {%Ecto.Integration.Barebone{
#     __meta__: #Ecto.Schema.Metadata<:loaded, "barebones">,
#     num: nil
#   }, nil}
# ]

So it seems intentional this is only for joins. But the rationale is not super clear to me. If you're sure we want to keep it like this I can document it. But just wanted to show you this in case it changes your mind

@josevalim
Copy link
Member

The difference is between returning a row where all values are nils and returning a row that there is no join. In your case, all values are nil, because if there was no entry, there would be no row. In the join case, both "all values are nil" and "returning a row" would be very similar except that, since it is almost certain you are relying on a non-nil condition somewhere (such as foreign keys), we can safely assume that if all values are nil, it is because nothing matched. So I think we have two options:

  1. Go with this PR for simplicity
  2. Make sure left_outer/right_outer joins include the column it was joined on so we can distinguish between nil columns and missing "row"

2 is probably too complex and come with corner cases, so it probably makes more sense to go with 1 for simplicity. Thoughts?

@greg-rychlewski
Copy link
Member Author

Thanks that makes it clear the reason for the existing behaviour. I agree about option 2...we are probably asking for trouble.

By going with this PR, you mean documenting the existing behaviour right? If so, that would be my choice too. I'm not too crazy about my original submission now because it seems this behaviour exists even without select_merge. So to keep it all acting the same it's probably clearer if we don't change anything. But let me know if you are thinking something else.

@josevalim
Copy link
Member

By going with this PR, you mean documenting the existing behaviour right?

I believe today we guarantee that joins merge correctly because we always return their primary keys/foreign keys, so we can distinguish a nil columns from no left joins, right? If that's the case, then I'd document the behaviour. However, if today we don't distinguish them, then it is saner to just always consider the values are nil.

@greg-rychlewski
Copy link
Member Author

so we can distinguish a nil columns from no left joins, right

From my testing it looks like we cannot do that today.

test "test join with all nil" do
    TestRepo.insert!(%Post{})

    query = 
      from p1 in Post, 
        left_join: p2 in Post, 
        on: p1.id == p2.id, 
        select: {map(p1, [:title]), map(p2, [:title])}

    IO.inspect TestRepo.all(query, log: :error)

    21:19:39.397 [error] QUERY OK source="posts" db=1.5ms queue=0.6ms
    SELECT p0."title", p1."title" FROM "posts" AS p0 LEFT OUTER JOIN "posts" AS p1 ON p0."id" = p1."id" []

    [{%{title: nil}, nil}]
  end

So then in this case I'm not sure I completely if you were thinking:

  1. Always assume the the record exists and has all nil values. In which case I think we need to completely remove the all_nil? checking in the post processor?
  2. Always assume the record doesn't exist and keep things as they are today. Because we maybe don't know all the ripple effects of reversing this behaviour

@josevalim
Copy link
Member

Right, the issue in this case is that map restricts the number of fields returned. However, I think it would be weird if we did this:

      from p1 in Post, 
        left_join: p2 in Post, 
        on: p2.id == -1, 
        select: {map(p1, [:title]), p2}

And it returned maps...

So I think this PR, #4601, is the right way to go about it. We make it so map and friends are consistent at least. 👍

@greg-rychlewski
Copy link
Member Author

Got it. Makes sense to me!

I just altered the failing test to make it inline with the new behaviour. Other than that, do you think it's ok to merge?

@greg-rychlewski greg-rychlewski merged commit f7615de into elixir-ecto:master Apr 7, 2025
7 checks passed
@greg-rychlewski greg-rychlewski deleted the merge_nil branch April 7, 2025 17:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

select_merge does not return columns when map and CTE used and CTE returns no data
2 participants