Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poor codegen for collection expression spread (IEnumerable<T> -> ImmutableArray<T>) #71296

Open
Sergio0694 opened this issue Dec 17, 2023 · 8 comments
Assignees
Labels
Area-Compilers Code Gen Quality Room for improvement in the quality of the compiler's generated code help wanted The issue is "up for grabs" - add a comment if you are interested in working on it New Feature - Collection Expressions
Milestone

Comments

@Sergio0694
Copy link
Contributor

Sergio0694 commented Dec 17, 2023

Related to #71195

Version Used: 4.9.0-ci (a8119d1)

Steps to Reproduce:

using System.Collections.Generic;
using System.Collections.Immutable;

static ImmutableArray<string> Test(IEnumerable<string> items)
{
    return [.. items];
}

sharplab

Expected Behavior:

return ImmutableArray.CreateRange(items);

OR

return items.ToImmutableArray();

Actual Behavior:

List<string> list = new List<string>();
IEnumerator<string> enumerator = items.GetEnumerator();
try
{
    while (enumerator.MoveNext())
    {
        string current = enumerator.Current;
        list.Add(current);
    }
}
finally
{
    if (enumerator != null)
    {
        enumerator.Dispose();
    }
}
return ImmutableCollectionsMarshal.AsImmutableArray(list.ToArray());

@RikkiGibson @CyrusNajmabadi not entirely sure whether the spec allows this, I think the problem is that the factory API is CreateRange, but [CollectionBuilder] over ImmutableArray<T> uses Create as the method name. Perhaps it would be possible to just hardcode these additional ImmutableArray<T> cases? Though it'd be nice to have a more general solution 🤔

@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Dec 17, 2023
@CyrusNajmabadi
Copy link
Member

not entirely sure whether the spec allows this

The spec allows this.

@Sergio0694
Copy link
Contributor Author

Had a feeling that was the case but wanted to double check, that's great 🙂

@jcouv jcouv added New Feature - Collection Expressions Code Gen Quality Room for improvement in the quality of the compiler's generated code labels Dec 18, 2023
@RikkiGibson
Copy link
Contributor

RikkiGibson commented Jan 2, 2024

After the spread optimization PR, codegen is as follows: SharpLab

List<string> list = new List<string>();
list.AddRange(items);
return ImmutableCollectionsMarshal.AsImmutableArray(list.ToArray());

This is better than the original but not great. Let's use this issue to track completing the relevant bullet from #68785:

Emit Enumerable.TryGetNonEnumeratedCount() and avoid intermediate List<T> at runtime.

https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.trygetnonenumeratedcount?view=net-8.0

I'd prefer that our solution has the following traits:

  1. if [..enumerable] is optimized, so should [..enumerable1, ..enumerable2] and [elem, ..enumerable].
    • It's fine to use an API like Enumerable.ToArray or Enumerable.ToImmutableArray for the single-spread case, but would like for the complex cases to also be "smart" here.
  2. if ImmutableArray is optimized, so should arrays

@Sergio0694
Copy link
Contributor Author

@RikkiGibson I have a question related to this and also #71292. Right now all of these optimizations are hardcoded in the compiler, which means they (1) need someone in the compiler team to specifically implement them, and also (2) they won't work with user defined collection types that are not recognized by Roslyn.

Here's an idea, would it be possible to make it so that if:

  • You have an expression in the form [.. coll]
  • The target type is attributed with [CollectionBuilder] and the factory method being specified is compatible with coll

Roslyn will simply lower it to TheType.Factory(coll).

For instance, with this optimization, that [.. span] example targeting ImmutableArray would have "just worked", because ImmutableArray has a [CollectionBuilder] attribute and the declared factory takes a span. No specific compiler work for optimizing that case would've been needed.

Thoughts? 🙂

@RikkiGibson
Copy link
Contributor

RikkiGibson commented Jan 2, 2024

they won't work with user defined collection types that are not recognized by Roslyn.

We expect user defined types to make use of collection-expr optimizations, to the extent that they are created in terms of well-known types. For example [CollectionBuilder] requires a ReadOnlySpan parameter, which means all ReadOnlySpan optimizations will apply for a collection-expr which is lowered to [CollectionBuilder].

Generally when a type has CollectionBuilder, that strategy is considered better than the others (IIRC). ImmutableArray is just special, and we happen to think our "create an array then cast to ImmutableArray" approach is better--and it is, when it comes to creating from scalars, since it removes the value copy from the span to the final array. However for [..readOnlySpan], it's probably better for code size to just call ImmutableArray.Create(readOnlySpan) over ImmutableCollectionsMarshal.AsImmutableArray(readOnlySpan.ToArray()).

@jaredpar jaredpar added this to the 17.10 milestone Jan 3, 2024
@jaredpar jaredpar removed the untriaged Issues and PRs which have not yet been triaged by a lead label Jan 3, 2024
@jaredpar jaredpar added the help wanted The issue is "up for grabs" - add a comment if you are interested in working on it label Mar 27, 2024
@jaredpar jaredpar modified the milestones: 17.10, Backlog Mar 27, 2024
@DoctorKrolic
Copy link
Contributor

I'd prefer that our solution has the following traits:

  1. if [..enumerable] is optimized, so should [..enumerable1, ..enumerable2] and [elem, ..enumerable].
    • It's fine to use an API like Enumerable.ToArray or Enumerable.ToImmutableArray for the single-spread case, but would like for the complex cases to also be "smart" here.
  2. if ImmutableArray is optimized, so should arrays

I would suggest to break things down and focus here only on single-spread case for immutable arrays.

For instance, we need to determine, when we should pick ToImmutableArray over ImmutableArray.CreateRange. If we look at the implementation of ToImmutableArray we will see, that it just calls CreateRange, but adds an additional optimization check on top. Therefore we should always use CreateRange if type is known at compile-time to not be another immutable array (e.g. if spreading expression is a concrete class like List<T> or a HashSet<T>) to help the JIT avoid additional analysis to throw away redundant check (from the IL size perspective, we don't care what to call, it takes the same size)

However, what should we do if we the underlying spreading expression type can be an immutable array, e.g. in cases, when it is an interface, that ImmutableArray<T> implements like IEnumerable<T>? Lets make a few statements:

  • [.. spread] should imply copying semantics
  • If ToImmutableArray gets another immutable array as an argument, it just returns it without copying
  • ImmutableArray<T> is an immutable type therefore it should be unobservable
    • But there are "unsafe" APIs like ImmutableCollectionsMarshal, by using which we can observe whether there was a copy or not
      • But, again, these APIs are "unsafe", so if we skip the copy and it becomes observable, should we just say "Well, you used unsafe patterns, we cannot guarantee you anything now"?
    • If we decide to skip the copy by using ToImmutableArray, does it mean that in a very special case ImmutableArray<T> pseudoCopy = [.. anotherImmutableArrayOfT] we can lower it to just ImmutableArray<T> pseudoCopy = anotherImmutableArrayOfT?

@RikkiGibson What do you think?

@RikkiGibson
Copy link
Contributor

It is fine to use ToImmutableArray when a single spread of reference type is implicitly convertible to IEnumerable<T>.

I would not invest time in honing it further than that for now. There is a broader optimization space that I would want to invest in first.

I also think skipping the copy for ImmutableArray<T> other = [..immutableArray]; is a low priority optimization. If user can statically see it is ImmutableArray, then they should simply read it instead of spreading it.

@CyrusNajmabadi
Copy link
Member

If user can statically see it is ImmutableArray, then they should simply read it instead of spreading it.

Agreed. This is a better place for our analyzers to call out an unnecessary copy. Something I believe they already do on unnecessary explicit ToImmArray calls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers Code Gen Quality Room for improvement in the quality of the compiler's generated code help wanted The issue is "up for grabs" - add a comment if you are interested in working on it New Feature - Collection Expressions
Projects
None yet
Development

No branches or pull requests

7 participants