Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement optimizations for constructing collections from collection literals #68785

Open
8 of 17 tasks
cston opened this issue Jun 26, 2023 · 42 comments
Open
8 of 17 tasks
Assignees
Labels
Area-Compilers Code Gen Quality Room for improvement in the quality of the compiler's generated code New Feature - Collection Expressions
Milestone

Comments

@cston
Copy link
Member

cston commented Jun 26, 2023

Implement various optimizations for constructing collections from collection literals, including:

  • Use inline array for the collection builder method ReadOnlySpan<T> argument when the span cannot escape. #69227

  • Use inline arrays for collection expressions with span target types #69412

  • Use existing optimization for ReadOnlySpan<T> to blit data into the assembly section when values are constants of primitive type. #69412

  • Emit [] as Array.Empty<T>() when the target type is IEnumerable<T> or T[]. #69355

  • Emit [] as default(Span<T>) when the target type is Span<T>. #70260

  • Avoid intermediate List<T> if all spread elements are countable and collection satisfies certain heuristics. #69875

  • Use EnsureCapacity(int) for collection initializer types with no spreads; potentially also when length is known and collection satisfies certain heuristics.

  • Use CollectionsMarshal.SetCount<T>(List<T>, int) and CollectionsMarshal.AsSpan<T>(List<T>) to create a List<T> when length is known and collection satisfies certain heuristics; fallback to List<T>..ctor(int capacity) if CollectionsMarshal is not available. #70197

  • Use ImmutableCollectionsMarshal.AsImmutableArray<T>(T[]) to create an ImmutableArray<T>. #70222

  • Use .ctor(int capacity) for some BCL types when EnsureCapacity(int) is not available, or not available downlevel.

  • Emit Enumerable.TryGetNonEnumeratedCount() and avoid intermediate List<T> at runtime. Poor codegen for collection expression spread (IEnumerable<T> -> ImmutableArray<T>) #71296

  • Use List<T>.AddRange(IEnumerable<T>) when adding a spread to a List<T> if the spread implements ICollection<T>, or if the spread implements IEnumerable<T> and does not have a struct enumerator; use CollectionExtensions.AddRange<T>(List<T>, ReadOnlySpan<T>) if the spread is a span; otherwise use foreach and List<T>.Add(T).

  • Use CollectionsMarshal.AsSpan<T>(List<T>) to create a ReadOnlySpan<T> from the temporary List<T> when creating a [CollectionBuilder] collection from a collection literal of unknown length.

  • emit foreach (var x in y ?? []) as a lifted null check that executes nothing if 'y' is null.

  • When producing an IEnumerable<T> use the same codegen pattern that yield does. Specifically, where the instance created is both an IEnumerable<T> and its own IEnumerator<T> for the very first iteration of the enumerable on the same thread. This avoids an extra allocation for hte common case of producing an IEnumerable<T> just to have it be immediately iterated.

  • Avoid heap allocation for spread in [.. x ? [y] : []]. (moved to #69277)

  • Avoid heap allocation for collection in foreach (bool x in [true, false]). (moved to #69277)

Relates to test plan #66418

@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Jun 26, 2023
@cston cston self-assigned this Jun 26, 2023
@jcouv jcouv added this to the C# 12.0 milestone Jun 26, 2023
@jcouv jcouv added Code Gen Quality Room for improvement in the quality of the compiler's generated code and removed untriaged Issues and PRs which have not yet been triaged by a lead labels Jun 26, 2023
@CyrusNajmabadi
Copy link
Member

An additional optimization is that we should use https://learn.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.immutablecollectionsmarshal.asimmutablearray?view=net-8.0 (when present) for immutable arrays.

@CyrusNajmabadi
Copy link
Member

@cston The intent here is for all of this to be present in C# 12, correct? This is a core part of the collection-expr story, so i want to make sure it's making the v1 release :)

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Jul 19, 2023

Also, on top of Enumerable.TryGetNonEnumeratedCount() we just need to directly instantiate with teh right capacity when all the args are either elements, or spreads of types with .Length/.Count since we can directly compute (with certainty) the final capacity.

Enumerable.TryGetNonEnumeratedCount() is for when we can't determine that statically, but still want to try at runtime if we're handed instances that can efficiently give us the count.

Whoops. Reading is fundamental.

@CyrusNajmabadi
Copy link
Member

For Emit [] as Array.Empty<T>(), from offline talk, we want this to apply for both arrays and IEnumerables. so int[] = [] should use Array.Empty as well.

@cston
Copy link
Member Author

cston commented Jul 21, 2023

Added the items from the comments above to the list in the issue description.

@MichalPetryka
Copy link

AddRange for List should also be specialcased to this Span extension method.

@sakno
Copy link

sakno commented Aug 6, 2023

#69355 offers ad-hoc solution only for arrays instead of generic approach. For instance, compiler might check whether the target type TCollection has static property TCollection Empty {get;} or static factory method TCollection<T> Empty<T>()

@CyrusNajmabadi
Copy link
Member

or static factory method

@sakno as I mentioned in the lang chat, our pattern allows the type itself to decide what to return. So types can choose to return a .Empty Singleton if they so choose.

@TonyValenti
Copy link

I was thinking about this a bit and thought his was likely discussed elsewhere but I wanted to mention it:

  1. [] should be documented to not necessarily return a new instance.

  2. [] should ideally map to a .Empty static property or method so that it works with all types and is not special cased for arrays or enumerables.

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Aug 6, 2023

@TonyValenti

We do doc '1'. The overall design also gives the compiler broad leeway to make assumptions and optimize, including in ways that can be observable. There is a general presumption though of good behavior (in line with the same assumptions and optimizations we do in last patterns). For example, presuming that is you enumerate an empty collection that that will not mutate state anywhere and that you will get no elements returned from the enumeration.

'2' naturally falls out as we defer to the type to determine what value to return. Types with an Empty in the BCL are doing this already.

Types where it is safe to use an empty Singleton will all work this way. Types where it would not be safe (like List<T>) will not :-)

@jnm2
Copy link
Contributor

jnm2 commented Aug 23, 2023

(Examples are decompiling code generated by SDK 8.0.0-preview.7.23375.6, with varying return types and a method body of return [.. enumerable];.)

Could enumerable.ToArray() be called here instead of an Add loop and an intermediate List<T>? This would take advantage of LINQ method optimizations that inspect the enumerable in case the size is known at runtime, etc.

// Emitted by SDK 8.0.0-preview.7.23375.6 and decompiled
private object?[] Example(IEnumerable<object?> enumerable)
{
    List<object> objectList = new List<object>();
    foreach (object obj in enumerable)
        objectList.Add(obj);
    return objectList.ToArray();
}

Could enumerable.ToImmutableArray() be called here instead of an Add loop and an intermediate List<T>?

// Emitted by SDK 8.0.0-preview.7.23375.6 and decompiled
private ImmutableArray<object?> Example(IEnumerable<object?> enumerable)
{
    List<object> objectList = new List<object>();
    foreach (object obj in enumerable)
        objectList.Add(obj);
    return ImmutableArray.Create<object>(new ReadOnlySpan<object>(objectList.ToArray()));
}

@jaredpar jaredpar modified the milestones: C# 12.0, 17.8 Sep 11, 2023
@jnm2
Copy link
Contributor

jnm2 commented Sep 25, 2023

Could there be value in calling HashSet<T>.UnionWith (which it has instead of AddRange) when another hash set with a matching comparer is being spread into it?

@RikkiGibson
Copy link
Contributor

RikkiGibson commented Oct 5, 2023

Emit [] as default(Span<T>) when the target type is Span<T>.

This one may be worth doing for 17.8. I will begin working on it and we'll see if it lands in 17.8 or 17.9.

@CyrusNajmabadi
Copy link
Member

Thank you @RikkiGibson !

@alrz
Copy link
Contributor

alrz commented Oct 20, 2023

Would it make sense to synthesize a single-element type for IEnumerable<T>, IReadOnlyList<T>, etc on [e]? I think this could be even up to a threshold, not only limited to a single-element.

@CyrusNajmabadi
Copy link
Member

Yes it would.

@Sergio0694
Copy link
Contributor

Leaving this weird case here that I've found, in case it useful. Not sure if this is by design 🤔

using System;

static int[] Fast(ReadOnlySpan<int> array)
{
    return [0, 1, ..array, 3, 4];
}

static int[] Slow(ReadOnlySpan<int> array)
{
    return [0, 1, 2, ..array, 3, 4];
}

(sharplab link)

Fast correctly allocates an array and writes the items directly.
Slow uses a temporary list instead (with no pre-sizing either).

Is there some hidden threshold here when you fall off this performance cliff? Should there be (as in, is this expected)? And if so, how can users know when they're going past it, to ensure they don't accidentally introduce performance regressions?

@Sergio0694
Copy link
Contributor

Additionally, for cases like those, or also this one:

static int[] Test(ReadOnlySpan<int> array)
{
    return [0, .. array, 1, .. array];
}

Ie. when the source is either a span, or something that implements ICollection<T>, couldn't Roslyn just call CopyTo on the source to copy the items into the target range, rather than manually using an enumerator instead, which is much slower?

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Oct 22, 2023

Yes. We have approved being allowed to call copyto

@CyrusNajmabadi
Copy link
Member

Is there some hidden threshold here when you fall off this performance cliff?

There currently is. But, @cston we really need to rethink it. The threshold is far too small. We need to up it to a much better sweet spot.

@slang25
Copy link
Contributor

slang25 commented Nov 11, 2023

Hey y'all, is there a definitive list of what optimizations have landed for C# 12 and what is on to todo list, or is this issue that list? I can't tell if all of the optimizations made it in time for the GA.

@CyrusNajmabadi
Copy link
Member

@slang25 this issue is that list. The items that have made it have check marks on them

@karakasa

This comment was marked as resolved.

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Nov 18, 2023

Linq is explicitly defined as just a syntactic transformation. We would need to change the specification to allow the compiler to behave differently.

Collection expressions are not done in the same fashion. They're explicitlynot defined to l as syntactic transformations. Instead, the semantics are defined and it is explicitly called out that a compliant implementation is free to emit whatever it wants as long as those semantics are preserved.

If you would like linq to be changed, is recommend a new discussion on that topic. Thanks!

@sakno
Copy link

sakno commented Nov 22, 2023

Currently (.NET 8), the following code

public static char[] Concat(ReadOnlySpan<char> span1, ReadOnlySpan<char> span2)
        => [..span1, ..span2];

produces

public static char[] Concat(ReadOnlySpan<char> span1, ReadOnlySpan<char> span2)
    {
        ReadOnlySpan<char> readOnlySpan = span1;
        ReadOnlySpan<char> readOnlySpan2 = span2;
        int num = 0;
        char[] array = new char[readOnlySpan.Length + readOnlySpan2.Length];
        ReadOnlySpan<char>.Enumerator enumerator = readOnlySpan.GetEnumerator();
        while (enumerator.MoveNext())
        {
            char c = (array[num] = enumerator.Current);
            num++;
        }
        enumerator = readOnlySpan2.GetEnumerator();
        while (enumerator.MoveNext())
        {
            char c = (array[num] = enumerator.Current);
            num++;
        }
        return array;
    }

The quality of codegen is poor. It cannot be considered as optimal. Why simple concatenation of twos spans cannot be done using Span<T>.CopyTo instead of MoveNext? In some cases, analyzer in VSCode suggests to replace some statements with collection literals however this replacement is incorrect from performance point of view. I guess that most of developers don't know how the collection literal translated under the hood. As a result, suggestion from IDE leads to performance loss that is hard to investigate. How is this codegen considered as production-ready?

@CyrusNajmabadi
Copy link
Member

Why simple concatenation of twos spans cannot be done using Span.CopyTo instead of MoveNext?

Concatenation of spans (or anything for that matter) can be done with .CopyTo. And this was explicitly approved by the LDM. @RikkiGibson for visibility on this (unless you're already working on this).

however this replacement is incorrect from performance point of view.

Compiler strategy is an internal implementation detail.

How is this codegen considered as production-ready?

There are a near infinite number of things that can be optimized in the real world. Any particular optimization that is lacking is a non-starter for some customer. As such, if production-ready doesn't mean "sufficient for all customers", it means sufficient for the vast majority.

@jnyrup
Copy link

jnyrup commented Dec 27, 2023

When using the combination of CollectionsMarshal.SetCount(list, count) for a constant count and CollectionsMarshal.AsSpan(list) we still leave the JIT with a Span<T> of unknown size, preventing it from eliding bounds checks when writing to the Span<T>.

On the other hand, when writing to a Span<T> created with MemoryMarshal.CreateSpan(ref T, length) created with the same constant value as used for CollectionsMarshal.SetCount(list, count) the bounds checks are now elided from the generated assembly.

As the List<T> is not visible from other threads at this point, I believe the pitfalls discussed in dotnet/runtime#56773 does not apply here.

This optimization only seems applicable when the number of elements is a compile-time constant.

SharpLab

@CyrusNajmabadi
Copy link
Member

Thanks for the info @jnyrup. Tagging @RikkiGibson and @stephentoub

@ViIvanov
Copy link

emit foreach (var x in y ?? []) as a lifted null check that executes nothing if 'y' is null.

It would be nice also to support the following use case:

List<object> list = [1, "a", .. items ?? [], "z"];
//                                    ^---^

and skip handling (adding) items to the list in case items is null.

@alrz
Copy link
Contributor

alrz commented Feb 6, 2024

Is using Unsafe valid to avoid allocating the synthesized type for IEnumerble IReadOnlyList IReadOnlyCollection?

(sharplab.io)

@Sergio0694
Copy link
Contributor

No. Using Unsafe.As<T>(object) to alias reference types that would not pass a normal cast is undefined behavior.

@jnm2
Copy link
Contributor

jnm2 commented Jul 4, 2024

emit foreach (var x in y ?? []) as a lifted null check that executes nothing if 'y' is null.

I created a proof of concept for this, which likely needs to be redone per the TODO comment: main...jnm2:optimize_foreach_coalesce_empty

@333fred raised a concern that skipping GetEnumerator and MoveNext calls on the transient empty collection could be too dangerous, and he suggested checking against known types like List<T>. Things like ImmutableArray<T>?, Dictionary<TKey, TValue> and HashSet<T> come to mind as well for that list.

Also, he would want to see benchmarks before replacing ?? Array.Empty<T>() with the check and jump.

Is it worth starting a new thread to discuss this?

@Zetanova
Copy link

Zetanova commented Jul 4, 2024

@jnm2 What is the defined behavior in .net 1.1 foreach and enumerators, if it can be statically proofen that collection.Count == 0 ?

ArrayList items = new ArrayList();
foreach(object item in items) //is the Enumerator required by definition?
{
}

@Zetanova
Copy link

Zetanova commented Jul 4, 2024

@RikkiGibson thx, for the link.
With out extension of the definition and some kind of marker/flag seams in the current form impossible to optimize it.
First idea would be to introduce something like IBoundedEnumerable and include it in the foreach definition
this would then result in:

bool empty = ((C)(x)) is IBoundedEnumerable b && b.Count == 0;
if(!empty)
{
    E e = ((C)(x)).GetEnumerator();
    try
    {
        while (e.MoveNext())
        {
            V v = (V)(T)e.Current;
            «embedded_statement»
        }
    }
    finally
    {
        ... // Dispose e
    }
}

@jnm2
Copy link
Contributor

jnm2 commented Jul 4, 2024

@Zetanova That would not be as low-overhead as surrounding with an if null check, or a hypothetical future foreach? keyword.

@Zetanova
Copy link

Zetanova commented Jul 4, 2024

@jnm2 simply doing nothing is the best option. the upcoming ´union types´ would resolve the issue.
But I don't know how much collection literals require optimization today.
The underlining logical issue is that the compiler can't currently statically test on an Array for Empty | Single | Many

@jnm2
Copy link
Contributor

jnm2 commented Jul 4, 2024

I would urge us to still skip creating and enumerating an empty collection in foreach, because we have discussed a strong desire for this pattern to not actually instantiate inner collections, because this is currently the sole way to achieve conditional inclusion in the middle of a collection expression:

SomeCustomCollection x = [a, .. condition ? [b] : []];

(This may not ship in 13, but hopefully soon. In the meantime, you can write ? new[] { b } : [].)

If we can skip creating a non-empty transient collection there as well as an empty one, I would absolutely expect the same here (and we've even touched on this in design discussions):

SomeCustomCollection x = [a, .. items ?? []];

Now we're already quite parallel to foreach (var _ in items ?? []).

@Zetanova
Copy link

Zetanova commented Jul 4, 2024

@jnm2 Interesting is that the lowering of foreach (var _ in Span<object>.Empty) is over a while loop over the indexer.
This is already not defined in the foreach spec
example: sharplab

@jnm2
Copy link
Contributor

jnm2 commented Jul 4, 2024

@Zetanova Same for arrays. Underneath the compiler expansions which showcase the use of enumerators, it says:

An implementation is permitted to implement a given foreach_statement differently; e.g., for performance reasons, as long as the behavior is consistent with the above expansion.

@Zetanova
Copy link

Zetanova commented Jul 5, 2024

@jnm2 when the foreach expression has the form T ?? new T() where T has class, IEnumerable then the main...jnm2:optimize_foreach_coalesce_empty can be applied sharplab

For ValueTypes the this optimization should not apply.

but this example would break: sharplab
I can't imaging that it is every been used this way, but who knows ...

An other option is to go the static default why like this: sharplab
maybe there is already a type like this in the code base. But even this has some low risk to break a very badly designed Enumerable.

Don't really see a clean why without an extension.
I would prefer instead of foreach? (var _ in items) {} this form foreach (var _ in items?) {}, It could also be applied to [a, ... items?] but it would be not aligned to var _ = items?[..]

@Zetanova
Copy link

@jnm2 I found a partial optimization to get rid of the heap-alloc.
Instead of lowering items ?? [] to (items ?? new List<object>()).GetEnumerator() for the foreach.
A duck-typing pattern could be added. If the resulting enumerator type E contains a static E Empty { get; } then it should be used for the foreach block E enumerator = items?.GetEnumerator() ?? E.Empty

Demo: sharplap

@jaredpar jaredpar modified the milestones: 17.9, 17.13 Jul 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers Code Gen Quality Room for improvement in the quality of the compiler's generated code New Feature - Collection Expressions
Projects
None yet
Development

No branches or pull requests