`<algorithm>`: Manually vectorised implementations of algorithms can be much slower than the basic implementation #6308

# Describe the bug

`std::max({value1, value2, value3})` (at least with `double` values) is much slower than `std::max(value1, std::max(value2, value3))`. With the initialiser list, the call goes to the explicitly-vectorised implementation described at https://learn.microsoft.com/en-us/cpp/standard-library/vectorized-stl-algorithms?view=msvc-170. That means function call overhead, possibly some dispatching to the variant of the function with the best vector instructions for the ISA extensions present on the CPU, and presumably it eventually gets to the 'last three' elements and uses a non-vectorised implementation to deal with them.

As the size is fixed at compile time, picking the variant based on the size, and avoiding the vectorised one for input too small to benefit from it, should help. At the moment, `_USE_STD_VECTOR_ALGORITHMS` is the only control users have, and that kills the optimisation in places where it's actually helpful, too. For the three-value example given, it's not a big loss of readability or conciseness to avoid the initialiser list, but the size at which the vectorised path starts to pay off is more than three, so nesting calls by hand won't stay readable for every affected case.
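To illustrate the user-side workaround for more than three values, here is a minimal sketch of a variadic helper that folds the two-argument `std::max` over its arguments, so the initialiser-list overload is never called. `max_of` is a hypothetical name, not part of the STL, and I haven't measured where it stops being the better choice.

```cpp
#include <algorithm>

// Hypothetical helper, not part of the STL: base case for a single value.
template <class T>
constexpr T max_of(const T& a) {
    return a;
}

// Applies the two-argument std::max pairwise, left to right, so the
// std::initializer_list overload (and its vectorised path) is not used.
// All arguments must have the same type; mixed types fail to compile
// because T cannot be deduced consistently.
template <class T, class... Rest>
constexpr T max_of(const T& a, const T& b, const Rest&... rest) {
    return max_of((std::max)(a, b), rest...);
}

static_assert(max_of(1.0, 3.0, 2.0) == 3.0, "pairwise fold picks the largest value");
```

This only papers over the problem at the call site; it doesn't help code that already uses the initialiser-list form.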

# Command-line test case

I'm unconvinced a test case will make this any clearer, but I can throw together a microbenchmark that demonstrates it if you really need one.
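For reference, a rough sketch of what such a microbenchmark could look like. The function names (`max3_init_list`, `max3_nested`, `make_data`, `time_ns`) are made up for illustration, and the timings will depend on the compiler version, optimisation flags and inlining decisions, so treat it as a starting point rather than a rigorous measurement.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// The two spellings being compared.
double max3_init_list(double a, double b, double c) { return std::max({a, b, c}); }
double max3_nested(double a, double b, double c) { return std::max(a, std::max(b, c)); }

// Deterministic pseudo-random-looking input so both runs see the same data.
std::vector<double> make_data(std::size_t n) {
    std::vector<double> data(n);
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = static_cast<double>((i * 2654435761ULL) % 1000) / 10.0;
    }
    return data;
}

// Runs max3 over every window of three consecutive values and returns the
// elapsed time in nanoseconds. The accumulated result is written to sink so
// the compiler cannot discard the loop.
template <class F>
long long time_ns(F max3, const std::vector<double>& data, double& sink) {
    const auto start = std::chrono::steady_clock::now();
    double acc = 0.0;
    for (std::size_t i = 0; i + 2 < data.size(); ++i) {
        acc += max3(data[i], data[i + 1], data[i + 2]);
    }
    const auto stop = std::chrono::steady_clock::now();
    sink = acc;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Calling `time_ns(max3_init_list, data, sink)` and `time_ns(max3_nested, data, sink)` on the same `make_data(...)` output, in an optimised build, should be enough to compare the two.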


# Expected behavior

The manually-vectorised implementations of algorithms are only used when they have a reasonable chance of not making things slower.

# STL version
```
Microsoft (R) C/C++ Optimizing Compiler Version 19.44.35228 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.
```
This isn't the latest, but I looked at the relevant header, and there's still nothing to address this.
