Fix | SqlVector: Explicitly perform little-endian multibyte writes #3861
edwardneal wants to merge 5 commits into dotnet:main
/azp run
Azure Pipelines successfully started running 2 pipeline(s).
Pull request overview
This PR refactors the SqlVector<T> implementation to use explicit little-endian byte operations when serializing and deserializing vector data. The change replaces platform-specific memory marshaling/copying approaches with consistent loop-based operations using BinaryPrimitives methods to ensure correct endianness.
Key Changes:
- Replaced manual bit manipulation and platform-specific Buffer.BlockCopy/MemoryMarshal operations with explicit little-endian read/write methods
- Introduced platform-specific handling: BinaryPrimitives.WriteSingleLittleEndian for .NET and BitConverterCompatible.SingleToInt32Bits + BinaryPrimitives.WriteInt32LittleEndian for .NET Framework
- Ensured symmetry between serialization (MakeTdsBytes) and deserialization (MakeArray) operations
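As a rough illustration of the two write paths named in the overview, the sketch below serializes floats once with WriteSingleLittleEndian and once by taking the float's bit pattern and writing it as a little-endian Int32. The class and method names are hypothetical, and BitConverter.SingleToInt32Bits stands in for the repository's BitConverterCompatible.SingleToInt32Bits helper (an assumption, since that helper is internal to the driver). This is not the driver's code, only a minimal sketch for a modern .NET target.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical sketch, not the SqlClient implementation.
public static class LittleEndianFloatSketch
{
    // Path available on modern .NET: write each float directly.
    public static byte[] WriteViaSingle(ReadOnlySpan<float> values)
    {
        byte[] result = new byte[values.Length * sizeof(float)];
        for (int i = 0, pos = 0; i < values.Length; i++, pos += sizeof(float))
        {
            BinaryPrimitives.WriteSingleLittleEndian(result.AsSpan(pos), values[i]);
        }
        return result;
    }

    // Fallback-style path: reinterpret the float's bits as an int,
    // then write that int in little-endian order.
    // BitConverter.SingleToInt32Bits is a stand-in for BitConverterCompatible.SingleToInt32Bits.
    public static byte[] WriteViaInt32Bits(ReadOnlySpan<float> values)
    {
        byte[] result = new byte[values.Length * sizeof(float)];
        for (int i = 0, pos = 0; i < values.Length; i++, pos += sizeof(float))
        {
            int bits = BitConverter.SingleToInt32Bits(values[i]);
            BinaryPrimitives.WriteInt32LittleEndian(result.AsSpan(pos), bits);
        }
        return result;
    }

    // Symmetric read: decode little-endian bytes back into floats.
    public static float[] Read(ReadOnlySpan<byte> bytes)
    {
        float[] result = new float[bytes.Length / sizeof(float)];
        for (int i = 0, pos = 0; i < result.Length; i++, pos += sizeof(float))
        {
            result[i] = BinaryPrimitives.ReadSingleLittleEndian(bytes.Slice(pos));
        }
        return result;
    }
}
```

Both write paths should produce identical bytes on any host, which is the point of using explicit little-endian methods: the output no longer depends on the machine's native byte order.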
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlTypes/SqlVector.cs
private SqlVector(int length)
{
    if (length < 0)
Should we also be throwing if length > ushort.MaxValue?
Same for the ReadOnlyMemory constructor.
(With updated public docs to match.)
I think so, yes - although I think it's probably going to be TdsEnums.VECTOR_HEADER_SIZE + (_elementSize * Length) > 8000 to align with SQL Server.
This also means that length will always be < 8000, thus always in the acceptable range for a ushort (so MakeTdsBytes can just have a simple debug assertion rather than an exception.)
  result[1] = VecVersionNo;
- result[2] = (byte)(Length & 0xFF);
- result[3] = (byte)((Length >> 8) & 0xFF);
+ BinaryPrimitives.WriteUInt16LittleEndian(result.AsSpan(2), (ushort)Length);
I think we need a new class invariant to ensure Length's value is compatible with ushort. See my comment on the constructor above.
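The invariant discussed in this thread could look roughly like the sketch below. The names are hypothetical, and the constants are assumptions for illustration: HeaderSize stands in for TdsEnums.VECTOR_HEADER_SIZE (assumed here to be 8 bytes) and the 8000-byte limit is taken from the comment above. Once the constructor enforces the byte limit, the element count always fits in a ushort, so the header write only needs a cast.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical sketch of the proposed length invariant; not the SqlClient code.
public static class VectorLengthSketch
{
    public const int HeaderSize = 8;          // assumption: stand-in for TdsEnums.VECTOR_HEADER_SIZE
    public const int MaxPayloadBytes = 8000;  // limit quoted in the review thread
    public const int ElementSize = sizeof(float);

    // Throws when the element count is negative or the payload would exceed the limit.
    public static void ValidateLength(int length)
    {
        if (length < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(length));
        }
        // Use long arithmetic so a very large length cannot overflow the check.
        if (HeaderSize + (long)ElementSize * length > MaxPayloadBytes)
        {
            throw new ArgumentOutOfRangeException(nameof(length));
        }
    }

    // Writes the element count into bytes 2..3 of the header, little-endian.
    public static void WriteLength(Span<byte> header, int length)
    {
        ValidateLength(length);
        BinaryPrimitives.WriteUInt16LittleEndian(header.Slice(2), (ushort)length);
    }
}
```

Under these assumed constants the largest accepted element count is (8000 - 8) / 4 = 1998, well inside the ushort range.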
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlTypes/SqlVector.cs
for (int i = 0, currPosition = TdsEnums.VECTOR_HEADER_SIZE; i < values.Length; i++, currPosition += _elementSize)
{
#if NET
    BinaryPrimitives.WriteSingleLittleEndian(result.AsSpan(currPosition), (float)(object)valueSpan[i]);
Is there a performance impact here (good or bad)?
I expected there to be one, but didn't expect it to be so large.
Previously, the method would have taken ~70ns to copy a max-length vector when the ReadOnlyMemory was backed by an array and ~240ns when it was backed by unmanaged memory (as a result of the extra copy.) As a result of the first set of changes, it would have taken ~1150ns.
The endianness is only really a concern on big-endian systems, so I've changed the method slightly. On a big-endian machine, it'll continue to use the replacement method (so will take about 1150ns.) On a little-endian machine, it'll continue to use the pre-PR method (albeit with a Span-based copy instead of Buffer.BlockCopy) which now takes ~60ns.
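The branching described here can be sketched as follows. The type and method names are hypothetical and this is not the code from the PR: on a little-endian host the floats are already laid out in the wire order, so their bytes can be copied in one Span-based block; on a big-endian host each element is written explicitly. BitConverter.SingleToInt32Bits is used as a stand-in for the repository's BitConverterCompatible helper.

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.InteropServices;

// Hypothetical sketch of an endianness-aware copy; not the SqlClient code.
public static class EndianAwareCopySketch
{
    public static void CopyFloats(ReadOnlySpan<float> source, Span<byte> destination)
    {
        if (BitConverter.IsLittleEndian)
        {
            // Fast path: reinterpret the floats as bytes and copy them in one block.
            MemoryMarshal.AsBytes(source).CopyTo(destination);
        }
        else
        {
            // Slow path: write every element in little-endian order.
            WriteElementwise(source, destination);
        }
    }

    // Element-by-element little-endian write; correct on any host.
    public static void WriteElementwise(ReadOnlySpan<float> source, Span<byte> destination)
    {
        for (int i = 0; i < source.Length; i++)
        {
            int bits = BitConverter.SingleToInt32Bits(source[i]);
            BinaryPrimitives.WriteInt32LittleEndian(destination.Slice(i * sizeof(float)), bits);
        }
    }
}
```

The two paths are meant to be byte-for-byte equivalent, so the fast path is purely an optimization for the common little-endian case.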
Vectors must always be less than 8000 bytes. This also means that Length will always be <= ushort.MaxValue. Add assertion to highlight this.
Length is the number of elements.
/azp run
Azure Pipelines successfully started running 2 pipeline(s).
This pull request has been marked as stale due to inactivity for more than 30 days. If you would like to keep this pull request open, please provide an update or respond to any comments. Otherwise, it will be closed automatically in 7 days.
This PR is not stale.
Description
This deals with a minor point which came out of the original implementation PR: when we build (or read) the byte array representing a vector({size}, float32), we now explicitly do so using BinaryPrimitives' little-endian methods.
There are two slightly unusual points:
- The .NET Framework target has no BinaryPrimitives.WriteSingleLittleEndian method. Instead, we fall back to the existing BitConverterCompatible.SingleToInt32Bits on this target.
- The cast (float)(object)valueSpan[i]. This is another variation of the (T)(object)item pattern used elsewhere, and the same reasoning holds: the JIT sees that valueSpan is actually a ReadOnlySpan<float> and eliminates the redundant cast. An example of this is available on Sharplab.
Issues
Fixes #3790.
Testing
Automated tests continue to pass.