Reduce usage of unsafe constructs throughout codebase #7426
base: main
- Only where we know for a fact the JIT will elide bounds checks
Pull Request Overview
This PR reduces unsafe pointer usage and replaces it with safe MemoryMarshal and checked arithmetic operations. Key changes include replacing unsafe blocks with MemoryMarshal APIs in ToByteArrayExtensions.cs, refactoring stream reading methods in Stream.cs to use span-based operations, and removing unsafe code from VectorUtils.cs for vector operations.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/Microsoft.ML.FastTree/Utils/ToByteArrayExtensions.cs | Replaces unsafe pointer manipulation with MemoryMarshal calls and adds checked arithmetic. |
| src/Microsoft.ML.Core/Utilities/Stream.cs | Refactors binary reading to use span-based operations via a new ReadBinaryDataIntoSpan method. |
| src/Microsoft.ML.FastTree/Utils/VectorUtils.cs | Removes unsafe blocks from vector operations, consolidating loops into safe code. |
Comments suppressed due to low confidence (1)
src/Microsoft.ML.FastTree/Utils/VectorUtils.cs:207
- The method name 'MutiplyInPlace' is misspelled and should be 'MultiplyInPlace'.
public static void MutiplyInPlace(double[] vector, double val)
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #7426 +/- ##
==========================================
+ Coverage 68.97% 69.01% +0.03%
==========================================
Files 1481 1481
Lines 273708 273386 -322
Branches 28285 28188 -97
==========================================
- Hits 188789 188671 -118
+ Misses 77525 77352 -173
+ Partials 7394 7363 -31
}
}
tmp = vector[i] - mean;
sum += tmp * tmp;
Right. This will end up being `double.Infinity` if there's overflow.
You saw that I sprinkled some overflow checks throughout the code, but they're only in cases where we calculate array indices. I didn't add any overflow checks elsewhere, including in places where we calculate array values (as opposed to indices). That seemed like too risky a change to the core logic.
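To illustrate the distinction drawn in this comment, here is a minimal sketch rather than code from the PR. The class `OverflowSketch` and both of its methods are hypothetical names. It assumes an index is computed from integer operands (where `checked` turns silent wraparound into an exception) and that values are accumulated as `double` (where overflow silently produces infinity, as in the `sum += tmp * tmp` loop under discussion).

```csharp
using System;

public static class OverflowSketch
{
    // Hypothetical index calculation: checked arithmetic throws
    // OverflowException instead of wrapping around to a bogus index.
    public static int FlatIndex(int row, int columns, int column)
    {
        return checked(row * columns + column);
    }

    // Value calculation left unchecked: floating-point overflow does not
    // throw, it saturates to double.PositiveInfinity.
    public static double SumOfSquaredDeviations(double[] vector, double mean)
    {
        double sum = 0;
        for (int i = 0; i < vector.Length; i++)
        {
            double tmp = vector[i] - mean;
            sum += tmp * tmp;
        }
        return sum;
    }
}
```

The design choice described above is to guard only the first kind of computation, since a wrapped index can address the wrong memory, while an infinite value is still a well-defined `double`.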
I'm seeing the following failure, and I hope it is unrelated. I'll restart this test to see whether it passes or fails consistently.
LGTM!
This reduces the use of raw pointers and replaces them with safer constructs where possible. In some cases, `MemoryMarshal.Read<T>`, `MemoryMarshal.Write<T>`, and `MemoryMarshal.AsBytes<T>` are used.
Though these three methods are normally unsafe-equivalent APIs, they are guaranteed safe when T is a primitive integral or floating-point type, char, or an enum thereof. (That is, they are guaranteed safe when T is byte, sbyte, short, ushort, char, int, uint, long, ulong, Int128, UInt128, nint, nuint, float, double, Half, or an enum of any of these.)
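As a rough sketch of how the three MemoryMarshal methods named above can stand in for pointer code, consider the helpers below. They are illustrative only: `MemoryMarshalSketch`, `ToByteArray`, `ReadDouble`, and `WriteDouble` are hypothetical names, not the methods changed in this PR, and T is restricted to `int` and `double`, both on the list of types described as safe.

```csharp
using System;
using System.Runtime.InteropServices;

public static class MemoryMarshalSketch
{
    // Copies the raw bytes of an int[] into a new byte[] without pointers.
    public static byte[] ToByteArray(int[] values)
    {
        // AsBytes reinterprets the int span as a span of bytes over the same memory.
        ReadOnlySpan<byte> bytes = MemoryMarshal.AsBytes<int>(values.AsSpan());
        return bytes.ToArray();
    }

    // Reads a double starting at the given byte offset.
    // Slice validates the offset; Read throws if too few bytes remain.
    public static double ReadDouble(ReadOnlySpan<byte> source, int offset)
    {
        return MemoryMarshal.Read<double>(source.Slice(offset));
    }

    // Writes a double starting at the given byte offset.
    // Write throws if the destination slice is too small.
    public static void WriteDouble(Span<byte> destination, int offset, double value)
    {
        MemoryMarshal.Write(destination.Slice(offset), ref value);
    }
}
```

Every access here goes through a span, so an out-of-range offset or a short buffer produces an exception rather than a stray memory access.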
I've left code comments where things couldn't be made fully safe. I've also opted not to touch some code in VectorUtils.cs. I only touched code where the loop logic is simple enough for the JIT (even the older netfx JIT!) to always elide bounds checks. I didn't touch code paths where multiple buffers were being touched at once since the JIT doesn't yet properly elide bounds checks in those cases, and I didn't want to risk a possible perf regression.
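The two loop shapes contrasted above might look roughly like the following. This is a sketch with hypothetical names (`BoundsCheckSketch`, `ScaleInPlace`, `AddInPlace`), not the code in VectorUtils.cs, and whether a given bounds check is actually elided depends on the JIT version in use.

```csharp
using System;

public static class BoundsCheckSketch
{
    // Single buffer, index bounded directly by vector.Length: the simple
    // shape for which the JIT is expected to elide the bounds check.
    public static void ScaleInPlace(double[] vector, double val)
    {
        for (int i = 0; i < vector.Length; i++)
        {
            vector[i] *= val;
        }
    }

    // Two buffers indexed by one counter: the loop condition only bounds i
    // against target.Length, so the access to addend[i] may keep its
    // bounds check even though the lengths were validated up front.
    public static void AddInPlace(double[] target, double[] addend)
    {
        if (addend.Length < target.Length)
        {
            throw new ArgumentException("addend is shorter than target.", nameof(addend));
        }

        for (int i = 0; i < target.Length; i++)
        {
            target[i] += addend[i];
        }
    }
}
```

Both methods are safe code; the difference is only in how much per-iteration checking may remain, which is the perf concern behind leaving the multi-buffer paths alone.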