Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce usage of unsafe constructs throughout codebase #7426

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

GrabYourPitchforks
Copy link
Member

This reduces the use of raw pointers and replaces them with safer constructs where possible. In some cases, MemoryMarshal.Read<T>, MemoryMarshal.Write<T>, and MemoryMarshal.AsBytes<T> are used.

Though these three methods are normally unsafe-equivalent APIs, they are guaranteed safe when T is a primitive integral or floating point type, char, or an enum thereof. (That is, it's guaranteed safe when T is byte, sbyte, short, ushort, char, int, uint, long, ulong, int128, uint128, nint, nuint, float, double, Half, or an enum of any of these.)

I've left code comments where things couldn't be made fully safe. I've also opted not to touch some code in VectorUtils.cs. I only touched code where the loop logic is simple enough for the JIT (even the older netfx JIT!) to always elide bounds checks. I didn't touch code paths where multiple buffers were being touched at once since the JIT doesn't yet properly elide bounds checks in those cases, and I didn't want to risk a possible perf regression.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR reduces unsafe pointer usage and replaces it with safe MemoryMarshal and checked arithmetic operations. Key changes include replacing unsafe blocks with MemoryMarshal APIs in ToByteArrayExtensions.cs, refactoring stream reading methods in Stream.cs to use span-based operations, and removing unsafe code from VectorUtils.cs for vector operations.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
src/Microsoft.ML.FastTree/Utils/ToByteArrayExtensions.cs Replaces unsafe pointer manipulation with MemoryMarshal calls and adds checked arithmetic.
src/Microsoft.ML.Core/Utilities/Stream.cs Refactors binary reading to use span-based operations via a new ReadBinaryDataIntoSpan method.
src/Microsoft.ML.FastTree/Utils/VectorUtils.cs Removes unsafe blocks from vector operations, consolidating loops into safe code.
Comments suppressed due to low confidence (1)

src/Microsoft.ML.FastTree/Utils/VectorUtils.cs:207

  • The method name 'MutiplyInPlace' is misspelled and should be 'MultiplyInPlace'.
public static void MutiplyInPlace(double[] vector, double val)
Copy link

codecov bot commented Mar 19, 2025

Codecov Report

Attention: Patch coverage is 57.14286% with 54 lines in your changes missing coverage. Please review.

Project coverage is 69.01%. Comparing base (adad40c) to head (4c46e4f).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...crosoft.ML.FastTree/Utils/ToByteArrayExtensions.cs 46.77% 33 Missing ⚠️
src/Microsoft.ML.FastTree/Utils/VectorUtils.cs 26.92% 19 Missing ⚠️
src/Microsoft.ML.Core/Utilities/Stream.cs 94.73% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7426      +/-   ##
==========================================
+ Coverage   68.97%   69.01%   +0.03%     
==========================================
  Files        1481     1481              
  Lines      273708   273386     -322     
  Branches    28285    28188      -97     
==========================================
- Hits       188789   188671     -118     
+ Misses      77525    77352     -173     
+ Partials     7394     7363      -31     
Flag Coverage Δ
Debug 69.01% <57.14%> (+6.40%) ⬆️
production 63.31% <57.14%> (+0.70%) ⬆️
test 89.46% <ø> (∅)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
src/Microsoft.ML.Core/Utilities/Stream.cs 69.38% <94.73%> (+1.40%) ⬆️
src/Microsoft.ML.FastTree/Utils/VectorUtils.cs 9.95% <26.92%> (-1.55%) ⬇️
...crosoft.ML.FastTree/Utils/ToByteArrayExtensions.cs 35.56% <46.77%> (+1.64%) ⬆️

... and 7 files with indirect coverage changes

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

}
}
tmp = vector[i] - mean;
sum += tmp * tmp;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tmp * tmp;

I assume we don't care about the overflow here, I know there is no change here from the old code behavior.

Copy link
Member Author

@GrabYourPitchforks GrabYourPitchforks Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. This will end up being double.Infinity if there's overflow.

You saw that I sprinkled some overflow checks throughout the code, but they're only in cases where we calculate array indices. I didn't add any overflow checks elsewhere, including in places where we calculate array values (as opposed to indices). That seemed like too risky a change to the core logic.

@tarekgh
Copy link
Member

tarekgh commented Mar 19, 2025

Seeing the following failure, I hope it is not related. I'll restart this test to see if it will pass or it is consistent failure.

https://helixr1107v0xdeko0k025g8.blob.core.windows.net/dotnet-machinelearning-refs-pull-7426-merge-9e855dccaf3f484d86/Microsoft.ML.Core.Tests/1/console.91a4c382.log?helixlogtype=result

�[mStarting test: Microsoft.ML.RunTests.TestEntryPoints.EntryPointEvaluateRegression
Finished test: Microsoft.ML.RunTests.TestEntryPoints.EntryPointEvaluateRegression with memory usage 192,921,600.00 and max memory usage 0.00
Starting test: Microsoft.ML.RunTests.TestEntryPoints.EntryPointPercentileThreshold
�[31;1m    Microsoft.ML.RunTests.TestEntryPoints.EntryPointEvaluateRegression [FAIL]
�[m�[37m      System.ArgumentOutOfRangeException : If specified, must be non-negative. (Parameter 'valuesCount')
�[m�[30;1m      Stack Trace:
�[m�[37m        /Users/runner/work/1/s/src/Microsoft.ML.DataView/VBuffer.cs(427,0): at Microsoft.ML.Data.VBuffer`1.GetEditor(Int32 newLogicalLength, Nullable`1 valuesCount, Nullable`1 maxValuesCapacity, Boolean keepOldOnResize, Boolean requireIndicesOnDense)
�[m�[37m        /Users/runner/work/1/s/src/Microsoft.ML.Data/DataView/CacheDataView.cs(1398,0): at Microsoft.ML.Data.CacheDataView.ColumnCache.ImplVec`1.Fetch(Int32 idx, VBuffer`1& value)
�[m�[37m        /Users/runner/work/1/s/src/Microsoft.ML.Data/DataView/CacheDataView.cs(496,0): at Microsoft.ML.Data.CacheDataView.RowCursor`1.<>c__DisplayClass8_0`1.<CreateGetterDelegateCore>b__0(TValue& value)
�[m�[37m        /Users/runner/work/1/s/src/Microsoft.ML.Data/Training/TrainerUtils.cs(555,0): at Microsoft.ML.Trainers.TrainingCursorBase.MoveNext()
�[m�[37m        /Users/runner/work/1/s/src/Microsoft.ML.StandardTrainers/Standard/SdcaBinary.cs(385,0): at Microsoft.ML.Trainers.SdcaTrainerBase`3.TrainCore(IChannel ch, RoleMappedData data, LinearModelParameters predictor, Int32 weightSetCount)
�[m�[37m        /Users/runner/work/1/s/src/Microsoft.ML.StandardTrainers/Standard/StochasticTrainerBase.cs(43,0): at Microsoft.ML.Trainers.StochasticTrainerBase`2.TrainModelCore(TrainContext context)
�[m�[37m        /Users/runner/work/1/s/src/Microsoft.ML.Data/Training/TrainerEstimatorBase.cs(100,0): at Microsoft.ML.Trainers.TrainerEstimatorBase`2.Microsoft.ML.ITrainer<Microsoft.ML.IPredictor>.Train(TrainContext context)
�[m�[37m        /Users/runner/work/1/s/src/Microsoft.ML.Data/Training/TrainerEstimatorBase.cs(168,0): at Microsoft.ML.Trainers.TrainerEstimatorBase`2.Microsoft.ML.ITrainer.Train(TrainContext context)
�[m�[37m        /Users/runner/work/1/s/src/Microsoft.ML.Data/Commands/TrainCommand.cs(280,0): at Microsoft.ML.Data.TrainUtils.TrainCore(IHostEnvironment env, IChannel ch, RoleMappedData data, ITrainer trainer, RoleMappedData validData, IComponentFactory`1 calibrator, Int32 maxCalibrationExamples, Nullable`1 cacheData, IPredictor inputPredictor, RoleMappedData testData)
�[m�[37m        /Users/runner/work/1/s/src/Microsoft.ML.Data/Commands/TrainCommand.cs(249,0): at Microsoft.ML.Data.TrainUtils.Train(IHostEnvironment env, IChannel ch, RoleMappedData data, ITrainer trainer, IComponentFactory`1 calibrator, Int32 maxCalibrationExamples)
�[m�[37m        /Users/runner/work/1/s/src/Microsoft.ML.Data/EntryPoints/InputBase.cs(119,0): at Microsoft.ML.EntryPoints.TrainerEntryPointsUtils.Train[TArg,TOut](IHost host, TArg input, Func`1 createTrainer, Func`1 getLabel, Func`1 getWeight, Func`1 getGroup, Func`1 getName, Func`1 getCustom, ICalibratorTrainerFactory calibrator, Int32 maxCalibrationExamples)
�[m�[37m        /Users/runner/work/1/s/src/Microsoft.ML.StandardTrainers/Standard/SdcaRegression.cs(221,0): at Microsoft.ML.Trainers.Sdca.TrainRegression(IHostEnvironment env, Options input)
�[m�[37m           at InvokeStub_Sdca.TrainRegression(Object, Span`1)
�[m�[37m           at System.Reflection.MethodBaseInvoker.InvokeWithFewArgs(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
�[m�[30;1m      Output:
�[m�[37m        Automatically adding a MinMax normalization transform, use 'norm=Warn' or 'norm=No' to turn this behavior off.
�[m�[37m        
�[m�[37m        
�[m�[37m        Using 2 threads to train.
�[m�[37m        
�[m�[37m        
�[m�[37m        Automatically choosing a check frequency of 2.

Copy link
Member

@tarekgh tarekgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants