Commit bdd22fe

Ryan Lai authored
Made debug flag documentation more explicit (#181)
* Made debug flag documentation more explicit
* rename flag to DebugEvaluate
1 parent 71e9db5 commit bdd22fe

4 files changed: +86 -66 lines changed

Tools/WinMLRunner/README.md

+73 -57
@@ -30,26 +30,33 @@ Required command-Line arguments:
  -folder <path> : Fully qualifed path to a folder with .onnx and/or .pb models, will run all of the models in the folder.

  #Optional command-line arguments:
- -version: : prints the version information for this build of WinMLRunner.exe
- -Perf : optional:<all>: capture performance measurements such as timing and memory usage. Specifying "all" will output all measurements
- -Iterations <int> : Number of times to evaluate the model when capturing performance measurements.
- -CPU : Will create a session on the CPU.
- -GPU : Will create a session on the GPU.
- -GPUHighPerformance : Will create a session with the most powerful GPU device available.
- -GPUMinPower : Will create a session with GPU with the least power.
- -CreateDeviceOnClient : Will create the device on the client and explicitly pass it to WinML via the API. GPU runs using this flag will usually be faster than -CreateDeviceInWinML since we avoid a cross-device copy by creating the video frame on the same device that DML uses to bind inputs.
- -CreateDeviceInWinML : Will create the device inside WinML. GPU runs using this flag will usually be slower than -CreateDeviceOnClient since we have to copy the video frame to a different device.
- -CPUBoundInput : Will bind the input to the CPU.
- -GPUBoundInput : Will bind the input to the GPU.
- -BGR : Will load the input as a BGR image.
- -RGB : Will load the input as an RGB image.
- -Tensor : Will load the input as a tensor.
- -Input <image/CSV path> : Will bind image/data from CSV to model.
- -PerfOutput <CSV path> : Path to the CSV where the perf results will be written.
- -SavePerIterationPerf : Save per iteration performance results to csv file.
- -Debug : Will start a trace logging session.
- -Terse : Will suppress repetitive console output (initial iteration and summary info will be output).
- -AutoScale <mode> : Will automatically scale an input image to match the required input dimensions of the model. Pass in the interpolation mode, one of ["Nearest", "Linear", "Cubic", "Fant"].
+ -version: prints the version information for this build of WinMLRunner.exe
+ -CPU : run model on default CPU
+ -GPU : run model on default GPU
+ -GPUHighPerformance : run model on GPU with highest performance
+ -GPUMinPower : run model on GPU with the least power
+ -CreateDeviceOnClient : create the device on the client and pass it to WinML
+ -CreateDeviceInWinML : create the device inside WinML
+ -CPUBoundInput : bind the input to the CPU
+ -GPUBoundInput : bind the input to the GPU
+ -RGB : load the input as an RGB image
+ -BGR : load the input as a BGR image
+ -Tensor : load the input as a tensor
+ -Perf [all]: capture performance measurements such as timing and memory usage. Specifying "all" will output all measurements
+ -Iterations : # times perf measurements will be run/averaged
+ -Input <fully qualified path>: binds image or CSV to model
+ -PerfOutput [<fully qualified path>]: csv file to write the perf results to
+ -SavePerIterationPerf : save per iteration performance results to csv file
+ -SaveTensorData <saveMode folderPath>: saveMode: save first iteration or all iteration output tensor results to csv file [First, All]
+ folderPath: Optional folder path can be specified to hold tensor data. It will be created if folder doesn't exist.
+ -DebugEvaluate: Print evaluation debug output to debug console if debugger is present.
+ -Terse: Terse Mode (suppresses repetitive console output)
+ -AutoScale <interpolationMode>: Enable image autoscaling and set the interpolation mode [Nearest, Linear, Cubic, Fant]
+
+ Concurrency Options:
+ -ConcurrentLoad: load models concurrently
+ -NumThreads <number>: number of threads to load a model. By default this will be the number of model files to be executed
+ -ThreadInterval <milliseconds>: interval time between two thread creations in milliseconds

  ```

@@ -177,16 +184,17 @@ Shared Memory (MB) - The amount of memory that was used on the DRAM by the GPU.
  ```
  .\WinMLRunner.exe -model SqueezeNet.onnx -perf
  WinML Runner
- GPU: AMD Radeon Pro WX 3100
+ Printing available GPUs with DXGI..
+ Index: 0, Description: AMD Radeon Pro WX 3100

- Loading model (path = SqueezeNet.onnx)...
+ Loading model (path = .\SqueezeNet.onnx)...
  =================================================================
  Name: squeezenet_old
  Author: onnx-caffe2
  Version: 9223372036854775807
  Domain:
  Description:
- Path: SqueezeNet.onnx
+ Path: .\SqueezeNet.onnx
  Support FP16: false

  Input Feature Info:
@@ -199,47 +207,55 @@ Feature Kind: Float

  =================================================================

- Binding (device = CPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor)...[SUCCESS]
- Evaluating (device = CPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor)...[SUCCESS]
- Outputting results..
- Feature Name: softmaxout_1
- resultVector[818] has the maximal value of 1

+ Creating Session with CPU device
+ Binding (device = CPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML)...[SUCCESS]
+ Evaluating (device = CPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML)...[SUCCESS]

- Results (device = CPU, numIterations = 1, inputBinding = CPU, inputDataType = Tensor):
- Load: 408.386300 ms
- Bind: 0.9184 ms
- Evaluate: 739.173 ms
- Total Time: 1148.48 ms
- Wall-Clock Load: 408.064 ms
- Wall-Clock Bind: 1.1311 ms
- Wall-Clock Evaluate: 739.337 ms
- Total Wall-Clock Time: 1148.53 ms
- Working Set Memory usage (evaluate): 0 MB
- Dedicated Memory Usage (evaluate): 0 MB
- Shared Memory Usage (evaluate): 0 MB

+ Results (device = CPU, numIterations = 1, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML):

+ First Iteration Performance (load, bind, session creation, and evaluate):
+ Load: 436.598 ms
+ Bind: 0.8575 ms
+ Session Creation: 120.181 ms
+ Evaluate: 177.233 ms

- Binding (device = GPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor)...[SUCCESS]
- Evaluating (device = GPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor)...[SUCCESS]
- Outputting results..
- Feature Name: softmaxout_1
- resultVector[818] has the maximal value of 1
+ Working Set Memory usage (evaluate): 9.95313 MB
+ Working Set Memory usage (load, bind, session creation, and evaluate): 45.6289 MB
+ Peak Working Set Memory Difference (load, bind, session creation, and evaluate): 46.5625 MB
+
+ Dedicated Memory usage (evaluate): 0 MB
+ Dedicated Memory usage (load, bind, session creation, and evaluate): 0 MB
+
+ Shared Memory usage (evaluate): 0 MB
+ Shared Memory usage (load, bind, session creation, and evaluate): 0 MB
+
+
+
+
+ Creating Session with GPU: AMD Radeon Pro WX 3100
+ Binding (device = GPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML)...[SUCCESS]
+ Evaluating (device = GPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML)...[SUCCESS]
+
+
+ Results (device = GPU, numIterations = 1, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML):
+
+ First Iteration Performance (load, bind, session creation, and evaluate):
+ Load: 436.598 ms
+ Bind: 5.1858 ms
+ Session Creation: 285.041 ms
+ Evaluate: 25.7202 ms
+
+ Working Set Memory usage (evaluate): 1.21484 MB
+ Working Set Memory usage (load, bind, session creation, and evaluate): 42.8047 MB
+ Peak Working Set Memory Difference (load, bind, session creation, and evaluate): 44.1152 MB

+ Dedicated Memory usage (evaluate): 10.082 MB
+ Dedicated Memory usage (load, bind, session creation, and evaluate): 15.418 MB

- Results (device = GPU, numIterations = 1, inputBinding = CPU, inputDataType = Tensor):
- Load: N/A
- Bind: 3.6711 ms
- Evaluate: 66.5285 ms
- Total Time: 70.1996 ms
- Wall-Clock Load: 0 ms
- Wall-Clock Bind: 3.9697 ms
- Wall-Clock Evaluate: 67.2518 ms
- Total Wall-Clock Time: 71.2215 ms
- Working Set Memory usage (evaluate): 13.668 MB
- Dedicated Memory Usage (evaluate): 13.668 MB
- Shared Memory Usage (evaluate): 1 MB
+ Shared Memory usage (evaluate): 1 MB
+ Shared Memory usage (load, bind, session creation, and evaluate): 6.04688 MB
  ```

  ## Capturing Trace Logs

Tools/WinMLRunner/src/CommandLineArgs.cpp

+9 -5
@@ -25,14 +25,14 @@ void CommandLineArgs::PrintUsage() {
      std::cout << " -RGB : load the input as an RGB image" << std::endl;
      std::cout << " -BGR : load the input as a BGR image" << std::endl;
      std::cout << " -Tensor : load the input as a tensor" << std::endl;
-     std::cout << " -Perf optional:<all>: capture performance measurements such as timing and memory usage. Specifying \"all\" will output all measurements" << std::endl;
+     std::cout << " -Perf [all]: capture performance measurements such as timing and memory usage. Specifying \"all\" will output all measurements" << std::endl;
      std::cout << " -Iterations : # times perf measurements will be run/averaged" << std::endl;
      std::cout << " -Input <fully qualified path>: binds image or CSV to model" << std::endl;
-     std::cout << " -PerfOutput optional:<fully qualified path>: csv file to write the perf results to" << std::endl;
+     std::cout << " -PerfOutput [<fully qualified path>]: csv file to write the perf results to" << std::endl;
      std::cout << " -SavePerIterationPerf : save per iteration performance results to csv file" << std::endl;
      std::cout << " -SaveTensorData <saveMode folderPath>: saveMode: save first iteration or all iteration output tensor results to csv file [First, All]" << std::endl;
      std::cout << " folderPath: Optional folder path can be specified to hold tensor data. It will be created if folder doesn't exist." << std::endl;
-     std::cout << " -Debug: print trace logs" << std::endl;
+     std::cout << " -DebugEvaluate: Print evaluation debug output to debug console if debugger is present." << std::endl;
      std::cout << " -Terse: Terse Mode (suppresses repetitive console output)" << std::endl;
      std::cout << " -AutoScale <interpolationMode>: Enable image autoscaling and set the interpolation mode [Nearest, Linear, Cubic, Fant]" << std::endl;
      std::cout << std::endl;
@@ -134,9 +134,13 @@ CommandLineArgs::CommandLineArgs(const std::vector<std::wstring> &args)
          }
          m_perfCapture = true;
      }
-     else if ((_wcsicmp(args[i].c_str(), L"-Debug") == 0))
+     else if ((_wcsicmp(args[i].c_str(), L"-DebugEvaluate") == 0))
      {
-         m_debug = true;
+         if (!IsDebuggerPresent())
+         {
+             throw hresult_invalid_argument(L"-DebugEvaluate flag should only be used when WinMLRunner is under a user-mode debugger!");
+         }
+         ToggleEvaluationDebugOutput(true);
      }
      else if ((_wcsicmp(args[i].c_str(), L"-SavePerIterationPerf") == 0))
      {
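
The guard added in this hunk can be illustrated in isolation. The following is a minimal, portable sketch and not the WinMLRunner implementation: `ParseFlags`, `ParsedFlags`, and `EqualsIgnoreCase` are hypothetical names, the debugger check is injected as a callable so the logic can run off Windows, and `std::invalid_argument` stands in for `hresult_invalid_argument`. On Windows the callable would presumably wrap `IsDebuggerPresent()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cwctype>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Portable stand-in for the Windows-only `_wcsicmp(a, b) == 0` comparison.
bool EqualsIgnoreCase(const std::wstring& a, const std::wstring& b)
{
    if (a.size() != b.size())
    {
        return false;
    }
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        if (std::towlower(a[i]) != std::towlower(b[i]))
        {
            return false;
        }
    }
    return true;
}

// Hypothetical, reduced result type; the real CommandLineArgs tracks many more options.
struct ParsedFlags
{
    bool evaluationDebugOutput = false;
};

// Hypothetical helper. `debuggerPresent` is an assumption of this sketch:
// it is injected so the guard can be exercised without a real debugger.
ParsedFlags ParseFlags(const std::vector<std::wstring>& args,
                       const std::function<bool()>& debuggerPresent)
{
    assert(debuggerPresent && "debuggerPresent must be callable");

    ParsedFlags flags;
    for (const std::wstring& arg : args)
    {
        if (EqualsIgnoreCase(arg, L"-DebugEvaluate"))
        {
            // Fail fast: the debug output is only visible under a debugger,
            // so reject the flag instead of silently ignoring it.
            if (!debuggerPresent())
            {
                throw std::invalid_argument(
                    "-DebugEvaluate requires a user-mode debugger");
            }
            flags.evaluationDebugOutput = true;
        }
    }
    return flags;
}
```

Rejecting the flag at parse time, rather than enabling it unconditionally as the old `-Debug` branch did, means a user running without a debugger gets an immediate error instead of a run that produces no visible debug output.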

Tools/WinMLRunner/src/CommandLineArgs.h

+3 -3
@@ -14,7 +14,7 @@ class CommandLineArgs
      bool IsUsingGPUBoundInput() const { return m_useGPUBoundInput; }
      bool IsPerformanceCapture() const { return m_perfCapture; }
      bool IsPerformanceConsoleOutputVerbose() const { return m_perfConsoleOutputAll; }
-     bool IsDebugOutputEnabled() const { return m_debug; }
+     bool IsEvaluationDebugOutputEnabled() const { return m_evaluation_debug_output; }
      bool TerseOutput() const { return m_terseOutput; }
      bool IsPerIterationCapture() const { return m_perIterCapture; }
      bool IsCreateDeviceOnClient() const { return m_createDeviceOnClient; }
@@ -91,7 +91,7 @@ class CommandLineArgs
      void TogglePerformanceCapture(bool perfCapture) { m_perfCapture = perfCapture; }
      void ToggleIgnoreFirstRun(bool ignoreFirstRun) { m_ignoreFirstRun=ignoreFirstRun;}
      void TogglePerIterationPerformanceCapture(bool perIterCapture) { m_perIterCapture = perIterCapture; }
-     void ToggleDebugOutput(bool debug) { m_debug = debug; }
+     void ToggleEvaluationDebugOutput(bool debug) { m_evaluation_debug_output = debug; }
      void ToggleTerseOutput(bool terseOutput) { m_terseOutput = terseOutput; }


@@ -128,7 +128,7 @@ class CommandLineArgs
      bool m_useCPUBoundInput = false;
      bool m_useGPUBoundInput = false;
      bool m_ignoreFirstRun = false;
-     bool m_debug = false;
+     bool m_evaluation_debug_output = false;
      bool m_perIterCapture = false;
      bool m_terseOutput = false;
      bool m_autoScale = false;

Tools/WinMLRunner/src/Run.cpp

+1 -1
@@ -278,7 +278,7 @@ HRESULT EvaluateModel(
          return hr.code();
      }

-     if (args.IsDebugOutputEnabled())
+     if (args.IsEvaluationDebugOutputEnabled())
      {
          // Enables trace log output.
          session.EvaluationProperties().Insert(L"EnableDebugOutput", nullptr);
