Md files need to have only one heading for rst files to show proper titles (#125)

E.g., `Python with HuggingFace <../tutorials/Quick_Deploy/HuggingFaceTransformers/README.md>`
will not show as "Python with HuggingFace" in user guides if the README
has multiple headers.
Deploying models in Triton also comes with the benefit of access to a fully supported suite
of deployment analyzers to help you better understand and tailor your systems to fit your
needs. Triton currently has two options for deployment analysis:

- [Performance Analyzer](https://docs.nvidia.com/deeplearning/triton-inference-server/archives/triton-inference-server-2310/user-guide/docs/user_guide/perf_analyzer.html): An inference performance optimizer.
- [Model Analyzer](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_analyzer.html): A GPU memory and compute utilization optimizer.
-### Performance Analyzer
+#### Performance Analyzer
To use the performance analyzer, please remove the persimmon8b model from `model_repository` and restart
the Triton server using the `docker run` command from above.
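As a rough sketch, the steps above might look like the following. The backup path, model name under test, and `perf_analyzer` flag values are illustrative, not prescribed by this guide; `perf_analyzer` must be run from an environment with the Triton client SDK installed and requires a running server, so adjust to your setup:

```shell
# Move the persimmon8b model out of the repository so Triton does not load it
mv model_repository/persimmon8b /tmp/persimmon8b_backup

# Restart the server with the same `docker run` command used earlier, then
# measure throughput and latency for a model that is still being served.
# The model name and concurrency range here are placeholders:
perf_analyzer -m <model_name> --concurrency-range 1:8
```

`--concurrency-range 1:8` sweeps the number of outstanding requests from 1 to 8, which is a common first pass for finding where throughput stops scaling.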
@@ -289,7 +289,7 @@ guide.
For more information regarding dynamic batching in Triton, please see [this](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_configuration.html#dynamic-batcher)
guide.
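For reference, dynamic batching is enabled per model in its `config.pbtxt`. A minimal sketch, assuming the defaults documented in the model-configuration guide linked above (the delay value below is illustrative, not a recommendation):

```
# config.pbtxt (fragment) -- an empty dynamic_batching block enables the
# feature with default settings; max_queue_delay_microseconds optionally
# lets requests wait briefly so larger batches can form.
dynamic_batching {
  max_queue_delay_microseconds: 100
}
```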
-### Model Analyzer
+#### Model Analyzer
In the performance analyzer section, we used intuition to increase our throughput by changing
a subset of variables and measuring the difference in performance. However, we only changed