Signed-off-by: Naren Dasan <[email protected]>
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,158 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"\n\n# Compiling ResNet using the Torch-TensorRT `torch.compile` Backend\n\nThis interactive script is intended as a sample of the Torch-TensorRT workflow with `torch.compile` on a ResNet model.\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Imports and Model Definition\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"import torch\nimport torch_tensorrt\nimport torchvision.models as models" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Initialize model with half precision and sample inputs\nmodel = models.resnet18(pretrained=True).half().eval().to(\"cuda\")\ninputs = [torch.randn((1, 3, 224, 224)).to(\"cuda\").half()]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Optional Input Arguments to `torch_tensorrt.compile`\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Enabled precisions for TensorRT optimization\nenabled_precisions = {torch.half}\n\n# Whether to print verbose logs\ndebug = True\n\n# Workspace size for TensorRT\nworkspace_size = 20 << 30\n\n# Minimum number of operators per TensorRT engine block\n# (Lower value allows more graph segmentation)\nmin_block_size = 7\n\n# Operations to run in Torch, regardless of converter support\ntorch_executed_ops = set()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Compilation with `torch_tensorrt.compile`\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Build and compile the model with torch.compile, using Torch-TensorRT backend\noptimized_model = torch_tensorrt.compile(\n model,\n ir=\"torch_compile\",\n inputs=inputs,\n enabled_precisions=enabled_precisions,\n debug=debug,\n workspace_size=workspace_size,\n min_block_size=min_block_size,\n torch_executed_ops=torch_executed_ops,\n)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Equivalently, we could have run the above via the `torch.compile` frontend, like so:\n`optimized_model = torch.compile(model, backend=\"torch_tensorrt\", options={\"enabled_precisions\": enabled_precisions, ...}); optimized_model(*inputs)`\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Inference\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Does not cause recompilation (same batch size as input)\nnew_inputs = [torch.randn((1, 3, 224, 224)).half().to(\"cuda\")]\nnew_outputs = optimized_model(*new_inputs)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Does cause recompilation (new batch size)\nnew_batch_size_inputs = [torch.randn((8, 3, 224, 224)).half().to(\"cuda\")]\nnew_batch_size_outputs = optimized_model(*new_batch_size_inputs)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Cleanup\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Finally, we use Torch utilities to clean up the workspace\ntorch._dynamo.reset()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## CUDA Driver Error Note\n\nOccasionally, upon exiting the Python runtime after Dynamo compilation with `torch_tensorrt`,\none may encounter a CUDA driver error. This issue is related to https://github.com/NVIDIA/TensorRT/issues/2052\nand can be resolved by wrapping the compilation/inference in a function and using a scoped call, as in:\n\n    if __name__ == '__main__':\n        compile_engine_and_infer()\n\n" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.12" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 0 | ||
} |
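
The notebook's settings cell writes the TensorRT workspace size as `20 << 30`. As a small aside, the sketch below spells out that arithmetic, assuming the value is interpreted as a byte count (as TensorRT's workspace limit is). The names `GIB` and `workspace_gib` are illustrative only and are not part of Torch-TensorRT.

```python
# Illustrative only: GIB and workspace_gib are hypothetical names,
# not part of the Torch-TensorRT API.
GIB = 1 << 30  # 2**30 bytes = 1 GiB

# Same expression as in the notebook: shifting left by 30 bits
# multiplies the value by 2**30.
workspace_size = 20 << 30

# Convert back to GiB to confirm the intended budget.
workspace_gib = workspace_size / GIB

print(workspace_size)  # 21474836480
print(workspace_gib)   # 20.0
```

Writing the size as a shift keeps the unit visible in the source: `<< 20` would be MiB and `<< 30` GiB.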
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,107 @@ | ||
""" | ||
.. _torch_compile_advanced_usage: | ||
Torch Compile Advanced Usage | ||
====================================================== | ||
This interactive script is intended as an overview of the process by which `torch_tensorrt.compile(..., ir="torch_compile", ...)` works, and how it integrates with the `torch.compile` API.""" | ||
|
||
# %% | ||
# Imports and Model Definition | ||
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
import torch | ||
import torch_tensorrt | ||
|
||
# %% | ||
|
||
|
||
# We begin by defining a model | ||
class Model(torch.nn.Module): | ||
    def __init__(self) -> None: | ||
        super().__init__() | ||
        self.relu = torch.nn.ReLU() | ||
|
||
    def forward(self, x: torch.Tensor, y: torch.Tensor): | ||
        x_out = self.relu(x) | ||
        y_out = self.relu(y) | ||
        x_y_out = x_out + y_out | ||
        return torch.mean(x_y_out) | ||
|
||
|
||
# %% | ||
# Compilation with `torch.compile` Using Default Settings | ||
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
# Define sample float inputs and initialize model | ||
sample_inputs = [torch.rand((5, 7)).cuda(), torch.rand((5, 7)).cuda()] | ||
model = Model().eval().cuda() | ||
|
||
# %% | ||
|
||
# Next, we compile the model using torch.compile | ||
# For the default settings, we can simply call torch.compile | ||
# with the backend "torch_tensorrt", and run the model on an | ||
# input to trigger compilation, like so: | ||
optimized_model = torch.compile(model, backend="torch_tensorrt", dynamic=False) | ||
optimized_model(*sample_inputs) | ||
|
||
# %% | ||
# Compilation with `torch.compile` Using Custom Settings | ||
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
# First, we use Torch utilities to clean up the workspace | ||
# after the previous compile invocation | ||
torch._dynamo.reset() | ||
|
||
# Define sample half inputs and initialize model | ||
sample_inputs_half = [ | ||
    torch.rand((5, 7)).half().cuda(), | ||
    torch.rand((5, 7)).half().cuda(), | ||
] | ||
model_half = Model().eval().cuda() | ||
|
||
# %% | ||
|
||
# If we want to customize certain options in the backend, | ||
# but still use the torch.compile call directly, we can provide | ||
# custom options to the backend via the "options" keyword | ||
# which takes in a dictionary mapping options to values. | ||
# | ||
# For accepted backend options, see the CompilationSettings dataclass: | ||
# py/torch_tensorrt/dynamo/_settings.py | ||
backend_kwargs = { | ||
    "enabled_precisions": {torch.half}, | ||
    "debug": True, | ||
    "min_block_size": 2, | ||
    "torch_executed_ops": {"torch.ops.aten.sub.Tensor"}, | ||
    "optimization_level": 4, | ||
    "use_python_runtime": False, | ||
} | ||
|
||
# Run the model on an input to trigger compilation, like so: | ||
optimized_model_custom = torch.compile( | ||
    model_half, | ||
    backend="torch_tensorrt", | ||
    options=backend_kwargs, | ||
    dynamic=False, | ||
) | ||
optimized_model_custom(*sample_inputs_half) | ||
|
||
# %% | ||
# Cleanup | ||
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
# Finally, we use Torch utilities to clean up the workspace | ||
torch._dynamo.reset() | ||
|
||
# %% | ||
# CUDA Driver Error Note | ||
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
# | ||
# Occasionally, upon exiting the Python runtime after Dynamo compilation with `torch_tensorrt`, | ||
# one may encounter a CUDA driver error. This issue is related to https://github.com/NVIDIA/TensorRT/issues/2052 | ||
# and can be resolved by wrapping the compilation/inference in a function and using a scoped call, as in:: | ||
# | ||
#     if __name__ == '__main__': | ||
#         compile_engine_and_infer() |
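
The note above names `compile_engine_and_infer()` without defining it. One possible shape for such a function is sketched below, under a few assumptions: the single-`ReLU` model and the input shape are placeholders, `dependencies_available` is a hypothetical helper added so the sketch exits cleanly on machines without the libraries, and the `torch.compile` call mirrors the default-settings call used earlier in this script. It is not taken from the Torch-TensorRT sources.

```python
import importlib.util


def compile_engine_and_infer():
    """Compile and run a tiny model entirely inside one function scope."""
    # Heavy imports live inside the function so that every CUDA/TensorRT
    # object is created, used, and released within this scope.
    import torch
    import torch_tensorrt  # noqa: F401  (importing registers the backend)

    if not torch.cuda.is_available():
        print("CUDA is not available; skipping compilation")
        return None

    # Placeholder model and input, chosen only to keep the sketch short.
    model = torch.nn.ReLU().eval().cuda()
    sample_input = torch.rand((5, 7)).cuda()

    optimized_model = torch.compile(model, backend="torch_tensorrt", dynamic=False)
    output = optimized_model(sample_input)

    # Clean up Dynamo state before the locals go out of scope.
    torch._dynamo.reset()
    return output.cpu()


def dependencies_available():
    """Hypothetical helper: True only if both libraries can be imported."""
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("torch", "torch_tensorrt")
    )


if __name__ == "__main__":
    if dependencies_available():
        compile_engine_and_infer()
    else:
        print("torch / torch_tensorrt not installed; skipping")
```

Keeping the model, inputs, and compiled module as locals means they are released when the function returns, rather than lingering as module-level globals until interpreter shutdown.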
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
""" | ||
.. _torch_compile_stable_diffusion: | ||
Torch Compile Stable Diffusion | ||
====================================================== | ||
This interactive script is intended as a sample of the Torch-TensorRT workflow with `torch.compile` on a Stable Diffusion model. A sample output is featured below: | ||
.. image:: /tutorials/images/majestic_castle.png | ||
   :width: 512px | ||
   :height: 512px | ||
   :scale: 50 % | ||
   :align: right | ||
""" | ||
|
||
# %% | ||
# Imports and Model Definition | ||
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
import torch | ||
from diffusers import DiffusionPipeline | ||
|
||
import torch_tensorrt | ||
|
||
model_id = "CompVis/stable-diffusion-v1-4" | ||
device = "cuda:0" | ||
|
||
# Instantiate Stable Diffusion Pipeline with FP16 weights | ||
pipe = DiffusionPipeline.from_pretrained( | ||
    model_id, revision="fp16", torch_dtype=torch.float16 | ||
) | ||
pipe = pipe.to(device) | ||
|
||
backend = "torch_tensorrt" | ||
|
||
# Optimize the UNet portion with Torch-TensorRT | ||
pipe.unet = torch.compile( | ||
    pipe.unet, | ||
    backend=backend, | ||
    options={ | ||
        "truncate_long_and_double": True, | ||
        "precision": torch.float16, | ||
    }, | ||
    dynamic=False, | ||
) | ||
|
||
# %% | ||
# Inference | ||
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
prompt = "a majestic castle in the clouds" | ||
image = pipe(prompt).images[0] | ||
|
||
image.save("images/majestic_castle.png") | ||
image.show() |
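
The script saves its result to `images/majestic_castle.png`, which assumes an `images` directory already exists next to the working directory. A minimal sketch of guarding against a missing directory follows; `ensure_parent_dir` is a hypothetical helper, not part of the example or of any library, and the demonstration runs in a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path


def ensure_parent_dir(path):
    """Create the parent directory of `path` if needed; return it as a Path."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


# Demonstrate inside a temporary directory so no files are left behind.
with tempfile.TemporaryDirectory() as tmp:
    output_path = ensure_parent_dir(Path(tmp) / "images" / "majestic_castle.png")
    parent_created = output_path.parent.is_dir()

print(parent_created)  # True
```

In the script above, this could be used as `image.save(ensure_parent_dir("images/majestic_castle.png"))`.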