add int8 quantization support for llm models by lanluo-nvidia · Pull Request #4086 · pytorch/TensorRT

lanluo-nvidia · 2026-02-19T21:27:19Z

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Please delete options that are not relevant and/or add your own.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Checklist:

My code follows the style guidelines of this project (You can use the linters)
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas and hacks
I have made corresponding changes to the documentation
I have added tests to verify my fix or my feature
New and existing unit tests pass locally with my changes
I have added the relevant labels to my PR in so that relevant reviewers are notified

github-actions

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/tools/llm/quantize_utils.py	2026-02-19 21:27:32.249942+00:00
+++ /home/runner/work/TensorRT/TensorRT/tools/llm/quantize_utils.py	2026-02-19 21:28:16.257493+00:00
@@ -57,16 +57,20 @@
        device="cuda:0",
    )
    if args.quant_format == "int8":
        if args.quant_algo == "smoothquant":
            if args.weight_only:
-                raise RuntimeError("SmoothQuant is supported for weight-and-activation quantization, weight-only flag should not be set")
+                raise RuntimeError(
+                    "SmoothQuant is supported for weight-and-activation quantization, weight-only flag should not be set"
+                )
            quant_cfg = mtq.INT8_SMOOTHQUANT_CFG
        elif args.weight_only:
            quant_cfg = mtq.INT8_WEIGHT_ONLY_CFG
        else:
-            raise RuntimeError(f"Unsupported args.quant_algo: {args.quant_algo} and args.weight_only: {args.weight_only} for int8 quantization")
+            raise RuntimeError(
+                f"Unsupported args.quant_algo: {args.quant_algo} and args.weight_only: {args.weight_only} for int8 quantization"
+            )
    elif args.quant_format == "fp8":
        quant_cfg = mtq.FP8_DEFAULT_CFG
    elif args.quant_format == "nvfp4":
        quant_cfg = mtq.NVFP4_DEFAULT_CFG
    else:
--- /home/runner/work/TensorRT/TensorRT/tools/llm/run_llm.py	2026-02-19 21:27:32.249942+00:00
+++ /home/runner/work/TensorRT/TensorRT/tools/llm/run_llm.py	2026-02-19 21:28:16.328324+00:00
@@ -269,11 +269,11 @@
        default="max",
    )
    arg_parser.add_argument(
        "--weight_only",
        help=("Apply weight only quantization. True (default: False)"),
-       action="store_true",
+        action="store_true",
    )
    args = arg_parser.parse_args()

    with torch.inference_mode():
        model = get_model(args)
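The int8 branch in the `quantize_utils.py` diff above can be summarized as a small, testable function. This is a minimal sketch, not the PR's code. `select_quant_cfg` is a hypothetical name. The ModelOpt config objects (`mtq.INT8_SMOOTHQUANT_CFG`, `mtq.INT8_WEIGHT_ONLY_CFG`, `mtq.FP8_DEFAULT_CFG`, `mtq.NVFP4_DEFAULT_CFG`) are replaced with plain string stand-ins so the sketch runs without ModelOpt installed. The final `else:` branch is truncated in the diff, so the sketch assumes it raises `RuntimeError`.

```python
from types import SimpleNamespace

# Hypothetical string stand-ins for the ModelOpt config objects named in the diff;
# they keep this sketch runnable without modelopt installed.
INT8_SMOOTHQUANT_CFG = "INT8_SMOOTHQUANT_CFG"
INT8_WEIGHT_ONLY_CFG = "INT8_WEIGHT_ONLY_CFG"
FP8_DEFAULT_CFG = "FP8_DEFAULT_CFG"
NVFP4_DEFAULT_CFG = "NVFP4_DEFAULT_CFG"


def select_quant_cfg(args):
    """Mirror the branch structure shown in the quantize_utils.py diff."""
    if args.quant_format == "int8":
        if args.quant_algo == "smoothquant":
            # SmoothQuant quantizes weights and activations, so weight-only is rejected.
            if args.weight_only:
                raise RuntimeError(
                    "SmoothQuant is supported for weight-and-activation quantization, "
                    "weight-only flag should not be set"
                )
            return INT8_SMOOTHQUANT_CFG
        if args.weight_only:
            # Any non-SmoothQuant algorithm combined with --weight_only maps to weight-only INT8.
            return INT8_WEIGHT_ONLY_CFG
        raise RuntimeError(
            f"Unsupported args.quant_algo: {args.quant_algo} and "
            f"args.weight_only: {args.weight_only} for int8 quantization"
        )
    if args.quant_format == "fp8":
        return FP8_DEFAULT_CFG
    if args.quant_format == "nvfp4":
        return NVFP4_DEFAULT_CFG
    # Assumption: the truncated `else:` branch in the diff rejects other formats.
    raise RuntimeError(f"Unsupported args.quant_format: {args.quant_format}")


if __name__ == "__main__":
    args = SimpleNamespace(quant_format="int8", quant_algo="max", weight_only=True)
    print(select_quant_cfg(args))  # prints INT8_WEIGHT_ONLY_CFG
```

With this branch structure, `int8` with a non-SmoothQuant algorithm is accepted only when `--weight_only` is set; without it, the function raises.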

github-actions

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/tools/llm/quantize_utils.py	2026-02-20 00:10:03.446101+00:00
+++ /home/runner/work/TensorRT/TensorRT/tools/llm/quantize_utils.py	2026-02-20 00:10:48.765198+00:00
@@ -57,16 +57,20 @@
        device="cuda:0",
    )
    if args.quant_format == "int8":
        if args.quant_algo == "smoothquant":
            if args.weight_only:
-                raise RuntimeError("SmoothQuant is supported for weight-and-activation quantization, weight-only flag should not be set")
+                raise RuntimeError(
+                    "SmoothQuant is supported for weight-and-activation quantization, weight-only flag should not be set"
+                )
            quant_cfg = mtq.INT8_SMOOTHQUANT_CFG
        elif args.weight_only:
            quant_cfg = mtq.INT8_WEIGHT_ONLY_CFG
        else:
-            raise RuntimeError(f"Unsupported args.quant_algo: {args.quant_algo} and args.weight_only: {args.weight_only} for int8 quantization")
+            raise RuntimeError(
+                f"Unsupported args.quant_algo: {args.quant_algo} and args.weight_only: {args.weight_only} for int8 quantization"
+            )
    elif args.quant_format == "fp8":
        quant_cfg = mtq.FP8_DEFAULT_CFG
    elif args.quant_format == "nvfp4":
        quant_cfg = mtq.NVFP4_DEFAULT_CFG
    else:
--- /home/runner/work/TensorRT/TensorRT/tools/llm/run_llm.py	2026-02-20 00:10:03.446101+00:00
+++ /home/runner/work/TensorRT/TensorRT/tools/llm/run_llm.py	2026-02-20 00:10:49.258692+00:00
@@ -269,11 +269,11 @@
        default="max",
    )
    arg_parser.add_argument(
        "--weight_only",
        help=("Apply weight only quantization. True (default: False)"),
-       action="store_true",
+        action="store_true",
    )
    args = arg_parser.parse_args()

    with torch.inference_mode():
        model = get_model(args)
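The lint fix in `run_llm.py` only re-indents `action="store_true",` to line up with the other keyword arguments; the behavior of `--weight_only` is unchanged. Below is a minimal sketch of that behavior using only the standard `argparse` module. The standalone parser and the `build_parser` name are hypothetical and are not `run_llm.py`'s actual parser. A `store_true` flag defaults to `False` and becomes `True` when passed on the command line.

```python
import argparse


def build_parser():
    """Hypothetical parser containing only the --weight_only flag from the diff."""
    parser = argparse.ArgumentParser(description="Sketch of the --weight_only flag")
    parser.add_argument(
        "--weight_only",
        help="Apply weight only quantization. True (default: False)",
        action="store_true",  # absent -> False, present -> True
    )
    return parser


if __name__ == "__main__":
    parser = build_parser()
    print(parser.parse_args([]).weight_only)  # prints False
    print(parser.parse_args(["--weight_only"]).weight_only)  # prints True
```

Per the `quantize_utils.py` diff, this flag is what selects weight-only INT8 when `int8` is used without SmoothQuant, and it must be left unset when SmoothQuant is requested.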

narendasan

LGTM

add int8 quantization support for llm models

245f24f

meta-cla bot added the cla signed label Feb 19, 2026

github-actions bot requested changes Feb 19, 2026


Merge branch 'main' into lluo/int8_non_prequantized

7265976

github-actions bot requested changes Feb 20, 2026


fix lint

3128131

lanluo-nvidia marked this pull request as ready for review February 20, 2026 00:18

narendasan approved these changes Feb 20, 2026


lanluo-nvidia wants to merge 3 commits into main from lluo/int8_non_prequantized

2 participants
