feat: AutoAWQ to compressed-tensors conversion tool (#2440)
NJX-njx wants to merge 1 commit into vllm-project:main from
Conversation
Add a conversion module that converts AutoAWQ quantized checkpoints to the compressed-tensors pack_quantized format, enabling direct loading in vLLM.

Key features:

- Handles AutoAWQ's interleaved int4 packing order [0, 2, 4, 6, 1, 3, 5, 7] and repacks weights into compressed-tensors sequential order
- Converts tensor naming: qweight → weight_packed, scales → weight_scale, qzeros → weight_zero_point
- Generates a proper quantization_config in config.json with compressed-tensors metadata
- Supports multi-shard models with proper index file rewriting
- Provides both a Python API and a CLI entry point

Closes vllm-project#2087
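For readers unfamiliar with the two layouts, here is a minimal stdlib-only sketch of the repacking for a single packed int32 word. The helper names are illustrative (the PR's actual converter vectorizes this with torch ops), and it assumes nibble i of an AWQ word holds logical element [0, 2, 4, 6, 1, 3, 5, 7][i], while compressed-tensors stores element j in bits 4j..4j+3:

```python
AWQ_ORDER = [0, 2, 4, 6, 1, 3, 5, 7]


def unpack_awq_word(word: int) -> list[int]:
    """Extract 8 int4 values from one AWQ-packed int32 word.

    Nibble i (lowest bits first) holds logical element AWQ_ORDER[i].
    """
    nibbles = [(word >> (4 * i)) & 0xF for i in range(8)]
    values = [0] * 8
    for i, logical in enumerate(AWQ_ORDER):
        values[logical] = nibbles[i]
    return values


def pack_ct_word(values: list[int]) -> int:
    """Pack 8 int4 values sequentially: element j goes in bits 4j..4j+3."""
    word = 0
    for j, v in enumerate(values):
        word |= (v & 0xF) << (4 * j)
    return word


def repack_awq_to_ct_word(word: int) -> int:
    """AWQ-packed int32 word → compressed-tensors-packed int32 word."""
    return pack_ct_word(unpack_awq_word(word))
```

For example, the logical values 0..7 packed in AWQ order give the word 0x75316420, and repacking yields the sequential word 0x76543210.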
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: this is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a new conversion tool to transform AutoAWQ quantized models into the compressed-tensors format.
Code Review
This pull request introduces a valuable conversion tool to transform AutoAWQ quantized models into the compressed-tensors format. The implementation is well-structured, with clear separation of concerns for packing/unpacking logic, key renaming, and the main conversion workflow. The inclusion of a CLI entry point and comprehensive unit tests, including a reference implementation for AWQ packing, is commendable.
My review focuses on improving the robustness of file handling, enhancing code clarity, and increasing test coverage. Specifically, I've suggested a more robust method for copying auxiliary model files, simplified a redundant conditional block in the tensor conversion loop, and recommended adding a check for zero-point correctness in the integration test.
```python
suffix = key[len(matched_prefix):]

if suffix == ".qweight":
    converted[f"{matched_prefix}.weight_packed"] = (
        _repack_awq_to_ct(tensor)
    )
elif suffix == ".scales":
    converted[f"{matched_prefix}.weight_scale"] = tensor
elif suffix == ".qzeros":
    # Zero-points are also packed with the AWQ interleave.
    zp = _unpack_awq_int4(tensor)
    converted[f"{matched_prefix}.weight_zero_point"] = zp
elif suffix == ".bias":
    converted[key] = tensor
else:
    converted[key] = tensor
```
This section for handling different tensor suffixes can be simplified. The `elif suffix == ".bias":` block is redundant because its logic is identical to the `else:` block that follows. Combining them will make the code more concise and easier to read.
```python
suffix = key[len(matched_prefix):]

if suffix == ".qweight":
    converted[f"{matched_prefix}.weight_packed"] = (
        _repack_awq_to_ct(tensor)
    )
elif suffix == ".scales":
    converted[f"{matched_prefix}.weight_scale"] = tensor
elif suffix == ".qzeros":
    # Zero-points are also packed with the AWQ interleave.
    zp = _unpack_awq_int4(tensor)
    converted[f"{matched_prefix}.weight_zero_point"] = zp
else:
    # Pass through other parameters like bias.
    converted[key] = tensor
```

```python
_auxiliary_globs = [
    "generation_config.json",
    "special_tokens_map.json",
    "merges.txt",
]
for pattern in _auxiliary_globs:
    for src in model_path.glob(pattern):
        dst = output_path / src.name
        if not dst.exists():
            shutil.copy2(src, dst)
```
The current method of copying auxiliary files using a hardcoded list of globs is brittle. It may miss important files required for the model to load correctly, such as tokenizer.json or other tokenizer-related files not covered by save_pretrained. A more robust approach is to iterate through all files in the source directory and copy any that are not explicitly generated or modified by this script. This ensures a more complete and reliable model conversion.
```python
# ----- Copy any remaining auxiliary files -----
for src in model_path.glob("*"):
    if src.is_dir() or src.suffix == ".safetensors":
        continue
    dst = output_path / src.name
    if not dst.exists():
        shutil.copy2(src, dst)
```

```python
for i in range(8):
    ct_unpacked[:, i::8] = (ct_packed >> (i * 4)) & 0xF

torch.testing.assert_close(ct_unpacked, ground_truth["weights"])
```
The integration test verifies that the repacked weights are correct, but it's missing a similar verification for the zero points. Since zero points are also transformed (unpacked from the AWQ format), it's important to add an assertion to ensure they are correctly handled in the conversion process. This will improve the test's coverage and confidence in the conversion logic.
```python
torch.testing.assert_close(ct_unpacked, ground_truth["weights"])

# Verify zero-point values are correct after unpacking
zp_unpacked = f.get_tensor(
    "model.layers.0.self_attn.q_proj.weight_zero_point"
)
torch.testing.assert_close(zp_unpacked, ground_truth["zeros"])
```
Pull request overview
Adds a new conversion utility to transform AutoAWQ int4-packed checkpoints into the compressed-tensors pack_quantized format (including renaming tensors and writing updated quantization metadata) so the result can be loaded directly by vLLM.
Changes:

- Introduces llmcompressor.conversion.autoawq_to_ct with AWQ→CT int4 repacking, tensor key renaming, config.json rewriting, and optional safetensors index rewriting.
- Adds CLI entrypoint support via module execution and exports the converter from llmcompressor.conversion.
- Adds unit + integration tests for packing/unpacking, key renaming, and an end-to-end synthetic checkpoint conversion.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/llmcompressor/conversion/autoawq_to_ct.py | Core conversion implementation (packing, renaming, config + index rewriting, CLI parser). |
| src/llmcompressor/conversion/__init__.py | Exposes convert_autoawq_to_ct from the conversion package. |
| src/llmcompressor/conversion/__main__.py | Adds a python -m llmcompressor.conversion ... entrypoint that delegates to the converter CLI. |
| tests/llmcompressor/conversion/test_autoawq_to_ct.py | New tests covering packing correctness, key renaming, and a single-shard end-to-end conversion. |
```python
def _repack_awq_to_ct(packed_awq: torch.Tensor) -> torch.Tensor:
    """One-shot conversion: AWQ-packed int32 → CT-packed int32."""
    return _pack_ct_int4(_unpack_awq_int4(packed_awq))
```
_repack_awq_to_ct expands the packed int32 weights into a full int32 matrix in _unpack_awq_int4 (8× more elements) and then repacks. For large models this intermediate can be tens of GB and may OOM CPU RAM. Consider repacking by extracting/reordering nibbles within the packed int32 tensor (staying in the packed shape) or using a chunked/streaming approach to cap peak memory.
```python
def _repack_awq_to_ct(
    packed_awq: torch.Tensor,
    max_chunk_bytes: int = 256 * 1024 * 1024,
) -> torch.Tensor:
    """Convert AWQ-packed int32 → CT-packed int32 with bounded peak memory.

    The naive implementation would unpack the entire tensor to an 8× larger
    int32 matrix and then repack it. For large models this can require tens
    of GB of RAM. To avoid that, we process the tensor in row-wise chunks:
    each chunk is unpacked and repacked independently, and the intermediate
    is immediately discarded.

    :param packed_awq: AWQ-packed int32 tensor of shape ``(rows, cols_packed)``.
    :param max_chunk_bytes: Approximate upper bound on the size of the
        unpacked intermediate per chunk, in bytes.
    :return: CT-packed int32 tensor with the same shape as ``packed_awq``.
    """
    if packed_awq.dim() != 2:
        # Keep behavior simple and explicit: this helper is for 2D weight
        # matrices. If other shapes are needed, they should be reshaped by
        # the caller.
        raise ValueError(
            f"_repack_awq_to_ct expects a 2D tensor, got shape {tuple(packed_awq.shape)}"
        )
    rows, cols_packed = packed_awq.shape
    if rows == 0 or cols_packed == 0:
        return packed_awq.clone()
    # Each packed column expands to 8 int32 values in the unpacked matrix.
    cols_unpacked = cols_packed * 8
    bytes_per_row_unpacked = cols_unpacked * 4  # int32 = 4 bytes
    # Compute a chunk size (number of rows) that keeps the unpacked
    # intermediate for a chunk under `max_chunk_bytes`. Always process at
    # least one row.
    max_rows_per_chunk = max(1, max_chunk_bytes // max(bytes_per_row_unpacked, 1))
    # Preallocate output tensor in the packed CT layout.
    packed_ct = torch.empty_like(packed_awq)
    for start in range(0, rows, max_rows_per_chunk):
        end = min(rows, start + max_rows_per_chunk)
        # Slice the current chunk of rows, convert layout, and write back.
        chunk_packed_awq = packed_awq[start:end]
        chunk_unpacked = _unpack_awq_int4(chunk_packed_awq)
        chunk_packed_ct = _pack_ct_int4(chunk_unpacked)
        packed_ct[start:end] = chunk_packed_ct
    return packed_ct
```
```python
num_bits = awq_config.get("bits", num_bits)
group_size = awq_config.get("group_size", group_size)
# AutoAWQ uses ``zero_point: True`` to indicate *asymmetric* quant.
symmetric = not awq_config.get("zero_point", True)
```
CLI-provided num_bits / group_size / symmetric are always overwritten when config.quantization_config is present, so users cannot override AutoAWQ metadata even if they pass explicit flags. If override is intended, consider using None defaults (and argparse defaults of None) so you can distinguish “not provided” from “provided”, or add an explicit --no-autodetect / --prefer-cli switch.
```python
# Only apply AutoAWQ metadata when the corresponding value is still at
# its default, so that explicit CLI arguments can override it.
if num_bits == 4:
    num_bits = awq_config.get("bits", num_bits)
if group_size == 128:
    group_size = awq_config.get("group_size", group_size)
# AutoAWQ uses ``zero_point: True`` to indicate *asymmetric* quant.
if symmetric is False:
    symmetric = not awq_config.get("zero_point", True)
```
```python
# ----- Build compressed-tensors quantization_config -----
strategy = "group" if group_size > 0 else "channel"
quant_config = {
    "quant_method": "compressed-tensors",
    "format": "pack_quantized",
    "global_compression_ratio": None,
    "config_groups": {
        "group_0": {
            "targets": ["Linear"],
            "weights": {
                "num_bits": num_bits,
                "type": "int",
                "symmetric": symmetric,
                "strategy": strategy,
                "group_size": group_size if group_size > 0 else None,
            },
            "input_activations": None,
            "output_activations": None,
        }
    },
    "ignore": ["lm_head"],
}
```
The generated quantization_config schema differs from the one produced elsewhere in this repo (e.g., entrypoints/model_free/save_utils.update_config), which includes fields like compression_version and quantization_status and constructs the config via compressed_tensors.quantization.QuantizationConfig. To avoid incompatibilities with downstream loaders expecting the standard compressed-tensors config shape, consider building this dict using QuantizationConfig/QuantizationScheme and dumping it similarly to update_config (including format, ignore, and quantization_status).
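As a rough illustration of the reviewer's point, the sketch below merges the PR's dict with the extra metadata fields they mention. The field names quantization_status and compression_version are taken from the reviewer's description of update_config and are assumptions, not verified against the library; the real fix would build the config via compressed_tensors QuantizationConfig as suggested.

```python
def with_standard_fields(quant_config: dict) -> dict:
    """Return a copy of the PR's quantization_config with the extra
    metadata fields the reviewer says the standard config carries.

    Field names/values here are illustrative placeholders, NOT the
    compressed-tensors library's verified schema.
    """
    merged = dict(quant_config)
    merged.setdefault("quantization_status", "compressed")
    merged.setdefault("compression_version", "unknown")
    return merged
```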
```python
"""Allow ``python -m llmcompressor.conversion.autoawq_to_ct``."""
```
The module docstring says python -m llmcompressor.conversion.autoawq_to_ct, but src/llmcompressor/conversion/__main__.py is only used by python -m llmcompressor.conversion. Either update the docstring to reflect the actual invocation, or consider moving this entrypoint to autoawq_to_ct/__main__.py (or rely solely on if __name__ == '__main__' already present in autoawq_to_ct.py).
```python
"""Allow ``python -m llmcompressor.conversion``."""
```
```python
def test_convert_autoawq_to_ct(fake_awq_model: Path, tmp_path: Path):
    """Full conversion pipeline: verify tensor contents and config."""
    output_dir = tmp_path / "ct_model"
    convert_autoawq_to_ct(model_path=fake_awq_model, output_path=output_dir)

    # --- config.json ---
    with open(output_dir / "config.json") as f:
        cfg = json.load(f)
    qcfg = cfg["quantization_config"]
    assert qcfg["quant_method"] == "compressed-tensors"
    assert qcfg["format"] == "pack_quantized"
    group_cfg = qcfg["config_groups"]["group_0"]["weights"]
    assert group_cfg["num_bits"] == 4
    assert group_cfg["group_size"] == 16
    assert group_cfg["symmetric"] is False

    # --- safetensors ---
    from safetensors import safe_open

    with safe_open(str(output_dir / "model.safetensors"), framework="pt") as f:
        keys = set(f.keys())
        assert "model.layers.0.self_attn.q_proj.weight_packed" in keys
        assert "model.layers.0.self_attn.q_proj.weight_scale" in keys
        assert "model.layers.0.self_attn.q_proj.weight_zero_point" in keys
        assert "model.embed_tokens.weight" in keys

        # Old AWQ keys must be gone
        assert "model.layers.0.self_attn.q_proj.qweight" not in keys
        assert "model.layers.0.self_attn.q_proj.scales" not in keys
        assert "model.layers.0.self_attn.q_proj.qzeros" not in keys
```
The integration test exercises only a single-shard model.safetensors case and doesn’t cover the multi-shard path (*.safetensors.index.json) or key rewriting in the index. Since the converter has dedicated logic for index rewriting, consider adding a test that creates 2 shards plus an index file and validates that (1) renamed keys exist in the correct output shard and (2) the rewritten weight_map matches the produced tensors.
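A stdlib-only sketch of the index-rewrite half of such a test: build a weight_map spanning two shards, apply the renames the PR documents, and assert on the rewritten map. The rename helper here is illustrative, not the PR's actual API.

```python
def rename_awq_key(key: str) -> str:
    """Apply the PR's documented tensor renames to one weight_map key."""
    renames = {
        ".qweight": ".weight_packed",
        ".scales": ".weight_scale",
        ".qzeros": ".weight_zero_point",
    }
    for old, new in renames.items():
        if key.endswith(old):
            return key[: -len(old)] + new
    return key


# A toy two-shard weight_map, as found in *.safetensors.index.json.
weight_map = {
    "model.layers.0.self_attn.q_proj.qweight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.q_proj.scales": "model-00002-of-00002.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
}
rewritten = {rename_awq_key(k): shard for k, shard in weight_map.items()}
```

A real test would additionally open each output shard and check that the renamed tensors actually live in the shard the rewritten weight_map claims.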
```python
# AWQ prefixes in *this* shard
shard_prefixes: set[str] = set()
for key in keys:
    if key.endswith(".qweight"):
        shard_prefixes.add(key.removesuffix(".qweight"))
all_awq_prefixes |= shard_prefixes

for key in tqdm(keys, desc=f" {st_file.name}", leave=False):
    tensor = f.get_tensor(key)

    # Try to match to an AWQ quantised layer
    matched_prefix = None
    for prefix in shard_prefixes:
        if key.startswith(prefix):
            matched_prefix = prefix
            break

    if matched_prefix is None:
        # Non-quantised parameter – pass through unchanged.
        converted[key] = tensor
        continue
```
Shard conversion currently identifies quantized layer prefixes only from keys ending in .qweight within the same shard. If a shard contains .scales/.qzeros for a layer whose .qweight lives in a different shard, those tensors will be passed through unconverted, while the index rewrite later will still rename them based on all_awq_prefixes, producing a broken checkpoint (index points to renamed keys that don't exist). Consider detecting AWQ tensors directly by suffix (.qweight, .scales, .qzeros) and deriving the prefix from the key itself, or do a pre-pass over all shards to collect prefixes and use the global set when converting each shard.
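One way to sketch the suffix-based detection this comment describes (helper name illustrative): derive the prefix from the key itself, so a shard holding only .scales/.qzeros for a layer whose .qweight lives elsewhere is still classified correctly, in O(1) per key.

```python
AWQ_SUFFIXES = (".qweight", ".scales", ".qzeros")


def split_awq_key(key: str):
    """Return (prefix, suffix) if key names an AWQ tensor, else (None, None).

    The prefix comes from the key itself rather than a per-shard set of
    .qweight prefixes, so classification does not depend on shard layout.
    """
    for suffix in AWQ_SUFFIXES:
        if key.endswith(suffix):
            return key[: -len(suffix)], suffix
    return None, None
```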
```python
# Try to match to an AWQ quantised layer
matched_prefix = None
for prefix in shard_prefixes:
    if key.startswith(prefix):
        matched_prefix = prefix
        break
```
The per-tensor conversion does an O(#keys × #quant_prefixes) scan (for prefix in shard_prefixes: if key.startswith(prefix)) for every key. On large sharded checkpoints this can be a noticeable CPU cost. Consider determining the prefix via known suffixes (e.g., if key.endswith('.qweight'): prefix=removesuffix(...)) or precomputing a lookup so each key is classified in O(1).
```python
"""Tests for the AutoAWQ → compressed-tensors conversion tool."""
```
Not sure how useful this test is; better to make an example that downloads an AutoAWQ model from HF and converts it. Ideally you'd run lm-eval on the output model to verify accuracy approximately matches before/after conversion.
HDCharles left a comment:
Thank you for your contribution. Please address the comments.
Summary
Adds a conversion module (llmcompressor.conversion.autoawq_to_ct) that converts AutoAWQ quantized checkpoints to the compressed-tensors pack_quantized format, enabling direct loading in vLLM without accuracy loss.

Closes #2087
Key Changes
Core conversion logic
- Unpacks AutoAWQ's interleaved int4 order [0, 2, 4, 6, 1, 3, 5, 7] and repacks weights into compressed-tensors' sequential order [0, 1, 2, 3, 4, 5, 6, 7]
- Renames tensors: qweight → weight_packed, scales → weight_scale, qzeros → weight_zero_point

Metadata generation
- Writes a quantization_config in config.json with quant_method: compressed-tensors and format: pack_quantized
- Propagates bits, group_size, and symmetric from the AWQ config

Usage
Python API:
CLI:
```shell
python -m llmcompressor.conversion.autoawq_to_ct \
  --model-path /path/to/autoawq-model \
  --output-path /path/to/output
```

Testing
References