Problem Description
I am encountering severe model collapse when quantizing the Llama 3.1 8B Instruct model to 2-bit (W2A16) using AutoRound.
Despite using the "Best" configuration (iters=1000, nsamples=512, enable_alg_ext), the resulting model loses all language capability and outputs only repeating exclamation marks (e.g., !!!!!!!!!!!!!!!!!!!) when loaded in vLLM.
Expected Behavior:
In previous benchmarks, Llama 2 (7B/13B) retained some reasoning capability under 2-bit quantization, so I expected Llama 3.1 8B to at least generate coherent text.
Current Behavior:
- The model outputs only repeating characters (garbage); a minimal generation check is sketched below.
- The GSM8K score is exactly 0.
- However, the same setup works perfectly at 4-bit (W4A16), achieving 76.6% on GSM8K, which confirms that my environment and basic workflow are correct.
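The repeating-character output can be confirmed with a minimal offline vLLM generation check. The snippet below is only a sketch: the checkpoint path and prompt are placeholders, not taken from the original report.

python -c "
from vllm import LLM, SamplingParams

# Load the quantized checkpoint (path is a placeholder)
llm = LLM(model='./llama3.1-8B-Instruct-W2A16-Best')

# Greedy decoding on a trivial prompt
outputs = llm.generate(['What is 2 + 2?'], SamplingParams(temperature=0, max_tokens=32))

# With the collapsed 2-bit model this prints a run of exclamation marks
print(outputs[0].outputs[0].text)
"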
Reproduction Steps
Run the AutoRound quantization with the following "Best" settings for 2-bit:
auto_round \
--model_name "./llama3.1-8B-Instruct" \
--bits 2 \
--group_size 128 \
--format "auto_round" \
--iters 1000 \
--nsamples 512 \
--seqlen 2048 \
--batch_size 8 \
--minmax_lr 2e-3 \
--enable_alg_ext \
--output_dir "./llama3.1-8B-Instruct-W2A16-Best"
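For context, the GSM8K numbers above can be reproduced by scoring the resulting checkpoint with lm-evaluation-harness on top of vLLM. The command below is a sketch of such a run; the checkpoint path, memory setting, and few-shot count are assumptions, not taken from the original report.

# Score the quantized checkpoint on GSM8K using the vLLM backend of lm-evaluation-harness
lm_eval \
  --model vllm \
  --model_args pretrained=./llama3.1-8B-Instruct-W2A16-Best,gpu_memory_utilization=0.85 \
  --tasks gsm8k \
  --num_fewshot 5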
Environment Information
- Model: Llama 3.1 8B Instruct
- AutoRound Version: 0.9.4
- Inference Engine: vLLM
- GPU: RTX 6000 Ada
- CUDA Version: 12.8
Error Logs
Additional Context
No response