237 changes: 234 additions & 3 deletions tico/quantization/examples/README.md
@@ -12,15 +12,58 @@ The examples are intentionally small. Real implementation code lives in
tico/quantization/examples/
├── README.md
├── quantize.py   # Run a config-driven quantization pipeline
├── evaluate.py   # Evaluate an FP model or saved checkpoint
├── inspect.py    # Run trace/parity/debug tools
└── configs/      # Reusable recipe presets
```

## Supported top-level actions

### Which command should I use?

Use `quantize.py` when you want to run a quantization recipe. It loads the
model, builds calibration inputs, runs enabled `pipeline` stages such as GPTQ,
PTQ, SpinQuant, or CLE, and then optionally evaluates and exports the resulting
model.

Use `evaluate.py` when you only want to evaluate a floating-point model or an
already saved checkpoint. `evaluate.py` does **not** run `pipeline` stages from
the config: any enabled stages such as `gptq` or `ptq` are ignored.

Use `inspect.py` for debug-oriented workflows such as trace, parity, runtime
inspection, and wrapper-level smoke checks.

Summary:

| Command | Runs `pipeline` stages? | Builds calibration inputs? | Evaluates? | Exports? |
|---|---:|---:|---:|---:|
| `quantize.py` | Yes | Yes | If `evaluation.enabled=true` | If `export.enabled=true` |
| `evaluate.py` | No | No | Yes | No |
| `inspect.py` | Mode-dependent | Mode-dependent | Debug only | Debug only |

Common usage patterns:

```bash
# Run GPTQ/PTQ and evaluate the quantized model in the same process.
python -m tico.quantization.examples.quantize \
--config tico/quantization/examples/configs/llama_gptq_ptq.yaml \
--set evaluation.enabled=true \
--set export.enabled=false
```

```bash
# Evaluate a saved checkpoint without running quantization again.
python -m tico.quantization.examples.evaluate \
--config tico/quantization/examples/configs/llama_gptq_ptq.yaml \
--checkpoint ./out/llama/quantized_model.pt
```

### Quantize

`quantize.py` is the command that executes the recipe pipeline. It runs the
enabled stages under `pipeline` in the order they appear in the config.

```bash
python -m tico.quantization.examples.quantize \
--config tico/quantization/examples/configs/llama_gptq_ptq.yaml \
@@ -39,14 +82,64 @@ python -m tico.quantization.examples.quantize \
--model Qwen/Qwen3-VL-2B-Instruct
```
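The ordering rule for `pipeline` stages can be pictured with a small Python sketch. This is not the code in `quantize.py`; the helper name `enabled_stage_names` and the choice to treat a missing `enabled` flag as disabled are assumptions made for illustration.

```python
def enabled_stage_names(config):
    """Return names of enabled pipeline stages, preserving config order."""
    return [
        stage["name"]
        for stage in config.get("pipeline", [])
        if stage.get("enabled", False)  # assumption: a missing flag means disabled
    ]


# Hypothetical in-memory form of a recipe config.
config = {
    "pipeline": [
        {"name": "spinquant", "enabled": False},
        {"name": "cle", "enabled": False},
        {"name": "gptq", "enabled": True},
        {"name": "ptq", "enabled": True},
    ]
}

print(enabled_stage_names(config))  # prints ['gptq', 'ptq']
```

Disabled stages are skipped without changing the relative order of the stages that remain.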

Examples:

```bash
# Run quantization only. Skip evaluation and export.
python -m tico.quantization.examples.quantize \
--config tico/quantization/examples/configs/llama_gptq_ptq.yaml \
--set evaluation.enabled=false \
--set export.enabled=false
```

```bash
# Run quantization and evaluate the quantized model. Skip export.
python -m tico.quantization.examples.quantize \
--config tico/quantization/examples/configs/llama_gptq_ptq.yaml \
--set evaluation.enabled=true \
--set export.enabled=false
```

```bash
# Run quantization and export configured artifacts. Skip evaluation.
python -m tico.quantization.examples.quantize \
--config tico/quantization/examples/configs/llama_gptq_ptq.yaml \
--set evaluation.enabled=false \
--set export.enabled=true \
--output-dir ./out/llama
```

### Evaluate

`evaluate.py` evaluates the model loaded by the adapter, or the model loaded
from `--checkpoint` when one is provided. It does not prepare, calibrate,
convert, or export the model.

Running `evaluate.py` with a GPTQ/PTQ config and no `--checkpoint` evaluates the
floating-point model, because the `pipeline` section is not executed by
`evaluate.py`:

```bash
python -m tico.quantization.examples.evaluate \
--config tico/quantization/examples/configs/llama_gptq_ptq.yaml
```

To evaluate a quantized result, pass a saved checkpoint:

```bash
python -m tico.quantization.examples.evaluate \
--config tico/quantization/examples/configs/llama_gptq_ptq.yaml \
--checkpoint ./out/llama/quantized_model.pt
```

Task overrides are supported:

```bash
python -m tico.quantization.examples.evaluate \
--config tico/quantization/examples/configs/llama_gptq_ptq.yaml \
--checkpoint ./out/llama/quantized_model.pt \
--tasks winogrande,arc_easy
```

### Inspect / debug

```bash
@@ -121,6 +214,65 @@ module.name: mean|diff|=..., max|diff|=...
is omitted, trace mode records all named modules. Use a small calibration sample
count before tracing large Qwen3-VL models.

#### Wrapper smoke mode

Wrapper smoke mode runs module-level sanity checks for quantization wrappers.
It is useful when you want to quickly validate `prepare -> calibrate -> convert`,
numerical parity, plotting, and optional Circle export for a single wrapped
module.

List available wrapper smoke cases:

```bash
python -m tico.quantization.examples.inspect \
--mode wrapper-smoke \
--list-cases
```

Example output:

```text
Available wrapper smoke cases:

nn_linear
nn_conv3d
nn_conv3d_special_case
nn_layernorm
nn_tied_embedding

llama_attention_prefill
llama_attention_decode
llama_mlp
llama_decoder_layer_prefill
llama_decoder_layer_decode

qwen3_vl_text_attention
qwen3_vl_text_mlp
qwen3_vl_text_decoder_layer
qwen3_vl_text_model
qwen3_vl_vision_attention
```

Run one case:

```bash
python -m tico.quantization.examples.inspect \
--config tico/quantization/examples/configs/wrapper_smoke.yaml \
--mode wrapper-smoke \
--case llama_attention_prefill
```

Run one case with Circle export:

```bash
python -m tico.quantization.examples.inspect \
--config tico/quantization/examples/configs/wrapper_smoke.yaml \
--mode wrapper-smoke \
--case llama_attention_prefill \
--export circle \
--output-dir ./out/wrapper_smoke \
--strict
```

## CLI overrides

@@ -134,6 +286,51 @@ python -m tico.quantization.examples.quantize \
--set export.enabled=false
```

Values are parsed as YAML-like scalars when possible:

```bash
--set evaluation.enabled=true
--set calibration.n_samples=1
--set model.hf_token=null
--set runtime.dtype=float32
```
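As a rough illustration of what "YAML-like scalars" means for the values above, here is a standard-library-only Python sketch. The function names `parse_scalar` and `split_override` are hypothetical, and the actual parser in the examples package may use a YAML library and differ in edge cases.

```python
def parse_scalar(text):
    """Best-effort conversion of a --set value into a Python scalar."""
    lowered = text.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    if lowered in ("null", "~", ""):
        return None
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text  # anything else stays a string, e.g. "float32"


def split_override(arg):
    """Split 'key.path=value' into (key, parsed_value)."""
    key, _, raw = arg.partition("=")
    return key, parse_scalar(raw)


for arg in (
    "evaluation.enabled=true",
    "calibration.n_samples=1",
    "model.hf_token=null",
    "runtime.dtype=float32",
):
    print(split_override(arg))
```

Under these assumptions, `true` becomes a boolean, `1` an integer, `null` a null value, and `float32` remains a string.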

List indices are zero-based. This is most useful for toggling entries in the
`pipeline` list:

```yaml
pipeline:
  - name: spinquant  # pipeline.0
    enabled: false

  - name: cle        # pipeline.1
    enabled: false

  - name: gptq       # pipeline.2
    enabled: true

  - name: ptq        # pipeline.3
    enabled: true
```

For the example above, enable SpinQuant from the command line with:

```bash
python -m tico.quantization.examples.quantize \
--config tico/quantization/examples/configs/llama_gptq_ptq.yaml \
--set pipeline.0.enabled=true
```
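The way a dotted key such as `pipeline.0.enabled` reaches into the config can be sketched in Python as follows. This is a minimal illustration, not the override code shipped with the examples; `set_by_path` is a hypothetical helper that treats a path segment as a list index whenever the current node is a list.

```python
def set_by_path(config, dotted_key, value):
    """Assign value at a dotted path; segments index lists when the node is a list."""
    parts = dotted_key.split(".")
    node = config
    for part in parts[:-1]:
        node = node[int(part)] if isinstance(node, list) else node[part]
    last = parts[-1]
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value
    return config


# Hypothetical in-memory form of the pipeline list shown above.
config = {
    "pipeline": [
        {"name": "spinquant", "enabled": False},  # pipeline.0
        {"name": "cle", "enabled": False},        # pipeline.1
        {"name": "gptq", "enabled": True},        # pipeline.2
        {"name": "ptq", "enabled": True},         # pipeline.3
    ]
}

set_by_path(config, "pipeline.0.enabled", True)
print(config["pipeline"][0])  # prints {'name': 'spinquant', 'enabled': True}
```

Only the addressed field changes; the other stages keep their original values.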

You can also override fields inside later stages by index. For example, if the
PTQ stage is `pipeline.3`, enable SpinQuant and set the PTQ stage's SpinQuant
rotation weight bit-width with:

```bash
python -m tico.quantization.examples.quantize \
--config tico/quantization/examples/configs/llama_gptq_ptq.yaml \
--set pipeline.0.enabled=true \
--set pipeline.3.spin_rotation_weight_bits=8
```

Convenience aliases are also available:

```bash
@@ -142,8 +339,34 @@ Convenience aliases are also available:
--output-dir # overrides export.output_dir, quantize.py only
```

Prefer adding a dedicated config preset when the change adds, removes, or
reorders entries in list-valued sections such as `pipeline`. Command-line list
overrides are convenient for toggling existing stages or changing simple scalar
fields, but they are easy to misuse when the stage order differs between configs.

Good command-line overrides:

```bash
--set pipeline.0.enabled=true
--set pipeline.2.weight_bits=4
--set pipeline.3.activation_dtype=int16
```

Prefer a new config file for structural changes:

```yaml
pipeline:
  - name: spinquant
    enabled: true

  - name: gptq
    enabled: true
    weight_bits: 4

  - name: ptq
    enabled: true
    activation_dtype: int16
```
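Because index-based overrides depend on stage order, one way to avoid hitting the wrong stage is to resolve the index from the stage name before building a `--set` argument. The Python sketch below shows the idea; `stage_index` is a hypothetical helper, not something the examples package is known to provide.

```python
def stage_index(config, name):
    """Return the zero-based index of the single pipeline stage called `name`."""
    matches = [
        i
        for i, stage in enumerate(config.get("pipeline", []))
        if stage.get("name") == name
    ]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one stage named {name!r}, found {len(matches)}"
        )
    return matches[0]


# Stage order from the three-stage preset above: spinquant, gptq, ptq.
config = {
    "pipeline": [
        {"name": "spinquant", "enabled": True},
        {"name": "gptq", "enabled": True, "weight_bits": 4},
        {"name": "ptq", "enabled": True, "activation_dtype": "int16"},
    ]
}

override = f"pipeline.{stage_index(config, 'gptq')}.weight_bits=4"
print("--set", override)  # prints: --set pipeline.1.weight_bits=4
```

In this three-stage layout GPTQ sits at index 1, so `pipeline.2.weight_bits=4` would address the PTQ stage instead.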

## Developer rule: do not add one script per model or algorithm

@@ -263,3 +486,11 @@ python -m tico.quantization.examples.inspect \
--mode trace \
--set calibration.n_samples=1
```

```bash
python -m tico.quantization.examples.inspect \
--config tico/quantization/examples/configs/wrapper_smoke.yaml \
--mode wrapper-smoke \
--case nn_linear \
--no-plot
```