diff --git a/tico/quantization/examples/README.md b/tico/quantization/examples/README.md
index 82d263e5..22b87aec 100644
--- a/tico/quantization/examples/README.md
+++ b/tico/quantization/examples/README.md
@@ -12,15 +12,58 @@ The examples are intentionally small. Real implementation code lives in
 tico/quantization/examples/
 ├── README.md
 ├── quantize.py # Run a config-driven quantization pipeline
-├── evaluate.py # Evaluate an FP model or saved fake-quant checkpoint
+├── evaluate.py # Evaluate an FP model or saved checkpoint
 ├── inspect.py # Run trace/parity/debug tools
 └── configs/ # Reusable recipe presets
 ```
 
 ## Supported top-level actions
 
+### Which command should I use?
+
+Use `quantize.py` when you want to run a quantization recipe. It loads the
+model, builds calibration inputs, runs enabled `pipeline` stages such as GPTQ,
+PTQ, SpinQuant, or CLE, and then optionally evaluates and exports the resulting
+model.
+
+Use `evaluate.py` when you only want to evaluate a floating-point model or an
+already saved checkpoint. `evaluate.py` does **not** run `pipeline` stages from
+the config. If the config contains enabled stages such as `gptq` or `ptq`, they
+are ignored by `evaluate.py`.
+
+Use `inspect.py` for debug-oriented workflows such as trace, parity, runtime
+inspection, and wrapper-level smoke checks.
+
+Summary:
+
+| Command | Runs `pipeline` stages? | Builds calibration inputs? | Evaluates? | Exports? |
+|---|---:|---:|---:|---:|
+| `quantize.py` | Yes | Yes | If `evaluation.enabled=true` | If `export.enabled=true` |
+| `evaluate.py` | No | No | Yes | No |
+| `inspect.py` | Mode-dependent | Mode-dependent | Debug only | Debug only |
+
+Common usage patterns:
+
+```bash
+# Run GPTQ/PTQ and evaluate the quantized model in the same process.
+python -m tico.quantization.examples.quantize \
+  --config tico/quantization/examples/configs/llama_gptq_ptq.yaml \
+  --set evaluation.enabled=true \
+  --set export.enabled=false
+```
+
+```bash
+# Evaluate a saved checkpoint without running quantization again.
+python -m tico.quantization.examples.evaluate \
+  --config tico/quantization/examples/configs/llama_gptq_ptq.yaml \
+  --checkpoint ./out/llama/quantized_model.pt
+```
+
 ### Quantize
 
+`quantize.py` is the command that executes the recipe pipeline. It runs the
+enabled stages under `pipeline` in the order they appear in the config.
+
 ```bash
 python -m tico.quantization.examples.quantize \
   --config tico/quantization/examples/configs/llama_gptq_ptq.yaml \
@@ -39,14 +82,64 @@ python -m tico.quantization.examples.quantize \
   --model Qwen/Qwen3-VL-2B-Instruct
 ```
 
+Examples:
+
+```bash
+# Run quantization only. Skip evaluation and export.
+python -m tico.quantization.examples.quantize \
+  --config tico/quantization/examples/configs/llama_gptq_ptq.yaml \
+  --set evaluation.enabled=false \
+  --set export.enabled=false
+```
+
+```bash
+# Run quantization and evaluate the quantized model. Skip export.
+python -m tico.quantization.examples.quantize \
+  --config tico/quantization/examples/configs/llama_gptq_ptq.yaml \
+  --set evaluation.enabled=true \
+  --set export.enabled=false
+```
+
+```bash
+# Run quantization and export configured artifacts. Skip evaluation.
+python -m tico.quantization.examples.quantize \
+  --config tico/quantization/examples/configs/llama_gptq_ptq.yaml \
+  --set evaluation.enabled=false \
+  --set export.enabled=true \
+  --output-dir ./out/llama
+```
+
 ### Evaluate
 
+`evaluate.py` evaluates the model loaded by the adapter, or the model loaded
+from `--checkpoint` when one is provided. It does not prepare, calibrate,
+convert, or export the model.
+
+Running this command with a GPTQ/PTQ config evaluates the floating-point model,
+because the `pipeline` section is not executed by `evaluate.py`:
+
+```bash
+python -m tico.quantization.examples.evaluate \
+  --config tico/quantization/examples/configs/llama_gptq_ptq.yaml
+```
+
+To evaluate a quantized result, pass a saved checkpoint:
+
 ```bash
 python -m tico.quantization.examples.evaluate \
   --config tico/quantization/examples/configs/llama_gptq_ptq.yaml \
   --checkpoint ./out/llama/quantized_model.pt
 ```
 
+Task overrides are supported:
+
+```bash
+python -m tico.quantization.examples.evaluate \
+  --config tico/quantization/examples/configs/llama_gptq_ptq.yaml \
+  --checkpoint ./out/llama/quantized_model.pt \
+  --tasks winogrande,arc_easy
+```
+
 ### Inspect / debug
 
 ```bash
@@ -121,6 +214,65 @@ module.name: mean|diff|=..., max|diff|=...
 is omitted, trace mode records all named modules. Use a small calibration
 sample count before tracing large Qwen3-VL models.
 
+#### Wrapper smoke mode
+
+Wrapper smoke mode runs module-level sanity checks for quantization wrappers.
+It is useful when you want to quickly validate `prepare -> calibrate -> convert`,
+numerical parity, plotting, and optional Circle export for a single wrapped
+module.
+
+List available wrapper smoke cases:
+
+```bash
+python -m tico.quantization.examples.inspect \
+  --mode wrapper-smoke \
+  --list-cases
+```
+
+Example output:
+
+```text
+Available wrapper smoke cases:
+
+  nn_linear
+  nn_conv3d
+  nn_conv3d_special_case
+  nn_layernorm
+  nn_tied_embedding
+
+  llama_attention_prefill
+  llama_attention_decode
+  llama_mlp
+  llama_decoder_layer_prefill
+  llama_decoder_layer_decode
+
+  qwen3_vl_text_attention
+  qwen3_vl_text_mlp
+  qwen3_vl_text_decoder_layer
+  qwen3_vl_text_model
+  qwen3_vl_vision_attention
+```
+
+Run one case:
+
+```bash
+python -m tico.quantization.examples.inspect \
+  --config tico/quantization/examples/configs/wrapper_smoke.yaml \
+  --mode wrapper-smoke \
+  --case llama_attention_prefill
+```
+
+Run one case with Circle export:
+
+```bash
+python -m tico.quantization.examples.inspect \
+  --config tico/quantization/examples/configs/wrapper_smoke.yaml \
+  --mode wrapper-smoke \
+  --case llama_attention_prefill \
+  --export circle \
+  --output-dir ./out/wrapper_smoke \
+  --strict
+```
 
 ## CLI overrides
 
@@ -134,6 +286,51 @@ python -m tico.quantization.examples.quantize \
   --set export.enabled=false
 ```
 
+Values are parsed as YAML-like scalars when possible:
+
+```bash
+--set evaluation.enabled=true
+--set calibration.n_samples=1
+--set model.hf_token=null
+--set runtime.dtype=float32
+```
+
+List indices are zero-based. This is most useful for toggling entries in the
+`pipeline` list:
+
+```yaml
+pipeline:
+  - name: spinquant # pipeline.0
+    enabled: false
+
+  - name: cle # pipeline.1
+    enabled: false
+
+  - name: gptq # pipeline.2
+    enabled: true
+
+  - name: ptq # pipeline.3
+    enabled: true
+```
+
+For the example above, enable SpinQuant from the command line with:
+
+```bash
+python -m tico.quantization.examples.quantize \
+  --config tico/quantization/examples/configs/llama_gptq_ptq.yaml \
+  --set pipeline.0.enabled=true
+```
+
+You can also override fields inside later stages by index. For example, if the
+PTQ stage is `pipeline.3`, change its SpinQuant rotation weight bit-width with:
+
+```bash
+python -m tico.quantization.examples.quantize \
+  --config tico/quantization/examples/configs/llama_gptq_ptq.yaml \
+  --set pipeline.0.enabled=true \
+  --set pipeline.3.spin_rotation_weight_bits=8
+```
+
 Convenience aliases are also available:
 
 ```bash
@@ -142,8 +339,34 @@ Convenience aliases are also available:
 --output-dir # overrides export.output_dir, quantize.py only
 ```
 
-Prefer adding a dedicated config preset when the change modifies list-valued
-sections such as `pipeline`.
+Prefer adding a dedicated config preset when the change adds, removes, or
+reorders entries in list-valued sections such as `pipeline`. Command-line list
+overrides are convenient for toggling existing stages or changing simple scalar
+fields, but they are easy to misuse when the stage order differs between configs.
+
+Good command-line overrides:
+
+```bash
+--set pipeline.0.enabled=true
+--set pipeline.2.weight_bits=4
+--set pipeline.3.activation_dtype=int16
+```
+
+Prefer a new config file for structural changes:
+
+```yaml
+pipeline:
+  - name: spinquant
+    enabled: true
+
+  - name: gptq
+    enabled: true
+    weight_bits: 4
+
+  - name: ptq
+    enabled: true
+    activation_dtype: int16
+```
 
 ## Developer rule: do not add one script per model or algorithm
 
@@ -263,3 +486,11 @@ python -m tico.quantization.examples.inspect \
   --mode trace \
   --set calibration.n_samples=1
 ```
+
+```bash
+python -m tico.quantization.examples.inspect \
+  --config tico/quantization/examples/configs/wrapper_smoke.yaml \
+  --mode wrapper-smoke \
+  --case nn_linear \
+  --no-plot
+```
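
The `--set` semantics documented in this change (dotted paths, zero-based list indices, YAML-like scalar parsing) can be illustrated with a small Python sketch. This is not the code in `tico.quantization.examples`; `parse_scalar` and `apply_override` are hypothetical helpers, and the scalar rules are an assumption about what "YAML-like" means here (booleans, `null`, integers, floats, otherwise a plain string).

```python
from typing import Any


def parse_scalar(text: str) -> Any:
    """Parse a YAML-like scalar: true/false, null, int, float, else string."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    if lowered in ("null", "~"):
        return None
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text  # e.g. "float32" stays a string


def apply_override(config: Any, assignment: str) -> None:
    """Apply one `dotted.path=value` override in place (hypothetical helper)."""
    path, sep, raw = assignment.partition("=")
    if not sep:
        raise ValueError(f"expected KEY=VALUE, got {assignment!r}")
    *parents, leaf = path.split(".")
    node = config
    for key in parents:
        # Path segments index into lists by zero-based position, dicts by key.
        node = node[int(key)] if isinstance(node, list) else node[key]
    if isinstance(node, list):
        node[int(leaf)] = parse_scalar(raw)
    else:
        node[leaf] = parse_scalar(raw)


# Toy config mirroring the stage order shown in the README example.
config = {
    "evaluation": {"enabled": False},
    "pipeline": [
        {"name": "spinquant", "enabled": False},
        {"name": "cle", "enabled": False},
        {"name": "gptq", "enabled": True},
        {"name": "ptq", "enabled": True},
    ],
}

for item in (
    "evaluation.enabled=true",
    "pipeline.0.enabled=true",
    "pipeline.2.weight_bits=4",
):
    apply_override(config, item)

print(config["pipeline"][0])  # {'name': 'spinquant', 'enabled': True}
```

Because `pipeline.2` is resolved purely by position, the same override applied to a config with a different stage order would silently modify a different stage, which is the misuse the README text warns about.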