pytorch · SalmanMohammadi · May 28, 2025 · May 28, 2025 · May 28, 2025 · May 29, 2025
diff --git a/README.md b/README.md
@@ -213,6 +213,7 @@ We're also fortunate to be integrated into some of the leading open-source libra
 4. [TorchTune](https://pytorch.org/torchtune/main/tutorials/qlora_finetune.html?highlight=qlora) for our QLoRA and QAT recipes
 5. VLLM for LLM serving: [usage](https://docs.vllm.ai/en/latest/features/quantization/torchao.html)
 6. SGLang for LLM serving: [usage](https://docs.sglang.ai/backend/server_arguments.html#server-arguments) and the major [PR](https://github.com/sgl-project/sglang/pull/1341).
+7. Axolotl for [QAT](https://docs.axolotl.ai/docs/qat.html) and [PTQ](https://docs.axolotl.ai/docs/quantize.html)
 
 ## Videos
 * [Keynote talk at GPU MODE IRL](https://youtu.be/FH5wiwOyPX4?si=VZK22hHz25GRzBG1&t=1009)

diff --git a/torchao/quantization/qat/README.md b/torchao/quantization/qat/README.md
@@ -115,11 +115,20 @@ To fake quantize embedding in addition to linear, you can additionally call
 the following with a filter function during the prepare step:
 
 ```
-from torchao.quantization.quant_api import _is_linear
+# first apply fake quantization to the linear layers, as above
+activation_config = FakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
+weight_config = FakeQuantizeConfig(torch.int4, group_size=32)
+quantize_(
+    model,
+    IntXQuantizationAwareTrainingConfig(activation_config, weight_config),
+)
+
+# then apply weight-only transformation to embedding layers
+# activation fake quantization is not supported for embedding layers
 quantize_(
-    m,
+    model,
-    IntXQuantizationAwareTrainingConfig(weight_config=weight_config),
-    filter_fn=lambda m, _: isinstance(m, torch.nn.Embedding) or _is_linear(m),
+    IntXQuantizationAwareTrainingConfig(weight_config=weight_config),
+    filter_fn=lambda m, _: isinstance(m, torch.nn.Embedding)
 )
 ```
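
Reviewer note: to see what the narrowed `filter_fn` changes, here is a minimal sketch that uses only plain PyTorch. It assumes `filter_fn` is called as `(module, fully_qualified_name) -> bool`, matching the lambda in the hunk above. `TinyModel` and `select_modules` are hypothetical helpers written for this illustration and are not part of torchao. The old predicate is approximated with a plain `isinstance` check on `torch.nn.Linear`, which may be looser than the private `_is_linear` helper.

```python
import torch


# Hypothetical toy model, only here to have one embedding and two linear layers.
class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = torch.nn.Embedding(16, 8)
        self.proj = torch.nn.Linear(8, 8)
        self.head = torch.nn.Linear(8, 16)

    def forward(self, tokens):
        return self.head(self.proj(self.embed(tokens)))


def select_modules(model, filter_fn):
    """Return the names of submodules for which filter_fn(module, name) is True.

    This only mimics the selection step; it does not swap or quantize anything.
    """
    return [name for name, module in model.named_modules() if filter_fn(module, name)]


# New predicate from the diff: embeddings only.
embedding_only = lambda m, _: isinstance(m, torch.nn.Embedding)

# Approximation of the old predicate: embeddings or linear layers.
embedding_or_linear = lambda m, _: isinstance(m, (torch.nn.Embedding, torch.nn.Linear))

model = TinyModel()
new_selection = select_modules(model, embedding_only)
old_selection = select_modules(model, embedding_or_linear)

print("new filter selects:", new_selection)
print("old filter selects:", old_selection)
```

Under these assumptions the old predicate also matches the linear layers, which the weight-only config would then have covered, while the new one leaves them to the earlier `quantize_` call that carries both the activation and weight configs.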