Conversation
Closes #2305

This PR adds:
- `gsm8k_eval.py`: evaluation script for running GSM8K benchmarks on quantized models
- `RESULTS.md`: quantization and evaluation results for Qwen2.5-0.5B-Instruct with the FP8_DYNAMIC and FP8_BLOCK schemes

Key findings:
- FP8_DYNAMIC achieves 22.67% strict match vs 17.97% for FP8_BLOCK on GSM8K
- Both schemes achieve ~1.2x compression (1.1GB -> 0.92GB)
- Quantized models uploaded to the HuggingFace Hub for reproducibility

Evaluated on a Google Colab L4 GPU (22.5GB) using the existing example scripts.

Signed-off-by: rtj1 <tharunjagarlamudi@gmail.com>
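For context on the "strict match" numbers above: GSM8K reference solutions end with a `#### <answer>` line, and strict-match scoring credits a prediction only when its final answer, emitted in that exact format, equals the reference's. The real scoring is done by the evaluation harness; the sketch below, with hypothetical helper names `extract_answer` and `strict_match`, only illustrates the idea.

```python
import re

# Hypothetical helpers illustrating GSM8K "strict match" scoring.
# GSM8K reference solutions end with a line of the form "#### <answer>".

def extract_answer(text):
    """Return the final '#### <answer>' value from a completion, or None."""
    m = re.search(r"####\s*(-?[\d,]+(?:\.\d+)?)", text)
    return m.group(1).replace(",", "") if m else None

def strict_match(prediction, reference):
    """Credit the prediction only if its '####'-formatted answer matches."""
    pred = extract_answer(prediction)
    return pred is not None and pred == extract_answer(reference)

print(strict_match("She sold 6 + 4 = 10 clips.\n#### 10", "... #### 10"))  # True
print(strict_match("The answer is 10.", "... #### 10"))                    # False
```

A flexible-match variant would also accept the second prediction; the gap between strict and flexible scores is one signal of how well a quantized model preserves output formatting.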
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Jagarlamudi <76727507+rtj1@users.noreply.github.com>
Summary of Changes

Hello @HDCharles, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request introduces a new evaluation script and results for AWQ+FP8 quantized models on the GSM8K benchmark. It provides a direct comparison between the FP8_DYNAMIC and FP8_BLOCK quantization schemes, reporting their accuracy on a Qwen2.5-0.5B-Instruct model and offering a clear recommendation for preserving accuracy. The changes enable standardized benchmarking and give insight into the effectiveness of the two quantization approaches.
Activity
Summary

Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>
Code Review
This pull request introduces evaluation results for AWQ+FP8 quantization in a `RESULTS.md` file and adds a Python script, `gsm8k_eval.py`, for running the evaluation. The results documentation is clear and well-structured. The evaluation script, however, contains a significant bug: duplicated code and logic placed outside the `if __name__ == "__main__"` guard. This could lead to runtime errors if the script is imported as a module. I've provided a suggestion to refactor the script's entry point to resolve these issues.
```python
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Eval quantized models on GSM8K")
    parser.add_argument("model_path", help="Path to quantized model directory")
    args = parser.parse_args()

    if not os.path.isdir(args.model_path):
        print(f"Error: Model path not found: {args.model_path}", file=sys.stderr)
        sys.exit(1)

if not os.path.isdir(args.model_path):
    print(f"Error: Model path not found: {args.model_path}", file=sys.stderr)
    sys.exit(1)

evaluate_model(args.model_path)
```
This block contains duplicated code and has script logic outside the `if __name__ == "__main__"` guard. The `os.path.isdir` check is repeated, and `evaluate_model` is called at the module level. This will cause a `NameError` if the file is imported as a module, because `args` will not be defined. The script-running logic should be consolidated within the `if __name__ == "__main__"` block to fix the duplication and prevent import-time errors.
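A quick way to see the failure mode described here, as a minimal sketch (`buggy_source` below is a stand-in, not the actual `gsm8k_eval.py`): module-level code that references `args` raises `NameError` the moment the module is imported.

```python
import textwrap

# Stand-in for a script whose top level references `args`, as in the review;
# compiling and exec-ing it simulates what `import` would do.
buggy_source = textwrap.dedent("""
    def evaluate_model(path):
        return path

    # Module-level call: runs at import time, but `args` was only ever
    # defined inside the `if __name__ == "__main__"` block.
    evaluate_model(args.model_path)
""")

namespace = {}
try:
    exec(compile(buggy_source, "buggy_script.py", "exec"), namespace)
except NameError as exc:
    print(f"import-time failure: {exc}")  # name 'args' is not defined
```

Moving every statement that uses `args` under the guard makes the module safe to import, which matters for tooling (test collectors, doc builders) that imports files without running them as scripts.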
Suggested replacement, with the duplicate check removed and everything consolidated inside the guard:

```python
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Eval quantized models on GSM8K")
    parser.add_argument("model_path", help="Path to quantized model directory")
    args = parser.parse_args()

    if not os.path.isdir(args.model_path):
        print(f"Error: Model path not found: {args.model_path}", file=sys.stderr)
        sys.exit(1)

    evaluate_model(args.model_path)
```
- Update RESULTS.md with HDCharles's Llama-3-8B-Instruct evaluation results
- FP8_DYNAMIC: 76.42% strict match vs FP8_BLOCK: 75.21%
- Run `make style` with proper dev dependencies (`pip install -e .[dev]`)
- Fix code formatting per maintainer feedback

Results from: vllm-project#2347

Signed-off-by: rtj1 <tharunjagarlamudi@gmail.com>
#2330