
[NvTensorRtRtx] Add mixed precision nvModelOpt recipes for Phi 4 mini instruct #242

Open
ynankani wants to merge 2 commits into microsoft:main from ynankani:phi-4-mini-instruct-nvtensorrtrtx-recipe

Conversation

@ynankani

Add mixed precision NvModelOpt recipes for Phi-4-mini-instruct

Observed improvements in MMLU and perplexity scores for the above model with mixed-precision (INT4+INT8) quantization compared to standard INT4 quantization.
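To illustrate the mixed-precision idea, here is a toy sketch of round-to-nearest (RTN) weight quantization where one quantization-sensitive layer is kept at INT8 while the rest go to INT4. This is plain Python for illustration only, not the nvModelOpt API; the layer names, weights, and sensitivity choice are made up.

```python
# Toy sketch of mixed INT4+INT8 round-to-nearest (RTN) weight quantization.
# Not the nvModelOpt API; layer names and weights are hypothetical.

def rtn_quantize(weights, bits):
    """Symmetric round-to-nearest quantization of a list of float weights."""
    qmax = 2 ** (bits - 1) - 1                  # 7 for INT4, 127 for INT8
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / qmax
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return [v * scale for v in q]               # dequantized approximation

layers = {
    "mlp.down_proj": [0.31, -0.12, 0.07, -0.44],   # hypothetical weights
    "self_attn.qkv": [0.05, 0.91, -0.33, 0.20],
}

# Mixed recipe: keep the "sensitive" layer at 8 bits, the rest at 4 bits.
mixed = {name: rtn_quantize(w, 8 if name == "mlp.down_proj" else 4)
         for name, w in layers.items()}
pure_int4 = {name: rtn_quantize(w, 4) for name, w in layers.items()}

def max_err(original, approx):
    """Largest absolute reconstruction error across a layer's weights."""
    return max(abs(a - b) for a, b in zip(original, approx))
```

The INT8 layer reconstructs its original weights with a smaller maximum error than the same layer at INT4, which is the trade-off the mixed recipes exploit: spend extra bits only where quantization hurts accuracy most.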

**MMLU**

| Model | FP16-MB | Mixed AWQ-MO | Mixed RTN-MO | Pure INT4 AWQ-MO | Pure INT4 RTN-MO |
| --- | --- | --- | --- | --- | --- |
| Phi-4-mini-instruct | 66.70% | 65.00% | 65.20% | 64.10% | 61.60% |

**Perplexity** (isl=1024, stride=512)

| Model | FP16-MB | Mixed AWQ-MO | Mixed RTN-MO | Pure INT4 AWQ-MO | Pure INT4 RTN-MO |
| --- | --- | --- | --- | --- | --- |
| Phi-4-mini-instruct | 9.039 | 9.673 | 9.712 | 10.015 | 10.911 |
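The perplexity parameters above (isl=1024, stride=512) suggest a sliding-window evaluation: overlapping windows are scored, but each token's loss is counted only once. A minimal sketch of that accounting, assuming per-token negative log-likelihoods are already available (in a real evaluation each window would re-run the model, so a token's loss depends on its window context):

```python
import math

def sliding_window_perplexity(nlls, window=1024, stride=512):
    """Aggregate per-token negative log-likelihoods with a sliding window.

    nlls: list of per-token NLLs (a stand-in for real model outputs).
    Each window covers `window` tokens and advances by `stride`, but only
    tokens not already scored by a previous window are counted.
    """
    total, counted, prev_end = 0.0, 0, 0
    for begin in range(0, len(nlls), stride):
        end = min(begin + window, len(nlls))
        for i in range(prev_end, end):   # score only the new tokens
            total += nlls[i]
            counted += 1
        prev_end = end
        if end == len(nlls):
            break
    return math.exp(total / counted)     # perplexity = exp(mean NLL)
```

With a constant per-token NLL of c, the result is exp(c) regardless of windowing, which is a quick sanity check on the bookkeeping.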

…-instruct

Signed-off-by: unknown <ynankani@nvidia.com>
…-instruct

Signed-off-by: unknown <ynankani@nvidia.com>
@ynankani
Author

Please review and merge.

CC @devang-ml
