Request: more models to test

Neither the Code Editing benchmark nor the Refactoring benchmark includes enough LLMs such as the Phi-3 / Phi-3.1 / Phi-3.5 family of models. Could these be added, since they claim to be both small and capable? I noticed this gap while looking at dashboards such as BigCodeBench and EvalPlus.