Evaluation Parameters for models on Office-Bench

Thanks for the great work on Simia-OB! I'm trying to reproduce the results for Simia-OB (Qwen3-8B) on Office-Bench, but so far I haven't been able to match the reported accuracy. Could you share the evaluation parameters or scripts used in the original experiments? Thanks again!