Hi, I have a question about why MLLMs (like GPT-4o or QWEN-VL) are used to evaluate image-editing tasks. I read the VIEScore paper (https://arxiv.org/pdf/2312.14867), and it says in Section 6 (Conclusion) that MLLMs align well with human evaluators on image generation tasks, but not on image-editing tasks.