Many text instances not detected on large images

Hi,

On very large images (up to 5000×7000), the model detects some text but misses many instances, and recognition mostly fails. I fine-tuned on 1024×1024 images, but adjusting the parameters to accept larger inputs usually causes memory issues. I’d consider fine-tuning on higher-resolution data, but I only have a small number of such samples.
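
For a sense of scale, here is a rough sketch of the size gap between my largest images and the fine-tuning resolution. The helper `pixel_ratio` is just something I wrote for illustration. The idea that memory grows roughly in proportion to pixel count is my assumption, not something I measured.

```python
def pixel_ratio(large_hw, train_hw):
    """Return how many times more pixels the large image has than a training image."""
    large_h, large_w = large_hw
    train_h, train_w = train_hw
    return (large_h * large_w) / (train_h * train_w)


# Largest image I have (5000x7000) vs. the 1024x1024 fine-tuning size.
ratio = pixel_ratio((7000, 5000), (1024, 1024))
print(f"{ratio:.1f}x more pixels than a 1024x1024 training image")  # prints 33.4x
```

So a full-resolution pass covers about 33 times as many pixels as what the model was fine-tuned on, which would be consistent with the memory issues I run into.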

Besides tiling, any suggestions for handling such large images?

Thank you!