Hi,
On very large images (up to 5000×7000 pixels), the model detects some text but misses many instances, and recognition mostly fails. I fine-tuned on 1024×1024 images, but adjusting the parameters to accept larger inputs usually causes out-of-memory issues. I'd consider fine-tuning on higher-resolution data, but I only have a limited number of samples.
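For a sense of scale, here is a rough calculation. It assumes (this is not confirmed) that the pipeline resizes the whole image so its longest side matches the 1024-pixel training size. The `downscale_factor` helper and the 40-pixel text height are hypothetical and only for illustration:

```python
def downscale_factor(width: int, height: int, target: int = 1024) -> float:
    """Factor by which an image shrinks when its longest side is resized to `target`.

    Assumption: aspect-ratio-preserving resize of the longest side.
    """
    return max(width, height) / target


# Largest image size mentioned above.
factor = downscale_factor(5000, 7000)

# Hypothetical text height in the original image, in pixels.
text_height_px = 40

print(f"downscale factor: {factor:.2f}")
print(f"text height after resize: {text_height_px / factor:.1f} px")
```

If that assumption holds, text would shrink by roughly the same factor as the image, which might explain the missed detections.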
Besides tiling, are there any suggestions for handling such large images?
Thank you!