Inquiry Regarding Arabic and Urdu Text Support in SAM Model #28

@hussnainmuavia

I’ve been experimenting with the Hi-SAM model for text segmentation and have a few questions about using it with non-Latin scripts, specifically Arabic and Urdu.

Language Support:
Can Hi-SAM be used effectively on Arabic or Urdu text, including text rendered in fonts specific to these scripts? If so, are there any particular considerations or settings that should be adjusted?

Masking Issue with Tall Glyphs:
While testing, I noticed that certain Arabic and Urdu characters, especially those with tall ascenders or deep descenders, extend beyond the typical text-line boundaries. As a result, when a mask is applied, these portions are often cropped off, leaving unwanted noise behind. Do you have any recommendations for handling such cases?
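One generic workaround (an assumption, not something documented for Hi-SAM) is to enlarge the text-line region vertically before cropping, or to dilate the predicted binary mask along the vertical axis so strokes just outside its edge are kept. The sketch below shows both in plain Python; the helper names `pad_line_box` and `dilate_vertical`, the box layout, and the 0.3 default ratios are all hypothetical placeholders.

```python
import math
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # hypothetical layout: (x_min, y_min, x_max, y_max) in pixels
Mask = List[List[int]]           # binary mask, mask[y][x] in {0, 1}


def pad_line_box(box: Box, img_w: int, img_h: int,
                 up_ratio: float = 0.3, down_ratio: float = 0.3) -> Box:
    """Grow a text-line box vertically so tall ascenders, deep descenders,
    and dots/diacritics above or below the line are less likely to be clipped.

    Ratios are fractions of the original box height; the 0.3 defaults are
    placeholder assumptions, not values tuned for any font. The result is
    clamped to the image bounds."""
    x0, y0, x1, y1 = box
    h = y1 - y0
    new_y0 = max(0, math.floor(y0 - up_ratio * h))      # round outward at the top
    new_y1 = min(img_h, math.ceil(y1 + down_ratio * h))  # round outward at the bottom
    return (max(0, x0), new_y0, min(img_w, x1), new_y1)


def dilate_vertical(mask: Mask, radius: int) -> Mask:
    """Binary dilation with a (2*radius+1) x 1 vertical structuring element:
    each foreground pixel also switches on up to `radius` pixels above and
    below it, so a crop taken with the dilated mask keeps stroke pixels
    lying just outside the original mask edge."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                # Clamp the vertical neighbourhood to the mask bounds.
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    out[yy][x] = 1
    return out
```

Dilation only helps when the clipped strokes fall within `radius` pixels of the mask, and a large radius will also pull in background noise, so the padding or radius would need tuning per font and resolution.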

Annotation in JSON Format:
Finally, could you explain how to generate annotations (e.g., bounding boxes or masks) in JSON format for Arabic or Urdu text in images?
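For reference, a minimal sketch of what such a JSON record could look like. The schema here (`image`, `annotations`, `polygon`, `bbox`, `direction`) and the helper `make_annotation` are hypothetical and chosen only for illustration; they are not the format Hi-SAM's training or evaluation code expects. The one detail that matters regardless of schema is serializing with `ensure_ascii=False` (and writing files with UTF-8 encoding) so Arabic-script transcriptions stay readable instead of turning into `\uXXXX` escapes.

```python
import json
from typing import Dict, List, Sequence, Tuple

Point = Tuple[int, int]  # (x, y) in pixel coordinates


def make_annotation(image_file: str, width: int, height: int,
                    lines: Sequence[Tuple[str, Sequence[Point]]]) -> Dict:
    """Build one per-image record from (transcription, polygon) pairs.

    The field names are assumptions for illustration only."""
    annotations: List[Dict] = []
    for idx, (text, polygon) in enumerate(lines):
        xs = [p[0] for p in polygon]
        ys = [p[1] for p in polygon]
        annotations.append({
            "id": idx,
            "text": text,
            "direction": "rtl",  # Arabic and Urdu are written right to left
            "polygon": [[x, y] for x, y in polygon],
            "bbox": [min(xs), min(ys), max(xs), max(ys)],  # axis-aligned box around the polygon
        })
    return {"image": image_file, "width": width, "height": height,
            "annotations": annotations}


record = make_annotation(
    "sample_urdu.png", 640, 480,
    [("سلام", [(400, 100), (600, 100), (600, 160), (400, 160)])],
)
# ensure_ascii=False keeps the Arabic-script text as-is in the JSON string.
serialized = json.dumps(record, ensure_ascii=False, indent=2)
```

To save it, open the output file with `encoding="utf-8"` and call `json.dump(record, f, ensure_ascii=False, indent=2)`; polygons give a closer fit than boxes for slanted or curved lines.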

Any suggestions, tips, or documentation links would be greatly appreciated. Thank you in advance for your time and support!
