ToUnicode patch corrupts Tibetan text into Latin/ASCII gibberish (GID mismatch)

## Description
Running `pdf-cmap-fix` on the document `TI1844-01-001.pdf` degrades the text extraction instead of fixing it.
- Folder name: IE3KG730
- File name: TI1844-01-001.pdf
- Fonts: BookmanOldStyle, Ededris-a, Ededris-a1, Ededris-b, Ededris-b1, Ededris-vowa, Jomolhari, Kailasa, MonlamUniOuChan2, Cambria, Kailasa, Calibri, `#e6#96#87#e9#bc#8e#e7#b2#97#e6#af#9b#e6#a5#`
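The last font entry is not random debris. It is a PDF name whose non-ASCII bytes are written as `#xx` escapes (PDF name syntax, ISO 32000-1, 7.3.5). The escaped bytes decode as UTF-8 to 文鼎粗毛 followed by an incomplete sequence, because the name is cut off in this report. The sketch below shows the decoding. `decode_pdf_name` is a hypothetical helper written for illustration. Reading the bytes as UTF-8 is an assumption, although it matches this name cleanly.

```python
import re

# PDF name syntax (ISO 32000-1, 7.3.5): '#' followed by two hex digits encodes one byte.
_NAME_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_pdf_name(name: str) -> str:
    """Undo '#xx' escapes in a PDF name and read the bytes as UTF-8 (assumption).

    A '#' without two hex digits after it is kept as-is; incomplete UTF-8
    sequences (e.g. from a truncated name) become U+FFFD.
    """
    raw = name.encode("latin-1")
    unescaped = _NAME_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return unescaped.decode("utf-8", errors="replace")


print(decode_pdf_name("#e6#96#87#e9#bc#8e#e7#b2#97#e6#af#9b#e6#a5#"))
```

The final character of the name cannot be recovered from the truncated value.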

### RAW Output: 
(RAW): ༄༅། ། ༧ ས་་དམ་པ་ད་བན་ར་་འགས་ད་ན་གས་

### PATCHED Output: 
(PATCHED): ༄༅# # ( Tú!Ö!f∞!u!çf!£Áq!të!¤!ÎV<ú!≥f!yq!∫<ú!
The script overwrites the existing ToUnicode map with incorrect Latin/ASCII characters, turning the text into complete gibberish.
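One possible safeguard is to compare the old and new ToUnicode CMaps before writing, so a patch cannot silently replace good mappings. This is a minimal sketch, not what the tool currently does. It makes two assumptions: the CMap uses `bfchar` entries (`bfrange` is ignored), and a Tibetan font should map mostly into the Tibetan block U+0F00 to U+0FFF. `parse_bfchar`, `tibetan_share`, `should_apply_patch`, and the 0.2 threshold are hypothetical names and values.

```python
from __future__ import annotations

import re

# One bfchar block, and one "<src> <dst>" pair inside it.
_BFCHAR_BLOCK = re.compile(r"beginbfchar(.*?)endbfchar", re.S)
_BFCHAR_PAIR = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]*)>")

TIBETAN = (0x0F00, 0x0FFF)


def parse_bfchar(cmap_text: str) -> dict[int, str]:
    """Collect {character code: Unicode text} from every bfchar block (bfrange is ignored)."""
    mapping = {}
    for block in _BFCHAR_BLOCK.findall(cmap_text):
        for src, dst in _BFCHAR_PAIR.findall(block):
            # Destination strings are UTF-16BE and may hold several code units.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be", errors="replace")
    return mapping


def tibetan_share(mapping: dict[int, str]) -> float:
    """Fraction of mapped characters that fall in the Tibetan block U+0F00-U+0FFF."""
    chars = [ch for text in mapping.values() for ch in text]
    if not chars:
        return 0.0
    lo, hi = TIBETAN
    return sum(lo <= ord(ch) <= hi for ch in chars) / len(chars)


def should_apply_patch(old_cmap: str, new_cmap: str, max_drop: float = 0.2) -> bool:
    """Reject a patched CMap that moves a font's mappings out of the Tibetan block."""
    old_share = tibetan_share(parse_bfchar(old_cmap))
    new_share = tibetan_share(parse_bfchar(new_cmap))
    return new_share >= old_share - max_drop


old = "beginbfchar\n<01> <0F40>\n<02> <0F41>\nendbfchar"
new = "beginbfchar\n<01> <0054>\n<02> <00FA>\nendbfchar"  # 'T' and 'u acute', as in the patched output
print(should_apply_patch(old, new))  # False
```

The check compares against the old share rather than a fixed target. As a result, non-Tibetan fonts such as Calibri, whose share is 0 before and after, still pass.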

Sample page for testing:
[sample.pdf](https://github.com/user-attachments/files/27503805/sample.pdf)

### Subtasks
- [x] Implement glyph-name (gname) to Unicode mapping when GID to Unicode fails.
- [x] Implement hash or glyph-curve to Unicode mapping (gshape -> Unicode) when both of the above fail.
- [x] Use gname to find Unicode for PUA cases.
- [x] Join gname and gshape results to update the gshape mappings.
- [x] Update CLI endpoints.
- [x] Run tests.
- [x] Document the repo.
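The fallback order in the subtasks can be sketched as follows. The order is GID first, then glyph name, then glyph shape. Glyph names are also used to repair PUA values and to feed the shape table. This is an illustration under stated assumptions, not the repository's implementation. Glyph names are parsed with the Adobe Glyph List `uniXXXX` / `uXXXX` conventions plus a caller-supplied table for custom names. The outline hash is a hypothetical normalization: translate to the bounding-box origin, round, then SHA-1. All function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import re

# Private Use Areas: BMP PUA plus supplementary planes 15 and 16.
PUA_RANGES = ((0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD))

# Adobe Glyph List conventions: "uni" + groups of 4 hex digits, or "u" + 4-6 hex digits.
_UNI = re.compile(r"uni((?:[0-9A-F]{4})+)")
_U = re.compile(r"u([0-9A-F]{4,6})")


def is_pua(text: str | None) -> bool:
    """True if every character of a non-empty string is a Private Use code point."""
    return bool(text) and all(any(lo <= ord(ch) <= hi for lo, hi in PUA_RANGES) for ch in text)


def unicode_from_glyph_name(gname: str, custom: dict[str, str]) -> str | None:
    """Map names like 'uni0F40', 'u0F40.alt' or 'ka_uni0F72' to text, else None."""
    pieces = []
    for part in gname.split(".", 1)[0].split("_"):  # drop '.alt' suffixes; '_' joins components
        if part in custom:
            pieces.append(custom[part])
            continue
        m = _UNI.fullmatch(part) or _U.fullmatch(part)
        if m is None:
            return None
        digits = m.group(1)
        step = 4 if part.startswith("uni") else len(digits)
        cps = [int(digits[i:i + step], 16) for i in range(0, len(digits), step)]
        if any(cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF for cp in cps):
            return None  # surrogates and out-of-range values are not valid characters
        pieces.append("".join(map(chr, cps)))
    return "".join(pieces) or None


def shape_key(contours: list[list[tuple[float, float]]]) -> str:
    """Hash an outline after moving its bounding box to the origin (hypothetical normalization)."""
    points = [p for contour in contours for p in contour]
    min_x = min((x for x, _ in points), default=0)
    min_y = min((y for _, y in points), default=0)
    norm = [[(round(x - min_x), round(y - min_y)) for x, y in c] for c in contours]
    return hashlib.sha1(repr(norm).encode("ascii")).hexdigest()


def resolve_glyph(gid: int, gname: str | None, contours: list[list[tuple[float, float]]],
                  gid_map: dict[int, str], custom_names: dict[str, str],
                  shape_map: dict[str, str]) -> str | None:
    """GID -> glyph name -> outline hash, learning outline hashes from resolved names."""
    text = gid_map.get(gid)
    if text and not is_pua(text):
        return text  # direct GID mapping is usable
    key = shape_key(contours)
    named = unicode_from_glyph_name(gname, custom_names) if gname else None
    if named and not is_pua(named):
        shape_map[key] = named  # join: remember this outline for fonts without names
        return named  # glyph name fixes a missing or PUA mapping
    return shape_map.get(key, text)  # outline lookup, else keep the PUA value (or None)
```

Hashing after translation makes the key tolerant of glyph placement but not of scaling or different curve decompositions. A production shape matcher would probably need a more robust normalization.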
