Description
When running pdf-cmap-fix on the document TI1844-01-001.pdf, the script degrades the text extraction rather than fixing it.
Folder name: IE3KG730
File name: TI1844-01-001.pdf
Fonts: BookmanOldStyle, Ededris-a, Ededris-a1, Ededris-b, Ededris-b1, Ededris-vowa, Jomolhari, Kailasa, MonlamUniOuChan2, Cambria, Kailasa, Calibri, #e6#96#87#e9#bc#8e#e7#b2#97#e6#af#9b#e6#a5#
RAW Output:
(RAW): ༄༅། ། ༧ ས་་དམ་པ་ད་བན་ར་་འགས་ད་ན་གས་
PATCHED Output:
(PATCHED): ༄༅# # ( Tú!Ö!f∞!u!çf!£Áq!të!¤!ÎV<ú!≥f!yq!∫<ú!
Instead of repairing the mapping, the script overwrites the existing ToUnicode map with incorrect Latin/ASCII characters, so everything after the opening ༄༅ is extracted as gibberish.
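One way to catch this kind of regression automatically is to compare the share of Tibetan code points in the text before and after patching. The sketch below is illustrative only and is not part of pdf-cmap-fix. The helper names `tibetan_ratio` and `patch_degrades` and the `tolerance` value are assumptions made for this example. The two strings are the RAW and PATCHED outputs quoted above.

```python
def tibetan_ratio(text: str) -> float:
    """Fraction of non-whitespace characters in the Tibetan block (U+0F00-U+0FFF)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    tibetan = sum(1 for c in chars if "\u0f00" <= c <= "\u0fff")
    return tibetan / len(chars)


def patch_degrades(raw: str, patched: str, tolerance: float = 0.05) -> bool:
    """Return True if patching lowered the Tibetan ratio by more than `tolerance`.

    `tolerance` is an arbitrary illustrative threshold, not a tuned value.
    """
    return tibetan_ratio(patched) < tibetan_ratio(raw) - tolerance


# Outputs copied from this report.
raw = "༄༅། ། ༧ ས་་དམ་པ་ད་བན་ར་་འགས་ད་ན་གས་"
patched = "༄༅# # ( Tú!Ö!f∞!u!çf!£Áq!të!¤!ÎV<ú!≥f!yq!∫<ú!"

print(patch_degrades(raw, patched))
```

A check along these lines could run before the patched ToUnicode map is written, so that fonts whose existing map already yields mostly Tibetan text are left untouched. Whether that fits the script's current design is an open question.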
Sample page for testing:
sample.pdf
Subtasks