Description of the bug
I'm encountering an error during parsing: "unsupported colorspace for '{output}'".
My requirement is that I cannot modify the original PDF file, so I need to address this issue within the parsing script itself.
I've noticed others have raised this issue as well, but it hasn't been resolved. How can I tackle this problem without altering the source code?
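One way to narrow this down without touching the original PDF is to check which embedded images use a colorspace other than DeviceRGB or DeviceGray. The sketch below is only a diagnostic, not a fix. It assumes PyMuPDF (the `fitz` module that pdf2docx builds on) is installed, and the names `SAFE_COLORSPACES`, `is_suspect_colorspace`, and `list_suspect_images` are hypothetical helpers introduced for illustration. The file is only read, never written.

```python
# Colorspaces assumed here to be unproblematic for image export. Anything else
# (e.g. DeviceCMYK, Separation, DeviceN, ICCBased) is flagged for a closer look;
# this set is an assumption, not something taken from pdf2docx.
SAFE_COLORSPACES = {"DeviceRGB", "DeviceGray"}


def is_suspect_colorspace(name):
    """Return True if the colorspace name is non-empty and not assumed safe."""
    return bool(name) and name not in SAFE_COLORSPACES


def list_suspect_images(pdf_path):
    """Return (page_number, xref, colorspace) for each flagged image."""
    import fitz  # PyMuPDF; imported lazily so the helper above has no dependency

    suspects = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            # Each entry is a tuple; index 0 is the xref, index 5 the colorspace name.
            for img in page.get_images(full=True):
                xref, colorspace = img[0], img[5]
                if is_suspect_colorspace(colorspace):
                    suspects.append((page.number, xref, colorspace))
    return suspects
```

Calling `list_suspect_images("000.pdf")` would then show which pages and images are worth examining before deciding on a workaround.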
How to reproduce the bug
000.pdf
Problem file
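import logging
import time

from pdf2docx import Converter

logger = logging.getLogger(__name__)


def to_docx(file_path):
    """Convert the PDF at file_path to a .docx beside it; return True on success."""
    try:
        pdf_file = file_path
        # Assumes file_path ends in '.pdf' (the last four characters are stripped).
        docx_file = file_path[:-4] + '.docx'
        start_time = time.time()
        cv = Converter(pdf_file)
        cv.convert(docx_file, start=0, end=None)
        cv.close()
        end_time = time.time()
        # Log the elapsed time in seconds.
        logger.info(end_time - start_time)
        return True
    except Exception as e:
        logger.error(f': {e}')
        return False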
'ERROR' unsupported colorspace for '{output}'
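To find out which page triggers the message, the conversion can be attempted one page at a time. The following is a minimal sketch: `find_failing_pages` and `convert_one` are hypothetical names, and it assumes the error surfaces as an exception. Whether it does depends on how pdf2docx handles page-level errors, so the log output may still be needed alongside it.

```python
def find_failing_pages(page_count, convert_page):
    """Call convert_page(i) for each page index and collect the failures.

    convert_page is a caller-supplied callable (hypothetical here) that
    converts a single page and raises an exception on error.
    """
    failures = []
    for index in range(page_count):
        try:
            convert_page(index)
        except Exception as exc:  # broad on purpose: this is a diagnostic
            failures.append((index, str(exc)))
    return failures


# Example wiring (assumes pdf2docx and PyMuPDF are installed):
#
#   import fitz
#   from pdf2docx import Converter
#
#   def convert_one(index, pdf="000.pdf"):
#       cv = Converter(pdf)
#       try:
#           cv.convert(f"page_{index}.docx", start=index, end=index + 1)
#       finally:
#           cv.close()
#
#   with fitz.open("000.pdf") as doc:
#       print(find_failing_pages(doc.page_count, convert_one))
```

Keeping the per-page conversion behind a callable means the loop itself has no dependency on pdf2docx and can be checked with a stub.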
pdf2docx version
0.5.8
Operating system
Linux
Python version
3.12