Problem: Files in the Archive directory with .pdf extensions that are actually plain text files cause "Failed to load PDF document" errors.
Example: Archive/ WEBSITE PACKAGE/2025/Latha_2025_Veterinary Clinic.pdf was a 92-byte text file, not a valid PDF.
The backend now automatically converts text files to proper PDFs!
Location: backend/server.py
Features:
- Fully automatic - No user action required
- Seamless - Happens during file upload
- Transparent - User doesn't see any difference
- Robust - Fallback if conversion fails
- Logged - Server logs show conversion status
How it works:
- User uploads a file through LibraDigit AI
- Backend detects if it's a text file with a .pdf extension
- Automatically converts to proper PDF format
- Extracts text from the converted PDF
- Continues normal workflow
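
The steps above, including the fallback and logging behaviour listed under Features, can be sketched roughly as follows. This is a minimal illustration and not the actual code in backend/server.py: `prepare_upload`, `looks_like_pdf`, the status strings, and the `convert` callable are all hypothetical names chosen for this example.

```python
import logging
from typing import Callable, Tuple

logger = logging.getLogger("upload")


def looks_like_pdf(data: bytes) -> bool:
    """Header check: real PDF files begin with the bytes %PDF."""
    return data.startswith(b"%PDF")


def prepare_upload(data: bytes, convert: Callable[[bytes], bytes]) -> Tuple[bytes, str]:
    """Return (payload, status) for an uploaded file with a .pdf extension.

    `convert` is a hypothetical callable that turns plain-text bytes into
    PDF bytes (for example, a wrapper around a reportlab-based converter).
    """
    if looks_like_pdf(data):
        return data, "pdf"  # already a valid PDF, nothing to do

    try:
        pdf_bytes = convert(data)
    except Exception:
        # Fallback: keep the original bytes so the workflow can continue.
        logger.exception("Text-to-PDF conversion failed; falling back to raw text")
        return data, "text-fallback"

    if not looks_like_pdf(pdf_bytes):
        logger.warning("Converter did not return a PDF; falling back to raw text")
        return data, "text-fallback"

    logger.info("Converted %d-byte text upload to %d-byte PDF", len(data), len(pdf_bytes))
    return pdf_bytes, "converted"
```

Passing the converter in as a parameter keeps the detection and fallback logic testable without a PDF library installed.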
User Experience:
Upload file → ✅ Auto-converted → ✅ Text extracted → Continue workflow
The backend can also handle text files without conversion:
Location: backend/server.py
Features:
- Detects text files masquerading as PDFs
- Extracts content from text files
- Adds informative notes to the extracted text
- Allows normal workflow to continue
How it works:
- When a file is uploaded, the system checks the file header
- If it's a text file (doesn't start with %PDF), it's handled specially
- Text content is extracted and processed
- User is notified about the file type
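
As a rough sketch of this path, the function below extracts the content of a text file that carries a .pdf extension and prepends an informative note. The function name and the wording of the note are assumptions for illustration; the real note text produced by backend/server.py is not shown in this document.

```python
def extract_text_with_note(data: bytes, filename: str) -> str:
    """Return the text content of a fake PDF, prefixed with a note.

    Hypothetical helper: the note format here is an assumption.
    """
    if data.startswith(b"%PDF"):
        raise ValueError(f"{filename} is a real PDF; use the PDF extractor instead")

    # Decode leniently so a stray non-UTF-8 byte does not abort the upload.
    text = data.decode("utf-8", errors="replace").strip()
    note = (
        f"[Note: {filename} has a .pdf extension but contains plain text "
        f"({len(data)} bytes).]"
    )
    return f"{note}\n\n{text}"
```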
A Python utility script to convert text files to proper PDFs:
Location: backend/convert_text_to_pdf.py
Features:
- Converts single files or entire directories
- Preserves metadata (Author, Year, Subject, Title)
- Creates professionally formatted PDFs
- Safe to run multiple times (skips already-converted files)
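
One plausible way to get the directory scan and the "safe to run multiple times" behaviour is to select only .pdf files whose header is not %PDF, so files that were already converted are skipped on the next run. The sketch below assumes that approach; `find_fake_pdfs` is a hypothetical name and may not match what convert_text_to_pdf.py does internally.

```python
from pathlib import Path
from typing import Iterator


def find_fake_pdfs(root: Path) -> Iterator[Path]:
    """Yield .pdf files under `root` (recursively) that do not start with %PDF."""
    for path in sorted(root.rglob("*.pdf")):
        if not path.is_file():
            continue
        with path.open("rb") as f:
            # Files that were already converted start with %PDF and are skipped.
            if not f.read(4).startswith(b"%PDF"):
                yield path
```

Note that the `*.pdf` pattern is case-sensitive on most Linux filesystems, so files named `.PDF` would need a separate pattern.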
Simply upload your file through the LibraDigit AI interface:
- Go to Upload & OCR page
- Upload your file (even if it's a text file with a .pdf extension)
- The system will automatically detect the file type
- Text will be extracted and you can proceed normally
To convert a single file:

    cd backend
    python convert_text_to_pdf.py "Archive/ WEBSITE PACKAGE/2025/Latha_2025_Veterinary Clinic.pdf"

To convert an entire directory:

    cd backend
    python convert_text_to_pdf.py "Archive/"

This will scan recursively and convert all text files with .pdf extensions.
If you need to use the conversion script, ensure reportlab is installed:
    cd backend
    pip install -r requirements.txt

Or install just reportlab:

    pip install reportlab

Before conversion:
File: Latha_2025_Veterinary Clinic.pdf
Size: 92 bytes
Type: Plain text
Status: ✗ Cannot be opened by PDF readers

After conversion:
File: Latha_2025_Veterinary Clinic.pdf
Size: 1,822 bytes
Type: Valid PDF document
Status: ✓ Can be opened by any PDF reader

    def is_text_file(filepath):
        # Read only the first four bytes; that is enough to check the header.
        with open(filepath, 'rb') as f:
            header = f.read(4)
            # PDF files start with %PDF
            if header.startswith(b'%PDF'):
                return False
            return True

The system now handles:
- ✓ Valid PDF documents
- ✓ Text files with a .pdf extension
- ✓ Image files (PNG, JPG, JPEG, TIFF, BMP)
- ✓ Scanned PDFs (with Tesseract OCR)
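
A header check that only looks for %PDF treats every other file as text, including images. To tell the supported types apart, the check can be extended with the standard signatures of the listed image formats. The sketch below is illustrative only: `classify_upload` and its return labels are hypothetical, and it cannot tell a scanned PDF from a digital one, since both begin with %PDF.

```python
def classify_upload(header: bytes) -> str:
    """Classify a file by its leading bytes (pass at least the first 8 bytes)."""
    signatures = [
        (b"%PDF", "pdf"),
        (b"\x89PNG\r\n\x1a\n", "png"),
        (b"\xff\xd8\xff", "jpeg"),   # covers both .jpg and .jpeg
        (b"II*\x00", "tiff"),        # little-endian TIFF
        (b"MM\x00*", "tiff"),        # big-endian TIFF
        (b"BM", "bmp"),              # caveat: plain text starting with "BM" also matches
    ]
    for magic, kind in signatures:
        if header.startswith(magic):
            return kind
    # No known signature: most likely plain text, possibly an unsupported format.
    return "text-or-unknown"
```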
Cause: File is a text file with .pdf extension
Solutions:
- Upload through LibraDigit AI (automatic handling)
- Run the conversion script manually
- Check file size (very small files like 92 bytes are likely text files)
Check:
- Is Python installed? Run python --version
- Is reportlab installed? Run pip show reportlab
- Is the file path correct? (use quotes for paths with spaces)
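
The same three checks can be run from Python in one go. This is a small sketch with a hypothetical helper name, `check_environment`; it only reports what it finds and does not attempt any conversion.

```python
import importlib.util
import sys
from pathlib import Path


def check_environment(file_path: str, module: str = "reportlab") -> dict:
    """Report the Python version, whether `module` is importable, and whether the file exists."""
    return {
        "python_version": sys.version.split()[0],
        "module_installed": importlib.util.find_spec(module) is not None,
        "file_exists": Path(file_path).is_file(),
    }


if __name__ == "__main__":
    # Passing the path as a single string avoids shell quoting problems with spaces.
    target = sys.argv[1] if len(sys.argv) > 1 else ""
    print(check_environment(target))
```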
Verify:
- Backend server is running
- Latest version of server.py is deployed
- Check server logs for errors
- Conversion Script Documentation: backend/CONVERT_TEXT_TO_PDF.md
- Backend API Documentation: backend/server.py
- Main README: README.md
Both solutions are now in place:
- Automatic Detection: The backend automatically handles text files during upload
- Manual Conversion: Use the utility script to batch-convert existing files
You can use either method or both together for maximum flexibility!