Problem
Currently, the PDFHandler only extracts visible text from resumes.
If a link is present in the PDF as clickable text (e.g., "GitHub"), the underlying URL is not captured.
As a result, the JSON resume does not include these URLs in the "profiles" section.
Proposed Solution
Enhance to_markdown (or PDFHandler) to:
- Extract link annotations (e.g., link['uri'] from PyMuPDF).
- Append the URLs to the text passed to the LLM prompt.
- Ensure the LLM prompt can include these URLs for accurate JSON extraction.
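The steps above could look roughly like the sketch below. It is not the project's implementation: `extract_pdf_links` and `append_links_to_text` are hypothetical helper names, and the sketch assumes PyMuPDF is installed (imported as `fitz`). It uses `page.get_links()`, keeps only external `LINK_URI` entries, reads the visible anchor text under each link rectangle, and appends a de-duplicated "Links" section to whatever text is sent to the LLM.

```python
from typing import Iterable


def extract_pdf_links(pdf_path: str) -> list[dict]:
    """Collect external URI links (anchor text + URL) from every page of a PDF.

    Hypothetical helper. Requires PyMuPDF (`pip install pymupdf`); it is
    imported lazily so the formatting helper below works without it.
    """
    import fitz  # PyMuPDF

    links = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            for link in page.get_links():
                # Keep only external URIs; skip internal "go to page" links.
                if link.get("kind") == fitz.LINK_URI and link.get("uri"):
                    # Visible text under the link rectangle, e.g. "GitHub".
                    anchor = page.get_text("text", clip=link["from"]).strip()
                    links.append({"text": anchor, "uri": link["uri"]})
    return links


def append_links_to_text(text: str, links: Iterable[dict]) -> str:
    """Append a de-duplicated 'Links' section so the LLM prompt sees the URLs.

    Hypothetical helper; the 'label: url' line format is an assumption.
    """
    seen = set()
    lines = []
    for link in links:
        uri = link["uri"]
        if uri in seen:
            continue  # The same URL is often linked more than once.
        seen.add(uri)
        label = link.get("text") or uri  # Fall back to the URL if no anchor text.
        lines.append(f"- {label}: {uri}")
    if not lines:
        return text  # Nothing to add; leave the text unchanged.
    return text.rstrip() + "\n\nLinks:\n" + "\n".join(lines) + "\n"
```

In the existing pipeline, the output of to_markdown (whose signature is not shown in this issue) would be passed through `append_links_to_text(text, extract_pdf_links(path))` before the prompt is built. Keeping the anchor text next to each URL gives the LLM a hint for filling the "network" of each profile entry (e.g. "GitHub").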
Benefits
- Improves accuracy of profile extraction (GitHub, LinkedIn, portfolio links).
- Ensures that clickable links in resumes are not lost.
- Makes the system more robust for real-world resumes.