Add CVFileLoader integration page #3969
Open
Ilansky (ilanoh) wants to merge 1 commit into
This PR adds a document loader integration page for `langchain-cvfile`, which loads `.cv` files (an open file format for resumes/CVs: PDF/A-3u with embedded Markdown/HTML/JSON payloads carried as PDF Associated Files).

## Why
A `.cv` file is a PDF/A-3u document that ships a Markdown copy of its content (plus optional HTML and JSON Resume versions) as PDF Associated Files (ISO 32000-2 §14.13). For RAG/ATS use cases, the embedded Markdown is a much cleaner text representation than running OCR on the visual PDF layer. This loader reads those embedded payloads directly and returns one `Document` per textual payload; the payload declared as `cv:primaryPayload` in the file's XMP metadata is flagged with `metadata["primary"] = True`.

## What this PR contains
A single new MDX page at `src/oss/python/integrations/document_loaders/cvfile.mdx`, following the same template as existing loader pages such as PyPDFLoader (frontmatter, integration details table, loader features table, setup, initialization, `load` and `lazy_load` examples, metadata reference, API reference link).

## Package details
- `cvfile>=0.1.0,<1`
- `langchain-core>=0.3,<1`

## Code sample is runnable
The example in the page works against any `.cv` file produced by the reference SDK. The sample output in the page was generated against an actual fixture; running `loader.load()` returns three Documents for a typical resume (`resume.md` primary, `resume.html` alternate, `resume.json` alternate).
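The loader behaviour described in this PR (one `Document` per textual payload, the `cv:primaryPayload` payload flagged as primary, and `load()` returning a list built from `lazy_load()`, as langchain-core's `BaseLoader.load()` does) can be sketched in plain Python. This is a hypothetical sketch, not the `langchain-cvfile` source: `SketchCVLoader`, the `payload`/`source` metadata keys, the extension-based "textual" check and the `urn:example:cv` namespace URI are all assumptions, and extracting the Associated Files from the PDF itself is out of scope (the payloads are passed in as a `{filename: bytes}` mapping).

```python
"""Hypothetical sketch of the behaviour described in this PR; not the
langchain-cvfile implementation. Payloads and the XMP packet are assumed
to be already extracted from the PDF."""
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from pathlib import PurePath
from typing import Dict, Iterator, List, Optional


@dataclass
class Document:
    """Stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Assumption: "textual" payloads are recognised by file extension.
TEXTUAL_SUFFIXES = {".md", ".html", ".json"}


def _local_name(qname: str) -> str:
    # ElementTree writes namespaced names as "{uri}local".
    return qname.rsplit("}", 1)[-1]


def primary_payload_from_xmp(xmp: str) -> Optional[str]:
    """Find cv:primaryPayload by local name, whether it is serialised as an
    attribute or as a child element (the cv: namespace URI is not given)."""
    for elem in ET.fromstring(xmp).iter():
        for attr, value in elem.attrib.items():
            if _local_name(attr) == "primaryPayload":
                return value.strip()
        if _local_name(elem.tag) == "primaryPayload" and elem.text:
            return elem.text.strip()
    return None


class SketchCVLoader:
    """Hypothetical loader over pre-extracted payloads."""

    def __init__(self, payloads: Dict[str, bytes], xmp: str, source: str) -> None:
        self.payloads = payloads
        self.xmp = xmp
        self.source = source

    def lazy_load(self) -> Iterator[Document]:
        primary = primary_payload_from_xmp(self.xmp)
        for name, data in self.payloads.items():
            if PurePath(name).suffix.lower() not in TEXTUAL_SUFFIXES:
                continue  # skip images and other binary attachments
            yield Document(
                page_content=data.decode("utf-8"),
                metadata={"source": self.source, "payload": name, "primary": name == primary},
            )

    def load(self) -> List[Document]:
        # Same contract as BaseLoader.load(): materialise lazy_load().
        return list(self.lazy_load())


# Usage with made-up payloads; "urn:example:cv" is a placeholder namespace.
XMP = (
    '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    '<rdf:Description xmlns:cv="urn:example:cv" cv:primaryPayload="resume.md"/>'
    "</rdf:RDF></x:xmpmeta>"
)
loader = SketchCVLoader(
    {
        "resume.md": b"# Jane Doe",
        "resume.html": b"<h1>Jane Doe</h1>",
        "resume.json": b'{"basics": {"name": "Jane Doe"}}',
        "photo.png": b"\x89PNG",
    },
    XMP,
    source="jane.cv",
)
for doc in loader.load():
    print(doc.metadata["payload"], doc.metadata["primary"])
# resume.md True
# resume.html False
# resume.json False
```

Matching `primaryPayload` by local name rather than by a fixed namespace URI keeps the sketch independent of the actual `cv:` namespace, which the PR does not state.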