New Feature: extract from string #188

@doncat99

from extract_thinker import Extractor, DocumentLoaderPyPdf

# Initialize the extractor
extractor = Extractor()
extractor.load_document_loader(DocumentLoaderPyPdf())
extractor.load_llm("gpt-4o-mini")  # or any other supported model

# Extract data from the document
# (test_file_path and InvoiceContract are defined elsewhere)
result = extractor.extract(test_file_path, InvoiceContract)

The above is the standard usage of ExtractThinker.

What if I already have custom preprocessing for the PDF document, such as removing headers and footers and filtering the document down to the target string, and I want the extractor to work directly from my pdf_string instead of loading the file again?
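
For context, the kind of preprocessing I mean could look roughly like the sketch below. It is only an illustration: `strip_repeated_lines`, `min_ratio`, and the sample page texts are hypothetical names and data, not part of ExtractThinker, and it assumes the page texts have already been pulled out of the PDF by some other tool.

```python
from collections import Counter


def strip_repeated_lines(pages, min_ratio=0.6):
    """Drop lines that recur on most pages (likely headers or footers).

    Hypothetical helper: `pages` is a list of per-page text strings.
    """
    split_pages = [[ln.strip() for ln in page.splitlines()] for page in pages]

    # Count on how many pages each distinct non-empty line appears.
    counts = Counter()
    for lines in split_pages:
        counts.update({ln for ln in lines if ln})

    # A line must appear on at least two pages to count as repeated.
    threshold = max(2, int(len(pages) * min_ratio))
    repeated = {ln for ln, n in counts.items() if n >= threshold}

    cleaned = [
        "\n".join(ln for ln in lines if ln and ln not in repeated)
        for lines in split_pages
    ]
    return "\n".join(page for page in cleaned if page)


# Made-up sample pages with a shared header and footer.
pages = [
    "ACME Corp\nInvoice 0042\nConfidential",
    "ACME Corp\nTotal: 100 EUR\nConfidential",
]
pdf_string = strip_repeated_lines(pages)
```

The resulting `pdf_string` is what I would like to hand to the extractor.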

Labels: enhancement (New feature or request)
