Prototype for PDF to HTML, option 3: Python prototype #4173

**Description**: As a developer, I want to prototype a solution using Python-based PDF parsing and HTML conversion libraries, so that I can compare its performance and output consistency with Ruby-based approaches.

Details on decision narratives [here](https://github.com/department-of-veterans-affairs/abd-vro/issues/4134#issuecomment-2689001735) (including a mock letter and an examination of 961 letter content).

Hypothetical implementation [here](https://github.com/department-of-veterans-affairs/abd-vro/issues/4134#issuecomment-2691757982).

**Acceptance Criteria** 

1. The service accepts a PDF file as input.
2. Identify and select Python libraries for:
    - Text extraction.
    - Heading detection based on font size.
    - List and table extraction.
    - Image extraction with alt attributes.
3. Convert extracted content into structured, accessible HTML.
4. The output is evaluated for consistency across multiple decision narrative PDFs.
5. Compare this prototype to the others and make a recommendation.
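
To make criteria 2 and 3 more concrete, here is a minimal sketch of font-size-based heading detection and HTML conversion. It is not the chosen implementation. It assumes a PDF library has already produced text lines with font sizes (for example, PyMuPDF's `page.get_text("dict")` exposes a `size` per span). The names `Span`, `heading_levels`, `spans_to_html`, and `BULLET_PREFIXES` are hypothetical, and the rule that the most common font size is body text is an assumption that would need checking against real decision narrative PDFs. Tables and images are not covered.

```python
from collections import Counter
from dataclasses import dataclass
from html import escape


@dataclass
class Span:
    """One extracted line of text plus its font size (hypothetical shape)."""
    text: str
    font_size: float


# Assumption: list items in the PDF text start with one of these markers.
BULLET_PREFIXES = ("• ", "- ", "* ")


def heading_levels(spans):
    """Map each font size larger than the body size to a heading level (1-3)."""
    # Assumption: the most common font size is body text.
    body_size = Counter(s.font_size for s in spans).most_common(1)[0][0]
    larger = sorted({s.font_size for s in spans if s.font_size > body_size},
                    reverse=True)
    # Largest size becomes h1, the next h2, and anything smaller is capped at h3.
    return {size: min(rank, 3) for rank, size in enumerate(larger, start=1)}


def spans_to_html(spans):
    """Convert spans to headings, paragraphs, and unordered lists."""
    if not spans:
        return ""
    levels = heading_levels(spans)
    out = []
    in_list = False
    for span in spans:
        text = span.text.strip()
        if not text:
            continue
        is_heading = span.font_size in levels
        is_bullet = not is_heading and text.startswith(BULLET_PREFIXES)
        # Close an open list as soon as a non-list line appears.
        if in_list and not is_bullet:
            out.append("</ul>")
            in_list = False
        if is_heading:
            level = levels[span.font_size]
            out.append(f"<h{level}>{escape(text)}</h{level}>")
        elif is_bullet:
            if not in_list:
                out.append("<ul>")
                in_list = True
            # Drop the two-character bullet marker before escaping.
            out.append(f"<li>{escape(text[2:].strip())}</li>")
        else:
            out.append(f"<p>{escape(text)}</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)


if __name__ == "__main__":
    sample = [
        Span("Decision", 18),
        Span("Reasons", 14),
        Span("We reviewed your claim.", 11),
        Span("• Evidence A", 11),
        Span("• Evidence B", 11),
        Span("This concludes the narrative.", 11),
    ]
    print(spans_to_html(sample))
```

Because the heading levels are derived from relative font sizes within one document, running the same function over several PDFs and comparing the resulting tag sequences could be one way to approach the consistency check in criterion 4.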
