Prototype for PDF to HTML option 3 - Python Prototype #4173

Open
@meganhicks

Description

Description: As a developer, I want to prototype a solution using Python-based PDF parsing and HTML conversion libraries, so that I can compare its performance and output consistency with Ruby-based approaches.

Details on decision narratives here (including a mock letter and an examination of 961 letter content).

Hypothetical implementation here.

Acceptance Criteria

  1. The service accepts a PDF file as input.
  2. Identify and select Python libraries for:
    Text extraction.
    Heading detection based on font size.
    List and table extraction.
    Image extraction with alt attributes.
  3. Convert extracted content into structured, accessible HTML.
  4. The output is evaluated for consistency across multiple decision narrative PDFs.
  5. Compare this prototype to the others and make a recommendation.
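
As a starting point for criteria 2 and 3, here is a minimal sketch of heading detection based on font size, using only the standard library. It assumes text has already been extracted into runs that carry a font size (PyMuPDF's `page.get_text("dict")`, for example, reports a `size` for each span). The `Span` class, the function names, and the "most common size is body text" rule are hypothetical choices for illustration, not a decided design.

```python
from collections import Counter
from dataclasses import dataclass
from html import escape


@dataclass
class Span:
    """One run of extracted text plus its font size (hypothetical shape)."""
    text: str
    size: float


def heading_levels(spans, max_levels=3):
    """Map font sizes larger than the body size to heading levels 1..max_levels."""
    # Assumption: the body size is the size that covers the most characters.
    chars_per_size = Counter()
    for span in spans:
        chars_per_size[round(span.size, 1)] += len(span.text)
    if not chars_per_size:
        return {}
    body_size = chars_per_size.most_common(1)[0][0]
    # Largest size becomes h1, next largest h2, and so on.
    larger = sorted((s for s in chars_per_size if s > body_size), reverse=True)
    return {size: level for level, size in enumerate(larger[:max_levels], start=1)}


def spans_to_html(spans):
    """Render spans as <h1>..<h3> or <p> elements, escaping the text."""
    levels = heading_levels(spans)
    parts = []
    for span in spans:
        text = escape(span.text.strip())
        if not text:
            continue  # skip whitespace-only runs
        level = levels.get(round(span.size, 1))
        # Sizes at or below body size, or beyond max_levels, fall back to <p>.
        tag = f"h{level}" if level else "p"
        parts.append(f"<{tag}>{text}</{tag}>")
    return "\n".join(parts)


# Made-up sample input, not taken from a real decision narrative.
sample = [
    Span("Decision Narrative", 18.0),
    Span("Reasons & Bases", 14.0),
    Span("The evidence of record was reviewed in full.", 11.0),
    Span("A decision was made based on that review.", 11.0),
]
print(spans_to_html(sample))
```

Running the same function over several decision narrative PDFs and comparing the resulting heading outlines would be one way to approach criterion 4. Lists, tables, and images with alt attributes are not covered by this sketch.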
