Prototype option 1 for PDF to HTML - pdf2htmlEX + Nokogiri #4170

Closed
@meganhicks

Description

As a developer, I want to prototype a solution using pdf2htmlEX to convert PDFs to HTML and process the output with Nokogiri, so that I can determine if this approach produces structured and accessible HTML consistently.

Details on decision narratives here (including mock letter, and examination of 961 letter content).

Hypothetical implementation here.

Acceptance Criteria

  1. The service accepts a PDF file as input.
  2. pdf2htmlEX successfully converts the PDF to HTML while preserving the layout structure.
  3. Nokogiri processes the HTML to:
    - Convert headings (h1, h2) based on font size.
    - Convert lists (ul, ol, li).
    - Structure tables with table, tr, td.
    - Ensure images have alt text.
    - Remove absolute positioning styles for accessibility.
  4. The service returns a well-structured HTML output.
  5. The output is evaluated for consistency across multiple decision narrative PDFs.
  6. This prototype is compared to the others and a recommendation is made.

New Note

The project scope has been narrowed to VBMS-generated decision letters only. Use these letters to test the prototype.
