Prototype for PDF to HTML option 2: Ruby Prototype #4172

**Description** As a developer, I want to prototype a solution using Ruby libraries (pdf-reader, pdf-extract, hexapdf, nokogiri) to extract and structure PDF content, so that I can determine if a native Ruby approach provides more control and consistency.

Details on decision narratives are [here](https://github.com/department-of-veterans-affairs/abd-vro/issues/4134#issuecomment-2689001735) (including a mock letter and an examination of 961 letter content).

Hypothetical implementation [here](https://github.com/department-of-veterans-affairs/abd-vro/issues/4134#issuecomment-2691757982).

**Acceptance Criteria** 

1. The service accepts a PDF file as input.
2. pdf-reader extracts text while maintaining paragraph structure.
3. pdf-extract identifies and extracts structured elements:
   - Headings (h1, h2), based on font size.
   - Lists (ul, ol), distinguishing ordered from unordered lists.
   - Tables with correct `<thead>`, `<tbody>`, and `<th>`.
4. hexapdf extracts images and assigns alt text.
5. Extracted content is processed with Nokogiri to generate structured, accessible HTML.
6. The output is evaluated for consistency across multiple decision narrative PDFs.
7. This prototype is compared with the others, and a recommendation is made.
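To make criteria 3 and 5 more concrete, here is a minimal sketch of the classification step only. It assumes an earlier extraction step has already produced text runs with a font size. The `Run` struct, the size thresholds, and the list-marker patterns are all hypothetical and not taken from any of the gems above. To stay dependency-free, it builds HTML with plain strings where the prototype would presumably use Nokogiri.

```ruby
# Hypothetical input shape: one text run per line with its font size in points.
# This is an assumption for illustration, not the output format of any gem.
Run = Struct.new(:text, :font_size)

# Assumed thresholds; real VBMS letters would need these tuned.
H1_MIN_SIZE = 18.0
H2_MIN_SIZE = 14.0

# Assumed list markers: "1." or "1)" for ordered, bullet, "-" or "*" for unordered.
ORDERED_ITEM   = /\A\s*\d+[.)]\s+/
UNORDERED_ITEM = /\A\s*[\u2022\-*]\s+/

HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Escape the characters that are unsafe in HTML text content.
def escape_html(text)
  text.gsub(/[&<>"]/, HTML_ESCAPES)
end

# Classify a run as :h1, :h2, :ol, :ul, or :p. Font size takes precedence.
def classify(run)
  return :h1 if run.font_size >= H1_MIN_SIZE
  return :h2 if run.font_size >= H2_MIN_SIZE
  return :ol if run.text.match?(ORDERED_ITEM)
  return :ul if run.text.match?(UNORDERED_ITEM)
  :p
end

# Group consecutive list items of the same kind, then emit one element per group.
def to_html(runs)
  groups = runs.chunk_while do |a, b|
    kind = classify(a)
    [:ol, :ul].include?(kind) && kind == classify(b)
  end

  groups.map do |group|
    kind = classify(group.first)
    if [:ol, :ul].include?(kind)
      marker = kind == :ol ? ORDERED_ITEM : UNORDERED_ITEM
      items = group.map { |r| "<li>#{escape_html(r.text.sub(marker, '').strip)}</li>" }
      "<#{kind}>#{items.join}</#{kind}>"
    else
      "<#{kind}>#{escape_html(group.first.text.strip)}</#{kind}>"
    end
  end.join("\n")
end

# Example with made-up content.
runs = [
  Run.new("Decision", 20.0),
  Run.new("Evidence", 14.0),
  Run.new("1. Service treatment records", 11.0),
  Run.new("2. VA examination", 11.0),
  Run.new("We reviewed the evidence listed above.", 11.0)
]
puts to_html(runs)
```

Keeping classification separate from extraction would let the same rules be reused if the extraction library changes, which fits the modular approach proposed in the notes. Tables and images are left out of this sketch.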

Notes: 

Gabe proposed making this modular and collaborating on it. Gabe discussed the idea in depth on Slack.

**New Note**

The project scope has been narrowed to decision letters generated by VBMS only. Use these letters to test the prototype.
