Description
As a developer, I want to prototype a solution using Ruby libraries (pdf-reader, pdf-extract, hexapdf, nokogiri) to extract and structure PDF content, so that I can determine if a native Ruby approach provides more control and consistency.
Details on decision narratives here (including mock letter, and examination of 961 letter content).
Hypothetical implementation here.
Acceptance Criteria
- The service accepts a PDF file as input.
- pdf-reader extracts text while maintaining paragraph structure.
- pdf-extract identifies and extracts structured elements:
  - Headings (h1, h2), based on font size.
  - Lists (ul, ol), distinguishing ordered from unordered lists.
  - Tables with the correct table elements.
- hexapdf extracts images and assigns alt text.
- Extracted content is processed with Nokogiri to generate structured, accessible HTML.
- The output is evaluated for consistency across multiple decision narrative PDFs.
- Compare this prototype to the others and make a recommendation.
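
The heading and list criteria above could be prototyped in plain Ruby before wiring in any PDF library. The following is a minimal sketch, not the actual implementation: it assumes an upstream extraction step has already produced lines of text with a font size attached. The `Line` struct, the size thresholds, and the marker patterns are all hypothetical and would need tuning against real decision narrative PDFs.

```ruby
require "cgi"

# Hypothetical input record: one line of extracted text plus its font size in points.
Line = Struct.new(:text, :font_size)

# Assumed thresholds; real values depend on the fonts used in the source PDFs.
H1_MIN_SIZE = 18.0
H2_MIN_SIZE = 14.0

# Assumed list markers: "1." or "1)" for ordered items, "-", "*" or a bullet for unordered.
MARKERS = {
  ol: /\A\s*\d+[.)]\s+/,
  ul: /\A\s*[-*•]\s+/
}.freeze

# Classify a line as :h1, :h2, :ol, :ul, or :p.
def classify(line)
  return :h1 if line.font_size >= H1_MIN_SIZE
  return :h2 if line.font_size >= H2_MIN_SIZE
  return :ol if line.text.match?(MARKERS[:ol])
  return :ul if line.text.match?(MARKERS[:ul])

  :p
end

# Convert classified lines to HTML, grouping consecutive items of the
# same list type into a single <ol> or <ul>.
def to_html(lines)
  groups = lines.chunk_while do |a, b|
    kind = classify(a)
    MARKERS.key?(kind) && kind == classify(b)
  end

  groups.map do |group|
    kind = classify(group.first)
    if MARKERS.key?(kind)
      items = group.map do |l|
        # Drop the list marker, then escape the remaining item text.
        "<li>#{CGI.escapeHTML(l.text.sub(MARKERS[kind], '').strip)}</li>"
      end
      "<#{kind}>#{items.join}</#{kind}>"
    else
      # Non-list groups always contain exactly one line.
      "<#{kind}>#{CGI.escapeHTML(group.first.text.strip)}</#{kind}>"
    end
  end.join("\n")
end
```

Calling `to_html` with an array of `Line` values returns an HTML string that a later step could parse and clean up with Nokogiri. Keeping the classification separate from the PDF reading makes it easier to compare output across multiple PDFs, since the thresholds can be adjusted without touching the extraction code.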