A Rust library for parsing Korean Hangul Word Processor (HWP) files with full layout rendering support.
- Complete HWP 5.0 Format Support: Parse all document components including text, formatting, tables, and embedded objects
- Visual Layout Rendering: Reconstruct documents with pixel-perfect accuracy when layout data is available
- Font and Style Preservation: Extract and apply original fonts, sizes, colors, and text formatting
- Advanced Layout Engine: Support for multi-column layouts, line-by-line positioning, and character-level formatting
- SVG Export: Render documents to scalable vector graphics
- Zero-copy Parsing: Efficient parsing with minimal memory allocation
- Safe Rust: Memory-safe implementation with comprehensive error handling
- Document Creation: Full HWP document writing support
- Rich Text Formatting: Bold, italic, colors, fonts, sizes
- Tables: Creation, styling, cell merging
- Lists: Bullets, numbering, Korean/alphabetic/roman formats
- Images: PNG/JPEG/BMP/GIF with captions
- Text Boxes: Positioned and styled text boxes
- Hyperlinks: URL, email, file, and bookmark links
- Headers/Footers: Page numbers and custom content
- Page Layout: Sizes, margins, orientation, columns, backgrounds
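The writer examples in this README give page dimensions and margins in millimetres (for instance `set_page_margins_mm`), while the HWP 5.0 format stores lengths in HWPUNIT, defined as 1/7200 of an inch. The following is a minimal sketch of that conversion using only the standard library. The function names `mm_to_hwpunit` and `hwpunit_to_mm` are hypothetical and are not part of the `hwpers` API; whether the library rounds the same way internally is an assumption.

```rust
/// HWP 5.0 measures lengths in HWPUNIT, which is 1/7200 of an inch.
const HWPUNIT_PER_INCH: f64 = 7200.0;
const MM_PER_INCH: f64 = 25.4;

/// Convert millimetres to HWPUNIT, rounding to the nearest whole unit.
/// (Hypothetical helper, not an hwpers function.)
fn mm_to_hwpunit(mm: f64) -> u32 {
    (mm / MM_PER_INCH * HWPUNIT_PER_INCH).round() as u32
}

/// Convert HWPUNIT back to millimetres.
/// (Hypothetical helper, not an hwpers function.)
fn hwpunit_to_mm(units: u32) -> f64 {
    units as f64 / HWPUNIT_PER_INCH * MM_PER_INCH
}

fn main() {
    // A4 paper is 210 mm x 297 mm.
    let width = mm_to_hwpunit(210.0);
    let height = mm_to_hwpunit(297.0);
    println!("A4 = {} x {} HWPUNIT", width, height);
    println!("20 mm margin = {} HWPUNIT", mm_to_hwpunit(20.0));
    println!("width round trip = {:.3} mm", hwpunit_to_mm(width));
}
```

Because the conversion rounds to whole units, a round trip can differ from the input by a small fraction of a millimetre.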
Add this to your Cargo.toml:
[dependencies]
hwpers = "0.3"

use hwpers::HwpReader;
// Parse an HWP file
let document = HwpReader::from_file("document.hwp")?;
// Extract text content
let text = document.extract_text();
println!("{}", text);
// Access document properties
if let Some(props) = document.get_properties() {
    println!("Pages: {}", props.total_page_count);
}
// Iterate through sections and paragraphs
for (i, section) in document.sections().enumerate() {
    println!("Section {}: {} paragraphs", i, section.paragraphs.len());
    for paragraph in &section.paragraphs {
        if let Some(text) = &paragraph.text {
            println!(" {}", text.content);
        }
    }
}

use hwpers::{HwpReader, render::{HwpRenderer, RenderOptions}};
let document = HwpReader::from_file("document.hwp")?;
// Create renderer with custom options
let options = RenderOptions {
    dpi: 96,
    scale: 1.0,
    show_margins: false,
    show_baselines: false,
};
let renderer = HwpRenderer::new(&document, options);
let result = renderer.render();
// Export first page to SVG
if let Some(svg) = result.to_svg(0) {
    std::fs::write("page1.svg", svg)?;
}
println!("Rendered {} pages", result.pages.len());

use hwpers::writer::HwpWriter;
use hwpers::model::hyperlink::Hyperlink;
// Create a new document
let mut writer = HwpWriter::new();
// Add formatted text
writer.add_aligned_paragraph(
    "제목", // Korean for "Title"
    hwpers::writer::style::ParagraphAlignment::Center,
)?;
// Add hyperlinks
let link = Hyperlink::new_url("Rust", "https://rust-lang.org");
writer.add_paragraph_with_hyperlinks(
    "Visit Rust website",
    vec![link],
)?;
// Configure page layout
writer.set_custom_page_size(
    210.0, 297.0, // A4 size
    hwpers::model::page_layout::PageOrientation::Portrait,
)?;
writer.set_page_margins_mm(20.0, 20.0, 20.0, 20.0);
// Add header and footer
writer.add_header("Document Header");
writer.add_footer_with_page_number(
    "Page ",
    hwpers::model::header_footer::PageNumberFormat::Numeric,
);
// Save the document
writer.save_to_file("output.hwp")?;

// Access character and paragraph formatting
for section in document.sections() {
    for paragraph in &section.paragraphs {
        // Get paragraph formatting
        if let Some(para_shape) = document.get_para_shape(paragraph.para_shape_id as usize) {
            println!("Indent: {}, Alignment: {}",
                para_shape.indent,
                para_shape.get_alignment()
            );
        }
        // Get character formatting runs
        if let Some(char_shapes) = &paragraph.char_shapes {
            for pos_shape in &char_shapes.char_positions {
                if let Some(char_shape) = document.get_char_shape(pos_shape.char_shape_id as usize) {
                    println!("Position {}: Size {}, Bold: {}",
                        pos_shape.position,
                        char_shape.base_size / 100,
                        char_shape.is_bold()
                    );
                }
            }
        }
    }
}

- ✅ File header and version detection
- ✅ Document properties and metadata
- ✅ Section definitions and page layout
- ✅ Paragraph and character formatting
- ✅ Font definitions (FaceName)
- ✅ Styles and templates
- ✅ Text content with full Unicode support
- ✅ Tables and structured data
- ✅ Control objects (images, OLE objects)
- ✅ Numbering and bullet lists
- ✅ Tab stops and alignment
- ✅ Page dimensions and margins
- ✅ Multi-column layouts
- ✅ Line-by-line positioning (when available)
- ✅ Character-level positioning (when available)
- ✅ Borders and fill patterns
- ✅ SVG export with accurate positioning
- ✅ Compressed document support
- ✅ CFB (Compound File Binary) format handling
- ✅ Text encoding support (UTF-16LE)
- ✅ Error recovery and partial parsing
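As an illustration of the UTF-16LE handling and error recovery listed above, here is a minimal standard-library sketch that decodes a UTF-16LE byte buffer and substitutes U+FFFD for invalid code units instead of aborting. The function name `decode_utf16le` is hypothetical and this is not the `hwpers` implementation; real HWP paragraph text also contains inline control codes, which this sketch does not interpret.

```rust
/// Decode a UTF-16LE byte buffer into a String.
/// Unpaired surrogates are replaced with U+FFFD rather than causing a failure,
/// and a trailing odd byte is ignored.
/// (Hypothetical helper, not an hwpers function.)
fn decode_utf16le(bytes: &[u8]) -> String {
    // Pair up bytes and interpret each pair as a little-endian code unit.
    let units = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]));

    std::char::decode_utf16(units)
        .map(|result| result.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect()
}

fn main() {
    // "한글" is U+D55C U+AE00; in UTF-16LE the low byte of each unit comes first.
    let bytes = [0x5C, 0xD5, 0x00, 0xAE];
    println!("{}", decode_utf16le(&bytes));
}
```

Replacing bad code units keeps the rest of a damaged paragraph readable, which is one way to approach partial parsing.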
The library includes a command-line tool for inspecting HWP files:
# Install the tool
cargo install hwpers
# Inspect an HWP file
hwp_info document.hwp

This library supports HWP 5.0 format files. For older HWP formats, consider using format conversion tools first.
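Since HWP 5.0 files are stored in a CFB (Compound File Binary) container, one cheap way to tell whether a file is worth handing to a 5.0 parser is to check the eight-byte CFB signature at the start of the file. The sketch below uses only the standard library; `has_cfb_signature` and `looks_like_cfb` are hypothetical names, not `hwpers` functions. The check is necessary but not sufficient: other CFB-based formats share the same signature, so a full check would still need to open the container.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// The first eight bytes of every Compound File Binary container.
const CFB_SIGNATURE: [u8; 8] = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1];

/// True if the buffer starts with the CFB signature.
/// (Hypothetical helper, not an hwpers function.)
fn has_cfb_signature(bytes: &[u8]) -> bool {
    bytes.starts_with(&CFB_SIGNATURE)
}

/// Read the first eight bytes of a file and test them against the signature.
/// Files shorter than eight bytes are reported as "not CFB" rather than as errors.
fn looks_like_cfb(path: &Path) -> io::Result<bool> {
    let mut header = [0u8; 8];
    let mut file = File::open(path)?;
    match file.read_exact(&mut header) {
        Ok(()) => Ok(has_cfb_signature(&header)),
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}

fn main() {
    match looks_like_cfb(Path::new("document.hwp")) {
        Ok(true) => println!("CFB container: may be an HWP 5.0 file"),
        Ok(false) => println!("Not a CFB container: possibly an older HWP format"),
        Err(e) => eprintln!("could not read file: {}", e),
    }
}
```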
The HWP writer functionality has been significantly improved with comprehensive feature support:
- Hyperlinks: Complete hyperlink support with proper serialization
  - URL links, email links, file links, bookmarks
  - Multiple hyperlinks per paragraph
  - Custom styling (colors, underline, visited state)
- Header/Footer: Full header and footer implementation
  - Custom header/footer text
  - Page numbering with multiple formats (numeric, roman, etc.)
  - Multiple headers/footers per document
- Page Layout: Comprehensive page layout control
  - Custom page sizes and standard sizes (A4, Letter, etc.)
  - Portrait/landscape orientation
  - Custom margins (narrow, normal, wide, custom)
  - Multi-column layouts with adjustable spacing
  - Page background colors
- Tables: Full table creation and formatting
  - Cell borders and styling
  - Cell merging (horizontal and vertical)
  - Custom cell content
- Lists/Numbering: Complete list support
  - Bullet lists with different symbols per level
  - Numbered lists (1., 2., 3., ...)
  - Alphabetic lists (a), b), c), ...)
  - Roman numeral lists (i., ii., iii., ...)
  - Korean lists (가., 나., 다., ...)
  - Nested lists with proper indentation
- Text Boxes: Full text box implementation
  - Positioned text boxes
  - Styled text boxes (highlight, warning, info, etc.)
  - Custom styling (borders, backgrounds, alignment)
  - Floating text boxes with rotation and transparency
- Images: Complete image insertion
  - PNG, JPEG, BMP, GIF support
  - Custom dimensions and positioning
  - Image captions
  - Proper BinData integration
- Styled Text: Rich text formatting
  - Bold, italic, underline, strikethrough
  - Custom fonts and sizes
  - Text colors and background colors
  - Multiple styles in single paragraph
- Advanced Formatting:
  - Paragraph alignment (left, center, right, justify)
  - Line spacing control
  - Paragraph spacing (before/after)
  - Headings with automatic sizing
  - Character and paragraph styles
- Document Properties: Full metadata support
  - Title, author, subject, keywords
  - Document statistics (character count, word count, etc.)
  - Automatic statistics updates
- Shapes/Drawing: Geometric shapes and drawing objects
  - Rectangles, circles, ellipses
  - Lines, arrows, polygons
  - Custom shapes with styling
  - Shapes with text content
  - Shape grouping
  - (See examples/shape_document.rs.disabled for usage examples)
- Charts/Graphs: Data visualization objects
- Mathematical Equations: MathML support
- Forms: Input fields and form controls
- Comments/Annotations: Review and comment features
- Track Changes: Revision history
- Mail Merge: Variable field insertion
- No compression support for writer (reader supports both compressed and uncompressed)
- Some advanced table features may have compatibility issues with older Hanword versions
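The numbered, alphabetic, roman, and Korean list formats mentioned in the writer feature list can be illustrated with a small standard-library sketch that produces the marker text for a given item index. `ListFormat`, `marker`, and `to_lower_roman` are hypothetical names, not the `hwpers` API, and returning `None` past 26 letters or 14 Korean syllables is an assumption made for this example rather than documented library behavior.

```rust
/// Marker styles matching the list formats named in this README.
/// (Hypothetical type, not part of hwpers.)
#[derive(Clone, Copy)]
enum ListFormat {
    Numeric,
    Alphabetic,
    Roman,
    Korean,
}

/// The conventional Korean ordering syllables used for list markers.
const KOREAN: [char; 14] = [
    '가', '나', '다', '라', '마', '바', '사', '아', '자', '차', '카', '타', '파', '하',
];

/// Convert a positive integer to a lowercase roman numeral.
fn to_lower_roman(mut n: u32) -> String {
    const TABLE: [(u32, &str); 13] = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ];
    let mut out = String::new();
    for &(value, symbol) in TABLE.iter() {
        // Greedily subtract the largest value that still fits.
        while n >= value {
            out.push_str(symbol);
            n -= value;
        }
    }
    out
}

/// Build the marker for the 1-based item `index`.
/// Returns None for index 0 or when the format runs out of symbols.
fn marker(format: ListFormat, index: u32) -> Option<String> {
    if index == 0 {
        return None;
    }
    match format {
        ListFormat::Numeric => Some(format!("{}.", index)),
        ListFormat::Alphabetic => {
            if index > 26 {
                return None;
            }
            let letter = (b'a' + (index - 1) as u8) as char;
            Some(format!("{})", letter))
        }
        ListFormat::Roman => Some(format!("{}.", to_lower_roman(index))),
        ListFormat::Korean => KOREAN
            .get((index - 1) as usize)
            .map(|syllable| format!("{}.", syllable)),
    }
}

fn main() {
    let formats = [
        ListFormat::Numeric,
        ListFormat::Alphabetic,
        ListFormat::Roman,
        ListFormat::Korean,
    ];
    for format in formats.iter() {
        let markers: Vec<String> = (1..=3).filter_map(|i| marker(*format, i)).collect();
        println!("{}", markers.join(" "));
    }
}
```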
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under either of
- Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
- HWP file format specification by Hancom Inc.
- Korean text processing community
- Rust parsing and document processing ecosystem