|
| 1 | +# hwpers |
| 2 | + |
| 3 | +[crates.io](https://crates.io/crates/hwpers) |
| 4 | +[docs.rs](https://docs.rs/hwpers) |
| 5 | +[CI](https://github.com/yourusername/hwpers/actions) |
| 6 | +[License: MIT](LICENSE-MIT) |
| 7 | + |
| 8 | +A Rust library for parsing Korean Hangul Word Processor (HWP) files with full layout rendering support. |
| 9 | + |
| 10 | +## Features |
| 11 | + |
| 12 | +- **Complete HWP 5.0 Format Support**: Parse all document components including text, formatting, tables, and embedded objects |
| 13 | +- **Visual Layout Rendering**: Reconstruct page layout from the line and character position data stored in the file, when that data is available |
| 14 | +- **Font and Style Preservation**: Extract and apply original fonts, sizes, colors, and text formatting |
| 15 | +- **Advanced Layout Engine**: Support for multi-column layouts, line-by-line positioning, and character-level formatting |
| 16 | +- **SVG Export**: Render documents to scalable vector graphics |
| 17 | +- **Zero-copy Parsing**: Efficient parsing with minimal memory allocation |
| 18 | +- **Safe Rust**: Memory-safe implementation with comprehensive error handling |
| 19 | + |
| 20 | +## Quick Start |
| 21 | + |
| 22 | +Add this to your `Cargo.toml`: |
| 23 | + |
| 24 | +```toml |
| 25 | +[dependencies] |
| 26 | +hwpers = "0.1" |
| 27 | +``` |
| 28 | + |
| 29 | +### Basic Usage |
| 30 | + |
| 31 | +```rust |
| 32 | +use hwpers::HwpReader; |
| 33 | + |
| 34 | +// Parse an HWP file |
| 35 | +let document = HwpReader::from_file("document.hwp")?; |
| 36 | + |
| 37 | +// Extract text content |
| 38 | +let text = document.extract_text(); |
| 39 | +println!("{}", text); |
| 40 | + |
| 41 | +// Access document properties |
| 42 | +if let Some(props) = document.get_properties() { |
| 43 | + println!("Pages: {}", props.total_page_count); |
| 44 | +} |
| 45 | + |
| 46 | +// Iterate through sections and paragraphs |
| 47 | +for (i, section) in document.sections().enumerate() { |
| 48 | + println!("Section {}: {} paragraphs", i, section.paragraphs.len()); |
| 49 | + |
| 50 | + for paragraph in &section.paragraphs { |
| 51 | + if let Some(text) = &paragraph.text { |
| 52 | + println!(" {}", text.content); |
| 53 | + } |
| 54 | + } |
| 55 | +} |
| 56 | +``` |
| 57 | + |
| 58 | +### Visual Layout Rendering |
| 59 | + |
| 60 | +```rust |
| 61 | +use hwpers::{HwpReader, render::{HwpRenderer, RenderOptions}}; |
| 62 | + |
| 63 | +let document = HwpReader::from_file("document.hwp")?; |
| 64 | + |
| 65 | +// Create renderer with custom options |
| 66 | +let options = RenderOptions { |
| 67 | + dpi: 96, |
| 68 | + scale: 1.0, |
| 69 | + show_margins: false, |
| 70 | + show_baselines: false, |
| 71 | +}; |
| 72 | + |
| 73 | +let renderer = HwpRenderer::new(&document, options); |
| 74 | +let result = renderer.render(); |
| 75 | + |
| 76 | +// Export first page to SVG |
| 77 | +if let Some(svg) = result.to_svg(0) { |
| 78 | + std::fs::write("page1.svg", svg)?; |
| 79 | +} |
| 80 | + |
| 81 | +println!("Rendered {} pages", result.pages.len()); |
| 82 | +``` |
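The `dpi` and `scale` options determine how stored lengths map to output pixels. The sketch below shows that arithmetic on its own, assuming lengths are stored in HWPUNIT (1/7200 inch) as described in the HWP 5.0 specification. `hwpunit_to_px` is a hypothetical helper for illustration, not part of the `hwpers` API, and the renderer may round or offset values differently.

```rust
/// Assumption: 1 HWPUNIT = 1/7200 inch, as in the HWP 5.0 specification.
const HWPUNIT_PER_INCH: f64 = 7200.0;

/// Convert a length in HWPUNIT to output pixels for a given DPI and scale.
/// Hypothetical helper, not an `hwpers` function.
fn hwpunit_to_px(units: i32, dpi: u32, scale: f64) -> f64 {
    f64::from(units) / HWPUNIT_PER_INCH * f64::from(dpi) * scale
}

fn main() {
    // A4 is 210 mm wide, which is roughly 59528 HWPUNIT.
    let a4_width = 59_528;
    let px = hwpunit_to_px(a4_width, 96, 1.0);
    println!("A4 width at 96 DPI: {:.1} px", px);
}
```

Doubling `scale` or `dpi` doubles every pixel coordinate, so the two options can be traded against each other when targeting a fixed output size.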
| 83 | + |
| 84 | +### Advanced Formatting Access |
| 85 | + |
| 86 | +```rust |
| 87 | +// Access character and paragraph formatting |
| 88 | +for section in document.sections() { |
| 89 | + for paragraph in &section.paragraphs { |
| 90 | + // Get paragraph formatting |
| 91 | + if let Some(para_shape) = document.get_para_shape(paragraph.para_shape_id as usize) { |
| 92 | + println!("Indent: {}, Alignment: {}", |
| 93 | + para_shape.indent, |
| 94 | + para_shape.get_alignment() |
| 95 | + ); |
| 96 | + } |
| 97 | + |
| 98 | + // Get character formatting runs |
| 99 | + if let Some(char_shapes) = &paragraph.char_shapes { |
| 100 | + for pos_shape in &char_shapes.char_positions { |
| 101 | + if let Some(char_shape) = document.get_char_shape(pos_shape.char_shape_id as usize) { |
| 102 | + println!("Position {}: Size {}, Bold: {}", |
| 103 | + pos_shape.position, |
| 104 | + char_shape.base_size / 100, |
| 105 | + char_shape.is_bold() |
| 106 | + ); |
| 107 | + } |
| 108 | + } |
| 109 | + } |
| 110 | + } |
| 111 | +} |
| 112 | +``` |
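The example above prints `char_shape.base_size / 100`, which suggests the size is stored in hundredths of a point. If `base_size` is an integer (an assumption here, the field type is not shown), integer division drops fractional sizes such as 10.5 pt. A minimal sketch of a float conversion, using a hypothetical helper `base_size_to_pt` and a made-up raw value:

```rust
/// Convert a size stored in hundredths of a point to points.
/// Hypothetical helper; assumes the raw value fits in an i32.
fn base_size_to_pt(base_size: i32) -> f32 {
    base_size as f32 / 100.0
}

fn main() {
    // Made-up raw value representing 10.5 pt.
    let base_size = 1050;

    // Integer division truncates the fractional part.
    println!("integer division: {}", base_size / 100);

    // Float conversion keeps it.
    println!("float conversion: {}", base_size_to_pt(base_size));
}
```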
| 113 | + |
| 114 | +## Supported Features |
| 115 | + |
| 116 | +### Document Structure |
| 117 | +- ✅ File header and version detection |
| 118 | +- ✅ Document properties and metadata |
| 119 | +- ✅ Section definitions and page layout |
| 120 | +- ✅ Paragraph and character formatting |
| 121 | +- ✅ Font definitions (FaceName) |
| 122 | +- ✅ Styles and templates |
| 123 | + |
| 124 | +### Content Types |
| 125 | +- ✅ Text content with full Unicode support |
| 126 | +- ✅ Tables and structured data |
| 127 | +- ✅ Control objects (images, OLE objects) |
| 128 | +- ✅ Numbering and bullet lists |
| 129 | +- ✅ Tab stops and alignment |
| 130 | + |
| 131 | +### Layout and Rendering |
| 132 | +- ✅ Page dimensions and margins |
| 133 | +- ✅ Multi-column layouts |
| 134 | +- ✅ Line-by-line positioning (when available) |
| 135 | +- ✅ Character-level positioning (when available) |
| 136 | +- ✅ Borders and fill patterns |
| 137 | +- ✅ SVG export with accurate positioning |
| 138 | + |
| 139 | +### Advanced Features |
| 140 | +- ✅ Compressed document support |
| 141 | +- ✅ CFB (Compound File Binary) format handling |
| 142 | +- ✅ UTF-16LE text decoding |
| 143 | +- ✅ Error recovery and partial parsing |
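Two of the items above can be illustrated with the standard library alone. The sketch below checks for the 8-byte signature that opens every CFB container and decodes a UTF-16LE byte run into a `String`. Both functions are hypothetical helpers, not `hwpers` APIs, and real paragraph text also carries inline control codes that this sketch ignores.

```rust
/// Signature bytes at the start of a Compound File Binary container.
const CFB_SIGNATURE: [u8; 8] = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1];

/// Cheap pre-check before attempting a full parse.
fn looks_like_cfb(data: &[u8]) -> bool {
    data.starts_with(&CFB_SIGNATURE)
}

/// Decode UTF-16LE bytes. Returns None for an odd byte count
/// or for invalid surrogate sequences.
fn decode_utf16le(bytes: &[u8]) -> Option<String> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    String::from_utf16(&units).ok()
}

fn main() {
    let header = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1, 0x00, 0x00];
    println!("CFB container: {}", looks_like_cfb(&header));

    // U+D55C and U+AE00 encoded as little-endian code units.
    let raw = [0x5C, 0xD5, 0x00, 0xAE];
    println!("decoded: {:?}", decode_utf16le(&raw));
}
```

Returning `Option` instead of panicking keeps a malformed text run from aborting the rest of the parse, in line with the partial-parsing goal listed above.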
| 144 | + |
| 145 | +## Command Line Tool |
| 146 | + |
| 147 | +The library includes a command-line tool for inspecting HWP files: |
| 148 | + |
| 149 | +```bash |
| 150 | +# Install the tool |
| 151 | +cargo install hwpers |
| 152 | + |
| 153 | +# Inspect an HWP file |
| 154 | +hwp_info document.hwp |
| 155 | +``` |
| 156 | + |
| 157 | +## Format Support |
| 158 | + |
| 159 | +This library supports HWP 5.0 format files. For older HWP formats, consider using format conversion tools first. |
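To check whether a file is in the 5.0 family before parsing it, the version can be read from the `FileHeader` stream. The sketch below assumes the layout described in the HWP 5.0 specification: a 32-byte signature beginning with `HWP Document File`, then a little-endian 32-bit version in the form `0xMMnnPPrr`, then a 32-bit flags field whose lowest bit marks compression. The struct and function names are hypothetical and do not reflect how `hwpers` exposes this information; verify the offsets against the specification before relying on them.

```rust
/// Assumed signature prefix of the FileHeader stream.
const SIGNATURE: &[u8] = b"HWP Document File";

#[derive(Debug, PartialEq)]
struct HeaderInfo {
    major: u8,
    minor: u8,
    build: u8,
    revision: u8,
    compressed: bool,
}

/// Parse the first 40 bytes of a FileHeader stream (assumed layout).
fn parse_file_header(buf: &[u8]) -> Option<HeaderInfo> {
    if buf.len() < 40 || !buf.starts_with(SIGNATURE) {
        return None;
    }
    // Version and flags follow the 32-byte signature, both little-endian.
    let version = u32::from_le_bytes([buf[32], buf[33], buf[34], buf[35]]);
    let flags = u32::from_le_bytes([buf[36], buf[37], buf[38], buf[39]]);
    Some(HeaderInfo {
        major: (version >> 24) as u8,
        minor: (version >> 16) as u8,
        build: (version >> 8) as u8,
        revision: version as u8,
        compressed: flags & 1 != 0,
    })
}

fn main() {
    // Synthetic header for version 5.0.3.0 with the compression bit set.
    let mut buf = vec![0u8; 256];
    buf[..SIGNATURE.len()].copy_from_slice(SIGNATURE);
    buf[32..36].copy_from_slice(&0x0500_0300u32.to_le_bytes());
    buf[36] = 1;

    match parse_file_header(&buf) {
        Some(info) if info.major == 5 => println!("supported: {:?}", info),
        Some(info) => println!("unsupported version: {:?}", info),
        None => println!("not an HWP 5.0 FileHeader"),
    }
}
```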
| 160 | + |
| 161 | +## Contributing |
| 162 | + |
| 163 | +Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change. |
| 164 | + |
| 165 | +## License |
| 166 | + |
| 167 | +This project is licensed under either of |
| 168 | + |
| 169 | +- Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0) |
| 170 | +- MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT) |
| 171 | + |
| 172 | +at your option. |
| 173 | + |
| 174 | +## Acknowledgments |
| 175 | + |
| 176 | +- HWP file format specification by Hancom Inc. |
| 177 | +- Korean text processing community |
| 178 | +- Rust parsing and document processing ecosystem |