README.md: 38 additions & 5 deletions
@@ -1,9 +1,9 @@
 # rtfparse
 
-Parses Microsofts Rich Text Format (RTF) documents. It creates an in-memory object which represents the tree structure of the RTF document. This object can in turn be rendered by using one of the renderers.
-So far, rtfparse provides only one renderer (`Decapsulate_HTML`) which liberates the HTML code encapsulated in RTF. This will come handy, for examle, if you ever need to extract the HTML from a HTML-formatted email message saved by Microsoft Outlook.
+Parses Microsoft's Rich Text Format (RTF) documents. It creates an in-memory object which represents the tree structure of the RTF document. This object can in turn be rendered by using one of the renderers.
+So far, rtfparse provides only one renderer (`HTML_Decapsulator`) which liberates the HTML code encapsulated in RTF. This will come handy, for examle, if you ever need to extract the HTML from a HTML-formatted email message saved by Microsoft Outlook.
 
-MS Outlook also tends to use RTF compression, so the CLI of rtfparse can optionally do that, too.
+MS Outlook also tends to use RTF compression, so the CLI of rtfparse can optionally decompress that, too.
 
 You can of course write your own renderers of parsed RTF documents and consider contributing them to this project.
 
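
Note on the hunk above: the README text describes parsing RTF into an in-memory tree and rendering that tree with a renderer. A minimal, self-contained sketch of that idea follows. It is not rtfparse's implementation or API: the names `parse_rtf` and `render_plain_text` are hypothetical, the tokenizer covers only a small subset of RTF (groups, control words, control symbols, plain text), and the "renderer" simply concatenates text nodes.

```python
import re

# Token pattern for a small subset of RTF: group braces, control words
# (with an optional numeric parameter and one optional delimiting space),
# control symbols, and runs of plain text. Illustrative only; real RTF has
# more rules (for example, bare line breaks are ignored and \'hh is a hex escape).
TOKEN = re.compile(
    r"(?P<open>\{)"
    r"|(?P<close>\})"
    r"|\\(?P<word>[a-zA-Z]+)(?P<param>-?\d+)? ?"
    r"|\\(?P<symbol>[^a-zA-Z])"
    r"|(?P<text>[^\\{}]+)"
)


def parse_rtf(source: str) -> list:
    """Return a nested list with one Python list per RTF group (hypothetical helper)."""
    root: list = []
    stack = [root]  # stack[-1] is the group currently being filled
    for match in TOKEN.finditer(source):
        if match.group("open"):
            group: list = []
            stack[-1].append(group)
            stack.append(group)
        elif match.group("close"):
            if len(stack) == 1:
                raise ValueError("unbalanced closing brace")
            stack.pop()
        elif match.group("word"):
            param = match.group("param")
            value = int(param) if param else None
            stack[-1].append(("word", match.group("word"), value))
        elif match.group("symbol"):
            stack[-1].append(("symbol", match.group("symbol")))
        else:
            stack[-1].append(("text", match.group("text")))
    if len(stack) != 1:
        raise ValueError("unclosed group")
    return root


def render_plain_text(tree: list) -> str:
    """Hypothetical renderer: walk the tree depth first and join the text nodes."""
    parts = []
    for node in tree:
        if isinstance(node, list):
            parts.append(render_plain_text(node))
        elif node[0] == "text":
            parts.append(node[1])
    return "".join(parts)


sample = r"{\rtf1\ansi{\b Hello}, world}"
tree = parse_rtf(sample)
print(render_plain_text(tree))  # prints: Hello, world
```

The point of the sketch is the separation of concerns: the parser only builds the tree, and any number of renderers can walk the same tree to produce different output, which is the extension point the README invites contributions for.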
@@ -56,7 +56,9 @@ In the current version the option `--embed-img` does nothing.
 
 # Programatic usage in a Python module
 
-```
+## Decapsulate HTML from an uncompressed RTF file
+
+```py
 from pathlib import Path
 from rtfparse.parser import Rtf_Parser
 from rtfparse.renderers.html_decapsulator import HTML_Decapsulator
@@ -75,8 +77,39 @@ with open(target_path, mode="w", encoding="utf-8") as html_file:
     renderer.render(parsed, html_file)
 ```
 
+## Decapsulate HTML from an MS Outlook msg file
+
+```py
+from pathlib import Path
+from extract_msg import openMsg
+from compressed_rtf import decompress
+from io import BytesIO
+from rtfparse.parser import Rtf_Parser
+from rtfparse.renderers.html_decapsulator import HTML_Decapsulator
+
+
+source_file = Path("path/to/your/source.msg")
+target_file = Path(r"path/to/your/target.html")
+# Create parent directory of `target_path` if it does not already exist: