Merge pull request #28 from fleetingbytes/develop

fleetingbytes · web-flow · commit 77ccd98c6647 · 2024-06-21T07:53:30.000+02:00
Develop
diff --git a/.gitignore b/.gitignore
@@ -1,3 +1,8 @@
+# Test files
+target.html
+extract.py
+test.msg
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,6 +2,20 @@
 
 <!-- towncrier release notes start -->
 
+## 0.9.1 (2024-06-21)
+
+
+### Documentation
+
+- Fix old naming in readme [#22](https://github.com/fleetingbytes/rtfparse/issues/22)
+- Add example how to programmatically extract HTML from MS Outlook message [#25](https://github.com/fleetingbytes/rtfparse/issues/25)
+
+
+### Bugfixes
+
+- Don't set up logging if not using the CLI [#24](https://github.com/fleetingbytes/rtfparse/issues/24)
+- Fix possible bug in error handling [#26](https://github.com/fleetingbytes/rtfparse/issues/26)
+
 ## 0.9.0 (2024-03-11)
 
 
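The #24 entry above says logging is now set up only when rtfparse runs as a CLI. The diff does not show how that was implemented, so the following is a generic sketch of the usual Python pattern, not rtfparse's code. The library module only creates a logger with a `NullHandler`, and only the CLI entry point calls `logging.basicConfig`. The logger name `rtfparse_sketch` and the functions `parse` and `cli_main` are hypothetical.

```python
import logging

# Library side (hypothetical module): get a logger, never configure handlers or levels.
logger = logging.getLogger("rtfparse_sketch")
logger.addHandler(logging.NullHandler())


def parse(data: bytes) -> int:
    """Hypothetical library function: it logs, but leaves configuration to the caller."""
    logger.debug("parsing %d bytes", len(data))
    return len(data)


def cli_main() -> int:
    """Hypothetical CLI entry point: the only place that configures logging."""
    logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
    return parse(b"{\\rtf1}")
```

An application that imports such a library keeps full control over the root logger, because calling `parse()` adds no handlers to it.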
diff --git a/README.md b/README.md
@@ -1,9 +1,9 @@
 # rtfparse
 
-Parses Microsofts Rich Text Format (RTF) documents. It creates an in-memory object which represents the tree structure of the RTF document. This object can in turn be rendered by using one of the renderers.
-So far, rtfparse provides only one renderer (`Decapsulate_HTML`) which liberates the HTML code encapsulated in RTF. This will come handy, for examle, if you ever need to extract the HTML from a HTML-formatted email message saved by Microsoft Outlook.
+Parses Microsoft's Rich Text Format (RTF) documents. It creates an in-memory object which represents the tree structure of the RTF document. This object can in turn be rendered by using one of the renderers.
+So far, rtfparse provides only one renderer (`HTML_Decapsulator`) which liberates the HTML code encapsulated in RTF. This will come in handy, for example, if you ever need to extract the HTML from an HTML-formatted email message saved by Microsoft Outlook.
 
-MS Outlook also tends to use RTF compression, so the CLI of rtfparse can optionally do that, too.
+MS Outlook also tends to use RTF compression, so the CLI of rtfparse can optionally decompress that, too.
 
 You can of course write your own renderers of parsed RTF documents and consider contributing them to this project.
 
@@ -56,7 +56,9 @@ In the current version the option `--embed-img` does nothing.
 
 # Programmatic usage in a Python module
 
-```
+## Decapsulate HTML from an uncompressed RTF file
+
+```py
 from pathlib import Path
 from rtfparse.parser import Rtf_Parser
 from rtfparse.renderers.html_decapsulator import HTML_Decapsulator
@@ -75,8 +77,39 @@ with open(target_path, mode="w", encoding="utf-8") as html_file:
     renderer.render(parsed, html_file)
 ```
 
+## Decapsulate HTML from an MS Outlook msg file
+
+```py
+from pathlib import Path
+from extract_msg import openMsg
+from compressed_rtf import decompress
+from io import BytesIO
+from rtfparse.parser import Rtf_Parser
+from rtfparse.renderers.html_decapsulator import HTML_Decapsulator
+
+
+source_file = Path("path/to/your/source.msg")
+target_file = Path(r"path/to/your/target.html")
+# Create parent directory of `target_file` if it does not already exist:
+target_file.parent.mkdir(parents=True, exist_ok=True)
+
+# Get a decompressed RTF bytes buffer from the MS Outlook message
+msg = openMsg(source_file)
+decompressed_rtf = decompress(msg.compressedRtf)
+rtf_buffer = BytesIO(decompressed_rtf)
+
+# Parse the rtf buffer
+parser = Rtf_Parser(rtf_file=rtf_buffer)
+parsed = parser.parse_file()
+
+# Decapsulate the HTML from the parsed RTF
+decapsulator = HTML_Decapsulator()
+with open(target_file, mode="w", encoding="utf-8") as html_file:
+    decapsulator.render(parsed, html_file)
+```
+
 # RTF Specification Links
 
 * [RTF Informative References](https://learn.microsoft.com/en-us/openspecs/exchange_server_protocols/ms-oxrtfcp/85c0b884-a960-4d1a-874e-53eeee527ca6)
-* [RTF Spec 1.9.1](https://go.microsoft.com/fwlink/?LinkId=120924)
+* [RTF Specification 1.9.1](https://go.microsoft.com/fwlink/?LinkId=120924)
 * [RTF Extensions, MS-OXRTFEX](https://docs.microsoft.com/en-us/openspecs/exchange_server_protocols/ms-oxrtfex/411d0d58-49f7-496c-b8c3-5859b045f6cf)
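The README notes that Outlook usually stores the message body as compressed RTF, which the msg example above passes to `compressed_rtf.decompress`. As a minimal sketch, assuming the stream layout from MS-OXRTFCP (a 16-byte little-endian header made of COMPSIZE, RAWSIZE, COMPTYPE and CRC, where COMPTYPE is `LZFu` for compressed and `MELA` for uncompressed content), you can check which form you have before decompressing. The helpers `read_header` and `is_compressed` are hypothetical and not part of rtfparse; the sketch does not implement LZFu decompression itself.

```python
import struct
from typing import NamedTuple

# COMPTYPE magic values as they appear in the byte stream (MS-OXRTFCP).
COMPRESSED = b"LZFu"
UNCOMPRESSED = b"MELA"


class RtfHeader(NamedTuple):
    comp_size: int    # COMPSIZE: number of bytes following this field
    raw_size: int     # RAWSIZE: size of the RTF once decompressed
    comp_type: bytes  # COMPTYPE: b"LZFu" or b"MELA"
    crc: int          # CRC: 4-byte checksum field, read but not verified here


def read_header(data: bytes) -> RtfHeader:
    """Parse the fixed 16-byte header of a (possibly) compressed RTF stream."""
    if len(data) < 16:
        raise ValueError("stream is shorter than the 16-byte header")
    comp_size, raw_size = struct.unpack_from("<II", data, 0)
    comp_type = data[8:12]
    (crc,) = struct.unpack_from("<I", data, 12)
    if comp_type not in (COMPRESSED, UNCOMPRESSED):
        raise ValueError(f"unknown COMPTYPE {comp_type!r}")
    return RtfHeader(comp_size, raw_size, comp_type, crc)


def is_compressed(data: bytes) -> bool:
    """True if the header announces LZFu-compressed content."""
    return read_header(data).comp_type == COMPRESSED
```

Only the header is inspected, so this is cheap to run on every message before deciding whether to decompress.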
diff --git a/src/rtfparse/__about__.py b/src/rtfparse/__about__.py
@@ -1,4 +1,4 @@
 #!/usr/bin/env python
 
 
-__version__ = "0.9.0"
+__version__ = "0.9.1"
diff --git a/src/rtfparse/parser.py b/src/rtfparse/parser.py
@@ -93,7 +93,7 @@ def parse_file(self) -> entities.Group:
             self.parsed = entities.Group(encoding, file)
         except Exception as err:
             logger.exception(err)
-            self.parsed == Namespace()
+            self.parsed = Namespace()
             self.parsed.structure = list()
         finally:
             if self.rtf_path is not None:
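The #26 fix above replaces `==` with `=`. With `==`, the line is a comparison whose result is discarded, so the fallback `Namespace` was never stored; depending on what `self.parsed` held, the following line either raised or wrote to the wrong object. The sketch below contrasts both versions, assuming `Namespace` is `argparse.Namespace` (its import lies outside this hunk) and that `self.parsed` was `None` before parsing. `SketchParser` is a hypothetical stand-in for `Rtf_Parser`. If `self.parsed` had never been assigned at all, the buggy comparison would itself raise `AttributeError`.

```python
from argparse import Namespace


class SketchParser:
    """Hypothetical stand-in for the error branch of Rtf_Parser.parse_file()."""

    def __init__(self) -> None:
        self.parsed = None  # assumption: nothing parsed yet

    def _fail(self) -> None:
        raise ValueError("simulated parse failure")

    def parse_buggy(self) -> None:
        try:
            self._fail()
        except ValueError:
            # `==` only compares None with a new Namespace; the result is thrown away.
            self.parsed == Namespace()
            # self.parsed is still None, so this raises AttributeError.
            self.parsed.structure = list()

    def parse_fixed(self) -> None:
        try:
            self._fail()
        except ValueError:
            # `=` stores an empty Namespace as the fallback result.
            self.parsed = Namespace()
            self.parsed.structure = list()
```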
