Commit 9445a2d

chore: fix CHANGELOG formatting (#3800)
Fixes formatting in CHANGELOG.md, where most of the page rendered as bold and indented. (Verify the branch version here: https://github.com/Unstructured-IO/unstructured/blob/crag/tables-tweak/CHANGELOG.md)

Bonus tweak: u-tables-inspect.sh now adds borders for visualization more robustly, covering `<table` tags that carry attributes as well as bare `<table>` tags.
1 parent 0fe6ac6 commit 9445a2d

3 files changed, +14 -4 lines changed

Diff for: CHANGELOG.md

+11 -3
@@ -1,3 +1,11 @@
+## 0.16.9-dev0
+
+### Enhancements
+
+### Features
+
+### Fixes
+
 ## 0.16.8
 
 ### Enhancements
@@ -10,7 +18,7 @@
 ## 0.16.7
 
 ### Enhancements
-- **Add image_alt_mode to partition_html** Adds an `image_alt_mode` parameter to `partition_html()` to control how alt text is extracted from images in HTML documents. The parameter can be set to `to_text` to extract alt text as text from <img> html tags
+- **Add image_alt_mode to partition_html** Adds an `image_alt_mode` parameter to `partition_html()` to control how alt text is extracted from images in HTML documents for `html_parser_version=v2` . The parameter can be set to `to_text` to extract alt text as text from `<img>` html tags
 
 ### Features
 
@@ -20,8 +28,8 @@
 ## 0.16.6
 
 ### Enhancements
-- **Every <table> tag is considered to be ontology.Table** Added special handling for tables in HTML partitioning. This change is made to improve the accuracy of table extraction from HTML documents.
-- **Every HTML has default ontology class assigned** When parsing HTML to ontology each defined HTML in the Ontology has assigned default ontology class. This way it is possible to assign ontology class instead of UncategorizedText when the HTML tag is predicted correctly without class assigned class
+- **Every `<table>` tag is considered to be ontology.Table** Added special handling for tables in HTML partitioning (`html_parser_version=v2`. This change is made to improve the accuracy of table extraction from HTML documents.
+- **Every HTML has default ontology class assigned** When parsing HTML with `html_parser_version=v2` to ontology each defined HTML in the Ontology has assigned default ontology class. This way it is possible to assign ontology class instead of UncategorizedText when the HTML tag is predicted correctly without class assigned class
 - **Use (number of actual table) weighted average for table metrics** In evaluating table metrics the mean aggregation now uses the actual number of tables in a document to weight the metric scores
 
 ### Features

Diff for: scripts/user/u-tables-inspect.sh

+2
@@ -45,6 +45,8 @@ jq -c '.[] | select(.type == "Table") | .metadata.text_as_html' "$JSON_FILE" | w
 HTML_CONTENT=${HTML_CONTENT%\"}
 # add a border and padding to clearly see cell definition
 # shellcheck disable=SC2001
+HTML_CONTENT=$(echo "$HTML_CONTENT" | sed 's/<table /<table border="1" cellpadding="10" /')
+# shellcheck disable=SC2001
 HTML_CONTENT=$(echo "$HTML_CONTENT" | sed 's/<table>/<table border="1" cellpadding="10">/')
 # add newlines for readability in the html
 # shellcheck disable=SC2001
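
The two substitutions in this hunk can be tried in isolation. The following is a minimal sketch, not code from the commit: the function name `add_table_border` and the sample HTML strings are hypothetical, and only the two `sed` expressions are taken from the diff. It suggests why both patterns are needed: the first matches `<table ` followed by attributes, the second matches a bare `<table>`.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration only; not part of u-tables-inspect.sh.
# It applies the same two sed expressions that appear in the diff.
add_table_border() {
  local html="$1"
  # Case 1: the tag already has attributes, e.g. <table class="grid">.
  html=$(printf '%s\n' "$html" | sed 's/<table /<table border="1" cellpadding="10" /')
  # Case 2: the tag is bare, i.e. exactly <table>.
  html=$(printf '%s\n' "$html" | sed 's/<table>/<table border="1" cellpadding="10">/')
  printf '%s\n' "$html"
}

# Sample inputs (made up for this sketch).
add_table_border '<table><tr><td>a</td></tr></table>'
add_table_border '<table class="grid"><tr><td>a</td></tr></table>'
```

Without a `g` flag, each expression rewrites only the first match on a line, so a line holding several `<table` tags would get a border on the first one only.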

Diff for: unstructured/__version__.py

+1 -1
@@ -1 +1 @@
-__version__ = "0.16.8" # pragma: no cover
+__version__ = "0.16.9-dev0" # pragma: no cover
