Commit 238f985
authored
feat: add --images support to unstructured-get-json.sh (#3888)
E.g., now can run:
```bash
# extracts base64 encoded image data for `Table` and `Image` elements
$ unstructured-get-json.sh --trace --verbose --images /t/docs/Captur-1317-5_ENG-p5.pdf
# also extracts `Title` elements (see screenshot)
$ IMAGE_BLOCK_TYPES='"title","table","image"' unstructured-get-json.sh --trace --verbose --images /t/docs/Captur-1317-5_ENG-p5.pdf
```
It was discovered during testing that "narrativetext" does not work,
probably due to camel casing of NarrativeText 😬
1 parent b5b1307 commit 238f985
1 file changed
+10
-1
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
17 | 17 | | |
18 | 18 | | |
19 | 19 | | |
| 20 | + | |
20 | 21 | | |
21 | 22 | | |
22 | 23 | | |
| |||
39 | 40 | | |
40 | 41 | | |
41 | 42 | | |
| 43 | + | |
42 | 44 | | |
43 | 45 | | |
44 | 46 | | |
| |||
68 | 70 | | |
69 | 71 | | |
70 | 72 | | |
| 73 | + | |
71 | 74 | | |
72 | 75 | | |
73 | 76 | | |
| |||
100 | 103 | | |
101 | 104 | | |
102 | 105 | | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
103 | 110 | | |
104 | 111 | | |
105 | 112 | | |
| |||
180 | 187 | | |
181 | 188 | | |
182 | 189 | | |
| 190 | + | |
| 191 | + | |
183 | 192 | | |
184 | 193 | | |
185 | 194 | | |
186 | 195 | | |
187 | 196 | | |
188 | | - | |
| 197 | + | |
189 | 198 | | |
190 | 199 | | |
191 | 200 | | |
| |||
0 commit comments