Commit 2ee3cac
feat: add metadata model hierarchy (#408)
* feat: add metadata model hierarchy
Signed-off-by: Panos Vagenas <[email protected]>
* add deprecation, add first migration
Signed-off-by: Panos Vagenas <[email protected]>
* extend annotations migration
Signed-off-by: Panos Vagenas <[email protected]>
* update with feedback
Signed-off-by: Panos Vagenas <[email protected]>
* expose main prediction
Signed-off-by: Panos Vagenas <[email protected]>
* ideas on enforcing separation between standard and custom fields
Signed-off-by: Panos Vagenas <[email protected]>
* add custom field setter method
Signed-off-by: Panos Vagenas <[email protected]>
* update Markdown serialization
Signed-off-by: Panos Vagenas <[email protected]>
* revert description, add include_non_meta, showcase custom serializer for summaries
Signed-off-by: Panos Vagenas <[email protected]>
* simplify customization
Signed-off-by: Panos Vagenas <[email protected]>
* fix reference exclusion
Signed-off-by: Panos Vagenas <[email protected]>
* eliminate serialization duplication between meta & (legacy) annotations
Signed-off-by: Panos Vagenas <[email protected]>
* remove old file
Signed-off-by: Panos Vagenas <[email protected]>
* fix item used in get_parts for meta ser
Signed-off-by: Panos Vagenas <[email protected]>
* serialize GroupItem meta prior to content, DocItem meta after content
Signed-off-by: Panos Vagenas <[email protected]>
* restore ser order for all nodeitems
Signed-off-by: Panos Vagenas <[email protected]>
* move meta serialization into DocSerializer.serialize() to maintain seamless chunking integration
Signed-off-by: Panos Vagenas <[email protected]>
* add allow- & block-lists for meta names, add std field name enum
Signed-off-by: Panos Vagenas <[email protected]>
* add HTML serializer, document meta field names, rename SMILES field
Signed-off-by: Panos Vagenas <[email protected]>
* bump DoclingDocument version
Signed-off-by: Panos Vagenas <[email protected]>
* make TabularChartMetaField.title optional, expose new classes through __init__.py, add MetaUtils
Signed-off-by: Panos Vagenas <[email protected]>
* add DocTags serialization, revert smiles to smi to prevent confusion with plural
Signed-off-by: Panos Vagenas <[email protected]>
---------
Signed-off-by: Panos Vagenas <[email protected]>
1 parent a3feae0, commit 2ee3cac
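The bullets "ideas on enforcing separation between standard and custom fields" and "add custom field setter method" describe keeping user-defined metadata from colliding with the standard fields. The following is a minimal sketch of one way to do that, assuming custom keys are namespaced as `<namespace>__<name>`. `SketchMeta`, `set_custom_field`, `get_custom_field`, and `NAMESPACE_DELIMITER` are hypothetical names chosen for illustration, not docling-core's actual API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Hypothetical delimiter separating a namespace from a custom field name.
NAMESPACE_DELIMITER = "__"


@dataclass
class SketchMeta:
    """Illustrative metadata holder: fixed standard fields plus namespaced custom ones."""

    summary: Optional[str] = None  # stands in for a "standard" field
    _custom: Dict[str, Any] = field(default_factory=dict)

    # Names of standard fields; none contains NAMESPACE_DELIMITER, so custom
    # keys built by set_custom_field() can never shadow them.
    STANDARD_FIELDS = ("summary",)

    def set_custom_field(self, namespace: str, name: str, value: Any) -> str:
        """Store `value` under '<namespace>__<name>' and return that key."""
        if not namespace or not name or NAMESPACE_DELIMITER in namespace:
            raise ValueError(f"invalid namespace or name: {namespace!r}, {name!r}")
        key = f"{namespace}{NAMESPACE_DELIMITER}{name}"
        self._custom[key] = value
        return key

    def get_custom_field(self, key: str) -> Any:
        """Look up a custom field by its full namespaced key."""
        return self._custom[key]

    def as_dict(self) -> Dict[str, Any]:
        """Return set standard fields first, then custom fields in insertion order."""
        out: Dict[str, Any] = {
            name: getattr(self, name)
            for name in self.STANDARD_FIELDS
            if getattr(self, name) is not None
        }
        out.update(self._custom)
        return out


if __name__ == "__main__":
    meta = SketchMeta(summary="A bar chart of quarterly sales")
    key = meta.set_custom_field("my_org", "reviewed", True)
    print(key)  # my_org__reviewed
    print(meta.as_dict())
    # {'summary': 'A bar chart of quarterly sales', 'my_org__reviewed': True}
```

Routing all custom values through a single setter is what makes the separation enforceable here: every custom key contains the delimiter, and no standard field name does.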
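The bullet "add allow- & block-lists for meta names, add std field name enum" suggests that serialization can be restricted to selected metadata fields. The sketch below shows the general idea under stated assumptions: `StdMetaName`, `select_meta`, and `meta_to_markdown` are hypothetical, the enum members are invented, and the `- name: value` output is a made-up format, not docling-core's Markdown serialization.

```python
from enum import Enum
from typing import Any, Dict, Iterable, Optional, Union


class StdMetaName(str, Enum):
    """Hypothetical enum of standard meta field names (members invented for illustration)."""

    SUMMARY = "summary"
    MOLECULE = "molecule"
    TABULAR_CHART = "tabular_chart"


NameLike = Union[str, StdMetaName]


def _as_name(name: NameLike) -> str:
    """Normalize an enum member to its string value; pass plain strings through."""
    return name.value if isinstance(name, Enum) else name


def select_meta(
    meta: Dict[str, Any],
    allowed: Optional[Iterable[NameLike]] = None,
    blocked: Iterable[NameLike] = (),
) -> Dict[str, Any]:
    """Keep fields whose name is allowed (all, if `allowed` is None), minus blocked ones."""
    allowed_set = None if allowed is None else {_as_name(n) for n in allowed}
    blocked_set = {_as_name(n) for n in blocked}
    return {
        name: value
        for name, value in meta.items()
        if (allowed_set is None or name in allowed_set) and name not in blocked_set
    }


def meta_to_markdown(meta: Dict[str, Any]) -> str:
    """Render each field as a '- name: value' bullet (a made-up output format)."""
    return "\n".join(f"- {name}: {value}" for name, value in meta.items())


if __name__ == "__main__":
    meta = {
        "summary": "Sales by quarter",
        "molecule": "CCO",
        "my_org__reviewed": True,
    }
    print(meta_to_markdown(select_meta(meta, blocked=[StdMetaName.MOLECULE])))
    # - summary: Sales by quarter
    # - my_org__reviewed: True
    print(select_meta(meta, allowed=[StdMetaName.SUMMARY]))
    # {'summary': 'Sales by quarter'}
```

In this sketch the blocklist is applied after the allowlist, so a name listed in both is dropped; the commit message does not say how docling-core resolves that case.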
80 files changed (+2721, -258 lines). Directories listed in the file tree:
- docling_core
- transforms/serializer
- types/doc
- docs
- test
- data
- chunker
- docling_document/unit
- doc
- legacy_doc
Hunks in the first changed file (file name and line contents are hidden on the page):

| Hunk | Original lines | New lines | Added lines (new numbering) |
|---|---|---|---|
| 1 | 9-14 | 9-15 | 12 |
| 2 | 258-263 | 259-265 | 262 |
| 3 | 267-272 | 269-283 | 272-280 |
| 4 | 287-292 | 298-323 | 301-320 |