Commit b9e6f34

docs: remove documentation in markdown to support python 3.13 (#43)
Since json-schema-for-humans dependency does not support python 3.13, remove the generation of documentation in markdown of main docling types.
Remove 'ds' prefix from documentation scripts.
Update README.
Add python 3.13 in CI/CD workflow checks.

Signed-off-by: Cesar Berrospi Ramis <[email protected]>
1 parent 1b30a74 commit b9e6f34

11 files changed: +99 -13189 lines changed
.github/workflows/checks.yml

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: ['3.9', '3.10', '3.11', '3.12']
+        python-version: ['3.9', '3.10', '3.11', '3.12', '3.13']
     steps:
       - uses: actions/checkout@v3
       - uses: ./.github/actions/setup-poetry

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ repos:
     hooks:
       - id: docs
         name: Docs
-        entry: poetry run ds_generate_docs docs
+        entry: poetry run generate_docs docs
         pass_filenames: false
         language: system
         files: '\.py$'

README.md

Lines changed: 8 additions & 8 deletions
@@ -1,7 +1,7 @@
 # Docling Core
 
 [![PyPI version](https://img.shields.io/pypi/v/docling-core)](https://pypi.org/project/docling-core/)
-![Python](https://img.shields.io/badge/python-3.9%20%7C%203.10%20%7C%203.11%20%7C%203.12-blue)
+![Python](https://img.shields.io/badge/python-3.9%20%7C%203.10%20%7C%20%203.11%20%7C%203.12%20%7C%203.13-blue)
 [![Poetry](https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json)](https://python-poetry.org/)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 [![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)
@@ -21,7 +21,7 @@ pip install docling-core
 
 ### Development setup
 
-To develop for Docling Core, you need Python 3.9 / 3.10 / 3.11 / 3.12 and Poetry. You can then install from your local clone's root dir:
+To develop for Docling Core, you need Python 3.9 / 3.10 / 3.11 / 3.12 / 3.13 and Poetry. You can then install from your local clone's root dir:
 ```bash
 poetry install
 ```
@@ -45,14 +45,14 @@ poetry run pytest test
   Document.model_validate_json(data_str)
   ```
 
-- You can generate the JSON schema of a model with the script `ds_generate_jsonschema`.
+- You can generate the JSON schema of a model with the script `generate_jsonschema`.
 
   ```py
   # for the `Document` type
-  ds_generate_jsonschema Document
+  generate_jsonschema Document
 
   # for the use `Record` type
-  ds_generate_jsonschema Record
+  generate_jsonschema Record
   ```
 
 ## Documentation
@@ -61,12 +61,12 @@ Docling supports 3 main data types:
 
 - **Document** for publications like books, articles, reports, or patents. When Docling converts an unstructured PDF document, the generated JSON follows this schema.
   The Document type also models the metadata that may be attached to the converted document.
-  Check [Document](docs/Document.md) for the full JSON schema.
+  Check [Document](docs/Document.json) for the full JSON schema.
 - **Record** for structured database records, centered on an entity or _subject_ that is provided with a list of attributes.
   Related to records, the statements can represent annotations on text by Natural Language Processing (NLP) tools.
-  Check [Record](docs/Record.md) for the full JSON schema.
+  Check [Record](docs/Record.json) for the full JSON schema.
 - **Generic** for any data representation, ensuring minimal configuration and maximum flexibility.
-  Check [Generic](docs/Generic.md) for the full JSON schema.
+  Check [Generic](docs/Generic.json) for the full JSON schema.
 
 The data schemas are defined using [pydantic](https://pydantic-docs.helpmanual.io/) models, which provide built-in processes to support the creation of data that adhere to those models.
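
Since the README now links to raw JSON Schema files instead of rendered markdown, a reader may want a quick way to inspect one. The following is a minimal sketch, not part of this commit: the helper `list_schema_properties` is hypothetical, and the schema it reads is a tiny stand-in written to a temporary directory, not the real `docs/Document.json`.

```python
import json
import tempfile
from pathlib import Path


def list_schema_properties(schema_path: Path) -> list:
    """Return the sorted top-level property names of a JSON Schema file."""
    with open(schema_path, encoding="utf8") as schema_file:
        schema = json.load(schema_file)
    return sorted(schema.get("properties", {}))


# Stand-in schema (NOT the real Document schema); it only makes the sketch runnable.
STAND_IN_SCHEMA = {
    "title": "Document",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "description": {"type": "object"},
    },
}


def demo() -> list:
    """Write the stand-in schema to a temporary file and list its properties."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / "Document.json"
        path.write_text(json.dumps(STAND_IN_SCHEMA, indent=2), encoding="utf8")
        return list_schema_properties(path)


if __name__ == "__main__":
    print(demo())
```

Pointing `list_schema_properties` at a generated file such as `docs/Document.json` should work the same way, assuming that file has a top-level `properties` key.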

docling_core/utils/ds_generate_docs.py

Lines changed: 0 additions & 144 deletions
This file was deleted.

docling_core/utils/generate_docs.py

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
+#
+# Copyright IBM Corp. 2024 - 2024
+# SPDX-License-Identifier: MIT
+#
+
+"""Generate documentation of Docling types as JSON schema.
+
+Example:
+    python docling_core/utils/generate_docs.py /tmp/docling_core_files
+"""
+import argparse
+import json
+import os
+from argparse import BooleanOptionalAction
+from pathlib import Path
+from shutil import rmtree
+from typing import Final
+
+from docling_core.utils.generate_jsonschema import generate_json_schema
+
+MODELS: Final = ["Document", "Record", "Generic"]
+
+
+def _prepare_directory(folder: str, clean: bool = False) -> None:
+    """Create a directory or empty its content if it already exists.
+
+    Args:
+        folder: The name of the directory.
+        clean: Whether any existing content in the directory should be removed.
+    """
+    if os.path.isdir(folder):
+        if clean:
+            for path in Path(folder).glob("**/*"):
+                if path.is_file():
+                    path.unlink()
+                elif path.is_dir():
+                    rmtree(path)
+    else:
+        os.makedirs(folder, exist_ok=True)
+
+
+def generate_collection_jsonschema(folder: str):
+    """Generate the JSON schema of Docling collections and export them to a folder.
+
+    Args:
+        folder: The name of the directory.
+    """
+    for item in MODELS:
+        json_schema = generate_json_schema(item)
+        with open(
+            os.path.join(folder, f"{item}.json"), mode="w", encoding="utf8"
+        ) as json_file:
+            json.dump(json_schema, json_file, ensure_ascii=False, indent=2)
+
+
+def main() -> None:
+    """Generate the JSON Schema of Docling collections and export documentation."""
+    argparser = argparse.ArgumentParser()
+    argparser.add_argument(
+        "directory",
+        help=(
+            "Directory to generate files. If it exists, any existing content will be"
+            " removed."
+        ),
+    )
+    argparser.add_argument(
+        "--clean",
+        help="Whether any existing content in directory should be removed.",
+        action=BooleanOptionalAction,
+        dest="clean",
+        default=False,
+        required=False,
+    )
+    args = argparser.parse_args()
+
+    _prepare_directory(args.directory, args.clean)
+
+    generate_collection_jsonschema(args.directory)
+
+
+if __name__ == "__main__":
+    main()
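
For readers unfamiliar with `argparse.BooleanOptionalAction` (available since Python 3.9), the sketch below shows how the `--clean` flag in the new script is likely to behave. It rebuilds only the two arguments in isolation; `build_parser` is a hypothetical helper, not a function in the commit.

```python
import argparse
from argparse import BooleanOptionalAction


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the same two arguments as the script's main()."""
    parser = argparse.ArgumentParser()
    parser.add_argument("directory")
    # BooleanOptionalAction registers both --clean and --no-clean.
    parser.add_argument("--clean", action=BooleanOptionalAction, default=False)
    return parser


parser = build_parser()
default_args = parser.parse_args(["docs"])
clean_args = parser.parse_args(["docs", "--clean"])
no_clean_args = parser.parse_args(["docs", "--no-clean"])

if __name__ == "__main__":
    print(default_args.clean, clean_args.clean, no_clean_args.clean)
```

With `default=False`, existing files in the target directory are kept unless `--clean` is passed, so the help text of the positional argument ("any existing content will be removed") seems to describe only the `--clean` case.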

docling_core/utils/ds_generate_jsonschema.py renamed to docling_core/utils/generate_jsonschema.py

Lines changed: 4 additions & 4 deletions
@@ -6,7 +6,7 @@
 """Generate the JSON Schema of pydantic models and export them to files.
 
 Example:
-    python docling_core/utils/ds_generate_jsonschema.py legacy_doc.base.TableCell
+    python docling_core/utils/generate_jsonschema.py doc.document.TableCell
 
 """
 import argparse
@@ -27,10 +27,10 @@ def _import_class(class_reference: str) -> Any:
 
 
 def generate_json_schema(class_reference: str) -> Union[dict, None]:
-    """Generate a jsonable dict of a model's schema from DS data types.
+    """Generate a jsonable dict of a model's schema from a data type.
 
     Args:
-        class_reference: The reference to a class in 'src.data_types'.
+        class_reference: The reference to a class in 'docling_core.types'.
 
     Returns:
         A jsonable dict of the model's schema.
@@ -48,7 +48,7 @@ def main() -> None:
     """Print the JSON Schema of a model."""
     argparser = argparse.ArgumentParser()
     argparser.add_argument(
-        "class_ref", help="Class reference, e.g., doc.document.TableCell"
+        "class_ref", help="Class reference, e.g., doc.document.TableCell"
     )
     args = argparser.parse_args()
 
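
The body of `_import_class` is not shown in this diff. As an illustration only, the sketch below shows one common way to resolve a dotted class reference such as `doc.document.TableCell` relative to a base package. The function `import_class` and its `base_package` parameter are assumptions, and the demo resolves a standard-library class so that it runs without `docling_core` installed.

```python
import importlib
from typing import Any


def import_class(class_reference: str, base_package: str = "") -> Any:
    """Resolve 'pkg.module.ClassName', optionally relative to a base package.

    Hypothetical helper; the real _import_class may work differently.
    """
    module_path, _, class_name = class_reference.rpartition(".")
    if base_package:
        # Prefix the base package, or use it alone for a bare class name.
        module_path = f"{base_package}.{module_path}" if module_path else base_package
    if not module_path:
        raise ValueError(f"No module given in reference: {class_reference!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Demo with a standard-library class instead of a Docling type.
decoder_cls = import_class("decoder.JSONDecoder", base_package="json")

if __name__ == "__main__":
    print(decoder_cls)
```

Under this assumption, a bare name like `Document` with `base_package="docling_core.types"` would be looked up directly on the base package, which matches how `generate_docs.py` passes the names in `MODELS`.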
