Commit a860a12

FEAT Parse model cards with pandoc (#257)
Use pandoc to load existing model cards.
1 parent f9d443b commit a860a12

22 files changed, +1973 -3 lines changed

.github/workflows/build-test.yml (+3)

@@ -62,6 +62,9 @@ jobs:
 then pip install --pre --extra-index https://pypi.anaconda.org/scipy-wheels-nightly/simple scikit-learn;
 else pip install "scikit-learn~=${{ matrix.sklearn_version }}";
 fi
+if [ ${{ matrix.os }} == "ubuntu-latest" ];
+then wget -q https://github.com/jgm/pandoc/releases/download/2.19.2/pandoc-2.19.2-1-amd64.deb && sudo dpkg -i pandoc-2.19.2-1-amd64.deb;
+fi
 python --version
 pip --version
 pip list
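
The workflow step above installs pandoc only on `ubuntu-latest`, so code or tests that depend on pandoc presumably need to detect whether the binary is present. The following is a minimal sketch of such a check and is not taken from this commit: `parse_pandoc_version` and `find_pandoc_version` are hypothetical helper names, and the sketch assumes that the first line printed by `pandoc --version` looks like `pandoc 2.19.2`.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_pandoc_version(version_output: str) -> Optional[Tuple[int, ...]]:
    """Extract a version tuple from the text printed by `pandoc --version`.

    Assumes the first line has the form "pandoc X.Y.Z"; returns None otherwise.
    """
    lines = version_output.splitlines()
    if not lines:
        return None
    match = re.match(r"pandoc(?:\.exe)?\s+(\d+(?:\.\d+)*)", lines[0])
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def find_pandoc_version() -> Optional[Tuple[int, ...]]:
    """Return the installed pandoc version, or None if pandoc is not on PATH."""
    executable = shutil.which("pandoc")
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True, check=True
    )
    return parse_pandoc_version(result.stdout)
```

Version tuples compare numerically, so `(2, 9) < (2, 19, 2)` holds, which a plain string comparison would get wrong. A test suite could skip pandoc-dependent tests whenever `find_pandoc_version()` returns `None`.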

.pre-commit-config.yaml (+1)

@@ -6,6 +6,7 @@ repos:
 exclude: .github/conda/meta.yaml
 - id: end-of-file-fixer
 - id: trailing-whitespace
+exclude: skops/card/tests/examples
 - id: check-case-conflict
 - id: check-merge-conflict
 - repo: https://github.com/psf/black

docs/changes.rst (+3)

@@ -20,6 +20,9 @@ v0.5
 enabled, will result in the Hugging Face inference API running with Intel's
 scikit-learn intelex library, which can accelerate inference times. :pr:`267`
 by `Benjamin Bossan`_.
+- Model cards that have been written into a markdown file can now be parsed back
+into a :class:`skops.card.Card` object and edited further by using the
+:func:`skops.card.parse_modelcard` function. :pr:`257` by `Benjamin Bossan`_.
 
 v0.4
 ----

docs/model_card.rst (+48 -2)

@@ -11,6 +11,9 @@ beginning of it, following with the content of the model card in markdown
 format. The metadata section is used to make models searchable on the Hub, and
 get the inference API and the widgets on the website working.
 
+Metadata
+--------
+
 The metadata part of the file needs to follow the specifications `here
 <https://huggingface.co/docs/hub/models-cards#model-card-metadata>`__. It
 includes simple attributes of your models such as the task you're solving,
@@ -40,6 +43,9 @@ Here's an example of the metadata section of the ``README.md`` file:
 ``skops`` creates this section of the file for you, and you almost never need
 to touch it yourself.
 
+Model Card Content
+------------------
+
 The markdown part does not necessarily need to follow any specification in
 terms of information passed, which gives the user a lot of flexibility. The
 markdown part of the ``README.md`` file comes with a couple of defaults provided
@@ -90,8 +96,8 @@ as well as adding some subsections with plots below that, you can call the
 })
 
 Furthermore, you can select existing sections (as well as their subsections)
-using :meth:`Card.select`, and you can delete sections using
-:meth:`Card.delete`:
+using :meth:`.Card.select`, and you can delete sections using
+:meth:`.Card.delete`:
 
 .. code-block:: python
 
@@ -103,3 +109,43 @@ using :meth:`Card.select`, and you can delete sections using
 
 To see how you can use the API in ``skops`` to create a model card, please
 refer to :ref:`sphx_glr_auto_examples_plot_model_card.py`.
+
+Saving and Loading Model Cards
+------------------------------
+
+Once you have finished creating and modifying the model card, you can save it
+using the :meth:`.Card.save` method:
+
+.. code-block:: python
+
+card.save("README.md")
+
+This renders the content of the model card to markdown format and stores it in
+the indicated file. It is now ready to be uploaded to Hugging Face Hub.
+
+If you have a finished model card but want to load to make some modifications,
+you can use the function :func:`skops.card.parse_modelcard`. This function
+parses the model card back into a :class:`.Card` instance that you can work on
+further:
+
+.. code-block:: python
+
+from skops import card
+model_card = card.parse_modelcard("README.md")
+model_card.add(**{"A new section": "Some new content"})
+model_card.save("README.md")
+
+When the card is parsed, some minor details of the model card can change, e.g.
+if you used different column alignment than the default, this could change, as
+well as removing excess empty lines or trailing whitespace. However, the content
+itself should be exactly the same. All known deviations are documented in the
+`parse_modelcard docs
+<https://skops.readthedocs.io/en/stable/modules/classes.html#skops.card.parse_modelcard>`_
+
+For the parsing part, we rely on `pandoc <https://pandoc.org/>`_. If you haven't
+installed it, please follow `these instructions
+<https://pandoc.org/installing.html>`_. The advantage of using pandoc is that
+it's a very mature library and that it supports many different document formats.
+Therefore, it should be possible to parse model cards even if they use a format
+that's not markdown, for instance reStructuredText, org, or asciidoc. For
+saving, we only support markdown for now.
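
The documentation added above says that parsing may remove excess empty lines and trailing whitespace while the content itself stays the same. One way to check a round trip with that tolerance is to normalize both texts before comparing them. The helpers below are a hypothetical illustration and not part of skops. They do not account for other deviations such as changed table column alignment, and they would also alter whitespace inside fenced code blocks.

```python
import re


def normalize_markdown(text: str) -> str:
    """Strip trailing whitespace and collapse runs of empty lines into one."""
    # Remove trailing spaces and tabs on every line.
    lines = [line.rstrip() for line in text.splitlines()]
    # Drop leading and trailing empty lines of the whole document.
    stripped = "\n".join(lines).strip("\n")
    # Three or more newlines in a row mean two or more empty lines; keep one.
    collapsed = re.sub(r"\n{3,}", "\n\n", stripped)
    return collapsed + "\n"


def same_content(before: str, after: str) -> bool:
    """Compare two markdown texts, ignoring the whitespace differences above."""
    return normalize_markdown(before) == normalize_markdown(after)
```

With these helpers, the text of a card before parsing and the text written by a later save could be compared with `same_content(original_text, saved_text)`, where both arguments are the file contents read as strings.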

skops/_min_dependencies.py (+1)

@@ -25,6 +25,7 @@
 "sphinx-prompt": ("1.3.0", "docs", None),
 "sphinx-issues": ("1.2.0", "docs", None),
 "matplotlib": ("3.3", "docs, tests", None),
+"packaging": ("17.0", "install", None),
 "pandas": ("1", "docs, tests", None),
 # required for persistence tests of external libraries
 "lightgbm": ("3", "tests", None),

skops/card/__init__.py (+2 -1)

@@ -1,3 +1,4 @@
 from ._model_card import Card, metadata_from_config
+from ._parser import parse_modelcard
 
-__all__ = ["Card", "metadata_from_config"]
+__all__ = ["Card", "metadata_from_config", "parse_modelcard"]
