
Commit b1e2312

Merge pull request #165 from ThibaudReal/doc/misc
Documentation: update doc for shapash report
2 parents cf03fab + 09fffb5 commit b1e2312


12 files changed, +312 / -115 lines

README.md

Lines changed: 6 additions & 4 deletions
@@ -25,13 +25,13 @@
 <img src="https://img.shields.io/pypi/l/shapash" alt="license">
 </a>
 <!-- Doc -->
-<a href="https://readthedocs.org/projects/shapash/badge/?version=latest">
+<a href="https://shapash.readthedocs.io/en/latest/">
 <img src="https://readthedocs.org/projects/shapash/badge/?version=latest" alt="doc">
 </a>
 </p>
 
 
-🎉 **We just released Shapash 1.3.0 that includes the generation of a standalone HTML report that constitutes a basis of an audit document.** 🎉
+🎉 **We just released Shapash 1.3.0 that includes the generation of a standalone HTML report that constitutes a basis of an audit document. [See an example here](https://shapash.readthedocs.io/en/latest/report.html) that was generated [using this tutorial.](https://github.com/MAIF/shapash/blob/master/tutorial/report/tuto-shapash-report01.ipynb)** 🎉
 
 ## 🔍 Overview
 
@@ -86,12 +86,12 @@ Shapash also contributes to data science auditing by displaying usefull informat
 
 - Deploy interpretability part of your project: From model training to deployment (API or Batch Mode)
 
-- Contribute to the **auditability of your model** by generating a **standalone HTML report** of your projects
+- Contribute to the **auditability of your model** by generating a **standalone HTML report** of your projects. [Report Example](https://shapash.readthedocs.io/en/latest/report.html)
 >We hope that this report will bring a valuable support to auditing models and data related to a better AI governance.
 Data Scientists can now deliver to anyone who is interested in their project **a document that freezes different aspects of their work as a basis of an audit report**.
 This document can be easily shared across teams (internal audit, DPO, risk, compliance...).
 
-<a href="https://shapash-demo.ossbymaif.fr/">
+<a href="https://shapash.readthedocs.io/en/latest/report.html">
 <p align="center">
 <img src="https://raw.githubusercontent.com/MAIF/shapash/master/docs/_static/shapash-report-demo.gif" width="800" title="report-demo">
 </p>
@@ -167,6 +167,8 @@ xpl.generate_report(
 )
 ```
 
+[Report Example](https://shapash.readthedocs.io/en/latest/report.html)
+
 - Step 5: From training to deployment : SmartPredictor Object
 > Shapash provides a SmartPredictor object to deploy the summary of local explanation for the operational needs.
 It is an object dedicated to deployment, lighter than SmartExplainer with additional consistency checks.

docs/index.html

Lines changed: 22 additions & 2 deletions
@@ -57,8 +57,9 @@
 <p class="intro">
 Shapash is a Python library dedicated to the interpretability of Data Science models. It provides several
 types of visualization that display explicit labels that everyone can understand. Data Scientists can more
-easily understand their models and share their results. End users can understand the suggestion proposed by
-a model using a summary of the most influential criteria.
+easily understand their models, share their results and easily document their projects in an HTML report.
+End users can understand the suggestion proposed by a model using a summary of the most influential
+criteria.
 </p>
 <div class="details">
 <h1>Features</h1>
@@ -75,6 +76,7 @@ <h1>Features</h1>
 <li>Usable for Regression, Binary Classification or Multiclass</li>
 <li>Compatible with most of sklearn, lightgbm, catboost, xgboost models</li>
 <li>Relevant for exploration and also deployment (through an API or in Batch mode)</li>
+<li>Freeze different aspects of a data science project as a basis of an audit report</li>
 </ul>
 </div>
 </div>
@@ -106,6 +108,24 @@ <h2>high adaptability</h2>
 </div>
 </a>
 </div>
+<div class="col-md-12 col-xs-4">
+<a href="https://shapash-demo.ossbymaif.fr/">
+<div class="outer-circle">
+<div class="inner-circle">
+<p>WEBAPP</p>
+</div>
+</div>
+</a>
+</div>
+<div class="col-md-12 col-xs-4">
+<a href="https://shapash.readthedocs.io/en/latest/report.html">
+<div class="outer-circle">
+<div class="inner-circle">
+<p>REPORT</p>
+</div>
+</div>
+</a>
+</div>
 </div>
 </div>
 </div>

docs/index.rst

Lines changed: 8 additions & 0 deletions
@@ -63,6 +63,13 @@ The project was developed by **MAIF** Data Scientists.
 |Cheap |0.8524|Ground living area square feet| 1188| 0.9421|Remodel date | 1959| 0.4234|Overall material and finish of the house| 5| 0.3785|Full bathrooms above grade| 1| 0.3738|Number of fireplaces | 0| 0.1687|Rating of basement finished area |Average Rec Room| 0.1302|Wood deck area in square feet| 0| 0.1225|
 +--------------------+------+------------------------------+-------+--------------+------------------------------+--------------+--------------+----------------------------------------+-------+--------------+--------------------------+-------+--------------+------------------------------------------+-------------+--------------+----------------------------------------+----------------+--------------+-----------------------------+--------------------------+--------------+
 
+- To freeze different aspects of a data science project as a basis of an audit report
+
+.. image:: ./_static/shapash-report-demo.gif
+   :width: 700px
+   :align: center
+   :target: https://shapash.readthedocs.io/en/latest/report.html
+
 - To discuss results: **Shapash** allows Data Scientists to easily share and discuss their results with non-Data users
 
 **Shapash** features:
@@ -80,6 +87,7 @@ The project was developed by **MAIF** Data Scientists.
 - Usable for Regression, Binary Classification or Multiclass
 - Compatible with most of sklearn, lightgbm, catboost, xgboost models
 - Relevant for exploration and **also** deployment (through an API or in Batch mode) for operational use
+- Freeze different aspects of a data science project as a basis of an audit report
 
 
 **Shapash** is easy to install and use:

docs/tutorials/index.rst

Lines changed: 1 addition & 0 deletions
@@ -15,4 +15,5 @@ This part offers a series of tutorials and allows users to gradually discover th
    encoder
    postprocess
    explainer
+   report
    predictor

docs/tutorials/report.rst

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
.. explainer:

Generate a standalone report of your project
============================================

.. toctree::
   :maxdepth: 2


   tuto-shapash-report01.rst
Lines changed: 254 additions & 0 deletions
@@ -0,0 +1,254 @@
Shapash Report
==============

The Shapash Report feature allows data scientists to deliver to
anyone who is interested in their project **a document that freezes
different aspects of their work as a basis of an audit report**. This
document can be easily shared across teams and does not require
anything other than a working internet connection.

| The Shapash ``generate_report`` method generates a report of your
  project.
| The result is a standalone HTML file that does not require any
  external dependency or server to work.
| The only requirement for the document to display properly is an active
  internet connection.
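
The report is a single HTML file, but it still needs an internet connection to display. To see what it loads from the network, you can list the remote resources it references. The tutorial does not say what is fetched. The sketch below uses only Python's standard `html.parser` module and reports `src` attributes on any tag plus `href` on `<link>` tags when they point to remote URLs. The helper name `external_resources` is hypothetical and is not part of Shapash.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# URL prefixes treated as "remote" in this sketch (protocol-relative URLs included)
REMOTE_PREFIXES = ("http://", "https://", "//")


class _RemoteRefCollector(HTMLParser):
    """Collect (tag, url) pairs for resources loaded from remote URLs."""

    def __init__(self) -> None:
        super().__init__()
        self.refs: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # src on any tag (script, img, iframe...) or href on <link> (stylesheets, icons)
            is_resource_attr = name == "src" or (tag == "link" and name == "href")
            if is_resource_attr and value and value.startswith(REMOTE_PREFIXES):
                self.refs.append((tag, value))


def external_resources(html_text: str) -> List[Tuple[str, str]]:
    """Return the remote resources referenced by an HTML document (hypothetical helper)."""
    collector = _RemoteRefCollector()
    collector.feed(html_text)
    collector.close()
    return collector.refs


if __name__ == "__main__":
    # With a real report, read the file written by generate_report instead, e.g.
    # open("output/report.html", encoding="utf-8").read()
    sample = ('<link rel="stylesheet" href="https://cdn.example.com/style.css">'
              '<a href="https://example.com">docs</a>'
              '<script src="https://cdn.example.com/lib.js"></script>'
              '<img src="local.png">')
    print(external_resources(sample))
```

Plain `<a href>` links are ignored on purpose. They are navigation targets, not resources the page needs in order to render.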

The report contains the following information:

1. General information about the project
2. Description of the dataset used
3. Documentation about data preparation and feature engineering
4. Details about the model used (library, parameters…)
5. Exploration of the data with a focus on the difference between train and test sets
6. Global explainability of the model
7. Model performance

The first three sections are generated from a YML file that the user
fills in. An example is available
`here <https://github.com/MAIF/shapash/blob/master/tutorial/report/utils/project_info.yml>`__.

This tutorial presents an example of how to generate the Shapash
Report.

Content:

- Set up an example project
- Create and fill in the project information that will be displayed in the report
- Generate the base Shapash Report
- *Go further*: Generate a custom report

Data from Kaggle `House
Prices <https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data>`__

Note: you may need to download the HTML report and open it locally
in your browser; otherwise, it may not display properly.

.. code:: ipython3

    import pandas as pd
    from category_encoders import OrdinalEncoder
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

Building a Supervised Model
---------------------------

.. code:: ipython3

    from shapash.data.data_loader import data_loading
    house_df, house_dict = data_loading('house_prices')
    y_df = house_df['SalePrice']
    X_df = house_df[house_df.columns.difference(['SalePrice'])]

.. code:: ipython3

    from category_encoders import OrdinalEncoder

    categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']

    encoder = OrdinalEncoder(
        cols=categorical_features,
        handle_unknown='ignore',
        return_df=True).fit(X_df)

    X_df = encoder.transform(X_df)

.. code:: ipython3

    Xtrain, Xtest, ytrain, ytest = train_test_split(X_df, y_df, train_size=0.75, random_state=1)

.. code:: ipython3

    regressor = RandomForestRegressor(n_estimators=50).fit(Xtrain, ytrain)

.. code:: ipython3

    y_pred = pd.DataFrame(regressor.predict(Xtest), columns=['pred'], index=Xtest.index)

Fill in your project information
--------------------------------

**The next step is to create a YML file containing information about
your project.**

| We will use the example file available
  `here <https://github.com/MAIF/shapash/blob/master/tutorial/report/utils/project_info.yml>`__.
| **You are welcome to use this file as a template for your own
  report.**

The information contained in the YML file is displayed below:

.. code:: ipython3

    import yaml

    with open(r'utils/project_info.yml') as file:
        project_info = yaml.full_load(file)

    print(yaml.dump(project_info, sort_keys=False))

--------------

**If you want to create your own custom file:**

| The keys of the YML file are the titles of the different sections in
  the report.
| The YML file must follow this format:

.. code:: yaml

    Title of section 1:
      property1 name: property1 value
      property2 name: property2 value
      ...
    Title of section 2:
      property1 name: property1 value
      ...

..

   Note that the **date** can be computed automatically using the *auto*
   property value (see the example above).
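
The format above is a two-level mapping: each section title maps to a set of property name/value pairs. The sketch below checks that a loaded project-info dictionary has that shape and replaces an `auto` date with today's date. You could run it on the dict returned by `yaml.full_load` above. `prepare_project_info` is a hypothetical helper, not Shapash's implementation. It also makes two assumptions: that the auto-computable property is any key named `date` (case-insensitive), and that the date is written in ISO format.

```python
import datetime
from typing import Any, Dict, Optional


def prepare_project_info(info: Dict[str, Any],
                         today: Optional[datetime.date] = None) -> Dict[str, Dict[str, Any]]:
    """Check the 'section title -> {property name: value}' layout and resolve 'auto' dates.

    Hypothetical helper: matching a property named 'date' case-insensitively and
    writing the date in ISO format are assumptions of this sketch.
    """
    today = today or datetime.date.today()
    resolved: Dict[str, Dict[str, Any]] = {}
    for section, properties in info.items():
        if not isinstance(properties, dict):
            raise ValueError(f"Section {section!r} must map property names to values")
        resolved[section] = {}
        for name, value in properties.items():
            if str(name).lower() == "date" and value == "auto":
                value = today.isoformat()
            resolved[section][name] = value
    return resolved  # a new dict; the input is left unchanged


if __name__ == "__main__":
    # Placeholder content mirroring the template format above
    example = {
        "Title of section 1": {"property1 name": "property1 value", "date": "auto"},
        "Title of section 2": {"property1 name": "property1 value"},
    }
    print(prepare_project_info(example, today=datetime.date(2021, 3, 1)))
```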

Generate your report
--------------------

Declare and compile SmartExplainer object
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: ipython3

    from shapash.explainer.smart_explainer import SmartExplainer

.. code:: ipython3

    xpl = SmartExplainer(features_dict=house_dict)  # Optional parameter: labels used for feature names

.. code:: ipython3

    xpl.compile(
        x=Xtest,
        model=regressor,
        preprocessing=encoder,  # Optional: the compile step can use the inverse_transform method
        y_pred=y_pred  # Optional
    )

At this step, the model can be checked and inspected using the different
methods of the SmartExplainer object we just created.

Please refer to the other tutorials for more information.

Generate the base Shapash Report
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Next, we can generate the report using the ``generate_report`` method of
our SmartExplainer object.

We need to pass the ``x_train``, ``y_train`` and ``y_test`` parameters in
order to explore the data used when training the model.

Please refer to the documentation for a full description of the
parameters.
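
Item 5 of the report contents listed at the top of this tutorial is a data exploration focused on the differences between the train and test sets. The sketch below gives a rough idea of that kind of comparison using only the standard `statistics` module. For each numeric feature, it summarizes both splits and reports the gap between their means. This is not what Shapash actually computes. The helper name `compare_splits` and the toy feature names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def compare_splits(train: Dict[str, List[float]],
                   test: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Per-feature mean/std on each split plus the gap between means (hypothetical helper).

    Only features present in both splits are compared; the order of `train` is kept.
    """
    summary: Dict[str, Dict[str, float]] = {}
    for feature in (f for f in train if f in test):
        tr, te = train[feature], test[feature]
        summary[feature] = {
            "train_mean": mean(tr),
            "test_mean": mean(te),
            "train_std": pstdev(tr),
            "test_std": pstdev(te),
            "mean_gap": mean(te) - mean(tr),
        }
    return summary


if __name__ == "__main__":
    # Toy values, not taken from the House Prices dataset
    train = {"living_area": [1000.0, 2000.0], "remodel_year": [1959.0, 2005.0]}
    test = {"living_area": [1600.0, 2000.0], "remodel_year": [1970.0, 2000.0]}
    for feature, stats in compare_splits(train, test).items():
        print(feature, stats)
```

With pandas objects such as ``Xtrain`` and ``Xtest``, comparing ``Xtrain.describe()`` with ``Xtest.describe()`` gives a similar side-by-side view.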

.. code:: ipython3

    xpl.generate_report(
        output_file='output/report.html',
        project_info_file='utils/project_info.yml',
        x_train=Xtrain,
        y_train=ytrain,
        y_test=ytest,
        title_story="House prices report",
        title_description="""This document is a data science report of the kaggle house prices tutorial project.
            It was generated using the Shapash library.""",
        metrics=[
            {
                'path': 'sklearn.metrics.mean_absolute_error',
                'name': 'Mean absolute error',
            },
            {
                'path': 'sklearn.metrics.mean_squared_error',
                'name': 'Mean squared error',
            }
        ]
    )
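
The ``metrics`` argument is a list of dictionaries. Each one gives the dotted import ``path`` of a function and the display ``name`` used in the report. One way to consume such a parameter is to import the function from its path and call it on the true and predicted values, as the ``importlib`` sketch below does. The helper names are hypothetical, and this is not necessarily how Shapash resolves metrics internally. To keep the demo dependency-free, it uses ``math.dist`` (Euclidean distance, Python 3.8+) in place of a scikit-learn metric. With scikit-learn installed, ``'sklearn.metrics.mean_absolute_error'`` resolves the same way.

```python
import importlib
from typing import Callable, Dict, List, Sequence


def resolve_callable(dotted_path: str) -> Callable:
    """Import 'package.module.function' and return the function object (hypothetical helper)."""
    module_path, _, attr_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"Expected a dotted path, got {dotted_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


def compute_metrics(metrics: List[Dict[str, str]],
                    y_true: Sequence[float],
                    y_pred: Sequence[float]) -> Dict[str, float]:
    """Evaluate each {'path': ..., 'name': ...} entry on (y_true, y_pred)."""
    results = {}
    for metric in metrics:
        func = resolve_callable(metric["path"])
        # Fall back to the dotted path when no display name is given
        results[metric.get("name", metric["path"])] = func(y_true, y_pred)
    return results


if __name__ == "__main__":
    # math.dist is used only because it ships with Python and takes (y_true, y_pred)
    print(compute_metrics([{"path": "math.dist", "name": "Euclidean distance"}],
                          [3.0, 0.0], [0.0, 4.0]))
```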

Customize your own report
-------------------------

Now let’s customize our report by adding some new sections.

To do so:

- First, **copy the base report notebook** you can find
  `here <https://github.com/MAIF/shapash/blob/master/shapash/report/base_report.ipynb>`__.
  This is the notebook that is used to generate the Shapash report. It is
  executed and then converted to an HTML file. Only the output of each
  cell is kept and the code is deleted.
- Then, delete or add cells depending on what you want to change.
- Finally, add the parameter
  ``notebook_path="path/to/your/custom/report.ipynb"`` to the
  ``generate_report`` method.
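
When the base report notebook is converted to HTML, only the cell outputs are kept. The sketch below illustrates the "outputs only" part on a notebook loaded as JSON. It assumes the nbformat 4 layout, in which ``cells`` is a list of cells, each with a ``cell_type`` and a ``source``, and code cells also carry ``outputs``. The sketch blanks the source of code cells and leaves their outputs untouched. It is not Shapash's conversion code, and the helper name ``strip_code_sources`` is hypothetical.

```python
import copy
import json
from typing import Any, Dict


def strip_code_sources(notebook: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an nbformat-4 notebook where code cells keep only their outputs."""
    stripped = copy.deepcopy(notebook)  # never modify the caller's notebook
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["source"] = ""  # hide the code; cell["outputs"] is left untouched
    return stripped


if __name__ == "__main__":
    nb = {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": "# Title"},
            {"cell_type": "code", "metadata": {}, "execution_count": 1,
             "source": "print('hello')",
             "outputs": [{"output_type": "stream", "name": "stdout", "text": "hello\n"}]},
        ],
    }
    print(json.dumps(strip_code_sources(nb)["cells"][1], indent=1))
```

For the HTML conversion itself, nbconvert provides a ``--no-input`` option that excludes input cells from the output, e.g. ``jupyter nbconvert --to html --no-input report.ipynb``. The tutorial does not state which mechanism Shapash uses.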

**Tip**: You can use the ``working_dir`` parameter to work
inside your custom notebook more easily before using the ``generate_report``
method. This way, you can load the parameters that papermill uses inside
the notebook. Replace the ``dir_path`` inside your custom notebook
with your own ``working_dir``, where the different instances used are saved.

For our simple example, we created `this
notebook <https://github.com/MAIF/shapash/blob/master/tutorial/report/utils/custom_report.ipynb>`__:

- We removed the multivariate analysis using
  ``report.display_dataset_analysis(multivariate_analysis=False)`` (see
  the notebook utils/custom_report.ipynb for more information).
- It includes new sections **Relationship with target variable** and
  **Relationship between training variables**, in which we included new
  simple graphs for this example.
- We also added new cells at the end of the **metrics** section.

Next, we use this notebook to generate our new custom report:

.. code:: ipython3

    xpl.generate_report(
        output_file='output/custom_report.html',
        project_info_file='utils/project_info.yml',
        x_train=Xtrain,
        y_train=ytrain,
        y_test=ytest,
        title_story="House prices report",
        title_description="""This document is a data science report of the kaggle house prices tutorial project.
            It was generated using the Shapash library.""",
        metrics=[
            {
                'path': 'sklearn.metrics.mean_absolute_error',
                'name': 'Mean absolute error',
            },
            {
                'path': 'sklearn.metrics.mean_squared_error',
                'name': 'Mean squared error',
            }
        ],
        working_dir='working',
        notebook_path="utils/custom_report.ipynb"
    )

