|
| 1 | +Shapash Report |
| 2 | +============== |
| 3 | + |
| 4 | + The Shapash Report feature allows data scientists to deliver to |
| 5 | + anyone who is interested in their project **a document that freezes |
| 6 | + different aspects of their work as a basis for an audit report**. This |
| 7 | + document can be easily shared across teams and requires nothing |
| 8 | + other than a working internet connection. |
| 9 | + |
| 10 | +| The Shapash ``generate_report`` method generates a report of |
| 11 | + your project. |
| 12 | +| The result is a standalone HTML file that does not require any |
| 13 | + external dependency or server to work. |
| 14 | +| The only requirement for the document to display properly is an active |
| 15 | + internet connection. |
| 16 | +
|
| 17 | +The report contains the following information: |
| 18 | + |
| 19 | +1. General information about the project |
| 20 | +2. Description of the dataset used |
| 21 | +3. Documentation about data preparation and feature engineering |
| 22 | +4. Details about the model used (library, parameters…) |
| 23 | +5. Exploration of the data with a focus on the difference between train and test sets |
| 24 | +6. Global explainability of the model |
| 25 | +7. Model performance |
| 26 | + |
| 27 | + The first three points are generated from a YML file that the user |
| 28 | + fills in. An example is available |
| 29 | + `here <https://github.com/MAIF/shapash/blob/master/tutorial/report/utils/project_info.yml>`__. |
| 30 | + |
| 31 | +This tutorial presents an example of how one can generate the Shapash |
| 32 | +Report. |
| 33 | + |
| 34 | +Content: |
| 35 | + |
| 36 | +- Set up an example project |
| 37 | +- Create and fill your project information that will be displayed in the report |
| 38 | +- Generate the base Shapash Report |
| 39 | +- *Go further*: Generate a custom report |
| 40 | + |
| 41 | +Data from Kaggle `House |
| 42 | +Prices <https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data>`__ |
| 43 | + |
| 44 | + Note: you may need to download the HTML report locally and open it |
| 45 | + in your browser; otherwise it may not display properly. |
| 46 | + |
| 47 | +.. code:: ipython3 |
| 48 | +
|
| 49 | + import pandas as pd |
| 50 | + from category_encoders import OrdinalEncoder |
| 51 | + from sklearn.ensemble import RandomForestRegressor |
| 52 | + from sklearn.model_selection import train_test_split |
| 53 | +
|
| 54 | +Building a Supervised Model |
| 55 | +--------------------------- |
| 56 | + |
| 57 | +.. code:: ipython3 |
| 58 | +
|
| 59 | + from shapash.data.data_loader import data_loading |
| 60 | + house_df, house_dict = data_loading('house_prices') |
| 61 | + y_df = house_df['SalePrice'] |
| 62 | + X_df = house_df[house_df.columns.difference(['SalePrice'])] |
| 63 | +
|
| 64 | +.. code:: ipython3 |
| 65 | +
|
| 66 | + from category_encoders import OrdinalEncoder |
| 67 | + |
| 68 | + categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object'] |
| 69 | + |
| 70 | + encoder = OrdinalEncoder( |
| 71 | +     cols=categorical_features, |
| 72 | +     handle_unknown='ignore', |
| 73 | +     return_df=True).fit(X_df) |
| 74 | + |
| 75 | + X_df = encoder.transform(X_df) |
| 76 | +
|
| 77 | +.. code:: ipython3 |
| 78 | +
|
| 79 | + Xtrain, Xtest, ytrain, ytest = train_test_split(X_df, y_df, train_size=0.75, random_state=1) |
| 80 | +
|
| 81 | +.. code:: ipython3 |
| 82 | +
|
| 83 | + regressor = RandomForestRegressor(n_estimators=50).fit(Xtrain, ytrain) |
| 84 | +
|
| 85 | +.. code:: ipython3 |
| 86 | +
|
| 87 | + y_pred = pd.DataFrame(regressor.predict(Xtest), columns=['pred'], index=Xtest.index) |
| 88 | +
|
| 89 | +Fill your project information |
| 90 | +----------------------------- |
| 91 | + |
| 92 | +**The next step is to create a YML file containing information about |
| 93 | +your project.** |
| 94 | + |
| 95 | +| We will use the example file available |
| 96 | + `here <https://github.com/MAIF/shapash/blob/master/tutorial/report/utils/project_info.yml>`__. |
| 97 | +| **You are welcome to use this file as a template for your own |
| 98 | + report.** |
| 99 | +
|
| 100 | +The information contained in the YML file is displayed below: |
| 101 | + |
| 102 | +.. code:: ipython3 |
| 103 | +
|
| 104 | + import yaml |
| 105 | + |
| 106 | + with open('utils/project_info.yml') as file: |
| 107 | +     project_info = yaml.safe_load(file)  # safe_load is sufficient for plain key/value data |
| 108 | + |
| 109 | + print(yaml.dump(project_info, sort_keys=False)) |
| 110 | +
|
| 111 | +-------------- |
| 112 | + |
| 113 | +**If you want to create your own custom file:** |
| 114 | + |
| 115 | +| The keys of the YML file are the titles of the different sections in |
| 116 | + the report. |
| 117 | +| The YML file must follow this format: |
| 118 | +
|
| 119 | +.. code:: yaml |
| 120 | +
|
| 121 | + Title of section 1: |
| 122 | + property1 name: property1 value |
| 123 | + property2 name: property2 value |
| 124 | + ... |
| 125 | + Title of section 2: |
| 126 | + property1 name: property1 value |
| 127 | + ... |
| 128 | +
|
| 129 | +.. |
| 130 | +
|
| 131 | + Note that the **date** can be computed automatically using the *auto* |
| 132 | + property value (see example above). |
| 133 | + |
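The section and property layout described above can also be produced programmatically. The sketch below is only an illustration that relies on the standard library: the helper name ``to_project_info_yaml``, the section titles and the property names (including the ``Date: auto`` entry) are hypothetical, and the hand-written serializer only handles plain one-line string values. For real files, a YAML library such as PyYAML is a safer choice.

```python
def to_project_info_yaml(sections):
    """Render {section title: {property name: value}} as two-level YAML text.

    Assumption: titles, names and values are plain one-line strings that
    need no YAML quoting or escaping.
    """
    lines = []
    for title, properties in sections.items():
        lines.append(f"{title}:")
        for name, value in properties.items():
            lines.append(f"  {name}: {value}")
    return "\n".join(lines) + "\n"


# Hypothetical content; replace it with your own sections and properties.
project_info = {
    "General information": {
        "Project name": "House prices",
        "Date": "auto",  # assumed spelling of the automatic date value
    },
    "Dataset information": {
        "Source": "Kaggle House Prices",
    },
}

yaml_text = to_project_info_yaml(project_info)
print(yaml_text)
```

Each top-level key becomes a section title in the report, and each nested key/value pair becomes one property of that section.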
| 134 | +Generate your report |
| 135 | +-------------------- |
| 136 | + |
| 137 | +Declare and compile SmartExplainer object |
| 138 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 139 | + |
| 140 | +.. code:: ipython3 |
| 141 | +
|
| 142 | + from shapash.explainer.smart_explainer import SmartExplainer |
| 143 | +
|
| 144 | +.. code:: ipython3 |
| 145 | +
|
| 146 | + xpl = SmartExplainer(features_dict=house_dict) # Optional: maps feature names to readable labels |
| 147 | +
|
| 148 | +.. code:: ipython3 |
| 149 | +
|
| 150 | + xpl.compile( |
| 151 | +     x=Xtest, |
| 152 | +     model=regressor, |
| 153 | +     preprocessing=encoder, # Optional: compile step can use inverse_transform method |
| 154 | +     y_pred=y_pred # Optional |
| 155 | + ) |
| 156 | +
|
| 157 | +At this step the model can be checked and inspected using different |
| 158 | +methods of the SmartExplainer object we just created. |
| 159 | + |
| 160 | +Please refer to the other tutorials for more information. |
| 161 | + |
| 162 | +Generate the base Shapash Report |
| 163 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 164 | + |
| 165 | +Next we can generate the report using the ``generate_report`` method of |
| 166 | +our SmartExplainer object. |
| 167 | + |
| 168 | +We need to pass ``x_train``, ``y_train`` and ``y_test`` parameters in |
| 169 | +order to explore the data used when training the model. |
| 170 | + |
| 171 | +Please refer to the documentation for a full description of the |
| 172 | +parameters. |
| 173 | + |
| 174 | +.. code:: ipython3 |
| 175 | +
|
| 176 | + xpl.generate_report( |
| 177 | +     output_file='output/report.html', |
| 178 | +     project_info_file='utils/project_info.yml', |
| 179 | +     x_train=Xtrain, |
| 180 | +     y_train=ytrain, |
| 181 | +     y_test=ytest, |
| 182 | +     title_story="House prices report", |
| 183 | +     title_description="""This document is a data science report of the kaggle house prices tutorial project. |
| 184 | + It was generated using the Shapash library.""", |
| 185 | +     metrics=[ |
| 186 | +         { |
| 187 | +             'path': 'sklearn.metrics.mean_absolute_error', |
| 188 | +             'name': 'Mean absolute error', |
| 189 | +         }, |
| 190 | +         { |
| 191 | +             'path': 'sklearn.metrics.mean_squared_error', |
| 192 | +             'name': 'Mean squared error', |
| 193 | +         } |
| 194 | +     ] |
| 195 | + ) |
| 196 | +
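Each entry of ``metrics`` names a function by its dotted import path. The sketch below shows one way such a path can be resolved to a callable in Python. It is an illustration under assumptions, not necessarily how Shapash resolves the path internally: the helper name ``resolve_dotted_path`` is hypothetical, and a standard-library function stands in for the scikit-learn metrics so that the example runs without extra dependencies.

```python
import importlib


def resolve_dotted_path(path):
    """Import 'package.module.attribute' and return the attribute."""
    module_name, _, attribute = path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected a dotted path, got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# Hypothetical metric entry in the same {'path': ..., 'name': ...} shape.
metric = {"path": "statistics.mean", "name": "Mean"}
metric_function = resolve_dotted_path(metric["path"])
result = metric_function([1.0, 2.0, 3.0])
print(metric["name"], result)
```

Passing a path instead of a function object keeps the metric description a plain, serializable value, which is convenient when the parameters have to be handed over to a notebook.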
|
| 197 | +Customize your own report |
| 198 | +------------------------- |
| 199 | + |
| 200 | +Now let’s customize our report by adding some new sections. |
| 201 | + |
| 202 | +To do so: |
| 203 | + |
| 204 | +- First, **copy the base report notebook**, which you can find |
| 205 | +  `here <https://github.com/MAIF/shapash/blob/master/shapash/report/base_report.ipynb>`__. This is the notebook that is used to generate the Shapash report. It is executed and then converted to an HTML file. Only the output of each cell is kept and the code is deleted. |
| 206 | +- Then, delete or add cells depending on what you want to change. |
| 207 | +- Finally, add the parameter |
| 208 | +  ``notebook_path="path/to/your/custom/report.ipynb"`` in the |
| 209 | +  ``generate_report`` method. |
| 210 | + |
| 211 | + **Tip**: You can use the ``working_dir`` parameter to work more easily |
| 212 | + inside your custom notebook before calling the ``generate_report`` |
| 213 | + method. This way you can load the parameters that papermill uses inside |
| 214 | + the notebook. Replace the ``dir_path`` inside your custom notebook |
| 215 | + with your own ``working_dir``, where the different instances used are |
| 216 | + saved. |
| 217 | + |
| 218 | +For our simple example, we created `this |
| 219 | +notebook <https://github.com/MAIF/shapash/blob/master/tutorial/report/utils/custom_report.ipynb>`__: |
| 220 | + |
| 221 | +- We removed the multivariate analysis using |
| 222 | +  ``report.display_dataset_analysis(multivariate_analysis=False)`` (see the notebook utils/custom_report.ipynb for more information). |
| 223 | +- It includes the new sections **Relationship with target variable** and **Relationship |
| 224 | +  between training variables**, in which we included new simple graphs for this example. |
| 225 | +- We also added new cells at the end of the **metrics** |
| 226 | +  section. |
| 227 | + |
| 228 | +Next, we use this notebook to generate our new custom report: |
| 229 | + |
| 230 | +.. code:: ipython3 |
| 231 | +
|
| 232 | + xpl.generate_report( |
| 233 | +     output_file='output/custom_report.html', |
| 234 | +     project_info_file='utils/project_info.yml', |
| 235 | +     x_train=Xtrain, |
| 236 | +     y_train=ytrain, |
| 237 | +     y_test=ytest, |
| 238 | +     title_story="House prices report", |
| 239 | +     title_description="""This document is a data science report of the kaggle house prices tutorial project. |
| 240 | + It was generated using the Shapash library.""", |
| 241 | +     metrics=[ |
| 242 | +         { |
| 243 | +             'path': 'sklearn.metrics.mean_absolute_error', |
| 244 | +             'name': 'Mean absolute error', |
| 245 | +         }, |
| 246 | +         { |
| 247 | +             'path': 'sklearn.metrics.mean_squared_error', |
| 248 | +             'name': 'Mean squared error', |
| 249 | +         } |
| 250 | +     ], |
| 251 | +     working_dir='working', |
| 252 | +     notebook_path="utils/custom_report.ipynb" |
| 253 | + ) |
| 254 | +
|