sib-swiss
diff --git a/01_data_manipulation_and_representation.ipynb b/01_data_manipulation_and_representation.ipynb
Lines changed: 77 additions & 24 deletions
@@ -120,7 +120,7 @@
    "source": [
     "import pandas as pd\n",
     "\n",
-    "df = pd.read_table(\"data/swiss_census_1880.csv\" , sep=',') \n",
+    "df = pd.read_table(\"data/swiss_census_1880.csv\"  , sep=',') \n",
     "#try to see what happens when sep has a different value\n",
     "\n",
     "df.head() # this returns the 5 first lines of the table"
@@ -198,6 +198,15 @@
     "df['Foreigner']"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df.Foreigner"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -257,6 +266,15 @@
     "maskVD.value_counts()"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "maskVD"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -300,15 +318,27 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "**micro-exercise :** Select towns with less than 1000 inhabitants (column `Total`), (*optional*: display only town name and number of inhabitants)"
+    "**micro-exercise :** Select towns with less than 1000 inhabitants (column `Total`), (*optional*: display only town name and number of inhabitants)\n",
+    "\n",
+    "<details style=\"border: 2px solid #B8C3EA; margin: 1em 0.2em;padding: 0.5em; cursor: pointer;\"><summary>👁 View solution </summary>\n",
+    "\n",
+    "```python\n",
+    "df.loc[ df.Total < 1000 , : ]\n",
+    "```\n",
+    "\n",
+    " \n",
+    "</details>\n",
+    "\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
-   "source": []
+   "source": [
+    "## write your solution to the micro-exercise here...\n"
+   ]
   },
   {
    "cell_type": "markdown",
@@ -400,7 +430,17 @@
    "outputs": [],
    "source": [
     "# %load -r 9- solutions/solution_01_01.py\n",
-    "#2. Create a new column is the `DataFrame` representing the fraction of population which is Reformed in each town."
+    "# 2. Create a new column in the DataFrame representing the fraction of population \n",
+    "# which is Reformed in each town. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# optional : What is the minimum/maximum value for this fraction?"
    ]
   },
   {
@@ -632,6 +672,7 @@
     "plotWithMeanMedianMode( dfFractions['0-14 y.o.'] , ax=axes[0])\n",
     "plotWithMeanMedianMode( dfFractions['Foreigner'] , ax=axes[1])\n",
     "plotWithMeanMedianMode( dfFractions['Reformed'] , ax=axes[2])\n",
+    "f.tight_layout()\n",
     "plt.show()"
    ]
   },
@@ -651,20 +692,13 @@
     "3. plot the distribution of the fraction of catholics in the canton of Zurich."
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": []
-  },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
-    "# %load -r 1-7 solutions/solution_01_02.py\n",
+    "# %load -r 1-11 solutions/solution_01_02.py\n",
     "# 1. plot the distribution of the total number of habitants. Try to choose an appropriate mode of representation (histogram, density line? number of bins?)"
    ]
   },
@@ -674,7 +708,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# %load -r 10-14 solutions/solution_01_02.py\n",
+    "# %load -r 12-17 solutions/solution_01_02.py\n",
     "# 2. try to call `sns.histplot` twice in a row, once with to plot the fraction of Foreigner and the other for the fraction of Swiss. What happens?"
    ]
   },
@@ -684,7 +718,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# %load -r 15- solutions/solution_01_02.py\n",
+    "# %load -r 18- solutions/solution_01_02.py\n",
     "# 3. plot the distribution of the fraction of catholics in the canton of Zurich."
    ]
   },
@@ -720,6 +754,13 @@
     "We will also create a column that describes the main religion and main languague for each town:"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -778,6 +819,15 @@
     "dfFractions.groupby('majority language')['Catholic'].mean() ## mean fraction of caholics in towns depending on the majority language"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "dfFractions['Catholic'].mean()"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -789,7 +839,7 @@
     "    indexsmallestTown = data.Total.idxmin()\n",
     "    return data['town name'][indexsmallestTown] , data.Total[indexsmallestTown]\n",
     "\n",
-    "grouped.apply(getSmallestTown) ## name and population of the town with minimal number of inhabitants for each canton"
+    "grouped.apply(getSmallestTown, include_groups=False) ## name and population of the town with minimal number of inhabitants for each canton"
    ]
   },
   {
@@ -805,7 +855,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "sns.catplot( x = 'majority language' , y='Catholic' ,  data=dfFractions)"
+    "sns.catplot( y = 'majority language' , x='Catholic' ,  data=dfFractions)"
    ]
   },
   {
@@ -911,7 +961,8 @@
    "outputs": [],
    "source": [
     "sns.catplot( x = 'German speakers' , y='majority religion' , \n",
-    "            data=dfFractions , kind = 'violin' ,height=2, aspect=5 )"
+    "            data=dfFractions , kind = 'violin' ,height=2, aspect=5, inner=None  )\n",
+    "#NB: I use inner=None to remove the little boxplot inside the violin"
    ]
   },
   {
@@ -932,7 +983,7 @@
    "outputs": [],
    "source": [
     "sns.catplot( x = 'German speakers' , y='majority religion' , \n",
-    "            data=dfFractions , kind = 'violin' ,height=2, aspect=5 , cut=0)"
+    "            data=dfFractions , kind = 'violin' ,height=2, aspect=5 , cut=0, inner=None )"
    ]
   },
   {
@@ -965,9 +1016,9 @@
    "outputs": [],
    "source": [
     "sns.catplot( x = 'German speakers' , y='majority religion' , data=dfFractions , height=2, aspect=5,\n",
-    "            kind = 'bar' , ci='sd').set(title='standard deviation')\n",
+    "            kind = 'bar' , errorbar='sd').set(title='standard deviation')\n",
     "sns.catplot( x = 'German speakers' , y='majority religion' , data=dfFractions , height=2, aspect=5,\n",
-    "            kind = 'bar' , ci=95 ).set(title='95% confidence interval')\n"
+    "            kind = 'bar' , errorbar=('ci',95) ).set(title='95% confidence interval')\n"
    ]
   },
   {
@@ -1076,7 +1127,9 @@
    "execution_count": null,
    "metadata": {},
    "outputs": [],
-   "source": []
+   "source": [
+    "df['canton name'].unique()"
+   ]
   },
   {
    "cell_type": "code",
@@ -1221,9 +1274,9 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Environment (conda_py38)",
+   "display_name": "Environment (conda_py311)",
    "language": "python",
-   "name": "conda_py38"
+   "name": "conda_py311"
   },
   "language_info": {
    "codemirror_mode": {
@@ -1235,7 +1288,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.8"
+   "version": "3.11.0"
   }
  },
  "nbformat": 4,