Commit fffc1d6

Merge pull request #7 from calculquebec/rev-tech-summary
Revision of the technical summaries
2 parents edb5df4 + dc99af7 commit fffc1d6

4 files changed

Lines changed: 36 additions & 40 deletions

src/01-dataframe.ipynb

Lines changed: 22 additions & 8 deletions
@@ -166,7 +166,7 @@
  "outputs": [],
  "source": [
  "# Note that pd.read_csv is used because we imported pandas as pd\n",
- "surveys_df = pd.read_csv(\"../data/surveys.csv\")"
+ "surveys_df = pd.read_csv('../data/surveys.csv')"
  ]
  },
  {
@@ -182,7 +182,7 @@
  "outputs": [],
  "source": [
  "# Note that pd.read_csv is used because we imported pandas as pd\n",
- "pd.read_csv(\"../data/surveys.csv\")"
+ "pd.read_csv('../data/surveys.csv')"
  ]
  },
  {
@@ -972,14 +972,21 @@
  " * **Sélection** : `df['nom_colonne']`\n",
  " * **Méthodes** :\n",
  " * Statistiques descriptives :\n",
- " `count()`, `mean()`, `std()`, `min()`, `median()`, `max()`\n",
- " * Autres : `describe()`, `nunique()`, `unique()`\n",
+ " * `count()`, `mean()`, `std()`\n",
+ " * `min()`, `median()`, `max()`\n",
+ " * `nunique()`, `unique()`\n",
+ " * Sommaire statistique : `describe()`\n",
  "* **Grouper selon les valeurs** d'une ou plusieurs colonnes :\n",
  " * `groupby(nom_col)`\n",
  " * `groupby([nom_col1, nom_col2])`\n",
+ " * Statistiques descriptives : `aggregate([fonction1, ...])`\n",
  "* **Tableaux croisés dynamiques**\n",
  " * Transformation selon les valeurs de l'index : `unstack()`\n",
- " * Aggrégation dans un tableau croisé dynamique : `pivot_table()`"
+ " * Aggrégation dans un tableau croisé dynamique : `pivot_table()`\n",
+ " * `values=colX`\n",
+ " * `index=[col_ind]`\n",
+ " * `columns=[categorie1, categorie2]`\n",
+ " * `aggfunc=fonction` (défaut: moyenne)"
  ]
  },
  {
@@ -1001,14 +1008,21 @@
  " * **Selection**: `df['column_name']`\n",
  " * **Methods**:\n",
  " * Descriptive statistics:\n",
- " `count()`, `mean()`, `std()`, `min()`, `median()`, `max()`\n",
- " * Others: `describe()`, `nunique()`, `unique()`\n",
+ " * `count()`, `mean()`, `std()`\n",
+ " * `min()`, `median()`, `max()`\n",
+ " * `nunique()`, `unique()`\n",
+ " * Statistical summary: `describe()`\n",
  "* **Grouping by values** of one or many columns:\n",
  " * `groupby(column_name)`\n",
  " * `groupby([column_name1, column_name2])`\n",
+ " * Descriptive statistics: `aggregate([function1, ...])`\n",
  "* **Pivot tables**\n",
  " * Reshaping a DataFrame from values in the index: `unstack()`\n",
- " * Aggregation in a pivot table: `pivot_table()`"
+ " * Aggregation in a pivot table: `pivot_table()`\n",
+ " * `values=colX`\n",
+ " * `index=[col_ind]`\n",
+ " * `columns=[category1, category2]`\n",
+ " * `aggfunc=function` (default: mean)"
  ]
  },
  {
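
Review note: a minimal sketch of the calls the revised summary for `01-dataframe.ipynb` lists, namely `groupby()` with `aggregate()`, then `unstack()` and `pivot_table()`. The small DataFrame and its column names (`plot_id`, `sex`, `weight`) are assumptions made for illustration; they are not the contents of `../data/surveys.csv`.

```python
import pandas as pd

# Hypothetical miniature table standing in for the survey data (assumed columns).
surveys_df = pd.DataFrame({
    "plot_id": [1, 1, 2, 2, 2],
    "sex": ["F", "M", "F", "F", "M"],
    "weight": [40.0, 48.0, 30.0, 34.0, 50.0],
})

# Several descriptive statistics per group in a single call.
stats = surveys_df.groupby("sex")["weight"].aggregate(["count", "mean", "max"])

# Group on two columns, then move the inner index level ("sex") into columns.
unstacked = surveys_df.groupby(["plot_id", "sex"])["weight"].mean().unstack()

# pivot_table() groups and reshapes in one step; aggfunc is spelled out here
# even though the mean is the default.
pivot = surveys_df.pivot_table(
    values="weight", index="plot_id", columns="sex", aggfunc="mean"
)

print(stats)
print(pivot)
```

In this sketch `unstacked` and `pivot` hold the same mean weights; the difference is only that `pivot_table()` does the grouping and the reshaping in one call.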

src/02-selection.ipynb

Lines changed: 2 additions & 2 deletions
@@ -77,7 +77,7 @@
  "import pandas as pd\n",
  "\n",
  "# Charger les données\n",
- "surveys_df = pd.read_csv(\"../data/surveys.csv\")"
+ "surveys_df = pd.read_csv('../data/surveys.csv')"
  ]
  },
  {
@@ -93,7 +93,7 @@
  "import pandas as pd\n",
  "\n",
  "# Read in the survey csv\n",
- "surveys_df = pd.read_csv(\"../data/surveys.csv\")"
+ "surveys_df = pd.read_csv('../data/surveys.csv')"
  ]
  },
  {

src/03-format.ipynb

Lines changed: 8 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -71,7 +71,7 @@
  "import pandas as pd\n",
  "\n",
  "# Charger les données\n",
- "surveys_df = pd.read_csv(\"../data/surveys.csv\")"
+ "surveys_df = pd.read_csv('../data/surveys.csv')"
  ]
  },
  {
@@ -87,7 +87,7 @@
  "import pandas as pd\n",
  "\n",
  "# Read in the survey csv\n",
- "surveys_df = pd.read_csv(\"../data/surveys.csv\")"
+ "surveys_df = pd.read_csv('../data/surveys.csv')"
  ]
  },
  {
@@ -652,18 +652,14 @@
  },
  "source": [
  "## Résumé technique\n",
- "* **Gestion des types**\n",
- " * Pour un **DataFrame** :\n",
- " * Attributs : `dtypes`\n",
- " * Pour une **série** (colonne) :\n",
- " * Attributs : `dtype`\n",
- " * Méthodes : `astype()`\n",
+ "* **Statistique descriptive par groupes selon l'index de** `df`\n",
+ " * `df.groupby()[colonne].transform(fonction)`\n",
  "* **Nettoyage**\n",
  " * `df.copy()`\n",
  " * `isna()`, `notna()`\n",
  " * `colonne.fillna(valeur, inplace=True)`\n",
  "* **Sauvegarde**\n",
- " * `df.to_csv(nom_csv, index=False)`"
+ " * `df.to_csv(nom_csv, index)`"
  ]
  },
  {
@@ -674,18 +670,14 @@
  },
  "source": [
  "## Technical Summary\n",
- "* **Managing data types**\n",
- " * For a **DataFrame**:\n",
- " * Attribute: `dtypes`\n",
- " * For a **Series** (column):\n",
- " * Attribute: `dtype`\n",
- " * Method: `astype()`\n",
+ "* **Descriptive statistic by groups with the index of** `df`\n",
+ " * `df.groupby()[column].transform(function)`\n",
  "* **Cleaning data**\n",
  " * `df.copy()`\n",
  " * `isna()`, `notna()`\n",
  " * `column.fillna(value, inplace=True)`\n",
  "* **Saving a DataFrame**\n",
- " * `df.to_csv(csv_filename, index)`"
+ " * `df.to_csv(csv_filename, index)`"
  ]
  },
  {
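
Review note: a hedged sketch of the `03-format.ipynb` summary items (`groupby()[column].transform(function)`, `copy()`, `isna()`, `fillna()`, `to_csv()`). The data and the column names `species_id` and `weight` are assumptions. Two deliberate departures from the summary's shorthand: the fill is written as an assignment instead of `fillna(..., inplace=True)`, and `index` is passed by keyword, because as far as I know the second positional parameter of `to_csv` is `sep`, so `df.to_csv(csv_filename, index)` should be read as notation, not as a literal call.

```python
import io

import pandas as pd

# Hypothetical data with one missing weight (assumed column names).
df = pd.DataFrame({
    "species_id": ["DM", "DM", "PF", "PF"],
    "weight": [40.0, None, 8.0, 6.0],
})

# transform() returns one value per original row, aligned on the index of df,
# so the result can be used directly as a per-row fill value.
group_mean = df.groupby("species_id")["weight"].transform("mean")

# Work on a copy so the original DataFrame keeps its missing value.
cleaned = df.copy()
cleaned["weight"] = cleaned["weight"].fillna(group_mean)

# Write to an in-memory buffer instead of a file; index=False drops the
# row labels from the output.
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

print(df["weight"].isna().sum(), cleaned["weight"].isna().sum())
print(csv_text)
```

Filling with the group mean is only one possible choice; the summary's `fillna(value)` accepts any replacement value.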

src/04-combine.ipynb

Lines changed: 4 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -1244,14 +1244,9 @@
  " * Réinitialiser l'index au besoin : `reset_index(drop=True)`\n",
  "* **Joindre** des DataFrames avec `pandas.merge()`\n",
  " * `left=`, `right=` : les deux DataFrames à joindre\n",
- " * `left_on=`, `right_on=` : les clés de jonction de chaque DataFrame\n",
- " * `on=` : clés de jonction communes aux deux DataFrames\n",
  " * `how=` : `'inner'` (défaut), `'left'`, `'right'`, `'outer'`\n",
- "* **Table de pivot** : `pivot_table()`\n",
- " * `values=colX`\n",
- " * `index=[col_ind]`\n",
- " * `columns=[categorie1, categorie2]`\n",
- " * `aggfunc=numpy.mean` (défaut: moyenne)"
+ " * `left_on=`, `right_on=` : les clés de jonction de chaque DataFrame\n",
+ " * `on=` : clés de jonction communes aux deux DataFrames"
  ]
  },
  {
@@ -1269,14 +1264,9 @@
  " * Resetting the index: `reset_index(drop=True)`\n",
  "* **Joining** DataFrames with `pandas.merge()`\n",
  " * `left=`, `right=`: both DataFrames to join\n",
- " * `left_on=`, `right_on=`: join key for each DataFrame\n",
- " * `on=`: join key for both DataFrames\n",
  " * `how=`: `'inner'` (default), `'left'`, `'right'`, `'outer'`\n",
- "* **Pivot table** `pivot_table()`\n",
- " * `values=colX`\n",
- " * `index=[col_ind]`\n",
- " * `columns=[category1, category2]`\n",
- " * `aggfunc=numpy.mean` (default: mean)"
+ " * `left_on=`, `right_on=`: join key for each DataFrame\n",
+ " * `on=`: join key for both DataFrames"
  ]
  },
  {
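
Review note: a small sketch of the `pandas.merge()` parameters kept in the `04-combine.ipynb` summary (`left=`, `right=`, `how=`, `left_on=`/`right_on=`, `on=`). Both tables, their values, and the renamed key column `code` are assumptions chosen so that each side has one key the other lacks.

```python
import pandas as pd

# Hypothetical tables (assumed columns); "ZZ" and "NL" each appear on one side only.
surveys = pd.DataFrame({
    "record_id": [1, 2, 3],
    "species_id": ["DM", "PF", "ZZ"],
})
species = pd.DataFrame({
    "species_id": ["DM", "PF", "NL"],
    "genus": ["Dipodomys", "Perognathus", "Neotoma"],
})

# on= names a key present in both tables; how= defaults to 'inner',
# which keeps only the keys found on both sides.
inner = pd.merge(left=surveys, right=species, on="species_id")

# 'outer' keeps every key from either side and fills the gaps with NaN.
outer = pd.merge(left=surveys, right=species, how="outer", on="species_id")

# left_on=/right_on= are for keys that are named differently in each table.
species_codes = species.rename(columns={"species_id": "code"})
left_join = pd.merge(
    left=surveys, right=species_codes,
    how="left", left_on="species_id", right_on="code",
)

print(len(inner), len(left_join), len(outer))
```

With `how='left'` every survey record is kept, so the unmatched `"ZZ"` row ends up with a missing `genus`.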
