
Commit 908df78

Separated the logic to group by market cap in the notebook (#294)
New groupby logic
1 parent e9e209b commit 908df78


2 files changed: +41 −69 lines

notebooks/tutorials/3_factset_alphalens_lesson_4/notebook.ipynb

Lines changed: 35 additions & 45 deletions
@@ -13,11 +13,16 @@
  "1. Grouping assets by market cap, then analyzing each cap type individually.\n",
  "2. Writing group neutral strategies.\n",
  "3. Determining an alpha factor's decay rate.\n",
- "4. Dealing with a common Alphalens error named MaxLossExceededError.\n",
- "\n",
- "**All sections of this lesson will use the data produced by the Pipeline created in the following cell. Please run it.**\n",
+ "4. Dealing with a common Alphalens error named MaxLossExceededError."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Grouping By Market Cap\n",
  "\n",
- "**Important note**: Until this lesson, we passed the output of `run_pipeline()` to `get_clean_factor_and_forward_returns()` without any changes. This was possible because the previous lessons' Pipelines only returned one column. This lesson's Pipeline returns two columns, which means we need to *specify the column* we're passing as factor data. Look for commented code near `get_clean_factor_and_forward_returns()` in the following cell to see how to do this."
+ "The following code defines a universe and creates an alpha factor within a pipeline. It also returns a classifier by using the quantiles() function. This function is useful for grouping your assets by an arbitrary column of data. In this example, we will group our assets by their market cap, and analyze how effective our alpha factor is among the different cap types (small, medium, and large cap)."
  ]
  },
  {
@@ -54,19 +59,17 @@
  "\n",
  " factor_to_analyze = (ciwc_processed + spwc_processed).zscore()\n",
  "\n",
- " # The following columns will help us group assets by market cap. This will allow us to analyze\n",
- " # whether our alpha factor's predictiveness varies among assets with different market caps.\n",
  " market_cap = factset.Fundamentals.mkt_val.latest\n",
- " is_small_cap = market_cap.percentile_between(0, 100)\n",
- " is_mid_cap = market_cap.percentile_between(50, 100)\n",
- " is_large_cap = market_cap.percentile_between(90, 100)\n",
+ " \n",
+ " # .quantiles(), when supplied with bins=3, tells you which third that the assets value places in.\n",
+ " # for example, in 2018, Apple is in the third bin, because it has a large market cap.\n",
+ " # A different asset with a smaller market cap would probably be in the first or second bin.\n",
+ " cap_type = market_cap.quantiles(bins=3, mask=base_universe)\n",
  "\n",
  " return Pipeline(\n",
  " columns = {\n",
  " 'factor_to_analyze': factor_to_analyze, \n",
- " 'small_cap_filter': is_small_cap,\n",
- " 'mid_cap_filter': is_mid_cap,\n",
- " 'large_cap_filter': is_large_cap,\n",
+ " 'cap_type': cap_type\n",
  " },\n",
  " screen = (\n",
  " base_universe\n",
@@ -75,29 +78,14 @@
  " )\n",
  " )\n",
  "\n",
- "\n",
+ "# Create the pipeline data\n",
  "pipeline_output = run_pipeline(make_pipeline(), '2013-1-1', '2014-1-1')\n",
- "pricing_data = get_pricing(pipeline_output.index.levels[1], '2013-1-1', '2014-3-1', fields='open_price')\n",
  "\n",
- "# To group by market cap, we will follow the following steps.\n",
+ "# Replace the quantile values in the cap_type column for added clarity\n",
+ "pipeline_output['cap_type'].replace([0, 1, 2], ['small_cap', 'mid_cap', 'large_cap'], inplace=True)\n",
  "\n",
- "# Convert the \"True\" values to ones, so they can be added together\n",
- "pipeline_output[['small_cap_filter', 'mid_cap_filter', 'large_cap_filter']] *= 1\n",
- "\n",
- "# If a stock passed the large_cap filter, it also passed the mid_cap and small_cap filters.\n",
- "# This means we can add the three columns, and stocks that are large_cap will get a value of 3,\n",
- "# stocks that are mid cap will get a value of 2, and stocks that are small cap will get 1.\n",
- "pipeline_output['cap_type'] = (\n",
- " pipeline_output['small_cap_filter'] + pipeline_output['mid_cap_filter'] + pipeline_output['large_cap_filter']\n",
- ")\n",
- "\n",
- "# drop the old columns, we don't need them anymore\n",
- "pipeline_output.drop(['small_cap_filter', 'mid_cap_filter', 'large_cap_filter'], axis=1, inplace=True)\n",
- "\n",
- "# rename the 1's, 2's and 3's for clarity\n",
- "pipeline_output['cap_type'].replace([1, 2, 3], ['small_cap', 'mid_cap', 'large_cap'], inplace=True)\n",
+ "pricing_data = get_pricing(pipeline_output.index.levels[1], '2013-1-1', '2014-3-1', fields='open_price')\n",
  "\n",
- "# the final product\n",
  "pipeline_output.head(5)"
  ]
  },
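The new cell above swaps the old three-filter summing trick for a single `.quantiles(bins=3)` column, then maps the integer bins to labels. A minimal standalone sketch of that relabeling step, using made-up data (this `pipeline_output` DataFrame is a hypothetical stand-in for the real Pipeline result):

```python
import pandas as pd

# Hypothetical stand-in for the Pipeline output: one row per asset,
# with the bin index (0, 1, or 2) that .quantiles(bins=3) would assign.
pipeline_output = pd.DataFrame({
    "factor_to_analyze": [0.5, -1.2, 0.3],
    "cap_type": [2, 0, 1],
})

# Map the integer bins to readable labels, as the new notebook cell does.
pipeline_output["cap_type"] = pipeline_output["cap_type"].replace(
    [0, 1, 2], ["small_cap", "mid_cap", "large_cap"]
)
print(pipeline_output["cap_type"].tolist())
# → ['large_cap', 'small_cap', 'mid_cap']
```

Note that the quantile bins start at 0, which is why the new cell replaces `[0, 1, 2]` where the old code replaced `[1, 2, 3]`.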
@@ -111,7 +99,9 @@
  "\n",
  "You can group assets by any classifier, but sector and market cap are most common. The Pipeline in the first cell of this lesson returns a column named `cap_type`, whose values represent the assets market capitalization. All we have to do now is pass that column to the `groupby` argument of `get_clean_factor_and_forward_returns()`\n",
  "\n",
- "**Run the following cell, and notice the charts at the bottom of the tear sheet showing how our factor performs among different cap types.**"
+ "**Run the following cell, and notice the charts at the bottom of the tear sheet showing how our factor performs among different cap types.**\n",
+ "\n",
+ "**Important note**: Until this lesson, we passed the output of `run_pipeline()` to `get_clean_factor_and_forward_returns()` without any changes. This was possible because the previous lessons' Pipelines only returned one column. This lesson's Pipeline returns two columns, which means we need to *specify the column* we're passing as factor data. Look for commented code near `get_clean_factor_and_forward_returns()` in the following cell to see how to do this."
  ]
  },
  {
@@ -125,7 +115,7 @@
  "from alphalens.tears import create_returns_tear_sheet\n",
  "\n",
  "factor_data = get_clean_factor_and_forward_returns(\n",
- " factor=pipeline_output['factor_to_analyze'],\n",
+ " factor=pipeline_output['factor_to_analyze'], # This is how you pass a single column from pipeline_output\n",
  " prices=pricing_data,\n",
  " groupby=pipeline_output['cap_type'],\n",
  ")\n",
@@ -247,26 +237,26 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "new_pipeline_output = run_pipeline(\n",
+ "pipeline_output = run_pipeline(\n",
  " make_pipeline(),\n",
  " start_date='2013-1-1', \n",
  " end_date='2014-1-1' # *** NOTE *** Our factor data ends in 2014\n",
  ")\n",
  "\n",
- "new_pricing_data = get_pricing(\n",
+ "pricing_data = get_pricing(\n",
  " pipeline_output.index.levels[1], \n",
  " start_date='2013-1-1',\n",
  " end_date='2015-2-1', # *** NOTE *** Our pricing data ends in 2015\n",
  " fields='open_price'\n",
  ")\n",
  "\n",
- "new_factor_data = get_clean_factor_and_forward_returns(\n",
- " new_pipeline_output['factor_to_analyze'], \n",
- " new_pricing_data,\n",
+ "factor_data = get_clean_factor_and_forward_returns(\n",
+ " pipeline_output['factor_to_analyze'], \n",
+ " pricing_data,\n",
  " periods=range(1,252,20) # Change the step to 10 or more for long look forward periods to save time\n",
  ")\n",
  "\n",
- "mean_information_coefficient(new_factor_data).plot()"
+ "mean_information_coefficient(factor_data).plot()"
  ]
  },
  {
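The `periods=range(1,252,20)` argument in the cell above controls which forward-return horizons Alphalens computes. A quick check of what that range expands to:

```python
# Forward-return horizons requested by periods=range(1, 252, 20):
# every 20 trading days, out to roughly one trading year (252 days).
periods = list(range(1, 252, 20))
print(periods)
# → [1, 21, 41, 61, 81, 101, 121, 141, 161, 181, 201, 221, 241]
```

Widening the step, as the inline comment suggests, shrinks this list and so speeds up the tear sheet.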
@@ -293,23 +283,23 @@
  ],
  "metadata": {
  "kernelspec": {
- "display_name": "Python 3",
+ "display_name": "Python 2",
  "language": "python",
- "name": "python3"
+ "name": "python2"
  },
  "language_info": {
  "codemirror_mode": {
  "name": "ipython",
- "version": 3
+ "version": 2
  },
  "file_extension": ".py",
  "mimetype": "text/x-python",
  "name": "python",
  "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.7.0"
+ "pygments_lexer": "ipython2",
+ "version": "2.7.12"
  }
  },
  "nbformat": 4,
  "nbformat_minor": 2
- }
+ }

notebooks/tutorials/3_factset_alphalens_lesson_5/notebook.ipynb

Lines changed: 6 additions & 24 deletions
@@ -60,16 +60,13 @@
  " # The following columns will help us group assets by market cap. This will allow us to analyze\n",
  " # whether our alpha factor's predictiveness varies among assets with different market caps.\n",
  " market_cap = factset.Fundamentals.mkt_val.latest\n",
- " is_small_cap = market_cap.percentile_between(0, 100)\n",
- " is_mid_cap = market_cap.percentile_between(50, 100)\n",
- " is_large_cap = market_cap.percentile_between(90, 100)\n",
+ " cap_type = market_cap.quantiles(bins=3, mask=base_universe)\n",
  "\n",
  " return Pipeline(\n",
  " columns = {\n",
- " 'factor_to_analyze': factor_to_analyze, \n",
- " 'small_cap_filter': is_small_cap,\n",
- " 'mid_cap_filter': is_mid_cap,\n",
- " 'large_cap_filter': is_large_cap,\n",
+ " 'factor_to_analyze': factor_to_analyze,\n",
+ " 'cap_type': cap_type\n",
+ " \n",
  " },\n",
  " screen = (\n",
  " base_universe\n",
@@ -78,26 +75,11 @@
  " )\n",
  " )\n",
  "\n",
- "# To group by market cap, we will follow the following steps.\n",
- "\n",
- "# Convert the \"True\" values to ones, so they can be added together\n",
- "pipeline_output[['small_cap_filter', 'mid_cap_filter', 'large_cap_filter']] *= 1\n",
- "\n",
- "# If a stock passed the large_cap filter, it also passed the mid_cap and small_cap filters.\n",
- "# This means we can add the three columns, and stocks that are large_cap will get a value of 3,\n",
- "# stocks that are mid cap will get a value of 2, and stocks that are small cap will get 1.\n",
- "pipeline_output['cap_type'] = (\n",
- " pipeline_output['small_cap_filter'] + pipeline_output['mid_cap_filter'] + pipeline_output['large_cap_filter']\n",
- ")\n",
- "\n",
- "# drop the old columns, we don't need them anymore\n",
- "pipeline_output.drop(['small_cap_filter', 'mid_cap_filter', 'large_cap_filter'], axis=1, inplace=True)\n",
- "\n",
+ "pipeline_output = run_pipeline(make_pipeline(), '2015-1-1', '2016-1-1')\n",
  "# rename the 1's, 2's and 3's for clarity\n",
  "pipeline_output['cap_type'].replace([1, 2, 3], ['small_cap', 'mid_cap', 'large_cap'], inplace=True)\n",
  "\n",
- "# the final product\n",
- "pipeline_output.head(5)"
+ "pricing_data = get_pricing(pipeline_output.index.levels[1], '2015-1-1', '2016-6-1', fields='open_price')"
  ]
  },
  {
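`.quantiles(bins=3)` is a Pipeline classifier method, so it only runs inside the simulation environment. A rough standalone analogue of its bucketing behavior, sketched with pandas `qcut` on made-up market caps:

```python
import pandas as pd

# Hypothetical market caps for six assets.
caps = pd.Series([10, 20, 30, 40, 50, 60], index=list("ABCDEF"))

# qcut with q=3 labels each value with its tercile (0, 1, or 2),
# roughly what the classifier's quantiles(bins=3) does for each day.
terciles = pd.qcut(caps, q=3, labels=False)
print(terciles.tolist())
# → [0, 0, 1, 1, 2, 2]
```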

0 commit comments
