Merge pull request #26 from anyscale/recsys-fix

ceteri · web-flow · commit 82102b628c1d · 2020-09-15T15:08:59.000-07:00
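Recsys fix

In 01-Recsys.ipynb, drop the paretoset-based checkpoint selection and set BEST_CHECKPOINT to checkpoint_file after the training loop; change "less accuracy" to "less precision" in the cluster trade-off note; remove the empty trailing cell. In requirements.txt, replace pytorch with torch and drop paretoset and setproctitle.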
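
Note on the removed Pareto selection: the cell deleted in this diff kept the non-dominated rows of the training log and then took the one with the highest avg_reward. If that behavior is wanted again without the paretoset dependency, it can be approximated in plain Python. This is only a sketch under assumptions: the column names min_reward and max_reward and every value below are made up for illustration (the diff confirms only steps, checkpoint, and avg_reward), and it is not the notebook's code.

```python
# Hypothetical training log. The names min_reward / max_reward and all
# values are illustrative assumptions, not data from the notebook.
rows = [
    {"checkpoint": "ckpt-1", "min_reward": 1.0, "avg_reward": 5.0, "max_reward": 9.0},
    {"checkpoint": "ckpt-2", "min_reward": 2.0, "avg_reward": 6.0, "max_reward": 9.5},
    {"checkpoint": "ckpt-3", "min_reward": 0.5, "avg_reward": 7.0, "max_reward": 8.0},
    {"checkpoint": "ckpt-4", "min_reward": 1.5, "avg_reward": 5.5, "max_reward": 9.0},
]
OBJECTIVES = ("min_reward", "avg_reward", "max_reward")


def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    return (all(a[k] >= b[k] for k in OBJECTIVES)
            and any(a[k] > b[k] for k in OBJECTIVES))


def pareto_front(rows):
    """Keep the rows that no other row dominates (all objectives are maximized)."""
    return [r for r in rows if not any(dominates(other, r) for other in rows)]


# Among the non-dominated checkpoints, pick the one with the best mean reward.
front = pareto_front(rows)
best = max(front, key=lambda r: r["avg_reward"])
BEST_CHECKPOINT = best["checkpoint"]
print("best checkpoint:", BEST_CHECKPOINT)
```

A row whose avg_reward is strictly higher than every other row's can never be dominated, so the Pareto step only changes the outcome when several checkpoints tie on avg_reward.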
diff --git a/ray-rllib/recsys/01-Recsys.ipynb b/ray-rllib/recsys/01-Recsys.ipynb
@@ -215,7 +215,7 @@
    "source": [
     "This kind of cluster analysis has stochastic aspects, so results may differ on different runs. Generally, the plot shows a \"knee\" in the curve near `k=7` as the decrease in error begins to level out. That's a reasonable number of clusters, such that each cluster will tend to have ~14% of the items. That choice has an inherent trade-off:\n",
     "\n",
-    "  * too few clusters → poor predictions (less accuracy)\n",
+    "  * too few clusters → poor predictions (less precision)\n",
     "  * too many clusters → poor predictive power (less recall)\n",
     "\n",
     "Now we can run K-means in `scikit-learn` with that hyperparameter `k=7` to get the clusters that we'll use in our RL environment:"
@@ -881,35 +881,9 @@
     "        ]\n",
     "\n",
     "    df.loc[len(df)] = row\n",
-    "    print(status.format(*row))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The learning is stochastic and not guaranteed to improve *monotonically*, i.e., increase the min/mean/max rewards per episode in every training iterations.\n",
-    "We can use a [*pareto archive*](https://ieeexplore.ieee.org/document/781913) to find a *non-dominated* solution.\n",
-    "In other words, among the saved checkpoints of trained policies, which have the best mean rewards per episode, and among those which have the best min and max rewards?\n",
-    "The following code uses the [`paretoset`](https://github.com/tommyod/paretoset) Python implementation to select the best checkpoint:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from paretoset import paretoset\n",
-    "\n",
-    "df_front = df.drop(columns=[\"steps\", \"checkpoint\"])\n",
-    "mask = paretoset(df_front, sense=[\"max\", \"max\", \"max\"])\n",
-    "\n",
-    "optimal = df_front[mask]\n",
-    "max_val = optimal[\"avg_reward\"].max()\n",
-    "\n",
-    "BEST_CHECKPOINT = df.loc[df[\"avg_reward\"] == max_val, \"checkpoint\"].values[0]\n",
-    "print(\"best checkpoint:\", BEST_CHECKPOINT)"
+    "    print(status.format(*row))\n",
+    "    \n",
+    "BEST_CHECKPOINT = checkpoint_file"
    ]
   },
   {
@@ -1077,13 +1051,6 @@
    "source": [
     "ray.shutdown()"
    ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": []
   }
  ],
  "metadata": {
@@ -1102,7 +1069,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.7"
+   "version": "3.7.4"
   }
  },
  "nbformat": 4,
diff --git a/requirements.txt b/requirements.txt
@@ -1,9 +1,8 @@
 gym >= 0.17.2
-paretoset >= 1.1.2
 numpy >= 1.18.5
 pandas
 requests
-pytorch
+torch
 torchvision
 tensorboard >= 2.3.0
 tensorflow >= 2.3.0
@@ -18,7 +17,6 @@ jupyterlab
 jupyter-server-proxy
 beautifulsoup4
 lxml
-setproctitle
 pytz
 ray[all]==0.8.7
 atoma