Merge pull request #159 from gperdrizet/dev

gperdrizet · web-flow · commit 1fa9e379402d · 2025-12-09T11:55:56.000-05:00
Added requirements file for Kaggle
diff --git a/notebooks/unit3/lesson_20/Lesson_20_activity.ipynb b/notebooks/unit3/lesson_20/Lesson_20_activity.ipynb
@@ -48,7 +48,19 @@
     "   - Your notebook must output test set predictions to `submission.csv` in the correct format\n",
     "   - Go to 'Submit to competition' tab in the right sidebar and click 'Submit'\n",
     "\n",
-    "**Note:** This notebook uses a `KAGGLE` flag (under 'Run configuration') to switch between Kaggle and local file paths. Set it to `True` when running on Kaggle, or `False` when running locally with data in a `../data/` directory.\n",
+    "You may see warnings when running on Kaggle due to inconsistencies in installed package versions between your environment and Kaggle. If you are using a virtual environment already, install this [kaggle_requirements.txt](https://github.com/gperdrizet/FSA_devops/blob/main/notebooks/unit3/lesson_20/kaggle_requirements.txt):\n",
+    "\n",
+    "```\n",
+    "pip install --force-reinstall -r kaggle_requirements.txt\n",
+    "```\n",
+    "\n",
+    "This is working for me with Python 3.12. It contains a slightly newer version of scikit-learn than is found on Kaggle. Update in the Kaggle environment by going to 'Add-ons' -> 'Install Dependencies' and adding:\n",
+    "\n",
+    "```\n",
+    "pip install scikit-learn==1.5.2\n",
+    "```\n",
+    "\n",
+    "> **Note:** This notebook uses a `KAGGLE` flag (under 'Run configuration') to switch between Kaggle and local file paths. Set it to `True` when running on Kaggle, or `False` when running locally.\n",
     "\n",
     "## Notebook set-up\n",
     "\n",
@@ -122,7 +134,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": null,
    "id": "d1a2421c",
    "metadata": {
     "execution": {
@@ -358,11 +370,13 @@
    "source": [
     "# Set file paths based on environment\n",
     "if KAGGLE:\n",
+    "\n",
     "    # Kaggle paths - data is in /kaggle/input/\n",
     "    train_df_path = '/kaggle/input/playground-series-s5e12/train.csv'\n",
     "    test_df_path = '/kaggle/input/playground-series-s5e12/test.csv'\n",
     "\n",
     "else:\n",
+    "\n",
     "    # Otherwise, load data from course GitHub repository\n",
     "    train_df_path = 'https://gperdrizet.github.io/FSA_devops/assets/data/unit3/diabetes_prediction_train.csv'\n",
     "    test_df_path = 'https://gperdrizet.github.io/FSA_devops/assets/data/unit3/diabetes_prediction_test.csv'\n",
diff --git a/notebooks/unit3/lesson_20/kaggle_requirements.txt b/notebooks/unit3/lesson_20/kaggle_requirements.txt
@@ -0,0 +1,7 @@
+ipykernel
+matplotlib==3.7.2
+numpy==1.26.4
+pandas==2.2.3
+scipy==1.15.3
+scikit-learn==1.2.2
+seaborn==0.12.2
diff --git a/site/_posts/2025-12-09-kaggle_requirements.md b/site/_posts/2025-12-09-kaggle_requirements.md
@@ -0,0 +1,20 @@
+---
+layout: post
+title: "Kaggle requirements"
+date: 2025-12-08
+categories: resources
+---
+
+Added a `requirements.txt` file for Kaggle notebooks.
+
+You may see warnings when running on Kaggle due to inconsistencies in installed package versions between your environment and Kaggle. If you are using a virtual environment already, install this [kaggle_requirements.txt](https://github.com/gperdrizet/FSA_devops/blob/main/notebooks/unit3/lesson_20/kaggle_requirements.txt):
+
+```
+pip install --force-reinstall -r kaggle_requirements.txt
+```
+
+This is working for me with Python 3.12. It contains a slightly newer version of scikit-learn than is found on Kaggle. Update in the Kaggle environment by going to 'Add-ons' -> 'Install Dependencies' and adding:
+
+```
+pip install scikit-learn==1.5.2
+```
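
The version warnings described in the post come from differences between the pinned versions and whatever is installed locally. As a rough sketch (not part of this commit), the pins could be compared against the active environment using the standard-library `importlib.metadata`. The helper names `parse_pins` and `find_mismatches` are hypothetical, and the pins are copied inline from `kaggle_requirements.txt` as added above rather than read from disk:

```python
from importlib import metadata

# Pins copied from kaggle_requirements.txt as added in this commit.
REQUIREMENTS = """\
ipykernel
matplotlib==3.7.2
numpy==1.26.4
pandas==2.2.3
scipy==1.15.3
scikit-learn==1.2.2
seaborn==0.12.2
"""


def parse_pins(lines):
    """Return {package: version} for lines of the form 'name==version'.

    Unpinned entries (such as 'ipykernel'), blank lines, and trailing
    comments are skipped.
    """
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {package: (pinned, installed)} for every package whose
    installed version differs from the pin; installed is None when the
    package is not installed at all."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Report any differences between the pins and the current environment.
for name, (pinned, installed) in find_mismatches(
    parse_pins(REQUIREMENTS.splitlines())
).items():
    print(f"{name}: pinned {pinned}, installed {installed}")
```

Running this inside the virtual environment before and after the `pip install` step should show which packages, if any, still differ from the pins. It only handles exact `==` pins, which is all this requirements file uses.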