chore: add data path

Kyle van de Langemheen · Kyle van de Langemheen · commit 705cef0d359b · 2026-03-19T08:03:34.000+01:00
diff --git a/notebooks/Lifelines_Modelling.ipynb b/notebooks/Lifelines_Modelling.ipynb
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "342e73df",
+   "id": "0",
    "metadata": {},
    "source": [
     "# Lifelines Modelling\n",
@@ -12,7 +12,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "c1dc8173",
+   "id": "1",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -25,7 +25,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "b457091b",
+   "id": "2",
    "metadata": {},
    "source": [
     "## Prepare the Input Data\n",
@@ -35,11 +35,11 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "43c007ae",
+   "id": "3",
    "metadata": {},
    "outputs": [],
    "source": [
-    "df = su.read_sales_data()\n",
+    "df = su.read_sales_data(\"../data/sales_df.csv\")\n",
     "len_b = len(df)\n",
     "df = df.drop_duplicates()\n",
     "len_a = len(df)\n",
@@ -49,7 +49,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "55e9800a",
+   "id": "4",
    "metadata": {},
    "source": [
     "## Build, Fit, and Rank with the Cox Model\n",
@@ -59,7 +59,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "361fe200",
+   "id": "5",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -114,7 +114,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "eaaab92d",
+   "id": "6",
    "metadata": {},
    "source": [
     "## Check Cox Assumptions\n",
@@ -124,7 +124,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "8498ea11",
+   "id": "7",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -133,7 +133,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "d21572b9",
+   "id": "8",
    "metadata": {},
    "source": [
     "## Evaluate Cox Performance\n",
@@ -143,7 +143,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "c06844c7",
+   "id": "9",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -162,7 +162,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "c477b0e5",
+   "id": "10",
    "metadata": {},
    "source": [
     "## Compare Weibull and Log-Normal AFT Variants\n",
@@ -172,7 +172,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "9308a7a9",
+   "id": "11",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -249,7 +249,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "85f53f3a",
+   "id": "12",
    "metadata": {},
    "source": [
     "We look at recall since we are only interested in a list of customers who will order in a certain intervall. We are not actually interested in the absolute order of the priority list. Recall tells us how many of the top k have been correctly included in the top k."
@@ -258,7 +258,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "007521ed",
+   "id": "13",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -282,7 +282,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "sme-kt-zh-collaboration-forecasting",
+   "display_name": "sme-kt-zh-collaboration-forecasting (3.12.3)",
    "language": "python",
    "name": "python3"
   },
@@ -296,7 +296,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.14"
+   "version": "3.12.3"
   }
  },
  "nbformat": 4,
diff --git a/notebooks/RSF_Modelling.ipynb b/notebooks/RSF_Modelling.ipynb
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "1y6kpzwajfw",
+   "id": "0",
    "metadata": {},
    "source": [
     "# RSF Modelling\n",
@@ -12,7 +12,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "b31030b6",
+   "id": "1",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -27,7 +27,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "u1in2aq63no",
+   "id": "2",
    "metadata": {},
    "source": [
     "## Prepare the Input Data\n",
@@ -37,11 +37,11 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "dc65164f",
+   "id": "3",
    "metadata": {},
    "outputs": [],
    "source": [
-    "df = su.read_sales_data()\n",
+    "df = su.read_sales_data(\"../data/sales_df.csv\")\n",
     "len_b = len(df)\n",
     "df = df.drop_duplicates()\n",
     "len_a = len(df)\n",
@@ -51,7 +51,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "enodf5zrb9r",
+   "id": "4",
    "metadata": {},
    "source": [
     "## Feature Engineering\n",
@@ -61,7 +61,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "09b14307",
+   "id": "5",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -99,7 +99,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "vr70gk02d8",
+   "id": "6",
    "metadata": {},
    "source": [
     "## Train / Test Split and Survival Data Preparation\n",
@@ -109,7 +109,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "pqym5f9y31",
+   "id": "7",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -131,7 +131,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "oy1837j4h7",
+   "id": "8",
    "metadata": {},
    "source": [
     "## Fit the RSF Model\n",
@@ -141,7 +141,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "b232d36ftrd",
+   "id": "9",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -177,7 +177,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "l4zik5zvi7o",
+   "id": "10",
    "metadata": {},
    "source": [
     "## Evaluate RSF Performance\n",
@@ -187,7 +187,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "ihs8vbovjn",
+   "id": "11",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -216,7 +216,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "dljgu0g217",
+   "id": "12",
    "metadata": {},
    "source": [
     "## Tune Hyperparameters\n",
@@ -226,7 +226,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "9b9e4549",
+   "id": "13",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -256,7 +256,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "2hyfphb325k",
+   "id": "14",
    "metadata": {},
    "source": [
     "## Evaluate Tuned RSF and Compare\n",
@@ -266,7 +266,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "2ybvzmao1vr",
+   "id": "15",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -290,7 +290,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "15c7317e",
+   "id": "16",
    "metadata": {},
    "source": [
     "## Results Discussion Optimized vs Unoptimized\n",
@@ -300,7 +300,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "a6fl4aargf8",
+   "id": "17",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -323,7 +323,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "sme-kt-zh-collaboration-forecasting",
+   "display_name": "sme-kt-zh-collaboration-forecasting (3.12.3)",
    "language": "python",
    "name": "python3"
   },
@@ -337,7 +337,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.14"
+   "version": "3.12.3"
   }
  },
  "nbformat": 4,