chore: add clarifying comments and MIT License file

Oliver · Oliver · commit ee828d5b7c26 · 2026-03-19T11:27:43.000+01:00
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Swiss Data Science Center
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
@@ -1,6 +1,16 @@
-# SME-KT-ZH Collaboration Forecasting
+# SME-KT-ZH: Collaboration Forecasting
 
-This project applies **survival analysis** to B2B/B2C sales transaction data to predict when customers are likely to place their next order. The output is a ranked priority list of customers, enabling proactive outreach and collaboration planning.
+This project leverages **survival analysis** on B2B and B2C sales transaction data to predict customer re-order timing. By estimating when a customer is likely to return, the model generates a ranked priority list to drive proactive outreach and strategic collaboration planning.
+
+### 🎯 Scope & Purpose
+* **What this is:** A prototype developed during a week-long workshop within the [Canton of Zurich SME program](https://www.datascience.ch/innovation/canton-zurich-sme-program) (Step 2: *Practical Sessions and Prototyping*).
+* **The Goal:** To provide a foundation for understanding survival analysis in a commercial setting.
+* **Open Source:** You are encouraged to use this codebase as a starting point for your own data and experiments.
+
+---
+
+### ⚠️ Disclaimer
+**This project is a proof-of-concept.** The code is intended for educational and prototyping purposes only. It is **not** production-ready and should not be deployed into live systems without significant refactoring and robust testing.
 
 ---
 
diff --git a/notebooks/Lifelines_Modelling.ipynb b/notebooks/Lifelines_Modelling.ipynb
@@ -252,7 +252,15 @@
    "id": "12",
    "metadata": {},
    "source": [
-    "We look at recall since we are only interested in a list of customers who will order in a certain intervall. We are not actually interested in the absolute order of the priority list. Recall tells us how many of the top k have been correctly included in the top k."
+    "### Evaluation Strategy: Focus on Recall @ k\n",
+    "\n",
+    "We prioritize **Recall** over ranking metrics like the C-index. Our primary objective is to identify the specific cohort of customers likely to order within a defined time window, rather than achieving a perfect ordinal ranking of the entire customer base. Here, Recall represents our \"hit rate\": the proportion of customers who actually transacted during the observation period that were captured in our **Top $k$ priority list**.\n",
+    "\n",
+    "**Performance Note:**\n",
+    "The current models achieve a Recall of approx. **62%**. \n",
+    "\n",
+    "* **As a Workshop Result:** This is a highly encouraging first milestone. It confirms that the available features contain a significant predictive signal and provides a clear uplift over a random outreach strategy.\n",
+    "* **Path to Production:** While 62% validates the approach, a production-grade system would likely require a higher hit rate to make efficient use of sales resources. Future iterations should focus on richer feature engineering (e.g., seasonality, product-category trends) to push this metric toward a more robust threshold for automated business decisions."
    ]
   },
   {
@@ -282,7 +290,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "sme-kt-zh-collaboration-forecasting (3.12.3)",
+   "display_name": "sme-kt-zh-collaboration-forecasting",
    "language": "python",
    "name": "python3"
   },
@@ -296,7 +304,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.12.3"
+   "version": "3.11.14"
   }
  },
  "nbformat": 4,
diff --git a/notebooks/RSF_Modelling.ipynb b/notebooks/RSF_Modelling.ipynb
@@ -293,8 +293,11 @@
    "id": "16",
    "metadata": {},
    "source": [
-    "## Results Discussion Optimized vs Unoptimized\n",
-    "We optimize for C index which is higher in the tuned version, but the resulting recall is worse. This can be an indicator of high variance in the data. - We choose not to further optimize and leave it to the reader to further optimize and feature engineer the RSF model."
+    "## Results Discussion: Tuned vs. Baseline RSF\n",
+    "\n",
+    "While hyperparameter tuning slightly improved the **C-index**, we observed a simultaneous decline in **Recall**. Such a divergence often indicates high variance in the data or a model that is capturing noise rather than generalizable patterns. \n",
+    "\n",
+    "**Key Takeaway:** Naive parameter optimization is rarely a silver bullet. This result highlights that real performance gains are usually found in **feature engineering** and data quality rather than just grid searching. We have intentionally left the model in this state to serve as a starting point for the reader to explore further data refinement and advanced feature construction."
    ]
   },
   {
@@ -319,11 +322,19 @@
     "\n",
     "comparison_df"
    ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "18",
+   "metadata": {},
+   "outputs": [],
+   "source": []
   }
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "sme-kt-zh-collaboration-forecasting (3.12.3)",
+   "display_name": "sme-kt-zh-collaboration-forecasting",
    "language": "python",
    "name": "python3"
   },
@@ -337,7 +348,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.12.3"
+   "version": "3.11.14"
   }
  },
  "nbformat": 4,
diff --git a/pyproject.toml b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "sme_kt_zh_collaboration_forecasting"
-
+license = "MIT"
 description = ""
 readme = "README.md"
 requires-python = ">=3.11"
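
The Recall @ k metric described in the `Lifelines_Modelling.ipynb` change can be sketched in a few lines of Python. This is a minimal illustration and not the notebook's implementation: the function `recall_at_k`, the customer IDs, and the scores are all hypothetical, and the sketch assumes that a higher score means an earlier expected re-order.

```python
def recall_at_k(predicted_scores, actual_orderers, k):
    """Share of customers who actually ordered that appear in the top-k list.

    predicted_scores: dict mapping customer ID -> priority score
                      (assumption: higher score = more likely to order soon).
    actual_orderers:  set of customer IDs that ordered in the observation window.
    k:                length of the priority list.
    """
    if not actual_orderers:
        raise ValueError("recall is undefined when nobody ordered")
    # Rank customers from highest to lowest score and keep the first k.
    ranked = sorted(predicted_scores, key=predicted_scores.get, reverse=True)
    top_k = set(ranked[:k])
    # Hits are customers that are both on the list and actually ordered.
    hits = top_k & set(actual_orderers)
    return len(hits) / len(actual_orderers)


# Hypothetical toy data: five customers, three of whom ordered.
scores = {"A": 0.9, "B": 0.8, "C": 0.4, "D": 0.3, "E": 0.1}
ordered = {"A", "C", "D"}

recall = recall_at_k(scores, ordered, k=3)
print(f"{recall:.2f}")  # 0.67: the top 3 (A, B, C) contain 2 of the 3 orderers
```

When the number of customers who actually ordered equals k, this value coincides with precision @ k, which is why the two notions are sometimes used interchangeably for fixed-size priority lists.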