Commit ee828d5

Oliver committed
chore: added more comments for clarity. added MIT License file.
1 parent 02531b5 commit ee828d5

5 files changed

Lines changed: 60 additions & 10 deletions

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) [2026] [Swiss Data Science Center]
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md

Lines changed: 12 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,16 @@
1-
# SME-KT-ZH Collaboration Forecasting
1+
# SME-KT-ZH: Collaboration Forecasting
22

3-
This project applies **survival analysis** to B2B/B2C sales transaction data to predict when customers are likely to place their next order. The output is a ranked priority list of customers, enabling proactive outreach and collaboration planning.
3+
This project leverages **survival analysis** on B2B and B2C sales transaction data to predict customer re-order timing. By estimating when a customer is likely to return, the model generates a ranked priority list to drive proactive outreach and strategic collaboration planning.
4+
5+
### 🎯 Scope & Purpose
6+
* **What this is:** A prototype developed during a week-long workshop within the [Canton of Zurich SME program](https://www.datascience.ch/innovation/canton-zurich-sme-program) (Step 2: *Practical Sessions and Prototyping*).
7+
* **The Goal:** To provide a foundation for understanding survival analysis in a commercial setting.
8+
* **Open Source:** You are encouraged to use this codebase as a starting point for your own data and experiments.
9+
10+
---
11+
12+
### ⚠️ Disclaimer
13+
**This project is a proof-of-concept.** The code is intended for educational and prototyping purposes only. It is **not** production-ready and should not be deployed into live systems without significant refactoring and robust testing.
414

515
---
616

notebooks/Lifelines_Modelling.ipynb

Lines changed: 11 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -252,7 +252,15 @@
252252
"id": "12",
253253
"metadata": {},
254254
"source": [
255-
"We look at recall since we are only interested in a list of customers who will order in a certain intervall. We are not actually interested in the absolute order of the priority list. Recall tells us how many of the top k have been correctly included in the top k."
255+
"### Evaluation Strategy: Focus on Recall @ k\n",
256+
"\n",
257+
"We prioritize **Recall** over ranking metrics like the C-index. Our primary objective is to identify the specific cohort of customers likely to order within a defined time window, rather than achieving a perfect ordinal ranking of the entire database. Here, Recall represents our \"hit rate\": the proportion of customers in our **Top $k$ priority list** who actually transacted during the observation period.\n",
258+
"\n",
259+
"**Performance Note:**\n",
260+
"The current models achieve a Recall of approx. **62%**. \n",
261+
"\n",
262+
"* **As a Workshop Result:** This is a highly encouraging first milestone. It confirms that the available features contain a significant predictive signal and provides a clear uplift over a random outreach strategy.\n",
263+
"* **Path to Production:** While 62% validates the approach, a production-grade system would likely require higher precision to optimize sales resources. Future iterations should focus on richer feature engineering (e.g., seasonality, product-category trends) to push this metric toward a more robust threshold for automated business decisions."
256264
]
257265
},
258266
{
@@ -282,7 +290,7 @@
282290
],
283291
"metadata": {
284292
"kernelspec": {
285-
"display_name": "sme-kt-zh-collaboration-forecasting (3.12.3)",
293+
"display_name": "sme-kt-zh-collaboration-forecasting",
286294
"language": "python",
287295
"name": "python3"
288296
},
@@ -296,7 +304,7 @@
296304
"name": "python",
297305
"nbconvert_exporter": "python",
298306
"pygments_lexer": "ipython3",
299-
"version": "3.12.3"
307+
"version": "3.11.14"
300308
}
301309
},
302310
"nbformat": 4,
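
Reviewer sketch (not part of the commit): the "hit rate" described in the new Evaluation Strategy cell can be illustrated as follows. This is a minimal sketch under stated assumptions. The function name `hit_rate_at_k`, the customer IDs, and the scores are hypothetical, and the notebook's actual implementation is not shown in this diff. Note that the quantity as worded in the cell (share of the top-k list that actually ordered) is usually called precision at k; it coincides with recall at k when exactly k customers ordered in the window.

```python
from typing import Dict, Set


def hit_rate_at_k(scores: Dict[str, float], ordered: Set[str], k: int) -> float:
    """Share of the k highest-scored customers who actually ordered.

    scores  : hypothetical model output, higher = expected to order sooner
    ordered : customers who actually ordered in the observation window
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not scores:
        raise ValueError("scores must not be empty")
    # Rank by descending score; ties are broken by customer id for determinism.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    top_k = ranked[:k]
    hits = sum(1 for customer in top_k if customer in ordered)
    return hits / len(top_k)


# Made-up toy data, for illustration only.
scores = {"A": 0.9, "B": 0.8, "C": 0.4, "D": 0.3, "E": 0.1}
ordered = {"A", "C", "D"}

print(hit_rate_at_k(scores, ordered, k=3))  # top 3 is A, B, C; A and C ordered
```

The metric ignores the order inside the top-k list, which matches the cell's stated goal of identifying a cohort rather than producing a perfect ranking.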

notebooks/RSF_Modelling.ipynb

Lines changed: 15 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -293,8 +293,11 @@
293293
"id": "16",
294294
"metadata": {},
295295
"source": [
296-
"## Results Discussion Optimized vs Unoptimized\n",
297-
"We optimize for C index which is higher in the tuned version, but the resulting recall is worse. This can be an indicator of high variance in the data. - We choose not to further optimize and leave it to the reader to further optimize and feature engineer the RSF model."
296+
"## Results Discussion: Tuned vs. Baseline RSF\n",
297+
"\n",
298+
"While hyperparameter tuning slightly improved the **C-index**, we observed a simultaneous decline in **Recall**. This divergence often indicates high variance or that the model is capturing noise rather than generalizable patterns. \n",
299+
"\n",
300+
"**Key Takeaway:** Naive parameter optimization is rarely a silver bullet. This result highlights that real performance gains are usually found in **feature engineering** and data quality rather than just grid searching. We have intentionally left the model in this state to serve as a starting point for the reader to explore further data refinement and advanced feature construction."
298301
]
299302
},
300303
{
@@ -319,11 +322,19 @@
319322
"\n",
320323
"comparison_df"
321324
]
325+
},
326+
{
327+
"cell_type": "code",
328+
"execution_count": null,
329+
"id": "18",
330+
"metadata": {},
331+
"outputs": [],
332+
"source": []
322333
}
323334
],
324335
"metadata": {
325336
"kernelspec": {
326-
"display_name": "sme-kt-zh-collaboration-forecasting (3.12.3)",
337+
"display_name": "sme-kt-zh-collaboration-forecasting",
327338
"language": "python",
328339
"name": "python3"
329340
},
@@ -337,7 +348,7 @@
337348
"name": "python",
338349
"nbconvert_exporter": "python",
339350
"pygments_lexer": "ipython3",
340-
"version": "3.12.3"
351+
"version": "3.11.14"
341352
}
342353
},
343354
"nbformat": 4,
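
Reviewer sketch (not part of the commit): the Results Discussion cell reports a higher C-index together with a lower Recall after tuning. The toy example below shows that the two metrics can move in opposite directions even without noise, because the C-index scores every comparable pair while Recall at k only looks at the head of the list. Everything here is an assumption made for illustration: the data are invented, censoring is ignored (all order times are treated as observed), and the "orderers in the window" are taken to be the k customers with the shortest times. It is not the notebook's evaluation code.

```python
from itertools import combinations
from typing import Dict


def c_index(times: Dict[str, float], risk: Dict[str, float]) -> float:
    """Concordance index without censoring: fraction of customer pairs in
    which the customer who ordered sooner also has the higher risk score."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(times, 2):
        if times[i] == times[j]:
            continue  # tied times are not comparable
        comparable += 1
        sooner, later = (i, j) if times[i] < times[j] else (j, i)
        if risk[sooner] > risk[later]:
            concordant += 1.0
        elif risk[sooner] == risk[later]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def recall_at_k(times: Dict[str, float], risk: Dict[str, float], k: int) -> float:
    """Overlap between the k soonest orderers and the k highest-risk customers."""
    actual = set(sorted(times, key=lambda c: (times[c], c))[:k])
    predicted = set(sorted(risk, key=lambda c: (-risk[c], c))[:k])
    return len(actual & predicted) / k


# Invented days-until-next-order for six customers (lower = sooner).
times = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}

# "Baseline": correct head of the list, scrambled tail.
baseline = {"a": 6, "b": 5, "f": 4, "e": 3, "d": 2, "c": 1}
# "Tuned": better overall ordering, but customer c displaces b in the top 2.
tuned = {"c": 6, "a": 5, "d": 4, "b": 3, "e": 2, "f": 1}

for name, risk in [("baseline", baseline), ("tuned", tuned)]:
    print(name, round(c_index(times, risk), 3), recall_at_k(times, risk, k=2))
```

In this toy setup the "tuned" scores have the higher C-index and the lower Recall at 2, so a drop in Recall after optimizing for C-index does not by itself establish overfitting; it may also reflect the mismatch between the tuning objective and the reported metric.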

pyproject.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
44

55
[project]
66
name = "sme_kt_zh_collaboration_forecasting"
7-
7+
license = "MIT"
88
description = ""
99
readme = "README.md"
1010
requires-python = ">=3.11"
