This project builds a production-style baseline recommendation system for information-feed ads, with one-click execution and four objective modes:
- CTR optimization
- Order optimization
- Session dwell optimization
- Multi-objective optimization (click + order + dwell)
Problem: given a `user_id` and the current request time at the moment the user opens the app, rank the candidate ads to maximize the business outcome of the selected objective mode.
Available data:
- `data/user_app_behavior.xlsx`
- `data/user_order.xlsx`
- `data/ads.xlsx`
- `data/product.xlsx`
End-to-end pipeline:

```mermaid
flowchart LR
A["Raw Event and Master Data"] --> B["Data Cleaning and Timestamp Parsing"]
B --> C["Display Level Label Construction"]
C --> D["Leakage Safe Feature Engineering"]
D --> E["Temporal Train Val Test Split"]
E --> F["CTR Model Training"]
E --> G["Order Model Training"]
E --> H["Dwell Model Training"]
F --> I["Objective Specific Scoring"]
G --> I
H --> I
I --> J["Rank Candidate Ads"]
J --> K["Offline Evaluation and Artifact Export"]
```
One-click execution flow:

```mermaid
flowchart TD
A["python3 main.py"] --> B["Profile Dataset"]
B --> C["Train All Models"]
C --> D["Auto Select Demo User"]
D --> E["Generate Top K Recommendations"]
E --> F["Save Bundle Metrics and Recommendation CSV"]
```
Training unit:
- one row per `display` event
Labels:
- `label_click`: user clicked the same ad within 30 minutes of the display
- `label_order_24h`: user placed any order within 24 hours after the display
- `label_order_7d`: long-window (7-day) order label, used for analysis
- `label_dwell_sec`: remaining session duration after the display, in seconds
Why this design:
- aligns recommendation ranking to exposure decisions
- supports direct objective switching without rebuilding the sample table
- keeps modeling and evaluation consistent across scenarios
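The label definitions above could be computed per display roughly as follows. This is a minimal sketch, not the project's implementation in `src/recsys/data.py`: the helper name `build_labels`, the `(timestamp, event_type, ad_id)` event tuples, the `"click"` event name, and defining the session end as the session's last logged event are all assumptions.

```python
from datetime import datetime, timedelta

CLICK_WINDOW = timedelta(minutes=30)
ORDER_WINDOW_24H = timedelta(hours=24)
ORDER_WINDOW_7D = timedelta(days=7)


def build_labels(display_ts, ad_id, user_events, order_times, session_end):
    """Labels for one display event (hypothetical helper).

    user_events: (timestamp, event_type, ad_id) tuples for the same user.
    order_times: the same user's order timestamps (any product, any ad).
    session_end: assumed to be the timestamp of the session's last event.
    """
    # label_click: a click on the same ad within 30 minutes after the display
    label_click = int(any(
        etype == "click" and eid == ad_id and display_ts <= ts <= display_ts + CLICK_WINDOW
        for ts, etype, eid in user_events
    ))
    # Order labels are user-level: any order in the window counts, whatever ad was shown
    label_order_24h = int(any(display_ts < t <= display_ts + ORDER_WINDOW_24H for t in order_times))
    label_order_7d = int(any(display_ts < t <= display_ts + ORDER_WINDOW_7D for t in order_times))
    # label_dwell_sec: remaining session time after the display
    label_dwell_sec = (session_end - display_ts).total_seconds()
    return {
        "label_click": label_click,
        "label_order_24h": label_order_24h,
        "label_order_7d": label_order_7d,
        "label_dwell_sec": label_dwell_sec,
    }


# Illustrative data only
t0 = datetime(2021, 11, 21, 12, 0, 0)
events = [
    (t0, "display", 7),
    (t0 + timedelta(minutes=5), "click", 7),
    (t0 + timedelta(minutes=20), "display", 9),
]
orders = [t0 + timedelta(hours=30)]  # outside 24 hours, inside 7 days
labels = build_labels(t0, 7, events, orders, session_end=t0 + timedelta(minutes=20))
```

Because every label hangs off one display row, switching objectives only changes which label column a model is trained on.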
Feature groups:
- context features: `hour`, `weekday`, `is_weekend`
- session dynamics: `session_event_index`, `session_elapsed_sec`
- user history: prior displays, clicks, historical CTR
- ad history: prior displays, clicks, historical CTR
- user-ad interaction history: pair-level display/click/CTR stats
- transaction history: prior orders, average items per order, time since last order
- ad metadata: `creative_type`, `main_color`, `design_style`, `has_ad_profile`
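History features are leakage-safe only if each display sees counters built from strictly earlier events. A minimal sketch of that pattern for user-level and user-ad pair counters, assuming one stream of `(timestamp, event_type, user_id, ad_id)` tuples with hypothetical event names `"display"` and `"click"`; ad-level counters would follow the same pattern. It counts click events by their own timestamps instead of reusing earlier rows' `label_click`, because a click label can resolve up to 30 minutes after its display and would otherwise leak future information.

```python
from collections import defaultdict

# Assumed tie-break: at equal timestamps, displays are processed before clicks,
# so a click logged at the same instant as a display is not treated as prior history.
EVENT_ORDER = {"display": 0, "click": 1}


def prior_history_features(events):
    """Return one feature dict per display, using only strictly earlier events.

    events: iterable of (timestamp, event_type, user_id, ad_id) tuples.
    """
    stream = sorted(events, key=lambda e: (e[0], EVENT_ORDER[e[1]]))
    user_disp, user_clk = defaultdict(int), defaultdict(int)
    pair_disp, pair_clk = defaultdict(int), defaultdict(int)
    rows = []
    for ts, etype, user, ad in stream:
        if etype == "display":
            ud, uc = user_disp[user], user_clk[user]
            pdisp, pclk = pair_disp[(user, ad)], pair_clk[(user, ad)]
            rows.append({
                "ts": ts, "user_id": user, "ad_id": ad,
                "user_prior_displays": ud,
                "user_prior_clicks": uc,
                "user_hist_ctr": uc / ud if ud else 0.0,
                "pair_prior_displays": pdisp,
                "pair_hist_ctr": pclk / pdisp if pdisp else 0.0,
            })
            # Counters are updated only after this row's features are emitted.
            user_disp[user] += 1
            pair_disp[(user, ad)] += 1
        else:  # "click"
            user_clk[user] += 1
            pair_clk[(user, ad)] += 1
    return rows


events = [
    (0, "display", 1, 10),
    (5, "click", 1, 10),
    (10, "display", 1, 10),
    (10, "click", 1, 10),  # same instant as the display above: not counted for it
    (20, "display", 1, 11),
]
rows = prior_history_features(events)
```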
Leakage control:
- all history features use only prior events
- split is time-based (not random)
- as-of joins use sorted keys to preserve temporal causality
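For transaction history, an as-of lookup returns only the orders placed strictly before the display. The per-row logic is sketched below with `bisect` over one user's sorted order timestamps; at table scale, `pandas.merge_asof` (which also requires sorted keys) with `allow_exact_matches=False` is one common way to do the same thing. The helper name, numeric timestamps, and leaving `sec_since_last_order` missing (for later imputation) when a user has no prior orders are assumptions.

```python
from bisect import bisect_left


def order_history_asof(display_ts, order_ts, order_items):
    """As-of view of one user's order history at display time (hypothetical helper).

    order_ts: that user's order timestamps, sorted ascending (seconds).
    order_items: item counts aligned with order_ts.
    """
    # bisect_left returns the number of orders with timestamp < display_ts,
    # so an order at exactly display_ts is excluded.
    k = bisect_left(order_ts, display_ts)
    if k == 0:
        return {"prior_orders": 0, "avg_items_per_order": 0.0, "sec_since_last_order": None}
    return {
        "prior_orders": k,
        "avg_items_per_order": sum(order_items[:k]) / k,
        "sec_since_last_order": display_ts - order_ts[k - 1],
    }


order_ts = [100, 200, 300]
order_items = [2, 4, 6]
at_300 = order_history_asof(300, order_ts, order_items)
at_50 = order_history_asof(50, order_ts, order_items)
```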
CTR mode:
- target: maximize immediate click probability
- model: `p_ctr = P(click | user, ad, context)`
- score: `score_ctr = p_ctr`
Order mode:
- target: maximize downstream orders
- model: `p_order = P(order_24h | user, ad, context)`
- score: `score_order = p_ctr * p_order`
Reason:
- this combines engagement likelihood and conversion likelihood, reducing the instability of ranking on sparse conversion labels alone.
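A toy comparison of the CTR and order scores. The probabilities are made-up numbers chosen only to show how multiplying by `p_order` can reorder candidates; they are not outputs of the trained models.

```python
def score_order(p_ctr, p_order):
    # Engagement likelihood times conversion likelihood
    return p_ctr * p_order


# Illustrative (p_ctr, p_order) pairs per candidate ad
candidates = {"A": (0.10, 0.05), "B": (0.04, 0.20), "C": (0.08, 0.08)}

by_ctr = sorted(candidates, key=lambda ad: candidates[ad][0], reverse=True)
by_order = sorted(candidates, key=lambda ad: score_order(*candidates[ad]), reverse=True)
```

Ad A wins on CTR alone, but B moves to the top once its higher conversion likelihood is taken into account.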
Dwell mode:
- target: maximize session continuation
- model: regressor predicting `pred_dwell_sec`, the remaining session dwell in seconds
- score: `score_dwell = 0.35 * p_ctr + 0.65 * norm(pred_dwell_sec)`
Reason:
- dwell-only ranking can drift away from ad relevance; the CTR term acts as a relevance guardrail.
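`norm` is not defined here. The sketch below assumes min-max scaling over the candidate set of a single request, which keeps the dwell term in [0, 1]; treat that choice, the helper names, and the input values as assumptions.

```python
def min_max_norm(values):
    """Assumed definition of norm(): min-max scaling over one request's candidates."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant column carries no ranking signal
    return [(v - lo) / (hi - lo) for v in values]


def score_dwell(p_ctr, pred_dwell_sec):
    # p_ctr enters unnormalized, matching the formula above
    dwell_n = min_max_norm(pred_dwell_sec)
    return [0.35 * c + 0.65 * d for c, d in zip(p_ctr, dwell_n)]


p_ctr = [0.10, 0.02, 0.05]
pred_dwell_sec = [60.0, 300.0, 120.0]
scores = score_dwell(p_ctr, pred_dwell_sec)
```

With these illustrative inputs the normalized dwell term spans [0, 1] while `p_ctr` stays near 0.1, so the dwell weight dominates the ordering; that follows from `p_ctr` not being normalized in this mode.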
Multi-objective mode:
- target: optimize click, order, and dwell jointly
- score: `score_multi = w_ctr * norm(p_ctr) + w_order * norm(p_order) + w_dwell * norm(pred_dwell_sec)`
- default weights: `w_ctr=0.4`, `w_order=0.4`, `w_dwell=0.2`
Reason:
- explicit utility weighting gives product and business teams a controllable tradeoff mechanism.
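Under the same assumption that `norm` is per-request min-max scaling, the multi-objective score and the final top-K cut could look like this. Function names and candidate values are illustrative.

```python
def min_max_norm(values):
    """Assumed definition of norm(): min-max scaling over one request's candidates."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def score_multi(p_ctr, p_order, pred_dwell_sec, w_ctr=0.4, w_order=0.4, w_dwell=0.2):
    # Each objective is normalized first so the weights compare like with like
    c, o, d = min_max_norm(p_ctr), min_max_norm(p_order), min_max_norm(pred_dwell_sec)
    return [w_ctr * ci + w_order * oi + w_dwell * di for ci, oi, di in zip(c, o, d)]


def top_k(ad_ids, scores, k):
    ranked = sorted(zip(ad_ids, scores), key=lambda pair: pair[1], reverse=True)
    return [ad for ad, _ in ranked[:k]]


ad_ids = [101, 102, 103]
scores = score_multi(
    p_ctr=[0.10, 0.02, 0.05],
    p_order=[0.01, 0.03, 0.025],
    pred_dwell_sec=[60.0, 300.0, 120.0],
)
best_two = top_k(ad_ids, scores, k=2)
```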
Models:
- CTR: `HistGradientBoostingClassifier`
- Order: `HistGradientBoostingClassifier`
- Dwell: `HistGradientBoostingRegressor` on `log1p(dwell)`
Preprocessing:
- categorical: most-frequent imputation + ordinal encoder with unknown handling
- numeric: median imputation
- class imbalance: balanced sample weights for binary tasks
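Balanced sample weights give each class the same total weight. The sketch below reproduces the "balanced" formula used by scikit-learn (`n_samples / (n_classes * count_of_class)`, as in `sklearn.utils.class_weight.compute_sample_weight`) in plain Python; the helper name is hypothetical. The resulting list would be passed as `sample_weight` to `HistGradientBoostingClassifier.fit`.

```python
from collections import Counter


def balanced_sample_weights(y):
    """Per-sample weights n_samples / (n_classes * count[label]) (hypothetical helper)."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return [n / (k * counts[label]) for label in y]


# Illustrative labels: 8 negatives, 2 positives
y = [0] * 8 + [1] * 2
w = balanced_sample_weights(y)
```

Each class ends up with a total weight of 5.0 here, so the rare positive class is not drowned out during training.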
Split strategy:
- strict temporal split with default ratios of 70/15/15 (train/validation/test)
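A minimal sketch of the 70/15/15 temporal split, assuming it cuts by row count after sorting on the display timestamp (a split on calendar boundaries would be an equally valid reading). The function name and the `ts` key are hypothetical.

```python
def temporal_split(rows, ts_key="ts", ratios=(0.70, 0.15, 0.15)):
    """Earliest rows go to train, the next slice to validation, the latest to test."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    ordered = sorted(rows, key=lambda r: r[ts_key])
    n = len(ordered)
    # round() avoids off-by-one truncation from float products such as 0.7 * n
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    train = ordered[:n_train]
    val = ordered[n_train:n_train + n_val]
    test = ordered[n_train + n_val:]
    return train, val, test


rows = [{"ts": t} for t in reversed(range(20))]  # deliberately unsorted input
train, val, test = temporal_split(rows)
```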
Evaluation metrics:
- CTR and Order models: ROC-AUC, PR-AUC, LogLoss, positive rate
- Dwell model: RMSE, MAE
- ranking: NDCG@5 and NDCG@10 on click and order labels
- top-1 policy simulation: `top1_click_rate`, `top1_order_rate_24h`, `top1_avg_dwell_sec`, `top1_multi_utility`
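NDCG@K and the top-1 simulation both need a ranking group. The sketch below assumes each row carries a hypothetical `group_id` (for example one request) and ranks only among the displays logged in that group, which is why these offline numbers stay correlational. Groups without any positive label are skipped for NDCG, and `top1_rate` is a generic stand-in for metrics such as `top1_click_rate`.

```python
import math
from collections import defaultdict


def ndcg_at_k(scores, labels, k):
    """NDCG@k for one group; returns None when the group has no positive label."""
    ranked = [lab for _, lab in sorted(zip(scores, labels), key=lambda x: x[0], reverse=True)]
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked[:k]))
    ideal = sorted(labels, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else None


def grouped_metrics(rows, score_key, label_key, k):
    groups = defaultdict(list)
    for r in rows:
        groups[r["group_id"]].append(r)
    ndcgs, top1 = [], []
    for members in groups.values():
        value = ndcg_at_k([m[score_key] for m in members], [m[label_key] for m in members], k)
        if value is not None:
            ndcgs.append(value)
        # Top-1 policy: take the highest-scored logged display and read its label
        best = max(members, key=lambda m: m[score_key])
        top1.append(best[label_key])
    return {
        f"ndcg@{k}": sum(ndcgs) / len(ndcgs) if ndcgs else float("nan"),
        "top1_rate": sum(top1) / len(top1),
    }


rows = [
    {"group_id": "g1", "score": 0.9, "label_click": 0},
    {"group_id": "g1", "score": 0.5, "label_click": 1},
    {"group_id": "g1", "score": 0.1, "label_click": 0},
    {"group_id": "g2", "score": 0.8, "label_click": 1},
    {"group_id": "g2", "score": 0.3, "label_click": 0},
]
m = grouped_metrics(rows, "score", "label_click", k=5)
```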
Evaluation flow:

```mermaid
flowchart LR
A["Validation and Test Predictions"] --> B["Model Metrics"]
A --> C["Objective Scores"]
C --> D["NDCG at K"]
C --> E["Top1 Policy Simulation"]
B --> F["metrics.json"]
D --> F
E --> F
```
Project layout:
- `main.py`: CLI entrypoint and one-click orchestration
- `src/recsys/config.py`: path and training config dataclasses
- `src/recsys/data.py`: loading, cleaning, labels, features, temporal split
- `src/recsys/modeling.py`: model builders and fit/predict helpers
- `src/recsys/scoring.py`: objective score functions
- `src/recsys/evaluation.py`: offline metric suite
- `src/recsys/pipeline.py`: profile/train/recommend orchestration
One-click run:

```bash
python3 main.py
```

Equivalent explicit mode:

```bash
python3 main.py run-all --data-dir data --artifact-dir artifacts --objective multi --top-k 20
```

Option-only style also works:

```bash
python3 main.py --artifact-dir artifacts --objective order --top-k 30
```

Profile only:

```bash
python3 main.py profile --data-dir data
```

Train only:

```bash
python3 main.py train --data-dir data --artifact-dir artifacts
```

Recommend only:

```bash
python3 main.py recommend --artifact-dir artifacts --user-id 6176 --timestamp "2021-11-21 12:00:00" --objective multi --top-k 20
```

Output artifacts:
- `artifacts/model_bundle.joblib`
- `artifacts/metrics.json`
- `artifacts/val_predictions.csv`
- `artifacts/test_predictions.csv`
- `artifacts/one_click_recommendations_<user>_<objective>.csv`
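The commands above imply a CLI in which a bare run or an option-only call falls back to `run-all`. One way to get that behavior with `argparse` is sketched below. The defaults come from the "equivalent explicit mode" example; the `ctr`/`dwell` objective names, the per-command option sets, and the helper names are assumptions, not the actual `main.py`.

```python
import argparse

# Subcommands shown in the usage examples
COMMANDS = ("run-all", "profile", "train", "recommend")


def build_parser():
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command")
    for name in COMMANDS:
        p = sub.add_parser(name)
        if name != "recommend":
            p.add_argument("--data-dir", default="data")
        if name != "profile":
            p.add_argument("--artifact-dir", default="artifacts")
        if name in ("run-all", "recommend"):
            # Assumed objective names; only "multi" and "order" appear in the examples
            p.add_argument("--objective", choices=["ctr", "order", "dwell", "multi"], default="multi")
            p.add_argument("--top-k", type=int, default=20)
        if name == "recommend":
            p.add_argument("--user-id", type=int)
            p.add_argument("--timestamp", help="request time, e.g. '2021-11-21 12:00:00'")
    return parser


def parse_cli(argv):
    """Parse argv, treating a bare call or an option-only call as run-all."""
    argv = list(argv)
    if not argv or argv[0] not in COMMANDS:
        argv = ["run-all", *argv]
    return build_parser().parse_args(argv)


# In main.py this would be called as parse_cli(sys.argv[1:])
```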
Next iteration:
- Add propensity logging and inverse propensity scoring (IPS) or doubly robust (DR) estimators for counterfactual policy evaluation.
- Replace the independent models with a multi-task architecture such as MMoE or PLE.
- Introduce constrained re-ranking with KPI guardrails and pacing logic.
- Add an online experimentation framework with CUPED variance reduction and guardrail monitoring.
- Add nearline feature updates for fresher user state at serving time.
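For the first item, an IPS estimate of a deterministic new policy's average reward could look like the sketch below. It assumes logged `(context, action, propensity, reward)` tuples, which this baseline does not record yet; the helper name and the data are illustrative.

```python
def ips_value(logs, new_policy):
    """IPS estimate of a deterministic target policy's mean reward.

    logs: list of (context, logged_action, logging_propensity, reward).
    new_policy: callable mapping a context to the action it would choose.
    """
    total = 0.0
    for ctx, action, propensity, reward in logs:
        if propensity <= 0:
            raise ValueError("logging propensities must be positive")
        # Only logged actions that match the new policy contribute, reweighted by 1/propensity
        if new_policy(ctx) == action:
            total += reward / propensity
    return total / len(logs)


logs = [
    ("u1", "A", 0.50, 1.0),
    ("u2", "B", 0.25, 0.0),
    ("u3", "A", 0.50, 0.0),
    ("u4", "B", 0.25, 1.0),
]
always_a = ips_value(logs, lambda ctx: "A")
always_b = ips_value(logs, lambda ctx: "B")
```

Small logging propensities inflate individual terms, which is the variance problem DR estimators are meant to reduce.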
Known limitations:
- Order labels are user-level outcomes and do not establish direct ad-to-order causality.
- Offline policy metrics are still correlational without explicit propensity logs.
- This baseline does not include latency benchmarks or serving infrastructure.
These limitations are documented intentionally to define a clear next iteration plan.