CTR-DeepFM

Running the Application

Prerequisites

Make sure you have Python installed on your system. You can download it from python.org.

Installation

  1. Clone the repository or download the source code.
  2. Navigate to the project directory.
  3. Install the required dependencies by running:
pip install -r requirements.txt

Running the Application

After installing the dependencies, run the following command to start the application:

python3 DeepFM_with_csv_input.py

To remove components of DeepFM, such as CrossNet, the deep part, or the factorization-machine part, you can modify the file model_zoo/DeepFM/DeepFM_torch/src/DeepFM.py
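
As a rough, self-contained illustration (not the actual FuxiCTR code; the class name, attribute names, and sizes below are assumptions), the ablations amount to dropping one of the additive terms in a DeepFM-style forward pass:

    import torch
    from torch import nn

    # Illustrative toy model only: shows where each DeepFM component enters the
    # prediction, so it is clear what "removing a component" means.
    class ToyDeepFM(nn.Module):
        def __init__(self, num_fields=10, vocab_size=1000, embed_dim=16,
                     use_fm=True, use_deep=True):
            super().__init__()
            self.use_fm, self.use_deep = use_fm, use_deep
            self.embedding = nn.Embedding(vocab_size, embed_dim)   # shared embeddings
            self.linear = nn.Embedding(vocab_size, 1)              # order-1 weights
            self.mlp = nn.Sequential(
                nn.Linear(num_fields * embed_dim, 64), nn.ReLU(),
                nn.Linear(64, 1))

        def forward(self, x):                           # x: (batch, num_fields) of ids
            emb = self.embedding(x)                     # (batch, fields, dim)
            y = self.linear(x).sum(dim=1)               # linear (order-1) part
            if self.use_fm:                             # pairwise (order-2) FM part
                square_of_sum = emb.sum(dim=1) ** 2
                sum_of_square = (emb ** 2).sum(dim=1)
                y = y + 0.5 * (square_of_sum - sum_of_square).sum(dim=1, keepdim=True)
            if self.use_deep:                           # high-order deep part
                y = y + self.mlp(emb.flatten(start_dim=1))
            return torch.sigmoid(y)

    model = ToyDeepFM(use_deep=False)                   # e.g. "ablate the MLP"
    probs = model(torch.randint(0, 1000, (4, 10)))      # 4 records, 10 fields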

Introduction to CTR Prediction Problem

Click-Through Rate (CTR) prediction is a fundamental problem in online advertising, e-commerce, and recommendation systems. It involves estimating the probability that a user will click on a given item, such as an advertisement, product, or content, based on historical data and user behavior.

CTR prediction is crucial for optimizing user experience and maximizing revenue. Accurate CTR models help in delivering personalized content, allocating advertising budgets efficiently, and improving overall engagement. However, the task is challenging due to the sparsity of user interactions, high-dimensional feature spaces, and the need to capture complex patterns, such as feature interactions and contextual dependencies.

Advanced models like DeepFM, Wide & Deep, and CrossNet have been developed to address these challenges by leveraging both linear and non-linear feature interactions, combining traditional machine learning techniques with deep learning for enhanced performance.

Part 1: Literature Survey

CTR Study with DeepFM Recommender System

1.1.a

[Figure: the FM formula]

I'd like to start with the FM formula above.

It consists of two parts:

1 - Addition

2 - Inner Products

The addition part captures the linear (order-1) feature interactions. In the inner-product part, the latent (hidden) vectors come into play, which in turn captures the order-2 interactions.

The FM part of the DeepFM algorithm does not require that features i and j both appear in the same data record, which is a great advance over previous approaches to capturing order-2 feature interactions.

The factorization machine achieves this by computing the inner products of the features' latent vectors.
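
Written out, the FM prediction combines both parts (standard FM notation: w are the order-1 weights and V_i is the latent vector of feature i):

    y_{FM}(x) = w_0 + \sum_{i=1}^{n} w_i x_i
              + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle V_i, V_j \rangle \, x_i x_j

The first two terms are the addition (order-1) part; the double sum of inner products is the order-2 part. Because every feature i carries its own latent vector V_i, the weight of an (i, j) interaction can be estimated even when i and j never co-occur in a record.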

1.1.b

Architecture

[Figure: the DeepFM architecture]

I'd like to start with the architecture above. Looking at the big picture, we observe that DeepFM consists of two parts: the Factorization Machine component and the Deep component.

In the Deep component, there is one challenge we need to tackle: we are dealing with highly sparse data. It is computationally very hard to train the model on such sparse data, therefore we compress the input into a dense embedding layer. We do this by means of the latent vectors of the FM. Unlike [Zhang et al., 2016]'s approach, the latent feature vectors (V) of the FM now serve as network weights and are used to perform the aforementioned compression.

[Figure: the embedding layer, with the FM latent vectors shared by the FM and deep parts]
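
A small sketch of that compression step, assuming PyTorch (names and sizes are illustrative): looking up a field's embedding is equivalent to multiplying its one-hot vector by the latent matrix V, and the resulting dense vectors feed both the FM part and the deep part.

    import torch
    from torch import nn

    vocab_size, embed_dim = 1000, 16               # illustrative sizes
    V = nn.Embedding(vocab_size, embed_dim)        # latent vectors double as weights

    feature_ids = torch.tensor([[3, 42, 917]])     # one sparse record, 3 fields
    dense = V(feature_ids)                         # (1, 3, 16): dense embeddings
    # 'dense' is shared: it is the input to both the FM inner products
    # and the first hidden layer of the deep component.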

Mathematics

[Figure: the deep component equations]

Moving to the mathematical part of DeepFM, it is trained like any neural network. In the picture above, the notation is as follows: σ(·): activation function; a(l): output of the l-th layer; W(l): weights of the l-th layer; b(l): bias of the l-th layer.

This feed-forward NN captures the high-order feature interactions; note that the low-order feature interactions are captured by the FM part.

At the end of the network we get a value, and the final output of the DeepFM model is calculated via the sigmoid function, applied to the sum of the outputs of the FM part and the deep part.
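
In that notation, each hidden layer and the final prediction can be written as:

    a^{(l+1)} = \sigma\left( W^{(l)} a^{(l)} + b^{(l)} \right)
    \hat{y}   = \mathrm{sigmoid}\left( y_{FM} + y_{DNN} \right)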

1.1.c

Performance Evaluation

There are a number of key takeaways here:

1- Learning feature interactions improves the performance of CTR prediction models. We can conclude this from the fact that logistic regression (which does not model feature interactions) performs worse than the models that do.

2- DeepFM, which learns low-order and high-order feature interactions simultaneously, outperforms models that learn only low-order feature interactions and models that learn only high-order feature interactions (FNN, IPNN, OPNN, PNN∗).

3- Some models, like LR & DNN and FM & DNN, use separate feature embeddings when learning high-order and low-order feature interactions. Using the same feature embedding for both, as DeepFM does, outperforms those models.

  • Do create feature interaction in recommendation systems.
  • Use the same feature embedding for low-order and high-order feature interactions.
  • Learn low-order and high-order feature interactions simultaneously.

Below are the performance comparisons in terms of AUC (Area Under Curve) and Log Loss:

[Figure: AUC and Log Loss comparison across the models]
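
These are the same two metrics reported for the experiments in Part 2. For concreteness, both can be computed from predicted click probabilities with scikit-learn (toy data below):

    from sklearn.metrics import log_loss, roc_auc_score

    y_true = [0, 1, 1, 0, 1]                       # observed clicks (toy data)
    y_prob = [0.1, 0.8, 0.6, 0.3, 0.9]             # predicted click probabilities
    print("Log Loss:", log_loss(y_true, y_prob))   # lower is better
    print("AUC:", roc_auc_score(y_true, y_prob))   # higher is better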

We can see from the graph below that training DeepFM is computationally efficient:

[Figure: training-time comparison]

Hyper-Parameter Study

  1. Activation functions
  2. Dropout rate
  3. Number of neurons per layer
  4. Number of hidden layers
  5. Network shape
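
For orientation, these knobs typically live together in a single training configuration; a hypothetical example (the key names are illustrative only, not the exact FuxiCTR schema):

    # Hypothetical hyper-parameter configuration; key names are illustrative only.
    config = {
        "activation": "relu",              # 1) activation function
        "dropout_keep_prob": 0.9,          # 2) dropout (keep probability, per the paper)
        "hidden_units": [200, 200, 200],   # 3) neurons per layer, 4) number of hidden layers
        "network_shape": "constant",       # 5) constant / increasing / decreasing / diamond
    }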

Activation Function

ReLU performs better than tanh because it induces sparsity.

Dropout Rate

Formally, dropout [Srivastava et al., 2014] refers to the probability that a neuron is kept in the network. In my interpretation, we make some neurons sit idle during certain iterations of training. As can be seen from the graph below, the dropout regularization technique makes the model more robust, performing best when the rate is set to 0.9.

[Figure: performance vs. dropout rate]
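
One caveat when reproducing this: the paper's 0.9 is a keep probability, whereas PyTorch's nn.Dropout takes the probability of zeroing a unit, so the equivalent setting is:

    from torch import nn

    # Keep probability 0.9 (as in the paper) == drop probability 0.1 in PyTorch.
    dropout = nn.Dropout(p=0.1)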

Number of Neurons per Layer

More neurons may seem to give better performance in the NN; however, when the number of neurons per layer is increased from 400 to 800, DeepFM performs worse. The reason is the added complexity and the increased likelihood of overfitting. See the picture below:

[Figure: performance vs. number of neurons per layer]

Number of Hidden Layers

Adding more hidden layers initially improves model performance, but excessive hidden layers lead to performance degradation due to overfitting. See the picture below:

[Figure: performance vs. number of hidden layers]

Network Shape

Among network shapes such as constant, increasing, decreasing, and diamond, the constant shape performs best empirically. See the picture below:

[Figure: performance for different network shapes]

1.1.d

Architectural Changes:

  • Explicit Feature Interactions:

    • xDeepFM introduces the Compressed Interaction Network (CIN), which directly models higher-order feature interactions at the vector level, whereas DeepFM relies on a DNN to implicitly capture these interactions.
  • Hybrid Approach:

    • xDeepFM combines both CIN and DNN, allowing it to learn both explicit and implicit feature interactions, giving it a more complete ability to capture complex patterns in the data.

Mathematical Improvements:

  • CIN’s Mechanism:

    • CIN calculates higher-order interactions through Hadamard products and weighted sums; a minimal sketch of one such layer follows this list. With each layer, the model captures increasingly complex interactions while keeping them easier to understand and interpret.
  • Vector-Wise Operations:

    • While DeepFM models interactions at the bit level, CIN works at the vector level, which aligns more with how factorization machines traditionally work.
  • Parameter Efficiency:

    • CIN reduces complexity by using a compressed representation, making it more scalable. It also uses methods like rank decomposition for better dimensionality management.
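
A minimal sketch of one CIN layer under those definitions, assuming PyTorch (shapes and names are illustrative, not the xDeepFM reference code): each output feature map is a weighted sum over the vector-wise Hadamard products between the previous layer's feature maps and the base field embeddings.

    import torch

    def cin_layer(x0, xk, weight):
        """One CIN layer (illustrative sketch, not the xDeepFM reference code).
        x0:     (batch, m, D)      base field embeddings
        xk:     (batch, Hk, D)     previous CIN layer output
        weight: (Hnext, Hk * m)    compression weights
        returns (batch, Hnext, D)
        """
        b, m, d = x0.shape
        hk = xk.shape[1]
        # vector-wise Hadamard product of every (previous-map, base-field) pair
        z = torch.einsum('bhd,bmd->bhmd', xk, x0).reshape(b, hk * m, d)
        # weighted sum ("compression") over the Hk*m interaction maps
        return torch.einsum('oh,bhd->bod', weight, z)

    # Example: 32 records, 10 fields, 16-dim embeddings, 20 maps in the next layer.
    x0 = torch.randn(32, 10, 16)
    x1 = cin_layer(x0, x0, torch.randn(20, 10 * 10))    # first CIN layer
    print(x1.shape)                                      # torch.Size([32, 20, 16])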

1.2. IMPROVEMENT

Part 2: Experimental Study / Coding

Validation and Test Metrics for DeepFM Model and Ablations

Without Ablating Any Component:

[Figure: run output for the full DeepFM model]

  • Validation:
    • Log Loss: 0.277496
    • AUC (Area Under Curve): 0.941540
  • Test:
    • Log Loss: 0.280279
    • AUC (Area Under Curve): 0.940040

Ablating fm_layer:

[Figure: run output with fm_layer ablated]

  • Validation:
    • Log Loss: 0.279265
    • AUC (Area Under Curve): 0.941655
  • Test:
    • Log Loss: 0.281922
    • AUC (Area Under Curve): 0.940112

Ablating lr_layer:

[Figure: run output with lr_layer ablated]

  • Validation:
    • Log Loss: 0.279698
    • AUC (Area Under Curve): 0.940230
  • Test:
    • Log Loss: 0.281219
    • AUC (Area Under Curve): 0.939263

Ablating MLP:

[Figures: code change and run output with the MLP ablated]

  • Validation:
    • Log Loss: 0.294823
    • AUC (Area Under Curve): 0.935569
  • Test:
    • Log Loss: 0.296791
    • AUC (Area Under Curve): 0.933542

Merging CrossNet by Concatenating to the MLP Layer

[Figures: code change and run output for the CrossNet + MLP variant]
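
As a rough sketch of the idea (module names and sizes here are assumptions, not the exact modification made in this repository): the CrossNet branch and the MLP branch both read the flattened embeddings, and their outputs are concatenated before the final projection.

    import torch
    from torch import nn

    # Illustrative only: a single DCN-style cross layer whose output is
    # concatenated with the MLP branch output before the final projection.
    class CrossLayer(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.w = nn.Linear(dim, 1, bias=False)
            self.b = nn.Parameter(torch.zeros(dim))
        def forward(self, x0, x):
            # x_{l+1} = x0 * (w^T x_l) + b + x_l
            return x0 * self.w(x) + self.b + x

    dim = 48                                             # flattened embedding size
    cross = CrossLayer(dim)
    mlp = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 16))
    fc = nn.Linear(dim + 16, 1)                          # final projection

    flat_emb = torch.randn(32, dim)                      # (batch, fields * embed_dim)
    cross_out = cross(flat_emb, flat_emb)                # one cross layer
    deep_out = mlp(flat_emb)
    logit = fc(torch.cat([cross_out, deep_out], dim=-1))
    print(logit.shape)                                   # torch.Size([32, 1])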

Metrics:

  • Validation:
    • Log Loss: 0.278291
    • AUC (Area Under Curve): 0.940934
  • Test:
    • Log Loss: 0.279972
    • AUC (Area Under Curve): 0.940006

Discussion

According to the Log Loss and AUC results, we can claim that high-order feature interactions exist in the data, since ablating the deep part of DeepFM resulted in a larger drop in AUC and a larger increase in Log Loss than the other ablations.

References

  1. Lian, J., Zhou, X., Zhang, F., Chen, Z., Xie, X., & Sun, G. (2018). xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems. Paper Link

  2. Guo, H., Tang, R., Ye, Y., Li, Z., & He, X. (2017). DeepFM: A Factorization-Machine based Neural Network for CTR Prediction. Paper Link

  3. RecZoo: Benchmarking Advances in Recommendation Systems. GitHub Repository: BARS

  4. RecZoo: FuxiCTR - A Flexible CTR Prediction Framework. GitHub Repository: FuxiCTR v2.0.2

About

Performance analysis of DeepFM Recommender System on CTR Dataset
