INFORMSJoC/2021.0172


(INFORMS Journal on Computing header image)

If you use the code in this repository, please cite the paper "Ran Y, Liu J, Zhang Y (2022) Integrating users’ contextual engagements with their general preferences: An interpretable followee recommendation method. INFORMS Journal on Computing." and cite the code using the following DOI: https://doi.org/10.1287/ijoc.2023.1284.cd.

Below is the BibTeX entry for citing this version of the material.

@article{PELDA2022,
    author    = {Yaxuan Ran and Jiani Liu and Yishi Zhang},
    publisher = {INFORMS Journal on Computing},
    title     = {Integrating Users’ Contextual Engagements with Their General Preferences: An Interpretable Followee Recommendation Method},
    year      = {2022},
    doi       = {10.1287/ijoc.2023.1284.cd},
    url       = {https://github.com/INFORMSJoC/2021.0172},
}

Description

Users’ contextual engagements can affect their decisions about whom to follow on online social networks, because engaged (vs. disengaged) users tend to seek more information about the topic they are currently interested in and are more likely to follow relevant accounts successively. However, existing followee recommendation methods neglect contextual engagement and rely only on users’ general preferences. In light of the chronological nature of users’ following behavior, we draw on engagement theory and propose an interpretable algorithm, Preference-Engagement Latent Dirichlet Allocation (PE-LDA), which integrates users’ contextual engagements with their general preferences for followee recommendation. Specifically, we suggest that if a user is engaged in the current interest, he/she is more likely to select a followee relevant to that interest; if not, the user tends to select a followee according to his/her general preference. That is, a user's following decisions are jointly influenced by the long-term general preference and the short-term contextual engagement.

To implement this framework, we extend the original LDA by (1) introducing an indicator (1 vs. 0) to represent whether the user is contextually engaged in his/her current interest, and (2) assuming a first-order Markov property between the user’s successive interests to model the condition of contextual engagement, in which the user is prone to consecutively select followees that are highly relevant to the previous interest. We conduct extensive experiments using a real-world Twitter dataset. Results demonstrate the superior performance of PE-LDA compared with several existing methods.
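For intuition, below is a minimal NumPy sketch of the generative story described above. It is an illustration only, not the authors' implementation; the function names, hyperparameters, and the exact conditioning are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_user_sequence(n_follows, K, V, alpha, beta, gamma=(1.0, 1.0)):
    """Illustrative generative process in the spirit of PE-LDA.

    K: number of latent interests (topics); V: number of candidate accounts.
    alpha, beta: symmetric Dirichlet hyperparameters; gamma: Beta prior for
    the per-user engagement probability. A sketch, not the paper's exact model.
    """
    theta = rng.dirichlet([alpha] * K)           # user's general preference over interests
    phi = rng.dirichlet([beta] * V, size=K)      # per-interest distribution over accounts
    pi = rng.beta(*gamma)                        # user's engagement probability

    follows, z_prev = [], None
    for _ in range(n_follows):
        engaged = (z_prev is not None) and (rng.random() < pi)
        if engaged:
            z = z_prev                           # contextual engagement: stay on the previous interest
        else:
            z = rng.choice(K, p=theta)           # otherwise fall back on the general preference
        follows.append(rng.choice(V, p=phi[z]))  # pick a followee relevant to the chosen interest
        z_prev = z
    return follows
```

The design choice mirrored here is that the binary engagement indicator gates between carrying over the previous interest (contextual engagement) and re-sampling an interest from the user's general preference distribution.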

Data and instructions to run PE-LDA

The original version of the dataset used in our study is an open dataset. You can find and download the raw dataset at https://www.kaggle.com/datasets/hwassner/TwitterFriends/download?datasetVersionNumber=2 (login may be required). Detailed information on this dataset is also available at https://www.kaggle.com/datasets/hwassner/TwitterFriends. A data backup is also available at https://drive.google.com/file/d/13BLIS_eQTdz6XsHkERQYDAI3vWtDMM-p.

Since both the original and pre-processed datasets exceed the platform's maximum file size (25 MB), we provide interactive Python notebook (.ipynb) files containing the pre-processing procedure and the PE-LDA code so that you can reproduce the pre-processed datasets and the results used in our study. Please note that before running the PE-LDA model, you must first download the dataset and pre-process it with data_preprocessing.ipynb in the src folder.
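As a rough orientation before running data_preprocessing.ipynb, loading the raw Kaggle export might look like the snippet below. The file name data.json and the field names id and friends are assumptions about the raw layout and may need adjusting to the actual download.

```python
import pandas as pd

# Assumption: the raw Kaggle export is a JSON array of user records, each with
# a user identifier and a list of followed account IDs (field names may differ).
raw = pd.read_json("data.json")
follow_lists = raw.set_index("id")["friends"]   # followees per user
print(follow_lists.head())
```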

The code PELDA.ipynb implements the PE-LDA model. In this implementation, we first define four sampling functions to draw samples from the Beta, Binomial, Dirichlet, and Multinomial distributions. We then run a Gibbs sampler to estimate the latent variables. Detailed derivations are given in appendix.pdf. The experiments in the paper used a C implementation of PE-LDA, which is available upon reasonable request.
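For a quick look outside the notebook, a minimal NumPy version of such sampling utilities could look as follows; the function names are illustrative and the notebook's own definitions may differ.

```python
import numpy as np

rng = np.random.default_rng()

def sample_beta(a, b):
    """Draw from Beta(a, b), e.g. for the per-user engagement probability."""
    return rng.beta(a, b)

def sample_binomial(p):
    """Draw a 0/1 engagement indicator, i.e. Bernoulli(p)."""
    return rng.binomial(1, p)

def sample_dirichlet(alpha_vec):
    """Draw a probability vector from Dirichlet(alpha_vec)."""
    return rng.dirichlet(alpha_vec)

def sample_multinomial(p_vec):
    """Draw a single category index with probabilities p_vec."""
    return rng.choice(len(p_vec), p=p_vec)
```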

The code evaluation.ipynb evaluates our PE-LDA model. Because a single training split may raise concerns about random bias, we use a sliding window, a cross-validation scheme that respects time dependencies and therefore matches our setting. We adopt five-fold cross-validation to assess how well the results of the analysis generalize to an independent data set. The following figure illustrates the procedure.

(Figure: illustration of the sliding-window five-fold cross-validation procedure)

Specifically, we conduct five-fold cross-validation by dividing the dataset into nine equal parts chronologically; the window of each fold covers five adjacent parts. For each user $u$, the most recent account in the fold is selected as the ground-truth account $g_u$, and the remaining accounts form the training set $Train_u$ used to train our PE-LDA model. We randomly sample 99 negative accounts that the user does not follow (i.e., not in $Train_u$) and rank them together with the ground-truth account to select the top-N recommended candidates for the user. We compute $CR@N$ and $NDCG@N$ to evaluate the performance of PE-LDA; these two metrics are widely adopted in the existing top-N recommendation literature using the leave-one-out evaluation strategy.
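A minimal sketch of this leave-one-out ranking step, assuming a hypothetical score(user, account) callable produced by a trained model, is:

```python
import numpy as np

def evaluate_user(score, user, ground_truth, negatives, N=10):
    """Rank the ground-truth account against 99 sampled negatives.

    Returns (CR@N, NDCG@N) for one user; `score` is any callable mapping
    (user, account) to a relevance score -- a stand-in for the trained model.
    """
    candidates = [ground_truth] + list(negatives)      # 1 positive + 99 negatives
    scores = np.array([score(user, a) for a in candidates])
    rank = int((scores > scores[0]).sum())             # 0-based rank of the ground truth
    if rank < N:
        return 1.0, 1.0 / np.log2(rank + 2)            # hit, and NDCG for a single relevant item
    return 0.0, 0.0
```

Averaging these per-user values over all users and folds yields the reported $CR@N$ and $NDCG@N$.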

Prerequisites (please install the following packages before running our PE-LDA model; an example pip command is given after the list)

  • python 3.6
  • numpy 1.19.2
  • pandas 1.1.3
  • gensim 3.8.3
  • tqdm 4.50.0
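For example, the pinned versions above can be installed with pip:

```
pip install numpy==1.19.2 pandas==1.1.3 gensim==3.8.3 tqdm==4.50.0
```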

Appendix

The file appendix.pdf is the online supplementary material for the paper "Integrating Users’ Contextual Engagements with Their General Preferences: An Interpretable Followee Recommendation Method". It includes:

  • The Preliminary Study in the Theoretical Foundation Section
  • Literature Summary on LDA-based Followee Recommendation
  • Derivation of Equations (2)--(3), i.e., the update equation from which the Gibbs sampler draws the hidden variable in our PE-LDA model
  • Summary of the Recommendation Methods Applied in Our Experiments
  • Convergence Analysis
  • Sensitivity Analysis: Impacts of $\alpha$ and $\beta$

Experimental results

Table 3 in the paper shows the performance comparison of our PE-LDA and various benchmark methods (see Table 3 in the results folder).

Figure 5 shows the comparison of the execution time (i.e., computation cost) of PE-LDA and the other benchmarks (see Figure 5 in the results folder).

Figure 6 shows the results of the sensitivity analysis w.r.t. the number of interests (i.e., topics) (see Figure 6 in the results folder).

You can find more results in the original paper.
