INFORMSJoC/2021.0172


(INFORMS Journal on Computing header image)

If you use the code in this repository, please cite the paper "Ran Y, Liu J, Zhang Y (2022) Integrating users’ contextual engagements with their general preferences: An interpretable followee recommendation method. INFORMS Journal on Computing." and cite the code using the following DOI: https://doi.org/10.1287/ijoc.2023.1284.cd.

Below is the BibTeX entry for citing this version of the material.

@article{PELDA2022,
    author    = {Yaxuan Ran and Jiani Liu and Yishi Zhang},
    publisher = {INFORMS Journal on Computing},
    title     = {Integrating Users’ Contextual Engagements with Their General Preferences: An Interpretable Followee Recommendation Method},
    year      = {2022},
    doi       = {10.1287/ijoc.2023.1284.cd},
    url       = {https://github.com/INFORMSJoC/2021.0172},
}

Description

Users’ contextual engagements can affect their decisions about whom to follow on online social networks, because engaged (vs. disengaged) users tend to seek more information about the topic they are currently interested in and are more likely to follow relevant accounts successively. However, existing followee recommendation methods neglect contextual engagement and rely only on users’ general preferences. In light of the chronological nature of users’ following behavior, we draw on engagement theory and propose an interpretable algorithm, Preference-Engagement Latent Dirichlet Allocation (PE-LDA), which integrates users’ contextual engagements with their general preferences for followee recommendation. Specifically, we suggest that if a user is engaged in the current interest, he/she is more likely to select a followee relevant to that interest; if not, the user tends to select a followee according to his/her general preference. That is, a user's following decisions are jointly influenced by the long-term general preference and the short-term contextual engagement.

To implement this framework, we extend the original LDA by (1) introducing an indicator (1 vs. 0) to represent whether the user is contextually engaged in his/her current interest, and (2) assuming a first-order Markov property between the user’s successive interests to model the condition of contextual engagement, in which the user is prone to consecutively select followees that are highly relevant to the previous interest. We conduct extensive experiments using a real-world Twitter dataset. Results demonstrate the superior performance of PE-LDA compared with several existing methods.
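For intuition, below is a minimal NumPy sketch of the generative story described above. It is an illustration only, not the authors' implementation; the function names, hyperparameters, and the exact conditioning are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_user_sequence(n_follows, K, V, alpha, beta, gamma=(1.0, 1.0)):
    """Illustrative generative process in the spirit of PE-LDA.

    K: number of latent interests (topics); V: number of candidate accounts.
    alpha, beta: symmetric Dirichlet hyperparameters; gamma: Beta prior for
    the per-user engagement probability. A sketch, not the paper's exact model.
    """
    theta = rng.dirichlet([alpha] * K)           # user's general preference over interests
    phi = rng.dirichlet([beta] * V, size=K)      # per-interest distribution over accounts
    pi = rng.beta(*gamma)                        # user's engagement probability

    follows, z_prev = [], None
    for _ in range(n_follows):
        engaged = (z_prev is not None) and (rng.random() < pi)
        if engaged:
            z = z_prev                           # contextual engagement: stay on the previous interest
        else:
            z = rng.choice(K, p=theta)           # otherwise fall back on the general preference
        follows.append(rng.choice(V, p=phi[z]))  # pick a followee relevant to the chosen interest
        z_prev = z
    return follows
```

The design choice mirrored here is that the binary engagement indicator gates between carrying over the previous interest (contextual engagement) and re-sampling an interest from the user's general preference distribution.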

Data and instructions to run PE-LDA

The original version of the dataset used in our study is an open dataset. You can find and download the raw dataset at https://www.kaggle.com/datasets/hwassner/TwitterFriends/download?datasetVersionNumber=2 (login may be required). Detailed information on this dataset is also available at https://www.kaggle.com/datasets/hwassner/TwitterFriends. A data backup is also available at https://drive.google.com/file/d/13BLIS_eQTdz6XsHkERQYDAI3vWtDMM-p.

Since both the original and pre-processed datasets exceed the platform's maximum file size (25 MB), we provide interactive Python notebook (.ipynb) files containing the pre-processing procedure and the PE-LDA code so that you can reproduce the pre-processed datasets and the results used in our study. Please note that before running the PE-LDA model, you must first download the dataset and pre-process it with data_preprocessing.ipynb in the src folder.
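As a rough orientation before running data_preprocessing.ipynb, loading the raw Kaggle export might look like the snippet below. The file name data.json and the field names id and friends are assumptions about the raw layout and may need adjusting to the actual download.

```python
import pandas as pd

# Assumption: the raw Kaggle export is a JSON array of user records, each with
# a user identifier and a list of followed account IDs (field names may differ).
raw = pd.read_json("data.json")
follow_lists = raw.set_index("id")["friends"]   # followees per user
print(follow_lists.head())
```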

The code PELDA.ipynb implements the PE-LDA model. In this implementation, we first define four sampling functions to draw samples from the Beta, Binomial, Dirichlet, and Multinomial distributions. We then run a Gibbs sampler to estimate the latent variables. Detailed derivations are given in appendix.pdf. The experiments in the paper used a C implementation of PE-LDA, which is available upon reasonable request.
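For a quick look outside the notebook, a minimal NumPy version of such sampling utilities could look as follows; the function names are illustrative and the notebook's own definitions may differ.

```python
import numpy as np

rng = np.random.default_rng()

def sample_beta(a, b):
    """Draw from Beta(a, b), e.g. for the per-user engagement probability."""
    return rng.beta(a, b)

def sample_binomial(p):
    """Draw a 0/1 engagement indicator, i.e. Bernoulli(p)."""
    return rng.binomial(1, p)

def sample_dirichlet(alpha_vec):
    """Draw a probability vector from Dirichlet(alpha_vec)."""
    return rng.dirichlet(alpha_vec)

def sample_multinomial(p_vec):
    """Draw a single category index with probabilities p_vec."""
    return rng.choice(len(p_vec), p=p_vec)
```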

The code evaluation.ipynb evaluates our PE-LDA model. Because a single training split may raise concerns about random bias, we use a sliding window, a cross-validation scheme that respects time dependencies and therefore matches our setting. We adopt five-fold cross-validation to assess how well the results of the analysis generalize to an independent data set. The following figure illustrates the procedure.

(Figure: illustration of the sliding-window five-fold cross-validation procedure)

Specifically, we conduct five-fold cross-validation by dividing the dataset into nine equal parts chronologically; the window of each fold covers five adjacent parts. For each user $u$, the most recent account in the fold is selected as the ground-truth account $g_u$, and the remaining accounts form the training set $Train_u$ used to train our PE-LDA model. We randomly sample 99 negative accounts that the user does not follow (i.e., not in $Train_u$) and rank them together with the ground-truth account to select the top-N recommended candidates for the user. We compute $CR@N$ and $NDCG@N$ to evaluate the performance of PE-LDA; these two metrics are widely adopted in the existing top-N recommendation literature using the leave-one-out evaluation strategy.
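A minimal sketch of this leave-one-out ranking step, assuming a hypothetical score(user, account) callable produced by a trained model, is:

```python
import numpy as np

def evaluate_user(score, user, ground_truth, negatives, N=10):
    """Rank the ground-truth account against 99 sampled negatives.

    Returns (CR@N, NDCG@N) for one user; `score` is any callable mapping
    (user, account) to a relevance score -- a stand-in for the trained model.
    """
    candidates = [ground_truth] + list(negatives)      # 1 positive + 99 negatives
    scores = np.array([score(user, a) for a in candidates])
    rank = int((scores > scores[0]).sum())             # 0-based rank of the ground truth
    if rank < N:
        return 1.0, 1.0 / np.log2(rank + 2)            # hit, and NDCG for a single relevant item
    return 0.0, 0.0
```

Averaging these per-user values over all users and folds yields the reported $CR@N$ and $NDCG@N$.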

Prerequisites (please install the following packages before running our PE-LDA model; an example pip command is given after the list)

  • python 3.6
  • numpy 1.19.2
  • pandas 1.1.3
  • gensim 3.8.3
  • tqdm 4.50.0
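For example, the pinned versions above can be installed with pip:

```
pip install numpy==1.19.2 pandas==1.1.3 gensim==3.8.3 tqdm==4.50.0
```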

Appendix

The file appendix.pdf is the online supplementary material for the paper "Integrating Users’ Contextual Engagements with Their General Preferences: An Interpretable Followee Recommendation Method". It includes:

  • The Preliminary Study in the Theoretical Foundation Section
  • Literature Summary on LDA-based Followee Recommendation
  • Derivation of Equations (2)--(3), i.e., the update equation from which the Gibbs sampler draws the hidden variable in our PE-LDA model
  • Summary of the Recommendation Methods Applied in Our Experiments
  • Convergence Analysis
  • Sensitivity Analysis: Impacts of $\alpha$ and $\beta$

Experimental results

Table 3 in the paper shows the performance comparison of our PE-LDA and various benchmark methods (see Table 3 in the results folder).

Figure 5 shows the comparison of the execution time (i.e., computation cost) of PE-LDA and the other benchmarks (see Figure 5 in the results folder).

Figure 6 shows the results of the sensitivity analysis w.r.t. the number of interests (i.e., topics) (see Figure 6 in the results folder).

You can find more results in the original paper.
