Commit 5dca595

Merge pull request #2124 from mkhe93/feature/ALS_MF
[feature] ALS Matrix Factorization using External Library (implicit)
2 parents c2e7c06 + a8887dc commit 5dca595

File tree: 5 files changed, +308 −1 lines changed
Lines changed: 4 additions & 0 deletions

.. automodule:: recbole.model.general_recommender.als
    :members:
    :undoc-members:
    :show-inheritance:
Lines changed: 86 additions & 0 deletions

ALS (External algorithm library)
================================

Introduction
---------------------

`[ALS (implicit)] <https://benfred.github.io/implicit/api/models/cpu/als.html>`_

**ALS (AlternatingLeastSquares)** by implicit is a recommendation model based on the algorithm proposed by Hu, Koren, and Volinsky in `Collaborative Filtering for Implicit Feedback Datasets <http://yifanhu.net/PUB/cf.pdf>`_.
It furthermore leverages the findings of `Applications of the Conjugate Gradient Method for Implicit Feedback Collaborative Filtering <https://dl.acm.org/doi/pdf/10.1145/2043932.2043987>`_ for performance optimization.
`Implicit <https://benfred.github.io/implicit/index.html>`_ provides several models for implicit feedback recommendations.

`[paper] <http://yifanhu.net/PUB/cf.pdf>`_
**Title:** Collaborative Filtering for Implicit Feedback Datasets

**Authors:** Hu, Yifan and Koren, Yehuda and Volinsky, Chris

**Abstract:** A common task of recommender systems is to improve customer experience through personalized recommendations based on prior implicit feedback. These systems passively track different sorts of user behavior, such as purchase history, watching habits and browsing activity, in order to model user preferences. Unlike the much more extensively researched explicit feedback, we do not have any direct input from the users regarding their preferences. In particular, we lack substantial evidence on which products consumers dislike. In this work we identify unique properties of implicit feedback datasets. We propose treating the data as indication of positive and negative preference associated with vastly varying confidence levels. This leads to a factor model which is especially tailored for implicit feedback recommenders. We also suggest a scalable optimization procedure, which scales linearly with the data size. The algorithm is used successfully within a recommender system for television shows. It compares favorably with well tuned implementations of other known methods. In addition, we offer a novel way to give explanations to recommendations given by this factor model.
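The confidence-weighted factor model described in the abstract can be sketched in plain NumPy. This is a didactic dense implementation with made-up helper names (`als_implicit`, `weighted_loss`), not the sparse conjugate-gradient solver the implicit library actually uses:

```python
import numpy as np

def als_implicit(R, factors=8, regularization=0.01, alpha=1.0, iterations=10, seed=0):
    """Dense sketch of the implicit-feedback ALS updates from
    Hu, Koren & Volinsky (2008); the implicit library solves the same
    normal equations on sparse data with a conjugate-gradient solver."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = 0.01 * rng.standard_normal((n_users, factors))  # user factors
    Y = 0.01 * rng.standard_normal((n_items, factors))  # item factors
    P = (R > 0).astype(float)     # binary preference p_ui
    C = 1.0 + alpha * R           # confidence c_ui = 1 + alpha * r_ui
    lamI = regularization * np.eye(factors)
    for _ in range(iterations):
        for u in range(n_users):  # fix Y, solve the per-user normal equations
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + lamI, Y.T @ Cu @ P[u])
        for i in range(n_items):  # fix X, solve the per-item normal equations
            Ci = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ Ci @ X + lamI, X.T @ Ci @ P[:, i])
    return X, Y

def weighted_loss(R, X, Y, regularization=0.01, alpha=1.0):
    """The paper's objective: sum_ui c_ui * (p_ui - x_u^T y_i)^2 plus L2 regularization."""
    P = (R > 0).astype(float)
    C = 1.0 + alpha * R
    return (C * (P - X @ Y.T) ** 2).sum() + regularization * ((X**2).sum() + (Y**2).sum())
```

Because each alternating solve minimizes the objective exactly with the other factor matrix held fixed, the weighted loss is non-increasing across iterations, which is why the procedure scales linearly with the number of observed interactions.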
Running with RecBole
-------------------------

**Model Hyper-Parameters:**

- ``embedding_size (int)`` : The number of latent factors to compute. Defaults to ``64``.
- ``regularization (float)`` : The regularization factor to use. Defaults to ``0.01``.
- ``alpha (float)`` : The weight to give to positive examples. Defaults to ``1.0``.

Please refer to the `Implicit Python package <https://benfred.github.io/implicit/index.html>`_ for more details.
**A Running Example:**

Write the following code to a python file, such as `run.py`:

.. code:: python

   from recbole.quick_start import run_recbole

   run_recbole(model='ALS', dataset='ml-100k')

And then:

.. code:: bash

   python run.py
Tuning Hyper Parameters
-------------------------

If you want to use ``HyperTuning`` to tune the hyper parameters of this model, you can copy the following settings and name the file ``hyper.test``.

.. code:: bash

   regularization choice [0.01, 0.03, 0.05, 0.1]
   embedding_size choice [32, 64, 96, 128, 256]
   alpha choice [0.5, 0.7, 1.0, 1.3, 1.5]

Note that these hyper parameter ranges are provided for reference only; we cannot guarantee that they cover the optimal values for this model.

Then, with the source code of RecBole (you can download it from GitHub), you can run ``run_hyper.py`` to start tuning:

.. code:: bash

   python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test
For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`.

If you want to change parameters, dataset or evaluation settings, take a look at

- :doc:`../../../user_guide/config_settings`
- :doc:`../../../user_guide/data_intro`
- :doc:`../../../user_guide/train_eval_intro`
- :doc:`../../../user_guide/usage`
recbole/model/general_recommender/als.py

Lines changed: 93 additions & 0 deletions

# -*- coding: utf-8 -*-
# @Time   : 2024/12/01
# @Author : Markus Hoefling
# @Email  : markus.hoefling01@gmail.com

r"""
ALS
################################################
Reference 1:
    Hu, Y., Koren, Y., & Volinsky, C. (2008). "Collaborative Filtering for Implicit Feedback Datasets." In ICDM 2008.

Reference 2:
    Frederickson, Ben, "Implicit 0.7.2", code: https://github.com/benfred/implicit, readthedocs: https://benfred.github.io/implicit/
"""

import numpy as np
import threadpoolctl
import torch
from implicit.als import AlternatingLeastSquares

from recbole.model.abstract_recommender import GeneralRecommender
from recbole.utils import InputType, ModelType

# Limit BLAS to a single thread; implicit warns about thread oversubscription otherwise.
threadpoolctl.threadpool_limits(1, "blas")


class ALS(GeneralRecommender):
    r"""
    ALS is a matrix factorization model implemented using the Alternating Least Squares (ALS) method
    from the `implicit` library (https://benfred.github.io/implicit/).
    This model optimizes the embeddings through the Alternating Least Squares algorithm.
    """

    input_type = InputType.POINTWISE
    type = ModelType.GENERAL

    def __init__(self, config, dataset):
        super(ALS, self).__init__(config, dataset)

        # load parameters info
        self.embedding_size = config['embedding_size']
        self.regularization = config['regularization']
        self.alpha = config['alpha']
        self.iterations = config['epochs']

        # define model
        self.model = AlternatingLeastSquares(
            factors=self.embedding_size,
            regularization=self.regularization,
            alpha=self.alpha,
            iterations=1,  # iterations are done by the ALSTrainer via 'epochs'
            use_cg=True,
            calculate_training_loss=True,
            num_threads=0,
            random_state=42
        )

        # initialize embeddings
        self.user_embeddings = np.random.rand(self.n_users, self.embedding_size)
        self.item_embeddings = np.random.rand(self.n_items, self.embedding_size)

        # dummy parameter so that RecBole can initialize a torch optimizer
        self.fake_parameter = torch.nn.Parameter(torch.zeros(1))

    def get_user_embedding(self, user):
        return torch.tensor(self.user_embeddings[user])

    def get_item_embedding(self, item):
        return torch.tensor(self.item_embeddings[item])

    def forward(self, user, item):
        user_e = self.get_user_embedding(user)
        item_e = self.get_item_embedding(item)
        return user_e, item_e

    def _callback(self, iteration, time, loss):
        self._loss = loss

    def calculate_loss(self, interactions):
        self.model.fit(interactions, show_progress=False, callback=self._callback)
        self.user_embeddings = self.model.user_factors
        self.item_embeddings = self.model.item_factors
        return self._loss

    def predict(self, interaction):
        user = interaction[self.USER_ID]
        item = interaction[self.ITEM_ID]
        user_e, item_e = self.forward(user, item)
        return torch.dot(user_e, item_e)

    def full_sort_predict(self, interaction):
        user = interaction[self.USER_ID]
        user_e = self.get_user_embedding(user)
        all_item_e = torch.tensor(self.model.item_factors)
        score = torch.matmul(user_e, all_item_e.transpose(0, 1))
        return score.view(-1)
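Since ``predict`` and ``full_sort_predict`` reduce to dot products of the learned factors, their behavior can be illustrated with plain NumPy. The factor values below are toy numbers of my own; in the model above they come from ``self.model.user_factors`` and ``self.model.item_factors`` after fitting:

```python
import numpy as np

# Toy learned factors (hypothetical values): 2 users, 3 items, 2 latent factors.
user_factors = np.array([[0.5, 1.0],
                         [1.5, -0.5]])
item_factors = np.array([[1.0, 0.0],
                         [0.0, 1.0],
                         [1.0, 1.0]])

# predict(): the score of one (user, item) pair is a dot product.
score_u0_i2 = user_factors[0] @ item_factors[2]

# full_sort_predict(): one user's scores against all items at once.
all_scores_u0 = user_factors[0] @ item_factors.T

# The batched scores agree with the pairwise score for the same item.
assert np.isclose(all_scores_u0[2], score_u0_i2)
```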

recbole/properties/model/ALS.yaml

Lines changed: 3 additions & 0 deletions

regularization: 0.01    # The regularization factor to use
embedding_size: 64      # The number of latent factors to compute
alpha: 1.0              # The weight to give to positive examples
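Since the ALS model takes its iteration count from RecBole's ``epochs`` setting rather than from a dedicated key, a user config overriding these defaults might look like the following sketch (file name and values are hypothetical):

```yaml
# test.yaml -- hypothetical user config overriding the ALS defaults
embedding_size: 128
regularization: 0.05
alpha: 1.3
epochs: 15        # number of ALS iterations run by the ALSTrainer
```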

recbole/trainer/trainer.py

Lines changed: 122 additions & 1 deletion

@@ -28,6 +28,7 @@

 from torch.nn.utils.clip_grad import clip_grad_norm_
 from tqdm import tqdm
 import torch.cuda.amp as amp
+import scipy.sparse as sp

 from recbole.data.interaction import Interaction
 from recbole.data.dataloader import FullSortEvalDataLoader

@@ -92,7 +93,6 @@ def sync_grad_loss(self):

         sync_loss += torch.sum(params) * 0
         return sync_loss
-
 class Trainer(AbstractTrainer):
     r"""The basic Trainer for basic training and evaluation strategies in recommender systems. This class defines common
     functions for training and evaluation processes of most recommender system models, including fit(), evaluate(),

@@ -671,6 +671,127 @@ def _spilt_predict(self, interaction, batch_size):

             result_list.append(result)
         return torch.cat(result_list, dim=0)
class ALSTrainer(Trainer):
    r"""ALSTrainer is designed for the ALS model of the implicit library: https://benfred.github.io/implicit"""

    def __init__(self, config, model):
        super(ALSTrainer, self).__init__(config, model)

    def fit(
        self,
        train_data,
        valid_data=None,
        verbose=True,
        saved=True,
        show_progress=False,
        callback_fn=None,
    ):
        r"""Train the model based on the train data and the valid data.

        Args:
            train_data (DataLoader): the train data
            valid_data (DataLoader, optional): the valid data, default: None.
                If it's None, early stopping is disabled.
            verbose (bool, optional): whether to write training and evaluation information to logger, default: True
            saved (bool, optional): whether to save the model parameters, default: True
            show_progress (bool): Show the progress of training epoch and evaluate epoch. Defaults to ``False``.
            callback_fn (callable): Optional callback function executed at end of epoch.
                Includes (epoch_idx, valid_score) input arguments.

        Returns:
            (float, dict): best valid score and best valid result. If valid_data is None, it returns (-1, None)
        """
        if saved and self.start_epoch >= self.epochs:
            self._save_checkpoint(-1, verbose=verbose)

        self.eval_collector.data_collect(train_data)
        if self.config["train_neg_sample_args"].get("dynamic", False):
            train_data.get_model(self.model)
        valid_step = 0

        for epoch_idx in range(self.start_epoch, self.epochs):
            # train
            training_start_time = time()
            # pass the entire dataset as a sparse CSR matrix, as required by https://benfred.github.io/implicit
            train_loss = self.model.calculate_loss(train_data._dataset.inter_matrix(form='csr'))
            self.train_loss_dict[epoch_idx] = (
                sum(train_loss) if isinstance(train_loss, tuple) else train_loss
            )
            training_end_time = time()
            train_loss_output = self._generate_train_loss_output(
                epoch_idx, training_start_time, training_end_time, train_loss
            )
            if verbose:
                self.logger.info(train_loss_output)
            self._add_train_loss_to_tensorboard(epoch_idx, train_loss)
            self.wandblogger.log_metrics(
                {"epoch": epoch_idx, "train_loss": train_loss, "train_step": epoch_idx},
                head="train",
            )

            # eval
            if self.eval_step <= 0 or not valid_data:
                if saved:
                    self._save_checkpoint(epoch_idx, verbose=verbose)
                continue
            if (epoch_idx + 1) % self.eval_step == 0:
                valid_start_time = time()
                valid_score, valid_result = self._valid_epoch(
                    valid_data, show_progress=show_progress
                )

                (
                    self.best_valid_score,
                    self.cur_step,
                    stop_flag,
                    update_flag,
                ) = early_stopping(
                    valid_score,
                    self.best_valid_score,
                    self.cur_step,
                    max_step=self.stopping_step,
                    bigger=self.valid_metric_bigger,
                )
                valid_end_time = time()
                valid_score_output = (
                    set_color("epoch %d evaluating", "green")
                    + " ["
                    + set_color("time", "blue")
                    + ": %.2fs, "
                    + set_color("valid_score", "blue")
                    + ": %f]"
                ) % (epoch_idx, valid_end_time - valid_start_time, valid_score)
                valid_result_output = (
                    set_color("valid result", "blue") + ": \n" + dict2str(valid_result)
                )
                if verbose:
                    self.logger.info(valid_score_output)
                    self.logger.info(valid_result_output)
                self.tensorboard.add_scalar("Vaild_score", valid_score, epoch_idx)
                self.wandblogger.log_metrics(
                    {**valid_result, "valid_step": valid_step}, head="valid"
                )

                if update_flag:
                    if saved:
                        self._save_checkpoint(epoch_idx, verbose=verbose)
                    self.best_valid_result = valid_result

                if callback_fn:
                    callback_fn(epoch_idx, valid_score)

                if stop_flag:
                    stop_output = "Finished training, best eval result in epoch %d" % (
                        epoch_idx - self.cur_step * self.eval_step
                    )
                    if verbose:
                        self.logger.info(stop_output)
                    break

                valid_step += 1

        self._add_hparam_to_tensorboard(self.best_valid_score)
        return self.best_valid_score, self.best_valid_result
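The ``inter_matrix(form='csr')`` call above returns the training interactions as a SciPy CSR matrix of shape ``(n_users, n_items)``, which is the layout the model's ``calculate_loss`` hands to implicit. A minimal sketch of building that layout from interaction pairs (toy ids and values of my own):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy implicit-feedback interactions as (user_id, item_id) pairs.
users = np.array([0, 0, 1, 2, 2, 2])
items = np.array([1, 3, 0, 1, 2, 3])
vals = np.ones(len(users))  # implicit feedback: each observed pair gets weight 1

# users x items CSR matrix: 3 users, 4 items, 6 stored interactions.
inter = csr_matrix((vals, (users, items)), shape=(3, 4))
print(inter.toarray())
```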
class KGTrainer(Trainer):
    r"""KGTrainer is designed for Knowledge-aware recommendation methods. Some of these models need to train the
