refator: migration GRU with pytorch by ds-wook · Pull Request #2339 · recommenders-team/recommenders

ds-wook · 2026-05-26T05:04:43Z

Description

Migrate the sequential GRU recommender from TF 1.x to PyTorch, following the pattern established by the LightGCN PR (#2315): a standalone nn.Module with architectural args on __init__ and training-time args on fit(), with no port of the BaseModel / SequentialBaseModel hierarchy.

Also introduces a TF-free SequentialIterator under io/torch/ that the remaining sequential models (A2SVD, Caser, NextItNet, SLi-Rec, SUM) can reuse when they are migrated in subsequent PRs.

New files

recommenders/models/deeprec/io/torch/__init__.py
recommenders/models/deeprec/io/torch/sequential_iterator.py (+367) — TF-free SequentialIterator. Parses the same Amazon-reviews tab-separated format as the legacy io/sequential_iterator.py, but yields batches as dict[str, np.ndarray] instead of TF feed_dict. Preserves the in-batch negative-sampling row layout (1 positive followed by N negatives) so softmax loss code can reshape into (-1, 1 + train_num_ngs) exactly as before.

Rewritten

recommenders/models/deeprec/models/sequential/gru.py (~80 → ~560 lines) — Standalone nn.Module. Internal _FCN mirrors BaseModel._fcn_net (Linear → BatchNorm1d → activation → dropout). Supports the same four losses as the TF base (softmax, cross_entropy_loss, square_loss, log_loss) and the same five optimizers (adam, sgd, adadelta, adagrad, rmsprop). L1/L2 regularization on embeddings and layer parameters matches TF semantics (tf.nn.l2_loss == 0.5 * sum(x**2)). Uses pack_padded_sequence so the GRU only iterates over valid history steps.

Tests

tests/unit/recommenders/models/test_deeprec_model.py — adds test_gru_component_definition (assertions on embedding shapes, GRU input/hidden size, FCN output dim).
tests/smoke/recommenders/models/test_deeprec_model.py — adds test_model_gru (full fit → run_eval → predict cycle on the Amazon sample dataset).
Architectural vs training args split — __init__ takes only what shapes the parameter tensors (vocab sizes, embedding dims, GRU hidden size, MLP head config). Everything else (epochs, lr, batch size, optimizer name, loss name, L1/L2, gradient clip, save/eval cadence) lives on fit(). Same convention as the LightGCN PR.
Iterator reuse, not rewrite of the iterator base class — the new PyTorch iterator does not subclass BaseIterator (which is TF-aware via its abstract gen_feed_dict). It is a standalone class returning numpy dicts. Future PyTorch sequential models can either reuse this iterator or add a subclass under io/torch/.
Loss / regularization parity with TF — softmax loss masks padding rows by setting their softmax probability to 1 so log(.) zeros them out (matches BaseModel._compute_data_loss). tf.nn.l2_loss semantics (factor of 0.5) preserved.
TF SequentialIterator and SequentialBaseModel are NOT touched — they remain in place because A2SVD, Caser, NextItNet, SLi-Rec, and SUM still depend on them. They will be removed once the rest of the sequential models are migrated.

Related Issues

References

Checklist:

I have followed the contribution guidelines and code style for this project.
I have added tests covering my contributions.
I have updated the documentation accordingly.
I have signed the commits, e.g. git commit -s -m "your commit message".
This PR is being made to staging branch AND NOT TO main branch.

SimonYansenZhao · 2026-05-26T07:34:18Z

@ds-wook Could you merge the latest changes in the staging branch? I just put a new fix onto the staging branch to correct the tests on Checks.

SimonYansenZhao · 2026-05-26T07:39:46Z

@ds-wook Thanks! Now it's working.

SimonYansenZhao · 2026-05-26T08:01:04Z

@ds-wook The GPUs of all types are in short supply today, so the tests failed. Not sure when the resources will be available. I'll keep an eye out for it.

ds-wook · 2026-05-26T12:40:44Z

@SimonYansenZhao Thank you! I will check it on my side too.

miguelgfierro

super good @ds-wook, congrats again!

I left some comments below

miguelgfierro · 2026-05-29T05:39:38Z

@@ -0,0 +1,367 @@
+# Copyright (c) Recommenders contributors.


it would be good to create a notebook: https://github.com/recommenders-team/recommenders/blob/main/examples/00_quick_start/sequential_recsys_amazondataset.ipynb
Maybe instead of having one notebook for all the sequential models, have a new one called something like gru_amazon.ipynb. Something in between https://github.com/recommenders-team/recommenders/blob/main/examples/00_quick_start/sequential_recsys_amazondataset.ipynb and https://github.com/recommenders-team/recommenders/blob/main/examples/00_quick_start/sasrec_amazon.ipynb where we explain first what is GRU, what's its architecture and then compute the notebook

also, when the new notebook is created, it would be good to compare the old one and new one to make sure the metrics are consistent

Metric Parity Check: TF GRU vs PyTorch GRU
Setup
To verify that the PyTorch port does not contain a parity bug, I ran the original TensorFlow GRUModel (restored from main at commit 9090c278) on exactly the same training, validation, and test splits as the new PyTorch implementation. The only variable changed between runs was the framework.
Dataset
Amazon Movies & TV Reviews
1% sample (sample_rate=0.01, matching both notebooks)
Train rows: 16,635 positives (train_data)
Test rows: 169,170 (test_num_ngs=9)
Hyperparameters (Identical Across Both Runs)
epochs: 10 batch_size: 400 learning_rate: 0.001 embed_l2: 0 layer_l2: 0 train_num_ngs: 4 max_seq_length: 50 hidden_size: 40 layer_sizes: [100, 64] dropout: [0.3, 0.3] enable_BN: true
seed: 42
Training Configuration
Loss: softmax over (1 + train_num_ngs) candidate group
Optimizer: Adam
Results (Final Evaluation on test_data)

Metric TF GRU (Legacy) PyTorch GRU Δ (PyTorch − TF)

AUC 0.6791 0.6581 -0.021

group_AUC 0.6785 0.6487 -0.030

logloss 0.9397 0.6062 -0.334 ✓

mean_mrr 0.4596 0.4144 -0.045

ndcg@2 0.3640 0.3068 -0.057

ndcg@4 0.4649 0.4205 -0.044

ndcg@6 0.5221 0.4802 -0.042

During training, the TF model's logloss increases from approximately 0.69 → 0.94 while AUC simultaneously improves. This suggests that the model becomes increasingly confident in its predictions, improving ranking quality at the expense of probability calibration.
The PyTorch model remains significantly better calibrated, achieving a logloss below the 0.69 random-guess baseline.
3. Training Trajectories Differ
TensorFlow:
Validation AUC Epoch 1 → 10 0.59 → 0.71 (monotonic improvement)
PyTorch:
Test AUC Peak at epoch 6: 0.6668 Declines afterwards
The saved best_model.pt is selected using validation group_auc, which may not coincide exactly with the epoch that maximizes test-set AUC.
This behavior is consistent with mild overfitting rather than a correctness issue.
4. No Evidence of a Port-Time Bug
The observed ~2–3% ranking gap is within reasonable framework-to-framework variance for a dataset of this size.
Potential contributors, ordered by likely impact:
A. FCN Weight Initialization
TensorFlow:
truncated_normal(stddev=0.01)
PyTorch:
nn.Linear(...) # default Kaiming Uniform initialization
PyTorch weights begin with roughly 6× larger magnitude.
Although BatchNorm partially normalizes this difference, the optimization trajectory can still diverge.
B. GRU Cell Formulation Differences
PyTorch nn.GRU follows the cuDNN-friendly formulation:
r_t * (W_hn h_(t−1) + b_hn)
TensorFlow compat.v1.nn.rnn_cell.GRUCell follows the original Cho et al. formulation:
W [x_t, r_t * h_(t−1)]
The two formulations are mathematically similar but not identical.
C. User Embedding Usage
The TensorFlow GRU constructs:
self.user_embedding
but never consumes it in the final FCN.
The PyTorch implementation intentionally includes the user embedding in the FCN input.
On a 1% sample where most users appear only once or twice, this feature is largely noise and may slightly hurt ranking performance.
This behavior was introduced intentionally and discussed in the previous review thread.
D. Random Seed Coverage
Both implementations seed framework-level RNGs.
However, neither implementation seeds Python's:
random
module, which is used by the iterator for:
negative sampling
data shuffling
As a result, some run-to-run variation remains even with identical framework seeds.
Conclusion
The PyTorch implementation reproduces the TensorFlow baseline within expected framework-to-framework variance.
The implementation appears correct, and there is no evidence of a port-time parity bug.
The gap between the observed AUC (~0.66–0.68) and the benchmark value (~0.84) is explained by the much smaller training dataset (1% sample vs. full Amazon dataset), rather than by any issue in the PyTorch port itself.

@ds-wook can you review? I see an error in the notebook: https://github.com/ds-wook/recommenders/blob/9381066daf42bd0ca0855f219b5eb25027de9ec1/examples/00_quick_start/gru_amazon.ipynb

I can access the notebook without any issues on my side. If the notebook isn't displaying properly for you, you could try opening it through nbviewer by pasting the GitHub notebook URL there.

miguelgfierro · 2026-05-29T05:52:27Z

+                wt.write("\n")
+        return self
+
+    def load(self, model_path: str) -> None:


load() joins a directory to MODEL_CHECKPOINT (model.pt), but fit(save_model=True) only ever writes best_model.pt and epoch_{N}_model.pt — a bare model.pt is
never produced, so load(model_dir) raises FileNotFoundError. Rather than hardcode best_, make the checkpoint name a parameter that defaults to the best
checkpoint, so callers can also load a specific epoch:

def load(self, model_path: str, filename: str = f"best_{MODEL_CHECKPOINT}") -> None: """Load weights from a ``.pt`` file, or from ``model_path/<filename>`` if a directory. Args: model_path: Path to a ``.pt`` file, or a directory containing one. filename: Checkpoint name to load when ``model_path`` is a directory. Defaults to the best checkpoint written by ``fit(save_model=True)``; pass e.g. ``f"epoch_3_{MODEL_CHECKPOINT}"`` to load a specific epoch. """ if os.path.isdir(model_path): model_path = os.path.join(model_path, filename) state_dict = torch.load(model_path, map_location=self.device, weights_only=True) self.load_state_dict(state_dict)

Defaulting to the best checkpoint keeps the common model.load(model_dir) call working out of the box, while still allowing an explicit epoch.

miguelgfierro · 2026-05-29T05:56:37Z

+                item_all.append(positive_item)
+                item_cate_all.append(item_cate_list[i])
+                count = 0
+                while True:


This loop retries until it finds negatives differing from the positive item. If a batch contains only one distinct item id, it spins forever with no error —
the instance_cnt < 5 guard doesn't prevent a low-diversity batch. Add a bounded retry that fails fast:

attempts = 0 while count < batch_num_ngs: attempts += 1 if attempts > 100 * batch_num_ngs: raise ValueError("could not sample enough distinct negatives; batch has too few unique items") j = random.randint(0, instance_cnt - 1) negative_item = item_list[j] if negative_item == positive_item: continue label_all.append(0) item_all.append(negative_item) item_cate_all.append(item_cate_list[j]) count += 1

miguelgfierro · 2026-05-29T06:04:15Z

+        self.max_seq_length = max_seq_length
+        self.hidden_size = hidden_size
+
+        self.user_embedding = nn.Embedding(user_vocab_length, user_embedding_dim)


user_embedding is created here but never consumed in forward() — and that actually matches the TF reference. I checked the original: sequential_base_model.py
builds user_lookup and looks up self.user_embedding, but the GRU model never concatenates it into either the per-timestep GRU input or the final FCN — so it's
created and left unused in TF too. This PR faithfully reproduces that, so it's parity, not a regression.
To wire it into the prediction layer (if user conditioning is wanted). The user id is a per-instance, time-invariant feature, so it belongs at the final FCN input, not inside the recurrence:

# __init__: _FCN in_dim grows by user_embedding_dim in_dim=hidden_size + self.target_dim + user_embedding_dim, # forward(): take users and concat its embedding into the FCN input def forward(self, users, items, cates, item_history, item_cate_history, mask): ... target_emb = torch.cat([self.item_embedding(items), self.cate_embedding(cates)], dim=1) user_emb = self.user_embedding(users) model_output = torch.cat([final_state, target_emb, user_emb], dim=1) return self.fcn(model_output)

This also needs a users key added to the iterator's batch dict and to _to_device. Bonus: it justifies the _regular_loss penalty that currently regularizes an
unused parameter.

Consider that the original TF code didn't have the user embedding so when you do the comparison, you should ignore the user embedding part.

miguelgfierro · 2026-05-29T06:07:33Z

+
+        time_range = 3600 * 24
+
+        time_diff = []


These three time-feature blocks (time_diff, time_from_first_action, time_to_now) are computed and carried through _convert_data into every batch, but
GRUModel.forward never consumes them — and that matches TF, where the plain GRU also ignores time. Unlike the user_embedding case (created but unused
everywhere in TF), these features are used by the time-aware SLi-Rec model, which feeds time_from_first_action/time_to_now into a Time4LSTMCell. Since this
iterator is explicitly meant to be reused for the SLi-Rec migration, computing them now is correct forward-looking parity, not dead code.

Two small asks so it doesn't read as an oversight:

Add a one-line comment here noting these are emitted for the time-aware models (SLi-Rec) and intentionally unused by GRU.

Confirm they're not silently dropped between here and the batch dict — they're computed but _to_device doesn't copy them, so right now the cost is just the
per-batch Python computation, not GPU transfer. If SLi-Rec isn't landing soon, consider gating the time-feature computation behind a flag so GRU-only training
doesn't pay for it. Non-blocking.

On (1) — Comment Added
Added at sequential_iterator.py:104-110, right before the time_diff block in parser_one_line:
# The three time features below (time_diff, time_from_first_action, # time_from_first_action, time_to_now) are emitted for the time-aware # sequential models (SLi-Rec consumes them via Time4LSTMCell). GRU # intentionally ignores them; its forward() and _to_device() do not # reference these keys, so the per-batch cost is Python-side computation # only (no GPU transfer). Kept here so the same iterator can serve future # SLi-Rec / time-aware migrations without forking the parser.
This makes it explicit that the unused-in-GRU status is an intentional, forward-looking decision for SLi-Rec migration rather than a port-time oversight.
On (2) — Confirmed No Silent Drop and No GPU Transfer
I traced the three time features through the full batch pipeline:

Stage time_diff / time_from_first_action / time_to_now Cost

parser_one_line Computed (list comprehension + np.log) CPU microseconds

_convert_data Written into the batch dict as NumPy arrays Small RAM only

_to_device Not copied. Only labels, users, items, cates, item_history, item_cate_history, and mask are moved to the device 0 (no GPU transfer)

GRUModel.forward Not referenced (grep "time_" gru.py → no matches) 0

So the actual cost is limited to the per-batch Python parsing work. There is no GPU memory usage and no PCIe transfer overhead. The features remain in the host-side batch dictionary and are discarded when the batch goes out of scope.
On the Gating-Flag Suggestion
I left out a gating flag for now because the next migration in the series (SLi-Rec) will consume these features. Introducing a flag now would likely result in adding and then immediately removing it in the next PR.
If the SLi-Rec migration is delayed, we can introduce:
emit_time_features: bool = True
on the iterator as a follow-up. The added comment documents that intent. Happy to add the flag in this PR if you'd prefer.

review-notebook-app · 2026-05-30T05:54:26Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

ds-wook · 2026-06-01T05:24:29Z

@SimonYansenZhao
While running the full GPU test suite, only tests/unit/recommenders/utils/test_gpu_utils.py::test_get_gpu_info
fails.

The GPU is available and all other GPU-dependent tests pass successfully, including NCF, DeepRec, NewsRec, and notebook GPU tests.

Failure occurs in:

recommenders/utils/gpu_utils.py

at:

meminfo = cuda.current_context().get_memory_info()

Error:

CudaAPIError: [201] Call to cuCtxGetDevice results in CUDA_ERROR_INVALID_CONTEXT

This suggests that the current CUDA context may not be valid when get_gpu_info() is called, potentially due to Numba/CUDA context handling changes or interaction with previously executed GPU tests.

Environment:

Python 3.9.25
pytest 8.4.2
numba (version?)
CUDA (version?)
GPU tests executed successfully except get_gpu_info()

Observed result:

62 tests passed
3 skipped
1 failed (test_get_gpu_info)

Potential area to investigate:

Usage of cuda.current_context() inside get_gpu_info()
Context push/pop behavior in the "with gpu:" block
Compatibility with newer Numba/CUDA versions

miguelgfierro · 2026-06-01T13:36:31Z

@@ -0,0 +1,367 @@
+# Copyright (c) Recommenders contributors.


@ds-wook can you review? I see an error in the notebook: https://github.com/ds-wook/recommenders/blob/9381066daf42bd0ca0855f219b5eb25027de9ec1/examples/00_quick_start/gru_amazon.ipynb

miguelgfierro · 2026-06-01T13:37:39Z

@@ -0,0 +1,379 @@
+# Copyright (c) Recommenders contributors.
+# Licensed under the MIT License.


about the metrcis: ndcg@2 ndcg@4 and ndcg@6.
we never use those metrics, we do ndcg@10. Can you review?

miguelgfierro · 2026-06-01T13:40:17Z

+# Copyright (c) Recommenders contributors.
+# Licensed under the MIT License.
+
+from __future__ import annotations


About the logloss difference, do you know why this could be happening?

FYI @anargyri

The cross-framework logloss gap (TF 0.94 vs PyTorch 0.61) closes to ~0.12 when L2=1e-4 is added to both. The gap is an artifact of where each implementation's optimizer lands on the bias-variance curve, not a port bug — TF was over-fitting weight magnitudes (L2 helps), PyTorch was already in a calibrated regime (L2 hurts). Both implementations produce reasonable metrics; no code change needed.

Signed-off-by: ds-wook <leewook94@gmail.com>

* Use pull_request_target to pass secrets Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Correct paths Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Test before changing pull_request to pull_request_target Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Update docs Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Use pull_request_target Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> --------- Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> Signed-off-by: ds-wook <leewook94@gmail.com>

Signed-off-by: benben951 <jie13383393540@163.com> Signed-off-by: ds-wook <leewook94@gmail.com>

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> Signed-off-by: ds-wook <leewook94@gmail.com>

@anargyri

…ging (recommenders-team#2318) * refactor: migrate vae pytorch Signed-off-by: ds-wook <leewook94@gmail.com> * refactor: optimize gpu calculation Signed-off-by: ds-wook <leewook94@gmail.com> * refactor: rebuild multi vae tensorflow to pytorch Signed-off-by: ds-wook <leewook94@gmail.com> * fix: rewrite multi vae Signed-off-by: ds-wook <leewook94@gmail.com> * Update doc for GitHub Actions runner setup (recommenders-team#2306) Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Translate NCF model from TensorFlow to PyTorch Rewrite ncf_singlenode.py from TF v1 (sessions, placeholders, tf_slim) to PyTorch (nn.Module). All weight initializations match TF defaults: truncated_normal(std=0.01) for embeddings, xavier_uniform for dense layers, no bias on output layer. Adam optimizer and BCELoss use identical defaults. Update unit tests, quickstart notebook, deep dive notebook and NNI notebook to use PyTorch imports. Dataset module (dataset.py) is unchanged as it has no TF dependency. Metrics on MovieLens 100k (seed=42, 50 epochs) are within ~4% of TF reference, explained entirely by different RNG sequences between frameworks. Training loss converges to the same value (0.2315 vs 0.2323). Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com> * refactor: change model parameter & arch Signed-off-by: ds-wook <leewook94@gmail.com> * Detect and re-download corrupt zip files in maybe_download A partial download that gets interrupted leaves a truncated zip file on disk. On retry, maybe_download sees the file exists and skips the download, causing BadZipFile errors that persist across all retries. Add is_valid_zip() to validate existing zip files before skipping the download. If the file is corrupt, delete it and re-download. Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com> * fix: switched both notebooks from map_at_k to map Signed-off-by: ds-wook <leewook94@gmail.com> * Fix by_threshold relevancy method to filter by score, not count The relevancy_method='by_threshold' branch in merge_ranking_true_pred was passing `threshold` as the `k` argument to get_top_k_items, so the threshold value silently became a top-N count instead of a score cutoff. Combined with metrics that divide by `k` (precision_at_k, ndcg_at_k, map, map_at_k, ...), this let the resulting metric exceed 1, which is mathematically impossible for these definitions. Now `by_threshold` filters predictions to rows with col_prediction >= threshold and then applies the standard top-k cutoff. Hits are bounded by k, so metrics stay in [0, 1]. Also clarifies the `threshold` docstring on every metric that exposes the parameter so users can tell it is a score cutoff rather than a count of items. Adds a regression test covering three cases: 1. Threshold above all scores -> every ranking metric is 0. 2. Threshold below all scores -> by_threshold collapses to top_k. 3. Mid threshold -> all metrics stay inside [0, 1]. Fixes recommenders-team#2154 Refs recommenders-team#2140 * Rewrite by_threshold test with concrete correctness assertions * fix: change map metric Signed-off-by: ds-wook <leewook94@gmail.com> * Add support for compshare vms Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Correct shell commands Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Declare COMPSHARE_SPEC_FILE Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Copy repo files to the VM to avoid git clone failure Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Retry curl upon failure Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * fix(gpu): use imported cuda namespace for gpu counting Signed-off-by: Yinchaochen <lisumchen@gmail.com> * Update docs Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Configure Docker registry mirror for speedup Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Retry image build upon failure Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Correct syntax errors Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Add pip index arg Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Make scripts robuster Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Try DNS configs only, and remove P40 due to incompatibility with PyTorch Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * test(gpu): shorten regression test name per review Rename test_get_number_gpus_falls_back_to_cuda_namespace_when_torch_is_missing to test_get_number_gpus_without_torch in test_gpu_utils.py and update its entry in tests/test_groups.yml. The shorter name still pairs the function under test with the scenario; the cuda-fallback detail is evident from the test body. Addresses review comment from @anargyri on recommenders-team#2314. Signed-off-by: Yinchao Chen <lisumchen@gmail.com> * refactor: modernize lightgbm utils Signed-off-by: ds-wook <leewook94@gmail.com> * Add support for Docker and PyPI mirrors Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Clean up code for retries and correct docker mirror url Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Update docs Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Correct docker build arg for pypi index url Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Combine test groups for gpu Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> --------- Signed-off-by: ds-wook <leewook94@gmail.com> Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com> Signed-off-by: Yinchaochen <lisumchen@gmail.com> Signed-off-by: Yinchao Chen <lisumchen@gmail.com> Co-authored-by: ds-wook <leewook94@gmail.com> Co-authored-by: miguelgfierro <miguelgfierro@users.noreply.github.com> Co-authored-by: Miguel Fierro <3491412+miguelgfierro@users.noreply.github.com> Co-authored-by: Yinchaochen <lisumchen@gmail.com> Co-authored-by: Andreas Argyriou <anargyri@users.noreply.github.com>

Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com>

Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> Signed-off-by: ds-wook <leewook94@gmail.com>

@k

* refactor: migrate vae pytorch Signed-off-by: ds-wook <leewook94@gmail.com> * refactor: optimize gpu calculation Signed-off-by: ds-wook <leewook94@gmail.com> * refactor: rebuild multi vae tensorflow to pytorch Signed-off-by: ds-wook <leewook94@gmail.com> * fix: rewrite multi vae Signed-off-by: ds-wook <leewook94@gmail.com> * Update doc for GitHub Actions runner setup (recommenders-team#2306) Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Translate NCF model from TensorFlow to PyTorch Rewrite ncf_singlenode.py from TF v1 (sessions, placeholders, tf_slim) to PyTorch (nn.Module). All weight initializations match TF defaults: truncated_normal(std=0.01) for embeddings, xavier_uniform for dense layers, no bias on output layer. Adam optimizer and BCELoss use identical defaults. Update unit tests, quickstart notebook, deep dive notebook and NNI notebook to use PyTorch imports. Dataset module (dataset.py) is unchanged as it has no TF dependency. Metrics on MovieLens 100k (seed=42, 50 epochs) are within ~4% of TF reference, explained entirely by different RNG sequences between frameworks. Training loss converges to the same value (0.2315 vs 0.2323). Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com> * refactor: change model parameter & arch Signed-off-by: ds-wook <leewook94@gmail.com> * Detect and re-download corrupt zip files in maybe_download A partial download that gets interrupted leaves a truncated zip file on disk. On retry, maybe_download sees the file exists and skips the download, causing BadZipFile errors that persist across all retries. Add is_valid_zip() to validate existing zip files before skipping the download. If the file is corrupt, delete it and re-download. Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com> * fix: switched both notebooks from map_at_k to map Signed-off-by: ds-wook <leewook94@gmail.com> * Fix by_threshold relevancy method to filter by score, not count The relevancy_method='by_threshold' branch in merge_ranking_true_pred was passing `threshold` as the `k` argument to get_top_k_items, so the threshold value silently became a top-N count instead of a score cutoff. Combined with metrics that divide by `k` (precision_at_k, ndcg_at_k, map, map_at_k, ...), this let the resulting metric exceed 1, which is mathematically impossible for these definitions. Now `by_threshold` filters predictions to rows with col_prediction >= threshold and then applies the standard top-k cutoff. Hits are bounded by k, so metrics stay in [0, 1]. Also clarifies the `threshold` docstring on every metric that exposes the parameter so users can tell it is a score cutoff rather than a count of items. Adds a regression test covering three cases: 1. Threshold above all scores -> every ranking metric is 0. 2. Threshold below all scores -> by_threshold collapses to top_k. 3. Mid threshold -> all metrics stay inside [0, 1]. Fixes recommenders-team#2154 Refs recommenders-team#2140 * Rewrite by_threshold test with concrete correctness assertions * fix: change map metric Signed-off-by: ds-wook <leewook94@gmail.com> * Add support for compshare vms Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Correct shell commands Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Declare COMPSHARE_SPEC_FILE Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Copy repo files to the VM to avoid git clone failure Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Retry curl upon failure Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * fix(gpu): use imported cuda namespace for gpu counting Signed-off-by: Yinchaochen <lisumchen@gmail.com> * Update docs Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Configure Docker registry mirror for speedup Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Retry image build upon failure Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Correct syntax errors Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Add pip index arg Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Make scripts robuster Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Try DNS configs only, and remove P40 due to incompatibility with PyTorch Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Use map_at_k instead of map for ranking-metric reporting Issue recommenders-team#2309 points out that the dict returned by examples/06_benchmarks/benchmark_utils.py:ranking_metrics_python and :ranking_metrics_pyspark labels its first entry "MAP" but computes it with the Spark-style map() function, which normalizes by n_relevant rather than min(k, n_relevant). The other entries in the same dict are labeled "@k" and computed with the @k variants, so the first entry is inconsistent with its neighbours and can produce values that are mathematically valid for MAP but counter-intuitive when read alongside Precision@k / Recall@k / NDCG@k. Changes: * examples/06_benchmarks/benchmark_utils.py - swap map for map_at_k in both the Python and PySpark ranking-metrics helpers and rename the dict key "MAP" to "MAP@k" so the label matches the function used. * examples/06_benchmarks/movielens.ipynb - update the two source cells (the missing-row placeholder dict and the column-order list) that consume that dict so the benchmark table column header agrees with the upstream key. Cached cell outputs are left as-is; they will be regenerated on the next notebook run. * recommenders/evaluation/python_evaluation.py - cross-link the map() and map_at_k() docstrings so a reader landing on either function can see the normalizer difference and pick the right one. * recommenders/evaluation/spark_evaluation.py - same cross-link on SparkRankingEvaluation.map / .map_at_k. * tests/unit/recommenders/evaluation/test_python_evaluation.py - add test_python_map_vs_map_at_k that pins the invariant: map_at_k equals map when k >= n_relevant for every user (k=10 on the existing fixture) and strictly exceeds it when at least one user has more than k relevant items (k=5, where user 3 in the fixture has 10). * tests/test_groups.yml - register the new test in the pr_gate group. Notebook examples under examples/00_quick_start and examples/02_model_collaborative_filtering still import the bare map symbol; switching them is left to a follow-up because the tests/functional/examples/test_notebooks_*.py and tests/smoke/examples/test_notebooks_*.py expected values for the "map" key would need to be regenerated end-to-end. Refs recommenders-team#1702 recommenders-team#2004 Signed-off-by: Yinchao Chen <lisumchen@gmail.com> * test(gpu): shorten regression test name per review Rename test_get_number_gpus_falls_back_to_cuda_namespace_when_torch_is_missing to test_get_number_gpus_without_torch in test_gpu_utils.py and update its entry in tests/test_groups.yml. The shorter name still pairs the function under test with the scenario; the cuda-fallback detail is evident from the test body. Addresses review comment from @anargyri on recommenders-team#2314. Signed-off-by: Yinchao Chen <lisumchen@gmail.com> * refactor: modernize lightgbm utils Signed-off-by: ds-wook <leewook94@gmail.com> * Add support for Docker and PyPI mirrors Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Clean up code for retries and correct docker mirror url Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Update docs Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Correct docker build arg for pypi index url Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Combine test groups for gpu Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Fix asset URL in fm_deep_dive.ipynb path had `mains-team/resources` repeated muiltiple times this is corrected to value in https://github.com/recommenders-team/recommenders/blob/main/examples/00_quick_start/xdeepfm_criteo.ipynb * Install cuda driver from scratch Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Lock gpu version Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Update Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Remove install_container_toolkit.sh Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * refactor: migrate lightgcn pytorch Signed-off-by: ds-wook <leewook94@gmail.com> * fix: remove type_checking and change print to logging Signed-off-by: ds-wook <leewook94@gmail.com> * refactor: redesign architectural args Signed-off-by: ds-wook <leewook94@gmail.com> * fix: reorder logger Signed-off-by: ds-wook <leewook94@gmail.com> * Try CUDA 13.2.1 Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Add 2080 for use Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Use the latest cuda driver Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Increase notebook execution timeout Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Remove 2080 due to insufficient gpu memory for nightly tests Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Add support for http proxy for speed up Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Prepend "VM_" to env variables for cache Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Update map_at_k in notebooks * PR template typo * Remove Surprise and rerun benchmarks * Fix MLLib docs link * Fix docstring for MAP * Add support for https proxy Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Add more retry on failure Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Add support for installing gpu drivers for P40 Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Correct configure.sh Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Add retries for ssh key setup Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Set apt and uv to bypass SSL verification Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Update spec.json Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Remove http/https proxy because of no apparent gains on speed Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Revert Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Remove yq installation in Dockerfile Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Update https proxy config for apt Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Correct apt operations Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Remove apt conf Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Remove P40 Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Add more retries Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Move http(s) proxy config from config.json to CLI Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * fix: fixed lightgcn model and rerun notebook Signed-off-by: ds-wook <leewook94@gmail.com> * Add support to set vm requirements Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Add by_threshold ranking metrics regression test Signed-off-by: benben951 <jie13383393540@163.com> * Set VM stop schedule Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Explicitly specify secrets to use (recommenders-team#2328) Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Correct secrets in calling workflows Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Correct docker args Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Resolve key unbound error Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Correct empty stop time error Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Reduce spec retrying times Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Lock CUDA version to 580 on V100S Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Refactor duplicate code Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Add more GPU choices Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Correct delete_vm.sh Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Correct GPUType Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Try the spot chargetype Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Correct jq filter Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Alternate charge type for the same gputype Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Add more GPU options Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * fix: honor benchmark recommendation args Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> * fix: address benchmark review suggestions Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> * Resolve issue on empty secrets (recommenders-team#2334) * Use pull_request_target to pass secrets Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Correct paths Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Test before changing pull_request to pull_request_target Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Update docs Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * Use pull_request_target Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> --------- Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> * fix: set default timeout for dataset downloads Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> * Correct git refs and working dir (recommenders-team#2338) Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> --------- Signed-off-by: ds-wook <leewook94@gmail.com> Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com> Signed-off-by: Yinchaochen <lisumchen@gmail.com> Signed-off-by: Yinchao Chen <lisumchen@gmail.com> Signed-off-by: benben951 <jie13383393540@163.com> Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> Co-authored-by: ds-wook <leewook94@gmail.com> Co-authored-by: miguelgfierro <miguelgfierro@users.noreply.github.com> Co-authored-by: Miguel Fierro <3491412+miguelgfierro@users.noreply.github.com> Co-authored-by: Yinchaochen <lisumchen@gmail.com> Co-authored-by: Andreas Argyriou <anargyri@users.noreply.github.com> Co-authored-by: seanv507 <sean.violante@gmail.com> Co-authored-by: benben951 <jie13383393540@163.com> Co-authored-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>

Signed-off-by: ds-wook <leewook94@gmail.com>

…-gru-pytorch

ds-wook · 2026-06-09T09:51:13Z

@miguelgfierro I’ve addressed the requested changes. When you have a chance, could you please take another look? Thank you!

anargyri · 2026-06-13T22:00:47Z


 jobs:
  run-tests:
+    uses: ./.github/workflows/compshare-vm.yml


This file has no real changes. Could you unstage?

anargyri · 2026-06-13T22:01:03Z


 jobs:
  run-tests:
+    uses: ./.github/workflows/compshare-vm.yml


anargyri · 2026-06-13T22:01:30Z

This one too.

anargyri · 2026-06-15T12:52:36Z

+    "\n",
+    "This notebook trains a GRU \\[1\\] sequential recommender on a public Amazon Movies & TV review dataset. The GRU model encodes a user's history of (item, category) interactions with a Gated Recurrent Unit, then combines the resulting hidden state with the target item / category / user embeddings to predict whether the user will interact with the target next.\n",
+    "\n",
+    "The implementation is PyTorch-native; it does not depend on the legacy TF `BaseModel` / `SequentialBaseModel` hierarchy used by the other sequential models in this repository (see [`sequential_recsys_amazondataset.ipynb`](sequential_recsys_amazondataset.ipynb) for those). It mirrors the single-file standalone `nn.Module` convention introduced by the LightGCN PyTorch migration.\n"


I think you can remove the stuff about TensorFlow since it is not relevant for this notebook.

anargyri · 2026-06-15T19:43:08Z

-class GRUModel(SequentialBaseModel):
-    """GRU Model
+
+class GRUModel(nn.Module):


Doesn't it make sense to create something similar to SequentialBaseModel as in the existing code, since we are going to reuse it for other deeprec models too? This file is now longer and the same thing will happen with the other models.

ds-wook requested review from SimonYansenZhao, anargyri, gramhagen, loomlike, miguelgfierro and wav8k as code owners May 26, 2026 05:04

miguelgfierro reviewed May 29, 2026

View reviewed changes

miguelgfierro reviewed Jun 1, 2026

View reviewed changes

ds-wook and others added 14 commits June 2, 2026 16:56

refator: migration GRU with pytorch

e547e1f

Signed-off-by: ds-wook <leewook94@gmail.com>

Add by_threshold ranking metrics regression test

e1268c8

Signed-off-by: benben951 <jie13383393540@163.com> Signed-off-by: ds-wook <leewook94@gmail.com>

fix: honor benchmark recommendation args

7d7ba13

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> Signed-off-by: ds-wook <leewook94@gmail.com>

fix: address benchmark review suggestions

e45cd41

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> Signed-off-by: ds-wook <leewook94@gmail.com>

fix: set default timeout for dataset downloads

aa9179c

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> Signed-off-by: ds-wook <leewook94@gmail.com>

Correct git refs and working dir (recommenders-team#2338)

53974ef

Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com>

Correct working directory (recommenders-team#2340)

dc230ad

Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com> Signed-off-by: ds-wook <leewook94@gmail.com>

fix: change gru arch

7dfa993

Signed-off-by: ds-wook <leewook94@gmail.com>

feat: build gru notebook

7fc9f9f

Signed-off-by: ds-wook <leewook94@gmail.com>

rerun workflow

e859ad8

Signed-off-by: ds-wook <leewook94@gmail.com>

fix: closed logloss gap

826b52f

Signed-off-by: ds-wook <leewook94@gmail.com>

ds-wook force-pushed the refactor/migrate-gru-pytorch branch from aff1ed6 to 826b52f Compare June 4, 2026 09:45

Merge remote-tracking branch 'upstream/staging' into refactor/migrate…

1335ab7

…-gru-pytorch

anargyri reviewed Jun 13, 2026

View reviewed changes

Comment thread recommenders/models/deeprec/io/torch/__init__.py

anargyri Jun 13, 2026

Copy link
Copy Markdown

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one too.

anargyri reviewed Jun 15, 2026

View reviewed changes

Metric	TF GRU (Legacy)	PyTorch GRU	Δ (PyTorch − TF)
AUC	0.6791	0.6581	-0.021
group_AUC	0.6785	0.6487	-0.030
logloss	0.9397	0.6062	-0.334 ✓
mean_mrr	0.4596	0.4144	-0.045
ndcg@2	0.3640	0.3068	-0.057
ndcg@4	0.4649	0.4205	-0.044
ndcg@6	0.5221	0.4802	-0.042

Stage	time_diff / time_from_first_action / time_to_now	Cost
parser_one_line	Computed (list comprehension + np.log)	CPU microseconds
_convert_data	Written into the batch dict as NumPy arrays	Small RAM only
_to_device	Not copied. Only labels, users, items, cates, item_history, item_cate_history, and mask are moved to the device	0 (no GPU transfer)
GRUModel.forward	Not referenced (grep "time_" gru.py → no matches)	0

		@@ -0,0 +1,379 @@
		# Copyright (c) Recommenders contributors.
		# Licensed under the MIT License.

Conversation

ds-wook commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issues

References

Checklist:

Uh oh!

SimonYansenZhao commented May 26, 2026

Uh oh!

SimonYansenZhao commented May 26, 2026

Uh oh!

SimonYansenZhao commented May 26, 2026

Uh oh!

ds-wook commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

miguelgfierro left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Metric Parity Check: TF GRU vs PyTorch GRU

Setup

Dataset

Hyperparameters (Identical Across Both Runs)

Training Configuration

Results (Final Evaluation on test_data)

3. Training Trajectories Differ

4. No Evidence of a Port-Time Bug

A. FCN Weight Initialization

B. GRU Cell Formulation Differences

C. User Embedding Usage

D. Random Seed Coverage

Conclusion

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

On (1) — Comment Added

On (2) — Confirmed No Silent Drop and No GPU Transfer

On the Gating-Flag Suggestion

Uh oh!

review-notebook-app Bot commented May 30, 2026

Uh oh!

ds-wook commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ds-wook commented Jun 9, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

anargyri Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

ds-wook commented May 26, 2026 •

edited

Loading

ds-wook commented May 26, 2026 •

edited

Loading

Results (Final Evaluation on `test_data`)

ds-wook commented Jun 1, 2026 •

edited

Loading

anargyri Jun 15, 2026 •

edited

Loading