Hi, thanks for maintaining this repo. I'm running the HSTU implementation here on MovieLens-1M and getting numbers substantially below those Meta reports in the original HSTU paper (NDCG@10 ≈ 0.1720). My setup follows the paper's protocol: leave-one-out split, multi-epoch full-shuffle training, and full-vocabulary ranking at evaluation. I tried both full-vocabulary cross-entropy and sampled softmax (128 negatives). CE gave NDCG@10 = 0.1140 and SS gave 0.0948. Since the README cites the Meta paper as the source of this implementation, I wanted to check with you before drawing any conclusions. Could you clarify whether the HSTU here is intended (or expected) to reproduce Meta's MovieLens results, or whether it has only been validated on the Amazon datasets shown in the README benchmarks?
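To make sure we're comparing the same metric, here is a minimal sketch of how I understand NDCG@10 under leave-one-out with full-vocabulary ranking. It assumes one held-out item per user, ranks it against every item in the vocabulary, and breaks ties optimistically (only strictly higher scores push the target down). The optional `exclude` mask, for dropping items already in the user's history, is a protocol choice that differs between codebases and can affect the reported number. The function name and the mask are my own illustration, not this repo's evaluation code.

```python
import numpy as np


def ndcg_at_k_leave_one_out(scores, target, k=10, exclude=None):
    """NDCG@k with a single relevant (held-out) item per user.

    scores:  (num_users, num_items) model scores over the full vocabulary.
    target:  (num_users,) index of each user's held-out item.
    exclude: optional (num_users, num_items) boolean mask of items to drop
             from the ranking (hypothetical option, e.g. already-seen items).
    """
    scores = np.asarray(scores, dtype=np.float64).copy()
    target = np.asarray(target)
    rows = np.arange(scores.shape[0])

    if exclude is not None:
        exclude = np.asarray(exclude, dtype=bool).copy()
        exclude[rows, target] = False  # never mask the held-out item itself
        scores[exclude] = -np.inf

    target_scores = scores[rows, target]
    # 0-based rank = number of items scored strictly higher than the target
    ranks = (scores > target_scores[:, None]).sum(axis=1)
    # With one relevant item, ideal DCG is 1, so NDCG = 1 / log2(rank + 2)
    gains = np.where(ranks < k, 1.0 / np.log2(ranks + 2), 0.0)
    return float(gains.mean())
```

Because only one item is relevant per user, the ideal DCG is 1 and NDCG@k reduces to `1 / log2(rank + 2)` when the held-out item lands in the top k, and 0 otherwise.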