Pretrained embeddings underperform raw XGBoost baseline in fraud detection evaluation #7

Description

@luolin0715

Hi NVIDIA team,

Thank you for releasing this transaction foundation model example. I am trying to reproduce and understand the downstream fraud detection results using the pretrained checkpoint.

I followed the notebook flow described in the repository:

  • 01_dataset_baseline.ipynb: raw-feature XGBoost baseline
  • 04_inference_embedding_extraction.ipynb: pretrained model inference and embedding extraction
  • 05_xgboost_fraud_detection.ipynb: downstream XGBoost evaluation using raw features, embeddings, and combined features

The README says that notebooks 04 and 05 use the pretrained checkpoint from models/decoder-foundation-model/, and that notebook 04 extracts 512-dimensional embeddings via last-token pooling.
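For reference, last-token pooling is usually understood as taking the hidden state of the final non-padding token of each sequence. The following is a minimal NumPy sketch of that idea, not the repository's implementation: the function name `last_token_pool`, the `(batch, seq_len, dim)` layout, and the 0/1 attention mask are all assumptions made for illustration.

```python
import numpy as np

def last_token_pool(hidden_states, attention_mask):
    """Return the hidden state of the last real (non-padding) token per sequence.

    hidden_states:  array of shape (batch, seq_len, dim)   [assumed layout]
    attention_mask: array of shape (batch, seq_len), 1 = real token, 0 = padding
    """
    hidden_states = np.asarray(hidden_states)
    mask = np.asarray(attention_mask).astype(bool)
    seq_len = mask.shape[1]
    # argmax on the reversed mask finds the first real token from the right,
    # so this handles both left-padded and right-padded batches.
    last_idx = seq_len - 1 - np.argmax(mask[:, ::-1], axis=1)
    return hidden_states[np.arange(mask.shape[0]), last_idx]

# Toy batch: 2 sequences, 4 positions, 3-dimensional hidden states.
hidden = np.arange(2 * 4 * 3).reshape(2, 4, 3)
mask = [[1, 1, 1, 0],   # last real token at position 2
        [1, 1, 0, 0]]   # last real token at position 1
pooled = last_token_pool(hidden, mask)
```

If the extraction code instead reads position `-1` on a right-padded batch, it would pool padding tokens, which is one possible source of weak embeddings worth ruling out.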

However, in my run, the pretrained embeddings alone perform much worse than the raw-feature baseline on both ROC-AUC and average precision:

| Model | Feature Dim | ROC-AUC | Avg Precision (AP) |
|---|---|---|---|
| Raw Features baseline | 13d | 0.9885 | 0.1238 |
| 64d PCA Embeddings | 64d | 0.8689 | 0.0139 |
| Combined raw + embedding features | 77d | 0.9872 | 0.1569 |
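To make the two metric columns concrete, here is a dependency-free sketch of how ROC-AUC and average precision can be computed from labels and scores. It is only an illustration on made-up toy data: the helper names are hypothetical, and the AP helper assumes no tied scores. In practice `sklearn.metrics.roc_auc_score` and `sklearn.metrics.average_precision_score` are the usual choices.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Mean of the precision measured at the rank of each positive example.

    Assumes no tied scores (ties would need threshold grouping).
    """
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Made-up toy data: 2 positives out of 10 samples (prevalence 0.2).
y = [0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
s = [0.10, 0.20, 0.90, 0.30, 0.25, 0.05, 0.15, 0.12, 0.08, 0.02]

auc = roc_auc(y, s)
ap = average_precision(y, s)
prevalence = sum(y) / len(y)  # AP of a random ranker is roughly this value
```

Because a random ranker's AP is approximately the positive rate, the AP column is best read relative to the fraud rate of the test split rather than on an absolute scale.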

For comparison, notebook 01 reports approximately:

  • ROC-AUC: 0.98
  • AUPRC/AP: 0.14

My questions are:

  1. Is it expected that the pretrained embeddings alone can underperform the raw XGBoost baseline so significantly?
  2. Are the pretrained embeddings intended mainly as complementary features rather than a replacement for raw tabular features?
  3. Should the embedding evaluation in notebook 05 use the full 512-dimensional embeddings instead of 64d PCA embeddings?
  4. Are there any recommended preprocessing, pooling, normalization, or XGBoost hyperparameter settings for evaluating the pretrained embeddings?
  5. Could this gap indicate that I missed a step, such as git lfs pull, using the wrong checkpoint, mismatched train/test split, or different notebook execution order?
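On questions 3 and 4, one quick diagnostic is how much of the embedding variance the 64 PCA components retain. The sketch below is an assumption-laden illustration, not code from notebook 05: `pca_retained_variance` is a hypothetical helper, and the random matrix merely stands in for the real 512-dimensional embeddings.

```python
import numpy as np

def pca_retained_variance(embeddings, n_components=64):
    """Fraction of total variance kept by the top `n_components` principal axes."""
    X = np.asarray(embeddings, dtype=np.float64)
    X = X - X.mean(axis=0)                    # PCA operates on centered data
    s = np.linalg.svd(X, compute_uv=False)    # singular values, descending
    var = s ** 2                              # proportional to explained variance
    return float(var[:n_components].sum() / var.sum())

# Stand-in data only: replace with the real (n_samples, 512) embedding matrix.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(1000, 512))
ratio = pca_retained_variance(fake_embeddings, n_components=64)
print(f"variance retained by 64 components: {ratio:.3f}")
```

A low retained fraction on the real embeddings would suggest that the 64d projection discards signal, in which case evaluating the full 512 dimensions would be the fairer comparison.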

Any guidance on the expected metrics for the pretrained checkpoint would be very helpful.
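On the `git lfs pull` possibility in question 5: files that were never fetched remain small text pointers beginning with the Git LFS spec line. A minimal standard-library sketch for spotting them follows. The helper names are hypothetical, and the directory is the one named in the README; whether the checkpoint is actually stored via LFS is an assumption here.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path):
    """True if the file is an unfetched Git LFS pointer rather than real data."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX

def find_lfs_pointers(root):
    """List all files under `root` that are still LFS pointers."""
    return sorted(
        str(p) for p in Path(root).rglob("*")
        if p.is_file() and is_lfs_pointer(p)
    )

if __name__ == "__main__":
    ckpt_dir = Path("models/decoder-foundation-model")  # path from the README
    if ckpt_dir.is_dir():
        for pointer in find_lfs_pointers(ckpt_dir):
            print("still an LFS pointer:", pointer)
```

An empty result means that no pointer files remain under that directory; it does not by itself confirm that the loaded checkpoint is the intended one.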

Thanks!
