Cold users on test set #23

manumacc · 2021-05-28T21:27:56Z

From the forum:

Will "new" users appear in the test set? That is, do the users in the full training set cover the test set, or will new users appear in the test set?
We want to create user embeddings, but there seems to be no GPU available in the test environment, so we cannot efficiently run inference to get embeddings for those new users (it would exceed the time limit). Or will there be a GPU in the test environment?

Reply:

Hi, by design, in the original dataset, all engaging users (those for whom you are making predictions) in the test set should also be in the set of engaging users from the training set.
However, since we are scrubbing the dataset and removing rows corresponding to deleted tweets, a small fraction of users (currently less than 1%) have disappeared from the training set but remain in the test set, because the tweets they engaged with were removed.
As for the engaged-with users, new users can certainly appear in the test or validation sets.

Note that XGBoost handles missing values natively: at each split it learns a default direction for entries that are missing. So we should be fine as long as the number of cold users is relatively low.
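
A minimal sketch of how this could look in practice, assuming per-user features are precomputed from the training set and stored in a dictionary. The names `train_user_features`, `build_rows`, and `cold_user_fraction`, as well as the feature values, are hypothetical and only for illustration. Cold users get all-NaN feature rows, and the cold fraction is measured so we can check that it stays small.

```python
import math

# Hypothetical per-user features computed from the training set
# (illustrative values, not taken from the challenge data).
train_user_features = {
    "u1": [0.12, 3.0],
    "u2": [0.40, 1.0],
    "u3": [0.05, 7.0],
}
N_FEATURES = 2


def build_rows(user_ids, features, n_features):
    """Return one feature row per user; cold users get an all-NaN row."""
    missing_row = [math.nan] * n_features
    return [list(features.get(uid, missing_row)) for uid in user_ids]


def cold_user_fraction(user_ids, features):
    """Fraction of distinct users that have no training-set features."""
    distinct = set(user_ids)
    if not distinct:
        return 0.0
    cold = {uid for uid in distinct if uid not in features}
    return len(cold) / len(distinct)


# "u4" never appeared in the training set, so it is a cold user.
test_user_ids = ["u1", "u4", "u2", "u1"]
rows = build_rows(test_user_ids, train_user_features, N_FEATURES)
frac = cold_user_fraction(test_user_ids, train_user_features)
```

The resulting rows can then be handed to the model as they are. To my knowledge, XGBoost treats NaN as the missing-value marker by default, so no separate imputation step should be needed, but this is worth verifying against the version in use.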
