Prediction on new dataset #8

@georgkempf

Hello,

I trained a model on experimental data (split into train, valid, and test sets) and want to use it for prediction on an independent library. For prediction, I used the example script and replaced the test data with the new library. I was wondering what the best practice is in this case regarding the qsar_vocab. In the example, the qsar_vocab seems to be built from the train and valid data:

    qsar_vocab = TextLMDataBunch.from_df(path, train_aug, valid_aug, bs=bs, tokenizer=tok,
                                         chunksize=50000, text_cols=0, label_cols=1,
                                         max_vocab=60000, include_bos=False)

    test_data_clas = TextClasDataBunch.from_df(path, train, test, bs=bs, tokenizer=tok,
                                               chunksize=50000, text_cols='smiles', label_cols='label',
                                               vocab=qsar_vocab.vocab, max_vocab=60000,
                                               include_bos=False)

When I now use the new library as test data, does the qsar_vocab, which comes from the experimental library used for training and validation, influence the results? And why does test_data_clas need a reference to the training data?
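
To make the concern concrete, here is a minimal sketch of how I picture a vocabulary that was fixed at training time being applied to new data. It is plain Python, not the fastai code path: `build_vocab`, `numericalize`, the character-level tokenization, and the `UNK` placeholder are all hypothetical names and simplifications chosen for illustration, and I have not verified that the library behaves the same way.

```python
from collections import Counter

# Placeholder for "token not in the vocabulary"; hypothetical, for illustration only.
UNK = "xxunk"


def build_vocab(token_lists, max_vocab=60000):
    """Build an index-to-token list from the training tokens only."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    # Index 0 is reserved for the unknown token.
    return [UNK] + [tok for tok, _ in counts.most_common(max_vocab - 1)]


def numericalize(tokens, itos):
    """Map tokens to ids; anything unseen during training falls back to id 0."""
    stoi = {tok: i for i, tok in enumerate(itos)}
    return [stoi.get(tok, 0) for tok in tokens]


# Character-level tokenization is a simplification of a real SMILES tokenizer.
train_tokens = [list("CCO"), list("CCN")]
itos = build_vocab(train_tokens)

# "S" never occurs in the training data, so it maps to the unknown id.
new_ids = numericalize(list("CCS"), itos)
print(itos, new_ids)
```

If this picture is roughly right, tokens that occur only in the new library would all collapse to the same unknown id, which is the effect I am asking about.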
