Prediction on new dataset

Hello,

I trained a model on experimental data (split into train, valid, and test sets) and want to use it for prediction on an independent library. For prediction, I used the example script and replaced the test data with the new library. What is the best practice in this case regarding the qsar_vocab? In the example, the qsar_vocab seems to be built from the train and valid data:

```
# Language-model data bunch; its vocab is built from the train and valid data
qsar_vocab = TextLMDataBunch.from_df(path, train_aug, valid_aug, bs=bs, tokenizer=tok,
                                     chunksize=50000, text_cols=0, label_cols=1,
                                     max_vocab=60000, include_bos=False)

# Classifier data bunch used for testing; it reuses qsar_vocab.vocab
test_data_clas = TextClasDataBunch.from_df(path, train, test, bs=bs, tokenizer=tok,
                                           chunksize=50000, text_cols='smiles',
                                           label_cols='label', vocab=qsar_vocab.vocab,
                                           max_vocab=60000, include_bos=False)
```

If I now use the new library as the test data, does the qsar_vocab, which is built from the experimental library used for training and validation, influence the results? Also, why does `test_data_clas` need a reference to the `train` data?
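
To make the question concrete, here is a minimal pure-Python sketch of how I understand a fixed vocabulary to behave. It is not the library's implementation: the helper names (`build_vocab`, `numericalize`, `unknown_fraction`), the `xxunk` placeholder, and the toy token lists are all assumptions for illustration. The idea is that token ids come only from the training corpus, so any token that appears only in the new library collapses to the unknown id.

```python
from collections import Counter

UNK = "xxunk"  # assumed name for the unknown-token placeholder


def build_vocab(tokenized_smiles, max_vocab=60000):
    """Build a token -> id map from the training/validation corpus only."""
    counts = Counter(tok for toks in tokenized_smiles for tok in toks if tok != UNK)
    # Id 0 is reserved for the unknown token; the rest are ordered by frequency.
    itos = [UNK] + [tok for tok, _ in counts.most_common(max_vocab - 1)]
    return {tok: i for i, tok in enumerate(itos)}


def numericalize(tokens, stoi):
    """Map tokens to ids; tokens not seen during training fall back to UNK."""
    return [stoi.get(tok, stoi[UNK]) for tok in tokens]


def unknown_fraction(tokenized_smiles, stoi):
    """Share of tokens in a dataset that the vocabulary does not cover."""
    total = sum(len(toks) for toks in tokenized_smiles)
    unknown = sum(tok not in stoi for toks in tokenized_smiles for tok in toks)
    return unknown / total if total else 0.0


# Toy, hand-tokenized SMILES (hypothetical data)
train_tokens = [["C", "C", "O"], ["c", "1", "c", "c", "c", "c", "c", "1"]]
new_library_tokens = [["C", "Br"], ["C", "O"]]

stoi = build_vocab(train_tokens)
ids = numericalize(new_library_tokens[0], stoi)      # "Br" is not in the training vocab
frac = unknown_fraction(new_library_tokens, stoi)    # coverage check for the new library
print(ids, frac)
```

If this picture is right, a check like `unknown_fraction` on the new library would tell me how much of it falls outside the training vocabulary before I trust the predictions.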

Prediction on new dataset #8

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Prediction on new dataset #8

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions