Problem with saving the model #1431

@donottakemyusername

Hi, I am using the partial_fit function to perform incremental learning with BERTopic. When I tried to save the BERTopic model using safetensors, I got the following error: KeyError: 'tokenizer'. The error is raised in bertopic/_save_utils.py, where the function that recreates the CountVectorizer tries to delete parameters from cv_params that don't actually exist.
I saved the model with:
model.save('some_directory', serialization="safetensors", save_ctfidf=True)
and got the following error:
/python3.9/site-packages/bertopic/_save_utils.py in save_ctfidf_config(model, path)
293 # Recreate CountVectorizer
294 cv_params = model.vectorizer_model.get_params()
--> 295 del cv_params["tokenizer"], cv_params["preprocessor"], cv_params["dtype"]
296 if not isinstance(cv_params["analyzer"], str):
297 del cv_params["analyzer"]

KeyError: 'tokenizer'
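The failing line uses del on keys that get_params() did not return, and del on a missing dictionary key always raises KeyError. Below is a minimal sketch of a more tolerant variant, assuming the goal of that line is only to drop those keys when they are present: dict.pop(key, None) removes a key if it exists and does nothing otherwise. The helper drop_unserializable_params is hypothetical and is not part of BERTopic.

```python
def drop_unserializable_params(cv_params: dict) -> dict:
    """Hypothetical helper (not part of BERTopic): drop the keys that the
    traceback deletes, but only if they are present."""
    params = dict(cv_params)  # copy so the caller's dict is left untouched
    for key in ("tokenizer", "preprocessor", "dtype"):
        params.pop(key, None)  # pop with a default never raises KeyError
    # Same condition as the traceback: drop "analyzer" only if it is present
    # and not a string (e.g. a custom callable).
    if "analyzer" in params and not isinstance(params["analyzer"], str):
        del params["analyzer"]
    return params


# Parameters reported in this issue for model.vectorizer_model.get_params().
reported = {"decay": 0.05, "delete_min_df": None}

# What the failing line does: del on a missing key raises KeyError.
try:
    p = dict(reported)
    del p["tokenizer"], p["preprocessor"], p["dtype"]
except KeyError as err:
    print("KeyError:", err)  # prints: KeyError: 'tokenizer'

print(drop_unserializable_params(reported))  # prints: {'decay': 0.05, 'delete_min_df': None}
```

This only avoids the KeyError. Whether the remaining parameters are enough to rebuild the vectorizer when the model is loaded again is a separate question that this sketch does not address.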

I ran model.vectorizer_model.get_params(), and it returns only two parameters: {'decay': 0.05, 'delete_min_df': None}.
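The two parameters returned above match the constructor arguments that BERTopic's OnlineCountVectorizer declares explicitly (decay and delete_min_df), which suggests the model was built with that vectorizer for partial_fit. scikit-learn's get_params() reports only the arguments named explicitly in a class's __init__ and skips **kwargs, so CountVectorizer options forwarded through **kwargs (such as tokenizer, preprocessor and dtype) are not returned. The sketch below mimics that behavior with the standard library only; BaseVectorizer, OnlineLikeVectorizer and get_param_names are hypothetical stand-ins, not the real scikit-learn or BERTopic classes.

```python
import inspect


def get_param_names(cls):
    """Simplified stand-in for scikit-learn's parameter discovery:
    collect the named arguments of __init__, skipping self and **kwargs."""
    sig = inspect.signature(cls.__init__)
    return sorted(
        name
        for name, p in sig.parameters.items()
        if name != "self" and p.kind != p.VAR_KEYWORD
    )


class BaseVectorizer:
    # Hypothetical stand-in for CountVectorizer with a few of its options.
    def __init__(self, tokenizer=None, preprocessor=None, dtype=None, analyzer="word"):
        self.tokenizer = tokenizer
        self.preprocessor = preprocessor
        self.dtype = dtype
        self.analyzer = analyzer

    def get_params(self):
        # Report only the names discovered from __init__'s signature.
        return {name: getattr(self, name) for name in get_param_names(type(self))}


class OnlineLikeVectorizer(BaseVectorizer):
    # Hypothetical subclass shaped like the reported vectorizer: two named
    # arguments, everything else forwarded to the parent through **kwargs.
    def __init__(self, decay=None, delete_min_df=None, **kwargs):
        self.decay = decay
        self.delete_min_df = delete_min_df
        super().__init__(**kwargs)


print(BaseVectorizer().get_params())
# prints: {'analyzer': 'word', 'dtype': None, 'preprocessor': None, 'tokenizer': None}
print(OnlineLikeVectorizer(decay=0.05).get_params())
# prints: {'decay': 0.05, 'delete_min_df': None}
```

The subclass instance still stores tokenizer and the other options as attributes; they are simply not reported by get_params(), so a dictionary built from it has no "tokenizer" key to delete.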
Is there anything I've done wrong? Thank you!
