Skip to content

Conversation

27pchrisl
Copy link

This PR adds support for serializing the config of the OnlineCountVectorizer by explicitly stating all the constructor properties. Based on the test, it looks like no other changes were needed to support the loading of the vectorizer.

Fixes #1431

@MaartenGr
Copy link
Owner

Thanks for the PR! I am not entirely sure if adding the parameters manually is the right way to go here. What if the parameters of CountVectorizer get updated in a new release of scikit-learn? That would break this functionality. Instead, might it not be possible to update the save_ctfidf_config function to account for the missing parameters by looking at the internal params of the function, either by adding a function to save them or update the get_params function?

@27pchrisl
Copy link
Author

Hi @MaartenGr, no it is certainly not ideal! This was previously mentioned in scikit-learn/scikit-learn#8822 and discussed further in scikit-learn/scikit-learn#13555 - it seems for now this is the way others handle this.

I could add a test case that compared the params of CountVectorizer to a list in the test case, so if the list were changed a test would fail?

@MaartenGr
Copy link
Owner

I could add a test case that compared the params of CountVectorizer to a list in the test case, so if the list were changed a test would fail?

Although that would help detect any issues in newer versions, it will still break the code requiring an immediate update. It is a nice solution if there is not alternative but what about the following I mentioned in the previous message:

Instead, might it not be possible to update the save_ctfidf_config function to account for the missing parameters by looking at the internal params of the function, either by adding a function to save them or update the get_params function?

This could be implemented as shown in the below example: https://stackoverflow.com/a/56491394/10532563

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Problem with saving the model

2 participants