NotFittedError: idf vector is not fitted #7

@Shafizadegan

I ran the code you provided in a Colab environment, but it fails with the following error:

Sequential Document-cluster association is initialized...
Cluster Alignment Procedure is initialized...
Topic Representation is initialized...
/content/ANTM/antm/ctfidf.py:34: RuntimeWarning: divide by zero encountered in divide

NotFittedError Traceback (most recent call last)
in <cell line: 25>()
23
24 #learn the model and save it
---> 25 topics_per_period=model.fit(save=True)
26 #output is a list of timeframes including all the topics associated with that period

6 frames
/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py in check_is_fitted(estimator, attributes, msg, all_or_any)
1755
1756 if not _is_fitted(estimator, attributes, all_or_any):
-> 1757 raise NotFittedError(msg % {"name": type(estimator).__name__})
1758
1759

NotFittedError: idf vector is not fitted

The code is:

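from antm import ANTM
import pandas as pd

# load the data and rename the text and year columns to "content" and "time"
df = pd.read_parquet("./data/dblpFullSchema_2000_2020_extract_big_data_2K.parquet")
df = df[["abstract", "year"]].rename(columns={"abstract": "content", "year": "time"})
# drop rows with missing values, sort chronologically, and rebuild the index
df = df.dropna().sort_values("time").reset_index(drop=True).reset_index()

# sliding-window settings
window_size = 6
overlap = 2

model = ANTM(df, overlap, window_size, umap_n_neighbors=10, partioned_clusttering_size=5, mode="data2vec", num_words=10, path="./saved_data")

#learn the model and save it
topics_per_period = model.fit(save=True)
#output is a list of timeframes including all the topics associated with that period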
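
To narrow this down, a quick pre-flight check on the input might help. The following is only a sketch under assumptions I have not verified against the ANTM source: that consecutive windows advance by `window_size - overlap`, and that blank abstracts or empty time windows could contribute to the divide-by-zero warning. `window_bounds` and `preflight` are hypothetical helper names, not part of ANTM.

```python
def window_bounds(times, window_size, overlap):
    """Return inclusive (start, end) pairs covering min(times)..max(times).

    Assumption: each window starts window_size - overlap after the previous one.
    This may not match how ANTM slices time internally.
    """
    step = window_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than window_size")
    first, last = min(times), max(times)
    bounds = []
    start = first
    while True:
        end = start + window_size - 1
        bounds.append((start, end))
        if end >= last:
            break
        start += step
    return bounds


def preflight(rows, window_size, overlap):
    """rows: iterable of (content, time) pairs.

    Returns (number of blank documents, {window: document count}).
    """
    rows = list(rows)
    # dropna() keeps empty or whitespace-only strings, so count them here
    blank = sum(1 for content, _ in rows if not str(content).strip())
    times = [t for _, t in rows]
    per_window = {
        (start, end): sum(1 for t in times if start <= t <= end)
        for start, end in window_bounds(times, window_size, overlap)
    }
    return blank, per_window


if __name__ == "__main__":
    # made-up toy rows, only to show the call shape
    toy = [("graph mining", 2000), ("   ", 2003), ("streams", 2010), ("topics", 2020)]
    blank, per_window = preflight(toy, window_size=6, overlap=2)
    print("blank documents:", blank)
    for window, count in per_window.items():
        print(window, count)
```

With the DataFrame above, the call would be `preflight(zip(df["content"], df["time"]), window_size, overlap)`. A nonzero blank count or a window with very few documents would be a place to look first, though this does not confirm the cause of the error.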
