Description
I ran the code you provided in a Colab environment, but it fails with the following error:
Sequential Document-cluster association is initialized...
Cluster Alignment Procedure is initialized...
Topic Representation is initialized...
/content/ANTM/antm/ctfidf.py:34: RuntimeWarning: divide by zero encountered in divide
NotFittedError Traceback (most recent call last)
in <cell line: 25>()
23
24 #learn the model and save it
---> 25 topics_per_period=model.fit(save=True)
26 #output is a list of timeframes including all the topics associated with that period
6 frames
/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py in check_is_fitted(estimator, attributes, msg, all_or_any)
1755
1756 if not _is_fitted(estimator, attributes, all_or_any):
-> 1757 raise NotFittedError(msg % {"name": type(estimator).__name__})
1758
1759
NotFittedError: idf vector is not fitted
The code is:
from antm import ANTM
import pandas as pd
df=pd.read_parquet("./data/dblpFullSchema_2000_2020_extract_big_data_2K.parquet")
df=df[["abstract","year"]].rename(columns={"abstract":"content","year":"time"})
df=df.dropna().sort_values("time").reset_index(drop=True).reset_index()
window_size = 6
overlap = 2
model=ANTM(df,overlap,window_size,umap_n_neighbors=10, partioned_clusttering_size=5,mode="data2vec",num_words=10,path="./saved_data")
topics_per_period=model.fit(save=True)
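A possible diagnostic, offered as a guess rather than a confirmed cause: the divide-by-zero warning could come from a time window that contains very few documents. The sketch below counts documents per sliding window. The helper `window_counts` is hypothetical and not part of ANTM, and it assumes that each window spans `window_size` consecutive time units and advances by `window_size - overlap`; ANTM's own segmentation may differ.

```python
from collections import Counter


def window_counts(times, window_size, overlap):
    """Count documents in each sliding window of time values.

    Assumption: windows cover `window_size` consecutive time units and
    advance by `window_size - overlap`. This is an illustrative helper,
    not ANTM's actual segmentation logic.
    """
    step = window_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than window_size")

    per_time = Counter(times)  # number of documents per time value
    start, last = min(per_time), max(per_time)

    counts = []
    while start <= last:
        end = start + window_size  # exclusive upper bound of the window
        n = sum(c for t, c in per_time.items() if start <= t < end)
        counts.append((start, end - 1, n))  # (first time, last time, documents)
        start += step
    return counts


# Hypothetical usage with the DataFrame from the report above:
# for first, last, n in window_counts(df["time"].tolist(), window_size, overlap):
#     print(first, last, n)
```

Windows with a count of zero or only a handful of documents would be the first place to look. If every window is well populated, the cause is more likely elsewhere, for example in the installed scikit-learn version.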