Open
Description
This is great! How can one incorporate dimensionality reduction into the pipeline? For substantive and speed reasons, I'd like to exclude the most and least common words:
corpus = st.CorpusFromPandas(df,
category_col='country',
text_col='text',
nlp=nlp,
# can we discard 1st and 99th percentile of words here?
).build()
Metadata
Metadata
Assignees
Labels
No labels