In `vectorizer.fit_transform()`, setting `tf_type="log"` raises `UFuncTypeError: Cannot cast ufunc 'add' output from dtype('float64') to dtype('int32') with casting rule 'same_kind'`

### steps to reproduce


1. Read a text file.

2. Set the following parameter values, to be tried one by one:
   ```python
   tf_type = ["linear", "sqrt", "log", "binary"]
   idf_type = ["standard", "smooth", "bm25"]
   dl_type = ["linear", "sqrt", "log"]
   norm = ["l1", "l2"]
   models = ["lsa", "lda", "nmf"]
   ```

3. Iterate over the values of all 5 parameters with nested loops and compute `doc_term_matrix`, i.e.:
   ```python
   import textacy.vsm

   # spacy_gram: iterable of spaCy Docs built from the text file (not shown here)
   for t in tf_type:
       for i in idf_type:
           for d in dl_type:
               for n in norm:
                   for mo in models:
                       vectorizer = textacy.vsm.Vectorizer(
                           tf_type=t, apply_idf=True, idf_type=i, dl_type=d,
                           norm=n, min_df=2, max_df=0.95)
                       doc_term_matrix = vectorizer.fit_transform(
                           (doc._.to_terms_list(ngrams=3, entities=True, as_strings=True)
                            for doc in spacy_gram))
   ```

4. When `tf_type="log"`, we receive the error above.
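The same error can likely be reproduced with NumPy alone by mimicking the two lines of `_reweight_values` quoted under "possible solution?". This is a minimal sketch, assuming the matrix's `.data` array holds integer counts with dtype `int32` (as the error message suggests); the count values are made up for illustration.

```python
import numpy as np

# Hypothetical term counts standing in for doc_term_matrix.data (int32, as in the error)
data = np.array([1, 2, 3, 10], dtype=np.int32)

# The log step as quoted from _reweight_values: casting="unsafe" lets the float64
# result be written back into the int32 array, truncating it toward zero
np.log(data, data, casting="unsafe")
print(data)  # [0 0 1 2]

raised = False
try:
    # In-place add uses the default casting="same_kind", and float64 -> int32 is not same_kind
    data += 1.0
except TypeError as err:  # NumPy's UFuncTypeError is a subclass of TypeError
    raised = True
    print(err)
```

Note that the printed array already shows the log values truncated to integers before the failing `+=` line runs.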


### expected vs. actual behavior
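
Expected: `fit_transform()` returns a document-term matrix for every combination of parameter values.

Actual: whenever `tf_type="log"`, `fit_transform()` raises the `UFuncTypeError` shown in the title.
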
### possible solution?
I saw that `vectorizer.fit_transform()` goes through the `_reweight_values(self, doc_term_matrix)` method. When `tf_type="log"`, it runs `np.log(doc_term_matrix.data, doc_term_matrix.data, casting="unsafe")`. Even though the casting there is declared as `"unsafe"`, the error is raised on the next line, `doc_term_matrix.data += 1.0`: the in-place add uses the default `same_kind` casting, which does not allow the `float64` result to be written back into the `int32` array. I think it should instead be written as `doc_term_matrix.data = doc_term_matrix.data + 1.0`, following https://stackoverflow.com/questions/38673531/multiply-numpy-int-and-float-arrays-cannot-cast-ufunc-multiply-output-from-dtyp
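A minimal NumPy sketch comparing the suggestion above with an alternative, assuming `doc_term_matrix.data` holds `int32` counts. Rebinding with `+ 1.0` avoids the error, but because `np.log(..., casting="unsafe")` writes into the integer array, the log values are already truncated by then. Casting the counts to `float64` before taking the log keeps `1 + log(tf)` exact. Both function names are hypothetical, and the float-first version is only one possible fix, not necessarily how textacy resolves it.

```python
import numpy as np

def log_tf_suggested(data):
    """The suggested fix: keep the unsafe in-place log, then rebind instead of +=."""
    data = data.copy()
    np.log(data, data, casting="unsafe")  # float64 result truncated into the int array
    return data + 1.0                     # new float64 array, so no casting error

def log_tf_float_first(data):
    """Alternative: cast counts to float64 first, so the log is never truncated."""
    data = data.astype(np.float64)
    np.log(data, data)  # in place, float64 -> float64
    data += 1.0         # same dtype, so same_kind casting is satisfied
    return data

counts = np.array([1, 2, 3, 10], dtype=np.int32)
print(log_tf_suggested(counts))    # [1. 1. 2. 3.]  (counts 1 and 2 collapse to the same value)
print(log_tf_float_first(counts))  # 1 + ln(count), no truncation
```

For a `scipy.sparse` matrix, the analogous float-first step would be `doc_term_matrix.astype(np.float64)` before reweighting.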

### context

I am trying to cluster my dataset by similar intent, and for that I need the document-term matrix. I am using a brute-force approach: tweaking the vectorizer's parameters in a loop to find the combination that gives the best silhouette score for the clusters.

### environment


I receive a `TypeError` here from `print_markdown(items)`: ``TypeError: `s` must be (<class 'str'>, <class 'bytes'>), not <class 'list'>``, raised inside the `to_unicode(s, encoding, errors)` function.

- operating system: Ubuntu 18.04
- python version: Python 3.7.4
- `spacy` version: 2.2.3
- installed `spacy` models: en_core_web_sm, en_core_web_md
- `textacy` version: 0.9.1
