I have problems running the following commands in Python:
import ember
ember.create_vectorized_features("/data/ember2018/")
I have installed the dependencies and tried in Docker with lief versions 0.9.0 and 0.10.1, and I still get the same failure:
ember.create_vectorized_features("./ember/")
Vectorizing training set
  0%|          | 0/900000 [00:00<?, ?it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/__init__.py", line 44, in vectorize_unpack
    return vectorize(*args)
  File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/__init__.py", line 31, in vectorize
    feature_vector = extractor.process_raw_features(raw_features)
  File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/features.py", line 552, in process_raw_features
    feature_vectors = [fe.process_raw_features(raw_obj[fe.name]) for fe in self.features]
  File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/features.py", line 552, in <listcomp>
    feature_vectors = [fe.process_raw_features(raw_obj[fe.name]) for fe in self.features]
  File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/features.py", line 192, in process_raw_features
    entry_name_hashed = FeatureHasher(50, input_type="string").transform([raw_obj['entry']]).toarray()[0]
  File "/opt/conda/lib/python3.8/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/sklearn/feature_extraction/_hash.py", line 170, in transform
    raise ValueError(
ValueError: Samples can not be a single string. The input must be an iterable over iterables of strings.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/__init__.py", line 75, in create_vectorized_features
  File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/__init__.py", line 60, in vectorize_subset
  File "/opt/conda/lib/python3.8/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/opt/conda/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
ValueError: Samples can not be a single string. The input must be an iterable over iterables of strings.
>>>
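From the error message, it seems that the input is not in the format the vectorizer expects: the call at features.py line 192 passes [raw_obj['entry']], so each sample is a bare string rather than an iterable of strings.
Is there a fix for this?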
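To make the suspected cause concrete, here is a minimal sketch of one possible workaround, not a confirmed fix. It assumes raw_obj['entry'] is a single entry-point name and that hashing it as one token is acceptable. The helper names wrap_entry and hash_entry are hypothetical and not part of ember. The sketch wraps the string in a second list, so that FeatureHasher receives an iterable over iterables of strings, as the error message asks for.

```python
def wrap_entry(entry):
    """Hypothetical helper: shape `entry` as one sample for FeatureHasher.

    A bare string becomes [[entry]] (one sample holding one token);
    any other iterable of strings becomes [list(entry)].
    """
    if isinstance(entry, str):
        return [[entry]]
    return [list(entry)]


def hash_entry(entry, n_features=50):
    """Hypothetical stand-in for the failing line in ember/features.py."""
    # Imported here so wrap_entry can be tried without scikit-learn installed.
    from sklearn.feature_extraction import FeatureHasher

    hasher = FeatureHasher(n_features=n_features, input_type="string")
    # transform() gets a list of samples, each sample a list of strings.
    return hasher.transform(wrap_entry(entry)).toarray()[0]
```

In features.py this would correspond to passing [[raw_obj['entry']]] instead of [raw_obj['entry']]. The resulting feature values may differ from those computed with an older scikit-learn release that accepted the bare string, so vectors built this way should not be assumed compatible with models trained on previously vectorized data.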