create_vectorized_features error #103

Open
@MLFlexer

I am having trouble running the following commands in Python:

import ember
ember.create_vectorized_features("/data/ember2018/")

I have installed the dependencies and also tried in Docker with lief versions 0.9.0 and 0.10.1, but I still get the same failure:

ember.create_vectorized_features("./ember/")
Vectorizing training set
  0%|                                                                                    | 0/900000 [00:00<?, ?it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/__init__.py", line 44, in vectorize_unpack
    return vectorize(*args)
  File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/__init__.py", line 31, in vectorize
    feature_vector = extractor.process_raw_features(raw_features)
  File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/features.py", line 552, in process_raw_features
    feature_vectors = [fe.process_raw_features(raw_obj[fe.name]) for fe in self.features]
  File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/features.py", line 552, in <listcomp>
    feature_vectors = [fe.process_raw_features(raw_obj[fe.name]) for fe in self.features]
  File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/features.py", line 192, in process_raw_features
    entry_name_hashed = FeatureHasher(50, input_type="string").transform([raw_obj['entry']]).toarray()[0]
  File "/opt/conda/lib/python3.8/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/sklearn/feature_extraction/_hash.py", line 170, in transform
    raise ValueError(
ValueError: Samples can not be a single string. The input must be an iterable over iterables of strings.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/__init__.py", line 75, in create_vectorized_features
  File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/__init__.py", line 60, in vectorize_subset
  File "/opt/conda/lib/python3.8/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/opt/conda/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
ValueError: Samples can not be a single string. The input must be an iterable over iterables of strings.
>>>

It seems from the error message that the input is not in the format the vectorizer expects.
Is there a fix for this?
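
For reference, the failing call from features.py line 192 can be reproduced outside ember. The following is a minimal sketch, not a confirmed fix: the `entry` value is a made-up placeholder for `raw_obj['entry']`, and it assumes the installed scikit-learn is one that rejects a bare string as a sample, as the traceback above shows. Wrapping the string in a list is one possible workaround, but as far as I can tell older scikit-learn releases iterated over the characters of the string, so the two variants below produce different feature vectors.

```python
from sklearn.feature_extraction import FeatureHasher

# Hypothetical placeholder for raw_obj['entry'] (a section name in the real data).
entry = ".text"

hasher = FeatureHasher(n_features=50, input_type="string")

# Same call shape as features.py line 192: each sample is a bare string.
# Newer scikit-learn raises ValueError here; older releases may accept it.
try:
    hasher.transform([entry])
    bare_string_rejected = False
except ValueError:
    bare_string_rejected = True

# Variant A: treat the whole name as a single token (one sample, one string).
whole_name = hasher.transform([[entry]]).toarray()[0]

# Variant B: one token per character, which is my assumption about what
# older scikit-learn did implicitly when handed a bare string.
per_char = hasher.transform([list(entry)]).toarray()[0]

print("bare string rejected:", bare_string_rejected)
print("vector lengths:", whole_name.shape, per_char.shape)
```

If vectors compatible with previously trained models are needed, pinning scikit-learn to the version ember was developed against may be the safer route than editing features.py.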
