
KeyError appears when I use the dataprep module #973

Open
@DummyBroker

Description


Describe the bug
I use Anaconda and installed the dataprep module with the following command:

```
conda install -c conda-forge dataprep
```

Then I tried the example code from the website:

```python
from dataprep.datasets import load_dataset
from dataprep.eda import create_report
from dataprep.eda import plot, plot_correlation, plot_missing
df = load_dataset("titanic")
print(df.columns.tolist())
create_report(df).show()
```

and it showed the following error:

```
from dataprep.datasets import load_dataset
from dataprep.eda import create_report
from dataprep.eda import plot, plot_correlation, plot_missing

df = load_dataset("titanic")
print(df.columns.tolist())
['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked']

create_report(df).show()
Computing series-max-agg-6f34ce939adc72d34b6b5a81d3b66957:   0%|          | 0/1420 [00:00<?, ?it/s]C:\ProgramData\anaconda3\Lib\site-packages\dask\core.py:119: RuntimeWarning: invalid value encountered in divide
  return func(*(_execute_task(a, cache) for a in args))
error happended in column:Survived
Traceback (most recent call last):

  File C:\ProgramData\anaconda3\Lib\site-packages\pandas\core\indexes\base.py:3653 in get_loc
    values are attempted to be sorted, but any TypeError from

  File pandas\_libs\index.pyx:147 in pandas._libs.index.IndexEngine.get_loc

  File pandas\_libs\index.pyx:176 in pandas._libs.index.IndexEngine.get_loc

  File pandas\_libs\hashtable_class_helper.pxi:7080 in pandas._libs.hashtable.PyObjectHashTable.get_item

  File pandas\_libs\hashtable_class_helper.pxi:7088 in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'Survived'


The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  Cell In[123], line 1
    create_report(df).show()

  File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\create_report\__init__.py:68 in create_report
    "components": format_report(df, cfg, mode, progress),

  File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\create_report\formatter.py:78 in format_report
    comps = format_basic(edaframe, cfg)

  File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\create_report\formatter.py:291 in format_basic
    res_variables = _format_variables(df, cfg, data)

  File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\create_report\formatter.py:120 in _format_variables
    rndrd = render(itmdt, cfg)

  File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\distribution\render.py:2473 in render
    visual_elem = render_cat(itmdt, cfg)

  File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\distribution\render.py:1573 in render_cat
    fig = bar_viz(

  File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\distribution\render.py:223 in bar_viz
    df["pct"] = df[col] / nrows * 100

  File C:\ProgramData\anaconda3\Lib\site-packages\pandas\core\frame.py:3761 in __getitem__
    key = com.apply_if_callable(key, self)

  File C:\ProgramData\anaconda3\Lib\site-packages\pandas\core\indexes\base.py:3655 in get_loc

KeyError: 'Survived'
```
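A plausible cause, assuming the frame that `bar_viz` indexes with `df[col]` is built from `Series.value_counts()` (the traceback alone does not confirm this): pandas 2.0 changed `value_counts()` so that the resulting Series is named `"count"` and the original column name moves to the index name. Code written for pandas 1.x that then looks up `df["Survived"]` on that frame would raise exactly this `KeyError`. The plain-pandas sketch below reproduces the naming difference; the toy data and the `value_counts_frame` helper are hypothetical and not part of dataprep.

```python
import pandas as pd

# Hypothetical stand-in for the titanic "Survived" column (illustrative values only)
srs = pd.Series([0, 1, 1, 0, 0], name="Survived")

# pandas >= 2.0 names the value_counts() result "count";
# pandas 1.x named it after the original Series ("Survived").
counts = srs.value_counts().to_frame()
print(counts.columns.tolist())  # ['count'] on pandas >= 2.0, ['Survived'] on 1.x


def value_counts_frame(s: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: value counts as a one-column frame whose column
    is named after `s`, regardless of the installed pandas version."""
    # rename() with a scalar sets the Series name, which to_frame() uses as the column
    return s.value_counts().rename(s.name).to_frame()


nrows = len(srs)
df = value_counts_frame(srs)
# Same arithmetic as the failing line in bar_viz; works because the column exists
df["pct"] = df[srs.name] / nrows * 100
print(df)
```

If this is the cause, running dataprep against a pandas release older than 2.0 (for example in a separate conda environment), or a dataprep release that declares pandas 2 support if one is available, may avoid the error. Checking dataprep's stated requirements before pinning versions is advisable.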

My numpy version is 1.25.2
My pandas version is 2.0.3
My Python version is 3.11.4
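The report lists the numpy, pandas and Python versions but not the dataprep or dask versions, both of which appear in the traceback. A minimal stdlib sketch to collect them in one go, assuming the packages expose standard distribution metadata that `importlib.metadata` can read; the `report_versions` name and the default package list are illustrative choices, not part of any library:

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages=("dataprep", "pandas", "numpy", "dask")):
    """Hypothetical helper: map each distribution name to its installed
    version string, or None if it is not installed."""
    found = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in report_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```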

I want to know why this error happens and how to solve it.
Is there anything I need to add?
Thank you so much!

Metadata


Assignees: none
Labels: type: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
