Skip to content

The EDA demo doesn't work in google colab #996

Open
@paddymul

Description

@paddymul

Describe the bug

I tried to follow your google colab eda demo. The default "install from develop" pip install didn't work.
trying "pip install dataprep" also failed

A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://colab.research.google.com/drive/1U_-pAMcne3hK1HbMB3kuEt-093Np_7Uk?usp=sharing
  2. Click on execute the cells
  3. See an error of ```
    Installing build dependencies ... done
    Getting requirements to build wheel ... done
    Preparing metadata (pyproject.toml) ... done
    Preparing metadata (setup.py) ... done
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 153.9/153.9 kB 4.4 MB/s eta 0:00:00
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 702.8/702.8 kB 18.2 MB/s eta 0:00:00
    Preparing metadata (setup.py) ... done
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 37.0 MB/s eta 0:00:00
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.2/3.2 MB 67.8 MB/s eta 0:00:00
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.5/18.5 MB 62.0 MB/s eta 0:00:00
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 38.8 MB/s eta 0:00:00
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.0/12.0 MB 71.8 MB/s eta 0:00:00
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 101.8/101.8 kB 6.0 MB/s eta 0:00:00
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.6/133.6 kB 6.6 MB/s eta 0:00:00
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.1/3.1 MB 66.5 MB/s eta 0:00:00
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 41.2 MB/s eta 0:00:00
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.0/3.0 MB 53.4 MB/s eta 0:00:00
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 48.2 MB/s eta 0:00:00
    Building wheel for dataprep (pyproject.toml) ... done
    Building wheel for metaphone (setup.py) ... done
    Building wheel for regex (setup.py) ... done
    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    google-colab 1.0.0 requires pandas==2.2.2, but you have pandas 1.5.3 which is incompatible.
    albumentations 2.0.4 requires pydantic>=2.9.2, but you have pydantic 1.10.21 which is incompatible.
    xarray 2025.1.2 requires pandas>=2.1, but you have pandas 1.5.3 which is incompatible.
    mizani 0.13.1 requires pandas>=2.2.0, but you have pandas 1.5.3 which is incompatible.
    wandb 0.19.6 requires pydantic<3,>=2.6, but you have pydantic 1.10.21 which is incompatible.
    google-genai 0.8.0 requires pydantic<3.0.0dev,>=2.0.0, but you have pydantic 1.10.21 which is incompatible.
    langchain-core 0.3.37 requires pydantic<3.0.0,>=2.5.2; python_full_version < "3.12.4", but you have pydantic 1.10.21 which is incompatible.
    langchain 0.3.19 requires pydantic<3.0.0,>=2.7.4, but you have pydantic 1.10.21 which is incompatible.
    torch 2.5.1+cu124 requires nvidia-cublas-cu12==12.4.5.8; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cublas-cu12 12.5.3.2 which is incompatible.
    torch 2.5.1+cu124 requires nvidia-cuda-cupti-cu12==12.4.127; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cuda-cupti-cu12 12.5.82 which is incompatible.
    torch 2.5.1+cu124 requires nvidia-cuda-nvrtc-cu12==12.4.127; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cuda-nvrtc-cu12 12.5.82 which is incompatible.
    torch 2.5.1+cu124 requires nvidia-cuda-runtime-cu12==12.4.127; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cuda-runtime-cu12 12.5.82 which is incompatible.
    torch 2.5.1+cu124 requires nvidia-cudnn-cu12==9.1.0.70; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cudnn-cu12 9.3.0.75 which is incompatible.
    torch 2.5.1+cu124 requires nvidia-cufft-cu12==11.2.1.3; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cufft-cu12 11.2.3.61 which is incompatible.
    torch 2.5.1+cu124 requires nvidia-curand-cu12==10.3.5.147; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-curand-cu12 10.3.6.82 which is incompatible.
    torch 2.5.1+cu124 requires nvidia-cusolver-cu12==11.6.1.9; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cusolver-cu12 11.6.3.83 which is incompatible.
    torch 2.5.1+cu124 requires nvidia-cusparse-cu12==12.3.1.170; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cusparse-cu12 12.5.1.3 which is incompatible.
    torch 2.5.1+cu124 requires nvidia-nvjitlink-cu12==12.4.127; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-nvjitlink-cu12 12.5.82 which is incompatible.
    panel 1.6.1 requires bokeh<3.7.0,>=3.5.0, but you have bokeh 2.4.3 which is incompatible.
    sphinx 8.1.3 requires Jinja2>=3.1, but you have jinja2 3.0.3 which is incompatible.
    cudf-cu12 24.12.0 requires pandas<2.2.4dev0,>=2.0, but you have pandas 1.5.3 which is incompatible.
    plotnine 0.14.5 requires pandas>=2.2.0, but you have pandas 1.5.3 which is incompatible.
    holoviews 1.20.1 requires bokeh>=3.1, but you have bokeh 2.4.3 which is incompatible.


Steps to reproduce the behavior:
1. Go to https://colab.research.google.com/drive/1U_-pAMcne3hK1HbMB3kuEt-093Np_7Uk?usp=sharing
2. Click on execute the cells
3. Try "pip install dataprep" instead of installing from git
4. See an error of ```
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

[<ipython-input-5-8880ceaa41a0>](https://localhost:8080/#) in <cell line: 0>()
----> 1 from dataprep.eda import *
      2 from dataprep.datasets import load_dataset
      3 from dataprep.eda import plot, plot_correlation, plot_missing, plot_diff, create_report

18 frames

[/usr/lib/python3.11/inspect.py](https://localhost:8080/#) in _descriptor_get(descriptor, obj)
   2430     if get is _sentinel:
   2431         return descriptor
-> 2432     return get(descriptor, obj, type(obj))
   2433 
   2434 

TypeError: descriptor '__call__' for 'type' objects doesn't apply to a 'property' object

Expected behavior
I would expect the demo to be useable without debugging.

Further, if this project isn't actively maintained, why don't you say so?

Metadata

Metadata

Assignees

Labels

type: bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions