Bug Report: Live.log_sklearn_plot with 1.4M values
Description
I'm training a model and want to log a confusion matrix and plot it in the DVC extension.
If I create the confusion matrix myself, I can save it and display it in DVC. But if I log it with live.log_sklearn_plot and then try to plot it in the extension, CPU usage quickly hits its maximum as soon as I click Plot, and my IDE shows this error message:
Maximum call stack size exceeded.
It looks like DVC can't handle vectors with 1.4M values.
A workaround that works at the moment is to write a function that computes the confusion matrix and saves it as a TSV file, and then to plot that file with a custom plot in DVC.
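A minimal sketch of that workaround, using only the standard library. The function names, the output path, and the column names (actual, predicted, count) are assumptions made for illustration, not something DVC prescribes. The idea is that the file holds one row per label pair rather than one row per sample, so it stays small no matter how long the vectors are.

```python
import csv
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (actual, predicted) label pair occurs."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return Counter(zip(y_true, y_pred))


def write_confusion_tsv(y_true, y_pred, path):
    """Write one TSV row per (actual, predicted) pair with its count."""
    counts = confusion_counts(y_true, y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        # Column names are hypothetical; adapt them to your plot template.
        writer.writerow(["actual", "predicted", "count"])
        for actual in labels:
            for predicted in labels:
                # Counter returns 0 for pairs that never occurred.
                writer.writerow([actual, predicted, counts[(actual, predicted)]])
```

Calling write_confusion_tsv(y_true, y_pred, "confusion_matrix.tsv") with two equally long label sequences produces a file with at most (number of classes)² data rows.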
Reproduce
Create two vectors with more than 1 million values each, then log them with live.log_sklearn_plot. After the run finishes, open a terminal and watch the CPU usage. With the terminal still open, start the extension and try to plot the result.
For me, it crashes with the call stack overflow shown above.
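A rough reproduction script along these lines, as a sketch rather than the exact code from my project. The helper names, the number of classes, and the seed are made up for illustration. The labels are random, since only the vector length should matter here. It assumes dvclive and scikit-learn are installed.

```python
import random


def make_label_vectors(n, num_classes=10, seed=0):
    """Build two random label vectors of length n (hypothetical helper)."""
    rng = random.Random(seed)
    y_true = [rng.randrange(num_classes) for _ in range(n)]
    y_pred = [rng.randrange(num_classes) for _ in range(n)]
    return y_true, y_pred


def log_confusion_matrix(y_true, y_pred):
    """Log the two vectors as a confusion matrix plot with DVCLive."""
    # Imported here so the helper above can be used without dvclive installed.
    from dvclive import Live

    with Live() as live:
        live.log_sklearn_plot("confusion_matrix", y_true, y_pred)
```

To reproduce, call log_confusion_matrix(*make_label_vectors(1_400_000)) inside a DVC repository, then open the plots view in the extension.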
Expected
A plot of a confusion matrix in 'Extension: dvc --> plot'
Environment information
Python 3.10.4
Output of dvc doctor:
$ dvc doctor
DVC version: 3.58.0 (pip)
Platform: Python 3.10.4 on Linux-6.8.0-51-generic-x86_64-with-glibc2.39
Subprojects:
dvc_data = 3.16.7
dvc_objects = 5.1.0
dvc_render = 1.0.2
dvc_task = 0.40.2
scmrepo = 3.3.9
Supports:
http (aiohttp = 3.11.9, aiohttp-retry = 2.9.1),
https (aiohttp = 3.11.9, aiohttp-retry = 2.9.1),
ssh (sshfs = 2024.9.0)
Config:
Global: /home/dominicbechtold/.config/dvc
System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sda1
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/sda1
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/b2a06e49ddecd45b3f218825ee38d78f
Additional Information (if any):