
Suggestion: remove tests from the distribution #30741




Would it make sense to remove the tests folder from the pandas distribution? It takes up roughly 33% of the whole package size.

This is especially important when using pandas in AWS Lambda, where the deployment package size is limited to 50 MB zipped and 5 MB might really make a difference.

# Uncompressed
du -h -s pandas*
 46.5M	pandas
 30.9M	pandas_no_tests

# Compressed
du -h -s pandas*



TomAugspurger commented on Jan 6, 2020


I think we've talked about that in the past. We do have pandas.test() as part of the public API however, so we'd need to consider that.

A couple options:

  1. Provide a separate distribution like pandas-slim or something that excludes these files (and docs)
  2. Have pandas.test() fetch the source files on demand. That seems a bit messy though.

Just as a note: we do exclude the test data files that are present in the git repository. So we're only talking about source files.


vfilimonov commented on Jan 6, 2020


Hello @TomAugspurger

pandas-slim sounds like a good workaround.

It looks like docs are not a part of the distribution.
And right, it's test code only, without the data files. In terms of size, the tests are second only to _libs and almost equal to the rest of the code.

8.0K	./arrays
 16K	./errors
 24K	./api
 40K	./__pycache__
 76K	./compat
 88K	./_config
248K	./tseries
308K	./util
440K	./plotting
2.3M	./io
6.7M	./core
 17M	./tests
 20M	./_libs
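A per-directory breakdown like the one above could be reproduced without du. The following is a minimal sketch, not something taken from the pandas code base: the function names are illustrative, and it sums apparent file sizes rather than disk blocks, so the numbers will differ somewhat from du output.

```python
import os
from pathlib import Path


def directory_sizes(root):
    """Return {top-level subdirectory name: total bytes of the files under it}."""
    sizes = {}
    for child in Path(root).iterdir():
        if not child.is_dir():
            continue  # files directly in root are not counted
        total = 0
        for dirpath, _dirnames, filenames in os.walk(child):
            for name in filenames:
                total += os.path.getsize(os.path.join(dirpath, name))
        sizes[child.name] = total
    return sizes


def print_report(root):
    """Print the subdirectories of root, smallest first, in megabytes."""
    for name, size in sorted(directory_sizes(root).items(), key=lambda kv: kv[1]):
        print(f"{size / 1e6:8.1f} MB  ./{name}")
```

Pointing print_report at the directory that contains pandas/__init__.py in a given environment should list tests and _libs last if they are the largest entries there.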

And what is the reason for having tests as part of the API? I see that numpy, scipy, matplotlib, etc. do the same (while many other libs, especially web-oriented ones like flask, requests, jinja, don't)?


TomAugspurger commented on Jan 6, 2020


stonecharioteer commented on Jan 7, 2020


I used to run the tests for numpy, scikit-learn and matplotlib after installing, since at times I'd have them fail on Windows. However, this was quite some time ago, 4 years ago perhaps. Perhaps other users were doing the same?


jonringer commented on Jan 9, 2020


I'm a package manager for nixpkgs, and I'm against removing tests from the sdist package; however, removing them from the wheel would make sense from a packaging standpoint. It's considered best practice in FOSS that if you distribute source, you also distribute tests alongside it.

Tests are a nice guarantee that the package is working as intended.

We could also check out the GitHub repo for tests. However, quickly looking at the, pandas was meant to have the CI set correct metadata such as version. So the version-controlled source can't be used directly to package pandas.


joaoe commented on Jan 26, 2020


Hi. I reported this elsewhere, so I'm pasting my comment here.

My use case
After a pip install pandas, lib/site-packages/pandas/tests/ includes a lot of testing code which is definitely not relevant for me and many other end users of pandas.
This bloats the installation and makes installation slower.
I'm working on packaging a Python environment to distribute with a preinstalled set of modules and applications, and there are too many popular 3rd-party modules which include unneeded test code, like numpy, IPython, jupyterlab, etc., which needs to be stripped to keep the package size down. I'll be reporting issues to these projects as well.

Therefore, my suggestion is to keep the pandas module streamlined, and move the tests out. Perhaps create a pandas-unittests module if people are interested in it, or just expect users to checkout the code. Another possibility would be to skip packaging the tests folder and when creating packages to upload to

Regarding pandas-slim, everyone and their mothers have a dependency on pandas which would pull the whole code again with tests.

It's considered best practice in FOSS that if you distribute source, you also distribute tests along side it.

That's perfectly fine. The discussion is whether tests are bundled with the pandas module or not.

Since you are now almost releasing 1.0, it might be a bit short notice to include this in such a big release. But for the next major release, it could work.

Thank you very much for your attention.

Labels: Build (Library building on various platforms), Testing (pandas testing functions or related to the test suite), on Jan 27, 2020

vfilimonov commented on Jun 3, 2020


A small remark: as part of a recent commit to pyarrow, @wesm removed pyarrow.tests from the wheel, which to my understanding contributed 2.3 MB of the ~60 MB installed size.

In the case of pandas, the tests folder contributed (as of version 1.0.3) 17.9 MB out of the 49 MB installed size.

So I'd like to bring the question back to the discussion and perhaps, @wesm could comment on that?


wesm commented on Jun 3, 2020


I think it would be a good idea to not ship the tests in wheels. If you want users to be able to run the tests against their production installs perhaps the tests can be packaged as a separate source wheel. Install size is becoming a problem because of size constraints in things like AWS Lambda.


TomAugspurger commented on Oct 30, 2020

Contributor has some information.

I think I've come around to the idea that we can just not ship the test files in the main pandas distributions. We can have a separate pandas-tests package, installable with pip install pandas-tests, that's just

  1. The test files
  2. A small file that ties things together

We could even update pandas.test() to check for the presence of the pandas-tests package.
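The presence check suggested above could look roughly like the sketch below. It is not how pandas.test() is implemented: the importable module name pandas_tests and the function names are assumptions made for illustration, and the final step assumes pytest is installed.

```python
import importlib.util


def tests_available(module_name="pandas_tests"):
    """Return True if the (hypothetical) separate test package is importable."""
    # find_spec returns None when no top-level module of that name is found.
    return importlib.util.find_spec(module_name) is not None


def run_tests(module_name="pandas_tests"):
    """Run the separately installed tests, or explain how to get them."""
    if not tests_available(module_name):
        raise ImportError(
            f"{module_name!r} is not installed; "
            "the tests would have to be installed separately, "
            "e.g. with 'pip install pandas-tests' (hypothetical package)."
        )
    import pytest  # imported lazily so that regular users do not need it

    # --pyargs makes pytest treat the argument as an importable package name.
    return pytest.main(["--pyargs", module_name])
```

Users who never call the test entry point would pay no cost for the check, since nothing is imported until run_tests is called.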


viccsjain commented on Sep 7, 2023


Splitting the pandas library and tests would be really useful. We are using this library in our serverless deployment, and there is a 250 MB size restriction on packages uploaded to AWS Lambda. Removing the test files will reduce the size of our package.


thesamesam commented on Sep 7, 2023


See also the discussion in #54907.


jbsilva commented on Sep 25, 2023


Making docs and tests optional would be great.
In my cloud deployments I repackage it without the tests; 15 MB do make a difference for me.
I've seen many other packages including tests, but never that big.
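Repackaging without the tests, as described in this comment, can be as simple as deleting the tests directory from an installed copy before bundling it. The sketch below is one possible way to do that, not the commenter's actual procedure; strip_tests is an illustrative name, and it assumes nothing else in the deployment imports from the removed directory (pandas.test() will no longer work afterwards).

```python
import shutil
from pathlib import Path


def strip_tests(package_dir):
    """Delete <package_dir>/tests and return the number of bytes freed."""
    tests_dir = Path(package_dir) / "tests"
    if not tests_dir.is_dir():
        return 0  # nothing to remove
    # Measure before deleting so the saving can be reported.
    freed = sum(p.stat().st_size for p in tests_dir.rglob("*") if p.is_file())
    shutil.rmtree(tests_dir)
    return freed
```

Note that the installed metadata still lists the deleted files, so this is only suitable for a throwaway copy that is about to be zipped into a deployment bundle.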


dolfinus commented on May 30, 2024


I was checking the size of one of my docker images, and found that tests are about 50% of the size of the installed package.

A complete waste of space for me.


jonas-w commented on Aug 3, 2024


According to the package has 240 million downloads per month.

If the 32 MB tests folder were removed from the package, the package size would be halved. The wheels are currently roughly 13 MB, so if they shrank to, say, 7 MB without the tests, PyPI would save ~1.7 petabytes of bandwidth per month, and could have saved roughly 90 petabytes of traffic since this issue was opened...
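The estimate can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the comment and assumes decimal units, a constant download rate, and that every download is a wheel of the stated size. Under those assumptions, a 13 MB to 7 MB reduction saves 6 MB per download, which comes to about 1.44 PB per month; the quoted ~1.7 PB would correspond to saving roughly 7 MB per download.

```python
MB = 10**6
PB = 10**15

downloads_per_month = 240_000_000      # figure quoted in the comment
wheel_size_now = 13 * MB               # "roughly 13 MB"
wheel_size_without_tests = 7 * MB      # the comment's guess

saved_per_download = wheel_size_now - wheel_size_without_tests
saved_per_month_pb = downloads_per_month * saved_per_download / PB

# Issue opened in Jan 2020, comment written in Aug 2024.
months_open = (2024 - 2020) * 12 + (8 - 1)
saved_total_pb = saved_per_month_pb * months_open

print(f"{saved_per_month_pb:.2f} PB per month")            # 1.44 PB per month
print(f"{saved_total_pb:.1f} PB over {months_open} months")  # 79.2 PB over 55 months
```

Either way the order of magnitude matches the comment: on the order of a petabyte per month.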


takluyver commented on Sep 17, 2024


I just noticed that on one of our filesystems, which is not set up for lots of small files, pandas' tests end up taking 300 MB of installed size.

Would it work as an initial step to make a script which splits the tests out of the wheels to be uploaded, and makes separate pandas-test wheels which can be uploaded separately? This is obviously not the most elegant way to do it, but I think I can see more or less how to make that work, whereas I'm not sure I can commit the time to figuring out how to rework pandas' build scripts and CI config to produce & use two separate wheels.

    Suggestion: remove tests from the distribution · Issue #30741 · pandas-dev/pandas