Description
This came up indirectly in #52509, and I think it merits some brainstorming about how to shrink the pandas install footprint. In no particular order:
- Circa 2018 there was discussion of stripping some (debug?) symbols from our compiled C extensions. No idea if that went anywhere. cc @WillAyd
- In the last couple of years we have improved perf in some groupby reductions by using fused types in libgroupby to support more dtypes directly without casting. Since Cython compiles a separate specialization of a fused-type function for each member type, I think this significantly increased the size of libgroupby. We did something similar in libalgos and libhashtable. I think avoiding the casts is worth it, but we should acknowledge the tradeoffs.
- Some stuff in _libs could plausibly live outside of Cython without a ton of downside. ops_dispatch and reduction come to mind, though these are both quite small. More could move if we learn to live with circular dependencies.
- This would be a PITA, but we could distribute some dtype-specific stuff separately, e.g.
  pip install pandas[sparse] pandas[interval] pandas[period]
  and potentially see some big savings that way. This would really be a PITA, but would make a big dent.
- IIUC moving Cython code back to plain C might get some mileage, cc @WillAyd again? This wo
- Avoid the numpy dependency. (grep finds 1105 "import numpy"s in pandas/, some of them in e.g. doctests; there are also 33 "cimport numpy"s.)
- Avoid the pytz dependency (xref DEPR: deprecate pytz support #46463, coming up shortly once we drop py38).
- Avoid the dateutil dependency.
- There was a discussion [citation needed] of distributing pandas without the tests. I guess that was a "no".
- Related: DEV: reduce the size of the dev environment.yml #49998
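To put numbers on the debug-symbol and libgroupby points, one rough starting point is to list the compiled extension modules under an installed pandas by size. This is a minimal sketch using only the standard library; `extension_sizes` is a hypothetical helper, not something pandas ships, and it assumes extension files can be recognized by the suffixes in `importlib.machinery.EXTENSION_SUFFIXES`.

```python
import importlib.machinery
import os
from pathlib import Path


def extension_sizes(root):
    """Return (relative_path, size_in_bytes) for each compiled extension
    module under ``root``, largest first.

    Extension modules are recognized by the platform's suffixes in
    importlib.machinery.EXTENSION_SUFFIXES (".so"-style names on Linux and
    macOS, ".pyd" on Windows).
    """
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    root = Path(root)
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffixes):
                path = Path(dirpath) / name
                sizes.append((str(path.relative_to(root)), path.stat().st_size))
    # Sort largest first so the biggest modules are easy to spot.
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes
```

Against an installed pandas this would be called as `extension_sizes(Path(pandas.__file__).parent / "_libs")`. Running it on a build with and without stripped symbols, or before and after a fused-type change, would give a concrete before/after number rather than a guess.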
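A back-of-the-envelope way to see why fused types grow libgroupby: Cython compiles one copy of a fused-type function per member type, arguments that share one fused type are specialized together, and distinct fused types in the same signature multiply. The sketch below only counts specializations; the member lists `numeric_t` and `out_t` are hypothetical and not copied from pandas, and actual binary growth per copy will vary.

```python
from itertools import product
from math import prod


def specializations(*fused_types):
    """All concrete signatures Cython would generate for a function whose
    arguments use the given *distinct* fused types.

    Each fused type is modeled as a list of member C types. Arguments that
    share the same fused type are specialized together, so each distinct
    fused type should be passed only once.
    """
    return list(product(*fused_types))


def n_specializations(*fused_types):
    # Same count as len(specializations(...)) without building the tuples.
    return prod(len(members) for members in fused_types)


# Hypothetical member lists, for illustration only.
numeric_t = [
    "int8_t", "int16_t", "int32_t", "int64_t",
    "uint8_t", "uint16_t", "uint32_t", "uint64_t",
    "float32_t", "float64_t",
]
out_t = ["int64_t", "float64_t"]

# Cast-everything-to-float64 approach: one compiled body.
print(n_specializations(["float64_t"]))     # 1
# One fused type covering ten dtypes: ten compiled bodies.
print(n_specializations(numeric_t))         # 10
# Two independent fused types multiply: 10 * 2 = 20 bodies.
print(n_specializations(numeric_t, out_t))  # 20
```

This is the tradeoff in miniature: dropping the cast removes a copy of the data at runtime, but the compiled code for that function grows roughly with the number of specializations.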
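If dtype-specific pieces were split into extras like `pandas[sparse]`, core pandas would need to fail clearly when the extra is missing. pandas already does something similar for optional third-party dependencies internally (`pandas.compat._optional.import_optional_dependency`). The sketch below is a hypothetical, simplified version of that pattern; `require_extra` and any package name passed to it are assumptions for illustration, since pandas does not currently ship sparse/interval/period support as separate packages.

```python
import importlib


def require_extra(module_name, extra):
    """Import ``module_name``, or fail with a message naming the pip extra
    that would provide it.

    Hypothetical helper: both the module name and the extra name are
    supplied by the caller and are not real pandas packages.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as err:
        raise ImportError(
            f"Missing optional dependency {module_name!r}. "
            f"Install it with: pip install 'pandas[{extra}]'"
        ) from err
```

For example, a made-up `require_extra("pandas_sparse", extra="sparse")` would return the module when installed and otherwise point the user at `pip install 'pandas[sparse]'`.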
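To track progress on the numpy item, the grep counts could be reproduced in a way that separates the two kinds of import. Note that a plain `grep "import numpy"` also matches `cimport numpy` lines, since the former is a substring of the latter. This is a minimal sketch; `count_numpy_imports` is a hypothetical helper, and like the grep it does not count `from numpy import ...` lines.

```python
import re
from pathlib import Path

# Plain Python imports: the lookbehind skips "cimport numpy", and \b keeps
# names such as "numpy_groupies" from matching.
PY_IMPORT = re.compile(r"(?<!c)import numpy\b")
# Cython-level imports.
CY_IMPORT = re.compile(r"cimport numpy\b")


def count_numpy_imports(root, suffixes=(".py", ".pyx", ".pxd", ".pyi")):
    """Count occurrences of "import numpy" and "cimport numpy" in source
    files under ``root`` (doctest lines inside .py files are included)."""
    counts = {"import": 0, "cimport": 0}
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        counts["import"] += len(PY_IMPORT.findall(text))
        counts["cimport"] += len(CY_IMPORT.findall(text))
    return counts
```

Running `count_numpy_imports("pandas")` from a repo checkout would give the two numbers side by side.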
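For the pytz and dateutil items, the standard library covers part of the ground: `zoneinfo` provides IANA time zones from Python 3.9 on (which fits the note about dropping py38), and `datetime.fromisoformat` parses ISO 8601 strings. The sketch below only illustrates the ISO-parsing side; `parse_iso_utc` is a hypothetical helper, treating offset-less strings as UTC is an arbitrary choice for the example, and non-ISO formats that `dateutil.parser.parse` accepts would still need dateutil or a hand-written parser.

```python
from datetime import datetime, timezone


def parse_iso_utc(value):
    """Parse an ISO 8601 string with the standard library and return an
    aware UTC datetime.

    Strings without an offset are treated as UTC (an illustrative
    assumption). Non-ISO input raises ValueError, whereas
    dateutil.parser.parse would attempt a best-effort parse.
    """
    ts = datetime.fromisoformat(value)
    if ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)
```

For example, `parse_iso_utc("2023-01-15T10:30:00+05:00")` yields 05:30 UTC on the same day, while `parse_iso_utc("15 Jan 2023")` raises `ValueError`.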