Improving installing packages with conda #97
Description
I'm opening this issue to discuss how we could improve the installation of packages with conda, with the goal of making it:
- faster
- more flexible
- more predictable
This is motivated by the recent work done by @philippjfr to improve the speed of the test workflows (starting from Panel) and by various difficulties experienced with using pyctdev for almost a year now.
How it works
Installing packages with pyctdev first requires creating an environment. This is usually done by first installing pyctdev and then running:
doit env_create --name envname --python=3.x -c channel1 -c channel2
which:
1. Creates a new environment with the name, Python version and channels provided
2. Installs pyctdev in that new environment:
   - If pyctdev (the one installed originally and running these steps) is a pre-release version, install from the pyviz/label/dev channel
   - If the env var PYCTDEV_SELF_CHANNEL is provided, install from the channel provided as the value of the env var
   - If none of the above is true, install from the pyviz channel
Note, as this may be important, that the conda install in step 2 does not include all the channels listed in the doit env_create call.
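The channel selection in step 2 can be summarised by the following sketch. It is not pyctdev's actual code: the function name and the is_prerelease argument are hypothetical, and the order in which the two conditions are checked is an assumption, since the description above does not say which one wins when both apply.

```python
import os


def pick_self_channel(is_prerelease, environ=None):
    """Return the channel from which pyctdev would install itself.

    Hypothetical helper: the precedence (pre-release first, then the
    env var) is an assumption, not confirmed pyctdev behaviour.
    """
    environ = os.environ if environ is None else environ
    if is_prerelease:
        # The running pyctdev is a pre-release: use the dev label.
        return "pyviz/label/dev"
    if environ.get("PYCTDEV_SELF_CHANNEL"):
        # An explicit channel was provided through the env var.
        return environ["PYCTDEV_SELF_CHANNEL"]
    # Neither condition holds: fall back to the main pyviz channel.
    return "pyviz"
```

Note that the channels passed with -c to doit env_create play no role in this choice, which is the point made in the note above.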
Then the environment is activated; there's no pyctdev command for that.
The main installation can now take place. This is done by running:
doit develop_install -c channel1 -c channel2 -o options1 -o options2 -o options3
which:
1. Finds the list of build dependencies and installs them with conda (respecting the channels provided with -c)
2. Finds and concatenates the options dependencies, and installs them with conda (respecting the channels provided with -c)
3. Installs the package in editable mode with python -m pip install --no-deps --no-build-isolation -e . (--no-deps as all the dependencies should already be there, --no-build-isolation to avoid creating a virtual environment, since all the build dependencies are already installed anyway)
--conda-mode=mamba can be set to use mamba instead of conda in steps 1 and 2.
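As a rough illustration, the three steps can be written as a function that returns the commands as argument lists. This is a sketch only: the function name and its parameters are hypothetical, and the discovery of the dependencies is left out (the lists are passed in directly).

```python
def develop_install_commands(build_deps, options_deps, channels, conda_mode="conda"):
    """Return the three develop_install commands as argument lists.

    Hypothetical sketch: it only builds the commands, it does not run them.
    """
    # Turn ["channel1", "channel2"] into ["-c", "channel1", "-c", "channel2"].
    channel_args = [arg for channel in channels for arg in ("-c", channel)]
    return [
        # Step 1: build dependencies, solved with conda or mamba.
        [conda_mode, "install", *channel_args, *build_deps],
        # Step 2: dependencies of all the requested options, concatenated.
        [conda_mode, "install", *channel_args, *options_deps],
        # Step 3: editable install of the package itself, always with pip.
        ["python", "-m", "pip", "install", "--no-deps",
         "--no-build-isolation", "-e", "."],
    ]
```

Steps 1 and 2 are two separate solves in the same environment, which matters for the performance discussion that follows.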
Making it faster
I believe there are two main avenues to make this faster.
The first one would be to use a faster solver by default, either mamba or the libmamba solver. The --conda-mode option already offers the possibility to run the slowest install steps with mamba. Ideally though we wouldn't have to use mamba, and we would rely on the libmamba solver implemented in conda, which hopefully should one day become the default one, or at least be available without an experimental flag.
The second one would consist in reducing the number of conda install steps. There are currently 4 conda install steps, the first one being when the environment is created, the three other ones to install pyctdev, the build dependencies and then all the other required dependencies. Installing multiple times in a conda environment is known to lead to long solving times:
- doit env_create could install pyctdev when it is creating the environment, reducing it basically to conda create -n new-env python=3.x pyctdev (as a matter of fact I have replaced doit env_create by this in a number of workflows already)
- Unless I'm missing something, it seems it would be fine to merge the first two steps of doit develop_install.
We could go even further and have a single command to install all the dependencies, adding to doit env_create some of the features of doit develop_install, which would then be called as such: doit env_create --name my-env --python=3.x -c channel1 -c channel2 -o options1 -o options2 -o options3. As this might end up in installing a version of pyctdev that is not the latest, another command line parameter could be added to be able to add a version constraint, e.g. --pyctdev-install=">=1.1".
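A single-solve variant could look like the sketch below. The function name and arguments are hypothetical, and --pyctdev-install is the parameter proposed above, not an existing pyctdev option. The sketch just shows how the Python version, the pyctdev constraint and every other dependency can be passed to one conda create call so that the solver runs once.

```python
def env_create_command(name, python, channels, packages, pyctdev_spec=""):
    """Build one `conda create` call that solves everything at once.

    Hypothetical sketch: `pyctdev_spec` stands for the value of the
    proposed --pyctdev-install parameter, e.g. ">=1.1".
    """
    command = ["conda", "create", "-n", name]
    for channel in channels:
        command += ["-c", channel]
    # Python, pyctdev and all other dependencies go into the same solve.
    command += [f"python={python}", f"pyctdev{pyctdev_spec}", *packages]
    return command
```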
Making it more flexible
Some packages that are needed to run the test suite or the docs build are not available on PyPI. Because of that they are not listed anywhere in the setup.py file; instead, they are installed directly in the GitHub workflow files with conda install. pyctdev should offer a way to install these packages without having to resort to using conda directly.
Some packages are not available on Anaconda.org (e.g. a recent example is pytest-playwright). These packages are usually installed manually with pip after running doit develop_install. There should be a way to declare a list of packages that pyctdev should install with pip.
Regarding these two points, one could think that the packages to install only with conda and only with pip could be declared in a config file (e.g. in setup.cfg). However, I believe that for maximum flexibility it would actually be better to add command line parameters to pyctdev instead, as sometimes the packages to install depend on the operating system or on some other conditions. I would suggest something like doit develop_install --conda-install "nodejs>15" --conda-install mesalib --pip-install pytest-playwright --pip-install ....
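To make the suggested interface concrete, here is a minimal sketch of repeatable options that accumulate into lists. argparse is used only as a stand-in, since pyctdev is built on doit, which has its own way of declaring task parameters. The option names are the ones proposed above and do not exist in pyctdev today.

```python
import argparse


def parse_install_args(argv):
    """Parse repeatable --conda-install / --pip-install options.

    Hypothetical sketch of the proposed interface, not pyctdev code.
    """
    parser = argparse.ArgumentParser(prog="develop_install")
    # action="append" collects every occurrence of the option into a list.
    parser.add_argument("--conda-install", action="append", default=[],
                        help="extra package to install with conda")
    parser.add_argument("--pip-install", action="append", default=[],
                        help="extra package to install with pip")
    return parser.parse_args(argv)


args = parse_install_args([
    "--conda-install", "nodejs>15",
    "--conda-install", "mesalib",
    "--pip-install", "pytest-playwright",
])
# args.conda_install and args.pip_install are plain lists of package specs.
```

Because the extra packages are ordinary command-line arguments, a workflow can add them conditionally, for example on one operating system only, which a static config file would not allow as easily.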
Making it more predictable
What sometimes makes the installation process difficult to predict, and even not so robust, is the "channel dance" 💃, whereby some packages get re-installed from another channel because different channels are specified in the install steps. This was the source of a bad bug, by which Python itself was being re-installed during a doit develop_install call, leading to a cryptic doit/pyctdev error. It took months to find in the HoloViews test suite as it happened on only one platform, and it still happens from time to time in the ecosystem.
I think one source of this problem is the second step of doit env_create, the one that installs pyctdev, because it doesn't re-use the channels passed to doit env_create and because it chooses the channel to install pyctdev from based on some rather implicit conditions. I would suggest that this step install pyctdev with the channels provided to doit env_create, and that a command line parameter be added to env_create to override the channel it should be installed from, e.g. doit env_create ... --pyctdev-channel "pyviz/label/dev". Note that in most HoloViz cases you wouldn't use that new parameter as either pyviz or pyviz/label/dev is specified in the channels list.
An approach that I have recently tried and that I find very appealing is to:
- create an empty environment: conda create -n my-env
- activate that environment: conda activate my-env
- configure the environment channels: conda config --env --append channels channel1 --append channels channel2
- configure the channel priority to strict, also at the environment level: conda config --env --set channel_priority strict
This creates a local condarc file associated with that environment. The benefit of this approach is that all the channels are declared prior to installing anything, in the order they are supposed to be used. Setting the channel priority to strict makes the environment solving even more predictable (and faster, I believe). Once the environment is set up, the later conda install calls don't have to specify any channel at all. I think this approach also offers a better separation from the user's conda configuration: their system condarc is less likely to leak its configuration into the installation procedure. Another situation that benefits from this approach is when, in a local setup, you want to install a new package or update an existing one. In that case you would use conda install directly, and you would otherwise have to remember which channels to use, and in which order, to avoid the channel dance. With the suggested approach you don't have to remember anything about the channels.
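The setup above can be sketched as a function that returns the commands for a given environment name and channel list. The function name is hypothetical. The commands are returned rather than executed because conda activate only takes effect inside an initialised shell. The last command uses --env so that the priority setting also lands in the environment's own condarc, which is an assumption about the intent; without --env, conda config writes to the user's condarc.

```python
def environment_channel_commands(name, channels, priority="strict"):
    """Return the commands that declare channels before anything is installed.

    Hypothetical sketch: builds argument lists, does not run conda.
    """
    # Channels are appended in the order given, which is their priority order.
    append_args = [arg for channel in channels
                   for arg in ("--append", "channels", channel)]
    return [
        ["conda", "create", "-n", name],
        ["conda", "activate", name],
        # --env targets the condarc of the active environment.
        ["conda", "config", "--env", *append_args],
        ["conda", "config", "--env", "--set", "channel_priority", priority],
    ]
```

After these four commands, a plain conda install with no -c arguments would resolve against the environment's channels in the declared order.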
Suggestion
If I were to combine all the suggestions I've made into a rather ambitious proposal, it would be to extend doit env_create so that the following is allowed (this uses mamba, although I would prefer libmamba):
doit env_create \
  --conda-mode=mamba \
  --name=my-env \
  --pyctdev-install ">=1.1" \
  --python=3.x \
  -c pyviz/label/dev -c conda-forge -c nodefaults \
  --channel-priority strict \
  -o tests -o examples \
  --conda-install "nodejs>15" \
  --pip-install pytest-playwright
which would do:
conda create -n my-env
conda activate my-env
conda config --env --append channels pyviz/label/dev --append channels conda-forge --append channels nodefaults
conda config --env --set channel_priority strict
mamba install python=3.x "pyctdev>=1.1" all the other tests and examples dependencies "nodejs>15"
python -m pip install pytest-playwright
python -m pip install --no-deps --no-build-isolation -e .
I would appreciate any feedback on this issue. If the last suggestion is too ambitious, implementing some of the earlier suggestions separately should already be an improvement. Note that I have not given any thought to the pip version of doit develop_install, which I think doesn't suffer from the issues reported here, at least not the performance-related ones.