Created virtual env does not honour sys.base_prefix correctly #2770

Open
@pelson

Issue

When creating a virtual environment, the base prefix of the virtual environment should be the same as the base prefix of the Python that is creating it. There should be no additional step that resolves symlinks.

This behaviour is seen in venv, and was implemented as expected until #2686, where symlinks were resolved.

Why is this important?

In big (scientific) institutions, it is common to have a network-mounted filesystem which can be accessed from all managed machines. This could be to mount a homespace, or to mount some data, etc. (I've seen both). To scale this up, it is necessary to have multiple filesystem servers all using the same underlying storage. Machines are then clustered to point to different servers, but the user doesn't know which server a machine is talking to. When combining this with something such as autofs, you find that the server being contacted is in the path (e.g. /nfs/some-machine/). To smooth this out across machines, the managed machines get a canonical symlink (e.g. /project/{my-project}) which points to the machine-specific mountpoint. Essentially:

machine1:

ls -ltr /project/my-project -> /nfs/nfs-server1

machine2:

ls -ltr /project/my-project -> /nfs/nfs-server2

With this setup, you can create a virtual environment in /project/my-project which works on both machines IFF the symlink is not resolved. This is the behaviour of CPython, and also the behaviour of venv and virtualenv<20.25.1.
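To illustrate the difference, here is a minimal sketch that recreates the layout inside a temporary directory. The helper name compare_paths and the directory names are stand-ins for illustration only, and it assumes a POSIX system where symlinks can be created without special privileges. It only compares the path as written with the path after symlink resolution; it does not run venv or virtualenv.

```python
import os
import pathlib
import tempfile


def compare_paths(root: pathlib.Path) -> dict:
    """Build a toy version of the NFS layout and return both path forms."""
    # Stand-in for the server-specific mountpoint, e.g. /nfs/nfs-server1.
    server = root / "nfs" / "nfs-server1"
    server.mkdir(parents=True)

    # Stand-in for the canonical symlink, e.g. /project/my-project.
    project = root / "project" / "my-project"
    project.parent.mkdir(parents=True)
    project.symlink_to(server, target_is_directory=True)

    # A directory reached through the canonical symlink.
    venv_home = project / "venv" / "bin"
    venv_home.mkdir(parents=True)

    return {
        # The path as the user typed it: identical on every machine.
        "unresolved": str(venv_home),
        # The path after resolving symlinks: names one specific server.
        "resolved": os.path.realpath(venv_home),
    }


with tempfile.TemporaryDirectory() as tmp:
    paths = compare_paths(pathlib.Path(tmp))
    print(paths["unresolved"])
    print(paths["resolved"])
```

The unresolved form goes through project/my-project, while the resolved form names nfs-server1. A value recorded in the resolved form would therefore not be valid on a machine whose symlink points at a different server.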

Reproducer

Written in the form of a pytest test (with validation against venv as well):

import os
import pathlib
import sys
import subprocess
import typing

import pytest


def read_venv_config(venv_prefix: pathlib.Path) -> typing.Dict[str, str]:
    venv_cfg_path = venv_prefix / 'pyvenv.cfg'

    cfg = {}
    with venv_cfg_path.open('rt') as cfg_fh:
        for line in cfg_fh:
            name, _, value = line.partition('=')
            cfg[name.strip()] = value.strip()
    return cfg


@pytest.mark.parametrize("venv_impl", ["venv", "virtualenv"])
def test_symlink_python(tmp_path: pathlib.Path, venv_impl: str) -> None:
    py_link = tmp_path / "some-other-prefix"
    pathlib.Path(py_link).symlink_to(sys.base_prefix)

    dest_venv = tmp_path / "some-venv"

    py_bin = py_link / 'bin' / 'python'
    subprocess.run([py_bin, '-m', venv_impl, str(dest_venv)], check=True)

    cfg = read_venv_config(dest_venv)

    py_base_prefix = subprocess.check_output([py_bin, '-c', 'import sys; print(sys.base_prefix)'], text=True).strip()

    # Get the link of the py bin. Don't recursively resolve this (like pathlib.resolve would do)
    py_bin_link = os.readlink(dest_venv / 'bin' / 'python')
    assert py_bin_link == str(py_link / 'bin' / 'python')

    assert str(py_link) == py_base_prefix
    assert py_base_prefix + '/bin' == cfg['home']

The test passes with 20.25.0 and has failed since:

>       assert py_base_prefix + '/bin' == cfg['home']
E       AssertionError: assert '/tmp/pytest-of-pelson/pytest-401/test_symlink_python_virtualenv0/some-other-prefix/bin' == '/path/to/my/environment/bin'
E         - /path/to/my/environment/bin
E         + /tmp/pytest-of-pelson/pytest-401/test_symlink_python_virtualenv0/some-other-prefix/bin

It is worth noting that the implementation of #2686 has a bug: the symlink at dest_venv / 'bin' / 'python' points to sys.base_prefix / 'bin' / 'python', not to the resolved location of sys.base_prefix. The home value is therefore inconsistent with the symlink that virtualenv creates. The test validates this.
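The consistency check described above can also be run against an existing environment. The sketch below is a standalone helper, not part of virtualenv or venv; the function names read_home and home_matches_python_link are made up for illustration. It assumes a POSIX layout (bin/python) and that bin/python is a symlink with an absolute target.

```python
import os
import pathlib


def read_home(venv_prefix: pathlib.Path) -> str:
    """Return the 'home' value from pyvenv.cfg, or raise KeyError if absent."""
    with (venv_prefix / "pyvenv.cfg").open("rt") as cfg_fh:
        for line in cfg_fh:
            name, _, value = line.partition("=")
            if name.strip() == "home":
                return value.strip()
    raise KeyError("home")


def home_matches_python_link(venv_prefix: pathlib.Path) -> bool:
    """True when bin/python links into the directory recorded as 'home'."""
    # os.readlink reads a single link level; it does not resolve the chain.
    target = os.readlink(venv_prefix / "bin" / "python")
    return os.path.dirname(target) == read_home(venv_prefix)
```

Calling home_matches_python_link(pathlib.Path("some-venv")) on an affected environment would be expected to return False, since home holds the resolved path while the symlink target does not.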

Implications

There are two issues which #2686 closed: #2682 and #2684.

To be honest, I'm not sure what #2682 is asking for. Perhaps it is a request for behaviour that is different from venv and also from the standard library (wrt. sys.base_prefix) @mayeut.

For #2684, I also don't fully understand why this is a virtualenv issue (this is my problem, not a problem with the issue itself), and somebody who knows what the correct behaviour should be would need to chime in (perhaps @ofek?).
