fix: ensure filenames with spaces are excluded from targets #2748

Open: wants to merge 3 commits into base: main
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -91,6 +91,7 @@ Unreleased changes template.
Fixes [#2685](https://github.com/bazel-contrib/rules_python/issues/2685).
* (toolchains) Run the check on the Python interpreter in isolated mode, to ensure it's not affected by userland environment variables, such as `PYTHONPATH`.
* (toolchains) Ensure temporary `.pyc` and `.pyo` files are also excluded from the interpreters repository files.
+ * (pypi) Ensure files from external pypi dependencies with spaces in their names are excluded from globs.

{#v0-0-0-added}
### Added
5 changes: 4 additions & 1 deletion python/private/hermetic_runtime_repo_setup.bzl
@@ -76,8 +76,11 @@ def define_hermetic_runtime_toolchain_impl(
# tests for the standard libraries.
"lib/python{major}.{minor}*/**/test/**".format(**version_dict),
"lib/python{major}.{minor}*/**/tests/**".format(**version_dict),
- # During pyc creation, temp files named *.pyc.NNN are created
+ # During pyc and pyo creation, temp files named *.pyc.NNN and *.pyo.NNN are created.
"**/__pycache__/*.pyc.*",
+ "**/__pycache__/*.pyo.*",
+ # File names with spaces should also be ignored.
+ "**/* *",
] + glob_excludes.version_dependent_exclusions() + extra_files_glob_exclude,
Collaborator

I am thinking that it would be nice to have `glob_excludes.pyc_files()`, `glob_excludes.pyo_files()`, and `glob_excludes.files_with_spaces()`. Then the explanation for why each exclusion is needed can live next to its definition.

I would also love to exclude `.pyc` and `.pyc.*` in the hermetic toolchain definition, so that the exclude is the same regardless of whether we chmod the dir to be read-only or not.

What do you think?
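
A minimal sketch of what such helpers might look like. It is written as plain Python functions so that it runs standalone; in the actual `.bzl` file they would presumably be exposed through the existing `glob_excludes` struct. The function names come from the comment above, while the bodies and the `default_excludes` convenience function are assumptions assembled from the patterns used in this PR.

```python
# Hypothetical helpers: names follow the review comment, bodies are assumptions
# that reuse the glob patterns from this PR.

def pyc_files():
    # During pyc creation, temp files named *.pyc.NNN are written before
    # being renamed, so they can appear and disappear between builds.
    return ["**/__pycache__/*.pyc.*"]

def pyo_files():
    # Same reasoning as pyc_files(), for the *.pyo.NNN temp files.
    return ["**/__pycache__/*.pyo.*"]

def files_with_spaces():
    # Some runfiles libraries cannot handle file names containing spaces.
    return ["**/* *"]

def default_excludes():
    # Hypothetical convenience wrapper combining all three lists.
    return pyc_files() + pyo_files() + files_with_spaces()

print(default_excludes())
```

Keeping each pattern behind a named function puts the rationale in one place, which is the point the comment makes.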

Contributor Author

Yeah, I can add those methods.

Re: pyc, I think we'd only want the temp files excluded here? I'd originally excluded them in a different PR in a different part of the code (removed in this PR in favor of excluding them here). This change keeps the pyc exclusion in a single place.

If the pyc files are stable, then generally it would be preferable to keep them, no?

Collaborator

Hmmm. Yeah, if they are stable it is fine and we are already setting the vars to make them stable, so SGTM.

Collaborator

If memory serves, excluding pyc was what finally got rid of the Windows jobs getting "can't delete open file" errors. My theory was that two processes both went to import a module without a pyc. Both would start writing the pyc, but one would manage to finish writing and open the pyc, then the other process would try to overwrite it. But it couldn't, because the file was open.

The secondary issue is, as pycs are created, they show as additional files added to the target, thus invalidating it, which means anything downstream has to re-run. Eventually things will settle, but they'll only stay settled as long as the repo sticks around. A similar issue can happen with the timestamps: two processes might race and end up creating slightly different timestamped pycs, thus making it look like the file changed.
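
The temp files and the race described above are both consistent with a write-then-rename pattern, which is how CPython writes bytecode caches. A simplified sketch of that pattern follows; the name `write_atomic` and the file names are chosen for illustration, and this is not the interpreter's actual code.

```python
import os
import tempfile

def write_atomic(path, data):
    # Write to a uniquely named sibling file first (this is where names
    # such as foo.pyc.NNN come from) ...
    tmp_path = "{}.{}".format(path, id(path))
    try:
        with open(tmp_path, "wb") as f:
            f.write(data)
        # ... then move it over the destination in a single step.
        os.replace(tmp_path, path)
    except OSError:
        # Remove the temp file if anything failed, then re-raise.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "mod.cpython-311.pyc")
    write_atomic(target, b"bytecode")
    print(os.listdir(d))
```

If a glob is evaluated between the write and the rename, the temp file shows up as an input. On Windows the final rename can also fail while another process holds the destination open, which would fit the theory above.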

Collaborator

This is only true for the pyc generation happening at repository_rule execution time. I added `-B` a while ago.

When the packages are used in the regular py_binary and py_test rules I expect the pyc files to be created in the sandbox and not the repository_rule output dirs, but my claim should be checked.
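
For reference, `-B` is the standard CPython flag that stops the interpreter from writing `.pyc` files on import (it has the same effect as setting `PYTHONDONTWRITEBYTECODE`). A quick way to confirm its effect, using the current interpreter as a stand-in for the toolchain's; the helper name `writes_bytecode` is made up for this example.

```python
import subprocess
import sys

def writes_bytecode(extra_args):
    # Ask a child interpreter whether it would write .pyc files on import.
    result = subprocess.run(
        [sys.executable] + extra_args
        + ["-c", "import sys; print(sys.dont_write_bytecode)"],
        check=True,
        capture_output=True,
        text=True,
    )
    # sys.dont_write_bytecode is True when bytecode writing is disabled.
    return result.stdout.strip() == "False"

print(writes_bytecode(["-B"]))
```

This only checks the flag itself. Where the `.pyc` files land for `py_binary` and `py_test` actions is a separate question that, as noted above, still needs checking.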

Collaborator

Oh good. Yeah, that should prevent that issue, then. SGTM.

),
)
17 changes: 15 additions & 2 deletions python/private/pypi/whl_library_targets.bzl
@@ -103,7 +103,14 @@ def whl_library_targets(
for filegroup_name, glob in filegroups.items():
native.filegroup(
name = filegroup_name,
- srcs = native.glob(glob, allow_empty = True),
+ srcs = native.glob(
+ glob,
+ exclude = [
+ # File names with spaces should be excluded.
+ "**/* *",
Collaborator

I thought that our supported bazel versions support files with spaces, so why do we need to exclude them?

Collaborator

However, it seems that someone tried it and it did not work?

https://github.com/michael-christen/toolbox/pull/184/files

Contributor Author

Yeah, we also hit the issue with setuptools in runfiles, but with the Go runfiles library. Setuptools seems to contain files with spaces, so even if Bazel itself can handle them now, the runfiles libraries can't.
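
One plausible way a runfiles library can trip over such files, assuming it reads a manifest whose lines have the form `<runfiles path> <real path>` separated by a single space. The sketch below only illustrates the ambiguity; it is not the code of any particular runfiles library, and the file names are hypothetical.

```python
def parse_manifest_line(line):
    # Naive parser: split on the first space, as if neither path
    # could contain one.
    rlocation, _, real_path = line.partition(" ")
    return rlocation, real_path

# A path without spaces parses as intended.
ok = parse_manifest_line("pkg/data.txt /abs/pkg/data.txt")

# A path with a space is cut in the wrong place.
broken = parse_manifest_line("pkg/my file.txt /abs/pkg/my file.txt")

print(ok)
print(broken)
```

Excluding such files from the globs sidesteps the problem regardless of which runfiles library consumes the manifest.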

+ ],
+ allow_empty = True,
+ ),
visibility = ["//visibility:public"],
)

@@ -229,10 +236,13 @@ def whl_library_targets(
"**/*.py",
"**/*.pyc",
"**/*.pyc.*", # During pyc creation, temp files named *.pyc.NNNN are created
+ "**/*.pyo.*", # During pyo creation, temp files named *.pyo.NNNN are created
# RECORD is known to contain sha256 checksums of files which might include the checksums
# of generated files produced when wheels are installed. The file is ignored to avoid
# Bazel caching issues.
"**/*.dist-info/RECORD",
+ # File names with spaces should be excluded.
+ "**/* *",
] + glob_excludes.version_dependent_exclusions()
for item in data_exclude:
if item not in _data_exclude:
@@ -242,7 +252,10 @@ def whl_library_targets(
name = py_library_label,
srcs = native.glob(
["site-packages/**/*.py"],
- exclude = srcs_exclude,
+ exclude = srcs_exclude + [
+ # File names with spaces should be excluded.
+ "**/* *",
+ ],
# Empty sources are allowed to support wheels that don't have any
# pure-Python code, e.g. pymssql, which is written in Cython.
allow_empty = True,
5 changes: 2 additions & 3 deletions python/private/python_repository.bzl
@@ -193,9 +193,8 @@ def _python_repository_impl(rctx):
# Exclude them from the glob because otherwise between the first time and second time a python toolchain is used,"
# the definition of this filegroup will change, and depending rules will get invalidated."
# See https://github.com/bazel-contrib/rules_python/issues/1008 for unconditionally adding these to toolchains so we can stop ignoring them."
- # pyc* is ignored because pyc creation creates temporary .pyc.NNNN files
- "**/__pycache__/*.pyc*",
- "**/__pycache__/*.pyo*",
+ "**/__pycache__/*.pyc",
+ "**/__pycache__/*.pyo",
]

if "windows" in platform: