secret scan pre-commit crashing on Windows when doing big merges #1032

Open
@mherzberg

Environment

  • ggshield version: 1.34.0
  • Operating system (Linux, macOS, Windows): Windows
  • Operating system version: Windows 11 Pro 23H2
  • Python version: 3.12

Describe the bug

When the user is about to commit a merge with many files and long file names, ggshield secret scan pre-commit crashes with [WinError 206] The filename or extension is too long. This happens because ggshield builds and executes a single git command that lists every file in the merge, and that command line can be longer than what Windows accepts: CreateProcess limits the command line to 32,767 characters, which is roughly the same figure as the maximum path length with "long paths" enabled.

Steps to reproduce:

You can use the following bash script in a Git shell on Windows to reproduce the error:

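#!/usr/bin/env bash

# Work in a throwaway repository.
cd "$(mktemp -d)"
git init

# Write the given content to 200 files named with 200 zero-padded digits
# plus ".txt", then commit them.
function commit_files() {
    for i in $(seq -f "%0200g" 200)
    do
        echo "$1" > "$i.txt"
    done
    git add .
    git commit -m "msg" --no-verify
}

git checkout -b a
commit_files x

git checkout -b b
commit_files foo

git checkout a
commit_files bar

# Every file conflicts here; "git add ." stages them all as resolved.
git merge b
git add .

ggshield secret scan pre-commit --verbose --debug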

This script creates 200 files whose names are a bit longer than 200 characters each, changes every file on two branches, and finally merges the two branches so that all 200 files are part of the merge. The merge in this script is a bit artificial, but it demonstrates the issue that we observed during real merges.
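As a rough sanity check on the numbers, the following sketch estimates the length of the command line that results from passing all 200 file names to a single git ls-tree call. The ref name HEAD and the bare executable name git are assumptions for illustration; the real command uses whatever ref and git path ggshield resolves, so the exact length will differ slightly.

```python
import subprocess

# File names as produced by `seq -f "%0200g" 200` plus the ".txt" suffix:
# 200 zero-padded digits + 4 characters = 204 characters each.
files = [f"{i:0200d}.txt" for i in range(1, 201)]

# Assumption: "HEAD" stands in for the ref that ggshield actually passes.
cmd = ["git", "ls-tree", "-z", "HEAD"] + files

# list2cmdline builds the string that subprocess hands to CreateProcess on Windows.
cmd_len = len(subprocess.list2cmdline(cmd))

WINDOWS_CMDLINE_LIMIT = 32_767
print(f"command line length: {cmd_len}, limit: {WINDOWS_CMDLINE_LIMIT}")
print("exceeds limit:", cmd_len > WINDOWS_CMDLINE_LIMIT)
```

With 200 names of 204 characters each, the file arguments alone come to about 41,000 characters, which is already above the limit before the rest of the command is counted.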

Actual result:

Traceback (most recent call last):
  File "C:\Users\User\pipx\venvs\ggshield\Lib\site-packages\ggshield\cmd\utils\common_decorators.py", line 18, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\pipx\venvs\ggshield\Lib\site-packages\ggshield\cmd\secret\scan\precommit.py", line 85, in precommit_cmd
    commit = Commit.from_merge(ctx_obj.exclusion_regexes)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\pipx\venvs\ggshield\Lib\site-packages\ggshield\core\scan\commit.py", line 87, in from_merge
    shas_in_merge_branch = dict(
                           ^^^^^
  File "C:\Users\User\pipx\venvs\ggshield\Lib\site-packages\ggshield\core\scan\commit_utils.py", line 370, in get_file_sha_in_ref
    output = git(["ls-tree", "-z", ref] + files, cwd=cwd)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\pipx\venvs\ggshield\Lib\site-packages\ggshield\utils\git_shell.py", line 206, in git
    result = subprocess.run(
             ^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python312\Lib\subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python312\Lib\subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\User\AppData\Local\Programs\Python\Python312\Lib\subprocess.py", line 1538, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 206] The filename or extension is too long
                    20408:6608 [D] ggshield.__main__:56 scan exit_code=128

The full debug log can be found here: error.log

Expected result:

ggshield scans for secrets without crashing.

Possible solutions:

One could split the single ls-tree command into several smaller ones once the command line grows too long (the limit depends on the platform and the "long path" configuration). This might apply to other subprocess executions, too.
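A minimal sketch of that batching idea, not ggshield's actual code: chunk_args, ls_tree_in_batches and the MAX_CMD_CHARS threshold are hypothetical names and values chosen for illustration. The threshold is deliberately set below 32,767 to leave headroom for the full path of the git executable, which is not accounted for here.

```python
import subprocess
from typing import Iterator, List, Optional, Sequence

# Assumption: a conservative budget below the 32,767-character Windows limit.
MAX_CMD_CHARS = 30_000


def chunk_args(
    base_cmd: Sequence[str], files: Sequence[str], max_chars: int = MAX_CMD_CHARS
) -> Iterator[List[str]]:
    """Yield batches of files so that base_cmd + batch stays within max_chars."""
    base_len = len(subprocess.list2cmdline(list(base_cmd)))
    batch: List[str] = []
    batch_len = base_len
    for name in files:
        # +1 for the space that separates this argument from the previous one.
        arg_len = len(subprocess.list2cmdline([name])) + 1
        if batch and batch_len + arg_len > max_chars:
            yield batch
            batch, batch_len = [], base_len
        batch.append(name)
        batch_len += arg_len
    if batch:
        yield batch


def ls_tree_in_batches(ref: str, files: Sequence[str], cwd: Optional[str] = None) -> str:
    """Run `git ls-tree -z` once per batch and concatenate the outputs."""
    base = ["git", "ls-tree", "-z", ref]
    outputs = []
    for batch in chunk_args(base, files):
        result = subprocess.run(
            base + batch, cwd=cwd, check=True, capture_output=True, text=True
        )
        outputs.append(result.stdout)
    # With -z every entry is NUL-terminated, so the outputs can simply be joined.
    return "".join(outputs)
```

For the 200 files from the reproduction script, this would split the call into two ls-tree invocations instead of one. A file name that is longer than the budget on its own still ends up in a batch by itself, so that case would need separate handling.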

Labels: status:new (This issue needs to be reviewed), type:bug (Something isn't working)
