Trailing-space directory examples/data/hotel_invoices/extracted_invoice_json / breaks fresh clones on Windows #2703

@jluocsa

Description

Bug

The repository contains a directory whose name ends with a trailing space:

examples/data/hotel_invoices/extracted_invoice_json /

(Note the space between extracted_invoice_json and /.)

Git for Windows refuses to materialize such a path on NTFS. The Win32 file APIs silently strip trailing spaces from path components, so the directory would collide with the sibling file examples/data/hotel_invoices/extracted_invoice_json (no trailing space; that file exists as a regular file at that exact name), and Git's NTFS path protection (core.protectNTFS, on by default) rejects the entry as an invalid path. As a result, every fresh clone of this repository on Windows leaves 31 files reported as missing/modified, and several common workflows fail outright.

Affected paths

31 files under examples/data/hotel_invoices/extracted_invoice_json /, e.g.:

examples/data/hotel_invoices/extracted_invoice_json /20190119_002_extracted.json
examples/data/hotel_invoices/extracted_invoice_json /20190202_THE MADISON HAMBURG_001_extracted.json
examples/data/hotel_invoices/extracted_invoice_json /citadines-20190331_Invoice_extracted.json
...

These files are referenced from the tracked file examples/data/hotel_invoices/extracted_invoice_json (a directory listing exported as JSON).
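To see which tracked paths are affected, a minimal sketch like the following may help. It only reads the index (git ls-files), so it should work even on a Windows clone where checkout of those paths failed. The function name list_trailing_space_paths is hypothetical, not something in the repository, and it assumes no tracked path contains a newline.

```shell
# Hypothetical helper (not part of the repo): print every tracked path that
# has a directory or file component ending in a space.
list_trailing_space_paths() {
  # -z disables git's C-style quoting of unusual names; tr converts the NUL
  # separators back to newlines so grep can match line by line.
  # ' (/|$)' matches a space right before a slash or at the end of the path.
  git ls-files -z | tr '\0' '\n' | grep -E ' (/|$)'
}
```

For example, list_trailing_space_paths | wc -l would be expected to count the 31 files listed above, plus any other trailing-space paths elsewhere in the tree. grep exits with status 1 when nothing matches, which is the desired state after a fix.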

Repro on Windows

git clone https://github.com/openai/openai-cookbook.git
cd openai-cookbook
git status --short

Yields:

 M "examples/data/hotel_invoices/extracted_invoice_json /20190119_002_extracted.json"
 M "examples/data/hotel_invoices/extracted_invoice_json /20190202_THE MADISON HAMBURG_001_extracted.json"
 ...  (31 entries total)

Any subsequent git rebase, git read-tree --reset HEAD, or git checkout-index -f -a will fail with:

error: invalid path 'examples/data/hotel_invoices/extracted_invoice_json /20190119_002_extracted.json'
fatal: make_cache_entry failed for path '…'

…unless the user sets core.protectNTFS=false and core.longpaths=true locally, which most contributors will not know to do (and which switches off a path-safety check that Git enables by default).

Impact

  • New Windows contributors can't get a clean git status after cloning, which discourages contribution.
  • git rebase upstream/main is impossible without per-file --skip-worktree workarounds.
  • Cross-platform CI (if added later) would similarly fail on Windows runners.

The fragility may not be limited to Windows: trailing whitespace in names is easy to lose in other tools (for example, some Finder contexts on macOS trim it), though Windows is the platform where checkout consistently breaks.
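The per-file --skip-worktree workaround mentioned above can be scripted instead of applied by hand. This is a minimal sketch, assuming a POSIX shell such as Git Bash; skip_worktree_dir and unskip_worktree_dir are hypothetical helper names. It only tells Git to stop comparing those index entries against the (missing) working-tree files. Whether that is enough for a particular rebase is not guaranteed, and it should be undone once the directory is renamed upstream.

```shell
# Hypothetical helpers (not part of the repo) around
# `git update-index --skip-worktree`.
# $1 is a directory pathspec, e.g.
#   "examples/data/hotel_invoices/extracted_invoice_json /"  (note the space)

skip_worktree_dir() {
  # List index entries under the directory, NUL-separated so the spaces
  # survive, and set the skip-worktree bit on each of them.
  git ls-files -z -- "$1" | xargs -0 git update-index --skip-worktree --
}

unskip_worktree_dir() {
  # Undo: clear the skip-worktree bit so Git tracks the files normally again.
  git ls-files -z -- "$1" | xargs -0 git update-index --no-skip-worktree --
}
```

Usage would be skip_worktree_dir "examples/data/hotel_invoices/extracted_invoice_json /", after which git ls-files -t marks those entries with S (skip-worktree) instead of H.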

Suggested fix

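Rename the directory to remove the trailing space, then update the sibling JSON descriptor file. This is easiest to do from a Linux or macOS clone, where the directory actually exists in the working tree: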

git mv "examples/data/hotel_invoices/extracted_invoice_json " \
       "examples/data/hotel_invoices/extracted_invoice_json_files"

…and update any path references in:

  • examples/data/hotel_invoices/extracted_invoice_json (the JSON listing file)
  • Any notebooks under examples/object_oriented_agentic_approach/ or elsewhere that reference these paths
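To find the remaining references before opening the PR, a fixed-string git grep is one option. This is a sketch: find_old_path_refs is a hypothetical helper, and it assumes the references spell out the old path literally (with the space before the slash). Files that encode the path differently, such as URL-encoded %20 or JSON-escaped forms, would need additional patterns. It searches the working tree, so it is meant to be run in the same Linux/macOS clone used for the rename.

```shell
# Hypothetical helper (not part of the repo): list tracked files that still
# mention the old trailing-space directory name.
find_old_path_refs() {
  # -F: fixed string, no regex; -l: print only file names;
  # -I: skip binary files; -e: the pattern (starts with a letter, but -e
  # keeps it unambiguous).
  git grep -l -I -F -e 'extracted_invoice_json /'
}
```

After the rename and edits, empty output (git grep exits with status 1 when nothing matches) suggests no literal references remain.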

I can put up a PR doing the rename + reference update if maintainers ack the approach. Worth confirming first whether the directory was intentionally named that way (it looks like an accidental shell-quoting artifact during the original ingest).

Environment

  • Git for Windows 2.53.0.windows.1
  • Windows 11, NTFS
  • Reproducible on a vanilla clone — no local config required to reproduce the failure (only to work around it).

Related discovery context

Found while doing a routine git rebase upstream/main to refresh a fork. It took some debugging to realize that the "phantom deletion" of unrelated files was actually caused by this directory blocking index rebuilds on Windows. Posting so others don't go down the same rabbit hole.
