OpenNeuro CLI fails to upload from DataLad dataset with permission errors #3596

@amaiacc

Description

The OpenNeuro CLI fails to upload a dataset that has been converted to a DataLad dataset. The upload fails with permission errors when trying to copy files, even though the files are accessible and have correct permissions.

Error 1: Permission denied on annexed files

error: Uncaught (in worker "") (in promise) PermissionDenied: Permission denied (os error 13): 
copy '/path/to/dataset/sub-0505/func/sub-0505_task-ep2dboldPinelSpanish_run-2_events.tsv' 
-> '/tmp/openneuro_cli_openneuro.org_46d897301370d48e/ds006458/sub-0505/func/sub-0505_task-ep2dboldPinelSpanish_run-2_events.tsv'

Root cause: The file is a git-annex symlink:

$ ls -lh sub-0505/func/sub-0505_task-ep2dboldPinelSpanish_run-2_events.tsv
lrwxrwxrwx 1 user group 128 May 29 12:51 sub-0505/func/sub-0505_task-ep2dboldPinelSpanish_run-2_events.tsv 
-> ../../.git/annex/objects/v0/gm/MD5E-s4032--bb37893c477af2800a5caf55f491dc5a.tsv/MD5E-s4032--bb37893c477af2800a5caf55f491dc5a.tsv

The CLI's copyFile operation doesn't handle symlinks properly.

Workaround attempted: After running datalad unlock on text files, this specific error was resolved.
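To see which files would need datalad unlock before an upload, the locked files can be listed up front. The following is a minimal sketch in Python, not part of the OpenNeuro CLI; the function name find_annexed_symlinks is hypothetical, and the check assumes that locked annexed files are symlinks whose target contains .git/annex/objects/, as in the ls output above.

```python
import os
from pathlib import Path


def find_annexed_symlinks(dataset_root):
    """Return dataset-relative paths of symlinks that point into .git/annex/objects."""
    root = Path(dataset_root)
    annexed = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place keeps os.walk out of the repository internals.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = Path(dirpath) / name
            if not path.is_symlink():
                continue
            target = os.readlink(path)
            # Assumption: locked annexed files link to ".../.git/annex/objects/<key>/<key>".
            if ".git/annex/objects/" in target.replace(os.sep, "/"):
                annexed.append(path.relative_to(root).as_posix())
    return sorted(annexed)
```

Calling find_annexed_symlinks(".") from the dataset root returns the paths that are still locked; an empty list suggests that every file in the working tree is a regular file.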

Error 2: CLI tries to process .git/annex/objects/

After unlocking files, a new error appeared:

WARN Skipped file ".git/annex/objects/2V/kq/SHA256E-s11534688--f0281c13596fa1e9c0f7c0956eca9f40b6df62f383c1ca7a51dc12d567f4e09f.nii/SHA256E-s11534688--f0281c13596fa1e9c0f7c0956eca9f40b6df62f383c1ca7a51dc12d567f4e09f.nii"
error: Uncaught (in promise) Error: Unhandled error in child worker.

The CLI appears to be traversing into .git/annex/objects/ and attempting to process files there, which should not be uploaded.
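The expected traversal can be illustrated with a short Python sketch. It is not the CLI's actual code (the CLI runs on Deno); the name iter_upload_candidates and the excluded_dirs parameter are hypothetical and only show the idea of pruning .git before descending into it.

```python
import os


def iter_upload_candidates(dataset_root, excluded_dirs=(".git",)):
    """Yield dataset-relative file paths without descending into excluded directories."""
    for dirpath, dirnames, filenames in os.walk(dataset_root):
        # Editing dirnames in place tells os.walk not to visit those directories,
        # so .git/annex/objects is never listed at all.
        dirnames[:] = [d for d in dirnames if d not in excluded_dirs]
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            yield os.path.relpath(full, dataset_root).replace(os.sep, "/")
```

Pruning at the directory level, rather than filtering each file afterwards, also avoids walking the potentially very large annex object store.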

Analysis

The OpenNeuro CLI seems to have three issues when working with DataLad datasets:

  1. Does not handle git-annex symlinks: The copyFile operation should either:

    • Follow symlinks when copying (use readlink/realpath)
    • Detect locked (symlinked) files and require that they be unlocked before upload
  2. Does not properly exclude .git directory: The CLI should skip the entire .git directory, including .git/annex/objects/, but appears to be traversing into it.

  3. Hardcodes /tmp directory on Linux: The CLI (or underlying isomorphic-git library) ignores the TMPDIR environment variable and hardcodes /tmp on Linux systems. This prevents workarounds for permission or filesystem issues by using alternative temporary directories.
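The behavior requested in points 1 and 3 can be sketched as follows. This is an illustration in Python under stated assumptions, not the CLI's implementation: make_staging_dir and stage_file are hypothetical names, and the sketch assumes the staging step is a plain file copy into a temporary directory, as the error message in Error 1 suggests.

```python
import os
import shutil
import tempfile


def make_staging_dir(prefix="upload_staging_"):
    """Create a staging directory under $TMPDIR when set, else the platform default."""
    base = os.environ.get("TMPDIR") or tempfile.gettempdir()
    return tempfile.mkdtemp(prefix=prefix, dir=base)


def stage_file(src, staging_dir, relative_path):
    """Copy the content behind src (resolving symlinks) to staging_dir/relative_path."""
    # realpath resolves a git-annex symlink to the object file that holds the content.
    real_src = os.path.realpath(src)
    if not os.path.isfile(real_src):
        # A dangling symlink means the annexed content is not present locally.
        raise FileNotFoundError(f"content not available locally for {src}")
    dest = os.path.join(staging_dir, relative_path)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.copyfile(real_src, dest)
    return dest
```

With this approach the staged copy is a regular file, so read-only annex objects and symlinks in the source tree do not carry over into the staging area, and setting TMPDIR moves the staging area to another filesystem.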

Expected behavior

The upload should succeed, as OpenNeuro is designed to work with DataLad datasets.

How to reproduce

  1. Initial dataset state: Started with a non-DataLad BIDS dataset that was successfully uploaded to OpenNeuro
    openneuro upload --affirmDefaced . --verbose
  2. Converted to DataLad:
    datalad create --force .
  3. Configured OpenNeuro as remote: Following https://docs.openneuro.org/packages/openneuro-cli.html
  4. Made edits and saved:
    datalad save -m "commit message"
  5. Attempted upload:
    deno run --reload -A jsr:@openneuro/cli upload --dataset ds006458 . --affirmDefaced --verbose

Desktop

  • OpenNeuro CLI version: 4.38.3 (from jsr:@openneuro/cli)
  • Deno version: deno 2.5.3 (stable, release, x86_64-unknown-linux-gnu)
    v8 14.0.365.5-rusty
    typescript 5.9.2
  • OS: Linux
  • DataLad version: datalad 1.2.1
  • git-annex version: 10.20230408-g5b1e8ba77
  • DataLad configuration:
    datalad siblings output:
    .: here(+) [git]
    .: openneuro(-) [https://openneuro.org/git/0/ds006458 (git)]
    git remote -v output:
    openneuro https://openneuro.org/git/0/ds006458 (fetch)
    openneuro https://openneuro.org/git/0/ds006458 (push)

Phone

No response

Additional information

  • The dataset passes BIDS validation (with warnings only)
  • File permissions are correct (644 for files, 755 for directories)
  • The dataset is currently in draft/unpublished state on OpenNeuro.
  • datalad push also failed: it stopped at an authentication prompt (Username for 'https://openneuro.org/git/0/ds006458':). This suggests the OpenNeuro CLI is the intended method for uploading to draft datasets, but it does not handle DataLad datasets properly.
  • Dataset integrity verified with git annex fsck - all files exist locally and are accessible
