1 change: 1 addition & 0 deletions .gitattributes
@@ -0,0 +1 @@
assets/datasets/* filter=lfs diff=lfs merge=lfs -text
49 changes: 49 additions & 0 deletions .github/workflows/pr_file_check.yaml
@@ -0,0 +1,49 @@
name: Check for Large Files and Restricted Extensions

on:
  pull_request:
    branches:
      - main
    types: [opened, synchronize, reopened]

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

env:
  LLVM_VERSION: 16

jobs:
  check-files:
    name: Check file size and type
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          set-safe-directory: true
          fetch-depth: 1

      - name: Fetch base branch
        run: git fetch origin ${{ github.event.pull_request.base.ref }} --depth=1

      - name: Check for large files
        run: |
          MAX_SIZE=5M # Set max file size limit
Review comment (collaborator, PR author): I set this to 5M for now. Happy to change it to a more sensible number.

          # Select added files larger than MAX_SIZE. find compares sizes numerically,
          # whereas comparing `du -h` output in awk mixes units (K, M, G).
          LARGE_FILES=$(git diff --name-only --diff-filter=A origin/${{ github.event.pull_request.base.ref }} | xargs -r -d '\n' -I{} find ./{} -type f -size +"$MAX_SIZE")

          if [[ -n "$LARGE_FILES" ]]; then
            echo "❌ The following files exceed the allowed size of $MAX_SIZE:"
            echo "$LARGE_FILES"
            exit 1
          fi

      - name: Check for restricted file types
        run: |
          BLOCKED_EXTENSIONS="exe|zip|tar\.gz|bz2" # Add any forbidden extensions
          # `|| true` keeps the step from failing when grep finds no match
          BAD_FILES=$(git diff --name-only --diff-filter=A origin/${{ github.event.pull_request.base.ref }} | grep -E "\.($BLOCKED_EXTENSIONS)$" || true)
          if [[ -n "$BAD_FILES" ]]; then
            echo "❌ The following files have restricted extensions:"
            echo "$BAD_FILES"
            exit 1
          fi
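
The two checks above can be tried locally before pushing. The following is a minimal sketch under stated assumptions, not the workflow's own code: `large_files` and `blocked_files` are hypothetical helper names, the size limit is passed as a plain byte count instead of `5M`, and the extension list simply mirrors the one in the workflow.

```sh
#!/usr/bin/env bash
# Sketch only: helper names and the byte-count argument are illustrative assumptions.

# Read paths from stdin (one per line) and print those larger than $1 bytes.
large_files() {
  local max_bytes=$1 file size
  while IFS= read -r file; do
    [ -f "$file" ] || continue            # skip paths that are not regular files
    size=$(( $(wc -c < "$file") ))        # size in bytes as a plain integer
    if [ "$size" -gt "$max_bytes" ]; then
      printf '%s\n' "$file"
    fi
  done
}

# Read paths from stdin and print those ending in a blocked extension.
blocked_files() {
  grep -E '\.(exe|zip|tar\.gz|bz2)$' || true   # no match is not an error
}

# Possible usage inside a clone, assuming the base branch is origin/main:
#   git diff --name-only --diff-filter=A origin/main | large_files $((5 * 1024 * 1024))
#   git diff --name-only --diff-filter=A origin/main | blocked_files
```

Comparing integer byte counts avoids the unit mix-up that arises when human-readable sizes such as `4.0K` and `5M` are compared directly.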
67 changes: 67 additions & 0 deletions assets/README.md
@@ -0,0 +1,67 @@
# Git LFS Setup and Usage Guide

## Installation

Install Git LFS before using it in a repository. Follow the steps for your operating system.

### Ubuntu (Debian-based distributions)
```sh
sudo apt update
sudo apt install git-lfs
git lfs install
```

### AlmaLinux and ManyLinux
```sh
sudo dnf install git-lfs
git lfs install
```

### macOS
```sh
brew install git-lfs
git lfs install
```

## Tracking and committing large files

1. Initialize Git LFS in your repository:

`git lfs install`

2. Track a file pattern, a directory, or an individual file:

`git lfs track "assets/*"`, where `assets` is a directory containing large files. This pattern matches files directly inside `assets`; use `assets/**` to include subdirectories as well.

3. Commit the changes to `.gitattributes`:

`git add .gitattributes && git commit -m "Track large files with Git LFS"`

4. Add and commit the large files:

`git add assets/largefile.zip && git commit -m "Add large file"`

5. Push to remote:

`git push origin branch_name`
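
Before adding a large file in step 4, it can be useful to confirm that the pattern in `.gitattributes` actually covers its path. A minimal sketch, assuming it is run inside the repository: `uses_lfs_filter` is a hypothetical helper name, and it relies only on the standard `git check-attr` command.

```sh
#!/usr/bin/env bash
# Sketch only: uses_lfs_filter is an illustrative helper, not part of Git or Git LFS.

# Succeed if .gitattributes routes the given path through the "lfs" filter.
uses_lfs_filter() {
  # git check-attr prints "<path>: filter: <value>"; keep only the value.
  [ "$(git check-attr filter -- "$1" | sed 's/.*: filter: //')" = "lfs" ]
}

# Possible usage:
#   uses_lfs_filter assets/largefile.zip && echo "will be stored in LFS"
```

The path does not need to exist yet, because `git check-attr` evaluates the patterns against the name alone.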

## Cloning and fetching large files

1. Clone a repository that uses Git LFS:

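`git clone https://github.com/username/repository.git`. If Git LFS is installed and initialized, cloning also downloads the large files. If it was not set up at clone time, the working tree contains only small pointer files in place of the large files; run `git lfs install` and then `git lfs pull` to fetch the actual content.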

2. Fetch large files for an existing repository:

`git lfs pull`
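
To tell whether a checked-out file is real content or still a pointer, its first line can be inspected: Git LFS pointer files begin with a `version` line naming the pointer spec. A minimal sketch, where `is_lfs_pointer` is a hypothetical helper name:

```sh
#!/usr/bin/env bash
# Sketch only: is_lfs_pointer is an illustrative helper, not a Git LFS command.

# Succeed if the file exists and its first line is the LFS pointer version line.
is_lfs_pointer() {
  [ -f "$1" ] || return 1
  head -n 1 -- "$1" | grep -qxF 'version https://git-lfs.github.com/spec/v1'
}

# Possible usage:
#   is_lfs_pointer assets/largefile.zip && echo "pointer only; run: git lfs pull"
```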

## Check Git LFS status

To check which files are tracked by Git LFS:

`git lfs ls-files`

## Removing a file from LFS

To remove a file from LFS, unstage it from the index:

`git rm --cached assets/largefile.zip`, then commit and push.

Once the file is removed, delete the matching tracking pattern from `.gitattributes`, either by hand or with `git lfs untrack "<pattern>"`, and commit that change too.