If we use BFG to strip all blobs larger than 1M from the history, the openpipeline repo shrinks from about 200 MiB to about 44 MiB (200.99 MiB received when cloning the mirror vs. 44.70 MiB written when pushing the cleaned repo). Setting a lower threshold would probably shrink it further. @DriesSchaumont WDYT?
```
$ git clone --mirror git@github.com:openpipelines-bio/openpipeline.git lfs_test.git
Cloning into bare repository 'lfs_test.git'...
remote: Enumerating objects: 397073, done.
remote: Counting objects: 100% (6019/6019), done.
remote: Compressing objects: 100% (2307/2307), done.
remote: Total 397073 (delta 3644), reused 5873 (delta 3512), pack-reused 391054
Receiving objects: 100% (397073/397073), 200.99 MiB | 5.97 MiB/s, done.
Resolving deltas: 100% (269042/269042), done.
```
```
$ java -jar ~/Downloads/bfg-1.14.0.jar --strip-blobs-bigger-than 1M lfs_test.git
Using repo : /home/rcannood/workspace/openpipelines-bio/lfs_test.git
This repo has been processed by The BFG before! Will prune repo before proceeding - to avoid unnecessary cleaning work on unused objects...
Completed prune of old objects - will now proceed with the main job!
Scanning packfile for large blobs: 1588292
Scanning packfile for large blobs completed in 6,443 ms.
Found 6 blob ids for large blobs - biggest=14395908 smallest=1521437
Total size (unpacked)=47515450
Found 443 objects to protect
Found 512 commit-pointing refs : HEAD, refs/heads/481-add-leiden-clustering-to-scvi-pipeline, refs/heads/590-clusterleiden-config-contains-incorrect-markdown-references, ...
Found 4 tag-pointing refs : refs/tags/0.3.0, refs/tags/0.3.1, refs/tags/0.4.0, refs/tags/0.4.1
Protected commits
-----------------
These are your protected commits, and so their contents will NOT be altered:
* commit 5fb2a9e0 (protected by 'HEAD')
Cleaning
--------
Found 4459 commits
Cleaning commits: 100% (4459/4459)
Cleaning commits completed in 3,003 ms.
Updating 156 Refs
-----------------
Ref Before After
------------------------------------------------------------------------------------------------
refs/heads/481-add-leiden-clustering-to-scvi-pipeline | 700bffd6 | 6d0b9eec
refs/heads/590-clusterleiden-config-contains-incorrect-markdown-references | 772769ee | 7abac021
refs/heads/604-use-the-viash-dependencies-config-value-for-workflows | 843009e8 | 8b7b78ba
refs/heads/concat_dtypes | c8f1e5f8 | e92cbea4
refs/heads/feature/ataq-demux | 5dcebba7 | 1666af0f
refs/heads/feature/ataq-qc | dde357ff | 98d64cbd
refs/heads/feature/scpoli_implementation | b17c3a84 | 3ee6bc23
refs/heads/increase_ci_memory | 1464e7aa | 9b6af876
refs/heads/integration_build | b225d951 | d1eaab7b
refs/heads/main | 5fb2a9e0 | 56ac0431
refs/heads/main_build | 8a9894a6 | cc0001cd
refs/heads/main_build_datasets_schema | 5022c403 | 901839ca
refs/heads/more_memory_tests | fe5188fa | 7608da95
refs/heads/release | 98678513 | 0594ac36
refs/heads/review_cellxgene | f881710c | 475cecfc
...
Updating references: 100% (156/156)
...Ref update completed in 38 ms.
Commit Tree-Dirt History
------------------------
Earliest Latest
| |
..............................................DDDDDDDDDmmDmm
D = dirty commits (file tree fixed)
m = modified commits (commit message or parents changed)
. = clean commits (no changes to file tree)
Before After
-------------------------------------------
First modified commit | 6455c1d6 | fae7b4ab
Last dirty commit | e27f9172 | 3ffb155c
Deleted files
-------------
Filename Git id
--------------------------------------------------------------------------------------------
cellranger-tiny-bcl-1.2.0.tar.gz | 4b3e7995 (13.4 MB)
cl-base.obo | af96cc47 (1.5 MB)
matrix.mtx.gz | 9e469be2 (4.0 MB)
pbmc_1k_protein_v3_filtered_feature_bc_matrix.h5 | eade8772 (5.2 MB)
pbmc_1k_protein_v3_filtered_feature_bc_matrix.h5ad | 145b611c (13.7 MB)
pbmc_1k_protein_v3_filtered_feature_bc_matrix.norm.hvg.pca.nn.umap.h5ad | de2901dd (7.6 MB)
In total, 22327 object ids were changed. Full details are logged here:
/home/rcannood/workspace/openpipelines-bio/lfs_test.git.bfg-report/2023-11-24/14-40-05
BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive
```
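Before rerunning BFG with a lower threshold, it may help to preview which blobs that threshold would catch. The sketch below uses only standard git plumbing (`git rev-list --objects --all` piped into `git cat-file --batch-check`); the function name `list_large_blobs` is hypothetical, and the threshold is given in bytes rather than in BFG's `1M` notation. Because it walks every ref, it also lists large blobs in the current HEAD, which BFG protects by default.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BFG or git): list every blob reachable
# from any ref whose size exceeds a threshold in bytes, largest first.
# Output columns (tab-separated): size in bytes, blob id, path.
# Run it inside the repository, e.g. inside lfs_test.git.
list_large_blobs() {
  local threshold="$1"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk -v min="$threshold" '
      $1 == "blob" && $3 > min {
        size = $3; id = $2
        # Drop the first three fields so paths containing spaces survive.
        sub(/^[^ ]+ [^ ]+ [^ ]+ ?/, "")
        print size "\t" id "\t" $0
      }' |
    sort -n -r
}
```

For example, running `list_large_blobs 524288` inside `lfs_test.git` would list the candidates for a 512 KiB threshold.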
```
$ cd lfs_test.git
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
Enumerating objects: 397073, done.
Counting objects: 100% (397073/397073), done.
Delta compression using up to 32 threads
Compressing objects: 100% (379869/379869), done.
Writing objects: 100% (397073/397073), done.
Selecting bitmap commits: 4368, done.
Building bitmaps: 100% (148/148), done.
Total 397073 (delta 268875), reused 124073 (delta 0), pack-reused 0
```
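The size reduction can also be checked locally, before pushing rewritten history: `git count-objects -v -H` reports `size-pack`, the disk space used by the pack files. A minimal sketch, where `pack_size` is a hypothetical helper name; running it in the fresh mirror and again after `git gc` gives a before/after comparison that does not depend on transfer sizes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the human-readable size of the repository's
# pack files, taken from the size-pack line of `git count-objects -v -H`.
pack_size() {
  git count-objects -v -H | awk -F': ' '$1 == "size-pack" { print $2 }'
}
```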
```
$ git push
Enumerating objects: 397073, done.
Writing objects: 100% (397073/397073), 44.70 MiB | 3.69 MiB/s, done.
Total 397073 (delta 0), reused 0 (delta 0), pack-reused 397073
remote: Resolving deltas: 100% (268875/268875), done.
```