Calculating the size of all LFS files in a repo
I wanted to know how large the deepseek-ai/DeepSeek-V3-Base repo on Hugging Face was without actually downloading all of the files.
With some help from Claude, here’s the recipe that worked.
First, clone the repo without having Git LFS download the files:

    GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/deepseek-ai/DeepSeek-V3-Base
    cd DeepSeek-V3-Base

The git lfs ls-files -s command lists the files along with their sizes:

    git lfs ls-files -s

    3f4e5fcec2 - model-00001-of-000163.safetensors (5.2 GB)
    4fb0c2abdd - model-00002-of-000163.safetensors (4.3 GB)
    ...

Then I used this awk recipe to add up those numbers:

    git lfs ls-files -s | grep -o '[0-9.]\+ GB' | awk '{sum += $1} END {print sum " GB"}'

Output:

    687.9 GB

Since this only counts lines with GB in them, I asked Claude for a longer one-liner for handling other units as well. This appears to work but I haven’t verified it in depth yet, so use with caution:

    git lfs ls-files -s | grep -o '[0-9.]\+ [KMGT]B' | awk '{
      split($0, a, " ");
      size=a[1];
      unit=a[2];
      if(unit=="KB") size*=1024;
      else if(unit=="MB") size*=1024^2;
      else if(unit=="GB") size*=1024^3;
      else if(unit=="TB") size*=1024^4;
      total+=size
    } END {
      if(total<1024) print total " B";
      else if(total<1024^2) print total/1024 " KB";
      else if(total<1024^3) print total/1024^2 " MB";
      else if(total<1024^4) print total/1024^3 " GB";
      else print total/1024^4 " TB"
    }'
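
Both recipes grep for a size anywhere on the line, so a filename that happens to contain something like "3 GB" would be counted too, and entries smaller than a kilobyte are skipped. Here is a rough sketch of a stricter variant. The function name sum_lfs_sizes is made up for this post, and it rests on two assumptions I have not checked against the git-lfs source: that every line ends with the size in parentheses, and that tiny files are printed with a bare B unit. The unit base is a parameter because I have not confirmed whether git-lfs reports 1024-based or 1000-based units; it defaults to 1024 to match the recipe above.

```shell
#!/bin/sh
# Hypothetical helper: sum the sizes printed by `git lfs ls-files -s`.
# Reads that output on stdin and prints the total in bytes.
# $1 is the unit base (default 1024, matching the recipe above;
# pass 1000 if your git-lfs turns out to report SI units).
sum_lfs_sizes() {
  awk -v base="${1:-1024}" '
    # Only use a size that closes the line, e.g. "(5.2 GB)" or "(512 B)".
    match($0, /\([0-9.]+ [KMGT]?B\)$/) {
      # Drop the surrounding parentheses, then split number and unit.
      s = substr($0, RSTART + 1, RLENGTH - 2)
      split(s, a, " ")
      size = a[1]; unit = a[2]
      if      (unit == "KB") size *= base
      else if (unit == "MB") size *= base ^ 2
      else if (unit == "GB") size *= base ^ 3
      else if (unit == "TB") size *= base ^ 4
      total += size
    }
    END { printf "%.0f\n", total }
  '
}
```

Run it as git lfs ls-files -s | sum_lfs_sizes inside the clone. Because the listed sizes are already rounded to one decimal place, the byte total is still only an estimate.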