When I took over managing my personal documents this year, I wanted to start versioning them. Many of them are large binary files: PDFs, images, and office documents. Using my own archive as an example, this article shows how to manage such files with Git Large File Storage (Git LFS).
Context
Before describing my solution for tracking changes to these files, let me describe the problem I was trying to solve. This year I copied over the digital documents that my mom had archived throughout my life (thanks a lot, Mom!). I developed the following hierarchy to organize them into logical categories.
Here’s the resulting folder structure:
.
├── README.md
└── Tomáš Ljutenko
    ├── 01 Documents
    │   ├── National ID Card
    │   ├── Passport
    │   ├── Insurance
    │   ├── Birth Certificate
    │   └── Driver's License
    ├── 02 Education
    │   ├── 01 Primary School
    │   ├── 02 Secondary School
    │   ├── 03 University
    │   ├── 04 Driving School
    │   └── 05 Certificates
    ├── 03 Career
    │   ├── CV
    │   ├── Work Contracts
    │   └── Business License
    ├── 04 Finances
    ├── 05 Health
    │   ├── Surgery
    │   ├── Covid Tests, Vaccination
    │   ├── Immunology
    │   ├── Dermatology
    │   ├── Neurology
    │   ├── Ophthalmology
    │   ├── Orthopedics and Rehab
    │   ├── Psychology
    │   ├── Nutrition
    │   ├── Weight, Measurements
    │   ├── General Medical Card
    │   └── Dental
    └── 06 Housing
        ├── 1 Dormitory
        ├── 2 Summer Sublet 2022
        ├── 3 Current Sublet
        ├── Brno districts.docx
        ├── Brno center map.docx
        └── Dormitory-arrangements-Brno.docx
Each main category contains relevant subcategories. For example, the Education section includes separate folders for primary school, secondary school, university, driving school, and certificates.
The Challenge: Version Controlling Large Files
When I tried to put this document structure under version control with Git, I ran into a significant obstacle. After initializing the repository and making the initial commit, pushing to the remote repository failed with an error:
$ git push -u origin main
[...]
error: RPC failed; HTTP 413 curl 22 The requested URL returned error: 413 Request Entity Too Large
fatal: The remote end hung up unexpectedly
This error occurred because the push tried to send the large files to the server in a single HTTP request. I use Cloudflare as a proxy in front of my Git server, and its free tier accepts request bodies of at most 100 MB.
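Before changing anything, it can help to see which files are the likely culprits. The sketch below defines a hypothetical helper, list_large_files (my own name, not a standard tool), that uses find to list files in the working tree above a size threshold, defaulting to the 100 MB limit mentioned above. Keep in mind that the proxy limit applies to a request as a whole, so a push of many moderately sized files can exceed it even when no single file does; this helper only flags individual large files.

```shell
#!/bin/sh
# Hypothetical helper: list files under a directory that are larger than a
# size threshold, skipping Git's own .git directory.
#   $1  directory to scan (default: current directory)
#   $2  find(1) size threshold (default: 100M, i.e. more than 100 MiB)
list_large_files() {
  dir="${1:-.}"
  limit="${2:-100M}"
  # -prune stops find from descending into .git; every other regular file
  # bigger than the threshold is printed, one path per line.
  find "$dir" -path "$dir/.git" -prune -o -type f -size +"$limit" -print
}
```

For example, running list_large_files . 50M from the repository root lists everything above 50 MiB, which leaves some headroom below the proxy limit.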
Git LFS to the rescue
Git Large File Storage (Git LFS) is a Git extension that keeps large files in separate storage and commits only small text pointer files that reference them. Here is how to set it up:
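- First, install the Git LFS package (for example, from your system's package manager), then register its filters with Git (once per user account):
git lfs install
- Next, configure which file types Git LFS should handle. In my case, I needed to track various document and image formats: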
git lfs track "*.pdf"
git lfs track "*.PDF"
git lfs track "*.docx"
git lfs track "*.xlsx"
# ... and so on for other formats
These commands create or update a .gitattributes file, which tells Git which files should be managed by LFS. Commit this file together with the documents so that every clone applies the same rules. Here’s my complete configuration:
*.xlsx filter=lfs diff=lfs merge=lfs -text
*.xls filter=lfs diff=lfs merge=lfs -text
*.pdf filter=lfs diff=lfs merge=lfs -text
*.PDF filter=lfs diff=lfs merge=lfs -text
*.docx filter=lfs diff=lfs merge=lfs -text
*.csv filter=lfs diff=lfs merge=lfs -text
*.doc filter=lfs diff=lfs merge=lfs -text
*.jpg filter=lfs diff=lfs merge=lfs -text
*.JPG filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text
*.PNG filter=lfs diff=lfs merge=lfs -text
After setting up Git LFS, I could commit and push all files successfully. Git itself now stores only small pointer files for these documents, while each file's content is uploaded to the LFS storage in its own request, so the push no longer has to fit into a single request body. I still keep the full version history of every document.
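Two checks are worth doing around this step. First, files committed before the LFS rules existed are still stored as ordinary Git objects in those commits, so pushing that history can hit the same limit; git lfs migrate import can rewrite such commits, and its --include option takes a comma-separated list of patterns. Second, it is easy to miss an extension variant such as .jpeg or .DOCX, neither of which the configuration above lists. The sketch below assumes a POSIX shell and defines two hypothetical helpers (the names are mine): lfs_include_patterns turns the filter=lfs lines of .gitattributes into such a list, and list_non_lfs_files uses git check-attr to show which files Git would store without LFS.

```shell
#!/bin/sh
# Hypothetical helper 1: print every pattern that .gitattributes routes
# through LFS as one comma-separated list (e.g. "*.pdf,*.PDF").
# Assumes one pattern per line, no spaces inside a pattern; skips comments.
lfs_include_patterns() {
  attrs="${1:-.gitattributes}"
  awk '!/^#/ && /filter=lfs/ { printf "%s%s", sep, $1; sep = "," }
       END { print "" }' "$attrs"
}

# Hypothetical helper 2: list tracked and untracked (not ignored) files whose
# "filter" attribute is not "lfs", i.e. files Git would store as regular
# objects. Run it from the repository root. It only reads attributes, so it
# works before anything is committed. core.quotePath=false keeps non-ASCII
# paths (such as "Tomáš Ljutenko/...") readable instead of escaped.
list_non_lfs_files() {
  git -c core.quotePath=false ls-files --cached --others --exclude-standard \
    | git -c core.quotePath=false check-attr --stdin filter \
    | grep -v ': filter: lfs$' \
    | sed 's/: filter: [^:]*$//'
}

# Example usage. The migrate step rewrites history, so the branch must be
# force-pushed and any other clones re-synchronized afterwards:
#   list_non_lfs_files
#   git lfs migrate import --everything --include="$(lfs_include_patterns)"
#   git push --force -u origin main
```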
Conclusion
Git LFS provides an effective solution for version controlling large binary files within Git repositories. This approach has allowed me to maintain a well-organized digital archive of important documents while preserving their version history.