When syncing with a huge git annex repository on usb disk, my small laptop partition runs out of inodes.
Any workaround for this?
Use a bare repository? Some git-annex commands are not supported in a bare repository, which makes managing (particularly adding) files difficult.
Use a loop file partition with a tiny block size and a large number of inodes? Operations on a huge git repository are already slow; on a loop file partition they would be slower still.
Maybe shrink the partition to make room for a dedicated git-annex partition?
Any opinions?
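For anyone hitting this, a quick first step is to confirm it really is inode exhaustion rather than lack of space (paths below are examples; `df -i` is standard coreutils):

```shell
# Check free inodes vs. free space on the filesystem holding the checkout.
# IUse% at 100% while space remains free means inode exhaustion.
df -i .
df -h .

# Count how many inodes a tree uses (regular files + dirs + symlinks),
# staying on one filesystem with -xdev.
find . -xdev | wc -l
```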
Define "huge"? Checking out a git repository necessarily requires one inode per file in the repository, plus a smaller quantity for the things in .git. A git-annex repository is much the same as any other git repository.
Even when I make a really tiny 100 MB ext4 filesystem, it defaults to 25000 inodes, which would be enough to contain a checkout of my second-largest git-annex repository.
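That default follows from mke2fs's usage types: for filesystems this small, the "small" profile in `/etc/mke2fs.conf` typically sets a bytes-per-inode ratio of 4096 (the exact values depend on your distro's mke2fs.conf, so treat this as an assumption worth checking):

```shell
# 100 MB filesystem with the "small" profile's inode_ratio of 4096
fs_bytes=$((100 * 1024 * 1024))
inode_ratio=4096
echo $((fs_bytes / inode_ratio))   # prints 25600, i.e. roughly the 25000 above
```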
Anyway, using git branches seems like a reasonable workaround, to the extent I understand your problem. Make a branch with the files in it you want to have available on the small drive, or check out an empty branch on the small drive and `git annex add` files in there. You can merge the branch back into your master branch on the large drive.

TL;DR: don't play with `inode_ratio` when creating a filesystem. You will certainly get more storage space, but at most 1.5%. The numerous symlinks maintained by git-annex will consume a lot of inodes, and you don't want your filesystem to fail to create any new file in the middle of a big `git annex add`, or worse, a `git repack`, `git gc`, or `git repair`, right?

Git-annex heavily uses symlinks and directories, which consume several inodes per file.
I ran out of inodes on two of my disks, with different git-annex repositories.
Joeyh wrote:
I believe that in `.git/annex/objects`, the structure consumes two inodes: one for the directory and one for the actual file storing the data. Is that right?

So, when a file is checked out and managed by git-annex, it consumes 3 inodes, right? Plus possibly many inodes used temporarily during some git operations, e.g. `git repair`, whose man page says: "Since this command unpacks all packs in the repository, you may want to run git gc afterwards."

So, at least 4 inodes per file?
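The per-file count can be made concrete with a throwaway directory that mimics the layout (the key name below is made up, and a real annex adds two extra levels of shared hash directories, so this is a simplified sketch):

```shell
# One annexed file, simulated: worktree symlink + key directory + content file
tmp=$(mktemp -d)
key=SHA256E-s42--0000
mkdir -p "$tmp/.git/annex/objects/$key"                 # inode 1: key directory
echo data > "$tmp/.git/annex/objects/$key/$key"         # inode 2: content file
ln -s ".git/annex/objects/$key/$key" "$tmp/photo.jpg"   # inode 3: symlink

find "$tmp/.git/annex/objects/$key" "$tmp/photo.jpg" | wc -l   # prints 3
rm -rf "$tmp"
```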
Optimising space by reducing inodes is no longer worth the trouble.
External disks were formatted years ago maximizing disk space (for details see http://serverfault.com/a/523210 ). The assumption was that nearly all files would be larger than 4 megabytes, so the ext2 `inode_ratio` was set to 4194304. This assumption held, and all went well until git-annex entered the game.

In the case of a git-annex repository, if each actual file needs at least 4 inodes, `inode_ratio` must be at most the average file size divided by 4. I considered an inode ratio of 64k. With most files well above 1 megabyte, that leaves room for much more than 16 inodes per file for housekeeping. But you don't know in advance how usage will evolve: perhaps you will need one or more extra git clones of the same annexed data, especially in a recovery situation, which will quickly multiply the number of consumed inodes.
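As a sanity check on those ratios, here is the arithmetic in shell (the 2 TiB disk size is illustrative, not from the original post):

```shell
disk_bytes=$((2 * 1024 * 1024 * 1024 * 1024))   # 2 TiB disk, for illustration

# One inode per 4 MiB (the old ratio): few inodes, fine only for big files
echo $((disk_bytes / 4194304))   # prints 524288

# One inode per 64 KiB (the ratio considered here)
echo $((disk_bytes / 65536))     # prints 33554432

# With at least 4 inodes per annexed file, the break-even average file size:
echo $((4 * 65536))              # prints 262144 (256 KiB)
```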
mkfs.ext4 chooses `inode_ratio=16384` by itself. Filesystem capacity is 1.5% smaller than with the minimum number of inodes. I'll just stick with that.

(Yes, `-m 0` removes the 5% of space reserved for root, and yes, filesystem performance drops dramatically when the filesystem fills beyond 90%, then 95%. That parameter can be tuned with tune2fs at any time. If in doubt, leave it out.)

An observation I've just made while cleaning up after a surprising (to me back then, not to the reader seeing the context here) "disk full" condition on a half-empty disk:
On my system it was not git-annex's objects and symlinks (a mere several thousand inodes) but git itself (with millions of them) that filled up the filesystem during `git gc`. Running `git prune` helped me get out of the immediate situation, but any `git gc` filled the disk up again. This appears to be a general problem when repacking large git repositories in some states (producing 8M files as intermediates). In the end, I resorted to moving .git/objects to a dedicated btrfs partition (which doesn't suffer from inode exhaustion).

An alternative could be forgetting history; this should also help bring down the number of git objects.
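For reference, the object-store move can be done with a symlink, which git follows. A minimal sketch, using temporary directories as stand-ins for the real repository and the btrfs mount point (back up first on a real repository):

```shell
set -e
repo=$(mktemp -d)    # stand-in for the real repository path
dest=$(mktemp -d)    # stand-in for the btrfs mount point
git -C "$repo" init -q
git -C "$repo" -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m 'placeholder history'

# Move the object store and leave a symlink in its place; git follows it.
mv "$repo/.git/objects" "$dest/objects"
ln -s "$dest/objects" "$repo/.git/objects"

git -C "$repo" fsck   # the repository still works through the symlink
```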