One reason to use git-annex is to save disk space by tossing files you don't use that often.
I can find big files in the repository with git annex find --largerthan=100M, but is there a way to find large directories? In an ordinary filesystem I'd use "du -h" with a maxdepth to get an idea of what parts of a directory are taking up my disk space, but obviously that won't work with git annex because all the content is in .git/annex. Any ideas?
(I can get a listing of file sizes in a directory with the handy -L flag of ls -- "ls -lL" shows me the sizes of the link targets -- but that won't summarize all the sizes of subdirectories. Unless my ls-fu is just weak.)
du(1) also accepts the -L option, so if, for example, you want to find which directories occupy the most storage:
And if you want to find the biggest files in a directory tree:
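A rough sketch of both cases, assuming a du that supports -L, -k and -d (GNU coreutils does) and that you run it from the top of the working tree. The function names biggest_dirs and biggest_files are made up here for illustration, and only content that is actually present locally gets counted:

```shell
# Hypothetical helpers, not part of git-annex.

# Largest first-level directories, following annex symlinks (-L).
# $1 = directory (default: .), $2 = number of lines (default: 20).
# Errors from dangling symlinks (content not present) are discarded.
biggest_dirs() {
    du -L -k -d 1 "${1:-.}" 2>/dev/null | sort -rn | head -n "${2:-20}"
}

# Largest individual files in a tree, skipping .git itself.
# find -L makes -type f match symlinks that point at regular files;
# dangling symlinks stay "type l" and are skipped.
biggest_files() {
    find -L "${1:-.}" -name .git -prune -o -type f -exec du -L -k {} + |
        sort -rn | head -n "${2:-20}"
}
```

Sizes are printed in KiB so that a plain sort -rn can order them.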
I've been thinking about writing a sort of git-annex du. I'm surprised to find someone else looking for such a thing. While "du -L" will tell you how much space is used by files you actually have, I was interested in knowing (approximately) how much space would be used if you were to git-annex get everything you don't yet have.
There are many options and variations to think about, such as:
All of the backends so far seem to store the size of the files in the filename, so my plan was to read it out of the links. If anybody has a better idea about how to get the sizes of annexed files or options that would be handy for a git-annex du, let me know. I'll see if I can get the start of something useful this weekend. I'll post here when I have something to share.
I'm also open to suggestions for the executable name. Right now I'm thinking "gadu" for git-annex disk usage.
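The idea of reading sizes out of the links could be sketched roughly like this. It assumes the key is the last path component of the symlink target and carries the size as a -s<bytes> field right after the backend name (as in SHA256E-s1048576--...); keys without a size field are simply ignored. The names annex_link_size and annex_du are hypothetical, not an existing tool:

```shell
# Hypothetical sketch, not the actual "gadu" tool.

# Print the size in bytes encoded in an annex symlink target,
# or nothing if the target does not look like a sized annex key.
annex_link_size() {
    key=${1##*/}    # last path component, e.g. SHA256E-s1048576--<hash>.ext
    printf '%s\n' "$key" |
        sed -n 's/^[A-Z0-9_]*-s\([0-9][0-9]*\)-.*/\1/p'
}

# Sum the sizes recorded in all annex symlinks below $1 (default: .),
# whether or not the content is present locally.
# Assumes file names do not contain newlines.
annex_du() {
    find "${1:-.}" -name .git -prune -o -type l -print |
        while IFS= read -r link; do
            size=$(annex_link_size "$(readlink "$link")")
            [ -n "$size" ] && printf '%s\n' "$size"
        done |
        awk '{ total += $1 } END { printf "%.0f\n", total }'
}
```

Running annex_du on a subdirectory prints a byte total for that subtree, which approximates what a git annex get of everything in it would occupy.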
Steve, that would be a very useful utility. I've been thinking of such a tool, but haven't gotten around to writing it yet. It would be practical to have before copying big/many files from another drive. If I've been short of free space, I've executed du -L in the source directory, but that's a bit cumbersome.
And "gadu" is a fine name, yes. Goes well along with my "ga" shortcut for "git annex", which I created two hours after I started using git-annex. I've probably saved thousands of keystrokes because of that. ☺
git annex info --fast *
.... or even, more fancy:
Downside: the json output doesn't give us something sort can really work with (it expects M, G, not mebibytes, gibibytes, which is arguably a bug...). But precision fanatics can also work around that with:
Then you can go crazy trying to convert those numbers back to something readable in your own spare time...
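For that last step, one possible way to turn raw byte counts back into something readable is a small awk wrapper. This is only a sketch: the function name human is made up, and it uses 1024-based units with the short K/M/G suffixes that sort expects.

```shell
# Hypothetical helper: format a byte count with K/M/G/T/P suffixes
# (powers of 1024). Plain bytes are printed without a suffix.
human() {
    LC_ALL=C awk -v n="$1" 'BEGIN {
        split("K M G T P", unit, " ")
        i = 0
        while (n >= 1024 && i < 5) { n /= 1024; i++ }
        if (i == 0) printf "%d\n", n
        else        printf "%.1f%s\n", n, unit[i]
    }'
}
```

For example, human 1536 prints 1.5K and human 1073741824 prints 1.0G.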
Wrote a bare minimum fuse fs so that du-like utilities like ncdu, gt5, gdu can be used.
It reads each symlink target, tries to get a number after SHA256E-s, and pretends it's a regular file with that size. Files added with "git-annex add" don't need to be locally available. Files can be deleted but no other operations are implemented.
Hey @wzhd,
Thanks a lot for this tool - annexize works like a charm even in tags views!
It basically solves the most important problem for me:
I can use ncdu for organizing files in my local git annex repository that does not contain any actual files (only file links), and then just sync with linked repos that do store those files.