Based on the thread over at "du" equivalent on an annex?, I decided to finally write a du-like utility for git-annex. A 0.01 version is up over at http://git-annex.mysteryvortex.com/git-annex-utils.html. It works, but I intend to make it smarter about handling git repos and annexed files, as well as add more of the options available in the standard du utility.
Currently it will tally up the sizes of links that look like they are annexed files. I plan to make it actually interact with git and git-annex to verify the files are annexed and enable options like tallying files only from specific remotes, only missing files, not double counting files which are annexed multiple times but stored only once, etc...
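A rough illustration of the "tally links that look annexed" approach, in Python rather than whatever gadu itself is written in. It assumes the usual git-annex key layout, where the symlink target ends in a key such as `SHA256E-s1048576--<hash>` and the `-s` field carries the size in bytes. The function names `annexed_size` and `tally` are made up for this sketch and are not taken from gadu.

```python
import os


def annexed_size(link_target):
    """Return the size recorded in an annex-style link target, or None.

    Assumes the key looks like BACKEND[-sSIZE][-mMTIME]--NAME and that the
    target path passes through .git/annex/objects/.
    """
    normalized = link_target.replace(os.sep, "/")
    if ".git/annex/objects/" not in normalized:
        return None
    key = normalized.rsplit("/", 1)[-1]
    fields, sep, _name = key.partition("--")
    if not sep:
        return None
    # Skip the backend name, then look for the "s<digits>" size field.
    for field in fields.split("-")[1:]:
        if field[:1] == "s" and field[1:].isdigit():
            return int(field[1:])
    return None  # keys without a size field cannot be counted


def tally(root):
    """Sum the sizes of all symlinks under root that look like annexed files."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        if ".git" in dirnames:
            dirnames.remove(".git")  # do not descend into the repository itself
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                size = annexed_size(os.readlink(path))
                if size is not None:
                    total += size
    return total
```

Because the size comes from the link target, this never has to stat or read the content, and it works for files whose content is not present locally.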
I'll have time to work on this on the weekends, and plan to get my git repo up soon. After gadu is mostly complete I might work on some other tools.
Releases are signed with a PGP key with fingerprint 5E1A 65D7 D5E9 56F1 C239 43DF C6C8 9A0B 6003 8953 (available on the website)
I fixed some bugs that gave the wrong answer occasionally, and made gadu much smarter now.
It now searches for the .git dir and makes sure the git-annex links are well formed before counting them. I also added a few more du-like options.
Have downloaded v0.02 and experimented a bit, and it seems to work nicely. A couple of things, though:
du(1) from GNU coreutils uses 1024-byte blocks as default. AFAIK 512-byte blocks are an old way of measuring sizes from the really ancient UNIX days. Traditionally correct, maybe, but not very useful these days.
" as default? du(1) from GNU coreutils uses -h for this, but that option is already used for --help. And that's OK, I think -h should be reserved for that purpose. IMHO using -h as a synonym for --human-readable was a bad choice by coreutils, but it's too late to change that now.
Is there any Git repository available for git-annex-utils somewhere? That's my preferred way of getting updates and following the development.
Anyway, thanks.
du will take up to yottabytes for the --block-size option. I had been fudging the sizes with a size_t thinking 16 exabytes was plenty big enough for now, but since I was implementing --block-size I went ahead and converted everything to use the GNU MP. So libgmp is now a dependency.
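To illustrate why a 64-bit size_t runs out (2^64 bytes is 16 EiB) while --block-size suffixes go up to yotta, here is a small Python sketch. Python integers are arbitrary precision, so they stand in for what libgmp provides in C. The suffix rules follow the GNU convention as I understand it ("K" and "KiB" are powers of 1024, "KB" is powers of 1000). This is a simplification, and `parse_block_size` and `blocks` are names invented for this example.

```python
import re

_UNITS = "KMGTPEZY"  # kilo ... yotta


def parse_block_size(text):
    """Turn a string such as '512', '1K', 'KB' or '2MiB' into a byte count."""
    match = re.fullmatch(r"([0-9]*)([KMGTPEZY]?)(iB|B)?", text)
    if not match or text == "":
        raise ValueError(f"invalid block size: {text!r}")
    digits, unit, tail = match.groups()
    if not unit and tail:
        raise ValueError(f"invalid block size: {text!r}")
    count = int(digits) if digits else 1
    if count == 0:
        raise ValueError("block size must be positive")
    if not unit:
        return count
    base = 1000 if tail == "B" else 1024
    return count * base ** (_UNITS.index(unit) + 1)


def blocks(size_in_bytes, block_size):
    """Number of blocks needed to hold size_in_bytes, rounding up."""
    return -(-size_in_bytes // block_size)
```

For example, `blocks(1025, parse_block_size("1K"))` rounds up to 2 blocks, and `parse_block_size("1Y")` is 1024**8, which no longer fits in 64 bits.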
--human-readable probably doesn't have exactly the same output, but I think it is good enough. I tried to make the options work mostly the same as du from coreutils. Let me know if you find other discrepancies.
I'll see about making the git tree available soon, but it may have to wait until next weekend. I may also look into a forum for the website, or a mailing list.
I pay attention to feedback
I'm not done with it yet; I want to add in some options to limit what gets counted.
For example: If you have two annexed files that contain the same content using the same backend, they will be stored only once in the .git/annex/objects directory but be counted twice by gadu.
I want to fix that, but I'll leave an option to keep that behavior if you want. I also want to add options to count or not count files that exist in a certain repo. It will be very easy to add options to only count files that you have or don't have locally as well.
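One possible way to avoid the double counting, sketched in Python: two links with the same content and backend end in the same key, so remembering the keys already seen is enough. The `count_duplicates` switch stands in for the "keep the old behavior" option. Both it and the function name `tally_keys` are hypothetical, not gadu's actual interface.

```python
def tally_keys(link_targets, count_duplicates=False):
    """Sum key sizes from annex link targets, counting each key once by default."""
    seen = set()
    total = 0
    for target in link_targets:
        key = target.rsplit("/", 1)[-1]
        fields, sep, _name = key.partition("--")
        if not sep:
            continue  # not a well-formed key
        size = next(
            (int(f[1:]) for f in fields.split("-")[1:]
             if f[:1] == "s" and f[1:].isdigit()),
            None,
        )
        if size is None:
            continue  # key carries no size field
        if key in seen and not count_duplicates:
            continue  # same content already counted once
        seen.add(key)
        total += size
    return total
```

With two links to `SHA1-s100--aaa` and one to `SHA1-s5--bbb`, the default gives 105 while `count_duplicates=True` gives 205.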
Making it pay attention to environment variables that git and git-annex do would also be a good idea. (like GIT_DIR, etc...)
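A minimal sketch of what honoring GIT_DIR could look like, again in Python for illustration: use the variable if it is set, otherwise walk up from the starting directory until a .git directory turns up. It deliberately ignores cases such as .git being a file that points elsewhere, and `find_git_dir` is a name made up here.

```python
import os


def find_git_dir(start, environ=os.environ):
    """Locate the git directory for start, preferring the GIT_DIR variable."""
    override = environ.get("GIT_DIR")
    if override:
        return os.path.abspath(override)
    current = os.path.abspath(start)
    while True:
        candidate = os.path.join(current, ".git")
        if os.path.isdir(candidate):
            return candidate
        parent = os.path.dirname(current)
        if parent == current:
            return None  # reached the filesystem root without finding .git
        current = parent
```

Passing `environ` explicitly keeps the lookup easy to exercise without touching the real environment.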
I'm open to good ideas that anybody has; unfortunately I can only work on it on the weekends for now.
Hi
gadu is a great util! The speed increase compared to "du -smL" will make it my fav. util for size calc!
ciao markus
sunny256, the git repo is now accessible at http://git.mysteryvortex.com
Markus, I never used the -m option myself. I added it in git; it'll be in the next tarball. (I plan to go through the du man page and add all appropriate options soon.)
John, I wasn't aware of your sizes utility. I'll look into it.
No problem, glad to see it is useful. I'm not exactly a web guy, but I want to get some sort of comment/discussion system up there soon so we aren't filling up Joey's web site with semi-offtopic discussion. (also a little beautification is in order)
Yes, contributions are welcome. GPG/PGP encrypted email is the preferred mode of communication.
Currently I ask for copyright assignment in case I want to change licenses in the future. I pledge not to go to a non-free license, but the GPL3 license choice was fairly arbitrary. I might want to add the "or any later version" clause, for example. There is also potential for a library to be split off which might benefit from something like LGPL licensing or similar. I haven't really studied the licensing situation since GPL3 came around, so I need to take some time to look into it.
I don't want to have a licensing discussion here though as it would be offtopic. Feel free to email me and we can discuss.
I don't want to steal gadu's thunder, and I really quite like having an ecosystem of tools develop around git-annex.
With that said, "git annex status ." now shows the disk used for all files in the current directory and below. It also shows the number of keys, and the total amount of disk those keys would use.
Additionally, you can use all the standard git-annex file limiting options. For example, here I'm finding out how much disk space is used by files located on a remote system:
I just had a look at this question today as I learn git-annex. I think the commands have changed since the last comment. However, there remain several ways to determine disk usage, for example in the folder Music
but you could also use du with
so the previous comments by joeyh were correct 2 years ago, but now git annex status behaves more like git-status than anything else, and will not give you disk usage.
however, git annex info will, and if you use --fast, it works pretty fast as well. example, on my pictures collection:
whereas without --fast is much slower, presumably because it's fetching the tracking information:
14 seconds vs 114 seconds! almost an order of magnitude of difference...
still, it seems to me git annex info --fast $path should be more clearly put forward as an alternative du solution for now. maybe this should be made into a tips page?