I have recently noticed that my .git directory in one copy of my git annex repository has ballooned to about 500GB (on a direct mode network).
Is there any way to move the history to another repository? I guess 500GB isn't that bad on my archive repositories, but for one of my deployment machines this much space for files I don't use isn't really acceptable.
Alternatively, and since this repo is quite a bit older, I think I could also just delete some of the history. Is there any way to do that?
With `git annex info` you can see how much of this space is from actual annexed objects (`local annex size`). Anyhow, if those objects aren't associated with any files in your working tree and you don't want them, you can remove them with the `unused` command; there are some pointers here: http://git-annex.branchable.com/walkthrough/unused_data/.
I don't understand what you mean; `git annex info` tells me nothing about the repository size, and I have no idea where to enter the “local annex size” string. At any rate, no, I have already run `git annex dropunused` to get rid of all unused files. All the rest seems to be needed for some reason. My question was whether I can move all of this history to one of my volumes which have more space on them.
There are various ways to forget history, both in git and git-annex. I don't have enough clarity into what history is taking up space in your repository to give you a good answer. Answering the following questions will give me more insight into where the space is being used up; then I can give you some ideas on how to reclaim it:
Is the repo in question direct or indirect (I am not sure what you meant by "direct mode network")? The output of `git annex info | grep "repository mode"` will tell you this.
What git-annex repo version is the repo in question? The output of `git annex version | grep "local repository version"` will tell you this.
If you cd into the repo in question and run `git-annex info`, it gives you various information about what git-annex thinks about the repository. One of the outputs of this command is "local annex size", which tells you how much space this repo is taking up. In a direct mode repo this should be the same size as you get from sizing all the files in your working directory excluding the .git directory (`du -sh --exclude=.git` on Linux). Otherwise, in an indirect mode repo, the "local annex size" given by `git-annex info` should match the size of the `.git/annex/objects` directory.
If you cd into the repo in question, what are the outputs of the following commands?
Size of git annex objects (In a direct mode repo this should be very small):
Size of git objects (this just tells you how much history is stored in git; it should also be small, unless you store a lot of large files in git, which you probably don't since you are using git-annex):
Size of working tree (this will tell you file content present in this repo):
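The three measurements above could be taken along the following lines. This is a minimal sketch, assuming GNU `du` on Linux; the `repo_sizes` function name and its optional path argument are illustrative and not part of git-annex.

```shell
# Print the three sizes discussed above for a repository.
# Usage: repo_sizes [path-to-repo]   (defaults to the current directory)
repo_sizes() {
    repo="${1:-.}"

    # Annexed object store (expected to be small in a direct mode repo).
    if [ -d "$repo/.git/annex/objects" ]; then
        printf 'annex objects: %s\n' "$(du -sh "$repo/.git/annex/objects" | cut -f1)"
    fi

    # Git's own object store, i.e. the history kept in git itself.
    if [ -d "$repo/.git/objects" ]; then
        printf 'git objects: %s\n' "$(du -sh "$repo/.git/objects" | cut -f1)"
    fi

    # Working tree without .git (--exclude is a GNU du option).
    printf 'working tree: %s\n' "$(du -sh --exclude=.git "$repo" | cut -f1)"
}
```

Calling `repo_sizes` from inside the repository (or `repo_sizes /path/to/repo` from elsewhere) prints one line per measurement; the two `.git` lines are skipped if the corresponding directory does not exist.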
This does not happen; `git-annex info | grep "local annex size"` returns nothing.
This does nothing but hang, and I am not sure whether it's git annex or grep that hangs:
Aaah, sorry, yeah, `git-annex info` is very slow; it checks many things locally and remotely (I've seen it run for 30min+ on some of my repos). No worries, I don't think we'll learn too much more from that command than we learned from the `du` commands.
You do indeed have some unaccounted-for space in `.git`. I usually expect most of the space to be in the git-annex or git objects folders, but that only accounts for 1.6 of the 501 GB in your .git folder.
What are the outputs of `du -h -d 1 .git/` (that's a level-1 listing of files in .git) and `du -h -d 1 .git/annex/` (that's for files in the annex-specific folder)? That will help narrow down where the space is being eaten up. Perhaps `.git/annex/misctmp` or `.git/annex/tmp` are the culprits.
Additionally, I notice that the git annex version my repo has (5) is 2 versions old. Given the git-annex availability on my distributions, I think I could bump this to 6. Do you suggest I do this now or after I have this issue handled?
Aaah. All the space is in `.git/annex/misctmp`. This is essentially a directory for git annex to stage things temporarily, but I don't know too much about what gets put in this directory and when it is safe to delete it (the only official documentation is in internals).
One person had their `.git/annex/misctmp` dir fill up after interrupting the assistant during transfers; another person had their misctmp fill up after interrupting git annex while it was switching to direct mode.
Maybe one of those situations applies to you? Perhaps take a look at some of the files in misctmp and try to evaluate whether you feel they are safe to delete? They should have somewhat recognizable names. I don't know if running `git annex fsck` will clean up any of these files (Joey?).
I would personally not rush into upgrading from v5. v6 has been deprecated, so the latest git-annex will auto-upgrade v6 to v7 (you can't have a v6 repo anymore). So your only options are staying on v5 or upgrading to v7. But there are some significant differences (currently) that you need to be aware of. v7 no longer supports direct mode (it has features that are similar but not equivalent in all situations). v7 (and v6) take control of `git add`, so files are actually added to the annex (not git) when you use this command unless you have configured largefiles (this makes it a bit more difficult to maintain repos that have a mix of git and git-annex files). And unlocked/locked files are treated differently.
There were actually a few ways git-annex could be interrupted and leave droppings in misctmp.
This is now dealt with: a subsequent run of git-annex will clean up leftover files from a previous interrupted run.
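On a git-annex version that predates this fix, the leftovers can at least be inspected before deciding what to do with them. A minimal sketch, assuming GNU `find`, `du` and `sort`; the `list_misctmp` function name and the one-day age threshold are illustrative choices, and nothing here deletes anything:

```shell
# List entries in .git/annex/misctmp that have not been modified for more
# than a day, with their sizes, largest first. Read-only: nothing is removed.
# Usage: list_misctmp [path-to-repo]   (defaults to the current directory)
list_misctmp() {
    dir="${1:-.}/.git/annex/misctmp"
    if [ ! -d "$dir" ]; then
        echo "no misctmp directory at $dir" >&2
        return 1
    fi
    # -mindepth/-maxdepth 1: direct children only; -mtime +0: older than 24h.
    find "$dir" -mindepth 1 -maxdepth 1 -mtime +0 -exec du -sh {} + | sort -rh
}
```

The age filter is there so that files belonging to a git-annex process that is still running are less likely to show up in the listing; whether a listed file is actually safe to remove still has to be judged case by case.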