I have a git-annex repository located at ~/annex which has been set up using git-annex assistant.
This repository is configured as a "client".
My other repository is a huge USB drive configured as a "full archive".
Now everything seems to work fine except there is one thing I don't understand:
alip@client:~/annex> git-annex whereis ./archive/kus.png
whereis archive/kus.png (1 copy)
e79a4cf6-4c48-4833-93de-98ba6eb625d6 -- deniz
ok
Fine, there is only one copy according to git-annex but the file is still present in this client repository:
alip@client:> du -hs ~/annex/archive
20G /home/alip/annex/archive/
How do I free this space? Am I supposed to call git-annex drop manually?
git-annex version: 3.20130124
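For anyone landing here with the same question, a manual drop could look roughly like the sketch below. It only uses git annex whereis and git annex drop; the helper name drop_local_copy and the ANNEX variable are made up for illustration and are not part of git-annex. git-annex itself decides whether the drop is safe, so treat this as a starting point rather than a fix.

```shell
# ANNEX is a hypothetical override: set ANNEX="echo git annex" to print
# the commands instead of running them.
ANNEX="${ANNEX:-git annex}"

# drop_local_copy is an illustrative helper, not part of git-annex.
drop_local_copy() {
    # Show which repositories are recorded as holding the content.
    $ANNEX whereis "$1" || return 1
    # Ask git-annex to remove the local content; it refuses unless it can
    # confirm that enough other copies exist.
    $ANNEX drop "$1"
}
```

Usage would be something like drop_local_copy archive/kus.png, run from inside ~/annex with the USB drive connected.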
Give this a try:
http://git-annex.branchable.com/walkthrough/unused_data/
I'm seeing the same thing and I think it's a bug. Everything I'm reading says that when you move a file from anywhere into an archive directory, it's supposed to upload it to an archive and drop it locally.
If I copy the file from outside my annex into the archive directory it will usually do exactly that and I'll end up with a symlink, but not always. Sometimes I'll end up with a file.
If I move the file from within my annex into the archive, everything gets confused. Here is an example:
From what I can see here, the file was archived properly and the delete in git was issued correctly. The daemon.log shows me that the file was uploaded correctly when it was initially created and then added to the archive directory, but I don't see a direct delete in the log file (even though the git log shows me it happened).
Now on to the really weird part. If I restart the daemon (there are other ways to trigger it, but this seems to be the easiest for me), I see log lines that say testdir/archive/test.txt was dropped (good), but then testdir/test.txt is re-downloaded and appears back in that directory. I wanted the file deleted from testdir, so why is it back?!? Since it gets re-downloaded and re-added, my new logs look like this:
And the filesystem looks like this:
I have 3 remotes set up: a usb drive (client, it was created like that directly by annex when I created a Removable drive repo, it's a bare repo), s3 (archive), and nas (a remote server repo, that connects over ssh and uses rsync). I thought maybe that the usb drive still had a record of testdir/test.txt existing, but the logs in refs/heads/synced/master match.
It's as if git-annex thinks that the original content is still within a non-archive directory, so the data can't be dropped, but everywhere I look the references to that content are gone. The only way I've found to fix this is to convert to indirect mode and delete the symlinks from there (which seems to work better). It seems to be an interaction between direct mode, the archive, and maybe my filesystem (xfs).
Run git annex fsck to fix the location log info; if you then restart the assistant, it'll probably realize the files are locally present and drop them. Of course, the USB drive has to be connected for that to work; git-annex needs to verify the files have been moved to it before it can drop them.
@Jason That is a bug. Please don't report what are obviously bugs in the forum!
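The fsck-then-restart suggestion above could be scripted along these lines. This is only a sketch: it assumes your git-annex version supports git annex assistant --stop, and the helper name refresh_and_restart and the ANNEX variable are hypothetical, not part of git-annex.

```shell
# ANNEX is a hypothetical override: set ANNEX="echo git annex" to print
# the commands instead of running them.
ANNEX="${ANNEX:-git annex}"

# refresh_and_restart is an illustrative helper, not part of git-annex.
refresh_and_restart() {
    # Check the annexed files and correct the location log where needed.
    $ANNEX fsck
    # Restart the assistant so it re-evaluates what can be dropped.
    # Assumption: --stop is available in the installed version.
    $ANNEX assistant --stop
    $ANNEX assistant
}
```

Run refresh_and_restart from inside the repository, with the USB drive plugged in so the copies on it can be verified.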
I've posted a bug report about it here: direct mode renames